Literacy and Numeracy Narrative + Grading Contract Check-in (5-10 min)

Let’s check in on this assignment and on the results of the grading contract feedback survey.

Grading Contract Questions

  • More chances to earn extra credit on the grading contract?
  • Is the only way to get higher than a B in the class to do extra credit assignments?
  • Increase the grade level for every one response post rather than every two?

Examining Power and Examining Data (25 min)

Returning to our Key Terms from Chapter 1 & Response Post 2

Here again is our ongoing class glossary.

To return to the discussion norms we talked about last time, take 5 minutes to review the discussion on Discord. Choose one thing a classmate wrote that stuck out to you. You will name that person, summarize what you heard them saying in their post or comment, and then make a point that extends something they said.

Power

D’Ignazio and Klein explain that to examine power is to “nam[e] and explai[n] the forces of oppression that are so baked into our daily lives–and into our datasets, our databases, and our algorithms–that we often don’t even see them. Seeing oppression is especially hard for those of us who occupy positions of privilege.”

Understanding these forces puts us in a much better position to ask critical questions when collecting data, managing data sets, or using data sets created by others.

But how do we do this if forces of oppression are difficult to see intuitively, especially for those of us in positions of privilege (e.g., cisgender, male, white, straight, able-bodied)?

There’s no easy answer here, but in today’s activity we are going to try two strategies that have a good track record:

  • Collaborate with others. Drawing on a variety of experiences helps a group collectively identify how power and oppression are at work.
  • Be ready with questions. A set of critical guiding questions can help you work toward an approximate answer to the problem you are facing.

Questions from the Data Feminism chapter:

  1. Who is doing the work (and who is not)?
    • Example: D’Ignazio and Klein cite the Amazon algorithm used to flag resumes for interviews; the model was trained on data from previous applicants, who were heavily skewed male.
  2. Who benefits (and who is overlooked or actively harmed)?
    • Example: Missing data, such as the data on femicide in Mexico cited by D’Ignazio and Klein. The available data on murders did not include sufficient information about the victims, and this gap opened the way for sexist propaganda to fill it, benefiting the status quo of Mexico’s law enforcement.
  3. What goals are prioritized (and what are not)?
    • Example: the Allegheny County Office of Children, Youth, and Families prioritized bureaucratic efficiency. Because the agency did not have enough resources to best help families, it chose efficiency as its goal and relied on data that oversampled poor families, who were more likely to use public services. The goal was reached, but in the process poor families were unfairly targeted and harmed.
  4. How does the matrix of domination help with these questions?

Collins’ Matrix of Domination

Patricia Hill Collins’ model of the four domains of the matrix of domination can also help. In the section “Power and the Matrix of Domination,” the full model is provided in Table 1.1 and described in the text.

It can help to think about how data collection and analysis can be influenced by, or can influence,

  • the theory/intent behind laws and policies (structural domain)
  • how laws and policies are enforced (disciplinary domain)
  • how oppressive ideas are inhibited or furthered (hegemonic domain)
  • individual experiences of oppression (interpersonal domain).

(In the section “Data Science for Whom?”, starting with the sentence “The most grave and urgent manifestation…,” there is an example analysis of the femicide data in Mexico with the different domains in mind.)

 

Asking Questions of Data Sets (30-45 min)

In groups of 3 or 4, you will receive a data set, which I will share in your group’s text channel. As a group, take some time to ask the questions above of your data set and be ready to discuss it with the whole class.

First, take 5 minutes on your own to get acquainted with what the data set is telling you. After 5 minutes, I will prompt you to join your group.

Once you join your group, discuss the following and be prepared to present it to the large group:

  1. An answer to each of the three questions above.
  2. Which, if any, of the four domains of Patricia Hill Collins’ matrix of domination relate to your answers. You can name as many as you feel apply; just be ready to explain why!

Feel free to look not only at the data set itself and how it is structured but also to search online for more information on the organization that collected the data, how the collection might have been funded, what uses the data has been put to, any commentary on it from others, etc.

You’ll have about 15-20 minutes to work with your group.

Groups 1 and 4: Look at the Long-Term Productivity Database. Learn more about this data set from the About page, and open this spreadsheet to look into the data itself (some of the acronyms used in the spreadsheet are explained on the linked About page).

Groups 2 and 5: Look at this cleaned data set of maternal mortality ratios across the world, produced by a user of Kaggle, an online community for data scientists (you may have to open it in a new window rather than a tab). Go here to learn a little more about the data set, where the user got it, and what they did to it. The “Indicator” column tells you what the measure is (i.e., maternal mortalities per 100,000), and the “First Tooltip” column gives you the ratio (e.g., 638 per 100,000) with a confidence interval: 638 [427-1010] means the estimate is 638 and the true number for that year likely falls between 427 and 1010. To learn more about confidence intervals, this source from Simply Psychology has a good explanation.
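For anyone curious about working with the “First Tooltip” values directly, here is a minimal Python sketch. It assumes every entry follows the "638 [427-1010]" pattern described above; the actual file may contain decimals, extra spaces, or other variations, so treat this as a starting point rather than a finished tool. The function name parse_tooltip is hypothetical; it simply splits a value into the estimate and the two ends of its confidence interval.

```python
import re

# Assumed format, based on the example in the assignment: "638 [427-1010]".
# Real entries may differ; anything that does not match raises an error.
TOOLTIP_PATTERN = re.compile(
    r"^\s*(?P<estimate>\d+(?:\.\d+)?)\s*"
    r"\[\s*(?P<low>\d+(?:\.\d+)?)\s*-\s*(?P<high>\d+(?:\.\d+)?)\s*\]\s*$"
)


def parse_tooltip(value: str) -> tuple[float, float, float]:
    """Hypothetical helper: split a 'First Tooltip' string into
    (estimate, lower bound, upper bound)."""
    match = TOOLTIP_PATTERN.match(value)
    if match is None:
        raise ValueError(f"Unexpected tooltip format: {value!r}")
    return float(match["estimate"]), float(match["low"]), float(match["high"])


estimate, low, high = parse_tooltip("638 [427-1010]")
print(f"Estimate: {estimate:g} per 100,000")
print(f"Likely range: {low:g} to {high:g} (width {high - low:g})")
```

Notice that the estimate does not have to sit in the middle of its interval: in this example it is 211 above the lower bound but 372 below the upper bound, which is worth keeping in mind when comparing countries or years.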

Groups 3 and 6: Look at the 2016 data set of Offenders’ Race by Offense Category published by the FBI. You can download each table as an Excel file; here is the first table (you may have to open it in a new window instead of a new tab). A spreadsheet is a little easier to read than the web page. To learn more about how this data is collected, see the FBI’s page defining the crime categories it used in 2016.

 

Next Time (2-5 min)