From Cleaning to Embracing (Pluralism) (20 minutes)
Last class we talked about how value judgments can be inherent in even simple decisions like cleaning data to prepare it for analysis (or, rather, to “continue” analysis, depending on what you felt about Au’s claim that cleaning *is* analysis).
Today, I want to extend that conversation to all stages of analysis, along with some strategies for addressing that problem, through our reading of chapter 5 of Data Feminism and an analysis activity we will do later in class.
In our text-channel for today, please respond briefly (half-formed thoughts are fine; no need to be polished here) to ONE of the following questions.
Do the following in your Discord post:
-Put the “number” of the question you are responding to at the beginning of your post.
-Then put your response after that.
Here are the question prompts:
1. How do you think D’Ignazio and Klein would have agreed with Au on cleaning as analysis?
2. What does it mean to be a “stranger” in a data set? And what should we do about that as data analysts?
3. Go back to your classmates’ Response Post 6 posts on Discord for this week. Extend one of their thoughts with your own in response (think of this as a draft for your comment due tonight, if you have a comment due tonight!).
Counting and Mean in Excel (10 minutes)
To get some practice being a bit pluralistic with our analysis work, let’s learn two Excel formulas (both also work in Google Sheets): one for counting specific numbers or text in qualitative/categorical/discrete data, and one for calculating a mean for continuous/discrete data.
If you have an interactive database, this won’t apply as much, since you won’t be working within a spreadsheet of any kind; much of this is likely automatic.
Let me show you with the Airbnb database and we will go over the following 2 formulas:
=countif(range:range, "item to count")
=average(range:range)
Now, there will be issues here since we are not taking into account how the sample of data was collected, the variability within the sample, and a few other things that will be important to consider in statistical analysis. We will get to that, though. For now, let’s just count, add, and divide a bit.
Let’s start with some counts.
Counting (=COUNTIF)
Here again is the Airbnb data set we have previously worked with. Try doing it along with me if you can:
- First, find out how many rows there are. There’s a shortcut for this, but this is just as easy: scroll down. How many rows are there? (48,896).
- Let’s use COUNTIF first. Let’s see how many times “Manhattan” is listed for the variable “neighborhood_group.” Find an empty cell and enter the following: =countif(e2:e48896, "Manhattan"). Type the quotation marks yourself rather than pasting them, since curly quotes will cause an error. Remember that computers are stupid. They won’t do what you ask unless you give them very precise instructions.
- Hit “enter” or “return”
- What number do you get? (should be 21,661).
- Congratulations! You are a programmer. You used a line of code to return an output.
- Interlude: what do you think about this amount? What is some “analysis” you could do here?
- Let’s try one more for “minimum_nights”. Enter the following: =countif(k2:k48896, 1). Strings–or any letters or words in combination–have to be in quotes. Numbers do not.
- Hit “enter” or “return”.
- What number do you get? (should be 12,720).
- Interlude: what do you think about this amount? What is some “analysis” you could do here?
- More information on counting stuff can be found here: Counting – ENG 4950: Data and Writing Toward Social Change, Spring 2021 (cuny.edu)
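If you are curious what COUNTIF is doing behind the scenes, here is a rough sketch of the same idea in Python. This is only an illustration, not something you need for class: the four rows are a made-up mini-sample (not the real Airbnb data), and count_if is a hypothetical helper name I chose for this example.

```python
# Made-up mini-sample standing in for the Airbnb sheet (NOT the real data).
rows = [
    {"neighborhood_group": "Manhattan", "minimum_nights": 1},
    {"neighborhood_group": "Brooklyn", "minimum_nights": 3},
    {"neighborhood_group": "Manhattan", "minimum_nights": 1},
    {"neighborhood_group": "Queens", "minimum_nights": 2},
]


def count_if(values, target):
    """Count the values equal to target, roughly like =COUNTIF(range, target).

    Note: this comparison is case-sensitive, while COUNTIF in Excel
    ignores case when matching text.
    """
    return sum(1 for value in values if value == target)


# Like =countif(e2:e5, "Manhattan"): text goes in quotes.
manhattan_count = count_if([row["neighborhood_group"] for row in rows], "Manhattan")

# Like =countif(k2:k5, 1): numbers do not need quotes.
one_night_count = count_if([row["minimum_nights"] for row in rows], 1)

print(manhattan_count, one_night_count)
```

The point is the same as in the spreadsheet: you name a range of values, you name the thing to look for, and the computer tallies the exact matches.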
Mean (=AVERAGE)
Let’s stick with the Airbnb data set. This time we will take a mean of the prices for Airbnb options in NYC in 2019:
- Enter the following in a cell: =average(j2:j48896)
- Hit “enter” or “return.”
- What number do you get? (should be $152.72)
- For other measures of central tendency (which we will talk a little bit about next time in relation to distributions and variability), go here: Mode, Median, and Mean and Using in Excel – ENG 4950: Data and Writing Toward Social Change, Spring 2021 (cuny.edu)
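For those who want to see the “add and divide” behind AVERAGE, here is a small sketch in Python. Again, the prices are a made-up mini-sample (not the real Airbnb prices), so the result will not match the $152.72 above.

```python
from statistics import mean

# Made-up nightly prices (NOT the real Airbnb data).
prices = [150, 90, 225, 75]

# The mean by hand: add everything up, then divide by how many there are.
mean_by_hand = sum(prices) / len(prices)

# The same calculation using Python's built-in statistics module,
# roughly like =average(j2:j5) in a spreadsheet.
mean_builtin = mean(prices)

print(mean_by_hand, mean_builtin)
```

Both approaches give the same number; the formula just saves you the adding and dividing.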
Do some preliminary analysis (30-45 minutes)
Steps:
-Choose a variable and decide whether you are counting or taking a mean
-Do a 2-minute reflection (private writing) on two questions
Question 1: What is it about who you are and what you know that is an asset to understanding this data?
This could be something very interesting or unique about your perspective (I’ve been drinking tap water that has arguably been polluted all of my life, and through lived experience and personal research I am very aware of what to look for and think about when it comes to safe or unsafe drinking water), but it could also be very boring (I’ve been drinking water all my life).
Question 2: Also note what you *don’t* have that might be missing. How might you be a “stranger” in this data set?
-Do a pass at cleaning if needed and if there is time (for the purpose of this exercise, we don’t have to worry about it)
-Do the analysis! How many? What is the average? What do you think about that? What questions are you left with?
-Come up with one possible take-away or further question.
-In all of this, consider your work in your Data Set Critical Biography.
Partner with someone else
Here are the partners/trios (can rework if we have absences):
Group 1: Aftar and Terence
Group 2: Kabilan and Usri
Group 3: Calvin and TJ
Group 4: Alvy and Najae
Group 5: Dora and Letycja
Group 6: Eva and Mike
Group 7: Inesa and Joanna
Group 8: Evelyn and Harshita
Group 9: Isabella, Jesus, and E’Longe
Give perspectives. Talk about what you found. Give your partner a chance to offer their perspective on your data set.
What was new and different by being more pluralistic? Were you both strangers? Was someone less of a stranger in the data set?
What can we do about our “stranger-ness” while still not being exploitative of others? In other words, what is to be done in a way that helps our analysis and helps others rather than asking for more labor, time, etc. from others? How do we avoid epistemic violence while also avoiding exploitation?
Masks (5-10 minutes)
We can move this to Discord text channels if we don’t have time to address it in class. Let me report back on the survey results and let’s talk through what we want to do.
So far, I’m hearing concerns about our classroom size and the ventilation which makes some folks uncomfortable due to:
--being around children who cannot be vaccinated
--being immunocompromised themselves (and thus being at a much greater risk of health complications from contracting COVID. Remember, vaccination is not as effective for some people, which can result in a greater likelihood of hospitalization and severe outcomes while spread is still fairly high, as it is now at a weekly average of 8 new cases per 100k in NYC).
--having older family members and other family members with health issues who are susceptible to complications from getting COVID
I think the easiest solution to make everyone feel comfortable in this space is to continue to wear masks while we are in our current classroom.
I cannot require you to do so, so I am only asking if we can agree to do that.
If we can’t agree to that, it is within anyone’s rights on campus to disagree. At that stage, what I will try to do is move us to a larger classroom so we can better distance from one another. I cannot guarantee that I can do this, as I imagine several other faculty are also trying to move to larger classrooms now that the mandate has been lifted.
Next Time (2-5 minutes)
-Read Chapter 3 of Data Feminism
-If signed up, complete Response Post 7. If not, complete a comment.
-Proposal is due March 15. Very informal. Can talk briefly now.