- Data Is Plural. A newsletter with an archive of posts about publicly available data sets. This is often where I go first. To navigate the archive, click on any of the markdown files with names like “2022-02-09.md” to get that day’s newsletter edition, which has 3-5 different publicly available data sets and a blurb about each one.
- Kaggle. An online platform for data scientists that also has a section on publicly available data sets. You can access this data by getting a free account (you can also set it up through a Google Account rather than creating a new account). Use the search bar in the data set section to see what you can find.
- Awesome Data. Just an archive of really interesting and useful data sets. Scroll down and you will see how it is organized by topic. Click on topics of interest to see if anything is worth checking out.
- Data.gov. The U.S. government’s repository of publicly available data sets. It is a little hard to navigate, but there is some useful stuff here if you can figure that part out. It is also worth checking out any government agency that you think might collect data on a topic of interest (e.g., CDC, Health and Human Services, FBI). They might have some stuff on their websites.
- KD Nuggets. This has a link to several data repositories that could be of interest to you.
- Numlock News. A newsletter like Data Is Plural, but with fewer data sets.
If you are stuck, here are some interesting Data Is Plural newsletter editions that have some usable data sets:
- The January 12th Data Is Plural newsletter has some interesting stuff: newsletter-archive/2022-01-12.md at master · data-is-plural/newsletter-archive · GitHub. Check out the one about members of Congress who were slaveholders (csv file here: data-congress-slaveowners/data at main · washingtonpost/data-congress-slaveowners · GitHub). There is also an interesting database about pension plans that is fairly user-friendly: Interactive Tools | Public Plans Data.
- The January 5th newsletter also has interesting stuff on civil asset forfeiture: Policing for Profit III Data – Institute for Justice (ij.org).
- The February 2 newsletter has a data set about the distance people have to travel to abortion facilities. Go here to get the csv file: OSF | abortiondistances_countyxmonth_2009to2021.csv. There is also a data set in there on immigrant populations in 1900, which could be interesting to compare to today for various reasons. Here is the file: File Finder · GitHub
- The February 9 newsletter has some interesting data as well. This data set on people affected by President Trump’s travel ban in 2017 is usable and can be helpful for a project about immigration. Another data set on climate funding is available for download here: National climate funds: a new dataset on national financing vehicles for climate change (figshare.com)
Here are some other ones from 2020, too:
- December 2, 2020 (student loans, coups)
- November 18, 2020 (income inequality, education and civil rights, state spending on children)
- November 11, 2020 (child detention, transit costs–lots on NYC there)
- October 28, 2020 (COVID-19 in ICE facilities)
- August 26, 2020 (protests around the world, ruling elites)
- August 12, 2020 (funding for COVID-19 relief, climate change in localized ways, international banking)
- July 29, 2020 (NYPD officer misconduct)
- July 22, 2020 (police surveillance technology, Mexican migration to US)
- June 24, 2020 (new policing bills, indigenous lands, COVID-19 and childcare)
A good thing to do, though, is to just click around! Don’t rely only on what I offer here. Kaggle is particularly easy to navigate with its search function.
Primary and Secondary Research
See below for an explanation of the difference between primary and secondary research, as well as tips for finding quality secondary research: Primary and Secondary Research – Data and Writing Toward Social Change, Spring 2022 (cuny.edu)
Research Tips in March 17 Lesson Plan: March 17 Lesson Plan – Data and Writing Toward Social Change, Spring 2022 (cuny.edu)
Distribution and Variability
Information from the March 15 lesson plan on understanding the shape of your data and how that should affect your choices in analysis: March 15, 2022 Lesson Plan – Data and Writing Toward Social Change, Spring 2022 (cuny.edu)
Excel Tips
A good video on how to use common Excel formulas: Top 10 Most Important Excel Formulas – Made Easy! – YouTube
Filtering and sorting: March 3, 2022 Lesson Plan – Data and Writing Toward Social Change, Spring 2022 (cuny.edu)
How to visualize data and use other Excel features to understand the shape of your data for analysis: March 15, 2022 Lesson Plan – Data and Writing Toward Social Change, Spring 2022 (cuny.edu)
Counting and taking the mean: March 8, 2022 Lesson Plan – Data and Writing Toward Social Change, Spring 2022 (cuny.edu)
Formula for taking the median (in between the parentheses, put cell ranges just like with the formula for taking the mean): =MEDIAN( )
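If you ever want to double-check a mean or median outside of Excel, here is a minimal sketch in Python. The income numbers are made up for illustration and do not come from any of the data sets above. It shows why the median can be a more representative summary than the mean when one value is much larger than the rest.

```python
from statistics import mean, median

# Hypothetical incomes, invented for this example.
# One value (250000) is far larger than the others.
incomes = [32000, 41000, 45000, 52000, 250000]

# The mean adds everything up and divides by the count,
# so the one large value pulls it upward.
print(mean(incomes))    # 84000

# The median is the middle value once the list is sorted,
# so the one large value barely affects it.
print(median(incomes))  # 45000
```

In Excel, assuming these five numbers sit in cells A2 through A6, the equivalent formulas would be =AVERAGE(A2:A6) and =MEDIAN(A2:A6).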
Ongoing Class Glossary
We are going to keep a running list of terms to define from our readings and work in data science, critical theory, and elsewhere. Here is the link to our class glossary.