8-27-2019 WWD

Introductions (10 min)

Names (your preference–include pronouns if you’d like), major and/or interests, best fast food sandwich? (thinking of Popeye’s new chicken sandwich made me wonder about this)

Study (10-20 min)

I am conducting a study for this class. I’m going to hand out a form to sign if you wish to participate (no obligation to do so, will not affect your grade at all). If you’d like to participate, we will fill out a short pre-survey in class now.

  1. Fill out consent form if you wish to participate.
  2. If you fill out consent form, then fill out survey with it.
  3. If you do not wish to participate, do not fill out the consent form or the survey.

Background and Interest Survey (10-15 min)

In the survey you filled out, there were several different kinds of data. Lots of it we could quantify, pretty much every question but the last one. But even in the last one, if we used some qualitative methods to code it, we could quantify some aspects of it.

This class asks how we communicate, argue, and tell stories with some combination of data. For this class, the focus is on what we can calculate from the data we work with.

There are many ways to think about this. To start more generally, let’s talk about what we could do with this. What might we do with the data generated from this survey? What sorts of stories or arguments could we make?

There are always constraints with writing and working with data. For instance, let’s talk variables:

  1. Categorical (e.g., types of food, gender identity, brands of clothing)
  2. Continuous (e.g., weight, time, distance)
  3. Discrete (e.g., number of rainy days in a year)
  4. Ordinal (e.g., low income/middle income/high income)
  5. Binary (e.g., yes/no)
  6. Qualitative (e.g., writing, speech, images, video, audio)

Going back to the survey, what type of data was each question? As a writer, what sorts of questions do you have to ask yourself in how you’d write about these different types of data?

The first two questions were categorical—types of major. Can add them up and do all sorts of things, but you have to think about things like assumptions about your labeling—do these categories make sense? Is something left out?
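As a rough sketch of why labeling assumptions matter when counting categories, the Python below tallies some answers twice: once as typed and once after tidying the labels. The list of majors is made up for illustration (it is not the class survey data), and `tally` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical answers to a "What is your major?" question (not real survey data).
majors = ["English", "Biology", "english", "Undeclared", "Biology ", "Statistics"]


def tally(labels):
    """Count categories after normalizing case and surrounding whitespace."""
    return Counter(label.strip().title() for label in labels)


raw_counts = Counter(majors)   # treats "English" and "english" as different labels
clean_counts = tally(majors)   # merges labels that differ only in case or spacing

print("Categories as typed:", len(raw_counts))
print("Categories after cleaning:", len(clean_counts))
print(clean_counts)
```

The same six answers give six categories as typed but only four after cleaning, so the counts you report depend on a labeling decision you should be able to explain.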

The question about GPA asks for a continuous variable: between any two values there are many possible values in between (in principle, infinitely many). Things to consider might be how to contextualize something like the mean. For example, say three people have a 4.0, two people have a 3.9, and 14 people have exactly a 1.8 (for simplicity’s sake). The grade points sum to 45.0 across 19 people, so the mean GPA is about 2.3684. But are 5 out of 19 enough to be outliers or no? How should you communicate that? Or should you use the median instead of the mean, which gives 1.8 as the average here? Should you communicate what you chose and why? Should we round it? To what place? Why?
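The GPA example above can be checked with a few lines of Python. This is a minimal sketch using only the standard-library `statistics` module. The numbers are the simplified ones from the example, not real student data.

```python
import statistics

# The simplified class from the example: 3 students at 4.0, 2 at 3.9, 14 at 1.8.
gpas = [4.0] * 3 + [3.9] * 2 + [1.8] * 14

mean_gpa = statistics.mean(gpas)      # pulled upward by the five high GPAs
median_gpa = statistics.median(gpas)  # the middle value of the 19 sorted GPAs

print(f"n = {len(gpas)}")
print(f"mean = {mean_gpa:.4f}")
print(f"median = {median_gpa:.1f}")
print(f"mean rounded to one decimal place = {round(mean_gpa, 1)}")
```

The mean and the median tell different stories about the same 19 people, and the rounding choice changes what a reader sees, so it is worth stating which one you report and why.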

The question on year is ordinal. It is like a categorical variable, but there is an order to it. First-year comes before sophomore, and so on. Can you calculate the “average” of this by assigning a number to each? Does “the average was a ‘2.4’” make sense? What does a “sophomore.4” represent, exactly? And is it true? Honestly, this is less of an issue here, but something like “low income, middle income, high income” could be tricky. How do you justify that categorization? Is it fair to represent the data in these three groups?
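To make the “sophomore.4” problem concrete, here is a small sketch that assigns a number to each year and compares the mean with the median. The five responses and the 1-to-4 coding are assumptions chosen for illustration, not the class survey results.

```python
import statistics

# Assumed ordering of the categories; position in the list defines the numeric code.
YEAR_ORDER = ["first-year", "sophomore", "junior", "senior"]

# Hypothetical responses (not real survey data).
years = ["first-year", "sophomore", "sophomore", "junior", "senior"]

# Code each response as 1..4 according to its position in YEAR_ORDER.
codes = [YEAR_ORDER.index(year) + 1 for year in years]

mean_code = statistics.mean(codes)          # a number with no matching category
median_code = statistics.median_low(codes)  # always one of the observed codes
median_year = YEAR_ORDER[median_code - 1]   # translate the code back to a label

print("mean code:", mean_code)
print("median year:", median_year)
```

The mean lands between two categories, while `median_low` always returns a value that maps back to an actual year, which is often easier to write about for ordinal data.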

Number of credits is a discrete variable. It is not infinitely divisible in the way a continuous variable is, but there is a number attached. There can be similar issues here, but when communicating about discrete variables you might want to consider rounding, since it would be “unnatural” to talk about taking 2.5 credits (a similar issue to ordinal data). Further, compared to continuous data, you cannot get as fine-grained an analysis (see here for more information on differences between continuous and discrete data). There are limits on inferences, you need larger samples, and there are limits on the ways you can visualize. Still, if that is the data, that is the data. Don’t turn discrete data into continuous data when it makes no sense to do so.
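One way to keep discrete data discrete when reporting it is to show a count of each value alongside (or instead of) a fractional mean. The sketch below assumes a made-up list of credit loads. The variable names are illustrative only.

```python
from collections import Counter
import statistics

# Hypothetical credit loads for five students (not real survey data).
credit_hours = [12, 15, 15, 16, 18]

mean_credits = statistics.mean(credit_hours)  # may be a value nobody can actually take
distribution = Counter(credit_hours)          # how many students took each load

print("mean credits:", mean_credits)
print("mean rounded for reporting:", round(mean_credits))
for load in sorted(distribution):
    print(f"{load} credits: {distribution[load]} student(s)")
```

The table of counts describes only loads that students actually carried, while the mean needs rounding (and a note that it was rounded) before it reads naturally.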

These are just quick examples that show, in rather mathematical ways, how communicating and analyzing data can require careful thought. But there are many other things to consider. Many other things! And we will think about them all semester.

One word we are going to use about communication questions like these is rhetoric.

Data and Rhetoric (10-15 min)

PowerPoint (see CourseWeb for this)

Syllabus (10-15 min)

Let’s go over the syllabus (see CourseWeb). I’ve updated it a few times since I sent out the email about it a couple weeks ago so be sure you have the latest version.

Homework (5-10 min)

  1. Download Anaconda so you can use the program I wrote to look at, analyze, and visualize data
  2. Do the reading
  3. Write up journal response–which is another source of data to use along with survey!

Steps to Download Anaconda

  1. Go to Anaconda—this is software that allows you to work with the programming language Python.
  2. Click the “download” button under “Python 3.7 version.”
  3. Follow installer instructions.
  4. Once Anaconda is installed, you should be able to just search “Jupyter Notebook” and run it. This is the program we will use to run some code I wrote to help analyze and visualize data. We will start digging into that on 8/29.
  5. I should note that the computers in our classroom have Anaconda installed, but you should have it on your laptop as well so you don’t have to be here to work on your projects. Also, if you are savvy enough or more comfortable with proprietary software like SPSS, SAS, or Stata, feel free to use that. Using something like R instead of what I have is also okay.
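Once Jupyter Notebook opens, one quick way to confirm the install worked is to run a cell like the sketch below. It uses only the standard `sys` module. The helper name `python_version_string` is made up for this example, and the exact version and path printed will depend on your machine.

```python
import sys


def python_version_string():
    """Return the running Python version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


print("Python version:", python_version_string())
print("Interpreter path:", sys.executable)  # usually points inside the Anaconda folder
```

If the version starts with 3 and the cell runs without an error, Python is working in the notebook.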