Paid Internship (2 min)
I was asked to pass along a paid internship opportunity with Trib Total Media. This is a new opportunity that will begin with a tour of Trib Total Media, tentatively scheduled for Friday morning, October 25, and will lead to students interviewing for positions for spring 2020. There will be multiple positions, and they will involve significant digital composition for this media outlet. If you are interested, let me know and I can pass along your contact information to the English department person coordinating this with Trib Total Media.
Continue from 9-12-2019 (10-15 min)
Start from “Try it out” subheader. Continue work with group from last class.
More (Technical) Principles (30 min)
Chapter 4 explored some more principles to consider when writing about numbers:
- Get to know your variables–and thus, how to write about them.
- Know the units of measurement–and thus, make sure you are writing about them in an understandable way, that the way you compare different results makes sense, etc.
- Examine the distribution of your variables–and thus, learn how typical or atypical any one value is and give your reader a sense of the range of values to help contextualize your data.
- Consider setting standards/cutoffs–and thus, allowing a way to focus on the results that are relevant to the story you are telling and to help give your reader a sense of how typical or atypical any one value is.
- Pick an appropriate number of digits and decimal places–consider precision but also conventions (e.g., some newspapers might have a style guide, some disciplines have certain conventions) as well as readability.
Today, we are going to focus on looking at distributions and setting a standard, two things that pair nicely with the reading on writing about quantitative comparisons that we will cover on Thursday (9/19). The other elements of the chapter are also important! We talked about different types of variables the other day, and they will come up again within today's topics. The remaining two (units of measurement and digits/decimal places), while important, are not super critical to the work we are doing in this class (let me know, though, if you are working across many different units of measurement or you are unsure how to deal with digits and decimal places).
Finally, I should mention that there are some things in this chapter (and not in this chapter) that are technically important but beyond the scope of this class. For instance, transforming variables into, say, logarithms can be helpful for certain kinds of data. There’s lots of stuff like that, but we are sticking with some basic stuff so we can focus on the writing. If you ever feel like there is anything that feels “off” about your analysis or your data, let me know. A good thing to do is to hedge, and acknowledge limitations in your analysis and dataset.
Getting Descriptive
Descriptive statistics are just the statistics, well, that describe your data. They help give a sense of what is “typical” in a distribution, the range of values, etc. Visuals also can help supplement numbers and prose that describe samples.
These are really useful!
Measures of Central Tendency: mean, median, mode
–Go back to Miller (especially pages 80-81): what are the pros and cons of each of these? What sorts of distributions of data do they work best for?
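As a starting point for that discussion, here is a minimal Python sketch of how the three measures can disagree when one value sits far from the rest. The `hours` list is made-up data for illustration, not from Miller or from any class dataset.

```python
import statistics

# Hypothetical data: weekly hours worked by nine students (made up for illustration).
hours = [8, 10, 10, 12, 12, 12, 15, 16, 60]

mean_hours = statistics.mean(hours)      # pulled upward by the single value of 60
median_hours = statistics.median(hours)  # middle value once the data are sorted
mode_hours = statistics.mode(hours)      # most frequently occurring value

print(f"mean={mean_hours:.1f}, median={median_hours}, mode={mode_hours}")
```

In this made-up sample the mean lands above eight of the nine values, while the median and mode stay with the bulk of the data.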
Visuals: While a measure of central tendency will give you a sense of a “typical” value in your distribution to compare against any one value or to give a reader a sense of the typical values overall, having an actual image of the distribution is also crucial. As you might have noticed, you might not know what the best measure of central tendency to use is until you look at your distribution.
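Histograms and Box and Whisker Plots are helpful here. Histograms show the spread of numeric data with bars, where the height of each bar reflects how many values fall in that range. Box and Whisker Plots show the median and the 25th and 75th percentiles (the box), plus whiskers that, depending on the convention your software uses, extend to the 10th and 90th percentiles or to the minimum and maximum. Values that fall beyond the whiskers are often plotted individually as outliers.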
What are you looking for in these visuals to help determine which measure of central tendency to use?
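If you do not have plotting software handy, a rough text histogram can still show the shape. This is only a sketch: the `hours` data and the bin width of 10 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical data (made up for illustration).
hours = [8, 10, 10, 12, 12, 12, 15, 16, 60]
bin_width = 10  # assumed bin size; try a few widths with real data

# Assign each value to the lower edge of its bin (0, 10, 20, ...), then count per bin.
bin_counts = Counter((value // bin_width) * bin_width for value in hours)

# Print one row per bin, including empty bins, so gaps in the data are visible.
for start in range(0, max(hours) + bin_width, bin_width):
    count = bin_counts.get(start, 0)
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * count}")
```

The long run of empty rows before the last bar is the kind of thing to look for: it suggests a lone high value that will pull the mean away from the median.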
Standard Deviation formula (image for educational purposes from: http://statisticslectures.com/topics/variancesample/):
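The image itself is not reproduced in these notes. Assuming it showed the usual textbook formula for a sample, it can be written as below, where x_i is each value, x-bar is the sample mean, and n is the sample size (these symbols are chosen here and are not taken from the image):

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{s^2}
```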
Standard Deviation (plain English): To find the variance, take the difference between each value and the mean, square each difference, add up the squared differences, and then divide that sum by the population size (if you have all values in existence) or by the sample size – 1 (to reflect that you do not have all values in existence). Because variance is expressed in squared units, the Standard Deviation, which is just the square root of the variance, is conventionally used instead to make this more manageable.
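The steps can be followed one at a time in a short Python sketch. The `values` list is hypothetical, and the variable names are just labels chosen for this example.

```python
import math
import statistics

# Hypothetical values (made up for illustration).
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)

# Square each difference from the mean, then add the squares up.
squared_diffs = [(v - mean) ** 2 for v in values]
sum_of_squares = sum(squared_diffs)

# Divide by N if these are ALL the values in existence, or by n - 1 for a sample.
population_variance = sum_of_squares / len(values)
sample_variance = sum_of_squares / (len(values) - 1)

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"population SD={population_sd:.2f}, sample SD={sample_sd:.2f}")
```

Python's built-in `statistics.pvariance` and `statistics.variance` should give the same population and sample variances, so you do not need to do this by hand in practice.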
You should interpret differences in values based on both what the distribution of your data looks like and the standard deviation. For example, if there are a lot of values clustered around the mean, creating a standard deviation of 1.07, two values that differ by two points will be really far apart in terms of spread (i.e., nearly two standard deviations away from one another). By contrast, if the distribution is more spread out and, say, has a standard deviation of 2.18, a difference of two points between values would be within one standard deviation and would not be very far apart in terms of spread.
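The arithmetic behind that example is just division. Here is a small sketch using the two standard deviations from the paragraph above; the function name `difference_in_sds` is made up for this illustration.

```python
def difference_in_sds(difference, standard_deviation):
    """Express a raw difference between two values in standard-deviation units."""
    return difference / standard_deviation

tight = difference_in_sds(2, 1.07)   # tightly clustered distribution
spread = difference_in_sds(2, 2.18)  # more spread-out distribution

print(f"{tight:.2f} SDs apart vs. {spread:.2f} SDs apart")
```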
If you look at a histogram, and your distribution looks roughly like a bell curve (see figure below), then about 68% of all values will fall within 1 standard deviation of the mean, about 95% will fall within 2 standard deviations, and nearly all (about 99.7%) will fall within 3. Nothing is really ever a perfectly normal distribution, so you are kind of eyeballing it.
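If you want to see where those percentages come from, they can be computed for an idealized bell curve with Python's built-in `statistics.NormalDist`. This is a sketch for a perfectly normal distribution, so real data will only approximate it; the helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# An idealized bell curve with mean 0 and standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```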
Interquartile Range and “Five Number Summary”: A quartile is a value that is above a certain percentage of other values recorded. There are four quartiles: Q1 (25%), Q2 (50%–i.e., the median), Q3 (75%), and Q4 (100%–i.e., the maximum). If you also look at the minimum value, then together the minimum, Q1, the median, Q3, and the maximum make up the Five Number Summary, which shows the min/max as well as the spread between them. Sometimes, to show spread that is less affected by outliers, you might want to look at and share the interquartile range (IQR). This is simply the difference between Q3 and Q1.
A useful thing about IQR is that a convention in statistics to identify outliers is to multiply the IQR by 1.5, then subtract that value from Q1 and add it to Q3. If any value is below Q1 – IQR*1.5 or above Q3 + IQR*1.5, then you can consider it an outlier (though, of course, sometimes the eye test in a visual might agree or disagree–use best judgment).
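Here is a minimal Python sketch of the Five Number Summary, the IQR, and the 1.5 × IQR outlier convention. The `hours` data are made up. Note also that different tools compute quartiles in slightly different ways; the `method="inclusive"` setting is one choice among several, so your software may give slightly different numbers.

```python
import statistics

# Hypothetical data (made up for illustration).
hours = [8, 10, 10, 12, 12, 12, 15, 16, 60]

# quantiles(..., n=4) returns the three cut points: Q1, the median, and Q3.
q1, median, q3 = statistics.quantiles(hours, n=4, method="inclusive")
five_number_summary = (min(hours), q1, median, q3, max(hours))

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr  # values above this are flagged as outliers

outliers = [v for v in hours if v < lower_fence or v > upper_fence]

print("Five Number Summary:", five_number_summary)
print("IQR:", iqr, "fences:", (lower_fence, upper_fence), "outliers:", outliers)
```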
Setting a standard: To help consider if a value is meaningful in terms of the story you are telling or the argument you are making, it is useful to set a standard to compare values against. In a vacuum, this would be the measure of central tendency of your sample. However, many of you are not working in a vacuum, so you might need more measures. For instance, you might compare the weather on September 17, 2019 against the mean temperature of September 17 across several decades. Some measures are adjusted by age or they use cutoffs for years (e.g., everything since 1982) for contextual reasons. Thinking about what standards or cutoffs to use depends on your getting to know your data, the context that surrounds it, and the goals of your project.
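To make the weather example concrete, here is a sketch of comparing one day's value against a baseline built with a year cutoff. All of the temperatures, including the observed value, are invented for illustration, and the 1982 cutoff is just the example year mentioned above.

```python
import statistics

# Hypothetical records: year -> high temperature (degrees F) on September 17.
# These numbers are made up for illustration, not real weather data.
highs_by_year = {1979: 70, 1980: 74, 1982: 72, 1990: 75, 2000: 77, 2010: 78, 2018: 79}

cutoff_year = 1982  # assumed cutoff: only use records from this year onward
baseline = [temp for year, temp in highs_by_year.items() if year >= cutoff_year]
baseline_mean = statistics.mean(baseline)

observed_high = 84  # hypothetical high for September 17, 2019
difference = observed_high - baseline_mean

print(f"baseline mean={baseline_mean:.1f}, observed is {difference:+.1f} degrees from it")
```

Changing `cutoff_year` changes the standard, which is why the choice of cutoff needs to be explained to your reader.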
Activity:
Choose a variable in your dataset. Find each measure of central tendency, note the standard deviation, find the Five Number Summary and the interquartile range, and create a visual of the distribution. Which measure of central tendency do you think is best to use? Is anything interesting going on? Which measures of spread are helpful? How varied are the data? What do the visuals help you see about which measure of central tendency to use, or about anything else notable in your data?
In-class Time on Project (20-30 min)
Since your draft is due next class, I wanted to give you some time to work on it now. I’ll come around and check in as needed.
Next Time
-Quantitative Comparisons
-Submit Public Writing Draft (bring in two print copies). ***IMPORTANT***: tell me which publication you envision submitting your piece to or the organization you are representing (i.e., if doing some non-profit or governmental public writing, you need to think about whom you are writing for).
-If time: more on signaling evaluation and design/accessibility