Distributions and Variability (20-30 minutes)

It is really important to know how the values in your data set or database are distributed. Whatever your data source, you should be able to get a visual or a measure of variability to help you make sense of your measures of central tendency, or of the shape of your data in general.

 

Using Visuals To See Distributions

Visuals can be really helpful here. Making a bar graph for categorical data or a histogram for numeric data can help you get a sense of how the values are distributed.

Categorical Data and Bar Graphs

For categorical data, calculating the mode (the most frequent category) will probably be just as informative as seeing the category counts in a bar graph. Still, visuals can sometimes let us see patterns more easily than numbers and text alone.

Click here to learn more about this and how to make a bar chart in Excel.
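If you want to check the mode and the bar-graph view outside of Excel, here is a minimal Python sketch. It assumes a small, made-up list of survey responses (the `responses` values and the helper names are purely illustrative). It counts each category with `collections.Counter`, reports the mode, and prints a rough text bar chart.

```python
from collections import Counter

# Hypothetical categorical responses, made up for illustration only.
responses = ["bus", "car", "bike", "car", "walk", "car", "bus", "bike", "car"]

def category_counts(values):
    """Return (category, count) pairs, most frequent first."""
    return Counter(values).most_common()

def text_bar_chart(values):
    """Build one line per category, with one '#' per occurrence."""
    lines = []
    for category, count in category_counts(values):
        lines.append(f"{category:<5} {'#' * count} ({count})")
    return "\n".join(lines)

counts = category_counts(responses)
# most_common() sorts by count, so the first entry is the mode
mode_category = counts[0][0]

print("Mode:", mode_category)
print(text_bar_chart(responses))
```

If two categories tie for most frequent, the data have more than one mode; `statistics.multimode` in the standard library returns all of them.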

 

Numeric Data and Histograms

Seeing the distribution of numeric data can be even more important. It is foundational to both the simplest and the most complex statistical analyses.

Click here to learn about the different types of distributions.

How to create a Histogram in Excel
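A histogram is built by sorting numeric values into equal-width bins and counting how many land in each. The Python sketch below shows that idea under a few assumptions: the `minutes` data are made up, the values are whole numbers no smaller than `start`, and `histogram_counts` and `print_histogram` are hypothetical helper names, not part of any library.

```python
# Hypothetical numeric data (say, minutes spent reading), made up for illustration.
minutes = [12, 15, 18, 22, 25, 25, 27, 31, 33, 38, 41, 45, 68]

def histogram_counts(values, bin_width, start):
    """Count values in equal-width bins: [start, start + width), [start + width, ...), ...

    Assumes whole-number values that are all >= start. Empty bins are kept so
    gaps in the distribution stay visible.
    """
    n_bins = (max(values) - start) // bin_width + 1
    counts = [0] * n_bins
    for v in values:
        counts[(v - start) // bin_width] += 1  # which bin this value lands in
    return counts

def print_histogram(values, bin_width=10, start=10):
    """Print one text bar per bin, labeled with the bin's whole-number range."""
    for k, count in enumerate(histogram_counts(values, bin_width, start)):
        lower = start + k * bin_width
        print(f"{lower}-{lower + bin_width - 1}: {'#' * count}")

print_histogram(minutes)
```

Excel's histogram chart does this binning for you. Changing the bin width can change how the shape looks, so it is worth trying more than one width.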

 

Variance, Standard Deviation, Interquartile Range

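Variance is, roughly, the average squared distance of each value from the mean. To calculate it, subtract the mean from each value, square each of those differences, add the squared differences together, and then divide that sum by the population size, N (if you have every value in the population), or by the sample size minus one, n − 1 (to reflect that you are working with a sample rather than every value).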
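For reference, here is the same recipe in standard notation. The symbols are conventional and are not taken from the original notes: the x_i are the individual values, μ and N are the population mean and size, and x̄ and n are the sample mean and size.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
```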

Standard Deviation

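Because variance is expressed in squared units, it can be hard to interpret directly. To make it more manageable, the standard deviation, which is simply the square root of the variance, is conventionally used instead; it is expressed in the same units as the original data.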
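To make the steps concrete, here is a minimal Python sketch that follows the variance recipe by hand and then takes the square root for the standard deviation. The `scores` data and the helper names are made up for illustration. The last line checks the hand-written version against Python's built-in `statistics` module.

```python
import math
import statistics

# Hypothetical scores, made up for illustration.
scores = [4, 5, 5, 6, 7, 8, 9]

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    mean = sum(values) / len(values)
    squared_deviations = [(x - mean) ** 2 for x in values]
    divisor = len(values) - 1 if sample else len(values)
    return sum(squared_deviations) / divisor

def standard_deviation(values, sample=True):
    """Square root of the variance, back in the data's original units."""
    return math.sqrt(variance(values, sample))

print(round(variance(scores), 3), round(standard_deviation(scores), 3))
# The standard library should give the same sample results:
print(round(statistics.variance(scores), 3), round(statistics.stdev(scores), 3))
```

In Excel, the corresponding functions are VAR.S and VAR.P for variance and STDEV.S and STDEV.P for standard deviation (sample and population, respectively).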

It can be helpful to use standard deviation as a way to think about any one value compared with the others. For example, if many values are clustered tightly around the mean, producing a standard deviation of 1.07, then a difference of two points between two values is really far apart in terms of spread (i.e., nearly two standard deviations away from one another).

By contrast, if the distribution is more spread out and has, say, a standard deviation of 2.18, a difference of two points between values is within one standard deviation and is not very far apart in terms of spread. (Example adapted from Jane Miller, 2015, The Chicago Guide to Writing about Numbers, Second Edition, pp. 79-81.)
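The comparison in this example is just a division: the raw difference divided by the standard deviation. A tiny Python sketch of that arithmetic, using the two standard deviations from the example (the helper name is hypothetical):

```python
def difference_in_sd_units(difference, standard_deviation):
    """Express a raw difference between two values as a number of standard deviations."""
    return difference / standard_deviation

# The two standard deviations from the example above
tight = difference_in_sd_units(2, 1.07)   # values clustered near the mean
spread = difference_in_sd_units(2, 2.18)  # values more spread out

print(f"{tight:.2f} SDs apart vs. {spread:.2f} SDs apart")  # 1.87 SDs apart vs. 0.92 SDs apart
```

The same two-point gap is about 1.87 standard deviations in the tightly clustered data but only about 0.92 in the more spread-out data.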

Click here to learn how to calculate standard deviation, quartiles, and interquartile range.

 

 

Quartiles and Interquartile Range

Interquartile range is another way to describe the variability in your numeric data. A quartile is a cut point that a certain percentage of the recorded values fall at or below. There are four quartiles:

  • Q1 (25%)
  • Q2 (50%, i.e., the median)
  • Q3 (75%)
  • Q4 (100%, i.e., the maximum)

You could report all of these values, along with the minimum, to give a sense of the spread of the data.
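As a minimal sketch of that summary, assuming Python's standard `statistics` module and made-up data, the hypothetical helper `five_number_summary` below returns the minimum, Q1, median (Q2), Q3, and maximum (Q4). Using `method="inclusive"` should give the same quartiles as Excel's QUARTILE.INC. Programs use slightly different quartile rules, so small differences between tools are normal.

```python
import statistics

# Hypothetical data, made up for illustration.
data = [3, 5, 7, 8, 9, 11, 13, 15, 18]

def five_number_summary(values):
    """Minimum, Q1, median (Q2), Q3, and maximum (Q4) of at least two values."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)

print(five_number_summary(data))  # (3, 7.0, 9.0, 13.0, 18)
```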

You could also calculate the interquartile range (IQR), the spread of the middle 50% of your values, which can be especially helpful if you have outliers. Because standard deviation is calculated from each value's distance from the mean, a few outliers can greatly inflate it, making it less useful for describing the spread of the typical values. The IQR ignores the lowest and highest 25% of values, so it is much less affected by outliers.

To calculate the interquartile range, subtract Q1 from Q3 (IQR = Q3 − Q1).

A common convention in statistics is to use the interquartile range to check whether you have any outliers. To do so, multiply the interquartile range by 1.5.

Subtract that number from Q1; any value below the result is considered an outlier.

Then add the same 1.5 × IQR figure to Q3; any value above that result is also considered an outlier.
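The outlier check can be sketched in a few lines of Python. This assumes the standard `statistics` module, made-up data with one unusually large value, and a hypothetical helper name, `iqr_outliers`. It computes Q1 and Q3, takes IQR = Q3 − Q1, builds the lower fence (Q1 − 1.5 × IQR) and upper fence (Q3 + 1.5 × IQR), and flags anything outside them.

```python
import statistics

# Hypothetical data with one unusually large value, made up for illustration.
data = [3, 5, 7, 8, 9, 11, 13, 15, 40]

def iqr_outliers(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                  # subtract Q1 from Q3
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return iqr, lower_fence, upper_fence, outliers

print(iqr_outliers(data))  # (6.0, -2.0, 22.0, [40])
```

Here Q1 is 7 and Q3 is 13, so the IQR is 6 and the fences are −2 and 22; only 40 falls outside them.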

Using Median and Mode vs. Mean

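You can use all three common measures of central tendency (mean, median, and mode) for continuous data, which is numeric data that can, in principle, take any value within a range (think decimal points). You can also use all three for discrete data, which is numeric data that can only take separate, countable values, like the number of cloudy days in a year. For discrete data, though, the mean might not be as useful as the median and mode, because it can be a value that cannot actually occur, such as a fractional number of days.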

If your data have outliers or are very spread out in one direction, it might be better to report the median rather than the mean.
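A quick way to see why is to add one extreme value to a small data set and compare the two measures. The Python sketch below assumes made-up income figures (in thousands of dollars) and uses the standard `statistics` module. Adding a single outlier of 900 pulls the mean from about 48.3 to about 154.8, while the median only moves from 47 to 49.

```python
import statistics

# Hypothetical household incomes in thousands of dollars, made up for illustration.
incomes = [38, 42, 45, 47, 51, 55, 60]
with_outlier = incomes + [900]   # add one extreme value

for label, values in [("without outlier", incomes), ("with outlier", with_outlier)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label}: mean={mean:.1f}, median={median:.1f}")
```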

 

Making Data Interesting (20-30 minutes)

Each of you was asked to pick one of the example data journalism pieces to read and be prepared to talk about:

  • what intrigued you most as a reader
  • why that intrigued you
  • and how moments of quantification were accessible

We are going to open up discussion about this, but let me give you 1 minute to collect your thoughts so you are ready to talk about it.

We are going to do a popcorn discussion. One person will go first and then choose someone else to respond. You can say something out loud or type in our Discord text thread for today. Not everyone will go since we won’t have time for everyone!

Re-Vision

Let’s look again at the data journalism piece you read.

Look for the following in your piece to see what grabbed you in relation to communicating about the data:

  • word choice
  • the way sentences were structured or emphasized things
  • images
  • storytelling
  • figurative language (e.g., metaphors, similes, synecdoche, metonymy)
  • examples
  • repetition
  • organization or layout or formatting

Find at least one choice the writer made that made you pause, made something understandable, or tapped into an emotional response of some kind.

Returning to our conversation from March 10 (e.g., 2% vs. 1 in 50): we have lots of choices when communicating. How should that influence the way we communicate with data? Should we try to make things “interesting”? Or to stir emotions in some way? Why or why not?

 

Genre (10-15 minutes)

Long-form nonfiction pieces are often structured the way the examples we looked at today are. General things to notice:

  • Usually some context-setting in an early paragraph that ties the piece to current news or events
  • Introducing the project that was done to collect data, how it was done, etc.
  • Quotes from experts throughout (i.e., secondary sources, interviews)
  • Written to entertain AND inform
  • Usually ends with some kind of “what’s next” angle to the topic, but it might be more muted or implied than explicitly stated.

Here are the three examples posted in the folder with today’s assignment instructions:

American Legion: Veteran Suicides

US Department of Agriculture: Climate Change Science

Christian Community Development Association: Immigration

Review these examples again to think about general conventions you start to notice in terms of words, sentences, formatting, and organization.

As you will notice, there is no one-to-one correspondence between these examples and a single way a white paper should be organized. This is true of nearly any genre, although some genres are more rigid (e.g., a business letter).

However, I hope you notice that even amid the differences, there is a general strategy for organizing a white paper:

  • some sort of introduction that explains the purpose of writing it and/or the perspective of the organization for why it is involved in the subject
  • some organizing scheme for presenting background information. Perhaps it is one section, a section with subsections, or multiple sections. Depending on your topic, it might make sense to do any one of those options.
  • some sort of section that focuses on possible solutions, actions to take, or actions already taken in response to the background information provided. Sometimes this is in the conclusion or sometimes it is its own section (or section with subsections).

 

Next Time (2-5 minutes)

-Proposal by end of day today on Blackboard

-Read chapter 6 of Data Feminism for Thursday

-Do Response Post 8 by class time if you are signed up for it.

-Comment on Response Post 8 if you are not signed up. That can be done by 11:59pm on Thursday.