## Quick notes (2 min)

-Margin of Error: just half the distance between the two ends of the confidence interval. You then write it with the phrasing common to most polling data: “There was a margin of error of +/- 3%.” In other words, the true value might be anywhere from three points lower to three points higher than the estimate we have.
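
If it helps to see that arithmetic spelled out, here is a minimal Python sketch. The poll numbers (an interval running from 45% to 51%) are made up for illustration, and treating the midpoint as the estimate assumes the interval is symmetric around it.

```python
def margin_of_error(lower, upper):
    """Return half the width of a confidence interval."""
    return (upper - lower) / 2


# Hypothetical 95% confidence interval for a candidate's support (in percent).
lower, upper = 45.0, 51.0

moe = margin_of_error(lower, upper)
estimate = (lower + upper) / 2  # midpoint, assuming a symmetric interval

print(f"{estimate:.0f}% +/- {moe:.0f} points")  # prints "48% +/- 3 points"
```

Reading it backwards works too: an estimate of 48% with a margin of error of 3 points gives you the interval from 45% to 51%.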

-Hypothesis tests: They are super confusing and non-intuitive! You absolutely WILL reverse the logic. That’s okay! For comparing two means, just keep in mind these general points:

- You are testing against the null hypothesis that the true population difference between the two means is zero. In other words, the null hypothesis says that there is no difference between the means of the two actual populations.
- Whatever your alpha level is (e.g., usually 0.05), if the p-value you get for the observed difference between the two sample means (that is, between the samples, there is indeed *some* difference) is **less** than the alpha level, then you are saying something like, “Wow, if the true difference were zero, it would be really unlikely that I drew these samples at random and got a difference like this one. It is probably true that the true population difference is not zero but some other value, and thus, there is a difference between these populations.” Now, if you get a p-value greater than 0.05, then you are saying the opposite: you just can’t say for sure whether there is a difference or not.
- People are critical of hypothesis tests, and many prefer doing confidence intervals for differences between two samples. I wanted to go over hypothesis tests for two samples because this is a very commonly used expression of difference. How a confidence interval would work is something like this: you’d get an interval of possible differences. So, you might get 12 and 16, meaning that the true difference is likely somewhere in between. Because that interval does not include zero, at 95% confidence this would be the same as getting a p-value of less than 0.05. If you got an interval that had zero in it (e.g., -3 and 4), then the p-value would be greater than 0.05 at 95% confidence, because one possible value in the interval is zero (i.e., no difference). If you are interested in using a confidence interval for two means in your writing, let me know and I can work up some code for you.
- You need to compare like to like. So, for instance, mean SAT score of two groups, not GRE test score of one group and SAT test score of another group. Or, GRE test score of same group and SAT test score of same group. Remember, too, the assumptions going into this test that I laid out in the last lesson plan (the numbered list from 1-6 starting “To start”). In Journal 5, some of you were interested in seeing if there were correlations between, for example, height and weight. Think back to basic arithmetic here: you can’t subtract 3 oranges from 90 apples. The null hypothesis depends on subtraction of like units.
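
For anyone who wants to see how the p-value and the confidence interval line up, here is a rough Python sketch of the points above. Two loud caveats: the SAT scores are invented for illustration, and this uses a normal (z) approximation from Python's standard library rather than the t-test we ran in class, so with samples this small the numbers would differ somewhat from a real t-test. The function name `compare_means` is also just my own label.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def compare_means(a, b, alpha=0.05):
    """Compare two sample means with a normal (z) approximation.

    Returns the observed difference, a two-sided p-value against the
    null hypothesis of zero difference, and a (1 - alpha) confidence
    interval for the difference.
    """
    diff = mean(a) - mean(b)
    # Standard error of the difference (the two variances are not pooled).
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, p_value, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical SAT scores for two groups (like compared to like).
group_a = [1180, 1250, 1210, 1300, 1270, 1190, 1230, 1260]
group_b = [1100, 1150, 1210, 1130, 1180, 1090, 1160, 1140]

diff, p_value, (low, high) = compare_means(group_a, group_b)

print(f"Observed difference in means: {diff:.2f}")
print(f"p-value: {p_value:.4f}")
print(f"95% confidence interval for the difference: ({low:.1f}, {high:.1f})")
print("Interval includes zero:", low <= 0 <= high)
```

With these made-up scores the p-value falls below 0.05 and the interval sits entirely above zero, which is the same conclusion reached two different ways.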

## Rhetoric, Intuition, Complication, Goodwill (30-45 min)

### Comprehension vs. Goodwill

As we have discussed, how you approach explaining your data analysis will depend on your audience and your goals. In terms of comprehension, this can go one of two ways.

Numbers 1 and 2 have to do with comprehension: is the explanation of the analysis essential to the comprehension of your argument or narrative? To what degree?

Number 3 has to do with your alignment, as speaker/writer, with your audience. In other words, writing about data analysis can be thought of as how you are trying to build trust between you and your audience. Sometimes Number 1 and Number 3 are indistinguishable–but this does not mean that their effects do not occur simultaneously.

1. Some calculations need to be explained because your audience will want to know (e.g., in an online essay about marginal income tax increases, it is probably a good idea to explain how it works vs. other tax strategies).
2. Sometimes, calculations are too complicated or too well-known to get into in much depth, depending on your audience (e.g., for a comparison of two means using a hypothesis test, it probably is not worth getting into the two separate formulas for whether both samples have the same variance or not; an audience more familiar with confidence intervals probably does not need them explained; etc.).
3. Then there is trust and your own credibility. Aristotle, one of the major ancient rhetorical scholars (among other things), came up with the rhetorical triangle, which many of you have probably heard of in one way or another: *logos*, *pathos*, and *ethos*. Essentially, these come down to three appeals (that are often interwoven in any piece of rhetoric): to the argument itself, to the audience in some way (usually in terms of emotion), and to the credibility of the speaker/writer. *Ethos* is further broken down by Aristotle into these three concepts: *phronēsis* (display of practical wisdom), *aretē* (display of virtue), and *eunoia* (display of goodwill). One way to achieve goodwill from your audience is to make *some* effort at explaining how something was found. This isn’t always necessary, but it can be a helpful thing to do for an audience that may be mistrustful of things presented as if the analysis were done in a “black box” that cannot be accessed. Making some effort doesn’t necessarily eliminate the black box, but it shows that what was done was something human, not a pure lie nor something that fell out of the heavens.

### Examples in Wild and in Your Writing

Let’s look at some examples and talk about how you might classify them based on the above. Also, note some things going on in terms of emphasis and in terms of how interpretations are being signaled about the numbers, using what we have learned from past classes.

Activity: Label each with a number corresponding to the above.

Arizona is one of eight states that had “statistically significant” increases in the number of people without health insurance between 2017 and 2018, the U.S. Census Bureau says.

In Arizona, 750,000 people didn’t have any health insurance last year. That’s about 1 in 10 people or 10.6% of the population. The number of Arizonans without coverage jumped by 55,000 people over 2017, after several years of drops in the rate of uninsured.

A census report released Sept. 10 says roughly 27.5 million Americans were without health insurance at some point last year — that’s 1.9 million more people uninsured than in 2017, the report says.

A shocking statistic from Oxfam attracted huge attention this week. Coinciding with the World Economic Forum in Davos, the charity group said that the richest 26 people on earth have the same net worth as half the world’s population, around 3.8 billion people. In an attempt to tackle glaring financial inequality in the United States, newly elected Representative Alexandria Ocasio-Cortez has proposed implementing a 70 percent marginal tax rate on the nation’s highest earners. That would involve a person who earns $10 million in a single year paying up to 70 percent tax on every dollar earned after that. Unsurprisingly, the super-rich at Davos dismissed the idea but it isn’t unprecedented historically.

But opponents of the U.S. refugee program have also criticized it on financial grounds, tallying not only the direct government costs and resources used for the resettlement programs but also the indirect costs incurred when refugees enroll in welfare programs. A new working paper from the National Bureau of Economic Research, however, argues that it is a mistake to focus on the costs of refugee resettlement without also looking at the economic and financial benefits.

“You can’t just look at one side of this equation. [They’re] getting benefits, but they’re also generating income,” said William Evans, a Notre Dame economist and one of the paper’s authors. “They’re living [here], so therefore they are paying taxes.”

To try to estimate both the costs and benefits of admitting refugees, Evans and his coauthor, research assistant Daniel Fitzgerald, used data from the American Community Survey to identify people who are likely to be refugees. From that group, researchers pulled a sample of 18-to-45-year-olds who resettled in the U.S. over the past 25 years and examined how their employment and earnings changed over time. They found that the U.S. spends roughly $15,000 in relocation costs and $92,000 in social programs over a refugee’s first 20 years in the country. However, they estimated that over the same time period, refugees pay nearly $130,000 in taxes — over $20,000 more than they receive in benefits.

The authors found that, when compared to rates among U.S.-born residents, unemployment was higher and earnings were lower among adult refugees during their first few years in the country, but these outcomes changed substantially over time. After six years in the U.S., refugees were more likely to be employed than U.S.-born residents around the same age. The longer they live longer in the U.S., the more refugees’ economic outcomes improved and the less they relied on government assistance. While refugees’ average wages are never as high as the average for U.S.-born residents, after about eight years in the U.S., refugees aren’t significantly more likely to receive welfare or food stamps than native-born residents with similar education and language skills.

From your classmate:

This interval means there is a 95% chance, or 95 times out of 100, that if someone were to select a bear randomly, their age would be between 34.524 and 52.513 months. To put this into context, it would be in the range of close to 3 years old to a little more than 4 years old.

From your classmate:

I told him that a 95% confidence interval is a range of values that you are 95% confident contains the actual mean of a sample population based on predictive statistics. For the bear example, this means that for the given data set, we can be 95% confident that the mean weight for a bear of the sample population is between 150 and 216 lbs.

From your classmate:

A confidence interval are constructed at a confidence level, such as 95% confidence. If someone reports a “95% confidence interval” that means that if we selected different samples from the same population and computed an interval estimate, we would expect the true population parameter to fall within the interval estimates 95% of the time.

From your classmate:

The 95% confidence interval of the age of bears in months is (34.31454344174561, 52.722493595291425) using a t-distribution and (34.52446283619187, 52.512574200845165) using a z-distribution.

This means that we can be 95% confident that the mean of the ages of the entire population of bears is somewhere in the confidence interval.

From your classmate:

In the case of the bears, if the p-value is less than five percent, it can be asserted with a high level of confidence that there is a difference in average bear weight between male and female bears.

The p-value calculated from the researchers’ data is 0.125. Since this is not less than 0.05, there is too big of a chance that the observed differences in bear weights were caused by random chance and it cannot be confidently asserted that there is a difference in the average weights of male and female bears.

This result may seem counterintuitive. Sexual dimorphism is common in animals and male bears weighed almost 50lbs more than female bears in the sample, so why does the t-test seem to contradict the idea that male bears are heavier? In reality, the results of the t-test do not contradict that idea directly. Instead, the t-test results indicate that the data should not be considered convincing evidence of that idea. If the researches had a larger sample of male and female bears, it is possible that the t-test would have led to a p-value lower than 0.05. If that were the case, the data could be considered to be strong evidence that there is a statistically significant difference in the average weight of male and female bears.

From your classmate:

The average neck circumference for male and female bears was 21.7 inches and 18.4 inches, respectively. I wanted to know if the difference between male and female neck circumference in bears was significant and not due to random chance. I ran a t-test, which is a test to determine whether the difference is due by chance by looking at variation in the data and at the difference in the means. My null hypothesis was that there is no difference between the two sets of data. Before running a t-test, I have to decide how sensitive I want my test to be, so I chose it to be fairly sensitive, rather than extremely sensitive. In order to reject the null hypothesis, the t-test will need to give me number, called the p-value, that is less than 0.05. If I wanted my test to be more sensitive, I would have chosen this value to be less than 0.01. My t-test gave me a p-value of 0.039, and that is less that 0.05, so I can reject the null hypothesis. In other words, the observed difference in neck circumference between male and female bears is not likely due to random chance.

### Advanced Calculations, Rhetoric, and Next Assignment

For your next major assignment, which we will go over on Tuesday (10/15), you’ll be required to write about at least one advanced calculation that you either generate yourself from your own analysis or that you cite from secondary research as part of a scientific/technical writing piece and a short piece of writing that is more public-facing. We’ll talk more next week when we go over the prompt, but it might be a good idea to think now about what sort of more complicated data analysis you’d like to write about.

You are welcome to write about a confidence interval, a hypothesis test, or correlation (which we will talk about next class), but you are not required to do so. As long as it is something that has a good number of steps to it (e.g., marginal income tax) or is not intuitive (e.g., hypothesis test), I think you’ll get good practice figuring out how to write about more advanced calculations, both in terms of conventions and in terms of accessibility.

## Mid-term Survey (10 min)

I like to do this! It can help folks, so I try to do it every term.

## Learning Narrative 2 / Revision Work (15 min)

–Revision: Don’t lose the big picture. What is the story that you want to tell? What is the argument you are making? How is that threaded throughout the entire piece?

–Learning Narrative 2: Be sure to read over the prompt as you write LN2. The gist of what I’m looking for, though, is this: in the time that has passed since you wrote LN1 (e.g., lessons since then, reading since then, first draft of public writing project, how you revised the public writing project), what do you believe about writing with data? What has changed for you? What has stayed the same, but perhaps has strengthened in depth? **Have examples from your own writing to support your claims.**

–Both your revision and LN2 are due by **11am on Tuesday (10/15)**.

-Please use the rest of class to work on these and grab me for questions as needed.

## Next Time (5 min)

-Turn in LN2 and the Public Writing Revision by 11am on 10/15.

-Read Miller, chapter 9.

-Next class, we will talk more in depth about the next major writing assignment, but you might think now about some advanced calculations you’d like to write about for the next project. I think I’m going to change Journal 6 to a prompt about brainstorming some advanced calculations about your current data (or different data) to use for the next assignment.