Review (5 min)
Standard Error, Sampling Distribution, Confidence Intervals
Hypothesis Testing (45 min)
Hypothesis testing is built on the same theoretical sampling distribution; it really is just another way to express a confidence interval.
Another way to put it: the difference is rhetorical just as much as it is mathematical. An interval suggests a range of possibilities whereas a hypothesis test tells you if a given guess is plausible. What are the different rhetorical effects here of these choices? What do they do?
You could do a hypothesis test for the mean of one sample against an estimate you have, but that is less useful than just doing the confidence interval unless you have a very specific situation (e.g., you suspect a company has discriminatory promotion practices, so you test how likely the proportion of men and women you see at the company would be if the real value were 50%). What you see more often with hypothesis testing is comparing samples rather than looking at only one sample.
Here, the difference is that instead of thinking about a sampling distribution of the mean of each random sample, you are thinking about a sampling distribution of the differences between the means of pairs of random samples. So, it is a distribution of differences between the two samples.
Practically speaking, you are seeing how likely a gap between two groups would be by chance alone. If it is not likely, then you can conclude there is a legitimate difference between the two groups. Conversely, if it is fairly likely, then you can’t say there is a real difference between the two groups.
More formally, you are looking to either reject or fail to reject the null hypothesis that the difference between both means is zero. So, we are saying that there is probably a difference or that we aren’t sure.
Here are the considerations to make when running this test (specifically, a two-sample t-test of differences between means):
- To start: You are going to have an outcome variable and an explanatory variable. The explanatory variable explains the outcome variable. For instance, gender explains salary or treatment explains SAT score. In our example, it will be age explains differences in cholesterol.
- Like with the confidence interval, we can only work with representative data. Thus, samples of randomized data from the population are best, especially with a higher sample size (the higher the better).
- Do you have a proportion from categorical data (e.g., comparing the number of complications in a group that received a medication vs. a group that did not) or do you have continuous data (e.g., the mean difference in SAT score between a group that received tutoring and a group that did not)? As with the confidence interval, the test uses different formulas depending on the type of data.
- Do you have dependent or independent samples (i.e., are the same people/animals/objects in each sample, or are the two samples unrelated?)? Different formulas apply depending on this.
- Do the two samples have similar variances? There are two different t-tests for comparing two means: one assumes the variances are equal and one does not. If you have different variances, the alternate t-test applies something called the Welch-Satterthwaite approximation, which factors in the standard deviation and size of each sample to come up with a slightly more conservative estimate.
- Set the “alpha level” for your test. This is often 0.05 but can also be 0.01 (another way of putting a 95% or 99% confidence level) or even smaller. The idea here is this: the probability of obtaining a difference that many standard errors away from the mean of the sampling distribution of differences is called the “p-value.” If the p-value is less than your alpha level (e.g., 0.02 for an alpha level of 0.05), then it is likely your result is not by chance, so you reject the null hypothesis (i.e., there is likely a difference). If it is more (e.g., 0.29), then you fail to reject the null hypothesis (i.e., there might be no difference).
For the hypothesis test, to just get the test statistic (that is, the number of standard errors between the estimate and the null hypothesis value—always 0 for a comparison of two means!), the formula is:
(y1 – y2)/se, where se = sqrt[(s1^2/n1) + (s2^2/n2)]
(y1 = mean of first sample; y2 = mean of second sample; se = standard error; sqrt = square root; s1^2 = the first sample’s standard deviation squared; n1 = first sample size; s2^2 = the second sample’s standard deviation squared; n2 = second sample size)
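The formula above can be sketched in a few lines of Python. The sample summaries here (means, standard deviations, sizes) are made-up illustration numbers, not real data:

```python
# A minimal sketch of the test-statistic formula, using made-up sample
# summaries (the means, standard deviations, and sizes are illustration
# values only, not data from the class csv).
import math

y1, s1, n1 = 205.0, 30.0, 50   # sample 1: mean, standard deviation, size
y2, s2, n2 = 195.0, 32.0, 60   # sample 2: mean, standard deviation, size

# Standard error of the difference between the two means
se = math.sqrt((s1**2 / n1) + (s2**2 / n2))

# Test statistic: how many standard errors the observed gap is from 0
t_stat = (y1 - y2) / se

print(round(t_stat, 2))
```

Note that everything here is just arithmetic on the sample summaries; no distribution is involved yet. The distribution only comes in at the next step, when we ask how likely a statistic this large would be by chance.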
The probability of drawing that test statistic from the theoretical sampling distribution is then calculated based on the sample sizes, because different sample sizes are assigned different shapes of the sampling distribution (NOTE: this is done differently depending on whether the samples have similar variances or not). This probability is called the p-value.
As stated above, if your p-value is less than the alpha level, you reject the null hypothesis. If it is higher, you fail to reject it. In other words, either it is likely there is a real difference between the two groups (not just chance), or there is a fairly good chance that there is no difference.
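The step from test statistic to p-value can be sketched with `scipy.stats`. The t value and degrees of freedom below are made-up illustration numbers; with real samples they would come from the formula and the sample sizes:

```python
# Hedged sketch: turning a test statistic into a p-value and applying
# the alpha-level decision rule. t_stat and df are illustration values.
from scipy import stats

t_stat = 2.33   # number of standard errors away from zero (made up)
df = 108        # degrees of freedom, driven by the sample sizes (made up)

# Two-sided p-value: probability of a statistic at least this far
# from zero, in either direction, under the null hypothesis
p_value = 2 * stats.t.sf(abs(t_stat), df)

alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject the null hypothesis")
```

The `stats.t.sf` call is the survival function of the t-distribution (the area in the upper tail), which is why doubling it gives the two-sided probability.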
The confidence interval is really just another way of expressing a hypothesis test. You can put the comparison of two means t-test in terms of a confidence interval.
Margin of error/Confidence interval for difference between two means:
(y1 – y2) +/- t(se). The t-score is calculated based on sample size if the variances are roughly the same, and on both sample size and standard deviation if they are different. If zero is inside your confidence interval, that is the same thing as saying you fail to reject the null hypothesis. If it is not, then you reject the null hypothesis.
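Here is a sketch of that interval in Python, reusing the same made-up sample summaries as before. Since the two standard deviations differ, the degrees of freedom use the Welch-Satterthwaite approximation:

```python
# Hedged sketch: the comparison of two means expressed as a 95% confidence
# interval. All sample summaries are made-up illustration values.
import math
from scipy import stats

y1, s1, n1 = 205.0, 30.0, 50   # sample 1: mean, standard deviation, size
y2, s2, n2 = 195.0, 32.0, 60   # sample 2: mean, standard deviation, size

v1, v2 = s1**2 / n1, s2**2 / n2
se = math.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (for unequal variances)
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_score = stats.t.ppf(0.975, df)   # t-score for a 95% interval
moe = t_score * se                 # margin of error
low, high = (y1 - y2) - moe, (y1 - y2) + moe

# If 0 falls inside (low, high), we fail to reject the null hypothesis
print(f"95% CI for the difference: ({low:.1f}, {high:.1f})")
```

With these particular numbers, zero lands inside the interval, so this example fails to reject the null hypothesis: the same conclusion the hypothesis test would reach at an alpha level of 0.05.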
Let’s try it out
I made a new csv file containing two groups of people in different age ranges, “young” and “old,” defined as people ages 17-29 and people ages 30 and older. We are going to see if there is a significant difference in their mean cholesterol.
Download the notebook and csv file from CourseWeb in the folder.
For our class, we are just going to focus on quantitative data and independent samples. For “cross-sectional data,” which is data already collected that you can slice into different groups, we treat these data as independent samples (so, many of you could run this test if you have random data).
For our data, we can assume similar enough variances as long as the two groups have roughly the same standard deviation. If not, you’ll use the other version of the t-test for comparing two means. Which one should we use for our example?
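The whole check-then-test workflow can be sketched like this. The cholesterol values below are invented illustration numbers, not the class csv; with the real data you would read the two groups out of the file instead:

```python
# Hedged sketch of the two-sample t-test workflow. The cholesterol values
# are made-up illustration data, not the class csv.
from scipy import stats

young = [180, 195, 172, 188, 201, 176, 190, 183]
old = [210, 198, 225, 204, 217, 231, 209, 220]

# Compare the sample standard deviations to decide which t-test to run
sd_young = stats.tstd(young)
sd_old = stats.tstd(old)
similar_variance = max(sd_young, sd_old) / min(sd_young, sd_old) < 2

# equal_var=False switches to Welch's t-test (Welch-Satterthwaite df);
# equal_var=True runs the standard pooled-variance t-test
result = stats.ttest_ind(young, old, equal_var=similar_variance)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

The "standard deviations within a factor of 2" rule of thumb here is just one common informal check for similar variances; the point is that the variance question decides the `equal_var` argument before you ever look at the p-value.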
So, what are we doing here? Reject or fail to reject?
Let us try to think of as many ways to write about this as possible. Start as technically as you can. Then get as informal as you can. Think of at least 5 expressions. Pick your favorite one and post it to the Google Doc here.
Revision Plan Check-in (15-20 min)
Let’s return to the revision plan from last class. Keep formulating it and start revising. I’ll come around and check in on how it is going.
Next Time
-Will review Journal 5 and Journal 4 responses together to think more about the possibilities of writing about advanced calculations in terms of your goals, your audience, your constraints (e.g., space you have, expected conventions).
-Mid-term survey to check in on how things are going.
-Time in class to work on revision