
The Lexington Review

A Journal of Scholarly Writing from the Baruch College Writing Center


History’s Manifestation: One-sided, Two-Sample z-Test of Population Proportions

By Serena Zou

In this statistical analysis and the accompanying reflections, Serena Zou uses census data to analyze how a historical event, World War II, affected the makeup of the US population. She not only walks the reader step-by-step through her calculations but also shows how she pivoted and adapted to various constraints. We found her reflections on her own findings especially useful: Serena comments on each step of the process, modeling how to move from a hypothesis to data calculations and then evaluate the results. Serena also demonstrates that it is sometimes necessary to shift the research question, describing how she moved away from her initial focus on Chinese American immigrants because of a lack of data. Her research compares the proportion of men and women before and after WWII using two census sheets, and she offers helpful insights on moving through the research process.

—Joss, Lexington Review Editor


Takeaway: Through this final project, I learned how to access census data, immigration data, and many other forms of data that document the history of our country, which I did not know how to do before.

A1. Research Hypothesis

Since I had never written a research hypothesis before, I encountered a few challenges during the process: 1) I initially overcomplicated the assignment by assuming that it would require a whole stack of census sheets rather than just the two random sheets required for Part A. 2) I also tried to match the research topic to historical events and figures connected to my own ethnic background as a Chinese American, but these topics lacked the data needed to support a hypothesis. These attempts took up a significant amount of my time at the beginning, until I spoke with my statistics professor to clarify the assignment requirements.

Initially, I seriously considered framing my research hypothesis around the proportion of Chinese Americans immigrating to the US before and after the Chinese Exclusion Act of 1882, given my own identity as a teenage immigrant from China. However, there simply is not enough documented data for me to conduct meaningful research to test such a hypothesis. When I wondered why, I realized that the Chinese population had never been in the mainstream and was “camera-shy.” As a result, although early Chinese immigrants contributed significantly to the transcontinental railroad and the development of the American West, very few records of them made it into the archives. Hence, the first lesson I learned through this research process is that not all great research ideas can be executed. This can be due to a variety of factors, such as a lack of documented materials, as in my case. After speaking with my professor, I decided instead to study the proportion of men and women in the US before and after World War II (WWII). The war lasted from September 1, 1939 to September 2, 1945, and I focus specifically on the period before and after the US entered the war on December 8, 1941, one day after the Japanese attack on Pearl Harbor on December 7, 1941.

To define a research hypothesis, I consulted Researcher.Life, which states:

A research hypothesis is a statement that proposes a possible explanation for an observable phenomenon or pattern. It guides the direction of a study and predicts the outcome of the investigation. A research hypothesis is testable, i.e., it can be supported or disproven through experimentation or observation (Singh). 

To that end, here is what I set out to learn:

Research Hypothesis: I want to study how WWII reduced the proportion of men by comparing two census sheets, one from 1940 (before the US entered the war) and one from 1950 (after the war ended). I predict that the proportion of men will decrease by at least 25% because mostly men fought in the war, so the death toll would affect more men than women.

A2.

A2. (a) Null Hypothesis and Alternative Hypothesis:

Null Hypothesis: (H0)
P1940 = P1950

Alternative Hypothesis: (H1)
P1940 > P1950

A2. (b) The Hypotheses in Words:

The Null Hypothesis (H₀) states that the proportion of males in the US population in 1940 was equal to the proportion of males in 1950. My manual count of the 1940 census sheet, shown later in this document, found 25 females and 25 males out of 50 people, an even 50/50 split. Under the null hypothesis, the male proportion in 1950 would be the same as in 1940. Any difference between my 1950 count and the 1940 count would then be due to random sampling variation rather than a real change.

The Alternative Hypothesis (H₁) states that the proportion of males in the US in 1940 was greater than the proportion of males in 1950. In line with my own theory, this suggests that the proportion of males in the US decreased after WWII. The direction of this hypothesis also matches the result of my manual count of the two census sheets.

A2. (c) Sample Population:

The population I am studying consists of all individuals living in the US in 1940 and 1950, specifically those recorded in the census for those years. As I learned in class and through this research process, census data are often incomplete for various reasons. I therefore make the simplifying assumption that these populations include every individual residing in the US at the time.

To conduct my study, I selected two small, random samples from these larger populations: one census sheet from 1940 consisting of 50 people, and another from 1950 with 16 people. While they are small compared to the full population, they can still offer a window into broader demographic trends that may have been influenced by WWII.

A2. (d) Population Proportion Computation:

I know many errors can happen during a research process. To minimize mistakes, I counted the raw data (in this case, the numbers of men and women on each census sheet) twice manually and once in Excel, to make sure I was off to a good start. To calculate the population proportions for males and females, I first counted the total number of females and males on each of the two census sheets, writing down my progress as I counted. After I finished counting, I divided the number of females and the number of males by the total number of individuals on each sheet to determine the proportion of each gender for each year. Below is a summary of my counts and computations for each sheet:

For 1940, I have 50% females and 50% males. 
Females: 25 counts / 50 total = 50%
Males: 25 counts / 50 total = 50% 

For 1950, I have 56.25% females and 43.75% males. 
Females: 9 counts / 16 total = 56.25%
Males: 7 counts / 16 total = 43.75%
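The count-and-divide step above can be sketched in Python as a cross-check on the Excel count. This is a minimal illustration, not the author's actual workflow. The two lists are hypothetical stand-ins for the highlighted gender column, rebuilt from the reported counts only, so the order of entries does not match the real sheets.

```python
from collections import Counter

def sex_proportions(sex_column):
    """Return the share of 'F' and 'M' entries in a list of census sex codes."""
    counts = Counter(sex_column)
    total = len(sex_column)
    return {sex: counts[sex] / total for sex in ("F", "M")}

# Hypothetical stand-ins for the highlighted column on each sheet,
# built from the reported counts only (entry order is not the real order).
sheet_1940 = ["F"] * 25 + ["M"] * 25
sheet_1950 = ["F"] * 9 + ["M"] * 7

for year, column in (("1940", sheet_1940), ("1950", sheet_1950)):
    shares = sex_proportions(column)
    print(f"{year}: n = {len(column)}, females = {shares['F']:.2%}, males = {shares['M']:.2%}")
```

Running it prints 50.00% and 50.00% for 1940 and 56.25% and 43.75% for 1950, matching the hand counts.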

This small research project gives me a method and skill set that I can use to conduct a larger research project when needed. I note here that estimating the proportions for the actual US population would follow the same method. Instead of my small samples of 50 and 16 people, however, the calculation would use far more census records drawn at random from all individuals in the US during those years.

Below are the two names I used to find my census sheets from before and after WWII, which let me compare the proportion of men and women in these two census years. The assignment instructions suggest basing the research hypothesis on something that interests me, so I decided to use the names of Chinese American WWII servicemen to locate my census sheets. According to Google, only 13,000 Chinese Americans fought in the war. Below are two prominent servicemen: one (Mr. Wai) died during the war, and the other (Mr. Chung-Hoon) survived:

Francis B. Wai (Before War Census: 4/6/1940)

photo of Francis B. Wai

Born: April 14, 1917, Honolulu, HI
Died: October 20, 1944 (age 27 years), Leyte, Philippines
FamilySearch.org Link: https://www.familysearch.org/ark:/61903/1:1:65MZ-M7DX
About Francis B. Wai – Based on Google search results, Mr. Wai appears to have been a prominent young serviceman who died fighting in the war.
Per Google, Mr. Wai’s accomplishments during WWII included: 

  • The first Chinese American to receive the Medal of Honor (posthumously)
  • Recognized for heroic leadership in the Philippines during the Leyte Gulf invasion

Gordon Pai’ea Chung-Hoon (After War Census: 1950)

photo of Gordon Pai'ea Chung-Hoon

Born: July 25, 1910, Honolulu, HI
Died: July 24, 1979 (age 68 years), Honolulu, HI
FamilySearch.org Link: https://www.familysearch.org/ark:/61903/3:1:3QHK-SQH4-W1Q8?view=index&personArk=%2Fark%3A%2F61903%2F1%3A1%3A6FN3-BC83&action=view&cc=4464515
Per Google, Mr. Chung-Hoon’s accomplishments during WWII included:

  • Served as a naval officer and received the Navy Cross and Silver Star
  • Commanded the USS Sigsbee during intense combat

A3. Census Sheet Images

Census Sheet #1 – BEFORE WAR: April 6th, 1940
Source: FamilySearch.org (Clicking its “view documents” link led me to the URL below.)
https://www.ancestry.com/search/collections/2442/records/78804497
I highlighted the column where the census sheet indicates whether the 50 people listed are male or female. The proportion of males and females is my area of study here. 

Census Sheet 1

Above: (Census Sheet #1) (1940) (Before War)

Mr. Wai’s name is listed as line #20 on the census sheet.

Census Sheet #1 Analysis: Coincidentally, out of the 50 names listed on the 1940 census sheet, 25 are women and 25 are men. This results in an even split between females and males, a proportion of 50% female and 50% male.

census sheet 2

Above: (Census Sheet #2) (1950) (After War)

Mr. Chung-Hoon’s name is listed on line #13 on the census sheet. 

Census Sheet #2 Analysis: Of the 16 names on the 1950 census sheet, 9 are women and 7 are men, an uneven split between females and males. The proportion of females is 9/16 = 56.25%, whereas the proportion of males is 7/16 = 43.75%. This is consistent with our in-class discussion about how the population of young men declines after a war because many die fighting.

A3. Written Explanation: 

I went from the census sheets to the proportions by focusing on the one census item I aimed to compare: gender. I highlighted the gender column on both sheets and manually counted the females and males on each sheet, as shown in the images above. After performing my counts, I divided each count by the total number of people on that sheet, one sheet at a time, to derive the proportions.

A4. Hypothesis Walk-Through

Null Hypothesis: (H0)
P1940 = P1950 [No Significant Change]

Alternative Hypothesis: (H1)
P1940 > P1950 [Proportion of Males Decreased After WWII]

1940 Census: 
Total Number of People (n1940) = 50, with Males = 25
Proportion of Males (p1940) = 25/50 = 50%

1950 Census: 
Total Number of People (n1950) = 16, with Males = 7
Proportion of Males (p1950) = 7/16 = 43.75%

Pooled Proportion: 
P = (25+7) / (50+16) = 32 / 66 = 0.4848 

Standard Error (Sd): 
Sd = √[P(1 - P)(1/n1940 + 1/n1950)]
Sd = √[0.4848 × (1 - 0.4848) × (1/50 + 1/16)]
Sd = √(0.4848 × 0.5152 × 0.0825)
Sd = √0.0206
Sd = 0.1435
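The same arithmetic can be checked in Python. The sketch below assumes only the counts reported above (25 of 50 males in 1940, 7 of 16 in 1950). It places the square root over the entire product, which is what the numbers above actually compute.

```python
import math

# Counts from the two census sheets
males_1940, n_1940 = 25, 50
males_1950, n_1950 = 7, 16

# Pooled proportion: total males over total people, as assumed under H0
pooled = (males_1940 + males_1950) / (n_1940 + n_1950)

# Standard error: the square root covers the whole product
sd = math.sqrt(pooled * (1 - pooled) * (1 / n_1940 + 1 / n_1950))

print(f"pooled P = {pooled:.4f}")  # 0.4848
print(f"Sd = {sd:.4f}")            # 0.1435
```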

Z-Score:
Z = (p1940 – p1950) / Sd
Z = (0.50 – 0.4375) / 0.1435 
Z = 0.0625 / 0.1435
Z = 0.4355

For a one-tailed test at a significance level of 0.05, the critical z-value is 1.645. Because the computed z-score of 0.4355 is less than this critical value, I fail to reject the null hypothesis.
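The critical value does not have to come from a table. In this sketch, Python's standard-library `statistics.NormalDist` supplies the one-tailed cutoff at the 0.05 level, which is the 95th percentile of the standard normal distribution. The z-score is taken as given from the calculation above.

```python
from statistics import NormalDist

alpha = 0.05   # significance level used in this test
z = 0.4355     # z-score computed from the two census sheets

# Upper-tail cutoff: the value with 95% of N(0, 1) below it
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > z_critical

print(f"critical z = {z_critical:.3f}")  # 1.645
print("Reject H0" if reject_h0 else "Fail to reject H0")
```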

Based on the data from the two census sheets I selected, I do not have enough evidence to conclude that the proportion of males significantly decreased from 1940 to 1950 due to WWII. The observed difference between 0.50 and 0.4375 is not statistically significant at the 0.05 level.

To test my hypothesis that the proportion of males in the US decreased from 1940 to 1950, I started by defining the null and alternative hypotheses. The null hypothesis (H₀) assumes the proportion of males in 1940 (p1940) is the same as in 1950 (p1950), meaning there was no change. The alternative hypothesis (H₁) assumes p1940 is greater than p1950, suggesting that the male proportion decreased, a decrease I attribute to WWII.

I then performed the z-test, which measures how far the observed difference between my sample proportions is from what I would expect if the null hypothesis were true. I first pooled the data from both years to estimate the overall male proportion, and then used this pooled proportion to find the standard error. Dividing the difference in proportions by the standard error gave me a z-score, which tells me how many standard errors the observed difference is from zero.
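The three steps described above (pool the data, compute the standard error, divide) can be wrapped in one small function. This is a hedged sketch rather than a library routine. The name `two_proportion_z` is hypothetical, and the function assumes the pooled-variance form of the test used in this project.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2 (positive when p1 > p2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # step 1: pool both samples under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # step 2: standard error
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    return (p1 - p2) / se  # step 3: difference measured in standard errors

z = two_proportion_z(25, 50, 7, 16)
print(f"z = {z:.4f}")  # 0.4354
```

At full precision the census counts give z ≈ 0.4354. The 0.4355 reported above comes from rounding Sd to 0.1435 before dividing, and the conclusion is unchanged.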

Finally, I compared the z-score to the critical value for my chosen significance level of 0.05. If the z-score had exceeded the critical value, I would have rejected the null hypothesis and concluded that the male proportion likely decreased from 1940 to 1950.

To address the question “how far is far?” more explicitly: the z-score is about 0.4355, meaning the observed difference is only about 0.4355 standard errors away from zero. The critical value of 1.645 is the threshold for being “far enough” to call the difference significant. Since 0.4355 is much smaller than 1.645, I conclude that the observed difference is not far enough from what I would expect if the null hypothesis were true. Therefore, I unfortunately do not have strong evidence that the male proportion truly decreased after WWII.
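One way to make “far enough” concrete is to convert the critical value back into a difference in proportions by multiplying 1.645 by the standard error. The sketch below holds Sd fixed at its computed value. This is a simplification, since Sd itself would change if the counts changed.

```python
import math
from statistics import NormalDist

# Pooled proportion and standard error from the two census sheets
pooled = (25 + 7) / (50 + 16)
sd = math.sqrt(pooled * (1 - pooled) * (1 / 50 + 1 / 16))

z_critical = NormalDist().inv_cdf(0.95)  # one-tailed, 0.05 level

# Smallest drop in the male proportion that would reach the critical value,
# holding sd fixed (a simplification: sd depends on the counts too)
min_difference = z_critical * sd
observed_difference = 25 / 50 - 7 / 16

print(f"needed: {min_difference:.3f}")         # 0.236
print(f"observed: {observed_difference:.4f}")  # 0.0625
```

Under that simplification, the drop would need to be roughly 0.236, about 23.6 percentage points, compared with the observed 0.0625.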

A5. Graph

graph

Note: Since z=0.4355 is outside of the rejection region, I fail to reject the null hypothesis. 

A6. I think a major challenge of doing thorough research is the lack of sufficient time. Selecting two census sheets to compute the proportions is a good start, but the result is not representative of all cases. Because I had limited time to test my theory that the proportion of men decreased by 25% after the war, I used only two census sheets. Comparing them, I found that the proportion of men decreased from 50% before the war to 43.75% after it. This decrease is in the direction my theory predicted, although it is smaller than I expected and, as the z-test showed, not statistically significant. The small change is also consistent with what the professor said: the proportions of men and women who died during WWII did not differ as much as during WWI or the Cold War, because the primary cause of death was bombing, which does not discriminate by gender (Tatum).
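The phrase “decreased by 25%” can be read either as percentage points or as a relative change. A short calculation shows the observed drop under both readings. The variable names are illustrative only.

```python
# Proportions of males from the two census sheets
p_1940 = 25 / 50
p_1950 = 7 / 16
predicted_drop = 0.25  # the hypothesis predicted a reduction of at least 25%

point_drop = p_1940 - p_1950         # absolute change, as a fraction
relative_drop = point_drop / p_1940  # change relative to the 1940 proportion

print(f"drop in percentage points: {point_drop * 100:.2f}")  # 6.25
print(f"relative drop: {relative_drop:.1%}")                  # 12.5%
```

Under either reading, the observed drop (6.25 points, or 12.5% relative) falls well short of the predicted 25%.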

However, using a single page from the US Census for each year can create problems because it might not represent the entire population. Given my interest in Chinese American history, I located both census sheets through Chinese American servicemen from Hawaii (then a US territory), which has a higher proportion of Asian Americans than the rest of the country. I realized that these sheets are not representative of the whole US population. This lack of randomness can lead to bias, where my sample overrepresents certain trends or groups. In addition, samples of 50 and 16 people are very small compared to the US population, so the results are more likely to be influenced by chance than to reflect an actual trend.

With more time and resources, selecting people at random from the entire US population would address the bias problem by giving everyone an equal chance of being included in my sample. Such a sample would likely include individuals from different regions and backgrounds, making it more representative of the whole population. A truly random sample, ideally a larger one, would also support better conclusions about the overall population, making my test of the hypothesis more reliable.
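Simple random sampling, in which every record has the same chance of selection, can be sketched with Python's `random.Random.sample`. The population here is entirely hypothetical (10,000 made-up records with alternating sex codes); a real study would draw from actual census records instead. The sample size of 66 simply mirrors the 50 + 16 records used in this project, and the helper name `simple_random_sample` is illustrative.

```python
import random

def simple_random_sample(records, k, seed=None):
    """Draw k records without replacement, each with an equal chance of selection."""
    rng = random.Random(seed)  # seeded generator so a draw can be repeated
    return rng.sample(records, k)

# Hypothetical population: (record_id, sex) pairs, NOT real census data
population = [(i, "M" if i % 2 else "F") for i in range(10_000)]

sample = simple_random_sample(population, k=66, seed=1)
male_share = sum(1 for _, sex in sample if sex == "M") / len(sample)
print(f"sample size = {len(sample)}, male share = {male_share:.2%}")
```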


Appendix – Works Cited

Singh, Sunaina. “What is a Research Hypothesis: How to Write it, Types, and Examples.” Researcher.Life, 8 Feb. 2023, https://researcher.life/blog/article/how-to-write-a-research-hypothesis-definition-types-examples/.

Tatum, Lawrence. “STA 9708 LN7.A Two-Sample t-Test.” Managerial Statistics, Baruch College, 15 Oct. 2024. Class lecture.

“U.S., World War II Draft Cards Young Men, 1940-1947.” Ancestry, https://www.ancestry.com/search/collections/2442/records/78804497. Accessed 20 Dec. 2024.

“United States Census, 1950.” FamilySearch, https://www.familysearch.org/ark:/61903/3:1:3QHK-SQH4-W1Q8?view=index&personArk=%2Fark%3A%2F61903%2F1%3A1%3A6FN3-BC83&action=view&cc=4464515. Accessed 20 Dec. 2024.


Published May 6, 2025
