Use results from the assumption of normality or the Central Limit Theorem to perform hypothesis testing.
Compare and contrast how estimation and hypothesis testing differ between simulation- and resampling-based approaches and approaches based on the assumption of normality or the Central Limit Theorem.
Write a computer script to perform hypothesis testing based on results from the assumption of normality or the Central Limit Theorem.
Discuss the potential limitations of these methods.
Define null and alternative hypotheses
Set significance level
Choose a test statistic
Obtain the distribution of the test statistic under the null hypothesis
Calculate observed test statistic and associated \(p\)-value
Draw a conclusion
Normal population
Any population
Population (binary)
Your friend claims that they have psychic abilities and can predict the outcome of coin flips before they happen. You decide to test them by flipping a coin 100 times, and count the number of times they guessed right.
\[\hat{p} \sim N \left(p_0, \frac{p_0 (1-p_0)}{n} \right)\]
\[\hat{p} \sim N \left(0.5, \frac{0.5\times0.5}{100} \right)\]
There is not enough evidence, at the \(10\%\) significance level, to suggest that your friend’s guesses were better than random guessing.
When relying on the CLT, we usually use a standardized version of the test statistic;
For example, instead of using \(\hat{p} \sim N\left(p_0, \frac{p_0(1-p_0)}{n}\right)\), we use the z-score: \[Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\sim N(0,1)\]
The advantage of doing this is that we can use the same process for the proportion, mean, difference in proportion, and difference in means.
Returning to the coin-flip test of your friend’s claimed psychic abilities (100 flips, counting the number of correct guesses), we now use the standardized test statistic:
\[Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\]
\[Z \sim N \left(0, 1\right)\]
There is not enough evidence, at the \(10\%\) significance level, to suggest that your friend’s guesses were better than random guessing.
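The one-proportion test above can be sketched in Python as follows. This is a minimal illustration, not the course's own script: the function name `one_proportion_z_test` is made up for this example, and the count of 55 correct guesses is a hypothetical value chosen only so the code runs, since the observed count is not given here. The sketch computes the right-tailed \(p\)-value twice, once from \(\hat{p} \sim N\left(p_0, \frac{p_0(1-p_0)}{n}\right)\) and once from the z-score, to show that the two routes agree.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Right-tailed CLT-based test of H0: p = p0 vs HA: p > p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    z = (p_hat - p0) / se               # standardized test statistic
    # Tail area under N(0, 1) using the z-score
    p_from_z = 1 - NormalDist().cdf(z)
    # Same tail area under N(p0, se^2) using p_hat directly
    p_from_p_hat = 1 - NormalDist(mu=p0, sigma=se).cdf(p_hat)
    return z, p_from_z, p_from_p_hat


# HYPOTHETICAL data: 55 correct guesses out of 100 (not from the example)
z, p_value, p_value_unstd = one_proportion_z_test(55, 100, 0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
print("Reject H0 at 10%?", p_value < 0.10)
```

With this hypothetical count the \(p\)-value is above \(0.10\), so the null hypothesis would not be rejected; a different observed count could of course lead to a different decision.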
Alice, the coffee shop owner, wants to see if her new marketing campaign increased her average daily latte sales, which were 50 before the campaign. After 25 days, the average sales increased to 55 lattes per day, with a sample standard deviation of 8.
\[T \sim t_{n-1}\]
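Assumptions and conditions:
The 25 daily sales figures are independent observations (a random sample of days).
Daily sales are approximately Normal, or the sample size is large enough for the CLT to apply.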
\[T = \frac{55 - 50}{\frac{8}{\sqrt{25}}} = 3.125\]
There is sufficient evidence, at the \(5\%\) significance level, to conclude that the average daily latte sales have increased after the new marketing campaign.
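As a rough check of the latte example, the sketch below recomputes the test statistic and approximates the right-tailed \(p\)-value under \(t_{24}\). The helpers `t_pdf` and `t_upper_tail` are illustrative names written for this example, not library functions; the tail area is approximated numerically with Simpson's rule so that only the Python standard library is needed. A statistics package would normally be used instead.

```python
import math


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t_obs, df, steps=2000):
    """Approximate P(T >= t_obs) with Simpson's rule on [0, |t_obs|]."""
    b = abs(t_obs)
    h = b / steps                       # steps must be even
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                # P(0 <= T <= |t_obs|)
    upper = 0.5 - area                  # P(T >= |t_obs|) by symmetry
    return upper if t_obs >= 0 else 1 - upper


# Values from the latte example
x_bar, mu0, s, n = 55, 50, 8, 25
df = n - 1
t_obs = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_obs, df)       # HA: mu > 50, so right tail

print(f"T = {t_obs:.3f}, df = {df}, p-value = {p_value:.4f}")
print("Reject H0 at 5%?", p_value < 0.05)
```

The \(p\)-value comes out well below \(0.05\), which agrees with the conclusion stated above.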
Example from Pennsylvania State University: https://online.stat.psu.edu/stat415/lesson/9/9.4
Via a telephone poll, Time magazine asked 800 adult Americans:
“Should the federal tax on cigarettes be raised to pay for health care reform?”
The results of the survey were:
Group | Sample size | Replied “yes” |
---|---|---|
Non-smokers | 605 | 351 |
Smokers | 195 | 41 |
Is there sufficient evidence at the \(\alpha = 0.05\) significance level to conclude that the two populations differ significantly with respect to their opinions?
The hypotheses are: \[H_0: p_1 - p_2 = 0\quad \text{vs} \quad H_A: p_1 - p_2 \neq 0\]
\(p_1\): proportion of the non-smoker population who reply “yes”
\(p_2\): proportion of the smoker population who reply “yes”
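The test statistic we will use is: \[Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\]
where \(\hat{p}\) is the overall (pooled) sample proportion, i.e., \(\hat{p} = \frac{\text{number of successes}}{\text{total sample size}}\) (considering both groups).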
\[Z \sim N \left(0, 1\right)\]
Observed proportions:
Non-smokers: \(\hat{p}_1 = \frac{351}{605} = 0.58\)
Smokers: \(\hat{p}_2 = \frac{41}{195} = 0.21\)
Overall sample proportion: \(\hat{p} = \frac{351 + 41}{605 + 195} = 0.49\)
Observed test statistic:
\[ Z = \frac{0.58 - 0.21}{\sqrt{0.49\times 0.51\left( \frac{1}{195} + \frac{1}{605}\right)}} = 8.99\]
There is sufficient evidence at the \(5\%\) significance level to conclude that the two populations differ with respect to their opinions concerning imposing a federal tax to help pay for health care reform.
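The calculation for the Time magazine poll can be reproduced with a short sketch like the one below. The function name `two_proportion_z_test` is an illustrative choice, and the code works from the unrounded proportions, so the statistic may differ from \(8.99\) in the third decimal place.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 - p2 = 0 using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)      # overall sample proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-sided p-value: area in both tails of N(0, 1)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Non-smokers: 351 "yes" out of 605; smokers: 41 "yes" out of 195
z, p_value = two_proportion_z_test(351, 605, 41, 195)
print(f"Z = {z:.2f}")
# The p-value is so small that it may print as 0.0 in floating point
print("Reject H0 at 5%?", p_value < 0.05)
```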
To discuss the difference in means, we need to consider the relationship between the two groups.
If the two groups are independent, we use Welch’s two-sample t-test.
If the two groups are dependent, we use the paired t-test.
We want to see if people tend to marry later in life in the US compared to Canada.
We want to compare the red blood cell count in healthy people and people with leukemia.
We want to compare how much money Apple users are willing to spend on a new phone compared to Samsung users.
We want to see if a new drug is effective in reducing blood pressure. We measure the blood pressure before the treatment and after the treatment.
We want to see if spouses in married couples have similar IQ levels.
We want to compare the weight of twins at birth.
A researcher wants to investigate whether there’s a difference in the average daily screen time between teenagers in urban and rural areas.
The researcher randomly samples 20 teenagers from urban areas and 25 teenagers from rural areas. They ask each teenager to report their average daily screen time (in hours) over the past week.
The results of the survey were:
Group | Sample size | Sample mean | Std Dev |
---|---|---|---|
Urban | 20 | 6.2 hours | 1.5 hours |
Rural | 25 | 5.5 hours | 1.2 hours |
The hypotheses are: \[H_0: \mu_1 - \mu_2 = 0\quad \text{vs} \quad H_A: \mu_1 - \mu_2 \neq 0\]
\(\mu_1\): average screen time for teenagers in urban areas
\(\mu_2\): average screen time for teenagers in rural areas
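The test statistic we will use is: \[T = \frac{\left(\bar{x}_1 - \bar{x}_2\right) - \Delta_0}{\sqrt{\frac{S^2_1}{n_1} + \frac{S^2_2}{n_2}}}\]
where \(\Delta_0\) is the difference in means under \(H_0\) (here, 0). Under the null hypothesis, \[T \sim t_k\] where the degrees of freedom \(k\) are \[k = \frac{ \left(\color{red}{\frac{S^2_1}{n_1}} + \color{blue}{\frac{S^2_2}{n_2}}\right)^2 }{ \color{red}{\frac{S_1^4}{n_1^2(n_1-1)}} + \color{blue}{\frac{S_2^4}{n_2^2(n_2-1)}} } \]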
If both \(x_1\) and \(x_2\) follow the Normal model, there is no restriction on the sample sizes \(n_1\) and \(n_2\).
If \(x_1\) and \(x_2\) are non-Normal or follow an unknown distribution, we need reasonably large sample sizes to validate the Normal approximation by the CLT as well as the use of the t-model.
Observed test statistic:
\[T = \frac{\left(6.2 - 5.5\right) - 0}{\sqrt{\frac{1.5^2}{20} + \frac{1.2^2}{25}}} \approx 1.6973\]
Degrees of freedom (\(k\)): \[k = 35.97\]
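With a two-sided \(p\)-value of about \(0.098\), there is sufficient evidence at the \(10\%\) significance level (though not at the \(5\%\) level) to conclude that teenagers in urban and rural areas have different average daily screen times.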
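The Welch statistic and its degrees of freedom for the screen-time example can be recomputed from the summary table with a sketch like this one. The function name `welch_t` and its argument names are illustrative; the code only evaluates the two formulas and leaves the \(p\)-value lookup under \(t_k\) to a statistics package or table.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2, delta0=0.0):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1                  # S1^2 / n1
    v2 = sd2 ** 2 / n2                  # S2^2 / n2
    t = (mean1 - mean2 - delta0) / math.sqrt(v1 + v2)
    # v1^2 / (n1 - 1) equals S1^4 / (n1^2 (n1 - 1)), and likewise for group 2
    k = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, k


# Urban: n = 20, mean 6.2, sd 1.5; rural: n = 25, mean 5.5, sd 1.2
t_obs, k = welch_t(6.2, 1.5, 20, 5.5, 1.2, 25)
print(f"T = {t_obs:.4f}, k = {k:.2f}")
```

Note that \(k\) is generally not an integer, and it always lies between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\).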
A fitness instructor wants to evaluate the effectiveness of a new 8-week training program designed to improve participants’ resting heart rate (RHR). They believe the program will lower RHR.
The instructor recruits 12 participants and measures their RHR (in beats per minute) before starting the program and again after completing the 8-week program.
The data are summarized as follows:
Statistic | Before | After | Difference |
---|---|---|---|
Mean | 73.42 | 70.08 | 0.8 |
Std Dev | 5.16 | 5.23 | 1.07 |
n | 12 | 12 | 12 |
The hypotheses are: \[H_0: \mu_1 - \mu_2 = 0\quad \text{vs} \quad H_A: \mu_1 - \mu_2 > 0\]
\(\mu_1\): average RHR before the program
\(\mu_2\): average RHR after the program
The test statistic we will use is: \[T = \frac{\bar{d} - \Delta_0}{\frac{s_d}{\sqrt{n}}}\]
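where:
\(\bar{d}\): sample mean of the paired differences (before minus after)
\(s_d\): sample standard deviation of the paired differences
\(n\): number of pairs
\(\Delta_0\): mean difference under \(H_0\) (here, 0)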
\[T \sim t_{n-1}\]
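Assumptions and conditions for validity of using the t-model:
The pairs are independent of one another (each participant contributes one before/after pair).
The differences are approximately Normal, or the number of pairs is large enough for the CLT to apply.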
Observed test statistic:
\[T = \frac{0.8 - 0}{\frac{1.07}{\sqrt{12}}} \approx 2.48\]
There is not enough evidence, at \(\alpha = 1\%\), to conclude that the 8-week training program decreases participants’ average resting heart rate.
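The key step in a paired test is reducing each pair to a single difference and then running a one-sample test on those differences. The sketch below shows this with raw measurements. The function name `paired_t_statistic` is illustrative, and the four before/after values are hypothetical numbers invented for this example; they are not the participants' data.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(before, after, delta0=0.0):
    """Paired t statistic computed from the within-pair differences."""
    diffs = [b - a for b, a in zip(before, after)]   # before minus after
    n = len(diffs)
    d_bar = mean(diffs)                 # sample mean of the differences
    s_d = stdev(diffs)                  # sample sd (n - 1 denominator)
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return t, n - 1                     # statistic and degrees of freedom


# HYPOTHETICAL resting heart rates for four participants (illustration only)
before = [72, 75, 70, 78]
after = [70, 74, 71, 75]

t_obs, df = paired_t_statistic(before, after)
print(f"T = {t_obs:.4f}, df = {df}")
```

Because the standard deviation is taken over the differences rather than over each group separately, person-to-person variation in baseline heart rate drops out of the standard error.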
Traditional theory-based approach
Simulation approach
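To make the contrast between the two approaches concrete, the sketch below computes a right-tailed \(p\)-value for the coin-flip setting both ways: from the CLT-based Normal model and from a simulated null distribution. The observed count of 55 correct guesses, the seed, and the number of replicates are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import NormalDist

random.seed(2024)                       # arbitrary seed for reproducibility

n, p0 = 100, 0.5
observed_correct = 55                   # HYPOTHETICAL observed count
p_hat_obs = observed_correct / n

# Theory-based: p_hat is approximately N(p0, p0 (1 - p0) / n) under H0
se = math.sqrt(p0 * (1 - p0) / n)
p_value_clt = 1 - NormalDist(mu=p0, sigma=se).cdf(p_hat_obs)

# Simulation-based: generate many samples of n guesses under H0 and
# record the number of correct guesses in each
reps = 10_000
null_counts = [
    sum(random.random() < p0 for _ in range(n))
    for _ in range(reps)
]
# Proportion of simulated counts at least as large as the observed one
p_value_sim = sum(c >= observed_correct for c in null_counts) / reps

print(f"CLT-based p-value:        {p_value_clt:.4f}")
print(f"Simulation-based p-value: {p_value_sim:.4f}")
```

The two \(p\)-values should be close but not identical: the simulation reflects the discrete distribution of the count, while the Normal model is a continuous approximation, and the simulated value also carries some Monte Carlo error.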
© 2024 Rodolfo Lourenzutti, Melissa Lee, Marie Auger-Méthé – Material Licensed under CC BY-SA 4.0