Errors in inference

Learning objectives

Define type I & II errors
Define power
Describe the responsible use and reporting of p-values from hypothesis tests
Discuss how these errors are linked to a “reproducibility crisis”
Measure how these errors are amplified when performing multiple hypothesis tests (multiple comparisons)

Review: null hypothesis testing

Review: general procedures for null hypothesis testing

Define null and alternative hypotheses
Set significance level
Choose a test statistic
Create the distribution of test statistic under null
Calculate observed test statistic and associated \(p\)-value
Draw a conclusion
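
The six steps above can be sketched in R for a one-sample, right-tailed t-test. This is only an illustration under stated assumptions: the sample is simulated (hypothetical), the seed is arbitrary, and the null distribution is taken to be a \(t\) distribution with \(n - 1\) degrees of freedom rather than one built by simulation.

```r
set.seed(201)  # arbitrary seed, used only so the sketch is reproducible

# Hypothetical sample; in practice this would be the observed data
x <- rnorm(16, mean = 101, sd = 10)

# 1. Hypotheses: H0: mu = 100 vs. HA: mu > 100
mu_0 <- 100

# 2. Significance level
alpha <- 0.05

# 3. Test statistic: one-sample t statistic
n <- length(x)
t_obs <- (mean(x) - mu_0) / (sd(x) / sqrt(n))

# 4. Distribution of the test statistic under H0: t with n - 1 df (assumed)
df <- n - 1

# 5. p-value: probability under H0 of a statistic at least as large as t_obs
p_value <- pt(t_obs, df = df, lower.tail = FALSE)

# 6. Conclusion
decision <- if (p_value <= alpha) "reject H0" else "fail to reject H0"
decision
```

The same p-value should come out of `t.test(x, mu = mu_0, alternative = "greater")`, which bundles steps 3 to 5.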

Today’s goal

Our goal: understand the errors that can be made when performing null hypothesis testing and explore the concept of power

Hypotheses and significance level

Hypotheses: Null (\(H_0\)) and Alternative (\(H_A\)) hypotheses.
Test statistic: function of the data on which the decision is based
p-value: the probability, assuming \(H_0\) is true, of observing a test statistic at least as extreme (in the direction of \(H_A\)) as the one we observed
significance level (\(\alpha\)): the probability of wrongly rejecting \(H_0\), i.e., rejecting it when it is true
- if p-value \(\leq \alpha\) \(\Rightarrow\) reject \(H_0\)
- if p-value \(> \alpha\) \(\Rightarrow\) fail to reject \(H_0\)

\(H_0\), p-value, \(\alpha\), and critical value

critical value: value of the test statistic at the boundary between the rejection and non-rejection regions
rejection region: reject \(H_0\) if the observed test statistic falls in this region
non-rejection region: do not reject \(H_0\) if the observed test statistic falls in this region
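
A small sketch of how the critical-value rule and the p-value rule lead to the same decision for a right-tailed t-test. The degrees of freedom and the observed test statistics below are hypothetical values chosen only for illustration.

```r
alpha <- 0.05
df <- 15  # hypothetical degrees of freedom (e.g. n = 16 in a one-sample t-test)

# Right-tailed test: the critical value cuts off an upper-tail area of alpha
crit <- qt(alpha, df = df, lower.tail = FALSE)

# A few hypothetical observed test statistics
t_obs <- c(0.4, 1.2, 1.8, 2.5)
p_values <- pt(t_obs, df = df, lower.tail = FALSE)

# Falling in the rejection region is equivalent to p-value <= alpha
data.frame(
  t_obs = t_obs,
  p_value = round(p_values, 3),
  in_rejection_region = t_obs >= crit,
  p_below_alpha = p_values <= alpha
)
```

The last two columns agree row by row, because the critical value is defined as the point where the p-value equals \(\alpha\).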

Decisions and possible errors

Decision\Reality	\(H_0\) is true	\(H_0\) is false
Reject \(H_0\)	Type I error	Correct decision
Do not reject \(H_0\)	Correct decision	Type II error

Type I error: rejecting \(H_0\) when it is true
- Pr(Type I error) = \(\alpha\)
Type II error: not rejecting \(H_0\) when it is false
- Pr(Type II error) = \(\beta\)

We want a test that minimizes both types of errors

\(\alpha\) and type I error

We control the Type I Error rate by specifying the significance level
Example: if we set \(\alpha = 0.05\) \(\Rightarrow\) when \(H_0\) is true, we accept wrongly rejecting it in 5% of repeated samples
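
The claim that \(\alpha\) controls the type I error rate can be checked by simulation. The sketch below assumes a Normal population in which \(H_0: \mu = 100\) really is true; the population parameters, sample size, number of replications, and seed are all hypothetical choices.

```r
set.seed(201)    # arbitrary seed for reproducibility
alpha <- 0.05
n_reps <- 10000  # number of simulated studies
n <- 16          # sample size per study

# H0 is true here: every sample comes from a population with mean 100
p_values <- replicate(n_reps, {
  x <- rnorm(n, mean = 100, sd = 10)
  t.test(x, mu = 100, alternative = "greater")$p.value
})

# Proportion of studies that (wrongly) reject H0
type_I_rate <- mean(p_values <= alpha)
type_I_rate
```

The proportion of rejections should land close to \(\alpha\), up to simulation noise.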

Type II error

Type II error occurs when \(H_0\) is false, but we do not reject it
Note that if \(H_0\) is false, the data comes from the pink curve, not the orange.

Type II error

Probability of Type II error (\(\beta\)) will depend on:
- effect size (i.e. difference between \(H_0\) and reality)
- sample size
- significance level
- whether the test is one-tailed (left or right) or two-tailed
- the test itself
If the sample size is fixed, decreasing the significance level (\(\alpha\)) will increase \(\beta\)
If we increase the sample size, we can reduce \(\beta\) while maintaining our desired \(\alpha\)
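
One way to see these dependencies numerically is with `power.t.test()` from R's built-in `stats` package, which returns the power of a t-test; \(\beta\) is one minus that power. The helper `beta_for()` and the default effect size and standard deviation are hypothetical, chosen only for this sketch.

```r
# Hypothetical helper: type II error rate of a one-sample, one-sided t-test
beta_for <- function(n, alpha, delta = 8, sd = 10) {
  pw <- power.t.test(n = n, delta = delta, sd = sd, sig.level = alpha,
                     type = "one.sample", alternative = "one.sided")$power
  1 - pw
}

beta_for(n = 16, alpha = 0.05)             # baseline
beta_for(n = 16, alpha = 0.01)             # smaller alpha, same n: beta goes up
beta_for(n = 64, alpha = 0.01)             # larger n: beta comes back down
beta_for(n = 16, alpha = 0.05, delta = 4)  # smaller effect size: beta goes up
```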

What type of error is worse?

A doctor tests a patient for a disease. The null hypothesis is that the patient is healthy.

Type I error: test shows patient has a disease when in fact the patient does not have the disease
Type II error: test shows the patient does not have the disease when in fact they do

Type I and Type II errors

We can control the probability of a type I error by our choice of the significance level, \(\alpha\)
It’s difficult to control the probability of a type II error, because it depends on the unknown true value of the parameter

Power

power: the probability of correctly rejecting the null hypothesis \(H_0\), when \(H_0\) is false

\(\text{Pr(Reject } H_0 \text{ when } H_0 \text{ is false}) = 1 - \beta\)
We want the power to be large
When to use power:
- Choose between similar tests
- Power analysis, see what sample size is needed
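
As a sketch of a power analysis, `power.t.test()` can solve for the sample size when the target power is supplied and `n` is left unspecified. The effect size, standard deviation, and 90% target power below are hypothetical planning values, not numbers from this lecture.

```r
# Solve for n: smallest sample size giving at least 90% power (assumed target)
res <- power.t.test(delta = 8, sd = 10, sig.level = 0.05, power = 0.90,
                    type = "one.sample", alternative = "one.sided")

# power.t.test returns a fractional n; round up to a whole number of subjects
n_needed <- ceiling(res$n)
n_needed

# Check: power actually achieved with the rounded-up sample size
power.t.test(n = n_needed, delta = 8, sd = 10, sig.level = 0.05,
             type = "one.sample", alternative = "one.sided")$power
```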

Source: https://online.stat.psu.edu/stat415/lesson/25/25.2

Critical value

Suppose the IQ of adults follows a Normal distribution. We take a random sample of \(n = 16\) people. The sample mean is \(\bar{X} = 101\) and the sample standard deviation is \(s = 10\). We set \(\alpha = 0.05\) and want to test: \(H_0:\mu = 100 \; \text{vs.} \; H_A: \mu > 100\).

Test statistic: \(T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{\bar{X} - 100}{10/\sqrt{16}}\)
Critical value: \(\text{Pr}(T \geq t^*) = 0.05\) \(\Rightarrow\) \(t^* =\) qt(0.05, df = 15, lower.tail = FALSE) = 1.75
We reject \(H_0\) when \(T \geq\) 1.75
For what value of \(\bar{X}\) do we reject \(H_0\)?
- \(t^* = \frac{\bar{X}^* - \mu_0}{s/\sqrt{n}}\) \(\Rightarrow\) \(\bar{X}^* = (s/\sqrt{n}) t^* + \mu_0 =\) 104.38
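
The numbers on this slide can be reproduced in R. The variable names are chosen for this sketch; the values come from the example above.

```r
n <- 16
s <- 10
mu_0 <- 100
alpha <- 0.05

# Critical value of the t statistic for a right-tailed test
t_star <- qt(alpha, df = n - 1, lower.tail = FALSE)
round(t_star, 2)      # 1.75

# Equivalent cutoff on the scale of the sample mean
x_bar_star <- mu_0 + t_star * s / sqrt(n)
round(x_bar_star, 2)  # 104.38

# The observed sample mean (101) is below the cutoff, so H0 is not rejected
101 >= x_bar_star     # FALSE
```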

Source: https://online.stat.psu.edu/stat415/lesson/25/25.2

Type II error

Suppose the IQ of adults follows a Normal distribution. We take a random sample of \(n = 16\) people. The sample mean is \(\bar{X} = 101\) and the sample standard deviation is \(s = 10\). We set \(\alpha = 0.05\) and want to test: \[H_0:\mu = 100 \; \text{vs.} \; H_A: \mu > 100\]

We want to find the type II error rate if the true mean is 108.

We reject \(H_0\) when \(\bar{X} \geq\) 104.38
\[\begin{aligned} P(\text{Type II error}) &= P(\bar{X} < 104.38 \text{ when } \mu = 108) \\ &= P\left(T < \frac{104.38 - 108}{10/\sqrt{16}}\right) \\ &= P(T < -1.448) \end{aligned}\]
pt(-1.448, df = 15) = 0.08
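
The same calculation as a short R sketch, following the slide's approach of treating \(s = 10\) as fixed and using a \(t\) distribution with 15 degrees of freedom. Variable names are chosen for illustration.

```r
n <- 16
s <- 10
mu_0 <- 100
mu_true <- 108  # assumed true mean for this calculation
alpha <- 0.05

# Cutoff for the sample mean: reject H0 when the sample mean is at least this
x_bar_star <- mu_0 + qt(alpha, df = n - 1, lower.tail = FALSE) * s / sqrt(n)

# Standardize the cutoff around the true mean instead of the null mean
t_cut <- (x_bar_star - mu_true) / (s / sqrt(n))

# Type II error rate: probability of landing below the cutoff
beta <- pt(t_cut, df = n - 1)
round(beta, 2)  # 0.08
```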

Source: https://online.stat.psu.edu/stat415/lesson/25/25.2

Clicker question

Suppose the IQ of adults follows a Normal distribution. We take a random sample of \(n = 16\) people. The sample mean is \(\bar{X} = 101\) and the sample standard deviation is \(s = 10\). We set \(\alpha = 0.05\) and want to test: \[H_0:\mu = 100 \; \text{vs.} \; H_A: \mu > 100\]

What is the power of the test if the true mean is 108?

A. 0.05

B. 0.45

C. 0.64

D. 0.92

E. 0.95

Power Function

In reality, we would not know that the true mean is 108
Instead, we calculate the power for a range of values of \(\mu\) under \(H_A\)
- Power viewed as a function of \(\mu\) is called the power function
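
A sketch of a power function for the IQ example, using the same approximation as the earlier slides (\(s = 10\) treated as fixed, \(t\) distribution with 15 degrees of freedom). The helper `power_at()` and the grid of \(\mu\) values are hypothetical choices.

```r
n <- 16
s <- 10
mu_0 <- 100
alpha <- 0.05

# Cutoff for the sample mean in the right-tailed test
x_bar_star <- mu_0 + qt(alpha, df = n - 1, lower.tail = FALSE) * s / sqrt(n)

# Hypothetical helper: probability of rejecting H0 when the true mean is mu
power_at <- function(mu) {
  pt((x_bar_star - mu) / (s / sqrt(n)), df = n - 1, lower.tail = FALSE)
}

# Evaluate the power function over a grid of possible true means
mu_grid <- seq(100, 112, by = 2)
power_curve <- data.frame(mu = mu_grid, power = power_at(mu_grid))
power_curve
```

At \(\mu = 100\) the power equals \(\alpha\), and it increases as the true mean moves further into \(H_A\).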

Confidence intervals and two-tailed hypothesis tests

If we have a two-sided hypothesis test \(H_0:\mu = \mu_0 \; vs. \; H_A: \mu \ne \mu_0\) with significance level \(\alpha\)
- \(H_0\) is rejected if the \(100\times(1 - \alpha)\)% confidence interval does not include \(\mu_0\)
Example: Suppose a 95% confidence interval for the mean is 10 to 12, and we want to test \(H_0: \mu = 13\) vs. \(H_A: \mu \ne 13\).
- We would reject \(H_0\) at a 5% significance level since the hypothesized value is not captured by the interval
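
The equivalence between the interval and the two-sided test can be checked with `t.test()`, which reports both. The simulated sample and the set of hypothesized means below are hypothetical.

```r
set.seed(201)  # arbitrary seed for reproducibility

# Hypothetical sample
x <- rnorm(25, mean = 11, sd = 2.5)

# Hypothetical helper: compare the two decision rules for a given mu_0
agree_for <- function(mu_0, alpha = 0.05) {
  res <- t.test(x, mu = mu_0, conf.level = 1 - alpha)  # two-sided by default
  ci <- res$conf.int
  outside_ci <- mu_0 < ci[1] || mu_0 > ci[2]
  reject <- res$p.value <= alpha
  c(mu_0 = mu_0, outside_ci = outside_ci, reject = reject)
}

# Each row: a hypothesized mean and the decision from each rule (1 = TRUE)
t(sapply(c(9, 10, 11, 12, 13), agree_for))
```

For every hypothesized value, the test rejects exactly when the value lies outside the confidence interval.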

Today’s worksheet

Using known populations, explore:
- type I error rates
- factors that affect type II error rates and power
- issues associated with multiple hypothesis testing
Explore how estimating multiple parameters affects the coverage of confidence intervals
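
As a preview of the multiple testing issue, the sketch below computes the probability of at least one type I error across \(m\) tests. It assumes the tests are independent and that every null hypothesis is true, which is a simplification; the values of \(m\) are hypothetical.

```r
alpha <- 0.05
m <- c(1, 5, 10, 20)  # number of tests performed

# Probability of at least one false rejection among m independent tests
fwer <- 1 - (1 - alpha)^m
round(fwer, 2)  # 0.05 0.23 0.40 0.64
```

Even with each test held at \(\alpha = 0.05\), the chance of at least one false rejection grows quickly with the number of tests.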

Take home

Type I error: rejecting \(H_0\) when it is true
- Controlled by specifying the significance level

Type II error: not rejecting \(H_0\) when it is false
Affected by:
- effect size (i.e., the difference between the null hypothesis and reality)
- sample size
- significance level
- whether the test is one-tailed (left or right) or two-tailed
- the test itself

Power: probability of rejecting \(H_0\) when it is false
We want a test that minimizes type I error rate and has high power

Now it’s your turn!

Navigate to Canvas and open worksheet_08

We are here to help!

Module 8: Errors in inference
