Module 8: Errors in inference

Learning objectives

  • Define type I and type II errors

  • Define power

  • Describe the responsible use and reporting of p-values from hypothesis tests

  • Discuss how these errors are linked to a “reproducibility crisis”

  • Measure how these errors are amplified when many hypothesis tests are performed (multiple comparisons)

Review: null hypothesis testing

Review: general procedures for null hypothesis testing

  1. Define null and alternative hypotheses
  2. Set significance level
  3. Choose a test statistic
  4. Create the distribution of test statistic under null
  5. Calculate observed test statistic and associated \(p\)-value
  6. Draw a conclusion

Today’s goal

  • Our goal: understand the errors that can be made when performing null hypothesis testing and explore the concept of power

Hypotheses and significance level

  • Hypotheses: Null (\(H_0\)) and Alternative (\(H_A\)) hypotheses.

  • Test statistic: function of the data on which the decision is based

  • p-value: assuming \(H_0\) is true, it is the probability of observing a test statistic value as or more extreme towards \(H_A\) than what we observed

  • significance level (\(\alpha\)): the probability of wrongly rejecting \(H_0\), i.e., rejecting \(H_0\) when it is true.

    • if p-value \(\leq \alpha\) \(\Rightarrow\) reject \(H_0\)
    • if p-value \(> \alpha\) \(\Rightarrow\) fail to reject \(H_0\)

\(H_0\), p-value, \(\alpha\), and critical value

  • critical value: value of the test statistic at the boundary between the rejection and non-rejection regions
  • rejection region: reject \(H_0\) if the observed test statistic falls in this region
  • non-rejection region: do not reject \(H_0\) if the observed test statistic falls in this region

Decisions and possible errors

Decision \ Reality        \(H_0\) is true      \(H_0\) is false
Reject \(H_0\)            Type I error         Correct decision
Do not reject \(H_0\)     Correct decision     Type II error

  • Type I error: rejecting \(H_0\) when it is true
    • Pr(Type I error) = \(\alpha\)
  • Type II error: not rejecting \(H_0\) when it is false
    • Pr(Type II error) = \(\beta\)

We want a test that minimizes both types of errors

\(\alpha\) and type I error

  • We control the Type I Error rate by specifying the significance level
  • Example: if we set \(\alpha = 0.05\) \(\Rightarrow\) we accept rejecting \(H_0\) when it is true 5% of the time
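
The claim above can be checked with a small simulation. The sketch below repeatedly samples from a population where \(H_0\) is true and counts how often a one-sample t-test rejects. The population (Normal with mean 100 and standard deviation 10), the sample size, the seed, and the variable names are all illustrative assumptions, not part of the original slides.

```r
# Simulate the type I error rate of a one-sided, one-sample t-test.
# Assumed setup (illustrative only): H0: mu = 100 is TRUE in the population.
set.seed(2024)
alpha  <- 0.05
n      <- 16
n_sims <- 10000

p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = 100, sd = 10)   # data generated under H0
  t.test(x, mu = 100, alternative = "greater")$p.value
})

# Proportion of simulated samples in which we (wrongly) reject H0
type_I_rate <- mean(p_values <= alpha)
print(type_I_rate)
```

The printed proportion should land close to \(\alpha\), up to simulation noise.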

Type II error

  • Type II error occurs when \(H_0\) is false, but we do not reject it

  • Note that if \(H_0\) is false, the data come from the distribution under the true parameter value (the pink curve in the figure), not from the null distribution (the orange curve).

Type II error

  • Probability of Type II error (\(\beta\)) will depend on:
    • effect size (i.e. difference between \(H_0\) and reality)
    • sample size
    • significance level
    • whether the test is one-tailed (left or right) or two-tailed
    • the test itself
  • If the sample size is fixed, decreasing the significance level (\(\alpha\)) will increase \(\beta\)
  • If we increase the sample size, we can reduce \(\beta\) while maintaining our desired \(\alpha\)
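
The last two points can be illustrated numerically. The sketch below uses `power.t.test()` from base R's stats package to compute \(\beta\) for a one-sided, one-sample t-test. The effect size (8), standard deviation (10), and the helper name `beta_for` are assumptions chosen for illustration.

```r
# beta = 1 - power for a one-sided, one-sample t-test.
# delta (true mean minus null mean) and sd are illustrative assumptions.
beta_for <- function(n, alpha, delta = 8, sd = 10) {
  pw <- power.t.test(n = n, delta = delta, sd = sd, sig.level = alpha,
                     type = "one.sample", alternative = "one.sided")$power
  1 - pw
}

# Fixed sample size: a smaller alpha gives a larger beta
beta_fixed_n <- c(alpha_0.05 = beta_for(16, 0.05),
                  alpha_0.01 = beta_for(16, 0.01))
print(beta_fixed_n)

# Fixed alpha: a larger sample size gives a smaller beta
beta_fixed_alpha <- c(n_16 = beta_for(16, 0.05),
                      n_32 = beta_for(32, 0.05))
print(beta_fixed_alpha)
```

Note that `power.t.test()` uses the noncentral t distribution, so its numbers can differ slightly from hand calculations that treat the sample standard deviation as fixed.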

What type of error is worse?

A doctor tests a patient for a disease. The null hypothesis is that the patient is healthy.

  • Type I error: test shows the patient has the disease when in fact they do not

  • Type II error: test shows the patient does not have the disease when in fact they do

Type I and Type II errors

  • We can control the probability of type I error by our choice of the significance level, \(\alpha\)
  • It’s difficult to control the probability of making a type II error, because it depends on the unknown true value of the parameter

Power

power: the probability of correctly rejecting the null hypothesis \(H_0\), when \(H_0\) is false

  • \(\text{Pr}(\text{reject } H_0 \text{ when } H_0 \text{ is false}) = 1 - \beta\)

  • We want the power to be large

  • When to use power:

    • Choose between similar tests
    • Power analysis: determine the sample size needed to reach a desired power

Source: https://online.stat.psu.edu/stat415/lesson/25/25.2

Critical value

Suppose the IQ of adults follows a Normal distribution. We take a random sample of \(n = 16\) people. The sample mean is \(\bar{X} = 101\) and the sample standard deviation is \(s = 10\). We set \(\alpha = 0.05\) and want to test: \(H_0:\mu = 100 \text{ vs. } H_A: \mu > 100\).

  • Test statistic: \(T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{\bar{X} - 100}{10/\sqrt{16}}\)

  • Critical value: \(\text{Pr}(T \geq t^*) = 0.05\) \(\Rightarrow\) \(t^* =\) qt(0.05, df = 15, lower.tail = FALSE) = 1.75

  • We reject \(H_0\) when \(T \geq\) 1.75

  • For what value of \(\bar{X}\) do we reject \(H_0\)?

    • \(t^* = \frac{\bar{X}^* - \mu_0}{s/\sqrt{n}}\) \(\Rightarrow\) \(\bar{X}^* = (s/\sqrt{n}) t^* + \mu_0 =\) 104.38

Source: https://online.stat.psu.edu/stat415/lesson/25/25.2
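
The critical value calculation above can be reproduced in R. This is a minimal sketch using the numbers from the example; the variable names are chosen here for illustration.

```r
# IQ example: n = 16, s = 10, H0: mu = 100 vs. HA: mu > 100, alpha = 0.05
n     <- 16
s     <- 10
mu_0  <- 100
alpha <- 0.05
x_bar <- 101

se <- s / sqrt(n)                                    # standard error of the mean

# Critical value on the t scale (right-tailed test)
t_star <- qt(alpha, df = n - 1, lower.tail = FALSE)

# Same boundary expressed on the scale of the sample mean
x_bar_star <- mu_0 + t_star * se

# Observed test statistic and decision
t_obs  <- (x_bar - mu_0) / se
reject <- t_obs >= t_star

print(round(c(t_star = t_star, x_bar_star = x_bar_star, t_obs = t_obs), 2))
print(reject)
```

With the observed mean of 101, the test statistic is 0.4, which is below the critical value, so \(H_0\) is not rejected in this example.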

Type II error

Suppose the IQ of adults follows a Normal distribution. We take a random sample of \(n = 16\) people. The sample mean is \(\bar{X} = 101\) and the sample standard deviation is \(s = 10\). We set \(\alpha = 0.05\) and want to test: \[H_0:\mu = 100 \text{ vs. } H_A: \mu > 100\]

We want to find the type II error rate if the true mean is 108.

  • We reject \(H_0\) when \(\bar{X} \geq\) 104.38

  • \[\begin{aligned} \text{Pr}(\text{Type II error}) &= \text{Pr}(\bar{X} < 104.38 \text{ when } \mu = 108) \\ &= \text{Pr}\left(T < \frac{104.38 - 108}{10/\sqrt{16}}\right) \\ &= \text{Pr}(T < -1.448) \end{aligned}\]

  • \(\beta =\) pt(-1.448, df = 15) = 0.08

Source: https://online.stat.psu.edu/stat415/lesson/25/25.2
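
The type II error calculation above can be sketched in R as follows. As in the slide, this treats the sample standard deviation as fixed at \(s = 10\), which is an approximation; the variable names are illustrative.

```r
# Type II error rate for the IQ example when the true mean is 108
n       <- 16
s       <- 10
mu_0    <- 100
mu_true <- 108
alpha   <- 0.05

se <- s / sqrt(n)

# Rejection boundary on the scale of the sample mean (about 104.38)
x_bar_star <- mu_0 + qt(alpha, df = n - 1, lower.tail = FALSE) * se

# Standardize the boundary around the TRUE mean, then take the lower tail:
# beta = Pr(fail to reject H0 when mu = mu_true)
t_boundary <- (x_bar_star - mu_true) / se
beta <- pt(t_boundary, df = n - 1)

print(round(c(t_boundary = t_boundary, beta = beta), 3))
```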

Clicker question

Suppose the IQ of adults follows a Normal distribution. We take a random sample of \(n = 16\) people. The sample mean is \(\bar{X} = 101\) and the sample standard deviation is \(s = 10\). We set \(\alpha = 0.05\) and want to test: \[H_0:\mu = 100 \text{ vs. } H_A: \mu > 100\]

What is the power of the test if the true mean is 108?

A. 0.05

B. 0.45

C. 0.64

D. 0.92

E. 0.95

Power Function

  • In practice, we do not know the true mean (for example, that it is 108).

  • Instead, we calculate the power for a range of values of \(\mu\) in \(H_A\).

    • Power viewed as a function of \(\mu\) is called the power function.
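
A power function for the IQ example might be sketched as below. It follows the same approximation as the worked example (sample standard deviation treated as fixed at 10); the helper name `power_at` and the grid of \(\mu\) values are assumptions made for illustration.

```r
# Approximate power of the right-tailed t-test as a function of the true mean
power_at <- function(mu, n = 16, s = 10, mu_0 = 100, alpha = 0.05) {
  se <- s / sqrt(n)
  # Rejection boundary on the scale of the sample mean
  x_bar_star <- mu_0 + qt(alpha, df = n - 1, lower.tail = FALSE) * se
  # Pr(sample mean >= boundary) when the true mean is mu
  pt((x_bar_star - mu) / se, df = n - 1, lower.tail = FALSE)
}

mu_grid     <- seq(100, 112, by = 2)
power_curve <- data.frame(mu = mu_grid, power = power_at(mu_grid))
print(power_curve)

# Optional plot of the power function
# plot(power_curve$mu, power_curve$power, type = "b",
#      xlab = "true mean", ylab = "power")
```

At \(\mu = \mu_0\) the function returns \(\alpha\), and it increases as the true mean moves further into \(H_A\).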

Confidence intervals and two-tailed hypothesis tests

  • If we have a two-sided hypothesis test \(H_0:\mu = \mu_0 \text{ vs. } H_A: \mu \ne \mu_0\) with significance level \(\alpha\)
    • \(H_0\) is rejected if the \(100\times(1 - \alpha)\)% confidence interval does not include \(\mu_0\)
  • Example: Suppose a 95% confidence interval for the mean is 10 to 12, and we want to test \(H_0: \mu = 13\) vs. \(H_A: \mu \ne 13\).
    • We would reject \(H_0\) at a 5% significance level since the hypothesized value is not captured by the interval
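
The correspondence between a confidence interval and a two-sided test can be checked with `t.test()`, which reports both. The data below are simulated purely for illustration (the sample size, mean, standard deviation, and seed are assumptions), so the interval will not match the 10-to-12 example.

```r
# Two-sided t-test and matching 95% confidence interval on simulated data
set.seed(1)
mu_0  <- 13
alpha <- 0.05
x <- rnorm(25, mean = 11, sd = 2.5)   # hypothetical sample

res <- t.test(x, mu = mu_0, alternative = "two.sided",
              conf.level = 1 - alpha)
ci  <- res$conf.int

# Decision via the confidence interval: is mu_0 outside the interval?
outside_ci <- mu_0 < ci[1] || mu_0 > ci[2]

# Decision via the p-value
rejects <- res$p.value <= alpha

print(ci)
print(c(outside_ci = outside_ci, rejects = rejects))
```

The two decisions agree because the t-based interval and the two-sided t-test are built from the same statistic.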

Today’s worksheet

  • Using known populations, explore:
    • type I error rates
    • factors that affect type II error rates and power
    • issues associated with multiple hypothesis testing
  • Explore how estimating multiple parameters affects the coverage of confidence intervals
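
As a preview of the multiple testing issue, the sketch below computes the probability of at least one type I error across \(m\) tests, \(1 - (1 - \alpha)^m\). This formula assumes the tests are independent and that every null hypothesis is true; that assumption and the chosen values of \(m\) are illustrative and not taken from the worksheet.

```r
# Probability of at least one false rejection among m independent tests,
# each run at level alpha, when all null hypotheses are true (assumption)
alpha <- 0.05
m     <- c(1, 5, 10, 20)

fwer <- 1 - (1 - alpha)^m
print(data.frame(m = m, prob_at_least_one_type_I = round(fwer, 3)))
```

Even at \(\alpha = 0.05\) per test, the chance of at least one false rejection grows quickly with the number of tests.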

Take home

  • Type I error: rejecting \(H_0\) when it is true

    • Controlled by specifying the significance level
  • Type II error: not rejecting \(H_0\) when it is false

    • Affected by:

      - effect size (i.e., the difference between the null hypothesis and reality)
      - sample size
      - significance level
      - whether the test is one-tailed (left or right) or two-tailed
      - the test itself
  • Power: probability of rejecting \(H_0\) when it is false

  • We want a test that minimizes type I error rate and has high power

Now it’s your turn!

  • Navigate to Canvas and open worksheet_08

We are here to help!