Module 4: confidence intervals via bootstrapping

Learning objectives

  • Define what a confidence interval is and why we want to generate one

  • Explain how the bootstrap sampling distribution can be used to create confidence intervals

  • Write a computer script to calculate confidence intervals for a population parameter using bootstrapping

  • Effectively visualize point estimates and confidence intervals

  • Interpret and explain results from confidence intervals

  • Discuss the potential limitations of these methods

  • Define what percentiles are and write a computer script to calculate them

Review: bootstrap approximation of the sampling distribution

Today’s goal

  • Our goal: define confidence intervals, and compute them using the bootstrap distribution

Point Estimation vs. Confidence Intervals

image source: Modern Dive by Kim & McConville

Using the bootstrap to calculate a confidence interval (plausible range)

95% confidence interval via bootstrapping

confidence interval: range from lower to upper

  • lower: \(2.5^{th}\) percentile
    • value such that 2.5% of the bootstrap point estimates fall below
  • upper: \(97.5^{th}\) percentile
    • value such that 97.5% of the bootstrap point estimates fall below
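
The percentile recipe above can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the course's own script: the data are simulated, the mean is assumed to be the parameter of interest, and names such as `bootstrap_means` and `boot_estimates` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # fixed seed so the sketch is reproducible

# Hypothetical sample of 40 observations (simulated here; replace with real data).
sample = rng.exponential(scale=10.0, size=40)


def bootstrap_means(data, reps, rng):
    """Return `reps` bootstrap point estimates: means of resamples drawn with replacement."""
    n = len(data)
    # Each row is one bootstrap resample of the same size as the original sample.
    resamples = rng.choice(data, size=(reps, n), replace=True)
    return resamples.mean(axis=1)


boot_estimates = bootstrap_means(sample, reps=5000, rng=rng)

# lower: 2.5th percentile, upper: 97.5th percentile of the bootstrap distribution
lower, upper = np.percentile(boot_estimates, [2.5, 97.5])

print(f"point estimate: {sample.mean():.2f}")
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

By construction, roughly 2.5% of the bootstrap point estimates fall below `lower` and roughly 2.5% fall above `upper`.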

Confidence interval

attribution: https://www.zoology.ubc.ca/~whitlock/Kingfisher/CIMean.htm

Interpretation of 95% confidence interval (CI)

  • Technical: if we repeatedly drew samples of size \(n\) and built a 95% CI from each, about 95% of those intervals would contain the true population parameter value
  • Simpler: we are 95% confident that the true population parameter value lies in our interval

The uncertainty is about whether the sample is one of the successful ones that captured the true population parameter
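
One way to see the technical interpretation is a small simulation, sketched here under assumptions: we invent a normal population whose mean we know (`TRUE_MEAN` is made up for the example), draw many samples, build a percentile bootstrap CI from each, and count how often the interval captures the true mean. In practice the true value is unknown, so this check is only possible in a simulation.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

TRUE_MEAN = 50.0   # known only because we simulate the population ourselves
n = 50             # size of each sample
n_samples = 200    # how many times we repeat the sampling process


def percentile_ci(data, rng, reps=1000, level=95):
    """Percentile bootstrap CI for the mean of `data`."""
    resamples = rng.choice(data, size=(reps, len(data)), replace=True)
    boot = resamples.mean(axis=1)
    tail = (100 - level) / 2
    lo, hi = np.percentile(boot, [tail, 100 - tail])
    return float(lo), float(hi)


captured = 0
for _ in range(n_samples):
    sample = rng.normal(loc=TRUE_MEAN, scale=10.0, size=n)
    lo, hi = percentile_ci(sample, rng)
    # Each interval either contains the true mean or it does not.
    captured += int(lo <= TRUE_MEAN <= hi)

coverage = captured / n_samples
print(f"{captured} of {n_samples} intervals captured the true mean ({coverage:.1%})")
```

The printed proportion should land near 95%, though bootstrap percentile intervals are approximate and the result varies with the seed, the sample size, and the population shape.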

Wrong interpretation of CIs

  • CI represents the probability that the true parameter value is contained within the interval (this is wrong!!!)
  • Population parameter has only 1 value.
  • If you repeat the sampling process, the population parameter still has only 1 value.
  • Thus it is incorrect to ask about the probability that the population mean lies within a certain range
    • Population parameter either is in the interval or it isn’t.

Notes about confidence intervals - P1

  • CI depends on the sample you collect.

  • If you collect a different sample, your CI will almost certainly be different.
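
A quick sketch of this point, using two simulated samples from the same hypothetical population (the population values and the helper `percentile_ci` are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(seed=7)


def percentile_ci(data, rng, reps=2000, level=95):
    """Percentile bootstrap CI for the mean of `data`."""
    resamples = rng.choice(data, size=(reps, len(data)), replace=True)
    boot = resamples.mean(axis=1)
    tail = (100 - level) / 2
    lo, hi = np.percentile(boot, [tail, 100 - tail])
    return float(lo), float(hi)


# Two independent samples of size 30 from the same simulated population.
sample_a = rng.normal(loc=100.0, scale=15.0, size=30)
sample_b = rng.normal(loc=100.0, scale=15.0, size=30)

ci_a = percentile_ci(sample_a, rng)
ci_b = percentile_ci(sample_b, rng)

print(f"sample A: 95% CI = [{ci_a[0]:.1f}, {ci_a[1]:.1f}]")
print(f"sample B: 95% CI = [{ci_b[0]:.1f}, {ci_b[1]:.1f}]")
```

The two intervals have different endpoints even though the population has not changed; only the sample did.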

Notes about confidence intervals - P2

Notes about confidence intervals - P3

Properties of bootstrap distribution

When I present results, what should I report?

Can report both our sample point estimate and the confidence interval

  • Point estimate: best estimate of the population parameter value

  • Confidence interval: plausible range where we expect our true population quantity to fall

Today’s worksheet

  • Calculate confidence intervals using the bootstrap distribution
  • Interpret confidence intervals
  • Explore the properties of confidence intervals and the effect of sample size

Take home

  • Confidence interval (CI) gives a plausible range where we expect our true population quantity to fall

  • Can calculate the \(C\)% CI by taking the \(\bigl(\frac{100-C}{2} \bigr)^{th}\) and \(\bigl( \frac{100+C}{2} \bigr)^{th}\) percentiles from the bootstrap distribution

    • e.g., 90% CI, take the \(5^{th}\) and \(95^{th}\) percentiles
  • Interpretation: we are \(C\)% confident that the true population parameter value lies in our interval

  • Confidence vs precision trade-off: higher level of confidence → wider interval
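
The general \(C\)% rule and the trade-off can be sketched together. The example below is illustrative only: the sample is simulated, the mean is assumed to be the parameter of interest, and `ci_from_bootstrap` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical sample (simulated); one bootstrap distribution is reused for every level.
sample = rng.normal(loc=20.0, scale=5.0, size=60)
reps = 5000
boot = rng.choice(sample, size=(reps, len(sample)), replace=True).mean(axis=1)


def ci_from_bootstrap(boot_estimates, level):
    """C% CI: the ((100 - C) / 2)th and ((100 + C) / 2)th percentiles."""
    lower_pct = (100 - level) / 2
    upper_pct = (100 + level) / 2
    lower, upper = np.percentile(boot_estimates, [lower_pct, upper_pct])
    return float(lower), float(upper)


widths = {}
for level in (90, 95, 99):
    lower, upper = ci_from_bootstrap(boot, level)
    widths[level] = upper - lower
    print(f"{level}% CI: [{lower:.2f}, {upper:.2f}]  width = {widths[level]:.2f}")
```

Because the higher-level intervals use more extreme percentiles of the same bootstrap distribution, each one contains the previous one, so the width grows with the confidence level.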

Now it’s your turn!

  • navigate to Canvas, open worksheet_04

We are here to help!