The Normal Distribution

STAT 200 - Chapter 5 Part II

Introduction

  • Surprisingly, many unrelated variables from different studies have a unimodal distribution that is roughly symmetric around the mean. For example:
    • Birthweight;
    • Housefly wing length;
    • Pulse rate per minute of adults;

Introduction

  • We might be interested in questions like:
    • What is the proportion of newborns with weight between 2.5kg and 5kg?
    • What is the proportion of adults with a pulse rate above 100 beats/min?
    • What is the birthweight below which 95% of newborns fall? (quantile)
    • What is the pulse rate above which 95% of adults fall? (quantile)

Normal Model

  • A specific probabilistic model, called the Normal (or Gaussian) distribution, can often model these variables quite well.
  • But why use models?
    • We can use the model, instead of the data itself, to answer questions (such as the ones in the previous slide);
    • Models can help us to describe the relation between variables;

Normal Model

  • Properties:
    • Bell-shaped and Unimodal;

    • Fully specified by two parameters, \(\mu\) and \(\sigma\):

      • \(\mu\) determines the location;

      • \(\sigma\) determines the spread;

    • Symmetric about the mean \(\mu\);
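
The properties above can be checked numerically with R's dnorm function, which evaluates the Normal density curve. This is a minimal sketch: the values of mu, sigma and d are illustrative assumptions, not taken from the slides.

```r
# Illustrative parameter values (assumptions, chosen only for this sketch).
mu <- 10
sigma <- 2
d <- 1.5  # arbitrary distance from the mean

# Symmetry: the curve has the same height d units below and d units above mu.
left  <- dnorm(mu - d, mean = mu, sd = sigma)
right <- dnorm(mu + d, mean = mu, sd = sigma)

# Unimodal: on a grid of points around mu, the curve is highest at mu itself.
grid <- seq(mu - 4 * sigma, mu + 4 * sigma, by = 0.5)
peak <- grid[which.max(dnorm(grid, mean = mu, sd = sigma))]

# Spread: doubling sigma flattens the curve, halving its height at the peak.
peak_ratio <- dnorm(mu, mean = mu, sd = 2 * sigma) / dnorm(mu, mean = mu, sd = sigma)

print(c(left = left, right = right, peak = peak, peak_ratio = peak_ratio))
```

Changing mu shifts the whole curve without changing its shape, while changing sigma stretches or compresses it around mu.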

Areas under the Normal Model

  • The area under the Normal model tells us the probability that the corresponding variable is in a specified region.

  • We need a computer to obtain areas under the Normal model, because the integral of the Normal density has no closed-form solution.

  • But there's a rule that helps us do a quick check of our calculations.
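
To see that "area under the curve" and "probability" are the same thing, the sketch below computes an area by numerical integration of the density and compares it with R's built-in pnorm function (covered later in this chapter). The parameter values and the interval endpoints are illustrative assumptions.

```r
# Illustrative values (assumptions for this sketch only).
mu <- 10
sigma <- 2
a <- 9    # lower end of the region
b <- 13   # upper end of the region

# Area under the Normal density between a and b, by numerical integration.
# The extra arguments mean and sd are passed on to dnorm.
area <- integrate(dnorm, lower = a, upper = b, mean = mu, sd = sigma)$value

# The same probability from the cumulative areas: P(X < b) - P(X < a).
prob <- pnorm(b, mean = mu, sd = sigma) - pnorm(a, mean = mu, sd = sigma)

print(c(area = area, prob = prob))
```

The two numbers should agree up to the numerical error of the integration routine.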

The 68-95-99.7% Rule

Whatever the values of \(\mu\) and \(\sigma\), the following rule holds:

Interval                           % of data within the interval
within \(1\sigma\) of \(\mu\)      about \(68\%\)
within \(2\sigma\) of \(\mu\)      about \(95\%\)
within \(3\sigma\) of \(\mu\)      about \(99.7\%\)


  • This is a useful approximation for sanity checks!
    • For exact answers, use R (or a table if you don’t have access to R).
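
The rule itself can be checked with pnorm. The sketch below computes the area within 1, 2 and 3 standard deviations of the mean, first for the Standard Normal and then for another Normal model whose mu and sigma are arbitrary illustrative values.

```r
k <- 1:3  # number of standard deviations from the mean

# Area within k standard deviations of the mean, for N(0, 1).
within_k <- pnorm(k) - pnorm(-k)
print(round(100 * within_k, 2))
# 68.27 95.45 99.73

# Same calculation for a different Normal model (illustrative values).
mu <- 50
sigma <- 7
within_k_other <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)

# The percentages do not depend on mu or sigma.
print(all.equal(within_k, within_k_other))
```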

R’s pnorm and qnorm functions

Probability:

  • To obtain the area under the curve, we use the pnorm function.

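  • For example, suppose \(X \sim N( \mu = 10, \sigma^2 = 3)\) and we want the area below 11.5, i.e., \(P(X < 11.5)\):

  • We can use the following code (note that pnorm takes the standard deviation, not the variance, so we pass sqrt(3)):
pnorm(11.5, mean = 10, sd = sqrt(3))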
[1] 0.8067619

Quantile:

  • To obtain the quantile of a Normal, we use the qnorm function.

  • For example, suppose \(X \sim N( \mu = 10, \sigma^2 = 3)\) and we want the 0.69-quantile, i.e., the value with 69% of the area below it:

  • We can use the following code
qnorm(0.69, mean = 10, sd = sqrt(3))
[1] 10.85884
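
pnorm always returns the area below a value, so areas above a value or between two values need one extra step. The sketch below reuses the \(N(\mu = 10, \sigma^2 = 3)\) model from the examples above; the second cut-off, 9, is an illustrative choice.

```r
mu <- 10
sigma <- sqrt(3)  # standard deviation, since the variance is 3

# Area above 11.5: two equivalent ways.
upper_a <- 1 - pnorm(11.5, mean = mu, sd = sigma)
upper_b <- pnorm(11.5, mean = mu, sd = sigma, lower.tail = FALSE)

# Area between 9 and 11.5: area below 11.5 minus area below 9.
between <- pnorm(11.5, mean = mu, sd = sigma) - pnorm(9, mean = mu, sd = sigma)

# qnorm undoes pnorm: the quantile of "the area below 11.5" is 11.5 again.
round_trip <- qnorm(pnorm(11.5, mean = mu, sd = sigma), mean = mu, sd = sigma)

print(c(upper_a = upper_a, upper_b = upper_b, between = between, round_trip = round_trip))
```

The lower.tail = FALSE form is usually preferable for very small upper-tail areas, because it avoids subtracting a number very close to 1 from 1.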

Standard Normal

  • The \(Z\)-score of a variable \(X\) coming from \(N(\mu, \sigma^2)\), defined as \(Z = (X - \mu)/\sigma\), follows the Standard Normal distribution, i.e., \(N(0, 1)\).
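
In practice this means that any Normal calculation can be done on the Standard Normal after standardizing. A minimal sketch, reusing the \(N(\mu = 10, \sigma^2 = 3)\) example from the pnorm and qnorm slide:

```r
mu <- 10
sigma <- sqrt(3)
x <- 11.5

# Z-score of x: how many standard deviations x lies above the mean.
z <- (x - mu) / sigma

# Area below x: standardize and use N(0, 1), or ask for N(mu, sigma^2) directly.
p_standard <- pnorm(z)
p_direct   <- pnorm(x, mean = mu, sd = sigma)

# Quantiles work the other way: take the N(0, 1) quantile and un-standardize.
q_standard <- mu + qnorm(0.69) * sigma
q_direct   <- qnorm(0.69, mean = mu, sd = sigma)

print(c(p_standard = p_standard, p_direct = p_direct,
        q_standard = q_standard, q_direct = q_direct))
```

This is also how printed Normal tables are used: they list areas for \(N(0, 1)\) only, so values must be converted to Z-scores first.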

  • There are multiple ways to check for adequacy of the Normal model. A simple (and subjective) way is to check if the relative frequency histogram looks like a Normal curve.
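
A minimal sketch of the histogram check is shown below. The data are simulated with rnorm purely for illustration (they are not the housefly or birthweight data), and the seed, sample size and parameters are arbitrary assumptions. The Normal quantile plot at the end is another common check, added here as an extra and not taken from the slides.

```r
set.seed(200)  # arbitrary seed so the sketch is reproducible

# Simulated sample standing in for real measurements (NOT real data).
sample_values <- rnorm(500, mean = 50, sd = 7)

# Estimate the two parameters of the Normal model from the sample.
m <- mean(sample_values)
s <- sd(sample_values)

# freq = FALSE puts the histogram on the density scale (total bar area 1),
# so the Normal curve can be drawn on top of it.
hist(sample_values, freq = FALSE,
     main = "Histogram with fitted Normal curve", xlab = "Simulated values")
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)

# Normal quantile plot: points close to the line suggest the model is adequate.
qqnorm(sample_values)
qqline(sample_values)
```

With real data, replace sample_values with the observed variable; strong skewness or heavy tails show up as bars that sit clearly above or below the curve.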

Example 1: Housefly Wing Lengths

  • Sokal and Hunter (1955) studied the wing lengths of houseflies.

Example 2: Birthweight

In this case, we have a heavier left tail, which might compromise the Normal approximation.

Exercise 1

Scores on a standard IQ test for the 20 to 34 age group follow approximately the Normal model with mean \(\mu=110\) and standard deviation \(\sigma=25\).

  1. What percentage of people aged 20 to 34 have IQ scores below 160?

  2. What percentage have scores between 90 and 120?

  3. What IQ score is exceeded by only 0.15% of the group?
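
One possible way to set up the three calculations in R is sketched below, using only pnorm and qnorm with the \(\mu\) and \(\sigma\) given in the exercise; the variable names are illustrative. The 68-95-99.7% rule gives a sanity check for parts 1 and 3: 160 is \(2\sigma\) above the mean, so roughly 97.5% should fall below it, and a score exceeded by only 0.15% should be about \(3\sigma\) above the mean, i.e., near 185.

```r
mu <- 110
sigma <- 25

# 1. Proportion with IQ below 160.
p_below_160 <- pnorm(160, mean = mu, sd = sigma)

# 2. Proportion with IQ between 90 and 120.
p_90_120 <- pnorm(120, mean = mu, sd = sigma) - pnorm(90, mean = mu, sd = sigma)

# 3. Score with only 0.15% of the group above it (upper-tail quantile).
cutoff <- qnorm(0.0015, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(p_below_160 = p_below_160, p_90_120 = p_90_120, cutoff = cutoff))
```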

Exercise 2

A machine used to regulate the amount of dye dispensed for mixing shades of paint can be set so that it discharges an average of \(\mu\) milliliters of dye per can of paint. The amount of dye discharged is known to follow the Normal model with a standard deviation of 0.4 milliliters. If more than 6 milliliters of dye are discharged when making a certain shade of blue paint, the shade is unacceptable. Determine the setting for the mean \(\mu\) such that only 2% of the cans of paint will be unacceptable.
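
One possible approach is sketched below. We need the area above 6 to be 0.02, so 6 must be the 0.98-quantile of the model: \((6 - \mu)/0.4 = z_{0.98}\), which gives \(\mu = 6 - 0.4\, z_{0.98}\). The variable names are illustrative.

```r
sigma <- 0.4   # standard deviation of the amount discharged (ml)
limit <- 6     # more than this makes the shade unacceptable (ml)
p_bad <- 0.02  # target proportion of unacceptable cans

# Z-score with 98% of the Standard Normal area below it.
z <- qnorm(1 - p_bad)

# Solve (limit - mu) / sigma = z for mu.
mu <- limit - z * sigma

# Check: with this setting, the area above the limit should be 0.02.
check <- pnorm(limit, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(z = z, mu = mu, check = check))
```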

References

Image Attributions

  • Fly Image Attribution: See page for author, CC BY 4.0, via Wikimedia Commons.

Data Attributions

Other references

Sokal, Robert R., and Preston E. Hunter. 1955. “A Morphometric Analysis of DDT-Resistant and Non-Resistant House Fly Strains.” Annals of the Entomological Society of America 48 (6): 499–507. https://doi.org/10.1093/aesa/48.6.499.