STAT 306 - Lecture 06
There’s no guarantee that the model assumptions are reasonable for the data at hand.
Using \(Y\) to check the assumptions is hard because \(Y\) contains both the signal (the linear part) and the noise (the errors).
But if the model is appropriate (i.e., the assumptions are reasonable), then the errors \(\varepsilon_i\) should behave like i.i.d. observations from \(N(0,\sigma^2)\).
Unfortunately, the errors are not observable, so we use the residuals \(e_i\) as a proxy for the errors \(\varepsilon_i\).
The least-squares residuals always satisfy two algebraic identities, which follow from the normal equations:
\[ \sum_{i=1}^n e_i = 0 \;, \]
\[ \sum_{i=1}^n X_i e_i = 0 \;. \]
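These identities hold exactly (up to floating-point error) for any least-squares fit with an intercept. A quick numerical check on simulated data (the data and variable names are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)  # simulated responses

# Least-squares fit via the design matrix [1, x]
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat  # residuals

print(np.isclose(e.sum(), 0.0))        # sum of e_i is 0
print(np.isclose((x * e).sum(), 0.0))  # sum of x_i * e_i is 0
```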
Residuals are often used for model diagnostics, i.e., to assess how well the model fits the data.
Several model diagnostic methods use plots to graphically determine if the observed residuals behave in ways that differ from how they should behave (in theory) under the model assumptions.
Some common plots include:






The normality assumption for the errors can be checked using a normal quantile plot of the residuals.
A normal quantile plot (sometimes called a normal probability plot) plots the ordered residuals \(e_{(1)}\leq\dotsc\leq e_{(n)}\) against \(E[Z_{(1)}]\leq\dotsc\leq E[Z_{(n)}]\), where \(Z_{(1)}\leq\dotsc\leq Z_{(n)}\) are an ordered sample of size \(n\) from \(N(0,1)\).
If the distribution of the residuals is close to normal, then the residual quantiles should resemble the expected quantiles, and so the points in the plot should lie closely along a straight line that has an intercept of \(0\) and a slope of \(\sigma\).
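A sketch of how the plotting positions can be computed, using Blom's approximation \(E[Z_{(i)}]\approx\Phi^{-1}\!\big((i-0.375)/(n+0.25)\big)\) for the expected normal order statistics (the simulated data and the 0.95 threshold below are illustrative assumptions, not part of the lecture):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 1.0 + 2.0 * x + rng.normal(0, 3.0, size=100)  # normal errors

X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = np.sort(y - X @ beta_hat)  # ordered residuals e_(1) <= ... <= e_(n)

n = len(e)
# Blom's approximation to E[Z_(i)] for standard normal order statistics
p = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
z = np.array([NormalDist().inv_cdf(pi) for pi in p])

# If the residuals are close to normal, the points (E[Z_(i)], e_(i)) lie
# near a line through 0 with slope sigma, so their correlation is near 1.
print(round(np.corrcoef(z, e)[0, 1], 3))
```

In practice one simply plots `e` against `z` (e.g., R's `qqnorm`) and judges straightness by eye; the correlation printed here is just a numerical summary of that straightness.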



When the covariate represents time (most commonly discrete time points), serial correlation (or, more generally, autocorrelation) refers to correlation in the responses across time points.
If there is no serial correlation, then plotting sequential residuals \(e_{i+1}\) against \(e_i\) should show no obvious patterns.
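A numerical version of the same check is the lag-1 correlation between successive residuals; under independent errors it should be close to zero. A sketch on simulated data (the setup and the tolerance are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(80, dtype=float)                  # discrete time points
y = 5.0 + 0.3 * t + rng.normal(0, 1, size=80)   # independent errors

X = np.column_stack([np.ones_like(t), t])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta_hat

# Correlation between e_{i+1} and e_i; with no serial correlation this
# should be near 0 (apart from sampling noise of order 1/sqrt(n)).
lag1 = np.corrcoef(e[1:], e[:-1])[0, 1]
print(round(lag1, 3))
```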


Show that \[\sum_{i=1}^n (Y_i-\bar{Y})^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n (Y_i-\hat{Y}_i)^2\]
What does each of the three terms represent?
We have that: \[ \begin{align} \sum_{i=1}^n (Y_i-\bar{Y})^2 &= \sum_{i=1}^n (Y_i-\hat{Y}_i + \hat{Y}_i - \bar{Y})^2\\ &= \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n (Y_i-\hat{Y}_i)^2 + 2\sum_{i=1}^n (Y_i-\hat{Y}_i)(\hat{Y}_i - \bar{Y})\\ &= \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n (Y_i-\hat{Y}_i)^2 \end{align} \]
The cross term vanishes because \(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i\) and the residual identities above give \[ \begin{align} \sum_{i=1}^n (Y_i-\hat{Y}_i)(\hat{Y}_i - \bar{Y}) &= \sum_{i=1}^n e_i(\hat{Y}_i - \bar{Y})\\ &= \hat{\beta}_0\sum_{i=1}^n e_i + \hat{\beta}_1\sum_{i=1}^n e_i x_i - \bar{Y}\sum_{i=1}^n e_i\\ &= 0 + 0 - 0\\ &= 0 \end{align} \]
Note that \(\sum_{i=1}^n (Y_i-\bar{Y})^2\) measures the total variability in the response \(Y\), and is called total sum of squares (TSS).
The term \(\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2\) measures the variability in the fitted values \(\hat{Y}_i\), and is called model sum of squares (MSS).
The term \(\sum_{i=1}^n (Y_i-\hat{Y}_i)^2\) measures the variability in the residuals \(e_i\), and is called residual sum of squares (RSS).
From \(\sum_{i=1}^n (Y_i-\bar{Y})^2 = \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^n (Y_i-\hat{Y}_i)^2\), we can write \[ \begin{align} TSS &= MSS + RSS \\ 1 &= \frac{MSS}{TSS} + \frac{RSS}{TSS} \end{align} \]
The term \(R^2 = \frac{MSS}{TSS} = 1 - \frac{RSS}{TSS}\) is called the coefficient of determination.
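The decomposition and the two equivalent forms of \(R^2\) can be verified numerically on simulated data (an illustrative sketch; the data are not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=60)
y = 1.0 + 0.8 * x + rng.normal(0, 2.0, size=60)

X = np.column_stack([np.ones_like(x), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

tss = ((y - y.mean()) ** 2).sum()      # total sum of squares
mss = ((y_hat - y.mean()) ** 2).sum()  # model sum of squares
rss = ((y - y_hat) ** 2).sum()         # residual sum of squares

print(np.isclose(tss, mss + rss))                    # TSS = MSS + RSS
print(np.isclose(mss / tss, 1 - rss / tss))          # two forms of R^2 agree
```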
Solution:
We want to prove that \(R^2 = r^2\). We start with the definition \(R^2 = \frac{MSS}{TSS}\). Let \(S_{xx} = \sum (x_i - \bar{x})^2\), \(S_{yy} = \sum (y_i - \bar{y})^2\), and \(S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})\). Then,
\[ \begin{align*} R^2 &= \frac{MSS}{TSS} \\ &= \frac{\sum(\hat{y}_i - \bar{y})^2}{S_{yy}} \\ &= \frac{\sum(\bar{y}+\hat{\beta}_1(x_i - \bar{x})-\bar{y})^2}{S_{yy}} \\ &= \frac{\sum(\hat{\beta}_1(x_i - \bar{x}))^2}{S_{yy}} \\ &= \frac{\hat{\beta}_1^2 \sum(x_i - \bar{x})^2}{S_{yy}} \\ &= \frac{\hat{\beta}_1^2 S_{xx}}{S_{yy}} \\ &= \frac{(S_{xy}/S_{xx})^2 S_{xx}}{S_{yy}} \\ &= \frac{S_{xy}^2}{S_{xx}S_{yy}} \\ &= \left(\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\right)^2 \\ &= r^2 \end{align*} \]
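The identity \(R^2 = r^2\) can also be checked numerically for simple linear regression (an illustrative sketch on simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=60)
y = 2.0 - 0.5 * x + rng.normal(0, 1.5, size=60)

X = np.column_stack([np.ones_like(x), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

# R^2 = MSS / TSS
r2 = ((y_hat - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum()
r = np.corrcoef(x, y)[0, 1]  # sample correlation between x and y

print(np.isclose(r2, r ** 2))  # R^2 equals the squared correlation
```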
Example: a simple linear regression of penguin body mass on bill depth, using the penguins data.
Call:
lm(formula = body_mass_g ~ bill_depth_mm, data = penguins)

Coefficients:
  (Intercept)  bill_depth_mm
         7520           -193

Within each species, however, the correlation between bill depth and body mass becomes positive.

We keep saying that we assume a linear relationship between \(X\) and \(Y\).
The linearity assumption means linear in the parameters, not necessarily in the covariates. For example:
We can just think of \(Z = X\), \(Z = X^2\), \(Z = \log(X)\), \(Z = e^X\), etc., and then fit a linear model of the form \[ Y_i = \beta_0 + \beta_1 Z_i + \epsilon_i \]
The model is not linear if it includes terms like \(\sin(\beta_1 X)\), where a parameter appears inside a nonlinear function.


Everything we discussed is valid for this model as well.
However, note that the interpretation of \(\beta_1\) now concerns the expected change in \(Y\) for a one-unit increase in \(\sin(X)\), not in \(X\), which is less straightforward than before.
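Fitting such a model amounts to transforming the covariate first and then running ordinary least squares on the transformed variable. A sketch with \(Z = \log(X)\) (the simulated data and true coefficients are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0.5, 10, size=100)
y = 3.0 + 1.5 * np.log(x) + rng.normal(0, 0.5, size=100)  # true relation in log(x)

z = np.log(x)  # transformed covariate Z = log(X)
Z = np.column_stack([np.ones_like(z), z])
b0, b1 = np.linalg.lstsq(Z, y, rcond=None)[0]

# b1 estimates the expected change in Y for a one-unit increase in log(X)
print(round(b0, 1), round(b1, 1))
```

The same recipe works for \(Z = X^2\), \(Z = e^X\), or \(Z = \sin(X)\): the model stays linear in \((\beta_0, \beta_1)\), so all the least-squares machinery applies unchanged.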
© Rodolfo Lourenzutti – Adapted from Kenny Chiu’s material – licensed under CC BY 4.0