Association between variables

STAT 200 - Chapter 6

Association

  • Earlier, we explored the association between categorical variables;

  • We will now extend this discussion to quantitative variables;

Association

  • Do people with bigger brains tend to be more intelligent?

  • Do students with higher attendance tend to have better performance in a course?

  • Do taller athletes tend to be faster in the 100 m?

  • Do taller penguins tend to be heavier?

Scatterplot

  • In a scatterplot, each observation is represented by a point;
    • the explanatory variable goes on the \(x\)-axis;
    • the response variable goes on the \(y\)-axis;
  • Scatterplots are useful for visualizing the relationship between two quantitative variables;

Scatterplot: Example 1

Miles_per_Gallon Horsepower
18 130
15 165
18 150
16 150
17 140
15 198
14 220
14 215
14 225
15 190
15 170
14 160
15 150
14 225
24 95
22 95
18 97
21 85
27 88
26 46
25 87
24 90
25 95
26 113
21 90
10 215
10 200
11 210
9 193
27 88
28 90
25 95
19 100
16 105
17 100
19 88
18 100
14 165
14 175
14 153
14 150
12 180
13 170
13 175
18 110
22 72
19 100
18 88
23 86
28 90
30 70
30 76
31 65
35 69
27 60
26 70
24 95
25 80
23 54
20 90
21 86
13 165
14 175
15 150
14 153
17 150
11 208
13 155
12 160
13 190
19 97
15 150
13 130
13 140
14 150
18 112
22 76
21 87
26 69
22 86
28 92
23 97
28 80
27 88
13 175
14 150
13 145
14 137
15 150
12 198
13 150
13 158
14 150
13 215
12 225
13 175
18 105
16 100
18 100
18 88
23 95
26 46
11 150
12 167
13 170
12 180
18 100
20 88
21 72
22 94
18 90
19 85
21 107
26 90
15 145
16 230
29 49
24 75
20 91
19 112
15 150
24 110
20 122
11 180
20 95
19 100
15 100
31 67
26 80
32 65
25 75
16 100
16 110
18 105
16 140
13 150
14 150
14 140
14 150
29 83
26 67
26 78
31 52
32 61
28 75
24 75
26 75
24 97
26 93
31 67
19 95
18 105
15 72
15 72
16 170
15 145
16 150
14 148
17 110
16 105
15 110
18 95
21 110
20 110
13 129
29 75
23 83
20 100
23 78
24 96
25 71
24 97
18 97
29 70
19 90
23 95
23 88
22 98
25 115
33 53
28 86
25 81
25 92
26 79
27 83
17.5 140
16 150
15.5 120
14.5 152
22 100
22 105
24 81
22.5 90
29 52
24.5 60
29 70
33 53
20 100
18 78
18.5 110
17.5 95
29.5 71
32 70
28 75
26.5 72
20 102
13 150
19 88
19 108
16.5 120
16.5 180
13 145
13 130
13 150
31.5 68
30 80
36 58
25.5 96
33.5 70
17.5 145
17 110
15.5 145
15 130
17.5 110
20.5 105
19 100
18.5 98
16 180
15.5 170
15.5 190
16 149
29 78
24.5 88
26 75
25.5 89
30.5 63
33.5 83
30 67
30.5 78
22 97
21.5 110
21.5 110
43.1 48
36.1 66
32.8 52
39.4 70
36.1 60
19.9 110
19.4 140
20.2 139
19.2 105
20.5 95
20.2 85
25.1 88
20.5 100
19.4 90
20.6 105
20.8 85
18.6 110
18.1 120
19.2 145
17.7 165
18.1 139
17.5 140
30 68
27.5 95
27.2 97
30.9 75
21.1 95
23.2 105
23.8 85
23.9 97
20.3 103
17 125
21.6 115
16.2 133
31.5 71
29.5 68
21.5 115
19.8 85
22.3 88
20.2 90
20.6 110
17 130
17.6 129
16.5 138
18.2 135
16.9 155
15.5 142
19.2 125
18.5 150
31.9 71
34.1 65
35.7 80
27.4 80
25.4 77
23 125
27.2 71
23.9 90
34.2 70
34.5 70
31.8 65
37.3 69
28.4 90
28.8 115
26.8 115
33.5 90
41.5 76
38.1 60
32.1 70
37.2 65
28 90
26.4 88
24.3 90
19.1 90
34.3 78
29.8 90
31.3 75
37 92
32.2 75
46.6 65
27.9 105
40.8 65
44.3 48
43.4 48
36.4 67
30 67
44.6 67
33.8 67
29.8 62
32.7 132
23.7 100
35 88
32.4 72
27.2 84
26.6 84
25.8 92
23.5 110
30 84
39.1 58
39 64
35.1 60
32.3 67
37 65
37.7 62
34.1 68
34.7 63
34.4 65
29.9 65
33 74
33.7 75
32.4 75
32.9 100
31.6 74
28.1 80
30.7 76
25.4 116
24.2 120
22.4 110
26.6 105
20.2 88
17.6 85
28 88
27 88
34 88
31 85
29 84
27 90
24 92
36 74
37 68
31 68
38 63
36 70
36 88
36 75
34 70
38 67
32 67
38 67
25 110
38 85
26 92
22 112
32 96
36 84
27 90
27 86
44 52
32 84
28 79
31 82

Scatterplot: Example 2

Weight_in_lbs Acceleration
3504 12
3693 11.5
3436 11
3433 12
3449 10.5
4341 10
4354 9
4312 8.5
4425 10
3850 8.5
3090 17.5
4142 11.5
4034 11
4166 10.5
3850 11
3563 10
3609 8
3353 8
3761 9.5
3086 10
2372 15
2833 15.5
2774 15.5
2587 16
2130 14.5
1835 20.5
2672 17.5
2430 14.5
2375 17.5
2234 12.5
2648 15
4615 14
4376 15
4382 13.5
4732 18.5
2130 14.5
2264 15.5
2228 14
2046 19
1978 20
2634 13
3439 15.5
3329 15.5
3302 15.5
3288 15.5
4209 12
4464 11.5
4154 13.5
4096 13
4955 11.5
4746 12
5140 12
2962 13.5
2408 19
3282 15
3139 14.5
2220 14
2123 14
2074 19.5
2065 14.5
1773 19
1613 18
1834 19
1955 20.5
2278 15.5
2126 17
2254 23.5
2408 19.5
2226 16.5
4274 12
4385 12
4135 13.5
4129 13
3672 11.5
4633 11
4502 13.5
4456 13.5
4422 12.5
2330 13.5
3892 12.5
4098 14
4294 16
4077 14
2933 14.5
2511 18
2979 19.5
2189 18
2395 16
2288 17
2506 14.5
2164 15
2100 16.5
4100 13
3672 11.5
3988 13
4042 14.5
3777 12.5
4952 11.5
4464 12
4363 13
4237 14.5
4735 11
4951 11
3821 11
3121 16.5
3278 18
2945 16
3021 16.5
2904 16
1950 21
4997 14
4906 12.5
4654 13
4499 12.5
2789 15
2279 19
2401 19.5
2379 16.5
2124 13.5
2310 18.5
2472 14
2265 15.5
4082 13
4278 9.5
1867 19.5
2158 15.5
2582 14
2868 15.5
3399 11
2660 14
2807 13.5
3664 11
3102 16.5
2875 17
2901 16
3336 17
1950 19
2451 16.5
1836 21
2542 17
3781 17
3632 18
3613 16.5
4141 14
4699 14.5
4457 13.5
4638 16
4257 15.5
2219 16.5
1963 15.5
2300 14.5
1649 16.5
2003 19
2125 14.5
2108 15.5
2246 14
2489 15
2391 15.5
2000 16
3264 16
3459 16
3432 21
3158 19.5
4668 11.5
4440 14
4498 14.5
4657 13.5
3907 21
3897 18.5
3730 19
3785 19
3039 15
3221 13.5
3169 12
2171 16
2639 17
2914 16
2592 18.5
2702 13.5
2223 16.5
2545 17
2984 14.5
1937 14
3211 17
2694 15
2957 17
2945 14.5
2671 13.5
1795 17.5
2464 15.5
2220 16.9
2572 14.9
2255 17.7
2202 15.3
4215 13
4190 13
3962 13.9
4215 12.8
3233 15.4
3353 14.5
3012 17.6
3085 17.6
2035 22.2
2164 22.1
1937 14.2
1795 17.4
3651 17.7
3574 21
3645 16.2
3193 17.8
1825 12.2
1990 17
2155 16.4
2565 13.6
3150 15.7
3940 13.2
3270 21.9
2930 15.5
3820 16.7
4380 12.1
4055 12
3870 15
3755 14
2045 18.5
2155 14.8
1825 18.6
2300 15.5
1945 16.8
3880 12.5
4060 19
4140 13.7
4295 14.9
3520 16.4
3425 16.9
3630 17.7
3525 19
4220 11.1
4165 11.4
4325 12.2
4335 14.5
1940 14.5
2740 16
2265 18.2
2755 15.8
2051 17
2075 15.9
1985 16.4
2190 14.1
2815 14.5
2600 12.8
2720 13.5
1985 21.5
1800 14.4
1985 19.4
2070 18.6
1800 16.4
3365 15.5
3735 13.2
3570 12.8
3535 19.2
3155 18.2
2965 15.8
2720 15.4
3430 17.2
3210 17.2
3380 15.8
3070 16.7
3620 18.7
3410 15.1
3425 13.2
3445 13.4
3205 11.2
4080 13.7
2155 16.5
2560 14.2
2300 14.7
2230 14.5
2515 14.8
2745 16.7
2855 17.6
2405 14.9
2830 15.9
3140 13.6
2795 15.7
3410 15.8
1990 14.9
2135 16.6
3245 15.4
2990 18.2
2890 17.3
3265 18.2
3360 16.6
3840 15.4
3725 13.4
3955 13.2
3830 15.2
4360 14.9
4054 14.3
3605 15
3940 13
1925 14
1975 15.2
1915 14.4
2670 15
3530 20.1
3900 17.4
3190 24.8
3420 22.2
2200 13.2
2150 14.9
2020 19.2
2130 14.7
2670 16
2595 11.3
2700 12.9
2556 13.2
2144 14.7
1968 18.8
2120 15.5
2019 16.4
2678 16.5
2870 18.1
3003 20.1
3381 18.7
2188 15.8
2711 15.5
2542 17.5
2434 15
2265 15.2
2110 17.9
2800 14.4
2110 19.2
2085 21.7
2335 23.7
2950 19.9
3250 21.8
1850 13.8
1835 17.3
2145 18
1845 15.3
2910 11.4
2420 12.5
2500 15.1
2905 14.3
2290 17
2490 15.7
2635 16.4
2620 14.4
2725 12.6
2385 12.9
1755 16.9
1875 16.4
1760 16.1
2065 17.8
1975 19.4
2050 17.3
1985 16
2215 14.9
2045 16.2
2380 20.7
2190 14.2
2320 15.8
2210 14.4
2350 16.8
2615 14.8
2635 18.3
3230 20.4
2800 15.4
3160 19.6
2900 12.6
2930 13.8
3415 15.8
3725 19
3060 17.1
3465 16.6
2605 19.6
2640 18.6
2395 18
2575 16.2
2525 16
2735 18
2865 16.4
3035 20.5
1980 15.3
2025 18.2
1970 17.6
2125 14.7
2125 17.3
2160 14.5
2205 14.5
2245 16.9
1965 15
1965 15.7
1995 16.2
2945 16.4
3015 17
2585 14.5
2835 14.7
2665 13.9
2370 13
2950 17.3
2790 15.6
2130 24.6
2295 11.6
2625 18.6
2720 19.4

Scatterplot: Example 3

body_mass_g flipper_length_mm
3750 181
3800 186
3250 195
3450 193
3650 190
3625 181
4675 195
3200 182
3800 191
4400 198
3700 185
3450 195
4500 197
3325 184
4200 194
3400 174
3600 180
3800 189
3950 185
3800 180
3800 187
3550 183
3200 187
3150 172
3950 180
3250 178
3900 178
3300 188
3900 184
3325 195
4150 196
3950 190
3550 180
3300 181
4650 184
3150 182
3900 195
3100 186
4400 196
3000 185
4600 190
3425 182
3450 190
4150 191
3500 186
4300 188
3450 190
4050 200
2900 187
3700 191
3550 186
3800 193
2850 181
3750 194
3150 185
4400 195
3600 185
4050 192
2850 184
3950 192
3350 195
4100 188
3050 190
4450 198
3600 190
3900 190
3550 196
4150 197
3700 190
4250 195
3700 191
3900 184
3550 187
4000 195
3200 189
4700 196
3800 187
4200 193
3350 191
3550 194
3800 190
3500 189
3950 189
3600 190
3550 202
4300 205
3400 185
4450 186
3300 187
4300 208
3700 190
4350 196
2900 178
4100 192
3725 192
4725 203
3075 183
4250 190
2925 193
3550 184
3750 199
3900 190
3175 181
4775 197
3825 198
4600 191
3200 193
4275 197
3900 191
4075 196
2900 188
3775 199
3350 189
3325 189
3150 187
3500 198
3450 176
3875 202
3050 186
4000 199
3275 191
4300 195
3050 191
4000 210
3325 190
3500 197
3500 193
4475 199
3425 187
3900 190
3175 191
3975 200
3400 185
4250 193
3400 193
3475 187
3050 188
3725 190
3000 192
3650 185
4250 190
3475 184
3450 195
3750 193
3700 187
4000 201
4500 211
5700 230
4450 210
5700 218
5400 215
4550 210
4800 211
5200 219
4400 209
5150 215
4650 214
5550 216
4650 214
5850 213
4200 210
5850 217
4150 210
6300 221
4800 209
5350 222
5700 218
5000 215
4400 213
5050 215
5000 215
5100 215
5650 215
4600 210
5550 220
5250 222
4700 209
5050 207
6050 230
5150 220
5400 220
4950 213
5250 219
4350 208
5350 208
3950 208
5700 225
4300 210
4750 216
5550 222
4900 217
4200 210
5400 225
5100 213
5300 215
4850 210
5300 220
4400 210
5000 225
4900 217
5050 220
4300 208
5000 220
4450 208
5550 224
4200 208
5300 221
4400 214
5650 231
4700 219
5700 230
5800 229
4700 220
5550 223
4750 216
5000 221
5100 221
5200 217
4700 216
5800 230
4600 209
6000 220
4750 215
5950 223
4625 212
5450 221
4725 212
5350 224
4750 212
5600 228
4600 218
5300 218
4875 212
5550 230
4950 218
5400 228
4750 212
5650 224
4850 214
5200 226
4925 216
4875 222
4625 203
5250 225
4850 219
5600 228
4975 215
5500 228
5500 215
4700 210
5500 219
4575 208
5500 209
5000 216
5950 229
4650 213
5500 230
4375 217
5850 230
6000 222
4925 214
4850 215
5750 222
5200 212
5400 213
3500 192
3900 196
3650 193
3525 188
3725 197
3950 198
3250 178
3750 197
4150 195
3700 198
3800 193
3775 194
3700 185
4050 201
3575 190
4050 201
3300 197
3700 181
3450 190
4400 195
3600 181
3400 191
2900 187
3800 193
3300 195
4150 197
3400 200
3800 200
3700 191
4550 205
3200 187
4300 201
3350 187
4100 203
3600 195
3900 199
3850 195
4800 210
2700 192
4500 205
3950 210
3650 187
3550 196
3500 196
3675 196
4450 201
3400 190
4300 212
3250 187
3675 198
3325 199
3950 201
3600 193
4050 203
3350 187
3450 197
3250 191
4050 203
3800 202
3525 194
3950 206
3650 189
3650 195
4000 207
3400 202
3775 193
4100 210
3775 198

Scatterplot: what to look for?

Direction

Positive Correlation

Negative Correlation

Scatterplot: what to look for?

Form of relationship

Linear Relationship

Non-Linear Relationship

Scatterplot: what to look for?

Strength of Relationship

Stronger Relationship

Weaker Relationship

The correlation coefficient

  • The correlation coefficient, \(r\), measures the direction and strength of the linear association between two quantitative variables;
  • The correlation coefficient is always between -1 and 1:
    • Positive correlation: an increase in \(X\) tends to be accompanied by an increase in \(Y\);
    • Negative correlation: an increase in \(X\) tends to be accompanied by a decrease in \(Y\);
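
As a rough illustration of how \(r\) can be computed, the sketch below uses the standard Pearson formula \(r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum (x_i-\bar{x})^2 \sum (y_i-\bar{y})^2}}\). These slides do not state a formula, so this form is an assumption about what the course uses. The helper name `pearson_r` is made up for this example, and only the first five rows of the Example 1 table are used, so the result describes that small subset rather than the full data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# First five rows of the Example 1 table (illustrative subset only).
mpg = [18, 15, 18, 16, 17]
horsepower = [130, 165, 150, 150, 140]

r = pearson_r(horsepower, mpg)
print(f"r = {r:.3f}")  # prints r = -0.779
```

The negative sign agrees with the direction suggested by the table: within this subset, cars with more horsepower tend to have lower miles per gallon.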

Warning

A correlation does not, by itself, show that an increase in \(X\) causes a change in \(Y\). It only shows that an increase in \(X\) is associated with a change in \(Y\).

Properties of Correlation

  • \(r\) has no units;

  • if \(r<0\), the variables are negatively correlated;
    • \(r=-1\) for a perfect negative linear relationship;

  • if \(r>0\), the variables are positively correlated;
    • \(r=1\) for a perfect positive linear relationship;

  • \(r\approx 0\) implies a very weak or no linear relationship between the variables;

  • Swapping the \(x\) and \(y\) variables does not affect the value of \(r\);

  • The value of \(r\) does not change if a constant is added to all values of either variable, or if all values of either variable are multiplied by a positive constant;

  • \(r\) is sensitive to outliers;
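
The properties above can be checked numerically. The sketch below is a minimal illustration, not part of the original slides: `pearson_r` is a hypothetical helper implementing the standard Pearson formula, the data are the first five rows of the Example 1 table, the factor 0.425 is an approximate miles-per-gallon to kilometres-per-litre conversion, and the extra point `(40, 300)` is an invented outlier.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (standard formula, assumed here)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# First five rows of the Example 1 table (illustrative subset only).
mpg = [18, 15, 18, 16, 17]
horsepower = [130, 165, 150, 150, 140]

r = pearson_r(mpg, horsepower)

# Swapping the variables leaves r unchanged.
r_swapped = pearson_r(horsepower, mpg)

# Adding a constant, or multiplying by a positive constant, leaves r unchanged.
r_shifted = pearson_r([v + 100 for v in mpg], horsepower)
r_scaled = pearson_r([v * 0.425 for v in mpg], horsepower)  # approx. unit change

# Multiplying by a negative constant flips the sign of r.
r_negated = pearson_r([-v for v in mpg], horsepower)

# One invented outlier (hypothetical point, not in the data) changes r a lot.
r_outlier = pearson_r(mpg + [40], horsepower + [300])

print(f"original      r = {r:.3f}")
print(f"swapped       r = {r_swapped:.3f}")
print(f"shifted       r = {r_shifted:.3f}")
print(f"rescaled      r = {r_scaled:.3f}")
print(f"negated       r = {r_negated:.3f}")
print(f"with outlier  r = {r_outlier:.3f}")
```

In this small example, the single added point is enough to turn a negative correlation into a positive one, which is why it is worth looking at the scatterplot before trusting the value of \(r\).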

Example

Non-linear relationship and correlation

Warning

An \(r\) close to zero does not imply that two variables are unrelated. They could still have a non-linear relationship.

Example: Non-linear relationship and correlation
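
A small constructed case makes the point concrete. In the sketch below, \(y\) is completely determined by \(x\) through \(y = x^2\), yet the linear correlation comes out to zero because the \(x\) values are symmetric around zero. The data are invented for illustration, and `pearson_r` is a hypothetical helper using the standard Pearson formula.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (standard formula, assumed here)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented data: a perfect but non-linear (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")

# Restricting to x >= 0 makes the relationship increasing, and r becomes large.
x_pos = [v for v in x if v >= 0]
y_pos = [v ** 2 for v in x_pos]
r_pos = pearson_r(x_pos, y_pos)
print(f"r for x >= 0: {r_pos:.3f}")
```

The positive and negative halves of the parabola cancel each other out in the cross-product sum, so \(r\) misses a relationship that is obvious in a scatterplot.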

Correlation is not causality

  • Even if there is an association (linear or otherwise) between two variables, it does not necessarily mean that one variable causes the other.
  • There could be a third variable, referred to as a lurking variable, that causes changes in both \(X\) and \(Y\).

  • Association does not imply causation!

Example: Firefighters

  • Consider two variables:
    • \(X\): total number of firefighters sent to an incident;
    • \(Y\): total damage (in dollars);
  • Do you think \(X\) and \(Y\) are associated?
    • Positively or negatively?

Example: Firefighters

  • Can we conclude that sending more firefighters to the fire scene causes more damage?
  • Can you think of a possible lurking variable that could affect both \(X\) and \(Y\)?
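
One way to explore these questions is a small simulation. The sketch below assumes that the size of the fire is the lurking variable: it drives both the number of firefighters sent and the damage, while the firefighters have no effect on damage at all. Every number here (the severity scale, the coefficients, the noise levels, the seed) is invented for illustration and is not taken from real incident data, and `pearson_r` is a hypothetical helper using the standard Pearson formula.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient (standard formula, assumed here)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(200)  # fixed seed so the simulation is reproducible

fire_size, firefighters, damage = [], [], []
for _ in range(2000):
    size = random.uniform(1, 10)  # hypothetical severity score
    # Both variables depend on fire size only, not on each other.
    crew = max(1, round(2 * size + random.gauss(0, 1)))
    cost = max(0.0, 1000 * size + random.gauss(0, 500))
    fire_size.append(size)
    firefighters.append(crew)
    damage.append(cost)

# Across all fires, X and Y are strongly positively correlated.
r_all = pearson_r(firefighters, damage)

# Among fires of similar size, the association largely disappears.
similar = [
    (crew, cost)
    for size, crew, cost in zip(fire_size, firefighters, damage)
    if 5.0 <= size < 5.5
]
r_similar = pearson_r([c for c, _ in similar], [d for _, d in similar])

print(f"all fires:          r = {r_all:.2f}")
print(f"similar-sized only: r = {r_similar:.2f}")
```

In this simulated setting the strong overall correlation comes entirely from fire size, which is the kind of pattern a lurking variable can produce.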

Let’s play a game

Guess the correlation

References & Attributions

References

Horst, Allison Marie, Alison Presmanes Hill, and Kristen B Gorman. 2020. Palmerpenguins: Palmer Archipelago (Antarctica) Penguin Data. https://doi.org/10.5281/zenodo.3960218.