5.2 KiB
L10 Correlation (WMS 3.1-8)
Long lecture; use time efficiently
-
Correlation coefficient
\rho\in\lbrack - 1,1\rbrack-
Positive correlation means variables
XandYtend to move in same direction (e.g. temperatureXand ice cream salesY) -
Negative correlation means variables
XandYtend to move in opposite direction (e.g. frequency of exerciseXand body mass index\text{Y}) -
Magnitude indicates strength of relationship
-
Independence implies
\rho = 0
-
-
Joint distribution function
- Employee hours per week
Xand hourly wageY
- Employee hours per week
Y = 10 |
Y = 15 |
|
|---|---|---|
X = 20 |
.2 | .1 |
X = 30 |
.1 | .2 |
X = 40 |
.1 | .3 |
Illustrate: P(x,y) = P(X = x\cap Y = y) =\begin{cases}.2\ if (x,y) =(20,10)\\.1\ if\ (x,y) =(20,15)\\.1\ if\ (x,y) =(30,10)\\.2\ if\ (x,y) =(30,15)\\.1\ if\ (x,y) =(40,10)\\.3\ if\ (x,y) =(40,15)\end{cases} |
-
Marginal distributions
-
Sum rows or columns:
P(x) =\begin{cases}.3\ if\ x = 20\\.3\ if\ x = 30\\.4\ if\ x = 40\end{cases}andP(y) =\begin{cases}.4\ if\ y = 10\\.6\ if\ y = 15\end{cases} -
Summary statistics (quick recap)
\mu_{x} = 20(.3) + 30(.3) + 40(.4) = 31\sigma_{x}\approx 8.3E(X^{2}) = 20^{2}(.3) + 30^{2}(.3) + 40^{2}(.4) = 1,030\sigma_{x}^{2} = 1,030 - 31^{2} = 69\sigma_{x} =\sqrt{69}\approx 8.3
\mu_{y} = 10(.4) + 15(.6) = 13\sigma_{y}\approx 2.4-
E(Y^{2}) = 10^{2}(.4) + 15^{2}(.6) = 175 -
\sigma_{y}^{2} = 175 - 13^{2} = 6 -
\sigma_{y} =\sqrt{6}\approx 2.4
-
-
Independence
- Definition:
P(x,y) = P_{x}(x)P_{y}(y)for every(x,y)pair - Not independent here, since
P(20,10) = .20\neq P(20)P(10) = .3\times .4 = .12
- Definition:
-
-
Expectations of functions of
XandY-
Average weekly payment
E(\text{XY}) =(20\cdot 10)(.20) +(20\cdot 15)(.10) +(30\cdot 10)(.10) +(30\cdot 15)(.20) +(40\cdot 10)(.10) +(40\cdot 15)(.30) = 40 + 30 + 30 + 90 + 40 + 180 = 410 -
Can do any function
E\lbrack f(X,Y)\rbrack = f(x,y)P(x,y)
-
-
Correlation
-
Covariance
\sigma_{\text{xy}} = E\lbrack(X -\mu_{x})(Y -\mu_{y})\rbrack=\lbrack(20 - 31)(10 - 13)\rbrack(.20)+\lbrack(20 - 31)(15 - 13)\rbrack(.10)+\lbrack(30 - 31)(10 - 13)\rbrack(.10)+\lbrack(30 - 31)(10 - 15)\rbrack(.20)+\lbrack(40 - 31)(10 - 13)\rbrack(.10)+\lbrack(40 - 31)(15 - 13)\rbrack(.30) = 7 -
Simpler formula:
\sigma_{\text{xy}} = E(\text{XY}) -\mu_{x}\mu_{y} = 410 -(31)(13) = 7 -
Correlation
\rho =\frac{\sigma_{\text{xy}}}{\sigma_{x}\sigma_{y}} =\frac{7}{(8.3)(2.4)}\approx 0.35- Positive, but fairly weak (again, not independent)
- Later:
\rho^{2}\approx 0.12measures % of variation inYthat covaries withX
-
-
Algebra shortcuts
-
Covariance of a sum
\text{Cov}(X,1200 - 2000Y) = Cov(X,1200) + Cov(X, - 2000Y) = 0 +(- 2000)\sigma_{\text{xy}} -
Correlation of a sum
\text{Corr}(X,1200 - 2000Y) = -\rho -
Variance of a sum
V(X + Y) = V(X) + V(Y) + 2Cov(X,Y) -
Variance of a larger sum
V(X + Y + Z) = V(X) + V(Y) + V(Z) + 2Cov(X,Y) + 2Cov(X,Z) + 2Cov(Y,Z)(with 100 variables,C_{2}^{100}\approx 5000\text{Cov}terms) -
Importance of independent samples
-
-
Application: financial diversification
-
Assume two stocks have same average return
\mu_{x} =\mu_{y} =\muand same standard deviation\sigma_{x} =\sigma_{y} =\sigma. -
You could buy two shares of
X, or one share ofXand one share ofY. Since you are indifferent betweenXandY, it might seem that you should be indifferent between these two stock portfolios. -
However, on your homework, you will prove that
E(2X) = E(X + Y)butV(2X) < V(X + Y), as long asXandYare not perfectly correlated (i.e.\rho < 1). In fact, if they are perfectly negatively correlated thenV(X + Y) = 0! -
Thus, the common financial advice to "Diversify your portfolio".
-
-
Practice[if time]: Cell phone use
Xand numberYof car accidents
Y = 0 |
Y = 1 |
Y = 2 |
|
|---|---|---|---|
X = 0 |
.60 | .08 | .02 |
X = 1 |
.10 | .12 | .08 |
-
Note: numerical values can be assigned to binary categorical variables, so that notion of correlation is still meaningful.
-
Marginal distribution
P(X = x) =\begin{cases}.70\ if\ x = 0\\.30\ if\ x = 1\end{cases}, meanE(X) = .3, variance\sigma_{x}^{2} = E(X^{2}) -\mu_{x}^{2} = .3 - {.3}^{2} = \sqrt{.21}, standard deviation\sigma_{x} =\approx 0.458 -
Marginal distribution
P(Y = y) =\begin{cases}.70\ if\ y = 0\\.20\ if\ y = 1\\.10\ if\ y = 2\end{cases}, meanE(Y) = .4, variance\sigma_{y}^{2} = \sqrt{.44}, standard deviation\sigma_{y} =\approx 0.663 -
Not independent since
P(0,0) = .6\neq P_{x}(0)P_{y}(0) =(.7)(.7) = .49 -
E(\text{XY}) =(0)(0)(.60) +(0)(1)(.08) +(0)(2)(.02) +(1)(0)(.10) +(1)(1)(.12) +(1)(2)(.08) = .28 -
Covariance
\sigma_{\text{xy}} = E(\text{XY}) -\mu_{x}\mu_{y} = .28 -(.3)(.4) = .16 -
Correlation
\rho =\frac{\sigma_{\text{xy}}}{\sigma_{x}\sigma_{y}} =\frac{.16}{(.458)(.663)}\approx 0.527positive and moderately strong