quartz/content/vault/econometrics/lectnot/L10 Correlation.md
2022-06-07 16:56:28 -06:00

5.2 KiB

L10 Correlation (WMS 3.1-8)

Long lecture; use time efficiently
  1. Correlation coefficient \rho\in\lbrack - 1,1\rbrack

    • Positive correlation means variables X and Y tend to move in same direction (e.g. temperature X and ice cream sales Y)

    • Negative correlation means variables X and Y tend to move in opposite direction (e.g. frequency of exercise X and body mass index \text{Y})

    • Magnitude indicates strength of relationship

    • Independence implies \rho = 0

  2. Joint distribution function

    • Employee hours per week X and hourly wage Y
Y = 10 Y = 15
X = 20 .2 .1
X = 30 .1 .2
X = 40 .1 .3
Illustrate: P(x,y) = P(X = x\cap Y = y) =\begin{cases}.2\ if (x,y) =(20,10)\\.1\ if\ (x,y) =(20,15)\\.1\ if\ (x,y) =(30,10)\\.2\ if\ (x,y) =(30,15)\\.1\ if\ (x,y) =(40,10)\\.3\ if\ (x,y) =(40,15)\end{cases}
  1. Marginal distributions

    • Sum rows or columns: P(x) =\begin{cases}.3\ if\ x = 20\\.3\ if\ x = 30\\.4\ if\ x = 40\end{cases} and P(y) =\begin{cases}.4\ if\ y = 10\\.6\ if\ y = 15\end{cases}

    • Summary statistics (quick recap)

      1. \mu_{x} = 20(.3) + 30(.3) + 40(.4) = 31
      2. \sigma_{x}\approx 8.3
        1. E(X^{2}) = 20^{2}(.3) + 30^{2}(.3) + 40^{2}(.4) = 1,030
        2. \sigma_{x}^{2} = 1,030 - 31^{2} = 69
        3. \sigma_{x} =\sqrt{69}\approx 8.3
      3. \mu_{y} = 10(.4) + 15(.6) = 13
      4. \sigma_{y}\approx 2.4
        1. E(Y^{2}) = 10^{2}(.4) + 15^{2}(.6) = 175

        2. \sigma_{y}^{2} = 175 - 13^{2} = 6

        3. \sigma_{y} =\sqrt{6}\approx 2.4

    • Independence

      1. Definition: P(x,y) = P_{x}(x)P_{y}(y) for every (x,y) pair
      2. Not independent here, since P(20,10) = .20\neq P(20)P(10) = .3\times .4 = .12
  2. Expectations of functions of X and Y

    • Average weekly payment E(\text{XY}) =(20\cdot 10)(.20) +(20\cdot 15)(.10) +(30\cdot 10)(.10) +(30\cdot 15)(.20) +(40\cdot 10)(.10) +(40\cdot 15)(.30) = 40 + 30 + 30 + 90 + 40 + 180 = 410

    • Can do any function E\lbrack f(X,Y)\rbrack = f(x,y)P(x,y)

  3. Correlation

    • Covariance \sigma_{\text{xy}} = E\lbrack(X -\mu_{x})(Y -\mu_{y})\rbrack =\lbrack(20 - 31)(10 - 13)\rbrack(.20) +\lbrack(20 - 31)(15 - 13)\rbrack(.10) +\lbrack(30 - 31)(10 - 13)\rbrack(.10) +\lbrack(30 - 31)(10 - 15)\rbrack(.20) +\lbrack(40 - 31)(10 - 13)\rbrack(.10) +\lbrack(40 - 31)(15 - 13)\rbrack(.30) = 7

    • Simpler formula: \sigma_{\text{xy}} = E(\text{XY}) -\mu_{x}\mu_{y} = 410 -(31)(13) = 7

    • Correlation \rho =\frac{\sigma_{\text{xy}}}{\sigma_{x}\sigma_{y}} =\frac{7}{(8.3)(2.4)}\approx 0.35

      1. Positive, but fairly weak (again, not independent)
      2. Later: \rho^{2}\approx 0.12 measures % of variation in Y that covaries with X
  4. Algebra shortcuts

    • Covariance of a sum \text{Cov}(X,1200 - 2000Y) = Cov(X,1200) + Cov(X, - 2000Y) = 0 +(- 2000)\sigma_{\text{xy}}

    • Correlation of a sum \text{Corr}(X,1200 - 2000Y) = -\rho

    • Variance of a sum V(X + Y) = V(X) + V(Y) + 2Cov(X,Y)

    • Variance of a larger sum V(X + Y + Z) = V(X) + V(Y) + V(Z) + 2Cov(X,Y) + 2Cov(X,Z) + 2Cov(Y,Z) (with 100 variables, C_{2}^{100}\approx 5000 \text{Cov} terms)

    • Importance of independent samples

  5. Application: financial diversification

    • Assume two stocks have same average return \mu_{x} =\mu_{y} =\mu and same standard deviation \sigma_{x} =\sigma_{y} =\sigma.

    • You could buy two shares of X, or one share of X and one share of Y. Since you are indifferent between X and Y, it might seem that you should be indifferent between these two stock portfolios.

    • However, on your homework, you will prove that E(2X) = E(X + Y) but V(2X) < V(X + Y), as long as X and Y are not perfectly correlated (i.e. \rho < 1). In fact, if they are perfectly negatively correlated then V(X + Y) = 0!

    • Thus, the common financial advice to "Diversify your portfolio".

  6. Practice[if time]: Cell phone use X and number Y of car accidents

Y = 0 Y = 1 Y = 2
X = 0 .60 .08 .02
X = 1 .10 .12 .08
  • Note: numerical values can be assigned to binary categorical variables, so that notion of correlation is still meaningful.

  • Marginal distribution P(X = x) =\begin{cases}.70\ if\ x = 0\\.30\ if\ x = 1\end{cases}, mean E(X) = .3, variance \sigma_{x}^{2} = E(X^{2}) -\mu_{x}^{2} = .3 - {.3}^{2} = \sqrt{.21}, standard deviation \sigma_{x} =\approx 0.458

  • Marginal distribution P(Y = y) =\begin{cases}.70\ if\ y = 0\\.20\ if\ y = 1\\.10\ if\ y = 2\end{cases}, mean E(Y) = .4, variance \sigma_{y}^{2} = \sqrt{.44}, standard deviation \sigma_{y} =\approx 0.663

  • Not independent since P(0,0) = .6\neq P_{x}(0)P_{y}(0) =(.7)(.7) = .49

  • E(\text{XY}) =(0)(0)(.60) +(0)(1)(.08) +(0)(2)(.02) +(1)(0)(.10) +(1)(1)(.12) +(1)(2)(.08) = .28

  • Covariance \sigma_{\text{xy}} = E(\text{XY}) -\mu_{x}\mu_{y} = .28 -(.3)(.4) = .16

  • Correlation \rho =\frac{\sigma_{\text{xy}}}{\sigma_{x}\sigma_{y}} =\frac{.16}{(.458)(.663)}\approx 0.527 positive and moderately strong