L14 Regressions (WMS 5.3, 11)
- Regressions
  - Sir Francis Galton, 1886 (cousin of Darwin)
  - Use data to determine the (average) linear relationship $Y = \beta_0 + \beta_1 X$. Once the relationship is known, we can predict $Y$ for any $X$ value (even out of sample), as if through a crystal ball!
  - Examples:
    - Income $Y$ as a function of education $X$
    - Unemployment $Y$ next year as a function of (e.g. fiscal or monetary) policy $X$
    - Stock price tomorrow $Y$ as a function of today's earnings/price $X$
    - A consultant's "secret formula" predicting sales, etc.
  - Puts units on correlation ("education and income are strongly correlated" vs. "each year of education is associated with an additional $4k of income")
  - Working example: education $\mu_x = 15$ years, $\sigma_x = 3$ years; income $\mu_y = \$70{,}000$, $\sigma_y = \$20{,}000$; correlation $\rho = .6$
  - Any $\beta_0$ and $\beta_1$ determine a line $Y = \beta_0 + \beta_1 X$, implying an error term $\varepsilon = Y - \beta_0 - \beta_1 X$ such that the data satisfies $Y = \beta_0 + \beta_1 X + \varepsilon$. We can choose $\beta_0$ and $\beta_1$ so that the resulting line is as useful as possible.
  - "Least squares" regression: choose $\beta_0$ and $\beta_1$ to minimize $E(\varepsilon^2)$
    - Equivalently, choose $\beta_0$ so that $E(\varepsilon) = 0$ and $\beta_1$ to minimize $V(\varepsilon)$
    - Can use other criteria (e.g. least absolute deviation $E(|\varepsilon|)$), but less common
- Intercept
  - If $\beta_0$ is high, most $\varepsilon_i$ will be negative; if $\beta_0$ is low, most $\varepsilon_i$ will be positive
  - $E(\varepsilon) = \mu_y - \beta_0 - \beta_1 \mu_x = 0$ implies $\beta_0 = \mu_y - \beta_1 \mu_x$. Easier: $\mu_y = \beta_0 + \beta_1 \mu_x$, so the regression line passes through $(\mu_x, \mu_y)$
  - Example: $\beta_0 = \$70{,}000 - \$4{,}000 \cdot 15 = \$10{,}000$
- Slope
  - $V(\varepsilon) = V(Y) + V(-\beta_1 X) + 2\,\mathrm{Cov}(Y, -\beta_1 X) = \sigma_y^2 + \beta_1^2 \sigma_x^2 - 2\beta_1 \sigma_{xy}$
  - To minimize, set $0 = \frac{dV(\varepsilon)}{d\beta_1} = 2\beta_1 \sigma_x^2 - 2\sigma_{xy}$
  - Solution: $\beta_1 = \frac{\sigma_{xy}}{\sigma_x^2} = \frac{\sigma_{xy}}{\sigma_x \sigma_y} \frac{\sigma_y}{\sigma_x} = \rho \frac{\sigma_y}{\sigma_x}$
  - Slope is simply the (normalized) correlation coefficient
  - Example: $\beta_1 = .6 \cdot \frac{\$20{,}000}{3\text{ yr}} = \$4{,}000$/yr (e.g. a four-year degree provides an extra \$16,000/yr)
  - Equivalently, $\beta_1$ solves $\mathrm{Cov}(X, \varepsilon) = 0$ (see homework)
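The slope and intercept formulas can be evaluated directly from the working example's moments (the numbers below are the made-up ones from these notes):

```python
# beta1 = rho * sigma_y / sigma_x and beta0 = mu_y - beta1 * mu_x,
# using the (made-up) working-example moments from the notes.
mu_x, sigma_x = 15, 3            # education: mean 15 yr, sd 3 yr
mu_y, sigma_y = 70_000, 20_000   # income: mean $70k, sd $20k
rho = 0.6                        # correlation

beta1 = rho * sigma_y / sigma_x  # slope in $ per year of education
beta0 = mu_y - beta1 * mu_x      # line passes through (mu_x, mu_y)
print(round(beta1), round(beta0))  # 4000 10000
```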
- Predictions
  - High school grad ($X^* = 12$) expects $Y^* = \$10k + \$4k(12) = \$58k$
  - College grad ($X^* = 16$) expects $Y^* = \$10k + \$4k(16) = \$74k$
  - Three PhDs ($X^* = 31$) expects $Y^* = \$10k + \$4k(31) = \$134k$
    - This assumes the linear trend holds up, i.e. constant returns to scale (which may not be reasonable); in econometrics (Econ 388), you learn nonlinear regressions
  - Standardized: $\frac{Y^* - \mu_y}{\sigma_y} = \rho \frac{X^* - \mu_x}{\sigma_x}$ (since $\beta_1 = \rho \frac{\sigma_y}{\sigma_x}$, $\mu_y = \beta_0 + \beta_1 \mu_x$, and $Y^* = \beta_0 + \beta_1 X^*$)
    - If $X^*$ is $1$ or $2$ or $k$ standard deviations above $\mu_x$, then $Y^*$ is $\rho$ or $2\rho$ or $k\rho$ standard deviations above $\mu_y$
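A quick check of the three predictions, along with the standardized form, using the working example's numbers:

```python
# Predictions Y* = beta0 + beta1 * X*, checked against the standardized
# form (Y* - mu_y)/sigma_y = rho * (X* - mu_x)/sigma_x.
mu_x, sigma_x, mu_y, sigma_y, rho = 15, 3, 70_000, 20_000, 0.6
beta1 = rho * sigma_y / sigma_x
beta0 = mu_y - beta1 * mu_x

preds = {}
for x_star in (12, 16, 31):  # high school, college, three PhDs
    preds[x_star] = beta0 + beta1 * x_star
    # standardized form should hold exactly (up to rounding error)
    z_gap = (preds[x_star] - mu_y) / sigma_y - rho * (x_star - mu_x) / sigma_x
    assert abs(z_gap) < 1e-9

print({x: round(y) for x, y in preds.items()})  # {12: 58000, 16: 74000, 31: 134000}
```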
- Stay in school to get rich?
  - Caveat 1: I made these numbers up. Before making important financial decisions, you should collect the true numbers.
  - Caveat 2: We've modeled this as a straight line, implying constant marginal returns to education; if there are decreasing marginal returns, it might be better to use a parabola (take Econometrics first).
  - Caveat 3: Regressions just express correlation, still not causation (despite the popular terminology of "dependent" and "independent" variables).
    - Maybe causation: school teaches useful skills that generate income.
    - Maybe reverse causation: schooling is pure consumption, and wealthy individuals can afford more of it.
    - Maybe spurious correlation: smart kids enjoy school (just as athletes enjoy sports) but would earn high incomes with or without it.
  - Either way, predict higher incomes for those who do stay in school: going to school increases my prediction of your income, even if it doesn't increase your income.
- Reverse predictions
  - What if a worker makes $100k income and asks you to guess her education?
  - Could read the regression backward, but it was designed to minimize errors in income, not errors in education
  - Better to start over, with income as $X$ and education as $Y$
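A sketch of why "starting over" matters: the reverse regression's slope is $\rho \sigma_x / \sigma_y$, which is not the reciprocal of the forward slope unless $|\rho| = 1$. (The $100k worker is the example from the notes; the rest uses the same made-up moments.)

```python
# Forward slope: rho * sigma_y / sigma_x ($ per year of education).
# Reverse slope: rho * sigma_x / sigma_y (years of education per $).
# These are NOT reciprocals unless |rho| = 1.
mu_x, sigma_x = 15, 3            # education (years)
mu_y, sigma_y = 70_000, 20_000   # income ($)
rho = 0.6

beta1_fwd = rho * sigma_y / sigma_x
beta1_rev = rho * sigma_x / sigma_y
print(round(beta1_rev, 8), round(1 / beta1_fwd, 8))  # 9e-05 0.00025

# Reverse line passes through (mu_y, mu_x); guess education at $100k income:
guess = (mu_x - beta1_rev * mu_y) + beta1_rev * 100_000
print(round(guess, 1))  # 17.7 (years)
```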
- Errors
  - $\varepsilon_i = y_i - (\beta_0 + \beta_1 x_i)$
  - De-trends the data (e.g. isolates "skill" or "luck", above and beyond education)
  - Example: who is more genius (or luckier): $(x, y) = (12, \$80k)$ or $(x, y) = (16, \$100k)$?
    1. $\$80k - (\$10k + \$4k \cdot 12) = \$22k$
    2. $\$100k - (\$10k + \$4k \cdot 16) = \$26k$
  - Variance $\sigma_\varepsilon^2$ of the error distribution tells us how far people typically are off the regression line: $\sigma_\varepsilon^2 = V(Y - \beta_0 - \beta_1 X) = \sigma_y^2 + \beta_1^2 \sigma_x^2 - 2\beta_1 \mathrm{Cov}(X, Y) = (\$20k)^2 + (\$4k)^2 \cdot 3^2 - 2(\$4k)(.6 \times \$20k \times 3) = (\$16k)^2$
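The two residuals and the error variance above can be checked numerically:

```python
# Residuals eps_i = y_i - (beta0 + beta1 * x_i) for the two workers, and
# sigma_eps^2 = sigma_y^2 + beta1^2 sigma_x^2 - 2 beta1 Cov(X, Y).
beta0, beta1 = 10_000, 4_000
sigma_x, sigma_y, rho = 3, 20_000, 0.6

residuals = {x: y - (beta0 + beta1 * x) for x, y in ((12, 80_000), (16, 100_000))}
print(residuals)  # {12: 22000, 16: 26000} -- the $16-a-year worker is "luckier"

cov_xy = rho * sigma_x * sigma_y
var_eps = sigma_y**2 + beta1**2 * sigma_x**2 - 2 * beta1 * cov_xy
print(round(var_eps**0.5))  # 16000, i.e. sigma_eps = $16k
```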
- Explanatory power (%)
  - Partition $V(Y) = \beta_1^2 V(X) + V(\varepsilon) = 144 + 256$ (in thousands of dollars, squared)
    - Note: $2\beta_1 \mathrm{Cov}(X, \varepsilon) = 0$ (see homework) because the optimal slope extracts all of the correlation
    - This decomposes $V(Y)$ into "explained" 144 plus "unexplained" 256 (e.g. talent, luck, or some other mystery). (Warning: the terminology sounds like causation, but isn't; more accurately, the variation is "related to education" and "unrelated to education".)
  - "Explained" portion is the $\rho^2$ fraction of $\sigma_y^2$
    - "Explained" portion is $\frac{144}{400} = .36$
    - Generically, $\beta_1^2 \sigma_x^2 = (\rho \frac{\sigma_y}{\sigma_x})^2 \sigma_x^2 = \rho^2 \sigma_y^2$; thus the "explained" variation is always the $\rho^2$ fraction (sometimes called the "coefficient of determination") of $\sigma_y^2$. In this case $.6^2 = 36\%$
  - "Unexplained" portion is $1 - \rho^2$
    - In this case, $1 - .6^2 = 64\%$, so $\sigma_\varepsilon^2 = .64 \cdot (\$20k)^2$
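The variance decomposition and the $\rho^2$ share can be verified with the same moments:

```python
# V(Y) = beta1^2 V(X) + V(eps): explained share is always rho^2.
sigma_x, sigma_y, rho = 3, 20_000, 0.6
beta1 = rho * sigma_y / sigma_x

explained = beta1**2 * sigma_x**2          # rho^2 * sigma_y^2
unexplained = (1 - rho**2) * sigma_y**2
print(round(explained / 1e6), round(unexplained / 1e6))  # 144 256 (in $k^2)
print(round(explained / sigma_y**2, 6))                  # 0.36 = rho^2
```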
- Implicit linearity of $\rho$
  - Fundamentally, what does $\rho$ measure? $X^2$ is perfectly predictable from $X$, but linear regression produces $\rho^2 < 1$
    - Thus $\mathrm{corr}(X, X^2) \neq 1$: $\rho$ fundamentally measures the linear relationship (see homework)