L16 Normal, Chi Square, t Distributions (WMS 4.5-4.6)
- Standardization (for later reference)
  - If $E(X) = \mu$ and $V(X) = \sigma^{2}$ then you can always change units to create a new random variable $Z = \frac{X - \mu}{\sigma}$ such that $E(Z) = 0$ and $V(Z) = 1$
    - $E(Z) = E\lbrack\frac{1}{\sigma}(X - \mu)\rbrack = \frac{1}{\sigma}\lbrack E(X) - \mu\rbrack = 0$
    - $V(Z) = V\lbrack\frac{1}{\sigma}X - \frac{1}{\sigma}\mu\rbrack = V(\frac{1}{\sigma}X) = \frac{1}{\sigma^{2}}V(X) = 1$
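The two standardization results can be checked numerically. The sketch below is only an illustration: the list `scores` is made-up data treated as an equally likely population, and `standardize` is a hypothetical helper name, not something from the course materials.

```python
from statistics import fmean, pvariance


def standardize(values):
    """Return (x - mu) / sigma for each x, using the population mean and sd."""
    mu = fmean(values)
    sigma = pvariance(values, mu) ** 0.5
    return [(x - mu) / sigma for x in values]


# Hypothetical data, chosen only for illustration.
scores = [62.0, 70.0, 75.0, 81.0, 87.0]

z = standardize(scores)
z_mean = fmean(z)      # should be 0 up to rounding error
z_var = pvariance(z)   # should be 1 up to rounding error

print(f"mean of Z = {z_mean:.6f}, variance of Z = {z_var:.6f}")
```

Any list with at least two distinct values should give the same mean and variance, since the algebra does not depend on the distribution of $X$.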
- Normal (or Gaussian, after German mathematician Carl Friedrich Gauss, 1809) $N(\mu,\sigma^{2})$
  - $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x - \mu}{\sigma})^{2}}$ (to show that it integrates to 1, use polar coordinates or trig substitutions)
  - $E(X) = \int_{-\infty}^{\infty}x\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x - \mu}{\sigma})^{2}}dx = \ldots = \mu$ (u substitution)
  - $V(X) = \int_{-\infty}^{\infty}x^{2}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x - \mu}{\sigma})^{2}}dx - \mu^{2} = \ldots = \sigma^{2}$ (integration by parts)
  - No analytical cdf; instead, approximate numerically
    - Excel: NORM.DIST(x, mu, sd, cdf?)
    - Percentiles: NORM.INV(percentile, mu, sd)
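For readers without Excel, Python's standard library offers rough equivalents through `statistics.NormalDist` (Python 3.8 or later). This is a sketch under assumed numbers: the distribution $N(100, 15^{2})$ and the cutoff 115 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical distribution: mean 100, standard deviation 15.
# Note that NormalDist takes the standard deviation, not the variance.
X = NormalDist(mu=100, sigma=15)

p_below_115 = X.cdf(115)      # analogous to NORM.DIST(115, 100, 15, TRUE)
density_at_115 = X.pdf(115)   # analogous to NORM.DIST(115, 100, 15, FALSE)
x_90 = X.inv_cdf(0.90)        # analogous to NORM.INV(0.90, 100, 15)

print(f"P(X < 115) = {p_below_115:.4f}")
print(f"f(115) = {density_at_115:.5f}")
print(f"90th percentile = {x_90:.2f}")
```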
- Special Properties
  i. $N + 7\sim N$
  In other words, adding a constant changes the precise distribution of $X$ but keeps it in the normal family
    - Note: this is true of some other families of random variables (e.g. uniform) but not all (e.g. Bernoulli, binomial, exponential)
  ii. $3N\sim N$
  In other words, multiplying by a constant keeps $X$ in the normal family
    - Note: this is true of some other families of random variables (e.g. uniform, exponential) but not all (e.g. Bernoulli, binomial)
  iii. $N + N\sim N$
  That is, if $X\sim N(\mu_{x},\sigma_{x}^{2})$ and $Y\sim N(\mu_{y},\sigma_{y}^{2})$ then $X + Y\sim N(\mu_{x} + \mu_{y},\sigma_{x}^{2} + \sigma_{y}^{2} + 2\sigma_{xy})$. In other words, the sum of two normally distributed random variables is a normally distributed random variable
    - Note: this is true of some other families of random variables (e.g. independent binomials with the same success probability), but not all (e.g. Bernoulli, correlated binomials, uniform, exponential)
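The three properties can be illustrated with `statistics.NormalDist`, which supports adding or multiplying by a constant and adding two normal variables. This is a sketch with hypothetical parameters; note that the library assumes the two variables are independent, so it covers only the case $\sigma_{xy} = 0$.

```python
from statistics import NormalDist

# Hypothetical distributions: X ~ N(10, 3^2) and Y ~ N(4, 4^2).
X = NormalDist(mu=10, sigma=3)
Y = NormalDist(mu=4, sigma=4)

shifted = X + 7   # property i: mean shifts by 7, sd unchanged
scaled = X * 3    # property ii: mean and sd are both multiplied by 3
total = X + Y     # property iii: assumes X and Y are independent

for name, dist in [("X + 7", shifted), ("3X", scaled), ("X + Y", total)]:
    print(f"{name}: mean = {dist.mean}, variance = {dist.variance}")
```

With independence, the variance of the sum is $3^{2} + 4^{2} = 25$, matching the formula in iii with $\sigma_{xy} = 0$.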
- Standard normal $N(0,1)$
  - Practice reading Table A
    - Excel: NORM.S.DIST(x, cdf?) or NORM.S.INV(percentile)
  - $P(-1 < X < 1)\approx .68$, $P(-2 < X < 2)\approx .95$, $P(-3 < X < 3)\approx .997$
  - Symmetric: $P(X < -3) = P(X > 3)$
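The three approximate probabilities and the symmetry property can be reproduced from the standard normal cdf. A minimal sketch using `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # defaults to mean 0, standard deviation 1

# P(-k < Z < k) for k = 1, 2, 3
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

# Symmetry: the left tail below -3 equals the right tail above 3.
left_tail = Z.cdf(-3)
right_tail = 1 - Z.cdf(3)

for k, prob in coverage.items():
    print(f"P(-{k} < Z < {k}) = {prob:.4f}")
print(f"P(Z < -3) = {left_tail:.5f}, P(Z > 3) = {right_tail:.5f}")
```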
  - Standardized normal: $Z = \frac{X - \mu}{\sigma}$ is standard normal, $Z\sim N(0,1)$ (because of the special properties of normal $X$)
    - Example 1: $X\sim N(75,25)$; to find $P(X > 80)$, standardize: $P(X > 80) = P(Z > \frac{80 - 75}{\sqrt{25}}) = P(Z > 1) = 1 - P(Z\leq 1)\approx 1 - .8413 = .1587$
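Example 1 can be checked both ways: by standardizing first and by asking for the probability directly. A small sketch using `statistics.NormalDist`, which takes the standard deviation $\sqrt{25} = 5$ rather than the variance.

```python
from statistics import NormalDist

mu, sigma = 75, 5          # X ~ N(75, 25), so the sd is sqrt(25) = 5
z = (80 - mu) / sigma      # standardized cutoff

p_via_z = 1 - NormalDist().cdf(z)               # P(Z > 1)
p_direct = 1 - NormalDist(mu, sigma).cdf(80)    # P(X > 80) without standardizing

print(f"z = {z}, P(X > 80) = {p_via_z:.4f}")
```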
    - Example 2: costs $C\sim N(120,100)$
      - Budget $b$ so that $P(C < b) = .9$: $.90 = P(C < b) = P(Z < \frac{b - 120}{10})\approx P(Z < 1.28)$ (from Table A)
      - If $\frac{b - 120}{10}\approx 1.28$ then $b\approx 132.8$
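The budget in Example 2 is a percentile, so it can be computed with an inverse cdf instead of Table A. A minimal sketch using `statistics.NormalDist`; the result differs slightly from the table answer because the table rounds $z$ to 1.28.

```python
from statistics import NormalDist

C = NormalDist(mu=120, sigma=10)   # costs: variance 100, so sd = 10

z_90 = NormalDist().inv_cdf(0.90)  # standard normal 90th percentile
b_table_style = 120 + 10 * z_90    # undo the standardization by hand
b_direct = C.inv_cdf(0.90)         # same percentile without standardizing

print(f"z = {z_90:.4f}, budget b = {b_direct:.2f}")
```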
    - Example 3: costs $C\sim N(120,100)$ and revenue $R\sim N(150,400)$ are independent; how often are profits $Y = R - C$ positive?
      - $Y\sim N$
      - $E(Y) = E(R) - E(C) = 150 - 120 = 30$
      - $V(Y) = V(R - C) = V(R) + (-1)^{2}V(C) - 2Cov(R,C) = 400 + 100 - 0 = 500$ (independence gives $Cov(R,C) = 0$)
      - So $Y\sim N(30,500)$ and $P(Y > 0) = P(Z > \frac{0 - 30}{\sqrt{500}})\approx P(Z > -1.34) = P(Z < 1.34)\approx .9099$
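Example 3 can be reproduced by subtracting two `statistics.NormalDist` objects, which the library treats as independent. This is a sketch rather than the course's method; the exact probability comes out slightly above the table value because the table rounds $z$ to 1.34.

```python
from statistics import NormalDist

R = NormalDist(mu=150, sigma=20)   # revenue: variance 400, so sd = 20
C = NormalDist(mu=120, sigma=10)   # costs: variance 100, so sd = 10

Y = R - C                          # profit; subtraction assumes independence

p_positive = 1 - Y.cdf(0)          # P(Y > 0)

print(f"E(Y) = {Y.mean}, V(Y) = {Y.variance:.1f}")
print(f"P(Y > 0) = {p_positive:.4f}")
```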
- Chi-square $W\sim\chi^{2}(\nu)$ (German statistician Friedrich Robert Helmert, 1875)
  - Domain is $\lbrack 0,\infty)$, roughly bell-shaped (but asymmetric, unlike Normal distribution)
  - $\nu$ is often called "degrees of freedom", because in the most common application, it corresponds to how many
  - $E(W) = \nu$ and $V(W) = 2\nu$
  - $f(w) =$ ugly (I won't expect you to know or use)
  - CDF $F(w)$ approximated on Table 6
    - $\chi_{\alpha}^{2}$ represents a realization of the random variable, where $\alpha$ is the probability to the right of that value (i.e., $1 - F(w)$)
    - Example: suppose sales follow Chi-square distribution, with average of 30 units
      - Degrees of freedom $\nu = 30$ (because $E(W) = \nu$)
      - 10^th^ percentile is $\chi_{.90}^{2}\approx 20.6$, 90^th^ percentile is $\chi_{.10}^{2}\approx 40.3$
      - Putting these together, $P(20.6 < W < 40.3)\approx .8$
    - Note: Table 6 only gives 10 points on the cdf. With a computer, you can get the rest. Excel: CHISQ.DIST(x, df, cdf?), CHISQ.INV(percentile, df)
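The sales example can be checked without a table. Python's standard library has no chi-square cdf, so the sketch below uses a closed form that holds only when the degrees of freedom are even: $F(w) = 1 - e^{-w/2}\sum_{j=0}^{\nu/2 - 1}\frac{(w/2)^{j}}{j!}$. The function name `chi2_cdf_even` is hypothetical, and this formula is a substitute for Table 6, not the method used in the notes.

```python
from math import exp


def chi2_cdf_even(w, df):
    """Chi-square cdf for an even number of degrees of freedom (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if w <= 0:
        return 0.0
    half = w / 2
    term = 1.0      # (w/2)^j / j!, starting at j = 0
    total = 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return 1.0 - exp(-half) * total


p_low = chi2_cdf_even(20.6, 30)    # should be near .10
p_high = chi2_cdf_even(40.3, 30)   # should be near .90
p_between = p_high - p_low         # should be near .80

print(f"F(20.6) = {p_low:.4f}, F(40.3) = {p_high:.4f}, between = {p_between:.4f}")
```

For odd degrees of freedom, a statistics package is needed instead; the Excel functions listed above cover both cases.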
  - Facts
    - If $Z\sim N(0,1)$ then $Z^{2}\sim\chi^{2}(1)$
    - If $W_{1}\sim\chi^{2}(4)$ and $W_{2}\sim\chi^{2}(7)$ are independent then $W_{1} + W_{2}\sim\chi^{2}(11)$
    - Variance is a quadratic function of a random variable, so when we estimate the variance of a random variable that has a normal distribution (in lecture L19), our estimates will follow a $\chi^{2}$ distribution (after rescaling).
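The first fact gives a way to compute $\chi^{2}(1)$ probabilities from the normal cdf, since $P(Z^{2}\leq w) = P(-\sqrt{w}\leq Z\leq\sqrt{w})$. The sketch below checks this at $w = 3.841$, which is the standard table value for the 95^th^ percentile of $\chi^{2}(1)$; that number and the helper name `chi2_1_cdf` are not from these notes.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def chi2_1_cdf(w):
    """P(Z^2 <= w), computed from the standard normal cdf."""
    if w <= 0:
        return 0.0
    root = sqrt(w)
    return Z.cdf(root) - Z.cdf(-root)


p = chi2_1_cdf(3.841)   # expected to be close to .95

print(f"P(Z^2 <= 3.841) = {p:.4f}")
```

The check works because $\sqrt{3.841}\approx 1.96$, the familiar cutoff that leaves 2.5% in each tail of the standard normal.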
- $t$ distribution (Friedrich Robert Helmert 1876, Karl Pearson 1900)
  - $T\sim t(\nu)$; as in Chi-square distribution, $\nu$ is called "degrees of freedom"
  - Similar to standard normal, but with higher variance (i.e. thicker tails)
  - Approaches $N(0,1)$ as $\nu\rightarrow\infty$
  - $f(t) =$ ugly (I won't expect you to know or use)
  - $E(T) = 0$, $V(T) = \frac{\nu}{\nu - 2}\rightarrow 1$ as $\nu\rightarrow\infty$ (the variance formula requires $\nu > 2$)
  - CDF $F(t)$ approximated on Table C
    - Table is oriented so that probability $C$ lies between $-t^{*}$ and $t^{*}$.
    - Example: if $T\sim t(20)$ find 90^th^ percentile
      - Following $C = 80\%$ (fifth column) for $df = 20$ leads to $t^{*} = 1.325$.
      - In other words, $10\%$ of the distribution is left of $-1.325$, $80\%$ is between $-1.325$ and $1.325$, and $10\%$ is above $1.325$.
      - Since $10\% + 80\% = 90\%$ of the distribution is below $1.325$ and $10\%$ is above, $1.325$ is the 90^th^ percentile of the distribution.
      - Alternatively, can come up from a one-sided p-value of $.10$ or a two-sided p-value of $.20$ (bottom of the table) to reach the same conclusion.
    - For degrees of freedom greater than $1000$, can read $z^{*}$ row of the table, which corresponds to a standard normal distribution (i.e., $\infty$ degrees of freedom).
    - Note: Table C only gives 12 points on CDF. With a computer, you can get the rest. Excel: T.DIST(x, df, cdf?) and T.INV(percentile, df)
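The $t(20)$ example can be verified numerically. Python's standard library has no $t$ cdf, so the sketch below integrates the standard $t$ density, $f(t) = \frac{\Gamma(\frac{\nu + 1}{2})}{\sqrt{\nu\pi}\,\Gamma(\frac{\nu}{2})}(1 + \frac{t^{2}}{\nu})^{-\frac{\nu + 1}{2}}$, with Simpson's rule. The density formula is not given in these notes, and the names `t_pdf` and `t_cdf` are hypothetical; this approximates what Table C or T.DIST provides and is not a replacement for a statistics package.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=1000):
    """Approximate cdf: 0.5 plus or minus the area on [0, |t|] (Simpson's rule).

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


p_below = t_cdf(1.325, 20)               # should be near .90
p_middle = p_below - t_cdf(-1.325, 20)   # should be near .80, the table's C

print(f"F(1.325) = {p_below:.4f}, P(-1.325 < T < 1.325) = {p_middle:.4f}")
```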
  - Fact
    - If $Z\sim N(0,1)$ and $W\sim\chi^{2}(\nu)$ are independent then $\frac{Z}{\sqrt{W/\nu}}\sim t(\nu)$
    - If we knew the population variance, then estimates of the mean would follow a normal distribution. Since we have to estimate the population variance, and those estimates follow a $\chi^{2}$ distribution, our estimates of the mean follow a $t$ distribution
- Other distributions
  - The distributions we've gone over are some of the most common; there are many others, with various shapes, properties, and uses.
  - Illustrated: https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
  - Discrete
    - Uniform
    - Binomial
    - Geometric
    - Poisson
    - Hypergeometric
  - Continuous
    - Exponential
    - F
    - Beta
    - Gamma
    - Log-normal
    - Pareto
    - Weibull