
L16 Normal, Chi Square, t Distributions (WMS 4.5-4.6)

  1. Standardization (for later reference)

    • If E(X) =\mu and V(X) =\sigma^{2} then you can always change units to create a new random variable Z =\frac{X -\mu}{\sigma} such that E(Z) = 0 and V(Z) = 1
      1. E(Z) = E\lbrack\frac{1}{\sigma}(X -\mu)\rbrack =\frac{1}{\sigma}\lbrack E(X) -\mu\rbrack = 0
      2. V(Z) = V\lbrack\frac{1}{\sigma}X -\frac{1}{\sigma}\mu\rbrack = V(\frac{1}{\sigma}X) =\frac{1}{\sigma^{2}}V(X) = 1
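
A quick numerical check of these two identities, as a minimal Python sketch (Python/numpy are my addition, not part of the notes; the exponential sample is an arbitrary choice, and any distribution with known mean and variance would work):

```python
import numpy as np

rng = np.random.default_rng(0)

# An exponential with scale 4 has E(X) = 4 and V(X) = 16, so mu = 4 and sigma = 4
mu, sigma = 4.0, 4.0
x = rng.exponential(scale=4.0, size=1_000_000)

z = (x - mu) / sigma  # standardize: Z = (X - mu) / sigma

print(round(z.mean(), 3))  # close to 0
print(round(z.var(), 3))   # close to 1
```
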
  2. Normal (or Gaussian, after German mathematician Carl Friedrich Gauss, 1809) N(\mu,\sigma^{2})

    • f(x) =\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}} (integrate using polar coordinates or trig substitutions)

    • E(X) =\int_{-\infty}^{\infty}x\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}}dx =\ldots =\mu (u substitution)

    • V(X) =\int_{-\infty}^{\infty}x^{2}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}}dx -\mu^{2} =\ldots =\sigma^{2} (integration by parts)

    • No analytical cdf; instead, approximate numerically

      1. Excel: NORM.DIST(x, mu, sd, cdf?)
      2. Percentiles: NORM.INV(percentile, mu, sd)
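
A rough Python equivalent of these Excel calls, using scipy.stats (my addition; the parameters 75 and 5 are only for illustration):

```python
from scipy.stats import norm

mu, sigma = 75, 5  # illustrative mean and standard deviation

# NORM.DIST(x, mu, sd, TRUE) ~ cdf; NORM.DIST(x, mu, sd, FALSE) ~ pdf
print(norm.cdf(80, loc=mu, scale=sigma))    # P(X <= 80) ~ .8413
print(norm.pdf(80, loc=mu, scale=sigma))    # density at x = 80

# NORM.INV(percentile, mu, sd) ~ inverse cdf
print(norm.ppf(0.90, loc=mu, scale=sigma))  # 90th percentile ~ 81.4
```
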
    • Special Properties

      1. $N + 7\sim N$ In other words, adding a constant changes the precise distribution of X but keeps it in the normal family
        1. Note: this is true of some other families of random variables (e.g. uniform) but not all (e.g. Bernoulli, binomial, exponential)
      2. $3N\sim N$ In other words, multiplying by a constant keeps X in the normal family
        1. Note: this is true of some other families of random variables (e.g. uniform, exponential) but not all (e.g. Bernoulli, binomial)
      3. $N + N\sim N$ That is, if X\sim N(\mu_{x},\sigma_{x}^{2}) and Y\sim N(\mu_{y},\sigma_{y}^{2}) then X + Y\sim N(\mu_{x} +\mu_{y},\sigma_{x}^{2} +\sigma_{y}^{2} + 2\sigma_{\text{xy}}) In other words, the sum of two normally distributed random variables is a normally distributed random variable (see the sketch after this list)
        1. Note: this is true of some other families of random variables (e.g. independent binomials), but not all (e.g. Bernoulli, correlated binomials, uniform, exponential)
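
A minimal simulation sketch of the third property (my addition; the parameters and sample size are arbitrary, and the two normals are drawn independently so \sigma_{\text{xy}} = 0):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(2.0, 3.0, n)   # X ~ N(2, 9)
y = rng.normal(-1.0, 4.0, n)  # Y ~ N(-1, 16), independent of X
s = x + y                     # claim: X + Y ~ N(1, 25)

claimed = norm(loc=1.0, scale=5.0)
for q in (-5.0, 1.0, 8.0):
    # empirical cdf of the simulated sum vs. the claimed normal cdf
    print(q, (s < q).mean(), claimed.cdf(q))
```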

  3. Standard normal N(0,1)

    • Practice reading Table A

      1. Excel: NORM.S.DIST(x, cdf?) or NORM.S.INV(percentile)
      2. P(- 1 < X < 1)\approx .68
      3. P(- 2 < X < 2)\approx .95
      4. P(- 3 < X < 3)\approx .997
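
These three Table A values are easy to reproduce with scipy.stats (my addition):

```python
from scipy.stats import norm

for k in (1, 2, 3):
    # P(-k < Z < k) for the standard normal
    print(k, norm.cdf(k) - norm.cdf(-k))  # ~ .6827, .9545, .9973
```
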
    • Symmetric: P(X < - 3) = P(X > 3)

    • Standardized normal Z =\frac{X -\mu}{\sigma} is standard normal \sim N(0,1) (because of special properties of normal X)

    • Example 1: X\sim N(75,25); find $P(X > 80) = P(Z >\frac{80 - 75}{5})$ = P(Z > 1) = 1 - P(Z\leq 1) \approx 1 - .8413 = .1587
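
The same calculation in Python (my addition), without standardizing by hand:

```python
from scipy.stats import norm

# X ~ N(75, 25), so sigma = sqrt(25) = 5
print(norm.sf(80, loc=75, scale=5))  # P(X > 80) ~ .1587
print(1 - norm.cdf(1))               # same answer after standardizing: P(Z > 1)
```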

    • Example 2: costs C\sim N(120,100)

      1. Budget b so that P(C < b) = .9
      2. .90 = P(C < b) = P(Z <\frac{b - 120}{10})\approx P(Z < 1.28) (from Table A)
      3. If \frac{b - 120}{10}\approx 1.28 then b\approx 132.8
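
The same budget calculation with scipy.stats instead of Table A (my addition):

```python
from scipy.stats import norm

# C ~ N(120, 100), so sigma = 10; find b with P(C < b) = .90
print(norm.ppf(0.90, loc=120, scale=10))  # ~ 132.8
print(120 + 10 * norm.ppf(0.90))          # same answer via the z scale, z_{.90} ~ 1.2816
```
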
    • Example 3: costs C\sim N(120,100) and revenue R\sim N(150,400) are independent; how often are profits Y = R - C positive?

      1. Y\sim N
      2. E(Y) = E(R) - E(C) = 150 - 120 = 30
      3. V(Y) = V(R - C) = V(R) +(- 1)^{2}V(C) - 2Cov(R,C) = 400 + 100 - 0 = 500
      4. So Y\sim N(30,500)
      5. P(Y > 0) = P(Z >\frac{0 - 30}{\sqrt{500}})\approx P(Z > - 1.34) = P(Z < 1.34)\approx .9099
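
A short Python check of the profit calculation (my addition); the exact answer is about .910, a touch above the table value because Table A rounds z to -1.34:

```python
import math
from scipy.stats import norm

# R ~ N(150, 400) and C ~ N(120, 100) independent, so Y = R - C ~ N(30, 500)
mu_y = 150 - 120
var_y = 400 + 100
print(norm.sf(0, loc=mu_y, scale=math.sqrt(var_y)))  # P(Y > 0) ~ .910
```
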
  4. Chi-square distribution W\sim\chi^{2}(\nu) (German statistician Friedrich Robert Helmert, 1875)

    • Domain is \lbrack 0,\infty), roughly bell-shaped (but asymmetric, unlike Normal distribution)

    • \nu is often called "degrees of freedom" because, in the most common application, it corresponds to how many independent squared standard normal terms are being summed (see the Facts below)

    • E(W) =\nu and V(W) = 2\nu

    • f(w) = ugly (I won't expect you to know or use)

    • CDF F(w) approximated on Table 6

      1. \chi_{\alpha}^{2} represents a realization of the random variable, where \alpha is the probability to the right of that value (i.e., 1 - F(w))
      2. Example: suppose sales follow Chi-square distribution, with average of 30 units
        1. Degrees of freedom \nu = 30

        2. 10^th^ percentile is \chi_{.90}^{2}\approx 20.6, 90^th^ percentile is \chi_{.10}^{2}\approx 40.3

        3. Putting these together, P(20.6 < W < 40.3)\approx .8

      3. Note: Table 6 only gives 10 points on the cdf. With a computer, you can get the rest. Excel: CHISQ.DIST(x,df, cdf?), CHISQ.INV(percentile, df)
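
A rough Python equivalent of these lookups, applied to the sales example (my addition):

```python
from scipy.stats import chi2

df = 30  # sales example: nu = 30

# CHISQ.INV(percentile, df) ~ inverse cdf; CHISQ.DIST(x, df, TRUE) ~ cdf
print(chi2.ppf(0.10, df))  # 10th percentile ~ 20.6 (i.e., chi^2_{.90})
print(chi2.ppf(0.90, df))  # 90th percentile ~ 40.3 (i.e., chi^2_{.10})
print(chi2.cdf(40.3, df) - chi2.cdf(20.6, df))  # P(20.6 < W < 40.3) ~ .8
```
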
    • Facts

      1. If Z\sim N(0,1) then Z^{2}\sim\chi^{2}(1)
      2. If W_{1}\sim\chi^{2}(4) and W_{2}\sim\chi^{2}(7) independent then W_{1} + W_{2}\sim\chi^{2}(11)
      3. Variance is a quadratic function of a random variable (a sum of squared deviations), so when we estimate the variance of a random variable that has a normal distribution (in lecture L19), our estimate, suitably scaled, will follow a \chi^{2} distribution.
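
The first two facts can be checked by simulation; a minimal sketch (my addition; the sample size and quantiles are arbitrary choices):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 1_000_000
qs = [0.25, 0.50, 0.75, 0.95]

# Fact 1: if Z ~ N(0,1) then Z^2 ~ chi-square(1)
z2 = rng.standard_normal(n) ** 2
print(np.quantile(z2, qs))  # empirical quantiles of Z^2
print(chi2.ppf(qs, df=1))   # chi-square(1) quantiles -- should match closely

# Fact 2: independent chi-square(4) + chi-square(7) ~ chi-square(11)
w = rng.chisquare(4, n) + rng.chisquare(7, n)
print(np.quantile(w, qs))   # empirical quantiles of the sum
print(chi2.ppf(qs, df=11))  # chi-square(11) quantiles -- should match closely
```
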
  5. t distribution (Friedrich Robert Helmert 1876, Karl Pearson 1900)

    • T\sim t(\nu); as in Chi-square distribution, \nu is called "degrees of freedom"

    • Similar to standard normal, but with higher variance (i.e. thicker tails)

    • Approaches N(0,1) as \nu\rightarrow\infty

    • f(t) = ugly (I won't expect you to know or use)

    • E(T) = 0, V(T) =\frac{\nu}{\nu - 2}\rightarrow 1 as \nu\rightarrow\infty

    • CDF F(t) approximated on Table C

      1. Table is oriented so that probability C lies between - t^{*} and t^{*}.
      2. Example: if T\sim t(20) find 90^th^ percentile
        1. Following C = 80\% (fifth column) for df = 20 leads to t^{*} = 1.325.

        2. In other words, 10\% of the distribution is left of - 1.325, 80\% is between - 1.325 and 1.325, and 10\% is above 1.325.

        3. Since 10\% + 80\% = 90\% of the distribution is below 1.325 and 10\% is above, 1.325 is the 90^th^ percentile of the distribution.

        4. Alternatively, can come up from a one-sided p-value of .10 or a two-sided p-value of .20 (bottom of the table) to reach the same conclusion.

      3. For degrees of freedom greater than 1000, can read z^{*} row of the table, which corresponds to a standard normal distribution (i.e., \infty degrees of freedom).
      4. Note: Table C only gives 12 points on CDF. With a computer, you can get the rest. Excel: T.DIST(x, df, cdf?) and T.INV(percentile, df)
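
A rough Python equivalent of these lookups, applied to the t(20) example (my addition):

```python
from scipy.stats import norm, t

# T.INV(percentile, df) ~ inverse cdf; T.DIST(x, df, TRUE) ~ cdf
print(t.ppf(0.90, df=20))   # 90th percentile of t(20) ~ 1.325 (Table C's t* for C = 80%)
print(t.cdf(1.325, df=20))  # ~ .90

# the t distribution approaches the standard normal as df grows
print(t.ppf(0.90, df=1000), norm.ppf(0.90))  # both ~ 1.28
```
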
    • Fact

      1. If Z\sim N(0,1) and W\sim\chi^{2}(\nu) independent then \frac{Z}{\sqrt{W/\nu}}\sim t(\nu)
      2. If we knew the population variance, the standardized estimate of the mean would follow a normal distribution. Since we have to estimate the population variance, and that estimate follows a \chi^{2} distribution, the estimate of the mean standardized by its estimated variance follows a t distribution
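
The first fact can be checked by simulation; a minimal sketch (my addition; nu = 5 and the quantiles are arbitrary choices):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
n, nu = 1_000_000, 5

# build T = Z / sqrt(W / nu) from independent Z ~ N(0,1) and W ~ chi-square(nu)
z = rng.standard_normal(n)
w = rng.chisquare(nu, n)
t_sim = z / np.sqrt(w / nu)

qs = [0.10, 0.50, 0.90, 0.975]
print(np.quantile(t_sim, qs))  # empirical quantiles of the simulated ratio
print(t.ppf(qs, df=nu))        # t(5) quantiles -- should match closely
```
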
  6. Other distributions

    • The distributions we've gone over are some of the most common; there are many others, with various shapes, properties, and uses.

    • Illustrated: https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm

    • Discrete

      1. Uniform
      2. Binomial
      3. Geometric
      4. Poisson
      5. Hypergeometric
    • Continuous

      1. Exponential
      2. F
      3. Beta
      4. Gamma
      5. Log-normal
      6. Pareto
      7. Weibull