
L16 Normal, Chi Square, t Distributions (WMS 4.5-4.6)

  1. Standardization (for later reference)

    • If E(X) =\mu and V(X) =\sigma^{2} then you can always change units to create a new random variable Z =\frac{X -\mu}{\sigma} such that E(Z) = 0 and V(Z) = 1
      1. E(Z) = E\lbrack\frac{1}{\sigma}(X -\mu)\rbrack =\frac{1}{\sigma}\lbrack E(X) -\mu\rbrack = 0
      2. V(Z) = V\lbrack\frac{1}{\sigma}X -\frac{1}{\sigma}\mu\rbrack = V(\frac{1}{\sigma}X) =\frac{1}{\sigma^{2}}V(X) = 1
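
A quick numerical check of these two identities, as a minimal Python sketch (Python/numpy are my addition, not part of the notes; the exponential sample is an arbitrary choice, and any distribution with known mean and variance would work):

```python
import numpy as np

rng = np.random.default_rng(0)

# An exponential with scale 4 has E(X) = 4 and V(X) = 16, so mu = 4 and sigma = 4
mu, sigma = 4.0, 4.0
x = rng.exponential(scale=4.0, size=1_000_000)

z = (x - mu) / sigma  # standardize: Z = (X - mu) / sigma

print(round(z.mean(), 3))  # close to 0
print(round(z.var(), 3))   # close to 1
```
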
  2. Normal (or Gaussian, after German mathematician Carl Friedrich Gauss, 1809) N(\mu,\sigma^{2})

    • f(x) =\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}} (integrate using polar coordinates or trig substitutions)

    • E(X) =\int_{-\infty}^{\infty}x\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}}dx =\ldots =\mu (u substitution)

    • V(X) =\int_{-\infty}^{\infty}x^{2}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}}dx -\mu^{2} =\ldots =\sigma^{2} (integration by parts)

    • No analytical cdf; instead, approximate numerically

      1. Excel: NORM.DIST(x, mu, sd, cdf?)
      2. Percentiles: NORM.INV(percentile, mu, sd)
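
A rough Python equivalent of these Excel calls, using scipy.stats (my addition; the parameters 75 and 5 are only for illustration):

```python
from scipy.stats import norm

mu, sigma = 75, 5  # illustrative mean and standard deviation

# NORM.DIST(x, mu, sd, TRUE) ~ cdf; NORM.DIST(x, mu, sd, FALSE) ~ pdf
print(norm.cdf(80, loc=mu, scale=sigma))    # P(X <= 80) ~ .8413
print(norm.pdf(80, loc=mu, scale=sigma))    # density at x = 80

# NORM.INV(percentile, mu, sd) ~ inverse cdf
print(norm.ppf(0.90, loc=mu, scale=sigma))  # 90th percentile ~ 81.4
```
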
    • Special Properties

      1. $N + 7\sim N$ In other words, adding a constant changes the precise distribution of X but keeps it in the normal family
        1. Note: this is true of some other families of random variables (e.g. uniform) but not all (e.g. Bernoulli, binomial, exponential)
      2. $3N\sim N$ In other words, multiplying by a constant keeps X in the normal family
        1. Note: this is true of some other families of random variables (e.g. uniform, exponential) but not all (e.g. Bernoulli, binomial)
      3. $N + N\sim N$ That is, if X\sim N(\mu_{x},\sigma_{x}^{2}) and Y\sim N(\mu_{y},\sigma_{y}^{2}) then X + Y\sim N(\mu_{x} +\mu_{y},\sigma_{x}^{2} +\sigma_{y}^{2} + 2\sigma_{\text{xy}}) In other words, the sum of two normally distributed random variables is a normally distributed random variable (see the sketch after this list)
        1. Note: this is true of some other families of random variables (e.g. independent binomials), but not all (e.g. Bernoulli, correlated binomials, uniform, exponential)
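
A minimal simulation sketch of the third property (my addition; the parameters and sample size are arbitrary, and the two normals are drawn independently so \sigma_{\text{xy}} = 0):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.normal(2.0, 3.0, n)   # X ~ N(2, 9)
y = rng.normal(-1.0, 4.0, n)  # Y ~ N(-1, 16), independent of X
s = x + y                     # claim: X + Y ~ N(1, 25)

claimed = norm(loc=1.0, scale=5.0)
for q in (-5.0, 1.0, 8.0):
    # empirical cdf of the simulated sum vs. the claimed normal cdf
    print(q, (s < q).mean(), claimed.cdf(q))
```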

  3. Standard normal N(0,1)

    • Practice reading Table A

      1. Excel: NORM.S.DIST(x, cdf?) or NORM.S.INV(percentile)
      2. P(- 1 < X < 1)\approx .68
      3. P(- 2 < X < 2)\approx .95
      4. P(- 3 < X < 3)\approx .997
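
These three Table A values are easy to reproduce with scipy.stats (my addition):

```python
from scipy.stats import norm

for k in (1, 2, 3):
    # P(-k < Z < k) for the standard normal
    print(k, norm.cdf(k) - norm.cdf(-k))  # ~ .6827, .9545, .9973
```
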
    • Symmetric: P(X < - 3) = P(X > 3)

    • Standardized normal Z =\frac{X -\mu}{\sigma} is standard normal \sim N(0,1) (because of special properties of normal X)

    • Example 1: X\sim N(75,25); find $P(X > 80) = P(Z >\frac{80 - 75}{5})$ = P(Z > 1) = 1 - P(Z\leq 1) \approx 1 - .8413 = .1587
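
The same calculation in Python (my addition), without standardizing by hand:

```python
from scipy.stats import norm

# X ~ N(75, 25), so sigma = sqrt(25) = 5
print(norm.sf(80, loc=75, scale=5))  # P(X > 80) ~ .1587
print(1 - norm.cdf(1))               # same answer after standardizing: P(Z > 1)
```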

    • Example 2: costs C\sim N(120,100)

      1. Budget b so that P(C < b) = .9
      2. .90 = P(C < b) = P(Z <\frac{b - 120}{10})\approx P(Z < 1.28) (from Table A)
      3. If \frac{b - 120}{10}\approx 1.28 then b\approx 132.8
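
The same budget calculation with scipy.stats instead of Table A (my addition):

```python
from scipy.stats import norm

# C ~ N(120, 100), so sigma = 10; find b with P(C < b) = .90
print(norm.ppf(0.90, loc=120, scale=10))  # ~ 132.8
print(120 + 10 * norm.ppf(0.90))          # same answer via the z scale, z_{.90} ~ 1.2816
```
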
    • Example 3: costs C\sim N(120,100) and revenue R\sim N(150,400) are independent; how often are profits Y = R - C positive?

      1. Y\sim N
      2. E(Y) = E(R) - E(C) = 150 - 120 = 30
      3. V(Y) = V(R - C) = V(R) +(- 1)^{2}V(C) - 2Cov(R,C) = 400 + 100 - 0 = 500
      4. So Y\sim N(30,500)
      5. P(Y > 0) = P(Z >\frac{0 - 30}{\sqrt{500}})\approx P(Z > - 1.34) = P(Z < 1.34)\approx .9099
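
A short Python check of the profit calculation (my addition); the exact answer is about .910, a touch above the table value because Table A rounds z to -1.34:

```python
import math
from scipy.stats import norm

# R ~ N(150, 400) and C ~ N(120, 100) independent, so Y = R - C ~ N(30, 500)
mu_y = 150 - 120
var_y = 400 + 100
print(norm.sf(0, loc=mu_y, scale=math.sqrt(var_y)))  # P(Y > 0) ~ .910
```
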
  4. Chi-square distribution W\sim\chi^{2}(\nu) (German statistician Friedrich Robert Helmert, 1875)

    • Domain is \lbrack 0,\infty), roughly bell-shaped (but asymmetric, unlike Normal distribution)

    • \nu is often called "degrees of freedom" because, in the most common application, it corresponds to how many independent squared standard normal terms are being summed (see the Facts below)

    • E(W) =\nu and V(W) = 2\nu

    • f(w) = ugly (I won't expect you to know or use)

    • CDF F(w) approximated on Table 6

      1. \chi_{\alpha}^{2} represents a realization of the random variable, where \alpha is the probability to the right of that value (i.e., 1 - F(w))
      2. Example: suppose sales follow Chi-square distribution, with average of 30 units
        1. Degrees of freedom \nu = 30

        2. 10^th^ percentile is \chi_{.90}^{2}\approx 20.6, 90^th^ percentile is \chi_{.10}^{2}\approx 40.3

        3. Putting these together, P(20.6 < W < 40.3)\approx .8

      3. Note: Table 6 only gives 10 points on the cdf. With a computer, you can get the rest. Excel: CHISQ.DIST(x,df, cdf?), CHISQ.INV(percentile, df)
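
A rough Python equivalent of these lookups, applied to the sales example (my addition):

```python
from scipy.stats import chi2

df = 30  # sales example: nu = 30

# CHISQ.INV(percentile, df) ~ inverse cdf; CHISQ.DIST(x, df, TRUE) ~ cdf
print(chi2.ppf(0.10, df))  # 10th percentile ~ 20.6 (i.e., chi^2_{.90})
print(chi2.ppf(0.90, df))  # 90th percentile ~ 40.3 (i.e., chi^2_{.10})
print(chi2.cdf(40.3, df) - chi2.cdf(20.6, df))  # P(20.6 < W < 40.3) ~ .8
```
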
    • Facts

      1. If Z\sim N(0,1) then Z^{2}\sim\chi^{2}(1)
      2. If W_{1}\sim\chi^{2}(4) and W_{2}\sim\chi^{2}(7) independent then W_{1} + W_{2}\sim\chi^{2}(11)
      3. Variance is a quadratic function of a random variable (a sum of squared deviations), so when we estimate the variance of a random variable that has a normal distribution (in lecture L19), our estimate, suitably scaled, will follow a \chi^{2} distribution.
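
The first two facts can be checked by simulation; a minimal sketch (my addition; the sample size and quantiles are arbitrary choices):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 1_000_000
qs = [0.25, 0.50, 0.75, 0.95]

# Fact 1: if Z ~ N(0,1) then Z^2 ~ chi-square(1)
z2 = rng.standard_normal(n) ** 2
print(np.quantile(z2, qs))  # empirical quantiles of Z^2
print(chi2.ppf(qs, df=1))   # chi-square(1) quantiles -- should match closely

# Fact 2: independent chi-square(4) + chi-square(7) ~ chi-square(11)
w = rng.chisquare(4, n) + rng.chisquare(7, n)
print(np.quantile(w, qs))   # empirical quantiles of the sum
print(chi2.ppf(qs, df=11))  # chi-square(11) quantiles -- should match closely
```
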
  5. t distribution (Friedrich Robert Helmert 1876, Karl Pearson 1900)

    • T\sim t(\nu); as in Chi-square distribution, \nu is called "degrees of freedom"

    • Similar to standard normal, but with higher variance (i.e. thicker tails)

    • Approaches N(0,1) as \nu\rightarrow\infty

    • f(t) = ugly (I won't expect you to know or use)

    • E(T) = 0, V(T) =\frac{\nu}{\nu - 2}\rightarrow 1 as \nu\rightarrow\infty

    • CDF F(t) approximated on Table C

      1. Table is oriented so that probability C lies between - t^{*} and t^{*}.
      2. Example: if T\sim t(20) find 90^th^ percentile
        1. Following C = 80\% (fifth column) for df = 20 leads to t^{*} = 1.325.

        2. In other words, 10\% of the distribution is left of - 1.325, 80\% is between - 1.325 and 1.325, and 10\% is above 1.325.

        3. Since 10\% + 80\% = 90\% of the distribution is below 1.325 and 10\% is above, 1.325 is the 90^th^ percentile of the distribution.

        4. Alternatively, can come up from a one-sided p-value of .10 or a two-sided p-value of .20 (bottom of the table) to reach the same conclusion.

      3. For degrees of freedom greater than 1000, can read z^{*} row of the table, which corresponds to a standard normal distribution (i.e., \infty degrees of freedom).
      4. Note: Table C only gives 12 points on CDF. With a computer, you can get the rest. Excel: T.DIST(x, df, cdf?) and T.INV(percentile, df)
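
A rough Python equivalent of these lookups, applied to the t(20) example (my addition):

```python
from scipy.stats import norm, t

# T.INV(percentile, df) ~ inverse cdf; T.DIST(x, df, TRUE) ~ cdf
print(t.ppf(0.90, df=20))   # 90th percentile of t(20) ~ 1.325 (Table C's t* for C = 80%)
print(t.cdf(1.325, df=20))  # ~ .90

# the t distribution approaches the standard normal as df grows
print(t.ppf(0.90, df=1000), norm.ppf(0.90))  # both ~ 1.28
```
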
    • Fact

      1. If Z\sim N(0,1) and W\sim\chi^{2}(\nu) independent then \frac{Z}{\sqrt{W/\nu}}\sim t(\nu)
      2. If we knew the population variance, the standardized estimate of the mean would follow a normal distribution. Since we have to estimate the population variance, and that estimate follows a \chi^{2} distribution, the estimate of the mean standardized by its estimated variance follows a t distribution
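
The first fact can be checked by simulation; a minimal sketch (my addition; nu = 5 and the quantiles are arbitrary choices):

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
n, nu = 1_000_000, 5

# build T = Z / sqrt(W / nu) from independent Z ~ N(0,1) and W ~ chi-square(nu)
z = rng.standard_normal(n)
w = rng.chisquare(nu, n)
t_sim = z / np.sqrt(w / nu)

qs = [0.10, 0.50, 0.90, 0.975]
print(np.quantile(t_sim, qs))  # empirical quantiles of the simulated ratio
print(t.ppf(qs, df=nu))        # t(5) quantiles -- should match closely
```
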
  6. Other distributions

    • The distributions we've gone over are some of the most common; there are many others, with various shapes, properties, and uses.

    • Illustrated: https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm

    • Discrete

      1. Uniform
      2. Binomial
      3. Geometric
      4. Poisson
      5. Hypergeometric
    • Continuous

      1. Exponential
      2. F
      3. Beta
      4. Gamma
      5. Log-normal
      6. Pareto
      7. Weibull