# L16 Normal, Chi Square, t Distributions (WMS 4.5-4.6)
1. Standardization (for later reference)
- If $E(X) =\mu$ and $V(X) =\sigma^{2}$ then you can always change units to create a new random variable $Z =\frac{X -\mu}{\sigma}$ such that $E(Z) = 0$ and $V(Z) = 1$
1. $E(Z) = E\lbrack\frac{1}{\sigma}(X -\mu)\rbrack =\frac{1}{\sigma}\lbrack E(X) -\mu\rbrack = 0$
2. $V(Z) = V\lbrack\frac{1}{\sigma}X -\frac{1}{\sigma}\mu\rbrack = V(\frac{1}{\sigma}X) =\frac{1}{\sigma^{2}}V(X) = 1$
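As a quick numerical check of the two results above, here is a simulation sketch (not part of the original notes; it assumes NumPy is installed, and borrows $\mu = 75$, $\sigma = 5$ from Example 1 below):

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 75, 5                       # borrowed from Example 1 below
x = rng.normal(mu, sigma, 1_000_000)    # draws of X with E(X) = mu, V(X) = sigma^2
z = (x - mu) / sigma                    # standardized variable Z = (X - mu) / sigma

print(z.mean())   # approximately 0
print(z.var())    # approximately 1
```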
2. Normal (or Gaussian, after German mathematician Carl Friedrich Gauss, 1809) $N(\mu,\sigma^{2})$
- $f(x) =\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}}$ (integrate using polar coordinates or trig substitutions)
- $E(X) =\int_{-\infty}^{\infty} x\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}}dx =\ldots =\mu$ (u-substitution)
- $V(X) =\int_{-\infty}^{\infty} x^{2}\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}(\frac{x -\mu}{\sigma})^{2}}dx -\mu^{2} =\ldots =\sigma^{2}$ (integration by parts)
- No analytical cdf; instead, approximate numerically
1. Excel: NORM.DIST(x, mu, sd, cdf?)
2. Percentiles: NORM.INV(percentile, mu, sd)
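For readers working outside Excel, a rough Python equivalent of the two spreadsheet calls above (a sketch, not part of the original notes; it assumes `scipy` is available). In NORM.DIST, the last argument toggles between the CDF and the density.

```python
from scipy.stats import norm

# NORM.DIST(x, mu, sd, TRUE) -> CDF;  NORM.DIST(x, mu, sd, FALSE) -> pdf
print(norm.cdf(80, loc=75, scale=5))      # P(X <= 80) for X ~ N(75, 25)
print(norm.pdf(80, loc=75, scale=5))      # density at x = 80

# NORM.INV(percentile, mu, sd) -> inverse CDF (percentile)
print(norm.ppf(0.90, loc=120, scale=10))  # 90th percentile of N(120, 100), ~132.8
```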
- Special Properties
1. $N + 7\sim N$
In other words, adding a constant changes the precise distribution of $X$ but keeps it in the normal family
1. Note: this is true of some other families of random variables (e.g. uniform) but not all (e.g. Bernoulli, binomial, exponential)
2. $3N\sim N$
In other words, multiplying by a constant keeps $X$ in the normal family
1. Note: this is true of some other families of random variables (e.g. uniform, exponential) but not all (e.g. Bernoulli, binomial)
3. $N + N\sim N$ (see the simulation sketch after this list)
That is, if $X\sim N(\mu_{x},\sigma_{x}^{2})$ and $Y\sim N(\mu_{y},\sigma_{y}^{2})$ then $X + Y\sim N(\mu_{x} +\mu_{y},\sigma_{x}^{2} +\sigma_{y}^{2} + 2\sigma_{\text{xy}})$ In other words, the sum of two normally distributed random variables is a normally distributed random variable
1. Note: this is true of some other families of random variables (e.g. independent binomials), but not all (e.g. Bernoulli, correlated binomials, uniform, exponential)
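A small simulation sketch of the sum property (assumes NumPy; the parameters are borrowed from Example 3 below): the sum of two independent normals has mean $\mu_{x} +\mu_{y}$ and variance $\sigma_{x}^{2} +\sigma_{y}^{2}$.

```python
import numpy as np

rng = np.random.default_rng(1)

x = rng.normal(120, 10, 1_000_000)   # X ~ N(120, 100)
y = rng.normal(150, 20, 1_000_000)   # Y ~ N(150, 400), independent of X

s = x + y
print(s.mean())   # ~270 = 120 + 150
print(s.var())    # ~500 = 100 + 400 (independent, so the covariance term is 0)
```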
3. Standard normal $N(0,1)$
- Practice reading Table A
1. Excel: NORM.S.DIST(x, cdf?) or NORM.S.INV(percentile)
2. $P(- 1 < X < 1)\approx .68$
3. $P(- 2 < X < 2)\approx .95$
4. $P(- 3 < X < 3)\approx .997$
- Symmetric: $P(X < - 3) = P(X > 3)$
- Standardized normal $Z =\frac{X -\mu}{\sigma}$ is standard normal $\sim N(0,1)$ (because of special properties of normal $X$)
- Example 1: if $X\sim N(75,25)$, find $P(X > 80) = P(Z >\frac{80 - 75}{5})$
$= P(Z > 1) = 1 - P(Z\leq 1)$ $\approx 1 - .8413 = .1587$
- Example 2: costs $C\sim N(120,100)$
1. Budget $b$ so that $P(C < b) = .9$
2. $.90 = P(C < b) = P(Z <\frac{b - 120}{10})\approx P(Z < 1.28)$ (from Table A)
3. If $\frac{b - 120}{10}\approx 1.28$ then $b\approx 132.8$
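Examples 1 and 2 above can be checked without Table A; a sketch using `scipy.stats.norm` (an assumption, any normal CDF routine works):

```python
from scipy.stats import norm

# Example 1: X ~ N(75, 25), so sd = 5
print(1 - norm.cdf(80, loc=75, scale=5))   # P(X > 80) ~ 0.1587

# Example 2: C ~ N(120, 100), so sd = 10; budget at the 90th percentile
print(norm.ppf(0.90, loc=120, scale=10))   # b ~ 132.8
```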
- Example 3: costs $C\sim N(120,100)$ and revenue $R\sim N(150,400)$ are independent; how often are profits $Y = R - C$ positive?
1. $Y\sim N$
2. $E(Y) = E(R) - E(C) = 150 - 120 = 30$
3. $V(Y) = V(R - C) = V(R) +(- 1)^{2}V(C) - 2Cov(R,C) = 400 + 100 - 0 = 500$
4. So $Y\sim N(30,500)$
5. $P(Y > 0) = P(Z >\frac{0 - 30}{\sqrt{500}})\approx P(Z > - 1.34) = P(Z < 1.34)\approx .9099$
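The same check for Example 3, treating $Y = R - C\sim N(30,500)$ directly (sketch, assumes scipy):

```python
from scipy.stats import norm

# Y = R - C ~ N(30, 500), so sd = sqrt(500)
sd_y = 500 ** 0.5
print(1 - norm.cdf(0, loc=30, scale=sd_y))   # P(Y > 0) ~ 0.910
```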
4. $W\sim\chi^{2}(\nu)$ (German statistician Friedrich Robert Helmert, 1875)
- Domain is $\lbrack 0,\infty)$, roughly bell-shaped (but asymmetric, unlike Normal distribution)
- $\nu$ is often called "degrees of freedom", because in the most common application it counts how many independent squared standard normal terms are being summed (see the Facts below)
- $E(W) =\nu$ and $V(W) = 2\nu$
- $f(w) = ugly$ (I won't expect you to know or use)
- CDF $F(w)$ approximated on Table 6
1. $\chi_{\alpha}^{2}$ represents a realization of the random variable, where $\alpha$ is the probability to the right of that value (i.e., $1 - F(w)$)
2. Example: suppose sales follow a Chi-square distribution with an average of 30 units
1. Degrees of freedom $\nu = 30$
2. 10^th^ percentile is $\chi_{.90}^{2}\approx 20.6$, 90^th^ percentile is $\chi_{.10}^{2}\approx 40.3$
3. Putting these together, $P(20.6 < W < 40.3)\approx .8$
3. Note: Table 6 only gives 10 points on the cdf. With a computer, you can get the rest. Excel: CHISQ.DIST(x,df, cdf?), CHISQ.INV(percentile, df)
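The same chi-square lookups in Python rather than Table 6 or Excel (sketch, assumes scipy):

```python
from scipy.stats import chi2

df = 30
print(chi2.ppf(0.10, df))   # 10th percentile, ~20.6 (chi^2_{.90} in the table's notation)
print(chi2.ppf(0.90, df))   # 90th percentile, ~40.3 (chi^2_{.10})
print(chi2.cdf(40.3, df) - chi2.cdf(20.6, df))   # ~0.80, matching P(20.6 < W < 40.3)

print(chi2.mean(df), chi2.var(df))   # nu and 2*nu: 30.0, 60.0
```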
- Facts
1. If $Z\sim N(0,1)$ then $Z^{2}\sim\chi^{2}(1)$
2. If $W_{1}\sim\chi^{2}(4)$ and $W_{2}\sim\chi^{2}(7)$ independent then $W_{1} + W_{2}\sim\chi^{2}(11)$
3. Variance is a quadratic function of a random variable, so when we estimate the variance of a random variable that has a normal distribution (in lecture L19), our estimates will follow a $\chi^{2}$ distribution.
5. $t$ distribution (Friedrich Robert Helmert 1876, Karl Pearson 1900)
- $T\sim t(\nu)$; as in Chi-square distribution, $\nu$ is called "degrees of freedom"
- Similar to standard normal, but with higher variance (i.e. thicker tails)
- Approaches $N(0,1)$ as $\nu\rightarrow\infty$
- $f(t) = ugly$ (I won't expect you to know or use)
- $E(T) = 0$, $V(T) =\frac{\nu}{\nu - 2}\rightarrow 1$ as $\nu\rightarrow\infty$
- CDF $F(t)$ approximated on Table C
1. Table is oriented so that probability $C$ lies between $- t^{*}$ and $t^{*}$.
2. Example: if $T\sim t(20)$ find 90^th^ percentile
1. Following $C = 80\%$ (fifth column) for $df = 20$ leads to $t^{*} = 1.325$.
2. In other words, $10\%$ of the distribution is left of $- 1.325$, $80\%$ is between $- 1.325$ and $1.325$, and $10\%$ is above $1.325$.
3. Since $10\% + 80\% = 90\%$ of the distribution is below $1.325$ and $10\%$ is above, $1.325$ is the 90^th^ percentile of the distribution.
4. Alternatively, can come up from a one-sided p-value of $.10$ or a two-sided p-value of $.20$ (bottom of the table) to reach the same conclusion.
3. For degrees of freedom greater than $1000$, can read $z^{*}$ row of the table, which corresponds to a standard normal distribution (i.e., $\infty$ degrees of freedom).
4. Note: Table C only gives 12 points on CDF. With a computer, you can get the rest. Excel: T.DIST(x, df, cdf?) and T.INV(percentile, df)
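The Table C example in Python (sketch, assumes scipy); note how the $t$ percentile approaches the standard normal one as the degrees of freedom grow:

```python
from scipy.stats import t, norm

print(t.ppf(0.90, df=20))     # ~1.325, the 90th percentile found from Table C
print(t.ppf(0.90, df=1000))   # ~1.282, already close to the normal value
print(norm.ppf(0.90))         # ~1.282, the z* row (infinite degrees of freedom)
```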
- Fact
1. If $Z\sim N(0,1)$ and $W\sim\chi^{2}(\nu)$ independent then $\frac{Z}{\sqrt{W/\nu}}\sim t(\nu)$
2. If we knew the population variance, then estimates of the mean would follow a normal distribution. Since we have to estimate the population variance, and estimates follow a $\chi^{2}$ distribution, our estimates of the mean follow a $t$ distribution
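A simulation sketch of Fact 1 (assumes NumPy): building $T = Z/\sqrt{W/\nu}$ from an independent standard normal and chi-square reproduces the heavier-than-normal tails and variance $\frac{\nu}{\nu - 2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
nu, n = 5, 1_000_000

z = rng.standard_normal(n)         # Z ~ N(0, 1)
w = rng.chisquare(nu, n)           # W ~ chi-square(nu), independent of Z
t_draws = z / np.sqrt(w / nu)      # T = Z / sqrt(W / nu) ~ t(nu)

print(t_draws.var())                 # ~ nu/(nu-2) = 5/3 ~ 1.67
print(np.mean(np.abs(t_draws) > 3))  # tail probability, well above ~0.0027 for N(0,1)
```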
6. Other distributions
- The distributions we've gone over are some of the most common; there are many others, with various shapes, properties, and uses.
- Illustrated: https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
- Discrete
1. Uniform
2. Binomial
3. Geometric
4. Poisson
5. Hypergeometric
- Continuous
1. Exponential
2. F
3. Beta
4. Gamma
5. Log-normal
6. Pareto
7. Weibull