Calculus-based Probability
Basics
Axioms of Probability
- \(\displaystyle 0 \leq P(E) \leq 1\)
- \(\displaystyle P(S) = 1\) where \(\displaystyle S\) is your sample space
- For mutually exclusive events \(\displaystyle E_1, E_2, ...\), \(\displaystyle P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)\)
Monotonicity
- For all events \(\displaystyle A\), \(\displaystyle B\), \(\displaystyle A \subset B \implies P(A) \leq P(B)\)
Expectation and Variance
Some definitions and properties.
Definitions
Let \(\displaystyle X \sim D\) for some distribution \(\displaystyle D\). Let \(\displaystyle S\) be the support or domain of your distribution.
- \(\displaystyle E(X) = \sum_S xp(x)\) or \(\displaystyle \int_S xp(x)dx\)
- \(\displaystyle Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2\)
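A minimal numerical sketch of these two formulas in Python (numpy), using a fair six-sided die as the example distribution:

```python
import numpy as np

# Fair six-sided die: support S = {1,...,6}, p(x) = 1/6 for each x
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

EX = np.sum(x * p)        # E(X) = sum_S x p(x) = 3.5
EX2 = np.sum(x**2 * p)    # E(X^2)
VarX = EX2 - EX**2        # Var(X) = E(X^2) - (E(X))^2 = 35/12
print(EX, VarX)
```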
Total Expectation
\(\displaystyle E(X) = E(E(X|Y))\)
Dr. Xu refers to this as the smooth property.
\(\displaystyle E(X) = \int_S xp(x)dx = \int_x x \int_y p(x,y)dy\,dx = \int_x x \int_y p(x|y)p(y)dy\,dx = \int_y\left(\int_x x\, p(x|y)dx\right)p(y)dy = \int_y E(X|Y=y)\,p(y)dy = E(E(X|Y)) \)
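A quick Monte Carlo sanity check of the tower property; the Poisson–Binomial hierarchy and constants below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Hierarchical model: Y ~ Poisson(4), X | Y ~ Binomial(Y, 0.3)
y = rng.poisson(4, size=n)
x = rng.binomial(y, 0.3)

# E(X) directly vs E(E(X|Y)) = E(0.3 * Y); both should be near 0.3 * 4 = 1.2
print(x.mean(), (0.3 * y).mean())
```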
Total Variance
\(\displaystyle Var(Y) = E(Var(Y|X)) + Var(E(Y|X))\)
This one is not used as often on tests as total expectation
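The same kind of hierarchy can be used to check the variance decomposition numerically (again a sketch with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(4, size=10**6)   # Y ~ Poisson(4)
x = rng.binomial(y, 0.3)         # X | Y ~ Binomial(Y, 0.3)

# For a binomial: Var(X|Y) = Y * 0.3 * 0.7 and E(X|Y) = 0.3 * Y
lhs = x.var()
rhs = (y * 0.3 * 0.7).mean() + (0.3 * y).var()
print(lhs, rhs)   # both close to 1.2 (here X is a thinned Poisson)
```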
Moments and Moment Generating Functions
Definitions
We call \(\displaystyle E(X^i)\) the \(\displaystyle i\)-th moment of \(\displaystyle X\).
We call \(\displaystyle E((X - E(X))^i)\) the \(\displaystyle i\)-th central moment of \(\displaystyle X\).
Therefore the mean is the first moment and the variance is the second central moment.
Moment Generating Functions
\(\displaystyle M_X(t) = E(e^{tX})\)
We call this the moment generating function (mgf).
Differentiating \(\displaystyle i\) times with respect to \(\displaystyle t\) and setting \(\displaystyle t=0\) gives the \(\displaystyle i\)-th moment: \(\displaystyle M_X^{(i)}(0) = E(X^i)\) (illustrated after the notes below).
- Notes
- The mgf, if it exists, uniquely defines the distribution.
- The mgf of \(\displaystyle X+Y\) for independent \(\displaystyle X\) and \(\displaystyle Y\) is \(\displaystyle E(e^{t(X+Y)})=E(e^{tX})E(e^{tY})\), i.e. the product of the individual mgfs.
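As an illustration (a sketch using sympy, with the exponential distribution chosen as an example), moments can be read off an mgf by symbolic differentiation:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)

# mgf of Exp(lambda): M(t) = E(e^{tX}) = lambda / (lambda - t), valid for t < lambda
M = lam / (lam - t)

first_moment = sp.diff(M, t, 1).subs(t, 0)   # E(X)   = 1/lambda
second_moment = sp.diff(M, t, 2).subs(t, 0)  # E(X^2) = 2/lambda^2
print(sp.simplify(first_moment), sp.simplify(second_moment))
```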
Characteristic function
\(\displaystyle \phi_X(t) = E(e^{itX})\)
Unlike the mgf, the characteristic function always exists, and it also uniquely determines the distribution.
Convergence
There are 4 types of convergence typically taught in undergraduate courses.
See Wikipedia Convergence of random variables
Almost Surely
- \(\displaystyle P(\lim X_i = X) = 1\)
In Probability
For all \(\displaystyle \epsilon \gt 0\)
\(\displaystyle \lim P(|X_i - X| \geq \epsilon) = 0\)
- Implies Convergence in distribution
In Distribution
Pointwise convergence of the cdf
A sequence of random variables \(\displaystyle X_1,...\) converges to \(\displaystyle X\) in distribution
if, at every point \(\displaystyle x\) where \(\displaystyle F\) is continuous,
\(\displaystyle \lim_{i \rightarrow \infty} F_i(x) = F(x)\)
where \(\displaystyle F_i\) and \(\displaystyle F\) are the cdfs of \(\displaystyle X_i\) and \(\displaystyle X\).
- Equivalent to convergence in probability if it converges to a degenerate distribution (i.e. a number)
In Mean Squared
\(\displaystyle \lim_{i \rightarrow \infty} E(|X_i-X|^2)=0\)
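Convergence in mean square implies convergence in probability; the link is Markov's inequality (proved below) applied to \(\displaystyle |X_i - X|^2\):
\(\displaystyle P(|X_i - X| \geq \epsilon) = P(|X_i - X|^2 \geq \epsilon^2) \leq \frac{E(|X_i - X|^2)}{\epsilon^2} \rightarrow 0\)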
Delta Method
See Wikipedia
Suppose \(\displaystyle \sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)\).
Let \(\displaystyle g\) be a function such that \(\displaystyle g'\) exists and \(\displaystyle g'(\theta) \neq 0\)
Then \(\displaystyle \sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)\)
Multivariate:
\(\displaystyle \sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, \nabla h(\beta)^T \Sigma \nabla h(\beta))\)
- Notes
- You can think of this like the Mean Value theorem for random variables.
- \(\displaystyle (g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)\)
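A simulation sketch of the delta method with the example transformation \(\displaystyle g(x) = x^2\); the Exp(1) population and the sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 2_000, 2_000

# X_i ~ Exp(1), so theta = E(X) = 1 and sigma^2 = Var(X) = 1
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# Delta method with g(x) = x^2: sqrt(n)(g(xbar) - g(theta)) -> N(0, sigma^2 * g'(theta)^2)
z = np.sqrt(n) * (xbar**2 - 1.0)
print(z.std())   # should be close to |g'(1)| * sigma = 2
```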
Order Statistics
Inequalities and Limit Theorems
Markov's Inequality
Let \(\displaystyle X\) be a non-negative random variable.
Then \(\displaystyle P(X \geq a) \leq \frac{E(X)}{a}\)
\(\displaystyle \begin{aligned} E(X) &= \int_{0}^{\infty}xf(x)dx \\ &= \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}af(x)dx\\ &=a \int_{a}^{\infty}f(x)dx\\ &=a * P(X \geq a)\\ \implies& P(X \geq a) \leq \frac{E(X)}{a} \end{aligned} \)
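A quick numerical check of the bound (a sketch; the Exp(1) example and the threshold are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=10**6)   # non-negative, E(X) = 1

a = 3.0
print((x >= a).mean())   # empirical P(X >= a), about e^{-3} ~= 0.0498
print(x.mean() / a)      # Markov bound E(X)/a = 1/3
```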
Chebyshev's Inequality
- \(\displaystyle P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}\)
- \(\displaystyle P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}\)
Apply Markov's inequality:
Let \(\displaystyle Y = |X - \mu|\)
\(\displaystyle P(|X - \mu| \geq k) = P(Y \geq k) = P(Y^2 \geq k^2) \leq \frac{E(Y^2)}{k^2} = \frac{E((X - \mu)^2)}{k^2} = \frac{\sigma^2}{k^2}\)
- Usually used to prove convergence in probability
Central Limit Theorem
Very very important. Never forget this.
For i.i.d. samples from any distribution with finite variance, the standardized sample mean converges in distribution to a normal.
Let \(\displaystyle \mu = E(X)\) and \(\displaystyle \sigma^2 = Var(X)\)
Different ways of saying the same thing:
- \(\displaystyle \sqrt{n}(\bar{x} - \mu) \xrightarrow{D} N(0, \sigma^2)\)
- \(\displaystyle \frac{\sqrt{n}}{\sigma}(\bar{x} - \mu) \xrightarrow{D} N(0, 1)\)
- \(\displaystyle \bar{x} \approx N(\mu, \sigma^2/n)\) for large \(\displaystyle n\) (informal shorthand)
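A simulation sketch of the CLT using a clearly non-normal population (Exp(1); the sample sizes are arbitrary, and scipy.stats is used only for the reference quantiles):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 500, 10_000

# Sample means of Exp(1) draws: mu = 1, sigma^2 = 1
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 1.0)   # should be approximately N(0, 1)

# Compare a few empirical quantiles with the standard normal ones
for q in (0.05, 0.5, 0.95):
    print(q, np.quantile(z, q), stats.norm.ppf(q))
```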
Law of Large Numbers
The sample mean converges to the population mean in probability.
For all \(\displaystyle \epsilon \gt 0\),
\(\displaystyle \lim_{n \rightarrow \infty} P(|\bar{X}_n - E(X)| \geq \epsilon) = 0\)
- Notes
- The sample mean also converges to the population mean almost surely; this is the strong law of large numbers, which implies the weak law above.
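A short sketch of the law in action: the running sample mean of Exp(1) draws settling near \(\displaystyle E(X) = 1\) (the distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=10**6)   # E(X) = 1

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000, 1_000_000):
    print(n, running_mean[n - 1])      # approaches 1 as n grows
```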
Properties and Relationships between distributions
This is important for exams.
See Relationships among probability distributions.
Poisson Distribution
- If \(\displaystyle X_i \sim Poisson(\lambda_i)\) are independent then \(\displaystyle \sum X_i \sim Poisson(\sum \lambda_i)\)
Normal Distribution
- If \(\displaystyle X_1 \sim N(\mu_1, \sigma_1^2)\) and \(\displaystyle X_2 \sim N(\mu_2, \sigma_2^2)\) are independent then \(\displaystyle \lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 \mu_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2)\) for any \(\displaystyle \lambda_1, \lambda_2 \in \mathbb{R}\)
Exponential Distribution
- \(\displaystyle \operatorname{Exp}(\lambda)\) is equivalent to \(\displaystyle \Gamma(1, 1/\lambda)\)
- Note that some conventions flip the second parameter of gamma, so it would be \(\displaystyle \Gamma(1, \lambda)\)
- If \(\displaystyle \epsilon_1, ..., \epsilon_n\) are independent with \(\displaystyle \epsilon_i \sim \operatorname{Exp}(\lambda_i)\) then \(\displaystyle \min\{\epsilon_i\} \sim \operatorname{Exp}(\sum \lambda_i)\) (checked numerically after this list)
- Note that the maximum is not exponentially distributed
- However, if \(\displaystyle X_1, ..., X_n \sim \operatorname{Exp}(1)\) are independent then \(\displaystyle Z_n = n e^{-\max\{X_i\}} \xrightarrow{D} \operatorname{Exp}(1)\)
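A sketch checking the minimum-of-exponentials fact above (the rates 1, 2, 3 are arbitrary; note that numpy parametrizes the exponential by scale \(\displaystyle 1/\lambda\)):

```python
import numpy as np

rng = np.random.default_rng(6)
rates = np.array([1.0, 2.0, 3.0])
n = 10**6

# Columns: independent Exp(lambda_i); numpy uses scale = 1/lambda
samples = rng.exponential(1.0 / rates, size=(n, 3))
m = samples.min(axis=1)

# min should be Exp(sum(rates)) = Exp(6), so E(min) = 1/6
print(m.mean(), 1.0 / rates.sum())
```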
Gamma Distribution
Note that exponential distributions are also Gamma distributions.
- If \(\displaystyle X \sim \Gamma(k, \theta)\) then \(\displaystyle cX \sim \Gamma(k, c\theta)\) for any \(\displaystyle c > 0\) (shape–scale parametrization).
- If \(\displaystyle X_1 \sim \Gamma(k_1, \theta)\) and \(\displaystyle X_2 \sim \Gamma(k_2, \theta)\) are independent then \(\displaystyle X_1 + X_2 \sim \Gamma(k_1 + k_2, \theta)\) (see the sketch after this list).
- If \(\displaystyle X_1 \sim \Gamma(\alpha, \theta)\) and \(\displaystyle X_2 \sim \Gamma(\beta, \theta)\) are independent, then \(\displaystyle \frac{X_1}{X_1 + X_2} \sim \operatorname{Beta}(\alpha, \beta)\).
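A sketch checking the gamma-sum and beta-ratio facts numerically (the shape/scale values are arbitrary; numpy's gamma sampler uses the shape–scale convention):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10**6
x1 = rng.gamma(shape=2.0, scale=1.5, size=n)   # Gamma(k1=2, theta=1.5)
x2 = rng.gamma(shape=3.0, scale=1.5, size=n)   # Gamma(k2=3, theta=1.5)

s = x1 + x2          # should be Gamma(5, 1.5): mean 5 * 1.5 = 7.5
b = x1 / (x1 + x2)   # should be Beta(2, 3): mean 2/5 = 0.4
print(s.mean(), b.mean())
```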
T-distribution
- The ratio of a standard normal to the square root of an independent normalized Chi-squared random variable has a t-distribution.
- If \(\displaystyle Z \sim N(0,1)\) and \(\displaystyle V \sim \chi^2(v)\) are independent then \(\displaystyle \frac{Z}{\sqrt{V/v}} \sim t(v)\)
Chi-Sq Distribution
- The ratio of two independent Chi-squared random variables, each divided by its degrees of freedom, has an F-distribution.
- If \(\displaystyle X \sim \chi^2_{d_1}\) and \(\displaystyle Y \sim \chi^2_{d_2}\) are independent then \(\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)\)
- If \(\displaystyle Z_1,...,Z_k \sim N(0,1)\) are independent then \(\displaystyle Z_1^2 + ... + Z_k^2 \sim \chi^2(k)\) (checked numerically below)
- If \(\displaystyle X_i \sim \chi^2(k_i)\) are independent then \(\displaystyle X_1 + ... + X_n \sim \chi^2(k_1 +...+ k_n)\)
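A sketch checking that a sum of squared standard normals behaves like \(\displaystyle \chi^2(k)\) (k = 5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(8)
k, n = 5, 10**6

z = rng.standard_normal(size=(n, k))
q = (z**2).sum(axis=1)      # should be chi^2(k)

# chi^2(k) has mean k and variance 2k
print(q.mean(), q.var())    # ~5 and ~10
```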
F Distribution
There are many relationships; see the Wikipedia page. The most important are the connections to the Chi-squared and t-distributions.
- If \(\displaystyle X \sim \chi^2_{d_1}\) and \(\displaystyle Y \sim \chi^2_{d_2}\) are independent then \(\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)\)
- If \(\displaystyle X \sim t_{(n)}\) then \(\displaystyle X^2 \sim F(1, n)\) and \(\displaystyle X^{-2} \sim F(n, 1)\)
Textbooks
- Sheldon Ross' A First Course in Probability
- This is a very good textbook despite the poor reviews on Amazon
- Hogg and Craig's Mathematical Statistics
- Casella and Berger's Statistical Inference