Probability
Calculus-based Probability
Basics
Axioms of Probability
- \(\displaystyle 0 \leq P(E) \leq 1\)
- \(\displaystyle P(S) = 1\) where \(\displaystyle S\) is your sample space
- For mutually exclusive events \(\displaystyle E_1, E_2, \ldots\), \(\displaystyle P\left(\bigcup_{i=1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i)\)
Monotonicity
- For all events \(\displaystyle A\), \(\displaystyle B\), \(\displaystyle A \subset B \implies P(A) \leq P(B)\)
Expectation and Variance
Some definitions and properties.
Definitions
Let \(\displaystyle X \sim D\) for some distribution \(\displaystyle D\). Let \(\displaystyle S\) be the support or domain of your distribution.
- \(\displaystyle E(X) = \sum_S xp(x)\) or \(\displaystyle \int_S xp(x)dx\)
- \(\displaystyle Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2\)
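As a small illustration of the definitions above, the sketch below computes the mean and variance of a discrete distribution exactly. The fair six-sided die and the helper name `expectation` are assumptions chosen for the example, not part of the notes.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, p(x) = 1/6 on the support {1, ..., 6}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum over the support of g(x) * p(x)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)
var_definition = expectation(pmf, lambda x: (x - mean) ** 2)   # E[(X - E(X))^2]
var_shortcut = expectation(pmf, lambda x: x ** 2) - mean ** 2  # E(X^2) - (E(X))^2

print(mean, var_definition, var_shortcut)  # prints 7/2 35/12 35/12
```

Both variance formulas agree, as the identity in the definition says they should.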
Total Expectation
\(\displaystyle E(X) = E(E(X|Y))\)
Dr. Xu refers to this as the smooth property.
\(\displaystyle E(X) = \int_S xp(x)dx = \int_x x \int_y p(x,y)dy dx = \int_x x \int_y p(x|y)p(y)dy dx = \int_y\left(\int_x x p(x|y)dx\right)p(y)dy = \int_y E(X|Y=y)p(y)dy = E(E(X|Y))\)
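The same identity can be checked by hand on a small discrete example, where the integrals become sums. The joint pmf below is made up purely for illustration, and `cond_mean_x` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); the numbers are illustrative only.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

# Marginal p(y) = sum_x p(x, y)
p_y = {}
for (x, y), p in joint.items():
    p_y[y] = p_y.get(y, 0) + p

def cond_mean_x(y):
    """E(X | Y = y) = sum_x x * p(x, y) / p(y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]

lhs = sum(x * p for (x, _), p in joint.items())          # E(X)
rhs = sum(cond_mean_x(y) * py for y, py in p_y.items())  # E(E(X|Y))

print(lhs, rhs)  # prints 7/8 7/8
```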
Total Variance
\(\displaystyle Var(Y) = E(Var(Y|X)) + Var(E(Y|X))\)
This one is not used as often on tests as total expectation.
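A minimal sketch that verifies the decomposition on a small discrete example follows. The joint pmf and the helper names (`mean`, `variance`, `cond_pmf_y`) are assumptions made for the illustration.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y); values are illustrative only.
joint = {
    (0, 1): Fraction(1, 4), (0, 3): Fraction(1, 4),
    (1, 2): Fraction(1, 8), (1, 6): Fraction(3, 8),
}

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())

# Marginals of X and Y
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

def cond_pmf_y(x):
    """p(y | X = x) = p(x, y) / p(x)."""
    return {y: p / p_x[x] for (xx, y), p in joint.items() if xx == x}

# E(Var(Y|X)): average the conditional variances over the distribution of X.
expected_cond_var = sum(variance(cond_pmf_y(x)) * px for x, px in p_x.items())

# Var(E(Y|X)): variance of the conditional means around E(Y).
mean_y = mean(p_y)
var_of_cond_mean = sum((mean(cond_pmf_y(x)) - mean_y) ** 2 * px
                       for x, px in p_x.items())

total = expected_cond_var + var_of_cond_mean
print(total, variance(p_y))  # prints 17/4 17/4
```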
Moments and Moment Generating Functions
Definitions
We call \(\displaystyle E(X^i)\) the i'th moment of \(\displaystyle X\).
We call \(\displaystyle E((X - E(X))^i)\) the i'th central moment of \(\displaystyle X\).
Therefore the mean is the first moment and the variance is the second central moment.
Moment Generating Functions
\(\displaystyle E(e^{tX})\)
We call this the moment generating function (mgf).
We can differentiate it \(\displaystyle i\) times with respect to \(\displaystyle t\) and set \(\displaystyle t=0\) to get the i'th moment: \(\displaystyle \frac{d^i}{dt^i}E(e^{tX})\Big|_{t=0} = E(X^i)\).
- Notes
- The mgf, if it exists, uniquely defines the distribution.
- If \(\displaystyle X\) and \(\displaystyle Y\) are independent, the mgf of \(\displaystyle X+Y\) is \(\displaystyle E(e^{t(X+Y)})=E(e^{tX})E(e^{tY})\)
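To see the "differentiate and set \(\displaystyle t=0\)" recipe in action, the sketch below uses the known mgf of an exponential distribution, \(\displaystyle \lambda/(\lambda - t)\) for \(\displaystyle t \lt \lambda\), and approximates the derivatives at zero with finite differences. The rate value and step size `h` are arbitrary choices for the illustration.

```python
rate = 2.0  # hypothetical rate parameter of an Exponential distribution

def mgf(t):
    """M(t) = E(e^{tX}) = rate / (rate - t) for X ~ Exponential(rate), valid for t < rate."""
    return rate / (rate - t)

h = 1e-4
# Central finite differences approximate M'(0) and M''(0).
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2

print(first_moment, 1 / rate)         # both close to E(X) = 1/rate
print(second_moment, 2 / rate ** 2)   # both close to E(X^2) = 2/rate^2
```

In practice the derivatives are usually taken symbolically; the numerical version here is only meant to make the recipe concrete.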
Characteristic function
Convergence
There are 4 types of convergence typically taught in undergraduate courses.
See Wikipedia Convergence of random variables
Almost Surely
- \(\displaystyle P(\lim X_i = X) = 1\)
In Probability
For all \(\displaystyle \epsilon \gt 0\)
\(\displaystyle \lim P(|X_i - X| \geq \epsilon) = 0\)
- Implies Convergence in distribution
In Distribution
Pointwise convergence of the cdf
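A sequence of random variables \(\displaystyle X_1, X_2, \ldots\) with cdfs \(\displaystyle F_1, F_2, \ldots\) converges to \(\displaystyle X\) (with cdf \(\displaystyle F\)) in distribution
if for all \(\displaystyle x\) at which \(\displaystyle F\) is continuous,
\(\displaystyle \lim_{i \rightarrow \infty} F_i(x) = F(x)\)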
- Equivalent to convergence in probability if it converges to a degenerate distribution (i.e. a number)
In Mean Squared
\(\displaystyle \lim_{i \rightarrow \infty} E(|X_i-X|^2)=0\)
Delta Method
See Wikipedia
Suppose \(\displaystyle \sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)\).
Let \(\displaystyle g\) be a function such that \(\displaystyle g'\) exists and \(\displaystyle g'(\theta) \neq 0\)
Then \(\displaystyle \sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)\)
Multivariate:
\(\displaystyle \sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, h'(\beta)^T \Sigma h'(\beta))\)
- Notes
- You can think of this like the Mean Value theorem for random variables.
- \(\displaystyle (g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)\)
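A rough simulation can make the univariate statement concrete. The sketch below assumes \(\displaystyle X_n\) is the mean of \(\displaystyle n\) i.i.d. Exponential(1) draws (so \(\displaystyle \theta = 1\), \(\displaystyle \sigma^2 = 1\)) and takes \(\displaystyle g(x) = x^2\); these choices, the seed, and the sample sizes are all illustrative assumptions.

```python
import math
import random
import statistics

rng = random.Random(0)          # fixed seed so the run is reproducible
n, reps = 200, 4000             # hypothetical sample size and number of replications
theta, sigma2 = 1.0, 1.0        # mean and variance of Exponential(1)

def g(x):
    return x ** 2               # illustrative smooth function with g'(theta) = 2 * theta

scaled = []
for _ in range(reps):
    x_bar = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    scaled.append(math.sqrt(n) * (g(x_bar) - g(theta)))

empirical_var = statistics.pvariance(scaled)
predicted_var = sigma2 * (2 * theta) ** 2   # sigma^2 * g'(theta)^2 = 4
print(empirical_var, predicted_var)
```

The empirical variance should land near the predicted value of 4, with some simulation noise and a small finite-\(\displaystyle n\) bias.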
Inequalities and Limit Theorems
Markov's Inequality
Let \(\displaystyle X\) be a non-negative random variable.
Then for any \(\displaystyle a \gt 0\), \(\displaystyle P(X \geq a) \leq \frac{E(X)}{a}\)
Proof for the continuous case with density \(\displaystyle f\):
\(\displaystyle E(X) = \int_{0}^{\infty}xf(x)dx = \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx \geq \int_{a}^{\infty}xf(x)dx \geq \int_{a}^{\infty}af(x)dx =a \int_{a}^{\infty}f(x)dx =a P(X \geq a)\\ \implies P(X \geq a) \leq \frac{E(X)}{a} \)
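As a quick sanity check of the bound, the sketch below compares the exact tail probability with \(\displaystyle E(X)/a\) for an Exponential(1) variable, where \(\displaystyle E(X) = 1\) and \(\displaystyle P(X \geq a) = e^{-a}\). The distribution and the values of \(\displaystyle a\) are assumptions chosen for the illustration.

```python
import math

# Hypothetical example: X ~ Exponential(1), so E(X) = 1 and P(X >= a) = exp(-a).
mean = 1.0

rows = []
for a in (0.5, 1.0, 2.0, 5.0):
    tail = math.exp(-a)     # exact P(X >= a)
    bound = mean / a        # Markov bound E(X) / a
    rows.append((a, tail, bound))
    print(f"a={a}: P(X >= a)={tail:.4f}, E(X)/a={bound:.4f}")
```

The bound always holds but is often loose, and for small \(\displaystyle a\) it can exceed 1, in which case it says nothing.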
Chebyshev's Inequality
- \(\displaystyle P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}\)
- \(\displaystyle P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}\)
Apply Markov's inequality:
Let \(\displaystyle Y = |X - \mu|\)
\(\displaystyle P(|X - \mu| \geq k) = P(Y \geq k) = P(Y^2 \geq k^2) \leq \frac{E(Y^2)}{k^2} = \frac{E((X - \mu)^2)}{k^2} = \frac{\sigma^2}{k^2}\)
- Usually used to prove convergence in probability
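The second form of the inequality can be checked exactly on a small discrete distribution. The fair die below is an assumed example; any distribution with finite variance would do.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())

results = []
for k in (Fraction(1), Fraction(2), Fraction(5, 2)):
    tail = sum(p for x, p in pmf.items() if abs(x - mu) >= k)  # P(|X - mu| >= k)
    bound = sigma2 / k ** 2                                     # sigma^2 / k^2
    results.append((k, tail, bound))
    print(f"k={k}: tail={tail}, bound={bound}")
```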
Central Limit Theorem
Very very important. Never forget this.
For i.i.d. samples from any distribution with finite variance, the centered and scaled sample mean converges in distribution to a normal.
Let \(\displaystyle \mu = E(X)\) and \(\displaystyle \sigma^2 = Var(X)\)
Different ways of saying the same thing:
- \(\displaystyle \sqrt{n}(\bar{x} - \mu) \xrightarrow{D} N(0, \sigma^2)\)
- \(\displaystyle \frac{\sqrt{n}}{\sigma}(\bar{x} - \mu) \xrightarrow{D} N(0, 1)\)
- For large \(\displaystyle n\), \(\displaystyle \bar{x}\) is approximately \(\displaystyle N(\mu, \sigma^2/n)\)
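A short simulation gives a feel for the normal approximation. The sketch below draws sample means from Uniform(0, 1), standardizes them, and checks how often they fall within \(\displaystyle \pm 1.96\), which should happen about 95% of the time for a standard normal. The distribution, seed, and sample sizes are illustrative assumptions.

```python
import math
import random

rng = random.Random(1)               # fixed seed so the run is reproducible
n, reps = 50, 5000                   # hypothetical sample size and replications
mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and standard deviation of Uniform(0, 1)

z_scores = []
for _ in range(reps):
    x_bar = sum(rng.random() for _ in range(n)) / n
    z_scores.append(math.sqrt(n) * (x_bar - mu) / sigma)

# Fraction of standardized means inside the central 95% region of N(0, 1).
coverage = sum(abs(z) <= 1.96 for z in z_scores) / reps
print(coverage)
```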
Law of Large Numbers
The sample mean converges to the population mean in probability.
For all \(\displaystyle \epsilon \gt 0\),
\(\displaystyle \lim_{n \rightarrow \infty} P(|\bar{X}_n - E(X)| \geq \epsilon) = 0\)
- Notes
- The statement above is the weak law. The strong law of large numbers says the sample mean also converges to the population mean almost surely.
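The sketch below tracks the running mean of simulated fair-die rolls and records its distance from the population mean of 3.5 at a few checkpoints. The die, the seed, and the checkpoint sizes are assumptions made for the illustration; a single run shows the typical behavior, not a proof.

```python
import random

rng = random.Random(2)      # fixed seed so the run is reproducible
population_mean = 3.5       # E(X) for a fair six-sided die

checkpoints = (10, 100, 1_000, 10_000, 100_000)
total = 0
errors = {}
for n in range(1, checkpoints[-1] + 1):
    total += rng.randint(1, 6)
    if n in checkpoints:
        # |sample mean - population mean| after n rolls
        errors[n] = abs(total / n - population_mean)

for n, err in errors.items():
    print(n, err)
```

The error need not shrink monotonically from one checkpoint to the next, but it should be small by the last one.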
Relationships between distributions
This is important for tests.
See Relationships among probability distributions.
Poisson Distributions
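The sum of independent Poisson random variables is Poisson with rate equal to the sum of the rates: if \(\displaystyle X_i \sim Poisson(\lambda_i)\) are independent, then \(\displaystyle \sum_i X_i \sim Poisson\left(\sum_i \lambda_i\right)\).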
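This can be checked numerically by convolving two Poisson pmfs and comparing with the pmf at the summed rate. The rates below are arbitrary, and `poisson_pmf` and `sum_pmf` are hypothetical helper names.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam1, lam2 = 1.5, 2.5  # hypothetical rates

def sum_pmf(k):
    """P(X1 + X2 = k) by convolving the two pmfs, assuming X1 and X2 are independent."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))

# Largest discrepancy between the convolution and Poisson(lam1 + lam2) over k = 0..19.
max_gap = max(abs(sum_pmf(k) - poisson_pmf(k, lam1 + lam2)) for k in range(20))
print(max_gap)
```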
Normal Distributions
- If \(\displaystyle X_1 \sim N(\mu_1, \sigma_1^2)\) and \(\displaystyle X_2 \sim N(\mu_2, \sigma_2^2)\) are independent, then \(\displaystyle \lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 \mu_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2)\) for any \(\displaystyle \lambda_1, \lambda_2 \in \mathbb{R}\)
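A quick simulation can confirm the mean and variance of a linear combination of independent normals. All parameter values below are arbitrary choices for the illustration; note that `random.gauss` takes the standard deviation, not the variance.

```python
import random
import statistics

rng = random.Random(3)      # fixed seed so the run is reproducible
mu1, sigma1 = 1.0, 2.0      # X1 ~ N(1, 2^2), hypothetical parameters
mu2, sigma2 = 3.0, 1.0      # X2 ~ N(3, 1^2), hypothetical parameters
lam1, lam2 = 2.0, -1.0      # coefficients of the linear combination

samples = [lam1 * rng.gauss(mu1, sigma1) + lam2 * rng.gauss(mu2, sigma2)
           for _ in range(20_000)]

predicted_mean = lam1 * mu1 + lam2 * mu2                            # -1
predicted_var = lam1 ** 2 * sigma1 ** 2 + lam2 ** 2 * sigma2 ** 2   # 17

print(statistics.fmean(samples), predicted_mean)
print(statistics.pvariance(samples), predicted_var)
```

Matching the first two moments does not by itself show normality, but it is a cheap check that the parameters in the formula are right.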
Gamma Distributions
Note that exponential distributions are also Gamma distributions (with shape \(\displaystyle k = 1\)).
- If \(\displaystyle X \sim \Gamma(k, \theta)\) then \(\displaystyle cX \sim \Gamma(k, c\theta)\) for any \(\displaystyle c \gt 0\).
- If \(\displaystyle X_1 \sim \Gamma(k_1, \theta)\) and \(\displaystyle X_2 \sim \Gamma(k_2, \theta)\) are independent, then \(\displaystyle X_1 + X_2 \sim \Gamma(k_1 + k_2, \theta)\).
- If \(\displaystyle X_1 \sim \Gamma(\alpha, \theta)\) and \(\displaystyle X_2 \sim \Gamma(\beta, \theta)\) are independent, then \(\displaystyle \frac{X_1}{X_1 + X_2} \sim B(\alpha, \beta)\).
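The sum and ratio properties can be spot-checked by simulation, comparing sample moments against the known moments of the Gamma and Beta distributions. The parameter values are arbitrary, and the sketch assumes the shape/scale parameterization, which is what Python's `random.gammavariate(alpha, beta)` uses (its second argument is the scale).

```python
import random
import statistics

rng = random.Random(4)              # fixed seed so the run is reproducible
alpha, beta, theta = 2.0, 3.0, 1.5  # hypothetical shapes and common scale

sums, ratios = [], []
for _ in range(20_000):
    x1 = rng.gammavariate(alpha, theta)   # X1 ~ Gamma(shape=alpha, scale=theta)
    x2 = rng.gammavariate(beta, theta)    # X2 ~ Gamma(shape=beta, scale=theta)
    sums.append(x1 + x2)
    ratios.append(x1 / (x1 + x2))

# Gamma(k, theta) has mean k * theta and variance k * theta^2.
sum_mean = (alpha + beta) * theta          # 7.5
sum_var = (alpha + beta) * theta ** 2      # 11.25
# Beta(alpha, beta) has mean alpha / (alpha + beta).
ratio_mean = alpha / (alpha + beta)        # 0.4

print(statistics.fmean(sums), sum_mean)
print(statistics.pvariance(sums), sum_var)
print(statistics.fmean(ratios), ratio_mean)
```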
T-distribution
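The ratio of a standard normal to the square root of an independent Chi-sq divided by its degrees of freedom yields a T-distribution: if \(\displaystyle Z \sim N(0, 1)\) and \(\displaystyle V \sim \chi^2_\nu\) are independent, then \(\displaystyle \frac{Z}{\sqrt{V/\nu}} \sim t_\nu\).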
Chi-Sq Distribution
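The ratio of two independent Chi-sq random variables, each divided by its degrees of freedom, follows an F-distribution: if \(\displaystyle U \sim \chi^2_{d_1}\) and \(\displaystyle V \sim \chi^2_{d_2}\) are independent, then \(\displaystyle \frac{U/d_1}{V/d_2} \sim F(d_1, d_2)\).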
F Distribution
There are too many relationships to list here; see the Wikipedia page. The most important are those with the Chi-sq and T distributions.
Textbooks
- Sheldon Ross' A First Course in Probability
- Hogg and Craig's Mathematical Statistics
- Casella and Berger's Statistical Inference