Calculus-based Probability
Basics
Axioms of Probability
- \(\displaystyle 0 \leq P(E) \leq 1\)
- \(\displaystyle P(S) = 1\) where \(\displaystyle S\) is your sample space
- For mutually exclusive events \(\displaystyle E_1, E_2, ...\), \(\displaystyle P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i)\)
Monotonicity
- For all events \(\displaystyle A\), \(\displaystyle B\), \(\displaystyle A \subset B \implies P(A) \leq P(B)\)
Expectation and Variance
Some definitions and properties.
Definitions
Let \(\displaystyle X \sim D\) for some distribution \(\displaystyle D\). Let \(\displaystyle S\) be the support or domain of your distribution.
- \(\displaystyle E(X) = \sum_S xp(x)\) or \(\displaystyle \int_S xp(x)dx\)
- \(\displaystyle Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2\)
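A minimal numerical sketch of these two formulas in Python (numpy), using a fair six-sided die as the example distribution:

```python
import numpy as np

# Fair six-sided die: support S = {1,...,6}, p(x) = 1/6 for each x
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

EX = np.sum(x * p)        # E(X) = sum_S x p(x) = 3.5
EX2 = np.sum(x**2 * p)    # E(X^2)
VarX = EX2 - EX**2        # Var(X) = E(X^2) - (E(X))^2 = 35/12
print(EX, VarX)
```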
Total Expectation
\(\displaystyle E(X) = E(E(X|Y))\)
Dr. Xu refers to this as the smooth property.
\(\displaystyle E(X) = \int_S xp(x)dx = \int_x x \int_y p(x,y)dy\,dx = \int_x x \int_y p(x|y)p(y)dy\,dx = \int_y\left(\int_x x\, p(x|y)dx\right)p(y)dy = \int_y E(X|Y=y)\,p(y)dy = E(E(X|Y)) \)
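A quick Monte Carlo sanity check of the tower property; the Poisson–Binomial hierarchy and constants below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# Hierarchical model: Y ~ Poisson(4), X | Y ~ Binomial(Y, 0.3)
y = rng.poisson(4, size=n)
x = rng.binomial(y, 0.3)

# E(X) directly vs E(E(X|Y)) = E(0.3 * Y); both should be near 0.3 * 4 = 1.2
print(x.mean(), (0.3 * y).mean())
```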
Total Variance
\(\displaystyle Var(Y) = E(Var(Y|X)) + Var(E(Y|X))\)
This one is not used as often on tests as total expectation
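The same kind of hierarchy can be used to check the variance decomposition numerically (again a sketch with arbitrary parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.poisson(4, size=10**6)   # Y ~ Poisson(4)
x = rng.binomial(y, 0.3)         # X | Y ~ Binomial(Y, 0.3)

# For a binomial: Var(X|Y) = Y * 0.3 * 0.7 and E(X|Y) = 0.3 * Y
lhs = x.var()
rhs = (y * 0.3 * 0.7).mean() + (0.3 * y).var()
print(lhs, rhs)   # both close to 1.2 (here X is a thinned Poisson)
```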
Moments and Moment Generating Functions
Definitions
We call \(\displaystyle E(X^i)\) the \(\displaystyle i\)-th moment of \(\displaystyle X\).
We call \(\displaystyle E((X - E(X))^i)\) the \(\displaystyle i\)-th central moment of \(\displaystyle X\).
Therefore the mean is the first moment and the variance is the second central moment.
Moment Generating Functions
\(\displaystyle M_X(t) = E(e^{tX})\)
We call this the moment generating function (mgf).
Differentiating \(\displaystyle i\) times with respect to \(\displaystyle t\) and setting \(\displaystyle t=0\) gives the \(\displaystyle i\)-th moment: \(\displaystyle M_X^{(i)}(0) = E(X^i)\) (illustrated after the notes below).
- Notes
- The mgf, if it exists, uniquely defines the distribution.
- The mgf of \(\displaystyle X+Y\) for independent \(\displaystyle X\) and \(\displaystyle Y\) is \(\displaystyle E(e^{t(X+Y)})=E(e^{tX})E(e^{tY})\), i.e. the product of the individual mgfs.
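As an illustration (a sketch using sympy, with the exponential distribution chosen as an example), moments can be read off an mgf by symbolic differentiation:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)

# mgf of Exp(lambda): M(t) = E(e^{tX}) = lambda / (lambda - t), valid for t < lambda
M = lam / (lam - t)

first_moment = sp.diff(M, t, 1).subs(t, 0)   # E(X)   = 1/lambda
second_moment = sp.diff(M, t, 2).subs(t, 0)  # E(X^2) = 2/lambda^2
print(sp.simplify(first_moment), sp.simplify(second_moment))
```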
Characteristic function
\(\displaystyle \phi_X(t) = E(e^{itX})\)
Unlike the mgf, the characteristic function always exists, and it also uniquely determines the distribution.
Convergence
There are 4 types of convergence typically taught in undergraduate courses.
See Wikipedia Convergence of random variables
Almost Surely
- \(\displaystyle P(\lim X_i = X) = 1\)
In Probability
For all \(\displaystyle \epsilon \gt 0\)
\(\displaystyle \lim P(|X_i - X| \geq \epsilon) = 0\)
- Implies Convergence in distribution
In Distribution
Pointwise convergence of the cdf
A sequence of random variables \(\displaystyle X_1,...\) converges to \(\displaystyle X\) in distribution
if, at every point \(\displaystyle x\) where \(\displaystyle F\) is continuous,
\(\displaystyle \lim_{i \rightarrow \infty} F_i(x) = F(x)\)
where \(\displaystyle F_i\) and \(\displaystyle F\) are the cdfs of \(\displaystyle X_i\) and \(\displaystyle X\).
- Equivalent to convergence in probability if it converges to a degenerate distribution (i.e. a number)
In Mean Squared
\(\displaystyle \lim_{i \rightarrow \infty} E(|X_i-X|^2)=0\)
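Convergence in mean square implies convergence in probability; the link is Markov's inequality (proved below) applied to \(\displaystyle |X_i - X|^2\):
\(\displaystyle P(|X_i - X| \geq \epsilon) = P(|X_i - X|^2 \geq \epsilon^2) \leq \frac{E(|X_i - X|^2)}{\epsilon^2} \rightarrow 0\)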
Delta Method
See Wikipedia
Suppose \(\displaystyle \sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)\).
Let \(\displaystyle g\) be a function such that \(\displaystyle g'\) exists and \(\displaystyle g'(\theta) \neq 0\)
Then \(\displaystyle \sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)\)
Multivariate:
\(\displaystyle \sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, \nabla h(\beta)^T \Sigma \nabla h(\beta))\)
- Notes
- You can think of this like the Mean Value theorem for random variables.
- \(\displaystyle (g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)\)
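A simulation sketch of the delta method with the example transformation \(\displaystyle g(x) = x^2\); the Exp(1) population and the sample sizes are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 2_000, 2_000

# X_i ~ Exp(1), so theta = E(X) = 1 and sigma^2 = Var(X) = 1
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)

# Delta method with g(x) = x^2: sqrt(n)(g(xbar) - g(theta)) -> N(0, sigma^2 * g'(theta)^2)
z = np.sqrt(n) * (xbar**2 - 1.0)
print(z.std())   # should be close to |g'(1)| * sigma = 2
```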
Order Statistics
Inequalities and Limit Theorems
Markov's Inequality
Let \(\displaystyle X\) be a non-negative random variable.
Then \(\displaystyle P(X \geq a) \leq \frac{E(X)}{a}\)
\(\displaystyle \begin{aligned} E(X) &= \int_{0}^{\infty}xf(x)dx \\ &= \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}af(x)dx\\ &=a \int_{a}^{\infty}f(x)dx\\ &=a * P(X \geq a)\\ \implies& P(X \geq a) \leq \frac{E(X)}{a} \end{aligned} \)
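A quick numerical check of the bound (a sketch; the Exp(1) example and the threshold are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=10**6)   # non-negative, E(X) = 1

a = 3.0
print((x >= a).mean())   # empirical P(X >= a), about e^{-3} ~= 0.0498
print(x.mean() / a)      # Markov bound E(X)/a = 1/3
```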
Chebyshev's Inequality
- \(\displaystyle P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}\)
- \(\displaystyle P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}\)
Apply Markov's inequality:
Let \(\displaystyle Y = |X - \mu|\)
\(\displaystyle P(|X - \mu| \geq k) = P(Y \geq k) = P(Y^2 \geq k^2) \leq \frac{E(Y^2)}{k^2} = \frac{E((X - \mu)^2)}{k^2} = \frac{\sigma^2}{k^2}\)
- Usually used to prove convergence in probability
Central Limit Theorem
Very very important. Never forget this.
For i.i.d. samples from any distribution with finite variance, the standardized sample mean converges in distribution to a normal.
Let \(\displaystyle \mu = E(X)\) and \(\displaystyle \sigma^2 = Var(X)\)
Different ways of saying the same thing:
- \(\displaystyle \sqrt{n}(\bar{x} - \mu) \xrightarrow{D} N(0, \sigma^2)\)
- \(\displaystyle \frac{\sqrt{n}}{\sigma}(\bar{x} - \mu) \xrightarrow{D} N(0, 1)\)
- \(\displaystyle \bar{x} \approx N(\mu, \sigma^2/n)\) for large \(\displaystyle n\) (informal shorthand)
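A simulation sketch of the CLT using a clearly non-normal population (Exp(1); the sample sizes are arbitrary, and scipy.stats is used only for the reference quantiles):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 500, 10_000

# Sample means of Exp(1) draws: mu = 1, sigma^2 = 1
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 1.0)   # should be approximately N(0, 1)

# Compare a few empirical quantiles with the standard normal ones
for q in (0.05, 0.5, 0.95):
    print(q, np.quantile(z, q), stats.norm.ppf(q))
```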
Law of Large Numbers
The sample mean converges to the population mean in probability.
For all \(\displaystyle \epsilon \gt 0\),
\(\displaystyle \lim_{n \rightarrow \infty} P(|\bar{X}_n - E(X)| \geq \epsilon) = 0\)
- Notes
- The sample mean also converges to the population mean almost surely; this is the strong law of large numbers, which implies the weak law above.
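A short sketch of the law in action: the running sample mean of Exp(1) draws settling near \(\displaystyle E(X) = 1\) (the distribution is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=10**6)   # E(X) = 1

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 100_000, 1_000_000):
    print(n, running_mean[n - 1])      # approaches 1 as n grows
```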
Properties and Relationships between distributions
This is important for exams.
See Relationships among probability distributions.
Poisson Distribution
- If \(\displaystyle X_i \sim Poisson(\lambda_i)\) are independent then \(\displaystyle \sum X_i \sim Poisson(\sum \lambda_i)\)
Normal Distribution
- If \(\displaystyle X_1 \sim N(\mu_1, \sigma_1^2)\) and \(\displaystyle X_2 \sim N(\mu_2, \sigma_2^2)\) are independent then \(\displaystyle \lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 \mu_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2)\) for any \(\displaystyle \lambda_1, \lambda_2 \in \mathbb{R}\)
Exponential Distribution
- \(\displaystyle \operatorname{Exp}(\lambda)\) is equivalent to \(\displaystyle \Gamma(1, 1/\lambda)\)
- Note that some conventions flip the second parameter of gamma, so it would be \(\displaystyle \Gamma(1, \lambda)\)
- If \(\displaystyle \epsilon_1, ..., \epsilon_n\) are independent with \(\displaystyle \epsilon_i \sim \operatorname{Exp}(\lambda_i)\) then \(\displaystyle \min\{\epsilon_i\} \sim \operatorname{Exp}(\sum \lambda_i)\) (checked numerically after this list)
- Note that the maximum is not exponentially distributed
- However, if \(\displaystyle X_1, ..., X_n \sim \operatorname{Exp}(1)\) are independent then \(\displaystyle Z_n = n e^{-\max\{X_i\}} \xrightarrow{D} \operatorname{Exp}(1)\)
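A sketch checking the minimum-of-exponentials fact above (the rates 1, 2, 3 are arbitrary; note that numpy parametrizes the exponential by scale \(\displaystyle 1/\lambda\)):

```python
import numpy as np

rng = np.random.default_rng(6)
rates = np.array([1.0, 2.0, 3.0])
n = 10**6

# Columns: independent Exp(lambda_i); numpy uses scale = 1/lambda
samples = rng.exponential(1.0 / rates, size=(n, 3))
m = samples.min(axis=1)

# min should be Exp(sum(rates)) = Exp(6), so E(min) = 1/6
print(m.mean(), 1.0 / rates.sum())
```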
Gamma Distribution
Note that exponential distributions are also Gamma distributions.
- If \(\displaystyle X \sim \Gamma(k, \theta)\) then \(\displaystyle cX \sim \Gamma(k, c\theta)\) for any \(\displaystyle c > 0\) (shape–scale parametrization).
- If \(\displaystyle X_1 \sim \Gamma(k_1, \theta)\) and \(\displaystyle X_2 \sim \Gamma(k_2, \theta)\) are independent then \(\displaystyle X_1 + X_2 \sim \Gamma(k_1 + k_2, \theta)\) (see the sketch after this list).
- If \(\displaystyle X_1 \sim \Gamma(\alpha, \theta)\) and \(\displaystyle X_2 \sim \Gamma(\beta, \theta)\) are independent, then \(\displaystyle \frac{X_1}{X_1 + X_2} \sim \operatorname{Beta}(\alpha, \beta)\).
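A sketch checking the gamma-sum and beta-ratio facts numerically (the shape/scale values are arbitrary; numpy's gamma sampler uses the shape–scale convention):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10**6
x1 = rng.gamma(shape=2.0, scale=1.5, size=n)   # Gamma(k1=2, theta=1.5)
x2 = rng.gamma(shape=3.0, scale=1.5, size=n)   # Gamma(k2=3, theta=1.5)

s = x1 + x2          # should be Gamma(5, 1.5): mean 5 * 1.5 = 7.5
b = x1 / (x1 + x2)   # should be Beta(2, 3): mean 2/5 = 0.4
print(s.mean(), b.mean())
```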
T-distribution
- The ratio of a standard normal to the square root of an independent normalized Chi-squared random variable has a t-distribution.
- If \(\displaystyle Z \sim N(0,1)\) and \(\displaystyle V \sim \chi^2(v)\) are independent then \(\displaystyle \frac{Z}{\sqrt{V/v}} \sim t(v)\)
Chi-Sq Distribution
- The ratio of two independent Chi-squared random variables, each divided by its degrees of freedom, has an F-distribution.
- If \(\displaystyle X \sim \chi^2_{d_1}\) and \(\displaystyle Y \sim \chi^2_{d_2}\) are independent then \(\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)\)
- If \(\displaystyle Z_1,...,Z_k \sim N(0,1)\) are independent then \(\displaystyle Z_1^2 + ... + Z_k^2 \sim \chi^2(k)\) (checked numerically below)
- If \(\displaystyle X_i \sim \chi^2(k_i)\) are independent then \(\displaystyle X_1 + ... + X_n \sim \chi^2(k_1 +...+ k_n)\)
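A sketch checking that a sum of squared standard normals behaves like \(\displaystyle \chi^2(k)\) (k = 5 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(8)
k, n = 5, 10**6

z = rng.standard_normal(size=(n, k))
q = (z**2).sum(axis=1)      # should be chi^2(k)

# chi^2(k) has mean k and variance 2k
print(q.mean(), q.var())    # ~5 and ~10
```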
F Distribution
There are many relationships; see the Wikipedia page. The most important are the connections to the Chi-squared and t-distributions.
- If \(\displaystyle X \sim \chi^2_{d_1}\) and \(\displaystyle Y \sim \chi^2_{d_2}\) are independent then \(\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)\)
- If \(\displaystyle X \sim t_{(n)}\) then \(\displaystyle X^2 \sim F(1, n)\) and \(\displaystyle X^{-2} \sim F(n, 1)\)
Textbooks
- Sheldon Ross' A First Course in Probability
- This is a very good textbook despite the poor reviews on Amazon
- Hogg and Craig's Mathematical Statistics
- Casella and Berger's Statistical Inference