# Probability


Calculus-based Probability

This is content covered in STAT410 and STAT700 at UMD.

## Basics

### Axioms of Probability

• $$\displaystyle 0 \leq P(E) \leq 1$$
• $$\displaystyle P(S) = 1$$ where $$\displaystyle S$$ is your sample space
• For mutually exclusive events $$\displaystyle E_1, E_2, \ldots$$, $$\displaystyle P\left(\bigcup_{i=1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i)$$

### Monotonicity

• For all events $$\displaystyle A$$ and $$\displaystyle B$$, $$\displaystyle A \subset B \implies P(A) \leq P(B)$$
Proof
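
A short sketch of the standard argument, using only the axioms above. The set difference $$\displaystyle B \setminus A$$ is notation introduced here, not taken from the notes. Write $$\displaystyle B$$ as a disjoint union and apply additivity and non-negativity:

$$\displaystyle \begin{aligned} B &= A \cup (B \setminus A), \quad A \cap (B \setminus A) = \emptyset \\ \implies P(B) &= P(A) + P(B \setminus A) \geq P(A) \end{aligned}$$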

## Expectation and Variance

Some definitions and properties.

### Definitions

Let $$\displaystyle X \sim D$$ for some distribution $$\displaystyle D$$. Let $$\displaystyle S$$ be the support or domain of your distribution.

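• $$\displaystyle E(X) = \sum_{x \in S} x p(x)$$ (discrete case) or $$\displaystyle \int_S x p(x) dx$$ (continuous case), where $$\displaystyle p$$ is the pmf or pdf of $$\displaystyle X$$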
• $$\displaystyle Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2$$
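
As a small worked example of the two definitions, the sketch below computes the mean and variance of a fair six-sided die exactly. The die is an assumed example, not part of the course material. It also checks that the two expressions for the variance agree.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, so the support S is {1, ..., 6}
# and the pmf is p(x) = 1/6 for every x in S.
support = range(1, 7)
pmf = {x: Fraction(1, 6) for x in support}

# E(X) = sum over S of x * p(x)
mean = sum(x * pmf[x] for x in support)

# Var(X) from the definition E[(X - E(X))^2]
var_definition = sum((x - mean) ** 2 * pmf[x] for x in support)

# Var(X) from the shortcut E(X^2) - (E(X))^2
second_moment = sum(x ** 2 * pmf[x] for x in support)
var_shortcut = second_moment - mean ** 2

print(mean)            # 7/2
print(var_definition)  # 35/12
print(var_shortcut)    # 35/12
```

Using `Fraction` keeps the arithmetic exact, so the two variance formulas match exactly rather than up to rounding.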

### Total Expectation

$$\displaystyle E_{X}(X) = E_{Y}(E_{X|Y}(X|Y))$$
Dr. Xu refers to this as the smooth property.

Proof

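$$\displaystyle \begin{aligned} E(X) &= \int_S x p(x) dx \\ &= \int_x x \int_y p(x,y) dy dx \\ &= \int_x x \int_y p(x|y) p(y) dy dx \\ &= \int_y \left(\int_x x p(x|y) dx\right) p(y) dy \\ &= \int_y E(X|Y=y) p(y) dy = E_{Y}(E_{X|Y}(X|Y)) \end{aligned}$$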
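
The same identity can be checked numerically in the discrete case. The sketch below uses a small joint pmf invented purely for illustration and compares the direct computation of $$\displaystyle E(X)$$ with the two-stage computation $$\displaystyle E_Y(E(X|Y))$$.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y) on a small grid, chosen only for illustration.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

# Direct computation: E(X) = sum over (x, y) of x * p(x, y)
e_x_direct = sum(x * p for (x, _), p in joint.items())

# Marginal pmf of Y: p(y) = sum over x of p(x, y)
p_y = {}
for (_, y), p in joint.items():
    p_y[y] = p_y.get(y, Fraction(0)) + p

# Inner expectation: E(X | Y = y) = sum over x of x * p(x, y) / p(y)
e_x_given_y = {
    y: sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]
    for y in p_y
}

# Outer expectation: E(E(X | Y)) = sum over y of E(X | Y = y) * p(y)
e_x_tower = sum(e_x_given_y[y] * p_y[y] for y in p_y)

print(e_x_direct, e_x_tower)  # 7/8 7/8
```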

### Total Variance

$$\displaystyle Var(Y) = E(Var(Y|X)) + Var(E(Y | X))$$
This one is not used as often on tests as total expectation.

Proof
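
A sketch of one standard derivation, assuming all second moments are finite. It applies total expectation to $$\displaystyle Y^2$$ and $$\displaystyle Y$$, then uses $$\displaystyle E(Y^2|X) = Var(Y|X) + (E(Y|X))^2$$:

$$\displaystyle \begin{aligned} Var(Y) &= E(Y^2) - (E(Y))^2 \\ &= E(E(Y^2|X)) - (E(E(Y|X)))^2 \\ &= E\left(Var(Y|X) + (E(Y|X))^2\right) - (E(E(Y|X)))^2 \\ &= E(Var(Y|X)) + \left[E\left((E(Y|X))^2\right) - (E(E(Y|X)))^2\right] \\ &= E(Var(Y|X)) + Var(E(Y|X)) \end{aligned}$$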

### Sample Mean and Variance

The sample mean is $$\displaystyle \bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i$$.
The unbiased sample variance is $$\displaystyle S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$.

#### Student's Theorem

Let $$\displaystyle X_1,...,X_n$$ be i.i.d. from $$\displaystyle N(\mu, \sigma^2)$$.
Then the following results about the sample mean $$\displaystyle \bar{X}$$ and the unbiased sample variance $$\displaystyle S^2$$ hold:

• $$\displaystyle \bar{X}$$ and $$\displaystyle S^2$$ are independent
• $$\displaystyle \bar{X} \sim N(\mu, \sigma^2 / n)$$
• $$\displaystyle (n-1)S^2 / \sigma^2 \sim \chi^2(n-1)$$
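
The simulation below is a rough sanity check of the last two bullets, not a proof. It only compares the first two moments with what the theorem predicts, and the values of `mu`, `sigma`, `n`, the number of replicates and the seed are all assumptions chosen for illustration.

```python
import random
import statistics

# Assumed illustration values (not from the notes): mu, sigma, n, and the seed.
mu, sigma, n = 2.0, 3.0, 5
reps = 20_000
rng = random.Random(0)

xbars = []
scaled_vars = []
for _ in range(reps):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    # Unbiased sample variance S^2 with the n - 1 denominator
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    xbars.append(xbar)
    scaled_vars.append((n - 1) * s2 / sigma ** 2)

# X-bar ~ N(mu, sigma^2 / n): mean should be near 2.0, variance near 9 / 5 = 1.8
print(statistics.fmean(xbars), statistics.variance(xbars))

# (n - 1) S^2 / sigma^2 ~ chi^2(n - 1): mean should be near 4, variance near 8
print(statistics.fmean(scaled_vars), statistics.variance(scaled_vars))
```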

## Moments and Moment Generating Functions

### Definitions

We call $$\displaystyle E(X^i)$$ the $$\displaystyle i$$-th moment of $$\displaystyle X$$.
We call $$\displaystyle E((X - E(X))^i)$$ the $$\displaystyle i$$-th central moment of $$\displaystyle X$$.
Therefore the mean is the first moment and the variance is the second central moment.

### Moment Generating Functions

$$\displaystyle M_X(t) = E(e^{tX})$$
We call this the moment generating function (mgf).
Differentiating it $$\displaystyle k$$ times with respect to $$\displaystyle t$$ and setting $$\displaystyle t=0$$ gives the $$\displaystyle k$$-th moment: $$\displaystyle M_X^{(k)}(0) = E(X^k)$$.

Notes
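• The mgf, if it exists in a neighborhood of $$\displaystyle t=0$$, uniquely determines the distribution.
• If $$\displaystyle X$$ and $$\displaystyle Y$$ are independent, the mgf of $$\displaystyle X+Y$$ is $$\displaystyle E(e^{t(X+Y)})=E(e^{tX})E(e^{tY})$$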
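
To illustrate how differentiating the mgf at $$\displaystyle t=0$$ recovers moments, the sketch below takes a Poisson random variable as an assumed example, with mgf $$\displaystyle \exp(\lambda(e^t - 1))$$. It approximates the first two derivatives at zero with central finite differences. The rate `lam` and step size `h` are illustrative choices.

```python
import math

# Assumed example: X ~ Poisson(lam), whose mgf is M(t) = exp(lam * (e^t - 1)).
lam = 2.0

def mgf(t):
    return math.exp(lam * (math.exp(t) - 1.0))

h = 1e-3  # step size for the central finite differences

# M'(0) approximates the first moment E(X) = lam
first_moment = (mgf(h) - mgf(-h)) / (2 * h)

# M''(0) approximates the second moment E(X^2) = lam + lam^2
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2

# Var(X) = E(X^2) - (E(X))^2, which should be close to lam
variance = second_moment - first_moment ** 2

print(first_moment, second_moment, variance)
```

The printed values should be close to 2, 6 and 2, up to the finite-difference error.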

## Convergence

There are 4 common types of convergence.

### Almost Surely

• $$\displaystyle P(\lim_{i \rightarrow \infty} X_i = X) = 1$$

### In Probability

For all $$\displaystyle \epsilon \gt 0$$,
$$\displaystyle \lim_{i \rightarrow \infty} P(|X_i - X| \geq \epsilon) = 0$$

• Implies Convergence in distribution

### In Distribution

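Pointwise convergence of the cdf.
A sequence of random variables $$\displaystyle X_1, X_2, \ldots$$ with cdfs $$\displaystyle F_1, F_2, \ldots$$ converges to $$\displaystyle X$$ (with cdf $$\displaystyle F$$) in distribution if for all $$\displaystyle x$$ at which $$\displaystyle F$$ is continuous,
$$\displaystyle \lim_{i \rightarrow \infty} F_i(x) = F(x)$$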

• Equivalent to convergence in probability if it converges to a degenerate distribution (i.e. a number)

### In Mean Squared

$$\displaystyle \lim_{i \rightarrow \infty} E(|X_i-X|^2)=0$$

## Delta Method

Suppose $$\displaystyle \sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)$$.
Let $$\displaystyle g$$ be a function such that $$\displaystyle g'$$ exists and $$\displaystyle g'(\theta) \neq 0$$.
Then $$\displaystyle \sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)$$.

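Multivariate (here $$\displaystyle h'(\beta)$$ denotes the gradient of $$\displaystyle h$$ at $$\displaystyle \beta$$):
$$\displaystyle \sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, h'(\beta)^T \Sigma h'(\beta))$$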

Notes
• You can think of this like the Mean Value theorem for random variables.
• $$\displaystyle (g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)$$
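
A simulation can make the univariate statement concrete. The sketch below assumes i.i.d. $$\displaystyle \operatorname{Exp}(1)$$ data, so that $$\displaystyle \theta = 1$$ and $$\displaystyle \sigma^2 = 1$$, and takes $$\displaystyle g(x) = \log(x)$$, so that $$\displaystyle g'(\theta) = 1$$. These choices, along with `n`, `reps` and the seed, are illustrative and not from the notes. The delta method then predicts a limiting variance of 1.

```python
import math
import random
import statistics

# Assumed setup (not from the notes): X_1, ..., X_n i.i.d. Exp(1), so the
# sample mean X_n has theta = 1 and sigma^2 = 1, and we take g(x) = log(x).
# Then g'(theta) = 1 and the delta method predicts a limiting variance of
# sigma^2 * g'(theta)^2 = 1 for sqrt(n) * (g(X_n) - g(theta)).
n, reps = 200, 4_000
rng = random.Random(1)
theta = 1.0

scaled = []
for _ in range(reps):
    xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    scaled.append(math.sqrt(n) * (math.log(xbar) - math.log(theta)))

print(statistics.fmean(scaled))     # should be near 0
print(statistics.variance(scaled))  # should be near 1
```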

## Inequalities and Limit Theorems

### Markov's Inequality

Let $$\displaystyle X$$ be a non-negative random variable and let $$\displaystyle a \gt 0$$.
Then $$\displaystyle P(X \geq a) \leq \frac{E(X)}{a}$$

Proof

$$\displaystyle \begin{aligned} E(X) &= \int_{0}^{\infty}xf(x)dx \\ &= \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}af(x)dx\\ &=a \int_{a}^{\infty}f(x)dx\\ &=a P(X \geq a)\\ \implies& P(X \geq a) \leq \frac{E(X)}{a} \end{aligned}$$

### Chebyshev's Inequality

• $$\displaystyle P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}$$
• $$\displaystyle P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}$$
Proof

Apply Markov's inequality to $$\displaystyle Y^2$$, where $$\displaystyle Y = |X - \mu|$$:
$$\displaystyle P(|X - \mu| \geq k) = P(Y \geq k) = P(Y^2 \geq k^2) \leq \frac{E(Y^2)}{k^2} = \frac{E((X - \mu)^2)}{k^2} = \frac{\sigma^2}{k^2}$$

• Usually used to prove convergence in probability
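
As a concrete check of Markov's and Chebyshev's inequalities, the sketch below evaluates both sides exactly for a fair six-sided die. The die and the thresholds `a = 5` and `k = 2` are assumed examples. Both bounds hold but are far from tight, which is typical.

```python
from fractions import Fraction

# Hypothetical example: X is a fair six-sided die (non-negative, so Markov applies).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())                   # E(X) = 7/2
sigma2 = sum((x - mu) ** 2 * p for x, p in pmf.items())   # Var(X) = 35/12

# Markov: P(X >= a) <= E(X) / a, here with a = 5
a = 5
markov_lhs = sum(p for x, p in pmf.items() if x >= a)     # 1/3
markov_rhs = mu / a                                       # 7/10

# Chebyshev: P(|X - mu| >= k) <= sigma^2 / k^2, here with k = 2
k = 2
cheb_lhs = sum(p for x, p in pmf.items() if abs(x - mu) >= k)  # 1/3
cheb_rhs = sigma2 / k ** 2                                     # 35/48

print(markov_lhs, "<=", markov_rhs)
print(cheb_lhs, "<=", cheb_rhs)
```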

### Central Limit Theorem

Very very important. Never forget this.
For i.i.d. samples from any distribution with finite variance, the standardized sample mean converges in distribution to a normal.
Let $$\displaystyle \mu = E(X)$$ and $$\displaystyle \sigma^2 = Var(X)$$.
Different ways of saying the same thing:

• $$\displaystyle \sqrt{n}(\bar{X}_n - \mu) \xrightarrow{D} N(0, \sigma^2)$$
• $$\displaystyle \frac{\sqrt{n}}{\sigma}(\bar{X}_n - \mu) \xrightarrow{D} N(0, 1)$$
• For large $$\displaystyle n$$, $$\displaystyle \bar{X}_n$$ is approximately $$\displaystyle N(\mu, \sigma^2/n)$$
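
The simulation below is a quick illustration of the central limit theorem on a non-normal distribution. It assumes Uniform(0, 1) data with `n = 30` per sample, both chosen only for illustration. It standardizes each sample mean and checks that roughly 95% of the values fall within $$\displaystyle \pm 1.96$$, as they would for a standard normal.

```python
import math
import random
import statistics

# Assumed setup (not from the notes): X_i ~ Uniform(0, 1), which has
# mu = 1/2 and sigma^2 = 1/12.
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
n, reps = 30, 20_000
rng = random.Random(2)

# Standardized sample means: sqrt(n) * (x_bar - mu) / sigma
z = []
for _ in range(reps):
    xbar = statistics.fmean(rng.random() for _ in range(n))
    z.append(math.sqrt(n) * (xbar - mu) / sigma)

# For a standard normal, about 95% of the mass lies within +/- 1.96.
coverage = sum(abs(v) <= 1.96 for v in z) / reps

# Mean should be near 0, variance near 1, coverage near 0.95
print(statistics.fmean(z), statistics.variance(z), coverage)
```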

### Law of Large Numbers

The sample mean converges to the population mean in probability.
For all $$\displaystyle \epsilon \gt 0$$, $$\displaystyle \lim_{n \rightarrow \infty} P(|\bar{X}_n - E(X)| \geq \epsilon) = 0$$

Notes
• The strong law of large numbers states that the sample mean also converges to the population mean almost surely.

## Properties and Relationships between distributions

This is important for exams.

### Poisson Distribution

• If $$\displaystyle X_i \sim Poisson(\lambda_i)$$ are independent then $$\displaystyle \sum X_i \sim Poisson(\sum \lambda_i)$$
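
For two independent Poisson variables, the additivity property can be checked directly by convolving the two pmfs. The sketch below does this for assumed rates `lam1 = 1.5` and `lam2 = 2.5` and compares the result with the pmf of a Poisson with rate `lam1 + lam2`.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Assumed illustration values for the two rates.
lam1, lam2 = 1.5, 2.5

def sum_pmf(m):
    """P(X1 + X2 = m) for independent X1, X2, by discrete convolution."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(m - j, lam2) for j in range(m + 1))

for m in range(6):
    # The two columns should agree up to floating-point rounding.
    print(m, sum_pmf(m), poisson_pmf(m, lam1 + lam2))
```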

### Normal Distribution

• If $$\displaystyle X_1 \sim N(\mu_1, \sigma_1^2)$$ and $$\displaystyle X_2 \sim N(\mu_2, \sigma_2^2)$$ are independent then $$\displaystyle \lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 \mu_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2)$$ for any $$\displaystyle \lambda_1, \lambda_2 \in \mathbb{R}$$

### Exponential Distribution

• $$\displaystyle \operatorname{Exp}(\lambda)$$ is equivalent to $$\displaystyle \Gamma(1, 1/\lambda)$$
• Note that some conventions flip the second parameter of gamma, so it would be $$\displaystyle \Gamma(1, \lambda)$$
• If $$\displaystyle \epsilon_1, ..., \epsilon_n$$ are independent with $$\displaystyle \epsilon_i \sim \operatorname{Exp}(\lambda_i)$$ then $$\displaystyle \min\{\epsilon_i\} \sim \operatorname{Exp}(\sum \lambda_i)$$
• Note that the maximum is not exponentially distributed
• However, if $$\displaystyle X_1, ..., X_n$$ are i.i.d. $$\displaystyle \operatorname{Exp}(1)$$ then $$\displaystyle Z_n=n\exp(-\max\{X_i\}) \rightarrow \operatorname{Exp}(1)$$ in distribution
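
The claim about the minimum can be sanity-checked by simulation. The sketch below assumes three independent exponentials with illustrative rates 0.5, 1.0 and 2.5. It compares the sample mean and one tail probability of the minimum with the values implied by an exponential whose rate is the sum of the rates.

```python
import random
import statistics

# Assumed illustration values: three independent exponentials with these rates.
rates = [0.5, 1.0, 2.5]
total_rate = sum(rates)  # 4.0
reps = 50_000
rng = random.Random(3)

mins = [min(rng.expovariate(lam) for lam in rates) for _ in range(reps)]

# If min ~ Exp(total_rate), its mean is 1 / total_rate = 0.25 and
# P(min > t) = exp(-total_rate * t).
t = 0.2
mean_min = statistics.fmean(mins)
tail = sum(m > t for m in mins) / reps

print(mean_min)  # should be near 0.25
print(tail)      # should be near exp(-0.8), about 0.449
```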

### Gamma Distribution

Note that exponential distributions are also Gamma distributions.

• If $$\displaystyle X \sim \Gamma(k, \theta)$$ then $$\displaystyle cX \sim \Gamma(k, c\theta)$$ for any $$\displaystyle c \gt 0$$.
• If $$\displaystyle X_1 \sim \Gamma(k_1, \theta)$$ and $$\displaystyle X_2 \sim \Gamma(k_2, \theta)$$ are independent then $$\displaystyle X_1 + X_2 \sim \Gamma(k_1 + k_2, \theta)$$.
• If $$\displaystyle X_1 \sim \Gamma(\alpha, \theta)$$ and $$\displaystyle X_2 \sim \Gamma(\beta, \theta)$$ are independent, then $$\displaystyle \frac{X_1}{X_1 + X_2} \sim B(\alpha, \beta)$$.

### T-distribution

• The ratio of a standard normal and the square root of an independent, normalized Chi-sq random variable yields a T-distribution.
• If $$\displaystyle Z \sim N(0,1)$$ and $$\displaystyle V \sim \chi^2(v)$$ are independent then $$\displaystyle \frac{Z}{\sqrt{V/v}} \sim t(v)$$

### Chi-Sq Distribution

• The ratio of two independent normalized Chi-sq random variables follows an F-distribution
• If $$\displaystyle X \sim \chi^2_{d_1}$$ and $$\displaystyle Y \sim \chi^2_{d_2}$$ are independent then $$\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)$$
• If $$\displaystyle Z_1,...,Z_k$$ are i.i.d. $$\displaystyle N(0,1)$$ then $$\displaystyle Z_1^2 + ... + Z_k^2 \sim \chi^2(k)$$
• If $$\displaystyle X_i \sim \chi^2(k_i)$$ are independent then $$\displaystyle X_1 + ... + X_n \sim \chi^2(k_1 +...+ k_n)$$
• $$\displaystyle \chi^2(k)$$ is equivalent to $$\displaystyle \Gamma(k/2, 2)$$
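
The sum-of-squares characterization can be checked by simulation. The sketch below assumes `k = 4` standard normals per replicate, an illustrative choice. It compares the sample mean and variance of the sum of squares with the values $$\displaystyle k$$ and $$\displaystyle 2k$$ implied by a Gamma distribution with shape $$\displaystyle k/2$$ and scale 2.

```python
import random
import statistics

# Assumed illustration values: k standard normals per replicate.
k, reps = 4, 20_000
rng = random.Random(4)

# Each entry is Z_1^2 + ... + Z_k^2 for fresh i.i.d. standard normals.
q = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(reps)]

# A Gamma(k/2, 2) variable (shape k/2, scale 2) has mean k and variance 2k.
print(statistics.fmean(q))     # should be near 4
print(statistics.variance(q))  # should be near 8
```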

### F Distribution

Too many to list. See Wikipedia: F-distribution.

The most important relationships are with the Chi-sq and T distributions:

• If $$\displaystyle X \sim \chi^2_{d_1}$$ and $$\displaystyle Y \sim \chi^2_{d_2}$$ are independent then $$\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)$$
• If $$\displaystyle X \sim t_{(n)}$$ then $$\displaystyle X^2 \sim F(1, n)$$ and $$\displaystyle X^{-2} \sim F(n, 1)$$

## Textbooks

The books below cover both introductory probability as well as statistics.