# Probability

Calculus-based Probability

## Basics

### Axioms of Probability

• ${\displaystyle 0\leq P(E)\leq 1}$
• ${\displaystyle P(S)=1}$ where ${\displaystyle S}$ is your sample space
• For mutually exclusive events ${\displaystyle E_{1},E_{2},...}$, ${\displaystyle P\left(\bigcup _{i=1}^{\infty }E_{i}\right)=\sum _{i=1}^{\infty }P(E_{i})}$

### Monotonicity

• For all events ${\displaystyle A}$, ${\displaystyle B}$, ${\displaystyle A\subset B\implies P(A)\leq P(B)}$
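Proof

Write ${\displaystyle B=A\cup (B\setminus A)}$ as a disjoint union. Then ${\displaystyle P(B)=P(A)+P(B\setminus A)\geq P(A)}$.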

## Expectation and Variance

Some definitions and properties.

### Definitions

Let ${\displaystyle X\sim D}$ for some distribution ${\displaystyle D}$. Let ${\displaystyle S}$ be the support or domain of your distribution.

• ${\displaystyle E(X)=\sum _{S}xp(x)}$ or ${\displaystyle \int _{S}xp(x)dx}$
• ${\displaystyle Var(X)=E[(X-E(X))^{2}]=E(X^{2})-(E(X))^{2}}$
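As a quick sanity check of the two definitions, here is a minimal sketch for the discrete case. The fair die and the helper names `expectation` and `variance` are illustrative assumptions, not part of the notes; exact fractions are used so both variance formulas can be compared without rounding.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """Return E[g(X)] for a discrete pmf given as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    """Return Var(X) via the shortcut E(X^2) - (E(X))^2."""
    mean = expectation(pmf)
    return expectation(pmf, lambda x: x * x) - mean ** 2

# Hypothetical example: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = expectation(die)
var_shortcut = variance(die)
# Same quantity from the definition E[(X - E(X))^2].
var_definition = expectation(die, lambda x: (x - mean) ** 2)
print(mean, var_shortcut, var_definition)  # 7/2 35/12 35/12
```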

### Total Expectation

${\displaystyle E(X)=E(E(X|Y))}$
Dr. Xu refers to this as the smooth property.

Proof

${\displaystyle E(X)=\int _{S}xp(x)dx=\int _{x}x\int _{y}p(x,y)dydx=\int _{x}x\int _{y}p(x|y)p(y)dydx=\int _{y}\int _{x}xp(x|y)dxp(y)dy=\int _{y}E(X|Y=y)p(y)dy=E(E(X|Y))}$
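The identity can also be checked numerically in the discrete case. The sketch below uses a small, made-up joint pmf (an assumption for illustration) and compares the direct value of E(X) with the average of the conditional means.

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), stored as {(x, y): probability}.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def marginal_y(joint):
    """Return p(y) = sum over x of p(x, y)."""
    p_y = {}
    for (x, y), p in joint.items():
        p_y[y] = p_y.get(y, 0) + p
    return p_y

def cond_mean_x(joint, y, p_y):
    """Return E(X | Y = y) = sum over x of x * p(x, y) / p(y)."""
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y[y]

p_y = marginal_y(joint)
# E(X) computed directly from the joint pmf.
direct = sum(x * p for (x, _), p in joint.items())
# E(E(X|Y)): conditional means weighted by p(y).
tower = sum(cond_mean_x(joint, y, p_y) * p for y, p in p_y.items())
print(direct, tower)  # 7/8 7/8
```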

### Total Variance

${\displaystyle Var(Y)=E(Var(Y|X))+Var(E(Y|X))}$
This one is not used as often on tests as total expectation.
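A small numerical check of the decomposition, again on a made-up joint pmf (the numbers and the helper `mean_var` are assumptions for illustration only):

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), stored as {(x, y): probability}.
joint = {
    (0, 0): Fraction(1, 4), (0, 2): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (1, 5): Fraction(3, 8),
}

def mean_var(pmf):
    """Return (mean, variance) of a discrete pmf {value: probability}."""
    m = sum(v * p for v, p in pmf.items())
    return m, sum((v - m) ** 2 * p for v, p in pmf.items())

# Marginals p(x) and p(y).
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p
    p_y[y] = p_y.get(y, 0) + p

# Conditional mean and variance of Y for each value of X.
cond = {}
for x0, px in p_x.items():
    cond_pmf = {y: p / px for (x, y), p in joint.items() if x == x0}
    cond[x0] = mean_var(cond_pmf)

_, var_y = mean_var(p_y)
# E(Var(Y|X)): conditional variances weighted by p(x).
expected_cond_var = sum(cond[x][1] * p_x[x] for x in p_x)
# Var(E(Y|X)): spread of the conditional means around E(Y).
mean_y = sum(cond[x][0] * p_x[x] for x in p_x)
var_cond_mean = sum((cond[x][0] - mean_y) ** 2 * p_x[x] for x in p_x)
print(var_y, expected_cond_var + var_cond_mean)  # 17/4 17/4
```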

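Proof

Apply total expectation to both terms of ${\displaystyle Var(Y)=E(Y^{2})-(E(Y))^{2}}$:
${\displaystyle Var(Y)=E(E(Y^{2}|X))-(E(E(Y|X)))^{2}=E(Var(Y|X)+(E(Y|X))^{2})-(E(E(Y|X)))^{2}=E(Var(Y|X))+Var(E(Y|X))}$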

## Moments and Moment Generating Functions

### Definitions

We call ${\displaystyle E(X^{i})}$ the i'th moment of ${\displaystyle X}$.
We call ${\displaystyle E((X-E(X))^{i})}$ the i'th central moment of ${\displaystyle X}$.
Therefore the mean is the first moment and the variance is the second central moment.

### Moment Generating Functions

${\displaystyle M_{X}(t)=E(e^{tX})}$
We call this the moment generating function (mgf).
Differentiating ${\displaystyle n}$ times with respect to ${\displaystyle t}$ and setting ${\displaystyle t=0}$ gives the n'th moment: ${\displaystyle M_{X}^{(n)}(0)=E(X^{n})}$.

Notes
• The mgf, if it exists, uniquely defines the distribution.
• If ${\displaystyle X}$ and ${\displaystyle Y}$ are independent, the mgf of ${\displaystyle X+Y}$ is ${\displaystyle E(e^{t(X+Y)})=E(e^{tX})E(e^{tY})}$
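To see both properties in action, the sketch below evaluates the mgf of a fair die (a hypothetical example) and approximates its first two derivatives at zero with central finite differences. It then checks the product rule for the sum of two independent dice. The step size `h` is an arbitrary choice and the derivative values are approximate.

```python
import math

def mgf(t, pmf):
    """Return E(e^{tX}) for a discrete pmf {value: probability}."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

def first_two_moments(pmf, h=1e-4):
    """Approximate the first and second derivative of the mgf at t = 0."""
    m_plus, m_zero, m_minus = mgf(h, pmf), mgf(0.0, pmf), mgf(-h, pmf)
    first = (m_plus - m_minus) / (2 * h)
    second = (m_plus - 2 * m_zero + m_minus) / (h * h)
    return first, second

die = {x: 1 / 6 for x in range(1, 7)}  # hypothetical example: fair die
m1, m2 = first_two_moments(die)
exact_m1 = sum(x * p for x, p in die.items())      # E(X)
exact_m2 = sum(x * x * p for x, p in die.items())  # E(X^2)
print(m1, exact_m1, m2, exact_m2)

# Sum of two independent dice: its mgf should equal the product of the mgfs.
two_dice = {}
for x, p in die.items():
    for y, q in die.items():
        two_dice[x + y] = two_dice.get(x + y, 0.0) + p * q
t = 0.3
mgf_of_sum = mgf(t, two_dice)
product_of_mgfs = mgf(t, die) ** 2
print(mgf_of_sum, product_of_mgfs)
```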

## Convergence

There are 4 types of convergence typically taught in undergraduate courses.
See Wikipedia Convergence of random variables

### Almost Surely

• ${\displaystyle P(\lim X_{i}=X)=1}$

### In Probability

For all ${\displaystyle \epsilon >0}$
${\displaystyle \lim P(|X_{i}-X|\geq \epsilon )=0}$

• Implies Convergence in distribution

### In Distribution

Pointwise convergence of the cdf
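A sequence of random variables ${\displaystyle X_{1},...}$ with cdfs ${\displaystyle F_{1},...}$ converges to ${\displaystyle X}$ (with cdf ${\displaystyle F}$) in distribution if for all ${\displaystyle x}$ at which ${\displaystyle F}$ is continuous,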
${\displaystyle \lim _{i\rightarrow \infty }F_{i}(x)=F(x)}$

• Equivalent to convergence in probability if it converges to a degenerate distribution (i.e. a number)

### In Mean Squared

${\displaystyle \lim _{i\rightarrow \infty }E(|X_{i}-X|^{2})=0}$

## Delta Method

See Wikipedia
Suppose ${\displaystyle {\sqrt {n}}(X_{n}-\theta ){\xrightarrow {D}}N(0,\sigma ^{2})}$.
Let ${\displaystyle g}$ be a function such that ${\displaystyle g'}$ exists and ${\displaystyle g'(\theta )\neq 0}$
Then ${\displaystyle {\sqrt {n}}(g(X_{n})-g(\theta )){\xrightarrow {D}}N(0,\sigma ^{2}g'(\theta )^{2})}$
Multivariate:
${\displaystyle {\sqrt {n}}(B-\beta ){\xrightarrow {D}}N(0,\Sigma )\implies {\sqrt {n}}(h(B)-h(\beta )){\xrightarrow {D}}N(0,h'(\beta )^{T}\Sigma h'(\beta ))}$

Notes
• You can think of this like the Mean Value theorem for random variables.
${\displaystyle (g(X_{n})-g(\theta ))\approx g'(\theta )(X_{n}-\theta )}$
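A rough simulation can make the univariate statement concrete. The sketch below assumes Exp(1) data (so θ = 1 and σ² = 1) and the hypothetical choice g(x) = x², and compares the simulated variance of √n(g(X_n) − g(θ)) with the predicted σ²g'(θ)². The sample sizes and seed are arbitrary, and the simulated value only approximates the limit.

```python
import math
import random

def scaled_errors(n, reps, g, theta, rng):
    """Draw reps copies of sqrt(n) * (g(X_n) - g(theta)), X_n a mean of n Exp(1) draws."""
    out = []
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        out.append(math.sqrt(n) * (g(x_bar) - g(theta)))
    return out

def sample_variance(values):
    """Return the unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)          # fixed seed so the sketch is repeatable
theta, sigma2 = 1.0, 1.0        # mean and variance of Exp(1)
g = lambda x: x * x             # hypothetical g, with g'(theta) = 2 * theta
errors = scaled_errors(n=400, reps=2000, g=g, theta=theta, rng=rng)
predicted = sigma2 * (2 * theta) ** 2   # sigma^2 * g'(theta)^2
simulated = sample_variance(errors)
print(predicted, simulated)     # the two should be reasonably close
```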

## Inequalities and Limit Theorems

### Markov's Inequality

Let ${\displaystyle X}$ be a non-negative random variable.
Then for any ${\displaystyle a>0}$, ${\displaystyle P(X\geq a)\leq {\frac {E(X)}{a}}}$

Proof

${\displaystyle {\begin{aligned}E(X)&=\int _{0}^{\infty }xf(x)dx\\&=\int _{0}^{a}xf(x)dx+\int _{a}^{\infty }xf(x)dx\\&\geq \int _{a}^{\infty }xf(x)dx\\&\geq \int _{a}^{\infty }af(x)dx\\&=a\int _{a}^{\infty }f(x)dx\\&=a\,P(X\geq a)\\\implies &P(X\geq a)\leq {\frac {E(X)}{a}}\end{aligned}}}$
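The bound is easy to verify on a concrete non-negative distribution. The pmf below is made up for illustration, and `markov_check` is a hypothetical helper that returns both sides of the inequality.

```python
from fractions import Fraction

def markov_check(pmf, a):
    """Return (P(X >= a), E(X)/a) for a non-negative discrete pmf and a > 0."""
    tail = sum(p for x, p in pmf.items() if x >= a)
    mean = sum(x * p for x, p in pmf.items())
    return tail, mean / a

# Hypothetical non-negative distribution with E(X) = 2.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 8), 10: Fraction(1, 8)}
results = {a: markov_check(pmf, a) for a in (1, 2, 4, 10)}
for a, (tail, bound) in results.items():
    print(a, tail, bound)  # the tail never exceeds the bound
```

Note that the bound is loose for small a (it exceeds 1 at a = 1) and only becomes informative once a is larger than E(X).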

### Chebyshev's Inequality

• ${\displaystyle P(|X-\mu |\geq k\sigma )\leq {\frac {1}{k^{2}}}}$
• ${\displaystyle P(|X-\mu |\geq k)\leq {\frac {\sigma ^{2}}{k^{2}}}}$
Proof

Apply Markov's inequality:
Let ${\displaystyle Y=|X-\mu |}$. Then ${\displaystyle P(|X-\mu |\geq k)=P(Y\geq k)=P(Y^{2}\geq k^{2})\leq {\frac {E(Y^{2})}{k^{2}}}={\frac {E((X-\mu )^{2})}{k^{2}}}={\frac {\sigma ^{2}}{k^{2}}}}$

• Usually used to prove convergence in probability
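As with Markov's inequality, the second form of the bound can be checked exactly on a small example. The fair die and the helper `chebyshev_check` are assumptions for illustration.

```python
from fractions import Fraction

def chebyshev_check(pmf, k):
    """Return (P(|X - mu| >= k), sigma^2 / k^2) for a discrete pmf and k > 0."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    tail = sum(p for x, p in pmf.items() if abs(x - mu) >= k)
    return tail, var / k ** 2

die = {x: Fraction(1, 6) for x in range(1, 7)}  # hypothetical example: fair die
checks = {k: chebyshev_check(die, k) for k in (1, 2, 3)}
for k, (tail, bound) in checks.items():
    print(k, tail, bound)  # the tail never exceeds the bound
```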

### Central Limit Theorem

Very very important. Never forget this.
For i.i.d. samples from any distribution with finite variance, the standardized sample mean converges in distribution to a normal.
Let ${\displaystyle \mu =E(X)}$ and ${\displaystyle \sigma ^{2}=Var(X)}$.
Different ways of saying the same thing:

• ${\displaystyle {\sqrt {n}}({\bar {x}}-\mu ){\xrightarrow {D}}N(0,\sigma ^{2})}$
• ${\displaystyle {\frac {\sqrt {n}}{\sigma }}({\bar {x}}-\mu ){\xrightarrow {D}}N(0,1)}$
• For large ${\displaystyle n}$, ${\displaystyle {\bar {x}}}$ is approximately ${\displaystyle N(\mu ,\sigma ^{2}/n)}$
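One way to get a feel for the theorem is to simulate it. The sketch below assumes Uniform(0, 1) data, standardizes many sample means, and measures how often they land within ±1.96, which would be about 95% for an exact standard normal. The sample size, replication count, and seed are arbitrary choices.

```python
import math
import random

def standardized_means(n, reps, rng):
    """Return reps values of sqrt(n) * (x_bar - mu) / sigma for Uniform(0, 1) samples."""
    mu, sigma = 0.5, math.sqrt(1 / 12)   # mean and sd of Uniform(0, 1)
    out = []
    for _ in range(reps):
        x_bar = sum(rng.random() for _ in range(n)) / n
        out.append(math.sqrt(n) * (x_bar - mu) / sigma)
    return out

rng = random.Random(0)                   # fixed seed so the sketch is repeatable
z = standardized_means(n=30, reps=4000, rng=rng)
# Fraction of standardized means inside the central 95% band of N(0, 1).
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(coverage)                          # expected to be near 0.95
```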

### Law of Large Numbers

The sample mean converges to the population mean in probability.
For all ${\displaystyle \epsilon >0}$, ${\displaystyle \lim _{n\rightarrow \infty }P(|{\bar {X}}_{n}-E(X)|\geq \epsilon )=0}$

Notes
• The statement above is the weak law. The strong law says the sample mean also converges to the population mean almost surely.
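A short simulation shows the running sample mean settling toward the population mean. This is a sketch assuming Uniform(0, 1) draws with a fixed seed. A single simulated path illustrates the behaviour but does not prove either law.

```python
import random

def running_means(draws):
    """Yield the sample mean after each new observation."""
    total = 0.0
    for i, x in enumerate(draws, start=1):
        total += x
        yield total / i

rng = random.Random(0)                           # fixed seed for repeatability
draws = [rng.random() for _ in range(100_000)]   # Uniform(0, 1), E(X) = 0.5
means = list(running_means(draws))
for n in (10, 1_000, 100_000):
    # Absolute gap between the sample mean of the first n draws and E(X).
    print(n, abs(means[n - 1] - 0.5))
```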

## Relationships between distributions

This is important for tests.
See Relationships among probability distributions.

### Poisson Distributions

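The sum of independent Poisson random variables is Poisson, with rate equal to the sum of the rates: if ${\displaystyle X_{i}\sim Poisson(\lambda _{i})}$ are independent, then ${\displaystyle \sum _{i}X_{i}\sim Poisson\left(\sum _{i}\lambda _{i}\right)}$.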
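This can be verified numerically by convolving two Poisson pmfs and comparing against the Poisson pmf with the summed rate. The rates below are hypothetical, and the check only covers the first 20 values of k.

```python
import math

def poisson_pmf(k, lam):
    """Return P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def sum_pmf(k, lam1, lam2):
    """Return P(X1 + X2 = k) by convolving two independent Poisson pmfs."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))

lam1, lam2 = 1.5, 2.5   # hypothetical rates
# Largest gap between the convolution and Poisson(lam1 + lam2) over k = 0..19.
max_gap = max(
    abs(sum_pmf(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2)) for k in range(20)
)
print(max_gap)          # should be at floating-point rounding level
```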

### Normal Distributions

• If ${\displaystyle X_{1}\sim N(\mu _{1},\sigma _{1}^{2})}$ and ${\displaystyle X_{2}\sim N(\mu _{2},\sigma _{2}^{2})}$ are independent, then ${\displaystyle \lambda _{1}X_{1}+\lambda _{2}X_{2}\sim N(\lambda _{1}\mu _{1}+\lambda _{2}\mu _{2},\lambda _{1}^{2}\sigma _{1}^{2}+\lambda _{2}^{2}\sigma _{2}^{2})}$ for any ${\displaystyle \lambda _{1},\lambda _{2}\in \mathbb {R} }$

### Gamma Distributions

Note that exponential distributions are also Gamma distributions. Below, ${\displaystyle \Gamma (k,\theta )}$ has shape ${\displaystyle k}$ and scale ${\displaystyle \theta }$.

• If ${\displaystyle X\sim \Gamma (k,\theta )}$ then ${\displaystyle cX\sim \Gamma (k,c\theta )}$ for any ${\displaystyle c>0}$.
• If ${\displaystyle X_{1}\sim \Gamma (k_{1},\theta )}$ and ${\displaystyle X_{2}\sim \Gamma (k_{2},\theta )}$ are independent, then ${\displaystyle X_{1}+X_{2}\sim \Gamma (k_{1}+k_{2},\theta )}$.
• If ${\displaystyle X_{1}\sim \Gamma (\alpha ,\theta )}$ and ${\displaystyle X_{2}\sim \Gamma (\beta ,\theta )}$ are independent, then ${\displaystyle {\frac {X_{1}}{X_{1}+X_{2}}}\sim B(\alpha ,\beta )}$.
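The Gamma-to-Beta relationship can be spot-checked by simulation. The sketch below uses Python's `random.gammavariate`, whose second argument is the scale, with hypothetical parameters, and compares the sample mean of the ratio with the Beta mean α/(α+β). Matching one moment is only a partial check, not a proof that the distributions agree.

```python
import random

def gamma_ratio_samples(alpha, beta, theta, reps, rng):
    """Draw X1 / (X1 + X2) with independent X1 ~ Gamma(alpha, theta), X2 ~ Gamma(beta, theta)."""
    out = []
    for _ in range(reps):
        x1 = rng.gammavariate(alpha, theta)   # shape alpha, scale theta
        x2 = rng.gammavariate(beta, theta)    # shape beta, scale theta
        out.append(x1 / (x1 + x2))
    return out

rng = random.Random(0)                        # fixed seed for repeatability
alpha, beta, theta = 2.0, 3.0, 1.5            # hypothetical parameters
ratios = gamma_ratio_samples(alpha, beta, theta, reps=20_000, rng=rng)
sample_mean = sum(ratios) / len(ratios)
beta_mean = alpha / (alpha + beta)            # mean of a Beta(alpha, beta) variable
print(sample_mean, beta_mean)                 # the two should be close
```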

### T-distribution

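The ratio of a standard normal to the square root of an independent Chi-sq divided by its degrees of freedom yields a T-distribution: if ${\displaystyle Z\sim N(0,1)}$ and ${\displaystyle V\sim \chi _{\nu }^{2}}$ are independent, then ${\displaystyle {\frac {Z}{\sqrt {V/\nu }}}\sim t_{\nu }}$.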

### Chi-Sq Distribution

The ratio of two independent Chi-sq random variables, each divided by its degrees of freedom, follows an F-distribution.

### F Distribution

Too many to list. See the Wikipedia page. The most important relationships are with the Chi-sq and T distributions.