Probability: Difference between revisions

From David's Wiki
No edit summary
 
(48 intermediate revisions by the same user not shown)
Line 1: Line 1:
Calculus-based Probability
Calculus-based Probability
This is content covered in STAT410 and STAT700 at UMD.


==Basics==
==Basics==
Line 6: Line 8:
* <math>P(S) = 1</math> where <math>S</math> is your sample space
* <math>P(S) = 1</math> where <math>S</math> is your sample space
* For mutually exclusive events <math>E_1, E_2, ...</math>, <math>P\left(\bigcup_i^\infty E_i\right) = \sum_i^\infty P(E_i)</math>
* For mutually exclusive events <math>E_1, E_2, ...</math>, <math>P\left(\bigcup_i^\infty E_i\right) = \sum_i^\infty P(E_i)</math>
===Monotonicity===
===Monotonicity===
* For all events <math>A</math>, <math>B</math>, <math>A \subset B \implies P(A) \leq P(B)</math>
* For all events <math>A</math> and <math>B</math>, <math>A \subset B \implies P(A) \leq P(B)</math>
{{hidden | Proof | }}
{{hidden | Proof | }}
===Conditional Probability===
<math>P(A|B)</math> is the probability of event A given event B.<br>
Mathematically, this is defined as <math>P(A|B) = P(A,B) / P(B)</math>.<br>
Note that this can also be written as <math>P(A|B)P(B) = P(A, B)</math>
With some additional substitution, we get '''Baye's Theorem''':
<math>
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
</math>
==Random Variables==
A random variable is a variable which takes on a distribution rather than a value.
===PMF, PDF, CDF===
For discrete distributions, we call <math>p_{X}(x)=P(X=x)</math> the probability mass function (PMF).<br>
For continuous distributions, we have the probability density function (PDF) <math>f(x)</math>.<br>
The comulative distribution function (CDF) is <math>F(x) = P(X \leq x)</math>.<br>
The CDF is the prefix sum of the PMF or the integral of the PDF. Likewise, the PDF is the derivative of the CDF.
===Joint Random Variables===
Two random variables are independant iff <math>f_{X,Y}(x,y) = f_X(x) f_Y(y)</math>.<br>
Otherwise, the marginal distribution is <math>f_X(x) = \int f_{X,Y}(x,y) dy</math>.
===Change of variables===
Let <math>g</math> be a monotonic increasing function and <math>Y = g(X)</math>.<br>
Then <math>F_Y(y) = P(Y \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))</math>.<br>
And <math>f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(g^{-1}(y)) = f_X(g^{-1}(y)) \frac{d}{dy}g^{-1}(y)</math><br>
Hence:
<math display="block">
  f_Y(y) = f_x(g^{-1}(y)) \frac{d}{dy} g^{-1}(y)
</math>


==Expectation and Variance==
==Expectation and Variance==
Line 17: Line 51:
* <math>E(X) = \sum_S xp(x)</math> or <math>\int_S xp(x)dx</math>
* <math>E(X) = \sum_S xp(x)</math> or <math>\int_S xp(x)dx</math>
* <math>Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2</math>
* <math>Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2</math>
===Total Expection===
===Total Expection===
<math>E(X) = E(E(X|Y))</math><br>
<math>E_{X}(X) = E_{Y}(E_{X|Y}(X|Y))</math><br>
Dr. Xu refers to this as the smooth property.
Dr. Xu refers to this as the smooth property.
{{hidden | Proof |
{{hidden | Proof |
<math>
<math>
E(X) = \int_S xp(x)dx  
\begin{aligned}
= \int_x x \int_y p(x,y)dy dx
E(X) &= \int_S x p(x)dx \\
= \int_x x \int_y p(x|y)p(y)dy dx
&= \int_x x \int_y p(x,y)dy dx \\
= \int_y\int_x x  p(x|y)dxp(y)dy
&= \int_x x \int_y p(x|y)p(y)dy dx \\
&= \int_y\int_x x  p(x|y)dxp(y)dy
\end{aligned}
</math>
</math>
}}
}}


===Total Variance===
===Total Variance===
<math>Var(Y) = E(Var(Y|X)) + Var(E(Y | X)</math><br>
<math>Var(Y) = E(Var(Y|X)) + Var(E(Y | X))</math><br>
This one is not used as often on tests as total expectation
This one is not used as often on tests as total expectation
{{hidden | Proof |
{{hidden | Proof |
<math>
\begin{aligned}
Var(Y) &= E(Y^2) - E(Y)^2 \\
&= E(E(Y^2|X)) - E(E(Y|X))^2\\
&= E(Var(Y|X) + E(Y|X)^2) - E(E(Y|X))^2\\
&= E((Var(Y|X)) + E(E(Y|X)^2) - E(E(Y|X))^2\\
&= E((Var(Y|X)) + Var(E(Y|X))\\
\end{aligned}
</math>
}}


}}
===Sample Mean and Variance===
The sample mean is <math>\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i</math>.<br>
The unbiased sample variance is <math>S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2</math>.
 
====Student's Theorem====
Let <math>X_1,...,X_n</math> be from <math>N(\mu, \sigma^2)</math>.<br>
Then the following results about the sample mean <math>\bar{X}</math>
and the unbiased sample variance <math>S^2</math> hold:
* <math>\bar{X}</math> and <math>S^2</math> are independent
* <math>\bar{X} \sim N(\mu, \sigma^2 / n)</math>
* <math>(n-1)S^2 / \sigma^2 \sim \chi^2(n-1)</math>
 
===Jensen's Inequality===
{{main | Wikipedia: Jensen's inequality}}
Let g be a convex function (i.e. second derivative is positive).
Then <math>g(E(x)) \leq E(g(x))</math>.


==Moments and Moment Generating Functions==
==Moments and Moment Generating Functions==
===Definitions===
===Definitions===
We call <math>E(X^i)</math> the i'th moment of <math>X</math>.<br>
{{main | Wikipedia: Moment (mathematics) | Wikipedia: Central moment | Wikipedia: Moment-generating function}}
We call <math>E(|X - E(X)|^i)</math> the i'th central moment of <math>X</math>.<br>
* <math>E(X^n)</math> the n'th moment
Therefore the mean is the first moment and the variance is the second central moment.
* <math>E((X-\mu)^n)</math> the n'th central moment
* <math>E(((X-\mu) / \sigma)^n)</math> the n'th standardized moment
Expectation is the first moment and variance is the second central moment.<br>
Additionally, ''skew'' is the third standardized moment and ''kurtosis'' is the fourth standardized moment.
 
===Moment Generating Functions===
===Moment Generating Functions===
<math>E(e^{tX})</math><br>
To compute moments, we can use a moment generating function (MGF):
We call this the moment generating function (mgf).<br>
<math display="block">M_X(t) = E(e^{tX})</math>
We can differentiate it with respect to <math>t</math> and set <math>t=0</math> to get the higher moments.
With the MGF, we can get any order moments by taking n derivatives and setting <math display="inline">t=0</math>.
; Notes
; Notes
* The mgf, if it exists, uniquely defines the distribution.
* The MGF, if it exists, uniquely defines the distribution.
* The mgf of <math>X+Y</math> is <math>E(e^{t(X+Y)})=E(e^{t(X)})E(e^{t(Y)})</math>
* The MGF of <math>X+Y</math> is <math>MGF_{X+Y}(t) = E(e^{t(X+Y)})=E(e^{tX})E(e^{tY}) = MGF_X(t) * MGF_Y(t)</math>
 
===Characteristic function===
===Characteristic function===


==Convergence==
==Convergence==
There are 4 types of convergence typically taught in undergraduate courses.<br>
{{main | Wikipedia: Convergence of random variables}}
See [https://en.wikipedia.org/wiki/Convergence_of_random_variables Wikipedia Convergence of random variables]
There are 4 common types of convergence.
 
===Almost Surely===
===Almost Surely===
* <math>P(\lim X_i = X) = 1</math>
* <math>P(\lim X_i = X) = 1</math>
Line 72: Line 140:


==Delta Method==
==Delta Method==
See [https://en.wikipedia.org/wiki/Delta_method Wikipedia]<br>
{{main | Wikipedia:Delta method}}
 
Suppose <math>\sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)</math>.<br>
Suppose <math>\sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)</math>.<br>
Let <math>g</math> be a function such that <math>g'</math> exists and <math>g'(\theta) \neq 0</math><br>
Let <math>g</math> be a function such that <math>g'</math> exists and <math>g'(\theta) \neq 0</math><br>
Then <math>\sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)</math><br>
Then <math>\sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)</math>
 
Multivariate:<br>
Multivariate:<br>
<math>\sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, h'(\theta)^T \Sigma h'(\theta))</math><br>
<math>\sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, h'(\theta)^T \Sigma h'(\theta))</math>
 
;Notes
;Notes
* You can think of this like the Mean Value theorem for random variables.
* You can think of this like the Mean Value theorem for random variables.
: <math>(g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)</math>
** <math>(g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)</math>


==Order Statistics==
==Order Statistics==
Consider iid random variables <math>X_1, ..., X_n</math>.<br>
Then the order statistics are <math>X_{(1)}, ..., X_{(n)}</math> where <math>X_{(i)}</math> represents the i'th smallest number.
===Min and Max===
The easiest to reason about are the minimum and maximum order statistics:
<math>P(X_{(1)} <= x) = P(\text{min}(X_i) <= x) = 1 - P(X_1 > x, ..., X_n > x)</math>
<math>P(X_{(n)} <= x) = P(\text{max}(X_i) <= x) = P(X_1 <= x, ..., X_n <= x)</math>
===Joint PDF===
If <math>X_i</math> has pdf <math>f</math>, the joint pdf of <math>X_{(1)}, ..., X_{(n)}</math> is:
<math>
g(x_1, ...) = n!*f(x_1)*...*f(x_n)
</math>
since there are n! ways perform a change of variables.
===Individual PDF===
<math>
f_{X(i)}(x) = \frac{n!}{(i-1)!(n-i)!} F(x)^{i-1} f(x) [1-F(x)]^{n-1}
</math>


==Inequalities and Limit Theorems==
==Inequalities and Limit Theorems==
Line 90: Line 181:
{{hidden | Proof |  
{{hidden | Proof |  
<math>
<math>
\begin{aligned}
E(X)  
E(X)  
= \int_{0}^{\infty}xf(x)dx  
&= \int_{0}^{\infty}xf(x)dx \\
= \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx
&= \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx\\
\geq \int_{a}^{\infty}xf(x)dx
&\geq \int_{a}^{\infty}xf(x)dx\\
\geq \int_{a}^{\infty}af(x)dx
&\geq \int_{a}^{\infty}af(x)dx\\
=a \int_{a}^{\infty}f(x)dx
&=a \int_{a}^{\infty}f(x)dx\\
=a*P(X \geq a)\\
&=a * P(X \geq a)\\
\implies P(x\geq a) \leq \frac{E(X)}{a}
\implies& P(X \geq a) \leq \frac{E(X)}{a}
\end{aligned}
</math>
</math>
}}
}}
===Chebyshev's Inequality===
===Chebyshev's Inequality===
* <math>P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}</math>
* <math>P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}</math>
Line 105: Line 199:
{{hidden | Proof |  
{{hidden | Proof |  
Apply Markov's inequality:<br>
Apply Markov's inequality:<br>
Let <math>Y = |X - \mu|</math>
Let <math>Y = |X - \mu|</math><br>
<math>P(|X - \mu| \geq k) = P(Y \geq k) = = P(Y^2 \geq k^2) \leq \frac{E(Y^2)}{k^2} = \frac{E((X - \mu)^2)}{k^2}</math>
Then:<br>
<math>
\begin{aligned}
P(|X - \mu| \geq k) &= P(Y \geq k) \\
&= P(Y^2 \geq k^2) \\
&\leq \frac{E(Y^2)}{k^2} \\
&= \frac{E((X - \mu)^2)}{k^2}
\end{aligned}
</math>
}}
}}
* Usually used to prove convergence in probability
* Usually used to prove convergence in probability
Line 126: Line 228:
* The sample mean converges to the population mean almost surely.
* The sample mean converges to the population mean almost surely.


==Relationships between distributions==
==Properties and Relationships between distributions==
This is important for tests.<br>
{{main | Wikipedia: Relationships among probability distributions}}
See [https://en.wikipedia.org/wiki/Relationships_among_probability_distributions Relationships among probability distributions].
;This is important for exams.


===Poisson Distributions===
===Poisson Distribution===
Sum of poission is poisson sum of lambda.
* If <math>X_i \sim Poisson(\lambda_i)</math> then <math>\sum X_i \sim Poisson(\sum \lambda_i)</math>


===Normal Distributions===
===Normal Distribution===
* If <math>X_1 \sim N(\mu_1, \sigma_1^2)</math> and <math>X_2 \sim N(\mu_2, \sigma_2^2)</math> then <math>\lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 X_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 + \sigma_2^2)</math> for any <math>\lambda_1, \lambda_2 \in \mathbb{R}</math>
* If <math>X_1 \sim N(\mu_1, \sigma_1^2)</math> and <math>X_2 \sim N(\mu_2, \sigma_2^2)</math> then <math>\lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 X_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 + \sigma_2^2)</math> for any <math>\lambda_1, \lambda_2 \in \mathbb{R}</math>


===Gamma Distributions===
===Exponential Distribution===
* <math>\operatorname{Exp}(\lambda)</math> is equivalent to <math>\Gamma(1, 1/\lambda)</math>
** Note that some conventions flip the second parameter of gamma, so it would be <math>\Gamma(1, \lambda)</math>
* If <math>\epsilon_1, ..., \epsilon_n</math> are exponential distributions then <math>\min\{\epsilon_i\} \sim \exp(\sum \lambda_i)</math>
* Note that the maximum is not exponentially distributed
** However, if <math>X_1, ..., X_n \sim \exp(1)</math> then <math>Z_n=n\exp(\max\{\epsilon_i\}) \rightarrow \exp(1)</math>
 
===Gamma Distribution===
Note exponential distributions are also Gamma distrubitions
Note exponential distributions are also Gamma distrubitions
* If <math>X \sim \Gamma(k, \theta)</math> then <math>\lambda X \sim \Gamma(k, c\theta)</math>.<br>
* If <math>X \sim \Gamma(k, \theta)</math> then <math>\lambda X \sim \Gamma(k, c\theta)</math>.<br>
Line 143: Line 252:


===T-distribution===
===T-distribution===
Ratio of normal and squared-root of Chi-sq distribution yields T-distribution.
* Ratio of standard normal and squared-root of Chi-sq distribution yields T-distribution.
** If <math>Z \sim N(0,1)</math> and <math> V \sim \Chi^2(v)</math> then <math>\frac{Z}{\sqrt{V/v}} \sim \text{t-dist}(v)</math>


===Chi-Sq Distribution===
===Chi-Sq Distribution===
The ratio of two normalized Chi-sq is an F-distributions
* The ratio of two normalized Chi-sq is an F-distributions
** If <math>X \sim \chi^2_{d1}</math> and <math>Y \sim \chi^2_{d2}</math> then <math>\frac{X/d1}{Y/d2} \sim F(d1,d2)</math>
* If <math>Z_1,...,Z_k \sim N(0,1)</math> then <math>Z_1^2 + ... + Z_k^2 \sim \Chi^2(k)</math>
* If <math>X_i \sim \Chi^2(k_i)</math> then <math>X_1 + ... + X_n \sim \Chi^2(k_1 +...+ k_n)</math>
* <math>\Chi^2(k)</math> is equivalent to <math>\Gamma(k/2, 2)</math>


===F Distribution===
===F Distribution===
Too many. See [https://en.wikipedia.org/wiki/F-distribution the Wikipedia Page].
Too many to list. See [[Wikipedia: F-distribution]].
Most important are Chi-sq and T distribution
 
Most important are Chi-sq and T distribution:
* If <math>X \sim \chi^2_{d1}</math> and <math>Y \sim \chi^2_{d2}</math> then <math>\frac{X/d1}{Y/d2} \sim F(d1,d2)</math>
* If <math>X \sim t_{(n)}</math> then <math>X^2 \sim F(1, n)</math> and <math>X^{-2} \sim F(n, 1)</math>


==Textbooks==
==Textbooks==
* Sheldon Ross' A First Course in Probability
* [https://smile.amazon.com/dp/032179477X Sheldon Ross' A First Course in Probability]
* [https://smile.amazon.com/Introduction-Mathematical-Statistics-Robert-Hogg/dp/0321795431?sa-no-redirect=1 Hogg and Craig's Mathematical Statistics]
* [https://smile.amazon.com/dp/0321795431 Hogg and Craig's Mathematical Statistics]
* Casella and Burger's Statistical Inference
* [https://smile.amazon.com/dp/0534243126 Casella and Burger's Statistical Inference]

Latest revision as of 22:57, 22 February 2024

Calculus-based Probability

This is content covered in STAT410 and STAT700 at UMD.

Basics

Axioms of Probability

  • \(\displaystyle 0 \leq P(E) \leq 1\)
  • \(\displaystyle P(S) = 1\) where \(\displaystyle S\) is your sample space
  • For mutually exclusive events \(\displaystyle E_1, E_2, ...\), \(\displaystyle P\left(\bigcup_i^\infty E_i\right) = \sum_i^\infty P(E_i)\)

Monotonicity

  • For all events \(\displaystyle A\) and \(\displaystyle B\), \(\displaystyle A \subset B \implies P(A) \leq P(B)\)
Proof

Conditional Probability

\(\displaystyle P(A|B)\) is the probability of event A given event B.
Mathematically, this is defined as \(\displaystyle P(A|B) = P(A,B) / P(B)\).
Note that this can also be written as \(\displaystyle P(A|B)P(B) = P(A, B)\).
Substituting \(\displaystyle P(A, B) = P(B|A)P(A)\) into the definition, we get Bayes' Theorem: \(\displaystyle P(A|B) = \frac{P(B|A)P(A)}{P(B)} \)
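
As a quick numeric illustration of the theorem, here is a minimal Python sketch. The scenario (a test with a 1% prior, 95% true-positive rate and 10% false-positive rate) and the function name `bayes_posterior` are made up for illustration and do not come from the course. The denominator \(P(B)\) is expanded with the law of total probability.

```python
from fractions import Fraction

def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) = P(B|A) P(A) / P(B).

    P(B) is expanded as P(B|A) P(A) + P(B|not A) P(not A).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical numbers: A = "has the condition", B = "test is positive".
posterior = bayes_posterior(Fraction(1, 100), Fraction(95, 100), Fraction(10, 100))
print(posterior, float(posterior))
```

Even with an accurate test, the posterior stays small here because the prior \(P(A)\) is small.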

Random Variables

A random variable is a variable whose value is described by a probability distribution rather than by a single fixed value.

PMF, PDF, CDF

For discrete distributions, we call \(\displaystyle p_{X}(x)=P(X=x)\) the probability mass function (PMF).
For continuous distributions, we have the probability density function (PDF) \(\displaystyle f(x)\).
The cumulative distribution function (CDF) is \(\displaystyle F(x) = P(X \leq x)\).
The CDF is the prefix sum of the PMF or the integral of the PDF. Likewise, the PDF is the derivative of the CDF.

Joint Random Variables

Two random variables are independent iff \(\displaystyle f_{X,Y}(x,y) = f_X(x) f_Y(y)\).
In general, the marginal distribution is \(\displaystyle f_X(x) = \int f_{X,Y}(x,y) dy\).

Change of variables

Let \(\displaystyle g\) be a monotonic increasing function and \(\displaystyle Y = g(X)\).
Then \(\displaystyle F_Y(y) = P(Y \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))\).
And \(\displaystyle f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(g^{-1}(y)) = f_X(g^{-1}(y)) \frac{d}{dy}g^{-1}(y)\)
Hence: \[ f_Y(y) = f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y) \]
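
The formula can be sanity-checked numerically. The sketch below assumes, purely as an example, that \(X \sim \operatorname{Exp}(1)\) and \(g(x) = x^2\), which is increasing on the support \(x \geq 0\). It compares the change-of-variables density with a finite-difference derivative of \(F_Y(y) = F_X(g^{-1}(y))\). All function names are illustrative.

```python
import math

def f_X(x):
    """PDF of X ~ Exp(1) (an assumed example distribution)."""
    return math.exp(-x) if x >= 0 else 0.0

def F_X(x):
    """CDF of X ~ Exp(1)."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def g_inv(y):
    """Inverse of g(x) = x**2 on the support x >= 0."""
    return math.sqrt(y)

def g_inv_prime(y):
    """Derivative of g^{-1}(y) = sqrt(y)."""
    return 1.0 / (2.0 * math.sqrt(y))

def f_Y(y):
    """Density of Y = g(X) from the change-of-variables formula."""
    return f_X(g_inv(y)) * g_inv_prime(y)

def f_Y_numeric(y, h=1e-6):
    """Central-difference derivative of F_Y(y) = F_X(g^{-1}(y))."""
    return (F_X(g_inv(y + h)) - F_X(g_inv(y - h))) / (2.0 * h)

for y in (0.5, 1.0, 4.0):
    print(y, f_Y(y), f_Y_numeric(y))
```

The two columns should agree up to the finite-difference error.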

Expectation and Variance

Some definitions and properties.

Definitions

Let \(\displaystyle X \sim D\) for some distribution \(\displaystyle D\). Let \(\displaystyle S\) be the support or domain of your distribution.

  • \(\displaystyle E(X) = \sum_S xp(x)\) or \(\displaystyle \int_S xp(x)dx\)
  • \(\displaystyle Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2\)
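
As a small worked example of these definitions, the sketch below computes the mean and both forms of the variance exactly for a fair six-sided die. The die and the helper name `expectation` are illustrative choices, not part of the course material.

```python
from fractions import Fraction

# PMF of a fair six-sided die (an assumed example): p(x) = 1/6 on S = {1, ..., 6}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, func=lambda x: x):
    """E(func(X)) = sum over the support of func(x) * p(x)."""
    return sum(func(x) * p for x, p in pmf.items())

mean = expectation(pmf)
var_definition = expectation(pmf, lambda x: (x - mean) ** 2)   # E[(X - E(X))^2]
var_shortcut = expectation(pmf, lambda x: x ** 2) - mean ** 2  # E(X^2) - (E(X))^2

print(mean, var_definition, var_shortcut)
```

Using `Fraction` keeps the arithmetic exact, so the two variance formulas match with no rounding error.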

Total Expectation

\(\displaystyle E_{X}(X) = E_{Y}(E_{X|Y}(X|Y))\)
Dr. Xu refers to this as the smooth property.

Proof

\(\displaystyle \begin{aligned} E(X) &= \int_S x p(x)dx \\ &= \int_x x \int_y p(x,y)dy dx \\ &= \int_x x \int_y p(x|y)p(y)dy dx \\ &= \int_y \left(\int_x x p(x|y)dx\right) p(y)dy \\ &= \int_y E(X|Y=y) p(y)dy \\ &= E(E(X|Y)) \end{aligned} \)

Total Variance

\(\displaystyle Var(Y) = E(Var(Y|X)) + Var(E(Y | X))\)
This one is not used as often on tests as total expectation

Proof

\(\displaystyle \begin{aligned} Var(Y) &= E(Y^2) - E(Y)^2 \\ &= E(E(Y^2|X)) - E(E(Y|X))^2\\ &= E(Var(Y|X) + E(Y|X)^2) - E(E(Y|X))^2\\ &= E(Var(Y|X)) + E(E(Y|X)^2) - E(E(Y|X))^2\\ &= E(Var(Y|X)) + Var(E(Y|X)) \end{aligned} \)
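
Both total expectation and total variance can be verified exactly on a small discrete example. The joint PMF below is made up for illustration, and the helper names are hypothetical. The sketch conditions on \(X\) throughout, so it checks \(E(Y) = E(E(Y|X))\) and \(Var(Y) = E(Var(Y|X)) + Var(E(Y|X))\).

```python
from fractions import Fraction

# Hypothetical joint PMF p(x, y); the values are made up and sum to 1.
joint = {
    (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 8), (0, 3): Fraction(1, 4),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 8), (1, 3): Fraction(1, 8),
}

def marginal_x(joint):
    """p(x) = sum over y of p(x, y)."""
    out = {}
    for (x, _), p in joint.items():
        out[x] = out.get(x, 0) + p
    return out

def cond_moment_y(joint, x, power):
    """E(Y^power | X = x), using p(y|x) = p(x, y) / p(x)."""
    p_x_value = marginal_x(joint)[x]
    return sum(y ** power * p / p_x_value for (xx, y), p in joint.items() if xx == x)

p_x = marginal_x(joint)
e_y = sum(y * p for (_, y), p in joint.items())
var_y = sum(y ** 2 * p for (_, y), p in joint.items()) - e_y ** 2

# E(Y|X=x) and Var(Y|X=x) for each x in the support of X.
cond_mean = {x: cond_moment_y(joint, x, 1) for x in p_x}
cond_var = {x: cond_moment_y(joint, x, 2) - cond_mean[x] ** 2 for x in p_x}

tower = sum(cond_mean[x] * p_x[x] for x in p_x)        # E(E(Y|X))
e_cond_var = sum(cond_var[x] * p_x[x] for x in p_x)    # E(Var(Y|X))
var_cond_mean = sum(cond_mean[x] ** 2 * p_x[x] for x in p_x) - tower ** 2  # Var(E(Y|X))

print(e_y, tower)
print(var_y, e_cond_var + var_cond_mean)
```

Each printed pair should be identical, since the arithmetic is exact.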

Sample Mean and Variance

The sample mean is \(\displaystyle \bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i\).
The unbiased sample variance is \(\displaystyle S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\).

Student's Theorem

Let \(\displaystyle X_1,...,X_n\) be iid from \(\displaystyle N(\mu, \sigma^2)\).
Then the following results about the sample mean \(\displaystyle \bar{X}\) and the unbiased sample variance \(\displaystyle S^2\) hold:

  • \(\displaystyle \bar{X}\) and \(\displaystyle S^2\) are independent
  • \(\displaystyle \bar{X} \sim N(\mu, \sigma^2 / n)\)
  • \(\displaystyle (n-1)S^2 / \sigma^2 \sim \chi^2(n-1)\)

Jensen's Inequality

Let g be a convex function (e.g. a twice-differentiable function whose second derivative is non-negative). Then \(\displaystyle g(E(X)) \leq E(g(X))\).
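
A minimal check of the inequality on a discrete example. The distribution below is made up, and \(g(x) = x^2\) is chosen as a simple convex function.

```python
from fractions import Fraction

# Hypothetical discrete distribution for X (values and probabilities are made up).
pmf = {-1: Fraction(1, 4), 0: Fraction(1, 4), 3: Fraction(1, 2)}

def g(x):
    """A convex function: g(x) = x^2."""
    return x * x

e_x = sum(x * p for x, p in pmf.items())        # E(X)
e_g_x = sum(g(x) * p for x, p in pmf.items())   # E(g(X))

print(g(e_x), e_g_x)
```

For this particular choice of \(g\), the gap \(E(g(X)) - g(E(X))\) is exactly \(Var(X)\), which is why it can never be negative.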

Moments and Moment Generating Functions

Definitions

  • \(\displaystyle E(X^n)\) the n'th moment
  • \(\displaystyle E((X-\mu)^n)\) the n'th central moment
  • \(\displaystyle E(((X-\mu) / \sigma)^n)\) the n'th standardized moment

Expectation is the first moment and variance is the second central moment.
Additionally, skew is the third standardized moment and kurtosis is the fourth standardized moment.

Moment Generating Functions

To compute moments, we can use a moment generating function (MGF): \[M_X(t) = E(e^{tX})\] With the MGF, we can get the n'th moment by taking n derivatives with respect to \(t\) and setting \(t=0\): \(\displaystyle E(X^n) = M_X^{(n)}(0)\).

Notes
  • The MGF, if it exists, uniquely defines the distribution.
  • The MGF of \(\displaystyle X+Y\) for independent \(\displaystyle X\) and \(\displaystyle Y\) is \(\displaystyle M_{X+Y}(t) = E(e^{t(X+Y)})=E(e^{tX})E(e^{tY}) = M_X(t) M_Y(t)\)
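
The sketch below illustrates both ideas numerically for a fair six-sided die, an assumed example. It recovers the first two moments from finite-difference derivatives of the MGF at \(t=0\), and checks that the MGF of the sum of two independent dice equals the product of the individual MGFs. The helper names are illustrative, and the derivatives are approximations rather than symbolic results.

```python
import math

def mgf_die(t):
    """M_X(t) = E(e^{tX}) for a fair six-sided die (an assumed example)."""
    return sum(math.exp(t * x) for x in range(1, 7)) / 6.0

def mgf_two_dice(t):
    """MGF of X + Y for two independent fair dice, by direct enumeration."""
    return sum(math.exp(t * (x + y)) for x in range(1, 7) for y in range(1, 7)) / 36.0

def derivative_at_zero(func, order, h=1e-3):
    """Central finite-difference estimate of the 1st or 2nd derivative at t = 0."""
    if order == 1:
        return (func(h) - func(-h)) / (2.0 * h)
    if order == 2:
        return (func(h) - 2.0 * func(0.0) + func(-h)) / (h * h)
    raise ValueError("only orders 1 and 2 are implemented in this sketch")

first_moment = derivative_at_zero(mgf_die, 1)    # approximates E(X) = 7/2
second_moment = derivative_at_zero(mgf_die, 2)   # approximates E(X^2) = 91/6

print(first_moment, second_moment)
print(mgf_two_dice(0.3), mgf_die(0.3) ** 2)
```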

Characteristic function

Convergence

There are 4 common types of convergence.

Almost Surely

  • \(\displaystyle P(\lim X_i = X) = 1\)

In Probability

For all \(\displaystyle \epsilon \gt 0\)
\(\displaystyle \lim P(|X_i - X| \geq \epsilon) = 0\)

  • Implies Convergence in distribution

In Distribution

Pointwise convergence of the cdf
A sequence of random variables \(\displaystyle X_1,...\) converges to \(\displaystyle X\) in distribution if for all \(\displaystyle x\) at which \(\displaystyle F\) is continuous,
\(\displaystyle \lim_{i \rightarrow \infty} F_i(x) = F(x)\)

  • Equivalent to convergence in probability if it converges to a degenerate distribution (i.e. a number)

In Mean Squared

\(\displaystyle \lim_{i \rightarrow \infty} E(|X_i-X|^2)=0\)

Delta Method

Suppose \(\displaystyle \sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)\).
Let \(\displaystyle g\) be a function such that \(\displaystyle g'\) exists and \(\displaystyle g'(\theta) \neq 0\)
Then \(\displaystyle \sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)\)

Multivariate:
\(\displaystyle \sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, h'(\beta)^T \Sigma h'(\beta))\)

Notes
  • You can think of this like the Mean Value theorem for random variables.
    • \(\displaystyle (g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)\)
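
The limiting variance \(\sigma^2 g'(\theta)^2\) can be checked exactly in a simple case. The sketch below assumes \(X_n\) is the mean of \(n\) iid \(\operatorname{Exp}(1)\) draws (so \(\theta = 1\), \(\sigma^2 = 1\)) and \(g(x) = x^2\). These choices and the function names are illustrative. It only compares variances, which is a weaker check than convergence in distribution.

```python
from fractions import Fraction

def sample_mean_moment(n, k):
    """E(Xbar_n^k) when Xbar_n is the mean of n iid Exp(1) draws.

    Uses the standard fact that the sum of n iid Exp(1) variables is
    Gamma(n, 1), whose k-th moment is n (n+1) ... (n+k-1).
    """
    rising = 1
    for j in range(k):
        rising *= n + j
    return Fraction(rising, n ** k)

def scaled_variance_of_square(n):
    """n * Var(g(Xbar_n)) for g(x) = x^2, i.e. Var(sqrt(n) (g(Xbar_n) - g(theta)))."""
    return n * (sample_mean_moment(n, 4) - sample_mean_moment(n, 2) ** 2)

theta, sigma2 = 1, 1                          # mean and variance of Exp(1)
delta_variance = sigma2 * (2 * theta) ** 2    # sigma^2 * g'(theta)^2 with g'(x) = 2x

for n in (10, 100, 10_000):
    print(n, float(scaled_variance_of_square(n)), delta_variance)
```

The exact scaled variance approaches the delta-method value as \(n\) grows.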

Order Statistics

Consider iid random variables \(\displaystyle X_1, ..., X_n\).
Then the order statistics are \(\displaystyle X_{(1)}, ..., X_{(n)}\) where \(\displaystyle X_{(i)}\) represents the i'th smallest number.

Min and Max

The easiest to reason about are the minimum and maximum order statistics:
\(\displaystyle P(X_{(1)} \leq x) = P(\min(X_i) \leq x) = 1 - P(X_1 \gt x, ..., X_n \gt x)\)
\(\displaystyle P(X_{(n)} \leq x) = P(\max(X_i) \leq x) = P(X_1 \leq x, ..., X_n \leq x)\)
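
Because the \(X_i\) are iid, the joint probabilities above factor, giving \(1 - (1 - F(x))^n\) for the minimum and \(F(x)^n\) for the maximum. The sketch below checks this by brute-force enumeration for three fair dice. The dice, the value of `n`, and the helper names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

n = 3                                        # number of iid fair dice (assumed example)
faces = range(1, 7)
outcomes = list(product(faces, repeat=n))    # all 6^n equally likely samples

def cdf_single(x):
    """F(x) = P(X_i <= x) for one fair die."""
    return Fraction(sum(1 for f in faces if f <= x), 6)

def cdf_by_enumeration(stat, x):
    """P(stat(X_1, ..., X_n) <= x), found by counting outcomes."""
    return Fraction(sum(1 for o in outcomes if stat(o) <= x), len(outcomes))

for x in faces:
    F_x = cdf_single(x)
    assert cdf_by_enumeration(min, x) == 1 - (1 - F_x) ** n   # P(X_(1) <= x)
    assert cdf_by_enumeration(max, x) == F_x ** n             # P(X_(n) <= x)

print(cdf_by_enumeration(min, 2), cdf_by_enumeration(max, 2))
```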

Joint PDF

If \(\displaystyle X_i\) has pdf \(\displaystyle f\), the joint pdf of \(\displaystyle X_{(1)}, ..., X_{(n)}\) is: \(\displaystyle g(x_1, ..., x_n) = n! f(x_1) \cdots f(x_n) \) for \(\displaystyle x_1 \lt ... \lt x_n\) (and 0 otherwise), since each ordered outcome can arise from any of the n! permutations of the sample.

Individual PDF

\(\displaystyle f_{X_{(i)}}(x) = \frac{n!}{(i-1)!(n-i)!} F(x)^{i-1} f(x) [1-F(x)]^{n-i} \)

Inequalities and Limit Theorems

Markov's Inequality

Let \(\displaystyle X\) be a non-negative random variable.
Then for any \(\displaystyle a \gt 0\), \(\displaystyle P(X \geq a) \leq \frac{E(X)}{a}\)

Proof

\(\displaystyle \begin{aligned} E(X) &= \int_{0}^{\infty}xf(x)dx \\ &= \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}af(x)dx\\ &=a \int_{a}^{\infty}f(x)dx\\ &=a * P(X \geq a)\\ \implies& P(X \geq a) \leq \frac{E(X)}{a} \end{aligned} \)

Chebyshev's Inequality

  • \(\displaystyle P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}\)
  • \(\displaystyle P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}\)
Proof

Apply Markov's inequality:
Let \(\displaystyle Y = |X - \mu|\)
Then:
\(\displaystyle \begin{aligned} P(|X - \mu| \geq k) &= P(Y \geq k) \\ &= P(Y^2 \geq k^2) \\ &\leq \frac{E(Y^2)}{k^2} \\ &= \frac{E((X - \mu)^2)}{k^2} \\ &= \frac{\sigma^2}{k^2} \end{aligned} \)

  • Usually used to prove convergence in probability
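
Both bounds can be compared with exact tail probabilities. The sketch below assumes \(X \sim \operatorname{Exp}(1)\), for which \(\mu = 1\), \(\sigma^2 = 1\) and \(P(X \geq a) = e^{-a}\). The distribution and function names are illustrative choices.

```python
import math

def tail_exp1(a):
    """P(X >= a) for X ~ Exp(1), an assumed example distribution."""
    return math.exp(-a) if a > 0 else 1.0

def two_sided_tail_exp1(k):
    """P(|X - mu| >= k) for X ~ Exp(1), where mu = 1."""
    upper = tail_exp1(1.0 + k)                                  # P(X >= 1 + k)
    lower = 1.0 - math.exp(-(1.0 - k)) if k < 1.0 else 0.0      # P(X <= 1 - k)
    return upper + lower

mu, sigma2 = 1.0, 1.0   # mean and variance of Exp(1)

for a in (0.5, 1.0, 2.0, 5.0):
    markov_bound = mu / a
    chebyshev_bound = sigma2 / a ** 2
    print(a, tail_exp1(a), markov_bound, two_sided_tail_exp1(a), chebyshev_bound)
```

The bounds hold but are often loose, which is typical: they use only the mean or the variance and nothing else about the distribution.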

Central Limit Theorem

Very very important. Never forget this.
For any distribution with finite mean and variance, the standardized sample mean of iid draws converges in distribution to a normal distribution.
Let \(\displaystyle \mu = E(X)\) and \(\displaystyle \sigma^2 = Var(X)\)
Different ways of saying the same thing:

  • \(\displaystyle \sqrt{n}(\bar{X} - \mu) \xrightarrow{D} N(0, \sigma^2)\)
  • \(\displaystyle \frac{\sqrt{n}}{\sigma}(\bar{X} - \mu) \xrightarrow{D} N(0, 1)\)
  • For large \(n\), \(\displaystyle \bar{X}\) is approximately \(\displaystyle N(\mu, \sigma^2/n)\)
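
The normal approximation can be checked without simulation for fair coin flips, since the sum of \(n\) Bernoulli(1/2) draws is exactly Binomial(\(n\), 1/2). The sketch below compares the exact CDF of the standardized sample mean with the standard normal CDF at a few points. The coin-flip setting, the chosen points and the function names are assumptions for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardized_mean_cdf(n, z):
    """Exact P(sqrt(n) (Xbar - mu) / sigma <= z) for n fair coin flips."""
    mu, sigma = 0.5, 0.5
    threshold = n * mu + z * sigma * math.sqrt(n)   # equivalent bound on the sum
    k_max = math.floor(threshold)
    if k_max < 0:
        return 0.0
    total = sum(math.comb(n, k) for k in range(0, min(k_max, n) + 1))
    return total / 2 ** n

def max_gap(n, points=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Largest difference between the exact CDF and the normal CDF at the given points."""
    return max(abs(standardized_mean_cdf(n, z) - normal_cdf(z)) for z in points)

for n in (10, 100, 1000):
    print(n, max_gap(n))
```

The gap shrinks as \(n\) grows. Part of the remaining difference comes from the binomial distribution being discrete.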

Law of Large Numbers

The sample mean converges to the population mean in probability.
For all \(\displaystyle \epsilon \gt 0\), \(\displaystyle \lim_{n \rightarrow \infty} P(|\bar{X}_n - E(X)| \geq \epsilon) = 0\)

Notes
  • The strong law of large numbers gives a stronger result: the sample mean converges to the population mean almost surely.

Properties and Relationships between distributions

This is important for exams.

Poisson Distribution

  • If \(\displaystyle X_i \sim Poisson(\lambda_i)\) are independent then \(\displaystyle \sum X_i \sim Poisson(\sum \lambda_i)\)

Normal Distribution

  • If \(\displaystyle X_1 \sim N(\mu_1, \sigma_1^2)\) and \(\displaystyle X_2 \sim N(\mu_2, \sigma_2^2)\) are independent then \(\displaystyle \lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 \mu_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2)\) for any \(\displaystyle \lambda_1, \lambda_2 \in \mathbb{R}\)

Exponential Distribution

  • \(\displaystyle \operatorname{Exp}(\lambda)\) is equivalent to \(\displaystyle \Gamma(1, 1/\lambda)\)
    • Note that some conventions flip the second parameter of gamma, so it would be \(\displaystyle \Gamma(1, \lambda)\)
  • If \(\displaystyle \epsilon_1, ..., \epsilon_n\) are independent with \(\displaystyle \epsilon_i \sim \operatorname{Exp}(\lambda_i)\) then \(\displaystyle \min\{\epsilon_i\} \sim \operatorname{Exp}(\sum \lambda_i)\)
  • Note that the maximum is not exponentially distributed
    • However, if \(\displaystyle X_1, ..., X_n\) are iid \(\displaystyle \operatorname{Exp}(1)\) then \(\displaystyle Z_n=n\exp(-\max\{X_i\}) \xrightarrow{D} \operatorname{Exp}(1)\)

Gamma Distribution

Note that exponential distributions are also Gamma distributions.

  • If \(\displaystyle X \sim \Gamma(k, \theta)\) then \(\displaystyle cX \sim \Gamma(k, c\theta)\) for any \(\displaystyle c \gt 0\).
  • If \(\displaystyle X_1 \sim \Gamma(k_1, \theta)\) and \(\displaystyle X_2 \sim \Gamma(k_2, \theta)\) are independent then \(\displaystyle X_1 + X_2 \sim \Gamma(k_1 + k_2, \theta)\).
  • If \(\displaystyle X_1 \sim \Gamma(\alpha, \theta)\) and \(\displaystyle X_2 \sim \Gamma(\beta, \theta)\) are independent, then \(\displaystyle \frac{X_1}{X_1 + X_2} \sim B(\alpha, \beta)\).

T-distribution

  • The ratio of a standard normal to the square root of an independent normalized Chi-sq random variable follows a T-distribution.
    • If \(\displaystyle Z \sim N(0,1)\) and \(\displaystyle V \sim \chi^2(v)\) are independent then \(\displaystyle \frac{Z}{\sqrt{V/v}} \sim \text{t-dist}(v)\)

Chi-Sq Distribution

  • The ratio of two independent normalized Chi-sq random variables follows an F-distribution
    • If \(\displaystyle X \sim \chi^2_{d1}\) and \(\displaystyle Y \sim \chi^2_{d2}\) are independent then \(\displaystyle \frac{X/d1}{Y/d2} \sim F(d1,d2)\)
  • If \(\displaystyle Z_1,...,Z_k\) are iid \(\displaystyle N(0,1)\) then \(\displaystyle Z_1^2 + ... + Z_k^2 \sim \chi^2(k)\)
  • If \(\displaystyle X_i \sim \chi^2(k_i)\) are independent then \(\displaystyle X_1 + ... + X_n \sim \chi^2(k_1 +...+ k_n)\)
  • \(\displaystyle \chi^2(k)\) is equivalent to \(\displaystyle \Gamma(k/2, 2)\)

F Distribution

Too many to list. See Wikipedia: F-distribution.

Most important are Chi-sq and T distribution:

  • If \(\displaystyle X \sim \chi^2_{d1}\) and \(\displaystyle Y \sim \chi^2_{d2}\) then \(\displaystyle \frac{X/d1}{Y/d2} \sim F(d1,d2)\)
  • If \(\displaystyle X \sim t_{(n)}\) then \(\displaystyle X^2 \sim F(1, n)\) and \(\displaystyle X^{-2} \sim F(n, 1)\)

Textbooks