Calculus-based Probability
This is content covered in STAT410 and STAT700 at UMD.
Basics
Axioms of Probability
- \(\displaystyle 0 \leq P(E) \leq 1\)
- \(\displaystyle P(S) = 1\) where \(\displaystyle S\) is your sample space
- For mutually exclusive events \(\displaystyle E_1, E_2, ...\), \(\displaystyle P\left(\bigcup_i^\infty E_i\right) = \sum_i^\infty P(E_i)\)
Monotonicity
- For all events \(\displaystyle A\) and \(\displaystyle B\), \(\displaystyle A \subset B \implies P(A) \leq P(B)\)
PMF, PDF, CDF
For discrete distributions, we call \(\displaystyle p_{X}(x)=P(X=x)\) the probability mass function (PMF).
For continuous distributions, we have the probability density function (PDF) \(\displaystyle f(x)\).
The cumulative distribution function (CDF) is \(\displaystyle F(x) = P(X \leq x)\).
The CDF is the prefix sum of the PMF or the integral of the PDF. Likewise, the PDF is the derivative of the CDF.
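As a quick numerical check of this relationship (a minimal sketch using SciPy's standard normal; any continuous distribution would work), the finite-difference derivative of the CDF should match the PDF:

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 601)
h = 1e-5

# PDF recovered as the numerical derivative of the CDF.
pdf_from_cdf = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

print(np.max(np.abs(pdf_from_cdf - norm.pdf(x))))  # close to zero: the two agree
```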
Expectation, Variance, and Moments
Some definitions and properties.
Definitions
Let \(\displaystyle X \sim D\) for some distribution \(\displaystyle D\). Let \(\displaystyle S\) be the support or domain of your distribution.
- \(\displaystyle E(X) = \sum_S xp(x)\) or \(\displaystyle \int_S xp(x)dx\)
- \(\displaystyle Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2\)
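As a worked example (a fair six-sided die, chosen only for illustration), both the definition of variance and the shortcut formula E(X^2) - (E(X))^2 can be checked directly from the PMF:

```python
import numpy as np

# Fair six-sided die: support {1,...,6}, each with probability 1/6.
x = np.arange(1, 7)
p = np.full(6, 1 / 6)

EX = np.sum(x * p)                   # E(X)   = 3.5
EX2 = np.sum(x**2 * p)               # E(X^2) = 91/6
var_def = np.sum((x - EX)**2 * p)    # E[(X - E(X))^2]
var_shortcut = EX2 - EX**2           # E(X^2) - (E(X))^2

print(EX, var_def, var_shortcut)     # 3.5 2.9166... 2.9166...
```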
Total Expectation
\(\displaystyle E_{X}(X) = E_{Y}(E_{X|Y}(X|Y))\)
Dr. Xu refers to this as the smooth property.
Proof: \(\displaystyle E(X) = \int_S x p(x)dx = \int_x x \int_y p(x,y)dy dx = \int_x x \int_y p(x|y)p(y)dy dx = \int_y\left(\int_x x p(x|y)dx\right)p(y)dy = E_{Y}(E_{X|Y}(X|Y)) \)
Total Variance
\(\displaystyle Var(Y) = E(Var(Y|X)) + Var(E(Y | X))\)
This identity is used less often on tests than total expectation.
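A small Monte Carlo sketch of both identities, using an arbitrary hierarchical model Y ~ Poisson(3), X | Y ~ N(Y, 1) (with the roles of X and Y as in the total expectation formula), so E(X) = 3 and Var(X) = 1 + 3 = 4:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hierarchical model chosen only for illustration: Y ~ Poisson(3), X | Y ~ N(Y, 1).
Y = rng.poisson(3, n)
X = rng.normal(loc=Y, scale=1.0)

# Total expectation: E(X) = E[E(X|Y)] = E[Y] = 3.
print(X.mean())   # ~3.0

# Total variance: Var(X) = E[Var(X|Y)] + Var(E(X|Y)) = 1 + Var(Y) = 4.
print(X.var())    # ~4.0
```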
Sample Mean and Variance
The sample mean is \(\displaystyle \bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i\).
The unbiased sample variance is \(\displaystyle S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\).
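A note on computing these in NumPy: the unbiased sample variance corresponds to `ddof=1` (the default `ddof=0` divides by n rather than n - 1):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

xbar = x.mean()        # sample mean
s2 = x.var(ddof=1)     # unbiased sample variance (divides by n - 1)
print(xbar, s2)        # 5.0 4.571428...
```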
Student's Theorem
Let \(\displaystyle X_1,...,X_n\) be i.i.d. samples from \(\displaystyle N(\mu, \sigma^2)\).
Then the following results about the sample mean \(\displaystyle \bar{X}\)
and the unbiased sample variance \(\displaystyle S^2\) hold:
- \(\displaystyle \bar{X}\) and \(\displaystyle S^2\) are independent
- \(\displaystyle \bar{X} \sim N(\mu, \sigma^2 / n)\)
- \(\displaystyle (n-1)S^2 / \sigma^2 \sim \chi^2(n-1)\)
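A simulation sketch of the third claim (the values mu = 2, sigma = 3, n = 10 are arbitrary), comparing (n-1)S^2 / sigma^2 against a chi-squared(n-1) distribution with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 10, 20_000

samples = rng.normal(mu, sigma, size=(reps, n))
S2 = samples.var(axis=1, ddof=1)           # unbiased sample variance per replicate
stat = (n - 1) * S2 / sigma**2

# Compare against chi-squared with n - 1 degrees of freedom.
# A small p-value would indicate a mismatch.
print(stats.kstest(stat, 'chi2', args=(n - 1,)))
```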
Moments
- \(\displaystyle E(X^n)\) the n'th moment
- \(\displaystyle E((X-\mu)^n)\) the n'th central moment
- \(\displaystyle E(((X-\mu) / \sigma)^n)\) the n'th standardized moment
Expectation is the first moment and variance is the second central moment.
Additionally, skew is the third standardized moment and kurtosis is the fourth standardized moment.
To compute moments, we can use the moment generating function (MGF): \(\displaystyle M_X(t) = E(e^{tX})\). The n'th moment is obtained by differentiating the MGF n times with respect to \(\displaystyle t\) and setting \(\displaystyle t=0\).
Moments and Moment Generating Functions
Definitions
We call \(\displaystyle E(X^i)\) the i'th moment of \(\displaystyle X\).
We call \(\displaystyle E((X - E(X))^i)\) the i'th central moment of \(\displaystyle X\).
Therefore the mean is the first moment and the variance is the second central moment.
Moment Generating Functions
\(\displaystyle E(e^{tX})\)
We call this the moment generating function (mgf).
We can differentiate it with respect to \(\displaystyle t\) and set \(\displaystyle t=0\) to get the higher moments.
- Notes
- The mgf, if it exists, uniquely defines the distribution.
- If \(\displaystyle X\) and \(\displaystyle Y\) are independent, the mgf of \(\displaystyle X+Y\) is \(\displaystyle E(e^{t(X+Y)})=E(e^{tX})E(e^{tY}) = M_X(t)M_Y(t)\)
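A symbolic sketch with SymPy, using the Exp(lambda) MGF M(t) = lambda / (lambda - t) as an illustrative choice; differentiating n times and setting t = 0 recovers E(X^n) = n! / lambda^n:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)   # MGF of Exp(lambda), valid for t < lambda

# n-th moment = n-th derivative of the MGF evaluated at t = 0.
for n in range(1, 4):
    moment = sp.simplify(sp.diff(M, t, n).subs(t, 0))
    print(n, moment)   # 1/lambda, 2/lambda**2, 6/lambda**3
```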
Characteristic function
Convergence
There are 4 common types of convergence.
Almost Surely
- \(\displaystyle P(\lim X_i = X) = 1\)
In Probability
For all \(\displaystyle \epsilon \gt 0\)
\(\displaystyle \lim P(|X_i - X| \geq \epsilon) = 0\)
- Implies Convergence in distribution
In Distribution
Pointwise convergence of the cdf
A sequence of random variables \(\displaystyle X_1,...\) converges to \(\displaystyle X\) in distribution
if, for all \(\displaystyle x\) at which \(\displaystyle F\) is continuous,
\(\displaystyle \lim_{i \rightarrow \infty} F_i(x) = F(x)\)
- Equivalent to convergence in probability if it converges to a degenerate distribution (i.e. a number)
In Mean Square
\(\displaystyle \lim_{i \rightarrow \infty} E(|X_i-X|^2)=0\)
Delta Method
Suppose \(\displaystyle \sqrt{n}(X_n - \theta) \xrightarrow{D} N(0, \sigma^2)\).
Let \(\displaystyle g\) be a function such that \(\displaystyle g'\) exists and \(\displaystyle g'(\theta) \neq 0\)
Then \(\displaystyle \sqrt{n}(g(X_n) - g(\theta)) \xrightarrow{D} N(0, \sigma^2 g'(\theta)^2)\)
Multivariate:
\(\displaystyle \sqrt{n}(B - \beta) \xrightarrow{D} N(0, \Sigma) \implies \sqrt{n}(h(B)-h(\beta)) \xrightarrow{D} N(0, \nabla h(\beta)^T \Sigma \nabla h(\beta))\)
- Notes
- You can think of this like the Mean Value theorem for random variables.
- \(\displaystyle (g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)\)
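A simulation sketch of the delta method with the arbitrary choices X_i ~ Exp(1) and g(x) = x^2, so theta = sigma^2 = 1 and g'(theta) = 2; the predicted limiting variance of sqrt(n)(Xbar^2 - 1) is sigma^2 g'(theta)^2 = 4:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 20_000

# X_i ~ Exp(1): theta = E(X) = 1, sigma^2 = Var(X) = 1, g(x) = x^2, g'(theta) = 2.
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar**2 - 1.0)

# Delta method prediction: Var(z) is approximately sigma^2 * g'(theta)^2 = 4.
print(z.var())   # ~4
```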
Order Statistics
Inequalities and Limit Theorems
Markov's Inequality
Let \(\displaystyle X\) be a non-negative random variable.
Then for any \(\displaystyle a \gt 0\), \(\displaystyle P(X \geq a) \leq \frac{E(X)}{a}\)
\(\displaystyle \begin{aligned} E(X) &= \int_{0}^{\infty}xf(x)dx \\ &= \int_{0}^{a}xf(x)dx + \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}xf(x)dx\\ &\geq \int_{a}^{\infty}af(x)dx\\ &=a \int_{a}^{\infty}f(x)dx\\ &=a \cdot P(X \geq a)\\ \implies& P(X \geq a) \leq \frac{E(X)}{a} \end{aligned} \)
Chebyshev's Inequality
- \(\displaystyle P(|X - \mu| \geq k \sigma) \leq \frac{1}{k^2}\)
- \(\displaystyle P(|X - \mu| \geq k) \leq \frac{\sigma^2}{k^2}\)
Apply Markov's inequality:
Let \(\displaystyle Y = |X - \mu|\)
Then \(\displaystyle P(|X - \mu| \geq k) = P(Y \geq k) = P(Y^2 \geq k^2) \leq \frac{E(Y^2)}{k^2} = \frac{E((X - \mu)^2)}{k^2} = \frac{\sigma^2}{k^2}\)
- Usually used to prove convergence in probability
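A quick numeric check for a standard normal with k = 2 (a sketch; the bound holds but is far from tight here):

```python
from scipy.stats import norm

# P(|X - mu| >= 2*sigma) for X ~ N(0, 1)
exact = 2 * norm.sf(2)   # ~0.0455
bound = 1 / 2**2         # Chebyshev bound: 0.25
print(exact, bound, exact <= bound)   # True: the bound holds (loosely)
```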
Central Limit Theorem
Very very important. Never forget this.
For any distribution with finite variance, the standardized sample mean converges in distribution to a normal.
Let \(\displaystyle \mu = E(X)\) and \(\displaystyle \sigma^2 = Var(X)\)
Different ways of saying the same thing:
- \(\displaystyle \sqrt{n}(\bar{X} - \mu) \xrightarrow{D} N(0, \sigma^2)\)
- \(\displaystyle \frac{\sqrt{n}}{\sigma}(\bar{X} - \mu) \xrightarrow{D} N(0, 1)\)
- \(\displaystyle \bar{X} \sim N(\mu, \sigma^2/n)\) approximately, for large \(\displaystyle n\)
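A simulation sketch using the skewed distribution Exp(1) (an arbitrary choice with mu = sigma^2 = 1); even though the data are far from normal, the standardized sample mean looks standard normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 1_000, 10_000

# X_i ~ Exp(1): mu = 1, sigma = 1.
xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (xbar - 1.0)   # should be approximately N(0, 1)

print(z.mean(), z.var())                              # ~0, ~1
print(np.quantile(z, 0.975), stats.norm.ppf(0.975))   # both close to 1.96
```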
Law of Large Numbers
The sample mean converges to the population mean in probability.
For all \(\displaystyle \epsilon \gt 0\),
\(\displaystyle \lim_{n \rightarrow \infty} P(|\bar{X}_n - E(X)| \geq \epsilon) = 0\)
- Notes
- The sample mean also converges to the population mean almost surely; this is the strong law of large numbers, while the statement above is the weak law.
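A quick sketch of the running sample mean of fair coin flips (an arbitrary example) drifting toward the population mean of 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=100_000)   # fair coin: 0 or 1
running_mean = np.cumsum(flips) / np.arange(1, flips.size + 1)

for n in (10, 100, 10_000, 100_000):
    print(n, running_mean[n - 1])          # approaches 0.5 as n grows
```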
Properties and Relationships between distributions
- This is important for exams.
Poisson Distribution
- If \(\displaystyle X_i \sim Poisson(\lambda_i)\) are independent then \(\displaystyle \sum X_i \sim Poisson(\sum \lambda_i)\)
Normal Distribution
- If \(\displaystyle X_1 \sim N(\mu_1, \sigma_1^2)\) and \(\displaystyle X_2 \sim N(\mu_2, \sigma_2^2)\) are independent then \(\displaystyle \lambda_1 X_1 + \lambda_2 X_2 \sim N(\lambda_1 \mu_1 + \lambda_2 \mu_2, \lambda_1^2 \sigma_1^2 + \lambda_2^2 \sigma_2^2)\) for any \(\displaystyle \lambda_1, \lambda_2 \in \mathbb{R}\)
Exponential Distribution
- \(\displaystyle \operatorname{Exp}(\lambda)\) is equivalent to \(\displaystyle \Gamma(1, 1/\lambda)\)
- Note that some conventions flip the second parameter of gamma, so it would be \(\displaystyle \Gamma(1, \lambda)\)
- If \(\displaystyle X_1, ..., X_n\) are independent with \(\displaystyle X_i \sim \operatorname{Exp}(\lambda_i)\) then \(\displaystyle \min_i\{X_i\} \sim \operatorname{Exp}(\sum \lambda_i)\)
- Note that the maximum is not exponentially distributed
- However, if \(\displaystyle X_1, ..., X_n \sim \operatorname{Exp}(1)\) then \(\displaystyle Z_n = n e^{-\max_i\{X_i\}} \xrightarrow{D} \operatorname{Exp}(1)\)
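A simulation sketch of the minimum property with the arbitrary rates lambda = (0.5, 1.0, 2.0); the minimum should behave like Exp(3.5), e.g. have mean 1/3.5:

```python
import numpy as np

rng = np.random.default_rng(0)
rates = np.array([0.5, 1.0, 2.0])   # arbitrary rates; sum = 3.5
reps = 200_000

# NumPy parameterizes exponentials by scale = 1/rate.
samples = rng.exponential(1.0 / rates, size=(reps, 3))
mins = samples.min(axis=1)

print(mins.mean(), 1 / rates.sum())   # both ~0.2857 = 1/3.5
```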
Gamma Distribution
Note that exponential distributions are also gamma distributions.
- If \(\displaystyle X \sim \Gamma(k, \theta)\) then \(\displaystyle cX \sim \Gamma(k, c\theta)\) for any \(\displaystyle c \gt 0\).
- If \(\displaystyle X_1 \sim \Gamma(k_1, \theta)\) and \(\displaystyle X_2 \sim \Gamma(k_2, \theta)\) are independent then \(\displaystyle X_1 + X_2 \sim \Gamma(k_1 + k_2, \theta)\).
- If \(\displaystyle X_1 \sim \Gamma(\alpha, \theta)\) and \(\displaystyle X_2 \sim \Gamma(\beta, \theta)\) are independent, then \(\displaystyle \frac{X_1}{X_1 + X_2} \sim \operatorname{Beta}(\alpha, \beta)\).
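A simulation sketch of the last property with the arbitrary choices alpha = 2, beta = 5, theta = 1; the ratio X1 / (X1 + X2) should follow Beta(2, 5):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, beta, theta, reps = 2.0, 5.0, 1.0, 100_000

# NumPy's gamma takes (shape, scale).
X1 = rng.gamma(alpha, theta, reps)
X2 = rng.gamma(beta, theta, reps)
ratio = X1 / (X1 + X2)

print(ratio.mean(), alpha / (alpha + beta))              # both ~0.2857
# A small p-value would signal a mismatch with Beta(2, 5).
print(stats.kstest(ratio, 'beta', args=(alpha, beta)))
```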
T-distribution
- The ratio of a standard normal to the square root of an independent chi-squared divided by its degrees of freedom is t-distributed.
- If \(\displaystyle Z \sim N(0,1)\) and \(\displaystyle V \sim \chi^2(v)\) are independent then \(\displaystyle \frac{Z}{\sqrt{V/v}} \sim \text{t-dist}(v)\)
Chi-Sq Distribution
- The ratio of two independent chi-squared variables, each divided by its degrees of freedom, is F-distributed
- If \(\displaystyle X \sim \chi^2_{d_1}\) and \(\displaystyle Y \sim \chi^2_{d_2}\) are independent then \(\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)\)
- If \(\displaystyle Z_1,...,Z_k \sim N(0,1)\) are independent then \(\displaystyle Z_1^2 + ... + Z_k^2 \sim \chi^2(k)\)
- If \(\displaystyle X_i \sim \chi^2(k_i)\) are independent then \(\displaystyle X_1 + ... + X_n \sim \chi^2(k_1 +...+ k_n)\)
- \(\displaystyle \chi^2(k)\) is equivalent to \(\displaystyle \Gamma(k/2, 2)\)
F Distribution
Too many to list. See Wikipedia: F-distribution.
The most important relationships are with the chi-squared and t-distributions:
- If \(\displaystyle X \sim \chi^2_{d_1}\) and \(\displaystyle Y \sim \chi^2_{d_2}\) are independent then \(\displaystyle \frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)\)
- If \(\displaystyle X \sim t_{(n)}\) then \(\displaystyle X^2 \sim F(1, n)\) and \(\displaystyle X^{-2} \sim F(n, 1)\)
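The t/F relationship can be checked directly at the CDF level with SciPy (no simulation needed; the degrees of freedom below are arbitrary), since P(T^2 <= x) = P(-sqrt(x) <= T <= sqrt(x)):

```python
import numpy as np
from scipy import stats

n = 7                     # degrees of freedom (arbitrary)
x = np.linspace(0.1, 10, 50)

# If T ~ t(n), then P(T^2 <= x) = P(-sqrt(x) <= T <= sqrt(x)).
cdf_t_squared = stats.t.cdf(np.sqrt(x), n) - stats.t.cdf(-np.sqrt(x), n)
cdf_f = stats.f.cdf(x, 1, n)

print(np.max(np.abs(cdf_t_squared - cdf_f)))   # ~0: the CDFs agree
```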
Textbooks
- Sheldon Ross' A First Course in Probability
- This is a very good textbook that is standard across many universities. However, it only covers one semester of content.
The books below cover both introductory probability and statistics.