Calculus-based Probability
This is content covered in STAT410 and STAT700 at UMD.


==Basics==
* <math>P(S) = 1</math> where <math>S</math> is your sample space
* For mutually exclusive events <math>E_1, E_2, ...</math>, <math>P\left(\bigcup_{i=1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i)</math>
===Monotonicity===
* For all events <math>A</math> and <math>B</math>, <math>A \subset B \implies P(A) \leq P(B)</math>
{{hidden | Proof |
Write <math>B = A \cup (B \setminus A)</math>, a union of disjoint events. Then <math>P(B) = P(A) + P(B \setminus A) \geq P(A)</math> since probabilities are non-negative.
}}
===Conditional Probability===
<math>P(A|B)</math> is the probability of event A given event B.<br>
Mathematically, this is defined as <math>P(A|B) = P(A,B) / P(B)</math>.<br>
Note that this can also be written as <math>P(A|B)P(B) = P(A, B)</math>
Substituting <math>P(A,B) = P(B|A)P(A)</math> gives '''Bayes' Theorem''':
<math>
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
</math>
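For example (numbers chosen purely for illustration): suppose 1% of a population has a disease, a test detects it with probability 0.9, and the false positive rate is 0.05. Then
<math display="block">
P(\text{disease} \mid +) = \frac{(0.9)(0.01)}{(0.9)(0.01) + (0.05)(0.99)} \approx 0.15
</math>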
==Random Variables==
A random variable is a function from the sample space to the real numbers; rather than taking a fixed value, it is described by a distribution.
===PMF, PDF, CDF===
For discrete distributions, we call <math>p_{X}(x)=P(X=x)</math> the probability mass function (PMF).<br>
For continuous distributions, we have the probability density function (PDF) <math>f(x)</math>.<br>
The cumulative distribution function (CDF) is <math>F(x) = P(X \leq x)</math>.<br>
The CDF is the prefix sum of the PMF or the integral of the PDF. Likewise, the PDF is the derivative of the CDF.
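For example, an exponential random variable with rate <math>\lambda</math> has PDF <math>f(x) = \lambda e^{-\lambda x}</math> for <math>x \geq 0</math>, so its CDF is <math>F(x) = \int_0^x \lambda e^{-\lambda t} dt = 1 - e^{-\lambda x}</math>; differentiating <math>F</math> recovers <math>f</math>.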
===Joint Random Variables===
Two random variables are independent iff <math>f_{X,Y}(x,y) = f_X(x) f_Y(y)</math>.<br>
In general, the marginal distribution is <math>f_X(x) = \int f_{X,Y}(x,y) dy</math>.
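For example, take <math>f_{X,Y}(x,y) = x + y</math> on <math>[0,1]^2</math>. The marginal is <math>f_X(x) = \int_0^1 (x+y) dy = x + \tfrac{1}{2}</math>, and since <math>f_{X,Y}(x,y) \neq f_X(x) f_Y(y)</math>, <math>X</math> and <math>Y</math> are not independent.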
===Change of variables===
Let <math>g</math> be a monotonically increasing, differentiable function and <math>Y = g(X)</math>.<br>
Then <math>F_Y(y) = P(Y \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))</math>.<br>
And <math>f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(g^{-1}(y)) = f_X(g^{-1}(y)) \frac{d}{dy}g^{-1}(y)</math><br>
Hence:
<math display="block">
  f_Y(y) = f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y)
</math>
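For example, if <math>X \sim \text{Uniform}(0,1)</math> and <math>Y = X^2</math>, then <math>g^{-1}(y) = \sqrt{y}</math> and
<math display="block">
f_Y(y) = f_X(\sqrt{y}) \frac{d}{dy} \sqrt{y} = \frac{1}{2\sqrt{y}}, \quad 0 < y < 1
</math>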


==Expectation and Variance==
* <math>E(X) = \sum_S xp(x)</math> or <math>\int_S xp(x)dx</math>
* <math>Var(X) = E[(X-E(X))^2] = E(X^2) - (E(X))^2</math>
===Total Expectation===
<math>E_{X}(X) = E_{Y}(E_{X|Y}(X|Y))</math><br>
{{hidden | Proof |
<math>
\begin{aligned}
E(X) &= \int_S x p(x)dx \\
&= \int_x x \int_y p(x,y)dy dx \\
&= \int_x x \int_y p(x|y)p(y)dy dx \\
&= \int_y\int_x x  p(x|y)dx\,p(y)dy
\end{aligned}
</math>
}}
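For example, let <math>N \sim \text{Poisson}(\lambda)</math> and, given <math>N</math>, let <math>X \sim \text{Binomial}(N, p)</math>. Then <math>E(X) = E(E(X|N)) = E(Np) = \lambda p</math>.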


===Total Variance===
<math>Var(Y) = E(Var(Y|X)) + Var(E(Y | X))</math><br>
This one is not used as often on tests as total expectation
{{hidden | Proof |
 
<math>
\begin{aligned}
Var(Y) &= E(Y^2) - E(Y)^2 \\
&= E(E(Y^2|X)) - E(E(Y|X))^2\\
&= E(Var(Y|X) + E(Y|X)^2) - E(E(Y|X))^2\\
&= E(Var(Y|X)) + E(E(Y|X)^2) - E(E(Y|X))^2\\
&= E(Var(Y|X)) + Var(E(Y|X))
\end{aligned}
</math>
}}
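Continuing the example above with <math>N \sim \text{Poisson}(\lambda)</math> and <math>X|N \sim \text{Binomial}(N, p)</math>: <math>Var(X) = E(Np(1-p)) + Var(Np) = \lambda p(1-p) + \lambda p^2 = \lambda p</math>.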


* <math>\bar{X} \sim N(\mu, \sigma^2 / n)</math>
* <math>(n-1)S^2 / \sigma^2 \sim \chi^2(n-1)</math>
===Jensen's Inequality===
{{main | Wikipedia: Jensen's inequality}}
Let <math>g</math> be a convex function (i.e. if twice differentiable, its second derivative is non-negative).
Then <math>g(E(X)) \leq E(g(X))</math>.
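For example, <math>g(x) = x^2</math> is convex, so <math>E(X)^2 \leq E(X^2)</math>, which is just the statement that <math>Var(X) \geq 0</math>.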


==Moments and Moment Generating Functions==
===Definitions===
{{main | Wikipedia: Moment (mathematics) | Wikipedia: Central moment | Wikipedia: Moment-generating function}}
* <math>E(X^n)</math> the n'th moment
* <math>E((X-\mu)^n)</math> the n'th central moment
* <math>E(((X-\mu) / \sigma)^n)</math> the n'th standardized moment
Expectation is the first moment and variance is the second central moment.<br>
Additionally, ''skew'' is the third standardized moment and ''kurtosis'' is the fourth standardized moment.
===Moment Generating Functions===
To compute moments, we can use a moment generating function (MGF):
<math display="block">M_X(t) = E(e^{tX})</math>
With the MGF, we can get the n'th moment by differentiating n times with respect to <math display="inline">t</math> and setting <math display="inline">t=0</math>.
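For example, if <math>X \sim \text{Exp}(\lambda)</math> then <math>M_X(t) = \frac{\lambda}{\lambda - t}</math> for <math>t < \lambda</math>, so <math>E(X) = M_X'(0) = 1/\lambda</math>, <math>E(X^2) = M_X''(0) = 2/\lambda^2</math>, and <math>Var(X) = 1/\lambda^2</math>.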
; Notes
* The MGF, if it exists, uniquely defines the distribution.
* For independent <math>X</math> and <math>Y</math>, the MGF of <math>X+Y</math> is <math>M_{X+Y}(t) = E(e^{t(X+Y)})=E(e^{tX})E(e^{tY}) = M_X(t) M_Y(t)</math>
 
===Characteristic function===


;Notes
* You can think of this like the Mean Value theorem for random variables.
** <math>(g(X_n) - g(\theta)) \approx g'(\theta)(X_n - \theta)</math>


==Order Statistics==
Consider iid random variables <math>X_1, ..., X_n</math>.<br>
Then the order statistics are <math>X_{(1)}, ..., X_{(n)}</math> where <math>X_{(i)}</math> represents the i'th smallest number.
===Min and Max===
The easiest to reason about are the minimum and maximum order statistics:
<math>P(X_{(1)} \leq x) = P(\min(X_i) \leq x) = 1 - P(X_1 > x, ..., X_n > x)</math><br>
<math>P(X_{(n)} \leq x) = P(\max(X_i) \leq x) = P(X_1 \leq x, ..., X_n \leq x)</math>
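By independence, these become <math>1 - [1-F(x)]^n</math> and <math>F(x)^n</math>, where <math>F</math> is the common CDF. For example, for iid <math>\text{Uniform}(0,1)</math> random variables, <math>P(X_{(n)} \leq x) = x^n</math>.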
===Joint PDF===
If <math>X_i</math> has pdf <math>f</math>, the joint pdf of <math>X_{(1)}, ..., X_{(n)}</math> is:
<math>
g(x_1, ..., x_n) = n! \, f(x_1) \cdots f(x_n) \quad \text{for } x_1 \leq ... \leq x_n
</math>
since there are <math>n!</math> orderings of <math>X_1, ..., X_n</math> that produce the same sorted values.
===Individual PDF===
<math>
f_{X_{(i)}}(x) = \frac{n!}{(i-1)!(n-i)!} F(x)^{i-1} f(x) [1-F(x)]^{n-i}
</math>
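For example, for iid <math>\text{Uniform}(0,1)</math> random variables this reduces to <math>\frac{n!}{(i-1)!(n-i)!} x^{i-1} (1-x)^{n-i}</math>, i.e. <math>X_{(i)} \sim \text{Beta}(i, n-i+1)</math>.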


==Inequalities and Limit Theorems==
{{hidden | Proof | 
Apply Markov's inequality:<br>
Let <math>Y = |X - \mu|</math><br>
Then:<br>
<math>
\begin{aligned}
P(|X - \mu| \geq k) &= P(Y \geq k) \\
&= P(Y^2 \geq k^2) \\
&\leq \frac{E(Y^2)}{k^2} \\
&= \frac{E((X - \mu)^2)}{k^2}
\end{aligned}
</math>
}}
* Usually used to prove convergence in probability, as in the example below.
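For example, applying Chebyshev's inequality to the sample mean of iid random variables with mean <math>\mu</math> and variance <math>\sigma^2</math> gives <math>P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\sigma^2}{n\epsilon^2} \to 0</math>, which is the weak law of large numbers.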


==Properties and Relationships between distributions==
{{main | Wikipedia: Relationships among probability distributions}}
;This is important for exams.


===Poisson Distribution===
===T-distribution===
* The ratio of a standard normal to the square root of a scaled Chi-sq random variable yields a T-distribution.
** If <math>Z \sim N(0,1)</math> and <math>V \sim \Chi^2(v)</math> are independent then <math>\frac{Z}{\sqrt{V/v}} \sim \text{t-dist}(v)</math>


===Chi-Sq Distribution===
* If <math>Z_1,...,Z_k \sim N(0,1)</math> are independent then <math>Z_1^2 + ... + Z_k^2 \sim \Chi^2(k)</math>
* If <math>X_i \sim \Chi^2(k_i)</math> are independent then <math>X_1 + ... + X_n \sim \Chi^2(k_1 +...+ k_n)</math>
* <math>\Chi^2(k)</math> is equivalent to <math>\Gamma(k/2, 2)</math> (shape <math>k/2</math>, scale <math>2</math>)


===F Distribution===
Too many to list. See [[Wikipedia: F-distribution]].

The most important relationships involve the Chi-sq and T distributions:
* If <math>X \sim \chi^2_{d_1}</math> and <math>Y \sim \chi^2_{d_2}</math> are independent then <math>\frac{X/d_1}{Y/d_2} \sim F(d_1,d_2)</math>
* If <math>X \sim t_{(n)}</math> then <math>X^2 \sim F(1, n)</math> and <math>X^{-2} \sim F(n, 1)</math>
==Textbooks==
* [https://smile.amazon.com/dp/032179477X Sheldon Ross' A First Course in Probability]
** This is a very good textbook despite the poor reviews on Amazon
* [https://smile.amazon.com/dp/0321795431 Hogg and Craig's Mathematical Statistics]
* [https://smile.amazon.com/dp/0534243126 Casella and Berger's Statistical Inference]