Probability

* For all events <math>A</math> and <math>B</math>, <math>A \subset B \implies P(A) \leq P(B)</math>
{{hidden | Proof | Since <math>A \subset B</math>, we can write <math>B = A \cup (B \setminus A)</math> as a disjoint union. Then <math>P(B) = P(A) + P(B \setminus A) \geq P(A)</math>. }}
===Conditional Probability===
<math>P(A|B)</math> is the probability of event A given event B.<br>
Mathematically, this is defined as <math>P(A|B) = P(A,B) / P(B)</math>.<br>
Note that this can also be written as <math>P(A|B)P(B) = P(A, B)</math>
With some additional substitution, we get '''Bayes' Theorem''':
<math>
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
</math>
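As a quick illustration with made-up numbers: suppose <math>P(B|A) = 0.9</math>, <math>P(A) = 0.01</math>, and <math>P(B) = 0.05</math>. Then
<math display="block">
P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{0.9 \times 0.01}{0.05} = 0.18
</math>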
==Random Variables==
A random variable is a variable whose value is described by a probability distribution rather than being fixed.


===PMF, PDF, CDF===
The CDF is the prefix sum of the PMF or the integral of the PDF. Likewise, the PDF is the derivative of the CDF.
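For example, taking the exponential distribution with rate <math>\lambda</math> as an illustration, the CDF <math>F(x) = 1 - e^{-\lambda x}</math> for <math>x \geq 0</math> differentiates to the PDF:
<math display="block">
f(x) = \frac{d}{dx}\left(1 - e^{-\lambda x}\right) = \lambda e^{-\lambda x}
</math>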


===Joint Random Variables===
Two random variables are independent iff <math>f_{X,Y}(x,y) = f_X(x) f_Y(y)</math>.<br>
In general, the marginal distribution is <math>f_X(x) = \int f_{X,Y}(x,y) dy</math>.
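As a small worked example with an arbitrary joint density chosen for illustration, take <math>f_{X,Y}(x,y) = x + y</math> on <math>[0,1]^2</math>. Then
<math display="block">
f_X(x) = \int_0^1 (x + y) dy = x + \tfrac{1}{2}
</math>
and since <math>f_X(x) f_Y(y) = (x + \tfrac{1}{2})(y + \tfrac{1}{2}) \neq x + y</math>, <math>X</math> and <math>Y</math> are not independent.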
 
===Change of variables===
Let <math>g</math> be a monotonic increasing function and <math>Y = g(X)</math>.<br>
Then <math>F_Y(y) = P(Y \leq y) = P(X \leq g^{-1}(y)) = F_X(g^{-1}(y))</math>.<br>
And <math>f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X(g^{-1}(y)) = f_X(g^{-1}(y)) \frac{d}{dy}g^{-1}(y)</math><br>
Hence:
<math display="block">
  f_Y(y) = f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y)
</math>
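For instance, with <math>g(x) = e^x</math> (so <math>g^{-1}(y) = \ln y</math>):
<math display="block">
f_Y(y) = f_X(\ln y) \frac{d}{dy} \ln y = \frac{f_X(\ln y)}{y}, \quad y > 0
</math>
which recovers the log-normal density when <math>X</math> is normal.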
 
==Expectation and Variance==
Some definitions and properties.
===Definitions===
{{hidden | Proof |
<math>
\begin{aligned}
E(X) &= \int_S x p(x)dx \\
&= \int_x x \int_y p(x,y)dy\, dx \\
&= \int_x x \int_y p(x|y)p(y)dy\, dx \\
&= \int_y\int_x x\, p(x|y)dx\, p(y)dy \\
&= E(E(X|Y))
\end{aligned}
</math>
}}
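As a sanity check with a made-up two-stage experiment: roll a fair die to get <math>N</math>, then flip <math>N</math> fair coins and let <math>X</math> be the number of heads. Since <math>E(X|N) = N/2</math>,
<math display="block">
E(X) = E(E(X|N)) = E(N/2) = \tfrac{1}{2} \cdot \tfrac{7}{2} = \tfrac{7}{4}
</math>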
This one is not used as often on tests as total expectation.
{{hidden | Proof |
 
<math>
\begin{aligned}
Var(Y) &= E(Y^2) - E(Y)^2 \\
&= E(E(Y^2|X)) - E(E(Y|X))^2\\
&= E(Var(Y|X) + E(Y|X)^2) - E(E(Y|X))^2\\
&= E(Var(Y|X)) + E(E(Y|X)^2) - E(E(Y|X))^2\\
&= E(Var(Y|X)) + Var(E(Y|X))
\end{aligned}
</math>
}}
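A quick check with a made-up mixture: let <math>X \sim \text{Bernoulli}(1/2)</math> and, given <math>X</math>, let <math>Y \sim N(2X, 1)</math>. Then <math>Var(Y|X) = 1</math> and <math>E(Y|X) = 2X</math>, so
<math display="block">
Var(Y) = E(Var(Y|X)) + Var(E(Y|X)) = 1 + 4 \cdot \tfrac{1}{4} = 2
</math>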


* <math>(n-1)S^2 / \sigma^2 \sim \chi^2(n-1)</math>
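Here we assume <math>S^2</math> denotes the sample variance of <math>n</math> iid <math>N(\mu, \sigma^2)</math> samples:
<math display="block">
S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2
</math>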


===Jensen's Inequality===
{{main | Wikipedia: Jensen's inequality}}
Let <math>g</math> be a convex function (e.g. one with a nonnegative second derivative).
Then <math>g(E(X)) \leq E(g(X))</math>.
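For example, <math>g(x) = x^2</math> is convex, so
<math display="block">
E(X)^2 \leq E(X^2) \iff Var(X) = E(X^2) - E(X)^2 \geq 0
</math>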
 
==Moments and Moment Generating Functions==
===Definitions===
{{main | Wikipedia: Moment (mathematics) | Wikipedia: Central moment | Wikipedia: Moment-generating function}}
* <math>E(X^n)</math> the n'th moment
Additionally, ''skew'' is the third standardized moment and ''kurtosis'' is the fourth standardized moment.
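Written out, with mean <math>\mu</math> and standard deviation <math>\sigma</math>:
<math display="block">
\text{skew}(X) = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right], \qquad \text{kurtosis}(X) = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]
</math>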


===Moment Generating Functions===
To compute moments, we can use a moment generating function (MGF):
<math display="block">M_X(t) = E(e^{tX})</math>
With the MGF, we can get the n'th moment by taking n derivatives with respect to <math display="inline">t</math> and setting <math display="inline">t=0</math>.
; Notes
* The MGF, if it exists, uniquely defines the distribution.
* If <math>X</math> and <math>Y</math> are independent, the MGF of <math>X+Y</math> is <math>M_{X+Y}(t) = E(e^{t(X+Y)})=E(e^{tX})E(e^{tY}) = M_X(t) M_Y(t)</math>
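For example, taking <math>X \sim \text{Exponential}(\lambda)</math> as an illustration:
<math display="block">
M_X(t) = E(e^{tX}) = \int_0^\infty e^{tx} \lambda e^{-\lambda x} dx = \frac{\lambda}{\lambda - t}, \quad t < \lambda
</math>
so <math>E(X) = M_X'(0) = \frac{1}{\lambda}</math> and <math>E(X^2) = M_X''(0) = \frac{2}{\lambda^2}</math>.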


===Characteristic function===




==Order Statistics==
Consider iid random variables <math>X_1, ..., X_n</math>.<br>
Then the order statistics are <math>X_{(1)}, ..., X_{(n)}</math> where <math>X_{(i)}</math> represents the i'th smallest number.
===Min and Max===
The easiest to reason about are the minimum and maximum order statistics:
<math display="block">P(X_{(1)} \leq x) = P(\min(X_i) \leq x) = 1 - P(X_1 > x, ..., X_n > x)</math>
<math display="block">P(X_{(n)} \leq x) = P(\max(X_i) \leq x) = P(X_1 \leq x, ..., X_n \leq x)</math>
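Since the <math>X_i</math> are iid with CDF <math>F</math>, these factor into closed forms:
<math display="block">
F_{X_{(1)}}(x) = 1 - [1 - F(x)]^n, \qquad F_{X_{(n)}}(x) = F(x)^n
</math>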
===Joint PDF===
If <math>X_i</math> has pdf <math>f</math>, the joint pdf of <math>X_{(1)}, ..., X_{(n)}</math> is:
<math display="block">
g(x_1, \ldots, x_n) = n! \, f(x_1) \cdots f(x_n) \quad \text{for } x_1 \leq \cdots \leq x_n
</math>
since there are <math>n!</math> orderings of <math>X_1, ..., X_n</math> that produce the same sorted values.
===Individual PDF===
<math display="block">
f_{X_{(i)}}(x) = \frac{n!}{(i-1)!(n-i)!} F(x)^{i-1} f(x) [1-F(x)]^{n-i}
</math>
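For example, if the <math>X_i</math> are Uniform(0,1), then <math>F(x) = x</math> and <math>f(x) = 1</math>, so
<math display="block">
f_{X_{(i)}}(x) = \frac{n!}{(i-1)!(n-i)!} x^{i-1} (1-x)^{n-i}
</math>
which is the Beta(<math>i, n-i+1</math>) density.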


==Inequalities and Limit Theorems==
Apply Markov's inequality:<br>
Let <math>Y = |X - \mu|</math><br>
Then:<br>
<math>
\begin{aligned}
P(|X - \mu| \geq k) &= P(Y \geq k) \\
&= P(Y^2 \geq k^2) \\
&\leq \frac{E(Y^2)}{k^2} \\
&= \frac{E((X - \mu)^2)}{k^2}
\end{aligned}
</math>
}}
* Usually used to prove convergence in probability
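For instance, taking <math>k = 2\sigma</math> in the bound above gives
<math display="block">
P(|X - \mu| \geq 2\sigma) \leq \frac{E((X-\mu)^2)}{4\sigma^2} = \frac{1}{4}
</math>
for any distribution with finite variance.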
==Textbooks==
* [https://smile.amazon.com/dp/032179477X Sheldon Ross' A First Course in Probability]
** This is a very good textbook that is standard across many universities. However, it only covers one semester of content.
The books below cover both introductory probability and statistics.
* [https://smile.amazon.com/dp/0321795431 Hogg and Craig's Mathematical Statistics]
* [https://smile.amazon.com/dp/0534243126 Casella and Berger's Statistical Inference]