Probability


Probability Theory - Random Variables and Stochastic Processes

Table of Contents

  1. Probability Spaces
  2. Conditional Probability
  3. Random Variables
  4. Expectation and Variance
  5. Moment Generating Functions
  6. Multivariate Distributions
  7. Convergence
  8. Markov Chains
  9. Poisson Processes
  10. Martingales

Probability Spaces

Sample Space and Events

Sample space $\Omega$: the set of all possible outcomes

Event: a subset of $\Omega$

$\sigma$-algebra $\mathcal{F}$: a collection of events satisfying:

  • $\Omega \in \mathcal{F}$
  • If $A \in \mathcal{F}$, then $A' \in \mathcal{F}$ (closed under complements)
  • If $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_i A_i \in \mathcal{F}$ (closed under countable unions)

Probability Measure

$P: \mathcal{F} \to [0,1]$ satisfying:

  • $P(\Omega) = 1$
  • Countable additivity: if $A_1, A_2, \dots$ are disjoint, then $P(\bigcup_i A_i) = \sum_i P(A_i)$

Axioms:

  1. $P(A) \geq 0$
  2. $P(\Omega) = 1$
  3. $P(\bigcup_i A_i) = \sum_i P(A_i)$ for disjoint events

Basic Properties

  • $P(\emptyset) = 0$
  • $P(A') = 1 - P(A)$
  • If $A \subseteq B$, then $P(A) \leq P(B)$
  • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (inclusion-exclusion; see the quick check below)
  • $P(A \cap B) \leq \min(P(A), P(B))$
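
A quick sanity check of inclusion-exclusion by direct enumeration. The two-dice sample space and the events $A$, $B$ below are hypothetical choices for illustration, not taken from the notes above.

```python
import itertools
from fractions import Fraction

# Sample space: all ordered outcomes of rolling two fair dice.
omega = list(itertools.product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] % 2 == 0}      # first die is even
B = {w for w in omega if w[0] + w[1] == 7}   # the two dice sum to 7

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A), prob(B), prob(A & B), prob(A | B))   # 1/2 1/6 1/12 7/12
```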

De Morgan’s Laws

  • $P((A \cup B)') = P(A' \cap B')$
  • $P((A \cap B)') = P(A' \cup B')$

Conditional Probability

Definition

$P(A | B) = \frac{P(A \cap B)}{P(B)}$

provided $P(B) > 0$.

Bayes’ Theorem

$P(A | B) = \frac{P(B | A) P(A)}{P(B)}$

Partition version: if $A_1, \dots, A_k$ partition $\Omega$:

$P(A_i | B) = \frac{P(B | A_i) P(A_i)}{\sum_{j=1}^k P(B | A_j) P(A_j)}$
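
A minimal numerical sketch of the partition form of Bayes' theorem. The prevalence, sensitivity, and false-positive numbers are hypothetical, chosen only to illustrate the computation.

```python
# Hypothetical numbers: a condition with prevalence 1%,
# test sensitivity P(+|D) = 0.95, false-positive rate P(+|not D) = 0.05.
priors = {"D": 0.01, "not D": 0.99}        # the partition A_1, A_2 of Ω
likelihood = {"D": 0.95, "not D": 0.05}    # P(B | A_i) with B = "test positive"

# Denominator of Bayes' theorem: P(B) = Σ_j P(B | A_j) P(A_j)
p_b = sum(likelihood[a] * priors[a] for a in priors)

# Posterior P(A_i | B) for each element of the partition
posterior = {a: likelihood[a] * priors[a] / p_b for a in priors}
print(posterior)   # P(D | +) ≈ 0.161: most positives are false positives here
```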

Independence

Two events: $P(A \cap B) = P(A)P(B)$

Mutual independence: $P\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} P(A_i)$ for every subset $S \subseteq \{1, \dots, n\}$

Pairwise independence: $P(A_i \cap A_j) = P(A_i)P(A_j)$ for all $i \neq j$

Note: pairwise independence does not imply mutual independence (see the example below).
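
The classic counterexample, checked by enumeration: for two fair coin flips, the events "first flip is heads", "second flip is heads", and "the flips agree" are pairwise independent but not mutually independent.

```python
import itertools
from fractions import Fraction

omega = list(itertools.product("HT", repeat=2))   # two fair coin flips
prob = lambda ev: Fraction(len(ev), len(omega))

A = {w for w in omega if w[0] == "H"}             # first flip heads
B = {w for w in omega if w[1] == "H"}             # second flip heads
C = {w for w in omega if w[0] == w[1]}            # the flips agree

# Pairwise independence holds:
assert prob(A & B) == prob(A) * prob(B)
assert prob(A & C) == prob(A) * prob(C)
assert prob(B & C) == prob(B) * prob(C)

# ...but mutual independence fails:
print(prob(A & B & C), prob(A) * prob(B) * prob(C))   # 1/4 vs 1/8
```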


Random Variables

Definition

Random variable $X$: a function $X: \Omega \to \mathbb{R}$

Measurability: $\{\omega : X(\omega) \leq x\} \in \mathcal{F}$ for all $x$

Cumulative Distribution Function

CDF: $F(x) = P(X \leq x)$

Properties:

  • Non-decreasing
  • Right-continuous
  • $\lim_{x \to -\infty} F(x) = 0$
  • $\lim_{x \to \infty} F(x) = 1$

Probability Mass Function (Discrete)

PMF: $p(x) = P(X = x)$

Properties: $p(x) \geq 0$ and $\sum_x p(x) = 1$

Probability Density Function (Continuous)

PDF: $f(x)$ such that $P(a \leq X \leq b) = \int_a^b f(x)\,dx$

Properties:

  • $f(x) \geq 0$
  • $\int_{-\infty}^{\infty} f(x)\,dx = 1$
  • $F(x) = \int_{-\infty}^x f(t)\,dt$
  • $f(x) = F'(x)$ (where the derivative exists)

Transformations

Discrete: if $Y = g(X)$, then $p_Y(y) = \sum_{x: g(x)=y} p_X(x)$

Continuous: if $Y = g(X)$ with $g$ strictly monotonic:

$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}g^{-1}(y)\right|$
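
A Monte Carlo check of the change-of-variables formula, sketched with NumPy: take $X \sim \text{Exponential}(1)$ and $Y = X^2$, for which the formula gives $f_Y(y) = e^{-\sqrt{y}}/(2\sqrt{y})$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exponential(1)
y = x ** 2                                       # Y = g(X), g(x) = x², strictly increasing on [0, ∞)

# Change of variables: f_Y(y) = f_X(√y) · |d√y/dy| = e^{-√y} / (2√y)
f_y = lambda t: np.exp(-np.sqrt(t)) / (2 * np.sqrt(t))

# Empirical density of Y on a few bins vs the formula at the bin centers.
edges = np.linspace(0.5, 4.0, 8)
counts, _ = np.histogram(y, bins=edges)
empirical = counts / (len(y) * np.diff(edges))
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.round(empirical, 3))
print(np.round(f_y(centers), 3))   # should roughly match the row above
```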


Expectation and Variance

Expected Value

Discrete: $E[X] = \sum_x x \cdot p(x)$

Continuous: $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$

Properties:

  • $E[c] = c$ for a constant $c$
  • $E[cX] = c E[X]$
  • $E[X + Y] = E[X] + E[Y]$
  • $E[XY] = E[X]E[Y]$ if $X$ and $Y$ are independent

Variance

$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$

Standard deviation: $\sigma = \sqrt{\text{Var}(X)}$

Properties:

  • $\text{Var}(c) = 0$
  • $\text{Var}(cX) = c^2 \text{Var}(X)$
  • $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ if $X$ and $Y$ are independent

Bound: $\text{Var}(X) \geq 0$, with equality $\iff X$ is constant

Covariance

$\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$

Properties:

  • $\text{Cov}(X,X) = \text{Var}(X)$
  • $\text{Cov}(X,Y) = \text{Cov}(Y,X)$
  • $\text{Cov}(aX + b, cY + d) = ac\,\text{Cov}(X,Y)$
  • $\text{Cov}(X + Y, Z) = \text{Cov}(X,Z) + \text{Cov}(Y,Z)$

Correlation

$\rho(X,Y) = \text{Cov}(X,Y)/(\sigma_X \sigma_Y)$

  • $|\rho| \leq 1$ (Cauchy-Schwarz)
  • $\rho = \pm 1 \iff Y = aX + b$ for some $a \neq 0$
  • $\rho = 0$: uncorrelated (equivalent to $\text{Cov}(X,Y) = 0$)

Note: uncorrelated does not imply independent (see the example below).
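
A small simulation illustrating the note above, sketched with NumPy: with $X \sim N(0,1)$ and $Y = X^2$, the pair is uncorrelated ($\text{Cov}(X, X^2) = E[X^3] = 0$) yet completely dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2                       # Y is a deterministic function of X, so clearly dependent

# Sample correlation is ≈ 0 even though X and Y are far from independent:
print(np.corrcoef(x, y)[0, 1])              # ≈ 0  (Cov(X, X²) = E[X³] = 0 for X ~ N(0,1))

# Dependence shows up immediately through conditioning, e.g. on |X| > 1:
print(y[np.abs(x) > 1].mean(), y.mean())    # conditional mean of Y ≠ unconditional mean
```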


Moment Generating Functions

Definition

MGF: $M_X(t) = E[e^{tX}]$

Properties:

  • $M_X(0) = 1$
  • $M_X^{(n)}(0) = E[X^n]$, the $n$-th moment
  • $M_{aX+b}(t) = e^{bt} M_X(at)$
  • If $X$ and $Y$ are independent: $M_{X+Y}(t) = M_X(t) M_Y(t)$
  • Uniqueness: if the MGF exists in a neighborhood of 0, it determines the distribution

Existence: the MGF may fail to be finite for some distributions.

Characteristic Function

$\phi_X(t) = E[e^{itX}]$ (always exists)

Properties similar to MGF but uses complex exponential.

Common MGFs

  • Binomial$(n, p)$: $M(t) = (pe^t + q)^n$, where $q = 1 - p$
  • Poisson$(\lambda)$: $M(t) = e^{\lambda(e^t - 1)}$
  • Exponential$(\lambda)$: $M(t) = \lambda/(\lambda - t)$ for $t < \lambda$ (checked numerically below)
  • Normal$(\mu, \sigma^2)$: $M(t) = e^{\mu t + \sigma^2 t^2/2}$
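
A Monte Carlo sanity check of the Exponential MGF and of the moment property $M_X'(0) = E[X]$, sketched with NumPy; the rate $\lambda = 2$ and the point $t = 0.5$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t = 2.0, 0.5                 # Exponential rate λ and a point t < λ
x = rng.exponential(scale=1 / lam, size=1_000_000)

# Monte Carlo estimate of M_X(t) = E[e^{tX}] vs the closed form λ/(λ - t)
print(np.exp(t * x).mean(), lam / (lam - t))       # both ≈ 1.333

# M'_X(0) = E[X]: approximate the derivative at 0 with a central difference
h = 1e-3
m = lambda s: np.exp(s * x).mean()
print((m(h) - m(-h)) / (2 * h), 1 / lam)           # both ≈ 0.5
```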

Multivariate Distributions

Joint Distribution

Joint CDF: $F(x,y) = P(X \leq x, Y \leq y)$

Joint PMF: $p(x,y) = P(X = x, Y = y)$

Joint PDF: $f(x,y)$ such that $P((X,Y) \in A) = \iint_A f(x,y)\, dx\, dy$

Marginal Distributions

Discrete: $p_X(x) = \sum_y p(x,y)$ and $p_Y(y) = \sum_x p(x,y)$

Continuous: $f_X(x) = \int_{-\infty}^{\infty} f(x,y)\, dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx$

Independence

$X$ and $Y$ are independent $\iff p(x,y) = p_X(x) p_Y(y)$ for all $x, y$ (discrete) or $f(x,y) = f_X(x) f_Y(y)$ for all $x, y$ (continuous)

If independent, then:

  • $E[XY] = E[X] E[Y]$
  • $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
  • $M_{X+Y}(t) = M_X(t) M_Y(t)$

Conditional Distributions

Discrete: $p_{Y|X}(y|x) = \frac{p(x,y)}{p_X(x)}$

Continuous: $f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)}$

Conditional Expectation

$E[Y | X = x] = \sum_y y \cdot p_{Y|X}(y|x)$ (discrete)

$E[Y | X = x] = \int y \cdot f_{Y|X}(y|x)\, dy$ (continuous)

$E[Y | X]$ is a function of $X$, hence itself a random variable.

Properties:

  • $E[E[Y | X]] = E[Y]$
  • If $X$ and $Y$ are independent: $E[Y | X] = E[Y]$
  • $E[Y | X, Z]$ is a function of both $X$ and $Z$

Law of Total Expectation (Tower Property)

$E[Y] = E[E[Y | X]]$

Law of Total Variance

$\text{Var}(Y) = E[\text{Var}(Y | X)] + \text{Var}(E[Y | X])$
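
A simulation check of the tower property and the law of total variance, sketched with NumPy for a hypothetical hierarchy $X \sim \text{Exponential}(1)$, $Y | X \sim \text{Poisson}(X)$, where both sides of the variance decomposition equal 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.exponential(scale=1.0, size=n)   # X ~ Exponential(1)
y = rng.poisson(lam=x)                   # Y | X = x  ~  Poisson(x)

# For this hierarchy: E[Y|X] = X and Var(Y|X) = X, so
# Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) = E[X] + Var(X) = 1 + 1 = 2.
print(y.var())                    # ≈ 2
print(x.mean() + x.var())         # E[Var(Y|X)] + Var(E[Y|X]), estimated from the same draws

# Tower property: E[Y] = E[E[Y|X]] = E[X] = 1
print(y.mean(), x.mean())
```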


Convergence

Almost Sure Convergence

$X_n \to X$ a.s.

if $P(\lim_{n \to \infty} X_n = X) = 1$

Convergence in Probability

$X_n \to_p X$

if for every $\epsilon > 0$: $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$

Convergence in Distribution (Weak)

$X_n \to_d X$

if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ at all continuity points of $F_X$

Implies: $E[g(X_n)] \to E[g(X)]$ for all bounded continuous $g$

Central Limit Theorem

If $X_1, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$:

$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to_d N(0,1)$
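
A quick CLT demonstration sketched with NumPy: standardized means of i.i.d. Exponential(1) draws, which are strongly skewed individually, behave approximately like $N(0,1)$ for moderate $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 50_000
# A deliberately non-normal population: Exponential(1), so μ = 1 and σ = 1.
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0   # √n (X̄ₙ − μ) / σ

# If the CLT applies, z should behave like N(0, 1):
print(z.mean(), z.std())                               # ≈ 0 and ≈ 1
print((z < 1.96).mean())                               # ≈ Φ(1.96) ≈ 0.975
```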

Slutsky’s Theorem

If $X_n \to_d X$ and $Y_n \to_p c$ for a constant $c$:

  • $X_n + Y_n \to_d X + c$
  • $X_n Y_n \to_d cX$
  • $X_n / Y_n \to_d X/c$ (if $c \neq 0$)

Markov Chains

Definition

Markov property: $P(X_{n+1} = j | X_n = i, \dots, X_0 = i_0) = P(X_{n+1} = j | X_n = i)$

Homogeneous: transition probabilities do not depend on $n$

Transition matrix $P$: $(P)_{ij} = P_{ij} = P(X_{n+1} = j | X_n = i)$

Properties: rows sum to 1, all entries $\geq 0$

Classification of States

Transient: $P(\text{ever return}) < 1$. Recurrent: $P(\text{ever return}) = 1$.

  • Positive recurrent: mean return time $E[T_i] < \infty$
  • Null recurrent: $E[T_i] = \infty$

Period: the greatest common divisor $d$ of the possible return times ($d$ divides every $n$ with $P_{ii}^{(n)} > 0$). Aperiodic: $d = 1$.

Stationary Distribution

$\pi$ satisfies $\pi = \pi P$ and $\sum_i \pi_i = 1$

For an irreducible chain: the stationary distribution exists and is unique iff the chain is positive recurrent.

Long-run behavior (irreducible, aperiodic, positive recurrent): $\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$

Convergence

Ergodic theorem: for a positive recurrent, aperiodic chain:

$\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} \mathbb{I}(X_k = j) = \pi_j \text{ a.s.}$
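
A sketch with NumPy that computes the stationary distribution of a small hypothetical transition matrix by solving $\pi = \pi P$ and checks the long-run behavior $P_{ij}^{(n)} \to \pi_j$.

```python
import numpy as np

# A hypothetical 3-state transition matrix (rows sum to 1, irreducible, aperiodic).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# Solve π = πP with Σπ_i = 1: π is a left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()
print(pi)

# Long-run behavior: every row of P^n approaches π.
print(np.linalg.matrix_power(P, 50))
```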


Poisson Processes

Definition

Counting process {N(t), t ≥ 0}:

  • $N(0) = 0$
  • Independent increments
  • $N(t) - N(s) \sim \text{Poisson}(\lambda(t-s))$ for $0 \leq s < t$

Rate $\lambda$: expected number of arrivals per unit time

Properties

Number of arrivals in $[0,t]$: $N(t) \sim \text{Poisson}(\lambda t)$

$P(N(t) = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$

Inter-arrival times: $T_1, T_2, \dots$ i.i.d. $\text{Exponential}(\lambda)$

Arrival times: $S_n = \sum_{i=1}^n T_i \sim \text{Gamma}(n, \lambda)$

$f_{S_n}(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}$
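
A simulation sketch (assuming NumPy) that builds the process from i.i.d. Exponential($\lambda$) inter-arrival times and checks that $N(t)$ matches the Poisson($\lambda t$) count distribution; $\lambda = 3$ and $t = 2$ are arbitrary choices.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(0)
lam, t, reps = 3.0, 2.0, 100_000

# Each path: arrival times S_n = T_1 + ... + T_n, and N(t) = #{n : S_n ≤ t}.
# 40 inter-arrival draws per path is far more than enough here since λt = 6.
gaps = rng.exponential(scale=1 / lam, size=(reps, 40))
arrival_times = np.cumsum(gaps, axis=1)
counts = (arrival_times <= t).sum(axis=1)

# N(t) should be Poisson(λt): mean = variance = λt = 6
print(counts.mean(), counts.var())

# P(N(t) = 5): simulated frequency vs the pmf (λt)^5 e^{-λt} / 5!
print((counts == 5).mean(), (lam * t) ** 5 * exp(-lam * t) / factorial(5))
```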

Compound Poisson

$Y(t) = \sum_{i=1}^{N(t)} Y_i$

where the $Y_i$ are i.i.d. and independent of $N(t)$


Martingales

Definition

A sequence $\{M_n\}$ is a martingale if:

  • $E[|M_n|] < \infty$
  • $E[M_{n+1} | M_n, \dots, M_1] = M_n$

Doob’s Optional Stopping Theorem

If $\tau$ is a stopping time, then under suitable conditions:

$E[M_\tau] = E[M_0]$

Conditions: for example, $\tau$ bounded, or the martingale bounded and $\tau$ finite almost surely.

Applications

  • Fair games (gambling)
  • Random walks
  • Ruin probabilities (see the sketch below)
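
A small gambler's-ruin simulation tying these pieces together, sketched with NumPy: a symmetric $\pm 1$ random walk started at 3 and stopped on hitting 0 or 10 is a bounded martingale with an a.s. finite stopping time, so optional stopping gives $E[M_\tau] = 3$ and a success probability of $3/10$. The starting point and barriers are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
start, target, reps = 3, 10, 20_000

def run_to_absorption(rng):
    """Symmetric ±1 random walk from `start`, stopped on hitting 0 or `target`."""
    m = start
    while 0 < m < target:
        m += 1 if rng.random() < 0.5 else -1
    return m

finals = np.array([run_to_absorption(rng) for _ in range(reps)])
print(finals.mean())                 # ≈ 3 = E[M_τ] = E[M_0]
print((finals == target).mean())     # ≈ 0.3 = P(reach 10 before 0)
```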