Probability


Probability Theory - Random Variables and Stochastic Processes

Table of Contents

  1. Probability Spaces
  2. Conditional Probability
  3. Random Variables
  4. Expectation and Variance
  5. Moment Generating Functions
  6. Multivariate Distributions
  7. Convergence
  8. Markov Chains
  9. Poisson Processes
  10. Martingales

Probability Spaces

Sample Space and Events

Sample space $\Omega$: the set of all possible outcomes

Event: a subset of $\Omega$

$\sigma$-algebra $\mathcal{F}$: a collection of events satisfying:

  • $\Omega \in \mathcal{F}$
  • If $A \in \mathcal{F}$, then $A' \in \mathcal{F}$ (closed under complements)
  • If $A_1, A_2, \dots \in \mathcal{F}$, then $\bigcup_i A_i \in \mathcal{F}$ (closed under countable unions)

Probability Measure

$P: \mathcal{F} \to [0,1]$ satisfying:

  • $P(\Omega) = 1$
  • Countable additivity: if $A_1, A_2, \dots$ are disjoint, then $P(\bigcup_i A_i) = \sum_i P(A_i)$

Axioms:

  1. $P(A) \geq 0$
  2. $P(\Omega) = 1$
  3. $P(\bigcup_i A_i) = \sum_i P(A_i)$ for disjoint events

Basic Properties

  • $P(\emptyset) = 0$
  • $P(A') = 1 - P(A)$
  • If $A \subseteq B$, then $P(A) \leq P(B)$
  • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ (inclusion-exclusion; see the quick check below)
  • $P(A \cap B) \leq \min(P(A), P(B))$
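
A quick sanity check of inclusion-exclusion by direct enumeration. The two-dice sample space and the events $A$, $B$ below are hypothetical choices for illustration, not taken from the notes above.

```python
import itertools
from fractions import Fraction

# Sample space: all ordered outcomes of rolling two fair dice.
omega = list(itertools.product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] % 2 == 0}      # first die is even
B = {w for w in omega if w[0] + w[1] == 7}   # the two dice sum to 7

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
print(prob(A), prob(B), prob(A & B), prob(A | B))   # 1/2 1/6 1/12 7/12
```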

De Morgan’s Laws

  • $P((A \cup B)') = P(A' \cap B')$
  • $P((A \cap B)') = P(A' \cup B')$

Conditional Probability

Definition

$P(A | B) = \frac{P(A \cap B)}{P(B)}$

provided $P(B) > 0$.

Bayes’ Theorem

$P(A | B) = \frac{P(B | A) P(A)}{P(B)}$

Partition version: if $A_1, \dots, A_k$ partition $\Omega$:

$P(A_i | B) = \frac{P(B | A_i) P(A_i)}{\sum_{j=1}^k P(B | A_j) P(A_j)}$
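
A minimal numerical sketch of the partition form of Bayes' theorem. The prevalence, sensitivity, and false-positive numbers are hypothetical, chosen only to illustrate the computation.

```python
# Hypothetical numbers: a condition with prevalence 1%,
# test sensitivity P(+|D) = 0.95, false-positive rate P(+|not D) = 0.05.
priors = {"D": 0.01, "not D": 0.99}        # the partition A_1, A_2 of Ω
likelihood = {"D": 0.95, "not D": 0.05}    # P(B | A_i) with B = "test positive"

# Denominator of Bayes' theorem: P(B) = Σ_j P(B | A_j) P(A_j)
p_b = sum(likelihood[a] * priors[a] for a in priors)

# Posterior P(A_i | B) for each element of the partition
posterior = {a: likelihood[a] * priors[a] / p_b for a in priors}
print(posterior)   # P(D | +) ≈ 0.161: most positives are false positives here
```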

Independence

Two events: $P(A \cap B) = P(A)P(B)$

Mutual independence: $P\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} P(A_i)$ for every subset $S \subseteq \{1, \dots, n\}$

Pairwise independence: $P(A_i \cap A_j) = P(A_i)P(A_j)$ for all $i \neq j$

Note: pairwise independence does not imply mutual independence (see the example below).
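
The classic counterexample, checked by enumeration: for two fair coin flips, the events "first flip is heads", "second flip is heads", and "the flips agree" are pairwise independent but not mutually independent.

```python
import itertools
from fractions import Fraction

omega = list(itertools.product("HT", repeat=2))   # two fair coin flips
prob = lambda ev: Fraction(len(ev), len(omega))

A = {w for w in omega if w[0] == "H"}             # first flip heads
B = {w for w in omega if w[1] == "H"}             # second flip heads
C = {w for w in omega if w[0] == w[1]}            # the flips agree

# Pairwise independence holds:
assert prob(A & B) == prob(A) * prob(B)
assert prob(A & C) == prob(A) * prob(C)
assert prob(B & C) == prob(B) * prob(C)

# ...but mutual independence fails:
print(prob(A & B & C), prob(A) * prob(B) * prob(C))   # 1/4 vs 1/8
```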


Random Variables

Definition

Random variable $X$: a function $X: \Omega \to \mathbb{R}$

Measurability: $\{\omega : X(\omega) \leq x\} \in \mathcal{F}$ for all $x$

Cumulative Distribution Function

CDF: $F(x) = P(X \leq x)$

Properties:

  • Non-decreasing
  • Right-continuous
  • $\lim_{x \to -\infty} F(x) = 0$
  • $\lim_{x \to \infty} F(x) = 1$

Probability Mass Function (Discrete)

PMF: $p(x) = P(X = x)$

Properties: $p(x) \geq 0$ and $\sum_x p(x) = 1$

Probability Density Function (Continuous)

PDF: $f(x)$ such that $P(a \leq X \leq b) = \int_a^b f(x)\,dx$

Properties:

  • $f(x) \geq 0$
  • $\int_{-\infty}^{\infty} f(x)\,dx = 1$
  • $F(x) = \int_{-\infty}^x f(t)\,dt$
  • $f(x) = F'(x)$ (where the derivative exists)

Transformations

Discrete: if $Y = g(X)$, then $p_Y(y) = \sum_{x: g(x)=y} p_X(x)$

Continuous: if $Y = g(X)$ with $g$ strictly monotonic:

$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}g^{-1}(y)\right|$
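
A Monte Carlo check of the change-of-variables formula, sketched with NumPy: take $X \sim \text{Exponential}(1)$ and $Y = X^2$, for which the formula gives $f_Y(y) = e^{-\sqrt{y}}/(2\sqrt{y})$.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)   # X ~ Exponential(1)
y = x ** 2                                       # Y = g(X), g(x) = x², strictly increasing on [0, ∞)

# Change of variables: f_Y(y) = f_X(√y) · |d√y/dy| = e^{-√y} / (2√y)
f_y = lambda t: np.exp(-np.sqrt(t)) / (2 * np.sqrt(t))

# Empirical density of Y on a few bins vs the formula at the bin centers.
edges = np.linspace(0.5, 4.0, 8)
counts, _ = np.histogram(y, bins=edges)
empirical = counts / (len(y) * np.diff(edges))
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.round(empirical, 3))
print(np.round(f_y(centers), 3))   # should roughly match the row above
```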


Expectation and Variance

Expected Value

Discrete: $E[X] = \sum_x x \cdot p(x)$

Continuous: $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$

Properties:

  • $E[c] = c$ for a constant $c$
  • $E[cX] = c E[X]$
  • $E[X + Y] = E[X] + E[Y]$
  • $E[XY] = E[X]E[Y]$ if $X$ and $Y$ are independent

Variance

$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$

Standard deviation: $\sigma = \sqrt{\text{Var}(X)}$

Properties:

  • $\text{Var}(c) = 0$
  • $\text{Var}(cX) = c^2 \text{Var}(X)$
  • $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ if $X$ and $Y$ are independent

Bound: $\text{Var}(X) \geq 0$, with equality $\iff X$ is constant

Covariance

$\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$

Properties:

  • $\text{Cov}(X,X) = \text{Var}(X)$
  • $\text{Cov}(X,Y) = \text{Cov}(Y,X)$
  • $\text{Cov}(aX + b, cY + d) = ac\,\text{Cov}(X,Y)$
  • $\text{Cov}(X + Y, Z) = \text{Cov}(X,Z) + \text{Cov}(Y,Z)$

Correlation

$\rho(X,Y) = \text{Cov}(X,Y)/(\sigma_X \sigma_Y)$

  • $|\rho| \leq 1$ (Cauchy-Schwarz)
  • $\rho = \pm 1 \iff Y = aX + b$ for some $a \neq 0$
  • $\rho = 0$: uncorrelated (equivalent to $\text{Cov}(X,Y) = 0$)

Note: uncorrelated does not imply independent (see the example below).
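
A small simulation illustrating the note above, sketched with NumPy: with $X \sim N(0,1)$ and $Y = X^2$, the pair is uncorrelated ($\text{Cov}(X, X^2) = E[X^3] = 0$) yet completely dependent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2                       # Y is a deterministic function of X, so clearly dependent

# Sample correlation is ≈ 0 even though X and Y are far from independent:
print(np.corrcoef(x, y)[0, 1])              # ≈ 0  (Cov(X, X²) = E[X³] = 0 for X ~ N(0,1))

# Dependence shows up immediately through conditioning, e.g. on |X| > 1:
print(y[np.abs(x) > 1].mean(), y.mean())    # conditional mean of Y ≠ unconditional mean
```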


Moment Generating Functions

Definition

MGF: $M_X(t) = E[e^{tX}]$

Properties:

  • $M_X(0) = 1$
  • $M_X^{(n)}(0) = E[X^n]$, the $n$-th moment
  • $M_{aX+b}(t) = e^{bt} M_X(at)$
  • If $X$ and $Y$ are independent: $M_{X+Y}(t) = M_X(t) M_Y(t)$
  • Uniqueness: if the MGF exists in a neighborhood of 0, it determines the distribution

Existence: the MGF may fail to be finite for some distributions.

Characteristic Function

$\phi_X(t) = E[e^{itX}]$ (always exists)

Properties similar to MGF but uses complex exponential.

Common MGFs

  • Binomial$(n, p)$: $M(t) = (pe^t + q)^n$, where $q = 1 - p$
  • Poisson$(\lambda)$: $M(t) = e^{\lambda(e^t - 1)}$
  • Exponential$(\lambda)$: $M(t) = \lambda/(\lambda - t)$ for $t < \lambda$ (checked numerically below)
  • Normal$(\mu, \sigma^2)$: $M(t) = e^{\mu t + \sigma^2 t^2/2}$
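
A Monte Carlo sanity check of the Exponential MGF and of the moment property $M_X'(0) = E[X]$, sketched with NumPy; the rate $\lambda = 2$ and the point $t = 0.5$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t = 2.0, 0.5                 # Exponential rate λ and a point t < λ
x = rng.exponential(scale=1 / lam, size=1_000_000)

# Monte Carlo estimate of M_X(t) = E[e^{tX}] vs the closed form λ/(λ - t)
print(np.exp(t * x).mean(), lam / (lam - t))       # both ≈ 1.333

# M'_X(0) = E[X]: approximate the derivative at 0 with a central difference
h = 1e-3
m = lambda s: np.exp(s * x).mean()
print((m(h) - m(-h)) / (2 * h), 1 / lam)           # both ≈ 0.5
```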

Multivariate Distributions

Joint Distribution

Joint CDF: $F(x,y) = P(X \leq x, Y \leq y)$

Joint PMF: $p(x,y) = P(X = x, Y = y)$

Joint PDF: $f(x,y)$ such that $P((X,Y) \in A) = \iint_A f(x,y)\, dx\, dy$

Marginal Distributions

Discrete: $p_X(x) = \sum_y p(x,y)$ and $p_Y(y) = \sum_x p(x,y)$

Continuous: $f_X(x) = \int_{-\infty}^{\infty} f(x,y)\, dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\, dx$

Independence

$X$ and $Y$ are independent $\iff p(x,y) = p_X(x) p_Y(y)$ for all $x, y$ (discrete) or $f(x,y) = f_X(x) f_Y(y)$ for all $x, y$ (continuous)

If independent, then:

  • $E[XY] = E[X] E[Y]$
  • $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
  • $M_{X+Y}(t) = M_X(t) M_Y(t)$

Conditional Distributions

Discrete: $p_{Y|X}(y|x) = \frac{p(x,y)}{p_X(x)}$

Continuous: $f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)}$

Conditional Expectation

$E[Y | X = x] = \sum_y y \cdot p_{Y|X}(y|x)$ (discrete)

$E[Y | X = x] = \int y \cdot f_{Y|X}(y|x)\, dy$ (continuous)

$E[Y | X]$ is a function of $X$, hence itself a random variable.

Properties:

  • $E[E[Y | X]] = E[Y]$
  • If $X$ and $Y$ are independent: $E[Y | X] = E[Y]$
  • $E[Y | X, Z]$ is a function of both $X$ and $Z$

Law of Total Expectation (Tower Property)

$E[Y] = E[E[Y | X]]$

Law of Total Variance

$\text{Var}(Y) = E[\text{Var}(Y | X)] + \text{Var}(E[Y | X])$
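
A simulation check of the tower property and the law of total variance, sketched with NumPy for a hypothetical hierarchy $X \sim \text{Exponential}(1)$, $Y | X \sim \text{Poisson}(X)$, where both sides of the variance decomposition equal 2.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x = rng.exponential(scale=1.0, size=n)   # X ~ Exponential(1)
y = rng.poisson(lam=x)                   # Y | X = x  ~  Poisson(x)

# For this hierarchy: E[Y|X] = X and Var(Y|X) = X, so
# Var(Y) = E[Var(Y|X)] + Var(E[Y|X]) = E[X] + Var(X) = 1 + 1 = 2.
print(y.var())                    # ≈ 2
print(x.mean() + x.var())         # E[Var(Y|X)] + Var(E[Y|X]), estimated from the same draws

# Tower property: E[Y] = E[E[Y|X]] = E[X] = 1
print(y.mean(), x.mean())
```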


Convergence

Almost Sure Convergence

$X_n \to X$ a.s.

if $P(\lim_{n \to \infty} X_n = X) = 1$

Convergence in Probability

$X_n \to_p X$

if for every $\epsilon > 0$: $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$

Convergence in Distribution (Weak)

$X_n \to_d X$

if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ at all continuity points of $F_X$

Implies: $E[g(X_n)] \to E[g(X)]$ for all bounded continuous $g$

Central Limit Theorem

If $X_1, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$:

$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to_d N(0,1)$
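
A quick CLT demonstration sketched with NumPy: standardized means of i.i.d. Exponential(1) draws, which are strongly skewed individually, behave approximately like $N(0,1)$ for moderate $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 50_000
# A deliberately non-normal population: Exponential(1), so μ = 1 and σ = 1.
samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0) / 1.0   # √n (X̄ₙ − μ) / σ

# If the CLT applies, z should behave like N(0, 1):
print(z.mean(), z.std())                               # ≈ 0 and ≈ 1
print((z < 1.96).mean())                               # ≈ Φ(1.96) ≈ 0.975
```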

Slutsky’s Theorem

If $X_n \to_d X$ and $Y_n \to_p c$ for a constant $c$:

  • $X_n + Y_n \to_d X + c$
  • $X_n Y_n \to_d cX$
  • $X_n / Y_n \to_d X/c$ (if $c \neq 0$)

Markov Chains

Definition

Markov property: $P(X_{n+1} = j | X_n = i, \dots, X_0 = i_0) = P(X_{n+1} = j | X_n = i)$

Homogeneous: transition probabilities do not depend on $n$

Transition matrix $P$: $(P)_{ij} = P_{ij} = P(X_{n+1} = j | X_n = i)$

Properties: rows sum to 1, all entries $\geq 0$

Classification of States

Transient: $P(\text{ever return}) < 1$. Recurrent: $P(\text{ever return}) = 1$.

  • Positive recurrent: mean return time $E[T_i] < \infty$
  • Null recurrent: $E[T_i] = \infty$

Period: the greatest common divisor $d$ of the possible return times ($d$ divides every $n$ with $P_{ii}^{(n)} > 0$). Aperiodic: $d = 1$.

Stationary Distribution

$\pi$ satisfies $\pi = \pi P$ and $\sum_i \pi_i = 1$

For an irreducible chain: the stationary distribution exists and is unique iff the chain is positive recurrent.

Long-run behavior (irreducible, aperiodic, positive recurrent): $\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$

Convergence

Ergodic theorem: for a positive recurrent, aperiodic chain:

$\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} \mathbb{I}(X_k = j) = \pi_j \text{ a.s.}$
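
A sketch with NumPy that computes the stationary distribution of a small hypothetical transition matrix by solving $\pi = \pi P$ and checks the long-run behavior $P_{ij}^{(n)} \to \pi_j$.

```python
import numpy as np

# A hypothetical 3-state transition matrix (rows sum to 1, irreducible, aperiodic).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# Solve π = πP with Σπ_i = 1: π is a left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()
print(pi)

# Long-run behavior: every row of P^n approaches π.
print(np.linalg.matrix_power(P, 50))
```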


Poisson Processes

Definition

Counting process {N(t), t ≥ 0}:

  • $N(0) = 0$
  • Independent increments
  • $N(t) - N(s) \sim \text{Poisson}(\lambda(t-s))$ for $0 \leq s < t$

Rate $\lambda$: expected number of arrivals per unit time

Properties

Number of arrivals in $[0,t]$: $N(t) \sim \text{Poisson}(\lambda t)$

$P(N(t) = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$

Inter-arrival times: $T_1, T_2, \dots$ i.i.d. $\text{Exponential}(\lambda)$

Arrival times: $S_n = \sum_{i=1}^n T_i \sim \text{Gamma}(n, \lambda)$

$f_{S_n}(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}$
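
A simulation sketch (assuming NumPy) that builds the process from i.i.d. Exponential($\lambda$) inter-arrival times and checks that $N(t)$ matches the Poisson($\lambda t$) count distribution; $\lambda = 3$ and $t = 2$ are arbitrary choices.

```python
import numpy as np
from math import exp, factorial

rng = np.random.default_rng(0)
lam, t, reps = 3.0, 2.0, 100_000

# Each path: arrival times S_n = T_1 + ... + T_n, and N(t) = #{n : S_n ≤ t}.
# 40 inter-arrival draws per path is far more than enough here since λt = 6.
gaps = rng.exponential(scale=1 / lam, size=(reps, 40))
arrival_times = np.cumsum(gaps, axis=1)
counts = (arrival_times <= t).sum(axis=1)

# N(t) should be Poisson(λt): mean = variance = λt = 6
print(counts.mean(), counts.var())

# P(N(t) = 5): simulated frequency vs the pmf (λt)^5 e^{-λt} / 5!
print((counts == 5).mean(), (lam * t) ** 5 * exp(-lam * t) / factorial(5))
```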

Compound Poisson

$Y(t) = \sum_{i=1}^{N(t)} Y_i$

where the $Y_i$ are i.i.d. and independent of $N(t)$


Martingales

Definition

A sequence $\{M_n\}$ is a martingale if:

  • $E[|M_n|] < \infty$
  • $E[M_{n+1} | M_n, \dots, M_1] = M_n$

Doob’s Optional Stopping Theorem

If $\tau$ is a stopping time, then under suitable conditions:

$E[M_\tau] = E[M_0]$

Conditions: for example, $\tau$ bounded, or the martingale bounded and $\tau$ finite almost surely.

Applications

  • Fair games (gambling)
  • Random walks
  • Ruin probabilities (see the sketch below)
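
A small gambler's-ruin simulation tying these pieces together, sketched with NumPy: a symmetric $\pm 1$ random walk started at 3 and stopped on hitting 0 or 10 is a bounded martingale with an a.s. finite stopping time, so optional stopping gives $E[M_\tau] = 3$ and a success probability of $3/10$. The starting point and barriers are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
start, target, reps = 3, 10, 20_000

def run_to_absorption(rng):
    """Symmetric ±1 random walk from `start`, stopped on hitting 0 or `target`."""
    m = start
    while 0 < m < target:
        m += 1 if rng.random() < 0.5 else -1
    return m

finals = np.array([run_to_absorption(rng) for _ in range(reps)])
print(finals.mean())                 # ≈ 3 = E[M_τ] = E[M_0]
print((finals == target).mean())     # ≈ 0.3 = P(reach 10 before 0)
```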