Probability

Concept

Probability Theory - Random Variables and Stochastic Processes

Table of Contents

  1. Probability Spaces
  2. Conditional Probability
  3. Random Variables
  4. Expectation and Variance
  5. Moment Generating Functions
  6. Multivariate Distributions
  7. Convergence
  8. Markov Chains
  9. Poisson Processes
  10. Martingales

Probability Spaces

Sample Space and Events

**Sample space** $\Omega$: set of all possible outcomes

**Event**: subset of $\Omega$

**$\sigma$-algebra** $\mathcal{F}$: collection of events satisfying:

  • $\Omega \in \mathcal{F}$
  • If $A \in \mathcal{F}$, then $A' \in \mathcal{F}$ (closed under complements)
  • If $A_1, A_2, \dots \in \mathcal{F}$, then $\cup_i A_i \in \mathcal{F}$ (closed under countable unions)

Probability Measure

**$P: \mathcal{F} \to [0,1]$** satisfying:

  • $P(\Omega) = 1$
  • Countable additivity: if $A_1, A_2, \dots$ are disjoint, then $P(\cup_i A_i) = \sum_i P(A_i)$

Axioms:

  1. $P(A) \geq 0$
  2. $P(\Omega) = 1$
  3. $P(\cup_i A_i) = \sum_i P(A_i)$ for disjoint events

Basic Properties

  • $P(\emptyset) = 0$
  • $P(A') = 1 - P(A)$
  • If $A \subseteq B$, then $P(A) \leq P(B)$
  • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
  • $P(A \cap B) \leq \min(P(A), P(B))$
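These identities can be verified by exhaustive enumeration on a small sample space. A minimal sketch, where the two dice events `A` and `B` are arbitrary choices for illustration:

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair dice, uniform measure.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a set of outcomes)."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] + w[1] == 7}   # the dice sum to 7
B = {w for w in omega if w[0] == 6}          # first die shows 6

# Inclusion-exclusion: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)

# Complement rule: P(A') = 1 - P(A)
comp = prob(set(omega) - A)
print(lhs, rhs, comp)
```

Using exact `Fraction` arithmetic avoids any floating-point ambiguity in the comparison.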

De Morgan’s Laws

  • $P((A \cup B)') = P(A' \cap B')$
  • $P((A \cap B)') = P(A' \cup B')$

Conditional Probability

Definition

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

provided $P(B) > 0$.

Bayes’ Theorem

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Partition version: if $A_1, \dots, A_k$ partition $\Omega$:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^k P(B \mid A_j)\,P(A_j)}$$
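The partition version can be computed directly with exact arithmetic. A small sketch with a two-hypothesis partition; all the numbers (prior 1%, likelihoods 95% and 5%) are made up for illustration:

```python
from fractions import Fraction

# Hypothetical partition: hypothesis A1 vs. A2 (e.g. condition present/absent).
prior = {"A1": Fraction(1, 100), "A2": Fraction(99, 100)}
# P(B | Ai): probability of observing evidence B under each hypothesis (assumed).
likelihood = {"A1": Fraction(95, 100), "A2": Fraction(5, 100)}

def posterior(i):
    # Bayes' theorem, partition version: normalize over all hypotheses.
    total = sum(likelihood[j] * prior[j] for j in prior)
    return likelihood[i] * prior[i] / total

print(posterior("A1"))  # 19/118, roughly 0.161
```

Note how a 95%-accurate observation still yields a small posterior when the prior is only 1% — the normalizing sum in the denominator is dominated by the other hypothesis.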

Independence

Two events: $P(A \cap B) = P(A)P(B)$

**Mutual independence**: $P\left(\bigcap_{i \in S} A_i\right) = \prod_{i \in S} P(A_i)$ for every finite subset $S$ of indices

**Pairwise independence**: $P(A_i \cap A_j) = P(A_i)P(A_j)$ for all $i \neq j$

Note: pairwise independence does not imply mutual independence.
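The note above has a classic witness: flip two fair coins and let $A$ = "first is heads", $B$ = "second is heads", $C$ = "the flips agree". A sketch checking it exactly:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))  # two fair coin flips

def prob(event):
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == "H"}   # first flip heads
B = {w for w in omega if w[1] == "H"}   # second flip heads
C = {w for w in omega if w[0] == w[1]}  # the two flips agree

# Every pair factorizes...
pairwise = all(prob(X & Y) == prob(X) * prob(Y)
               for X, Y in [(A, B), (A, C), (B, C)])
# ...but the triple does not: P(A ∩ B ∩ C) = 1/4 while P(A)P(B)P(C) = 1/8.
mutual = prob(A & B & C) == prob(A) * prob(B) * prob(C)
print(pairwise, mutual)
```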


Random Variables

Definition

**Random variable** $X$: function $X: \Omega \to \mathbb{R}$

**Measurable**: $\{\omega : X(\omega) \leq x\} \in \mathcal{F}$ for all $x$

Cumulative Distribution Function

**CDF**: $F(x) = P(X \leq x)$

Properties:

  • Non-decreasing
  • Right-continuous
  • $\lim_{x \to -\infty} F(x) = 0$
  • $\lim_{x \to \infty} F(x) = 1$

Probability Mass Function (Discrete)

**PMF**: $p(x) = P(X = x)$

Properties: $p(x) \geq 0$, $\sum_x p(x) = 1$

Probability Density Function (Continuous)

**PDF**: $f(x)$ such that $P(a \leq X \leq b) = \int_a^b f(x)\,dx$

Properties:

  • $f(x) \geq 0$
  • $\int_{-\infty}^{\infty} f(x)\,dx = 1$
  • $F(x) = \int_{-\infty}^x f(t)\,dt$
  • $f(x) = F'(x)$ (wherever the derivative exists)

Transformations

Discrete: if $Y = g(X)$, then $p_Y(y) = \sum_{x:\, g(x)=y} p_X(x)$

Continuous: if $Y = g(X)$ with $g$ strictly monotonic:

$$f_Y(y) = f_X(g^{-1}(y)) \left|\frac{d}{dy}g^{-1}(y)\right|$$
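The change-of-variables formula can be checked numerically. A sketch with $X \sim \text{Exponential}(1)$ and the strictly increasing map $g(x) = x^2$ on $(0, \infty)$; the interval $[1, 4]$ is an arbitrary choice:

```python
import math

# X ~ Exponential(1); Y = g(X) = X^2 with g strictly increasing on (0, ∞).
f_X = lambda x: math.exp(-x)                # density of X
g_inv = lambda y: math.sqrt(y)              # g^{-1}(y)
dg_inv = lambda y: 1 / (2 * math.sqrt(y))   # d/dy g^{-1}(y)

def f_Y(y):
    # Change-of-variables formula for the density of Y.
    return f_X(g_inv(y)) * abs(dg_inv(y))

def integrate(f, a, b, n=100_000):
    # Simple midpoint-rule quadrature.
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

# P(a ≤ Y ≤ b) via f_Y should equal P(√a ≤ X ≤ √b) = e^{-√a} - e^{-√b}.
a, b = 1.0, 4.0
lhs = integrate(f_Y, a, b)
rhs = math.exp(-math.sqrt(a)) - math.exp(-math.sqrt(b))
print(lhs, rhs)
```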


Expectation and Variance

Expected Value

Discrete: $E[X] = \sum_x x\,p(x)$

Continuous: $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$

Properties:

  • $E[c] = c$ for a constant $c$
  • $E[cX] = c\,E[X]$
  • $E[X + Y] = E[X] + E[Y]$
  • $E[XY] = E[X]E[Y]$ if $X$ and $Y$ are independent

Variance

**$\text{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2$**

Standard deviation: $\sigma = \sqrt{\text{Var}(X)}$

Properties:

  • $\text{Var}(c) = 0$
  • $\text{Var}(cX) = c^2\,\text{Var}(X)$
  • $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$ if $X$ and $Y$ are independent

Bound: $\text{Var}(X) \geq 0$, with equality if and only if $X$ is almost surely constant

Covariance

**$\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$**

Properties:

  • $\text{Cov}(X,X) = \text{Var}(X)$
  • $\text{Cov}(X,Y) = \text{Cov}(Y,X)$
  • $\text{Cov}(aX + b, cY + d) = ac\,\text{Cov}(X,Y)$
  • $\text{Cov}(X + Y, Z) = \text{Cov}(X,Z) + \text{Cov}(Y,Z)$

Correlation

**$\rho(X,Y) = \text{Cov}(X,Y)/(\sigma_X \sigma_Y)$**

  • $|\rho| \leq 1$ (Cauchy–Schwarz)
  • $\rho = \pm 1 \iff Y = aX + b$ for some constants $a \neq 0$ and $b$
  • $\rho = 0$: $X$ and $Y$ are uncorrelated (equivalently, $\text{Cov}(X,Y) = 0$)

Note: uncorrelated does not imply independent.
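A standard witness for the note above: take $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$. Then $\text{Cov}(X, Y) = 0$, yet $Y$ is a function of $X$. A sketch with exact arithmetic:

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X^2.
support = [-1, 0, 1]
p = Fraction(1, 3)

E_X  = sum(x * p for x in support)         # 0
E_Y  = sum(x**2 * p for x in support)      # 2/3
E_XY = sum(x * x**2 * p for x in support)  # E[X^3] = 0
cov = E_XY - E_X * E_Y                     # 0 → uncorrelated

# Dependence: P(X = 0, Y = 0) = 1/3, but P(X = 0) P(Y = 0) = 1/9.
p_joint = p        # X = 0 forces Y = 0
p_prod = p * p     # product of the marginals
print(cov, p_joint, p_prod)
```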


Moment Generating Functions

Definition

**MGF**: $M_X(t) = E[e^{tX}]$

Properties:

  • $M_X(0) = 1$
  • $M_X^{(n)}(0) = E[X^n]$, the $n$-th moment
  • $M_{aX+b}(t) = e^{bt} M_X(at)$
  • If $X, Y$ are independent: $M_{X+Y}(t) = M_X(t)\,M_Y(t)$
  • Uniqueness: if the MGF is finite in a neighborhood of 0, it determines the distribution

Existence: the MGF may fail to exist for some distributions (e.g. the Cauchy distribution).

Characteristic Function

**$\phi_X(t) = E[e^{itX}]$** (always exists)

Properties are similar to the MGF's, but use the complex exponential.

Common MGFs

  • Binomial$(n, p)$: $M(t) = (pe^t + q)^n$, where $q = 1 - p$
  • Poisson$(\lambda)$: $M(t) = e^{\lambda(e^t - 1)}$
  • Exponential$(\lambda)$: $M(t) = \lambda/(\lambda - t)$ for $t < \lambda$
  • Normal$(\mu, \sigma^2)$: $M(t) = e^{\mu t + \sigma^2 t^2/2}$
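The exponential MGF above can be sanity-checked by Monte Carlo estimation of $E[e^{tX}]$. A sketch; the rate $\lambda = 2$ and evaluation point $t = 0.5$ are arbitrary choices (with $t < \lambda$ so the MGF is finite):

```python
import math
import random

random.seed(0)
lam, t = 2.0, 0.5          # rate λ = 2; evaluate at t = 0.5 < λ
n = 200_000
xs = [random.expovariate(lam) for _ in range(n)]

mgf_mc = sum(math.exp(t * x) for x in xs) / n   # Monte Carlo E[e^{tX}]
mgf_exact = lam / (lam - t)                     # closed form: λ/(λ - t) = 4/3
mean_mc = sum(xs) / n                           # should be near E[X] = 1/λ
print(mgf_mc, mgf_exact, mean_mc)
```

Both estimates should land close to their closed forms; the gap shrinks at the usual $O(1/\sqrt{n})$ Monte Carlo rate.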

Multivariate Distributions

Joint Distribution

Joint CDF: $F(x,y) = P(X \leq x, Y \leq y)$

Joint PMF: $p(x,y) = P(X = x, Y = y)$

Joint PDF: $f(x,y)$ such that $P((X,Y) \in A) = \iint_A f(x,y)\,dx\,dy$

Marginal Distributions

Discrete: $p_X(x) = \sum_y p(x,y)$ and $p_Y(y) = \sum_x p(x,y)$

Continuous: $f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f(x,y)\,dx$

Independence

**$X$ and $Y$ are independent $\iff p(x,y) = p_X(x)\,p_Y(y)$** (discrete) or $f(x,y) = f_X(x)\,f_Y(y)$ (continuous)

If independent, then:

  • $E[XY] = E[X]\,E[Y]$
  • $\text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y)$
  • $M_{X+Y}(t) = M_X(t)\,M_Y(t)$

Conditional Distributions

Discrete: $p_{Y|X}(y|x) = \frac{p(x,y)}{p_X(x)}$

Continuous: $f_{Y|X}(y|x) = \frac{f(x,y)}{f_X(x)}$

Conditional Expectation

**$E[Y \mid X = x] = \sum_y y\,p_{Y|X}(y|x)$** (discrete)

**$E[Y \mid X = x] = \int y\,f_{Y|X}(y|x)\,dy$** (continuous)

**$E[Y \mid X]$** is a function of $X$, and hence itself a random variable

Properties:

  • $E[E[Y \mid X]] = E[Y]$
  • If $X, Y$ are independent: $E[Y \mid X] = E[Y]$
  • $E[Y \mid X, Z]$ is a function of both $X$ and $Z$

Law of Total Expectation (Tower Property)

$$E[Y] = E[E[Y \mid X]]$$

Law of Total Variance

**$\text{Var}(Y) = E[\text{Var}(Y \mid X)] + \text{Var}(E[Y \mid X])$**
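The law of total variance can be verified exactly on a small hierarchical model. A sketch where $X \sim \text{Bernoulli}(1/2)$ and the conditional means and variances of $Y$ are arbitrary illustrative numbers:

```python
from fractions import Fraction

# X ~ Bernoulli(1/2); given X = 0, Y has mean 0 and variance 1;
# given X = 1, Y has mean 2 and variance 4 (numbers chosen for illustration).
p = Fraction(1, 2)
cond_mean = {0: Fraction(0), 1: Fraction(2)}
cond_var  = {0: Fraction(1), 1: Fraction(4)}

E_var = sum(p * cond_var[x] for x in (0, 1))    # E[Var(Y|X)] = 5/2
E_mean = sum(p * cond_mean[x] for x in (0, 1))  # E[E[Y|X]] = E[Y] = 1 (tower)
var_mean = sum(p * (cond_mean[x] - E_mean) ** 2
               for x in (0, 1))                 # Var(E[Y|X]) = 1

total_var = E_var + var_mean                    # Var(Y) = 5/2 + 1 = 7/2
print(total_var)
```

The within-group term $E[\text{Var}(Y \mid X)]$ and the between-group term $\text{Var}(E[Y \mid X])$ decompose $\text{Var}(Y)$, the same split used in ANOVA.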


Convergence

Almost Sure Convergence

$$X_n \to X \text{ a.s.}$$

if $P(\lim_{n \to \infty} X_n = X) = 1$

Convergence in Probability

$$X_n \to_p X$$

if for every $\epsilon > 0$: $P(|X_n - X| > \epsilon) \to 0$ as $n \to \infty$

Convergence in Distribution (Weak)

$$X_n \to_d X$$

if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ at every continuity point of $F_X$

Equivalent characterization: $E[g(X_n)] \to E[g(X)]$ for all bounded continuous $g$

Central Limit Theorem

If $X_1, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2 < \infty$:

$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to_d N(0,1)$$
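The CLT is easy to watch in simulation: standardize the mean of i.i.d. Exponential(1) draws (which have $\mu = \sigma = 1$) and check that roughly 95% of the standardized values fall in $(-1.96, 1.96)$, as for a standard normal. A sketch; the sample size $n = 100$ and 2000 replications are arbitrary:

```python
import math
import random

random.seed(1)
n, reps = 100, 2000  # per-sample size and number of replications

def standardized_mean():
    # √n (X̄_n - μ) / σ with μ = σ = 1 for Exponential(1).
    xs = [random.expovariate(1.0) for _ in range(n)]
    xbar = sum(xs) / n
    return math.sqrt(n) * (xbar - 1.0)

zs = [standardized_mean() for _ in range(reps)]
coverage = sum(abs(z) < 1.96 for z in zs) / reps  # should be near 0.95
print(coverage)
```

Despite the strong skew of the exponential distribution, the standardized mean is already close to normal at $n = 100$.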

Slutsky’s Theorem

If $X_n \to_d X$ and $Y_n \to_p c$ (a constant):

  • $X_n + Y_n \to_d X + c$
  • $X_n Y_n \to_d cX$
  • $X_n/Y_n \to_d X/c$ (if $c \neq 0$)

Markov Chains

Definition

**Markov property**: $P(X_{n+1} = j \mid X_n = i, \dots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$

**Homogeneous**: transition probabilities do not depend on the time $n$

**Transition matrix** $P$: $(P)_{ij} = P_{ij} = P(X_{n+1} = j \mid X_n = i)$

Properties: rows sum to 1, all entries $\geq 0$

Classification of States

**Transient**: $P(\text{ever return}) < 1$

**Recurrent**: $P(\text{ever return}) = 1$

  • Positive recurrent: mean return time $E[T_i] < \infty$
  • Null recurrent: $E[T_i] = \infty$

**Period**: $d$ = greatest common divisor of the possible return times; a state is **periodic** if $d > 1$ and **aperiodic** if $d = 1$

Stationary Distribution

**$\pi$ satisfies:** $\pi = \pi P$ and $\sum_i \pi_i = 1$

For an irreducible chain: a stationary distribution exists and is unique if and only if the chain is positive recurrent

Long-run behavior (irreducible, aperiodic, positive recurrent): $\lim_{n \to \infty} P_{ij}^{(n)} = \pi_j$

Convergence

**Ergodic theorem**: for an irreducible, positive recurrent chain:

$$\lim_{n \to \infty} \frac{1}{n}\sum_{k=0}^{n-1} \mathbb{I}(X_k = j) = \pi_j \text{ a.s.}$$
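The ergodic theorem can be demonstrated on a two-state chain, where $\pi = \pi P$ has the closed-form solution $\pi_0 = P_{10}/(P_{01} + P_{10})$. A sketch with an arbitrary illustrative transition matrix:

```python
import random

random.seed(2)
# Two-state transition matrix (rows sum to 1; values chosen for illustration).
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Solve π = πP for two states: π_0 = P[1][0] / (P[0][1] + P[1][0]).
pi0 = P[1][0] / (P[0][1] + P[1][0])   # 0.4 / 0.5 = 0.8
pi = [pi0, 1 - pi0]

# Ergodic theorem: long-run fraction of time in each state converges to π.
steps, state = 200_000, 0
visits = [0, 0]
for _ in range(steps):
    visits[state] += 1
    state = 0 if random.random() < P[state][0] else 1

freq = [v / steps for v in visits]
print(pi, freq)
```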


Poisson Processes

Definition

**Counting process** $\{N(t),\ t \geq 0\}$ satisfying:

  • $N(0) = 0$
  • Independent increments
  • $N(t) - N(s) \sim \text{Poisson}(\lambda(t-s))$ for $s < t$

**Rate** $\lambda$: expected number of arrivals per unit time

Properties

Number of arrivals in $[0,t]$: $N(t) \sim \text{Poisson}(\lambda t)$

$$P(N(t) = n) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$$

**Inter-arrival times**: $T_1, T_2, \dots$ are i.i.d. Exponential($\lambda$)

**Arrival times**: $S_n = \sum_{i=1}^n T_i \sim \text{Gamma}(n, \lambda)$, with density

$$f_{S_n}(t) = \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!}$$
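The exponential inter-arrival characterization gives a direct way to simulate the process: accumulate Exponential($\lambda$) gaps and count arrivals before $t$. The count should then be Poisson($\lambda t$), with mean and variance both $\lambda t$. A sketch with arbitrary $\lambda = 2$, $t = 10$:

```python
import random

random.seed(3)
lam, t_max, reps = 2.0, 10.0, 5000  # rate, horizon, replications

def count_arrivals():
    # Build one path from i.i.d. Exponential(λ) inter-arrival times.
    t, n = 0.0, 0
    while True:
        t += random.expovariate(lam)
        if t > t_max:
            return n
        n += 1

counts = [count_arrivals() for _ in range(reps)]
mean_N = sum(counts) / reps
var_N = sum((c - mean_N) ** 2 for c in counts) / reps

# N(t) ~ Poisson(λt): mean and variance should both be near λt = 20.
print(mean_N, var_N)
```

The equality of mean and variance (both $\lambda t$) is a quick diagnostic for Poisson-distributed counts.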

Compound Poisson

**$Y(t) = \sum_{i=1}^{N(t)} Y_i$**

where the $Y_i$ are i.i.d. and independent of $N(t)$


Martingales

Definition

A sequence $\{M_n\}$ is a martingale if:

  • $E[|M_n|] < \infty$
  • $E[M_{n+1} \mid M_n, \dots, M_1] = M_n$

Doob’s Optional Stopping Theorem

If $\tau$ is a stopping time, then under suitable conditions:

$$E[M_\tau] = E[M_0]$$

Sufficient conditions: $\tau$ is bounded, or the martingale is bounded and $\tau$ is finite a.s., among others.
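Optional stopping gives the classic gambler's-ruin probability: a symmetric random walk $M_n$ started at 0 is a martingale, and stopping at the first hit of $-a$ or $+b$ yields $E[M_\tau] = 0$, hence $P(\text{hit } +b) = a/(a+b)$. A simulation sketch with arbitrary barriers $a = 3$, $b = 5$:

```python
import random

random.seed(4)
a, b, reps = 3, 5, 20_000  # barriers at -a and +b; number of simulated games

def run():
    # Symmetric ±1 random walk until it exits (-a, b).
    m = 0
    while -a < m < b:
        m += random.choice((-1, 1))
    return m

# E[M_τ] = 0 = p·b + (1 - p)·(-a)  ⇒  p = a/(a + b).
hits_b = sum(run() == b for _ in range(reps)) / reps
print(hits_b, a / (a + b))  # empirical vs. theoretical 3/8
```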

Applications

  • Fair games (gambling)
  • Random walks
  • Ruin probabilities