Statistics

Concept

Statistics - Descriptive Statistics, Distributions, and Inference

Table of Contents

  1. Descriptive Statistics
  2. Probability Distributions
  3. Sampling Distributions
  4. Estimation
  5. Hypothesis Testing
  6. Regression
  7. Analysis of Variance
  8. Nonparametric Methods

Descriptive Statistics

Measures of Center

Mean (average): $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$

Median: middle value (50th percentile). If $n$ is odd, the middle value; if $n$ is even, the average of the two middle values.

Mode: Most frequent value

Trimmed mean: mean computed after discarding a fixed fraction of the smallest and largest values

Measures of Spread

Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2$

Standard deviation: $s = \sqrt{s^2}$

Range: $\max - \min$

Interquartile Range (IQR): $Q_3 - Q_1$, where $Q_1$ is the 25th percentile and $Q_3$ the 75th percentile

Outlier rule: values outside $[Q_1 - 1.5 \cdot \text{IQR},\; Q_3 + 1.5 \cdot \text{IQR}]$

Box Plot

  • Whiskers extend to min/max within $1.5 \cdot \text{IQR}$ of the quartiles
  • Box from $Q_1$ to $Q_3$
  • Median line inside box

Standardized Score (z-score)

$z = \frac{x - \bar{x}}{s}$

Chebyshev’s Inequality: at least $1 - 1/k^2$ of the data lies within $k$ standard deviations of the mean.

Empirical Rule (Normal): 68%, 95%, and 99.7% of the data lie within 1, 2, and 3 standard deviations of the mean.
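
A short NumPy sketch of these summaries (the data values are made up for illustration):

```python
import numpy as np

data = np.array([12.0, 15.0, 14.0, 10.0, 18.0, 22.0, 13.0, 16.0, 11.0, 45.0])

mean = data.mean()
median = np.median(data)
s = data.std(ddof=1)                      # sample std dev (n-1 denominator)

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # outlier fences
outliers = data[(data < lo) | (data > hi)]

z_scores = (data - mean) / s              # standardized scores

print(f"mean={mean:.2f} median={median:.2f} s={s:.2f} IQR={iqr:.2f}")
print("outliers:", outliers)
```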


Probability Distributions

Discrete Distributions

Uniform: $P(X = k) = 1/n$ for $k = 1, \dots, n$

Bernoulli: $P(X = 1) = p$, $P(X = 0) = 1 - p$; $E[X] = p$, $\text{Var}(X) = p(1-p)$

Binomial: $X \sim \text{Bin}(n, p)$; $P(X = k) = \binom{n}{k}p^k(1-p)^{n-k}$; $E[X] = np$, $\text{Var}(X) = np(1-p)$

Geometric: number of trials until the first success; $P(X = k) = (1-p)^{k-1}p$; $E[X] = 1/p$, $\text{Var}(X) = (1-p)/p^2$

Negative Binomial: number of trials until $r$ successes; $P(X = k) = \binom{k-1}{r-1}p^r(1-p)^{k-r}$

Hypergeometric: sampling without replacement. With $N$ total objects, $M$ successes, and $n$ draws: $P(X = k) = \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}$

Poisson: parameter $\lambda$; $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$; $E[X] = \text{Var}(X) = \lambda$
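
A quick SciPy sketch of several of these PMFs and moments (parameter values are illustrative; note that SciPy's hypergeometric argument order differs from the notation above):

```python
from scipy import stats

# Binomial: n = 10 trials, success probability p = 0.3
print(stats.binom.pmf(3, n=10, p=0.3))    # P(X = 3)
print(stats.binom.cdf(3, n=10, p=0.3))    # P(X <= 3)
print(stats.binom.mean(n=10, p=0.3))      # np = 3.0
print(stats.binom.var(n=10, p=0.3))       # np(1-p) = 2.1

# Geometric: number of trials until the first success
print(stats.geom.pmf(4, p=0.3))           # (1-p)^3 * p

# Poisson with lambda = 2.5
print(stats.poisson.pmf(0, mu=2.5))       # e^{-2.5}

# Hypergeometric: SciPy's order is (k, N_total, M_successes, n_draws)
print(stats.hypergeom.pmf(2, 50, 10, 5))  # P(X = 2) with N=50, M=10, n=5
```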

Continuous Distributions

Uniform: $f(x) = 1/(b-a)$ for $a \leq x \leq b$; $E[X] = (a+b)/2$, $\text{Var}(X) = (b-a)^2/12$

Exponential: $f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$; $E[X] = 1/\lambda$, $\text{Var}(X) = 1/\lambda^2$. Memoryless property: $P(X > s + t \mid X > s) = P(X > t)$

Normal: $X \sim N(\mu, \sigma^2)$; $f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$; $E[X] = \mu$, $\text{Var}(X) = \sigma^2$. Standard normal: $Z \sim N(0, 1)$

Standardizing: $Z = (X - \mu)/\sigma$

Gamma: $f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}$; $E[X] = \alpha/\lambda$, $\text{Var}(X) = \alpha/\lambda^2$

Chi-square ($\chi^2_\nu$): Gamma with $\alpha = \nu/2$, $\lambda = 1/2$; degrees-of-freedom parameter $\nu$

t-distribution: heavy-tailed, with degrees-of-freedom parameter $\nu$; approaches the normal as $\nu \to \infty$

F-distribution: $F_{\nu_1,\nu_2}$, with two degrees-of-freedom parameters
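
The same distributions are available in SciPy; a brief sketch (illustrative parameters; SciPy parameterizes the exponential and gamma by scale $= 1/\lambda$):

```python
from scipy import stats

# Standard normal: pdf, cdf, and inverse cdf (quantile)
print(stats.norm.pdf(0.0))            # 1/sqrt(2*pi) ~ 0.3989
print(stats.norm.cdf(1.96))           # ~ 0.975
print(stats.norm.ppf(0.975))          # ~ 1.96 (the z_{alpha/2} for 95%)

# N(mu=10, sigma=2): SciPy uses loc=mu, scale=sigma
print(stats.norm.cdf(12, loc=10, scale=2))   # P(X <= 12) = Phi(1)

# Exponential with rate lambda = 0.5: scale = 1/lambda
print(stats.expon.mean(scale=1 / 0.5))       # E[X] = 1/lambda = 2

# Gamma with alpha = 3, lambda = 2: shape a = alpha, scale = 1/lambda
print(stats.gamma.var(a=3, scale=1 / 2))     # alpha/lambda^2 = 0.75
```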

Central Limit Theorem

If $X_1, \dots, X_n$ are i.i.d. with $E[X_i] = \mu$ and $\text{Var}(X_i) = \sigma^2$:

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \to N(0, 1)$

in distribution as $n \to \infty$.

Law of Large Numbers

Sample mean converges to population mean

$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \to \mu$

Almost surely (strong law).
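
A small simulation illustrating both results (die rolls for the LLN, exponential samples for the CLT; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Law of Large Numbers: the mean of die rolls converges to 3.5
rolls = rng.integers(1, 7, size=100_000)
print(rolls[:100].mean(), rolls.mean())   # noisy early, ~3.5 eventually

# Central Limit Theorem: means of n=50 exponential samples look normal
n, reps = 50, 10_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
# Standardize with mu = sigma = 1 for the exponential(1) distribution
z = (means - 1.0) / (1.0 / np.sqrt(n))
print(z.mean(), z.std())                  # ~0 and ~1: approximately N(0,1)
```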


Sampling Distributions

Sample Mean

If $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ i.i.d.:

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

Standardized:

$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$

Sample Variance

$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$

Chi-square Distribution

If $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ i.i.d.:

$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$

t-distribution

$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

Comparison: Student’s t versus normal (use t when $\sigma$ is unknown).

Two Samples

Difference of means:

$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\; \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$

Pooled variance: if $\sigma_1 = \sigma_2$:

$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$

where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}$


Estimation

Point Estimation

Estimator: Function of sample (random variable)

Estimate: Realization for specific sample

Unbiased: $E[\hat{\theta}] = \theta$

Consistent: $\hat{\theta}_n \to \theta$ (in probability)

Efficient: Minimum variance among unbiased estimators

Maximum Likelihood Estimation

Likelihood:

$L(\theta) = \prod_{i=1}^n f(x_i; \theta)$

MLE: $\hat{\theta}$ maximizes $L(\theta)$, or equivalently $\log L(\theta)$

Invariance: if $\hat{\theta}$ is the MLE for $\theta$, then $g(\hat{\theta})$ is the MLE for $g(\theta)$
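
A minimal sketch for exponential data, where the MLE has the closed form $\hat{\lambda} = 1/\bar{x}$; the numerical optimizer should agree (simulated data, illustrative parameters):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.0, size=1_000)   # true rate lambda = 2

# Exponential negative log-likelihood: -n*log(lam) + lam*sum(x)
def nll(lam):
    return -len(x) * np.log(lam) + lam * x.sum()

res = minimize_scalar(nll, bounds=(1e-6, 100), method="bounded")
print(res.x)          # numerical MLE
print(1 / x.mean())   # closed form: lambda_hat = 1 / xbar (they agree)
```

By invariance, the MLE of the mean $1/\lambda$ is then simply $\bar{x}$.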

Method of Moments

Moment estimator: Equate sample moments to population moments

$E[X^k] = \frac{1}{n}\sum_{i=1}^n x_i^k$

Confidence Intervals

Interpretation: in repeated sampling, $C\%$ of intervals constructed this way contain the true parameter

Normal, known variance: $\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$

Normal, unknown variance: $\left[\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}},\; \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right]$

Proportion: $\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$

(for large $n$)

Sample size needed: $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$ (rounded up),

where $E$ is the desired margin of error.
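
A sketch computing the t-interval and a large-sample proportion interval (the data values and counts are made up):

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.7, 10.1, 10.4])
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

# 95% t-interval for the mean (sigma unknown)
tcrit = stats.t.ppf(0.975, df=n - 1)
print(xbar - tcrit * s / np.sqrt(n), xbar + tcrit * s / np.sqrt(n))

# Same interval via SciPy's helper
print(stats.t.interval(0.95, df=n - 1, loc=xbar, scale=s / np.sqrt(n)))

# 95% large-sample interval for a proportion: 40 successes in 100 trials
phat, m = 0.40, 100
zcrit = stats.norm.ppf(0.975)
half = zcrit * np.sqrt(phat * (1 - phat) / m)
print(phat - half, phat + half)
```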

Bootstrapping

Resampling method: Sample with replacement from data

Use empirical distribution as approximate population distribution.
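
A minimal percentile-bootstrap sketch for the median (simulated skewed data; 5,000 resamples is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.lognormal(size=200)               # skewed data for illustration

# Bootstrap the median: resample with replacement, recompute each time
boot = np.array([np.median(rng.choice(x, size=len(x), replace=True))
                 for _ in range(5_000)])

# Percentile 95% confidence interval for the median
print(np.percentile(boot, [2.5, 97.5]))
```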


Hypothesis Testing

Hypotheses

$H_0$: null hypothesis (status quo); $H_1$: alternative hypothesis

Type I error: reject $H_0$ when $H_0$ is true (significance level $\alpha$)

Type II error: fail to reject $H_0$ when $H_1$ is true. Power: $1 - P(\text{Type II error}) = P(\text{reject } H_0 \mid H_1 \text{ true})$

Test Statistic

One-sample z-test (normal, $\sigma$ known): $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

One-sample t-test (normal, $\sigma$ unknown): $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \sim t_{n-1}$

Two-sample t-test: $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$
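
A sketch of the one- and two-sample t-tests with SciPy, plus the pooled statistic computed by hand from the formula above (simulated data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x1 = rng.normal(loc=5.0, scale=1.0, size=30)
x2 = rng.normal(loc=5.5, scale=1.0, size=35)

# One-sample t-test of H0: mu = 5 (two-sided)
t, p = stats.ttest_1samp(x1, popmean=5.0)
print(t, p)

# Two-sample pooled t-test (equal_var=True uses the pooled variance s_p^2)
t, p = stats.ttest_ind(x1, x2, equal_var=True)
print(t, p)

# The same statistic by hand
n1, n2 = len(x1), len(x2)
sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / (n1 + n2 - 2)
print((x1.mean() - x2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2)))
```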

p-value

Probability of observing a statistic at least as extreme as the one obtained, assuming $H_0$ is true

Decision: reject $H_0$ if $p < \alpha$

Interpretation: a smaller p-value is stronger evidence against $H_0$

Rejection Regions

Two-tailed: Reject if |test statistic| > critical value

One-tailed: Reject if test statistic > critical value (upper) or < -critical value (lower)

t-tests

One-sample, two-sided: $H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$

Reject if $|t| > t_{n-1,\alpha/2}$

One-sample, upper: $H_0: \mu \leq \mu_0$ vs. $H_1: \mu > \mu_0$

Reject if $t > t_{n-1,\alpha}$

Tests for Proportions

One-sample: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0, 1)$

Two-sample: $z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}$

where $\hat{p} = (x_1 + x_2)/(n_1 + n_2)$ is the pooled proportion
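
A direct implementation of the two formulas above using the normal CDF from SciPy (the counts are made up):

```python
import numpy as np
from scipy import stats

# One-sample: 56 successes in 100 trials, H0: p = 0.5 (two-sided)
x, n, p0 = 56, 100, 0.5
z = (x / n - p0) / np.sqrt(p0 * (1 - p0) / n)
print(z, 2 * stats.norm.sf(abs(z)))         # z statistic, two-sided p-value

# Two-sample: 40/100 vs 55/120, pooled estimate under H0: p1 = p2
x1, n1, x2, n2 = 40, 100, 55, 120
pooled = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
print(z, 2 * stats.norm.sf(abs(z)))
```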

Chi-square Tests

Goodness of fit:

$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_{k-1-m}$

where $O_i$ is observed, $E_i$ is expected, and $m$ is the number of parameters estimated.

Test of independence (contingency table):

$\chi^2 = \sum_i\sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_{(r-1)(c-1)}$

where $E_{ij} = (\text{row } i \text{ total})(\text{column } j \text{ total})/n$.
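
A sketch of both tests with SciPy (the counts are made up):

```python
import numpy as np
from scipy import stats

# Goodness of fit: is a die fair? Observed counts for faces 1..6
observed = np.array([18, 22, 16, 25, 20, 19])
expected = np.full(6, observed.sum() / 6)
chi2, p = stats.chisquare(observed, expected)   # df = k - 1 = 5
print(chi2, p)

# Test of independence on a 2x3 contingency table
table = np.array([[20, 30, 25],
                  [30, 20, 25]])
chi2, p, dof, exp = stats.chi2_contingency(table)
print(chi2, p, dof)    # dof = (r-1)(c-1) = 2
```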


Regression

Simple Linear Regression

Model: $y = \beta_0 + \beta_1 x + \epsilon$, with $\epsilon \sim N(0, \sigma^2)$

Least squares estimates:

$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}$

$b_0 = \bar{y} - b_1\bar{x}$

Inference on Slope

t-test: $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$

$t = \frac{b_1}{s/\sqrt{S_{xx}}} \sim t_{n-2}$

where $s = \sqrt{\frac{SSE}{n-2}}$ (residual standard error)

Confidence interval:

$b_1 \pm t_{n-2,\alpha/2}\,\frac{s}{\sqrt{S_{xx}}}$

Confidence and Prediction Intervals

For the mean response $\mu_{y|x_0}$:

$\hat{y}_0 \pm t_{n-2,\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$

For a new observation $y_0$:

$\hat{y}_0 \pm t_{n-2,\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$
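
A sketch with scipy.stats.linregress, which returns the slope, its standard error $s/\sqrt{S_{xx}}$, and the p-value for $H_0: \beta_1 = 0$ (the data values are made up):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

res = stats.linregress(x, y)
print(res.slope, res.intercept)      # b1, b0
print(res.pvalue)                    # t-test of H0: beta1 = 0
print(res.stderr)                    # s / sqrt(Sxx)

# 95% confidence interval for the slope
n = len(x)
tcrit = stats.t.ppf(0.975, df=n - 2)
print(res.slope - tcrit * res.stderr, res.slope + tcrit * res.stderr)
```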

Multiple Regression

$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \epsilon$

Matrix form: $y = X\beta + \epsilon$

Least squares:

$\hat{\beta} = (X^T X)^{-1} X^T y$

Inference: F-test for model, t-tests for individual coefficients

R²: proportion of variance explained; $R^2 = SSR/SST = 1 - SSE/SST$
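
A sketch of the least-squares solution and $R^2$ on simulated data (np.linalg.lstsq is the numerically stable way to solve the normal equations):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = np.column_stack([np.ones(n),            # intercept column
                     rng.normal(size=n),
                     rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])      # illustrative coefficients
y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Least squares: beta_hat = (X^T X)^{-1} X^T y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                             # close to beta_true

# R^2 = 1 - SSE/SST
resid = y - X @ beta_hat
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
print(r2)
```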


Analysis of Variance

One-Way ANOVA

Model: $y_{ij} = \mu_i + \epsilon_{ij}$, with $\epsilon_{ij} \sim N(0, \sigma^2)$

Hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$

F-test:

$F = \frac{MSTr}{MSE} \sim F_{k-1,\, n-k}$

where:

  • $MSTr = SSTr/(k-1)$
  • $MSE = SSE/(n-k)$

Test statistic:

$F = \frac{\sum n_i(\bar{y}_i - \bar{y})^2/(k-1)}{\sum\sum (y_{ij} - \bar{y}_i)^2/(n-k)}$

ANOVA Table:

| Source     | df    | SS   | MS   | F        |
|------------|-------|------|------|----------|
| Treatments | $k-1$ | SSTr | MSTr | MSTr/MSE |
| Error      | $n-k$ | SSE  | MSE  |          |
| Total      | $n-1$ | SST  |      |          |
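
A one-line check with SciPy (simulated groups with unequal means):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
g1 = rng.normal(10.0, 2.0, size=20)
g2 = rng.normal(11.0, 2.0, size=20)
g3 = rng.normal(12.5, 2.0, size=20)

# One-way ANOVA: H0: mu1 = mu2 = mu3
F, p = stats.f_oneway(g1, g2, g3)
print(F, p)    # compare F against F_{k-1, n-k} = F_{2, 57}
```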

Two-Way ANOVA

Model: $y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}$

Tests for main effects and interaction.


Nonparametric Methods

Sign Test

For a hypothesized median $\tilde{\mu}_0$:

Test $H_0: \tilde{\mu} = \tilde{\mu}_0$ using the signs of $(x_i - \tilde{\mu}_0)$

Wilcoxon Signed-Rank Test

For paired differences (nonparametric alternative to paired t-test)

Use the ranks of the absolute differences, accounting for signs.

Wilcoxon Rank-Sum Test

For two independent samples (nonparametric alternative to two-sample t-test)

Mann-Whitney: Sum ranks from one sample

Kruskal-Wallis Test

Multi-sample nonparametric test

Uses ranks, alternative to one-way ANOVA

$H = \frac{12}{n(n+1)}\sum \frac{R_i^2}{n_i} - 3(n+1) \sim \chi^2_{k-1}$ (approximately, for large samples)
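
A sketch of the three tests in SciPy (simulated data; scipy.stats.kruskal applies the chi-square approximation above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
before = rng.normal(10.0, 2.0, size=25)
after = before + rng.normal(0.5, 1.0, size=25)   # paired differences

# Wilcoxon signed-rank test on the paired differences
print(stats.wilcoxon(before, after))

# Wilcoxon rank-sum / Mann-Whitney for two independent samples
print(stats.mannwhitneyu(before, after))

# Kruskal-Wallis for k independent samples
g3 = rng.normal(11.0, 2.0, size=25)
print(stats.kruskal(before, after, g3))
```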


Supplementary Statistics & Probability Reference (from mathematics_GPT)

Unique Example Summaries

  1. Chebyshev’s Inequality: If a data set has a mean of 100 and a standard deviation of 20, at least 75% of the data falls within 60 to 140.

  2. Empirical Rule (Normal): For a normal distribution, 68% of the data falls within one standard deviation of the mean, 95% within two, and 99.7% within three.

  3. Law of Large Numbers: If you flip a fair coin repeatedly, the proportion of heads will converge to 0.5 as the number of flips increases.

  4. Central Limit Theorem: If you repeatedly roll a fair six-sided die, the average of the results will be approximately normally distributed once the number of rolls is large.

  5. Chi-square Distribution: The sum of $\nu$ independent squared standard normal random variables follows a $\chi^2_\nu$ distribution.

  6. t-distribution: If you take a sample of size $n$ from a normal distribution, the statistic $(\bar{x} - \mu)/(s/\sqrt{n})$, built from the sample mean and sample standard deviation, follows a $t_{n-1}$ distribution.

Mnemonic Tables

  1. Probability Distributions:

    • Uniform: All outcomes equally likely.
    • Bernoulli: Binary outcome (success/failure).
    • Binomial: Multiple independent Bernoulli trials.
    • Geometric: Number of trials until first success.
    • Negative Binomial: Number of trials until r successes.
    • Hypergeometric: Sampling without replacement.
    • Poisson: Counts of rare events.
  2. Continuous Distributions:

    • Uniform: Flat line.
    • Exponential: Decaying curve.
    • Normal: Bell curve.
    • Gamma: Right-skewed.
    • Chi-square: Right-skewed.
    • t-distribution: Heavy-tailed.
    • F-distribution: Right-skewed; ratio of two scaled chi-squares.
  3. Hypothesis Testing:

    • One-sample z-test: Normal, σ known.
    • One-sample t-test: Normal, σ unknown.
    • Two-sample t-test: Two samples, pooled variance.
    • Chi-square tests: Goodness of fit, independence.
  4. Confidence Intervals:

    • Normal, known variance: z-score.
    • Normal, unknown variance: t-distribution.
    • Proportion: Normal approximation.
  5. Regression:

    • Simple linear: Least squares, t-test.
    • Multiple: Matrix form, F-test.

Practical Rules

  1. Sampling:

    • For large populations, sampling with replacement is often sufficient.
    • For small populations, sampling without replacement is necessary.
  2. Hypothesis Testing:

    • If the sample size is large (n > 30), z-tests can be used.
    • If the sample size is small (n < 30), t-tests are preferred.
    • If σ is unknown, use t-tests.
  3. Confidence Intervals:

    • For large n, z-intervals are accurate.
    • For small n, t-intervals are more robust.
    • For proportions, use normal approximation for large n.
  4. Regression:

    • R² measures the proportion of variance explained.
    • A high R² does not imply causality.
    • Always check residuals for normality and independence.

Next: Probability Theory


Comprehensive statistics reference covering descriptive, inferential, and regression methods.