Preface
First Edition Preface
[[2024-09-14]] Today the makeup exam finally ended. I heard the original exam directly reused past papers. These past few days I practiced three sets of “XDU original papers” (from 2021 and two from 2023) found online. I did the 2021 paper in the morning, and in the afternoon $\frac{1}{4}$ of the questions were exact copies without any changes. I couldn’t help but laugh.
Dai Hao once said he would try his best to find the best teachers for the Qian Class. But now it seems the School of Mathematics and Statistics has no one left? Poor teaching could be excused as not focusing on education or lacking talent in teaching; but directly reusing recent past papers for exams, full of unchecked errors and omissions, made me laugh in frustration.
The exams they create have no value, and they don’t even bother to test them themselves. This is an attitude problem.
It’s fine that your university goes easy on final exams, but don’t keep fooling people with old material. You preach innovation to students, yet for yourselves, just getting by is enough. This is not the attitude for academic work, nor is it the attitude one should have for teaching.
Probability theory ends here for now. Over the past two days, I repeatedly reviewed notes, practiced problems, and corrected many errors, clarifying the knowledge structure of this course. Although the content is still relatively sparse, it should suffice as final review material. This edition will likely be the final version (probably). I’ll continue organizing Electrodynamics and Digital Signal Processing during the Mid-Autumn Festival.
Second Edition Preface
Nothing is final!!! ——Qian Xuesen
Added content on the left/right continuity of distribution functions. It seems this course is far from final…
Event Operations to Logical Operations
- $A \cup B=A+B$
- $A \cap B=A \cdot B$
- $A-B=A \bar{B}$ Event $A$ occurs and event $B$ does not occur, easily proven by Venn diagrams. $-B$ can be interpreted as $\cdot (-B)$, where $-B$ is $\bar{B}$.
- If $A \subset B$, then $A \cup B=B$, $A \cap B=A$.
After converting event operations to logical operations, most rules are shared. Using logical function operations and simplification learned in digital circuits, complex event operations can be simplified. Tips: Karnaugh maps.
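The correspondence above can be checked mechanically by modelling events as sets. The following is only a small sketch: the sample space (one roll of a die) and the events `A`, `B`, `C` are illustrative choices, and De Morgan's law is included as one of the shared rules even though it is not listed above.

```python
# Events as Python sets over a small sample space (one roll of a die).
omega = set(range(1, 7))
A = {2, 4, 6}   # "even" (illustrative choice)
B = {4, 5, 6}   # "greater than 3" (illustrative choice)
C = {4, 6}      # a subset of B (illustrative choice)

def complement(event):
    """Complement of an event relative to omega."""
    return omega - event

# A - B = A * not(B)
difference_ok = (A - B) == (A & complement(B))
# De Morgan: not(A + B) = not(A) * not(B)
de_morgan_ok = complement(A | B) == (complement(A) & complement(B))
# If C is a subset of B, then C + B = B and C * B = C
subset_ok = (C | B) == B and (C & B) == C

print(difference_ok, de_morgan_ok, subset_ok)
```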
Four Major Probability Formulas
$$ \begin{cases} P(A+B)=P(A)+P(B)-P(AB)\\ P(A-B)=P(A)-P(AB)=P(A \bar{B})\\ P(AB)=P(B) \cdot P(A|B)=P(A) \cdot P(B|A)\\ P(A|B)=\frac{P(AB)}{P(B)}\\ \end{cases} $$
Corollary
$P(A+B+C)$: Treat $A+B$ as a single event and apply the addition formula above, splitting twice to get:
$$ P(A+B+C)=P(A)+P(B)+P(C)-P(AB)-P(AC)-P(BC)+P(ABC) $$Probabilities for more joint events can be derived recursively.
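A quick numerical check of the three-event formula, as a sketch under assumed data: the sample space of 12 equally likely outcomes and the events `A`, `B`, `C` are made up for illustration.

```python
from fractions import Fraction

omega = set(range(1, 13))                 # 12 equally likely outcomes (assumed)
A = {n for n in omega if n % 2 == 0}      # multiples of 2
B = {n for n in omega if n % 3 == 0}      # multiples of 3
C = {n for n in omega if n > 8}           # outcomes above 8

def prob(event):
    """Classical probability: favourable outcomes over all outcomes."""
    return Fraction(len(event), len(omega))

lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))

print(lhs, rhs)
```

Exact fractions are used so that both sides can be compared with `==` instead of a floating-point tolerance.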
Complementary event: The probability that $A$ does not occur, obvious from Venn diagrams.
$$ P(\bar{A})=P(1 \cdot \bar{A})=P(1-A)=P(1)-P(1 \cdot A)=1-P(A) $$
Non-Negativity and Normalization
Non-negativity: For any event $A$, $0 \le P(A) \le 1$. Normalization: For the total event $\Omega$, $P(\Omega)=1$.
Independence
$$ \begin{cases} P(AB)=P(A) \cdot P(B)\\ P(A|B)=P(A) \end{cases} $$
Independence is mutual: if $A$ is independent of $B$, then $B$ is independent of $A$.
Classical Probability Model
All elementary events have equal probability.
Eg. Coin toss, dice roll…
$$ P(A)=\frac{\text{Number of elementary events in } A}{\text{Total elementary events in } \Omega} $$
Classical conditional probability formula:
$$ P(B|A)=\frac{P(AB)}{P(A)}=\frac{\text{Elementary events in both } A \text{ and } B}{\text{Elementary events in } A} $$
Bernoulli Trials (Binomial Distribution)
$n$ independent trials, each with only two outcomes: $A$ or $\bar{A}$.
$X \sim B(n,p)$
$$ P_n(k)=C_n^kp^k(1-p)^{n-k} $$Where $p=P(A)$, $1-p=P(\bar{A})$.
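The formula translates directly into code. This is a minimal sketch; the function name `binomial_pmf` and the values $n=10$, $p=0.5$ are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P_n(k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
p3 = binomial_pmf(3, n, p)                                 # exactly 3 successes
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))   # should be 1

print(p3, total)
```

Summing over all $k$ from $0$ to $n$ is a handy sanity check, since the probabilities must add up to 1.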
Geometric Probability Model
The ratio of the length/area/volume occupied by the event to the total length/area/volume of the sample space $\Omega$. When the event’s dimension is lower than $\Omega$’s dimension, its probability is always 0. ==Warning==: A probability of 0 does not mean the event cannot occur. Eg: Randomly selecting a point inside a circle, the probability of selecting any specific point is 0, but it can still happen.
Uniform Distribution
$x \sim U(a,b)$ Approximates a linear distribution in geometric probability, with probability density:
$$ f(x)= \begin{cases} 0,x \le a\\ \frac{1}{b-a},a \lt x \le b\\ 0,x \gt b\\ \end{cases} $$
Cumulative distribution function:
$$ F(x)= \begin{cases} 0,x \le a\\ \frac{x-a}{b-a},a \lt x \le b\\ 1,x \gt b\\ \end{cases} $$
Exponential Distribution
$x \sim E(\lambda)$
Probability Density
$$ f(x)= \begin{cases} \lambda e^{-\lambda x},x \gt 0\\ 0,x \le 0\\ \end{cases} $$
Cumulative Distribution Function
$$ F(x)= \begin{cases} 1-e^{-\lambda x},x \ge 0\\ 0,x \lt 0\\ \end{cases} $$
Poisson Distribution
$X \sim \pi(\lambda)$
$$ P(X=k)=\frac{e^{-\lambda}\lambda^k}{k!} $$
Normal Distribution
$x \sim N(\mu,\sigma^2)$
Probability Density
$$ f(x)=\frac{1}{\sqrt{2 \pi} \sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}},x \in R,\sigma \gt 0 $$
Cumulative Distribution Function
$$ F(x)=\int^{x}_{-\infty}f(t)dt $$Clearly, $F(\mu)=\frac{1}{2}$, meaning $P(x \le \mu)=P(x \gt \mu)=\frac{1}{2}$.
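The integral has no elementary closed form, but it can be written with the error function as $F(x)=\frac{1}{2}[1+\operatorname{erf}(\frac{x-\mu}{\sigma\sqrt{2}})]$. A minimal sketch using the standard library follows; the function name `normal_cdf` and the values $\mu=3$, $\sigma=2$ are illustrative assumptions.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), expressed through the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

mu, sigma = 3.0, 2.0
at_mean = normal_cdf(mu, mu, sigma)   # F(mu)
within_one_sigma = normal_cdf(mu + sigma, mu, sigma) - normal_cdf(mu - sigma, mu, sigma)

print(at_mean, within_one_sigma)
```

The second value is the probability of landing within one standard deviation of the mean, roughly 0.68.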
Standard Normal Distribution
When $\mu=0,\sigma=1$, it becomes the standard normal distribution.
$$ \varphi(x)=\frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} $$
$$ \varPhi(x)=\int^{x}_{-\infty}\varphi(t)dt $$
Corollaries
$$ \varPhi(-x)=1-\varPhi(x) $$
$$ F(x)=\varPhi(\frac{x-\mu}{\sigma}) $$
Normalization of normal distribution:
$$ X \sim N(\mu,\sigma^2),Z=\frac{X-\mu}{\sigma}\sim N(0,1) $$
Total Probability Formula
Complete Event Group
$$ \begin{cases} B_1 \cup B_2 \cup B_3 \cup \cdots \cup B_n=\Omega\\ B_i \cap B_j=\varnothing,i \ne j,1 \le i \le n,1 \le j \le n\\ \end{cases} $$$B_1,B_2,B_3,\cdots B_n$ form a complete event group for $\Omega$.
Total Probability Formula
$$ \begin{align} P(A) &=P(AB_1 \cup AB_2 \cup \cdots \cup AB_n)\\ &=P(AB_1)+P(AB_2)+\cdots +P(AB_n)\\ &=P(B_1)P(A|B_1)+P(B_2)P(A|B_2)+\cdots +P(B_n)P(A|B_n)\\ \end{align} $$
Bayes’ Formula
$$ P(B_1|A)=\frac{P(AB_1)}{P(A)}=\frac{P(B_1)P(A|B_1)}{P(A)} $$
One-Dimensional Discrete Random Variables
Probability Mass Function
$$ P(X=x_i)=p_i=\frac{\text{Count of } X=x_i}{\text{Total count}},i=1,2,\cdots $$
Cumulative Distribution Function
$$ F(x)=\sum_{x_i \lt x}p_i,x \in R $$
One-Dimensional Continuous Random Variables
Probability Density Function
$$ f(x)=F'(x) $$
Cumulative Distribution Function
$$ F(x)=\int_{-\infty}^xf(t)dt $$
Interval Probability
$$ P(a \lt x \le b)=\int_a^bf(x)dx=F(b)-F(a) $$$\because$ $P(x=a)=0,a \in R$ $\therefore$ The equality signs on the interval can be chosen freely.
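To see that the two expressions for the interval probability agree, the sketch below compares $F(b)-F(a)$ with a numerical integral of the density for an exponential distribution. The rate $\lambda=2$, the interval $(0.5, 1.5]$, and the midpoint-rule helper `integrate` are illustrative assumptions.

```python
from math import exp

lam = 2.0  # rate parameter (illustrative value)

def pdf(x):
    """Density of E(lam)."""
    return lam * exp(-lam * x) if x > 0 else 0.0

def cdf(x):
    """Distribution function of E(lam)."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

a, b = 0.5, 1.5
by_cdf = cdf(b) - cdf(a)
by_integral = integrate(pdf, a, b)

print(by_cdf, by_integral)
```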
Normalization
$$ F(\infty)=\int^{\infty}_{-\infty}f(x)dx=1 $$
$$ F(-\infty)=0 $$
Two-Dimensional Discrete Random Variables
Joint Probability Mass Function
$P(X=x_i,Y=y_j)$ Create a 2D table of possible values for X and Y, filling in corresponding probabilities.
Marginal Probability Mass Function
$P(X=x_i),P(Y=y_j)$ Sum the joint probability table along each row/column to get the marginal distributions: $P(X=x_i)=\sum_j P(X=x_i,Y=y_j)$ and $P(Y=y_j)=\sum_i P(X=x_i,Y=y_j)$.
Conditional Distribution
$P(X=x_i|Y=y_j),P(Y=y_j|X=x_i)$ Divide each row/column of the joint probability table by its marginal probability. This scales the joint probabilities so each row/column sums to 1.
Independence of Two Variables
X and Y are independent if and only if the joint probability equals the product of the marginal probabilities in every cell: $P(X=x_i,Y=y_j)=P(X=x_i)P(Y=y_j)$ for all $i,j$. If this fails for even one cell, X and Y are not independent. Equivalently: write the joint probability table as a matrix $\vec{A}$; X and Y are independent exactly when all rows (or all columns) of $\vec{A}$ are proportional, i.e. $\vec{A}$ has rank 1. ==Warning==: For a $2 \times 2$ table this is the same as $\det \vec{A}=0$, but for larger square tables $\det \vec{A}=0$ is only necessary, not sufficient.
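The table operations in this section (marginals, conditional distributions, the product test for independence) can be sketched as follows. The two joint tables and the helper names are made up for illustration; rows index the values of X and columns the values of Y.

```python
from fractions import Fraction as F

def marginals(joint):
    """Row sums give P(X = x_i); column sums give P(Y = y_j)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    return p_x, p_y

def conditional_x_given_y(joint, j):
    """P(X = x_i | Y = y_j): column j divided by its marginal."""
    _, p_y = marginals(joint)
    return [row[j] / p_y[j] for row in joint]

def is_independent(joint):
    """True when every cell equals the product of its marginals."""
    p_x, p_y = marginals(joint)
    return all(joint[i][j] == p_x[i] * p_y[j]
               for i in range(len(p_x)) for j in range(len(p_y)))

table_a = [[F(1, 8), F(1, 8)],
           [F(3, 8), F(3, 8)]]   # proportional rows
table_b = [[F(1, 2), F(0)],
           [F(0), F(1, 2)]]      # all mass on the diagonal

print(marginals(table_a), conditional_x_given_y(table_a, 0))
print(is_independent(table_a), is_independent(table_b))
```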
Two-Dimensional Continuous Random Variables
Joint Density Function
$$ f(x,y) $$
Normalization
$$ \int^{\infty}_{-\infty}\int^{\infty}_{-\infty}f(x,y)dxdy=1 $$
Marginal Density Functions
$$ f_X(x)=\int^{\infty}_{-\infty}f(x,y)dy $$
$$ f_Y(y)=\int^{\infty}_{-\infty}f(x,y)dx $$
Conditional Density
$$ f_{Y|X}(y|x)=\frac{f(x,y)}{f_X(x)} $$
Independence
$$ f(x,y)=f_X(x)f_Y(y) $$
When this holds, X and Y are independent.
Distribution Function
Let $Z=X-Y$,
$$ \begin{align} F_Z(z) &=P(Z \lt z)\\ &=P(X-Y \lt z)\\ &=P(X \lt Y+z)\\ &=\int^{\infty}_{-\infty}\int^{y+z}_{-\infty}f(x,y)dxdy\\ \end{align} $$
In general, the distribution function is $F_Z(z)=\iint_Df(x,y)dxdy$, where $D$ is the region of the plane on which $Z \lt z$ (here $D=\{(x,y): x-y \lt z\}$). Differentiate to get the probability density function $f_Z(z)$. ==Warning==: $F_Z(z)$ must satisfy normalization.
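As a worked instance of this method (the choice of distribution is an illustrative assumption, not from the course material): let X and Y be independent and each uniform on $(0,1)$, so $f(x,y)=1$ on the unit square and 0 elsewhere. Then $F_Z(z)$ is simply the area of the part of the unit square where $x \lt y+z$:

$$ F_Z(z)= \begin{cases} 0,z \le -1\\ \frac{(1+z)^2}{2},-1 \lt z \le 0\\ 1-\frac{(1-z)^2}{2},0 \lt z \le 1\\ 1,z \gt 1\\ \end{cases} $$

Differentiating piecewise gives the triangular density

$$ f_Z(z)=F_Z'(z)= \begin{cases} 1-|z|,|z| \lt 1\\ 0,|z| \ge 1\\ \end{cases} $$

Normalization check: $F_Z(1)=1-0=1$ and $F_Z(-1)=0$, as required.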
Expectation and Variance
Relations
$$ DX=EX^2-(EX)^2 $$$$ D(cX)=c^2DX $$$$ D(X+Y)=D(X)+D(Y)+2Cov(X,Y) $$When X and Y are independent, $Cov(X,Y)=0$.
Common Expectations and Variances
$(0,1)$ Distribution
$$ EX=p,DX=p(1-p) $$
$B(n,p)$ Binomial Distribution
$$ EX=np,DX=np(1-p) $$
$U(a,b)$ Uniform Distribution
$$ EX=\frac{a+b}{2},DX=\frac{(b-a)^2}{12} $$
$E(\lambda)$ Exponential Distribution
$$ EX=\frac{1}{\lambda},DX=\frac{1}{\lambda^2} $$
$P(\lambda)$ Poisson Distribution
$$ EX=\lambda,DX=\lambda $$
$N(\mu,\sigma^2)$ Normal Distribution
$$ EX=\mu,DX=\sigma^2 $$
Covariance and Correlation Coefficient
Covariance
$$ Cov(X,Y)=E(XY)-E(X)E(Y) $$Clearly, when $X=Y$, $Cov(X,X)=DX$.
$$ Cov(X+Y,Z)=Cov(X,Z)+Cov(Y,Z) $$
$$ Cov(X-Y,Z)=Cov(X,Z)+Cov(-Y,Z)=Cov(X,Z)-Cov(Y,Z) $$
Correlation Coefficient
$$ \rho_{XY}=\frac{Cov(X,Y)}{\sqrt{DX \cdot DY}} $$Higher $|\rho|$ means stronger correlation. When $Y=X$, $X$ and $X$ are perfectly correlated, $\rho=1$. When $Y=-X$, $-X$ and $X$ are perfectly correlated, $\rho=-1$. Clearly $|\rho| \le 1$. $\rho=0$ means X and Y are uncorrelated. ==Warning==: Uncorrelated $\nRightarrow$ Independent, but Independent $\Rightarrow$ Uncorrelated.
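A standard example of "uncorrelated but not independent" is $X$ uniform on $\{-1,0,1\}$ with $Y=X^2$. The sketch below computes the covariance for that case and the correlation coefficient for $Y=-X$; the joint distributions and helper names are illustrative assumptions.

```python
from fractions import Fraction as F
from math import sqrt

def expect(pmf, g):
    """E[g(X, Y)] for a joint pmf given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def covariance(pmf):
    """Cov(X, Y) = E(XY) - E(X)E(Y)."""
    ex = expect(pmf, lambda x, y: x)
    ey = expect(pmf, lambda x, y: y)
    return expect(pmf, lambda x, y: x * y) - ex * ey

def correlation(pmf):
    """rho = Cov(X, Y) / sqrt(DX * DY)."""
    dx = expect(pmf, lambda x, y: x * x) - expect(pmf, lambda x, y: x) ** 2
    dy = expect(pmf, lambda x, y: y * y) - expect(pmf, lambda x, y: y) ** 2
    return float(covariance(pmf)) / sqrt(dx * dy)

third = F(1, 3)
square = {(-1, 1): third, (0, 0): third, (1, 1): third}     # Y = X^2
negated = {(-1, 1): third, (0, 0): third, (1, -1): third}   # Y = -X

cov_square = covariance(square)
# Independence would require P(X=0, Y=0) = P(X=0) * P(Y=0).
p_x0 = sum(p for (x, _), p in square.items() if x == 0)
p_y0 = sum(p for (_, y), p in square.items() if y == 0)
dependent = square[(0, 0)] != p_x0 * p_y0
rho_negated = correlation(negated)

print(cov_square, dependent, rho_negated)
```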
Chebyshev’s Inequality for Probability Estimation
$$ P(|X-EX|\ge \varepsilon)\le \frac{DX}{\varepsilon^2} $$
Central Limit Theorem
The sum of a large number of independent, identically distributed variables with finite variance is approximately normally distributed. If $x_1,x_2,\cdots,x_n$ are independent and identically distributed and $n$ is large, then approximately
$$ \sum_{i=1}^nx_i \sim N(\sum^{n}_{i=1}E(x_i),\sum^{n}_{i=1}D(x_i)) $$
Three Major Distributions
$\chi^2$ (Chi-Squared) Distribution
$$ X=x_1^2+x_2^2+\cdots +x_n^2 \sim \chi^2(n),x_i \sim N(0,1) \text{ and independent} $$Upper $\alpha$ quantile $\chi^2_\alpha(n)$ Density function is in the first quadrant.
$t$ Distribution
$$ X=\frac{x_1}{\sqrt{x_2/n}}\sim t(n),x_1 \sim N(0,1),x_2 \sim \chi^2(n),x_1 \text{ and } x_2 \text{ independent} $$Upper $\alpha$ quantile $t_\alpha(n)$ Density function resembles normal distribution, symmetric.
$F$ Distribution
$$ X=\frac{x_1/n_1}{x_2/n_2} \sim F(n_1,n_2),x_1 \sim \chi^2(n_1),x_2 \sim \chi^2(n_2),x_1 \text{ and } x_2 \text{ independent} $$Upper $\alpha$ quantile $F_\alpha(n_1,n_2)$ Density function is in the first quadrant.
Estimation Methods
For simple random samples that are independent and identically distributed, estimate unknown parameters.
Method of Moments
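By the law of large numbers, when the sample size is large the sample moments are close to the corresponding population moments, so set population moment = sample moment. With one unknown parameter, this means replacing the population mean with the sample mean.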
- Calculate the expectation $EX$ (first population moment) from the given probability mass/density function.
- Calculate the sample mean $\bar{X}$ (first sample moment) from the given sample.
- Set $EX=\bar{X}$ and solve for $\theta_0$ as $\hat{\theta}$.
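The three steps above can be sketched for the exponential distribution $E(\lambda)$, where $EX=\frac{1}{\lambda}$, so setting $EX=\bar{X}$ gives $\hat{\lambda}=\frac{1}{\bar{X}}$. The function name and the sample values are made up for illustration.

```python
def moment_estimate_exponential(sample):
    """Method of moments for E(lambda): solve 1/lambda = sample mean."""
    sample_mean = sum(sample) / len(sample)   # first sample moment
    return 1.0 / sample_mean                  # lambda-hat

sample = [0.5, 1.5, 2.0, 4.0]                 # illustrative observations
lam_hat = moment_estimate_exponential(sample)

print(lam_hat)
```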
Maximum Likelihood Estimation
The estimate maximizes the probability of the observed sample. Likelihood function for the sample:
$$ L(x_1,x_2,\cdots,x_n;\theta)= \begin{cases} P(X=x_1)P(X=x_2)\cdots P(X=x_n), \text{discrete}\\ f(x_1;\theta)f(x_2;\theta)\cdots f(x_n;\theta), \text{continuous}\\ \end{cases} $$To find the maximum of $L$, take the derivative to find critical points. Since the product form is cumbersome, first take the logarithm before differentiating with respect to $\theta$.
$$ (\ln L)'= \begin{cases} (\ln P_1+\ln P_2+\cdots +\ln P_n)', \text{discrete}\\ [\ln f(x_1;\theta)+\ln f(x_2;\theta)+\cdots +\ln f(x_n;\theta)]', \text{continuous}\\ \end{cases} =0 $$Solve for the critical point $\theta_0$, which is the estimate $\hat{\theta}$.
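For a Poisson sample, $\ln L=\sum_i(-\lambda+x_i\ln\lambda-\ln x_i!)$, and setting the derivative $-n+\frac{1}{\lambda}\sum_i x_i$ to zero gives $\hat{\lambda}=\bar{x}$. The sketch below confirms this numerically with a crude grid search instead of calculus; the sample and the grid are illustrative assumptions.

```python
from math import lgamma, log

def log_likelihood(lam, sample):
    """ln L for i.i.d. Poisson observations; lgamma(x + 1) equals ln(x!)."""
    return sum(-lam + x * log(lam) - lgamma(x + 1) for x in sample)

sample = [2, 3, 1, 4, 2, 0, 3, 1]              # illustrative counts
grid = [i / 100 for i in range(1, 1001)]       # candidate lambdas 0.01 .. 10.00
best = max(grid, key=lambda lam: log_likelihood(lam, sample))
sample_mean = sum(sample) / len(sample)

print(best, sample_mean)
```

A grid search is only a check here; in an exam setting the critical point is found by differentiating $\ln L$ as described above.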
Unbiasedness and Efficiency
If $E(\hat{\theta})=\theta$, then $\hat{\theta}$ is an unbiased estimator of $\theta$. If $\hat{\theta_1},\hat{\theta_2}$ are both unbiased, and $D(\hat{\theta_1}) \lt D(\hat{\theta_2})$, then $\hat{\theta_1}$ is more efficient than $\hat{\theta_2}$.
Interval Estimation
$X \sim N(\mu,\sigma^2)$; the sample mean $\bar{x}$ and sample standard deviation $S$ are typically given and serve as point estimates of $\mu$ and $\sigma$. Confidence level: $1-\alpha$, usually $\alpha=5\%$.
Confidence Interval for $\mu$
$\sigma^2$ Known
Pivotal quantity (standardized):
$$ \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1) $$
$$ \mu \in (\bar{x}-\frac{\sigma}{\sqrt{n}}z_{\frac{\alpha}{2}},\bar{x}+\frac{\sigma}{\sqrt{n}}z_{\frac{\alpha}{2}}) $$
Here $z_{\frac{\alpha}{2}}$ is the upper $\frac{\alpha}{2}$ quantile of $N(0,1)$.
$\sigma^2$ Unknown
Pivotal quantity:
$$ \frac{\bar{X}-\mu}{S/\sqrt{n}}\sim t(n-1) $$
$$ \mu \in (\bar{x}-\frac{S}{\sqrt{n}}t_{\frac{\alpha}{2}}(n-1),\bar{x}+\frac{S}{\sqrt{n}}t_{\frac{\alpha}{2}}(n-1)) $$
Confidence Interval for $\sigma^2$
Usually $\mu$ is unknown. Pivotal quantity:
$$ \frac{(n-1)S^2}{\sigma^2}\sim \chi^2(n-1) $$
$$ \sigma^2 \in (\frac{(n-1)S^2}{\chi^2_{\frac{\alpha}{2}}(n-1)},\frac{(n-1)S^2}{\chi^2_{1-\frac{\alpha}{2}}(n-1)}) $$
Hypothesis Testing
Generally, the significance level is set at $\alpha=5\%$.
$\mu$ Test (Mean Test)
Hypothesis Formulation
$H_0: \mu = \mu_0$ (null hypothesis)
$H_1: \mu \ne \mu_0$ (alternative hypothesis)
Test Statistic Selection
- When population variance $\sigma^2$ is known: use $Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \sim N(0,1)$ under $H_0$ (Z-test)
- When population variance $\sigma^2$ is unknown: use $T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}} \sim t(n-1)$ under $H_0$ (T-test)
Rejection Region Determination
- For Z-test: $W = (-\infty, -z_{\alpha/2}) \cup (z_{\alpha/2}, \infty)$
- For T-test: $W = (-\infty, -t_{\alpha/2}(n-1)) \cup (t_{\alpha/2}(n-1), \infty)$
Decision Rule
Reject $H_0$ if the computed test statistic falls within the rejection region $W$.
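The Z-test procedure can be sketched with the standard library, where `statistics.NormalDist().inv_cdf` supplies the quantile $z_{\alpha/2}$. The function name `z_test` and the numbers ($\mu_0=10$, $\sigma=1$, $n=25$) are illustrative assumptions; the T-test would need a $t$ quantile, which the standard library does not provide.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided Z-test of H0: mu = mu0 with known sigma.

    Returns (test statistic, critical value, reject H0?).
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile
    return z, z_crit, abs(z) > z_crit

z1, crit, reject1 = z_test(sample_mean=10.5, mu0=10.0, sigma=1.0, n=25)
z2, _, reject2 = z_test(sample_mean=10.3, mu0=10.0, sigma=1.0, n=25)

print(z1, crit, reject1)
print(z2, reject2)
```

With $\alpha=5\%$ the critical value is about 1.96, so the first sample mean falls in the rejection region and the second does not.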
$\sigma^2$ Test (Variance Test)
Sample standard deviation formula:
$$ S=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2} $$
Hypothesis Formulation
$H_0: \sigma^2 = \sigma_0^2$
$H_1: \sigma^2 \ne \sigma_0^2$
Test Statistic Selection
Use $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2(n-1)$ under $H_0$ (Chi-square test)
Rejection Region Determination
$W = (0, \chi^2_{1-\alpha/2}(n-1)) \cup (\chi^2_{\alpha/2}(n-1), \infty)$
Decision Rule
Reject $H_0$ if the test statistic falls within the rejection region $W$.
Supplementary Notes
Properties of Distribution Functions
For different types of random variables:
- Continuous random variables: The distribution function is continuous.
- Discrete random variables: The continuity of the distribution function depends on its definition.
Left-Continuous Definition
$$ F(x) = P(X \lt x) $$
In this case:
- $F(x) = F(x^-) = F(x-0) = P(X \lt x)$
- $F(x^+) = F(x+0) = P(X \lt x) + P(X = x)$
When $P(X = x) \ne 0$, $F(x^+) \gt F(x) = F(x^-)$, making the distribution function left-continuous but not right-continuous.
Right-Continuous Definition
$$ F(x) = P(X \le x) $$
In this case:
- $F(x) = F(x^+) = F(x+0) = P(X \le x)$
- $F(x^-) = F(x-0) = P(X \le x) - P(X = x)$
When $P(X = x) \ne 0$, $F(x^+) = F(x) \gt F(x^-)$, making the distribution function right-continuous but not left-continuous.
Coin Toss Example
Consider a single coin toss:
- Heads (1): Probability 0.5
- Tails (0): Probability 0.5
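Random variable $X$ has the distribution:
$$ P(X=0)=0.5,P(X=1)=0.5 $$
Using the left-continuous definition $F(x) = P(X \lt x)$, the cumulative probabilities are:
$$ F(x)= \begin{cases} 0,x \le 0\\ 0.5,0 \lt x \le 1\\ 1,x \gt 1\\ \end{cases} $$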
Here:
- $F(0^-) = F(0) = 0$
- $F(0^+) = 0.5$
- At $x=0$, there is a discontinuity point where the function is left-continuous but not right-continuous.
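The two definitions can be compared numerically for this coin toss. This is a small sketch: the function names are made up, and a tiny offset `eps` stands in for the one-sided limits $F(0^-)$ and $F(0^+)$.

```python
pmf = {0: 0.5, 1: 0.5}   # tails = 0, heads = 1

def cdf_left(x):
    """Left-continuous definition: F(x) = P(X < x)."""
    return sum(p for value, p in pmf.items() if value < x)

def cdf_right(x):
    """Right-continuous definition: F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)

eps = 1e-9   # small offset approximating the one-sided limits

# Left-continuous: F(0-) = F(0) = 0, while F(0+) = 0.5
left_values = (cdf_left(0 - eps), cdf_left(0), cdf_left(0 + eps))
# Right-continuous: F(0-) = 0, while F(0) = F(0+) = 0.5
right_values = (cdf_right(0 - eps), cdf_right(0), cdf_right(0 + eps))

print(left_values, right_values)
```

The jump at $x=0$ has the same size, $P(X=0)=0.5$, under both definitions; only the value assigned at the jump point itself differs.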

When will I have a drink and discuss the details again?