7.1 The Normal Distribution - Textbook Content

7.1.1 Continuous Random Variables

In statistics, a continuous random variable is a random variable that can take any value in an interval of real numbers, so its possible values form an uncountable set. Unlike a discrete random variable, a continuous random variable takes any specific value with probability 0; probabilities are computed only for intervals of values.

For a continuous random variable X, the following important properties hold:

1. \( P(X = a) = 0 \) for any real number a

2. \( P(a \leq X \leq b) = P(a < X < b) \), because the probability of any single point is 0
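Both properties follow from writing interval probabilities as areas under a density. As a sketch, assuming X has a probability density function f (the normal case is given in the next subsection):

\[ P(a \leq X \leq b) = \int_a^b f(x)\,dx, \qquad P(X = a) = \int_a^a f(x)\,dx = 0 \]

Including or excluding the endpoints a and b therefore does not change the area.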

7.1.2 Definition and Characteristics of the Normal Distribution

The normal distribution is one of the most common continuous probability distributions; its probability density function (PDF) is a bell-shaped curve. It is completely described by two parameters, the mean μ and the variance σ², and is written X ~ N(μ, σ²).

The probability density function of the normal distribution is:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
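The formula can be evaluated directly. The following is a minimal Python sketch, not part of the original text: the names `normal_pdf` and `integrate` are illustrative, and the standard normal values μ = 0, σ = 1 are chosen only as an example. It evaluates the density at the mean and checks numerically that the area under the curve is close to 1.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x; sigma is the standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, a: float, b: float, n: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

mu, sigma = 0.0, 1.0  # illustrative parameters (standard normal)

# Height of the curve at its peak, x = mu.
peak = normal_pdf(mu, mu, sigma)

# Area over mu +/- 8 sigma; the tails beyond that range are negligible.
area = integrate(lambda x: normal_pdf(x, mu, sigma), mu - 8 * sigma, mu + 8 * sigma)

print(f"peak height: {peak:.6f}")
print(f"area under curve: {area:.6f}")
```

Note that the function takes the standard deviation σ, whereas the notation N(μ, σ²) lists the variance.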

The main characteristics of the normal distribution include:

  1. Symmetry: The curve is symmetric about the mean μ.
  2. Unimodality: The curve has a single peak, located at the mean μ.
  3. Asymptotic Behavior: The curve extends infinitely in both directions, approaching the x-axis but never touching it.
  4. Area Property: The total area under the curve is equal to 1.
  5. Inflection Points: The curve has inflection points at μ ± σ.
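
The peak and inflection points can be verified by differentiating the density. A short derivation, using only the PDF f(x) given earlier in this subsection:

\[ f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x), \qquad f''(x) = \left( \frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2} \right) f(x) \]

Since f(x) > 0 everywhere, f'(x) = 0 only at x = μ, which is the single peak, and f''(x) = 0 exactly when \( (x-\mu)^2 = \sigma^2 \), that is at x = μ ± σ, where f'' changes sign.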

7.1.3 Empirical Rule (68-95-99.7 Rule)

An important property of the normal distribution is the empirical rule, which describes how the data is concentrated around the mean:

• Approximately 68% of the data lies within ±1 standard deviation of the mean: \( P(\mu - \sigma \leq X \leq \mu + \sigma) \approx 0.68 \)

• Approximately 95% of the data lies within ±2 standard deviations of the mean: \( P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) \approx 0.95 \)

• Approximately 99.7% of the data lies within ±3 standard deviations of the mean: \( P(\mu - 3\sigma \leq X \leq \mu + 3\sigma) \approx 0.997 \)

Note

The empirical rule is an approximation. It is useful in practice for quickly estimating how data is spread around the mean.
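
The three percentages can be reproduced numerically. The sketch below is an illustration rather than part of the original text; it relies on the standard identity P(|Z| ≤ k) = erf(k/√2) for a standard normal Z, and the function name `prob_within` is hypothetical.

```python
import math

def prob_within(k: float) -> float:
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X.

    The result does not depend on mu or sigma, only on k.
    """
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

To four decimal places the values are 0.6827, 0.9545, and 0.9973, which round to the 68%, 95%, and 99.7% of the rule.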

7.1.4 Example Analysis
Example 7.1.1

The diameters of metal pins produced by a factory follow the normal distribution N(10, 0.04), with diameters measured in millimeters. Calculate the following probabilities:

  1. The probability that a pin's diameter is between 9.6 and 10.4 millimeters
  2. The probability that a pin's diameter is greater than 10.3 millimeters

Solution:

The mean is μ = 10 and the variance is σ² = 0.04, so the standard deviation is σ = √0.04 = 0.2.

1. 9.6 = 10 - 2(0.2) = μ - 2σ, 10.4 = 10 + 2(0.2) = μ + 2σ

According to the 68-95-99.7 rule, approximately 95% of the data lies within μ ± 2σ.

Therefore, P(9.6 ≤ X ≤ 10.4) ≈ 0.95
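
As a check on part 1, the same probability can be computed from the normal CDF. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are illustrative.

```python
from statistics import NormalDist

# N(10, 0.04): NormalDist takes the standard deviation, sqrt(0.04) = 0.2.
diameter = NormalDist(mu=10.0, sigma=0.2)

# P(9.6 <= X <= 10.4) as a difference of CDF values.
p_between = diameter.cdf(10.4) - diameter.cdf(9.6)

print(f"P(9.6 <= X <= 10.4) = {p_between:.4f}")
```

The result is about 0.9545, consistent with the empirical-rule estimate of 0.95.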


2. 10.3 = 10 + 1.5(0.2) = μ + 1.5σ

By the symmetry of the normal distribution, the probability outside each interval is split equally between the two tails:

• P(μ - σ ≤ X ≤ μ + σ) ≈ 0.68 → P(X ≤ μ + σ) ≈ 0.5 + 0.68/2 = 0.84

• P(μ - 2σ ≤ X ≤ μ + 2σ) ≈ 0.95 → P(X ≤ μ + 2σ) ≈ 0.5 + 0.95/2 = 0.975

1.5σ lies halfway between 1σ and 2σ, so linear interpolation gives:

P(X ≤ μ + 1.5σ) ≈ (0.84 + 0.975)/2 ≈ 0.91

P(X > 10.3) = P(X > μ + 1.5σ) ≈ 1 - 0.91 = 0.09

This is only a rough estimate. The standard normal table gives P(X > μ + 1.5σ) = 1 - Φ(1.5) ≈ 0.067, where Φ is the standard normal cumulative distribution function. Linear interpolation overestimates the tail because the cumulative distribution function is concave above the mean.
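
The tail probability in part 2 can be compared with the value from the exact normal CDF. The sketch below is illustrative and not part of the original solution; it uses `statistics.NormalDist` from the Python standard library and takes the empirical-rule values 0.84 and 0.975 as given.

```python
from statistics import NormalDist

# N(10, 0.04): NormalDist takes the standard deviation, sqrt(0.04) = 0.2.
diameter = NormalDist(mu=10.0, sigma=0.2)

# Exact upper-tail probability P(X > 10.3) from the normal CDF.
exact_tail = 1.0 - diameter.cdf(10.3)

# Empirical-rule estimate: interpolate the CDF linearly, halfway between
# P(X <= mu + sigma) ~ 0.84 and P(X <= mu + 2*sigma) ~ 0.975.
interpolated_cdf = 0.84 + 0.5 * (0.975 - 0.84)
interpolated_tail = 1.0 - interpolated_cdf

print(f"exact tail:        {exact_tail:.4f}")
print(f"interpolated tail: {interpolated_tail:.4f}")
```

The exact tail is about 0.0668, while a strict linear interpolation gives about 0.0925. The interpolated value is larger because the CDF is concave above the mean.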

Important Note

The example above relies on approximate values from the empirical rule. In practice, use a standard normal distribution table or a calculator to obtain more precise probabilities.

7.1.5 Importance of the Normal Distribution

The normal distribution occupies a central position in statistics, mainly for the following reasons:

  1. Wide Applicability: Many natural and social phenomena approximately follow the normal distribution, such as height, weight, and measurement errors.
  2. Central Limit Theorem: For large samples, the distribution of many statistics, such as the sample mean, approaches the normal distribution.
  3. Mathematical Convenience: The normal distribution has convenient mathematical properties that simplify theoretical analysis and calculation.
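
The Central Limit Theorem can be illustrated with a small simulation. The following is a sketch under stated assumptions, not a result from the text: it draws samples from a uniform distribution on [0, 1), which is far from bell-shaped, and the sample size 30, the trial count 5000, and the function name `sample_means` are arbitrary illustrative choices.

```python
import random
import statistics

def sample_means(n: int, trials: int, rng: random.Random) -> list:
    """Return `trials` sample means, each from n uniform(0, 1) draws."""
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

rng = random.Random(0)  # fixed seed so the run is repeatable
means = sample_means(n=30, trials=5000, rng=rng)

center = statistics.fmean(means)  # theory: 0.5
spread = statistics.stdev(means)  # theory: sqrt(1/12) / sqrt(30)

# Fraction of sample means within one standard deviation of the center;
# for a normal distribution this should be close to 0.68.
within_1sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"center of sample means: {center:.4f}")
print(f"spread of sample means: {spread:.4f}")
print(f"fraction within 1 sd:   {within_1sd:.3f}")
```

If the sample means are approximately normal, the last figure should land near the 68% predicted by the empirical rule, even though the individual draws are uniform.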