
3.1 Histograms

Histograms - a visual representation of grouped continuous data

I. Key Points

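1. When to use a histogram

A histogram displays grouped continuous data and shows at a glance the location, shape and spread of the distribution.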

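2. Area and frequency

The area of each bar is proportional to the frequency of its class, so a histogram can represent grouped data with unequal class widths.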

3. Calculating frequency density

\(\text{frequency density} = \frac{\text{frequency}}{\text{class width}}\). The vertical axis of a histogram shows frequency density.

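4. Frequency polygon

A frequency polygon joins the midpoints of the tops of the bars with straight line segments. It shows the overall shape of the distribution.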

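5. Estimating the frequency in an interval

To estimate the frequency in an interval, calculate frequency density × width of the interval, which is the area of the bar (or part of a bar) over that interval.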
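
As a quick check on why the area gives the frequency, here is a short derivation. The symbols \( f \) (frequency), \( w \) (class width) and \( h \) (bar height) are labels introduced here for illustration only.

\[ h = \frac{f}{w} \quad\Longrightarrow\quad \text{area} = h \times w = \frac{f}{w} \times w = f \]

So when the bar height is exactly the frequency density, the area of a bar equals the frequency of its class. If the vertical axis is scaled differently, the area is still a constant multiple of the frequency.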

II. Worked Examples

Example 1

200 students were asked how long it took them to complete homework, with data summarised below:

Time, \( t \) (minutes) | \( 25 \leq t < 30 \) | \( 30 \leq t < 35 \) | \( 35 \leq t < 40 \) | \( 40 \leq t < 50 \) | \( 50 \leq t < 80 \)
Frequency | 55 | 39 | 68 | 32 | 6

a) Draw a histogram and a frequency polygon to represent the data.

b) Estimate how many students took between 36 and 45 minutes to complete their homework.

Solution

(a) Drawing the histogram and frequency polygon

First calculate the class width and frequency density of each class:

  • \( 25 \leq t < 30 \): class width \( 30-25=5 \), frequency density \( \frac{55}{5}=11 \)
  • \( 30 \leq t < 35 \): class width \( 35-30=5 \), frequency density \( \frac{39}{5}=7.8 \)
  • \( 35 \leq t < 40 \): class width \( 40-35=5 \), frequency density \( \frac{68}{5}=13.6 \)
  • \( 40 \leq t < 50 \): class width \( 50-40=10 \), frequency density \( \frac{32}{10}=3.2 \)
  • \( 50 \leq t < 80 \): class width \( 80-50=30 \), frequency density \( \frac{6}{30}=0.2 \)

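Draw the histogram with "Time, \( t \) (min)" on the horizontal axis and "Frequency density" on the vertical axis. Then join the midpoints of the tops of the bars to obtain the frequency polygon.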
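
The calculation in part (a) can be sketched in Python as a check on the arithmetic. This is a minimal sketch, not part of the original solution: the function name `histogram_table` and the tuple layout `(lower, upper, frequency)` are assumptions made for illustration. It returns the class width, frequency density and class midpoint for each class; each pair (midpoint, frequency density) is a vertex of the frequency polygon.

```python
# Class boundaries and frequencies from Example 1: (lower, upper, frequency).
classes = [(25, 30, 55), (30, 35, 39), (35, 40, 68), (40, 50, 32), (50, 80, 6)]


def histogram_table(classes):
    """Return (class width, frequency density, midpoint) for each class.

    Hypothetical helper for illustration only.
    """
    rows = []
    for lower, upper, frequency in classes:
        width = upper - lower              # class width
        density = frequency / width        # bar height
        midpoint = (lower + upper) / 2     # x-coordinate of the polygon vertex
        rows.append((width, density, midpoint))
    return rows


for (lower, upper, _), (width, density, midpoint) in zip(classes, histogram_table(classes)):
    print(f"{lower} <= t < {upper}: width {width}, "
          f"density {density:.1f}, polygon vertex ({midpoint}, {density:.1f})")
```

Plotting is left out on purpose: the sketch only produces the numbers needed to draw the bars and the polygon by hand or with any plotting tool.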

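(b) Estimating the frequency between 36 and 45 minutes

Split the interval into two parts, one in each class, and calculate the area (frequency) of each:

  • First part: \( 36 \leq t < 40 \), width \( 40-36=4 \), frequency density \( 13.6 \), area \( 13.6 \times 4=54.4 \)
  • Second part: \( 40 \leq t < 45 \), width \( 45-40=5 \), frequency density \( 3.2 \), area \( 3.2 \times 5=16 \)
  • Total frequency: \( 54.4 + 16 = 70.4 \), so about 70 students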
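
The same estimate can be reproduced for any interval with a small Python sketch. The function name `estimate_frequency` and the data layout are assumptions made for illustration. Like the hand calculation, it assumes the values are spread evenly within each class, so the part of a bar inside the interval contributes frequency density × overlapping width.

```python
# Class boundaries and frequencies from Example 1: (lower, upper, frequency).
classes = [(25, 30, 55), (30, 35, 39), (35, 40, 68), (40, 50, 32), (50, 80, 6)]


def estimate_frequency(classes, start, end):
    """Estimate the frequency between start and end.

    Hypothetical helper: assumes values are evenly spread within each class.
    """
    total = 0.0
    for lower, upper, frequency in classes:
        # Width of the part of this class that lies inside [start, end).
        overlap = min(end, upper) - max(start, lower)
        if overlap > 0:
            density = frequency / (upper - lower)
            total += density * overlap
    return total


estimate = estimate_frequency(classes, 36, 45)
print(round(estimate, 1))  # 70.4
```

Applied to the whole range from 25 to 80 minutes, the function gives back the total of 200 students (up to floating-point rounding), which is a useful sanity check.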

Example 2

A histogram displays data from 100 people on how long they took to complete a word puzzle (in minutes).

a) Why should a histogram be used to represent these data?

b) Write down the underlying feature associated with each bar in a histogram.

c) Given 5 people completed the puzzle between 2 and 3 minutes, find the number of people who completed it between 0 and 2 minutes.

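Solution

a) Time is continuous data, and a histogram is suited to showing the distribution of continuous data.

b) The area of each bar in a histogram is proportional to the frequency of its class.

c) The 2-3 minute interval corresponds to 25 small squares and represents 5 people, so 1 small square represents \( \frac{5}{25}=0.2 \) people. The 0-2 minute interval corresponds to 20 small squares, so its frequency is \( 20 \times 0.2=4 \) people.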
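
The square-counting argument in part c) is a direct proportion, which the following Python sketch makes explicit. The function name `frequency_from_squares` is an assumption made for illustration; the square counts (25 and 20) are the ones quoted in the solution.

```python
def frequency_from_squares(known_squares, known_frequency, target_squares):
    """Scale a known frequency by the ratio of areas, measured in small squares.

    Hypothetical helper: relies on area being proportional to frequency.
    """
    if known_squares <= 0:
        raise ValueError("known_squares must be positive")
    return known_frequency * target_squares / known_squares


# 25 squares represent 5 people; how many people do 20 squares represent?
print(frequency_from_squares(25, 5, 20))  # 4.0
```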