
480 statistics: descriptive
Classifying distributions
A bell-shaped distribution has a single peak and
is approximately symmetrical about that peak.
A uniform distribution exhibits an equal
number of measures in each category, and J-shaped
and reverse J-shaped distributions exhibit increasing
and decreasing trends, respectively. If a distribution has a single
peak but is not symmetrical, then it is called either
positively or negatively skewed. If a distribution has
two distinct peaks, then it is called bimodal. (For
example, incidence of broken limbs relative to age is
bimodal, since accidents tend to occur more frequently
among children and the elderly than they do
among the rest of the population.)
2. Measures of Central Tendency: A measure of “central
tendency” is a single measurement that, in some
sense, is typical of the entire data set. It represents
the approximate “center” of the frequency distribution.
There are four measures commonly in use.
a. The mean or average, usually denoted by the
Greek letter μ, is found by summing together all
the data values and dividing by the total number
of measurements. It is equivalent to the
ARITHMETIC MEAN of the data values.
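For example, the mean of the four data values
4, 5, 8, 8 is: $\mu = \frac{4 + 5 + 8 + 8}{4} = 6.25$. If in another
study the value 6 occurs 37 times and the value 9
occurs 20 times, the mean is:
$\mu = \frac{37 \times 6 + 20 \times 9}{57} \approx 7.05$.
In general, if a data value $x_1$ appears $f_1$
times, the data value $x_2$ a total of $f_2$ times, and so
on down to the data value $x_n$ appearing a total of
$f_n$ times, then the mean is given by:
$$\mu = \frac{f_1 \times x_1 + f_2 \times x_2 + \cdots + f_n \times x_n}{f_1 + f_2 + \cdots + f_n}$$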
The differences of the data values from
the mean always sum to zero. For example, for the
first data set presented above, we have (4 – 6.25)
+ (5 – 6.25) + (8 – 6.25) + (8 – 6.25) = 0.
The mean is the most commonly used measure
of central tendency. (See also EXPECTED VALUE.)
b. The mode is the value in the data set that occurs
most often. For example, from the 10 data values
3, 6, 5, 3, 1, 6, 5, 3, 8, 3 the mode is 3, and for
4, 5, 8, 8 the mode is 8. A distribution might
have more than one mode if two or more values
tie for the greatest number of occurrences.
The mode is used when the most typical case
of a study is desired. For nonnumerical data, the
mode is the only measure of central tendency
available.
c. The median is the middle value of a sequence of
data values, once they are arranged in order from
smallest to largest. For example, the median of
the data set 3, 3, 5, 6, 7, 16, 16, 19, 37 is 7. If the
data set contains an even number of entries, then
the average of the middle two values is taken as
the median. For example, the median of 4, 5, 8, 8
is $\frac{5 + 8}{2} = 6.5$.
The median is useful for finding the value at
the center of the distribution. It divides the data
set into two equally sized groups.
d. The midrange of a data set is found by taking the
average of the smallest and largest data
values that occur. For example, the midrange of
the data set 4, 5, 8, 8 is $\frac{4 + 8}{2} = 6$.
The midrange provides a quick estimate of a
central value. It is easy to compute, but is strongly
affected by extremely low or high values in the
data set.
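The four measures described above can be checked numerically with a short sketch. It uses Python's standard `statistics` module (Python 3.8 or later is assumed, for `multimode`) and the same data values as the worked examples. The helper names `frequency_mean` and `midrange` are illustrative choices made here, not names taken from this entry.

```python
from statistics import mean, median, multimode

data = [4, 5, 8, 8]

# Mean: sum of the data values divided by the number of measurements.
mu = mean(data)

# The differences of the data values from the mean sum to zero.
deviation_sum = sum(x - mu for x in data)


def frequency_mean(freqs):
    """Mean of data given as {value: frequency} (hypothetical helper)."""
    weighted_total = sum(f * x for x, f in freqs.items())
    count = sum(freqs.values())
    return weighted_total / count


# The value 6 occurs 37 times and the value 9 occurs 20 times.
study_mean = frequency_mean({6: 37, 9: 20})

# Mode: multimode returns every value tied for the highest count.
modes = multimode([3, 6, 5, 3, 1, 6, 5, 3, 8, 3])

# Median: middle value, or the average of the middle two values.
med_odd = median([3, 3, 5, 6, 7, 16, 16, 19, 37])
med_even = median(data)


def midrange(values):
    """Average of the smallest and largest values (hypothetical helper)."""
    return (min(values) + max(values)) / 2


print(mu, deviation_sum, round(study_mean, 2))
print(modes, med_odd, med_even, midrange(data))
```

For the data set 4, 5, 8, 8 the mean (6.25), median (6.5), and midrange (6) all differ, which illustrates that the choice of measure matters even for small data sets.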
3. Measures of Dispersion: In order to interpret how
well a measure of central tendency is likely to represent
an entire group of data values, statisticians
must also compute a measure of dispersion. Data
values clustered around a central value can be well