Example 2.3:
Marks (x_i)                 5   8   9  10  11  12  13  14  15  16  17   Total N
Frequencies (n_i)           2   1   2   2   2   1   1   2   2   1   1   18
Relative frequency (f_i)
Cumulative frequency
B. Grouped frequency distribution
When the number of distinct data values in a set of raw data is large (more than 20), a
simple frequency distribution is not appropriate, since it presents too much
information to be easily assimilated.
In this case, a grouped frequency distribution is used. A grouped frequency distribution
organizes data items into groups or classes of values, each showing how many items have
values included within the group, known as the class frequency. The number of classes is
usually between 5 and 15.
Definitions associated with frequency distribution classes
a) Class limits : are the lower and upper values of the classes;
b) The lower class limit represents the smallest data value that can be
included in the class;
c) The upper class limit represents the largest data value that can be
included in the class;
d) Class boundaries: are the lower and upper values of a class that mark
common points between adjacent classes. They are used when the classes
are given as closed (inclusive) intervals.
e) Class width (or length): is the difference between the lower and upper
class boundaries. If all class intervals of a frequency distribution have
equal widths, this common width is denoted by C; in such a case C is equal
to the difference between two successive lower class limits or two
successive upper class limits. Class width = upper boundary - lower
boundary;
f) Class mark or class mid-point: the class midpoint $X_m$ is obtained by
adding the lower and upper class limits and dividing by 2, or adding the
lower and upper boundaries and dividing by 2:
$X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2}$ or $X_m = \frac{\text{lower limit} + \text{upper limit}}{2}$
Yule's rule: $k = 2.5\,\sqrt[4]{N}$
or
Herbert Sturges proposed: $k = 1 + \log_2 N$
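As a rough illustration, the two rules for the number of classes k can be evaluated as below. This is only a sketch: the function names and the sample size of 100 are assumptions made for the example, and in practice k is rounded to a convenient whole number.

```python
import math


def yule_classes(n_obs):
    """Yule's rule: k = 2.5 * N^(1/4)."""
    return 2.5 * n_obs ** 0.25


def sturges_classes(n_obs):
    """Sturges' rule: k = 1 + log2(N)."""
    return 1 + math.log2(n_obs)


n_obs = 100  # hypothetical number of observations
print(round(yule_classes(n_obs), 2), round(sturges_classes(n_obs), 2))
```

Both rules suggest about 8 classes for 100 observations, which falls inside the usual range of 5 to 15 classes.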
4) The first class boundary of the frequency distribution equals the lowest value of the
series minus $\frac{h}{2}$.
The last class boundary of the frequency distribution equals the first class boundary
plus $hk$.
The completed frequency distribution is:
Class limits   Frequency   Cumulative frequency   Relative frequency   Percentage
Total
EXCLUSIVE AND INCLUSIVE CLASS-INTERVALS
Class intervals of the type $\{x : a \le x < b\} = [a, b)$ are called exclusive (open), since they
exclude the upper limit of the class. The following data are classified on this basis.
Income           50-100   100-150   150-200   200-250   250-300
No. of persons   88       70        52        30        23
In this method, the upper limit of one class is the lower limit of the next class.
Class intervals of the type $\{x : a \le x \le b\} = [a, b]$ are called inclusive, since they include
the upper limit of the class. The following data are classified on this basis.
Income           50-99    100-149   150-199   200-249   250-299
No. of persons   60       38        22        16        7
However, to ensure continuity and to get correct class limits, the exclusive method of
classification should be adopted. To convert inclusive class intervals into exclusive ones, we
have to make an adjustment.
Adjustment: find the difference between the lower limit of the second class and the upper
limit of the first class. Divide it by 2, subtract the value so obtained from all the lower
limits and add it to all the upper limits. In the above example, the adjustment factor is
$\frac{100 - 99}{2} = 0.5$, so the classes become 49.5-99.5, 99.5-149.5, and so on.
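The adjustment can be sketched in a few lines of Python. The function name `to_exclusive` and the representation of each class as a `(lower, upper)` pair are assumptions made for this illustration only.

```python
def to_exclusive(classes):
    """Convert inclusive (lower, upper) class limits into class boundaries."""
    # Adjustment factor: half the gap between the first two classes.
    adjustment = (classes[1][0] - classes[0][1]) / 2
    return [(lower - adjustment, upper + adjustment) for lower, upper in classes]


inclusive = [(50, 99), (100, 149), (150, 199), (200, 249), (250, 299)]
for lower, upper in to_exclusive(inclusive):
    print(lower, "-", upper)
```

After the adjustment, the upper boundary of each class coincides with the lower boundary of the next one.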
$\mu = \frac{x_1 + x_2 + \dots + x_N}{N} = \frac{\sum x}{N}$ (population mean)
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$ (sample mean)
Where:
The symbol $\sum$ (Greek capital letter "sigma") stands for summation: it means
"the total of";
x represents any particular value of an observation;
$\sum x$ is the sum of all values in the sample or population;
N represents the total number of observations in the population;
n refers to the number of observations in the sample.
Assume that the data are obtained from samples unless otherwise specified.
Example 1: Find the arithmetic mean (the average) of the numbers 8, 3, 5, 12, and 10.
Solution: in this data set, $x_1 = 8$, $x_2 = 3$, $x_3 = 5$, $x_4 = 12$, $x_5 = 10$ and n = 5.
Then
$\bar{x} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$
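The same calculation can be checked with a minimal Python sketch; the function name `arithmetic_mean` is chosen for this illustration.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)


print(arithmetic_mean([8, 3, 5, 12, 10]))  # 38 / 5
```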
3. The mean of a simple (discrete) frequency distribution
The mean for a simple frequency distribution is calculated using the following
formula:
$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_k x_k}{f_1 + f_2 + f_3 + \dots + f_k} = \frac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} = \frac{\sum fx}{n}$
Where
x represents the values
f represents the frequencies
$\sum f$ is the total frequency or the total number of observations (n)
$\sum fx$ refers to the sum of each value x times its frequency f
Example: calculate the arithmetic mean of the marks of 46 students given in the
following table.
Table 3.1 Frequency of marks of 46 students
Marks (x)   Frequency (f)   fx
9           1               9
10          2               20
11          3               33
12          6               72
13          10              130
14          11              154
15          7               105
16          3               48
17          2               34
18          1               18
Total       46              623
The total of all these values ($\sum fx$) = 623
Total number of observations (n) = 46
Therefore, the arithmetic mean of the marks of 46 students is
$\bar{x} = \frac{\sum fx}{n} = \frac{623}{46} = 13.54$
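A short sketch of the same frequency-weighted calculation, using the marks and frequencies of Table 3.1; the variable and function names are illustrative only.

```python
def frequency_mean(values, freqs):
    """Mean of a simple frequency distribution: sum(f*x) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)


marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
freqs = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]
print(round(frequency_mean(marks, freqs), 2))
```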
For a grouped frequency distribution, $\mu$ and $\bar{x}$ are calculated by
$\mu = \frac{\sum fx}{N}$ and $\bar{x} = \frac{\sum fx}{n}$
Where f is the frequency, x the mid-point of the class interval and n the total number of
observations.
Procedure for finding the mean of a grouped frequency distribution
1. Make a table as shown.
Class interval   Frequency (f)   Midpoint (x) of class interval   f.x
2. Find the midpoint of each class.
3. Multiply the frequency by the midpoint for each class.
4. Find the sum of the products of the frequency f of each class and the class midpoint x.
5. Divide the sum obtained by the sum of the frequencies.
Example 1: Calculate the arithmetic mean of the following data:
Table 3.2 shows profit per shop
Profit    No. of shops (f)   Mid-point of class interval (x)   f.x
0-10      12                 5                                 60
10-20     18                 15                                270
20-30     27                 25                                675
30-40     20                 35                                700
40-50     17                 45                                765
50-60     6                  55                                330
Total     100                                                  2800
The mean profit is:
$\bar{x} = \frac{\sum fx}{n} = \frac{2800}{100} = 28$
Example 2: The following data relate to the number of successful sales made by the
salesmen in a particular quarter.
Number of sales:      0-4   5-9   10-14   15-19   20-24   25-29
Number of salesmen:   1     14    23      21      15      6
Calculate the mean number of sales.
Answer:
Number of sales    Number of      Class midpoint (x)   (fx)
(class interval)   salesmen (f)
0 to 4             1              2                    2
5 to 9             14             7                    98
10 to 14           23             12                   276
15 to 19           21             17                   357
20 to 24           15             22                   330
25 to 29           6              27                   162
Totals             80                                  1225
$\sum fx = 1225$, $\sum f = 80$
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1225}{80} = 15.3$
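The grouped-mean procedure (midpoints, products f.x, division by the total frequency) might be sketched as follows for the sales data. The names `grouped_mean` and `sales_classes` are assumptions of this example.

```python
def grouped_mean(classes, freqs):
    """Mean of a grouped distribution, using class midpoints as x."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fx = sum(f * x for x, f in zip(midpoints, freqs))
    return total_fx / sum(freqs)


sales_classes = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
salesmen = [1, 14, 23, 21, 15, 6]
print(round(grouped_mean(sales_classes, salesmen), 1))
```

Because every observation in a class is replaced by the class midpoint, the result is an estimate of the mean of the raw data rather than its exact value.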
B. THE MEDIAN
The median is generally considered as an alternative average to the mean.
The median is the value of the variable that divides the distribution so that exactly half
of the distribution has the same or larger values and exactly half has the same or lower
values.
1. The median for ungrouped data
The median of a set of data is the middle value that separates the higher half from the
lower half of the data set after they have been ordered from the smallest to the largest, or
the largest to the smallest.
Procedure for obtaining the median of a set of data:
1. Order the given data from the smallest to the largest or the largest to the
smallest;
2. Select the middle point.
Example: Find the median of the following five observations:
$x_1 = 10$, $x_2 = 15$, $x_3 = 6$, $x_4 = 12$ and $x_5 = 11$
Solution: We must:
1. Order the given numbers from the smallest to the largest: 6, 10, 11, 12, 15
2. Select the middle point: the middle value is 11.
Therefore the Median (MD) = 11
Note 1. When a set of data contains an even number of items, there is no unique middle or
central value. The convention in this situation is to use the mean of the middle two items
to give a median.
Example 1: Find the median of the following six observations:
$x_1 = 10$, $x_2 = 15$, $x_3 = 6$, $x_4 = 12$, $x_5 = 11$ and $x_6 = 17$
Solution: As before, arrange all the values of the observations in numerical order:
6, 10, 11, 12, 15, 17
Evidently there is no single middle value. However, two numbers lie in the middle: 11 and 12.
The two must be added together and divided by 2, thus obtaining their average:
$MD = \frac{11 + 12}{2} = 11.5$
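Both cases (odd and even number of items) can be covered by one small function. This is a sketch written for these notes; Python's standard `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([10, 15, 6, 12, 11]))      # five observations
print(median([10, 15, 6, 12, 11, 17]))  # six observations
```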
Example 2: calculation of the median for the data given in table 3.1
Solution:
Arranging all the 24 values in ascending order of magnitude, we get the following data:
2.90 3.57 3.73
2.98 3.61 3.75
3.30 3.62 3.76
3.43 3.66 3.76
3.43 3.68 3.77
3.45 3.71 3.84
3.55 3.72 3.88
The 12th value is 3.66 and the 13th is 3.68; the median is the average of these two.
Median = $\frac{3.66 + 3.68}{2} = 3.67$ g%
Note 2. For a set with an odd number (n) of items, the median can be precisely identified
as the value of the $\frac{n + 1}{2}$th item. Thus in a size-ordered set of 15 items, the median
would be the $\frac{15 + 1}{2}$th = 8th item along.
2. Median for a simple frequency distribution
Where there is a large number of discrete items in a data set, but the range of values is
limited, a simple frequency distribution will probably have been compiled.
The median for a simple frequency distribution is the value of the item in position:
MD = $\frac{\sum f + 1}{2}$
where $\sum f$ is the total frequency. The procedure is:
1. Calculate $\frac{\sum f + 1}{2}$;
2. Form an F (cumulative frequency) column;
3. Find the F value that first equals or exceeds $\frac{\sum f + 1}{2}$;
4. The median is the x value corresponding to the F value identified in 3.
Example: calculate the median for the following distribution of delivery times of orders
sent out from a firm.
Delivery time (days)   0   1   2    3    4    5    6    7   8   9   10   11
Number of orders       4   8   11   12   21   15   10   4   2   2   1    1
Answer
STEP 1 The median is the $\frac{N + 1}{2}$th = $\frac{91 + 1}{2}$th = 46th item
STEP 2 The F column is shown in the following table:
Delivery time (days)   Number of orders   Cumulative number of orders
(x)                    (f)                (F)
0                      4                  4
1                      8                  12
2                      11                 23
3                      12                 35
4                      21                 56
5                      15                 71
6                      10                 81
7                      4                  85
8                      2                  87
9                      2                  89
10                     1                  90
11                     1                  91
STEP 3 The first F value to exceed 46 is F = 56
STEP 4 The median is thus 4 (days)
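The four steps can be sketched as a running cumulative sum. The function name is hypothetical, and the comparison uses "reaches or exceeds" so that the case where F equals the median position exactly is also handled.

```python
def median_from_frequencies(values, freqs):
    """Median of a simple frequency distribution (values in ascending order)."""
    position = (sum(freqs) + 1) / 2  # position of the median item
    cumulative = 0
    for x, f in zip(values, freqs):
        cumulative += f              # running F column
        if cumulative >= position:   # first F that reaches the position
            return x
    raise ValueError("empty distribution")


days = list(range(12))
orders = [4, 8, 11, 12, 21, 15, 10, 4, 2, 2, 1, 1]
print(median_from_frequencies(days, orders))
```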
3. Median for a grouped frequency distribution
There are two methods commonly employed for estimating the median for a grouped
frequency distribution:
a) using an interpolation formula;
b) by graphical interpolation.
a) Estimating the median by formula
Given a grouped frequency distribution, the best that can be done is to identify the
class or group that contains the median item. From there, the median is estimated by
interpolation, using cumulative frequencies and the fact that the median must lie
exactly halfway along the distribution.
The formula for calculating the median for a grouped distribution is:
Median = $L + \left(\frac{\frac{N}{2} - F}{f}\right) c$
Where
L = lower bound (limit) of the median class (the class that contains the middle
item of the distribution)
F = sum of the frequencies of all classes lower than the median class
f = frequency of the median class
c = median class width
Procedure for estimating the median by formula
The procedure for estimating the median (by formula) for a grouped frequency
distribution is:
1. Form a cumulative frequency (F) column;
2. Find the value of $\frac{N}{2}$ (where $N = \sum f$);
3. Find the F value that first exceeds $\frac{N}{2}$, which identifies the median class M;
4. Calculate the median using the following interpolation formula:
Median = $L + \left(\frac{\frac{N}{2} - F}{f}\right) c$
Example: Estimate the median for the following data, which represent the ages of a set
of 130 representatives who took part in a statistical survey.
Age in years       Number of representatives
20 and under 25    2
25 and under 30    14
30 and under 35    29
35 and under 40    43
40 and under 45    33
45 and under 50    9
Answer
1.
Age (years)        Number of representatives (f)   (F)
20 and under 25    2                               2
25 and under 30    14                              16
30 and under 35    29                              45
35 and under 40    43                              88
40 and under 45    33                              121
45 and under 50    9                               130
2. $\frac{N}{2} = \frac{130}{2} = 65$
3. The median class is the class that has the first F greater than 65. Here, it is 35 to 40.
4. The median can now be estimated using the interpolation formula, with L = 35, F = 45,
f = 43 and c = 5.
Thus, median = $L + \left(\frac{\frac{N}{2} - F}{f}\right) c = 35 + \left(\frac{65 - 45}{43}\right) \times 5 = 37.33$
Median = 37.33 years
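A minimal sketch of the interpolation formula, assuming equal class widths and a list of lower class bounds in ascending order (the function and argument names are chosen for this illustration):

```python
def grouped_median(lower_bounds, width, freqs):
    """Estimate the median of a grouped distribution by interpolation."""
    half = sum(freqs) / 2          # N / 2
    cumulative = 0                 # F: cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative + f > half:  # first F exceeding N / 2: the median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")


age_bounds = [20, 25, 30, 35, 40, 45]
representatives = [2, 14, 29, 43, 33, 9]
print(round(grouped_median(age_bounds, 5, representatives), 2))
```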
b) Estimating the median graphically
A percentage cumulative frequency curve (or ogive) is drawn; the value of the
variable that corresponds to the 50% point is read off and gives the median estimate.
Procedure for estimating the median graphically
1. Form a cumulative (percentage) frequency distribution.
2. Draw the cumulative frequency curve by plotting class upper bounds against
cumulative percentage frequency and joining the points with a smooth curve.
3. Read off the 50% point to give the median.
Properties of the median
1. The median is particularly useful where:
a) a set or distribution has extreme values present and
b) values at the end of a set or distribution are not known. This means that
the median can be used for open-ended distributions.
2. The median can be determined for all levels of data except nominal.
3. The median is unique; there is only one median for a set of data.
The advantages of the median
The advantages of the median are:
- it is not affected by extremely large or small values;
- it is easily understood (i.e. half the data are smaller than the
median and half are greater);
- it can be calculated even when the last class is open ended and
when the data are qualitative rather than quantitative.
The disadvantages of the median
- It does not use much of the information available;
- It requires that observations be arranged into an array, which is time
consuming for a large body of ungrouped data.
C. THE MODE
1. Definition
The mode is the value of the observation that appears most frequently or, equivalently,
has the largest frequency. The mode is especially useful in describing nominal and ordinal
levels of measurement.
It is possible for data not to have any mode at all, as in a case where all observations occur
with equal frequency.
Example:
The mode of the set 2, 1, 3, 3, 1, 1, 2, 4 is 1, since this value occurs most often.
For the data in table 3.1, the mode is 3.76, since this observation occurs most often.
The mode of the following simple discrete frequency distribution:
X   4   5   6    7    8   9   10
f   2   5   21   18   9   2   1
is 6, since this value has the largest frequency.
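A small sketch of finding the mode(s) by counting. The function name `modes` and the convention of returning an empty list when every distinct value occurs equally often are choices made for this illustration.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); empty list if all values tie."""
    counts = Counter(values)
    top = max(counts.values())
    # No mode when several distinct values all occur with the same frequency.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 1, 3, 3, 1, 1, 2, 4]))
print(modes([250, 400, 650, 1250]))
```

Returning a list rather than a single value makes it possible to report several modes when more than one value shares the largest frequency.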
2. The mode for grouped data
For a grouped frequency distribution, the mode cannot be determined exactly and so must
be estimated. The technique used is one of interpolation. There are two methods that can
be used to estimate the mode:
- using an interpolation formula;
- graphically, using a histogram.
Mode of a grouped frequency distribution by formula
An estimate of the mode for a grouped frequency distribution can be obtained using the
following procedure:
1. Determine the modal class (the class which has the largest frequency).
2. Calculate $D_1$ = difference between the largest frequency and the frequency
immediately preceding it.
3. Calculate $D_2$ = difference between the largest frequency and the frequency
immediately following it.
4. Use the following interpolation formula:
Interpolation formula for the mode
Mode = $L + \left(\frac{D_1}{D_1 + D_2}\right) C$
Where: L = lower bound of modal class
C = modal class width
And $D_1$, $D_2$ are as described above in 2 and 3.
Example 1: Estimate the mode of the following distribution of ages.
Age (years)           20-25   25-30   30-35   35-40   40-45   45-50
Number of employees   2       14      29      43      33      9
Answer:
Age (years)        Number of employees
20 and under 25    2
25 and under 30    14
30 and under 35    29
35 and under 40    43
40 and under 45    33
45 and under 50    9
$D_1$ = 43 - 29 = 14
$D_2$ = 43 - 33 = 10
The lower class bound of the modal class, L = 35
The class width of the modal class, C = 5 (from 35 to 40)
Thus: mode = $L + \left(\frac{D_1}{D_1 + D_2}\right) C = 35 + \left(\frac{14}{14 + 10}\right) \times 5$
mode = 37.92 years
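The procedure can be sketched as below. It assumes equal class widths; treating a missing neighbouring class as having frequency 0 (when the modal class is the first or last one) is an assumption of this sketch, not something stated in the notes.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Estimate the mode of a grouped distribution by interpolation."""
    m = freqs.index(max(freqs))                           # modal class index
    before = freqs[m - 1] if m > 0 else 0                 # assumed 0 if absent
    after = freqs[m + 1] if m < len(freqs) - 1 else 0     # assumed 0 if absent
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_bounds[m] + d1 / (d1 + d2) * width


age_lower_bounds = [20, 25, 30, 35, 40, 45]
employees = [2, 14, 29, 43, 33, 9]
print(round(grouped_mode(age_lower_bounds, 5, employees), 2))
```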
Graphical estimation of the mode
The graphical equivalent of the above interpolation formula is to construct three
histogram bars, representing the class with the highest frequency and the ones on either
side of it, and to draw two lines: one from the top left corner of the modal bar to the top
left corner of the following bar, and one from the top right corner of the modal bar to the
top right corner of the preceding bar. The mode estimate is the x value corresponding to
the intersection of the lines.
Example 2: Estimation of the mode of a frequency distribution using the graphical
method.
Using the data of example 1:
Age (years)        Number of employees
30 and under 35    29
35 and under 40    43
40 and under 45    33
Draw the graph.
The advantages of the mode
- The mode has the advantage of not being affected by extremely
high or low values;
- It is easily understood (it is simply the value that occurs most
often), not difficult to calculate and can be used
when the last class of a distribution is open ended;
- The mode can be used for all levels of data: nominal, ordinal, interval
and ratio.
The disadvantages of the mode
The disadvantages of the mode are:
- The mode does not use much of the information available;
- For many sets of data, there is no mode because no value
appears more than once. For example, there is no mode for this
set of price data: RWF 250, RWF 400, RWF 650 and RWF
1250;
- The mode is not always unique. Example: suppose the ages of
the individuals in a scout club are 14, 16, 17, 18, 18, 20, 20, 22,
24, 24 and 25. The ages 18, 20 and 24 are all modes.
In general, the mean is the most frequently used measure of central tendency and the
mode is the least used.
D. THE MIDRANGE
The midrange is defined as the sum of the lowest and highest values in the data set,
divided by 2. The symbol MR is used for the midrange.
$MR = \frac{\text{lowest value} + \text{highest value}}{2}$
Example: Find the midrange of these numbers: 2, 3, 6, 8, 4, and 1.
$MR = \frac{1 + 8}{2} = \frac{9}{2} = 4.5$
Then, the midrange is 4.5.
The Relationship between the Arithmetic Mean, the Median and the Mode
In a symmetrical frequency distribution the mode, median and mean are located
at the centre and are equal; Fig. (a) illustrates this for a normal distribution.
In this case any one of these measures may be used.
[Fig. (a): symmetrical distribution, with mean, median and mode at the same point]
If the distribution of the variable is not symmetrical, we have a skewed distribution:
the arithmetic mean is not so typical of the distribution. In a positively skewed
distribution, the mean is not at the centre. The mean is dragged to the right of
centre by a few extremely high values of the variable that have been observed.
The median is generally the next largest measure in a positively skewed
frequency distribution. The mode is the smallest of the three measures. If the
distribution is highly skewed, the mean would not be a good measure to use. The
median and mode would be more representative.
[Figure: positively skewed distribution; from left to right: mode, median, mean]
In a negatively skewed distribution the mean is reduced by a few extremely low
values of the variable and hence will be left of centre. The median is greater than
the arithmetic mean, and the modal value is the largest of the three measures.
Again, if the distribution is highly skewed, the mean should not be used to
represent the data.
In a moderately skewed distribution the following relationships hold approximately:
1. Mean - Mode = 3 (Mean - Median);
2. Median - Mode = 2 (Mean - Median);
3. Median = $\frac{2\,\text{Mean} + \text{Mode}}{3}$;
4. Mode = 3 Median - 2 Mean;
5. Mean = $\frac{3\,\text{Median} - \text{Mode}}{2}$
THE GEOMETRIC MEAN G
The geometric mean is useful in finding the average of percentages, ratios, indexes, or
growth rates. It has a wide application in business and economics because we are often
interested in finding the percentage changes in sales, salaries, or economic figures, such
as the Gross Domestic Product, which compound or build on each other.
The geometric mean G of a set of n positive numbers $x_1, x_2, x_3, \dots, x_n$ is calculated using
the formula: Geometric mean = $\sqrt[n]{x_1 x_2 x_3 \cdots x_n}$
Where n is the number of observations made of the variable x and $x_1, x_2, x_3, \dots, x_n$ are the
values of these observations.
Example: the geometric mean of the numbers 3, 25 and 45 is:
G = $\sqrt[3]{3 \times 25 \times 45} = \sqrt[3]{3375} = 15$
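The example can be checked with a short sketch (the function name is illustrative; `math.prod` requires Python 3.8 or later). Floating-point arithmetic may give a value a hair away from 15, hence the rounding.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))


print(round(geometric_mean([3, 25, 45]), 6))
```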
THE HARMONIC MEAN H
The harmonic mean is another specialized measure of location used only in particular
circumstances; namely when the data consists of a set of rates, such as prices, speeds or
productivity.
The harmonic mean H of a set of n numbers $x_1, x_2, x_3, \dots, x_n$ is the reciprocal of the
arithmetic mean of the reciprocals of the numbers:
H = $\frac{1}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}} = \frac{n}{\sum_{i=1}^{n}\frac{1}{x_i}}$
Where n is the number of observations.
Example: the harmonic mean of the numbers 2, 4, and 8 is:
H = $\frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{\frac{7}{8}} = 3.43$
The relation between the arithmetic, geometric, and harmonic means.
The geometric mean of a set of positive numbers $x_1, x_2, x_3, \dots, x_n$ is less than or equal to
their arithmetic mean but is greater than or equal to their harmonic mean. In symbols:
$H \le G \le \bar{X}$
The equality signs hold only if all the numbers $x_1, x_2, x_3, \dots, x_n$ are identical.
Example: The set 2, 4, 8 has arithmetic mean 4.67, geometric mean 4, and harmonic
mean 3.43.
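The ordering of the three means for the set 2, 4, 8 can be verified numerically. The helper names below are written for this sketch only.

```python
import math


def arithmetic_mean(values):
    return sum(values) / len(values)


def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    return len(values) / sum(1 / v for v in values)


data = [2, 4, 8]
h, g, a = harmonic_mean(data), geometric_mean(data), arithmetic_mean(data)
print(round(h, 2), round(g, 2), round(a, 2))
print(h <= g <= a)
```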
3.2 MEASURES OF DISPERSION
Dispersion refers to the variability or spread in the data. A small value for a measure of
dispersion indicates that the data are clustered closely, say, around the arithmetic mean. A
large value indicates that the data are widely spread, so the mean is less representative
of them.
The most important measures of dispersion are:
1. Range is the difference between the largest and the smallest values in
a data set. The range is the simplest of these measures. The symbol R is
used for the range.
R = Largest value - smallest value
Example: Find the range of the following distribution.
35, 45, 30, 35, 40, 25
R = 45 - 25 = 20
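2. Mean Deviation (MD) is the arithmetic mean of the deviations of the
observations from the arithmetic mean, ignoring the sign of these
deviations.
a) The formula for the mean deviation for ungrouped data is
MD = $\frac{\sum |X - \mu|}{N}$ for populations
MD = $\frac{\sum |X - \bar{X}|}{n}$ for samples
Where:
X is the value of each observation;
$\bar{X}$ (or $\mu$) is the arithmetic mean of the values;
N (or n) is the number of observations.
Example: for the eight observations 43, 75, 48, 39, 51, 47, 50 and 47, whose mean is 50:
MD = $\frac{|43 - 50| + |75 - 50| + |48 - 50| + |39 - 50| + |51 - 50| + |47 - 50| + |50 - 50| + |47 - 50|}{8}$
= $\frac{7 + 25 + 2 + 11 + 1 + 3 + 0 + 3}{8} = \frac{52}{8} = 6.5$
b) The formula for the mean deviation for grouped data is
MD = $\frac{\sum f\,|X - \bar{X}|}{\sum f}$
Where f refers to the frequency of each class and X to the class midpoints.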
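A minimal sketch of the mean deviation for ungrouped data, applied to the eight observations 43, 75, 48, 39, 51, 47, 50 and 47 that appear in the worked calculation (the function name is illustrative):

```python
def mean_deviation(values):
    """Average absolute deviation of the observations from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


observations = [43, 75, 48, 39, 51, 47, 50, 47]
print(mean_deviation(observations))
```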
Example: calculate the mean and the mean deviation of the number of sales
(see ex 4.2)
Table 1 Number of sales made by salesmen
Number of sales      0-4   5-9   10-14   15-19   20-24   25-29
Number of salesmen   1     14    23      21      15      6
Table 2 Layout of calculations
Number of   Number of      Mid-point   (fx)    |x - x̄|   f|x - x̄|
sales       salesmen (f)   (x)
0 to 4      1              2           2       13.3      13.3
5 to 9      14             7           98      8.3       116.2
10 to 14    23             12          276     3.3       75.9
15 to 19    21             17          357     1.7       35.7
20 to 24    15             22          330     6.7       100.5
25 to 29    6              27          162     11.7      70.2
Totals      80                         1225              411.8
Mean number of sales, $\bar{x} = \frac{1225}{80} = 15.3$
Thus, mean deviation, MD = $\frac{\sum f\,|x - \bar{x}|}{\sum f} = \frac{411.8}{80} = 5.1$ sales
Characteristics of the mean deviation
a. The mean deviation can be regarded as a good representative measure
of dispersion that is not difficult to understand. It is useful for comparing
the variability between distributions of like nature.
b. Its practical disadvantage is that it can be complicated to calculate if
the mean is anything other than a whole number.
c. Because of the modulus sign, the mean deviation is virtually
impossible to handle theoretically and thus is not used in more
advanced analysis.
3. Variance is the arithmetic mean of the squared deviations from the
mean (for a sample, the divisor n - 1 is used in place of n). The variance is
nonnegative and is zero only if all observations are the same. The population
variance $\sigma^2$ (the Greek letter sigma squared) and the sample variance $s^2$ for
ungrouped data are given by:
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$ and $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
4. Standard deviation.
The population standard deviation $\sigma$ and sample standard deviation s are the positive
square roots of their respective variances.
a) For ungrouped data:
$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$ and $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
Example 1 (for ungrouped data): calculate the variance and standard deviation of the
data in the following table.
Table 3.3 Haemoglobin values (g%) of 26 normal children
11.8   12.9   12.4   13.3   13.8
11.4   12.3   11.7   12.9   12.2
10.4   10.8   12.7   13.2
11.6   12.0   12.2   14.2
10.8   10.5   11.6   13.5
12.2   11.2   12.6   13.0
Table 3.4 Calculation of standard deviation and variance for the data of table 3.3
Serial No   Haemoglobin value   Deviation from          Square of deviation
                                arithmetic mean 12.2
1           11.8                -0.4                    0.16
2           11.4                -0.8                    0.64
3           10.4                -1.8                    3.24
4           11.6                -0.6                    0.36
5           10.8                -1.4                    1.96
6           12.2                0.0                     0.00
7           12.9                0.7                     0.49
8           12.3                0.1                     0.01
9           10.8                -1.4                    1.96
10          12.0                -0.2                    0.04
11          10.5                -1.7                    2.89
12          11.2                -1.0                    1.00
13          12.4                0.2                     0.04
14          11.7                -0.5                    0.25
15          12.7                0.5                     0.25
16          12.2                0.0                     0.00
17          11.6                -0.6                    0.36
18          12.6                0.4                     0.16
19          13.3                1.1                     1.21
20          12.9                0.7                     0.49
21          13.2                1.0                     1.00
22          14.2                2.0                     4.00
23          13.5                1.3                     1.69
24          13.0                0.8                     0.64
25          13.8                1.6                     2.56
26          12.2                0.0                     0.00
Total                           0                       25.40
Arithmetic mean is 12.2
Standard deviation = s = $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{25.40}{25}}$
s = $\sqrt{1.016} = 1.01$ g%
variance = $s^2$ = 1.016
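The haemoglobin calculation can be reproduced with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor. The variable name `haemoglobin` is chosen for this sketch.

```python
import statistics

haemoglobin = [
    11.8, 11.4, 10.4, 11.6, 10.8, 12.2, 12.9, 12.3, 10.8, 12.0, 10.5, 11.2, 12.4,
    11.7, 12.7, 12.2, 11.6, 12.6, 13.3, 12.9, 13.2, 14.2, 13.5, 13.0, 13.8, 12.2,
]

mean = statistics.mean(haemoglobin)
variance = statistics.variance(haemoglobin)  # divides by n - 1
std_dev = statistics.stdev(haemoglobin)      # square root of the sample variance

print(round(mean, 1), round(variance, 3), round(std_dev, 2))
```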
Example: calculation of variance and standard deviation for data of table 3.2
Protein intake     No. of families   Mid-point of   Deviation of       Squared          Frequency x
(class interval)   (f)               class (x)      mid-point from     deviation        squared deviation
                                                    mean (x - x̄)       (x - x̄)²         f(x - x̄)²
15-25              30                20             -27.5              756.25           22687.5
25-35              40                30             -17.5              306.25           12250.0
35-45              100               40             -7.5               56.25            5625.0
45-55              110               50             2.5                6.25             687.5
55-65              80                60             12.5               156.25           12500.0
65-75              30                70             22.5               506.25           15187.5
75-85              10                80             32.5               1056.25          10562.5
Total              400                                                                  79500.0
Arithmetic mean = 47.5
From this table, we get
$s^2 = \frac{\sum f (x - \bar{x})^2}{\sum f} = \frac{79500}{400} = 198.75$ and $s = \sqrt{198.75} = 14.10$
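A sketch of the grouped calculation, using class midpoints and dividing by the total frequency as in the table above (the function name `grouped_variance` is an assumption of this example):

```python
import math


def grouped_variance(midpoints, freqs):
    """Variance of a grouped distribution: sum(f * (x - mean)^2) / sum(f)."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n


protein_midpoints = [20, 30, 40, 50, 60, 70, 80]
families = [30, 40, 100, 110, 80, 30, 10]
variance = grouped_variance(protein_midpoints, families)
print(variance, round(math.sqrt(variance), 2))
```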
For the example given in table 3.1, the standard deviation s = 1.01 and the arithmetic
mean $\bar{x} = 12.2$; the coefficient of variation is
$\frac{1.01 \times 100}{12.2} = 8.28\%$
For the example given in table 3.2, the standard deviation s = 14.10 and the arithmetic
mean $\bar{x} = 47.5$; the coefficient of variation, therefore, is
$\frac{14.10 \times 100}{47.5} = 29.68\%$
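Both coefficients of variation follow from the same one-line calculation, sketched here with an illustrative function name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev * 100 / mean


print(round(coefficient_of_variation(1.01, 12.2), 2))
print(round(coefficient_of_variation(14.10, 47.5), 2))
```

Because the result is a percentage, it allows the spread of data sets measured in different units to be compared.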
A student scored 65 on a calculus test that had a mean of 50 and standard deviation of 10;
she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare
her relative positions on the tests.
Solution
First, find the z scores. For calculus the z score is:
$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$
For history the z score is:
$z = \frac{30 - 25}{5} = 1.0$
Since the z score for calculus is larger, her relative position in the calculus class is higher
than her relative position in the history class.
Note that if the z score is positive, the score is above the mean. If the z score is 0, the
score is the same as the mean. And if the z score is negative, the score is below the mean.
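The comparison of the two test scores can be sketched as follows (the function name `z_score` is chosen for this illustration):

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which a value lies above the mean."""
    return (value - mean) / std_dev


calculus = z_score(65, 50, 10)
history = z_score(30, 25, 5)
print(calculus, history, calculus > history)
```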
B. Quartiles
Quartiles divide the distribution into four equal parts (quarters).
The value of the variable for which the cumulative frequency is $\frac{N}{4}$ is called the first
quartile or lower quartile and it is denoted by $Q_1$.
Similarly, the value of the variable for which the cumulative frequency is $\frac{3N}{4}$ is
called the third quartile or upper quartile and it is denoted by $Q_3$.
Clearly the median is the second quartile and it can be denoted by $Q_2$.
In the case of ungrouped data with n items, $Q_1$ is calculated as follows.
Let $i = \left[\frac{1}{4}(n + 1)\right]$ = integral part of $\frac{1}{4}(n + 1)$.
Let $q = \frac{1}{4}(n + 1) - \left[\frac{1}{4}(n + 1)\right]$. Hence q is the fractional part.
Then $Q_1 = x_i + q\,(x_{i+1} - x_i)$.
Similarly, $Q_3 = x_i + q\,(x_{i+1} - x_i)$, where
$i = \left[\frac{3}{4}(n + 1)\right]$ and $q = \frac{3}{4}(n + 1) - \left[\frac{3}{4}(n + 1)\right]$.
In the case of a grouped frequency distribution the quartiles are calculated by using the
formulae:
$Q_1 = L + \left(\frac{\frac{N}{4} - F}{f}\right) C$ is called the lower quartile
$Q_2 = L + \left(\frac{\frac{N}{2} - F}{f}\right) C$ is the median
$Q_3 = L + \left(\frac{\frac{3N}{4} - F}{f}\right) C$ is called the upper quartile
Where L is the lower limit of the class in which the particular quartile lies, f is the
frequency of this class, C is the width of the class and F is the cumulative frequency of
the preceding class.
C. Deciles
Similarly, deciles are the values of the variable which divide the total frequency into 10
equal parts.
Consider a frequency distribution with total frequency N. The values of the variable for
which the cumulative frequencies are $\frac{iN}{10}$ (i = 1, 2, ..., 9) are called deciles. The ith
decile is denoted by $D_i$.
Clearly the median is the fifth decile. Hence the median can also be denoted by $D_5$.
In the case of ungrouped data with n items, for k = 1, 2, 3, ..., 9:
$D_k = x_i + q\,(x_{i+1} - x_i)$
Where
$i = \left[\frac{k(n + 1)}{10}\right]$ and $q = \frac{k(n + 1)}{10} - \left[\frac{k(n + 1)}{10}\right]$
For a grouped frequency distribution, we have
$D_i = L + \left(\frac{\frac{iN}{10} - F}{f}\right) C$; (i = 1, 2, ..., 9)
D. Percentiles
Percentiles are the values of the variable which divide the total frequency into 100 equal
parts. They are denoted by $P_1, P_2, \dots, P_{99}$, and the ith percentile is denoted by $P_i$.
Clearly the median is the 50th percentile and hence the median can also be denoted by $P_{50}$.
In the case of ungrouped data with n items, for k = 1, 2, 3, ..., 99:
$P_k = x_i + q\,(x_{i+1} - x_i)$
Where
$i = \left[\frac{k(n + 1)}{100}\right]$ and $q = \frac{k(n + 1)}{100} - \left[\frac{k(n + 1)}{100}\right]$
In the case of a grouped frequency distribution, percentiles are obtained from the
following formula:
$P_i = L + \left(\frac{\frac{iN}{100} - F}{f}\right) C$; i = 1, 2, ..., 99
ILLUSTRATIVE EXAMPLES
1. Find the median and quartiles of the heights in cm of eleven students given by
66, 65, 64, 70, 61, 60, 56, 63, 60, 67, 62.
Solution: Arranging the given data in ascending order of magnitude we get
56, 60, 60, 61, 62, 63, 64, 65, 66, 67, 70.
Here n = 11. Since n is odd, the median is the sixth item, which is equal to 63.
$Q_1$ = size of the $\frac{1}{4}(n + 1)$th item = $\frac{1}{4}(11 + 1)$ = 3rd item
$Q_1$ = third item = 60
$Q_3$ = size of the $\frac{3}{4}(n + 1)$th item = 9th item = 66
2. Find the median and quartile marks of 10 students in a statistics test whose marks are
given as
40, 90, 61, 68, 72, 43, 50, 84, 75, 33.
Solution: Arranging in ascending order of magnitude we get
33, 40, 43, 50, 61, 68, 72, 75, 84, 90.
Here n = 10. Since n is even, the median is the average of the two middle items: 61 and
68.
Median = $\frac{1}{2}(61 + 68) = 64.5$ marks.
First quartile
Here $i = \left[\frac{1}{4}(n + 1)\right] = 2$ and $q = \frac{1}{4}(n + 1) - \left[\frac{1}{4}(n + 1)\right] = 0.75$
$Q_1 = x_2 + 0.75\,(x_3 - x_2) = 40 + 0.75\,(43 - 40) = 42.25$
Third quartile
Here $i = \left[\frac{3}{4}(n + 1)\right] = 8$ and $q = \frac{3}{4}(n + 1) - \left[\frac{3}{4}(n + 1)\right] = 0.25$
$Q_3 = x_8 + 0.25\,(x_9 - x_8) = 75 + 0.25\,(84 - 75) = 77.25$
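The integral-part and fractional-part rule used for quartiles, deciles and percentiles of ungrouped data can be sketched as one function. The name `quantile` and its arguments `k` and `parts` (4 for quartiles, 10 for deciles, 100 for percentiles) are assumptions of this example, as is returning the smallest or largest item when the position falls outside the data.

```python
def quantile(values, k, parts):
    """k-th quantile out of `parts` equal parts, by the (n + 1) position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * (n + 1) / parts
    i = int(position)   # integral part
    q = position - i    # fractional part
    if i < 1:           # position before the first item (assumed clamp)
        return ordered[0]
    if i >= n:          # position at or beyond the last item (assumed clamp)
        return ordered[-1]
    # x_i + q * (x_{i+1} - x_i), with 1-based items mapped to 0-based indices
    return ordered[i - 1] + q * (ordered[i] - ordered[i - 1])


marks = [40, 90, 61, 68, 72, 43, 50, 84, 75, 33]
print(quantile(marks, 1, 4), quantile(marks, 2, 4), quantile(marks, 3, 4))
```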
3. Find the lower quartile, median, upper quartile, 4th decile and 60th percentile of the
following data.
Marks             0-4   4-8   8-12   12-14   14-18   18-20   20-25   25 & above
No. of students   10    12    18     7       5       8       4       6
Solution
Marks         No. of students   Cumulative frequency
0-4           10                10
4-8           12                22
8-12          18                40
12-14         7                 47
14-18         5                 52
18-20         8                 60
20-25         4                 64
25 & above    6                 70
$N = \sum f = 70$
i) Median = $L + \frac{C}{f}\left(\frac{N}{2} - F\right)$
Here $\frac{N}{2} = \frac{70}{2} = 35$. The median class is 8-12, L = 8, C = 12 - 8 = 4, F = 22, f = 18.
Median = $8 + \frac{4}{18}(35 - 22) = 10.89$
ii) Lower quartile: $Q_1 = L + \frac{C}{f}\left(\frac{N}{4} - F\right)$
Here $\frac{N}{4} = \frac{70}{4} = 17.5$, L = 4, C = 4, f = 12, F = 10.
$Q_1 = 4 + \frac{4}{12}(17.5 - 10) = 6.5$
iii) Upper quartile: $Q_3 = L + \frac{C}{f}\left(\frac{3N}{4} - F\right)$
Here $\frac{3N}{4} = \frac{3 \times 70}{4} = 52.5$, L = 18, C = 20 - 18 = 2, f = 8, F = 52.
$Q_3 = 18 + \frac{2}{8}(52.5 - 52) = 18.125$
iv) The 4th decile is $D_4 = L + \frac{C}{f}\left(\frac{4N}{10} - F\right)$
Here $\frac{4N}{10} = \frac{280}{10} = 28$, L = 8, C = 4, f = 18, F = 22.
$D_4 = 8 + \frac{4}{18}(28 - 22) = 9.33$
v) The 60th percentile is $P_{60}$, which is given by $P_{60} = L + \frac{C}{f}\left(\frac{60N}{100} - F\right)$
Here $\frac{60N}{100} = \frac{60 \times 70}{100} = 42$, L = 12, C = 14 - 12 = 2, f = 7, F = 40.
$P_{60} = 12 + \frac{2}{7}(42 - 40) = 12.57$
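All five results of this example come from the same interpolation formula, which might be sketched as below. The function name, the `(lower, upper)` class representation and the use of `None` for the open upper limit of the last class are assumptions made for this illustration.

```python
def grouped_quantile(classes, freqs, k, parts):
    """k-th quantile out of `parts` for a grouped distribution: L + (kN/parts - F) / f * C."""
    target = k * sum(freqs) / parts
    cumulative = 0  # F: cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f > target:  # class in which the quantile lies
            if upper is None:
                raise ValueError("quantile falls in an open-ended class")
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("quantile lies beyond the last class")


mark_classes = [(0, 4), (4, 8), (8, 12), (12, 14), (14, 18), (18, 20), (20, 25), (25, None)]
students = [10, 12, 18, 7, 5, 8, 4, 6]

print(round(grouped_quantile(mark_classes, students, 1, 2), 2))     # median
print(round(grouped_quantile(mark_classes, students, 1, 4), 2))     # lower quartile
print(round(grouped_quantile(mark_classes, students, 3, 4), 3))     # upper quartile
print(round(grouped_quantile(mark_classes, students, 4, 10), 2))    # 4th decile
print(round(grouped_quantile(mark_classes, students, 60, 100), 2))  # 60th percentile
```

Note that unequal class widths are handled because the width C is taken from the class in which the quantile falls.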