DATA ANALYSIS PROCESS: MEASURES OF CENTRAL TENDENCY (MOCT)

Measures of central tendency are statistics that indicate the center, or location, of the distribution of a variable X. The variable may be continuous or discrete. These measures are generally referred to as averages. Averages, especially the mean, are values that clients find psychologically reassuring. For instance, a person is reassured to find that his or her height, weight or other measurement is average or within the normal range.
The averages, by category, are summarized as follows.
(i) Arithmetic mean, \(\bar{x}\).
(ii) Other means: harmonic, \(h_m\); geometric, \(g_m\); weighted, \(w_m\); and root mean square (r m s).
(iii) Median, \(\tilde{x}\), and mode, \(\hat{x}\).
(iv) Data partitions: quartiles, \(Q_1, Q_2, Q_3\); deciles, \(D_i\); and percentiles, \(P_i\).
The data partition locations subdivide the data array into equal portions: four, ten or one hundred, respectively. The simplest illustration of this is the four portions of a normal distribution produced by the quartile locations, as shown below.
[Figure: a normal curve divided into four equal portions, P1 to P4, by the quartile locations Q1, Q2 and Q3.]
For a normal distribution the locations of the averages coincide. If the data is not normally distributed (that is, not symmetrical), it is said to be skewed, and only three of the averages, the mean, median and mode, are used to describe the distribution. The following illustration indicates the locations of these averages in both the positive and negative skewness cases. The illustrations are very useful when describing or sketching a distribution, as discussed later under skewness.
[Figure: three curves labelled "Positively skewed data", "Symmetrical data" and "Negatively skewed data". In the symmetrical case the mean \(\bar{x}\), median \(\tilde{x}\) and mode \(\hat{x}\) coincide; in each skewed case the median \(\tilde{x}\) is marked.]
To understand the computation formulae of these averages, the following notation techniques and concepts will be very useful.
Subscript notation (on \(x_i\) or \(f_i\))
Let the symbol \(x_i\) (read as "x subscript i") denote the n values \(x_1, x_2, x_3, \ldots, x_n\) assumed by the random variable x. Thus, \(x_i = x_1, x_2, x_3, \ldots, x_n\).
The subscript indicates the location of the value in the arranged data.
Summation notation (\(\sum\))
The symbol \(\sum\) is referred to as the summation notation. A complete summation includes limits, i.e. \(\sum_{i=1}^{n}\). For instance, \(\sum_{i=1}^{n} x_i\) denotes the sum of all the values of x from i = 1 to i = n, i.e.
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n . \]
Sample range
The raw values of any given investigation are initially not in any order. They form what is referred to as raw data or scores. When the data are arranged in either ascending or descending order they constitute an array.
Members of an array are distinguished by a star in the subscript notation discussed earlier. In this case, an array of a random variable x is denoted by \(x_i^* = x_1^*, x_2^*, x_3^*, \ldots, x_n^*\), and its sample range is \(R = x_n^* - x_1^*\).
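The summation theory summary
This summary has been found to be useful in the derivation of statistical formulae.
(i) \(\sum C = C + C + C + \cdots + C\) (N terms of C) \(= NC\).
(ii) \(\sum X \cdot Y = \sum XY\) and \(\sum X \cdot X = \sum X^2\).
(iii) \(\sum CX = C \sum X\), or \(\sum \frac{1}{C} X = \frac{1}{C} \sum X\) (division by C).
(iv) \(\sum (X + Y) = \sum X + \sum Y\), and \(\sum (AX + BY) = A \sum X + B \sum Y\).
(v) \(\sum X = N \bar{X}\) (from the formula for the mean).
(vi) \(\sum (X \pm X_0)/N = \bar{X} \pm X_0\) (used in the assumed mean formula).
(vii) \(\sum (X - \bar{X})^2 = \sum X^2 - N \bar{X}^2 = \sum X^2 - (\sum X)^2 / N = SS_{xx}\).
(viii) \(\sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - (\sum X)(\sum Y)/N = SS_{xy}\).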
Applications of the summation theory in means formulae

There are two aspects of the application of the theory, as described below.
Discrete data aspects
Arithmetic sample mean (\(\bar{x}\)) computation
This mean is simply referred to as the mean, \(\bar{x}\), of the data, given by
\[ \bar{x} = \frac{1}{n}\left(x_1 + x_2 + x_3 + \cdots + x_n\right) . \]
It is a balance point of all the values (fair).
For example, the mean of $12, $8, $25, $26 and $10 is
\[ \bar{x} = \frac{12 + 8 + 25 + 26 + 10}{5} = \frac{81}{5} = \$16.2 . \]
Its value does not depend on array position.
Merits and demerits of the arithmetic mean

This mean is the simplest average: popular, familiar and easy to calculate. It is regarded as the true or fairest representative of the data because it takes into account every value of the data. However, extreme values at one end can produce an average that does not represent the data and is therefore not usable in practice. E.g. the mean of the wages $159, $138, $141, $148, $148, $146, $147 and $252 is about $160, which is not a true representative of this data: it exceeds every wage except the extreme value of $252. The mean also cannot be calculated when one of the values is missing or the extreme classes are open. In such cases, the median value is a better representative of the group or sample. Exercise: list at least three merits and three demerits of this mean.
Secondly, the arithmetic sample mean, \(\bar{x}\), can lead to wrong conclusions when the details of the data from which it was calculated are not given. For example, the performance of two students, X and Y, in a course, whose scores for 3 exams are shown below, gives the same average of 50%. Yet from the details of the data it is evident that the performance of X was on an improving trend, while that of Y was on a decline.
Exam/Student    I      II     III    Average
X               30%    50%    70%    50%
Y               70%    50%    30%    50%
It has also been found that this mean is a stable or reliable average, because the means of many samples drawn from the same population usually do not fluctuate or vary as widely as other statistics used to estimate the population mean, \(\mu\). That is, \(\bar{x}_i \approx \mu\).
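The effect of one extreme value on the mean, described in the wages example above, can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean, median

# Weekly wages from the example above; 252 is the single extreme value.
wages = [159, 138, 141, 148, 148, 146, 147, 252]

wage_mean = mean(wages)      # pulled upward by the extreme value
wage_median = median(wages)  # middle of the ordered array, barely affected

print(f"mean   = {wage_mean:.2f}")
print(f"median = {wage_median:.2f}")
```

Comparing the two printed values shows why the median is preferred as a representative when the data contains an extreme value at one end.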
Characteristic of the arithmetic mean

The main characteristic of the mean is that the sum of the deviations, d, from it is zero. A deviation is the difference, or signed distance, between a value of the variable X and its mean value \(\bar{x}\). That is, \(\sum (x - \bar{x}) = 0\), or \(\sum d = 0\).
On a number line this can be illustrated with two values of the variable, as shown below.
[Figure: a number line with the mean \(\bar{x}\) between the values \(x_1\) and \(x_2\); the deviations d' and d'' are marked on either side of \(\bar{x}\).]
By the definition of the mean, \(\bar{x}\) lies at the center of the two values, so the sum of the two deviations from it is \(\sum d = d' + d'' = 0\). This simply confirms the fact that \(\sum d = 0\) for any given data.
Note that the term deviation may be compared to the deviation of a plane's flight direction from its intended course, or the deviation of darts or fired missiles from a given target.
[Figure: a target, with the deviation of a shot from the target marked.]
The calculation of the deviations from the mean of 1, 2, 3, 4, 5, 6, 7, 8 and 9 can be used to illustrate this characteristic, using the table shown below. In this case the mean is \(\bar{x} = 5\), and it is also a balancing point of the data.
x                        1    2    3    4    5    6    7    8    9    \(\sum d\)
\(d = x - \bar{x}\)     -4   -3   -2   -1    0    1    2    3    4    0
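The zero-sum property of the deviations can be verified directly. The short sketch below recomputes the table for the values 1 to 9; the variable names are illustrative.

```python
# Values from the deviation table above.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

x_bar = sum(values) / len(values)          # arithmetic mean
deviations = [x - x_bar for x in values]   # d = x - x_bar for every value
total_deviation = sum(deviations)          # should vanish for any data set

print("mean:", x_bar)
print("deviations:", deviations)
print("sum of deviations:", total_deviation)
```

With other data sets the sum may differ from zero by a tiny floating-point rounding error, so a tolerance-based comparison is the safer check in general.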
Because \(\sum d = 0\), the deviations from the mean cannot be used directly in calculating any statistic. Instead, the sum of deviations from an arbitrary value, \(x_0\), that is close to the mean is used. This value is referred to as the assumed mean. The first formula for the mean obtained by using it is referred to as the assumed mean, or d1 coding, formula. These formulae are discussed in detail under the computation of the grouped data arithmetic mean in this paper.
Weighted mean (\(w_m\))
This is a mean that involves some weighting, \(w_i\), of the values of x, i.e.
\[ w_m = \frac{1}{\sum w}\left[ w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \right] , \quad \text{where } \sum w = w_1 + w_2 + \cdots + w_n . \]
For example, consider computing the average value of all sales completed by three sales representatives: A, with an average of $86.42 from 24 sales; B, $112.91 from 37 sales; and C, $104.22 from 25 sales.
The weighted mean is given by
\[ w_m = \frac{24 \times 86.42 + 37 \times 112.91 + 25 \times 104.22}{24 + 37 + 25} = \$102.99 . \]
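The sales example can be reproduced with a small helper. This is a sketch only; `weighted_mean` is a hypothetical function name introduced for illustration.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Average sale value and number of sales for representatives A, B and C.
average_sale = [86.42, 112.91, 104.22]
number_of_sales = [24, 37, 25]

overall_average = weighted_mean(average_sale, number_of_sales)
print(f"weighted mean = ${overall_average:.2f}")
```

The number of sales acts as the weight, so a representative with more sales pulls the overall average further towards his or her own average.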
Geometric mean (\(g_m\))
This is a mean that is generally given by \(g_m = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n}\).
In business it is used to average proportional increases or decreases (rates of growth or decay). For example, if student enrolment in a course, on a yearly basis, was 84, 97, 116 and 129, then \(g_m = \sqrt[4]{84 \times 97 \times 116 \times 129} = 105.081168\). To compute the average proportional rise in number, we first find the proportions \(p_i\) and then the geometric mean of the multipliers:
\[ p_1 = \frac{97 - 84}{84} = 0.155 , \quad p_2 = \frac{116 - 97}{97} = 0.196 , \quad p_3 = \frac{129 - 116}{116} = 0.112 . \]
Each year's multiplier is given as (1 + p): 1.155, 1.196 and 1.112. Using the multipliers' geometric mean formula,
\[ g_{m\,\text{multipliers}} = \sqrt[n]{(1 + p_1)(1 + p_2) \cdots (1 + p_n)} , \]
we get \(\sqrt[3]{1.155 \times 1.196 \times 1.112} = 1.153823333 \approx 1.154\). The actual average rise is \(r = g_{m\,\text{multipliers}} - 1\), giving r = 15.4%. Note that the geometric mean of the proportions themselves does not give the correct answer for the average rise. The use of multipliers is the accepted way; avoid
\(g_m = \sqrt[3]{0.155 \times 0.196 \times 0.112} = 0.150407189\).
Alternatively, multipliers of the form (100 + \(p_i\)%) could be used to get r.
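The multiplier method for the enrolment figures can be sketched as follows. The helper name `geometric_mean` is illustrative; `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


enrolment = [84, 97, 116, 129]

# Yearly multipliers (1 + p): each year's enrolment divided by the previous one.
multipliers = [later / earlier for earlier, later in zip(enrolment, enrolment[1:])]

average_rise = geometric_mean(multipliers) - 1
print(f"average yearly rise = {average_rise:.1%}")
```

Because the unrounded multipliers telescope, their product equals the last enrolment divided by the first, so the result depends only on the two end values and the number of intervals.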
Self check exercise
If it is known that the price of an item has increased by 6%, 13%, 11% and 15% in each of four successive years, find the geometric mean rise of the price. Work out the geometric multipliers first, after which you compute r. (11.2%)
The mean rate of rise or fall in value within a period of n intervals, from an initial value \(V_0\) to a final value \(V_n\), is also given by
\[ g_{mA} = \sqrt[n]{\frac{V_n}{V_0}} - 1 . \]
For example, the average yearly rise in the population of a remote area, X, from 2 in 1990 to 22 in 2000 is given by
\[ g_{mA} = \sqrt[10]{\frac{22}{2}} - 1 = 0.270981615 . \]
Compute the yearly rate of increase of subscribers to satellite TV companies, if there were 9.19 million in 1998 and 54.87 million in 2008. (19.56%)
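The end-value formula above is easy to evaluate for both the population example and the subscriber exercise. This is a sketch; `average_growth_rate` is a hypothetical helper name.

```python
def average_growth_rate(initial_value, final_value, periods):
    """Average rate per period: (V_n / V_0) ** (1 / n) - 1."""
    return (final_value / initial_value) ** (1 / periods) - 1


# Population of area X: 2 in 1990 to 22 in 2000 (10 yearly intervals).
population_rate = average_growth_rate(2, 22, 10)

# Satellite TV subscribers: 9.19 million in 1998 to 54.87 million in 2008.
subscriber_rate = average_growth_rate(9.19, 54.87, 10)

print(f"population: {population_rate:.4f} per year")
print(f"subscribers: {subscriber_rate:.2%} per year")
```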
Harmonic mean (\(h_m\))
It is another specialized measure of location which is used in particular circumstances, namely when the data consists of a set of rates, such as prices ($/kilo), speeds (Kmph), productivity (output/man hour) and air time rates.
It is defined as
\[ h_m = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} . \]
*Use the \(x^{-1}\) function on a calculator to compute it.
For example, the mean of the speeds of a KQ plane that flies to different destinations at 100, 200 and 300 Km per hour is given by
\[ h_m = \frac{3}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300}} = 163\tfrac{7}{11} \approx 163.64 \text{ Km per hour} . \]
Compute the harmonic mean for (i) 2, 4 and 8 (3.43); (ii) 2, 4 and 6 (3.27); (iii) Ksh15/min, Ksh12/min and Ksh7/min (10.24).
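The harmonic mean formula and the three exercise answers can be checked with a few lines. The helper name `harmonic_mean` is illustrative; the standard library also offers `statistics.harmonic_mean`.

```python
def harmonic_mean(values):
    """n divided by the sum of reciprocals; values must be non-zero."""
    return len(values) / sum(1 / v for v in values)


plane_speed = harmonic_mean([100, 200, 300])   # Km per hour
exercise_i = harmonic_mean([2, 4, 8])
exercise_ii = harmonic_mean([2, 4, 6])
exercise_iii = harmonic_mean([15, 12, 7])      # Ksh per minute

for label, value in [("speeds", plane_speed), ("(i)", exercise_i),
                     ("(ii)", exercise_ii), ("(iii)", exercise_iii)]:
    print(f"{label}: {value:.2f}")
```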
The root mean square value (r m s)
This is also referred to as the quadratic mean. It is given by
\[ rms = \sqrt{\frac{1}{n} \sum x_i^2} . \]
For example, the r m s of 1, 3, 4, 5 and 7 is given by
\[ rms = \sqrt{\frac{1}{5}\left(1^2 + 3^2 + 4^2 + 5^2 + 7^2\right)} = 2\sqrt{5} = 4.472135955 \approx 4.47 \text{ units} . \]
At this point it will be useful to compare \(\bar{x}\), \(g_m\) and \(h_m\) of 2, 4 and 8.
Use the inequality signs to relate the three means. (\(\bar{x} \ge g_m \ge h_m\))
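The r m s example and the comparison of the three means for 2, 4 and 8 can be sketched as below. Function and variable names are illustrative.

```python
import math


def root_mean_square(values):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


rms_value = root_mean_square([1, 3, 4, 5, 7])

data = [2, 4, 8]
arithmetic = sum(data) / len(data)
geometric = math.prod(data) ** (1 / len(data))
harmonic = len(data) / sum(1 / v for v in data)

print(f"rms = {rms_value:.2f}")
print(f"arithmetic = {arithmetic:.4f}, geometric = {geometric:.4f}, "
      f"harmonic = {harmonic:.4f}")
```

For positive values that are not all equal, the printed means fall in strictly decreasing order, which is the inequality the exercise asks for.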
Median value (\(\tilde{x}\))
This is the value that is at the center of the values of an array.
For data with odd size, n, the median is given by \(\tilde{x} = x^*_{(n+1)/2}\).
For data with even size, n, the median is given by
\[ \tilde{x} = \frac{x^*_{n/2} + x^*_{n/2 + 1}}{2} . \]
Thus, the median lies between the two values at the middle of the array. E.g. the median of 1, 2, 3 and 4 is between 2 and 3, which is 2.5.
Self check exercise
What makes the median of the numbers 3, 4, 4, 5, 6, 8, 8, 8 and 10 the 5th number in the array, with a value of 6? (n is odd)
Why is the median of 5, 6, 6, 6, 6, 7, 7, 8, 9 and 10 given as 6.5? (n is even)
Modal value (\(\hat{x}\))
This is the value that occurs with the greatest frequency. For example, in the case of the values 80, 40, 40, 30, 50 and 40 the mode is 40. However, in cases where all the frequencies are the same there is no mode. E.g. 17, 18, 35, 43, 42 and 45 have no modal value.
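The self check data for the median and the two mode examples can be worked through with the standard `statistics` module. This is a sketch: `modes_or_none` is a hypothetical helper that applies the convention used in these notes, where equal frequencies throughout mean there is no mode.

```python
from statistics import median, multimode


def modes_or_none(values):
    """Return the most frequent value(s), or [] if all frequencies are equal."""
    candidates = multimode(values)
    if len(candidates) == len(set(values)):
        return []  # every distinct value is equally frequent: no mode
    return candidates


odd_sample = [3, 4, 4, 5, 6, 8, 8, 8, 10]        # n = 9, middle is the 5th value
even_sample = [5, 6, 6, 6, 6, 7, 7, 8, 9, 10]    # n = 10, average of 5th and 6th

median_odd = median(odd_sample)
median_even = median(even_sample)
mode_found = modes_or_none([80, 40, 40, 30, 50, 40])
mode_missing = modes_or_none([17, 18, 35, 43, 42, 45])

print(median_odd, median_even, mode_found, mode_missing)
```

Returning a list also covers bimodal data, where two values share the greatest frequency.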
Grouped data aspects
Computation of arithmetic mean for grouped/frequency data

The grouped or frequency data formula for the arithmetic mean is
\[ \bar{x} = \frac{\sum f x}{\sum f} . \]
The following coding formulae may also be used.
Assumed mean, \(x_0\), and coding formulae

Using an arbitrary value \(x_0\) that is close to the mean, the d1 coding calculation is done using the formula
\[ \bar{x} = x_0 + \frac{\sum f (x - x_0)}{\sum f} = x_0 + \frac{\sum f d_1}{n} , \]
where \(d_1 = x - x_0\) and \(n = \sum f\). The d2 coding formula is given by
\[ \bar{x} = x_0 + c \, \frac{\sum f d_2}{n} , \quad \text{where } d_2 = \frac{x - x_0}{c} , \]
and c is a common divisor (GCD) of the d1 column. State the d3 coding method formula:
\[ \bar{x} = x_0 + \frac{1}{c} \, \frac{\sum f d_3}{n} , \quad d_3 = (x - x_0) \, c . \]
Note that the coding methods are used to reduce the bulkiness of the calculation, or to replace values with long decimal expansions by manageable whole numbers. The d3 coding method is especially useful for enlarging tiny values expressed to many decimal places into manageable whole numbers. To decide between the d2 and d3 coding methods, one must work out the d1 column first, then examine the pattern of the values obtained to see whether there is a common divisor (GCD) to be used in d2, or a common multiplier to be used in the d3 option. An example where d2 coding is decided on is shown below.
Let \(x_0 = 25\) and c = 10.
Class    Mid-mark (x)   Frequency (f)   d1 = (x - x0)   d2 = d1/c   f d2
0-10     5              12              -20             -2          -24
10-20    15             18              -10             -1          -18
20-30    25             27              0               0           0
30-40    35             20              10              1           20
40-50    45             17              20              2           34
50-60    55             6               30              3           18
Total                   \(\sum f = 100\)                            \(\sum f d_2 = 30\)
Mean: \(\bar{x}\) = 28 units.
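The d2 coding table can be reproduced and compared with the direct grouped-data formula. This is a minimal sketch with illustrative variable names; the data, the assumed mean x0 = 25 and the divisor c = 10 are those of the table above.

```python
midmarks = [5, 15, 25, 35, 45, 55]
frequencies = [12, 18, 27, 20, 17, 6]
x0, c = 25, 10                      # assumed mean and common divisor

n = sum(frequencies)
d2 = [(x - x0) / c for x in midmarks]                   # coded deviations
sum_fd2 = sum(f * d for f, d in zip(frequencies, d2))   # total of the f*d2 column

coded_mean = x0 + c * sum_fd2 / n
direct_mean = sum(f * x for f, x in zip(frequencies, midmarks)) / n

print("sum f =", n, " sum f*d2 =", sum_fd2)
print("coded mean =", coded_mean, " direct mean =", direct_mean)
```

Both routes give the same mean; the coded route simply keeps the intermediate numbers small, which matters for hand calculation more than for a program.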
In the above table it is noticeable that the pattern of the d1 column values has a GCD, c = 10, giving d2 = d1/c. This is why d2 coding is applicable in this table. Try the example on circular bolt diameters, shown below. Which of the codings would be appropriate in computing the mean of their diameters? Give a reason for your answer, the table format, and how you decide on the value of c to use in the tabulation.
Circular bolts diameters summary
Diam. (class limits): 0.9747-0.9749, 0.9750-0.9752, 0.9753-0.9755, 0.9756-0.9758, 0.9759-0.9761, 0.9762-0.9764, 0.9765-0.9767, 0.9768-0.9770, 0.9771-0.9773, 0.9774-0.9776, 0.9777-0.9779, 0.9780-0.9782
Freq.: 15, 42, 68, 49, 25, 18, 12
Exercise
Use a suitable table, based on a coding method, to find the arithmetic mean of the following data:
X        Y
20-25    2
25-30    14
30-35    29
35-40    43
40-45    33
45-50    9
Check your results using the assumed mean formula (or d1 coding).
Grouped data formula for the geometric and harmonic means

In this case the table used has the format shown below.
X       f      1/X    f(1/X) = f/X   Log X    f Log X
3       20     1/3    6.66           0.477    9.54
5       40     1/5    8.00           0.699    27.96
7       30     1/7    4.29           0.845    25.35
9       10     1/9    1.11           0.954    9.54
Total   100           20.06                   \(\sum f \log X = 72.39\)
In this case the frequency data computation formulae used are as follows. The harmonic mean is
\[ h_m = \frac{\sum f}{\sum \frac{f}{x}} = \frac{100}{20.06} = 4.98 . \]
In the case of \(g_m\),
\[ g_m = \text{anti-log}\left[ \frac{\sum f \log x}{n} \right] . \]
This formula may also be presented as
\[ g_m = 10^{\frac{\sum f \log x}{n}} . \quad \text{Thus, } g_m = 10^{\frac{72.39}{100}} = 5.295414984 \approx 5.295 . \]
Notice that, for the sake of accuracy, the f/x and f log x values may be accumulated directly on the calculator without rounding.
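The grouped harmonic and geometric means for the table above can be computed without intermediate rounding, as the note suggests. This is a sketch with illustrative names; the X values 3, 5, 7 and 9 are read from the 1/X column of the table.

```python
import math

x_values = [3, 5, 7, 9]
frequencies = [20, 40, 30, 10]
n = sum(frequencies)

# Harmonic mean: sum of f divided by sum of f/x.
sum_f_over_x = sum(f / x for f, x in zip(frequencies, x_values))
grouped_hm = n / sum_f_over_x

# Geometric mean: antilog (base 10) of sum(f * log10(x)) / n.
sum_f_log_x = sum(f * math.log10(x) for f, x in zip(frequencies, x_values))
grouped_gm = 10 ** (sum_f_log_x / n)

print(f"sum f/x = {sum_f_over_x:.4f}, harmonic mean = {grouped_hm:.3f}")
print(f"sum f log x = {sum_f_log_x:.4f}, geometric mean = {grouped_gm:.3f}")
```

The unrounded results differ slightly in the third decimal place from the tabulated ones, because the table rounds each logarithm and quotient before summing.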
Find the geometric and harmonic means of: X:
f:
2 3
10
20 40
5
50
30 25 20
Also, find the r m s of the same data.

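MEDIAN, MODAL, QUARTILE, PERCENTILE AND DECILE GROUPED DATA FORMULAE
1. Median:
\[ \tilde{x} = L_1 + \left( \frac{n/2 - (\sum f)_1}{f_{med}} \right) c . \]
Here \(L_1\) is the lower boundary of the median class, \((\sum f)_1\) is the cumulative frequency of the classes below it, \(f_{med}\) is its frequency and c is the class width. Note that (n+1)/2 is used for an odd sum of frequencies, \(\sum f\).
2. Modal value:
\[ \hat{x} = L_1 + \left( \frac{d_1}{d_1 + d_2} \right) c , \]
where \(d_1\) and \(d_2\) are the differences in frequency between the modal class and the classes immediately before and after it.
3. Quartile values:
\[ Q_1 = L_1 + \left( \frac{n/4 - (\sum f)_1}{f_{Q_1}} \right) c , \qquad Q_2 = L_1 + \left( \frac{2n/4 - (\sum f)_1}{f_{Q_2}} \right) c . \]
Write an equation for each of the following: \(Q_3\), \(P_4\) and \(D_{10}\). Note that the median formula is a guide or basis for writing these formulae. Also, the use of the (n+1)/2 rule on odd-size discrete data does not apply.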
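The grouped formulae above share one pattern: locate the class containing the target cumulative count, then interpolate within it. The sketch below applies that pattern to the frequency table used earlier for the assumed mean (classes 0-10 to 50-60). The function names are hypothetical, and equal class widths are assumed.

```python
def grouped_quantile(lower_bounds, frequencies, width, fraction):
    """L1 + ((fraction * n - cumulative f below the class) / f of the class) * c."""
    n = sum(frequencies)
    target = fraction * n
    cumulative = 0
    for lower, f in zip(lower_bounds, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


def grouped_mode(lower_bounds, frequencies, width):
    """L1 + d1 / (d1 + d2) * c, taking the first class with the top frequency."""
    i = frequencies.index(max(frequencies))
    before = frequencies[i - 1] if i > 0 else 0
    after = frequencies[i + 1] if i + 1 < len(frequencies) else 0
    d1 = frequencies[i] - before
    d2 = frequencies[i] - after
    return lower_bounds[i] + d1 / (d1 + d2) * width


lower_bounds = [0, 10, 20, 30, 40, 50]
frequencies = [12, 18, 27, 20, 17, 6]

median_value = grouped_quantile(lower_bounds, frequencies, 10, 0.5)
q1 = grouped_quantile(lower_bounds, frequencies, 10, 0.25)
q3 = grouped_quantile(lower_bounds, frequencies, 10, 0.75)
mode_value = grouped_mode(lower_bounds, frequencies, 10)

print(f"Q1 = {q1:.2f}, median = {median_value:.2f}, Q3 = {q3:.2f}, "
      f"mode = {mode_value:.3f}")
```

Deciles and percentiles follow by passing fractions such as 0.1 or 0.04, which is the sense in which the median formula is the basis for the others.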
Exercise
1. Compute the median and modal value of:
(i)
X         Y
90-100    9
80-89     32
70-79     43
60-69     21
50-59     11
40-49     3
30-39     1
(ii)
X: 10-19, 20-29, 30-39, 40-49, 50-59
Y: 10, 12, 20
2. Find \(Q_1\) and \(Q_3\) for the data in 1(i), and determine the 1st percentile of the same data.
3. Compute the missing frequencies and then the arithmetic mean of the data shown below.
Class: 0-10, 10-20, 20-30, 30-40, 40-
Freq.: 14, ?, 27, ?, 15
Let \(\sum f = 100\), \(\tilde{x} = 24\) and \(\hat{x} = 24\).
4. Given \(n_1 = 20\) and \(n_2 = 30\) with \(\bar{x}_1 = 64\) and \(\bar{x}_2 = 47\), find the combined mean of the two sets of values.
Measures of shape: Skewness and Kurtosis

These measures have to be mentioned at this point because they are defined using the measures of central tendency. They describe how close the shape of the data is to that of a symmetrical or normal distribution, which is usually bell-shaped. Statistically, a normal distribution is associated with normal or natural occurrences or events. For example, a distribution of the heights or weights of a large sample of people, when graphically presented, gives a normal curve. On the other hand, the shape of a set of measurements may depart from this curve, resulting in skewness or kurtosis. As such, skewness and kurtosis are described as the main measures of shape. The criterion for the significance of the departure from symmetry is the ratio of any of these measures to the respective standard error, SE, of the distribution. The most closely related SE is that of the distribution of the means, given as \(SE = \sigma / \sqrt{n}\), where \(\sigma\) is the parent population SD and n is the size of the sample used. It may also be given by \(SE = s / \sqrt{n}\), where s is the sample SD. The departure from symmetry is significant when the ratio \(s_k / SE\), or \(k / SE\), exceeds 1.96 in magnitude. The distribution is then also said to have significantly departed from the normal distribution trend.
Note that a distribution may have only one mode (unimodal), two (bimodal), more, or none. Give examples of each. The following are the formulae used in computing these measures and a detailed explanation of each of them.
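(i) Skewness
For a unimodal distribution, skewness is a departure from the normal distribution shape, or symmetry. The departure may result in positive or negative skewness, as shown below.
[Figure: three curves labelled "Positively skewed data", "Symmetrical data" and "Negatively skewed data". In the symmetrical case the mean \(\bar{x}\), median \(\tilde{x}\) and mode \(\hat{x}\) coincide; in each skewed case the median \(\tilde{x}\) is marked.]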
For a moderately skewed unimodal distribution the following empirical relation between the three averages (mean, mode and median) holds: Mean - mode = 3(mean - median). Thus, in moderate skewness the distance from the mean to the mode is usually three times the distance from the mean to the median. This relation is a basis for estimating any one of these averages whenever the other two are given. Estimation is based on making the required average the subject of the relation. The following is a list of the relationships arising from this exercise.
(i) \(\text{median} = \frac{2 \, \text{mean} + \text{mode}}{3}\),
(ii) \(\text{mode} = 3 \, \text{median} - 2 \, \text{mean}\),
(iii) \(\text{mean} = \frac{1}{2}\left(3 \, \text{median} - \text{mode}\right)\).
For example, the estimate of the mode of a distribution with a mean of 52 and a median of 54 is given by relationship (ii): \(\hat{x} = 3\tilde{x} - 2\bar{x} = 3 \times 54 - 2 \times 52 = 58\). Thus \(\hat{x} = 58\).
The obtained value and the other two can be used to sketch the skewness of the parent data. This is done by the use of the relative locations of the three averages on the real number line, as indicated in the skewness illustration above. In most cases, the mode is located at the hump or pile-up of the data and the median is usually in between the other two values.
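The three estimation relationships follow from the empirical relation mean - mode = 3(mean - median) and can be written as small helpers. This is a sketch; the function names are hypothetical.

```python
def estimate_mode(mean, median):
    """Relationship (ii): mode = 3 * median - 2 * mean."""
    return 3 * median - 2 * mean


def estimate_median(mean, mode):
    """Relationship (i): median = (2 * mean + mode) / 3."""
    return (2 * mean + mode) / 3


def estimate_mean(median, mode):
    """Relationship (iii): mean = (3 * median - mode) / 2."""
    return (3 * median - mode) / 2


# Distribution with a mean of 52 and a median of 54.
mode_estimate = estimate_mode(mean=52, median=54)

# Feeding the estimate back should recover the starting values.
median_check = estimate_median(mean=52, mode=mode_estimate)
mean_check = estimate_mean(median=54, mode=mode_estimate)

print(mode_estimate, median_check, mean_check)
```

Since the mean is then below the median, which is below the estimated mode, the relative locations indicate negative skewness.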
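Hence, skewness is measured by
\[ s_k = \frac{3(\text{mean} - \text{median})}{\text{s.d.}} \quad \text{or} \quad s_k = \frac{\bar{x} - \hat{x}}{\text{s.d.}} . \]
These are referred to as Pearson's coefficient of skewness formulae (1 & 2).
Alternatively,
\[ s_k = \frac{\sum z^3}{n} , \quad \text{where } z = \frac{x - \bar{x}}{s} , \]
or Bowley's formula,
\[ s_k = \frac{Q_1 + Q_3 - 2\tilde{x}}{Q_3 - Q_1} . \]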
(ii) Kurtosis or convexity of curve

This is a measure of the peakedness of the distribution, as discussed earlier under data organization. There are basically three types of kurtosis, namely leptokurtic, mesokurtic and platykurtic. Use a sketch to show each of them.
Kurtosis is defined in terms of quartiles and percentiles as follows:
\[ k = \frac{Q}{P_{90} - P_{10}} , \]
where Q is the semi-inter-quartile range given by \(Q = \frac{1}{2}(Q_3 - Q_1)\).
Alternatively,
3.
SOME APPLICATIONS OF THE AVERAGES

Arithmetic mean
Some of the applications of the arithmetic mean include:
a) Locating the centre of the data.
b) Summation mathematics and derivation of formulae.
c) Combining and correction of means.
The following is a detailed explanation of each of these applications.
a) Location of the centre of data
Of all the averages, the arithmetic mean, as indicated earlier, is the least affected by the fluctuation of sampling. For this reason it is called a stable or reliable average. It is seen as a dependable indicator of the location of the center of symmetrical data. It is also the best representative of the entire data. In a normal distribution, or symmetric data case, the mean is located at the centre of the array of the data and it coincides with the other averages, namely the median, \(\tilde{x}\); the mode; the 5th decile, \(D_5\); the 2nd quartile, \(Q_2\); and the 50th percentile, \(P_{50}\). This provides the percentile method of computing the averages, where \(P_i \equiv X^*_{i n}\) (as a location). E.g. \(Q_1 \equiv P_{25}\), so \(Q_1 = X^*_{0.25 n}\) (n is the size of the data).
It is therefore evident that:
- The mean can be calculated for any set of data, so it always exists. It is not like the mode, which does not exist when all the values of the data have the same frequency. This makes it a stable and reliable average.
- All the values are used in computing it, hence it is the best representative.
- A set of data has one and only one mean, so it is always unique.
Besides, the mean is the balance point of any given data, even when the data is skewed. This is confirmed by its characteristic of \(\sum d = 0\), discussed earlier.
b) Summation mathematics
The arithmetic mean has been found to be very useful in summation mathematics. This culminates in the main property of the mean, \(\sum (x - \bar{x}) = 0\), or simply \(\sum d = 0\). The following are some of the summation generalizations that have been found to be very useful in the derivation of statistical formulae: (i) \(\sum x = n\bar{x}\); (ii) \(\sum c x = c \sum x\); (iii) \(\sum (x \pm c)/n = \bar{x} \pm c\); (iv) \(\sum \frac{1}{c} x = \frac{\sum x}{c}\) (a division by a constant, c).
Secondly, the assumed mean has been found to be very useful in coding methods and in transformation and change of scale cases. Coding methods include the d1, d2 and d3 methods; see the coding methods in calculating the mean. Generally, transformation is described as follows: if \(Y = BX \pm X_0\), then \(\bar{Y} = B\bar{X} \pm X_0\) and \(\sigma_Y = B\sigma_X\).
Example: If the altitudes of four cities above sea level, in metres, are h = {4,100, 5,500, 6,900, 6,700}, compute the mean and SD of h in feet when the observation point is now at 4,000 metres above sea level. (5,905.51)
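The altitude example is a change of origin (subtracting 4,000 m) followed by a change of scale (metres to feet). The sketch below applies the transformation rule to the mean and SD and then checks it by transforming every value first. Two assumptions are made that the notes do not state: the conversion 1 ft = 0.3048 m, and the population form of the SD (`statistics.pstdev`).

```python
from statistics import mean, pstdev

METRES_TO_FEET = 1 / 0.3048   # assumed conversion factor (1 ft = 0.3048 m)

altitudes_m = [4100, 5500, 6900, 6700]
observer_m = 4000

# Y = B*X - X0, with B = METRES_TO_FEET and X0 = B * observer_m.
mean_ft = METRES_TO_FEET * mean(altitudes_m) - METRES_TO_FEET * observer_m
sd_ft = METRES_TO_FEET * pstdev(altitudes_m)   # the shift X0 does not affect the SD

# Cross-check: transform each altitude, then summarize.
relative_ft = [METRES_TO_FEET * (h - observer_m) for h in altitudes_m]
mean_check = mean(relative_ft)
sd_check = pstdev(relative_ft)

print(f"mean = {mean_ft:.2f} ft, SD = {sd_ft:.2f} ft")
```

Only the scale factor B carries over to the SD, because shifting every value by the same constant leaves the spread unchanged.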
c) Combining and correction of means
The arithmetic mean lends itself to further statistical treatments, like the combining of means and the correction of a calculated mean. The other averages are difficult to use in such cases. The following is a detailed account of each of these applications of the mean.
i. Combining the means
This application is useful when one has several mean values to combine into one average value, like combining the average wages of different branches of a given industry or company. The theory used is based on the definition of the arithmetic mean, \(\bar{x} = \frac{\sum x_i}{n}\), and the fact that \(\sum x_i = n\bar{x}\). Thus, the combined mean, \(\bar{x}\), is the sum of all the values of the k groups or samples, divided by their total number, \(n = n_1 + n_2 + n_3 + \cdots + n_k\).
Hence, in combining the means of two sets of data,
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} . \]
Illustration
The mean weight of 25 male students in a class is 64 kgs, and the mean weight of 35 female students in the same class is 58 kgs. Find the combined mean weight of the class.
Solution
Let the sizes of the two genders be \(n_1 = 25\) and \(n_2 = 35\), and their average weights be \(\bar{x}_1 = 64\) and \(\bar{x}_2 = 58\).
Using the combined mean formula, we have
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{25 \times 64 + 35 \times 58}{25 + 35} = \frac{3{,}630}{60} = 60.5 \text{ kgs} . \]
Thus, the combined mean of the class is 60.5 Kgs.
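The combined mean generalizes directly from two groups to k groups. The sketch below uses a hypothetical helper, `combined_mean`, and checks it against the class illustration.

```python
def combined_mean(sizes, means):
    """Sum of n_i * mean_i over all groups, divided by the total size."""
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# 25 male students averaging 64 kgs and 35 female students averaging 58 kgs.
class_mean = combined_mean([25, 35], [64, 58])
print(f"combined mean weight = {class_mean} kgs")

# Three groups work the same way: sizes 300, 200, 600 with means 16, 8, 4.
three_group_mean = combined_mean([300, 200, 600], [16, 8, 4])
print(f"three-group combined mean = {three_group_mean}")
```

This is the weighted mean again, with the group sizes acting as the weights.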
ii. Correction of the mean value

The correction of the value of the calculated mean is an application that is used when incorrect values are discovered to have been used in the computation. In this case, instead of re-doing the entire calculation, the sum of the correct values, \(\sum v_c\), is added to the original summation of the values, \(\sum x = n\bar{x}\), followed by a subtraction of that of the incorrect values, \(\sum v_{\tilde{c}}\). This provides the corrected summation of the values, \(\sum x_c\). Thus,
\[ \sum x_c = n\bar{x} + \sum v_c - \sum v_{\tilde{c}} . \]
After finding this value the corrected mean is calculated using the definition of the mean as
\[ \bar{x}_c = \frac{\sum x_c}{n} . \]
Illustration
In computing the average price at which 200 items were sold by a vendor, it was initially found to be Ksh. 40. It was later discovered that during the summation of the data the prices 43 and 35 had been misread as 34 and 53. Use this information to compute the corrected mean of the data.
Solution
Let the number of items be n = 200 and the mean price \(\bar{x} = 40\).
Using the correction of the summation formula above,
\[ \sum x_c = 200 \times 40 + (43 + 35) - (34 + 53) = 7{,}991 . \]
The corrected mean is given as
\[ \bar{x}_c = \frac{\sum x_c}{n} = \frac{7{,}991}{200} = \text{Ksh. } 39.955 . \]
The corrected mean price of the items is Ksh. 39.955.
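The correction procedure can be sketched as a single function that rebuilds the corrected sum from the reported mean. `corrected_mean` is a hypothetical helper name.

```python
def corrected_mean(n, reported_mean, correct_values, incorrect_values):
    """Add the correct values to n * mean, subtract the misread ones, divide by n."""
    corrected_sum = n * reported_mean + sum(correct_values) - sum(incorrect_values)
    return corrected_sum / n


# 200 items with a reported mean of Ksh. 40; prices 43 and 35 misread as 34 and 53.
price_mean = corrected_mean(200, 40, correct_values=[43, 35],
                            incorrect_values=[34, 53])
print(f"corrected mean price = Ksh. {price_mean:.3f}")
```

The sample size n is unchanged here because values were misread, not added or dropped; a different n would be needed if observations had been omitted.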

Exercise
1. The numbers of the new Nokia 5130 sold in three separate weeks by a certain Safaricom shop were 475, 310 and 420, at average prices of Ksh. 2,200, 2,500 and 3,000 respectively. Find the combined average price of this mobile. (2,556.02)
2. Students' average scores in two sections of the Mth 2210 course, with class sizes of 32 and 48, were 78 and 84 points respectively. What is the overall average score of the two sections of the course? (81.6)
3. The average weekly wage in a certain firm is Ksh. 5,200 for male workers and Ksh. 4,200 for female workers. The average wage for all employees is Ksh. 5,000. Use this data to find the ratio of male to female employees. (4:1)
4. A distribution consists of three components with frequencies of 300, 200 and 600, and means of 16, 8 and 4 respectively. Find the combined mean of the three components. (8)
5. The mean weight of 150 students in a given course is 55 Kg. The mean weight of the male students is 70 Kg. Find the mean weight of the female students. Hence, find the ratio of the two genders. ( 60, )
7. The average time IT students spent in the computer lab in two days is 5 hrs. The rest of the students spent 4 hrs during the same time. If the combined average stay of these students in the lab is 5 hrs, find the ratio of IT students to the rest of the students. (1:1)
8. (a) Two samples of sizes 60 and 40, with means of 3 and 5, are put together to see what their combined mean would be. Find the combined mean of the two samples. (3.8)
(b) It is later realized that the means 3 and 5 were not the correct means of the two samples, and that the correct means were 6 and 4 respectively. Compute the correct combined mean of the samples. (5.2)
9. (a) In a review of some data of size \(n_1 = 50\) and mean \(\bar{x}_1 = 30\), it was discovered that two of the values used, 19 and 18, had by mistake been entered as 16 and 28 respectively. Compute the corrected mean of this data. (29.86)
(b) This data was combined with that of a second sample of size \(n_2 = 60\) and mean \(\bar{x}_2 = 25\). What is the combined mean of the two samples? (27.21)
Applications of the median, the mode and skewness
Apart from indicating the centre of data, these statistics have the following uses.
a) The median
(i) Estimating the mean and modal value of the data
The estimation is based on the moderate skewness relationships between the three averages, discussed earlier under skewness.
As mentioned earlier, for naturally occurring events, whose distribution is normal or bell-shaped, the respective averages coincide. For instance, the median, mode, quartile 2, \(Q_2\), decile 5, \(D_5\), and percentile 50, \(P_{50}\), coincide with the mean. They are said to be equivalent (\(\equiv\)). However, for a skewed distribution they do not coincide. Instead the mean, median and mode are related by the respective equations derived earlier under skewness. Hence, estimation of one of the averages, given the other two, is easily done using the relevant estimation relationship. For example, given that the mean is 27 and the mode is 30 units, the median is estimated using the relationship
\[ \tilde{x} = \tfrac{2}{3}\bar{x} + \tfrac{1}{3}\hat{x} = \tfrac{2}{3} \times 27 + \tfrac{1}{3} \times 30 = 28 . \]
The estimated median is \(\tilde{x} = 28\). Using the obtained results, the skewness of the data is described as negative.
(ii) The median is a reference for commenting on skewness
In using the obtained averages to comment generally on the skewness of some given data, the median is used as a reference. In doing so it takes the place of zero, or the origin, on the number line, such that a value that is located above it is said to be on the higher side and vice-versa.
For example, for a positive skewness where the mean, median and mode are given as 27.6, 26.03 and 23.31 units, the related comments on the skewness may be as follows.
(i) The mean of the data used is slightly on the higher side, because its value (27.6) is slightly greater than that of the median of 26.03 units.
(ii) Using the mode value of 23.31, it is evident that most of the values used are on the lower side, because the mode value is less than that of the median.
The interpretation of this, in terms of salary, would be that most of the workers are underpaid, but on the average they are overpaid. This is attributed to a few of the workers being slightly overpaid. Draw a skewness sketch that clearly communicates these conclusions.
b) The mode
The mode is mainly used to indicate the value or class that occurs most frequently. It is not based on all the values in an observation and it is not affected by extreme values. It can also be obtained graphically. But it is not capable of further mathematical treatment, apart from the estimation of the mean or the median given the other averages.
It should be noted that the quartiles, deciles and percentiles are mainly used in dividing the data into the required portions. Statistics like kurtosis are also defined in terms of quartiles and percentiles, as discussed earlier under measures of shape.
c) Skewness and kurtosis
These are measures of shape that are used to describe, collectively or in summary, the relative locations of the averages involved. Skewness indicates the locations of the mean, median and mode, while kurtosis has to do with the semi-inter-quartile and percentile ranges. For example, negatively skewed data has the mode on the higher side and the mean (average) on the lower side, and vice-versa. On the other hand, leptokurtic data has shorter ranges than platykurtic data.
Exercise
(a) List the merits of using skewness to give a summary description of data over the use of the related averages.
(b) Describe wages or salaries that have a mesokurtic shape.
(c) Describe the locations of the averages associated with skewness. Use a sketch to show their locations.
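A SUMMARY OF MEASURES OF CENTRAL TENDENCY
The summary in tabular form is as shown below. The examples of computations use the values 1, 2, 3 and 4 (discrete data case).
Category: Most popular data representative
  List: Arithmetic mean and other means
  Examples of computations: \(\bar{x} = 2.5\), \(h_m = 1.92\), \(g_m \approx 2.2134\).
Category: Positional measure
  List: Median and Mode
  Examples of computations: Median = 2.5; Mode = ? (none). For grouped data cases, \(\tilde{x} = L_1 + \left( \frac{n/2 - (\sum f)_1}{f_{med}} \right) c\).
Category: Data partitioner
  List: Quartiles, Deciles and Percentiles
  Examples of computations: \(Q_1 \equiv P_{25}\). Using \(P_i \equiv X^*_{i n}\), \(Q_1 = X^*_{0.25 \times 4} = X^*_1\), taken as \((X^*_1 + X^*_2)/2 = 1.5\).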
