Ines Descriptive Statistics Level I Asta 2010

INES- RUHENGERI
Faculty of Fundamental Applied sciences

Department of Applied Statistics

LECTURE NOTES
DESCRIPTIVE STATISTICS I

LEVEL I APPLIED STATISTICS, 2010
BY Ir. DANCILLE NYIRARUGERO, Tutorial Assistant
DESCRIPTIVE STATISTICS I
COURSE OBJECTIVE
At the end of this course, students must be knowledgeable about the vocabulary, concepts
and statistical procedures used in statistical studies.
Students may be called on to conduct research in their fields, and statistical procedures
are basic to research. To accomplish this, they must be able to collect, organize,
analyze, summarize and present data, and to communicate the results of the study in
their own words. Students must also be able to determine measures of central tendency,
measures of dispersion and measures of position.
COURSE CONTENTS
Chapter 1: Introduction, definitions and statistics vocabulary;
Chapter 2: Frequency distributions and graphs: organizing data, histograms,
frequency polygons and ogives, other types of graphs;
Chapter 3: Data description: measures of central tendency, measures of
dispersion (variation), measures of position;
Chapter 4: Exploratory data analysis: box plot, moments, skewness and
kurtosis, contingency table, presentation and charts.
CHAPTER 1: INTRODUCTION, DEFINITIONS AND STATISTICS VOCABULARY
1.1 Introduction
Statistics refers to the collection, organization, presentation, analysis and interpretation
of numerical data to make inferences and reach decisions in all branches of economics,
business, medicine, and other social and physical sciences.
A. Definition: the Meaning of Statistics
The word statistics has two meanings:
1. In the plural sense, statistics is considered as a numerical description of the quantitative
aspect of things. It stands for numerical facts pertaining to a collection of objects.
2. In the singular sense, statistics means the science of collection, organization,
presentation, analysis and interpretation of numerical data to assist in making
more effective decisions.
The term statistics is thus used to mean either statistical data or statistical methods.
When it is used in the sense of statistical data, it refers to the quantitative aspect of things
and is a numerical description.
Not all data are statistics: data must fulfil certain essential characteristics to be called
statistics.
B. Branches of Statistics
Statistics is subdivided into two branches: descriptive and inductive or inferential.
(i) Descriptive statistics consists of the collection, organization, summarization,
and presentation of data in various forms such as tables, graphs and diagrams
or using a numerical summary. The purpose of descriptive statistics is to
display and pass on information from which conclusions can be drawn and
decisions made. Businesses, for example, use descriptive statistics when
presenting their annual accounts and reports.
(ii) Inferential or inductive statistics consists of generalizing from samples to
populations, performing estimations and hypothesis tests, determining
relationships among variables, and making decisions.
Diagram: Branches of statistics. Describing data: numerical summaries, visual display. Making inferences from samples: estimating parameters, testing hypotheses.
1.2 Characteristics of statistics
Statistics have the following characteristics:
A. Statistics means an aggregate of facts.
Facts can be analyzed only when there is more than one of them; a single fact cannot be
analyzed.
Example: the weights of the 60 students of a class can be statistically analyzed, but the
weight of one student cannot be called statistics.
Hence, only a collection of many facts can be called statistics.
B. Statistics are affected to a marked extent by a multiplicity of causes.
The facts are the results of the action and interaction of a number of factors.
C. Statistics are numerically expressed.
Only numerical facts can be statistically analyzed. Therefore, a statement such as "prices
decrease with increasing production" cannot be called statistics.
D. Statistics are enumerated or estimated according to reasonable standards of
accuracy.
The facts should be enumerated or estimated with the required degree of accuracy. The
degree of accuracy differs from purpose to purpose.
E. Statistics are collected in a systematic manner.
The facts should be collected according to planned and scientific methods. Otherwise,
they are likely to be wrong and misleading.
F. Statistics are collected for a predetermined purpose.
There must be a definite purpose for collecting facts. Otherwise, the facts become useless
and hence, they cannot be called statistics.
G. Statistics are placed in relation to each other
The facts must be placed in such a way that a comparative and analytical study becomes
possible. Thus, only related facts which are arranged in logical order can be statistics.
1.3 Functions of statistics
The following are the six important functions of the science of statistics:
i) To present facts in a precise and definite form (this helps proper comprehension
and avoids ambiguity).
ii) To simplify a mass of figures (i.e., to condense the mass of data).
iii) To facilitate comparison (by furnishing suitable devices). Statistics adds
precision to thinking and helps in comparing different sets of figures. For example,
the imports and exports of a country may be compared with each other, or with
those of another country.
iv) To help in the formulation and testing of hypotheses (by appropriate statistical tools).
v) To help in framing suitable policies and plans (i.e., in making predictions).
Planning and policy making by the government are based on statistics of production,
demand, etc. Statistics indicates trends and tendencies, and knowledge of trends and
tendencies helps future planning.
vi) To help in the formulation of policies (i.e., to provide the basic material).
Statistics helps in studying the relationship between different factors; for example,
statistical methods may be used to study the relation between the production and the
price of commodities.
Limitations of statistics
Statistics deals with only those subjects of inquiry which are capable of being
quantitatively measured and numerically expressed.
This is an essential condition for the application of statistical methods.
1.4. Origin of statistics
The term statistics is linked to the notion of the State: it comes from the Latin status,
which gave the Latin word statisticum. Statisticum was the activity of collecting data that
helped the government to know the State's income and possessions.
The history of statistics shows that the first census was carried out in the Sumerian
kingdom (Babylon) around 3000 BC. In 2238 BC, an agricultural survey was carried out in
China by King Yao. In 2500 BC, data were collected in Egypt for taxation.
Statistics originated from two quite dissimilar fields: games of chance and political
states. These two fields correspond to two disciplines:
(1) primarily analytical;
(2) secondarily, essentially descriptive.
Some of the pioneers of statistics include Pascal (1623-1662) and Bernoulli (1654-1705).
As regards the descriptive side of statistics it may be stated that statistics is as old as
statecraft.
Since time immemorial, people must have been compiling information about wealth and
manpower for purposes of peace and war. This activity expanded considerably at each
upsurge of social and political development and received added impetus in periods of
war.
The development of statistics can be divided into three stages: the empirical stage
(down to 1600), the comparative stage (1600-1800) and the modern stage (1800 to the
present day).
It has now become a useful tool and statistical methods of analysis are now being
increasingly used in biology, psychology, education, economics and business.
1.5. Statistics vocabulary
Subject or individual: an item for study;
Population or universe: all the subjects (the totality of all observations) that are
being studied;
Statistical units: the individual subjects or objects upon which the data
are collected;
Raw data: data as collected, before they have been organized numerically;
Array: an arrangement of raw numerical data in ascending or descending order of
magnitude;
Frequency: the number of values in a specific class of the distribution;
Variable: a characteristic of the subject or individual which varies from
unit to unit. Examples: height, weight, age, etc.

1.6. Types of variables
There are two main types of variables: qualitative and quantitative.
A. Qualitative variable
A qualitative variable is one that, generally, cannot be expressed in numbers. It is an
attribute, and is descriptive in nature.
Examples: sex (male or female), state of birth, cause of death, religion (Catholic,
Protestant, etc.).
When the data are qualitative, we are usually interested in how many observations, or what
proportion of them, fall in each category. Qualitative data are often summarized in charts
and bar graphs.
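As a minimal sketch in Python, the count and the proportion of observations in each category can be obtained with `collections.Counter`. The marital-status values below are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical qualitative data: marital status of 10 people (illustrative only)
status = ["single", "married", "married", "widowed", "single",
          "married", "married", "single", "married", "widowed"]

counts = Counter(status)                  # how many observations in each category
n = len(status)
proportions = {category: c / n for category, c in counts.items()}

for category, c in counts.most_common():  # most frequent category first
    print(f"{category:<8} {c:>2}  {proportions[category]:.2f}")
```

Only counting and dividing is done here; no arithmetic is performed on the category labels themselves.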
B. Quantitative variable
A quantitative variable is numerical and can be ordered or ranked.
Example: level of hemoglobin in the blood; age; heights, weights; body temperatures;
the number of children in a family.
A quantitative variable can be a discrete variable or a continuous variable.
Discrete variables assume values that can be counted and represented by an integer
such as 1, 2, 3, etc.
Example: number of children in a family, the number of rooms in a house, number of
patients in a hospital, etc.
Continuous variables can assume all values between any two specific values (within
an interval). They are obtained by measuring (ex: heights, weights, age, level of
protein in blood, etc.)
Figure 1.1: Summary of the types of variables
Qualitative: gender, color, marital status.
Quantitative, discrete: number of children in a family, cows on a farm, patients in a hospital.
Quantitative, continuous: age, weight, height, time.
1.7 Levels of Measurement
Data can be classified according to levels of measurement. The level of measurement of
the data often dictates the calculations that can be done to summarize and present the
data. There are four levels of measurement: Nominal, Ordinal, Interval and Ratio.
a) Nominal-level data or nominal measurement
From the Latin nomen, meaning name, nominal data are the same as qualitative, attribute,
categorical or classification data.
At the nominal level, the data are classified into categories that cannot be arranged in
any particular order. Examples: gender, eye color, religious affiliation, marital status.
Nominal level variables must be: mutually exclusive and exhaustive.

- Mutually exclusive means an individual or object is included in only one category.
- Exhaustive means each individual or object must appear in a category.
To summarise, the nominal-level data have the following properties:
Data categories are mutually exclusive and exhaustive.
Data categories have no logical order.
Example: list of jobs in Rwanda, consumption in Rwanda
We usually code nominal data numerically. However, the codes are arbitrary
placeholders with no numerical meaning, so it is improper to perform mathematical
analysis on them.
Example: coding "yes" as 1 and "no" as 2.
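A small Python sketch, with hypothetical answers, shows why arithmetic on nominal codes is improper: swapping the arbitrary codes changes the "average code", whereas counting the categories gives the same result under either coding.

```python
from collections import Counter
from statistics import mean

answers = ["yes", "no", "yes", "yes", "no"]   # hypothetical survey answers

coding_a = {"yes": 1, "no": 2}   # the coding used in the text
coding_b = {"yes": 2, "no": 1}   # an equally valid, arbitrary relabelling

codes_a = [coding_a[x] for x in answers]
codes_b = [coding_b[x] for x in answers]

# The "mean code" depends on the arbitrary coding, so it carries no meaning
print(mean(codes_a), mean(codes_b))    # 1.4 1.6

# Counting the categories (the most frequent one) does not depend on the coding
print(Counter(answers).most_common(1))
```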
b) Ordinal-level data involves data arranged in some order, but the differences
between data values cannot be determined.
Example 1: when grading a student's dissertation, we can have the ratings superior, good,
average, poor and inferior.
The data classifications are mutually exclusive and exhaustive.
Data classifications are ranked or ordered according to the particular trait they
possess.
c) Interval-level data or interval measurement
This kind of data is acquired through a process of measurement in which equal measuring
units are employed: the step in magnitude from one measurement to the next one above or
below it is the same throughout the scale.
Interval data have all the characteristics of nominal and ordinal data; the only difference
is that the scale of measurement moves uniformly in equal intervals, and its values, being
real numbers, can show several decimal places. The zero point of an interval scale is arbitrary.
Example: temperature, shoe size.
Data classifications are mutually exclusive and exhaustive.
Data classifications are ordered according to the amount of characteristic they
possess.
Equal differences in the characteristic are represented by equal differences in the
measurements.
d) Ratio-level data or ratio measurement: practically all quantitative data are at the ratio
level of measurement. The ratio level is the highest level of measurement. It has all the
characteristics of the interval level, but in addition the zero point is meaningful and the
ratio between two numbers is meaningful. Examples: wages, weight, etc.
Data classifications are mutually exclusive and exhaustive.
Data classifications are ordered according to the amount of the characteristics
they possess.
Equal differences in the characteristic are represented by equal differences in the
numbers assigned to classifications.
The zero point is the absence of the characteristic.
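The difference between the interval and ratio levels can be illustrated with a short Python sketch. It uses the standard Celsius-to-Fahrenheit formula, an approximate kilograms-to-pounds factor and illustrative values: a ratio of temperatures changes with the unit, while a ratio of weights does not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval level: temperature. The zero point is arbitrary.
t1, t2 = 10.0, 20.0                        # degrees Celsius
print(t2 / t1)                             # 2.0 on the Celsius scale ...
print(c_to_f(t2) / c_to_f(t1))             # ... but 68 / 50 = 1.36 in Fahrenheit

# Differences are preserved: a 10 degree C gap is always an 18 degree F gap.
print(c_to_f(t2) - c_to_f(t1))             # 18.0

# Ratio level: weight. Zero means "no weight", so ratios do not depend on the unit.
KG_TO_LB = 2.20462                         # approximate kilograms-to-pounds factor
w1, w2 = 10.0, 20.0                        # kilograms
print(w2 / w1)                             # 2.0 in kilograms
print((w2 * KG_TO_LB) / (w1 * KG_TO_LB))   # also 2.0 in pounds (up to floating-point rounding)
```

So "20 degrees is twice as hot as 10 degrees" is not a meaningful statement, whereas "20 kg is twice as heavy as 10 kg" is.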

Figure 1.2: Summary of the characteristics of the levels of measurement

Level      Characteristic                                    Example
Nominal    Data may only be classified                       Type of residence (rural, urban)
Ordinal    Data are ranked                                   Rank in class
Interval   Meaningful difference between values              Temperature
Ratio      Meaningful zero point and ratio between values    Number of patients

1.8 Statistical Method
The following procedure may be adopted with advantage:
Collect data: information should be collected regarding the phenomenon under study;
Organize the data obtained;
Present this information by means of diagrams or other visual aids;
Analyze the data to determine the average and the extent of the disparities that
exist;
Interpret the facts in order to understand the phenomenon.
All this leads to a policy decision for improving the existing situation.
1.9 Collection of data
Statistics is concerned with the analysis of numerical data, so the first stage in the statistical
method must be the collection of the data to be analyzed. Data can be collected in two
ways: as primary data or as secondary data.
a) Primary data
Primary data are data collected by the investigator himself with a specific
objective. This means that primary data are original in character. Sources of primary data
are either censuses or samples.
Census
A census is a survey which examines every item of the population.
Three important official censuses are the population census, the census of distribution
and the census of production.
A census has the advantages of completeness and of being accepted as representative,
but, of course, it must be paid for in terms of manpower, time and resources.
Sample
A sample is a relatively small subset of a population. Its advantage over a census is that
cost, time and resources are much lower. A sample is used when it is impossible or
impractical to observe the entire group or population. The main disadvantage is that
sample results may be less readily accepted by laypeople.
b) Secondary data
Secondary data are data that have already been collected by some other investigator or
agency and are used by an investigator for his own purpose.
As far as the investigator is concerned, the data he uses is from a secondary source, that
is, he did not collect it.
The prime example of secondary data is the official statistics that are published by the
Government: Financial statistics, Economic trends, etc.
The advantages of using secondary data are savings in time, manpower and resources in
sampling and data collection.

The dangers of secondary data
If we have to use secondary data, there are dangers to be aware of:
(i) The data available may not be very up-to-date.
(ii) We do not necessarily know how the data were collected and analyzed or for
what reason. They may be biased because of poor collection techniques or
simply because they were collected for a different purpose.
(iii) We may not be able to find a complete set of data for our purposes in one
place. This could mean we would have to collate data from several sources
with the chance of making errors while doing so. Obtaining the data from
more than one source may also compound the chances of bias discussed in (ii).
(iv) There is the distinct possibility of transcription or printing errors in published
data.
If you are using secondary data to support arguments in reports, articles or essays, it is
advisable to try to find out more about how the data were collected and analyzed, and why
they were collected. In particular, before using secondary data it is necessary to scrutinize
them in the light of the following points:
(i) The type and purpose of the institution that publishes statistics as a routine;
(ii) The purpose for which the data are issued and the consumers to whom they
are addressed;
(iii) The nature of the data themselves. Are the data biased? Are they based on a
sample only, or on a complete enumeration?
(iv) In what types of units are the data expressed? Are they the same at different
times, at different places, and for all cases at the same time or place?
(v) Are the data accurate?
(vi) Do the data refer to homogeneous conditions?
(vii) Are the data germane to the problem under study?
1.10 Misuse of statistics
The figures themselves cannot mislead, but the statisticians who present the figures
certainly can.
Data can be misused in the following ways:
(i) They can be used for the wrong purpose, that is, one that is different from the
purpose for which they were collected.
(ii) They can be collected incorrectly, so that they are biased.
(iii) They can be analyzed carelessly so that the results obtained from them are
misleading.
1.11. Data classification
The data collected from the sample is generally referred to as the raw data, because it is
not arranged and organized into any format. Raw data conveys very little information to
the investigator or to anyone interested in that investigation. Therefore, the mass of
numbers must be classified.
Classification is the process of arranging the available facts into groups or classes
according to their resemblances, affinities and other relationships.
The main objectives of classifying data are:
1. To condense the mass of data into a concise format;
2. To bring out the relevant points of similarity and dissimilarity, and thus
facilitate comparison;
3. To make the statistical treatment of the data easy.
Types of classification
Generally, classification of data may be of the following types: spatial or geographical,
temporal or chronological, qualitative, and quantitative.
Spatial or geographical classification: this classification is based on space, that is,
geographical locations. For example, data on human population may be classified on the
basis of different continents or countries or states of a country or districts of a state or
towns and villages of a district.
Temporal or chronological: data are arranged on the basis of time (years, months, days,
hours, minutes and seconds).
Qualitative classification: data are classified on the basis of quality or attribute such as
sex, colour, behaviour, religion, marital status, literacy, etc.
Quantitative classification: the data are classified according to some variable
(characteristic) that can be measured, such as height, weight, etc. In this type of
classification there are two elements: the variable and the frequency.
Classification of units on the basis of one variable is called simple or one-way
classification.
Simultaneous classification of units on the basis of two variables is called two-way
classification. A table that presents a two-way classification is called a contingency
table.
CHAPTER 2: FREQUENCY DISTRIBUTIONS AND CHARTS
2.1 FREQUENCY DISTRIBUTIONS
2.1.1. Introduction
After collecting the data, the researcher must organize and present them so they can be
understood by those who will benefit from reading the study. The most convenient
method of organizing data is to construct a frequency distribution.
The most useful method of presenting the data is by constructing charts and graphs.
This chapter describes how to organize data by constructing frequency distributions and
how to present the data by constructing charts and graphs. The charts and graphs
illustrated here are histograms, frequency polygons, ogives and pie graphs.
2.1.2 Organizing Data
Before the data obtained from a statistical survey or investigation have been worked on,
they are called raw data. Little information can be obtained from looking at raw
data.
The following table gives an example of a set of raw data.
Table 2.1 Marks in Statistics obtained by 20 Students of Level I STEA in 2004
Data as originally collected
15 18 7 12 17 9 13 14 12 14
16 11 10 8 9 16 13 14 10 8
In order to make the data easily understandable, the first task of the researcher is to
prepare an array. The array is prepared by arranging the values of the variable in
ascending or descending order. A data array gives a general idea of the distribution.
Example: the raw data of Table 2.1 have been arrayed and are shown in Table 2.2.
Table 2.2 Raw data of Table 2.1 put into an array
7 8 8 9 9 10 10 11 12 12
13 13 14 14 14 15 16 16 17 18
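Preparing an array is simply a sort. A minimal Python sketch using the built-in `sorted` on the marks of Table 2.1:

```python
# Marks of Table 2.1, in the order they were collected
raw = [15, 18, 7, 12, 17, 9, 13, 14, 12, 14,
       16, 11, 10, 8, 9, 16, 13, 14, 10, 8]

ascending = sorted(raw)                  # the array of Table 2.2
descending = sorted(raw, reverse=True)   # an array may also be in descending order

print(ascending)
print("lowest:", ascending[0], "highest:", ascending[-1])   # lowest: 7 highest: 18
```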
From this table, the highest and lowest marks are immediately seen and the marks
which occur most frequently are readily identified.
After arranging the data, their bulk must be condensed, reduced, and simplified so
that the mind comprehends them easily. A first step in such a condensation would be
achieved by representing the repetitions of a particular value of observation by tallies
instead of rewriting the value itself.
The number of tallies corresponding to any given value is the frequency of that
value, usually represented by the letter f. Frequency thus means the number of
times a certain value of the variable is repeated in the given data. A table so formed
is known as a frequency distribution.
In other words, a frequency distribution is the organization of raw data in table form,
using classes and frequencies.
Statistical table
A statistical table presents numerical data in columns and rows. The main object of a
statistical table is to arrange the physical presentation of numerical facts so that the
attention of the reader is automatically directed to the information. Some advantages of
statistical tables are:
Tabulated data can be more easily understood than facts stated in the form of
descriptions;
They facilitate quick comparison;
They leave a lasting impression;
They make the summation of items and the detection of errors and omissions
easier;
A tabular arrangement makes it unnecessary to repeat explanations,
phrases and headings;
All unnecessary details and repetitions are avoided.
2.1.3. Types of frequency distributions
There are two types of frequency distributions: the simple frequency distribution (or
one-way table) and the grouped frequency distribution.
A. Simple frequency distributions
A simple frequency distribution consists of a list of data values, each showing the
number of items having that value.
a) Quantitative

Variable X    Frequency n_i
x_1           n_1
...           ...
x_i           n_i
...           ...
x_k           n_k
Total         N
Example: the following are the marks obtained by 18 students of a class in a probability
exam in 2005:
16, 14, 5, 8, 15, 15, 9, 12, 10, 9, 11, 11, 10, 17, 12, 10, 14, 5
Table 2.3. Frequency distribution of the marks obtained by 18 students

Marks x_i        5    8    9    10   11   12   14   15   16   17   Total N
Tally marks      II   I    II   III  II   II   II   II   I    I
Frequency n_i    2    1    2    3    2    2    2    2    1    1    18

A tally chart is used to record the occurrence of repeated values systematically.
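A minimal Python sketch of building a simple frequency distribution: `collections.Counter` plays the role of the tally chart, counting how often each mark occurs in the 18 raw marks listed above.

```python
from collections import Counter

# Marks of the 18 students in the probability exam (2005), as listed above
marks = [16, 14, 5, 8, 15, 15, 9, 12, 10, 9, 11, 11, 10, 17, 12, 10, 14, 5]

freq = Counter(marks)          # n_i: how many times each mark x_i occurs
N = sum(freq.values())         # total number of observations

print("x_i  tally  n_i")
for x in sorted(freq):
    print(f"{x:>3}  {'I' * freq[x]:<5}  {freq[x]:>3}")
print("Total", N)
```

Values that never occur (such as 13 here) simply do not appear in the counter.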
b) Qualitative
Example: the exercise consists of finding the number of students in Level I Statistics in
2010 according to their sex. There are 45 students in Level I STA; sex is coded
G for girl and B for boy.
Table 2.4. Distribution of 45 students in Level I STA according to their sex in
2009.

Sex      Tally marks    Frequency n_i
B
G
Total                   45
To convert a frequency distribution into a relative frequency distribution, each of the
frequencies is divided by the total number of observations.
When a relative frequency is multiplied by 100, it gives a percentage; the result is a
percentage distribution.
A cumulative frequency distribution is used when we require information on the number of
observations whose value is less than or equal to a given value. It is obtained by adding
the frequencies cumulatively, from the smallest value upwards.
Cumulative distributions may be constructed for relative frequencies and percentages by
adding either the relative frequencies or the percentages cumulatively, as is done
for absolute frequencies.
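$$\text{relative frequency: } f_i = \frac{n_i}{N}, \qquad \sum_i f_i = 1$$
$$\text{percentage: } f_i\% = \frac{n_i}{N} \times 100, \qquad \sum_i f_i\% = 100$$
$$\text{cumulative frequency: } N_i = \sum_{p=1}^{i} n_p = N_{i-1} + n_i$$
$$\text{cumulative relative frequency: } F_i = \sum_{p=1}^{i} f_p = F_{i-1} + f_i$$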
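Example 2.3: relative and cumulative frequencies of the marks in Table 2.3

Marks x_i                  5      8      9      10     11     12     14     15     16     17     Total
Frequency n_i              2      1      2      3      2      2      2      2      1      1      18
Relative frequency f_i     0.111  0.056  0.111  0.167  0.111  0.111  0.111  0.111  0.056  0.056  1
Cumulative frequency N_i   2      3      5      8      10     12     14     16     17     18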
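The formulas for f_i, f_i% and N_i can be applied directly in a short Python sketch. It recomputes everything from the 18 raw marks of the probability exam; `itertools.accumulate` produces the running sums N_i = N_{i-1} + n_i and F_i = F_{i-1} + f_i.

```python
from collections import Counter
from itertools import accumulate

marks = [16, 14, 5, 8, 15, 15, 9, 12, 10, 9, 11, 11, 10, 17, 12, 10, 14, 5]
freq = Counter(marks)
values = sorted(freq)                 # x_i in ascending order
n = [freq[x] for x in values]         # n_i
N = sum(n)

f = [ni / N for ni in n]              # relative frequency f_i = n_i / N
pct = [100 * fi for fi in f]          # percentage f_i% = 100 * f_i
cum = list(accumulate(n))             # cumulative frequency N_i
cum_f = list(accumulate(f))           # cumulative relative frequency F_i

for row in zip(values, n, f, pct, cum):
    print("{:>3} {:>3} {:.3f} {:5.1f} {:>3}".format(*row))
```

The last cumulative frequency always equals N, and the last cumulative relative frequency equals 1 (up to floating-point rounding).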

B. Grouped frequency distribution
When the number of distinct data values in a set of raw data is large (more than about 20), a
simple frequency distribution is not appropriate, since it gives too much
information to be easily assimilated.
In this case, a grouped frequency distribution is used. A grouped frequency distribution
organizes data items into groups or classes of values, each showing how many items have
values within the group; this number is known as the class frequency. The number of
classes is usually between 5 and 15.
Definitions associated with frequency distribution classes
a) Class limits : are the lower and upper values of the classes;
b) The lower class limit represents the smallest data value that can be
included in the class;
c) The upper class limit represents the largest data value that can be
included in the class;
d) Class boundaries: the lower and upper values of a class that mark the common
points between adjacent classes. Boundaries are used when the classes are written as
closed (inclusive) intervals, so that no gap is left between one class and the next;
e) Class width (or length): the difference between the upper and lower class
boundaries. If all the class intervals of a frequency distribution have equal widths,
this common width is denoted by C; in that case C is equal to the difference between
two successive lower class limits or two successive upper class limits.
Class width = upper boundary - lower boundary;
f) Class mark or class midpoint: the class midpoint X_m is obtained by adding the lower
and upper class limits and dividing by 2, or by adding the lower and upper boundaries and
dividing by 2:
$$X_m = \frac{\text{lower limit} + \text{upper limit}}{2} \quad \text{or} \quad X_m = \frac{\text{lower boundary} + \text{upper boundary}}{2}$$
Formulation of grouped frequency distributions
A grouped frequency distribution is a tabulation of the N data values into K classes, called
bins, based on the values of the data. The bin limits are the cutoff points that define each
bin. Bins must have equal widths, and their limits cannot overlap.
1) Calculate the range W: the highest value minus the lowest value.
2) Find the number of classes K using one of the following rules:
- the smallest K such that $2^K \ge N$;
- Sturges' rule¹: $K = 1 + \log_2 N$;
- Yule's rule: $K = 2.5\,\sqrt[4]{N}$.
3) Calculate the class interval (class width) $h = W/K$, rounded up to a convenient value.
4) The first class boundary of the frequency distribution equals the lowest value of the
series minus $h/2$. The last class boundary equals the first class boundary plus $h \times K$.
¹ Proposed by Herbert Sturges.
The completed frequency distribution has the following layout:

Class limits   Frequency   Cumulative frequency   Relative frequency   Percentage
...            ...         ...                    ...                  ...
Total
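The rules for the number of classes and the class width can be sketched in Python. The function names are hypothetical. Rounding Sturges' and Yule's results to the nearest whole number, and rounding the width up to a whole unit, are assumptions; some texts round the number of classes up instead.

```python
import math

def sturges(N):
    """Sturges' rule: K = 1 + log2(N), rounded to the nearest whole number."""
    return round(1 + math.log2(N))

def two_to_the_k(N):
    """Smallest K such that 2**K >= N."""
    K = 1
    while 2 ** K < N:
        K += 1
    return K

def yule(N):
    """Yule's rule: K = 2.5 * N**(1/4), rounded to the nearest whole number."""
    return round(2.5 * N ** 0.25)

def class_width(lowest, highest, K):
    """h = W / K with W = highest - lowest, rounded up to a whole unit."""
    return math.ceil((highest - lowest) / K)

N = 106                                       # number of maize plants in the example below
print(sturges(N), two_to_the_k(N), yule(N))   # 8 7 8
print(class_width(121, 158, 10))              # 37 / 10 = 3.7, rounded up to 4
```

The rules give only a guide: the worked maize example uses 10 classes, which is within the usual range of 5 to 15.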
EXCLUSIVE AND INCLUSIVE CLASS-INTERVALS
Class intervals of the type $[a, b) = \{x : a \le x < b\}$ are called exclusive (open at the
upper end), since they exclude the upper limit of the class. The following data are
classified on this basis:

Income           50-100   100-150   150-200   200-250   250-300
No. of persons   88       70        52        30        23

In this method, the upper limit of one class is the lower limit of the next class.
Class intervals of the type $[a, b] = \{x : a \le x \le b\}$ are called inclusive, since they
include the upper limit of the class. The following data are classified on this basis:

Income           50-99   100-149   150-199   200-249   250-299
No. of persons   60      38        22        16        7

However, to ensure continuity and to get correct class limits, the exclusive method of
classification should be adopted. To convert inclusive class intervals into exclusive ones,
we have to make an adjustment.
Adjustment: find the difference between the lower limit of the second class and the upper
limit of the first class, and divide it by 2. Subtract the value so obtained from all the lower
limits and add it to all the upper limits. In the above example, the adjustment factor is
$$\frac{100 - 99}{2} = 0.5$$
The adjusted classes are then as follows:

Income           49.5-99.5   99.5-149.5   149.5-199.5   199.5-249.5   249.5-299.5
No. of persons   60          38           22            16            7
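The adjustment can be written as a few lines of Python, applied here to the inclusive income classes above. Each class is represented as a hypothetical (lower limit, upper limit) tuple.

```python
# Inclusive classes from the income example: (lower limit, upper limit)
inclusive = [(50, 99), (100, 149), (150, 199), (200, 249), (250, 299)]

# Adjustment factor = (lower limit of 2nd class - upper limit of 1st class) / 2
adj = (inclusive[1][0] - inclusive[0][1]) / 2      # (100 - 99) / 2 = 0.5

# Subtract it from every lower limit and add it to every upper limit
exclusive = [(lo - adj, hi + adj) for lo, hi in inclusive]
print(exclusive)
# [(49.5, 99.5), (99.5, 149.5), (149.5, 199.5), (199.5, 249.5), (249.5, 299.5)]
```

After the adjustment, the upper boundary of each class coincides with the lower boundary of the next, so no value can fall between two classes.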
Example: the following data show the height in millimeters for 106 maize plants
after 2 weeks.
129 148 139 141 150 148 138 141 140 146 153 141 148 138
145 141 141 142 141 141 143 140 138 138 145 141 142 131
142 141 140 143 144 135 134 139 148 137 146 121 148 136
141 140 147 146 144 142 136 137 140 143 148 140 136 146
143 143 145 142 138 148 143 144 139 141 143 137 144 133
146 143 158 149 136 148 134 138 145 144 139 138 143 141
145 141 139 140 140 142 133 139 149 139 142 145 132 146
140 140 140 132 145 145 142 149
Construct a grouped frequency distribution for the data.
Solution
The procedure for constructing a grouped frequency distribution for numerical data is as
follows:
1. Determine the class intervals.
Find the highest and lowest values: H = 158 and L = 121.
Find the range: R = highest value - lowest value = H - L, so R = 158 - 121 = 37.
Find the class width by dividing the range by the number of classes (10 classes are used
here):
Width = R / number of classes = 37 / 10 = 3.7 ≈ 4
Find the lower limit of the first class by taking the lowest value of the series minus half
the width: 121 - 4/2 = 119.
The upper limit of the first class = lower limit + width = 119 + 4 = 123.
Find the upper limit of the last class: lower limit of the first class + width × number of
classes = 119 + 4 × 10 = 159.

The completed frequency distribution is shown below (each class includes its lower limit
and excludes its upper limit):

Class limits   Class mark   Frequency   Cumulative frequency   Relative frequency   Percentage
119 - 123      121          1           1                      0.009                0.9
123 - 127      125          0           1                      0.000                0.0
127 - 131      129          1           2                      0.009                0.9
131 - 135      133          7           9                      0.066                6.6
135 - 139      137          15          24                     0.142                14.2
139 - 143      141          39          63                     0.368                36.8
143 - 147      145          28          91                     0.264                26.4
147 - 151      149          13          104                    0.123                12.3
151 - 155      153          1           105                    0.009                0.9
155 - 159      157          1           106                    0.009                0.9
Total                       106                                1.000                100
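A minimal Python sketch that rebuilds this grouped distribution from the 106 heights. It assumes, as the counts in the table imply, that each class includes its lower limit and excludes its upper limit, and it uses the 10 classes of width 4 chosen in the worked solution.

```python
from itertools import accumulate

# Heights (mm) of the 106 maize plants, in the order listed in the example
heights = [
    129, 148, 139, 141, 150, 148, 138, 141, 140, 146, 153, 141, 148, 138,
    145, 141, 141, 142, 141, 141, 143, 140, 138, 138, 145, 141, 142, 131,
    142, 141, 140, 143, 144, 135, 134, 139, 148, 137, 146, 121, 148, 136,
    141, 140, 147, 146, 144, 142, 136, 137, 140, 143, 148, 140, 136, 146,
    143, 143, 145, 142, 138, 148, 143, 144, 139, 141, 143, 137, 144, 133,
    146, 143, 158, 149, 136, 148, 134, 138, 145, 144, 139, 138, 143, 141,
    145, 141, 139, 140, 140, 142, 133, 139, 149, 139, 142, 145, 132, 146,
    140, 140, 140, 132, 145, 145, 142, 149,
]

K, h = 10, 4                         # number of classes and class width used in the example
start = min(heights) - h // 2        # lower limit of the first class: 121 - 2 = 119
N = len(heights)

lower = [start + j * h for j in range(K)]                   # 119, 123, ..., 155
# Each class [lo, lo + h) includes its lower limit and excludes its upper limit
freq = [sum(lo <= x < lo + h for x in heights) for lo in lower]
cum = list(accumulate(freq))                                 # cumulative frequencies

for lo, n_j, c in zip(lower, freq, cum):
    mid = lo + h // 2                                        # class mark
    print(f"{lo}-{lo + h}  {mid}  {n_j:>3}  {c:>4}  {n_j / N:.3f}  {100 * n_j / N:5.1f}")
```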
Exercise
Construct a grouped frequency distribution of the students of Applied Statistics Level I at
INES in 2010 according to their height, weight and age.
2.1.4. Two-way Frequency Distribution (Bivariate Frequency Distribution)
A two-way frequency distribution is used when two variables are involved.
A two-way frequency table has the class intervals of one variable as columns and those of the other variable as rows. Each box formed at the intersection of a row and a column thus represents a joint class.
The total row and the total column are called the marginal distributions.
The other rows and columns are called conditional distributions.
The frequency of a joint class is the number of items that have the value of the first variable in the class given by the column heading and the value of the second variable in the class given by the row heading.
The construction of a two-way table consists of the following steps:
1. Determine the class intervals for each of the variables;
2. Place one of the variables at the top of the table and the other on the left-hand side;
3. Place each item in the appropriate box;
4. Total the tallies in each box and in each row and column. The grand total of the rows and of the columns should agree with the total number of items.
Example 1: the following table shows the marks of 24 students in two subjects: Statistics and Accountancy.

Roll number   Marks in Statistics   Marks in Accountancy
1             15                    13
2             1                     1
3             1                     2
4             3                     7
5             16                    8
6             2                     9
7             18                    12
8             5                     9
9             4                     17
10            17                    16
11            6                     6
12            19                    18
13            14                    11
14            9                     3
15            8                     5
16            13                    4
17            10                    10
18            13                    11
19            11                    14
20            11                    17
21            12                    18
22            18                    15
23            9                     15
24            7                     3

Construct a two-way frequency table for the data, taking the class intervals of both variables (Statistics and Accountancy) as 1-5, 6-10, etc., i.e. 4 classes of width 5 for each variable.
The two-way frequency table for marks in Statistics (columns) and Accountancy (rows) is:

Accountancy \ Statistics   1-5   6-10   11-15   16-20   Total
1-5                        2     3      1       -       6
6-10                       3     2      -       1       6
11-15                      -     1      4       2       7
16-20                      1     -      2       2       5
Total                      6     6      7       5       24
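As a cross-check of the two-way table, here is a minimal Python sketch that tallies the 24 (Statistics, Accountancy) pairs of Example 1 into the inclusive classes 1-5, 6-10, 11-15 and 16-20, with Accountancy classes as rows and Statistics classes as columns. The helper name `class_index` is illustrative only.

```python
# (Statistics, Accountancy) marks for the 24 students of Example 1.
marks = [
    (15, 13), (1, 1), (1, 2), (3, 7), (16, 8), (2, 9), (18, 12), (5, 9),
    (4, 17), (17, 16), (6, 6), (19, 18), (14, 11), (9, 3), (8, 5), (13, 4),
    (10, 10), (13, 11), (11, 14), (11, 17), (12, 18), (18, 15), (9, 15), (7, 3),
]

classes = [(1, 5), (6, 10), (11, 15), (16, 20)]  # inclusive class limits


def class_index(value):
    """Return the index of the inclusive class that contains value."""
    for i, (low, high) in enumerate(classes):
        if low <= value <= high:
            return i
    raise ValueError(f"{value} is outside all classes")


# joint[r][c]: rows = Accountancy classes, columns = Statistics classes.
joint = [[0] * len(classes) for _ in classes]
for stat, acc in marks:
    joint[class_index(acc)][class_index(stat)] += 1

row_totals = [sum(row) for row in joint]         # marginal distribution of Accountancy
col_totals = [sum(col) for col in zip(*joint)]   # marginal distribution of Statistics
for (low, high), row, total in zip(classes, joint, row_totals):
    print(f"{low}-{high}", row, total)
print("Total", col_totals, sum(row_totals))
```

The grand total of the rows equals that of the columns (24), which is the check described in step 4 of the construction procedure.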
Example 2: The ages of 20 husbands and their wives are given below. Form a two-way frequency table showing the relationship between the ages of husbands and wives, with the class intervals 20-24, 25-29, etc.

S. No.   Age of husband   Age of wife   S. No.   Age of husband   Age of wife
1 28 23 11 27 24
2 37 30 12 39 34
3 42 40 13 23 20
4 25 26 14 33 31
5 29 25 15 36 29
6 47 31 16 32 35
7 37 35 17 22 23
8 35 25 18 29 27
9 23 21 19 38 34
10 41 38 20 48 47
Solution
Frequency distribution of the ages of husbands and wives

Age of husband \ Age of wife   20-24   25-29   30-34   35-39   40-44   45-49   Total
20-24                          III     -       -       -       -       -       3
25-29                          II      III     -       -       -       -       5
30-34                          -       -       I       I       -       -       2
35-39                          -       II      III     I       -       -       6
40-44                          -       -       -       I       I       -       2
45-49                          -       -       I       -       -       I       2
Total                          5       5       5       3       1       1       20
Exercises
1. Prepare a two-way frequency table and marginal frequency tables for the 25 values of the two variables x and y given below. Take the class intervals of x as 10-20, 20-30, etc., and those of y as 100-200, 200-300, etc.
x y x y
12 140 51 250
24 256 27 550
33 360 42 360
22 470 43 570
44 470 52 290
37 380 57 416
26 280 44 380
36 315 48 452
55 420 48 370
48 390 52 312
27 440 41 330
57 390 69 590
21 590
2. Prepare a bivariate frequency distribution for the following data:
Marks in Law   Marks in Statistics   Marks in Law   Marks in Statistics
10 20 13 24
11 21 12 23
10 22 11 22
11 21 12 23
11 23 10 22
14 23 14 22
12 22 12 20
12 21 13 24
13 24 10 23
10 23 14 24
2.2. GRAPHIC REPRESENTATION OF A FREQUENCY DISTRIBUTION
After the data have been organized into a frequency distribution, they can be presented in
graphical form. It is easier to comprehend the meaning of data presented graphically than
data presented numerically in tables or frequency distributions.
The three most commonly used graphs in research are:
1. The histogram ;
2. The frequency polygon;
3. The cumulative frequency graph or ogive.
1. Histogram
A histogram is a graphic presentation of a frequency distribution in which the classes are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the rectangles. Each rectangle represents just one class; its width corresponds to the class width, and the rectangles are drawn adjacent to each other.
Note: for the heights of the rectangles to be the class frequencies themselves, the class intervals must be equal and exclusive; unequal or inclusive intervals must first be adjusted, as shown below.
Example: For the following frequency distribution of the heights of students, draw the histogram.
Height            140-145   145-150   150-155   155-160   160-165   165-170   170-175
No. of students   4         10        18        20        19        6         3
Solution
[Figure: Histogram of the distribution of height of students. Horizontal axis: Height (classes 140-145 to 170-175); vertical axis: Number of students (frequencies).]
In a frequency distribution whose class intervals are of unequal width, we first have to calculate the frequency density of each class on a convenient scale:

$$d_i = \frac{n_i}{a_i}$$

where $n_i$ is the frequency and $a_i$ the width of class $i$. Sometimes the densities are multiplied by a standard interval $a_0$, either a predetermined interval or the smallest class interval in the distribution:

$$d_i = \frac{n_i}{a_i} \times a_0$$

with $a_0$ the smallest interval.
Example: Average monthly earnings of 1035 employees in the construction industry
Monthly earning   Number of workers ($n_i$)   Width ($a_i$)   $d_i = n_i a_0 / a_i$ ($a_0 = 10$)
60-70 25 10 25
70-80 100 10 100
80-90 150 10 150
90-100 200 10 200
100-120 240 20 120
120-140 160 20 80
140-150 50 10 50
150-180 90 30 30
180 and more 20 - -
Draw the histogram
[Figure: Histogram of average monthly earning of 1035 employees. Horizontal axis: Average monthly earning (classes 60-70 to 150-180); vertical axis: Number of workers (frequencies).]
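A minimal Python sketch of the adjusted heights $d_i = n_i a_0 / a_i$ for the earnings example, taking $a_0$ as the smallest class width (10). The open-ended class "180 and more" is left out because its width is unknown; the function name `adjusted_heights` is hypothetical.

```python
# (lower limit, upper limit, number of workers); the open class
# "180 and more" is omitted because it has no finite width.
earning_classes = [
    (60, 70, 25), (70, 80, 100), (80, 90, 150), (90, 100, 200),
    (100, 120, 240), (120, 140, 160), (140, 150, 50), (150, 180, 90),
]


def adjusted_heights(classes, a0=None):
    """Return n_i * a0 / a_i for each class; a0 defaults to the smallest width."""
    widths = [upper - lower for lower, upper, _ in classes]
    if a0 is None:
        a0 = min(widths)
    return [n * a0 / a for (_, _, n), a in zip(classes, widths)]


heights = adjusted_heights(earning_classes)
print(heights)  # rectangle heights for the histogram
```

Passing a different `a0` (for example a predetermined interval of 20) rescales all heights by the same factor, so the shape of the histogram does not change.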
If the frequency distribution has inclusive class intervals, they should be converted into exclusive ones before the histogram is drawn.
Example: Draw a histogram to present the following data.
Income    No. of employees   Income    No. of employees
100-149   21                 300-349   62
150-199   32                 350-399   43
200-249   52                 400-449   18
250-299   105                450-499   9
Solution: here the grouped frequency distribution is not continuous because the class intervals are inclusive. We first convert it into a continuous distribution as follows:

$$\text{Adjustment factor} = \frac{150 - 149}{2} = 0.5$$

Subtract it from each lower limit and add it to each upper limit so as to obtain exclusive class intervals. Thus:
Income        No. of employees   Income        No. of employees
99.5-149.5    21                 299.5-349.5   62
149.5-199.5   32                 349.5-399.5   43
199.5-249.5   52                 399.5-449.5   18
249.5-299.5   105                449.5-499.5   9
[Figure: Frequency distribution of employees by earned income (histogram). Horizontal axis: Income (classes 99.5-149.5 to 449.5-499.5); vertical axis: Number of employees.]
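The conversion from inclusive to exclusive classes can be sketched in Python as follows. It assumes, as in this example, that the gap between consecutive classes is the same throughout, so the adjustment factor computed from the first two classes applies to all of them; `to_exclusive` is a hypothetical name.

```python
# Inclusive income classes and frequencies from the example above.
inclusive = [(100, 149), (150, 199), (200, 249), (250, 299),
             (300, 349), (350, 399), (400, 449), (450, 499)]
employees = [21, 32, 52, 105, 62, 43, 18, 9]


def to_exclusive(limits):
    """Widen inclusive limits by half the gap between consecutive classes."""
    # Adjustment factor = (lower limit of class 2 - upper limit of class 1) / 2
    adjustment = (limits[1][0] - limits[0][1]) / 2
    return [(low - adjustment, high + adjustment) for low, high in limits]


exclusive = to_exclusive(inclusive)
for (low, high), f in zip(exclusive, employees):
    print(f"{low}-{high}: {f}")
```

After the adjustment each upper boundary coincides with the next lower boundary, which is what allows the rectangles of the histogram to be drawn adjacent to each other.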
2. Frequency Polygon
A frequency polygon is a graph of frequencies plotted against class marks, the class marks being the midpoints of the class intervals. The polygon is drawn by placing the class marks on the horizontal axis and the frequencies of the observations on the vertical axis.
If the class intervals are of equal width, the class frequencies are plotted against the class midpoints. If the class intervals are of unequal width, the graph is obtained by plotting the frequency densities against the class midpoints.
Description of a frequency polygon:
1) Each class is represented by a single point. The height of the point represents the class frequency, and the point must lie directly above the corresponding class midpoint;
2) The points are joined by straight lines;
3) The ends of the graph are joined to the midpoints of the class preceding the first class and of the class following the last class, at zero frequency, i.e. on the x-axis.
A polygon of relative frequencies, or of percentages, can be drawn in the same way. These are called frequency curves.
Example: For the following frequency distribution, draw a frequency polygon.
Income     300-400   400-500   500-600   600-700   700-800   800-900   900-1000
Workers    18        32        35        30        21        12        4
Solution
Midpoint   350       450       550       650       750       850       950
Workers    18        32        35        30        21        12        4
[Figure: Frequency polygon of the distribution of income.]
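The points of the frequency polygon, including the two zero-frequency end points described in rule 3), can be generated with a small Python sketch. It assumes equal class widths, as in this example; `polygon_points` is a hypothetical name.

```python
# Income classes (lower, upper) and number of workers from the example.
income_classes = [(300, 400), (400, 500), (500, 600), (600, 700),
                  (700, 800), (800, 900), (900, 1000)]
workers = [18, 32, 35, 30, 21, 12, 4]


def polygon_points(classes, freqs):
    """Return (midpoint, frequency) points closed at zero frequency at both ends."""
    width = classes[0][1] - classes[0][0]             # equal widths assumed
    mids = [(low + high) / 2 for low, high in classes]
    points = list(zip(mids, freqs))
    # Mid-values of the imaginary classes before the first and after the last.
    return [(mids[0] - width, 0)] + points + [(mids[-1] + width, 0)]


points = polygon_points(income_classes, workers)
print(points)
```

Joining these points with straight lines gives the polygon; its ends rest on the x-axis at 250 and 1050.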
3. Cumulative Frequency Curve or the Ogive
A cumulative frequency graph (traditionally called an ogive) is a graph that represents the cumulative frequencies of the classes in a frequency distribution. It is used to show visually how many values lie below (or above) a certain class boundary.
There are two types of ogives:
A) Less than ogive: plot the points with the upper limits of the classes as abscissae and the corresponding less than cumulative frequencies as ordinates. For less than distributions, the cumulation proceeds from the smallest to the greatest size, and the series so obtained is called the less than cumulative frequency distribution.
B) More than ogive: plot the points with the lower limits of the classes as abscissae and the corresponding more than cumulative frequencies as ordinates. For more than distributions, the cumulation proceeds from the greatest to the smallest size, and the series so obtained is called the more than cumulative frequency distribution.
To draw an ogive, the plotted points are joined with straight lines or with a smooth free-hand curve.
Example
Draw the two ogives for the following distribution of the marks of 59 students.
Marks   No. of students   Marks   No. of students
0-10    4                 40-50   12
10-20   8                 50-60   6
20-30   11                60-70   3
30-40   15
Solution
Construction of the two ogives
Marks   No. of students (f)   Less than cumulative f   More than cumulative f
0-10    4                     4                        59
10-20   8                     12                       55
20-30   11                    23                       47
30-40   15                    38                       36
40-50   12                    50                       21
50-60   6                     56                       9
60-70   3                     59                       3
Plotting the points (10, 4), (20, 12), (30, 23), (40, 38), (50, 50), (60, 56), (70, 59) and joining them by free hand, the smooth rising curve so obtained is the less than ogive.
Plotting the points (0, 59), (10, 55), (20, 47), (30, 36), (40, 21), (50, 9), (60, 3) and joining them by free hand, the smooth falling curve so obtained is the more than ogive.
[Figure: Less than and more than cumulative frequency curves (ogives) of the marks distribution.]
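The two sets of ogive points listed above can be computed directly from the frequencies with a short Python sketch using `itertools.accumulate` from the standard library. The variable names are illustrative.

```python
from itertools import accumulate

# Class limits 0-10, 10-20, ..., 60-70 and numbers of students (59 in total).
bounds = [0, 10, 20, 30, 40, 50, 60, 70]
students = [4, 8, 11, 15, 12, 6, 3]

less_than = list(accumulate(students))                           # cumulate upwards
more_than = [sum(students[i:]) for i in range(len(students))]    # cumulate downwards

# Less than ogive: upper limits against less than cumulative frequencies.
less_than_points = list(zip(bounds[1:], less_than))
# More than ogive: lower limits against more than cumulative frequencies.
more_than_points = list(zip(bounds[:-1], more_than))
print(less_than_points)
print(more_than_points)
```

The two curves cross where the less than and more than cumulative frequencies are equal, at half the total frequency.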
EXERCISES
1. This table shows the sex, age, height and weight of 24 students of Level I AST at INES in 2010.
Order Sex Age Height (cm) Weight (kg)
1 F 22 160 58
2 F 19 170 60
3 M 23 161 50
4 M 26 180 61
5 M 22 159 49
6 M 27 172 70
7 M 23 150 45
8 M 22 150 48
9 F 23 170 65
10 M 23 160 58
11 F 25 155 59
12 F 23 162 60
13 F 24 171 80
14 F 24 170 62
15 F 24 165 64
16 F 23 173 61
17 F 22 160 57
18 F 18 163 52
19 F 19 143 48
20 F 25 167 67
21 F 23 168 59
22 F 22 172 63
23 F 24 162 55
24 F 22 174 63
Draft a form of tabulation to show:
sex and age, weight and height, age and weight, age and height.
Present the absolute, relative, percentage and cumulative frequency distributions.
Draw a histogram, a frequency polygon and an ogive for age, height and weight.
2. Draw a histogram for the following frequency distribution of heights of students. From the histogram, obtain the frequency polygon.
Height            140-150   150-160   160-165   165-170   170-180   180-190
No. of students   5         15        15        20        10        2
3. The daily wages of the workers of a factory have the following distribution. Draw the less than cumulative frequency graph for the wages.
Wages            30-39   40-49   50-59   60-69   70-79   80-89   90-99   100-109   Total
No. of workers   9       25      34      25      19      13      7       2         134
4. Draw a histogram and a frequency polygon for the following data:
Marks   No. of students   Marks    No. of students
0-10    5                 50-60    4
10-20   13                60-70    1
20-30   12                70-80    3
30-40   11                80-90    1
40-50   8                 90-100   2
5. The following are the weights of 30 students. Draw up a frequency distribution with:
a) class intervals 40-44, 45-49, 50-54, etc. (kg);
b) class intervals of width 6 kg each.
Weights (kg): 51, 47, 50, 54, 62, 52, 42, 49, 52, 49, 44, 50, 53, 58, 46, 50, 51, 53, 48, 50, 55, 52, 55, 58, 63, 54, 52, 49, 50, 58.
2.3. DIAGRAMMATIC AND CHART REPRESENTATION
In section 2.2, graphs such as the histogram, the frequency polygon and the ogive showed how data can be represented when the variable displayed on the horizontal axis is quantitative, such as height and weight.
On the other hand, when the variable displayed on the horizontal axis is qualitative or categorical, several types of charts are used, such as pictograms, statistical maps or cartograms, spider charts, Gantt charts, bar charts, Pareto charts, time series graphs, pie graphs and so on.
This section is concerned with the presentation of non-numeric or qualitative frequency distributions. The types of diagram described in this section include various types of bar charts, pie charts, Pareto charts and time series graphs.
1) Bar charts
a) Simple bar charts
A simple bar chart consists of a set of separate, non-adjoining bars. A separate bar is drawn for each class, to a height proportional to its frequency.
[Figure: Examples of simple bar charts (vertical and horizontal) of a percentage distribution by type: tuberculoid 60.64%, lepromatous 27.31%, indeterminate 7.23%, borderline 4.82%.]
Bar charts such as the following are used for discrete variables.
Example: The following table shows the details of the monthly expenditure of two families. Draw a bar diagram for the data.
Items of expenditure   Family A   Family B
Food                   140        240
Clothing               80         160
House Rent             100        120
Education              30         80
Fuel and Lighting      40         40
Miscellaneous          40         80
Saving                 70         80
Total                  500        800
Solution
[Figure: Bar chart of the details of monthly expenditure of family A. Horizontal axis: Expenditure (Food, Clothing, House Rent, Education, Fuel and Lighting, Miscellaneous, Saving); vertical axis: Revenue.]
[Figure: Bar chart of the details of monthly expenditure of family B. Horizontal axis: Expenditure (same items as for family A); vertical axis: Revenue.]
b) Multiple bar charts
Multiple bar charts are an extension of simple bar charts in which a further dimension of the data is shown, by drawing the bars of two or more series side by side for each class.
[Figure: Examples of multiple bar charts comparing male and female percentages by type: tuberculoid, lepromatous, indeterminate, borderline.]
Example: draw a multiple bar chart showing the details of the monthly expenditure of the two families.
Solution
[Figure: Details of monthly expenditure of two families A and B, one series of bars per family. Horizontal axis: Expenditure; vertical axis: Revenue.]
2) Pie charts
A pie chart shows the totality of the data being represented using a single circle. The circle is split into sectors, the size of each one being drawn in proportion to the class frequency. Each sector can be shaded or colored differently if desired.
The procedure for drawing a pie graph is:
Step 1: Calculate the proportion of the total that each frequency represents, using the formula

$$\frac{f}{n}$$

where f = frequency of the class and n = total number of values.
Step 2: Find the number of degrees for each class, using the formula

$$\text{Degrees} = \frac{f}{n} \times 360^\circ$$

Step 3: Find the percentage of values in each class, using the formula

$$\% = \frac{f}{n} \times 100$$

Step 4: Using a protractor and compass, draw each sector and write its name and the corresponding degrees or percentage.
Advantages and disadvantages of pie charts
Advantages: easy to construct and easy to understand; a sense of continuity is given by a line diagram which is not present in a bar chart.
Disadvantages: pie charts may be confusing if too many diagrams with closely associated values are compared together; where several diagrams are displayed, there is no provision for total figures.
Example 1: construct a pie chart for the following data.
Monthly expenditure of family A
Item                Family A (Rs)   Percentage   Cumulative percentage
Food                140             28           28
Clothing            80              16           44
House Rent          100             20           64
Education           30              6            70
Fuel and Lighting   40              8            78
Miscellaneous       40              8            86
Saving              70              14           100
[Figure: Pie chart of the monthly expenditure of family A: Food 28%, Clothing 16%, House Rent 20%, Education 6%, Fuel and Lighting 8%, Miscellaneous 8%, Saving 14%.]
This chart shows that food is the largest item of expenditure of family A.
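Steps 1 to 3 of the pie-chart procedure can be sketched in Python for the family A data. The function name `pie_table` is hypothetical; it returns, for each item, the proportion f/n, the sector angle in degrees and the percentage.

```python
# Monthly expenditure of family A (Rs) from the example above.
expenditure = {
    "Food": 140, "Clothing": 80, "House Rent": 100, "Education": 30,
    "Fuel and Lighting": 40, "Miscellaneous": 40, "Saving": 70,
}


def pie_table(freqs):
    """Return {class: (proportion f/n, degrees, percentage)} for a pie chart."""
    n = sum(freqs.values())
    return {name: (f / n, f / n * 360, f / n * 100) for name, f in freqs.items()}


pie = pie_table(expenditure)
for name, (p, deg, pct) in pie.items():
    print(f"{name:18s} {p:.2f} {deg:6.1f} deg {pct:5.1f} %")
```

The angles add up to 360 degrees; for example, food takes 0.28 × 360 = 100.8 degrees of the circle.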
Example 2: a survey of the students in the school of education of a large university obtained the following data for students enrolled in specific fields. Construct a pie graph for the data and analyze the results.
Major        Number (f)   % = (f/n) × 100
Preschool    893          31
Elementary   605          21
Middle       245          9
Secondary    1096         39
Total        2839         100
[Figure: Pie chart of students enrolled in specific fields: Preschool 31%, Elementary 21%, Middle 9%, Secondary 39%.]
This chart shows that more students are enrolled in secondary education than in any other field.
Exercises
1. In a study of 100 women, the numbers shown here indicate the major reason why each woman surveyed worked outside the home. Construct a pie graph for the data and analyze the results.
Reason                          Number
To support self/family          62
For extra money                 18
For something different to do   12
Other                           8
2. A questionnaire about how people get news resulted in the following information from 25 respondents. Construct a frequency distribution and a pie graph for the data (N = newspaper, T = television, R = radio, M = magazine).
N N R T T R N T M R M M N R M T R M
N M T R R N N
3. A questionnaire on housing arrangements showed this information obtained from 25 respondents. Construct a frequency distribution and a pie graph for the data (H = house, A = apartment, M = mobile home, C = condominium).
H C H M H A C A M C M C A M A C
C M C C H A H H M
3) Pareto chart
A Pareto chart is used to represent a frequency distribution for a categorical or qualitative variable; the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest.
Procedure for drawing a Pareto chart
1. Arrange the data from the largest to the smallest according to frequency;
2. Draw and label the x and y axes;
3. Draw the bars corresponding to the frequencies.
Example: The following data are based on a survey from the American Travel Survey on why people travel. Construct a Pareto chart for the data and comment.
Purpose                      Number
Personal business            146
Visit friends or relatives   330
Work related                 225
Leisure                      299
Source: USA TODAY
[Figure: Pareto chart of the purposes of travel, bars in decreasing order: Visit friends or relatives, Leisure, Work related, Personal business.]
This chart shows that the most common purpose of travel is visiting friends or relatives (330 of the 1000 responses) and the least common is personal business.
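Step 1 of the Pareto procedure, arranging the categories from the largest to the smallest frequency, can be sketched in Python. The text bars printed here are only an illustration of the ordering, not a replacement for the drawn chart.

```python
# Purposes of travel and their counts (source: USA TODAY, as quoted above).
travel = {
    "Personal business": 146,
    "Visit friends or relatives": 330,
    "Work related": 225,
    "Leisure": 299,
}

# Arrange the categories from the largest to the smallest frequency.
pareto_order = sorted(travel.items(), key=lambda item: item[1], reverse=True)
for purpose, count in pareto_order:
    print(f"{purpose:28s} {'#' * (count // 10)} {count}")   # one '#' per 10 responses
```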
4) Time series
When data are collected over a period of time, they can be represented by a time series graph. A time series graph represents data that occur over a specific period of time.
Procedure for drawing a time series graph
Step 1: Draw and label the x and y axes;
Step 2: Label the x axis for the years and the y axis for the quantity being measured;
Step 3: Plot each point according to the table;
Step 4: Draw line segments connecting adjacent points.
Example 1: the number of bank failures in the United States during the years 1989-2000 is shown. Draw a time series graph to represent the data and comment on the results.
Year              1989   1990   1991   1992   1993   1994   1995   1996   1997   1998   1999   2000
No. of failures   207    169    127    122    41     13     6      5      1      3      8      7
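[Figure: Time series graph of the number of bank failures in the United States, 1989-2000. Horizontal axis: Year.]
The graph shows the number of bank failures from 1989 through 2000. Most of the failures occurred between 1989 and 1992; after 1992 the number fell sharply.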
Example 2: The following table shows the meat production for lamb for the years 1960-2000 (data are in millions of pounds). Construct a time series graph for the data.
Year   1960   1970   1980   1990   2000
Lamb   769    551    318    358    234
[Figure: Time series graph of meat production for lamb, 1960-2000. Horizontal axis: Year; vertical axis: Meat production for lamb.]
The graph shows an overall decline in lamb meat production from 1960 through 2000, interrupted only by a slight increase between 1980 and 1990.
Chapter 3: DATA DESCRIPTION: MEASURES OF CENTRAL TENDENCY,
MEASURES OF DISPERSION, MEASURES OF POSITION.
3.1 INTRODUCTION
This chapter explains the basic ways to summarize data. These include measures of
central tendency, measures of variation or dispersion, and measures of position.
Central tendency refers to the location of a distribution. A measure of central tendency is any of a number of ways of specifying this "central value". Several types of averages can be defined, the most important being the mean, the median, the mode and the midrange. The mean may be arithmetic, geometric or harmonic.
The three most commonly used measures of variation are the range, the variance and the standard deviation. The most common measures of position are percentiles, quartiles and deciles.
3.2 MEASURES OF CENTRAL TENDENCY
A. The arithmetic mean
1. Definition of the arithmetic mean
The arithmetic mean of a set of values is the simple arithmetic average of the observations. It is defined as the sum of the values of all the observations divided by the number of observations.
The arithmetic mean is normally abbreviated to just the "mean" or "average".
The mean, or average, of a population is represented by the Greek letter $\mu$ (mu), and that of a sample by the Roman letter $\bar{X}$ (read "X bar").
That is:

$$\text{Arithmetic mean} = \frac{\text{sum of all the values of the observations in the sample}}{\text{number of values in the sample}}$$
2. The arithmetic mean for ungrouped data
The formula for calculating the arithmetic mean is:

$$\mu = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} = \frac{\sum_{j=1}^{N} x_j}{N} \quad \text{for the population}$$

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{j=1}^{n} x_j}{n} \quad \text{for the sample}$$

Where:
the symbol $\Sigma$ (Greek capital letter "sigma") stands for summation: it means "the total of";
x represents any particular value of an observation;
$\sum x$ is the sum of all values in the sample or population;
N represents the total number of observations in the population;
n refers to the number of observations in the sample.
Assume that the data are obtained from samples unless otherwise specified.
Example 1: Find the arithmetic mean (the average) of the numbers 8, 3, 5, 12 and 10.
Solution: in this data set, $x_1 = 8$, $x_2 = 3$, $x_3 = 5$, $x_4 = 12$, $x_5 = 10$ and n = 5. Then

$$\bar{x} = \frac{8 + 3 + 5 + 12 + 10}{5} = \frac{38}{5} = 7.6$$
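The formula for the sample mean translates directly into a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is illustrative (Python's standard `statistics.mean` computes the same quantity).

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


x = [8, 3, 5, 12, 10]          # data of Example 1
print(arithmetic_mean(x))      # 38 / 5
```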
3. The mean of a simple (discrete) frequency distribution
The mean of a simple frequency distribution is calculated using the following formula:

$$\text{Mean}, \ \bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{f_1 + f_2 + f_3 + \cdots + f_k} = \frac{\sum_{j=1}^{k} f_j x_j}{\sum_{j=1}^{k} f_j} = \frac{\sum fx}{n}$$

Where
x represents the values;
f represents the frequencies;
$\sum f$ is the total frequency, or the total number of observations (n);
$\sum fx$ refers to the sum of each value x times its frequency f.
Example: calculate the arithmetic mean of the marks of 46 students given in the following table.
Table 3.1 Frequency of marks of 46 students
Marks (x)   Frequency (f)   fx
9           1               9
10          2               20
11          3               33
12          6               72
13          10              130
14          11              154
15          7               105
16          3               48
17          2               34
18          1               18
Total       46              623
The total of all these values ($\sum fx$) = 623.
Total number of observations (n) = 46.
Therefore, the arithmetic mean of the marks of the 46 students is

$$\bar{x} = \frac{\sum fx}{n} = \frac{623}{46} = 13.54$$
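The frequency-weighted mean of Table 3.1 can be reproduced with a short Python sketch; `frequency_mean` is a hypothetical helper that returns $\sum fx$, $\sum f$ and their ratio.

```python
marks = [9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
freq = [1, 2, 3, 6, 10, 11, 7, 3, 2, 1]


def frequency_mean(x, f):
    """Mean of a simple frequency distribution: sum(f*x) / sum(f)."""
    total_fx = sum(fi * xi for xi, fi in zip(x, f))
    n = sum(f)
    return total_fx, n, total_fx / n


total_fx, n, mean = frequency_mean(marks, freq)
print(total_fx, n, round(mean, 2))
```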

4. The mean of a grouped frequency distribution
For grouped data, $\mu$ and $\bar{x}$ are calculated by

$$\mu = \frac{\sum fx}{N} \quad \text{and} \quad \bar{x} = \frac{\sum fx}{n}$$

where f is the frequency, x the midpoint of the class interval and N (or n) the total number of observations.
Procedure for finding the mean of a grouped frequency distribution
1. Make a table as shown:
Class interval   Frequency (f)   Midpoint (x) of class interval   f·x
2. Find the midpoint of each class.
3. Multiply the frequency by the midpoint for each class.
4. Find the sum $\sum fx$ of the frequency of each class times the class midpoint x.
5. Divide the sum obtained by the sum of the frequencies.
Example 1: Calculate the arithmetic mean of the following data.
Table 3.2 Profit per shop
Profit   No. of shops (f)   Midpoint of class interval (x)   f·x
0-10 12 5 60
10-20 18 15 270
20-30 27 25 675
30-40 20 35 700
40-50 17 45 765
50-60 6 55 330
Total 100 2800
The mean profit is:

$$\bar{x} = \frac{\sum fx}{n} = \frac{2800}{100} = 28$$
Example 2: The following data relate to the number of successful sales made by the salesmen in a particular quarter.
Number of sales      0-4   5-9   10-14   15-19   20-24   25-29
Number of salesmen   1     14    23      21      15      6
Calculate the mean number of sales.
Answer:
Number of sales (class interval)   Number of salesmen (f)   Class midpoint (x)   fx
0 to 4                             1                        2                    2
5 to 9                             14                       7                    98
10 to 14                           23                       12                   276
15 to 19                           21                       17                   357
20 to 24                           15                       22                   330
25 to 29                           6                        27                   162
Totals                             80                                            1225

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1225}{80} = 15.3$$
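The five-step procedure for grouped data (midpoints, f·x, their sum, division by $\sum f$) can be sketched in Python and checked against both examples. `grouped_mean` is a hypothetical name; for the inclusive sales classes the midpoints come out as 2, 7, 12, ..., exactly as in the table.

```python
def grouped_mean(classes, freqs):
    """Estimate the mean of grouped data from class midpoints: sum(f*x) / sum(f)."""
    midpoints = [(low + high) / 2 for low, high in classes]
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    return sum_fx / sum(freqs)


# Example 1: profit per shop (Table 3.2).
profit = grouped_mean([(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)],
                      [12, 18, 27, 20, 17, 6])
# Example 2: number of sales, inclusive classes 0-4, 5-9, ...
sales = grouped_mean([(0, 4), (5, 9), (10, 14), (15, 19), (20, 24), (25, 29)],
                     [1, 14, 23, 21, 15, 6])
print(profit, round(sales, 1))
```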
The advantages of the mean
The mean is the most commonly used measure of central tendency;
Every set of interval- or ratio-level data has a mean;
It is easily understood;
All the values are included in computing the mean;
A set of data has only one mean: the mean is unique;
It is used in performing many other statistical procedures and tests;
It is not necessary to know the value of each individual observation in order to calculate the arithmetic mean: only the total of the observations and the number of observations are required.
The disadvantages of the mean are:
The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average to use in these situations;
It is time-consuming to compute for a large body of ungrouped data;
It cannot be calculated when the last class of grouped data is open-ended (i.e., when the last class is given only by its lower limit "and over").
Note that the sum of the deviations of the values from the mean is always zero. Expressed symbolically:

$$\sum (X - \bar{X}) = 0$$

As an example, the mean of 3, 8 and 4 is 5. Then:

$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0$$
B. THE MEDIAN
The median is generally considered as an alternative average to the mean.
The value of the variable which divides the distribution so that exactly half of the
distribution has the same or larger values and exactly half has the same or lower values is
called the median.
1. The median for ungrouped data
The median of a set of data is the middle value that separates the higher half from the
lower half of the data set after they have been ordered from the smallest to the largest, or
the largest to the smallest.
Procedure for obtaining the median of a set of data:
order the given data from the smallest to the largest or the largest to the
smallest;
Select the middle point.
Example: Find the median of the following five observations
$x_1 = 10$, $x_2 = 15$, $x_3 = 6$, $x_4 = 12$ and $x_5 = 11$
Solution: We must:
1. order the given numbers from the smallest to the largest: 6, 10, 11, 12, 15
2. Select the middle point: the middle value is 11.
Therefore the Median (MD) = 11
Note 1. When a set of data contains an even number of items, there is no unique middle or central value. The convention in this situation is to use the mean of the two middle items as the median.
Example 1: Find the median of the following six observations:
$x_1 = 10$, $x_2 = 15$, $x_3 = 6$, $x_4 = 12$, $x_5 = 11$ and $x_6 = 17$
Solution: As before, arrange all the values of the observations in numerical order:
6, 10, 11, 12, 15, 17
Evidently there is no single middle value. However, two numbers lie in the middle: 11 and 12. They must be added together and divided by 2 to obtain their average:

$$MD = \frac{11 + 12}{2} = 11.5$$
Example 2: calculation of the median for the data given in Table 3.1
Solution:
Arranging all the 24 values in ascending order of magnitude, we get the following data:
2.90 3.57 3.73
2.98 3.61 3.75
3.30 3.62 3.76
3.43 3.66 3.76
3.43 3.68 3.77
3.45 3.71 3.84
3.55 3.72 3.88
The 12th value is 3.66 and the 13th is 3.68; the median is the average of these two:

$$\text{Median} = \frac{3.66 + 3.68}{2} = 3.67 \text{ g\%}$$
Note 2. For a set with an odd number (n) of items, the median can be precisely identified as the value of the $\left(\frac{n+1}{2}\right)$th item. Thus, in a size-ordered set of 15 items, the median would be the $\left(\frac{15+1}{2}\right)$th = 8th item along.
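The rule for odd and even numbers of items can be written as a small Python function. This is a sketch with an illustrative name; Python's standard `statistics.median` follows the same convention of averaging the two middle items.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the (n + 1)/2-th item
    return (ordered[mid - 1] + ordered[mid]) / 2   # average of the two middle items


print(median([10, 15, 6, 12, 11]))       # five observations
print(median([10, 15, 6, 12, 11, 17]))   # six observations
```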

2. Median for a simple frequency distribution
Where there is a large number of discrete items in a data set but the range of values is limited, a simple frequency distribution will probably have been compiled.
The median of a simple frequency distribution is the value of the item in position

$$MD = \left(\frac{\sum f + 1}{2}\right)\text{th item}$$

where $\sum f$ is the total frequency N, i.e. the last value of the cumulative frequency F.
Procedure for calculating the median
To calculate the median of a simple (discrete) frequency distribution, the following procedure should be followed:
1. Calculate the value of $\frac{\sum f + 1}{2}$;
2. Form an F (cumulative frequency) column;
3. Find the first F value that equals or exceeds $\frac{\sum f + 1}{2}$;
4. The median is the x value corresponding to the F value identified in step 3.
Example: calculate the median for the following distribution of delivery times of orders
sent out from a firm.
Delivery time (days) 0 1 2 3 4 5 6 7 8 9 10 11
Number of orders 4 8 11 12 21 15 10 4 2 2 1 1
Answer
STEP 1 The median is the $\left(\frac{N+1}{2}\right)$th = $\left(\frac{91+1}{2}\right)$th = 46th item.
STEP 2 The F column is shown in the following table:
Delivery time (days) (x)   Number of orders (f)   Cumulative orders (F)
0                          4                      4
1                          8                      12
2                          11                     23
3                          12                     35
4                          21                     56
5                          15                     71
6                          10                     81
7                          4                      85
8                          2                      87
9                          2                      89
10                         1                      90
11                         1                      91
STEP 3 The first F value to exceed 46 is F = 56
STEP 4 The median is thus 4 (days).
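The four steps can be followed literally in Python using `itertools.accumulate` for the F column. `median_discrete` is a hypothetical name; it returns the x value whose cumulative frequency first reaches the position (N + 1)/2.

```python
from itertools import accumulate


def median_discrete(values, freqs):
    """Median of a simple frequency distribution via the (N + 1)/2-th item."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    position = (n + 1) / 2                             # step 1
    for value, F in zip(values, accumulate(freqs)):    # step 2: F column
        if F >= position:                              # step 3: first F reaching it
            return value                               # step 4: corresponding x


days = list(range(12))
orders = [4, 8, 11, 12, 21, 15, 10, 4, 2, 2, 1, 1]
print(median_discrete(days, orders))
```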
3. Median for a grouped frequency distribution
There are two methods commonly employed for estimating the median of a grouped frequency distribution:
a) using an interpolation formula;
b) by graphical interpolation.
a) Estimating the median by formula
Given a grouped frequency distribution, the best that can be done is to identify the class or group that contains the median item. From there, the median is estimated by interpolation within that class, using the cumulative frequencies and the fact that the median must lie exactly halfway along the distribution.
The formula for calculating the median of a grouped distribution is:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \cdot c$$

Where
L = lower bound (limit) of the median class (the class that contains the middle item of the distribution);
F = sum of the frequencies of all classes lower than the median class;
c = median class width (interval of the median class);
f = frequency of the median class;
N = total number of observations.
Example 1: calculation of the median for the data of Table 3.2
Protein intake (g/day) (class interval)   No. of families, frequency (f)   Cumulative frequency
15-25 30 30
25-35 40 70
35-45 100 170
45-55 110 280
55-65 80 360
65-75 30 390
75-85 10 400
Total 400
The median class is 45-55, with N = 400, so N/2 = 200.

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \cdot c = 45 + \frac{200 - 170}{110} \times 10 = 47.73$$
Procedure for estimating the median by formula
The procedure for estimating the median (by formula) of a grouped frequency distribution is:
1. Form a cumulative frequency (F) column;
2. Find the value of N/2 (where $N = \sum f$);
3. Find the F value that first exceeds N/2, which identifies the median class;
4. Calculate the median using the following interpolation formula:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \cdot c$$
Example: Estimate the median for the following data, which represent the ages of a set of 130 representatives who took part in a statistical survey.
Age in years                20 and under 25   25 and under 30   30 and under 35   35 and under 40   40 and under 45   45 and under 50
Number of representatives   2                 14                29                43                33                9
Answer
1.
Age (years)   Number of representatives (f)   Cumulative frequency (F)
20 and under 25 2 2
25 and under 30 14 16
30 and under 35 29 45
35 and under 40 43 88
40 and under 45 33 121
45 and under 50 9 130

2. N/2 = 130/2 = 65.
3. The median class is the class that has the first F greater than 65. Here, it is 35 and under 40.
4. The median can now be estimated using the interpolation formula, with L = 35, F = 45, f = 43 and c = 5:

$$\text{Median} = L + \left(\frac{\frac{N}{2} - F}{f}\right) \cdot c = 35 + \frac{65 - 45}{43} \times 5 = 37.33$$

Median = 37.33 years
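The interpolation formula can be sketched in Python and checked against both worked examples. `grouped_median` is a hypothetical name; it walks the cumulative frequencies until they reach N/2 and then applies Median = L + ((N/2 - F)/f)·c, taking c from the limits of the median class.

```python
def grouped_median(classes, freqs):
    """Median = L + (N/2 - F) / f * c for grouped data given as (lower, upper) classes."""
    N = sum(freqs)
    if N == 0:
        raise ValueError("empty distribution")
    half = N / 2
    F = 0  # cumulative frequency of the classes below the current one
    for (lower, upper), f in zip(classes, freqs):
        if F + f >= half:                  # this is the median class
            return lower + (half - F) / f * (upper - lower)
        F += f


protein = grouped_median([(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85)],
                         [30, 40, 100, 110, 80, 30, 10])
ages = grouped_median([(20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50)],
                      [2, 14, 29, 43, 33, 9])
print(round(protein, 2), round(ages, 2))
```

When N/2 coincides exactly with a class boundary, either neighbouring class gives the same value (that boundary), so using "reaches" rather than "exceeds" in the test does not change the result.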
b) Estimating the median graphically
A percentage cumulative frequency curve (or ogive) is drawn, and the value of the variable that corresponds to the 50% point is read off; this gives the median estimate.
Procedure for estimating the median graphically
1. Form a cumulative (percentage) frequency distribution;
2. Draw the cumulative frequency curve by plotting the class upper bounds against the cumulative percentage frequencies and joining the points with a smooth curve;
3. Read off the 50% point to give the median.

Properties of the median
1. The median is particularly useful where:
a) a set or distribution has extreme values present, and
b) values at the ends of a set or distribution are not known. This means that the median can be used for open-ended distributions.
2. The median can be determined for all levels of data except nominal.
3. The median is unique; there is only one median for a set of data.
The advantages of the median
The advantages of the median are:
it is not affected by extremely large or small values;
it is easily understood (i.e., half the data are smaller than the median and half are greater);
it can be calculated even when the last class is open-ended, and when the data are qualitative (ordinal) rather than quantitative.
The disadvantages of the median
It does not use much of the information available;
It requires that the observations be arranged into an array, which is time-consuming for a large body of ungrouped data.

C. THE MODE
1. Definition
The mode is the value of the observation that appears most frequently, or equivalently, the value that has the largest frequency. The mode is especially useful for describing data at the nominal and ordinal levels of measurement.
A data set may have no mode at all, for example when all observations occur with equal frequency.
Example:
The mode of the set 2, 1, 3, 3, 1, 1, 2, 4 is 1, since this value occurs most often.
For the data in table 3.1, the mode is 3.76, since this observation occurs most often.
The mode of the following simple discrete frequency distribution:
X | 4 | 5 | 6 | 7 | 8 | 9 | 10
f | 2 | 5 | 21 | 18 | 9 | 2 | 1
is 6, since this value has the largest frequency.
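For ungrouped data, the definition above amounts to counting how often each value occurs. The sketch below uses `collections.Counter` from the Python standard library. It returns a list because a data set can have several modes. It returns an empty list when all distinct values occur equally often, following the "no mode" convention described above. The helper names `modes` and `discrete_mode` are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return the mode(s) of ungrouped data as a sorted list.

    Returns an empty list when all distinct values occur equally often,
    following the convention that such data have no mode.
    """
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                  # every value equally frequent: no mode
    return sorted(v for v, c in counts.items() if c == top)


def discrete_mode(xs, fs):
    """Mode of a simple discrete frequency distribution: the x with largest f."""
    return xs[fs.index(max(fs))]   # first x if several share the largest f


print(modes([2, 1, 3, 3, 1, 1, 2, 4]))                                 # [1]
print(discrete_mode([4, 5, 6, 7, 8, 9, 10], [2, 5, 21, 18, 9, 2, 1]))  # 6
```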

2. The mode for grouped data
For a grouped frequency distribution, the mode cannot be determined exactly and so must be estimated. The technique used is interpolation. There are two methods that can be used to estimate the mode:
using an interpolation formula;
graphically, using a histogram.
Mode of a grouped frequency distribution by formula
An estimate of the mode for a grouped frequency distribution can be obtained using the
following procedure:
1. Determine the modal class (the class that has the largest frequency).
2. Calculate $D_1$ = difference between the largest frequency and the frequency immediately preceding it.
3. Calculate $D_2$ = difference between the largest frequency and the frequency immediately following it.
4. Use the following interpolation formula:
Interpolation formula for the mode
$$\text{Mode} = L + \frac{D_1}{D_1 + D_2} \times C$$
Where: L = lower bound of the modal class
C = modal class width
and $D_1$, $D_2$ are as described above in 2 and 3.
Example 1: Estimate the mode of the following distribution of ages.
Age (years) | 20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50
Number of employees | 2 | 14 | 29 | 43 | 33 | 9
Answer:
Age (years) | Number of employees
20 and under 25 | 2
25 and under 30 | 14
30 and under 35 | 29
35 and under 40 | 43
40 and under 45 | 33
45 and under 50 | 9
$D_1 = 43 - 29 = 14$
$D_2 = 43 - 33 = 10$
The lower class bound of the modal class, L = 35
The class width of the modal class, C = 5 (from 35 to 40)
Thus:
$$\text{Mode} = L + \frac{D_1}{D_1 + D_2} \times C = 35 + \frac{14}{14 + 10} \times 5 = 35 + 2.92 = 37.92$$
Mode = 37.92 years
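The four-step procedure above can be written as a short function. This is a sketch under stated assumptions. Classes are assumed to have equal widths. If the modal class is the first or last class, its missing neighbour is treated as having frequency 0; that convention is an assumption and is not stated in the notes. The name `grouped_mode` is hypothetical.

```python
def grouped_mode(lower_bounds, width, freqs):
    """Estimate the mode of a grouped distribution: L + D1 / (D1 + D2) * C.

    Assumes equal class widths; a missing neighbour class (when the modal
    class is first or last) is treated as having frequency 0.
    """
    k = freqs.index(max(freqs))                    # modal class (first if tied)
    before = freqs[k - 1] if k > 0 else 0
    after = freqs[k + 1] if k + 1 < len(freqs) else 0
    d1 = freqs[k] - before                         # D1
    d2 = freqs[k] - after                          # D2
    return lower_bounds[k] + d1 / (d1 + d2) * width


print(round(grouped_mode([20, 25, 30, 35, 40, 45], 5, [2, 14, 29, 43, 33, 9]), 2))  # 37.92
```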
Graphical estimation of the mode
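The graphical equivalent of the above interpolation formula is to construct three histogram bars, representing the class with the highest frequency and the classes on either side of it, and then to draw two lines:
one joining the top-left corner of the modal bar to the top-left corner of the following bar;
one joining the top-right corner of the modal bar to the top-right corner of the preceding bar.
The mode estimate is the x value at the intersection of the two lines.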
Example 2: Estimate the mode of the frequency distribution of Example 1 graphically.
Using the data of Example 1:
Age (years) | Number of employees
30 and under 35 | 29
35 and under 40 | 43
40 and under 45 | 33
Draw the graph. The two lines intersect at about 37.92 years, the same value given by the formula in Example 1.
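The same construction can be checked with coordinates. This sketch places the bar corners at (x, frequency) points and intersects the two lines with the standard two-line intersection formula. It shows that the graphical estimate agrees with $L + \frac{D_1}{D_1 + D_2} \times C$. The helper `intersection_x` is hypothetical.

```python
def intersection_x(p1, p2, p3, p4):
    """x-coordinate where the line through p1, p2 meets the line through p3, p4."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("the lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return (a * (x3 - x4) - (x1 - x2) * b) / denom


L, C = 35, 5                       # modal class: 35 and under 40
f_prev, f_mode, f_next = 29, 43, 33

# Line 1: top-left corner of the modal bar -> top-left corner of the following bar
# Line 2: top-right corner of the modal bar -> top-right corner of the preceding bar
x = intersection_x((L, f_mode), (L + C, f_next), (L + C, f_mode), (L, f_prev))
print(round(x, 2))  # 37.92
```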
The advantages of the mode
The mode has the advantage of not being affected by extremely high or low values;
It is easily understood (it is the most common value), is not difficult to calculate and can be used when the last class of a distribution is open-ended;
The mode can be used for all levels of data: nominal, ordinal, interval and ratio.
The disadvantages of the mode
The disadvantages of the mode are:
The mode does not use much of the information available;
For many sets of data there is no mode, because no value appears more than once. For example, there is no mode for this set of price data: RWF 250, RWF 400, RWF 650 and RWF 1250;
The mode is not always unique. Example: suppose the ages of the individuals in a scout club are 14, 16, 17, 18, 18, 20, 20, 22, 24, 24 and 25. The ages 18, 20 and 24 each occur twice, so all three are modes.
In general, the mean is the most frequently used measure of central tendency and the
mode is the least used.
D. THE MIDRANGE
The midrange is defined as the sum of the lowest and highest values in the data set, divided by 2. The symbol MR is used for the midrange.
$$MR = \frac{\text{lowest value} + \text{highest value}}{2}$$
Example: Find the midrange of these numbers: 2, 3, 6, 8, 4 and 1.
$$MR = \frac{1 + 8}{2} = \frac{9}{2} = 4.5$$
Then, the midrange is 4.5.
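As a quick check of the definition, the midrange needs only the smallest and largest values. The function name `midrange` below is hypothetical.

```python
def midrange(values):
    """MR = (lowest value + highest value) / 2."""
    return (min(values) + max(values)) / 2


print(midrange([2, 3, 6, 8, 4, 1]))  # 4.5
```

Because it depends only on the two extreme values, the midrange is strongly affected by outliers. This contrasts with the median and the mode discussed above.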
The Relationship between the Arithmetic Mean, the Median and the Mode
In a symmetrical (unimodal) frequency distribution the mode, median and mean are located at the centre and are equal; Fig. (a) illustrates this for a normal distribution. In this case any one of these measures may be used.
[Fig. (a): symmetrical distribution; the mean, median and mode coincide at the centre.]
If the distribution of the variable is not symmetrical, the distribution is skewed, and the arithmetic mean is less typical of the distribution. In a positively skewed distribution the mean is not at the centre. It is dragged to the right of centre by a few extremely high values of the variable. The median is generally the next largest measure in a positively skewed frequency distribution, and the mode is the smallest of the three measures. If the distribution is highly skewed, the mean would not be a good measure to use; the median and mode would be more representative.
[Fig. (b): positively skewed distribution; from left to right, the mode, the median and the mean.]
In a negatively skewed distribution the mean is reduced by a few extremely low
values of the variable and hence will be left of centre. The median is greater than
the arithmetic mean, and the modal value is the largest of the three measures.
Again, if the distribution is highly skewed, the mean should not be used to
represent the data.

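In a moderately skewed distribution the following relationships hold approximately:
1. $\text{Mean} - \text{Mode} = 3(\text{Mean} - \text{Median})$;
2. $\text{Median} - \text{Mode} = 2(\text{Mean} - \text{Median})$;
3. $\text{Median} = \dfrac{2\,\text{Mean} + \text{Mode}}{3}$;
4. $\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$;
5. $\text{Mean} = \dfrac{3\,\text{Median} - \text{Mode}}{2}$.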
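As an illustration of relationship 4, the sketch below uses the ages distribution of the median and mode examples (6 classes of width 5). It computes the grouped mean from the class midpoints, together with the interpolated median and mode. It then compares the mode with $3\,\text{Median} - 2\,\text{Mean}$. This is only a numerical check on one moderately skewed data set, and the agreement is approximate, not exact.

```python
# Ages data of the median and mode examples: class lower bounds, width, frequencies
lowers = [20, 25, 30, 35, 40, 45]
width = 5
freqs = [2, 14, 29, 43, 33, 9]
n = sum(freqs)

# Grouped arithmetic mean, taking each class midpoint as representative
mean = sum((lo + width / 2) * f for lo, f in zip(lowers, freqs)) / n

# Median by interpolation: first class whose cumulative frequency reaches N/2
cum = 0
for lo, f in zip(lowers, freqs):
    if cum + f >= n / 2:
        median = lo + (n / 2 - cum) / f * width
        break
    cum += f

# Mode by interpolation: L + D1 / (D1 + D2) * C (modal class is not at an end here)
k = freqs.index(max(freqs))
d1, d2 = freqs[k] - freqs[k - 1], freqs[k] - freqs[k + 1]
mode = lowers[k] + d1 / (d1 + d2) * width

empirical_mode = 3 * median - 2 * mean  # relationship 4
print(f"mean={mean:.2f} median={median:.2f} mode={mode:.2f} "
      f"3*median-2*mean={empirical_mode:.2f}")
# mean=37.04 median=37.33 mode=37.92 3*median-2*mean=37.90
```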
THE GEOMETRIC MEAN G
The geometric mean is useful in finding the average of percentages, ratios, indexes, or
growth rates. It has a wide application in business and economics because we are often
interested in finding the percentage changes in sales, salaries, or economic figures, such
as the Gross Domestic Product, which compound or build on each other.
The geometric mean G of a set of n positive numbers $x_1, x_2, x_3, \ldots, x_n$ is calculated using the formula:
$$G = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$$
where n is the number of observations made of the variable x, and $x_1, x_2, x_3, \ldots, x_n$ are the values of these observations.
Example: the geometric mean of the numbers 3, 25 and 45 is:
$$G = \sqrt[3]{3 \times 25 \times 45} = \sqrt[3]{3375} = 15$$
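The formula translates directly into Python. This minimal sketch uses `math.prod` (available from Python 3.8). For long lists the product can overflow or underflow. The standard-library `statistics.geometric_mean`, which works through logarithms, is then the safer choice. The function name `geometric_mean` here is our own.

```python
import math


def geometric_mean(values):
    """G = (x1 * x2 * ... * xn) ** (1/n) for positive numbers."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean needs positive numbers")
    return math.prod(values) ** (1 / len(values))


print(round(geometric_mean([3, 25, 45]), 6))  # 15.0
```

The result is rounded because taking the cube root in floating point can give a value a tiny distance from 15.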
THE HARMONIC MEAN H
The harmonic mean is another specialized measure of location, used only in particular circumstances, namely when the data consist of a set of rates, such as prices, speeds or productivity.
The harmonic mean H of a set of n numbers $x_1, x_2, x_3, \ldots, x_n$ is the reciprocal of the arithmetic mean of the reciprocals of the numbers:
$$H = \frac{1}{\frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
where n is the number of observations.
Example: the harmonic mean of the numbers 2, 4 and 8 is:
$$H = \frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{7/8} = \frac{24}{7} \approx 3.43$$
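A direct sketch of the formula $H = n / \sum (1/x_i)$ follows. The function name is our own. Python's standard library also provides `statistics.harmonic_mean` (from Python 3.6).

```python
def harmonic_mean(values):
    """H = n / (1/x1 + 1/x2 + ... + 1/xn) for non-zero numbers."""
    return len(values) / sum(1 / v for v in values)


print(round(harmonic_mean([2, 4, 8]), 2))  # 3.43
```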
The relation between the arithmetic, geometric, and harmonic means.
The geometric mean of a set of positive numbers $x_1, x_2, x_3, \ldots, x_n$ is less than or equal to their arithmetic mean, but greater than or equal to their harmonic mean. In symbols:
$$\bar{X} \ge G \ge H$$
The equality signs hold only if all the numbers $x_1, x_2, x_3, \ldots, x_n$ are identical.
Example: The set 2, 4, 8 has arithmetic mean 4.67, geometric mean 4 and harmonic mean 3.43.
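The inequality can be checked numerically with the standard-library `statistics` module (`geometric_mean` requires Python 3.8 or later). This sketch reproduces the example above. It then confirms that the three means coincide when all the numbers are identical. A small tolerance allows for floating-point rounding.

```python
import math
import statistics

data = [2, 4, 8]
am = statistics.mean(data)
gm = statistics.geometric_mean(data)   # Python 3.8+
hm = statistics.harmonic_mean(data)    # Python 3.6+
print(round(am, 2), round(gm, 2), round(hm, 2))  # 4.67 4.0 3.43

# AM >= GM >= HM (small tolerance for floating-point rounding)
assert am >= gm - 1e-12 and gm >= hm - 1e-12

# Equality holds when all the numbers are identical
same = [5, 5, 5]
means = (statistics.mean(same), statistics.geometric_mean(same),
         statistics.harmonic_mean(same))
print(all(math.isclose(m, 5) for m in means))  # True
```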
3.2 MEASURES OF DISPERSION
Dispersion refers to the variability or spread in the data. A small value for a measure of dispersion indicates that the data are clustered closely, say, around the arithmetic mean, so the mean represents the data well. A large measure of dispersion indicates that the mean is not a reliable summary of the data.
The most important measures of dispersion are:
1. Range: the difference between the largest and the smallest values in a data set. The range is the simplest measure of dispersion. The symbol R is used for the range.
$$R = \text{largest value} - \text{smallest value}$$
Example: Find the range of the following data: 35, 45, 30, 35, 40, 25.
R = 45 - 25 = 20
2. Mean deviation (MD): the arithmetic mean of the absolute deviations of the observations from the arithmetic mean, that is, of the deviations taken without regard to their sign.
a) The formula for the mean deviation for ungrouped data is
$$MD = \frac{\sum |X - \mu|}{N} \quad \text{for populations}$$
$$MD = \frac{\sum |X - \bar{X}|}{n} \quad \text{for samples}$$
Where:
X is the value of each observation;
$\bar{X}$ is the arithmetic mean of the sample values;
$\mu$ is the arithmetic mean of the population;
n is the number of observations in the sample;
N is the number of observations in the population;
| | indicates the absolute value.
Example: calculate the mean deviation of 43, 75, 48, 39, 51, 47, 50, 47.
Solution
First determine the mean: $\bar{X} = \frac{400}{8} = 50$, and then:
$$MD = \frac{\sum |X - \bar{X}|}{n} = \frac{|43-50| + |75-50| + |48-50| + |39-50| + |51-50| + |47-50| + |50-50| + |47-50|}{8}$$
$$= \frac{7 + 25 + 2 + 11 + 1 + 3 + 0 + 3}{8} = \frac{52}{8} = 6.5$$
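The calculation above can be reproduced with a short function. This is a sketch of the sample formula; the name `mean_deviation` is hypothetical.

```python
def mean_deviation(values):
    """MD = sum(|X - mean|) / n for ungrouped data."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


print(mean_deviation([43, 75, 48, 39, 51, 47, 50, 47]))  # 6.5
```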
b) Mean deviation for grouped data:
$$MD = \frac{\sum f|X - \mu|}{N} \quad \text{for populations}$$
$$MD = \frac{\sum f|X - \bar{X}|}{n} \quad \text{for samples}$$
where f refers to the frequency of each class, X to the class midpoints, and $N$ (or $n$) $= \sum f$.
Example: calculate the mean and the mean deviation of the number of sales (see ex 4.2).
Table 1 Number of sales made by salesmen
Number of sales | 0-4 | 5-9 | 10-14 | 15-19 | 20-24 | 25-29
Number of salesmen | 1 | 14 | 23 | 21 | 15 | 6
Table 2 Layout of calculations
Number of sales | Number of salesmen (f) | Midpoint (x) | fx | Absolute deviation from mean | f × absolute deviation
0 to 4 | 1 | 2 | 2 | 13.3 | 13.3
5 to 9 | 14 | 7 | 98 | 8.3 | 116.2
10 to 14 | 23 | 12 | 276 | 3.3 | 75.9
15 to 19 | 21 | 17 | 357 | 1.7 | 35.7
20 to 24 | 15 | 22 | 330 | 6.7 | 100.5
25 to 29 | 6 | 27 | 162 | 11.7 | 70.2
Totals | 80 | | 1225 | | 411.8
Mean number of sales:
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1225}{80} = 15.3$$
Thus, the mean deviation is
$$MD = \frac{\sum f|x - \bar{x}|}{\sum f} = \frac{411.8}{80} = 5.1 \text{ sales}$$
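The layout of calculations in Table 2 can be sketched as a function of the class midpoints and frequencies. The name `grouped_mean_deviation` is hypothetical. The code keeps the unrounded mean, 15.3125, instead of 15.3. It therefore gives MD ≈ 5.147 rather than 411.8/80 = 5.1475, which is the same to one decimal place.

```python
def grouped_mean_deviation(midpoints, freqs):
    """Return (mean, MD) for a grouped distribution, MD = sum f|x - mean| / sum f."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    md = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n
    return mean, md


mean, md = grouped_mean_deviation([2, 7, 12, 17, 22, 27], [1, 14, 23, 21, 15, 6])
print(round(mean, 1), round(md, 1))  # 15.3 5.1
```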
Characteristics of the mean deviation
a. The mean deviation is a good representative measure of dispersion that is not difficult to understand. It is useful for comparing the variability of distributions of a similar nature.
b. Its practical disadvantage is that it can be tedious to calculate if the mean is anything other than a whole number.
c. Because of the absolute-value (modulus) sign, the mean deviation is difficult to handle theoretically, and thus it is not used in more advanced analysis.
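3. Variance: the arithmetic mean of the squared deviations from the mean. The variance is nonnegative and is zero only if all observations are the same. The population variance $\sigma^2$ (the Greek letter sigma squared) and the sample variance $s^2$ for ungrouped data are given by:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where N is the number of observations in the population and n is the number of observations in the sample (the sample variance divides by n - 1).
For grouped data they are given by:
$$\sigma^2 = \frac{\sum f(X - \mu)^2}{N} \quad \text{and} \quad s^2 = \frac{\sum f(X - \bar{X})^2}{n - 1}$$
where f is the class frequency and X the class midpoint.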
4. Standard deviation
The population standard deviation $\sigma$ and the sample standard deviation s are the positive square roots of their respective variances.
a) For ungrouped data:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
Example 1 (for ungrouped data): calculate the variance and standard deviation of the data in the following table.
Table 3.3 Haemoglobin values (g%) of 26 normal children
11.8 12.9 12.4 13.3 13.8
11.4 12.3 11.7 12.9 12.2
10.4 10.8 12.7 13.2
11.6 12.0 12.2 14.2
10.8 10.5 11.6 13.5
12.2 11.2 12.6 13.0
Table 3.4 Calculation of the standard deviation and variance for the data of table 3.3
Serial No. | Haemoglobin value (g%) | Deviation from arithmetic mean 12.2 | Square of deviation
1 | 11.8 | -0.4 | 0.16
2 | 11.4 | -0.8 | 0.64
3 | 10.4 | -1.8 | 3.24
4 | 11.6 | -0.6 | 0.36
5 | 10.8 | -1.4 | 1.96
6 | 12.2 | 0.0 | 0.00
7 | 12.9 | 0.7 | 0.49
8 | 12.3 | 0.1 | 0.01
9 | 10.8 | -1.4 | 1.96
10 | 12.0 | -0.2 | 0.04
11 | 10.5 | -1.7 | 2.89
12 | 11.2 | -1.0 | 1.00
13 | 12.4 | 0.2 | 0.04
14 | 11.7 | -0.5 | 0.25
15 | 12.7 | 0.5 | 0.25
16 | 12.2 | 0.0 | 0.00
17 | 11.6 | -0.6 | 0.36
18 | 12.6 | 0.4 | 0.16
19 | 13.3 | 1.1 | 1.21
20 | 12.9 | 0.7 | 0.49
21 | 13.2 | 1.0 | 1.00
22 | 14.2 | 2.0 | 4.00
23 | 13.5 | 1.3 | 1.69
24 | 13.0 | 0.8 | 0.64
25 | 13.8 | 1.6 | 2.56
26 | 12.2 | 0.0 | 0.00
Total | | 0 | 25.40
Arithmetic mean = 12.2
Standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{25.40}{25}} = \sqrt{1.016} \approx 1.01 \text{ g\%}$$
Variance: $s^2 = 1.016$
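The same figures can be obtained with Python's standard `statistics` module. `statistics.variance` and `statistics.stdev` use the sample divisor n - 1, as in the calculation above. The list below is the data of table 3.3, in the order of table 3.4.

```python
import statistics

# Haemoglobin values (g%) of table 3.3, in the order of table 3.4
hb = [11.8, 11.4, 10.4, 11.6, 10.8, 12.2, 12.9, 12.3, 10.8, 12.0, 10.5, 11.2,
      12.4, 11.7, 12.7, 12.2, 11.6, 12.6, 13.3, 12.9, 13.2, 14.2, 13.5, 13.0,
      13.8, 12.2]

mean = statistics.mean(hb)      # arithmetic mean
var = statistics.variance(hb)   # sample variance, divisor n - 1
sd = statistics.stdev(hb)       # sample standard deviation
print(round(mean, 2), round(var, 3), round(sd, 2))  # 12.2 1.016 1.01
```

For a population, `statistics.pvariance` and `statistics.pstdev` divide by N instead.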
b) For grouped data:

$$\sigma = \sqrt{\frac{\sum f(X - \mu)^2}{N}} \quad \text{and} \quad s = \sqrt{\frac{\sum f(X - \bar{X})^2}{n - 1}}$$
Example 2 (for grouped data): calculation of the variance and standard deviation for the data of table 3.2
Protein intake (class interval) | Number of families (f) | Midpoint of class interval (x) | Deviation of midpoint from arithmetic mean $(x - \bar{x})$ | Squared deviation $(x - \bar{x})^2$ | Frequency × squared deviation $f(x - \bar{x})^2$
15-25 | 30 | 20 | -27.5 | 756.25 | 22687.5
25-35 | 40 | 30 | -17.5 | 306.25 | 12250.0
35-45 | 100 | 40 | -7.5 | 56.25 | 5625.0
45-55 | 110 | 50 | 2.5 | 6.25 | 687.5
55-65 | 80 | 60 | 12.5 | 156.25 | 12500.0
65-75 | 30 | 70 | 22.5 | 506.25 | 15187.5
75-85 | 10 | 80 | 32.5 | 1056.25 | 10562.5
Total | 400 | | | | 79500.0
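Arithmetic mean = 47.5
From this table, dividing by $\sum f = 400$, we get
$$s^2 = \frac{\sum f(x - \bar{x})^2}{\sum f} = \frac{79500.0}{400} = 198.75$$
Therefore, the standard deviation is $s = \sqrt{198.75} \approx 14.10$, and the variance is $s^2 = 198.75$. (Dividing by n - 1 = 399 instead, as in the sample formula, would give $s \approx 14.12$.)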
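The grouped calculation can be sketched as follows. The function `grouped_variance` is hypothetical. By default it divides by $\sum f$, as the worked example does. The `sample=True` option switches to the divisor $\sum f - 1$ of the sample formula.

```python
import math


def grouped_variance(midpoints, freqs, sample=False):
    """Return (mean, variance, standard deviation) of a grouped distribution.

    Divides the sum of f(x - mean)^2 by sum(f), or by sum(f) - 1 when
    sample=True.
    """
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))
    var = ss / (n - 1 if sample else n)
    return mean, var, math.sqrt(var)


mids = [20, 30, 40, 50, 60, 70, 80]
families = [30, 40, 100, 110, 80, 30, 10]
mean, var, sd = grouped_variance(mids, families)
print(mean, var, round(sd, 2))  # 47.5 198.75 14.1
```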
The major characteristics of the standard deviation are:

It is in the same units as the original data;
It is the square root of the average squared distance from the mean;
It cannot be negative;
It is the most widely reported measure of dispersion.
5. The coefficient of variation
In the majority of cases where distributions need to be compared with respect to variability, the coefficient of variation is much more appropriate than the standard deviation alone, because it is free of units. It is considered the standard measure of relative variation.
The coefficient of variation is the standard deviation divided by the mean. The result is expressed as a percentage:
$$\text{C.V.} = \frac{\text{standard deviation}}{\text{mean}} \times 100$$
For the example given in table 3.3, the standard deviation is s = 1.01 and the arithmetic mean is $\bar{x} = 12.2$, so the coefficient of variation is
$$\text{C.V.} = \frac{1.01}{12.2} \times 100 = 8.28\%$$
For the example given in table 3.2, the standard deviation is s = 14.10 and the arithmetic mean is $\bar{x} = 47.5$, so the coefficient of variation is
$$\text{C.V.} = \frac{14.10}{47.5} \times 100 = 29.68\%$$
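A one-line sketch of the formula reproduces both results. The function name is hypothetical. The comparison shows that the protein-intake data are relatively far more variable than the haemoglobin data, even though the two data sets are in different units.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = standard deviation / mean * 100, in percent."""
    return sd / mean * 100


print(round(coefficient_of_variation(1.01, 12.2), 2))   # 8.28
print(round(coefficient_of_variation(14.10, 47.5), 2))  # 29.68
```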
3.3 MEASURES OF POSITION

In addition to measures of central tendency and measures of variation (dispersion), there are measures of position or location. These measures include standard scores, percentiles, deciles and quartiles. They are used to locate the relative position of a data value within a data set. For example, if a value is located at the 80th percentile, it means that 80% of the values fall below it in the distribution and 20% of the values fall above it.
A. Standard scores
The standard score represents the number of standard deviations that a data value falls above or below the mean. The symbol for a standard score is z. The formula is:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$$
For samples, the formula is:
$$z = \frac{X - \bar{X}}{s}$$
For populations, the formula is:
$$z = \frac{X - \mu}{\sigma}$$
A student scored 65 on a calculus test that had a mean of 50 and standard deviation of 10;
she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare
her relative positions on the tests.
Solution
First, find the z scores. For calculus the z score is:
$$z = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$
For history the z score is:
$$z = \frac{30 - 25}{5} = 1.0$$
Since the z score for calculus is larger, her relative position in the calculus class is higher
than her relative position in the history class.
Note that if the z score is positive, the score is above the mean. If the z score is 0, the
score is the same as the mean. And if the z score is negative, the score is below the mean.
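The comparison in the example can be sketched with a small helper. The name `z_score` is hypothetical. The sign of the result tells whether a score lies above or below the mean, as described above.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


calculus = z_score(65, 50, 10)
history = z_score(30, 25, 5)
print(calculus, history)  # 1.5 1.0
print("calculus" if calculus > history else "history")  # calculus
```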
B. Quartiles
Quartiles divide the distribution into four equal parts (quarters).
The value of the variable for which the cumulative frequency is $\frac{N}{4}$ is called the first quartile or lower quartile, and it is denoted by $Q_1$.
Similarly, the value of the variable for which the cumulative frequency is $\frac{3N}{4}$ is called the third quartile or upper quartile, and it is denoted by $Q_3$.
Clearly the median is the second quartile, and it can be denoted by $Q_2$.
In the case of ungrouped data with n items arranged in ascending order, $Q_1$ is calculated as follows.
Let $i = \left[\frac{1}{4}(n+1)\right]$ = integral part of $\frac{1}{4}(n+1)$.
Let $q = \frac{1}{4}(n+1) - \left[\frac{1}{4}(n+1)\right]$; hence q is the fractional part.
Then
$$Q_1 = x_i + q\,(x_{i+1} - x_i)$$
Similarly,
$$Q_3 = x_i + q\,(x_{i+1} - x_i)$$
where
$$i = \left[\frac{3}{4}(n+1)\right] \quad \text{and} \quad q = \frac{3}{4}(n+1) - \left[\frac{3}{4}(n+1)\right]$$
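The position rule above, "integral part i and fractional part q of k(n + 1)/m", works the same way for quartiles (m = 4), deciles (m = 10) and percentiles (m = 100). A minimal sketch of it is given below. The notes do not say what happens when the position falls before the first item or after the last. Returning the first or last value in those cases is our assumption. The name `quantile_ungrouped` is hypothetical. Other definitions, such as those offered by `statistics.quantiles`, can give slightly different results.

```python
import math


def quantile_ungrouped(values, k, m):
    """k-th dividing value when the data are split into m equal parts.

    m = 4 gives quartiles, m = 10 deciles, m = 100 percentiles.
    Position k(n + 1)/m: i is its integral part, q its fractional part,
    and the result is x_i + q (x_{i+1} - x_i), with x_1 the smallest value.
    """
    x = sorted(values)
    n = len(x)
    pos = k * (n + 1) / m
    i = math.floor(pos)
    q = pos - i
    if i < 1:        # position before the first item (assumption: use x_1)
        return x[0]
    if i >= n:       # position at or after the last item (assumption: use x_n)
        return x[-1]
    # x_i of the notes is x[i - 1] in Python's 0-based list
    return x[i - 1] + q * (x[i] - x[i - 1])


marks = [40, 90, 61, 68, 72, 43, 50, 84, 75, 33]
print(quantile_ungrouped(marks, 1, 4))  # 42.25
print(quantile_ungrouped(marks, 2, 4))  # 64.5
print(quantile_ungrouped(marks, 3, 4))  # 77.25
```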
In the case of a grouped frequency distribution the quartiles are calculated using the formulas:
$$Q_1 = L + \frac{C}{f}\left(\frac{N}{4} - F\right) \quad \text{(the lower quartile)}$$
$$Q_2 = L + \frac{C}{f}\left(\frac{N}{2} - F\right) \quad \text{(the median)}$$
$$Q_3 = L + \frac{C}{f}\left(\frac{3N}{4} - F\right) \quad \text{(the upper quartile)}$$
where L is the lower limit of the class in which the particular quartile lies, f is the frequency of this class, C is the width of the class and F is the cumulative frequency of the preceding class.
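The grouped formulas share one pattern, $L + \frac{C}{f}\left(\frac{kN}{m} - F\right)$. The same pattern gives the deciles (m = 10) and percentiles (m = 100) below. The sketch takes classes as (lower, upper) pairs, so unequal widths such as those in illustrative example 3 are handled. An open-ended last class is marked with `None` and only raises an error if the requested value falls inside it; this handling is our assumption. The function name is hypothetical.

```python
def quantile_grouped(classes, freqs, k, m):
    """k-th of the m-part dividing values: L + C/f * (kN/m - F).

    classes: list of (lower, upper) pairs; upper may be None for an
    open-ended last class.
    """
    n = sum(freqs)
    target = k * n / m        # cumulative frequency the value must reach
    cum = 0                   # F: cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cum + f >= target:
            if upper is None:
                raise ValueError("the value falls in an open-ended class")
            return lower + (upper - lower) / f * (target - cum)
        cum += f
    raise ValueError("k/m must lie between 0 and 1")


# Data of illustrative example 3 (marks of 70 students)
classes = [(0, 4), (4, 8), (8, 12), (12, 14), (14, 18), (18, 20), (20, 25), (25, None)]
students = [10, 12, 18, 7, 5, 8, 4, 6]
print(round(quantile_grouped(classes, students, 1, 4), 3))     # 6.5    (Q1)
print(round(quantile_grouped(classes, students, 2, 4), 2))     # 10.89  (median)
print(round(quantile_grouped(classes, students, 3, 4), 3))     # 18.125 (Q3)
print(round(quantile_grouped(classes, students, 4, 10), 2))    # 9.33   (D4)
print(round(quantile_grouped(classes, students, 60, 100), 2))  # 12.57  (P60)
```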
C. Deciles
Similarly, deciles are the values of the variable that divide the total frequency into 10 equal parts.
Consider a frequency distribution with total frequency N. The values of the variable for which the cumulative frequencies are $\frac{iN}{10}$ (i = 1, 2, ..., 9) are called deciles. The ith decile is denoted by $D_i$.
Clearly the median is the fifth decile; hence the median can also be denoted by $D_5$.
In the case of ungrouped data with n items, for k = 1, 2, 3, ..., 9:
$$D_k = x_i + q\,(x_{i+1} - x_i)$$
where
$$i = \left[\frac{k(n+1)}{10}\right] \quad \text{and} \quad q = \frac{k(n+1)}{10} - \left[\frac{k(n+1)}{10}\right]$$
For a grouped frequency distribution, we have
$$D_i = L + \frac{C}{f}\left(\frac{iN}{10} - F\right), \quad i = 1, 2, \ldots, 9$$
D. Percentiles
Percentiles are the values of the variable that divide the total frequency into 100 equal parts. They are denoted by $P_1, P_2, \ldots, P_{99}$, and the ith percentile is denoted by $P_i$.
Clearly the median is the 50th percentile, and hence the median can also be denoted by $P_{50}$.
In the case of ungrouped data with n items, for k = 1, 2, 3, ..., 99:
$$P_k = x_i + q\,(x_{i+1} - x_i)$$
where
$$i = \left[\frac{k(n+1)}{100}\right] \quad \text{and} \quad q = \frac{k(n+1)}{100} - \left[\frac{k(n+1)}{100}\right]$$
Percentiles are obtained from the following formula in the case of a grouped frequency distribution:
$$P_i = L + \frac{C}{f}\left(\frac{iN}{100} - F\right), \quad i = 1, 2, \ldots, 99$$
ILLUSTRATIVE EXAMPLES
1. Find the median and quartiles of the heights in cm of eleven students given by
66, 65, 64, 70, 61, 60, 56, 63, 60, 67, 62.
Solution: Arranging the given data in ascending order of magnitude we get
56, 60, 60, 61, 62, 63, 64, 65, 66, 67, 70.
Here n = 11. Since n is odd, the median is the sixth item, which is equal to 63 cm.
$Q_1$ = size of the $\frac{1}{4}(n+1)$th item = size of the $\frac{1}{4}(11+1)$ = 3rd item = 60 cm
$Q_3$ = size of the $\frac{3}{4}(n+1)$th item = size of the 9th item = 66 cm
2. Find the median and quartile marks of 10 students in a statistics test whose marks are given as
40, 90, 61, 68, 72, 43, 50, 84, 75, 33.
Solution: Arranging in ascending order of magnitude we get
33, 40, 43, 50, 61, 68, 72, 75, 84, 90.
Here n = 10. Since n is even, the median is the average of the two middle items, 61 and 68:
$$\text{Median} = \frac{1}{2}(61 + 68) = 64.5 \text{ marks}$$
First quartile
Here $i = \left[\frac{1}{4}(n+1)\right] = [2.75] = 2$ and $q = 2.75 - 2 = 0.75$, so
$$Q_1 = x_2 + 0.75\,(x_3 - x_2) = 40 + 0.75\,(43 - 40) = 42.25$$
Third quartile
Here $i = \left[\frac{3}{4}(n+1)\right] = [8.25] = 8$ and $q = 8.25 - 8 = 0.25$, so
$$Q_3 = x_8 + 0.25\,(x_9 - x_8) = 75 + 0.25\,(84 - 75) = 77.25$$
3. Find the lower quartile, median, upper quartile, 4th decile and 60th percentile of the following data.
Marks | 0-4 | 4-8 | 8-12 | 12-14 | 14-18 | 18-20 | 20-25 | 25 & above
No. of students | 10 | 12 | 18 | 7 | 5 | 8 | 4 | 6
Solution
Marks | No. of students | Cumulative frequency
0-4 | 10 | 10
4-8 | 12 | 22
8-12 | 18 | 40
12-14 | 7 | 47
14-18 | 5 | 52
18-20 | 8 | 60
20-25 | 4 | 64
25 & above | 6 | 70
$N = \sum f = 70$
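i) Median:
$$\text{Median} = L + \frac{C}{f}\left(\frac{N}{2} - F\right)$$
Here $\frac{N}{2} = \frac{70}{2} = 35$, so the median class is 8-12, with L = 8, C = 12 - 8 = 4, F = 22 and f = 18.
$$\text{Median} = 8 + \frac{4}{18}(35 - 22) = 10.89$$
ii) Lower quartile:
$$Q_1 = L + \frac{C}{f}\left(\frac{N}{4} - F\right)$$
Here $\frac{N}{4} = \frac{70}{4} = 17.5$, so the $Q_1$ class is 4-8, with L = 4, C = 4, f = 12 and F = 10.
$$Q_1 = 4 + \frac{4}{12}(17.5 - 10) = 6.5$$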
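iii) Upper quartile:
$$Q_3 = L + \frac{C}{f}\left(\frac{3N}{4} - F\right)$$
Here $\frac{3N}{4} = \frac{3 \times 70}{4} = 52.5$, so the $Q_3$ class is 18-20, with L = 18, C = 20 - 18 = 2, f = 8 and F = 52.
$$Q_3 = 18 + \frac{2}{8}(52.5 - 52) = 18.125$$
iv) The 4th decile is
$$D_4 = L + \frac{C}{f}\left(\frac{4N}{10} - F\right)$$
Here $\frac{4N}{10} = \frac{280}{10} = 28$, so the $D_4$ class is 8-12, with L = 8, C = 4, f = 18 and F = 22.
$$D_4 = 8 + \frac{4}{18}(28 - 22) = 9.33$$
v) The 60th percentile is $P_{60}$, which is given by
$$P_{60} = L + \frac{C}{f}\left(\frac{60N}{100} - F\right)$$
Here $\frac{60N}{100} = \frac{60 \times 70}{100} = 42$, so the $P_{60}$ class is 12-14, with L = 12, C = 14 - 12 = 2, f = 7 and F = 40.
$$P_{60} = 12 + \frac{2}{7}(42 - 40) = 12.57$$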
