
CHAPTER-I

INTRODUCTION TO STATISTICS
Origin and development:
The word statistics is derived from the Latin word status, which means a political state. It is also traced to the Italian word statista, the German word statistik or the French word statistique, each of which means a political state. For several decades, the word statistics was associated solely with the display of facts and figures pertaining to the economic, demographic and political situations prevailing in a country, usually collected and published by local governments. Statistics is a tool in the hands of mankind to translate complex facts into simple and understandable statements of facts.

Meaning and Definition of Statistics:


The word statistics is used in two different senses: plural and singular.

Statistics in the plural form is defined as follows: By statistics we mean the aggregate of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a pre-determined purpose and placed in relation to each other.

Statistics in the singular form is defined as follows: Statistics may be defined as the identification, collection, presentation, analysis and interpretation of numerical data.

Thus, statistics (in the sense of data) are numerical statements of facts capable of analysis and interpretation, and the science of statistics is the study of the principles and methods used in the collection, presentation, analysis and interpretation of numerical data in any sphere of enquiry.

Functions of statistics:
1. It simplifies unwieldy and complex data.
2. It helps in the collection and presentation of facts in a systematic manner.
3. It presents data in a precise and definite form.
4. It helps in comparing data with respect to time and location.
5. It enlarges human knowledge and experience.
6. It helps in formulating and testing hypotheses.
7. It helps in testing the laws of social science.
8. It helps in forecasting the future tendency of a given phenomenon.
9. It helps in measuring risk and uncertainty.
10. It helps in finding the extent of the relationship between different sets of data.
11. It helps in formulating and implementing policies in different fields.

Limitations of Statistics:
1. Statistics does not deal with individual items.
2. Statistics deals mostly with quantitative data.
3. Statistics may lead to wrong conclusions in the absence of details.
4. Statistical laws are true only on average.
5. Statistics does not reveal the entire story.
6. Statistics is liable to be misused.
1.1 Introduction to statistics.

Quantitative Aptitude

The distrust of statistics arises for the following reasons:
1. Deliberate twisting of facts.
2. Quoting figures without their context.
3. Selection of non-representative statistical units.
4. Failure to present complete data.
5. Inappropriate comparison.
6. Wrong inference drawn.
7. Improperly classified data.
8. Data collected by improper persons.
9. Inaccurate measurement.
10. Arithmetical errors.

Statistical Enquiry:
Meaning of Statistical Enquiry: Statistical enquiry means a search for knowledge with the help of statistical devices. It may be either a general purpose enquiry (like a population census) or a special purpose enquiry (i.e. one that analyses a specific problem). The stages in a statistical enquiry can be classified under two categories:
1. Planning the enquiry.
2. Executing the enquiry.

Statistical Unit:
Meaning: A statistical unit is the thing in terms of which the requisite information is collected and expressed during a statistical enquiry. It is a well-defined and identifiable object or group of objects with which the measurements or counts in any statistical investigation are associated. A statistical unit is the measure of variables or attributes selected for enumeration, analysis and interpretation. For effective comparison and valid conclusions, it is necessary that the statistical unit is suitably chosen. A mistake in the choice of the appropriate unit is more harmful than mistakes in the collection of data.

Types of statistical units: Statistical units can be of two types:
I. Units of measurement.
II. Units of presentation.

I. Units of measurement: Units of measurement are the means to identify, understand, enumerate, analyse and interpret the data. Units of measurement can be divided into two:
1. Units of enumeration.
2. Units of analysis and interpretation.

1. Units of enumeration: Units of enumeration are used while collecting the data for the purpose of counting and identification, and also to enable classification, tabulation, and diagrammatic and graphic presentation. Units of enumeration can be simple or composite. Ton, mile, meter etc. are examples of simple units, and kilowatt-hour, passenger-mile, ton-kilometer etc. are examples of composite units.


2. Units of analysis and interpretation: Units of analysis and interpretation are used to measure and express the result of analysis and interpretation of statistical data. They are absolute value, rate, percentage, proportion, ratio and coefficient. Their respective examples are the actual number, the number per thousand, the number per hundred, the number per one, the relation between two numbers and the relation between an actual and a standard value.

The essential requisites of a statistical unit are:
1. The statistical unit must be clear and unambiguous.
2. It should be rigid, easy to ascertain and homogeneous.
3. It should be suitable for the enquiry.
4. It should be stable.
5. It must be simple to understand.
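The units of analysis and interpretation named above can be illustrated with a small calculation. The following is a minimal sketch, not a prescribed method: the function name, the figures (60 literate persons out of 240 surveyed, against a standard value of 50) and the choice to express the ratio as part to remainder are all assumptions made for illustration.

```python
from fractions import Fraction


def units_of_analysis(actual, total, standard):
    """Express one count in each unit of analysis and interpretation.

    actual   -- the absolute value (an actual number)
    total    -- the whole against which the count is compared
    standard -- a hypothetical standard value used for the coefficient
    """
    # Ratio: relation between two numbers, here the part and the remainder.
    part_to_rest = Fraction(actual, total - actual)
    return {
        "absolute value": actual,
        "rate per thousand": actual / total * 1000,
        "percentage": actual / total * 100,
        "proportion": actual / total,
        "ratio": f"{part_to_rest.numerator}:{part_to_rest.denominator}",
        "coefficient": actual / standard,
    }


# Hypothetical figures: 60 literate persons among 240 surveyed.
summary = units_of_analysis(actual=60, total=240, standard=50)
for unit, value in summary.items():
    print(f"{unit}: {value}")
```

The same count is thus reported as an actual number, per thousand, per hundred, per one, relative to another number and relative to a standard, which mirrors the order of the examples given in the text.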

Collection of Data:
The facts collected through censuses, surveys or in a routine manner are called raw data. Raw data are statistical data in their original form, before any statistical techniques are used to refine, process or summarize them. There are two types of statistical data:
1. Primary data.
2. Secondary data.

Primary Data:
Primary data is the data collected by a particular person or organization for his/her or its own purpose and use from the primary sources. It is the data which is originally collected for the first time by the investigator or his agent with a pre-determined purpose.

Secondary Data:
Secondary data is the data that was collected by some other person or organization for their own use, and which the investigator then obtains for his/her own use.

Difference between Primary data and secondary data:


1. Primary data: It is a collection of original data for the first time.
   Secondary data: It is basically a compilation of existing data.
2. Primary data: It is collected by the investigators or their agents.
   Secondary data: It is compiled by persons other than those who collected the primary data.
3. Primary data: It is highly expensive.
   Secondary data: It is relatively less expensive when compared to primary data.
4. Primary data: It is usually directly suitable to the purpose of enquiry.
   Secondary data: It may or may not be directly suitable to the purpose of enquiry.
5. Primary data: It may be used as it is for the purpose of enquiry.
   Secondary data: It may require adjustments to be made to suit the purpose of enquiry.
6. Primary data: There is a possibility of personal prejudice in its collection.
   Secondary data: There is no possibility of personal prejudice in its compilation since such data are already collected.
7. Primary data: It becomes secondary data after use.
   Secondary data: It can never be converted into primary data.


Considerations as to choice between primary and secondary data:


The choice between primary and secondary data depends on the following considerations:
1. Nature, objective and scope of the enquiry.
2. Availability of financial resources.
3. Availability of time.
4. Degree of accuracy expected.
5. The status of the investigator.

Methods of Collecting Primary Data:


Broadly speaking, there are six different methods of collecting primary data, which are as under:
1. Direct personal investigation or interview.
2. Direct personal observation.
3. Indirect oral investigation.
4. Through local correspondents.
5. Mailed questionnaire.
6. Schedules sent through enumerators.

1. Direct personal investigation or interview:


Under this method, the investigator personally goes to the source of the data and collects the necessary information through interviews with the informants.

This method is adopted in the following cases:
1. Where greater accuracy is needed.
2. Where the field of enquiry is not large.
3. Where confidential data are to be collected.
4. Where the field is a complex one.
5. Where intensive study is needed.
6. Where sufficient time is available.

Merits:
1. The information proves to be more reliable, as it is collected directly by the investigator after careful observation of the phenomenon, clarification of the various doubts and cross-examination of the informants.
2. More response and cooperation are secured from the informants on account of the personal approach and requests made under this method.
3. Sensitive questions can be avoided or rephrased, keeping in view the reactions of the informants.
4. The language of the questions can be adjusted to the standard of understanding of the informants.
5. Information relating to the character and condition of the informants can be gathered easily along with the data required for the enquiry.
6. The data can be collected quickly and promptly by the use of electronic gadgets etc.

Demerits:
1. The personal bias and prejudices of the investigator may lead to incorrect results.
2. It is not suitable for extensive investigation, particularly where the field of enquiry is very vast and wide.
3. It takes a long time to collect the data from all the informants selected.
4. It is very expensive.
5. It needs a large army of enumerators.
6. It is purely subjective in nature, and the success of the investigation largely depends upon the personality of the investigator, involving intelligence, skill, tact, insight, courage, diplomacy, honesty, politeness, courtesy and a keen sense of observation.
7. Some informants may be reluctant to part with the required information out of fear or shame.

2. Direct personal observation:


The investigator extracts the required information by personal observation of the data occurring on the spot from the units and the respondents. Most of the surveys in various scientific, social and economic fields are done by this method.

This method is adopted in the following cases:
1. Where greater accuracy is needed.
2. Where the field of enquiry is not large.
3. Where confidential data are to be collected.
4. Where the field is a complex one.
5. Where intensive study is needed.
6. Where sufficient time is available.

Merits:
1. Original first-hand information or data are collected.
2. True and reliable data can be had.
3. A high degree of accuracy can be aimed at.
4. The investigator can get correct information.
5. Uniformity and homogeneity can be maintained.

Demerits:
1. It is unsuitable where the area is large.
2. It is expensive and time-consuming.
3. The chances of bias are greater.
4. An untrained investigator will not bring good results.

3. Indirect Oral Investigation:


Under this method, the investigator collects the data indirectly by interviewing third persons who are supposed to be in close touch with the original informants or the incident. This method of collecting data is adopted when the original informants are either not found or are reluctant to part with the desired information, or when the incident concerned is not accessible. This method is usually adopted by enquiry committees or commissions appointed by a government, Central or State. In this type of enquiry, usually a small list of questions is prepared, and the persons connected with the matter (known as witnesses) are individually invited and asked to answer those questions. The replies given by them are recorded systematically by the investigators. The accuracy of the data collected under this method largely depends upon the type of persons selected for interrogation and deposition. For reliability of the data, it is required that the persons so selected:
a) Should be fully aware of the problem under study.
b) Should not be motivated to colour the facts.
c) Should not be biased or prejudiced.
d) Should be capable of expressing themselves correctly and precisely.
e) Should be neither optimists nor pessimists, but normal in character.

Suitability of this method: This system is more suitable where the area to be studied is large. It is used when information cannot be obtained directly. Governments generally adopt this system.

Merits:
1. A wide area can be covered within a given time.
2. It needs fewer resources in terms of time, energy and money.
3. Prejudices of the original informants are eliminated, as the information is recorded from disinterested third parties.
4. It ensures accuracy in information, as the data are collected by the investigator himself with the exercise of his intelligence, skill and tact, and through cross-examination.
5. It allows the expert views and suggestions of specialists to be obtained, so that the enquiry is conducted effectively and efficiently.

Demerits:
1. The data obtained from third parties may not be reliable at times, because witnesses may colour the information according to their interests.
2. Due to the absence of direct contact, the information cannot be relied upon without verification.
3. A careless attitude on the part of the informant will affect the degree of accuracy.
4. A wrong and improper choice of witness may spoil the result of the enquiry.
5. The investigators may be influenced by bribery, nepotism or undue requests, and may therefore twist the true information obtained.

4. Through Local correspondents:


Under this method, the investigator collects the required data through local correspondents and agents placed in the different regions of the enquiry. This method is usually adopted by newspapers, periodical agencies and various departments of a government that require regular information from a wide area on various matters, viz. economics, commerce, politics, agriculture, sports, accidents, riots, strikes, lockouts, stock markets etc. The correspondents or agents appointed at the different localities collect the relevant information in their own ways and submit it periodically to the investigating offices for their use and analysis. By their nature, such data cannot be highly reliable; as such, this method is suitable in those cases where the information is to be gathered regularly from a wide area and the purpose of the investigation can be served with rough estimates, without insisting on a high degree of precision.

Suitability: This method is generally adopted in those cases where the information is to be obtained at regular intervals from a wide area.


Merits:
1. It does not require any formal procedure and hence, much of the associated bother can be avoided.
2. It is less expensive in terms of money, time and energy.
3. The quality of the data is likely to be better, since they are collected through local agents who happen to be in close touch with the events or the source of the data.
4. The data can be collected expeditiously from a wide area.
5. Speedy collection of information is possible.
6. It is useful where information is needed regularly.

Demerits:
1. The data are not very reliable, as they are obtained informally through correspondents who collect the data in their own ways and according to their own likings and decisions.
2. The local agents may resort to foul play instead of supplying the data regularly and correctly.
3. The data are likely to be fabricated and twisted by the correspondents to serve their ulterior motives.
4. Uniformity in the collection of data cannot be maintained every time.

5. Mailed questionnaire:

Under this method, a list of relevant questions (i.e. a questionnaire) relating to the problem under study is prepared and sent by post, along with a covering letter, to the selected informants, who are requested to furnish the necessary data by return of post. This method of data collection is usually followed by private bodies, individuals, research scholars, institutions and governments as well. The success of this method depends largely upon the precautions taken in respect of two things: (1) the covering letter and (2) the questionnaire.

Suitability: This method is appropriate in cases where informants are spread over a wide area. This method should be preferred in such enquiries where there could be a legal compulsion to supply the information, so that the risk of non-response is eliminated.

Merits:
1. The chief advantage of this method is that it is the most economical method in terms of time, money and energy, provided the respondents respond to the questionnaire in time.
2. It ensures a reasonable standard of accuracy when the investigation is properly conducted.
3. The information gathered under this method is original and so, more authentic.
4. There is no room for personal bias and prejudices on the part of the investigators and the enumerators, since the questionnaire is framed by the organization and the answers are given directly by the respondents.
5. Under this method, information relating to a large number of items can be collected from a wide area in a comparatively short period of time.

Demerits:
1. Success in this method depends entirely upon the co-operation of the respondents, which is lacking in most cases nowadays. In many cases, the respondents do not even return the questionnaires.
2. The answers given by the respondents may be vague, unintelligible and haphazard, and in such cases the information obtained may not be useful.
3. Information under this method cannot be expected from an informant unless he is under the direct control of the investigator or the investigating agency, or is obliged to them in some form or other. This is because the informant has to devote some time and energy to answering the questionnaire.
4. This method cannot be used if the informants are illiterate or indifferent.
5. This method is not flexible, i.e. adjustable to changing circumstances.
6. In case of inadequate or incomplete answers, it may be difficult to get the correct information.
7. Under this method, it is difficult to verify the veracity of the information, inasmuch as the respondents might suppress the correct information and furnish wrong answers deliberately.
8. This method is not suitable for clarifying the doubts in the minds of the respondents in regard to any question.
9. Since written answers are required under this method, most informants do not like to give information in their own handwriting on certain questions of a private nature, viz. personal habits, income, wealth etc.

6. Schedules sent through enumerators:


Under this method, schedules (i.e. lists of questions) are sent to the informants through enumerators, who read out the questions from the schedules to the informants and record their answers on the same schedules. Before doing so, the enumerator first explains the aims and objectives of the enquiry to the informants and seeks their co-operation in the recording of the data. The enumerator also explains the various terms and concepts used in the schedules and dispels the informants' doubts on the questions put to them. The cardinal difference between the mailed questionnaire method and this method is that in the former, the respondents themselves record the answers, while in the latter, the enumerators record the information obtained from the informants.

Suitability: This method is usually used by big business houses, large public enterprises, governments etc. This method is quite popular in practice. The main reason for this is a very high rate of response because of the personal contact of the enumerators.

Merits:
1. The enumerators make the informants understand clearly the aims and objectives of the enquiry, and also the need for and utility of furnishing correct information.
2. The doubts of the informants on the various questions are readily removed by the enumerators, who are present on the spot of the enquiry.
3. The data collected prove to be more reliable and dependable, since they are recorded by enumerators trained in the matter.
4. It ensures a large response from the respondents, since they are called on personally by the enumerators.
5. It is possible to collect the required data even from illiterate informants, since the enumerators are there to read out and explain the questions to them and to record properly the answers given by them.
6. It is possible to rectify the erratic replies given by some cunning informants through cross-questions put by the enumerators on the spot.
7. This method is very useful in extensive enquiries.

Demerits:
1. This method is extremely expensive, as it requires an army of trained enumerators to be sent along with the schedules. Hence, this method can be afforded only by big organizations.
2. This method is highly time-consuming, as it requires the enumerators to approach each of the informants at their doorsteps and explain everything to them in clear terms.
3. The accuracy of the data may be affected by the personal prejudices of the enumerators, who can very well twist the questions and the answers.
4. There is a likelihood of the schedules being drafted haphazardly and incompletely, in which case the enumerators may find it very difficult to obtain the desired information correctly.
5. Due to the inherent variation in the personalities of the enumerators, there is bound to be variation in the information recorded by the different enumerators.

Methods of Collecting Secondary Data:


The various sources of secondary data can be divided into two broad categories:
A. Published sources.
B. Unpublished sources.

A. Published sources:
1. Government publications. There are a number of Government publications which collect and publish data at different times, like:
   1. Indian Trade Journal (Weekly).
   2. Reserve Bank of India Bulletin (Monthly).
   3. Statistical Abstract of India (Annual).
   4. Estimates of National Income (Annual).
   5. Census Reports.
2. International publications. There are a number of international organizations which publish data at different times, like:
   1. International Labour Organisation (I.L.O.)
   2. World Health Organisation (W.H.O.)
   3. United Nations Organisation (U.N.O.)
   4. International Monetary Fund (I.M.F.)
3. Reports of Commissions and Committees. There are a number of reports of commissions and committees which provide complete and reliable data, like:
   1. Reports of Finance Commission.
   2. Reports of Bhabha Committee.
   3. Reports of Hazari Committee.
   4. Reports of Dutt Committee.
   5. Reports of Tariff Commission.
   6. Reports of Gorwala Committee.
4. Publications of Research Institutes. There are various research institutes which publish results of research works at different times, like:
   1. University Research Bureaus.
   2. Central Statistical Bureaus.
   3. Statistical Research Bureaus.
   4. Institute of Economic Growth.
   5. National Council of Applied Economic Research.
5. Magazines. There are a number of magazines which collect and publish data relating to different subjects, like:
   1. Commerce.
   2. Economic Journals.
   3. Economic Times.
   4. Financial Times.
6. Reports of Trade Associations. There are a large number of trade associations which publish data at different times, like:
   1. Trade Unions.
   2. Trade Associations.
   3. Produce and Stock Exchanges.
   4. Federation of Indian Chambers of Commerce and Industry.
7. Publications of Personal Investigation. The persons who do research work first collect and then publish the data.

B. Unpublished sources:
The unpublished sources include the following:
1. Office files and documents.
2. Research dissertations of research scholars.
3. Project reports.
4. Notes of experts, etc.

Editing of data:
Meaning: Editing of data refers to scrutiny or careful checking of the data collected from various sources. Before the data are used for analysis, it is highly necessary that they are properly edited by experts; otherwise, they may lead to wrong conclusions and decisions. Therefore, both primary and secondary data should be properly edited before they are put to any use by the investigators.

1. Primary data: Primary data, which are collected for the first time from their sources of origin, are in the form of raw material and are prone to many defects, viz. inaccuracy, incompleteness, inconsistency and heterogeneity. Thus, they should be carefully edited and scrutinized in order to ensure the following:
a. Accuracy.
b. Consistency.
c. Adequacy.
d. Completeness.
e. Homogeneity.

2. Secondary data: Secondary data, whether published or otherwise, should be used with much caution, because they were originally collected by others for their own purposes, at different times and under different situations, which may not suit the present investigation in all respects. There might be many errors of omission, commission, compensation and duplication in such data. Therefore, before using such data, they must be very carefully edited and checked to ensure that they are free from inaccuracy, inconsistency, inadequacy and unsuitability.

Precautions in the use of Secondary Data:
1. Suitability of data.
2. Adequacy of data.
3. Reliability of data.
4. Flexibility of data.
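Two of the defects named above, incompleteness (omission) and duplication, lend themselves to a mechanical first pass before an expert reviews the data. The following is a minimal sketch under stated assumptions: the record layout, the field names and the function edit_records are hypothetical, and real editing also involves judgement about accuracy, consistency and homogeneity that no such check replaces.

```python
def edit_records(records, required_fields):
    """Split records into clean ones and ones flagged for review.

    A record is flagged as incomplete when a required field is missing
    or empty, and as a duplicate when an identical record was already seen.
    """
    seen = set()
    clean, flagged = [], []
    for record in records:
        # A hashable fingerprint of the whole record, used to spot duplicates.
        key = tuple(sorted(record.items()))
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            flagged.append((record, "incomplete: " + ", ".join(missing)))
        elif key in seen:
            flagged.append((record, "duplicate"))
        else:
            seen.add(key)
            clean.append(record)
    return clean, flagged


# Hypothetical records; the figures are illustrative only.
records = [
    {"id": 1, "state": "Punjab", "output": 40000},
    {"id": 2, "state": "Haryana", "output": None},
    {"id": 1, "state": "Punjab", "output": 40000},
]
clean, flagged = edit_records(records, ["id", "state", "output"])
for record, reason in flagged:
    print(record["id"], reason)
```

Flagged records are set aside rather than deleted, so that the investigator can decide whether to correct, re-collect or discard them.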

Classification and Tabulation:


Meaning of classification: Classification is the process of grouping heterogeneous data into homogeneous sub-groups based on certain characteristics.

According to L.R. Connor, classification is defined as the process of arranging things (either actually or notionally) in groups or classes according to their resemblances and affinities, and giving expression to the unity of attributes that may subsist amongst a diversity of individuals.

According to Secrist, classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts.

Thus, classification is the process of arranging the available data into various homogeneous classes and sub-classes according to some common characteristic, attribute or objective of investigation.

The chief characteristics of classification are:
1. All the facts are classified into homogeneous groups by the process of classification.
2. The basis of classification is unity in diversity.
3. Classification may be either real or imaginary.
4. The classification may be according to either similarities or dissimilarities.

Principal objectives of classification


The principal objectives of classification are:

1. To reduce the bulk of the collected data by dividing them into a number of classes on a certain basis.
2. To unravel the significance of the information that remains hidden in the data.
3. To make the data easily understandable and fit for further analysis and interpretation.
4. To facilitate comparison between related variables.
5. To provide a basis for the presentation of data through tabulation, diagrams or graphs.
6. To present the facts in a simple form.
7. To provide a ground for studying the relationship between variables.
8. To highlight significant data and eliminate unnecessary details in the data.

Kinds of Classification
The four kinds of classification are given below:
1. Qualitative Classification.
2. Quantitative Classification.
3. Chronological or Temporal Classification.
4. Geographical or Spatial Classification.

1. Qualitative Classification
It refers to the classification of data on the basis of a descriptive character or qualitative aspect of a phenomenon, such as gender, literacy, honesty, religion, health, marriage, intelligence, etc. This sort of classification is also known as descriptive classification. Such classifications are usually dichotomous in nature, in which the whole data are divided into two groups, viz. a group with the presence of the attribute and a group with the absence of the attribute, such as educated and uneducated. However, in certain cases, such classification can also be made in a manifold manner, in which data are grouped under more than two classes. For instance, in the field of education, the classification can be made into groups, viz. primary, secondary, higher secondary and higher education.

Example of qualitative classification, showing the population of a city according to gender, literacy and employment:

              Males                                  Females
              Literates   Illiterates   Total        Literates   Illiterates   Total
Employed      1,00,000    2,00,000      3,00,000     50,000      1,00,000      1,50,000
Unemployed    20,000      1,00,000      1,20,000     10,000      50,000        60,000
Total         1,20,000    3,00,000      4,20,000     60,000      1,50,000      2,10,000
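The qualitative classification in the example above amounts to counting individuals by attribute. The following is a minimal sketch of that counting, assuming each person is recorded as a (gender, literacy, employment) tuple; the five records are hypothetical and far smaller than the city in the example.

```python
from collections import Counter

# Hypothetical survey records: (gender, literacy, employment status).
records = [
    ("Male", "Literate", "Employed"),
    ("Male", "Illiterate", "Unemployed"),
    ("Female", "Literate", "Employed"),
    ("Female", "Literate", "Unemployed"),
    ("Male", "Literate", "Employed"),
]

# Dichotomous classification: one attribute, present or absent.
by_literacy = Counter(literacy for _, literacy, _ in records)

# Cross-classification by all three attributes taken together,
# which yields more than two classes.
by_all_attributes = Counter(records)

print(by_literacy)
print(by_all_attributes[("Male", "Literate", "Employed")])
```

Each cell of a table such as the one in the example corresponds to one key of the cross-classified counter, and the row and column totals are sums over those keys.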

2. Quantitative Classification:
Quantitative classification refers to the classification of data on the basis of characteristics which are capable of quantitative expression and measurement, such as height, weight, distance, marks obtained by the students of a class, income, expenditure of persons in a locality, age, etc. Under this classification, data are classified by assigning arbitrary limits called class-limits, for example, weight in kilograms or height in inches. This type of classification is also known as classification by variables. Height, weight, marks, etc. represent variables, and the number of students, persons etc. indicate frequencies.

Example of quantitative classification of discrete and continuous variables:
Discrete variables:

Marks    No. of children per family    No. of children per family
20       0                             1
25       1                             2
35       2                             3
45       3                             4
50       4                             5

Continuous variables:

Marks in percentage    Wages (in rupees)    Height (in inches)
0-10                   50-100               70-75
10-20                  100-150              75-80
20-30                  150-200              85-90
30-40                  200-250              90-95
40-50                  250-300              95-100
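Classifying a continuous variable by class-limits, as in the example above, produces a frequency for each class. The following is a minimal sketch of that tally. It assumes that each class includes its lower limit and excludes its upper limit, so that a value of 10 is counted in 10-20 and not in 0-10; the text does not state this convention, and the marks used are hypothetical.

```python
def frequency_distribution(values, lower, width, n_classes):
    """Count how many values fall in each equal-width class.

    Assumption: a class covers lower limit <= value < upper limit.
    Values outside the overall range are ignored.
    """
    counts = [0] * n_classes
    for value in values:
        index = int((value - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return {
        f"{lower + i * width}-{lower + (i + 1) * width}": counts[i]
        for i in range(n_classes)
    }


# Hypothetical marks in percentage for ten students.
marks = [5, 12, 18, 10, 27, 33, 41, 49, 20, 38]
table = frequency_distribution(marks, lower=0, width=10, n_classes=5)
for class_limits, frequency in table.items():
    print(class_limits, frequency)
```

Here the marks are the variable and the counts are the frequencies, in the sense used in the text.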


3. Chronological or Temporal Classification:


Chronological or temporal classification is the classification of collected data on the basis of the time of their occurrence, for example, population, production, sales, results etc. The series obtained under this classification is known as a time series.

Example of chronological classification, showing the sales of a company over a period of five years:

Year    Sales (in lakhs)
2006    50
2007    75
2008    80
2009    90
2010    95

4. Geographical or Spatial Classification:


Geographical or spatial classification refers to the classification of data on the basis of geographical area or place. The areas may be in terms of countries, states, districts or zones. For example, the state-wise production of sugarcane in India, the production of paddy in different states, railway zones in India, etc.

Example of geographical or spatial classification, showing the production of wheat in important states:

Name of State     Wheat Production (in thousand tonnes)
Haryana           50000
Punjab            40000
Uttar Pradesh     30000
Madhya Pradesh    25000

Advantages of Classification
Classification of data helps in organizing raw data into smaller groups. It has the following advantages:
1. It condenses the data and ignores unnecessary details.
2. It facilitates comparison of data.
3. It helps in studying the relationship between several characteristics.
4. It facilitates further statistical treatment.

Primary rules to be followed while classifying


Primary rules that should be observed while classifying data are as follows:
1. Unambiguously defined: The classes should be unambiguously defined. There should not be any chance for doubt or confusion.
2. Exhaustive: The classes should be exhaustive. In other words, every observation must get classified into a class.
3. It should be stable: An ideal classification should be stable. If a classification is not stable and has to be changed every time an enquiry is conducted, the data would not be fit for comparison.
4. It should be flexible: A good classification should be flexible and should have the capacity to adjust to new situations and circumstances.
5. It should have suitability: The classification should conform to the object of enquiry.
6. It should be homogeneous: The items included in each class must be homogeneous; otherwise there may be further classification into sub-groups.
7. It should be a revealing classification: A classification is said to be revealing if it brings out the essential features of the collected data. This is done by selecting a suitable number of classes.


8. Mutually exclusive: Each item of the given data should fit in only one class. In other words, classes must not overlap.
9. Equal width: As far as possible, the classes should be of equal width.
10. Neither too large nor too small: The number of classes should neither be too large nor too small.
11. Width of class interval: The width is decided by first fixing the number of class intervals and then dividing the total range into that many classes.

Presentation of Data:
Data can be presented in the following three modes: 1. Textual presentation. 2. Tabular presentation or tabulation. 3. Diagrammatic presentation.

1. Textual presentation:
Textual presentation involves presenting data with the help of a paragraph or a number of paragraphs in a systematic and logical manner.

2. Tabular Presentation or Tabulation:


Definition:
Tabulation is defined as the logical and systematic arrangement of statistical data in appropriate rows and columns. It is designed to simplify the presentation of data for the purpose of analysis and statistical inference.

Tabulation means systematic representation of the collected information in rows and columns according to certain characteristics. Classification is the first step in tabulation. A statistical table is a systematic organization of data in columns and rows. Tabulation is the process of presenting data in tables.

Objects: A good statistical table is not a mere careless grouping of columns and rows of figures; it is a triumph of ingenuity and technique, a masterpiece of economy of space combined with a maximum of clearly presented information. To prepare a first-class table, one must have a clear idea of the facts to be presented, the contrasts to be stressed and the points upon which emphasis is to be placed, and lastly a familiarity with the technique of preparation.

Characteristics of tabulation:
1. It is a systematic arrangement of quantitative data.
2. The data are related to one another on some logical basis.
3. The data are arranged in appropriate columns and rows.
4. It has explanatory notes and qualifying words to make clear the meaning of the data.
5. It is done after classification.

The main objectives of tabulation are:
1. To clarify the objective of investigation.
2. To simplify the complexity of data.
3. To present facts in the minimum of space.
4. To facilitate comparison of data.
5. To detect errors and omissions in the data.
6. To indicate the trend, tendencies and pattern of the data.
7. To facilitate statistical processing.
8. To help reference.

Advantages of tabulation:
1. It expresses the information in a concise, attractive and easy to read and understand form.
2. It facilitates detection of errors in the data.
3. It facilitates comparison of two or more sets of data gathered on the same characteristics.
4. The row and column headings eliminate the necessity of repeating explanatory details.

Difference between Classification and Tabulation of data:
1. Classification refers to grouping of data on an appropriate basis. Tabulation refers to arrangement of data in appropriate columns and rows.
2. Classification is a conceptual function. Tabulation is a mechanical function of classification.
3. Classification precedes tabulation. Tabulation succeeds classification.
4. Classification is the first step in the process of statistical analysis. Tabulation is the process of presenting data in a suitable structure.
5. Classification is possible without tabulation of data. Tabulation is not possible without classification of data.

Parts of a statistical table: The statistical table has five parts. They are as follows:
1. Title: This is a brief description of the contents and is shown at the top of the table. It includes the table number, the title of the table and the head note.
2. Stub or row heading: The extreme left part of the table, where descriptions of rows are shown, is called the stub. In the stub a characteristic of the variable is presented.
3. Caption and box head: The upper part of the table, which shows the description of columns and sub-columns, is called the caption. The whole upper part, including caption, units of measurement and column numbers, if any, is called the box-head. In the caption usually frequencies are depicted.
4. Body of the table: It is that part of the table which shows the values related to the variable under study. It lies below the caption and box-head and to the right of the stub. It is the most important part of the statistical table and is the gist of the data of the problem under consideration.
5. Foot note: The foot note is the part below the body of the table, where the source of data and any important explanations are shown.

Points to be kept in mind while preparing a good statistical table: A good statistical table should present the data so as to highlight the important details and keep additional or incidental information out of the main body of the table. Therefore, while constructing a statistical table, the following points should be kept in mind:
1. Table number: Each table should be numbered for identification when there are a large number of tables in a study.


2. Title: A clear and brief title should be given to the table to explain what it contains, and the title should be carefully worded so that it is capable of only one interpretation. The title should be set in bold type so as to give it prominence.
3. Date: The date of preparation of the table should be mentioned.
4. Stubs or row designations: Each row must be given a heading to explain what the values in the rows represent.
5. Captions or column headings: Each column must be given a heading to explain what the values in the columns represent.
6. Body of the table: The actual data should be so arranged that any value can be readily located.
7. The unit of measurement: The unit of measurement should always be stated along with the title, if the unit is uniform throughout. If different units have been adopted, they may be given along with the stubs or captions, whichever is appropriate.
8. Source: The source of information should be mentioned.

General Rules
1. The table should be simple and compact. It should not be overloaded with details.
2. The captions and stubs in the table should be arranged in a systematic manner.
3. It should suit the purpose of the investigation.
4. The unit of measurement should be clearly defined and given in the table; for example, height in metres, weight in kilograms, etc.
5. Figures may be rounded off to avoid unnecessary details in the table, but a footnote must be given to this effect.
6. A miscellaneous column should be added to include unimportant items.
7. A table should be complete and self-explanatory.
8. A table should be attractive to draw the attention of readers.
9. Abbreviations should be avoided.
10. As it forms a basis for statistical analysis, it should be accurate and free from all sorts of errors.
11. Ditto marks, which may be mistaken, should not be used.
12. Proper lettering will help to adjust the size of the table.
13. A big table loses its simplicity and understandability; in such a case, break it into two or three tables.

Types of tables: Statistical tables are classified into various categories depending upon the basis of their classification. Their choice basically depends upon the following:
1) Extent of coverage given in the enquiry.
2) Objective and scope of enquiry.
3) Nature of enquiry.

1) On the basis of the extent of coverage given in the enquiry:
1. Simple table.
2. Complex table.
   a. Two-way table.
   b. Three-way table.
   c. Manifold or higher order table.

1. Simple table:

In a simple table, the data are classified according to one characteristic only, and accordingly it is also termed a one-way table.
2. Complex table: A complex table is used to present data according to two or more characteristics or criteria simultaneously. It is also called a manifold table. In particular, if the data are classified according to two or three characteristics simultaneously, we get a two-way or a three-way table.
a. Two-way table: A two-way table furnishes information about two inter-related attributes or characteristics of a particular phenomenon. In such a table, the columns of the table are further divided into sub-columns.
b. Three-way table: If the data are classified simultaneously according to three variables, we get a three-way table. The three-way table provides information regarding three inter-related characteristics or attributes of a particular phenomenon.
c. Manifold or higher order table: Manifold tables provide information on a large number of inter-related characteristics of a given phenomenon.
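The difference between a simple (one-way) table and a two-way table can be sketched in a few lines of Python. The records below are hypothetical and are not taken from the text; the names `records`, `one_way` and `two_way` are illustrative only.

```python
from collections import Counter

# Hypothetical records (gender, result); assumed data for illustration only.
records = [
    ("Male", "Pass"), ("Female", "Pass"), ("Male", "Fail"),
    ("Female", "Pass"), ("Male", "Pass"), ("Female", "Fail"),
]

# Simple (one-way) table: data classified by one characteristic only.
one_way = Counter(gender for gender, _ in records)

# Two-way table: data classified by two characteristics simultaneously.
two_way = Counter(records)

rows = sorted({gender for gender, _ in records})   # stub entries
cols = sorted({result for _, result in records})   # caption entries

# Print the two-way table with row totals taken from the one-way table.
print("Gender   " + "".join(f"{c:>6}" for c in cols) + "  Total")
for g in rows:
    cells = "".join(f"{two_way[(g, c)]:>6}" for c in cols)
    print(f"{g:<9}{cells}{one_way[g]:>7}")
```

The row labels play the role of the stub and the column labels that of the caption; a three-way table would simply count on tuples of three characteristics.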

2) On the basis of objective and scope of enquiry:


1. General purpose table. 2. Special purpose table.
1. General purpose table or reference table: General purpose tables provide a convenient way of compiling and presenting systematically arranged data, usually in chronological order, in a form which is suitable for ready reference and record, without any intention of comparative studies or of studying the relationship or significance of values.
2. Special purpose table or summary table: Special purpose tables present information relating to a specific subject under study. They are analytical in nature and are prepared with the idea of making comparative studies and studying the relationship and the significance of the figures provided by the data. In such tables, interpretive values like ratios, percentages, etc. are used in order to facilitate comparisons.

3) On the basis of nature of enquiry:


1. Primary or original table. 2. Secondary or derived table.
1. Primary or original table: Primary or original tables contain data which were initially collected from the original source.
2. Secondary or derived table: A table which presents results derived from the original data, like averages, standard deviations, coefficients, etc., constitutes a derivative or derived table. It expresses the information in terms of ratios, percentages, or statistical measures like averages, dispersion, skewness or coefficients. For example, a time series table containing original values is an original table, but a table containing trend values constitutes a derived table.

Presentation of data by frequency distribution:


The raw data can be arranged in any one of the following methods.
1. Serial order or alphabetical order.
2. Ascending order or descending order.
3. Tables or charts.
4. Groups or class intervals.

Meaning of variable
Any character which can vary from one individual to another is called a variable or variate. For example, age, income, height, intelligence, colour, etc. are variates. Some variables are measurable and others are not directly measurable. Variables or observations with numbers as possible values are called quantitative variables, whereas those with names of places, attributes, things, etc. as possible values are called qualitative variables.

Variables are of two types


1. Discrete variables or discontinuous variables. 2. Continuous variables.
1. Discrete variables or discontinuous variables: A discrete variable can assume only integral values and can never assume any fractional value. These variables are characterized by discontinuity, that is, jumps and gaps between successive values, e.g. children per family, population per village, etc. In the case of such variables, the data are collected through counting rather than measurement. These are the variables which can take only isolated, countable values.
2. Continuous variables: A continuous variable is one which can assume any value, integral or fractional, within a specified range of numbers, e.g. height, weight, age, etc. In the case of such variables, the data are obtained through measurement rather than counting.
Frequency: The number of times an observation occurs in the given data is called the frequency of the observation.

Frequency Distribution:
A frequency distribution is the arrangement of the given data in the form of a table showing the frequency with which each value of the variable occurs. The frequency distribution of a variable x is the ordered set (x, f), where f is the frequency. It shows all scores in a set of data together with the frequency of each score. It is simply a table in which the data are grouped into classes and the number of cases that fall in each class is recorded. A frequency distribution thus consists of a listing of several measurement categories, or classes, with an indication of the number of observed measurements, or frequency, associated with each class.

Basically, frequency distribution can be of two kinds:


(1) Univariate frequency Distribution (2) Bivariate Frequency Distribution (Two-Way Frequency Distribution) Univariate Frequency Distribution is of two types


1. Discrete Frequency Distribution. 2. Continuous Frequency Distribution.
1. Discrete Frequency Distribution: A discrete or simple frequency distribution is a distribution in which the values of the variable are shown individually with their respective frequencies. It is usually used to present discrete variables, and also continuous variables when the range is small.
2. Continuous Frequency Distribution: A continuous frequency distribution is a distribution in which the values of the variable are shown in groups or intervals with their respective frequencies. It can be used only for continuous variables.

Practical steps involved in forming the frequency distribution.


1. Discrete frequency distribution:
Step 1: Find the highest and lowest values given in the data.
Step 2: Make two columns, one for the variable (x) and the other for tally bars.
Step 3: Arrange the values of the variable in ascending order in the variable column.
Step 4: Record the data as tally bars.
Step 5: Find the frequency by counting the tally bars.
2. Continuous frequency distribution:
Step 1: Find the range.
Step 2: Fix the number of classes or class intervals (the number should be neither too small nor too big). Sturges' rule can be used to find the number of classes: K = 1 + 3.322 log n, where n is the number of observations and the logarithm is to base 10.
Step 3: Find the class interval or class width for the frequency distribution:
Class interval = (largest value - smallest value) / (number of class intervals)
Step 4: Make two columns, one for the variable (x) and the other for tally bars.
Step 5: Arrange the values of the variable in the form of class groups, starting from the lowest value and taking the class interval into account, in ascending order in the variable column.
Step 6: Record the data as tally bars.
Step 7: Find the frequency by counting the tally bars.
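The steps for a continuous frequency distribution can be sketched in Python as follows. This is a minimal illustration rather than a prescribed procedure: the function names are hypothetical, the sample marks are used only as illustrative input, integer data are assumed, the class width is rounded up to a whole number, and classes are formed on the exclusive basis (upper limit not included).

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, K = 1 + 3.322 log10(n), rounded."""
    return round(1 + 3.322 * math.log10(n))

def continuous_distribution(values, k=None):
    """Group integer values into equal-width exclusive classes and count them."""
    lowest, highest = min(values), max(values)            # Step 1: range
    if k is None:
        k = sturges_classes(len(values))                  # Step 2: number of classes
    width = max(1, math.ceil((highest - lowest) / k))     # Step 3: class width
    counts = {}
    lower = lowest
    while lower <= highest:                               # Steps 4-5: class groups
        counts[(lower, lower + width)] = 0
        lower += width
    for v in values:                                      # Steps 6-7: tally and count
        index = (v - lowest) // width
        counts[(lowest + index * width, lowest + (index + 1) * width)] += 1
    return counts

# Illustrative sample of 30 marks (assumed input for this sketch).
marks = [41, 25, 5, 33, 12, 21, 19, 39, 19, 21, 12, 1, 19, 12, 19,
         17, 12, 17, 17, 41, 41, 19, 41, 33, 12, 21, 33, 5, 1, 21]

distribution = continuous_distribution(marks)
for (lower, upper), frequency in distribution.items():
    print(f"{lower:>3}-{upper:<3} {frequency}")
```

Rounding the width up guarantees that the classes cover the whole range; in practice the lowest limit and the width are often adjusted to convenient round numbers such as 0 and 10.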

Seriation of data:
Meaning: Seriation means successive arrangement of the classified data on a certain logical basis, viz. time, size, importance, etc. This is done with a view to facilitating the analysis and interpretation of the data from different points of view.
Characteristics:
1. It is an arrangement of the different values of a variable in a successive manner.
2. There must be a certain logical basis on which the data are arranged successively.
3. It is done after the classification of the data.
4. It shows the related frequencies along with the data.
Types of Series and Frequency Distribution: Statistical series are of two types:

1. Time series. 2. Frequency series.
1. Time series: A series of data that is arranged chronologically or in relation to time is called a time series.
Example of time series:
Year    No. of students
2006    270
2007    320
2008    450
2009    500
2010    550
2. Frequency series: A series of data that is formed along with the frequencies of their occurrence is called a frequency series. A frequency series can be of three types: 1. Individual series, 2. Discrete series and 3. Continuous series.
1. Individual series: An individual series is one in which each observation is listed separately. In other words, every item in such a series is recorded with a frequency of one, and so such series are displayed without a frequency column. An individual series may be unorganized or organized.
Example of individual series: marks obtained by students in an examination.
Marks: 41, 25, 5, 33, 12, 21, 19, 39, 19, 21, 12, 1, 19, 12, 19, 17, 12, 17, 17, 41, 41, 19, 41, 33, 12, 21, 33, 5, 1, 21.
2. Discrete series: A discrete series is one in which the different values of a variable are shown in a discontinuous manner along with their respective frequencies, and at least one of the values has a frequency of more than one. Such a series can be organized or unorganized.
Example of discrete series for unorganized/unarrayed and organized/arrayed data:
Unarrayed                     Arrayed
Marks   No. of students       Marks   No. of students
20      05                    10      03
30      13                    20      05
10      03                    30      13
50      18                    40      25
60      06                    50      18
40      25                    60      06
3. Continuous series: A continuous series is one in which the different values of the variable are stated in a continuous manner along with the respective frequencies. Such series can be arranged either in ascending or in descending order. In addition, such series can be stated in the form of exclusive, inclusive, mid-value, open-ended or cumulative series.
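As a small illustration, the individual series of marks given above can be converted into an arrayed discrete series by counting how often each value occurs. The sketch below uses Python's standard `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Individual series: each observation listed separately (marks from the example).
marks = [41, 25, 5, 33, 12, 21, 19, 39, 19, 21, 12, 1, 19, 12, 19,
         17, 12, 17, 17, 41, 41, 19, 41, 33, 12, 21, 33, 5, 1, 21]

# Discrete series: each distinct value with its frequency, arrayed in ascending order.
discrete_series = sorted(Counter(marks).items())

print("Marks  No. of students")
for mark, frequency in discrete_series:
    print(f"{mark:>5}  {frequency:>15}")
```

Since several marks occur more than once, the result satisfies the requirement that at least one value has a frequency greater than one.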


There are five methods of presenting the continuous series.
1. Exclusive method (overlapping)
2. Inclusive method (non-overlapping)
3. Mid-value method.
4. Open-ended method.
5. Cumulative method.

1. Exclusive method: In this method of classification, it is assumed that the upper boundary of a class is excluded from that class, while all other values beginning with the lower boundary are included in it. Thus, in the class 0 to 10, the value 10 is excluded, and all the remaining values beginning with 0 are included in the class.
Example of exclusive series:
Marks (x)              0-10   10-20   20-30   30-40   40-50
No. of Students (f)    6      12      20      10      2

2. Inclusive method: In this method of classification, it is assumed that both limits of a class are included in the same class. Thus, in the class 0 to 9, both 0 and 9 are included. This form of classification appears to be discontinuous, leaving gaps between the upper boundary of one class and the lower boundary of the next, viz. between 9 and 10, 19 and 20, but in reality it is not so, because it is assumed that no value lies within such gaps. As such, this type of classification is suitable for discrete variables and is not suitable for continuous variables.
Example of inclusive series:
Marks (x)              0-9    10-19   20-29   30-39   40-49
No. of Students (f)    6      12      20      10      2

Distinction between exclusive classification and inclusive classification:
1. Exclusive: Class groups will be like 0-10, 10-20, 20-30, etc. Inclusive: Class groups will be like 0-9, 10-19, 20-29, etc.
2. Exclusive: An item exactly equal to 10 is put in the group 10-20, not in 0-10. Inclusive: There is no ambiguity as to which class an item belongs.
3. Exclusive: Class limits are continuous. Inclusive: The idea of continuity is lost.
4. Exclusive: Advantageous for many further mathematical computations. Inclusive: Stated class limits are also real class limits.
5. Exclusive: Suitable for continuous variable data. Inclusive: Suitable for discrete variable data.

3. Mid-value method: In this method of classification, the mid-values of the classes are taken as the representative values of each class. This method is not suitable for the analysis of certain types of data.
Example of mid-value series:
Marks (x) (mid-values)    5     15    25    35    45
No. of Students (f)       6     12    20    10    2
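The exclusive rule (a value equal to an upper boundary goes to the next class) and the mid-values of the classes can be sketched as follows. The function name `exclusive_class` is hypothetical, and the boundaries are those of the marks example.

```python
from bisect import bisect_right

# Class boundaries of the exclusive series 0-10, 10-20, ..., 40-50.
boundaries = [0, 10, 20, 30, 40, 50]

def exclusive_class(value):
    """Return the (lower, upper) class containing value, upper boundary excluded."""
    if not boundaries[0] <= value < boundaries[-1]:
        raise ValueError(f"{value} lies outside the classes")
    i = bisect_right(boundaries, value)   # first boundary strictly greater than value
    return boundaries[i - 1], boundaries[i]

# Mid-value of each class: (lower boundary + upper boundary) / 2.
mid_values = [(lower + upper) / 2 for lower, upper in zip(boundaries, boundaries[1:])]

print(exclusive_class(10))   # an item exactly 10 falls in 10-20, not 0-10
print(mid_values)
```

Using `bisect_right` rather than `bisect_left` is what places a value lying exactly on a boundary into the higher class.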

4. Open-ended method: In this method of classification, it is assumed that a class boundary is missing either at the lower end of the lowest class, or at the upper end of the uppermost class, or at both. This form of classification is suitable in practical situations like mark distributions, economic conditions, etc., where a few very high values or a few very low values remain far away from the majority of the data.
Example of open-ended series:
Marks (x)       No. of Students (f)
Below 10        6
10-20           12
20-30           20
30-40           10
40 & above      2

5. Cumulative frequency distribution: A cumulative frequency distribution is a statistical table which shows the values of the variable and the corresponding cumulative frequencies. It can be derived from a grouped frequency distribution by writing down the consecutive class boundary points and noting the number of observations less than or more than each class boundary point. There are two types of cumulative frequency distributions: 1. Less than cumulative frequency distribution. 2. More than cumulative frequency distribution.
Uses or importance:
1. A cumulative frequency distribution is used to find the number of observations less than or more than any given value.
2. It is used to find the number of observations falling between any two specified values of the variable.
3. It is helpful in finding the median, quartiles, deciles and percentiles.
Example of less than cumulative series:
Marks (x)        No. of Students (f)
Less than 10     6
Less than 20     18
Less than 30     38
Less than 40     48
Less than 50     50
Example of more than cumulative series:
Marks (x)        No. of Students (f)
More than 0      50
More than 10     44
More than 20     32
More than 30     12
More than 40     2
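Both cumulative series can be derived from the simple frequencies of the exclusive marks series (6, 12, 20, 10, 2). The sketch below is one possible way of doing so with the standard library; the helper `count_between` is a hypothetical name and works only for values that are class boundaries.

```python
from itertools import accumulate

boundaries = [0, 10, 20, 30, 40, 50]     # class boundary points
frequencies = [6, 12, 20, 10, 2]         # simple frequencies of 0-10, ..., 40-50

# Less-than series: running total up to each upper boundary (10, 20, ..., 50).
less_than = list(accumulate(frequencies))

# More-than series: total minus what lies below each lower boundary (0, 10, ..., 40).
total = less_than[-1]
more_than = [total - below for below in [0] + less_than[:-1]]

def count_between(lower, upper):
    """Number of observations from boundary `lower` up to boundary `upper`."""
    below = dict(zip(boundaries, [0] + less_than))
    return below[upper] - below[lower]

print(less_than)               # [6, 18, 38, 48, 50]
print(more_than)               # [50, 44, 32, 12, 2]
print(count_between(10, 40))   # 42
```

The last call illustrates the second use listed above: the number of observations between two specified values is the difference of two less-than cumulative frequencies.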

Different components of continuous classification: A continuous form of classification has different components which need to be understood and determined carefully. These are as follows:
1. Class interval: A class interval means the gap or range between the two extreme values of a class in its exclusive form. In other words, it is the difference between the upper boundary and the lower boundary of the class. It is also known as class magnitude.
2. Class boundaries: Class boundaries or class walls mean the two extreme values of a class which the data belonging to that class cannot cross or exceed either way. The lower of the two is called the lower boundary and the higher of the two is called the upper boundary of the class. In the case of exclusive classification, all the values of a class remain below the upper boundary, but in the case of inclusive classification all the values of a class remain within the two boundaries, up to and including the upper one.


3. Class limits: Class limits mean the two extreme values of a class within which all the values of the class remain. The lower value of the class is called the lower limit, and the upper value, below which all the data remain, is called the upper limit of the class. In the case of exclusive classification, the class limits are equal to the class boundaries. In the case of inclusive classification, the class limits are not equal to the class boundaries. Classes with class boundaries are easier for analysis.
4. Mid-value or class mark: The value that lies at the centre of a class is called the mid-value, mid-point or class mark of that class. It is calculated by adding the upper limit/boundary and the lower limit/boundary and then dividing the sum by two. Mid-values are the representatives of the respective classes of the variable.
5. Types of frequencies used in frequency distributions: Frequency is the number of times a value of the variable or a class occurs in the data. The following are the four types of frequencies used in frequency distributions: 1. Simple frequencies. 2. Frequency density. 3. Relative frequencies. 4. Cumulative frequencies.
1. Simple frequencies: The number of times an observation occurs for a particular value in the given data is called the simple frequency of the observation.
Variable            10-20   20-30   30-40   40-50   50-60
Simple frequency    2       5       20      14      3
2. Frequency density: The frequency density of a class is its frequency per unit width. It shows the concentration of the frequency in a class. Frequency density is the simple frequency of a class divided by the class interval of that class. Frequency densities are used in drawing a histogram when the class intervals are not equal, and they help in comparing the classes of the frequency distribution. Symbolically, frequency density is calculated as follows:
Frequency density = simple frequency of the class / class interval of the class

Variable    Simple frequency    Frequency density
60-80       10                  0.5
80-90       15                  1.5
90-95       10                  2.0
95-100      16                  3.2
100-105     12                  2.4
105-110     5                   1.0
110-115     15                  3.0
115-135     10                  0.5
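The frequency densities in the table can be reproduced by dividing each simple frequency by the width of its class. A minimal Python sketch, with illustrative variable names:

```python
# Classes (lower boundary, upper boundary) and simple frequencies from the table.
classes = [(60, 80), (80, 90), (90, 95), (95, 100),
           (100, 105), (105, 110), (110, 115), (115, 135)]
frequencies = [10, 15, 10, 16, 12, 5, 15, 10]

# Frequency density = simple frequency / class interval (upper - lower).
densities = [f / (upper - lower) for (lower, upper), f in zip(classes, frequencies)]

for (lower, upper), f, d in zip(classes, frequencies, densities):
    print(f"{lower}-{upper:<5} {f:>3} {d:>5.1f}")
```

The classes 60-80 and 115-135 have the same frequency as 90-95 but a density of only 0.5 against 2.0, which is why densities, not raw frequencies, give the bar heights of a histogram with unequal class intervals.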

3. Relative frequencies: Relative frequency is the class frequency expressed as a ratio of the total frequency. The sum of all the relative frequencies is equal to unity. When the relative frequencies are shown against the corresponding classes, the distribution is known as a relative frequency distribution. Symbolically, relative frequency is calculated as follows:
Relative frequency = class frequency / total frequency


Variable    Simple frequency    Relative frequency
10-20       3                   0.03
20-30       8                   0.08
30-40       10                  0.10
40-50       15                  0.15
50-60       25                  0.25
60-70       20                  0.20
70-80       14                  0.14
80-90       5                   0.05
Total       100                 1.00
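The relative frequencies in the table follow directly from dividing each class frequency by the total frequency, as this short Python sketch shows (variable names are illustrative):

```python
import math

classes = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70", "70-80", "80-90"]
frequencies = [3, 8, 10, 15, 25, 20, 14, 5]

total = sum(frequencies)                               # total frequency
relative = [f / total for f in frequencies]            # class frequency / total

for label, f, r in zip(classes, frequencies, relative):
    print(f"{label:<8} {f:>3} {r:>6.2f}")
print(f"{'Total':<8} {total:>3} {sum(relative):>6.2f}")

# The relative frequencies add up to unity (allowing for floating-point rounding).
assert math.isclose(sum(relative), 1.0)
```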

4. Cumulative frequencies: The cumulative frequency corresponding to a class is the sum of the frequencies up to and including that class. Cumulative frequencies can be less than cumulative frequencies or more than cumulative frequencies. They are used in drawing cumulative frequency curves, or ogives.
Example of less than cumulative frequency:
Variable         Less than cumulative frequency
Less than 100    6
Less than 200    18
Less than 300    38
Less than 400    48
Less than 500    50

Example of more than cumulative frequency:
Variable         More than cumulative frequency
More than 0      50
More than 100    44
More than 200    32
More than 300    12
More than 400    2

Diagrammatic Representation of Data: Meaning & Definition:


A diagram is a visual form for the presentation of data. The term diagram refers to various types of devices such as bars, circles, maps, pictorials, etc. Diagrams are the last resort in the hands of a statistician for presenting data to the general public, who may not understand numerical values and may find them boring and complicated.
Characteristics:
1. They give a pictorial presentation of data.
2. They are concerned only with homogeneous data.
3. They do not have any constant shape.
4. They give only a bird's-eye view of the data.
Objectives of Diagrammatic Representation:
1. To present the data in an attractive and impressive manner.
2. To facilitate comparison.
3. To leave a lasting impression.
4. To bring out the characteristics of the data.
5. To eliminate the complexity of data.
6. To save time and energy.
Merits:
1. They are attractive.
2. They are easily intelligible.
3. They are highly impressive.
4. They are easily comparable.
5. They save time and energy.
6. They are very helpful in various types of studies.
Demerits:
1. They are not fit for exhaustive study.

2. They are liable to be misused.
3. They are fit only for comparative study.
4. They are liable to be misinterpreted.
5. Diagrams show only approximate values.
6. The use of certain diagrams, for example three-dimensional or multi-dimensional diagrams, is limited to experts only.
7. They expose only limited facts. All details cannot be presented diagrammatically.
8. They are a supplement to the tabular presentation, not an alternative to it.
9. Minute readings cannot be made. Small differences in large measurements cannot be studied. For example, the difference between 9025 and 9000 will not be apparent in a diagram.
10. If there is a wide gap between two measurements, the diagram will not give a meaningful look. For example, 10 and 900 cannot be shown meaningfully in a diagram, whatever scale is adopted.
Rules for making a Diagram:
1. Heading.
2. Size.
3. Length and breadth.
4. Clear drawing and a proper scale.
5. Selection of appropriate diagram.
6. Index.
7. Sources.
8. Simplicity.

Types of diagrams:
A. Dimensional diagrams. B. Pictograms. C. Cartograms.
A. Dimensional diagrams: A diagram in which certain dimensions are displayed in a prominent and proportional manner is called a dimensional diagram. Diagrams can have three types of dimensions, viz. length, breadth and thickness. There are three types of dimensional diagrams.
1. One-Dimensional Diagrams.
2. Two-Dimensional Diagrams.
3. Three-Dimensional Diagrams.
1. One-Dimensional Diagrams (line and bar): In a one-dimensional diagram, the length of the bar is considered while its width is not. Such diagrams are in the shape of vertical or horizontal lines or bars.
   1. Line diagram.
   2. Bar diagram.
      a. Horizontal bar diagram.
      b. Vertical bar diagram.
         1. Simple bar diagram.
         2. Multiple bar diagram (compound bar diagram).
         3. Sub-divided bar diagram (component bar diagram).
         4. Percentage sub-divided bar diagram.

2. Two-Dimensional Diagrams (area or surface diagrams): In one-dimensional diagrams, we consider only the length of the bar. In two-dimensional diagrams, on the other hand, both the length and the width of the bars are considered. These types of diagrams are known as surface diagrams or area diagrams. The following are the different types of two-dimensional diagrams.
   1. Rectangles.
   2. Square diagram.
   3. Circle.
   4. Angular or pie diagram.
3. Three-Dimensional Diagrams: Three-dimensional diagrams are also known as volume diagrams. These diagrams consist of cubes, spheres, cylinders, etc. In these diagrams three dimensions, namely length, width and thickness, are taken into account. The following are three-dimensional diagrams.
   1. Cubes.
   2. Spheres.
   3. Cylinders.
B. Pictograms: Pictograms refer to pictures or cartoons. Under this type of diagram, appropriate pictures are drawn to represent the quantitative data relating to a phenomenon. The number and size of the pictures are determined in proportion to the magnitude of the figures to be represented.
C. Cartograms: A cartogram refers to a map through which statistical information is represented in different ways, viz. shades, dots, columns. The regional distribution of data, viz. distribution of rainfall in various parts of a country, deposits of minerals in various regions, density of population in various geographical areas, etc., is well presented through this type of diagram. For this purpose, first of all a map is drawn for the broad area, and then the different parts of its regions are identified precisely with reference to the geographical area. The given data are then represented through the use of appropriate symbols or marks.

Graphic Presentation of Data


Meaning: The suitable representation of quantitative data through charts and diagrams is known as Graphical Representation of Statistical Information. Graphs include both charts and diagrams. A graphic representation is the geometrical image of a set of data. It is a mathematical picture. It enables us to think about a statistical problem in visual terms. A picture is said to be more effective than words for describing a particular thing or phenomenon. Consequently, the graphic representation of data proves quite an effective and economical device for the presentation, understanding and interpretation of the collected statistical data.

Advantages of Graphical Representation
1. It is easily understood by all.
2. The data can be presented in a form that is more attractive and appealing to the eye.
3. It simplifies the complexity of data.
4. It provides easy comparison of two or more phenomena.
5. It shows the relationship between two or more sets of figures.

6. It shows the trend and tendencies of the values of the variables.
7. It helps in interpolation of the values of the variables.
8. It has universal applicability.
9. Various valuable statistical averages like median, mode and quartiles may be easily observed.
10. It helps in comparative analysis and interpretation of data in an easy manner.
11. It provides a more lasting effect on the brain. It is possible to have an immediate and meaningful grasp of large amounts of data through such presentation.
12. The real value of graphical representation lies in its economy and effectiveness. It carries a lot of communication power.
13. It may help in the proper estimation, evaluation and interpretation of the characteristics of items and individuals.

Disadvantages of Graphical Representation
1. A diagram does not show all the facts or the data in detail.
2. It is a supplement to the tabular presentation but not an alternative to it.
3. Minute readings cannot be made.

General Rules:
1. Every graph must have a title, indicating the facts presented by the graph.
2. It is necessary to plot the independent variable on the horizontal axis and the dependent variable on the vertical axis.
3. A problem arises regarding the choice of a suitable scale. The scale chosen must accommodate the whole data.
4. As a principle, the vertical scale of a graph must start from zero. If the fluctuations are quite small compared to the size of the variable, or space is short, there is no need to show the entire vertical scale from the origin. Only the portion of the scale sufficient for the purpose need be shown, and for this purpose a false base line may be used: the portion of the scale which lies between zero and the smallest value of the variable is omitted, and the break is indicated by drawing two horizontal lines.
5. For showing proportional or relative changes in magnitude, the ratio or logarithmic scale should be used.
6. The graph must not be overcrowded with curves.
7. If more than one variable is plotted on the same graph, it is necessary to distinguish them by different lines, viz. dotted lines, broken lines, dot-cum-dash lines, thick lines, thin lines, dashed lines, etc.
8. An index should be given to show the scales and the meaning of the different curves.
9. All lettering must be horizontal.
10. It should be remembered that for every value of the independent variable there is a corresponding value of the dependent variable. It is these matched values (pairs of values) that are to be plotted. Each pair of values is represented on the graph by a point. This point corresponds to the values on the X-axis and the Y-axis.
11. The source of information should be mentioned as a footnote.

Types of graph:
Graphs can be presented on two scales, viz. natural scale and ratio scale. Natural scale: In natural scale, equal distances represent equal absolute magnitudes on both the axis. Such graphs can be used with advantage if we are interested in displaying the absolute changes in the value of phenomenon and variation in the magnitudes are such that they can be plotted on the graph paper. Natural scale is based on arithmetic progression.

Under the natural scale, graphs can be classified under two heads:
A. Graphs of time series.
B. Graphs of frequency distribution.

A. Graphs of time series:
1. Line graph.
   a. Single variate graph or historigram.
   b. Multi-variate graph or mixed graph.
2. Multiple-axis graph.
3. Range graph.

B. Graphs of frequency distribution:
1. Histogram.
2. Frequency polygon.
3. Frequency curve.
4. Cumulative frequency or ogive curves.

Graphs of time series:
1. Line graph or historigram: When a graph represents data relating to a time series, it is called a graph of time series. It can be a single variate graph or a multi-variate graph. A graph that represents data of only one dependent variable relating to a time series is called a single variate graph. A graph that represents data of two or more dependent variables, given in the same statistical units and relating to a time series, is called a multi-variate graph, mixed graph or multiple line graph.
2. Multiple-axis graph: When two variables given in two different statistical units relating to a time series are presented on the same graph, it is known as a multiple-axis graph. In this type of graph time is plotted on the x-axis and the two variables on the y-axis. The y-axis is divided into two sides, the left and the right.
3. Range graph: A graph in which the minimum and maximum values of the variable for different periods are plotted is known as a range graph.

Graphs of frequency distribution
1. Histogram: A histogram is a series of adjacent vertical rectangles whose areas are proportional to the frequencies represented. In a histogram, the variable is always shown on the X-axis and the frequencies on the Y-axis. Each class is represented by a distance on the X-axis that is proportional to its class-interval. If the class-intervals are uniform, the width of every rectangle is the same; if they are not uniform, the widths of the rectangles vary accordingly. The Y-axis shows the frequency of each class, which constitutes the height of its rectangle (for unequal class-intervals the frequency density is used as the height, so that the area remains proportional to the frequency). Thus we get a series of rectangles, each having the class-interval distance as its width and the frequency distance as its height.


Uses
1. It gives visual representation of the relative sizes of the various groups. 2. The surface of the tops of rectangles also gives idea of the nature of the frequency curve of the population. 3. Histogram may be used to find the mode graphically. Difference between bar diagram and histogram: The distinction can be done on the basis of dimension. A bar diagram is one dimensional, i.e. only the length of the bar is kept in mind not the width. Histogram is two-dimensional, i.e., both length and the width are taken into consideration.

2. Frequency polygon:
A frequency polygon is a line graph used for the graphical representation of a frequency distribution. If the mid-points of the top horizontal sides of the rectangles in a histogram are marked and joined by straight lines, the figure so formed is called a frequency polygon. It is assumed that the frequencies in a class interval are evenly distributed throughout the class and, consequently, that the mid-points are representative.
Frequency Polygon in Discrete Series: The values of the variable are shown on the X-axis and the frequencies on the Y-axis. The frequencies are then plotted and the points are joined by straight lines.
Frequency Polygons in Continuous Series: There are two ways of making a frequency polygon.

1. By Making Histogram:
We can draw a histogram of the given data and then join, by straight lines, the mid-point of the upper horizontal side of each rectangle with that of the adjacent one. The figure so made is called a frequency polygon. It is an accepted practice to close the polygon at both ends of the distribution by extending it to the base line, and this practice should be followed.

2. By Not Making Histogram:
In this method of preparing a frequency polygon we take the mid-points of the various class-intervals, plot the frequency corresponding to each mid-point and join all these points by straight lines. The figure obtained is exactly the same as that obtained under the first method. The only difference is that we do not have to construct a histogram.

Uses
1. It is particularly useful in representing a simple and ungrouped frequency distribution of discrete variables. 2. It gives approximate idea of the shape of the frequency curve. 3. By making a frequency polygon, we can find the value of mode very easily. 4. Apart from it, frequency Polygon facilitates comparison of two or more frequency distributions on the same graph.

3. Frequency curve:
It is also called a smoothed frequency curve. This curve can be drawn from the various points of the polygon. The curve is drawn freehand in such a manner that the area included under the curve is approximately the same as that of the polygon. The main object of this type of curve is to eliminate accidental variations that might be present in the data. For drawing a smoothed frequency curve it is essential first to draw the histogram, then the polygon, and lastly to smooth the polygon to obtain the smoothed frequency curve. This curve should start and end at the base line. As a general rule it may be extended to the mid-point of the class-interval just outside the histogram. The area under the curve should represent the total number of frequencies in the whole distribution.
Note: Remember the following points while smoothing a frequency curve.
1. We can smooth only a frequency distribution based on a sample.
2. We can smooth only a continuous series.
3. The total area under the curve must be equal to the area under the original histogram or polygon.

4. Cumulative frequency curve or ogive curve:


An ogive curve is based on cumulative frequencies and is, therefore, also known as a cumulative frequency curve. The basic difference between a frequency curve and an ogive curve is that in the latter we plot the cumulative frequency against a limit of the class interval (the upper limit in the less than method, the lower limit in the more than method) rather than plotting the individual frequencies over the entire class-interval. There are two methods of constructing an ogive curve: (A) the Less than method, (B) the More than method.

Less than method:


In case of less than method we start with the upper limits of the classes and go on adding the frequencies. When these frequencies are plotted we get a rising trend curve.

More than method:


Under this method, we start with the lower limits of the classes, and the frequency of each class is subtracted from the total as we move on to further classes. When these cumulative frequencies are plotted, we get a declining curve.

Utility of ogive curves


The ogive curves are used for the following purposes.
1. To find out and to show the number or proportion of cases above or below a given value.
2. To make a comparison of two or more frequency distributions.
3. Ogives can also be drawn to determine certain values graphically, such as the median, quartiles, deciles, etc.
4. It is also useful in finding the cumulative frequency corresponding to a given value of the variable.

Ratio scale or Semi-logarithmic Scale: The ratio scale is meant to study the relative changes in values. It tells us about the rate of change or ratio of change. The ratio scale or semi-logarithmic scale is basically used to highlight or emphasize relative, proportionate or percentage changes in the value of a phenomenon over different periods of time.
Types of ratio scales or graphs:
1. Logarithmic scale or graph.
2. Semi-logarithmic scale or graph.

1. Logarithmic graph: When the logarithms of the values of both the variables are plotted on a natural scale, the graph is known as a logarithmic graph.


2. Semi-logarithmic graph: When one of the variables is plotted on a logarithmic scale on the vertical axis against the other, which is plotted on a natural scale on the horizontal axis, the graph is known as a semi-logarithmic graph.

Difference between diagram and graph

Diagram
1. A plain paper is used.
2. Diagrams fail to present data relating to time series and frequency distributions.
3. Diagrams cannot be used to estimate the mean, median, mode, etc.
4. It is not helpful in interpolation and extrapolation techniques.
5. It furnishes only approximate information.
6. It is used for comparison and not to study mathematical relationships.
7. It is not of much use to the researcher for further statistical analysis.

Graph
1. Graph paper is used in the construction of a graph.
2. Graphs are used for the study of time series and frequency distributions.
3. The value of the median and mode can be estimated.
4. It is helpful in interpolation and extrapolation techniques.
5. It furnishes more precise and accurate information.
6. It helps us to study the mathematical relationship between two variables.
7. It is useful to researchers for the study of slopes, rates, estimation, etc.

Diagrammatic and Graphic Representation.


One-dimensional diagrams. Practical Steps Involved In the Construction of Simple Bar Diagram.
Step 1: If the bars are prepared vertically, take the values of variable on y-axis and time variable on x-axis. Step 2: Draw the bars according to the given data.

Practical Steps Involved In the Construction of Multiple Bar Diagrams.


Step 1: Take the time variable on x-axis and values of the other variables on y-axis. Step 2: Draw identical bars as per the given data. The set of bars (of the different variables) should be constructed adjoining each other and their widths should be identical. Step 3: Leave an identical gap in between all the sets.

Practical Steps Involved In the Construction of Sub-Divided Bar Diagrams.

Step 1: Take the time on x-axis and value of variable on y-axis. Step 2: Calculate the cumulative values of the different components of the given variable. Step 3: Construct one bar to represent the total value of the variable. Step 4: Ascertain the cumulated points where the bar is to be sub-divided. Step 5: Sub-divide the bar at these cumulated points.
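Steps 2 to 5 for the sub-divided bar can be sketched in a few lines of Python. This is only an illustration: the function name `subdivision_points` and the component figures are hypothetical and not taken from the text.

```python
from itertools import accumulate

def subdivision_points(components):
    """Return the cumulated heights at which one bar is sub-divided.

    The last cumulated value equals the total height of the bar.
    """
    return list(accumulate(components))

# Hypothetical data: three components of one year's total.
components = [40, 35, 25]
points = subdivision_points(components)
print(points)  # [40, 75, 100]
```

The bar is drawn to the height of the last value (100 here) and divided at the earlier cumulated values (40 and 75).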

Two-dimensional diagrams. Practical Steps Involved In the Construction of Rectangle Diagram.


Step 1: Identify the variable to be represented by the length and represent the same. Step 2: Identify the variable to be represented by the width and represent the same. Step 3: leave an identical gap in between various rectangles.


Practical steps involved in the construction of sub-divided rectangle diagram


Step 1: Take the time on x-axis and value of variable on y-axis. Step 2: Calculate the cumulative values of different components of a given variable. Step 3: Construct one rectangle to represent the total value of the variable. Step 4: Ascertain the cumulated points where rectangles are to be sub-divided. Step 5: Sub-divide the rectangle at these cumulated points.

Practical steps involved in the construction of squares


Step 1: Calculate the square root of the value of each variable. Step 2: Take the minimum square root value as equal to any particular length of a side of a square. Step 3: Calculate the sides of the other squares on the basis of the unitary method. Step 4: Draw the squares.

Practical steps involved in the construction of circles.

Step 1: Calculate the square root of the value of each variable. Step 2: Take the minimum square root value as equal to any particular measurement of the radius of a circle. Step 3: Calculate the radii of the other circles on the basis of the unitary method. Step 4: Draw the circles.
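The square-root and unitary-method steps for squares and circles can be sketched as follows. The function name `scaled_lengths`, the base length of 2 units and the data are assumptions made for illustration; the same calculation gives the side of a square or the radius of a circle.

```python
import math

def scaled_lengths(values, base_length):
    """Scale square roots so the smallest one is drawn as base_length.

    Because area grows with the square of the side (or radius), using
    square roots keeps the areas proportional to the original values.
    """
    roots = [math.sqrt(v) for v in values]
    smallest = min(roots)
    # Unitary method: 'smallest' corresponds to base_length units.
    return [base_length * r / smallest for r in roots]

# Hypothetical values for three items; smallest side drawn as 2 units.
values = [16, 64, 144]
print(scaled_lengths(values, base_length=2.0))  # [2.0, 4.0, 6.0]
```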

Practical steps involved in the construction of pie diagram.


Step 1: Construct or draw a circle. Step 2: Calculate the degrees of the various components of a given variable as follows: Degrees of a component = (Value of the component / Total value of the variable) × 360. Step 3: Calculate the cumulative degrees. Step 4: Sub-divide the circle on the basis of the cumulative degrees.
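The degree calculation for a pie diagram can be sketched as below, assuming the usual rule that each component receives a share of the 360 degrees of the circle in proportion to its value. The function name `pie_degrees` and the figures are hypothetical.

```python
from itertools import accumulate

def pie_degrees(values):
    """Return (degrees, cumulative degrees) for the components of a pie."""
    total = sum(values)
    degrees = [v * 360 / total for v in values]
    cumulative = list(accumulate(degrees))
    return degrees, cumulative

# Hypothetical components of one variable.
degrees, cumulative = pie_degrees([50, 30, 20])
print(degrees)     # [180.0, 108.0, 72.0]
print(cumulative)  # [180.0, 288.0, 360.0]
```

The circle is then sub-divided at the cumulative degrees, the last of which is always 360.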

Practical steps involved in drawing a graph.


Step 1: Draw two lines perpendicular to each other. The point where they intersect is called the point of origin, or the origin. Step 2: Denote the horizontal line as the x-axis and the vertical line as the y-axis. The two perpendicular lines divide the plane into four quadrants. Step 3: Take the positive values of x to the right of the origin and the negative values to the left; take the positive values of y above the origin and the negative values below it. Step 4: Fix the position of a point on the graph by measuring how far it is from the origin along the x-axis and along the y-axis, and designate it by writing the x distance and then the y distance, enclosing both in brackets. The distance denoted by x is known as the abscissa and the distance denoted by y is known as the ordinate.
Case: Point will be plotted in
1. If both X and Y values are positive (say 3, 2): Quadrant I
2. If X values are negative but Y values are positive (say -3, 2): Quadrant II
3. If both X and Y values are negative (say -3, -2): Quadrant III
4. If X values are positive but Y values are negative (say 3, -2): Quadrant IV
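The four cases can be checked with a small Python function. The name `quadrant` is hypothetical, and the treatment of points lying on an axis (which the cases above do not cover) is an assumption of this sketch.

```python
def quadrant(x, y):
    """Return the quadrant in which the point (x, y) is plotted."""
    if x == 0 or y == 0:
        # Assumption: points on an axis belong to no quadrant.
        return "on an axis"
    if x > 0 and y > 0:
        return "Quadrant I"
    if x < 0 and y > 0:
        return "Quadrant II"
    if x < 0 and y < 0:
        return "Quadrant III"
    return "Quadrant IV"

for point in [(3, 2), (-3, 2), (-3, -2), (3, -2)]:
    print(point, quadrant(*point))
```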

Practical steps involved in the construction of time series graph.


Step 1: Represent time on x-axis and value of variable on y-axis. Step 2: Plot the values of the variable corresponding to the time factor and join the different points by drawing straight lines.

Practical steps involved in the construction of a mixed graph.


Step 1: Take time on x-axis. Step 2: Take one dependent variable on y-axis on right of horizontal axis. Step 3: Plot the values of each variable corresponding to the time factor and join different points by drawing a straight line. Note: False base line may be taken if necessary.

Practical steps involved in the construction of a range graph


Step 1: Take time on x-axis and variable on y-axis. Step 2: Draw one curve for maximum values and another curve for minimum values of given variable. Step 3: Draw vertical lines to show the range at different points of time.
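The data behind a range graph can be prepared as in the sketch below. The function name `range_series` and the daily price figures are hypothetical; each tuple gives the two points to plot for a period and the length of the vertical line joining them.

```python
def range_series(periods, minimums, maximums):
    """Return (period, minimum, maximum, range) for each period."""
    return [
        (p, lo, hi, hi - lo)
        for p, lo, hi in zip(periods, minimums, maximums)
    ]

# Hypothetical minimum and maximum prices on three days.
periods = ["Mon", "Tue", "Wed"]
minimums = [20, 22, 19]
maximums = [30, 35, 27]
for row in range_series(periods, minimums, maximums):
    print(row)
# ('Mon', 20, 30, 10)
# ('Tue', 22, 35, 13)
# ('Wed', 19, 27, 8)
```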

Practical steps involved in drawing histogram in case of equal class intervals.


Step 1: Take the variable on x-axis and frequency on y-axis. Step 2: Draw adjacent vertical rectangles, one for each class, with height equal to the frequency of the class. The series of rectangles so formed gives the histogram of the frequency distribution, and its area represents the total frequency of the distribution spread over the various classes.

Practical steps involved in drawing histogram in case of unequal class intervals.


Step 1: Take the variable on x-axis and frequency on y-axis. Step 2: Calculate the frequency density for all the classes given. Step 3: Draw adjacent vertical rectangles for each class, taking the frequency density as the height. The series of such adjacent vertical rectangles so formed gives the histogram of the frequency distribution.
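Step 2 can be sketched as below, assuming the common definition of frequency density as the class frequency divided by the class width (some texts instead rescale frequencies relative to the smallest class width, which gives the same shape). The function name and the data are hypothetical.

```python
def frequency_densities(classes, frequencies):
    """Return frequency / class width for each (lower, upper) class."""
    return [
        f / (upper - lower)
        for (lower, upper), f in zip(classes, frequencies)
    ]

# Hypothetical distribution with unequal class intervals.
classes = [(0, 10), (10, 20), (20, 40), (40, 80)]
frequencies = [5, 12, 16, 8]
densities = frequency_densities(classes, frequencies)
print(densities)  # [0.5, 1.2, 0.8, 0.2]
```

With these heights, the area of each rectangle (density multiplied by class width) gives back the class frequency, so areas stay proportional to frequencies even though the widths differ.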

Practical steps involved in drawing frequency polygon in case of discrete frequency distribution.
Step 1: Plot the frequencies on vertical axis. (Y-axis) and corresponding values of variable on the horizontal axis (x-axis). Step 2: Join the points by straight line and the figure so obtained is the frequency polygon.

Practical steps involved in drawing frequency polygon in case of continuous frequency distribution with the help of histogram
Step 1: Draw the histogram from the available statistical information. Step 2: Join the mid-points of tops of adjacent rectangles of the histogram by straight lines. Step 3: The two end points are joined to the base line at the mid values of empty classes at both ends of the frequency distribution. Step 4: The figure so obtained is known as frequency polygon.

Practical steps involved in drawing frequency polygon in case of continuous frequency distribution without the construction of histogram.
Step 1: Take the frequency on y-axis and the values of the variable on x-axis. Step 2: Plot the frequencies of the different classes against the mid-values of the corresponding classes. Step 3: Join all these points by straight lines; the figure so obtained is the frequency polygon.
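The points of a frequency polygon for a continuous series can be computed as in this sketch. It also closes the polygon on the base line at the mid-values of the empty classes at both ends, assuming those empty classes have the same width as their neighbouring class. The function name and the data are hypothetical.

```python
def polygon_points(classes, frequencies):
    """Return the (mid-value, frequency) points of a frequency polygon."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    points = list(zip(mids, frequencies))
    first_lower, first_upper = classes[0]
    last_lower, last_upper = classes[-1]
    # Closing points: mid-values of the empty classes at each end,
    # assumed to be as wide as the adjacent class.
    start = (first_lower - (first_upper - first_lower) / 2, 0)
    end = (last_upper + (last_upper - last_lower) / 2, 0)
    return [start] + points + [end]

classes = [(0, 10), (10, 20), (20, 30)]
frequencies = [4, 9, 5]
print(polygon_points(classes, frequencies))
# [(-5.0, 0), (5.0, 4), (15.0, 9), (25.0, 5), (35.0, 0)]
```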

Practical steps involved in the construction of frequency curve.


Step 1: Draw a histogram and frequency polygon of the given statistical data. Step 2: Follow the general pattern of the frequency polygon and draw a smooth line that is sometimes below the polygon and sometimes above it.


Practical steps involved in construction of a less than and more than ogive
Step 1: Calculate the less than and more than cumulative class series. Step 2: Take the variable on the x-axis and the frequency on the y-axis. Step 3: To draw less than ogive curve, plot less than cumulative frequencies against the upper limits of respective class and join the points so plotted by a smooth freehand Curve. Step 4: To draw more than ogive curve, plot more than cumulative frequency against the lower limits of class intervals and join the points by smooth freehand curve.
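Steps 1, 3 and 4 can be sketched as follows: the less than cumulative frequencies are paired with the upper class limits and the more than cumulative frequencies with the lower class limits. The function name `ogive_points` and the distribution are hypothetical.

```python
from itertools import accumulate

def ogive_points(classes, frequencies):
    """Return the plotting points of the less than and more than ogives."""
    less_cum = list(accumulate(frequencies))
    total = less_cum[-1]
    # More than: total minus the frequencies of all earlier classes.
    more_cum = [total - c for c in [0] + less_cum[:-1]]
    less_than = [(upper, c) for (_, upper), c in zip(classes, less_cum)]
    more_than = [(lower, c) for (lower, _), c in zip(classes, more_cum)]
    return less_than, more_than

classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [5, 8, 12, 5]
less_than, more_than = ogive_points(classes, frequencies)
print(less_than)  # [(10, 5), (20, 13), (30, 25), (40, 30)]
print(more_than)  # [(0, 30), (10, 25), (20, 17), (30, 5)]
```

The first list rises to the total frequency and the second falls from it, which matches the rising and declining shapes of the two ogives.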
