
GENERAL STATISTICS

TABLE OF CONTENTS
Chapter 1. Preliminary Concepts
1.1 Introduction and Basic Concepts
1.2 Variables and Data
1.3 Summation

Chapter 2. Data Collection and Presentation
2.1 Data Collection
2.2 Data Presentation
2.3 Graphical Representation of Frequency Distribution

Chapter 3. Measure of Central Tendency
3.1 The Mean
3.2 Median and Mode
3.3 Percentiles, Deciles, and Quartiles

Chapter 4. Measure of Dispersion and Skewness
4.1 Measure of Variability
4.2 Coefficient of Variation

Chapter 5. Permutations and Combinations
5.1 Principle of Counting
5.2 Permutations
5.3 Combinations

Chapter 1

1.1 Introduction and Basic Concepts


This section aims to: discuss the background and the development of statistics; Define and differentiate the two branches of statistics; and Differentiate population from sample.

Statistical information and its development can be traced back to ancient times. People compiled statistical data on all sorts of things, such as agricultural crops, athletic events, and commerce and trade. As time went by, statistical work has continued to have a marked influence on human activities in a wider scope, from describing important features of data to analyzing them.

Statistics
The science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data, including interpreting and presenting numerical data. The term can refer to the mere tabulation of numeric information, as in reports of stock market transactions, or to the body of techniques used in processing or analyzing data.

Data
Data are the raw material with which the statistician works. Data can be found through surveys, experiments, numerical records, and other modes of research.

Statistician
The term statistician is also used in several ways. It can mean a person who simply collects information or one who prepares analyses or interpretations. It may also mean a scholar who develops the mathematical theory on which the science of statistics is based.

Two Branches of Statistics


Statistics can be organized into descriptive statistics and inferential statistics.

Descriptive Statistics

Concerned with collecting, organizing, presenting, and analyzing numerical data.

Inferential Statistics

Its main concern is to analyze the organized data leading to prediction or inferences.

The words population and sample are the words most commonly associated with statistics.

Population

Refers to the groups or aggregates of people, objects, materials, events, or things of any form.

Sample

Consists of one or more members of the population.

1.2 Variables and Data


This section aims to: Differentiate the two types of variables; Identify and illustrate the two areas of quantitative variables; Enumerate the classifications of data; and Apply the types of variables in various fields of applications.

Statistical data or information can be gathered through different ways such as interviewing people, observing or inspecting items, using questionnaires and checklists. The characteristic that is being studied is called a variable. It varies from one person or thing to another.

Examples of variables for people are height, weight, age, sex, marital status, eye color, etc. The first three of the given variables yield numerical values and are examples of quantitative variables. The last three yield non-numerical values or attributes and are examples of qualitative variables.

Quantitative variables are further classified as either discrete or continuous. A discrete variable is a variable whose values can be counted using integral values, such as the number of enrollees, drop-outs, or graduates in a certain college, the number of deaths, and the number of employees. A continuous variable is a variable that can assume any numerical value over an interval or intervals. Height, weight, temperature, and time are examples of continuous variables.

A variable can be dependent or independent depending on its use. When the value of one variable is used to predict the value of another, the independent variable is the predictor while the dependent variable is the variable whose value is being predicted. For example, to predict the effect of sunlight on the growth of a certain plant, the dependent variable is the growth of the plant while the independent variable is the amount of sunlight to which the plant is exposed.

Scales of Measurement of Data

Nominal Data

Use numbers for the purpose of identifying name or membership in a group or category.

Ordinal Data

Connote ranking or inequalities. In this type of data, numbers represent "greater than" or "less than" measurements, such as preferences or rankings.

Interval Data

Indicate an actual amount, and there is an equal unit of measurement separating each score, specifically equal intervals. There is no true zero.

Ratio Data

Similar to interval data but has an absolute zero, and multiples are meaningful. It includes all the usual measurements of length, height, weight, area, volume, density, velocity, money, and duration.

1.3 Summation
This section aims to: Introduce a special notation that will work as a shortcut for expressing sum of terms and thereby appreciate mathematics as a tool of symbols; and State and analyze the properties of summation.

When dealing with a sum of terms, we shall have occasions to use an abbreviated form. This special symbol for writing of sums is called summation.

Summation is denoted by the symbol $\sum$ and is defined as

$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

where 1 and n are called the lower and upper limits, respectively. We note that $x_1$ is read as "x sub 1".
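The definition can be tried out directly in code. The sketch below is only an illustration: the values of x and y and the constant c are hypothetical, and the three properties checked (sum of a constant, constant factor, and sum of two terms) are the standard ones for summation rather than a list taken from this text.

```python
# Hypothetical values x_1, ..., x_4 (not from the text).
x = [2, 4, 6, 8]
n = len(x)

# Sum of x_i for i = 1..n; Python lists are 0-based, so x_i is x[i - 1].
total = sum(x[i - 1] for i in range(1, n + 1))

# Three standard properties of summation, checked on the same values.
c = 3                    # hypothetical constant
y = [1, 1, 2, 2]         # hypothetical second set of values
constant_rule = sum(c for _ in range(n)) == n * c
factor_rule = sum(c * xi for xi in x) == c * sum(x)
additive_rule = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

print(total, constant_rule, factor_rule, additive_rule)
```

Here the lower limit 1 and upper limit n appear as the bounds of `range(1, n + 1)`.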

Chapter 2

2.1 Data Collection


This section aims to: Identify, compare and contrast the different types of data; List and explain the various techniques of selecting a sample; and Enumerate and illustrate the different sampling techniques.

Types of Data

Primary Data - data collected directly by the researcher himself. These are first-hand or original sources. They can be collected through the following:
1. Direct observation or measurement (primary source of information).
2. Interview (questionnaires or rating scales).
3. Mail, using recording forms.
4. Experimentation.

Secondary Data

Information taken from published or unpublished materials previously gathered by other researchers or agencies, such as books, newspapers, magazines, journals, and published and unpublished theses and dissertations.

Two types of Sampling Technique:


Probability Sampling - every unit has a chance of being selected, and that chance can be quantified.
Non-Probability Sampling - not every item in the population has an equal chance of being selected.

Sampling Technique

The procedure for selecting the members of a sample from the entire population.

Different Types of Sampling Techniques

Simple Random Sampling

It is recommended to prevent the possibility of a bias or erroneous inference. Under the concept of randomness, each member of the population has an equal chance to be included in the sample gathered.

Systematic Random Sampling

The items or individuals are arranged in some way, perhaps alphabetically or in some other order, and then every k-th item is selected after a random starting point.

Stratified Random Sampling

In this type of sampling, a population is first divided into subsets, called strata, based on homogeneity. The strata are as internally homogeneous as possible and, at the same time, each stratum is as different from the others as possible.

Cluster Sampling

Can be done by subdividing the population into smaller units and then selecting at random only some of these primary units, where the study would then be concentrated. It is sometimes referred to as area sampling because it is frequently applied on a geographical basis.
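The four techniques above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the example population, the strata, and the 10% sampling fraction are all hypothetical, and the systematic version assumes the usual rule of a random start followed by every k-th item.

```python
import random

def simple_random_sample(population, n, rng):
    # Every member has an equal chance of being included.
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Assumed rule: random start, then every k-th item of the ordered list.
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + j * k] for j in range(n)]

def stratified_sample(strata, fraction, rng):
    # Draw the same fraction at random from each stratum.
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Pick whole clusters at random and keep every member of each.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [m for name in chosen for m in clusters[name]]

rng = random.Random(0)                 # fixed seed so the sketch is repeatable
population = list(range(1, 101))       # hypothetical population of 100 units
strata = {"A": list(range(60)), "B": list(range(100, 140))}   # hypothetical
clusters = {"north": [1, 2, 3], "south": [4, 5], "east": [6, 7, 8, 9]}

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)
clus = cluster_sample(clusters, 2, rng)
```

With a fraction of 0.1 the stratified sample takes 6 units from stratum A and 4 from stratum B, so each stratum is represented in proportion to its size.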

2.2 Data Presentation


This section aims to: Summarize and present data in different forms; Arrange and organize the raw data into an array and construct the frequency distribution and stem-and-leaf diagram; and Define, illustrate, and solve for the class limits, class boundaries and class marks.

Methods in Presenting Data


Textual Form - data in paragraph form.
Tabular Form - systematic arrangement of data in rows and columns.
Graphical Form - a graph or chart is a device for showing numerical values in pictorial form.
Semi-Tabular/Semi-Textual Form - the combination of Textual and Tabular Form.

Stem and Leaf Diagram


Raw data are data collected in an investigation that are not yet organized systematically. Raw data that are presented in the form of a frequency distribution are called grouped data. There are two methods of organizing raw data: setting up an array and setting up a stem-and-leaf diagram.

For example, a nationwide travel agency offers special rates for package tours during summer. To economize on advertising, brochures will be sent only to certain age groups. The agency takes its previous customers from its files and groups them according to age. Only those age groups with the fewest people are sent brochures. The following are the ages of the previous customers:

Example:
59 60 71 41 53 50 61 54 34 43 52 58 44 61 63 38 62 52 50 71 80 51 26 60 65 62 36 63 53 79 77 54 58 62 45 56 18 56 62 66

I. Setting up an array from the largest to the smallest


80 79 77 71 71 66 66 66
63 63 62 62 62 62 61 61
60 60 59 58 58 55 54 54
53 53 52 52 51 50 50 45
44 43 41 38 36 34 26 18

II. An array from the smallest to the largest


18 26 34 36 38 41 43 44
45 50 50 51 52 52 53 53
54 54 55 58 58 59 60 60
61 61 62 62 62 62 63 63
66 66 66 71 71 77 79 80

III. Setting up into stem-and-leaf diagram


1 | 8
2 | 6
3 | 4 6 8
4 | 1 3 4 5
5 | 0 0 1 2 2 3 3 4 4 5 8 8 9
6 | 0 0 1 1 2 2 2 2 3 3 6 6 6
7 | 1 1 7 9
8 | 0
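A stem-and-leaf diagram can be built mechanically by splitting each value into its tens digit (the stem) and its units digit (the leaf). The sketch below is one way to do that; the function name `stem_and_leaf` is hypothetical, and only the nine youngest ages from the example are used to keep the output short.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

# The nine youngest ages in the example.
ages = [18, 26, 34, 36, 38, 41, 43, 44, 45]

for stem, leaves in stem_and_leaf(ages).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Passing the full list of 40 ages produces the remaining stems in the same way.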

Tally Method
CLASS LIMIT   TALLY             f     CLASS BOUNDARY
80-89         I                 1     79.5-89.5
70-79         IIII              4     69.5-79.5
60-69         IIIII-IIIII-III   13    59.5-69.5
50-59         IIIII-IIIII-III   13    49.5-59.5
40-49         IIII              4     39.5-49.5
30-39         III               3     29.5-39.5
20-29         I                 1     19.5-29.5
10-19         I                 1     9.5-19.5
                                n=40
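The class boundaries and class marks follow from the class limits by simple arithmetic: each boundary lies half a unit beyond the limit, and the class mark is the midpoint of the two limits. The sketch below illustrates this; the function names are hypothetical, and the data are assumed to be recorded in whole years (a unit of 1).

```python
def class_boundaries(lower, upper, unit=1):
    # Boundaries extend half a unit of measurement beyond the class limits.
    half = unit / 2
    return lower - half, upper + half

def class_mark(lower, upper):
    # The class mark is the midpoint of the class limits.
    return (lower + upper) / 2

def tally(values, limits):
    # Count how many values fall within each pair of class limits.
    return [sum(lo <= v <= hi for v in values) for lo, hi in limits]

# Class limits from 80-89 down to 10-19, as in the tally table.
limits = [(lo, lo + 9) for lo in range(80, 0, -10)]

for lo, hi in limits:
    print(f"{lo}-{hi}", class_boundaries(lo, hi), class_mark(lo, hi))

# Tally of the five youngest ages only, as a small check.
few = tally([18, 26, 34, 36, 38], limits)
```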

2.3 Graphical Representation of Frequency Distribution


This section aims to: Define and illustrate histograms, frequency polygon, ogives and pie graphs; Portray and apply the distribution of data in various graphs such as histogram, frequency polygon, and a cumulative frequency polygon.

Graphical forms of presenting information are often more helpful in making a stronger impact. There are some features in tabular form which cannot be discerned simply by looking at raw data.

Graphical Representation of Frequency Distribution

Frequency Histogram
It is a bar graph that displays the classes on the horizontal axis and the frequencies of the classes on the vertical axis.

[Figure: Frequency Histogram. Horizontal axis: class marks from 14.5 to 84.5; vertical axis: frequency from 0 to 14.]

Frequency Polygon
It is a line chart that is constructed by plotting the frequencies against the class marks and connecting the plotted points by straight lines. The polygon is closed by considering an additional class at each end, and each end of the line is brought down to the horizontal axis at the midpoint of the additional class.

[Figure: Frequency Polygon. Horizontal axis: class marks from 14.5 to 84.5; vertical axis: frequency from 0 to 14.]

Ogive
It is a graph of a cumulative frequency distribution and sometimes called a cumulative frequency distribution graph.

[Figure: Ogive. Horizontal axis: class boundaries from 9.5 to 79.5; vertical axis: cumulative frequency from 0 to 45; two series.]

Pie Chart
It is a graphical presentation that uses a circle (pie) divided into sectors, one for each class.

[Figure: Pie Chart. Sector angles in degrees: 9, 36, 117, 117, 36, 27, 9, 9.]
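The numbers behind the ogive and the pie chart can be derived from the class frequencies. The sketch below assumes the frequencies of the tally table in Section 2.2, listed from the highest class (80-89) to the lowest (10-19); the variable names are illustrative.

```python
from itertools import accumulate

classes = ["80-89", "70-79", "60-69", "50-59", "40-49", "30-39", "20-29", "10-19"]
freqs = [1, 4, 13, 13, 4, 3, 1, 1]
n = sum(freqs)

# Relative frequency f/n and pie-chart sector angle (f/n of 360 degrees).
relative = [f / n for f in freqs]
angles = [f * 360 / n for f in freqs]

# "Less than" cumulative frequency for the ogive: accumulate from the lowest
# class upward, then list the results highest class first like the table.
less_than_cf = list(accumulate(reversed(freqs)))[::-1]

for row in zip(classes, freqs, less_than_cf, relative, angles):
    print(*row)
```

The sector angles add up to 360 degrees and the last cumulative frequency equals n.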

Chapter 3

3.1 The Mean


This section aims to: State and illustrate the definition of the mean both for grouped and raw data (ungrouped); Apply the shortcut formula for calculating the mean.

The most commonly used measure of central tendency is the mean. When people speak of an average, it is usually the mean that they are referring to.

This section is divided into two: the mean for ungrouped data and the mean for grouped data

MEASURES OF CENTRAL TENDENCY


- A single number that represents the given data.

1. Mean - the average value of the given data. It is not an appropriate measure of central tendency if there is an outlier.

2. Median - divides the distribution into two equal parts (the upper 50% and the lower 50%).

3. Mode - the most frequently occurring value. It can be used for nominal data.

UNGROUPED DATA
25 32 41 58 78 95 105 110 112 112 115

$\text{Mean} = \frac{\sum x}{n} = \frac{883}{11} = 80.2727$
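The ungrouped mean is just the sum of the values divided by their count. The sketch below repeats the computation, reading the sixth value as 95, which is the reading consistent with the stated total of 883 for 11 values.

```python
# The eleven ungrouped values; the sixth is read as 95 so that the sum is 883.
scores = [25, 32, 41, 58, 78, 95, 105, 110, 112, 112, 115]

total = sum(scores)        # sum of x
n = len(scores)            # number of values
mean = total / n

print(total, n, round(mean, 4))
```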

GROUPED DATA
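Short Method

$\text{Mean} = AM + \left(\frac{\sum fd}{n}\right) i$

where AM is the assumed mean, d is the deviation of each class mark from AM in units of the class width, and i is the class width.

Long Method

$\text{Mean} = \frac{\sum fx}{n} = \frac{2250}{40} = 56.25$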
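Both methods can be checked against the frequency distribution of the ages. The sketch below assumes the class marks and frequencies of that distribution, an assumed mean of 64.5 and a class width of 10; the variable names are illustrative.

```python
# Class marks (X) and frequencies (f), highest class first.
marks = [84.5, 74.5, 64.5, 54.5, 44.5, 34.5, 24.5, 14.5]
freqs = [1, 4, 13, 13, 4, 3, 1, 1]
n = sum(freqs)

# Long method: sum of f*x divided by n.
sum_fx = sum(f * x for f, x in zip(freqs, marks))
mean_long = sum_fx / n

# Short method: code each class mark as d = (x - AM) / i.
am = 64.5      # assumed mean: the class mark of the 60-69 class
i = 10         # class width
d = [(x - am) / i for x in marks]
sum_fd = sum(f * dv for f, dv in zip(freqs, d))
mean_short = am + (sum_fd / n) * i

print(sum_fx, mean_long, sum_fd, mean_short)
```

The two methods agree because the coding by d only shifts and rescales the class marks.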

3.2 Median and Mode


This section aims to: Differentiate the three principal measurements of central tendency; Apply the computations of the median and mode in various sets of data

B. Median - is the middle measure in a set of measures arranged in order of magnitude. If the total number of measures is even, the median is given by the average of the two middle measures. Thus, half of the distribution lies above the median and half lies below it.

Mode = 112 (for the ungrouped data in Section 3.1). If there are two modes, the distribution is called bimodal. If no value is repeated, there is no mode.

C. Mode - is the item or measure which occurs most often. It has the highest number of frequency.
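For ungrouped data, Python's standard `statistics` module computes both measures. The sketch below applies it to the eleven values from Section 3.1 (with the sixth value read as 95); the short even-sized list is a hypothetical example of averaging the two middle measures.

```python
import statistics

scores = [25, 32, 41, 58, 78, 95, 105, 110, 112, 112, 115]

# Odd number of values: the median is the middle (6th) value.
median = statistics.median(scores)

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(scores)

# Even number of values (hypothetical subset): average of the two middle ones.
even_median = statistics.median([25, 32, 41, 58])

# A bimodal example (hypothetical values).
two_modes = statistics.multimode([1, 1, 2, 2, 3])

print(median, modes, even_median, two_modes)
```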

ASSUMED MEAN
$\text{Mean} = AM + \left(\frac{\sum fd}{n}\right) i = 64.5 + \left(\frac{-33}{40}\right)(10) = 64.5 - 8.25 = 56.25$

Median

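$\text{Median} = LL + \left(\frac{\frac{n}{2} - {<}cf}{f}\right) i = 49.5 + \left(\frac{20 - 9}{13}\right)(10) = 49.5 + \left(\frac{11}{13}\right)(10) = 49.5 + 8.4615 = 57.9615$

where LL is the lower boundary of the median class, <cf is the cumulative frequency below the median class, f is the frequency of the median class, and i is the class width.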
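The only non-mechanical step in the grouped median is locating the median class, which is the first class whose cumulative frequency reaches n/2. The sketch below walks through the classes from lowest to highest to find it, assuming the frequency distribution of the ages; the variable names are illustrative.

```python
# Classes listed from lowest to highest: lower class boundary and frequency.
lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [1, 1, 3, 4, 13, 13, 4, 1]
width = 10                      # class width i

n = sum(freqs)
half = n / 2

cf_before = 0                   # "<cf": cumulative frequency below the class
for ll, f in zip(lower_boundaries, freqs):
    if cf_before + f >= half:   # first class that reaches n/2
        median_class = (ll, ll + width)
        median = ll + (half - cf_before) / f * width
        break
    cf_before += f

print(median_class, cf_before, round(median, 4))
```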

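Mode

$\text{Mode} = LL + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$

where $\Delta_1$ = difference between the frequency of the modal class and that of the next lower class, and $\Delta_2$ = difference between the frequency of the modal class and that of the next higher class.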

3.3 Percentiles, Deciles, and Quartiles


This section aims to: Define, illustrate, and distinguish percentiles, deciles, and quartiles; and Discuss the formulas of percentiles, deciles, and quartiles.

Measure of Location

Position/Location

QUARTILE (Q)

Q1 - 25% (1/4)
Q2 - 50% (2/4)
Q3 - 75% (3/4)

DECILE (D)

D1 - 10% D2 - 20% D3 - 30% D4 - 40% D5 - 50% D6 - 60% D7 - 70% D8 - 80% D9 - 90%

PERCENTILE (P)
P1 - 1/100 P2 - 2/100 P3 - 3/100 P4 - 4/100 P5 - 5/100 ... P99 - 99/100

UNGROUPED DATA
85 92 105 118 126 149 165 189 205 210 220

Q1 : 0.25n = 0.25(11) = 2.75
d = 105 - 92 = 13
c = 13(0.75) = 9.75
Q1 = 92 + 9.75
Q1 = 101.75

Q3 : 0.75n = 0.75(11) = 8.25
d = 205 - 189 = 16
c = 16(0.25) = 4
Q3 = 189 + 4
Q3 = 193

P30 : 0.3n = 0.3(11) = 3.3
d = 118 - 105 = 13
c = 13(0.3) = 3.9
P30 = 105 + 3.9
P30 = 108.9
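The worked examples follow one rule: multiply the proportion by n, use the whole part of the result to pick a value in the ordered list, and add the fractional part of the gap to the next value. The sketch below implements that rule as read from the examples; the function name is hypothetical, and other textbooks and libraries use slightly different position rules, so results may differ from theirs.

```python
def positional_quantile(sorted_values, p):
    """Quantile by the rule used in the worked examples.

    position = p * n; take the value at the whole part of the position
    (counting from 1) and add the fractional part of the distance d
    to the next value.
    """
    n = len(sorted_values)
    position = p * n
    k = int(position)                 # whole part
    fraction = position - k           # fractional part
    if k < 1:
        return sorted_values[0]
    if k >= n:
        return sorted_values[-1]
    lower, upper = sorted_values[k - 1], sorted_values[k]
    return lower + fraction * (upper - lower)

data = [85, 92, 105, 118, 126, 149, 165, 189, 205, 210, 220]

q1 = positional_quantile(data, 0.25)
q3 = positional_quantile(data, 0.75)
p30 = positional_quantile(data, 0.30)
print(q1, q3, round(p30, 4))
```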

GROUPED DATA
C.I.     f      <cf    X       >cf    f/n      sector (degrees)
80-89    1      40     84.5    1      0.0250   9
70-79    4      39     74.5    5      0.1000   36
60-69    13     35     64.5    18     0.3250   117
50-59    13     22     54.5    31     0.3250   117
40-49    4      9      44.5    35     0.1000   36
30-39    3      5      34.5    38     0.0750   27
20-29    1      2      24.5    39     0.0250   9
10-19    1      1      14.5    40     0.0250   9
         n=40                         rf=1

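$Q_1 = LL + \left(\frac{\frac{n}{4} - {<}cf}{f}\right) i = 49.5 + \left(\frac{10 - 9}{13}\right)(10) = 49.5 + \left(\frac{1}{13}\right)(10) = 49.5 + 0.7692 = 50.2692$

$Q_3 = LL + \left(\frac{\frac{3n}{4} - {<}cf}{f}\right) i = 59.5 + \left(\frac{30 - 22}{13}\right)(10) = 59.5 + \left(\frac{8}{13}\right)(10) = 59.5 + 6.1538 = 65.6538$

$D_2 = LL + \left(\frac{0.2n - {<}cf}{f}\right) i = 39.5 + \left(\frac{8 - 5}{4}\right)(10) = 39.5 + \left(\frac{3}{4}\right)(10) = 39.5 + 7.5 = 47$

$P_{23} = LL + \left(\frac{0.23n - {<}cf}{f}\right) i = 49.5 + \left(\frac{9.2 - 9}{13}\right)(10) = 49.5 + \left(\frac{0.2}{13}\right)(10) = 49.5 + 0.1538 = 49.6538$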
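All four computations use the same formula with a different proportion of n, so a single function covers quartiles, deciles, and percentiles. The sketch below assumes the grouped distribution of the ages; the function name `grouped_quantile` is hypothetical.

```python
def grouped_quantile(p, lower_boundaries, freqs, width):
    """LL + ((p*n - <cf) / f) * i for classes listed lowest to highest."""
    n = sum(freqs)
    target = p * n
    cf_before = 0                        # "<cf" below the current class
    for ll, f in zip(lower_boundaries, freqs):
        if cf_before + f >= target:      # class containing the target position
            return ll + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("p must be between 0 and 1")

lower_boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
freqs = [1, 1, 3, 4, 13, 13, 4, 1]

q1 = grouped_quantile(0.25, lower_boundaries, freqs, 10)
q3 = grouped_quantile(0.75, lower_boundaries, freqs, 10)
d2 = grouped_quantile(0.20, lower_boundaries, freqs, 10)
p23 = grouped_quantile(0.23, lower_boundaries, freqs, 10)
print(round(q1, 4), round(q3, 4), round(d2, 4), round(p23, 4))
```

Calling the function with p = 0.5 gives the grouped median, since Q2, D5, and P50 all mark the same position.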


MEASURE OF VARIABILITY OR DISPERSION


A measure of how scattered the values in a given data set are; in effect, an average distance of the values from the center.

1. Range = H.S. - L.S. (highest score minus lowest score)
For grouped data with class limits (C.L.) from 10-19 to 80-89: Range = 89.5 - 9.5 = 80

2. Mean Average Deviation - takes into account all the values in a given distribution.

FORMULA FOR FINDING MEAN AVERAGE DEVIATION:

$MAD = \frac{\sum |x - \bar{x}|}{n}$

3. Standard Deviation - the most commonly used in measures of variability

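UNGROUPED DATA
Sample SD:
1. $SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
2. $SD = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$

Population SD: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

GROUPED DATA: $SD = i \sqrt{\frac{\sum f (d')^2}{n} - \left(\frac{\sum f d'}{n}\right)^2}$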

4. Quartile Deviation - the semi-interquartile range. It represents half the spread of the middle part of a distribution.

FORMULA:
UNGROUPED DATA: $QD = \frac{Q_3 - Q_1}{2}$
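The grouped formulas can be tried on the frequency distribution of the ages. The sketch below is an illustration under stated assumptions: it takes d' to be the coded deviation (X - AM) / i with AM = 64.5 and i = 10, it checks the coded formula against a direct computation that divides by n, and it reuses the grouped Q1 and Q3 computed in Section 3.3 for the quartile deviation.

```python
import math

marks = [84.5, 74.5, 64.5, 54.5, 44.5, 34.5, 24.5, 14.5]   # class marks X
freqs = [1, 4, 13, 13, 4, 3, 1, 1]
n = sum(freqs)
i = 10          # class width
am = 64.5       # assumed mean used for coding

# Range for grouped data: highest upper boundary minus lowest lower boundary.
value_range = 89.5 - 9.5

# Grouped SD with coded deviations d' = (X - AM) / i.
d = [(x - am) / i for x in marks]
sum_fd = sum(f * dv for f, dv in zip(freqs, d))
sum_fd2 = sum(f * dv ** 2 for f, dv in zip(freqs, d))
sd_coded = i * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

# Direct check: square root of the mean squared deviation (dividing by n).
mean = sum(f * x for f, x in zip(freqs, marks)) / n
sd_direct = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / n)

# Quartile deviation from the grouped quartiles of Section 3.3.
qd = (65.6538 - 50.2692) / 2

print(value_range, sum_fd, sum_fd2, sd_coded, sd_direct, qd)
```

The coded and direct results agree, which is the point of the coded formula: it avoids working with the full-size deviations.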

Chapter 4

MEASURE OF VARIABILITY
COEFFICIENT OF VARIATION
The coefficient of variation, denoted by CV, allows comparing the variability of scores in two sets of data that do not necessarily measure the same thing. The set with the higher CV is the more variable one and is the one that needs improvement.

FORMULA: $CV = \frac{SD}{\bar{x}} \times 100\%$

Example:
Ratings: 10 - Highest, 1 - Lowest

Rater   1    2    3    4    5    6    7    8    9    10
Coke    8    10   2    8    9    5    8    6    8    10
Pepsi   9    8    1    10   9    3    7    8    8    10

$\bar{x}$ = 7.4 (Coke)

x      $\bar{x}$    $x - \bar{x}$    $|x - \bar{x}|$    $(x - \bar{x})^2$    $x^2$
8      7.4     0.6      0.6      0.36      64
10     7.4     2.6      2.6      6.76      100
2      7.4     -5.4     5.4      29.16     4
8      7.4     0.6      0.6      0.36      64
9      7.4     1.6      1.6      2.56      81
5      7.4     -2.4     2.4      5.76      25
8      7.4     0.6      0.6      0.36      64
6      7.4     -1.4     1.4      1.96      36
8      7.4     0.6      0.6      0.36      64
10     7.4     2.6      2.6      6.76      100

$\sum |x - \bar{x}| = 18.4$
$\sum (x - \bar{x})^2 = 54.4$
$\sum x^2 = 602$

$\bar{x}$ = 7.3 (Pepsi)

x      $\bar{x}$    $x - \bar{x}$    $|x - \bar{x}|$    $(x - \bar{x})^2$    $x^2$
9      7.3     1.7      1.7      2.89      81
8      7.3     0.7      0.7      0.49      64
1      7.3     -6.3     6.3      39.69     1
10     7.3     2.7      2.7      7.29      100
9      7.3     1.7      1.7      2.89      81
3      7.3     -4.3     4.3      18.49     9
7      7.3     -0.3     0.3      0.09      49
8      7.3     0.7      0.7      0.49      64
8      7.3     0.7      0.7      0.49      64
10     7.3     2.7      2.7      7.29      100

$\sum |x - \bar{x}| = 21.8$
$\sum (x - \bar{x})^2 = 80.1$
$\sum x^2 = 613$

Coke
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{54.4}{9}} = \sqrt{6.04} = 2.4585$

$CV = \frac{SD}{\bar{x}} \times 100\% = \frac{2.4585}{7.4} \times 100\% = 33.2229\%$

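Pepsi
$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{80.1}{10 - 1}} = \sqrt{\frac{80.1}{9}} = \sqrt{8.9} = 2.9833$, so $SD^2 = 8.9$

$CV = \frac{SD}{\bar{x}} \times 100\% = \frac{2.9833}{7.3} \times 100\% \approx 40.87\%$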

DECISION:
Pepsi needs more improvement than Coke in terms of taste.
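The comparison can be reproduced with the standard `statistics` module, whose `stdev` function uses the same n - 1 divisor as the sample SD formula. The sketch below assumes the ten ratings for each drink from the example; the function name `coefficient_of_variation` is illustrative.

```python
import statistics

def coefficient_of_variation(values):
    # CV = (sample SD / mean) * 100%; statistics.stdev divides by n - 1.
    return statistics.stdev(values) / statistics.mean(values) * 100

coke = [8, 10, 2, 8, 9, 5, 8, 6, 8, 10]
pepsi = [9, 8, 1, 10, 9, 3, 7, 8, 8, 10]

cv_coke = coefficient_of_variation(coke)
cv_pepsi = coefficient_of_variation(pepsi)

# The set with the larger CV is the more variable one.
more_variable = "Pepsi" if cv_pepsi > cv_coke else "Coke"
print(round(cv_coke, 2), round(cv_pepsi, 2), more_variable)
```

Small differences in the last decimal places from the hand computation come from rounding the SD before dividing.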

A distribution in two different units is given, to compare the dispersion in heights with the dispersion in weights. The mean height is 5.70 feet with SD = 0.9 ft. The mean weight is 72.5 kg with SD = 8.1 kg. Compare the dispersion in heights and in weights.
HEIGHTS
$CV = \frac{SD}{\bar{x}} \times 100\% = \frac{0.9}{5.7} \times 100\% = 0.157895 \times 100\% = 15.7895\%$

WEIGHTS
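$CV = \frac{SD}{\bar{x}} \times 100\% = \frac{8.1}{72.5} \times 100\% = 0.111724 \times 100\% = 11.1724\%$

Since 15.7895% > 11.1724%, the heights are relatively more dispersed than the weights.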

MEASURE OF SKEWNESS

Degree of symmetry or departure from symmetry.

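FORMULA:
1. $SK_1 = \frac{\bar{x} - \hat{x}}{SD}$, where $\hat{x}$ is the mode
2. $SK_2 = \frac{3(\bar{x} - \tilde{x})}{SD}$, where $\tilde{x}$ is the median
3. $SK_3 = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
4. $SK_4 = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$

UNGROUPED DATA
5. $SK_5 = \frac{\sum (x - \bar{x})^3}{n (SD)^3}$

GROUPED DATA
$SK_5 = \frac{\sum f (x - \bar{x})^3}{n (SD)^3}$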
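The first four coefficients need only summary values, so they are easy to express as small functions. The sketch below is an illustration: the function names are hypothetical, the first two formulas are taken to be the mean-minus-mode and mean-minus-median forms, and the quartile version is evaluated with the grouped Q1, median, and Q3 from Chapter 3.

```python
def pearson_first(mean, mode, sd):
    # SK1 = (mean - mode) / SD
    return (mean - mode) / sd

def pearson_second(mean, median, sd):
    # SK2 = 3 * (mean - median) / SD
    return 3 * (mean - median) / sd

def quartile_skewness(q1, q2, q3):
    # SK3 = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)
    return (q3 - 2 * q2 + q1) / (q3 - q1)

def percentile_skewness(p10, p50, p90):
    # SK4 = (P90 - 2*P50 + P10) / (P90 - P10)
    return (p90 - 2 * p50 + p10) / (p90 - p10)

# Grouped quartiles of the age distribution (Q2 is the median).
sk3 = quartile_skewness(50.2692, 57.9615, 65.6538)
print(round(sk3, 4))
```

A value near zero means the median sits midway between the two quartiles; a positive value indicates a longer right tail and a negative value a longer left tail.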

Negatively Skewed Distribution (all negative)

Positively Skewed Distribution (all positive)

Normal Distribution

Measure of Kurtosis

It is the degree of peakedness of a distribution.

FORMULA:

Ungrouped Data: $K = \frac{\sum (x - \bar{x})^4}{n (SD)^4}$

Grouped Data: $K = \frac{\sum f (x - \bar{x})^4}{n (SD)^4}$
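The moment-based skewness and kurtosis for ungrouped data share the same structure, differing only in the power used. The sketch below is a minimal version under an explicit assumption: SD is taken to be the sample standard deviation (divisor n - 1), as in the coefficient of variation example, since the formulas do not say which SD is meant. The function names and the five-value list are hypothetical.

```python
import statistics

def moment_skewness(values):
    # SK5 = sum((x - mean)^3) / (n * SD^3), with SD the sample SD (assumed).
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum((x - mean) ** 3 for x in values) / (n * sd ** 3)

def moment_kurtosis(values):
    # K = sum((x - mean)^4) / (n * SD^4), with SD the sample SD (assumed).
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return sum((x - mean) ** 4 for x in values) / (n * sd ** 4)

symmetric = [1, 2, 3, 4, 5]        # hypothetical, perfectly symmetric values
print(moment_skewness(symmetric), moment_kurtosis(symmetric))

coke = [8, 10, 2, 8, 9, 5, 8, 6, 8, 10]
print(moment_skewness(coke), moment_kurtosis(coke))
```

For the symmetric list the skewness is zero because the positive and negative cubed deviations cancel.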

Leptokurtic Distribution

Mesokurtic Distribution

Platykurtic Distribution

Chapter 5

5.1 PRINCIPLE OF COUNTING


This section aims to: State and illustrate the principle of counting; Diagram the computations involving the principle of counting; and Apply the principle of counting in various areas of problem solving.

Principle of Counting
If a choice consists of 2 steps, of which the first can be made in n1 ways and the second can be made in n2 ways, then the whole choice can be made in n1 . n2 ways.

EXAMPLES:
1. In a class of 20, the number of ways of selecting a President, Vice-President, Secretary, and Treasurer is 20 . 19 . 18 . 17 = 116,280

2. Certain government employees are classified into 2 categories: Sex (male, female) and Marital Status (single, married, widowed, separated). The number of possible classifications is 2 . 4 = 8

GENERALIZATION OF PRINCIPLE OF COUNTING

If a choice consists of k steps, of which the first can be made in n1 ways, for each of these the 2nd can be made in n2 ways, and so on, until for each of these the kth can be made in nk ways, then the whole choice can be made in n1 . n2 . ... . nk ways.

EXAMPLES:
1. A test is composed of 10 multiple-choice questions, each having four (4) possible answers. The number of ways of answering the test is 4 . 4 . 4 . 4 . 4 . 4 . 4 . 4 . 4 . 4 = 1,048,576

2. How many numbers of five (5) digits each can be made from the digits 1-9 if: a. the number must be odd; b. the last two (2) digits of each number are even. Repetition is not allowed.

1 ,2 ,3 , 4 , 5
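The counting examples can be checked by computation, and the five-digit problem can be counted by brute force over every arrangement. The sketch below is one possible reading of that problem, stated as an assumption: digits are drawn from 1 to 9 without repetition, part (a) requires the last digit to be odd, and part (b) requires each of the last two digits to be even.

```python
import math
from itertools import permutations

# Example 1: four different officers chosen in order from a class of 20.
officers = math.perm(20, 4)            # 20 * 19 * 18 * 17

# Ten questions with four possible answers each.
answer_sheets = 4 ** 10

# Five-digit numbers from the digits 1-9, no repetition (brute force).
digits = range(1, 10)
odd_numbers = sum(1 for p in permutations(digits, 5) if p[-1] % 2 == 1)
last_two_even = sum(
    1 for p in permutations(digits, 5) if p[-1] % 2 == 0 and p[-2] % 2 == 0
)

# The same counts by the principle of counting:
# (a) 5 choices for the last digit, then 8 * 7 * 6 * 5 for the other four.
# (b) 4 * 3 choices for the last two digits, then 7 * 6 * 5 for the first three.
by_principle_a = 5 * 8 * 7 * 6 * 5
by_principle_b = 4 * 3 * 7 * 6 * 5

print(officers, answer_sheets, odd_numbers, last_two_even)
```

Filling the restricted positions first, as in the comments, is what makes the step-by-step product work.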

5.2 Permutations
This section aims to: Define and illustrate permutations; Apply permutations in various situational conditions; and State and illustrate the circular permutation.

PERMUTATIONS

An arrangement of a group of things in a definite order; that is, there is a 1st element, a 2nd element, a 3rd element, etc. In other words, the order of arrangement of the elements is important.

EXAMPLES:
1. In how many ways can the five (5) starting positions on a PBA team be filled from 12 men who can play any of the positions?
$_{12}P_5 = \frac{12!}{(12-5)!} = \frac{12!}{7!} = 12 \cdot 11 \cdot 10 \cdot 9 \cdot 8 = 95{,}040$

2. How many permutations can be made from the letters of the word SUNDAY?
a. If four (4) letters are used at a time: $_6P_4 = \frac{6!}{(6-4)!} = \frac{6!}{2!} = 360$
b. If all letters are used: $_6P_6 = \frac{6!}{(6-6)!} = \frac{6!}{0!} = 720$

FORMULAS:
1st Formula: the number of permutations of n things taken n at a time is $_nP_n = n!$
2nd Formula: the number of permutations of n things taken r at a time is $_nP_r = \frac{n!}{(n-r)!}$

CIRCULAR PERMUTATION

The permutations that occur when arranging n objects in a circle are called circular permutations.
$P = (n-1)!$
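The permutation formulas correspond directly to `math.perm` and `math.factorial` in Python's standard library (Python 3.8 or later). The sketch below recomputes the two examples and adds a brute-force listing as a cross-check; the helper name `circular_permutations` is hypothetical.

```python
import math
from itertools import permutations

# nPr = n! / (n - r)!
starting_five = math.perm(12, 5)       # 12 players, 5 positions
four_letters = math.perm(6, 4)         # 4 letters of SUNDAY at a time
all_letters = math.perm(6, 6)          # all 6 letters, equal to 6!

# Cross-check by listing every 4-letter arrangement of SUNDAY.
listed = sum(1 for _ in permutations("SUNDAY", 4))

def circular_permutations(n):
    # Arrangements of n objects in a circle: (n - 1)!
    return math.factorial(n - 1)

print(starting_five, four_letters, all_letters, listed, circular_permutations(4))
```

In a circle only relative positions matter, which is why one object can be held fixed and the remaining n - 1 arranged freely.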

COMBINATION

A combination is a selection of objects without regard to order. This means that the order or arrangement in which the elements are taken is not important.
$_nC_r = \frac{n!}{r!(n-r)!}$
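The combination formula can be compared with the permutation count for the same n and r: every unordered selection of r objects can be arranged in r! orders. The sketch below illustrates this with a hypothetical example, choosing a 4-member committee (no distinct offices) from a class of 20; the function name `n_choose_r` is illustrative.

```python
import math
from itertools import combinations

def n_choose_r(n, r):
    # nCr = n! / (r! * (n - r)!)
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# Hypothetical example: a 4-member committee from a class of 20.
committee = n_choose_r(20, 4)

# Each committee can be ordered in 4! ways, which recovers 20P4.
ordered = committee * math.factorial(4)

# Cross-check on a small case by listing the selections.
pairs = len(list(combinations(range(5), 2)))

print(committee, ordered, pairs)
```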

The End

Thank You!

Presenters:

Mary Ann Frogosa Mary Ann Mosquerra BOA IV-1
