
Chapter 1 (Introduction)

Statistics

The science of:
Collecting data (from the whole population or from a sample)
Analyzing data (cleaning up the data)
Presenting data (charts and graphs)
Drawing conclusions and making observations (inference)

Statistics is a way to get information from data:
Data → Statistics → Information
Example
An engineering school student is anxious about their statistics course, since they've heard the course is difficult. The professor provides last term's final exam marks to the student. What can be discerned from this list of numbers?

Data → Statistics → Information

Data: list of last term's marks
95
89
70
65
78
57
...

Information: new information about the statistics class, e.g.
class average,
proportion of the class receiving A's,
most frequent mark,
marks distribution, etc.
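A minimal Python sketch of turning the marks into the kind of information listed above. It uses only the six marks visible on the slide (the rest of the list is elided), and the 80-mark cutoff for an A is a hypothetical assumption, not something the slide states.

```python
from statistics import mean, multimode

# Only the six marks visible on the slide; the full list is not given.
marks = [95, 89, 70, 65, 78, 57]

A_CUTOFF = 80  # hypothetical cutoff for an "A"; assumed, not from the source

class_average = mean(marks)
proportion_a = sum(m >= A_CUTOFF for m in marks) / len(marks)
most_frequent = multimode(marks)  # every mark here occurs once, so all of them tie

print(f"Class average: {class_average:.2f}")
print(f"Proportion receiving A's: {proportion_a:.2f}")
print(f"Most frequent mark(s): {most_frequent}")
```

With the full list of marks, the same three lines would give the class average, the share of A's and the most frequent mark.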
Key Statistical Concepts
A sample is a subset of the population.

Populations have parameters;
samples have statistics.

Adapted from Keller G. and Warrack B. (Statistics for Management and Economics)
Statistics

The science of statistics addresses two types of problems:
Descriptive statistics
Statistical inference
Descriptive Statistics

Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical techniques (pie charts, histograms)
Figure: Pie chart of Cause and bar chart of Cause (Count by Cause for CoalMine, DamFailure, GasExplosion, Lightning, Nuclear, OilFire).

Numerical techniques (mean, standard deviation)
Descriptive Statistics
Descriptive statistics involves arranging, summarizing, and
presenting a set of data in such a way that useful information
is produced.
Data → Statistics → Information

Descriptive statistics describe the data set that's being analyzed, but they don't allow us to draw any conclusions or make any inferences about the data.
Adapted from Keller G. and Warrack B. (Statistics for Management and Economics)
Statistical Inference

Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.

Figure: a sample is drawn from the population; a statistic computed from the sample is used to make an inference about the population parameter.
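To make the parameter/statistic distinction concrete, here is a small simulation sketch. The population of 10,000 marks is entirely hypothetical (randomly generated), so its mean is known here only because it was simulated; in practice the parameter is unknown and the sample mean is what we use to estimate it.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 marks between 0 and 100 (not from the slides).
population = [random.randint(0, 100) for _ in range(10_000)]
mu = mean(population)              # parameter: known here only because we simulated it

# Draw a simple random sample (without replacement) of size n = 200.
sample = random.sample(population, k=200)
x_bar = mean(sample)               # statistic used to estimate mu

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):      {x_bar:.2f}")
```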
Classification of Data

Data
    Qualitative
        Nominal: marital status, political party, eye color
        Ordinal: college course rating system
    Quantitative (interval)
        Discrete (counted items): number of children, defects per hour
        Continuous: weight, voltage
Numerical Methods for Describing Qualitative Data

Category frequency: the number of observations that fall in a given category.

Category relative frequency: the proportion of the total number of observations that fall in a given category.
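A short sketch of both definitions in Python. The eye-colour observations are hypothetical (eye color is one of the nominal examples in the classification above, but no data is given for it).

```python
from collections import Counter

# Hypothetical nominal observations (eye colour); not data from the slides.
observations = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]

n = len(observations)
frequency = Counter(observations)                                   # category frequency
relative_frequency = {cat: f / n for cat, f in frequency.items()}  # proportion of n

for cat, f in frequency.most_common():
    print(f"{cat:<6} frequency={f}  relative frequency={relative_frequency[cat]:.3f}")
```

The relative frequencies always sum to 1, since every observation falls in exactly one category.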
Example

Graphical Methods for Describing Qualitative Data

Bar Chart
Pie Chart
Pareto Diagram

Pie Chart

Figure: Pie chart of Cause. GasExplosion 62.2%, CoalMine 15.6%, DamFailure 8.9%, OilFire 8.9%, Lightning 2.2%, Nuclear 2.2%.
Bar Chart

Figure: Bar chart of Cause (Count by Cause for CoalMine, DamFailure, GasExplosion, Lightning, Nuclear, OilFire).
Pareto Diagram

Figure: Pareto diagram of Cause. Bars show Count in descending order (GasExplosion, CoalMine, DamFailure, OilFire, Lightning, Nuclear), with a Cumulative Percent line rising to 100. Percent within all data.
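A Pareto diagram is a bar chart sorted from the most to the least frequent category, plus a running (cumulative) percentage. The sketch below uses the percentages shown in the pie chart of Cause; since the raw counts are not given, it works with percentages rather than counts.

```python
# Percentages per cause, as shown in the pie chart of Cause.
percent = {"CoalMine": 15.6, "DamFailure": 8.9, "GasExplosion": 62.2,
           "Lightning": 2.2, "Nuclear": 2.2, "OilFire": 8.9}

# Pareto order: descending share (sorted() is stable, so ties keep dict order).
ordered = sorted(percent.items(), key=lambda kv: kv[1], reverse=True)

cumulative = []
running = 0.0
for cause, pct in ordered:
    running += pct
    cumulative.append((cause, pct, round(running, 1)))  # (cause, %, cumulative %)

for cause, pct, cum in cumulative:
    print(f"{cause:<13}{pct:>6.1f}{cum:>7.1f}")
```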
Graphical Methods for Describing Quantitative Data

Dot plots
Stem-and-leaf display
Histograms
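A stem-and-leaf display splits each value into a stem (here the tens digit) and a leaf (the units digit). The sketch below assumes non-negative integers and applies it to the daily-high-temperature data from the frequency distribution example in this chapter.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines: stem = tens digit, leaf = units digit.

    Assumes non-negative integers; stems with no leaves are omitted.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(stems.items())]

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
for line in stem_and_leaf(temps):
    print(line)
```

Unlike a histogram, the display keeps every original value while still showing the shape of the distribution.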
Example

Dotplots

Figure: Dotplot of MPG (horizontal axis MPG, 30.0 to 45.0).
Histograms

Figure: Histogram of MPG (Frequency versus MPG, 30 to 45).
Frequency Distribution Example

Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 then round up)
Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
Compute class midpoints: 15, 25, 35, 45, 55

Count observations & assign to classes

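The steps above can be sketched in Python. One assumption is needed that the slide leaves implicit: the first boundary is taken as the minimum rounded down to a multiple of the width (12 becomes 10), which reproduces the slide's boundaries 10, 20, ..., 60.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data = sorted(temps)                          # sort ascending
data_range = data[-1] - data[0]               # 58 - 12 = 46
num_classes = 5                               # chosen (usually 5 to 15)
width = math.ceil(data_range / num_classes)   # 46 / 5 = 9.2, rounded up to 10

# Boundaries (assumption: start at the minimum rounded down to a multiple of width)
start = (data[0] // width) * width
boundaries = [start + k * width for k in range(num_classes + 1)]
assert boundaries[-1] > data[-1], "classes must cover the largest observation"

# Midpoint of each class
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Count observations in each class [lower, upper)
counts = [sum(lo <= x < hi for x in data) for lo, hi in zip(boundaries, boundaries[1:])]

print(boundaries)  # [10, 20, 30, 40, 50, 60]
print(midpoints)   # [15.0, 25.0, 35.0, 45.0, 55.0]
print(counts)      # [3, 6, 5, 4, 2]
```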
Frequency Distribution Example

Data in ordered array:


12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Relative frequency   Percentage (%)
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100
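Relative frequency is each class frequency divided by the total number of observations, and the percentage is that proportion times 100. A minimal sketch using the frequencies from the table above:

```python
# Class frequencies from the table above (daily high temperatures).
classes = ["10 but less than 20", "20 but less than 30", "30 but less than 40",
           "40 but less than 50", "50 but less than 60"]
frequency = [3, 6, 5, 4, 2]

n = sum(frequency)                             # 20 observations
relative = [f / n for f in frequency]          # relative frequency = f / n
percentage = [100 * f / n for f in frequency]  # percentage = 100 * f / n

for cls, f, r, p in zip(classes, frequency, relative, percentage):
    print(f"{cls:<22}{f:>3}{r:>6.2f}{p:>6.0f}")
```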
Frequency Distribution Example

Data in ordered array:


12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Class                  Frequency   Percentage (%)   Cumulative frequency   Cumulative percentage (%)
10 but less than 20    3           15               3                      15
20 but less than 30    6           30               9                      45
30 but less than 40    5           25               14                     70
40 but less than 50    4           20               18                     90
50 but less than 60    2           10               20                     100
Total                  20          100
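Cumulative frequency is a running total of the class frequencies; dividing by n gives the cumulative percentage. A sketch with `itertools.accumulate`, using the frequencies from the table above:

```python
from itertools import accumulate

# Class frequencies from the table above (daily high temperatures).
frequency = [3, 6, 5, 4, 2]
n = sum(frequency)                                              # 20

cumulative_frequency = list(accumulate(frequency))              # running totals
cumulative_percentage = [100 * c / n for c in cumulative_frequency]

print(cumulative_frequency)   # [3, 9, 14, 18, 20]
print(cumulative_percentage)  # [15.0, 45.0, 70.0, 90.0, 100.0]
```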
Histogram Example

Class                  Class midpoint   Frequency
10 but less than 20    15               3
20 but less than 30    25               6
30 but less than 40    35               5
40 but less than 50    45               4
50 but less than 60    55               2

Figure: Histogram: Daily High Temperature (Frequency versus Class Midpoints, axis from 5 to 65). There are no gaps between the bars.
Numerical Methods for Describing Quantitative Data

These measures help:
to locate the center of the relative frequency distribution (measures of central tendency);
to measure the spread around the center (measures of variation);
to describe the relative position of an observation (measures of relative standing).
Measures of Central Tendency

Central Tendency
Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of the ranked values
Mode: most frequently observed value
Mean
Population: size N; population mean $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
Sample: size n; sample mean $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Mean

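Figure: dot plots of two data sets on a 0 to 10 scale; Mean = 3 for the first and Mean = 4 for the second.

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

$\frac{1+2+3+4+5}{5} = \frac{15}{5} = 3 \qquad \frac{1+2+3+4+10}{5} = \frac{20}{5} = 4$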
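The two worked means can be checked with a direct implementation of the formula (sum of the observations divided by n):

```python
def sample_mean(xs):
    """X-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

print(sample_mean([1, 2, 3, 4, 5]))   # 3.0  (15 / 5)
print(sample_mean([1, 2, 3, 4, 10]))  # 4.0  (20 / 5)
```

Changing a single value from 5 to 10 moves the mean from 3 to 4, which shows how the mean is pulled toward extreme values.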
Median

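In an ordered array, the median is the middle number (50% of the values above, 50% below).

Figure: dot plots of two data sets on a 0 to 10 scale.
First data set: Median = 3
Second data set: Median = (3 + 4)/2 = 3.5 (with an even number of values, the median is the average of the two middle values)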
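A minimal median function covering both cases. The first call reuses the data 1, 2, 3, 4, 10 from the mean example; the even-sized list 1 to 6 is hypothetical, chosen so that the two middle values are 3 and 4.

```python
def median(values):
    """Middle value of the ordered array; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median() needs at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 3, 4, 10]))    # 3
print(median([1, 2, 3, 4, 5, 6]))  # 3.5 (hypothetical even-sized data)
```

Unlike the mean, the median of 1, 2, 3, 4, 10 stays at 3, so it is not pulled toward the extreme value.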
Mode

Value that occurs most often

Figure: dot plots of two data sets (scales 0 to 14 and 0 to 6). Mode = 9 for the first; the second has no mode.
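A sketch of finding the mode(s) with `collections.Counter`. It assumes "no mode" means that no value occurs more than once (conventions vary), and both data sets are hypothetical stand-ins for the dot plots.

```python
from collections import Counter

def modes(values):
    """Most frequent value(s).

    Returns an empty list for "no mode", taken here (an assumption) to mean
    that no value occurs more than once.
    """
    if not values:
        raise ValueError("modes() needs at least one value")
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 3, 5, 9, 9, 9, 12, 14]))  # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))       # []
```

Returning a list also handles data with more than one mode.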
Mean, Median, Mode

Measures of Variation
Measures of variation give information on the
spread or variability of the data values.
Range
Standard deviation
Variance

Figure: same center, different variation.
Range

$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$

Example (dot plot on a 0 to 14 scale):
Range = 14 - 1 = 13
Disadvantages of the Range

Ignores the way in which data are distributed

Figure: two data sets on a 7 to 12 scale, distributed differently; both have Range = 12 - 7 = 5.

Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
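The outlier example above, reproduced in Python. Replacing one value (5 by 120) changes the range from 4 to 119, even though the other 24 values are unchanged.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

# The two data sets from the "Sensitive to outliers" example above.
without_outlier = [1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5]
with_outlier    = [1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120]

print(data_range(without_outlier))  # 4
print(data_range(with_outlier))     # 119
```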
Variance and Standard Deviation
Average (approximately) of squared deviations of values from the mean.

Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$

Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Population vs Sample
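A sample is a subset of the population; populations have parameters, samples have statistics.

Population (size N): $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

Sample (size n): $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$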
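The only computational difference between the two formulas is the divisor: N for the population variance, n - 1 for the sample variance. A minimal sketch, applied to the data 1, 2, 3, 4, 5 from the mean example (squared deviations from 3 sum to 10):

```python
def population_variance(xs):
    """sigma^2: sum of squared deviations from mu, divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N

def sample_variance(xs):
    """S^2: sum of squared deviations from x-bar, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [1, 2, 3, 4, 5]
print(population_variance(data))  # 2.0  (10 / 5)
print(sample_variance(data))      # 2.5  (10 / 4)
```

Python's standard `statistics` module offers the same two versions as `pvariance` and `variance`.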
Example: Sample Standard Deviation

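Sample data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, $\bar{X} = 16$

$S = \sqrt{\frac{(10-\bar{X})^2 + (12-\bar{X})^2 + (14-\bar{X})^2 + \cdots + (24-\bar{X})^2}{n-1}}$

$\;\;= \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}}$

$\;\;= \sqrt{\frac{130}{7}} = 4.3095$

S is a measure of the average scatter around the mean.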
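The worked example can be reproduced step by step and cross-checked against `statistics.stdev`, which computes the sample standard deviation (divisor n - 1):

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
x_bar = sum(data) / n                              # 128 / 8 = 16.0
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)   # 130.0
s = math.sqrt(sum_sq_dev / (n - 1))                # sqrt(130 / 7)

# Cross-check against the standard library's sample standard deviation
assert math.isclose(s, statistics.stdev(data))
print(round(s, 4))  # 4.3095
```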
Comparing Standard Deviations

Three data sets plotted on the same 11 to 21 scale:
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
All three have the same mean but different standard deviations.
Measuring variation

Figure: a distribution with a small standard deviation compared with one with a large standard deviation.
Shape of a distribution

Describes how data are distributed
Measures of shape: symmetric or skewed

Left-skewed: Mean < Median
Symmetric: Mean = Median
Right-skewed: Median < Mean
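The mean-versus-median rule above can be turned into a rough classifier. This is only a rule-of-thumb sketch: the first two data sets come from the mean example, while 1, 7, 8, 9, 10 is a hypothetical left-skewed set.

```python
import math
from statistics import mean, median

def describe_shape(values, tol=1e-9):
    """Rough shape label from comparing the mean with the median.

    Heuristic only: Mean < Median suggests left skew, Mean > Median right skew.
    """
    m, md = mean(values), median(values)
    if math.isclose(m, md, abs_tol=tol):
        return "symmetric"
    return "left-skewed" if m < md else "right-skewed"

print(describe_shape([1, 2, 3, 4, 5]))    # symmetric    (mean 3 = median 3)
print(describe_shape([1, 2, 3, 4, 10]))   # right-skewed (mean 4 > median 3)
print(describe_shape([1, 7, 8, 9, 10]))   # left-skewed  (mean 7 < median 8)
```

A tolerance is used for the "equal" case because means computed from real data rarely match the median exactly.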
