
Percentiles and Quartiles

Given any set of numerical observations, order them according to magnitude. The Pth percentile in the ordered set is that value below which lie P% (P percent) of the observations in the set. The position of the Pth percentile is given by (n + 1)P/100, where n is the number of observations in the set.

A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown on the next slide, together with the same data sorted by magnitude.

Sales and Sorted Sales

Sales:        9 6 12 10 13 15 16 14 14 16 17 16 24 21 22 18 19 18 20 17
Sorted Sales: 6 9 10 12 13 14 14 15 16 16 16 17 17 18 18 19 20 21 22 24

Percentiles

Find the 50th, 80th, and the 90th percentiles of this data set. To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5. Thus, the percentile is located at the 10.5th position. The 10th observation is 16, and the 11th observation is also 16. The 50th percentile will lie halfway between the 10th and 11th values and is thus 16.

Percentiles

To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8. Thus, the percentile is located at the 16.8th position. The 16th observation is 19, and the 17th observation is 20. The 80th percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19.8.

Percentiles

To find the 90th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(90/100) = 18.9. Thus, the percentile is located at the 18.9th position. The 18th observation is 21, and the 19th observation is 22. The 90th percentile is a point lying 0.9 of the way from 21 to 22 and is thus 21.9.
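The position rule and the interpolation step used in the three worked percentiles can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name `percentile` and the clamping of positions that fall outside 1..n are assumptions, since the slides do not say how to handle those cases.

```python
import math

def percentile(sorted_values, p):
    """Return the p-th percentile using the (n + 1)P/100 position rule."""
    n = len(sorted_values)
    position = (n + 1) * p / 100           # 1-based position, may be fractional
    # Assumption: clamp positions that fall before the first or after the last value.
    position = min(max(position, 1), n)
    lower = math.floor(position)           # 1-based rank of the observation just below
    fraction = position - lower            # how far to move toward the next observation
    if lower == n:
        return float(sorted_values[-1])
    below = sorted_values[lower - 1]
    above = sorted_values[lower]
    return below + fraction * (above - below)

sorted_sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
                16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

for p in (50, 80, 90):
    print(p, round(percentile(sorted_sales, p), 2))
```

For the sorted sales data this reproduces the values found by hand: 16, 19.8, and 21.9.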

Quartiles: Special Percentiles

Quartiles are the percentage points that break down the ordered data set into quarters. The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data. The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median. The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data.

Quartiles and Interquartile Range

The first quartile, Q1, (25th percentile) is often called the lower quartile. The second quartile, Q2, (50th percentile) is often called the median or the middle quartile. The third quartile, Q3, (75th percentile) is often called the upper quartile. The interquartile range is the difference between the third and the first quartiles, Q3 - Q1.

Finding Quartiles

Positions follow (n + 1)P/100, applied to the sorted sales data (n = 20).

Quartile         Position                  Value
First Quartile   (20 + 1)25/100 = 5.25     13 + (0.25)(1) = 13.25
Median           (20 + 1)50/100 = 10.5     16 + (0.5)(0) = 16
Third Quartile   (20 + 1)75/100 = 15.75    18 + (0.75)(1) = 18.75
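The three quartiles can be checked with Python's standard library. This sketch relies on `statistics.quantiles` (Python 3.8 or later) with `method="exclusive"`, which places the i-th quartile at position i(n + 1)/4 and so agrees with the (n + 1)P/100 rule for P = 25, 50, 75. The variable names are illustrative only.

```python
from statistics import quantiles

sorted_sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
                16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(sorted_sales, n=4, method="exclusive")

# Interquartile range: distance between the upper and lower quartiles.
iqr = q3 - q1

print(q1, q2, q3)   # 13.25 16.0 18.75
print(iqr)          # 5.5
```

Other interpolation conventions exist (for example `method="inclusive"`), and they can give slightly different quartiles for the same data.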

Summary Measures: Population Parameters and Sample Statistics

Measures of Central Tendency: Median, Mode, Mean

Measures of Variability: Range, Interquartile Range, Variance, Standard Deviation

Other summary measures: Skewness, Kurtosis

Characteristics of the Mean


The arithmetic mean is the most widely used measure of location. It is calculated by summing the values and dividing by the number of values. The major characteristics of the mean are: It requires at least interval-scale data. All values are used. It is unique. The sum of the deviations from the mean is 0.

Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$$\mu = \frac{\sum X}{N}$$

where $\mu$ is the population mean, $N$ is the total number of observations, $X$ is a particular value, and $\Sigma$ indicates the operation of adding.

A parameter is a measurable characteristic of a population. A businessman owns four cars. The following is the current mileage on each of the four cars: 56,000, 23,000, 42,000, 73,000.
Find the mean mileage for the cars.

$$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \dots + 73{,}000}{4} = 48{,}500$$

Sample Mean
For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$$\bar{X} = \frac{\sum X}{n}$$

where $n$ is the total number of values in the sample.

A statistic is a measurable characteristic of a sample. A sample of five executives received the following bonuses last year: 14.0, 15.0, 17.0, 16.0, 15.0.

$$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \dots + 15.0}{5} = \frac{77}{5} = 15.4$$
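Both mean calculations follow the same arithmetic; only the interpretation (parameter versus statistic) differs. A short Python check of the two examples, with variable names chosen here purely for illustration:

```python
from statistics import mean

# Population: all four cars owned by the businessman.
mileages = [56_000, 23_000, 42_000, 73_000]
population_mean = sum(mileages) / len(mileages)   # sum of X divided by N

# Sample: five executives drawn from a larger group.
bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]
sample_mean = sum(bonuses) / len(bonuses)         # sum of X divided by n

print(population_mean)   # 48500.0
print(sample_mean)       # 15.4

# The standard-library helper gives the same results.
print(mean(mileages), mean(bonuses))
```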

Properties of the Arithmetic Mean


Every set of interval-level and ratio-level data has a mean. All the values are included in computing the mean. A set of data has a unique mean. The mean is affected by unusually large or small data values. The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.

Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:

$$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$$
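The zero-sum property is easy to verify numerically. The sketch below uses a hypothetical helper, `sum_of_deviations`, and also shows that deviations from a different center (here the median of 3, 8, 4, which is 4) do not cancel. The tolerance check on the bonus data is there because floating-point rounding can leave a tiny residue with non-integer values.

```python
import math

def sum_of_deviations(values, center):
    """Add up (x - center) over every value."""
    return sum(x - center for x in values)

values = [3, 8, 4]
mean_value = sum(values) / len(values)            # 5.0

# Deviations from the mean: -2, 3, -1.
print(sum_of_deviations(values, mean_value))      # 0.0

# Deviations from the median (4): -1, 4, 0.
print(sum_of_deviations(values, 4))               # 3

bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]
bonus_mean = sum(bonuses) / len(bonuses)
print(math.isclose(sum_of_deviations(bonuses, bonus_mean), 0.0, abs_tol=1e-9))
```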

The weighted mean of a set of numbers $X_1, X_2, \dots, X_n$, with corresponding weights $w_1, w_2, \dots, w_n$, is computed from the following formula:

$$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \dots + w_n X_n}{w_1 + w_2 + \dots + w_n}$$

A dealer sold the following drinks at the prices mentioned: five for Rs 0.50, fifteen for Rs 0.75, fifteen for Rs 0.90, and fifteen for Rs 1.15. Compute the weighted mean of the price of the drinks.

$$\bar{X}_w = \frac{5(0.50) + 15(0.75) + 15(0.90) + 15(1.15)}{5 + 15 + 15 + 15} = \frac{44.50}{50} = \text{Rs } 0.89$$
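The weighted mean can be sketched as a small Python function. The name `weighted_mean` is an illustrative choice, and the last price is taken as 1.15, the figure that appears in the worked formula and that yields the stated total of 44.50.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

prices = [0.50, 0.75, 0.90, 1.15]   # price per item, in Rs
counts = [5, 15, 15, 15]            # number sold at each price (the weights)

print(round(weighted_mean(prices, counts), 2))   # 0.89
```

An unweighted average of the four prices would be 0.825, which understates the typical sale because only five items sold at the lowest price.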

The Median
The Median is the midpoint of the values after they have been ordered from the smallest to the largest.
There are as many values above the median as below it in the data array. For an even set of values, the median will be the arithmetic average of the two middle numbers.

The ages for a sample of five college students are: 21, 25, 19, 20, 22. Arranging the data in ascending order gives: 19, 20, 21, 22, 25. Thus the median is 21.

The heights of four basketball players, in inches, are: 76, 73, 80, 75. Arranging the data in ascending order gives: 73, 75, 76, 80. Thus the median is (75 + 76)/2 = 75.5.
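The odd and even cases can be handled by one short routine. The function below is a sketch written for illustration (the name `median_of` is not from the slides); the standard library's `statistics.median` is shown alongside as a cross-check.

```python
from statistics import median

def median_of(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]

print(median_of(ages), median(ages))         # 21 21
print(median_of(heights), median(heights))   # 75.5 75.5
```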

Properties of the Median


There is a unique median for each data set. It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur. It can be computed for ratio-level, interval-level, and ordinal-level data. It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

The Mode
The mode is the value of the observation that appears most frequently. The exam scores for ten students are: 81, 93, 84, 75, 68, 87, 81, 75, 81, 87. Because the score of 81 occurs the most often, it is the mode.
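Counting occurrences makes the mode explicit. This sketch uses `collections.Counter` to tally the exam scores and `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest count, so it also covers data sets with more than one mode.

```python
from collections import Counter
from statistics import multimode

scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]

# Tally how often each score occurs and take the most frequent one.
counts = Counter(scores)
value, frequency = counts.most_common(1)[0]
print(value, frequency)    # 81 3

# multimode returns a list, which has several entries when there is a tie.
print(multimode(scores))   # [81]
```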

The Mean of Grouped Data


The mean of a sample of data organized in a frequency distribution is computed by the following formula:

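$$\bar{X} = \frac{\sum X f}{n}$$

where $X$ is the class midpoint, $f$ is the class frequency, and $n$ is the total number of observations.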

A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.

Movies showing   Frequency f   Class midpoint X   (f)(X)
1 up to 3        1             2                  2
3 up to 5        2             4                  8
5 up to 7        3             6                  18
7 up to 9        1             8                  8
9 up to 11       3             10                 30
Total            10                               66

$$\bar{X} = \frac{\sum X f}{n} = \frac{66}{10} = 6.6$$
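The grouped-mean calculation can be sketched in Python by representing each class as a (lower limit, upper limit, frequency) tuple. That representation and the function name `grouped_mean` are assumptions made for this example; the midpoint of each class stands in for the individual observations, exactly as in the table.

```python
def grouped_mean(classes):
    """classes: list of (lower_limit, upper_limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequency distribution is empty")
    # Multiply each class midpoint X by its frequency f and add the products.
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n

movies = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

print(grouped_mean(movies))   # 6.6
```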

The Median of Grouped Data


The median of a sample of data organized in a frequency distribution is computed by:
$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i)$$

where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.

Finding the Median Class


To determine the median class for grouped data: Construct a cumulative frequency distribution. Divide the total number of data values by 2. Determine which class will contain this value. For example, if n=50, 50/2 = 25, then determine which class will contain the 25th value.

Movies showing   Frequency   Cumulative Frequency
1 up to 3        1           1
3 up to 5        2           3
5 up to 7        3           6
7 up to 9        1           7
9 up to 11       3           10

From the table, L=5, n=10, f=3, i=2, CF=3


$$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}\,(i) = 5 + \frac{\frac{10}{2} - 3}{3}\,(2) = 6.33$$
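Locating the median class and applying the formula can be combined in one pass over the cumulative frequencies. The following is a sketch under stated assumptions: classes are given in ascending order as (lower limit, upper limit, frequency) tuples, and the function name `grouped_median` is illustrative.

```python
def grouped_median(classes):
    """classes: ascending list of (lower_limit, upper_limit, frequency) tuples."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the frequency distribution is empty")
    half = n / 2
    cumulative = 0                      # CF: frequency accumulated before the current class
    for lower, upper, f in classes:
        if cumulative + f >= half:      # first class that contains the n/2-th value
            width = upper - lower       # i: the class interval
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("median class not found")

movies = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

print(round(grouped_median(movies), 2))   # 6.33
```

For the movie data the loop stops at the class 5 up to 7 with CF = 3, matching L = 5, f = 3, and i = 2 from the table.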

The Mode of Grouped Data


The mode for grouped data is approximated by the midpoint of the class with the largest class frequency.

When two values occur a large number of times, the distribution is called bimodal, as in Example 10.
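A small Python sketch can approximate the mode of grouped data and flag ties. It is applied here to the movie-theater frequency table used for the grouped mean and median; the tuple layout and the name `grouped_modes` are assumptions for illustration. Returning every class that ties for the largest frequency is one way to surface a possibly bimodal distribution.

```python
def grouped_modes(classes):
    """Return the midpoints of all classes that share the largest frequency."""
    largest = max(f for _, _, f in classes)
    return [(lower + upper) / 2 for lower, upper, f in classes if f == largest]

movies = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

modes = grouped_modes(movies)
print(modes)   # [6.0, 10.0]
```

Two classes in this table share the largest frequency of 3, so the sketch reports two midpoints, 6 and 10.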

Symmetric Distribution
Zero skewness: Mode = Median = Mean

Right Skewed Distribution


Positively skewed: Mean and Median are to the right of the Mode.

Mode<Median<Mean

Left Skewed Distribution


Negatively Skewed: Mean and Median are to the left of the Mode.

Mean<Median<Mode

[Figure: three distribution shapes labeled Skewed to left, Symmetric, and Skewed to right]
