
Measures of Variation

In addition to the central tendency of the data in a population or sample, a second important characteristic of the data is its variability about some center.

Measures of variation include the range, the variance, the standard deviation, and the mean absolute deviation.
The standard deviation is just the square root of the variance.

Standard Deviation of a Population

We label the population variance $\sigma^2$ and define

$\sigma^2 = \sum_i (x_i - \mu)^2 / N$

where $\mu$ is the population mean, N is the size of the population, and $\sum_i (x_i - \mu)^2$ is the sum of the squares of the differences between each item in the population and the mean.

Suppose a student receives the following quiz grades:

{82, 68, 74, 86, 90, 88, 62, 75, 80, 55}
For this student, these grades are the total population of her scores, which are used to calculate her mean (average) grade. We obtain:

$\mu = (82 + 68 + 74 + 86 + 90 + 88 + 62 + 75 + 80 + 55)/10 = 760/10 = 76$

The mean of this population is 76.

Having obtained the mean, we can now calculate the variance of {82, 68, 74, 86, 90, 88, 62, 75, 80, 55} with $\mu = 76$:

$\sigma^2 = \sum_i (x_i - \mu)^2 / N$

$= \{(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2\}/10$

$= (36 + 64 + 4 + 100 + 196 + 144 + 196 + 1 + 16 + 441)/10$

$= 1198/10 = 119.8$
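The arithmetic above can be checked with a few lines of Python. This is only a sketch of the definition, dividing by N because the grades are treated as a full population; the variable names are illustrative and not part of the original notes.

```python
# Quiz grades from the example, treated as a complete population.
grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

N = len(grades)
mu = sum(grades) / N                             # population mean
squared_devs = [(x - mu) ** 2 for x in grades]   # (x_i - mu)^2 for each grade
variance = sum(squared_devs) / N                 # divide by N for a population

print(mu)                 # 76.0
print(sum(squared_devs))  # 1198.0
print(variance)           # 119.8
```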

We find the standard deviation of this population by taking the square root of the variance:

$\sigma^2 = \sum_i (x_i - \mu)^2 / N = 119.8$

$\sigma = \sqrt{119.8} \approx 10.94$

If we display the data {82, 68, 74, 86, 90, 88, 62, 75, 80, 55} on a dot plot, we can visualize the use of the standard deviation as a measure of variation in the data.

[Dot plot: the ten grades marked with x on an axis running from 55 to 100, with the mean $\mu = 76$ indicated.]

Chebyshev's Theorem
The proportion of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, for any K greater than 1.

Chebyshev's inequality tells us that in any statistical distribution at least 3/4 of the values will lie within 2 standard deviations of the mean, and at least 8/9 of all values will lie within 3 standard deviations of the mean.

In the previous example we found $\mu = 76$ and $\sigma = 10.94$, so

$\mu - 2\sigma = 76 - 2(10.94) = 54.12$
$\mu + 2\sigma = 76 + 2(10.94) = 97.88$

All ten grades lie between 54.12 and 97.88, so 100% of the values lie within $2\sigma$ of the mean.
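As a quick check of Chebyshev's bound on the grade data, the sketch below counts how many values fall within K standard deviations of the mean and compares that share with $1 - 1/K^2$. The helper name `fraction_within` is hypothetical; `statistics.pstdev` is the standard-library population standard deviation.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
mu = statistics.mean(grades)
sigma = statistics.pstdev(grades)   # population standard deviation

def fraction_within(data, center, spread, k):
    """Fraction of values lying within k * spread of center."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo <= x <= hi for x in data) / len(data)

k = 2
observed = fraction_within(grades, mu, sigma, k)
bound = 1 - 1 / k ** 2              # Chebyshev's guaranteed minimum

print(observed, bound)  # 1.0 0.75
```

The observed share can never be smaller than the bound, but as here it is often much larger, because the theorem makes no assumption about the shape of the distribution.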

The Sample Standard Deviation
The standard deviation of a sample is denoted by the letter s. The sample standard deviation is an estimate of the population standard deviation.

$s^2 = \sum_i (x_i - \bar{x})^2 / (n - 1)$

Here $\bar{x}$ denotes the sample mean. The sample standard deviation is obtained by taking the square root of the sample variance.

Note! To calculate the sample variance we divide by the number of degrees of freedom, n - 1, instead of the sample size n. We have already used the sample data to calculate the sample mean, and we are now using the same data to obtain a second statistic. Only n - 1 of the values are considered free: once the mean is known, the nth value is fixed, since the sum must equal n times the mean.
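The degrees-of-freedom argument can be made concrete with the ten grades, treated here as a sample. In this sketch (variable names are illustrative), the deviations from the sample mean sum to zero, so the last deviation is determined by the other n - 1.

```python
sample = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
n = len(sample)
x_bar = sum(sample) / n

deviations = [x - x_bar for x in sample]

# Deviations from the sample mean always sum to zero ...
total = sum(deviations)
# ... so the last deviation is fixed once the first n - 1 are known.
implied_last = -sum(deviations[:-1])

# Sample variance: divide by the degrees of freedom, n - 1.
s_squared = sum(d ** 2 for d in deviations) / (n - 1)

print(total)                       # 0.0
print(implied_last, deviations[-1])  # -21.0 -21.0
print(round(s_squared, 2))         # 133.11
```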

The formula for the standard deviation can be transformed into a form that slightly simplifies the computation:

$s = \sqrt{\left(n \sum_i x_i^2 - \left(\sum_i x_i\right)^2\right) / (n(n - 1))}$

At first sight it is not clear that we have simplified the calculation. To illustrate how the two formulas are used, assume that the previous 10 grades were a sample taken from a larger number of students enrolled in a course.

Using the original formula and treating the previous data as sample data with a mean of 76, we get:

$s = \sqrt{\sum_i (x_i - \bar{x})^2 / (n - 1)}$

For {82, 68, 74, 86, 90, 88, 62, 75, 80, 55}:

$s = \sqrt{\{(82-76)^2 + (68-76)^2 + (74-76)^2 + (86-76)^2 + (90-76)^2 + (88-76)^2 + (62-76)^2 + (75-76)^2 + (80-76)^2 + (55-76)^2\}/(n-1)} = \sqrt{1198/9} = \sqrt{133.11} = 11.54$

To use the modified formula, we first construct the following table for {82, 68, 74, 86, 90, 88, 62, 75, 80, 55}, with n = 10:

x        x^2
82       6724
68       4624
74       5476
86       7396
90       8100
88       7744
62       3844
75       5625
80       6400
55       3025
Total:
760      58958

$s^2 = ((10)(58958) - 760^2)/((10)(9))$

$= (589580 - 577600)/((10)(9))$

$= 11980/90 = 133.11$

$s = \sqrt{133.11} = 11.54$

In this second method we need only the total of the sample items and the total of the squares of these items.
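The two routes to s can be compared side by side. The sketch below computes the sample variance once from the two running totals (the shortcut form) and once from the deviations (the definitional form); the names `s2_shortcut` and `s2_definition` are illustrative.

```python
import math

sample = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]
n = len(sample)

sum_x = sum(sample)                   # total of the items
sum_x2 = sum(x * x for x in sample)   # total of the squared items

# Shortcut form: needs only the two totals, not the mean or the deviations.
s2_shortcut = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))

# Definitional form, for comparison.
x_bar = sum_x / n
s2_definition = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

print(sum_x, sum_x2)                      # 760 58958
print(round(math.sqrt(s2_shortcut), 2))   # 11.54
```

With integer data the shortcut form stays in exact integer arithmetic until the final division, which is one reason it is convenient for hand or table-based calculation.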

Finding the standard deviation for tabulated or weighted data
Recall the table we constructed for finding the mean of a sample of September temperature readings in the Central Tendency lecture notes.

Class          Midpoint (x)   Total (f)   f*x     x^2     f*x^2
64.5 - 69.5    67             6           402     4489    26934
69.5 - 74.5    72             11          792     5184    57024
74.5 - 79.5    77             20          1540    5929    118580
79.5 - 84.5    82             13          1066    6724    87412
84.5 - 89.5    87             9           783     7569    68121
89.5 - 94.5    92             1           92      8464    8464
Total                         60          4675            366535

We have augmented the previous table by adding two additional columns that will be used for calculating the sample standard deviation of these grouped data.

The formula for obtaining the standard deviation of weighted or tabulated data is:

$s = \sqrt{\left(n \sum_i f_i x_i^2 - \left(\sum_i f_i x_i\right)^2\right) / (n(n - 1))}$

From the previous table we have

$n \sum_i f_i x_i^2 = (60)(366535) = 21992100$
$\left(\sum_i f_i x_i\right)^2 = (4675)^2 = 21855625$
$s = \sqrt{(21992100 - 21855625)/((60)(59))} = \sqrt{38.55} = 6.21$
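The grouped-data calculation follows the same pattern as the shortcut formula, with each class midpoint weighted by its frequency. The sketch below reproduces the table totals and the resulting s; the list name `classes` and its (midpoint, frequency) layout are illustrative choices.

```python
import math

# (midpoint x, frequency f) for each temperature class in the table.
classes = [(67, 6), (72, 11), (77, 20), (82, 13), (87, 9), (92, 1)]

n = sum(f for _, f in classes)                  # total number of readings
sum_fx = sum(f * x for x, f in classes)         # sum of f * x
sum_fx2 = sum(f * x * x for x, f in classes)    # sum of f * x^2

s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))

print(n, sum_fx, sum_fx2)  # 60 4675 366535
print(round(s, 2))         # 6.21
```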

We construct an ogive from the previous table.

[Ogive: cumulative frequency (0 to 60) plotted against temperature (64.5 to 94.5). The figure is annotated with Mean = 79.183, s = 6.21 and 2s = 12.42, with intervals of s and 2s marked on either side of the mean.]

The Normal Distribution
Continuous
Symmetric
Mean = Median = Mode (all the same value)

[Figure: bell-shaped normal curve centered on the mean.]

Within $\mu \pm \sigma$: about 68% of values
Within $\mu \pm 2\sigma$: about 95% of values
Within $\mu \pm 3\sigma$: about 99.7% of values
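The percentages for one, two and three standard deviations can be recovered from the normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the dictionary name `shares` is illustrative.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of values lying within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(k, round(100 * share, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```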

Other measures of variation
Using the range to estimate the standard deviation

$s \approx \text{range}/4$

On an earlier slide we found for a population of student grades {82, 68, 74, 86, 90, 88, 62, 75, 80, 55}: $\mu = 76$ and $\sigma = 10.94$.
The range of this population = 90 - 55 = 35. This gives us an estimate of $\sigma \approx 35/4 = 8.75$.

In the tabulated data for the temperature readings we have range = 92 - 65 = 27, so $s \approx 27/4 = 6.75$, which agrees fairly well with the calculated value of s = 6.21.
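The range rule of thumb is a one-line calculation. In this sketch the function name `range_estimate` is hypothetical, and the temperature estimate uses only the lowest and highest readings, 65 and 92.

```python
grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

def range_estimate(data):
    """Rough standard deviation estimate: (max - min) / 4."""
    return (max(data) - min(data)) / 4

print(range_estimate(grades))    # 8.75
print(range_estimate([65, 92]))  # 6.75
```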

The Coefficient of Variation (CV)

Define: For either a population or a sample, the coefficient of variation is the ratio of the standard deviation to the mean:

$CV = \sigma/\mu$ for a population
$CV = s/\bar{x}$ for a sample

where $\bar{x}$ denotes the sample mean.

The CV for the population of grades from the previous page:

$CV = 10.94/76 = 0.144$
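Both versions of the coefficient of variation can be sketched with the standard library, using `statistics.pstdev` for the population form and `statistics.stdev` for the sample form. The function names `cv_population` and `cv_sample` are illustrative, and the sample figure is included only for comparison; it is not computed in the original notes.

```python
import statistics

grades = [82, 68, 74, 86, 90, 88, 62, 75, 80, 55]

def cv_population(data):
    """sigma / mu, treating data as a full population."""
    return statistics.pstdev(data) / statistics.mean(data)

def cv_sample(data):
    """s / x-bar, treating data as a sample."""
    return statistics.stdev(data) / statistics.mean(data)

print(round(cv_population(grades), 3))  # 0.144
print(round(cv_sample(grades), 3))      # 0.152
```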

Part 2

Measures of Relative Standing

Relative Standing
A z score is the number of standard deviations that a raw score, x, lies above or below the mean. A raw score x taken from a population is converted to a standardized z score by the formula

$z = (x - \mu)/\sigma$

In a sample, the z score of a value x is given by

$z = (x - \bar{x})/s$

where $\bar{x}$ denotes the sample mean.
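As an illustration, the z score formula can be applied to two of the quiz grades, using the population values $\mu = 76$ and $\sigma = 10.94$ found earlier. The choice of the grades 90 and 55 is only an example and does not appear in the original notes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mu, sigma = 76, 10.94   # population mean and standard deviation of the grades

print(round(z_score(90, mu, sigma), 2))  # 1.28
print(round(z_score(55, mu, sigma), 2))  # -1.92
```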

Percentiles
percentile of value x = ((number of values < x) / (total number of values)) * 100
(round the result to the nearest whole number)

Suppose that in a class of 25 people we have the following averages (in ascending order):
42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98

If you received a 77, what is your percentile? Twelve of the 25 values are below 77, so

percentile of 77 = (12/25)*100 = 48
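The percentile formula translates directly into code. In this sketch the function name `percentile_of` is hypothetical, and halves are rounded up explicitly, which is one reading of "nearest whole number" (Python's built-in `round` would round halves to the nearest even number instead).

```python
import math

averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def percentile_of(value, data):
    """Percent of values strictly below value, rounded to a whole number."""
    below = sum(1 for x in data if x < value)
    pct = below / len(data) * 100
    return math.floor(pct + 0.5)   # round halves up (an assumed convention)

print(percentile_of(77, averages))  # 48
```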

Quartiles

Instead of finding the percentile of a single data value, as we did on the previous page, it is often useful to group the data into four or more (nearly) equal groups. When grouping the data into four equal groupings, we call these groupings quartiles.

Let
n = number of items in the data set
k = percent desired (e.g. k = 25)
L = locator: the position of the value separating the first k percent of the data from the rest

$L = (k/100) \cdot n$

Let's separate the 25 class grades into four quartiles.

Step 1: Order the data in ascending order.

42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77, 78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98

Now find the three locators L25, L50 and L75, rounding any fractional part up to the next integer:

L25 = (25/100) * 25 = 6.25, rounded up to 7
L50 = (50/100) * 25 = 12.5, rounded up to 13
L75 = (75/100) * 25 = 18.75, rounded up to 19

The quartile boundaries are the values at these positions: Q1 = 70 (7th value), Q2 = 77 (13th value) and Q3 = 84 (19th value).
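The locator procedure can be sketched as a small function. The names `locator` and `value_at_percent` are hypothetical. The round-up rule for fractional locators comes from the example; the handling of whole-number locators (averaging two neighboring values) is an assumption that this example does not exercise.

```python
import math

averages = [42, 59, 63, 67, 69, 69, 70, 73, 73, 74, 74, 74, 77,
            78, 78, 79, 80, 81, 84, 85, 87, 89, 91, 94, 98]

def locator(k, n):
    """L = (k/100) * n, computed as k*n/100 to limit floating-point error."""
    return k * n / 100

def value_at_percent(k, ordered):
    """Value separating the first k percent of already-sorted data (0 < k < 100)."""
    L = locator(k, len(ordered))
    if L != int(L):
        # Fractional locator: round up, then convert to a 0-based index.
        return ordered[math.ceil(L) - 1]
    # Whole-number locator: ASSUMED convention, not shown in this example:
    # average the L-th and (L+1)-th values.
    i = int(L)
    return (ordered[i - 1] + ordered[i]) / 2

q1, q2, q3 = (value_at_percent(k, averages) for k in (25, 50, 75))
print([locator(k, 25) for k in (25, 50, 75)])  # [6.25, 12.5, 18.75]
print(q1, q2, q3)                              # 70 77 84
```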

Other measures of relative standing include

Interquartile range (IQR) = Q3 - Q1
Semi-interquartile range = (Q3 - Q1)/2
Midquartile = (Q3 + Q1)/2
10-90 percentile range = P90 - P10

For the data on the previous page we have:

IQR = 84 - 70 = 14
Semi-IQR = (84 - 70)/2 = 7
Midquartile = (84 + 70)/2 = 77

The IQR and semi-IQR are measures of variation; the midquartile is a measure of central tendency.
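A minimal check of these quartile-based measures, starting from Q1 = 70 and Q3 = 84 for the 25 class averages (the variable names are illustrative):

```python
q1, q3 = 70, 84   # 7th and 19th of the 25 ordered class averages

iqr = q3 - q1                  # spread of the middle half of the data
semi_iqr = iqr / 2
midquartile = (q3 + q1) / 2    # a measure of center, not of spread

print(iqr, semi_iqr, midquartile)  # 14 7.0 77.0
```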

Box Diagram
Recall the ordered high temperature readings from a previous lecture:

65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92

The locators L25, the median and L75 fall after the 15th, 30th and 45th values respectively.

To construct a box diagram to illustrate the extent to which the extreme data values lie beyond the interquartile range, draw a line with the low and high values highlighted at the two ends. Mark the gradations between these two extremes, then locate the quartile boundaries Q1, Med., and Q3 on this line. Construct a box about these values. For example, Q1 = (73 + 74)/2 = 73.5.

[Box diagram: a number line from 65 to 92 with gradations at 65, 69, 73, 77, 81, 85, 89 and 92; a box is drawn about Q1, the median (M) and Q3.]
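The five values needed to draw the box diagram can be computed from the 60 readings. This is a sketch under one assumption: when a locator is a whole number, the boundary is the average of the values on either side of it, which is inferred from the Q1 = (73 + 74)/2 calculation. The median and Q3 it produces are not stated in the original notes, and the helper name `boundary` is hypothetical.

```python
temps = [65, 67, 68, 68, 69, 69, 71, 71, 71, 72, 72, 72, 73, 73, 73,
         74, 74, 75, 75, 75, 75, 76, 76, 77, 77, 77, 77, 77, 77, 78,
         78, 78, 78, 79, 79, 79, 79, 80, 81, 81, 81, 81, 81, 81, 81,
         81, 82, 82, 83, 84, 85, 85, 85, 86, 86, 87, 87, 88, 89, 92]
n = len(temps)  # 60 readings, already in ascending order

def boundary(k):
    """Value at the k percent locator; handles whole-number locators only."""
    assert (k * n) % 100 == 0, "locator is not a whole number"
    L = k * n // 100
    # Average the L-th and (L+1)-th values (1-based positions).
    return (temps[L - 1] + temps[L]) / 2

q1, median, q3 = boundary(25), boundary(50), boundary(75)
below_box = q1 - temps[0]     # how far the lowest reading lies below Q1
above_box = temps[-1] - q3    # how far the highest reading lies above Q3

print(temps[0], q1, median, q3, temps[-1])  # 65 73.5 78.0 81.0 92
print(below_box, above_box)                 # 8.5 11.0
```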
