The spread of data is called dispersion.

Two data sets, measured in feet, both have mean = 4 feet:

3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5

0, 1, 2, 3, 4, 5, 6, 7, 8

Which is more spread out?

Does it matter?

Yes, if this is the depth of water that you have to cross and you do not know how to swim.

1-1
How do we find the overall spread?
For each data point, we find its deviation from the mean, i.e.
the data point minus the mean.

The deviations will have both positive (+ve) and negative (-ve) values.

For the data

0, 1, 2, 3, 4, 5, 6, 7, 8

Deviations are

-4, -3, -2, -1, 0, 1, 2, 3, 4

So if we add them up to take an average, they cancel out: here the sum is 0, so the average deviation is 0.
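A minimal Python sketch of this step, assuming the data are a plain list of numbers; the helper name `deviations` is our own, not from the slides. It computes each data point minus the mean and shows that the deviations sum to zero for both rivers.

```python
from statistics import mean

def deviations(data):
    """Return each data point minus the mean of the data."""
    m = mean(data)
    return [x - m for x in data]

river_1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]
river_2 = [3, 3.25, 3.5, 3.75, 4, 4.25, 4.5, 4.75, 5]

print(deviations(river_1))       # [-4, -3, -2, -1, 0, 1, 2, 3, 4]
print(sum(deviations(river_1)))  # 0
print(sum(deviations(river_2)))  # 0.0
```

Deviations from the mean always sum to zero, so their plain average cannot measure spread, whatever the data.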

1-2
How do we avoid the cancelling out?
Take the absolute values of the deviations and average them,

or square each deviation, find the average of the squares,

and then take the square root of that average.

Suppose the data are 1, 2, 3, 4, 5. The average is 3.

The deviations are -2, -1, 0, 1, 2, and the average deviation is 0.

Mean Absolute Deviation = (2+1+0+1+2)/5 = 1.2

$$\text{Variance} = \frac{(-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2}{5} = \frac{10}{5} = 2$$

$$\text{Std Dev} = \sqrt{\text{Variance}} = \sqrt{2} \approx 1.4$$
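The three calculations above can be sketched in Python as follows. The function names `mad`, `variance` and `std_dev` are our own; like the slides, the sketch divides by the number of data points n.

```python
import math

def mad(data):
    """Mean Absolute Deviation: average of |x - mean|."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def variance(data):
    """Variance: average of the squared deviations (divides by n)."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

def std_dev(data):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(data))

data = [1, 2, 3, 4, 5]
print(mad(data))                # 1.2
print(variance(data))           # 2.0
print(round(std_dev(data), 1))  # 1.4
```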

1-3
Compare the rivers using MAD & SD
River 1  Devn     Abs Dev  Dev Sqrd | River 2  Devn     Abs Dev  Dev Sqrd
0                                   | 3.00
1                                   | 3.25
2                                   | 3.50
3                                   | 3.75
4                                   | 4.00
5                                   | 4.25
6                                   | 4.50
7                                   | 4.75
8                                   | 5.00
MEAN     Avg Dev  MAD      Std Dev  | MEAN     Avg Dev  MAD      Std Dev
4                                   | 4

1-4
Compare the rivers using MAD & SD
River 1  Devn     Abs Dev  Dev Sqrd | River 2  Devn     Abs Dev  Dev Sqrd
0        -4                         | 3.00     -1.00
1        -3                         | 3.25     -0.75
2        -2                         | 3.50     -0.50
3        -1                         | 3.75     -0.25
4        0                          | 4.00     0.00
5        1                          | 4.25     0.25
6        2                          | 4.50     0.50
7        3                          | 4.75     0.75
8        4                          | 5.00     1.00
MEAN     Avg Dev  MAD      Std Dev  | MEAN     Avg Dev  MAD      Std Dev
4        0                          | 4        0

1-5
Compare the rivers using MAD & SD
River 1  Devn     Abs Dev  Dev Sqrd | River 2  Devn     Abs Dev  Dev Sqrd
0        -4       4                 | 3.00     -1.00    1.00
1        -3       3                 | 3.25     -0.75    0.75
2        -2       2                 | 3.50     -0.50    0.50
3        -1       1                 | 3.75     -0.25    0.25
4        0        0                 | 4.00     0.00     0.00
5        1        1                 | 4.25     0.25     0.25
6        2        2                 | 4.50     0.50     0.50
7        3        3                 | 4.75     0.75     0.75
8        4        4                 | 5.00     1.00     1.00
MEAN     Avg Dev  MAD      Std Dev  | MEAN     Avg Dev  MAD      Std Dev
4        0        2.2               | 4        0        0.56

1-6
MAD & SD tell us River 1 is unsafe
River 1  Devn     Abs Dev  Dev Sqrd | River 2  Devn     Abs Dev  Dev Sqrd
0        -4       4        16       | 3.00     -1.00    1.00     1.00
1        -3       3        9        | 3.25     -0.75    0.75     0.56
2        -2       2        4        | 3.50     -0.50    0.50     0.25
3        -1       1        1        | 3.75     -0.25    0.25     0.06
4        0        0        0        | 4.00     0.00     0.00     0.00
5        1        1        1        | 4.25     0.25     0.25     0.06
6        2        2        4        | 4.50     0.50     0.50     0.25
7        3        3        9        | 4.75     0.75     0.75     0.56
8        4        4        16       | 5.00     1.00     1.00     1.00
MEAN     Avg Dev  MAD      Std Dev  | MEAN     Avg Dev  MAD      Std Dev
4        0        2.2      2.6      | 4        0        0.56     0.65
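As a cross-check on the table, here is a small Python sketch (helper names `mad` and `population_sd` are ours) that recomputes MAD and SD for both rivers. It divides by the count n, as the table does; printed values are to two decimals, so River 1 shows 2.22 and 2.58 where the table rounds to 2.2 and 2.6.

```python
import math

def mad(data):
    """Mean Absolute Deviation: average of |x - mean|."""
    m = sum(data) / len(data)
    return sum(abs(x - m) for x in data) / len(data)

def population_sd(data):
    """Standard deviation dividing the sum of squared deviations by n."""
    m = sum(data) / len(data)
    return math.sqrt(sum((x - m) ** 2 for x in data) / len(data))

# Depths in feet, as listed in the table.
rivers = {
    "River 1": [0, 1, 2, 3, 4, 5, 6, 7, 8],
    "River 2": [3.00, 3.25, 3.50, 3.75, 4.00, 4.25, 4.50, 4.75, 5.00],
}

for name, depths in rivers.items():
    print(f"{name}: MAD = {mad(depths):.2f}, SD = {population_sd(depths):.2f}")
# River 1: MAD = 2.22, SD = 2.58
# River 2: MAD = 0.56, SD = 0.65
```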

1-7
Another example: here MAD is the same for both
data sets, but Std Dev differs between the two sets.
Data 1   Devn     Abs Dev  Dev Sqrd | Data 2   Devn     Abs Dev  Dev Sqrd
4        -6       6        36       | 5        -5       5        25
6        -4       4        16       | 5        -5       5        25
10       0        0        0        | 10       0        0        0
14       4        4        16       | 15       5        5        25
16       6        6        36       | 15       5        5        25
MEAN     Avg Dev  MAD      Std Dev  | MEAN     Avg Dev  MAD      Std Dev
10       0        4        4.6      | 10       0        4        4.5
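A short Python sketch of this comparison, using the standard `statistics` module. `statistics.pstdev` divides by n, matching the table; the `mad` helper is our own, since the module has no built-in mean-absolute-deviation function.

```python
import statistics

def mad(data):
    """Mean Absolute Deviation about the mean (hand-written helper)."""
    m = statistics.mean(data)
    return statistics.mean([abs(x - m) for x in data])

data_1 = [4, 6, 10, 14, 16]
data_2 = [5, 5, 10, 15, 15]

for name, data in [("data 1", data_1), ("data 2", data_2)]:
    # pstdev = population standard deviation (divides by n, like the table).
    print(f"{name}: MAD = {mad(data):.1f}, SD = {statistics.pstdev(data):.1f}")
# data 1: MAD = 4.0, SD = 4.6
# data 2: MAD = 4.0, SD = 4.5
```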

1-8
Let us look at the data closely, without the
MAD and SD workings.
Data 1   Data 2
4        5
6        5
10       10
14       15
16       15
MEAN     MEAN
10       10

Which data set has more spread? Clearly data set 1: its extreme values (4 and 16) lie farther from the mean than those of data set 2 (5 and 15).

1-9
Now look at the same data with the MAD
and SD workings.
Data 1   Devn     Abs Dev  Dev Sqrd | Data 2   Devn     Abs Dev  Dev Sqrd
4        -6       6        36       | 5        -5       5        25
6        -4       4        16       | 5        -5       5        25
10       0        0        0        | 10       0        0        0
14       4        4        16       | 15       5        5        25
16       6        6        36       | 15       5        5        25
MEAN     Avg Dev  MAD      Std Dev  | MEAN     Avg Dev  MAD      Std Dev
10       0        4        4.6      | 10       0        4        4.5

Hence, which is the better indicator of spread: MAD or SD?

1-10
Why do we prefer SD to MAD?
The standard deviation is never smaller than the Mean Absolute Deviation.

When the spread is small, the two values are close to each other.

But when some points lie far from the mean, the standard deviation becomes

noticeably larger than the Mean Absolute Deviation,

because squaring gives more weight to those far-away points.

Hence SD is the more sensitive indicator of spread.
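To make "more weight to points far away" concrete, here is a small Python sketch. The helper `contribution_shares` is a hypothetical name of ours. For each point in data set 1, it reports that point's share of the total absolute deviation (what MAD uses) and of the total squared deviation (what SD uses). It assumes the points are not all equal, otherwise the totals are zero.

```python
def contribution_shares(data):
    """Return (point, share of total |dev|, share of total dev^2) for each point.

    Assumes not all points are equal (otherwise both totals are zero).
    """
    m = sum(data) / len(data)
    abs_devs = [abs(x - m) for x in data]
    sq_devs = [(x - m) ** 2 for x in data]
    total_abs, total_sq = sum(abs_devs), sum(sq_devs)
    return [(x, a / total_abs, s / total_sq)
            for x, a, s in zip(data, abs_devs, sq_devs)]

data_1 = [4, 6, 10, 14, 16]
for x, abs_share, sq_share in contribution_shares(data_1):
    print(f"{x:>3}: {abs_share:6.1%} of |deviation| total, {sq_share:6.1%} of squared total")
```

The two extreme points, 4 and 16, make up 60% of the absolute-deviation total (12 of 20) but about 69% of the squared total (72 of 104), so they pull SD up more than MAD.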

1-11
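Working with MS-Excel
σ (pronounced sigma) is the symbol for the population standard deviation,

and when calculating the population variance, we divide the sum of squared deviations

by the population count N (the ideal case).

s (always a lowercase letter) is the symbol for the sample standard deviation,

and when calculating the sample variance, we divide the sum of squared deviations by

(n - 1), where n is the sample size.

In MS-Excel, =STDEV() (=STDEV.S() in Excel 2010 and later) returns the sample standard deviation s, and =STDEVP() (=STDEV.P()) returns the population standard deviation σ. The river tables above divide by the count, so they match =STDEV.P(); for example, =STDEV.P(0,1,2,3,4,5,6,7,8) returns about 2.58. =AVEDEV() returns the Mean Absolute Deviation.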
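For readers working outside Excel, Python's standard `statistics` module offers the same two choices. `pstdev` divides by N, like STDEV.P, and `stdev` divides by n - 1, like STDEV.S. The sketch below applies both to the River 1 depths.

```python
import statistics

river_1 = [0, 1, 2, 3, 4, 5, 6, 7, 8]  # depths in feet

# Population SD (sigma): sum of squared deviations / N, then square root.
sigma = statistics.pstdev(river_1)

# Sample SD (s): sum of squared deviations / (n - 1), then square root.
s = statistics.stdev(river_1)

print(f"sigma = {sigma:.2f}, s = {s:.2f}")  # sigma = 2.58, s = 2.74
```

Dividing by n - 1 gives a slightly larger value. The gap shrinks as the sample size grows.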

1-12
