
DESCRIPTIVE STATISTICS

LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON
STRUCTURE OF THE CHAPTER
Frequencies, percentages and crosstabulations
Measures of central tendency and dispersal
Taking stock
Correlations and measures of association
Partial correlations
Reliability
FREQUENCIES AND PERCENTAGES
Graphical forms of data presentation:
Frequency and percentage tables;
Bar charts (for nominal and ordinal data);
Histograms (for continuous interval and ratio data);
Line graphs;
Pie charts;
High and low charts;
Scatterplots;
Stem and leaf displays;
Boxplots (box and whisker plots).
FREQUENCIES AND PERCENTAGES
Bar charts for presenting categorical and discrete data, highest and lowest;
Avoid using a third dimension (e.g. depth) in a graph when it is unnecessary; a third dimension to a graph must provide additional information;
Histograms for presenting continuous data;
Line graphs for showing trends, particularly in continuous data, for one or more variables at a time;
Multiple line graphs for showing trends in continuous data on several variables in the same graph;
FREQUENCIES AND PERCENTAGES
Pie charts and bar charts for showing proportions;
Interdependence can be shown through crosstabulations;
Boxplots for showing the distribution of values for several variables in a single chart, together with their range and medians;
Stacked bar charts for showing the frequencies of different groups within a specific variable for two or more variables in the same chart;
Scatterplots for showing the relationship between two variables or several sets of two or more variables on the same chart.
CROSSTABULATIONS
A crosstabulation is a presentational device.
Rows for nominal data, columns for ordinal data.
Independent variables as row data, dependent variables as column data.
BIVARIATE CROSSTABULATION
sex * The course was too hard: crosstabulation

                       The course was too hard
                       not at   very     a        quite    a very
                       all      little   little   a lot    great deal   Total
male     Count         7        11       25       4        3            50
         % of Total    3.7%     5.8%     13.1%    2.1%     1.6%         26.2%
female   Count         17       38       73       12       1            141
         % of Total    8.9%     19.9%    38.2%    6.3%     .5%          73.8%
Total    Count         24       49       98       16       4            191
         % of Total    12.6%    25.7%    51.3%    8.4%     2.1%         100.0%
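The "% of Total" figures divide each cell count by the grand total of 191 respondents and multiply by 100. As a minimal sketch, the percentages can be reproduced from the counts in plain Python; the dictionary layout and the `percent_of_total` helper are illustrative assumptions, not part of the original.

```python
# Counts from the sex * "The course was too hard" crosstabulation.
# Column order: not at all, very little, a little, quite a lot, a very great deal.
counts = {
    "male": [7, 11, 25, 4, 3],
    "female": [17, 38, 73, 12, 1],
}

def percent_of_total(table):
    """Hypothetical helper: express each cell and each row total as a % of the grand total."""
    grand_total = sum(sum(row) for row in table.values())
    cells = {
        group: [round(100 * n / grand_total, 1) for n in row]
        for group, row in table.items()
    }
    row_totals = {
        group: round(100 * sum(row) / grand_total, 1) for group, row in table.items()
    }
    return grand_total, cells, row_totals

grand_total, cells, row_totals = percent_of_total(counts)
print(grand_total)     # 191
print(cells["male"])   # [3.7, 5.8, 13.1, 2.1, 1.6]
print(row_totals)      # {'male': 26.2, 'female': 73.8}
```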
TRIVARIATE CROSSTABULATION

Acceptability of formal, written public examinations

                    Traditionalist                  Progressivist/child-centred
Formal, written     Socially        Socially        Socially        Socially
public exams        advantaged      disadvantaged   advantaged      disadvantaged
In favour           65%             70%             35%             20%
Against             35%             30%             65%             80%
Total per cent      100%            100%            100%            100%
MEASURES OF CENTRAL TENDENCY AND DISPERSAL
The mode (the score obtained by the greatest number of people);
For categorical (nominal) and ordinal data
The mean (the average score);
For continuous data
Used if the data are not skewed
Used if there are no outliers
MEASURES OF CENTRAL TENDENCY AND DISPERSAL
The median (the score obtained by the middle person in a ranked group of people, i.e. it has an equal number of scores above it and below it);
For continuous data
Used if the data are skewed
Used if there are outliers


MEASURES OF CENTRAL TENDENCY AND DISPERSAL
Standard deviation (a kind of average distance of each score from the mean, i.e. how much the scores, as a group, deviate from the mean).
A standardized measure of dispersal.
For interval and ratio data

STANDARD DEVIATION
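The standard deviation is calculated, in its most simplified form, as:

$$\mathrm{S.D.} = \sqrt{\frac{\sum d^{2}}{N-1}}$$

or

$$\mathrm{S.D.} = \sqrt{\frac{\sum d^{2}}{N}}$$

where:
d² = the deviation of the score from the mean (average), squared
Σ = the sum of
N = the number of cases

The first form (dividing by N − 1) is used when the scores are a sample from which a population's standard deviation is estimated; the second (dividing by N) treats the scores as the complete set.
A low standard deviation indicates that the scores cluster together, whilst a high standard deviation indicates that the scores are widely dispersed.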
[Figure: three dot plots of five scores each on a scale of 1 to 20, all with mean = 6]
High standard deviation: scores 1, 2, 3, 4, 20 (mean = 6)
Moderately high standard deviation: scores 1, 2, 6, 10, 11 (mean = 6)
Low standard deviation: scores 5, 6, 6, 6, 7 (mean = 6)
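Applying both standard deviation formulas to the three sets of scores in the figure gives a quick check on the labels. This is a sketch only: `standard_deviation` is a hypothetical helper written for illustration, cross-checked against the standard library's `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by N − 1).

```python
import math
import statistics

def standard_deviation(scores, sample=True):
    """Hypothetical helper: sqrt(sum of d^2 / (N - 1)) if sample, else sqrt(sum of d^2 / N)."""
    n = len(scores)
    mean = sum(scores) / n
    sum_d_squared = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations
    return math.sqrt(sum_d_squared / (n - 1 if sample else n))

datasets = {
    "high": [1, 2, 3, 4, 20],
    "moderately high": [1, 2, 6, 10, 11],
    "low": [5, 6, 6, 6, 7],
}
for label, scores in datasets.items():
    sd_n = standard_deviation(scores, sample=False)
    sd_n1 = standard_deviation(scores, sample=True)
    # Cross-check: pstdev divides by N, stdev divides by N - 1.
    assert math.isclose(sd_n, statistics.pstdev(scores))
    assert math.isclose(sd_n1, statistics.stdev(scores))
    print(f"{label}: S.D. (N) = {sd_n:.2f}, S.D. (N - 1) = {sd_n1:.2f}")
# high: S.D. (N) = 7.07, S.D. (N - 1) = 7.91
# moderately high: S.D. (N) = 4.05, S.D. (N - 1) = 4.53
# low: S.D. (N) = 0.63, S.D. (N - 1) = 0.71
```

All three sets share a mean of 6, so the differences come entirely from how far the scores spread around it.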
THE RANGE AND INTERQUARTILE RANGE
The range:
The difference between the minimum and maximum score.
A measure of dispersal.
Outliers exert a disproportionate effect.
The interquartile range:
The difference between the first and the third quartile, i.e. the difference between the 25th and the 75th percentile, covering the middle 50 per cent of scores (the second and third quartiles).
Overcomes problems of outliers/extreme scores.
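To see how a single extreme score inflates the range but not the interquartile range, here is a minimal sketch using `statistics.quantiles` (Python 3.8+). The quartile convention is an assumption: `method="inclusive"` is one of several definitions in use, and statistical packages may report slightly different quartiles for small samples.

```python
import statistics

def range_and_iqr(scores):
    """Return (range, interquartile range) for a list of scores."""
    data_range = max(scores) - min(scores)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (the median) and Q3.
    # "inclusive" is one quartile convention among several; others differ slightly.
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return data_range, q3 - q1

print(range_and_iqr([1, 2, 3, 4, 5]))   # (4, 2.0)
print(range_and_iqr([1, 2, 3, 4, 20]))  # (19, 2.0)
```

Replacing the top score of 5 with 20 makes the range jump from 4 to 19, while the interquartile range stays at 2.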
CORRELATION
Measure of association between two variables.
Note the direction of the correlation:
Positive: As one variable increases, the other variable increases
Negative: As one variable increases, the other variable decreases
The strongest positive correlation coefficient is +1.
The strongest negative correlation coefficient is -1.
CORRELATION
Note the magnitude of the correlation coefficient:
0.20 to 0.35: slight association
0.35 to 0.65: sufficient for crude prediction
0.65 to 0.85: sufficient for accurate prediction
>0.85: strong correlation
Ensure that the relationships are linear and not curvilinear (i.e. the line bends or changes direction, for example rising to a peak and then falling)
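Why linearity matters: Pearson's r measures only straight-line association, so a strong but curved relationship can give a coefficient of zero. A minimal sketch with invented inverted-U data (rising to a peak and then falling, loosely like strength across the life span); `pearson_r` is an illustrative helper, not a library function.

```python
import math

def pearson_r(x, y):
    """Illustrative helper: Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Perfect straight-line relationships give +1 and -1.
print(pearson_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0

# A perfect curvilinear (inverted-U) relationship: y = 4 - x**2.
x = [-2, -1, 0, 1, 2]
y = [4 - v ** 2 for v in x]   # [0, 3, 4, 3, 0]
print(pearson_r(x, y))        # 0.0
```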
CURVILINEAR RELATIONSHIP
[Figure: muscular strength plotted against age]
CORRELATION
Foot size Hand size
1 1
2 2
3 3
4 4
5 5

Perfect positive correlation: +1
CORRELATION
Foot size Hand size
1 5
2 4
3 3
4 2
5 1

Perfect negative correlation: -1
CORRELATION
Hand size Foot size
1 2
2 1
3 4
4 3
5 5

Positive correlation: <+1
PERFECT POSITIVE CORRELATION
[Figure: line chart]
PERFECT NEGATIVE CORRELATION
[Figure: line chart]
MIXED CORRELATION
[Figure: line chart]
BIVARIATE CORRELATIONS
Spearman correlation for ordinal (ranked) data
Pearson correlation for interval and ratio data
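For untied ranks, Spearman's rho can be computed from the differences between paired ranks, rho = 1 − 6Σd²/(n(n² − 1)). The sketch below applies that formula to the hand and foot size tables above; `spearman_rho` is a hypothetical helper, and it assumes there are no tied values.

```python
def spearman_rho(x, y):
    """Hypothetical helper: Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).

    Assumes no tied values; with ties, give tied scores the average of their
    ranks and compute Pearson's r on the ranks instead.
    """
    n = len(x)
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    d_squared = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

hand = [1, 2, 3, 4, 5]
print(spearman_rho(hand, [1, 2, 3, 4, 5]))  # 1.0  (perfect positive)
print(spearman_rho(hand, [5, 4, 3, 2, 1]))  # -1.0 (perfect negative)
print(spearman_rho(hand, [2, 1, 4, 3, 5]))  # 0.8  (positive, less than +1)
```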

MULTIPLE AND PARTIAL CORRELATIONS
Multiple correlation:
The degree of association between three or more variables simultaneously.
Partial correlation:
The degree of association between two variables after the influence of a third has been controlled or partialled out.
Controlling for the effects of a third variable means holding it constant whilst examining the association between the other two variables.
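The slides give no formula, but the usual first-order partial correlation between x and y, controlling for z, is r_xy·z = (r_xy − r_xz r_yz) / √((1 − r_xz²)(1 − r_yz²)). A minimal sketch with hypothetical coefficients; `partial_correlation` is an illustrative name.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Illustrative helper: correlation of x and y with the linear influence of z removed."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical: x and y correlate at 0.50, and each correlates 0.50 with z.
print(round(partial_correlation(0.50, 0.50, 0.50), 3))  # 0.333
# If z is unrelated to both x and y, partialling it out changes nothing.
print(partial_correlation(0.50, 0.0, 0.0))              # 0.5
```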
RELIABILITY
Split-half reliability (correlation between one
half of a test and the other matched half)
The alpha coefficient
SPLIT-HALF RELIABILITY
(Spearman-Brown)
$$\text{Reliability} = \frac{2r}{1 + r}$$

where r = the actual correlation between the two halves of the instrument (e.g. 0.85).

$$\text{Reliability} = \frac{2(0.85)}{1 + 0.85} = \frac{1.70}{1.85} = 0.919 \text{ (very high)}$$
CRONBACH ALPHA
Reliability as internal consistency: Cronbach's alpha (the alpha coefficient of reliability).
A coefficient of inter-item correlations.
It calculates the average of all possible split-half reliability coefficients.
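The slide does not state the formula; the standard one is α = k/(k − 1) × (1 − Σσᵢ²/σₜ²), where k is the number of items, σᵢ² is each item's variance and σₜ² is the variance of respondents' total scores. A minimal sketch with invented responses; `cronbach_alpha` and the row-per-respondent layout are illustrative assumptions.

```python
import statistics

def cronbach_alpha(item_scores):
    """Illustrative helper: alpha = k/(k-1) * (1 - sum of item variances / variance of totals).

    item_scores: one list per respondent, one score per item (hypothetical layout).
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))               # transpose: one tuple per item
    totals = [sum(row) for row in item_scores]    # each respondent's total score
    item_variance_sum = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))

# Invented responses from five people to a four-item scale (1 to 5).
responses = [
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [1, 2, 2, 1],
]
print(round(cronbach_alpha(responses), 3))  # 0.966
```

With these made-up data alpha comes out at about 0.97, which the interpretation guide below would class as very highly reliable.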


INTERPRETING THE RELIABILITY COEFFICIENT
Maximum is +1

>.90 very highly reliable
.80-.90 highly reliable
.70-.79 reliable
.60-.69 marginally/minimally reliable
<.60 unacceptably low reliability
