
Br. J. educ. Psychol., 51, 170-186, 1981

TEACHING STYLES AND PUPIL PROGRESS: A RE-ANALYSIS
BY M. AITKIN, S. N. BENNETT AND JANE HESKETH*
(Centre for Applied Statistics and Department of Educational Research,
University of Lancaster)

SUMMARY. The object of the study was to examine the statistical techniques available
for the analysis of process-product studies involving non-randomised quasi-experimental designs, and to demonstrate the practical effects of their use on the data from the
Teaching Styles study (Bennett, 1976). Of particular concern were the ' unit of analysis '
or aggregation problem, and the differential effects of treatment grouping by cluster
and factor methods.
The original grouping of teachers into formal, informal and mixed styles was
investigated using a latent class model for the 38 binary questionnaire items. Convincing
evidence of three overlapping latent classes was found. The comparison of latent classes
in terms of pre-test gain scores was examined using a series of variance component
models, allowing for correlation of children within the same class. Differences among
classes were altered by the probabilistic clustering of the latent class model compared
to the original findings, and the significance of the differences was reduced when the
correlation among children was allowed for.

INTRODUCTION
IN the four years since the publication of Teaching Styles and Pupil Progress (subsequently abbreviated to TS) there have been rapid developments in the statistical
methods available for the analysis of complex data. While these developments are
still in their early stages, it is already clear that they will have an important influence
on the analysis of large-scale educational research studies. Two of these developments are particularly important for the analysis of educational data from surveys and
observational studies: the development of latent class models for clustering nonhomogeneous populations, and the development of unbalanced variance component
(' mixed ') models for nested and cluster sampling structures.
The objects of this article are to describe the application of these modelling
procedures to the Teaching Styles data, to report the conclusions drawn, and to
compare these conclusions with those found in the original analysis. Implications for
future research studies are also discussed (for statistical detail see Aitkin et al., 1981).
In the re-analysis, two main questions were considered:
(1) Is there convincing statistical evidence of distinguishable teaching styles? If
so, how many styles can be convincingly identified, and how can these be characterised?
(2) Is teaching style, as determined statistically above, related to overall pupil
progress?
THE EXISTENCE OF DISTINGUISHABLE TEACHING STYLES

Cluster analysis
The use of cluster analysis in educational research is increasing as researchers
recognise the utility of grouping people rather than grouping variables. Barker
Lunn's (1970) study of streaming in the primary school was the first major investigation to use this approach, delineating two ' types ' of teaching closely conforming
to the progressive-traditional dichotomy. The two most recent studies (Bennett,

* Now at Computing Centre, Heriot-Watt University.



1976; Galton et al., 1980) used identical cluster methods to delineate both teacher and
pupil types although the data base was different. The method chosen was based on
iterative relocation using a Euclidean distance metric. Nevertheless it was recognised
in both studies that uncertainties about the method itself, for example the most
appropriate similarity coefficient, should be reflected in cluster interpretation (cf.
Bennett and Jordan, 1975; Galton et al., 1980, appendix 2c).
Uncertainty about technique is perhaps best illustrated by the most recent American study to adopt this approach. Solomon and Kendall (1979) cluster analysed data
from 50 teachers and 1,200 pupils and in so doing they tried several cluster techniques
- Q factor analysis, Linear Typal analysis, Cluster build-up, Elementary Linkage
analysis and a hierarchical method. They reported that although most provided six
teacher clusters they produced somewhat different results. In order to overcome this
they developed several sets of 'core clusterings', each started from the vantage point
of one of the clustering methods. They then identified for each cluster those classes
which also fell into the same group by at least two of the other clustering methods.
Discriminant function analysis was then used to complete the cluster assignments.
While researchers have been struggling with the practical application of clustering
methods, statisticians have been considering their statistical foundations. Everitt
(1977), for example, argued that 'A fundamental problem in this area is the lack of a
satisfactory definition of exactly what constitutes a cluster. Because of this, most
clustering techniques cannot be formulated in terms of a satisfactory model . . . Most
cluster analysis methods are essentially non-statistical in the sense that they have no
associated distribution theory or significance tests, and so are unable to relate from
sample to population . . .' Hartigan (1977) pointed out the sparsity of methods for
establishing the reality of clustering: 'The very large growth in clustering techniques and applications is not yet supported by development of statistical theory by
which the clustering results may be evaluated . . . There are many guesses, conjectures, analogies, and hopes, and only a few hard results.' Aitkin (1979) has also
pointed out the unsatisfactory nature of clustering methods which are not based on a
probability model: 'How do we know that a particular configuration of clusters
produced by a numerical algorithm would also have been produced by a different
random sample from the same population, or by a different algorithm on the same
sample? What confidence can be placed in the existence of real clusters? . . . The
only methods of cluster analysis which allow formal statistical tests for the actual
existence of clusters . . . are those based on mixture models . . .'
Clustering methods based on probability models allow estimation and hypothesis
testing within the framework of standard statistical theory. Though theoretical
difficulties remain in deciding on the number of clusters, for a given number of clusters
the assignment of individuals to clusters is based on standard likelihood ratio methods
analogous to those used in discriminant analysis.

Re-analysis
In re-examining the existence of distinguishable teaching styles, a mixture or latent
class probability model was adopted using the original 38 binary items from the
teacher questionnaire. It is assumed that the population consists of k homogeneous
subpopulations or latent classes of teachers, each class having a distinct teaching style.
Each class is characterised by a set of 38 response probabilities, the probabilities of
responding YES to each of the 38 binary items in Table 1. Given these probabilities,
the probability that a teacher belongs to the j-th class is calculated from Bayes'
theorem using the pattern of YES and NO responses for the teacher. Full details
of the model and the method of estimation of the response probabilities are given in
Aitkin and Bennett (1980). It is an important feature of this form of probabilistic
clustering that it does not produce assignments of individuals to classes, but gives


instead the probability that each individual belongs to each latent class. This is
preferable to a formal assignment rule (as in discriminant analysis) which assigns each
individual to the class to which he or she has the greatest probability of belonging,
since this overstates the information available about cluster membership.
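The Bayes' theorem step described above can be sketched as follows. This is a minimal illustration, assuming items are independent within a class; the function name `class_posteriors` and the three-item probabilities are hypothetical and are not taken from Table 1.

```python
import math

def class_posteriors(responses, class_probs, class_weights):
    """Posterior probability of each latent class for one teacher.

    responses:     list of 0/1 answers (1 = YES), one per item
    class_probs:   class_probs[j][i] = P(YES on item i | class j), strictly in (0, 1)
    class_weights: prior proportion of teachers in each class
    """
    log_joint = []
    for theta, weight in zip(class_probs, class_weights):
        # Items are assumed independent within a class, so log-likelihoods add.
        log_lik = sum(
            math.log(t) if yes else math.log(1.0 - t)
            for yes, t in zip(responses, theta)
        )
        log_joint.append(math.log(weight) + log_lik)
    # Normalise on the log scale to avoid underflow with many items.
    top = max(log_joint)
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-class, three-item illustration (not the Table 1 estimates).
theta = [[0.9, 0.8, 0.2], [0.3, 0.4, 0.7]]
weights = [0.5, 0.5]
post = class_posteriors([1, 1, 0], theta, weights)
hard_label = post.index(max(post))  # a formal assignment discards the uncertainty in `post`
```

Keeping `post` rather than `hard_label` is what distinguishes the probabilistic clustering from a formal assignment rule.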
Parameter estimates

The parameter estimates (i.e., the maximum likelihood estimates of the response
probabilities) for the two- and three-latent class models are shown in Table 1. The
item number corresponds to that in TS (pp. 166-9), the number in parentheses next
to the item number being the number of this item in Table 2 of Bennett and Jordan
(1975). For the two-class model, the response probabilities marked † show large
TABLE 1
TWO- AND THREE-LATENT CLASS PARAMETER ESTIMATES ($100 \times \hat{\theta}_{jl}$) FOR TEACHER DATA

                                                                        Two-Class Model     Three-Class Model
Item                                                                    Class 1  Class 2    Class 1  Class 2  Class 3
1 (1)        Pupils have choice in where to sit                           22       43         20       44       33
2            Pupils sit in groups of 3 or more                            60       87†        54       88       79
3 (2)        Pupils allocated to seating by ability                       35       23         36       22       30
4            Pupils stay in same seats for most of day                    91       54†        91       52       89
5 (3)        Pupils not allowed freedom of movement in classroom          97       63†       100       53       74
6            Pupils not allowed to talk freely                            89       48†        94       50       61
7            Pupils expected to ask permission to leave room              97       76†        96       69       95
8            Pupils expected to be quiet                                  82       [?]        92       39       56
9            Monitors appointed for special jobs                          85       [?]        90       70       69
10           Pupils taken out of school regularly                         32       60         33       70       35
11           Timetable used for organising work                           90       66†        95       62       77
12           Use own materials rather than text books                     19       49         20       56       26
13           Pupils expected to know tables by heart                      92       76         97       80       75*
14           Pupils asked to find own reference materials                 29       37         28       39       34
15           Pupils given homework regularly                              35       22         45       29       12*
16 (i)       Teacher talks to whole class                                 71       44         73       37       62
   (ii)      Pupils work in groups on teacher tasks                       29       42         24       45       38
   (iii) (9) Pupils work in groups on work of own choice                  15       [?]        13       [?]      20
   (iv) (10) Pupils work individually on teacher tasks                    55       [?]        57       [?]      50
   (v) (11)  Pupils work individually on work of own choice               28       [?]        29       [?]      26*
17           Explore concepts in number work                              18       [?]        14       [?]      34
18           Encourage fluency in written English even if inaccurate      87       [?]        87       95       90
19           Pupils work marked or graded                                 43       14†        50       16       20
20           Spelling and grammatical errors corrected                    84       68         86       64       78
21           Stars given to pupils who produce best work                  57       29         65       30       34
22           Arithmetic tests given at least once a week                  59       38         68       43       35*
23           Spelling tests given at least once a week                    73       51         83       56       46*
24           End of term tests given                                      66       44         75       48       42*
25           Many pupils who create discipline problems                   09       09         07       01       18*
26           Verbal reproof sufficient                                    97       95         98       99       91*
27 (i)       Discipline-extra work given                                  70       53         69       49       67
   (ii) (16) Smack                                                        65       42         64       33       63
   (iii)     Withdrawal of privileges                                     86       77         85       74       85
   (iv)      Send to head teacher                                         24       17         21       13       28*
   (v) (17)  Send out of room                                             19       15         15       08       27*
28 (i) (18)  Emphasis on separate subject teaching                        85       [?]        87       43       73
   (ii)      Emphasis on aesthetic subject teaching                       55       [?]        53       61       63*
   (iii) (19) Emphasis on integrated subject teaching                     22       [?]        21       75       33

$\hat{\lambda}$  Estimated proportion of teachers in each class           0.538    0.462      0.366    0.312    0.322

* Indicates an item on which Class 3 is extreme.
† Indicates an item with large differences in response probability between Classes 1 and 2.
Numbers in parentheses are item numbers in Bennett and Jordan (1975).
$\hat{\theta}_{jl}$ is the estimated probability that a teacher in class j responds YES to item l.
[?] Entry illegible, or of uncertain row, in the source copy. Among items 16 (iii) to 18 the source shows 50 in the two-class Class 2 column and 59, 32, 62 in the three-class Class 2 column; Bennett and Jordan numbers (4)-(8) and (12)-(15) could not be matched to rows.


differences between the classes, indicating systematic differences in behaviour on these
items for teachers in the two latent classes. For the three-class model, the response
probabilities for classes 1 and 2 are very close to those for the corresponding classes
in the two-class model (though in most cases more widely separated), and the response
probabilities for class 3 are mostly between those for classes 1 and 2, except for those
items marked with an asterisk. Thus class 3 is to some extent intermediate between
classes 1 and 2.
Significance of latent class model
Before attempting to interpret these results, we need to consider their statistical
significance. Since this clustering method, like any other, will produce clusters with
homogeneous random data, convincing evidence of the statistical significance of the
latent-class clusters is needed. There are two sources for this evidence.
First, consider the probabilities of cluster membership for all 468 teachers under the two-class model. Figure 1 shows a histogram of the 468 probabilities
of cluster membership (arranged so that the larger of the two probabilities is shown).
Of the 468 teachers, 257, or 55 per cent had a probability of 0.99 or more of belonging
to one of the classes, and a further 83, or 18 per cent had a probability of 0.95-0.989.
Thus 73 per cent of teachers had a probability of 0.95 or more of belonging to one of
the classes.
For comparison, 19 random samples of 468 from homogeneous populations were
generated in which the 38 items were independent, and had the same marginal response
rates as in the TS data. In these samples, on average 35 per cent of individuals had a
probability of 0.95 or more of belonging to one of the two classes when a two-class
model was fitted to the homogeneous data. Thus with homogeneous data about one
third of individuals are assigned confidently to one of the two classes. This gives an
indication of the apparent degree of clustering to be expected from random data.
The degree of clustering in the TS data is very much greater than this, pointing to the
real existence of distinct teaching styles. For the three-class model, there was a
substantial drop in the proportion of teachers with high probabilities of membership
in one of the classes. Only 30 per cent had probabilities of 0.99 or more, and another
19 per cent had probabilities between 0.95 and 0.989.
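The comparison with homogeneous data can be sketched in code. The block below is an illustration only, assuming a plain EM algorithm for a two-class model with independent binary items; the marginal response rates are hypothetical (the TS rates are not reproduced here), and the function names are ours rather than the authors'.

```python
import math
import random

def fit_two_class(data, n_iter=30, seed=0):
    """EM for a two-class latent class model with independent binary items."""
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    weights = [0.5, 0.5]
    # Random start away from 0 and 1 so that all logarithms stay finite.
    theta = [[rng.uniform(0.3, 0.7) for _ in range(n_items)] for _ in range(2)]
    post = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every respondent.
        post = []
        for row in data:
            logp = [
                math.log(weights[j])
                + sum(math.log(t if y else 1.0 - t) for y, t in zip(row, theta[j]))
                for j in range(2)
            ]
            top = max(logp)
            unnorm = [math.exp(v - top) for v in logp]
            total = sum(unnorm)
            post.append([u / total for u in unnorm])
        # M-step: posterior-weighted proportions, clipped to stay inside (0, 1).
        for j in range(2):
            mass = max(sum(p[j] for p in post), 1e-9)
            weights[j] = min(max(mass / n, 1e-6), 1.0 - 1e-6)
            for i in range(n_items):
                yes = sum(p[j] * row[i] for p, row in zip(post, data))
                theta[j][i] = min(max(yes / mass, 1e-6), 1.0 - 1e-6)
    # `post` comes from the last E-step, i.e. before the final M-step update.
    return weights, theta, post

def simulate_homogeneous(n, marginals, rng):
    """One homogeneous population: items independent with the given YES rates."""
    return [[1 if rng.random() < m else 0 for m in marginals] for _ in range(n)]

rng = random.Random(1)
marginals = [rng.uniform(0.2, 0.8) for _ in range(38)]  # hypothetical response rates
data = simulate_homogeneous(468, marginals, rng)
weights, theta, post = fit_two_class(data)
# Share of individuals "confidently" placed although no real classes exist.
share_confident = sum(max(p) >= 0.95 for p in post) / len(post)
```

Repeating the last four lines over several seeds gives a reference distribution for `share_confident`, against which the 73 per cent found in the TS data can be judged.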
The second source of evidence is a formal test for significance. The test used was
based on empirical simulations of the distribution of the test statistic (-2 log l,
where l is the likelihood ratio) under the null hypothesis of a single homogeneous
population. In 19 simulations of the value of -2 log l, the largest value obtained was
84.4. In the TS data, the value of -2 log l was 775.8, very much larger than the
'critical value' above. The null hypothesis of a single homogeneous population can
thus be rejected at the 5 per cent level in favour of the alternative hypothesis of two
latent classes. The statistical test used is a permutation test: if the observed test
statistic is larger than s simulation values of the test statistic under the null hypothesis,
then the null hypothesis can be rejected at level 1/(s+1). Here s = 19, giving a level of 1/20 = 0.05.
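The arithmetic of this simulation-based test can be written out as a short sketch. Only the largest simulated value (84.4) and the observed value (775.8) come from the text; the other 18 simulated values below are placeholders, and the function name is hypothetical.

```python
def monte_carlo_p_value(observed, simulated):
    """Simulation-based p-value: (1 + number of simulated values >= observed) / (s + 1)."""
    exceed = sum(1 for value in simulated if value >= observed)
    return (1 + exceed) / (len(simulated) + 1)

# 19 simulated values of -2 log l under the null hypothesis.
# Only the maximum, 84.4, is reported in the text; the rest are placeholders.
simulated = [84.4] + [60.0] * 18
p = monte_carlo_p_value(775.8, simulated)  # observed value exceeds all 19, so p = 1/20
```

With s simulations the smallest attainable level is 1/(s+1), which is why 19 simulations suffice for a test at the 5 per cent level.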
The same test was used to assess the likelihood of three classes and this was also
significant. Those findings, together with other data provided by Aitkin and Bennett
(1980), indicate that three overlapping, rather than two distinct, styles exist. These
are described below.
Interpretation of classes
Almost all Class 1 teachers restrict the movement and talk of children in the classroom, whilst a large majority organise their work by timetable, emphasise separate
subject teaching and talk to the whole class. A majority have pupils working individually on teacher tasks. Class 2 teachers are much less restrictive in their classroom
organisation, emphasise integrated subject teaching, and are likely to have pupils

[Figure 1. Probability of class membership: (a) two latent-class model, horizontal axis: larger probability of class membership; (b) three latent-class model, horizontal axis: largest probability of class membership.]

working individually or in groups on work of their own choice. Marking or grading
of pupils' work is very uncommon in Class 2. The identification of Class 1 with a
formal, and Class 2 with an informal, teaching style (as these terms were used in
TS) is very clear.
Class 3 shares some of the characteristics of both the other classes. Like the
formal teachers, their pupils stayed in the same seats for most of the day, were expected
to ask permission to leave the room and were not taken out of school regularly, and
the teachers used textbooks rather than their own materials, had similar teacher
emphasis (Item 16) and similar disciplinary actions to the formal teachers. However,
like the informal teachers, their pupils tended to sit in groups of three or more, they
did not often mark or grade work, and did not give stars for good work. Amongst the
three classes they placed greatest emphasis on aesthetic subject teaching. It is notable
that the Class 3 teachers were lowest in expecting pupils to know their tables by heart,
in giving homework regularly, in giving weekly arithmetic or spelling tests, and end of
term tests. Eighteen per cent of these teachers had many pupils who created discipline
problems, compared with only seven per cent of formal teachers and one per cent of
informal teachers, and nine per cent found a verbal reproof insufficient, compared
with two per cent of formal, and one per cent of informal teachers. Sending children
out of the room, or to the head teacher, were more common disciplinary measures for
Class 3 than for either of the other two classes.
While Class 3 shares some of the characteristics of each of the other two classes,
and might therefore reasonably be called 'mixed', the disciplinary problems and the
low frequency of testing and assessment give this class a somewhat different character
from that of the mixed style in TS.
Comparison with TS Clusters
Since the result of probabilistic clustering is not an assignment to clusters but a
set of probabilities of class membership, it is not easy to present a simple table comparing the classes of teaching style for each clustering method. Two tables are
presented. First, in Table 2 each teacher is formally assigned to the latent class to
which he has the highest probability of belonging, and this assignment is compared
with his membership in one of the 12 TS clusters. It should be noted that 78 teachers
were not assigned to any of the 12 TS clusters, as they were not close to any of the
12 cluster centroids. These teachers form the unclassified group in Table 2.
It is clear from Table 2 that only TS Clusters 1 and 12 correspond closely to the
TABLE 2
LATENT CLASS ASSIGNMENT AND TS CLUSTER MEMBERSHIP FOR 468 TEACHERS
TS Cluster

Latent
Class

Mixed
(Class 3)
Informal
(Class 2 )
Total

10

11

11
2
7 2 6 1 9
6 1 4
6
1 1 3
8
(3) (41) (46) (6) (27) (69) (63) (20) (39) (20) (21)
1
22
34
19
8
15
10
2
4
3
(97) (59) (33) (67) (58) (26) (7) (13) (8) (3) (0)
33
26
38
30
35
32
24
30
36
31
39

12

(0)
-

(0)

36

Unclass

31
(39)
31
(40)
78

Total

144
149
468

The top entry is the number of teachers in each latent class who fall in the corresponding TS cluster, and the bottom
entry is the percentage of teachers out of the total in this cluster.


latent classes (2 and 1 respectively). About 40 per cent of TS Cluster 2 teachers are in
latent class 3, the mixed class, as are 20 per cent of TS Cluster 11 teachers. The
remaining TS clusters are split across all three classes to varying degrees, the proportion of Class 1 teachers increasing, and of Class 2 teachers decreasing, fairly steadily
from Cluster 1 to Cluster 12. Clusters 6 and 7 contain the greatest proportion of
Class 3 teachers.
The general pattern of Table 2 supports the ordering in TS from Cluster 1 to
Cluster 12 of increasing formality, though as noted there (p. 47), clusters other
than 1 and 12 contain both formal and informal elements.
It was noted above that the formal assignment of teachers to latent classes overstates the information available from the probabilistic clustering. Since the conclusions drawn about pupil progress in TS depend critically on the cluster membership
of the 37 teachers, Table 3 considers the actual latent class membership for these
teachers. Table 3 shows the probabilities of latent class membership for 36 of the
teachers (one mixed TS style teacher could not be identified, and has been omitted
from this table) for the three-class model.
TABLE 3
LATENT CLASS PROBABILITY AND TS STYLE CATEGORY FOR 36 TEACHERS: THREE CLASSES
TS Style
Latent
Class

Formal

Mixed

Informal

100
100
99
99
100
100
92
100
98
100
71
94

01
01
-

08
02
29
06

100
100

70
12
44
01
100
85
11
-

3
~

- - - - _ - 30
- 01 85

8 8 49
07
98
01
14
86

1 5 8 9 01
99

_
_

_
-

_
-

03
03

73
36
93

2
~

100
100
14
100
97
100
91
100
100
100
27
64
07

The entries are the probabilities of latent class membership (× 100) for the
three-class model for 36 of the 37 teachers in TS Chapter 5.

The formal TS teachers, with one exception, have very high probabilities of
belonging to Class 1. The one exception has a probability of 0.29 of belonging to
Class 3, the mixed class. Nine of the 13 informal TS teachers have very high probabilities of belonging to Class 2, but three of the remaining four have high probabilities
of belonging to Class 3, and the fourth is essentially unidentified. The mixed TS
teachers are poorly identified: three clearly belong to Class 1, one to Class 2, and one
to Class 3, while the remainder have substantial probabilities of belonging to two
classes.

Conclusion
There is convincing statistical evidence, based on the latent class model, of three
distinguishable but overlapping teaching styles. Two of these correspond closely to
the broad classes formal and informal as these terms were used in TS. The third


class, called ' mixed ' here as in TS, is characterised by a low frequency of testing and
assessment, and a relatively high frequency of disciplinary problems. The classification of the 36 teachers used in TS corresponds closely to the class membership
probabilities for formal teachers, less so for informal teachers and poorly for mixed
teachers.
THE RELATION OF TEACHING STYLE TO PUPIL PROGRESS
In Chapter 5 of TS the relation between teaching style and pupil progress was
investigated using an analysis of covariance model. The analysis was based on the
individual pre-test and test scores of each child, the children being classified by the
teaching style (formal, mixed, informal) of the teacher.
There has been considerable discussion in the educational research literature of
the ' unit of analysis ' question: should the child or the classroom be treated as the
' unit ' on which statistical analysis is based? Gray and Satterly (1976) raised this
question in their discussion of TS, and Bennett and Entwistle (1976) referred to it
briefly in their reply. Satterly and Gray (1976) gave a more detailed discussion of some
of the statistical issues involved, and recognised the need for a variance component
model for the data. A ' mixed ' or variance component model for ' clustered ' or
' nested ' sample designs is developed below for the one-way analysis of covariance
for pre-test/test situations. This model is then applied to the latent class membership
assignment for the 36 teachers described in Section 1. The model is then adapted for
the probability of latent class membership of the teacher.
Variance component model for the analysis of covariance
Let $Y_{pqr}$ denote the achievement test score, and $x_{pqr}$ the pre-test score, of the
r-th child in the q-th classroom, taught by method p, where $r = 1, \ldots, n_q$, $q = 1, \ldots, 36$, $p = 1, 2, 3$, and $N = \sum_q n_q$. All subsequent analyses will be based on extensions or
contractions of the variance component model:
$$Y_{pqr} = \mu + \gamma x_{pqr} + \alpha_p + T_q + E_{pqr}.$$
Here $T_q$ and $E_{pqr}$ are mutually independent random variables, assumed to be normally
distributed:
$$T_q \sim N(0, \sigma^2_T), \qquad E_{pqr} \sim N(0, \sigma^2_E),$$
and $\mu$ and $\gamma$ are the intercept and slope of the regression.
The $\alpha_p$ are constants with $\alpha_3 = 0$ (so that the model is of full rank; alternatively,
we could take $\sum_p \alpha_p = 0$), representing the mean achievement differences between
methods 1 and 2, and method 3. The slope of the regression of test score Y on pre-test
score x is $\gamma$, assumed to be the same within each teaching method.
The $T_q$ are treated as random variables rather than fixed constants because the
teachers have been selected from a population, and the interest is in modelling
the variability of teachers in this population, rather than drawing inferences about the
particular subset of teachers included in the sample. Teaching methods are represented by fixed constants because they are the unique set of experimental ' treatments '
under examination.
The properties of the above model are well known, and are described, for example,
in Searle (1971, Chapters 9 and 10). A consequence of the random teacher effects is
that the achievement scores of children within the same classroom are positively
correlated :
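$$\mathrm{var}(Y_{pqr}) = \mathrm{var}(T_q + E_{pqr}) = \sigma^2_T + \sigma^2_E,$$
$$\mathrm{cov}(Y_{pqr}, Y_{pqr'}) = \mathrm{cov}(T_q + E_{pqr}, T_q + E_{pqr'}) = \mathrm{var}(T_q) = \sigma^2_T,$$
$$\mathrm{corr}(Y_{pqr}, Y_{pqr'}) = \rho = \sigma^2_T / (\sigma^2_T + \sigma^2_E).$$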


The intraclass correlation $\rho$ may be large if $\sigma^2_T$ is large compared with $\sigma^2_E$, and is
zero only when $\sigma^2_T = 0$, that is when there is no variation among teachers in the
teacher population, which will rarely happen in practice.
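The way random teacher effects induce correlation among children in the same classroom can be checked by simulation. The sketch below assumes equal class sizes and no covariate, and uses the standard ANOVA (method of moments) estimator rather than the estimation methods of the article; the variance values 61 and 185, the number of classes and the class size are chosen for illustration.

```python
import random

def simulate_classes(n_classes, class_size, var_t, var_e, rng):
    """Scores T_q + E_qr with one shared random teacher effect T_q per classroom."""
    data = []
    for _ in range(n_classes):
        t = rng.gauss(0.0, var_t ** 0.5)
        data.append([t + rng.gauss(0.0, var_e ** 0.5) for _ in range(class_size)])
    return data

def anova_components(data):
    """ANOVA estimates of the teacher and child variance components (equal class sizes)."""
    k, n = len(data), len(data[0])
    means = [sum(c) / n for c in data]
    grand = sum(means) / k
    ms_among = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((y - m) ** 2 for c, m in zip(data, means) for y in c) / (k * (n - 1))
    # E(ms_among) = var_e + n * var_t, so subtract and divide by n; truncate at zero.
    var_t_hat = max((ms_among - ms_within) / n, 0.0)
    return var_t_hat, ms_within

rng = random.Random(0)
data = simulate_classes(400, 25, 61.0, 185.0, rng)  # illustrative values
var_t_hat, var_e_hat = anova_components(data)
rho_hat = var_t_hat / (var_t_hat + var_e_hat)       # should be near 61 / (61 + 185)
```

Setting `var_t` to zero in the simulation gives an estimated correlation near zero, the only case in which children in a class can be treated as independent.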
The above model may be extended to allow for pre-test by method interactions:
it may happen that the slope of the regression of test score y on pre-test score x is
different for different methods. A comparison of the methods then depends on the
covariate value considered, and one method may be superior for low pre-test scores,
while another is superior for high pre-test scores. The extended model is
$$Y_{pqr} = \mu + \gamma_p x_{pqr} + \alpha_p + T_q + E_{pqr},$$
and the regressions are now $\mu + \gamma_1 x_{1qr} + \alpha_1$ for method 1, $\mu + \gamma_2 x_{2qr} + \alpha_2$ for method 2,
and $\mu + \gamma_3 x_{3qr}$ for method 3.
Unconditional conclusions about the relative superiority of one treatment to
another are not possible in general with this extended model. Although methods are
available for drawing conditional conclusions given the value of the pre-test score, this
is not pursued further, as the interaction model will not be found necessary.
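To see why only conditional conclusions are possible, a short worked step may help. Writing $\mu$ for the intercept, $\gamma_p$ for the slope and $\alpha_p$ for the effect of method p (with the random terms having mean zero), the expected difference between two methods p and p' at a common pre-test score x, and the score $x^{*}$ at which their regression lines cross, are:

```latex
\begin{aligned}
E(Y \mid x, p) - E(Y \mid x, p') &= (\alpha_p - \alpha_{p'}) + (\gamma_p - \gamma_{p'})\,x, \\
x^{*} &= -\frac{\alpha_p - \alpha_{p'}}{\gamma_p - \gamma_{p'}} \qquad (\gamma_p \neq \gamma_{p'}).
\end{aligned}
```

If $x^{*}$ lies within the observed range of pre-test scores, the sign of the difference changes within that range, so neither method is uniformly superior. When the slopes are equal the difference reduces to $\alpha_p - \alpha_{p'}$ for every x, which is the case the simpler model assumes.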
In general, efficient (maximum likelihood) estimation of the parameters in the
above models requires extensive iterative computation, even when the class sizes are
equal. Several simpler methods are available which give consistent, but not efficient,
estimates, and for which approximate ANOVA tables can be constructed. Three of
these were applied to the TS data, both for internal comparisons, and for comparison
with the efficient method. Discussion of these methods is given in Aitkin and Bennett
(1980). The methods are summarised in terms of their estimation of the fixed effects
as follows: I, ignore the random effects; II, unweighted class means; III, class
means weighted by sample size.
For each method, an ANOVA table can be presented as follows.

Source                                           SS     df      MS
Regression on pre-test x                         SS     1
Among methods, adjusted for pre-test             SS     2       MS
Method x pre-test interaction,
  adjusted for methods and pre-test              SS     2       MS
Residual variation among teachers                SS     31      MS
Within teachers, adjusted for pre-test           SS     N-37    MS
The first four sums of squares are obtained by successive differencing of the residual
sum of squares among teachers after fitting the appropriate parameters. This
procedure is fully described in Aitkin and Bennett (1980).
It should be noted that the sums of squares do not have distributions which are
multiples of $\chi^2$, even when the class sizes are equal. If there are no covariates, the
sums of squares have multiples of $\chi^2$ distributions if the class sizes are equal, and
approximate $\chi^2$ distributions if the class sizes are not too unequal.
Before describing the results of these methods, consideration of the effect of
classroom formation on the conclusions to be drawn is needed.

The effect of non-random assignment to classes


The straightforward interpretation of teaching method differences would require
the random assignment of children to classrooms, and the random assignment of
teachers to teaching methods. The reality of the classroom formation in TS is very
different. First, teachers were not randomly assigned to methods: rather, teachers
with existing styles were assigned (independently of the TS study) to intact classes.
The greatest extent of randomness that could be hoped for is that the assignment of
teachers was not based on the nature of pupils in the classes-that is, that teachers


recognised as formal were not systematically assigned to classes which were below
(or above) average on the pre-test.
If there were evidence of such an assignment bias, it would be very difficult to
draw general conclusions about differences in achievement between formal and informal teaching styles used on pupils of the same initial achievement, for teaching style
and initial achievement would be at least partly confounded.
Since pupils were not randomly assigned to classes, it may be expected that the
36 classes will differ systematically in their mean scores on the pre-test, such differences
reflecting variation in the school populations, previous teachers and other systematic
effects. The adjustment for the pre-test should then reduce the residual variation
among teachers, and thus increase the sensitivity of the test for teaching style differences, since the variation among teaching styles would not be reduced by the pre-test
adjustment, if initial achievement and teaching style are not confounded.
Thus we may expect that the ANOVA variance component model, when applied
to the TS study, will give interpretable results only if there are no systematic differences
among teaching styles on the pre-test score. Even in this case, considerable care is
needed in interpreting different styles as a cause of differential achievement. The data
do not come from a randomised experiment, and there are many possible confounding
variables. Discussions of such variables were given in TS, Gray and Satterly (1976)
and Bennett and Entwistle (1976).
With these cautions in mind, the results of the variance component models
applied to the TS data are considered in the next section.
A further difficulty, referred to several times previously, is that latent class
membership is probabilistic, since class membership is not observable. An extended
ANCOVA model incorporating latent variables is necessary to properly model the full
data: such a model is considered later.
ANCOVA results for the TS data
The pre-test scores for reading, mathematics and English are first considered.
A one-way classification variance component model is fitted to each of the pre-test
scores, using the approximate ANOVA method for unequal class sizes. The ANOVA
tables are shown in Table 4, based on complete data for 921 children (although 950
children were analysed in TS, one complete classroom of 29 children was omitted in
the re-analysis because the teacher's style could not be identified).
TABLE 4
ANOVA OF PRE-TEST SCORES

                                         Reading             Mathematics            English
Source                          df       SS       MS         SS       MS           SS       MS
Among styles                     2      2,826    1,413      1,203     [?]         2,853    1,427
Among classrooms
  within styles                 33     57,649    1,747     50,355    1,526       55,224    1,683
Pooled (among styles and
  among classrooms)             35               1,728               1,473                 1,659
Within teachers                885    163,540      185    106,227      120      139,293      157

Variance component estimates
$\hat{\sigma}^2_E$                                 185                 120                   157
$\hat{\sigma}^2_T$                                  61                  53                    59
$\hat{\rho}$                                      0.25                0.31                  0.27

Means
Formal                                           101.1                99.9                 102.8
Mixed                                             97.4                97.3                  99.7
Informal                                          97.7                97.9                  99.1

[?] Entry illegible in the source copy.

In all three cases, the among-styles mean square is less than the among-teacher
within-styles mean square, so there is no evidence of association of style with pre-test
score. The variance component estimates are also given in Table 4, based on the
pooled style and within-style sums of squares. The correlation between children's
pre-test scores within classrooms is moderate, and certainly not zero.
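The within-classroom correlations follow directly from the variance component estimates, as a quick check shows. The figures below are the estimates as read from Table 4 (teacher component first, child component second); the dictionary layout is ours.

```python
# (teacher variance component, child variance component) as read from Table 4
components = {
    "reading": (61, 185),
    "mathematics": (53, 120),
    "english": (59, 157),
}

# Intraclass correlation: teacher component as a share of the total variance.
rho = {test: round(t / (t + e), 2) for test, (t, e) in components.items()}
```

Each value is the correlation between the pre-test scores of two children in the same classroom, and none is close to zero.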
This conclusion differs from that in TS and arises from the use of a different
denominator in the F-tests used. In TS, the within-styles mean square was used as the
denominator for the F-test of among-style differences. Here the residual among
classrooms within-styles mean square is used. In the variance component model the
ratio of among-styles mean square to within-styles mean square does not have an
F-distribution unless $\sigma_T^2 = 0$, and its distribution depends on the ratio of the variance
components $\sigma_T^2/\sigma_E^2$ (see Aitkin and Bennett, 1980, for details). Since for all three
test scores $\sigma_T^2 = 0$ is not tenable, the test for style differences must be based on the
ratio of among-styles mean square to the among-classrooms within-styles mean square.
The former ratios are all about 10, the latter are all less than 1.0.
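The effect of the choice of denominator can be checked directly from the mean squares in Table 4. The short sketch below is illustrative only: the dictionary and function names are hypothetical, and it assumes that the pupil-level denominator is the within-teachers mean square printed in the table. It also recomputes the within-classroom correlation as the teacher variance component divided by the sum of the two components.

```python
# Mean squares and teacher variance components transcribed from Table 4.
TABLE_4 = {
    "reading":     {"ms_styles": 1413, "ms_classrooms": 1747, "ms_within": 185, "var_t": 61},
    "mathematics": {"ms_styles": 1473, "ms_classrooms": 1526, "ms_within": 120, "var_t": 53},
    "english":     {"ms_styles": 1427, "ms_classrooms": 1683, "ms_within": 157, "var_t": 59},
}


def variance_ratios(row):
    """Return (pupil-level ratio, classroom-level ratio, within-classroom correlation)."""
    # Assumption: the pupil-level denominator is the within-teachers mean square.
    pupil_level = row["ms_styles"] / row["ms_within"]
    # Denominator recommended in the text: classrooms within styles.
    classroom_level = row["ms_styles"] / row["ms_classrooms"]
    # Correlation between two children in the same classroom.
    correlation = row["var_t"] / (row["var_t"] + row["ms_within"])
    return pupil_level, classroom_level, correlation


for test, row in TABLE_4.items():
    pupil_level, classroom_level, correlation = variance_ratios(row)
    print(f"{test:12s} pupil-level ratio = {pupil_level:5.2f}  "
          f"classroom-level ratio = {classroom_level:4.2f}  "
          f"correlation = {correlation:4.2f}")
```

The classroom-level ratios all fall below 1.0, while the pupil-level ratios are many times larger, which is the contrast drawn in the text.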
The ANCOVA variance component model was fitted to each of the test scores, and
the ANOVA tables are shown in Table 5. Three tables are presented comparing the
three methods on each test score. The analyses of variance and parameter estimates in
Table 6 are fairly consistent over the three methods.
It is clear that the variation among styles is quite small compared with that
among teachers within styles. There are negligible style-by-pre-test interactions,
though it is notable that Methods II and III consistently find larger interactions than
Method I. These interactions are therefore pooled with the error term as indicated in
Table 5. As expected, the residual variation among teachers on the test score has been
substantially reduced compared with that on the pre-test after adjustment for the
pre-test. However, the teaching style sums of squares are such that the small effects of
TABLE 5
ANCOVA OF TEST SCORES: LATENT CLASS ASSIGNMENT

Method I
Source

Method I1

df

ss

MS

ss

1
2
21

132,985
527
114

263
3871

0.39

1,778
17.5
99.4

MS

Method 111

ss

0.34

38,590
530
2.160

MS

(a) Reading

Pre-test
Among styles
P x S interaction
Residual
among teachers
Within teachers

(b) Mathematics
Pre-test
Styles
Residual
P
XS
among teachers
Within teachers

3 1 ) ~ ~
882
1
2

21,644
38,305
144,555
972
356

486

2}33
31
882

15,892
44,376

50

1
2

116,457
1,186
26

312}33
882

9,675
41,285

(c) English

Pre-test
Styles
PXS

Residual
among teachers
Within teachers

6983679
43

0.99

513
178}492

726.8

8.8
49.71

2 4 . ~ ) ~ ~ ~20;180

2,975

479.7
-

0.63
16.0

16.4

265
1.0801
673]698

69,450
670

335

870
12,500

417
435)418

0.38

0.80

TABLE 6
ADJUSTED MEAN DIFFERENCES FOR TEACHING STYLES: LATENT CLASS ASSIGNMENT

                       Reading                Mathematics               English
Method               I      II     III       I      II     III       I      II     III
Formal              0.10  -0.35   0.03      1.09   0.61   0.79      1.49   1.24   1.34
Mixed              -1.12  -0.66  -1.08     -1.62  -1.28  -1.41     -1.41  -1.25  -1.35
Informal            1.01   1.02   1.04      0.52   0.66   0.61     -0.08   0.00   0.01
Regression
  coefficient       0.77   0.85   0.80      0.95   1.18   1.15      0.76   0.87   0.82

(These estimates are obtained from $\hat\alpha_1$ and $\hat\alpha_2$ in the ANCOVA model of §2.3, $\alpha_3$ being set to zero, by
subtracting $(\hat\alpha_1 + \hat\alpha_2)/3$, to give estimates which sum to zero.)


different teaching styles are swamped by the variation among classrooms due to other
systematic effects. The largest style effects are in English using Method I, but the
F value for among-styles compared with among-teachers is only 2.02, which is not
significant. Table 6 shows the intercept differences, or adjusted mean differences,
between styles on each test, from the model with no interactions.
The direction of the differences is not consistent with those reported in TS, due
to the change in class membership of teachers resulting from the different class assignment by the latent class model. The formal classrooms do best in English and slightly
better in mathematics but the informal classes do best in reading. The mixed classrooms do worst on all tests. These differences, though interesting, are not statistically
significant.
Latent class model for change
Teaching style is not observable, but is estimated probabilistically from the 38
binary behaviour variables. It has been shown that the mixed style, Class 3, was the
lowest on all three tests, and Table 3 shows that there are very few teachers who are
unequivocally assigned to this class: only two teachers have a probability of 0.9 or
more of belonging to it, and three others a probability of 0.85 or more, though there
are seven teachers assigned to this style by the assignment rule.
The assessment of teaching style differences must allow for the certainty with
which these style assignments are made. A reasonable procedure is to fit the
ANCOVA model replacing the implicit (0, 1) dummy variables for teaching style
membership by the latent class membership probabilities. Thus if $z_p$ takes the value 1 if
the child is taught by a teacher in latent class $p$, and 0 otherwise, then $z_1$ is replaced
by $P(\text{class 1} \mid X = x)$, and $z_2$ by $P(\text{class 2} \mid X = x)$, where these probabilities are given
in Table 3. Thus for the first teacher in the table, the dummy variables $z_1$ and $z_2$ take
the values 1.00 and 0.00, as they do in the previous analysis. For the last teacher,
$z_1$ and $z_2$ take the values 0.00 and 0.07, these being the probabilities of membership in
Classes 1 and 2 for this teacher.
In the resulting ANCOVA model, $\alpha_1$ and $\alpha_2$ still have the same interpretations as
the (Class 1 - Class 3) and (Class 2 - Class 3) mean differences; the change is only in the
certainty of the identification of the class membership of each teacher.
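The substitution of probabilities for dummy variables can be sketched in a few lines. The function names below are hypothetical, and the hard assignment shown (most probable class) is only an assumed stand-in for the assignment rule, which is not restated in this section. The two example teachers use the covariate values quoted above; the Class 3 probability of the last teacher is taken to be the remainder, 0.93.

```python
def style_covariates(posterior, hard=False):
    """Return (z1, z2) for one teacher.

    posterior: (P(class 1 | x), P(class 2 | x), P(class 3 | x)).
    hard=True gives (0, 1) dummy coding by assigning the teacher to the
    most probable class (an assumed rule); hard=False uses the
    membership probabilities themselves.
    """
    if hard:
        best = max(range(3), key=lambda k: posterior[k])
        return (1.0 if best == 0 else 0.0, 1.0 if best == 1 else 0.0)
    return (posterior[0], posterior[1])


def adjusted_mean(mu3, alpha1, alpha2, z):
    """Mean implied by the model: Class 3 mean plus weighted class differences."""
    return mu3 + alpha1 * z[0] + alpha2 * z[1]


first_teacher = (1.00, 0.00, 0.00)
last_teacher = (0.00, 0.07, 0.93)

print(style_covariates(first_teacher))            # probabilities used directly
print(style_covariates(last_teacher))             # probabilities used directly
print(style_covariates(last_teacher, hard=True))  # (0, 1) coding
```

For a teacher assigned with certainty the two codings coincide, so the coefficients keep their interpretation as class mean differences; only equivocally assigned teachers contribute differently.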
The use of the probabilities of style membership instead of the (0, 1) dummy
variables, though reasonable, is only an approximation to the efficient maximum
likelihood analysis. It is analogous to the use of estimated factor scores as predictors
of a response variable, instead of the full maximum likelihood estimation of the
parameters in the combined factor and regression model (such models are discussed
in Joreskog and Goldberger (1975) and can be analysed using the LISREL package).
TABLE 7
ANCOVA OF TEST SCORES: LATENT CLASS PROBABILITIES
Method I

Method I1

Method I11

ss

MS

ss

MS

SS

132,985
693
629

346
3141

0.51

1,778
18.5
115.0

9.2
57.5

0.39
2.43t

38,590
660
2,470

330
1,235

19,740

658

69,450
1.170

585

11,830
11040

394

41,550
1,554

777

424
8,359

279

df

Source

MS

(a) Reading

Pre test
Styles
PXS
Residual
among teachers
Within teachers

(6) Mathematics
Pre-test
Styles
Pxs
Residual
among teachers
Within teachers

1
2
31
2}33

21,623
698)674
710.2
23.7
38,305
43
tinteraction significant for informal group

882
1
2
31
z>33

144,555
1,731
298
15,191

865
1491

1.84

490)~~'

2,975
36.0
524

18.0
26.2

445.4

14.7

1.23
1.78

882

"1-

0.50
I48t

1.46

(c) English

11

1
2

Pre-test
Styles
PXS
Residual
among teachers
Within teachers

1.93
1*42

-7---

17
31
.>33

9,013

291jLr3

353.9

11.4

882

* P<O.lO

2121276

**P<0.05

2.82*

TABLE 8
ADJUSTED MEAN DIFFERENCES FOR TEACHING STYLES: LATENT CLASS PROBABILITIES

                       Reading                  Mathematics               English
Method               I      II     III         I      II     III       I      II     III
Formal              0.57    See text for      1.65   0.91   1.20      2.09   1.62   1.93
Mixed              -1.75    these:           -2.78  -2.08  -2.36     -2.44  -1.92  -2.33
Informal            1.19    interaction       1.13   1.16   1.15      0.35   0.31   0.40
Regression
  coefficient       0.77                      0.95   1.18   1.14      0.76   0.86   0.81

The results of fitting the model are shown in Tables 7 and 8. The F-distributions
for variance ratio tests may be regarded as rough approximations since the probabilistic dummy variables are determined independently of the regression data.
The significance of all style differences is substantially increased. For mathematics, the differences still do not reach significance, though the contrast between the
mixed style and the formal and informal styles is more pronounced. For English, the
differences by Method I reach significance at the 5 per cent level of $F_{2,33}$, but are not as
large by the other two methods. For reading, the style by pre-test interaction with two
degrees of freedom (df) contains one component with almost all the sum of squares:
the contrast of informal with formal slopes. This single df term is significant at the
5 per cent level of $F_{1,31}$ for Method II, and is almost significant for Method III. It is
not significant by Method I.


TABLE 9
READING TEST-PRE-TEST REGRESSIONS: LATENT CLASS PROBABILITIES

Method II     Formal      $\hat{Y} = 4.1 + 1.02x$
              Mixed       $\hat{Y} = 25.5 + 0.80x$
              Informal    $\hat{Y} = 54.7 + 0.52x$
Method III    Formal      $\hat{Y} = 10.5 + 0.96x$
              Mixed       $\hat{Y} = 22.9 + 0.82x$
              Informal    $\hat{Y} = 58.9 + 0.48x$

The estimated regressions for the three groups for reading by Methods II and III
are shown in Table 9. The slope is greatest for formal, and least for informal styles,
and the parameter estimates are very similar for the two methods. For classes scoring
low on the pre-test, the informal style has a much higher mean test score: an eight-point difference for the lowest class. The cross-over between formal and informal
occurs at about x = 102, and classes scoring high on the pre-test do better under a
formal style; for the highest class, the difference is six points. The mixed regression
is in between, but is closer to the formal regression.
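The cross-over point can be recovered from the fitted lines in Table 9 by equating two regressions. The sketch below uses the rounded coefficients as printed, so its results are approximate; the names are hypothetical. With these rounded values the formal and informal lines cross at a pre-test score close to 101 for both methods, near the value of about 102 quoted in the text.

```python
# (intercept, slope) of reading test score on pre-test score x, from Table 9.
TABLE_9 = {
    "II":  {"formal": (4.1, 1.02), "mixed": (25.5, 0.80), "informal": (54.7, 0.52)},
    "III": {"formal": (10.5, 0.96), "mixed": (22.9, 0.82), "informal": (58.9, 0.48)},
}


def predict(line, x):
    """Fitted test score at pre-test score x."""
    intercept, slope = line
    return intercept + slope * x


def crossover(line_a, line_b):
    """Pre-test score at which two fitted lines give the same test score."""
    (a1, b1), (a2, b2) = line_a, line_b
    return (a2 - a1) / (b1 - b2)


for method, lines in TABLE_9.items():
    x_star = crossover(lines["formal"], lines["informal"])
    print(f"Method {method}: formal and informal lines cross near x = {x_star:.1f}")
```

Below the crossing point `predict` gives the higher score to the informal line, and above it to the formal line, matching the description in the text.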
Comparison with maximum likelihood estimation
We consider finally the estimation of the parameters of the model by maximum likelihood. Programmes for ML estimation in the unbalanced mixed model are not widely
available (BMDP has such a programme, but it was not implemented on UMRCC)
so a GENSTAT macro was developed (by Dorothy Anderson). Tests of the hypotheses of no interaction or no style main effects are based on the likelihood ratio test, and
Table 10 gives an 'analysis of deviance' table in which the entries have $\chi^2$ distributions
under the appropriate null hypothesis. This table should be compared to Table 7
where the approximate ANOVA methods were used.
None of the pre-test by style interactions is significant at the 10 per cent level.
Reading, which had the largest interaction in the class mean models, has the smallest
here. The main effect of English is significant at the 10 per cent level. No other
effects are significant. Parameter estimates for the main effect models are given in
Table 11. The pattern of mean differences is similar to that found by the ordinary
least squares Method I, but the differences are smaller.
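The significance statements for the deviances can be checked with a short calculation. The sketch below assumes that each deviance in Table 10 is referred to a chi-squared distribution on 2 df, the degrees of freedom carried by the styles and P x S lines in the ANCOVA tables; on 2 df the upper-tail probability has the closed form exp(-d/2), so only the standard library is needed. The names are hypothetical.

```python
import math


def chi2_upper_tail_2df(deviance):
    """Upper-tail probability of a chi-squared variable on 2 df: exp(-d/2)."""
    return math.exp(-deviance / 2.0)


# Deviances transcribed from Table 10.
TABLE_10 = {
    "styles, adjusted for pre-test": {"reading": 0.8, "mathematics": 2.8, "english": 5.2},
    "P x S interaction": {"reading": 0.2, "mathematics": 1.0, "english": 4.0},
}

for source, row in TABLE_10.items():
    for test, deviance in row.items():
        p = chi2_upper_tail_2df(deviance)
        print(f"{source:30s} {test:12s} deviance = {deviance:3.1f}  P = {p:.3f}")
```

Under this assumption only the English main effect falls below the 10 per cent level, and none of the interactions does.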
It should be noted that the class mean method of parameter estimation results in
a serious loss of efficiency. In the case of reading a misleading interaction appears,
and the conclusions about the relative differences between styles are incorrect. Since
the class mean method is based on only 36 'observations', the possibility of random
fluctuations among classes producing misleading results is quite high, and this method
cannot be regarded as a satisfactory substitute for ML estimation when the number of
classrooms is small.
TABLE 10
ANALYSIS OF DEVIANCE OF TEST SCORES: LATENT CLASS PROBABILITIES

                                            Deviance
Source                     df    (a) Reading    (b) Mathematics    (c) English
Styles, adjusted
  for pre-test              2        0.8              2.8              5.2*
P x S                       2        0.2              1.0              4.0

* P < 0.10

TABLE 11
ADJUSTED MEAN DIFFERENCES FOR TEACHING STYLES BY RESTRICTED MAXIMUM LIKELIHOOD:
LATENT CLASS PROBABILITIES

               Reading    Mathematics    English
Formal           0.15         1.33         1.91
Mixed           -1.29        -2.56        -2.18
Informal         1.14         1.22         0.27

Conclusion
The teaching style differences in achievement which were found in TS are modified
by the re-analysis. There are two reasons for this. First, the analysis of covariance
model which includes the random effect of teachers results in greatly reduced significance of any differences, because of the large random variation among teachers.
Second, the clustering of teachers by the latent class model changes the nature of the
differences between teaching styles.
The only significant teaching style differences are in English, where the formal
style has the highest mean, mixed the lowest, and informal is in the middle. In
mathematics, the formal and informal styles are close, and substantially above the
mixed style. In reading informal has the highest mean, mixed the lowest, and formal
is in the middle. Though the differences may appear small, the four-point difference
between formal and mixed in reading corresponds to a 6 to 8 months difference in
reading age. It is of interest that the mixed style which was distinguished in the
cluster analysis by a relatively high frequency of disciplinary problems, and by the
lowest use of formal testing, gives consistently the worst results in the achievement
model.

RECOMMENDATIONS
The re-analysis of the TS data discussed in this paper raises important issues for
the design and analysis of future educational research studies of this kind.
First, research designs using multi-stage sampling of schools and classrooms are
natural and administratively feasible. The examination of intact classrooms for
teacher or pupil differences does raise difficult statistical problems, but the formidable
difficulties of randomised experiments in a school administrative context mean that
non-randomised observational studies will remain important in educational research.
When intact classrooms are the effective experimental or quasi-experimental unit,
but outcomes are measured on pupils in the classroom, the correlation between pupils
within a classroom must be allowed for by a suitable variance component model.
Such models are necessary for multi-stage sampling procedures of all kinds in general
survey designs. It should be clear from the discussion in Section 2 that the effective
sample size for testing the significance of effects at the teacher level is the number of
classrooms in the study, and this number should therefore be as large as possible:
many classrooms with few pupils in each will give much greater power than few
classrooms with many pupils. Financial constraints obviously impose a limit on the
possible number of classrooms, but a small number of classrooms is likely to result
in low power and the failure to find differences. Only the four-point difference in
reading was statistically significant, but smaller differences than this are educationally
significant.
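The advantage of many small classrooms over few large ones follows from the variance of a group mean in a balanced two-stage sample, which is the teacher component divided by the number of classrooms plus the pupil component divided by the total number of pupils. The sketch below applies this to the reading pre-test estimates in Table 4; the two designs compared are hypothetical and chosen only to have the same total number of pupils.

```python
def variance_of_group_mean(var_teacher, var_pupil, n_classrooms, pupils_per_class):
    """Variance of a group mean with equal class sizes:
    var_teacher / J + var_pupil / (J * n)."""
    total_pupils = n_classrooms * pupils_per_class
    return var_teacher / n_classrooms + var_pupil / total_pupils


# Reading pre-test variance component estimates from Table 4.
VAR_TEACHER, VAR_PUPIL = 61.0, 185.0

# Two hypothetical designs, each with 360 pupils in total.
few_large = variance_of_group_mean(VAR_TEACHER, VAR_PUPIL, n_classrooms=12, pupils_per_class=30)
many_small = variance_of_group_mean(VAR_TEACHER, VAR_PUPIL, n_classrooms=36, pupils_per_class=10)

print(f"12 classrooms of 30 pupils: variance of mean = {few_large:.2f}")
print(f"36 classrooms of 10 pupils: variance of mean = {many_small:.2f}")
```

The pupil term is identical in the two designs; the whole gain comes from dividing the teacher component by a larger number of classrooms.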
The approximate methods of analysis described in Section 2 are not satisfactory
alternatives to full maximum likelihood estimation in the variance component model.
In particular, the use of class means results in a very serious loss of efficiency in
estimating effects at the pupil level (for example, the regression of test on pre-test, or
the size of sex differences).
A major gap exists in statistical packages in this area: BMDP is the only package
available which has a maximum likelihood programme, and implementation of this
programme at UMRCC has been substantially delayed. Efficient methods for the
analysis of multi-stage sample designs cannot become generally used without good
general programmes.
In non-randomised observational studies, many sources of potential bias are
present. We cannot interpret effects of interest (like teaching style differences) in
such studies as though they had arisen from properly randomised experimental studies.
The best that can be done is to measure other possible confounding variables,
and to allow for their effects through covariance analysis. This 'statistical control'
is never a substitute for 'experimental control' through randomisation. In our
interpretation of teaching style effects, we noted that confounding of pre-test score
with teaching style did not occur, and so tentative conclusions could be drawn about
the 'effects' of different styles. However there are many other possible confounding
variables, some of which were discussed by Gray and Satterly (1976) and Bennett and
Entwistle (1976), and in TS itself, and so the interpretation of different teaching styles
as a cause of differential achievement should not be pushed too far. An important
implication of non-randomised studies is the need for measurement of a large number
of possible confounding variables, and the resulting complexity of the statistical
models which need to be fitted.
In the discussion of clustering in Section 1, great emphasis was placed on the
latent class model. Latent variable models are essential in the analysis of studies of
this kind, in which a large amount of information (38 items here) is available about
each teacher, but the number of teachers used in the second stage is relatively small.
It would not be possible to use the 38 items as explanatory variables in a regression of
test score on pre-test and the items, for there are more items than teachers in the second
stage. The items are treated as indicators of an underlying latent style of teaching,
and so the dimensionality of the teacher information is reduced from 38 to two (the
two dummy variables needed for three styles).
It is worth emphasising again that clustering methods must be based on statistical
models if they are to have any validity. Cluster algorithms based on distance functions
which bear no relation to the type of data considered, or to any probability considerations, cannot be expected to produce clusters which have any statistical validity. The
probabilistic nature of cluster membership is an essential feature of the statistical
model, and formal assignments to clusters from standard algorithms overstate the
real information available from clustering. (In the re-analysis, the addition of the
random error involved in producing a formal assignment actually reduces the differences among styles.) It is important to note that effective clustering requires a sample
size which is large relative to the number of descriptive items used. If the sample size
is not large relative to the number of items, the occurrence of multiple local maxima
of the likelihood function indicates that there may be several different configurations
of clusters which are equally well supported by the data. It would have been pointless
to attempt to cluster a small sample of, say, 50 teachers using 38 binary items, or even
ten items. Again financial constraints limit the possible sample size in observational
studies. There are very few programmes available for probabilistic clustering. The
normal mixture model was described by Wolfe (1970), and a FORTRAN listing for
this model is given in the book on cluster analysis by Hartigan (1975). Goodman (1978, p. 468) gives a brief reference to a programme for maximum likelihood
latent structure analysis.
Simple macros in GLIM and GENSTAT can be written for ML estimation of
the parameters in general mixture models, including the latent class model of Section
1. Programme listings can be obtained from the Centre for Applied Statistics at
Lancaster.
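To indicate how little is needed for such a programme, the following is a minimal sketch of ML estimation for a latent class model with binary items by the EM algorithm. It is not the GLIM or GENSTAT macro referred to above: it is a generic illustration that assumes items are independent within a class, and the function name, the starting values and the simulated data are all hypothetical.

```python
import math
import random


def fit_latent_classes(data, n_classes, n_iter=100, seed=0):
    """EM for a latent class model with binary items.

    data: list of equal-length lists of 0/1 responses.
    Returns (class weights, item probabilities per class,
    posterior membership probabilities, log-likelihood trace).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    # Random starting values; other seeds may reach other local maxima.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each respondent.
        posteriors, loglik = [], 0.0
        for x in data:
            joint = []
            for k in range(n_classes):
                p = weights[k]
                for j, xj in enumerate(x):
                    p *= theta[k][j] if xj else 1.0 - theta[k][j]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            posteriors.append([p / total for p in joint])
        trace.append(loglik)
        # M-step: posterior-weighted proportions, kept away from 0 and 1.
        for k in range(n_classes):
            mass = max(sum(post[k] for post in posteriors), 1e-12)
            weights[k] = mass / len(data)
            for j in range(n_items):
                hits = sum(post[k] * x[j] for post, x in zip(posteriors, data))
                theta[k][j] = min(max(hits / mass, 1e-6), 1.0 - 1e-6)
    return weights, theta, posteriors, trace


# Simulated example: two groups of 100 respondents answering six items.
sim = random.Random(1)
responses = ([[1 if sim.random() < 0.8 else 0 for _ in range(6)] for _ in range(100)]
             + [[1 if sim.random() < 0.2 else 0 for _ in range(6)] for _ in range(100)])
weights, theta, posteriors, trace = fit_latent_classes(responses, n_classes=2)
print([round(w, 2) for w in weights])
print([round(p, 2) for p in posteriors[0]])
```

The posterior probabilities returned for each respondent play the role of the class membership probabilities used in place of dummy variables. Running the fit from several seeds is one simple check for the multiple local maxima mentioned earlier.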
The final comments are on the importance of statistical computing and statistical
modelling in the analysis of educational studies of this kind, and indeed of any studies
involving multi-stage sampling. The statistical theory of the analysis of unbalanced
mixed models has been established for at least 10 years, but only recently have any
computer programmes been developed which are suitable for the analysis of such
studies. Such programmes are still not generally available, and a pressing need
exists for the development of general-purpose programmes or sub-routines which can
handle the latent variable models on which clustering methods should be based. Such
programmes are under development at Lancaster in the Complex Social Data research
programme supported by the SSRC.
The importance of statistical modelling is more general. The discussion of
'class' versus 'pupil' as the 'unit of measurement' can be resolved by answering
the question, "What is the appropriate statistical model for data from a multi-stage
sample?" This is even more important for cluster models, which are much less
developed statistically.
ACKNOWLEDGMENT.-This study was supported by SSRC (Grant HR5710).

REFERENCES
AITKIN, M. A. (1979). Dealing with survey data. Br. J. educ. Psychol., 49, 198-205.
AITKIN, M. A., and BENNETT, S. N. (1980). A Theoretical and Practical Investigation into the Analysis
of Change in Classroom Based Research. Final report on SSRC grant HR5710.
AITKIN, M. A., ANDERSON, D. A., and HINDE, J. P. (1981). Statistical modelling of data on
teaching styles. J. Roy. statist. Soc., Series A & B, 144, (in press).
BARKER LUNN, J. C. (1970). Streaming in the Primary School. Slough: NFER.
BENNETT, S. N. (1976). Teaching Styles and Pupil Progress. London: Open Books.
BENNETT, S. N., and ENTWISTLE, N. J. (1976). Rite and wrong. A reply to 'A Chapter of Errors'.
Educ. Res., 19, 217-222.
BENNETT, S. N., and JORDAN, J. (1975). A typology of teaching styles in primary schools. Br.
J. educ. Psychol., 45, 20-28.
EVERITT, B. S. (1977). Cluster analysis. In O'MUIRCHEARTAIGH, C. A., and PAYNE, C. (Eds.),
The Analysis of Survey Data. Vols. I and II. Chichester: Wiley.
GALTON, M., SIMON, B., and CROLL, P. (1980). Inside the Primary School. London: Routledge and
Kegan Paul.
GOODMAN, L. A. (1978). Analyzing Qualitative/Categorical Data. London: Addison-Wesley.
GRAY, J., and SATTERLY, D. (1976). A chapter of errors. Educ. Res., 19, 45-56.
HARTIGAN, J. A. (1975). Clustering Algorithms. New York: Wiley.
HARTIGAN, J. A. (1977). Distribution problems in clustering. In VAN RYZIN, J. (Ed.), Classification
and Clustering. New York: Academic Press.
JORESKOG, K. G., and GOLDBERGER, A. S. (1975). Estimation of a model with multiple indicators and
multiple causes of a single latent variable. J. American statist. Assoc., 70, 631-639.
SATTERLY, D., and GRAY, J. (1976). Two Statistical Problems in Classroom Research. Unpublished.
SEARLE, S. R. (1971). Linear Models. New York: Wiley.
SOLOMON, D., and KENDALL, A. J. (1979). Children in Classrooms. New York: Praeger.
WOLFE, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multiv. Behav. Res., 5,
329-350.
(Manuscript received 10th October, 1980)
