
G Model

JSEE-539; No. of Pages 10

Studies in Educational Evaluation xxx (2014) xxx–xxx

Contents lists available at ScienceDirect

Studies in Educational Evaluation


journal homepage: www.elsevier.com/stueduc

Teaching skills of student teachers: Calibration of an evaluation instrument and its value in predicting student academic engagement
Wim van de Grift, Michelle Helms-Lorenz, Ridwan Maulana *
Department of Teacher Education, University of Groningen, The Netherlands

ARTICLE INFO

Article history:
Received 8 May 2014
Received in revised form 12 August 2014
Accepted 26 September 2014
Available online xxx

Keywords:
Teaching skills
Evaluation instrument
Rasch model
Student teachers
Secondary education

ABSTRACT

Student teachers are expected to develop their teaching skills sooner and more rapidly. However, sound evaluation instruments that can be used to diagnose and monitor skill levels in support of the formative assessment of student teachers are still scarce. This article aims to calibrate and validate a teaching skill evaluation instrument for use in secondary education. A total of 264 student teachers in the Netherlands participated in the study. Rasch and multilevel analyses were used. Results suggest that the evaluation instrument meets the restrictive assumptions of the Rasch model and has predictive value for academic engagement. This adds validation evidence and justifies the calibration of the evaluation instrument for monitoring the development of teachers' teaching skills.
© 2014 Elsevier Ltd. All rights reserved.

* Corresponding author at: Department of Teacher Education, University of Groningen, Grote Kruisstraat 2/1, 9712 TS Groningen, The Netherlands. Tel.: +31 503639120.
E-mail addresses: r.maulana@rug.nl, mridwanmaulana@yahoo.com (R. Maulana).

http://dx.doi.org/10.1016/j.stueduc.2014.09.003
0191-491X/© 2014 Elsevier Ltd. All rights reserved.

Please cite this article in press as: W. van de Grift, et al. Teaching skills of student teachers: Calibration of an evaluation instrument and its value in predicting student academic engagement. Studies in Educational Evaluation (2014), http://dx.doi.org/10.1016/j.stueduc.2014.09.003

Background

There are various important reasons why it is essential for beginning teachers to reach higher levels of teaching competence sooner and more rapidly. The first is the ageing teaching body in many countries, with many experienced teachers likely to leave the profession in the coming years. Although the beginning teachers who will replace the experienced teachers are labelled “competent to start teaching”, there is a big gap between their teaching skill levels and those of experienced teachers. In the Netherlands, for example, the basic teaching skills of many beginning secondary teachers lag a half to a whole standard deviation behind those of teachers with 15 years of experience (Van de Grift & Helms-Lorenz, 2013). In this cross-sectional study, the teaching skills of more than 1600 teachers were observed with observation scales used in earlier studies (Van de Grift, 2007, 2013). More details can be found in Van de Grift and Helms-Lorenz (2012). Unless the competence of beginning teachers is enhanced, the quality of teaching will drop in many countries, with the consequence of lowering pupil achievement, as a large number of experienced teachers are replaced by new inexperienced teachers.

A second reason is the fact that many beginning teachers leave their first job or quit the profession altogether because their level of basic teaching skills is still far behind that of their experienced colleagues (Van de Grift & Helms-Lorenz, 2012). Beginning teachers who more rapidly attain a high level of teaching competence will also sooner build up confidence in their own performance and experience less occupational stress, which means they will be less inclined to leave the profession (Helms-Lorenz, Slof, & Van de Grift, 2013; Helms-Lorenz, Slof, Vermue, & Canrinus, 2012).

A third reason for rapidly boosting the teaching skills of beginning teachers relates to student outcomes. Several international comparative studies show that, on average, Dutch 15-year-old students score relatively high. However, the percentage of poorly performing secondary school students in the Netherlands increased between 2000 and 2009, while the percentage achieving excellence fell (Mullis, Martin, Foy, & Drucker, 2012; OECD, 2010). To combat this, more teachers are needed who can help weaker students to at least achieve certain minimum goals and who can prevent gifted students from underachieving. This is a difficult skill for many teachers, including those at the start of their career (Van de Grift & Helms-Lorenz, 2013). This adaptive skill has been shown to be lacking in many countries (OECD, 2010).

A fourth reason is that various European countries, including the Netherlands, have seen growing numbers of students entering higher tracks of secondary school in recent years (OECD, 2012). In 1990, 32% of students in the Netherlands went to senior general

secondary education (HAVO) or pre-university (VWO). In 2009, the percentage increased to 44%. As a result, teachers working in lower tracks of secondary education have in recent years seen the “better” students go to higher school tracks. It also means that teachers in higher tracks of secondary schools now face more heterogeneous class compositions than in the past. This calls for more complex teaching skills, such as helping weaker students attain at least certain minimum goals, activating and motivating students, teaching students how to learn, and encouraging (gifted) students to achieve excellence. This situation will intensify in the future as the number of students entering secondary school will decline in some western European countries. In the Netherlands, for example, the number of 12-year-olds will fall from 206,504 in 2013 to 182,106 in 2020, a 12% drop (Statline CBS, 2013). This will lead to classroom populations with more heterogeneous ability levels.

This is why it is so important to boost the competence level of new teachers and to accelerate the development of their teaching skills. In order to achieve this, we need an accurate and valid evaluation instrument to establish and monitor the skill level of teachers both during and after their training. The validity of an evaluation instrument is supported when the construct measured can predict an outcome measure (predictive validity). Hence, it is important to investigate the relationship between teaching skills as measured by the evaluation instrument and an external criterion like student academic engagement. The present study focuses on these two aims.

Theoretical framework

Since the 1970s, teacher education institutions and inspectorates of education have used observation instruments to evaluate whether and to what extent student teachers are competent to begin teaching (e.g., Van de Grift, Van der Wal, & Torenbeek, 2011). Almost without exception, these tools are not based on teacher behaviour that has been shown to correlate with student achievement. Nor, generally, have the reliability and validity of the evaluation instruments been investigated or demonstrated, and no calibration studies have been carried out. Exceptions are the observation instruments developed by Van de Grift and Lam (1998), Kyriakides, Creemers and Antoniou (2009) and the Dutch, Scottish and English education inspectorates (Dutch Inspectorate of Education, 1998; HM Inspectorate of Education, 2001; Ofsted, 1995).

Despite the availability of various observation tools, the research literature lacks a strong and tested theory of how teachers develop the general teaching and methodological skills to promote learning and learning gains among their students. However, there is Fuller's theory (1969, 1970) on the development of what she called “teacher concerns”. This theory describes the peaks of the iceberg, but does not clarify how teachers progress from one stage to another. Fuller posited a model of teacher development in which teachers move from concerns about self, to concerns about tasks, to concerns about the impact they have on students. In the first stage, they are less concerned with their classroom practice than with questions such as “Is teaching the right job for me?” and “Do I have anything to offer the teaching profession?” Besides feelings of self-adequacy, concerns about relations with students dominate student teachers at this stage as well (Watzke, 2007). Student teachers go through this stage of self-concerns at the start of their training and this usually leads to a decision about whether or not to continue with their training. Their next set of concerns relates to knowledge of the subject matter and, in particular, to tasks such as classroom management and instructional methods (Watzke, 2007). These task concerns, to use Fuller's term, continue to play a role during training and during the initial period on the job. In the third stage, teachers are concerned with their impact on students and with what students are taught: “What do students need and what has to be done to help them develop?” Fuller calls these impact concerns. At this stage, teachers are concerned about “guiding, challenging, and meeting the diverse needs of students” (Schipull, 1990, p. 11). Fuller believes these concerns do not come into play until teachers have put task concerns behind them. Reaching this stage is important to be able to adapt teaching to the needs of students. Although this theory has inspired a good deal of scientific work, it has never been tested widely.

A considerable number of studies on teacher effectiveness and learning environments show that approximately 15–25% of the variance in student achievement (after correction for pre-tests and various student background characteristics) can be explained by curriculum quality, the amount of learning time, and various teaching skills including the creation of a safe and stimulating environment for students, efficient classroom management, the quality of instruction, teaching students how to learn, monitoring student progress, adapting teaching to student differences, and attention to students at risk of falling behind (Creemers, 1994; Hattie, 2012; Levine & Lezotte, 1995; Marzano, Pickering & Pollock, 2001; Purkey & Smith, 1983; Sammons, Hillman & Mortimore, 1995; Scheerens, 1992; Scheerens & Bosker, 1997; Walberg & Haertel, 1992). Some characteristics lend themselves to observation in the classroom, while others are best investigated by means of questionnaires. What follows is a brief summary of the research literature on teaching skills that lend themselves to observation and that have proven effective for student learning gains.

A safe learning climate characterized by a relaxed atmosphere and mutual respect that encourages student self-confidence has been shown to produce good student outcomes (Cornelius-White, 2007; Fraser, 1985; Hattie & Clinton, 2008; Smith, Baker, Hattie & Bond, 2008; Wilkinson & Fung, 2002). Efficient classroom management, such as starting and finishing the lesson on time, efficient transitions between lessons, no wasting of time, no queues at the teacher's desk, a well-structured lesson and maintaining order, has a significant relationship with student achievement (Brophy, 1979; Carnine, Dixon & Silbert, 1998; Creemers, 1994; Houtveen, Booij, De Jong, & Van de Grift, 1999; Marzano, 2003; Wang, Reynolds & Walberg, 1995; Yair, 2000). Furthermore, clear instruction includes setting clear lesson objectives, checking whether these objectives are achieved, activating prior knowledge, having a clear lesson structure and effectively alternating explanation, presentation, independent work, group work and individual help, phasing instruction and the processing of subject matter, giving clear examples, checking whether the subject matter is understood and that tasks are carried out properly, and offering immediate feedback if this is not the case (Creemers, 1994; Hattie & Clinton, 2008; Kameenui & Carnine, 1998; Locke & Latham, 1990; Pearson & Fielding, 1991; Rosenshine & Meister, 1997; Smith et al., 2008). The literature mentions “evidence-based guidelines” for these instructional principles, which are referred to as “direct” or “explicit” instruction (Rosenshine & Stevens, 1986).

Next, activating students, intensifying instruction and avoiding too much unproductive time have been shown to correlate with student achievement (Hampton & Reiser, 2004; Lang & Kersting, 2007). The extent to which students feel engaged by lesson content, the activation of prior knowledge and the use of “advance organizers” correlate with student achievement (Nunes & Bryant, 1996; Pressley et al., 1992). And as early as the 1990s, the use of modern media and visual representations was shown to correlate with achievement (Hiebert, Wearne & Taber, 1991; Kozma, 1991; Mayer & Gallini, 1990). Evertson, Anderson, Anderson, and Brophy (1980) found a significant correlation between student achievement and class discussions, the level of teacher contributions and


the extent to which teachers accepted students' ideas, especially in arithmetic and mathematics (and to a lesser degree in mother-tongue teaching). More recent studies have looked at cooperative learning and task-based student interaction (Meeuwisse, Severiens, & Born, 2010). In studies of mathematics and physics lessons there has been an increasing emphasis on boosting student participation and ‘discursive’ task-based student discourse (Mortimer & Scott, 2000; Peressini & Knuth, 1998; Sherin, 2002).

Teaching learning strategies, like providing temporary forms of support, or “scaffolds”, which help students bridge the gap between present skills and the skills they need to learn, improves student achievement (Bickhard, 1992; Carnine et al., 1998; Hattie & Clinton, 2008; Houtveen & Van de Grift, 2007; Palincsar & Brown, 1984; Rosenshine & Meister, 1997; Slavin, 1987; Smith et al., 2008). Finally, adapting teaching to student differences produces positive outcomes for weak students especially. Adaptation could involve additional instruction, additional learning time, pre-teaching and re-teaching (Dutch Inspectorate of Education, 1998; Houtveen et al., 1999; Lundberg & Linnakylä, 1992; Pearson & Fielding, 1991). The six domains of teaching quality were used conceptually and empirically in the work of Van de Grift (2007, 2013).

The three stages of teachers' concerns (Fuller, 1969, 1970) are theoretically consistent with the six domains of teaching quality mentioned above. Fuller's stage theory suggests that teachers' focus would develop from self-related, to task-related, and then to student or impact-related concerns. In line with the research of Watzke (2007) on beginning teachers' stage of concerns, the six domains of teaching quality proposed by Van de Grift (2007, 2013) can be framed within self-stage-related groupings. Learning climate covers aspects of teacher-student relationships and is associated with self-related concerns. Classroom management deals with classroom organization and misbehaviour, while clarity of instruction covers aspects of instructional clarity. These two teaching domains are related to task-related concerns. Finally, activating learning, adaptation of teaching, and teaching learning strategies address teaching behaviour associated with teaching within the broader context of student socio-emotional well-being, motivation, and academic growth. These three domains are conceptually relevant to impact-related concerns.

Aim of the current study

A major source of inspiration was Fuller's theory (1969, 1970), which led us to hypothesize that mastering basic skills such as creating a safe learning environment and efficient classroom management is a precondition for demonstrating the more complex teaching skills like adapting teaching to student differences. Based on the preliminary work of Van de Grift et al. (2011) using the sample of primary school teachers, we expect that the same patterning will emerge with the sample of secondary school teachers: learning climate, classroom management, and clarity of instruction will fall into less complex teaching domains, while activating learning, differentiation, and teaching learning strategy will fall into more complex teaching domains. If Fuller's (1969, 1970) stages of concern and Van de Grift's (2007, 2013) domains of teaching are theoretically connected, this should be visible empirically as well. In relation to this, we expect that the first and second stage of concerns (self- and task-related) will correspond to less complex domains of teaching skill (learning climate, classroom management, clarity of instruction), while the third concern (impact-related) will correspond to more complex domains of teaching skill (activating learning, adaptive teaching, teaching learning strategy).

Previously, Van de Grift and Lam (1998) developed the first version of the evaluation instrument, in the form of an observation tool, for primary schools in the Netherlands and tested its reliability, validity and predictive value for student achievement. That instrument then served as the basis for the observation instruments used by the Dutch Inspectorate of Education. These inspection instruments were shown to correlate positively with student achievement that had been corrected for socioeconomic background and to correlate negatively with the percentage of students who lagged one or more years behind their peers (Van de Grift & Houtveen, 2006). The evaluation instrument's reliability and cross-cultural validity have also been tested through use by education inspectors in England, Scotland, Flanders, Slovakia and Lower Saxony (Van de Grift, 2007, 2013). In an international comparative study among Dutch, Flemish, Scottish, Lower Saxonian, and Slovakian primary school teachers, a new version of the evaluation instrument was constructed that met the stringent criteria of the Rasch model (Van de Grift et al., 2011). The established instrument can therefore be used to accurately measure the skill level of primary school teachers.

In the present study the evaluation instrument was adapted for use in secondary education, with the aim of deploying it in teacher training and induction of secondary school teachers. This article describes the psychometric properties of the secondary version of the evaluation instrument and discusses the general skilfulness level of student teachers.

Method

Study sample

We used a random sample of 264 student teachers (118 male and 146 female) from 64 schools who were prepared to take part in our study. About 80% of them were student teachers and 20% were student teachers with teaching experience. Table 1 shows the composition of the study sample.

About 32% of the student teachers reported that they taught a humanities subject (including philosophy), 29.9% a science subject (including biology) and 35.6% a social science subject (including ‘other’ subjects). Almost 34% of these student teachers were undertaking their teaching practice at an official teacher training school. The student teachers worked in classes ranging in size from 10 to 40, with a mean of 21.4 and a standard deviation of 5.9.

Table 1
Composition of study sample.

 |  | Abs | %
Student teachers | Masters of education | 25 | 9.5
Student teachers | Bachelors of education | 186 | 70.5
Student (experienced) teachers | Certified bachelors of education working on their professional master of education | 37 | 14.0
Student (experienced) teachers | Uncertified experienced teachers working on their bachelor of education | 15 | 6.0
Total |  | 264 | 100.0

Instrument and procedure

A 32-item event-sampling evaluation instrument was used to observe the student teachers' classroom practice. The instrument


covers all six domains of teaching skills reviewed earlier (see Table 2 for an example).

Student academic engagement in the lesson was rated by observers by means of a three-item scale developed by Van de Grift (2007). This scale measures students' academic engagement with the emphasis on psychological and behavioural engagement (Appleton, Christenson, & Furlong, 2008). Examples of items are: “Students are fully engaged in learning” and “Students show that they are interested in learning”. Observers rated the items on a four-point response scale, ranging from 1 (Completely not true) to 4 (Completely true). For the present sample, the reliability of the student engagement measure is 0.89, indicating good internal consistency.

Specially trained observers carried out the evaluations of the student teachers' classroom practice and student academic engagement. The training involved an explanation of the observation instruments and a discussion about how to evaluate teaching practices with the associated scoring rules. The observers then observed and scored three digitally recorded lessons. After each observed lesson both the percentage consensus of the observers and the extent to which the observers agreed with previously set criteria were established. Observers who differed in more than 30% of instances from the scores of fellow observers or from the set criteria were not invited to take part in the study. Observers who reached a consensus higher than 0.70 were sent to observe and evaluate student teachers' teaching practices.

Data analysis

Given that item response theory, and the Rasch model (Rasch, 1960, 1961) in particular, offers unique possibilities for arranging teaching skill scores on a single dimension and therefore for accurately estimating the skill level of individuals on a latent variable, this was the psychometric model that we opted for. The Rasch model also offers simpler possibilities for interpreting results than does Birnbaum's (1968) two-parameter model, for instance. However, the Rasch model is a stringent model. It requires the data to satisfy the assumptions of unidimensionality, local stochastic independence of items, and parallel item characteristic curves. We therefore checked whether the observations and evaluations of student teachers made with this instrument met these assumptions. We then checked whether the evaluation instrument satisfied several special use requirements that are important in the training of student teachers.

Model-data fit analyses were carried out using several statistical programmes. First, we used PML (Gustafsson, 1977) to perform the Andersen test and the Martin-Löf test. Because the tests available in PML are limited to traditional log-likelihood tests, we used WINMIRA (Von Davier, 1994) to check whether more contemporary fit statistics would yield results similar to those of PML. Next, we used PARSCALE (Muraki & Bock, 1985–2002) to fit the Birnbaum model and estimate the slope for each item. Furthermore, we conducted one-factor confirmatory factor analyses (CFA) using MPLUS (Muthén & Muthén, 1998–2012), in combination with a local dependence test (LD test) using IRTPRO (Cai, Thissen, & Du Toit, 2005–2013), to check the assumption of local stochastic independence. CFA provides preliminary information about items violating the local stochastic independence assumption, while the LD test functions as a confirmation test. Although a single statistical package is often adequate for conducting a Rasch analysis, it is sometimes necessary to use multiple statistical tools to arrive at the most representative conclusion. As in this study, the statistical tools typically complement each other in producing information about the psychometric quality of the measure. Additionally, the skilfulness level of student teachers was determined.

To explore the predictive validity of the evaluation instrument, we analyzed whether there was a significant relationship between observed student teachers' teaching skill and student academic engagement in the lesson using multilevel analyses (Snijders & Bosker, 2012).

Results

Unidimensionality of the teaching skill construct

As discussed in the theoretical section, teaching skill can be distinguished conceptually into the six domains. Based on correlational analyses, the mean inter-scale correlations of the current data range between 0.59 (learning climate) and 0.69 (classroom management and activating learning). This means that although the domains of teaching skills overlap, the scales could measure distinct aspects of teaching skill sufficiently. Furthermore, rather strong mean inter-scale correlations suggest that the six domains form a higher order latent construct called teaching skill, supporting a one-dimensional construct.¹ Because our main aim is to construct an evaluation measure that satisfies the stringent unidimensional Rasch model, we began our analysis by checking the unidimensionality of the teaching skill construct.

To check whether the 32 teaching skill items together form a unidimensional latent construct, we used factor analysis to make a scree plot of the eigenvalues based on the tetrachoric correlation matrix of items. The scree plot clearly shows one dominant factor, which indicates that the assumption of unidimensionality is reasonable (see Fig. 1). We can observe that the last eigenvalues are negative. For polytomous items, negative eigenvalues can point to a misspecification of the model. For a tetrachoric correlation matrix (of dichotomous variables, in other words), negative eigenvalues seem unavoidable (Muthén & Muthén, 1998–2012). However, the values are so close to 0 that we cannot attach any importance to them.

Another way to evaluate the assumption of unidimensionality of the scale is to check whether factors other than the intended latent dimension (teaching skill) affect the distribution indicated by the items. This too is important, because the evaluation instrument must be suitable for use with student teachers who have different characteristics and whose teaching practice takes place in different situations; these characteristics must not influence the distribution of the Rasch scale. Various analyses that we conducted using Andersen's (1973, 1977) log-likelihood ratio test showed that the distribution of the items on the scale was invariant across the following groups of student teachers: those at the beginning versus the end of their training (χ²(30) = 38.73, p = .14); bachelor versus master students undertaking pre- or in-service training (χ²(58) = 71.01, p = .12); males versus females (χ²(30) = 23.76, p = .78); those undertaking their teaching practice at an official versus a non-official training school (χ²(30) = 25.16, p = .72); those working with large versus small classes (χ²(30) = 25.54, p = .65); and those teaching humanities, science or social science subjects (χ²(60) = 68.10, p = .22). These outcomes make the instrument especially useful for evaluating the skills of student teachers across a range of personal and group characteristics.

¹ Clustering the dimension of teaching skill in domains is practical for didactical and conceptual purposes only. Clustering of behaviour into domains, and breaking the domains down into components and elements, does not mean or suggest that these behavioural domains are not interrelated (Danielson, 2013). The simplification of teacher behaviour can help teachers develop their teaching behaviour by focusing on the domains they need to pay attention to (zone of proximal development). The domains proposed by the mentioned author show content overlap and are related, as all the behaviours contribute to one common dimension: teaching skill.
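
As a reading aid for the model choice described under Data analysis, the two item response models can be written out explicitly. The notation is an assumption made for illustration: θ_v stands for the skill level of student teacher v, and b_i and a_i stand for the difficulty and slope of item i, following the b and a labels this article uses for the item parameters. The formulas are the standard textbook forms of the Rasch model and of Birnbaum's two-parameter model, not a transcription from the software the authors used.

```latex
% Rasch model: every item has the same slope, so the curves differ only in location b_i
P(X_{vi} = 1 \mid \theta_v, b_i)
  = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}

% Birnbaum's two-parameter model: each item additionally has its own slope a_i
P(X_{vi} = 1 \mid \theta_v, a_i, b_i)
  = \frac{\exp\bigl(a_i(\theta_v - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_v - b_i)\bigr)}
```

Setting a_i = 1 for every item reduces the second expression to the first. This is why the Rasch model implies parallel item characteristic curves, and why an estimated slope far from 1 for a single item is read as a sign of misfit.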


[Figure: scree plot; vertical axis: eigenvalue]
Eigenvalues (in descending order): 16.45 2.435 1.967 1.673 1.371 1.193 1.068 0.8955 0.791 0.776 0.672 0.565 0.494 0.482 0.451 0.359 0.291 0.282 0.237 0.215 0.162 0.126 0.042 0.018 -0.01 -0.02 -0.08 -0.1 -0.13 -0.15 -0.19 -0.32

Fig. 1. Scree plot of the eigenvalues of the correlation matrix of 32 items.
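
A rough numerical reading of the scree plot can be sketched as follows. The sketch assumes only the first three eigenvalues as printed with Fig. 1 and the fact that the eigenvalues of a 32 × 32 correlation matrix sum to 32 (its trace). The variable names are illustrative, and this is a back-of-the-envelope check rather than part of the authors' analysis.

```python
# First three eigenvalues as printed with Fig. 1; the remaining values are not needed here.
leading_eigenvalues = [16.45, 2.435, 1.967]

# A correlation matrix has ones on its diagonal, so its eigenvalues sum to the number of items.
n_items = 32

# Share of the total carried by the first factor, and how far it stands above the second.
first_share = leading_eigenvalues[0] / n_items
first_to_second = leading_eigenvalues[0] / leading_eigenvalues[1]

print(f"share of the eigenvalue total on the first factor: {first_share:.3f}")
print(f"ratio of first to second eigenvalue: {first_to_second:.2f}")
```

A first eigenvalue that carries roughly half of the total and is several times larger than the second is the pattern usually described as one dominant factor.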

Local stochastic independence asymptote in Fig. 2) that this item is susceptible to guessing. In
retrospect, the fact that the items ‘‘adapts instruction to relevant
The assumption of local stochastic independence involves the student differences’’ and ‘‘adapts processing of subject matter to
correlations between the items disappearing once the effect of the student differences’’ are still correlated after the effect of the
intended latent variable (teaching skill) has been partialled out. latent skill has been partialled out is not really surprising because
We therefore used confirmatory factor analysis (with the Mplus it would not be very consistent for a teacher to exhibit one
program) to check the item correlations after the effect of the behaviour but not the other.
latent skill was partialled out. We formulated a one-factor model in which all residual
correlations were set at 0. Results show that the Tucker–Lewis Index (TLI) was 0.96 and the
root mean square error of approximation (RMSEA) was 0.08. If we adhere to Hu and Bentler’s
(1999) rule of thumb of TLI > 0.95 and RMSEA < 0.07, the model-data fit is deemed acceptable,
but room for improvement is suggested. Releasing the correlation between the residuals of two
variables produces a TLI of 0.99 and an RMSEA of 0.05, which is a very good model-data fit.
The variables that cause the slight violation of the assumption of local stochastic
independence are “checks during processing whether students are carrying out tasks properly”
and “involves all students in the lesson”.

Chen and Thissen (1997) proposed a standardized index, the LD χ² index, to establish whether
there is a violation of the assumption of local stochastic independence for individual items.
The manual of the IRTPRO program that we used for these calculations employs the following
criterion values for this χ² index. A value of <5 means that there is little likelihood of
local dependence. Values between 5 and 10 form a ‘grey area’: while there may be local
dependence, this value could also result from sparseness in the underlying table of
frequencies. A value of >10 points to the possibility of local dependence. Results show that
the item “explains the lesson objectives at the start of the lesson” seems to be locally
dependent (LD χ² > 10) on “creates a relaxed atmosphere”, “supports student self-confidence”,
“encourages students to reflect on solutions”, “has students think out loud”, “encourages
students to think critically”, “boosts the self-confidence of weak students”, “adapts
instruction to relevant student differences” and “adapts processing of subject matter to
student differences”. The last two items were found to be locally dependent as well.

We have already seen that the item characteristic curve for “explains the lesson objectives
at the start of the lesson” has a fairly flat slope (a = 0.50) and there are indications
(see the low

Parallelism of the item characteristic curves

The items in the scale should have a stable sequence if student teachers are to be supported
in their development. If the items of the scale reveal a stable sequence, this sequence could
guide the development of interventions aimed at stimulating teaching skill improvement within
teachers’ zone of proximal development. Statistically, this means that the item
characteristic curves of the instrument items should ideally be parallel. We used various
procedures to check whether this was the case for the 32 items in the scale. Firstly, we used
Andersen’s (1973, 1977) log-likelihood ratio test to examine the stability of the item
parameters (denoted as b) in student teachers with high and low skill levels. Results show
that at 40.56, the χ² of this test is relatively small, given the number of degrees of
freedom (30). The associated p-value was 0.09. The stability of the distribution on the
dimension for teachers with a high and a low level of skills is the first indication of
parallelism in the item characteristic curves.

Fig. 2. Item characteristic curves of 32 items. The bold line indicates an item that is
susceptible to guessing. This item violates the assumptions of the Rasch model.
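The p-value reported for Andersen's log-likelihood ratio test can be checked by hand. The sketch below is only an illustration and not the authors' software: it assumes the test statistic is referred to a chi-square distribution with the stated 30 degrees of freedom, and the function name `chi2_sf_even_df` is hypothetical. It uses the closed-form upper tail that exists for even degrees of freedom, so it needs only the Python standard library.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with even df.

    For df = 2k the tail equals exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term = 1.0   # the i = 0 term, (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i   # turns (x/2)^(i-1)/(i-1)! into (x/2)^i/i!
        total += term
    return math.exp(-half) * total


# Values taken from the text: chi-square = 40.56 with 30 degrees of freedom.
p_value = chi2_sf_even_df(40.56, 30)
print(round(p_value, 3))
```

Under these assumptions the result falls between 0.09 and 0.10, in line with the reported p-value of 0.09, so the hypothesis of stable item parameters is not rejected at the 5% level.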

Table 2
Domains, P-values (proportion of correct response), levels of difficulty (b) and slopes (a) of the items and model fit characteristics (Q) of the teaching skill items.

Domain The teacher . . . P b SE a SE Q Zq

Climate Shows respect for students in behaviour and language .99 5.57 1.03 1.78 .99 .12 .10
Instruction Explains the subject matter clearly .91 1.94 .28 1.45 .30 .10 .31
Climate Creates a relaxed atmosphere .90 1.78 .23 1.01 .22 .16 .45
Climate Supports student self-confidence .84 1.05 .22 1.32 .24 .10 .34
Instruction Gives feedback to students .83 .99 .23 1.56 .28 .09 .58
Climate Ensures that the lesson runs smoothly .83 .95 .21 1.32 .24 .10 .29
Climate Ensures mutual respect .82 .85 .22 1.14 .21 .12 .03
Organization Ensures effective class management .81 .76 .21 1.76 .30 .07 .88
Instruction Gives well-structured lessons .80 .71 .29 1.95 .33 .06 1.19
Activating Encourages students to do their best .80 .67 .23 .99 .18 .10 .34
Organization Uses learning time efficiently .80 .67 .19 1.38 .24 .14 .54
Organization Checks during processing whether students are carrying out tasks properly .78 .50 .19 1.20 .21 .12 .02
Activating Asks questions that encourage students to think .77 .42 .21 1.25 .21 .12 .01
Activating Involves all students in the lesson .73 .12 .21 1.33 .22 .11 .25
Instruction Explains the lesson objectives at the start of the lesson .73 .12 .19 .50 .11 .24 3.11
Activating Uses teaching methods that activate students .72 .04 .19 .95 .17 .14 .60
Instruction Clearly explains teaching tools and tasks .69 .16 .18 1.14 .19 .12 .05
Teaching learning Encourages students to reflect on solutions .69 .16 .19 1.02 .17 .14 .61
Instruction Checks during instruction whether students have understood the subject matter .68 .23 .20 1.35 .21 .10 .31
Differentiation Boosts the self-confidence of weak students .66 .36 .18 .87 .15 .15 .94
Activating Provides interactive instruction .62 .65 .18 1.07 .17 .12 .25
Teaching learning Encourages students to think critically .62 .65 .18 1.37 .21 .10 .25
Teaching learning Encourages students to apply what they have learned .61 .71 .18 1.26 .21 .11 .16
Teaching learning Has students think out loud .59 .81 .18 1.23 .19 .11 .01
Teaching learning Teaches students how to simplify complex problems .52 1.26 .18 1.75 .26 .07 .78
Differentiation Checks whether the lesson objectives have been achieved .51 1.32 .19 1.36 .20 .09 .41
Teaching learning Encourages the use of checking activities .48 1.54 .18 1.18 .18 .11 .12
Teaching learning Teaches students to check solutions .47 1.57 .18 1.55 .23 .08 .50
Differentiation Adapts instruction to relevant student differences .44 1.79 .18 1.29 .19 .10 .06
Teaching learning Asks students to reflect on approach strategies .43 1.82 .19 1.37 .20 .10 .17
Differentiation Offers weak students additional learning and instruction time .42 1.86 .18 1.12 .17 .11 .26
Differentiation Adapts processing of subject matter to student differences .37 2.26 .18 1.33 .19 .10 .13

Note: b = parameters for the level of difficulty (b) calculated for the Rasch model, a = parameters for the slope (a) calculated for the Birnbaum model.
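To make the roles of the difficulty (b) and slope (a) parameters concrete, the following sketch evaluates a logistic item characteristic curve. It is a generic textbook formulation and not the authors' estimation code. The function name `icc` is hypothetical, the lower asymptote `c` belongs to the three-parameter model rather than to Table 2, and the difficulty of 0.0 used for the flagged item is an illustrative assumption, not a value from the table.

```python
import math


def icc(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of a positive score at skill level theta.

    a = 1 and c = 0 give the Rasch (one-parameter) curve, a free slope a
    gives the Birnbaum (two-parameter) curve, and c > 0 adds the lower
    asymptote of the three-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


# Two Rasch items: on the logit scale their curves stay a constant distance
# apart (the difference in b), which is what "parallel" means here.
gaps = [logit(icc(t, b=-1.0)) - logit(icc(t, b=1.0)) for t in (-2.0, 0.0, 2.0)]

# A flat item (a = 0.50) with a lower asymptote of 0.07, as described in the
# text for "explains the lesson objectives at the start of the lesson".
# b = 0.0 is an assumption made only for this illustration.
flat_low = icc(-6.0, b=0.0, a=0.50, c=0.07)
rasch_low = icc(-6.0, b=0.0)
print(gaps, flat_low, rasch_low)
```

For very low skill levels the Rasch curve approaches 0, whereas the flat curve stays above its asymptote of 0.07, so the two curves cannot be parallel.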

Although it is a fairly major undertaking to gather observational data on 264 teachers, this
is a relatively small sample for the Andersen test. Consequently, the confidence intervals
around the item parameters are relatively large, which means there is a fairly low likelihood
of a significant result. With small samples like ours, both the Pearson χ² test and the
Cressie–Read test (Cressie & Read, 1984) are more stable indices than the classical Andersen
test. Checks on these goodness-of-fit indices show p-values of 0.10 and 0.03 for the Pearson
χ² and Cressie–Read tests, respectively, confirming our results from the Andersen test.

To obtain information on the extent to which individual items cause model violations, we used
Rost and Von Davier’s (1994) Q-index. The Q-index ranges between 0 and 1, with a value of 0
indicating a perfect correlation with the ideal Guttman model and a value of 1 indicating a
complete misfit. A value of 0.5 points to random response behaviour. Table 2 shows that all
items have a suitably low Q-index (i.e., <0.20). Only the item “explains the lesson
objectives at the start of the lesson” has a value of 0.24, but that is still well below 0.5.

Rost and Von Davier (1994) also developed a standardized Zq-index with a mean of 0 and a
variance of 1. This Zq-index is presented in Table 2. Once again, we see that the item
“explains the lesson objectives at the start of the lesson”, with a Zq of 3.11, indicates
problems at a significance level of 0.95, but not quite at a significance level of 0.99.

An additional approach to investigating the parallelism of item characteristic curves is to
calculate the slope (denoted as a) of the curves. The parameters for the slope of the item
characteristic curves have a mean of 1.21 (see Table 2). We can see that the slope of the
item “explains the lesson objectives at the start of the lesson” differs by more than 1.96
times the SE from the mean slope, while the slope of the item “boosts the self-confidence of
weak students” deviates only slightly from the mean. Finally, we conducted a graph ‘test’ by
plotting the item characteristic curves (see Fig. 2).

Apart from very small violations, Fig. 2 shows that the pattern of parallel item
characteristic curves is disrupted by a single item: “explains the lesson objectives at the
start of the lesson”. As we saw in Table 2, this item, with a slope (a) of 0.50, has a much
flatter curve than the others. We can also observe that it has a lower asymptote of 0.07,
whereas the other item characteristic curves start at 0. This item therefore has
characteristics that match the three-parameter model better than Rasch’s one-parameter
logistic model.

Item misfit

We have seen earlier that the items “encourages students to apply what they have learned”,
“boosts the self-confidence of weak students”, “checks during processing whether students are
carrying out tasks properly” and “involves all students in the lesson” cause slight
violations in some of the tests used. Given that violations were minor and did not occur
systematically across all tests, we will retain these items in the evaluation instrument for
the time being. This does not apply to the item “explains the lesson objectives at the start
of the lesson”, which has a much flatter slope and an anomalous lower asymptote. Because it
has characteristics which match the three-parameter model better than the one-parameter
logistic (Rasch) model, we have decided to remove it from the evaluation instrument.

Item calibration

Table 2 presents an overview of the level of difficulty of the items (b) and their associated
standard error. It shows that the hierarchical structure of the items largely corresponds
with the structure predicted by Fuller (1969, 1970). In general, results show that the first
half of the items have lowest to moderate difficulty levels, while the second half of the
items have moderate to

Table 3
Skills of student teachers.

Raw score    Skill θ (Warm’s weighted    SE      % student
             likelihood estimates)               teachers
31            4.93                       1.49    10.3
30            3.76                        .89     6.0
29            3.17                        .71     3.8
28            2.75                        .62     3.3
27            2.42                        .56     4.9
26            2.13                        .53     4.9
25            1.88                        .50     3.8
24            1.65                        .48     4.9
23            1.44                        .46     6.0
22            1.24                        .45     7.1
21            1.04                        .44     6.5
20             .86                        .43     4.3
19             .68                        .42     1.1
18             .51                        .42     2.2
17             .34                        .42     3.3
16             .17                        .42     4.3
15             .00                        .41     2.7
14            −.17                        .42     4.3
13            −.34                        .42     2.2
12            −.51                        .42     2.2
11            −.69                        .43     1.1
10            −.87                        .44     1.6
9            −1.07                        .45     1.6
8            −1.27                        .47     1.6
7            −1.49                        .49      .5
6            −1.73                        .51     1.6
5            −2.01                        .55      .0
4            −2.33                        .61     1.1
3            −2.73                        .69      .0
2            −3.31                        .85     1.1
1            −4.43                       1.23     1.6
0            −6.73                       2.04      .0

highest difficulty levels (items are ordered from least to most difficult). The calibration
result suggests that items associated with learning climate, classroom management, and
clarity of instruction are less complex than items associated with activating learning,
teaching learning strategy, and differentiation. This finding is consistent with our
expectation that the first and the second stage of concerns (self- and task-related) will
correspond to less complex domains of teaching skill (learning climate, classroom management,
clarity of instruction), while the third concern (impact-related) will correspond to more
complex domains of teaching skill (activating learning, adaptive teaching, teaching learning
strategy). This result is consistent with that of primary school teachers (Van de Grift et
al., 2011).

Results presented in Table 2 allow us to interpret the teachers’ teaching skill scores in
terms of the domains underpinning the items. In the sequence of domains and items we can
clearly recognize the sequence postulated in Fuller’s (1969, 1970) concern theory. Mastery of
the skills “creating a safe and stimulating instructional environment”, “good lesson
organization”, “efficient use of time” and “clear and structured explanations” precedes the
mastery of skills involving the impact that teachers have on their students, namely,
“activating students and interactive instruction”, “teaching students how they should learn”
and “adapting teaching to student differences”.

Student teachers’ skilfulness

Table 3 presents the frequency distribution of the teaching skills of the student teachers in
the present sample. The person parameters were estimated using Warm’s weighted likelihood
estimates (WLE; Warm, 1989). This has two advantages over the more traditional method of
maximum likelihood estimates (MLE). First, the bias is smaller (Hoijtink & Boomsma, 1995,
chap. 4; Warm, 1989). Second, this method can also be used to estimate the skills of people
with a zero and a maximum score. The average of the Warm’s estimates is 1.33 with a standard
deviation of 1.85. This means that the average student teacher is able to create a safe and
stimulating learning climate, is able to manage classrooms effectively, delivers clear
instruction, activates their students to learn and, to some extent, is able to teach learning
strategies. However, the average student teacher finds it difficult to teach learning
strategies and to adapt their teaching to students’ differences and learning needs.

Predictive validity: teaching skills and student engagement

For the purpose of predictive validity, the size of the scores on the evaluation instrument
should correlate significantly and strongly with student learning gains. Because of the
diversity of school types in which student teachers worked and the wide range of subjects
they taught, it was not possible in this study to validate the instrument with student
achievement. However, we were able to establish the relationship with student academic
engagement.

Results show that the teaching skill evaluated by the observation instrument correlates
highly and significantly with the observed student academic engagement measure (r = 0.66,
p < 0.001). The fact that students pay attention and continue working with teachers who have
a higher level of teaching skill is an important indication of the teaching scale’s
predictive validity.

Furthermore, we conducted multilevel analyses to examine the relationship between teaching
skills and class academic engagement more thoroughly (see Table 4). Prior to conducting
multilevel

Table 4
Results of multilevel models explaining class academic engagement.

                                       Model 1:             Model 2: teaching     Model 3: full model
                                       null model           behaviour model       (teaching behaviour +
                                                                                  background variables)
                                       Coefficient   SE     Coefficient   SE      Coefficient   SE

Fixed effect
  Intercept                            3.01***       .05    2.71***       .05     2.72***       .14
  Teaching skill                                            .24***        .02     .25***        .03
  Subject (0 = Science)                                                           .13           .11
  Class size (0 = Small)                                                          .05           .11
  School variant (0 = ordinary school)                                            .01           .11
  Teacher gender (0 = male)                                                       .09           .10
Random effect
  Level 2 variance                     .12           .06    .10           .05     .06           .07
  Level 1 variance                     .41           .06    .21           .05     .24           .07
Deviance                               562.62               295.42                186.21

*** p < .001.
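The variance components in Table 4 can be turned into the kind of percentages that multilevel analyses usually report. The sketch below assumes the standard variance-partition reading of a two-level model, with level 2 taken to be schools and level 1 teachers/classes. The function names are hypothetical, and this is an illustration rather than the authors' analysis script.

```python
def variance_share(level2_var: float, level1_var: float) -> float:
    """Share of the total variance located at level 2 (intraclass correlation)."""
    return level2_var / (level2_var + level1_var)


def proportional_reduction(var_null: float, var_model: float) -> float:
    """Proportion of a null-model variance component removed by a fuller model."""
    return (var_null - var_model) / var_null


# Variance components as printed in Table 4.
null_level2, null_level1 = 0.12, 0.41   # Model 1 (null model)
model2_level2 = 0.10                    # Model 2 (teaching behaviour model)

school_share = variance_share(null_level2, null_level1)
school_reduction = proportional_reduction(null_level2, model2_level2)
print(round(school_share, 2), round(school_reduction, 2))
```

With the rounded table entries, the level 2 share in the null model comes to about 23% and the reduction in level 2 variance from Model 1 to Model 2 to about 17%. Results computed from rounded components can differ slightly from figures based on unrounded estimates.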

analyses, power analysis and tests on assumptions of normality and multicollinearity were
checked. Results suggested that there were no violations of these assumptions.

Multilevel analyses indicated that about 23% of the differences in class academic engagement
were related to schools, and that about 77% was related to teachers/classes (see Model 1).
This result suggests that there is large variability between teachers/classes in class
academic engagement. Furthermore, teacher behaviour played a significant role in explaining
differences in class academic engagement (b = 0.24, p < 0.001, see Model 2). Explained
variance between teachers/classes was 42% and between schools was 17%. Teacher behaviour
uniquely explained about 27% of the variance in class academic engagement. The effect of
teacher behaviour on class academic engagement remained significant even after controlling
for background variables (b = 0.24, p < 0.001, see Model 3). Multilevel analysis with
cross-level interactions (products of teacher, contextual, and teaching behaviour
characteristics, not shown in the table) revealed no significant interaction effects. This
implies that the results apply to both science and non-science subjects, small and large
classes, school-based and non-school-based routes, as well as to male and female teachers.
Comparing the teacher behaviour model (Model 2) and the full model (Model 3) with the null
model (Model 1) revealed a significant decrease in deviance (p < 0.001), suggesting a
substantial model fit improvement. This means that the inclusion of some background and
contextual variables is necessary, which suggests that academic engagement can be influenced
by contextual factors such as teaching skill.

Concluding remarks

An evaluation instrument for measuring teachers’ growth in teaching skill from the start of
their training until they become experienced teachers in the context of secondary education
satisfies the stringent assumptions imposed by the Rasch model. Moreover, the evaluation
instrument has predictive value for student academic engagement in the lesson. These results
are consistent with an earlier study (Van de Grift et al., 2011) that demonstrated that a
different version of this evaluation instrument is suitable for use in primary schools. These
results also suggest that the quality of student teachers’ teaching skills is an important
predictor of students’ academic engagement, which is in line with previous research in the
context of experienced teachers (Maulana & Opdenakker, 2013; Maulana, Opdenakker, & Bosker,
2013; Maulana, Opdenakker, Stroet, & Bosker, 2012).

We may conclude that the teaching skills of teachers in secondary schools can be measured on
a single dimension. The distribution of the items was shown to be stable when observing male
and female teachers, and student grade one and grade two teachers, regardless of whether they
were undertaking a bachelor or master study. The score distribution also remained stable
irrespective of class size or school subject cluster and regardless of whether the school
where teaching practice took place was part of an official training school or not. In view of
the fact that the score distribution also remained constant for student teachers at the
beginning and end of their training, we argue that this evaluation instrument is suitable not
only for determining progress in classroom practice for student teachers, targeting the zone
of proximal development, but also for monitoring the effects of interventions among student
teachers and beginning teachers.

This implies that the evaluation instrument can be used to diagnose the level of skilfulness
to aid formative assessment of student teachers. Student teachers should be supported
differentially in their development depending on their competence levels. The support should
aim to scaffold the proximal development zone. Our evaluation instrument can be used to
pinpoint the competence level of the behavioural domains. The most complex behavioural
domains require long-term interventions. Even though a proportion of student teachers is
skilful in the complex behavioural area, research amongst experienced teachers reveals the
same patterning (Van de Grift & Helms-Lorenz, 2013). This means that a great number of
teachers never develop the most complex skills. Professional development programmes should
pay attention to these more complex behavioural domains. In the long run the pay-offs should
be visible in student achievement outcomes.

It is also important to note that the current study attempts to connect Fuller’s theory of
teachers’ stage concerns (Fuller, 1970) and the domains of teaching skill (Maulana,
Helms-Lorenz, & Van de Grift, 2014; Van de Grift, 2007). At the theoretical level, the self-
and task-related concerns are conceptually consistent with domains of learning climate,
classroom management, and clarity of instruction. The impact concern is conceptually
consistent with domains of activating learning, differentiation, and teaching learning
strategy. Although we acknowledge that the theoretical connection between stage concerns and
teaching skill is only approximate, results of the current study confirm the connection. Less
complex domains of teaching skill (learning climate, classroom management, clarity of
instruction) correspond with the earlier stage of concerns (self- and task-related concerns),
while more complex domains of teaching skill (activating learning, differentiation, teaching
learning strategy) correspond with the later stage of concern (impact-related concern).

This study is subject to limitations as well. There is a need for follow-up research to
arrive at a sound standard for the initial level of competence of secondary school teachers.
First and foremost, this will require large samples of student teachers to determine the mean
and distribution at the time of graduation. It will then be possible to establish a realistic
desired standard of competence for new teachers. It seems reasonable to expect new teachers
to be able to create a safe and stimulating learning environment, organize a lesson well,
make efficient use of time, and give clear and structured explanations. However, to what
extent can we expect recently qualified teachers to be able to activate their students using
interactive instruction, teach students to learn how to learn and adapt their teaching to
student differences? This calls for a panel study in which teachers, administrators,
principals, teaching methodologists, school mentors, educationalists, teaching inspectors,
policymakers and researchers give their opinions concerning what the initial standards of
competence for new teachers should be. Additionally, although the current study supports the
theoretical connection between Fuller’s theory of teachers’ stage concerns and Van de Grift’s
domains of teaching skill, the connection between the two frameworks at the empirical level
is rather limited. Future research should be directed towards strengthening the empirical
connection between the two frameworks. This can be achieved by administering stage concerns
and teaching skill instruments to the same sample of beginning teachers simultaneously. We
acknowledge that there are other (observable and unobservable) domains beyond the domains
covered by the present evaluation instrument. Future research would benefit from addressing
both the professional development of beginning teachers and an international comparison of
the quality of teaching.

Our plans for future research are twofold. The first plan is directed towards a longitudinal
study in which the teaching skills of beginning teachers will be observed during three
consecutive years. This study should shed light on the ‘natural’ growth in professional
development of beginning teachers. The second plan is the implementation of an international
comparative study on teaching quality using this evaluation instrument, involving researchers
from many countries worldwide.

Finally, educational system in many countries still suffers from Hoijtink, H., & Boomsma, A. (1995). On person parameter estimation in the dichoto-
mous Rasch model. In G. H. Fischer & I. Molenaar (Eds.), Rasch models: Foundations,
the gap between scientific products and practice. To reduce this recent developments and applications. New York: Springer.
gap and to increase the usefulness of our findings, efforts to Houtveen, A. A. M., Booij, N., De Jong, R., & Van de Grift, W. J. C. M. (1999). Adaptive
communicate findings of the present study to schools leaders, instruction and pupil achievement. School Effectiveness and School Improvement,
10(2), 172–192.
teachers, teacher educators, and school mentors are called for. Houtveen, A. A. M., & Van de Grift, W. J. C. M. (2007). Effects of meta cognitive strategy
instruction and instruction time on reading comprehension. School Effectiveness
Acknowledgements and School Improvement, 18(2), 173–190.
Hu, L-T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure
analysis: Conventional criteria versus new alternatives. Structural Equation Model-
Our special thanks go to Anna Verkade and Esther Canrinus for ing, 6(1), 1–55.
their significant commitment and dedication to contributing in the Kameenui, E. J., & Carnine, D. W. (Eds.). (1998). Effective teaching strategies that
accommodate diverse learners. New Jersey: Prentice Hall.
national project. This work was supported by the Nederlandse Kozma, R. (1991). Learning with media. Review of Educational Research, 61(2), 179–211.
organisatie voor Wetenschappelijk Onderzoek (NWO, project Kyriakides, L., Creemers, B. P. M., & Antoniou, P. (2009). Teacher behaviour and student
number 411-09-802). NWO funds scientific research at Dutch outcomes: Suggestions for research on teacher training and professional develop-
ment. Teaching and Teacher Education, 25(1), 12–23.
universities and institutes. Lang, J., & Kersting, M. (2007). Regular feedback from student ratings of instruction: Do
college teachers improve their ratings in the long run? Instructional Science, 35(3),
187–205.
References Levine, D. U., & Lezotte, L. W. (1995). Effective schools research. In J. A. Banks & C. A. M.
Banks (Eds.), Handbook of research on multicultural education (pp. 525–547). New
York, VS: Macmillan.
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance.
123–140. Englewood Cliffs, NJ: Prentice Hall.
Andersen, E. B. (1977). Sufficient statistics and latent trait models. Psychometrika, 42, Lundberg, I., & Linnakylä, P. (1992). Teaching reading around the world. The Hague: IEA.
69–81. Marzano, R. J. (2003). What works in schools. Alexandria, VA: ASCD.
Appleton, J. J., Christenson, S. L., & Furlong, M. J. (2008). Student engagement with Marzano, R., Pickering, D., & Pollock, J. (2001). Classroom instruction that works:
school: Critical conceptual and methodological issues of the construct. Psychology Research-based strategies for increasing student achievement. Alexandria, VA: Asso-
in the Schools, 45, 369–386. ciation for Supervision and Curriculum Development.
Bickhard, M. H. (1992). Scaffolding and self-scaffolding: Central aspects of develop- Maulana, R., Helms-Lorenz, M., & Van de Grift, W. (2014). Development and evaluation
ment. In L. T. Winegar, & J. Valsiner (Eds.), Children’s development within social of a questionnaire measuring pre-service teachers’ teaching behavior: A Rasch
context (Vol. 2). Hillsdale, New Jersey: Lawrence Erlbaum Associates. modelling approach. School Effectiveness and School Improvement. http://dx.doi.org/
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s 10.1080/09243453.2014.939198
ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Maulana, R., & Opdenakker, M.-C. (2013). Teachers’ interpersonal involvement as a
Reading, MA: Addison–Wesley. predictor of students’ academic motivation among indonesian secondary school
Brophy, J. E. (1979). Teacher behaviour and its effects. Journal of Educational Psychology, students: A multilevel growth curve analysis. The Asia-Pacific Education Researcher.
71(6), 733–750. http://dx.doi.org/10.1007/s40299-013-0132-7
Cai, L., Thissen, D., & Du Toit, S. (2005–2013). IRTPRO 2.1. Lincolnwood, IL: Scientific Maulana, R., Opdenakker, M.-C., & Bosker, R. (2013). Teacher–student interpersonal
Software International Inc. relationships do change and affect academic motivation: A multilevel growth
Carnine, D. W., Dixon, R. C., & Silbert, J. (1998). Effective strategies for teaching curve modelling. British Journal of Educational Psychology. http://dx.doi.org/
mathematics. In E. J. Kameenui & D. W. Carnine (Eds.), Effective teaching strategies 10.1111/bjep.12031
that accommodate diverse learners. New Jersey: Prentice Hall. Maulana, R., Opdenakker, M.-C., Stroet, K., & Bosker, R. (2012). Observed lesson
Chen, W.-H., & Thissen, D. (1997). Dependence indexes for item pairs using structure during the first year of secondary education: Exploration of change
item response theory. Journal of Educational and Behavioral Statistics, 22(3), and link with academic engagement. Teaching and Teacher Education, 28, 835–850.
265–289. Mayer, R. E., & Gallini, J. K. (1990). When is an illustration worth ten thousand words?
Cornelius-White, J. (2007). Learner-centered teacher-student relationships are effec- Journal of Educational Psychology, 82(4), 715–726.
tive: A meta-analysis. Review of Educational Research, 77(1), 113–143. Meeuwisse, M., Severiens, S. E., & Born, M. P. (2010). Learning environment, interaction,
Creemers, B. P. M. (1994). The effective classroom. London: Cassell. sense of belonging and study success in ethnically diverse student groups. Research
Cressie, T. R. C., & Read, N. A. C. (1984). Multinomial goodness-of-fit statistics. Journal of in Higher Education, 51(6), 528–545.
the Royal Statistical Society Series B, 46, 440–464. Mortimer, E., & Scott, P. (2000). Analysing discourse in the science classroom. In R.
Danielson, C. (2013). The framework for teaching: Evaluation instrument. Princeton: The Millar, J. Leach, & J. Osborne (Eds.), Improving science education (pp. 126–142).
Danielson Group. Buckingham: Open University Press.
Dutch Inspectorate of Education (1998). Integral school supervision 1999. Utrecht: Mullis, I. V. S., Martin, M. O., Foy, P., & Drucker, K. T. (2012). TIMSS & PIRLS International
Inspectorate of Education the Netherlands. Study Center. Chestnut Hill, MA: Boston College.
Evertson, C. M., Anderson, C. W., Anderson, L., & Brophy, J. E. (1980). Relationships Muraki, E., & Bock, D. (1985–2002). PARSCALE V4.1(1985–2002). Maximum Likelihood
between classroom behaviors and student outcomes in junior high mathematics item analysis and test scoring: Polytomous model. Lincolnwood, IL: Scientific Soft-
and English classes. American Educational Research Journal, 17(1), 43–60. ware International Inc.
Fraser, B. J. (1985). The study of learning environments. Salem, Oregon: Assessment Research.
Fuller, F. F. (1969). Concerns of teachers: A developmental conceptualization. American Educational Research Journal, 6, 207–226.
Fuller, F. (1970). Personalized education for teachers: One application of the teachers concerns model. Austin: R & D Center for Teacher Education.
Gustafsson, J. E. (1977). The Rasch model for dichotomous items: Theory applications and a computer program. Göteborg: Institute of Education, University of Göteborg.
Hampton, S. E., & Reiser, R. A. (2004). Effects of a theory-based feedback and consultation process on instruction and learning in college classrooms. Research in Higher Education, 45, 497–527.
Hattie, J. (2012). Visible learning for teachers: Maximizing the impact on learning. London: Routledge.
Hattie, J. A. C., & Clinton, J. (2008). Identifying accomplished teachers: A validation study. In L. Ingvarson & J. A. C. Hattie (Eds.), Assessing teachers for professional certification: The first decade of the National Board for Professional Teaching Standards (pp. 313–344). Oxford, UK: Elsevier.
Helms-Lorenz, M., Slof, B., Vermue, C. E., & Canrinus, E. T. (2012). Beginning teachers’ self-efficacy and stress and the supposed effects of induction arrangements. Educational Studies, 38(2), 189–207.
Helms-Lorenz, M., Slof, B., & Van de Grift, W. J. C. M. (2013). First year effects of induction arrangements on beginning teachers’ psychological processes. European Journal of Psychology of Education. http://dx.doi.org/10.1007/s10212-012-0165-y
Hiebert, J., Wearne, D., & Taber, S. (1991). Fourth graders’ gradual construction of decimal fractions during instruction using different physical representations. Elementary School Journal, 91, 321–341.
HM Inspectorate of Education (2001). Standards and quality in primary Schools: Mathematics 2001. Edinburgh: HM Inspectorate of Education.
Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user’s guide (7th ed.). LA, CA: Muthén & Muthén.
Nunes, T., & Bryant, P. (1996). Children doing mathematics. Oxford: Blackwell.
OECD (2010). PISA 2009 results: What students know and can do. Student performance in reading, mathematics and science. Paris, France: OECD.
OECD (2012). Education at a glance 2012: OECD indicators. OECD Publishing.
Ofsted (1995). Guidance on the inspection of nursery & primary schools. London: HMSO.
Peressini, D., & Knuth, E. (1998). Why are you talking when you could be listening? The role of discourse and reflection in the professional development of a secondary mathematics teacher. Teaching and Teacher Education, 14(1), 107–125.
Palincsar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 2, 117–175.
Pearson, P. D., & Fielding, L. (1991). Comprehension instruction. In R. Barr, M. L. Kamil, P. B. Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research (Vol. II, pp. 815–860). White Plains, NY: Longman.
Pressley, M., Wood, E., Woloshyn, V. E., Martin, V., King, A., & Menke, D. (1992). Encouraging mindful use of prior knowledge. Attempting to construct explanatory answers facilitates learning. Educational Psychologist, 27(1), 91–109.
Purkey, S. L., & Smith, M. S. (1983). Effective schools: A review. Elementary School Journal, 83(4), 427–452.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Denmarks Paedagogiske Institut.
Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, IV (pp. 321–334). University of California Press: Berkeley, California.
Rosenshine, B. V., & Stevens, R. (1986). Teaching functions. In M. C. Wittrock (Ed.), Handbook of research on teaching (3rd ed.). New York: Macmillan.


Rosenshine, B. V., & Meister, C. (1997). Cognitive strategy instruction in reading. In S. Stahl & D. A. Hayes (Eds.), Instructional models in reading. Mahwah, New Jersey: The Guilford Press.
Rost, J., & Von Davier, M. (1994). A conditional item fit index for Rasch models. Applied Psychological Measurement, 18(2), 171–182.
Sammons, P., Hillman, J., & Mortimore, P. (1995). Key characteristics of effective schools: A review of school effectiveness research. London: Office for Standards in Education.
Scheerens, J. (1992). Effective schooling: Research, theory and practice. London: Cassell.
Scheerens, J., & Bosker, R. (1997). The foundations of educational effectiveness. Oxford: Pergamon.
Schipull, D. (1990). A study of the validity and reliability of the teacher concerns checklist. University of Southern Mississippi (Unpublished doctoral dissertation).
Sherin, M. G. (2002). A balancing act: Developing a discourse community in a mathematics classroom. Journal of Mathematics Teacher Education, 5, 205–233.
Slavin, R. E. (1987). Ability grouping and achievement in elementary schools. Review of Educational Research, 57, 293–336.
Smith, T. W., Baker, W. K., Hattie, J. A. C., & Bond, L. (2008). A validity study of the certification system of the National Board for Professional Teaching Standards. In L. Ingvarson & J. A. C. Hattie (Eds.), Assessing teachers for professional certification: The first decade of the National Board for Professional Teaching Standards (pp. 345–378). Oxford, UK: Elsevier.
Snijders, T. A. B., & Bosker, R. (2012). Multilevel analysis: An introduction to basic and advanced multilevel modeling. London: Sage.
Statline Central Bureau voor de Statistiek (2013). One in five schools have fewer than 100 pupils. Retrieved from http://www.cbs.nl/en-GB/menu/themas/onderwijs/publicaties/artikelen/archief/2013/2013-3806-wm.htm Accessed 15.03.13.
Von Davier, M. (1994). WINMIRA 200.1. Kiel: IPN.
Van de Grift, W. J. C. M. (2007). Quality of teaching in four European countries: A review of the literature and application of an assessment instrument. Educational Research, 49, 127–152.
Van de Grift, W. J. C. M. (2013). Measuring teaching quality in several European countries. School Effectiveness and School Improvement. http://dx.doi.org/10.1080/09243453.2013.794845
Van de Grift, W. J. C. M., & Helms-Lorenz, M. (2012, January). Classroom practice in secondary schools. Paper presented at the ICSEI conference, Malmo, Sweden. http://www.rug.nl/staff/w.j.c.m.van.de.grift/research
Van de Grift, W. J. C. M., & Helms-Lorenz, M. (2013). Vaardigheid en ervaring van leraren. [Skills and teaching experiences]. Groningen: Rijksuniversiteit Groningen.
Van de Grift, W. J. C. M., & Houtveen, A. A. M. (2006). Underperformance in primary schools. School Effectiveness and School Improvement, 17(3), 255–273.
Van de Grift, W. J. C. M., & Lam, J. F. (1998). Het didactisch handelen in het basisonderwijs [Instruction in primary education]. Tijdschrift voor Onderwijsresearch, 23(3), 224–241.
Van de Grift, W. J. C. M., Van der Wal, M., & Torenbeek, M. (2011). Ontwikkeling in de pedagogisch didactische vaardigheid van leraren in het basisonderwijs. Pedagogische Studiën, 88, 416–432.
Walberg, H. J., & Haertel, G. D. (1992). Educational psychology’s first century? Journal of Educational Psychology, 84(1), 6–19.
Wang, M. C., Reynolds, M. C., & Walberg, H. J. (1995). Serving students at the margins. Educational Leadership, 52(4), 12–17.
Warm, T. A. (1989). Weighted likelihood estimation of ability in item response models. Psychometrika, 54, 427–450.
Watzke, J. L. (2007). Longitudinal research on beginning teacher development: Complexity as a challenge to concerns-stage based theory. Teaching and Teacher Education, 23, 106–122.
Wilkinson, I. A. G., & Fung, I. Y. Y. (2002). Small-group composition and peer effects. International Journal of Educational Research, 37(5), 425–447.
Yair, G. (2000). Educational battlefields in America: The tug-of-war over students’ engagement with instruction. Sociology of Education, 73(4), 247–269.
