
Introduction: What is assessment?

Formative assessment is generally defined as assessment for the purpose of instruction (Heritage et al., 2008). The central idea is that assessment should not be reserved for an examination of achievement after the teacher has completed instruction. Rather, assessment should be used to gain information that can help the teacher plan effective instruction, particularly for the individual. Formative assessment can be organized, as when the teacher uses an already prepared observational instrument to learn about her students' motivation, or it can be informal, as when the teacher spontaneously questions students about their methods of solution. What kind of information gained by organized or informal methods can be useful? The teacher needs to learn about performance, thinking/knowledge, learning potential and affect/motivation. Each of these corresponds to a major approach to psychology. Teachers need assessments that give them a personally meaningful and practical theory of the child's performance, thinking/knowledge, learning potential, and affect/motivation. Unless it is to be merely academic, namely something learned in the academy and usually irrelevant for practice, the theory should make sense to the teacher and should entail concepts that the teacher can see contribute to the practical job of teaching mathematics. It need not deal with the fine details that usefully interest cognitive researchers (like the myriad strategies that comprise young children's counting and addition), and it need not deal with broad generalities,
like constructivism, that may be of general theoretical relevance but offer little insight into the details of student behaviour in an ordinary classroom. Instead, the teacher's theory should be mid-level, in the sense of dealing with concepts that lead to specific pedagogical manoeuvres, and should avoid fine details and fancy talk. Finally, the bottom-line attribute of formative assessment is its actionable character: it is assessment that forms or informs instruction in a principled and effective manner. And if formative assessment can guide and improve teaching, what teacher would not be interested in it?

In the late 1980s and 1990s formative assessment, that is, assessment used to promote learning, was not a priority for many teachers in England and Scotland, owing to the emphasis in key policies on summative assessment in the form of national tests and formal examinations, and the pressure of frequent inspection and league table performance. Recent interest in formative assessment has been triggered by two publications: Assessment and classroom learning (Black & Wiliam, 1998a) and, in a more accessible form for teachers and policy makers, Inside the black box (Black & Wiliam, 1998b). The fact that Inside the black box has sold 50,000 copies since its publication indicates a healthy interest in formative assessment. Hargreaves (2001) described assessment for learning as the beginning of a revolution in education because he believes it is a key driver of the convergence between curriculum, assessment and pedagogy, as it 'undermines the old conception that assessment is something that follows teaching and learning. Instead it asserts that assessment can and should actively contribute to the quality of teaching and learning and do so as an inherent component of the daily round of classroom life' (Hargreaves, 2001). Whether assessment for learning can accurately be described as a revolution will be examined at the end of this article. Assessment for learning has now become part of primary and secondary strategies for all schools in England. In Scotland a major review of the assessment system began in 1999, with Her Majesty's Inspectorate (HMI) reporting a number of alternatives for change. Consultation took place in 2000 and the responses were analysed, resulting in a report, Improving Assessment in Scotland
(2000). The consultation revealed that teachers were against radical change: evolution, not revolution, was required. The main message was that assessment for statistical and monitoring purposes should not dominate the system, because assessment to support learning and teaching, relying principally on teachers' professional judgements, was most important.

Literature Review

In their review of assessment and learning, and in subsequent publications, Black et al. (1998a, b, 2002, 2003) drew attention to the capacity of formative assessment to play a crucial role in raising standards by giving students a clear sense of themselves as learners, the goals they are trying to achieve and how to reach them. Their approach also links with the wider aim of promoting effective learning. For the purposes of this article, though there are many other ways of defining it, effective learning is that which actively involves the student in metacognitive processes of planning, monitoring and reflecting (Biggs & Moore, 1993). Black and Wiliam (1998a) concluded from their review that formative assessment helps low attainers most, thus reducing the spread of attainment whilst raising it overall. They emphasised that using formative assessment requires new modes of pedagogy in which pupils have to be actively involved in the assessment process, which should improve their motivation and self-esteem. In more recent publications Black et al. (2002, 2003) provide a variety of examples for teachers to introduce or improve the use of formative assessment in their classrooms. These are: improved questioning techniques, including the use of response time (wait time) and discussion of wrong answers; giving oral and written feedback rather than, or in addition to, grades or marks; using self and peer assessment by pupils, for example, the use of traffic lighting, in which pupils use the icons of green, amber or red to assess whether their understanding of the subject matter is good, partial or poor; sharing the criteria for assessment with pupils; and encouraging pupil collaboration through small groups, pairs and trios to plan, discuss, draft and redraft written work, including summative tests, as well as drawing up and using mark schemes.

Black et al. also provide evidence from the action research project and elsewhere (Black et al., 2004; Wiliam et al., 2004) that the use of formative assessment produced significant gains in pupil achievement in national curriculum tests and General Certificate of Secondary Education (GCSE) examinations. They concluded that teachers should not have to choose between teaching well and getting good examination grades, as using formative assessment methods effectively should raise achievement in summative assessments. Elwood (2004) welcomes assessment for learning as prioritising classroom-based assessment that is done with students rather than to them, but cautions against wholeheartedly accepting the approach as humanistic and benign. She draws on research to remind us how the gendered lives of students interact with assessment outcomes. She illustrates how the different experiences of boys and girls impact on the results of assessment in different subjects: boys doing well in science, where they were using measuring instruments that they were familiar with outside school, but doing less well in an English classroom where the teacher equated good writing with that written in a romance genre. Therefore all individual student attainment has to be contextualised according to the pedagogic practices that define what counts as success for girls and boys. Elwood stresses the importance of teachers seeing classroom assessment as influenced by cultural and social factors such as gender relations, rather than regarding it as a neutral process.

Assessment Policy

In a review of the development of policy on testing, Torrance (forthcoming) notes that, compared to the previous relatively unregulated system of curriculum and assessment, the claimed benefits of the national curriculum and testing system are easily stated: clarity of curriculum content and progression; clarity of outcomes; and comparable measures of progress over time (Torrance, forthcoming). These ideas
still guide government thinking. The changes to national curriculum and assessment in England, particularly since 1997, can be seen to be increasingly linked with international agendas (Lauder et al. 2006; Whetton, Twist and Sainsbury 2000). Global reform of education systems has increasingly focused on teachers as a major factor in enhancing learning and educational quality. However, Tatto's (2007) thesis is that in many cases the top-down operationalisation of this focus has resulted in control of education being taken away from teachers and teacher educators. This change in the locus of control is often at the expense of teacher-owned deeper levels of knowledge and critical thinking, which may be more likely to result in increases in learning and teaching quality. The thesis that Tatto suggests is indicative of the basic tensions that exist between a commitment to the pursuit of efficiency and a commitment to the pursuit of effectiveness coupled with social justice (Ball 1997, 257). Ball (2008) suggests that globalisation is a key idea in relation to policy development; in particular, it forms a spatial frame within which policy discourses and policy formulation are set: 'Education is very particularly implicated in the discourse and processes of globalisation through the idea of the knowledge economy. However, the idea of globalisation has to be treated with care and is subject to extensive debate' (Ball 2008, 25). One aspect of the debate that Ball refers to is that globalisation can, if the term is used too casually, be used to explain almost anything. Particular care needs to be taken when examining the flow and influence of policies between nations. It is not the case that nations uncritically adopt the policies of other nations, because they position themselves in different ways according to their histories and cultural priorities.
In view of the claims made about a world class education system, the actual and potential influence of policy in England on other nations, and theories of education as an economic driver in a global market place, Tikly's (2004, 194) cautions are important. The hegemonic role of economics in developing educational programmes, with the associated targets and quantifiable indicators, often ignores the processes at the heart of education, namely those of the curriculum and pedagogy. Tikly describes such global economics-driven policy as a new imperialism, which can be challenged by grass roots social movements that represent 'globalisation from below' (2004, 193) linked to specific forms of critical pedagogy. With regard to assessment, Tikly's work perhaps suggests that
formative assessment by teachers (and pupils) could be aligned with critical pedagogy to provide rigour and legitimacy to grass roots social movements. Advocacy to improve national assessment systems may be more effective if international collaboration is sought with groups in countries that have more appropriate systems, in order to provide an alternative lens on the hegemonic economic/political position of many governments.

Challenges facing teachers in implementing assessment

Teachers interested in using summative test results want to know if the results are reliable, i.e. reproducible, and if they are good predictors of future success. Others affected by such tests are concerned that they are fair and reflect authentic learning aims and practices. The concept of reliability is relatively straightforward. However, if one were to ask, of public examination results in the UK, for an estimate of the probability of a pupil's level (in Key Stage tests) or grade (in GCSE) being at least one level or grade in error, no well researched answer is available. Wiliam (1995) has made an estimate for the level results from Key Stage 3, based only on one of the possible sources of error (variations between a candidate's performances on different questions), and concluded that the chance of an individual's result being in error by one or more grades was between 30% and 50%. It is very surprising that the public do not demand that well researched estimates of error be produced for all public examinations; the results would probably cause alarm. Issues of policy are involved here. A test can be made more reliable by making it longer, to allow a larger sample of all possible questions, and by spreading it over more than one occasion, but both of these changes would raise the cost, both financial and in terms of school time. A decision about whether such cost is justified can hardly be taken without considering the trade-offs with reliability. By contrast, threats to reliability arising from differences between markers are looked for very carefully. Another way to improve reliability is to use multiple-choice tests and narrow the range of aims tested: it is by such measures that the well researched reliability of standardised tests, notably in the USA, is achieved, but such measures compromise validity. The issues concerning validity are far more complex and intractable (Wainer &
Braun, 1988). One of the several aspects of this concept relates to prediction. Research on predictive validity for UK A levels has gone little further than recording raw correlations in relation to degree results. The fact that these are significant, but not high, is hard to interpret, both because the limited reliability of both of the measures is bound to lower any correlation, and because the school-leaving data are attenuated in range by the omission of those who are not admitted. There is room for fine-grained studies, looking at the profiles of both school and university test results, at the variations between disciplines, and at the change of any patterns of relationships over the duration of a degree course of study. A potentially important research exercise was a trial in the UK to explore whether so-called aptitude tests, as widely used in the USA, could be better predictors of university success than A levels (see Chapter 7 in Wood, 1991). It is claimed for such tests that results are not affected by the subject content studied, so that there should be no need to worry about particular students being over-rated because of privileged schooling or choice of an easy examination syllabus. The experiment showed that the test results were not independent of subjects studied (e.g. students of economics and psychology showed higher numerical aptitude than those of law and sociology) and that the tests were no better than A levels as predictors of degree achievement. Many doubts have been cast on the claim that aptitude tests measure properties that are not susceptible to the educational or social backgrounds of candidates (see Chapter 17 in Wood, 1991). A second aspect of validity addresses the question of whether the content of tests is a fair reflection of the content and aims of the subject. One issue of current significance is the evidence that some of the important aims of education cannot be assessed by external tests.
For example, the capability to carry out practical investigations in science cannot be assessed by written tests (Black, 1990) and cannot be reliably assessed without averaging over several different investigations (Shavelson et al., 1993). Thus there are strong arguments that validity in certain spheres can only be secured in practice by teachers' assessments. This raises new questions about the reliability of such assessments. Research in the 1970s laid a basis for confidence in including such assessments in the GCSE system (see Chapter 7 in Wood, 1991), but doubts amongst public and politicians have not been overcome and a further research initiative seems
to be essential.
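The scale of the question-sampling error that Wiliam estimated can be illustrated with a short simulation. This is a sketch, not a reconstruction of his method: the test length, the level cut-offs and the distribution of pupil ability below are illustrative assumptions, chosen only to show how variation between a candidate's performances on different questions can misclassify a substantial fraction of pupils by a level.

```python
# Illustrative simulation of level misclassification due to question sampling.
# All parameters (test length, cut-offs, ability range) are assumptions,
# not figures taken from Wiliam (1995).
import random

random.seed(1)

N_PUPILS = 10_000
N_QUESTIONS = 30            # assumed number of questions on the test
BOUNDARIES = [12, 18, 24]   # assumed raw-score cut-offs between levels


def level(score: int) -> int:
    """Map a raw score to a level using the assumed cut-offs."""
    return sum(score >= b for b in BOUNDARIES)


misclassified = 0
for _ in range(N_PUPILS):
    ability = random.uniform(0.2, 0.9)   # pupil's true chance per question
    true_score = round(ability * N_QUESTIONS)
    # Observed score: each question answered correctly with prob = ability.
    observed = sum(random.random() < ability for _ in range(N_QUESTIONS))
    if level(observed) != level(true_score):
        misclassified += 1

print(f"Misclassified by at least one level: {misclassified / N_PUPILS:.0%}")
```

Even with a reliable marking process assumed, a sizeable share of simulated pupils land one level away from their 'true' level purely because of which questions happened to suit them, which is the single error source Wiliam's estimate isolates.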

The effects of assessment on learning

The limited validity of tests affects their feedback effects on schooling. Teachers will commonly report on the harmful effects, on their teaching, of the pressure to train their students for success in external examinations. One example is the study by Wood (1988), who used a qualitative enquiry amongst four teachers and 165 seventh-grade students about the state-mandated tests, which led him to conclude that the tests reduce science instruction to the literal comprehension of isolated facts and skills. A study of Advanced Placement examinations set for high-school physics students in the USA has also revealed the narrowing effect (Herr, 1992), whilst a review of the literature by Smith et al. (1992) concluded that test pressure inhibits teachers' innovation and diversity, reduces their autonomy, and leads to students being taught how to do tests at the expense of time devoted to teaching the subject. It is also well established that any new high-stakes test will show improvement in scores over time as teachers learn how to drill students to meet its demands, so there can be apparent improvement with no real gain in learning (Linn, 1994; Linn et al., 1990).

Several key research studies have addressed the impact of the implementation of testing and assessment systems on teachers and pupils. The Primary Assessment Curriculum and Experience (PACE) project examined the impact of the Education Reform Act 1988 on teachers' practice throughout the 1990s (Pollard et al. 1994; Croll 1996; Osborn et al. 2000). The two main sources of data were interviews with teachers, triangulated by systematic observation of classrooms. Reporting in 1994, Pollard et al. found that overall there was a picture characterised by change and resistance, commitment and demoralisation, and decreasing autonomy, but some developments in professional skills. These skills included greater collegiality in response to the government's legislation, but this was in the context of the alienation of professionally committed teachers. As far as teacher assessment was concerned, teachers tended to favour formative, provisional and implicit assessment. This ideology was in conflict with national testing, which involved summative paper-and-pencil tests and which constrained and distorted classroom activities, especially towards the end of each key stage as teachers coached and practised for the tests. Writing in 2000, Osborn et al. concluded that although there was evidence of teachers mediating the impact of legislation, this was a stressful experience for most teachers because of the reduction in professional discretion and in particular the 'increasingly high-profile and externally controlled national assessments [that] provided one of the most widespread causes of such conflict' (Osborn et al. 2000, 228).

Conclusion and Recommendations

The evidence reviewed in this paper suggests that the current intense focus on testing and test results in the core subjects of English, maths and science is narrowing the
curriculum and driving teaching in exactly the opposite direction to that which other research indicates will improve teaching, learning and attainment. Other studies indicate that the quality of teacher-pupil interaction is vital to successful teaching and learning, and that good quality teaching will employ a variety of methods and tasks, including small group work and investigative work. Test results improved quickly and dramatically, from a relatively low base, but they have largely levelled off since 2000. They have reached a plateau (below the targets that the government set for itself) that not even the National Literacy Strategy and National Numeracy Strategy have been able to raise. Indeed, it is interesting to note from Figure 1 that at key stage 2, results in science started higher and have remained higher, without the benefit (or hindrance) of a National Science Strategy. It would appear that primary teachers were initially unprepared for national testing, learnt very quickly how to coach for the tests, hence results improved, but any benefit to be squeezed from the system by such coaching has long since been exhausted. Interestingly, such an explanation parallels similar research internationally (Linn 2000; Fullan 2001) and even reflects research in business management about how innovation initially brings improvement but tails off, as personnel are deskilled then reskilled by change, but then become accustomed to it (Strang and Macy 2001). The key problem with such a phenomenon in education, however, is that it is by no means apparent that even such early improvements in scores denote any actual improvements in educational standards. The various studies reviewed earlier would indicate quite the reverse: that coaching for the tests has restricted the quality of teaching and learning, and that as test scores have risen, educational standards may have actually declined.
If England is to help its pupils achieve more in future, then a renewed focus on formative assessment, which is manageable and built on coherent understandings of the complex roles of the primary teacher, is a promising way forward. The motivation of pupils needs to be addressed more urgently, arguably through much more choice offered to pupils through their curriculum, coupled with greater empowerment of teachers in order that they may offer such choice and themselves be motivated to harness their enthusiasms. As far as national monitoring of standards is concerned, this would be more appropriately carried out through a system of national sampling. These and other reforms will be necessary if teaching in
the primary school is going to become much more than whole-class cued elicitation (Edwards and Mercer 1987) and direct test preparation.

References

Alexander, R.J. 2000. Culture and pedagogy: International comparisons in primary education. Oxford: Blackwell.
Ball, S. 1997. Policy sociology and critical social research: A personal review of recent education policy and policy research. British Educational Research Journal 23, no. 3: 257-74.
Ball, S. 2008. The education debate. London: Policy Press.
Black, P. 1994. Performance assessment and accountability: The experience in England and Wales. Educational Evaluation and Policy Analysis 16, no. 2: 191-203.
Black, P., R. McCormick, M. James, and D. Pedder. 2006. Learning how to learn and assessment for learning: A theoretical inquiry. Research Papers in Education 21, no. 2: 119-32.

Boyle, B., and J. Bragg. 2006. A curriculum without foundation. British Educational Research Journal 32, no. 4: 569-82.
Coffield, F., R. Steer, R. Allen, A. Vignoles, G. Moss, and C. Vincent. 2007. Public sector reform: Principles for improving the education system. London: Institute of Education.
Cox, C., and A. Dyson, eds. 1969. Black Paper 1: The fight for education.
Croll, P. 1996. Teachers, pupils and primary schooling: Continuity and change. London: Cassell.
Daugherty, R. 1995. National curriculum assessment: A review of policy 1987-1994. London: Falmer Press.
Earl, L., N. Watson, B. Levin, K. Leithwood, M. Fullan, N. Torrance, et al. 2003. Watching and learning: OISE/UT evaluation of the implementation of the national literacy and numeracy strategies. Nottingham: DfES Publications.
Edwards, D., and N. Mercer. 1987. Common knowledge. London: Methuen.
Ellis, T., J. McWhirter, D. Colgan, and B. Haddow. 1976. William Tyndale: The teachers' story. London: Writers & Readers Publishing Co-operative.
English, E., L. Hargreaves, and J. Hislam. 2002. Pedagogical dilemmas in the national literacy strategy: Primary teachers' perceptions, reflections and classroom behaviour. Cambridge Journal of Education 32, no. 1: 9-26.
Fullan, M. 2001. Leading in a culture of change. San Francisco, CA: Jossey Bass.
Galton, M., B. Simon, and P. Croll. 1980. Inside the primary classroom. London: Routledge and Kegan Paul.
Galton, M., L. Hargreaves, C. Comber, and D. Wall. 1999a. Inside the primary classroom: 20 years on. London: Routledge.
Galton, M., L. Hargreaves, C. Comber, D. Wall, and T. Pell. 1999b. Changes in patterns of teacher interaction in primary classrooms 1976-1996. British Educational Research Journal 25, no. 1: 23-37.
Gipps, C., M. Brown, B. McCallum, and S. McAlister. 1995. Intuition or evidence? Teachers and national assessment of seven year olds. Buckingham: Open University Press.
Gretton, J., and M. Jackson. 1976.
William Tyndale: Collapse of a school or a system? London: George Allen & Unwin.
Hall, K., J. Collins, S. Benjamin, M. Nind, and K. Sheehy. 2004. Saturated models of pupildom: Assessment and inclusion/exclusion. British Educational Research Journal 30, no. 6: 801-17.

Halsey, A.H., J. Floud, and C.A. Anderson, eds. 1961. Education, economy and society. New York: Free Press.
Hamilton, L., B. Stecher, J. Marsh, J. McCombs, A. Robyn, J. Russell, S. Naftel, and H. Barney. 2007. Standards-based accountability under No Child Left Behind. Santa Monica, CA: Rand Education.
Hardman, F., F. Smith, and K. Wall. 2003. Interactive whole class teaching in the national literacy strategy. Cambridge Journal of Education 33, no. 2: 197-215.
Harlen, W., and R. Deakin Crick. 2002. A systematic review of the impact of summative assessment and tests on students' motivation for learning (EPPI-Centre review, version 1.1).
Hilton, M. 2001. Are the key stage two reading tests becoming easier each year? Reading, April, 4-11.
James, M., P. Black, R. McCormick, D. Pedder, and D. Wiliam. 2006. Learning how to learn, in classrooms, schools and networks: Aims, design and analysis. Research Papers in Education 21, no. 2: 101-18.
Kispal, A. 2005. Examining England's national curriculum assessments: An analysis of the KS2 reading test questions, 1993-2004. Literacy 39, no. 3: 149-57.
Klein, S., L. Hamilton, D. McCaffrey, and B. Stecher. 2000. What do test scores in Texas tell us? Education Policy Analysis Archives 8, no. 49. http://epaa.asu.edu/epaa/v8n49/
Levacic, R., and A. Marsh. 2007. Secondary modern schools: Are their pupils disadvantaged? British Educational Research Journal 33, no. 2: 155-78.
Linn, R. 2000. Assessments and accountability. Educational Researcher 29: 4-16.
Marshall, B., and M.J. Drummond. 2006. How teachers engage with assessment for learning: Lessons from the classroom. Research Papers in Education 21, no. 2: 133-49.
Mercer, N. 1995. The guided construction of knowledge. Clevedon: Multi-Lingual Matters.
Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School matters: The junior years. Wells: Open Books Publishing Ltd.
Mroz, M., F. Smith, and F. Hardman. 2000. The discourse of the literacy hour. Cambridge Journal of Education 30, no.
3: 380-90.
Osborn, M., E. McNess, P. Broadfoot, A. Pollard, and P. Triggs. 2000. What teachers do: Changing policy and practice in primary education. London: Continuum.
Pollard, A., P. Broadfoot, P. Croll, M. Osborn, and D. Abbott. 1994. Changing English primary schools? The impact of the education reform act at key stage one. London: Cassell.

Pryor, J., and H. Torrance. 2000. Questioning the three bears: The social construction of assessment in the classroom. In Assessment: Social process and social product, ed. A. Filer. London: Routledge Falmer.
Reay, D., and D. Wiliam. 1999. I'll be a nothing: Structure, agency and the construction of identity through assessment. British Educational Research Journal 25, no. 3: 343-54.
Skidmore, D., M. Perez-Parent, and D. Arnfield. 2003. Teacher-pupil dialogue in the guided reading session. Reading Literacy and Language 37, no. 2: 47-53.
Smith, F., F. Hardman, K. Wall, and M. Mroz. 2004. Interactive whole class teaching in the National Literacy and Numeracy Strategies. British Educational Research Journal 30, no. 3: 395-412.
Stenhouse, L., ed. 1980. Curriculum research and development in action. London: Heinemann.
Strang, D., and M. Macy. 2001. In search of excellence: Fads, success stories, and adaptive emulation. American Journal of Sociology 107, no. 1: 147-82.
Tatto, M.T. 2007. Reforming teaching globally. Oxford: Symposium Books.
Tikly, L. 2004. Education and the new imperialism. Comparative Education 40, no. 2: 173-98.
Torrance, H., ed. 1995. Evaluating authentic assessment: Issues, problems and future possibilities. Buckingham: Open University Press.
Torrance, H. 2003. Assessment of the National Curriculum in England. In International handbook of educational evaluation, ed. T. Kellaghan and D. Stufflebeam. Dordrecht: Kluwer.
Torrance, H. 2007. Assessment as learning? How the use of explicit learning objectives, assessment criteria and feedback in post-secondary education and training can come to dominate learning. Assessment in Education 14, no. 3: 281-94.
Torrance, H. Forthcoming. Using assessment in education reform: Policy, practice and future possibilities. In Knowledge, values and educational policy, ed. H. Daniels, H. Lauder, and J. Porter. London: Routledge.
Torrance, H., and J. Pryor. 1998.
Investigating formative assessment: Teaching, learning and assessment in the classroom. Buckingham: Open University Press.
Torrance, H., and J. Pryor. 2001. Developing formative assessment in the classroom: Using action research to explore and modify theory. British Educational Research Journal 27, no. 5: 615-31.
Torrance, H., and J. Pryor. 2004. Investigating formative classroom assessment. In Learning to read critically in teaching and learning, ed. L. Poulson and M. Wallace. London: Sage.

Torrance, H., H. Colley, D. Garratt, H. Piper, K. Ecclestone, and D. James. 2005. The impact of different modes of assessment on achievement and progress in the learning and skills sector. London: LSDA for the LSRC.
Webb, R. 1993. Eating the elephant bit by bit: The national curriculum at key stage 2. London: Association of Teachers and Lecturers.
Webb, R., and G. Vulliamy. 2006. Coming full circle? The impact of new labour's education policies on primary school teachers' work. London: The Association of Teachers and Lecturers.
Wyse, D., and R. Jones. 2008. Teaching English, language and literacy. 2nd ed. London: Routledge.
Wyse, D., and D. Opfer. Forthcoming. Globalisation and the international context for literacy policy reform in England. In The international handbook of English, language and literacy teaching, ed. D. Wyse, R. Andrews, and J. Hoffman. London: Routledge.
Yates, A., and D.A. Pidgeon. 1957. Admission to grammar schools. London: Newnes.
