
Testing and Evaluation: English Language

Circle the correct meaning for each word.

1. KREUZER
A. A German maker of cheeses.
B. German pastry made in the shape of a cross.
C. Scandinavian word for a ship's navigator.
D. A small coin of low value formerly in currency in Austria and Germany.

2. KWACHA
A. The basic monetary unit of Zambia, equal to 100 ngwee.
B. Any of a number of tropical diseases resulting in acute anemia, swelling of the lower intestinal tract, and engorged glands.
C. South African rhythmic dance to accompany flutes and bongos.
D. A nut of the betel-nut family found in the lowlands of Peru and Bolivia; the hull is used in the dyeing of cloth and the meat is an edible protein.

3. MEALIE
A. A microscopic organism that enhances the excretory process of mucous membranes.
B. Burrowing animal of the rodent family usually found in tropical regions.
C. In Appalachian dialect, a derogatory term for someone who eats too much.
D. South African word for an ear of corn.

4. PROPTOSIS
A. Forward displacement of an organ, such as an eyeball.
B. A trigonometric function denoting the tendency of certain hyperbolic curves to reach a state of linearity.
C. In dream analysis, the underlying factor in the individual's world view that gives credence to the majority of dream subjects.
D. A systemic disease of the roots of various coniferous trees found mainly in North America.

5. LATICIFEROUS
A. Pertaining to plants predisposed to climb fences or walls.
B. Secreting or exuding latex.
C. Pertaining to animals lacking in climbing ability.
D. Trees which grow parallel to the ground.

6. QUADROON
A. A small featherless bird usually found in swampy areas.
B. A person having one-quarter Negro ancestry.
C. Four sailing ships in tandem.
D. Old English gold coin worth one quarter of a pound.

Answers: 1. D  2. A  3. D  4. A  5. B  6. B

If you get more than two correct, you have exceeded the average.

Testing and Evaluation: English Language

Why test?
Tests can increase motivation as they serve as milestones of student progress. Tests can spur learners to set goals for themselves, both before and after a test. Tests can aid the retention of information through the feedback they give on learners' competence. Tests can provide a sense of periodic closure to various units and modules of a curriculum. Tests can encourage students' self-evaluation of their progress. Tests can promote student autonomy as they confirm areas of strength and areas needing further work. Tests can aid in evaluating teaching effectiveness.

Thus, we test for the following purposes:
- Diagnosis: to gauge pupils' strengths and weaknesses
- Assessment: to assess to what extent pupils have benefited from a course of instruction
- Evaluation: to evaluate the effectiveness of methods of teaching
- Prediction: to predict pupils' future performance
- Placement: to place pupils in the most beneficial educational situations

Types of tests

An achievement or attainment test does the following:
- assesses what a learner has learnt at the end of a course or a period of time
- the items in this kind of test are related to a given syllabus / teaching programme
- the test is backward-looking, i.e., the results show what students have already learnt
- examples are the SPM, PMR or STPM exams

A proficiency test does the following:
- assesses students' proficiency with reference to a particular activity they will need to perform
- the test presupposes some previous learning, but it is not related to any particular syllabus / teaching programme
- the test is forward-looking, i.e., the results should show what the students are capable of doing in future
- examples are IELTS and TOEFL

An aptitude test does the following:
- assesses what a person can do, either at the moment or at some future time
- the test assumes no previous learning
- the test is forward-looking, i.e., the results should show what the students are capable of doing in future
- examples are tests given to potential diplomats, pilots, astronauts, etc.

A diagnostic test does the following:
- identifies what a learner can and cannot do
- the test may or may not be related to a specific syllabus / teaching programme
- the test can be both forward- and backward-looking, i.e., the results indicate what the students are capable of doing in future and also what they have already mastered
- examples are tests given to students after they have learnt a topic

Concepts

Validity
Validity refers to whether the test measures what it purports to measure. It helps to add the preposition "for", i.e., what is a particular test valid for? Does the test measure what it is supposed to measure? If, for example, it is intended to measure speaking ability, are the results influenced by the personality of the examiners?

Types of validity
- Face validity
- Content validity
- Construct validity
- Empirical validity

Face validity
Face validity describes whether the test is seen as reasonable. If teachers and learners feel unhappy about a test, it will not yield good results. Students often provide very informative criticism of a test, and it is useful to compare the general tone of their comments with their scores. Questions to ask when considering face validity:
- Does the test look right?
- Do the people involved, such as teachers, candidates, sponsors and user institutions, believe it to be a fair test?

Content validity
Content validity deals with the relevance of the test. If the test fails to assess the items or tasks it is supposed to, it does not match the aims of the course. Questions to ask when considering content validity:
- Does the test cover a representative sample of the relevant language skills / elements?
- Does the content accord with the syllabus content / needs analysis?

Construct validity
Construct validity looks at how the test matches the theory behind it. A test should share the same assumptions about language learning as the course. For example, if a course teaches certain language skills through extensive exposure to authentic documents and language, the test procedure should follow the same approach. It would be inappropriate, for example, to ask a group of engineers to write an essay on teaching methodology in the terminal test. Questions to ask when considering construct validity:
- Does the test measure underlying ability in accordance with a theory of language and language learning?
- Does it have a theoretical basis?
For example, if you believe that phoneme discrimination is not important in listening comprehension, then a listening test that measures phoneme discrimination would lack construct validity.

Empirical validity
The results of the test should be in accordance with those of other forms of assessment.
- Concurrent validity: compared with another language measure given at the same time
- Predictive validity: compared with a measure of subsequent performance in a relevant task

Ensuring validity
We ensure validity by:
- selecting the appropriate test for a given purpose
- selecting the appropriate content and making sure that we only test what we have taught (drawn from the syllabus specifications, scheme of work and textbook)
- selecting the appropriate assessment criteria
- basing the test on valid constructs, i.e., selecting the appropriate testing approach (evidenced in item type)

Reliability
Reliability refers to how consistent the test results are. It is usually associated with the terms consistency, accuracy and dependability: the test should keep the rank order of a group of students the same every time it is applied to that group. The question to ask when considering reliability is: do the variations in results reflect true variations in what is being measured, or do they reflect other, irrelevant factors?

Ensuring reliability
We ensure reliability by:
- designing items with a clear correct answer or answers
- designing clear and appropriately detailed marking schemes, agreed on by all markers

- training all markers to use the marking schemes
- preventing cheating in the test
- providing clear rubrics
- standardizing procedures and conditions for all students, wherever and whenever they take the test
- giving sufficient time to complete the test, not too much and not too little
- setting a sufficient number of items, because the more items there are, the greater the possible range of scores

We also ensure reliability by asking:
- Is the test paper / answer sheet set out clearly and printed correctly, to avoid misunderstandings?
- Are the test rubrics clear?
- Are practice examples provided to familiarize students with the format of the test?

Practicality
Practicality refers to issues of cost, administration and marking. A test is practical if it is fairly straightforward to administer. Questions to ask when dealing with practicality:
- How much will the test cost to prepare, administer and mark?
- Can it be administered and marked in the time available?
- How much manpower is required to run the test?
- What resources are required (e.g. paper, rooms, tape recorders)?
- Can the necessary arrangements be made to ensure smooth administration and marking?

Discrimination
Discrimination refers to the capacity of the test to discriminate among the different students and to reflect the differences in the performances of the individuals in the group. The extent of discrimination required depends on the nature of the test. In classroom progress tests, it is often enough to class students' performance as satisfactory or unsatisfactory. However, when a pass / fail / distinction is required, a wide spread of marks is desirable to reduce the number of students falling around the pass / fail borderline. The question to ask when dealing with discrimination is: does the test adequately measure the differences in the students' performance? One common statistic for this is sketched below.
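The notes give no formula for discrimination, so what follows is only an illustrative Python sketch of one standard measure, the upper-lower discrimination index: rank the students by total score, then compare how often the top and bottom groups answered a given item correctly. The function name, the data layout and the conventional 27% grouping are assumptions, not part of the notes.

    # Illustrative sketch (not from the notes): upper-lower discrimination
    # index for one test item. Each entry pairs a student's total test
    # score with 1 (item answered correctly) or 0 (answered wrongly).
    def discrimination_index(results, fraction=0.27):
        """D = (correct in upper group - correct in lower group) / group size."""
        ranked = sorted(results, key=lambda r: r[0], reverse=True)  # best students first
        n = max(1, round(len(ranked) * fraction))                   # size of each group
        upper = sum(correct for _, correct in ranked[:n])           # correct in top group
        lower = sum(correct for _, correct in ranked[-n:])          # correct in bottom group
        return (upper - lower) / n                                  # 1.0 = perfect separation

    # Example: 10 students as (total score, item correct?)
    results = [(95, 1), (88, 1), (82, 1), (75, 1), (70, 0),
               (65, 1), (60, 0), (55, 0), (48, 0), (40, 0)]
    print(discrimination_index(results))  # 1.0: the item separates strong from weak

A value near zero (or negative) would mean the item fails to separate stronger from weaker students and should be revised.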

Backwash
Backwash refers to the effect the test has on teaching and learning. Backwash can be positive or negative. If a national syllabus stresses oral fluency and reading skills, but the examination focuses on multiple-choice grammar items, teaching is likely to be directed more towards the test than towards the syllabus objectives. This is an example of negative backwash. The question to ask when dealing with backwash is: how does the test affect English language teaching?

Test Specifications

Why do we need test specifications?
- The test specification ensures that the test is consistent from one year to the next in terms of items.
- The information given helps to make the test reliable, as students will have an idea of the format of the test and the kinds of items.
- The skills are specified, and test designers can ensure validity by referring to the syllabus when selecting test items.

Test specifications: content
- Purpose of the test
- Description of the candidate
- Test level
- Construct (theoretical framework) for the test
- Description of suitable coursebook / textbook
- Number of sections or papers
- Time for each section or paper
- Weighting for each section per paper
- Target language situation
- Text types (for listening, reading)
- Text length
- Language skills to be tested
- Language elements to be tested
- Test task
- Test methods / techniques, e.g. multiple-choice items, summary
- Rubrics
- Criteria for marking
- Description of typical performance
- Description of what students can do in the real world
- Sample paper

Table of Specifications (an example)

Section  Skill                     Format                              No. of items  Marks
A        Listening: main idea      MCQ                                 10            20
         Listening: inference      MCQ                                 20            20
B        Grammar: tenses           Cloze: 1 passage (rational cloze)   20            20
C        Reading: 200-word text    Open-ended questions                5             10
         Writing: description      250-word essay on places of         1             30
                                   interest (notes expansion)

Total: 100 marks
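A specification like this can also be checked mechanically. The following is a minimal Python sketch, not part of the notes, that stores the table above as data and verifies that the marks add up to the intended paper total of 100; the field names, and the section labels attached to the second listening row and the writing row, are assumptions.

    # Illustrative sketch: the table of specifications as data, with a
    # check that the weightings sum to the full paper total (100 marks).
    SPEC = [
        {"section": "A", "skill": "Listening: main idea",   "format": "MCQ",                  "items": 10, "marks": 20},
        {"section": "A", "skill": "Listening: inference",   "format": "MCQ",                  "items": 20, "marks": 20},
        {"section": "B", "skill": "Grammar: tenses",        "format": "Rational cloze",       "items": 20, "marks": 20},
        {"section": "C", "skill": "Reading: 200-word text", "format": "Open-ended questions", "items": 5,  "marks": 10},
        {"section": "C", "skill": "Writing: description",   "format": "250-word essay",       "items": 1,  "marks": 30},
    ]

    total = sum(row["marks"] for row in SPEC)
    assert total == 100, f"weightings sum to {total}, expected 100"

    for row in SPEC:
        # in a 100-mark paper, each row's marks double as its percentage weighting
        print(f'{row["section"]}  {row["skill"]:<24} {row["marks"]:>3} marks')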

What are the stages in test construction?
- Test specifications: the teacher considers why the test is set and how to achieve the purpose of the test.
- Item writing: a consideration of the test questions to be used, taking into account validity, reliability, practicality and discrimination.
- Item moderation: this helps to make the test more reliable, as ambiguous items and unclear rubrics can be identified.
- Pre-testing: this helps to see whether the test is practical and reliable, although security issues may arise.
- Analysis: the results help the tester decide whether the items are working or not.
- Training of examiners and administrators: this helps to make the test reliable, especially when tests have subjective items.
- Reporting test scores: for feedback to the teacher, student, etc.

Test task design
- What constitutes an understanding of the text?
- Did we frame the task so that the candidate processes the discourse as in a real-life context?
- Are the questions sequenced carefully?
- Are the test formats familiar?
- Does the test task involve realistic discourse processing?
- Does the test have a positive backwash effect?

Some thoughts on testing
When the test at the end of the course is communicative, it is likely that a more communicative approach will be adopted in teaching too. Weir (1989:27) describes this influence of testing on our teaching as "the tail wagging the dog".

Rubrics
Rubrics are words used to show or explain how something should be done. They tell the candidate clearly what he or she has to do: circle the correct answer, fill in the blank with the correct option, and so on. The most common ways of indicating the correct answer are:
- filling in a blank with the correct option;
- writing the letter of the correct option in the blank or in a box placed in the margin;
- circling the letter of the key;
- shading the letter of the correct option on the answer sheet;
- underlining the correct option;
- putting a tick or cross in the box provided next to the correct option.
For simplicity, use the imperative, for example, "Circle the correct answer."

Example of a rubric: You are given four options to complete the statement in each of the following items. Read each item carefully. Then circle the best option.

Scoring and Reporting of Performance

Marking scheme
Objective tests have a predetermined correct answer, and marking them poses no problem; a computer can be used to mark the scripts. A subjective test, however, requires examiners to make decisions. As most of these decisions relate to particular skills, scoring procedures suitable for each skill are required. Preparing a marking scheme and awarding marks is relatively easy for objective tests, especially discrete-point tests. The problem becomes more acute with tests designed to assess language skills, especially if these skills are of a subjective nature.

Analytic versus impressionistic marking
Analytic marking is the better system for all kinds of formative tests, because it gives students feedback on how well they are getting on in their learning. The examiner relies on a clear description of levels of performance to decide on the exact marks to give the candidates; this is what makes detailed feedback to the candidates possible.

Impressionistic marking is done through a very quick skimming of the answer script. The examiner does not read the script carefully, so the students get no feedback with which to improve their performance. This type of marking is suitable when there are large numbers of scripts and the candidates do not get their work back. It can be very reliable if more than one person rates the same script. To ensure this reliability, practice marking sessions with sample test papers are essential.
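The notes describe double rating but give no procedure, so here is a minimal Python sketch of one way it is often operationalised: average the two raters' marks and flag any script where they disagree by more than a set tolerance, so it can be re-marked or discussed. The function name and the tolerance of 2 marks are assumptions.

    # Illustrative sketch: reconcile two raters' impressionistic marks.
    def moderate(marks_a, marks_b, tolerance=2):
        results = []
        for script, (a, b) in enumerate(zip(marks_a, marks_b), start=1):
            final = (a + b) / 2                 # agreed mark = simple average
            disputed = abs(a - b) > tolerance   # big gap: send for re-marking
            results.append((script, final, disputed))
        return results

    rater_a = [14, 9, 17, 12]
    rater_b = [15, 13, 16, 11]
    for script, final, disputed in moderate(rater_a, rater_b):
        print(f"script {script}: {final} {'RE-MARK' if disputed else 'ok'}")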


An example of a marking scheme for essays

Question 1: Writing (20 marks)
The members of your school Consumer Society are holding a writing competition. The theme is "Buy More Local Fruits". As a participant, write an essay of about 250-300 words on the advantages of buying more local fruits. You are required to elaborate on five advantages.

Mark scheme for Question 1: Writing
Award marks as follows: 10 marks for content, 10 marks for language.

Content
Award 2 marks for each advantage that is well elaborated. Award 1 mark for a mere mention without any elaboration. (5 x 2 marks = 10 marks)

Language
Marks  Criteria
9-10   Excellent linguistic ability. Error-free. Very good organisation.
7-8    Very good linguistic ability. Very few errors. Coherent and well-organised.
5-6    Good linguistic ability. Few errors. Fairly well-organised.
3-4    Fair linguistic ability. Some errors. Jerky development.
1-2    Poor linguistic ability. Many gross errors. No cohesion.
0      No attempt made.
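The scheme above reduces to simple arithmetic. As an illustration, here is a minimal Python sketch of how the final mark could be computed; the function name and parameter names are assumptions, while the numbers come from the scheme itself.

    # Illustrative sketch of the mark scheme above: content (max 10) is
    # 2 marks per well-elaborated advantage and 1 per mere mention;
    # language is a band mark out of 10 from the criteria table.
    def essay_score(elaborated, mentioned, language_mark):
        content = min(10, 2 * elaborated + 1 * mentioned)  # 5 points x 2 marks max
        assert 0 <= language_mark <= 10
        return content + language_mark                      # total out of 20

    # Example: 4 well-elaborated advantages, 1 mere mention, language band 7-8
    print(essay_score(4, 1, 7))  # 9 content + 7 language = 16 / 20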


Reporting procedures
The most common way of reporting the results of a test in Malaysian schools is in terms of grades: the score is converted into a grade. The following is a common scale:

Grade  Score
A      80-100
B      60-79
C      50-59
D      40-49
F      0-39
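The conversion is a simple threshold lookup. Here is a minimal Python sketch of the scale above (the function name is an assumption):

    # Illustrative sketch: convert a raw score (0-100) to a grade
    # using the common scale above.
    def to_grade(score):
        if score >= 80: return "A"
        if score >= 60: return "B"
        if score >= 50: return "C"
        if score >= 40: return "D"
        return "F"

    for score in (85, 63, 50, 42, 17):
        print(score, to_grade(score))  # A, B, C, D, F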

The grade is entered in the student's report card. Students are then ranked according to the scores they obtained, and a class position is determined. Here is an example of a profile report:

Skill      Grade
Speaking   A
Listening  B
Reading    C
Writing    D

Key
A: Excellent. Performs at a near-native level of proficiency.
B: Very good. Occasional errors in use, but generally error-free.
C: Competent user. Can perform the task at a minimally adequate level.
D: Inadequate user. Makes many mistakes. Does not have enough competence to do the task satisfactorily.
F: Very poor user. Very little command of the language. Not able to perform the task even minimally.


TEST ITEMS/TECHNIQUES

Rearrangement item
Instruction: Complete each sentence by putting the words below it in the right order. Put in the boxes only the letters of the words.

Multiple-choice item (MCQ)
Instruction: Read the text and then select the correct answer. Circle the letter A, B, C or D.

Matching item
Instruction: Write the letter of the correct response in the space provided.

MCQ item evaluation
Check whether each question meets the following criteria:
1. The item is clear, without ambiguity, complicated syntax or obscure vocabulary.
2. The item validly reflects the subject area which is being tested.
3. The stem contains as much of the item's content as possible (to avoid lengthy options).
4. The options are grammatically consistent with the stem.
5. There is one and only one correct answer.

Task: MCQ item evaluation

Example 1
Select the underlined word or phrase which is incorrect. Circle A, B, C or D.
On Valentine's Day people have to pay expensive prices for roses.
   A                           B       C                D

Example 2
Select the correct answer. Circle A, B, C or D.
An assessment of a group of airline stewardesses working on board an aircraft after having completed an English course for in-flight crew is an example of a ____ test.
A. oral
B. integrative
C. discrete-point
D. communicative

Task: Design a reading test based on the text below.

A golfer was standing in the fairway, about 140 yards out, when a frog whispered from the rough, "Use an 8-iron." The golfer, deep in concentration, pulled out his 8-iron and hit the shot. It rolled right into the cup for an eagle. "Now take me to Las Vegas," said the frog. "What?" said the startled golfer, suddenly realizing it was a talking frog. "You heard me," repeated the frog, "take me to Las Vegas. I'm obviously a lucky frog, and we'll make a bundle!" So the golfer picked up the frog, and they flew to Las Vegas. In the casino, the frog whispered, "Go to the dice table and bet everything on the pass line." The shooter rolled a seven, and the man with the frog won $100,000.00. Then the guy took the frog upstairs to his room and the frog said, "Kiss me." When he did, it turned into the most beautiful girl he'd ever seen: deep brown eyes, blond hair, beautiful smile, and sixteen years old. "And I swear, Your Honour, that's how she got in my room."
Source: Argus Hamilton in Oklahoma City Daily Oklahoma (RD Dec 1996)

