
THE LIKERT SCALE USED IN RESEARCH ON AFFECT: A SHORT DISCUSSION OF TERMINOLOGY AND APPROPRIATE ANALYSING METHODS

Kirsti Kislenko & Barbro Grevholm
University of Tallinn & University of Agder

In this paper we discuss the terminology of Likert scale, Likert item and Likert-type item, which are common constructs in studies on attitudes and beliefs in mathematics education. Further, we discuss the choice of analysing methods when using a Likert scale as an instrument in a study. There is no common agreement on what statistical methods are appropriate in relation to the use of a Likert scale. A scientific debate among researchers in the area of affective studies is suggested on the issues we raise.

INTRODUCTION

In research on affective variables such as motivation, beliefs, and attitudes, we are dealing with phenomena in mathematics education which are difficult to capture. A common method for data collection is to use questionnaires, and they are often of Likert scale type. In connection with the analysis of such data we want to raise and discuss some crucial questions. Kirsti Kislenko has done a study on students' beliefs about mathematics teaching and learning, supervised by Barbro Grevholm and Madis Lepik. Scrutinising literature on earlier studies in this area led us to the problems we discuss in this paper. The questions we inquire into are: What is considered to be appropriate terminology when using a Likert scale? What is considered to be appropriate use of statistics for the analysis? To be sure what the appropriate analysing methods are, one has to be aware of the type of data that has been collected. It can be questioned to what degree this awareness is present in some studies in the area of affective variables.

TERMINOLOGICAL CONSIDERATIONS ABOUT A LIKERT SCALE

A Likert scale (Likert, 1932) is a popular instrument to measure people's attitudes, preferences, images, opinions, conceptions, etc. (Göb, McCollin, & Ramalhoto, 2007; Wu, 2007). In particular, there are several studies about affect in mathematics education that use a Likert scale as the instrument or one of the instruments (e.g. Ma, 1997; Nurmi, Hannula, Maijala, & Pehkonen, 2003; Pehkonen, 1994; Pehkonen & Lepmann, 1994; Pehkonen & Törner, 2004; Vale & Leder, 2004). It seems that three kinds of concepts emerge when talking about Likert scales: a Likert scale, known as well as a Likert (summated) scale, a Likert item, and a Likert-type item. Often researchers either treat these notions as synonyms or apply them to false constructs (e.g. they consistently write "a Likert scale" when talking about a Likert item). John S. Uebersax (2006) points out two reasons why an understanding of the differences between these constructs, and their adequate use, is important. Firstly, we expect mutually agreed-on definitions because then we can more clearly understand what people mean with different notions. Secondly, we should first agree on terms in order to make progress in the discussion about appropriate statistical methods for such variables (Uebersax, 2006). Therefore, to avoid further misunderstandings of these constructs, we give at this point a short exposition of the notions together with an example from our instrument. The important feature of a Likert scale is the way in which it is constructed. First, a number of items are collected that all refer to the feature in question. Then these items are administered to a large group of respondents (N of at least 100) who have to decide if they (1) absolutely apply to it (the feature in question), (2) apply to it, (3) are neutral to it, (4) do not apply to it, or (5) absolutely do not apply to it. Based on the answers of each respondent an index of selectivity is calculated, and finally the items with the highest selectivity constitute the scale. The scale score is achieved by summing the item scores for the selected items (Dawis, 1987). Other characteristics are related to the structure of the scale: (1) the scale contains several items; (2) response levels are arranged horizontally; (3) response levels are anchored with consecutive integers; (4) response levels are also anchored with verbal

labels which connote more-or-less evenly-spaced gradations; (5) verbal labels are bivalent and symmetrical about a neutral middle (Uebersax, 2006). Therefore, a Likert scale is never an individual item; it is always a set of items (Uebersax, 2006), or in other words, a series of statements (Leder & Forgasz, 2002). A Likert scale itself is comprised of Likert items or Likert-type items. We deal with Likert items when features 2 through 5 above are all present, and with Likert-type items when only 2 through 4 appear. For example, in the KIM-questionnaire (which Kirsti based her work on; see explanation below) the statement A1 "Mathematics is important" was a Likert item because it was rated as totally agree, partially agree, uncertain, partially disagree, and totally disagree. But the item D1 "I am afraid of making mistakes when I do mathematics" was a Likert-type item, as it was rated as never, seldom, often, and very often. It is relevant to mention that, as our study was web-based (i.e. the questionnaire was available to the students on an Internet webpage), the respondents did not circle an integer, which is a common way of collecting data using Likert items or Likert-type items (see characteristic 3), but marked an option button instead. Nevertheless, we did not consider this feature a significant factor because in the analysis we interpreted all verbal labels by the grades (scores, degrees) (Göb et al., 2007). Therefore, certainty that the statements in our study were either Likert items or Likert-type items was one of the basic considerations for finding appropriate statistical methods for analysing the data.
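The distinction between the summated scale score and the individual item labels can be sketched in code. The following Python fragment is our illustration, not part of the original instrument: the label sets follow the KIM examples above (items A1 and D1), while the function and variable names are ours.

```python
# Sketch: scoring a summated Likert scale as described by Dawis (1987) --
# the scale score is the sum of the item scores for the selected items.

# A Likert item: bivalent, symmetrical labels around a neutral middle
# (characteristics 2-5), anchored with consecutive integers.
LIKERT_LEVELS = {
    "totally agree": 5,
    "partially agree": 4,
    "uncertain": 3,
    "partially disagree": 2,
    "totally disagree": 1,
}

# A Likert-type item: ordered, integer-anchored labels, but without the
# symmetric neutral middle (characteristics 2-4 only).
LIKERT_TYPE_LEVELS = {"never": 1, "seldom": 2, "often": 3, "very often": 4}

def scale_score(responses, levels):
    """Sum the item scores for one respondent's verbal answers."""
    return sum(levels[answer] for answer in responses)

# One respondent answering three Likert items:
print(scale_score(["totally agree", "uncertain", "partially agree"],
                  LIKERT_LEVELS))  # 5 + 3 + 4 = 12
```

The summation step is exactly what makes the result a Likert scale score rather than an individual item score.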

ANALYTICAL CONSIDERATIONS

Clason and Dormody (1994) investigated 95 articles from the Journal of Agricultural Education in which responses to individual Likert-type items on measurement instruments were analysed. The result was as follows: 54% of the articles reported only descriptive statistics, 13% used non-parametric statistics, and 32% used parametric statistical procedures. There seems, therefore, to be a large variety of statistical methods used when analysing individual Likert-type items. They then pose a question about the appropriateness of these statistical methods (Clason & Dormody, 1994). Commonly there are four levels of measurement scales: nominal, ordinal, interval, and ratio (Stevens, 1946). Ordinal measurement scales consist of categories ordered by a relation of the type < or ≤. Beyond order, there is no measure for the distance between two scale values. Interval scales express magnitudes, and the differences between scale values are meaningful (Göb et al., 2007). A Likert scale is by definition an ordinal scale. Goldstein and Hersen (1984) state directly that
The [Likert] scale is clearly at least ordinal. In order to achieve an interval scale, the properties on the scale variable have to correspond to differences in the trait on the natural variable. Since it seems unlikely that the categories formed by the misalignment of the five responses will all be equal, the interval scale assumption seems unlikely. (p. 52)

This is consistent with the results reported by Hart (1996), based on an experiment suggested by Lodge (1981). The idea was to quantify the scores on a Likert scale by magnitudes. A small sample of 48 respondents assigned magnitudes to commonly used adjectives on a 7-grade Likert scale: atrocious, very bad, bad, so-so, good, very good, excellent. The results confirmed that respondents did not use a linear interval scale when making judgements concerning the magnitudes, and considerable differences could be found in the weights assigned to the distances between the scores on the Likert scale. For example, the step from "very bad" to "bad" was quantified by 0.6, while the step from "so-so" to "good" was quantified by 1.9. Clason and Dormody (1994) add that an analysis of single items from Likert scales should acknowledge the discrete nature of the response; the data would then be summarised as counts (or percentages) occurring in the various response categories (Clason & Dormody, 1994). Although several appropriate statistical procedures for analysing Likert scale data are presented in the literature (e.g. Agresti, 2002; Andrews, Klem, Davidson, O'Malley, & Rodgers, 1981; Clason & Dormody, 1994; Field, 2005; Wu, 2007), inappropriate data analysis methods, mainly sample means, sample variances, t-tests, analysis of variance, etc., are still conveniently used by researchers in the social sciences (Göb et al., 2007; Wu, 2007). A simple example shows the paradoxical situation arising when using, for example, mean scores as a basis for analysis. Let us say that we found the following observed proportions:

Response level           Group 1   Group 2
5 - totally agree          0.2       0.5
4 - partially agree        0.2       0
3 - uncertain              0.2       0
2 - partially disagree     0.2       0
1 - totally disagree       0.2       0.5
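The paradox is easy to reproduce numerically. The following Python sketch (the function and variable names are ours) computes the mean score of both groups, here written as response counts out of ten respondents rather than proportions.

```python
# Sketch reproducing the paradox from the table above: two very
# different response distributions share the same mean score.
scores = [5, 4, 3, 2, 1]     # totally agree ... totally disagree
group1 = [2, 2, 2, 2, 2]     # counts per level: uniform spread
group2 = [5, 0, 0, 0, 5]     # counts per level: only the extremes

def mean_score(counts):
    """Mean item score computed from per-level response counts."""
    return sum(s * n for s, n in zip(scores, counts)) / sum(counts)

print(mean_score(group1), mean_score(group2))  # 3.0 3.0
```

A frequency table or a non-parametric test would separate these two groups immediately, while the mean hides the difference entirely.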

In both groups the mean score is 3, which refers to the neutral value, but one would have to be extremely ignorant to claim that these populations are similar (the idea of the example comes from Clason and Dormody (1994)). Moreover, most of the parametric procedures, which are inappropriate for the analysis, assume not only an interval level of measurement but also demand that the response variables are normally distributed (Field, 2005). But as pointed out by Clason and Dormody (1994) and supported by Wu (2007), it is hard to see how normally distributed data can arise in an individual Likert-type item. A short list of appropriate analysing methods for a Likert scale, which at the same time are easily accessible in the program SPSS, could be, first, among descriptive techniques: the median or mode, frequencies, cross-tables, boxplots, etc.; secondly, among inferential techniques: the Mann-Whitney test (when comparing two independent samples), the Wilcoxon Signed-Rank test (when comparing two related samples), and the Kruskal-Wallis test (when comparing several independent groups). If the aim is to look at the relationships between several categorical variables, then one possible analysing method is a loglinear analysis (Field, 2005); a Rasch analysis should be undertaken when there is an aim to use the

total score on a questionnaire to summarise each person (for more, see Bond & Fox, 2007).

An empirical study of students' beliefs: Kirsti's story

In her study about students' beliefs in mathematics, carried out in 2005, Kirsti used a questionnaire developed in 1997 by a project called KIM1 (Streitlien, Wiik, & Brekke, 2001). There were several reasons for using this questionnaire, and one of them was the aim to see whether the structure of beliefs measured with the same tool, a questionnaire, was stable in different samples and after 10 years of schooling. At the same time, the analysis done by the KIM project researchers (factor analysis, frequency tables and means (Streitlien et al., 2001)) gave some ideas of the statistical methods that could be used with this data. Most of the items in the KIM-questionnaire could be classified as Likert items or Likert-type items. (In the rest of this section the pronouns I and my refer to Kirsti.) As mentioned previously, I was faced with the issues of appropriate and inappropriate statistical analysis methods for a Likert scale when I started to carry out the analysis of the data in this study. In the beginning I experienced all the methodological problems a novice researcher probably does. The first impression of a proper analysis was that it must include sophisticated statistical procedures, which somehow equalled following the procedures carried out before with this questionnaire by the experts in the area (e.g. the KIM-project researchers). The
1 KIM: Kvalitet i Matematikkundervisningen, translated as "Quality in Mathematics Teaching".

first paper we published (see Kislenko, Grevholm, & Breiteig, 2005) included only some frequency tables for comparing the answers from different groups of students, as the research question included a comparative perspective and I was not too advanced in knowledge about statistics. When improving my knowledge about the analysing methods, I included some mean calculations as well in the next paper, where the question was about the general tendencies in students' answers (see Kislenko, Grevholm, & Lepik, 2007). Getting more knowledgeable in the area by reading different reports and papers from studies using similar instruments and investigating similar features, I applied an exploratory factor analysis for structuring the data, calculated the means of the factors, and used an independent samples t-test for the comparative perspective in the next paper (see Kislenko, 2007a). There was a turning point in my view of what counts as proper statistics in the study when one of the reviewers of the next paper, which was first written based only on parametric statistical procedures, pointed out some crucial issues in relation to this scale. So I reconsidered my methods in the revision of that paper. Finally, I went back to the roots of the questions about appropriate and inappropriate statistical procedures for the analysis and tried to learn more about the mathematical procedures and assumptions behind the methods. Therefore, the next paper included an exploratory and a confirmatory factor analysis, where I explicitly noted all the assumptions of the methods and the violations of those assumptions. The main goal was only to explore the structure of the dataset so that I could see some tendencies in the data. The rest of the analysis was done based on the cross-tables and frequency tables (that should be a

common and appropriate approach when using a Likert scale questionnaire; used e.g. by Lianghuo Fan and Shu Mei Yeo (2008)), and all the analysis was done in the light of my research questions (see Kislenko, 2007b).

Some examples from the literature and different views of proper statistics

At this point we would like to present some examples from the literature which could show some of the reasons why a novice researcher can get trapped into the wrong statistical analysis when using a Likert scale in his or her study. As the examples are given not to criticise others' work but to illustrate the point presented above, the references are not included (the authors have them in case of interest). There were two papers, one about motivation in foreign language learning, which used a 4-grade Likert scale, and another about the structure of student teachers' view of mathematics using a 5-grade Likert scale (the scale was not even stated explicitly; based on the means one could infer that it was a 5-point one). Both papers used only parametric procedures (e.g. principal component analysis, t-tests, ANOVA) and did not give any clarification whatsoever of the reasons behind the choice of these methods. Two further papers, one about pupils' self-confidence in mathematics and another about gender and mathematics (both using a 5-point Likert scale), carried out their analysis with parametric methods, but in one part of each article there was at least a note about the choices. In one paper the authors mentioned that parametric tests, such as the t-test, were used, and the results were checked, if needed, with corresponding non-parametric tests; in another paper: methods used are both

parametric (t-test) and non-parametric (Mann-Whitney), with basically the same result. This showed that the authors had paid attention to the issue, but they were still implicit about why they did or did not choose these methods. The last paper we mention dealt with differences in pupils' conceptions about mathematics teaching, and it used a 5-grade Likert scale. In this paper t-tests and means were used in the analysis, and it was refreshing to read the explanation of the choices, which was: "this form of dealing with results has been consciously chosen, although all the formal prerequisites for its use were not fulfilled (e.g. the questionnaire is actually an order scale)". It has also been pointed out in the literature that there are two kinds of views when it comes to the analysis of Likert-type items: the supporters of measurement and the supporters of statistics. The former claim that the level of measurement defines the statistical procedures that can be applied to the numerical data. The latter declare that the level of measurement is not a constraining factor when analysing data. Therefore, as summarised by Dawis (1987),
Those who accept the latter view tolerate the use of parametric statistics with scores from quasi-interval scales that actually are at the ordinal level of measurement, a common practice that is criticized by proponents of the former view. (p. 487).
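To make the two camps concrete: the following Python sketch (the data are hypothetical and not taken from any of the reviewed papers) runs both a parametric t-test and its non-parametric counterpart, the Mann-Whitney test mentioned earlier, on the same two samples of 5-point Likert responses.

```python
# Sketch: the same two samples of 5-point Likert responses analysed
# both ways -- the "statistics" camp's t-test and the "measurement"
# camp's Mann-Whitney test. Data are invented for illustration.
from scipy import stats

group_a = [5, 4, 4, 5, 3, 4, 5, 4]   # hypothetical responses, group A
group_b = [2, 3, 1, 2, 3, 2, 1, 2]   # hypothetical responses, group B

t_stat, t_p = stats.ttest_ind(group_a, group_b)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Mann-Whitney uses only the ordering of the scores, so it remains
# valid when the five response levels are not evenly spaced.
print(f"t-test p = {t_p:.4f}, Mann-Whitney p = {u_p:.4f}")
```

With clearly separated samples like these, both tests agree; the debate summarised by Dawis concerns the cases where the interval assumption behind the t-test quietly distorts the conclusion.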

Even though a predominant role of measurement theory in data analysis has been criticised by several authors (e.g. Adams, Fagot, & Robinson, 1965; Lord, 1953), we still find the question of appropriate statistical measures applied to Likert-type responses an important issue to address.

CONCLUDING REMARKS

There is obviously not a common understanding among researchers using Likert scales about which methods are appropriate to use. Nor does there seem to be a common understanding or agreement on the need to explain the concepts one uses or to justify the methods one chooses. Making sure what the appropriate terms for describing the data are, and what the appropriate methods for analysing this kind of data are, should be essential tasks in every research report. In reading scientific papers on attitudes and beliefs using different statistical methods, we find that it is common that no justification at all is given for the choice of methods. It is also common that central constructs like motivation, attitude, belief and conception are not defined or described in the papers, and the reader is left wondering whose meaning of the construct is valid in the text. The aim of this paper is not to declare the final truth about the statistical procedures to use with a Likert scale. First, it is important to understand that any analysis in research (and the instrument of the research) is chosen based on the research questions. Secondly, in the mathematics education area we mainly deal with data coming from the real world, based for example on replies from human beings, which always adds an element of subjectivity to the data. We cannot be sure that a certain

person would give the same replies to a questionnaire if we repeated it. For example, the circumstances under which the replies are given can influence the answers. What we would like to suggest is that the author of any study using these previously mentioned constructs carries out and reports the analysis in a sincere and reflective way, so that the reader can understand the reasons for the chosen methods of analysis. We find it important that the analysis is done by researchers who are aware of all the mentioned obstacles. The question of what is appropriate use of statistics for the analysis should be dealt with in an explicit way, as this would at least leave the reader with an understanding of the intentions and thinking of the author/researcher. The issue we are raising here is not about clarifying right or wrong answers; it is rather about explicit explanations of the chosen methods. Actually, the question here is not so much about "what?"; it is about "why/why not?". This point is supported by Adams et al. (1965) (quoted in Göb et al., 2007), who concluded:
Nothing is wrong per se in applying any statistical operations to measurements of a given scale, but what may be wrong, depending on what is said about the results of these operations, is that the statement about them will not be empirically meaningful or else that it is not scientifically debated.

We would welcome a scientific debate among researchers in the affective area on the questions we discuss in this paper.

Acknowledgement

We would like to thank our dear colleague Professor Simon Goodchild from the University of Agder for fruitful discussions and suggestions about appropriate statistical methods when using a Likert scale.

REFERENCES
Adams, E., Fagot, R. F., & Robinson, R. E. (1965). A theory of appropriate statistics. Psychometrika, 30(2), 99-127.

Agresti, A. (2002). Categorical data analysis. NY: Wiley-Interscience.

Andrews, F. M., Klem, L., Davidson, T. N., O'Malley, P. M., & Rodgers, W. L. (1981). A guide for selecting statistical techniques for analyzing social science data. Ann Arbor: Institute for Social Research, University of Michigan.

Bond, T., & Fox, C. (2007). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Clason, D. L., & Dormody, T. J. (1994). Analyzing data measured by individual Likert-type items. Journal of Agricultural Education, 35(4), 31-35.

Dawis, R. V. (1987). Scale construction. Journal of Counseling Psychology, 34(4), 481-489.

Fan, L., & Yeo, S. M. (2008). Integrating oral presentations into mathematics teaching and learning: An exploratory study with Singapore secondary students. In B. Sriraman (Ed.),

Beliefs and mathematics: Festschrift in honor of Günter Törner's 60th birthday (pp. 99-122). Charlotte, NC: Information Age Publishing.

Field, A. (2005). Discovering statistics using SPSS (2nd ed.). Thousand Oaks: Sage Publications.

Goldstein, G., & Hersen, M. (1984). Handbook of psychological assessment. NY: Pergamon Press.

Göb, R., McCollin, C., & Ramalhoto, M. F. (2007). Ordinal methodology in the analysis of Likert scales. Quality & Quantity, 41, 601-616.

Hart, M. C. (1996). Improving the discrimination of SERVQUAL by using magnitude scaling. In G. K. Kanji (Ed.), Total quality management in action. London: Chapman and Hall.

Kislenko, K. (2007a). Structuring students' beliefs in mathematics: A Norwegian case. In K. Hoskonen and M. S. Hannula (Eds.), Current state of research on mathematical beliefs XII. Proceedings of the MAVI-12 workshop, May 25-28, 2006. University of Helsinki, Department of Applied Sciences of Education, Research Report.

Kislenko, K. (2007b). A study of Norwegian students' beliefs in mathematics. In B. Jaworski, A. B. Fuglestad, R. Bjuland, T. Breiteig, and B. Grevholm (Eds.), Læringsfellesskap i matematikk - Learning communities in mathematics (pp. 215-227). Bergen: Caspar Forlag.

Kislenko, K., Grevholm, B., & Breiteig, T. (2005). Beliefs and attitudes in mathematics teaching and learning. In I. M. Stedøy (Ed.), Vurdering i matematikk - Hvorfor og

hvordan? Fra småskole til voksenopplæring. Nordisk konferanse i matematikkdidaktikk ved NTNU 15. og 16. november 2004 (pp. 129-138). Trondheim: Nasjonalt Senter for Matematikk i Opplæringen.

Kislenko, K., Grevholm, B., & Lepik, M. (2007). "Mathematics is important but boring": Students' beliefs and attitudes towards mathematics. In C. Bergsten, B. Grevholm, H. Strømskag Måsøval and F. Rønning (Eds.), Proceedings of NORMA05, Fourth Nordic Conference on Mathematics Education: Relating Practice and Research in Mathematics Education (pp. 349-360). Trondheim: Tapir Academic Press.

Leder, G. C., & Forgasz, H. J. (2002). Measuring mathematical beliefs and their impact on the learning of mathematics. In G. C. Leder, E. Pehkonen, and G. Törner (Eds.), Beliefs: A hidden variable in mathematics education? (pp. 95-114). Dordrecht: Kluwer Academic Publishers.

Likert, R. (1932). A technique for the measurement of attitudes. NY: Archives of Psychology.

Lodge, M. (1981). Magnitude scaling: Quantitative measurements of opinions. Sage University Paper Series on Quantitative Applications in the Social Sciences, 07-025. Beverly Hills, CA: Sage Publications.

Lord, F. M. (1953). On the statistical treatment of football numbers. American Psychologist, 8, 750-751.

Ma, X. (1997). Reciprocal relationships between attitude toward and achievement in mathematics. Journal of Educational Research, 90, 221-229.

Nurmi, A., Hannula, M. S., Maijala, H., & Pehkonen, E. (2003). On pupils' self-confidence in mathematics: Gender comparisons. In N. A. Pateman, B. J. Dougherty, and J. Zilliox (Eds.), Proceedings of the 27th conference of the International Group for the Psychology of Mathematics Education (pp. 453-460). Hawaii: University of Hawaii.

Pehkonen, E. (1994). On differences in pupils' conceptions about mathematics teaching. The Mathematics Educator, 5(1), 3-10.

Pehkonen, E., & Lepmann, L. (1994). Teachers' conceptions about mathematics teaching in comparison (Estonia - Finland). In M. Ahtee and E. Pehkonen (Eds.), Constructivist viewpoints for school teaching and learning in mathematics and science (pp. 105-110). Helsinki: University of Helsinki, Department of Teacher Education, Research Report 131.

Pehkonen, E., & Törner, G. (2004). Methodological considerations on investigating teachers' beliefs of mathematics teaching and learning. Nordic Studies in Mathematics Education, 9(1), 21-49.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

Streitlien, Å., Wiik, L., & Brekke, G. (2001). Tankar om matematikkfaget hos elever og lærarar. Oslo: Læringssenteret.

Uebersax, J. S. (2006). Likert scales: Dispelling the confusion. Statistical Methods for Rater Agreement website. Available at: http://ourworld.compuserve.com/homepages/jsuebersax/likert.htm. Accessed: January 10, 2008.

Vale, C. M., & Leder, G. C. (2004). Student views of computer-based mathematics in the middle years: Does the gender make a difference? Educational Studies in Mathematics, 56, 287-312.

Wu, C. H. (2007). An empirical study on the transformation of Likert-scale data to numerical scores. Applied Mathematical Sciences, 1(58), 2851-2862.
