
Statistics, Research, & SPSS: The Basics

SPSS (Statistical Package for the Social Sciences) is a software program that makes the calculation and presentation of statistics relatively easy. It is an incredibly expensive piece of software
(http://www.spss.com/stores/1/Software_HigherEducation_C91.cfm),

so please do not take access to it for granted; McDaniel College pays a large sum of money to make SPSS available to students. The biggest problem with SPSS is that it is too easy: it will tempt you to try statistical tests that are not appropriate for the data you have collected or for the Research Questions and Hypotheses you are proposing. While we will go over everything you need to know in class, there are many resources freely available online for mastering SPSS. YOU are responsible for mastering SPSS, and YOU need to practice, find alternative information sources, and fill in any gaps in your knowledge and skills regarding the use of SPSS and statistics. A simple Google search for SPSS tutorials will yield a host of useful resources. This information packet provides everything you need to know about SPSS in order to be successful in this course as well as in Senior Seminar, including entering data, outputting data to MS Word, creating variables, cleaning your data, performing descriptive statistics, performing inferential statistics, and developing composite scales.

Table of Contents
Statistics, Research, & SPSS: The Basics
Entering Data/Creating Variables
Cleaning Your Data
Descriptive Statistics
Inferential Statistics
    Independent Samples T-Test
    ANOVA
    Correlations
    Linear Regression
Creating Scales (Factor Analysis/Reliabilities)
Validity and Reliability
    Internal Validity
    External Validity
    Ecological Validity
    Population Validity
    Construct Validity
    Intentional Validity
    Content Validity
    Face Validity
    Observation Validity
    Criterion Validity
    Concurrent Validity
    Predictive Validity
    Convergent Validity
    Discriminant Validity
    Factors Jeopardizing Validity
    Reliability

Entering Data/Creating Variables


In SPSS, there are two views: DATA VIEW and VARIABLE VIEW. DATA VIEW is used for typing in data, and VARIABLE VIEW is used for creating variables. The key to typing in data is to type in responses across the page in DATA VIEW. So, when you are looking at an individual's responses to your survey, for example, you need to type those responses (or their proper code) moving across the page (the first row = the first respondent's answers, the second row = the second respondent's answers, and so on). In order to type in the data, the data has to be coded (put into numbers). For example, suppose you have a categorical variable called GENDER, and on your survey you have a question such as: 1. What is your gender? Male ___ Female ___. Then, if someone selects Male, you can code their response as a 0, and if someone selects Female, you can code their response as a 1. If you used a Likert Scale to represent a numerical variable such as WILLINGNESS TO COMMUNICATE (1 = Strongly Disagree, 2 = Disagree, 3 = No Opinion, 4 = Agree, and 5 = Strongly Agree), then coding is easy since numbers are already attached to responses. The challenge in this case is to make sure all items are coded in the same direction. Compare the following 2 statements: 1. I like to talk whenever I have the chance. 2. I don't like to talk even if I have the chance.

Item 1:   1   2   3   4   5
Item 2:   1   2   3   4   5

A 1 on item 1 does not equal a 1 on item 2. In fact, the responses for the 2 items are opposites. And so we would need to reverse code one of the items so that 1 = 5, 2 = 4, 3 = 3, 4 = 2, and 5 = 1. Generally, it is standard practice to put negative anchors on the left side of a scale and positive anchors on the right, whether the scale runs from 1 to 5 or from 1 to 7:

Negative (No / Disagree / Hate)   1   2   3   4   5   Positive (Yes / Agree / Love)
Negative (No / Disagree / Hate)   1   2   3   4   5   6   7   Positive (Yes / Agree / Love)
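SPSS can do this recoding through its menus, but the rule above (1 = 5, 2 = 4, 3 = 3, 4 = 2, 5 = 1) is just new = (high + low) − old. A minimal Python sketch, using made-up responses for the negatively worded item 2:

```python
# Reverse-code a negatively worded Likert item so it points in the
# same direction as the positively worded items.
# For a 5-point scale: new = (5 + 1) - old, so 1 -> 5, 2 -> 4, 3 -> 3, ...

def reverse_code(response, low=1, high=5):
    """Flip a Likert response around the scale midpoint."""
    return (high + low) - response

# Hypothetical responses to item 2 ("I don't like to talk..."), as entered:
item2 = [1, 2, 3, 4, 5]
item2_reversed = [reverse_code(r) for r in item2]
print(item2_reversed)  # [5, 4, 3, 2, 1]
```

The same helper works for a 7-point scale by passing high=7.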

In VARIABLE VIEW, one has the opportunity to do a variety of things. First, one can give variables names, but with a few restrictions: the variable name must start with a letter, cannot have blank spaces, and can only be 8 characters long. Another commonly used feature is VALUES. VALUES allows you to assign numeric values to words. For example, entering Value: 1 and Value Label: Freshman, then pressing Add, produces 1.00 = Freshman. You must press Add after each value label. Another key function is MISSING. MISSING allows you to assign a numeric value to missing data. Commonly, missing data is coded as 99. For example, we know we have values as follows:

1 = Strongly Disagree
2 = Disagree
3 = No Opinion
4 = Agree
5 = Strongly Agree

These are the only 5 options; however, there is a 6 as a response for one respondent. Either 1) the data entry was wrong or 2) the respondent made a mistake. Either way, the data is missing, so we can enter 6 as missing data using the MISSING feature. Finally, MEASURE allows us to specify what type of variable a variable is. A variable can be either 1) nominal (categorical), 2) ordinal, 3) interval, or 4) ratio. Some statistics can only be performed on categorical variables; some can only be performed on ratio variables. So, using MEASURE, we can specify what type of variable the variable is.

Nominal Variables: are categories and not numbers. For example, GENDER is a nominal variable consisting of 2 categories, MALE and FEMALE. (mode)

Ordinal Variables: the numbers assigned to objects are in a rank order: first, second, third... An example of an ordinal level variable would be SOCIAL CLASS. We assume there is some difference between HIGH CLASS and UPPER MIDDLE CLASS, with HIGH CLASS being more than UPPER MIDDLE CLASS, but the difference is not exact, and the difference between HIGH CLASS and UPPER MIDDLE CLASS and the difference between UPPER MIDDLE CLASS and MIDDLE CLASS may not be the same amount even though they should be. (median)

Interval Variables: have equal intervals between values. For example, temperature is measured using an interval scale. The difference between 1 and 2 degrees is the same as the difference between 4 and 5 degrees. Likert Scales and Semantic Differential Scales are interval level measures (though they are treated as ratio level). (mean)

Ratio Variables: have all the features of the other variables plus a true zero. While there will never be a time when there is no temperature, it is possible (unfortunately) to have no money. Income is thus a ratio level variable. (variance)
Level            are names   have an inherent order   have equal intervals   have a theoretical
                             (more to less)           between them           zero point
Nominal level        X
Ordinal level        X              X
Interval level       X              X                        X
Ratio level          X              X                        X                      X
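The MISSING idea above can be sketched outside SPSS as well. In this hypothetical Python example, any response outside the expected 1-5 range (like the stray 6) is flagged as missing and excluded from calculations, which is exactly what SPSS does with values declared as missing:

```python
# Treat any response outside the expected 1-5 range as missing (None),
# mirroring SPSS's MISSING feature (where a 99 or a stray 6 is flagged).

responses = [4, 2, 6, 3, 5]  # the 6 is a data entry or respondent error

valid_range = range(1, 6)  # 1 through 5
cleaned = [r if r in valid_range else None for r in responses]

# Missing data is simply left out of statistical calculations:
usable = [r for r in cleaned if r is not None]
mean = sum(usable) / len(usable)
print(cleaned)  # [4, 2, None, 3, 5]
print(mean)     # (4 + 2 + 3 + 5) / 4 = 3.5
```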

Video Tutorials (sound quality varies):
1. Typing in data: http://distdell4.ad.stat.tamu.edu/spss_1/TypeDataSPSS.html
2. Outputting charts and graphs to MS Word: http://distdell4.ad.stat.tamu.edu/spss_1/OutputWindowSPSS.html

Cleaning Your Data


Cleaning data is a rather simple but necessary step. Inevitably, there will be data entry errors as well as respondent errors when surveys are filled out or data values are typed in. Cleaning your data will help lessen the negative consequences of these types of errors. For example, suppose there is a Likert Scale type measure for SELF-DISCLOSURE, and on one of the items a mistake was made: 33 was entered as the value rather than 3. Let's say that there are 19 responses ranging from 2 to 4 with an average of 3. So, 3 times 19 = 57. The 20th response was mistyped as 33. 57 + 33 = 90, and 90/20 = 4.5. Due to this one error, the mean for SELF-DISCLOSURE changed from 3 (basically an average amount of self-disclosure) to 4.5 (a high amount of self-disclosure). So, while in reality people do not have high amounts of self-disclosure, because of the error it seems like they do. This error causes all sorts of problems, and claims will be made that are not based on the true data. YIKES!!! In order to ensure that such things do not happen, we clean the data by checking to make sure that all values fall within the expected range. We know what the expected values are from having created the variables. We can check that there have been no data entry errors by using DESCRIPTIVES. First, go to the ANALYZE menu. Then, scroll down to DESCRIPTIVE STATISTICS. Next, choose DESCRIPTIVES. For each item, we calculate the minimum, maximum, and mean scores. Then, from the output, we can check whether the listed values fall within the expected range of values. We can change unexpected values to MISSING DATA; missing data is not used in statistical calculations. Another way of cleaning one's data is to remove outliers. This process is described at the end of the next section.
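The arithmetic of the SELF-DISCLOSURE example is easy to reproduce. This Python sketch simplifies the 19 good responses to all 3s (the text describes a 2-4 spread averaging 3, which gives the same sums) and shows how one typo drags the mean from 3.0 to 4.5, plus the range check that catches it:

```python
# One mistyped value (33 instead of 3) badly distorts the mean.

correct   = [3] * 19 + [3]   # 20 responses averaging 3
with_typo = [3] * 19 + [33]  # the 20th response mistyped as 33

print(sum(correct) / len(correct))      # 3.0
print(sum(with_typo) / len(with_typo))  # 4.5

# Range check: flag anything outside the expected 1-5 range,
# then set it to missing before any statistics are run.
out_of_range = [v for v in with_typo if not 1 <= v <= 5]
print(out_of_range)  # [33]
```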

Descriptive Statistics
Every research study should include demographic data to provide information about the sample. One of the first crucial decisions made in social science research is who the study references. You must provide some evidence that the sample you have is representative of the population you are targeting. This is done using descriptive statistics (which can include items such as gender, age, socioeconomic status, ethnicity, employment status, income, religion, or some other category or identifier). At a minimum, every study includes descriptive statistics about age, gender, and ethnicity in the Sampling Method part of the Methods section of the research paper. Descriptive statistics come in two types: 1) proportions (percentages) and 2) means. Proportions basically show how many people fit into a category. For example, in my study of undergraduate student cognitive learning, the study had an n of 333 with 55% females (n = 183), 44% males (n = 146), and 1% (n = 4) who declined to respond (n = the number of people in a sample; N = the number of people in a population). If the population I was targeting for this study was undergraduate students, and we know that in this population women account for 10% and males for 89% of the total, then we should show caution when interpreting the results of the study since the sample isn't representative of the population. In other words, what is normal in our sample group may not be the same as what is normal in that population since the proportions are so dramatically different. The other type of descriptive statistic is based on group means (what is average, normal, or typical for that group). In fact, statistics are based on the idea of a normal curve. The normal curve is the idea that people's attitudes, opinions, feelings, beliefs, and behaviors tend to center around a central point (the mean). For example, most American undergraduates believe that statistics are difficult.
The central point would then be 3 on a 5-point scale; 3 in this case means agreement with the average opinion or view of statistics (statistics are hard). Yet, there will be some variation. Some people will not think statistics are so hard, and others will think statistics are extremely hard. We know that 68% of people will be within 1 standard deviation of the central point, that 95% of people will be within 2 standard deviations of the mean, and that 99.7% will be within 3 standard deviations of the mean. People who are beyond three standard deviations from the mean are considered outliers and either become the focus of the study (if one is exploring the variations in

human behavior) or their data are tossed out of the study. There are several reasons why people could be outliers. One reason is that they are genuinely different from other people. Another reason is that they responded systematically (e.g., always answered 1) without really thinking about what they were doing.

Normal Curve

1 standard deviation = 68%
2 standard deviations = 95%
3 standard deviations = 99.7%
Beyond 3 standard deviations = outliers (can be discarded from the data depending on what population you are looking at)

Measures of Central Tendency
1. Mean: the average
2. Median: the middle value
3. Mode: the most common value

Measures of Dispersion
1. Range: the distance between the high and low scores
2. Standard Deviation: the average distance from the mean
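SPSS reports all of these under Analyze > Descriptive Statistics, but Python's standard library computes the same quantities; the scores below are hypothetical Likert responses for one item:

```python
import statistics

# Hypothetical 5-point Likert responses for one item.
scores = [2, 3, 3, 3, 4, 4, 5]

print(round(statistics.mean(scores), 2))   # mean: 3.43
print(statistics.median(scores))           # median (middle value): 3
print(statistics.mode(scores))             # mode (most common): 3
print(max(scores) - min(scores))           # range (high minus low): 3
print(round(statistics.stdev(scores), 2))  # standard deviation: 0.98
```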

Skewness and Kurtosis

I. Skewness: the shape of the distribution
1. Positive skew: The right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed.
2. Negative skew: The left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed.

II. Kurtosis: higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations.
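Both quantities are standardized moments of the data. A Python sketch using the basic population formulas (SPSS applies a small-sample correction, so its numbers will differ slightly; the data here are made up):

```python
# Skewness: third standardized moment. Positive -> right-skewed.
# Excess kurtosis: fourth standardized moment minus 3, so a normal
# distribution scores 0; higher values mean heavier tails.

def skewness(data):
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # variance
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(data):
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3

right_skewed = [1, 1, 2, 2, 3, 9]  # long right tail
print(skewness(right_skewed) > 0)  # True: positive skew
```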

* In SPSS, look at Frequencies under Analyze. Select the mean, standard deviation, skewness, and kurtosis. You can also generate a histogram with the normal curve superimposed on it.

Inferential Statistics
Independent Samples T-Test
The Independent Samples T-Test is used when one is comparing the mean differences of the two groups (i.e. GENDER = MALE and FEMALE) of a categorical variable in terms of one numerical variable (i.e. SELF-DISCLOSURE). In the case of gender and self-disclosure, the hypothesis would be: H1: Female undergraduate students are more willing to disclose personal information during class than male undergraduate students. A self-disclosure scale is created, data is collected, and then the data is entered into SPSS. For the GENDER variable, MALE is usually coded as 0, and FEMALE is coded as 1. The reasoning for this is: 0 = not X [not having the attribute of X], and 1 = +X [having the attribute of X]. So, 0 = not FEMALE, and 1 = +FEMALE. Before running the Independent Samples T-Test, the data is cleaned and the scale for SELF-DISCLOSURE is created and tested (see Factor Analysis/Reliabilities below). When these steps have been completed, the Independent Samples T-Test is run.
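SPSS reports the t statistic and its p-value directly; the statistic itself is just the mean difference divided by its standard error. A pure-Python sketch of the equal-variances version, with made-up self-disclosure composites for the gender example (SPSS would then look the t value up against the t distribution for a p-value):

```python
import math

def independent_t(group_a, group_b):
    """Student's t for two independent groups (equal variances assumed)."""
    n1, n2 = len(group_a), len(group_b)
    m1 = sum(group_a) / n1
    m2 = sum(group_b) / n2
    # Pooled variance from the two groups' sums of squared deviations.
    ss1 = sum((x - m1) ** 2 for x in group_a)
    ss2 = sum((x - m2) ** 2 for x in group_b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se

# Hypothetical self-disclosure composite scores (1-5 scale):
males = [2, 3, 3, 2, 3]    # GENDER = 0
females = [4, 4, 5, 3, 4]  # GENDER = 1
t = independent_t(females, males)
print(round(t, 2))  # 3.5 -- positive, so females score higher on average
```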

ANOVA
Analysis of Variance (ANOVA) is used when one is comparing the mean differences of two or more groups (i.e. ACADEMIC STATUS = FRESHMAN, SOPHOMORE, JUNIOR, and SENIOR) of a categorical variable in terms of one numerical variable (i.e. CRITICAL THINKING). In the case of academic status and critical thinking, the hypothesis would be: H1: As ACADEMIC STATUS increases, CRITICAL THINKING increases. Basically, this means that on average seniors as a group should have significantly higher scores in critical thinking than juniors, sophomores, and freshmen. This also means that on average juniors have significantly higher scores in critical thinking than sophomores and freshmen, and that sophomores on average have significantly

higher scores in critical thinking than freshmen. The key here is that group differences are being compared, not individual differences. So, the logical conclusion that Cindy is a senior, and seniors have higher critical thinking skills, and thus Cindy has higher critical thinking skills cannot be made. The real value of this study comes if seniors do not significantly differ from the other three groups in terms of critical thinking (assuming that critical thinking is a goal of undergraduate education). If there is no significant difference, then there is a problem with the curriculum. We wouldn't have known that the curriculum was flawed if we hadn't conducted this research. If there are significant differences, then we can assume that our curriculum is doing a good job in terms of increasing student critical thinking skills over the course of their academic career. The procedures for conducting ANOVA are similar to an Independent Samples T-Test. Measures are created for ACADEMIC STATUS and for CRITICAL THINKING. The measures are given to the sample groups (data is collected), and the data is entered and cleaned in SPSS. Next, Factor Analysis and a Reliability score are calculated for CRITICAL THINKING. CRITICAL THINKING is transformed into a composite measure, and then the ANOVA is calculated on the ACADEMIC STATUS variable (i.e. SENIOR = 4, JUNIOR = 3, SOPHOMORE = 2, and FRESHMAN = 1) and the CRITICAL THINKING composite.
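Under the hood, ANOVA compares variance between the group means to variance within the groups; a large F ratio indicates the group means differ more than chance would predict. A pure-Python sketch with hypothetical critical-thinking composites (SPSS would also report the p-value for this F):

```python
def one_way_f(*groups):
    """One-way ANOVA F statistic: between-group / within-group variance."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k = len(groups)            # number of groups
    n = len(all_scores)        # total observations
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical critical-thinking composites by academic status:
freshmen   = [2, 3, 2]
sophomores = [3, 3, 4]
juniors    = [4, 4, 3]
seniors    = [4, 5, 5]
f = one_way_f(freshmen, sophomores, juniors, seniors)
print(round(f, 2))  # 8.33 -- group means differ far more than within-group noise
```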

Correlations
One of the conditions for establishing cause and effect relationships between variables is that the two variables correlate. Correlation statistics show that there is (or isn't) a significant relationship between two variables. If the correlation is too high, then it is likely that the two variables are actually measuring the same thing. Different correlation statistics are used depending on the types of variables:

Pearson Product Moment Correlation: interval + interval
Spearman Rank Order Correlation (rho): ordinal + ordinal
Kendall Rank Order Correlation (tau): ordinal + ordinal

In SPSS, go to Analyze, then Correlate, and then Bivariate, and enter the two variables of interest. Hypotheses or RQs that require correlation statistics for analysis include:

RQ1: Is there a positive relationship between self-esteem and affinity seeking? (As self-esteem increases, does affinity seeking also increase = positive relationship. A

negative relationship would be: As self-esteem increases, affinity seeking decreases. In exploratory research, one would merely need to ask whether or not there is a relationship between self-esteem and affinity seeking.)

H1: As affinity seeking increases, self-esteem decreases. (The opposite statement means the same thing: As self-esteem increases, affinity seeking decreases.)

RQ2: Is there a significant relationship between message clarity and message relevance?
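The Pearson coefficient SPSS reports for two interval variables is the covariance scaled by the two variables' spreads, so it always falls between -1 and +1. A pure-Python sketch with hypothetical self-esteem and affinity-seeking composites:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two interval variables."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical self-esteem and affinity-seeking composites:
self_esteem = [1, 2, 3, 4, 5]
affinity    = [2, 3, 3, 5, 5]
r = pearson_r(self_esteem, affinity)
print(round(r, 2))  # 0.94 -- a strong positive relationship
```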

Linear Regression
Linear regression is used when you want to know how well variables predict other variables (account for the variance). In order to make claims about cause and effect relationships, a regression statistic must be calculated. While not a terribly complicated procedure in SPSS, it would help to look at several tutorials. Try the following:

http://academic.udayton.edu/gregelvers/psy216/SPSS/reg.htm (simple text)
http://calcnet.mth.cmich.edu/org/spss/StaProcRegress.htm (with video)

Examples of hypotheses requiring linear regression would be (it would be odd to have an RQ requiring linear regression, since prediction implies that we already have a certain amount of knowledge about the relationship between the variables of interest):

H1: Message clarity is the most significant predictor of cognitive learning.
H2: Message clarity accounts for a more significant amount of the variance in student cognitive learning outcomes than message relevance, motivation, or self-esteem.
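Simple linear regression fits the least-squares line y = intercept + slope * x, so the predictor can be used to estimate the outcome. A pure-Python sketch with hypothetical message-clarity and cognitive-learning composites (SPSS additionally reports significance tests and R-squared for the fit):

```python
def simple_regression(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical message-clarity and cognitive-learning composites:
clarity  = [1, 2, 3, 4, 5]
learning = [2, 2, 3, 4, 4]
slope, intercept = simple_regression(clarity, learning)
print(round(slope, 2), round(intercept, 2))  # 0.6 1.2

# Predicted learning for a clarity score of 3 (the line passes
# through the point of means, x-bar = 3 and y-bar = 3):
print(round(intercept + slope * 3, 2))  # 3.0
```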

Creating Scales (Factor Analysis/Reliabilities)


In our struggle to impose human order on reality, we need to have the ability to create measures of abstract concepts. Basically, we provide conceptual definitions of variables in the rationale and substantiate those conceptual definitions during the literature review. Then, we have to come up with ways to observe the presence (or absence) of those conceptual variables in the real world. In other words, we need to transform the variables of interest into real world indicators. For example, we believe that human behavior is driven by economic realities. Indeed, in most situations people conduct a cost/benefit analysis in their heads, the result of which is used for decision making and planning. Some people do this more or less than others. So, we have created a variable here: Need for Cost/Benefit Analysis. Cool! Yet, how then do we measure this variable, especially in a way that is likely to capture variations in the intensity of people's need for it? The down and dirty way is to ask people about it. However, most people may not have considered their own behavior in this way. In fact, talking about behavior in this way may influence people's behavior. Complicated, huh? So, we think about events, actions, situations, and so on that would represent more or less of this need. This requires further thought. What exactly do we mean by cost? What about benefit? Cost could include money, effort, time, vulnerability... Benefit could include a better position, more opportunities, improvement, advantage, a better competitive edge, satisfaction, need fulfillment, financial gain... Perhaps the easiest way to distinguish between cost and benefit is: cost = negative, and benefit = positive. OK. So now we have a better understanding of cost/benefit, but we are really interested in a variable called need for cost/benefit analysis.
Analysis implies thinking, planning, consideration, weighing consequences and gains, minimizing risks and maximizing gains... Need implies that this is an innate, hardwired facet of human behavior. So, what would be real world indicators of this need for premeditated decision making? How about creating some statements and asking people to agree or disagree with them? For example: 1. I think carefully about consequences before making decisions. 2. I try to minimize risks by thinking ahead.

So, now we have two items (enough to create a composite measure). Are these two items enough to truly capture the construct and the variations in people's need for cost/benefit analysis? Probably not. We need to come up with many more possible statements (actually, initial scale development usually means trying to be exhaustive [though in practice this is generally not possible] in the number of items). The rule of thumb is to create three times as many items as you ultimately want in the composite measure. So, if you want a 10-item scale, then you need to initially create 30 items. During the Factor Analysis process, the 30 items will be whittled down to the top 10. Constructs can be quite complex, consisting of multiple dimensions (factors). A unidimensional construct has one factor and no alternate interpretation or meaning. For example, Puppy Love is a unidimensional construct: everyone agrees on its definition. Love, though, is a multidimensional construct. Love includes Sex, Platonic Love, Love between Family Members, Love between Friends, and so on. So, if we were creating a measure for Love, we would have one huge composite measure for Love divided into a zillion little subscales for all the different dimensions of Love. We use factor analysis to choose the best items to represent a construct, and to see if the construct has one dimension or multiple dimensions (factors). And, we calculate Cronbach's alpha reliabilities to test the consistency with which our participants respond to our measures. (Reliability = consistency; validity = accuracy.) Unfortunately, we can never be completely sure that our measures are measuring what we think we are measuring.
Rather, we must provide evidence that 1) the construct exists (by defining it and making it as concrete as possible), 2) the construct can be observed, 3) observation is consistent across people, time, and place, 4) the construct is theoretically supported, 5) the construct makes sense, and 6) the construct is supported by the data. The following section contains much more information about reliability and validity in research.
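SPSS computes Cronbach's alpha under Analyze > Scale > Reliability Analysis, but the formula itself only needs the item variances and the variance of the total score: alpha = k/(k-1) * (1 - sum of item variances / variance of totals). A pure-Python sketch with made-up responses to a 3-item scale:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one list per item)."""
    k = len(items)        # number of items in the scale
    n = len(items[0])     # number of respondents

    def variance(xs):     # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Each respondent's total score across all items:
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical responses of 5 people to 3 scale items (1-5 Likert):
item1 = [4, 3, 5, 2, 4]
item2 = [4, 2, 5, 3, 4]
item3 = [5, 3, 4, 2, 4]
alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 2))  # 0.9 -- values of about .70 or above are conventionally acceptable
```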

Validity and Reliability


Validity has two distinct fields of application. The first involves test validity, the degree to which a test measures what it was designed to measure. The second involves research design. Here the term refers to the degree to which a study supports the intended conclusion drawn from the results. In the Campbellian tradition, this latter sense divides into four aspects: support for the conclusion that the causal variable caused the effect variable in the specific study (internal validity), support that the same effect generalizes to the population from which the sample was drawn (statistical conclusion validity), support for the intended interpretation of the variables (construct validity), and support for the generalization of the results beyond the studied population (external validity).

Internal Validity
Internal validity is an inductive estimate of the degree to which conclusions about causal relationships are likely to be true, in view of the measures used, the research setting, and the whole research design. Good experimental techniques, in which the effect of an independent variable on a dependent variable is studied under highly controlled conditions, usually allow for higher degrees of internal validity than, for example, single-case designs.

External Validity
The issue of External validity concerns the question to what extent one may safely generalize the (internally valid) causal inference (a) from the sample studied to the defined target population and (b) to other populations (i.e. across time and space).

Ecological Validity
This issue is closely related to external validity and covers the question of the degree to which your experimental findings mirror what you can observe in the real world (ecology = the science of interaction between an organism and its environment). Ecological validity is whether the results can be applied to real life situations. Typically in science, you have two domains of research: passive-observational and active-experimental. The purpose of experimental designs is to test causality, so that you can infer A causes B or B causes A. But sometimes, ethical and/or methodological

restrictions prevent you from conducting an experiment (e.g. how does isolation influence a child's cognitive functioning?) Then you can still do research, but it's not causal, it's correlational, A occurs together with B. Both techniques have their strengths and weaknesses. To get an experimental design you have to control for all interfering variables. That's why you conduct your experiment in a laboratory setting. While gaining internal validity (excluding interfering variables by keeping them constant) you lose ecological validity because you establish an artificial lab setting. On the other hand with observational research you can't control for interfering variables (low internal validity) but you can measure in the natural (ecological) environment, thus at the place where behavior occurs.

POPULATION VALIDITY

Construct Validity


Construct validity refers to the totality of evidence about whether a particular operationalization of a construct adequately represents what is intended by the theoretical account of the construct being measured. (Demonstrate an element is valid by relating it to another element that is supposedly valid.) There are two approaches to construct validity, sometimes referred to as 'convergent validity' and 'divergent validity'.

Intentional Validity
Do the constructs we chose adequately represent what we intend to study?

Content Validity
This is a non-statistical type of validity that involves the systematic examination of the test content to determine whether it covers a representative sample of the behaviour domain to be measured (Anastasi & Urbina, 1997, p. 114). A test has content validity built into it by careful selection of which items to include (Anastasi & Urbina, 1997). Items are chosen so that they comply with the test specification, which is drawn up through a thorough examination of the subject domain. Foxcroft et al. (2004, p. 49) note that the content validity of a test can be improved by using a panel of experts to review the test specifications and the selection of items. The experts will be able to review the items and comment on whether the items cover a representative sample of the behaviour domain.

Face Validity
Face validity is very closely related to content validity. While content validity depends on a theoretical basis for assuming whether a test assesses all domains of a certain criterion (e.g., does assessing addition skills yield a good measure of mathematical skills? To answer this, you have to know what different kinds of arithmetic skills mathematical skills include), face validity relates to whether a test appears to be a good measure or not. This judgment is made on the "face" of the test; thus, it can also be made by an amateur.

OBSERVATION VALIDITY

Criterion Validity


Criterion-related validity reflects the success of measures used for prediction or estimation. There are two types of criterion-related validity: concurrent and predictive validity. A good example of criterion-related validity is the validation of employee selection tests; in this case, scores on a test or battery of tests are correlated with employee performance scores.

Concurrent Validity
Concurrent validity refers to the degree to which the operationalization correlates with other measures of the same construct that are measured at the same time. Going back to the selection test example, this would mean that the tests are administered to current employees and then correlated with their scores on performance reviews.

Predictive Validity
Predictive validity refers to the degree to which the operationalization can predict (or correlate with) other measures of the same construct that are measured at some time in the future. Again, with the selection test example, this would mean that the tests are administered to applicants, all applicants are hired, their performance is reviewed at a later time, and then their scores on the two measures are correlated.

Convergent Validity
Convergent validity refers to the degree to which a measure is correlated with other measures that it is theoretically predicted to correlate with.

Discriminant Validity
Discriminant validity describes the degree to which the operationalization does not correlate with other operationalizations that it theoretically should not correlate with.

FACTORS JEOPARDIZING VALIDITY


Campbell and Stanley (1963) define internal validity as the basic requirement for an experiment to be interpretable: did the experiment make a difference in this instance? External validity addresses the question of generalizability: to whom can we generalize this experiment's findings?

Internal Validity: the eight extraneous variables that can interfere with internal validity are:
1. History: the specific events occurring between the first and second measurements in addition to the experimental variables.
2. Maturation: processes within the participants as a function of the passage of time (not specific to particular events), e.g., growing older, hungrier, more tired, and so on.
3. Testing: the effects of taking a test upon the scores of a second testing.
4. Instrumentation: changes in the calibration of a measurement tool, or changes in the observers or scorers, may produce changes in the obtained measurements.
5. Statistical regression: operating where groups have been selected on the basis of their extreme scores.
6. Selection: biases resulting from differential selection of respondents for the comparison groups.
7. Experimental mortality: the differential loss of respondents from the comparison groups.
8. Selection-maturation interaction: e.g., in multiple-group quasi-experimental designs.

External Validity: the four factors jeopardizing external validity, or representativeness, are:
9. Reactive or interaction effect of testing: a pretest might increase the scores on a posttest.
10. Interaction effects of selection biases and the experimental variable.

11. Reactive effects of experimental arrangements: these would preclude generalization about the effect of the experimental variable to persons exposed to it in non-experimental settings.
12. Multiple-treatment interference: where the effects of earlier treatments are not erasable.

Reliability
In statistics, reliability is the consistency of a set of measurements or of a measuring instrument, often used to describe a test. This can mean either that the same instrument gives (or is likely to give) the same measurement on repeated administrations (test-retest reliability), or, in the case of more subjective instruments such as personality or trait inventories, that two independent assessors give similar scores (inter-rater reliability). Reliability is inversely related to random error. Reliability does not imply validity. That is, a reliable measure is measuring something consistently, but not necessarily what it is supposed to be measuring. For example, while there are many reliable tests of specific abilities, not all of them would be valid for predicting, say, job performance. In terms of accuracy and precision, reliability is precision, while validity is accuracy. In experimental sciences, reliability is the extent to which the measurements of a test remain consistent over repeated tests of the same subject under identical conditions. An experiment is reliable if it yields consistent results of the same measure; it is unreliable if repeated measurements give different results. Reliability can also be interpreted as the lack of random error in measurement. Check the section on Cronbach's alpha in the text for information about how to calculate reliabilities for composite measures.
