
CHAPTER 9: CORRELATION

Research Problem:

What is the relationship between two variables?

Relationship between hours studying (X) and grades on a midterm (Y)?

Relationship between self-esteem (X) and depression (Y)?

Correlation = Direction and strength of relationship between two variables

Chapter 9: Page 1
The Scatterplot:

Requires two scores from each person: X, Y

What is the relationship between hours studying (X) and scores on a quiz (Y)?

STUDENT   HOURS (X)   SCORE (Y)
A         1           1
B         1           3
C         3           2
D         4           5
E         6           4
F         7           5

[Figure: scatterplot of Score (Y axis) against Hours Studying (X axis)]

Characteristics of a Relationship:

Direction

Positive: As X goes up, Y goes up; the variables “move” in the same direction

Negative: As X goes up, Y goes down; the variables “move” in opposite directions

[Figure: two scatterplots. Positive: Income plotted against Education. Negative: Rainfall plotted against Hours Spent Outdoors.]
Form of the Relationship

(a) Linear   (b) Non-linear (“curvilinear”)

[Figure: (a) Weight plotted against Height; (b) Performance plotted against Arousal.]

Degree/Strength of Relationship

How well the data fit a specific form

Typically, we look at how well the data fit a straight line

Pearson Correlation Coefficient:

Symbol: r

r can range from -1.0 to +1.0

Sign (+/-) indicates “direction”

Value indicates “strength”

Measures a “linear” relationship only

(a) Direction of the relationship between X and Y

Positive (+r) = As X goes up, Y goes up

Negative (-r) = As X goes up, Y goes down

(b) Strength of a relationship between X, Y

Closer to ±1.0, stronger

Closer to 0, weaker

When r = 0, the X, Y relationship is not described by a straight line

Pearson Correlation Coefficient scale:

-1.0 = Perfect Negative Relationship
0 = No Linear Relationship
+1.0 = Perfect Positive Relationship

• Closer to 0 = weaker

• Closer to ±1.0 = stronger

• r close to ±1.0 is very rare in social research

• An r of about .30 or larger (in absolute value) is considered important

• r near 0 could mean many things:
  -- No relationship at all between X & Y
  -- A non-linear relationship between X & Y
  -- Restricted range on X and/or Y
  -- An outlier may be causing problems

What does r represent?

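$$ r = \frac{\text{degree to which } X \text{ and } Y \text{ vary together}}{\text{degree to which } X \text{ and } Y \text{ vary separately}} $$

$$ r = \frac{\text{covariance of } X \text{ and } Y}{\sqrt{(\text{variance of } X)(\text{variance of } Y)}} $$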

Computational Formula:

$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} $$
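As a check that the definitional and computational formulas describe the same quantity, here is a minimal Python sketch (an illustration, not part of the original notes) that computes r both ways for the six students (A to F) in the scatterplot table. The function names `pearson_computational` and `pearson_definitional` are hypothetical. The definitional version divides by n in the covariance and in both variances; because that divisor cancels, using n - 1 instead would give the same r.

```python
import math

def pearson_computational(xs, ys):
    """Pearson r via the computational (raw-sum) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def pearson_definitional(xs, ys):
    """Pearson r as covariance / sqrt(variance of X * variance of Y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

# Students A to F from the scatterplot table.
hours = [1, 1, 3, 4, 6, 7]
scores = [1, 3, 2, 5, 4, 5]

print(pearson_computational(hours, scores))
print(pearson_definitional(hours, scores))
```

Both calls return the same value (about 0.77 for these six students), up to floating-point rounding.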

Factors that affect r:

1. Restriction of range: “the range over which X or Y varies is artificially limited”

Usually reduces the magnitude of r (see figure 9.7)

Can sometimes increase the magnitude of r

--Typically when the restriction eliminates a curvilinear relationship
--r between height and age would be relatively small if ages ranged from 0 to 80
--r between height and age would be large and positive if ages ranged from 4 to 17
--in this case, the restricted range of age would result in a large r, whereas the non-restricted range would result in a small r
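The case where restricting the range *increases* r can be sketched numerically. The snippet below is an illustration only: it uses a hypothetical inverted-U curve, Y = -(X - 40)², as a stand-in for a curvilinear relationship (it is not real height or age data), and a hypothetical helper `pearson` that applies the chapter's computational formula. Over the full symmetric range the rise and fall cancel, so r is 0; restricted to the rising part, the same curve is nearly linear and r is close to +1.

```python
import math

def pearson(xs, ys):
    # Computational formula for Pearson r.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def hypothetical_y(x):
    # Hypothetical curvilinear relationship (NOT real data): rises, peaks at x = 40, falls.
    return -(x - 40) ** 2

full_x = list(range(0, 81))        # full range: X from 0 to 80
restricted_x = list(range(0, 31))  # restricted range: X from 0 to 30 (rising part only)

r_full = pearson(full_x, [hypothetical_y(x) for x in full_x])
r_restricted = pearson(restricted_x, [hypothetical_y(x) for x in restricted_x])

# r_full is 0.0 (the curve is symmetric); r_restricted is close to +1.
print(round(r_full, 3), round(r_restricted, 3))
```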

Factors that affect r (cont):

2. Nonlinearity: degree to which a relationship follows a non-linear trend

r captures a linear relationship between two variables

If the true relationship between the variables is non-linear, r can be severely reduced
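A minimal Python sketch of this point, using made-up values for illustration: below, Y is completely determined by X (Y = X²), yet because the pattern is U-shaped rather than straight, Pearson r is exactly 0. A perfectly linear relationship over the same X values gives r = 1. The helper `pearson` is a hypothetical name applying the chapter's computational formula.

```python
import math

def pearson(xs, ys):
    # Computational formula for Pearson r.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

xs = list(range(-5, 6))               # X from -5 to 5 (hypothetical values)
u_shaped_ys = [x ** 2 for x in xs]    # Y fully determined by X, but U-shaped
linear_ys = [2 * x + 1 for x in xs]   # a perfectly linear relationship, for comparison

print(pearson(xs, u_shaped_ys))  # 0.0: no *linear* relationship
print(pearson(xs, linear_ys))    # 1.0: perfect positive linear relationship
```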

3. Outliers: In the case of correlation, an unusual or extreme combination of X and Y values

Outlying values can suppress an otherwise strong correlation OR “create” a correlation that is not representative of most of the data points
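To see how a single outlier can “create” a correlation, here is a minimal Python sketch with hypothetical data (four made-up points with no linear relationship, plus one extreme X, Y pair). The helper name `pearson` is hypothetical and applies the chapter's computational formula.

```python
import math

def pearson(xs, ys):
    # Computational formula for Pearson r.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical data: four points with no linear relationship.
xs = [1, 2, 3, 4]
ys = [1, 2, 2, 1]
r_without = pearson(xs, ys)

# Add one extreme (X, Y) combination.
r_with = pearson(xs + [10], ys + [10])

print(r_without)         # 0.0
print(round(r_with, 2))  # 0.94
```

One point out of five moves r from 0 to about 0.94, which is why plotting the data before interpreting r is worthwhile.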

Hypothesis Testing for r:
Is r significantly different from zero?

$H_0: \rho = 0$
$H_1: \rho \neq 0$

$\rho$ = “rho”, the population parameter

r = the sample statistic

• Almost always two-tailed (non-directional)

• Can be one-tailed (directional):

$H_0: \rho \le 0$, $H_1: \rho > 0$      or      $H_0: \rho \ge 0$, $H_1: \rho < 0$

• Compute the observed r and compare its absolute value to a critical value in Table E.2, which depends on:

(a) the alpha level (α)
(b) degrees of freedom: df = n - 2

• Reject H0 if the absolute value of the observed r equals or exceeds the critical r
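Critical values of r are tied to the t distribution: the test statistic for Pearson r is t = r√df / √(1 - r²) with df = n - 2, and solving for r gives r_crit = t_crit / √(t_crit² + df). The minimal Python sketch below uses that relation plus the decision rule above. It is an alternative route for checking a table value, not a replacement for Table E.2; the function names are hypothetical, and the value 2.447 is assumed to be the two-tailed critical t for α = .05 and df = 6, taken from a standard t table.

```python
import math

def critical_r_from_t(t_crit, df):
    """Convert a critical t value into the equivalent critical r (df = n - 2)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def reject_h0(r_observed, r_critical):
    """Decision rule: reject H0 if |observed r| equals or exceeds the critical r."""
    return abs(r_observed) >= r_critical

# Assumption: 2.447 is the two-tailed critical t for alpha = .05 with df = 6.
r_crit = critical_r_from_t(2.447, df=6)

print(round(r_crit, 3))          # 0.707
print(reject_h0(0.878, r_crit))  # True
```

Because the rule compares absolute values, a negative r of the same size (for example, -0.878) would also lead to rejecting H0 in a two-tailed test.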

Correlation vs. Causality:

• Correlation tells you two variables are related

• Does NOT tell you why!!

• Do not draw causal inferences from a correlation

X → Y

Y → X

Third variable problem: Z → X and Z → Y

Examples:

r = -0.30: number of friends and depression

Does being depressed cause you to not have friends? Or does not having friends cause you to be depressed?

r = +0.40: hours studying and grades

Do people who get good grades study more? Or does studying more lead to good grades?

r = +0.25: ice-cream sales and heart attacks

Do heart attacks cause more people to buy ice cream? Do ice-cream sales cause people to have heart attacks? Or does a third variable explain both?

• Causal inferences require an “experiment”

Other Correlation Coefficients:

Pearson r is used when X & Y are measured at least at the interval level

Many types of correlation coefficients exist for other data:

Spearman: ordinal (rank) data
Point-biserial: dichotomous nominal X; interval/ratio Y
Phi: dichotomous nominal X & Y
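Two of these coefficients can be computed with the Pearson formula itself: Spearman's coefficient is Pearson r applied to the ranks of X and Y (tied values get the average of the ranks they span), and point-biserial r is Pearson r with the dichotomous X coded 0/1. The minimal Python sketch below shows both; the helper names (`pearson`, `average_ranks`, `spearman`) and all data values are hypothetical illustrations.

```python
import math

def pearson(xs, ys):
    # Computational formula for Pearson r.
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    # Spearman's coefficient: Pearson r computed on the ranks.
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical data: Y always increases with X, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]
print(spearman(xs, ys))          # 1.0: perfectly consistent rank order
print(round(pearson(xs, ys), 3)) # 0.981: strong, but not a perfect straight line

# Hypothetical point-biserial example: group coded 0/1, interval-level score.
group = [0, 0, 0, 1, 1, 1]
score = [2, 3, 4, 5, 6, 7]
print(round(pearson(group, score), 2))  # 0.88
```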

Computing the Pearson r:

STUDENT   HOURS (X)   SCORE (Y)   X²    Y²    XY
A         1           1           1     1     1
B         1           3           1     9     3
C         3           2           9     4     6
D         4           5           16    25    20
E         6           4           36    16    24
F         7           5           49    25    35
G         8           7           64    49    56
H         8           8           64    64    64
Sums:     ΣX = 38     ΣY = 35     ΣX² = 240   ΣY² = 193   ΣXY = 209
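The column sums in the table can be double-checked with a few lines of Python. This minimal sketch simply re-enters the eight students' hours and scores from the table; the dictionary keys are hypothetical labels.

```python
hours = [1, 1, 3, 4, 6, 7, 8, 8]   # X, students A to H
scores = [1, 3, 2, 5, 4, 5, 7, 8]  # Y, students A to H

sums = {
    "sum_x": sum(hours),
    "sum_y": sum(scores),
    "sum_x2": sum(x * x for x in hours),
    "sum_y2": sum(y * y for y in scores),
    "sum_xy": sum(x * y for x, y in zip(hours, scores)),
}
print(sums)
# {'sum_x': 38, 'sum_y': 35, 'sum_x2': 240, 'sum_y2': 193, 'sum_xy': 209}
```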

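$$ r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}} $$

$$ r = \frac{8(209) - (38)(35)}{\sqrt{\left[8(240) - (38)^2\right]\left[8(193) - (35)^2\right]}} $$

$$ r = \frac{1672 - 1330}{\sqrt{(1920 - 1444)(1544 - 1225)}} $$

$$ r = \frac{342}{\sqrt{(476)(319)}} = \frac{342}{\sqrt{151844}} = \frac{342}{389.6717} = +0.878 $$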
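The same arithmetic can be traced step by step in Python. This minimal sketch plugs the five sums from the table (with N = 8) into the computational formula and keeps each intermediate quantity; the variable names are hypothetical.

```python
import math

n = 8
sum_x, sum_y, sum_x2, sum_y2, sum_xy = 38, 35, 240, 193, 209

numerator = n * sum_xy - sum_x * sum_y   # 1672 - 1330 = 342
bracket_x = n * sum_x2 - sum_x ** 2      # 1920 - 1444 = 476
bracket_y = n * sum_y2 - sum_y ** 2      # 1544 - 1225 = 319
denominator = math.sqrt(bracket_x * bracket_y)  # sqrt(151844), about 389.6717
r = numerator / denominator

print(numerator, bracket_x, bracket_y, bracket_x * bracket_y)  # 342 476 319 151844
print(round(r, 3))                                             # 0.878
```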

Critical value of r(6) = 0.707 (α = .05, two-tailed) from Table E.2

Our observed r (0.878) exceeds this value, so we reject H0

Conclusion: “There is a significant linear relationship between number of hours studying and scores on the quiz, r(6) = 0.878, p ≤ 0.05, two-tailed.”

