
Bivariate Analysis: Measures of Association

WHAT YOU WILL LEARN IN THIS CHAPTER:

To give examples of the types of business questions that may be answered by analyzing the association between two variables.

To list the common procedures for measuring association and to discuss how the measurement scale will influence the selection of statistical tests.

To discuss the concept of the simple correlation coefficient.

To calculate a simple correlation coefficient and a coefficient of determination.

To understand that correlation does not mean causation.

To interpret a correlation matrix.

To explain the concept of bivariate linear regression.

To identify the intercept and slope coefficients.

To discuss the least-squares method of regression analysis.

To draw a regression line.

To test the statistical significance of a least-squares regression.

To calculate the intercept and slope coefficients in a bivariate linear regression.

To interpret analysis of variance summary tables for linear regression.

EXHIBIT 22.2 Bivariate Analysis: Common Procedures for Testing Association

Sample Question | Measure of Association | Measurement Level*
Are dollar sales associated with advertising dollar expenditures? | Correlation coefficient (Pearson's r); bivariate regression analysis | Interval or ratio
Is rank preference for shopping centers associated with Likert scale ranking of convenience of locations? | Chi-square; Spearman rank correlation; Kendall's rank correlation | Ordinal
Is sex associated with brand awareness (aware/not aware)? | Chi-square; phi coefficient; contingency coefficient | Nominal

*If at least one of the two variables has a given level of measurement, the appropriate procedure is the one with the fewest assumptions about the data.

SIMPLE CORRELATION COEFFICIENT


simple correlation coefficient
A statistical measure of the covariation of, or association between, two variables.

The most popular technique that indicates the relationship of one variable to another is simple correlation analysis. The simple correlation coefficient is a statistical measure of the covariation or association between two variables. The correlation coefficient (r) ranges from +1.0 to -1.0. If the value of r is 1.0, there is a perfect positive linear (straight-line) relationship. If the value of r is -1.0, a perfect negative linear relationship, or a perfect inverse relationship, is indicated. No correlation is indicated if r = 0. A correlation coefficient indicates both the magnitude of the linear relationship and the direction of the relationship. For example, if we find that the value of r = -.92, we know we have a relatively strong inverse relationship. That is, the greater the value measured by variable X, the less the value measured by variable Y.

The formula for calculating the correlation coefficient for two variables X and Y is:

$$r = r_{xy} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$$

where the symbols X̄ and Ȳ represent the sample means of X and Y, respectively.


An alternative way of expressing the correlation formula is:

$$r = r_{xy} = \frac{\sigma_{xy}}{\sqrt{\sigma_x^2 \, \sigma_y^2}}$$

where

σ_x² = variance of X
σ_y² = variance of Y
σ_xy = covariance of X and Y

with

$$\sigma_{xy} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n}$$

If associated values of Xᵢ and Yᵢ differ from their means in the same direction, then their covariance will be positive. Covariance will be negative if the values of Xᵢ and Yᵢ have a tendency to deviate in opposite directions.
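Purely as an illustration (the data below are hypothetical, not from the chapter), this minimal Python sketch computes r both ways; the deviation form and the covariance form agree because the 1/n factors cancel.

```python
# Minimal sketch: computing r two equivalent ways for a small, hypothetical data set.
xs = [2.0, 4.0, 6.0, 8.0]   # hypothetical X values
ys = [3.0, 7.0, 5.0, 9.0]   # hypothetical Y values
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Deviation form: cross-products over the root of the product of sums of squares
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r_deviation = sxy / (sxx * syy) ** 0.5

# Covariance form: r = cov(X, Y) / sqrt(var(X) * var(Y))
cov_xy = sxy / n
var_x, var_y = sxx / n, syy / n
r_covariance = cov_xy / (var_x * var_y) ** 0.5

print(round(r_deviation, 4), round(r_covariance, 4))  # 0.8 0.8 (identical)
```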

EXHIBIT 22.3 Scatter Diagrams Illustrating Correlation Patterns

[Six scatter diagrams showing r = .30 (low positive correlation), r = .80 (high positive correlation), r = +1.0 (perfect positive correlation), r = 0, r = -.60 (moderate negative correlation), and r = -1.0 (perfect negative correlation).]


In actuality, the simple correlation coefficient is a standardized measure of covariance. In the formula the numerator represents covariance and the denominator is
the square root of the product of the sample variances. Researchers find the correlation coefficient useful because two correlations can be compared without regard to
the amount of variation exhibited by each variable separately.
Exhibit 22.3 illustrates the correlation coefficients and scatter diagrams for several sets of data.

An Example
To illustrate the calculation of the correlation coefficient, an investigation is made to
determine if the average number of hours worked in manufacturing industries is related to unemployment. A correlation analysis on the data in Table 22.1 is used to
determine if the two variables are associated.
The correlation between the two variables is -.635, which indicates an inverse relationship. Thus when the number of hours worked is high, unemployment is low.
This makes intuitive sense. If factories are increasing output, regular workers typically work more overtime and new employees are hired (reducing the unemployment rate). Both variables are probably related to overall economic conditions.

Correlation and Causation


It is important to remember that correlation does not mean causation. No matter how highly correlated the rooster's crow is to the rising of the sun, the rooster does not cause the sun to rise. It has been pointed out that there is a high correlation between teachers' salaries and the consumption of liquor over a period of years. The approximate correlation coefficient is r = .9. This high correlation does not indicate that teachers drink, nor does it indicate that the sale of liquor increases teachers' salaries. It is more likely that both teachers' salaries and liquor sales covary because they are both influenced by a third variable, such as long-run growth in national income and/or population.

In this example the relationship between the two variables is apparent but not real. Even though the variables are not causally related, they can be statistically related.

[Photo caption: Researchers who examine statistical relationships must be aware that the variables may not be causally related.]


TABLE 22.1 Correlation Analysis of Number of Hours Worked in Manufacturing Industries with Unemployment Rate

Unemployment Rate (Xᵢ)   Number of Hours Worked (Yᵢ)   Xᵢ - X̄    (Xᵢ - X̄)²   Yᵢ - Ȳ    (Yᵢ - Ȳ)²   (Xᵢ - X̄)(Yᵢ - Ȳ)
5.5                      39.6                           0.51      0.2601      -0.71     0.5041      -0.3621
4.4                      40.7                          -0.59      0.3481       0.39     0.1521      -0.2301
4.1                      40.4                          -0.89      0.7921       0.09     0.0081      -0.0801
4.3                      39.8                          -0.69      0.4761      -0.51     0.2601       0.3519
6.8                      39.2                           1.81      3.2761      -1.11     1.2321      -2.0091
5.5                      40.3                           0.51      0.2601      -0.01     0.0001      -0.0051
5.5                      39.7                           0.51      0.2601      -0.61     0.3721      -0.3111
6.7                      39.8                           1.71      2.9241      -0.51     0.2601      -0.8721
5.5                      40.4                           0.51      0.2601       0.09     0.0081       0.0459
5.7                      40.5                           0.71      0.5041       0.19     0.0361       0.1349
5.2                      40.7                           0.21      0.0441       0.39     0.1521       0.0819
4.5                      41.2                          -0.49      0.2401       0.89     0.7921      -0.4361
3.8                      41.3                          -1.19      1.4161       0.99     0.9801      -1.1781
3.8                      40.6                          -1.19      1.4161       0.29     0.0841      -0.3451
3.6                      40.7                          -1.39      1.9321       0.39     0.1521      -0.5421
3.5                      40.6                          -1.49      2.2201       0.29     0.0841      -0.4321
4.9                      39.8                          -0.09      0.0081      -0.51     0.2601       0.0459
5.9                      39.9                           0.91      0.8281      -0.41     0.1681      -0.3731
5.6                      40.6                           0.61      0.3721       0.29     0.0841       0.1769

X̄ = 4.99   Ȳ = 40.31
Σ(Xᵢ - X̄)² = 17.8379
Σ(Yᵢ - Ȳ)² = 5.5899
Σ(Xᵢ - X̄)(Yᵢ - Ȳ) = -6.3389

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} = \frac{-6.3389}{\sqrt{(17.8379)(5.5899)}} = \frac{-6.3389}{9.986} = -.635$$
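For readers who want to check the arithmetic, a short Python sketch (an illustration, not part of the original text) using the X and Y columns of Table 22.1 reproduces the result:

```python
# Verifying the Table 22.1 correlation with the data as printed in the table.
unemployment = [5.5, 4.4, 4.1, 4.3, 6.8, 5.5, 5.5, 6.7, 5.5, 5.7,
                5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, 5.9, 5.6]          # X
hours_worked = [39.6, 40.7, 40.4, 39.8, 39.2, 40.3, 39.7, 39.8, 40.4, 40.5,
                40.7, 41.2, 41.3, 40.6, 40.7, 40.6, 39.8, 39.9, 40.6]  # Y

n = len(unemployment)
x_bar = sum(unemployment) / n   # about 4.99
y_bar = sum(hours_worked) / n   # about 40.31

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(unemployment, hours_worked))
sxx = sum((x - x_bar) ** 2 for x in unemployment)
syy = sum((y - y_bar) ** 2 for y in hours_worked)

r = sxy / (sxx * syy) ** 0.5
print(round(r, 3))  # -0.635, matching the table
```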

This can occur because both are caused by a third (or more) factor(s). When this is
so, the variables are said to be spuriously related.
coefficient of determination (r²)
A measure of that portion of the total variance of a variable that is accounted for by knowing the value of another variable.

Coefficient of Determination


If we wish to know the proportion of variance in Y explained by X (or vice versa), we can calculate the coefficient of determination by squaring the correlation coefficient:

$$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$
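For example, with the hours-worked data above, r = -.635, so r² = (-.635)² ≈ .40: about 40 percent of the variance in one variable is accounted for by knowing the value of the other.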

TABLE 22.2 Pearson Product-Moment Correlation Matrix for Sales Management Example^a

Variables                      S      JS     GE     SE     OD     VI     JT     RA     TP     WL
Performance (S)               1.00
Job satisfaction (JS)          .45b  1.00
Generalized self-esteem (GE)   .31b   .10   1.00
Specific self-esteem (SE)      .61b   .28b   .36b  1.00
Other-directedness (OD)        .05   -.03   -.44b  -.24c  1.00
Verbal intelligence (VI)      -.36b  -.13   -.14   -.11    .18d  1.00
Job-related tension (JT)      -.48b  -.56b  -.32b  -.34b   .26b  -.02   1.00
Role ambiguity (RA)           -.26c  -.24c  -.32b  -.39b   .38b  -.05    .44b  1.00
Territory potential (TP)       .49b   .31b   .04    .29b   .09   -.09   -.38b  -.26b  1.00
Workload (WL)                  .45b   .11    .29c   .29c  -.04   -.12   -.27c  -.22d   .49b  1.00

a Numbers below the diagonal are for the sample. Those above the diagonal are omitted.
b p < .05.

REGRESSION ANALYSIS

bivariate linear regression
A measure of linear association that investigates a straight-line relationship of the type Y = α + βX, where Y is the dependent variable, X is the independent variable, and α and β are two constants to be estimated.

intercept
An intercepted segment of a line. The point at which a regression line intersects the Y-axis.

slope
The inclination of a regression line as compared to a base line. Rise (vertical distance) over run (horizontal distance).

Regression is another technique for measuring the linear association between a dependent and an independent variable. Although regression and correlation are mathematically related, regression assumes the dependent (or criterion) variable, Y, is predictively linked to the independent (or predictor) variable, X. Regression analysis attempts to predict the values of a continuous, interval-scaled dependent variable from the specific values of the independent variable. For example, the amount of external funds required (the dependent variable) might be predicted on the basis of sales growth rates (the independent variable). Although there are numerous applications of regression analysis, forecasting sales is by far the most common.

The discussion here concerns bivariate linear regression. This form of regression investigates a straight-line relationship of the type Y = α + βX, where Y is the dependent variable, X is the independent variable, and α and β are two constants to be estimated. The symbol α represents the Y intercept and β is the slope coefficient. The slope β is the change in Y due to a corresponding change of one unit in X. The slope may also be thought of as "rise over run" (the rise in units on the Y axis divided by the run in units along the X axis). (Δ is the notation for "a change in.")

Suppose a researcher is interested in forecasting sales for a construction distributor (wholesaler) in Florida. Further, the distributor believes a reasonable association exists between sales and building permits issued by counties. Using bivariate linear regression on the data in Table 22.3, the researcher will be able to estimate sales potential (Y) in various counties based on the number of building permits (X).

For a better understanding of the data in Table 22.3, the data can be plotted on a scatter diagram (Exhibit 22.4). In the diagram the vertical axis indicates the value of the dependent variable Y and the horizontal axis indicates the value of the independent variable X. Each point in the diagram represents an observation of X and Y at a given point in time, that is, the paired values of Y and X. The relationship


Regression: One Step Backward

The essence of a dictionary definition of the word "regression" is a going back or moving backward. This notion of regressing, that things "go back to previous conditions," was the source for the original concept of statistical regression. Galton, who first worked out the concept of correlation, got the idea from thinking about "regression toward mediocrity," a phenomenon observed in studies of inheritance. "Tall men will tend to have shorter sons, and short men taller sons. The sons' heights, then, tend to 'regress to,' or 'go back to,' the mean of the population. Statistically, if we want to predict Y from X and the correlation between X and Y is zero, then our best prediction is the mean." (Incidentally, the symbol r, used for the coefficient of correlation, was originally chosen because it stood for "regression.")

between X and Y could be "eyeballed," that is, a straight line could be drawn through
the points in the figure. However, such a line would be subject to human error. Two
researchers might draw different lines to describe the same data.

least-squares method
A mathematical technique ensuring that the regression line will best represent the linear relationship between X and Y.

Least-Squares Method of Regression Analysis


The task of the researcher is to find the best means for fitting a straight line to the
data. The least-squares method is a relatively simple mathematical technique that
ensures that the straight line will best represent the relationship between X and Y.
The logic behind the least-squares technique goes as follows. No straight line can
completely represent every dot in the scatter diagram. Unless there is a perfect

TABLE 22.3 Relationship of Sales Potential to Building Permits Issued

Dealer   Dealer's Sales Volume, Y (000)   Building Permits, X
1        77                                86
2        79                                93
3        80                                95
4        83                                104
5        101                               139
6        117                               180
7        129                               165
8        120                               147
9        97                                119
10       106                               132
11       99                                126
12       121                               156
13       103                               129
14       86                                96
15       99                                108


EXHIBIT 22.4 Scatter Diagram and Eyeball Forecast

[Scatter diagram of dealer sales volume (vertical axis, 80 to 165) against building permits (horizontal axis, 85 to 195), with two different hand-fitted lines labeled "My line" and "Your line."]

correlation between two variables, there will be a discrepancy between most of the
actual scores (each dot) and the predicted score based on the regression line. Simply
stated, any straight line that is drawn will generate errors. The method of least
squares uses the criterion of attempting to make the least amount of total error in
prediction of Y from X. More technically, the procedure used in the least-squares
method generates a straight line, which minimizes the sum of squared deviations of
the actual values from this predicted regression line. Using the symbol e to represent
the deviations of the dots from the line, the least-squares criterion is:

$$\sum_{i=1}^{n} e_i^2 \text{ is minimum}$$

where

eᵢ = Yᵢ - Ŷᵢ (the "residual")
Yᵢ = actual value of the dependent variable
Ŷᵢ = estimated value of the dependent variable (Y hat)
n = number of observations
i = number of the observation

residual
The difference between the actual value of the dependent variable and the estimated value of the dependent variable in the regression equation.


The general equation of a straight line equals Y = α + βX, whereas a more appropriate equation includes an allowance for error:

$$Y = \alpha + \beta X + e$$

The symbols α̂ and β̂ are utilized when the equation is a regression estimate of the line. Thus, to compute the estimated values of α and β, we use the following formulas:

$$\hat{\beta} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2}$$

and

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$

where

β̂ = estimated slope of the line (the "regression coefficient")
α̂ = estimated intercept of the Y axis
Y = dependent variable
Ȳ = mean of the dependent variable
X = independent variable
X̄ = mean of the independent variable
n = number of observations

TABLE 22.4 Least-Squares Computation

Dealer   Y      Y²        X      X²        XY
1        77     5,929     86     7,396     6,622
2        79     6,241     93     8,649     7,347
3        80     6,400     95     9,025     7,600
4        83     6,889     104    10,816    8,632
5        101    10,201    139    19,321    14,039
6        117    13,689    180    32,400    21,060
7        129    16,641    165    27,225    21,285
8        120    14,400    147    21,609    17,640
9        97     9,409     119    14,161    11,543
10       106    11,236    132    17,424    13,992
11       99     9,801     126    15,876    12,474
12       121    14,641    156    24,336    18,876
13       103    10,609    129    16,641    13,287
14       86     7,396     96     9,216     8,256
15       99     9,801     108    11,664    10,692

ΣY = 1,497   ΣY² = 153,283   ΣX = 1,875   ΣX² = 245,759   ΣXY = 193,345
Ȳ = 99.8   X̄ = 125


These equations may be solved by simple arithmetic (see Table 22.4). To estimate the relationship between the distributor's sales to a dealer and the number of
building permits, the following manipulations are performed:

$$\hat{\beta} = \frac{n(\sum XY) - (\sum X)(\sum Y)}{n(\sum X^2) - (\sum X)^2} = \frac{15(193{,}345) - (1{,}875)(1{,}497)}{15(245{,}759) - (1{,}875)^2} = \frac{2{,}900{,}175 - 2{,}806{,}875}{3{,}686{,}385 - 3{,}515{,}625} = \frac{93{,}300}{170{,}760} = .54638$$

$$\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X} = 99.8 - .54638(125) = 99.8 - 68.3 = 31.5$$
The formula Ŷ = 31.5 + .546X is the regression equation used for the prediction of the dependent variable. Suppose the wholesaler considers a new dealership in an area where the number of building permits equals 89. Sales may be forecast in this area as:

Ŷ = 31.5 + .546X
  = 31.5 + .546(89)
  = 31.5 + 48.6
  = 80.1

Thus our distributor may expect sales of 80.1 in this new area.⁵

Calculation of the correlation coefficient gives an indication of how accurate the predictions may be. In this example the correlation coefficient is r = .9356, and the coefficient of determination is r² = .8754.
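As a cross-check, a brief Python sketch of these computations using the Table 22.3 data (an illustration, not part of the original text) reproduces the slope, intercept, and forecast:

```python
# Reproducing the least-squares computation from Tables 22.3 and 22.4.
permits = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]  # X
sales   = [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]      # Y (000)

n = len(permits)
sum_x, sum_y = sum(permits), sum(sales)                # 1,875 and 1,497
sum_x2 = sum(x * x for x in permits)                   # 245,759
sum_xy = sum(x * y for x, y in zip(permits, sales))    # 193,345

# Estimated slope and intercept from the formulas above
beta_hat = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # 0.54638
alpha_hat = sum_y / n - beta_hat * (sum_x / n)                        # 31.5

# Forecast for an area with 89 building permits
y_hat = alpha_hat + beta_hat * 89
print(round(beta_hat, 5), round(alpha_hat, 1), round(y_hat, 1))  # 0.54638 31.5 80.1
```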

Drawing a Regression Line

To draw a regression line on the scatter diagram, only two predicted values of Y need plotting. For example, if Dealer 7 and Dealer 3 are used, Ŷ₇ and Ŷ₃ will be calculated to be 121.6 and 83.4:

Dealer 7 (actual Y value = 129): Ŷ₇ = 31.5 + .546(165) = 121.6
Dealer 3 (actual Y value = 80):  Ŷ₃ = 31.5 + .546(95) = 83.4

Once the two Ŷ values have been predicted, a straight line connecting the points Ŷ₇ = 121.6, X₇ = 165, and Ŷ₃ = 83.4, X₃ = 95 can be drawn.


Exhibit 22.5 shows the regression line. If it is desirable to determine the error (residual) of any observation, the predicted value of Y is first calculated. The predicted value is then subtracted from the actual value. For example, the actual observation


EXHIBIT 22.6 Scatter Diagram of Explained and Unexplained Variation

[Scatter diagram with the fitted regression line, marking Dealer 8's actual sales and the deviation explained by the regression (Ŷᵢ - Ȳ); vertical axis 80 to 130, horizontal axis 100 to 180.]

using Ŷᵢ - Ȳ rather than Yᵢ - Ȳ. This is the "explained" deviation due to the regression. The smaller number, 8.2, is the deviation not explained by the regression. Thus the total deviation can be partitioned into two parts:

(Yᵢ - Ȳ) = (Ŷᵢ - Ȳ) + (Yᵢ - Ŷᵢ)
Total deviation = Deviation explained by the regression + Deviation unexplained by the regression (residual error)

where

Ȳ = mean of the total group
Ŷᵢ = value predicted with the regression equation
Yᵢ = actual value

For Dealer 8 the total deviation is 120 - 99.8 = 20.2, the deviation explained by the regression is 111.8 - 99.8 = 12.0, and the deviation unexplained by the regression is 120 - 111.8 = 8.2. If these values are summed over all values of Yᵢ (i.e., all observations) and squared, these deviations provide an estimate of the variation of Y explained by the regression and unexplained by the regression:

$$\sum (Y_i - \bar{Y})^2 = \sum (\hat{Y}_i - \bar{Y})^2 + \sum (Y_i - \hat{Y}_i)^2$$

Total variation = Explained variation + Unexplained variation (residual)

We have thus partitioned the total sum of squares, SSt, into two parts: the regression sum of squares, SSr, and the error sum of squares, SSe:

SSt = SSr + SSe
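A minimal Python sketch (illustrative only) verifies this partition for the building-permit regression; the three sums match the analysis of variance summary table presented later in the chapter.

```python
# Partitioning the total sum of squares for the building-permit example.
permits = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]
sales   = [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]

n = len(permits)
x_bar, y_bar = sum(permits) / n, sum(sales) / n

# Fitted line Y-hat = 31.5 + .546X, recomputed exactly here
beta_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(permits, sales))
            / sum((x - x_bar) ** 2 for x in permits))
alpha_hat = y_bar - beta_hat * x_bar
fitted = [alpha_hat + beta_hat * x for x in permits]

sst = sum((y - y_bar) ** 2 for y in sales)              # total variation
ssr = sum((f - y_bar) ** 2 for f in fitted)             # explained by regression
sse = sum((y - f) ** 2 for y, f in zip(sales, fitted))  # unexplained (residual)

print(round(sst, 2), round(ssr, 2), round(sse, 2))  # 3882.4 = 3398.49 + 483.91
```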


The Concept of Beta When Investing in Stocks

Suppose a regression was run with the historic realized rate of return on a particular stock (Kⱼ) as the dependent variable and the historic realized rate of return on the stock market (Kₘ) as the independent variable. The tendency of a stock to move with the market is reflected in its beta coefficient, which is a measure of the stock's volatility relative to an average stock. Betas are discussed at an intuitive level in this section.

An average risk stock is defined as one which tends to move up and down in step with the general market as measured by some index such as the Dow Jones or the New York Stock Exchange Index. Such a stock will, by definition, have a beta (β) of 1.0, which indicates that, in general, if the market moves up by 10 percent, the stock will also move up by 10 percent, while if the market falls by 10 percent, the stock will likewise fall by 10 percent. A portfolio of such β = 1.0 stocks will move up and down with the broad market averages and will be just as risky as the averages. If β = 0.5, the stock is only half as volatile as the market (it will rise and fall only half as much), and a portfolio of such stocks is half as risky as a portfolio of β = 1.0 stocks. On the other hand, if β = 2.0, the stock is twice as volatile as an average stock, so a portfolio of such stocks will be twice as risky as an average portfolio.

Betas are calculated and published by Merrill Lynch, Value Line, and numerous other organizations. The beta coefficients of some well-known companies, as calculated by Merrill Lynch, are shown in the table below. Most stocks have betas in the range of 0.75 to 1.50. The average for all stocks is 1.0 by definition. A list of beta coefficients is given below:

Stock                     Beta
Apple Computer            1.60
Union Pacific             1.43
Georgia-Pacific           1.36
Mattel                    1.15
General Electric          1.09
Bristol Myers             1.00
General Motors            0.94
McDonald's                0.93
Procter & Gamble          0.80
IBM                       0.70
Anheuser-Busch            0.58
Pacific Gas & Electric    0.47

If a high-beta stock (one whose beta is greater than 1.0) is added to an average risk (β = 1.0) portfolio, then the beta, and consequently the riskiness, of the portfolio will increase. Conversely, if a low-beta stock (one whose beta is less than 1.0) is added to an average risk portfolio, the portfolio's beta and risk will decline. Thus, because a stock's beta measures its contribution to the riskiness of the portfolio, beta is the appropriate measure of the stock's riskiness.
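As a hypothetical illustration (the return series below are invented, not from any published source), beta is simply the least-squares slope of the stock's returns regressed on the market's returns:

```python
# Hypothetical sketch: estimating beta as a least-squares slope.
market = [0.02, -0.01, 0.03, 0.015, -0.02, 0.01]     # hypothetical market returns
stock  = [0.03, -0.02, 0.045, 0.02, -0.035, 0.015]   # hypothetical stock returns

n = len(market)
m_bar = sum(market) / n
s_bar = sum(stock) / n

# Slope of the regression of stock returns on market returns
beta = (sum((m - m_bar) * (s - s_bar) for m, s in zip(market, stock))
        / sum((m - m_bar) ** 2 for m in market))

print(round(beta, 2))  # 1.6: greater than 1, so more volatile than the market
```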

F-test
A procedure used to determine if there is more variability in the scores of one sample than in the scores of another sample.

An F-test, or an analysis of variance applied to regression, can be used to test the relative magnitude of the SSr and SSe with their appropriate degrees of freedom. Table 22.5 indicates the technique for conducting the F-test.

TABLE 22.5 Analysis of Variance Table for Bivariate Regression

Source of Variation        Degrees of Freedom   Sum of Squares            Mean Square (Variance)
Explained by regression    k - 1                SSr = Σ(Ŷᵢ - Ȳ)²          SSr/(k - 1)
Unexplained (error)        n - k                SSe = Σ(Yᵢ - Ŷᵢ)²         SSe/(n - k)

where k = number of estimated parameters (variables) and n = number of observations.

summary table
A table that presents the results of a regression calculation.

TABLE 22.6 Analysis of Variance Summary Table for Regression of Sales on Building Permits

Source of Variation                  d.f.   Sum of Squares   Mean Square   F-Value
Explained by regression              1      3,398.49         3,398.49      91.30
Unexplained by regression (error)    13     483.91           37.22
Total                                14     3,882.40

For the example on sales forecasting, the analysis of variance summary table, comparing relative magnitudes of the mean squares, is presented in Table 22.6. From Table 6 in the Appendix we find that the F-value of 91.3, with 1 degree of freedom in the numerator and 13 degrees of freedom in the denominator, exceeds the critical value at the .01 probability level. The coefficient of determination, r², reflects the proportion of variation explained by the regression line. To calculate r²:

$$r^2 = \frac{\text{SSr}}{\text{SSt}} = 1 - \frac{\text{SSe}}{\text{SSt}}$$

In our example, r² is calculated to be .875:

$$r^2 = \frac{3{,}398.49}{3{,}882.40} = .875$$

The coefficient of determination may be interpreted to mean that 87 percent of the variation in sales was explained by associating the variable with building permits.

SUMMARY

In many situations two variables are interrelated or associated. Many bivariate statistical techniques can be used to measure association. Researchers select the appropriate technique on the basis of each variable's scale of measurement.

The correlation coefficient (r), a statistical measure of association between two variables, ranges from r = +1.0 for a perfect positive correlation to r = -1.0 for a perfect negative correlation. No correlation is indicated for r = 0. Simple correlation is the measure of the relationship of one variable to another. The correlation coefficient indicates the strength of the association of two variables and the direction of that association. It must be remembered that correlation does not prove causation, as variables other than those being measured may be involved. The coefficient of determination (r²) measures the amount of the total variance in the dependent variable that is accounted for by knowing the value of the independent variable. The results of correlation computations are often presented in a correlation matrix.

Bivariate linear regression investigates a straight-line relationship between one dependent variable and one independent variable. The regression can be done intuitively by plotting a scatter diagram of the X and Y points and drawing a line to fit the observed relationship. The least-squares method mathematically determines the best-fitting regression line for the observed data. The line determined by this method may be used to forecast values of the dependent variable, given a value for the independent variable.


8. A football team's season ticket sales, percentage of games won, and number of active alumni are given below:

Year   Season Ticket Sales   Percentage of Games Won   Number of Active Alumni
1985   4,995                 40                        NA
1986   8,599                 54                        NA
1987   8,479                 55                        NA
1988   8,419                 58                        NA
1989   10,253                63                        NA
1990   12,457                75                        6,315
1991   13,285                36                        6,860
1992   14,177                27                        8,423
1993   15,730                63                        9,000

a. Interpret the correlation between each variable.
b. Calculate: Regression sales = Percentage of games won.
c. Calculate: Regression sales = Number of active alumni.

9. Are the different forms of consumer installment credit in the table below highly correlated? Explain.

Credit Card Debt Outstanding (Millions of Dollars)

Year   Gas Cards   Travel and Entertainment Cards   Bank Credit Cards   Retail Cards   Total Credit Cards   Total Installment Credit
1      $939        $61                              $828                $9,400         $11,229              $79,428
2      1,119       76                               1,312               10,200         12,707               87,745
3      1,298       110                              2,639               10,900         14,947               98,105
4      1,650       122                              3,792               11,500         17,064               102,064
5      1,804       132                              4,490               13,925         20,351               111,295
6      1,762       164                              5,408               14,763         22,097               127,332
7      1,832       191                              6,838               16,395         25,256               147,437
8      1,823       238                              9,281               17,933         28,275               156,124
9      1,993       273                              9,501               18,002         29,669               164,955
10     1,981       238                              11,351              19,052         32,622               185,489
11     2,074       284                              14,262              21,082         37,702               216,572

10. A manufacturer of disposable washcloths/wipes told a retailer that sales for this product category closely correlated with the sales of disposable diapers. The retailer thought he would check this out for his own sales-forecasting purposes. Where might a researcher find data to make this forecast?

11. The Springfield Electric Company manufactures electric pencil sharpeners. The company believes that sales are correlated with the number of workers employed in specific geographical areas. The following table presents Springfield's