Measures of Association
To give examples of the types of business questions that may be answered by analyzing the association between two variables.
To list the common procedures for measuring association and to discuss how the measurement scale will influence the selection of statistical tests.
To discuss the concept of the simple correlation coefficient.
To understand that correlation does not prove causation.
To interpret a correlation matrix.
To explain the concept of bivariate linear regression.
To identify the intercept and slope of a regression line.
To discuss the least-squares method of regression analysis.
CHAPTER 22
Measure of association                     Measurement level
Correlation coefficient (Pearson's r)      Interval or ratio
Chi-square                                 Nominal
Spearman rank correlation                  Ordinal
Kendall's rank correlation                 Ordinal
For a given level of measurement, the appropriate procedure is the one with the fewest assumptions about the data.
The formula for calculating the correlation coefficient for two variables X and Y is:

r = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / √[Σ(Xᵢ - X̄)² Σ(Yᵢ - Ȳ)²]

where X̄ and Ȳ are the sample means of X and Y, respectively. An alternative way to express the correlation coefficient is in terms of variances and covariance:

r_yx = r_xy = σxy / √(σx² σy²)

where
σx² = variance of X
σy² = variance of Y
σxy = covariance of X and Y

with

σxy = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / n
If associated values of Xᵢ and Yᵢ differ from their means in the same direction, their covariance will be positive. Covariance will be negative if the values of Xᵢ and Yᵢ tend to deviate in opposite directions.
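The equivalence of the two formulas above can be checked with a short computational sketch. Python is not part of the chapter, and the function names here are ours, purely for illustration:

```python
def covariance(x, y):
    """Population covariance: mean cross-product of deviations from the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n

def variance(x):
    """Variance is the covariance of a variable with itself."""
    return covariance(x, x)

def correlation(x, y):
    """r: covariance standardized by the square root of the product of variances."""
    return covariance(x, y) / (variance(x) * variance(y)) ** 0.5

# Values deviating from their means in the same direction give positive covariance,
# and here a perfect positive correlation:
print(correlation([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0]))   # 1.0
# Deviations in opposite directions give negative covariance and negative r:
print(correlation([1.0, 2.0, 3.0, 4.0, 5.0], [10.0, 8.0, 6.0, 4.0, 2.0]))   # -1.0
```

Because r is standardized, the result does not depend on the units of either variable.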
EXHIBIT 22.3  Scatter Diagrams Illustrating Correlation Patterns
[Six scatter diagrams: r = .30, low positive correlation; r = .80, high positive correlation; r = +1.0, perfect positive correlation; r = 0, no correlation; r = -.60, moderate negative correlation; r = -1.0, perfect negative correlation.]
In actuality, the simple correlation coefficient is a standardized measure of covariance. In the formula the numerator represents covariance and the denominator is
the square root of the product of the sample variances. Researchers find the correlation coefficient useful because two correlations can be compared without regard to
the amount of variation exhibited by each variable separately.
Exhibit 22.3 illustrates the correlation coefficients and scatter diagrams for several sets of data.
An Example
To illustrate the calculation of the correlation coefficient, an investigation is made to
determine if the average number of hours worked in manufacturing industries is related to unemployment. A correlation analysis on the data in Table 22.1 is used to
determine if the two variables are associated.
The correlation between the two variables is -.635, which indicates an inverse relationship. Thus when the number of hours worked is high, unemployment is low.
This makes intuitive sense. If factories are increasing output, regular workers typically work more overtime and new employees are hired (reducing the unemployment rate). Both variables are probably related to overall economic conditions.
A classic example is the correlation between teachers' salaries and the consumption of liquor over a period of years; the approximate correlation coefficient is r = .9. This high correlation does not indicate that teachers drink, nor does it indicate that the sale of liquor increases teachers' salaries. It is more likely that teachers' salaries and liquor sales covary because both are influenced by a third factor. In this example the relationship between the two variables is apparent but not real. Even though the variables are not causally related, they can be statistically related.
Researchers who examine statistical relationships must be aware that the variables may not be causally related.
TABLE 22.1  Correlation Analysis of Number of Hours Worked in Manufacturing Industries with Unemployment Rate

Unemployment   Hours
Rate (Xᵢ)      Worked (Yᵢ)   Xᵢ - X̄    (Xᵢ - X̄)²   Yᵢ - Ȳ    (Yᵢ - Ȳ)²   (Xᵢ - X̄)(Yᵢ - Ȳ)
5.5            39.6           0.51      0.2601      -0.71     0.5041      -0.3621
4.4            40.7          -0.59      0.3481       0.39     0.1521      -0.2301
4.1            40.4          -0.89      0.7921       0.09     0.0081      -0.0801
4.3            39.8          -0.69      0.4761      -0.51     0.2601       0.3519
6.8            39.2           1.81      3.2761      -1.11     1.2321      -2.0091
5.5            40.3           0.51      0.2601      -0.01     0.0001      -0.0051
5.5            39.7           0.51      0.2601      -0.61     0.3721      -0.3111
6.7            39.8           1.71      2.9241      -0.51     0.2601      -0.8721
5.5            40.4           0.51      0.2601       0.09     0.0081       0.0459
5.7            40.5           0.71      0.5041       0.19     0.0361       0.1349
5.2            40.7           0.21      0.0441       0.39     0.1521       0.0819
4.5            41.2          -0.49      0.2401       0.89     0.7921      -0.4361
3.8            41.3          -1.19      1.4161       0.99     0.9801      -1.1781
3.8            40.6          -1.19      1.4161       0.29     0.0841      -0.3451
3.6            40.7          -1.39      1.9321       0.39     0.1521      -0.5421
3.5            40.6          -1.49      2.2201       0.29     0.0841      -0.4321
4.9            39.8          -0.09      0.0081      -0.51     0.2601       0.0459
5.9            39.9           0.91      0.8281      -0.41     0.1681      -0.3731
5.6            40.6           0.61      0.3721       0.29     0.0841       0.1769

X̄ = 4.99    Ȳ = 40.31
Σ(Xᵢ - X̄)² = 17.8379
Σ(Yᵢ - Ȳ)² = 5.5899
Σ(Xᵢ - X̄)(Yᵢ - Ȳ) = -6.3389

r = Σ(Xᵢ - X̄)(Yᵢ - Ȳ) / √[Σ(Xᵢ - X̄)² Σ(Yᵢ - Ȳ)²] = -6.3389 / √99.712 = -.635
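The computations in Table 22.1 can be replicated in a few lines of code. This is a Python sketch of our own, not part of the text; the variable names are illustrative:

```python
# Unemployment rate (X) and hours worked (Y) from Table 22.1
unemployment = [5.5, 4.4, 4.1, 4.3, 6.8, 5.5, 5.5, 6.7, 5.5, 5.7,
                5.2, 4.5, 3.8, 3.8, 3.6, 3.5, 4.9, 5.9, 5.6]
hours = [39.6, 40.7, 40.4, 39.8, 39.2, 40.3, 39.7, 39.8, 40.4, 40.5,
         40.7, 41.2, 41.3, 40.6, 40.7, 40.6, 39.8, 39.9, 40.6]

n = len(unemployment)
x_bar = sum(unemployment) / n          # 4.99 (rounded)
y_bar = sum(hours) / n                 # 40.31 (rounded)

# Cross-products and sums of squared deviations, as in the table's columns
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(unemployment, hours))
sxx = sum((x - x_bar) ** 2 for x in unemployment)
syy = sum((y - y_bar) ** 2 for y in hours)

r = sxy / (sxx * syy) ** 0.5
print(round(r, 3))   # -0.635
```

The result matches the inverse relationship reported in the text: when hours worked are high, unemployment tends to be low.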
This can occur because both are caused by a third (or more) factor(s). When this is
so, the variables are said to be spuriously related.
coefficient of determination (r²)
A measure of that portion of the total variance of a variable that is accounted for by knowing the value of another variable.

r² = Explained variance / Total variance
TABLE 22.2  Correlation Matrixᵃ

Variables                        S      JS     GE     SE     OD     VI     JT     RA     TP     WL
Performance (S)                 1.00
Job satisfaction (JS)            .45b  1.00
Generalized self-esteem (GE)     .31b   .10   1.00
Specific self-esteem (SE)        .61b   .28b   .36b  1.00
Other-directedness (OD)          .05   -.03   -.44b  -.24c  1.00
Verbal intelligence (VI)        -.36b  -.13   -.14   -.11    .18d  1.00
[Coefficients in the rows for job-related tension (JT), role ambiguity (RA), territory potential (TP), and workload (WL) are illegible in the source.]

aNumbers below the diagonal are for the sample; those above the diagonal are omitted.
bp < .05.
REGRESSION ANALYSIS
intercept
An intercepted segment of a line. The point at which a regression line intersects the Y-axis.

slope
The inclination of a regression line as compared to a base line. Rise (vertical distance) over run (horizontal difference).
Regression is another technique for measuring the linear association between a dependent and an independent variable. Although regression and correlation are mathematically related, regression assumes the dependent (or criterion) variable, Y, is predictively linked to the independent (or predictor) variable, X. Regression analysis attempts to predict the values of a continuous, interval-scaled dependent variable from the specific values of the independent variable. For example, the amount of external funds required (the dependent variable) might be predicted on the basis of sales growth rates (the independent variable). Although there are numerous applications of regression analysis, forecasting sales is by far the most common.
The discussion here concerns bivariate linear regression. This form of regression investigates a straight-line relationship of the type Y = α + βX, where Y is the dependent variable, X is the independent variable, and α and β are two constants to be estimated. The symbol α represents the Y intercept and β is the slope coefficient. The slope β is the change in Y due to a corresponding change of one unit in X. The slope may also be thought of as "rise over run" (the rise in units on the Y axis divided by the run in units along the X axis). (Δ is the notation for "a change in.")
Suppose a researcher is interested in forecasting sales for a construction distributor (wholesaler) in Florida. Further, the distributor believes a reasonable association exists between sales and building permits issued by counties. Using bivariate linear regression on the data in Table 22.3, the researcher will be able to estimate sales potential (Y) in various counties based on the number of building permits (X).
For a better understanding of the data in Table 22.3, the data can be plotted on a scatter diagram (Exhibit 22.4). In the diagram the vertical axis indicates the value of the dependent variable Y and the horizontal axis indicates the value of the independent variable X. Each point in the diagram represents an observation of X and Y at a given point in time, that is, the paired values of Y and X.
The term "regression" goes back to Sir Francis Galton's "regression toward mediocrity," a phenomenon observed in studies of inheritance. "Tall men will tend to have shorter sons, and short men taller sons. The sons' heights, then, tend to 'regress to,' or 'go back to,' the mean of the population. Statistically, if we want to predict Y from X and the correlation between X and Y is zero, then our best prediction is to the mean." (Incidentally, the symbol r, used for the coefficient of correlation, was originally chosen because it stood for regression.)
The relationship between X and Y could be "eyeballed," that is, a straight line could be drawn through the points in the figure. However, such a line would be subject to human error: two researchers might draw different lines to describe the same data.
least-squares method
A mathematical technique ensuring that the regression line will best represent the linear relationship between X and Y.
TABLE 22.3  Relationship of Sales Potential to Building Permits Issued

         Dealer's Sales     Building
Dealer   Volume (000), Y    Permits, X
1         77                 86
2         79                 93
3         80                 95
4         83                104
5        101                139
6        117                180
7        129                165
8        120                147
9         97                119
10       106                132
11        99                126
12       121                156
13       103                129
14        86                 96
15        99                108
EXHIBIT 22.4  Scatter Diagram and Eyeball Forecast
[Scatter plot of dealer sales volume (vertical axis, 80 to 165) against building permits issued (horizontal axis, 85 to 195). Two hand-drawn "eyeball" lines ("my line" and "your line") illustrate that different researchers would fit different straight lines to the same data.]
Unless there is a perfect correlation between two variables, there will be a discrepancy between most of the actual scores (each dot) and the predicted scores based on the regression line. Simply stated, any straight line that is drawn will generate errors. The method of least squares uses the criterion of attempting to make the least amount of total error in prediction of Y from X. More technically, the procedure used in the least-squares method generates a straight line that minimizes the sum of squared deviations of the actual values from this predicted regression line. Using the symbol e to represent the deviations of the dots from the line, the least-squares criterion is:
Σeᵢ² is minimum

where
eᵢ = Yᵢ - Ŷᵢ = the residual for observation i

residual
The difference between the actual value of the dependent variable and the value estimated by the regression equation.

The estimated regression line is Ŷ = α̂ + β̂X; allowing for error, each observation may be written as

Y = α̂ + β̂X + e

The symbols α̂ and β̂ are used when the equation is a regression estimate of the line. Thus, to compute the estimated values of α̂ and β̂, we use the following formulas:

β̂ = [n(ΣXY) - (ΣX)(ΣY)] / [n(ΣX²) - (ΣX)²]

and

α̂ = Ȳ - β̂X̄

where
β̂ = estimated slope of the line (the "regression coefficient")
α̂ = estimated intercept of the Y axis
Ȳ = mean of the Y values
X̄ = mean of the X values
n = number of observations
TABLE 22.4  Least-Squares Computation

Dealer    Y      Y²        X      X²        XY
1         77     5,929     86     7,396     6,622
2         79     6,241     93     8,649     7,347
3         80     6,400     95     9,025     7,600
4         83     6,889     104    10,816    8,632
5         101    10,201    139    19,321    14,039
6         117    13,689    180    32,400    21,060
7         129    16,641    165    27,225    21,285
8         120    14,400    147    21,609    17,640
9         97     9,409     119    14,161    11,543
10        106    11,236    132    17,424    13,992
11        99     9,801     126    15,876    12,474
12        121    14,641    156    24,336    18,876
13        103    10,609    129    16,641    13,287
14        86     7,396     96     9,216     8,256
15        99     9,801     108    11,664    10,692

ΣY = 1,497    ΣY² = 153,283    ΣX = 1,875    ΣX² = 245,759    ΣXY = 193,345
Ȳ = 99.8      X̄ = 125
These equations may be solved by simple arithmetic (see Table 22.4). To estimate the relationship between the distributor's sales to a dealer and the number of building permits, the following manipulations are performed:

β̂ = [n(ΣXY) - (ΣX)(ΣY)] / [n(ΣX²) - (ΣX)²]
  = [15(193,345) - (1,875)(1,497)] / [15(245,759) - (1,875)²]
  = (2,900,175 - 2,806,875) / (3,686,385 - 3,515,625)
  = 93,300 / 170,760
  = .54638

α̂ = Ȳ - β̂X̄
  = 99.8 - .54638(125)
  = 99.8 - 68.3
  = 31.5
The formula Ŷ = 31.5 + .546X is the regression equation used for the prediction of the dependent variable. Suppose the wholesaler considers a new dealership in an area where the number of building permits equals 89. Sales may be forecast in this area as:

Ŷ = 31.5 + .546(89)
  = 31.5 + 48.6
  = 80.1

To draw the regression line, predicted values of Ŷ are calculated for two dealers:

Dealer 7 (actual Y value = 129):
Ŷ = 31.5 + .546(165) = 121.6

Dealer 3 (actual Y value = 80):
Ŷ = 31.5 + .546(95) = 83.4

Once the two Ŷ values have been predicted, a straight line connecting the points can be drawn.
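The least-squares estimates and the forecast can be verified directly from the Table 22.3 data. The following is a Python sketch of our own (not part of the text), implementing the β̂ and α̂ formulas given above:

```python
# Sales volume (Y, in $000) and building permits (X) from Table 22.3
sales =   [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]
permits = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]

n = len(sales)
sum_x, sum_y = sum(permits), sum(sales)
sum_xy = sum(x * y for x, y in zip(permits, sales))
sum_x2 = sum(x * x for x in permits)

# Slope and intercept from the least-squares formulas
beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha = sum_y / n - beta * (sum_x / n)

print(round(beta, 5), round(alpha, 1))   # 0.54638 31.5
print(round(alpha + beta * 89, 1))       # forecast for 89 permits: 80.1
```

The fitted values reproduce the hand computation: Ŷ = 31.5 + .546X, with a forecast of about 80.1 (thousand dollars) for an area with 89 building permits.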
PART VI  Data Analysis and Presentation
EXHIBIT 22.6  Scatter Diagram of Explained and Unexplained Variation
[The fitted regression line plotted through the sales/building-permit data. For Dealer 8 (actual sales = 120) the diagram shows the deviation from the mean partitioned into the portion explained by the regression (Ŷᵢ - Ȳ) and the unexplained residual (Yᵢ - Ŷᵢ).]
The deviation explained by the regression is computed using Ŷᵢ - Ȳ rather than Yᵢ - Ȳ. The smaller number, 8.2, is the deviation not explained by the regression. Thus the total deviation can be partitioned into two parts:

(Yᵢ - Ȳ)  =  (Ŷᵢ - Ȳ)  +  (Yᵢ - Ŷᵢ)
Total deviation = Deviation explained by the regression + Deviation unexplained by the regression (residual error)
where
Yᵢ = actual value of the dependent variable
Ȳ = mean of the dependent variable
Ŷᵢ = value estimated by the regression equation

For Dealer 8 the total deviation is 120 - 99.8 = 20.2, the deviation explained by the regression is 111.8 - 99.8 = 12, and the deviation unexplained by the regression is 120 - 111.8 = 8.2. If these deviations are squared and summed over all values of Yᵢ (i.e., all observations), they provide an estimate of the variation of Y explained by the regression and unexplained by the regression:

Σ(Yᵢ - Ȳ)² = Σ(Ŷᵢ - Ȳ)² + Σ(Yᵢ - Ŷᵢ)²
Total variation = Explained variation + Unexplained variation (residual)
We have thus partitioned the total sum of squares, SST, into two parts: the regression (explained) sum of squares, SSR, and the error sum of squares, SSE:

SST = SSR + SSE
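The partition of the sum of squares can be computed directly from the dealer data. The Python below is our own illustrative sketch, not part of the text; it also derives the F-value and r² that appear later in the chapter's analysis of variance summary:

```python
# Sales (Y, $000) and building permits (X) from Table 22.3
sales =   [77, 79, 80, 83, 101, 117, 129, 120, 97, 106, 99, 121, 103, 86, 99]
permits = [86, 93, 95, 104, 139, 180, 165, 147, 119, 132, 126, 156, 129, 96, 108]

n = len(sales)
y_bar = sum(sales) / n
sum_x, sum_y = sum(permits), sum(sales)
sum_xy = sum(x * y for x, y in zip(permits, sales))
sum_x2 = sum(x * x for x in permits)

# Least-squares fit, as computed earlier in the chapter
beta = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
alpha = y_bar - beta * sum_x / n
predicted = [alpha + beta * x for x in permits]

sst = sum((y - y_bar) ** 2 for y in sales)                    # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in predicted)              # explained by regression
sse = sum((y - yh) ** 2 for y, yh in zip(sales, predicted))   # residual (error)

f_value = (ssr / 1) / (sse / (n - 2))   # d.f.: k - 1 = 1 and n - k = 13
r_squared = ssr / sst
print(round(ssr, 2), round(sse, 2))              # 3398.49 483.91
print(round(f_value, 1), round(r_squared, 3))    # 91.3 0.875
```

Note that SSR + SSE reproduces SST (3,882.40), confirming the partition above.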
The beta coefficients of some well-known companies, as calculated by Merrill Lynch, are shown in the table below. Most stocks have betas in the range of 0.75 to 1.50; the average for all stocks is 1.0 by definition.

Stock               Beta
Apple Computer      1.60
Union Pacific       1.43
Georgia-Pacific     1.36
Mattel              1.15
General Electric    1.09
Bristol Myers       1.00
General Motors      0.94
McDonald's          0.93
IBM                 0.70
Anheuser-Busch      0.58
F-test
A procedure used to
determine if there is more
variability in the scores of one
sample than in the scores of
another sample.
TABLE 22.5  Analysis of Variance Table for Bivariate Regression

Source of Variation        Degrees of Freedom    Sum of Squares         Mean Square (Variance)
Explained by regression    k - 1                 SSR = Σ(Ŷᵢ - Ȳ)²       SSR / (k - 1)
Unexplained (error)        n - k                 SSE = Σ(Yᵢ - Ŷᵢ)²      SSE / (n - k)

where k = number of estimated parameters (constants) and n = number of observations.
summary table
A table that presents the results of a regression calculation.

TABLE 22.6  Analysis of Variance Summary Table for Regression of Sales on Building Permits

Source of Variation        Sum of Squares    d.f.    Mean Square    F-Value
Explained by regression    3,398.49           1      3,398.49       91.30
Unexplained (error)          483.91          13         37.22
Total                      3,882.40          14
For the example on sales forecasting, the analysis of variance summary table, comparing relative magnitudes of the mean squares, is presented in Table 22.6. From Table 6 in the Appendix we find that the F-value of 91.3, with 1 degree of freedom in the numerator and 13 degrees of freedom in the denominator, indicates that the regression is statistically significant. The coefficient of determination shows the proportion of the total variation explained by the regression:

r² = SSR / SST = 3398.49 / 3882.40 = .875
SUMMARY

In many situations two variables are interrelated or associated. Many bivariate statistical techniques can be used to measure association. Researchers select the appropriate technique on the basis of each variable's scale of measurement.
The correlation coefficient (r), a statistical measure of association between two variables, is the measure of the relationship of one variable to another. The correlation coefficient indicates the strength of the association between two variables and the direction of that association. It must be remembered that correlation does not prove causation.
8. A football team's season ticket sales, the percentage of games won, and the number of active alumni are given below. Are these variables associated?
        Season         Percentage of    Number of
Year    Ticket Sales   Games Won        Active Alumni
1985     4,995         40               NA
1986     8,599         54               NA
1987     8,479         55               NA
1988     8,419         58               NA
1989    10,253         63               NA
1990    12,457         75               6,315
1991    13,285         36               6,860
1992    14,177         27               8,423
1993    15,730         63               9,000
9. Are the different forms of consumer installment credit in the table below highly correlated? Explain.

Credit Card Debt Outstanding (Millions of Dollars)

        Gas       Travel and       Bank       Retail      Total       Total
Year    Cards     Entertainment    Credit     Cards       Credit      Installment
                  Cards            Cards                  Cards       Credit
1       $  939    $ 61             $   828    $ 9,400     $11,229     $ 79,428
2        1,119      76               1,312     10,200      12,707       87,745
3        1,298     110               2,639     10,900      14,947       98,105
4        1,650     122               3,792     11,500      17,064      102,064
5        1,804     132               4,490     13,925      20,351      111,295
6        1,762     164               5,408     14,763      22,097      127,332
7        1,832     191               6,838     16,395      25,256      147,437
8        1,823     238               9,281     17,933      29,275      156,124
9        1,993     273               9,501     18,002      29,769      164,955
10       1,981     238              11,351     19,052      32,622      185,489
11       2,074     284              14,262     21,082      37,702      216,572
10.
11.