Regression Analysis
[Figure: scatterplots of Y against X; one panel marks a single observation (Xi, Yi) at its coordinates Xi and Yi.]
The correlation coefficient is based on the covariance.
For a sample, the covariance is calculated as:
$$ s_{xy} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{N - 1} $$
Interpretation: Covariance tells us how variation in one
variable “goes with” variation in another variable
(“covary”).
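The sample covariance can be sketched in a few lines of Python. The function name and the data below are made up for illustration; they do not come from the course data sets.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of cross-products of deviations, divided by N - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

# Made-up data: Y tends to rise with X, so the covariance is positive.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y))

# Reversing Y makes it tend to fall as X rises, so the sign flips.
print(sample_covariance(x, y[::-1]))
```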
When two variables are statistically independent,
their covariance = 0. The converse does not hold: a
covariance of 0 rules out only a linear relationship.
Positive relationships indicated by + value,
negative relationships by a – value.
Problem with Covariance as a measure of
association?
Correlation Coefficient (Pearson’s r)
A way of standardizing the covariance.
$$ r_{xy} = \frac{s_{xy}}{s_x s_y} $$
Interpretation: Measures the strength of a linear
relationship.
-1 ≤ r ≤ 1
X and Y are uncorrelated (no linear relationship) iff rxy = 0.
Independent variables have rxy = 0, but rxy = 0 does not by itself imply independence.
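A minimal sketch of Pearson's r, using made-up data and hypothetical function names. It illustrates two points: rescaling X changes the covariance but not r (one answer to the problem with covariance raised above), and r = 0 only rules out a linear relationship, since Y can depend entirely on X and still have r = 0.

```python
import math

def sample_cov(xs, ys):
    """Sample covariance with the N - 1 denominator."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def pearson_r(xs, ys):
    """Covariance divided by the product of the two sample standard deviations."""
    s_x = math.sqrt(sample_cov(xs, xs))  # the covariance of X with itself is its variance
    s_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)

# Made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Measuring X in different units (here multiplied by 100) inflates the covariance,
# but leaves r unchanged.
x_scaled = [100 * v for v in x]
print(sample_cov(x, y), sample_cov(x_scaled, y))
print(r, pearson_r(x_scaled, y))

# Y is completely determined by X (Y = X squared), yet r is 0
# because the relationship is not linear.
u = [-2, -1, 0, 1, 2]
v = [value ** 2 for value in u]
print(pearson_r(u, v))
```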
What explains variation in the generosity of
state welfare expenditures? (STATES – 55)
What explains variation in the generosity of
state welfare expenditures? (STATES – 1570)
738 – Poverty rate
739 – Median Family Income
1644 - %Clinton
1715 – Female State Legislators
Regression is concerned with dependence of one variable
(the dependent variable, measured at the interval/ratio
level) on one or more other variables (independent
variables, measured at the interval, ratio, ordinal or
nominal levels).
[Figure: plots of Y against X, with the coordinates Xi and Yi of one observation marked on the axes.]
Yi = a + bXi
a = Intercept, or Constant = The value
of Y when X = 0
Yi = a + bXi + ei
Regression analysis finds the line that
minimizes the sum of squared residuals
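With a single independent variable, the line that minimizes the sum of squared residuals has a standard closed form: b is the sum of cross-products of deviations divided by the sum of squared deviations of X, and a = mean(Y) - b * mean(X). The sketch below uses made-up data and hypothetical function names, and checks that nudging a or b away from the fitted values makes the sum of squared residuals larger.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    x_sq = sum((x - x_bar) ** 2 for x in xs)
    b = cross / x_sq
    a = y_bar - b * x_bar
    return a, b

def sum_sq_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between each Yi and the line a + bXi."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = sum_sq_residuals(a, b, x, y)
print(a, b, best)

# Any other line does worse on this criterion.
print(sum_sq_residuals(a + 0.1, b, x, y) > best)
print(sum_sq_residuals(a, b - 0.1, x, y) > best)
```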
Yi = a + bXi + ei
a = the expected value of Y when X=0
b = the expected change in Y given a one
unit increase in X
We can calculate a predicted value for the
dependent variable for any value of X by
using the regression equation for the
regression line:
$$ \hat{Y}_i = a + bX_i $$
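As a rough illustration, the predicted values and the residuals (ei, the observed Y minus the predicted Y) can be computed as follows. The data are made up, and the values of a and b are the least-squares estimates for this made-up data, not figures from the course.

```python
# Illustrative intercept and slope: the least-squares line for the made-up data below.
a = 2.2
b = 0.6

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def predict(x_value):
    """Predicted Y on the regression line for a given X."""
    return a + b * x_value

predicted = [predict(x_value) for x_value in x]

# Residual: how far each observed Y lies above (+) or below (-) the line.
residuals = [y_obs - y_hat for y_obs, y_hat in zip(y, predicted)]

for x_value, y_obs, y_hat, e in zip(x, y, predicted, residuals):
    print(x_value, y_obs, round(y_hat, 2), round(e, 2))

# For a least-squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
print(abs(sum(residuals)) < 1e-9)
```

The prediction works for any value of X, not only observed ones; for example, `predict(6)` gives the point on the line at X = 6.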
Y
intercept
Xi X
Y
Slope (b)
Xi Xj X
One
unit of
X
Y
Yi
ei
Yi
Xi X
Research Question: Did the butterfly
ballot result in an unusual number of
votes for Pat Buchanan in the 2000
election in Palm Beach Co.?
Did it cost Al Gore the election?
Unit of analysis – Fla. Counties (67)
Dependent variable (Y) – vote for Buchanan
in 2000
Independent variable (X) – vote for
Buchanan in 1996 Republican primary
[Figure: scatterplot of Florida counties, each point labeled with its county name; vertical-axis ticks at 0, 1000, 2000, 3000 and 4000. PALM BEACH is labeled at the top of the plot, between 3000 and 4000.]
Intercept (a)
Y = 12.957 + .101(X)
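A small sketch of how this equation can be used: plug a county's 1996 primary vote for Buchanan (X) into the line to get the predicted 2000 vote, then compare it with the actual 2000 vote. The intercept and slope are taken from the equation above; the input values below are hypothetical round numbers, not real county figures.

```python
INTERCEPT = 12.957  # a, from the fitted equation
SLOPE = 0.101       # b, from the fitted equation

def predicted_vote_2000(vote_1996):
    """Predicted 2000 Buchanan vote for a county, given its 1996 primary vote."""
    return INTERCEPT + SLOPE * vote_1996

def residual(actual_vote_2000, vote_1996):
    """Actual minus predicted 2000 vote; a large positive value flags an unusual county."""
    return actual_vote_2000 - predicted_vote_2000(vote_1996)

# Hypothetical county (not real data): 1,000 Buchanan votes in 1996, 150 in 2000.
hypothetical_1996 = 1000
hypothetical_2000 = 150

print(round(predicted_vote_2000(hypothetical_1996), 3))
print(round(residual(hypothetical_2000, hypothetical_1996), 3))
```

Each additional 1996 vote raises the predicted 2000 vote by 0.101, so the hypothetical county above is predicted to cast about 114 votes for Buchanan in 2000.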
[Figure: the scatterplot of Florida counties repeated, with vertical-axis ticks at 0, 1000, 2000 and 3000 and PALM BEACH labeled at the top; the value 2503.55 also appears in the figure.]