Lecture 9
Apr 16, 2010
Review
Scatterplot
Pattern, direction
Strength
Unusual observations?
Correlation coefficient r
--properties
--use when there is a linear association
Linear Regression
Sometimes, the value of Y may be thought of as
being dependent on the value of X
Interest is in describing how Y depends on X
Example: data for the heights of 1078 fathers
and sons.
X = height of father; Y = height of son. Y is
dependent on X in some way.
If the relationship between X and Y is linear
(scatter plot is football shaped), we can fit a
straight line to the data
The straight line that “best fits” the data is
called the regression line of Y on X:
y = a + b x (a = intercept, b = slope)
Example of Football-Shaped
Scatterplot
Fitting the Regression
Line
To find the slope and the intercept of the
regression line that best fits the data, use:
$$b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x}$$
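A minimal Python sketch of these two formulas. The function name `regression_line` and the toy data are hypothetical, chosen only for illustration; the SDs are assumed to be sample SDs (divisor n − 1).

```python
from statistics import mean, stdev

def regression_line(xs, ys):
    """Return (a, b) for y = a + b*x, using b = r*s_y/s_x and a = y_bar - b*x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs (divide by n - 1)
    # Correlation coefficient r, using the same n - 1 divisor as the SDs
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical toy data, not from the lecture
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = regression_line(xs, ys)
print(f"y = {a:.2f} + {b:.2f} x")
```

Because a = ȳ − b x̄, the fitted line always passes through the point (x̄, ȳ).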
This is the line that minimizes the sum of squared
vertical (y-direction) distances from the data points to
the line, so the regression line is also called the
least-squares line.
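The least-squares property can be checked numerically. The sketch below uses hypothetical toy data and a hypothetical helper `sse`; it compares the sum of squared vertical distances for the fitted line against a few nearby lines.

```python
def sse(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, not from the lecture
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
# Slope written as S_xy / S_xx, which is algebraically the same as r * s_y / s_x
b_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
a_hat = y_bar - b_hat * x_bar

best = sse(a_hat, b_hat, xs, ys)
# Nudge the intercept and/or slope; every nudged line should have a larger SSE
others = [sse(a_hat + da, b_hat + db, xs, ys)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.2, 0.0, 0.2)
          if (da, db) != (0.0, 0.0)]
print(f"least-squares SSE = {best:.2f}, smallest nearby SSE = {min(others):.2f}")
```

A handful of nudged lines is only a spot check, not a proof, but it shows what "best fit" means here.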
Example : Airline Passenger
Booking vs. Hotel Occupancy
Data on the airline passenger booking and hotel occupancy
rate near Orlando, Florida
X = thousands of passengers booked for airline flights by
Eastern Airlines to Orlando International Airport
Y = occupancy rate for Walt Disney World area hotels (in
%)
X 65.7 71.6 53.7 70.2 75.0 85.6 84.6 58.0 72.8 87.6 85.4 50.6
Y 40 41 48 49 73 74 68 51 63 75 70 38
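The summary statistics for this data set can be computed directly. A short sketch, assuming sample SDs (divisor n − 1); the variable names are illustrative only.

```python
from statistics import mean, stdev

# Data from the table above
booking = [65.7, 71.6, 53.7, 70.2, 75.0, 85.6, 84.6, 58.0, 72.8, 87.6, 85.4, 50.6]  # X, thousands
occupancy = [40, 41, 48, 49, 73, 74, 68, 51, 63, 75, 70, 38]                        # Y, percent

n = len(booking)
x_bar, y_bar = mean(booking), mean(occupancy)
s_x, s_y = stdev(booking), stdev(occupancy)  # sample SDs (divide by n - 1)
# Correlation coefficient r with the matching n - 1 divisor
r = sum((x - x_bar) * (y - y_bar)
        for x, y in zip(booking, occupancy)) / ((n - 1) * s_x * s_y)

print(f"x_bar = {x_bar:.2f}, s_x = {s_x:.2f}")
print(f"y_bar = {y_bar:.2f}, s_y = {s_y:.2f}")
print(f"r = {r:.3f}")
```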
[Figure: scatterplot of occupancy (%) vs. booking (thousands of people)]
Example (cont.)
$\bar{x} = 71.73$, $s_x = 12.82$
$\bar{y} = 57.5$, $s_y = 14.39$
$r = .819$
[Figure: scatterplot of occupancy (%) vs. booking (thousands of people), with the point $(\bar{x}, \bar{y}) = (71.73, 57.5)$ marked]
$$b = r\,\frac{s_y}{s_x} = (.819)\,\frac{14.39}{12.82} = .9198$$
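The section ends at the slope. Assuming the example continues with the intercept formula given earlier, a = ȳ − b x̄, the remaining step would look like this (values rounded to two decimals):

$$a = \bar{y} - b\,\bar{x} = 57.5 - (.9198)(71.73) \approx -8.48$$

so the fitted line would be approximately

$$\hat{y} = -8.48 + .9198\,x$$

where $\hat{y}$ is the predicted occupancy rate (%) and $x$ is bookings in thousands. The notation $\hat{y}$ is an assumption; the slides write the line simply as y = a + b x.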