Submitted by
Mak Kaboudan & Avijit Sarkar
School of Business, University of Redlands, Redlands, CA 92399, USA
number of bathrooms, age of the house, lot square footage, etc. Ball (1973) provides a
review of the early literature. Recent use of geographic information systems (GIS) helped
add spatial attributes such as distance to schools, distance to parks, distance to city
business center, neighborhood ethnic mix, neighborhood median family income, etc.
when modeling home prices. Can (1998) provides a spatial analytical framework to use
when accounting for neighborhood effects. Including spatial variables along with housing
attributes in specifications of hedonic models clearly adds a new dimension of
complication to the statistical estimation of home price models. Harveston and
Pollakowski (1981) addressed concerns about the functional form to use in their
estimation. The complication is mainly due to spatial autocorrelation. Like temporal
autocorrelation, spatial autocorrelation reduces the efficacy of forecasts obtained when
using standard statistical modeling techniques. Prior work that addressed spatial
dependence either considered geographical coordinates as explanatory variables in the price
model (Clapp, 2003) or modeled the regression residuals spatially (Basu and Thibodeau,
1998). Most existing models, whether strictly hedonic or addressing spatial dependence,
focus on parsimony of estimated equations and produce out-of-sample predictions of prices
for homes sold during the same time period. The sample used to forecast and measure model
efficacy typically consists of a withheld percentage of the data available to conduct the
research. This means that the time dimension is absent in those models and, as a result, the
forecasts are less useful when making future decisions since prices of homes change over
time.
The method of modeling residential home prices proposed in this paper aims to move a
step closer to producing parsimonious home price models that take into consideration
median price may be used instead of average price, the median price may fail to represent
homes in its neighborhood accurately if that median-priced house happens to be atypical.
Modeling the average neighborhood price may help smooth out the effects
of unusual homes. To estimate the average neighborhood price model using panel data,
existing methodology suggests use of the generalized least squares (GLS) method (Pindyck
and Rubinfeld, 1998).
defined using the county assessor's parcel numbers (APN). An APN is a nine-digit ID of
land parcels assigned by the county when a parcel of land is subdivided (at least in
several western US states). Contiguous subdivisions are assigned consecutive numbers.
For example, if a 25-acre parcel of land gets subdivided into 50 potential
home sites, the 50 new lots get new parcel numbers that relate to the original 25-acre lot
number. To elaborate, assume that before the subdivision, the 25-acre parcel was assigned
the APN 0300-100-00 at some time in the past. After subdivision, the 50 new lots are
assigned new sequential numbers that would be something like 0300-100-10, 0300-100-20,
etc., which clearly relate to the original. If this is the case, using the first four
digits of the APNs of properties in a city (like 0300, 0301, etc.) provides a definition of
neighborhoods that contain a fairly large number of contiguous houses. The number of digits
to use depends on the size of the city. The objective when selecting that number is
for the number of homes per neighborhood to satisfy a minimum level imposed by
statistical theoretical constraints. (Results provided later in this paper suggest choosing
that number such that neighborhoods contain 10 to 30 homes.)
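The prefix rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `apn_neighborhood` and `group_by_neighborhood` are hypothetical, the parcel numbers are made up in the style of the example, and dashes are assumed to be pure formatting.

```python
from collections import defaultdict

def apn_neighborhood(apn: str, n_digits: int = 4) -> str:
    """Return the leading n_digits of an APN, ignoring dashes (assumed formatting only)."""
    digits = "".join(ch for ch in apn if ch.isdigit())
    return digits[:n_digits]

def group_by_neighborhood(apns, n_digits: int = 4):
    """Map each APN prefix to the list of parcels that share it."""
    groups = defaultdict(list)
    for apn in apns:
        groups[apn_neighborhood(apn, n_digits)].append(apn)
    return dict(groups)

# Hypothetical parcel numbers, not taken from the data set.
parcels = ["0300-100-10", "0300-100-20", "0301-050-10"]
neighborhoods = group_by_neighborhood(parcels)
sizes = {prefix: len(members) for prefix, members in neighborhoods.items()}
```

Varying `n_digits` and inspecting `sizes` is one way to check whether a chosen prefix length yields neighborhoods of the intended size.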
Average neighborhood price models are different from standard hedonic price models and
from models of housing submarkets. Average price models utilize a much smaller number
of observations that hopefully smooth out the effects of unusual house attributes on
estimated coefficients. Standard hedonic price models utilize a huge number of individual
property observations regardless of the effects of unusual house attributes. Hedonic
models of submarkets estimate a different price equation for each market segment, still
using individual home prices. Each equation thus has a lower number of observations
than standard hedonic models but requires estimating a larger number of equations, one
for each submarket.
None of the three neighborhood definitions has previously been used to obtain an average
home price. It is not possible without a GIS framework to identify the
neighborhoods clearly. This idea of modeling home prices assumes that each
neighborhood has an imaginary average house that sells at a price that is determined by
an average square footage, an average number of bedrooms, etc. Besides reducing
spatial correlation problems, this averaging process provides a consistent definition of a
neighborhood that can be easily reproduced. Variations in the average price may then be
explained by the average home attributes, average spatial attributes, and temporal
changes in mortgage rates and average median income. This paper explores the idea of
modeling average neighborhood prices by applying it to four cities in Southern
California. Section 2 contains a description of the neighborhood resolutions for which the
price equations are estimated. Sections 3 and 4 introduce the data and methodology,
respectively. Comparisons between standard hedonic model results of individual property
prices and results from models of average neighborhood prices are in Section 5. Section 6
has the conclusion.
2. Neighborhood Resolutions
Appropriate neighborhood resolutions are defined using a GIS framework that clusters
houses possibly possessing similar attributes. The framework applies three spatial
resolutions: census tract (CT), assessor's parcel number (PN), and ZIP code (ZIP). CT
follows U.S. Census Bureau assigned numbers. All houses sold during a given quarter
within a given census tract belong to a neighborhood. Only the leftmost four digits
of a PN define a neighborhood in the cities studied. The ZIP+1 code (a subset of
ZIP+4) is the third resolution. ZIP+1 is used because five-digit ZIP codes
produced only two neighborhoods for some cities. Addresses of homes sold over the
study period (2000-2005) in four cities, each in a different county in Southern California,
were geocoded in ArcGIS 9.1. Neighborhood polygons were delineated according to each
of the three resolutions. CT, PN, and ZIP+1 neighborhoods in Burbank (Los Angeles
County) are in Figure 1(a), (b), and (c), respectively. CT, PN, and ZIP+1 neighborhoods in
Carlsbad (San Diego County) are in Figure 2(a), (b), and (c), while those of Redlands
(San Bernardino County) and Riverside (Riverside County) are in Figures 3 and 4,
respectively. Given that CT was developed to satisfy city-management administrative
objectives and that ZIP+1 was designed to maximize the efficiency
of postal service, PN is expected to work best.
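Once each sale carries a neighborhood label and a quarter, the averaging step amounts to a group-by. The sketch below is a minimal illustration, not the authors' procedure: the record layout (`neighborhood`, `quarter`, `price`, `sqft`) and the sample sales are hypothetical, and only two attributes are averaged.

```python
from collections import defaultdict
from statistics import mean

def average_by_neighborhood_quarter(sales):
    """Collapse individual sales into one 'average house' per (neighborhood, quarter)."""
    cells = defaultdict(list)
    for sale in sales:
        cells[(sale["neighborhood"], sale["quarter"])].append(sale)
    panel = {}
    for key, rows in cells.items():
        panel[key] = {
            "n_sales": len(rows),
            "avg_price": mean(r["price"] for r in rows),
            "avg_sqft": mean(r["sqft"] for r in rows),
        }
    return panel

# Hypothetical sales records for illustration only.
sales = [
    {"neighborhood": "0300", "quarter": "2004Q1", "price": 400000, "sqft": 1500},
    {"neighborhood": "0300", "quarter": "2004Q1", "price": 500000, "sqft": 2100},
    {"neighborhood": "0301", "quarter": "2004Q1", "price": 350000, "sqft": 1300},
]
panel = average_by_neighborhood_quarter(sales)
```

Each key of `panel` is one observation of the kind used in an average price equation, which is why the neighborhood models have far fewer observations than the individual price models.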
(a) CT Neighborhoods
(b) PN Neighborhoods
(c) ZIP+1 neighborhoods
Figure 1. Resolutions of neighborhoods in Burbank of Los Angeles County
(a) CT Neighborhoods
(b) PN Neighborhoods
(c) ZIP+1 neighborhoods
Figure 2. Resolutions of neighborhoods in Carlsbad of San Diego County
(a) CT Neighborhoods
(b) PN Neighborhoods
(c) ZIP+1 neighborhoods
Figure 3. Resolutions of neighborhoods in Redlands of San Bernardino County
(a) CT Neighborhoods
(b) PN Neighborhoods
(c) ZIP+1 Neighborhoods
Figure 4. Resolutions of neighborhoods in Riverside of Riverside County
3. The Data
A detailed data set containing individual sales and attributes of homes sold in the four
selected counties in Southern California was obtained from DataQuick (2005). Not all
cities in the four counties had consistent data and some had incomplete data. Complete
data with consistent variables were identified for four cities: Burbank (BB), Carlsbad
(CB), Redlands (RD), and Riverside (RS). Six years (2000-2005) of available data for the
four cities were selected. Only six years are used because they cover a period of time with
approximately consistent lending rules. It is a period when banks facilitated borrowing
with new lending conditions such as interest-only payments and other lending rules that
led to historically low down payments and low monthly mortgage payments.
The period (2000-2005) is thus selected to minimize structural changes in lending rules
that may render model estimation results inconsistent. Data of the first five years
(2000-2004 inclusive) were used to fit different price models for each city. Data for 2005
were then used to test the efficacy of the one-year-ahead forecasts the models deliver.
Successful models then predict the unknown prices for 2006.
4. Methodology
Similar to standard hedonic individual price models, multiple regression methods apply
when estimating average price models. The average neighborhood price is the dependent
variable and the vector of attributes provides the set of independent variables. Because
the data of average neighborhood prices and attributes are a combination of cross-sectional
and time series observations (panel data), standard OLS multiple regression is not
suitable, as mentioned earlier. The method to use for panel data is the random-effects (or
available by household, AHI was approximated using the prices of homes and standard
lending rules. Standard lending rules mandate a minimum down payment of 20% of the
amount needed to purchase a house. Mortgage payments are typically around 30% of a
home-buying household's annual income. Using these rules and the average price of
homes four quarters before a current quarter (the time needed to actually complete a
purchase from the time a decision is made to buy a house), income for a current quarter
was approximated. The loan amount lagged one year was computed as
$LA_t = 0.8 \times \text{home price}_{t-4}$. LA was then used to approximate the average
monthly mortgage payment (PMT), where

$$PMT_t = LA_t \cdot \frac{(MR_{t-4}/12)\,(1 + MR_{t-4}/12)^{k}}{(1 + MR_{t-4}/12)^{k} - 1} \tag{1}$$

where k = loan duration (360 months for a 30-year fixed loan). Since the approximate
annual payments ($AP_t = 12 \times PMT_t$) are 30% of a household's annual income ($I_t$),

$$I_t = AP_t / 0.3 \tag{2}$$

and

$$AHI_t = \frac{1}{n} \sum_{i=1}^{n} I_t \tag{3}$$
AHI thus approximates annual household income for a neighborhood i such that prices of
houses sold a year ago determine the level of income needed to purchase a house in the
current quarter. Given that AHI is different for each neighborhood, the variable plays two
roles. In addition to being a measure of income needed to purchase a house in each
neighborhood based on the average prices of homes four quarters earlier, it is also a
spatial variable that distinguishes between neighborhoods for other reasons.
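The income approximation can be traced numerically with a short Python sketch. It follows the lending rules stated above (20% down payment, payments equal to 30% of income, a 360-month loan); the function names are hypothetical, the mortgage rate is assumed to be an annual rate expressed as a decimal fraction, and the averaging step across a neighborhood is not sketched.

```python
def monthly_payment(loan: float, annual_rate: float, k: int = 360) -> float:
    """Standard annuity payment on a fully amortizing loan of k months.

    annual_rate is assumed to be a decimal fraction (0.06 for 6%).
    """
    r = annual_rate / 12.0
    growth = (1.0 + r) ** k
    return loan * r * growth / (growth - 1.0)

def required_income(price_lagged: float, annual_rate_lagged: float,
                    down: float = 0.20, share: float = 0.30, k: int = 360) -> float:
    """Annual income implied by a home price and mortgage rate observed four quarters earlier."""
    loan = price_lagged * (1.0 - down)                # loan amount after the down payment
    annual_payment = 12.0 * monthly_payment(loan, annual_rate_lagged, k)
    return annual_payment / share                     # payments are 30% of income

# Hypothetical inputs: a 300,000 average price and a 6% rate four quarters earlier.
income = required_income(300_000, 0.06)
```

With these illustrative inputs the loan is 240,000, the monthly payment is roughly 1,439, and the implied annual income is roughly 57,557.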
5. Comparison of Results
A comparison between results of estimating average prices and individual prices is
presented here to show whether averaging prices does help produce better estimates of
the equations and/or better forecasts. First, GLS hedonic models of individual home price
levels are obtained for each city. Average neighborhood price models follow.
Comparisons are first made between estimated models, then between the forecasts the
different models deliver. Comparison of the estimated models is based upon the
coefficient of determination (R2) and the mean absolute percent error (MAPE) of fitted
values. Comparisons between the forecasts are based upon prediction MAPE (PMAPE)
and Theil's U-statistic. Theil's U is defined as
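$$U = \frac{\sqrt{\frac{1}{F}\sum_{f}\left(Y_f - \hat{Y}_f\right)^2}}{\sqrt{\frac{1}{F}\sum_{f}\hat{Y}_f^{2}} + \sqrt{\frac{1}{F}\sum_{f}Y_f^{2}}} \tag{4}$$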
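The two forecast measures named above can be computed as in the following Python sketch. It assumes the bounded form of Theil's U (root mean squared error divided by the sum of the root mean squares of the predicted and actual series) and the usual definition of MAPE; the function names and sample values are illustrative only.

```python
import math

def mape(actual, predicted):
    """Mean absolute percent error, in percent."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

def theil_u(actual, predicted):
    """Theil's U in its bounded form: 0 for a perfect forecast, at most 1."""
    F = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / F)
    rms_pred = math.sqrt(sum(p ** 2 for p in predicted) / F)
    rms_act = math.sqrt(sum(a ** 2 for a in actual) / F)
    return rmse / (rms_pred + rms_act)

# Hypothetical actual and predicted prices (in thousands).
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
pmape = mape(actual, predicted)
u = theil_u(actual, predicted)
```

Unlike a mean squared error, both measures are scale-free, which is what allows comparison across cities with different price levels.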
The estimated GLS equations using individual and using average home prices in each of
the defined neighborhoods for each city follow. The p-values are reported in parentheses
below the estimated coefficients.
Burbank - BB:
Using individual home prices for the entire city:
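$$
\begin{aligned}
RP_t ={}& \underset{(0.00)}{141.181} + \underset{(0.00)}{76.455}\,SSF_t + \underset{(0.005)}{0.259}\,CA_t + \underset{(0.009)}{11.45}\,NG_t + \underset{(0.003)}{3.191}\,LSF_t - \underset{(0.018)}{7.271}\,DVA_t + \underset{(0.00)}{1.834}\,AHI_t \\
& - \underset{(0.00)}{10.944}\,MR_{t-2} - \underset{(0.00)}{10.918}\,MR_{t-3} - \underset{(0.00)}{8.475}\,MR_{t-6} + \underset{(0.00)}{1.378}\,SCD_t - \underset{(0.00)}{1.168}\,PA_t
\end{aligned}
\tag{5}
$$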
(6)
(7)
(8)
Carlsbad - CB:
Using individual home prices for the entire city:
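$$
\begin{aligned}
RP_t ={}& \underset{(0.00)}{151.182} + \underset{(0.00)}{97.512}\,SSF_t + \underset{(0.001)}{1.051}\,CA_t + \underset{(0.00)}{13.47}\,NG_t + \underset{(0.00)}{1.987}\,AHI_t \\
& - \underset{(0.00)}{12.05}\,MR_{t-2} - \underset{(0.022)}{5.886}\,MR_{t-3} - \underset{(0.00)}{15.306}\,MR_{t-6} + \underset{(0.00)}{2.141}\,SCD_t - \underset{(0.00)}{21.293}\,PA_t
\end{aligned}
\tag{9}
$$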
(10)
(11)
(12)
Redlands - RD:
Using individual home prices for the entire city:
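$$
\begin{aligned}
RP_t ={}& \underset{(0.00)}{82.425} + \underset{(0.00)}{44.634}\,SSF_t + \underset{(0.034)}{0.079}\,CA_t + \underset{(0.01)}{5.28}\,NG_t + \underset{(0.00)}{0.457}\,LSF_t - \underset{(0.054)}{6.275}\,DVA_t + \underset{(0.00)}{2.165}\,AHI_t \\
& - \underset{(0.00)}{6.3234}\,MR_{t-2} - \underset{(0.00)}{5.809}\,MR_{t-3} - \underset{(0.00)}{6.3095}\,MR_{t-6} - \underset{(0.00)}{1.139}\,PA_t
\end{aligned}
\tag{13}
$$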
(14)
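PN:
$$AP_t = \underset{(0.00)}{67.1} + \underset{(0.00)}{46.07}\,SSF_t + \underset{(0.00)}{2.56}\,AHI_t - \underset{(0.00)}{8.31}\,MR_{t-2} - \underset{(0.03)}{3.68}\,MR_{t-3} - \underset{(0.03)}{3.82}\,MR_{t-6} \tag{15}$$
ZIP:
$$AP_t = \underset{(0.00)}{59}\,SSF_t + \underset{(0.00)}{0.54}\,CA_t + \underset{(0.00)}{2.15}\,AHI_t - \underset{(0.00)}{11.12}\,MR_{t-2} + \underset{(0.03)}{0.26}\,PH_t \tag{16}$$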
Riverside - RS:
Using individual home prices for the entire city:
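$$
\begin{aligned}
RP_t ={}& \underset{(0.00)}{86.276} + \underset{(0.00)}{33.374}\,SSF_t + \underset{(0.005)}{0.266}\,CA_t + \underset{(0.009)}{1.582}\,NG_t + \underset{(0.00)}{0.55}\,LSF_t - \underset{(0.048)}{6.19}\,DVA_t + \underset{(0.00)}{2.988}\,AHI_t \\
& - \underset{(0.00)}{4.424}\,MR_{t-2} - \underset{(0.00)}{7.578}\,MR_{t-3} - \underset{(0.00)}{5.079}\,MR_{t-6} - \underset{(0.00)}{0.494}\,SCD_t + \underset{(0.00)}{1.776}\,PA_t
\end{aligned}
\tag{17}
$$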
(18)
(19)
(20)
In equations (15)-(20), SSF = average structure square footage; BD = average number
of bedrooms; CA = average construction age; LSF = average lot square footage; AHI =
average minimum household income needed to purchase a house; MR = mortgage rate;
PA = percent of African American population in a neighborhood; PH = percent of
Hispanic population in a neighborhood; SCD = average distance to nearest school in the
neighborhood. All estimated coefficients have signs consistent with logical expectations
and are significantly different from zero at the 5% level of significance (except for one,
BD in (10), which is significant at the 6% level). Estimation (over 2000-2004) and forecast
(for 2005) statistics of the above equations are in Table 1.
Table 1
Estimation and forecast comparative statistics

                            Estimation Statistics           Forecast Statistics
City  Model                 Obs.   R2     MAPE    DW        Obs.   U      PMAPE
BB    Individual prices     3189   0.78   15.73   1.76      794    0.09   13.94
      CT neighborhoods       260   0.86    8.96   1.57       76    0.05    7.76
      PN neighborhoods       146   0.91    8.35   1.63       42    0.06    8.53
      ZIP neighborhoods      221   0.85   10.19   1.62       61    0.07   11.47
CB    Individual prices     1230   0.81   19.10   1.62      215    0.10   17.38
      CT neighborhoods       117   0.91   17.35   1.86       25    0.10   14.95
      PN neighborhoods       276   0.88   13.19   1.77       60    0.06   10.09
      ZIP neighborhoods      264   0.89   14.43   1.75       57    0.09   13.53
RD    Individual prices     3393   0.78   17.83   1.70      788    0.12   15.56
      CT neighborhoods       155   0.89   11.75   1.74       40    0.09   12.53
      PN neighborhoods       242   0.91   11.34   1.86       63    0.08   11.17
      ZIP neighborhoods      216   0.91    9.00   1.50       60    0.09   10.78
RS    Individual prices     3571   0.71   14.65   1.70      901    0.11   17.45
      CT neighborhoods       178   0.86    7.61   1.77       38    0.09   10.65
      PN neighborhoods       260   0.87    6.95   1.54       52    0.07   11.47
      ZIP neighborhoods      344   0.83    8.01   1.60       77    0.06   10.37
The results in Table 1 on estimation statistics provide a comparison between the number
of observations used to obtain each equation (Obs.), the R2, MAPE, and the Durbin-Watson
(DW) statistic. The coefficients of determination (R2) obtained using individual price data
are all lower than those of the average neighborhood price equations. The MAPE statistics
also confirm that the average price equations may have the advantage. The DW statistics
are persistently below 2.0, suggesting that slight positive autocorrelation
persists. Forecast statistics provide comparisons between the number of observations,
the U-statistic, and the prediction MAPE (PMAPE). For all four cities, the average
neighborhood price equations show forecast statistics suggesting improvements over the
individual home price equations. Generally, the 2005 predictive power of the PN
resolution models is better than that of the other two.
To test which equation produces the better 2005 forecasts of individual home prices for
each city's neighborhood resolution, we test the null hypothesis that predictions of prices
using the average price equation are at least as accurate as predictions of the same prices
using the individual price equation. It is assumed here that the average price equations can
be equally useful in predicting individual home prices. Using the PMAPE statistic, the test
can be rewritten as:

$$H_0:\; PMAPE_1 - PMAPE_2 \le 0$$
where PMAPE1 = PMAPE obtained when predicting 2005 individual prices using an
average price equation and PMAPE2 = PMAPE obtained when predicting the same prices
using an individual price equation. The test statistic to use is:
$$z = \frac{PMAPE_1 - PMAPE_2}{\sqrt{\dfrac{s_1^2}{F} + \dfrac{s_2^2}{F}}} \tag{21}$$
Table 2

City  Resolution  PMAPE1   PMAPE2   z-score   p-value (1-tailed)
BB    PN          14.14    13.94    -0.10     0.46
      CT          22.76    13.94     4.59     0.00
      ZIP+1       22.99    13.94     4.71     0.00
CB    PN          20.22    17.38     0.65     0.26
      CT          17.51    17.38     0.03     0.49
      ZIP+1       18.09    17.38     0.17     0.43
RD    PN          16.29    15.56     0.57     0.29
      CT          20.47    15.56     3.92     0.00
      ZIP+1       20.37    15.56     3.71     0.00
RS    PN          19.87    17.34     1.84     0.03
      CT          19.68    17.34     1.67     0.05
      ZIP+1       18.78    17.34     1.02     0.15
The test results are in Table 2. Although PMAPE1 > PMAPE2 in all
situations, the null is not rejected at the 5% level of significance for PN in three of the
four cities. This means that it is possible to obtain predictions of individual home prices
using the average price equations that are not significantly different from those obtained
using the individual price equations.
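The mechanics of this comparison can be sketched in Python as follows. The sketch assumes that s1 and s2 are the standard deviations of the absolute percent errors behind each PMAPE and that both are computed over the same F forecasts; the definitions of these symbols are not restated here, so that reading is an assumption, as are the function names and the sample inputs.

```python
import math

def pmape_z(pmape1: float, pmape2: float, s1: float, s2: float, F: int) -> float:
    """z statistic for the difference between two PMAPE values over F forecasts."""
    return (pmape1 - pmape2) / math.sqrt(s1 ** 2 / F + s2 ** 2 / F)

def upper_tail_p(z: float) -> float:
    """One-tailed p-value P(Z > z) under the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical inputs chosen so the arithmetic is easy to follow.
z = pmape_z(12.0, 10.0, s1=3.0, s2=4.0, F=25)
p = upper_tail_p(z)
```

A z-score of 1.84 maps to an upper-tail p-value of about 0.03 under this calculation, so a difference of that size is rejected at the 5% level while smaller z-scores are not.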
The better neighborhood models may now be used to determine the future of housing
prices. They are used to predict average neighborhood prices in 2006 assuming that
houses sold in 2005 were resold in 2006. Predicting 2006 prices is possible without
having to predict any of the explanatory variables because the income variable is lagged
one year and mortgage rates are easily adjusted to account for increases that occurred in
the first half of 2006. The 2005 and 2006 forecasts were then used to compute expected
price changes between the two years. Year-over-year expected quarterly changes in
average neighborhood price levels using the PN equations are reported in Table 3. PN is
selected since it was best according to the statistics in Table 2. The results in Table 3
suggest that home prices in 2006 are expected to rise only in BB and decrease otherwise.
Figures 5(a)-(d) compare actual real average neighborhood prices with the ex post
forecasts for 2005. Figures 6(a)-(d) compare average neighborhood ex post price
forecasts for 2005 with ex ante price forecasts for 2006.
Table 3
2006 over 2005 quarterly expected % price changes

        BB      CB      RD      RS
Q1     9.20   -4.22    4.78    3.06
Q2     0.89   -3.08   -1.86   -1.40
Q3     0.08   -6.47   -1.96   -5.80
Q4     2.22   -6.75   -2.38   -5.76
Figure 6. Ex post predicted 2005 versus ex ante 2006 real average neighborhood prices.
Ideally, forecast statistics should be compared with those reported in the literature.
However, the reported results found do not use consistent dependent variables or report the
same statistics. MSE values cannot be compared since they depend on the relative prices of
homes in the different areas and time periods analyzed. Only MAPE can be compared. A
comparison of the statistics found is in Table 4. There is one main difference between the
results in Table 4 and the results in Table 2. The results in Table 2 belong to forecasts of
future prices. Those in Table 4 are predictions of prices of homes sold in the same period
as the data used in model estimation.
Table 4. Comparison with literature forecast statistics
Sample MAPE
Gençay and Yang (1996)
50
12.3
100
525
200
16.7
19.8
14.8
6. Concluding Remarks
This paper proposed a novel specification strategy to forecast residential home prices.
Rather than estimating a model to forecast prices directly, an equation to estimate an
average neighborhood price is adopted instead. The proposed method implicitly
generalizes subtleties about neighborhood attributes. Neighborhoods were defined
according to census tract (CT), the county assessor's parcel number (PN), and the ZIP+1
code (ZIP+1). CT and ZIP+1 codes are well-established resolution definitions. The PN
aggregation is justifiable since parcel numbers are typically assigned sequentially to
contiguous parcels of land before the construction of a house. Therefore, PN neighborhoods
are assumed to include similar houses with similar spatial
attributes. Average price equations were specified as a function of the average house
Acknowledgement
This work would not have been possible without the support of the University of
Redlands, School of Business. Grants were generously provided to purchase the data
from DataQuick.
References
Ball M (1973) Recent empirical work on the determinants of relative house prices. Urban
Studies 10: 213-233
Basu A, Thibodeau TG (1998) Analysis of spatial autocorrelation in house prices. Journal
of Real Estate Finance and Economics 17: 61-85
Bin O (2004) A prediction comparison of housing sales prices by parametric versus nonparametric regressions. Journal of Housing Economics 13: 68-84
Bourassa SC, Hoesli M, Peng VS (2003) Do housing submarkets really matter? Journal
of Housing Economics 12: 12-18
Can A (1998) GIS and spatial analysis of housing and mortgage markets. Journal of
Housing Research 9: 61-86
Clapp J, Giaccotto C (2002) Evaluating house price forecasts. Journal of Real Estate
Research 24: 1-26
Clapp J, Kim H, Gelfand A (2002) Predicting spatial patterns of house prices using
LPR and Bayesian smoothing. Real Estate Economics 30: 505-532
Clapp JM (2003) A semiparametric method for valuing residential locations: Application
to automated valuation. Journal of Real Estate Finance and Economics 27: 303-320
Clapp J (2004) A semiparametric method for estimating local house price indices. Real
Estate Economics 32: 127-160
Cukierman A (1979) The relationship between relative prices and the general price level:
a suggested interpretation. The American Economic Review 69: 444-447
DataQuick
Shiller R (1993) Measuring asset value for cash settlement in derivative markets: hedonic
repeated measures indices and perpetual futures. Journal of Finance 48: 911-931
White H (1980) Heteroskedasticity-consistent covariance matrix estimator and a direct
test for heteroskedasticity. Econometrica 48: 817-838
Zhou Z (1997) Forecasting sales and price for existing single-family homes: a VAR
model with error correction. Journal of Real Estate Research 14:155-168