
Cross-Validation

We earlier evaluated the adequacy of various sample statistics (mean, standard deviation, median, etc.) through comparison of the sample and population values in the Walker Lake data. In general, however, we do not know the population distribution and must validate the model strictly from the sample data values.
Cross-validation is a method of evaluating the adequacy of a spatial correlation model using
only data from the sample. It can also be used for evaluating the choice of lag and angle
tolerances in estimating variograms, choosing a radius of influence in estimation methods,
etc. Cross-validation is especially useful for pointing out which specific areas of a region are
difficult to estimate from the observed data.
The (Leave-One-Out) Cross-Validation Procedure
1. For location s_i, temporarily omit the observation v_i from the data set.
2. Estimate v_i from the remaining points, using whatever estimation method or model is being evaluated; call this estimate v̂_i.
3. Compare v̂_i to v_i. How?

4. Repeat steps 1-3 for all i = 1, ..., n data points in the sample.


5. Compute summary statistics and graphs of the cross-validation error distribution. This
will be discussed in what follows.
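A minimal sketch of this loop, using synthetic data and an inverse-distance-weighted estimator as a stand-in for whatever method is under evaluation (all names and numbers here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(50, 2))   # sample locations s_i (synthetic)
v = rng.uniform(0, 1000, size=50)            # sample values v_i (synthetic)

def idw_predict(target, coords, values, p=2):
    """Inverse-distance-weighted estimate at `target` from (coords, values)."""
    d = np.sqrt(((coords - target) ** 2).sum(axis=1))
    w = 1.0 / d ** p
    return (w * values).sum() / w.sum()

# Steps 1-4: omit each point in turn and predict it from the remaining n-1.
n = len(v)
v_hat = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    v_hat[i] = idw_predict(coords[i], coords[keep], v[keep])

errors = v_hat - v   # step 5 summarizes these cross-validation errors
```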
Summary Statistics
1. PRESS Statistic: The prediction sum of squares (PRESS) statistic is given by:

   PRESS = \frac{1}{n}\sum_{i=1}^{n}\left(v_i - \hat{v}_i\right)^2,

where v̂_i indicates the prediction of V_i from all data values except v_i.

This quantity should be small if the model fits the data well (i.e.: there shouldn't be much of a difference in what is predicted for V_i whether or not we use the value v_i). The PRESS statistic is most commonly used for model selection/validation in a regression context.

Where have we seen this before? If we view the differences between the sample values and the cross-validated predicted values as residuals, then the PRESS statistic is nothing more than the mean squared error (MSE) due to prediction. The difference here is that each prediction is made without the use of the point being predicted, and hence is independent of the value sampled at that location.
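As a toy illustration (the numbers below are made up, not Walker Lake values), PRESS is just the mean of the squared leave-one-out errors:

```python
import numpy as np

# v: sample values; v_hat: leave-one-out predictions from the method under study.
v = np.array([10.0, 12.0, 9.0, 15.0, 11.0])
v_hat = np.array([11.0, 11.5, 10.0, 13.0, 11.2])

press = np.mean((v - v_hat) ** 2)   # (1/n) * sum_i (v_i - vhat_i)^2
print(press)  # 1.258
```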


2. PRESS Statistic Based on Absolute Differences: The prediction sum of absolute deviations (PRESAD) statistic could also be considered and is given by:

   \frac{1}{n}\sum_{i=1}^{n}\left|v_i - \hat{v}_i\right|.

Again, this quantity should be small if the model fits the data well.
Where have we seen this before? If we again view the differences between the sample values and the cross-validated predicted values as residuals, then this statistic is nothing more than the mean absolute error (MAE) due to prediction.
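With toy values (again illustrative, not the Walker Lake data), the absolute-deviation statistic is:

```python
import numpy as np

v = np.array([10.0, 12.0, 9.0, 15.0, 11.0])       # toy sample values
v_hat = np.array([11.0, 11.5, 10.0, 13.0, 11.2])  # toy leave-one-out predictions

presad = np.mean(np.abs(v - v_hat))   # (1/n) * sum_i |v_i - vhat_i|
print(presad)  # 0.94
```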
3. Standardized PRESS Residuals: The mean of the standardized PRESS residuals is
given by:
   \frac{1}{n}\sum_{i=1}^{n}\frac{v_i - \hat{v}_i}{e_{R(i)}},

where e_{R(i)}^2 is the mean squared prediction error for predicting v_i from the remaining data values.

This quantity should be close to 0 if the model fits well (i.e.: we would like the
prediction errors to be small and not all of one sign).
4. Root Mean Squared Prediction Residuals (Standardized): This statistic is given by:
   \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{v_i - \hat{v}_i}{e_{R(i)}}\right)^2}.
This quantity should be close to one if the model fits well. Why? The variance of the cross-validation errors is an empirical estimate of the prediction variance. Hence, if the model is correct, the standardized residuals inside the parentheses above should have variance 1, and so the average of the squared standardized residuals (essentially the sample variance of the standardized residuals) should be one.
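A sketch of statistics 3 and 4; the standard errors e_R(i) are assumed to be supplied by the estimation method (e.g. kriging standard errors), and all numbers are illustrative:

```python
import numpy as np

v = np.array([10.0, 12.0, 9.0, 15.0, 11.0])       # toy sample values
v_hat = np.array([11.0, 11.5, 10.0, 13.0, 11.2])  # toy leave-one-out predictions
se = np.array([1.1, 0.9, 1.2, 1.5, 1.0])          # assumed prediction std. errors e_R(i)

z = (v - v_hat) / se                 # standardized PRESS residuals
mean_z = z.mean()                    # statistic 3: should be near 0 for a good fit
rms_z = np.sqrt(np.mean(z ** 2))     # statistic 4: should be near 1 for a good fit
print(mean_z, rms_z)
```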
5. Histogram of the Standardized PRESS Residuals: A histogram of these residuals will
reveal any lack of symmetry about zero, and the closeness of these values to zero.
6. Quantile-Quantile Plots: Q-Q plots of the sample values versus their cross-validated
predicted values will reveal the closeness with which the distributions of sampled and
predicted points agree. Large deviations from the 45-degree line indicate inadequacies
in the model or estimation method.
7. Scatterplots: A scatterplot of the sample values versus the cross-validated predicted
values will reveal the closeness with which the pairs of points agree. Such a plot will
indicate whether or not there is conditional bias as well as the degree of variability
in the predictions. Pairs of points deviating greatly from the 45-degree line indicate
values which are not well-described by the current model.
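These graphical checks (statistics 5-7) might be produced as follows; the data are synthetic and the Agg backend renders off-screen:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")               # render without a display
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
v = rng.uniform(0, 1000, 200)            # toy "true" values
v_hat = v + rng.normal(0, 100, 200)      # toy leave-one-out predictions
z = (v_hat - v) / 100                    # standardized residuals (se known here)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].hist(z, bins=20)                             # 5: symmetric about zero?
axes[0].set_title("Standardized residuals")
axes[1].plot(np.sort(v), np.sort(v_hat), "o", ms=3)  # 6: Q-Q plot of quantiles
axes[1].set_title("Q-Q plot")
axes[2].plot(v, v_hat, "o", ms=3)                    # 7: scatterplot of pairs
axes[2].set_title("Scatterplot")
for ax in axes[1:]:
    ax.plot([0, 1000], [0, 1000])                    # 45-degree reference line
    ax.set_xlabel("True values")
fig.savefig("cv_diagnostics.png")
```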


Example: Using the Walker Lake data, cross-validation was performed on each of the 470 values by first removing the point and then predicting its value from the remaining 469 under all five point estimation methods discussed in class (polygonal declustering, triangulation, local sample mean (with r = 25), inverse distance weighting (with p = 2), and ordinary kriging). For ordinary kriging, the spherical variogram model fit on page 108 of your class notes was used to model the correlation structure. Tables of summaries from these five cross-validations are given below. The first table contains simple summary statistics on the cross-validation predictions, and the second table contains summaries of the cross-validation residuals (predicted - true). What do you notice?
Prediction Value Statistics

         True     POLY      LSM      IDS       OK    True*      TRI
n         470      470      470      470      470      386      386
v̄       436.5    490.3    461.5    479.2    443.5    447.4    458.4
s       300.2    289.5    182.6    204.0    221.8    291.7    238.6
CV       0.69     0.59     0.40     0.43     0.50     0.71     0.54
min       0.0      0.0      6.2      5.9      0.0      0.0      3.8
Q1      184.6    242.5    364.5    357.7    279.9    199.6    276.3
M       425.2    517.2    484.5    494.9    417.8    432.2    451.4
Q3      644.4    673.5    575.9    599.3    587.4    648.2    609.0
max    1528.1   1528.1    855.6   1015.9   1136.0   1521.6   1217.4

POLY = polygonal declustering; LSM = local sample mean (r = 25); IDS = inverse
distance squared (p = 2); OK = ordinary kriging; TRI = triangulation.
*Triangulation can only estimate the 386 points lying inside the convex hull of
the data, so the TRI column and the matching True* column summarize only those
386 locations.

Residual Statistics

            POLY      TRI      LSM      IDS       OK
n            470      386      470      470      470
ē           53.9     11.0     25.0     42.8      7.0
s_e        245.9    182.0    233.7    204.2    180.9
IQR_e      303.5    239.6    326.3    269.7    255.2
PRESAD     196.6    142.1    186.1    164.9    145.0
PRESS      63242    33177    55114    43454    32711
ρ̂          0.653    0.782    0.628    0.735    0.800

Here ē is the mean residual, s_e and IQR_e are the standard deviation and
interquartile range of the residuals, and ρ̂ is the correlation between the
observed and cross-validated predicted values.

The averages of the cross-validated values for triangulation and ordinary kriging (OK) are more in line with the true mean of the data than those for the other three methods. The variation among the LSM, IDS, TRI, and OK cross-validated values is much smaller than that of the true or polygonal values. Why?


In viewing the 5-number summaries of the quantiles, the true values and the polygonal predictions are more extreme (i.e.: the maximum is larger, as are the upper quantiles). Why?
From the second table, we can note a hierarchy of level of positive bias in these methods,
with POLY having the largest positive bias, followed in sequence by IDS, LSM, TRI,
and OK. The level of global bias is smallest for ordinary kriging.
Looking at the variability in the residuals (with s or IQR), OK and TRI give the greatest
precision in the estimates, followed closely by IDS. The POLY and LSM methods have
noticeably higher variability in the residuals.
The increased bias and variability associated with the predictions from the POLY
and LSM methods can also be noted by the high mean absolute errors (MAE), high
mean squared errors (MSE), and low correlation between the observed and predicted
V-values. The OK method has the smallest MSE and highest correlation between
observed and predicted values, followed closely by the triangulation method, which
had the smallest MAE. The inverse distance squared method appears to perform third
best of the five methods. Based on these tables alone, one would conclude that the extra effort involved in modeling the variogram for use with ordinary kriging slightly improved the accuracy of the cross-validation predictions.
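The residual summaries in the second table can all be derived from the vector of cross-validation errors; a sketch on synthetic data (not the Walker Lake values):

```python
import numpy as np

def residual_summary(v, v_hat):
    """Bias, spread, PRESAD (MAE), PRESS (MSE), and corr(true, predicted)."""
    e = v_hat - v                        # residuals: predicted - true
    q1, q3 = np.percentile(e, [25, 75])
    return {"n": e.size,
            "mean": e.mean(),            # global bias
            "sd": e.std(ddof=1),
            "IQR": q3 - q1,
            "PRESAD": np.abs(e).mean(),
            "PRESS": np.mean(e ** 2),
            "corr": np.corrcoef(v, v_hat)[0, 1]}

rng = np.random.default_rng(2)
v = rng.uniform(0, 1000, 470)            # synthetic "true" values
v_hat = v + rng.normal(20, 200, 470)     # a biased, noisy fictitious method
print(residual_summary(v, v_hat))
```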

Consider now the following set of quantile-quantile plots for the true and predicted values of each of the five estimation methods. What do you notice?

[Figure: Q-Q plots of the true quantiles versus the quantiles of the Polygonal, Triangulation, LSMean, IDS, and OK estimates, with axes running from 0 to beyond 1000.]


Scatterplots of the true values versus the predicted values are shown below for the five estimation methods. What do you notice?

[Figure: scatterplots of the true values versus the Polygonal, Triangulation, LSMean, IDS, and OK estimates.]


Some Cautions with these Statistics


1. If the data are clustered, the points within clusters are more heavily validated than
points outside clusters. Why?
As a consequence of this, if a model is chosen based on cross-validation, it will represent
the cluster areas better than the other areas.
2. Estimation of cross-validation points necessarily involves longer distances than estimation of points that were never sampled. Why?
This can easily be seen by considering sampling on a regular grid with unit spacing. Suppose we want to estimate the value at an omitted sample location: it is at least one unit away from the nearest remaining sample point. However, if you are estimating a point that was never sampled, it is never more than 1/√2 units from a sample point.

As a consequence of this, cross-validation may not be helpful for deciding between variogram models for the correlation at small lag distances (except perhaps in clustered samples).
3. The model that appears best may depend on which of the summary statistics we
consider. This, of course, is true of any cross-validation procedure used for regression
model selection or otherwise.
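The distance argument in caution 2 can be checked numerically: on a unit grid, a held-out sample location is a full unit from its nearest neighbor, while no unsampled point interior to the grid is farther than 1/√2 ≈ 0.707 units from a sample. A small sketch:

```python
import numpy as np

xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
grid = np.column_stack([xs.ravel(), ys.ravel()])   # 5x5 unit grid of samples

def nearest_dist(p, pts):
    """Distance from p to the nearest point of pts, excluding p itself."""
    d = np.sqrt(((pts - p) ** 2).sum(axis=1))
    return d[d > 1e-12].min()

cv_dist = nearest_dist(np.array([2.0, 2.0]), grid)   # held-out grid point
new_dist = nearest_dist(np.array([2.5, 2.5]), grid)  # worst-case unsampled point
print(cv_dist, new_dist)  # 1.0  0.7071...
```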

