Comprehensive Spatial Query Containment Framework For Minimizing Redundancy

The Pennsylvania State University
The Graduate School
COMPREHENSIVE SPATIAL QUERY CONTAINMENT FRAMEWORK FOR MINIMIZING REDUNDANCY
A Thesis in
Computer Science and Engineering
by
Brandon M. Unger
© 2009 Brandon M. Unger
Submitted in Partial Fulfillment

of the Requirements
for the Degree of
Master of Science
May 2009
The thesis of Brandon M. Unger was reviewed and approved by the following:
Wang-Chien Lee
Associate Professor of Computer Science and Engineering
Thesis Adviser
John Hannan
Associate Professor of Computer Science and Engineering
Daniel Kifer
Assistant Professor of Computer Science and Engineering
Raj Acharya
Professor of Computer Science and Engineering
Head of the Department of Computer Science and Engineering
Signatures are on file in the Graduate School.
Abstract
As storage capacity and computational power continue to increase, society is able to collect considerable
amounts of data from heterogeneous sources. The analysis of this information may require programs
to perform complex, multidimensional analysis in potentially adverse environments. Example applications include business intelligence operations, geographic information systems, and location-based
services. While these tools produce valuable information for users, they frequently must operate on
systems with limited processing capability and bandwidth capacity. To minimize unnecessary resource
consumption, a primary goal is to avoid the execution of any query that is redundant based on results
previously obtained by the client. This work introduces the concept of spatial query containment as a
means to identify when a new query can be answered solely using results from an existing query. Spatial
query containment has been engineered to support a variety of popular spatial query types, including
range, window, k-nearest neighbor, and reverse k-nearest neighbor. Each query Q has an associated
containment scope area, and any future query Q′ that is both semantically contained by Q and issued at a point
inside the containment scope of Q can be answered using only the results from Q. Theoretical and
experimental analyses indicate that the containment scope processing framework outperforms existing
techniques under a wide variety of datasets, query loads, and computing environments. The substantial
reduction in redundant query evaluations provided by the spatial query containment framework supports the deployment of novel, data-rich applications in challenging environments while maintaining
sufficient scalability, reliability, and performance.
Table of Contents
List of Figures
List of Tables
Acknowledgments
Chapter 1
Introduction
1.1 Problem Background
1.2 Problem Motivation
1.3 Problem Definition
1.4 Solution Outline
1.5 Contribution and Organization
Chapter 2
Literature Review
2.1 Essential Concepts
2.2 Data Organization Techniques
2.2.1 B-Tree Index
2.2.2 R-Tree Index
2.2.3 Other Spatial Indexes
2.2.4 Voronoi Cells
2.3 Spatial Query Types
2.3.1 Region Query
2.3.2 Nearest Neighbor Query
2.3.3 Reverse Nearest Neighbor Query
2.3.4 Location-Dependent Spatial Query
2.3.5 Time-Parameterized Spatial Query
2.4 Auxiliary Scope Techniques
2.4.1 Semantic Scope
2.4.2 Valid Scope
2.5 Caching Mechanisms
Chapter 3
Containment Scope Framework
3.1 System Overview
3.2 System Components
3.3 Underlying Assumptions
3.4 Communication Model
3.5 Spatial Query Definitions
3.6 Containment Scope Definitions
3.7 Containment Scope Evaluation and Computation Strategies
Chapter 4
Region Query Computation Methods
4.1 Containment Scope Server Processing for Region Queries
4.2 Containment Scope for Range Query
4.2.1 Basic Observations
4.2.2 Algorithm Implementation
4.2.3 Optimized Computation Strategy
4.3 Containment Scope for Window Query
4.3.1 Basic Observations
4.3.2 Algorithm Implementation
4.3.3 Optimized Computation Strategy
4.4 Containment Scope Client Processing for Region Queries
Chapter 5
Nearest Neighbor Query Computation Methods
5.1 Containment Scope Server Processing for NN Queries
5.2 Containment Scope for 1NN Query
5.2.1 Basic Observations
5.2.2 Algorithm Implementation
5.3 Containment Scope for kNN Query
5.3.1 Basic Observations
5.3.2 Algorithm Implementation
5.4 Containment Scope Client Processing for NN Queries
Chapter 6
Reverse Nearest Neighbor Query Computation Methods
6.1 Preliminary Notes on Reverse Nearest Neighbor Query Processing
6.2 Basic RkNN Auxiliary Scope Construction
6.2.1 Korn Unary Basic Auxiliary Scope Processing
6.2.2 Basic Auxiliary Scope Client Evaluation
6.2.3 Basic Auxiliary Scope Processing Variants
6.3 Dynamic RkNN Auxiliary Scope Construction
6.3.1 Dynamic RkNN Auxiliary Scope Processing
6.3.2 Dynamic RkNN Auxiliary Scope Example
6.3.3 Dynamic RkNN Auxiliary Scope Client Evaluation
6.4 Optimal RkNN Auxiliary Scope Construction
6.4.1 Monochromatic Optimal Auxiliary Scope Processing
6.4.2 Bichromatic Optimal Auxiliary Scope Processing
Chapter 7
Theoretical Analysis
7.1 Introduction
7.2 Relevant Performance Metrics
7.3 Cost Model Terminology
7.4 Region Query Cost Model
7.4.1 Query Submission Rate
7.4.2 Auxiliary Scope Size
7.4.3 Bandwidth Consumption
7.4.4 I/O Cost
7.4.5 Execution Time
7.5 NN Query Cost Model
7.5.1 Query Submission Rate
7.5.2 Auxiliary Scope Size
7.5.3 Bandwidth Consumption
7.5.4 I/O Cost
7.5.5 Execution Time
7.6 Extension to Non-Uniform Datasets
Chapter 8
Experimental Analysis
8.1 Introduction
8.2 Domain of Interest
8.3 Experiment Setup
8.4 Exp. I. Impact of Auxiliary Scope Formation
8.4.1 Uniform Dataset Performance Analysis
8.4.1.1 Region Query
8.4.1.2 kNN Query
8.4.1.3 RkNN Query
8.4.2 Non-Uniform Dataset Performance Analysis
8.4.2.1 Region Query
8.4.2.2 kNN Query
8.4.2.3 RkNN Query
8.5 Exp. II. Impact of Client Mobility
8.5.1 Fixed Query Parameter Performance Analysis
8.5.1.1 Region Query
8.5.1.2 kNN Query
8.5.2 Variable Query Parameter Performance Analysis
8.5.2.1 Region Query
8.5.2.2 kNN Query
8.6 Exp. III. Impact of Object Density
8.6.1 Region Query
8.6.2 kNN Query
8.7 Recommendation
Chapter 9
Auxiliary Scope Simulator
9.1 Simulator Project Overview
9.2 Simulator Objectives
9.3 Simulator Components
9.4 Simulator Development Roadmap
9.5 Simulator Implementation Observations
Chapter 10
Conclusion
10.1 Spatial Query Processing Problem
10.2 Spatial Query Containment Solution
10.2.1 Advantages
10.2.2 Disadvantages
10.2.3 Applications
10.3 Final Thoughts
Bibliography
List of Figures
1.1 Example LBS system model
1.2 Illustration of overlapped query results
1.3 Containment scope and containment test
2.1 Traditional data indexing methods
2.2 Spatial data indexing methods
2.3 Spatial query types
2.4 Basic spatial query attempts to solve RNN query
2.5 RNN evaluation techniques
2.6 Semantic scope construction approaches
2.7 Valid scope formulation (TP-query approach)
2.8 Valid scope formulation (geometric approach)
3.1 General spatial query containment system model
3.2 Example R-tree
3.3 Algorithm best first search
4.1 Algorithm region query containment scope
4.2 Subroutine not needed for cs
4.3 Range query circle and Minkowski circles of objects
4.4 Determining the containment scope for a range query result
4.5 Detection of redundant complementary objects
4.6 Determining the containment scope for a window query result
4.7 Removable complementary objects
4.8 Algorithm client region query eval cs
5.1 Geometric representation of NN containment scope
5.2 Algorithm nn query containment scope
5.3 Determining the containment scope for a NN query result
5.4 Determining the containment scope for a 2NN query result
5.5 Algorithm knn query containment scope
5.6 Algorithm client knn query eval cs
6.1 Effect of k on RkNN result
6.2 Algorithm find korn unary as
6.3 Subroutine not needed for vs
6.4 Subroutine not needed for cs
6.5 Sample query auxiliary scope computation
6.6 Algorithm client query eval vs
6.7 Algorithm client query eval cs
6.8 Algorithm find dynamic as
6.9 Dynamic RkNN auxiliary scope set membership flowchart
6.10 Subroutine finalize kcnt
6.11 Subroutine finalize kdist
6.12 Subroutine initialize stats
6.13 Subroutine update stats
6.14 Subroutine refine vs comp set
6.15 Subroutine refine cs comp set
6.16 Example dynamic auxiliary scope computation
6.17 Algorithm client query eval vs (Revised)
6.18 Algorithm client query eval cs (Revised)
6.19 Outside search space scenarios
6.20 Subroutine outside search
6.21 Algorithm find optimal as
6.22 Subroutine initialize stats (Revised)
6.23 Algorithm find optimal as (Bichromatic)
6.24 Subroutine initialize stats (Bichromatic)
6.25 Subroutine outside search (Bichromatic)
6.26 Sample bichromatic query auxiliary scope computation
7.1 Search area cir(q, 3r) and MBRs
8.1 Server overhead for computing range query auxiliary scope on uniform dataset
8.2 Server overhead for computing window query auxiliary scope on uniform dataset
8.3 Server overhead for computing kNN query auxiliary scope on uniform dataset
8.4 Server overhead for computing RkNN query auxiliary scope on uniform dataset
8.5 Server overhead for computing range query auxiliary scope on non-uniform dataset
8.6 Server overhead for computing window query auxiliary scope on non-uniform dataset
8.7 Server overhead for computing kNN query auxiliary scope on non-uniform dataset
8.8 Server overhead for computing RkNN query auxiliary scope on uniform dataset
8.9 Impact of client mobility on the performance of fixed range query (r = 1.5%)
8.10 Impact of client mobility on the performance of fixed window query (l = 1.5%)
8.11 Impact of client mobility on the performance of fixed kNN query (k = 4)
8.12 Impact of client mobility on traditional spatial queries with variable parameters
8.13 Impact of object density on traditional spatial query performance
9.1 Auxiliary scope simulator components
9.2 Auxiliary scope simulator screen captures
List of Tables
6.1 Algorithm assumptions
6.2 Set definitions
6.3 Dynamic RkNN auxiliary scope (Stage I)
6.4 Dynamic RkNN auxiliary scope (Stage II)
6.5 Dynamic RkNN auxiliary scope (Stage III - VS)
6.6 Dynamic RkNN auxiliary scope (Stage III - CS)
7.1 Cost model definitions
8.1 Experiment parameters
9.1 Auxiliary scope simulator release schedule
Acknowledgments
I would not have been in a position to perform this work without the steadfast support of many
people in my life. In particular, I owe a great debt to my research advisor, Professor Wang-Chien Lee,
and fellow Penn State graduate student Ken C.K. Lee. Their continued guidance and high standards
contributed in large part to the success of this work. The Pervasive Data Access (PDA) group at
Penn State has provided much-needed input as I refined my research ideas, and both Ken C.K. Lee and
Baihua Zheng went one step further by assisting in draft revisions and experimental analysis. I also
would like to acknowledge the useful thesis template files offered by Professors Francesco Costanzo
and Gary Gray. Dr. John Hannan has served graciously as my academic advisor despite my continual
questions, inquiries, and issues. For this, I offer my admiration and thanks. In addition, I would like
to thank Michael Kozeniauskas for his assistance in reviewing drafts of this work and for suggesting
much-needed improvements. Furthermore, I extend my warmest regards to the entire thesis committee
for donating their limited time and considerable talent to review my research. Finally, I would not
have maintained sanity throughout the entire thesis composition process without the unyielding and
unconditional support of my family and my friends. There are far too many of you to name, and
regardless, I would be afraid of omitting a name. However, this thesis would not have been possible
without your help.
Dedication
This work is dedicated to my family, my friends, and my teachers (both past and present). You were
there to share in my accomplishments and to lift me up after disappointments. This work is as much
yours as it is my own.
Chapter 1
Introduction
1.1 Problem Background
As society moves boldly into a new century, computer scientists are witnessing an exponential explosion
in the amount of data being collected by systems and sensors for analysis. Dramatic price drops and
performance improvements in both computationally rich mobile devices (e.g., laptops, personal digital
assistants, smart phones) and computationally poor sensors offer more ways to collect data than at any
other point in history. With this wealth of new knowledge, information workers have developed a need
to mine literally exabytes of raw data for useful information. Furthermore, the successful processing of
modern data often requires an application to perform complex, multi-dimensional analysis to search for
patterns. Alternatively, users may seek information about increasingly popular spatial and temporal
datasets. By bringing timely and geographically appropriate information to end users, corporations are
beginning to develop an exciting new category of applications that is likely to increase in popularity
and importance over the next decade as its user base continues to expand.
Consider the following situations that often arise in today's computing landscape:
Geographic Information Systems (GIS)
Advances in processing power and storage capacity have made it possible to analyze geographic
information on a massive scale. Workstations can now perform useful operations on geographic
data in real-time and offer users the ability to plan vacations and business trips, to identify the
market region for different retail locations, and to track the spread of a disease.
Business Intelligence (BI)
Business intelligence efforts are becoming increasingly important in today's global economy and
require the synthesis of massive amounts of operational data. The general goal of such functionality is to identify patterns and relationships that can guide future business decisions. This
frequently requires trend analysis and discovering complex interactions between multiple variables.
Location-Based Services (LBS)
As prices fall and performance improves, consumers are moving away from stationary desktop
systems and embracing mobile devices such as laptops, PDAs, and smart phones. This increasingly mobile user base presents new opportunities for location-based services (LBS), which offer
information that is specific to a client's location. Examples include locating nearby restaurants
and calculating directions to a hotel. LBS applications frequently must issue location-dependent
spatial queries (LDSQs), which depend on the current location of the client.
In all of these cases, there is a need to solve complex problems on multi-dimensional data. Identifying
unique and efficient methods for answering these questions in dynamic and diverse environments has
become an important area of research in the computer science field.
1.2 Problem Motivation
The increased demand for and proliferation of multi-dimensional data has widened the problem space
that needs to be addressed by modern information processing systems. Traditionally, basic point and
range queries have been sufficient to answer the majority of questions posed by users. However, this
is no longer the case. Increasingly diverse and complex questions necessitate a more comprehensive
toolset for successful data analysis. In one particular case, we often need to identify those objects that
depend on or are near some specified location. Spatial queries that operate on a variety of (potentially
multi-dimensional) indexes provide one key mechanism to address the demands of today's changing
computing landscape. Basic spatial query types (including region and nearest neighbor queries) as well
as advanced spatial query types (including k-nearest neighbor and reverse nearest neighbor) provide a
language for requesting this information.
Despite the great opportunity provided by spatial queries, these new tools are frequently deployed
in areas that require exceptional scalability, reliability, and performance under highly stressed systems
in possibly adverse conditions. There is an urgent need to conserve server and (in some cases) network
resources through the reduction of CPU, disk, and bandwidth utilization. In cases where mobile
devices are included in the processing scheme, wireless connectivity, client
battery life, and mobile computing resources become additional concerns. High client-server communication
costs can degrade the user experience through high query latency and slow response times.
Unfortunately, addressing these concerns is more easily said than done. One of the keys to achieving
reasonable and economically sustainable performance in these situations is the identification of redundancy in query requests. Notice that all of the above sample applications potentially contain a high
level of overlapping data requests.
GIS: Systems may serve many users or issue queries at many nearby points to perform analysis.
In such cases, a subsequent query answer may be partially or even entirely contained in a previous
answer obtained by the same client.
BI: Spatial queries offer a mechanism for identifying the effect that different business factors
have on some outcome. For example, a retailer may want to determine how decreasing a sale
price affects the quantity of surf boards sold during cool weather. This single logical question
may induce many queries executed simultaneously over temperature and sale price dimensions.
Substantial overlap in requested temperature and sale price values offers an opportunity to reuse
some query results.
LBS: Requests in LBS applications are based on a user's current location. While this location
may change frequently and result in a high rate of client query submissions, the actual range
of movement between each update may be very limited in relation to the domain of the entire
dataset. Therefore, we can expect high degrees of query overlap in situations where the client is
frequently updating information.
It follows from the above observations that substantial reductions in query submissions, processing
overhead, and bandwidth consumption can be achieved if redundant spatial query submissions could
be detected and suppressed. That is, the client should only issue a spatial query to the server for
processing when that spatial query cannot be answered locally using previous query results. Adopting
this strategy would lead directly to improved response time, increased system scalability, and reduced
system overhead. However, it is very difficult to detect redundancy in a spatial query request since the
result set for such a request is contingent on (1) the query point location, (2) additional query parameters,
and (3) the distribution of items within the dataset.
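The strategy described above, in which the client contacts the server only when a query cannot be answered from earlier results, can be sketched as a small client-side cache. This is a minimal illustration under stated assumptions and not the framework developed in this work: it handles only circular range queries, it uses plain circle containment as the redundancy test, and the names `RangeQuery`, `Client`, `covers`, and `server_range_query` are hypothetical.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, float]


@dataclass(frozen=True)
class RangeQuery:
    center: Point
    radius: float


def server_range_query(dataset: list[Point], q: RangeQuery) -> list[Point]:
    """Hypothetical stand-in for the server: brute-force scan of the dataset."""
    return [o for o in dataset if math.dist(o, q.center) <= q.radius]


def covers(old: RangeQuery, new: RangeQuery) -> bool:
    """True if the circle of `new` lies entirely inside the circle of `old`."""
    return math.dist(old.center, new.center) + new.radius <= old.radius


@dataclass
class Client:
    dataset: list[Point]  # reached only through server_range_query
    cache: list[tuple[RangeQuery, list[Point]]] = field(default_factory=list)
    submissions: int = 0  # number of queries actually sent to the server

    def range_query(self, q: RangeQuery) -> list[Point]:
        for old, objects in self.cache:
            if covers(old, q):
                # Redundant request: filter the cached result locally.
                return [o for o in objects if math.dist(o, q.center) <= q.radius]
        # Not answerable locally: submit to the server and remember the result.
        self.submissions += 1
        objects = server_range_query(self.dataset, q)
        self.cache.append((q, objects))
        return objects
```

A client that moves a short distance and issues a smaller query is served from the cache, so `submissions` stays unchanged; a query whose circle leaves every cached circle triggers a new submission. The circle test is deliberately conservative and misses many reusable results, which is the limitation the rest of this section discusses.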
Extensive work has been done in pursuit of a solution to the redundancy identification problem
with limited success. One such attempt, and the focus of this work, is the construction of an auxiliary
scope for each spatial query that is submitted to the server. This auxiliary scope data allows a client to
determine if a new spatial query can be answered locally using results from a previous query. Existing
research in this category includes the following:
Semantic Scope
Semantic scope techniques use bounds on the size of the query region to detect when one query is
completely contained within another query. An example situation may be a circular range query
issued at the same query point with a smaller radius than a previously issued query. Popular
approaches include semantic region processing for range and window queries as well as mNN
query development for kNN queries.
Valid Scope
Valid scope techniques construct a valid scope region for each spatial query Q that is issued to
the server. Any new spatial query Q′ that differs from Q only in its query point and which is
issued within the valid scope region of query Q can be answered using only result set data from
Q. Thus, the client can answer the new query Q′ locally without sending it to the server for
processing. Example implementations of this technique include time-parameterized (TP) query
valid scope, geometric valid scope, as well as a specialized geometric valid scope framework for
Location-Dependent Spatial Queries (LDSQs) issued within a broadcast environment.
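The valid scope test described above reduces, on the client, to checking whether the new query point falls inside a region returned with the earlier result. The sketch below is an illustration under assumptions that are not taken from this work: the valid scope is represented as a convex polygon with vertices in counter-clockwise order, and the names `ScopedResult`, `inside_convex_polygon`, and `answer_locally` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]


def inside_convex_polygon(p: Point, vertices: list[Point]) -> bool:
    """Point-in-convex-polygon test; vertices are counter-clockwise.

    The point is inside (or on the boundary) when it lies to the left of,
    or on, every directed edge of the polygon.
    """
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        if cross < 0:  # p is strictly to the right of edge a -> b
            return False
    return True


@dataclass
class ScopedResult:
    k: int                     # query parameter of the original kNN query Q
    objects: list[Point]       # result set of Q
    valid_scope: list[Point]   # assumed convex region, counter-clockwise


def answer_locally(cached: ScopedResult, k: int, point: Point) -> Optional[list[Point]]:
    """Return the cached result if the new query differs from Q only in its
    query point and that point lies inside Q's valid scope; otherwise None."""
    if k == cached.k and inside_convex_polygon(point, cached.valid_scope):
        return cached.objects
    return None
```

Note that the test requires the parameter `k` to match exactly. A request with a different `k`, or one issued just outside the polygon, returns `None` and must go to the server even when the cached objects would have sufficed.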
Although both semantic scope and valid scope techniques can substantially reduce the number of
redundant queries that are submitted to the server for processing, these existing approaches are not optimal because they fail to identify many situations in which a spatial query may be completely contained
within a previous result. To address this concern, this work introduces the notion of containment scope as
the newest member within the auxiliary scope family of redundancy solutions. Compared with other
auxiliary scope techniques, containment scope applies to future spatial queries that are exceptionally
varied and offers a large area over which existing spatial queries can be reused.
1.3 Problem Definition
While much work has been done in the diverse area of spatial query processing, the inability to effectively
and efficiently detect redundancy has at least partially limited the functionality, performance, and
availability of applications that rely on such detection in real-world environments. Furthermore,
no existing technique obtains optimal redundancy detection over a wide variety of spatial query types
with a minimal amount of overhead. Thus, we seek to develop computational methods for a novel
auxiliary scope framework, called spatial query containment, that provides the following necessary
properties to real-world systems:
Scalability: We must provide for a massive and growing user base through the reduction of a
maximal number of redundant client spatial query requests.
Reliability: We must protect mission-critical infrastructure by providing accurate results and never
introducing inconsistent results into the information processing system.
Performance: We must provide responses quickly as analyzed data is frequently used by clients
to make time sensitive decisions.
Flexibility: We must support the processing of highly varied information requests through the
deployment of a framework that can process spatial queries of a wide variety.
Satisfying these requirements is particularly challenging in dynamic environments that feature (1)
transient network connections with limited data transmission capabilities, (2) centralized servers under
high demands, and (3) a potentially large client base. Thus our fundamental goal is as follows: to
develop a containment scope processing framework that accurately solves spatial queries in a way
that efficiently utilizes existing data available to the client to conserve server resources and to mitigate
bandwidth contention by maximizing client self-reliance.
Typically, auxiliary scope implementations are deployed using a client-server model in either a
stationary or mobile environment using wired or wireless connections, respectively. An example of
a mobile deployment of auxiliary scope in support of location-based services (LBSs) is depicted in
Figure 1.1. Mobile clients seek spatially nearby objects by transmitting location-dependent spatial
queries (LDSQs) over a wireless channel to a base station that then relays queries to an LBS server.
The server evaluates these queries and delivers a result set of matching spatial objects from the global
dataset for each submitted LDSQ to the client from which the request originated. As will be discussed in
more detail later, some auxiliary scope implementations may operate using broadcast communication,
using stationary clients, or using logical clients that reside on the same physical machine as the server
component. In all cases, the general flow of information remains the same.
Figure 1.1. Example LBS system model (mobile clients, base station, LBS server)
In addition to being compatible with a number of communication models, our system framework
must support a wide range of spatial query requests. Common types of spatial queries include region
(range, window) queries, nearest neighbor (NN, kNN) queries, and reverse nearest neighbor (RNN)
queries. Region queries retrieve objects within specified areas (e.g. circles and rectangles) that are
geographically centered at the query point. Meanwhile, nearest neighbor queries find an object located
closest to a query position. Finally, reverse nearest neighbor queries seek all data objects that are closer
to the query point than to any other data point in the dataset. Although this final query type may
resemble the nearest neighbor query, it is important to note and will later be shown that the reverse
nearest neighbor query cannot be solved using nearest neighbor query algorithms. Although each
auxiliary scope concept studied in this work supports some subset of these query types, only spatial
query containment offers native methods and a common framework for processing all three query
groups.
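The difference between the nearest neighbor and reverse nearest neighbor definitions given above can be illustrated with a brute-force sketch. The coordinates and the function names `knn` and `rnn` are hypothetical and are not taken from this work; a real system would evaluate these queries over a spatial index rather than with a linear scan.

```python
import math

def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def knn(points, q, k=1):
    """Names of the k objects closest to query point q (linear scan)."""
    return sorted(points, key=lambda name: dist(points[name], q))[:k]

def rnn(points, q):
    """Names of objects that are closer to q than to any other object."""
    result = []
    for name, p in points.items():
        others = [dist(p, o) for other, o in points.items() if other != name]
        if not others or dist(p, q) < min(others):
            result.append(name)
    return sorted(result)

# Hypothetical dataset: a and b are close together, c is isolated.
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0)}
q = (5.0, 0.0)

print(knn(points, q))  # b is the object nearest to q
print(rnn(points, q))  # only c has q as its own nearest neighbor
```

In this toy dataset the nearest neighbor of q is b, while the only reverse nearest neighbor is c, so the two query types return different answers for the same query point.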
As previously mentioned in Section 1.2, an appropriate method for obtaining necessary spatial query
scalability, reliability, performance, and flexibility in real-world systems should avoid resubmitting
spatial queries to the server for processing whenever the result of that query does not require information
from the dataset that is locally unavailable. We refer to such a request as a redundant spatial query.
It can be observed that two spatial queries, even if issued at different query points or with different
parameters, may share the same result. Therefore, a result of a previous spatial query possibly can be
reused to answer a new query without sending a new query submission to the server for processing. To
enable every client to determine whether the results of a new spatial query are locally available, additional
information about previous spatial queries is needed. Meanwhile, the effectiveness of such an approach
is dependent on the ability to minimize the overhead required to generate, to transmit, and to utilize
the additional data.
1.4 Solution Outline
This work introduces a new concept referred to as spatial query containment, which determines whether
the result of a spatial query Q′ (denoted by RQ′) is contained by that of a previous spatial query Q
(denoted by RQ). Any contained query can be answered locally and avoid submission to the server for
evaluation. Formally, RQ′ ⊆ RQ denotes a containment relationship whereby RQ′ is contained by RQ.
Perhaps most importantly, our solution avoids costly resource consumption for mobile applications
(e.g. military troop notification) or for large batches of related operations (e.g. business intelligence
scenario analysis). Containment scope effectively reduces I/O accesses, CPU consumption, and
network bandwidth utilization within the established system model. In addition, containment scopes
serve to highlight areas in which the result set will not be larger than that of the original query with which it is
associated. That is, any query Q′ that is semantically contained by another query Q and issued inside
of the containment scope of Q will have the cardinality of its result set bounded by |RQ|, the cardinality
of the result set of Q. This information can be valuable in its own right under certain situations, and we
point out such cases at the appropriate places in our discussion.
Consider how some real world applications can benefit from a containment framework:
Air force pilots need to remain aware of anti-aircraft batteries that pose a threat through the use of
surface-to-air missiles (SAMs). Pilots constantly retrieve information about nearby threats from
a centralized database. Accurate information is vital in such situations, yet the military must
support a large number of aircraft in combat. Spatial query containment can help fighter jets
to realize when new threats are encountered and allow computer systems to trigger appropriate
system updates.
Frequent business travelers need to locate hotel accommodations, dining establishments, and gas
stations. Some devices such as cell phones may be unable to contain a complete database of
travel information. Such a scenario would require server queries by the client. In addition, a
traveler's position may change quickly, so constant updates are often needed. It is essential that
redundant queries be eliminated if many users are to take advantage of such a system. Spatial
query containment offers a way to provide this functionality.
A pizza chain wants to determine which customers might frequent a new store added in some
geographic region. They may also want to know what other pizza restaurants might lose business.
Using containment scope in tandem with an RNN query can identify both pieces of information
with minimal redundant processing by the server.
Consider the conceptual illustration in Figure 1.2(a) in which a circular region query (range query)
Q is to be evaluated at a query point q on a set of objects {a, b, c, d, e, f, g}. In the context of our previous
examples, this range query could be issued by a fighter jet in search of information about nearby threats.
We use cir(q, r) to represent the circular search region of Q, where q is the center and r is the search
radius. The result set R contains {c, d} and the remaining objects (e.g. g and h) are non-result objects.
Logically, for another range query Q′ with search area cir(q′, r′), it is straightforward to determine R′ ⊆ R
if cir(q′, r′) ⊆ cir(q, r). However, this is only one of several possibilities. There are many other cases in
which cir(q′, r′) ⊈ cir(q, r) but R′ ⊆ R still holds. Being able to efficiently identify a large number of these
scenarios can improve the reusability of R for new queries. We observe and list the five cases in which
R′ ⊆ R is possible below.
Figure 1.2. Illustration of overlapped query results: (a) range query Q; (b) range queries Q1 (q1 = q, r1 < r), Q2 (r2 = r), and Q3 (r3 < r)
1. q′ = q and r′ = r. The radii and the query points of Q′ and Q are identical, which immediately
implies that cir(q′, r′) = cir(q, r). In this case, it is certain that RQ′ = RQ as they cover exactly the
same region and consequently the same set of result objects.
2. q′ = q and r′ < r. The radius of Q′ is smaller than that of Q and their query points are the same.
This implies that the search area of Q′ is fully contained by that of Q (i.e., cir(q′, r′) ⊆ cir(q, r)). In
this case, RQ′ must be contained by RQ. As illustrated in Figure 1.2(b), both Q1 and Q are issued
at the same query point, q, and r1 < r. It follows from the previous discussion that RQ1 ⊆ RQ.
3. q′ ≠ q and r′ = r. Both Q and Q′ have the same search area size because of the same radius, but
different query points. This case is common in mobile environments when a client repeatedly
issues the same range query while moving. In this case, Q′ is only contained by Q if the region
cir(q′, r′) − cir(q, r) contains no data objects. As shown in Figure 1.2(b), Q2 is such that r2 equals r
but q2 ≠ q. In this case, RQ2 = {d} ⊆ RQ.
4. q′ ≠ q and r′ < r. This case considers both the change of search area sizes and the change of
query points. This scenario happens, for instance, when a mobile client moves to a location and
issues a query of a smaller search range. The condition for containment is similar to that of the
previous case but may be more easily satisfied since cir(q′, r′) − cir(q, r) is a smaller region under
this scenario. Q3, depicted in Figure 1.2(b), shows an example where RQ3 = {c} ⊆ RQ.
5. q′ ≠ q and r′ > r. This case occurs when the newly issued query includes (1) a different query
point and (2) a larger search range. The condition for containment is identical to that of the previous two
cases but is unlikely to hold in practice since cir(q′, r′) − cir(q, r) could be substantially larger. This is also
the only case not considered by spatial query containment.
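The five cases above can be checked by brute force when the whole dataset is visible, which is the server's view rather than the client's. The following is a minimal sketch under that assumption; the coordinates and the helper names `range_query`, `case_number`, and `result_contained` are hypothetical and chosen only so that the outcomes resemble those described for Q1, Q2, and Q3.

```python
import math

def range_query(points, q, r):
    """Names of objects lying inside the circular region cir(q, r)."""
    return {name for name, p in points.items()
            if math.hypot(p[0] - q[0], p[1] - q[1]) <= r}

def case_number(q, r, q_new, r_new):
    """Map a new query (q_new, r_new) onto cases 1-5 listed above."""
    if q_new == q:
        if r_new == r:
            return 1
        return 2 if r_new < r else None  # same point, larger radius: not listed
    if r_new == r:
        return 3
    return 4 if r_new < r else 5

def result_contained(points, q, r, q_new, r_new):
    """True when R' is a subset of R, computed with full dataset knowledge."""
    return range_query(points, q_new, r_new) <= range_query(points, q, r)

# Hypothetical coordinates, for illustration only.
points = {"a": (-4.0, 0.0), "c": (1.0, 0.0), "d": (-1.0, 0.0), "e": (3.0, 0.0)}
q, r = (0.0, 0.0), 2.0

new_queries = [(q, 1.0), ((-1.5, 0.0), 2.0), ((1.5, 0.0), 1.0), ((2.0, 0.0), 2.0)]
for q_new, r_new in new_queries:
    print(case_number(q, r, q_new, r_new),
          sorted(range_query(points, q_new, r_new)),
          result_contained(points, q, r, q_new, r_new))
```

The last query falls under case 3 but reaches the non-result object e, so its result is not contained in R. A client cannot run this check directly because it does not hold the non-result objects, which is the gap that the auxiliary scope data is meant to fill.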
For the cases in which query points are unchanged (i.e., cases 1-2), we can decide whether RQ′ ⊆ RQ
immediately by comparing the radii of their search ranges. By comparison, it is challenging to examine
the containment of results for situations where query points and possibly search ranges are different
(i.e., cases 3-5). Existing solutions to eliminate redundant spatial queries only consider at most two
of the previously described cases. In contrast, our proposed containment scope can identify result set
reusability in all but the last case.
Before explaining how containment scope attains its improved redundancy detection, we briefly
examine the example depicted in Figure 1.2(b). From the cases of RQ2 ⊆ RQ and RQ3 ⊆ RQ, we can
see that although the search ranges of both Q2 and Q3 are different from that of Q, Q2 and Q3 cover
some result objects in RQ and do not contain objects outside the result RQ. This corresponds to the
general observation about cir(q′, r′) − cir(q, r) mentioned previously. Any spatial query Q′ in which all
its result objects are located in the search range covered by Q will possess a result set RQ′ that must be
totally contained in RQ. In other words, knowledge of the surrounding object distribution is needed to
determine containment. Assume that the object distribution is fixed. Then there exists a spatial area for
each query Q and corresponding result set RQ such that any future query Q′ (with an equal or smaller
search area than that of Q) issued inside of this region has its result set RQ′ ⊆ RQ. To capture this
spatial area, we propose a notion of containment scope, denoted by SQ, through which we may determine
whether a query (say Q′) can be answered by a maintained result (say RQ).
The shaded area shown in Figure 1.3(a) represents a containment scope corresponding to the result
of a range query Q over a given dataset. It is guaranteed that for any range query Q′ with r′ ≤ r and
q′ ∈ SQ, RQ′ ⊆ RQ. Thus, with a containment scope associated with RQ, the client can answer Q′ locally.
This evaluation of whether Q′ can be answered based on the result of a query Q is defined as the spatial
query containment test (or containment test for short). As shown in Figure 1.3(b), because conditions (1)
r1 ≤ r, (2) r2 ≤ r, (3) r3 ≤ r, and (4) q1, q2, q3 ∈ SQ hold, the client can completely answer Q1, Q2, and Q3
by retrieving result objects from RQ. In particular, we have RQ1 = {d}, RQ2 = {d}, and RQ3 = {c}.
Figure 1.3. Containment scope and containment test: (a) containment scope SQ with complementary objects; (b) containment test
The above discussion is based on range query, a type of region query. In fact, the concept of spatial
query containment is much more general and is applicable to all previously mentioned query types.
As the formulation of containment scopes and containment tests is highly related to the type of spatial
query, we shall explore them in detail throughout the remainder of this work.
To exploit spatial query containment, we present a system framework that includes (1) basic spatial
query processing, (2) containment scope computation, and (3) spatial query containment testing logic.
Since the formation of a containment scope requires knowledge of both result objects and non-result
objects, we assign containment scope computation to the server. When a query is submitted and
evaluated, the containment scope for that query result is computed. In order to minimize the processing
cost of containment scope computation, we devise efficient online algorithms that are integrated with
spatial query processing whenever possible to minimize index access. As will be discussed later, our
approach can finish the evaluation of a spatial query and then determine the corresponding containment
scope with a single index traversal. It also can deliver the query result coupled with the corresponding
containment scope back to the client in one message. Issuing a new query Q′ causes the client first to
perform a containment test for each stored containment scope SQ and its associated spatial query result
RQ. The client submits the new query to the server only when the containment test indicates RQ′ ⊈ RQ for all
stored containment scopes.
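The client-side flow described above can be sketched as follows. The class `Client`, its callables, and the toy dataset are hypothetical names introduced for illustration. The stand-in test `same_point_test` recognizes only cases 1 and 2 (same query point, equal or smaller radius) and ignores the scope entirely; it is a placeholder for the actual containment tests developed in later chapters.

```python
import math

class Client:
    """Keeps previous results and consults them before contacting the server."""

    def __init__(self, server, containment_test, answer_locally):
        self.server = server                      # query -> (result, scope)
        self.containment_test = containment_test  # (query, old_query, scope) -> bool
        self.answer_locally = answer_locally      # (query, old_result) -> result
        self.cache = []                           # (old_query, result, scope) triples
        self.server_requests = 0

    def ask(self, query):
        # Try every stored result before paying for a server round trip.
        for old_query, result, scope in self.cache:
            if self.containment_test(query, old_query, scope):
                return self.answer_locally(query, result)
        self.server_requests += 1
        result, scope = self.server(query)        # result and scope in one message
        self.cache.append((query, result, scope))
        return result

# Hypothetical server-side dataset and helpers.
DATA = {"c": (1.0, 0.0), "d": (-1.0, 0.0), "e": (3.0, 0.0)}

def in_range(p, q, r):
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= r

def server(query):
    q, r = query
    result = {name: p for name, p in DATA.items() if in_range(p, q, r)}
    return result, None  # scope computation is omitted in this sketch

def same_point_test(query, old_query, scope):
    # Stand-in test: same query point and an equal or smaller radius.
    return query[0] == old_query[0] and query[1] <= old_query[1]

def filter_result(query, old_result):
    q, r = query
    return {name: p for name, p in old_result.items() if in_range(p, q, r)}

client = Client(server, same_point_test, filter_result)
first = client.ask(((0.0, 0.0), 2.0))   # sent to the server
second = client.ask(((0.0, 0.0), 1.0))  # answered from the cached result
third = client.ask(((2.0, 0.0), 1.0))   # different query point: sent to the server
print(client.server_requests)
```

Passing the containment test in as a callable keeps the flow independent of the query type, so a stronger test can replace the stand-in without changing the client loop.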
Also, the representation of a containment scope SQ has a direct impact on (1) the communication
cost of delivering SQ back to the client, (2) the containment test overhead incurred by the client in
deciding whether the new query point q′ ∈ SQ, and (3) the local storage cost incurred by the client for
maintaining SQ. Hence, great care must be used in selecting a containment scope representation
for our framework such that the benefit of spatial query containment is achieved while minimizing
overhead. Certainly, a containment scope can be represented as a polygon that consists of edges and
vertices. However, this approach will incur a large volume of data and high computation costs in
checking if a point is inside a polygon. Furthermore, for some queries (e.g. range queries with circular
search areas), polygon-based representation cannot provide an exact containment scope.
Instead, we choose to use individual data object locations to represent containment scope data.
Recall that a new query Q′ whose search area does not touch any non-result object is guaranteed to
have its result contained by RQ. However, the number of non-result objects can be very large; thus it is
impractical to transmit and to store all objects on the client. Instead, our approach tries to identify only
those representative non-result objects that affect the formation of containment scope, to minimize the
communication overhead. We refer to such objects as complementary objects. Referring to Figure 1.3(a),
the containment scope is composed of result objects {c, d} and complementary objects {a, b, e, f}. Notice
that some non-result objects such as g and h are skipped.
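One plausible reading of the object-based representation is sketched below for range queries. It treats the complementary objects as stand-ins for all non-result objects and accepts a new query only if its radius is no larger than the original and its search circle touches no complementary object. This is an illustrative assumption and not the containment test defined in later chapters; the coordinates and the names `contained` and `answer_locally` are hypothetical.

```python
import math

def inside(p, q, r):
    """True when point p lies inside the circular region cir(q, r)."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= r

def contained(q_new, r_new, r_old, complementary):
    """Conservative check: radius not larger, no complementary object touched."""
    if r_new > r_old:
        return False
    return not any(inside(p, q_new, r_new) for p in complementary.values())

def answer_locally(q_new, r_new, result_objects):
    """Filter the stored result objects down to those inside the new circle."""
    return sorted(name for name, p in result_objects.items()
                  if inside(p, q_new, r_new))

# Hypothetical coordinates; only the roles of the objects follow Figure 1.3(a).
result_objects = {"c": (1.0, 0.0), "d": (-1.0, 0.0)}
complementary = {"a": (-4.0, 0.0), "b": (0.0, 3.0), "e": (3.0, 0.0), "f": (0.0, -3.0)}
r_old = 2.0

q_new, r_new = (-1.5, 0.0), 2.0
if contained(q_new, r_new, r_old, complementary):
    print(answer_locally(q_new, r_new, result_objects))
```

The check costs one distance computation per complementary object, which is the kind of overhead the object-based representation is intended to keep small compared with a point-in-polygon test.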
1.5 Contribution and Organization
In the remainder of this paper, we continue to describe the concept of spatial query containment, a new
technique to reduce redundant queries by allowing clients to determine whether their maintained
spatial query results are sufficient to answer subsequent spatial queries. We propose containment
scope, containment testing logic, and a spatial query processing framework to efficiently realize this
new concept for a number of different query types and under a wide variety of circumstances that
include applications to GIS, BI, and LBS. We further conduct a comprehensive set of experiments to
evaluate the effectiveness of spatial query containment in relation to a representative sample of existing
techniques. The results consistently indicate the superiority of the spatial query containment approach
under a wide variety of scenarios.
In summary, the primary contributions presented in this work are as follows:
1. We introduce the concept of spatial query containment, which can eliminate the submission of spatial
queries when their results are locally available and thereby reduce redundant server requests,
query response time, client energy consumption, and bandwidth contention.
2. We propose a new notion of containment scope, which represents a spatial area corresponding to a
result set RQ of an LDSQ Q wherein a new LDSQ Q′ has a result set RQ′ that is fully covered by
RQ so long as the search area of Q′ is smaller than (or contained by) that of Q.
3. We devise efficient online containment scope computation algorithms for region (range, window)
queries, nearest neighbor (NN) queries, and reverse nearest neighbor (RNN) queries. Several
variants of these basic query types (such as kNN and RkNN) are also considered. Our computation
methods integrate containment scope evaluation with spatial query processing whenever possible
to minimize incurred processing overhead.
4. For each query type, we devise a containment test algorithm that uses a previously computed
containment scope to determine if a new spatial query result is fully covered by the previous one.
5. We present a spatial processing framework that incorporates online containment scope computation and containment testing over several existing communication models in support of a wide
variety of commercial applications. The effectiveness of spatial query containment over this model
is analyzed within the context of the assumptions defined in Chapter 3.
6. We conduct extensive theoretical analysis and empirical experiments to evaluate system performance in comparison with existing related approaches. In general, the amortized savings by using
spatial query containment are shown to outweigh the minimal overhead required during initial
query processing. Furthermore, spatial query containment outperforms all existing related works
under a wide variety of circumstances.
7. We implement a working auxiliary scope simulator to test the effectiveness of various techniques
under real-world application scenarios. The performance, reliability, and scalability of spatial
query containment in the example system are measured in relation to other auxiliary scope techniques as well as a baseline system with no query reduction mechanisms.
The remainder of the paper is organized as follows. Chapter 2 reviews literature used as a foundation
for this work as well as numerous existing methods for the reduction of spatial queries. Distinctions
between spatial query containment and these approaches are mentioned when appropriate. Chapter 3
provides an outline of the spatial query containment framework, states basic definitions and assumptions, and discusses the spatial query processing algorithms that form the basis of this work. Chapter 4,
Chapter 5, and Chapter 6 discuss spatial query containment for region (range and window) queries,
nearest neighbor (NN and kNN) queries, and reverse nearest neighbor (RNN and RkNN) queries, respectively,
together with our proposed approaches. Chapter 7 analyzes spatial query containment from a theoretical perspective, while Chapter 8 evaluates our proposed framework against related works over various situations.
In Chapter 9, the results from the construction of our auxiliary scope simulator and their implications
for the effectiveness of spatial query containment are considered. Finally, Chapter 10 concludes this
paper and states possible future research directions.
Chapter 2
Literature Review
2.1 Essential Concepts
This chapter reviews a variety of work that is relevant to the issue of spatial query containment as well
as to the construction of efficient and effective containment scope processing algorithms. We begin in
Section 2.2 by considering various indexing structures that are frequently used to facilitate the efficient
insertion, deletion, or updating of spatial data. Next, Section 2.3 examines the spatial queries supported
by our containment framework. We offer example usage scenarios for each query type as well as a
general overview of existing computational approaches. After considering spatial data indexing and
querying, the notion of auxiliary scope support is carefully examined in Section 2.4. Techniques in
this section attempt to accomplish similar goals as spatial query containment by forming a region
wherein query results can be reused. Current processing methods as well as the relative advantages
and disadvantages of each approach are reviewed. All auxiliary scope methods attempt to identify
future redundant queries by examining stored data that is associated with a specific previously issued
query. We close the chapter in Section 2.5 by examining the important role that various client caching
strategies play in the effective use of different auxiliary scopes.
2.2 Data Organization Techniques
We begin our comprehensive literature review with a look at how multi-dimensional data is typically
organized to provide for efficient access and modification. We begin with the classical B-tree index
and associated linearization techniques. Next, we turn our attention to custom spatial data indexing
methods such as the ubiquitous R-tree index, the quad tree index, and the D-tree index. Finally, we
offer a definition for a Voronoi cell and consider its applicability to spatial information processing.
Figure 2.1. Traditional data indexing methods: (a) sample dataset; (b) B-tree index (x-dimension); (c) B-tree index (y-dimension); (d) B-tree index (z-order curve)
2.2.1 B-Tree Index

One principal research issue in the area of spatial information management has been the development of
efficient data storage structures that can be used to hold data that is relevant to a given system. Spatial
query processing represents a unique technical challenge given that these requests typically restrict the
dataset using two separate fields simultaneously (e.g. latitude and longitude). Classical disk-based
indexing structures such as the ubiquitous B-tree and its variant the B+-tree can only efficiently index
information in a single dimension [1].
Recall that the B+-tree index sorts data keys based on some relative ordering property. Each leaf
node stores the keys for data objects, while each internal node of the tree stores pointers to children
nodes. These child nodes are responsible for storing all keys that fall inside of some interval that
is specified by the key values stored by the parent node. For example, we may produce an index based
on the age of students. A parent node may store pointers to three children as well as the key values 21
and 25. This means that the first child holds all data keys with a value that is less than 21, the second
child holds all data keys with a value between 21 (inclusive) and 25 (exclusive), and the third child
holds all data keys with a value of 25 or greater. Any internal node that has n children will have n − 1
key values to facilitate tree traversal. As a result, those keys with similar values tend to be grouped
into the same index node. In our example, younger students (age 18-20) would be grouped together
into one part of the tree, while older students (age 21-24) would be grouped into a different branch of
the tree. Unlike the classical B-tree, the B+-tree always stores data objects at the leaf level of the tree to
facilitate sequential scanning of the dataset.
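The routing rule in the student-age example can be sketched with a binary search over the separator keys of a single internal node. The function name `child_index` is hypothetical, and the sketch covers only the choice of a child, not insertion, splitting, or merging.

```python
from bisect import bisect_right

def child_index(separator_keys, key):
    """Index of the child responsible for key.

    Child 0 holds keys below the first separator, child i holds keys from
    separator i-1 (inclusive) up to separator i (exclusive), and the last
    child holds keys at or above the last separator.
    """
    return bisect_right(separator_keys, key)

separators = [21, 25]  # two key values, hence three children
for age in (18, 21, 24, 25, 30):
    print(age, "-> child", child_index(separators, age))
```

Ages 18 through 20 route to the first child, ages 21 through 24 to the second, and ages 25 and above to the third, matching the intervals described in the example.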
The typical B+-tree usage scenario involves accessing items on disk, and we generally size each tree
node such that it is equal to a single disk page. Because disk pages are large in size relative to a typical
key size, each tree node can store many keys and each internal node will have many children nodes.
The number of children that belong to each node is referred to as the fanout of the tree. Each data
object's key is inserted as an entry into a leaf node of the B-tree based on its key value. The leaf node
is chosen by starting at the root of the tree and following the branch of the tree that is responsible for
storing the key range in which the new data key value lives. All nodes have a specified minimum and
maximum capacity, and the B-tree nodes are recursively split or merged as needed to accommodate data
updates.
As previously mentioned, the B+-tree is a popular method for indexing data in a single dimension.
However, spatial data often requires that two or more dimensions be considered simultaneously.
This requirement is not natively supported by the B+-tree structure. Consider the example dataset given
in Figure 2.1. Here, we have a set of eight data points (a-h) that need to be indexed based on spatial
locality. However, the B+-tree index can only consider a single attribute. In Figure 2.1(b), we consider
just the x-dimension of each data object. The root node has three children (N1, N2, and N3), and each
child is responsible for storing a certain subset of possible x-coordinate values. N1 stores values in the
interval (−∞, dx), N2 stores values in the interval [dx, fx), and N3 stores values in the interval [fx, ∞).
Here, ax refers to the value of the x-coordinate of data object a. Examining this grouping, we notice
that data objects with similar x-coordinates are grouped together; however, this does not necessarily
imply spatial locality. For instance, objects b and g are grouped together but are not actually in close
proximity. Figure 2.1(c) illustrates a similar process by which we index our spatial data points by their
y-coordinates. The resulting index also fails to truly represent the spatial locality of stored objects as is
evident from the grouping of objects f, g, and h into a single index node.
A final organizational technique for the B+-tree index utilizes linearization techniques such as z-curve ordering (pictured in Figure 2.1(d)) and Hilbert curve ordering [2, 3]. These processes attempt to
merge two different dimensions of information in a way that maintains spatial locality. That is, the
resulting linear B+-tree index represents spatial information by projecting multi-dimensional objects
onto a one-dimensional space. The example dataset in Figure 2.1(d) shows the effect of z-ordering on the
dataset. We assign each data point a location on the curve that minimizes the Euclidean distance between
the object's curve location and real location. Next, we assign one end of the z-order curve a small value and
then monotonically increase z-coordinate values as the curve is traversed. This approach produces the
best spatial locality of all B+-tree indexes but still is limited by the fact that only a single dimension of
information can be represented in the final index. This is clearly illustrated by the grouping of objects d
and f into a single node despite the relatively large Euclidean distance that separates them. In general,
spatial query evaluation performs poorly under this scenario, so researchers have endeavored to create
new data organizations that actually maintain full dimensionality in the resulting index. These data
indexes primarily involve either grid or tree structures and have experienced varying degrees of success
[1, 4, 5].
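The limitation of linearization can be seen in a small sketch of z-ordering. The code uses the common bit-interleaving (Morton code) formulation on integer grid coordinates, which is an assumption made here; the description above is geometric and does not prescribe this encoding. The function name `morton_key` and the sample points are hypothetical.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y.

    Bits of x occupy the even positions and bits of y the odd positions,
    so sorting by the key visits grid cells along a z-shaped curve.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

# Hypothetical grid points: p and s are diagonal neighbors, t is far from both.
cells = {"p": (3, 3), "s": (4, 4), "t": (7, 0)}
order = sorted(cells, key=lambda name: morton_key(*cells[name]))
print(order)
```

Although p and s occupy adjacent grid cells, the distant point t falls between them in key order, which is the kind of locality loss that a one-dimensional index over linearized keys cannot avoid.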
2.2.2 R-Tree Index

Figure 2.2. Spatial data indexing methods: (a) R-tree index; (b) Voronoi cell
In 1984, Antonin Guttman introduced the R-tree indexing structure, which offers efficient storage
in both memory and disk, incurs minimal update cost, and indexes information using all spatial
attributes. R-trees group nearby data points together into minimum bounding rectangles (MBRs)
[4, 5, 6]. A minimum bounding rectangle represents the smallest rectangle that contains all of the data
points with which the MBR is associated. As the R-tree becomes full, additional levels are added, and
high level MBRs are determined by the smallest rectangle that is needed to contain the MBRs of all
children nodes. All data objects are always stored at the lowest level of the tree. Much like the B+-tree
indexing structure, the R-tree has a minimum and maximum node capacity. Underflow and overflow
are handled through recursively merging or splitting nodes (and MBRs) in the tree as needed. However,
unlike the B+-tree structure in which sibling nodes store disjoint subsets of the data key range, the MBRs
associated with sibling R-tree nodes can overlap. Thus, a query may need to follow multiple paths in
the R-tree in order to consider all possible results. Many popular spatial queries are supported well
under this structure. For example, window queries over a particular region can be answered quickly
by traversing the children of any node whose MBR overlaps the search region.
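The MBR computation and the window query traversal described above can be sketched on a small hand-built tree. This is not a full R-tree: the grouping of objects into nodes is fixed by hand, there is no insertion or splitting logic, and the class `Node`, the helper names, and the coordinates are hypothetical.

```python
def mbr(boxes):
    """Smallest rectangle (xmin, ymin, xmax, ymax) enclosing all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def overlaps(a, b):
    """True when rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

class Node:
    """Internal node (children) or leaf node (entries: name -> (x, y))."""

    def __init__(self, children=(), entries=None):
        self.children = list(children)
        self.entries = dict(entries or {})
        boxes = [child.box for child in self.children]
        boxes += [(x, y, x, y) for x, y in self.entries.values()]
        self.box = mbr(boxes)  # MBR of everything stored below this node

def window_query(node, window):
    """Names of objects inside window, pruning subtrees whose MBR misses it."""
    if not overlaps(node.box, window):
        return []
    hits = [name for name, (x, y) in node.entries.items()
            if overlaps((x, y, x, y), window)]
    for child in node.children:
        hits.extend(window_query(child, window))
    return sorted(hits)

n1 = Node(entries={"a": (1, 1), "b": (2, 3)})
n2 = Node(entries={"c": (6, 1), "d": (7, 2)})
n3 = Node(entries={"e": (4, 6), "f": (5, 7)})
root = Node(children=[n1, n2, n3])

print(root.box)
print(window_query(root, (5, 0, 8, 3)))
```

For the window (5, 0, 8, 3), only the MBR of the second leaf overlaps the search region, so the other two leaves are skipped without inspecting their entries.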
Continuing with the same dataset used in our B+-tree example, Figure 2.2(a) illustrates how data
objects a-h can be indexed using an R-tree. Once again, we have a root node with three children nodes
N1, N2, and N3. Each internal node has an MBR (shown in the figure) that contains all of its children
objects. Notice that the R-tree offers a much higher degree of spatial locality than any of the previous
B-tree solutions since it incorporates both dimensions into the index structure. The outer rectangle in
the figure represents the MBR for the root node and illustrates how child MBRs can be used to create a
new MBR for a higher level in the tree.
Later variants of the general R-tree structure resulted in popular optimizations such as the R+-tree
and the R*-tree. These refined data structures cemented the R-tree as a ubiquitous choice for indexing
spatial information [4, 6]. When grouping objects, R-trees attempt to minimize area using various
heuristics that trade computational speed for algorithm effectiveness. The R+-tree attempts to avoid
the overlapping MBRs of internal index nodes. However, this complicates the grouping and updating
logic for the overall index. In contrast, R*-trees consider area, overlap, perimeter, and node fill factor
when making decisions about how to group, split, and merge data objects. The choice and relative
influence of each of these factors is based largely on empirical results. R*-trees provide very good
performance with low-dimensionality datasets and are one of the most commonly used data structures
for indexing massive spatial datasets. Consequently, we adopt this variant as the primary spatial
indexing method used for this work. Any deviation from this decision will be noted as appropriate.
When issuing spatial queries against the R*-tree, we adopt the distance browsing technique proposed
by Hjaltason et al. [7]. They use an incremental, greedy approach to locate nearby objects. A priority
queue stores R*-tree nodes and is pre-populated with the root of the tree. For each iteration of the
algorithm, we remove the node that is closest to the query location from the queue. If the removed node is
an internal node, we add all of its children to the priority queue for further analysis. Otherwise, we
examine records in the leaf node as potential result objects.
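The loop described above can be sketched as follows. This is a minimal illustration under our own assumptions: the tree is a hand-built nest of dictionaries rather than a real R*-tree, the helper names are hypothetical, and data objects are pushed back into the priority queue with their exact distance so that they are reported in increasing distance order.

```python
import heapq
import math
from itertools import count

def mindist(q, rect):
    """Smallest Euclidean distance from point q to rect = (xmin, ymin, xmax, ymax)."""
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)

def browse(root, q):
    """Yield (distance, name) for data objects in increasing distance from q."""
    tie = count()  # unique tie-breaker so heap entries never compare payloads
    heap = [(mindist(q, root["mbr"]), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if not isinstance(item, dict):
            yield dist, item  # a data object: report it
        elif "children" in item:
            for child in item["children"]:  # internal node: enqueue child nodes
                heapq.heappush(heap, (mindist(q, child["mbr"]), next(tie), child))
        else:
            for name, p in item["objects"]:  # leaf node: enqueue its records
                heapq.heappush(heap, (math.dist(q, p), next(tie), name))

# Hypothetical two-leaf tree.
leaf1 = {"mbr": (0, 0, 2, 2), "objects": [("a", (0, 0)), ("b", (2, 2))]}
leaf2 = {"mbr": (5, 5, 8, 8), "objects": [("c", (5, 5)), ("d", (8, 8))]}
tree = {"mbr": (0, 0, 8, 8), "children": [leaf1, leaf2]}
order = [name for _, name in browse(tree, (3, 3))]
```

Because `browse` is a generator, a caller can stop after the first k objects, and whatever remains in the queue is still ordered by mindist.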
2.2.3 Other Spatial Indexes

Various other spatial indexes have been proposed for the effective management of complex data. Two
interesting approaches that exemplify overall design patterns for spatial indexes are the quad tree and
the D-tree. The quad tree recursively divides the data space into quadrants based on the relative
location of data objects. When a particular quadrant in the index is filled to capacity, the restructuring
logic splits the quadrant once in the x-dimension and once in the y-dimension. Typically, the goal of
the splitting routine is to perform more recursive splits in areas of the dataset that are exceptionally
dense. Compared with the R-tree approach, quad trees offer a more rigid index structure that provides
more predictable behavior at the expense of flexibility.
The second type of alternative spatial index is the D-tree, which indexes the data space based on
regional divisions. D-trees divide the entire data space into non-overlapping polygonal regions. They
index this information in a way that allows for quick determination of membership in a particular
region. Such a membership determination is referred to as a planar point query. Many spatial queries
can be reduced to planar point queries, so such an index can be quite useful. D-trees also provide clients
with information about the specific partition, or zone, in which their query was issued.
2.2.4 Voronoi Cells

For our final data structure discussion, we consider the important role that Voronoi cells play in
classifying locations in the data space. A Voronoi cell for some data object $o$ in the dataset is the convex
polygon formed by intersecting the half-planes bounded by the perpendicular bisectors of the line
segments from $o$ to every other object $o' \neq o$ in the dataset. Let $\perp_{o,o'}$ represent the perpendicular bisector
of the line segment between data objects $o$ and $o'$. It follows that $\perp_{o,o'}$ divides the data space into two
disjoint subsets. The region that contains object $o$ represents all points in the data space closer to $o$ than
to $o'$. Similarly, the region that contains object $o'$ represents all points in the data space closer to $o'$ than
to $o$. For convenience, we let $\perp_{o,o'}$ also refer to the subregion of the data space that contains object $o$. Then the
Voronoi cell $V(o)$ for data object $o$ can be represented as $V(o) = \bigcap_{o' \neq o} \perp_{o,o'}$. In addition, no two Voronoi
cells overlap (i.e., $V(o) \cap V(o') = \emptyset$ for $o \neq o'$). Finally, another useful property for subsequent discussion is that
the union of the Voronoi cells of all data objects covers the entire data space. Extending our observation
about the two disjoint regions formed from a perpendicular bisector $\perp_{o,o'}$, we can conclude that any
point inside of $V(o)$ is closer to object $o$ than to any other object $o'$ in the dataset.
Consider the example Voronoi cell depicted in Figure 2.2(b). The shaded region represents the
Voronoi cell V(c) for object c. (Note that point q is a query point and is not a part of the dataset.)
Then, the sides of the Voronoi cell are formed by perpendicular bisectors $\perp_{c,a}$, $\perp_{c,b}$, $\perp_{c,d}$, and $\perp_{c,e}$. Other
perpendicular bisectors such as $\perp_{c,f}$, $\perp_{c,g}$, $\perp_{c,h}$, and $\perp_{c,i}$ do not affect the final Voronoi cell since they
are less restrictive than the original bisectors. The notion of Voronoi cells is essential to spatial query
containment for nearest neighbor and k nearest neighbor queries. For example, we know that the closest
data object to query point q is object c by virtue of the fact that $q \in V(c)$.
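The link between Voronoi cell membership and the nearest neighbor can be checked directly from the half-plane definition. The sketch below is illustrative only: the function names and the four-point dataset are hypothetical, and points that lie exactly on a bisector are assumed to count as members of the cell.

```python
import math

def closer_side(p, o, other):
    """True if p lies in the half-plane of the (o, other) bisector that contains o."""
    return math.dist(p, o) <= math.dist(p, other)

def in_voronoi_cell(p, o, dataset):
    """p is in V(o) iff it is on o's side of every bisector between o and another object."""
    return all(closer_side(p, o, other) for other in dataset if other != o)

def nearest(p, dataset):
    """Brute-force 1-NN of p, used only to cross-check the cell membership test."""
    return min(dataset, key=lambda o: math.dist(p, o))

# Hypothetical dataset and query point.
data = [(0, 0), (4, 0), (0, 4), (4, 4)]
q = (1, 1)
```

For this dataset, `in_voronoi_cell(q, (0, 0), data)` holds exactly because `(0, 0)` is the nearest object to `q`.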
2.3 Spatial Query Types
With essential data organization techniques now firmly established, we turn our attention to common
questions asked about spatial information. This section considers traditional core spatial queries that
include region (range, window) queries and nearest neighbor (NN, kNN) queries. In addition, we
review the more recent and complex reverse nearest neighbor (RNN, RkNN) query family. These
queries represent a comprehensive set of popular queries that should be supported by any auxiliary
scope approach. As such, we explore methods for supporting each of these spatial query types in the
containment scope framework throughout the rest of this paper. In addition, our framework is easily
extensible to support additional query types as necessary. Each query type is first defined. We then
offer real world situations in which such a query would be useful. Finally, we offer an overview of
common query evaluation techniques. Since query evaluation is highly dependent on the type of data
index available, we restrict our discussion to algorithms that are appropriate for the R-tree data index
or its variants. Recall that our spatial query containment framework uses the R-tree index because of
its widespread acceptance and efficient performance.
Figure 2.3. Spatial query types: (a) Range query; (b) Window query; (c) NN query; (d) Reverse NN query
2.3.1 Region Query

The first basic type of spatial query is the region query, which returns all objects in the dataset that lie
within a specified area. Typically, we define the region to be searched in terms of a central query point q
as well as some set of supplemental query parameters given by E. Different subtypes of region queries
exist based on the shape of the specified region. Two common categories that will be considered in this
paper include the range query and the window query.
Range queries attempt to identify all objects inside of a circular region centered at the query point q
with a radius r. In this case, r is the sole parameter included in set E. A sample range query is given in
Figure 2.3(a). Here, the shaded region given by cir(q, r) represents the search area. It follows that objects
c and d are returned as query results since $c, d \in cir(q, r)$. All other objects are outside of the circle and
are not returned to the client.
The second type of region query is the window query, which attempts to identify all objects inside
of a rectangular region centered at the query point q with extents given by length l and height h. Here,
the total size of the rectangle is $2l \times 2h$, and the passed query parameters represent the horizontal and
vertical distances between the query point and the rectangle boundary. A sample window query is
given in Figure 2.3(b). Here, the shaded region given by rect(q, l, h) represents the search area. It follows
that objects c and d are returned as query results since $c, d \in rect(q, l, h)$. All other objects are outside of
the rectangle and are not returned to the client.
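The two region predicates can be written down directly from these definitions. The sketch below scans every object rather than using an index, treats boundary points as inside the region (an assumption), and uses hypothetical object coordinates.

```python
import math

def range_query(objects, q, r):
    """Objects inside the circle cir(q, r): Euclidean distance to q is at most r."""
    return {name for name, p in objects.items() if math.dist(p, q) <= r}

def window_query(objects, q, l, h):
    """Objects inside rect(q, l, h): at most l from q horizontally and h vertically."""
    return {name for name, (x, y) in objects.items()
            if abs(x - q[0]) <= l and abs(y - q[1]) <= h}

# Hypothetical dataset; q plays the role of the central query point.
objects = {"a": (0, 0), "c": (4, 5), "d": (6, 5), "e": (9, 9)}
q = (5, 5)
in_range = range_query(objects, q, 2)
in_window = window_query(objects, q, 2, 1)
```

Note that the window spans 2l by 2h in total, since l and h are measured from the query point to the boundary.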
Consider some examples where a region query could be useful:
• A military base may want to locate all fighters within range of a particular target. A range query
can accomplish this task.
• A tourist may want to find points of interest on a certain city block prior to leaving the area. This
is precisely the case that is solved by a window query.
• Pennsylvania State University officials may want to locate potential food distributors that are
within 10 miles of the University Park campus.
• A retailer may want to identify sales that occurred during a particular timeframe and within a
certain price range. These two parameters can be simultaneously restricted using a window
query with time as the first dimension and price as the second dimension.
Several mechanisms exist for processing region query information. Recall that an R-tree consists of
both internal and external nodes and that each node has an MBR. The general strategy for answering a
region query is to explore the children of all nodes in the R-tree that have an MBR which overlaps the
query search region. At the leaf level, we include all objects that are located inside of the query range.
Various algorithms process nodes in different orders and have different termination criteria. The first
two obvious choices are to traverse the tree using either a breadth first search (BFS) or a depth first
search (DFS). We manage such searches using a queue or a stack, respectively. We only continue a search path if the
current node's MBR overlaps with the query region.
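A depth first traversal with MBR pruning might look like the following sketch. The dictionary-based tree, the helper names, and the coordinates are hypothetical stand-ins for a real R-tree; the window is given by its corner coordinates for brevity.

```python
def overlaps(a, b):
    """True if rectangles a and b, each (xmin, ymin, xmax, ymax), intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(rect, p):
    """True if point p lies inside rect (boundaries included)."""
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]

def window_search(root, window):
    """Depth first search that only descends into nodes whose MBR overlaps the window."""
    results, stack = [], [root]
    while stack:
        node = stack.pop()
        if not overlaps(node["mbr"], window):
            continue  # prune: nothing below this node can be in the window
        if "children" in node:
            stack.extend(node["children"])
        else:
            results.extend(name for name, p in node["objects"] if contains(window, p))
    return results

# Hypothetical two-leaf tree.
leaf1 = {"mbr": (0, 0, 2, 2), "objects": [("a", (0, 0)), ("b", (2, 2))]}
leaf2 = {"mbr": (5, 5, 8, 8), "objects": [("c", (5, 5)), ("d", (8, 8))]}
tree = {"mbr": (0, 0, 8, 8), "children": [leaf1, leaf2]}
```

Replacing the list used as a stack with a `collections.deque` and popping from the left would give the breadth first variant.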
An alternative search process follows the distance browsing technique [7]. In this case, the algorithm
explores nodes based on their minimum distance (or mindist) from the query point q. This increases the
speed with which data objects are found, since result objects are likely to lie near the query
point. A priority queue maintains both the set and the order of the items that still need to be explored.
The distance browsing method has the beneficial byproduct that the algorithm terminates
with all unexplored data nodes already sorted by the mindist metric in the priority queue. We exploit
this fact during the construction of several spatial query containment algorithms.
2.3.2 Nearest Neighbor Query

The second basic type of spatial query is the nearest neighbor (NN) query, which returns those objects in
the dataset that are closest to a given query point q. Two different subtypes of nearest neighbor queries
exist based on the number of objects that are to be returned to the client. The 1-NN query identifies
the closest object to a given query point, while the k-NN query returns the k closest objects to the query
point.
First, we consider the 1-NN query. This query only has a single parameter q that denotes the location
at which the query is to be issued. The algorithm returns the data object that minimizes the Euclidean
distance between the query point and the data object. If we denote the set of all data points as O, then
the 1-NN result object o satisfies the property $|q, o| \le |q, o'| \;\forall o' \in O$. Note that the cardinality of the result
set is always one by definition.
Furthermore, we notice that the 1-NN query is relatively more difficult to solve than a standard
region query. This follows from the observation that the query answer is dependent on not only the
query location and data object location but also on the relative location of all other objects in the dataset.
An example nearest neighbor query is given in Figure 2.3(c). Here, a 1-NN query is issued at point q.
The entire result set consists only of object d, as it is the closest to point q. To see this, observe that the
circle cir(q, |q, d|) is empty, so no other object in the dataset can possibly be closer to q than object d.
The second type of nearest neighbor query is the k-NN query. In this case, we return the k objects
in the dataset that are closest to the query point q. That is, we return the k data objects that minimize
the sum of the Euclidean distances between the query point q and each of k different data objects. The
result set R of any k-NN query satisfies the property $|q, o| \le |q, o'| \;\forall o \in R, o' \in O - R$. Furthermore, we
observe that the cardinality of the result set is always k. Finally, it is worth noting that the 1-NN query
is simply a specific type of the k-NN query with k = 1.
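The k-NN property can be illustrated with a brute-force evaluation that ignores the index entirely. The function name and the dataset below are hypothetical; a production system would use distance browsing over an R-tree rather than a full scan.

```python
import heapq
import math

def knn(objects, q, k):
    """Names of the k objects closest to q, ordered by increasing distance."""
    return heapq.nsmallest(k, objects, key=lambda name: math.dist(objects[name], q))

# Hypothetical dataset on a line, with the query point at x = 2.5.
objects = {"a": (0, 0), "b": (1, 0), "c": (3, 0), "d": (7, 0)}
q = (2.5, 0)
result = knn(objects, q, 2)

# Every result object is at least as close to q as every non-result object.
property_holds = all(
    math.dist(objects[o], q) <= math.dist(objects[p], q)
    for o in result for p in objects if p not in result
)
```

Setting k = 1 reduces the call to the 1-NN query.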
Consider some examples where a nearest neighbor query could be useful:
• A motorist may want to find the five closest gas stations to his/her current geographic location.
• Air force pilots may need to identify the closest enemy fighter in order to engage in combat
effectively.
• Given projected audience age and income demographics, a movie studio may attempt to identify
past movies that are similar to a proposed motion picture in an effort to predict sales.
Several mechanisms exist for processing nearest neighbor query information. Once again we assume
that an R-tree index exists on the data to be processed. As such, we know that each node has an associated
MBR to indicate the region that is covered by child nodes. The primary method used for answering NN
queries is that of the distance browsing technique [7]. Recall that this algorithm processes nodes in order
of their minimum distance from the query point q. During each iteration of the algorithm processing
logic, we dequeue a node entry from a priority queue and insert its children back into the priority queue
for future processing. Furthermore, the algorithm examines data objects precisely in increasing order of
their Euclidean distance from the query point. It follows that the first object (excluding internal nodes)
examined is precisely the single result set object in the case of a 1-NN query. By extension, the first k
objects located by the algorithm are precisely the k objects in the result set of a k-NN query. As in the
case of region queries, the distance browsing technique terminates with a priority queue of unexplored
nodes that are sorted by the mindist metric. This fact will be useful in constructing a containment scope
for nearest neighbor queries.
2.3.3 Reverse Nearest Neighbor Query

With the two basic spatial query types now defined, we turn our attention to the reverse nearest neighbor
query (RNN). Two different subtypes of reverse nearest neighbor queries exist and are similar to those
defined for the nearest neighbor query. The two categories of RNN queries include the R1NN query
and the RkNN query. We review the general idea of RNN queries below and then supplement the
discussion with details that are specific to each subtype of the RNN query category.
Recall that a nearest neighbor query identifies the object in a dataset that is closest to a query
point q with respect to all other data objects. The term closest allows for some ambiguity in this
definition which can be eliminated by introducing a clearly defined distance function. Most commonly,
the Euclidean distance metric is used to perform comparisons. The RNN query attempts to identify the
same relationship as a NN query but does so in the opposite direction. That is, it identifies all objects in
a dataset that would have the query point as one of their closest points if the query point were added
to the dataset. Unlike the NN query type, the cardinality of an RNN result set is not fixed and also can
potentially be empty.
Considering the R1NN query type, the result set consists of all data objects o that are closer to the
query point q than to all other data objects in the dataset O. Any result object o of an R1NN query
satisfies the property $|o, q| \le |o, o'| \;\forall o' \in O, o' \neq o$. Figure 2.3(d) shows an example of an R1NN query issued at
point q. The result set of this query includes objects b, c, and d. Notice that the corresponding circles
$cir(b, |b - q|)$, $cir(c, |c - q|)$, and $cir(d, |d - q|)$ contain no other data objects. It follows that q is the closest
point to each of these data objects. On the other hand, objects a, e, f, g, and h are closer to other dataset objects
than they are to the query point q.
Next, we examine the RkNN query type. There is a natural analog between the relationship of NN
and kNN queries, and this relationship can be extended to cover the case of reverse NN and reverse
kNN (RkNN) queries as well. Specifically, a kNN query looks for the closest k objects to a query point,
while an RkNN query searches for any object that has the query point as one of its closest k objects. Any
result object o of an RkNN query satisfies the property $|o, q| \le |o, o'| \;\forall o' \in Z$. Here, Z represents some set
that satisfies (1) $Z \subseteq O$ and (2) $|Z| = |O| - k$. Once again, the R1NN query is simply a specialized case of
the RkNN query with k = 1.
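A brute-force reading of the RkNN definition is given below as a sketch. It counts, for each object, how many other objects are strictly closer to it than the query point, and keeps the object when that count is below k. The names and coordinates are hypothetical, and the quadratic scan is for illustration only.

```python
import math

def rknn(objects, q, k=1):
    """Objects that would have q among their k nearest neighbors if q joined the dataset."""
    result = set()
    for name, p in objects.items():
        d_q = math.dist(p, q)
        # Number of other data objects strictly closer to this object than q is.
        closer = sum(1 for other, p2 in objects.items()
                     if other != name and math.dist(p, p2) < d_q)
        if closer < k:
            result.add(name)
    return result

# Hypothetical dataset on a line, with the query point at x = 5.
objects = {"a": (0, 0), "b": (1, 0), "c": (4, 0), "d": (10, 0)}
q = (5, 0)
r1 = rknn(objects, q, 1)
r3 = rknn(objects, q, 3)
```

In this example the R1NN result contains two objects even though the 1-NN of q is a single object, which illustrates that the cardinality of an RNN result is not fixed.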
The RNN query type is analogous to the nearest neighbor query but cannot be addressed
by that basic query type because of the inherent asymmetry between the two query definitions. To
illustrate that an R1NN query cannot be solved using existing 1NN or range query types, consider
Figure 2.4. In Figure 2.4(a), we issue a 1NN query at point q and obtain the result of data object d.
However, d is not a member of the R1NN result set, as data point c is closer to d than q is. (That is, $L' < L$.)
In Figure 2.4(b), we attempt to use a range query issued at point q to identify the result of an R1NN
query issued at q. Using a radius of r, we ensure that every included data object has q as its closest point
but accidentally eliminate the legitimate result object a. If we expand the radius to $r'$ so as to include
data point a, we accidentally include non-result objects c and d that are closer to each other than they are
to query point q. Thus, we conclude that basic spatial queries cannot easily be used to solve an R1NN
(and, by extension, RkNN) query.
Figure 2.4. Basic spatial query attempts to solve RNN query: (a) NN query; (b) Range query
Consider some examples where an RNN query could be useful:
• A pizza chain wants to determine which customers might frequent a new store added in some
geographic region. They may also want to know what other pizza restaurants might lose business.
An RNN query can identify both pieces of information.
• If we define the closeness of two objects to be some sort of similarity function, then we can
compare the effect of adding different products into a market. For instance, a movie studio could
rate the similarity between multiple movies being released to that of a previous blockbuster and
select the one that would provide the highest predicted viewership and, by extension, profitability.
• Schools could use RNN queries to identify possible papers that have been victims of plagiarism
by some new but dubious work by identifying common phrasing and content. Once again, the
distance function of the RNN query can be designed in a way to identify the suspect commonality
between papers.
• In the United States military, the Joint Chiefs of Staff may need to conduct simulations to determine
where command stations should be constructed to serve as many troops as possible. RkNN queries
can help to answer these questions.
• On a related note, military commanders can alert troops of new enemies in a field of combat by
identifying those troops that are closer to the enemies in question than to other friendly or hostile
forces. RNN queries provide precisely this ability.
We now turn our attention to various methods for computing RNN queries. There has been substantial work done in the area of RNN query analysis since the query type was formally introduced in
2000 by Korn. This survey considers four popular methods [8, 9, 10, 11] for computing the results of an
RNN query and its variants. We focus on algorithms that would provide an exact result set in order to
satisfy the accuracy requirements of our spatial query containment framework.
The first approach for computing an RNN query was proposed in a paper by Korn on the topic
of influence sets. Korn introduced the concept of reverse nearest neighbor (RNN) queries and several
straightforward variants such as the reverse k nearest neighbor (RkNN) query [8]. In addition, he
distinguished between two types of RNN queries: monochromatic RNN queries and bichromatic RNN
queries. In the former case, each point in the dataset considers all other points as possible nearest
neighbor candidates. An example of this scenario might be a commuter looking for nearby people
with whom to visit. In contrast, a bichromatic dataset is divided into two distinct categories, and each
category only considers candidates from the other category as possible neighbors. We often color the
points in our dataset as either red or blue depending upon designated group membership. In such a
case, red points identify their nearest neighbor from all existing blue points and vice-versa. A real-world
example of such a scenario might be a categorization of police officers (colored red) and the citizens that
they are charged with protecting (colored blue).
Figure 2.5. RNN evaluation techniques: (a) Korn RNN processing algorithm; (b) Stanoi RNN processing algorithm; (c) Tao RNN processing algorithm; (d) Lee RRNN processing algorithm
In this inaugural solution to the RNN query problem, Korn pre-computed the NN query result for
each point in the dataset. He then maintained an R-tree that was populated with NN circles instead
of data points. An NN circle for a given data object, o, is the circle centered at o whose circumference
touches the nearest neighbor $o'$ of o. We denote such a circle as $cir(o, |o, o'|)$. Sample NN circles for
our running example dataset are given in Figure 2.5(a). An RNN query then reduces to identifying
all circles that contain the query point. This was an effective solution for static datasets in which the
expensive pre-computation step was only performed once. However, dynamic datasets exhibited poor
performance since the update process was computationally expensive. Upon insertion of a new object,
the circles of every object that had the new object as a nearest neighbor had to be updated. The NN result
for the new object also had to be computed and inserted into the R-tree. To support the
addition of new data objects, Korn also maintained a second R-tree that contains only the data objects to
facilitate NN query evaluation.
This work also extends the general algorithm to support the RkNN query type by storing the circle
for the kth closest object to each object in the dataset. However, the approach assumes that k is both
fixed and known in advance. Unfortunately, this generally is not the case.
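The NN-circle idea can be sketched without an index as follows. In this illustration the pre-computed radii are kept in a dictionary and the query scans all circles linearly, whereas the approach described above stores the circles in an R-tree; the function names and data are hypothetical.

```python
import math

def build_nn_circles(objects):
    """Pre-compute, for each object, the radius of its NN circle (distance to its nearest neighbor)."""
    return {
        name: min(math.dist(p, p2) for other, p2 in objects.items() if other != name)
        for name, p in objects.items()
    }

def rnn_via_circles(objects, radii, q):
    """An object is an RNN result iff q falls inside (or on) its NN circle."""
    return {name for name, p in objects.items() if math.dist(p, q) <= radii[name]}

# Hypothetical dataset on a line, with the query point at x = 5.
objects = {"a": (0, 0), "b": (1, 0), "c": (4, 0), "d": (10, 0)}
radii = build_nn_circles(objects)   # one-time pre-computation step
result = rnn_via_circles(objects, radii, (5, 0))
```

The sketch also makes the update cost visible: inserting a new object can shrink the radius of any existing circle, so `radii` would have to be revised for every affected object.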
Shortly after the publication of Korn's technique, Stanoi developed a more efficient method of RNN
computation on dynamic datasets [11]. This new method addressed the large update cost associated
with the R-tree structure of NN circles in Korn's original design. The new technique for computing
RNN queries avoids pre-computing NN circles by observing that there can be at most six RNN query
results when monochromatic queries are considered. This last restriction is an important one, as the
assumption does not hold for bichromatic cases. Beyond simply identifying that there can only be six
monochromatic RNN query results, Stanoi partitioned the dataspace into six sections in such a way
that each section could contain at most one of the RNN result objects. A sample partitioning scheme is
provided in Figure 2.5(b). The query and dataset are the same as in the example used to illustrate Korn's
approach. We depict the six partitions as A, B, C, D, E, and F. Notice that the partitions are centered
around the RNN query point and that each consists of an infinite-length sector with an interior angle of 60
degrees. Finally, notice that at most one result object exists in each partition as is to be expected.
To compute an RNN query result, the authors use an R-tree structure that contains only the points
of the dataset. (In fact, other spatial indexes can be used. For example, the R*-tree structure can actually
yield better performance in most cases.) To begin, the algorithm issues six NN queries from the query
point but restricts the results to the sector under consideration. Next, the approach issues additional NN
queries from each of the previously identified NN points to determine if they are in fact RNN solutions.
That is, the distance between one of the candidate result objects and its nearest neighbor in the data set
must be greater than the distance between the candidate result object and the query point for it to be
an actual result object for the RNN query. Unlike the solution by Korn, this technique lacks scalable
support for RkNN query types and is thus ill-suited for environments where such queries are
needed.
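The two-step filter-and-verify procedure can be sketched as follows. This is an illustration under our own assumptions: sectors are numbered 0 to 5 counterclockwise from the positive x-axis, the per-sector nearest neighbor is found by a linear scan instead of a constrained NN query on an R-tree, and the dataset is hypothetical.

```python
import math

def sector(q, p):
    """Index (0..5) of the 60-degree sector around q that contains p."""
    angle = math.atan2(p[1] - q[1], p[0] - q[0]) % (2 * math.pi)
    return min(int(angle // (math.pi / 3)), 5)  # clamp guards against rounding at 2*pi

def rnn_six_sectors(objects, q):
    # Step 1 (filter): the nearest object to q in each sector is the only candidate there.
    best = {}
    for name, p in objects.items():
        s, d = sector(q, p), math.dist(p, q)
        if s not in best or d < best[s][0]:
            best[s] = (d, name)
    # Step 2 (verify): keep a candidate only if no data object is closer to it than q.
    result = set()
    for d_q, name in best.values():
        p = objects[name]
        nn = min((math.dist(p, p2) for other, p2 in objects.items() if other != name),
                 default=math.inf)
        if d_q <= nn:
            result.add(name)
    return result

# Hypothetical dataset; the query point is the origin.
objects = {"a": (3, 1), "c": (1, 0.5), "d": (-1, 1),
           "e": (-1.5, 2), "f": (0.5, -4), "g": (1, -5)}
result = rnn_six_sectors(objects, (0, 0))
```

At most six verification NN queries are issued regardless of the dataset size, which is what removes the need for pre-computed NN circles.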
As a third substantial work, we consider the contribution of Tao et al. in discovering an efficient
method for performing RkNN queries in a variety of datasets [9]. This method leverages ideas from the
study of Voronoi cells in order to prune R-tree nodes from the search space. Recall that Voronoi cells
are simply formed using a series of perpendicular bisectors. The algorithm by Tao effectively pretends
that the query point q is in the dataset and prunes the space using a series of bisectors between q and
other objects in the dataset. Any data object that lies completely on the opposite side (with respect to
the query point) of a perpendicular bisector of the line between q and some other data object $o'$ cannot
possibly be a part of the RNN query result set. Figure 2.5(c) illustrates the incremental reduction of the
query space by the algorithm. Here, we are examining object d and use the perpendicular bisector $\perp_{o,q}$
to reduce the search area to region P.
To reduce the computational and storage complexity needed to represent the arbitrary polygon
formed by this reduction process, the authors decrease the granularity of their solution and simply trim
the bounding box of the search space as much as possible. This is represented by region $P \cup PP$ in the
example figure. A final important improvement for the approach by Tao is its complete support for
dynamically changing values of k for RkNN queries.
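The basic pruning test behind this approach can be sketched as a half-plane check. The sketch below covers only the k = 1 case and only the decision of whether a whole point or MBR can be discarded; it does not reproduce the bounding-box trimming step, and the function names and coordinates are hypothetical.

```python
import math

def prunes_point(p, o, q):
    """True if p is strictly closer to data object o than to q (far side of the bisector)."""
    return math.dist(p, o) < math.dist(p, q)

def prunes_mbr(rect, o, q):
    """True if the whole rectangle (xmin, ymin, xmax, ymax) lies on the far side.

    A half-plane is convex, so checking the four corners is sufficient.
    """
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return all(prunes_point(c, o, q) for c in corners)

# Hypothetical setting: the bisector between q = (0, 0) and o = (4, 0) is the line x = 2.
q, o = (0, 0), (4, 0)
far_node = (3, -1, 5, 1)       # entirely beyond x = 2
straddling_node = (1, -1, 5, 1)  # crosses the bisector
```

A node that straddles the bisector cannot be discarded outright, which is where trimming the bounding box becomes useful.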
With some basic query evaluation techniques for RNN queries now established, the fourth and
final technique reviewed considers a different variation on RNN query evaluation that will prove to
be pertinent to our construction of our containment scope framework. This approach by Lee defines
and processes an RNN query variant referred to as the ranked reverse nearest neighbor (RRNN) query.
RRNN queries attempt to identify those objects that are most influenced by the query point. That is,
suppose that $k < k'$. Then the algorithm prioritizes the result set to include those data objects that have
the query point q as their $k$th closest neighbor before those data objects that have q as the $k'$th nearest
neighbor. The query has a stop condition based on a certain required cardinality of the result set as
opposed to the RkNN query approach that places a strict requirement on the value of k.
In order to answer this new type of query, Lee et al. offer two different approaches to solve the
problem: k-counting and k-browsing [10]. We briefly describe each of these techniques. Both methods
utilize the notion of perpendicular bisectors discussed previously in the RkNN technique proposed by
Tao. In addition, the R-tree index structure is used to facilitate access to objects and to provide an
organized spatial grouping of data objects. In the k-counting approach, the algorithm visits data objects
in order of increasing minimum distance from the query point. Each data point contains a mink value
that denotes the minimum number of other objects in the dataset that are closer to that object than q.
As each data point is visited, we update the mink counts of other objects by incrementing those objects
on the distant side (with respect to q) of the perpendicular bisector by one. Counts are also maintained
for the internal bounding box nodes in the R-tree but are only incremented if the entire bounding box
lies on the distant side of a perpendicular bisector. Through a straightforward geometric argument,
Lee was able to finalize a mink count (that is, declare that mink = k) for an object o once the
distance between q and o is less than half the distance between q and every unexplored object in the
dataset. Symbolically, the algorithm finalizes object $o$ if $2\,|o, q| < |o', q|$ for every unexplored $o' \in O$. Consider the example in
Figure 2.5(d). Here, we have already processed object d and updated the mink values of all other data
objects. When object c is examined, we update the mink values of all other objects in $\perp_{c,q}$. Thus, the mink
values of e, f, and h are all incremented by one.
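One way to reconstruct the geometric argument behind the finalization rule is through the triangle inequality. The derivation below is our own reading of the criterion rather than a quotation from Lee et al.; $o'$ denotes any unexplored object.

```latex
\begin{align*}
|o', q| &\le |o', o| + |o, q|
  && \text{(triangle inequality)} \\
|o', o| &\ge |o', q| - |o, q| \\
        &>   2\,|o, q| - |o, q| = |o, q|
  && \text{(using } |o', q| > 2\,|o, q| \text{)}
\end{align*}
```

Under this reading, every unexplored object $o'$ is farther from $o$ than $q$ is, so visiting $o'$ later can never increment the mink count of $o$, and the count can safely be declared final.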
We now consider k-browsing, the second RRNN evaluation technique proposed by Lee. In this
case, we maintain an estimated mink count for each object and attempt to visit points based on increasing
order of mink. The logic of this procedure inherits many key features from the k-counting technique
previously described. However, there are several key differences. First, as already mentioned, we use
a different criterion for ordering potential data nodes for processing. Those nodes that are most likely to
have small values of k (as predicted by the mink value) are visited before those nodes with large mink
values. Secondly, we attempt to provide intelligent processing of internal nodes whose MBRs lie on
the perpendicular bisector of the data object being processed. Specifically, we track the number of data
objects inside of each internal node and determine (using the exact positioning of MBR extents) how
many of those objects must lie closer to the data point than the query point.
For both RRNN evaluation algorithms, it is possible to adapt the stopping criteria to satisfy an RkNN
query through the following observation. Throughout RRNN evaluation, data results are incrementally
finalized based on increasing values of k. That is, all result objects that have a value of k = 1 will be
returned by the algorithm before all result objects with a value of k = 2. Consequently, we can issue an
RRNN query with an arbitrarily high requested result set cardinality and then proceed to short circuit
our evaluation when the first result object is returned with a value of k that is greater than that of our
RkNN query parameter. Thus, solving an RkNN query can be reduced to solving an equivalent RRNN
query.
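The reduction from RkNN to RRNN can be sketched with a ranked generator and an early exit. In this illustration the ranking is computed by brute force, which stands in for the k-counting and k-browsing algorithms; the names and the dataset are hypothetical.

```python
import math

def ranked_rnn(objects, q):
    """Yield (k, name) in nondecreasing k, where q would be the k-th nearest neighbor of the object."""
    ranked = []
    for name, p in objects.items():
        d_q = math.dist(p, q)
        closer = sum(1 for other, p2 in objects.items()
                     if other != name and math.dist(p, p2) < d_q)
        ranked.append((closer + 1, name))
    yield from sorted(ranked)

def rknn_from_rrnn(objects, q, k):
    """Consume the ranked stream and stop at the first object whose rank exceeds k."""
    result = set()
    for rank, name in ranked_rnn(objects, q):
        if rank > k:
            break  # short circuit: all later objects have rank >= this one
        result.add(name)
    return result

# Hypothetical dataset on a line, with the query point at x = 5.
objects = {"a": (0, 0), "b": (1, 0), "c": (4, 0), "d": (10, 0)}
stream = list(ranked_rnn(objects, (5, 0)))
```

The early exit is valid only because the stream is ordered by k, which is the property of RRNN evaluation that the observation above relies on.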
2.3.4 Location-Dependent Spatial Query

Before concluding our discussion on spatial query types, we briefly consider two query categories that
are orthogonal to the region, NN, and RNN classifications described previously. The two important
alternative classifications of spatial queries are location-dependent spatial queries (LDSQs) and time-parameterized spatial queries (TP queries). LDSQs search for objects according to their proximity to
the client's current location. That is, the position of the query point q is correlated to a client's actual
location. LDSQs are very popular in the growing business of location based services (LBSs), as clients
typically want to obtain information that is relevant to their geospatial position. Basic LDSQs can be
categorized as region (range, window) queries or nearest neighbor (1NN, kNN) queries. Other spatial
query types such as the RNN query also have LDSQ equivalents.
Because LDSQ result sets are a function of the client's current location and because users' positions
are highly volatile, LDSQs are frequently reissued to ensure correctness. In cases where the result
set does not change, subsequent submissions of the same LDSQ at a new location are redundant.
Unnecessary LDSQ processing decreases system performance, limits scalability, and poses a significant
threat to the growth of LBS technologies. The mobile environments in which LBSs operate are highly
constrained by scarce wireless bandwidth, unreliable network connections, and limited client battery
power. Furthermore, each client may issue a large number of LDSQs to the LBS server. To improve
system performance, administrators must (1) decrease the number of users, (2) increase network capacity
and server resources, or (3) reduce the number of queries issued by each client to the server. The third
approach is the most appealing as it involves no cost and maintains system scalability. It follows that
LDSQs represent an important query subset with wide applicability that can potentially benefit from
our spatial query containment framework.
2.3.5 Time-Parameterized Spatial Query

Like LDSQs, time-parameterized (TP) queries can include all of the previous spatial query categorizations. That is, region, NN, and RNN queries can all be adapted to support the time-parameterized
spatial query model. The notion of TP queries was developed by Tao [12] and extends spatial queries
to incorporate temporal restrictions. That is, a TP-query pretends that a spatial query is issued continuously with a query point that moves with some fixed velocity v. The query returns a result tuple
(R, T, C). Here, R represents the result set for the original query position, T represents the time at which
the original result set is invalidated, and C represents the appropriate modifications to result set R at
time T. We say that a result set is invalidated when the contents of the result set change through either
(1) the addition of a new data object to the result set or (2) the elimination of an existing result object
from the result set. The time at which the invalidation occurs is determined by considering both the
query results as well as the velocity of the TP-query.
One positive side effect of the TP query structure is that multiple TP queries can be chained
together in order to produce incremental results as a query point travels through a spatial dataset.
That is, it is possible to map the correct result set R to any point in space along some fixed trajectory
given by the velocity vector v of the TP query. However, TP queries are computationally expensive to
evaluate, and repeatedly issuing such queries can place stress on overall system resources. Effective
disk caching mechanisms are particularly important for repeated TP queries in order to amortize the
substantial cost of the first TP query over subsequent queries that are likely to explore similar data
objects and, by extension, to retrieve a significant portion of requested data from the system cache.
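To make the (R, T, C) tuple concrete, the following Python sketch evaluates a TP window query by brute force over a small in-memory point set. It is an illustration only, not the index-based algorithm of [12]. The function names are hypothetical, and the window is assumed to be the axis-aligned rectangle with half-extents l and h centered at the moving query point.

```python
import math

def axis_interval(o, q, v, extent):
    """Times t with |o - (q + v*t)| <= extent on one axis, as (t_lo, t_hi)."""
    rel = o - q
    if v == 0:
        # A stationary axis is either always or never satisfied.
        return (-math.inf, math.inf) if abs(rel) <= extent else (math.inf, -math.inf)
    a, b = (rel - extent) / v, (rel + extent) / v
    return (min(a, b), max(a, b))

def tp_window_query(points, q, v, l, h):
    """Return (R, T, C) for a window of half-extents (l, h) moving with velocity v."""
    R, events = [], []
    for o in points:
        x_lo, x_hi = axis_interval(o[0], q[0], v[0], l)
        y_lo, y_hi = axis_interval(o[1], q[1], v[1], h)
        t_in, t_out = max(x_lo, y_lo), min(x_hi, y_hi)
        if t_in <= 0 <= t_out:        # inside the window at time 0
            R.append(o)
            events.append((t_out, "remove", o))
        elif 0 < t_in <= t_out:       # enters the window at a later time
            events.append((t_in, "add", o))
    events = [e for e in events if math.isfinite(e[0])]
    if not events:
        return R, math.inf, []        # the result never changes
    T = min(e[0] for e in events)
    C = [(kind, o) for t, kind, o in events if t == T]
    return R, T, C
```

Chaining, as described above, would amount to calling the function again with the query point advanced to time T and the result set updated by C.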
2.4 Auxiliary Scope Techniques
While data organizational structures and spatial query types offer essential background for understanding the tools used in the construction of our spatial query containment framework, this section reviews
existing auxiliary scope concepts that attempt to accomplish the same goals as our novel containment
scope solution. Specifically, auxiliary scope solutions associate a region with some answered query
Q wherein a future query Q′ can potentially be answered locally using previous result information.
Different techniques place different restrictions on the region size and on the parameterization of Q′
with respect to Q. As will be shown, existing auxiliary scope techniques lack the flexibility to effectively
and efficiently eliminate a large number of redundant spatial queries. However, they can still have a
positive impact on the overall resource utilization of a spatial query processing framework.
Existing auxiliary scope techniques can be divided into two broad categories. These are commonly
referred to as semantic scope [13, 14, 15] and valid scope [16, 17]. All of these methods use previous
query results to eliminate some unneeded query processing. In addition, each auxiliary scope method
can support a variety of basic spatial query types, including both region and NN queries. To the
author's knowledge, no auxiliary scope method exists for the reduction of redundant
RNN queries. This paper will present both valid scope and containment scope approaches for this
query type. We now introduce the theory and various implementations of both semantic scope and
valid scope approaches.
Figure 2.6. Semantic scope construction approaches: (a) semantic region, (b) mNN query
2.4.1 Semantic Scope

The basic idea of semantic scope processing [13, 14, 15] is to associate the search region of a query Q
with its result set R. By doing so, a new query Q′ whose search region is contained in that of a previous
query Q can be answered by R. We first consider region query processing and then turn our attention
to the issue of NN query processing.
Most existing research has revolved around window queries, but the extension to range
queries and other region query types is straightforward. Dar et al. [13] associate semantic regions with
previously issued window query results. Any window query fully covered by an existing semantic
region is guaranteed to be answerable by the client using only the locally available result set of the
query with which the semantic region is associated. Consider the example shown in Figure 2.6(a). A
window query Q has been executed and its result set R = {c, e, g, h, i} has been preserved along with the
query space by a semantic region in the client. Next, a new window query Q1 is issued. Notice that the
window of Q1 is fully covered by the semantic region. By comparing the search areas of Q and Q1, the
algorithm can guarantee that Q1 can be fully answered using a subset of R. Specifically, the new result
set is given as {c}.
However, notice that region query semantic scope does not support reusing result set R of query Q
to answer the new query Q2. Although the result set for Q2 is a subset of the result set for
Q, the semantic region algorithm has no knowledge of those non-result objects located outside of the
query search area. Consequently, the client is forced to contact the server to evaluate the query.
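The window-query case can be illustrated with a short Python sketch. It is a minimal example under assumptions: the `Window` class and `answer_locally` function are hypothetical names, and a window is modelled as a center point with half-extents l and h.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Window:
    qx: float
    qy: float
    l: float   # half-extent along x
    h: float   # half-extent along y

    def covers(self, other):
        """True when `other` lies entirely inside this window."""
        return (self.qx - self.l <= other.qx - other.l and
                other.qx + other.l <= self.qx + self.l and
                self.qy - self.h <= other.qy - other.h and
                other.qy + other.h <= self.qy + self.h)

    def contains(self, p):
        return abs(p[0] - self.qx) <= self.l and abs(p[1] - self.qy) <= self.h

def answer_locally(semantic_region, cached_result, new_window):
    """Return the new result set, or None when the server must be contacted."""
    if not semantic_region.covers(new_window):
        return None   # objects outside the semantic region are unknown to the client
    return [o for o in cached_result if new_window.contains(o)]
```

A query such as Q2 that is only partially covered returns `None` here even when its true result happens to be a subset of R, which is exactly the limitation noted above.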
Beyond region queries, there also exist semantic scope approaches to efficiently answering NN
queries in general and kNN queries in particular. Song discovered an interesting distance property
related to kNN queries that allows a stored mNN query result set to be used to answer a new kNN
query where m > k [18]. Suppose that an mNN query was issued at a query point q and that its result set is
$R = \{o_1, \ldots, o_m\}$ with $|o_i, q| < |o_j, q|$, $\forall i < j$. (Here, we use $|i, j|$ to denote the Euclidean distance between
points i and j.) When the client issues a new kNN query at another query point $q'$ with $k \le m$, the triangle
inequality allows us to conclude that the new query's result set is a subset of the original query's result
set if the distance between $q$ and $q'$, denoted by $|q, q'|$, is not greater than $(|o_m, q| - |o_k, q|)/2$. That is, we
conclude that $R_{Q'} \subseteq R_Q$ if $|q, q'| \le (|o_m, q| - |o_k, q|)/2$.
However, this condition for verifying that a future kNN query is contained within a
previously issued mNN query is overly conservative. It follows that this method is less effective at
reducing redundant queries than an optimal solution. This is because the safe distance bound on $|q, q'|$
is derived entirely from the result objects without considering the distribution of non-result
objects. For instance, Figure 2.6(b) shows the result set of a 4NN query that has been stored by the client
at some earlier point in time. Using the previous definition of the safe distance for mNN queries, we
notice that the safe distance bound of this 4NN query is given by $(|g, q| - |e, q|)/2$. A new 2NN query
is issued at location $q'$, which is more than the safe distance away from q. Thus, the 2NN query is
considered to be not covered by the 4NN query result. In actuality, this earlier result set could be used
to answer the query, since the 4NN result set contains the 2NN result objects e and i. It therefore
is reasonable to conclude that it is possible to improve upon the mNN approach to redundant query
elimination.
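The safe distance test can be written as a short Python sketch. This is an illustration only: the function names are hypothetical, and the cached mNN result is assumed to be a list of 2D points.

```python
import math

def safe_distance(q, mnn_result, k):
    """(|o_m, q| - |o_k, q|) / 2 for a cached mNN result issued at q."""
    d = sorted(math.dist(o, q) for o in mnn_result)
    return (d[-1] - d[k - 1]) / 2

def knn_from_mnn(q, mnn_result, q_new, k):
    """Answer a kNN query at q_new from the cached mNN result, or return None."""
    if k > len(mnn_result):
        return None
    if math.dist(q, q_new) > safe_distance(q, mnn_result, k):
        return None   # outside the conservative bound: contact the server
    return sorted(mnn_result, key=lambda o: math.dist(o, q_new))[:k]
```

The test inspects only distances between q and the cached result objects, so it can reject a query point whose true answer is in fact contained in the cached set, as in the Figure 2.6(b) example.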
2.4.2 Valid Scope

Figure 2.7. Valid scope formulation (TP-query approach): (a) window query, (b) NN query
As the second existing auxiliary scope evaluation method, valid scope adopts a substantially different
approach. Let Q denote a spatial query and let Q′ denote a second spatial query that is identical to Q
in every respect except for the location at which the query is issued. A valid scope for Q corresponds
to an area inside of which a new query of the form Q′ is guaranteed to have exactly the same result
set as Q. That is, $R_Q = R_{Q'}$ if $Q'.q \in VS(Q)$. Here, VS(Q) denotes the valid scope for query Q. Valid
scope exploits information about the relative distribution of data objects, and both result objects and
non-result objects are considered when constructing the validity region. Also, notice that the valid
scope approach requires an exact match between the previous query's result set and the future query's
result set. This is different from the case of semantic scope, which only requires that future query results
be a subset of the existing results. As with all auxiliary scope methods, we associate a valid scope with
each query Q and that query's result set R. Thus, redundant queries can be answered by the client
locally using the result set $R_Q$ of a previously issued query Q so long as the new query is inside the
valid scope of Q. Membership in a single stored valid scope is sufficient to avoid issuing the query to
the server for processing. However, storing multiple valid scopes increases the likelihood of finding a
match, since a higher percentage of the data space will be covered by the stored queries' valid scopes. Both
region queries and NN queries are supported by the valid scope processing algorithm, and two popular
implementations exist [16, 17]. The first implementation by Zhang uses TP queries to form valid scope
boundaries, while the second implementation by Lee leverages several innovative geometric properties
to efficiently calculate the valid scope area. We consider each approach in turn.
For the first solution, Zhang adopts an intuitive approach that dynamically simulates client movement from the query point in all possible directions in an effort to probe for non-result objects that
affect result validity [16]. Since the client movement is simulated using TP queries, we refer to this
method throughout the paper as the TP-query valid scope approach. After computing the result set
of a particular spatial query, a number of time-parameterized (TP) queries [12] are issued to identify
non-result objects that may influence the result set prior to it being invalidated for some other reason.
For the case of region queries, the algorithm only discusses a solution to window queries. The initial
tentative valid scope domain for a given window query is formed as the intersection of the Minkowski
regions of all result objects [19]. Here, a Minkowski region is a rectangular region centered at a selected
data object and has extents equal to that of the query window being processed. In other words, the
Minkowski region is simply the query search space shifted to some new center point. After computing
the initial valid scope area, a number of TP window queries with universally identical extents are
initiated from the current query point toward all vertices of the tentative valid scope. In the event that
any TP window touches a non-result object before reaching its destination vertex, the valid scope is
trimmed so as to ensure that the identified non-result object cannot possibly enter the new valid scope
area. Figure 2.7(a) shows a formation of the valid scope for a window query. Here, the arrows represent
necessary TP-window queries that are issued by the algorithm to the corners of the tentative valid scope
(represented by the Minkowski region of result object c). This valid scope refinement repeats until every
remaining valid scope vertex can be probed without encountering a non-result object. This represents
the finalized valid scope region.
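The role of the Minkowski regions can be illustrated with a brute-force Python sketch. It is not the TP-query refinement procedure itself: instead of trimming a polygon, it tests whether a candidate query point lies in the valid scope when the full set of non-result objects is known. The function names are hypothetical, and windows are assumed to be closed rectangles with half-extents l and h.

```python
import math

def minkowski_region(o, l, h):
    """Query window of half-extents (l, h) re-centered at object o."""
    return (o[0] - l, o[1] - h, o[0] + l, o[1] + h)

def inside(p, rect):
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]

def tentative_valid_scope(result, l, h):
    """Intersection of the Minkowski regions of all result objects."""
    scope = (-math.inf, -math.inf, math.inf, math.inf)
    for o in result:
        r = minkowski_region(o, l, h)
        scope = (max(scope[0], r[0]), max(scope[1], r[1]),
                 min(scope[2], r[2]), min(scope[3], r[3]))
    return scope

def in_valid_scope(q_new, result, non_result, l, h):
    """True when a window issued at q_new returns exactly `result`."""
    if not inside(q_new, tentative_valid_scope(result, l, h)):
        return False   # some result object would fall out of the window
    # A non-result object enters the window once q_new is in its Minkowski region.
    return not any(inside(q_new, minkowski_region(p, l, h)) for p in non_result)
```

The second test corresponds to the trimming step: each non-result object removes its own Minkowski region from the tentative valid scope.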
For the case of NN queries, Zhang's algorithm issues a series of TP NN queries to formulate the
valid scope of some NN spatial query Q. Initially, the valid scope is assumed to be the entire data
space. Then a number of TP NN queries are issued toward each vertex in the tentative valid scope.
If a non-result object is encountered by the TP NN query prior to reaching some vertex in the current
valid scope, the tentative region is restricted to prevent such an encounter, and the process begins again.
Figure 2.7(b) shows the valid scope for an NN query derived from the TP query approach. Notice that
the sides of the finalized valid scope correspond to the Voronoi cell of the result object c. These sides are
precisely the boundaries that ensure that the result set of the TP NN query does not include a non-result
object. For example, allowing the valid scope to extend to the lower-left corner of the figure would cause
object b to potentially become the nearest object to an NN query issued inside the valid scope. This
would change the result set and violate the condition of the valid scope construction. For a k-nearest
neighbor query (k > 1), the validity region is computed as the intersection of the Voronoi cells of every
result object under the assumption that the $k - 1$ other result objects are ignored. Thus, constructing the
valid scope of a kNN query reduces to repeating the 1NN query valid scope construction algorithm k
times and then performing a simple intersection of k convex polygons.
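For intuition, membership in this kNN validity region can be tested directly from distances, without constructing any polygons. The Python sketch below is a brute-force illustration that assumes the non-result objects are known and uses a hypothetical function name. A point lies in the Voronoi cell of result object o (with the other result objects ignored) exactly when o is at least as close to it as every non-result object, so the intersection over all result objects reduces to one comparison.

```python
import math

def in_knn_valid_scope(q_new, result, non_result):
    """True when every result object is at least as close to q_new as any non-result object."""
    if not non_result:
        return True   # nothing can displace the result objects
    farthest_result = max(math.dist(o, q_new) for o in result)
    nearest_other = min(math.dist(p, q_new) for p in non_result)
    return farthest_result <= nearest_other
```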
Considering the performance of the TP query approach, it can be observed that the algorithm
identifies the exact valid scope for a given query but also incurs significant processing and disk overhead.
For NN queries, the validity region algorithm needs to execute $N_{tot} = N_{vert} + N_{comp}$ TP NN queries, where
$N_{vert}$ is the total number of vertices in the final validity region and $N_{comp}$ is the total number of non-result objects that affect the tentative valid scope. On the other hand, window queries require a total of
$N_{tot} = 4 + 3N_{comp}$ TP window queries, where $N_{comp}$ is the total number of non-result objects that
affect the tentative valid scope. Thus, each valid scope computation incurs multiple TP queries, which
repeatedly access disk pages and utilize substantial CPU time. A large LRU cache helps to mitigate
some of the impact on disk I/O. However, such a feature may not be available in some situations and
still fails to offset a substantial portion of the computational overhead.
Figure 2.8. Valid scope formulation (geometric approach): (a) range query, (b) window query
Observing that the TP query based approach to constructing a query's valid scope often results
in substantial overhead, Lee devised an alternative valid scope computation method that exploits the
geometric properties of various spatial query types [17]. Supported spatial queries include both region
and NN query types. One key observation used by this approach is that those non-result objects required
to formulate a valid scope should be spatially near the result objects of the query being processed. Lee's
work focuses on performing valid scope computation on a wireless data broadcast system in which the
arrival order of spatial objects follows a certain broadcast schedule. In the broadcast, the search area
for needed non-result objects is fine-tuned dynamically as query result objects are downloaded from
the broadcast. As the computation of the result set and valid scope are integrated together, the entire
algorithm can execute within a single broadcast cycle. Figure 2.8(a) and Figure 2.8(b) show the search
area for non-result objects in grey for a sample range query and a sample window query, respectively.
Other non-result objects not covered by the search area are determined by the algorithm to have no
impact on the valid scope based on other more restrictive requirements already in place. It follows
that these objects are not needed for valid scope computation and are not collected. In general, the
geometric approach proposed by Lee significantly outperforms the TP query approach by Zhang in
terms of minimizing disk accesses and reducing execution time. Note that the actual computed valid
scope is identical for each algorithm. For the purposes of this report, we adapted the geometric valid
scope computation technique to work on an R-tree index in a client-server communication environment.
In doing so, we ensured that our spatial query containment framework would be compared against the
best alternative auxiliary scope techniques that currently are available.
Finally, it is worth noting that several alternative methods exist for computing the valid scope of an
NN query aside from the approaches by Zhang and Lee. In one technique, the valid scope is formed by
referencing the result object's Voronoi cell, which is pre-computed prior to introducing the system into
operation [19]. Recall that the Voronoi cell of an object o represents the region wherein o is the closest
object of all items in the dataset. When an NN query is evaluated, the Voronoi cell of the result object
is downloaded to the client [20]. So long as the new query location remains within the stored Voronoi
cell, we conclude that o remains the nearest object to the query point, and the result set is unchanged.
Note that this approach is limited to 1NN queries and does not easily scale to support kNN queries
with arbitrary values of k.
Unfortunately, valid scope is not a panacea for our ultimate goal of eliminating redundant spatial
queries. Valid scope is designed mainly for checking result validity. Two queries that have similar
but different query parameters (e.g. different radii, different extents, or different object cardinality
requirements) are considered to be fundamentally incompatible by the valid scope routine. In this case,
the client has to submit the query to the server for evaluation even if the result set is identical to that of a
previously issued query. Furthermore, we note that valid scope does not allow for cases in which the result
set of a future query is a strict subset of an existing query's result set. This case should be handled
locally, as all necessary data is available. However, the valid scope approach would force the client to
submit these queries to the server.
2.5 Caching Mechanisms
In the final section of our literature review, we consider the important role that caching plays in
reducing redundant spatial queries. This role can complement or replace the auxiliary scope approaches
mentioned in Section 2.4.
As alluded to earlier, caching mechanisms can operate in tandem with auxiliary scope
techniques in order to provide more effective elimination of redundant spatial queries. By caching
additional auxiliary scope information, we increase the likelihood of identifying redundant queries
and can answer such queries using locally contained data. Various cache replacement strategies such
as least recently used (LRU) eviction can be adopted to discard auxiliary scope entries that
the client lacks sufficient space to retain. However, it is important to note that other more
complex caching schemes exist that can reduce the server query submission rate without the assistance
of auxiliary scope methods.
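As an illustration of how auxiliary scope entries and LRU eviction might be combined on the client, consider the following Python sketch. The class and method names are hypothetical. A scope is treated as an opaque value, and the caller supplies a predicate that decides whether a new query falls inside it.

```python
from collections import OrderedDict

class ScopeCache:
    """Fixed-capacity cache of (scope, result set) entries with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()   # query id -> (scope, result set)

    def put(self, query_id, scope, result):
        self._entries[query_id] = (scope, result)
        self._entries.move_to_end(query_id)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # drop the least recently used entry

    def lookup(self, matches):
        """Return the result set of the first entry whose scope satisfies `matches`."""
        for query_id, (scope, result) in list(self._entries.items()):
            if matches(scope):
                self._entries.move_to_end(query_id)   # mark as recently used
                return result
        return None   # no stored scope applies: submit the query to the server
```

A larger capacity lets more scopes coexist, which raises the chance that `lookup` finds a match at the cost of client storage.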
One popular caching model for spatial data interaction is the complementary space (CS) caching
system proposed by Lee [21]. CS caching stores a collection of data objects and coalesced regions
(referred to as complementary regions) in order to maintain a global view of the dataset at all times.
This global representation of data includes both result objects and non-result objects. Since clients
have limited storage capacity and since distant non-result objects are less likely to be accessed in
the near future, the algorithm represents clusters of distant objects as complementary regions. The
complementary regions provide a coarser granularity of data representation than individual object
storage but ensure that some information is maintained about all objects in the dataset. As long as new
queries do not cover any stored complementary region, the result sets of these queries are guaranteed
to be locally available. That is, the result objects are stored at the finest level of granularity and can be
accessed by the client directly.
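The local-answerability test implied by CS caching can be sketched as follows for window queries. This is a simplified illustration and not the data structure of [21]. Complementary regions are assumed to be axis-aligned rectangles given as (x_min, y_min, x_max, y_max), "covering" a region is interpreted as overlapping it, and the function names are hypothetical.

```python
def intersects(a, b):
    """True when two closed rectangles (x_min, y_min, x_max, y_max) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def cs_answer_window(window, cached_objects, complementary_regions):
    """Answer a window query from the cache, or return None if the server is needed."""
    if any(intersects(window, region) for region in complementary_regions):
        return None   # part of the window is known only at coarse granularity
    x0, y0, x1, y1 = window
    return [o for o in cached_objects if x0 <= o[0] <= x1 and y0 <= o[1] <= y1]
```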
The CS caching approach differs from auxiliary scope techniques in that a global view of the dataset
is maintained. However, the burden of storing this extensive view may be substantial and impede the
caching of relevant result object information. Thus, auxiliary scope solutions can offer a low-overhead
solution in cases where the cost of adopting the CS caching model is prohibitive. Furthermore, auxiliary
scope methods can possibly eliminate additional queries (1) by examining query semantics and (2) by
maximizing the available space for cached result objects. Ultimately, the choice between CS caching
and auxiliary scope implementations is dependent upon the environment in which spatial queries are
being issued.
Chapter
Containment Scope Framework

3.1 System Overview
As a foundation for the discussion and analysis in the remainder of this paper, this section introduces the
general computational framework necessary to support spatial query containment. First, we enumerate
the different components of our computational system and state any underlying assumptions of the
model. In addition, we offer details on the various communication environments in which spatial query
containment can be deployed. Next, this report formally defines the relevant concepts for
spatial query containment and describes several fundamental processing strategies that are common
across all supported spatial query types. Subsequent chapters will examine specific computational
methods for supporting region (range, window), nearest neighbor (NN, kNN) and reverse nearest
neighbor (RNN, RkNN) spatial query types.
3.2 System Components
The general system model for spatial query containment is illustrated in Figure 3.1. The proposed
framework consists of multiple clients served by a single central server. It is possible to deploy multiple
servers with trivial changes to the algorithms discussed in this paper. Such a deployment can potentially
improve the scalability of a system or increase the service region. However, we assume that there is
only one server to simplify our discussion of the spatial query containment solution. There are three
possible types of clients that correspond to various communication and distribution options.
1. Mobile clients communicate wirelessly with the server and change location frequently. Consequently, we expect such systems to have limited bandwidth and unreliable network connections.
2. Stationary clients are desktops, workstations, or other computing devices that are physically
separated from the server. Unlike mobile clients, stationary clients can connect via
wired or wireless networks and tend to leverage a stable infrastructure for communication.
Because they do not move regularly, we expect communication with these devices to be more
stable.
3. The final type of client is a virtual client. These clients coexist on the same physical
machine as the server. In this case, the client and server are simply two different processes
running on the same hardware. Communication occurs internally to the system and typically is
quite fast and reliable.
Figure 3.1. General spatial query containment system model. Mobile, stationary, and virtual clients send a spatial query (location, supplemental parameters) to the central server over wireless, wired, and internal communication, respectively, and receive a (Query, Result Set, Containment Scope) triple.
The type of clients used in a system is dependent on the application for which spatial query containment
is deployed. It is also possible to support multiple types of clients simultaneously when the application
demands such functionality.
In our system model, clients are information seekers and servers are information providers. That
is, clients are responsible for issuing spatial queries, while the server is responsible for producing a
result. In the case of containment scope processing, the server actually returns a triple to the client.
The components of the response include (1) an identifier to indicate the spatial query Q with which this
response is associated, (2) the result set of query Q, and (3) supplementary information that
represents the containment scope for query Q. The client stores the returned triple in its local cache
and compares future queries against it in hopes of avoiding additional query submissions to the server.
Each component of the spatial query containment model is important to ensuring that redundant query
elimination occurs both accurately and efficiently.
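The client-side use of the returned triple can be sketched in Python as follows. This is only a schematic under assumptions: the containment scope is modelled as an opaque predicate, since its actual representation depends on the query type, and `submit` and `answer_locally` are hypothetical callables standing in for the server round trip and the local evaluation step.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ServerResponse:
    query_id: str                            # (1) identifies the answered query Q
    result: list                             # (2) result set of Q
    scope_contains: Callable[[dict], bool]   # (3) containment scope, modelled as a predicate

class Client:
    def __init__(self, submit, answer_locally):
        self.submit = submit                  # hypothetical call to the server
        self.answer_locally = answer_locally  # hypothetical local evaluation over a cached result
        self.cache: List[ServerResponse] = []
        self.server_calls = 0

    def query(self, q):
        # Compare the new query against every cached triple before contacting the server.
        for entry in self.cache:
            if entry.scope_contains(q):
                return self.answer_locally(q, entry.result)
        self.server_calls += 1
        response = self.submit(q)
        self.cache.append(response)
        return response.result
```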
3.3 Underlying Assumptions
With a general overview of the spatial query processing components now complete, we turn our
attention to several important assumptions that are made about the state of the system and the spatial
data that it contains. The first set of assumptions surrounds the data stored on the central server. The
server in our system model maintains a large set of stationary spatial objects S on a two-dimensional
service area A. The location of each spatial object is represented by a set of spatial coordinates in A.
The examples and code in this work assume that spatial objects are point objects and not region objects.
However, it is possible to handle this second case with minor adaptations. The entire dataset is stored
on the server, but we assume that each client lacks sufficient resources to maintain or to query the entire
dataset. Thus, clients are forced to submit queries to the server unless such a submission is avoided
through cached containment scope data. In addition, the dataset S is assumed to be static, and the issue
of cache invalidation is not discussed. This last issue is a limitation of all auxiliary scope approaches,
and its removal will be the subject of future work. All data is indexed using an R*-tree index T because
of its favorable spatial clustering properties, excellent performance, and wide acceptance in industry. The
R-tree index will be essential in the evaluation of query result sets as well as the computation of each
query's containment scope. Section 3.7 offers several strategies for efficiently traversing R-trees that
will be useful in the algorithms developed later.
The spatial query containment framework presumes that clients issue queries to a central server
through some communication medium (e.g. wired network, wireless network, interprocess communication). The client may exist as a separate physical device or simply as a logical process on the same
hardware as the server component. Clients can issue one of three possible spatial query types: region
(range, window) queries, nearest neighbor (NN, kNN) queries, and reverse nearest neighbor (RNN,
RkNN) queries. Each query is defined by its query location q as well as by one or more supplemental
parameters (e.g. radius r, extent l, cardinality k). Section 3.5 offers formal definitions for each of these
queries that will be used throughout the paper. The server is responsible for evaluating queries against
the locally stored dataset and for communicating result and containment scope information to the client.
3.4 Communication Model
There are several different methods by which query requests and query responses can be transmitted
between the server and client. The three most common options are internal transmission, on-demand
transmission, and broadcast transmission. With internal transmission, the system incorporates virtual
clients (as shown in Figure 3.1), and all communication is done between processes using one of several
inter-process communication mechanisms. This option results in exceptionally fast and reliable communication, as most data is simply transferred between different memory locations on the same machine.
No external communication medium is necessary. The most popular applications of spatial query containment that make use of internal transmission are complex multi-dimensional analysis programs and
geospatial information systems that require large amounts of computing resources.
In contrast, on-demand transmission mechanisms involve exchanging data between two or more
physically distinct machines. A copy of the server component typically is maintained on a powerful
system, while the clients may reside on desktops or more limited mobile device platforms such as
PDAs and smart phones. Communication in this case occurs using wired or wireless connections
established through network adapters. As such, bandwidth and latency are of much greater concern
for these systems than for models that choose to use internal transmission. The type and robustness of
the network determine the degree to which network reliability and bandwidth impact the scalability
and performance of the spatial query processing framework. However, spatial query containment has
an opportunity to decrease network reliance and network load substantially, so the deployment of
this framework under on-demand transmission is very important. A second key aspect of
this type of network is that clients actively request information by submitting queries and receive a
personalized response from the server. This is a straightforward approach but may limit scalability if
too many queries are issued. Example applications of on-demand transmission include location-based
services that send a client's location to the server in order to obtain content about surrounding points
of interest.
The final communication model is broadcast transmission, which once again involves exchanging
data between physically distinct machines. However, unlike on-demand transmission, clients do not
actually submit queries to the server for evaluation. Instead, a continuous indexed stream of data
content is broadcast from the server to all clients. This information is not personalized, so it
may be used by every client for processing. Broadcast transmission offers a highly scalable infrastructure
but incurs potentially high latency and places a substantial processing burden on the client that may
or may not be reasonable. Several auxiliary scope broadcast systems have been developed [17], but
the fundamental operation of such communication models is quite different from the client-server
approach in this paper. Consequently, we focus on cases of internal and on-demand communication.
The extension of spatial query processing to broadcast environments will be the focus of future work.
3.5 Spatial Query Definitions
With the necessary components, assumptions, and communication methods for spatial query containment established, this section seeks to formalize the notion of containment scope and supporting
concepts. We begin by considering the precise mathematical definition of the spatial query types supported by spatial query containment. Next, this report introduces the notion of semantically contained
queries that leads to the eventual definition of containment scope.
First, this section considers the definitions for region queries, nearest neighbor queries, and reverse
nearest neighbor queries. Region queries return all data objects within some bounded space that is
defined by some central query point q and additional supplementary parameters. The two region
queries considered in this paper are the range query specified by Definition 1 and the window query
specified by Definition 2.
Definition 1. Range query. Given a set of objects, S, a range query denoted by $Q_{range}(q, r)$ retrieves all the
objects $o \in S$ with Euclidean distances from a query point q that do not exceed a supplemental parameter distance
r. The result set of query Q is denoted by $R_{range}(q, r)$ and equals $\{o \mid o \in S \wedge |o, q| \le r\}$. $R_{range}(q, r)$ is equivalently
defined as all objects in the region given by cir(q, r).
Definition 2. Window query. A window query denoted by $Q_{window}(q, l, h)$ returns all objects $o \in S$ whose x- and y-distances from a query point $q$ do not exceed supplemental parameter distances $l$ and $h$, respectively. The result set of $Q$ is denoted by $R_{window}(q, l, h)$ and equals $\{o \mid o \in S \wedge |o, q|_x \le l \wedge |o, q|_y \le h\}$. Here, $|i, j|_x$ and $|i, j|_y$ represent the Euclidean distance between points $i$ and $j$ after these points are projected onto the x axis and the y axis, respectively. $R_{window}(q, l, h)$ is equivalently defined as all objects in the region given by $rect(q, l, h)$.
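Definitions 1 and 2 can be illustrated with a brute-force sketch that scans every object instead of using an index. The function names and the representation of objects as (x, y) tuples are assumptions made only for this illustration; they are not part of the framework.

```python
from math import hypot


def range_query(objects, q, r):
    """Objects within Euclidean distance r of q (Definition 1)."""
    return [o for o in objects if hypot(o[0] - q[0], o[1] - q[1]) <= r]


def window_query(objects, q, l, h):
    """Objects whose x-distance from q is at most l and whose
    y-distance from q is at most h (Definition 2)."""
    return [o for o in objects
            if abs(o[0] - q[0]) <= l and abs(o[1] - q[1]) <= h]


# Hypothetical dataset S and query point q.
S = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5), (0.5, 4.0)]
q = (0.0, 0.0)

in_circle = range_query(S, q, 2.0)        # objects inside cir(q, 2)
in_window = window_query(S, q, 3.0, 1.0)  # objects inside rect(q, 3, 1)
```

Both functions use a closed boundary (an object exactly at the limit is returned), matching the "do not exceed" wording of the definitions.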
The second category of spatial queries processed by the containment scope approach consists of
nearest neighbor queries. Such queries can be divided into the traditional nearest neighbor (NN) query
specified by Definition 3 and the k nearest neighbor (kNN) query specified by Definition 4.
Definition 3. NN query. An NN query, $Q_{nn}(q)$, retrieves an object $o' \in S$ that is closest to a query point $q$. That is, $o'$ has the minimal Euclidean distance from $q$ out of all possible data objects in $S$. The result set for $Q$ is denoted by $R_{nn}(q) = \{o'\}$ and satisfies the following conditions: (1) $o' \in S$; and (2) $\forall o \in S - \{o'\}$, $|o', q| \le |o, q|$. $R_{nn}(q)$ is equivalently represented by the object whose Voronoi cell contains the query point $q$.
Definition 4. kNN query. A kNN query, $Q_{nn}(q, k)$, retrieves the $k$ ($k \ge 1$) closest objects to a query point $q$ from a set of objects $S$. That is, these objects represent the $k$ smallest Euclidean distance values from a data object to $q$. The result set for $Q_{nn}(q, k)$ is denoted by $R_{nn}(q, k)$ and satisfies three conditions: (1) $R_{nn}(q, k) = S' \subseteq S$; (2) $|S'| = k$; and (3) $\forall o' \in S', \forall o \in S - S'$, $|o', q| \le |o, q|$. $R_{nn}(q, k)$ is equivalently represented as the $k$ objects whose Voronoi cells contain query point $q$ when $j < k$ data objects are removed from the dataset $S$. Notice that the kNN query generalizes the NN query and that we can represent an NN query issued at location $q$ by the kNN query defined as $Q_{nn}(q, 1)$.
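As a reference point for Definitions 3 and 4, a kNN query can be evaluated by sorting all objects by distance. This is a minimal brute-force sketch, assuming objects are (x, y) tuples; `knn_query` is a hypothetical name, and ties are broken by input order, which the definitions leave open.

```python
from math import dist


def knn_query(objects, q, k):
    """Return the k objects with the smallest Euclidean distance to q."""
    return sorted(objects, key=lambda o: dist(o, q))[:k]


# Hypothetical dataset and query point.
S = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0), (1.0, 1.0)]
q = (0.0, 0.0)

two_nearest = knn_query(S, q, 2)
nearest = knn_query(S, q, 1)  # an NN query is the kNN query with k = 1
```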
The final type of supported spatial query is the reverse nearest neighbor query. The two subtypes
in this query category include the reverse nearest neighbor (RNN) query specified by Definition 5 and
the reverse k nearest neighbor (RkNN) query specified by Definition 6.
Definition 5. RNN query (monochromatic). An RNN query, $Q_{rnn}(q)$, retrieves all objects $o' \in S$ that are closer to a query point $q$ than to any other object in $S$. That is, $q$ has the minimal Euclidean distance from $o'$ when compared to all other possible data objects in $S$. The result set for $Q$ is denoted by $R_{rnn}(q)$ and satisfies two conditions: (1) $o' \in S$; and (2) $\forall o \in S - \{o'\}$, $|o', q| \le |o', o|$.
Definition 6. RkNN query (monochromatic). An RkNN query, $Q_{rnn}(q, k)$, retrieves all objects $o' \in S$ that have a query point $q$ among their $k$ closest objects in $S$. That is, $q$ is among the $k$ smallest Euclidean distances from $o'$ when compared to all other possible data objects in $S$. The result set for $Q$ is denoted by $R_{rnn}(q, k)$ and satisfies the property that $\forall o' \in R_{rnn}(q, k)\ \exists S' \subseteq S$ such that $|S - S'| < k \wedge \forall o \in S', |o', q| \le |o', o|$. Notice that the RkNN query generalizes the RNN query and that we can represent an RNN query issued at location $q$ by the RkNN query defined as $Q_{rnn}(q, 1)$.
It is also possible to categorize reverse nearest neighbor queries as monochromatic or bichromatic. Definitions 5 and 6 specify the monochromatic version, where there exists a single, unpartitioned dataset $S$ over which queries are processed. In the bichromatic case, $S$ is partitioned into two sets, $S_A$ and $S_B$. Objects in $S_A$ are considered as possible result set candidates but are ignored when counting the number of objects closer to an object than the query point $q$. In contrast, objects in the set $S_B$ cannot be selected as result objects but do influence the relative ranking of how close query point $q$ is to candidate result objects in set $S_A$. The bichromatic definitions for the RNN query and the RkNN query are given in Definition 7 and Definition 8, respectively.
Definition 7. RNN query (bichromatic). An RNN query, $Q_{rnn}(q)$, retrieves all objects $o' \in S_A$ that are closer to a query point $q$ than to any other object in $S_B$. That is, $q$ has the minimal Euclidean distance from $o'$ when compared to all possible data objects in $S_B$. The result set for $Q$ is denoted by $R_{rnn}(q)$ and satisfies two conditions: (1) $o' \in S_A$; and (2) $\forall o \in S_B$, $|o', q| \le |o', o|$.
Definition 8. RkNN query (bichromatic). An RkNN query, $Q_{rnn}(q, k)$, retrieves all objects $o' \in S_A$ that have a query point $q$ among their $k$ closest objects in $S_B$. That is, $q$ is among the $k$ smallest Euclidean distances from $o'$ when compared to all possible data objects in $S_B$. The result set for $Q$ is denoted by $R_{rnn}(q, k)$ and satisfies the property that $\forall o' \in R_{rnn}(q, k)\ \exists S' \subseteq S_B$ such that $|S_B - S'| < k \wedge \forall o \in S', |o', q| \le |o', o|$. Notice that the RkNN query generalizes the RNN query and that we can represent an RNN query issued at location $q$ by the RkNN query defined as $Q_{rnn}(q, 1)$.
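The bichromatic property in Definition 8 can be read as a counting rule: a candidate in $S_A$ qualifies when fewer than $k$ objects of $S_B$ are strictly closer to it than $q$ is. The sketch below applies that reading by brute force; the function name and the tuple representation of objects are assumptions for illustration only.

```python
from math import dist


def bichromatic_rknn(S_A, S_B, q, k):
    """Candidates in S_A for which fewer than k objects of S_B are
    strictly closer to the candidate than the query point q."""
    result = []
    for cand in S_A:
        d_q = dist(cand, q)
        closer = sum(1 for o in S_B if dist(cand, o) < d_q)
        if closer < k:
            result.append(cand)
    return result


# Hypothetical partitioned dataset and query point.
S_A = [(1.0, 0.0), (10.0, 0.0)]
S_B = [(9.0, 0.0), (20.0, 0.0)]
q = (0.0, 0.0)

rnn = bichromatic_rknn(S_A, S_B, q, 1)   # Definition 7 is the case k = 1
r2nn = bichromatic_rknn(S_A, S_B, q, 2)
```

Here the candidate (10, 0) has exactly one object of $S_B$ closer than $q$, so it is excluded for $k = 1$ and included for $k = 2$.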
For notational convenience, this paper uses $Q_t(q)$ to denote a spatial query of arbitrary type $t \in \{range, window, nn, rnn\}$ issued at a query point $q$, and we use $Q_t.x$ to represent a supplemental parameter $x$ of $Q_t$. For example, $Q_t.k$ for $t = nn$ specifies the $k$ value for a kNN search. Also, we use $R_Q$ to denote the corresponding result set for query $Q$. In cases where the query type is clear, we omit the subscript and simply refer to the issued query as $Q$.
3.6 Containment Scope Definitions
With all supported query types now concretely defined, this paper defines the title concept of spatial query containment as described by containment scope and semantic query containment.
Spatial query containment determines whether a spatial query $Q'$ can be answered by the result set $R_Q$ of a previously issued spatial query $Q$ using a novel containment test. This test is composed of two different verifications. The first portion of the test determines if $Q'$ is semantically contained by $Q$. We denote such containment by $Q' \sqsubseteq Q$ and determine the result of the test by evaluating the type of the query as well as relevant supplemental query parameters as specified in Definition 9.
Definition 9. Semantic containment. An LDSQ, $Q'_{t'}$, is said to be semantically contained by another LDSQ, $Q_t$, denoted by $Q'_{t'} \sqsubseteq Q_t$, according to the following cases:
1. $Q'_{t'}.r \le Q_t.r$ if $t = t' = range$.
2. $Q'_{t'}.h \le Q_t.h \wedge Q'_{t'}.l \le Q_t.l$ if $t = t' = window$.
3. $Q'_{t'}.k \le Q_t.k$ if $t = t' = nn$.
4. $Q'_{t'}.k \le Q_t.k$ if $t = t' = rnn$.
Otherwise, $Q'_{t'} \not\sqsubseteq Q_t$.
For the second portion of the containment test, the spatial query containment framework verifies that the query point $p$ of $Q'(p)$ is located within the containment scope of query $Q$. The containment scope describes an area inside of which any query that is semantically contained by $Q$ will have a result set that is entirely covered by $R_Q$. Thus, any query that is both semantically contained by $Q$ and inside of the containment scope of $Q$ can be answered locally by simply evaluating objects in $R_Q$ against the restrictions given by $Q'$. Note that the computed containment scope is dependent on both the distribution of objects in the dataset $S$ and the original query $Q$ for which the containment scope was constructed. As will be discussed later, $Q'$ only affects the containment scope area in the case of kNN queries. The description of containment scope is formalized in Definition 10.
Definition 10. Containment scope. Given a spatial query, $Q_t(q)$, the containment scope represents the set of locations denoted by $S_Q$ for which the result set of a query $Q'_{t'}$ semantically contained by $Q_t$ is a subset of $R_Q$. Equivalently, $\{p \in S_Q \mid \forall Q'_{t'}(p),\ Q'_{t'} \sqsubseteq Q_t \Rightarrow R_{Q'} \subseteq R_Q\}$.
Based on the previous definitions for semantic query containment and containment scope, the client containment test can be performed as follows. Given a containment scope $S_Q$ for a query $Q_t$, if a new query $Q'_{t'}(q') \sqsubseteq Q_t$ is issued and $q' \in S_Q$, then the client can reuse $R_Q$ to answer $Q'_{t'}$. Otherwise, it sends $Q'_{t'}(q')$ to the server for processing and stores the returned result set and containment scope data to assist in the evaluation of future spatial queries.
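The two-part client test described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the `Query` record, the parameter dictionary, and the `in_scope` predicate (standing in for whatever representation of $S_Q$ the client stores) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    qtype: str                      # "range", "window", "nn", or "rnn"
    point: tuple                    # query point
    params: dict = field(default_factory=dict)


def semantically_contained(new, old):
    """Definition 9: same query type and no larger supplemental parameters."""
    if new.qtype != old.qtype:
        return False
    if new.qtype == "range":
        return new.params["r"] <= old.params["r"]
    if new.qtype == "window":
        return (new.params["h"] <= old.params["h"]
                and new.params["l"] <= old.params["l"])
    if new.qtype in ("nn", "rnn"):
        return new.params["k"] <= old.params["k"]
    return False


def can_answer_locally(new, old, in_scope):
    """True when the cached result of `old` suffices to answer `new`.
    `in_scope` is an assumed predicate: point -> bool for old's scope."""
    return semantically_contained(new, old) and in_scope(new.point)


# Hypothetical cached query with a square stand-in for its scope.
cached = Query("range", (0.0, 0.0), {"r": 5.0})
scope = lambda p: abs(p[0]) <= 2.0 and abs(p[1]) <= 2.0

reuse = can_answer_locally(Query("range", (1.0, 1.0), {"r": 3.0}), cached, scope)
too_wide = can_answer_locally(Query("range", (1.0, 1.0), {"r": 6.0}), cached, scope)
```

When the test fails, the client would forward the new query to the server and cache the returned result set and scope data.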
3.7 Containment Scope Evaluation and Computation Strategies
With the notion of spatial query containment and containment scope defined, we now focus on efficient methods for computing and utilizing containment scope data within our processing framework. Recall that a client can use information about a current query in tandem with previous query results and associated containment scope data to avoid issuing potentially redundant queries to the server. This requires a client to receive and to maintain a triple of data about each query submitted to the server. These triples must include (1) all specifications and parameters of the submitted spatial query, (2) the returned result set, and (3) the containment scope for that query. This information can then be used to determine if a new query can be answered using the same result set as the query for which the stored triple was constructed. This general technique for identifying query redundancy spans all supported spatial query types. However, the specific means by which containment scope data is stored and reconstructed varies based on the type of query under consideration. Thus, we delay discussing specific algorithms for utilizing containment scope until later in the paper.
Thus far, discussion in this paper has surrounded the use of containment scope to avoid needless
query evaluation. However, it is equally important to derive methods for computing and transmitting
containment scope data. Recall from the introductory chapter that a containment scope can be represented by returning a set of complementary objects to the client. This complementary set includes
non-result objects whose locations are essential for the formation of the containment scope. All of the
algorithms presented in this paper traverse an R-tree index, form a result set and complementary set,
and then communicate these results to the client. Beyond this common theme, the precise logic for
computing containment scope information varies based on the type of spatial query issued. Chapter
4, Chapter 5, and Chapter 6 discuss computation and evaluation strategies for region queries, nearest
neighbor queries, and reverse nearest neighbor queries, respectively.
As a final point of interest, this chapter examines an efficient R-tree search strategy that is used
to form result set and complementary set information for every supported query type. In this paper,
we assume that all objects are indexed on their spatial coordinates by an R-tree [4], because of its
wide acceptance and efficiency. As mentioned in Chapter 2, an R-tree clusters spatially close objects,
represents those groupings using minimum bounding rectangles (MBRs), and then recursively groups
MBRs until a root node is formed for the index. Figure 3.2(a) depicts an R-tree with a maximum fanout
of three. At the bottom, eight objects labeled a through h are grouped into the three MBRs N1 , N2 and
N3 . Continuing upward in the diagram, the three MBRs are grouped together to form the root of the
index. The positions of objects and MBRs are shown in Figure 3.2(b).
Figure 3.2. Example R-tree: (a) R-tree (fanout=3); (b) objects in 2D space.
To retrieve objects required by a spatial query, many efficient search algorithms have been developed based on the notion of best-first traversal [22] on an R-tree. Best-first search algorithms arrange unexplored index nodes and objects in a priority queue according to the smallest Euclidean distance from each entry to the query point (i.e., mindists [23]). Doing so guarantees that every dequeued head entry has the minimum distance to the query point among all unexplored entries. The pseudo-code for a generalized best-first search algorithm is shown in Figure 3.3. After initializing the priority queue with the root of the R-tree index (line 1), the algorithm explores the head entry of the priority queue during each iteration of the loop (lines 2-9). If the dequeued entry is a node, it is expanded (lines 5-6), and its children are enqueued for future processing. Otherwise, the entry is checked against the query and collected as a result object if it satisfies the query constraints (line 8). We use a termination condition (line 2) to indicate when the search completes. This condition varies for each type of spatial query. In the case of range and window query types, the termination condition is satisfied when all of the remaining objects in the queue are outside the search area. On the other hand, the algorithm terminates for a kNN query once the result set contains k result objects. Note that the processing of objects according to their Euclidean distance ensures that these first k objects are in fact the correct result objects.
Algorithm best_first_search(T, q)
Input.  an R-tree (T), a query point (q)
Local.  a priority queue (P)
Output. a result set of objects (R)
Begin
1.  P.enqueue(T.root, mindist(T.root, q));
2.  While (P.not_empty() AND terminate condition not satisfied)
3.      (e, d) ← P.dequeue();
4.      If e is a node
5.          For each child c of e
6.              P.enqueue(c, mindist(c, q));
7.      Else
8.          If e satisfies the query
9.              R ← R ∪ {e};
10. Return R;
End.
Figure 3.3. Algorithm best_first_search
Consider the illustrative running example depicted in Figure 3.2. (We will continue to use this example throughout the remainder of this paper.) Here, a range query $Q_{range}(q, r)$ is issued at point $q$ with radius $r$ over a given dataset that is indexed by an R-tree T. Notice that the radius $r$ is long enough to cover both objects c and d and that these objects must therefore form the result set for the given query. After running the best-first search algorithm, objects c and d have been selected as result candidates for the range query Q, while the residual entries (i.e., a, b, N3 and g) in the priority queue P represent data objects and internal nodes that are guaranteed to be located outside of the query search space defined by the circle $cir(q, r)$. At this point, the search completes, and those remaining entries in the queue form the set of non-result objects ordered by non-decreasing Euclidean distance from $q$.
The best-first search algorithm is useful not only for result set processing but also for computing the containment scope, for the following two reasons. First, at the time when the search terminates, the remaining priority queue entries preserve a representation of all non-result objects. We can use this information to form the containment scope of the query for which the priority queue was constructed. Second, as those non-result objects that affect the determination of a containment scope are expected to be close to the result objects and to the query point, the remaining priority queue has already sorted the remaining items based on their relevance to containment scope formation. That is, the priority queue has already sorted non-result objects (and internal nodes containing non-result objects) based on their Euclidean distances to the query point. It follows that the derivation of a containment scope can begin immediately upon termination of the result object search routine by examining the closest non-result objects. By tracking both result and non-result objects in a single priority queue, our containment scope computation and query processing algorithms incur at most one disk access per index node and thereby minimize the overhead of the spatial query containment framework. Chapter 4 continues this discussion in the context of region query processing and illustrates how the best-first search algorithm can assist in formulating the result set and containment scope of a given region query.
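The single-queue idea can be sketched for a range query as follows. This is an illustrative sketch under stated assumptions: `Node` is a hypothetical stand-in for an R-tree node (it only stores children and a bounding rectangle), `mindist` is written out directly for points and rectangles, and the search stops as soon as the head of the queue lies beyond $r$. The function returns both the result objects and the leftover queue, which holds the non-result entries in non-decreasing mindist order.

```python
import heapq
from itertools import count
from math import dist


class Node:
    """Hypothetical R-tree node: children are Nodes or (x, y) points."""

    def __init__(self, children):
        self.children = children
        corners = []
        for c in children:
            corners.extend([c.lo, c.hi] if isinstance(c, Node) else [c])
        # Minimum bounding rectangle of everything below this node.
        self.lo = (min(p[0] for p in corners), min(p[1] for p in corners))
        self.hi = (max(p[0] for p in corners), max(p[1] for p in corners))


def mindist(entry, q):
    """Smallest Euclidean distance from q to a point or to a node's MBR."""
    if isinstance(entry, Node):
        dx = max(entry.lo[0] - q[0], 0.0, q[0] - entry.hi[0])
        dy = max(entry.lo[1] - q[1], 0.0, q[1] - entry.hi[1])
        return (dx * dx + dy * dy) ** 0.5
    return dist(entry, q)


def best_first_range(root, q, r):
    tie = count()  # tie-breaker so heap entries never compare Nodes
    heap = [(mindist(root, q), next(tie), root)]
    result = []
    # Terminate once every remaining entry is outside cir(q, r).
    while heap and heap[0][0] <= r:
        _, _, e = heapq.heappop(heap)
        if isinstance(e, Node):
            for c in e.children:
                heapq.heappush(heap, (mindist(c, q), next(tie), c))
        else:
            result.append(e)
    return result, heap


# Hypothetical two-leaf index.
N1 = Node([(1.0, 1.0), (2.0, 1.0)])
N2 = Node([(6.0, 6.0), (7.0, 8.0)])
root = Node([N1, N2])
result, leftover = best_first_range(root, (0.0, 0.0), 2.5)
```

In this example the node N2 is never expanded; it stays in the leftover queue, ready to be examined if the containment scope computation needs it.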
Chapter 4
Region Query Computation Methods

4.1 Containment Scope Server Processing for Region Queries
This section discusses how to calculate and how to use spatial query containment to reduce the number
of redundant region queries that are issued by a client to the server. We first describe some general
concepts about containment scope for region queries and present a universal algorithm to determine
the containment scope result set, RQ (q), and the containment scope complementary set, CSQ (q). Next,
we consider two common types of region queries: the range query and the window query. In each case,
we customize the general spatial query containment formulation algorithm in an effort to reduce computational overhead by exploiting geometric properties that are unique to each region query subtype.
Additional optimizations that improve overall system performance are also introduced. Finally, the
last subsection discusses how clients can use containment scope to avoid issuing unnecessary region
queries to the server.
A region query, $Q(q)$, retrieves objects that are located within a specified query area, $G(q)$, that contains and is partly defined by query point, $q$. Any object $o \in G(q) \cap O$ is a member of the query result set, $R_Q(q)$, and the remaining objects are in the non-result set, $O - R_Q(q)$. Recall that the spatial query containment model seeks to ensure that at any point within the containment scope, a client possesses all necessary information to construct the result set of a query $Q'(q')$ that is semantically contained by $Q$ without contacting the central server. We assume that every client stores a local copy of the result set for its most recently issued query. Therefore, a containment scope is determined by identifying the specific subregion of the domain of $S$ in which no non-result object will enter the spatial region $G'(q')$ for future query $Q'$ (and consequently become part of the result set $R_{Q'}(q')$). Unlike other auxiliary scope processing, a containment scope is not invalidated by the absence of a result object in $R_{Q'}(q')$, as the client is capable of realizing this fact independently with the spatial information that it stores locally. That is, we only require that $R_{Q'}(q') \subseteq R_Q(q)$. In order to reduce computation time and to guarantee a bound on the size of the containment scope, we require that at least one object from $R_Q(q)$ remain in the result set of any semantically contained future query $Q'$ issued within the containment scope. That is, we require that $R_Q(q) \cap R_{Q'}(q') \ne \emptyset$.
To determine the containment scope of a query result, we borrow the idea of Minkowski regions. Recall that a Minkowski region is centered at some data object $o$ and has the same spatial dimensions as the query $Q$ with which it is associated. When a query region, $G(q)$, centered at $q$, covers an object $o$, the Minkowski region of $o$ denoted by $G(o)$ covers $q$. This symmetry allows us to conclude that the result objects are those whose Minkowski regions cover $q$. Likewise, non-result objects are those whose Minkowski regions do not include $q$.
Given a result set, $R$, its valid scope $V_R$ is defined as the area covered by the Minkowski regions of all the result objects but not that of any non-result objects. In other words, as long as a region query $Q'$ with the same search space (except, perhaps, the query point $q$) is issued inside $V_R$, $Q'$ is guaranteed to have $R$ as the result set. Formally,
\[ V_R = \bigcap_{o \in R} G(o) - \bigcup_{o' \in O - R} G(o') \]
where the first term $\bigcap_{o \in R} G(o)$ represents the intersection of all Minkowski regions of the result objects, i.e., an area where all the result objects are included for a query, and the second term $\bigcup_{o' \in O - R} G(o')$ refers to the area inside which at least one non-result object will be included as a result object. However, the concept of containment scope is different from that of valid scope. It maximizes the reusability of $R_Q(q)$ by considering not only region queries $Q(q')$ issued at a different query point $q'$ with the same search space $G$, but also those semantically contained by $Q(q)$. Let $R'$ ($\subseteq R_Q(q)$) denote a result set of any query that can be answered by $R_Q(q)$. The containment scope, denoted as $S_Q(q)$, can be derived in Equation (4.1):
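\[
\begin{aligned}
S_Q(q) &= \bigcup_{\forall R' \subseteq R_Q(q)} V_{R'} \\
       &= \bigcup_{\forall R' \subseteq R_Q(q)} \Big( \bigcap_{o \in R'} G(o) - \bigcup_{o' \in O - R'} G(o') \Big) \\
       &= \bigcup_{o \in R_Q(q)} G(o) - \bigcup_{o' \in O - R_Q(q)} G(o').
\end{aligned}
\tag{4.1}
\]
It follows that the containment scope equals the union of the Minkowski regions of all result objects less the area covered by the Minkowski region of any non-result object.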
As we can observe from Equation (4.1), the derivation of $S_Q(q)$ requires a complete evaluation of all the non-result objects, which incurs non-negligible overhead. In order to improve the performance and to reduce the number of non-result objects that are accessed, we re-formulate the calculation of $S_Q(q)$ in Equation (4.2). Our result is based on the observations that for any two collections of sets $A = \{a_1, ..., a_m\}$ and $B = \{b_1, ..., b_n\}$, it easily can be shown (1) that $A - B = A - (A \cap B)$ and (2) that $(\bigcup_{i \in \{1,2,...,m\}} a_i) \cap (\bigcup_{j \in \{1,2,...,n\}} b_j) = \bigcup_{i \in \{1,2,...,m\}} (\bigcup_{j \in \{1,2,...,n\}} (a_i \cap b_j))$.
\[
S_Q(q) = \bigcup_{o \in R_Q(q)} G(o) - \bigcup_{o' \in O - R_Q(q)} \bigcup_{o \in R_Q(q)} \big( G(o) \cap G(o') \big)
\tag{4.2}
\]
This implies that only those non-result objects with Minkowski regions that overlap the Minkowski
region of at least one result object need to be used in computing a containment scope. In this paper,
all those non-result objects that are involved in the calculation of the containment scope are defined as
complementary objects. As a result, our algorithm for an online containment scope computation is to find
a result set and all the complementary objects for a given query. We then define the complementary set
of a region query Q as CQ (q).
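The step from Equation (4.1) to Equation (4.2) can be spelled out using the two set identities stated with it. In the sketch below, $U$ and $W$ are shorthand introduced only for this illustration: $U$ is the union of the result objects' Minkowski regions and $W$ is the union of the non-result objects' Minkowski regions.

```latex
\begin{aligned}
U &= \bigcup_{o \in R_Q(q)} G(o), \qquad
W = \bigcup_{o' \in O - R_Q(q)} G(o') \\
S_Q(q) &= U - W
  && \text{by Equation (4.1)} \\
&= U - (U \cap W)
  && \text{by identity (1)} \\
&= U - \bigcup_{o' \in O - R_Q(q)} \; \bigcup_{o \in R_Q(q)}
     \big( G(o) \cap G(o') \big)
  && \text{by identity (2)}
\end{aligned}
```

Any non-result object $o'$ whose region $G(o')$ is disjoint from every $G(o)$ contributes only empty intersections to the last line, which is why it can be dropped from the computation.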
The problem of computing a containment scope for a region query, Q, has now been reduced to the
problem of constructing the result set and the complementary set described above. As the Minkowski
regions of complementary objects must overlap that of result objects, they must be located spatially
close to result objects and, by extension, to the query point q. Consequently, it is reasonable to gradually
expand the search space for result objects to search for complementary objects. Our algorithms for
calculating the containment scope of a region are inspired by this observation and employ the assistance
of the BestFirstSearch algorithm given in Figure 3.3.
As already discussed in Chapter 3, the BestFirstSearch algorithm is an efficient access method for locating nearby data objects that orders entries by the mindist metric, i.e., the minimum Euclidean distance between an entry and the query point. Because our algorithm uses BestFirstSearch either directly or with slight variations that still order nodes based on spatial proximity to the query region, we conclude that result objects and complementary objects have a high probability of being accessed early in the search. This reduces computational overhead and is a key feature of our algorithm design. As the membership criteria for $R_Q(q)$ and $C_Q(q)$ vary depending on the type of region query that is issued, we now present a generalized version of the containment scope formation algorithm for region queries. Customizations that are specific to range queries and window queries are discussed in Section 4.2 and Section 4.3, respectively.
The general algorithm for all region queries is presented in Figure 4.1. It accepts a region query at some location $q$ with spatial constraints given by $G$. These constraints vary based on the type of region query that is issued by the client and are used to determine if a data object is inside of the result set. We begin by initializing temporary variables and by enqueuing the root node of the R-tree index T over spatial dataset $S$ into a priority queue P (lines 1-4). The ordering of objects within the queue is based on the function distmetric, which varies for each region query subtype but which always gives preference to those objects that are in close proximity to the query region. We delay additional discussion on distmetric until Section 4.2 and Section 4.3. For each entry dequeued from the priority queue P, the algorithm issues a call to the subroutine not_needed_for_cs to determine whether the given node is, or could possibly contain, a member of the final result set and/or final complementary set. If not, we ignore this node and continue processing (lines 7-8) with the next enqueued entry. Otherwise, we determine if the current node is an internal node or a data node. Children of an internal node are enqueued for additional processing (lines 9-11). On the other hand, a data node (1) is added to the result set R if it is inside the search space $G(q)$ or (2) is added to the complementary set C if it passes an additional filtering step pass_optimize (lines 12-15). The logic for pass_optimize is unique for each specific region query implementation and will be discussed momentarily. Once the priority queue contains no more nodes, the algorithm terminates by returning the result set and complementary set that represent the containment scope of query $Q$. Finally, we consider the node exclusion subroutine given in Figure 4.2.
Algorithm region_query_containment_scope(T, q, G)
Input.  Region query centered at point q with spatial region G issued against R-tree index T
Output. Result set R and complementary set C
Begin
1.  Define empty priority queue P with tuples (node, dist)
2.  P.enqueue((T.root, 0))
3.  Define R-tree node e
4.  R ← {}; C ← {}
5.  While (P.not_empty())
6.      (e, d) ← P.dequeue()
7.      If not_needed_for_cs(e, R, q, G)
8.          Continue
9.      If e is an internal node
10.         For each child m of e
11.             P.enqueue((m, distmetric(q, m.loc)))
12.     Else If e ∈ G(q)
13.         R ← R ∪ {e}
14.     Else If pass_optimize(R, C, e, q, G)
15.         C ← C ∪ {e}
16. Return (R, C);
End.
Figure 4.1. Algorithm region_query_containment_scope
Subroutine not_needed_for_cs(e, R, q, G)
Input.  Region query centered at point q with spatial region G, tentative result set R, and current object e
Output. true if the object is not needed for containment scope computation and false otherwise
Begin
1.  If e ∩ G(q) ≠ ∅ Then Return false
2.  For each r in R
        // Each complementary object Minkowski region must
        // intersect ANY result set object Minkowski region
3.      If ∃o ∈ e such that q ∈ G(o)
4.          Return false
5.  Return true;
End.
Figure 4.2. Subroutine not_needed_for_cs
Any node with a bounding box that overlaps with the query search space $G(q)$ may contain (or be) result objects and must be examined by the algorithm. Otherwise, we only need to check if the node could possibly contain objects required for the complementary set. We do this by verifying that there can exist some object within the node that has a Minkowski region which contains the query point $q$. If so, the subroutine indicates that this node should be explored further. Otherwise, the node may be skipped. Notice that the logic of subroutine not_needed_for_cs assumes that all result objects are encountered before any complementary objects. (If this is not the case, it is possible to prematurely exclude a complementary object.) In all region queries discussed in this paper, distmetric is defined in such a way as to ensure that all result objects precede all possible complementary objects.
With a general outline of the region query containment algorithm defined, we now consider specific
implementations, examples, and optimizations of this approach for range queries and for window
queries in Section 4.2 and Section 4.3, respectively.
4.2 Containment Scope for Range Query
Using the general region query algorithm in Figure 4.1 as a guide, we now identify specific geometric
properties of range queries in order to formulate a complete containment scope construction algorithm.
4.2.1 Basic Observations

For a given range query $Q_{range}(q, r)$, the search space can be represented as a circle $cir(q, r)$ that is centered at the query point $q$ with radius $r$, as shown in Figure 4.3(a). An object $o$, with its Minkowski circle $cir(o, r)$ covering the query point $q$, must be a result object. In other words, if $q \in cir(o, r)$, then $o \in R_{range}(q, r)$. Figure 4.3(b) illustrates the Minkowski circles of all the objects. As the query point $q$ lies only inside $cir(c, r)$ and $cir(d, r)$, objects c and d form the result set.
Figure 4.3. Range query circle and Minkowski circles of objects: (a) circle cir(q, r) of a range query; (b) Minkowski circles of objects.
The valid scope for $R_{range}(q, r) = \{c, d\}$ is depicted in Figure 4.3(b). For any $q' \in V_R$, $R_{range}(q', r) = \{c, d\}$. However, the concept of containment scope is different from that of valid scope. It maximizes the reusability of $R_{range}(q, r)$ by considering not only range queries $Q_{range}(q', r)$ issued at a different query point $q'$ with the same range distance $r$, but also those semantically contained by $Q_{range}(q, r)$. The containment scope, denoted as $S_{range}(q, r)$, can be derived in Equation (4.3), which is simply a specialization of the general region query form given in Equation (4.2).
\[
S_{range}(q, r) = \bigcup_{o \in R_{range}(q, r)} cir(o, r) - \bigcup_{o' \in O - R_{range}(q, r)} \bigcup_{o \in R_{range}(q, r)} \big( cir(o, r) \cap cir(o', r) \big)
\tag{4.3}
\]
As such, only those non-result objects $o'$ that have Minkowski circles overlapping at least one Minkowski circle of a result object (i.e., $\exists o \in R_{range}(q, r)$ such that $cir(o', r) \cap cir(o, r) \ne \emptyset$) might change the size of $S_{range}(q, r)$. Such objects are the complementary objects for a range query. The containment scope of $R_{range}(q, r)$ is shown in Figure 4.4(d). Notice that it covers a much larger area than the valid scope that is shown in Figure 4.3(b). Consequently, more queries can be answered using $R_{range}(q, r)$ and hence more savings are to be expected.
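On the client side, Equation (4.3) reduces to a simple point test: a location lies in the containment scope when it falls inside at least one result object's Minkowski circle and inside no complementary object's Minkowski circle. The sketch below assumes the client already holds the result set and the complementary set as lists of (x, y) tuples; the function name is hypothetical.

```python
from math import dist


def in_range_containment_scope(p, r, results, complementary):
    """Point-membership test for the scope given by Equation (4.3)."""
    covered = any(dist(p, o) <= r for o in results)        # inside some cir(o, r)
    blocked = any(dist(p, c) <= r for c in complementary)  # inside some cir(o', r)
    return covered and not blocked


# Hypothetical stored data for a query with r = 2.
results = [(0.0, 0.0)]
complementary = [(3.0, 0.0)]

inside = in_range_containment_scope((0.5, 0.0), 2.0, results, complementary)
near_c = in_range_containment_scope((1.5, 0.0), 2.0, results, complementary)
far = in_range_containment_scope((-3.0, 0.0), 2.0, results, complementary)
```

Non-result objects that are not complementary never need to be checked, because their circles cannot reach any point that passes the first condition.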
4.2.2 Algorithm Implementation

Our online containment scope computation algorithm for range query results is derived from (1) the general region query containment scope algorithm given by region_query_containment_scope, (2) Equation (4.3), and (3) Lemma 1, which estimates the upper bound of the search space.
Three specializations of the region_query_containment_scope algorithm are required to support range queries efficiently. First, the search space $G$ of a range query requires that a result object lie within a certain Euclidean distance, $r$, from the query point $q$. That is, $dist(q, o) \le r$ for any result object $o$. Second, the distmetric parameter must be defined for range queries. We observe that ordering objects by Euclidean distance from query point $q$ will ensure that all result objects are visited prior to visiting any complementary objects. Thus, we define distmetric := mindist(q, o), where $o$ is the entry being ordered and mindist is the minimum Euclidean distance between point $q$ and $o$. Notice that this condition corresponds exactly to the one used in the BestFirstSearch algorithm. Using the mindist metric has two key advantages. It ensures that all result objects are identified before any complementary object is processed, and it ensures high spatial locality and reduced computational overhead, since result objects and complementary objects are likely to lie close to the query point $q$, and these locations are visited first by the algorithm. The pass_optimize function is only implemented for the optimized approach discussed later in this section and simply returns true for the basic approach. Third, we can optimize the stopping criterion for adding elements to the processing queue by observing the result of Lemma 1, which restricts the search space by constructing an upper bound.
Lemma 1. The search space for complementary objects for a given range query $Q_{range}(q, r)$ is bounded by a circle $cir(q, 3r)$.
Proof. For a range query, $Q_{range}(q, r)$, the maximal distance between a result object and the query point $q$ is bounded by $r$. On the other hand, two circles $cir(o_1, r)$ and $cir(o_2, r)$ overlap only when the distance between their centers (i.e., $|o_1, o_2|$) is bounded by $2r$. Consequently, by the triangle inequality, the distance between the query point and a complementary object for a range query $Q_{range}(q, r)$ is at most $r + 2r = 3r$.
Figure 4.4. Determining the containment scope for a range query result: (a) contents of R and P; (b) examine a and N3; (c) examine b, e, f, g and h; (d) containment scope.
To illustrate how the containment scope algorithm runs with range queries, we continue our example
shown in Figure 3.2 with r = |q, c|. When the processing of the range query finishes, we assume that the
result set R contains d and c and that the priority queue P maintains (a, b, N3 , g) in non-decreasing order
of mindists from q. Those objects are depicted in Figure 4.4(a). First, a, the head entry of P is examined.
Since cir(a, r) overlaps cir(c, r), a is included in C. Next, b is dequeued; its circle overlaps cir(d, r),
so b is also included in C. Further, N3's Minkowski range overlaps the result objects' circles, as shown
in Figure 4.4(b). Its children, e, f, and h, are put into P. Now P becomes (e, f, g, h), and those objects
are depicted in Figure 4.4(c). Subsequently, e and f are dequeued. As the Minkowski circles of both
overlap those of the result objects, C is updated to {a, b, e, f}. After determining that g and h do
not overlap the result objects' circles, P becomes empty and the search completes. Finally, C, which
consists of a, b, e and f, is returned, and the corresponding containment scope is shown in Figure 4.4(d).
4.2.3 Optimized Computation Strategy

By identifying complementary objects, i.e., the non-result objects whose Minkowski circles overlap with
those of result objects, the cost of calculating the containment scope is significantly reduced, as the
number of non-result objects that we need to consider decreases. However, some of the complementary
objects have zero impact on the size of the containment scope. Figure 4.5(a) shows a case in which
o1 and o2 are two complementary objects with respect to a range query Qrange (q, r) whose result consists
of o. Since the portion of cir(o, r) shared with cir(o1, r) covers the overlap between cir(o, r) and cir(o2, r),
o2 does not affect the containment scope. If such redundant complementary objects are identified and
removed, the transmission and storage costs for the complementary objects can be reduced further.
In addition, if they can be identified during containment scope computation, I/O costs incurred for
accessing the underlying index can also be mitigated. In this subsection, we discuss the technique to
identify removable complementary objects.
Figure 4.5. Detection of redundant complementary objects: (a) objects o1 and o2; (b) arcs ab and cd;
(c) covered by two objects.
The basic idea for determining whether a complementary object o' is redundant is to check whether the
overlap between it and all result objects is fully covered by other complementary objects. To
facilitate this check, we identify two possible cases in which the overlap between the Minkowski
circle of o' and those of the result objects is covered by the circles of one or more other complementary
objects. Figure 4.5(b) shows an example of the first case, in which O2 ⊆ O1, with O1 = cir(o, r) ∩ cir(o1, r)
and O2 = cir(o, r) ∩ cir(o2, r). As shown in the figure, the arc belonging to cir(o1, r) but located inside
cir(o, r) is arc ab, and the arc belonging to cir(o2, r) but located inside cir(o, r) is arc cd. As O1 fully
bounds O2, the arc ab should cover the arc cd. Based on this observation, an examination is simply
performed to determine whether arcs formed between a complementary object and result objects are fully
covered by others.
Figure 4.5(c) shows the second example. Here, o3 is the evaluated object, and the arc of its Minkowski
circle inside the circle cir(q, r), i.e., arc bc, is only partially covered by the arc related to o1 or by
that of o2 alone, but is fully covered by both together. To identify whether o3 can be omitted, we exploit
another property. Suppose that there is an object whose arc is covered by two other arcs. Then we
determine a point p as the intersection point of the perimeters of the two circles that lies inside the
Minkowski region of the result object. If the distance from the object (say o3) to p is longer than r, o3
can be omitted. The criteria just discussed can be added to the region query containment scope algorithm
within the passOptimize subroutine to reduce the size of the complementary set.
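The second criterion can be illustrated with a small geometric sketch. The Python fragment below computes the intersection points of two equal-radius circles and applies a literal reading of the test described above: o3 is treated as removable when the intersection point p of cir(o1, r) and cir(o2, r) that lies inside cir(o, r) is farther than r from o3. The function names are hypothetical, only a single result object o is considered, and the fragment is a sketch of the stated rule rather than a verified implementation of passOptimize.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def circle_intersections(c1, c2, r):
    """Intersection points of two circles of equal radius r (empty list if none)."""
    d = dist(c1, c2)
    if d == 0 or d > 2 * r:
        return []
    mx, my = (c1[0] + c2[0]) / 2, (c1[1] + c2[1]) / 2   # midpoint of the centers
    h = math.sqrt(max(r * r - (d / 2) ** 2, 0.0))       # half-length of the common chord
    ux, uy = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d   # unit vector from c1 to c2
    return [(mx - uy * h, my + ux * h), (mx + uy * h, my - ux * h)]

def jointly_hidden(o, o1, o2, o3, r):
    """Literal reading of the second case: take the intersection point(s) p of
    cir(o1, r) and cir(o2, r) inside cir(o, r); o3 is removable when every
    such p is farther than r from o3. Returns False if no such p exists."""
    inside = [p for p in circle_intersections(o1, o2, r) if dist(o, p) <= r]
    return bool(inside) and all(dist(o3, p) > r for p in inside)
```

With o = (0, 0), r = 1, o1 = (-0.6, 1.2) and o2 = (0.6, 1.2), the two perimeters meet inside cir(o, r) at approximately p = (0, 0.4), so an object o3 = (0, 1.6) passes the test while o3 = (0, 1.0) does not.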
4.3 Containment Scope for Window Query
With an efficient range query implementation derived from the general region query containment scope
processing algorithm, we turn our attention to window query containment scope construction. Once
again, we will use the general approach in conjunction with specific geometric properties of window
queries to formulate a complete containment scope construction algorithm.

Given a window query Qwindow (q, h, l), rect(q, h, l) represents a rectangular window which is centered at
q, with height and length of 2h and 2l, respectively. By applying Equation (4.2) in the context of window
queries, the containment scope for a window query result, Rwindow (q, h, l), denoted by Swindow (q, h, l), is
formulated and expressed in Equation (4.4).
\[
S_{window}(q,h,l) \;=\; \bigcup_{o \in R_{window}(q,h,l)} rect(o,h,l) \;-\; \Bigl( \bigcup_{o \in R_{window}(q,h,l)} rect(o,h,l) \;\cap \bigcup_{o' \in O - R_{window}(q,h,l)} rect(o',h,l) \Bigr) \tag{4.4}
\]
Once again, we expect the containment scope for a given window query to be substantially larger
than the equivalent valid scope for the same query, as the definition consists of the union of all result
object Minkowski windows as opposed to the intersection of the same Minkowski windows.
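Read as a membership test, the containment scope of a window query result contains a location p when p falls inside the Minkowski window of at least one result object and inside no Minkowski window of a non-result object. The following Python sketch contrasts this with the intersection-based valid scope mentioned above. The function names and the brute-force scan over all non-result objects are illustrative assumptions; the actual algorithm only needs the complementary objects.

```python
def in_rect(p, center, h, l):
    """True if p lies in rect(center, h, l): length 2l along x, height 2h along y."""
    return abs(p[0] - center[0]) <= l and abs(p[1] - center[1]) <= h

def in_window_containment_scope(p, results, non_results, h, l):
    """p is in the union of the result windows and in no non-result window."""
    covered_by_result = any(in_rect(p, o, h, l) for o in results)
    covered_by_other = any(in_rect(p, o2, h, l) for o2 in non_results)
    return covered_by_result and not covered_by_other

def in_window_valid_scope(p, results, non_results, h, l):
    """Same test with the intersection of the result windows instead of the union."""
    covered_by_all = all(in_rect(p, o, h, l) for o in results)
    covered_by_other = any(in_rect(p, o2, h, l) for o2 in non_results)
    return covered_by_all and not covered_by_other
```

With results at (0, 0) and (3, 0), one non-result object at (5, 0), h = 1 and l = 2, the location (0, 0) lies in the containment scope but not in the valid scope, which illustrates why the former is expected to be the larger region.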

Our online containment scope computation algorithm for window query result is derived from the
region query containment scope algorithm in Figure 4.1 and includes the following three specializations.
First, the query region for window queries is a rectangular region specified as rect(q, 2l, 2h), where l and
h are extents from a query point, q, in the x-dimension and y-dimension, respectively. A window query
returns all objects in the dataset that are within this rectangular region. We also assume that a result
set is non-empty and that a priority queue of complementary objects is preserved after the query is
processed.
Second (and unlike range query processing), some potential complementary objects might have
smaller Euclidean distance to query point q than some of the result objects. To avoid the overhead
of storing a large number of potentially incorrect objects within the complementary set, we modify
the distmetric prioritization measure so that it considers the minimum distance between the bounding
box of a dequeued object and the bounding box of the issued query Q. That is, distmetric :=
mindist(rect(q, 2l, 2h), o), where rect(q, 2l, 2h) represents the actual query region and o represents some
index node in R-tree T. Thus, any data object or internal node that overlaps with the query region
will be processed first because they will have a distmetric value of zero and mindist values cannot be
negative. It follows that the result set will be fully formed prior to handling any complementary objects,
and processing can mirror that of both the general region query algorithm as well as the specialized
range query approach. While this search scheme can cause redundant page faults for those windows
with large perimeters with respect to their contained area, this performance penalty is largely mitigated
by even small system caches. On the other hand, using the distmetric function precisely as described for
range queries would have required that all non-result objects be maintained until the entire result set
was constructed. Such an approach consumes a potentially large amount of memory and is not ideal.
As before, we temporarily delay the discussion of the optimization function for window queries until
the remainder of the window query containment scope processing algorithm has been clarified. Finally,
the search space for complementary objects of a given window query can yet again be bounded as is
clear from the result obtained in Lemma 2.
Lemma 2. The maximum search space for complementary objects for a given window query Qwindow (q, h, l) is
bounded by a rectangle rect(q, 3h, 3l).
Proof. For a window query Qwindow (q, h, l), the farthest possible result object o lies l (or h) from q in
the x- (or y-) dimension. The Minkowski window rect(o, h, l), centered at o, can overlap that of a
complementary object o' located up to 2l (or 2h) away from o. Hence, the longest possible distance from
o' to q is 3l (or 3h) in the respective dimension.
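To make the window-specific distmetric and the bound of Lemma 2 concrete, the following Python sketch computes the minimum distance between two axis-aligned boxes and checks whether a candidate lies within the bounded search space. Boxes are represented as (xmin, ymin, xmax, ymax) tuples, and all function names are assumptions made for illustration.

```python
def mindist_box_box(a, b):
    """Minimum Euclidean distance between two axis-aligned boxes given as
    (xmin, ymin, xmax, ymax); zero whenever the boxes overlap or touch."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx * dx + dy * dy) ** 0.5

def query_box(q, h, l):
    """Query window centered at q with extent l along x and h along y."""
    return (q[0] - l, q[1] - h, q[0] + l, q[1] + h)

def within_search_bound(q, h, l, o):
    """Lemma 2: a complementary object lies within 3l of q along x and 3h along y."""
    return abs(o[0] - q[0]) <= 3 * l and abs(o[1] - q[1]) <= 3 * h
```

Any entry whose bounding box overlaps the query window receives priority zero under this metric, so such entries are dequeued before every entry that lies strictly outside the window.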
To illustrate how a containment scope is formed based on our algorithm, we provide an example
based on a window query, Qwindow (q, x, y) on the R-tree shown in Figure 3.2. In this example, c and d are
the result objects. Meanwhile, the priority queue maintains a, b, N3 and g as depicted in Figure 4.6(a).
As both a and b have their Minkowski regions overlap with those of c and d, both of them are collected as
complementary objects. Next, N3 is dequeued. Its Minkowski region covers those of c and d as shown
in Figure 4.6(b). N3 is expanded, and its children, e, f and h are enqueued. Now, the queue contains e,
f , g and h. Later, e and f are collected since their Minkowski regions overlap those of the result objects.
Finally, g, which is far away from the result objects, and h, which is out of the maximum search space,
are ignored as shown in Figure 4.6(c). When the queue is completely scanned, the search completes.
The containment scope is formed by the result objects, c and d and complementary objects a, b, e and f ,
as shown in Figure 4.6(d).
4.3.3 Optimized Computation Strategy

Figure 4.6. Determining the containment scope for a window query result: (a) sample window query Q;
(b) examining a, b and N3; (c) examining e, f, g and h; (d) containment scope.
As pointed out earlier, complementary objects whose impact on the containment scope is actually
blocked by other complementary objects can be safely removed without affecting the shape of a
containment scope. We detail the technique to identify those removable complementary objects in the
following discussion. Generally speaking, our technique examines whether the overlaps between a
complementary object and result objects in terms of Minkowski regions are fully covered by those of
others. There are two possible cases, as shown in Figure 4.7. Figure 4.7(a) shows the first case, in which
the overlap between a complementary object o2 and the result object o (i.e., rect(o2, l, h) ∩ rect(o, l, h))
is fully covered by rect(o1, l, h) ∩ rect(o, l, h). Figure 4.7(b) illustrates the second case, in which the
overlap between a complementary object o3 and the result object o (i.e., rect(o3, l, h) ∩ rect(o, l, h)) is
partially covered by o1 or o2 but entirely covered by both. In this case, we trim the overlap portion of
the object against all other complementary objects. If the entire portion is trimmed, the object can be
safely removed. As in the case of range query containment scope construction, we can use the previous
observations to augment the region query containment scope algorithm in general, and the passOptimize
subroutine in particular, to reduce the size of the complementary set.
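The trimming step can be sketched with plain rectangle subtraction. In the Python fragment below, the overlap between a candidate's Minkowski window and that of a result object is cut against the windows of the other complementary objects, and the candidate is reported as removable when nothing remains. The box representation, the function names, and the restriction to a single result object are simplifying assumptions.

```python
def intersect(a, b):
    """Intersection of boxes (xmin, ymin, xmax, ymax), or None if it has no area."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None

def subtract(a, b):
    """Pieces of box a that are not covered by box b (at most four boxes)."""
    c = intersect(a, b)
    if c is None:
        return [a]
    pieces = []
    if a[0] < c[0]:
        pieces.append((a[0], a[1], c[0], a[3]))   # strip left of b
    if c[2] < a[2]:
        pieces.append((c[2], a[1], a[2], a[3]))   # strip right of b
    if a[1] < c[1]:
        pieces.append((c[0], a[1], c[2], c[1]))   # strip below b
    if c[3] < a[3]:
        pieces.append((c[0], c[3], c[2], a[3]))   # strip above b
    return pieces

def minkowski(o, h, l):
    """Minkowski window rect(o, h, l) as a box."""
    return (o[0] - l, o[1] - h, o[0] + l, o[1] + h)

def is_removable(o, candidate, others, h, l):
    """True if rect(candidate) ∩ rect(o) is fully trimmed by the other windows."""
    overlap = intersect(minkowski(candidate, h, l), minkowski(o, h, l))
    if overlap is None:
        return True                                # no overlap, nothing to contribute
    remaining = [overlap]
    for other in others:
        box = minkowski(other, h, l)
        remaining = [piece for part in remaining for piece in subtract(part, box)]
    return not remaining
```

With o = (0, 0) and h = l = 1, a candidate at (1.5, 0) is hidden by a single object at (1, 0) (first case), and it is also hidden jointly by objects at (1, 1) and (1, -1) although neither of them covers the overlap alone (second case).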
4.4 Containment Scope Client Processing for Region Queries
Figure 4.7. Removable complementary objects: (a) o2 hidden by o1; (b) o3 hidden by o1 and o2 jointly.
Now that the tools for computing a containment scope for region queries have been discussed, we
present the necessary concepts for client evaluation and utilization of containment scope data. Recall
that when a client does not have sufficient knowledge to process a spatial query, it submits the query
to the server for evaluation. Suppose that region query Q is issued to the server. Then the server returns
both the result set RQ (q) and the complementary set CSQ (q). It follows that all necessary information
for future containment scope evaluation is contained in the triple (Q, RQ (q), CSQ (q)). Furthermore, the
client can store more than one containment scope triple if sufficient storage is available. In such a
case, passing the query containment test for any of the stored triples is sufficient to answer the query
independently without server assistance.
Algorithm client_region_query_eval_cs(Z, Q', T)
Input.  A set Z of containment scope tuples (Q, R, C), where Q is a region query with
        corresponding result set R and complementary set C. We also receive a new
        query Q' and R-tree index T
Output. Result set R' for region query Q' and updated set Z
Begin
1.  Define set R'
2.  For each (Q, R, C) in Z such that Q' ⊑ Q
3.      R' ← {}
4.      For each c in C
5.          If c ∈ G'(q') Then Break
6.      For each r in R
7.          If r ∈ G'(q') Then R' ← R' ∪ {r}
8.      If |R'| > 0 Then Return (R', Z)
9.  (R, C) ← region_query_containment_scope(T, Q'.q)
10. Return (R, Z.append((Q', R, C)))
End.
Figure 4.8. Algorithm client_region_query_eval_cs
When the client needs to evaluate a new query Q', our spatial query containment framework may allow
the client to answer Q' locally using the algorithm described in Figure 4.8. For each valid
containment scope tuple, we first check to see if the new query is semantically contained by the query
Q that was used to construct the containment scope (lines 1-2). If so, we verify that no complementary
objects are a part of the new result set RQ' (lines 3-5). Next, the algorithm considers the result set RQ
and adds any member object that falls inside of the new query search space G'(q') to RQ' (lines 6-7).
We require that at least one result object match in order for the local query evaluation to succeed. If
all previously defined conditions have been met, we avoid communication with the server and directly
return the result set to the client (line 8). Otherwise, we are forced to transmit the query to the server and
receive an appropriate result set and complementary set to form the new containment scope (lines 9-10).
Although not mentioned specifically in the algorithm, it is possible that the client may lack sufficient
storage to keep a complete history of all retrieved containment scope data. In this case, an eviction
strategy such as Least Recently Used (LRU) can be used to reduce the set. The provided algorithm is
very flexible in that it is capable of processing all types of region queries with no modifications aside
from the explicit methods for calculating G(q).
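The client-side procedure can be sketched in Python for the special case of range queries. Several assumptions are made for illustration and are not taken from Figure 4.8: a range query is a (q, r) pair, semantic containment is approximated by requiring that the new radius does not exceed the cached radius, a tuple is skipped entirely as soon as one of its complementary objects falls inside the new search space, and the server round trip is modeled by a caller-supplied function `ask_server`.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
RangeQuery = Tuple[Point, float]                       # (query point q, radius r)
Triple = Tuple[RangeQuery, List[Point], List[Point]]   # (Q, R, C)

def in_search_space(o: Point, query: RangeQuery) -> bool:
    """o is in G(q) for a range query when dist(q, o) <= r."""
    q, r = query
    return math.dist(q, o) <= r

def client_region_query_eval(
    cache: List[Triple],
    new_query: RangeQuery,
    ask_server: Callable[[RangeQuery], Tuple[List[Point], List[Point]]],
):
    """Answer new_query from cached triples if possible, else ask the server."""
    for old_query, R, C in cache:
        if new_query[1] > old_query[1]:
            continue   # assumed containment test: the new radius must not be larger
        if any(in_search_space(c, new_query) for c in C):
            continue   # a complementary object would enter the result set
        local = [o for o in R if in_search_space(o, new_query)]
        if local:
            return local, cache                        # answered without the server
    R, C = ask_server(new_query)                       # fall back to the server
    cache.append((new_query, R, C))
    return R, cache
```

An eviction policy such as LRU would be applied to `cache` separately when client storage is limited.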
Chapter 5
Nearest Neighbor Query Computation Methods
5.1 Containment Scope Server Processing for NN Queries
With a firm processing methodology in place for handling region queries, we now turn our attention
to nearest neighbor (NN) queries, which include 1NN queries as well as more general kNN queries.
Unlike region queries, NN queries do not have fixed query regions. Rather, they retrieve result objects
according to their Euclidean distance from the query point relative to the locations of other objects in
the dataset. Since the kNN query is an extension of the 1NN query, we begin by describing how to
compute containment scope for a 1NN query and then generalize the approach for use on kNN queries.
In both approaches, we determine the result set, RQ (q), and the complementary set, CQ(q), in a way
that reduces computational overhead by exploiting several fundamental geometric properties that are
unique to relative queries.
Finally, the chapter closes with a discussion of how clients can use cached result set and complementary set information to construct an appropriate containment scope in hopes of eliminating redundant
query submissions to the server. Unlike processing for region queries and for reverse nearest neighbor
queries, the containment scope for NN queries cannot be determined at the server because such an area
is dependent on both the original query parameters and future candidate query parameters. In
particular, the value of k must be known for the query under evaluation. Thus, the containment scope
construction algorithm at the server will return sufficient information to allow the client to construct a
containment scope for any legal value of k.
5.2 Containment Scope for 1NN Query
To begin our analysis of a spatial query containment solution for nearest neighbor queries, we consider
how to efficiently compute the containment scope for a 1NN query at the server.

Recall that a 1NN query returns the object in a dataset that is closest to some specified location, which
may be a client's location or some other established point of interest in geospatial processing. Suppose
that a 1NN query Q is issued at point q and that its result set consists of the single object o. Since o
represents the only result object of the NN query Q, it follows that any other object o' ∈ S is a non-result
object. That is, any result set RQ (q) for a 1NN query will contain exactly one member.
Given any non-result object o', the entire search space S can be partitioned into two disjoint
half-planes HPo,o' and HPo',o. The boundary between them is the perpendicular bisector ⊥o,o' between o
and o'. Furthermore, HPo,o' will always cover o, and HPo',o will always cover o'. Figure 5.1(a) shows two
half-planes HPd,f and HPf,d formed based on the perpendicular bisector ⊥d,f. Any point located inside
the half-plane HPd,f is always closer to d than to f. It follows that the area inside of which o is
guaranteed to be the closest to q among all dataset objects can be represented by S ∩ ⋂_{o' ∈ O−{o}} HPo,o'.
This area is called a Voronoi cell, and was discussed in Chapter 2. The Voronoi cell is commonly used to
represent the valid scope of a 1NN query result.
Figure 5.1. Geometric representation of NN containment scope: (a) half-planes; (b) largest empty circles.
Because (1) the containment scope model requires that the result set for any semantically contained
query Q'(q') be a subset of the original result set RQ (q) when q' is in the containment scope of Q and
(2) because 1NN queries have a result cardinality of precisely one object, we can conclude that the
containment scope problem for 1NN queries is equivalent to identifying the region where RQ = RQ'.
This is precisely the property that defines the Voronoi cell of the sole object in RQ. Thus, we must compute
the Voronoi cell of the result object and represent it using data in the result set and complementary set
passed to the client.
Returning to our example from Figure 5.1(a), the complete Voronoi cell for result object d is shown
in Figure 5.1(b). Notice that the sides of the Voronoi cell correspond to the intersection of half-planes
formed by pairing result object d with nearby non-result objects. Extending this observation, we consider
the intersection of all half-planes that cover d and which are formed by pairing d and some non-result
object o' as the containment scope of query Q(q). At any point p inside of the containment scope and
for any non-result object o', we must have |d, p| ≤ |o', p| and thus RQ (p) = RQ (q). To see that the formed
Voronoi cell is the largest region over which the aforementioned equality holds, consider the illustration
in Figure 5.1(b). Here, we see that a circle centered at each vertex v of the Voronoi cell with radius
equal to |v, d| has been drawn. It is also clear that the circle cir(v, |v, d|) is as large as possible without
containing any non-result objects. Attempting to expand the containment scope further by moving v
would cause these circles to contain non-result objects, and those objects would be closer to v than d.
That is, |d, v| > |o', v| for some non-result object o' and RQ (v) ≠ RQ (q). Thus, the containment scope for Q
is maximal.
Generalizing the above results for any 1NN query Q(q) with result set RQ (q) = {o}, we obtain the
expression for the containment scope of Q(q) given in Equation (5.1).

\[
S_Q = \bigcap_{o' \in O - \{o\}} HP_{o,o'} \tag{5.1}
\]
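The half-plane formulation translates directly into a brute-force membership test: a location p belongs to the 1NN containment scope of o when it lies in the half-plane of o against every other object. The Python sketch below assumes objects are (x, y) tuples held in a plain list; the function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (sufficient for comparisons)."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def in_half_plane(p, o, other):
    """True if p lies in the half-plane of o against other, i.e. p is at
    least as close to o as to other."""
    return sq_dist(p, o) <= sq_dist(p, other)

def in_1nn_scope(p, o, objects):
    """True if p lies in the intersection of the half-planes of o against
    all other objects, i.e. in the Voronoi cell of o."""
    return all(in_half_plane(p, o, o2) for o2 in objects if o2 != o)
```

This test inspects every object in the dataset, which is exactly the cost that the filtering discussed next is meant to avoid.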
Unfortunately, collecting all half-planes to determine the containment scope is a computationally
expensive process. According to Lemma 3, we can conclude that a half-plane will change the size of the
containment scope only when the corresponding complementary object o' is at least as close to some
vertex v of the containment scope as the result object o is to that vertex. That is, o' must lie on or within
the circle cir(v, |v, o|).
Lemma 3. Given a result object o and a convex hull SQ defined by n vertices vi (i ∈ [1, n]), a half-plane HPo',o
overlaps SQ iff there is at least one vertex vi such that |o', vi| ≤ |o, vi|.
Proof. Suppose a half-plane HPo',o overlaps SQ but all the vertices vi are closer to o than to o'. That is,
∀vi ∈ SQ, |o', vi| > |o, vi|. It follows that all vertices and, by extension, the entire convex hull SQ are inside
of the half-plane HPo,o'. This contradicts our assumption that SQ ∩ HPo',o ≠ ∅.
Consequently, we can filter out those non-result objects that are farther away from all vertices of the
containment scope than the result object is from those same vertices. Recognizing this important fact, our
containment scope computation algorithm for 1NN queries only returns objects in the complementary set
whose half-planes contribute to the formation of the containment scope boundary. As complementary
objects are expected to be close to the result object, we extend the distance browsing technique to find
both the 1NN result object and all subsequent complementary objects using only a single scan of the
R-tree index.
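The filter implied by Lemma 3 reduces to a comparison of distances at the vertices of the current polygon. A minimal Python sketch, with a hypothetical function name and vertices given as (x, y) tuples:

```python
import math

def can_trim(o, candidate, vertices):
    """Lemma 3 filter: the half-plane of the candidate against o can overlap
    the polygon only if some vertex is at least as close to the candidate
    as it is to the result object o."""
    return any(math.dist(candidate, v) <= math.dist(o, v) for v in vertices)
```

For the square with corners (0, 0), (2, 0), (2, 2), (0, 2) and result object o = (1, 1), a candidate at (2.5, 1) passes the filter, whereas a candidate at (5, 1) is farther than o from every vertex and can be skipped.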

The spatial query containment scope computation algorithm for 1NN queries is provided in Figure 5.2.
It accepts a 1NN query at some location q and identifies the result object r that is closest to the query
point. We begin as in the case of region queries by initializing temporary variables and by enqueuing
the root node of the R-tree index T (of our spatial dataset S) into a priority queue P (lines 1-4). For
each object in the priority queue, the algorithm first checks to see if the result object has been located
(line 8). If not, the algorithm enqueues all children of any internal node and calculates a priority level
equal to each node's minimum Euclidean distance from query point q (lines 9-11). In the event that the
newly dequeued entry is a data object, the algorithm assigns the entry to the result set and continues
to process remaining items in the queue (lines 12-13). Note that the result object will always be visited
prior to any potential complementary objects, as (1) the closest data objects to query point q are visited
first and (2) the sole 1NN result object is by definition the object that is closest to point q.
Algorithm nn_query_containment_scope(T, q)
Input.  NN query centered at point q issued against an R-tree T
Begin
3.  Define R-tree nodes e, r
4.  Define convex polygon CReg and set Z
5.  R ← {}, Z ← {}, CReg ← T.root.get_mbr()
6.  While (P.not_empty())
7.      e ← P.dequeue()
8.      If R = ∅
9.
10.
11.             P.enqueue((m, mindist(q, m.loc)))
12.         Else
13.             R ← {e}, r ← e
14.     Else
15.         For each vertex v of CReg
16.             If mindist(v, e) < mindist(v, r)
17.
18.
19.
20.                 Else
21.                     CReg ← CReg ∩ HPr,e
22.                     Z ← Z ∪ {e}
23. C ← {c | c ∈ Z ∧ HPr,c contributes to CReg}
24. Return (R, C);
End.
Figure 5.2. Algorithm nn_query_containment_scope
If the result set is nonempty, then the result object has already been found, and we must form the
complementary set. We maintain a convex polygon CReg for this purpose. CReg represents the tentative
containment scope area and is initialized to the entire data set (line 5). When a new complementary
object is located, we check to see if it is closer than the result object to some vertex of CReg. If so, the
complementary object is capable of entering the result set at some query location within the tentative
containment scope, so we refine the containment scope to only include the subregion wherein the
complementary object cannot become the closest object to point q. Finally, we only add children of
internal nodes that could possibly have children that are closer to some vertex of CReg than the result
object is to that same vertex (lines 14-22). The algorithm finishes when no more entries exist in the
priority queue P for processing. At this point, we examine the convex polygon CReg and identify the
complementary objects c that form the half-planes HPrc used to construct the sides of CReg (line 23).
To facilitate this process, we record those objects that have been used to refine the convex hull CReg in
a temporary set Z. These objects represent a superset of the complementary objects that are capable
of influencing the contents of the original result set. After this filtering process is complete, we have
finished forming both the result set and the complementary set. These sets are then returned to the
client, and the algorithm terminates.
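The refinement of CReg can be sketched in Python by clipping a convex polygon against one half-plane at a time. The sketch makes several simplifying assumptions that are not part of Figure 5.2: the R-tree and priority queue are replaced by a list of points sorted by distance from q, the service area is given as a bounding box, and an object is said to contribute a side when at least two vertices of the final polygon are equidistant from it and from the result object. All names are illustrative.

```python
import math

def clip_to_half_plane(polygon, r, e):
    """Keep the part of a convex polygon that is at least as close to r as to e."""
    ax, ay = e[0] - r[0], e[1] - r[1]
    c = (e[0] ** 2 + e[1] ** 2 - r[0] ** 2 - r[1] ** 2) / 2.0

    def side(p):
        return ax * p[0] + ay * p[1] - c      # <= 0 on r's side of the bisector

    out = []
    for i, cur in enumerate(polygon):
        nxt = polygon[(i + 1) % len(polygon)]
        s_cur, s_nxt = side(cur), side(nxt)
        if s_cur <= 0:
            out.append(cur)
        if (s_cur < 0 < s_nxt) or (s_nxt < 0 < s_cur):
            t = s_cur / (s_cur - s_nxt)       # edge crosses the bisector here
            out.append((cur[0] + t * (nxt[0] - cur[0]),
                        cur[1] + t * (nxt[1] - cur[1])))
    return out

def nn_containment_scope(q, objects, bbox, eps=1e-9):
    """Return (R, C, CReg) for a 1NN query at q over a flat list of points."""
    ordered = sorted(objects, key=lambda o: math.dist(q, o))  # distance browsing order
    r = ordered[0]                                            # the 1NN result object
    xmin, ymin, xmax, ymax = bbox
    creg = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    Z = []
    for e in ordered[1:]:
        # Lemma 3 filter: e matters only if it is closer than r to some vertex
        if any(math.dist(v, e) < math.dist(v, r) for v in creg):
            creg = clip_to_half_plane(creg, r, e)
            Z.append(e)
    # keep only objects whose bisector still carries a side of the final polygon
    C = [c for c in Z
         if sum(abs(math.dist(v, c) - math.dist(v, r)) <= eps for v in creg) >= 2]
    return [r], C, creg
```

For a result object at the origin surrounded by four objects at distance 4 along the axes, the sketch returns the square with corners (±2, ±2), and a distant object at (8, 8) is filtered out without clipping.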
Figure 5.3. Determining the containment scope for a NN query result: (a) initial convex hull for d, with
the largest empty circle centered at v4 (radius |d, v4|); (b) trimmed convex hull by HPc,d; (c) final
containment scope for query.
To illustrate how the algorithm derives a containment scope, Figure 5.3 shows an example in which
a 1NN query is issued at point q. Observe that the query's result set RQ contains the single object d. We
now must compute the containment scope for query Q, which is equivalent to identifying the Voronoi
cell of d. Initially, the convex hull for d is set to the entire service area A of the dataset S and consists of
the four vertices v1, v2, v3 and v4. This result is illustrated in Figure 5.3(a). The initial state of the priority
queue is given as P = {d, c, a, b, N3, g}, and the algorithm examines objects and nodes in ascending order
of their mindist with respect to query point q.
First, d is dequeued and identified as the result object of the query. Next, c is dequeued and identified
as being covered by the largest empty circle centered at v4. Consequently, the convex hull is trimmed
by the half-plane HPc,d, and the containment scope is refined to a new set of vertices v1, v2, v3, v6 and v5.
Non-result object c is inserted into Z as a tentative complementary object (see Figure 5.3(b)). Afterwards,
other objects are examined in the same fashion. Objects a and b both contribute sides to the Voronoi
cell and are added to set Z. In addition, the algorithm must examine the children of internal node N3
since mindist(N3, v3) ≤ |d, v3|. The final convex hull of d is shown in Figure 5.3(c). Objects a, b, c, f and g
are complementary objects for the query Q, as they all contribute a side to the final Voronoi cell. Objects
e and f do not influence this region and consequently are not returned to the client in the result set or
complementary set.
5.3 Containment Scope for kNN Query
With the description of 1NN query containment scope processing now complete, we turn our attention
to computing the containment scope for a kNN query at the server. Given that kNN queries are simply
a generalization of 1NN queries, we expect to use many of the same geometric properties that were
employed for 1NN queries.
Figure 5.4. Determining the containment scope for a 2NN query result: (a) containment scope for
Q'.k = 1; (b) containment scope for Q'.k = 1 and Q'.k = 2.
The kNN query is an extension of the 1NN query in that it finds the k objects that are closest to
a query point q. Thus, the 1NN query is a special case of the kNN query with k = 1. There are two
types of kNN queries that differ depending on whether the application requires that the result objects be
ordered. Order-insensitive kNN query results change when any complementary object becomes closer
than any of the previously identified result objects. Order-sensitive kNN query results change when (1) any
complementary object becomes closer than any result object or (2) the relative ordering of result
objects changes with respect to their Euclidean distance from query point q. We assume that clients can
detect changes in result object ordering independently. Therefore, the computation and implementation
of containment scope for order-insensitive and order-sensitive kNN queries is identical except perhaps
for the addition of some trivial client code that records the order in which result objects were located.
For simplicity, the spatial query containment model discussed in this paper focuses on order-insensitive
kNN queries.
In order to broaden the conceptual evaluation of a 1NN query, we consider a generic kNN query
issued at some location q and its result set RQ. We obtain half-planes formed by each result object o ∈ RQ
and each non-result object o' ∈ O − RQ. The region VQ wherein RQ remains valid is formulated as:
\[
V_Q = S \cap \bigcap_{o \in R_Q} \; \bigcap_{o' \in O - R_Q} HP_{o,o'}
\]
or equivalently
\[
V_Q = S - \bigcup_{o \in R_Q} \; \bigcup_{o' \in O - R_Q} HP_{o',o}.
\]
This region is commonly used by valid scope algorithms to represent the space where result set
equality holds between two queries Q and Q' that differ only in their query location. Figure 5.4 shows
a 2NN query Q issued at location q. The result set RQ contains the objects {c, d}, and the valid scope VQ
for the result set is shaded dark gray. Notice that this region is precisely the intersection between the
Voronoi cells of c and d. Here, the Voronoi cell for each result object is formed by ignoring all
other objects in the result set RQ. That is, the Voronoi cell for object c ignores object d, and the Voronoi
cell for object d likewise ignores object c. Rather than constructing the valid scope for a particular kNN
query Q, we instead consider the containment scope for Q.
Recall from Chapter 3 that such a containment scope has the property that any query Q'(q') such that
Q' ⊑ Q must have the property RQ' ⊆ RQ if q' is in the containment scope Snn (q, k). From our previous
discussion of valid scope, we can formulate the containment scope for kNN queries as in Equation (5.2).
\[
\begin{aligned}
S_{nn}(q,k) &= \bigcup_{R' \subseteq R_{nn}(q,k)} V_{R'} \\
&= \bigcup_{R' \subseteq R_{nn}(q,k)} \Bigl( S - \bigcup_{o \in R'} \; \bigcup_{o' \in O - R'} HP_{o',o} \Bigr) \\
&= \bigcup_{o \in R_{nn}(q,k)} \Bigl( S - \bigcup_{o' \in O - \{o\}} HP_{o',o} \Bigr)
\end{aligned}
\tag{5.2}
\]
As Equation (5.2) indicates, the containment scope is reduced by subtracting the union of all half-planes
formed between individual result objects and other non-result objects.
Notice that the region in which the containment scope definition holds is dependent upon the value
of Q'.k. In the case where Q'.k = 1, the containment scope consists of the union of every result object's
Voronoi cell, as ensuring that at least one of the objects in RQ is closest to q' is sufficient to answer
Q'(q') locally. This case represents the maximal area size for the containment scope over all semantically
contained queries Q'. In contrast, choosing Q'.k = Q.k yields a containment scope of equivalent size to
the valid scope case. That is, SQ = VQ under this scenario because all k objects must remain in the result
set to avoid the introduction of non-result objects into the result set. Finally, the third possible case is that
1 < Q'.k < Q.k. Here, we require that Q'.k of the Q.k objects in RQ have the Q'.k smallest Euclidean
distances to query point q' when compared with all other dataset objects. This yields a containment
scope size between the two bounds previously discussed.
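The three cases can be captured by one brute-force membership test: a location p lies in the containment scope for a given Q'.k when the Q'.k objects nearest to p all belong to the cached result set. The Python sketch below scans the whole dataset and ignores distance ties; it is meant only to illustrate the definition, and the function names are hypothetical.

```python
import math

def knn(p, objects, k):
    """The k objects nearest to p (brute force, ties broken by list order)."""
    return sorted(objects, key=lambda o: math.dist(p, o))[:k]

def in_knn_containment_scope(p, result_set, objects, k_new):
    """True if the k_new nearest objects to p all belong to the cached result set."""
    return all(o in result_set for o in knn(p, objects, k_new))
```

With a cached 2NN result {(0, 0), (2, 0)} over the dataset {(0, 0), (2, 0), (10, 0), (-10, 0)}, the location (-4.5, 0) is inside the scope for Q'.k = 1 but outside it for Q'.k = 2, since (-10, 0) becomes its second nearest neighbor.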
A key benefit of adopting containment scope over valid scope for kNN query processing is the
relative size of the two auxiliary scopes. Containment scope will never be smaller than valid scope and
often results in a substantially larger area over which a query result can be reused. Consider once again
the sample query in Figure 5.4 and assume that a future query Q' is issued with Q'.k = 1. Then the
containment scope for this 2NN query result is composed of the union of the Voronoi cell for object c and
the Voronoi cell for object d. Valid scope does not apply in this situation because Q.k ≠ Q'.k → RQ ≠ RQ'.
The dark shading in Figure 5.4(b) illustrates the containment scope when Q'.k = 2, and it is only
this substantially smaller region that valid scope also supports. Thus, containment scope is capable
of eliminating the same queries that can be eliminated through the application of valid scope, but
containment scope also can be applied over substantially more query types and over a substantially
greater area.
Our discussion of containment scope has led to the realization that the precise containment scope
cannot be computed by the server, as there are Q.k different possible scopes depending on the future
query Q', which is not assumed to be known in advance. Therefore, the containment scope algorithm
returns sufficient information in the complementary set to allow the client to compute any legal containment scope on demand. Returning the complementary objects responsible for every side of each
result object's Voronoi cell provides this support since the client containment test simply must check
for membership of a query location in some subset of the different Voronoi cells.
Although there are a substantial number of half-planes that are considered in Equation (5.2), many
of them do not contribute to the formation of the containment scope and can be eliminated. In order
to identify those half-planes with no impact on the final containment scope, we once again exploit the
notion of the largest empty circle of a Voronoi cell for a result object o. That is, given a Voronoi cell,
every circle cir(v, |v, o|) centered at a cell vertex v must be empty. A vertex v cannot be finalized if
its circle is non-empty. Based on this property, we can develop an efficient containment scope
calculation algorithm for kNN queries.

The kNN containment scope computation algorithm illustrated in Figure 5.5 makes use of the 1NN
containment scope computation algorithm as a subroutine during its execution. We begin our routine
as usual by allocating temporary variables and by enqueuing the root node of the R-tree index T (on
the spatial dataset S) into a priority queue P (lines 1-5). Using the distance browsing technique, the
algorithm first locates the k closest data objects to query point q (lines 6-13). Recall that these will be
the first k data objects removed from the queue since (1) distance browsing assigns object priorities
based on their relative closeness to query point q and (2) the definition of the kNN query demands that
the result set consist of the k closest objects in S. Once the result set has been finalized, we leverage a
modified form of the 1NN containment scope computation algorithm to form the complementary set
for the given kNN query. The modified NN containment scope algorithm accepts a third parameter
(assigned R − {o} in our implementation) that includes a set of elements that are to be ignored during
query and containment scope processing. That is, anytime an element from this set is dequeued in the
1NN algorithm, it is immediately discarded. This ensures that the computed Voronoi cells conform
to Equation 5.2.
In order to properly analyze kNN query correctness and to fully support query containment
processing, our kNN containment scope computation algorithm must return enough complementary objects to determine if the k' ≤ k objects closest to q' remain within RQ. Thus, Algorithm
knn_query_containment_scope computes k NN containment scopes. The complementary sets for each
containment scope are combined via a union operation, and duplicate references to the same data object
are removed (lines 14-16). The resulting union of complementary sets forms the complementary set
for the submitted kNN query, and we return this information and the result set to the client. In actual
implementations of the kNN query containment scope algorithm, it is much more efficient to compute
all k Voronoi cells simultaneously. We do this by first computing the result set of the kNN query. Next,
we continue to process items in priority queue P according to mindist ordering. Each data object is
checked against k different tentative containment scope regions, and any necessary reductions are made
at that time. Thus, the integrated approach can proceed in almost the same manner as in 1NN query
processing and requires only a single index pass.

Algorithm knn_query_containment_scope (T, q)
Input.  kNN query centered at point q with object count k
        issued against an R-tree T
Output. Result set R and complementary set C
Begin
1-2.  Define priority queue P; P.enqueue(T.root)
3.  Define R-tree nodes e
4.  Define sets R', C'
5.  R ← {}, C ← {}
6.  While (P.not_empty() ∧ |R| < k)
7.      e ← P.dequeue()
8-11.   If e is an internal node, enqueue each child of e into P
12.     Else
13.         R ← R ∪ {e}
    // Implementation constructs all scopes in parallel to decrease overhead
14. For each object o ∈ R
15.     (R', C') ← mod_nn_query_containment_scope (T, q, R − {o})
16.     C ← C ∪ C'
17. Return (R, C);
End.
Figure 5.5. Algorithm knn_query_containment_scope
To illustrate how the algorithm derives a containment scope, Figure 5.3 shows an example in which a
2NN query is issued at point q and its result set contains objects c and d. The computation of the Voronoi
cell for object d has already been discussed during our 1NN query containment scope construction, and
the formation of the Voronoi cell for object c is similar. Recall that other result objects are not included
in the evaluation of a result object's Voronoi cell. In this case, object c will be ignored when processing
object d and vice-versa. The final complementary objects are identified as a, b, e, f and g. Note that c
and d are not taken as they are result objects and were ignored. In actuality, the Voronoi cells for c and
d are derived simultaneously after finalizing the result set RQ .
5.4 Containment Scope Client Processing for NN Queries

Algorithm client_knn_query_eval_cs (Z, Q', T)
Input.  A set Z of containment scope tuples (Q, R, C),
        where Q is a kNN query with corresponding result
        set R and complementary set C. We also receive
        a new query Q' and R-tree index T
Output. Result set R' for kNN query Q' and updated set Z
Begin
1.  Define set R', data object o
2.  Define priority queue P with tuples (object, dist)
3.  For each (Q, R, C) in Z such that Q' ⊑ Q
4.      R' ← {}, P.clear()
5.      For each object r ∈ R
6.          P.enqueue((r, mindist(q, r.loc)))
7.      For each object c ∈ C
8.          P.enqueue((c, mindist(q, c.loc)))
9.      o ← P.dequeue()
10.     While (|R'| < k ∧ o ∉ C)
11.         R' ← R' ∪ {o}
12.         o ← P.dequeue()
13.     If |R'| = k Then Return (R', Z)
14. (R, C) ← knn_query_containment_scope(T, Q'.q)
End.
Figure 5.6. Algorithm client_knn_query_eval_cs
With a collection Z of stored (kNN query, result set, complementary set) triples cached locally by the
client, determination of whether a new kNN query Q' can be answered locally becomes surprisingly
straightforward. The logic of the containment test is given in Figure 5.6. For each cached triple of query
information, we first check to see if Q' is semantically contained by the associated original query Q.
That is, we check if Q' ⊑ Q by determining if Q'.k ≤ Q.k (lines 1-3). Next, we add every result object
and complementary object associated with a satisfactory query Q to a priority queue P that prioritizes
its contents by the Euclidean distance between the stored object and the new query point q' of the query
Q'(q') (lines 5-8). So long as the first Q'.k objects dequeued are all members of RQ (i.e., they are not
complementary objects), we process the query locally and return the top Q'.k objects from the priority
queue (lines 9-13). Only one cached containment scope needs to match the previously stated demands.
If no matches are found, we are forced to transmit the query to the server and receive an appropriate
result set and complementary set to form the new containment scope (lines 14-15). This content is then
added to the client's cache to aid in eliminating future redundant queries. Although not mentioned
specifically in the algorithm, it is possible that the client may lack sufficient storage to keep a complete
history of all retrieved containment scope data. In this case, an eviction strategy such as Least Recently
Used (LRU) can be employed to reduce the set.
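The containment test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: objects are assumed to be 2D points stored as tuples, a sorted list stands in for the priority queue P, and the function name client_knn_eval_cs and its parameters are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # assumed representation of an object's location


def client_knn_eval_cs(results: List[Point], complementary: List[Point],
                       q_new: Point, k_new: int) -> Optional[List[Point]]:
    """Return the k_new nearest cached result objects if the cached
    containment scope covers q_new, otherwise None (ask the server)."""
    # Semantic containment check: Q'.k must not exceed Q.k.
    if not 1 <= k_new <= len(results):
        return None
    # Tag every cached object with its distance to the new query point
    # and with whether it belongs to the result set.
    tagged = [(math.dist(q_new, p), True, p) for p in results]
    tagged += [(math.dist(q_new, p), False, p) for p in complementary]
    tagged.sort(key=lambda t: t[0])  # plays the role of priority queue P
    nearest = tagged[:k_new]
    # The cached data answers Q' only if no complementary object is
    # among the k_new closest cached objects.
    if all(is_result for _, is_result, _ in nearest):
        return [p for _, _, p in nearest]
    return None
```

A caller would try each cached triple in turn and contact the server only when every triple yields None.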
Finally, notice that the containment test algorithm does not actually compute the entire containment
scope region, which could potentially be computationally expensive. Instead, it simply verifies whether
a particular query issued at a particular point with a particular supplemental parameter (k) is located
within the containment scope. This computation is substantially easier to perform than the complete
formation of a kNN query containment scope. Adopting such a technique ensures (1) that computational
overhead is minimized and (2) that client users receive a response quickly.
Chapter 6
Reverse Nearest Neighbor Query Computation Methods
6.1 Preliminary Notes on Reverse Nearest Neighbor Query Processing
The goal of this section of the work is to provide for the efficient construction and evaluation of a
containment scope processing framework for the reverse nearest neighbor (RNN) and reverse k nearest
neighbor (RkNN) query types. Unlike the other query processing algorithms considered so far, there
are no known auxiliary scope mechanisms for the RkNN query type or its variants. That
is, no semantic scope or valid scope approach has been proposed that supports RkNN queries. As
the design of a conceptual framework for other auxiliary scope methods serves to clarify the unique
attributes of the containment scope approach and as the construction of such methods is closely related
to that of containment scope, this chapter defines a general auxiliary scope processing framework for
RkNN queries and then offers specific implementations for both valid scope and containment scope.
One general observation about the existing RkNN query processing work discussed in Chapter 2
is that most existing RkNN processing frameworks utilize the notion of NN (or kNN) circles that are
centered around each data object and which extend to the nearest (or kth nearest) data object. An alternative but
logically equivalent construction is to view the dataset as forming a series of perpendicular bisectors
between every pair of data points. The ultimate goal then is to determine the region in which a query
point lies inside the NN (or kNN) circle of an object, which places that object in the result set. This information is critical to
determining if objects enter or exit the RkNN result set when the client moves and will motivate our
construction of an effective auxiliary scope processing algorithm for issued RkNN queries.
On the surface, it does not appear to be exceptionally difficult to extend auxiliary scope techniques
for basic spatial queries to address this new type of problem. However, the naive extension of auxiliary scope to RkNN queries places artificial constraints on scope applicability and supported queries.
Providing for an optimal, dynamic, and rich RkNN query auxiliary scope implementation is not trivial.
In particular, the value of k in an RkNN query should ideally be dynamic to increase system flexibility
and to provide for maximum query reuse. To be effective, auxiliary scope must be able to recognize
redundancy in semantically similar queries that have either varying parameter values (k) or varying
query locations (q) or both and to respond accordingly. Changing k affects the size of all constructed
kNN circles, which affects RkNN result set inclusion and the associated auxiliary scope. Consider our
two example RkNN queries shown in Figure 6.1. Both queries are issued at the same query point q.
However, the first query in Figure 6.1(a) has k = 1, whereas the second query has k = 2. Changing
k to 2 changes the radius and size of every data point's kNN circle, which is used to determine result
set membership. That is, modifying the query parameter possibly affects the region wherein the query
point q is within the k closest objects to some particular data point. For example, objects c and d are not
in the RkNN result set when k = 1 but enter the result set when k = 2 because q is not contained
within the NN circles of c and d but is contained within the 2NN circles of c and d. In contrast, objects
a and b are in both result sets since the NN circles and 2NN circles of these data objects all contain q.
Finally, the NN circles and 2NN circles of all other data objects do not contain q, so none of these objects
are in the result set.
Figure 6.1. Effect of k on RkNN result: (a) k = 1, (b) k = 2
To systematically address the considerable challenges posed by this new query type, we temporarily
impose additional constraints to simplify the problem. We later remove these assumptions to construct
a final optimized auxiliary scope computation algorithm. In particular, we impose a subset of the
following four assumptions on each algorithm:
1. k = 1
2. k is fixed for dataset generation
3. k is fixed for client query result reuse
4. The dataset is monochromatic
In all cases, we assume that the dataset S is static. Most auxiliary scope analysis for traditional
spatial queries also makes this fundamental assumption, and removing it is left for future
work. We now consider algorithms that address the RkNN auxiliary scope problem with varying levels
of flexibility. For each approach, we state which assumptions are in place. Ultimately, we will derive a
robust algorithm that is capable of relaxing all four of the above restrictions.
We begin by considering several naive approaches for constructing an auxiliary scope for an RkNN
query. Based on careful analysis of the strengths and limitations of these methods, we build a new
dynamic auxiliary scope computation routine for RkNN queries that is capable of forming the auxiliary
scope and answering the query simultaneously to reduce index accesses. Finally, we examine an
optimal processing algorithm that reduces computational overhead and extends the dynamic approach
to support bichromatic datasets. Both valid scope and containment scope algorithm variants are
considered for each algorithm type. A summary of the algorithms as well as the assumptions that they
require is given in Table 6.1.
Algorithm                                  Assumptions
Basic Auxiliary Scope Processing (VS)      1, 2, 3, 4
Basic Auxiliary Scope Processing (CS)      1, 2, 3, 4
Dynamic Auxiliary Scope Processing (VS)    3, 4
Dynamic Auxiliary Scope Processing (CS)    4
Optimal Auxiliary Scope Processing (VS)    3
Optimal Auxiliary Scope Processing (CS)    None
Table 6.1. Algorithm assumptions
6.2 Basic RkNN Auxiliary Scope Construction
In this section, we consider several naive auxiliary scope computation techniques. While these algorithms are not optimized for real world applications, they are useful from an academic point of view to
introduce several of the basic concepts that will be expanded upon later in the paper. Throughout,
we assume that all four processing assumptions hold. That is, we assume that k = 1 for both data index
generation and for RkNN query execution. Furthermore, the value of k remains unchanged by both the
client and the server, and only monochromatic data is used. Our basic RkNN auxiliary scope computation
techniques rely on the previously noted relationship between RNN result set membership and data object
NN circles to construct auxiliary scopes. We now formalize this notion in Definition 11 and Lemma 4.
Definition 11. NN circle. The NN circle of data point o ∈ S, denoted as NNCir(o), is equal to the circle centered
at o with radius equal to $r = \min_{o' \in S} dist(o, o')$. That is, NNCir(o) = cir(o, r).
Lemma 4. Data object o is a member of the result set for the RNN query issued at point q and denoted as RNN(q)
if and only if q ∈ NNCir(o).
Proof. Assume that q ∈ NNCir(o). Since the radius of NNCir(o) denotes the distance between o and
its closest object in the dataset, any point inside of NNCir(o) must be closer to o than any other object in the
dataset. If this criterion is met for q, then o is in the result set for RNN(q) by definition. Next, assume that
o is in the result set for RNN(q). Then o must be closer to q than to any other data object. It follows that
q ∈ cir(o, dist(o, o')) for all o' ∈ S. In particular, q ∈ NNCir(o).
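Lemma 4 can be illustrated with a brute-force Python sketch that computes an RNN result set twice: once directly from the definition and once through NN circles. The sketch assumes a small in-memory dataset of distinct 2D points, takes the minimum in Definition 11 over objects other than o itself, and treats circles as closed (ties count as inside). The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nn_radius(o: Point, dataset: List[Point]) -> float:
    """Radius of NNCir(o): distance from o to its closest other object."""
    return min(math.dist(o, other) for other in dataset if other != o)


def rnn_by_definition(q: Point, dataset: List[Point]) -> List[Point]:
    """Objects that are at least as close to q as to every other object."""
    return [o for o in dataset
            if all(math.dist(o, q) <= math.dist(o, p)
                   for p in dataset if p != o)]


def rnn_by_nn_circle(q: Point, dataset: List[Point]) -> List[Point]:
    """Objects whose NN circle contains q (the test stated in Lemma 4)."""
    return [o for o in dataset if math.dist(o, q) <= nn_radius(o, dataset)]
```

Both functions return the same objects for any query point, which is the equivalence that the basic auxiliary scope algorithms rely on.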
6.2.1 Korn Unary Basic Auxiliary Scope Processing

In our first attempt to construct an RkNN auxiliary scope algorithm, we employ a design based on the
Korn RNN indexing structure discussed in Chapter 2. Recall that this indexing method includes an
R-tree that stores the NN circles for each object in the dataset S. Our technique first identifies all objects
in the result set. Using this information, we identify a tentative scope wherein the client can move
without violating the terms of the auxiliary scope being used. In the case of valid scope, every object
must remain within the result set. On the other hand, containment scope only requires that at least
one result object retain its membership information. (Distinctions between different auxiliary scope
methods are discussed in greater detail later in this section.) Any object that possibly could enter the
result set if the client were to move within the tentative scope must be included as a complementary
object. Furthermore, the distance between every result or complementary object and its nearest neighbor
in the dataset must be returned to ensure that the client can determine when the computed result set
for some previous query can be used to answer a future query locally. Thus, the output of the server
query evaluation algorithm consists of a result set R and a complementary set C. Each object o in either
set consists of a (Location, NNDist) pair that can be used by the client to reconstruct NNCir(o).
The code for the Korn unary basic auxiliary scope processing algorithm is given in Figure 6.2. We
begin by initializing the result set and complementary sets as empty. Furthermore, we construct a queue
(prioritized by the distance from query point q) and enqueue the root node of the Korn NN circle R-tree
(lines 1-4). We then traverse the index to identify all result objects for the RNN query. We only examine
those nodes whose region contains the query point q since those are the only nodes that can contain
result objects. At the conclusion of this phase of the algorithm, the priority queue is empty (lines 5-13).
We then re-enqueue the root node of the index to construct the complementary set for the RNN query's auxiliary
scope. Unlike the original index scan that excludes nodes based on the coverage of q, this pass seeks to
examine nodes that could be needed to identify when the client can answer a new RNN query locally.
The specific criterion depends on the type of auxiliary scope employed, so the logic is extracted into
a separate subroutine. Two implementations based on valid scope and containment scope are provided
and will be discussed next. Any objects that meet the requirement for auxiliary scope inclusion are
processed so long as they are not already returned to the client as elements in the result set (lines 14-23).
The algorithm terminates when no nodes remain that could be needed for auxiliary scope processing,
and the result set and complementary set are returned to the user.
Algorithm find_korn_unary_as (I, q)
Input.  A preconstructed R-tree index I of NN circles
        used by Korn and query point q
Output. Result set R and complementary set C
        for auxiliary scope
Begin
1.  Initialize set R ← {}, C ← {}
2.  Initialize priority queue P as empty
3.  P.set_sort_method(mindist(q, P.object))
4.  P.enqueue(I.root)
5.  While (P.is_not_empty())
6.      N ← P.dequeue()
7.      If q ∉ N
8.          break
9.      If N is internal node
10.         For each object m in N
11.             P.enqueue(m)
12.     Else
13.         R ← R ∪ {N}
    // P is empty at this point
14. P.enqueue(I.root)
15. While (P.is_not_empty())
16.     N ← P.dequeue()
17.     If N ∈ R OR not_needed_for_as(R, N)
18.         break
19.     If N is internal node
20.         For each object m in N
21.             P.enqueue(m)
22.     Else
23.         C ← C ∪ {N}
24. Return (R, C)
End.
Figure 6.2. Algorithm find_korn_unary_as

With the general overview of basic Korn unary auxiliary scope computation described, we turn our
attention to determining membership in the auxiliary scope complementary set. Different logic is used
depending on the type of auxiliary scope employed. This paper focuses on valid scope and containment
scope implementations. Notice that the result set for an RNN query serves to restrict the region in which
a query point can move without affecting the query answer. Recall that data object o ∈ R if and only if
q ∈ NNCir(o). In the case of valid scope, we require that the result set remain entirely unchanged. We
can use this fact to obtain a bound on the valid scope of a query Q issued at point q. The result is given
in Lemma 5 and states that the area of the valid scope for Q must lie inside the NN circle of every result
object.
Lemma 5. The valid scope corresponding to an RNN query Q issued at point q is bounded by the region TVS,
where $T_{VS} = \bigcap_{o \in R_Q} NNCir(o)$.
Proof. A valid scope for Q indicates the region wherein a new RNN query Q' can be issued and return
precisely the same result set RQ' = RQ. In particular, RQ' ⊆ RQ and RQ' must contain every element of
RQ. For any object o ∈ RQ, q' ∈ NNCir(o) in order for o ∈ RQ'. It follows that q' must be in the NN circle
of every object in RQ, and we obtain $T_{VS} = \bigcap_{o \in R_Q} NNCir(o)$.

Subroutine not_needed_for_vs (R, N)
Implements not_needed_for_as
Input.  An RNN query result set R and potential
        complementary object N
Output. true if object can be ignored during valid scope
        calculation and false otherwise
Begin
1.  For each r in R
    // Each complementary object circle must intersect
    // ALL result set object circles
2.      If N ∩ r = ∅
3.          Return true
4.  Return false
End.
Figure 6.3. Subroutine not_needed_for_vs
Knowing this fact, we can conclude that only those non-result objects that could possibly become
result objects of a query issued inside of the space defined by TVS need to be communicated to the
client. A complementary object can become a result object at some point in TVS if and only if its NN circle
overlaps with TVS. Thus, the criterion for inclusion of an object o in the complementary set C is defined as
C = {o ∈ S − R | NNCir(o) ∩ TVS ≠ ∅}. Figure 6.3 presents the code for refining the complementary set for
valid scope processing. As previously described, the potential complementary object is compared with
each result object to ensure that the two NN circles overlap.
Subroutine not_needed_for_cs (R, N)
Implements not_needed_for_as
Input.  An RNN query result set R and potential
        complementary object N
Output. true if object can be ignored during containment scope
        calculation and false otherwise
Begin
1.  For each r in R
    // Each complementary object circle must intersect
    // ANY result set object circle
2.      If N ∩ r ≠ ∅
3.          Return false
4.  Return true;
End.
Figure 6.4. Subroutine not_needed_for_cs
In contrast with the valid scope technique, containment scope simply seeks to ensure (1) that at least
one result object remains within the result set of a future query and (2) that no new objects enter the
result set. In other words, RQ' ⊆ RQ for any RNN query Q' issued inside of the containment scope for
RNN query Q. It follows that the area of applicability for a containment scope is typically larger than
that of the valid scope for the same query. Therefore, a larger complementary set may be needed to
provide information about new objects that may affect the result set of a future query evaluation. A
benefit of calculating and communicating this additional information is that the client will be able to
use containment scope results under a wide range of circumstances. In the same manner as was done
for valid scope, we now use the definition of containment scope to obtain a bound on its size for a query
Q issued at point q. The result is given in Lemma 6 and states that the area of the containment scope for Q must
lie inside the NN circle of at least one result object.
Lemma 6. The containment scope corresponding to an RNN query Q issued at point q is bounded by the region
TCS, where $T_{CS} = \bigcup_{o \in R_Q} NNCir(o)$.
Proof. A containment scope for Q indicates the region wherein a new RNN query Q' can be issued and
return a subset of the original result set (RQ' ⊆ RQ). In addition, we place the constraint that RQ' ∩ RQ ≠ ∅.
Then RQ' must include at least one object o ∈ RQ. Based on the result of Lemma 4, o ∈ RQ' only if
q' ∈ NNCir(o). It follows that $T_{CS} = \bigcup_{o \in R_Q} NNCir(o)$.
We can conclude that only those non-result objects inside of the space defined by TCS need to be
considered by the client. Once again, the criterion for inclusion of an object o in the complementary set
C is defined as C = {o ∈ S − R | NNCir(o) ∩ TCS ≠ ∅}. Figure 6.4 presents the code for refining the
complementary set for containment scope processing. In this algorithm, we examine each complementary
object and result object pairing to ensure that the NN circle of each complementary object overlaps with
the NN circle of at least one result object.
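The two membership tests of Figure 6.3 and Figure 6.4 reduce to circle-circle intersection checks, which the following Python sketch illustrates. It assumes that each NN circle is stored as a (center, radius) pair and that circles are closed. The helper circles_intersect is a hypothetical name. Note that the pairwise valid scope test is conservative: a circle can overlap every result circle without overlapping their common intersection, so the test may keep more objects than strictly necessary but never discards a needed one.

```python
import math
from typing import List, Tuple

Circle = Tuple[Tuple[float, float], float]  # (center, NN distance)


def circles_intersect(a: Circle, b: Circle) -> bool:
    """Two closed circles overlap when their centers are no farther
    apart than the sum of their radii."""
    (center_a, radius_a), (center_b, radius_b) = a, b
    return math.dist(center_a, center_b) <= radius_a + radius_b


def not_needed_for_vs(results: List[Circle], candidate: Circle) -> bool:
    """Valid scope: ignore a candidate that misses ANY result circle."""
    return any(not circles_intersect(candidate, r) for r in results)


def not_needed_for_cs(results: List[Circle], candidate: Circle) -> bool:
    """Containment scope: ignore a candidate only if it misses ALL
    result circles."""
    return not any(circles_intersect(candidate, r) for r in results)
```

A candidate that overlaps only one of several result circles is therefore dropped for valid scope but kept for containment scope, which matches the larger complementary set that containment scope requires.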
To clarify the Korn Unary Basic Auxiliary Scope algorithm as well as the fundamental geometric
properties on which it is based, we offer the sample dataset shown in Figure 6.5. This same example will
be revisited during our design of the Dynamic RkNN Auxiliary Scope algorithm. Figure 6.5(a) shows
an R-tree index constructed for the sample dataset as well as an RNN query Q issued at point q. Recall
that an object o in the dataset is in the result set RQ if and only if it is closer to q than to any other point
o' in the dataset. Equivalently, if we draw a circle centered at o and extending to query point q, then o is
a result object if and only if the previously described circle is empty. (Otherwise, any of the contained
objects is closer to o than q is to o.) Figure 6.5(b) shows these query circles and allows us to visually
identify objects a and b as result objects, as they are the only two objects with empty query circles. In
Figure 6.5(c), we draw the NN circles of each data object o. Recall that the NN circle for o is centered
at o and extends to the closest other object in the dataset. The two shaded circles denote the region
wherein a query Q' can be issued with either result object a or result object b remaining in the new
result set. Finally, Figure 6.5(d) builds on this information to construct the two auxiliary scopes defined
above. In the case of valid scope, we note that both a and b must remain in the result set. Therefore,
the intersection of their two NN circles provides an upper bound on the valid scope region. In fact, we
notice that the NN circle of object d intersects with this region, so it is included in the complementary
set. Future queries issued inside of the dark shaded region are within the valid scope of Q and will have
identical result sets. In contrast, containment scope only requires that either object a or object b remain
in the result set of a new RNN query issued inside of the scope. This area is represented by the union
of the NN circles of object a and object b. Thus, object d, object c, and object e all can enter the result set
in the tentative containment scope range. We therefore include them in the complementary set, and any
(light or dark) shaded area is part of the containment scope.

Figure 6.5. Sample query auxiliary scope computation: (a) Sample dataset, (b) Query circles, (c) NN circles, (d) Auxiliary scopes
Finally, we consider the efficiency and applicability of the Korn Unary Basic Auxiliary Scope algorithm. Recall that we placed substantial limitations on the type of RkNN query that is accepted by this
algorithm. In particular, we restricted the value of k to k = 1. This significantly limits the types of
queries that users can issue and limits the number of scenarios under which our auxiliary scope system
can be deployed. In addition, the algorithm is built on the Korn RNN indexing structure, which is
precomputed and which does not offer performance that is competitive with respect to new techniques
offered by Stanoi and Tao (see Chapter 2). Perhaps even worse, our technique requires two passes
through the data index. The first determines the result set of the query, and the second constructs
and refines a complementary set to be returned to the client. This sequential approach
increases disk accesses and processing time. Ideally, we would like to be able to construct both the result
set and the complementary set at the same time, as they need to analyze similar data. On the other hand,
the Korn Unary Basic Auxiliary Scope algorithm does reduce the number of client requests sent to the
server whenever the query conforms to our restrictions. The magnitude of the reduction is largest when
using containment scope, as its area of application is larger than that of valid scope. Most importantly,
the design of this basic algorithm led to key geometric insights about the relationship between RNN
queries and the underlying dataset on which they are issued. We introduced the notion of NN circles
and defined auxiliary scope, result set membership, and containment scope membership in terms of
them. The dynamic and optimal auxiliary scope algorithms developed in Section 6.3 and Section 6.4
will build on these observations.
6.2.2 Basic Auxiliary Scope Client Evaluation

Algorithm client_query_eval_vs (Z, Q', I)
Input.  A set Z of valid scope tuples (Q, R, C), where Q
        is an RNN query with corresponding result set R
        and complementary set C. We also receive a new
        query Q' and R-tree index I
Output. Result set R for RNN query Q' and updated set Z
Begin
1.  For each (Q, R, C) in Z
2.      For each c in C
3.          If Q'.q ∈ c Then Break
4.      For each r in R
5.          If Q'.q ∉ r Then Break
6.      Return (R, Z)
7.  (R, C) ← find_korn_unary_vs(I, Q'.q)
End.
Figure 6.6. Algorithm client_query_eval_vs
Once an auxiliary scope is returned to the client, it can be used to avoid sending redundant query
requests to the server. We now present an algorithm for determining when the stored auxiliary scope
for a query Q can be used to answer a future query Q0 . This logic is entirely dependent on the specific
auxiliary scope chosen. We present code variants for valid scope and for containment scope in Figure
6.6 and Figure 6.7, respectively. Because of the restrictions metioned at the beginning of this section, we
assume that all RkNN queries issued have k = 1. (If not, the client sends them to the server and does not
store any auxiliary scope data.) Our model is flexible in that it considers the fact that a client may retain
a set Z of multiple auxiliary scopes from distinct queries. These scopes can be checked in turn, and the
client avoids sending a request to the server if any scope contains the new query point of Q0 . In the case
of valid scope, we verify (1) that no complementary set objects are members of the result set of Q0 and
(2) that every result set object is a member of the result set of Q. We determine result set membership
using NN circles, which can be computed using a data objects location and pre-computed NN circle.
If these conditions are met for some valid scope given by (Q, R, C), then the client returns the previous
result set R as the answer to Q0 and leaves the set Z of auxiliary scope data unchanged. Otherwise,
the client lacks sufficient local data for processing and submits the query to the server for processing.
76
Algorithm client query eval cs (Z, Q0 , I)

Input.
where Q is an RNN query with corresponding result
a new query Q0 and R-tree index I
Begin
2.
R
{}
3.
For each c in C
4.
If Q0 .q 2 c Then Break
5.
For each r in R
6.
If Q0 .q 2 r Then R
R [ {r}
7.
If |R| > 0 Then Return (R, Z)
8. (R, C)
find korn unary cs(I,Q0 .q)
End.
Figure 6.7. Algorithm client query eval cs
The result set of this evaluation is provided as the query answer, and Z is updated to include this new
information. Although not mentioned specifically in the algorithm, it is possible that the client may
lack sufficient storage to keep all retrieved auxiliary scopes. In this case, an eviction strategy such as
Least Recently Used (LRU) can be used to reduce the set. The algorithm for client evaluation using
containment scope is similar to that of valid scope. As before, no complementary set objects can be a
part of the result set. However, the client only requires that at least one result object be a member of the
new result set of query Q0 . As an aside, note that the true result set in the case of either auxiliary scope
method consists of the center point of each NN circle object stored in R. We abuse the notation slightly
in the algorithm description to enhance readability.
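The two client-side checks described above can be sketched as follows. This is a minimal illustration rather than the thesis implementation: circles are assumed to be stored as (x, y, radius) tuples, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Circle = Tuple[float, float, float]  # (center_x, center_y, radius), an assumed layout
Point = Tuple[float, float]


def in_circle(p: Point, c: Circle) -> bool:
    """True if point p lies inside or on circle c."""
    return math.hypot(p[0] - c[0], p[1] - c[1]) <= c[2]


def eval_valid_scope(q_new: Point, result: List[Circle],
                     comp: List[Circle]) -> Optional[List[Circle]]:
    """Reuse the cached result only if q_new lies in every result circle
    and in no complementary circle; otherwise return None (ask the server)."""
    if any(in_circle(q_new, c) for c in comp):
        return None
    if all(in_circle(q_new, r) for r in result):
        return list(result)
    return None


def eval_containment_scope(q_new: Point, result: List[Circle],
                           comp: List[Circle]) -> Optional[List[Circle]]:
    """Answer locally if no complementary circle contains q_new and at least
    one cached result circle does; the answer is that subset of circles."""
    if any(in_circle(q_new, c) for c in comp):
        return None
    hits = [r for r in result if in_circle(q_new, r)]
    return hits or None
```

As in the text, the returned circles stand in for their center points, which form the true result set.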
6.2.3 Basic Auxiliary Scope Processing Variants

To conclude this section and our discussion on the Korn Unary Basic Auxiliary Scope algorithm, we
consider several possible variants and analyze the effect that these changes would have on the generality
and efficiency of our approach. First, this section has thus far assumed that k = 1. In fact, it is possible
to use a fixed k and to remove the first of our four imposed assumptions. However, we require that
the value of k is fixed for dataset generation as well as for all future queries from all clients. Thus, the
practicality of such an implementation is still very much questionable. We proceed by defining the
equivalent of an NN circle for an RkNN query, referred to as a kNN circle, in Definition 12.
Definition 12. kNN circle. The kNN circle, denoted by kNNCir(o, k), is defined similarly to the NN circle, with
the radius being computed as the distance between a data object o and its kth closest data object in the dataset S,
or $r_k = \min_{Z = \{o_1, \ldots, o_k \mid o_i \in S\}} \left( \max_{z \in Z} dist(o, z) \right)$. Then we have $kNNCir(o, k) = cir(o, r_k)$.
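A brute-force reading of Definition 12 can be sketched as below. The sketch assumes 2-D points stored as tuples and assumes that the object o itself is excluded from its own neighbor set (as Definition 14 does with S - {o}); the function name is hypothetical.

```python
import math


def knn_circle(o, dataset, k):
    """Return (center, radius) of the kNN circle of o.

    The radius is the distance from o to its k-th closest *other* object.
    Objects at exactly the same location as o are skipped (an assumption).
    """
    dists = sorted(math.dist(o, s) for s in dataset if s != o)
    if len(dists) < k:
        raise ValueError("dataset has fewer than k other objects")
    return o, dists[k - 1]
```

With k = 1 this reduces to the NN circle used by the basic Korn index; a larger fixed k only enlarges the radius.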
Referred to henceforth as the Korn Fixed Basic Auxiliary Scope algorithm, this approach allows the
value of k to be arbitrarily large so long as it is set prior to the generation of the index. Our adaptation
simply builds the Korn R-tree index using the kNN circle for each dataset object instead of the NN circle
of each object. The fundamental properties of the index are preserved with the only change being a
possible increase in the size of the radius of each circle given by the difference in distance between an
object's closest neighbor in the dataset and its kth closest neighbor. The inclusion of kNN circles allows us
to identify the auxiliary scope for an RkNN query in the same exact manner that was used to determine
an auxiliary scope for an RNN query. Note that the definition of k is very restricted in this definition
and does not accurately model most real-world scenarios. Nevertheless, the removal of the condition
that k = 1 represents a positive step toward a more generalized algorithm.
A second (and equally impractical) alternative is to replace the Korn indexing structure with the
realtime computation method employed by Stanoi (see Chapter 2). This algorithm, referred to as
Stanoi Basic Auxiliary Scope, relies on all four original assumptions and is based on the traditional
R-tree structure used by Stanoi. (Technically, the approach can be adapted for k > 1. However, the
mathematical properties leveraged in the algorithm do not scale well with k and quickly result in an
inefficient solution.) Recall that the R-tree in this case simply stores the objects in the dataset and not
their pre-computed NN circles as in the case of Korn. Instead, a limited number of restricted NN queries
are issued from the query point and the returned objects are considered as possible members of the
result set. Remember that our technique identifies both result objects and complementary objects that
could become result objects within the tentative scope region TAS . Since NN circles are not already
available, we are forced to compute the nearest neighbor of each complementary object that could
possibly overlap with the result set. This process is extremely time consuming and incurs substantial
overhead. Thus, we conclude that the Korn indexing structure is most appropriate among the basic
approaches listed in this section.
6.3 Dynamic RkNN Auxiliary Scope Construction
With a basic RNN auxiliary scope construction framework established, we now seek to generalize our
approach to work in a variety of environments. The Dynamic RkNN Auxiliary Scope Construction
algorithm presented in this section removes the first two assumptions imposed at the beginning of our
design process. Specifically, we remove the requirements that k = 1 and that k be fixed at index creation.
By definition, the valid scope of a query Q can only be applied to another query Q' if both queries
have identical supplemental query parameters (i.e., identical values of k). However, using the dynamic
approach in conjunction with containment scope provides the flexibility to eliminate redundant queries
even if those queries have different k values. In fact, it is possible that a containment scope for query Q
can be used to answer a new query Q' so long as $Q.k \geq Q'.k$. The dynamic approach still only supports
monotonic datasets, and several optimizations have been removed to facilitate the development of the
algorithm. In Section 6.4, we will address these final limitations.
6.3.1 Dynamic RkNN Auxiliary Scope Processing

For most of the paper, we have been motivating the construction of RkNN query result and auxiliary
scope formation with the notion of RkNN circles. Recall that membership in an RkNN query result set
can be determined based on the membership of query point q in a data object's kNN circle. The Korn
index has provided a convenient way of retrieving RkNN circles for data objects, as it is precomputed
during index formation. However, it is precisely this precomputation that prevents our algorithm from
supporting dynamic values of k. At the same time, the approaches by Stanoi and Tao can quickly
determine result set membership for dynamic values of k but do not offer efficient mechanisms for
determining objects that could enter the result set if the location of the query point was perturbed
slightly and, by extension, cannot easily determine complementary set membership. To address this
problem, we will modify and substantially extend an approach used by Lee to answer ranked reverse
nearest neighbor (RRNN) queries. The dynamic RkNN auxiliary scope computation method relies on
two important metrics: kcnt and kdist. We formally introduce each of these terms in Definition 13 and
Definition 14.
Definition 13. kcnt. Consider an RkNN query Q issued at point q. The kcnt of a data point $o \in S$, denoted as
o.kcnt, is equal to the number of data objects in $S - \{o\}$ that are closer to object o than query point q is to object o.
Symbolically, $o.kcnt = |\{a \in S - \{o\} \text{ s.t. } dist(a, o) \leq dist(q, o)\}|$.
Definition 14. kdist. Consider an RkNN query Q with parameter k. The kdist of a data point $o \in S$, denoted
as o.kdist, is the vector of size k that denotes the distance between o and its k closest objects in $S - \{o\}$. kdist[i]
denotes the distance between o and its ith closest object. Mathematically, $o.kdist[i] = dist(o, a)$ for $i = 1, \ldots, k$,
where $a \in S - \{o\}$ such that $|\{b \in S - \{o, a\} \text{ s.t. } dist(b, o) \leq dist(a, o)\}| = i - 1$.
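Definitions 13 and 14 can be evaluated directly by scanning the whole dataset, which is useful as a reference when checking an incremental implementation. The sketch below assumes 2-D tuple points and hypothetical function names; it is a brute-force baseline, not the dynamic algorithm developed in this section.

```python
import math


def kcnt(o, q, dataset):
    """Number of objects in S - {o} at least as close to o as the query point q."""
    d_q = math.dist(o, q)
    return sum(1 for a in dataset if a != o and math.dist(a, o) <= d_q)


def kdist(o, dataset, k):
    """Ascending distances from o to its k closest objects in S - {o}."""
    return sorted(math.dist(o, a) for a in dataset if a != o)[:k]
```

Note that kcnt takes the query point as an argument while kdist does not, mirroring the dependence discussed in the text.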
Notice that the value of kcnt depends on both the dataset S and on the query point q. In contrast,
kdist depends only on the dataset. (Technically, the query parameter k defines the size of the kdist vector,
but all component values are independent of any query parameter.) It is also possible to connect the
ideas of kcnt and kdist to those of the kNN circle and result set membership. The kNN circle of a data
object o can be defined as $kNNCir(o, k) = cir(o, o.kdist[k])$. That is, o.kdist provides the radii of the first k
kNN circles of object o. Membership of a data object o in the result set of a query Q can be determined
by that object's kcnt value. Specifically, $o \in R \iff o.kcnt < Q.k$. We prove these two results in Lemma 7
and Lemma 8.
Lemma 7. Consider an RkNN query Q issued at query point q and supplemental parameter k. Then
$kNNCir(o, k) = cir(o, o.kdist[k])$ for all $o \in S$.
Proof. Recall that o.kdist[k] refers to the distance between object o and the object in S that is farther
away from o than exactly $k - 1$ other objects in S. Then, any object a such that $dist(a, o) \leq o.kdist[k]$ must
be among the k closest objects to o in S. This region is precisely $cir(o, o.kdist[k])$. Conversely, any object
a with $dist(a, o) > o.kdist[k]$ cannot be among the k closest objects to o in S, as there are k objects with
distances bounded by o.kdist[k]. Therefore, $kNNCir(o, k) = cir(o, o.kdist[k])$.
Lemma 8. Consider an RkNN query Q issued at query point q and supplemental parameter k. Then
$o \in R \iff o.kcnt < Q.k$ for all $o \in S$.
Proof. Suppose that $o \in R$. Then there can be at most k objects $a \in S$ (counting o itself) such that $dist(o, a) \leq dist(o, q)$.
Because o.kcnt is defined to be precisely the number of data points in $S - \{o\}$ that are closer to o than q is to
o, we have that o.kcnt < Q.k. Conversely, suppose that o.kcnt < Q.k. Then, there are fewer than k points
closer to o than q is to o, and we conclude that $o \in R$ by definition.
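The two lemmas can be checked numerically on a small dataset by computing the RkNN result set in both ways. This is an illustrative brute-force sketch with hypothetical function names; it assumes 2-D tuple points and that no distance ties occur, since ties at the circle boundary are where the two tests could disagree.

```python
import math


def rknn_by_kcnt(q, dataset, k):
    """Lemma 8 test: o is a result object when o.kcnt < k."""
    result = []
    for o in dataset:
        d_q = math.dist(o, q)
        cnt = sum(1 for a in dataset if a != o and math.dist(a, o) <= d_q)
        if cnt < k:
            result.append(o)
    return result


def rknn_by_circle(q, dataset, k):
    """Lemma 7 test: o is a result object when q lies in cir(o, o.kdist[k])."""
    result = []
    for o in dataset:
        radius = sorted(math.dist(o, a) for a in dataset if a != o)[k - 1]
        if math.dist(o, q) <= radius:
            result.append(o)
    return result
```

Both functions scan every object, so they serve only as a correctness reference for the incremental approach described next.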
It follows from the aforementioned results that we can use kcnt and kdist as tools to determine
membership in result set R. We also know from Section 6.2 that kNN circles can be used to form the
auxiliary scope for a query, so membership in complementary set C can also be determined using only
kcnt and kdist information. However, it is not obvious how to determine kcnt and kdist dynamically
in an efficient manner. Our approach is to incrementally explore the dataset S indexed by an R-tree I.
During this exploration, the algorithm conservatively estimates kcnt and kdist for explored objects and
refines these values as more information becomes available. Once an object o is proven to be outside of
both the result set R and complementary set C of a query Q, we can avoid doing any further refinement
on that object. As previously mentioned, we use kcnt to determine result set membership and use
kdist to determine complementary set membership. Our strategy is to provide a conservative lower
bound on kcnt and a conservative upper bound on kdist (or, more specifically, kdist[k]). If our estimate
of kcnt grows to become at least as large as k, we know that the corresponding object cannot be in R
since there are at least k objects closer to o than q is. The criterion for excluding an object from C is more
complicated and varies based on the type of auxiliary scope used. In general, we can exclude an object
o from membership in C if its kdist value restricts the size of its tentative kNN circle such that the circle
does not intersect the tentative auxiliary scope bound TAS , which was defined in Section 6.2.
Figure 6.8 presents the general pseudocode for the Dynamic RkNN Auxiliary Scope Construction
algorithm. From a high level perspective, we can divide the algorithm tasks into roughly five stages.
First, we declare and initialize any local variables that may be required later during the algorithm
execution. The second, third, and fourth stages occur iteratively for each node N in the R-tree index
passed to our function. Nodes are examined in increasing order of their minimum distance, mindist, to
the query point q. During stage two, we determine if it is possible to finalize the kcnt or kdist values
of any previously identified point. That is, we mathematically analyze if it is possible for o.kcnt to
increase or for o.kdist to decrease for any discovered data object o. In the third stage, we examine the
current index node N to be processed. If it is an internal node, we initialize and enqueue its children
for processing. Otherwise, it must be a data object. In this case, we save the object and update the kcnt
and kdist values of previously found data objects where applicable. The fourth stage occurs at the end
of our loop body and eliminates items from the preliminary complementary set if they do not overlap
the tentative auxiliary scope region given by TAS . Section 6.4 discusses ways to short circuit index node
evaluation to further reduce computation time and disk accesses. After every node has been considered,
we finalize all remaining kcnt and kdist values, refine the complementary set, and return both the result
set R and the complementary set C to the user.
In the following paragraphs, we take a closer look at each of the five stages of the algorithm as well
Algorithm find dynamic as(I, q, k)

Input.  A preconstructed R-tree index I of all data points in the
        dataset, the query point q, and the RkNN parameter k
Output. Result set R and complementary set C for auxiliary scope
Begin
  // Declare local variables
  1.  Initialize set R ← {}, C ← {}, H ← {}, O ← {}
  2.  Initialize set TentR ← {}, TentC ← {}, TentD ← {}
  3.  Initialize counter z ← 1
  4.  Initialize tuple initdist ← (∞_1, ..., ∞_k)
  5.  Initialize priority queue P ← {}
  6.  P.set sort method(mindist(P.object, q))
  7.  P.enqueue((I.root, 0, initdist))
  // Finalize kcnt and kdist values
  8.  While P is not empty
  9.    (N, N.kcnt, N.kdist) ← P.dequeue()
  10.   If N.kcnt < k Then z ← z − 1
  11.   (z, TentD, TentR, TentC) ← finalize kcnt(q, k, N, z, TentD, TentR, TentC)
  12.   (R, TentR) ← finalize kdist(q, k, N, R, TentR)
  13.   (C, TentC) ← finalize kdist(q, k, N, C, TentC)
  14.   N ← initialize stats(q, k, N, H)
        // Process index node
  15.   If N is an internal node
  16.     For each child m of N
  17.       m ← initialize stats(q, k, m, H)
  18.       P.enqueue((m, m.kcnt, m.kdist))
  19.       If m.kcnt < k Then z ← z + 1
  20.   Else
  21.     (TentD, TentR, TentC) ← update stats(q, k, N, TentD, TentR, TentC)
  22.     H ← H ∪ {N}
  23.     TentD ← TentD ∪ {N}
  24.     z ← z + 1
  25.   C ← refine as comp set(q, k, z > 0, R ∪ TentR, C)
  26.   TentC ← refine as comp set(q, k, z > 0, R ∪ TentR, TentC)
  // Finalize auxiliary scope
  27. (z, TentD, TentR, TentC) ← finalize kcnt(q, k, ∞, z, TentD, TentR, TentC)
  28. (R, TentR) ← finalize kdist(q, k, ∞, R, TentR)
  29. (C, TentC) ← finalize kdist(q, k, ∞, C, TentC)
  30. C ← refine as comp set(q, k, true, R, C)
  31. Return (R, C);
End.
Figure 6.8. Algorithm find dynamic as
Symbol  Set Name                      kcnt finalized  kcnt value  kdist finalized
TentD   Tentative Object Set          No              < k         No
TentR   Tentative Result Set          Yes             < k         No
TentC   Tentative Comp. Set           Yes             ≥ k         No
R       Partially Refined Result Set  Yes             < k         Yes
C       Partially Refined Comp. Set   Yes             ≥ k         Yes
H       Visited Object Set            Mix             Any         Mix
Table 6.2. Set definitions
as the subroutines that support our processing objectives. However, it is necessary to define the various
sets used by our dynamic algorithm prior to continuing our discussion. The primary sets include
the tentative object set TentD, the tentative result (complementary) set TentR (TentC), the partially
refined result (complementary) set R (C), as well as the visited object set H. Table 6.2 summarizes this
information. All data objects begin as index node entries in the priority queue P. Internal nodes are
explored with their children being placed back into P. Once a data object is found, it is copied into both
H and TentD. Data objects are never removed from H, as a permanent record of all object locations
is needed to properly categorize future data points. The copy of the data object in TentD is moved
eventually to either TentR or to TentC. Membership in TentC indicates that a minimum of k data objects
have been found to be closer to the object than the query point q is to the object. Alternatively,
membership in TentR indicates that the object has fewer than k objects closer than the distance to q and
that we have considered all such objects that could possibly exist. Once in TentR (TentC), an object will
eventually move to R (C) once its kdist value is finalized. Note that it is possible for an object to be
removed from TentC or C if its kNN circle is found to not overlap the tentative auxiliary scope TAS . This
information flow is summarized in the flowchart given by Figure 6.9.
We now take a closer look at the logic behind each of the five stages of our algorithm. During the
first stage (lines 1-7), we initialize the local variables that will be needed for computing the result set and
complementary set for the submitted query Q, which is given to the algorithm as input by passing its
query point q and supplemental parameter k. Our technique also requires a reference to an R-tree index
I of the dataset S. (Most R-tree variants such as an R*-tree or aR-tree are also acceptable.) Unlike the
basic approach, our dynamic approach assumes that the value of k is not known until query execution.
Therefore, the R-tree index is of the actual data points and not of the kNN circles formed by those data
points. We declare a counter z that tracks the number of data objects and enqueued nodes that have
unfinalized kcnt values less than k. This will be needed for refining the complementary set whenever
the containment scope version of the algorithm is used. Next, we initialize all data object sets as empty
and enqueue the root node of I into the priority queue P for processing. Note that P processes objects
based on their minimum distance to the query point q.
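The ordering used by the priority queue P can be sketched with a binary heap keyed on mindist. The code below is an illustration under stated assumptions: bounding boxes are axis-aligned tuples (xmin, ymin, xmax, ymax), a data point is treated as a degenerate box, and the function names are hypothetical.

```python
import heapq
import math


def mindist(q, rect):
    """Minimum Euclidean distance from point q to an axis-aligned rectangle.

    rect is (xmin, ymin, xmax, ymax); the distance is 0 when q is inside.
    """
    dx = max(rect[0] - q[0], 0.0, q[0] - rect[2])
    dy = max(rect[1] - q[1], 0.0, q[1] - rect[3])
    return math.hypot(dx, dy)


def order_by_mindist(q, entries):
    """Return (name, distance) pairs in the order a mindist heap would dequeue them.

    entries is a list of (name, rect) pairs standing in for index nodes.
    """
    heap = [(mindist(q, rect), name) for name, rect in entries]
    heapq.heapify(heap)
    ordered = []
    while heap:
        d, name = heapq.heappop(heap)
        ordered.append((name, d))
    return ordered
```

In the full algorithm the children of a dequeued internal node would be pushed back onto the same heap, so the dequeue distance never decreases.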
During the second stage (lines 8-14), we dequeue the next object from P and consider its distance
from the query point q. This distance is used by the finalize kcnt subroutine to determine if any
objects in TentD can be moved to TentR or to TentC. Figure 6.10 provides the relevant pseudocode
for this operation. Objects can be moved to TentC if they have $kcnt \geq k$ per Lemma 8. Otherwise,
[Figure 6.9 flowchart. Nodes: R-Tree Index (I); Priority Queue (P); Unclassified Data Objects (TentD); Examined Data Objects (H); Tentative Result Objects (TentR); Tentative Complementary Objects (TentC); Result Objects (R); Complementary Objects (C); Eliminated Objects (-). Edge labels: EXPLORE, COPY.]
Figure 6.9. Dynamic RkNN auxiliary scope set membership flowchart
we evaluate the Boolean condition $dist(q, o) \leq mindist(q, N)/2$. This is equivalent to verifying that
$cir(o, dist(o, q)) \subseteq cir(q, mindist(q, N))$. If this condition holds, then we have examined the entire search
space for objects closer to o than q is to o because the priority queue processes objects using the mindist
function. Thus, we conclude that kcnt cannot increase further and can finalize the object as a member of
the result set. Once relevant kcnt values have been finalized, we perform a similar operation with kdist
values. The subroutine for kdist finalization is given in Figure 6.11 and is invoked on both TentR and on
TentC. The algorithm checks the Boolean condition $dist(q, o) + o.kdist[k] \leq mindist(q, N)$. As before, an
equivalent form of this expression is $cir(o, o.kdist[k]) \subseteq cir(q, mindist(q, N))$. If this condition holds,
then we have examined the entire search space for objects closer to o than the k objects already identified
by the algorithm. It follows that kdist cannot be reduced further, and we move such objects into R or C.
Once all finalization takes place, our last step in the second stage is to initialize kcnt and kdist attributes
for the dequeued node N using the latest data object information from H. This is performed with a call
to the initialize stats subroutine, which is presented in Figure 6.12. Here, we examine each identified
data object h in turn. If h is closer to N than query point q is to N, then we increment N.kcnt. Similarly, if h
Subroutine finalize kcnt(q, k, N, z, TentD, TentR, TentC)

Input.  An RkNN query at point q with parameter k,
        current index node N, possible result object counter z
        as well as tentative sets TentD, TentR, and TentC
Output. Revised tentative sets TentD, TentR, and TentC
        as well as updated counter z
Begin
  1. For each object o in TentD
  2.   If o.kcnt ≥ k
  3.     TentD ← TentD − {o}
  4.     TentC ← TentC ∪ {o}
  5.     z ← z − 1
  6.   Else If dist(q, o) ≤ mindist(q, N)/2
  7.     TentD ← TentD − {o}
  8.     TentR ← TentR ∪ {o}
  9.     z ← z − 1
  10. Return (z, TentD, TentR, TentC);
End.
Figure 6.10. Subroutine finalize kcnt
Subroutine finalize kdist(q, k, N, A, TentA)

Input.  An RkNN query at point q with parameter k,
        current index node N, as well as tentative set
        TentA and finalized set A
Output. Revised tentative set TentA, and finalized set A
Begin
  1. For each object o in TentA
  2.   If dist(q, o) + o.kdist[k] ≤ mindist(q, N)
  3.     TentA ← TentA − {o}
  4.     A ← A ∪ {o}
  5. Return (A, TentA);
End.
Figure 6.11. Subroutine finalize kdist
is among the k closest examined objects to N, we update the kdist vector to retain this information. At the
conclusion of the algorithm, N will have the smallest kdist and largest kcnt that can be guaranteed based
on the algorithm's present information about the dataset. Notice here that N may not be a data object
but could be an internal node defined by a bounding box. In such a case, we provide an estimate of
kcnt that is guaranteed to be a lower bound for any contained data object by considering the minimum
distance from N to q and the maximum distance from N to h. It is possible that a contained data object
could have a larger kcnt value than that of its parent node. In the case of kdist, we assume that every
internal node has a kdist of infinity. This issue will be revisited in Section 6.4. Since N now contains
current statistics and since all finalization of data objects has been performed, stage two is complete.
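The two finalization tests used in stage two reduce to simple distance comparisons. The following sketch expresses them as predicates; the function names are hypothetical, points are assumed to be 2-D tuples, and `mindist_q_n` stands for the value mindist(q, N) of the node just dequeued.

```python
import math


def kcnt_is_final(q, o, mindist_q_n):
    """True when cir(o, dist(o, q)) lies inside cir(q, mindist(q, N)),
    so no unexplored object can be closer to o than q is."""
    return math.dist(q, o) <= mindist_q_n / 2


def kdist_is_final(q, o, kdist_k, mindist_q_n):
    """True when cir(o, o.kdist[k]) lies inside cir(q, mindist(q, N)),
    so no unexplored object can shrink o.kdist[k]."""
    return math.dist(q, o) + kdist_k <= mindist_q_n
```

Passing an infinite `mindist_q_n` makes both predicates true for every object, which is consistent with the way the final stage of the algorithm forces all remaining values to be finalized.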
We now consider stage three of our processing algorithm (lines 15-24). Here, we actually process
the dequeued node N. This procedure varies depending on the type of N. If N is an internal node, then
Subroutine initialize stats(q, k, m, H)

Input.  An RkNN query at point q with parameter k, set H of
        identified data points, and uninitialized object m
Output. Initialized object m
Begin
  // D contains dist values to current k closest points
  1. Initialize list D ← {∞_1, ..., ∞_k}
  2. Initialize m.kcnt ← 0
  3. For each object h in H
  4.   If maxdist(m, h) ≤ mindist(m, q)
  5.     m.kcnt ← m.kcnt + 1
  6.   If m is internal node AND mindist(m, h) ≤ D[k].dist
  7.     D.remove(k)
  8.     D.append(dist(m, h))
  9.     D.sort()
  10. m.kdist ← (D[1].dist, ..., D[k].dist)
  11. Return m
End.
Figure 6.12. Subroutine initialize stats
Subroutine update stats(q, k, N, TentD, TentR, TentC)

Input.  An RkNN query at point q with parameter k,
        current data object N as well as
        tentative sets TentD, TentR, and TentC
Output. Revised tentative sets TentD, TentR, and TentC
Begin
  1. For each object o in TentD
  2.   If dist(o, N) ≤ dist(o, q)
  3.     o.kcnt ← o.kcnt + 1
  4. For each object o in TentD ∪ TentR ∪ TentC
  5.   If dist(o, N) ≤ o.kdist[k]
  6.     o.kdist[k] ← dist(o, N)
  7.     o.kdist.sort()
  8. Return (TentD, TentR, TentC);
End.
Figure 6.13. Subroutine update stats
we examine each of its children, compute their kcnt and kdist values using the previously mentioned
initialize stats subroutine, and enqueue them in P for later processing. Otherwise, N must be a data
object. In such a case, we perform the following operations. First, we call the update stats subroutine
given in Figure 6.13 to increase kcnt or to decrease kdist of previously identified objects wherever
possible. The logic of this step is identical to that of initialize stats with one exception. In the previous
case, we had a single new uninitialized object and needed to construct kcnt and kdist using the entirety
of set H. In this case, we consider all objects presently in H and seek to update previously computed
kcnt and kdist values using a single new data object N. As usual, an object o has its kcnt increased if
dist(o, N) < dist(o, q) and has its kdist decreased if dist(o, N) < o.kdist[k]. Once the statistics of objects in H
are updated, we add N to both H and to TentD. This concludes stage three.
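The statistics update performed when a new data object is dequeued can be sketched as follows. This is a simplified illustration: the `Obj` record, its field names, and the use of Python lists for the tentative sets are assumptions, and the comparisons use "less than or equal", so the handling of exact ties is a choice made here rather than something taken from the thesis.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Obj:
    loc: tuple                                   # object location (x, y)
    kcnt: int = 0                                # conservative lower bound on kcnt
    kdist: list = field(default_factory=list)    # ascending upper bounds, length k


def update_stats(q, new_obj, tent_d, tent_r, tent_c):
    """Refine kcnt and kdist of known objects using one new data object."""
    # Only objects with unfinalized kcnt (those in TentD) can have kcnt raised.
    for o in tent_d:
        if math.dist(o.loc, new_obj.loc) <= math.dist(o.loc, q):
            o.kcnt += 1
    # Any object with unfinalized kdist may have its k-th distance lowered.
    for o in tent_d + tent_r + tent_c:
        d = math.dist(o.loc, new_obj.loc)
        if d <= o.kdist[-1]:
            o.kdist[-1] = d
            o.kdist.sort()
    return tent_d, tent_r, tent_c
```

Because kcnt only grows and kdist only shrinks, repeated calls keep kcnt a lower bound and kdist[k] an upper bound, which is the conservative behavior the algorithm relies on.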
Subroutine refine vs comp set(q, k, final, R, C)
Implements refine as comp set
Input.  An RkNN query at point q with parameter k,
        set R of result objects, set C of
        complementary objects, and Boolean flag final
        to indicate if result set is finalized
Output. Revised complementary set C with
        unnecessary data points removed
Begin
  1. For each object c in C
       // The NN circle of each complementary object must
       // intersect EVERY result object's NN circle
  2.   For each object r in R
  3.     If cir(c, c.kdist[k]) ∩ cir(r, r.kdist[k]) = ∅
  4.       C ← C − {c}
  5.       Break
  6. Return C;
End.
Figure 6.14. Subroutine refine vs comp set
In stage four (lines 25-26), we search for objects in TentC and C that can be removed because they have
kNN circles that do not intersect the tentative auxiliary scope given by TAS . The logic for this reduction
differs depending on the type of auxiliary scope used. Our dynamic RkNN algorithm uses a call to
the subroutine refine as comp set that is implemented by refine vs comp set and refine cs comp set in
Figure 6.14 and Figure 6.15, respectively. In the case of valid scope, we form the tentative region TVS by
computing the intersection of every result object's kNN circle. Since the order in which result objects
and complementary objects are found is not guaranteed, it is possible that TVS may shrink at any point
that a new result object is found by the algorithm. However, we know that any object with a kNN circle
that does not intersect the current tentative auxiliary scope must not be in the final complementary set.
Thus, we test each complementary object to see if its kNN circle intersects every result object's kNN circle. Recall
that the kNN circle can be reconstructed using an object's kdist. Once again, the value of kdist may not
be finalized. However, our conservative upper bound ensures that objects can never be prematurely
excluded from the complementary set. The logic for complementary set refinement for containment
scope is similar but differs in two important ways. First of all, an object's kNN circle need only touch
the kNN circle of a single result object. Secondly, this first change implies that TCS may actually grow as
additional result objects are discovered. It follows that we cannot remove any objects from C or TentC
until all result objects have been found. We use the counter z to detect when this condition has occurred
and respond accordingly. The result of the logic is similar to that of valid scope refinement.
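The difference between the two refinement rules can be sketched with a pairwise circle intersection test. Circles are assumed to be (x, y, radius) tuples built from each object's location and kdist[k], and the function names are hypothetical.

```python
import math


def circles_intersect(c1, c2):
    """True if two closed circles (x, y, r) share at least one point."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    return math.hypot(x1 - x2, y1 - y2) <= r1 + r2


def refine_vs(result, comp):
    """Valid scope: keep c only if its circle intersects EVERY result circle."""
    return [c for c in comp if all(circles_intersect(c, r) for r in result)]


def refine_cs(result, comp, final):
    """Containment scope: keep c if its circle intersects SOME result circle.

    Nothing is removed until the result set is final, because the tentative
    containment scope can still grow as new result objects are found.
    """
    if not final:
        return list(comp)
    return [c for c in comp if any(circles_intersect(c, r) for r in result)]
```

The valid scope test is pairwise, as in Figure 6.14, so it may keep a circle that touches every result circle without touching their common intersection; this errs on the side of retaining complementary objects.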
Once all index nodes have been processed and the priority queue is empty, we enter stage five (lines
27-31). Here, we finalize every object's kcnt and kdist values with the appropriate calls to finalize kcnt
and finalize kdist. Additionally, we perform a final refinement of the complementary set and then
Subroutine refine cs comp set(q, k, final, R, C)

Implements refine as comp set
Input.  An RkNN query at point q with parameter k,
        set R of result objects, set C of
        complementary objects, and Boolean flag final
        to indicate if result set is finalized
Output. Revised complementary set C with
        unnecessary data points removed
Begin
  1. Define Boolean flag contained
  2. If NOT final
  3.   Return C
  4. For each object c in C
       // The NN circle of each complementary object must
       // intersect SOME result object's NN circle
  5.   contained ← false
  6.   For each object r in R
  7.     If cir(c, c.kdist[k]) ∩ cir(r, r.kdist[k]) ≠ ∅
  8.       contained ← true
  9.       Break
  10.  If NOT contained
  11.    C ← C − {c}
  12. Return C;
End.
Figure 6.15. Subroutine refine cs comp set
return the result set R and the complementary set C to the user.
6.3.2 Dynamic RkNN Auxiliary Scope Example

With a general description of our Dynamic RkNN Auxiliary Scope Construction algorithm complete,
we seek to provide a more concrete intuition about the algorithm's execution via a working example.
We use the same dataset and R-tree index given during the description of our basic RkNN processing
algorithm in Figure 6.5. Our algorithm requires 13 steps to process this dataset, and these steps have
been segmented into three general stages to facilitate the presentation. (Note that these stages do not
correspond to the five stages of the algorithm mentioned previously.) The beginning and ending states
of each stage are given in Figure 6.16. The first two stages are identical regardless of whether auxiliary
scope is implemented using valid scope or using containment scope. The third stage involves minor
differences between these two techniques, and we will point out these nuances as appropriate.
Table 6.3 provides a step-by-step breakdown of algorithm execution during the first stage. We begin
with empty data object sets and with a single root node enqueued in priority queue P. We then dequeue
R, compute its kcnt and kdist values (0 and $\infty$ at this point) and enqueue children R1 and R2 in P. Actions
for steps 1-3 follow a similar pattern, as an internal node is dequeued from P, its statistics are computed,
and its children's statistics are computed prior to insertion into priority queue P. Note that the order in
[Figure 6.16 panels: (a) Stage I (Before), (b) Stage I (After), (c) Stage II (Before), (d) Stage II (After), (e) Stage III (Before), (f) Stage III (After). The panels show query point q, the data objects, and index node R4.]
Figure 6.16. Example dynamic auxiliary scope computation
which a node N is dequeued from P is in increasing order of mindist(q, N). Also, kcnt = 0 and kdist = $\infty$
for all objects during this stage as (1) no data object has been processed and (2) $H = \emptyset$. Through the
first stage, our algorithm's view of the dataset has grown from simply seeing the root node R to having
individual objects a, b, c, d, e as well as internal node R4 enqueued for processing.
Moving on to Stage II of our example, Table 6.4 offers a detailed description of actions that occur
during steps 4-8. Our first action is to dequeue object d and to compute its statistics. We then can add
d to both TentD and to H as it is an object with unfinalized kcnt and kdist values. Our next step is to
dequeue object c. We also add this object to TentD and H and also use its information to update the
Step  P                 TentD  TentR  TentC  R   C   H   Action
0     {R}               {}     {}     {}     {}  {}  {}  Dequeue R; Compute stats for R; Compute stats and enqueue R1, R2 in P
1     {R1, R2}          {}     {}     {}     {}  {}  {}  Dequeue R1; Compute stats for R1; Compute stats and enqueue a, b in P
2     {R2, a, b}        {}     {}     {}     {}  {}  {}  Dequeue R2; Compute stats and enqueue R3, R4 in P
3     {R3, a, b, R4}    {}     {}     {}     {}  {}  {}  Dequeue R3; Compute stats and enqueue d, c, e in P
Table 6.3. Dynamic RkNN auxiliary scope (Stage I)
statistics of d. During the next phase, a is dequeued. We now have determined that both c and d cannot
be result objects, as they are both closer to each other than they are to query point q. Thus, we move
them to TentC. Based on the distance from q to current object a, we can also conclude that kdist for
d is finalized and consequently move it to C. Finally, a is added to TentD and H. The following step
dequeues and processes b in the same way as previous data objects. Notice that d is not included in
statistic calculation updates since both d.kcnt and d.kdist are finalized. (It is worth noting that d.kcnt
may not actually be finalized. However, the fact that $d.kcnt \geq k$ allows us to conclude that it is not a
result object, so the actual final kcnt value is, in fact, irrelevant.) During the final step in this stage, we
dequeue object e, compute and update relevant statistics, and then add it to TentD and to H. At this
point Figure 6.16(d) depicts the current search range as well as the tentative kdist values for each data
object.
The final stage of our example differs depending on the type of auxiliary scope used. Table 6.5 and
Table 6.6 offer detailed descriptions of the third stage of the example for valid scope and for containment
scope, respectively. We consider valid scope first. In Step 9, we dequeue internal node R4 and enqueue
its children f , g, and h into P. Based on the value of mindist(q, R4 ), we are able to reclassify object e as a
Step  P                     TentD   TentR  TentC  R   C    H             Action
4     {d, c, a, b, e, R4}   {}      {}     {}     {}  {}   {}            Dequeue d; Compute stats for d; Add d to TentD; Add d to H
5     {c, a, b, e, R4}      {d}     {}     {}     {}  {}   {d}           Dequeue c; Compute stats for c; Update stats for d; Add c to TentD; Add c to H
6     {a, b, e, R4}         {c, d}  {}     {}     {}  {}   {c, d}        Dequeue a; Move c, d from TentD to TentC; Move d from TentC to C; Compute stats for a; Update stats for c, d; Add a to TentD; Add a to H
7     {b, e, R4}            {a}     {}     {c}    {}  {d}  {a, c, d}     Dequeue b; Compute stats for b; Update stats for a, c; Add b to TentD; Add b to H
8     {e, R4}               {a, b}  {}     {c}    {}  {d}  {a, b, c, d}  Dequeue e; Move c from TentC to C; Compute stats for e; Add e to TentD; Add e to H
Table 6.4. Dynamic RkNN auxiliary scope (Stage II)
complementary object and to reclassify a and b as result objects. The valid scope refinement process then
removes objects c and e from consideration as their NN circles do not intersect the tentative scope TVS
formed by the intersection of NN circles for result objects a and b. In Step 10, data object f is dequeued
and relevant statistics are updated. Step 11 involves the similar dequeuing of object g. However, we are
also able to reclassify and subsequently remove object f from TentC. In Step 12, object h is dequeued
and object g is removed from the complementary set. Finally, the algorithm's last action is the removal
of object h from the complementary set and transmission of result set R and complementary set C to the
client in Step 13. The steps for the containment scope version of this algorithm are slightly different.
First, notice that objects c and e are not removed from the complementary set in Step 9. In fact, both of
90
Step 9
  Set contents: P = {R4}; TentD = {a, b, e}; TentR = {}; TentC = {}; R = {}; C = {c, d}; H = {a, b, c, d, e}
  Action: Dequeue R4; move e from TentD to TentC; move a, b from TentD to TentR;
          enqueue f, g, h in P; remove c, e from TentC
Step 10
  Set contents: P = {f, g, h}; TentD = {}; TentR = {a, b}; TentC = {e}; R = {}; C = {c, d}; H = {a, b, c, d, e}
  Action: Dequeue f; move a, b from TentR to R; compute stats for f; update stats for e;
          add f to TentD; add f to H
Step 11
  Set contents: P = {g, h}; TentD = {f}; TentR = {}; TentC = {e}; R = {a, b}; C = {c, d}; H = {a, b, c, d, e, f}
  Action: Dequeue g; move f from TentD to TentC; move e from TentC to C; compute stats for g;
          add g to TentD; add g to H; remove f from TentC
Step 12
  Set contents: P = {h}; TentD = {g}; TentR = {}; TentC = {}; R = {a, b}; C = {c, d, e}; H = {a, b, c, d, e, f, g}
  Action: Dequeue h; move g from TentD to TentC; compute stats for h; add h to TentD; add h to H;
          remove g from TentC
Step 13
  Set contents: P = {}; TentD = {h}; TentR = {}; TentC = {}; R = {a, b}; C = {c, d, e}; H = {a, b, c, d, e, f, g, h}
  Action: Move h from TentD to TentC; move h from TentC to C; remove h from TentC; return (R, C)
Table 6.5. Dynamic RkNN auxiliary scope (Stage III - VS)
these objects will be a part of the final complementary set at completion of the algorithm. Aside from
this one change and the fact that e now moves to set C at Step 11, the algorithms perform similarly.
Finally, we consider the efficiency and applicability of the Dynamic RkNN Auxiliary Scope algorithm.
Recall that our process successfully removes many constraints that had been placed on basic versions
of our RkNN auxiliary scope computation algorithm. The ability to allow k to vary dynamically at
runtime and with every client query is an important and substantial improvement. Also note that this
algorithm integrates the discovery of both result set and complementary set information. Both sets are
formed simultaneously using conservative estimates of kcnt and kdist as tools. This integration offers
an opportunity to substantially reduce processing time and disk accesses. However, our algorithm still
Step 9
  Set contents: P = {R4}; TentD = {a, b, e}; TentR = {}; TentC = {}; R = {}; C = {c, d}; H = {a, b, c, d, e}
  Action: Dequeue R4; move e from TentD to TentC; move a, b from TentD to TentR; enqueue f, g, h in P
Step 10
  Set contents: P = {f, g, h}; TentD = {}; TentR = {a, b}; TentC = {e}; R = {}; C = {c, d}; H = {a, b, c, d, e}
  Action: Dequeue f; move a, b from TentR to R; compute stats for f; update stats for a, b, e;
          add f to TentD; add f to H
Step 11
  Set contents: P = {g, h}; TentD = {f}; TentR = {}; TentC = {e}; R = {a, b}; C = {c, d}; H = {a, b, c, d, e, f}
  Action: Dequeue g; move f from TentD to TentC; move e from TentC to C; compute stats for g;
          add g to TentD; add g to H; remove f from TentC
Step 12
  Set contents: P = {h}; TentD = {g}; TentR = {}; TentC = {}; R = {a, b}; C = {c, d, e}; H = {a, b, c, d, e, f, g}
  Action: Dequeue h; move g from TentD to TentC; compute stats for h; add h to TentD; add h to H;
          remove g from TentC
Step 13
  Set contents: P = {}; TentD = {h}; TentR = {}; TentC = {}; R = {a, b}; C = {c, d, e}; H = {a, b, c, d, e, f, g, h}
  Action: Move h from TentD to TentC; move h from TentC to C; remove h from TentC; return (R, C)
Table 6.6. Dynamic RkNN auxiliary scope (Stage III - CS)
suffers from two fundamental flaws. First, the approach thus far has focused on the development of
algorithms for monochromatic datasets. It is also possible to support bichromatic data, and such an
extension will be given in Section 6.4. In addition and perhaps more importantly, the current version of
the Dynamic RkNN Auxiliary Scope algorithm processes every index node without regard to whether
or not it is truly needed to compute the result set and complementary set of an auxiliary scope. This
is impractical, as the dataset index is likely to be quite large and require many page accesses. The
next section discusses an optimal method that augments our algorithm in such a way as to allow it
to avoid accessing those data objects and internal nodes that will not influence the result set or the
complementary set. This will enhance our algorithm's performance and ensure minimal overhead in
the computation of auxiliary scope information.
6.3.3 Dynamic RkNN Auxiliary Scope Client Evaluation

Algorithm client_query_eval_vs(Z, Q', I)
Input.  A set Z of valid scope tuples (Q, R, C), where Q
        is an RkNN query with corresponding result set R
        and complementary set C. We also receive a new
        query Q' and R-tree index I
Begin
2.    If Q'.k ≠ Q.k Then Break
3.    For each c in C
4.      If Q'.q ∈ cir(c, c.kdist[Q.k]) Then Break
5.    For each r in R
6.      If Q'.q ∉ cir(r, r.kdist[Q.k]) Then Break
7.    Return (R, Z)
8.  (R, C) ← find_dynamic_vs(I, Q'.q, Q'.k)
End.
Figure 6.17. Algorithm client_query_eval_vs (Revised)
Algorithm client_query_eval_cs(Z, Q', I)
Input.  [...] where Q is an RkNN query with corresponding result
        [...] a new query Q' and R-tree index I
Begin
2.    R ← {}
3.    If Q'.k > Q.k Then Break
4.    For each c in C
5.      If Q'.q ∈ cir(c, c.kdist[Q'.k]) Then Break
6.    For each r in R
7.      If Q'.q ∈ cir(r, r.kdist[Q'.k]) Then Break
8.    If |R| > 0 Then Return (R, Z)
9.  (R, C) ← find_dynamic_cs(I, Q'.q, Q'.k)
End.
Figure 6.18. Algorithm client_query_eval_cs (Revised)
With the server computation algorithm for dynamic RkNN auxiliary scope defined, we now turn
our attention to the changes necessary to enable the client to make use of this new data. Once again,
the exact logic for RkNN query evaluation is highly dependent on the specific type of auxiliary scope
employed. Figure 6.17 and Figure 6.18 present the relevant code for valid scope and for containment
scope, respectively. Recall from Section 6.2 that a client maintains a set of previously compiled auxiliary
scope information and need only find a single scope containing the new RkNN query Q' in order to
avoid submitting the query to the server. In the case of valid scope, we require a precise match on
supplemental query parameters. That is, we must have Q.k = Q'.k. The rest of the logic remains
unchanged with two exceptions. First, the NN circles must now be computed using the location of an
element in R or C in tandem with its stored kdist value. Before, the actual stored element was the NN
circle of the result object. The second necessary change is to consider the kNN circle of a data point
instead of its NN circle. Note that both of these changes are straightforward to implement. The changes
to containment scope computation logic are slightly more substantial. By definition, containment scope
features semantic containment of queries. It is therefore able to assist in the evaluation of any query
Q' that has more restrictive supplemental parameters than those used by the query Q with which the
containment scope is associated. This implies that a containment scope is applicable if Q'.k ≤ Q.k.
The broader applicability of containment scope is one of the primary benefits that it offers system
administrators over a valid scope model. Like its valid scope counterpart, the client containment scope
evaluation routine for dynamic RkNN auxiliary scope must form kNN circles dynamically using object
location data and object kdist data from R or C. Notice that the employed value of k is determined by the
new query Q' as opposed to Q. Thus, it is necessary to return the entire kdist vector (and not just kdist[k])
when using a containment scope approach. With the exception of the changes and generalizations
mentioned above, the logic for containment scope evaluation also remains very similar to that of the basic
method.
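The two client-side checks described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the ScopeObject layout, the function names, and the treatment of "Break" as "skip to the next stored scope" are all hypothetical, and the containment scope version collects the result objects whose kNN circles cover the new query point, which is one reading of the partly illegible Figure 6.18.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class ScopeObject:
    """A result or complementary object held by the client (hypothetical layout)."""
    x: float
    y: float
    kdist: list  # kdist[i - 1] = conservative distance to the i-th nearest neighbor


def in_knn_circle(point, obj, k):
    """True if point lies in cir(obj, obj.kdist[k]); the boundary counts as inside."""
    return hypot(point[0] - obj.x, point[1] - obj.y) <= obj.kdist[k - 1]


def eval_valid_scope(scopes, q, k):
    """scopes holds (k of Q, R, C) tuples. Returns a reusable result list or None."""
    for scope_k, results, comps in scopes:
        if k != scope_k:                      # valid scope needs Q.k = Q'.k
            continue
        if any(in_knn_circle(q, c, scope_k) for c in comps):
            continue                          # a complementary object would join the result
        if all(in_knn_circle(q, r, scope_k) for r in results):
            return list(results)              # q' is inside every result kNN circle
    return None                               # caller must submit Q' to the server


def eval_containment_scope(scopes, q, k):
    """Same idea, but any Q'.k <= Q.k qualifies and circles use kdist[Q'.k]."""
    for scope_k, results, comps in scopes:
        if k > scope_k:
            continue
        if any(in_knn_circle(q, c, k) for c in comps):
            continue
        hits = [r for r in results if in_knn_circle(q, r, k)]
        if hits:
            return hits
    return None
```

The containment scope routine indexes kdist with the new query's k, which is why the whole kdist vector has to travel with each object rather than the single entry kdist[Q.k].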
6.4 Optimal RkNN Auxiliary Scope Construction
In Section 6.2 and Section 6.3, we introduced the fundamental concepts behind an RkNN auxiliary
scope processing algorithm. We now build on this work to produce an Optimal RkNN Auxiliary Scope
algorithm that only examines those objects and nodes that could possibly affect the construction of
the result set or complementary set for our auxiliary scope. As system implementations may need to
process either monochromatic datasets or bichromatic datasets, we provide two versions of this optimal
algorithm to address both cases.
6.4.1 Monochromatic Optimal Auxiliary Scope Processing

We begin by considering the monochromatic case that corresponds to all previous variants of the
RkNN auxiliary scope algorithm. Our goal is to modify the Dynamic RkNN Auxiliary Scope algorithm
to determine when a particular index node can be skipped. There are three separate cases where a
particular node N may need to be processed further. These are as follows:
1. N possibly is a result object or contains result objects.
2. N possibly is a complementary object or contains complementary objects.
3. N can be used to reduce the kcnt or kdist value of another data object and thereby possibly reduce
the final result set and complementary set.
Figure 6.19 offers an example of each case as well as a fourth case where an object truly can be ignored
by the algorithm. Assume that an R1NN query Q is issued at query point q and that containment scope
is selected for our auxiliary scope algorithm type. In Figure 6.19(a) object b can still possibly be a
result object because (1) its current kcnt < k and (2) the query search space given by cir(q, dist(q, b)) does
not entirely cover cir(b, dist(q, b)). This is an example of Case 1, so object b must be processed by our
algorithm. Now consider Figure 6.19(b). Here, we observe that current object e must be processed
because the kNN circle of e given by cir(e, dist(e, e)) still overlaps the tentative auxiliary scope space T_CS
given by the union of the NN circles of current result objects a and b. This is an example of Case 2.
Figure 6.19(c) considers whether object R4 needs to be processed. Although we can conclude that
R4 can contain neither result objects nor complementary objects, we still must examine it because the
object lies within the tentative NN circle of e and consequently could be used (1) to reduce e.kdist and (2)
to eliminate e from the complementary set. This is precisely the condition described by Case 3. Finally,
consider the evaluation of object g in Figure 6.19(d). Notice that this object does not fit any of our three
previously described cases. That is, we know that object g cannot be a member of the result set (because
of object f), cannot be a member of the complementary set (because the tentative NN circle of g does
not overlap with T_CS), and cannot affect the kcnt or kdist values of object e or any other data object. Thus,
we conclude that g does not need to be evaluated by the algorithm.
The three cases above exhaustively cover the conditions under which the evaluation of an index
node is required. Therefore, our general approach is to augment the Dynamic RkNN Auxiliary Scope
algorithm to identify when a node fails to meet any of the three cases and to avoid further processing.
However, there is one problem with this approach; namely, we do not have global knowledge of all
other data objects when determining if a particular node can be excluded. It follows that we may ignore
a node because it does not influence the kcnt or kdist values of any stored data object and later process
a new data object that needs information about the skipped node to correctly compute kcnt and kdist
values. To address this issue, we store ignored nodes in a special set O. We then modify our original
code to check this set for potentially relevant objects anytime that a new data object is processed. This
ensures that erroneously ignored objects will still eventually be processed by the algorithm.
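The bookkeeping for the ignored set O can be sketched as below. This is an illustrative Python fragment under simplifying assumptions: ignored entries are treated as plain points rather than index nodes, the function name recheck_ignored is hypothetical, and the radius argument stands for the threshold min(dist(m, q), m.kdist[k]) used when a new object m is initialized.

```python
import heapq
from math import dist


def recheck_ignored(m, radius, ignored, queue, q):
    """Re-enqueue every ignored point that lies within `radius` of new object m.

    m, q and the ignored entries are (x, y) tuples. `queue` is a heap ordered
    by distance to the query point q, mirroring the mindist ordering of P.
    Returns the entries that stay in the ignored set.
    """
    still_ignored = []
    for o in ignored:
        if dist(m, o) <= radius:
            # o could influence the kcnt or kdist of m, so it must be processed after all
            heapq.heappush(queue, (dist(q, o), o))
        else:
            still_ignored.append(o)
    return still_ignored
```

An entry therefore leaves O only when some newly processed object actually needs it, so nodes that never become relevant are never expanded.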
Figure 6.19. Outside search space scenarios: (a) Case 1, (b) Case 2, (c) Case 3, (d) Case 4
The pseudocode for ignoring an index node is given in Figure 6.20. The subroutine receives information
about the query (q, k), current node (N), as well as current set contents (R, TentR, TentC, and
TentD). Using this information, the algorithm returns true if the object can be ignored and false otherwise.
Notice that the code body corresponds exactly with the three previously described cases. We first
check that the object cannot be a result object or contain result objects (lines 1-2). Next, we verify that
the object cannot be a complementary object or contain complementary objects by invoking a call to the
complementary set refinement algorithm and providing the current node as the sole complementary
object to be processed (lines 3-4). Finally, we cover the remaining case by checking for membership of
N in cir(o, dist(o, q)) and cir(o, o.kdist[k]) for every known object o. If it is inside either circle, it cannot be
ignored. Otherwise, we conclude that processing is not necessary and communicate this result to the
general RkNN auxiliary scope processing algorithm.
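The third test can be illustrated with a short Python sketch. It is only an approximation of the subroutine: the Known class and the function names are hypothetical, node extents are assumed to be axis-aligned rectangles (xmin, ymin, xmax, ymax), the comparison is taken to be "less than or equal", and the radius follows the min(dist(o, q), o.kdist[k]) expression printed in Figure 6.20.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Known:
    """An already classified object (member of TentR, TentC or TentD); hypothetical layout."""
    pos: tuple   # (x, y)
    kdist: list  # kdist[i - 1] = conservative distance to the i-th nearest neighbor


def mindist(p, rect):
    """Smallest distance from point p to rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - p[0], 0.0, p[0] - xmax)
    dy = max(ymin - p[1], 0.0, p[1] - ymax)
    return hypot(dx, dy)


def case3_blocks_ignore(rect, known, q, k):
    """True if node `rect` might still lower the kcnt or kdist of a known object."""
    for o in known:
        radius = min(hypot(o.pos[0] - q[0], o.pos[1] - q[1]), o.kdist[k - 1])
        if mindist(o.pos, rect) <= radius:
            return True   # the node reaches into a circle that matters; do not ignore it
    return False
```

A node would be added to the ignored set only if this check returns False and the Case 1 and Case 2 checks have also ruled it out.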
The introduction of index node exclusion and the necessary management of the ignored object
set O necessitate the modification of the primary Dynamic RkNN Auxiliary Scope algorithm as well
as the statistics initialization subroutine initialize_stats. We present the modified versions of these
two code segments in Figure 6.21 and in Figure 6.22, respectively. The differences in these algorithms
are small but significant. In the case of the Optimal RkNN Auxiliary Scope algorithm, we now
declare an ignored object set O at the beginning of query evaluation (line 1) and call the outside_search
subroutine prior to processing a new node (line 15). If it is determined that the object can be ignored
(at least for now), we add N to the ignored object set O and bypass all further processing of the index
entry. Otherwise, we continue as before. Any call to initialize_stats has been modified to also provide
the current ignored object set O as a modifiable parameter. Now consider the changes made to the
initialize_stats subroutine. The computation of kcnt and kdist for data object nodes remains unchanged.
Similarly, kcnt for internal nodes is processed in the same way as it was for the Dynamic RkNN Auxiliary
Scope algorithm. However, we now must compute a more accurate upper bound for kdist for internal
Subroutine outside_search(q, k, N, z, R, TentR, TentC, TentD)
Input.  An RkNN query at point q with parameter k, sets
        TentR and R of result objects, set TentC of
        complementary objects, set TentD of unclassified
        objects, current index node N, and possible result
        object counter z
Output. Returns true if N can be ignored by search
Begin
    // Ensure object cannot be in result set
1.  If N.kcnt < k
2.    Return false;
    // Ensure object cannot be in complementary set
3.  If (∅, {N}) = refine_as_comp_set(q, k, z > 0, R ∪ TentR, ∅, {N})
4.    Return false;
    // Ensure object will not be helpful in possibly eliminating
    // other candidate complementary objects in future
5.  For each object o in TentR ∪ TentC ∪ TentD
6.    If mindist(o, N) ≤ min(dist(o, q), o.kdist[k])
7.      Return false;
8.  Return true;
End.
Figure 6.20. Subroutine outside_search
nodes. Otherwise, only nodes at the data object (leaf) level will be ignored by our optimal algorithm.
Our strategy here is as follows. Consider a bounding box B that contains at least k data objects. This will
normally be the case, as R-trees are disk indexing structures that typically have a large fanout factor, and
k is typically not very large. If B contains at least k objects, then every object within B must have a kNN
circle of radius less than or equal to $r = \frac{3}{2}\,B.diagonal$, where B.diagonal refers to the length of the diagonal
of rectangle B or equivalently $B.diagonal = \sqrt{B.length^2 + B.width^2}$. A kNN circle for any object $o \in B$ cannot
be larger than cir(o, r) since (1) that circle must contain B and (2) B has at least k objects. Thus, we use
N.kdist = r as a conservative upper bound for internal nodes. In the rare event that N does not contain
at least k objects, we expand the bounding box of N to include the k − N.capacity closest data objects
(using the mindist metric). This new bounding box is then used in the computation. The pseudocode
performs the internal node kdist computation on lines 10-12. The second and final modification to the
initialize_stats subroutine is the evaluation of ignored objects in O to determine if they could influence
the kcnt or kdist value of N. If so, we remove the suspect object from O and enqueue it in P for further
analysis (lines 15-18).
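The internal node bound is simple enough to state as a short Python sketch. The function names and the (xmin, ymin, xmax, ymax) box layout are assumptions made for illustration. Selecting the k − N.capacity closest objects is left to the caller.

```python
from math import hypot


def expand_box(box, points):
    """Grow box (xmin, ymin, xmax, ymax) so that it also covers the given points."""
    xmin, ymin, xmax, ymax = box
    for x, y in points:
        xmin, ymin = min(xmin, x), min(ymin, y)
        xmax, ymax = max(xmax, x), max(ymax, y)
    return (xmin, ymin, xmax, ymax)


def kdist_upper_bound(box):
    """Conservative kdist for an internal node: 3/2 times the box diagonal."""
    xmin, ymin, xmax, ymax = box
    return 1.5 * hypot(xmax - xmin, ymax - ymin)
```

For a node whose box is 3 by 4 units, the diagonal is 5 and the bound is 7.5. If that node held fewer than k objects, the box would first be expanded with the nearest outside objects and the bound recomputed on the larger box.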
All other subroutines remain unchanged from the Dynamic RkNN Auxiliary Scope computation
algorithm. The evaluation of queries on the client side also is not affected. The modifications applied
in this section address the final performance concerns of RkNN auxiliary scope computation. By
constructing an ignored object set O, we are able to avoid costly processing of irrelevant nodes and in
doing so dramatically reduce the computational overhead of performing auxiliary scope computation.
Algorithm find_optimal_as(I, q, k)
Input.  A preconstructed R-tree index I of all data points in the
        dataset, the query point q, and the RkNN parameter k
Output. Result set R and complementary set C for auxiliary scope
Begin
1.  Initialize set R ← {}, C ← {}, H ← {}, O ← {}
    [...] ← {}, TentC ← {}, TentD ← {}
    [...] (∞_1, ..., ∞_k)
6.  P.set_sort_method(mindist(P.object, q))
7.  P.enqueue((I.root, 0, initdist))
9.  [...] P.dequeue()
10. z [...] 1
11. [...]
12. [...]
13. [...]
14. (N, O, P) ← initialize_stats(q, k, N, H, O, P)
15. If outside_search(q, k, N, z, R, TentR, TentC, TentD)
16.   O ← O ∪ {N}
17.   Continue
18. [...]
19. [...]
20. (m, O, P) ← initialize_stats(q, k, m, H, O, P)
21. [...]
22. [...] z + 1
23. Else
24. [...]
25. H ← H ∪ {N}
26. TentD ← TentD ∪ {N}
27. z ← z + 1
28. C ← refine_as_comp_set(q, k, z [...] 0, R ∪ TentR, C)
29. TentC ← refine_as_comp_set(q, k, z [...] 0, R ∪ TentR, TentC)
34. Return (R, C);
End.
Figure 6.21. Algorithm find_optimal_as
Subroutine initialize_stats(q, k, m, H, O, P)
Input.  [...] identified data points, set O of previously skipped data
        nodes, an uninitialized object m, and priority queue P
Output. Initialized object m as well as updated P and O
Begin
    // D contains (dist, point) tuples and is ordered by dist
    [...] {(∞_1, ∅), ..., (∞_k, ∅)}
    [...] 0, c ← m.min_capacity()
4.  [...]
5.    m.kcnt ← m.kcnt + 1
6.    If mindist(m, h) ≤ D[k].dist
7.      D.remove(k)
8.      D.append(mindist(m, h), h)
9.      D.sort()
10. If m is internal node
11.   Define bounding box B ← {m, D[1].point, ..., D[k − c].point}
12.   m.kdist[1..k] ← 3/2 B.diagonal()
13. Else
14.   m.kdist [...]
15. For each object o in O
16.   If mindist(m, o) ≤ min(dist(m, q), m.kdist[k])
17.     P.enqueue((o, o.kcnt, o.kdist))
18.     O ← O − {o}
19. Return (m, O, P)
End.
Figure 6.22. Subroutine initialize_stats (Revised)
Processing time and disk accesses are minimized at the server, and the introduction of a flexible auxiliary
scope framework improves the scalability of our model by avoiding redundant query computation by
the server. In our final algorithm variant, we provide support for bichromatic data and remove the final
restriction on our processing algorithm.
6.4.2 Bichromatic Optimal Auxiliary Scope Processing

Recall from Chapter 2 that bichromatic data partitions the dataset into two types. The first type,
henceforth referred to as type A, contains objects that are possible RkNN query results. That is, only
type A objects are returned by an RkNN query. The second type of object, referred to as type B, represents
possible neighbors to type A objects. That is, only type B objects influence the kNN circle of a data object
and only type A objects are considered as result objects. It follows that our algorithm should only track
kcnt and kdist values for objects of type A. Furthermore, only objects of type B will affect kcnt and kdist
values, and only those objects will be added to the visited object set H. It follows that set H will contain
objects of only type B and that sets TentD, TentR, TentC, R, and C will contain objects of only type A.
The ignored object set O may contain objects of either type depending upon the reason for exclusion.
Figure 6.23 provides the pseudocode for the Optimal RkNN Auxiliary Scope Construction algorithm
for bichromatic datasets. The primary changes are to the processing logic for each node N given in
lines 20-37. Unlike the case of monochromatic datasets, we now must consider two possible data types
for each node. If the node is of type A (lines 14-16 and lines 20-28), we initialize the node's statistics
as usual with a call to initialize_stats. If the node is an internal node, we initialize the statistics of each
child and enqueue it in the priority queue P. Otherwise, the node is a data object, and it must be added
to set TentD. Notice that the statistics of other data objects are not updated, as this object is of type A
and cannot affect other objects' statistics. In the event that an object is of type B (lines 29-37), we skip
all statistic computation, as it will not be a part of the final result set or complementary set. If the node
cannot be ignored and is an internal node, then we simply enqueue each child in the priority queue P.
Otherwise, if the node is an unignorable data object, we update the statistics of all other data objects
and add the current node to set H. As one additional minor change, notice that the algorithm begins
by enqueuing the root of the R-tree index for objects of type A as well as the root of the R-tree index
for objects of type B. Processing of P then proceeds as usual based on the mindist ordering of contained
elements.
In addition to changes to the main processing algorithm, both the initialize_stats subroutine and
the outside_search subroutine require changes to properly handle bichromatic datasets. The modified
initialize_stats algorithm is given in Figure 6.24 and involves only minor changes to the way in which
the ignored object set O is handled. As only objects of type B can influence the kcnt and kdist values of
other objects, we only consider type B objects in our scan of elements to add back to priority queue P. A
similarly minor change is required for the outside_search subroutine given in Figure 6.25. Here, type A
objects must pass the Case 1 (not possibly in result set) and Case 2 (not possibly in complementary set) tests in order
to be excluded from analysis. Objects of type B must pass the Case 3 (not able to affect other objects'
result set or complementary set membership) test in order to be excluded from processing. Thus, we
conclude that it is easier for bichromatic index nodes to be excluded from the search process since they
must only pass a subset of the tests required of monochromatic data for exclusion. The remainder of
the subroutines are unchanged from the Dynamic RkNN Auxiliary Scope Construction algorithm for
monochromatic data and can be used directly to support this new problem.
Finally, we consider an example of an RNN query Q issued on a bichromatic dataset S. We adapt
the original example given in Figure 6.5 to be over the bichromatic data shown in Figure 6.26. The
figure contains both red objects that represent type A (a, c, d, e, h) and blue objects that represent type
B (b, f, g). Notice that the indexing structure is homogeneous and does not mix objects of different
types. Figure 6.26(b) shows the query circles cir(o, dist(o, q)) for each data object o of type A. Notice that
objects a, c, and d are result objects since (1) they are of type A and (2) have no type B objects inside
of their query circles. Next, Figure 6.26(c) shows the NN circles of each data object of type A. Based
on result membership, we can conclude that the tentative auxiliary scope is illustrated by the entire
shaded region for containment scope and the dark shaded region for valid scope. Furthermore, object
e must be returned as a complementary object since its NN circle overlaps with the tentative auxiliary
scope region. As shown in Figure 6.26(d), objects b, f, and g are excluded from the result set and
Algorithm find_optimal_as(I, q, k)
Input.  A preconstructed R-tree index I of all data points in the dataset S = A ∪ B,
        the query point q, and the RkNN parameter k
Output. Result set R and complementary set C for auxiliary scope
Begin
1.  Initialize set R ← {}, C ← {}, H ← {}, O ← {}
    [...] ← {}, TentC ← {}, TentD ← {}
    [...] (∞_1, ..., ∞_k)
6.  P.set_sort_method(mindist(P.object, q))
7.  P.enqueue((I.A.root, 0, initdist))
8.  P.enqueue((I.B.root, 0, initdist))
10. [...] P.dequeue()
11. [...]
12. [...]
13. [...]
14. If N.is_type(A)
15.   z [...] 1
16.   (N, O, P) ← initialize_stats(q, k, N, H, O, P)
17. If outside_search(q, k, N, z, R, TentR, TentC, TentD)
18.   O ← O ∪ {N}
19.   Continue
20. If N.is_type(A)
21. [...]
22. [...]
23. (m, O, P) ← initialize_stats(q, k, m, H, O, P)
24. [...]
25. [...] z + 1
26. Else
27. TentD ← TentD ∪ {N}
28. z ← z + 1
29. Else
30. [...]
31. [...]
32. m.kdist ← initdist
33. m.kcnt ← 0
34. [...]
35. Else
36. [...]
37. H ← H ∪ {N}
38. C ← refine_as_comp_set(q, k, z [...] 0, R ∪ TentR, C)
39. TentC ← refine_as_comp_set(q, k, z [...] 0, R ∪ TentR, TentC)
44. Return (R, C);
End.
Figure 6.23. Algorithm find_optimal_as (Bichromatic)
Subroutine initialize_stats(q, k, m, H, O, P)
Input.  [...] identified data points, set O of previously skipped data
        nodes, an uninitialized object m, and priority queue P
Output. Initialized object m as well as updated P and O
Begin
    // D contains (dist, point) tuples and is ordered by dist
    [...] {(∞_1, ∅), ..., (∞_k, ∅)}
    [...] 0, c ← m.min_capacity()
4.  [...]
5.    m.kcnt ← m.kcnt + 1
6.    If mindist(m, h) ≤ D[k].dist
7.      D.remove(k)
8.      D.append(mindist(m, h), h)
9.      D.sort()
10. If m is internal node
11.   Define bounding box B ← {m, D[1].point, ..., D[k − c].point}
12.   m.kdist[1..k] ← 3/2 B.diagonal()
13. Else
14.   m.kdist [...]
15. For each object o in O such that o.is_type(B)
16.   If mindist(m, o) ≤ min(dist(m, q), m.kdist[k])
17.     P.enqueue((o, o.kcnt, o.kdist))
18.     O ← O − {o}
19. Return (m, O, P)
End.
Figure 6.24. Subroutine initialize_stats (Bichromatic)
complementary set because they are of type B. Furthermore, object h is excluded because its NN circle
does not overlap the shaded tentative scope region.
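The result-membership rule used in this example can be checked with a brute-force Python sketch. It is an illustration only: it scans every object instead of using the R-tree, the function name is hypothetical, ties are resolved with a strict comparison by assumption, and any coordinates supplied to it are the caller's own since Figure 6.26 gives no numeric positions.

```python
from math import dist


def bichromatic_rknn(q, type_a, type_b, k=1):
    """Return names of type A objects that have q among their k nearest type B neighbors.

    type_a and type_b map object names to (x, y) positions. A type A object is a
    result if fewer than k type B objects fall strictly inside its query circle
    cir(o, dist(o, q)).
    """
    results = []
    for name, pos in type_a.items():
        reach = dist(pos, q)
        closer = sum(1 for b in type_b.values() if dist(pos, b) < reach)
        if closer < k:
            results.append(name)
    return sorted(results)
```

Only type B positions are ever counted and only type A names are ever returned, which mirrors the division of roles between the two object types.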
Subroutine outside_search(q, k, N, z, R, TentR, TentC, TentD)
Input.  An RkNN query at point q with parameter k, sets
        TentR and R of result objects, set TentC of
        complementary objects, set TentD of unclassified
        objects, current index node N, and possible result
        object counter z
Output. Returns true if N can be ignored by search
Begin
    // Ensure object cannot be in result set
1.  If N.is_type(A) AND N.kcnt < k
2.    Return false;
    // Ensure object cannot be in complementary set
3.  If N.is_type(A) AND (∅, {N}) = refine_as_comp_set(q, k, z > 0, R ∪ TentR, ∅, {N})
4.    Return false;
    // Ensure object will not be helpful in possibly eliminating
    // other candidate complementary objects in future
5.  If N.is_type(B)
6.    For each object o in TentR ∪ TentC ∪ TentD
7.      If mindist(o, N) ≤ min(dist(o, q), o.kdist[k])
8.        Return false;
9.  Return true;
End.
Figure 6.25. Subroutine outside_search (Bichromatic)
Figure 6.26. Sample bichromatic query auxiliary scope computation: (a) Sample dataset, (b) Query circles, (c) NN circles, (d) Auxiliary scopes (CS, VS)
Chapter 7
Theoretical Analysis
7.1 Introduction
With all necessary algorithms and components of the containment scope framework introduced for
region, nearest neighbor, and reverse nearest neighbor spatial query types, we now turn our attention
to the development and careful analysis of an accurate spatial query cost model that can be used to
analyze the effectiveness of containment scope in reducing system resource consumption.
7.2 Relevant Performance Metrics
Recall that the primary reasons for developing containment scope are to improve system scalability,
to conserve system resources, and to increase client autonomy. These objectives are interrelated, and
improving one of them can often have a positive (or, in some cases, adverse) effect on another. To
precisely quantify our progress toward achieving these three goals, we consider the following five
performance metrics:
1. query submission rate. Query submission rate indicates the ratio of the number of queries submitted
by the client to the server to the total number of queries processed by the client. A low
query submission rate implies that the client is able to answer a substantial number of requests
locally without server intervention.
2. auxiliary scope size. Auxiliary scope area size examines the region wherein a client can reuse the
result set for some query Q to answer a new query Q'. It follows that this metric is closely related
to that of the query submission rate. In fact, the two have a negative correlation. A larger area
implies a higher probability that clients can use maintained results to answer their queries locally,
thus leading to a lower query submission rate.
3. bandwidth consumption. Bandwidth consumption measures the amount of data transmitted from
the server to the client. Since the query parameters transmitted over an uplink channel
are typically small and fixed in size, we focus on the bandwidth needed to transmit query results and
auxiliary scope data via a downlink channel. Compact auxiliary scope representations and low
client query submission rates directly lead to efficient bandwidth utilization and improve system
scalability. As additional bandwidth consumption implies additional client communication and
energy consumption, this metric is also linked to client autonomy.
4. I/O cost. I/O cost refers to the number of pages that the server has to read from the disk in
order to answer a query and to form the corresponding auxiliary scope. Typically, disk accesses
are assumed to be significantly more costly to perform than memory accesses and arithmetic
computation. Thus, the minimization of this parameter at the expense of a modest increase in
computational complexity is generally preferred.
5. execution time. Execution time refers to the period of time between when a query is received by
the server and when the result and the auxiliary scope are computed. We assume that the server has
no other requests and is able to begin processing the request immediately.
As server requests utilize limited bandwidth, are slow and energy-intensive operations for clients,
and impose significant overhead on central servers, our cost model focuses on query submission rate
and bandwidth consumption as primary indicators of system performance. Auxiliary scope size also
offers important information about the autonomy of the client. Server I/O cost is the next most important
metric because disk accesses are traditionally slower than in-memory computations. Execution time
represents the performance measure of lowest importance but still must be of reasonable length if the
containment scope algorithm is to be of practical use. It is worth noting that auxiliary scope algorithms
can afford to spend additional I/O time and CPU time to compute an auxiliary scope if this information
will later reduce the query workload on the server by an equivalent amount. In essence, the additional
cost of computing an auxiliary scope may be amortized over the entire client query set and actually
lead to a reduction in average query overhead for the server.
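The amortization argument can be made concrete with a small Python sketch. The cost model here is an assumption introduced for illustration (a fixed per-query server cost plus a fixed per-query scope overhead, in arbitrary units), and the function names are hypothetical. It is not the cost model developed later in this chapter.

```python
def query_submission_rate(submitted, total):
    """Fraction of client queries that had to be sent to the server."""
    if total <= 0:
        raise ValueError("total must be positive")
    return submitted / total


def average_server_cost(total, submitted, query_cost, scope_overhead):
    """Server cost averaged over all client queries, including those answered locally.

    Each submitted query costs query_cost plus scope_overhead at the server.
    Queries answered from a stored auxiliary scope cost the server nothing.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return submitted * (query_cost + scope_overhead) / total
```

With 100 client queries of which 40 reach the server, a query cost of 10 units and a scope overhead of 5 units, the average server cost is 6 units per client query. Without auxiliary scopes every query would be submitted and the average would be 10 units, so the overhead pays for itself in this hypothetical setting.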
In the rest of this chapter, we logically evaluate the eect that containment scope has on the five aforementioned performance metrics for both region (range, window) queries as well as nearest neighbor
(NN, kNN) queries. Reverse nearest neighbor (RNN, RkNN) queries are impractical to examine theoretically given the high degree of interdependence between dataset objects and query parameters. That is,
there is a substantial degree of complexity involved in the construction and evaluation of containment
scopes for RkNN queries. Therefore, RNN queries are excluded from our theoretical analysis. Careful
experimental analysis of RNN queries as well as other query types will be performed in Chapter 8. For
each type of traditional spatial query, we identify lower and upper bounds of algorithm performance
and also compute average cost where possible. To simplify our analytical model in certain situations,
we assume that the dataset S under consideration is uniformly distributed. Chapter 8 will evaluate
non-uniform synthetic datasets as well as publicly available real datasets from the U.S. Census Bureau.
Our analysis will consider how performance metrics vary over different data densities, query types,
and query parameters. We assume that an R*-tree index I has been constructed for the dataset S but
will not assume that the index conforms to any particular height or node capacity constraints.
We first establish some necessary definitions and terminology for the remainder of the chapter in
Section 7.3. We then examine performance characteristics of region queries and NN queries in Section
7.4 and Section 7.5, respectively. Finally, Section 7.6 offers some closing thoughts and considers the
applicability of our analysis to arbitrary datasets.
7.3 Cost Model Terminology
As previously stated, we assume that there exists a uniformly distributed dataset S indexed by an
R*-tree I with node capacity given by $N_c$ and node fanout given by $f$. Then we define $N = |S|$, the
cardinality of S, and $A = \mathrm{bbox}(S)$, the bounding box of all data points in S. We assume that the dataspace
area A of dataset S is the unit square for simplicity. That is, we assume that $A = 1$. It follows that the
density of data objects is uniformly given by $D = \frac{N}{A}$. The average distance between two objects in the
uniformly distributed dataset can be given by $\delta = \sqrt{\frac{A}{N}} = \sqrt{\frac{1}{N}}$. All datasets considered in this analysis
will be two-dimensional, but most of our observations will extend to arbitrary dimensionality. Assume
that each object's spatial coordinates require $b_c$ bytes of storage and that each object's data contents
(including spatial coordinates) require $b_r$ bytes of storage. It is reasonable to assume that $b_r \gg b_c$ for
most applications, so we do so in our analysis here as well.
We issue either a region query $Q_R$ or kNN query $Q_N$ at some location denoted by $q_Q$. The subscript
describing the query type may be omitted when clear from the context of the discussion. We generally
denote the query type and query parameter of a generic spatial query Q as $\mathrm{type}_Q$ and $\mathrm{param}_Q$, respectively.
We also use specific notation in place of $\mathrm{param}_Q$ when the type of a given spatial query is known. Here,
param = r for range queries, param = {h, l} for window queries, and param = k for kNN queries. The area
covered by spatial query Q is denoted as $A_Q$.
When computing our desired containment scope performance metrics, we let $Z_Q$ symbolize the
containment scope size of query Q and let $B_Q$ denote the total bandwidth required to transmit the
response for query Q to the client. $B_Q$ includes the downlink transmission of all result objects and
complementary objects. Finally, we let $DA_Q$ and $T_Q$ denote the number of disk accesses and total
execution time for a spatial query Q.
7.4 Region Query Cost Model
Recall from our previous discussion in Chapter 4 that the computation techniques for range and window
query containment scopes are related. It follows that there should be considerable similarity in any
developed cost model for the two query types.
Symbol              Definition
S                   Dataset
A                   Dataspace area (1)
N                   Cardinality of S
D                   Density of dataset
$\delta$            Average object-object distance
$b_c$               Size of an object's spatial coordinates
$b_r$               Size of an object's stored data
I                   R*-tree index of S
$N_c$               R*-tree node capacity
f                   R*-tree node fanout
$Q_R$               Region query
$Q_N$               kNN query
$q_Q$               Query location for Q
$\mathrm{type}_Q$   Type of query Q
$\mathrm{param}_Q$  Query parameter of Q
$r_Q$               Radius for range query Q
$l_Q$               Length for window query Q
$h_Q$               Height for window query Q
$k_Q$               k value for kNN query Q
$A_Q$               Area of query Q
$Z_Q$               Area of containment scope for Q
$B_Q$               Bandwidth required for response to Q
$DA_Q$              Disk accesses required to process Q
$T_Q$               Execution time required to process Q

Table 7.1. Cost model definitions

7.4.1 Query Submission Rate

To begin, we consider the effect that containment scope has on query submission rate. Equivalently,
our cost model analyzes the effect that the previously obtained result of some query Q has on the
evaluation of a future query $Q'$. Notice that for very large query loads, the query submission rate R
can be approximated by the probability that $Q'$ cannot be answered locally by the client without server
intervention. If we denote P(L) as the probability that $Q'$ can be answered locally using the containment
scope of Q, then $R \approx 1 - P(L)$. The basic elements that characterize query reuse for the containment scope
framework are (1) the types of queries Q and $Q'$, (2) the supplemental parameters that restrict the spatial
region (i.e., radius for range queries, extents for window queries) of queries Q and $Q'$, and (3) the relative
location of the query point $q_{Q'}$ in relation to the auxiliary scope of Q. A query can be answered
locally if and only if $Q'$ satisfies all conditions set by the containment scope on these three items.
Containment scope offers great flexibility in query parameterization, which should yield a high P(L)
and, by extension, a low R. We can express P(L) for containment scope as shown in Equation 7.1.
$$P(L)_{CS} = P(Q \sqsubseteq Q' \wedge q_{Q'} \in Z_Q) \tag{7.1}$$
First of all, we notice that any semantically contained query can be considered for local processing
by spatial query containment algorithms. This differs from other methods that require a precise match.
In contrast, bare query processing results in a submission rate of approximately 100% as the general
probability of answering a query $Q'$ locally given a previous query Q reduces to Equation 7.2. That is,
a query can only be answered locally if it is identical to the original query in every respect.
$$P(L)_{Bare} = P(\mathrm{type}_Q = \mathrm{type}_{Q'} \wedge \mathrm{param}_Q = \mathrm{param}_{Q'} \wedge q_Q = q_{Q'}) \tag{7.2}$$
Analyzing the above results further, we notice that the query mix and dataset distribution will have a
significant impact on the server query submission rate. As the containment scope of Q is spatially close
to the original query point $q_Q$, we conclude that the query submission rate will be positively correlated
with the distance moved by the client between query submissions. The sensitivity of containment scope
to movement is lower than that of other approaches, as it covers a comparatively large region of the dataset.
Containment scope is sensitive to changes in query supplemental parameters. However, containment
scope provides for result reuse when it is possible to answer new queries semantically contained by
the query for which the auxiliary scope was constructed. Finally, all existing auxiliary scope methods
require the specific type of query to remain unchanged. Removing this restriction for containment scope
will be the focus of future work. It is worth noting that our analysis assumes that only a single query's
containment scope is stored. However, it is possible to store multiple containment scopes in which case
only one such scope must match the above conditions. Thus, storing additional containment scopes
increases the probability that a client can answer a query locally and may reduce the query submission
rate.
Since the query workload very much depends on the environment in which the containment scope
framework is being used, we will assume for the remainder of this discussion that the query type and
supplemental parameters (i.e., every query parameter except its location q) remain unchanged. Then the
query submission rate is negatively correlated to the containment scope size, which we now consider.
7.4.2 Auxiliary Scope Size

Observe that the auxiliary scope size for basic query processing is zero, as the only way a result set can
be reused is if $q_{Q'} = q_Q$. It follows that the auxiliary scope size is $Z_{Q,Bare} = 0$. We compute the scope size for
containment scope based on the region in which at least one result object remains in the result set of a
new query $Q'$ that is semantically contained by the query Q for which the scope is computed. Then,
we know that the containment scope has an upper bound given by $Z_{Q,CS} \le 4A_Q$, which yields the
bounds $Z_{Q,CS} \subseteq \mathrm{cir}(q_Q, 2r_Q)$ and $Z_{Q,CS} \subseteq \mathrm{rect}(q_Q, 2h_Q, 2l_Q)$ for range and window queries, respectively. In
addition, it is possible to construct a containment scope of arbitrarily small size by placing data objects
around the entire boundary of a region query in a circular or rectangular pattern based on the specific
query type. As none of these objects are allowed to enter the result set for a semantically contained
query issued inside of the containment scope and as these objects will enter said result set if the query
point is perturbed slightly in any direction, the containment scope size can approach zero. Thus, we
have $0 \le Z_{Q,CS} \le 4A_Q$. The formula for computing the exact containment scope size is given by Equation
7.3, which yields Equation 7.4 and Equation 7.5 for range and window queries.
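$$Z_{Q,CS} = \bigcup_{o \in R_Q} Q.\mathrm{Minkowski}(o) \;\setminus \bigcup_{o' \in O \setminus R_Q} Q.\mathrm{Minkowski}(o') \tag{7.3}$$
$$Z_{Q,CS} = \bigcup_{o \in R_Q} \mathrm{cir}(o, r_Q) \;\setminus \bigcup_{o' \in O \setminus R_Q} \mathrm{cir}(o', r_Q) \tag{7.4}$$
$$Z_{Q,CS} = \bigcup_{o \in R_Q} \mathrm{rect}(o, h_Q, l_Q) \;\setminus \bigcup_{o' \in O \setminus R_Q} \mathrm{rect}(o', h_Q, l_Q) \tag{7.5}$$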
We know that an object is only included in a query Q if its Minkowski region includes qQ . Since we
require that at least one result object be included, we take the union of the Minkowski regions of all
result objects. Furthermore, no new objects can enter the result set, so we remove the area covered by
the Minkowski regions of all complementary objects. This gives a precise containment scope area size
for a region query Q. However, the resulting region is a complex (and potentially concave) polygon that
cannot easily be computed even for highly structured datasets. Therefore, we approximate this region
experimentally using a Monte Carlo technique. We sample a large number of points that are uniformly
distributed within the upper bound on the containment scope and use the percentage of those points
that fall within the containment scope region as an estimate for the region's size (normalized to the size
of the scope upper bound). Thus, the containment scope size (and, by extension, the query submission
rate) will be dependent upon the dataset distribution. However, we observe that containment scope has
a substantially larger area than either semantic scope, which only considers query attributes, or valid
scope, which only considers data distribution. Containment scope is the only approach that considers
both aspects when attempting to reuse query results.
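The sampling procedure described above can be sketched for a range query as follows. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the function names, the brute-force membership test, and the fixed random seed are all hypothetical choices. A sample point counts as inside the containment scope when it lies within distance r of at least one result object and within distance r of no non-result object, mirroring Equation 7.4.

```python
import math
import random


def in_range_scope(p, results, others, r):
    """True if a range query of radius r issued at p returns a non-empty
    subset of `results` (no object from `others` enters the result set)."""
    near_result = any(math.dist(p, o) <= r for o in results)
    near_other = any(math.dist(p, o) <= r for o in others)
    return near_result and not near_other


def estimate_range_scope_area(q, r, results, others, samples=50_000, seed=0):
    """Monte Carlo estimate of the containment scope size for a range query.

    Points are drawn uniformly from the upper bound cir(q, 2r). Returns the
    fraction of samples inside the scope and the corresponding area.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    hits = 0
    for _ in range(samples):
        # The square root keeps the samples uniform over the disc area.
        rho = 2 * r * math.sqrt(rng.random())
        theta = 2 * math.pi * rng.random()
        p = (q[0] + rho * math.cos(theta), q[1] + rho * math.sin(theta))
        if in_range_scope(p, results, others, r):
            hits += 1
    fraction = hits / samples
    return fraction, fraction * math.pi * (2 * r) ** 2
```

With a single result object located at the query point and no other objects nearby, the scope is the disc of radius r, so the estimated fraction should be close to one quarter of the upper bound.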
7.4.3 Bandwidth Consumption

With query submission rate and auxiliary scope size both considered, we now turn our attention to
measuring the performance metric of bandwidth consumption and how the introduction of containment
logic affects this important component. Bare query processing must transmit all result objects to the
client. Recall that each result object has size $b_r$. Then the transmission size of a query Q for bare query
processing is equal to $n_r b_r$, where $n_r$ denotes the number of result objects for query Q. We can use the
uniformity of our dataset to approximate the number of result objects as $n_r = A_Q D = \frac{A_Q N}{A}$. It follows
that the bandwidth utilization $B_Q$ for bare query processing is given by $B_{Q,Bare} = n_r b_r = \frac{A_Q N}{A} b_r$ for region
queries. This transmission cost is also incurred by every other auxiliary scope method (including
containment scope), as the client always must receive the complete answer for its request. We now
consider the amount of extra bandwidth required to transmit containment scope information. Recall
that each complementary object has a size given by $b_c$. It has previously been shown by Lemma 1
and by Lemma 2 that all complementary objects for a range query must lie within the area given by
$\mathrm{cir}(q_Q, 3r_Q)$ and that all complementary objects for a window query must lie within the area given by
$\mathrm{rect}(q_Q, 3h_Q, 3l_Q)$. Furthermore, the query region can be excluded from these areas because all of its
objects are already being returned in the result set. This gives us the regions $\mathrm{cir}(q_Q, 3r_Q) \setminus \mathrm{cir}(q_Q, r_Q)$
and $\mathrm{rect}(q_Q, 3h_Q, 3l_Q) \setminus \mathrm{rect}(q_Q, h_Q, l_Q)$. The enclosing areas are larger than the query space by a factor of nine,
so the remaining regions are larger than the query space by a factor of eight.
We then give upper bounds of $n_c \le (9D)A_Q - (D)A_Q = 8\frac{N A_Q}{A}$ for both query types. In the event that no
objects lie in this region, there is no additional cost to form the containment scope because the required
complementary set is empty. Thus, we have that $B_{Q,CS} \ge n_r b_r$.
We can represent the total bandwidth consumption of a server-processed query Q as the sum
of the cost to transmit result set information and the cost to transmit complementary set information.
Mathematically, we express this equation as $B_{Q,CS} = n_r b_r + n_c b_c$. If we assume that the dataset distribution
is uniform, then we can bound this expression as shown in Equation 7.6.
$$b_r \frac{A_Q N}{A} \le B_{Q,CS} \le b_r \frac{A_Q N}{A} + 8 b_c \frac{N A_Q}{A} \tag{7.6}$$
At first glance, it appears that BQ is larger for containment scope than for bare query processing. However, there are two important factors that cause total bandwidth consumption to favor the
containment scope method. First, we only require the location of complementary objects and not all
supplementary data stored in our dataset. This allows us to assume that br >> bc . In addition, the
complementary set data required by containment scope provide for a decrease in the query submission
rate. Note that any locally answered query has an incurred bandwidth cost of BQ = 0. It follows that it
is favorable to incur the extra minimal cost to send complementary set information if doing so reduces
future transmissions of result object data from the server to the client by an equal or greater amount.
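To make the amortization argument concrete, the following minimal sketch evaluates the bounds of Equation 7.6 and the fraction of locally answered queries at which containment scope breaks even with bare processing in the worst case. The function names and all sample values are hypothetical and chosen only for illustration. The break-even rule assumes that bare processing answers no query locally and that every server-processed containment scope query pays the upper bound.

```python
def region_bandwidth_bounds(area_q, n, b_r, b_c, area_dataspace=1.0):
    """Lower and upper bounds on B_{Q,CS} for a uniform dataset (Equation 7.6)."""
    n_r = area_q * n / area_dataspace          # expected number of result objects
    n_c_max = 8 * n * area_q / area_dataspace  # upper bound on complementary objects
    lower = n_r * b_r                          # also the bare cost B_{Q,Bare}
    upper = n_r * b_r + n_c_max * b_c
    return lower, upper


def break_even_local_fraction(bare_cost, scope_cost):
    """Smallest P(L) for which (1 - P(L)) * scope_cost <= bare_cost."""
    return max(0.0, 1.0 - bare_cost / scope_cost)


# Hypothetical workload: 100,000 objects, query covering 0.1% of the dataspace,
# 1024-byte records, and 16-byte coordinates.
lower, upper = region_bandwidth_bounds(0.001, 100_000, 1024, 16)
p_min = break_even_local_fraction(lower, upper)
```

Under these assumed values, the worst-case overhead of the complementary set is recovered once roughly one query in nine is answered locally.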
7.4.4 I/O Cost

For the next part of our cost model construction, we consider the management of I/O cost for server
systems. First, we consider the approximate number of disk accesses required to answer range and
window queries without any auxiliary scope. To simplify our calculations, we assume that the issued
window query is square, but our analysis can easily be generalized to handle arbitrarily sized rectangles.
According to Theodoridis [24], the number of disk accesses DA required to answer a two-dimensional
square window query Q (with lQ = hQ ) on a uniformly distributed dataset indexed by an R-tree I with
fanout f is given by Equation 7.7.
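$$DA_{Q,Bare} = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( s_l^2 + 2 s_l h_Q + h_Q^2 \right) \right] = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( \frac{D_l}{N_l} + 2 h_Q \sqrt{\frac{D_l}{N_l}} + h_Q^2 \right) \right] \tag{7.7}$$
Here, $s_l$ is the expected side length of each node's MBR at level l and is equivalent to $s_l = \sqrt{D_l / N_l}$. $D_l$
is defined by the recurrence $D_l = \left(1 + \frac{\sqrt{D_{l-1}} - 1}{\sqrt{f}}\right)^2$ with $D_1 = \left(1 - \frac{1}{\sqrt{f}}\right)^2$. Furthermore, $N_l$ is also a recurrence
defined by $N_l = \frac{N_{l-1}}{f}$ with $N_1 = \frac{N}{f}$. Finally, $\lceil \log_f (N/f) \rceil$ determines the height of an R-tree index. This
provides a direct approximation for bare window query processing.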
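The level-by-level sum of Equation 7.7 can be evaluated numerically with a short sketch such as the one below. It is an illustration under assumptions rather than a definitive implementation: the function names are hypothetical, the per-level side length is taken as the square root of $D_l / N_l$, and the density recurrence is used in its squared form for point data, which is our reading of the model in [24]. Because $s_l^2 + 2 s_l h_Q + h_Q^2 = (s_l + h_Q)^2$, each level contributes $N_l (s_l + h_Q)^2$.

```python
import math


def level_parameters(n, fanout):
    """Return [(N_l, D_l)] for levels l = 1 .. ceil(log_f(N / f)).

    Assumes point data (density 0 below the leaf level) and the squared
    form of the density recurrence; both are assumptions of this sketch.
    """
    levels = max(1, math.ceil(math.log(n / fanout, fanout)))
    params = []
    n_l, d_prev = float(n), 0.0
    for _ in range(levels):
        n_l = n_l / fanout
        d_l = (1 + (math.sqrt(d_prev) - 1) / math.sqrt(fanout)) ** 2
        params.append((n_l, d_l))
        d_prev = d_l
    return params


def expected_node_accesses(n, fanout, extent):
    """Sum of N_l * (s_l + extent)^2 over all levels, with s_l = sqrt(D_l / N_l).

    `extent` is the side length of the square search window on the unit square.
    """
    total = 0.0
    for n_l, d_l in level_parameters(n, fanout):
        s_l = math.sqrt(d_l / n_l)
        total += n_l * (s_l + extent) ** 2
    return total
```

Calling `expected_node_accesses(n, fanout, h)` gives the bare estimate for a square window of side h, and passing `3 * h` gives the estimate for the enlarged window of side $3h_Q$ used to bound the containment scope search.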
In the case of containment scope, recall that our algorithm is designed in such a way as to require
only a single pass of the index. Furthermore, we know from our auxiliary scope size analysis that this
pass cannot extend beyond a window centered at qQ with extents given by 3hQ for the square case of
hQ = lQ . Thus, the disk access upper bound is given by Equation 7.8.
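$$DA_{Q,CS} = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( s_l^2 + 6 s_l h_Q + 9 h_Q^2 \right) \right] = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( \frac{D_l}{N_l} + 6 h_Q \sqrt{\frac{D_l}{N_l}} + 9 h_Q^2 \right) \right] \tag{7.8}$$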
We can obtain a similar conservative upper bound for range queries. Recall that the access probability
for a given node is computed as the overlap area between its minimum bounding rectangle (MBR) and
the query search area QA . As shown in Figure 7.1(a), a range query search area cir(qQ , 3rQ ) overlaps the
MBR of a node N1 but not N2 . Thus, N1 must be explored. In the presented R-tree cost model, the MBRs
of nodes are expected to be squares for a uniform data set. Since our search space is circular in the case
of range queries, we can determine the node access probability by expanding the MBR areas based on
the dimensions of the search area. Figure 7.1(b) shows the expanded MBRs of N1 and N2 .
[Figure 7.1. Search area cir(q, 3r) and MBRs: (a) cir(q, 3r), N1 and N2; (b) expanded MBRs of N1 and N2]
Using this logic, the access probability of a node is estimated based on its expanded area with respect
to cir(q, 3r), and so $DA_{Q,CS}$ can be computed using Equation (7.9).
$$DA_{Q,CS} = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( s_l^2 + 12 r_Q s_l + 9 r_Q^2 \right) \right] = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( \frac{D_l}{N_l} + 12 r_Q \sqrt{\frac{D_l}{N_l}} + 9 r_Q^2 \right) \right] \tag{7.9}$$
Correspondingly, the number of node accesses for bare query processing is given by
$DA_{Q,Bare} = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( \frac{D_l}{N_l} + 4 r_Q \sqrt{\frac{D_l}{N_l}} + r_Q^2 \right) \right]$.
It follows that the number of additional node accesses incurred by computing
containment scope is $\sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( 8 r_Q \sqrt{\frac{D_l}{N_l}} + 8 r_Q^2 \right) \right]$. Notice that as $r_Q, h_Q \in (0, 1)$ is expected to be small,
this processing overhead is in fact not very significant when compared with the cost of bare query
processing. The same observation holds for the window query upper bound obtained previously.
A tight lower bound for the number of disk accesses required for a region query with containment
scope is given by the number of accesses needed to obtain all result objects because (1) only the
Minkowski region around each result object needs to be checked for complementary objects and (2) it is
possible to make the portion of the complementary object search space that does not overlap the query
search space arbitrarily small by having a single result object o such that o = qQ .
7.4.5 Execution Time

Finally, we consider the fifth performance metric of execution time within the context of region query
processing. Although not as critical as the parameters discussed so far, execution time has an important
effect on achieving our goals of system scalability, resource conservation, and client autonomy. We must
ensure that the additional computational load placed on the server is sufficiently offset by savings in
reduced query submissions, bandwidth consumption, and disk accesses. Notice that processing for both
the bare query approach and for the containment scope approach occurs using a priority queue ordered
by mindist. Containment scope reduces computational overhead in comparison to other auxiliary scope
techniques by performing only a single pass through the dataset. This directly results in a reduction of
repetitive accesses and leads to fewer disk requests and computational activities. Result set membership
can be determined in constant time, while complementary set membership computational time is linear
in the size of the result set, as we must compare each potential complementary object to every result set
object in search of a match.
We can optionally choose to reduce bandwidth consumption and disk accesses further by performing
additional pruning of the complementary set. In particular, we calculate if existing complementary
objects shield other objects and, if so, do not process the shielded objects. The additional computation
is quadratic in the number of identified complementary objects and linear with respect to the number
of result objects. If we let $m_p$ denote the number of possible complementary objects in the search
space (previously found to be $m_p = (9D)A_Q - (D)A_Q = 8\frac{N A_Q}{A}$), $m_f$ denote the number of verified
complementary objects, and n denote the number of result objects of a query Q, then we have a bound
of algorithm complexity given by $O(n m_p)$ without additional reduction techniques and $O(n m_p + n m_f^2)$
with reduction techniques. Recall that auxiliary scope techniques reduce the query submission rate and
consequently may yield overall gains in execution time since fewer queries will need to be processed.
In other words, execution time will improve if the reduction in query submissions outweighs the added
O(nmp ) computational complexity.
Finally, we can offer a rough approximation of the execution time of an algorithm based on the
number of disk accesses required during the life of the process. The logic behind this analysis is that
disk access times are the slowest among most computational devices and offer a good estimate of the
total time required for an I/O intensive process to complete its execution. In comparing theoretical
results to experimental results, we use a theoretical upper bound of 10 milliseconds per disk access
based on empirical studies performed by Tao [25].
7.5 NN Query Cost Model
With an analysis of region query containment scope overhead now complete, we turn our attention to
the problem of computing containment scopes for nearest neighbor queries. There is some similarity
between region query analysis and nearest neighbor query analysis, and we point out these common
themes where appropriate.
7.5.1 Query Submission Rate

To begin, we consider how query submission rate is affected in the case of NN queries. Recall from our
discussion of region queries that the query submission rate is dependent on both the query workload
and dataset distribution. Consider a new query $Q'$ and a previously issued query Q for which an
auxiliary scope is available. In the case of bare query processing, we require that $q_Q = q_{Q'}$ and $k_Q = k_{Q'}$
for a query to be answered locally. Containment scope substantially broadens the circumstances under
which a previous query result is applicable by requiring only that $Q \sqsubseteq Q'$ and $q_{Q'} \in Z_Q$. Recall
that $Q \sqsubseteq Q' \Leftrightarrow k_{Q'} \le k_Q$ when Q is a kNN query. Aside from the change in semantic containment logic
and auxiliary scope formation, the logic behind query submission rate remains unchanged from the
case of region query processing.
7.5.2 Auxiliary Scope Size

We now analyze the formation of a containment scope from the standpoint of the expected containment
scope area size. The containment scope for a NN query Q is equal to the size of the Voronoi cell of its
result object o from the dataset. If the dataset is uniformly distributed, we can approximate the size
of an individual Voronoi cell as the ratio of the domain of the dataset (A = 1) to the cardinality of the
dataset as expressed in Equation 7.10.
$$Z_{Q,CS} = \frac{A}{N} = \frac{1}{N} \tag{7.10}$$
In the case of a kNN query, the containment scope is bounded by the union of k different Voronoi
cells. Each of these cells is computed by excluding $k - 1$ of the result objects from the dataset and then
forming the Voronoi cell for the remaining result object with respect to all non-result objects in the
dataset. This yields an upper bound on the size of the containment scope that is given by Equation 7.11.
$$Z_{Q,CS} \le k \frac{A}{N - k + 1} = \frac{k}{N - k + 1} \tag{7.11}$$
7.5.3 Bandwidth Consumption

Next, we consider the bandwidth cost of implementing the containment scope framework for the kNN
query type. Recall from our region query cost model discussion that the transmission size of a response
to some query Q when using bare query processing is equal to nr br , where nr denotes the number of
result objects for query Q. In the case of a kNN query, nr = k. We now need to identify the number of
complementary objects, denoted as nc , that need to be transmitted with the result objects back to the
client. These objects are those that form the boundaries of the k Voronoi cells that are associated with
the objects in the result set of Q. It follows that the complementary objects are precisely the non-result
objects that are closest to these points. If the dataset is uniform, then we can approximate the distance
between each data object as in Equation 7.12.
$$\delta = \sqrt{\frac{A}{N}} = \sqrt{\frac{1}{N}} \tag{7.12}$$
Here, we have assumed that the dataset is square for simplicity. (If not, we can replace Equation 7.12
with a slightly more complex but equivalent form.) Considering the kNN circle for our query point $q_Q$
that is given by $\mathrm{cir}(q_Q, \mathrm{dist}(q_Q, o_k))$ allows us to theorize that all complementary objects will be identified
within the search space given by $A_S = \mathrm{cir}(q_Q, \delta + \mathrm{dist}(q_Q, o_k)) \setminus \mathrm{cir}(q_Q, \mathrm{dist}(q_Q, o_k))$, which yields the upper
bound of $n_c \le D A_S = \frac{N A_S}{A} = N A_S$. A lower bound on the number of complementary objects is given
by $n_c \ge 3$. We use this bound since there must exist at least three data objects in order to complete all
sides of any Voronoi cell, but it is possible that all k Voronoi cells could share the same complementary
objects. The only exception to this bound occurs in cases where the dataset boundary contributes one
or more sides of a result object's Voronoi cell. However, we ignore such boundary cases in our analysis.
As previously described in this section, we can represent the total bandwidth consumption of a
server-processed query Q as $B_{Q,CS} = n_r b_r + n_c b_c$. This yields the expression given in Equation 7.13.
$$k_Q b_r + 3 b_c \le B_{Q,CS} \le k_Q b_r + \frac{N A_S}{A} b_c = k_Q b_r + N A_S b_c \tag{7.13}$$
Also, for every kNN query Q, BQ,CS is at least as large as BQ,Bare . However, recall that br >> bc and that
containment scope reduces the server query submission rate. This fact coupled with the observation
that BQ,CS = BQ,Bare = 0 for any query answered locally by the client allows for the possibility of an
overall reduction in average bandwidth consumption per client query. Chapter 8 will explore this
possibility for a wide variety of query workloads.
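As a worked illustration of Equation 7.13, the sketch below computes the lower and upper bandwidth bounds for a kNN query on a uniform dataset. The function name and the sample values are hypothetical, and the search space $A_S$ is taken as the area of the ring between the kNN circle and the same circle enlarged by the average object-object distance of Equation 7.12.

```python
import math


def knn_bandwidth_bounds(k, n, dist_k, b_r, b_c, area_dataspace=1.0):
    """Lower and upper bounds on B_{Q,CS} for a kNN query (Equation 7.13).

    dist_k is the distance from the query point to its k-th nearest neighbor.
    """
    delta = math.sqrt(area_dataspace / n)  # average object-object distance
    # Area of the ring between radius dist_k and radius dist_k + delta.
    ring_area = math.pi * ((dist_k + delta) ** 2 - dist_k ** 2)
    n_c_max = n * ring_area / area_dataspace  # upper bound on complementary objects
    lower = k * b_r + 3 * b_c                 # at least three complementary objects
    upper = k * b_r + n_c_max * b_c
    return lower, upper


# Hypothetical workload: k = 10 on 100,000 objects, k-th neighbor at distance
# 0.006, 1024-byte records, and 16-byte coordinates.
lower, upper = knn_bandwidth_bounds(10, 100_000, 0.006, 1024, 16)
```

Since the ring always contains at least an expected $\pi$ objects under the uniform assumption, the upper bound never falls below the lower bound in this sketch.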
7.5.4 I/O Cost

For the fourth performance metric, we estimate the total processing cost of evaluating Q and of
computing the corresponding containment scope in terms of the total number of required index node
accesses. The search space that must be explored in order to ensure that all k result objects from a
query Q (issued at location $q_Q$) are covered can be represented by a circle $\mathrm{cir}(q_Q, d_k)$. Here, $d_k \approx \sqrt{\frac{k}{\pi N}}$
represents the expected distance between the query point $q_Q$ and the kth nearest object in S to $q_Q$.
Because complementary objects are those dataset objects that are immediate neighbors to the query's
result objects, we can use the expected distance between objects ($\delta = \sqrt{1/N}$) to approximate the expected
search area for both result objects and complementary objects as $\mathrm{cir}(q_Q, d_k + \delta)$, which is equivalent to
$\mathrm{cir}\left(q_Q, \sqrt{\frac{k_Q}{\pi N}} + \sqrt{\frac{1}{N}}\right) = \mathrm{cir}\left(q_Q, \frac{\sqrt{k_Q} + \sqrt{\pi}}{\sqrt{\pi N}}\right)$.
Similar to the case of region queries, we make use of a result from Theodoridis [24] that allows us
to accurately predict the number of disk accesses DA required to answer a two-dimensional window
query Q on a uniformly distributed dataset indexed by an R-tree I with fanout f. According to the
model, we can obtain a bound on the disk access count DA that conforms to Equation (7.14).
$$DA_{Q,CS} = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( \frac{D_l}{N_l} + 4 \sqrt{\frac{D_l}{N_l}} (d_k + \delta) + (d_k + \delta)^2 \right) \right] = \sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( \frac{D_l}{N_l} + 4 \sqrt{\frac{D_l}{N_l}} \cdot \frac{\sqrt{k_Q} + \sqrt{\pi}}{\sqrt{\pi N}} + \frac{k_Q + 2\sqrt{\pi k_Q} + \pi}{\pi N} \right) \right] \tag{7.14}$$
Notice that the node access count for evaluating Q alone is only
$\sum_{l=1}^{\lceil \log_f (N/f) \rceil} \left[ N_l \left( \frac{D_l}{N_l} + 4 \sqrt{\frac{D_l}{N_l}} \, d_k + d_k^2 \right) \right]$. We can
use this result as a lower bound on the number of disk accesses that will be required for containment
scope processing (or any other auxiliary scope processing). The computation of the query result set
constitutes the majority of the cost incurred by the containment scope framework in processing a query
request on the server, so we can conclude that the extra overhead required for constructing a query's
complementary set is relatively small.
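A short numerical sketch can show how the enlarged search circle compares with the plain kNN circle. It is hedged on one assumption that is ours: the expected distance to the kth nearest neighbor on a uniform unit-square dataset is estimated as the radius of the circle expected to contain k objects, that is, the square root of $k / (\pi N)$. The function names are hypothetical.

```python
import math


def knn_search_radii(k, n, area=1.0):
    """Return (d_k, d_k + delta) for a uniform dataset of n objects.

    d_k is the radius of the circle expected to hold k objects (an assumed
    estimate), and delta is the average object-object distance.
    """
    d_k = math.sqrt(k * area / (math.pi * n))
    delta = math.sqrt(area / n)
    return d_k, d_k + delta


def search_area_ratio(k, n=100_000):
    """Ratio of the enlarged search area to the plain kNN circle area."""
    d_k, enlarged = knn_search_radii(k, n)
    return (enlarged / d_k) ** 2
```

Under this estimate the ratio simplifies to $(1 + \sqrt{\pi / k})^2$, which does not depend on N and shrinks toward one as k grows, so the relative cost of the complementary search is smaller for larger values of k.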
7.5.5 Execution Time

For the last of the five performance metrics, we consider the effect that containment scope has on server
execution time during the processing of kNN queries. Note that execution time is the least important
of the five measured items. We generally would expect the primary limitations of a containment scope
processing system to be bandwidth consumption and disk accesses, while execution time would be a
secondary concern. However, it still plays a significant role in ensuring the viability of our approach.
Notice that, as in the case of region queries, processing for both the bare query approach and for the
containment scope approach occurs using a priority queue ordered by mindist. Containment scope
reduces computational overhead in comparison to other auxiliary scope techniques by performing only
a single pass through the dataset. This directly results in a reduction of repetitive data processing and
leads to fewer disk requests and computational activities in general. Tentative result set membership
can be determined in constant time, while complementary set membership computational complexity
is linear with respect to the size of the result set ($k_Q$) for each candidate object. Symbolically, we express the algorithm
complexity by $O(k m_p)$, where $m_p$ denotes the number of possible complementary objects in the search
space. Based on our previous area estimates, we can approximate $m_p$ by the upper bound $m_p = \frac{N A_S}{A}$,
where $A_S$ is given by $A_S = \mathrm{cir}(q_Q, \delta + \mathrm{dist}(q_Q, o_k)) \setminus \mathrm{cir}(q_Q, \mathrm{dist}(q_Q, o_k))$. Recall from previous discussion that
auxiliary scope techniques reduce the query submission rate and consequently may yield overall gains
in execution time since fewer queries will need to be processed. Chapter 8 examines the effectiveness
of query submission reduction on total server execution time in depth.
7.6 Extension to Non-Uniform Datasets
Before concluding our discussion on the theoretical effectiveness of the containment scope framework,
we consider the applicability of such analysis to arbitrary datasets commonly encountered during
real world spatial query processing. In particular, we consider the effectiveness of containment scope
on datasets that are not necessarily uniform. Notice that the logic used in the construction of lower
bounds and upper bounds in this chapter relies on the uniformity of data solely to simplify requisite
computations. The uniformity assumption also enables us to obtain closed form expressions in cases
where such expressions would otherwise require actual raw data from the specific object set. However,
there are several trends that continue to hold regardless of the dataset distribution:
• The query submission rate is dependent on the query load, client movement pattern, and data
  object distribution. Containment scope offers greater query reuse (and consequently a lower query
  submission rate) because it allows all semantically contained queries to be answered locally by a
  client so long as the query point remains within the containment scope area.
• The auxiliary scope area size is likely to be larger for containment scope than for existing semantic
  scope and valid scope methods discussed in Chapter 2. This large area size reinforces the ability
  of containment scope to eliminate redundant queries by allowing future semantically contained
  spatial queries to be answered over a wider domain. Two important features that promote maximal
  containment scope areas are (1) the precise generation of scope boundaries based on localized data
  object orientation and (2) the simple requirement that the result set of a semantically contained
  query remain merely a subset of the result set of the query for which the containment scope was
  constructed.
• Bandwidth consumption is dominated by the number of result objects that must be transmitted
  from the server to the client. The fact that containment scope only requires the spatial coordinates
  of any complementary object (and not that object's actual data) minimizes the storage footprint for
  each complementary object. Since $b_r \gg b_c$, we predict that the ability of an auxiliary scope
  technique to eliminate redundant queries will have a much greater impact on bandwidth consumption
  than the ability to marginally reduce the size of the complementary set.
• Both I/O cost and execution time vary based on the specific spatial query type as well as the
  area covered by the query. The containment scope processing framework employs geometric
  bounds to limit the search space for any given query. In addition, we eliminate redundant
  computational processing and I/O disk accesses by integrating query evaluation and containment
  scope construction into a single pass algorithm that only examines each index node a maximum
  of one time.
Analyzing a wide range of different dataset types will be an important component of the experimental
results presented in Chapter 8. By conducting extensive testing, we will also be able to analyze the
correctness and accuracy of the theoretical bounds given in this chapter.
Chapter 8
Experimental Analysis
8.1 Introduction
In this chapter, we evaluate the effectiveness of the proposed spatial query containment model as
a means of minimizing resource consumption in mobile environments through an extensive set of
experiments and demonstrate its exceptional performance by comparing spatial query containment
with other existing approaches discussed in Chapter 2. We also consider how our experimental results
conform to the theoretical cost model defined in Chapter 7 where applicable.
The primary focus of our experiments is on measuring (1) the overhead incurred by the LBS server in
computing a query containment scope and (2) the overall improvement in system performance attained
through a reduction in redundant query evaluations. Both of these aspects are considered in the analysis
that follows.
8.2 Domain of Interest
In our evaluation of traditional spatial queries, we include bare query processing, semantic scope
(for window query [13, 14, 15] and for kNN query [18]), valid scope (TP-query approach) [12], valid
scope (geometric approach) [17], as well as our proposed spatial query containment. These different
methods are labeled as Bare, Semantic, Valid Scope (TPQ), Valid Scope (Geo), and Containment Scope,
respectively. In the case of RkNN spatial queries, we note that there do not exist any auxiliary
scope techniques with which to compare our method. Therefore, we conduct tests on bare query
processing, valid scope (dynamic method), and containment scope (optimal method). As before, we label
our approaches using the scheme Bare, Valid Scope (RkNN), and Containment Scope. We implemented
all the approaches in GNU C++. Notice that with the sole exception of Bare, each approach defines a spatial
scope that allows the client to use locally maintained data to check if spatial queries can be answered
with previous query results. Collectively, we denote all those spatial scopes as auxiliary scope in the
following.
Bare serves as a baseline approach where clients submit all queries to the server for processing. It
does not reuse previous spatial query results and does not incur any extra processing overhead for
determining auxiliary scopes. Semantic checks whether the new query region is completely covered by
the previous query using semantic data. For kNN queries, Semantic collects m = 2k nearest objects (i.e., k
result objects plus the coordinates of the k nearest non-result objects). Both Valid Scope (TPQ) and Valid
Scope (Geo) form the valid scope for each spatial query result, but they adopt different computation
techniques. Notice that Valid Scope (TPQ) does not support range queries, while Valid Scope (Geo)
supports all discussed query types. In the case of RkNN queries, Valid Scope (RkNN) adopts the
computation strategy of the Dynamic RkNN Auxiliary Scope algorithm proposed in Section 6.3. Finally,
for Containment Scope, we consider kNN query evaluation as well as two variants for region (range,
window) query evaluation and RkNN query evaluation. For the first variant, we compute containment
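scope using no additional filtering logic. This technique is denoted as Containment Scope (RD), the
variant listed in Table 8.1 as Containment Scope w/o Reduction. The second evaluation method, denoted
simply as Containment Scope, includes reduction logic for range, window, and RkNN queries that removes false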
candidate complementary objects during containment scope computation. We expect this approach to
decrease bandwidth consumption and I/O cost at the expense of additional computational complexity.
Both Containment Scope and Containment Scope (RD) provide identical containment scopes.
To measure overall system performance and incurred overhead, we consider the five performance
metrics from our Containment Scope model discussed in Section 7. These are (1) query submission
rate, (2) auxiliary scope area size, (3) bandwidth consumption, (4) I/O cost, and (5) execution time. Query
submission rate refers to the ratio of server processed queries to total requested queries. The auxiliary
scope area size examines the coverage offered by various auxiliary scope methods and should be
strongly correlated to the query submission rate. This is particularly true if the client issues similar
(i.e., semantically contained) queries that are spatially close to one another. A large area implies a high
probability that clients can use maintained results to answer queries locally and is indicative of a low
query submission rate. The next three metrics consider the overhead incurred by the server and query
processing framework in computing and communicating results to the client. Bandwidth consumption
measures the amount of data (in kilobytes) transmitted over the wireless channel from the server to the
client. Meanwhile, I/O cost refers to the number of pages that the server has to read from disk in order
to answer a query and to form the corresponding auxiliary region if needed. Finally, execution time (in
milliseconds) refers to the duration from the time that a single query is received by the server to the
time by which all the result data and auxiliary scope information has been computed.
8.3 Experiment Setup
Our evaluations use synthetic and real object sets. Synthetic object sets are used to test the sensitivity
of our approaches to various dataset cardinalities and distributions, while realistic datasets are used to
examine the practicality of our approaches in real environments. Synthetic datasets are produced with
object locations following either a Uniform distribution or a Gaussian distribution with a mean of 500
and a standard deviation of 100. The real dataset, obtained from the United States Census
Bureau TIGER/Line collection [26], includes the locations of 11,000 shopping malls across the country.
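As a rough illustration of how the two synthetic object sets could be produced, the following Python sketch draws object locations over a 1,000 by 1,000 service area. The function names, the fixed seed, and the choice to redraw Gaussian samples that fall outside the service area are assumptions made for this sketch; the text above specifies only the distributions, the mean, and the standard deviation.

```python
import random

def make_uniform(n, extent=1000.0, seed=0):
    """Draw n object locations uniformly over [0, extent] x [0, extent]."""
    rng = random.Random(seed)
    return [(rng.uniform(0, extent), rng.uniform(0, extent)) for _ in range(n)]

def make_gaussian(n, mean=500.0, std=100.0, extent=1000.0, seed=0):
    """Draw n object locations with Gaussian coordinates (mean 500, std 100).

    Assumption: samples outside the service area are redrawn. With these
    parameters that is rare, since the border lies five standard deviations
    from the mean.
    """
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x, y = rng.gauss(mean, std), rng.gauss(mean, std)
        if 0 <= x <= extent and 0 <= y <= extent:
            points.append((x, y))
    return points

uni = make_uniform(10_000)
gau = make_gaussian(10_000)
```

Fixing the seed keeps repeated runs comparable, which is convenient when several auxiliary scope methods must be evaluated on the same object set.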
Parameter       Value
Approaches:     Bare Query Processing (Bare),
                Semantic Region (Semantic),
                TPQ Valid Scope (Valid Scope (TPQ)),
                Geometric Valid Scope (Valid Scope (Geo)),
                RkNN Dynamic Valid Scope (Valid Scope (RkNN)),
                Containment Scope (Containment Scope),
                Containment Scope w/o Reduction (Containment Scope (RD))
Service area:   [1000, 1000]
Object sets:    Uniform (1k, 10k, 100k),
                Gaussian (10k),
                Real
Query Types:    Range (radius r = 5, 10, 15, 20)
                Window (square l = 5, 10, 15, 20)
                kNN (k = 1, 4, 16, 64)
                RkNN (k = 1, 4, 16, 64)
Client:         Max distance moved per step (1, 5, 10)
Server Cache:   5% of index size
Table 8.1. Experiment parameters
All of the object sets are normalized to a two-dimensional service area of 1,000 × 1,000 units. Further,
we fix the size of result object content and complementary object content (i.e., spatial coordinates) to 256
bytes and 16 bytes, respectively. These values model a real world environment in which supplemental
information contained about an object at some location is expected to be substantially larger than the
actual representation of the object's location in the dataset. Because of space constraints, we mainly
present results from the synthetic datasets with 10,000 objects; however, the results for other object
cardinalities are similar. Finally, we denote the uniform dataset as Uni, the Gaussian dataset as Gau,
and the real dataset as Real.
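To make the effect of the two fixed object sizes concrete, the short Python sketch below computes the size of a single query response and the share of it attributable to complementary objects. The function names and the example counts (10 result objects, 4 complementary objects) are hypothetical and chosen only for illustration; they are not measurements from our experiments.

```python
RESULT_BYTES = 256       # size of one result object's content
COMPLEMENT_BYTES = 16    # size of one complementary object's coordinates

def response_bytes(n_results, n_complementary):
    """Total payload of one response: result objects plus complementary coordinates."""
    return n_results * RESULT_BYTES + n_complementary * COMPLEMENT_BYTES

def overhead_ratio(n_results, n_complementary):
    """Complementary-object bytes relative to result-object bytes."""
    return (n_complementary * COMPLEMENT_BYTES) / (n_results * RESULT_BYTES)

# Hypothetical response: 10 result objects and 4 complementary objects.
total = response_bytes(10, 4)    # 2560 + 64 = 2624 bytes
ratio = overhead_ratio(10, 4)    # 64 / 2560 = 0.025
```

Because each complementary object costs one sixteenth of a result object, even a complementary set of comparable cardinality adds only a small fraction to the response.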
After selecting an object distribution, we generate client positions at which queries are issued and
processed. Three types of queries, namely, range queries, window queries, and kNN queries are
evaluated. The radii of range queries r vary between 5, 10, 15, and 20 dataspace units. We consider
the search area of window queries to be square with a half-length extent l that varies between the four
values of 5, 10, 15, and 20 dataspace units. The k of kNN queries is set to 1, 4, 16 or 64 objects. We ran our
experiments on Solaris Blade1000 Workstations equipped with 1GB RAM and running the SunOS 5.10
operating system. Furthermore, all the experimental datasets are indexed by an R-tree [4] with a disk
page size of 4KB. In addition, a cache with its size equal to 5% of the R-tree index size managed by a least
recently used (LRU) replacement policy is used to alleviate some server I/O cost for query processing and
auxiliary scope computation. This cache is particularly useful to Valid Scope (TPQ) but does not affect
spatial query containment processing as a consequence of the one-pass integrated containment scope
construction algorithm developed by this work. While we have conducted extensive experiments on a
wide variety of datasets, the results shown below form a representative sample given space constraints.
Abnormalities identified in excluded results are noted when appropriate. Finally, we summarize all
evaluation parameters for convenient reference in Table 8.1. Unless otherwise specified, values in bold
are used as the defaults for our experiments.
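The server-side page cache described above can be sketched as follows. This is a minimal Python illustration of a least recently used replacement policy sized at 5% of the index; the class name, the `load` callback, and the `disk_reads` counter are assumptions introduced for the sketch and do not reflect our C++ implementation.

```python
from collections import OrderedDict

def cache_capacity(index_pages, percent=5):
    """Number of cached pages for a cache sized at `percent` of the index (at least 1)."""
    return max(1, index_pages * percent // 100)

class LRUPageCache:
    """Page cache with least recently used replacement; counts simulated disk reads."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.pages = OrderedDict()   # page_id -> page data, oldest first
        self.disk_reads = 0

    def read(self, page_id, load):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)   # hit: mark as most recently used
            return self.pages[page_id]
        self.disk_reads += 1                  # miss: fetch the page from disk
        data = load(page_id)
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)    # evict the least recently used page
        return data

cache = LRUPageCache(cache_capacity(index_pages=200))
for page in (1, 2, 1, 3, 2):
    cache.read(page, load=lambda pid: f"page-{pid}")
```

A method that revisits the same index nodes many times benefits from such a cache, whereas a single-pass traversal that touches each node at most once sees no repeated reads to absorb.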
In what follows, we examine the overall impact that different auxiliary scope approaches have on
system performance, system scalability, and client autonomy via three different sets of experiments. The
first experimental group considers the overhead that auxiliary scope imposes on server query evaluation
and on communication with clients. Here, we focus on the cost of computing query results and auxiliary
scope information in isolation. That is, we do not allow clients to reuse query results and expect the
individual cost of each query evaluation with auxiliary scope methods to be higher than with simple
bare query processing. In practice, we would expect auxiliary scope methods to outperform bare query
processing since the cost of computing the auxiliary scope would be offset by the savings from being able
to answer future redundant queries locally. We compute the average auxiliary scope size to measure the
relative success of each auxiliary scope method at reducing redundancy. The second set of experiments
simulates issuing a set of spatially related queries from a single mobile client whose position changes
over time. The goal of this experiment is to empirically gauge the effectiveness of auxiliary scope
methods at reducing the server query submission rate. Additionally, we measure the average I/O
cost, bandwidth consumption, and execution time for executing a single client query for each deployed
auxiliary scope framework. The third and final set of experiments considers the effect that dataset object
cardinality has on the effectiveness of the different auxiliary scope processing techniques. In general,
we observed that spatial query containment can effectively enable clients to answer queries locally and
consequently can reduce overall system processing costs. In almost all cases, spatial query containment
incurs much lower average query processing overhead than other approaches.
Throughout the course of our experiments, we refer to clients in the context of location-based
services. That is, we assume that a mobile client issues a series of spatial queries to a centralized
server. In reality, the client may simply be a separate logical process on the same machine as the
server component. Alternatively, the client may be stored on a stationary workstation and may issue
spatial queries based on a user's area of interest. Regardless, we operate under the assumption that
there is some degree of spatial locality in issued client queries and model such locality by having the
client move to its new query location prior to issuing its next request. The application of obtained
auxiliary scope test results to other contexts can be done by simply altering the terminology used in our
experimental analysis.
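The client model described above can be sketched in a few lines of Python. The sketch moves a client by a bounded random step before each request, answers the request locally when the new position lies inside the stored auxiliary scope, and otherwise counts a server submission. All names are hypothetical, and the disk-shaped scope is only a stand-in used to exercise the loop; it is not the containment scope constructed by our algorithms.

```python
import math
import random

def simulate(steps, max_step, make_scope, seed=0, extent=1000.0):
    """Return the query submission rate (server-processed / total requested)."""
    rng = random.Random(seed)
    x = y = extent / 2
    scope = None          # membership predicate returned with the last server response
    submitted = 0
    for _ in range(steps):
        # Move to the new query location before issuing the next request.
        angle = rng.uniform(0, 2 * math.pi)
        dist = rng.uniform(0, max_step)
        x = min(max(x + dist * math.cos(angle), 0.0), extent)
        y = min(max(y + dist * math.sin(angle), 0.0), extent)
        if scope is None or not scope(x, y):
            submitted += 1               # must ask the server
            scope = make_scope(x, y)     # server returns a fresh auxiliary scope
    return submitted / steps

def bare_scope(x, y):
    """Bare processing: no auxiliary scope, so nothing is ever answered locally."""
    return lambda px, py: False

def disk_scope(radius):
    """Stand-in scope: a disk around the position where the query was evaluated."""
    def make(x, y):
        return lambda px, py: math.hypot(px - x, py - y) <= radius
    return make

bare_rate = simulate(200, max_step=5, make_scope=bare_scope)
disk_rate = simulate(200, max_step=5, make_scope=disk_scope(20))
```

Under this model the Bare configuration always has a submission rate of 1, and any method that returns a non-empty scope can only lower it.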
8.4 Exp. I. Impact of Auxiliary Scope Formation
In the first set of experiments, we measure the processing overhead incurred for spatial queries evaluated
at the server. For each configuration and for each implementation, we issue 100 spatial queries to the
server for the chosen query processing and auxiliary scope computation method. First, we consider
uniform datasets and compare obtained results to the theoretical bounds put forth in Section 7. We follow
this discussion by examining the effectiveness of spatial query containment and existing approaches
on both non-uniform synthetic datasets as well as real datasets. Auxiliary scope size, bandwidth
consumption, I/O cost, and execution time are all measured and analyzed.
8.4.1 Uniform Dataset Performance Analysis
Figure 8.1. Server overhead for computing range query auxiliary scope on uniform dataset. Panels: (a) Scope area size; (b) Bandwidth consumption; (c) I/O cost; (d) Server execution time.
Our discussion considers region, kNN, and RkNN query types. While each query type is
analyzed separately, we make a point to illustrate common themes among all containment scope
processing methods where appropriate. Note that RkNN results are preliminary and that the code has
not yet been optimized for performance. As such, we do not display execution time and page access
data for this approach. Finally, note that client simulation results for the RkNN query containment
scope computation are also not available. However, initial testing suggests that effectiveness and overall
performance are on par with containment scope algorithms for other spatial query types.
8.4.1.1 Region Query
Before performing a detailed discussion of processing overhead in terms of execution time, I/O cost
and bandwidth consumption, we present the calculated scope area sizes for each of the evaluated
approaches. As mentioned in Section 7, it is difficult to compute or to predict the exact spatial query
containment scope size to machine precision. Therefore, we adopt the Monte Carlo method as a means
to estimate the overall size of the containment scope. We do this by observing that each spatial query
type has an easily computed upper bound on its size. For region queries, this bound is a search space
Figure 8.2. Server overhead for computing window query auxiliary scope on uniform dataset. Panels: (a) Scope area size; (c) I/O cost.
nine times as large as that of the original query. Once we obtain an easily computable bound on
the auxiliary scope area, we randomly select a point inside the feasible scope region (with uniform
probability) and check that point for membership in the scope. After repeating this process a large
number of times (10,000 times in the case of the results presented in this section), we can approximate
the size of the auxiliary scope as the fraction of sampled points that fall inside of the scope multiplied
by the total area of the scope bound. It follows that we can represent the area size as $Z_{CS} \approx pB$,
where p is the fraction of sampled points inside the containment scope and B is the area size of the
bounding box of the feasible scope area.
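The estimation procedure just described can be sketched in Python as follows. The function name, the `inside` membership predicate, and the example region (a disk of radius 10 inside a 30 by 30 bounding box) are assumptions made for illustration; in our setting, `inside` would be the scope membership test for the query at hand.

```python
import math
import random

def estimate_area(inside, bbox, samples=10_000, seed=0):
    """Monte Carlo area estimate: (fraction of sampled points inside) * (bounding box area)."""
    xmin, ymin, xmax, ymax = bbox
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if inside(x, y):
            hits += 1
    p = hits / samples                      # fraction of points inside the scope
    B = (xmax - xmin) * (ymax - ymin)       # area of the bounding box
    return p * B

# Illustrative region: a disk of radius 10 centered at the origin.
disk = lambda x, y: math.hypot(x, y) <= 10.0
approx = estimate_area(disk, bbox=(-15.0, -15.0, 15.0, 15.0))
exact = math.pi * 10.0 ** 2
```

The accuracy of the estimate depends on the number of samples and on how tightly the bounding box encloses the region, which is why an easily computed but reasonably tight bound is helpful.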
Logically, a large scope area should increase the likelihood that a future spatial query will be issued
within the scope and, by extension, the probability that a spatial query can be answered locally by the
client. It is then reasonable to expect that we can use the scope sizes to predict the future relative performance
of different approaches. The effectiveness of computed auxiliary scopes will be studied in greater
detail in the second set of experiments. Figures 8.1(a) and 8.2(a) show the scope area sizes for range
and window queries, respectively. In the case of the range query type, Containment Scope covers
a considerably larger area than Valid Scope (Geo), while Bare and Semantic result in no scope coverage
and consequently have an area size of zero. A similar observation can be made for window query as
shown in Figure 8.2(a). Notice that the scope sizes for Valid Scope (TPQ) and Valid Scope (Geo) are
identical since both methods compute the same auxiliary scope using two separate techniques. Similarly,
Figure 8.3. Server overhead for computing kNN query auxiliary scope on uniform dataset. Panels: (a) Scope area size; (c) I/O cost.
both Containment Scope and Containment Scope (RD) compute the same scope size, as reduction logic
only removes complementary objects that do not affect the formation of the final containment scope.
Notice that there remains a considerable gap that favors Containment Scope over all other approaches
for both region query types over all query sizes considered by our experimental analysis.
Continuing our auxiliary scope analysis, we observe that the scope area sizes of Semantic and
Containment Scope remain constant or increase as the search area (determined by the radius or side
length) increases. However, the scope area sizes of Valid Scope (TPQ) and Valid Scope (Geo) are
negatively correlated with increases in the search area. This is because valid scope is formed by
intersecting the Minkowski regions of result objects. When the number of result objects increases, the
intersection area among the included Minkowski regions tends to decrease to a very small amount. On
the other hand, containment scope is formed based on the union of Minkowski regions of result objects,
and it is this property that guarantees that such a region will certainly be at least as large as the valid
scope computed for the same spatial query. Uniform datasets inhibit substantial growth from the union
operation of multiple Minkowski regions, as the Minkowski regions of other nearby complementary
objects must be removed from the scope area. However, Containment Scope clearly offers favorable
scalability for query search size even in the restrictive situation of uniform datasets.
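The intersection-versus-union distinction can be illustrated with a simplified Python sketch for range queries. It assumes that the Minkowski region of a point object under a range query of radius r is the disk of radius r centered on that object, that boundaries are inclusive, and that the regions of complementary objects are subtracted in both cases. The function names and coordinates are hypothetical, and the sketch omits the details of our scope construction algorithms.

```python
import math

def in_disk(p, center, r):
    """True if point p lies within distance r of center (inclusive boundary)."""
    return math.dist(p, center) <= r

def in_valid_scope(p, results, complementary, r):
    """Intersection of the result objects' regions, minus the complementary regions."""
    return (all(in_disk(p, o, r) for o in results)
            and not any(in_disk(p, o, r) for o in complementary))

def in_containment_scope(p, results, complementary, r):
    """Union of the result objects' regions, minus the complementary regions."""
    return (any(in_disk(p, o, r) for o in results)
            and not any(in_disk(p, o, r) for o in complementary))

results = [(0.0, 0.0), (8.0, 0.0)]     # objects returned by the original query
complementary = [(12.0, 0.0)]          # nearby non-result object
r = 5.0

both = (4.0, 0.0)        # within r of every result object
union_only = (-3.0, 0.0) # within r of one result object only
blocked = (9.0, 0.0)     # within r of the complementary object
```

Here `both` lies in the two scopes, `union_only` lies only in the union-based scope, and `blocked` lies in neither. Whenever the result set is non-empty, any point that passes the intersection test also passes the union test, which is the reason the union-based scope is never smaller.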
Finally, we compare the experimental results for Containment Scope with the predicted results
Figure 8.4. Server overhead for computing RkNN query auxiliary scope on uniform dataset. Panels: (a) Scope area size.
from our theoretical cost model discussed in Section 7. The plotted line on each graph represents the
theoretical upper bound computed based on the specific properties of the dataset used in this test.
Notice that the collected region query data results do conform to the predicted bounds. Recall that
the upper bound for containment scope is given by $Z_{CS} \le 4\,Q.Area$. This bound increases as the query
size grows, yet the actual containment scope does not increase by a proportional amount. We note
that this phenomenon occurs because the provided upper bound does not account for complementary
objects, which will restrict the containment scope size from reaching the maximal value. Our observed
results on auxiliary scope area size allow us to hypothesize that Containment Scope (and Containment
Scope (RD)) will have the lowest query submission rates for all region query types when studied in
Section 8.5.1.
Next, we measure the bandwidth consumption for all the approaches. The results for range and
window queries are shown in Figures 8.1(b) and 8.2(b), respectively. As the additional bandwidth
consumed is mainly attributed to the relatively small size (16 bytes) of the spatial coordinates that pertain
to required complementary objects, the difference among the various approaches is notably minimal.
We consider the Bare method as our baseline. The additional bandwidth required for complementary
objects by Containment Scope is on average less than 3% of space requirements for the result objects
(256 bytes each) communicated for each single query response. This extra bandwidth consumption is
worthwhile because of the benefit of redundant query avoidance provided by the Containment Scope
solution. The large auxiliary scope size of this method and minimal overhead in complementary object
transmission suggest that the technique will have an overall positive effect toward reducing bandwidth
consumption and improving system scalability. Notice that the amount of data transmitted grows
rapidly as the query search space expands; on uniform data, the number of result objects is proportional
to the query area. This is to be expected of all methods, since additional
(large) result objects need to be transmitted to the client as part of the server's response. Furthermore,
we notice that the theoretical upper and lower bounds for the Containment Scope solution correspond
well with our collected experimental data. All of our results fall within the range formed by these two
bounds, which are shown in Figures 8.1(b) and 8.2(b) as smoothed lines.
Continuing our analysis further, we turn our attention to the I/O cost for all of the auxiliary scope
computation methods. The relevant data from our experiment for uniform datasets is shown in Figures
8.1(c) and 8.2(c) for range and window queries, respectively. For the supported window query type,
Valid Scope (TPQ) incurs the highest I/O cost due to a large number of TP queries. Notice that a modest
LRU cache is used to alleviate some page accesses. Eliminating this cache substantially increases the
number of disk accesses needed by this method. In comparison, Valid Scope (Geo) is more efficient
due to its ability to perform a single scan of the index to determine both result set and complementary
set membership. Containment Scope offers this same advantage over Valid Scope (TPQ) but still
accesses more pages than others when querying over our sample uniform dataset. In reality, this is a
reasonable outcome since Containment Scope needs more complementary objects in order to formulate
a containment scope that is substantially larger in size than all other auxiliary scopes. In addition,
Containment Scope (RD) incurs a few more page accesses because of its inclusion of false-positive
complementary object nodes during the R-tree index traversal. I/O cost increases modestly as the search
space of the query and, by extension, the search space for complementary objects are widened. With
respect to the theoretical cost model discussed in Section 7, we observe that our experimental results
sometimes incurred 10-15% additional disk accesses beyond the upper bound depicted in the figures.
This same observation was made by Theodoridis in [27] and is attributed to as-yet unresolved
inefficiencies in the R-tree structure. Thus, we conclude that our I/O bounds correspond well with
actual experimental results on uniform data.
Finally, we measure the execution time for all of the approaches. The results are shown in Figures 8.1(d) and 8.2(d) for range and window queries, respectively. Bare and Semantic region query
processing do not need to compute any additional auxiliary scope information. It naturally follows
that the execution times for these algorithms are the shortest. At the other end of the spectrum, Valid
Scope (TPQ), when used in tandem with supported window queries, incurs the longest execution time
because of exhaustive invocations of TP queries that are used by the algorithm to refine the auxiliary
scope. In contrast to the extremes exhibited by the previously discussed approaches, Valid
Scope (Geo) and Containment Scope provide moderate execution times among all evaluated schemes.
Further, we can also see that Containment Scope (RD) can consistently improve the execution time of
region queries if a modest increase in disk I/O and bandwidth consumption is allowed. As discussed
before, we expect that the decrease in query submission rate from deploying the containment scope
framework will offset any increase in processing time required to form a response to an individual
server query. Finally, we take note of the theoretical bound computed from our cost model. Recall that
this bound is based on the number of disk accesses performed by the program, as this is expected to be
the main source of any delay in the program. Our experimental results remain significantly below our
stated upper bound and indicate that the algorithm is behaving as expected. It is also worth mentioning
that the added computational complexity from the reduction logic is quite visible when comparing
Containment Scope and Containment Scope (RD) experimental results. Using the same notation from
Section 7, we let $m_p$ denote the number of possible complementary objects in the search space, $m_f$ denote
the number of verified complementary objects, and n denote the number of result objects of a query Q.
Then the algorithm complexity is given by $O(n m_p)$ without additional reduction techniques and is given
by $O(n m_p + n m_f^2)$ with reduction techniques. The additional term $n m_f^2$ is visible in the algorithm
performance.
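To give a sense of the relative weight of the two terms, consider a purely hypothetical query with n = 10 result objects, $m_p$ = 40 possible complementary objects, and $m_f$ = 8 verified complementary objects. These values are chosen only for illustration and are not taken from our experiments.

```latex
\[
n m_p = 10 \cdot 40 = 400, \qquad
n m_f^2 = 10 \cdot 8^2 = 640, \qquad
n m_p + n m_f^2 = 1040 .
\]
```

In this example the reduction term exceeds the base term even though $m_f$ is much smaller than $m_p$, because it grows quadratically in the number of verified complementary objects.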
8.4.1.2 kNN Query
We now turn our attention to analyzing the effectiveness of various auxiliary scope techniques at
efficiently processing kNN queries issued from clients. We focus on those characteristics that distinguish
kNN queries from the region query types that were discussed previously. First, we present the calculated
scope area sizes for each of the evaluated approaches. Recall that the containment scope of a kNN query
depends on the future value of k. Therefore, we assume that future queries have a supplemental
parameter of k = 1 in this case. Our system once again adopts the Monte Carlo method of area
estimation and bounds the possible kNN query search space in each dimension by the complementary
objects that have the largest and smallest spatial coordinates for that dimension. After determining
auxiliary scope membership at 10,000 uniformly distributed points inside of this feasible scope region,
we can approximate the size of the auxiliary scope using $Z_{CS} \approx pB$, where p is the fraction of
sampled points inside the auxiliary scope and B is the area size of the bounding box of the feasible scope
area. Figure 8.3(a) shows the scope area sizes for kNN queries. Notice that Containment Scope once
again outperforms all competing techniques by computing an area that is at least as large as any other
approach studied in our experiment.
In addition, we observe that the scope area sizes of Semantic and Containment Scope remain
constant or increase as the search area (determined by k) increases. However, the scope area sizes
of Valid Scope (TPQ) and Valid Scope (Geo) are negatively correlated with increases in the search
area. This is because valid scope is formed by intersecting the Voronoi cells of result objects, while
containment scope is formed based on the union of Voronoi cells of result objects. When the number
of result objects increases, the intersection area among the included Voronoi cells decreases, and the
union area potentially increases. This observation allows us to conclude that a containment scope for a
particular query will be at least as large as the valid scope computed for the same query. In the case of a
kNN query, the Semantic approach is based on a safe distance that is overly conservative. Containment
Scope derives precise containment scopes, and provides larger scope areas than the Semantic approach
in general. As the value of k increases, Containment Scope offers greater benefit in scope size over
alternative approaches.
As in the case of region queries, we compare the experimental results for Containment Scope with
the predicted results from our theoretical cost model and conclude that the collected region query
and kNN query data results do conform to the predicted bounds. Moreover, the theoretical bound
for kNN queries is quite close to the observed value. Because our approximation of Voronoi cell size
does not account for partial cells at the boundaries of the dataset, we anticipate that the upper bound
may be slightly lower than our observed results. This does in fact occur for the cases of k = 1, 4 but
is well within the margin of error for our estimation. In the cases of k = 16, 64, we observe that the
upper bound is in fact larger than the experimental values. We explain this difference in behavior
by noting that kNN query containment scope estimation involves counting overlapping portions of k
different Voronoi cells multiple times. As the value of k grows, this overlap becomes more significant
and begins to overshadow the effect of partial Voronoi cells on the dataspace boundary. Our observed
results on auxiliary scope area size allow us to hypothesize that Containment Scope (and Containment
Scope (RD)) will have the lowest query submission rates for all spatial query types when studied in
Section 8.5.1.
Next, we turn our attention yet again to the issue of bandwidth consumption. The results for kNN
queries are shown in Figure 8.3(b). As before, the additional bandwidth required for complementary
objects by Containment Scope is on average less than 3% of space requirements for the result objects
communicated for each single query response. This extra bandwidth consumption is worthwhile
because of the benefit of redundant query avoidance provided by the Containment Scope solution.
Furthermore, we notice that the theoretical upper and lower bounds for the Containment Scope solution
correspond well with our collected experimental data. All of our results fall within the range formed
by these two bounds.
Continuing our analysis of kNN queries further, we turn our attention to the I/O cost for all studied
methods. The relevant data from our experiment for uniform datasets is shown in Figure 8.3(c). Valid
Scope (TPQ) incurs the highest I/O cost yet again due to a large number of TP queries. In comparison,
Valid Scope (Geo) is more efficient due to its ability to perform a single scan of the index to determine
both result set and complementary set membership. Containment Scope offers this same advantage
over Valid Scope (TPQ) and accesses the same number of pages in the computation method. Our
experimental results sometimes incurred 10-15% additional disk accesses beyond the theoretical upper
bound, but we once again attribute such behavior to unresolved inefficiencies in the R-tree structure.
Finally, we measure the execution time for all the approaches and display the results in Figure
8.3(d). Bare kNN does not need to compute any additional auxiliary scope information, while kNN
query processing for Semantic is also computationally straightforward. It follows that the execution
times for these algorithms are the shortest. At the other end of the spectrum, Valid Scope (TPQ) incurs
the longest execution time because of exhaustive invocations of TP NN queries that are used by the
algorithm to refine the auxiliary scope. In contrast to the extremes exhibited by the previously discussed
approaches, Valid Scope (Geo) and Containment Scope provide moderate execution times
among all evaluated schemes.
8.4.1.3 RkNN Query
Finally, we consider the performance of the server in computing containment scope information whenever RkNN queries are issued. As previously mentioned, we only analyze auxiliary scope area size and
bandwidth consumption because the server algorithm has not yet been tuned for optimal performance.
Figure 8.4(a) shows the average auxiliary scope area size for RkNN queries with k = 1, 4, 16. Note
that bare query processing has an area of zero since the only query that can be answered locally is a
semantically contained query that is issued in precisely the same query location. Any perturbation
from this initial point will require the query to be submitted to the server for evaluation. In contrast,
containment scope processing yields a very large area for small values of k. Unlike kNN queries, we
observe that the scope area decreases as the value of k increases. This is because we do not allow new
result objects to enter the RkNN result set, and increasing the value of k makes it relatively easy for such
additions to occur. That is, more candidates are potentially eligible for result set membership.
We consider the total bandwidth required for RkNN query evaluation in Figure 8.4(b). Notice that the
difference in bandwidth consumption between bare query processing and containment scope processing
is relatively minor. Once again, this is a direct result of the disparity between result set object size and
complementary set object size. The relative difference between the two studied methods increases as
the value of k increases. This observation implies that the complementary set cardinality increases at
a faster rate than the result set cardinality when varying the parameter k. Overall, we conclude that
Containment Scope is a viable option. It is likely that the moderate increase in bandwidth will be
more than offset by the decrease in the query submission rate that is implied by a substantial auxiliary
scope area. It also appears that the effectiveness of Containment Scope (and likely any auxiliary scope
approach) decreases as the value of k increases.
8.4.2 Non-Uniform Dataset Performance Analysis
Figure 8.5. Server overhead for computing range query auxiliary scope on non-uniform dataset. Panels: (a) Scope area size; (c) I/O cost.
Having carefully compared how different auxiliary scope frameworks compute individual
server query results on uniform datasets, we now turn our attention to non-uniform datasets. This
includes both Gaussian synthetic datasets as well as our real data based on the TIGER dataset [26].
Once again, we consider region (range and window) queries, kNN queries, and RkNN queries.
Figure 8.6. Server overhead for computing window query auxiliary scope on non-uniform dataset. Panels: (a) Scope area size; (c) I/O cost.
8.4.2.1 Region Query
The results of our experiments on non-uniform data distributions are summarized in Figure 8.5 and
Figure 8.6 for range and window queries, respectively. Here, we briefly summarize notable differences
between the uniform dataset results and the non-uniform dataset results.
In the case of auxiliary scope area, we note two important observations in the results reported in
Figures 8.5(a) and 8.6(a). First, the irregularity in data object spacing leads to complementary objects
holding comparatively less influence in the formation of auxiliary scope boundaries for non-uniform
datasets than for uniform datasets. Second, this shift in focus serves to increase the size of
containment scope dramatically more than other auxiliary scope types. Recall that this gap is a direct
result of the unioning of Minkowski regions in containment scope formation instead of the intersection
of Minkowski regions in valid scope formation.
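The gap between union-based and intersection-based scopes can be illustrated with a small Monte Carlo sketch for a range query of radius r. The definitions used here are a simplified reading of the text and should be treated as assumptions: a sample location lies in the valid scope when every result object is still within r and no complementary object is, and it lies in the containment scope when at least one result object is within r and no complementary object is. The function name `scope_areas` and the sample data are hypothetical and do not come from the thesis code base.

```python
import math
import random


def scope_areas(objects, q, r, samples=20000, seed=0):
    """Estimate (valid_area, containment_area) as fractions of the unit square.

    Simplified assumptions (not the thesis algorithm):
      valid scope       = inside the Minkowski disk of EVERY result object
      containment scope = inside the Minkowski disk of AT LEAST ONE result object
    Both exclude any location within r of a complementary (non-result) object.
    """
    rng = random.Random(seed)
    result = [p for p in objects if math.dist(p, q) <= r]
    complement = [p for p in objects if math.dist(p, q) > r]
    valid = contain = 0
    for _ in range(samples):
        s = (rng.random(), rng.random())
        # A complementary object entering the query region invalidates both scopes.
        if any(math.dist(p, s) <= r for p in complement):
            continue
        inside = [math.dist(p, s) <= r for p in result]
        if inside and all(inside):
            valid += 1      # intersection of Minkowski regions
        if any(inside):
            contain += 1    # union of Minkowski regions
    return valid / samples, contain / samples


# Two result objects near the query point and one distant complementary object.
objects = [(0.45, 0.5), (0.55, 0.5), (0.9, 0.9)]
valid_area, containment_area = scope_areas(objects, q=(0.5, 0.5), r=0.1)
```

Under these assumptions the union-based area can never be smaller than the intersection-based area, which matches the qualitative ordering reported above.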
Bandwidth consumption for non-uniform dataset experiments is quite similar to that of uniform
dataset experiments. The results are summarized in Figures 8.5(b) and 8.6(b). The additional bandwidth consumed by various auxiliary scope frameworks remains relatively small because of the large
disparity between the significant size of result objects and the minimal size of complementary objects.
Once again, we expect that the additional consumed bandwidth will be offset by the reduction in the
total number of queries submitted by clients to the server.
Figure 8.7. Server overhead for computing kNN query auxiliary scope on non-uniform dataset. Panels: (a) Scope area size; (c) I/O cost.
As a third performance metric, we consider the I/O cost of different auxiliary scope computation
methods over non-uniform datasets. Experimental results are shown in Figures 8.5(c) and 8.6(c). Aside
from the fact that the rate of increase in I/O cost with increases in query area is sometimes greater for
non-uniform datasets, the relative ordering of different auxiliary scope computation algorithms remains
the same. Valid Scope (TPQ) incurs the highest I/O cost due to a large number of TP queries, while
Valid Scope (Geo) and Semantic are the most efficient of all auxiliary scope methods (excluding Bare).
Containment Scope requires a minimal additional number of I/O accesses over Valid Scope (Geo) and
Semantic. However, the substantially large containment scope area size should mitigate the number of
additional I/O access requests by facilitating the elimination of redundant client queries.
Finally, we measure the execution time for all approaches over non-uniform datasets and illustrate
the results in Figures 8.5(d) and 8.6(d). Bare and Semantic spatial query processing require no additional
auxiliary scope information and induce relatively short execution times. Valid Scope (TPQ) incurred
the longest execution time for uniform dataset processing and also incurs the longest execution time
for non-uniform datasets. Generally speaking, processing the less predictable structure of non-uniform
data requires additional processing time and resources. This is likely a result of additional union and
intersection logic in the case of region queries. The added complexity leads to a more crystallized
difference between Containment Scope and Containment Scope (RD) experimental results when using
non-uniform data than when using uniform data. However, the relative performance of different
auxiliary scope computation techniques remains unchanged.
Figure 8.8. Server overhead for computing RkNN query auxiliary scope on non-uniform dataset. Panels: (a) Scope area size.

8.4.2.2 kNN Query
The results of our experiments on non-uniform data distributions are summarized in Figure 8.7 for the
kNN query type. Once again, we briefly summarize notable differences between the uniform dataset
results and the non-uniform dataset results. Most effects from analyzing different dataset distributions
hold true regardless of whether region queries or kNN queries are employed.
The auxiliary scope area size results for kNN queries are shown in Figure 8.7(a). Once again, the size
of containment scope is substantially larger than other auxiliary scope types. Recall that this gap is a
direct result of the unioning of Voronoi cells in containment scope formation instead of the intersection
of Voronoi cells in valid scope formation. Containment Scope also outperforms Semantic approaches
by constructing the auxiliary scope in a way that maintains maximal precision. The Semantic approach
approximates an auxiliary scope region based on collecting an arbitrary number of nearby objects.
Bandwidth consumption results are summarized in Figure 8.7(b). As in all previous cases, the
additional bandwidth consumed by various auxiliary scope frameworks remains small because of the
large disparity between result object size and complementary object size.
For the third and fourth performance metrics, we consider the I/O cost and
execution time of kNN queries over non-uniform datasets. Experimental results are shown in Figures
8.7(c) and 8.7(d). Notice that both the I/O cost and overall execution time increase as the query space
(as determined by k) increases. Generally speaking, processing the less predictable structure of non-uniform data requires additional processing time and resources. This is likely a result of more complex
Voronoi cells in the case of kNN queries. However, we note that the relative performance of different
auxiliary scope computation techniques remains unchanged.
8.4.2.3 RkNN Query
Finally, we consider the performance of the server in computing containment scope information for
RkNN queries that are issued over non-uniform data. Here, our focus is on synthetic datasets, as the
final code base for RkNN query containment scope evaluation is not yet available.
Figure 8.8(a) shows the average auxiliary scope area size for RkNN queries with k = 1, 4, 16. Compared with the uniform dataset results for RkNN queries, we make two important observations. First,
Containment Scope actually is more effective as k increases in this case. Previously, Containment
Scope behaved poorly and reported small area sizes as the value of k increased. This suggests that the
increased freedom for maintaining result set objects in a future query's result set had a greater effect
than the increased probability that a complementary object may enter the result set. That is, the benefit
of increased result set kdist values outweighs the cost of increased complementary set kdist values. The former
effect increases our likelihood of having a positive result set cardinality, while the latter effect increases
the likelihood of result set invalidation. Regardless, the dataset distribution clearly has an important
effect on the effectiveness of containment scope processing for RkNN queries. As in the uniform case, the
auxiliary scope area for bare query processing is zero by definition.
Finally, we consider the total bandwidth required for RkNN query evaluation in Figure 8.8(b). Here,
the behavior is similar to the uniform dataset case. The overhead incurred by containment scope
processing is minimal for low k values but becomes more substantial as k increases. Initial tests
suggest that this overhead is largely offset in practice by a decreased server query submission rate.
8.5 Exp. II. Impact of Client Mobility
While the first experiment set considered the effect of isolated query auxiliary scope computation on
the server, the second experiment set studies the impact of client mobility on the overall performance
of each auxiliary scope system framework. The goal of this analysis is to weigh the cost of forming
each auxiliary scope against the benefit gained by avoiding future redundant query submissions to the
server. As mentioned in Section 7, the query submission rate is dependent on both the client's location
and the client's query load. For our experiment, every client is initially placed at a random
position in the service area. We then allow each client to move based on a random walk model in which
a client chooses (1) a random direction to move and (2) a traveling distance in the range [0, D) for each
step. After completion of one movement step, the client issues a query. D is set to
either 0.1%, 0.5%, or 1.0% of the total dataspace length depending upon the specific test being
run. We assume that a maximum distance D = 0.5% is used unless otherwise specified. When D is
assigned a small value, the result of future spatial queries is more likely to be covered by a previous
result. Thus, the effectiveness of all of the evaluated approaches as measured by the minimization of the
query submission rate and the reduction of bandwidth consumption is enhanced. In this evaluation,
we simulate ten separate mobile clients, and each of these clients proceeds through 100 steps and issues
100 queries along its trajectory. The experimental results to be reported are obtained by averaging the
performance for all queries among these ten clients. Furthermore, we study two scenarios. The first
scenario considers the parameters of all queries to be fixed, while the second scenario assumes spatial
query parameters can vary.
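The mobility model described above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the experimental harness used in this work: the auxiliary scope is approximated by a hypothetical disk of radius `scope_radius` centered on the last location at which a query was sent to the server, positions are clamped to the unit square, and the function name `random_walk_submission_rate` is invented for the example. A radius of zero corresponds to the Bare case, in which only a query at exactly the same location can be answered locally.

```python
import math
import random


def random_walk_submission_rate(scope_radius, max_step, steps=100, clients=10, seed=0):
    """Fraction of queries sent to the server under a random walk.

    Assumption: the auxiliary scope is a disk of radius scope_radius around
    the location of the most recent server-evaluated query.
    """
    rng = random.Random(seed)
    submitted = 0
    total = 0
    for _ in range(clients):
        # Every client starts at a random position in the service area.
        x, y = rng.random(), rng.random()
        anchor = None  # location of the last query evaluated by the server
        for _ in range(steps):
            theta = rng.random() * 2.0 * math.pi   # (1) random direction
            dist = rng.random() * max_step         # (2) distance in [0, D)
            x = min(1.0, max(0.0, x + dist * math.cos(theta)))
            y = min(1.0, max(0.0, y + dist * math.sin(theta)))
            total += 1
            # The client issues a query after each movement step.
            if anchor is None or math.hypot(x - anchor[0], y - anchor[1]) > scope_radius:
                submitted += 1
                anchor = (x, y)
    return submitted / total


# D = 0.5% of the dataspace length with a hypothetical scope radius of 1%.
rate = random_walk_submission_rate(scope_radius=0.01, max_step=0.005)
```

With a fixed scope radius, a smaller D keeps the client inside the scope for more steps, so the submission rate falls, which is the behavior the text anticipates.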
In many respects, the results obtained in this set of experiments offer the most accurate depiction
of the real-world performance gains that can be realized through various auxiliary scope techniques
since these results amortize the cost of each implementation over both server processed queries and
client processed queries. That is, the results in this section consider any query that can be answered
independently by a client to have no cost to the system. However, the query is still considered to have
been answered by the system and will help to lower the average computational cost for answering a
given spatial query.
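The amortization described here amounts to averaging server cost over all issued queries, with locally answered queries contributing zero. A minimal sketch follows; the function name `amortized_cost` and the sample costs are hypothetical and chosen only to show the arithmetic.

```python
def amortized_cost(per_query_server_costs):
    """Average cost per issued query.

    Each entry is the server cost of one query (for example, page accesses),
    or None when the client answered the query locally at no system cost.
    """
    total = sum(c for c in per_query_server_costs if c is not None)
    return total / len(per_query_server_costs)


# One server-evaluated query costing 8 units, followed by three local answers.
average = amortized_cost([8.0, None, None, None])  # 2.0 units per query
```

A method with a higher per-query server cost can therefore still show a lower amortized cost if its auxiliary scope lets the client answer enough queries locally.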
8.5.1 Fixed Query Parameter Performance Analysis
Figure 8.9. Impact of client mobility on the performance of fixed range query (r = 1.5%). Panels: (a) Query submission rate; (c) I/O cost.
In this experiment set, we evaluate the overall system performance improvement observed by
various approaches in terms of query submission rate, bandwidth consumption, server I/O cost, and
server execution time for region (range, window), kNN, and RkNN query types. We consider each type
in turn below.
8.5.1.1 Region Query
Figure 8.10. Impact of client mobility on the performance of fixed window query (l = 1.5%). Panels: (c) I/O cost.
First, Figure 8.9 plots all fixed parameter performance results pertaining to range queries. Here, the
radius of each query is fixed at 1.5%. Notice that Valid Scope (TPQ) is not included in our results
because it does not support range queries. In general and as is shown in Figure 8.9(a), Bare and Semantic
submit all queries to the server so their query submission rates approach 100%. Only exactly the same
query could be answered locally in these cases. That is, the Semantic method asserts the reuse of
spatial query result only when new spatial queries are issued at exactly the same query point. However,
in our evaluation, clients are moving and issue queries at different locations. Thus, no spatial query
result can be reused. A direct result of this reality is that neither Bare nor Semantic can help moving
clients to reuse previous spatial query results. In contrast, Valid Scope (Geo) and Containment
Scope allow better reuse of a previous spatial query result to answer new spatial queries by providing a
non-zero area auxiliary scope. As shown in Figure 8.9, they significantly reduce the query submission
rates. Furthermore, Containment Scope outperforms all other methods and beats the nearest existing
approach by up to 75%. When the maximum distance moved, D, increases, the likelihood that new
spatial queries are covered by previous spatial queries decreases. That is, it becomes increasingly likely
that the client will exit the auxiliary scope area defined for previous spatial queries. It follows that the
query submission rate should increase under these circumstances. This is precisely what occurs in the
case of our fixed parameter range query results.
Figure 8.11. Impact of client mobility on the performance of fixed kNN query (k = 4). Panels: (c) I/O cost.
The measurement of bandwidth consumption is shown in Figure 8.9(b). Due to the extremely high
query submission rate, both Bare and Semantic incur the largest bandwidth consumption among all
the evaluated approaches. Containment Scope consumes less bandwidth than Valid Scope (Geo) as
a direct result of the fact that it has a lower query submission rate. As predicted when examining the
results from our first set of experiments, the overall bandwidth consumption for Containment Scope is
the lowest out of all methods despite the fact that individual responses to clients using the containment
scope framework are not minimal among all approaches studied here. The highly effective reduction of
redundant queries allows Containment Scope to more than compensate for the cost of transmitting
a small amount of additional data to clients.
We can perform a similar analysis for both server I/O cost and server execution time. Figure 8.9(d) and
Figure 8.9(c) present the experimental results of fixed parameter range query average execution
time and I/O cost, respectively. As Bare and Semantic approaches do not derive the auxiliary scope
at the server, their processing and I/O costs are minimal. On the other hand, Valid Scope (Geo) and
Containment Scope incur higher execution time and I/O cost. In addition, Containment Scope incurs
high execution time because of the additional complementary objects that must be examined and
because of the computationally expensive logic of removing false positive complementary objects. To
identify the extra cost incurred by the reduction logic, we include Containment Scope (RD). As shown in
Figure 8.9(d), Containment Scope (RD) can considerably reduce the execution time while incurring only
a few (about 1) extra page accesses and slightly increased bandwidth consumption. The small spatial
coordinate size ensures that the penalty for transmitting the additional and superfluous complementary
objects is not too great. As such, we can see that disabling reduction logic is an attractive option for
determining containment scope when server processing resources are constrained. However, we note
that the execution cost and I/O cost of implementing both Containment Scope and Containment Scope
(RD) are almost offset by the decreased number of queries submitted to the server. When
comparing Figure 8.9(d) to Figure 8.1(d), we notice that most of the difference in execution time has
been absorbed by the amortization process. A similar and even more compelling case can be made for
I/O cost. Comparing Figure 8.9(c) to Figure 8.1(c) illustrates that almost the entirety of containment
scope's overhead is offset by the client's ability to answer future queries locally.
With a careful simulation for range queries now complete, we investigate the system performance improvement for window queries using various auxiliary scope methods. In this case, we
continue to assume that query parameters are fixed and establish a side extent l = 1.5%. Here, Valid
Scope (TPQ) is included since the approach can be used with the window query type. The general
trends of our experimental results shown in Figure 8.10 mirror those of the range query experimental
results shown in Figure 8.9 and can be explained with similar reasoning. Containment Scope (and
Containment Scope (RD)) continue to lead all tested techniques in the ability to reduce the server query
submission rate and help to dramatically increase client autonomy. The introduction of Valid Scope
(TPQ) does not change this fact, as both valid scope computation methods produce an area of the same
size. Furthermore, this area is much smaller than what is possible in the case of containment scope. The
low query submission rate is a direct consequence of the large area obtained through the containment
scope computational process.
Bandwidth consumption, server execution time, and server I/O cost also remain low for Containment Scope in the case of window queries. The additional bandwidth required to transmit complementary set information is almost always justified by avoiding the redundant transmission of future
query objects. Containment scope approaches offer the most effective solution for reducing bandwidth
consumption across all tested datasets and all tested movement distances. While Containment Scope
typically minimizes the overall average transmission size per query, Containment Scope (RD) offers a
compelling alternative that provides nearly as good performance while saving a substantial amount
of processing time. Despite the fact that Valid Scope (TPQ) and Valid Scope (Geo) provide identical
valid scopes and, by extension, identical query submission rates, their processing cost and I/O cost
are significantly different. Valid Scope (TPQ) involves issuing a large number of TP window queries,
which leads to higher processing cost and I/O cost compared to that of Valid Scope (Geo), which needs
only one index lookup. Containment Scope employs the same single index pass scheme adopted by
Valid Scope (Geo) to reduce overhead. In addition, the reduction in server requests actually causes
the more computationally complex containment scope algorithm to beat more simplistic alternatives
under most cases. Once again, we see that Containment Scope (RD) can reduce total execution time
to highly competitive levels at an expense of a small amount of additional I/O accesses and bandwidth
consumption. Finally, it is worth noting once more that the effectiveness of all approaches diminishes
as the maximum distance moved by the client increases. However, Containment Scope suffers the least
degradation in its ability to avoid redundant queries as the client movement pattern grows less spatially
focused.
8.5.1.2 kNN Query
For the final experimental analysis for client simulation with fixed parameter queries, we turn our focus
to performance measurements collected for kNN queries. In this case, we fix the value of k to four.
A summary of collected experiment results is shown in Figure 8.11. Recall that the Semantic method
for the kNN query type is formed by issuing mNN queries (where m = 2k). Thus, Semantic in this
experiment supports the reuse of kNN query results for new kNN queries issued nearby. It naturally
follows that Figure 8.11(a) illustrates that Semantic improves the query submission rate. However, it
still performs more poorly than Valid Scope (TPQ), Valid Scope (Geo) and Containment Scope, which
all compute precise auxiliary scopes. Furthermore, the overall size of each containment scope is larger
than the overall size of the corresponding valid scope for that query, as containment scope is formed
via a unioning of k overlapping Voronoi cell regions but valid scope is formed via an intersecting of k
overlapping Voronoi cell regions. A direct result of the larger area of application for containment scope
(with respect to valid scope and semantic scope) is an advantage in query server submission rate. Our
experimental results clearly show that Containment Scope has the lowest query submission rate among
all tested approaches. Note that for any value of k, containment scope and valid scope offer the same
functionality and consequently provide the same query submission rate. Differences between the two
approaches are only evident in cases where the query parameter is allowed to vary over the query series
issued by the client. As the maximum client movement distance grows, we see that the effectiveness of
all approaches in reducing server requests diminishes. This observation mirrors the same results seen
for region queries and can be explained by an increased likelihood that the client's new location lies
outside of the auxiliary scope boundary.
Once again, we also see that Containment Scope and Containment Scope (RD) perform well
with respect to minimizing bandwidth consumption, server execution time, as well as I/O cost. As
previously explained, Valid Scope (TPQ) incurs a very high processing and I/O cost as a result of
the repeated time-parameterized queries that are submitted against the dataset in order to finalize the
auxiliary scope region. The high performance cost of this method grows exceptionally large as the
index cache is restricted or is eliminated completely. Semantic offers a reasonable and low-cost query
reduction method under the right circumstances. However, the inability of mNN queries to construct
a precise auxiliary scope limits their effectiveness under ill-conditioned data distributions. This lack of
refinement in scope formation leads to a relatively high number of query submissions and increases
the overall cost of the semantic scope framework in relation to efficient implementations of valid scope
and containment scope methods. Notice that the I/O cost and execution time for kNN queries are nearly
identical when using Containment Scope or Valid Scope (Geo). This is because the implementation
and client utilization of the two techniques are quite similar. In addition, the bandwidth
requirements for Containment Scope are marginally higher than those for Valid Scope (Geo). However,
we expect this disadvantage to be overcome when working in real world environments in which the
query load includes queries with multiple parameter types. Recall that when query parameters are
varied, Containment Scope can outperform both Valid Scope (TPQ) and Valid Scope (Geo) since it
allows otherwise redundantly submitted query requests to be processed locally by the client. This
situation is the topic of the next portion of our experimental analysis.
8.5.2 Variable Query Parameter Performance Analysis
Figure 8.12. Impact of client mobility on traditional spatial queries with variable parameters. Panels: (c) I/O cost.
For our next experiment, we investigate the effect of allowing query loads with variable query
parameters on overall system performance. Each of the previously studied auxiliary scope approaches
is considered in this analysis. For this study, clients randomly pick query parameters (i.e. range
query radius, window query extents, or k nearest objects) independently of previous spatial queries.
This scenario represents the most accurate depiction of real world query behavior, which is likely to
vary based on the needs and expectations of the system user. No current auxiliary scope computation
technique supports the application of one type of spatial query to another type of spatial query, although
such an extension can easily be performed within the spatial query containment framework. Therefore,
our experimental analysis considers each query type sequentially and allows only the query parameter
and client location to change during a client's lifetime. We again have ten clients, and each client
issues 100 queries. The performance of each query is measured, and the average results are shown in
Figure 8.12(a).
In contrast to the previous set of experiments, it is now important to observe that the results of
spatial queries with different query parameters may still be covered by a previous spatial query result.
This observation is only accounted for by a subset of the auxiliary scope computation techniques. Valid
Scope (TPQ) and Valid Scope (Geo), which only support the reuse of results based on result set equality
and precise matching of query parameters, are likely to suffer greatly in this experiment. Furthermore,
Semantic and Containment Scope, which can detect if one spatial query is contained by a previous
result, are expected to see a relative improvement in their performance in comparison to the fixed query
parameter results previously studied.
8.5.2.1 Region Query
Consider once again the results in Figure 8.12(a). For their supported query types, we observe that
Valid Scope (TPQ) and Valid Scope (Geo) incur a much higher submission rate than that of Semantic
and Containment Scope methods as anticipated. In addition, Semantic generally incurs a higher
submission rate than Containment Scope as a result of the precise nature in which the containment
scope for a particular query is computed. It follows that Containment Scope is the most effective
at eliminating redundant queries in the common situation in which both query location and query
parameters change over time. Notice that the optimality of Containment Scope at reducing the query
submission rate is not dependent on the dataset distribution or specific region query type.
As a consequence of having the lowest query submission rate, bandwidth consumption by Containment Scope is also the lowest as shown in Figure 8.12(b). This reinforces a generally observed
phenomenon for region queries with fixed parameter query loads. The ability to construct a precise
and maximal auxiliary scope area allows Containment Scope to outperform Semantic in the efficient
utilization of system resources by ensuring that a large number of redundant queries are answered
locally by the client.
As a final step in our analysis of client mobility on auxiliary scope performance, we consider the roles
of server execution time and server I/O cost in our analysis. Containment Scope has high processing
costs for region queries as indicated in Figure 8.12(d). However, we yet again consider Containment
Scope (RD) as a highly effective method at providing the majority of I/O cost reduction and bandwidth
savings obtained by containment scope at a much lower execution cost. The substantial difference
in processing time between the two approaches suggests that detecting false complementary objects
remains a costly step in the execution of Containment Scope. Thus, we recommend using Containment
Scope (RD) except in cases where bandwidth conservation is of the utmost importance. Doing so allows
containment scope to be computed in roughly the same amount of time as is required to compute valid
scope. At the same time, system users gain the added benefit of further eliminating redundant server
requests.
Finally, we turn our attention to average server I/O cost as presented in Figure 8.12(c). Here,
we observe that Containment Scope offers the lowest average number of disk accesses among all
algorithms tested. In addition, the number of accesses for Containment Scope (RD) is still quite small.
We attribute this exceptional performance to the highly effective means by which containment scope
processing identifies redundant queries and avoids submitting them to the server. The integration of
result set and complementary set membership identification via a single index pass further reduces the
number of necessary I/O accesses and makes containment scope an ideal algorithm for environments
in which a large disk cache is not available.
8.5.2.2 kNN Query
We now consider the ability of different auxiliary scope techniques to process kNN queries with variable
query parameters. The overall server query submission rate is shown in Figure 8.12(a). As in the case of
region queries, notice that Valid Scope (TPQ) and Valid Scope (Geo) incur a much higher submission
rate than that of Semantic and Containment Scope methods. In addition, Semantic generally incurs
a higher submission rate than Containment Scope as a result of the precise nature in which the
containment scope for a particular query is computed. It follows that Containment Scope is the most
effective at eliminating redundant kNN queries in the common situation in which both query location
and the query parameter k change over time.
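For simulation purposes, whether a new kNN query with a different k and a different location is redundant can be checked against ground truth: the query is answerable from a cached result exactly when its true k nearest neighbors all belong to the cached result set. The sketch below is a brute-force oracle under that assumption. It is not the client-side containment test of this work, which must decide using only the cached result and complementary set, and the names `knn` and `answerable_from_cache` are hypothetical.

```python
import math


def knn(points, q, k):
    """Return the indices of the k points nearest to q (brute force)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(points[i], q))
    return set(order[:k])


def answerable_from_cache(points, cached_ids, q_new, k_new):
    """Ground-truth oracle: True when the new kNN result is a subset of the cached result."""
    return knn(points, q_new, k_new) <= set(cached_ids)


points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0)]
cached = knn(points, (0.2, 0), 3)                           # cached 3NN result: {0, 1, 2}
nearby = answerable_from_cache(points, cached, (0.9, 0), 2)  # True: {0, 1} is cached
far = answerable_from_cache(points, cached, (4.8, 0), 2)     # False: {3, 4} is not cached
```

A valid-scope style check would additionally require the new result set to equal the cached one, which is why a change in k forces resubmission under those methods.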
As a consequence of having the lowest query submission rate, bandwidth consumption by Containment Scope is also the lowest as shown in Figure 8.12(b). This reverses the trend for kNN queries
with fixed parameters. Recall that valid scope methods incurred a lower transmission overhead for
fixed parameter kNN queries on average than containment scope because of the low cardinality of valid
scope complementary sets and the similar utility of the two methods in processing a fixed parameter
query load. However, Containment Scope is able to reuse result set information more frequently than
Valid Scope (TPQ) and Valid Scope (Geo) in a variable query parameter environment. The principle of
including complementary objects only when absolutely necessary also enables Containment Scope to
outperform Semantic in the efficient utilization of system resources.
As a final step in our analysis of client mobility on auxiliary scope performance for kNN queries,
we consider the roles of server execution time and server I/O cost in our analysis. Containment
Scope has a moderate execution time with respect to other auxiliary scope computation techniques.
Furthermore, server I/O cost is competitive with the best existing auxiliary scope algorithms. In many
cases, Containment Scope actually outperforms these techniques. The unique integration of result
set and complementary set membership identification via a single index pass enables the containment
scope processing framework to be both effective and efficient.
8.6 Exp. III. Impact of Object Density
In the last set of experiments, we investigate system performance using different auxiliary scope computation techniques while the object density changes. This factor affects the average distance between
objects and, by extension, the auxiliary scope size. In these experiments, we vary the cardinality of
objects that are uniformly distributed on the same 1 × 1 unit square service area among 1,000, 10,000,
and 100,000 objects. The experiment results shown in Figure 8.13 are obtained by the same settings as
our second set of client mobility experiments in which the query parameters are fixed. Here, we assume
that the maximum distance that clients can move is fixed at 0.5%.
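The link between object density and inter-object distance can be made concrete with a small sketch. For n points placed uniformly at random in the unit square, a standard result for a planar Poisson process gives a mean nearest-neighbor distance of approximately 1/(2√n), ignoring boundary effects. The brute-force estimate below is an illustration only, and the function name `mean_nn_distance` is hypothetical.

```python
import math
import random


def mean_nn_distance(n, seed=0):
    """Empirical mean nearest-neighbor distance for n uniform points in the unit square."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    total = 0.0
    for i, p in enumerate(pts):
        # Brute-force scan over all other points; adequate for small n.
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n


for n in (200, 800):
    approx = 0.5 / math.sqrt(n)  # Poisson approximation, no edge correction
    print(n, round(mean_nn_distance(n), 4), round(approx, 4))
```

Because this distance shrinks roughly as 1/√n, a hundredfold increase in cardinality cuts typical object spacing by about a factor of ten, which is consistent with the smaller auxiliary scopes expected at high density.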
Figure 8.13. Impact of object density on traditional spatial query performance. Panels: (c) I/O cost.
8.6.1 Region Query

When the object density increases, the effectiveness of all auxiliary scope approaches is reduced as is
reflected by the computed query submission rates that are displayed in Figure 8.13(a). This increase
in query submissions to the server is a direct consequence of the diminished size of auxiliary scope
that results from a large number of nearby objects. A high object density allows objects to easily enter
or exit the result space, and either of these actions can limit the effectiveness of a given region query
auxiliary scope computation method. Containment Scope avoids the effect of dense object clusters
as much as possible by allowing result objects to leave the result set of a future spatial query without
necessarily requiring that the new query be sent to the server for evaluation. On the other hand, valid
scope approaches require the remote evaluation of such queries.
Similarly, bandwidth consumption, server I/O cost, and server execution time all increase with
object density as depicted in Figure 8.13(b), Figure 8.13(c), and Figure 8.13(d), respectively. This is
because a high object density implies the need to access, to process, and to transmit additional relevant
objects under the assumption that all other parameters such as query size, query location, and dataset
distribution remain constant. However, we observe that spatial query containment still outperforms all
other approaches tested under the majority of circumstances.
8.6.2 kNN Query
As in the case of region queries, any increase in object density decreases the overall effectiveness of all
auxiliary scope approaches. The query submission rates shown in Figure 8.13(a) allow us to conclude
that this same result holds for kNN queries. High object densities decrease the average size of each
Voronoi cell and limit the total area in which a client can move without invalidating the result set
through the inclusion of an additional non-result object. By supporting semantically contained queries
with small k values, Containment Scope avoids the eect of dense object clusters as much as possible.
However, the reality of increased object density cannot be completely avoided by any auxiliary scope
computation method.
Finally, we observe that bandwidth consumption, server I/O cost, and server execution time all
increase with object density as depicted in Figure 8.13(b), Figure 8.13(c), and Figure 8.13(d), respectively.
This is because the polygons used to bound the Voronoi cells of result objects may be more complex in
nature. Thus, additional information about nearby objects may need to be analyzed in order to compute
a precise auxiliary scope answer. However, Containment Scope still remains the best solution with
respect to resource utilization for most situations.
8.7 Recommendation
In summary, Containment Scope performs the best among all evaluated approaches for query loads
that feature both fixed and varied query parameters. This is primarily because the spatial query
containment processing framework can enable every client to determine if a new spatial query is
covered by a previous spatial query in cases that other auxiliary scope frameworks fail to consider in
their analysis. The overhead incurred by the server in computing containment scope data is more
than offset by the eventual savings of not having to compute future query results at the server level.
The theoretical and experimental results obtained about spatial query containment suggest that it is an
excellent auxiliary scope method for achieving our goals of system scalability, resource conservation,
and client autonomy. Furthermore, Containment Scope (RD) clearly can improve server execution time
while incurring only a modest amount of extra bandwidth consumption and I/O cost. As a result, we
consider Containment Scope (RD) to be an appropriate model of containment scope computation for
spatial query containment in mobile and wireless environments.
Chapter 9
Auxiliary Scope Simulator

9.1 Simulator Project Overview
Given the favorable performance results shown in Chapter 7 and Chapter 8, it seems clear that there
exists substantial applicability of auxiliary scope frameworks in general, and containment scope in
particular, to real world situations. However, most current implementations of these ideas have been
designed for the sole purpose of collecting research data in a controlled laboratory setting. As such,
these applications provide only very rudimentary interfaces and features to allow for the testing of
containment scope effectiveness and performance evaluation. While sufficient for data collection,
the present state of these applications limits their usefulness in real life situations such as geospatial
information systems (GISs), business intelligence systems (BISs), and location-based services (LBSs).
The goal of this chapter is to facilitate the planning and eventual construction of a robust simulator that
can illustrate the utility of auxiliary scope methods under common usage scenarios.
As in the case of our experimental analysis, we focus on the application of containment scope to the
area of location-based services. In many ways, this is the most challenging environment since the client
is physically separated from the server and has potentially limited resources. Battery life, processing
capability, storage constraints, as well as bandwidth capacity and network connectivity are all in limited
supply and present real challenges to deploying auxiliary scope solutions in the dynamic environment
in which location-based services exist. The adaptation of an LBS containment scope simulator to GIS or
BIS fields would simply require the translation of the client component from a native mobile application
to a standard desktop application. This is a much simpler translation process than the adaptation of a
desktop application to a mobile application, so we once again focus on the most difficult application
environment.
The auxiliary scope simulator has been designed to support modular construction as well as a core
feature set. In the remainder of this section, we outline the objectives of the auxiliary scope simulator
and describe the necessary components required for proper operation. Moreover, we define a detailed
roadmap for completion of the project and offer commentary on portions of the simulator that have
been completed as of the publication date of this document.
9.2 Simulator Objectives
When designing the auxiliary scope simulator, there were several important goals that we wanted to
accomplish. The central objective of the project was the development of a comprehensive, flexible,
and extendible framework that would allow for the effective integration of an arbitrary auxiliary
scope method into a core LBS content delivery system. The end solution should offer a true point-to-point
communication model between physically distinct clients and one or more central servers. To
that end, the simulator must allow for information retrieval in a mobile setting and should optimize
communication patterns to avoid unnecessary overhead. Ideally, users would be able to communicate
with this simulator using input means common to mobile devices (e.g. digital stylus, touch screen, voice
recognition) as well as through typical input means (e.g. keyboard, mouse). The optional activation
of auxiliary scope technology would visibly illustrate the reduction in processing overhead during the
querying of available data sources.
Important areas of functionality include the following:
Clear Communication Model. An auxiliary scope simulator must be able to model the real
communication patterns that occur in industrial applications. Such functionality necessitates a
functional client-server communication model that can operate over a universally recognized
protocol.
Practical User Interface. Given that clients using the auxiliary scope framework may take various
forms (e.g. phones, PDAs, workstations), it is important to support a wide range of input
mechanisms. Graphical displays of spatial information when appropriate can aid the user in
ascertaining his/her current location and nearby data objects. Textual representation may also be
useful or necessary under certain circumstances.
Extendible Code Base. It is important to deliver a working auxiliary scope simulator in a short
amount of time to prove the viability of a containment scope solution. However, there are many
useful features that may improve the overall value of an auxiliary scope processing system. Such
features may delay the overall completion of the simulator substantially, so the underlying design
architecture must be such that features can be added or modified at a later date. Furthermore, we
want to be able to consider various auxiliary scope processing techniques, so any developed API
must be as general as possible to allow for the addition of novel approaches as necessary.
Efficient Processing Algorithms. While communication methods, interface design, and overall
design concerns are important, this work also recognizes the need to ensure that algorithms
for processing results are as efficient as possible. Otherwise, it will be difficult to measure the
overall effect of auxiliary scope implementations on real world applications. In particular, the
core algorithms for accessing and manipulating index data and for forming the auxiliary scope
should be written with fast performance as a primary objective.
Beyond the basic need of modular design, the auxiliary scope simulator must be capable of iterative
development. We recognize that there are a large number of features that are desirable in the final
simulator product. However, the need to support many features should not preclude the deployment
of incremental versions of the product that meet some subset of the system design requirements. We
highlight the components of the auxiliary scope simulator architecture as well as projected application
milestones in the rest of this chapter.
9.3 Simulator Components
As previously mentioned, the auxiliary scope simulator has been designed incrementally to allow for
rapid application deployment of a basic feature set and later integration of advanced functionality.
Given the client-server model on which most LBSs and auxiliary scope work is founded, the simulator
features both an efficient server component as well as a compact and scalable client component.
Client software has been deployed on Windows Mobile 5.0 platforms (including both Pocket PC
and smart phone devices) and will soon be ported to Windows Mobile 6.0 and Windows Mobile 6.5
environments. In order to facilitate code reusability and efficient mobile application development, the
current client software has been written using Microsoft C# .NET 2008 and the Microsoft .NET Compact
Framework 2.0 architecture. A Palm OS version of the client based on Sun Java will also be developed
at a future date. A key factor in choosing C# and Java as our mobile client programming languages of
choice is that both languages offer rich integrated development environments for constructing
applications that require graphical user interfaces and network communication. While these languages are not
appropriate for low-level manipulation of large amounts of data, we recall that the client component
simply must have the capacity to examine existing auxiliary scope data as represented by result set and
complementary set pairs. Stored auxiliary scope data should be comparatively small when considering
the substantial amount of data indexed and processed by the server component. High-level languages
such as C# and Java also allow us to leverage a substantial existing code base of well-tested libraries.
The server component has been deployed on Windows XP, Windows Vista, as well as Windows
Server 2003/2008 platforms. The current implementation utilizes a combination of several applications
developed using ANSI standard C++ code as well as a server request manager written using Microsoft
C# .NET 2008 and the Microsoft .NET Framework 2.0. In order to provide for (1) code reuse and (2)
efficient low-level memory and I/O manipulation, the C# server manager incorporates unmanaged C++
code segment calls for computationally intensive operations and calls other native C++ query processing
applications when appropriate. Future development efforts will extend the server component to also
run on Linux and UNIX based operating systems. Such support will be useful for any Palm OS client
support using Java.
In choosing C# as our primary client programming language, we have leveraged comprehensive code
libraries for graphical interface design and network management. In addition, C# utilizes Microsoft's
Common Language Runtime (CLR) support for multiple language integration and deployment in
diverse environments. The ability to call C++ code segments from within a C# application ensures that
the substantial amount of existing code for various auxiliary scope processing methods can be included
in the simulator as needed.
With respect to graphical user interface requirements, our simulator uses the native interface design
tools offered in Microsoft Visual Studio .NET 2008. Specifically, we employ Microsoft's Windows
Forms technology for C#, which replaces the traditional Microsoft Foundation Class (MFC) architecture.
Windows Forms applications can be constructed quickly and with the assistance of interactive tools
inside of Visual Studio. As a result, the auxiliary scope simulator conforms well to object-oriented
design and the Model-View-Controller (MVC) design pattern. Changes in interface design between
mobile and desktop applications are minimal under the Windows Forms design, as any differences in
architectures are primarily abstracted by the .NET Framework. This facilitates the translation of our
LBS auxiliary scope simulator into systems that conform to the GIS or BIS archetypes. In the future, we
hope to update the auxiliary scope simulator to use the .NET Framework 3.0, which offers the Windows
Presentation Foundation display model for application design. However, support for Windows Mobile is
still limited under this model.
With respect to network support, the auxiliary scope simulator features two-way communication
between one or more clients and a centralized server. The system can also easily be adapted to support
multiple servers, but this is not a focus of our development at this time. In an effort to minimize
complexity while still providing sufficient control of network traffic, the auxiliary scope simulator
utilizes socket communication over TCP/IP. Part of our rationale for deploying the
socket programming model is the widespread use of the model in industrial applications as well as the
fact that substantial code libraries exist within the .NET Framework to support socket programming
using both the C++ and C# programming languages. Furthermore, data exchange can be managed at byte-level
granularity with sufficient monitoring capabilities provided by the virtual machine. Additional
network monitoring tools may be used by implementors to facilitate statistical analysis as needed. We
elected to use Wireshark in order to observe packet communication and to collect aggregate traffic
data. In order to support socket programming on a Windows Mobile device, a network connection
must first be established. This connection can be wired via a desktop docking station or wireless via
either direct Internet access (in the case of wireless smart phones) or indirect Internet access (in the
case of a Pocket PC connected to a wireless router). Initial versions of the auxiliary scope simulator
focus on indirect wireless connections, but extensions to direct Internet access are trivial. API calls
to a connection manager object ensure that the appropriate connection type is established, and this
abstraction supports the logical notion that the type of connection established should be orthogonal to
auxiliary scope construction.
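The byte-level exchange described above can be sketched as a length-prefixed request and reply over a stream socket. This is a minimal illustration in Python and not the simulator's C# code: the JSON message layout, the field names (type, k, location, result_set, complementary_set), and the helper names are all assumptions. The sketch uses socket.socketpair() as a stand-in for a TCP connection so that it runs without a network; a deployed client would instead connect to the server's address.

```python
import json
import socket
import struct


def send_message(sock, payload: dict) -> None:
    """Send one message: a 4-byte big-endian length prefix followed by a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, since a stream socket may return partial data."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock) -> dict:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def demo():
    # socketpair() stands in for a client/server TCP connection (assumption).
    client, server = socket.socketpair()
    try:
        # Client submits a query (hypothetical message layout).
        send_message(client, {"type": "kNN", "k": 2, "location": [0.25, 0.75]})
        request = recv_message(server)
        # Server replies with result set and complementary set locations.
        send_message(server, {
            "result_set": [[0.2, 0.7], [0.3, 0.8]],
            "complementary_set": [[0.5, 0.9]],
        })
        reply = recv_message(client)
        return request, reply
    finally:
        client.close()
        server.close()
```

The explicit length prefix is one common way to recover message boundaries on a byte stream; the connection type (wired, direct wireless, indirect wireless) stays invisible to this layer, which matches the orthogonality noted above.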
A pictorial representation of the constructed system model is provided in Figure 9.1. Notice that
the master server can consist of either a powerful workstation, computing cluster, or mainframe system
depending on the degree of scalability required for the simulation. Our focus has been on developing
a simulator that can support several dozen clients using a workstation with a 2.8 GHz Intel Core 2
Quad processor with 8 GB of DDR2 RAM. However, limited tests suggest that the simulator can
run on substantially less powerful hardware with minimal degradation in speed. The server runs two
logical processes. The first process is the connection manager written in C# that acts as an arbitrator
for different client requests and sends result set and complementary set information back to the client.
Figure 9.1. Auxiliary scope simulator components
The second process running on the server piece of the auxiliary scope simulator is the spatial query
processing engine written in C++ that performs all raw result set and auxiliary set computation. All
interaction with the dataset and its corresponding R-tree index occurs through this component. Also,
multiple engines may be running on a single server in order to simultaneously support multiple types
of auxiliary scope processing frameworks. The connection manager on the server component can
communicate with heterogeneous devices using the .NET Framework. In the figure, we see examples
in which a PDA, smart phone, and desktop all have instances of the auxiliary scope simulator running.
Each of these devices can issue queries independently to the server, and each device can be running any
supported auxiliary scope type. In actuality, we can support even more diverse platforms than pictured
in Figure 9.1. For example, portable music devices and video game consoles can both interact with
our server through requests managed by client software and the .NET Framework (or .NET Compact
Framework). Communication occurs via wired or wireless channels based on the type of device.
9.4 Simulator Development Roadmap
As previously mentioned, the auxiliary scope simulator has a modular design to allow for incremental
construction and rapid deployment. There are four versions of the simulator planned. The first of these
editions has been completed, and the second version is in development with a tentative release date of
May 2009. In the remainder of this section, we describe the anticipated features and functionality of
each edition of the auxiliary scope application.
Version 1.0 (Basic Functionality). This revision of the auxiliary scope simulator includes functioning
server and desktop client components. Network connectivity is supported between a
single server and multiple clients via TCP/IP sockets. Clients specify query information
by inputting query type, query parameter, and spatial location manually in the provided
fields. Textual information about result objects and complementary object location is provided for
any issued query. Containment scope and bare query processing represent available LBS query
processing frameworks.
Version 2.0 (Mobile Client Auxiliary Scope Solution). In addition to supporting all functionality
from Version 1, this revision of the auxiliary scope simulator includes full support for Windows
Mobile 5.0 and Windows Mobile 6.0 devices through the Microsoft .NET Compact Framework.
The mobile client offers a complete graphical user interface that maps nearby points of interest
based on the results of spatial queries. To maintain an internal representation of spatial objects,
we use the ESRI shape file standard. This also allows us to provide persistence support for
the auxiliary scope simulator by loading previous memory contents from disk when necessary.
Clients can update their current location on the map by selecting the location using a stylus or
other pointing device. Spatial queries can be issued manually or automatically with user-specified
parameters. Dialog boxes provide additional information about selected objects where available
and requested. Containment scope and bare query processing continue to be supported for this
version of the program.
Version 3.0 (Generalized Auxiliary Scope Library). With complete support for desktop and
mobile clients provided in previous versions of the software, this particular edition aims to increase
the generality of the auxiliary scope simulator solution. We widen our networking support to
feature multiple server and multiple client environments with many-to-many connection types.
In addition, the graphical user interface is updated to support color maps that offer a detailed
and differentiated display of dataset information. Finally, we add the valid scope (TP-query and
geometric) and semantic scope (mNN and region) to the list of supported auxiliary scope methods.
Version 4.0 (Autonomous Auxiliary Scope Management System). For the final version of the
auxiliary scope simulator, we approach the level of quality expected in a commercial product.
Beyond supporting all functionality from previous versions of the software, we add support for
GPS device tracking on PDAs and smart phones running the Windows Mobile operating system.
This allows client location to be determined independently of user operation and adds a great
degree of flexibility in interface design. Finally, we port our system solution to run on a variety
of additional platforms to include various distributions of Linux and Sun UNIX via the Sun Java
programming language. We also extend our client component to run on Palm OS and Google
Android. These extensions will take time but should be irrelevant to measuring the applicability
of auxiliary scope mechanisms to solving real world problems.
Table 9.1 offers details on the projected completion dates of each version of the auxiliary scope
simulator, while Section 9.5 discusses observations that have been made from completed revisions of
the auxiliary scope simulator.

                                                        Release Dates
Version  Title                                          Alpha   Beta    Final
1.0      Basic Functionality                            01/09   02/09   03/09
2.0      Mobile Client Auxiliary Scope Solution         03/09   04/09   05/09
3.0      Generalized Auxiliary Scope Library            06/09   08/09   TBD
4.0      Autonomous Auxiliary Scope Management System   TBD     TBD     TBD

Table 9.1. Auxiliary scope simulator release schedule
9.5 Simulator Implementation Observations
With a complete description of the auxiliary scope simulator project and associated objectives, this
paper now presents some results from version 1.0 (final release) and from version 2.0 (beta release)
of our system model. First, we have successfully met the goal of creating a set of applications with
core functionality that can be extended later as additional features are completed. The auxiliary scope
construction algorithm is general enough to facilitate the inclusion of multiple auxiliary scope processing
methods into the overall simulator framework.
Version 1.0 of the auxiliary scope simulator offers complete support for containment scope and
bare query processing via command line entry of query type, parameter, and location data. Each
query is transmitted from a client component to a server component for processing. Furthermore,
C# and the .NET Framework have made communication via TCP/IP socket programming efficient and
straightforward. A sample screenshot of command line query processing is given in Figure 9.2(a).
Notice that the output echoes the given query parameters and proceeds to give spatial data about
both result set objects as well as complementary set objects.
With version 1.0 of the program complete, we turned our attention to the mobile client demands of
version 2.0. Our current application offers all capabilities from version 1.0 of the client on mobile devices
running Windows Mobile 5.0 or Windows Mobile 6.0. Instead of running from a command line shell,
we enable similar support and input/output capabilities using a simple Windows Forms application with
appropriate control objects. Figure 9.2 offers an example of such an application in action. Notice that
the same information is displayed as was shown in Figure 9.2(a). However, all data is now accessible
from a mobile interface. Figure 9.2(b) displays the connection screen that allows clients to establish a
communication channel to the server. In addition, Figure 9.2(c) illustrates the mobile form for inputting
query information, while Figure 9.2(d) shows the result set and complementary set information that is
sent back to the client from the server for processing.
In the coming months, we plan to refine the content in version 2.0 of the auxiliary scope simulator
and to begin work on versions 3.0 and 4.0 of the project. Even with the limited work conducted so far
on the application, it is clear that various auxiliary scope processing frameworks such as containment
scope can be implemented successfully in real world applications such as LBSs. This effort in tandem
with previously obtained theoretical and experimental results leads us to conclude that containment
scope can make a positive and real impact in the daily lives of people around the world. Through the
efficient and effective reduction of redundant spatial queries, containment scope offers an opportunity
to improve the scalability, performance, and autonomy of numerous data intensive systems around the
world.
Figure 9.2. Auxiliary scope simulator screen captures: (a) Version 1.0, Console; (b) Version 2.0, GUI
Connection; (c) Version 2.0, GUI Search; (d) Version 2.0, GUI Results
Chapter 10
Conclusion
10.1 Spatial Query Processing Problem
As was mentioned in Chapter 1, rapid increases in the processing capability, storage capacity, and
physical mobility of computing devices over the past decade have dramatically changed the landscape
of computing. More data exists today than at any other time in history, and the complex relationships
modeled by this information have substantially increased the demand for and proliferation of multidimensional data processing systems. Users of applications as diverse as Geospatial Information
Systems (GISs), Business Intelligence Systems (BISs), and Location Based Services (LBSs) issue complex
questions that task system infrastructures to the breaking point on a daily basis.
Working with this massive amount of complex data demands new and powerful tools for mining
information that is both useful and relevant to users. Traditional point and range queries are no longer
sufficient to answer these increasingly diverse and complex questions posed by a growing user base.
Over the past 50 years, a common set of spatial queries has emerged that can be used to address a
majority of user questions over multi-dimensional datasets. Popular query types include
region (range, window) queries, nearest neighbor (NN, kNN) queries, and more recently reverse nearest
neighbor (RNN, RkNN) queries. Region queries request information about some subset of the data
domain and are restricted based on spatial boundaries. A second major category of spatial query is
the nearest neighbor query, which returns the object (or k objects) in the dataset that is closest to the
query point. Finally, the reverse nearest neighbor query offers an interesting and novel method of
identifying information about a particular dataset. Queries of this type search for objects in the dataset
that are closer to the query location than to any other data object. Designing a system model that
supports region, nearest neighbor, and reverse nearest neighbor queries helps to ensure the viability
and adaptability of any spatial query processing system.
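The three query families described above can be illustrated with a brute-force sketch. It is only a definitional aid: it scans every object, whereas the framework in this work relies on an R-tree index, and the function names and the tuple-based point representation are assumptions introduced for illustration. The reverse nearest neighbor function follows the definition given above, in which an object qualifies when the query location is closer to it than every other data object.

```python
from math import dist  # Python 3.8+


def range_query(points, q, r):
    """Region (range) query: every object within distance r of q."""
    return [p for p in points if dist(p, q) <= r]


def knn_query(points, q, k):
    """kNN query: the k objects closest to q."""
    return sorted(points, key=lambda p: dist(p, q))[:k]


def rnn_query(points, q):
    """Reverse NN query: objects that are closer to q than to any other object."""
    result = []
    for p in points:
        others = [o for o in points if o != p]
        if all(dist(p, q) < dist(p, o) for o in others):
            result.append(p)
    return result
```

For instance, with objects at (0, 0), (1, 0), (5, 5), and (6, 5) and a query point at (0.4, 0), the first two objects are reverse nearest neighbors of the query point, while the last two are each other's nearest neighbor and are therefore excluded.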
Although spatial queries are very useful for acquiring needed information, results are often needed
in problem domains that require exceptional scalability, reliability, and performance. However, systems
that process spatial queries often have limited CPU, disk, and bandwidth resources. For example, LBS
applications frequently operate on mobile devices that interact with a limited number of central servers
to retrieve information about a client's surrounding environment. Even systems that have substantial
computing capabilities may be incapable of processing demanding queries at a rate that is satisfactory
to end users due to network latency, high demand, or large query workloads. Consequently, there is an
immediate need for effective methods for improving the scalability, efficiency, and autonomy of spatial
query processing systems.
One observation regarding naive query processing approaches is that there is often a large amount
of redundancy in client query requests. That is, many queries request information that partially or fully
overlaps with data already obtained by the client at some earlier time. Considering the example case of
LBSs, query requests are based on a user's current location. While this location may change frequently
and result in a high rate of client query submissions, the actual range of movement and, by extension,
the query result changes may be quite limited. Therefore, we can expect high degrees of query overlap
in situations where the client is frequently updating information. It follows from the above analysis
that substantial reductions in query submissions, processing overhead, and bandwidth consumption
can be achieved if redundant spatial query submissions are eliminated. However, it is very difficult to
detect redundancy in a spatial query request since the result set for any such query is contingent on
both the query parameterization and the dataset distribution.
Existing works have attempted to reduce redundant spatial queries by examining either query
parameterization or the relative positioning of data objects. However, each potential solution suffers from
several shortcomings. A popular approach focused on by this paper is the construction of an auxiliary
scope for each processed spatial query that specifies an area in which certain types of queries can be
answered locally using previously obtained results. Two existing conceptual techniques for obtaining
auxiliary scope information include semantic scope and valid scope. Semantic scope uses bounds on
the size of a query region to detect when one query is completely contained within another query.
However, this approach does not consider the distribution of data objects and may in fact artificially
limit the reuse of query results. In contrast, valid scope techniques use information about the data
object locations to construct an area in which any future query that differs from the original query only
in its location can be answered using the same result set. That is, the future query Q′ has exactly the
same result as the original query Q so long as (1) only its location changes and (2) it remains inside the
valid scope. However, valid scope refuses to reuse results to answer redundant queries in situations
in which the query parameterization changes or the new query result set does not include all original
result objects.
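The semantic scope idea described above can be sketched for window queries as a purely geometric containment test. This is a simplified illustration rather than any published semantic scope algorithm; the window representation (x_min, y_min, x_max, y_max) and the function names are assumptions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Window = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), hypothetical layout


def window_contains(outer: Window, inner: Window) -> bool:
    """True when the inner window lies entirely inside the outer window."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def in_window(p: Point, w: Window) -> bool:
    """True when point p falls inside window w (boundaries included)."""
    return w[0] <= p[0] <= w[2] and w[1] <= p[1] <= w[3]


def reuse_window_result(cached_window: Window,
                        cached_result: List[Point],
                        new_window: Window) -> Optional[List[Point]]:
    """Answer new_window from a cached result if it is geometrically contained.

    Returns None when the new query must be sent to the server.
    """
    if not window_contains(cached_window, new_window):
        return None
    return [p for p in cached_result if in_window(p, new_window)]
```

Because the test looks only at query bounds, a new window that extends slightly beyond the cached window is rejected even when the extra area holds no data objects, which is the artificial limitation on reuse noted above.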
Through an extensive literature review in Chapter 2, we conclude that there currently exists no
auxiliary scope technique that constructs a precise auxiliary scope that offers optimal reduction of
redundant queries. It follows that servers in current spatial query processing systems must spend
substantial resources to compute information that clients already possess. Furthermore, bandwidth
and energy must be expended to communicate this information from the server to the client. This
redundant evaluation unnecessarily impedes system performance, limits scalability, and increases the
reliance of clients on the server. This work considers how to mitigate this problem effectively.
10.2 Spatial Query Containment Solution
The goal of our research is the development of an auxiliary scope processing framework that accurately
solves spatial queries in a way that efficiently utilizes existing data available to the client in tandem
with semantic query information to conserve server resources and to mitigate bandwidth contention
by maximizing client self-reliance. Although both semantic scope and valid scope techniques can
reduce the number of redundant queries that are submitted to the server for processing, these existing
approaches are not optimal because they fail to identify many situations in which a spatial query may
be completely contained within a previous result. To address this concern, our work introduces the
notion of containment scope, which applies to future spatial queries that are exceptionally varied and
offers an optimal area over which existing spatial queries can be reused.
Spatial query containment determines whether a given query Q′ can be answered using the result set
R_Q of a previous query Q. Our approach answers this question by performing two different tests.
The first test determines whether Q′ is semantically contained by Q. That is, we determine whether Q′ is at
least as restrictive as the original query Q against which it is compared. The second test
considers whether the query point q_Q′ lies within the containment scope area of Q. If these
two conditions are met, we conclude (1) that the query can be answered locally and (2) that the result
set R_Q′ of Q′ will be a subset of the result set R_Q of Q. Notice that a key aspect of our technique is
its incorporation of both query semantics (used by the first test) and data object distribution (used by
the second test). To use containment scope in a spatial query processing system, we must enable the
computation and transmission of containment scope information from the server to the client during
regular query evaluation. Notice that containment scope is defined for a specific query and a specific
dataset. Thus, a unique containment scope must be transmitted to the client for each query request for
maximum effectiveness.
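The two tests described above can be illustrated with a minimal sketch for a k-nearest neighbor query. The sketch is not the algorithm developed in this work. The names `KnnQuery`, `semantically_contained`, `in_containment_scope`, and `answer_locally` are hypothetical, semantic containment is assumed to reduce to comparing k values, and the complementary set is assumed to be sufficient in the sense that whenever some non-result object lies closer to the new query point than the requested neighbors, at least one complementary object does as well.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class KnnQuery:
    point: Point  # query point
    k: int        # number of neighbors requested


def semantically_contained(new: KnnQuery, old: KnnQuery) -> bool:
    # Test 1 (assumed rule): requesting fewer neighbors is at least as restrictive.
    return 0 < new.k <= old.k


def in_containment_scope(new: KnnQuery, result: List[Point],
                         complementary: List[Point]) -> bool:
    # Test 2 (simplified): every complementary object must be strictly farther
    # from the new query point than the k-th nearest stored result object.
    if not 0 < new.k <= len(result):
        return False
    kth = sorted(dist(new.point, p) for p in result)[new.k - 1]
    return all(dist(new.point, c) > kth for c in complementary)


def answer_locally(new: KnnQuery, old: KnnQuery, result: List[Point],
                   complementary: List[Point]) -> Optional[List[Point]]:
    # Answer from the stored result set only when both tests pass.
    if not semantically_contained(new, old):
        return None
    if not in_containment_scope(new, result, complementary):
        return None
    return sorted(result, key=lambda p: dist(new.point, p))[:new.k]


# Hypothetical data: a previous 2NN query at the origin and one non-result object.
old = KnnQuery((0.0, 0.0), 2)
result = [(1.0, 0.0), (0.0, 2.0)]
complementary = [(5.0, 0.0)]

print(answer_locally(KnnQuery((0.5, 0.0), 1), old, result, complementary))  # [(1.0, 0.0)]
print(answer_locally(KnnQuery((4.0, 0.0), 1), old, result, complementary))  # None
```

Under these assumptions, a new query is answered from the stored result set only when both tests pass. Otherwise `answer_locally` returns `None`, signaling that the query would be submitted to the server.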
To standardize containment scope representation across multiple spatial query types and to minimize
the overhead in deploying such a technique, we choose to represent containment scope data using a
combination of the result set computed for a particular query and a complementary set of selected non-result objects. Only the locations of objects in the complementary set need to be transmitted to the client,
and we expect such data to be much more compact than the supplementary information stored and
retrieved for result set objects. Therefore, the overall communication cost of sending containment scope
data is minimal. Moreover, the overall bandwidth consumed by the system actually decreases because
future redundant queries can now be processed locally by the client without server intervention, and the
cost of submitting those queries and returning their results often far exceeds the minimal cost of sending complementary
set data. Despite the minimal size of each complementary object, our algorithms still reduce the
cardinality of the complementary set as much as possible without incurring undue computational
complexity or long processing times. The precise logic for constructing a containment scope varies
based on the specific type of spatial query.
Spatial query containment supports all three primary spatial query types, namely region queries,
nearest neighbor queries, and reverse nearest neighbor queries. All of our query processing approaches
integrate containment scope construction with result set formulation. A direct consequence of this
design decision is that each index page needs to be accessed at most once. We adopt
an R-tree indexing structure and the distance browsing technique in order to filter unnecessary objects
from our search space. Our algorithms examine nearby objects first, since these objects are most likely to
affect the result set of a query when its query point is perturbed by a small amount. We require that the
cardinality of the result set remain greater than zero and identify as a complementary object any non-result
object that could enter the result set prior to invalidation by some other object. Various techniques
for additional object filtering are also presented for region queries and reverse nearest neighbor queries.
Clients can maintain any number of (query, result set, complementary set) triples and can answer a
query locally if any triple both semantically contains the new query and includes the new query point
in the containment scope that is reconstructed from the information contained within the triple.
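The client-side lookup over stored triples can be sketched as follows for circular range queries. This is an illustrative simplification under stated assumptions rather than the implementation evaluated in this work. `RangeQuery`, `Triple`, and `ClientCache` are hypothetical names, semantic containment is assumed to mean that the new radius does not exceed the stored one, and the containment scope is approximated by requiring that no complementary object falls inside the new query circle and that the locally computed result is non-empty.

```python
from dataclasses import dataclass, field
from math import dist
from typing import FrozenSet, List, Optional, Set, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class RangeQuery:
    center: Point   # query point
    radius: float   # query range


@dataclass(frozen=True)
class Triple:
    query: RangeQuery
    result: FrozenSet[Point]         # objects returned for the stored query
    complementary: FrozenSet[Point]  # locations of selected non-result objects


@dataclass
class ClientCache:
    triples: List[Triple] = field(default_factory=list)

    def store(self, triple: Triple) -> None:
        self.triples.append(triple)

    def answer(self, new: RangeQuery) -> Optional[Set[Point]]:
        # Each triple is considered separately; the first one that passes wins.
        for t in self.triples:
            # Assumed semantic containment rule: the new range is no larger.
            if new.radius > t.query.radius:
                continue
            # Simplified scope test: no complementary object inside the new circle.
            if any(dist(new.center, c) <= new.radius for c in t.complementary):
                continue
            local = {p for p in t.result if dist(new.center, p) <= new.radius}
            if local:  # the result set must remain non-empty
                return local
        return None  # no triple applies; the query would go to the server


cache = ClientCache()
cache.store(Triple(RangeQuery((0.0, 0.0), 5.0),
                   frozenset({(1.0, 0.0), (0.0, 3.0)}),
                   frozenset({(6.0, 0.0)})))
print(cache.answer(RangeQuery((1.0, 0.0), 2.0)))  # {(1.0, 0.0)}
print(cache.answer(RangeQuery((4.0, 0.0), 3.0)))  # None
```

A return value of `None` indicates that no stored triple can answer the query, so the client would fall back to the server and could store the returned triple for later reuse.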
We continue the evaluation of spatial query containment by considering the advantages, disadvantages, and applications of this novel technique.
10.2.1 Advantages
Spatial query containment was developed to address concerns with the effectiveness, performance, and
generality of existing auxiliary scope methods. The approach addresses the shortcomings of existing
algorithms through its support of the following key features:
Accuracy: Spatial query containment uses exact information about the query and underlying
dataset to ensure that queries are only answered locally if the exact result set can be constructed.
No approximate solutions are used, so accuracy is always maintained.
Precision: Spatial query containment maximizes the size of the auxiliary scope and, by extension,
the domain over which semantically contained queries can be answered locally. Existing techniques artificially restrict the size of this region by placing additional, unnecessary limitations on
result set construction or by approximating the auxiliary scope using conservative estimates. Precise auxiliary scope construction ensures that spatial query containment's attempts to eliminate
redundant queries are highly effective.
Adaptability: Spatial query containment supports three of the most popular query types:
region queries, nearest neighbor queries, and reverse nearest neighbor queries. In addition, our
framework can be deployed in diverse environments where client components reside on the same
machine as the server, on different workstations, or on embedded devices such as PDAs and smart
phones. Network communication is optimized to allow for efficient transmission of data through
both wired and wireless interfaces. Increased client independence also ensures that the spatial
query containment framework works well in the dynamic environment of mobile devices, which
suffer from limited, unreliable network connections.
Scalability: Spatial query containment incurs limited overhead in computing the containment
scope for a given query because of the lightweight design of its processing algorithms. This design emphasis allows system resources to be used sparingly and improves the scalability of the
overall system. This benefit allows organizations and corporations to support more users with
no additional capital expenditure. The integration of query processing and containment scope
computation ensures that additional disk accesses and computation time are minimized. Extensive theoretical and experimental analyses confirm the efficiency of spatial query containment and
show that it outperforms existing algorithms under the vast majority of circumstances. The total
resource savings are magnified by the effectiveness of spatial query containment, since eliminating redundant queries substantially decreases total server processing demands and bandwidth
consumption.
Effectiveness: Spatial query containment is uniquely positioned to eliminate a large number of
redundant queries with minimal overhead. Our tests indicate that the average size of containment
scope areas is substantially larger than that of other auxiliary scope approaches across all query types.
This increased size directly leads to a lower query submission rate than other approaches. However, additional design decisions further decrease the query submission rate. First, notice that we
only require that result sets of semantically contained queries be a subset of the containing query's
result set. This design conforms to the natural view that clients should answer a query locally if
they have all requisite information. Requiring exact result set matches (as some approaches do)
runs counter to this intuitive and effective principle. Furthermore, spatial query containment allows
more restrictive queries to be answered using containment scopes formed for queries that are less
restrictive by introducing the notion of semantically contained queries. This approach reinforces
the notion that adding constraints should not impact the local answerability of a query.
The effect of architecting a solution around these views is a natural, efficient, and useful auxiliary
scope framework.
Clearly, there is substantial evidence supporting the superiority and utility of spatial query containment.
10.2.2 Disadvantages
While most of our conceptual, theoretical, and experimental analysis of spatial query containment
suggests that it is the most efficient and effective auxiliary scope solution, there are several drawbacks.
Most of these concerns are minor, but this work lists them in the interest of completeness and full
disclosure:
Query Latency: Spatial query containment requires moderately more time to return query results since it must compute complementary set information in addition to result set information.
However, it is important to remember that later queries may be answered much more
rapidly, as spatial query containment may allow queries that would otherwise be submitted to the server to be processed
locally. Thus, there is a tradeoff of increased latency for server-processed queries in exchange for
the decreased latency of being able to answer queries locally in the future.
Static Data: Spatial query containment is very effective at reducing redundant query submissions
but operates under the assumption that the server dataset is static. This is a limitation shared by
all existing auxiliary scope approaches and will be addressed in future work.
Spatial Locality: The effectiveness of spatial query containment is dependent on the relative spatial
locality of the future query workload. Query sequences that exhibit low spatial locality will likely
not be eliminated by any auxiliary scope approach, including containment scope. However, the
majority of query workloads that appear in real applications should exhibit some degree of
locality.
Myopic Analysis: Spatial query containment operates by associating an auxiliary scope with each
query and considers each containment scope triple separately when processing a new query
request. In reality, it may be possible to answer a query locally by combining the result sets
of multiple queries. Future variants of this work will consider a global view of containment
scope information and look for ways of merging stored content to further reduce query submissions.
However, we note that spatial query containment can be effectively combined with traditional
global caching structures to further reduce overhead. Thus, spatial query containment and global
caching algorithms are complementary and not competing approaches.
The weaknesses of spatial query containment are shared by other auxiliary scope approaches and
represent minor shortcomings of the approach. They do not detract from the overall effectiveness of
the solution but rather offer suggestions for further refinement and improvement in the future.
10.2.3 Applications
As mentioned in Chapter 1 and confirmed in Chapters 7, 8, and 9, there are numerous applications of the
concept of spatial query containment. Three primary areas of interest include Geographic Information
Systems (GISs), Business Intelligence Systems (BISs), and Location-Based Services (LBSs). GISs
conduct complex analysis of spatial information in order to assist with identifying marketing regions,
tracking the spread of disease, or monitoring other geospatial phenomena. BISs issue a plethora of
complex queries against massive, multi-dimensional datasets in search of patterns and relationships
that will help to predict sales and future growth opportunities. Finally, LBSs offer information that is
relevant to a client's current location. Examples include restaurant recommendations, movie theatre
locations, and tourist landmarks. Each of these systems processes a large number of queries, and these
queries often are closely related. A sociologist may be conducting a careful review of shopping malls
in a certain geographic region, a business analyst may be searching for trends in the purchase of a
particular product line, or a traveler may be in search of different attractions in some town. All of
these situations represent cases in which spatial query containment can eliminate redundant queries
and improve the efficiency of data transmission and analysis.
Chapters 7 and 8 conduct theoretical and experimental analysis of spatial query containment in
relation to well-known techniques. The results for spatial query containment are impressive and give
clear confidence in its ability to process queries in a real-world environment. Containment
scope algorithms outperformed existing techniques in almost every usage scenario. Beyond reducing
bandwidth and query submission rate, the spatial query containment framework often had a beneficial
impact on server disk accesses and cumulative execution time because of the server's reduced workload.
Chapter 9 took our analysis one step further towards a working product by constructing a functional
auxiliary scope simulator that supports containment scope in addition to a variety of other auxiliary
scope algorithms. Our simulator works on both desktop and mobile platforms and clearly illustrates both
the effectiveness of spatial query containment and the practicality of a containment scope
solution in industrial applications. The solution proposed in this thesis is unparalleled in its support of
operating environments and its ability to positively impact overall system performance.
10.3 Final Thoughts
In conclusion, spatial query containment offers a novel technique for reducing redundant spatial queries
in a variety of environments. The approach is both highly effective in its elimination of unnecessary
queries and particularly efficient, with minimal client overhead, server overhead, and communication
cost. Spatial query containment supports many query types and produces a very large auxiliary scope
that can be applied to any semantically contained query. Theoretical, experimental, and simulation
results unanimously indicate the superior performance of spatial query containment over other auxiliary
scope methods and suggest that our solution is a very effective method for reducing client query
submission rate. Our studies also indicate the viability of deploying spatial query containment in
real-world applications.
There are several avenues of future work to pursue in the area of spatial query containment. First, additional work could be done on client management of multiple containment scope triples. The optimization
of such an environment with respect to result reuse, cache utilization, and processing time is of great
concern to the overall effectiveness of the framework and could vary based on the environment and
query workload. A second extension to this work is to provide support for containment scope processing with dynamic datasets. That is, we wish to consider how to deal with situations in which the
server dataset is constantly changing. Here, the construction of an invalidation scheme for stored client
results is important. Third, it may be possible to refine the spatial query containment model to allow
for tradeoffs between query result accuracy and resource conservation. Finally, there is great promise in
extending spatial query containment to support non-traditional spatial queries such as density queries
or spatial network queries.
Bibliography
[1] J. L. Bentley and J. H. Friedman, Data structures for range searching, ACM Comput. Surv., vol. 11,
no. 4, pp. 397409, 1979.
[2] G. M. MORTON, A computer oriented geodetic data base and a new technique in file sequencing.
IBM Ltd, 1966.
[3] A. R. Butz, Alternative algorithm for hilberts space-filling curve, IEEE Trans. Comput., vol. 20,
no. 4, pp. 424426, 1971.
[4] Y. Manolopoulos, A. Nanopoulos, A. Papadopoulos, and Y. Theodoridis, Rtrees: Theory and Applications. Springer, 2005.
[5] A. Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching. in Proceedings of Annual
Meeting, ACM SIGMOD84, Boston, MA, USA June 18-21, 1984, pp. 4757.
[6] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, The r*-tree: an efficient and robust
access method for points and rectangles, in SIGMOD 90: Proceedings of the 1990 ACM SIGMOD
international conference on Management of data. New York, NY, USA: ACM Press, 1990, pp. 322331.
[7] G. R. Hjaltason and H. Samet, Distance browsing in spatial databases, ACM Transactionson
Database Systems, vol. 24, no. 2, pp. 265318, 1999.
[8] F. Korn and S. Muthukrishnan, Influence sets based on reverse nearest neighbor queries, SIGMOD Rec., vol. 29, no. 2, pp. 201212, 2000.
[9] Y. Tao, D. Papadias, and X. Lian, Reverse knn search in arbitrary dimensionality, in VLDB 04:
Proceedings of the Thirtieth international conference on Very large data bases. VLDB Endowment, 2004,
pp. 744755.
[10] K. C. K. Lee, B. Zheng, and W.-C. Lee, Ranked reverse nearest neighbor search, IEEE Trans. on
Knowl. and Data Eng., vol. 20, no. 7, pp. 894910, 2008.
[11] I. Stanoi, D. Agrawal, and A. E. Abbadi, Reverse nearest neighbor queries for dynamic databases,
in In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000, pp.
4453.
[12] Y. Tao and D. Papadias, Time-Parameterized Queries in Spatio-Temporal Databases. in Proceedings of the 2002 ACM SIGMOD Conference on Management of Data (SIGMOD02), Madison, Wisconsin,
Jun 3-6, 2002, pp. 334345.
160
[13] S. Dar, M. J. Franklin, B. T. Jonsson,
D. Srivastava, and M. Tan, Semantic Data Caching and
Replacement. in Proceedings of 22th International Conference on Very Large Data Bases (VLDB96),
Mumbai (Bombay), India, Sep 3-6, 1996, pp. 330341.
[14] K. C. K. Lee, H. V. Leong, and A. Si, Semantic Query Caching in a Mobile Environment, ACM
SIGMOBILE Mobile Computing and Communications Review (MC2R), vol. 3, no. 2, pp. 2836, 1999.
[15] Q. Ren, M. H. Dunham, and V. Kumar, Semantic Caching and Query Processing, IEEE Transactions
on Knowledge and Data Engineering (TKDE), vol. 15, no. 1, pp. 192210, 2003.
[16] J. Zhang, M. Zhu, D. Papadias, Y. Tao, and D. L. Lee, Location-based Spatial Queries. in Proceedings of the 2003 ACM SIGMOD Conference on Management of Data (SIGMOD03), San Diego, CA,
USA, Jun 9-12, 2003, pp. 443454.
[17] K. C. K. Lee, J. Schiman, B. Zheng, and W.-C. Lee, Valid Scope Computation for LocationDependent Spatial Query in Mobile Broadcast Environments, in Proceedings of the 17th International
Conference on Information and Knowledge Management (CIKM), Napa Valley, CA, USA Oct 26-30, 2008,
p. to appear.
[18] Z. Song and N. Roussopoulos, K-Nearest Neighbor Search for Moving Query Point. in Proceedings
of 7th Symposium on Advances in Spatial and Temporal Databases, (SSTD01), Redondo Beach, CA, USA,
Jul 12-15, 2001, pp. 7996.
[19] M. de Berg, M. van Kreveld, M. Overmas, and O. Schwarzkopf, Computational Geometry Algorithms
and Applications. Springer, 2000.
[20] B. Zheng and D. L. Lee, Semantic Caching in Location-Dependent Query Processing. in Proceedings of 7th Symposium on Advances in Spatial and Temporal Databases, (SSTD01), Redondo Beach, CA,
USA, Jul 12-15, 2001, pp. 97116.
[21] K. C. K. Lee, W.-C. Lee, B. Zheng, and J. Xu, Caching Complementary Space for Location-Based
Services, in Proceedings of the 10th International Conference on Extending Database Technology (EDBT),
Munich, Germany, Mar 26-31, 2006, pp. 10201038.
[22] G. R. Hjaltason and H. Samet, Distance Browsing in Spatial Databases, ACM Transactions on
Database Systems (TODS), vol. 24, no. 2, pp. 265318, 1999.
[23] N. Roussopoulos, S. Kelley, and F. Vincent, Nearest Neighbor Queries, in Proceedings of the 1995
ACM SIGMOD Conference on Management of Data (SIGMOD95), San Jose, California, May 22-25,
1995, pp. 7179.
[24] Y. Theodoridis, E. Stefanakis, and T. K. Sellis, Efficient Cost Models for Spatial Queries Using
R-Trees, IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 1, pp. 1932, 2000.
[25] Y. Tao, D. Papadias, and X. Lian, Reverse knn search in arbitrary dimensionality, in VLDB 04:
Proceedings of the Thirtieth international conference on Very large data bases. VLDB Endowment, 2004,
pp. 744755.
[26] U. C. Bureau, Topologically Integrated Geographic Encoding and Referencing System TIGER/Line.
[27] Y. Theodoridis and T. Sellis, A model for the prediction of r-tree performance, in PODS 96:
Proceedings of the Fifteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database
Systems. New York, NY, USA: ACM, 1996, pp. 161171.
