
International Journal of Geographical Information Science

ISSN: 1365-8816 (Print) 1362-3087 (Online) Journal homepage: http://www.tandfonline.com/loi/tgis20

A polygon-based approach for matching OpenStreetMap road networks with regional
transit authority data
Hongchao Fan, Bisheng Yang, Alexander Zipf & Adam Rousell
To cite this article: Hongchao Fan, Bisheng Yang, Alexander Zipf & Adam Rousell (2015):
A polygon-based approach for matching OpenStreetMap road networks with regional
transit authority data, International Journal of Geographical Information Science, DOI:
10.1080/13658816.2015.1100732
To link to this article: http://dx.doi.org/10.1080/13658816.2015.1100732

Published online: 08 Nov 2015.


Full Terms & Conditions of access and use can be found at


http://www.tandfonline.com/action/journalInformation?journalCode=tgis20
Download by: [Monash University Library]

Date: 10 November 2015, At: 06:06

Hongchao Fan (a), Bisheng Yang (b), Alexander Zipf (a) and Adam Rousell (a)
(a) Chair of GIScience, University of Heidelberg, Heidelberg, Germany; (b) State Key Laboratory of Information
Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan, PR China

ABSTRACT

Matching road networks is an essential step for data enrichment and data quality
assessment, among other processes. Conventionally, road networks from two datasets are
matched using a line-based approach that checks for the similarity of properties of line
segments. In this article, a polygon-based approach is proposed to match the
OpenStreetMap road network with authority data. The algorithm first extracts urban
blocks, which are central elements of urban planning and are represented by polygons
enclosed by their surrounding streets, and it then assigns road lines to edges of urban
blocks by checking their topologies. In the matching process, polygons of urban blocks
are matched in the first step by checking for overlapping areas. In the second step, edges
of a matched urban block pair are further matched with each other. Road lines that are
assigned to the same matched pair of urban block edges are then matched with each other.
The computational cost is substantially reduced because the proposed approach matches
polygons instead of road lines, and thus, the process of matching is accelerated.
Experiments on Heidelberg and Shanghai datasets show that the proposed approach achieves
good and robust matching results, with a precision higher than 96% and an F1-score
better than 90%.

ARTICLE HISTORY
Received 17 September 2014
Accepted 18 September 2015

KEYWORDS
Map matching; OpenStreetMap; road network; polygon similarity

1. Introduction
The use of Web 2.0 technology enables social media users to make contributions and
communicate with each other. Among the various pieces of information that users
contribute and share on social media, the geographic type is called Volunteered
Geographic Information (VGI, Goodchild 2007). Currently, OpenStreetMap (OSM) is
considered to be one of the most successful and popular VGI projects, and it has a
global cast of volunteers. There are now more than two million registered members
(OSM 2015), which has led OSM to grow rapidly.
Because the data are collected through crowd-sourcing, OSM has often been criticised
for its heterogeneous quality since the beginning of its development, and it needs to be
evaluated by comparison with authority data. In 2008, Haklay conducted a first analysis
that investigated the data quality of roads in OSM for England (Haklay 2010). This first
approach was followed by publications on OSM in Germany (Zielstra and Zipf 2010, Neis
et al. 2012) and France (Girres and Touya 2010). More detailed investigations about point
(Neis et al. 2010), line (Helbich et al. 2012) and polygon (Mooney et al. 2010) objects
can be found in the project's database. As mentioned by Hagenauer and Helbich (2012) and
Ludwig et al. (2011), nearly all empirical studies indicate that urban areas are better
mapped in OSM. This is not surprising, as most urban areas with a higher population
density have larger numbers of contributors, who influence the quantity and quality of
the collaboratively crowd-sourced OSM objects (Girres and Touya 2010, Haklay et al. 2010,
Neis et al. 2012). The most recent work on matching OSM road networks with authority data
was proposed by Koukoletsos et al. (2012). Furthermore, a comprehensive review of the
assessment of OpenStreetMap data can be found in the work of Singh Sehra et al. (2013).

CONTACT Hongchao Fan, hongchao.fan@geog.uni-heidelberg.de
© 2015 Taylor & Francis

In most of the existing studies, the OSM road network is assessed using reference
data, which are usually professional or authority data. The road lines in the authority
dataset are usually extracted from aerial photographs and (digital) orthophotos and
have a positional accuracy of ±3 m. In these works, the OSM road network is matched
with reference data by treating roads as line segments. Similar to most existing
approaches to map matching, features (e.g., distances, angles, shapes and semantics) or
structures (e.g., sub-graphs and proximity graphs) are detected or extracted to measure
the similarity of road networks (Samal et al. 2004, Xiong and Sperling 2004, Volz 2006,
Min et al. 2007, Mustière and Devogele 2007, Olteanu and Mustière 2008, Zhang 2009,
Kim et al. 2010, Li and Goodchild 2011). Most recently, Yang et al. (2013) proposed a
heuristic probability relaxation approach for matching road networks. Their process
builds an initial probabilistic matrix according to the dissimilarities in the shapes and
then integrates the relative compatibility coefficient of neighbouring candidate pairs to
iteratively update the initial probabilistic matrix until it is globally consistent. The object
correspondences are then determined based on the probabilities.
The aforementioned approaches concentrate on matching line segments. This type of
approach is, however, very time consuming because of the large number of line
segments both in OSM and in the reference dataset. In this article, a novel polygon-based
approach is presented to match the OSM road network with reference data. The
idea is rooted in the quality assessment of building footprint data in OSM presented by
Kunze (2012) and Fan et al. (2014). In these studies, the positional accuracy of building
footprints in OSM is 3–5 m, on average, which indicates that building footprints in OSM
almost overlap with their corresponding elements in the reference dataset. From
this point of view, the urban blocks, in which streets enclose the buildings, should
also be almost identical in the two datasets. If the urban blocks can be matched without
a large computational cost, the street network that forms the urban blocks can then be
matched automatically. The advantage of polygon-based matching is that it can reduce
the computational cost. Figure 1 illustrates an example. Whereas 17 line segments are
involved in the line segment-based matching approach, only two pairs of polygons must
be matched in the polygon-based matching approach. The computational cost is
substantially reduced because the number of computational units is greatly reduced.
In contrast to directly matching road line segments, the polygon-based approach
involves an indirect matching. Polygons of urban blocks must be extracted from the
street networks both in OSM and in authority datasets. Then, the correspondences
among line segments of roads and polygons of urban blocks have to be established,
though only once. The data are then ready for arbitrary matching with other data. In this
work, the polygon-based approach is composed of two stages: a pre-processing stage
for extracting polygons of urban blocks and assigning road lines to urban blocks and a
stage of matching road line segments based on the matching of urban blocks.

Figure 1. Line segment-based map matching vs. polygon-based map matching.

As shown in Figure 2, the proposed approach starts with a pre-process to extract
urban blocks from both OSM and reference data and assign line segments of roads to
edges of urban blocks. In the matching process, polygons of urban blocks are matched
initially. For a matched urban block pair, if line segments from two datasets are assigned
to the same edge of the urban block, they are then matched to each other. The highlight
of the proposed method is that it matches urban blocks formed by a street network
instead of matching line segments of a street network. Thus, the computational cost is
reduced substantially.
The remainder of this article is structured as follows: Section 2 presents the
pre-processing, in which line segments are assigned to edges of polygons of urban blocks;
Section 3 describes the matching process; Section 4 presents the experimental results and
their evaluation; and Section 5 concludes the work and outlines future work.

Figure 2. The workflow of the polygon-based matching approach.

2. Pre-processing
In the pre-processing task, urban blocks are extracted from the road network initially. At
the same time, line segments of roads are split into small line segments where they
intersect with other roads (also called crossing points). These smaller line segments are
assigned to edges of polygons of urban blocks.


2.1. Extraction of urban blocks from the road network


An urban block is usually defined as the smallest area surrounded by roads. In some
cases, an urban block can be partly enclosed by line segments of a river, hill, or other
human-made structures. In this work, the second type of urban block is neglected, as it
is not often observed in reality. The process of extracting urban blocks from a road
network consists of three steps. First, road line segments are split into small line
segments where they intersect with other roads by using the tool 'Planarize Lines' of
ESRI ArcGIS (version 10.1). Second, polygons are formed as areas enclosed by the line
segments using the tool 'Feature To Polygon' of ESRI ArcGIS. Third, urban blocks are
extracted from these polygons, as described in the following.
The polygons generated above can be categorised into two classes: polygons as urban
blocks and polygons as road areas formed by multi-lane roads or functional road lines close
to crossroads. Theoretically, these two types of polygons can be distinguished without
much effort because polygons of urban blocks contain buildings and/or other city
constructions, whereas the others do not. In practice, however, building data in OSM have
low completeness in many regions, and building data from the authority dataset are
normally unavailable or can only be obtained at extra cost. Therefore, in this work, the
polygons of the second class are detected and removed using their characteristics in terms
of shapes and sizes. The remaining polygons are then urban blocks.
In general, polygons of urban blocks are large because they contain either a number
of buildings or other constructions, whereas the polygons formed by ramps and road
line segments at a traffic junction are normally small. Moreover, the polygons resulting
from line segments of multi-lanes are normally long and thin. In this work, the
algorithms proposed by Li et al. (2014) are applied to detect long and thin polygons using a
support vector machine (SVM) and to detect polygons at traffic junctions as intermediate
connections of the polygons formed by multi-lane road lines.
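
To illustrate why shape and size can separate the two polygon classes, the following minimal sketch computes the area and an isoperimetric compactness value of a polygon. It is not the SVM-based detector of Li et al. (2014); the function names and the example coordinates are assumptions chosen only to show that a long, thin polygon between two carriageways scores far lower than a block-like polygon.

```python
import math

def polygon_area(coords):
    """Shoelace area of a simple polygon given as [(x, y), ...] (ring not closed)."""
    n = len(coords)
    s = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def polygon_perimeter(coords):
    """Sum of the edge lengths of the polygon ring."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))

def compactness(coords):
    """Isoperimetric quotient 4*pi*A / P^2: 1 for a circle, close to 0 for slivers."""
    p = polygon_perimeter(coords)
    return 4.0 * math.pi * polygon_area(coords) / (p * p)

# Hypothetical coordinates in metres (assumptions, not taken from the datasets).
block = [(0, 0), (120, 0), (120, 100), (0, 100)]   # block-like polygon
sliver = [(0, 0), (300, 0), (300, 6), (0, 6)]      # gap between two carriageways
```

Descriptors of this kind (area, compactness) are one plausible input for a classifier; which features the cited method actually uses is not stated here.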

2.2. Assigning line segments of roads to edges of an urban block


Before describing the proposed method, the term 'edge', as it relates to urban block
polygons, must be defined. Different from the traditional definition, where an edge is
the line segment between two vertices of a polygon, an edge of an urban block polygon
is defined as a polyline between two intersections of roads. Based on this definition,
road line segments are assigned to edges of urban block polygons. For this purpose, the
topologies among road line segments and urban block polygons should be identified by
(i) calculating the distance from road line segments to edges of urban polygons and
(ii) checking whether road line segments are parallel to edges of urban polygons.
Figure 3(a) explains the algorithm using an example. We suppose here that $L_A$ is a
road line segment and $L_B$ is an edge of an urban block polygon.
In the first step, $L_A$ is converted into a sequence of points $(P_{a,1}, \ldots, P_{a,n})$
with small and equal intervals. For point $P_{a,i}$, its foot point $P^{\perp}_{a,i}$
perpendicular to $L_B$ is calculated. If $P^{\perp}_{a,i}$ is located on $L_B$, the distance
from the point to $L_B$ is calculated as the Euclidean distance between $P_{a,i}$ and
$P^{\perp}_{a,i}$, namely, $d_i(P_{a,i}, P^{\perp}_{a,i})$. Assume that there are $k$ points
$(P_{a,i}, \ldots, P_{a,m})$ ($k = m - i + 1$; $i \ge 1$ and $m \le n$) on $L_A$ that have
foot points $(P^{\perp}_{a,i}, \ldots, P^{\perp}_{a,m})$ perpendicular to $L_B$. The distance
between $L_A$ and $L_B$ is then the RMS (root mean square) of $(d_i, \ldots, d_m)$, denoted
$d_{rms}$. The root mean square error (RMSE) $d_{rmse}$ is then used to evaluate the
parallelism of $L_A$ and $L_B$. In this work, the interval between two consecutive points
is set at 0.5 m, which is sufficiently small compared with the length of a road line
segment in the physical world and thus ensures that the RMSE can be used to evaluate
parallelism.
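
The sampling and projection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: polylines are lists of planar (x, y) coordinates in metres, the function names are hypothetical, and $d_{rmse}$ is interpreted here as the root mean square deviation of the distances from their mean, which the text does not spell out.

```python
import math

def sample_polyline(line, step=0.5):
    """Return points spaced `step` metres apart along a polyline [(x, y), ...]."""
    points = [line[0]]
    carry = 0.0  # distance travelled since the last emitted point
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        seg = math.hypot(x2 - x1, y2 - y1)
        pos = step - carry
        while pos <= seg:
            t = pos / seg
            points.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
            pos += step
        carry = seg - (pos - step)
    return points

def foot_point(p, line):
    """Nearest perpendicular foot of p on `line` as (foot, distance), or None."""
    best = None
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        len2 = dx * dx + dy * dy
        if len2 == 0.0:
            continue
        t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / len2
        if 0.0 <= t <= 1.0:  # the foot lies on this segment
            f = (x1 + t * dx, y1 + t * dy)
            d = math.dist(p, f)
            if best is None or d < best[1]:
                best = (f, d)
    return best

def distance_and_parallelism(line_a, line_b, step=0.5):
    """Return (k, d_rms, d_rmse) for sampled points of line_a with a foot on line_b."""
    dists = []
    for p in sample_polyline(line_a, step):
        hit = foot_point(p, line_b)
        if hit is not None:
            dists.append(hit[1])
    k = len(dists)
    if k == 0:
        return 0, None, None
    d_rms = math.sqrt(sum(d * d for d in dists) / k)
    mean = sum(dists) / k
    # Assumption: RMSE is taken as the spread of the distances around their mean.
    d_rmse = math.sqrt(sum((d - mean) ** 2 for d in dists) / k)
    return k, d_rms, d_rmse
```

Under this interpretation, two parallel lines give a constant distance and therefore a $d_{rmse}$ of zero, whereas a line running away from the edge gives a large $d_{rmse}$.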
In the case of $k \le 1$, there is no relation between $L_A$ and $L_B$. Otherwise, if
$k > 1$, the length of the line segment from $P^{\perp}_{a,i}$ to $P^{\perp}_{a,m}$ is
calculated as $S^{\perp}_{bk}$, and the length of the line segment from $P_{a,i}$ to
$P_{a,m}$ is calculated as $S_{ak}$. Then, the relation between $L_A$ and $L_B$ can be
identified as follows:
(1) If $S^{\perp}_{bk} \le 1$ m, the perpendicular foot points are located densely together;
thus, $L_A$ should be perpendicular to $L_B$. In this case, $L_A$ will not be assigned to $L_B$.
(2) If polyline $(P_{a,i}, \ldots, P_{a,m})$ is only a small part of $L_A$ and polyline
$(P^{\perp}_{a,i}, \ldots, P^{\perp}_{a,m})$ is also only a small part of $L_B$, $L_A$ will not
be assigned to $L_B$. In the presented work, the threshold is empirically set to 20%
because the main parts may differ much from each other.
(3) Otherwise, $L_A$ should be parallel ($d_{rmse}$ is very small) or quasi-parallel
($d_{rmse}$ is smaller than a given threshold) to $L_B$. Then, the distance between the two
polylines, $d_{rms}$, will be checked. Normally, there is a maximum of three to five lanes
of a road in one direction, and a lane is 3–4 m wide. The threshold is set at 20 m (five
lanes × 4 m/lane) so that all of the line segments of the multi-lane streets can be
assigned to the urban block polygon. If $d_{rms} \le 20$ m, $L_A$ will be assigned to $L_B$.
It should be noted that there are two special cases. First, (part of) $L_A$ overlaps with
(part of) $L_B$. In this case, the $k$ points $(P_{a,i}, \ldots, P_{a,m})$ should be located on
$L_B$. In fact, there is always a polyline that overlaps with an edge of an urban block
polygon because a closed-area polygon is formed by road polylines. Then, $L_A$ will be
assigned to $L_B$ directly. Second, (part of) $L_A$ is located within the urban block
polygon. In this case, $L_A$ will not be assigned to any edge of the urban block polygon.
In practical experiments, these situations are checked first, which avoids calculating the
other parameters and identifying cases (1)–(3).
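
The decision rules above can be summarised in a small function. This is a sketch only: the function and parameter names are hypothetical, the two special cases (overlap and location inside the block) are assumed to have been handled beforehand, and `max_rmse` is a placeholder because no numerical value for the quasi-parallel threshold is given in the text.

```python
def assign_to_edge(k, s_b, s_a, len_a, len_b, d_rms, d_rmse,
                   min_foot_span=1.0, min_share=0.20,
                   max_rmse=2.0, max_distance=20.0):
    """Decide whether road line L_A is assigned to block edge L_B.

    k       -- number of sampled points on L_A with a foot point on L_B
    s_b     -- length spanned by the foot points on L_B (S_bk^perp), metres
    s_a     -- length spanned by the corresponding points on L_A (S_ak), metres
    len_a, len_b -- total lengths of L_A and L_B, metres
    max_rmse is a hypothetical quasi-parallel threshold (not from the paper).
    Returns (assigned, reason).
    """
    if k <= 1:
        return False, "no relation"
    if s_b <= min_foot_span:
        return False, "perpendicular"          # rule (1)
    if s_a < min_share * len_a and s_b < min_share * len_b:
        return False, "only a small part"      # rule (2)
    if d_rmse > max_rmse:
        return False, "not parallel"           # fails the quasi-parallel test
    if d_rms <= max_distance:
        return True, "assigned"                # rule (3)
    return False, "too far"
```

For example, a road line that runs parallel to an edge at a distance of 5 m over its full length is assigned, whereas the same line at a distance of 30 m is rejected as too far.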
Figure 3(b) demonstrates various relations between road lines and an urban block
polygon. The urban block polygon $UBlock_A$ has four edges ($L_1$, $L_2$, $L_3$ and $L_4$).
In this example, road lines 5, 6 and 7 are assigned to edge $L_1$ because road line 7
overlaps edge $L_1$, road line 5 is parallel to edge $L_1$, and part of road line 6 is
parallel to edge $L_1$. A small part of road lines 2, 3 and 4 is parallel to edge $L_1$.
However, this part is shorter than 20% of the total length of the road lines. Therefore,
these three road lines are not assigned to edge $L_1$. A part of road line 8 is
quasi-parallel to part of edge $L_1$. However, this part is longer than the given
threshold. Therefore, it is not assigned to edge $L_1$. Road line 9 is located within the
urban block. Thus, it will not be assigned to any edge of the urban block polygon. Road
line 10 is perpendicular to edge $L_4$. Although road line 1 is not perpendicular to edge
$L_4$, the $d_{rmse}$ is smaller than the given threshold. As a result, these two road lines
will not be assigned to edge $L_4$. Road line 11 is parallel and close to edge $L_4$ and
will be assigned to edge $L_4$.

Figure 3. Assigning road line segments to edges of an urban block polygon: (a) the distance
and parallelism between two polylines and (b) relations of polylines to an urban block.

3. Polygon-based map matching


Polygon-based map matching involves two steps. First, urban blocks extracted from
OSM are matched with those extracted from the authority road network. Second, the
edges of a matched urban block pair are further matched. If two edges of an urban
block pair are matched, the road lines assigned to them are then matched to each other.

3.1. Matching urban blocks using the overlapped area


In the work of Fan et al. (2014), building footprints in OSM are matched with authority
building footprints by using the overlapped area between building footprints. Their work
shows that OSM building footprints almost overlap with authority building footprints
because there is only a small offset of 3–4 m, on average, between building footprint
pairs. Based on this fact, it can be deduced that the urban blocks in OSM also nearly
overlap with those from the authority data, as they are outlines of a group of building
footprints. Therefore, checking for overlapping areas can also be used to match urban
blocks. In contrast to the approach used by Fan et al. (2014), the threshold of overlapping
is adjusted to 50% because it has already been shown in Fan et al. (2014) that there is
only a small offset between the OSM and authority polygons.

Let $G_{osm}$ be the OSM dataset and $G_{aut}$ be the authority dataset. For urban block
$UBlock_{osm\_i}$ in $G_{osm}$, the urban blocks in $G_{aut}$ are checked to determine whether
they intersect with the outline of $UBlock_{osm\_i}$. In the case that there is an
intersection with $UBlock_{aut\_j}$, the intersected area is first calculated as
$Area_{overlap}$. If the overlap ratio calculated using Equation (1) is larger than 50%,
then urban blocks $UBlock_{osm\_i}$ and $UBlock_{aut\_j}$ are matched.

$$Ratio_{overlap} = \frac{Area_{overlap}}{\min\left(Area(UBlock_{osm\_i}),\ Area(UBlock_{aut\_j})\right)} \tag{1}$$
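
As a minimal sketch of Equation (1), the example below uses axis-aligned rectangles `(xmin, ymin, xmax, ymax)` as stand-ins for urban block polygons so that the intersection area can be computed without a geometry library; real urban blocks would require a general polygon intersection routine. The function names and the coordinates are assumptions made for illustration.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = r
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def rect_intersection_area(a, b):
    """Area shared by two axis-aligned rectangles (0.0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def overlap_ratio(a, b):
    """Equation (1): overlapped area divided by the smaller of the two areas."""
    smaller = min(rect_area(a), rect_area(b))
    return rect_intersection_area(a, b) / smaller if smaller > 0 else 0.0

def blocks_match(a, b, threshold=0.5):
    """Two blocks are matched if the overlap ratio exceeds the 50% threshold."""
    return overlap_ratio(a, b) > threshold

# Hypothetical blocks in metres: the authority block is shifted by a few metres.
osm_block = (0.0, 0.0, 100.0, 80.0)
aut_block = (4.0, 3.0, 104.0, 83.0)
neighbour = (100.0, 0.0, 200.0, 80.0)  # adjacent block sharing only a border
```

Dividing by the smaller area means that a small block lying entirely inside a larger one still yields a ratio of 1, which is what allows 1:n relations to be detected.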

Theoretically, there are six types of relations that can be obtained when matching urban
blocks, namely, 1:1, 1:0, 0:1, 1:n, n:1, and n:m. A 1:1 relation (Figure 4(a)) is obtained
when an urban block in $G_{aut}$ can only be matched to one urban block in $G_{osm}$,
whereas a 0:1 or 1:0 relation indicates the case that the urban block cannot be matched to
any block in the other dataset. If an urban block in $G_{osm}$ can be matched with many
urban blocks in $G_{aut}$, a 1:n or n:m relation could be obtained. In this case, the
matching results will be checked in an inverse manner, namely, for all of the n urban
blocks in $G_{aut}$, their matched urban blocks in $G_{osm}$ are identified using
Equation (1). If all of these n urban blocks are matched to the same urban block in
$G_{osm}$, it is considered to be a 1:n relation (Figure 4(c)). Otherwise, if these n urban
blocks are matched to more than one urban block in $G_{osm}$, it is then considered to be
an n:m relation (Figure 4(b)).
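
The forward and inverse checks can be sketched as a grouping of matched block pairs. The code below is an illustrative assumption rather than the published procedure: block identifiers are arbitrary strings, `pairs` is assumed to hold every (OSM, authority) pair whose overlap ratio exceeds 50%, and relation labels are written in the order OSM:authority.

```python
from collections import defaultdict

def classify_relations(pairs, osm_ids, aut_ids):
    """Group matched (osm, aut) pairs and label each group 1:1, 1:n, n:1, n:m, 1:0 or 0:1."""
    osm_to_aut = defaultdict(set)
    aut_to_osm = defaultdict(set)
    for o, a in pairs:
        osm_to_aut[o].add(a)
        aut_to_osm[a].add(o)

    relations = []
    seen_osm = set()
    for start in osm_ids:
        if start in seen_osm:
            continue
        if start not in osm_to_aut:
            relations.append(("1:0", {start}, set()))
            seen_osm.add(start)
            continue
        # Expand alternately between the two datasets (forward, then inverse check).
        osm_grp, aut_grp, frontier = {start}, set(), [start]
        while frontier:
            o = frontier.pop()
            for a in osm_to_aut[o]:
                if a not in aut_grp:
                    aut_grp.add(a)
                    for o2 in aut_to_osm[a]:
                        if o2 not in osm_grp:
                            osm_grp.add(o2)
                            frontier.append(o2)
        seen_osm |= osm_grp
        n, m = len(osm_grp), len(aut_grp)
        if n == 1 and m == 1:
            label = "1:1"
        elif n == 1:
            label = "1:n"
        elif m == 1:
            label = "n:1"
        else:
            label = "n:m"
        relations.append((label, osm_grp, aut_grp))
    for a in aut_ids:
        if a not in aut_to_osm:
            relations.append(("0:1", set(), {a}))
    return relations
```

Each returned tuple holds the relation label together with the OSM and authority block identifiers that belong to the group.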

3.2. Matching edges of an urban block pair


In the next step, edges of matched urban blocks will be matched with each other. In
most cases, urban blocks are matched with a 1:1 relation. There are also cases of 1:n and
n:1 relations, when there is updating in one dataset but no updates in the other one.
There might also be an n:m relation if the two datasets were updated at different points
in time or with different qualities of completeness. In this case, an n:m relation can be
decomposed into several 1:n and 1:1 relations. For example, Figure 4(b) presents a 2:3
relation. To decompose this 2:3 relation for both of the OSM polygons, the ratio of the
overlapped area is calculated for the three polygons in the authority data. It is obvious
that the polygon of $UBlock_{osm\_1}$ is matched with $UBlock_{aut\_1}$ and that
$UBlock_{osm\_2}$ is matched with both $UBlock_{aut\_2}$ and $UBlock_{aut\_3}$. For this
reason, the algorithm for edge matching is described using examples of only 1:1 and 1:n
relations.

Figure 4. Possible relations obtained using the polygon matching.

For a 1:1 matching result, an edge of an urban block in OSM is compared with all edges
of the matched urban block in the authority data with respect to (i) parallelism and
(ii) distance, using the method described in Section 2.2. An edge is matched with an edge
of the matched urban block if the distance between the two edges is smaller than a given
threshold and the two edges are (quasi) parallel. The threshold can be derived from the
scenario in which a wide road with 10 lanes (five lanes in each direction) is represented
as a single line (the centre line of the road) in one dataset while being represented
with several lines (the centre line of each lane) in each direction in the other dataset.
In this case, the road has a width of approximately 40 m (10 lanes × 4 m/lane). The
polygons formed by lanes are removed using the method described in Section 2.1. For each
direction, only one line segment is kept and forms urban blocks with other road lines.
The distance between the single centre line and one of these two line segments is then
taken as the threshold, which is approximately 20 m. Considering the offset between the
OSM data and authority data, the threshold is then set at 25 m (20 + 5 m).
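
The arithmetic behind this threshold can be written out as a short worked derivation. The symbols $w_{road}$, $d_{centre}$ and $d_{edge}$ are introduced here for illustration only and do not appear in the original text.

```latex
\begin{align}
w_{road}   &= 10\ \text{lanes} \times 4\ \text{m/lane} = 40\ \text{m}, \\
d_{centre} &\approx \tfrac{1}{2}\, w_{road} = 20\ \text{m}, \\
d_{edge}   &= d_{centre} + 5\ \text{m (offset between OSM and authority data)} = 25\ \text{m}.
\end{align}
```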
Figure 4(a) shows an example of a very common situation in urban block matching:
there is overlapping along some edges, and there is an offset along other edges. The four
edges of the matched urban block can be matched easily because they are almost
overlapped or quasi-parallel and close to each other. Figure 4(b) shows an example
where a 1:n relation is decomposed into n matched pairs with a 1:1 relation. For each
matched pair, the edges are matched using the abovementioned method. In the example
of Figure 4(c), the OSM urban block (blue line) is matched with two authority urban blocks
(red lines). In this work, the urban block above and below can be considered to be two
examples of 1:1 matching with the OSM urban block. For the urban block above, its three
edges ($L_{b1}$, $L_{b2}$, $L_{b3}$) are matched with the edges ($L_{a1}$, $L_{a2}$, $L_{a3}$),
respectively, whereas the three edges ($L_{c1}$, $L_{c3}$, $L_{c4}$) of the urban block below
are matched with the edges ($L_{a1}$, $L_{a3}$, $L_{a4}$), respectively. Edges $L_{b4}$ and
$L_{c2}$ might be quasi-parallel to edges $L_{a2}$ and $L_{a4}$, but the distances between
them are too large; therefore, they cannot be matched.
The real dataset contains cases that are more complicated than the examples shown
in Figure 4. An urban block in OSM can be matched with three or even more urban
blocks in the authority data; conversely, an urban block in the authority data can be
matched with more than two OSM urban blocks. These cases will be solved using the
same methods used for the example presented in Figure 4(c).

4. Experiments and evaluation


The Heidelberg and Shanghai datasets are used to test the proposed algorithm with a
simple road network structure in a small city and with a complex road network structure
in a large metropolis, respectively. The OSM datasets of road networks were downloaded
from Geofabrik (http://www.geofabrik.de/data/download.html) in April 2015. Data from the
authority road network in Heidelberg, ATKIS (German Authority Topographic-Cartographic
Information System), for the year 2012 were obtained. The authority road network in
Shanghai was provided by the Shanghai Bureau of Surveying and Mapping. Table 1 lists a
detailed description of the datasets.

Table 1. Statistical description of road networks in the test area.

| Test area  | Spatial extent  | Source | Number of roads | Total road length (km) |
|------------|-----------------|--------|-----------------|------------------------|
| Heidelberg | 12 × 15 km      | OSM    | 12,208          | 1,724.74               |
| Heidelberg | 12 × 15 km      | ATKIS  | 5,945           | 601.58                 |
| Shanghai   | 142 km × 175 km | OSM    | 107,438         | 25,454.96              |
| Shanghai   | 142 km × 175 km | SHSMI  | 61,559          | 29,638.15              |

4.1. Matching results


Figures 5 and 6 show the matching results of two regions selected for demonstration in
Heidelberg and Shanghai, respectively. Heidelberg, Germany, is a small town with a
relatively simply structured road network. There are only a few arterial roads, and they
have two to three lanes in each direction. Most of the roads have only one lane in each
direction. In OSM, only a few arterial roads are mapped with multi-lanes, whereas most
of the roads are mapped with single lines. In the authority dataset, all of the roads are
recorded with single lines. For this reason, the experiment yielded good results for
Heidelberg.

Figure 5. Matching results of a selected region in Heidelberg: (a) a typical case of matching roads in
a residential area and (b) correct matching of authority roads with multi-lane OSM roads.

Figure 6. Matching results for a region over Huangpu River in Shanghai: (a) the overall matching
results in the region and (b) correct matching of a complicated turn with multi-lanes.

Figure 5(a) shows the OSM roads (red lines) and authority roads (blue lines) of a
region in Neuenheim, Heidelberg, and the matching results between the datasets, with
green lines showing linkages. Figure 5(b) provides a close view of two urban blocks. It
can be seen that the roads surrounding the two urban blocks are correctly matched with
each other. Figure 5(c) presents an example of matching multi-lane roads. The correspondences between the blue lines (authority data) and red lines (OSM data) are found
correctly.
In contrast to Heidelberg, the situation in Shanghai is more complicated. Shanghai is
the largest Chinese city by population and one of the largest Chinese cities in terms of
spatial extent. At the country level, Shanghai is a major hub of China's expressway
network. Many national expressways (prefixed with G) pass through or terminate in
Shanghai (e.g., G2, G42, G15). There is also a ring expressway within the city. In addition,
there are numerous municipal expressways prefixed with S (e.g., S1, S2, S20). In the city
centre, there are several elevated expressways to lessen traffic pressure on surface
streets. Furthermore, there are a number of arterial roads and numerous normal roads
that pass through residential areas. Overall, the city is densely structured with a
complicated road network. Similar to the Heidelberg dataset, expressways and most of the
arterial roads (highly ranked roads in the traffic system) are mapped with multi-lanes in
OSM, whereas most of the roads are mapped with one lane and are lowly ranked in the
traffic system. In the authority data, almost all roads are recorded with a single lane,
except for some high-ranked roads mapped with multi-lanes close to and at the junction
points. For this reason, matching in Shanghai is more complicated than in Heidelberg.
Nevertheless, good matching results have been obtained overall.


Figure 6 shows the matching results for one of the city centres in Shanghai. The
region is located in Yangpu District and Pudong District, which are separated by
Huangpu River. Figure 6(a) shows the overall matching results in the region, where
OSM roads are represented in red and authority data are represented in blue. As
illustrated in Figure 6(a), both single- and multi-lane roads are correctly matched.
Figure 6(b) demonstrates a complicated case at a turn section in Pudong. Both OSM
roads and authority roads are recorded with multi-lanes. As denoted by the linkages
(green lines with arrows) in Figure 6(b), the roads from the two datasets are matched
correctly. The ramp serves as a transition from roads in Pudong to the Yangpu Bridge
over Huangpu River. At the curve, the ramp is composed of two line segments
because of the traffic flow in two directions. These two line segments are matched
exactly with their corresponding roads because there is a large distance between
them. At the end of the ramp, the matching result is a 2:2 relation because the two
lines are located so near each other that they are assigned to the same edge of an
urban block.

4.2. Evaluation
To evaluate the proposed approach, the experimental results are examined manually by
randomly checking the matching pairs. Moreover, we attempted to keep the selected
matching pairs evenly distributed in both datasets. For each pair of roads, we took the
road from authority data as reference and checked (1) whether the matched roads in
OSM correctly correspond to the road in the authority dataset and (2) whether there is
missing matching.
For the evaluation, three parameters are dened: TP (true positive) stands for the
number of road pairs that are correctly matched, FP (false positive) stands for the
number of road pairs that are incorrectly matched, and FN (false negative) stands for
the amount of missing matching. Three indicators can then be calculated:
$$Precision = \frac{TP}{TP + FP} \times 100\%, \tag{2}$$

$$Recall = \frac{TP}{TP + FN} \times 100\%, \tag{3}$$

$$F_1 = \frac{2 \times Precision \times Recall}{Precision + Recall}. \tag{4}$$
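
Equations (2) to (4) can be evaluated directly from the three counts. The short sketch below applies them to the TP, FP and FN values reported in Table 2; the function name is an assumption, and small differences in the last digit of the F1-score can arise depending on whether Precision and Recall are rounded before they are combined.

```python
def evaluation_scores(tp, fp, fn):
    """Return (precision, recall, f1) in percent, following Equations (2)-(4)."""
    precision = tp / (tp + fp) * 100.0
    recall = tp / (tp + fn) * 100.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# TP, FP and FN counts as reported in Table 2.
counts = {"Heidelberg": (341, 6, 17), "Shanghai": (903, 29, 152)}

for city, (tp, fp, fn) in counts.items():
    p, r, f1 = evaluation_scores(tp, fp, fn)
    print(f"{city}: precision={p:.1f}% recall={r:.1f}% F1={f1:.1f}%")
```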

It can be seen that the matching results of the Heidelberg dataset are better than those
of the Shanghai dataset, as shown in Table 2. In particular, the Recall for the Heidelberg
data matching is much better than that for the Shanghai data matching because there
Table 2. presents the results of the manual evaluation for both cities. Overall, it shows that the
Precision is greater than 96% and that the F1-score is greater than 90%.
Test bed
Heidelberg
Shanghai

TP
341
903

FP
6
29

FN
17
152

Precision
98.3%
96.9%

Recall
95.3%
85.6%

F1
96.8%
90.9%

Downloaded by [Monash University Library] at 06:06 10 November 2015

12

H. FAN ET AL.

are more missing matching pairs in the Shanghai dataset than there are in the
Heidelberg dataset. According to the visual inspection during the evaluation, almost
all of the missing matching pairs are no-through roads in urban blocks of residential
areas. Because a no-through road is located within an urban block, as shown in Figure 7,
it is not assigned to the polygon of the urban block according to the proposed
algorithm in Section 2.2. Consequently, it will not appear in the matching process at
all. For this reason, it is not matched to any road in the other dataset. Heidelberg is a
small city in Germany. Most of the urban blocks are small and often do not contain nothrough roads. In contrast, Shanghai is a mega metropolis with more than 24 million
inhabitants. Therefore, there are many large urban blocks with several (normally
between two and ve) no-through roads that divide the residential area into several
smaller segments.
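The effect described above can be illustrated with a small graph sketch. It is not the block-extraction algorithm of Section 2.2; it only assumes that road segments are given as pairs of node identifiers and shows that a dead-end branch can be found by repeatedly pruning nodes of degree one. Such edges never bound a closed polygon, which is why no-through roads drop out of the matching. The function name `dangling_edges` and the toy network are hypothetical.

```python
from collections import defaultdict


def dangling_edges(edges):
    """Return the edges that hang off a dead end.

    `edges` is a list of (node_a, node_b) tuples describing road segments.
    Nodes of degree one are pruned repeatedly; every edge removed this way
    belongs to a dead-end branch such as a no-through road. Parallel edges
    between the same two nodes are not distinguished in this simplification.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    removed = []
    leaves = [node for node, adj in neighbours.items() if len(adj) == 1]
    while leaves:
        leaf = leaves.pop()
        if len(neighbours[leaf]) != 1:
            continue  # already isolated by an earlier removal
        (other,) = neighbours[leaf]
        removed.append((leaf, other))
        neighbours[leaf].clear()
        neighbours[other].discard(leaf)
        if len(neighbours[other]) == 1:
            leaves.append(other)
    return removed


# A square block A-B-C-D with a cul-de-sac B-E-F reaching into it.
roads = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("B", "E"), ("E", "F")]
dead_ends = dangling_edges(roads)
```

In this toy network the four sides of the square survive the pruning, whereas the two segments of the cul-de-sac are reported as dangling and would therefore need separate treatment.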
According to our local knowledge of both Shanghai and Heidelberg, most of the
incorrect matching pairs are produced because the proposed matching algorithm does
not consider the attributes of the roads. In Heidelberg, pedestrian ways are matched to
roads if they are parallel and closely located to each other. An example of such a case
can be seen in Figure 8. The red line at location p on the right side of Figure 8
denotes a pedestrian way along the Neckar river bank in the OSM data in Heidelberg.
There is a strip of bushes with a width of approximately 2 m between this pedestrian
way and the road. During the matching process, this pedestrian way is matched to the
nearby road in the authority data because the two run parallel and close to each other. This type of
incorrect matching is, however, difficult to avoid: on the one hand, one is not sure
whether the attributes of the roads in the OSM data are correct; on the other hand, there
are many roads with missing attributes. In addition to the pedestrian ways, service roads
are also incorrectly matched with main roads for the same reason.
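To make the role of attributes concrete, the sketch below shows one possible attribute check that could be applied to a geometrically matched pair. It is an assumption for illustration only and is not part of the proposed algorithm: the grouping of OSM `highway` values into two coarse classes and the class labels for the authority data are hypothetical. Reflecting the caveat above, a pair is rejected only when both attributes are present and disagree, so missing attributes fall back to the geometric result, and incorrect attributes would still mislead the check.

```python
# Hypothetical grouping of OSM `highway` values; not taken from the article.
NON_MOTORISED = {"footway", "pedestrian", "path", "cycleway", "steps"}


def road_class(highway_tag):
    """Map an OSM `highway` value to a coarse class, or None if the tag is missing."""
    if not highway_tag:
        return None
    return "non_motorised" if highway_tag in NON_MOTORISED else "motorised"


def attributes_compatible(osm_highway, authority_class):
    """Reject a candidate pair only when both classes are known and differ."""
    osm_class = road_class(osm_highway)
    if osm_class is None or authority_class is None:
        return True  # missing attribute: keep the geometric match
    return osm_class == authority_class
```

With such a check, a pair like the one in Figure 8 would be rejected if the OSM way carried `highway=footway` and the authority road were classified as motorised, but it would pass unchanged if either attribute were absent.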
It should be noted that in Shanghai, there are certainly more incorrect matching pairs
because there are many elevated expressways with normal traffic roads beneath them.
With a third dimension, these two types of roads can be easily distinguished. However,
they look like multi-lane roads both in 2D OSM and in the authority dataset. It is almost

Figure 7. Missing matching due to a pair of no-through roads.

Figure 8. Incorrect matching of a pedestrian way to a road.

impossible to differentiate them and evaluate the matching results. Therefore, we did
not check for matching among multi-lane roads in Shanghai during the evaluation. More
incorrect matching for the Shanghai dataset could be identified if the matching pairs of
these complicated roads were checked with their attributes.
Another type of incorrect matching is caused by an incorrect or missing recording in
the dataset. Figure 9 demonstrates a case in Shanghai. In reality, there are two different
roads in the dashed-line ellipse in Figure 9 (compared to the Google Map on the right
side). However, there is only one road in each dataset. In the authority dataset (blue
lines), line segments 1 and 3 belong to the same road line segment, whereas line
segments 1 and 2 together represent a road in OSM. Because most of the parts of
these segments overlap and they are assigned to the same edge of the urban block,
they are matched with each other.

Figure 9. An incorrect matching due to incorrect and missing recording.

In addition to the manual random evaluation, the results of the two test beds are
evaluated by comparing the matching with the results obtained using the probabilistic relaxation approach proposed by Yang et al. (2013). A comparison is conducted
for every matched pair of road segments without considering whether the matching
results are true. The comparison shows that 96.25% of the matching results are
identical for the Heidelberg data, whereas 91.07% of the matching results are
identical for the Shanghai data. From this point of view, it can be stated that the
proposed approach can achieve matching that is as good and robust as that of
Yang et al.'s approach.
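The comparison can be sketched as a set operation on matched pairs. The article does not state which denominator is used for the reported percentages, so the sketch below assumes that the share is taken relative to the pairs produced by the polygon-based approach. The function name and the segment identifiers are made up for illustration.

```python
def agreement_rate(pairs_a, pairs_b):
    """Percentage of pairs in `pairs_a` that also occur in `pairs_b`."""
    set_a, set_b = set(pairs_a), set(pairs_b)
    if not set_a:
        return 0.0
    return len(set_a & set_b) / len(set_a) * 100.0


# Made-up segment identifiers, for illustration only.
polygon_based = [("osm_1", "auth_7"), ("osm_2", "auth_9"),
                 ("osm_3", "auth_4"), ("osm_5", "auth_2")]
relaxation = [("osm_1", "auth_7"), ("osm_2", "auth_9"),
              ("osm_3", "auth_8"), ("osm_5", "auth_2")]
rate = agreement_rate(polygon_based, relaxation)
```

Note that such a rate measures only how often the two approaches agree, not whether the shared pairs are correct, in line with the remark that the comparison does not consider whether the matching results are true.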
Furthermore, the proposed approach can conduct the matching with high efficiency
because the number of calculation units is reduced substantially when matching using
polygons of the urban block instead of directly matching the line segments. For the two
case studies in this work, it took approximately 17 and 61 minutes for preprocessing
(mainly the assignment of road line segments with edges of urban blocks) the OSM data
in Heidelberg and Shanghai, respectively. The preprocessing times for the authority data
in the two cities were 6 and 24 minutes, respectively. The computational times of the
matching process for Heidelberg and Shanghai were 22 and 142 minutes, respectively.
We note that the computational time of the matching process does not include the
preprocessing time because the preprocessing needs to be performed only once, after
which the data are ready for any matching.

5. Conclusion and future works


This article presents a polygon-based approach for matching an OSM road network
with an authority road network. In contrast to the previous approach, the proposed
approach first extracts urban blocks and then assigns road lines to edges of urban
blocks. In the matching process, polygons of urban blocks are matched in the first step
by checking for overlapping area. In the second step, edges of a matched urban block
pair are further matched with each other. Road lines that are assigned to the same
matched pair of urban block edges are then matched with each other. In this
approach, the number of computational units is substantially reduced because the
proposed approach matches polygons instead of road lines. In this way, the process of
matching is accelerated.
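The first matching step, checking urban-block polygons for overlapping area, can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the experiments: blocks are taken to be convex with counter-clockwise vertices so that Sutherland-Hodgman clipping applies, and the overlap is normalised by the smaller block area, which is an assumed similarity measure. Real urban blocks are often non-convex, in which case a general-purpose geometry library would be the more appropriate tool.

```python
def area(poly):
    """Unsigned polygon area by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def clip(subject, clipper):
    """Sutherland-Hodgman clipping; `clipper` must be convex and counter-clockwise."""
    def inside(p, a, b):
        # True if p lies on or to the left of the directed line a -> b.
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def intersect(p, q, a, b):
        # Intersection of the line through p, q with the line through a, b.
        dx1, dy1 = q[0] - p[0], q[1] - p[1]
        dx2, dy2 = b[0] - a[0], b[1] - a[1]
        t = ((a[0] - p[0]) * dy2 - (a[1] - p[1]) * dx2) / (dx1 * dy2 - dy1 * dx2)
        return (p[0] + t * dx1, p[1] + t * dy1)

    output = list(subject)
    for a, b in zip(clipper, clipper[1:] + clipper[:1]):
        if not output:
            break  # nothing left to clip
        points, output = output, []
        for p, q in zip(points, points[1:] + points[:1]):
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output


def overlap_ratio(block_a, block_b):
    """Shared area divided by the smaller block area (an assumed measure)."""
    shared = area(clip(block_a, block_b))
    return shared / min(area(block_a), area(block_b))


osm_block = [(0, 0), (4, 0), (4, 4), (0, 4)]
authority_block = [(1, 0), (5, 0), (5, 4), (1, 4)]  # same block, shifted by 1 unit
far_block = [(10, 10), (12, 10), (12, 12), (10, 12)]

same = overlap_ratio(osm_block, authority_block)
apart = overlap_ratio(osm_block, far_block)
```

A pair of blocks would then be accepted as a match when the ratio exceeds some threshold; the value of that threshold is left open here because it depends on the expected offset between the datasets.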
The proposed approach is tested using the Heidelberg and Shanghai datasets. The
experiments show that the proposed approach can achieve a good and robust matching. The precision and F1-score of matching for Heidelberg are better than those for
Shanghai. Thus, the proposed approach could provide promising results when the road
structures are not very complicated. Evaluation was conducted by manually and randomly checking the matching pairs. The proposed approach fails for no-through roads
because they cannot form closed polygons with other road lines and, thus, cannot be
assigned to any urban blocks. Another type of missing matching occurs for short road
line segments at road junctions because they cannot be assigned to the urban blocks
nearby. There are few incorrect matching pairs, most of which are caused by missing
and/or incorrect information attributed to road lines.
The proposed method can be used to match the OSM dataset with the authority
dataset. There are two pre-conditions for applying this method to match other datasets.
First, the road lines must be able to form closed polygons so that urban
blocks in the city can be extracted. Second, the datasets must be represented in the
same coordinate system, and the offset between the datasets should not be too large.
At the current stage, missing matching occurs often at traffic junctions. There are
many missing matching cases when matching traffic junctions with a complicated
structure and many ramps. In addition, road lines within urban blocks cannot be matched
because they cannot be assigned to the urban block. Furthermore, road lines at the
border of the dataset cannot be matched either because closed polygons cannot be
formed as urban blocks. In future work, the following will be done to solve these
problems. First, at traffic junctions, polygons with a small area (but not with a long and
thin shape) will not be eliminated. Instead, they will be aggregated with their
immediate neighbouring urban blocks. Second, an algorithm will be developed to handle the
cases where road lines are located within an urban block, e.g., by using a line-based
approach. Third, polylines at the border of the test dataset will be analysed to find
potential convex polygons and close them into pseudo-urban blocks such that the
road lines at borders can also be assigned to urban blocks and further be matched
with those in another dataset.
Similar to the existing line-based matching approaches, the proposed approach fails
where the street networks of the two datasets are complicated and their geometries are
too different. This type of situation appears often at large traffic junctions due to the two
datasets (i) being acquired with different levels of detail in the geometry or (ii) being
acquired at different times, between which many changes have occurred. In this case, it
is difficult to match the road lines. This is a difficult problem even for a human to
address. A possible solution would be to parameterise the topologies, orientation and
distances of the road lines to an urban block to assign road lines to an edge of an urban
block in a general manner. Then, the sum of squared differences (SSD) of the
parameters between matched pairs of line segments can be used to denote the quality of
matching, namely, how certainly they can be matched to each other. This method can
also solve the problem of the abovementioned missing matching because the
assignment of line segments to urban blocks is no longer determined using a simple threshold;
instead, every line segment in the surrounding area is assigned to an urban
block with different parameters that indicate the topology, orientation and distance of
the line segment to an edge of an urban block.
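The SSD-based quality measure outlined above can be sketched in a few lines. The encoding of topology, orientation and distance as normalised numbers is hypothetical, since the article leaves the parameterisation to future work; only the sum of squared differences itself follows from the text.

```python
def ssd(params_a, params_b):
    """Sum of squared differences between two equally long parameter vectors."""
    if len(params_a) != len(params_b):
        raise ValueError("parameter vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(params_a, params_b))


# Hypothetical normalised parameters: (topology, orientation, distance).
osm_segment = (1.0, 0.5, 0.25)
candidate_close = (1.0, 0.5, 0.5)
candidate_far = (0.0, 1.0, 1.0)

score_close = ssd(osm_segment, candidate_close)
score_far = ssd(osm_segment, candidate_far)
```

A smaller SSD indicates a more certain match, so in this toy example `candidate_close` would be preferred over `candidate_far`.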

Acknowledgements
The authors would like to thank Dr Gang Qiao at the School of Surveying and Geoinformatics for
sharing the authority road network of Shanghai City and our student assistants for the great effort
they spent on the manual evaluation.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Klaus Tschira Foundation.

References
Fan, H., et al., 2014. Quality assessment for building footprints data on OpenStreetMap.
International Journal of Geographical Information Science, 28 (4), 700–719. doi:10.1080/
13658816.2013.867495
Girres, J.-F. and Touya, G., 2010. Quality assessment of the French OpenStreetMap dataset.
Transactions in GIS, 14 (4), 435–459. doi:10.1111/tgis.2010.14.issue-4
Goodchild, M.F., 2007. Citizens as sensors: the world of volunteered geography. GeoJournal, 69 (4),
211–221. doi:10.1007/s10708-007-9111-y
Hagenauer, J. and Helbich, M., 2012. Mining urban land use patterns from volunteered geographic
information by means of genetic algorithms and artificial neural networks. International Journal
of Geographical Information Science, 26 (6), 963–982. doi:10.1080/13658816.2011.619501
Haklay, M., 2010. How good is volunteered geographical information? A comparative study of
OpenStreetMap and Ordnance Survey datasets. Environment and Planning B: Planning and
Design, 37 (4), 682–703. doi:10.1068/b35097
Haklay, M., et al., 2010. How many volunteers does it take to map an area well? The validity of
Linus' Law to volunteered geographic information. The Cartographic Journal, 47, 315–322.
doi:10.1179/000870410X12911304958827
Helbich, M., Amelunxen, C., and Neis, P., 2012. Comparative spatial analysis of positional
accuracy of OpenStreetMap and proprietary geodata. In: International GI_Forum 2012,
Salzburg, Austria.
Kim, J.O., et al., 2010. A new method for matching objects in two different geospatial datasets
based on the geographic context. Computers & Geosciences, 36 (9), 1115–1122. doi:10.1016/j.
cageo.2010.04.003
Koukoletsos, T., Haklay, M., and Ellul, C., 2012. Assessing data completeness of VGI through an
automated matching procedure for linear data. Transactions in GIS, 16 (4), 477–498. doi:10.1111/
j.1467-9671.2012.01304.x
Kunze, C., 2012. Vergleichsanalyse des Gebäudedatenbestandes aus OpenStreetMap mit amtlichen
Datenquellen [online]. Student research project at the Technical University of Dresden. Available
from: http://www.qucosa.de/fileadmin/data/qucosa/documents/8814/SA_Kunze.pdf [Accessed
1 November 2015].
Li, L. and Goodchild, M.F., 2011. An optimisation model for linear feature matching in geographical
data conflation. International Journal of Image and Data Fusion, 2 (4), 309–328. doi:10.1080/
19479832.2011.577458
Li, Q., et al., 2014. Polygon-based approach for extracting multilane roads from OpenStreetMap
urban road networks. International Journal of Geographical Information Science, 28 (11),
2200–2219. doi:10.1080/13658816.2014.915401
Ludwig, I., Voss, A., and Krause-Traudes, M., 2011. A comparison of the street networks of navteq
and OSM in Germany. In: S.C.M. Geertman, et al., eds. Advancing geoinformation science for a
changing world. Berlin: Springer, 65–84.
Min, D., Zhilin, L., and Xiaoyong, C., 2007. Extended Hausdorff distance for spatial objects in GIS.
International Journal of Geographical Information Science, 21 (4), 459–475. doi:10.1080/
13658810601073315
Mooney, P., Corcoran, P., and Winstanley, A.C., 2010. Towards quality metrics for OpenStreetMap.
In: Proceedings of the 18th SIGSPATIAL international conference on advances in geographic
information systems. New York: ACM, 514–517.
Mustière, S. and Devogele, T., 2007. Matching networks with different levels of detail.
Geoinformatica, 12 (4), 435–453. doi:10.1007/s10707-007-0040-1
Neis, P., et al., 2010. Empirische Untersuchungen zur Datenqualität von OpenStreetMap –
Erfahrungen aus zwei Jahren Betrieb mehrerer OSM-Online-Dienste. In: AGIT 2010.
Symposium für Angewandte Geoinformatik, Salzburg.
Neis, P., Zielstra, D., and Zipf, A., 2012. The street network evolution of crowdsourced maps:
OpenStreetMap in Germany 2007–2011. Future Internet, 4 (4), 1–21. doi:10.3390/fi4010001
Olteanu, A. and Mustière, S., 2008. Data matching – a matter of belief. In: A. Ruas and C. Gold, eds.
Headway in spatial data mining, lecture notes in geoinformation and cartography. Berlin: Springer,
501–519.
OSM, 2015. Stats – OpenStreetMap Wiki [online]. Available from: http://wiki.openstreetmap.org/
wiki/Statistics [Accessed 25 August 2015].
Samal, A., Seth, S., and Cueto, K., 2004. A feature-based approach to conflation of geospatial
sources. International Journal of Geographical Information Science, 18 (5), 459–489. doi:10.1080/
13658810410001658076
Singh Sehra, S., Singh, J., and Singh Rai, H., 2013. Assessment of OpenStreetMap Data-A Review.
International Journal of Computer Applications, 76 (16), 17–20. doi:10.5120/13331-0888
Volz, S., 2006. An iterative approach for matching multiple representations of street data. In: M.
Hampe, M. Sester, and L. Harrie, eds. ISPRS workshop multiple representation and
interoperability of spatial data, 22–24 February. Hannover: ISPRS, 101–110.
Xiong, D. and Sperling, J., 2004. Semiautomated matching for network database integration.
ISPRS Journal of Photogrammetry & Remote Sensing, 59 (1–2), 35–46. doi:10.1016/j.
isprsjprs.2003.12.001
Yang, B., Zhang, Y., and Luan, X., 2013. A probabilistic relaxation approach for matching road
networks. International Journal of Geographical Information Science, 27 (2), 319–338. doi:10.1080/
13658816.2012.683486
Zhang, M., 2009. Methods and implementations of road-network matching. Thesis (PhD). Institute for
Photogrammetry and Cartography, Technical University of Munich, Munich.
Zielstra, D. and Zipf, A., 2010. A comparative study of proprietary geodata and volunteered
geographic information for Germany. In: Proceedings of 13th AGILE International Conference on
Geographic Information Science, 10–14 May 2010. Guimarães, Portugal.
