
Geodata Management

Responsible persons: Andreas Neumann (Overall), Helen Freimark (Specials), Andreas Wehrle (Content)


Content
1. Geodata Management
1.1. Data Import / Export
1.1.1. Data formats
1.1.2. INTERLIS
1.1.3. Geodata Catalogue
1.2. Data Management in a GIS
1.2.1. Spatial Databases
1.2.2. Spatial Queries
1.3. Geodata-Modelling
1.3.1. Three level-structure
1.3.2. Geometric Model
1.3.3. Topological Model
1.3.4. Thematic Model
1.4. Summary
1.5. Recommended Reading
1.6. Glossary
1.7. Bibliography

https://geodata.ethz.ch/geovite/ -Version September 2010


1. Geodata Management
Learning Objectives

In this lesson you will learn about the possibilities for importing and exporting data. You will see various data formats and standards that facilitate the interchange of data between different GIS software packages. You will learn the differences between various types of database systems, and you will get to know the three-level structure of the geodata model.

Learn about data formats used in GIS
Learn about special formats for data interchange
Learn about database systems
Learn about the three-level structure of a geodata model


1.1. Data Import / Export


This unit introduces ways of exchanging data between GIS and other compatible software.

1.1.1. Data formats


GIS is becoming increasingly important in industry; many fields of work already take advantage of its strengths. Various suppliers offer powerful tools for working with geodata, but often each software package has its own data format, designed for optimal use within that software. Most of these data formats are proprietary, which means that their structure is not documented and the format therefore cannot readily be used outside that software. For this reason, many applications offer conversion into other data formats, which permits the exchange of data. This is required whenever data from various sources is to be integrated into one GIS. The most important data formats are described below:

Vector formats

Raster formats


1.1.2. INTERLIS
"INTERLIS - A Data Exchange Mechanism for Land-Information-Systems" was first published in 1991. It is a mechanism consisting of a conceptual description language and a transfer format that helps to interchange geodata between geosystems. Its aim is to define the data model precisely so that applications and transfer interfaces can be derived from it. The motivation is that an optimal digital transfer of structured data is only possible if the data is defined unambiguously and consistently. Since 1993, INTERLIS has been part of the directives of the official land register, and since 1998 it has been registered as a Swiss standard. A version adapted to user demands, called INTERLIS 2, was published in 2003; it supports advanced compatibility, graphic definitions and multilingual models. INTERLIS is compatible with other data structures such as UML, GML and XML, and it is backwards and forwards compatible. INTERLIS is free of royalties and guarantees long-term support and data storage.

1.1.3. Geodata Catalogue


The release of Google Earth and Google Maps has made geodata, especially aerial photos, very popular with the general public. Unfortunately, this data cannot be exported and therefore cannot be integrated into a GIS. However, various other sources on the internet provide free geodata: vector data as well as raster data, maps as well as orthophotos. To facilitate the use of this data, several sources of geodata that can be used in a GIS are listed below:


1.2. Data Management in a GIS


In this unit you will learn about the different kinds of spatial databases and the possibilities they offer.

1.2.1. Spatial Databases


Data storage is fundamental in a GIS. In the past, data was stored in file systems tied to the application. Since every user has their own file system in this approach, the same data may be stored in parallel, and its consistency, redundancy and safety are not harmonised. With a file system, it is also not possible for multiple users to access the dataset simultaneously. Nowadays, the data of a geoinformation system is usually stored in a database that is organised by a sophisticated database management system (DBMS). This approach separates data storage from data management and allows the data to be stored in a distributed way, which means that various data sets may be stored in different databases. As the data is the most valuable part of a GIS, it is essential to keep it well organised and safe; the DBMS is the tool that helps to achieve this. It controls multi-user access and access authorisation and offers tools for using the database.

The database system is defined by the database model. Depending on the model, the database has to be structured following certain rules. The most important models are the hierarchical, network, relational and object-oriented models.

Hierarchical Model
The hierarchical data model is the oldest type. It requires a hierarchical, tree-like structure, similar to the file structure of common desktop computers. Between two types of data, only 1:n relationships are allowed, and therefore it is inefficient in many cases.

Network Model
This model consists of two fundamental components: records and sets. Records contain the stored data; sets describe the relations between the records in a fixed order. A record can be an owner or a member: one owner can have several members and one member can have several owners. A set always consists of an owner and a member. The advantage of this model is good performance, at the expense of extensibility, since the relations need to be created explicitly.

Relational Model
In this model, data is stored in multiple tables. The model does not work with explicit relationships but with an identifying value, a so-called key. Every entry has a key by which it is uniquely identified. By means of this key, various tables can be combined. Consequently, data belonging to a certain feature can be associated even though it is not stored in the same table. For example, the following two tables could be part of a database containing a car inventory:

Structure of the relational model

Common database software (Access, dBase, Oracle) often works with the relational model, and most GIS are also based on it. A relational database management system is commonly abbreviated as RDBMS. SQL (Structured Query Language) is a database language for querying and modifying data in relational databases.

Object-oriented Model
The object-oriented model (ODBMS or OODBMS) addresses a weakness of the relational model, namely scattered data and therefore slow processing, by uniting all the information belonging to a certain feature in one object. Changes to the dataset are thus easier to process, since the data is stored in one place. Furthermore, data belonging to an object can only be modified by using the methods that are defined for this object. The world's largest database and the highest ingest rate ever recorded are reportedly both held by object-oriented databases. Since most database systems are still relational, object-oriented databases lack interoperability.

Object-relational Model
Finally, an object-relational database (ORDBMS) is a relational database with the ability to integrate custom data types and methods. This model emerged in the 1990s by extending relational database concepts with object-oriented concepts. In the near future, a large part of geoinformation systems is expected to be based on an object-relational database system. At present, only a few well-developed ORDBMS are available; the best known is perhaps PostgreSQL.

1.2.2. Spatial Queries


In the relational database model, queries can be carried out using SQL (Structured Query Language). To learn SQL, this SQL tutorial is recommended.

Question
Think: What would be the query that displays all the cars, together with their owners, that were bought before the year 2000?
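One possible answer to the question can be sketched with Python's built-in sqlite3 module. The table and column names (owner, car, owner_id, purchase_year) and the sample rows are assumptions made for this illustration; they are not taken from the tables of the lesson.

```python
import sqlite3

def cars_bought_before(conn, year):
    """Return (model, owner name) pairs for cars purchased before `year`."""
    query = """
        SELECT car.model, owner.name
        FROM car
        JOIN owner ON car.owner_id = owner.id   -- the key links both tables
        WHERE car.purchase_year < ?
        ORDER BY car.model
    """
    return conn.execute(query, (year,)).fetchall()

# Hypothetical car inventory, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (
        id INTEGER PRIMARY KEY,
        model TEXT,
        purchase_year INTEGER,
        owner_id INTEGER REFERENCES owner(id)
    );
    INSERT INTO owner VALUES (1, 'Meier'), (2, 'Keller');
    INSERT INTO car VALUES
        (10, 'Golf', 1998, 1),
        (11, 'Panda', 2004, 2),
        (12, 'Astra', 1995, 2);
""")

old_cars = cars_bought_before(conn, 2000)
```

The join on the key column is what lets data from two separate tables be reported together; the WHERE clause then filters on a plain attribute.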


The previous query is a common operation that can be carried out in any database. In a database containing spatial data, a similar query could be the selection of every conifer in a forest that has a trunk thinner than 50 centimeters. This is a normal query of attribute data, which means that the location, or in other words the geometry, of the trees does not affect the query (assuming that the trees are stored as point elements and not as polygons). It is nevertheless a typical GIS query: since the data has a spatial reference (the coordinates of the trees), the selected trees can be marked on a map. If the result does not need to be visualised, however, the same query could be run in a common database without a spatial reference.

Queries of geometric data, by contrast, are a specialty of GIS; data without a spatial reference cannot be queried in this way. A typical query is the selection of every tree that is not further than 200 meters from a street. In this case, a buffer is created by offsetting the street by 200 meters on both sides, and every tree that lies inside this buffer is selected. This example looks like this:
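The difference between the two kinds of query can be sketched in plain Python. The tree records, the street polyline and all function names are hypothetical; instead of constructing the buffer polygon explicitly, the sketch tests the equivalent condition that a tree lies within 200 meters of the street.

```python
import math

def distance_to_segment(p, a, b):
    """Shortest distance from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def thin_conifers(trees, max_trunk_cm):
    """Attribute query: the geometry of the trees plays no role."""
    return [t for t in trees if t["type"] == "conifer" and t["trunk_cm"] < max_trunk_cm]

def trees_near_street(trees, street, max_dist):
    """Geometric query: trees within max_dist of any segment of the street."""
    segments = list(zip(street, street[1:]))
    return [
        t for t in trees
        if min(distance_to_segment(t["xy"], a, b) for a, b in segments) <= max_dist
    ]

# Hypothetical sample data in metric planar coordinates.
trees = [
    {"id": 1, "type": "conifer", "trunk_cm": 35, "xy": (100.0, 150.0)},
    {"id": 2, "type": "conifer", "trunk_cm": 80, "xy": (400.0, 500.0)},
    {"id": 3, "type": "broadleaf", "trunk_cm": 40, "xy": (900.0, 199.0)},
]
street = [(0.0, 0.0), (1000.0, 0.0)]

attribute_hits = [t["id"] for t in thin_conifers(trees, 50)]
buffer_hits = [t["id"] for t in trees_near_street(trees, street, 200.0)]
```

Only the second function needs the coordinates, which is why it cannot be run against data without a spatial reference.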


1.3. Geodata-Modelling
A geodata-model is normally divided into three parts. You will learn the parts of this division and see the advantages that arise from this structure.

1.3.1. Three level-structure


In order to store vector, raster and attribute data properly in a database, the geodata model needs to be well structured. Common GIS software achieves this by dividing the model into three levels: geometry, topology and thematic data.

1.3.2. Geometric Model


By means of the geometric model, which is the basis of the three-level structure, the objects of the data model are described in their geometric aspect. It mainly contains the metric attributes of the objects; the topological information belongs to the next level. The purpose of the geometric model is often characterised as the definition of the 'Geometry of the Metric'. The approach to geometric modelling is based upon CAD systems, which used the three primitives edges, areas and bodies to model objects. Various methods for modelling three-dimensional objects originated from these primitives:

Spatial occupancy enumeration
Spatial occupancy enumeration is the three-dimensional equivalent of two-dimensional raster data. Instead of pixels, it subdivides space into volume elements, so-called voxels. Eight smaller cubes may be united into one bigger cube, provided that they have the same value. Similar to the quadtree (cf. Lecture 1), an octree may be used to represent the composition.

Example for a voxel-construction
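The merging of eight equal cubes into one bigger cube can be sketched as a recursive octree construction. This is a minimal illustration that assumes a cubic voxel grid whose edge length is a power of two, stored as nested Python lists; the function names are hypothetical.

```python
def build_octree(grid, x=0, y=0, z=0, size=None):
    """Return a single value for a uniform cube, else a list of eight sub-octrees."""
    if size is None:
        size = len(grid)  # assumes a cubic grid with power-of-two edge length
    if size == 1:
        return grid[x][y][z]
    half = size // 2
    children = [
        build_octree(grid, x + dx, y + dy, z + dz, half)
        for dx in (0, half) for dy in (0, half) for dz in (0, half)
    ]
    first = children[0]
    # Eight leaves with the same value are united into one bigger cube.
    if not isinstance(first, list) and all(c == first for c in children):
        return first
    return children

def count_leaves(node):
    """Number of cubes actually stored in the octree."""
    if isinstance(node, list):
        return sum(count_leaves(child) for child in node)
    return 1

# 4 x 4 x 4 voxel block that is empty except for one occupied voxel.
grid = [[[0] * 4 for _ in range(4)] for _ in range(4)]
grid[0][0][0] = 1
tree = build_octree(grid)
stored_cubes = count_leaves(tree)
```

In this example the octree stores far fewer cubes than the 64 voxels of the full enumeration, because seven of the eight octants are uniform.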

Primitive instancing
Every object is characterised by a fixed number of parameters, e.g. length l, width w, height h, depth d and radius r. The objects are therefore standardised and well defined.

Cell decomposition
This method creates objects by composing small, simple bodies into a bigger and more complex object. It may be compared to a construction kit, which allows the construction of any object from basic shapes.

Constructive Solid Geometry
In this case, an object is defined as a set of primitive objects. By means of permitted operations (Boolean algebra, such as union, intersection and difference), the primitive objects may be combined with each other. The method is frequently used in CAD systems, but so far only rarely in GIS.

Creation of a complex object using Boolean Algebra
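The Boolean combination of primitives can be sketched by representing each solid as a point-membership test, i.e. a function that reports whether a point lies inside the solid. This representation and the names below are assumptions chosen for brevity; CAD systems typically use more elaborate data structures.

```python
def sphere(cx, cy, cz, r):
    """Primitive: solid sphere with centre (cx, cy, cz) and radius r."""
    return lambda x, y, z: (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r * r

def box(x0, y0, z0, x1, y1, z1):
    """Primitive: axis-parallel box between two opposite corners."""
    return lambda x, y, z: x0 <= x <= x1 and y0 <= y <= y1 and z0 <= z <= z1

def union(a, b):
    return lambda x, y, z: a(x, y, z) or b(x, y, z)

def intersection(a, b):
    return lambda x, y, z: a(x, y, z) and b(x, y, z)

def difference(a, b):
    return lambda x, y, z: a(x, y, z) and not b(x, y, z)

# A cube with a spherical hollow cut out of its centre.
hollow_cube = difference(box(0, 0, 0, 10, 10, 10), sphere(5, 5, 5, 3))

inside_wall = hollow_cube(1, 1, 1)    # in the cube, outside the sphere
inside_hollow = hollow_cube(5, 5, 5)  # removed by the difference
```

Because each operation returns a solid again, the results can be nested to build arbitrarily complex objects from a few primitives.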

Boundary Representation
The spatial object is described by its boundary elements. These may be surfaces, lines or points. Since freeform shapes are allowed in this model, the latitude of description is rather large.

Teapot modelled by Boundary Representation

Definition of the height
Many geodata models contain a Z coordinate that holds the height of a position. However, this kind of model is not truly three-dimensional, since only one height value is allowed per position. It is therefore called 2.5-D; Digital Elevation Models (DEM) are normally 2.5-D. The 2+1-D model is similar; every object carries additional information about its height (e.g. the height of a building). A real 3-D model may contain several Z values at the same position (X,Y).

Geometrical queries
Typical geometrical queries could be:
Find every parcel with an area larger than 400 square meters.
Find every house that is built higher than 300 meters above sea level.
Which parts of a house lie on or across a parcel boundary?
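The first two geometrical queries can be sketched in plain Python. The parcel outlines, the house records and the function name are hypothetical; the area is computed with the shoelace formula, which assumes simple (non-self-intersecting) polygons in planar metric coordinates.

```python
def polygon_area(ring):
    """Area of a simple polygon given as a list of (x, y) vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical parcels as vertex lists (coordinates in meters).
parcels = {
    "A": [(0, 0), (30, 0), (30, 20), (0, 20)],
    "B": [(0, 0), (10, 0), (10, 15), (0, 15)],
}
# Hypothetical houses in a 2.5-D model: exactly one Z value per (X, Y) position.
houses = [
    {"id": "h1", "xyz": (5.0, 5.0, 412.0)},
    {"id": "h2", "xyz": (8.0, 3.0, 280.0)},
]

large_parcels = sorted(name for name, ring in parcels.items() if polygon_area(ring) > 400)
high_houses = [h["id"] for h in houses if h["xyz"][2] > 300]
```

Both queries depend only on metric properties (area, height), which is what distinguishes them from the topological queries of the next level.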

1.3.3. Topological Model


Topology is another level of the three-level structure. It contains the spatial relationships between neighbouring objects. In contrast to the geometry, which describes absolute form and position, the topology is independent of metric parameters such as distance and size. It does not change under certain transformations. Imagine a figure drawn onto a balloon: when the balloon is blown up, the figure deforms metrically, but its topology does not change. That is, intersecting lines still intersect and a closed circle is still closed, though perhaps in the form of an ellipse. Corresponding to the geometric model, the purpose of the topological model is the definition of the 'Geometry of the Position'.

A benefit of proper topology is the control of data correctness. For example, in a cadastre, every boundary line is used twice: once for the parcel on its left side and once for the parcel on its right side. If the topology is correct, this boundary line exists just once. If the topology is not correct, the line may have been stored twice, for instance with the same start point but two different end points.

The topology of the central borderline is not correct

Incidence / Adjacency
Two important terms need to be known in relation to topology: incidence and adjacency. Incidence refers to the relation between two connected objects of different types (e.g. a line and a point). For example, a line is incident with its start and end nodes, and conversely the end node is incident with the line. In contrast, adjacency refers to the relationship between two elements of the same type (e.g. two points). For example, two points may be adjacent through a line, or two lines may start at the same node and are therefore adjacent.


Examples of adjacency and incidence
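Both relations can be derived from a simple line table in which every line stores its start and end node. The small network below and the function names are hypothetical and serve only to illustrate the two terms.

```python
# Hypothetical network: each line is stored with its start and end node.
lines = {
    "a": ("P1", "P2"),
    "b": ("P2", "P3"),
    "c": ("P1", "P3"),
    "d": ("P3", "P4"),
}

def lines_incident_with(point):
    """Incidence: lines (one type) that touch the given point (another type)."""
    return sorted(name for name, ends in lines.items() if point in ends)

def points_adjacent_to(point):
    """Adjacency between points: points connected to the given point by a line."""
    neighbours = set()
    for start, end in lines.values():
        if start == point:
            neighbours.add(end)
        elif end == point:
            neighbours.add(start)
    return sorted(neighbours)

def lines_adjacent_to(line):
    """Adjacency between lines: other lines that share a node with the given line."""
    ends = set(lines[line])
    return sorted(
        name for name, other in lines.items()
        if name != line and ends & set(other)
    )

incident_p3 = lines_incident_with("P3")
adjacent_p1 = points_adjacent_to("P1")
adjacent_d = lines_adjacent_to("d")
```

Note that none of the functions uses coordinates: the relations follow from the connections alone.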

A spatial element can be interpreted as a collection of points that define the form of the object through their relationship information. The topological model can be reduced to the following basic elements:
p - Number of points
l - Number of lines
f - Number of surfaces
v - Number of volume elements

The previous elements help to check data consistency. Euler's formula states that for any connected planar drawing, i.e. one in which every part is connected to the rest, the following relation always holds: p - l + f = 2. The outer area always counts as a surface.
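A consistency check based on Euler's formula can be sketched as follows. The data set and the function names are hypothetical; the number of surfaces is assumed to be known from the stored polygons and must include the outer area.

```python
def is_connected(points, lines):
    """True if every point can be reached from the first one along the lines."""
    if not points:
        return False
    neighbours = {p: set() for p in points}
    for start, end in lines:
        neighbours[start].add(end)
        neighbours[end].add(start)
    seen, stack = set(), [points[0]]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(neighbours[p] - seen)
    return len(seen) == len(points)

def euler_consistent(points, lines, face_count):
    """Check p - l + f = 2; face_count must include the outer area."""
    return is_connected(points, lines) and len(points) - len(lines) + face_count == 2

# Hypothetical data set: a square block split by one diagonal boundary.
points = ["P1", "P2", "P3", "P4"]
lines = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P4", "P1"), ("P1", "P3")]

ok = euler_consistent(points, lines, face_count=3)           # two triangles + outer area
broken = euler_consistent(points, lines[:-1], face_count=3)  # diagonal record missing
```

With the diagonal missing, 4 - 4 + 3 = 3, so the check flags the data set as inconsistent.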

Quiz
Let's see if you still remember the difference between adjacency and incidence. Which of the following statements are correct in relation to the following graphic?
a - Three lines are incident with point P4
b - Six lines are adjacent with P1
c - P1 is adjacent with 6 points
d - f is adjacent with 5 lines
e - a and h are not incident
f - P3 is incident with b and g


a - Correct. The term incidence refers to objects of different types. c, g and h are incident with P4.
b - False. Adjacency refers to objects of the same type. It should be: Six lines are incident with P1.
c - Correct. P1 is adjacent with P2, P3, P4, P5, P6 and P7.
d - Correct. f is adjacent with a, b, c, d and e.
e - False. Two objects of the same type can never be incident. The correct phrase would be: a and h are not adjacent.
f - Correct.

Topological Queries
Typical topological queries are similar to the previous ones, for example:
Which parcels are next to a certain parcel?
Does street X cross street Y?
Is the railway station Xyz on a certain railway line?

1.3.4. Thematic Model


The thematic model is the basis for relating the data model to the thematic data. By means of an object identifier, every geometric element can be identified uniquely, and thereby the thematic data can be related to the geometry. The amount of thematic data is not limited; it may be numeric as well as textual.

Layer Principle
In the most widely used model, the Layer Principle, thematic content is stored in different levels, so-called layers. Each layer contains just one theme, and layers may be combined freely. Independent layers are combined via the spatial reference, which is given by the scale, origin and orientation of the layer. In this way, layers such as soil usage and precipitation can be combined. This principle has no hierarchy; every layer is equal. The layer principle is an older model, but it is still very popular because it is easy to understand. A disadvantage is that all layers need to be in the same coordinate system in order to be combined.


Combination of layers of various topics
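The combination of two layers through their common spatial reference can be sketched with small raster layers. The Layer class, its fields and the sample values are assumptions for this illustration; orientation is left out for brevity, and the layers are assumed to have the same number of rows and columns.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    theme: str
    origin: tuple      # (x, y) of the grid origin
    cell_size: float   # edge length of a cell in meters
    cells: list        # rows of cell values

def overlay(a, b, combine):
    """Combine two layers cell by cell; both must share the same spatial reference."""
    if (a.origin, a.cell_size) != (b.origin, b.cell_size):
        raise ValueError("layers must share the same spatial reference")
    return [
        [combine(x, y) for x, y in zip(row_a, row_b)]
        for row_a, row_b in zip(a.cells, b.cells)
    ]

# Two hypothetical layers, each holding exactly one theme.
soil = Layer("soil usage", (0.0, 0.0), 100.0, [["forest", "field"], ["field", "field"]])
rain = Layer("precipitation", (0.0, 0.0), 100.0, [[1200, 900], [1100, 800]])

# Where are fields that receive more than 1000 mm of precipitation?
wet_fields = overlay(soil, rain, lambda use, mm: use == "field" and mm > 1000)
```

The explicit check on origin and cell size mirrors the disadvantage named above: layers in different reference systems cannot be combined without transforming them first.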

Feature Class Principle
The counterpart to the previous principle is the Feature Class Principle. Depending on its inner construction, it may be strongly hierarchical (thematic tree) or less so (thematic network). The thematic content is divided into object classes, each of which contains one topic. To specify the content exactly, each class is divided into various subclasses. In the tree structure, every subclass belongs to just one object class; in the network structure, by contrast, a subclass may belong to several object classes. The feature class model is more flexible than the layer principle, but it is still less common.


Network vs. Tree
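The difference between the two structures can be sketched with a small class catalogue in which every class lists the classes it belongs to. The class names and functions are hypothetical and are not taken from any real feature catalogue.

```python
# Hypothetical feature-class catalogue: each class lists its parent classes.
parents = {
    "traffic": [],
    "water": [],
    "road": ["traffic"],
    "railway": ["traffic"],
    "canal": ["traffic", "water"],  # two parent classes: only valid in a network
}

def is_thematic_tree(catalogue):
    """A tree allows at most one parent class per class."""
    return all(len(p) <= 1 for p in catalogue.values())

def ancestors(cls, catalogue):
    """All classes that the given class belongs to, directly or indirectly."""
    found = set()
    stack = list(catalogue[cls])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(catalogue[current])
    return found

network_is_tree = is_thematic_tree(parents)
tree_only = {name: p for name, p in parents.items() if name != "canal"}
tree_is_tree = is_thematic_tree(tree_only)
canal_classes = ancestors("canal", parents)
```

The canal shows the added flexibility of the network structure: it can be found both through the traffic theme and through the water theme.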

The Feature Class Principle is well known in Germany, since ATKIS (Amtliches Topographisch-Kartographisches Informationssystem), the official topographic-cartographic information system, is based upon an object class model.


Thematic queries
Thematic queries are mostly determined by the purpose of the geodata model. Possible queries look like this:
Which river flows into the Zürichsee, and how long is it?
In which canton do the most farmers live?
How many cities with more than 20'000 inhabitants are there in Switzerland?


1.4. Summary
Remember: GIS products from various suppliers are available, and it is therefore difficult to transfer data between the different proprietary systems. Certain data formats facilitate the import and export of data, while other data formats are not interchangeable. Since INTERLIS does not depend on a product, it is a promising project for the interoperability of geodata.

The large quantity of geodata that a GIS has to handle calls for a proper storage method. Modern GIS mostly use a relational database management system (RDBMS) to store and manage the data. It is closely connected to SQL, a language for data queries. In contrast to a pure database management system, a GIS is additionally able to carry out spatial queries, which means that it is possible to query by spatial location.

Finally, remember that a geodata model contains three main parts: a geometric, a topological and a thematic model. The geometric model provides the metric definition of the data, that is, it specifies the physical position and dimensions of the objects. The topological model determines the relationships between the objects, such as intersections and shared start points. Lastly, the thematic model defines the thematic content of the geodata model. The Layer Principle is the most widely used, but the Feature Class Principle is more flexible.


1.5. Recommended Reading



BILL, R., ZEHNER, M. L., 2001. Lexikon der Geoinformatik. Heidelberg: Herbert Wichmann Verlag.
Good resource for looking up GIS and geodata terms (German)

BILL, R., 1999. Grundlagen der Geo-Informationssysteme 1. 4. Heidelberg: Herbert Wichmann Verlag.
Especially chapters 1 "Einführung in GIS" and 4 "Erfassung raumbezogener Daten" (German)


1.6. Glossary
data acquisition: Geodata acquisition is the collection and recording of geodata for further processing. The process includes the recording of geometry (spatial information), date and time (temporal information) and any non-graphical attributes (thematic information).
data normalization: Data normalization is the process of removing redundancy in data sets by dividing them into relations that are linked through identifiers. Normalization not only leads to more efficient data storage (smaller files) but also facilitates geodata updating. One distinguishes five normal forms (NF1 to NF5) with various levels of redundancy removal.
datum: A datum defines a spatial reference system. It consists of an ellipsoid that fits the shape of a local area as well as possible and a reference point on the earth's surface against which position measurements are made. In Switzerland, for example, this reference point is the old observatory in Bern; in Germany, it is a point in Potsdam.
geodata: Geodata is any dataset that has a spatial aspect or component. Synonyms are "spatial data", "geographic data", "geographic data sets" or "GIS data". The syllable "Geo" implies that the dataset has a spatial component that allows the described phenomena to be georeferenced to a location or region on the earth.
geodata model: A geodata model is an abstract, artificially created mapping of a part of the real world relevant to a geoinformatics project. The goal of geodata modeling is to map the relevant conditions and processes in the real world to geodata structures. A data model describes not only the content, properties and data structures, but also the rules and relations between the entities of the data model.
geodata structure: A geodata structure is the logical, internal data organization of geographic information, the means of representing a real-life entity inside a geodata model. Data structures should enable data storage and data management, as well as quick retrieval of the data. Unique identifiers, links, relationships and dependencies help to build consistent and normalized data structures and enable links within the dataset or to external data sources.

geodata update: A geodata update is the process of appending data or replacing existing data to reflect changes in the world or in the model the data is derived from. Special data models must be taken into account for temporal GIS functions. Unfortunately, these temporal GIS functions are still experimental and not yet part of commercial GIS systems.
geometry: In GIS, the geometry describes the form and position of an object, but not its relation to other objects, as the topology does.
GIS: A Geo Information System (GIS) is a computer-aided system for geographic data management, modeling, analysis, simulation and presentation. A GIS is an organized collection of computer hardware, software, geodata and skilled operators. More powerful GIS software usually utilizes modern database technology or builds on spatial databases.
metadata: Metadata describes other data ("data about data") by defining attributes such as year of creation, author, included area, origin, etc. It helps to identify and select the proper product.
Open Geospatial Consortium: The Open Geospatial Consortium (OGC) is an international organisation with more than 300 governmental, non-profit, research and commercial member organisations. Its goal is the development and implementation of standards for geospatial contents and services.
orthophoto: An orthophoto is an aerial photo that has been rectified to an orthogonal coordinate system.
primary data acquisition: Primary data acquisition methods derive geodata directly from the objects to be monitored. Representatives of this method are surveying, photogrammetry and remote sensing. Other primary data acquisition methods include field work, data acquisition through automatic data loggers (e.g. water gauges, weather stations), interviews, censuses and polls.
raster graphics: Raster graphics is the combination of raster data with graphical attributes. Only the color of a raster cell can vary. It is usually used for photographs; common formats are .jpg, .png, .gif.
secondary data acquisition: Secondary data acquisition methods derive the data from primary data sources. It is, for example, quite common to derive data from maps or aerial images. Secondary data acquisitions are generally of lower quality and less up to date than primary data acquisitions.
spatial base data: A subset of geodata. Geographic base data is usually provided by national or international surveying and mapping agencies and includes mainly topographic information stored in maps or landscape models. Satellite and aerial images can also be regarded as spatial base data, as long as they only provide topographic information in the human-visible bands.
Swisstopo: Swisstopo is the competence centre of the Swiss Confederation responsible for geographical reference data and all products derived from them. It offers a variety of spatial data, in raster form as well as vector form.
thematic data: A subset of geodata. Thematic data is acquired by specific domains. Thematic data can, but does not necessarily have to, include a geometry component. It is often linked to spatial base data using coordinates, administrative units, full addresses or zip codes.
topology: The topology defines the position and arrangement of geometrical objects. The metric relations are irrelevant; only the relations between the objects are important. A topologic map shows only the logical connections of objects and not their exact position or dimension. For example, a bus network plan is a typical topologic map; it shows the connections between various points (bus stops), but not their exact position.
vector graphics: Vector graphics is the combination of vector data with graphical attributes. Various attributes can be modified; a polygon may vary in its outline color and thickness, hatching, etc. Common formats are .svg, .dxf, .shp, .pdf.
WMS: A Web Map Service (WMS) produces maps of geospatial information dynamically from geographic information. It is defined by the OGC.


1.7. Bibliography

BILL, R., ZEHNER, M. L., 2001. Lexikon der Geoinformatik. Heidelberg: Herbert Wichmann Verlag.
BILL, R., 1999. Grundlagen der Geo-Informationssysteme 1. 4. Heidelberg: Herbert Wichmann Verlag.
BILL, R., 1999. Grundlagen der Geo-Informationssysteme 2. 2. Heidelberg: Herbert Wichmann Verlag.
INTERLIS. INTERLIS - The GeoLanguage [online]. Available from: http://www.interlis.ch [Accessed 2006-08-15].
