Motivation
Integrating data from multiple data sources.
Distributed querying and transactions over data.
Definition and adoption of data, metadata and their storage.
Accessing the data seamlessly.
Transparency, support for heterogeneity, extensibility and scalability.
Outline
Data Integration Approaches
  Application Specific Solutions
  Application-Integration Framework
    ASIS (Application Specific Information System)
  Database Federation
    Ogsa-DAI (Ogsa-Data Access and Integration)
    Compare ASIS with Ogsa-DAI
  Digital Libraries
    SRB (Storage Resource Broker)
    Sompel's Digital Library Approach
    Compare ASIS with SRB and Sompel's DL
Application Specific Solutions
Hard to extend
  A new data source requires new code to be written
Application-Integration Framework
Also called a component-based framework
  Examples: CORBA, or Filters with common interfaces
Does not necessarily address data integration issues
Based on a common data model (such as CML and GML)
With adaptors, if a source changes, its adaptor may have to change, but the application never sees the change.
No detailed system knowledge is needed
Ex. ASIS - OGC GIS Application Integration Framework
ASIS (1)
Enables inter-service communication through well-defined service interfaces, message formats and capabilities metadata.
Data model is ASL (Application Specific Language)
Metadata model is the capability document
Data and metadata have a common predefined schema
Components are Filter Services
  Web Services with common service interfaces defined in WSDL
  Information/data services enabling distributed access, querying and transformation through their predictable input/output interfaces.
  Chainable, locatable, and capable of updating their metadata manually or dynamically
ASIS (2)
Data and data storage model
Any data can be integrated into the system after being transformed to ASL.
Heterogeneity is handled at the end Filters with adaptors.
ASL is a community-accepted application specific language
  GML (Geography Markup Language) in GIS applications
  CML (Chemical Markup Language) in Chemistry applications
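The adaptor idea above can be sketched as follows. This is a minimal illustration, assuming two hypothetical sources (a CSV text and a list of dictionaries) and a simplified XML `feature` element standing in for ASL. The element names and function names are invented for the example and are not part of ASIS, GML or CML.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_common_model(name, lat, lon):
    """Build one record in a toy common data model (stand-in for ASL)."""
    feature = ET.Element("feature", {"name": name})
    ET.SubElement(feature, "lat").text = str(float(lat))
    ET.SubElement(feature, "lon").text = str(float(lon))
    return feature


def csv_adaptor(text):
    """Adaptor for a hypothetical CSV source with name,lat,lon columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [to_common_model(r["name"], r["lat"], r["lon"]) for r in reader]


def dict_adaptor(rows):
    """Adaptor for a hypothetical source that uses different field names."""
    return [to_common_model(r["title"], r["y"], r["x"]) for r in rows]


def integrate(*feature_lists):
    """Merge the output of any number of adaptors into one collection."""
    root = ET.Element("featureCollection")
    for features in feature_lists:
        root.extend(features)
    return root


csv_source = "name,lat,lon\nSan Andreas,35.1,-119.6\n"
dict_source = [{"title": "Hayward", "y": 37.7, "x": -122.1}]
collection = integrate(csv_adaptor(csv_source), dict_adaptor(dict_source))
print(ET.tostring(collection, encoding="unicode"))
```

Only the adaptors know the source formats. Code that consumes `collection` sees a single schema, so a change in a source is absorbed by its adaptor.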
ASIS (3)
Metadata and Metadata storage model:
Data integration is done through the Filters' capability metadata
Metadata is stored in the local Filter's file system as a flat file.
Capability:
  Inspired by the OGC WMS capability specification.
  Looks like the Dublin Core format.
  A capability-like structure is also used in Gannon's approach (XPOLA) for Grid services security issues.
  Describes dynamic Web/Grid resources.
  Updated manually or dynamically.
  Consists of descriptor, service and provider metadata
  Inter-service communication is achieved without a third party.
  Enables chains of Filters.
ASIS (4)
Data Access and Filter Chaining
[Figure: chain of Filters F1, F2, F3 and F4 exchanging the Earth, Fault and State Boundary data layers]
Each Filter is capable of acting as both a server and a client
Capability integration is done through the getCapability service interface
Requests for common service interfaces are created in accordance with a predefined XML schema
Filter Name    Data Provided After Chaining
F1             Earth, Fault and State Boundary
F2             Earth and Fault
F3 / F4        Fault [row assignment lost in extraction]
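Filter chaining and capability aggregation can be sketched as below. This is a minimal illustration, not the ASIS implementation: the `Filter` class, its method names and the chain topology (F1 uses F2 and F3, F2 uses F4) are assumptions made for the example. Only the layer names come from the slide.

```python
class Filter:
    """Toy Filter: serves its own layers plus those of chained Filters."""

    def __init__(self, name, own_layers=(), upstream=()):
        self.name = name
        self.own_layers = list(own_layers)
        self.upstream = list(upstream)  # Filters this one is a client of

    def get_capability(self):
        """Aggregate layer names from this Filter and everything upstream."""
        layers = list(self.own_layers)
        for other in self.upstream:
            for layer in other.get_capability():
                if layer not in layers:
                    layers.append(layer)
        return layers

    def get_data(self, layer):
        """Answer locally if possible, otherwise forward the request."""
        if layer in self.own_layers:
            return f"{layer} served by {self.name}"
        for other in self.upstream:
            if layer in other.get_capability():
                return other.get_data(layer)
        raise KeyError(layer)


# Assumed topology, for illustration only.
f4 = Filter("F4", own_layers=["Fault"])
f3 = Filter("F3", own_layers=["State Boundary"])
f2 = Filter("F2", own_layers=["Earth"], upstream=[f4])
f1 = Filter("F1", upstream=[f2, f3])

print(f1.get_capability())
print(f1.get_data("Fault"))
```

Each Filter acts as a server towards its callers and as a client towards its upstream Filters, so no central catalog is needed to discover what the chain can provide.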
Database Federation
Middleware consisting of a database management system
Uniform access to a number of heterogeneous data sources
Provides a query language used to combine, contrast, analyze and manipulate the data
Data integration is done through database integration.
  Combine data from multiple sources in a single SQL statement
  Query recreation.
Ex. Ogsa-DAI (Open Grid Services Architecture Data Access and Integration)
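Combining data from multiple sources in a single SQL statement can be illustrated with SQLite, which lets one connection attach several databases. This is only a small stand-in for federation middleware, and the table names and rows are made up for the example.

```python
import sqlite3

# One connection, two separate databases: "main" and an attached "seismic".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS seismic")

conn.execute("CREATE TABLE main.states (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE seismic.faults (name TEXT, state_code TEXT)")

conn.executemany(
    "INSERT INTO main.states VALUES (?, ?)",
    [("CA", "California"), ("NV", "Nevada")],
)
conn.executemany(
    "INSERT INTO seismic.faults VALUES (?, ?)",
    [("San Andreas", "CA"), ("Hayward", "CA")],
)

# A single statement that joins tables living in different databases.
rows = conn.execute(
    """
    SELECT s.name, COUNT(f.name)
    FROM main.states AS s
    LEFT JOIN seismic.faults AS f ON f.state_code = s.code
    GROUP BY s.code, s.name
    ORDER BY s.name
    """
).fetchall()
print(rows)
```

The client writes one query against a uniform namespace. Locating the underlying stores is left to the engine, which is the role a federation layer plays for truly heterogeneous sources.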
Ogsa-DAI (1)
Provides a common Java API for accessing and integrating data resources, such as relational and XML databases and files, in a Grid environment
Specifically designed for the OGSA architecture
SQL queries on relational resources and XPath statements on XML collections
Provides data pipelining (similar to Filter chaining) via an XML document called the perform document.
Allows developers to easily add or extend functionality within Ogsa-DAI through the activity document.
Ogsa-DAI (2)
Data and storage model:
Any data stored in XML or relational databases, or in files
No common data model
Data is provided through GDS (Grid Data Services)
Uses Ogsa-DQP (Distributed Query Processor) to coordinate access to multiple data services
The enactment engine is the core of Ogsa-DAI.
  Orchestrates the running of the perform document
Information in the perform document includes:
  The list of activities and their XML schemas and implementation classes.
  The list of role mappers and their details
  Information about the data resource
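The idea of an engine that runs a document of chained activities can be sketched as follows. This is a toy illustration only: the XML layout, the activity type names (`sqlQuery`, `toCsv`) and the `enact` function are hypothetical and do not follow the real Ogsa-DAI perform document schema or API.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document format, loosely modelled on the idea of a perform
# document. It is NOT the real Ogsa-DAI schema.
PERFORM_XML = """
<perform>
  <activity name="query" type="sqlQuery">
    <expression>SELECT name, magnitude FROM quakes ORDER BY magnitude DESC</expression>
  </activity>
  <activity name="format" type="toCsv" from="query"/>
</perform>
"""


def sql_query(activity, _outputs, conn):
    """Run the SQL expression carried by the activity element."""
    return conn.execute(activity.findtext("expression")).fetchall()


def to_csv(activity, outputs, _conn):
    """Turn the rows produced by an earlier activity into CSV text."""
    rows = outputs[activity.get("from")]
    return "\n".join(",".join(str(v) for v in row) for row in rows)


# Maps an activity type to the function that implements it.
ACTIVITY_TYPES = {"sqlQuery": sql_query, "toCsv": to_csv}


def enact(perform_xml, conn):
    """Run the activities in document order, piping outputs by name."""
    outputs = {}
    for activity in ET.fromstring(perform_xml.strip()).findall("activity"):
        handler = ACTIVITY_TYPES[activity.get("type")]
        outputs[activity.get("name")] = handler(activity, outputs, conn)
    return outputs


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quakes (name TEXT, magnitude REAL)")
conn.executemany("INSERT INTO quakes VALUES (?, ?)", [("A", 5.1), ("B", 6.3)])
result = enact(PERFORM_XML, conn)
print(result["format"])
```

New functionality is added by registering another entry in `ACTIVITY_TYPES`, which loosely mirrors how the slides describe extending the system with new activities.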
Ogsa-DAI (3)
Metadata storage model:
Metadata is kept in the Metadata Catalog Service (MCS)
MCS enables attribute-based querying
Metadata describes the datasets; the data itself can be anything (binary, text ...)
Data integration is done through an XML-based activity file that mixes activities (in SQL queries) and metadata
Ogsa-DAI (4)
Metadata model:
No common schema for metadata, unlike the ASIS capability document
Defines metadata for the datasets
  No schema in XML
  Stored in database tables as attributes
Defines metadata for the database system to enable querying and defining activities
  Schema in XML (mcsActivity.xsd schema file)
  Kept as an XML file in the file system (mcsActivity.xml)
Digital Libraries
Main focus is the publishing and discovery of digital objects.
Digital Objects: a file, URL, SQL command string or any string of bits.
Collects data from multiple different data sources.
It differs somewhat from the other data integration approaches
  Data curation services such as publishing and removing data from the data sources.
Ex. SRB (Storage Resource Broker) and Sompel's Digital Library Approach
SRB (1)
A federated client-server system
  Each server manages/brokers a set of resources
An implementation architecture for
  Data grids
  Digital libraries
Storage resources include digital libraries, MSS, UniTree and file systems
SRB consists of three components
  MCAT services, SRB servers that access storage repositories, and SRB clients
Mediates access to distributed heterogeneous resources
Uses MCAT (Metadata Catalog Service) to facilitate brokering and attribute-based querying.
Integrates data and metadata
SRB (2)
Uniform storage interface
  Resource-specific drivers map each storage system to the uniform interface
Storage resources are registered within SRB as physical resources
Logical resources (LSR) enable replication.
  LSR = one or more physical resources
The client API refers to LSRs.
Collections are created by LSR
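The relationship between logical and physical resources can be sketched as below. This is a minimal illustration under assumptions: the class and method names are invented and are not the SRB client API. It shows only that a write to a logical resource reaches every physical resource behind it, and that a read succeeds as long as one replica survives.

```python
class PhysicalResource:
    """Toy storage system hidden behind a uniform put/get interface."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def has(self, key):
        return key in self._objects

    def delete(self, key):
        self._objects.pop(key, None)


class LogicalResource:
    """One or more physical resources addressed under a single name."""

    def __init__(self, name, physical):
        self.name = name
        self.physical = list(physical)

    def put(self, key, data):
        # Replicate the object on every physical resource.
        for resource in self.physical:
            resource.put(key, data)

    def get(self, key):
        # Return the first available replica.
        for resource in self.physical:
            if resource.has(key):
                return resource.get(key)
        raise KeyError(key)


disk = PhysicalResource("unix-fs")
tape = PhysicalResource("archive")
lsr = LogicalResource("project-store", [disk, tape])

lsr.put("myFile", b"payload")
disk.delete("myFile")          # simulate losing one replica
print(lsr.get("myFile"))       # still served from the archive copy
```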
SRB (3)
Metadata and Metadata Exchange Model:
MAPS (Metadata Attribute Presentation Structure)
  Independent of the internal representation of the attributes inside the catalog.
  Provides a uniform interface specification that can be used between user applications and the MCAT catalog, and vice versa.
Structures which form MAPS:
  MAPS_Query_Struct, MAPS_Result_Struct, MAPS_Update_Struct and MAPS_Definition_Struct
Mapping from MAPS to other models and exchange formats.
  Dublin Core format is under implementation.
SRB (4)
Simple data access scenario:
The SRB server spawns an SRB agent to authenticate the user/application by comparing the supplied credentials with information stored in MCAT.
The agent finds the location of the requested data in MCAT.
The agent checks the user request against the permissions stored in MCAT.
The SRB agent returns the result of the request to the user.
The SRB agent communicates with the user through a port specific to this client session.
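The access scenario above can be sketched as a sequence of catalog lookups. This is a simplified illustration: the `MCAT` dictionary, the user, path and resource names, and the `handle_request` function are hypothetical, and a real system would not store plain-text passwords.

```python
import hmac

# Toy stand-in for MCAT: users, object locations and permissions.
# Plain-text passwords are used only to keep the sketch short.
MCAT = {
    "users": {"alice": "s3cret"},
    "locations": {"/home/alice/myFile": "unitree-01"},
    "permissions": {("alice", "/home/alice/myFile"): {"read"}},
}


def handle_request(user, password, path, action, catalog=MCAT):
    """Mimic the agent's steps: authenticate, locate, authorize, respond."""
    # 1. Authenticate the user against the catalog.
    stored = catalog["users"].get(user)
    if stored is None or not hmac.compare_digest(stored, password):
        raise PermissionError("authentication failed")

    # 2. Find where the requested object is stored.
    resource = catalog["locations"].get(path)
    if resource is None:
        raise FileNotFoundError(path)

    # 3. Check the request against the stored permissions.
    allowed = catalog["permissions"].get((user, path), set())
    if action not in allowed:
        raise PermissionError(f"{action} not permitted on {path}")

    # 4. Report the result back to the caller.
    return f"{action} {path} from {resource}"


print(handle_request("alice", "s3cret", "/home/alice/myFile", "read"))
```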
Sompel's DL (1)
Scholarly communication as a network-based workflow
Where ASIS has Filters and ASL, Sompel defines repositories and digital objects, respectively.
A repository is a networked system that provides services pertaining to a collection of Digital Objects
Repositories have common service interfaces.
  Obtain, Harvest and Put.
Service Providers (SP) collect metadata from Data Providers (DP) via the three service interfaces, then normalize and cluster it to deal with duplicates.
DPs offer some type of search mechanism for their own repositories.
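Normalizing and clustering harvested metadata to deal with duplicates can be sketched as follows. This is a minimal illustration, assuming records are plain dictionaries with hypothetical `provider`, `title` and `creator` fields and that two records describe the same object when their normalized title and creator match. Real service providers would likely use richer matching rules.

```python
from collections import defaultdict


def normalize(record):
    """Reduce a record to a comparison key: lower case, collapsed spaces."""
    title = " ".join(record["title"].lower().split())
    creator = record["creator"].strip().lower()
    return title, creator


def cluster(records):
    """Group the providers that hold the same (normalized) object."""
    clusters = defaultdict(list)
    for record in records:
        clusters[normalize(record)].append(record["provider"])
    return dict(clusters)


harvested = [
    {"provider": "DP1", "title": "Data  Integration", "creator": "A. Author "},
    {"provider": "DP2", "title": "data integration", "creator": "a. author"},
    {"provider": "DP2", "title": "Grid Security", "creator": "B. Writer"},
]

clusters = cluster(harvested)
for key, providers in clusters.items():
    print(key, providers)
```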
Sompel's DL (2)
Data and storage model:
Data is the abstraction of the Digital Objects
  Digital Objects = digital data + key metadata.
  Serialization of Digital Objects = Surrogates
Surrogates
  Carry information for the value chains and service information used at repository service interfaces.
  In XML/RDF format
  Composed of dataStream and/or Entity tag elements.
  A chained object is defined by keymetadataID or providerInfo.
Different storage types: book repositories, teaching object repositories, dataset repositories etc.
Repositories are active nodes.
Repositories enable the use and re-use of materials in many contexts.
Sompel's DL (3)
Metadata model:
  Surrogates are essentially metadata records for objects
  Based on the Dublin Core format with domain-specific extensions.
  Dublin Core has 15 standard elements to define resources. For more details see http://dublincore.org
Comparison with ASIS:
  Sompel's DL uses Dublin Core for the representation of the resources; ASIS uses its own schema.
  ASIS uses ASL for the representation of the data; Sompel's approach doesn't have a common data model.
Summary
Application-Integration Framework (ASIS)
  Easy to add new sources
    Using online Filters providing the required adaptors
  Peer-to-peer chain of Filters
    No central metadata catalog server
    Distributed capability exchange and aggregation
  SOA
    Re-usable components (Filters) for different applications in a predefined domain
  Implications of Filter services
    Scalable and fault-tolerant
    Load balancing and caching
THANKS !
APPENDIX
Existing grid security solutions for fine-grained authorization did not address general Web/Grid services in compliance with the Web Services security specifications.
Other approaches rely on central administrators and do not address dynamic services
Dublin Core
Addresses the challenge of resource description and discovery
A language for making a particular class of statements about resources
There are two namespaces: the Dublin Core element set (dc) and the Dublin Core qualifiers (dcq, e.g. dcq:iso8601).
Some of the Dublin Core metadata element set:
  Title (dc:title), subject, description, creator, publisher, type, format, source, language, rights
Using DC in RDF: specifications for DC in RDF (work in progress)
  Resource has (verb) property (dc:creator) X (dc:Ahmet)
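A Dublin Core description can be written as XML with a few lines of code. The sketch below uses the standard `dc` element namespace (`http://purl.org/dc/elements/1.1/`). The wrapping `record` element, the `dc_record` helper and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(**fields):
    """Build a record whose children are dc:<element> entries."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


record = dc_record(
    title="Data Integration Approaches",
    creator="Ahmet",
    language="en",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```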
http://www.ils.unc.edu/mrc/jcdl2006/slides/kunze.pdf
OAI
Deals with the e-print server world
There was a need to develop services that permitted searching across papers housed at multiple repositories
Repositories also needed capabilities to automatically identify and copy papers that had been deposited in them.
Definition of an interface that permits e-print servers to expose the metadata for the papers they hold.
Service providers with similar metadata standards need to harvest this metadata
Service providers act as a federation of repositories by indexing documents, so that multiple collections can be searched as though they form a single collection
OAI-PMH
For the variety of communities engaged in publishing content on the Web
Any networked server can employ the protocol to enable service providers to collect its metadata
HTTP-based request-response transactions
Service Providers
  Harvest metadata from Data Providers using the OAI protocol and use the returned metadata as a basis for building value-added services.
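A harvesting request and the handling of its response can be sketched as below. The `verb` and `metadataPrefix` parameters and the XML namespaces are those of OAI-PMH 2.0. The base URL, the helper names and the sample response are assumptions for the example, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "https://repository.example.org/oai"  # hypothetical data provider

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the HTTP GET URL for a ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hand-written sample of what a data provider might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest_titles(xml_text):
    """Map each record identifier to its dc:title."""
    root = ET.fromstring(xml_text)
    titles = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        titles[identifier] = record.findtext(".//dc:title", namespaces=NS)
    return titles


url = list_records_url(BASE_URL)
titles = harvest_titles(SAMPLE_RESPONSE)
print(url)
print(titles)
```

A service provider would fetch `url` over HTTP, feed each response to a function like `harvest_titles`, and index the result to build its value-added services.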
Comments on OAI
OAI-PMH is ultimately only as useful as the metadata it transports.
The tendency of implementers to apply almost exclusively the lowest common denominator, unqualified Dublin Core, makes it difficult to implement more advanced search interface features.
Content providers should prefer more expressive metadata schemas like MARC or qualified DC and find ways to augment human-generated descriptive metadata.
http://msc.mellon.org/Meetings/Interop/lagoze_data_model.pdf
Ogsa-DAI
Ogsa-DAI Figure
http://www.globus.org/grid_software/data/dai.php
Perform Document
http://www.ogsadai.org.uk/documentation/ogsadai-wsi-2.2/doc/interaction/Perform.html
MCS
MCS presents a design for a Metadata Catalog Service that provides a mechanism for storing and accessing descriptive metadata attributes
Requirements: store domain-independent attributes and user-defined attributes; query with a set of attributes; query with a logical name; authentication, authorization and auditing
Allows users to discover data sets based on the values of descriptive attributes, rather than requiring them to know specific names or physical locations of data items
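Attribute-based discovery can be sketched with a small attribute table. This is an illustration under assumptions: the table layout, the `find_datasets` helper and the sample rows are invented and do not reflect the actual MCS schema or API.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (logical name, attribute, value) triple.
conn.execute(
    "CREATE TABLE attributes (logical_name TEXT, attr TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?)",
    [
        ("quakes-2005", "region", "California"),
        ("quakes-2005", "format", "GML"),
        ("faults-v2", "region", "California"),
        ("faults-v2", "format", "binary"),
    ],
)


def find_datasets(conn, **wanted):
    """Return logical names whose attributes match every given pair."""
    if not wanted:
        return []
    clauses = " OR ".join("(attr = ? AND value = ?)" for _ in wanted)
    params = [item for pair in wanted.items() for item in pair]
    sql = (
        f"SELECT logical_name FROM attributes WHERE {clauses} "
        "GROUP BY logical_name HAVING COUNT(DISTINCT attr) = ? "
        "ORDER BY logical_name"
    )
    return [row[0] for row in conn.execute(sql, params + [len(wanted)])]


print(find_datasets(conn, region="California"))
print(find_datasets(conn, region="California", format="GML"))
```

The caller names only descriptive attributes. The logical names come back from the catalog, and physical locations never appear in the query.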
SRB
CLIENT
Example interaction with SRB using Scommands:
  Sinit         Start interaction with SRB
  Spwd          Display current position within SRB repository
  Sget myFile   Copy myFile from SRB to local storage
  Srm myFile    Remove myFile (and all replicas) from SRB
  Sexit         End interaction with SRB