You are on page 1of 22

Chapter 17: Additional Slides

November 16, 2008


Outline
 Physical Data Management
 Fragments

 DistributedQuery Processing
 Transactions

 Logical Data Management


 Transparency

 Conceptual Data Management


Physical Data Management
Fragments
 What is a Fragment?
 Vertical subset (project operation)
 Horizontal subset (restrict operation)
 Mixed Fragment (combination of project and restrict)
A fragment may be allocated to a single or
multiple sites
 Fragments may be replicated where
 there is a primary fragment as a single site
 Copies of the fragment are placed at multiple sites
(secondary)
Physical Data Management
Distributed Query Processing
 Involves both local (intra site) and global
(inter site) optimization.
 Multiple optimization objectives
 The weighting of communication costs
versus local processing costs depends on
network characteristics.
 There are many more possible access
plans for a distributed query.
Physical Data Management
Distributed Query Processing cont’d
 Local vs. Global query processing
 In Local,
 Queries are performed at a central server
(single site)
 In Global,
 Mustdecide which sites to access for the fragments
 May need to move fragments from site to site

 Multiple optimization is needed for Global


 Due to the multiple sites and access plans
 Many possible access plans for Global
 Choosing the best one may be difficult
Physical Data Management
Distributed Query Processing cont’d

 Communication Costs
 Communication Time (CT)
 Fixed Message Delay (MD)

 Variable Transmission Time (TT)

 CT = MD + TT
 MD = Number of Messages * Delay per
message
 TT = Number of bits/Data rate
Physical Data Management
Distributed Query Processing cont’d
 Global
Query
Example (p. 632) Product data
 List the order number, Customer-order data

order date, product


number, product name, Product data
Customer-order data
product price, and order
quantity for eastern
orders with a specified
customer number, date
range, and product
color.
 Four possible access
plans
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 1
 Move the Product table
to the Tulsa site where
the query is processed Product data
Customer-order data

Product data
Customer-order data
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 2
 Restrict the Product
table at the Denver Site
Then move result to the
Product data
 Customer-order data

Tulsa site to execute


the remainder of the Product data
Customer-order data

query
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 3
 Perform join and
restrictions of Eastern-
Orders and Eastern Product data
Customer-order data

Order-lines fragments
at the Tulsa site Product data
Customer-order data
 Then move result to
Denver site to join with
Product Table
Physical Data Management
Distributed Query Processing cont’d
 Access Plan 4
 Restrict the Product
table at the Denver site
Move product numbers
Product data
 Customer-order data

to Tulsa and do
restrict/join Product data
Customer-order data

 Then move result back


to Denver to combine
with Product table to
get product names
Physical Data Management
Distributed Query Processing cont’d

 Obviously many different access plans


can be used to answer the same query
 Need to investigate
 actual network costs
 the local processing costs at each site to
determine which access plan is the best
Physical Data Management
Transactions – 2 Phase Commit Protocol

2 Phase Commit (2PC)


 Ensures that all transactions are Atomic
 One site is selected as a Coordinator
while other sites are Participants
 Each Participant site execute a different part of
the transaction
 Two phases: Voting Phase and Decision Phase

 Figure 17.18, page 634


Physical Data Management
Transactions – 2 Phase Commit Protocol

 Several Complications IF
 Failuresduring recovery and Timeouts
 Log records are lost

 Coordinator fails

 Etc.

 Several methods to resolve these, but out


of scope for this class
Logical Data Management
Transparency

 Transparency is related to data independence.


 With transparency, users can write queries with
no knowledge of the distribution, and distribution
changes will not cause changes to existing
queries and transactions.
 Without transparency, users must reference
some distribution details in queries and
distribution changes can lead to changes in
existing queries.
Logical Data Management
Fragmentation Transparency

 Fragmentation transparency provides the


highest level of data independence.
 Users formulate queries and transactions
without knowledge of fragments (locations,
or local formats).
 If fragments change, queries and
transactions are not affected.
 Table 17.6, p 626
Logical Data Management
Location Transparency

 Location transparency provides a lesser


level of data independence than
fragmentation transparency.
 Users need to reference fragments in
formulating queries and transactions.
 However, knowledge of locations and local
formats is not necessary.
 Table 17.7, p 627
Conceptual Data Management
Schema Integration

 Multiple types of schemas may exist to


describe the same dataset
 Integrate multiple schemas into a single
schema
 Best explained using an Exercise
Conceptual Data Management
Schema Integration - Exercise

Engineering Database – Relational Schema


First, find the common
E(eno, ename, title), p.k. = eno entities and relationships
J(jno, jname, budget, loc, cname), p.k. = jno between schemas
G(eno, jno, resp, dur), p.k. = eno, jno
S(title, sal), p.k. = title
Employee Database – CODASYL Schema
Department(dept-name, budget, manager)
Employee(e#, name, address, title, salary)
Database III – E/R Model Department Employs Employee (1:N relationship)
• Entities
Engineer(Engineer No, name, title, salary)
Project(PNo, project name, budget, location)
Client(Client Name, Address)
• Relationships
Engineer Works_In Project : (Responsibility, Duration)
Project Contract_By Client : (Contract Date)
Conceptual Data Management
Schema Integration - Exercise

Engineering Database – Relational Schema


Second, draw the conceptual
E(eno, ename, title), p.k. = eno diagram for the common
J(jno, jname, budget, loc, cname), p.k. = jno entities and relationships
G(eno, jno, resp, dur), p.k. = eno, jno
S(title, sal), p.k. = title
Employee Database – CODASYL Schema
Department(dept-name, budget, manager)
Employee(e#, name, address, title, salary)
Database III – E/R Model Department Employs Employee (1:N relationship)
• Entities
Engineer(Engineer No, name, title, salary)
Project(PNo, project name, budget, location)
Client(Client Name, Address)
• Relationships
Engineer Works_In Project : (Responsibility, Duration)
Project Contract_By Client : (Contract Date)
Conceptual Data Management
Schema Integration

 No single correct solution to this exercise


 It may be seen that there are multiple
solutions to this problem
 Considered a very HARD problem
 Often hard to find the best synonyms
especially from a large set of schemas
Weekly Exercise
 Questions 2, 4, and 5

You might also like