DATABASE DESIGN ISSUES: ER model, normalization, security, integrity, consistency, tuning, optimization. Research issues: temporal and spatial databases.
Functional Analysis
Access Specifications
Conceptual Design
Conceptual Model
Logical Design
Logical Schema
Fall 2001
Database Systems
Physical Design
Entity-Relationship Model
An entity is a collection of real-world objects that have many common properties.
Examples: Students, Instructors, Courses, Sections
Student entities have properties: name, address, major, graduation-year
A student may be: John Smith, 22 Sage Rd., Computer Science, 2000
An attribute is a data item that describes a property of an entity.
Entities
primary identifier
sid
Students
composite attribute
student_name
lastname
firstname
mid_initial
Mapping Entities
Students( sid, lastname, firstname, mid_initial )
hobbies
sid
Students
student_name
lastname
firstname
buyid
cc_num email
Owners
Buyers
owner_name
phone
address
phone
lastname
firstname
mid_initial
street
state
city
zip
Items
Bids
location
name
amount
Relationships
Given a set of entities E1, E2, ..., Ek, a relationship R defines a rule of correspondence between these entities. An instance r(e1, e2, ..., ek) of R means that the entities e1, e2, ..., ek are related by R at this instance.
Relationships
binary relationship
own
Items
Owners
accept
Buyers
place
Bids
date
ternary relationship
Cardinalities of Relationships
Participation cardinalities of a relationship R for an entity E are:
min-card(E, R): the minimum number of entities in E that should be mapped via R
max-card(E, R): the maximum number of entities in E that can be mapped via R
Own is a relation between owner and item:
Should each owner be selling items?
How many items can an owner sell?
Cardinalities of Relationships
[Figure: three diagrams of a relationship R between entities E and F]
Cardinalities
(1,1) own (0,N)
Items
(0,N)
Owners
(0,N) accept
Buyers
(0,N)
place (1,1)
Bids
(0,1)
date
Cardinalities
If max-card(E,R)=1 then E has single-valued participation in R
If max-card(E,R)=N then E has multi-valued participation in R
Given a binary relation R between E and F, R is said to be
one-to-one if both E and F have single-valued participation
one-to-many if E has single-valued and F has multi-valued participation
many-to-many if both E and F have multi-valued participation
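The classification rule above can be written as a small function. This is only a sketch: the function name `classify` and the "many-to-one" label for the mirrored case (E multi-valued, F single-valued) are assumptions, since the slide names only three cases.

```python
def classify(max_card_e: str, max_card_f: str) -> str:
    """Classify a binary relationship R between E and F.

    Each argument is max-card(E,R) or max-card(F,R), written "1" or "N".
    """
    for card in (max_card_e, max_card_f):
        if card not in ("1", "N"):
            raise ValueError(f"max-card must be '1' or 'N', got {card!r}")
    e_single = max_card_e == "1"
    f_single = max_card_f == "1"
    if e_single and f_single:
        return "one-to-one"
    if e_single:
        return "one-to-many"   # E single-valued, F multi-valued
    if f_single:
        return "many-to-one"   # mirrored case; this label is an assumption
    return "many-to-many"


print(classify("1", "N"))
```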
Problem
Consider the design of a database to manage airline reservations:
For flights, it contains the departure and arrival airports, dates and times
For flights, it also contains a number of different pricing plans with different conditions (Saturday stay, advance booking, etc.)
For passengers, it contains the name, telephone number and seat type preference
Reservations include the seat assigned to a passenger
Passengers can have multiple reservations
Solution
[ER diagram: entities airport, flight, pricing plan, reservation and passenger; relationships depart and arrive between flight and airport, with cardinalities (1,1) and (0,N); attributes shown include date, time, name, conditions, seat, phone and seat pref.]
Employees(eid, ..., supervisor-id)
Employees
supervises
Supervisor-of (0, N) employee-supervisor (1, 1)
Recursive relationship
project-supervisor (1, N)
Employees
supervised (0, N)
supervises
Projects
Buyers
amount
buy
(0,N)
Items
Stores
BUY
Item I1 I2 I3 I2
Buyers
(0,N)
buy_item
Items
(0,N)
buy_from
(0,N)
sell_item
(0,N)
Stores
Weak Entities
The existence of a weak entity W depends on the existence of another (strong) entity E through a relationship R.
(Alternate definition) Two different weak entities may have the same identity (key) if they are related to two different strong entities.
(0,N)
(1,1)
Bank
has
Branch
name
number
address
Weak Entities
Weak entities can be mapped to the relational model as follows:
Map each weak entity E that depends on a strong entity F to a new relation R.
Relation R contains all the attributes of E and the primary key of F.
The primary key of R is the primary key of E together with the primary key of F.
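The mapping rule above can be tried out with SQLite from Python. This is a sketch under assumptions: the Bank/Branch figure does not survive well enough to say which attribute belongs to which entity, so Bank(name) and Branch(number, address) are guesses, and the column name `bank_name` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# Strong entity F (assumed attribute: name).
conn.execute("CREATE TABLE Bank (name TEXT PRIMARY KEY)")

# Weak entity E: its own attributes plus the primary key of F.
# The primary key of the new relation combines both keys.
conn.execute("""
    CREATE TABLE Branch (
        bank_name TEXT NOT NULL REFERENCES Bank(name) ON DELETE CASCADE,
        number    INTEGER NOT NULL,
        address   TEXT,
        PRIMARY KEY (bank_name, number)
    )
""")
conn.executemany("INSERT INTO Bank VALUES (?)", [("First",), ("Second",)])


def add_branch(bank_name, number, address):
    """Insert a branch; return False if a key or FK constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Branch VALUES (?, ?, ?)",
                         (bank_name, number, address))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_branch("First", 1, "1 Main St"),   # accepted
    add_branch("Second", 1, "9 Oak Ave"),  # same number, different bank: accepted
    add_branch("First", 1, "5 Elm St"),    # duplicate (bank, number): rejected
    add_branch("Third", 1, "7 Pine Rd"),   # no such bank: rejected
]
print(results)
```

Two branches may share the number 1 because they belong to different banks, which is the "same identity under different strong entities" property of weak entities.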
Generalization Hierarchies
Lower items inherit attributes of their parents
[Figure: entity Concerts with subtypes Classical and Other; attributes shown: date, location, orchestra, pieces, soloists, conductor, performers.]
Option 1. Translate into a single relation with a flag for the type of entity [many null values]
Option 2. Translate into three entities and two is-a relationships, then translate the resulting graph.
Extensions
All relational DBMSs come with extensions that give more flexibility to the DBA.
Examples from Informix:
composite attributes -> translate as a record, e.g. address of type ROW(street string, city string, state string, zip string)
multi-valued attributes -> translate into collection types such as sets, lists, multi-sets (bags)
hierarchies -> create typed tables and translate into a type hierarchy
REMEMBER: the extensions complicate the data model and make certain SQL queries much harder or impossible, leaving the database programmer with a much harder job of maintaining the database!
Intro
Database Security
Aspects of security Access to databases Privileges and views
Database Integrity
View updating, Integrity constraints
Database Security
Database security is about controlling access to information
Some information should be available freely Other information should only be available to certain people or groups
The DBMS verifies the password and checks a user's permissions when they try to
Retrieve data Modify data Modify the database structure
Privileges in SQL
GRANT <privileges> ON <object> TO <users> [WITH GRANT OPTION]
<users> is a list of user names or PUBLIC
<object> is the name of a table or view (later)
<privileges> is a list of SELECT <columns>, INSERT <columns>, DELETE, and UPDATE <columns>, or simply ALL
WITH GRANT OPTION means that the users can pass their privileges on to others
Privileges Examples
GRANT ALL ON Employee TO Manager WITH GRANT OPTION
The user Manager can do anything to the Employee table, and can allow other users to do the same (by using GRANT statements)
GRANT SELECT, UPDATE(Salary) ON Employee TO Finance
The user Finance can view the entire Employee table, and can change Salary values, but cannot change other values or pass on their privilege
Removing Privileges
If you want to remove a privilege you have granted you use
REVOKE <privileges> ON <object> FROM <users>
If a user has the same privilege from other users then they keep it All privileges dependent on the revoked one are also revoked
Removing Privileges
Example
Admin grants ALL privileges to Manager, and SELECT to Finance with grant option Manager grants ALL to Personnel Finance grants SELECT to Personnel
SELECT
Admin
ALL
Finance
SELECT
Manager
ALL
Personnel
Removing Privileges
Manager revokes ALL from Personnel
Personnel still has SELECT privileges from Finance
SELECT
Admin
ALL
Finance
SELECT
Manager
ALL
Personnel
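The revocation example can be modelled as a small grant graph. This is an illustrative sketch, not how any particular DBMS stores privileges: it assumes Admin owns the table, treats every grant in the example as carrying the grant option, and lets ALL cover every privilege. A user holds a privilege only if a chain of grants leads back to the owner, so dependent privileges disappear automatically once a grant is revoked.

```python
OWNER = "Admin"  # assumed owner of the table


def effective(grants):
    """Map each user to the privileges reachable from OWNER.

    grants is a set of (grantor, grantee, privilege) triples.
    """
    held = {OWNER: {"ALL"}}
    changed = True
    while changed:
        changed = False
        for grantor, grantee, priv in grants:
            grantor_privs = held.get(grantor, set())
            if "ALL" in grantor_privs or priv in grantor_privs:
                if priv not in held.setdefault(grantee, set()):
                    held[grantee].add(priv)
                    changed = True
    return held


def can(grants, user, priv):
    """True if user currently holds priv (directly or via ALL)."""
    privs = effective(grants).get(user, set())
    return "ALL" in privs or priv in privs


grants = {
    ("Admin", "Manager", "ALL"),
    ("Admin", "Finance", "SELECT"),
    ("Manager", "Personnel", "ALL"),
    ("Finance", "Personnel", "SELECT"),
}

# Manager revokes ALL from Personnel.
grants.discard(("Manager", "Personnel", "ALL"))
print(can(grants, "Personnel", "SELECT"), can(grants, "Personnel", "UPDATE"))

# Admin revokes SELECT from Finance: Personnel's SELECT depended on it.
grants.discard(("Admin", "Finance", "SELECT"))
print(can(grants, "Personnel", "SELECT"))
```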
Views
Privileges work at the level of tables
You can restrict access by column You cannot restrict access by row
Creating Views
CREATE VIEW <name> AS <select stmt>
<name> is the name of the new view
<select stmt> is a query that returns the rows and columns of the view
Example
We want each user to be able to view the names and phone numbers (only) of those employees in their own department
View Example
Example
In Oracle, you can refer to the current user as USER

Employee
ID    Name  Phone  Department  Salary
E158  Mark  x6387  Accounts    15,000
E159  Mary  x6387  Marketing   15,000
E160  Jane  x6387  Marketing   15,000
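A view of this kind can be sketched with SQLite from Python. SQLite has no USER pseudo-column, so the current user is emulated here with a one-row table called `Session`; that table, the view name `OwnDept`, and the assumption that user names match the Name column are all made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (ID TEXT PRIMARY KEY, Name TEXT, Phone TEXT,
                       Department TEXT, Salary INTEGER);
INSERT INTO Employee VALUES ('E158', 'Mark', 'x6387', 'Accounts', 15000);
INSERT INTO Employee VALUES ('E159', 'Mary', 'x6387', 'Marketing', 15000);
INSERT INTO Employee VALUES ('E160', 'Jane', 'x6387', 'Marketing', 15000);

-- Stand-in for Oracle's USER (assumption: one row holding the user name).
CREATE TABLE Session (user_name TEXT);
INSERT INTO Session VALUES (NULL);

-- Names and phone numbers only, restricted to the user's own department.
CREATE VIEW OwnDept AS
SELECT Name, Phone
FROM Employee
WHERE Department = (SELECT Department FROM Employee
                    WHERE Name = (SELECT user_name FROM Session));
""")


def visible_to(user):
    """Return the rows of OwnDept as seen by the given user."""
    conn.execute("UPDATE Session SET user_name = ?", (user,))
    return conn.execute(
        "SELECT Name, Phone FROM OwnDept ORDER BY Name").fetchall()


print(visible_to("Mary"))
print(visible_to("Mark"))
```

The view restricts both columns (no Salary) and rows (own department only), which table-level privileges alone cannot do.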
Database Tuning
Overview
After ER design, schema refinement, and the definition of views, we have the conceptual and external schemas for our database. The next step is to choose indexes, make clustering decisions, and to refine the conceptual and external schemas (if necessary) to meet performance goals. We must begin by understanding the workload:
The most important queries and how often they arise. The most important updates and how often they arise. The desired performance for these queries and updates.
For each query in the workload: Which relations does it access? Which attributes are retrieved? Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
For each update in the workload: Which attributes are involved in selection/join conditions? How selective are these conditions likely to be? The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.
Decisions to Make
What indexes should we create?
Which relations should have indexes? What field(s) should be the search key? Should we build several indexes?
What kind of index should each one be? Clustered? Hash/tree? Dynamic/static? Dense/sparse?
Should we change the conceptual schema? Consider alternative normalized schemas? (Remember, there are many choices in decomposing into BCNF, etc.) Should we "undo" some decomposition steps and settle for a lower normal form? (Denormalization.) Horizontal partitioning, replication, views ...
Choice of Indexes
One approach: consider the most important queries in turn. Consider the best plan using the current indexes, and see if a better plan is possible with an additional index. If so, create it. Before creating an index, must also consider the impact on updates in the workload!
Trade-off: indexes can make queries go faster, updates slower. Require disk space, too.
Exact match condition suggests hash index. Range query suggests tree index.
Clustering is especially useful for range queries, although it can help on equality queries as well in the presence of duplicates.
Try to choose indexes that benefit as many queries as possible. Since only one index can be clustered per relation, choose it based on important queries that would benefit the most from clustering.
For multi-attribute search keys: if range selections are involved, the order of attributes should be carefully chosen to match the range ordering. Such indexes can sometimes enable index-only strategies for important queries.
For index-only strategies, clustering is not important!
Example 1
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE D.dname='Toy' AND E.dno=D.dno
Hash index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple. What if WHERE included: "... AND E.age=25"?
Could retrieve Emp tuples using index on E.age, then join with Dept tuples satisfying dname selection. Comparable to strategy that used E.dno index. So, if E.age index is already created, this query provides much less motivation for adding an E.dno index.
Example 2
SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.sal BETWEEN 10000 AND 20000 AND E.hobby='Stamps' AND E.dno=D.dno
Suggests that we build a hash index on D.dno. B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions.
As a rule of thumb, equality selections are more selective than range selections.
As both examples indicate, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query. Have to understand optimizers!
Examples of Clustering
B+ tree index on E.age can be used to get qualifying tuples.
How selective is the condition? Is the index clustered? Consider the GROUP BY query. If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly. Clustered E.dno index may be better!
SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno

SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'
If many employees collect stamps, Sort-Merge join may be worth considering. A clustered index on D.dno would help.
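Indexes with multi-attribute search keys are also called composite or concatenated indexes. Choice of index key is orthogonal to clustering etc.
If the condition is a range on both age and sal: a clustered tree index on <age,sal> or <sal,age> is best.
If the condition is an equality on age and a range on sal: a clustered <age,sal> index is much better than a <sal,age> index!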
Index-Only Plans
A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.

Tree index on <E.dno,E.eid>:
SELECT D.mgr, E.eid
FROM Dept D, Emp E
WHERE D.dno=E.dno

Index on <E.dno>:
SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno

Tree index on <E.dno,E.sal>:
SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno

Tree index on <E.age,E.sal> or <E.sal,E.age>:
SELECT AVG(E.sal)
FROM Emp E
WHERE E.age=25 AND ...
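One way to see an index-only plan is to ask SQLite for its query plan from Python. This is a sketch under assumptions: the Emp columns and sample rows are invented, the index name `emp_dno_sal` is made up, and SQLite's term "covering index" is its name for an index that answers the query without touching the table. The exact wording of the plan text varies between SQLite versions, so the code only looks for that phrase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Emp (eid INTEGER PRIMARY KEY, ename TEXT,
                                  dno INTEGER, sal INTEGER, age INTEGER)""")
conn.executemany(
    "INSERT INTO Emp (ename, dno, sal, age) VALUES (?, ?, ?, ?)",
    [("Ann", 1, 3000, 25), ("Bob", 1, 4000, 30),
     ("Cy", 2, 5000, 25), ("Di", 2, 3500, 41)])

# Composite (concatenated) index on <dno, sal>.
conn.execute("CREATE INDEX emp_dno_sal ON Emp (dno, sal)")


def plan(sql):
    """Return the detail text of SQLite's plan for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)


# Every attribute the query needs is in the index: index-only plan.
index_only = plan("SELECT dno, MIN(sal) FROM Emp GROUP BY dno")

# ename is not in the index, so the table rows must be fetched.
needs_table = plan("SELECT ename FROM Emp WHERE dno = 1")

print(index_only)
print(needs_table)
print(conn.execute("SELECT dno, MIN(sal) FROM Emp GROUP BY dno").fetchall())
```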
We may settle for a 3NF schema rather than BCNF. Workload may influence the choice we make in decomposing a relation into 3NF or BCNF. We may further decompose a BCNF schema! We might denormalize (i.e., undo a decomposition step), or we might add fields to a relation. We might consider horizontal decompositions.
If such changes are made after a database is in use, called schema evolution; might want to mask some of these changes from applications by defining views.
Example Schemas
Contracts (Cid, Sid, Jid, Did, Pid, Qty, Val)
Depts (Did, Budget, Report)
Suppliers (Sid, Address)
Parts (Pid, Cost)
Projects (Jid, Mgr)
We will concentrate on Contracts, denoted as CSJDPQV. The following integrity constraints are given to hold: JP → C, SD → P, C is the primary key.
What are the candidate keys for CSJDPQV? What normal form is this relation schema in?
Decomposing CSJDPQV into SDP and CSJDQV is a lossless decomposition, but not dependency-preserving. Adding CJP makes it dependency-preserving as well. Now consider the query: find the number of copies Q of part P ordered in contract C. It requires a join on the decomposed schema, but can be answered by a scan of the original relation CSJDPQV. This could lead us to settle for the 3NF schema CSJDPQV.
Denormalization
Suppose that the following query is important:
We might choose to modify Contracts despite this if the query is sufficiently important, and we cannot obtain adequate performance otherwise (i.e., by adding indexes or by choosing an alternative 3NF schema.)
Choice of Decompositions
There are 2 ways to decompose CSJDPQV into BCNF:
SDP and CSJDQV; lossless-join but not dep-preserving. SDP, CSJDQV and CJP; dep-preserving as well.
2nd decomposition: Index on JP on relation CJP.
1st decomposition:
CREATE ASSERTION CheckDep
CHECK ( NOT EXISTS
  ( SELECT *
    FROM PartInfo P, ContractInfo C
    WHERE P.sid=C.sid AND P.did=C.did
    GROUP BY C.jid, P.pid
    HAVING COUNT (C.cid) > 1 ))
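The join that the assertion performs can be run by hand to find violations of the dependency from (jid, pid) to cid. The sketch below uses SQLite from Python; SQLite does not support CREATE ASSERTION, so the check is an ordinary query, and the column lists of PartInfo (SDP) and ContractInfo (CSJDQV) and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PartInfo (sid TEXT, did TEXT, pid TEXT,
                       PRIMARY KEY (sid, did));      -- SD determines P
CREATE TABLE ContractInfo (cid TEXT PRIMARY KEY, sid TEXT, jid TEXT,
                           did TEXT, qty INTEGER, val INTEGER);
INSERT INTO PartInfo VALUES ('s1', 'd1', 'p1');
INSERT INTO PartInfo VALUES ('s2', 'd1', 'p1');
INSERT INTO ContractInfo VALUES ('c1', 's1', 'j1', 'd1', 10, 500);
INSERT INTO ContractInfo VALUES ('c2', 's2', 'j2', 'd1', 20, 900);
""")


def violations():
    """Return (jid, pid) pairs that are associated with more than one contract."""
    return conn.execute("""
        SELECT C.jid, P.pid
        FROM PartInfo P, ContractInfo C
        WHERE P.sid = C.sid AND P.did = C.did
        GROUP BY C.jid, P.pid
        HAVING COUNT(C.cid) > 1
    """).fetchall()


before = violations()

# A second contract for project j1 that also ends up using part p1.
conn.execute("INSERT INTO ContractInfo VALUES ('c3', 's2', 'j1', 'd1', 5, 100)")
after = violations()

print(before, after)
```

The check needs a join on every update, which is the cost the slide compares against simply keeping an index on JP in relation CJP.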
Begin by decomposing it into SPQV and CSJDPQ. Then, decompose CSJDPQ (not in 3NF) into SDP, CSJDQ. This gives us the lossless-join decomposition: SPQV, SDP, CSJDQ. To preserve JP → C, we can add CJP, as before.
Find the contracts held by supplier S. Find the contracts that department D is involved in.
Decomposing CSJDQV further into CS, CD and CJQV could speed up these queries. (Why?) On the other hand, the following query is slower:
Horizontal Decompositions
Our definition of decomposition, so far: Relation is replaced by a collection of relations that are projections. This is vertical decomposition. Most important case. Sometimes, might want to replace relation by a collection of relations that are selections. This is horizontal decomposition.
Each new relation has same schema as the original, but a subset of the rows. Collectively, new relations contain all rows of the original. Typically, the new relations are disjoint.
Performs like index on such queries, but no index overhead. Can build clustered indexes on other attributes, in addition!
The replacement of Contracts by LargeContracts and SmallContracts can be masked by the view. However, queries with the condition val>10000 must be asked with respect to LargeContracts for efficient execution, so users concerned with performance have to be aware of the change.
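A horizontal decomposition masked by a view can be sketched with SQLite from Python. The threshold of 10000 comes from the condition mentioned above; the CHECK constraints, the helper `insert_contract`, and the sample rows are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Same schema in both relations; each holds a disjoint subset of the rows.
CREATE TABLE LargeContracts (cid TEXT PRIMARY KEY, sid TEXT, jid TEXT, did TEXT,
                             pid TEXT, qty INTEGER,
                             val INTEGER CHECK (val > 10000));
CREATE TABLE SmallContracts (cid TEXT PRIMARY KEY, sid TEXT, jid TEXT, did TEXT,
                             pid TEXT, qty INTEGER,
                             val INTEGER CHECK (val <= 10000));

-- The view hides the split from applications that still query Contracts.
CREATE VIEW Contracts AS
SELECT * FROM LargeContracts
UNION ALL
SELECT * FROM SmallContracts;
""")


def insert_contract(row):
    """Route a (cid, sid, jid, did, pid, qty, val) row to the right relation."""
    table = "LargeContracts" if row[-1] > 10000 else "SmallContracts"
    conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?)", row)


for row in [("c1", "s1", "j1", "d1", "p1", 10, 500),
            ("c2", "s2", "j2", "d1", "p2", 20, 25000),
            ("c3", "s1", "j3", "d2", "p3", 5, 12000)]:
    insert_contract(row)

# Through the view: all rows. Directly: only the large contracts are scanned.
all_cids = [r[0] for r in conn.execute("SELECT cid FROM Contracts ORDER BY cid")]
large_cids = [r[0] for r in conn.execute("SELECT cid FROM LargeContracts ORDER BY cid")]
print(all_cids, large_cids)
```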
Selections involving null values. Selections involving arithmetic or string expressions. Selections involving OR conditions. Lack of evaluation features like index-only strategies or certain join methods or poor size estimation.
Check the plan that is being used! Then adjust the choice of indexes or rewrite the query/view.
Guideline: Use only one query block, if possible.

SELECT DISTINCT *
FROM Sailors S
WHERE S.sname IN (SELECT Y.sname FROM YoungSailors Y)

can be written as the single block

SELECT DISTINCT S.*
FROM Sailors S, YoungSailors Y
WHERE S.sname = Y.sname

Not every nested query converts this easily:

SELECT *
FROM Sailors S
WHERE S.sname IN (SELECT DISTINCT Y.sname FROM YoungSailors Y)

Subqueries inside OR: Hard to convert. ALL subqueries: Hard to convert.
Aggregates in subqueries: Tricky. Good news: Some systems now rewrite under the covers (e.g. DB2).
Consider DBMS use of index when writing arithmetic expressions: E.age=2*D.age will benefit from index on E.age, but might not benefit from index on D.age!
SELECT * INTO Temp
FROM Emp E, Dept D
WHERE E.dno=D.dno AND D.mgrname='Joe'
and
SELECT T.dno, AVG(T.sal)
FROM Temp T
GROUP BY T.dno
vs.
SELECT E.dno, AVG(E.sal)
FROM Emp E, Dept D
WHERE E.dno=D.dno AND D.mgrname='Joe'
GROUP BY E.dno
Does not materialize the intermediate reln Temp. If there is a dense B+ tree index on <dno, sal>, an index-only plan can be used to avoid retrieving Emp tuples in the first query!
Summary (Design)
Database design consists of several tasks: requirements analysis, conceptual design, schema refinement, physical design and tuning.
In general, have to go back and forth between these tasks to refine a database design, and decisions in one task can influence the choices in another task.
Understanding the nature of the workload for the application, and the performance goals, is essential to developing a good design.
What are the important queries and updates? What attributes/relations are involved?
Index maintenance overhead on updates to key fields. Choose indexes that can help many queries, if possible. Build indexes to support index-only strategies. Clustering is an important decision; only one index on a given relation can be clustered! Order of fields in composite index key can be important.
Static indexes may have to be periodically re-built. Statistics have to be periodically updated.
Summary (Tuning)
The conceptual schema should be refined by considering performance criteria and workload:
May choose 3NF or lower normal form over BCNF. May choose among alternative decompositions into BCNF (or 3NF) based upon the workload. May denormalize, or undo some decompositions. May decompose a BCNF relation further! May choose a horizontal decomposition of a relation. Importance of dependency-preservation based upon the dependency to be preserved, and the cost of the IC check.
Can add a relation to ensure dep-preservation (for 3NF, not BCNF!); or else, can check dependency using a join.
Should determine the plan used by the system, and adjust the choice of indexes appropriately.
Only left-deep plans considered! Null values, arithmetic conditions, string expressions, the use of ORs, etc. can confuse an optimizer.
Avoid nested queries, temporary relations, complex conditions, and operations like DISTINCT and GROUP BY.
Temporal Databases
Applications of temporal db
There are many examples of applications where some aspect of time is needed to maintain the information in a DB. Health care: patient histories need to be maintained Insurance: claims and accident histories are required Finance: stock price histories need to be maintained. Personnel management: salary and position history need to be maintained Banking: credit histories
Introduction
Temporal database: a database that contains historical data as well as current data.
Note: "historical" is a misleading term, since temporal databases may contain data regarding the future as well as the past.
Extreme case: data is only inserted, never deleted from a temporal database (e.g. vehicle position data in the project). So far, we have studied the other extreme, i.e. snapshot databases.
Introduction
Temporal data: encoded representation of timestamped facts. Each tuple must include at least one timestamp.
Problem: What about queries that produce results that are not temporal, i.e. the result of the query is outside the domain of the (temporal) database? E.g. get names of all people who have supplied something in the past.
Redefine temporal database: database that includes, but is not limited to, temporal data.
Motivation
Queries on time-varying data are difficult to express in SQL. Temporal databases provide built-in support for recording and querying such information. It is possible to use SQL to evaluate these queries, but performance is poor.
Motivation
Most applications manage temporal data. If a temporal database is used for such data:
Schemas, including integrity constraints are simpler. Queries are simpler
Applications
Most applications of database technology are temporal in nature: Financial apps.: portfolio management, accounting & banking, stock market analysis, audit analysis
Record-keeping apps.: personnel, medical records, inventory management, legal records (commercial laws change frequently)
Data Warehousing: historical trends for analysis
Scheduling apps.: airline, car, hotel reservations and project management
Scientific apps.: weather monitoring, chemical process monitoring
Intervals
An interval [s,e] is a set of times from time s to time e.
Does interval [s,e] represent an infinite set?
Assumption: Timeline is a finite sequence of discrete, indivisible time quanta.
Time quantum: the smallest unit of time the system can represent.
Timepoint/point: a time unit considered indivisible for our purpose.
An interval is treated as a single type, not as a pair of separate values.
An interval can be open or closed with respect to its start point and end point,
e.g. [d04,d10], [d04,d11), (d03,d10], (d03,d11) all represent the sequence of days from day 4 to day 10 inclusive.
Operators on Intervals
Temporal predicate operators, for i1 = [s1,e1] and i2 = [s2,e2]:
i1 BEFORE i2 (e1 < s2)
i1 MEETS i2 (s2 = e1)
i1 EQUALS i2 (s1 = s2 AND e1 = e2)
i1 OVERLAPS i2 (s2 < s1 < e2 OR s1 < s2 < e1)
Operators on Intervals
i1 DURING i2 (s2 < s1 AND e2 > e1)
i1 STARTS i2 (s1 = s2 AND e1 < e2)
i1 FINISHES i2 (e1 = e2 AND s1 > s2)
Additional operators:
i1 MERGES i2: (i1 MEETS i2 OR i1 OVERLAPS i2)
i1 CONTAINS i2: (i2 DURING i1)
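The interval predicates translate directly into code. The sketch below follows the definitions exactly as written on the slides; representing time points as integers and the `Interval` type are assumptions made for the example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    s: int  # start point
    e: int  # end point


def before(i1, i2):
    return i1.e < i2.s


def meets(i1, i2):
    return i2.s == i1.e


def equals(i1, i2):
    return i1.s == i2.s and i1.e == i2.e


def overlaps(i1, i2):
    return i2.s < i1.s < i2.e or i1.s < i2.s < i1.e


def during(i1, i2):
    return i2.s < i1.s and i2.e > i1.e


def starts(i1, i2):
    return i1.s == i2.s and i1.e < i2.e


def finishes(i1, i2):
    return i1.e == i2.e and i1.s > i2.s


def merges(i1, i2):
    return meets(i1, i2) or overlaps(i1, i2)


def contains(i1, i2):
    return during(i2, i1)


print(overlaps(Interval(4, 8), Interval(2, 6)))
print(contains(Interval(2, 6), Interval(3, 4)))
```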
Aggregate Operators
EXPAND(X): Where X is a set of intervals. The output is also a set. Used to generate time quantum intervals.
The expanded form of X is the set of all intervals of the form [p,p] where p is a time point in some interval in X.
e.g.:
X1 = { [d01,d01], [d03,d05], [d04,d06] }
X2 = { [d01,d01], [d03,d04], [d05,d05], [d05,d06] }
X3 = { [d01,d01], [d03,d03], [d04,d04], [d05,d05], [d06,d06] }
Then EXPAND(X1) = EXPAND(X2) = X3
Aggregate Operators
COLLAPSE(X): The collapsed form of X is the set Y of intervals of the same type such that
(a) X and Y have the same expanded (unfolded) form;
(b) no two distinct members i1 and i2 of Y are such that (i1 MERGES i2) is true.
e.g.:
X1 = { [d01,d01], [d03,d05], [d04,d06] }
X2 = { [d01,d01], [d03,d04], [d05,d05], [d05,d06] }
X3 = { [d01,d01], [d03,d06] }
Then COLLAPSE(X1) = COLLAPSE(X2) = X3
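EXPAND and COLLAPSE can be sketched in a few lines. Assumptions: days dNN are represented as integers, intervals are closed (start, end) pairs, and two intervals are merged when they overlap or are adjacent on the discrete timeline (end + 1 = start), which is what the worked example requires.

```python
def expand(x):
    """Return the unit intervals [p,p] for every point p covered by x."""
    points = sorted({p for s, e in x for p in range(s, e + 1)})
    return [(p, p) for p in points]


def collapse(x):
    """Return the fewest intervals that cover exactly the same points as x."""
    merged = []
    for p, _ in expand(x):
        if merged and merged[-1][1] + 1 == p:
            merged[-1] = (merged[-1][0], p)   # extend the current run
        else:
            merged.append((p, p))             # start a new run
    return merged


X1 = [(1, 1), (3, 5), (4, 6)]
X2 = [(1, 1), (3, 4), (5, 5), (5, 6)]

print(expand(X1))
print(collapse(X2))
```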
PACK r ON A: groups the relation r by all its attributes apart from A. This is equivalent to:
WITH ( r GROUP {A} AS X ) AS R1
     ( EXTEND R1 ADD COLLAPSE (X) AS Y ) {ALL BUT X} AS R2 :
R2 UNGROUP Y
UNPACK r on A: Replace COLLAPSE with EXPAND in PACK.
Example
Given two temporal relations:
S: Supplier S# was under contract during the interval DURING
SP: Supplier S# was able to supply part P# during the interval DURING

S
S#  DURING
S1  [d04,d10]
S2  [d02,d04]
S2  [d07,d10]
S3  [d03,d10]
S4  [d04,d10]
S5  [d02,d10]

SP
S#  P#  DURING
S1  P3  [d09,d10]
S1  P5  [d06,d10]
S2  P1  [d02,d04]
S2  P9  [d03,d03]
S2  P1  [d08,d10]
S2  P5  [d09,d10]
S3  P1  [d08,d10]
S4  P2  [d06,d09]
S4  P5  [d04,d08]
S4  P7  [d05,d10]
Example 1
Active supplier intervals: Get S#-DURING pairs for suppliers who have been able to supply at least one part during at least one interval of time, where DURING designates such an interval.
PACK SP {S#,DURING} ON DURING

SP
S#  P#  DURING
S1  P3  [d09,d10]
S1  P5  [d06,d10]
S2  P1  [d02,d04]
S2  P9  [d03,d03]
S2  P1  [d08,d10]
S2  P5  [d09,d10]
S3  P1  [d08,d10]
S4  P2  [d06,d09]
S4  P5  [d04,d08]
S4  P7  [d05,d10]

RESULT
S#: S1, S2, S2, S3, S4
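The PACK in this example can be sketched as "group by S#, then collapse each group's intervals". Assumptions: days dNN are integers, intervals are closed (start, end) pairs that merge when they overlap or are adjacent, and the SP rows are the ones visible on the slide, already projected on {S#, DURING}.

```python
from collections import defaultdict


def collapse(intervals):
    """Merge overlapping or adjacent closed intervals over integer days."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


def pack(rows):
    """PACK rows of (key, interval) on the interval attribute."""
    groups = defaultdict(list)
    for key, interval in rows:
        groups[key].append(interval)
    return [(key, iv) for key in sorted(groups) for iv in collapse(groups[key])]


# SP {S#, DURING}, taken from the rows shown on the slide.
sp = [("S1", (9, 10)), ("S1", (6, 10)),
      ("S2", (2, 4)), ("S2", (3, 3)), ("S2", (8, 10)), ("S2", (9, 10)),
      ("S3", (8, 10)),
      ("S4", (6, 9)), ("S4", (4, 8)), ("S4", (5, 10))]

result = pack(sp)
for row in result:
    print(row)
```

The packed result has one row each for S1, S3 and S4 and two rows for S2, which matches the S# column of the RESULT table.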
Example 2
Inactive (passive) supplier intervals: Get S#-DURING pairs for suppliers who have been unable to supply any parts at all during at least one interval of time, where DURING designates such an interval.

RESULT
S#  DURING
S2  [d07,d07]
S3  [d03,d07]
S5  [d02,d10]
Temporal Databases
1. VALID-TIME TEMPORAL DATA MODEL 2. TIME NORMALIZATION 3. TEMPORAL QUERY LANGUAGE
2. The SQL query language provides very limited support for expressing temporal queries. Therefore, applications that work with complex temporal data should define their own (1) temporal models and (2) query systems.
TIME NORMALIZATION
This section defines different types of synchronism among time-varying attributes. It is valid to maintain synchronous attributes in a single relation. We define the concept of temporal dependence, which is used to define the notion of time normalization.
Synchronism and Temporal Dependence
A set of time-varying attributes (TVAs) in a given relation is called synchronous if every TVA can be uniformly associated with and be directly applied to the timestamp values in each tuple of the relation.
Example 1: The Employee Relation. Here, an employee gets a raise in salary if and only if he or she gets a promotion, and an employee is never demoted. Thus, the Salary and Position form a set of synchronous attributes.
Empno  Salary  Position   TS  TE
33     20K     Typist     12  24
33     25K     Secretary  25  35
45     27K     Jr Engr    28  37
45     30K     Sr Engr    38  42
Example 2: The relation Maintenance. All time-varying attributes Part, Cond, Place and Cost collectively describe the maintenance event. These TVAs form a quasi synchronous set.
Plane#  Part   Cond      Place    Cost  TS  TE
91      Wheel  Detached  Atlanta  1000  10  20
105     Door   Broken    N.Y.     2000  35  47
105     Door   Unhinged  L.A.     2500  35  62
142     Wing   Cracked   Boston   7000  60  72
Sal-Mgr
Empno  Salary  Manager   TS  TE
52     18K     Smith     5   9
52     20K     Smith     10  20
52     25K     Smith     21  29
52     25K     Jones     30  38
52     31K     Jones     39  42
52     31K     Smith     43  47
52     38K     Smith     48  Now
97     30K     Bradford  12  17
97     35K     Bradford  18  Now
Consider the relation Sal-Mgr. The relation shows the manager and salary of employees over a period of time. In this relation, the attributes Salary and Manager form two singleton synchronous sets. They change in an asynchronous fashion. Such asynchronism leads to the fragmentation of the lifespan information of a TVA over several tuples and creates update and retrieval anomalies.
Definition (Temporal dependence). Let R be a time-varying relation, where K is its temporal invariant key, and let Xi, for i in [1,n], be its TVAs and TS and TE be its timestamp attributes. In a relational schema R, for any two TVAs Xi and Xj (i != j), R is said to have a temporal dependency, Xi T Xj, iff there exists an instance of R that contains two tuples t1 and t2 such that:
t1(K) = t2(K)
t1(Xi) = t2(Xi) XOR t1(Xj) = t2(Xj)
the intervals [t1(TS),t1(TE)] and [t2(TS),t2(TE)] are adjacent.
In Sal-Mgr, the attributes Salary and Manager, according to the above definition, have a temporal dependency (consider two tuples <52, 18K, Smith, 5, 9> and <52, 20K, Smith, 10, 20> or two tuples <52, 25K, Smith, 21, 29> and <52, 25K, Jones, 30, 38>).
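The definition can be checked mechanically on one instance of a relation. This is a sketch under assumptions: tuples are (key, Xi, Xj, TS, TE), "adjacent" is taken to mean that one interval ends exactly one time unit before the other starts, and the open-ended value Now is represented by a large integer.

```python
from itertools import combinations

NOW = 10**9  # stand-in for the open-ended timestamp "Now" (assumption)


def has_temporal_dependency(rows):
    """True if some pair of tuples witnesses a temporal dependency.

    rows are (key, xi, xj, ts, te) tuples from one instance of the relation.
    """
    for t1, t2 in combinations(rows, 2):
        same_key = t1[0] == t2[0]
        adjacent = t1[4] + 1 == t2[3] or t2[4] + 1 == t1[3]
        exactly_one_equal = (t1[1] == t2[1]) != (t1[2] == t2[2])  # XOR
        if same_key and adjacent and exactly_one_equal:
            return True
    return False


sal_mgr = [
    (52, "18K", "Smith", 5, 9), (52, "20K", "Smith", 10, 20),
    (52, "25K", "Smith", 21, 29), (52, "25K", "Jones", 30, 38),
    (52, "31K", "Jones", 39, 42), (52, "31K", "Smith", 43, 47),
    (52, "38K", "Smith", 48, NOW),
    (97, "30K", "Bradford", 12, 17), (97, "35K", "Bradford", 18, NOW),
]

employee = [
    (33, "20K", "Typist", 12, 24), (33, "25K", "Secretary", 25, 35),
    (45, "27K", "Jr Engr", 28, 37), (45, "30K", "Sr Engr", 38, 42),
]

print(has_temporal_dependency(sal_mgr))   # Salary and Manager change independently
print(has_temporal_dependency(employee))  # Salary and Position change together
```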
Temporal dependencies arise when two or more temporally unrelated facts are mixed in one time-varying relation.
The formal syntax of a TSQL retrieval statement:
SELECT [FIRST | SECOND | THIRD | Nth | LAST] select_item_list
FROM table_name_list
WHEN temporal_comparison_list
WHERE search_condition_list
Example Database
TSQL will be illustrated by examples on a database with the following relational schema:
E(eno, name, address, date-of-birth)
S(eno, salr, TS, TE)
M(eno, mgr, TS, TE)
T(eno, city, country, cost, TS, TE)
E stands for Employee, S for Salary, M for Manager, and T for Travel.
Assumption: well-defined tables
Assume that the valid time component in temporal table(s) must be well-defined before performing the operation. That means temporal tables do not contain tuples with the same non-temporal attribute values but overlapping or consecutive time intervals. Such tuples are automatically folded in advance by merging their time intervals.
Temporal projection
Temporal projection is similar to standard projection, except that the restriction applies only to the non-temporal attributes. Neither timestamp column can be excluded from the resultant history. After temporal projection, folding is enforced so that adjoining intervals are merged into a single interval in the resultant relation.
Temporal selection
TSQL adds the following new construct to standard SQL: selection based on temporal comparisons of timepoints and intervals using terms in a WHEN clause. The WHEN clause is used to express the temporal part of a query. The temporal comparison in the WHEN clause has the following form:
WHEN a interval_compare_operator b
where a, b are intervals and interval_compare_operator can be one of the keywords: BEFORE, AFTER, DURING, EQUIVALENT, ADJACENT, OVERLAPS, PRECEDES, and FOLLOWS.
Nontemporal ER Schema
Strong Entity Types Weak Entity Types Entity Type Identifiers (Key Attributes) Attributes Relationship Types Integrity Constraints
Entity Lifespans
Entities have a lifespan denoting when they existed. Entities are instantaneous or have a lifespan with a duration. If the entities of an entity type exist for all of time, there may be no need to record the lifespan explicitly. (They are nontemporal.) Otherwise, the entity types are temporal. In this case, the designer should also specify the granularity of the lifespan.
instant
period (represented with two instants)
In short, for tables corresponding to entity and relationship types for which valid time is to be recorded, add either - a single instant timestamp column or - a period timestamp, represented with two instant timestamp columns.
In short, we should decompose tables so that all attributes of a table have an identical temporal support and precision.
Spatial Databases
Outline:
a rather old (but quite complete) survey on Spatial DBMS
Introduction & definition Modeling Querying Data structures & algorithms System architecture
Introduction
A common technology for some Applications:
GIS (geographic/geo-referenced data) VLSI design (geometric data) modeling complex phenomena (spatial data)
All need to manage large collections of relatively simple spatial objects Spatial DB vs. Image/pictorial DB [1990]
Spatial DB contains objects in the space Image DB contains representations of a space (images, pictures, : raster data)
SDBMS Definition
A spatial database system: Is a database system
A DBMS with additional capabilities for handling spatial data
Offers spatial data types (SDTs) in its data model and query language
Structure in space: e.g., POINT, LINE, REGION Relationships among them: (l intersects r)
Modeling
Assume 2-D and GIS application, two basic things need to be represented: Objects in space: cities, forests, or rivers single objects Coverage/Field: say something about every point in space (e.g., partitions, thematic maps)
Modeling: coverages
Partition: set of region objects that are required to be disjoint (adjacency or region objects with common boundaries), e.g. thematic maps Networks: embedded graph in plane consisting of set of points (vertices) and lines (edges) objects, e.g. highways, power supply lines, rivers
Querying
Two main issues:
1. Connecting the operations of a spatial algebra (including predicates for spatial relationships) to the facilities of a DBMS query language. Fundamental spatial algebra operators are:
Spatial selection Spatial join (overlay, fusion)
2. Providing graphical presentation of spatial data (i.e. results of queries), and graphical input of SDT values used in queries.
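The two fundamental operators named under the first issue, spatial selection and spatial join, can be illustrated with a toy sketch. Everything here is an assumption made for the example: objects are approximated by axis-aligned rectangles, the predicate is "intersects", and the join is a plain nested loop rather than anything a real spatial DBMS would use.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """True if the two rectangles share at least one point."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def spatial_selection(objects, window):
    """Names of the objects whose rectangle intersects the query window."""
    return sorted(name for name, r in objects.items() if intersects(r, window))


def spatial_join(left, right):
    """Pairs (l, r) of names whose rectangles intersect (nested loop)."""
    return sorted((ln, rn)
                  for ln, lr in left.items()
                  for rn, rr in right.items()
                  if intersects(lr, rr))


cities = {"A": Rect(0, 0, 2, 2), "B": Rect(5, 5, 6, 6)}
forests = {"F1": Rect(1, 1, 4, 4), "F2": Rect(8, 8, 9, 9)}

print(spatial_selection(cities, Rect(0, 0, 3, 3)))
print(spatial_join(cities, forests))
```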