You are on page 1of 170

System Databases

Introduction, Features

THE RELATIONAL DATABASE


MODEL
There are many types of databases
The relational database model is the most
popular
Relational database uses a series of logically
related two-dimensional tables or files to
store information in the form of a database

Databases Are

Collections of information
Created with logical structures
With logical ties within the information
With built-in integrity constraints

Databases Collections of
Information
Databases have many tables
Consider Solomon Enterprises that provides
concrete to home and commercial builders.
Tables or files include:
Order
Customer
Concrete Type
Employee
Truck

Databases Collections of
Information

3-5

Databases Created with


Logical Structures
In databases, the row number is irrelevant
Not true in spreadsheet software
In databases, column names are very
important. Column names are created in the
data dictionary
Data dictionary contains the logical
structure of the information in a database

Databases With Logical Ties


Within the Information
Logical ties must exist between the tables or
files in a database
Logical ties are created with primary and
foreign keys
Primary key field (or group of fields in some
cases) that uniquely describes each record

Databases With Logical Ties


Within the Information
Foreign key primary key of one file that
appears in another file
Foreign keys help you create logical ties within
the information in a database

Databases With Logical Ties


Within the Information

Databases With Built-In


Integrity Constraints
Integrity constraints rules that help ensure
the quality of the information
Examples
Primary keys must be unique
Foreign keys must be present
Sales price cannot be negative
Phone number must have area code

DATABASE MANAGEMENT
SYSTEM TOOLS
Database management system (DBMS)
helps you specify the logical organization for a
databases and access and use the information
within a database
Word processing software = document
Spreadsheet software = workbook
DBMS software = database

DATABASE MANAGEMENT
SYSTEM TOOLS

5 software components:
1.
2.
3.
4.
5.

DBMS engine
Data definition subsystem
Data manipulation subsystem
Application generation subsystem
Data administration subsystem

DATABASE MANAGEMENT
SYSTEM TOOLS

DBMS Engine
DBMS engine accepts logical requests from
the various other DBMS subsystems, converts
them into their physical equivalent, and
actually accesses the database and data
dictionary as they exist on a storage device
DBMS engine separates the logical from the
physical

DBMS Engine
Physical view how information is physically
arranged, stored, and accessed on some type
of storage device
Logical view how you as a knowledge
worker need to arrange and access
information
With a database, you only concern yourself
with your logical view

Data Definition Subsystem


Data definition subsystem helps you create
and maintain the data dictionary and define
the structure of the files in a database
You must create a data dictionary before
entering information into a database
Module J covers this for Microsoft Access

Data Manipulation Subsystem


Data manipulation subsystem helps you
add, change, and delete information
This is your primary DBMS interface as you
work with a database
Views
Report generators
QBE tools
SQL

Views
View allows you to see the contents of a
database file
Make whatever changes you want
Perform simple sorting
Query to find the location of information
Looks similar to a workbook with no row numbers

Views

Report Generators
Report generator helps you quickly define
formats of reports and what information you
want to see in a report
You can save report formats and generate
reports at any time with up-to-date
information

Report Generators

Report Generators

QBE Tools
Query-by-example (QBE) tool helps you
graphically design the answer to a question
What driver most often delivers concrete to
Triple A Homes?

QBE Tools

3-24

SQL
Structured query language (SQL)
standardized fourth-generation language
found in most DBMSs
Performs the same task as a QBE tool
But uses a sentence structure instead of pointand-click interface

SQL is used mostly by IT people

Application Generation
Subsystem
Application generation subsystem contains
facilities to help you develop transactionintensive applications
Data entry screen (called forms)
Programming languages

Used mostly by IT specialists

Data Administration Subsystem

Data administration subsystem helps you


manage the overall database environment
Backup and recovery
Security management
Query optimization
Concurrency control
Change management

Data Administration Subsystem


Backup and recovery
Periodically back up information
Recover a database if a failure occurs

Security management
Who has access to what information
Who can perform certain tasks (e.g., add, change,
or delete) on information

Data Administration Subsystem


Query optimization
Restructure physical view of information to
optimize response times to queries

Concurrency control
What happens if two people makes changes to the
same information at the same time?

Data Administration Subsystem


Change management
What is the effect of structural changes to a
database?
What if you add a new column?
What happens if you delete a column?
What happens if you change a columns
attributes?

DATA WAREHOUSES AND DATA


MINING
Data warehouses support OLAP and decision
making
Data warehouses do not support OLTP
Data-mining tools are the tools you use to
work with a data warehouse
DBMS software = database
Data-mining tools = data warehouse

What Is a Data Warehouse?


Data warehouse logical collection of
information gathered from operational
databases used to create business
intelligence that supports business analysis
activities and decision-making tasks

What Is a Data Warehouse?

What Is a Data Warehouse?

Multidimensional
Rows and columns
Also layers
Many times called hypercubes
What are the dimensions in Figure 3.8 on page
142?

What Are Data-Mining Tools?


Data-mining tools software tools that you
use to query information in a data warehouse
Query-and-reporting tools
Intelligence agents
Multidimensional analysis tools
Statistical tools

What Are Data-Mining Tools?

Query-And-Reporting Tools
Query-and-reporting tools similar to QBE
tools, SQL, and report generators in the typical
database environment

Intelligent Agents
Use various artificial intelligence tools such as
neural networks and fuzzy logic to form the
basis for information discovery and building
business intelligence
Help you find hidden patterns in information
Chapter 4 focuses more on these

Multidimensional Analysis
Tools
Multidimensional analysis (MDA) tools
slice-and-dice techniques that allow you to
view multidimensional information from
different perspectives
Bring new layers to the front
Reorganize rows and columns

Statistical Tools
Help you apply various mathematical models
to the information stored in a data warehouse
to discover new information
Regression
Analysis of variance
And so on

MANAGING THE INFORMATION


RESOURCE
Information is an organizational resource
Just like people, capital, and equipment
It must be managed effectively

MANAGING THE INFORMATION


RESOURCE
Who should oversee your organizations
information resource?
Chief information officer (CIO) oversees an
organizations information resource
Data administration plans for, oversees the
development of, and monitors the information
resource
Database administration technical and
operational aspects of managing information

MANAGING THE INFORMATION


RESOURCE
Is information ownership a consideration?
If you create information, you own it
You will also share it with others
Because you own it, you are responsible for its
quality

MANAGING THE INFORMATION


RESOURCE
How clean must your information be?
Duplicate information (records) must be
eliminated
Inaccurate information must be corrected
Information forms the basis of business
intelligence
If your business intelligence is bad, you will make
poor decisions

Files and Databases


File: A collection of records or documents
dealing with one organization, person, area or
subject (Rowley)
Manual (paper) files
Computer files

Database: A collection of similar records with


relationships between the records (Rowley)
Bibliographic, statistical, business data, images,
etc.

Introducing the Database


Major Database Concepts
Data and information
Data - Raw facts
Information - Processed data

Data management
Database
Metadata
Database management system (DBMS)

Data Storage Requirements


Sensor data

Temperature (15 @ 8 Kbps)


Humidity (15 @ 8 Kbps)
Gas (15 @ 8 Kbps)
Light (15 @ 8 Kbps)
Motion (15 @ 8 Kbps)
Pressure (100 @ 8 Kbps)
Microphone (15 @ 500 Kbps)
Camera (15 @ 10 Mbps)

Data Storage Requirements


User data
Multimedia

Phone messages/conversations (500 Kbps 10 Mbps)


Music (500 Kbps)
TV/Radio broadcasts (500 Kbps 10 Mbps)
Home movies (10 Mbps)
Images

Computer
Programs
Data files
Operating systems

Data Storage Issues


Issues
Query frequency and type
Sampling/recording rates
205 sensors (158,900 Kbps)
Multimedia recordings

Simultaneous playback
Analysis, prediction, decision-making queries

Transaction granularity
Historical data, decay
Security and privacy
Centralized vs. distributed

What Data to Store


Type of Data
Raw data
Pre-processed
Compressed

Frequency of Data Storage for Sensor


Data
Tradeoff between precision and quantity

General Concepts
Database definition
Organized collection of logically related data

Data
Known facts
Types: text, graphics, images, sound, videos

Database management system (DBMS)


Software package for defining and managing a
database

Database
What is a database ?
A collection of files storing related data

Give examples of databases


Accounts database; payroll database; UWs
students database; Amazons products
database; airline reservation database

Database Examples

Class roster
Hospital patients
Literature (published articles in a certain field)
Genomic information
Protein structure
Taxonomy
Single nucleotide polymorphism

Database Types (cont.)


Type

Typical number of
users

Typical
architecture

Typical size

Personal

Desktop/Laptop/
PDA

MB

Workgroup

5-25

Client/server:2 tier MB-GB

Department

25-100

Client/server:3 tier GB

Enterprise

>100

Client/server:
distributed

GB-TB

Internet

>1000

Web sever &


application
servers

MB-GB

Flat Files
Characteristics:
Data is stored as records in regular files
Records usually have a simple structure and fixed
number of fields
For fast access may support indexing of fields in the
records
No mechanisms for relating data between files
One needs special programs in order to access and
manipulate the data

Relational Database
Characteristics:
Data is organized into tables: rows & columns
Each row represents an instance of an entity
Each column represents an attribute of an entity
Metadata describes each table column
Relationships between entities are represented by
values stored in the columns of the corresponding
tables (keys)
Accessible through Standard Query Language (SQL)

Enterprise data model


Graphical representation of the high level entities
Example: Microbial database
each organism has multiple corresponding genes
One:Many relation

1
Organism

m
Gene

Database Types / Data Models

Relational
OO
Hybrid (Object-Relational)
Temporal
Deductive
Others
Spatial,

Example Data Representations


Relational
We all knowflat tables of atomic attributes with foreign
key relationships

OO
Complex data reps
multivalued, composite

Temporal
Relational model: add valid start, end dates to each table
(versions of info and when valid)
Includes time, events, durations

Operations
DDL/DML (data def/manip languages)
SQL
OQL

Update operations
Built-in insert, delete, update
Stored procedures for triggers, active (ECA) rules

Example Operations for


Temporal Databases
INCLUDES
Rows valid in a certain time period

BEFORE/AFTER a time condition


Set operations
Union, intersection of 2 time periods

Active DB
Event-Condition-Action rules
Allow for decisions to be made in the database instead
of a separate application

Relational
Implemented as triggers

Challenges
Rule consistency
(2+ rules do not contradict)

Guaranteed termination
Trigger loops (T1 <->T2)

Metadata
Data that describes the properties or
characteristics of other data
Does not include sample data
Allows database designers and users to
understand the meaning of the data

Metadata & Data Table


Organism
Name

Type

Max Length

Description

Name

Alphanumeric

100

Organism name

Size

Integer

10

Genome length (bases)

Gc

Float

Percent GC

Accession

Alphanumeric

10

Accession number

Release

Date

Release date

Center

Alphanumeric

100

Genome center name

Sequence

Alphanumeric

Variable

Sequence

Name

Size

Gc

Accession

Release

Center

Sequence

Escherichia coli K12

4,640,000

50

NC_000913

09/05/1997

Univ.
Wisconsin

AGCTTTTC
ATT

Streptococcus
pneumoniae R6

2,040,000

40

NC_003098

09/07/2001

Eli Lilly and


Company

TTGAAAGA
AAA

Metadata & Data Table (cont.)


Gene
Name

Type

Max Length

Description

Name

Alphanumeric

100

Gene name

Accession

Alphanumeric

10

Gene accession number

OAccesion

Alphanumeric

10

Organism accession number

Start

Integer

10

Gene start

End

Integer

10

Gene end

Strand

Character

Gene strand

Product

Alphanumeric

1000

Gene annotation

Sequence

Alphanumeric

Variable

Gene sequence

Name

Accession

OAccession

Start

End

Strand

Product

Sequence

thrL

16127995

NC_000913

190

255

the operon leader


peptide

MKRI

thrA

16127996

NC_000913

337

2799

homoserine
dehydrogenase I

MRVL

transposas
e_A

15902058

NC_003098

20207

20554

transposase

MWYN

Relationships

Used to connect tables


Field(s) that have the same value in the related tables
Organism.Accession=Gene.OAccession
Organism.Accession
Unique
Primary key
Gene.OAccession
Not unique
Secondary key

Database Systems
Types of Database Systems
Number of Users
Single-user

Desktop database
Multiuser

Workgroup database
Enterprise database
Scope

Desktop
Workgroup
Enterprise

Database Systems
Types of Database Systems
Location
Centralized
Distributed

Use
Transactional (Production)
Decision support
Data warehouse

Database
A Database is a collection of stored
operational data used by the application
systems of some particular enterprise (C.J.
Date)
Paper Databases
Still contain a large portion of the worlds knowledge

File-Based Data Processing Systems


Early batch processing of (primarily) business data

Database Management Systems (DBMS)

Why DBMS?
History
50s and 60s all applications were custom built for
particular needs
File based
Many similar/duplicative applications dealing with
collections of business data
Early DBMS were extensions of programming
languages
1970 - E.F. Codd and the Relational Model
1979 - Ashton-Tate and first Microcomputer DBMS

File Based Systems


Application
Delivery
List
Coal
Estimation
Just what
asked for

File
Toys
Addresses

Naughty

Nice Toys

From File Systems to DBMS


Problems with file processing systems
Inconsistent data
Inflexibility
Limited data sharing
Poor enforcement of standards
Excessive program maintenance

DBMS Benefits

Minimal data redundancy


Consistency of data
Integration of data
Sharing of data
Ease of application development
Uniform security, privacy, and integrity controls
Data accessibility and responsiveness
Data independence
Reduced program maintenance

Terms and Concepts


Data independence
Physical representation and location of data and
the use of that data are separated
The application doesnt need to know how or where
the database has stored the data, but just how to ask
for it
Moving a database from one DBMS to another should
not have a material effect on application program
Recoding, adding fields, etc. in the database should not
affect applications

Database Environment

CASE
Tools

Repository

User
Interface

DBMS

Application
Programs

Database

Database Components
DBMS
===============

Design tools

Database

Database contains:
Users Data
Metadata
Indexes
Application Metadata

Table Creation
Form Creation
Query Creation
Report Creation
Procedural
language
compiler (4GL)
=============
Run time
Form processor
Query processor
Report Writer
Language Run time

Application
Programs

User
Interface
Applications

Types of Database Systems

PC databases
Centralized database
Client/server databases
Distributed databases
Database models

PC Databases
E.g.:
Access
FoxPro
Dbase
Etc.

Centralized Databases

Central
Computer

Client Server Databases


Client

Client
Network
Database
Server
Client

Distributed Databases
Location C

Location B

computer

computer

computer
Location A

Homogeneous
Databases

Distributed Databases
Client

Heterogeneous
Or Federated
Databases
Database
Server

Remote
Comp.

Local Network
Comm
Server
Client

Remote
Comp.

Introducing the Database


Importance of DBMS
It helps make data management more efficient and effective.
Its query language allows quick answers to ad hoc queries.
It provides end users better access to more and better-managed
data.

It promotes an integrated view of organizations operations -- big


picture.
It reduces the probability of inconsistent data.

The DBMS Manages the Interaction


Between the End User and the Database

Figure 1.2

Introducing the Database


Why Database Design Is Important?
A well-designed database facilitates data management
and becomes a valuable information generator.
A poorly designed database is a breeding ground for
uncontrolled data redundancies.
A poorly designed database generates errors that lead to
bad decisions.

Historical Roots
Why Study File Systems?
It provides historical perspective.
It teaches lessons to avoid pitfalls of data management.
Its simple characteristics facilitate understanding of the design
complexity of a database.
It provides useful knowledge for converting a file system to a
database system.

Contents of the CUSTOMER File

Table 1.1 Basic File Terminology


Data

Raw facts that have little meaning unless they have been
organized in some logical manner. The smallest piece of data
that can be recognized by the computer is a single
character, such as the letter A, the number 5, or some
symbol such as; ? > * +. A single character requires one
byte of computer storage.

Field

A character or group of characters (alphabetic or numeric)


that has a specific meaning. A field might define a telephone
numbers, a birth date, a customer name, a year-to-date
(YTD) sales value, and so on.

Record

A logically connected set of one or more fields that describes


a person, place, or thing. For example, the fields that
comprise a record for a customer named J. D. Rudd might
consist of J. D. Rudds name, address, phone number, date
of birth, credit limit, unpaid balance, and so on.

File

A collection of related records. For example, a file might


contain data about ROBCOR Companys vendors; or, a file
might contain the records for the students currently enrolled
at Gigantic University.

Contents of the AGENT File

A Simple File System

DBMS
Software package for defining and managing a
database.
Examples:
Proprietary: MS Access, MS SQL Server, DB2,
Oracle, Sybase
Open source: MySql, PostgreSQL

DBMS Advantages
Program-data independence
Minimal data redundancy
Improved data consistency & quality
Access control
Transaction control

Improved accessibility & data sharing


Increased productivity of application development
Enforced standards

Web Databases
Data is accessible through Internet
Have different underlying database models
Example: biological databases
Molecular data: NCBI , Swissprot , PDB , GO
Protein interaction : DIP , BIND
Organism specific: Mouse , Worm, Yeast
Literature: Pubmed
Disease

Database Management System


What is a DBMS ?
A big C/C++ program written by someone else that
allows us to manage efficiently a large database and
allows it to persist over long periods of time
Give examples of DBMS
DB2 (IBM), SQL Server (MS), Oracle, Sybase
MySQL, Postgres,

What the Database Systems


Does
1.
2.
3.
4.
5.
6.

Create/store large datasets


Search/query/update
Change the structure
Concurrent access to many user
Recover from crashes
Security

Possible Organizations
Files
Spreadsheets
DBMS

1. Create/store Large Datasets


Files

Yes, but

Spreadsheets
Not really

DBMS
Yes

2. Search/Query/Update
Files

Simple queries (grep);


Updates are difficult

Spreadsheets
Simple queries;
Simple updates

DBMS
All

Updates: generally OK

3. Change the Structure


Add Address to each Actor
Files

Very hard

Spreadsheets
Yes

DBMS
Yes

4. Concurrent Access
Multiple users access/update the data
concurrently
Lost updates; inconsistent reads,

What can go wrong ?


How do we protect against that in OS ?
This is insufficient in databases; why ?

A logical action consists


of multiple updates

locks

5. Recover from crashes


Transfer $100 from account #4662 to #7199:
X = Read(Account, #4662);
X.amount = X.amount - 100;
Write(Account, #4662, X);
CRASH !

Y = Read(Account, #7199);
Y.amount = Y.amount + 100;
Write(Account, #7199, Y);

What is the problem ?

6. Security
Files

File-level
access control

Spreadsheets
Same [?]

DBMS
Table/attributelevel access control

124

Enters a DMBS
Two tier system or client-server

connection
(ODBC, JDBC)

Database server
(someone elses
C program)
Data files

Applications
125

Data Independence
Logical view
Directors:

Movie_Directors:

id

fName

lName

id

mid

15901

Francis Ford

Coppola

15901

130128

...

...
Movies:
mid

Title

Year

130128

The Godfather

1972

...

Directors_file

Moviews_title_index_file

Directors_fname_index_file

Movies_file

Physical view
126

What the Database Systems


Does
1.
2.
3.
4.
5.
6.

Create/store large datasets


Search/query/update
Change the structure
Concurrent access to many user
Recover from crashes
Security
Grant, Revoke, Roles

SQL DML

SQL DDL

Transactions
ACID

Distributed vs. Centralized


Centralized database can produce a bottleneck
Large volume of data input
Large database
Large volume of queries

In distributed databases, data consistency,


replication, and retrieval can be more problematic
Consistency of schemas
Retrieval in case the data location is not known
Communication overhead to ensure database
consistency

Database Systems
Commercial

DB2
Empress
Informix
Oracle
MS Access
MS SQL
Sybase

Free
Berkeley DB
PostgreSQL
MySQL

Relational Database
Definition:
Data stored in tables that are associated by shared
attributes (keys).
Any data element (or entity) can be found in the
database through the name of the table, the
attribute name, and the value of the primary key.

Relational Database Definitions


Entity: Object, Concept or event (subject)
Attribute: a Characteristic of an entity
Row or Record: the specific characteristics of
one entity
Table: a collection of records
Database: a collection of tables

The Relational Database model

Developed by E.F. Codd, C.J. Date (70s)


Table = Entity = Relation
Table row = tuple = instance
Table column = attribute
Table linkage by values
Entity-Relationship Model

The Relational Model


Each attribute has a unique name within an
entity
All entries in the column are examples of it
Each row is unique
Ordering of rows and columns is unimportant
Each position (tuple) is limited to a single
entry.

Data Model: Whats a model?


A data model is a representation of reality
Its used to define the storage and
manipulation of a data base.
Data Models have two components:
Structure: the structure of the data stored within
Operations: Facilities for manipulation of the data.

Relational Database Systems


Most popular DBMS model for GIS
Flexible approach to linkages between records
comes the closest to modeling the complexity
of spatial relationships between objects.

Database Tables
Tables represent entities
Tables are always named in the singular, such
as: Vehicle, Order, Grade, etc.
Tables in database jargon are flat files, dBase
or Spreadsheet like..

Attributes
Characteristics of an entity
Examples:
Vehicle (VIN, color, make, model, mileage)
Student (SSN, Fname, Lname, Address)
Fishing License (Type, Start_date, End_date)

Database Table Example

Figure 1: A simple and flawed table design.

Figure 2: An improved database table..

Database Views
A View is an individuals picture of a database.
It can be composed of many tables,
unbeknownst to the user.
Its a simplification of a complex data model
It provides a measure of database security
Views are useful, primarily for READ-only users
and are not always safe for CREATE, UPDATE, and
DELETE.

Table Indexing
An Index is a means of expediting the retrieval
of data.
Indexes are built on a column(s).
Indexes occupy disk space; occasionally a lot.
Indexes arent technically necessary for
operation and must be maintained by the
database administrator.

Database Table Keys


Definition:
A key of a relation is a subset of attributes with the
following attributes:
Unique identification
Non-redundancy

Types of Keys
PRIMARY KEY
Serves as the row level addressing mechanism in the relational
database model.
It can be formed through the combination of several items.

FOREIGN KEY
A column or set of columns within a table that are required to
match those of a primary key of a second table.

These keys are used to form a RELATIONAL JOIN - thereby


connecting row to row across the individual tables.

Relational Database Management


System (RDBMS)
Table A
Name

Address

Parcel #

John Smith
18 Lawyers Dr. 756554
T. Brown 14 Summers Tr. 887419

Table B
Parcel # Assessed Value
887419 152,000
446397 100,000

Database Keys
Primary Key - Indicates uniqueness within
records or rows in a table.
Foreign Key - the primary key from another
table, this is the only way join relationships
can be established.
There may also be alternate or secondary keys
within a table.

Constructing Join Relationships


One-to-many relationships include the
Primary Key of the one table and a Foreign
Key (FK) in the many table.

Other common terms


Cardinality: one-to-one, one-to-many, manyto-many relationships
Optionality: the relationship is either
mandatory or optional.

Ensuring Database Integrity


Database integrity involves the maintenance
of the logical and business rules of the
database.
There are two kinds of DB Integrity that
must be addressed:
Entity Integrity
Referential Integrity

Strategies for managing


Integrity

You could ignore it, but it costs you time.


Place the Burden on your customer or user.
Have the programmers fix the problem
Place the burden on the Database
Management System (DBMS)
Temporal integrity is one of the key challenges
of Address Database management.

Entity Integrity
Entity integrity deals
with within-entity
rules.
These rules deal with
ranges and the
permission of null
values in attributes or
possibly between
records

Examples of Entity Integrity


Data Type Integrity: very common and most
basic. Checks only for data type
compatibility with DB Schema, such as:
numeric, character, logical, date format, etc.
Commonly referred to in GIS manuals as:
Range and List domains
Ranges - acceptable Numeric ranges for input
List - acceptable text entries or drop-down lists.

Enforcing Integrity
Not a trivial task!
Not all database management systems or GIS
software enable users to enforce data
integrity during attribute entry or edit
sessions.
Therefore, the programmer or the Database
Administrator must enforce and/or check for
Integrity.

Referential Integrity
Referential integrity concerns two or more
tables that are related.
Example: IF table A contains a foreign key that
matches the primary key of table B
THEN
values of this foreign key either match the
value of the primary key for a row in table B or
must be null.

Functions of a Database
Management System

Data Storage, Retrieval and Update (CRUD)


Catalog or Data Dictionary
Shared Update Support
Backup and Recovery Services
Security Services
Integrity Services
Data Independence - independent from programs
Various Data Manipulation Utilities

CRUD
Four basic functions, for a given entity they
should all be performed with few exceptions,
in your system:
CREATE
READ
UPDATE
DELETE

Normalization
Normalization: a process for analyzing the
design of a relational database
Database Design - Arrangement of attributes into
entities

It permits the identification of potential


problems in your database design
Concepts related to Normalization:
KEYS and FUNCTIONAL DEPENDENCE

Ex: Database Normalization (1)


Sample Student
Activities DB Table
Poorly Designed
Non-unique records
John Smith

Test the Design by


developing sample
reports and queries

Ex: Database Normalization (2)


Created a unique ID
for each Record in the
Activities Table
Required the creation of
an ID look-up table
for reporting (Students
Table)
Converted the Flat-File
into a Relational
Database

Ex: Database Normalization (3)


Wasted Space
Redundant data entry
What about taking a 3rd
Activity?
Query Difficulties trying to find all
swimmers
Data Inconsistencies conflicting prices

Ex: Database Normalization (4)


Students table is fine
Elimination of two
columns and an
Activities Table
restructuring,
Simplifies the Table
BUT, we still have
Redundant data
(activity fees) and data
insertion anomalies.

Problem: If student #219


transfers we lose all references
to Golf and its price.

Ex: Database Normalization (5)


Modify the Design to
ensure that every
non-key field is
dependent on the
whole key
Creation of the
Participants Table,
corrects our problems
and forms a union
between 2 tables.

This is a Better Design!

Database Design: Basic Steps


Step 1: Determine the entities involved and create a
separate table for each type of entity (thing, concept,
event, theme) and name it.
Step 2: Determine the Primary Key for each table.
Step 3: Determine the properties for each entity (the
non-key attributes).
Step 4: Determine the relationships among the
entities

Design Example: Music CD


collection
Entities: the CD, Tracks, Composer
Attributes:
CD (ID, title, musician, cost, etc.)
Track (song title, length, order number)
Composer (name, genre, DOB, DOD)

Relationships: CD to Track, Composer to Track

Table Design Example

Figure 1: A simple and flawed table design.

Figure 2: An improved database table..

Step1: Creating a Data Model

Identify Candidate Entities


Identify Relationships
Define Entities & Relationships
Review Entity-Relationship Model

Step 2: Defining an Attribute


Model

List Candidate Attributes for each Entity


Add KEYS to model
Attribute & Normalize Model
Define Attributes
Review Logical Model

Step 3: Identify & Capture


Business Rules

Review & Verify Cardinalities


Define Referential Integrity
Identify Business Domains
Identify Attribute Default Values

Step 4: Define Physical Model

Select Target DBMS


Name Tables & Columns
Name & Define Indexes
Define Columns
Verify/Update Triggers
Generate Reports & Document Design

Step 5: Review Final Design

Verify Entities & Definitions


Verify Relationships & Definitions
Verify Attributes & Definitions
Verify Business Constraints
Approve Schema Design

A Review of the Advantages of


Database Processing

Lower cost (relative it what?)


More Information from same amount of data
Data Sharing is easier
Controlled or elimination of redundancy
Consistency, Integrity, Security
Increased Productivity

Some Disadvantage of
Database Processing

Greater Complexity
Possibly a greater impact of a failure
Recovery is more difficult
Although these are all debated issues,
opportunities for complete failure are often
reduced with the latest database products,
but reliability results in higher investment
costs.

You might also like