UNIT-I

Data: Data means known facts or raw facts that can be recorded and that have
implicit meaning. For example, consider the names, telephone numbers, and
addresses of the people you know. You may have recorded this data in an indexed
address book, or you may have stored it on a hard drive using a personal computer
and software such as Microsoft Access or Excel.
Information: Processed data.
Database: A database is a large collection of related data that can be stored
and that generally describes the activities of an organization.
Properties:
1. It is used to store data of an organization.
2. A database is designed and developed for a specific purpose.
3. It has some source from which data is derived and it is populated with that
data.
4. It can be of any size.
5. It allows multiple users to share and access the database at the same time.
E.g.: A university database, which includes information about students, faculty,
courses, and classrooms, along with activities such as enrolment and teaching
courses.
DBMS (Database Management System)
 A database management system (DBMS) is a collection of programs
(software) for defining, creating, manipulating and maintaining a database.
 The DBMS is a general-purpose software package that facilitates the
processes of defining, constructing, manipulating, and sharing databases
among various users and applications.
 Defining a database involves specifying the data types, structures, and
constraints for the data to be stored in the database.
 Constructing the database is the process of storing the data itself on some
storage medium that is controlled by the DBMS.
 Manipulating a database includes functions such as querying the database to
retrieve specific data, updating the database to reflect changes, and
generating reports from the data.
 Sharing a database allows multiple users and programs to access the
database concurrently.
Simplified database system environment: [figure not reproduced]

DBMS Applications:
1. Banking – For customer information, accounts, loans, and banking
transactions. [all transactions]
2. Airlines – For reservation and schedule information. [reservations,
schedules]
3. Universities – For student information, course registrations, and grades.
[registration, grades]
4. Credit Card Transactions – For purchases on credit card and generation of
monthly statements.
5. Telecommunication – For keeping records of calls made, generating
monthly bills, maintaining balances on prepaid calling cards, and storing
information about communication networks.
6. Finance – For storing information about holdings, sales, and purchases of
financial instruments such as stocks and bonds.
7. Sales – For customer, product, and purchase information. [customers,
products, purchases]
8. Manufacturing – For management of the supply chain and for tracking
production of items in factories, inventories of items in
warehouses/stores, and orders for items. [production, inventory, orders,
supply chain]
9. Human Resources – For information about employees, salaries, payroll taxes
and benefits, and generation of paychecks. [employee records, salaries, tax
deductions]
File System:
 Before the evolution of DBMSs, file systems were used to store and manage
data.
 In a file processing system, data is stored in individual files.
 A file is a collection of data.
 Files are typically designed to meet the needs of a particular department or
user group.
 Files are also typically designed to be part of a particular computer
application.
Consider an organization/enterprise that is organized as a collection of
departments/offices. Each department has certain data processing "needs",
many of which are unique to it. In the file processing approach, each
department would "own" a collection of relevant data and software
applications to manipulate that data.

Drawbacks with File Processing System:


1. Data Redundancy and Inconsistency: Data redundancy means that the same
data appears in different places. In a file system, the same information is
stored in several places, which causes data-inconsistency problems during updates.
2. Difficulty in accessing data: To retrieve data from files, we need to
write a special application program every time. This is not convenient,
because the requirements may change each time and a new program must be
written to carry out each new task.
3. Data isolation: The data is scattered across different files in different
formats, so it is difficult to write application programs to retrieve the data.
4. Enforcing integrity constraints: Data integrity means that all the data has
to obey certain conditions. In a file system, integrity constraints are set at
the program level. For example, in a bank every savings account must have a
minimum balance (e.g., 1000). To implement this constraint we need to write an
application program, and whenever a new constraint is added we need to
modify the application program again. This becomes difficult when the
constraint involves data that is stored in different files.
5. Atomicity problems: Atomicity is a property of a transaction (simply, a task
that includes more than one action). It states that either all actions are
performed or none are. In a file processing system, incomplete transactions
cannot be rolled back, so the data becomes inconsistent. Example: a fund
transfer from account A to account B. Initially, both accounts hold 1000
rupees. A transfer of Rs 50 is initiated. If the task is only half done, i.e.,
Rs 50 is debited from account A but not credited to account B, and the task
then terminates abruptly, the data is left inconsistent. A file system has no
solution for this problem.
6. Difficulty in concurrency control: Concurrency means that the same file is
updated by different application programs at the same time. A file processing
system cannot handle concurrency properly, so we can end up with
inconsistent data.
7. Security problems: Since the information is scattered across different files
and there is no centralized access path, it is difficult to secure the data,
so anyone can access it.
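The fund-transfer scenario from drawback 5 can be sketched with Python's built-in
sqlite3 module to show what a DBMS adds. This is only an illustrative sketch: the
table name account, the helper names transfer and balances, and the simulated
failure are assumptions, not part of these notes.

```python
import sqlite3

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move `amount` from src to dst as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulate a crash after the debit but before the credit.
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {account name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 1000)])
conn.commit()

transfer(conn, "A", "B", 50, fail_midway=True)
after_failure = balances(conn)   # the half-done debit has been rolled back
transfer(conn, "A", "B", 50)
after_success = balances(conn)   # both the debit and the credit were applied
```

With plain files, the debit written before the failure would simply stay on disk;
here the DBMS undoes it, so both accounts still hold 1000 after the failed attempt.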
Advantages of DBMS
1. Controlling Redundancy:
 In non-database systems (traditional computer file processing), each user
group has its own files. For example, in a university, department staff and
management staff each maintain a separate file for employees.
 In this case, duplicate copies of the same data are created in many
places.
 Redundancy leads to several problems: 1. Duplication of effort: the same data
has to be entered multiple times. 2. Storage space is wasted. 3. Data
inconsistency.
 In a DBMS, all the data of an organization is integrated into a single database.
 Ideally, each data item is recorded in only one place in the database and is
not duplicated.
 In practice, data redundancy in a DBMS can be controlled or reduced but is not
removed completely. Sometimes it is necessary to create duplicate copies of
the same data items in order to relate tables with each other. Controlling
data redundancy also saves storage space.
2. Data Consistency:
 Controlling data redundancy yields data consistency.
 If a data item appears only once, any update to its value has to be performed
only once, and the updated value (the new value of the item) is immediately
available to all users.
 If the DBMS has reduced redundancy to a minimum level, the database
system enforces consistency.
 This means that when a data item that appears more than once in the database
is updated, the DBMS automatically updates each occurrence of that data item
in the database.
3. Data Security
 Data security is the protection of the database from unauthorized users.
 Only the authorized persons are allowed to access the database.
 Some of the users may be allowed to access only a part of database i.e., the
data that is related to them or related to their department.
 In most cases, the DBA or the head of a department can access all the data in
the database.
 Some users may be permitted only to retrieve data, whereas others are
allowed to retrieve as well as update data.
 Database access is controlled by the DBA.
 The DBA creates user accounts and grants rights to access the database.
 Typically, users or groups of users are given usernames protected by
passwords.
 Most DBMSs provide a security sub-system, which the DBA uses to
create user accounts and to specify account restrictions.
 The user enters his/her account number (or username) and password to
access the data in the database.
4. Providing Storage Structures for Efficient Query Processing
 Database systems must provide capabilities for efficiently executing queries
and updates.
 Because the database is typically stored on disk, the DBMS must provide
specialized data structures to speed up disk search for the desired records.
 Auxiliary files called indexes are used for this purpose. Indexes are typically
based on tree data structures or hash data structures.
 The DBMS provides a file manager to manage the allocation of disk space for
the DBMS files.
 It also provides a buffer manager to manage the memory buffers used for
processing the database information.
 When an update is required, the data is first read from the database into a
memory buffer, where it is manipulated, and the updated information is then
written back to the database.
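The role of an index can be sketched with Python's built-in sqlite3 module. This
is a hedged illustration, not a description of any particular DBMS: the table
student, the index name idx_student_branch, the generated sample rows, and the
helper query_plan are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO student (sname, branch) VALUES (?, ?)",
    [(f"student{i}", ("CSE", "ECE", "IT")[i % 3]) for i in range(300)],
)

def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query, as text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

sql = "SELECT sname FROM student WHERE branch = ?"
plan_before = query_plan(conn, sql, ("IT",))   # no index yet: table is scanned

# Create the auxiliary access structure (SQLite indexes are B-trees).
conn.execute("CREATE INDEX idx_student_branch ON student (branch)")
plan_after = query_plan(conn, sql, ("IT",))    # the plan now names the index
```

The query text is identical before and after; only the access path chosen by the
DBMS changes, which is why indexes can be added without touching applications.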
5. Providing Backup and Recovery
 In a computer file-based system, the user must back up the data
regularly to protect valuable data from damage caused by failures of the
computer system or application program.
 This is a time-consuming method if the volume of data is large.
 Most of the DBMSs provide the 'backup and recovery' sub-systems that
automatically create the backup of data and restore data if required.
 For example, if the computer system fails in the middle (or end) of an
update operation of the program, the recovery sub-system is responsible for
making sure that the database is restored to the state it was in before the
program started executing.
6. Providing concurrency control
A DBMS provides concurrency control tools that permit multiple users
or application programs to access the database concurrently while
preserving the consistency of the database.
7. Enforcing Integrity constraints
 Integrity constraints can be applied to a database so that only correct data
can be entered into the database.
Ex: 1. The minimum balance of an account is 1000.
2. The maximum marks for a subject is 100.
 Most database applications have certain integrity constraints that must hold
for the data.
 A DBMS should provide capabilities for defining and enforcing these
constraints.
 The simplest type of integrity constraint involves specifying a data type for
each data item.
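The two example constraints above can be declared once in the schema instead of
in every application program. The sketch below uses Python's built-in sqlite3
module; the table names savings_account and result, the helper try_execute, and
the lower bound of 0 for marks are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE savings_account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 1000))"   # minimum balance
)
conn.execute(
    "CREATE TABLE result ("
    " sid INTEGER, subject TEXT,"
    " marks INTEGER CHECK (marks BETWEEN 0 AND 100))"      # maximum marks
)

def try_execute(conn, sql, params):
    """Run a statement; return False if the DBMS rejects it as a constraint violation."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ok_account = try_execute(conn, "INSERT INTO savings_account VALUES (?, ?)", (1, 5000))
low_balance = try_execute(conn, "INSERT INTO savings_account VALUES (?, ?)", (2, 500))
bad_marks = try_execute(conn, "INSERT INTO result VALUES (?, ?, ?)", (1, "DBMS", 120))
```

Here the declared column types are the simplest constraints, and the CHECK clauses
are enforced by the DBMS for every program that inserts or updates rows.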
8. Report Writers
Most DBMSs provide report writer tools for creating reports.
Users can create reports easily and quickly.
Once a report is created, it can be used many times and can be modified
easily.
The created reports are saved along with the database and behave like a
software component.
DBA, Designers and End users:

Database Administrator (DBA)


 The database administrator (DBA) is a person who has central control over
the data and the programs that access the data.
 The DBA coordinates all the activities of the database system.
 The DBA is the user most familiar with the database and is
responsible for creating, modifying, and maintaining its three levels.
 The DBA is responsible for managing the use of the DBMS and
ensuring that the database is functioning properly.
 The DBA is responsible for granting permissions to the users of the database
and stores the profile of each user in the database.
Responsibilities of DBA
 Installing and upgrading the database server and application tools
 Allocating system storage and planning future storage requirements for the
database system
 Modifying the database structure, as necessary, from information given by
application developers
 Enrolling users and maintaining system security
 Ensuring compliance with database vendor license agreement
 Controlling and monitoring user access to the database
 Monitoring and optimizing the performance of the database
 Planning for backup and recovery of database information
 Maintaining archived data
 Backing up and restoring databases
 Contacting database vendor for technical support
 Generating various reports by querying from database as per need
Database Designers
 Database designers are responsible for identifying the data to be stored in the
database and for choosing appropriate structures to represent and store this
data.
 These tasks are mostly undertaken before the database is actually
implemented and populated with data.
 It is the responsibility of database designers to communicate with all
prospective database users in order to understand their requirements, and to
come up with a design that meets these requirements.
 In many cases, the designers are on the staff of the DBA and may be
assigned other staff responsibilities after the database design is completed.
End Users
End users are the people whose jobs require access to the database for
querying, updating, and generating reports; the database primarily exists for their
use. There are several categories of end users:
Casual end users
 Casual end users occasionally access the database, but they may need
different information each time.
 They use a sophisticated database query language to specify their requests
and are typically middle- or high-level managers or other occasional
browsers.
Naive or parametric end users
 Naive or parametric end users make up a sizable portion of database end
users.
 Their main job function is querying and updating the database, using
standard types of queries and updates, called canned transactions, that have
been carefully programmed and tested.
The tasks that such users perform are varied:
 Bank tellers check account balances and post withdrawals and deposits.
 Reservation clerks for airlines, hotels, and car rental companies check
availability for a given request and make reservations.
Sophisticated end users
Sophisticated end users include engineers, scientists, business analysts, and
others who thoroughly familiarize themselves with the facilities of the DBMS so as
to implement their applications to meet their complex requirements.
Stand-alone users
 Stand-alone users maintain personal databases by using ready-made program
packages that provide easy-to-use menu-based or graphics-based interfaces.
 An example is the user of a tax package that stores a variety of personal
financial data for tax purposes.

Data Models:
A data model is a collection of concepts that can be used to describe the
structure of a database.
By the structure of a database, we mean the data types, relationships, and
constraints that should hold for the data. Most data models also include a set of
basic operations for specifying retrievals and updates on the database.
Types of Data Models
1. Relational Model
2. Entity-Relationship Model
3. Hierarchical Model
4. Network Model
5. Object-Oriented Model
1. Relational Model
 The relational model is the most commonly used data model.
 In this model, data is organized as two-dimensional tables.
 Each table is called a relation.
 The relational model uses a collection of tables to represent both data and
the relationships among those data. Each table has multiple columns, and
each column has a unique name.
 When the relational model is implemented in a database, a relation is
represented by a table, a tuple is represented by a row, and an attribute is
represented by a column of the table.
SID   SNAME   Branch   Section
1     Ram     CSE      A
2     Raj     ECE      B
3     Rani    IT       C
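The table above can be created and read back as a relation with Python's built-in
sqlite3 module. This is a minimal sketch; the table name student and the
lowercase column names are assumptions chosen to mirror the column headings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, branch TEXT, section TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ram", "CSE", "A"), (2, "Raj", "ECE", "B"), (3, "Rani", "IT", "C")],
)

cursor = conn.execute("SELECT * FROM student ORDER BY sid")
attributes = [col[0] for col in cursor.description]  # attribute (column) names
tuples = cursor.fetchall()                            # one tuple per row
```

The relation is the table student, each element of tuples is one row, and each
name in attributes is one column.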

2. Entity-Relationship Model
The Entity-Relationship (ER) model is based on the notion of real-world entities
and the relationships among them.
The ER model is best used for the conceptual design of a database.
ER Model is based on:
 Entities and their attributes
 Relationships among entities

Entity
An entity in the ER model is a real-world entity that has some properties
called attributes. Every attribute is defined by its set of values, called its
domain. For example, in a school database, a student is considered an entity. A
student has various attributes such as name, age, and class.

3. Hierarchical Data Model


 A hierarchical data model is a data model in which the data is organized into
a tree-like structure.
 Some of the earliest commercial DBMSs were based on this model.
 In the hierarchical model, data is represented as records, and the records
are organized as a collection of trees.
 The relationships among the data are represented by links, which can be
viewed as pointers.
 The tree structure requires that each record have only one parent record.

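[Figure: tree of record types. Department is at the top, Teacher and Student are
on the middle level, and Course is on the bottom level.]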
4. Network Data Model
 In the network model, data is represented as records, and the records are
organized as a collection of arbitrary graphs.
 The relationships among the data are represented by links, which can be
viewed as pointers.
 In the network model, a record can have any number of parent records.

[Figure: graph of the record types Department, Student, Course, and Teacher.]
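The difference between the two models (one parent per record versus any number of
parents) can be sketched with plain Python dictionaries. The record type names
come from the figures, but the specific links between them are assumptions made
for illustration, as are the helper names path_to_root and is_hierarchical.

```python
# Hierarchical model: each record type stores exactly one parent (None for the root).
hierarchy_parent = {
    "Department": None,
    "Teacher": "Department",
    "Student": "Department",
    "Course": "Student",          # assumed link, not taken from the figure
}

# Network model: each record type may list any number of parents.
network_parents = {
    "Department": [],
    "Teacher": ["Department"],
    "Student": ["Department"],
    "Course": ["Department", "Teacher", "Student"],   # assumed links
}

def path_to_root(record, parent_of):
    """Follow the single parent pointer from a record up to the root of the tree."""
    path = [record]
    while parent_of[path[-1]] is not None:
        path.append(parent_of[path[-1]])
    return path

def is_hierarchical(parents_of):
    """A structure fits the hierarchical model only if no record has more than one parent."""
    return all(len(parents) <= 1 for parents in parents_of.values())

course_path = path_to_root("Course", hierarchy_parent)
network_is_tree = is_hierarchical(network_parents)
```

In the tree there is exactly one path from Course to the root, while in the
network structure Course can be reached through several parents.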

5. Object-Oriented Data Model


In this model, data is represented in the form of objects. An object contains
data and the functions that work on that data.
Schemas, Instances, and Database State
In any data model, it is important to distinguish between the description of
the database and the database itself. The description of a database, that is,
its overall structure, is called the database schema. It is specified during
database design and is not expected to change frequently.
Schema Diagram:
STUDENT
Name | SNumber | Class | Major

COURSE
CourseName | CourseNumber | CreditHours | Department

TEACHER
TeacherName | TeacherId | TeacherDepartment | TeachingcourseId

The above schema diagram displays structure of each record type but not
actual instances of records.
A schema diagram displays only some aspects of a schema, such as the
names of record types and data items, and some types of constraints.
Instance:
The data in the database at a particular moment in time is called an instance or
database state.
The distinction between database schema and database state is very
important. When we define a new database, we specify only its database schema to
the DBMS. At this point, the corresponding database state is the empty state with
no data. From then on, every time an update operation is applied to the database,
we get another database state.
At any point in time, the database has a current state.
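The schema/state distinction can be observed directly with Python's built-in
sqlite3 module. This is an illustrative sketch: the column names follow the
STUDENT record type of the schema diagram, while the sample row and the helper
names schema and state are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (name TEXT, snumber INTEGER, class INTEGER, major TEXT)")

def schema(conn):
    """Column names of STUDENT, read from the DBMS catalog."""
    return [row[1] for row in conn.execute("PRAGMA table_info(student)")]

def state(conn):
    """The rows currently stored in STUDENT (part of the database state)."""
    return conn.execute("SELECT name, snumber FROM student ORDER BY snumber").fetchall()

schema_at_definition = schema(conn)
empty_state = state(conn)                 # empty state right after definition

conn.execute("INSERT INTO student VALUES ('Smith', 17, 1, 'CS')")   # sample data
state_after_insert = state(conn)
conn.execute("UPDATE student SET name = 'Smyth' WHERE snumber = 17")
state_after_update = state(conn)

schema_now = schema(conn)                 # unchanged by the updates
```

Each update produces a new database state, while the schema read from the
catalog stays the same throughout.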
Levels of Abstraction or Three-Schema architecture:

Internal level or internal schema:


 The internal schema uses a physical data model and describes the complete
details of data storage and access paths for the database.
 It tells us what data is stored in the database and how. At least the following
aspects are considered at this level:
 Storage allocation: B-trees, hashing, etc.
 Access paths: specification of primary and secondary keys, indexes,
pointers, and sequencing.
Conceptual level or conceptual schema:
 The conceptual level has a conceptual schema, which describes the structure
of the whole database for a community of users.
 The conceptual schema hides the details of physical storage structures and
concentrates on describing entities, data types, relationships, user operations,
and constraints.
 Usually, a representational data model is used to describe the conceptual
schema when a database system is implemented.
External level or External View:
 The external or view level includes a number of external schemas or user
views.
 The external level is the view that the individual user of the database has.
 Each external schema describes the part of the database that a particular user
group is interested in and hides the rest of the database from that user group.
Data Independence
Data independence can be defined as the capacity to change the schema at
one level of a database system without having to change the schema at the next
higher level.
We can define two types of data independence:
1. Physical data independence
2. Logical data independence
Physical data independence:
 Physical data independence is the capacity to change the internal schema
without having to change the conceptual schema.
 Hence, the external schemas need not be changed either.
 Changes to the internal schema may be needed because some physical files
had to be reorganized (for example, by creating additional access structures)
to improve the performance of retrieval or update.
Logical data independence:
 Logical data independence is the capacity to change the conceptual schema
without having to change external schemas or application programs.
 We may change the conceptual schema to expand the database
(by adding a record type or data item), to change constraints, or to reduce the
database (by removing a record type or data item).
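Both kinds of data independence can be sketched with Python's built-in sqlite3
module, using a view as a stand-in for an external schema. The names student,
student_names, idx_student_sname, and read_view are assumptions made for this
illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, branch TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ram', 'CSE')")

# External schema: a view exposing only part of the conceptual schema.
conn.execute("CREATE VIEW student_names AS SELECT sid, sname FROM student")

def read_view(conn):
    """An 'application program' that only knows the external view."""
    return conn.execute("SELECT sid, sname FROM student_names ORDER BY sid").fetchall()

before = read_view(conn)

# Conceptual schema change: expand the database by adding a data item.
conn.execute("ALTER TABLE student ADD COLUMN section TEXT")
# Internal schema change: add an access structure.
conn.execute("CREATE INDEX idx_student_sname ON student (sname)")

after = read_view(conn)   # the application code and its result are unchanged
```

The function read_view is never edited, yet it keeps working after both the
conceptual-level and the internal-level change.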
DBMS architecture
The functional components of a database system can be divided into
i) Query Processor Components
ii) Storage Manager Components
Query Processor Components
The Query Processor Components include:
DDL interpreter:
It interprets DDL statements and records the resulting definitions as a set of
tables in the data dictionary.
DML compiler:
It translates DML statements into low-level instructions that are understood by the
Query Evaluation Engine.
It also optimizes the DML Queries for efficient execution by the Query Evaluation
Engine.
Query evaluation engine:
It executes low-level instructions generated by the DML compiler and produces
results.
Storage Manager Components
These components provide the interface between the low-level data stored in the
database and the Query Processor.
The storage manager components include:
Authorization and integrity manager:
It tests for the satisfaction of integrity constraints and checks the authority of users
to access data.
Transaction manager:
This component ensures that concurrent transactions proceed without conflict and
the database remains in a consistent (correct) state despite system failures.
File manager:
It manages the allocation of disk-space for the storage of DBMS files.
Buffer manager:
It is responsible for fetching data from disk storage into main memory buffers for
processing, and then writing the updated data back onto the disk.
Data Components
Data files:
These store the database itself.
Data dictionary:
It stores metadata about the structure of the database, in particular the
database schema.
Indices:
Indices provide fast access to data items. Like the index in a textbook, a
database index provides pointers to those data items that hold a particular value.
Statistical data:
It stores the statistical information about processing of previous queries. This
information is used by the Query Processor to optimize the queries.
CLASSIFICATION OF DATABASE MANAGEMENT SYSTEMS
Several criteria are normally used to classify DBMSs:
1. Based on data model (e.g., relational, object, object-relational, network)
2. Number of users supported by the system (multi-user vs. single-user)
3. Number of sites over which the database is distributed (centralized vs.
distributed)
4. Cost
5. Types of access path options
6. Based on purpose (general-purpose vs. special-purpose)
1. Based on Data Model:
 The first is the data model on which the DBMS is based.
 The main data model used in many current commercial DBMSs is the
relational data model. The object data model was implemented in some
commercial systems but has not had widespread use.
 Many legacy (older) applications still run on database systems based on the
hierarchical and network data models.
 We can hence categorize DBMSs based on the data model: relational, object,
object-relational, hierarchical, network, and other.
2. Number of users supported by the system:
Multi-user vs. single-user
 The second criterion used to classify DBMSs is the number of users
supported by the system.
 Single-user systems support only one user at a time and are mostly used with
personal computers.
 Multiuser systems, which include the majority of DBMSs, support multiple
users concurrently.
3. Number of sites over which the database is distributed
Centralized vs. distributed
 A third criterion is the number of sites over which the database is
distributed.
 A DBMS is centralized if the data is stored at a single computer site. A
centralized DBMS can support multiple users, but the DBMS and the
database themselves reside totally at a single computer site.
 A distributed DBMS (DDBMS) can have the actual database and DBMS
software distributed over many sites, connected by a computer network.
Homogeneous DDBMSs use the same DBMS software at multiple sites.
4. Cost
 A fourth criterion is the cost of the DBMS. The majority of DBMS packages
cost between $10,000 and $100,000.
 Single-user low-end systems that work with microcomputers cost between
$100 and $3000. At the other end of the scale, a few elaborate packages cost
more than $100,000.
5. Access path
We can also classify a DBMS on the basis of the types of access path
options for storing files. One well-known family of DBMSs is based on inverted
file structures.
6. General-purpose vs. special-purpose
 A DBMS can be general purpose or special purpose. When performance is a
primary consideration, a special-purpose DBMS can be designed and built for a
specific application; such a system cannot be used for other applications
without major changes.
 Many airline reservations and telephone directory systems developed in the
past are special purpose DBMSs.
DBMS Languages
 A DBMS must provide appropriate languages and interfaces for each
category of users to express database queries and updates.
 Database languages are used to create and maintain a database on a computer.
 Many DBMS products, such as Oracle, MySQL, MS Access, dBase, and FoxPro,
provide such languages.
 SQL statements commonly used in Oracle and MS Access can be
categorized as data definition language (DDL), data manipulation language
(DML), data control language (DCL), and transaction control language
(TCL).
1. Data Definition Language (DDL)
 It is a language that allows the users to define data and their relationship to
other types of data.
 It is mainly used to create files, databases, the data dictionary, and tables
within databases.
 It is also used to specify the structure of each table, the set of values
associated with each attribute, and integrity constraints.
 DDL statements are used to define the database structure or schema:
 CREATE - creates objects in the database
 ALTER - alters the structure of the database
 DROP - deletes objects from the database
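The three DDL statements can be tried out with Python's built-in sqlite3 module.
This is a small sketch under assumptions: the table course and its columns are
made up for illustration, and SQLite's ALTER TABLE supports only a subset of
what larger DBMSs such as Oracle allow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def table_names(conn):
    """Names of all tables recorded in SQLite's catalog."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [r[0] for r in rows]

def column_names(conn, table):
    """Column names of one table, read from the catalog."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]

# CREATE: define a new object in the database.
conn.execute("CREATE TABLE course (course_number TEXT PRIMARY KEY, course_name TEXT)")
after_create = table_names(conn)

# ALTER: change the structure of an existing object.
conn.execute("ALTER TABLE course ADD COLUMN credit_hours INTEGER")
after_alter = column_names(conn, "course")

# DROP: remove the object, including any data it holds.
conn.execute("DROP TABLE course")
after_drop = table_names(conn)
```

None of these statements touches individual rows; they change only the schema
that the DBMS keeps in its catalog.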
2. Data Manipulation Language (DML)
 It is a language that provides a set of operations to support the basic data
manipulation operations on the data held in the databases.
 It allows users to insert, update, delete and retrieve data from the database.
 The part of DML that involves data retrieval is called a query language.
 SELECT - retrieves data from a database
 INSERT - inserts data into a table
 UPDATE - updates existing data within a table
 DELETE - deletes records from a table (all records if no condition is
given); the space for the records remains allocated
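The four DML statements can be sketched with Python's built-in sqlite3 module.
The table student and its sample rows are assumptions reused from the relational
model example; the statements themselves are standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, branch TEXT)")

# INSERT: add rows.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ram", "CSE"), (2, "Raj", "ECE"), (3, "Rani", "IT")],
)
# UPDATE: change existing rows that match the WHERE condition.
conn.execute("UPDATE student SET branch = 'CSE' WHERE sid = 2")
# DELETE: remove only the rows that match the WHERE condition.
conn.execute("DELETE FROM student WHERE sid = 3")
# SELECT: the query-language part of DML.
cse_students = conn.execute(
    "SELECT sname FROM student WHERE branch = 'CSE' ORDER BY sid"
).fetchall()

# DELETE without WHERE removes every row but keeps the (now empty) table.
conn.execute("DELETE FROM student")
remaining = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Note that DELETE with a WHERE clause removes only matching rows; it empties the
table only when no condition is given.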
3. Data Control Language (DCL)
 DCL statements control access to data and the database using statements
such as GRANT and REVOKE.
 A privilege can be granted to a user with the help of the GRANT
statement.
 The privileges assigned can be SELECT, ALTER, DELETE, EXECUTE,
INSERT, INDEX, etc.
 In addition to granting privileges, you can also revoke them (take them back)
by using the REVOKE command.
 GRANT - gives users access privileges to the database
 REVOKE - withdraws access privileges given with the
GRANT command
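GRANT and REVOKE are executed by the DBMS itself, and the exact syntax and
privilege names vary between products. The sketch below is therefore only a toy
Python model of the bookkeeping involved, not real DBMS code; the class
PrivilegeTable, its methods, and the user and table names are all hypothetical.

```python
from collections import defaultdict

class PrivilegeTable:
    """Toy model of the privileges a DBMS tracks for GRANT and REVOKE."""

    def __init__(self):
        self._granted = defaultdict(set)   # (user, table) -> set of privileges

    def grant(self, privilege, table, user):
        """Record that `user` holds `privilege` on `table`."""
        self._granted[(user, table)].add(privilege)

    def revoke(self, privilege, table, user):
        """Take the privilege back; revoking an absent privilege is a no-op here."""
        self._granted[(user, table)].discard(privilege)

    def is_allowed(self, user, privilege, table):
        """Check a request against the recorded privileges."""
        return privilege in self._granted.get((user, table), set())

acl = PrivilegeTable()
acl.grant("SELECT", "student", "clerk")      # like: GRANT SELECT ON student TO clerk
acl.grant("UPDATE", "student", "clerk")      # like: GRANT UPDATE ON student TO clerk
can_update_before = acl.is_allowed("clerk", "UPDATE", "student")
acl.revoke("UPDATE", "student", "clerk")     # like: REVOKE UPDATE ON student FROM clerk
can_update_after = acl.is_allowed("clerk", "UPDATE", "student")
can_select = acl.is_allowed("clerk", "SELECT", "student")
```

After the revoke, the hypothetical user clerk can still read the table but can no
longer update it, which is the effect the two DCL statements are meant to have.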
4. Transaction Control Language (TCL)
 TCL statements are used to manage the changes made by DML statements.
 They allow statements to be grouped together into logical transactions.
 COMMIT - saves the work done
 SAVEPOINT - identifies a point in a transaction to which you can
later roll back
 ROLLBACK - restores the database to its state as of the last
COMMIT
 SET TRANSACTION - changes transaction options such as the isolation
level and which rollback segment to use
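COMMIT, SAVEPOINT, and ROLLBACK can be tried with Python's built-in sqlite3
module. This is a sketch under assumptions: the table log and the savepoint name
after_first are made up, and SET TRANSACTION is omitted because SQLite does not
provide it.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN / COMMIT / ROLLBACK statements below are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (entry TEXT)")

def entries(conn):
    """Return the stored entries in insertion order."""
    return [r[0] for r in conn.execute("SELECT entry FROM log ORDER BY rowid")]

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('first')")
conn.execute("SAVEPOINT after_first")
conn.execute("INSERT INTO log VALUES ('second')")
conn.execute("ROLLBACK TO after_first")   # undo only the work after the savepoint
conn.execute("COMMIT")                    # make 'first' permanent
after_commit = entries(conn)

conn.execute("BEGIN")
conn.execute("INSERT INTO log VALUES ('third')")
conn.execute("ROLLBACK")                  # undo everything since the last COMMIT
after_rollback = entries(conn)
```

Only the entry inserted before the savepoint survives: the second insert is
undone by ROLLBACK TO, and the third by the full ROLLBACK.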
Centralized and client/server architectures for DBMS:
A 3-tier architecture separates its tiers from each other based on the complexity of
the users and how they use the data present in the database. It is the most widely
used architecture to design a DBMS.
 Database (Data) Tier − At this tier, the database resides along with its
query processing languages. We also have the relations that define the data
and their constraints at this level.

 Application (Middle) Tier − At this tier reside the application server and
the programs that access the database. For a user, this application tier
presents an abstracted view of the database. End-users are unaware of any
existence of the database beyond the application. At the other end, the
database tier is not aware of any other user beyond the application tier.
Hence, the application layer sits in the middle and acts as a mediator
between the end-user and the database.

 User (Presentation) Tier − End-users operate on this tier and they know
nothing about any existence of the database beyond this layer. At this layer,
multiple views of the database can be provided by the application. All
views are generated by applications that reside in the application tier.

Multiple-tier database architecture is highly modifiable, as almost all its
components are independent and can be changed independently.
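The separation between the three tiers can be sketched in a single Python file,
with one function per tier. This is a deliberately simplified illustration: in a
real deployment the tiers usually run as separate processes or machines, and the
function names open_database, students_in_branch, and render are assumptions.

```python
import sqlite3

# --- Database (data) tier: the database and its relations live here. ---
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, sname TEXT, branch TEXT)")
    conn.executemany(
        "INSERT INTO student VALUES (?, ?, ?)",
        [(1, "Ram", "CSE"), (2, "Raj", "ECE"), (3, "Rani", "IT")],
    )
    return conn

# --- Application (middle) tier: accesses the database, returns plain data. ---
def students_in_branch(conn, branch):
    rows = conn.execute(
        "SELECT sid, sname FROM student WHERE branch = ? ORDER BY sid", (branch,)
    )
    return [{"sid": sid, "name": sname} for sid, sname in rows]

# --- User (presentation) tier: formats results, knows nothing about SQL. ---
def render(students):
    if not students:
        return "No students found."
    return "\n".join(f"{s['sid']}: {s['name']}" for s in students)

conn = open_database()
page = render(students_in_branch(conn, "CSE"))
```

Because render only sees plain dictionaries, the database tier could be swapped
for another DBMS by changing open_database and students_in_branch alone.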
