
Hierarchical Database Model

The hierarchical database model is one of the oldest database models, dating from the 1960s. One of the first hierarchical databases, the Information Management System (IMS), was developed jointly by North American Rockwell and IBM. This model resembles a tree structure, with the records forming the nodes and the fields forming the branches of the tree.

The hierarchical model organizes data elements as tabular rows, one for each instance of an entity. Consider a company's organizational structure. At the top we have a General
Manager (GM). Under him we have several Deputy General Managers (DGMs). Each
DGM looks after a couple of departments and each department will have a manager and
many employees. When represented in hierarchical model, there will be separate rows
for representing the GM, each DGM, each department, each manager and each
employee. The row position implies a relationship to other rows. A given employee
belongs to the department that is closest above it in the list and the department belongs
to the manager that is immediately above it in the list and so on as shown.

In the hierarchical data model, records are linked both to the superior records on which they depend and to the records that depend on them. A tree structure establishes one-to-many relationships. The figure illustrates the structure of a family. The great-grandparent record is known as the root of the tree; parents can have many children, exhibiting one-to-many relationships. The grandparents and children are the nodes, or dependents, of the root. In general, a root may have any number of dependents, each of these dependents may have any number of lower-level dependents, and so on, with no restriction on the number of levels.

The different elements (e.g. records) present in the hierarchical tree structure have a parent-child relationship. A parent element can have many child elements, but a child element cannot have more than one parent element. That is, the hierarchical model cannot represent many-to-many relationships among records.
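Since each record stores a single parent reference, the one-parent rule falls out of the data structure itself. The following minimal Python sketch (with made-up record names, not taken from any particular DBMS) illustrates this:

```python
# A hierarchical database as a tree: every record stores exactly ONE
# parent reference, so many-to-many links are impossible by construction.
class Record:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # a child has at most one parent
        self.children = []        # a parent may have many children
        if parent is not None:
            parent.children.append(self)

# Hypothetical organization: GM -> DGMs -> departments
gm = Record("GM")
dgm1 = Record("DGM1", parent=gm)
dgm2 = Record("DGM2", parent=gm)
sales = Record("Sales", parent=dgm1)
hr = Record("HR", parent=dgm1)
```

A parent can list several children, but each child record knows exactly one parent, which is why a shared (many-to-many) relationship cannot be expressed directly.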
Another example of the hierarchical model is a Customer-Loan database. Here a customer can take multiple loans, and there is also a provision for joint loans, where more than one customer shares a loan. As shown, customer C1 takes a single loan L1 of amount 10000 jointly with customer C2. Customer C3 takes two loans: L2 of amount 15000 and L3 of amount 25000.

Sample Database

In order to understand the hierarchical data model better, let us take the example of the
sample database consisting of supplier, parts and shipments. The record structure and
some sample records for supplier, parts and shipments elements are as given in
following tables.

We assume that each row in the Supplier table is identified by a unique SNo (Supplier Number). Likewise, each part has a unique PNo (Part Number). We also assume that no more than one shipment exists for a given supplier/part combination in the Shipments table.

Hierarchical View for the Suppliers-Parts Database


The tree structure has the part record superior to the supplier record; that is, parts form the parents and suppliers form the children. Each of the four trees in the figure consists of one part record occurrence together with a set of subordinate supplier record occurrences, one for each supplier of that part. Each supplier occurrence includes the corresponding shipment quantity.

For example, supplier S3 supplies 300 units of part P2. Note that the set of supplier occurrences for a given part occurrence may contain any number of members, including zero (as in the case of part P4). Part P1 is supplied by two suppliers, S1 and S2; part P2 is supplied by three suppliers, S1, S2 and S3; and part P3 is supplied only by supplier S1, as shown in the figure.

Operations on Hierarchical Model


There are four basic operations, Insert, Update, Delete and Retrieve, that can be performed on each model. We now consider in detail how these basic operations are performed in the hierarchical database model.
Insert Operation: It is not possible to insert the information of a supplier, e.g. S4, who does not supply any part, because a child node cannot exist without a parent. However, a part P5 that is not supplied by any supplier can be inserted without any problem, because a parent can exist without any child. So, we can say that the insert anomaly exists only for children that have no corresponding parent.
Update Operation: Suppose we wish to change the city of supplier S1 from Qadian to Jalandhar. We will then have to carry out two operations: searching for S1 under each part, and then multiple updates for the different occurrences of S1. But if we wish to change the city of part P1 from Qadian to Jalandhar, these problems will not occur, because there is only a single entry for part P1 and the problem of inconsistency does not arise. So, we can say that update anomalies exist only for children, not for parents, because children may have multiple entries in the database.
Delete Operation: In the hierarchical model, quantity information is incorporated into the supplier record. Hence, the only way to delete a shipment (or supplied quantity) is to delete the corresponding supplier record. But such an action leads to loss of information about the supplier, which is not desired. For example, if supplier S2 stops supplying 250 units of part P1, then the whole record of S2 under part P1 has to be deleted, which may lead to loss of supplier information. Another problem arises if we wish to delete a part and that part happens to be the only part supplied by some supplier. In the hierarchical model, deletion of a parent causes the deletion of its child records as well, and if a child occurrence is the only occurrence in the whole database, then its information is lost with the deletion of the parent. For example, if we delete the information for part P2, we also lose the information for suppliers S1, S2 and S3. The information for S1 and S2 can still be obtained from P1, but the information about supplier S3 is lost with the deletion of the record for P2.
Record Retrieval: Record retrieval methods for the hierarchical model are complex and asymmetric, as the following queries illustrate:
Query 1: Find the supplier numbers for suppliers who supply part P2.
Solution: To get this information, we first search for the parent P2 in the database. Since a parent occurs only once in the whole database, we obtain a single record for P2. Then a loop is constructed to visit all suppliers under this part, and the supplier number is printed for each.
Algorithm
get [next] part where PNO=P2;
do until no more shipments under this part;
get next supplier under this part;
print SNO;
end;
Query 2: Find the part numbers for parts supplied by supplier S2.
Solution: To get the required part numbers, we have to search for S2 under each part. If supplier S2 is found under a part, the corresponding part number is printed; otherwise we go on to the next part, until all parts have been searched for supplier S2.
Algorithm
do until no more parts;
get next part;
get [next] supplier under this part where SNO=S2;
if found then print PNO;
end;
In the above algorithms, "next" is interpreted relative to the current position (normally the row most recently accessed; for the initial case we assume it to be just prior to the first row of the table). We have placed square brackets around "next" in those statements where we expect at most one occurrence to satisfy the specified conditions.
Since the two queries involve different logic and are complex, we can conclude that the retrieval operation in this model is complex and asymmetric.
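The asymmetry of the two algorithms can be sketched in Python by modelling each part as a parent with a list of (supplier, quantity) children. Apart from S2's 250 units of P1 and S3's 300 units of P2, which the text states, the quantities below are illustrative:

```python
# Hierarchical suppliers-parts data: each part is a parent whose
# children are (supplier, quantity) occurrences.
db = {
    "P1": [("S1", 300), ("S2", 250)],
    "P2": [("S1", 200), ("S2", 400), ("S3", 300)],
    "P3": [("S1", 100)],
    "P4": [],                       # a parent may have zero children
}

def suppliers_of(part):
    """Query 1: descend from the single parent record to its children."""
    return [sno for sno, _qty in db.get(part, [])]

def parts_supplied_by(supplier):
    """Query 2: suppliers carry no upward index, so scan EVERY part."""
    return [pno for pno, shipments in db.items()
            if any(sno == supplier for sno, _qty in shipments)]
```

Query 1 touches one parent and its children; Query 2 must walk the whole database, which is exactly the asymmetry the two pseudocode algorithms exhibit.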
Conclusion: As explained above, the hierarchical model suffers from insertion, update and deletion anomalies, and the retrieval operation is complex and asymmetric; thus the hierarchical model is not suitable for all cases.
Record
A collection of field (data item) values that provide information about an entity. Each field has a certain data type, such as integer, real or string. Records of the same type are grouped into a record type.
Parent-Child Relationship Type
A parent-child relationship (PCR) type is a 1:N relationship between two record types. The record type on the 1 side is called the parent record type, and the one on the N side is called the child record type of the PCR type.

Advantages
1. Simplicity
Data naturally have hierarchical relationships in most practical situations. Therefore, it is easy to view data arranged in this manner, which makes this type of database well suited to such applications.
2. Security
These database systems can enforce varying degrees of security, unlike flat-file systems.
3. Database Integrity
Because of its inherent parent-child structure, database integrity is strongly promoted in these systems.

4. Efficiency: The hierarchical database model is very efficient when the database contains a large number of 1:N (one-to-many) relationships and when the users require a large number of transactions using data whose relationships are fixed.

Disadvantages
1. Complexity of Implementation: The actual implementation of a hierarchical database depends on the physical storage of the data. This makes the implementation complicated.
2. Difficulty in Management: The movement of a data segment from one location to another causes all the accessing programs to be modified, making database management a complex affair.
3. Complexity of Programming: Programming a hierarchical database is relatively complex because the programmers must know the physical paths of the data items.
4. Poor Portability: The database is not easily portable, mainly because little or no standard exists for these types of databases.

5. Database Management Problems: If you make any changes in the structure of a hierarchical database, you need to make the corresponding changes in all the application programs that access the database. Thus, maintaining the database and the applications can become very difficult.

6. Lack of Structural Independence: Structural independence exists when changes to the database structure do not affect the DBMS's ability to access data. Hierarchical database systems use physical storage paths to navigate to the different data segments, so the application programs must have good knowledge of the relevant access paths. If the physical structure is changed, the applications must also be modified. Thus, in a hierarchical database the benefits of data independence are limited by structural dependence.

7. Program Complexity: Due to structural dependence and the navigational structure, application programs and end users must know precisely how the data is physically distributed in the database in order to access it. This requires knowledge of complex pointer systems, which is often beyond the grasp of ordinary users (users who have little or no programming knowledge).

8. Operational Anomalies: As discussed earlier, the hierarchical model suffers from insert, update and deletion anomalies, and the retrieval operation is complex and asymmetric; thus the hierarchical model is not suitable for all cases.

9. Implementation Limitations: Many common relationships do not conform to the 1:N format required by the hierarchical model. Many-to-many (N:N) relationships, which are more common in real life, are very difficult to implement in a hierarchical model.

Network Model
The popularity of the network data model coincided with the popularity of the
hierarchical data model. Some data were more naturally modeled with more than one
parent per child. So, the network model permitted the modeling of many-to-many
relationships in data.
In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the
network model. The basic data modeling construct in the network model is the set
construct. A set consists of an owner record type, a set name, and a member record type.
A member record type can have that role in more than one set; hence the multiparent
concept is supported.
An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them. Thus, the complete network of relationships is represented by several pairwise sets; in each set, one record type is the owner (at the tail of the relationship arrow) and one or more record types are members (at the head of the relationship arrow).
Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.
The network model is a collection of data in which records are physically linked through linked lists. A DBMS is said to be a network DBMS if the relationships among data in the database are many-to-many. These many-to-many relationships appear in the form of a network.

Thus the structure of a network database is extremely complicated because of these many-to-many relationships, in which one record can be used as a key to the entire database. A network database is structured in the form of a graph, which is also a data structure.
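A CODASYL set can be sketched as an owner record plus a list of member records. In this illustrative Python sketch (record contents are made up, reusing the Customer-Loan example), one junction record participates in two sets with different owners, which is how the network model supports multiple parents:

```python
# A CODASYL-style set: a named owner record linked to member records.
class DbSet:
    def __init__(self, name, owner):
        self.name = name          # set name
        self.owner = owner        # owner record (tail of the arrow)
        self.members = []         # member records (head of the arrow)

    def insert(self, member):
        self.members.append(member)

# Two owners (customers C1 and C2) sharing one member record models the
# joint loan L1: an intersection (junction) record with two parents.
takes_c1 = DbSet("TAKES", owner="C1")
takes_c2 = DbSet("TAKES", owner="C2")
joint_loan = {"loan": "L1", "amount": 10000}   # junction record
takes_c1.insert(joint_loan)
takes_c2.insert(joint_loan)
```

The same member object appearing in both sets is exactly what the hierarchical model's single-parent rule forbids.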

Relational Model
The relational model was the first theoretically founded and well-thought-out data model, proposed in 1970 by E.F. Codd, then a researcher at IBM. It has been the foundation of most database software and theoretical database research ever since.
The relational model is a depiction of how each piece of stored information relates to the other stored information. It shows how tables are linked, what types of links exist between tables, what keys are used, and what information is referenced between tables. It is an essential part of developing a normalized database structure that prevents repeated and redundant data storage.
The basic idea behind the relational model is that a database consists of a series of unordered tables (or relations) that can be manipulated using non-procedural operations that return tables. This model stood in stark contrast to the more traditional database theories of the time, which were much more complicated, less flexible and dependent on the physical storage methods of the data. The relational database model is based on relational algebra, set theory and predicate logic.
It is commonly thought that the word relational in the relational model comes from the
fact that you relate together tables in a relational database. Although this is a convenient
way to think of the term, it's not accurate. Instead, the word relational has its roots in
the terminology that Codd used to define the relational model. The table in Codd's
writings was actually referred to as a relation (a related set of information).
In fact, Codd (and other relational database theorists) used the terms relations, attributes and tuples where most of us use the more common terms tables, columns and rows, respectively (or the more physical, and thus less preferable for discussions of database design theory, terms files, fields and records).
The relational model can be applied both to databases and to database management systems (DBMSs) themselves. The relational fidelity of database programs can be compared using Codd's 12 rules for determining how well DBMS products conform to the relational model (since Codd's seminal paper, he expanded the rules to 333).

When compared with other database management programs, Microsoft Access fares
quite well in terms of relational fidelity. Still, it has a long way to go before it meets all
twelve rules completely.

Object-Oriented Model
An object-oriented database management system (OODBMS, sometimes just called an object database or ODBMS) is a DBMS that stores data in a logical model closely aligned with an application program's object model. Of course, an OODBMS will have a physical data model optimized for the kinds of logical data model it expects.
Object-oriented database models have been around since the seventies, when the concept of object-oriented programming was first explored. It is only in the last ten or fifteen years that companies have been utilizing OODBMSs. The major obstacle for OODBMSs was that relational DBMSs (RDBMSs) were already implemented industry-wide.
An OODBMS should be used when there is a business need, high performance is required, and complex data is being used. Due to the object-oriented nature of the database model, it is much simpler to approach a problem with these needs in terms of objects. The result can be a performance increase of ten to one thousand times while writing as little as 40% of the code (because no intermediate language such as SQL is required; everything is programmed in the OO language of choice). This code can be applied directly to the database, saving time and money in development and maintenance.
An object-oriented database interface standard is being developed by an industry group,
the Object Data Management Group (ODMG). The Object Management Group (OMG)
has already standardized an object-oriented data brokering interface between systems in
a network.

What is a Database View


BY DINESH THAKUR

A view can join information from several tables together, for example adding the ename field to the order information. A database view is a subset of the database, sorted and displayed in a particular way. A database view displays one or more database records on the same page and can display some or all of the database fields.
Views have filters to determine which records they show. Views can be sorted to control the record order and grouped to display records in related sets. Views have other options such as totals and subtotals. A query returns information from a table or set of tables that matches particular criteria.
Most users interact with the database using the database views. A key to creating a
useful database is a well-chosen set of views. Luckily, while views are powerful, they are
also easy to create. Create custom views of a database to organize, filter and sort records.
Database views allow you to easily reduce the complexity of the end user experience and
limit their ability to access data contained in database tables by limiting the data
presented to the end user. Essentially, a view uses the results of a database query to
dynamically populate the contents of an artificial database table.
You can use views to:

- Focus on the data that interests users and on the tasks for which they are responsible. Data that is not of interest to a user can be left out of the view.
- Define frequently used joins, projections, and selections as views, so that users do not have to specify all the conditions and qualifications each time an operation is performed on that data.
- Display different data for different users, even when they are using the same data at the same time. This advantage is particularly important when users of many different interests and skill levels share the same database.
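A view can be thought of as a stored query that is re-evaluated on each access. The following Python sketch mimics that behaviour with a hypothetical employees table (in a real DBMS this would be a CREATE VIEW statement):

```python
# A view sketched as a stored query: it holds no data of its own and is
# recomputed from the base table each time it is read.
employees = [
    {"ename": "Asha",  "dept": "Sales", "salary": 52000},
    {"ename": "Ravi",  "dept": "HR",    "salary": 48000},
    {"ename": "Meena", "dept": "Sales", "salary": 61000},
]

def sales_view():
    """Restrict rows to the Sales dept and hide the salary column."""
    return [{"ename": r["ename"], "dept": r["dept"]}
            for r in employees if r["dept"] == "Sales"]
```

Because the view is recomputed on each call, changes to the base table show up immediately, while the salary column stays hidden from view users.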

Advantages:
1. Provide an additional level of table security by restricting access to a predetermined set of rows or columns of a table.
2. Hide data complexity: For example, a single view might be defined with a join, which is a collection of related columns or rows from multiple tables. The view hides the fact that this information actually originates from several tables.
3. Simplify statements for the user: Views allow users to select information from multiple tables without actually knowing how to perform a join.
4. Present data from a different perspective: Columns of views can be renamed without affecting the tables on which the views are based.
5. Isolate applications from changes in the definitions of base tables: If a view references three columns of a four-column table, and a fifth column is added or the fourth column is changed, the view and its associated applications are unaffected.
6. Express queries that cannot be expressed without a view: For example, a view can be defined that joins a GROUP BY view with a table, or a view can be defined that joins a UNION view with a table.
7. Save complex queries for reuse.
Disadvantages:
Rows available through a view are not sorted or ordered.
DML operations cannot, in general, be used on a view.
When a table is dropped, the view becomes inactive, since it depends on the table objects.
Views can affect performance: querying through a view takes more time than querying the table directly.

Relational Model

The relational model stores data in the form of tables. This concept was proposed by Dr. E.F. Codd, a researcher at IBM, in 1970. The relational model consists of three major components:
1. The set of relations and set of domains that defines the way data can be represented
(data structure).
2. Integrity rules that define the procedure to protect the data (data integrity).
3. The operations that can be performed on data (data manipulation).
A relational model database is defined as a database that allows you to group its data items into one or more independent tables that can be related to one another by using fields common to each related table.

Characteristics of Relational Database


Relational database systems have the following characteristics:
- The whole of the data is conceptually represented as an orderly arrangement of data into rows and columns, called a relation or table.
- All values are scalar. That is, at any given row/column position in the relation there is one and only one value.
- All operations are performed on an entire relation and result in an entire relation, a concept known as closure.
When formulating the relational model, Dr. Codd chose the term "relation" because it was comparatively free of connotations, unlike, for example, the word "table". It is a common misconception that the relational model is so called because relationships are established between tables. In fact, the name is derived from the relations on which it is based. Notice that the model requires only that data be conceptually represented as a relation; it does not specify how the data should be physically implemented. A relation is a relation provided that it is arranged in row and column format and its values are scalar. Its existence is completely independent of any physical representation.
Basic Terminology used in Relational Model
The figure shows a relation with the formal names of the basic components marked. The entire structure is, as we have said, a relation.

Tuples of a Relation
Each row of data is a tuple. Actually, each row is an n-tuple, but the "n-" is
usually dropped.
Cardinality of a relation: The number of tuples in a relation determines its
cardinality. In this case, the relation has a cardinality of 4.
Degree of a relation: Each column in the tuple is called an attribute. The number of attributes in a relation determines its degree. The relation in the figure has a degree of 3.
Domains: A domain definition specifies the kind of data represented by the attribute. More particularly, a domain is the set of all possible values that an attribute may validly contain. Domains are often confused with data types, but this is inaccurate. Data type is a physical concept while domain is a logical one. "Number" is a data type and "Age" is a domain. To give another example, "StreetName" and "Surname" might both be represented as text fields, but they are obviously different kinds of text fields; they belong to different domains.
Domain is also a broader concept than data type, in that a domain definition includes a more specific description of the valid data. For example, consider the domain DegreeAwarded, which represents the degrees awarded by a university. In the database schema, this attribute might be defined as Text[3], but it is not just any three-character string; it is a member of the set {BA, BS, MA, MS, PhD, LLB, MD}. Of course, not all domains can be defined by simply listing their values. Age, for example, contains a hundred or so values if we are talking about people, but tens of thousands if we are talking about museum exhibits. In such instances it is useful to define the domain in terms of rules that can be used to determine whether any specific value is a member of the set of all valid values.
For example, PersonAge could be defined as "an integer in the range 0 to 120", whereas ExhibitAge (the age of any object for exhibition) might simply be "an integer greater than or equal to 0".
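These domain rules can be written directly as predicates. The sketch below follows the definitions given in the text; the function names are my own:

```python
# Domains as rules rather than data types: PersonAge and ExhibitAge are
# both integers physically, but belong to different logical domains.
def valid_person_age(v):
    """PersonAge: an integer in the range 0 to 120."""
    return isinstance(v, int) and 0 <= v <= 120

def valid_exhibit_age(v):
    """ExhibitAge: an integer greater than or equal to 0."""
    return isinstance(v, int) and v >= 0

# DegreeAwarded: Text[3] physically, but logically a fixed set of values.
DEGREES = {"BA", "BS", "MA", "MS", "PhD", "LLB", "MD"}

def valid_degree(v):
    return v in DEGREES
```

The value 500 is a perfectly good integer (the data type) yet a valid ExhibitAge and an invalid PersonAge, which is exactly the type/domain distinction.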
Body of a Relation: The body of the relation consists of an unordered set of zero or more tuples. There are some important concepts here. First, the relation is unordered; record numbers do not apply to relations. Second, a relation with no tuples still qualifies as a relation. Third, a relation is a set, and the items in a set are, by definition, uniquely identifiable. Therefore, for a table to qualify as a relation, each record must be uniquely identifiable and the table must contain no duplicate records.
Keys of a Relation
It is a set of one or more columns whose combined values are unique among all
occurrences in a given table. A key is the relational means of specifying uniqueness.
Some different types of keys are:
A primary key is an attribute or a set of attributes of a relation that possesses the properties of uniqueness and irreducibility (no proper subset of it is itself unique). For example, the supplier number SNo in the S table is a primary key, the part number PNo in the P table is a primary key, and the combination of SNo and PNo in the SP table is a primary key.
A foreign key is an attribute (or set of attributes) of a table that refers to the primary key of another table. A foreign key permits only those values that appear in the primary key of the table to which it refers, or possibly null (an unknown value). For example, SNo in the SP table refers to SNo in the S table, which is the primary key of S, so SNo in SP is a foreign key. Likewise, PNo in SP refers to PNo in P, which is the primary key of P, so PNo in SP is also a foreign key.
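The uniqueness and referential properties of these keys can be checked mechanically. A minimal Python sketch over tables shaped like S, P and SP (the row contents are illustrative):

```python
# Rows as tuples; only the key columns matter for these checks.
S  = [("S1", "Qadian"), ("S2", "Amritsar"), ("S3", "Amritsar")]
P  = [("P1", "Nut"), ("P2", "Bolt")]
SP = [("S1", "P1", 300), ("S2", "P1", 250), ("S3", "P2", 300)]

def is_unique(rows, key_cols):
    """Primary-key property: no two rows share the same key value."""
    keys = [tuple(row[i] for i in key_cols) for row in rows]
    return len(keys) == len(set(keys))

def fk_satisfied(child, col, parent, pk_col):
    """Foreign-key property: every child value appears in the parent PK."""
    parent_keys = {row[pk_col] for row in parent}
    return all(row[col] in parent_keys for row in child)
```

Here SNo alone is unique in S, the (SNo, PNo) pair is unique in SP, and both SP columns reference their parent tables, matching the key definitions above.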
The Customer-Loan database, which we discussed earlier for the hierarchical and network models, is now represented in the relational model as shown.
It can easily be understood that this model is very simple and has no redundancy. The total database is divided into two tables. The Customer table contains information about the customers, with CNO as the primary key. The Customer_Loan table stores the CNO, LNO and AMOUNT; its primary key is the combination of CNO and LNO. Here, CNO also acts as a foreign key referring to CNO of the Customer table. This means that only those customer numbers that have an entry in the master Customer table are allowed in the transaction table Customer_Loan.

Relational View of Sample database


Let us take the example of a sample database consisting of supplier, parts and shipments tables. The table structure and some sample records for the supplier, parts and shipments tables are given below:

As we discussed earlier, we assume that each row in the Supplier table is identified by a unique SNo (Supplier Number), which uniquely identifies the entire row of the table. Likewise, each part has a unique PNo (Part Number). Also, we assume that no more than one shipment exists for a given supplier/part combination in the Shipments table.
Note that the Parts and Shipments relations have PNo (Part Number) in common, and the Supplier and Shipments relations have SNo (Supplier Number) in common. The Supplier and Parts relations have City in common. For example, the fact that supplier S3 and part P2 are located in the same city is represented by the appearance of the same value, Amritsar, in the City column of the two tuples in those relations.

Operations in Relational Model

The four basic operations, Insert, Update, Delete and Retrieve, are illustrated below on the sample database in the relational model:
Insert Operation: The information of a supplier who does not supply any part, e.g. S4, can be inserted into the S table without any anomaly. Similarly, information about a new part that is not supplied by any supplier can be inserted into the P table. If a supplier starts supplying a new part, this information can be stored in the shipment table SP with the supplier number, part number and supplied quantity. So, we can say that insert operations can be performed in all cases without any anomaly.
Update Operation: Suppose supplier S1 has moved from Qadian to Jalandhar. In that case we need to change the record so that the supplier table is up to date. Since the supplier number is the primary key in the S (supplier) table, there is only a single entry for S1, which needs a single update, and the problem of data inconsistency does not arise. Similarly, part and shipment information can be updated by a single modification in the P and SP tables respectively, without the problem of inconsistency. The update operation in the relational model is thus very simple and free of anomalies.
Delete Operation: If supplier S3 stops supplying part P2, then we have to delete the shipment connecting part P2 and supplier S3 from the shipment table SP. This information can be deleted from the SP table without affecting the details of supplier S3 in the supplier table or of part P2 in the part table. Similarly, we can delete the information of parts in the P table and their shipments in the SP table, and the information of suppliers in the S table and their shipments in the SP table.
Record Retrieval: Record retrieval methods for relational model are simple and
symmetric which can be clarified with the following queries:
Query1: Find the supplier numbers for suppliers who supply part P2.
Solution: In order to get this information we have to search the information of part P2
in the SP table (shipment table). For this a loop is constructed to find the records of P2
and on getting the records, corresponding supplier numbers are printed.
Algorithm
do until no more shipments;
get next shipment where PNO=P2;
print SNO;
end;

Query 2: Find the part numbers for parts supplied by supplier S2.
Solution: In order to get this information, we have to search for supplier S2 in the SP table (shipment table). For this a loop is constructed to find the records of S2, and on getting the records, the corresponding part numbers are printed.
Algorithm
do until no more shipments;
get next shipment where SNO=S2;
print PNO;
end;
Since both queries involve the same logic and are very simple, we can conclude that the retrieval operation in this model is simple and symmetric.
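The symmetry is visible when both queries are written over the same SP table: they differ only in which column is tested and which is printed. A minimal Python sketch with illustrative quantities:

```python
# The SP (shipment) table as a flat list of (SNo, PNo, Qty) rows.
SP = [("S1", "P1", 300), ("S2", "P1", 250),
      ("S1", "P2", 200), ("S2", "P2", 400), ("S3", "P2", 300)]

def suppliers_of(pno):
    """Query 1: one pass over SP, filtering on the part number."""
    return [s for s, p, _q in SP if p == pno]

def parts_supplied_by(sno):
    """Query 2: the same loop shape, filtering on the supplier number."""
    return [p for s, p, _q in SP if s == sno]
```

Both functions are a single scan of the same table, unlike the hierarchical case where one direction required walking the whole database.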
Conclusion: As explained above, the relational model does not suffer from insertion, update or deletion anomalies, and the retrieval operation is very simple and symmetric compared to the hierarchical and network models. Thus we can say that the relational model is the best suited for most applications.

Advantages and Disadvantages of Relational Model


The major advantages of the relational model are:
Structural independence: In the relational model, changes to the database structure do not affect data access. When it is possible to change the database structure without affecting the DBMS's ability to access data, we can say that structural independence has been achieved. So, the relational database model exhibits structural independence.
Conceptual simplicity: We have seen that both the hierarchical and the network database models were conceptually simple, but the relational database model is even simpler at the conceptual level. Since the relational data model frees the designer from the physical data storage details, designers can concentrate on the logical view of the database.
Design, implementation, maintenance and usage ease: The relational database model achieves both data independence and structural independence, making database design, maintenance, administration and usage much easier than in the other models.

Ad hoc query capability: The presence of a very powerful, flexible and easy-to-use query capability is one of the main reasons for the immense popularity of the relational database model. The query language of the relational model, Structured Query Language (SQL), makes ad hoc queries a reality. SQL is a fourth-generation language (4GL). A 4GL allows the user to specify what must be done without specifying how it must be done. So, using SQL, users can specify what information they want and leave the details of how to get it to the database.
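As a small illustration of this "what, not how" idea, the following Python sketch runs an ad hoc SQL query against an in-memory SQLite database; the table and rows are invented for the example:

```python
# Sketch: an ad hoc SQL query against an in-memory SQLite database.
# Table and data are hypothetical; the point is that SQL states *what*
# is wanted, not *how* to retrieve it.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER)")
conn.executemany("INSERT INTO sp VALUES (?, ?, ?)",
                 [("S1", "P2", 200), ("S2", "P2", 400), ("S2", "P1", 250)])

# Declarative: no loops, no access paths -- the DBMS plans the retrieval.
rows = conn.execute("SELECT sno FROM sp WHERE pno = 'P2' ORDER BY sno").fetchall()
print(rows)  # [('S1',), ('S2',)]
```

The SELECT states only the desired result; the DBMS chooses the execution plan.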

Disadvantages of Relational Model


The relational model's disadvantages are very minor compared to its advantages, and its capabilities far outweigh its shortcomings. Also, the drawbacks of relational database systems can be avoided if proper corrective measures are taken. The drawbacks stem not from shortcomings in the database model itself, but from the way it is implemented.
Some of the disadvantages are:
Hardware overheads: A relational database system hides the implementation complexities and the physical data storage details from the users. For doing this, i.e. for making things easier for the users, relational database systems need more powerful hardware: computers and data storage devices. So, the RDBMS needs powerful machines to run smoothly. But the processing power of modern computers is increasing at an exponential rate, so in today's scenario the need for more processing power is no longer a big issue.
Ease of design can lead to bad design: The relational database is easy to design and use. Users need not know the complex details of physical data storage; they need not know how the data is actually stored in order to access it. This ease of design and use can lead to the development and implementation of very poorly designed database management systems. Because the database is efficient, these design deficiencies will not come to light when the database is first designed and holds only a small amount of data. As the database grows, a poorly designed database will slow the system down and result in performance degradation and data corruption.
'Information island' phenomenon: As we have said before, the relational database
systems are easy to implement and use. This will create a situation where too many
people or departments will create their own databases and applications.
These information islands will prevent the information integration that is essential for
the smooth and efficient functioning of the organization. These individual databases will
also create problems like data inconsistency, data duplication, data redundancy and so
on.

But as we have said all these issues are minor when compared to the advantages and all
these issues could be avoided if the organization has a properly designed database and
has enforced good database standards.

Network Model
BY DINESH THAKUR

The Network model replaces the hierarchical tree with a graph thus allowing more
general connections among the nodes. The main difference of the network model from the hierarchical model is its ability to handle many-to-many (N:N) relations. In other
words, it allows a record to have more than one parent. Suppose an employee works for
two departments. The strict hierarchical arrangement is not possible here and the tree
becomes a more generalized graph - a network. The network model was evolved to
specifically handle non-hierarchical relationships. As shown below data can belong to
more than one parent. Note that there are lateral connections as well as top-down
connections. A network structure thus allows 1:1 (one-to-one), 1:M (one-to-many), and M:M (many-to-many) relationships among entities.
In network database terminology, a relationship is a set. Each set is made up of at least
two types of records: an owner record (equivalent to parent in the hierarchical model)
and a member record (similar to the child record in the hierarchical model).
The database of Customer-Loan, which we discussed earlier for hierarchical model, is
now represented for Network model as shown.
It can easily be seen that the information about the joint loan L1 now appears a single time, whereas in the hierarchical model it appears twice. Thus, the network model reduces redundancy and is better than the hierarchical model in this respect.

Network view of Sample Database


Considering again the sample supplier-part database, its network view is shown. In addition to the part and supplier record types, a third record type is introduced, which we will call the connector. A connector occurrence specifies the association (shipment) between one supplier and one part. It contains data (the quantity of parts supplied) describing the association between the supplier and part records.

All connector occurrences for a given supplier are placed on a chain. The chain starts from the supplier and finally returns to the supplier. Similarly, all connector occurrences for a given part are placed on a chain starting from the part and finally returning to the same part.
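One way to picture these chains is the following Python sketch, where each connector occurrence links one supplier record to one part record and each record keeps a list of its connectors. This simplifies the circular pointer chains described above into plain lists; the names and quantities are illustrative:

```python
# Sketch of the connector chains: each connector links one supplier record
# to one part record and carries the shipment quantity. Chains are modelled
# here simply as Python lists of connector references; a real implementation
# would use ring pointers. All names and data are illustrative.

class Record:
    def __init__(self, key):
        self.key = key
        self.chain = []          # connectors "under" this record

class Connector:
    def __init__(self, supplier, part, qty):
        self.supplier, self.part, self.qty = supplier, part, qty
        supplier.chain.append(self)   # link into the supplier's chain
        part.chain.append(self)       # link into the part's chain

s1, p1, p2 = Record("S1"), Record("P1"), Record("P2")
Connector(s1, p1, 300)
Connector(s1, p2, 200)

# Walk S1's chain: the parts S1 supplies.
print([c.part.key for c in s1.chain])      # ['P1', 'P2']
# Walk P2's chain: the suppliers of P2.
print([c.supplier.key for c in p2.chain])  # ['S1']
```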
Operations on Network Model
Detailed description of all basic operations in Network Model is as under:
Insert Operation: To insert a new record containing the details of a new supplier, we
simply create a new record occurrence. Initially, there will be no connector. The new
supplier's chain will simply consist of a single pointer starting from the supplier to itself.
For example, supplier S4, who does not supply any part, can be inserted into the network model as a new record occurrence with a single pointer from S4 to itself. This is not possible in the hierarchical model. Similarly, a new part that is not supplied by any supplier can be inserted.
Consider another case: if supplier S1 now starts supplying part P3 with quantity 100, then a new connector containing 100 as the supplied quantity is added to the model, and the pointers of S1 and P3 are modified as shown below.
We can summarize that there are no insert anomalies in the network model, unlike the hierarchical model.
Update Operation: Unlike the hierarchical model, where updating was carried out by search and had many inconsistency problems, updating a record in a network model is a much easier process. We can change the city of S1 from Qadian to Jalandhar without search or inconsistency problems, because the city for S1 appears in just one place in the network model. Similarly, the same operation is performed to change any attribute of a part.

Delete operation: If we wish to delete the information of any part, say P1, that record occurrence can be deleted by removing the corresponding pointers and connectors, without affecting the suppliers who supply that part; the model is modified as shown. Similarly, the same operation is performed to delete the information of a supplier.

In order to delete shipment information, the connector for that shipment and its corresponding pointers are removed without affecting the supplier and part information. For example, if supplier S1 stops the supply of part P1 with quantity 250, the model is modified as shown below without affecting the P1 and S1 information.
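The shipment-deletion step can be sketched in Python, with each chain held as a list of (SNO, PNO, QTY) connectors; the data is illustrative:

```python
# Sketch: deleting a shipment means unlinking its connector from both
# chains, leaving the supplier and part records themselves untouched.
# Chains are dictionaries mapping a record key to its connector list.
supplier_chain = {"S1": [("S1", "P1", 250), ("S1", "P2", 200)]}
part_chain = {"P1": [("S1", "P1", 250)], "P2": [("S1", "P2", 200)]}

def delete_shipment(sno, pno):
    """Remove the connector from both chains; the S and P records survive."""
    supplier_chain[sno] = [c for c in supplier_chain[sno] if c[1] != pno]
    part_chain[pno] = [c for c in part_chain[pno] if c[0] != sno]

delete_shipment("S1", "P1")
print(supplier_chain["S1"])  # [('S1', 'P2', 200)]
print(part_chain["P1"])      # []
```

Note that after the deletion both the S1 and P1 records still exist; only the connector between them is gone.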

Retrieval Operation: Record retrieval methods for the network model are symmetric but complex. To understand this, consider the following example queries:
Query 1. Find supplier number for suppliers who supply part P2.

Solution: To retrieve the required information, first we search for the required part, i.e. P2; we will get only one occurrence of P2 from the entire database. Then a loop is constructed to visit each connector under this part. For each connector we check the supplier over that connector, and the supplier number of the concerned supplier record occurrence is printed, as shown in the below algorithm.
Algorithm
get [next] part where PNO=P2;
do until no more connectors under this part;
get next connector under this part;
get supplier over this connector;
print SNO;
end;
Query 2. Find part number for parts supplied by supplier S2.
Solution: The same procedure is adopted to retrieve the required information. First we search for the required supplier, i.e. S2, and we will get only one occurrence of S2 from the entire database. Then a loop is constructed to visit each connector under this supplier. For each connector we check the part over that connector, and the part number of the concerned part record occurrence is printed, as shown in the below algorithm.
Algorithm :
get [next] supplier where SNO=S2;
do until no more connectors under this supplier;
get next connector under this supplier;
get part over this connector;
print PNO;
end;
From both the above algorithms, we can conclude that the retrieval algorithms are symmetric, but they are complex because they involve a lot of pointers.
Conclusion: As explained earlier, the network model does not suffer from insert, update, and deletion anomalies, and unlike the hierarchical model its retrieval operation is symmetric; but its main disadvantage is complexity. Since each of the above operations involves the modification of pointers, the whole model becomes complicated and complex.

Advantages and Disadvantages of Network Model

The Network model retains almost all the advantages of the hierarchical model while
eliminating some of its shortcomings.
The main advantages of the network model are:
Conceptual simplicity: Just like the hierarchical model, the network model is also conceptually simple and easy to design.
Capability to handle more relationship types: The network model can handle the one-to-many (1:N) and many-to-many (N:N) relationships, which is a real help in modeling real-life situations.
Ease of data access: Data access is easier and more flexible than in the hierarchical model.
Data Integrity: The network model does not allow a member to exist without an
owner. Thus, a user must first define the owner record and then the member record.
This ensures the data integrity.
Data independence: The network model is better than the hierarchical model in
isolating the programs from the complex physical storage details.
Database Standards: One of the major drawbacks of the hierarchical model was the non-availability of universal standards for database design and modeling. The network model is based on the standards formulated by the DBTG and augmented by ANSI/SPARC (American National Standards Institute/Standards Planning and Requirements Committee) in the 1970s. All network database management systems conformed to these standards. These standards included a Data Definition Language (DDL) and a Data Manipulation Language (DML), thus greatly enhancing database administration and portability.

Disadvantages of Network Model


Even though the network database model was significantly better than the hierarchical
database model, it also had many drawbacks. Some of them are:
System complexity: All the records are maintained using pointers and hence the
whole database structure becomes very complex.
Operational Anomalies: As discussed earlier, the network model's insertion, deletion and updating operations on any record require a large number of pointer adjustments, which makes its implementation very complex and complicated.
Absence of structural independence: Since the data access method in the network database model is a navigational system, making structural changes to the database is very difficult in most cases and impossible in some. If changes are made to the database structure, all the application programs need to be modified before they can access data. Thus, even though the network database model succeeds in achieving data independence, it still fails to achieve structural independence.
Because of the disadvantages mentioned and the implementation and administration
complexities, the relational database model replaced both the hierarchical and network
database models in the 1980s. The evolution of the relational database model is considered one of the greatest events, a major breakthrough, in the history of database management.
What are Strong and Weak Entity Sets in DBMS
BY DINESH THAKUR

An entity set which does not have sufficient attributes to form a primary key is called a Weak entity set. An entity set that has a primary key is called a Strong entity set.
Consider an entity set Payment which has three attributes: payment_number, payment_date and payment_amount. Although each payment entity is distinct, payments for different loans may share the same payment number. Thus, this entity set does not have a primary key and is a weak entity set. Each weak entity set must be part of a one-to-many relationship set.
A member of a strong entity set is called a dominant entity, and a member of a weak entity set is called a subordinate entity. A weak entity set does not have a primary key, but we need a means of distinguishing among all those entities in the entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made. For example, payment_number acts as the discriminator for the payment entity set. It is also called the partial key of the entity set.
The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence-dependent, plus the weak entity set's discriminator. In the above example, {loan_number, payment_number} acts as the primary key for the payment entity set.
The relationship between a weak entity set and a strong entity set is called an Identifying Relationship. In the example, loan-payment is the identifying relationship for the payment entity. A weak entity set is represented by a doubly outlined box, and the corresponding identifying relationship by a doubly outlined diamond, as shown in the figure. Here the double lines indicate total participation of the weak entity in the identifying relationship; it means that every payment must be related via loan-payment to some loan. The arrow from loan-payment to loan indicates that each payment is for a single loan. The discriminator of a weak entity set is underlined with dashed lines rather than a solid line.

Let us consider another scenario, where we want to store the information of employees and their dependents. Every employee may have zero to n dependents. Every dependent has an id number and a name.
Now let us consider the following data base:
There are three employees having E# as 1, 2, and 3 respectively.
Employee having E# 1, has two dependents as 1, Rahat and 2, Chahat.
Employee having E# 2, has no dependents.
Employee having E# 3, has three dependents as 1, Raju; 2, Ruhi; 3 Raja.
Now, in the case of the Dependent entity, id cannot act as the primary key because it is not unique. Thus, Dependent is a weak entity set having id as its discriminator. It has total participation in the relationship "has" because no dependent can exist without an employee (the company is concerned only with employees). The E-R diagram for the employee-dependent database is shown.
Two tables need to be created for the above E-R diagram. The first is Employee, having E# as a single column which acts as the primary key. The other table is Dependent, having E#, id and name columns, where the primary key is the combination (E#, id).
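A minimal sketch of these two tables in SQLite follows (the column names `eno`, `id` and `name` are assumptions standing in for E#, id and name); it shows how the composite key (E#, id) identifies dependents even though id alone repeats:

```python
# Sketch of the two tables for the employee-dependent example, using
# SQLite. The composite primary key (eno, id) gives each weak-entity row
# a unique identity only in combination with its owning employee.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (eno INTEGER PRIMARY KEY);
CREATE TABLE dependent (
    eno  INTEGER REFERENCES employee(eno),
    id   INTEGER,
    name TEXT,
    PRIMARY KEY (eno, id)   -- the partial key id alone is not unique
);
""")
conn.executemany("INSERT INTO employee VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO dependent VALUES (?, ?, ?)", [
    (1, 1, "Rahat"), (1, 2, "Chahat"),
    (3, 1, "Raju"), (3, 2, "Ruhi"), (3, 3, "Raja"),
])
# id = 1 repeats across employees 1 and 3, but (eno, id) stays unique.
print(conn.execute("SELECT COUNT(*) FROM dependent WHERE id = 1").fetchone())
```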
The tabular comparison between Strong Entity Set and Weak Entity Set is as follows:

Database Normalization
BY DINESH THAKUR

Normalization is the process of removing redundant data from your tables in order to
improve storage efficiency, data integrity and scalability. This improvement is balanced
against an increase in complexity and potential performance losses from the joining of
the normalized tables at query-time. There are two goals of the normalization process:
eliminating redundant data (for example, storing the same data in more than one table)
and ensuring data dependencies make sense (only storing related data in a table). Both
of these are worthy goals as they reduce the amount of space a database consumes and
ensure that data is logically stored.
WHY DO WE NEED NORMALIZATION?
Normalization is the aim of a well-designed Relational Database Management System (RDBMS). It is a step-by-step set of rules by which data is put in its simplest form. We normalize the relational database management system for the following reasons:

Minimize data redundancy, i.e. no unnecessary duplication of data.


To make database structure flexible i.e. it should be possible to add new data values and
rows without reorganizing the database structure.

Data should be consistent throughout the database, i.e. it should not suffer from the following anomalies.

Insert Anomaly - It arises due to lack of data, i.e. not all the data needed for a row is available at insertion time, so null values would have to be placed in keys, which should be avoided. This kind of anomaly can seriously damage a database.

Update Anomaly - It is due to data redundancy, i.e. multiple occurrences of the same values in a column. This can lead to inefficiency.

Deletion Anomaly - It leads to loss of data for rows that are not stored elsewhere. It could result in loss of vital data.

Complex queries required by the user should be easy to handle.

On decomposition of a relation into smaller relations with fewer attributes during normalization, the resulting relations, whenever joined, must result in the same relation without any extra rows. The join operations can be performed in any order. This is known as Lossless Join decomposition.

The resulting relations (tables) obtained on normalization should possess properties such as: each row must be identified by a unique key, no repeating groups, homogeneous columns, each column assigned a unique name, etc.

ADVANTAGES OF NORMALIZATION
The following are the advantages of the normalization.

More efficient data structure.


Avoid redundant fields or columns.

More flexible data structure i.e. we should be able to add new rows and data values easily

Better understanding of data.

Ensures that distinct tables exist when necessary.

Easier to maintain data structure, i.e. it is easy to perform operations and complex queries can be easily handled.

Minimizes data duplication.

Close modeling of real world entities, processes and their relationships.

DISADVANTAGES OF NORMALIZATION
The following are disadvantages of normalization.
You cannot start building the database before you know what the user needs.

On normalizing the relations to higher normal forms, i.e. 4NF and 5NF, the performance degrades.

Normalizing relations of higher degree is a very time-consuming and difficult process.

Careless decomposition may lead to a bad database design, which in turn may lead to serious problems.
How many normal forms are there?
They are

First Normal Form


Second Normal Form

Third Normal Form

Boyce-Codd Normal Form

Fourth Normal Form

Fifth Normal Form

Sixth or Domain-key Normal form

What do we mean when we say a table is not in normalized form?


Let's take an example to understand this.
Say I want to create a database which stores my friends' names and their top three favorite artists.
This database would be quite simple, so initially I'll have only one table in it, say a Friends table. Here FID is the primary key.

This table is not in normal form. Why?

The FavoriteArtist column is not atomic, i.e. it does not hold a scalar value; it has more than one value.
Let's modify this table:

This table is also not in normal form. Why?

We have now changed our table, and now each column has only one value! (So what's left?)
It is because we have multiple columns with the same kind of value, i.e. a repeating group of data, or repeating columns.
So what do we need to do to make it normal, or at least bring it into First Normal Form?
1. We'll first break our single table into two.
2. Each table should have information about only one entity, so it would be nice if we store our friend's information in one table and his favorite artists' information in another.

(For simplicity we are working with a few columns, but in a real-world scenario there could be columns like the friend's phone no., email, address, and the favorite artists' albums, awards received by them, country, etc. In that case having two different tables would make complete sense.)

FID is a foreign key in the FavoriteArtist table which refers to FID in our Friends table.
Now we can say that our table is in first normal form.
Remember, for First Normal Form:
Column values should be atomic, scalar, or should hold a single value.
There should be no repetition of information or values in multiple columns.
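The move to first normal form can be sketched in Python; the friend names and artists below are hypothetical stand-ins for the table in the text:

```python
# Sketch: flattening a non-atomic column into first normal form. The raw
# rows (hypothetical data) hold several artists in one field; 1NF stores
# one artist per row in a separate table keyed by FID.
raw = [
    (1, "Srini", "Akon, The Corrs, Robbie Williams"),
    (2, "Sudheer", "Enigma, Chicane"),
]

friends = [(fid, name) for fid, name, _ in raw]
favorite_artists = [
    (fid, artist.strip())
    for fid, _, artists in raw
    for artist in artists.split(",")
]

print(friends)
print(favorite_artists[:2])  # [(1, 'Akon'), (1, 'The Corrs')]
```

Each FavoriteArtist row now holds exactly one atomic value, and FID links it back to the Friends table.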

So what does Second Normal Form mean?

For second normal form, our database should already be in first normal form and every non-key column must depend on the entire primary key.
Here we can say that our Friends database was already in second normal form.
Why?
Because we don't have a composite primary key in our Friends and FavoriteArtist tables.
Composite primary keys are primary keys made up of more than one column. But there is no such thing in our database.
But still, let's try to understand second normal form with another example.
This is our new table:

In the above table, ITEM+SUPPLIER together form the composite primary key.

Let's check for dependency:

If I know the gadget, can I know the cost?
No, the same gadget is provided by different suppliers at different rates.
If I know the supplier, can I know the cost?
No, because the same supplier can provide different gadgets.
If I know both gadget and supplier, can I know the cost?
Yes, then we can.
So cost is fully (functionally) dependent on our composite primary key (Gadget+Supplier).
Let's start with another non-key column, Supplier Address.
If I know the gadget, will I come to know the supplier address?
Obviously no.
If I know who the supplier is, can I have its address?
Yes.
So here Supplier Address is not completely dependent on our composite primary key (Gadget+Supplier); it is only partially dependent.
This table is surely not in Second Normal Form.

So what do we need to do to bring it into second normal form?
Here again we'll break the table in two.
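The second-normal-form split described above can be sketched in Python; the gadget, supplier and cost values are invented sample data:

```python
# Sketch: removing the partial dependency. SupplierAddress depends on
# Supplier alone, not on the whole (Gadget, Supplier) key, so it moves to
# its own table. All rows are made-up sample data.
orders = [
    # (gadget, supplier, cost, supplier_address)
    ("Headphone", "Abaci", 4000, "New York"),
    ("Mp3 Player", "Abaci", 5000, "New York"),
    ("Headphone", "Sagas", 3000, "Bangalore"),
]

# 2NF: cost stays with the full composite key ...
supply = [(g, s, c) for g, s, c, _ in orders]
# ... while the partially dependent column gets a table keyed by supplier.
suppliers = {(s, addr) for _, s, _, addr in orders}

print(supply[0])          # ('Headphone', 'Abaci', 4000)
print(sorted(suppliers))
```

The address "New York" is now stored once per supplier instead of once per order row.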

We now know how to normalize up to second normal form.

But let's take a break over here and learn some definitions and terms.
Composite Key: A composite key is a primary key composed of multiple columns.
Functional Dependency: When the value of one column is dependent on another column, so that if the value of one column changes, the value of the other column changes as well. E.g. Supplier Address is functionally dependent on supplier name: if a supplier's name is changed in a record, we need to change the supplier address as well.
S.Supplier → S.SupplierAddress
In our table, the supplier address column is functionally dependent on the supplier column.
Partial Functional Dependency: A non-key column is dependent on some, but not all, of the columns in a composite primary key. In our above example, Supplier Address was partially dependent on our composite key columns (Gadget+Supplier).
Transitive Dependency: A transitive dependency is a type of functional dependency in which the value in a non-key column is determined by the value in another non-key column.
With these definitions in mind, let's move to Third Normal Form.
For a table to be in third normal form:

It should already be in Second Normal Form.

There should be no transitive dependency, i.e. we shouldn't have any non-key column depending on any other non-key column.

Again, we need to make sure that the non-key columns depend upon the primary key and not on any other non-key column.

Although the above table looks fine, there is still something in it because of which we will normalize it further.
Album is the primary key of the above table.
Artist and No. of Tracks are functionally dependent on Album (the primary key).
But can we say the same of Country as well?
In the above table, the Country value is getting repeated because of the Artist.
So in our table the Country column is dependent on the Artist column, which is a non-key column.
So we will move that information into another table and save the table from redundancy, i.e. the repeating values of the Country column.
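The third-normal-form split can be sketched in Python; the album rows below are illustrative stand-ins for the table in the text:

```python
# Sketch: removing the transitive dependency Album -> Artist -> Country.
# Country depends on the non-key column Artist, so it moves to an artist
# table. Sample rows are illustrative.
albums = [
    # (album, artist, tracks, country)
    ("Come on over", "Shania Twain", 16, "Canada"),
    ("Up", "Shania Twain", 19, "Canada"),
    ("Parachutes", "Coldplay", 10, "UK"),
]

album_table = [(al, ar, n) for al, ar, n, _ in albums]
artist_table = {(ar, co) for _, ar, _, co in albums}

# Country is now stored once per artist instead of once per album.
print(sorted(artist_table))  # [('Coldplay', 'UK'), ('Shania Twain', 'Canada')]
```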

What are the Components of DBMS?


BY DINESH THAKUR

A typical structure of a DBMS with its components and the relationships between them is shown. The DBMS software is partitioned into several modules. Each module or component is assigned a specific operation to perform. Some of the functions of the DBMS are supported by the operating system (OS), which provides basic services, and the DBMS is built on top of it. The physical data and system catalog are stored on a physical disk. Access to the disk is controlled primarily by the OS, which schedules disk input/output. Therefore, while designing a DBMS, its interface with the OS must be taken into account.

Components of a DBMS
The DBMS accepts the SQL commands generated from a variety of user interfaces,
produces query evaluation plans, executes these plans against the database, and returns
the answers. As shown, the major software modules or components of DBMS are as
follows:

(i) Query processor: The query processor transforms user queries into a series of low
level instructions. It is used to interpret the online user's query and convert it into an
efficient series of operations in a form capable of being sent to the run time data
manager for execution. The query processor uses the data dictionary to find the
structure of the relevant portion of the database and uses this information in modifying
the query and preparing an optimal plan to access the database.
(ii) Run time database manager: The run time database manager is the central software component of the DBMS, which interfaces with user-submitted application programs and queries. It handles database access at run time. It converts operations in users' queries, coming directly via the query processor or indirectly via an application program, from the user's logical view to a physical file system. It accepts queries and examines the external and conceptual schemas to determine what conceptual records are required to satisfy the user's request. It enforces constraints to maintain the consistency and integrity of the data, as well as its security. It also performs backup and recovery operations. The run time database manager is sometimes referred to as the database control system and has the following components:
Authorization control: The authorization control module checks the authorization of users in terms of the various privileges granted to them.
Command processor: The command processor processes the queries passed by
authorization control module.

Integrity checker: It checks the integrity constraints so that only valid data can be entered into the database.
Query optimizer: The query optimizer determines an optimal strategy for query execution.
Transaction manager: The transaction manager ensures that the transaction properties are maintained by the system.

Scheduler: It provides an environment in which multiple users can work on the same piece of data at the same time; in other words, it supports concurrency.
(iii) Data Manager: The data manager is responsible for the actual handling of data in the database. It provides recovery to the system, which means that the system should be able to recover the data after a failure. It includes the Recovery manager and the Buffer manager. The buffer manager is responsible for the transfer of data between main memory and secondary storage (such as disk or tape). It is also referred to as the cache manager.
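As a rough illustration of what a buffer manager does, here is a minimal LRU page cache in Python. The page contents and capacity are invented, and real buffer managers also handle page pinning and dirty-page write-back; this only shows the caching idea:

```python
# Sketch: the buffer manager's role, as a tiny LRU page cache between
# "disk" (a dict here) and main memory.
from collections import OrderedDict

DISK = {1: "page-1 data", 2: "page-2 data", 3: "page-3 data"}

class BufferManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()        # page_no -> contents, LRU order

    def fetch(self, page_no):
        if page_no in self.frames:         # hit: just refresh recency
            self.frames.move_to_end(page_no)
        else:                              # miss: read from disk
            if len(self.frames) >= self.capacity:
                self.frames.popitem(last=False)   # evict least recently used
            self.frames[page_no] = DISK[page_no]
        return self.frames[page_no]

bm = BufferManager(capacity=2)
bm.fetch(1); bm.fetch(2); bm.fetch(3)      # page 1 is evicted
print(list(bm.frames))  # [2, 3]
```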

Execution Process of a DBMS

As shown, conceptually, the following logical steps are followed while executing a user's request to access the database system:
(i) Users issue a query using a particular database language, for example, SQL commands.
(ii) The parsed query is presented to a query optimizer, which uses information about how the data is stored to produce an efficient execution plan for evaluating the query.
(iii) The DBMS accepts the users SQL commands and analyses them.
(iv) The DBMS produces query evaluation plans, that is, the external schema for the user, the corresponding external/conceptual mapping, the conceptual schema, the conceptual/internal mapping, and the storage structure definition. Thus, an evaluation plan is a blueprint for evaluating a query.
(v) The DBMS executes these plans against the physical database and returns the
answers to the user.
Using components such as transaction manager, buffer manager, and recovery manager,
the DBMS supports concurrency and recovery.
What is ER-Model? Advantages and Disadvantages of E-R Model.
BY DINESH THAKUR

There are two techniques used for designing a database from the system requirements. These are:
Top-down approach, known as Entity-Relationship Modeling
Bottom-up approach, known as Normalization.
Here we will focus on the top-down approach to designing a database. It is a graphical technique, used to convert the requirements of the system into a graphical representation, so that they become easily understandable. It also provides a framework for the design of the database.
The Entity-Relationship (ER) model was originally proposed by Peter Chen in 1976 as a way
to unify the network and relational database views. Simply stated, the ER model is a
conceptual data model that views the real world as entities and relationships. A basic
component of the model is the Entity-Relationship diagram, which is used to visually
represent data objects. For the database designer, the utility of the ER model is:
It maps well to the relational model. The constructs used in the ER model can easily be
transformed into relational tables.
It is simple and easy to understand with a minimum of training. Therefore, the model
can be used by the database designer to communicate the design to the end user.
In addition, the model can be used as a design plan by the database developer to
implement a data model in specific database management software.

Advantages and Disadvantages of E-R Data Model


Following are advantages of an E-R Model:
Straightforward relation representation: Having designed an E-R diagram for a
database application, the relational representation of the database model becomes
relatively straightforward.
Easy conversion from E-R to other data models: Conversion from an E-R diagram to a network or hierarchical data model can easily be accomplished.
Graphical representation for better understanding: An E-R model gives a graphical and diagrammatical representation of the various entities, their attributes and the relationships between entities. This in turn helps in the clear understanding of the data structure and in minimizing redundancy and other problems.

Disadvantages of E-R Data Model


Following are disadvantages of an E-R Model:
No industry standard for notation: There is no industry standard notation for
developing an E-R diagram.
Popular for high-level design: The E-R data model is especially popular for high-level design.
What are Data Models? Type of Data Models.
BY DINESH THAKUR

A model is a representation of reality: 'real world' objects and events, and their associations. It is an abstraction that concentrates on the essential, inherent aspects of an organization and ignores the accidental properties. A data model represents the organization itself. It should provide the basic concepts and notations that will allow database designers and end users to communicate their understanding of the organizational data unambiguously and accurately.
Data Model can be defined as an integrated collection of concepts for describing and
manipulating data, relationships between data, and constraints on the data in an
organization.
A data model comprises three components:
A structural part, consisting of a set of rules according to which databases can be
constructed.
A manipulative part, defining the types of operation that are allowed on the data (this
includes the operations that are used for updating or retrieving data from the database
and for changing the structure of the database).
Possibly a set of integrity rules, which ensures that the data is accurate.

The purpose of a data model is to represent data and to make the data understandable.
There have been many data models proposed in the literature. They fall into three broad
categories:

Object Based Data Models


Physical Data Models

Record Based Data Models

The object based and record based data models are used to describe data at the
conceptual and external levels, while the physical data model is used to describe data at
the internal level.

Object Based Data Models


Object based data models use concepts such as entities, attributes, and relationships. An
entity is a distinct object (a person, place, concept, or event) in the organization that is
to be represented in the database. An attribute is a property that describes some aspect
of the object that we wish to record, and a relationship is an association between
entities.
Some of the more common types of object based data model are:
Entity-Relationship
Object Oriented
Semantic
Functional
The Entity-Relationship model has emerged as one of the main techniques for modeling
database design and forms the basis for the database design methodology. The object
oriented data model extends the definition of an entity to include, not only the attributes
that describe the state of the object but also the actions that are associated with the
object, that is, its behavior. The object is said to encapsulate both state and behavior.
Entities in semantic systems represent the equivalent of a record in a relational system
or an object in an OO system, but they do not include behaviour (methods). They are
abstractions used to represent real world (e.g. customer) or conceptual (e.g. bank
account) objects. The functional data model is now almost twenty years old. The original
idea was to view the database as a collection of extensionally defined functions and to
use a functional language for querying the database.

Physical Data Models


Physical data models describe how data is stored in the computer,
representing information such as record structures, record ordering, and access paths.

There are not as many physical data models as logical data models, the most common
one being the Unifying Model.

Record Based Logical Models


Record based logical models are used in describing data at the logical and view levels. In
contrast to object based data models, they are used to specify the overall logical
structure of the database and to provide a higher-level description of the
implementation. Record based models are so named because the database is structured
in fixed format records of several types. Each record type defines a fixed number of
fields, or attributes, and each field is usually of a fixed length.
The three most widely accepted record based data models are:
Hierarchical Model
Network Model
Relational Model
The relational model has gained favor over the other two in recent years. The network
and hierarchical models are still used in a large number of older databases.
Type of Functional Dependence (FD)

A functional dependency is an association between two attributes of the same
relational database table. One of the attributes is called the determinant and the other
attribute is called the determined. For each value of the determinant there is associated
one and only one value of the determined.
If A is the determinant and B is the determined, then we say that A functionally
determines B and represent this graphically as A → B. The notation A → B can also be
read as: B is functionally determined by A.
Example: A → B holds, since for each value of A there is associated one and only one
value of B.
Example: A → B does not hold, since for A = 3 there is associated more than one value
of B.


Functional dependency can also be defined as follows:
An attribute in a relational model is said to be functionally dependent on another
attribute in the table if it can take only one value for a given value of the attribute upon
which it is functionally dependent.
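The definition above can be checked mechanically: a determinant value must never map to two different determined values. The sketch below (attribute and function names are illustrative, not from the source) applies this check over a list of rows:

```python
def holds_fd(rows, determinant, determined):
    """Check whether `determinant` -> `determined` holds in `rows`.

    rows: list of dicts mapping attribute name -> value.
    determinant, determined: tuples of attribute names.
    The FD holds if no two rows agree on the determinant
    but disagree on the determined attributes.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in determined)
        if key in seen and seen[key] != value:
            return False  # same determinant, different determined value
        seen[key] = value
    return True

# A -> B holds here: each value of A maps to exactly one value of B.
good = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}, {"A": 1, "B": "x"}]
# A -> B fails here: A = 3 is associated with more than one value of B.
bad = [{"A": 3, "B": "x"}, {"A": 3, "B": "y"}]
print(holds_fd(good, ("A",), ("B",)))  # True
print(holds_fd(bad, ("A",), ("B",)))   # False
```

The same function also handles composite determinants such as (Sno, Pno), since the determinant is a tuple of attribute names.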
Example: Consider a database having the following tables:

Here, in the Supplier table:

Sno: Supplier number of the supplier (unique)
Sname: Supplier name
City: City of the supplier
Status: Status of the city, e.g. A grade cities may have status 10, B grade cities may
have status 20, and so on.

Here, Sname is FD on Sno, because Sname can take only one value for a given value
of Sno (e.g. S1); in other words, there must be exactly one Sname for supplier number S1.
The FD is represented as:
Sno → Sname
which means that Sname is functionally dependent on Sno.
Similarly, City and Status are also FD on Sno, because for each value of Sno there will be
only one city and status.
These FDs are represented as:

Sno → City
Sno → Status
Sno → (Sname, City, Status)
Consider another database of shipments with the following attributes:

Sno: Supplier number of the supplier
Pno: Part number supplied by the supplier
Qty: Quantity supplied by the supplier for a particular part number

In this case Qty is FD on the combination of Sno and Pno, because each combination of
Sno and Pno corresponds to only one quantity:
(Sno, Pno) → Qty

Dependency Diagrams
A dependency diagram consists of the attribute names and all functional dependencies
in a given table. The dependency diagram of the Supplier table is shown.

Here, the following functional dependencies exist in the Supplier table:

Sno → Sname
Sname → Sno
Sno → City
Sno → Status
Sname → City
Sname → Status
City → Status

The FD diagram of relation P is:

Here, the following functional dependencies exist in the Part table:

Pno → Pname
Pno → Color
Pno → Wt

The FD diagram of relation Shipment is:

Here, the following functional dependency exists in the Shipment table:

(Sno, Pno) → Qty


Fully Functional Dependence (FFD)
Fully Functional Dependence (FFD) is defined as follows: attribute Y is FFD on attribute
X if it is FD on X and not FD on any proper subset of X. For example, in relation Supplier,
different cities may have the same status: cities like Amritsar and Jalandhar may both
have status 10.
So, City is not FD on Status.
But the combination (Sno, Status) can give only one corresponding City, because Sno
is unique. Thus,
(Sno, Status) → City
This means City is FD on the composite attribute (Sno, Status); however, City is not fully
functionally dependent on this composite attribute, as explained below:
(Sno, Status) → City, where X = (Sno, Status) and Y = City
Here Y is FD on X, but X has two proper subsets, Sno and Status, and City is FD on one
proper subset of X, namely Sno:
Sno → City
According to the FFD definition, Y must not be FD on any proper subset of X; but here
City is FD on one subset of X, i.e. Sno, so City is not FFD on (Sno, Status).
Consider another case, the SP table:
Here, Qty is FD on the combination of Sno and Pno:
(Sno, Pno) → Qty, where X = (Sno, Pno) and Y = Qty
Here, X has two proper subsets, Sno and Pno:
Qty is not FD on Sno, because one Sno can supply more than one quantity.
Qty is also not FD on Pno, because one Pno may be supplied many times by different
suppliers with different or the same quantities.
So, Qty is FFD on the composite attribute (Sno, Pno).
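Following the definition above, full functional dependence can be tested by checking the FD on X itself and then on every proper subset of X. A minimal Python sketch (helper and attribute names are illustrative):

```python
from itertools import combinations

def holds_fd(rows, determinant, determined):
    """True if `determinant` -> `determined` holds over `rows` (list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        val = tuple(row[a] for a in determined)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, different determined value
    return True

def is_ffd(rows, determinant, determined):
    """Y is fully functionally dependent on X if X -> Y holds
    but no proper subset of X determines Y."""
    if not holds_fd(rows, determinant, determined):
        return False
    for r in range(1, len(determinant)):
        for subset in combinations(determinant, r):
            if holds_fd(rows, subset, determined):
                return False  # a proper subset already determines Y
    return True

supplier = [
    {"Sno": "S1", "City": "Amritsar", "Status": 10},
    {"Sno": "S2", "City": "Jalandhar", "Status": 10},
]
# (Sno, Status) -> City holds, but Sno -> City alone also holds,
# so City is NOT fully functionally dependent on (Sno, Status).
print(is_ffd(supplier, ("Sno", "Status"), ("City",)))  # False
```

The same test applied to the SP table with determinant (Sno, Pno) and determined Qty would return True, matching the argument above.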
Other Functional Dependencies
There are some other types of functional dependencies, which play a vital role during
the process of normalization of data.
Candidate Functional Dependency

A candidate functional dependency is a functional dependency that includes all
attributes of the table. It should also be noted that a well-formed dependency diagram
must have at least one candidate functional dependency, and that there can be more
than one candidate functional dependency for a given dependency diagram.
Primary Functional Dependency
A primary functional dependency is a candidate functional dependency that is selected
to determine the primary key. The determinant of the primary functional dependency is
the primary key of the relational database table. Each dependency diagram must have
one and only one primary functional dependency. If a relational database table has
only one candidate functional dependency, then it automatically becomes the primary
functional dependency.
Once the primary key has been determined, there will be three possible types of
functional dependencies:
Description:
A → B: a key attribute functionally determines a non-key attribute.
A → B: a non-key attribute functionally determines a non-key attribute.
A → B: a non-key attribute functionally determines a key attribute.
A partial functional dependency is a functional dependency where the determinant
consists of key attributes, but not the entire primary key, and the determined consists of
non-key attributes.
A transitive functional dependency is a functional dependency where the
determinant consists of non-key attributes and the determined also consists of non-key
attributes.
A Boyce-Codd functional dependency is a functional dependency where the
determinant consists of non-key attributes and the determined consists of key
attributes.
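Given these definitions, an FD can be labeled mechanically once a primary key has been chosen. A hypothetical sketch (the classification follows the three cases above; all names are illustrative):

```python
def classify_fd(determinant, determined, primary_key):
    """Label a functional dependency relative to a chosen primary key."""
    det, dep, key = set(determinant), set(determined), set(primary_key)
    if det and det < key and not dep & key:
        return "partial"      # part of the key determines a non-key attribute
    if not det & key and not dep & key:
        return "transitive"   # non-key determines non-key
    if not det & key and dep & key:
        return "Boyce-Codd"   # non-key determines a key attribute
    return "other"

# With primary key (Sno, Pno): Sno -> Sname is a partial dependency.
print(classify_fd({"Sno"}, {"Sname"}, {"Sno", "Pno"}))  # partial
# With primary key Sno: City -> Status is transitive.
print(classify_fd({"City"}, {"Status"}, {"Sno"}))       # transitive
```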
A Multi-Value Dependency (MVD) occurs when two or more independent multi-valued
facts about the same attribute occur within the same table. It means that if in a
relation R having A, B and C as attributes, B and C are multi-valued facts about A,
represented as A →→ B and A →→ C, then a multi-value dependency exists only if B and
C are independent of each other.
A Join Dependency exists if a relation R is equal to the join of its projections X, Y, Z,
where X, Y, Z are projections of R.
Closure of set of dependencies
Let a relation R have some functional dependencies F specified. The closure of
F (usually written as F+) is the set of all functional dependencies that may be logically
derived from F. Often F is the set of the most obvious and important functional
dependencies, and F+, the closure, is the set of all the functional dependencies
including F and those that can be deduced from F. The closure is important and may, for
example, be needed in finding one or more candidate keys of the relation.
For example, the student relation has the following functional dependencies:
sno → Sname
cno → cname
sno → address
cno → instructor
instructor → office
Let these dependencies be denoted by F. The closure of F, denoted by F+,
includes F and all functional dependencies that are implied by F.

To determine F+, we need rules for deriving all functional dependencies that are
implied by F. A set of rules that may be used to infer additional dependencies was
proposed by Armstrong in 1974. These rules (or axioms) are a complete set of rules in
that all possible functional dependencies may be derived from them. The rules are:
1. Reflexivity Rule - If X is a set of attributes and Y is a subset of X, then X → Y holds.
The reflexivity rule is the simplest (almost trivial) rule. It states that each subset of X is
functionally dependent on X. In other words, trivial dependence is defined as follows:
Trivial functional dependency: A trivial functional dependency is a functional
dependency of an attribute on a superset of itself.
For example: {Employee ID, Employee Address} → {Employee Address} is trivial, since
{Employee Address} is a subset of {Employee ID, Employee Address}.
2. Augmentation Rule - If X → Y holds and W is a set of attributes, then WX → WY
holds.
The augmentation rule is also quite simple. It states that if Y is determined by X, then a
set of attributes W and Y together will be determined by W and X together. Note that we
use the notation WX to mean the collection of all attributes in W and X, and write WX
rather than the more conventional (W, X) for convenience.
For example: Rno → Name, and {Class, Marks} is a set of attributes acting as W.
Then {Rno, Class, Marks} → {Name, Class, Marks}.

3. Transitivity Rule - If X → Y and Y → Z hold, then X → Z holds.

The transitivity rule is perhaps the most important one. It states that if X functionally
determines Y and Y functionally determines Z, then X functionally determines Z.
For example: Rno → City and City → Status, then Rno → Status should also hold
true.
These rules are called Armstrong's Axioms.
Further axioms may be derived from the above although the above three axioms
are sound and complete in that they do not generate any incorrect functional
dependencies (soundness) and they do generate all possible functional dependencies
that can be inferred from F (completeness). The most important additional axioms are:
1. Union Rule - If X → Y and X → Z hold, then X → YZ holds.
2. Decomposition Rule - If X → YZ holds, then so do X → Y and X → Z.
3. Pseudotransitivity Rule - If X → Y and WY → Z hold, then so does WX → Z.
Based on the above axioms and the functional dependencies specified for
relation student, we may write a large number of functional dependencies. Some of
these are:
(sno, cno) → sno (Rule 1)
(sno, cno) → cno (Rule 1)
(sno, cno) → (Sname, cname) (Rule 2)
cno → office (Rule 3)
sno → (Sname, address) (Union Rule)
etc.
Often a very large list of dependencies can be derived from a given set F, since Rule 1
by itself will lead to a large number of dependencies. Since we have seven
attributes (sno, Sname, address, cno, cname, instructor, office), there are 128 (that is,
2^7) subsets of these attributes. These 128 subsets could form 128 values of X in
functional dependencies of the type X → Y. Of course, each value of X will then be
associated with a number of values for Y (Y being a subset of X), leading to several
thousand dependencies. These large numbers of dependencies are not particularly
helpful in achieving our aim of normalizing relations.
Although we could follow the present procedure and compute the closure of F to find all
the functional dependencies, the computation requires exponential time and the list of
dependencies is often very large and therefore not very useful. There are two possible
approaches that can be taken to avoid dealing with the large number of dependencies in
the closure. One is to deal with one attribute or a set of attributes at a time and find its
closure (i.e. all functional dependencies relating to them). The aim of this exercise is to
find what attributes depend on a given set of attributes and therefore ought to be
together. The other approach is to find the minimal covers.
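The first approach, computing the closure of a set of attributes rather than of F itself, takes only polynomial time: repeatedly fire every FD whose left-hand side is already in the closure. A minimal sketch using the student relation's FDs from above (function and variable names are illustrative):

```python
def attribute_closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    Returns the set of all attributes functionally determined by `attrs`.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is in the closure, pull in the right side.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure

# The student relation's functional dependencies F:
fds = [
    ({"sno"}, {"Sname"}),
    ({"cno"}, {"cname"}),
    ({"sno"}, {"address"}),
    ({"cno"}, {"instructor"}),
    ({"instructor"}, {"office"}),
]
# (sno, cno) determines all seven attributes, so it is a candidate key.
print(sorted(attribute_closure({"sno", "cno"}, fds)))
```

Because the closure of {sno, cno} contains every attribute of the relation, (sno, cno) is a candidate key, which is exactly the use of closures mentioned above.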
Minimal Functional Dependencies or Irreducible Set of Dependencies
In discussing the concept of equivalent FDs, it is useful to define the concept of minimal
functional dependencies or minimal cover, which is useful in eliminating unnecessary
functional dependencies so that only the minimal number of dependencies need to be
enforced by the system. The minimal cover of F is sometimes called the
irreducible set of F.
A functional dependency set S is irreducible if it has the following three properties:
The right-hand side of each functional dependency of S contains only one attribute.
The left-hand side of each functional dependency of S is irreducible. This means that
removing any one attribute from a left-hand side would change the content of S (S would
lose some information).
Removing any functional dependency would change the content of S.
Sets of functional dependencies with these properties are also
called canonical or minimal.
E-R NOTATION

There is no standard for representing data objects in ER diagrams. Each modeling
methodology uses its own notation.
All notational styles represent entities as rectangular boxes and relationships as lines
connecting boxes. Each style uses a special set of symbols to represent the cardinality of
connection. The symbols used for the basic ER constructs are:
Entities are represented by labeled rectangles. The label is the name of the entity.
Entity names should be singular nouns.
Attributes are represented by Ellipses.
A solid line connecting two entities represents a relationship. The name of the
relationship is written above the line. Relationship names should be verbs, and a
diamond is used to represent a relationship set.
Attributes, when included, are listed inside the entity rectangle. Attributes which are
identifiers are underlined. Attribute names should be singular nouns.
Multi-valued attributes are represented by double ellipses.
A directed line is used to indicate one occurrence and an undirected line is used to
indicate many occurrences in a relationship.
The symbols used to design an ER diagram are shown.

The ER diagram showing the usage of different symbols

What are the Problems with E-R Model?



The E-R model can result in problems due to limitations in the way the entities are
related in relational databases. These problems are called connection traps, and they
often occur due to a misinterpretation of the meaning of certain relationships.
The two main types of connection traps are called fan traps and chasm traps.

Fan Trap. It occurs when a model represents a relationship between entity
types, but the pathway between certain entity occurrences is ambiguous.
Chasm Trap. It occurs when a model suggests the existence of a relationship
between entity types, but the pathway does not exist between certain entity occurrences.

Now, we will discuss each trap in detail.

Fan Trap
A fan trap occurs when one-to-many relationships fan out from a single entity.
For example: Consider a database of Department, Site and Staff, where one site can
contain a number of departments, but a department is situated at only a single site.
There are multiple staff members working at a single site, and a staff member can work
from only a single site. This case is represented in the E-R diagram shown.

The problem with the above E-R diagram is that the question of which staff work in a
particular department remains unanswered. The solution is to restructure the original
E-R model to represent the correct association as shown.

In other words, the two entities should have a direct relationship between them to
provide the necessary information.
There is another way to solve the problem in the E-R diagram of the figure: by
introducing a direct relationship between DEPT and STAFF, as shown in the figure.

Another example: Let us consider another case, where one branch contains multiple
staff members and cars, as represented.

The problem with the above E-R diagram is that it is unable to tell which member of
staff uses a particular car; for example, it is not possible to tell which member of staff
uses car SH34.

The solution is to show the relationship between STAFF and CAR as shown.

With this relationship the fan trap is resolved, and now it is possible to tell that car
SH34 is used by staff member S1500, as shown in the figure. In other words, it is now
possible to tell which car is used by which member of staff.

Chasm Trap
As discussed earlier, a chasm trap occurs when a model suggests the existence of a
relationship between entity types, but the pathway does not exist between certain entity
occurrences.
It occurs where there is a relationship with partial participation, which forms part of the
pathway between entities that are related.
For example: Let us consider a database where a single branch is allocated many staff
who handle the management of properties for rent. Not all staff members handle
property, and not all property is managed by a member of staff. This case is
represented in the E-R diagram.

Now, the above E-R diagram is not able to represent what properties are available at a
branch. The partial participation of Staff and Property in the SP relationship means that
some properties cannot be associated with a branch office through a member of staff.
We need to add the missing relationship, called BP, between the Branch and the
Property entities as shown.

Another example: Consider another case, where a branch has multiple cars but a car
can be associated with only a single branch. Each car is handled by a single staff
member, and a staff member can use only a single car. Some staff members have no car
available for their use. This case is represented in the E-R diagram with appropriate
connectivity and cardinality.

The problem with the above E-R diagram is that it is not possible to tell which branch
staff member S0003 works at, as shown.

It means the above E-R diagram is not able to represent the relationship between
BRANCH and STAFF due to the partial participation of the CAR and STAFF entities. We
need to add the missing relationship, called BS, between the BRANCH and STAFF
entities as shown.

With this relationship the chasm trap is resolved, and now it is possible to represent
which branch each member of staff works at, as for our example staff member S0003,
as shown.

Type of Database System



The DBMS can be classified according to the number of users and the database
site locations. These are:
On the basis of the number of users:
Single-user DBMS
Multi-user DBMS
On the basis of the site location:
Centralized DBMS
Parallel DBMS
Distributed DBMS
Client/server DBMS
We will discuss some of the important types of DBMS systems that are presently
being used.

The database system may be multi-user or single-user. The configuration of the
hardware and the size of the organization will determine whether it is a multi-user
system or a single-user system.
In a single-user system the database resides on one computer and is only accessed by
one user at a time. This one user may design, maintain, and write database programs.
Due to the large amount of data to be managed, most systems are multi-user. In this situation
the data are both integrated and shared. A database is integrated when the
same information is not recorded in two places. For example, both the Library
department and the Account department of the college database may need student
addresses. Even though both departments may access different portions of the database,
the students' addresses should only reside in one place. It is the job of the DBA to make
sure that the DBMS makes the correct addresses available from one central storage area.

Centralized Database System


The centralized database system consists of a single processor together with its
associated data storage devices and other peripherals. It is physically confined to a
single location. Data can be accessed from multiple sites through a computer
network, while the database is maintained at the central site.

Disadvantages of Centralized Database System


When the central site computer or database system goes down, every user is blocked
from using the system until the system comes back up.
Communication costs from the terminals to the central site can be expensive.

Parallel Database System


A parallel database system architecture consists of multiple Central Processing
Units (CPUs) and data storage disks operating in parallel. Hence, they improve
processing and Input/Output (I/O) speeds. Parallel database systems are used in
applications that have to query extremely large databases or that have to process an
extremely large number of transactions per second.
Advantages of a Parallel Database System
Parallel database systems are very useful for applications that have to query
extremely large databases (of the order of terabytes, that is, 10^12 bytes) or that
have to process an extremely large number of transactions per second (of the order of
thousands of transactions per second).
In a parallel database system, the throughput (that is, the number of tasks that can be
completed in a given time interval) and the response time (that is, the amount of time it
takes to complete a single task from the time it is submitted) are very high.
Disadvantages of a Parallel Database System
In a parallel database system, there is a startup cost associated with initiating a single
process, and the startup time may overshadow the actual processing time, affecting
speedup adversely.
Since processes executing in a parallel system often access shared resources, a slowdown
may result from the interference of each new process as it competes with existing
processes for commonly held resources, such as shared data storage disks, the system
bus, and so on.

Distributed Database System


A logically interrelated collection of shared data physically distributed over a computer
network is called a distributed database, and the software system that permits the
management of the distributed database and makes the distribution transparent to
users is called a Distributed DBMS.
It consists of a single logical database that is split into a number of fragments. Each
fragment is stored on one or more computers under the control of a separate DBMS,
with the computers connected by a communications network. As shown, in a distributed
database system data is spread across a variety of different databases, managed by a
variety of different DBMS software running on a variety of different operating systems.
These machines are spread (or distributed) geographically and connected by a variety
of communication networks.

Advantages of Distributed Database System


Distributed database architecture provides greater efficiency and better performance.
A single database (on server) can be shared across several distinct client (application)
systems.
As data volumes and transaction rates increase, users can grow the system
incrementally.
It causes less impact on ongoing operations when adding new locations.
Distributed database system provides local autonomy.
Disadvantages of Distributed Database System
Recovery from failure is more complex in distributed database systems than in
centralized systems.

Client-Server DBMS
The client/server architecture of a database system has two logical components, namely
the client and the server. Clients are generally personal computers or workstations,
whereas servers are large workstations, mini-range computer systems or mainframe
computer systems. The applications and tools of the DBMS run on one or more client
platforms, while the DBMS software resides on the server. The server computer is called
the back end and the client's computer is called the front end. These server and client
computers are connected by a network. The applications and tools act as clients of the
DBMS, making requests for its services. The DBMS, in turn, processes these requests
and returns the results to the client(s). The client handles the Graphical User Interface
(GUI) and does computations and other programming of interest to the end user. The
server handles the parts of the job that are common to many clients, for example,
database access and updates.
Multi-Tier client server computing models
In a single-tier system the database is centralized, which means the DBMS software and
the data reside in one location, and dumb terminals are used to access the DBMS, as
shown.

With the rise of personal computers in businesses during the 1980s and the increased
reliability of networking hardware, two-tier and three-tier systems became common. In
a two-tier system, different software is required for the server and for the client. At the
early stages of client/server computing, this model was called the two-tier computing
model, in which the client was considered the data capture and validation tier and the
server was considered the data storage tier. This scenario is depicted in the figure.
Problems of two-tier architecture
The need for enterprise scalability challenged this traditional two-tier client-server
model. In the mid-1990s, as applications became more complex and could be deployed
to hundreds or thousands of end users, the client side ran into the following problems:

A 'fat' client requires considerable resources on the client's computer to run effectively,
including disk space, RAM and CPU.
Client machines require administration, which results in overhead.
Three-tier architecture
By 1995, three-tier architecture appeared as an improvement over two-tier architecture.
It has three layers, which are:
First Layer: User Interface, which runs on the end-user's computer (the client).
Second Layer: Application Server. This is the business logic and data processing layer.
This middle tier runs on a server called the application server.
Third Layer: Database Server. This is the DBMS, which stores the data required by the
middle tier. This tier may run on a separate server called the database server.
As described earlier, the client is now responsible only for the application's user
interface; thus it requires fewer computational resources. Such clients are called 'thin
clients', and they require less maintenance.
Advantages of Client/Server Database System
Client/server systems use less expensive platforms to support applications that had
previously been running only on large and expensive minicomputers or mainframes.
Clients offer an icon-based, menu-driven interface, which is superior to the traditional
command-line, dumb-terminal interface typical of mini and mainframe computer
systems.
The client/server environment facilitates more productive work by the users and
better use of existing data.

Client/server database systems are more flexible than centralized systems.

Response time and throughput are high.
The server (database) machine can be custom-built (tailored) to the DBMS function
and thus can provide better DBMS performance.
The client (application database) might be a personal workstation, tailored to the
needs of the end users and thus able to provide better interfaces, high availability,
faster responses and overall improved ease of use to the user.
A single database (on the server) can be shared across several distinct client
(application) systems.
Disadvantages of Client/Server Database System
Programming cost is high in client/server environments, particularly in initial phases.
There is a lack of management tools for diagnosis, performance monitoring, tuning
and security control for the DBMS, client, operating systems and networking
environments.

What is Metadata OR Data Dictionary?



Metadata (also called the data dictionary) is data about data. It is the self-describing
nature of the database that provides program-data independence. It is also called the
system catalog. For each data element in the database, it normally holds the following
information:

+ Name
+ Type
+ Range of values
+ Source
+ Access authorization
+ Which application programs use the data, so that when a change in a data
structure is contemplated, a list of the affected programs can be generated.
Data dictionary is used to actually control the database operation, data integrity and
accuracy. Metadata is used by developers to develop the programs, queries, controls and
procedures to manage and manipulate the data. Metadata is available to database
administrators (DBAs), designers and authorized user as on-line system documentation.
This improves the control of database administrators (DBAs) over the information
system and the user's understanding and use of the system.
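As a concrete illustration of how a system catalog can be queried, most relational systems expose their data dictionary through ordinary queries. The sketch below uses Python's built-in sqlite3 module and a toy supplier table (the table and column names are assumed for illustration):

```python
import sqlite3

# In-memory database with one table; SQLite maintains its system
# catalog (an active data dictionary) automatically.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE supplier (
    sno    TEXT PRIMARY KEY,
    sname  TEXT NOT NULL,
    city   TEXT,
    status INTEGER
)""")

# The catalog records each object's name and its definition.
for name, sql in conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table'"):
    print(name, "->", sql.splitlines()[0])

# And each column's name, type, and constraints.
for cid, name, ctype, notnull, default, pk in conn.execute(
        "PRAGMA table_info(supplier)"):
    print(name, ctype, "PK" if pk else "")
```

This is the same self-describing information, names, types, and constraints, that the text lists as the contents of a data dictionary, maintained automatically by the DBMS.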

Active and Passive Data Dictionaries


A data dictionary may be either active or passive. An active data dictionary (also called
an integrated data dictionary) is managed automatically by the database management
software and is kept consistent with the current structure and definition of the database.
Most relational database management systems contain active data dictionaries that can
be derived from their system catalog.
A passive data dictionary (also called a non-integrated data dictionary) is one used
only for documentation purposes. Data about fields, files, people and so on in the data
processing environment are entered into the dictionary and cross-referenced. A passive
dictionary is simply a self-contained application. It is managed by the users of the
system and is modified whenever the structure of the database is changed. Since this
modification must be performed manually by the user, it is possible that the data
dictionary will not be current with the actual structure of the database. However,
passive data dictionaries may be maintained as a separate database, which allows
developers to remain independent of any particular relational database management
system. A passive dictionary may also be extended to contain information about
organizational data that is not computerized.

Importance of Data Dictionary


A data dictionary is essential in a DBMS for the following reasons:
+ It provides the name of a data element, its description and the data structure in
which it may be found.
+ It provides great assistance in producing a report of where a data element is used,
covering all programs that mention it.
+ It makes it possible to search for a data name given keywords that describe the name.
For example, one might want to determine the name of a variable that stands for net
pay. Entering keywords would produce a list of possible identifiers and their
definitions, from which one can locate the proper identifier to use in a program.
These days, commercial data dictionary packages are available to facilitate the entry,
editing and use of data elements.
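Most relational DBMSs let you query the system catalog like any other table. A minimal sketch, using Python's standard sqlite3 module, where the sqlite_master table and the table_info pragma together play the role of a small data dictionary (the table and column names here are invented for illustration):

```python
import sqlite3

# In-memory database; sqlite3 ships with the Python standard library.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# sqlite_master is SQLite's minimal system catalog: it records the name,
# type and defining SQL of every object in the database.
for row in conn.execute("SELECT type, name, sql FROM sqlite_master"):
    print(row)

# PRAGMA table_info acts as a per-table data dictionary entry: column name,
# declared type, NOT NULL flag, default value and primary-key flag.
for col in conn.execute("PRAGMA table_info(employee)"):
    print(col)
```

A full-scale data dictionary would also record access authorizations and the programs that use each element, which SQLite's catalog does not track.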
What is DBA?
BY DINESH THAKUR

A Database Administrator, Database Analyst or Database Developer is the


person responsible for managing the information within an organization. As most
companies continue to experience inevitable growth of their databases, these positions
are probably the most solid within the IT industry.
The DBA has many different responsibilities, but the overall goal of the DBA is to keep
the server up at all times and to provide users with access to the required
information when they need it. The DBA makes sure that the database is protected
and that any chance of data loss is minimized.
A DBA can be a programmer who, by default or by volunteering, took over the
responsibility of maintaining a SQL Server during project development and enjoyed
the job so much that he switched.
A DBA can be a system administrator who was given the added responsibility of
maintaining a SQL Server. DBAs can even come from unrelated fields, such as
accounting or the help desk, and switch to Information Systems to become DBAs.
DBA Responsibilities
The following sections examine the responsibilities of the database administrator and
how they translate to various Microsoft SQL Server tasks.
Installing and Upgrading an SQL Server
The DBA is responsible for installing SQL Server or upgrading an existing SQL Server.
In the case of upgrading SQL Server, the DBA is responsible for ensuring that if the
upgrade is not successful, the SQL Server can be rolled back to an earlier release until
the upgrade issues can be resolved.

The DBA is also responsible for applying SQL Server service packs. A service pack is not
a true upgrade, but an installation of the current version of software with various bug
fixes and patches that have been resolved since the product's release.
Monitoring the Database Server's Health and Tuning Accordingly
Monitoring the health of the database server means making sure that the following is
done:

+ The server is running with optimal performance.
+ The error log or event log is monitored for database errors.
+ Databases have routine maintenance performed on them, and the overall system has
periodic maintenance performed by the system administrator.

Using Storage Properly


SQL Server 2000 enables you to automatically grow the size of your databases and
transaction logs, or you can choose to select a fixed size for the database and transaction
log. Either way, maintaining the proper use of storage means monitoring space
requirements and adding new storage space (disk drives) when required.
Performing Backup and Recovery Duties
Backup and recovery are the DBA's most critical tasks; they include the following
aspects:

+ Establishing standards and schedules for database backups
+ Developing recovery procedures for each database
+ Making sure that the backup schedules meet the recovery requirements

Managing Database Users and Security


With SQL Server 2000, the DBA works tightly with the Windows NT administrator to
add user NT logins to the database. In non-NT domains, the DBA adds user logins. The
DBA is also responsible for assigning users to databases and determining the proper
security level for each user. Within each database, the DBA is responsible for assigning
permissions to the various database objects such as tables, views, and stored
procedures.

Working with Developers


It is important for the DBA to work closely with development teams to assist in overall
database design, such as creating normalized databases, helping developers tune
queries, assigning proper indexes, and aiding developers in the creation of triggers and
stored procedures.
In the SQL Server 2000 environment, a good DBA will show the developers how to use
and take advantage of the SQL Server Index Tuning Wizard and the SQL Server Profiler.
Establishing and Enforcing Standards
The DBA should establish naming conventions and standards for the SQL Server and
databases and make sure that everyone sticks to them.
Transferring Data
The DBA is responsible for importing and exporting data to and from the SQL Server. In
the current trend to downsize and combine client/server systems with mainframe
systems and Web technologies to create Enterprise systems, importing data from the
mainframe to SQL Server is a common occurrence that is about to become more
common with the SQL Server 2000 Data Transformation Services. Good DTS DBAs will
be in hot demand as companies struggle to move and translate legacy systems to
Enterprise systems.
Replicating Data
SQL Server 2000 has many different replication capabilities, such as merge
replication (2-way disconnected replication) and queued replication. Managing and
setting up replication topologies is a big undertaking for a DBA because of the
complexities involved with properly setting up and maintaining replication.
Data Warehousing
SQL Server 2000 has substantial data warehousing capabilities that require the DBA to
learn an additional product (the Microsoft OLAP Server) and architecture. Data

warehousing provides new and interesting challenges to the DBA and in some
companies a new career as a warehouse specialist.
Scheduling Events
The database administrator is responsible for setting up and scheduling various events
using Windows NT and SQL Server to aid in performing many tasks such as backups
and replication.
Providing 24-Hour Access
The database server must stay up, and the databases must always be protected and
online. Be prepared to perform some maintenance and upgrades after hours. Also be
prepared to carry that dreaded beeper. If the database server should go down, be ready
to get the server up and running. After all, that's your job.
Learning Constantly
To be a good DBA, you must continue to study and practice your mission-critical
procedures, such as testing your backups by recovering to a test database. In this
business, technology changes very fast, so you must continue learning about SQL Server,
available client/servers, and database design tools. It is a never-ending process.
The DBA should possess the following skills
(1) A good knowledge of the operating system(s)
(2) A good knowledge of physical database design
(3) Ability to perform both Oracle and also operating system performance monitoring
and the necessary adjustments.
(4) Be able to provide a strategic database direction for the organization.
(5) Excellent knowledge of Oracle backup and recovery scenarios.
(6) Good skills in all Oracle tools.
(7) A good knowledge of Oracle security management.
(8) A good knowledge of how Oracle acquires and manages resources.
(9) Sound knowledge of the applications at your site.
(10) Experience and knowledge in migrating code, database changes, data and menus
through the various stages of the development life cycle.


(11) A good knowledge of the way Oracle enforces data integrity.
(12) A sound knowledge of both database and program code performance tuning.
(13) A DBA should possess a sound understanding of the business.
(14) A DBA should have sound communication skills with management, development
teams, vendors, systems administrators and other related service providers.

What is a Database Schema



A database instance controls zero or more databases. A database contains zero or more
database application schemas. A database application schema is the set of database
objects that apply to a specific application. These objects are relational in nature and
are related to each other within a database to serve a specific functionality, for
example payroll, purchasing, calibration or triggers. A database application schema is
not a database; usually several schemas coexist in a database. A database
application is the code base used to manipulate and retrieve the data stored in the
database application schema.
The database schema changes very infrequently, whereas the database state changes every
time the database is updated. The schema is also called the intension, whereas the state
is called the extension.
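The intension/extension distinction can be made concrete with a small sketch in Python's standard sqlite3 module: the CREATE statement is the schema (intension), while the stored rows are the state (extension). The payroll table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema (intension) changes rarely: it is the CREATE statements.
conn.execute("CREATE TABLE payroll (emp_id INTEGER, month TEXT, net_pay REAL)")

# The state (extension) changes with every update: it is the stored rows.
conn.execute("INSERT INTO payroll VALUES (1, '2024-01', 2500.0)")
conn.execute("INSERT INTO payroll VALUES (2, '2024-01', 2700.0)")

schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'payroll'").fetchone()[0]
state = conn.execute("SELECT * FROM payroll").fetchall()
print(schema)   # one fixed definition
print(state)    # grows with every INSERT
```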
What are the Components of DBMS?

A typical structure of a DBMS, with its components and the relationships between them, is
shown. The DBMS software is partitioned into several modules. Each module or
component is assigned a specific operation to perform. Some of the functions of the
DBMS are supported by the operating system (OS), which provides basic services; the DBMS
is built on top of it. The physical data and system catalog are stored on a physical disk.
Access to the disk is controlled primarily by the OS, which schedules disk input/output.
Therefore, while designing a DBMS, its interface with the OS must be taken into account.

Components of a DBMS
The DBMS accepts the SQL commands generated from a variety of user interfaces,
produces query evaluation plans, executes these plans against the database, and returns
the answers. As shown, the major software modules or components of DBMS are as
follows:

(i) Query processor: The query processor transforms user queries into a series of low-
level instructions. It is used to interpret the online user's query and convert it into an
efficient series of operations in a form capable of being sent to the run time data
manager for execution. The query processor uses the data dictionary to find the
structure of the relevant portion of the database and uses this information in modifying
the query and preparing an optimal plan to access the database.
(ii) Run time database manager: The run time database manager is the central
software component of the DBMS, which interfaces with user-submitted application
programs and queries. It handles database access at run time. It converts operations in
users' queries, coming directly via the query processor or indirectly via an application
program, from the user's logical view to a physical file system. It accepts queries and
examines the external and conceptual schemas to determine what conceptual records
are required to satisfy the user's request. It enforces constraints to maintain the
consistency and integrity of the data, as well as its security. It also performs backup and
recovery operations. The run time database manager is sometimes referred to as the
database control system and has the following components:
Authorization control: The authorization control module checks the authorization
of users in terms of various privileges to users.
Command processor: The command processor processes the queries passed by
authorization control module.

Integrity checker: It checks the integrity constraints so that only valid data can be
entered into the database.
Query optimizer: The query optimizer determines an optimal strategy for the query
execution.
Transaction manager: The transaction manager ensures that the transaction
properties are maintained by the system.
Scheduler: It provides an environment in which multiple users can work on the same
piece of data at the same time; in other words, it supports concurrency.
(iii) Data manager: The data manager is responsible for the actual handling of data in
the database. It provides recovery to the system, meaning that the system is able to
recover the data after a failure. It includes the recovery manager and the buffer manager.
The buffer manager is responsible for the transfer of data between the
main memory and secondary storage (such as disk or tape). It is also referred to as the
cache manager.

Execution Process of a DBMS

As shown, conceptually the following logical steps are followed while executing a user's
request to access the database system:
(i) Users issue a query using a particular database language, for example, SQL commands.
(ii) The parsed query is presented to a query optimizer, which uses information about
how the data is stored to produce an efficient execution plan for evaluating the
query.
(iii) The DBMS accepts the user's SQL commands and analyses them.
(iv) The DBMS produces query evaluation plans, that is, the external schema for the
user, the corresponding external/conceptual mapping, the conceptual schema, the
conceptual/internal mapping, and the storage structure definition. Thus, an evaluation
plan is a blueprint for evaluating a query.
(v) The DBMS executes these plans against the physical database and returns the
answers to the user.
Using components such as the transaction manager, buffer manager, and recovery manager,
the DBMS supports concurrency and recovery.
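The optimizer's evaluation plan in step (ii) can be inspected directly in many systems. A minimal sketch using Python's standard sqlite3 module, whose EXPLAIN QUERY PLAN command asks the optimizer for its plan without executing the query (the orders table is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE INDEX idx_customer ON orders(customer)")

# Ask the optimizer for its evaluation plan without running the query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", ("Acme",)
).fetchall()
for step in plan:
    print(step)   # the plan names the index the optimizer chose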
What is Data Independence of DBMS?

A major objective for three-level architecture is to provide data independence, which


means that upper levels are unaffected by changes in lower levels.
There are two kinds of data independence:
Logical data independence
Physical data independence
Logical Data Independence
Logical data independence indicates that the conceptual schema can be changed without
affecting the existing external schemas. The change would be absorbed by the mapping
between the external and conceptual levels. Logical data independence also insulates
application programs from operations such as combining two records into one or
splitting an existing record into two or more records. This would require a change in the
external/conceptual mapping so as to leave the external view unchanged.
Physical Data Independence
Physical data independence indicates that the physical storage structures or devices
could be changed without affecting the conceptual schema. The change would be absorbed
by the mapping between the conceptual and internal levels. Physical data independence
is achieved by the presence of the internal level of the database and the mapping or
transformation from the conceptual level of the database to the internal level. The
conceptual level to internal level mapping therefore provides a means to go from the
conceptual view (conceptual records) to the internal view and hence to the stored data
in the database (physical records).
If there is a need to change the file organization or the type of physical device used as a
result of growth in the database or new technology, a change is required in the
conceptual/ internal mapping between the conceptual and internal levels. This change is
necessary to maintain the conceptual level invariant. The physical data independence
criterion requires that the conceptual level does not specify storage structures or the
access methods (indexing, hashing etc.) used to retrieve the data from the physical
storage medium. Making the conceptual schema physically data independent means
that the external schema, which is defined on the conceptual schema, is in turn
physically data independent.
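A small sketch of physical data independence, using Python's standard sqlite3 module: adding an index changes the access method at the internal level, but the query text at the external level and its answer are unchanged (the parts table is invented for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no INTEGER, city TEXT)")
conn.executemany("INSERT INTO parts VALUES (?, ?)",
                 [(1, "London"), (2, "Paris"), (3, "London")])

query = "SELECT part_no FROM parts WHERE city = 'London'"
before = conn.execute(query).fetchall()

# Change the physical level: add an access path (an index).
conn.execute("CREATE INDEX idx_city ON parts(city)")

# The conceptual and external levels are untouched: the same query text
# still runs and returns the same answer; only the access method changed.
after = conn.execute(query).fetchall()
print(sorted(before) == sorted(after))   # True
```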
Logical data independence is more difficult to achieve than physical data independence,
as it requires flexibility in the design of the database and the programmer has to foresee
future requirements or modifications in the design.
What are the Difference Between DDL, DML and DCL Commands?

SQL statements are divided into major categories, including data definition language
(DDL), data manipulation language (DML), data control language (DCL) and transaction
control language (TCL).
Data Definition Language (DDL) statements are used to define the database
structure or schema. Some examples:
* CREATE - create objects in the database
* ALTER - alter the structure of the database
* DROP - delete objects from the database
* TRUNCATE - remove all records from a table, including all spaces allocated for the
records
* COMMENT - add comments to the data dictionary
* RENAME - rename an object
Data Manipulation Language (DML) statements are used for managing data
within schema objects. Some examples:
* SELECT - retrieve data from a database
* INSERT - insert data into a table
* UPDATE - update existing data within a table
* DELETE - delete all records from a table; the space for the records remains
* MERGE - UPSERT operation (insert or update)
* CALL - call a PL/SQL or Java subprogram
* EXPLAIN PLAN - explain the access path to data
* LOCK TABLE - control concurrency

Data Control Language (DCL) statements. Some examples:

* GRANT - gives users access privileges to the database
* REVOKE - withdraw access privileges given with the GRANT command

Transaction Control (TCL) statements are used to manage the changes made by
DML statements. It allows statements to be grouped together into logical transactions.
* COMMIT - save work done
* SAVEPOINT - identify a point in a transaction to which you can later roll back
* ROLLBACK - restore the database to its original state since the last COMMIT
* SET TRANSACTION - Change transaction options like isolation level and what
rollback segment to use
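The categories above can be exercised together. A minimal sketch using Python's standard sqlite3 module, which supports COMMIT, SAVEPOINT and ROLLBACK, though not Oracle-specific statements such as MERGE or TRUNCATE; the account table is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None          # manage transactions explicitly
cur = conn.cursor()

# DDL: define the schema.
cur.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")

# DML inside an explicit transaction, controlled by TCL statements.
cur.execute("BEGIN")
cur.execute("INSERT INTO account VALUES (1, 100.0)")
cur.execute("SAVEPOINT before_update")
cur.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
cur.execute("ROLLBACK TO SAVEPOINT before_update")  # undo only the bad update
cur.execute("COMMIT")                               # keep the insert

print(cur.execute("SELECT balance FROM account WHERE id = 1").fetchone())
# -> (100.0,)
```

ROLLBACK TO SAVEPOINT undoes work back to the named point but leaves the transaction open, so the COMMIT still persists the earlier INSERT.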
Type of Database System

The DBMS can be classified according to the number of users and the database
site locations. These are:
On the basis of the number of users:
Single-user DBMS
Multi-user DBMS
On the basis of the site location
Centralized DBMS
Parallel DBMS
Distributed DBMS
Client/server DBMS
We will discuss some of the important types of DBMS systems which are presently
being used.
The database system may be multi-user or single-user. The configuration of the
hardware and the size of the organization will determine whether it is a multi-user
system or a single user system.

In a single-user system the database resides on one computer and is only accessed by one
user at a time. This one user may design, maintain, and write database programs.
Because of the large amount of data to manage, most systems are multi-user. In this
situation the data are both integrated and shared. A database is integrated when the
same information is not recorded in two places. For example, both the Library
department and the Accounts department of the college database may need student
addresses. Even though both departments may access different portions of the database,
the students' addresses should reside in only one place. It is the job of the DBA to make
sure that the DBMS makes the correct addresses available from one central storage area.

Centralized Database System


The centralized database system consists of a single processor together with its
associated data storage devices and other peripherals. It is physically confined to a
single location. Data can be accessed from multiple sites through a computer
network while the database is maintained at the central site.

Disadvantages of Centralized Database System

When the central site computer or database system goes down, every user is blocked
from using the system until the system comes back up.
Communication costs from the terminals to the central site can be expensive.

Parallel Database System


Parallel database system architecture consists of multiple Central Processing
Units (CPUs) and data storage disks operating in parallel. Hence, they improve processing
and Input/Output (I/O) speeds. Parallel database systems are used in applications that
have to query extremely large databases or that have to process an extremely large
number of transactions per second.
Advantages of a Parallel Database System
Parallel database systems are very useful for applications that have to query
extremely large databases (of the order of terabytes, for example, 10^12 bytes) or that
have to process an extremely large number of transactions per second (of the order of
thousands of transactions per second).
In a parallel database system, the throughput (that is, the number of tasks that can be
completed in a given time interval) and the response time (that is, the amount of time it
takes to complete a single task from the time it is submitted) are very high.
Disadvantages of a Parallel Database System
In a parallel database system, there is a startup cost associated with initiating each
process, and the startup time may overshadow the actual processing time, affecting
speedup adversely.
Since processes executing in a parallel system often access shared resources, a slowdown
may result from the interference of each new process as it competes with existing
processes for commonly held resources, such as shared data storage disks, the system bus
and so on.

Distributed Database System


A logically interrelated collection of shared data physically distributed over a computer
network is called a distributed database, and the software system that permits the
management of the distributed database and makes the distribution transparent to
users is called a Distributed DBMS.
It consists of a single logical database that is split into a number of fragments. Each
fragment is stored on one or more computers under the control of a separate DBMS,
with the computers connected by a communications network. As shown, in distributed
database system, data is spread across a variety of different databases. These are
managed by a variety of different DBMS software running on a variety of different
operating systems. These machines are spread (or distributed) geographically and
connected together by a variety of communication networks.

Advantages of Distributed Database System


Distributed database architecture provides greater efficiency and better performance.
A single database (on server) can be shared across several distinct client (application)
systems.
As data volumes and transaction rates increase, users can grow the system
incrementally.
It causes less impact on ongoing operations when adding new locations.
Distributed database system provides local autonomy.
Disadvantages of Distributed Database System
Recovery from failure is more complex in distributed database systems than in
centralized systems.

Client-Server DBMS
Client/Server architecture of a database system has two logical components, namely the
client and the server. Clients are generally personal computers or workstations, whereas
the server is a large workstation, mini-range computer system or mainframe computer
system. The applications and tools of the DBMS run on one or more client platforms,
while the DBMS software resides on the server. The server computer is called the back
end and the client's computer is called the front end. These server and client computers
are connected through a network. The applications and tools act as clients of the DBMS,
making requests for its services. The DBMS, in turn, processes these requests and returns
the results to the client(s). The client handles the Graphical User Interface (GUI) and
does computations and other programming of interest to the end user. The server
handles the parts of the job that are common to many clients, for example, database
access and updates.
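This division of labor can be sketched with a toy example in Python: the server process owns the database and performs the data access, while the client only sends a request and displays the reply. The one-query wire protocol here is entirely hypothetical; real client/server DBMSs use far richer protocols with authentication, typed result sets and connection pooling.

```python
import socket
import sqlite3
import threading

def serve_one(server_sock: socket.socket) -> None:
    """Back end: owns the database and runs one client's query."""
    db = sqlite3.connect(":memory:")          # database lives on the server
    db.execute("CREATE TABLE t (x INTEGER)")
    db.execute("INSERT INTO t VALUES (42)")
    client, _ = server_sock.accept()
    query = client.recv(4096).decode()        # toy protocol: one SQL string
    rows = db.execute(query).fetchall()       # server does the DB work
    client.sendall(repr(rows).encode())
    client.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))                 # any free local port
server.listen(1)
threading.Thread(target=serve_one, args=(server,), daemon=True).start()

# Front end: only formats the request and displays the reply.
client = socket.socket()
client.connect(server.getsockname())
client.sendall(b"SELECT x FROM t")
reply = client.recv(4096).decode()
print(reply)                                  # -> [(42,)]
```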
Multi-Tier client server computing models
In a single-tier system the database is centralized, which means the DBMS software and
the data reside in one location, and dumb terminals were used to access the DBMS as
shown.

With the rise of personal computers in businesses during the 1980s and the increased
reliability of networking hardware, two-tier and three-tier systems became common. In a
two-tier system, different software is required for the server and for the client. The
figure illustrates the two-tier client-server model. At the early stages, the client-server
computing model was called the two-tier computing model, in which the client is
considered the data capture and validation tier and the server the data storage tier.
This scenario is depicted.
Problems of two-tier architecture
The need for enterprise scalability challenged this traditional two-tier client-server
model. In the mid-1990s, as applications became more complex and could be deployed to
hundreds or thousands of end-users, the client side ran into the following problems:

A 'fat' client requires considerable resources on the client's computer to run
effectively, including disk space, RAM and CPU.
Client machines require administration, which results in overhead.
Three-tier architecture
By 1995, the three-tier architecture appeared as an improvement over the two-tier
architecture. It has three layers:
First Layer: User Interface, which runs on the end-user's computer (the client).
Second Layer: Application Server. This is the business logic and data processing layer.
This middle tier runs on a server called the Application Server.
Third Layer: Database Server. This is the DBMS, which stores the data required by the
middle tier. This tier may run on a separate server called the database server.
As described earlier, the client is now responsible only for the application's user
interface; thus it requires fewer computational resources. Clients are now called 'thin
clients' and require less maintenance.
Advantages of Client/Server Database System
A Client/Server system uses less expensive platforms to support applications that had
previously been running only on large and expensive mini or mainframe computers.
Clients offer an icon-based, menu-driven interface, which is superior to the traditional
command-line, dumb-terminal interface typical of mini and mainframe computer
systems.
The Client/Server environment facilitates more productive work by the users and
better use of existing data.

The Client/Server database system is more flexible compared to the centralized system.

Response time and throughput are high.
The server (database) machine can be custom-built (tailored) to the DBMS function
and thus can provide better DBMS performance.
The client (application database) might be a personal workstation, tailored to the
needs of the end users and thus able to provide better interfaces, high availability,
faster responses and overall improved ease of use to the user.
A single database (on the server) can be shared across several distinct client
(application) systems.
Disadvantages of Client/Server Database System
Programming cost is high in Client/Server environments, particularly in the initial
phases.
There is a lack of management tools for diagnosis, performance monitoring, tuning and
security control for the DBMS, client and operating systems, and networking
environments.
