You are on page 1of 17

1

Advanced Topics in DBMS



Ch-2: Tree Structured Indexing

By
Syed khutubddin Ahmed
Assistant Professor
Dept. of MCA
Reva Institute of Technology & mgmt.


Syed Khutubuddin, Assistant Prof,
REVA ITM
REMEMBER
Two types of Index Data Structures:
1) Hash based Indexing
2) Tree Based Indexing
Syed Khutubuddin, Assistant Prof,
REVA ITM
2
Index Data Structure
2
The data entries are arranged in sorted order
by search key value.
and a hierarchical search data structure is
maintained.

Syed Khutubuddin, Assistant Prof,
REVA ITM
3
What is Tree-Based Indexing:
Syed Khutubuddin, Assistant Prof,
REVA ITM
4
Tree Structured index
3
index storage techniques uses 3 alternatives
for data entries k*:
Data record with key value k
<k, rid of data record with search key value k>
<k, list of rids of data records with search key k>

REMEMBER
Tree-structured indexing techniques support
both range searches and equality searches.
Syed Khutubuddin, Assistant Prof,
REVA ITM
5
Tree Structured indexing
Two techniques available in tree structured
indexing:
1. ISAM (indexed sequential access method)
2. B+ Trees
Both supports effective range searches

Syed Khutubuddin, Assistant Prof,
REVA ITM
6
Tree Structured indexing
4
ISAM
it is static index structure that is effective when the
file is not frequently updated.
This method is not suitable for a file that grows
and shrinks a lot.
B + Trees A dynamic structure that adjusts
to changes in the file gracefully.

Most widely used index structure.
because it adjusts well to changes
Supports equality search and range search
Syed Khutubuddin, Assistant Prof,
REVA ITM
7
Take an example of Range Searches
``Find all students with gpa > 3.0
What is the Solution:
If data is in sorted file, do binary search to find first
such student, then scan to find others.
Remember Cost of binary search can be quite high
if data is more as we are working on original data.

Simple idea: Create an `index file.

Syed Khutubuddin, Assistant Prof,
REVA ITM
8
Motivation for tree indexes
5
Index file may still be quite large. But we can
apply the idea repeatedly!
* Leaf pages contain data entries.
P
0
K
1
P
1
K
2
P
2
K
m
P
m
index entry
Non-leaf
Pages
Pages
Overflow
page
Primary pages
Leaf
ISAM
Comments on ISAM
File creation:
Leaf (data) pages allocated sequentially,
sorted by search key;
then index pages allocated,
then space for overflow pages.
Index entries: <search key value, page id>;
6
Non-leaf
Pages
Pages
Overflow
page
Primary pages
Leaf
ISAM
Data entries of the ISAM index are in the leaf
of the tree
and additional overflow pages chained to
some leaf pages.
ISAM structure is static (except for overflow
pages as they will be very few)


ISAM (Index Sequential Access method)
Each tree node is disk page.
When a file is created all leaf pages are allocated
sequentially and sorted on the search key value.
The non leaf level pages are then allocated.
If there are several inserts to the file (but is there is no
space ) then additional pages are needed because the
index is static (these pages are called Overflow pages).


7
basic operations of insertion, deletion, and search are all
quite straightforward
equality selection search start at the root node
For Range Query the starting point in the data (or
leaf) level is determined similarly, and data pages are
then retrieved sequentially.
Syed Khutubuddin, Assistant Prof,
REVA ITM
13
For inserts and deletes search the page and then
insert it or delete it with overflow pages added if
necessary.
assume that each leaf page can contain two entries.
Syed Khutubuddin, Assistant Prof,
REVA ITM
14
8
Let us insert the value 23 that is done by adding an
overflow page and putting 23* in the overflow page.
Chains of overflow pages can easily develop.
For instance, inserting 48*, 41 *, and 42* leads to an
overflow chain of two pages.
Syed Khutubuddin, Assistant Prof,
REVA ITM
15
The deletion of an entry k* is handled by
simply removing the entry.
If this entry is on an overflow page and the
overflow page becomes empty, the page can
be removed.
If the entry is on a primary page and deletion
makes the primary page empty, the simplest
approach is to simply leave the empty primary
page as it is; it serves as a placeholder for
future insertions.
Syed Khutubuddin, Assistant Prof,
REVA ITM
16
ISAM
9
once the ISAM file is created, inserts and
deletes affect only the contents of leaf pages.
It does not effect the Non leaf pages. As it is
fixed.

In comparison to B+ trees the non leaf pages
are not fixed. This is advantage of ISAM over
B+ tree. At the same time it has a
disadvantage of being static.
Syed Khutubuddin, Assistant Prof,
REVA ITM
17
Overflow pages, Locking Considerations
A static structure such as the ISAM index suffers from
the problem that long overflow chains can develop as
the file grows, leading to poor performance.
This problem motivated the development of more
flexible, dynamic structures that adjust gracefully to
inserts and deletes.
Syed Khutubuddin, Assistant Prof,
REVA ITM
18
B+ trees: Dynamic Index Structure
10
Problems with ISAM:
Long overflow pages leads to poor performance

Characteristics:
insertion:
Deletion:

Searching: just needs traversal from root to node
Height of the tree: rarely more than 3 to 4


Syed Khutubuddin, Assistant Prof,
REVA ITM
19
B+ trees
}
In both operations
tree is balanced
Format of the Tree node:
M index entries contains m+1 pointers


P
0
K
1
P
1
K
2
P
2
K
m
P
m
index entry
11
Search Operation on B+ trees:
* ptr value pointed by a pointer
&(value) to denote address value.

For search :
Assume no Duplicate entries
No same keys




Syed Khutubuddin, Assistant Prof,
REVA ITM
21
B+ trees Search Operation
Syed Khutubuddin, Assistant Prof,
REVA ITM
22
B+ trees- Search
Click the image to watch video
12
Syed Khutubuddin, Assistant Prof,
REVA ITM
23


B+ trees INSERT OPERATION (insert -8)
Syed Khutubuddin, Assistant Prof,
REVA ITM
24
Insert 8


13
Syed Khutubuddin, Assistant Prof,
REVA ITM
25
Insert 8 using REDISTRIBUTION CONCEPT

Syed Khutubuddin, Assistant Prof,
REVA ITM
26
DELETE: Deleting entry 19 & 20 from Fig-1
FIG-1
FIG-2
14
Syed Khutubuddin, Assistant Prof,
REVA ITM
27
DELETE Entry 24
Syed Khutubuddin, Assistant Prof,
REVA ITM
28
B+ tree in Practice
Key Compression:
If search key values are very long (for instance,
the name Devarakonda Venkataramana
Sathyanarayana Seshasayee Yellamanchali
Murthy. not many index entries will fit on a
page and the height of the tree is large.
search key values in index entries are used only
to direct traffic to the appropriate leaf
Solution:
it is sufficient to store the abbreviated forms 'Da'
and 'De for search key values 'David Smith' and
'Devarakonda
15
Syed Khutubuddin, Assistant Prof,
REVA ITM
29
B+ tree in Practice
Bulk Loading of B+ Trees:
Two way of lading data in B+ trees:

First: tree is already available just insert keys
in appropriate places.

Second: Tree is not available create from the
beginning.

Syed Khutubuddin, Assistant Prof,
REVA ITM
30
B+ tree in Practice
Example:
16
Syed Khutubuddin, Assistant Prof,
REVA ITM
31
B+ tree in Practice
Example:
Syed Khutubuddin, Assistant Prof,
REVA ITM
32
B+ tree in Practice
In Sybase ASE, depending on the concurrency
control scheme being used for the index, the
deleted row is removed (with merging if the page
occupancy goes below threshold) or simply
marked as deleted; a garbage collection scheme
is used to recover space.
Oracle 8, deletions are handled by marking the
row as deleted. to reclaim the space occupied by
deleted records, we can rebuild the index online
DB2 and SQL Server remove deleted records
and merge pages when occupancy goes below
threshold.
17
Syed Khutubuddin, Assistant Prof,
REVA ITM
33






END OF UNIT-2

You might also like