This document discusses tree-structured indexing techniques for databases. It covers two main techniques: ISAM and B+ trees. ISAM uses a static tree structure, while B+ trees are dynamic and adjust well to data changes through insertion and deletion operations. B+ trees are widely used because they support both equality and range searches efficiently and maintain a balanced structure with updates. The document provides examples of search, insert, and delete operations for B+ trees.
This document discusses tree-structured indexing techniques for databases. It covers two main techniques: ISAM and B+ trees. ISAM uses a static tree structure, while B+ trees are dynamic and adjust well to data changes through insertion and deletion operations. B+ trees are widely used because they support both equality and range searches efficiently and maintain a balanced structure with updates. The document provides examples of search, insert, and delete operations for B+ trees.
This document discusses tree-structured indexing techniques for databases. It covers two main techniques: ISAM and B+ trees. ISAM uses a static tree structure, while B+ trees are dynamic and adjust well to data changes through insertion and deletion operations. B+ trees are widely used because they support both equality and range searches efficiently and maintain a balanced structure with updates. The document provides examples of search, insert, and delete operations for B+ trees.
By Syed khutubddin Ahmed Assistant Professor Dept. of MCA Reva Institute of Technology & mgmt.
Syed Khutubuddin, Assistant Prof, REVA ITM REMEMBER Two types of Index Data Structures: 1) Hash based Indexing 2) Tree Based Indexing Syed Khutubuddin, Assistant Prof, REVA ITM 2 Index Data Structure 2 The data entries are arranged in sorted order by search key value. and a hierarchical search data structure is maintained.
Syed Khutubuddin, Assistant Prof, REVA ITM 3 What is Tree-Based Indexing: Syed Khutubuddin, Assistant Prof, REVA ITM 4 Tree Structured index 3 index storage techniques uses 3 alternatives for data entries k*: Data record with key value k <k, rid of data record with search key value k> <k, list of rids of data records with search key k>
REMEMBER Tree-structured indexing techniques support both range searches and equality searches. Syed Khutubuddin, Assistant Prof, REVA ITM 5 Tree Structured indexing Two techniques available in tree structured indexing: 1. ISAM (indexed sequential access method) 2. B+ Trees Both supports effective range searches
Syed Khutubuddin, Assistant Prof, REVA ITM 6 Tree Structured indexing 4 ISAM it is static index structure that is effective when the file is not frequently updated. This method is not suitable for a file that grows and shrinks a lot. B + Trees A dynamic structure that adjusts to changes in the file gracefully.
Most widely used index structure. because it adjusts well to changes Supports equality search and range search Syed Khutubuddin, Assistant Prof, REVA ITM 7 Take an example of Range Searches ``Find all students with gpa > 3.0 What is the Solution: If data is in sorted file, do binary search to find first such student, then scan to find others. Remember Cost of binary search can be quite high if data is more as we are working on original data.
Simple idea: Create an `index file.
Syed Khutubuddin, Assistant Prof, REVA ITM 8 Motivation for tree indexes 5 Index file may still be quite large. But we can apply the idea repeatedly! * Leaf pages contain data entries. P 0 K 1 P 1 K 2 P 2 K m P m index entry Non-leaf Pages Pages Overflow page Primary pages Leaf ISAM Comments on ISAM File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages. Index entries: <search key value, page id>; 6 Non-leaf Pages Pages Overflow page Primary pages Leaf ISAM Data entries of the ISAM index are in the leaf of the tree and additional overflow pages chained to some leaf pages. ISAM structure is static (except for overflow pages as they will be very few)
ISAM (Index Sequential Access method) Each tree node is disk page. When a file is created all leaf pages are allocated sequentially and sorted on the search key value. The non leaf level pages are then allocated. If there are several inserts to the file (but is there is no space ) then additional pages are needed because the index is static (these pages are called Overflow pages).
7 basic operations of insertion, deletion, and search are all quite straightforward equality selection search start at the root node For Range Query the starting point in the data (or leaf) level is determined similarly, and data pages are then retrieved sequentially. Syed Khutubuddin, Assistant Prof, REVA ITM 13 For inserts and deletes search the page and then insert it or delete it with overflow pages added if necessary. assume that each leaf page can contain two entries. Syed Khutubuddin, Assistant Prof, REVA ITM 14 8 Let us insert the value 23 that is done by adding an overflow page and putting 23* in the overflow page. Chains of overflow pages can easily develop. For instance, inserting 48*, 41 *, and 42* leads to an overflow chain of two pages. Syed Khutubuddin, Assistant Prof, REVA ITM 15 The deletion of an entry k* is handled by simply removing the entry. If this entry is on an overflow page and the overflow page becomes empty, the page can be removed. If the entry is on a primary page and deletion makes the primary page empty, the simplest approach is to simply leave the empty primary page as it is; it serves as a placeholder for future insertions. Syed Khutubuddin, Assistant Prof, REVA ITM 16 ISAM 9 once the ISAM file is created, inserts and deletes affect only the contents of leaf pages. It does not effect the Non leaf pages. As it is fixed.
In comparison to B+ trees the non leaf pages are not fixed. This is advantage of ISAM over B+ tree. At the same time it has a disadvantage of being static. Syed Khutubuddin, Assistant Prof, REVA ITM 17 Overflow pages, Locking Considerations A static structure such as the ISAM index suffers from the problem that long overflow chains can develop as the file grows, leading to poor performance. This problem motivated the development of more flexible, dynamic structures that adjust gracefully to inserts and deletes. Syed Khutubuddin, Assistant Prof, REVA ITM 18 B+ trees: Dynamic Index Structure 10 Problems with ISAM: Long overflow pages leads to poor performance
Characteristics: insertion: Deletion:
Searching: just needs traversal from root to node Height of the tree: rarely more than 3 to 4
Syed Khutubuddin, Assistant Prof, REVA ITM 19 B+ trees } In both operations tree is balanced Format of the Tree node: M index entries contains m+1 pointers
P 0 K 1 P 1 K 2 P 2 K m P m index entry 11 Search Operation on B+ trees: * ptr value pointed by a pointer &(value) to denote address value.
For search : Assume no Duplicate entries No same keys
Syed Khutubuddin, Assistant Prof, REVA ITM 21 B+ trees Search Operation Syed Khutubuddin, Assistant Prof, REVA ITM 22 B+ trees- Search Click the image to watch video 12 Syed Khutubuddin, Assistant Prof, REVA ITM 23
Syed Khutubuddin, Assistant Prof, REVA ITM 26 DELETE: Deleting entry 19 & 20 from Fig-1 FIG-1 FIG-2 14 Syed Khutubuddin, Assistant Prof, REVA ITM 27 DELETE Entry 24 Syed Khutubuddin, Assistant Prof, REVA ITM 28 B+ tree in Practice Key Compression: If search key values are very long (for instance, the name Devarakonda Venkataramana Sathyanarayana Seshasayee Yellamanchali Murthy. not many index entries will fit on a page and the height of the tree is large. search key values in index entries are used only to direct traffic to the appropriate leaf Solution: it is sufficient to store the abbreviated forms 'Da' and 'De for search key values 'David Smith' and 'Devarakonda 15 Syed Khutubuddin, Assistant Prof, REVA ITM 29 B+ tree in Practice Bulk Loading of B+ Trees: Two way of lading data in B+ trees:
First: tree is already available just insert keys in appropriate places.
Second: Tree is not available create from the beginning.
Syed Khutubuddin, Assistant Prof, REVA ITM 30 B+ tree in Practice Example: 16 Syed Khutubuddin, Assistant Prof, REVA ITM 31 B+ tree in Practice Example: Syed Khutubuddin, Assistant Prof, REVA ITM 32 B+ tree in Practice In Sybase ASE, depending on the concurrency control scheme being used for the index, the deleted row is removed (with merging if the page occupancy goes below threshold) or simply marked as deleted; a garbage collection scheme is used to recover space. Oracle 8, deletions are handled by marking the row as deleted. to reclaim the space occupied by deleted records, we can rebuild the index online DB2 and SQL Server remove deleted records and merge pages when occupancy goes below threshold. 17 Syed Khutubuddin, Assistant Prof, REVA ITM 33