File and Indexing

Data Storage and Indexing
Outline
Data Storage
Storing data: disks, buffer manager, representing data, external sorting
Indexing
Types of indexes, B+ trees, hash tables (maybe)
Physical Addresses
Each block and each record has a physical address that consists of:
The host
The disk
The cylinder number
The track number
The block within the track
For records: an offset in the block (sometimes this is stored in the block's header)
Logical Addresses
Logical address: a string of bytes (10-16).
More flexible: can move blocks/records around.
But need a translation table:

Logical address   Physical address
L1                P1
L2                P2
L3                P3
Main Memory Address

When the block is read into main memory, it receives a main memory address.
The buffer manager has another translation table:

Memory address   Logical address
M1               L1
M2               L2
M3               L3
The I/O Model of Computation

In main memory algorithms we care about CPU time.
In databases, time is dominated by I/O cost.
Assumption: cost is given only by I/O.
Consequence: certain algorithms need to be redesigned.
We illustrate this here with sorting.
Sorting
Illustrates the difference in algorithm design when your data is not in main memory:
Problem: sort 1 GB of data with 1 MB of RAM.
Arises in many places in database systems:

Data requested in sorted order (ORDER BY)
Needed for grouping operations
First step in sort-merge join algorithm
Duplicate removal
Bulk loading of B+-tree indexes
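2-Way Merge-sort: Requires 3 Buffers

Pass 1: Read a page, sort it, write it.
Only one buffer page is used.
Pass 2, 3, ..., etc.:
Three buffer pages are used.

[Figure: pages are read from disk into two input buffers (INPUT 1, INPUT 2) in main memory, merged into one OUTPUT buffer, and written back to disk.]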
Two-Way External Merge Sort

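Each pass we read + write each page in the file.
N pages in the file => the number of passes is $\lceil \log_2 N \rceil + 1$
So total cost is: $2N \left( \lceil \log_2 N \rceil + 1 \right)$

Example (N = 7 pages, pages separated by |):
Input file:             3,4 | 6,2 | 9,4 | 8,7 | 5,6 | 3,1 | 2
PASS 0 (1-page runs):   3,4 | 2,6 | 4,9 | 7,8 | 5,6 | 1,3 | 2
PASS 1 (2-page runs):   2,3 4,6 | 4,7 8,9 | 1,3 5,6 | 2
PASS 2 (4-page runs):   2,3 4,4 6,7 8,9 | 1,2 3,5 6
PASS 3 (8-page runs):   1,2 2,3 3,4 4,5 6,6 7,8 9

Improvement: start with larger runs.
Sort 1 GB with 1 MB of memory in 10 merge passes (1 GB / 1 MB = 1024 initial runs, and log2 1024 = 10).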
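
The pass structure above can be simulated in a few lines. This is only a sketch under simplifying assumptions: pages are in-memory Python lists rather than disk blocks, and the names `two_way_external_sort` and `io_cost` are made up for illustration.

```python
from heapq import merge

def two_way_external_sort(pages):
    """Simulate two-way external merge sort; return (sorted keys, passes)."""
    # First pass: sort every page on its own, producing 1-page runs.
    runs = [sorted(page) for page in pages]
    passes = 1
    # Every later pass merges the runs two at a time.
    while len(runs) > 1:
        runs = [list(merge(*runs[i:i + 2])) for i in range(0, len(runs), 2)]
        passes += 1
    return runs[0], passes

def io_cost(n_pages):
    """Total page I/Os: 2 * N * (ceil(log2 N) + 1)."""
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 1, without floats.
    passes = (n_pages - 1).bit_length() + 1
    return 2 * n_pages * passes

example = [[3, 4], [6, 2], [9, 4], [8, 7], [5, 6], [3, 1], [2]]
result, passes = two_way_external_sort(example)
```

On the 7-page example file the simulation needs 4 passes, which agrees with the formula, and `io_cost(7)` gives 2 x 7 x 4 = 56 page I/Os.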
Can We Do Better?
We have more main memory.
We should use it to improve performance.
Cost Model for Our Analysis

B: Block size
M: Size of main memory
N: Number of records in the file
R: Size of one record

Indexes

Outline
Types of indexes
B+ trees
Hash tables (maybe)
Indexes
An index on a file speeds up selections on the search key field(s).
Search key = any subset of the fields of a relation.
The search key is not the same as a key (a minimal set of fields that uniquely identifies a record in a relation).
Entries in an index: (k, r), where:

k = the key
r = the record OR record id OR record ids
Index Classification
Clustered/unclustered
Clustered = records sorted in the key order
Unclustered = records not sorted in the key order
Dense/sparse
Dense = each record has an entry in the index
Sparse = only some records have an entry in the index
Primary/secondary
Primary = on the primary key
Secondary = on any key
Some books interpret these differently
B+ tree / Hash table / ...
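Clustered Index
File is sorted on the index attribute.
Dense index: sequence of (key, pointer) pairs.

[Figure: dense index entries 10, 20, 30, 40, 50, 60, 70, 80, each pointing to the record with that key in the sorted data file (records 10 through 80).]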
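Clustered Index
Sparse index: one key per data block.

[Figure: sparse index entries 10, 30, 50, 70, 90, 110, 130, 150, each pointing to the data block that starts with that key. Data blocks shown: (10, 20), (30, 40), (50, 60), (70, 80).]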
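
A lookup through a sparse index can be sketched as follows: find the last index entry whose key is not larger than the search key, then read only that data block. The function name `sparse_lookup` and the list-of-lists representation of data blocks are assumptions made for this illustration.

```python
from bisect import bisect_right

def sparse_lookup(index_keys, blocks, key):
    """Return (block number, found) for key, or None if key precedes the file.

    index_keys[i] is the first (smallest) key stored in blocks[i];
    the file is sorted on the key, as a clustered index requires.
    """
    pos = bisect_right(index_keys, key) - 1  # last entry with index key <= key
    if pos < 0:
        return None
    return pos, key in blocks[pos]           # one data block is read

blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
index_keys = [block[0] for block in blocks]  # 10, 30, 50, 70
```

A dense index would instead hold one entry per record, so a miss could be detected without reading any data block.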
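Clustered Index with Duplicate Keys

Dense index: point to the first record with that key.

[Figure: dense index entries 10, 20, 30, 40, 50, 60, 70, 80 over the sorted data file 10, 10, 10, 20, 20, 20, 30, 40; each entry points to the first record with its key.]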

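Clustered Index with Duplicate Keys

Sparse index: pointer to lowest search key in each block.

[Figure: sparse index entries 10, 10, 20, 30 over data blocks (10, 10), (10, 20), (20, 20), (30, 40). Search for 20.]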

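Clustered Index with Duplicate Keys

Better: pointer to lowest new search key in each block.

[Figure: sparse index entries 10, 20, 30, 40, 50, 60, 70, 80 over data blocks (10, 10), (10, 20), (30, 30), (40, 50). Search for 20.]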
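Unclustered Indexes
To index attributes other than the primary key.
Always dense (why?)

[Figure: dense index entries in sorted order (10, 10, 20, 20, 20, 30, 30, 30) pointing into an unsorted data file (20, 30, 30, 20, 10, 20, 10, 30).]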
Summary Clustered vs. Unclustered Index

[Figure: two panels, CLUSTERED and UNCLUSTERED. In each panel, data entries in the index file point to data records in the data file.]
Composite Search Keys

Composite search keys: search on a combination of fields.
Equality query: every field value is equal to a constant value. E.g. with respect to a <sal,age> index: age=20 and sal=75.
Range query: some field value is not a constant. E.g.: age=20; or age=20 and sal > 10.

Examples of composite key indexes using lexicographic order.

Data records sorted by name:
name  age  sal
bob   12   10
cal   11   80
joe   12   20
sue   13   75

Data entries in index sorted by <age,sal>:  11,80  12,10  12,20  13,75
Data entries in index sorted by <sal,age>:  10,12  20,12  75,13  80,11
Data entries sorted by <age>:  11  12  12  13
Data entries sorted by <sal>:  10  20  75  80
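
Lexicographic order on a composite key is the same as ordinary tuple ordering, so the example entries can be reproduced with a short sketch. The helper names `entries` and `prefix_range` are hypothetical, and `prefix_range` assumes integer key values.

```python
from bisect import bisect_left

# Records from the example: (name, age, sal), stored sorted by name.
records = [("bob", 12, 10), ("cal", 11, 80), ("joe", 12, 20), ("sue", 13, 75)]
FIELD = {"name": 0, "age": 1, "sal": 2}

def entries(records, key_fields):
    """Data entries of an index on key_fields, in lexicographic order."""
    return sorted(tuple(r[FIELD[f]] for f in key_fields) for r in records)

def prefix_range(sorted_entries, first):
    """All entries whose first field equals `first` (a contiguous slice)."""
    lo = bisect_left(sorted_entries, (first,))
    hi = bisect_left(sorted_entries, (first + 1,))
    return sorted_entries[lo:hi]

age_sal = entries(records, ["age", "sal"])
sal_age = entries(records, ["sal", "age"])
```

A condition on the leading field alone (here age=12 on the <age,sal> index) selects one contiguous run of entries, which is why the field order of a composite key matters.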
B+ Trees
Search trees.
Idea in B Trees: make 1 node = 1 block.
Idea in B+ Trees: make leaves into a linked list (range queries are easier).
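B+ Trees Basics
Parameter d = the degree.
Each node has >= d and <= 2d keys (except the root).

[Figure: internal node with keys 30, 120, 240 and four child pointers, for keys k < 30, 30 <= k < 120, 120 <= k < 240, and 240 <= k.]

Each leaf has >= d and <= 2d keys:

[Figure: leaf with keys 40, 50, 60, one pointer per key to the data record with that key, and a pointer to the next leaf.]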
B+ Tree Example
d=2

Root:            [80]
Internal nodes:  [20 60]   [100 120 140]
Leaves:          [10 15 18] -> [20 30 40 50] -> [60 65] -> [80 85 90] -> ...
Each leaf entry points to the data record with the same key.
B+ Tree Design
How large is d?
Example:
Key size = 4 bytes
Pointer size = 8 bytes
Block size = 4096 bytes
2d x 4 + (2d+1) x 8 <= 4096, so d = 170
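
The inequality can be solved for d directly: a full node holds 2d keys and 2d+1 pointers, so 2d x (key size + pointer size) + pointer size must fit in one block. A minimal sketch, with `max_degree` as an illustrative name:

```python
def max_degree(key_size, pointer_size, block_size):
    """Largest d with 2d*key_size + (2d+1)*pointer_size <= block_size."""
    return (block_size - pointer_size) // (2 * (key_size + pointer_size))

d = max_degree(key_size=4, pointer_size=8, block_size=4096)
bytes_used = 2 * d * 4 + (2 * d + 1) * 8  # space taken by a full node
```

With the numbers from the example, a full node uses 4088 of the 4096 bytes, and d = 171 would not fit.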
Searching a B+ Tree
Exact key values:
Start at the root.
Proceed down to the leaf.

Select name From people Where age = 25

Range queries:
As above, then sequential traversal of the leaves.

Select name From people Where 20 <= age and age <= 30
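
Both kinds of search can be sketched on a small in-memory tree. The `Node` class and the two-level tree below are simplifications invented for this illustration (a single root over four leaves, keys only, no record pointers); they are not the exact tree from the example figure.

```python
from bisect import bisect_right

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf
        self.next = None          # next-leaf pointer (leaves only)

def find_leaf(node, key):
    """Walk from the root down to the leaf that may contain key."""
    while node.children is not None:
        # Child j holds keys k with keys[j-1] <= k < keys[j].
        node = node.children[bisect_right(node.keys, key)]
    return node

def search(root, key):
    return key in find_leaf(root, key).keys

def range_scan(root, lo, hi):
    """Keys k with lo <= k <= hi: descend once, then follow leaf links."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out

leaves = [Node([10, 15, 18]), Node([20, 30, 40, 50]),
          Node([60, 65]), Node([80, 85, 90])]
for left, right in zip(leaves, leaves[1:]):
    left.next = right
root = Node([20, 60, 80], leaves)
```

The range scan touches internal nodes only once; after that it reads leaves in key order, which is the benefit of linking the leaves.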
B+ Trees in Practice
Typical order: 100. Typical fill-factor: 67%.
Average fanout = 133.
Typical capacities:
Height 4: 133^4 = 312,900,721 records
Height 3: 133^3 = 2,352,637 records
Can often hold top levels in buffer pool:

Level 1 = 1 page = 8 Kbytes
Level 2 = 133 pages = 1 Mbyte
Level 3 = 17,689 pages = 133 MBytes
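
The capacity figures follow from raising the average fanout to the height of the tree. A small sketch to check the arithmetic, taking the fanout of 133 and the 8 Kbyte page size from the text as given (the function names are illustrative):

```python
def capacity(fanout, height):
    """Approximate number of records reachable in a tree of this height."""
    return fanout ** height

def pages_at_level(fanout, level):
    """Number of pages at a level, with the root counted as level 1."""
    return fanout ** (level - 1)

FANOUT = 133
PAGE_BYTES = 8 * 1024

level_pages = [pages_at_level(FANOUT, level) for level in (1, 2, 3)]
level_bytes = [pages * PAGE_BYTES for pages in level_pages]
```

The megabyte figures quoted for levels 2 and 3 are rounded; `level_bytes` holds the exact byte counts under the 8 Kbyte page assumption.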
Insertion in a B+ Tree
Insert (K, P):
Find the leaf where K belongs and insert.
If no overflow (2d keys or less), halt.
If overflow (2d+1 keys), split the node and insert in the parent:

Before the split:  P0 K1 P1 K2 P2 K3 P3 K4 P4 K5 P5
After the split:   P0 K1 P1 K2 P2  |  P3 K4 P4 K5 P5, with (K3, pointer to the new right node) sent to the parent

If leaf, keep K3 too in the right node.
When the root splits, the new root has 1 key only.
Insert K=19
Root:            [80]
Internal nodes:  [20 60]   [100 120 140]
Leaves:          [10 15 18] -> [20 30 40 50] -> [60 65] -> [80 85 90] -> ...

After insertion
Root:            [80]
Internal nodes:  [20 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20 30 40 50] -> [60 65] -> [80 85 90] -> ...

Now insert 25
Root:            [80]
Internal nodes:  [20 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20 30 40 50] -> [60 65] -> [80 85 90] -> ...

After insertion
Root:            [80]
Internal nodes:  [20 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20 25 30 40 50] -> [60 65] -> [80 85 90] -> ...

But now have to split! (the leaf [20 25 30 40 50] has 2d+1 = 5 keys)

After the split
Root:            [80]
Internal nodes:  [20 30 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20 25] -> [30 40 50] -> [60 65] -> [80 85 90] -> ...
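
The leaf-level part of insertion can be sketched as one function: insert the key in order, and if the leaf then holds 2d+1 keys, split it so that the left leaf keeps d keys and the right leaf keeps d+1 keys, with the first key of the right leaf copied up to the parent. The function name and the list-based leaf are assumptions for this illustration; updating the parent is left out.

```python
from bisect import insort

def insert_into_leaf(keys, k, d):
    """Insert k into a leaf; return (left, right, separator).

    right and separator are None when the leaf does not overflow.
    """
    keys = list(keys)           # work on a copy of the leaf
    insort(keys, k)             # keep the leaf sorted
    if len(keys) <= 2 * d:
        return keys, None, None
    left, right = keys[:d], keys[d:]
    # In a leaf split the separator key also stays in the right node.
    return left, right, right[0]

no_split = insert_into_leaf([10, 15, 18], 19, d=2)
with_split = insert_into_leaf([20, 30, 40, 50], 25, d=2)
```

These two calls mirror the inserts of 19 and 25 in the d=2 example: the first fits in the leaf, the second splits it and sends 30 to the parent.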
Deletion from a B+ Tree

Delete 30
Root:            [80]
Internal nodes:  [20 30 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20 25] -> [30 40 50] -> [60 65] -> [80 85 90] -> ...

After deleting 30
The key 30 in the internal node may change to 40, or not.
Root:            [80]
Internal nodes:  [20 30 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20 25] -> [40 50] -> [60 65] -> [80 85 90] -> ...

Now delete 25
Root:            [80]
Internal nodes:  [20 30 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20 25] -> [40 50] -> [60 65] -> [80 85 90] -> ...

After deleting 25
Need to rebalance: rotate.
Root:            [80]
Internal nodes:  [20 30 60]   [100 120 140]
Leaves:          [10 15 18 19] -> [20] -> [40 50] -> [60 65] -> [80 85 90] -> ...

Now delete 40
Root:            [80]
Internal nodes:  [19 30 60]   [100 120 140]
Leaves:          [10 15 18] -> [19 20] -> [40 50] -> [60 65] -> [80 85 90] -> ...

After deleting 40
Rotation not possible. Need to merge nodes.
Root:            [80]
Internal nodes:  [19 30 60]   [100 120 140]
Leaves:          [10 15 18] -> [19 20] -> [50] -> [60 65] -> [80 85 90] -> ...

Final tree
Root:            [80]
Internal nodes:  [19 60]   [100 120 140]
Leaves:          [10 15 18] -> [19 20 50] -> [60 65] -> [80 85 90] -> ...
Hash Tables
Secondary storage hash tables are much like main memory ones.
Recall basics:
There are n buckets.
A hash function f(k) maps a key k to {0, 1, ..., n-1}.
Store in bucket f(k) a pointer to the record with key k.
Secondary storage: bucket = block, use overflow blocks when needed.
Hash Table Example

Assume 1 bucket (block) stores 2 keys + pointers.
h(e)=0
h(b)=h(f)=1
h(g)=2
h(a)=h(c)=3

Bucket 0: e
Bucket 1: b, f
Bucket 2: g
Bucket 3: a, c
Searching in a Hash Table

Search for a:
Compute h(a)=3.
Read bucket 3.
1 disk access.

Bucket 0: e
Bucket 1: b, f
Bucket 2: g
Bucket 3: a, c
Insertion in Hash Table

Place in the right bucket, if there is space. E.g. h(d)=2

Bucket 0: e
Bucket 1: b, f
Bucket 2: g, d
Bucket 3: a, c
Insertion in Hash Table

Create an overflow block, if there is no space. E.g. h(k)=1

Bucket 0: e
Bucket 1: b, f -> overflow block: k
Bucket 2: g, d
Bucket 3: a, c

More overflow blocks may be needed.
Hash Table Performance

Excellent, if no overflow blocks.
Degrades considerably when the number of keys exceeds the number of buckets (i.e. many overflow blocks).
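
The effect of overflow blocks on search cost can be seen in a small simulation. This is a sketch under stated assumptions: blocks are Python lists, the class name `BlockedHashTable` is made up, and the hash function is a lookup table `H` that simply reproduces the h values of the example.

```python
class BlockedHashTable:
    def __init__(self, n_buckets, hash_fn, block_capacity=2):
        self.hash_fn = hash_fn
        self.capacity = block_capacity
        # Each bucket is a chain of blocks; the first one is the primary block.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key)]
        if len(chain[-1]) == self.capacity:
            chain.append([])          # no space left: create an overflow block
        chain[-1].append(key)

    def search(self, key):
        """Return (found, number of blocks read)."""
        reads = 0
        for block in self.buckets[self.hash_fn(key)]:
            reads += 1
            if key in block:
                return True, reads
        return False, reads

# Hash values taken from the example: h(e)=0, h(b)=h(f)=1, and so on.
H = {"e": 0, "b": 1, "f": 1, "g": 2, "a": 3, "c": 3, "d": 2, "k": 1}
table = BlockedHashTable(4, H.__getitem__)
for key in "ebfgacdk":
    table.insert(key)
```

Searching for a costs one block read, while searching for k costs two because k lives in the overflow block of bucket 1.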
Extensible Hash Table

Allows the hash table to grow, to avoid performance degradation.
Assume a hash function h that returns numbers in $\{0, \dots, 2^k - 1\}$.
Start with $n = 2^i \ll 2^k$, and only look at the first i most significant bits.
Extensible Hash Table

E.g. i=1, n=2, k=4

Directory (i=1): 0, 1
Block for 0: 0(010)
Block for 1: 1(011)

Note: we only look at the first bit (0 or 1).
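Insertion in Extensible Hash Table

Insert 1110

Directory (i=1): 0, 1
Block for 0: 0(010)
Block for 1: 1(011), 1(110)

Now insert 1010

Directory (i=1): 0, 1
Block for 0: 0(010)
Block for 1: 1(011), 1(110), 1(010) (overflows)

Need to extend the table and split blocks; i becomes 2.

Now insert 1110

Directory (i=2): 00, 01, 10, 11
Block for 00 and 01: 0(010)
Block for 10: 10(11), 10(10)
Block for 11: 11(10)

Now insert 0000, then 0101

Directory (i=2): 00, 01, 10, 11
Block for 00 and 01: 0(010), 0(000), 0(101) (overflows)
Block for 10: 10(11), 10(10)
Block for 11: 11(10)

Need to split the block.

After splitting the block

Directory (i=2): 00, 01, 10, 11
Block for 00: 00(10), 00(00)
Block for 01: 01(01)
Block for 10: 10(11), 10(10)
Block for 11: 11(10)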
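
The insert sequence above can be replayed with a compact sketch of an extensible hash table. Everything here is an illustrative assumption rather than a reference implementation: keys are their own k-bit hash values, each block holds 2 keys, and the names `Block`, `ExtensibleHashTable`, `bits`, and `directory` are invented for the example.

```python
class Block:
    def __init__(self, bits):
        self.bits = bits   # number of leading hash bits shared by all keys here
        self.keys = []

class ExtensibleHashTable:
    def __init__(self, k=4, capacity=2):
        self.k = k                 # hash values are k-bit integers
        self.capacity = capacity   # keys per block
        self.i = 1                 # the directory is indexed by the first i bits
        self.directory = [Block(1), Block(1)]

    def _slot(self, h):
        return h >> (self.k - self.i)

    def lookup(self, h):
        return h in self.directory[self._slot(h)].keys

    def insert(self, h):
        while True:
            block = self.directory[self._slot(h)]
            if len(block.keys) < self.capacity:
                block.keys.append(h)
                return
            if block.bits == self.k:
                raise OverflowError("all k bits used; an overflow block is needed")
            if block.bits == self.i:
                # Extend the table: every directory entry is duplicated.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.i += 1
            self._split(block)

    def _split(self, block):
        low, high = Block(block.bits + 1), Block(block.bits + 1)
        for key in block.keys:
            bit = (key >> (self.k - block.bits - 1)) & 1
            (high if bit else low).keys.append(key)
        for slot, b in enumerate(self.directory):
            if b is block:
                bit = (slot >> (self.i - block.bits - 1)) & 1
                self.directory[slot] = high if bit else low

table = ExtensibleHashTable(k=4, capacity=2)
for h in (0b0010, 0b1011, 0b1110, 0b1010, 0b0000, 0b0101):
    table.insert(h)
```

Note that only the insert of 1010 doubles the directory; the later split caused by 0101 reuses the existing directory because the overflowing block was still shared by entries 00 and 01.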
Performance Extensible Hash Table

No overflow blocks: access is always one read.
BUT:
Extensions can be costly and disruptive.
After an extension the table may no longer fit in memory.
Linear Hash Table

Idea: extend only one entry at a time.
Problem: n is no longer a power of 2.
Let i be such that $2^{i-1} < n \le 2^i$.
After computing h(k), use the last i bits:
If the last i bits represent a number >= n, change the msb from 1 to 0 (get a number < n).
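
The bucket-address rule can be written as a short function. This is a sketch that assumes buckets are numbered 0 to n-1 and that i is the smallest number of bits with n <= 2^i; the name `linear_hash_bucket` is made up for illustration.

```python
def linear_hash_bucket(h, n):
    """Bucket number for hash value h in a linear hash table with n buckets."""
    i = max(1, (n - 1).bit_length())   # smallest i with n <= 2**i
    m = h & ((1 << i) - 1)             # keep the last i bits of h
    if m >= n:
        m -= 1 << (i - 1)              # bucket does not exist yet: flip msb to 0
    return m

# With n=3 (i=2): last bits 11 name a bucket that does not exist, so they map to 01.
n3 = [linear_hash_bucket(h, 3) for h in (0b0100, 0b1100, 0b0111, 0b1010)]
# With n=4 (i=2): bucket 11 now exists, so no flip is needed.
n4 = linear_hash_bucket(0b0111, 4)
```

After the flip the result is always a valid bucket, because the flipped value is below 2^(i-1), which is below n.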
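Linear Hash Table Example

n=3, i=2

Bucket 00: (01)00, (11)00
Bucket 01: (01)11 (BIT FLIP: the last two bits, 11, do not name an existing bucket, so 11 becomes 01)
Bucket 10: (10)10

Linear Hash Table Example

Insert 1000: overflow blocks

Bucket 00: (01)00, (11)00 -> overflow block: (10)00
Bucket 01: (01)11
Bucket 10: (10)10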
Linear Hash Tables

Extension: independent of overflow blocks.
Extend n:=n+1 when the average number of records per block exceeds (say) 80% of the block capacity.
Linear Hash Table Extension

From n=3 to n=4

Before (n=3, i=2):
Bucket 00: (01)00, (11)00
Bucket 01: (01)11
Bucket 10: (10)10

After (n=4, i=2):
Bucket 00: (01)00, (11)00
Bucket 01: (empty)
Bucket 10: (10)10
Bucket 11: (01)11

Only need to touch one block (which one?)

Linear Hash Table Extension

From n=3 to n=4 finished.
Extension from n=4 to n=5 (new bit).
Need to touch every single block (why?)
