Chap11. Hashing
SNU-OOPSLA-LAB
File Structure
SNU-OOPSLA Lab
Chapter Objectives
Contents(1)
11.1 Introduction
11.2 A Simple Hashing Algorithm
11.3 Hashing Functions and Record Distribution
11.4 How Much Extra Memory Should Be Used?
11.5 Collision Resolution by Progressive Overflow
11.6 Storing More Than One Record per Address: Buckets
11.7 Making Deletions
11.8 Other Collision Resolution Techniques
11.9 Patterns of Record Access
Overview
Overview(1)
Overview
Overview(2)
11.1 Introduction
Introduction
Name     ASCII code for       Product            Home
         first two letters                       address
BALL     66 65                66 x 65 = 4,290    290
LOWELL   76 79                76 x 79 = 6,004    004
TREE     84 82                84 x 82 = 6,888    888
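The rule behind this table can be sketched in Python. This is a minimal illustration, assuming the home address is taken as the last three digits of the product (consistent with LOWELL's home address of 4 shown in this section); the function name `two_letter_hash` is hypothetical.

```python
def two_letter_hash(name: str) -> int:
    """Multiply the ASCII codes of the first two letters and keep the
    last three digits of the product as the home address (assumed rule)."""
    product = ord(name[0]) * ord(name[1])
    return product % 1000  # last three digits of the product


for name in ("BALL", "LOWELL", "TREE"):
    print(name, two_letter_hash(name))
```

Any two names that share their first two letters collide under this rule, which is why it only serves as an introductory example.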
[Figure: the key K = LOWELL is passed to the hash function h(K), which yields address 4; the record LOWELL . . . is stored at address 4 of the file, which is LOWELL's home address.]
Hashing(1)
Hashing functions:
Consider a primary key consisting of
a string of 12 letters and
a file with 100,000 slots.
Since $26^{12} \gg 10^{5}$, there are far more possible keys than slots, so distinct keys must sometimes hash to the same address.
Hashing(2)
Hashing(3)
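Probability that exactly K of M keys hash to a given one of N addresses, where f = M/N:
$$\binom{M}{K}\left(\frac{1}{N}\right)^{K}\left(1-\frac{1}{N}\right)^{M-K} \approx \frac{f^{K}}{e^{f}\,K!}$$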
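As a numerical check, the exact binomial probability and the approximation f^K / (e^f K!) with f = M/N can be compared directly. The sketch below assumes keys are spread uniformly at random over the addresses; the values M = 1000 and N = 1000 and both function names are illustrative choices, not taken from the slides.

```python
import math


def binomial_prob(M: int, N: int, K: int) -> float:
    """Exact probability that exactly K of M keys land on one given address."""
    p = 1.0 / N
    return math.comb(M, K) * p**K * (1.0 - p) ** (M - K)


def poisson_approx(M: int, N: int, K: int) -> float:
    """Approximation f^K / (e^f * K!) with f = M / N."""
    f = M / N
    return f**K / (math.exp(f) * math.factorial(K))


M, N = 1000, 1000  # illustrative values (assumption)
for K in range(4):
    print(K, round(binomial_prob(M, N, K), 4), round(poisson_approx(M, N, K), 4))
```

For these values the two columns should agree to about three decimal places, which suggests the approximation is adequate for file-sized N.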
Collision
A situation in which a record is hashed to an address that does not have
sufficient room to store the record
Different key, same hash value
(different record, same address)
Perfect hashing (no collisions at all): nearly impossible to achieve in practice
Solutions
Step 1: represent the key in numerical form
e.g. LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
              L  O  W  E  L  L  ( 6 blanks )

Step 2: fold and add
76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32
(  7679 + 8769 ) mod 19937 = 16448
( 16448 + 7676 ) mod 19937 =  4187
(  4187 + 3232 ) mod 19937 =  7419
(  7419 + 3232 ) mod 19937 = 10651
( 10651 + 3232 ) mod 19937 = 13883
a = s mod n
a : home address
s : the sum produced in step 2
n : the number of addresses in the file
A prime number is usually used for the divisor because primes tend to
distribute remainders much more uniformly than do nonprimes
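The fold-and-add steps and the final division can be sketched in Python as follows. This is a minimal illustration, not the slides' own code: the function names are hypothetical, keys are assumed to be upper-case letters padded with blanks to 12 characters (so each ASCII code has two digits), and the file size n = 101 used in the example is an assumed prime.

```python
def fold_and_add(key: str, width: int = 12, limit: int = 19937) -> int:
    """Pad the key with blanks, fold it into two-character pieces, and add
    the pieces, reducing mod `limit` after every addition."""
    padded = key.upper().ljust(width)[:width]
    total = 0
    for i in range(0, width, 2):
        # Two ASCII codes side by side, e.g. 'L','O' -> 76 79 -> 7679
        piece = ord(padded[i]) * 100 + ord(padded[i + 1])
        total = (total + piece) % limit
    return total


def home_address(key: str, n: int) -> int:
    """a = s mod n, where s is the folded sum and n the number of addresses."""
    return fold_and_add(key) % n


print(fold_and_add("LOWELL"))       # folded sum s
print(home_address("LOWELL", 101))  # home address for an assumed n = 101
```

Reducing mod 19937 after each addition keeps the running sum small, which mirrors the intermediate results listed for LOWELL.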
[Figure: distribution of records A to G among addresses 1 to 10; panels shown: Worst and Acceptable]
Better-than-random
Packing density = (# of records) / (# of spaces) = r / N
Poisson Distribution
$$p(x) = \frac{(r/N)^{x}\, e^{-r/N}}{x!}$$
(Poisson distribution)
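The Poisson function can be used to estimate how many addresses receive exactly x records. The following Python sketch assumes a hash function that spreads keys randomly; the values r = 1000 and N = 1000 and the function name `poisson` are illustrative assumptions.

```python
import math


def poisson(x: int, r: int, N: int) -> float:
    """p(x) = (r/N)^x * e^(-r/N) / x! for r records and N addresses."""
    d = r / N
    return d**x * math.exp(-d) / math.factorial(x)


r, N = 1000, 1000  # illustrative values (assumption)
for x in range(4):
    # Expected number of addresses with exactly x records assigned
    print(x, round(N * poisson(x, r, N)))
```

Multiplying p(x) by N turns a per-address probability into an expected count over the whole file.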
Packing density r/N (%)    Synonyms as % of records
 10                         4.8
 20                         9.4
 30                        13.6
 40                        17.6
 50                        21.4
 60                        24.8
 70                        28.1
 80                        31.2
 90                        34.1
100                        36.8
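Figures of this kind can be estimated from the Poisson function. The sketch below assumes one record per address and random hashing: with d = r/N, the expected overflow per address is the sum over x >= 2 of (x - 1) p(x), which simplifies to d - 1 + e^(-d). The function name is hypothetical, and the computed values may differ from the table in the last digit because of rounding.

```python
import math


def synonym_percent(density: float) -> float:
    """Expected synonyms as a percentage of all records, for one record per
    address and packing density `density` = r/N (as a fraction)."""
    overflow_per_address = density - 1.0 + math.exp(-density)
    return 100.0 * overflow_per_address / density


for percent in range(10, 101, 10):
    print(percent, round(synonym_percent(percent / 100.0), 1))
```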
Progressive Overflow(Cont'd)
[Figure: the hash routine maps the key York to its home address in a file that already holds Novak . . ., Rosen . . ., Jasper . . ., and Morely . . .; York's home address is busy, the 2nd and 3rd tries are also busy, and the 4th try finds an open slot, which becomes York's actual address.]
Progressive Overflow(Cont'd)
[Figure: wrapping around. The hash routine maps the key Blue to address 99, the last address in the file (addresses 0 to 99), which already holds Jello...; the search for an open slot wraps around to address 0.]
Progressive Overflow(Cont'd)
Worst case
When the record does not exist and the file is full: every address must be examined before the search can stop
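Progressive overflow, including wrap-around and the worst case, can be sketched in Python as follows. This is a minimal in-memory illustration: the file is modeled as a list, home addresses are passed in directly because the hash routine is not specified here, and the function names are hypothetical.

```python
from typing import List, Optional


def insert(table: List[Optional[str]], home: int, key: str) -> int:
    """Store key at its home address or at the next open address, wrapping
    from the last address back to address 0. Returns the actual address."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


def search(table: List[Optional[str]], home: int, key: str) -> Optional[int]:
    """Return the address holding key, or None if it is not in the file."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            return None  # an open address ends the search early
        if table[addr] == key:
            return addr
    return None  # worst case: file is full and all n addresses were examined


table: List[Optional[str]] = [None] * 100
insert(table, 99, "Jello")
print(insert(table, 99, "Blue"))  # address 99 is busy, so the insert wraps around
```

The search relies on an open address as a stop signal, so it does not yet handle deleted records.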
Progressive Overflow(Cont'd)
- Search length : # of accesses required to retrieve a record (from secondary memory)

Actual address    Record       Home address    Search length
20                Adams. . .   20              1
21                Bates. . .   21              1
22                Cole. . .    21              2
23                Dean. . .    22              2
24                Evans. . .   20              5
25                ...          ...             ...

Average search length = (1 + 1 + 2 + 2 + 5) / 5 = 2.2
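The average search length for the five records Adams to Evans can be reproduced by simulating their insertion. The sketch below assumes the records arrive in alphabetical order and that the file has 100 addresses (an assumed size); the function name is hypothetical.

```python
def average_search_length(records, table_size):
    """records: (key, home address) pairs in insertion order.
    Each record's search length is the number of addresses examined to
    reach the slot where progressive overflow stored it."""
    if len(records) > table_size:
        raise ValueError("more records than addresses")
    table = [None] * table_size
    total = 0
    for key, home in records:
        addr = home % table_size
        probes = 1
        while table[addr] is not None:  # busy: try the next address
            addr = (addr + 1) % table_size
            probes += 1
        table[addr] = key
        total += probes
    return total / len(records)


records = [("Adams", 20), ("Bates", 21), ("Cole", 21), ("Dean", 22), ("Evans", 20)]
print(average_search_length(records, 100))
```

Evans accounts for almost half of the total because its home address sits at the start of a run of occupied slots.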
Progressive Overflow(Cont'd)
[Figure: average search length (vertical axis, 1 to 5) versus packing density in % (horizontal axis, 20 to 100)]
Key      Home address
Green    30
Hall     30
Jenks    32
King     33
Land     33
Marks    33
Nutt     33

Bucket address    Bucket contents
30                Green ... | Hall ...
31                (empty)
32                Jenks ...
33                King ... | Land ... | Marks ...   (Nutt ... is an overflow record)
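The bucket example can be sketched in Python. This illustration assumes a bucket size of 3 (the number of records shown in bucket 33) and 34 bucket addresses (an assumed file size), and it only collects overflow records instead of deciding where they are stored; the function name is hypothetical.

```python
def load_buckets(records, n_buckets, bucket_size):
    """Place each (key, home address) pair in its home bucket if there is
    room; otherwise record the key as an overflow record."""
    buckets = [[] for _ in range(n_buckets)]
    overflow = []
    for key, home in records:
        if len(buckets[home]) < bucket_size:
            buckets[home].append(key)
        else:
            overflow.append(key)
    return buckets, overflow


records = [("Green", 30), ("Hall", 30), ("Jenks", 32), ("King", 33),
           ("Land", 33), ("Marks", 33), ("Nutt", 33)]
buckets, overflow = load_buckets(records, n_buckets=34, bucket_size=3)
for addr in range(30, 34):
    print(addr, buckets[addr])
print("overflow:", overflow)
```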
N : # of addresses
b : # of records that fit in a bucket
bN : # of available locations for records
Packing density = r/bN
# of overflow records, where p(x) is the Poisson function:
$$N \times [\,1 \times p(b+1) + 2 \times p(b+2) + 3 \times p(b+3) + \cdots\,]$$
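The overflow formula can be evaluated numerically to compare bucket sizes at the same packing density. In the sketch below the series is truncated after 100 terms, which is ample when r/N is small; the function name and the sample values (r = 750 with N = 1000, b = 1 versus N = 500, b = 2, both at 75% packing density) are illustrative assumptions.

```python
import math


def expected_overflow(r: int, N: int, b: int, terms: int = 100) -> float:
    """N * [1*p(b+1) + 2*p(b+2) + 3*p(b+3) + ...], where p(x) is the
    Poisson function with mean r/N."""
    mean = r / N
    p = math.exp(-mean)  # p(0)
    total = 0.0
    for x in range(1, b + terms + 1):
        p *= mean / x  # p(x) computed from p(x - 1)
        if x > b:
            total += (x - b) * p
    return N * total


for N, b in ((1000, 1), (500, 2)):
    overflow = expected_overflow(750, N, b)
    print(N, b, round(100.0 * overflow / 750, 1))  # overflow as % of records
```

Under these assumptions the two configurations give roughly 29.6% and 18.7% overflow, so larger buckets reduce overflow even though the packing density is unchanged.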
Bucket Implementation
Collision counter <= bucket size

An empty bucket :   0 | /////  | /////     | /////    | ///// | /////
Two entries     :   2 | JONES  | ARNSWORTH | /////    | ///// | /////
A full bucket   :   5 | JONES  | ARNSWORTH | STOCKTON | BRICE | TROOP
Bucket Implementation(Cont'd)
Problems when
Making Deletions
Record    Home address    Actual address
Adams     5               5
Jones     6               6
Morris    6               7
Smith     5               8

Delete Morris: the slot is marked with a tombstone (######) rather than emptied, so that a search for Smith does not stop at address 7.
5   Adams...
6   Jones...
7   ######
8   Smith...
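Deletion with a marker can be sketched in Python as follows. This is a minimal in-memory illustration: the file has 10 addresses (an assumed size), home addresses are passed in directly, the names `TOMBSTONE`, `find`, and `delete` are hypothetical, and no real key is assumed to equal the marker string.

```python
TOMBSTONE = "######"  # marks a slot whose record was deleted


def find(table, home, key):
    """Search from the home address; stop at a truly empty slot (None) but
    keep going past tombstones."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        slot = table[addr]
        if slot is None:
            return None
        if slot == key:
            return addr
    return None


def delete(table, home, key):
    """Replace the record with a tombstone. Returns True if it was found."""
    addr = find(table, home, key)
    if addr is None:
        return False
    table[addr] = TOMBSTONE
    return True


table = [None] * 10
table[5:9] = ["Adams", "Jones", "Morris", "Smith"]
delete(table, 6, "Morris")
print(table[5:9])
print(find(table, 5, "Smith"))  # the search continues past the tombstone
```

Had the slot been reset to empty instead, the search for Smith from home address 5 would stop at address 7 and report the record as missing.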
Overflow File
Linear Probing(1)
key  :  A  S  E  A  R  C  H  I  N  G  E  X  A  M  P  L  E
hash :  1  0  5  1 18  3  8  9 14  7  5  5  1 13 16 12  5
[Figure: contents of the memory space (addresses 0 to 18) after each step of the insertion sequence; in the final state addresses 1 to 4 hold A A C A and addresses 5 to 11 hold E E G H I X E]
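The insertion sequence for these keys can be replayed with a short Python sketch. It takes the hash values exactly as listed, assumes a memory space of 19 addresses (0 to 18), and probes one address at a time on a collision; the function name is hypothetical.

```python
KEYS = "ASEARCHINGEXAMPLE"
HASHES = [1, 0, 5, 1, 18, 3, 8, 9, 14, 7, 5, 5, 1, 13, 16, 12, 5]


def linear_probe_insert_all(keys, hashes, size=19):
    """Insert keys in order; on a collision try the next address (mod size).
    Assumes fewer keys than addresses, so an open slot always exists."""
    table = [None] * size
    for key, h in zip(keys, hashes):
        addr = h
        while table[addr] is not None:
            addr = (addr + 1) % size
        table[addr] = key
    return table


table = linear_probe_insert_all(KEYS, HASHES)
for addr, slot in enumerate(table):
    print(addr, slot if slot is not None else "-")
```

The final state groups A A C A in addresses 1 to 4 and E E G H I X E in addresses 5 to 11, showing how linear probing builds long runs of occupied slots.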
Rehashing(1)
Rehashing(where P=3)(2)
key  :  A  S  E  A  R  C  H  I  N  G  E  X  A  M  P  L  E
hash :  1  0  5  1 18  3  8  9 14  7  5  5  1 13 16 12  5
[Figure: contents of the memory space (addresses 0 to 18) after each step of the insertion sequence, using rehashing with P = 3]
[Figure: records R1 to R5 placed around addresses i and i+1; finally there are 2 chains: R1 - R2 - R4 and R3 - R5]
fast accesses
fewer collisions