Hashing Techniques

Hashing
Department of Computer Science

Islamia College University Peshawar
Fall 2012 Semester

BCS course: CS 00 Analysis of Algorithms
Course Instructor: Mr. Zahid
4/7/2019 Lecture #9 Adapted from slides by Dr Onaiza Maqbol
Dictionary
 A dictionary T holds n records
 What data structure should be used to implement T?
4/7/2019 Wednesday, March 18, 2009

Direct Addressing
 Assumptions
 The set of keys is drawn from the universe U = {0, 1, …, u-1}
 Keys are distinct
 Create a table T[0..u-1]
 Benefit
 Each operation takes constant time
 Drawbacks
 The range of keys u can be large, so T may waste space when few keys are actually stored
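The direct-address table described above can be sketched as follows. This is a minimal illustration, assuming distinct integer keys in the range 0..u-1; the class name `DirectAddressTable` and the sample record are hypothetical.

```python
class DirectAddressTable:
    """Direct-address table for distinct integer keys in 0..u-1."""

    def __init__(self, u):
        # One slot per possible key, so storage grows with u, not with n.
        self.slots = [None] * u

    def insert(self, key, record):
        self.slots[key] = record      # constant time

    def search(self, key):
        return self.slots[key]        # constant time

    def delete(self, key):
        self.slots[key] = None        # constant time


table = DirectAddressTable(u=10)
table.insert(3, "record-3")
```

Every operation is a single array access, which is the benefit listed above; the drawback is the `[None] * u` allocation when u is large.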

Hashing
 Solution
 Use a hash function h to map the universe U of all keys into {0, 1, …, m-1}

Hash Table
 The mapped keys are stored in a table called a hash table
 The table consists of m cells
 A hash table requires much less storage than a direct-address table
 With direct addressing, an element with key k is stored in slot k;
with hashing, this element is stored in slot h(k)
 So the hash function is h : U → {0, 1, …, m-1}
 h(k) is also called the hash value of key k

Hashing Functions - Modulo Function
 Several functions can be used to map keys into a set of integers. The
choice is made on the basis of the computation time required and the
simplicity of the computational steps. A common choice is the
modulo function h(k), defined as:
h(k) = k mod m
where k is the key, m is some positive integer, and mod denotes the
modulus operator, which computes the remainder of key k divided by m.
 It follows that the hash function h(k) maps the set of keys {k1, k2,
k3, …, kn} into the set of integers {0, 1, 2, …, m-1}
 In essence, the modulo function is used to create a hash table of size m
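As a small illustration of h(k) = k mod m, the sketch below hashes a handful of integer keys. The table size m = 7 and the key list are assumptions chosen only for the example.

```python
def h(k, m):
    """Modulo hash function: the remainder of key k divided by m."""
    return k % m


m = 7  # illustrative table size (assumed)
keys = [10, 22, 37, 40, 60, 70, 75]

# Map every key to its slot in {0, 1, ..., m-1}.
slots = {k: h(k, m) for k in keys}
```

Here keys 40 and 75 both land in slot 5, which shows that distinct keys can share a hash value.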

Hashing Functions - Multiplication Method

Hashing of Strings

ASCII Sum Method

Radix Method

Universal Hashing

h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime large enough that
every possible key k is in the range 0 to p-1 inclusive, 0 < a < p, and
0 ≤ b < p, belongs to the family of universal hash functions
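A minimal sketch of drawing a function from this family is given below. The helper names `make_hash` and `random_hash` are hypothetical, and the parameters p = 17, m = 6, a = 3, b = 4 are assumptions chosen for illustration (p must be prime and larger than every key).

```python
import random


def make_hash(a, b, p, m):
    """Return h(k) = ((a*k + b) mod p) mod m for fixed a, b, p, m."""
    if not (0 < a < p and 0 <= b < p):
        raise ValueError("need 0 < a < p and 0 <= b < p")
    return lambda k: ((a * k + b) % p) % m


def random_hash(p, m, rng):
    """Pick a and b at random, which is what makes the scheme universal."""
    a = rng.randrange(1, p)  # 0 < a < p
    b = rng.randrange(0, p)  # 0 <= b < p
    return make_hash(a, b, p, m)


# Fixed member of the family, evaluated at key 8:
# ((3*8 + 4) mod 17) mod 6 = (28 mod 17) mod 6 = 11 mod 6 = 5
h_fixed = make_hash(a=3, b=4, p=17, m=6)
value = h_fixed(8)

# Randomly chosen member; seeded only so the example is repeatable.
h_random = random_hash(p=17, m=6, rng=random.Random(0))
```

Choosing a and b at random when the table is created means no fixed set of keys is bad for every function in the family.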

Perfect Hashing

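[Figure: primary hash table with slots 0 to 8; slot 2 points to a secondary table S2 with m2 = 4, a2 = 10, b2 = 18, holding keys 60 and 75]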
 Using perfect hashing to store {10, 22, 37, 40, 60, 70, 75}: the outer hash
function is h(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and
m = 9. For example, h(75) = 2, so key 75 hashes to slot 2 of the primary table.
Since h2(75) = 1, key 75 is stored in slot 1 of the secondary hash table S2
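The two-level lookup can be checked numerically. The sketch below uses the outer parameters stated in the text (a = 3, b = 42, p = 101, m = 9) and the secondary parameters for slot 2 from the figure (m2 = 4, a2 = 10, b2 = 18); the variable names are assumptions for illustration.

```python
def h(k, a, b, p, m):
    """Hash function ((a*k + b) mod p) mod m."""
    return ((a * k + b) % p) % m


P = 101
keys = [10, 22, 37, 40, 60, 70, 75]

# Level 1: group the keys by their slot in the primary table (m = 9).
buckets = {}
for k in keys:
    buckets.setdefault(h(k, 3, 42, P, 9), []).append(k)

# Level 2: keys in primary slot 2 go into secondary table S2 (m2 = 4).
slot_60 = h(60, 10, 18, P, 4)
slot_75 = h(75, 10, 18, P, 4)
```

Keys 60 and 75 both land in primary slot 2 but occupy different cells of S2, so a lookup needs at most two hash evaluations and no list traversal.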

Collisions
 Two or more keys may hash to the same slot
 When a record to be inserted maps to an already occupied slot in
T, a collision occurs
 Can we avoid collisions altogether?
 Not if |U| > m
 We need a method to resolve collisions that occur

Collision Resolution
 Two basic approaches to collision resolution are called chained
hashing and open address hashing
 Chained Hashing: In chained hashing the elements of a hash
table are stored in a set of linked lists.
 All colliding elements are kept in one linked list.
 The list head pointers are usually stored in an array.
 Chained hashing is also known as open hashing
 Open Address Hashing: In open address hashing, the hashed
keys are stored in the hash table itself.
 The colliding keys are allocated distinct cells in the table.
 Open address hashing is also referred to as closed hashing
Collision Resolution by Chaining
 Records in the same slot are linked into a list
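A minimal sketch of chaining is shown below. It assumes integer keys, the modulo hash function h(k) = k mod m, and insertion at the head of each list; Python lists stand in for linked lists, and the class name `ChainedHashTable` is hypothetical.

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _h(self, key):
        return key % self.m  # assumed hash function

    def insert(self, key):
        chain = self.slots[self._h(key)]
        if key not in chain:      # presence check walks the chain
            chain.insert(0, key)  # new keys go to the head of the chain

    def search(self, key):
        return key in self.slots[self._h(key)]

    def delete(self, key):
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)

    def load_factor(self):
        return sum(len(chain) for chain in self.slots) / self.m


t = ChainedHashTable(m=5)
for key in [10, 22, 37, 40, 60, 70, 75]:
    t.insert(key)
```

With m = 5, keys 22 and 37 share slot 2 and five keys share slot 0, so the load factor is 7/5 and exceeds 1, which chaining permits.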

Analysis of Hashing with Chaining
 How long does it take to search for an element with a given key?
 Let n be the number of keys in the table, and let m be the number
of slots
 Define the load factor of T to be α = n/m = average number of
keys per slot
 Analysis is in terms of α, which can be less than, equal to, or
greater than 1

Worst Hashing - Searching
 All hash keys are mapped to a single list.
 This situation may be referred to as the worst distribution of hash keys
 In practice, this extreme situation may not arise, but the possibility
does exist
 Worst-case time for searching is thus Θ(n), plus the time to compute the hash
function
 The best search time is Θ(1), when the key is found in the front node
 On average, half the list is examined (about n/2 nodes), so the average search time is also Θ(n)

Worst Hashing - Insertion
 Inserting a key at the head of a list takes Θ(1) time
 This assumes that the key is not already present in the table
 To check for presence, a search for the key is required; as just
mentioned, the worst-case time of searching is Θ(n)
 Thus the worst-case running time of insertion with a presence check is Θ(n)
 The average running time of insertion is then also Θ(n)

Simple Uniform Hashing - Searching
 The keys are uniformly distributed among all the linked lists, i.e. it is
assumed that any given element is equally likely to hash into any of the
m slots
 Let us denote the length of the list T[j], for j = 0, 1, …, m-1, by n_j, so that
n = n_0 + n_1 + … + n_{m-1} and the average value of n_j is E[n_j] = α = n/m
 We assume that the hash value h(k) can be computed in O(1) time
 So the time required to search for an element with key k depends linearly on
the length n_{h(k)} of the list T[h(k)]

 Two cases
 Unsuccessful search
 Successful search
 Unsuccessful search
 Expected time to search unsuccessfully for a key k is the expected time to search to
the end of list T[h(k)], which has the expected length E[n_{h(k)}] = α
 Thus the total time required, including computing h(k), is Θ(1 + α)

Simple Uniform Hashing - Insertion
 To find the average time for inserting a key, consider the moment
when the kth key is inserted. At that stage, the table already holds k-1 keys
distributed uniformly over the m linked lists. Thus, prior to insertion of the kth
key, the average length of each list is (k-1)/m, as shown in the diagram

 The insertion of a new key requires probing (k-1)/m keys on average, plus the cost of
adding the new key.
 Thus, the overall cost of inserting the kth key is 1 + (k-1)/m, assuming that each
operation takes unit time.
 The expected cost of inserting a key is obtained by averaging over all possible
values of k from 1 to n. Thus, the expected cost I is given by
I = (1/n) Σ_{k=1}^{n} [1 + (k-1)/m] = 1 + (n-1)/(2m)
 Since α = n/m, the average cost of inserting a key is 1 + α/2 - 1/(2m) = Θ(1 + α)
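The closed form for the average insertion cost can be checked against the direct average of 1 + (k-1)/m over k = 1, …, n. The sketch below uses exact rational arithmetic; the function names and the sample values of n and m are assumptions for illustration.

```python
from fractions import Fraction


def average_insert_cost(n, m):
    """Average of 1 + (k-1)/m over k = 1..n, computed term by term."""
    total = sum(1 + Fraction(k - 1, m) for k in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


# Sample sizes (assumed): 20 keys in 8 slots.
direct = average_insert_cost(20, 8)
formula = closed_form(20, 8)
```

Both expressions reduce to 1 + (n-1)/(2m), so `direct` and `formula` agree exactly rather than only approximately.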

 Successful search
 We assume that the element x being searched for is equally likely to be any
of the n elements stored in the table
 The number of elements examined is one more than the number of
elements that appear before x in x's list
 Elements before x in the list were all inserted after x, because new elements are placed at the front of the list
 Total time required for a successful search is 1 + α/2 - α/(2n) = Θ(1 + α)
 If n = O(m), then α = n/m = O(m)/m = O(1)
 Thus searching takes constant time on average

Open Addressing
 All elements are stored in the hash table itself
 In open addressing, the hash table can fill up, so that no further
insertions can be made
 The load factor α can never exceed 1
 The advantage is that open addressing avoids pointers altogether
 The extra memory freed provides the hash table with a larger number of
slots for the same amount of memory

Insertion
 We successively examine, or probe, the hash table until we find an
empty slot in which to put the key
 The sequence of positions probed depends upon the key being
inserted
 To determine which slots to probe, we extend the hash function to
include the probe number as a second input. Thus the hash function
becomes:
h : U × {0, 1, …, m-1} → {0, 1, …, m-1}

Pseudocode
HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
    until i = m
    error "Table full"
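HASH-INSERT can be sketched in Python as follows. The table is a list with `None` playing the role of NIL, and the probe function `h(k, i)` is passed in as a parameter; the particular probe function used in the example, (k mod m + i) mod m, is an assumption for illustration.

```python
def hash_insert(table, key, h):
    """Insert key into an open-addressed table; return the slot used."""
    m = len(table)
    for i in range(m):           # probe numbers 0, 1, ..., m-1
        j = h(key, i)
        if table[j] is None:     # empty slot found
            table[j] = key
            return j
    raise OverflowError("table full")


M = 7
table = [None] * M


def probe(k, i):
    """Assumed probe function: start at k mod M, then step by one slot."""
    return (k % M + i) % M


# 10, 17 and 24 all start at slot 3, so later keys need extra probes.
used = [hash_insert(table, k, probe) for k in (10, 17, 24)]
```

The loop stops after m probes, which corresponds to the "Table full" branch of the pseudocode.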

Linear Probing
 In linear probing, the probe position advances by one slot each time a probe
finds the slot occupied. In general the hash function is defined as
h(k, i) = (h'(k) + i) mod m, for i = 0, 1, …, m-1,
where h'(k) is an auxiliary hash function and m is the table size.
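The probe sequence produced by h(k, i) = (h'(k) + i) mod m can be listed directly. The sketch below assumes the auxiliary function h'(k) = k mod m and a table of size m = 9; both choices are illustrative.

```python
def linear_probe(k, i, m):
    """h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m assumed."""
    return (k % m + i) % m


def probe_sequence(k, m):
    """All m slots in the order linear probing examines them for key k."""
    return [linear_probe(k, i, m) for i in range(m)]


# Key 75 starts at slot 75 mod 9 = 3 and wraps around the end of the table.
sequence = probe_sequence(75, 9)
```

The sequence visits every slot exactly once, so an insertion fails only when the table is completely full.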

Searching
HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
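HASH-SEARCH can be sketched in Python in the same style. The example assumes linear probing with h'(k) = k mod m, uses `None` for NIL, and includes a small insert helper only so that the table has contents to search; all function names are hypothetical.

```python
def h(k, i, m):
    """Assumed probe function: linear probing with h'(k) = k mod m."""
    return (k % m + i) % m


def hash_insert(table, k):
    """Helper: place k in the first empty slot of its probe sequence."""
    m = len(table)
    for i in range(m):
        j = h(k, i, m)
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError("table full")


def hash_search(table, k):
    """Return the slot holding k, or None if k is not in the table."""
    m = len(table)
    for i in range(m):
        j = h(k, i, m)
        if table[j] == k:
            return j
        if table[j] is None:  # an empty slot ends the probe sequence
            return None
    return None               # all m slots probed without success


table = [None] * 7
for key in (10, 17, 24):
    hash_insert(table, key)

found = hash_search(table, 24)    # probes slots 3, 4, 5
missing = hash_search(table, 31)  # probes slots 3, 4, 5, then empty slot 6
```

The search follows the same probe sequence as insertion, so it can stop at the first empty slot: had the key been inserted, it would occupy that slot or an earlier one.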

Quadratic Probing

