
Hashing

Department of Computer Science


Islamia College University Peshawar

Fall 2012 Semester


BCS course: CS 00 Analysis of Algorithms

Course Instructor: Mr. Zahid

4/7/2019 Lecture #9 Adapted from slides by Dr Onaiza Maqbol

Dictionary
• A dictionary T holds n records

• What data structure should be used to implement T?

Hashing

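Direct Addressing
• Assumptions
    • The set of keys is drawn from the universe U = {0, 1, …, u-1}
    • Keys are distinct

• Create a table T[0..u-1] in which slot k holds the record with key k

• Benefit
    • Each operation takes constant time

• Drawbacks
    • The range of keys can be large, so the table may need far more slots than there are records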
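
The idea of direct addressing can be sketched in a few lines of Python. This is only an illustration: the class name DirectAddressTable, its method names, and the use of None for an empty slot are assumptions, not part of the original slides.

```python
class DirectAddressTable:
    """Direct-address table T[0..u-1]: slot k holds the record whose key is k."""

    def __init__(self, u):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * u

    def insert(self, key, record):
        self.slots[key] = record   # constant time

    def search(self, key):
        return self.slots[key]     # constant time; None if the key is absent

    def delete(self, key):
        self.slots[key] = None     # constant time


table = DirectAddressTable(u=10)
table.insert(3, "record for key 3")
```

Every operation is a single array access, but the table needs u slots even if only a handful of keys are ever stored.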

Hashing
• Solution
    • Use a hash function h to map the universe U of all keys into {0, 1, …, m-1}

Hash Table
• The mapped keys are stored in a table called a hash table

• The table consists of m cells

• A hash table typically requires much less storage than a direct-address table

• With direct addressing, an element with key k is stored in slot k; with hashing, this element is stored in slot h(k)
• So the hash function is h : U → {0, 1, …, m-1}
• h(k) is also called the hash value of key k

Hashing Functions - Modulo Function
• Several functions can be used to map keys into a set of integers. The choice is made on the basis of the computation time required and the simplicity of the computational steps. A common choice is the modulo function h(k), defined as:

h(k) = k mod m

where k is the key, m is some positive integer, and mod denotes the modulus operator, which computes the remainder of key k divided by m.

• It follows that the hash function h(k) maps the set of keys {k1, k2, k3, …, kn} into the set of integers {0, 1, 2, …, m-1}

• In essence, the modulo function is used to create a hash table of size m
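
A short Python sketch of the modulo function follows. The table size m = 7 and the sample keys are chosen only for illustration.

```python
def h(k, m):
    """Modulo hash: the remainder of key k divided by table size m."""
    return k % m


m = 7                                   # illustrative table size
keys = [10, 22, 37, 40, 60, 70, 75]     # illustrative keys
slots = [h(k, m) for k in keys]         # hash value of each key, in order
```

With these values, keys 40 and 75 both map to slot 5, which shows that distinct keys can receive the same hash value.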

Modulo Function (contd…)

Hashing Functions - Multiplication
Method

Hashing of Strings

ASCII Sum Method

Radix Method

Universal Hashing

Universal Hashing (contd…)

• The function h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime large enough that every possible key k is in the range 0 to p-1, inclusive, and 0 < a < p and 0 ≤ b < p, belongs to a universal family of hash functions
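
The family h_{a,b} can be sketched in Python as follows. The helper name make_universal_hash and the sample values p = 17, m = 6, a = 3, b = 4 are assumptions chosen for illustration; p is assumed to be prime and larger than every key.

```python
import random


def make_universal_hash(p, m, a=None, b=None):
    """Return h(k) = ((a*k + b) mod p) mod m.

    p is assumed to be prime and larger than every key.
    If a or b is not supplied, it is drawn at random from its allowed range.
    """
    if a is None:
        a = random.randrange(1, p)   # 0 < a < p
    if b is None:
        b = random.randrange(0, p)   # 0 <= b < p
    if not (0 < a < p and 0 <= b < p):
        raise ValueError("need 0 < a < p and 0 <= b < p")

    def h(k):
        return ((a * k + b) % p) % m

    return h


h_fixed = make_universal_hash(p=17, m=6, a=3, b=4)   # fixed member of the family
h_random = make_universal_hash(p=17, m=6)            # randomly chosen member
```

Choosing a and b at random when the table is created means that no single fixed set of keys is bad for every run.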

Perfect Hashing

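[Figure: two-level perfect hash table with primary slots 0 to 8. Slot 2 has secondary parameters m2 = 4, a2 = 10, b2 = 18 and a secondary table S2 holding 60 and 75.]

• Using perfect hashing to store {10, 22, 37, 40, 60, 70, 75}, the outer hash function is h_{a,b}(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and m = 9. For example, h(75) = 2. Since h2(75) = 1, 75 is stored in slot 1 of the secondary hash table S2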
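
The numbers in this example can be checked with a small Python sketch. The parameters a = 3, b = 42, p = 101, m = 9 and m2 = 4, a2 = 10, b2 = 18 come from the example above; the helper make_hash, the assumption that the secondary function also uses p = 101, and the variable names are illustrative only.

```python
P = 101  # prime from the example; assumed to be shared by both levels


def make_hash(a, b, m, p=P):
    """Return h(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m


keys = [10, 22, 37, 40, 60, 70, 75]
h = make_hash(a=3, b=42, m=9)            # outer hash function

# Group the keys by the primary slot they hash to.
buckets = {}
for k in keys:
    buckets.setdefault(h(k), []).append(k)

# Secondary hash function for primary slot 2 (m2 = 4, a2 = 10, b2 = 18).
h2 = make_hash(a=10, b=18, m=4)
secondary = [None] * 4
for k in buckets[2]:
    assert secondary[h2(k)] is None      # no collision at the second level
    secondary[h2(k)] = k
```

Keys 60 and 75 both reach primary slot 2, and the secondary function places them in different cells of S2.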

Collisions
• Two or more keys may hash to the same slot

• When a record to be inserted maps to an already occupied slot in T, a collision occurs

• Can we avoid collisions altogether?

• Not if |U| > m

• We need a method to resolve the collisions that occur

Collision Resolution
• Two basic approaches to collision resolution are called chained hashing and open address hashing

• Chained Hashing: In chained hashing, the elements of a hash table are stored in a set of linked lists.
    • All elements that hash to the same slot are kept in one linked list.
    • The list head pointers are usually stored in an array.
    • Chained hashing is also known as open hashing

• Open Address Hashing: In open address hashing, the hashed keys are stored in the hash table itself.
    • The colliding keys are allocated distinct cells in the table.
    • Open address hashing is also referred to as closed hashing
Collision Resolution by Chaining
• Records that hash to the same slot are linked into a list
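
A minimal Python sketch of chaining is given below. The class names Node and ChainedHashTable, the choice of k mod m as the hash function, and the sample keys are assumptions made for illustration.

```python
class Node:
    """One cell of a singly linked list."""

    def __init__(self, key, next=None):
        self.key = key
        self.next = next


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m          # array of list head pointers

    def _h(self, key):
        return key % self.m              # assumed hash function

    def insert(self, key):
        # Insert at the head of the list; no check for duplicates.
        j = self._h(key)
        self.heads[j] = Node(key, self.heads[j])

    def search(self, key):
        # Walk the list for the key's slot; return the node or None.
        node = self.heads[self._h(key)]
        while node is not None and node.key != key:
            node = node.next
        return node

    def chain(self, j):
        # Collect the keys stored in slot j, front to back.
        keys, node = [], self.heads[j]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


t = ChainedHashTable(m=7)
for k in [10, 22, 37, 40, 60, 70, 75]:
    t.insert(k)
```

Keys 40 and 75 both hash to slot 5 here, so they share one list, with the more recently inserted key at the front.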

Collision Resolution by Chaining (contd…)

Analysis of Hashing with Chaining
• How long does it take to search for an element with a given key?

• Let n be the number of keys in the table, and let m be the number of slots

• Define the load factor of T to be α = n/m = average number of keys per slot

• Analysis is in terms of α, which can be less than, equal to, or greater than 1

Worst Hashing - Searching

• All keys hash to the same slot, so they all end up in a single list.

• This situation may be referred to as the worst distribution of hash keys

• In practice, this extreme situation may not arise, but the possibility does exist

• Worst case time for searching is thus Θ(n), plus the time to compute the hash function
• The best search time is Θ(1), which occurs when the key is found in the front node
• On average, half the list will be examined. Thus, average search time is Θ(n)
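
The three cases can be illustrated by counting comparisons in a single list. In this sketch a Python list stands in for the one linked list that holds every key, and n = 100 is an arbitrary choice.

```python
def search_comparisons(chain, key):
    """Count the key comparisons made by a linear scan of one chain."""
    count = 0
    for stored in chain:
        count += 1
        if stored == key:
            break
    return count


n = 100
chain = list(range(n))      # all n keys have hashed to the same slot

best = search_comparisons(chain, chain[0])       # key is in the front node
worst = search_comparisons(chain, chain[-1])     # key is in the last node
average = sum(search_comparisons(chain, k) for k in chain) / n
```

The average comes out to roughly n/2 comparisons, which still grows linearly with n.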

Worst Hashing - Insertion
• Inserting at the head of a list takes Θ(1) time in the worst case

• This assumes that the key is not already present in the table

• To check presence, a search for the key is required. As just mentioned, the worst case time of searching is Θ(n)

• Thus the worst case running time of insertion with a presence check is Θ(n)

• The average running time of insertion with a presence check is also Θ(n)

Simple Uniform Hashing - Searching
• The keys are uniformly distributed among all the linked lists, i.e. it is assumed that any given element is equally likely to hash into any of the m slots

• Let us denote the length of the list T[j] for j = 0, 1, …, m-1 by n_j, so that n = n_0 + n_1 + … + n_{m-1} and the average value of n_j is E[n_j] = α = n/m

• We assume that the hash value h(k) can be computed in O(1) time

• So the time required to search for an element with key k depends linearly on the length n_{h(k)} of the list T[h(k)]

Simple Uniform Hashing - Searching

• Two cases
    • Unsuccessful search
    • Successful search

• Unsuccessful search
    • The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length E[n_{h(k)}] = α
    • Thus the total time required, including the time to compute h(k), is Θ(1 + α)

Simple Uniform Hashing - Insertion
• In order to find the average time for inserting a key, let us consider the case when the kth key is inserted. At that stage, the table already holds k-1 keys distributed uniformly over m linked lists. Thus, prior to insertion of the kth key, the average length of each list is (k-1)/m, as shown in the diagram

Simple Uniform Hashing - Insertion
• The insertion of a new key would require probing of (k-1)/m keys plus the cost of adding the new key.

• Thus, the overall cost of insertion of the kth key is 1 + (k-1)/m, assuming that each operation takes unit time.

• The expected cost of inserting a key is obtained by averaging over all possible values of k, from 1 to n. Thus, the expected cost I is given by

I = (1/n) Σ_{k=1..n} [1 + (k-1)/m] = 1 + (n-1)/(2m)

• The average cost of inserting a key is 1 + α/2 - 1/(2m) = Θ(1 + α)
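
The closed form can be checked numerically. The sketch below averages the per-key cost 1 + (k-1)/m over k = 1, …, n with exact fractions and compares it with 1 + α/2 - 1/(2m); the function names and the sample values n = 50, m = 10 are illustrative assumptions.

```python
from fractions import Fraction


def average_insertion_cost(n, m):
    """Average of 1 + (k-1)/m over k = 1..n, computed exactly."""
    total = sum(1 + Fraction(k - 1, m) for k in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


n, m = 50, 10
direct = average_insertion_cost(n, m)
formula = closed_form(n, m)
```

Both expressions agree for these values, as they do for any positive n and m.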


Simple Uniform Hashing - Searching
• Successful search
    • We assume that the element x being searched for is equally likely to be any of the n elements stored in the table
    • The number of elements examined is one more than the number of elements that appear before x in x's list
    • Elements before x in the list were all inserted after x was inserted, since new elements are placed at the front of the list

• Total time required for a successful search is 1 + α/2 - α/(2n) = Θ(1 + α)

• If n = O(m), then α = n/m = O(m)/m = O(1)
• Thus searching takes constant time on average
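
The successful-search bound can be derived as follows. This sketch follows the standard textbook argument and assumes the elements are numbered x_1, …, x_n in order of insertion, so that each later element x_j lands in the same list as x_i with probability 1/m.

```latex
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n}\left(1 + \sum_{j=i+1}^{n}\frac{1}{m}\right)
  &= 1 + \frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
  &= 1 + \frac{1}{nm}\cdot\frac{n(n-1)}{2} \\
  &= 1 + \frac{n-1}{2m} \\
  &= 1 + \frac{\alpha}{2} - \frac{\alpha}{2n}
\end{align*}
```

The last step uses α = n/m, so that n/(2m) = α/2 and 1/(2m) = α/(2n).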

Open Addressing
• All elements are stored in the hash table itself

• In open addressing, the hash table can fill up, so that no further insertions can be made

• The load factor α can never exceed 1

• The advantage is that open addressing avoids pointers altogether

• The memory freed by not storing pointers provides the hash table with a larger number of slots for the same amount of memory

Insertion
• We successively examine, or probe, the hash table until we find an empty slot in which to put the key

• The sequence of positions probed depends upon the key being inserted

• To determine which slots to probe, we extend the hash function to include the probe number as a second input. Thus the hash function becomes:
h : U × {0, 1, …, m-1} → {0, 1, …, m-1}

Pseudocode
HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
    until i = m
    error "Table full"
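
HASH-INSERT translates almost directly into Python. In this sketch None stands for NIL, the probe function h(k, i) is passed in as a parameter, and the sample probe function (k + i) mod m is an assumption chosen only so that the example runs.

```python
def hash_insert(T, k, h):
    """Insert key k into open-addressed table T and return the slot used.

    h(k, i) gives the slot examined on probe number i.
    None plays the role of NIL.
    """
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] is None:
            T[j] = k
            return j
    raise OverflowError("Table full")


m = 5
probe = lambda k, i: (k + i) % m      # assumed probe function, for illustration
T = [None] * m
slots = [hash_insert(T, k, probe) for k in (7, 12, 3)]
```

Key 12 first probes slot 2, finds it occupied by 7, and moves on to slot 3; key 3 is pushed on to slot 4 in the same way.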

Linear Probing
• In linear probing, the slot given by the auxiliary hash function is incremented by the probe number i, wrapping around at the end of the table. In general the hash function is defined as
h(k, i) = (h'(k) + i) mod m,
where h'(k) is an auxiliary hash function, i = 0, 1, …, m-1 is the probe number, and m is the table size.
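
The probe sequence produced by this formula can be listed with a short Python sketch. The auxiliary function h'(k) = k mod m and the values m = 7 and k = 10 are assumptions chosen for illustration.

```python
def h_aux(k, m):
    """Assumed auxiliary hash function h'(k) = k mod m."""
    return k % m


def h(k, i, m):
    """Linear probing: h(k, i) = (h'(k) + i) mod m."""
    return (h_aux(k, m) + i) % m


m = 7
sequence = [h(10, i, m) for i in range(m)]   # slots probed for key 10, in order
```

The sequence starts at h'(10) = 3, steps through consecutive slots, and wraps around so that every slot is visited exactly once.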

Linear Probing (contd…)

Searching
HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
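
HASH-SEARCH can be sketched in Python in the same style. None stands for NIL, the probe function (k + i) mod m is an assumption, and the table T shown is the state that results from inserting 7, 12, and 3 in that order under this probe function.

```python
def hash_search(T, k, h):
    """Return the slot holding key k, or None if k is not in table T.

    h(k, i) gives the slot examined on probe number i.
    """
    m = len(T)
    for i in range(m):
        j = h(k, i)
        if T[j] == k:
            return j
        if T[j] is None:      # an empty slot ends the probe sequence
            break
    return None


m = 5
probe = lambda k, i: (k + i) % m      # assumed probe function, for illustration
T = [None, None, 7, 12, 3]

found = hash_search(T, 12, probe)     # probes slot 2, then finds 12 in slot 3
missing = hash_search(T, 2, probe)    # probes slots 2, 3, 4, then empty slot 0
```

The search follows the same probe sequence that insertion used, so it can stop as soon as it reaches an empty slot.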

Quadratic Probing
