
Problem

RT&T is a large phone company, and they want to provide enhanced
caller ID capability:

- given a phone number, return the caller's name
- phone numbers are in the range 0 to $R = 10^{10} - 1$
- n is the number of phone numbers used
- want to do this as efficiently as possible

We know two ways to design this dictionary:
- a balanced search tree (AVL, red-black) or a skip list with the phone
  number as the key has O(log n) query time and O(n) space: good space
  usage and search time, but can we reduce the search time to constant?
- a bucket array indexed by the phone number has optimal O(1) query time,
  but there is a huge amount of wasted space: O(n + R)

Another Solution
A hash table is an alternative solution with O(1) expected query time and O(n + N)
space, where N is the size of the table.
- Like an array, but with a function to map the large range of keys into a smaller one
- e.g., take the original key, mod the size of the table, and use that as an index
- Insert item (4018637639, Roberto) into a table of size 5
- 4018637639 mod 5 = 4, so item (4018637639, Roberto) is stored in slot 4 of the table

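- A lookup uses the same process: map the key to an index, then check the array
  cell at that index
- Insert (4018639350, Andy): 4018639350 mod 5 = 0, so it goes in slot 0
- And insert (4018632234, Devin): 4018632234 mod 5 = 4, but slot 4 already holds Roberto. We have a collision!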
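
The three insertions above can be replayed in a few lines. This is only a sketch of the mod-based indexing: the names `index_for`, `table`, and `collisions` are illustrative and not from the slides, and a colliding item is simply recorded rather than resolved.

```python
TABLE_SIZE = 5  # table size used in the example


def index_for(phone_number, table_size=TABLE_SIZE):
    """Map a phone number to a table index with the 'key mod table size' rule."""
    return phone_number % table_size


entries = [
    (4018637639, "Roberto"),
    (4018639350, "Andy"),
    (4018632234, "Devin"),
]

table = [None] * TABLE_SIZE
collisions = []  # items whose slot was already taken

for number, name in entries:
    slot = index_for(number)
    if table[slot] is None:
        table[slot] = (number, name)
    else:
        # No resolution strategy yet: just note that the slot is occupied.
        collisions.append((number, name, slot))
    print(number, name, "-> slot", slot)

print("collisions:", collisions)
```

Running it shows Roberto and Devin both landing on slot 4, which is the collision the slide refers to.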

Collision Resolution
How to deal with two keys which map to the same cell of the array?
- Use chaining
- Set up lists of items with the same index

The expected search/insertion/removal time is
O(n/N), provided the indices are uniformly distributed.
The performance of the data structure can be fine-tuned by changing
the table size N.
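
A minimal sketch of chaining, assuming integer keys and the simple mod rule as the hash function. The class name `ChainedHashTable` and its methods are hypothetical, chosen for illustration; each table cell holds a Python list that plays the role of the chain.

```python
class ChainedHashTable:
    """Hash table with separate chaining (illustrative sketch)."""

    def __init__(self, table_size=5):
        # One list ("chain") per table cell.
        self.buckets = [[] for _ in range(table_size)]
        self.count = 0

    def _index(self, key):
        return key % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self.count += 1

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return None

    def remove(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, value) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                self.count -= 1
                return value
        return None

    def load_factor(self):
        """Average chain length n/N."""
        return self.count / len(self.buckets)


phone_book = ChainedHashTable(table_size=5)
phone_book.put(4018637639, "Roberto")
phone_book.put(4018639350, "Andy")
phone_book.put(4018632234, "Devin")  # shares slot 4 with Roberto
print(phone_book.get(4018632234), phone_book.load_factor())
```

Each operation scans one chain, so its cost tracks the chain length; growing the table shortens the average chain, which is the tuning knob mentioned above.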

From Keys to Indices

- The mapping of keys to indices of a hash table is called a hash function
- A hash function is usually the composition of two maps, a hash code map (key to integer) and a compression map (integer to an index in [0, N - 1]).
- An essential requirement of the hash function is to map equal keys to equal indices
- A good hash function minimizes the probability of collisions
- Java provides a hashCode() method for the Object class, which typically returns the 32-bit memory address of the object.
- This default hash code would work poorly for Integer and String objects, since two distinct objects holding equal values would get different hash codes
- The hashCode() method should be suitably redefined by classes.
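
As an analogy only: in Python, `__hash__` and `__eq__` play roughly the role that `hashCode()` and `equals()` play in Java. The sketch below uses a hypothetical `PhoneNumber` class whose hash code is derived from its value rather than its identity, then applies a mod compression map, so that equal keys reach equal indices.

```python
class PhoneNumber:
    """Hypothetical key type whose hash code depends on its value."""

    def __init__(self, digits):
        self.digits = digits

    def __eq__(self, other):
        return isinstance(other, PhoneNumber) and self.digits == other.digits

    def __hash__(self):
        # Hash code map: key -> integer, based on the stored value.
        return hash(self.digits)


def hash_function(key, table_size):
    """Composition: hash code map followed by a mod compression map."""
    return hash(key) % table_size


a = PhoneNumber(4018637639)
b = PhoneNumber(4018637639)  # a distinct object with an equal value
print(a is b, a == b, hash_function(a, 5) == hash_function(b, 5))
```

If `__hash__` were left identity-based while `__eq__` compared values, `a` and `b` could land in different cells and a lookup with `b` would miss the entry stored under `a`.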

Popular Hash Code Maps
- Integer cast: for numeric types with 32 bits or less, we can reinterpret the
  bits of the number as an int
- Component sum: for numeric types with more than 32 bits (e.g., long and
  double), we can add the 32-bit components.
- Polynomial accumulation: for strings of a natural language, combine the
  character values (ASCII or Unicode) $a_0 a_1 \dots a_{n-1}$ by viewing them as the
  coefficients of a polynomial:
  $a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$
- The polynomial is computed with Horner's rule, ignoring overflows, at a
  fixed value x:
  $a_0 + x(a_1 + x(a_2 + \dots + x(a_{n-2} + x\,a_{n-1}) \dots ))$
- The choice x = 33, 37, 39, or 41 gives at most 6 collisions on a vocabulary of
  50,000 English words
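
A short sketch of polynomial accumulation with Horner's rule. Two choices here are assumptions rather than anything stated in the slides: "ignoring overflows" is imitated by masking to 32 unsigned bits, and the function name `polynomial_hash` is illustrative.

```python
MASK_32 = 0xFFFFFFFF  # keep only the low 32 bits, imitating integer overflow


def polynomial_hash(text, x=33):
    """Evaluate a_0 + a_1*x + ... + a_(n-1)*x^(n-1) with Horner's rule."""
    h = 0
    # Horner's rule works from the innermost term a_(n-1) outward to a_0.
    for ch in reversed(text):
        h = (h * x + ord(ch)) & MASK_32
    return h


def polynomial_hash_direct(text, x=33):
    """Term-by-term evaluation, used only to cross-check Horner's rule."""
    return sum(ord(ch) * x**i for i, ch in enumerate(text)) & MASK_32


for word in ["stop", "pots", "hash"]:
    print(word, polynomial_hash(word), polynomial_hash_direct(word))
```

Because each character is weighted by a different power of x, reordering the letters of a word changes the result, unlike a plain sum of character values.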

Why is the component sum hash code bad for strings?
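
One way to see the answer: a sum does not depend on the order of its terms, so every anagram of a word gets the same hash code. The sketch below treats each character value as one component, which is an assumption made for illustration.

```python
def component_sum_hash(text):
    """Add up the character values; the order of characters is ignored."""
    return sum(ord(ch) for ch in text)


words = ["stop", "tops", "pots", "spot"]
codes = {word: component_sum_hash(word) for word in words}
print(codes)
```

All four words share one hash code, so they would always collide no matter which compression map or table size is used.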

Popular Compression Maps
- Division: h(k) = |k| mod N
  - the choice N = 2^k is bad because not all the bits are taken into
    account
  - the table size N is usually chosen as a prime number
  - certain patterns in the hash codes are propagated

- Multiply, Add, and Divide (MAD): h(k) = |ak + b| mod N
  - eliminates patterns provided a mod N ≠ 0
  - same formula used in linear congruential (pseudo) random number
    generators
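
Both compression maps fit in a few lines. This is a sketch: the function names and the sample values of a and b are illustrative choices, not taken from the slides.

```python
def division_compress(hash_code, table_size):
    """Division method: h(k) = |k| mod N."""
    return abs(hash_code) % table_size


def mad_compress(hash_code, table_size, a, b):
    """Multiply, Add, and Divide: h(k) = |a*k + b| mod N, with a mod N != 0."""
    if a % table_size == 0:
        raise ValueError("a must not be a multiple of the table size")
    return abs(a * hash_code + b) % table_size


# Hash codes that are all multiples of 16 (their low four bits are zero).
codes = [16 * i for i in range(8)]

power_of_two = [division_compress(c, 16) for c in codes]  # N = 2^4
prime_size = [division_compress(c, 13) for c in codes]    # N prime
print(power_of_two)
print(prime_size)

# Why a mod N must be nonzero: with a = 26 and N = 13 the key drops out.
degenerate = [abs(26 * c + 7) % 13 for c in codes]
print(degenerate)
```

With N = 16 only the low four bits of each code matter, so every code lands in cell 0, while N = 13 spreads the same codes over eight different cells. The last line shows that when a is a multiple of N, every key maps to b mod N.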

More on Collisions
- A key is mapped to an already occupied table location
  - what to do?!?
- Use a collision handling technique
  - We've seen Chaining
  - Can also use Open Addressing
    - Double Hashing
    - Linear Probing

Linear Probing
If the current location is used, try the next table location

linear_probing_insert(K)
    if (table is full) error
    probe = h(K)
    while (table[probe] occupied)
        probe = (probe + 1) mod M
    table[probe] = K

- Lookups walk along the table until the key or an empty slot is found
- Uses less memory than chaining. (Don't have to store all those links.)
- Slower than chaining. (May have to walk along the table for a long way.)
- Deletion is more complex. (Either mark the deleted slot or fill in the slot by
  shifting some elements down.)
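
The insert, lookup, and "mark the deleted slot" ideas can be sketched together as follows. The class name `LinearProbingTable`, the `DELETED` marker, and the choice h(K) = K mod M are assumptions for illustration; the sketch stores integer keys only and does not check for duplicates.

```python
EMPTY = None
DELETED = object()  # marker left behind by delete so later lookups keep walking


class LinearProbingTable:
    """Open addressing with linear probing (illustrative sketch)."""

    def __init__(self, size=13):
        self.slots = [EMPTY] * size
        self.count = 0  # number of live keys

    def _home(self, key):
        return key % len(self.slots)

    def insert(self, key):
        if self.count == len(self.slots):
            raise OverflowError("table is full")
        probe = self._home(key)
        # Walk forward until a free cell (never used, or marked deleted).
        while self.slots[probe] is not EMPTY and self.slots[probe] is not DELETED:
            probe = (probe + 1) % len(self.slots)
        self.slots[probe] = key
        self.count += 1
        return probe

    def find(self, key):
        """Return the index holding key, or -1 if it is absent."""
        probe = self._home(key)
        for _ in range(len(self.slots)):
            slot = self.slots[probe]
            if slot is EMPTY:
                return -1  # a never-used cell ends the search
            if slot is not DELETED and slot == key:
                return probe
            probe = (probe + 1) % len(self.slots)
        return -1

    def delete(self, key):
        probe = self.find(key)
        if probe == -1:
            return False
        self.slots[probe] = DELETED
        self.count -= 1
        return True


table = LinearProbingTable(13)
for key in (18, 44, 31):  # all three have home cell 5
    print(key, "->", table.insert(key))
table.delete(44)
print(table.find(31))  # still found: the search walks past the marker
```

Simply emptying the deleted cell would cut the probe path, and the lookup for 31 would stop early at the gap; the marker avoids that at the cost of slower searches as markers accumulate.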

Linear Probing Example
h(k) = k mod 13
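Insert keys: 18, 41, 22, 44, 59, 32, 31, 73

[Figure: the table after the insertions; cell 2 holds 41, and cells 5 through 11 hold 18, 44, 59, 32, 22, 31, 73]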
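
The example can be replayed in code. One assumption is labeled here: the keys are inserted in the order 18, 41, 22, 44, 59, 32, 31, 73, which the extracted slide does not state explicitly but which is consistent with the final table row shown in the figure. The function name `linear_probe_trace` is illustrative.

```python
def linear_probe_trace(keys, size=13):
    """Insert keys with h(k) = k mod size and linear probing; return the table."""
    if len(keys) > size:
        raise OverflowError("more keys than table cells")
    table = [None] * size
    for key in keys:
        probe = key % size
        steps = 0
        while table[probe] is not None:
            probe = (probe + 1) % size
            steps += 1
        table[probe] = key
        print(f"{key}: home {key % size}, stored at {probe} after {steps} extra probes")
    return table


# Assumed insertion order (see the note above).
keys = [18, 41, 22, 44, 59, 32, 31, 73]
table = linear_probe_trace(keys)
print(table)
```

The trace shows the cluster growing: 31 has home cell 5 but is pushed five cells along to cell 10, because cells 5 through 9 are already occupied.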


Double Hashing

- Use two hash functions
- If M is prime, eventually will examine every position in the table

double_hash_insert(K)
    if (table is full) error
    probe = h1(K)
    offset = h2(K)
    while (table[probe] occupied)
        probe = (probe + offset) mod M
    table[probe] = K

- Many of same (dis)advantages as linear probing
- Distributes keys more uniformly than linear probing does
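
A runnable sketch of the pseudocode, plus a small check of the claim about prime table sizes. The helper `probe_sequence`, the bounded loop, and the particular hash functions in the demo are assumptions added for illustration.

```python
def double_hash_insert(table, key, h1, h2):
    """Insert key using probe = h1(key), then steps of h2(key), modulo M."""
    size = len(table)
    probe = h1(key) % size
    offset = h2(key)
    # At most M probes: with M prime and a nonzero offset these cover every cell.
    for _ in range(size):
        if table[probe] is None:
            table[probe] = key
            return probe
        probe = (probe + offset) % size
    raise RuntimeError("no free cell found along the probe sequence")


def probe_sequence(start, offset, size):
    """Cells visited by the first `size` probes."""
    return [(start + i * offset) % size for i in range(size)]


# Prime M = 13: every nonzero offset reaches all 13 cells.
full_coverage = all(
    sorted(probe_sequence(5, offset, 13)) == list(range(13))
    for offset in range(1, 13)
)
print(full_coverage)

# Non-prime M = 12 with offset 4: only three cells are ever visited.
print(sorted(set(probe_sequence(5, 4, 12))))

# Hypothetical hash functions for the demo.
table = [None] * 13
for key in (5, 18, 31):  # all three have home cell 5
    print(key, "->", double_hash_insert(table, key, lambda k: k % 13, lambda k: 1 + k % 11))
```

Keys with the same home cell usually get different offsets, so they scatter across the table instead of piling up in one run of adjacent cells.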


Double Hashing Example
- h1(K) = K mod 13
- h2(K) = 8 - K mod 8
  - we want h2 to be an offset to add, so it must never be 0; here it ranges from 1 to 8
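
A sketch of how these two functions interact. The keys for this example did not survive extraction, so the key list below (18, 41, 22, 44, 59, 32, 31, 73, in that order) is an assumption; h2 is read as 8 - (K mod 8).

```python
M = 13  # table size


def h1(key):
    return key % 13


def h2(key):
    # Read as 8 - (K mod 8): always between 1 and 8, never 0.
    return 8 - key % 8


def double_hash_trace(keys, size=M):
    table = [None] * size
    for key in keys:
        probe, offset = h1(key), h2(key)
        visited = [probe]
        while table[probe] is not None:
            probe = (probe + offset) % size
            visited.append(probe)
        table[probe] = key
        print(f"{key}: h1={h1(key)}, h2={offset}, probes {visited}")
    return table


# Assumed keys and order (not recoverable from the extracted slide).
keys = [18, 41, 22, 44, 59, 32, 31, 73]
table = double_hash_trace(keys)
print(table)
```

Under these assumptions 44 collides with 18 at cell 5 and then jumps in steps of 4 (5, 9, 0) rather than creeping to the next cell, so the keys end up spread across the table.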


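Theoretical Results

- Let α = N/M (here N is the number of keys stored and M is the table size)
  - the load factor: average number of keys per array index
- Analysis is probabilistic, rather than worst-case

Expected Number of Probes
[Table: expected number of probes for "not found" and "found" searches; the entries did not survive extraction]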
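
The slide's own formulas are not recoverable from this text. For orientation, the sketch below uses the commonly cited textbook approximations for open addressing (usually attributed to Knuth), which assume uniformly distributed hash values and 0 < α < 1. Whether they match the entries of the original slide is an assumption, and the function names are illustrative.

```python
import math


def linear_probing_probes(alpha):
    """Approximate expected probes (not found, found) for linear probing."""
    not_found = 0.5 * (1 + 1 / (1 - alpha) ** 2)
    found = 0.5 * (1 + 1 / (1 - alpha))
    return not_found, found


def double_hashing_probes(alpha):
    """Approximate expected probes (not found, found) for double hashing."""
    not_found = 1 / (1 - alpha)
    found = (1 / alpha) * math.log(1 / (1 - alpha))
    return not_found, found


for alpha in (0.25, 0.5, 0.75, 0.9):
    lp = linear_probing_probes(alpha)
    dh = double_hashing_probes(alpha)
    print(f"alpha={alpha}: linear probing {lp[0]:.2f}/{lp[1]:.2f}, "
          f"double hashing {dh[0]:.2f}/{dh[1]:.2f}")
```

Under these approximations both schemes stay cheap while the table is at most about half full, and linear probing degrades faster as α approaches 1.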


Expected Number of Probes vs. Load Factor
[Figure: plot of the expected number of probes against the load factor]
