Professional Documents
Culture Documents
Dr.Aruna Malapati
BITS Pilani Asst Professor
Department of CSIS
Hyderabad Campus
BITS Pilani
Hyderabad Campus
Tolerant retrieval
Spelling correction
Soundex
...
An Array of
structures
Pros:
Lookup in a hash table is faster than lookup in a tree.
Cons:
No prefix search (all terms starting with automat) [tolerant retrieval]
Use B-trees
Solves the prefix problem.
Rebalancing is expensive
BITS Pilani, Hyderabad Campus
Binary Tree
M-N
A-K
H-J Ka-Ke
An-As
B-G
An-Ar Go-Gu
Assam B-C
* mon: find all docs containing any term ending with mon
Maintain an additional tree for terms backwards
How do we enumerate all the terms meeting the wild card query pro*cent?
Example: m*nchen
hello$,ello$h,llo$he,lo$hel,o$hell,$hello where$ is a
special symbol.
X lookup on X$ X* lookup on $ X*
$a,ap,pr,ri,il,l$,$i,is,s$,$t,th,he,e$,$c,cr,ru,
ue,el,le,es,st,t$, $m,mo,on,nt,h$
Query: mon*
$m
mace madden
Bigrams for query: $m, mo,on
mo
among amortize $m AND mo AND on
on
along among All terms staring with m and containing
Mo and containing on
2-gram index for the term moon
BITS Pilani, Hyderabad Campus
Processing wild card query
Search
Type your search terms, use * if you need to.
E.g., Alex* will match Alexander.