Professional Documents
Culture Documents
Topic-Sensitive PageRank
Query context
Bookmarks
Browsing history
- Expensive at runtime
Link-Based Scoring
(PageRank) Original PageRank
PageRank query
+ Inexpensive at runtime
1
1 0.5
1
1
0.5
1
1.5
1 0.5
1.5
0.5
0.5
0.5
1 1.2
1.5 1.2
0.5 0.6
After a while
Original PageRank Influencing the Computation
Input Uninfluenced:
Web graph G Page is important if many important pages
Output point to it.
Rank vector r : (page page importance) Influenced:
r = PR(G ) Page is important if many important pages
point to it, and btw, the following are by
definition important pages.
Topic-Sensitive PageRank:
Part I (preprocessing) Offline Processing
Goal: Generate multiple a-priori estimates of Input:
page importance, each score providing an
Web W
importance estimate with respect to a topic
Basis topics [c1, ... ,c16]
Use the Open Directory as a source of We use 16 categories (first level of ODP)
representative basis topics (i.e., use ODP
Output:
pages to form a set of influence vectors vj)
List of rank vectors [r1, ... ,r16]
Offline preprocessing step, just as with ordinary
rj : (page page importance wrt topic cj)
PageRank
Offline Processing Graphical Depiction of Part I
For each topic cj FirstLevel(ODP):
Sports
1
if i pages (cj )
set vj[i ] = pages (cj )
0 otherwise
Query Processor
(page,topic)
Classifier
Web ranktopic
Query-time
d TSPageRank()
Only the link structure of pages relevant For the query golf, with no additional context,
to the query topic will be used to rank the distribution of topic weights we would use
pages is:
1
Better to rank query golf with the Sports- 0.9
0.8
y
G s
ns
So g
Bu rts
ts
l th
es
ec ws
ld
ef ion
R ce
Sc al
om ess
Sh nce
et
nd m
n
r
or
or
te
am
ee
ea
A
en
pi
e
ci
at
io
_a Ho
n
ie
pu
Sp
W
N
op
re
eg
si
er
R
R
C
ds
Ki
Classify the Query Context Picking the Topic Distribution
The topic distribution will influence If the query is golf, but the previous query was
rankings to prefer pages important to the financial services investments, then the
topic of the query context distribution of topic weights we would use is:
1
If user issues queries about investment 0.9
0.8
0.7
opportunities, a follow-up query on golf 0.6
0.5
should be ranked with the Business- 0.4
0.3
Sp y
Sc a l
Bu rt s
o m ess
ts
Sh nce
es
_T e
e c ws
ld
eg e
e f i on
G s
lth
So g
ns
et
c
r
nd m
n
n
or
or
te
en
ea
am
ee
A
pi
ci
e
io
at
_a Ho
n
ie
pu
W
N
op
si
re
er
H
R
R
R
C
ds
Ki
Interpretation Interpretation
Health
Sports
d d
Does it make a difference? User Study (no search context)
Do the different topical rank vectors rank Test set of 10 queries
results for queries differently?
To answer, measure the similarity of
5 users were each shown top 10 results
to queries, when ranked using
induced ranks for some set of test query Standard PageRank vector
results Topic-Sensitive PageRank vector
Details in paper, but short answer is,
yes, the different rank vectors induce
A page in the result was relevant if 3 of
different result rankings the 5 users judged it to be relevant
Query for golf
(topic-sensitivity disabled) Results for golf
golf again, but query history judged to
be Business topic Search Context
Advantages of mediating through basis topics,
as opposed to keyword extraction:
Flexibility: uniformly treat variety of sources of
context and personalization
Transparency: topic weights are easily interpreted
by user
Privacy: topic weights reveal less unintentionally
Efficiency: low query time cost, with small additional
preprocessing cost
Current Approach Alternative Approach
Sports
Health {Sports,Health}
Related Work
Scaling Personalized Search
[Jeh,Widom 02]
Dynamic programming for generation of complete basis
What is this Page Known For?
[Rafiei,Mendelzon WWW9 00]
What keywords is a page known for?
The Intelligent Surfer: ...
[Richardson,Domingos NIPS 02]
Computes PageRank once for each query
Persona
[Tanudjaja,Mui HICSS 02]
Enhances HITS with ODP data