
Joemon M Jose

joemon.jose@glasgow.ac.uk

Information Retrieval Group
Department of Computing Science
University of Glasgow
2 Parts
Part 1- Information Retrieval Concepts & Multimedia
Retrieval

Part 2- Interactive Retrieval of Multimedia documents
(focusing on the searcher)
1/9/2010 Multimedia Retrieval 2
Part 1- Outline
Information Retrieval Basic Concepts
Retrieval Models/weighting
Evaluation
Multimedia Retrieval
Evaluation
Low-level features
Retrieval Models
1/9/2010 3 Multimedia Retrieval
Scientific Paradigm of IR
1/9/2010 Multimedia Retrieval 4
[Diagram: Theory, Experiments, and Real-life Applications, linked by top-down and bottom-up approaches]
Basic research in IR is concerned with the design of better systems
1/9/2010 Multimedia Retrieval 5
Text retrieval system
[Diagram: Queries and Documents enter the Retrieval System; documents are indexed, and similarity computation between the query and the indexed documents produces the retrieved documents]
Relevance relation between query and document representation
Major Components
Architecture of the System
Retrieval Models
Document representation
Similarity Scheme
Query representation
Query Interface
1/9/2010 Multimedia Retrieval 6
1/9/2010 Multimedia Retrieval 7
IR System Architecture
[Diagram: Documents and the Query are each passed through tokenisation, stop-word removal and stemming, yielding indexing features and query features; matching the two (e.g. terms Term 1, Term 2, Term 3 against documents di, dj, dk) produces per-document scores with s1 > s2 > s3 > ...]
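As a concrete illustration of that pipeline, the following minimal Python sketch tokenises, removes stop words and stems both a document and a query, then matches them; the stop-word list and the crude suffix-stripping stemmer are illustrative stand-ins, not the components of any particular system.

import re
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}   # illustrative subset

def stem(token):
    # crude suffix stripping, standing in for a real stemmer such as Porter's
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def index_features(text):
    tokens = re.findall(r"[a-z]+", text.lower())            # tokenise
    tokens = [t for t in tokens if t not in STOP_WORDS]      # remove stop words
    return Counter(stem(t) for t in tokens)                  # stem and count

doc = index_features("Knowledge-based systems and expert system evaluation")
query = index_features("evaluating expert systems")
score = sum(doc[t] * query[t] for t in query)                # simple overlap score
print(doc, query, score)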
1/9/2010 Multimedia Retrieval 8
Retrieval Models
Queries
Documents
Similarity
Computation
Model
An IR Model defines
- a model for document representation
- a model for query representation
- a mechanism for estimating the relevance of a
document for a given query.
1/9/2010 Multimedia Retrieval 9
Zipf's Distribution
Associate with each word e its frequency F(e), the
number of times it occurs anywhere in the corpus

Imagine that we've sorted the vocabulary according
to frequency

George Kingsley Zipf became famous for
noticing that this distribution is the same for any large
sample of natural language we might consider
1/9/2010 Multimedia Retrieval 10
Plotting Word Frequency by Rank
Main idea: Count (Frequency)
How many times tokens occur in the text
Over all texts in the collection

Now rank these according to how often they occur.
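A quick way to produce such a rank/frequency listing is sketched below (Python standard library only; it assumes a plain-text file corpus.txt, so substitute any large text sample you have to hand).

import re
from collections import Counter

text = open("corpus.txt", encoding="utf-8").read()      # any large text sample
freq = Counter(re.findall(r"[a-z]+", text.lower()))     # count token occurrences

for rank, (word, f) in enumerate(freq.most_common(20), start=1):
    # Zipf's law predicts rank * frequency stays roughly constant
    print(rank, f, word, rank * f)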
1/9/2010 Multimedia Retrieval 11


Rank  Freq  Term (stemmed)
1 37 system
2 32 knowledg
3 24 base
4 20 problem
5 18 abstract
6 15 model
7 15 languag
8 15 implem
9 13 reason
10 13 inform
11 11 expert
12 11 analysi
13 10 rule
14 10 program
15 10 oper
16 10 evalu
17 10 comput
18 10 case
19 9 gener
20 9 form
The Corresponding Curve
1/9/2010 Multimedia Retrieval 12
Word Frequency vs. Resolving
Power (from van Rijsbergen 79)
The most frequent words are not the most descriptive.
1/9/2010 Multimedia Retrieval 13
Resolving Power
Why do some words occur more frequently, and how can
such statistics be exploited when building an
index automatically?
"... the frequency of word occurrence in an article
furnishes a useful measurement of word significance"
[Luhn 1957]
Two critical factors
Word frequency within a document (TF)
Collection frequency (Inverse document frequency)
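A sketch of how these two factors are typically combined into a tf*idf term weight (one common variant; exact formulations differ between systems, and the three-document corpus here is purely illustrative).

import math
from collections import Counter

docs = [
    "expert system evaluation",
    "knowledge based system",
    "language model for retrieval",
]
index = [Counter(d.split()) for d in docs]
N = len(docs)

def tf_idf(term, doc_counts):
    tf = doc_counts[term]                              # frequency within the document
    df = sum(1 for d in index if term in d)            # documents containing the term
    idf = math.log(N / df) if df else 0.0              # rarity across the collection
    return tf * idf

print(tf_idf("system", index[0]), tf_idf("evaluation", index[0]))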
Information Retrieval Models
Examples
Boolean Model
Vector-space model
Probabilistic models (OKAPI- BM25)
Probabilistic Ranking Principle
Logic Models
Highly expressive
Language Models
Divergence from Randomness model (DFR)
1/9/2010 14 Multimedia Retrieval
Concepts Visited
Architecture of the System
Efficiency Issues
Effectiveness Issues
Relevance
Retrieval Model
Document Representation
Query Representation
Similarity Computation
Concept of feature weighting
1/9/2010 Multimedia Retrieval 15
Advances & Applications
Applications
Web search
Desktop search
Email search

Advances
Ranking schemes: BM25
Retrieval models: language models, divergence from randomness
1/9/2010 Multimedia Retrieval 16
Croft et al., Search Engines: Information Retrieval in Practice, Addison Wesley, 2009
Manning et al., Introduction to Information Retrieval, Cambridge University Press, 2008
1/9/2010 Multimedia Retrieval 17
Why Evaluate?
To prove that your idea/approach is better than someone else's
To decide between alternative methods
To tune/train/optimise your system
To discover points of failure
Types of Evaluation
System Evaluation
User Evaluation
Evaluation in operational setting
1/9/2010 Multimedia Retrieval 18
Evaluation Classical approach
Cranfield Paradigm/TREC
Collection of documents
Queries and Ground truth
Measures
Measurement of performance
Precision, recall, F-measure, MAP
Systems
1/9/2010 Multimedia Retrieval 19
1/9/2010 Multimedia Retrieval 20
Test Collection/Corpora
Collections of documents that also have
associated with them
a set of queries for which relevance
assessments are available.
Experts in the domain provide such judgements
Documents? What are they?
Genuine documents like email messages, journal articles,
memos etc. - the kind of material in which we tend to look for
information
Queries? What are they?
The kind of questions users of such a collection will ask! How
do we catch them? Through the co-operation of the user population.
Until 1992, smaller textual collections were used


1/9/2010 Multimedia Retrieval 21
TREC: Text Retrieval Conference

Started in 1992, organised by NIST, USA and funded
by the US government.
Introduced a new standard for retrieval system
evaluation
millions of documents, GBs of data
USA government reports, emails, scientific abstracts, news
wires
avoids exhaustive assessment of documents by using the
pooling method.
1/9/2010 Multimedia Retrieval 22
The basic idea is to run different search engines
independently and pool their results to form a set of
documents that have at least a recommendation of
potential relevance

The top-ranked 100 documents from each search engine are
grouped together and manually assessed for relevance

Assessors were retired security analysts (from the CIA)
Un-assessed documents are assumed to be irrelevant
Pooling Method
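A minimal sketch of the pool-construction step (depth 100 in TREC, shortened for the toy example below); judging the pooled documents is of course a manual task.

def build_pool(runs, depth=100):
    """runs: list of ranked document-id lists, one per participating system."""
    pool = set()
    for ranked_list in runs:
        pool.update(ranked_list[:depth])    # top-ranked documents from each run
    return pool                             # only these are judged; the rest are
                                            # assumed non-relevant

run_a = ["d3", "d7", "d1", "d9"]
run_b = ["d7", "d2", "d5"]
print(sorted(build_pool([run_a, run_b], depth=3)))      # ['d1', 'd2', 'd3', 'd5', 'd7']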
1/9/2010 Multimedia Retrieval 23
The TREC Objectives
Provide a common ground for comparing different IR
techniques.
Same set of documents and queries, and same evaluation
method.
Sharing of resources and experiences in developing the
benchmark.
With major sponsorship from government to develop large
benchmark collections.
Encourage participation from industry and academia.
Development of new evaluation techniques, particularly
for new applications.
Retrieval, routing/filtering, non-English collection, web-
based collection, question answering, video collections
1/9/2010 Multimedia Retrieval 24
IR Functionality?
Given a Query, the IR system provides a ranked list
after searching the underlying collection of
documents
The assumption is:
a better system provides a better ranked list
a better ranked list better satisfies the users overall
So?
How do we compare ranked lists of documents?
How do we verify whether one system is more effective than another?
1/9/2010 Multimedia Retrieval 25
Effectiveness measure?
The function of an IR system is to:
retrieve all relevant documents
measured by recall
retrieve no non-relevant documents,
measured by precision

Effectiveness Measures
Recall & Precision


Van Rijsbergen, Information Retrieval, Chapter 7 on Evaluation:
http://www.dcs.gla.ac.uk/Keith/Preface.html

1/9/2010 Multimedia Retrieval 26
Relevant vs. Retrieved
[Venn diagram: within the set of all docs, the Relevant and Retrieved sets overlap]
1/9/2010 Multimedia Retrieval 27
Precision vs. Recall
[Venn diagram: all docs, with the Relevant and Retrieved sets overlapping]

Recall = |RelRetrieved| / |Relevant in Collection|

Precision = |RelRetrieved| / |Retrieved|
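In code, the two measures for a retrieved set against a set of known relevant documents (a straightforward sketch with made-up document ids).

def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    rel_retrieved = retrieved & relevant
    precision = len(rel_retrieved) / len(retrieved) if retrieved else 0.0
    recall = len(rel_retrieved) / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 4 retrieved documents are relevant; 3 of the 6 relevant documents were found
print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d3", "d7", "d8", "d9"]))
# (0.75, 0.5)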
1/9/2010 Multimedia Retrieval 28
Why Precision and Recall?
Get as much good stuff while at the same time
getting as little junk as possible.
1/9/2010 Multimedia Retrieval 29
Trade-off between Recall and
Precision
[Plot: precision (y-axis, 0 to 1) against recall (x-axis, 0 to 1). The ideal system sits at the top right. High precision with low recall returns relevant documents but misses many useful ones too; high recall with low precision returns most relevant documents but includes lots of junk.]
1/9/2010 Multimedia Retrieval 30
Example ranked list (5 relevant documents in the collection):

Rank  Rel?  #Rel so far  Recall  Precision
1     1     1            0.20    1.00
2     1     2            0.40    1.00
3     0     2            0.40    0.67
4     1     3            0.60    0.75
5     0     3            0.60    0.60
6     0     3            0.60    0.50
7     0     3            0.60    0.43
8     0     3            0.60    0.38
9     0     3            0.60    0.33
10    0     3            0.60    0.30
11    0     3            0.60    0.27
12    0     3            0.60    0.25
13    0     3            0.60    0.23
14    0     3            0.60    0.21
15    1     4            0.80    0.27
16    0     4            0.80    0.25
17    0     4            0.80    0.24
18    0     4            0.80    0.22
19    0     4            0.80    0.21
20    0     4            0.80    0.20
21    0     4            0.80    0.19
22    0     4            0.80    0.18
23    0     4            0.80    0.17
24    0     4            0.80    0.17
25    1     5            1.00    0.20
[Plot: the corresponding recall/precision curve for this ranking - precision (y-axis, 0 to 1) against recall (x-axis, 0 to 1)]
1/9/2010 Multimedia Retrieval 31
Recall/Precision Curve
Interpolate a precision value for each standard recall level:
r_j in {0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0}
r_0 = 0.0, r_1 = 0.1, ..., r_10 = 1.0
The interpolated precision at the j-th standard recall level
is the maximum known precision at any recall level
between the j-th and (j + 1)-th level:

P(r_j) = max_{r_j <= r <= r_{j+1}} P(r)
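A sketch of computing the 11-point curve for a ranked list with known relevant documents. It uses the common variant that takes the maximum precision at any recall level >= r_j, which differs slightly from the between-levels rule above but produces the usual monotone interpolated curve.

def interpolated_precision(ranking, relevant):
    """11-point interpolated precision: P(r_j) = max precision at any recall >= r_j."""
    relevant = set(relevant)
    points, rel_so_far = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            rel_so_far += 1
            points.append((rel_so_far / len(relevant), rel_so_far / k))   # (recall, precision)
    return [max((p for r, p in points if r >= j / 10), default=0.0) for j in range(11)]

print(interpolated_precision(["d1", "d9", "d2", "d8", "d3"], {"d1", "d2", "d3"}))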
1/9/2010 Multimedia Retrieval 32

Interpolating a Recall/Precision
Curve: An Example
[Plot: the interpolated precision values at the standard recall levels 0.2, 0.4, 0.6, 0.8, 1.0 for an example ranking - precision (y-axis) against recall (x-axis)]

1/9/2010 Multimedia Retrieval 33
Average Recall/Precision Curve
Typically average performance over a large set of
queries.
Query characteristics vary?
Length, term distribution etc
Document characteristics vary?
Compute average precision at each standard recall
level across all queries.
Plot average precision/recall curves to evaluate
overall system performance on a document/query
corpus.

Precision/Recall Curves
1/9/2010 34 Multimedia Retrieval
1/9/2010 Multimedia Retrieval 35
Mean Average Precision (MAP)
Average of the precision values obtained after each
relevant document is retrieved
If a relevant document is not retrieved, its precision is counted as 0
NOT the average of precision at the standard 11 recall
points
Mean Average Precision (MAP)
Across all queries
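A sketch of average precision for a single query and its mean over a set of queries; relevant documents that are never retrieved simply contribute zero, as stated above.

def average_precision(ranking, relevant):
    relevant = set(relevant)
    score, rel_so_far = 0.0, 0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            rel_so_far += 1
            score += rel_so_far / k            # precision after each relevant document
    return score / len(relevant)               # un-retrieved relevant docs count as 0

def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

print(average_precision(["d1", "d9", "d2"], {"d1", "d2", "d3"}))    # (1.0 + 2/3) / 3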
1/9/2010 Multimedia Retrieval 36
F-Measure
One measure of performance that takes into account
both recall and precision.
Harmonic mean of recall and precision:

F = 2PR / (P + R) = 2 / (1/R + 1/P)

Compared to the arithmetic mean, both recall and precision need to be high
for the harmonic mean to be high.
trec-eval
trec-eval is a publicly available program, developed by
Chris Buckley, used extensively by TREC and other
evaluation campaigns, which computes many usable
metric values based on standardised file input formats;

It's available, multi-platform, and easy to use, so use it!
1/9/2010 37 Multimedia Retrieval
Concepts Visited
Effectiveness measures
Precision, Recall
Precision-Recall curves
Single Measures
MAP, F-Measure
Portable test collections
Pooling method to assemble test collections

1/9/2010 Multimedia Retrieval 38
Multimedia Evaluation
Evaluation Challenge
Comparability
Unlike text, lots of dependency on content
Domain
Features extracted
Corel data set
Some categories are easy!
Effect on retrieval
Image annotation evaluation: using a normalised collection,
it was found that an SVM with global features performed better
than state-of-the-art image annotation algorithms!
Konstantinos et al., A Framework For Evaluating Automatic
Image Annotation Algorithms, ECIR 2010 (LNCS 5993)
1/9/2010 Multimedia Retrieval 39
TRECVid: Video IR Evaluation

In 2001, video retrieval started as a TREC track;
Usual TREC mode of operation (data-topics-search submissions-
pooling-evaluation by metrics-workshop)
In 2003 TRECVid separated from TREC because it was sufficiently
different, and had enough participation, though the TREC and TRECVid
workshops are co-located;
2003-2004- US Broadcast news
CNN, ABC world news
2005-2006 International broadcast news
2007-2009 Dutch sound & vision data set (infotainment)

1/9/2010 40 Multimedia Retrieval
Major responsibilities
NIST: Organize data, tasks, and other resources of interest with input
from sponsors and participating researchers
Select and secure available data for training and test
Define user and system tasks, submission formats, etc.
LDC: Collect, secure IPR, prepare, distribute data
NIST: Define and develop evaluation infrastructure
Create shot boundary ground truth
Create and support interactive judging software for features and
search
Create and support the scoring software for all tasks
Researchers: create common resources & share
Researchers: Develop systems
Researchers: Run systems on the test data and submit results to NIST
NIST: Evaluate submissions
Run automatic shot boundary scoring software
Manage the manual judging by contractors viewing a sample of
system output (~76,000 shots for features, ~78,000 shots for search)
NIST, Researchers: Analyze and present results
NIST: Organize and run annual workshop in mid-November at NIST
Slides from Alan Smeaton
1/9/2010 41 Multimedia Retrieval
TRECVid 2010 Data
A new set of videos characterized by a high degree of diversity in creator, content,
style, production qualities, original collection device/encoding, language, etc - as is
common in much "web video".
The collection also has associated keywords and descriptions provided by the video
donor. The videos are available under Creative Commons licenses from the Internet
Archive.
TREC VID 2010
Known-item search task (interactive, manual, automatic)
Semantic indexing
Content-based multimedia copy detection
Event detection in airport surveillance video
Instance search
A number of datasets are available for these tasks
Details at
http://www-nlpir.nist.gov/projects/tv2010/tv2010.html

1/9/2010 42 Multimedia Retrieval
1/9/2010 Multimedia Retrieval 43
1/9/2010 44 Multimedia Retrieval
Image Retrieval System: Architecture
[Diagram: general architecture of a typical image archival and retrieval system - input images from the image database are transformed into an intermediate image representation and indexed; the query image undergoes the same transformation, its intermediate representation is matched against the indexed representations, and the matching step returns the retrieved images]
1/9/2010 45 Multimedia Retrieval
Luminance Histogram
Represents the relative frequency of occurrence of the
various gray levels in the image
For each gray level, count the # of pixels having that level
Can group nearby levels to form a big bin & count #pixels in it
( From Matlab Image Toolbox Guide Fig.10-4 )
1/9/2010 Multimedia Retrieval 46
Colour Histogram
8-bit Image
1/9/2010 47 Multimedia Retrieval
Histogram intersection
[Plot: colour histogram A - frequency (y-axis) against colour (x-axis)]
1/9/2010 48 Multimedia Retrieval
Histogram intersection
[Plot: colour histograms A and B overlaid - frequency against colour]
1/9/2010 49 Multimedia Retrieval
Histogram intersection
[Plot: colour histograms A and B with their bin-wise overlap highlighted]

d(A, B) = 1 - sum_i min(A_i, B_i) / sum_i B_i
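A sketch of that normalised histogram-intersection distance for two equal-length colour histograms (made-up bin values; a smaller distance means a larger overlap).

def histogram_intersection_distance(a, b):
    """d(A, B) = 1 - sum_i min(A_i, B_i) / sum_i B_i"""
    return 1.0 - sum(min(ai, bi) for ai, bi in zip(a, b)) / sum(b)

query_hist = [0.2, 0.5, 0.3, 0.0]
db_hist = [0.1, 0.4, 0.3, 0.2]
print(histogram_intersection_distance(query_hist, db_hist))    # 0.2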
1/9/2010 50 Multimedia Retrieval
Distance Measures
Bin by Bin dissimilarity measures:
L1 norm (absolute deviation)

L2 norm (Euclidean distance)

Minkowski Distance




d_L1(A, B) = sum_i |A_i - B_i|

d_L2(A, B) = ( sum_i (A_i - B_i)^2 )^(1/2)

d_Lr(A, B) = ( sum_i |A_i - B_i|^r )^(1/r)
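The three bin-by-bin measures in code; L1 and L2 fall out as the r = 1 and r = 2 cases of the Minkowski distance (toy vectors for illustration).

def minkowski(a, b, r):
    return sum(abs(ai - bi) ** r for ai, bi in zip(a, b)) ** (1.0 / r)

a, b = [0.2, 0.5, 0.3], [0.1, 0.4, 0.5]
print(minkowski(a, b, 1))    # L1 norm: 0.1 + 0.1 + 0.2 = 0.4
print(minkowski(a, b, 2))    # L2 (Euclidean) distance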
1/9/2010 51 Multimedia Retrieval
Image Retrieval System: Architecture
[Diagram repeated from earlier: images and the query image are transformed into intermediate representations; the indexed representations are matched against the query representation to retrieve images]
1/9/2010 52 Multimedia Retrieval
Video Retrieval - An Exploration
[Diagram: low-level features are extracted and stored for the collection; low-level features are extracted from the query; matching them yields ranked shots]
1/9/2010 53 Multimedia Retrieval
Low-level features
Colour Histogram ( 32 )
0.0656269729 0.1105883049 0.3914042771 0.1834655145 0.1822522096
0.0533656881 0.0095880682 0.0037089646 0.0921026673 .........
Homogenous Texture (80)
93 103 85 166 168 200 196 191 206 161 ...
Edge Histogram (62)
4 2 4 4 4 0 4 3 4 6 2 4 1 .....
Colour Layout (12)
15 21 31 8 9 14 21 18 15 16 15 15

Dimensionality of feature vectors is very large
Sequential scanning

1/9/2010 54 Multimedia Retrieval
Limitations
Low-level feature representation is very noisy
High dimensionality and computational complexity
Feature representation depends on application
Same applies to similarity measures
Semantics of such a representation is different from
the concepts a user submits as a query
Alternative is
High-level feature selection
Very difficult and will not scale for large scale applications
1/9/2010 55 Multimedia Retrieval
Summary
Shots as a retrieval unit
Each shot is represented by a Key Frame
Low-level features are extracted
No index structure is used (as a standard)
Interactive retrieval from large collections is an issue
No concept of retrieval model is used
Similarity functions/matching functions are feature
dependent

1/9/2010 56 Multimedia Retrieval
Approaches
Computer vision
We need new robust features
Development of new features, high-level feature selection

Information retrieval perspective
How best can we use the already existing features?
Focus on effectiveness

1/9/2010 57 Multimedia Retrieval
TREC VID experimentation
TREC automatic search track
Collection 2007-2008
Dutch data set (Sound and Vision)
ASR text in Dutch
MT into English
Very noisy ...

1/9/2010 58 Multimedia Retrieval
TREC VID Topics
Topic 197 Find shots of one or more people walking up stairs


Topic 207 Find shots of waterfront with water and buildings


Topic 216 Find shots of a bridge



1/9/2010 59 Multimedia Retrieval
Multiple query examples
TREC topics provide multiple query examples
Combining results from each relevant example
Two approaches
Early fusion
Average query descriptor
Late fusion
Best results are combined, often using ML weights


1/9/2010 60 Multimedia Retrieval
Relevant documents in result list ...

Recall P@1000 Recall P@1000
0.0150 0.0181 0.0783 0.0920
0.0000 0.0050 0.0095 0.0260
0.0143 0.0200 0.0000 0.0090
0.0239 0.0310 0.0107 0.0200


1/9/2010 61 Multimedia Retrieval
Multiple examples continued ...
What would be the impact of diverse examples ....
Assumption
Query examples are equally important
If they are visually similar
Results are most likely to be identical
As a result, given the large number of non-relevant results, fusion
will emphasise irrelevant items even more
How do we eliminate redundant examples? (see the sketch below)
Correlation between feature vectors
90% correlation means the examples are more or less visually similar
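One way to realise that redundancy check is sketched below: a query example is dropped when its feature vector is very highly correlated (the 0.9 threshold from the slide) with one already kept. This is illustrative only and not necessarily the exact procedure used in the experiments.

import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def remove_redundant_examples(feature_vectors, threshold=0.9):
    kept = []
    for v in feature_vectors:
        if all(pearson(v, k) < threshold for k in kept):
            kept.append(v)                  # visually distinct enough, keep it
    return kept

examples = [[1, 2, 3, 4], [1.1, 2.0, 3.1, 4.0], [4, 1, 3, 2]]
print(len(remove_redundant_examples(examples)))     # the near-duplicate is dropped -> 2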
1/9/2010 62 Multimedia Retrieval
Characteristic of each group

1/9/2010 63 Multimedia Retrieval
Re-ranking & result fusion
Visually similar examples are removed
Result lists are clustered using a single-link approach
Redundant results are removed
Result fusion uses a round-robin approach (sketched below)
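A minimal sketch of round-robin fusion of the per-example result lists; each list contributes its next unseen item in turn (document ids are made up).

def round_robin_fusion(result_lists, depth=1000):
    fused, seen = [], set()
    for rank in range(max(len(r) for r in result_lists)):
        for results in result_lists:        # take the next item from each list in turn
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                fused.append(results[rank])
    return fused[:depth]

print(round_robin_fusion([["s1", "s2", "s3"], ["s4", "s2", "s5"]]))
# ['s1', 's4', 's2', 's3', 's5']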
1/9/2010 64 Multimedia Retrieval
TREC VID 2007 data set
Corpora     No. of Keyframes   No. of Topic Examples   No. of Topics
Database1   18131              1355                    24
Database2   368236             1355                    24


1/9/2010 Multimedia Retrieval 65
Results - P@10
[Bar chart: P@10 for Database2, Baseline vs. RRRF, for the CH, EH, HT and CL features (y-axis 0 to 0.09)]
1/9/2010 66 Multimedia Retrieval
[Bar chart: P@10 for Database1, Baseline vs. RRRF, for CH, EH, HT and CL (y-axis 0 to 0.06)]
Results - P@100
[Bar chart: P@100 for Database2, Baseline vs. RRRF, for CH, EH, HT and CL (y-axis 0 to 0.16)]
1/9/2010 67 Multimedia Retrieval
[Bar chart: P@100 for Database1, Baseline vs. RRRF, for CH, EH, HT and CL (y-axis 0 to 0.06)]
Mean Average Precision
[Bar chart: MAP for Database1, Baseline vs. RRRF, for CH, EH, HT and CL (y-axis 0 to 0.009)]
1/9/2010 68 Multimedia Retrieval
Combining all features
[Bar chart: MAP for Database2 for CH, EH, HT, CL, all four combined, and all four + HLF (y-axis 0 to 0.14)]
1/9/2010 69 Multimedia Retrieval
Automatic Search
Top teams
MCG_ICT Chinese Academy of Science
UvA MediaMill
City University of Hong Kong
1/9/2010 70 Multimedia Retrieval
Chinese Academy of Science
HLF based retrieval

Visual based retrieval

Concept based retrieval

Motion and Face re-ranking
1/9/2010 71 Multimedia Retrieval
CAS Overview
1/9/2010 72 Multimedia Retrieval
No. of Unique documents retrieved
1/9/2010 73 Multimedia Retrieval
Low-level features - Summary
Needed in many tasks
Looked into some of the strategies
Can be used to achieve 12% MAP (for TREC VID 2007
collection)
If used together with HLF or other models, performance
can improve.

Wilkins et al., Properties of Optimally Weighted
Data Fusion in CBMIR, SIGIR 2010

1/9/2010 Multimedia Retrieval 74
Video Retrieval based on LDA - 2nd Study
Retrieve videos based on low level region based
features such as Scale-invariant feature transform
(SIFT)
Method: LDA based approach
Database: TRECVid 2009 search evaluation

Approach implemented is based on the model:
Misra et al., News video story segmentation based on
semantic coherence and content similarity. In MMM10:
Proceedings of the 16th International Conference on
Multimedia Modeling, Chongqing, China

75 1/9/2010 Multimedia Retrieval
Image Retrieval based on LDA
SIFT features (128 dimension)
K-Means clustering of a subset of features from
TRECVid 2008 dataset
% SIFT points from each image: 2, 5, 10, 20
K-Means Iterations: 10, 20, 40, 80
Cluster centers are obtained for each condition
Find the cluster centers for TRECVid 2009 Test
dataset
Each Test image represented in terms of cluster centers
Like: Each Test Document represented in terms of words

76 1/9/2010 Multimedia Retrieval
System Description
LDA training!
Each image is represented in terms of LDA topics (topic
distribution)
Each LDA topic is represented in terms of cluster centers
(cluster center distribution)
Find the cluster centers for TRECVid 2009 Query
dataset
LDA testing!
Estimate topic distribution for each Query Image by
iterative procedure

77 1/9/2010 Multimedia Retrieval
System Description
Matching can be done in the LDA topic space to find
the relevant images
KL Divergence between query, Q, and Test Image i, Ti:
KL(Q, Ti)
What did people do earlier when using LDA for this
task?
P(Q | T_i) = prod_{w=1}^{W} ( sum_{t=1}^{T} theta_wt * theta_it )^{C_w}

(theta_wt: probability of visual word w under topic t; theta_it: probability of topic t for test image i; C_w: count of word w in the query; W: vocabulary size; T: number of topics)
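A sketch of the matching step in topic space using a (smoothed) KL divergence between the query's topic distribution and each test image's distribution; the distributions themselves would come from the LDA training and testing steps described above, and the numbers here are made up.

import math

def kl_divergence(p, q, eps=1e-10):
    """KL(P || Q) over two LDA topic distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

query_topics = [0.6, 0.3, 0.1]                 # topic distribution of the query image
test_images = {"shot_12": [0.5, 0.4, 0.1], "shot_40": [0.1, 0.1, 0.8]}

ranked = sorted(test_images, key=lambda t: kl_divergence(query_topics, test_images[t]))
print(ranked)                                   # smaller divergence = better match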
78 1/9/2010 Multimedia Retrieval
System Description
79 1/9/2010 Multimedia Retrieval
Results
Different amount of data for clustering : 2%,5%, 10%, 20%
of the TRECVid 2008 Relevant Data
Different number of iterations for clustering :10, 20, 40, 80
Number of LDA topics = 50
Performance is similar in all the cases

80
Method Performance
LDA based method 948/10691
High level features 1082/10691
Combination 1781/10691
Best result on TRECVid 4527/10691
1/9/2010 Multimedia Retrieval
Text vs. Video - Comparison
Video
Low-level features are not good enough
However, they are still needed for some tasks, e.g. interactive retrieval
Sequential matching is not good: time consuming for any real-time search
Performance is at the 10% level
Retrieval models?

Text
Terms as index features
Use of an inverted index: can be used for fast retrieval over billions of items/documents
Retrieval models
Performance is at the 50% level (MAP)
1/9/2010 81 Multimedia Retrieval
Urruty et al., An Efficient Indexing Structure for Multimedia Data, MIR 2008
Part 2- Outline
MIR from the User's Perspective
Information Need
Relevance feedback
Browsing
Search Interfaces

1/9/2010 82 Multimedia Retrieval
83
Retrieval vs. A Question-answer Scenario
Question
Answer
Assessment
Multimedia Retrieval
1/9/2010
1/9/2010
The Information Need (IN)
Searching is motivated by a problematic situation
The gap in the user's knowledge between what they know and
what they want to know is the Information Need
ASK - Anomalous State of Knowledge (Nicholas Belkin)
The information need is not static, and develops
during the search session
The transformation of a user's information need into a
query is known as query formulation
one of the most challenging activities in information
seeking
amplified if the information need is vague or knowledge of the
collection is poor
Multimedia Retrieval 84
1/9/2010
IN Transformation
real information need (implicit in the mind of the user)
-> Perception ->
perceived information need
-> Expression ->
request (representation in a human language)
-> Formalisation ->
query (system language)
Multimedia Retrieval 85
1/9/2010 86
Interaction Relevance feedback
[Flow: Information need -> Query formulation -> Query evaluation (Retrieval) -> Result display -> Result evaluation (by User) -> Satisfied? If yes, stop; if no, relevance assessments and query reformulation feed back into query evaluation]
Multimedia Retrieval
1/9/2010 87
Visualisation
Multimedia Retrieval
1/9/2010 88
Benefits
It shields the user from the details of the query formulation
process, and permits the construction of useful search
statements without intimate knowledge of collection
make-up and search environment

It breaks down the search operation into a sequence of
small search steps, designed to approach the wanted
subject area gradually

It provides a controlled query alteration process designed
to emphasize some terms and to deemphasize others, as
required in particular search environments
It takes into consideration the evolving information needs
Multimedia Retrieval
Issues of focusing on the user
How to evaluate such systems?
Role of test collections
Static queries vs. dynamic user needs
Expert judgments vs. user interpretation situated on
their understanding
Measures?
Precision, recall variants!
Experimental Design
Tasks, system, user order
1/9/2010 Multimedia Retrieval 89
1/9/2010 Multimedia Retrieval 90
Simulated Work task situation
Used to frame simulated information need
situation - allowing its development and
modification during retrieval.
Consists of the source of the need, the environment
of the situation and the problem which has to be
solved and serves to make the test person
undertake the objective of the search.
It provides the context which ensures a degree of
freedom for each individual to react and respond in
relation to his interpretation of the given indicative
request for that particular situation
Measures
Qualitative
User preferences
User satisfaction
Tools
Semantic differentials (for example- reaction to system)
Terrible-wonderful
Satisfying-frustrating
Difficult-easy
Likert Scales
How easy was it to use the system? (Extremely-----not at all)

Quantitative data
Time spent?
No of documents selected?
Statistical testing?
Is the observed difference just a trend, or is it statistically significant?
Sample size?

1/9/2010 Multimedia Retrieval 91
1/9/2010 Multimedia Retrieval 92
Relevance Feedback in CBIR
Improve CBIR systems
Overcome semantic gap
Capture individuality/context of user
How do we gain information about users' relevance
judgements?
Explicitly: the user assesses degree of relevance
Implicitly: the user's actions are interpreted by the system
Static vs. dynamic relevance judgements:
users' actions and goals are time-dependent
1/9/2010 Multimedia Retrieval 93
Image browsing- Using Implicit
Feedback Case Study
Problems in specifying information need
Vagueness
Changing need
Learning when interacting with the system

Can we build a browsing system that adapts to user
interests?
Urban et al., An Adaptive Technique for Content-Based
Image Retrieval. Multimedia Tools and Applications
(MTAP), 2006


1/9/2010 Multimedia Retrieval 94
Solution Adapt to user actions
Exploit user actions
Users know what they want
User actions are directed at increasing their gain
Their requirements change as they interact
Exploit interactions to refine high-level queries into
representations based on low-level features: relevance
feedback
1/9/2010 Multimedia Retrieval 95
The OM - The Ostensive Browsing
1/9/2010 Multimedia Retrieval 96
The OM - Ostensive Profiles
Ostensive Profile: Weighting scheme of how much
each object along the path contributes to the next
query
Reflects how uncertainty/relevance changes over time
[Plot: relevance weight (y-axis) against the age of the evidence (x-axis)]
1/9/2010 Multimedia Retrieval 97
Problems addressed
Semantic gap in CBIR:
Between low-level image features and high-level
semantic concepts
Query formulation process in CBIR systems is difficult
Low-level representation of data
Vagueness ("I don't know what I'm looking for, but I'll
know when I find it")
Dynamic nature of information needs
1/9/2010 Multimedia Retrieval 98
The Ostensive Model
Combines retrieval-based and browse-based
approaches to information seeking
Query-less interface: pointing at an object is
interpreted as an indication of relevance
Temporal dimension added to the notion of
relevance
The Ostensive Relevance of an information object is the
degree to which evidence from the object is
representative/indicative of the current information need.
1/9/2010 Multimedia Retrieval 99
Ostensive Browser Interface
1/9/2010 Multimedia Retrieval 100
The Features

                     Text feature                           Visual feature
What                 set of keywords obtained from          global colour distribution
                     manual annotations
Representation       tf*idf term vector                     colour histogram
Similarity Measure   Cosine                                 Histogram intersection
1/9/2010 Multimedia Retrieval 101
Query Adaptation Scheme
Ostensive Relevance Profile determines the weights
associated with each query image on the path
Weights are updated every time a new image is
added to the path
Query is constructed from the weighted sum of
evidence found in the path images, collected from
the images' features
Features are treated separately, resulting in multiple
query representations
1/9/2010 Multimedia Retrieval 102
The OM - The Ostensive Path
1/9/2010 Multimedia Retrieval 103
Query Adaptation Text Query
Construct a new query term vector by updating
the terms' intrinsic weights (e.g. idf) with the
ostensive relevance weights:

w_t = idf_t * sum_{i=1}^{l_p} ORel(i) * tf_t(D_i),  for t in D_i

w_t: resulting weight of term t
idf_t: term t's idf value
l_p: length of the path
tf_t(D): term frequency of term t in document D
ORel(i): ostensive relevance weight at position i

Example path {t1, t2} -> {t2} -> {t1, t3}:  w_t1 = ORel(1) + ORel(3)  (taking tf and idf as 1)
1/9/2010 Multimedia Retrieval 104
Query Adaptation Visual Query
Compute the weighted centroid of the path images'
histograms:

H_Q = sum_{i=1}^{l_p} ORel(i) * H_{D_i}
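A sketch of both adaptation steps. The ostensive profile used here is a simple normalised, recency-favouring weighting and is only an assumption (the actual system explores several profile shapes), but it shows how the per-path-image weights ORel(i) feed the text and visual queries.

from collections import Counter

def ostensive_profile(path_len):
    # newer images on the path get higher weight; normalised to sum to 1
    raw = [2 ** i for i in range(path_len)]
    return [w / sum(raw) for w in raw]

def adapted_text_query(path_terms, idf):
    """path_terms: list of term-frequency Counters, one per path image."""
    weights = ostensive_profile(len(path_terms))
    query = Counter()
    for o_rel, terms in zip(weights, path_terms):
        for t, tf in terms.items():
            query[t] += idf.get(t, 1.0) * o_rel * tf     # w_t = idf_t * sum ORel(i) * tf
    return query

def adapted_histogram_query(path_histograms):
    weights = ostensive_profile(len(path_histograms))
    return [sum(o * h[b] for o, h in zip(weights, path_histograms))
            for b in range(len(path_histograms[0]))]      # weighted centroid

path = [Counter({"beach": 1, "sunset": 1}), Counter({"sunset": 2})]
print(adapted_text_query(path, idf={"beach": 1.5, "sunset": 0.7}))
print(adapted_histogram_query([[0.2, 0.8], [0.6, 0.4]]))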
1/9/2010 Multimedia Retrieval 105
Adaptive Query - Final Evidence
The text and histogram queries return two different
result lists
The weighting (strength) for each feature k is computed by:

strength_k = sum_{i=1}^{l_p} ORel(i) * sim(D_i, Q_k)  /  ( sum_{i=1}^{l_p} ORel(i) * sim(D_i, Q_1) + sum_{i=1}^{l_p} ORel(i) * sim(D_i, Q_2) )

Dempster-Shafer combination is then used to combine the scores
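The per-feature strength in code: each feature's weight is its ostensively weighted similarity mass, normalised across the features (a small sketch for two features; the full system then combines the resulting scores with Dempster-Shafer, which is not shown here).

def feature_strength(sims_per_feature, o_rel):
    """sims_per_feature[k][i]: sim(D_i, Q_k) for path image i under feature k."""
    mass = [sum(o * s for o, s in zip(o_rel, sims)) for sims in sims_per_feature]
    total = sum(mass)
    return [m / total if total else 0.0 for m in mass]

o_rel = [0.25, 0.75]                           # ostensive weights along the path
text_sims, visual_sims = [0.9, 0.6], [0.2, 0.4]
print(feature_strength([text_sims, visual_sims], o_rel))    # text feature dominates here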
1/9/2010 Multimedia Retrieval 106
Issues in Evaluation
Classical IR Evaluation Strategy
Test Collection approach
Precision-Recall
Users are out of loop
User satisfaction
No strategy to evaluate adaptive systems
1/9/2010 Multimedia Retrieval 107
Manual Query Interface
1/9/2010 Multimedia Retrieval 108
The OM - The Ostensive Browsing
1/9/2010 Multimedia Retrieval 109
Controlled OB Interface
1/9/2010 Multimedia Retrieval 110
Experimental Methodology
18 postgraduate design students, 3 systems, 3 tasks
Simulated work task situation
Imagine you are a designer
Hypothesis:
The ostensive approach is generally more acceptable or
satisfying to the user
Query adaptation along with an ostensive interface provides a
better environment for CBIR
Providing explicit control on the ostensive system results in better
task satisfaction
1/9/2010 Multimedia Retrieval 111
Results Analysis
Semantic differentials
Task clear, simple, familiar
Search relaxing, interesting, restful, easy,
simple, pleasant
Images relevant, important, useful, appropriate,
complete
Interaction in control, comfortable, confident
System efficient, satisfying, reliable, flexible,
useful, easy, novel, fast, simple,
stimulating, effective
1/9/2010 Multimedia Retrieval 112
Results Analysis Sem. Diff.
Differential    MQS  POB  COB  p-value  Dunn's post test
restful         3.9  3.1  2.8  0.008    MQS vs. COB < 0.05
pleasant        3.4  2.6  2.2  0.050    -
comfortable     3.2  2.2  2.2  0.014    -
flexible        3.7  3.4  2.4  0.007    MQS vs. COB < 0.05; POB vs. COB < 0.05
useful          3.4  2.6  1.9  0.001    MQS vs. COB < 0.01
novel           3.3  2.4  2.0  0.010    MQS vs. COB < 0.05
simple          2.9  2.2  2.9  0.030    -
stimulating     3.3  2.6  2.1  0.003    MQS vs. COB < 0.05
effective       3.2  2.4  2.1  0.007    MQS vs. COB < 0.05
1/9/2010 Multimedia Retrieval 113
Results Analysis Likert Scales
Statements on nature of information need and
satisfaction on search process and outcome of
search
Information need:
Initial need from no idea to rather well-defined
Ostensive browser succeeded in giving alternate ideas,
supports explorative search
Manual Query System too restrictive; problems with query
formulation and uncertainty about the collection make-up:
"My expectations have been adapted to the available images. This,
however, is not how a designer wants to think; he doesn't want
limitations to influence decisions."
1/9/2010 Multimedia Retrieval 114
Results Analysis - IN
Images match initial idea:
MQS - clear idea of the images wanted (2 out of 8); no exploration of the DB (4 / 8)
COB - clear idea (3 / 3)
Images do not match initial idea:
MQS - could not find the images in mind (4 / 5); not able to formulate a query (4 / 8)
COB - bigger choice and variety of images (4 / 5); better selection (7 / 10)
1/9/2010 Multimedia Retrieval 115
Results Analysis - Ranking
Negatives:
MQS - complexity of query formulation
OB - missing control over the search (POB)
Positives:
MQS - system accuracy, control over the search
OB - intuitive, graphical display of images, simplicity and pleasure to use (POB); flexibility (COB)
"I liked the flexibility when I needed it and the automatic selection when I didn't"
1/9/2010 Multimedia Retrieval 116
Conclusions
Flexibility to accommodate different types of users
and search requirements as important design goal
Interactive, supportive systems to overcome
deficiencies in retrieval accuracy
New strategies for holistic multimedia retrieval /
organisation system (search, organise, annotate)
More sophisticated learning scheme, including
long-term and across-user learning facilities
IR Evaluation - two disciplines

Computer science - system evaluation
propose new (well-founded) solutions
evaluate them in evaluation campaigns to uncover
what benefits searchers and in what way
new questions for investigation -> information science

Information science - investigate searching behaviour from a human
perspective - user evaluation
identify generalities amongst searchers or search behaviour
identify meaningful differences between searchers or search behaviour
make recommendations to system designers


1/9/2010 117 Multimedia Retrieval
Case Study 2- Community based
feedback for video retrieval
Number of problems with video search
Semantic Gap
Lack of annotation
Different query methods

Solution: Use previous interaction between users and
the system, to aid future users of the same system

Vallet et al., Search Trials using user feedback to improve video search, ACM MM 2008
1/9/2010 118 Multimedia Retrieval
Video Retrieval Interface
1/9/2010 119 Multimedia Retrieval
Implicit Feedback: A Graph Based
Approach
A graph is used to represent user interactions with the dataset
There are two desired properties of this graph
Representation of all the user interactions with the
system, including the interaction sequence
The aggregation of implicit information from multiple
sessions and users into a single representation
1/9/2010 120 Multimedia Retrieval
Implicit Graph: Detailed Graph
Nodes:
Q_i - query nodes (query terms)
R_i / D_i - document nodes (document ID)
Action edges, carrying general metadata (timestamp, user id):
q - query action
n - navigation action
b - browsing action
p - play action (time of play)
c - click action
t - tooltip action
1/9/2010 121 Multimedia Retrieval
Implicit Graph: Detailed Graph
[Diagram: query Q1 connected to documents D1 and D2 via query (q), browsing (b) and click (c) actions]
1/9/2010 122 Multimedia Retrieval
Implicit Graph: Detailed Graph
[Diagram: a second query Q2 is added, linked by query (q), click (c) and browsing (b) actions to documents D1, D2, D3 and D4]
1/9/2010 123 Multimedia Retrieval
Implicit Graph: Detailed Graph
[Diagram: interactions from further sessions are aggregated - a third query Q3 joins the graph, connected by query (q) and click (c) actions to the existing documents]
1/9/2010 124 Multimedia Retrieval
Implicit Graph: Detailed Graph
[Diagram: the aggregated graph of queries Q1-Q3 and documents D3, R1 and R2; documents reached from related queries (R1, R2) become possible recommendations]
1/9/2010 125 Multimedia Retrieval
Implicit Graph: Probability Graph
[Diagram: the detailed graph collapsed into a probability graph - queries Q1, Q2, Q3 linked to documents D3, R1, R2 with edge weights such as 0.5, 0.6, 0.7 and 0.8]
1/9/2010 126 Multimedia Retrieval
Implicit Graph: Probability Graph
lr(n) = 1/x(n) - 1

w_s(n_j) =
  -1,                if explicit irrelevance for n_j
  lr(n_j) in (0,1),  if implicit relevance for n_j
  1,                 if explicit relevance for n_j

Action         f(a)    Action                  f(a)
Play (x sec)   3       Navigate / Browse R/L   2
View           10      Tooltip                 1
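A sketch of turning the logged actions on a node into an implicit weight, using the per-action factors from the table above. The final squashing of the accumulated score x(n) into (0, 1) is an illustrative stand-in, not the slide's lr(n) mapping.

# Illustrative per-action factors, taken from the table above
ACTION_WEIGHT = {"play_per_sec": 3, "navigate_browse": 2, "view": 10, "tooltip": 1}

def action_score(actions):
    """Accumulate the weighted implicit evidence x(n) for one node."""
    return sum(ACTION_WEIGHT[a] * amount for a, amount in actions)

def node_weight(actions, explicit=None):
    if explicit is True:                 # explicit relevance judgement
        return 1.0
    if explicit is False:                # explicit irrelevance
        return -1.0
    x = action_score(actions)
    return x / (x + 1.0)                 # illustrative squashing into (0, 1);
                                         # the paper defines its own lr(n)

# a shot played for 12 seconds, plus one tooltip inspection
print(node_weight([("play_per_sec", 12), ("tooltip", 1)]))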
1/9/2010 127 Multimedia Retrieval
Detailed
weights
1/9/2010 128 Multimedia Retrieval
1/9/2010 129 Multimedia Retrieval
1/9/2010 130 Multimedia Retrieval
Evaluation: Hypothesis
The performance of the users of the system will
improve with the use of recommendations based
on implicit feedback
The users will be able to explore the collection to a
greater extent, and also discover aspects of the
topic that they may not have considered
The users will be more satisfied with the system
that provides feedback, and also be more satisfied
with the results of their search

1/9/2010 131 Multimedia Retrieval
Evaluation: Experimental Design
2 Searcher by 2 Topic Latin Square design
2 systems
Recommendation
Baseline
4 tasks
Users were divided into groups of 4, to evaluate effect
of additional information
1/9/2010 132 Multimedia Retrieval
Evaluation: Collection
Used the TRECVID 2006 dataset, which contains 79,848 shots
Concentrating on the interactive tasks, of which there are normally 24

1/9/2010 133 Multimedia Retrieval
Evaluation: Participants
18 male, 6 female
Average age 25.2
All had advanced proficiency in English
The majority
Deal with multimedia regularly
Are familiar with creating multimedia data
Regularly search for multimedia, particularly online

1/9/2010 134 Multimedia Retrieval
Task Completion: P@N
[Line chart: P@N over all topics for N = 5, 10, 15, 20, 30, 100, 200, 500, 1000, 2000, Baseline vs. Recommendation (precision up to about 0.35)]
1/9/2010 135 Multimedia Retrieval
Task Completion: MAP
[Bar chart: MAP per user group (Users 5-8, 9-12, 13-16, 17-20, 21-24), Baseline vs. Recommendation (y-axis 0 to 0.008)]
1/9/2010 136 Multimedia Retrieval
Results: Task Completion
Precision was higher for recommendation system
The difference in Precision@N for values of N between 5 and
100 was statistically significant
MAP was higher for the recommendation system
P/R values higher for recommendation system, but
recall values not particularly high overall
1/9/2010 137 Multimedia Retrieval
User Exploration: User Interaction
Graph
Analysed graph of user interactions
49% of videos selected by participants 1-12 were also
selected by 13-24
Users 13-24 clicked on 596 unique documents, 1-12
clicked on 1050
Participants are following new paths and not merely
repeating what previous users did
1/9/2010 138 Multimedia Retrieval
Results: User Exploration
Participants marked more videos as being relevant
using recommendation system
Recommendations did not guide participants in a
particular direction
The number of unique queries increased throughout the
evaluation
Very little query repetition
1/9/2010 139 Multimedia Retrieval
User Perceptions: Systems
Differential Recommendation Baseline Same
Best 16 2 1
Learn 7 2 11
Easier 5 2 13
Prefer 17 1 2
Perception 11 3 6
Effective 14 3 3
1/9/2010 140 Multimedia Retrieval
Results: User Perceptions
The majority of users preferred the
recommendation system and found it more
effective for the tasks they performed
The system could help change user perceptions
about the task and present new ideas
In general participants found results from
recommendation system more satisfactory,
relevant, appropriate and complete

1/9/2010 141 Multimedia Retrieval
Conclusions
Recommendations aided user exploration of data
collection
Number of videos marked as relevant increased with
recommendation system
Accuracy with which videos were marked as relevant
increased with recommendation system
Majority of users preferred system that provided
recommendations


1/9/2010 142 Multimedia Retrieval
Summary
User-centred techniques often bring a lot of improvement
They complement the limitations of system-oriented
approaches
They draw on techniques from the cognitive and information science fields
They provide a rich set of data for analysis
Importance of appropriate experimental design methods
Statistical tests to show significance
The evaluation methodology is relatively expensive compared
to system-centred methods
1/9/2010 Multimedia Retrieval 143
