
Image Retrieval Using Multi-Feature and Genetic Algorithm

Abstract:
This paper proposes an image retrieval method based on multi-feature similarity score fusion using a genetic algorithm. A single feature describes image content from only one point of view and is therefore somewhat one-sided. Fusing multi-feature similarity scores is expected to improve the system's retrieval performance. In this paper, the retrieval results obtained from the color feature and the texture feature are analyzed, and a method for fusing multi-feature similarity scores is described. A genetic algorithm is applied to assign the fusion weights of the multi-feature similarity scores in a reasonable way. For comparison, three other methods are implemented: image retrieval based on the color feature alone, on the texture feature alone, and on fusion of color and texture similarity scores with equal weights. The experimental results show that the proposed method is superior to the other methods.

Introduction
With the rapid development of multimedia and network technology, people can access a large amount of multimedia information. For those who want to make full use of multimedia information resources, the primary question is how to find the multimedia information of interest. Text queries can be applied to multimedia information retrieval, but they have inherent deficiencies. On the one hand, text annotation of multimedia information requires a great deal of manpower and resources and is inefficient. On the other hand, annotated text usually reflects one person's perception of the multimedia content. It is affected by individual differences and by the state of the annotator and the environment, so the resulting descriptions can be one-sided. In addition, it is clearly incomplete to describe content-rich multimedia information with a small amount of text. Content-Based Image Retrieval (CBIR) techniques, which appeared in the 1990s, address these problems well. CBIR uses low-level features such as color, texture, and shape to describe image content, and it breaks through the limitations of traditional text-based query techniques.

A CBIR system can be implemented with a single feature. A single image feature describes the content of an image from one specific angle. It may be suitable for some images, but it may fail to describe others. Moreover, describing an image with a single feature is inherently incomplete. Representing an image with multiple features from multiple angles is expected to achieve better results. Information comes from multiple sources, and there are many ways to fuse it. How to organize multi-source information in a suitable way to achieve the intended results has attracted extensive attention from researchers in this field.

Background
While not all-encompassing, below is a data flow diagram that represents the primary components of a content-based image retrieval system (CBIRS). To retrieve images, the prevailing approach is to break an image down into its various components so that their values can be represented mathematically, allowing quantified analysis of the image.
From a high level, the components of a CBIRS are:
Features: The features of an image can be represented by its structure or geometric shape, the colors it uses, and its texture. The local features of an image will eventually lead to object recognition as well.
Associated Text: A methodology often employed on web sites such as google.com, in which an image is categorized and/or identified by the text associated with it.
Relevance Feedback: In the process of searching for an image, a concept called Query by Example (QBE) is often employed, in which the user identifies which images are relevant and which are not. By taking a user's feedback into account, it is possible to search for relevant images more precisely.

Figure 1: Components of Content-Based Image Retrieval Analysis

Areas that are important to CBIRS but are not covered within this document include:
Similarity/Distance Measures: measures that define how similar or different the images returned by a query are from the image on which the query was based.
Benchmarking: An important aspect of image retrieval is the need for quantitative methods to measure the precision, recall, and other performance characteristics of image retrieval.
Indexing Scheme: The method by which images are organized and indexed for optimal storage and recall.
Query Specification: Examples include MRML (Multimedia Retrieval Markup Language), a standard for querying images (e.g., similar to the T-SQL language for Microsoft SQL Server).

Examples of CBIRS
Below are some examples of current CBIRS:
Viper (http://viper.unige.ch/)
Developed at the University of Geneva Hospitals, Viper stands for Visual Information Processing for Enhanced Retrieval. It is a CBIRS project that produced a freely available image finding tool (the GNU Image Finding Tool, a.k.a. GIFT) and the image query specification MRML (Multimedia Retrieval Markup Language).
GIFT (http://www.gnu.org/software/gift/)
It is a free GNU image finding tool developed by the Viper project.
QBIC (http://wwwqbic.almaden.ibm.com/)
IBM's Query by Image Content, which allows one to query large image databases based on visual image content.
Photobook (http://vismod.media.mit.edu/vismod/demos/facerec/basic.html)
MIT's Photobook project is able to perform object and facial recognition by analyzing and comparing the individual components of an image.

Berkeley's Digital Library Project (http://elib.cs.berkeley.edu/; http://elib.cs.berkeley.edu/vision.html)
Computer vision meets digital libraries: the digital library project contains a specific analysis of Image Retrieval by Image Content.
CIRES: http://amazon.ece.utexas.edu/~qasim/research.htm
A content-based image retrieval system utilizing high-level cues (perceptual organization, etc.) and low-level features (colors, textures, etc.) to identify images; a good example of feature integration.
VisualSEEk: http://www.ctr.columbia.edu/VisualSEEk/
A web tool for searching images and videos.
WebSEEk: http://www.ctr.columbia.edu/webseek/
A derivative of VisualSEEk, this tool is a content-based image and video search and catalog tool for the web.

Examples of Medical Image Retrieval

medGIFT (http://www.sim.hcuge.ch/medgift/w01_Presentation_EN.htm)
Developed by the Service d'Informatique Médicale of the Hôpitaux Universitaires de Genève (University of Geneva Hospitals Medical Informatics Service), they are one of the pioneers in medical content-based image retrieval. They created the Viper project and provide the GIFT (GNU Image Finding Tool).
IRMA: http://libra.imib.rwth-aachen.de/irma/index_en.php
Standing for Image Retrieval for Medical Applications, this project is a collaboration of various departments within the Aachen University of Technology (Germany). Their focus is their radiology image archive.
Assert: http://rvl2.ecn.purdue.edu/~cbirdev/WWW/CBIRmain.html
A content-based image retrieval system based out of Purdue, in cooperation with the Department of Radiology at Indiana University, the University of Wisconsin School of Medicine, and Purdue's own Machine Learning and Robot Vision Labs (School of Electrical and Computer Engineering).

Features
As noted in the section above, there are many different features associated with an image:
Color
Texture
Shape
Local Features
This section provides a high-level overview of these features.

Color
One of the primary components of image analysis for the purpose of content-based image retrieval is color analysis. As you may recall, the color visible to the human eye represents a small range of the entire electromagnetic spectrum, which spans everything from cosmic rays to x-rays to electric waves.

Figure 2: The Electromagnetic Spectrum [27]

As noted above, the light visible to the human eye ranges in wavelength from 4000 to 7000 angstroms, the ends respectively representing the colors violet and red, with all of the other colors in between. All other waves, from the cosmic rays of the stars to the FM waves received by our radios, cannot be perceived by the human eye. It is this small range of the spectrum that is referred to as the human-perceived color space.

Color Spaces
Models of the human perception of color differences are described in the form of color spaces. The two primary color spaces are the CIE and HSV models; hence these are the color spaces typically used in content-based image retrieval systems.

Commission Internationale de l'Eclairage (CIE) Color Model

The CIE color model was developed by the French organization Commission Internationale de l'Eclairage in the first half of the 20th century. This model is based on the tristimulus theory of color perception, in which the three (3) different types of color receptors in our eyes, also known as cones, respond differently to different wavelengths of light. The response differential between the three types of cones can be represented in a 3-D model as shown below.

Figure 3: Three dimensional CIE model - Z coordinates projected to the XY plane [27]

The representation below shows the above 3-D model projected onto a 2-D graph.

Figure 4: CIE color model mapped to X and Y coordinates [27]

As you can see from the above diagram, the CIE color model represents the wavelengths (400 nm or 4000 angstroms for violet to 700 nm or 7000 angstroms for red) of human-visible light. The color white occurs when all three cones are stimulated evenly.
While the CIE model is a very precise method to measure color, it is neither a very practical nor an easy method for examining color [27]. Because of this, many current CBIRS utilize the HSV color space for image analysis.


Hue, Saturation, Value (HSV) Model

The HSV model represents color through its distinct components of hue, saturation, and value. To understand this model, we will first explore each component.
Hue
The primary colors are the set of colors that, when combined, can create all of the other colors within the visible human spectrum. As in a computer monitor, the primary colors are red, green, and blue. Equal mixing of these colors produces what are known as the secondary colors: cyan, magenta, and yellow.

Figure 5: Primary and Secondary Color Wheel [27]

If we represent the primary and secondary colors within a color wheel, you will note that the secondary colors complement the primary colors. For example, the primary colors red and blue mixed evenly produce magenta, blue and green create cyan, and red and green create yellow. Continuing to inter-mix colors produces tertiary, quaternary, and further colors, eventually producing a solid ring of colors. This definition of color based on the combination of primary colors is known as hue; note the color wheels above and below.

Figure 6: Primary, Secondary, and Tertiary Color Wheel [27]


Figure 7: Hue, Saturation, and Value Color Wheel Model [27]

Saturation and Value

As can be noted from the diagram above, saturation refers to the dominance of a particular hue within a color. A less saturated color is closer to white, while a more saturated color is closer to the pure color found on the outer edge of the HSV color wheel diagram. Meanwhile, the value of a color refers to its intensity (the lightness or darkness of the color). While the two components appear to be similar, they have different effects on the visibility of a color.

Figure 8: Comparing Saturation and Value within HSV model [27]


As you can see from the above diagram, a color that is fully saturated at full value appears as the pure hue, as illustrated by the red color. Lowering the value darkens the color toward black, while lowering the saturation at a high value washes the color out toward white.


Hence, the HSV model uses its components of hue, saturation, and value to quantify a color. This model's more straightforward way of quantifying color is the reason why many CBIRS use it for color analysis.
Color Representation
The descriptions above cover the models used to quantify colors. For example, in the CIE-RGB model (the color model for computer monitors under the CIE color model), a numeric value is noted for each color component, R - Red, G - Green, B - Blue, such as R:60, G:10, B:20. The same can be said for the HSV model, in which numeric values are assigned to the hue, saturation, and value of an individual color.
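As a minimal illustration of these per-pixel numeric representations, the sketch below (assuming Python with the standard-library colorsys module; the specific RGB values are just the example above) converts an RGB triple into its HSV counterpart.

    import colorsys

    # Example RGB triple (values on a 0-255 scale, as in the R:60, G:10, B:20 example above)
    r, g, b = 60, 10, 20

    # colorsys works on the 0.0-1.0 range, so normalize first
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)

    # Scale hue to degrees and saturation/value to percentages for readability
    print("Hue: %.1f degrees, Saturation: %.1f%%, Value: %.1f%%" % (h * 360.0, s * 100.0, v * 100.0))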
An image, in turn, is composed of many pixels: many small segments that, when put together, form the image (think of a puzzle, except with many small square pieces instead of irregularly shaped ones). It is then necessary to find a way to represent the numeric color values of the thousands of pixels that make up the image.

Color Histogram

The color histogram represents an image by breaking down the various color components of the image and graphing the occurrences and intensity of each color.


Figure 9: Color Histogram diagrams

The above histograms break down a particular image by noting its red, green, and blue components. To compare two images, one needs only to compare the color histograms of the two images and determine how similar the two histograms are.
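A minimal sketch of this idea, assuming Python with NumPy and an image already loaded as an H x W x 3 RGB array, computes a per-channel histogram and compares two images with histogram intersection; the function names are illustrative, not taken from any of the systems above.

    import numpy as np

    def rgb_histogram(image, bins=16):
        """Concatenate per-channel histograms, normalized so they sum to 1."""
        hists = []
        for channel in range(3):
            counts, _ = np.histogram(image[:, :, channel], bins=bins, range=(0, 256))
            hists.append(counts)
        hist = np.concatenate(hists).astype(float)
        return hist / hist.sum()

    def histogram_intersection(h1, h2):
        """Similarity in [0, 1]; identical histograms give 1.0."""
        return np.minimum(h1, h2).sum()

    # Usage with two toy random "images"
    img_a = np.random.randint(0, 256, size=(64, 64, 3))
    img_b = np.random.randint(0, 256, size=(64, 64, 3))
    score = histogram_intersection(rgb_histogram(img_a), rgb_histogram(img_b))
    print("Similarity score:", score)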

Color Correlogram

Another approach to color comparison is the color correlogram method. As noted in Huang [22], most commercial CBIRS use color histograms for image color comparison, but this method of comparison does not take spatial information into account, i.e., the space or distance between one color and another. There are various approaches that attempt to integrate spatial information into color histograms, but color correlograms natively resolve this issue.


Figure 10: Representation of a color correlogram

Also known as scatterplots, correlograms create a visual representation of the image similar to the one above. Using this diagram, a simple way to see how spatial information is included in a correlogram is to note that the circled dark blue pixel is three pixels to the left of and one pixel below the squared medium red pixel. So while the histogram notes the number of colors and their intensities, a correlogram is able to capture spatial information indicating the distance between the different colors. Therefore, when comparing two different images, it is not only the color components that are compared, but also the distances between them. As noted by Huang [22], while both methods of image color comparison perform well, correlograms are more stable under color change, large appearance change, and contrast and brightness changes. It is this native inclusion of spatial information and this stability that have led to an increase of literature on the color correlogram method and its utilization within CBIRS.
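A rough sketch of the underlying computation, assuming Python with NumPy and an image whose pixels have already been quantized to a small set of color indices, estimates a color auto-correlogram: for each quantized color c and each distance d, the probability that a pixel at distance d from a pixel of color c also has color c. This is a simplified illustration of the statistic described by Huang [22], not their implementation.

    import numpy as np

    def auto_correlogram(quantized, n_colors, distances=(1, 3, 5, 7)):
        """quantized: 2-D array of color indices in [0, n_colors). Returns (n_colors, len(distances))."""
        h, w = quantized.shape
        result = np.zeros((n_colors, len(distances)))
        for di, d in enumerate(distances):
            for c in range(n_colors):
                ys, xs = np.nonzero(quantized == c)
                if len(ys) == 0:
                    continue
                matches, total = 0, 0
                # Sample the 4 axis-aligned neighbors at distance d from every pixel of color c
                for dy, dx in ((0, d), (0, -d), (d, 0), (-d, 0)):
                    ny, nx = ys + dy, xs + dx
                    valid = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)
                    total += valid.sum()
                    matches += (quantized[ny[valid], nx[valid]] == c).sum()
                if total > 0:
                    result[c, di] = matches / total
        return result

    # Usage: quantize an RGB image to 8 colors by thresholding each channel, then compute the correlogram
    img = np.random.randint(0, 256, size=(64, 64, 3))
    quantized = ((img[:, :, 0] > 127) * 4 + (img[:, :, 1] > 127) * 2 + (img[:, :, 2] > 127)).astype(int)
    print(auto_correlogram(quantized, n_colors=8).shape)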


Color Addendum
In addition to the methods discussed above, there are many other color analysis methods, including [31]:
Color Moments: utilized in systems like IBM's QBIC, this is a very useful methodology, especially when the image contains just an object (see the sketch after this list).
Color Coherence Vector: a method for incorporating spatial information into color histograms.
Invariant Color Features: color can vary due to the illumination, orientation, and angle of view. Ways to handle these factors are to standardize images as they enter image databases (highly unlikely) or to develop methodologies that compensate for them.
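A minimal sketch of color moments, assuming Python with NumPy: the first three moments (mean, standard deviation, skewness) of each color channel form a compact nine-number descriptor. This follows the general color-moments idea rather than QBIC's specific implementation.

    import numpy as np

    def color_moments(image):
        """image: H x W x 3 array. Returns a 9-element feature vector (mean, std, skew per channel)."""
        features = []
        for channel in range(3):
            values = image[:, :, channel].astype(float).ravel()
            mean = values.mean()
            std = values.std()
            # Cube root of the third central moment, preserving sign
            third = ((values - mean) ** 3).mean()
            skew = np.sign(third) * (abs(third) ** (1.0 / 3.0))
            features.extend([mean, std, skew])
        return np.array(features)

    img = np.random.randint(0, 256, size=(64, 64, 3))
    print(color_moments(img))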

Texture
Another key component of image analysis is the analysis of the texture of an image, i.e., the perceived smoothness or coarseness of an object. Similar to the color histogram above, many of the current techniques for image texture analysis, while quantified, lack the spatial information that would allow one to compare the location of a coarse object within an image with that of a smooth object. Example methodologies include [10]:

Gabor Filters
Similar to a Fourier transform, Gabor functions, when applied to images, convert the texture components of an image into responses like the ones graphed below. There are many widely used approaches to applying Gabor filters to image texture characterization.

Figure 11: Gabor Filter Representation of Image Texture [28]

Careful manipulation of these Gabor filters allows one to quantify the coarseness or smoothness of an image. For example, within the above figure, b) could indicate a coarser texture than the one found in a). Note that the comparison of images is performed against the mathematical representation of these responses, hence the CBIRS's ability to compare the textures of two different images.
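As a rough illustration, assuming Python with NumPy and SciPy, the sketch below builds a single Gabor kernel, convolves it with a grayscale image, and uses the mean and standard deviation of the response magnitude as texture features; a real Gabor filter bank would repeat this over several scales and orientations.

    import numpy as np
    from scipy.signal import convolve2d

    def gabor_kernel(size=15, wavelength=6.0, theta=0.0, sigma=3.0):
        """Real part of a Gabor kernel with the given wavelength and orientation."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        # Rotate coordinates by theta
        x_theta = x * np.cos(theta) + y * np.sin(theta)
        y_theta = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(x_theta ** 2 + y_theta ** 2) / (2.0 * sigma ** 2))
        carrier = np.cos(2.0 * np.pi * x_theta / wavelength)
        return envelope * carrier

    def gabor_features(gray_image, thetas=(0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)):
        """Mean and standard deviation of the filter response magnitude at each orientation."""
        features = []
        for theta in thetas:
            response = convolve2d(gray_image, gabor_kernel(theta=theta), mode='same', boundary='symm')
            magnitude = np.abs(response)
            features.extend([magnitude.mean(), magnitude.std()])
        return np.array(features)

    gray = np.random.rand(64, 64)
    print(gabor_features(gray))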

Wold Features
Similar to the Gabor filters above, the purpose of using Wold features is to represent the texture of an image with a mathematical function and its coefficients. The Wold decomposition algorithm fits within the context of human texture perception in that it breaks image texture down into the components of periodicity, directionality, and randomness. As noted in Liu and Picard [29], these three components correspond to the dimensions of human texture perception determined by a psychological study. As the figure below indicates, the Wold method appears to have better precision than other traditional texture methods.


Figure 12: Wold features for texture comparison vs. other methods [29]

The usage of Wold features is the basis of MIT's Photobook tool (for texture modeling) for performing queries of image databases based on image content.

Wavelet Decomposition
The recurring theme in these texture-based algorithms is the conversion of image information into a graph and/or some mathematical representation that can be quantified. The wavelet transform, in particular, provides a multi-resolution approach to texture analysis and classification [31]. Wavelet decompositions are already utilized in various forms of image analysis; for example:
Wavelet analysis is largely monochromatic [2] in that it lacks the ability to analyze color information within its transform.


The wavelet transform is able to transform both scale and frequency information into a simpler mathematical representation [3]; most algorithms are able to transform only scale information.
The Georgetown University Radiology department determined that the Daubechies wavelet had the highest compression efficiency with the lowest mean square error when analyzing digital mammograms (i.e., extremely efficient compression and a low error rate when analyzing the image) [4].
The FBI utilizes wavelets to compress digital fingerprint images at a ratio of 20:1 [2].
The wavelet transform is able to process an image via its horizontal and vertical edges separately; it can separate low-frequency information (individual objects within an image) and high-frequency information (edges, noise, etc.) for both axes. Note that one issue with using this transform is that it is susceptible to small shifts in the image (i.e., it may exaggerate differences in the image based on small changes such as location, color, etc.). But, as verified in more recent literature [30], the wavelet transform can be used to classify texture with a low rate of error.
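To make the idea concrete, here is a minimal sketch, assuming Python with NumPy and the PyWavelets package (pywt), that performs a two-level 2-D wavelet decomposition of a grayscale image and uses the energy of each sub-band as a texture signature; this illustrates the general multi-resolution approach rather than any particular system cited above.

    import numpy as np
    import pywt

    def wavelet_texture_features(gray_image, wavelet='db2', level=2):
        """Energy (mean squared coefficient) of each sub-band across `level` decomposition levels."""
        coeffs = pywt.wavedec2(gray_image, wavelet=wavelet, level=level)
        features = [np.mean(np.square(coeffs[0]))]  # coarsest approximation sub-band
        for detail_level in coeffs[1:]:
            # Each level yields horizontal, vertical, and diagonal detail sub-bands
            for band in detail_level:
                features.append(np.mean(np.square(band)))
        return np.array(features)

    gray = np.random.rand(64, 64)
    print(wavelet_texture_features(gray))  # 1 approximation + 3 details per level = 7 values for level=2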

Texture Addendum
In addition to the methodologies noted above, there are many other texture analysis / classification methodologies, including (but not exclusively) [31]:
Tamura features [92]: This concept breaks texture out into components including coarseness, contrast, directionality, linelikeness, regularity, and roughness, designed in accordance with psychological studies on the human perception of texture. Wold features and IBM's QBIC system are methodologies that handle the first three components.
SAR (Simultaneous Auto-Regressive) Model: The SAR model is an instance of the Markov random field (MRF) models, which have been very successful in texture modeling over the past decades. Compared with other MRF models, SAR uses fewer parameters. In the SAR model, pixel intensities are taken as random variables.
MRSAR (Multi-Resolution Simultaneous Auto-Regressive) Model: More information can be found in the paper: J. Mao and A.K. Jain, "Texture classification and segmentation using multi-resolution simultaneous autoregressive models," Pattern Recognition, vol. 25, pp. 173-188, 1992.
Other methodologies include shift-invariant principal component analysis (SPCA) and the Tree-Structured Wavelet Transform (TWT).

Shape
Used in many CBIRS, shape features are usually described after the images have already been segmented or broken out [31]. While a good shape representation of an image should handle changes in translation, rotation, and/or scaling, this is rather difficult to achieve. The primary difficulty is that images involve numerous geometric shapes which, when numerically characterized, typically lose information.


Figure 13: Part of a Degas painting (Etoile)

A methodology that identifies information at too detailed a level (down to the individual colors and shapes of a Degas painting, for example) will only be able to identify the color palette. For example, the above image has very few identifiable shapes that allow one to know what the entire image encompasses. But this shape placed within the entire painting (as noted by the rectangle in the image below) allows one to see the entire image of the ballerina dancing.


Figure 14: Degas' Etoile Painting (http://www.ibiblio.org/wm/paint/auth/degas/ballet/degas.etoile.jpg)

Conversely, a methodology that characterizes image shape at too global a level will only be able to quantify the entire image rather than identifying individual components within it. For example, the above image contains the central ballet dancer, but it also contains additional components within the audience and the other dancers that are watching. A global approach to shape analysis and identification would require any similar images to be similar in all of their components; i.e., it could not identify the image by its central ballet dancer alone.
Global Image Transforms
To continue along the lines of global image transformations, a common methodology is to utilize the wavelet transform (noted within the Texture section of this document), which transforms the entire image into frequency components that can be quantified and compared. As noted above with respect to shape, this may be problematic because all information (color, texture, shape) is encoded together, and hence it is not possible to differentiate images based solely on shape, nor is it possible to compare only part of an image. Therefore, many of the CBIRS that perform shape characterization using global image transforms presume that each image contains only a single shape.

Boundary-Based
There are various boundary-based shape characterizations [31] which identify an image by attempting to trace the boundary of the shape within it. Information such as color and texture often helps in the identification of the boundaries as well.


Figure 15: Example of Boundary-Based Shape Characterization [32]

Examples of these include:
Rectilinear shapes
Polygonal Approximation
Finite Element Models
Modal Matching / Fourier-Based Shape Descriptors
As you can see from the figure above, these various methodologies trace out the actual shape and are thus able to identify it. The identified shape is then compared against stored, previously identified shapes, and/or the same procedure is performed on the other images to be compared.


Region-Based
Region-based shape characterizations [31] also refer to the statistical moments of an image. To understand this concept, a quick review of moments is provided here. As described in Wikipedia (http://en.wikipedia.org), the mathematical concept of a moment evolved from the concept of moment within physics. The latter concept describes the magnification of force in rotational systems between the pivot and the point where the force is applied (think of levers or pulleys).
The basis of this concept is inertia, which refers to Newton's first law: an object in motion continues in that same motion unless acted upon by some external force. Newton's second law then states that the acceleration of an object is proportional to the force exerted on that object; this constant of proportionality is the inertial mass, represented by the M in the diagrams below.

Figure 16: Moment of Inertia (thin hoop) [33]


Figure 17: Moment of Inertia (thin hoop w/ width)

Figure 18: Moment of Inertia (solid cylinder)

Figure 19: Moment of Inertia (sphere)

Figure 20: Moment of Inertia (rectangle)

Figure 21: Moment of Inertia (thin rod)

Figure 22: Moment of Inertia (thin rod end)


When a force is applied to rotate an object, the angular acceleration is also linearly proportional to that force, and its constant of proportionality is called the moment of inertia. How all of this relates to shape or pattern recognition is illustrated by the above diagrams. Notice how all of the objects, while having the same height, weight, and radius, have different moments. That is, different shapes have different formulas and different numerical values for their moments even though they share the same basic characteristics. In other words, moments can be used to distinguish objects of different shapes [33].
Mathematically, the nth moment of a real-valued function f(x) of a real variable is

    μ_n = ∫ x^n f(x) dx

That is, the moments (for n = 1, 2, 3, ...) characterize the function through the sequence of values produced by this formula.
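To connect this back to images, a minimal sketch (assuming Python with NumPy, and treating a binary or grayscale image as the function f over pixel coordinates) computes the low-order raw and central moments that region-based shape descriptors are typically built from; this is a generic illustration, not the method of any specific system above.

    import numpy as np

    def raw_moment(image, p, q):
        """m_pq = sum over pixels of x^p * y^q * intensity."""
        h, w = image.shape
        y, x = np.mgrid[0:h, 0:w]
        return np.sum((x ** p) * (y ** q) * image)

    def central_moments(image):
        """Central moments are taken about the intensity centroid, making them translation invariant."""
        m00 = raw_moment(image, 0, 0)
        cx = raw_moment(image, 1, 0) / m00
        cy = raw_moment(image, 0, 1) / m00
        h, w = image.shape
        y, x = np.mgrid[0:h, 0:w]
        mu = {}
        for p in range(3):
            for q in range(3):
                mu[(p, q)] = np.sum(((x - cx) ** p) * ((y - cy) ** q) * image)
        return mu

    # Usage: a small binary "shape" (a filled square) as a toy image
    shape = np.zeros((32, 32))
    shape[8:24, 8:24] = 1.0
    mu = central_moments(shape)
    print(mu[(2, 0)], mu[(0, 2)])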

Local Features
Often termed spatial information, the basic idea of local features is to identify the characteristics of an image through the individual objects within it, instead of viewing the image globally; specifically, the spatial layout of the colors, textures, and shapes or objects within an image rather than the entire image (e.g., the ballerina in Degas' Etoile painting vs. the entire painting). Examples of this include the color correlogram method and the color coherence vector for color characterization, both of which help take into account the spatial relationships between objects within an image. Another example is the Perceptually Based Image Comparison method [7], which is modeled on the human vision system, using contrast sensitivity for color comparison and the Fourier transform (FT) for spatial frequency computation.
The basic idea behind understanding spatial characteristics is the ability to break the image out into regions and to extract features such as color, texture, and shape from those regions [10]. One of the more common methods of representing spatial relationships is the concept of 2-D strings by Chang et al. [31, 34]. For more information concerning this concept, please refer to Chang et al. [34].

Object Recognition
An area that isn't talked about much is object recognition. The implication of being able to segment images into separate regions and then understand the spatial relationships of their components is the ability to actually recognize an object in the human sense. One example of this is the Machine Translation approach [19], a method that annotates image regions with words. It is based on the concept of auto-annotation, which clusters image representations and text to produce a representation of a joint distribution linking images and words. In order for the system to actually recognize objects, models are designed to act as lexicons: devices that predict one representation (words; English) given another representation (image regions; French). These lexicons learn from a form of dataset known as an aligned bitext. In the end, an Expectation Maximization (EM) algorithm is utilized to find the correspondence between blobs and words, i.e., an iterative process for estimating unknown parameters. For more information, please refer to Duygulu et al. [19].


Feature Integration
While we have been discussing the characteristics or features of an image as separate characteristics, integrating the various types of feature analysis will often yield more precise image retrieval. The color correlogram can handle color and spatial information, but it is a methodology that natively handles only these two characteristics. The trick is to properly combine the different methodologies used for the different characteristics; for example:
Blobworld [18] transforms raw pixels into small sets of image regions (spatial information) and then into their color and texture components (color and texture information), represented by histograms, for image analysis.
Viper utilizes a global color histogram (color information), local color blocks (shape information), and global texture histograms of Gabor filters (texture information).
As noted in Iqbal and Aggarwal [1], their CBIRS may achieve better precision in image retrieval by applying different weights to the different features of shape (S), color (C), and texture (T). For example, within their CIRES system, they found that a weighting of
S = 0.05, C = 0.05, T = 0.9
resulted in 80% precision on flower retrieval.
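As a minimal sketch of such weighted feature fusion, assuming Python with NumPy and per-feature similarity scores already computed between the query and each database image (the scores below are illustrative toy values, not CIRES's data or code), the combined score is simply the weighted sum of the individual feature scores.

    import numpy as np

    def fused_scores(feature_scores, weights):
        """feature_scores: dict of feature name -> per-image similarity scores.
        weights: dict of feature name -> fusion weight. Returns the weighted-sum score per image."""
        names = sorted(feature_scores)
        stacked = np.vstack([feature_scores[name] for name in names])
        w = np.array([weights[name] for name in names]).reshape(-1, 1)
        return (w * stacked).sum(axis=0)

    # Toy example with three database images and the weighting reported for flower retrieval
    scores = {
        'shape':   np.array([0.20, 0.90, 0.40]),
        'color':   np.array([0.10, 0.80, 0.70]),
        'texture': np.array([0.30, 0.95, 0.20]),
    }
    weights = {'shape': 0.05, 'color': 0.05, 'texture': 0.9}
    combined = fused_scores(scores, weights)
    print(np.argsort(-combined))  # ranking of database images, best match first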

An interesting note on weighting is that GIFT [9] has implemented weighting schemes that are known from the text retrieval literature [13]. Similar to text retrieval's term frequency and inverse document frequency, GIFT has utilized a form of inverse document frequency weighting for images.
An interesting implication is that, to help with image retrieval, mapping tables will need to be created to store different weights for different categories of images. In its simplest form, this would be a spreadsheet identifying the image type and the weights associated with it. For example, the weighting of S = 0.05, C = 0.05, and T = 0.9 increases the precision of flower image retrieval but could have detrimental effects on precision for other types of images. While it is an intriguing notion, the implication is that a lot of metadata (different types of weighting data for the different methodologies of color, texture, shape, and spatial image characteristic analysis) will need to be stored and readily available for consumption. Though, if it is possible to identify and generalize many image weighting schemes, this metadata is exponentially smaller than the storage and retrieval requirements of an image database.

Associated Text
The idea of associating text and images is rather straightforward, and the mechanisms for this association have already been utilized in web search engines such as lycos.com, Columbia's WebSEEk, and google.com. By far the easiest method to identify images, examples of text associated with an image include:
the image name
text within a web page associated with the image
the text associated with the image within the <img> tag of the HTML
etc.
Simple text retrieval methods can then be utilized to categorize and query the images (instead of actually identifying the objects within the image). But the primary concern with text-associated images is similar to that of the medical information database MEDLINE: manual human intervention is required to identify the keywords and/or descriptions to be associated with the images (e.g., MEDLINE requires doctors and medical professionals to personally review literature and assign keywords, descriptors, etc. to each document). With images, this issue may be far worse in that there is no real standardization of the keywords and/or descriptors associated with an image. The lack of standards, and the need to automate image retrieval (considering the millions of images one may need to review, manual human annotation is very inefficient), means that associated text, while it may be helpful, cannot by itself solve the image retrieval problem.

Relevance Feedback
An important way to increase the precision of image retrieval for a user is to obtain relevance feedback from the user: that is, to have the user indicate what type of images he or she is looking for and which images are relevant or non-relevant to the search. This type of user feedback allows the user's own qualitative (vs. quantitative) image perception to come into play in image retrieval.

Query by Example
The most common approach to relevance feedback is the concept of query by example, i.e., querying for images based on example images presented to the user. The user can then use these images to help indicate what type of images are desired. Common types of user feedback include:
Multiple Image Query: The user selects a number of images that are relevant before executing the query.
Cluster Feedback: The user starts with a single or multiple image query and selects images after the query as well; that is, the user is given the ability to provide additional feedback on the types of images being viewed.

Logging
An interesting extension to QBE is to log the types of images considered relevant and irrelevant, thus keeping a record of user interaction with the system. The purpose is to record user feedback on images and perform data mining analytics to predict and improve image retrieval. Data mining algorithms are well suited to this type of problem. The four types of data mining problems to be discussed are:
Market Basket Analysis
Decision Tree
Path Analysis
Funnel Analysis
Experimental results have shown strong improvement in CBIRS performance when feature weights are calculated from usage log files [11]. Note that most systems calculate this type of relevance feedback only from one query step to the next; that is, the feedback takes into account only the preceding step instead of the entire session. Additional analysis will need to be performed to determine the effectiveness of session-based prediction (using the user's entire QBE session) vs. node-specific prediction (predicting the image based only on the previous image(s) chosen).

Market Basket Analysis

The market basket analysis problem refers to the problem of analyzing which items are bought together in a supermarket. If you purchase a certain group of items, you are more or less likely to purchase another group of items. For example, if you end up buying soft drinks, you are more inclined to buy chips and other snacks to go with them, and you are probably less inclined to purchase milk. The purpose is to find relationships between items in the form of:
IF (soft drink) THEN (chips, not milk)
In the end, one of the goals of this common data mining problem is to be able to predict, with some yet-to-be-determined certainty, a subsequent purchase (chips) based on an initial purchase (soft drink). This particular type of problem and its solutions can be applied to image retrieval via query by example. In our particular case, we would want to know with what certainty we can predict the retrieval of a particular image based on another image, e.g.:
IF (image_a) THEN (image_b, not image_d)
The research by Müller et al. [9] has noted that applying the market basket analysis problem to relevance feedback logs significantly improves subsequent image retrieval precision. Their analysis also suggests that while individual image-to-image market basket analysis provided significant improvement, further research should be done on session-based analysis, i.e., building the market basket analysis on the user's entire session of image queries and results.
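A minimal sketch of this idea, assuming Python and a list of relevance-feedback "sessions" where each session is the set of image IDs a user marked relevant together (the data and image names are illustrative, not from Müller et al. [9]), counts co-occurrences to estimate the confidence of rules of the form IF image_a THEN image_b.

    from collections import defaultdict
    from itertools import permutations

    def rule_confidences(sessions):
        """Confidence of 'IF a THEN b' = P(b in session | a in session), estimated from the logs."""
        item_count = defaultdict(int)
        pair_count = defaultdict(int)
        for session in sessions:
            for item in session:
                item_count[item] += 1
            for a, b in permutations(session, 2):
                pair_count[(a, b)] += 1
        return {(a, b): pair_count[(a, b)] / item_count[a] for (a, b) in pair_count}

    # Toy feedback log: each set holds the images marked relevant within one query session
    sessions = [
        {'img_01', 'img_02', 'img_07'},
        {'img_01', 'img_02'},
        {'img_01', 'img_05'},
    ]
    conf = rule_confidences(sessions)
    print(conf[('img_01', 'img_02')])  # 2 of the 3 sessions containing img_01 also contain img_02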


Decision Tree
Decision tree analysis builds a hierarchy of predictions, mapping out the probability that an action will occur based on the previous steps. For example, using the figure below, Image Y can be reached through the three decision paths:
Image 01 -> Image 02
Image 01 -> Image 03 -> Image 10
Image 01 -> Image M -> Image N
That is, to obtain Image Y, a user can go down the three indicated paths, noting which images are relevant and which are non-relevant (e.g., Image 01 was relevant, Image 03 was relevant while Image 04 was not, Image 10 was relevant while Image 11 was not, all resulting in retrieving Image Y).

Figure 23: Example Decision Tree Hierarchy


Within the context of relevance feedback, each feedback iteration allows this data mining technique to uncover a common thread among all images marked as relevant. In this example, the above hierarchy indicates that Images 01, 02, 03, 10, M, and N form a common thread for Image Y. As more relevance feedback iterations are performed, the retrieval of related images becomes more precise.

Path Analysis
Without getting into the mathematics, path analysis is a common data mining technique used in the analysis of web logs. The point of path analysis is to analyze the entire path, from start to finish, to determine the common paths users take to get to some end point. Within the context of web log analysis, using the example of barnesandnoble.com, the desired path analysis determines all of the different paths a user takes from the start page to a promotion page. For example, suppose there is a web-based promotion offering Free Shipping that appears all over the BN.com web site. The idea of path analysis is to determine the most common set of web clicks leading to the free-shipping promotion, whether from the user's shopping cart, from the main page of the web site, from the search results page, etc. By determining the more common paths to the promotion (e.g., the user's shopping cart), BN.com can then reclaim some of the other paths and use them for other promotions, advertising, etc.
What does this have to do with image retrieval? In essence they are one and the same. Just as BN.com tries to determine the most common paths to the free-shipping promotion, the same technique can be used to determine the most common paths from the start page or image of a CBIRS to a specific image.


By knowing your users' tendencies to visualize and/or go down a particular path, it may then be possible to predict which images are related to each other, and thus to provide more precise image retrieval.
Note that, from a review of the current literature, it appears that this technique has not been applied to CBIRS relevance feedback. That said, web log path analysis has been successfully utilized elsewhere, and the author suggests that research should be devoted to determining the applicability of this technique.

Funnel Analysis
Funnel analysis is a similar data mining technique that has been applied to web analytics. For more information concerning this technique, please refer to Mah et al. [25] and their paper "Funnel Report Mining for the MSN Network." As with path analysis, a review of the current literature does not indicate that this particular technique has been used within the context of CBIRS. Again, the author suggests that this technique be applied to image retrieval because of the successes seen in web log analysis.
The purpose of funnel analysis within the context of web log analysis is to study retention behavior across a series of pages or sites. Using the MSN.com network, the question a funnel report answers is what percentage of users are retained going from the start to the end of a path. In this example, we are trying to determine the percentage of users retained from msn.com through to completing the tax estimator on MoneyCentral.

Figure 24: Example funnel report for msn.com to completion of tax estimator

As shown in the diagram above, a funnel of sorts has been graphically depicted, noting the percentage of users retained from the beginning (msn.com) to the end (completing the tax estimator within moneycentral/taxes) of the particular path being analyzed.


Figure 25: Example Funnel Analysis of CBIRS

Similar to the path analysis above, this web log data mining technique can be applied to CBIRS relevance feedback as well. Once a particular set of images has been identified as interesting to analyze (e.g., chest X-rays indicating respiratory illnesses), either by interest or by analysis (e.g., path analysis), a funnel analysis technique can be applied to the relevance feedback logs to determine any particular set of image groupings or categories that retain users through to viewing the final set of chest X-rays.

CONCLUSION:
This paper proposed an image retrieval method based on multi-feature similarity score fusion. For a query image, multiple similarity score lists based on different features are obtained. Then, using a genetic algorithm, the multi-feature similarity scores are fused, and better image retrieval results are gained. In this paper, when we evaluated the fitness of an individual, we considered only the occurrence frequencies of an image in the retrieval result, and not the position of an image in the retrieval result. However, the position of an image in the retrieval result directly reflects its similarity to the query image. So this factor should be taken into account when evaluating the fitness of an individual, which is our future work.
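For illustration only, here is a minimal sketch in Python/NumPy of the general idea of searching for fusion weights with a genetic algorithm: candidate weight vectors are evolved, and each candidate's fitness is scored by how well the fused ranking recovers known relevant images on a small training set. The fitness function, GA parameters, and data below are placeholders, not the fitness criterion or implementation used in this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def fuse(weights, score_lists):
        """score_lists: (n_features, n_images) similarity scores. Returns fused scores per image."""
        return weights @ score_lists

    def fitness(weights, score_lists, relevant, top_k=10):
        """Placeholder fitness: how many known-relevant images appear in the fused top-k ranking."""
        ranking = np.argsort(-fuse(weights, score_lists))[:top_k]
        return len(set(ranking.tolist()) & relevant)

    def genetic_weights(score_lists, relevant, pop_size=30, generations=50, mutation=0.1):
        n_features = score_lists.shape[0]
        # Initialize a population of normalized weight vectors
        pop = rng.random((pop_size, n_features))
        pop /= pop.sum(axis=1, keepdims=True)
        for _ in range(generations):
            scores = np.array([fitness(ind, score_lists, relevant) for ind in pop])
            parents = pop[np.argsort(-scores)[:pop_size // 2]]           # selection: keep the fitter half
            children = []
            while len(children) < pop_size - len(parents):
                a, b = parents[rng.integers(len(parents), size=2)]
                child = (a + b) / 2.0                                    # crossover: average the parents
                child += rng.normal(0.0, mutation, size=n_features)      # mutation: small Gaussian noise
                child = np.clip(child, 1e-6, None)
                children.append(child / child.sum())                     # keep weights normalized
            pop = np.vstack([parents, np.array(children)])
        best = pop[np.argmax([fitness(ind, score_lists, relevant) for ind in pop])]
        return best

    # Toy data: 2 features (e.g., color and texture), 100 database images, 5 known-relevant images
    score_lists = rng.random((2, 100))
    relevant = {3, 17, 42, 56, 88}
    print(genetic_weights(score_lists, relevant))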

References
[1] V. N. Gudivada and V. V. Raghavan, "Content based image retrieval systems," IEEE Computer, vol. 28, pp. 18-22, 1995.
[2] Ritendra Datta, Dhiraj Joshi, Jia Li, and James Z. Wang, "Image retrieval: Ideas, influences, and trends of the new age," ACM Computing Surveys, vol. 40, pp. 1-60, 2008.
[3] B. G. Prasad, K. K. Biswas, and S. K. Gupta, "Region-based image retrieval using integrated color, shape, and location index," Computer Vision and Image Understanding, vol. 94, pp. 193-233, 2004.
[4] Young Deok Chun, Nam Chul Kim, and Ick Hoon Jang, "Content-Based Image Retrieval Using Multiresolution Color and Texture Features," IEEE Transactions on Multimedia, vol. 10, pp. 1073-1084, 2008.
[5] X. Y. Tai and L. D. Wang, "Medical Image Retrieval Based on Color-Texture Algorithm and GTI Model," in Bioinformatics and Biomedical Engineering (ICBBE 2008), The 2nd International Conference on, pp. 2574-2578, 2008.
[6] H. Yu, M. Li, H.-J. Zhang, and J. Feng, "Color texture moments for content-based image retrieval," in International Conference on Image Processing, pp. 24-28, 2002.
[7] Anil K. Jain and Aditya Vailaya, "Image retrieval using color and shape," Pattern Recognition, vol. 29, pp. 1233-1244, 1996.
[8] Xiuqi Li, Shu-Ching Chen, Mei-Ling Shyu, and Borko Furht, "Image Retrieval By Color, Texture, And Spatial Information," in Proceedings of the 8th International Conference on Distributed Multimedia Systems (DMS 2002), San Francisco Bay, CA, USA, pp. 152-159, 2002.
[9] I. Markov and N. Vassilieva, "Image Retrieval: Color and Texture Combining Based on Query-Image," ICISP 2008, LNCS 5099, Springer-Verlag Berlin Heidelberg, pp. 430-438, 2008.
[10] M. Jovic, Y. Hatakeyama, F. Dong, and K. Hirota, "Image Retrieval Based on Similarity Score Fusion from Feature Similarity Ranking Lists," 3rd International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2006), LNAI 4223, Springer-Verlag Berlin Heidelberg, pp. 461-470, 2006.

