The cosine measure corresponds to taking the scalar product of the vectors and then dividing by their norms. It is the most frequently used similarity metric in word-space research. The advantage of using the cosine metric over other metrics is that it provides a fixed measure of similarity, which ranges from 1 (for identical vectors) through 0 (for orthogonal vectors) to -1 (for vectors pointing in opposite directions). Moreover, it is also comparatively efficient to compute.
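For illustration, a minimal sketch of this measure in Python (the function name and the toy vectors are ours, not part of the system described here):

```python
import math

def cosine_similarity(x, y):
    """Scalar product of x and y divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention for all-zero vectors
    return dot / (norm_x * norm_y)

print(cosine_similarity([1, 2], [2, 4]))    # identical direction: 1.0
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal: 0.0
print(cosine_similarity([1, 2], [-1, -2]))  # opposite direction: -1.0
```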
3.4. Problems Associated with Implementing Word Spaces

The dimension n used to define the word space corresponding to a text document is equal to the number of unique words in the document. The number of dimensions therefore increases as the size of the text increases: a text document containing a few thousand words will have a word space of a few thousand dimensions, and the computational overhead grows rapidly with the size of the text. The other problem is data sparseness. The majority of cells in the co-occurrence matrix constructed for the document will be zero, because most words in any language appear in limited contexts, i.e. the words they co-occur with are very limited.

The solution to this predicament is to reduce the high dimensionality of the vectors. A few algorithms attempt to solve this problem by dimensionality reduction. One of the simplest ways is to remove words belonging to certain grammatical classes. Another way is to employ Latent Semantic Analysis [17]. We have used Random Indexing [18] to address the problem of high dimensionality.

4. Random Indexing

Random Indexing (RI) [18] is based on Pentti Kanerva's [19] work on sparse distributed memory. Random Indexing was developed to tackle the problem of high dimensionality in the word space model. While dimensionality reduction does make the resulting lower-dimensional context vectors easier to compute with, it does not solve the problem of initially having to collect a potentially huge co-occurrence matrix. Even implementations that use powerful dimensionality reduction, such as SVD [17], need to initially collect the words-by-documents or words-by-words co-occurrence matrix. RI removes the need for the huge co-occurrence matrix: instead of first collecting co-occurrences in a co-occurrence matrix and then extracting context vectors from it, RI incrementally accumulates context vectors, which can then, if needed, be assembled into a co-occurrence matrix.

4.1. RI Algorithm

Random Indexing accumulates context vectors in a two-step process:

1. Each word in the text is assigned a unique, randomly generated vector called the index vector. The index vectors are sparse, high dimensional and ternary (i.e. their entries are 1, -1 or 0). Each word is also assigned an initially empty context vector which has the same dimensionality (r) as the index vector.

2. The context vectors are then accumulated by advancing through the text one word at a time and adding the index vectors of the context to the focus word's context vector. When the entire data has been processed, the r-dimensional context vectors are effectively the sums of the words' contexts.

For illustration we can again take the example of the sentence

'A friend in need is a friend indeed'

Let the dimension r of the index vectors be 10 for illustration purposes, and let the context be defined as one preceding and one succeeding word.

Let 'friend' be assigned the random index vector
[0 0 0 1 0 0 0 0 -1 0]
and 'need' be assigned the random index vector
[0 1 0 0 -1 0 0 0 0 0]

To compute the context vector of 'in' we need to sum up the index vectors of its context. Since the context is defined as one preceding and one succeeding word, the context of 'in' is 'friend' and 'need'. Summing their index vectors gives the context vector of 'in':
[0 1 0 1 -1 0 0 0 -1 0]

If a co-occurrence matrix has to be constructed, the r-dimensional context vectors can be collected into a matrix of order w x r, where w is the number of unique word types and r is the chosen dimensionality for each word.

Note that this is similar to constructing an n-dimensional unary context vector which has a single 1 in a different position for each word, where n is the number of distinct words. Mathematically, these n-dimensional unary vectors are orthogonal, whereas the r-dimensional random index vectors are only nearly orthogonal. However, there are many more nearly orthogonal than truly orthogonal directions in a high-dimensional space [18]. Random indexing is thus an advantageous tradeoff between the number of dimensions and orthogonality, as the r-dimensional random index vectors can be seen as approximations of the n-dimensional unary vectors.
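As a sketch of the two-step procedure above (in Python; the helper names are ours, and the index vectors of 'friend' and 'need' are fixed to the values used in the worked example rather than drawn at random):

```python
import random
from collections import defaultdict

def random_index_vector(r=10, nonzeros=2, rng=random):
    """Step 1: sparse ternary index vector with a few randomly placed +1s and -1s."""
    vec = [0] * r
    positions = rng.sample(range(r), 2 * nonzeros)
    for p in positions[:nonzeros]:
        vec[p] = 1
    for p in positions[nonzeros:]:
        vec[p] = -1
    return vec

def accumulate_context_vectors(tokens, index_vectors, r=10, window=1):
    """Step 2: add the index vectors of the surrounding words (one word on
    either side here) to the focus word's context vector."""
    context = defaultdict(lambda: [0] * r)
    for i, focus in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                context[focus] = [c + v for c, v in
                                  zip(context[focus], index_vectors[tokens[j]])]
    return context

tokens = "a friend in need is a friend indeed".split()
index_vectors = {w: random_index_vector() for w in set(tokens)}
index_vectors["friend"] = [0, 0, 0, 1, 0, 0, 0, 0, -1, 0]
index_vectors["need"] = [0, 1, 0, 0, -1, 0, 0, 0, 0, 0]

context = accumulate_context_vectors(tokens, index_vectors)
print(context["in"])  # [0, 1, 0, 1, -1, 0, 0, 0, -1, 0], as in the text above
```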
Observe that both the unary vectors and the random index vectors assigned to the words construct the word space; the context vectors computed on the language data are used in mapping the words onto the word space. In our work we used Random Indexing because of the advantages discussed below.

4.2. Advantages of Random Indexing

Compared to other word space methodologies, the Random Indexing approach is unique in the following three ways.

First, it is an incremental method, which means that the context vectors can be used for similarity computations even after just a few examples have been encountered. By contrast, most other word space methods require the entire data to be sampled before similarity computations can be performed.

Second, it uses fixed dimensionality, which means that new data do not increase the dimensionality of the vectors. Increasing dimensionality can lead to significant scalability problems in other word space methods.

Third, it uses implicit dimension reduction, since the fixed dimensionality is much lower than the number of words in the data. This leads to a significant gain in processing time and memory consumption compared to word space methods that employ computationally expensive dimension reduction algorithms.

4.3. Assigning Semantic Vectors to Documents

The average term vector can be considered as the central theme of the document and is computed as

\bar{v} = \frac{1}{n} \sum_{i=1}^{n} \vec{c}_i

where n is the number of distinct words in the document and \vec{c}_i is the context vector of the i-th word.

While computing the semantic vectors for the sentences we subtract the average term vector \bar{v} from the context vectors of the words of the sentence to remove the bias from the system [21]. The semantic vector of a sentence is thus computed as

\vec{s} = \frac{1}{n} \sum_{i=1}^{n} (\vec{c}_i - \bar{v})

where n is the number of words in the focus sentence, i refers to the i-th word of the sentence and \vec{c}_i is the corresponding context vector.

Note that subtracting the mean vector reduces the magnitude of those term vectors which are close in direction to the mean vector, and increases the magnitude of term vectors which point most nearly opposite to the mean vector. Thus the words which occur very commonly in a text, such as auxiliary verbs and articles, will have little influence on the sentence vector so produced; typically, these words do not show any definitive pattern in the words they co-occur with. Further, the terms whose distribution is most distinctive will be given the most weight.
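A small sketch of the two computations above, the average term vector and the sentence vector (plain Python; the function names and the toy three-dimensional vectors are ours):

```python
def mean_vector(context_vectors):
    """Average of the context vectors of all distinct words: the 'central theme'."""
    n = len(context_vectors)
    dim = len(next(iter(context_vectors.values())))
    mean = [0.0] * dim
    for vec in context_vectors.values():
        mean = [m + v for m, v in zip(mean, vec)]
    return [m / n for m in mean]

def sentence_vector(sentence_words, context_vectors, mean):
    """Average of (context vector - mean vector) over the words of the sentence."""
    total = [0.0] * len(mean)
    for word in sentence_words:
        total = [t + (c - m) for t, c, m in
                 zip(total, context_vectors[word], mean)]
    return [t / len(sentence_words) for t in total]

# Toy example with made-up 3-dimensional context vectors:
ctx = {"solar": [2, 0, 1], "eclipse": [1, 1, 0], "the": [5, 5, 5]}
mu = mean_vector(ctx)
print(sentence_vector(["solar", "eclipse"], ctx, mu))
```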
5. The Experimental Setup

Our experimental data set consists of fifteen documents containing 200 to 300 words each. The processing of each document to generate a summary has been carried out as follows.

5.1. Mapping of Words onto the Word Space

Each word in the document was initially assigned a unique, randomly generated index vector of dimension 100 with ternary values (1, -1, 0). This provided an implicit dimensionality reduction of around 50%. The index vectors were constructed so that each vector of 100 units contained two randomly placed 1s and two randomly placed -1s; the rest of the units were assigned the value 0. Each word was also assigned an initially empty context vector of dimension 100. The dimension r assigned to the words depends upon the number of unique words in the text. Since our test data consisted of small paragraphs of 200-300 words each, vectors of dimension 100 sufficed; if larger texts containing thousands of words are to be summarized, vectors of larger dimension have to be employed.

We defined the context of a word as two words on either side, so a 2x2 sliding window was used to accumulate the context vector of the focus word. The context of a given word was also restricted to a single sentence, i.e. windows spanning sentence boundaries were not considered; in cases where the window would have extended into the preceding or the succeeding sentence, a unidirectional window was used. There is fair evidence supporting the use of a small context window. Kaplan [22] conducted experiments in which people successfully guessed the meaning of a word when the two words on either side of it were provided. Experiments conducted at SICS, Sweden [23] also indicate that a narrow context window is preferable for acquiring semantic information. These observations prompted us to use a 2x2 window. The window can also be weighted to give greater importance to the words lying closer to the focus word. For example, the weight vector [0.5 1 0 1 0.5] indicates that the words adjacent to the focus word are given weight 1 and the words at distance 2 are given weight 0.5. In our experiments we have used these weights for computing the context vectors.
5.2. Mapping of Sentences onto the Word Space

Once all the context vectors had been accumulated, semantic vectors for the sentences were computed. A mean vector was calculated from the context vectors of all the words in the text. This vector was subtracted from the context vectors of the words appearing in the sentence, and the resultants were summed and averaged to compute the semantic vector of the sentence.

5.3. Construction of a Completely Connected Undirected Graph

We constructed a weighted, completely connected, undirected graph from the text, wherein each sentence is represented by a node in the graph. The edge joining node i and node j is associated with a weight wij signifying the similarity between sentence i and sentence j.
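A sketch of the graph construction; here we assume, consistently with Section 3, that the edge weight wij is the cosine similarity between the semantic vectors of sentences i and j (the function names are ours):

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def build_sentence_graph(sentence_vectors):
    """Completely connected undirected graph: w[i][j] is the similarity of sentences i and j."""
    n = len(sentence_vectors)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine(sentence_vectors[i], sentence_vectors[j])
    return w
```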
5.3.1. Assigning the Node Weights

5.4. Calculating Weights of the Sentences and Generating the Summary

Once the graph is constructed, our aim is to get rid of the redundant information in the text by removing the sentences of less importance. To achieve this, the sentences are ranked by applying a graph-based ranking algorithm. Various graph-based ranking algorithms are available in the literature; the one that we have used for this work is the weighted PageRank algorithm [24].

Weighted PageRank Algorithm. Let G = (V, E) be a directed graph with the set of vertices V and set of edges E, where E is a subset of V x V. For a given vertex Vi, let In(Vi) be the set of vertices that point to it (predecessors), and let Out(Vi) be the set of vertices that Vi points to (successors). Then the new node weight assigned by the weighted PageRank algorithm after one iteration is

WS(V_i) = (1 - d) + d \sum_{V_j \in In(V_i)} \frac{w_{ji}}{\sum_{V_k \in Out(V_j)} w_{jk}} WS(V_j)

where w_{ji} is the weight of the edge from V_j to V_i and d is the damping factor [25].
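A sketch of the iteration above for our completely connected, undirected sentence graph, in which every node is both a predecessor and a successor of every other node; the damping factor d = 0.85 and the convergence threshold are conventional choices [25], not values fixed by this paper:

```python
def weighted_pagerank(w, d=0.85, tol=1e-6, max_iter=100):
    """w[i][j] is the weight of the edge between sentences i and j (w[i][i] == 0)."""
    n = len(w)
    score = [1.0] * n
    out_sum = [sum(row) for row in w]              # total weight leaving each node
    for _ in range(max_iter):
        new = [(1 - d) + d * sum((w[j][i] / out_sum[j]) * score[j]
                                 for j in range(n) if out_sum[j] > 0)
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, score)) < tol:
            return new
        score = new
    return score

# The highest-scoring sentences, up to the desired 10%, 25% or 50% of the text,
# can then be selected in their original order to form the extract.
```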
As an illustration, consider the following text, which was summarized at the 10%, 25% and 50% levels by our summarizer and by the Copernic and Word summarizers:

A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun. This configuration can only occur during a new moon, when the Sun and Moon are in conjunction as seen from the Earth. In ancient times, and in some cultures today, solar eclipses are attributed to mythical properties. Total solar eclipses can be frightening events for people unaware of their astronomical nature, as the Sun suddenly disappears in the middle of the day and the sky darkens in a matter of minutes. However, the spiritual attribution of solar eclipses is now largely disregarded. Total solar eclipses are very rare events for any given place on Earth because totality is only seen where the Moon's umbra touches the Earth's surface. A total solar eclipse is a spectacular natural phenomenon and many people consider travel to remote locations in order to observe one. The 1999 total eclipse in Europe, said by some to be the most-watched eclipse in human history, helped to increase public awareness of the phenomenon. This was illustrated by the number of people willing to make the trip to witness the 2005 annular eclipse and the 2006 total eclipse. The next solar eclipse takes place on September 11, 2007, while the next total solar eclipse will occur on August 1, 2008.

The sentences selected manually by experts to create a summary are:

A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.
This configuration can only occur during a new moon, when the Sun and Moon are in conjunction as seen from the Earth.
Total solar eclipses are very rare events for any given place on Earth because totality is only seen where the Moon's umbra touches the Earth's surface.
A total solar eclipse is a spectacular natural phenomenon and many people consider travel to remote locations in order to observe one.
The 1999 total eclipse in Europe, said by some to be the most-watched eclipse in human history, helped to increase public awareness of the phenomenon.
This was illustrated by the number of people willing to make the trip to witness the 2005 annular eclipse and the 2006 total eclipse.
The next solar eclipse takes place on September 11, 2007, while the next total solar eclipse will occur on August 1, 2008.

The summary generated by our summarizer:

10% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.

25% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.
This configuration can only occur during a new moon, when the Sun and Moon are in conjunction as seen from the Earth.
Total solar eclipses can be frightening events for people unaware of their astronomical nature, as the Sun suddenly disappears in the middle of the day and the sky darkens in a matter of minutes.

50% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.
This configuration can only occur during a new moon, when the Sun and Moon are in conjunction as seen from the Earth.
Total solar eclipses can be frightening events for people unaware of their astronomical nature, as the Sun suddenly disappears in the middle of the day and the sky darkens in a matter of minutes.
Total solar eclipses are very rare events for any given place on Earth because totality is only seen where the Moon's umbra touches the Earth's surface.
The 1999 total eclipse in Europe, said by some to be the most-watched eclipse in human history, helped to increase public awareness of the phenomenon.

The summary generated by Copernic:

10% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.

25% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.
A total solar eclipse is a spectacular natural phenomenon and many people consider travel to remote locations in order to observe one.

50% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.
Total solar eclipses can be frightening events for people unaware of their astronomical nature, as the Sun suddenly disappears in the middle of the day and the sky darkens in a matter of minutes.
A total solar eclipse is a spectacular natural phenomenon and many people consider travel to remote locations in order to observe one.
The 1999 total eclipse in Europe, said by some to be the most-watched eclipse in human history, helped to increase public awareness of the phenomenon.

The summary generated by Word:

10% summary:
The next solar eclipse takes place on September 11, 2007, while the next total solar eclipse will occur on August 1, 2008.

25% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.
A total solar eclipse is a spectacular natural phenomenon and many people consider travel to remote locations in order to observe one.
The next solar eclipse takes place on September 11, 2007, while the next total solar eclipse will occur on August 1, 2008.

50% summary:
A solar eclipse occurs when the Moon passes between Earth and the Sun, thereby totally or partially obscuring Earth's view of the Sun.
A total solar eclipse is a spectacular natural phenomenon and many people consider travel to remote locations in order to observe one.
The 1999 total eclipse in Europe, said by some to be the most-watched eclipse in human history, helped to increase public awareness of the phenomenon.

For larger texts, we used precision, recall and F, measures widely used in Information Retrieval [26], to evaluate our results. For each document an extract made manually by experts has been taken as the reference summary (denoted by Sref). We then compare the candidate summary (denoted by Scand) with the reference summary and compute the precision (p), recall (r) and F values as follows:

p = \frac{|S_{ref} \cap S_{cand}|}{|S_{cand}|}, \quad r = \frac{|S_{ref} \cap S_{cand}|}{|S_{ref}|}, \quad F = \frac{2pr}{p + r}

where |S| denotes the number of sentences in summary S.
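A sketch of these measures, treating each summary as the set of sentences it contains (the function name is ours):

```python
def precision_recall_f(s_ref, s_cand):
    """Precision, recall and F computed over the sentences common to both summaries."""
    common = len(set(s_ref) & set(s_cand))
    p = common / len(s_cand) if s_cand else 0.0
    r = common / len(s_ref) if s_ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

s_ref = {"sentence 1", "sentence 2", "sentence 3"}
s_cand = {"sentence 1", "sentence 4"}
print(precision_recall_f(s_ref, s_cand))  # (0.5, 0.333..., 0.4)
```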
We also compute the precision, recall and F values for the summaries generated by Copernic [12] and the Word summarizer [14] by comparing them with Sref. Finally we compare the p, r and F values of our summarizer with these values. The values obtained for the fifteen documents are tabulated in Table 1.
Table 1. Precision (p), recall (r) and F values for the fifteen test documents at the 10% and 25% summary levels.

Text Number / Level   Our Approach (p, r, F)   Copernic (p, r, F)   Word (p, r, F)
1 10% 1.000 0.166 0.284 1.000 0.083 0.154 1.000 0.250 0.400
25% 0.800 0.333 0.424 0.500 0.166 0.252 0.800 0.333 0.424
2 10% 1.000 0.444 0.444 1.000 0.140 0.250 0.500 0.142 0.222
25% 1.000 0.429 0.601 1.000 0.429 0.545 0.333 0.142 0.200
3 10% 0.500 0.125 0.200 0.500 0.125 0.200 0.500 0.125 0.200
25% 0.600 0.375 0.462 0.750 0.375 0.500 0.600 0.375 0.462
4 10% 1.000 0.200 0.333 1.000 0.200 0.333 0 0 N.A.
25% 1.000 0.400 0.570 0.666 0.400 0.498 0.666 0.400 0.498
5 10% 1.000 0.143 0.249 1.000 0.143 0.249 0.500 0.143 0.222
25% 0.750 0.426 0.545 0.666 0.286 0.400 0.500 0.286 0.329
6 10% 1.00 0.300 0.451 1.000 0.200 0.333 0.500 0.100 0.167
25% 0.833 0.500 0.625 0.666 0.300 0.400 0.400 0.200 0.267
7 10% 1.000 0.222 0.364 1.000 0.222 0.364 0.500 0.111 0.181
25% 0.200 0.444 0.552 0.750 0.333 0.458 0.400 0.400 0.282
8 10% 1.000 0.250 0.400 1.000 0.250 0.400 0 0 N.A.
25% 1.000 0.500 0.666 1.000 0.500 0.666 0.500 0.250 0.400
9 10% 0.750 0.200 0.315 0.666 0.133 0.221 0.500 0.133 0.210
25% 0.875 0.466 0.608 0.857 0.400 0.545 0.625 0.333 0.434
10 10% 0.666 0.200 0.307 1.000 0.200 0.333 0.500 0.200 0.285
25% 0.833 0.500 0.625 0.800 0.400 0.533 0.714 0.500 0.588
11 10% 1.000 0.166 0.285 1.000 0.166 0.284 0.666 0.166 0.265
25% 0.857 0.500 0.632 0.666 0.333 0.444 0.875 0.583 0.699
12 10% 0.666 0.125 0.211 1.000 0.125 0.222 0.666 0.125 0.210
25% 0.875 0.438 0.583 0.750 0.375 0.500 0.750 0.375 0.500
13 10% 1.000 0.222 0.363 1.000 0.222 0.363 0.666 0.222 0.333
25% 0.800 0.444 0.571 0.750 0.333 0.461 0.600 0.333 0.428
14 10% 1.000 0.182 0.305 1.000 0.182 0.308 0.600 0.182 0.279
25% 0.833 0.454 0.593 0.800 0.364 0.499 0.600 0.364 0.453
15 10% 1.000 0.230 0.373 1.000 0.230 0.373 0.666 0.154 0.249
25% 0.857 0.461 0.599 0.666 0.307 0.419 0.714 0.384 0.499
The observations clearly indicate that the summaries generated by our method are closer to the human-generated summaries than the summaries produced by the Copernic and Word summarizers at the 10% and 25% levels for almost all the texts. At the 50% level too we obtained better results than Copernic and Word; however, limitations of space preclude us from showing those figures in the table.

6. Conclusions and Future Scope

In this paper we have proposed a summarization technique which involves mapping the words and sentences of a text onto a semantic space and exploiting their similarities to remove the less important sentences containing redundant information. The problem of the high dimensionality of the semantic space corresponding to the text has been tackled by employing Random Indexing, which is less expensive in computation and memory consumption than other dimensionality reduction approaches. The approach gives better results than the commercially available summarizers, namely Copernic and the Word summarizer.

In the future we plan to include a training algorithm using Random Indexing which will construct the word space from a previously compiled text database and then use it for summarization, so as to resolve ambiguities, such as polysemy, more efficiently.

We observed some abruptness in the summaries generated by our method. We plan to smooth out this abruptness by constructing Steiner trees of the graphs constructed for the texts.

In our present evaluation we have used measures like precision, recall and F, which are used primarily in the context of information retrieval. In the future we intend to use more summarization-specific techniques, e.g. ROUGE [27], to measure the efficacy of our scheme.
Text summarization is an important challenge in the present-day context, as huge volumes of text are being produced every day. We expect that the proposed approach will pave the way for developing an efficient AI tool for text summarization.

7. References

[1] Inderjeet Mani, Advances in Automatic Text Summarization, MIT Press, Cambridge, MA, USA, 1999.

[2] Jade Goldstein, Mark Kantrowitz, Vibhu Mittal, and Jaime Carbonell, "Summarizing text documents: sentence selection and evaluation metrics", ACM SIGIR, 1999, pp. 121-128.

[3] E.H. Hovy and C.Y. Lin, "Automated Text Summarization in SUMMARIST", Proceedings of the Workshop on Intelligent Text Summarization, ACL/EACL-97, Madrid, Spain, 1997.

[4] J. Carbonell and J. Goldstein, "The use of MMR, diversity-based reranking for reordering documents and producing summaries", ACM SIGIR, 1998, pp. 335-336.

[5] Zha Hongyuan, "Generic Summarization and Keyphrase Extraction Using Mutual Reinforcement Principle and Sentence Clustering", ACM, 2002.

[6] John Conroy and Dianne O'Leary, "Text Summarization via Hidden Markov Models and Pivoted QR Matrix Decomposition", ACM SIGIR, 2001.

[7] Daniel Marcu, "From discourse structures to text summaries", ACL'97/EACL'97 Workshop on Intelligent Scalable Text Summarization, 1997, pp. 82-88.

[8] J. Pollock and A. Zamora, "Automatic abstracting research at Chemical Abstracts Service", JCICS, 1975.

[9] Hans P. Luhn, "The automatic creation of literature abstracts", IBM Journal of Research and Development, 1958.

[10] Nidhika Yadav, M.Tech Thesis, Indian Institute of Technology Delhi, 2007.

[11] Yihong Gong and Xin Liu, "Generic text summarization using relevance measure and latent semantic analysis", ACM SIGIR, 2001, pp. 19-25.

[12] Copernic Summarizer homepage: http://www.copernic.com/en/products/summarizer.

[13] Rada Mihalcea and Paul Tarau, "An Algorithm for Language Independent Single and Multiple Document Summarization", Proceedings of the International Joint Conference on Natural Language Processing (IJCNLP), Korea, October 2005.

[14] Word summarizer: www.microsoft.com/education/autosummarize.mspx

[15] M. Sahlgren, "The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces", Ph.D. dissertation, Department of Linguistics, Stockholm University, 2006.

[16] Z. Harris, Mathematical Structures of Language, Interscience Publishers, 1968.

[17] Thomas K. Landauer, Peter W. Foltz, and Darrell Laham, "An Introduction to Latent Semantic Analysis", 45th Annual Computer Personnel Research Conference, ACM, 2004.

[18] M. Sahlgren, "An Introduction to Random Indexing", Proceedings of the Methods and Applications of Semantic Indexing Workshop at the 7th International Conference on Terminology and Knowledge Engineering (TKE), Copenhagen, Denmark, 2005.

[19] P. Kanerva, Sparse Distributed Memory, MIT Press, Cambridge, MA, USA, 1988.

[20] S. Kaski, "Dimensionality reduction by random mapping: Fast similarity computation for clustering", Proceedings of the International Joint Conference on Neural Networks (IJCNN'98), IEEE Service Center, 1998.

[21] Derrick Higgins and Jill Burstein, "Sentence similarity measures for essay coherence", Proceedings of the 2004 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Boston, Massachusetts, May 2004.

[22] A. Kaplan, "An experimental study of ambiguity and context", Mechanical Translation, 2(2), 1955.

[23] J. Karlgren and M. Sahlgren, "From words to understanding", in Foundations of Real-World Intelligence, CSLI Publications, 2001.

[24] Rada Mihalcea, "Graph-based Ranking Algorithms for Sentence Extraction, Applied to Text Summarization", Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, companion volume (ACL 2004), Barcelona, Spain, July 2004.

[25] S. Brin and L. Page, "The anatomy of a large-scale hypertextual Web search engine", Computer Networks and ISDN Systems, 1998.

[26] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval, Pearson Education, 1999.

[27] ROUGE: Recall-Oriented Understudy for Gisting Evaluation, http://www.isi.edu/~cyl/ROUGE/