
The automated generation of Web documents

that are tailored to the individual reader

Chrysanne DiMarco
Department of Computer Science
University of Waterloo
Waterloo, Ontario N2L 3G1, Canada
E-mail: cdimarco@logos.uwaterloo.ca

Mary Ellen Foster
Department of Computer Science
University of Waterloo
Waterloo, Ontario N2L 3G1, Canada
E-mail: mefoster@logos.uwaterloo.ca

This paper can be found at the following Web site: http://logos.uwaterloo.ca/ healthdo/Publications/aaai97-symposium.html. The official version of this paper has been published by the American Association for Artificial Intelligence (http://www.aaai.org).

Abstract

Many studies in communication have shown that presenting information in a manner that is tailored to the characteristics of a particular audience can have a significant effect on an individual. Incorporating this kind of tailoring facility in a system for the management and presentation of Web documents would be a very important enhancement of the Web’s current state of development. Our long-term research goal has been to investigate and develop theories of text composition, in particular, computational models of rhetorical structure and lexical semantics, which are particular problems for incorporating style in natural language systems. Recently, we have begun to apply this research to developing Web-based natural language generation systems that can tailor documents to the individual user on demand.

Introduction

A recent and growing development in Web applications has been the advent of various tools that claim to “customize” access to information on the Web by allowing users to specify the kinds of information they want to receive without having to search for it or sift through masses of irrelevant material. But this kind of customization is really just a crude filtering of raw Web material in which the user simply selects the “channels” of information she wishes to receive; this selection of information sources is hardly more “customization” than someone deciding to tune their television to a certain station. True customization, or tailoring, of information would be done for the user by a system that had access to an actual model of the user, a profile of the user’s interests and characteristics. And such tailoring would involve much more than just selecting streams of basic content: the content of the text, whether for an on-line Web page or a paper document, would be carefully selected, structured, and presented in the manner best calculated to appeal to a particular individual.

It is well known from studies in communication that presenting information in a manner that is tailored to the characteristics of a particular audience can have a significant effect on an individual. Incorporating this kind of tailoring facility in a system for the management and presentation of Web documents would be a very important enhancement of the Web’s current state of development.

The current state of “customization” on the Web

Although there are quite a number of tools now available that claim to do “customization” or “personalization” of material on the Web, very few of these products do anything involving the tailoring of the actual text to a particular audience, and none use anything more than a rudimentary description of the user.

One of the first of these products, and typical of its kind, is the PointCast Network. The PointCast client can select from many “channels” of news to view continuously updated stock prices, sports scores, weather information, and the like. There are numerous other programs currently available or in development which provide the same sort of continuous, proprietary content. Among them are Marimba’s Castanet, Intermind Connector, and BackWeb.[1] Another common approach to sifting through and selecting information on the Web uses, instead of proprietary content, customized subsets of publicly available Web pages which fit a user’s preferences. For example, FreeLoader, Cognisoft, iFusion’s ArrIve, and First Floor’s SmartBookmarks use this idea.[2]

Although these products are often advertised as tools for customizing or personalizing Web access, the “customization” provided really only involves allowing the user to choose among streams of raw information content. Nothing is done to the language or layout of the document itself to make it more appealing or accessible to the individual user.

Two slightly more sophisticated products, Netscape’s “Power Start Page” and The Microsoft Network’s “Custom Start Page”, use an approach similar to the systems mentioned above, but also allow users to set up a customized front page, with links to various predefined locations and a small number of personalized links. Both also allow some simple modification of presentation style: MSN permits a background sound, while Netscape allows changing of the layout and colour scheme. BroadVision claims that its “One-To-One” product can “build and manage visitor profiles and dynamically match profile data to Web content to personalize [a] site for each visitor, each time they visit”, but the nature of the personalization is described in only vague terms.[3]

Despite the proliferation of the kinds of products described above, only a rare few attempt any formal user modelling as a basis for customizing a document. One of the most advanced is MicroMass’s “tailoring engine” for Web documents, which uses the results of an on-line questionnaire to produce, in real-time, a health information newsletter tailored, down to the paragraph and sub-paragraph level, to a user’s specified medical conditions and lifestyle.[4] MicroMass also has a product named “IntelliWeb” that dynamically creates Web pages from a content database. The selection and presentation of information is based on data either entered by the user, provided from a customer database, or obtained automatically by means of a World Wide Web profile administrator. In MicroMass’s approach to customization, however, the language of the text selected for a particular user cannot be tailored in style or structure; the nature of the “customization” is really just selecting among prefabricated, simple blocks of text. IntelliWeb also seems more suited to tailoring a single Web-page template by changing a few simple pieces of text or a couple of illustrations. It does not appear to be intended for customization of textual materials of any length greater than a few sentences.

If the Web document designer wishes to write and present material in a way that will communicate well with the user, then just displaying the most relevant chunks of information will not be sufficient. For effective communication, both the form and content of the language used in a document should be tailored in rhetorically significant ways to best suit a user’s particular personal characteristics and preferences. Ideally, we would have Web-based natural language generation systems that could produce fully customized and customizable documents on demand by individual users, according to a formal user model. As a first step in this direction, we have been investigating applications of our earlier work on pragmatics in natural language processing to building systems for the automated generation of Web documents tailored to the individual reader.

[1] These products can be found at the following Web sites:
http://www.pointcast.com,
http://www.marimba.com/products/castanet.html,
http://www.intermind.com,
http://www.backweb.com.

[2] The respective Web sites are:
http://www.freeloader.com,
http://www.cognisoft.com/product.htm,
http://www.ifusion.com/company/arrive/what,
http://www.firstfloor.com/sb20data.html.

[3] The Web sites are:
http://personal.netscape.com/custom,
http://www.msn.com/csp/choices/first.asp,
http://www.broadvision.com/products/v2datasheet.html.

[4] The Web site is:
http://www.micromass.com/.

The HealthDoc approach to automated generation of tailored natural language documents

The HealthDoc project

Our long-term research goal has been to investigate and develop theories of text composition, in particular, computational models of rhetorical structure, lexical semantics, and fine-grained meaning, which are particular problems for incorporating style and rhetoric in natural language systems. In the past several years, we have addressed problems of syntactic style, to understand how particular syntactic structures can convey corresponding stylistic effects (DiMarco 1990, DiMarco and Hirst 1993a). We have applied our theory of style to various problems in language analysis and generation. We have produced prototype implementations of a stylistic analyzer and generator (Hoyt 1993, Hoyt and DiMarco 1994, Green 1992, Green and DiMarco 1996). We have also been working on the problem of style and lexical choice, with an emphasis on representing near-synonymy in generation systems (DiMarco and Hirst 1993b, DiMarco, Hirst, and Stede 1993, Hirst 1995).

This earlier work is now feeding in to our HealthDoc project (DiMarco, Hirst, Wanner, and Wilkinson 1995), which is developing natural language generation systems for producing health-information and patient-education documents that are customized to the personal and medical characteristics of the individual patient. The HealthDoc approach is applicable, we believe, to many kinds of situations in which the ability to target tailored documents to the characteristics of specific users would be desirable. This kind
of customization would involve much more than just producing each document in half a dozen different versions for different audiences. Rather, the number of different combinations of factors might easily be in the tens or hundreds of thousands, and it would be impossible to produce, in advance of need, the large number of different editions of each publication that would be entailed by individual tailoring of information. This is exactly the kind of situation that we face in developing Web-based natural language generation systems that could produce tailored documents for the individual Web user.

Recently, we have been experimenting with the application of HealthDoc techniques to develop a related system, WebbeDoc, that can customize Web documents on demand, according to a profile of an individual user. In the sections below, we describe the design and implementation of the first WebbeDoc prototype, then outline the kinds of long-term research issues which will need to be addressed to develop a full working system. To start, we present an overview of the concepts and techniques that HealthDoc uses and that are being adapted for WebbeDoc.

The master document and generation by selection and repair

The key idea in the HealthDoc approach to producing tailored documents is that we start from an existing master document that is then customized for a particular audience. A master document is an encapsulation of all the variations on a given topic that might be needed for any potential reader; it is represented in an abstract, albeit language-dependent, text specification language that expresses not only the content of the document but also information that will assist any subsequent process of revision; this language will be described below. Selections from this document are made for both content and form, but are automatically post-edited (“repaired”) for form, style, and coherence.[5]

We regard this use of a master document as a new approach to natural language generation, in which generation from scratch is avoided. Generation by selection and repair uses a partially specified, pre-existing document as the starting point. In this way, we can finesse many of the intractable problems of generation, as we start from a document in which many of the decisions have already been predetermined: overall text organization, division of propositional content into sentences, choice of words, and lexical cohesive structure.

[5] It might be argued that a master document could just be a large set of simple blocks of text (or templates) to be included or excluded as appropriate for both content and form; the customized patient-information leaflets produced by Strecher and colleagues were done this way (Strecher et al 1994; Campbell et al 1994; Skinner, Strecher, and Hospers 1994). However, this approach requires that an extremely large number of bits and pieces of text be available: each fact expressed in each possible way. And the assembly of such bits and pieces suffers from the obvious problem that the resulting document might not be coherent or cohesive, or at the very least, not stylistically polished. It might be objected that the pieces of text could be carefully constructed so that all possible selections resulted in a well-formed document. Indeed, Strecher et al (1994) tried essentially this. However, they found it difficult to do even for their fairly simple document (Victor Strecher and Sarah Kobrin, p.c.); it would surely be very hard to achieve for complex documents unless the granularity were extremely coarse, thereby increasing the number of distinct elements required. In the limit, one would simply store a distinct document pre-written for every single combination of possibilities, a situation that we have already assumed to be impractical.

How repairs are made

The core of HealthDoc’s tailoring facility is its sentence planner, which is presently under development for the main project. Because the bits and pieces of text selected from a master document might not necessarily be coherent or cohesive, the sentence planner performs complex linguistic repairs to restore coherence and cohesion to the “broken” selected document.

The sentence planner takes as input a set of sentence plans, written in Text Specification Language (described below), and performs the necessary repairs, with each type of repair performed by an independent repair module. The sentence planner is based on a blackboard architecture in which individual repair modules communicate and resolve their conflicts with one another. The architecture is described in greater detail by Hovy and Wanner (1996) and Wanner and Hovy (1996). Four repair modules are being built in the first phase of the main HealthDoc project: for discourse structuring, aggregation to remove redundancies, reference restoration using pronouns, and constituent re-ordering.

WebbeDoc: An application of the HealthDoc approach to the automated customization of Web documents

The idea behind WebbeDoc

We have now begun to apply the HealthDoc approach to designing Web-based document management systems that would produce textual materials tailored to the individual reader. We have developed a prototype of such a system, called WebbeDoc, that customizes a Web document describing the HealthDoc project.[6] WebbeDoc first displays only the most basic information about the project, using a bland style of presentation, and then allows the user to set various personal parameters and stylistic preferences. This causes WebbeDoc to “rewrite” its text and presentation style in accordance with the selected reader profile. Users can specify their role (e.g., computational linguist, funder, physician, layperson), age, and reading level, as well as stylistic preferences about the formality or “coolness” of the document to be generated. Examples of two different tailored versions of the opening section of the WebbeDoc page are shown in figure 1.

A document can be customized on all levels of linguistic structure: paragraph, sentence, and lexical choice, with each type of structure chosen for the appropriate pragmatic effect. WebbeDoc is doing more than blindly concatenating blocks of information content; it is selecting the most relevant pieces of text, with respect to both semantic content and pragmatic effect, so that they fit together in a coherent and cohesive manner. Our approach differs from that used by Strecher et al (1994) because, for WebbeDoc, the process of fitting together the bits and pieces of selected text is dependent on an explicit representation of the textual structure of the document. It is the existence of explicit rhetorical and other linguistic relations between the individual pieces of text that gives WebbeDoc the ability to produce coherent and polished tailored documents from the master document. (The nature of the document representation is described in detail in the section below entitled “Representing a master document”.)

The document’s structural representation contains not only linguistic information, but formatting specifications for each type of reader. So, in addition to textual customization, WebbeDoc can tailor the document’s style and form of presentation; it can select, according to the user profile, the most appropriate artwork, font, colour, and general layout.

[6] A demonstration of WebbeDoc can be found at:
http://logos.uwaterloo.ca/ healthdo/About/webbedoc.html.

The present approach: “Generation by selection only”

WebbeDoc is a direct application of the ideas and mechanisms of HealthDoc; the project Web page that it customizes is itself a master document. In the first implementation of WebbeDoc, we have implemented a form of “generation by selection only”: the structure of the master document is tightly constrained so that, after selection, no repairs will be needed to produce a coherent and stylistically adequate text.

An example of tailoring by WebbeDoc

In one section of WebbeDoc’s HealthDoc master document, we use a sequence of three sentences to explain how an authoring facility must be provided to allow a writer to enter all the textual variations that make up a master document, together with the linguistic information that makes generation-by-selection-only work well.

This section begins with an introduction aimed at all audiences, differing only in the term used to describe the authoring tool:

(1) An authoring tool allows a writer to enter all the different variations that make up a master document. This authoring facility (for computational linguists) / writer’s workbench (for laypersons who want a highly technical text) / writer’s assistant (for laypersons who want a non-technical style) has knowledge about language and document structure, so that it can guide the author into organizing a large number of snippets of text, all the different variations, into a single master document.

From there, the writer develops the theme through two subsequent sub-topics, on the need for a selection specification facility and on the kind of linguistic knowledge that the authoring tool must have. But each of these sub-topics has eight variations, made up from all the combinations of role (computational linguist, layperson), degree of technical detail (high, low), and degree of formality (formal, informal).[7] All the variations for the first sub-topic are shown in table 1.

For the second sub-topic, which describes the kind of linguistic knowledge that WebbeDoc requires, there are similarly eight different variations, differing in features such as specialized language, number and complexity of sentences, and impersonal or deictic style.

From this sequence of three sentences, and their variations, WebbeDoc can produce eight different texts; no matter how the selections are made, the resulting paragraph will be coherent, with the same rhetorical structure, cohesive, with the appropriate discourse connectives, and stylistically appropriate, with the right level and choice of vocabulary.

For example, for a non-technical, informal layperson, the selected text would be:

    An authoring tool allows a writer to enter all the different variations that make up a master document. This writer’s assistant has knowledge about language and document structure, so that it can guide the author into organizing a large number of snippets of text, all the different variations, into a single master document. Also, the writer’s assistant helps you tag each snippet of text that you write with the patient features that you choose. And the assistant has expert knowledge about language. This knowledge gives it the ability to convert your original sentences into the system’s special internal representation. This is called “Text Specification Language”.

[7] Currently, WebbeDoc uses a total of five reader parameters: role (computational linguist, physician, funder, layperson); degree of technical detail (high, low); degree of formality (formal, informal); age (child, adult, senior), and degree of “coolness” (bland, cool). This gives a possible 96 (4 × 2 × 2 × 3 × 2) distinct combinations and different texts.
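The figure of 96 given in footnote 7 can be checked by enumerating the five reader parameters it lists. The sketch below is illustrative only: the parameter names and values come from the footnote, while the dictionary `READER_PARAMETERS` and the Python encoding are assumptions made for this example, not part of WebbeDoc.

```python
from itertools import product

# Parameter values as listed in footnote 7; the dict itself is a hypothetical encoding.
READER_PARAMETERS = {
    "role": ["computational linguist", "physician", "funder", "layperson"],
    "technical_detail": ["high", "low"],
    "formality": ["formal", "informal"],
    "age": ["child", "adult", "senior"],
    "coolness": ["bland", "cool"],
}

# Every reader profile is one choice of value per parameter.
profiles = [
    dict(zip(READER_PARAMETERS, values))
    for values in product(*READER_PARAMETERS.values())
]

print(len(profiles))  # 4 * 2 * 2 * 3 * 2 = 96
```

Adding a sixth parameter with three values would triple the count, which is why pre-writing one document per profile quickly becomes impractical.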
Figure 1: Examples of different tailored versions from WebbeDoc
Highly technical text

  Linguist, formal: In addition, the authoring facility will assist the writer to specify the selection criteria associated with each textual fragment.
  Linguist, informal: In addition, the authoring facility assists you to specify the selection criteria that you wish to associate with each textual fragment that you write.
  Layperson, formal: In addition, the writer’s workbench will assist the writer to specify the patient selection features associated with each text segment.
  Layperson, informal: In addition, the writer’s workbench assists you to specify the patient selection features that you wish to associate with each text segment that you write.

Non-technical text

  Linguist, formal: Also, the authoring tool will help the writer to tag each fragment of text with a particular set of selection features.
  Linguist, informal: Also, the authoring tool helps you tag each fragment of text that you write with the selection features that you choose.
  Layperson, formal: Also, the writer’s assistant will help the writer to tag each snippet of text with the particular patient features.
  Layperson, informal: Also, the writer’s assistant helps you tag each snippet of text that you write with the patient features that you choose.

Table 1: The conceptual structure of a fragment of a master document
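One way to picture the fragment in Table 1 is as a lookup keyed on the three features that distinguish its variations. The following is a minimal sketch, not WebbeDoc's implementation: the names `VERSION_SET` and `select_variation` and the tuple encoding of the conditions are assumptions made for illustration; the sentences themselves are copied from Table 1.

```python
from itertools import product

ROLES = ("linguist", "layperson")
TECHNICAL_DETAIL = ("high", "low")
FORMALITY = ("formal", "informal")

# Sentences from Table 1; the (role, detail, formality) key is a hypothetical encoding.
VERSION_SET = {
    ("linguist", "high", "formal"):
        "In addition, the authoring facility will assist the writer to specify "
        "the selection criteria associated with each textual fragment.",
    ("linguist", "high", "informal"):
        "In addition, the authoring facility assists you to specify the selection "
        "criteria that you wish to associate with each textual fragment that you write.",
    ("layperson", "high", "formal"):
        "In addition, the writer's workbench will assist the writer to specify "
        "the patient selection features associated with each text segment.",
    ("layperson", "high", "informal"):
        "In addition, the writer's workbench assists you to specify the patient selection "
        "features that you wish to associate with each text segment that you write.",
    ("linguist", "low", "formal"):
        "Also, the authoring tool will help the writer to tag each fragment of text "
        "with a particular set of selection features.",
    ("linguist", "low", "informal"):
        "Also, the authoring tool helps you tag each fragment of text that you write "
        "with the selection features that you choose.",
    ("layperson", "low", "formal"):
        "Also, the writer's assistant will help the writer to tag each snippet of text "
        "with the particular patient features.",
    ("layperson", "low", "informal"):
        "Also, the writer's assistant helps you tag each snippet of text that you write "
        "with the patient features that you choose.",
}


def select_variation(version_set, role, detail, formality):
    """Return the one variation whose condition matches the reader profile."""
    return version_set[(role, detail, formality)]


# Sanity check: the conditions cover every feature combination exactly once.
assert set(VERSION_SET) == set(product(ROLES, TECHNICAL_DETAIL, FORMALITY))

print(select_variation(VERSION_SET, "layperson", "low", "informal"))
```

For a non-technical, informal layperson this selects the sentence beginning "Also, the writer's assistant helps you tag each snippet of text", which is the second sentence of the sample selected text quoted earlier. Keying on the full feature tuple is the simplest way to guarantee that exactly one variation matches any profile.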

The key to WebbeDoc’s ability to produce tailored documents by selection from a single master document is the manner of representation of the master document: a WebbeDoc master document has a well-defined structure of ordering relations, rhetorical relations, and other linguistic information, such as coreference links. In the first implementation, the master document was built manually according to our model of a master document, with additional structural constraints imposed so that piecewise selection and recombination would not create any infelicities such as abrupt changes of topic, unnecessary duplications of noun phrases, or unresolvable pronouns.

But to compose a master document of this style and internal complexity required the efforts of computational linguists, rhetoricians, and Web document designers; obviously this is not realistic for the average Web user! In a realistic and usable implementation, WebbeDoc would need an authoring tool and a sentence planner that could work in real-time to repair and polish the selected text; we can’t expect the average Web document author to pre-compile all the possible combinations in advance. Therefore, to develop such a system, a number of research issues must be addressed, including representation of the master document; authoring and knowledge-based document management; and sentence planning for automated post-editing.

The next step: Generation of Web pages by selection and repair

Representing a master document

Text Specification Language, or TSL, is the language used to represent master documents in the parent HealthDoc system. We anticipate that WebbeDoc master documents will have a hybrid representation: part TSL (for the portions that will be subject to syntactic or stylistic repair), part “frozen” English text (for the portions that need never be revised). We have defined TSL to be an extension of the Sentence Plan Language (SPL) that is used by the Penman text generation system (Penman Natural Language Group 1989), whose KPML derivation (Bateman 1995) is used in HealthDoc. An SPL expression is an abstract specification of a sentence, which Penman can convert to the corresponding surface form. This permits expression of the content of the document. The basic SPL structures are annotated with selection and repair information to produce the corresponding TSL representation.
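As a rough illustration of the hybrid representation described above, a master document could be pictured as a sequence of segments, some held as annotated sentence plans and some as frozen English. The sketch below rests entirely on assumptions: the class names, the field names, and the placeholder strings are hypothetical, and nothing here is actual TSL or SPL syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class FrozenText:
    """English text that never needs to be revised (hypothetical class)."""
    text: str


@dataclass
class TslFragment:
    """A sentence plan plus selection and repair annotations (hypothetical class)."""
    sentence_plan: str  # placeholder only; real SPL syntax is not reproduced here
    selection: Dict[str, str] = field(default_factory=dict)
    repair: Dict[str, str] = field(default_factory=dict)


Segment = Union[FrozenText, TslFragment]


def repairable_segments(document: List[Segment]) -> List[TslFragment]:
    """Return only the segments that a sentence planner would be allowed to revise."""
    return [seg for seg in document if isinstance(seg, TslFragment)]


document: List[Segment] = [
    FrozenText("<frozen English sentence>"),
    TslFragment("<sentence plan>", selection={"reader-role": "layperson"}),
]

print(len(repairable_segments(document)))  # 1
```

Separating the two kinds of segment by type makes it easy to send only the revisable portions to a repair stage while passing frozen text through unchanged.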
The format of the annotations for selection follows the structure of a user model, with annotations organized by personal and demographic category; for example:

    :reader-role (layperson)
    :reader-age (adult)

Other kinds of annotation for selection, such as reading level and preferred style of presentation, will, for the moment, be represented in a similar manner:

    :technical-level (low)
    :formality (informal)

The annotations can be included at any level in the SPL so that the system can make selections at any level of linguistic granularity. As stylistic and pragmatic customization becomes more complex, additional representations will probably be needed.

But this information isn’t enough. We also require the internal discourse structure to be represented explicitly, to guide repairs to the structure of the text. Therefore, TSL contains several kinds of additional annotations, including topic ordering information, coreference links, and rhetorical relations between sentences. In addition to these current kinds of annotations, WebbeDoc’s TSL will contain information on formatting and document presentation that would be marked up for inclusion according to specific user preferences.[8]

[8] Indeed, we anticipate that there will be a distinct “repair” module for document formatting in the sentence planner used with WebbeDoc.

The model of a master document

A master document is constructed according to a formal model; the model that we describe here is the most general, intended for the overall HealthDoc system, which does selection and repair of a master document. (The current version of WebbeDoc, which does generation by selection only, with no repairs involved, uses a more constrained model of a master document.)

We define the general model of a master document (MD) as follows:

• An MD has a coherent high-level communicative goal, such as to inform, to command, to persuade, to impress. For example, the purpose of the current WebbeDoc MD is to inform (and impress) the reader about the goals and technical achievements of the HealthDoc project.

• An MD has a coherent topic structure, with a division into topics, sub-topics, and so on. The smallest topic unit of an MD at the moment is a sub-sub-topic; however, we believe the form of the “smallest topic unit” will vary with the particular document.

• Each sub-topic corresponds to a section of the document that satisfies a more specific communicative goal, such as to justify or elaborate upon.

  For example, the first sub-topic of the sample WebbeDoc text given above elaborates on the selection specification facility in the authoring tool; the second sub-topic justifies the kind of specialized linguistic knowledge needed by the authoring tool. Essentially, a sub-topic is a semantically coherent piece of the document.

• Each sub-topic is a collection of version sets that are connected by ordering relations, rhetorical relations, coreference links, and formatting relations.

  A version set is a set of textual variations such that each variation fulfills the same communicative goal, but has a semantic content and pragmatic form tailored to a particular audience. Each variation in a version set is characterized by a logical condition and a semantically coherent piece of text. The logical condition uses terms that range over sets of mutually exclusive features.

  We interpret “mutual exclusion” to mean that the conditions assigned to the variations in a version set define a clean partition of the set, so that exactly one of the variations must be chosen.

  In the example given earlier, the first sub-topic is a singleton version set, sentence (1), while the second version set is made up of the eight sentences shown in table 1, and the third set also contains eight different sentence variations.

• Ordering relations may exist between the version sets that make up a sub-topic. These relations indicate the preferred order of the sequence of variations that have been selected to form the working document, and thereby specify the ordering of sub-topics prior to the invocation of the sentence planner.

  Preferred order can vary by reader. For example, the author of the WebbeDoc MD might decide that for computational linguists, the sub-topic about the authoring tool’s linguistic intelligence should precede the sub-topic on the selection criteria, but for laypersons, the reverse order would be preferable.

• Rhetorical relations may exist between the version sets that make up a sub-topic. The rhetorical relations that we are currently using are taken from Rhetorical Structure Theory (RST) (Mann and Thompson 1988). In the current version of WebbeDoc, the same rhetorical relation must exist between any two members of adjacent version sets.

  In the example we have been using, the rhetorical relations are as follows:

    Any choice from the second version set (shown in table 1) elaborates upon sentence (1) (the first version set).

    Any choice from the third version set justifies any earlier choice from the second version set.

• Coreference links may be defined between any two version sets. In our example, the following terms, used in the first and second version sets, are coreferential: authoring tool, authoring facility, writer’s workbench, writer’s assistant, and it. (The first two terms are also near-synonyms.)

• Formatting information may be defined at each topic and sub-topic level. Formatting information may also be defined between and within version sets, including illustrations, choice of colour, design of layout, and so on.

Authoring a master document

WebbeDoc master documents may be based on the natural-language text of pre-existing material, or they may be created from scratch (or some combination of the two). Either alternative requires the involvement of a human.

The author of a WebbeDoc master document would normally be a professional technical writer or Web-document designer, who will need to understand the nature of customized and customizable texts, but who should not be assumed to have any special knowledge or understanding of TSL or the innards of WebbeDoc. The authoring tool, therefore, should be no more difficult for the author to use than, say, the more-sophisticated features of a typical word processor. The text is therefore written in English, and will be translated to TSL by the authoring tool. (The English source text is retained in the TSL for use in subsequent authoring sessions, for example, if the document is updated or amended.)

It is the writer’s job to decide upon the basic elements of the text, the formatting, ordering, rhetorical, and coreferential links between them, and the conditions under which each element should be included in the output. The elements of the text are then typed into the authoring tool in English, and are marked up by the writer with conditions for inclusion, links for cohesion and coreference, and annotations for ordering and formatting of the document layout. An example of the authoring tool’s main interface (depicting part of the sample WebbeDoc master document described earlier) is given in figure 2.

The tool then translates the text into TSL. This is

Functions of sentence planning and automated post-editing

In general, selecting material from pre-existing text and then editing it to recover coherence and cohesion can involve a wide range of problems in various aspects of sentence planning. For example, both syntactic and semantic aggregation may be needed, as well as chunking of whole and partial propositions. Pronouns and other forms of reference need to be chosen. And, of course, aggregation and sentence restructuring will affect the rhetorical relations between the elements of the text.

Our current work is focusing on the development of two key modules of the sentence planner: for discourse structuring and for aggregation. It is unlikely that every ordering of the blocks of text that are organized into a master document will produce a coherent sequence of selected pieces of text. To ensure that any resulting document makes sense, the discourse module uses the rhetorical relations that hold among the textual units to produce a sequence that is most likely to be coherent. In later work, an additional module will be built to determine the linguistic phrasing of the discourse relation.

The aggregation module eliminates redundancy in TSL expressions by grouping together entities that are arguments of the same rhetorical relation, verbal process, etc. Each aggregation rule recognizes an exact match of some portions of two input TSL expressions and returns a single, fused, expression. The actions of the aggregation module will generally affect the resulting syntactic structure.

A critical problem is the distribution of repair tasks among the planning modules, as there are often strong interactions. The responsibilities of each module and the overlaps between them are an area of on-going research for our sentence-planning group.

Conclusion

The HealthDoc project and its WebbeDoc offspring aim to provide a comprehensive approach to the automated tailoring of both paper documents and Web-based materials. We incorporate explicit user modeling as a basis for the document tailoring, and we take into account user information ranging from simple demographic data to complex pragmatic preferences. We have developed a model of language generation, “generation by selection and repair”, that re-
essentially a process of semi-automated parsing, so lies on a “master-document” representation that pre-
that whenever an ambiguity cannot be resolved, the determines the basic form and content of a text and
writer is queried in an easy-to-understand form. The yet is amenable to editing and revision for customiza-
design and development of the authoring tool and its tion. The WebbeDoc project aims to provide useful
user interface is part of the current phase of the overall techniques for natural language applications on the
HealthDoc project (fall 1996 to spring 1997). The user Web and to address a number of important issues for
interface is being developed by Parsons (1997), while research in more-general systems for language gener-
Banks (1997) is implementing the English-to-TSL con- ation.
version (for more details on the underlying model of
conversion, see DiMarco and Banks (1997)).
Figure 2: The main interface of the authoring tool
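As an illustration of the aggregation step described in the sentence-planning discussion above, in which a rule recognizes an exact match of some portions of two input expressions and returns a single fused expression, the following is a minimal sketch only. The paper does not show TSL syntax, so the `Expr` record, its fields, and the functions `fuse_same_process` and `aggregate` are hypothetical stand-ins, not part of WebbeDoc or HealthDoc.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a TSL expression (the real TSL syntax is not assumed)."""
    process: str            # verbal process, e.g. "raise"
    actor: str              # entity performing the process
    args: Tuple[str, ...]   # entities the process applies to


def fuse_same_process(a: Expr, b: Expr) -> Optional[Expr]:
    """One illustrative rule: fire only on an exact match of process and actor."""
    if a.process != b.process or a.actor != b.actor:
        return None
    # Keep a's arguments in order, then append b's arguments not already present.
    merged = a.args + tuple(x for x in b.args if x not in a.args)
    return Expr(a.process, a.actor, merged)


def aggregate(exprs: List[Expr]) -> List[Expr]:
    """Fuse each expression into its immediate predecessor when the rule applies."""
    out: List[Expr] = []
    for e in exprs:
        fused = fuse_same_process(out[-1], e) if out else None
        if fused is not None:
            out[-1] = fused
        else:
            out.append(e)
    return out


plans = [
    Expr("raise", "smoking", ("blood pressure",)),
    Expr("raise", "smoking", ("heart rate",)),
    Expr("lower", "exercise", ("blood pressure",)),
]
result = aggregate(plans)
```

In this sketch only adjacent expressions are fused, on the assumption that the order chosen by the discourse module should be preserved. A real rule set would presumably also match on rhetorical relations and other portions of the expressions.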
Acknowledgements

The HealthDoc Project is supported by a grant from Technology Ontario, administered by the Information Technology Research Centre. Vic DiCiccio was instrumental in helping us to obtain the grant, and has been invaluable in subsequent administration. Substantial portions of the sections of this paper that describe the HealthDoc project and authoring of master documents were written by Graeme Hirst; they are used here with his permission. Some material in the section on the functions of sentence planning was written by Eduard Hovy; it is used here with his permission. We are grateful to Graeme Hirst and Eduard Hovy for many helpful comments on this research and this paper. The other members of the HealthDoc Project have also contributed to the work described here, especially Daniel Marcu, Kim Parsons, and Phil Edmonds. Jonathan Dursi kindly provided help with some of the LaTeX development.

References

Banks, Steven (1997). Master’s thesis. Department of Computer Science, University of Waterloo, expected Spring 1997.

Bateman, John Arnold (1995). “KPML: The KOMET–Penman multilingual linguistic resource development environment.” Proceedings, 5th European Workshop in Natural Language Generation, Leiden, May 1995, 219–222.

Campbell, Marci Kramish; DeVellis, Brenda M.; Strecher, Victor J.; Ammerman, Alice S.; DeVellis, Robert F.; and Sandler, Robert S. (1994). “Improving dietary behavior: The effectiveness of tailored messages in primary care settings.” American Journal of Public Health, 84(5), May 1994, 783–787.

DiMarco, Chrysanne (1990). Computational stylistics for natural language translation. PhD thesis, Department of Computer Science, University of Toronto, 1990. Published as technical report CSRI-239.

DiMarco, Chrysanne and Banks, Steven (1997). “Using subsumption classification on a stylistic hierarchy as the basis of a multi-stage conversion of natural language text to sentence plans.” In preparation.

DiMarco, Chrysanne; Hirst, Graeme; and Stede, Manfred (1993). “The semantic and stylistic differentiation of synonyms and near-synonyms.” Proceedings, AAAI Spring Symposium on Building Lexicons for Machine Translation, Stanford, March 1993, 114–121.

DiMarco, Chrysanne and Hirst, Graeme (1993a). “A computational theory of goal-directed style in syntax.” Computational Linguistics, 19(3), September 1993, 451–499.

DiMarco, Chrysanne and Hirst, Graeme (1993b). “Usage notes as the basis for a representation of near-synonymy for lexical choice.” Proceedings, Ninth Annual Conference of the University of Waterloo Centre for the New Oxford English Dictionary and Text Research, Oxford, September 1993, 33–43.

DiMarco, Chrysanne; Hirst, Graeme; Wanner, Leo; and Wilkinson, John (1995). “HealthDoc: Customizing patient information and health education by medical condition and personal characteristics.” Workshop on Artificial Intelligence in Patient Education, Glasgow, August 1995.

Green, Stephen (1992). “A functional theory of style for natural language generation.” Master’s thesis, Department of Computer Science, University of Waterloo, 1993.

Green, Stephen J. and DiMarco, Chrysanne (1996). “Stylistic decision-making in natural language generation.” In Trends in natural language generation: An artificial intelligence perspective. Giovanni Adorni and Michael Zock (eds.). Springer-Verlag Lecture Notes in Artificial Intelligence (a subseries of Lecture Notes in Computer Science) number 1036, 1996.

Hirst, Graeme (1995). “Near-synonymy and the structure of lexical knowledge.” Working notes, AAAI Symposium on Representation and Acquisition of Lexical Knowledge: Polysemy, Ambiguity, and Generativity, Stanford University, March 1995, 51–56.

Hovy, Eduard and Wanner, Leo (1996). “Managing sentence planning requirements.” Proceedings, ECAI-96 Workshop on Gaps and Bridges: New Directions in Planning and Natural Language Generation, Budapest, August 1996.

Hoyt, Pat (1993). A goal-directed functionally-based stylistic analyzer. Master’s thesis, Department of Computer Science, University of Waterloo, 1993.

Hoyt, Pat and DiMarco, Chrysanne (1994). “A goal-directed multi-level stylistic analyzer.” Proceedings, 10th Canadian Conference on Artificial Intelligence, Banff, May 1994, 23–30.

Mann, William C. and Thompson, Sandra A. (1988). “Rhetorical Structure Theory: Toward a functional theory of text organization.” Text, 8(3), 1988, 243–281.

Parsons, Kimberley J. (1997). Master’s thesis, Department of Computer Science, University of Waterloo, expected Spring 1997.

Penman Natural Language Group (1989). “The Penman primer”, “The Penman user guide”, and “The Penman reference manual.” Information Sciences Institute, University of Southern California.

Skinner, Celette Sugg; Strecher, Victor J.; and Hospers, Harm (1994). “Physicians’ recommendations for mammography: Do tailored messages make a difference?” American Journal of Public Health, 84(1), January 1994, 43–49.

Strecher, Victor J.; Kreuter, Matthew; Den Boer, Dirk-Jan; Kobrin, Sarah; Hospers, Harm J.; and Skinner, Celette S. (1994). “The effects of computer-tailored smoking cessation messages in family practice settings.” The Journal of Family Practice, 39(3), September 1994, 262–270.

Wanner, Leo and Hovy, Eduard (1996). “The HealthDoc sentence planner.” Proceedings of the Eighth International Workshop on Natural Language Generation, Brighton, UK, June 1996.