
Informatics • Original Research

Sistrom and Honeyman-Buck
Information Transfer: Free Text vs Structured Format

Free Text Versus Structured Format: Information Transfer Efficiency of Radiology Reports
Chris L. Sistrom1
Janice Honeyman-Buck

Sistrom CL, Honeyman-Buck J

OBJECTIVE. We discuss the effect of radiology report format on the accuracy and speed with which reviewers can extract case-specific information.
MATERIALS AND METHODS. A Web-based testing mechanism was used to present radiology reports to each of 16 senior medical students and record their answers to 10 multiple choice questions about specific medical content for each of 12 cases. Subjects were randomly assigned to view the reports in either free text or structured format. In addition to number of answers correct for each case, we recorded the time taken for each case and an efficiency score (correctly answered questions per minute). These three outcomes were tested for differences on report format using multifactorial analysis of variance. A postexperimental questionnaire and a mediated focus group elicited subject preference as to radiology report format.
RESULTS. There were no significant differences in the three outcomes (score, time, and efficiency) between the free text and structured format conditions. The power of the experiment was sufficient to detect small differences in these outcomes by format. Subjects strongly and consistently expressed a preference for the structured version.
CONCLUSION. We assert that free text and itemized (structured) forms of radiology reports are equally efficient and accurate for transmitting case-specific interpretative content to reviewers of the document.

Received November 1, 2004; accepted after revision December 8, 2004.
C. L. Sistrom was funded by the General Electric/Association of University Radiologists Research Fellowship from July 2000 through June 2003.
1Both authors: Department of Radiology, University of Florida Health Center, PO Box 100374, Gainesville, FL 32610. Address correspondence to C. L. Sistrom (sistrc@radiology.ufl.edu).
AJR 2005; 185:804–812
AJR:185, September 2005
0361–803X/05/1853–804
© American Roentgen Ray Society

Background and Significance
During his inaugural address to the 2004 ARRS Meeting, the incoming president, Christopher R. B. Merritt, spoke extensively about radiology reporting [1]. He emphasized the need for using new technologies to achieve standardization and structure to facilitate clinical care, research, and compliance. In a 2005 review on the subject, Sistrom and Langlotz [2] described a framework for improved radiology reporting that articulated three attributes as targets for improvement. These were standard language, structured format, and consistent content [2]. In considering different options for report format, we believe that the choices made will affect the process of care in two distinct ways. First, during report creation, a predefined format may fundamentally alter the way that the interpreting physician thinks about the case as he or she produces the document. We call this reporting into structure. Second, during report review, the choice of different report formats may affect the way in which the reader comprehends the information. We call this reading structure.

The distinction between reporting into structure and reading structure is by no means purely academic. Current technology allows a clinical document to be produced in one format and displayed in an entirely different way. For example, one vendor of a structured computerized reporting product (eDictation) uses a sophisticated interface to allow radiologists to choose relevant findings from a large and highly structured menu of possibilities. At the same time, the software produces phrases and sentences that look just like a typical narrative report and it is this document that is made available for referring physicians to review. The assumption is that clinicians will be more comfortable with radiology reports that look like what they are used to reading, that is, narrative free text. Our research was designed to obtain empiric data about this question. We sought to examine the clinical utility of radiology reports in terms of information transfer to the reader. Specifically, what are the effects of report format (independent of content) on the efficiency with which medical personnel can read them and obtain information needed for patient care? Our working hypothesis was that



consistently formatted (structured) reports would be easier to read and comprehend, resulting in greater efficiency for the task of answering content-specific questions.

Materials and Methods
Following approval for the research by our local institutional review board (IRB), radiology reports were selected from our departmental archive of routine clinical examinations performed between January 2001 and December 2002. Four reports were selected from each of three types: sonography of the abdomen, CT of the abdomen, and head CT without contrast. All examinations had been performed on patients referred from the emergency department or outpatient clinics. We did not select the cases at random but rather looked for reports that had some relatively common abnormalities. We excluded reports that described findings that might be considered as unusual, emergent, or unexpected. All 12 reports were blinded by removing patient identifying information and any reference to proper names in the text.

For each report, we generated a series of 10 multiple-choice items, each having a question stem and three to five options. These were designed to have a single option that was unambiguously correct based only on the content of the report to which they referred. The reports and candidate questions were administered to three senior medical students using a Web-based testing system (described below) that allowed them to refer back to the report text as needed. These students did not participate as subjects in the subsequent experiment. We also gave printed versions of the questions and associated reports to two faculty radiologists and two senior radiology residents and asked them to evaluate them for clarity and consistency. Using feedback given to us by all of these evaluators, we corrected the wording of one of the question stems and three of the options (all distracter items). We also eliminated one option (also a distracter).

The original reports were in narrative format with variable use of headings for indication, comparison, examination details, and findings. All of the original reports had a labeled impression section. These 12 reports along with the 10 questions pertinent to each one formed the free text condition of the experiment. A report structure shell was designed for each of the three types of studies. These consisted of the components typically found in narrative radiology reports with additional headings in the findings section. For abdominal CT and sonography, these basically were anatomic. For head CT, we combined anatomic and functional headings. The templates for all three examination types are listed in Appendix 1.

For each case, we parsed the free text report into the appropriate structured template. This was done so that all the original content was exactly and completely replicated in the structured version. We were careful to leave basic sentence structure and word choice intact. We duplicated or eliminated words or phrases as needed to maintain proper syntax in the structured version. For example, the free text might contain “The pancreas, spleen, and kidneys are unremarkable.” The word “unremarkable” would be placed after the pancreas, spleen, and kidneys headings in the structured version. If a heading in the structured template did not have any relevant content from the free text version, it was left in the report with no text after it. Appendix 2 is an example of a free text abdominal CT report and Appendix 3 contains the structured version. With 12 unique cases, each having two versions (free text and structured format), there were 24 total cases.

Fig. 1. Screen capture images from the Web-based testing program used to perform the experiment. A–C, Students first were presented with a clinical scenario page (A). Pressing the button labeled GO TO THE REPORT caused the second page (B) to be displayed with the report text. Pressing the button labeled GO TO THE QUESTIONS caused a third page (C) to appear. This contained the 10 questions pertaining to the report and had buttons labeled GO BACK TO THE REPORT that caused redisplay of the report.

Experimental Details
Our College of Medicine uses a locally developed Web-based testing system for all examinations administered to students. It has been in use for more than 7 years and by the time our medical students have reached their fourth year, they have taken at least 20 tests using it. Since we used the same system to administer our experimental tests, all of the subjects were quite familiar with its function and appearance. This eliminated any need to train subjects before their participation and reduced any potential variability in performance based on differential learning of the testing procedure itself.
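
The conversion of a free text sentence into the structured template, as described in the Materials and Methods, can be illustrated with a minimal sketch. The heading list and the function name `to_structured` are hypothetical; the actual templates are those in Appendix 1, and the conversion in the study was done by hand rather than by software.

```python
from typing import Dict, List

# Hypothetical heading list for illustration only; the study's real
# templates are listed in Appendix 1 of the paper.
ABDOMINAL_CT_HEADINGS: List[str] = [
    "Liver", "Gallbladder", "Pancreas", "Spleen", "Kidneys",
]


def to_structured(headings: List[str], findings: Dict[str, str]) -> str:
    """Render one 'HEADING: text' line per heading.

    Headings with no content from the free text report are kept in the
    output with nothing after them, as the paper describes.
    """
    lines = []
    for heading in headings:
        text = findings.get(heading, "")
        lines.append(f"{heading.upper()}: {text}".rstrip())
    return "\n".join(lines)


# "The pancreas, spleen, and kidneys are unremarkable." is split so that
# the shared predicate is duplicated under each organ heading.
shared = {organ: "Unremarkable." for organ in ("Pancreas", "Spleen", "Kidneys")}
report = to_structured(ABDOMINAL_CT_HEADINGS, shared)
print(report)
```

The liver and gallbladder headings print with no text after them because the example sentence says nothing about those organs.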


The experimental testing software was designed to present each case as a set of three Web pages joined into a common frame set. The first page gave a brief description of a clinical presentation derived from the clinical history and the stated reason for the examination in question. A button (labeled GO TO THE REPORT) on the first page caused a second page containing the report text to be displayed. Buttons above and below the report text (labeled GO TO QUESTIONS) served to advance to the third page containing all 10 questions relating to the report. The question page also had buttons (labeled GO BACK TO THE REPORT) at the top and bottom that caused redisplay of the report text while keeping the question page active in the background, thus preserving its state. Subjects could switch back and forth between the report and questions as often as needed and their previous answers would remain intact. Figure 1 depicts the three frames making up a single test case.

The Web pages all had JavaScript code (Sun Microsystems) embedded in them to record a time stamp (at 1/10 sec precision) for every button click and item selection as the subject navigated through the case and answered questions. A submit button on the question page served to record the answers and the time-stamped navigation data. Submission was not accepted until all 10 questions had been answered. It is important to note that the design of the testing mechanism caused the entire set of pages comprising a single case to be transferred from Web server to the local computer when the subject was ready to start. All navigation and recording of time stamps was accomplished locally and required no further traffic between the testing computer and the server. Thus, there was no possibility that timing during a single case might be confounded by variations in network speed. All testing was performed on a single workstation located in a quiet room. This computer had a single 18-inch flat panel color monitor. Relevant software included Windows 2000 professional and Microsoft Internet Explorer. It was connected to our hospital’s internal network.

All of the subjects of the experiment were senior medical students at our institution. They were recruited by means of fliers posted in the medical school teaching complex. The research (and the flier) had prior approval by our local IRB. They were required to have taken and passed their medicine and surgery clinical clerkships before participating in the research. For incentive, a $100 account was opened for each participant at the College of Medicine bookstore. Each of the subjects was assigned all 12 cases to do during a single experimental session. Six of these cases were presented with the free text version of the report and the remaining six with the structured version of the report. The assignment of which of the 12 cases was presented to each subject as free text versus structured was randomized, with one restriction: We balanced the number of times each case was presented in free text versus structure across the entire experiment. The order in which subjects took their 12 cases was randomized, with one restriction: We balanced the assignment so that each case would be presented an equal number of times in each quartile of the case order (1–3, 4–6, 7–9, 10–12). Our initial power calculations called for eight subjects. This allowed a symmetric design in both the format and case order factors. We performed a second repetition of the experiment with an additional eight subjects, resulting in a total of 192 sets of case/subject responses for analysis. During the second repetition, the same randomization scheme was used with the only difference being that the assignment of the free text or structured format version of the report was reversed.

Testing of all 16 subjects (5 women and 11 men) was completed in the 2002–2003 academic year. There were no dropouts or technical failures during the experiment and each of the subjects completed their 12 cases in the sequence assigned during a single session and none reported any problems with the testing system.

Following completion of the testing, 15 of the 16 subjects met with the principal investigator in a “debriefing” focus group. Before this meeting, subjects had been told only that the purpose of the experiment was to assess their ability to extract information from reports. At the beginning of the meeting, subjects were given a brief questionnaire to fill out. This asked several general questions about radiology report format and content. Next, a brief paragraph defined free text and structure followed by an example of each format (similar to Tables 1 and 2). Next, a set of eight Likert-scaled items asked for their preference concerning various functional aspects of reading radiology reports (accuracy, speed, certainty, items not mentioned, positive findings, negative findings, and general preference) based on their experience with the test cases. The preference scores were anchored as follows: 1 = prefer free text, 5 = no preference, 10 = prefer structure. A mediated discussion was then conducted to elicit opinions about radiology report structure and content. Note that subjects were not given any feedback (after they participated or during the debriefing meeting) concerning individual or aggregate performance in answering questions for the cases.
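
The restricted randomization of report format described in the Experimental Details can be sketched as follows. This is one possible construction, not the authors' actual scheme, which the paper does not spell out: it starts from a checkerboard layout that already satisfies the balance constraints and then shuffles subjects and cases. The function names and format labels are hypothetical, and the sketch does not cover the separate balancing of case order across quartiles.

```python
import random
from typing import List

N_SUBJECTS, N_CASES = 8, 12
LABELS = ("free_text", "structured")


def balanced_formats(rng: random.Random) -> List[List[str]]:
    """Assign a format to every (subject, case) pair for one replicate.

    The checkerboard base gives each subject 6 cases per format and each
    case 4 subjects per format. Shuffling whole rows and whole columns
    keeps those counts intact while randomizing who sees what.
    """
    base = [[(s + c) % 2 for c in range(N_CASES)] for s in range(N_SUBJECTS)]
    rows = list(range(N_SUBJECTS))
    cols = list(range(N_CASES))
    rng.shuffle(rows)
    rng.shuffle(cols)
    return [[LABELS[base[r][c]] for c in cols] for r in rows]


def reversed_formats(plan: List[List[str]]) -> List[List[str]]:
    """Second replicate: same scheme with every format assignment flipped."""
    swap = {"free_text": "structured", "structured": "free_text"}
    return [[swap[fmt] for fmt in row] for row in plan]


first = balanced_formats(random.Random(0))  # seed chosen only for repeatability
second = reversed_formats(first)
```

Across both replicates each case is shown eight times in each format, which matches the counts reported in the Results.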


TABLE 1: Results of Analysis of Variance for the Number of Correctly Answered Questions Per Case
Source                            Effect Type   Degrees of Freedom   Mean Square Error   F Value   Pr > F
Format (free text vs structure)   Fixed         1                    1.069               0.86      0.3541
Case ID (case type)               Random        8                    5.254               4.25      0.0001
Subject                           Random        15                   11.783              9.52      < 0.0001
Order                             Fixed         11                   1.521               1.23      0.2734
Format × subject                  Random        14                   1.580               1.28      0.2292
Error 1                           NA            139                  1.237               NA        NA
Case type                         Fixed         2                    0.482               0.09      0.9153
Error 2                           NA            7.879                5.388               NA        NA
Note: Pr = probability, NA = not applicable.

TABLE 2: Results of Analysis of Variance for the Number of Seconds Taken to Complete Each Case
Source                            Effect Type   Degrees of Freedom   Mean Square Error   F Value   Pr > F
Format (free text vs structure)   Fixed         1                    3,056               0.67      0.4129
Case ID (case type)               Random        8                    24,503              5.41      < 0.0001
Subject                           Random        15                   80,762              17.82     < 0.0001
Order                             Fixed         11                   1,251               0.28      0.9893
Format × subject                  Random        14                   2,866               0.63      0.8343
Error 1                           NA            139                  4,531               NA        NA
Case type                         Fixed         2                    46,880              1.86      0.2174
Error 2                           NA            7.905                25,171              NA        NA
Note: Pr = probability, NA = not applicable.

Statistical Analysis
Each subject’s participation generated a set of 12 experimental results. These consisted of answers to the 10 questions for a case and the time-stamped navigation data. We scored the subject’s answers against a key to obtain the number correct. The start time was subtracted from the final submission time to obtain number of seconds taken to do each case. An efficiency score was then calculated for each case by dividing the number of questions answered correctly by the number of seconds taken to finish the entire case. This result was multiplied by 60 to give the number of correctly answered questions per minute. The time-stamped navigation activity records were processed to obtain two outcomes for each case. The number of times the subject moved back to view the report from the question page was tabulated. This outcome could take any positive integer and will be called report views. The number of answer selections made during each case was counted as well. This outcome was at least 10 and was higher when subjects changed their minds about one or more answers. Thus, there were five outcomes analyzed: number of questions correct, time in seconds taken to complete the case, efficiency score, report views, and answer selections.

We used the Statistical Analysis System (SAS Version 9 for Windows, SAS Institute) for all data manipulation and statistical calculation. For all tests of significance, we set p = 0.05 as the cutoff and used two-tailed alternate hypotheses. The analysis was done on the basis of a balanced incomplete block design. The factors included examination type (3 levels), report format (2 levels), and individual cases (4 levels per type). Thus, there were 24 factor level combinations (treatments), a block size of 12 (cases per subject), and eight blocks (subjects). The experiment was replicated twice with two groups of eight subjects for a total of 16 subjects and 192 experimental units. Summary statistics for the five outcomes were generated, including mean, median, mode, SD, frequency distribution plot, and normal probability plot.

We performed analysis of variance with a general linear model procedure (SAS PROC GLM) to test for differences in each of the five outcomes (percent of questions correct, time taken, efficiency, report views, and answer selections) jointly related to our independent variables. We used the same linear model for each outcome. Fixed effects included report format, case type, and the order that the case was presented to the subject. Random effects included case identity nested within case type, subject identity, and report format crossed with subject identity. The model was specified as follows:

Outcome = β1 format + β2 case type + β3 caseID(case type) + β4 subjectID + β5 format × subjectID + β6 order + Error

Standard F statistics using the type 3 sums of squares and appropriate error terms were used to test the coefficients (β1 through β6) against the null hypothesis of no effect (βi = 0). The Duncan procedure was used to perform multiple comparisons of mean number of report views by case type [3]. Since none of the fixed effects was significant in any of the other models, no additional multiple comparisons were performed. The two outcomes relating to subjects’ test-taking habits (report views and answer selections) were correlated within subjects, formats, case types, and overall to obtain Pearson correlation coefficients [4].

Because our results showed equivalence on the main variable of interest (report format) we performed post hoc power analysis. The sample size was initially set with eight subjects each looking at 12 cases for a total of 96 experimental units. We were able to double the planned sample size because many students responded to the request for participation and running the experiments was quite easy due to all subjects’ familiarity with the testing system. Our analysis was based on paired testing between free text and structured format with α = 0.05 and β = 0.10 (90% power). We used the root mean square error from the analysis of variance output as the estimate of sigma for each outcome.

For the postexperimental survey, Likert-scaled items from the debriefing questionnaires were summarized by calculating the median value, the 10th percentile, and the 90th percentile. The general preference items were enumerated and percentages calculated. The qualitative content of the debriefing focus group was summarized from a transcription of a tape recording made during the session.

Results
Experimental Results
Examination of the response data from the testing system revealed that all 16 subjects submitted valid responses to the 10 questions for each of their 12 cases and that they performed them in the assigned order. The timing data were complete and consistent with the answer responses with no gaps, extra entries, or ambiguities. Thus, there were 192 complete response sets available for analysis. These consisted of timing data and answers to all 10 questions for each of 12 cases completed by our 16 subjects. Each case was shown eight times in its free text form and eight times with the structured format. What follows are univariate statistics on each outcome for the entire data set (n = 192) along with the multifactorial analysis of variance results.

The number of correct responses (score) ranged from two to 10 with a mean of 8.35 and SD of 1.52. The distribution was negatively skewed with median and mode both being nine. None of the fixed effects (format p = 0.35, order p = 0.27, and case type p = 0.92) were significant and there was no interaction between format and subject effects on the score. The majority of the variance (60%) was partitioned between subjects and between cases within case type and R squared for the model was 0.58. The analysis of variance results are reproduced in Table 1.

The time taken to complete the cases ranged from 30 to 707 sec with mean of 351 and SD of 108. The distribution was not skewed with a median of 341 but there was some kurtosis. The single observation at 30 sec was a distinct outlier with the next lowest value being 102 sec (first percentile). None of the fixed effects were significant (format p = 0.41, order p = 0.99, and case type p = 0.22) and there was no interaction between format and subject effects on the time. The majority of the variance (62%) was partitioned between subjects and between cases within case type and R squared for the model was 0.71. The analysis of variance results are reproduced in Table 2.

The efficiency (number of correctly answered questions per minute) ranged from 0.52 to 4.7 with mean of 1.56 and SD of 0.56. The distribution was nearly normal except that the positive tail was longer. This was due to the same outlier (30 sec to complete the case) mentioned above. None of the fixed effects were significant (format p = 0.92, order p = 0.60, and case type p = 0.48) and there was no interaction between format and subject effects on the efficiency. Just under half of the variance (46%) was partitioned between subjects and between cases within case type and R squared for the model was 0.57. The analysis of variance results are reproduced in Table 3.

The number of times that any answer was selected for each case (answer selections) ranged from 10 (obligate floor value) to 19 with a mean of 11.5, SD of 1.68, median of 11, and mode of 10. The distribution was, as expected, not normal and looked more like a Poisson type with a mean of 11.5. We elected to proceed with analysis of variance despite the violation of normality because this outcome was considered to be secondary and the results were relatively uninteresting. The only significant effect was between subjects. Report format and case type had the same mean number across all levels.

The number of times subjects went back to look at the report text (report views) while answering questions ranged from two to 32 with a mean of 12, median of 11, and SD of 6.8. The distribution was near normal, allowing for the discrete nature of the outcome. The analysis of variance showed no difference in the mean number by report type (structured = 13.4, free text = 12.3, p = 0.93). Again, there was considerable variance between subjects (p < 0.0001) with a minimum of 4.7 times per case ranging up to 25 times per case. Interestingly, there was a significant difference between the type of case (p = 0.0016) even though there was no significant difference between the individual cases (p = 0.11). The mean number of times the report was consulted for abdominal CT (13.1) was essentially the same as for abdominal sonography (12.9). However, for head CT, subjects went back to look at the report an average of 11 times. We tested for interaction between case type and subject effects and found none. Therefore, the tendency to look back at head CT reports less frequently than sonography and abdomen CT was shared by all subjects.

Across the entire sample of 192 cases, the correlation between report views and answer selections was weakly positive with a Pearson coefficient of 0.22 (p = 0.002). When we examined the relationship between report views and answer selections for each subject, only three of the 16 had significant correlations. These were all positive (0.58, 0.64, 0.75) and probably accounted for the aggregate correlation. The correlations between report views and answer selections stratified by case type and report format were all weakly positive (0.18 to 0.28). The one exception was the head CT cases, where there was no correlation between numbers of report views and answer selections.
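
The five per-case outcomes defined in the Statistical Analysis section can be derived from a time-stamped event log along the following lines. This is a minimal sketch: the event names (`start`, `back_to_report`, `select_answer`, `submit`, and so on), the `CaseOutcome` class, and the example data are all hypothetical, since the paper does not describe the log format of the testing system.

```python
from dataclasses import dataclass
from typing import List, Tuple

Event = Tuple[float, str]  # (seconds since page load, action name); names are assumed


@dataclass
class CaseOutcome:
    score: int               # number of questions answered correctly
    seconds: float           # final submission time minus start time
    efficiency: float        # correctly answered questions per minute
    report_views: int        # returns to the report from the question page
    answer_selections: int   # every answer click, including changed answers


def summarize_case(events: List[Event], answers: List[str], key: List[str]) -> CaseOutcome:
    score = sum(given == correct for given, correct in zip(answers, key))
    start = min(t for t, _ in events)
    submitted = max(t for t, action in events if action == "submit")
    seconds = submitted - start
    return CaseOutcome(
        score=score,
        seconds=seconds,
        efficiency=score * 60 / seconds,
        report_views=sum(action == "back_to_report" for _, action in events),
        answer_selections=sum(action == "select_answer" for _, action in events),
    )


# Made-up example: 10 answers, one trip back to the report, one changed answer.
example_events: List[Event] = [(0.0, "start"), (4.2, "go_to_report"), (95.0, "go_to_questions")]
example_events += [(100.0 + 10 * i, "select_answer") for i in range(10)]
example_events += [(200.0, "back_to_report"), (230.0, "go_to_questions"),
                   (240.0, "select_answer"), (300.0, "submit")]
example = summarize_case(example_events, list("ABCDABCDAC"), list("ABCDABCDAB"))
print(example)
```

In this made-up case, nine correct answers in 300 sec give an efficiency of 1.8 questions per minute, with one report view and 11 answer selections.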

TABLE 3: Results of Analysis of Variance for Efficiency of Answering Questions (Score / Time) × (60)
Source                            Effect Type   Degrees of Freedom   Mean Square Error   F Value   Pr > F
Format (free text vs structure)   Fixed         1                    0.002               0.01      0.9172
Case ID (case type)               Random        8                    1.078               6.08      < 0.0001
Subject                           Random        15                   1.126               6.36      < 0.0001
Order                             Fixed         11                   0.148               0.84      0.6040
Format × subject                  Random        14                   0.259               1.46      0.1339
Error 1                           NA            139                  0.177               NA        NA
Case type                         Fixed         2                    0.881               0.80      0.4843
Error 2                           NA            7.915                1.108               NA        NA
Note: Pr = probability, NA = not applicable.
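
A rough sense of the smallest difference an experiment of this size could detect can be had from a textbook normal approximation. The sketch below is not the authors' calculation: it assumes two independent groups of 96 observations (the paper describes its own analysis as paired), and it takes the square root of the Error 1 mean square in Tables 1 to 3 as sigma, which is an assumption about which error term was used. The function name `detectable_difference` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(sigma: float, n_per_group: int,
                          alpha: float = 0.05, power: float = 0.90) -> float:
    """Smallest mean difference detectable by a two-tailed, two-group z test.

    Normal-approximation formula:
        delta = (z_{1 - alpha/2} + z_{power}) * sigma * sqrt(2 / n)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) * sigma * sqrt(2 / n_per_group)


# Error 1 mean squares copied from Tables 1-3; sigma is their square root.
error_mean_squares = {"score": 1.237, "time_sec": 4531.0, "efficiency": 0.177}
detectable = {
    name: detectable_difference(sqrt(mse), n_per_group=96)
    for name, mse in error_mean_squares.items()
}
for name, delta in detectable.items():
    print(f"{name}: {delta:.2f}")
```

Under these assumptions the detectable differences come out at roughly 0.5 questions, 31 sec, and 0.2 questions per minute.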


The head CT cases seemed to elicit a somewhat different pattern of report viewing unrelated to the number of times answers were selected. Considering that the head CT structured format was functionally organized rather than strictly by anatomy, we wanted to be sure there was no interaction between the type of case and our main independent variable of interest, the report format. When we added this interaction term to the analysis of variance models for score, time, and efficiency, we were reassured to find that it was not significant for any of the outcomes. Thus, the finding that report format had no effect on the outcomes is conclusive and holds across case type.

For the post hoc power analysis, the root mean square errors (sigma) were as follows: score = 1.32, time (seconds) = 65, and efficiency = 0.43. As described above, we set α = 0.05 and power to 90%. For score, we could have detected a difference of about 0.5 in the mean of correctly answered questions. The observed mean scores were 8.28 for free text and 8.43 for structured format, with a difference of 0.15. For time to complete each case, we could have detected a difference of about 30 sec. The observed mean times were 355 sec for free text and 347 sec for structured format with a difference of 8.4 sec. Finally, for efficiency of answering questions, we could have detected a difference of about 0.2 questions per minute. The observed efficiency for free text was 1.57 questions per minute and for structured format, it was 1.55 questions per minute for a difference of 0.02 questions per minute. For all three of the main outcomes of interest, the observed effect of report format was far less (approaching an order of magnitude) than the difference our experiment was powered to detect.

Qualitative Results
The debriefing meeting was attended by 15 of 16 subjects. One of the male students was serving in a clinical clerkship at another institution. One of the general questions asked for preference about the report organization. One option to this question was “like a laboratory report.” By this we meant standardized headings in the body with results organized under these headings. The choices and number of responses were as follows. Like a laboratory report (11/15 = 73%), like a newspaper story (1/15 = 7%), and in the current (unstructured) format (3/15 = 20%). We also asked how they would prefer to have uncertainty expressed. The choices and responses were in words (8/15 = 53%), as a semiquantitative scale (3/15 = 20%), and in quantitative terms (2/15 = 13%). The final general question asked about how subjects would respond to an explicitly worded recommendation in a radiology report. The responses were as follows. The subject would be compelled to follow the recommendation (2/15 = 13%), they might be compelled to follow the recommendation (10/15 = 67%), and they would not feel so compelled (3/15 = 20%).

The responses to the Likert-scaled questions about preference between free text (1) and structured format (10) are summarized in Table 4 with median, mode, and range for each one. Clearly, the subjects strongly tended to prefer the structured format for the seven separately articulated domains as well as overall. During the mediated discussion, this perception was reinforced. At least five subjects clearly expressed the opinion that they would like to see all radiology reports formatted in a manner similar to our structured condition (as in Appendix 3). One participant mentioned that the headings should be consistent across all instances of a report type. He said “it might be confusing if some people include biliary system under liver and some people included it under gallbladder, for example.” Others felt that the order of headings should be altered dynamically so that the abnormal findings would be at the top of the report. One potential drawback to the structured format was expressed as follows: “The overall gestalt of the examination might be lost in structure whereas with free text a sense of severity and more acuity can be conveyed.” A corollary opinion that was expressed quite forcefully and frequently was that reports should still have a clearly labeled impression section. The concept of organizing a report like a newspaper story was linked to the need for an impression. Like a news story, subjects wanted to see reports have an equivalent of the lead paragraph in which findings are synthesized and condensed to allow rapid review and to highlight the diagnostic impression.

TABLE 4: Summary of Responses to Format Preference Questions
Question (Preference With Respect To)          Median   Mode   Range
Your accuracy (number correct)                 9        10     6–10
Your speed in finding answers                  9        10     3–10
Your efficiency in finding answers             9        10     6–10
Your certainty of knowing the answer           9        10     5–10
Knowing what was not mentioned in the report   10       10     5–10
Negative findings                              9        10     2–10
Positive findings                              9        10     7–10
General preference                             10       10     3–10
Note: Questions were all Likert scaled with 1 = free text, 5 = no preference, and 10 = structured format.

Discussion
The working hypothesis of the experiment was that our subjects would take significantly less time to read a structured report and answer questions about its content than they would with the free text version. We were not certain that subjects would attain higher or even equal scores with the structured version. To allow for this, we introduced a measure of efficiency (correctly answered questions per minute). The pattern of outcomes that we expected was less time for structure, equal to slightly lower scores (accuracy) with structure, and increased efficiency with structure. To our surprise, there were no significant differences on any of the three measures. The planned power to detect differences between free text and structure was achieved and indeed exceeded in the experiment. Furthermore, all hypotheses were tested using two-tailed methods thus allowing for either alternative to the null of equivalence.

We assert that there is no effect of report format on speed, accuracy, or efficiency for our subject population reading the types of reports we presented to them. By extension, we suggest that the same phenomenon may pertain to practicing physicians. If this is true, designers of reporting systems may not need to work so hard to produce old-style narrative documents out of structured elements. At the same time, it would seem that free text reports are not as difficult to read for content as some believe. The choice of report format and structure may be made on the basis of referring physician preference and considering the effect of reading into structure by radiolo-

AJR:185, September 2005 809


Sistrom and Honeyman-Buck

gists. However, we advise those seeking to nization and format like the “laboratory re- using “telegraphic” constructions such as
fundamentally change the way in which radi- port” option preferred by our subjects. “LIVER: Negative.”
ology reports are created and displayed to Selection of senior medical students to serve To address limitations described above and
proceed with caution in consideration of the as subjects proved to be quite successful. Inter- extend the scope of our inferences, we plan on
following. In a recent article, Ash et al. [5] est and enthusiasm were such that we were at least three extensions to the experiment us-
discussed unintended consequences of over- able to double the sample size with little effort. ing the same cases, questions, and randomiza-
emphasizing structured information entry in Even after the study had closed, numerous stu- tion scheme. First, we will remove the button
health care informatics. They cite evidence dents asked to participate. We found the exper- on the question page that allows going back to
from studies in cognitive psychology and so- imental paradigm to be very acceptable to sub- review the report. Subjects will know this and
ciology that in a shared context, concise, un- jects and quite easy to administer. These that they must answer all 10 questions after
constrained, free text communication is the factors should allow us to easily extend and ex- one reading. There will be no time limit for ei-
most effective for coordinating work around a pand the experiments—using additional co- ther reading the report or answering the ques-
complex task (5, 6). Overly structured data horts of senior medical students—to address tions. Second, we will place a time constraint
can lead to loss of cognitive focus by clini- limitations described below. on how long the report is visible before
cians, both during input and review. This can Perhaps the most important limitation in our switching to the question page. Again, sub-
cause clinicians to experience a loss of over- study has to do with generalizing our results jects will know that they cannot go back to re-
view about the case at hand when they have to from senior medical students answering con- view the report while answering questions.
attend to data contained in many different tent-specific questions to practicing physicians Third, we will enable either structured or free
fields, sometimes on different screens within using radiology reports for clinical decision text formats to be viewed at the discretion of
an interface (7, 8). Furthermore, the act of making. We narrowed the focus of the research the subject while they go through a case. The
writing or dictating in narrative form may be to evaluate readability of the documents con- tracking code will record which version(s)
integral to the cognitive processing of the taining radiology interpretations with respect they look at and for how long. This will allow
case [9]. Our finding of dissonance between to the format alone. Medical students’ subse- us to determine if subjects develop and actu-
subjects’ preferences concerning report for- quent experiences during training and practice ally act on a preference for one format or the
mat and their actual performance reading certainly do lead to differences in many skills other as they move through the cases.
them for comprehension confirms that the and habits. However, we argue that the simple Further research will involve psychometric
cognitive issues are complex. In our opinion, ability to read a passage of text and compre- evaluation of the questions themselves. Once
tried and true methods of authoring and dis- hend its content is already well established by the three additional experiments detailed
playing radiology reports should not be aban- the senior year of medical school. above have been completed, we will have a
doned without considering the consequences. A difficult design consideration was wheth- large number (64) of answers to each of 120
Our follow-up session and the question- er to test subjects with both the free text and different questions about radiology report con-
naires completed by the subjects shed light structured version of each case. This would tent. This should allow us to use standard tech-
on reader preference for report format. They have provided even greater power to detect niques to assess item difficulty, reliability, var-
all strongly and consistently preferred the the effect of format on outcomes by virtue of ious correlations, and discriminatory power.
structured version to the free text. This pref- having a directly paired comparison. We These results will be interesting in their own
erence for structured format was consistent think that our choice of a balanced block de- right by revealing what kinds of questions are
across all seven domains that we asked about sign, incomplete in the format factor, was challenging for readers to answer. This might
with modal values on the 10-point Likert valid for two reasons. First, the planned and guide radiologists in explicitly including
scale all being 10 (prefer structure). Also, achieved power of the chosen design allowed phraseology in their reports to address these
the corollary question about general report us to detect differences between free text and difficulties. Types of questions that exhibit
organization resulted in 73% preferring a structure that were far less than what we con- high levels of variance in the answers given or
“laboratory report” format over the alterna- sidered to be practically relevant. Second, are poorly correlated with subject’s overall
tives. The opinions of our subjects are en- having subjects see cases twice would have scores will also be of interest. Given this
tirely consistent with other workers’ find- introduced methodologically difficult prob- knowledge about the types of questions that
ings with respect to physician preferences lems with memory effects. are most reliable and discriminatory, we can
about radiology reports. There is a large Another issue is that our subjects had no redesign the cases and questions to optimize
body of published research detailing the time constraints or other pressures placed on power to detect subtle differences in reader
opinion of referring physicians regarding the them during testing. We plan on adding fea- performance. Such information about question
content and format of radiology reports tures to the experimental paradigm that will content may guide other researchers in their
[10–15]. The terminology differs somewhat stress a subject’s short-term memory of the own experiments about readability of medical
but attributes consistently endorsed by con- material. The structured versions of the reports documents.
sumers (readers) of radiology reports in- we used had phrasing and syntax identical to To our knowledge, this work is the first
clude complete, itemized, and structured. that found in the original narrative versions. In experimental evaluation of radiology reports
Another element that is commonly preferred practice, the language and construction of in- whose primary outcomes are quantitative
by referring clinicians is that the report terpretative statements would likely be rather measures of information transfer to readers
should contain a complete listing of perti- different in structured reports. Readers may be of the documents. Based on the results de-
nent negative findings. In aggregate, these more (or less) able to rapidly comprehend scribed above, we assert that there is no dif-
opinions seem to militate for a report orga- medical content presented in structured format ference in information transfer efficiency

810 AJR:185, September 2005


Information Transfer: Free Text vs Structured Format

between free text (narrative style) report format and structured (itemized) reports having the same content. Despite the fact that they performed no better with the structured versions, our subjects clearly preferred it to the free text format.

References
1. Merritt CRB. New president says emphasize signal, delete noise from radiology reports. ARRS Memo: Newsletter of the American Roentgen Ray Society 2004; 15(3):1–8
2. Sistrom CL, Langlotz CP. A framework for improved radiology reporting. J Am Coll Radiol 2005; 2:159–167
3. Duncan DB. t tests and intervals for comparisons suggested by the data. Biometrics 1975; 31:339–359
4. Pearson K. Mathematical contributions to the theory of evolution. III. regression, heredity and panmixia. Phil Trans Royal Soc Ser A 1896; 187:253
5. Ash JS, Berg M, Coiera E. Some unintended consequences of information technology in health care: the nature of patient care information system-related errors. J Am Med Inform Assoc 2004; 11:104–112
6. Garrod S. How groups co-ordinate their concepts and terminology: implications for medical informatics. Methods Inf Med 1998; 37:471–476
7. Patel VL, Kaufman DR. Medical informatics and the science of cognition. J Am Med Inform Assoc 1998; 5:493–502
8. Patel VL, Kushniruk AW. Understanding, navigating and communicating knowledge: issues and challenges. Methods Inf Med 1998; 37:460–470
9. Berg M. Practices of reading and writing: the constitutive role of the patient record in medical work. Sociol Health Illness 1996; 18:499–524
10. Clinger NJ, Hunter TB, Hillman BJ. Radiology reporting: attitudes of referring physicians. Radiology 1988; 169:825–826
11. Gunderman RB, Ambrosius WT, Cohen M. Radiology reporting in an academic children’s hospital: what referring physicians think. Pediatr Radiol 2000; 30:307–314
12. Johnson AJ, Ying J, Swan JS, Willicam LS, Applegate KE, Littenberg B. Improving the quality of radiology reporting: a physician survey to define the target. J Am Coll Radiol 2004; 1:497–505
13. Lafortune M, Breton G, Baudouin JL. The radiological report: what is useful for the referring physician? Can Assoc Radiol J 1988; 39:140–143
14. McLoughlin RF, So CB, Gray RR, Brandt R. Radiology reports: how much descriptive detail is enough? AJR 1995; 165:803–806
15. Naik SS, Hanbidge A, Wilson SR. Radiology reports: examining radiologist and clinician preferences regarding style and content. AJR 2001; 176:591–598

APPENDIX 1: Headings Used for Structured Format Reports

UNENHANCED CT OF THE HEAD
Comparison:
Procedure:
Image Quality:
Bones-Sinuses:
Extracranial Soft Tissues:
Ventricles:
Brain-Developmental:
Brain-Age Related:
Brain-Positive Findings:
Brain-Negatives:
Extra-Axial Spaces:
Vascular Structures:
Other Findings-Comments:
Impression:

ABDOMINAL ULTRASOUND
Comparison:
Procedure:
Image Quality:
Liver:
Gallbladder:
Biliary:
Pancreas:
Spleen:
Kidneys:
Vascular:
Other Findings:
Impression:

ABDOMINAL COMPUTED TOMOGRAPHY
Comparison:
Procedure:
Liver:
Gallbladder/Biliary:
Pancreas:
Spleen:
Kidneys/Ureters:
Adrenals:
Aorta/Vessels:
Bowel/Appendix:
Fluid/Free Air:
Lymph Nodes:
Pelvic Organs:
Other Findings:
Impression:

Source: Department of Radiology, University of Florida, Gainesville, FL 32602.
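
One focus group participant asked that headings be consistent across all reports of a given type. A minimal sketch of how such a check might look is given here, assuming each template is stored as an ordered list of headings. The names `TEMPLATES` and `check_headings` are hypothetical and are not part of the study software; the heading lists are transcribed from Appendix 1.

```python
# Ordered heading lists transcribed from Appendix 1.
# The dictionary keys are illustrative names, not identifiers from the study.
TEMPLATES = {
    "abdominal_ultrasound": [
        "Comparison", "Procedure", "Image Quality", "Liver", "Gallbladder",
        "Biliary", "Pancreas", "Spleen", "Kidneys", "Vascular",
        "Other Findings", "Impression",
    ],
    "abdominal_ct": [
        "Comparison", "Procedure", "Liver", "Gallbladder/Biliary", "Pancreas",
        "Spleen", "Kidneys/Ureters", "Adrenals", "Aorta/Vessels",
        "Bowel/Appendix", "Fluid/Free Air", "Lymph Nodes", "Pelvic Organs",
        "Other Findings", "Impression",
    ],
}


def check_headings(report_headings, template):
    """Compare a report's headings with a template.

    Returns (missing, unexpected, in_order): template headings absent from
    the report, report headings absent from the template, and whether the
    shared headings appear in template order.
    """
    missing = [h for h in template if h not in report_headings]
    unexpected = [h for h in report_headings if h not in template]
    shared = [h for h in report_headings if h in template]
    expected_order = [h for h in template if h in report_headings]
    return missing, unexpected, shared == expected_order


# Hypothetical ultrasound report that files biliary findings under "Liver"
# and therefore has no separate "Biliary" heading.
report = [h for h in TEMPLATES["abdominal_ultrasound"] if h != "Biliary"]
missing, unexpected, in_order = check_headings(
    report, TEMPLATES["abdominal_ultrasound"]
)
print(missing, unexpected, in_order)  # ['Biliary'] [] True
```

A check of this kind only enforces where findings are filed. It says nothing about the wording used under each heading.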

AJR:185, September 2005 811


APPENDIX 2: Free Format Report of an Abdominal CT Scan as Obtained from the Medical Record

A CT scan of the abdomen and pelvis was performed on 3/3/02 without priors. Routine contiguous images were obtained from the upper abdomen to the proximal femurs without the use of intravenous contrast, following a renal stone protocol.

The visualized lung bases are clear and show no evidence of pulmonary parenchymal nodules or masses.

There are multiple bilateral renal stones. The largest is present within the left renal pelvis and measures 16 × 9 mm. There is a 6 mm stone at the right ureteropelvic junction with mild-to-moderate proximal hydronephrosis. The distal ureter is decompressed. There is perinephric fat stranding adjacent to the right kidney. No additional ureteral or bladder stones are identified. The left kidney is without hydronephrosis or significant perinephric fat stranding. However, there is an exophytic 1.5 cm hypodensity off the inferior pole of the left kidney that likely represents a renal cortical cyst.

Unenhanced examination of the liver is unremarkable without biliary dilatation. Unenhanced examination of the spleen, adrenals, and pancreas is unremarkable. The bowel is not opacified but otherwise is unremarkable. There is a normal-appearing appendix.

There is no mesenteric or retroperitoneal lymphadenopathy. No pathologic osseous lesions are seen.

Impression:
1. Multiple bilateral renal calculi as described above. There is a 6 mm at least partially obstructing calculus at the right ureteropelvic junction that is causing mild-to-moderate proximal hydronephrosis. Additionally, there is perinephric fat stranding adjacent to the right kidney.
2. Cortically based hypodensity within the inferior pole of the left kidney that likely represents a renal cortical cyst.

APPENDIX 3: Report from Appendix 2 Parsed into the Structured Template

COMPARISON: Without priors.
PROCEDURE: Routine contiguous images were obtained from the upper abdomen to the proximal femurs without the use of intravenous contrast, following a renal stone protocol.
LIVER: Unremarkable.
GALLBLADDER/BILIARY: Without biliary dilatation.
PANCREAS: Unremarkable.
SPLEEN: Unremarkable.
KIDNEYS/URETERS: There are multiple bilateral renal stones. The largest is present within the left renal pelvis and measures 16 × 9 mm. There is a 6 mm stone at the right ureteropelvic junction with mild-to-moderate proximal hydronephrosis. The distal ureter is decompressed. There is perinephric fat stranding adjacent to the right kidney. No additional ureteral or bladder stones are identified. The left kidney is without hydronephrosis or significant perinephric fat stranding. However, there is an exophytic 1.5 cm hypodensity off the inferior pole of the left kidney that likely represents a renal cortical cyst.
ADRENALS: Unremarkable.
AORTA/VESSELS:
BOWEL/APPENDIX: The bowel is not opacified but otherwise is unremarkable. There is a normal-appearing appendix.
FLUID/FREE AIR:
LYMPH NODES: There is no mesenteric or retroperitoneal lymphadenopathy.
PELVIC ORGANS:
OTHER FINDINGS: The visualized lung bases are clear and show no evidence of pulmonary parenchymal nodules or masses. No pathologic osseous lesions are seen.
COMMENTS:
IMPRESSION:
1. Multiple bilateral renal calculi as described above. There is a 6 mm at least partially obstructing calculus at the right ureteropelvic junction that is causing mild-to-moderate proximal hydronephrosis. Additionally, there is perinephric fat stranding adjacent to the right kidney.
2. Cortically based hypodensity within the inferior pole of the left kidney that likely represents a renal cortical cyst.
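
The itemized layout above can be thought of as an ordered list of headings filled from a mapping of heading to text. The following is a minimal sketch under that assumption, not the software used in the study. The function names `render` and `blank_headings` are hypothetical, the heading order follows Appendix 3, and the findings are shortened to the first sentence under each heading for brevity.

```python
# Headings in the order they appear in Appendix 3.
CT_HEADINGS = [
    "COMPARISON", "PROCEDURE", "LIVER", "GALLBLADDER/BILIARY", "PANCREAS",
    "SPLEEN", "KIDNEYS/URETERS", "ADRENALS", "AORTA/VESSELS",
    "BOWEL/APPENDIX", "FLUID/FREE AIR", "LYMPH NODES", "PELVIC ORGANS",
    "OTHER FINDINGS", "COMMENTS", "IMPRESSION",
]


def render(findings, headings=CT_HEADINGS):
    """Lay out findings under every heading, leaving unfilled headings blank."""
    lines = []
    for heading in headings:
        text = findings.get(heading, "").strip()
        lines.append(f"{heading}: {text}".rstrip())
    return "\n".join(lines)


def blank_headings(findings, headings=CT_HEADINGS):
    """List headings for which the report says nothing."""
    return [h for h in headings if not findings.get(h, "").strip()]


# First sentence of each filled heading in Appendix 3 (shortened for brevity).
findings = {
    "COMPARISON": "Without priors.",
    "PROCEDURE": "Routine contiguous images were obtained from the upper "
                 "abdomen to the proximal femurs without the use of "
                 "intravenous contrast, following a renal stone protocol.",
    "LIVER": "Unremarkable.",
    "GALLBLADDER/BILIARY": "Without biliary dilatation.",
    "PANCREAS": "Unremarkable.",
    "SPLEEN": "Unremarkable.",
    "KIDNEYS/URETERS": "There are multiple bilateral renal stones.",
    "ADRENALS": "Unremarkable.",
    "BOWEL/APPENDIX": "The bowel is not opacified but otherwise is unremarkable.",
    "LYMPH NODES": "There is no mesenteric or retroperitoneal lymphadenopathy.",
    "OTHER FINDINGS": "No pathologic osseous lesions are seen.",
    "IMPRESSION": "1. Multiple bilateral renal calculi as described above.",
}

print(render(findings))
print(blank_headings(findings))
# ['AORTA/VESSELS', 'FLUID/FREE AIR', 'PELVIC ORGANS', 'COMMENTS']
```

Keeping blank headings in the output, rather than dropping them, makes it visible to the reader which items the original narrative did not address.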

812 AJR:185, September 2005
