

Basic Concepts Of Modern Epidemiology

Dr. Khaled Kasim


Associate professor of Epidemiology
Taibah College of Medicine, Madinah, KSA
Al-Azhar Faculty of Medicine, Cairo, Egypt

FIRST EDITION
2010
Preface
This book presents the basic concepts of the essential epidemiologic topics
that need to be known not only by epidemiologists but also by everyone interested in
medical research.
During my Ph.D. study of epidemiology at Laval University, Quebec, Canada, and
my work as a postdoctoral researcher at Queen's University Belfast, NI, UK, I wrote on
many epidemiologic topics. I accomplished this simple work by selecting some of these
topics, modifying them in a way that suits all readers concerned with medical research,
and adding some other important epidemiologic topics. I have tried to present the
selected topics in simple language, without sophisticated statistical details that may be
hard for non-specialists to understand. The book also presents the knowledge necessary
to acquire some important research skills, such as how to write, read, and correctly
criticize a medical research paper.
Of course, this book is in no way a substitute for the standard textbooks; my aim
is to help medical researchers understand the new concepts of epidemiology and to aid
them as they design, conduct, analyze, and write their medical research theses and
papers.

Khaled Kasim, Ph.D.


September, 2010

Basic Concepts Of Modern Epidemiology Page 2


Table of contents: Pages

1. What is Epidemiology? 4
2. Measures of Disease Frequency 9
3. Measures of Association and Potential Impact 14
4. Causality and Causal Inference 19
5. Study Design 23
6. Precision and Validity in Epidemiologic Studies 48
7. Standardization and Adjustment 63
8. Interaction and Effect Modification 72
9. Meta analysis 77
10. Screening 84
11. Epidemiologic Surveillance 91
12. Writing a Medical Research Paper 97
13. Reading and Criticizing a Medical Research Paper 108
14. Epidemiology Glossary 111



CHAPTER 1
What is Epidemiology?
Epidemiology is the science that forms the basis for public health action and unites
the public health professions. Derived from three Greek roots (epi meaning upon, demos
meaning people, and logia meaning study), the term epidemiology was originally applied
to the study of outbreaks of acute infectious diseases and was defined as the science of
epidemics.
Epidemiology now refers to the study of the distribution and determinants of
health-related states or events in specified populations, and the application of this study to
control health problems. The two key words in this definition are the distribution of
diseases (i.e., what, how, when, where and who’s affected) and the determinants (i.e.,
causes and risk factors) of the diseases. In other words, epidemiology represents the study
of how diseases occur in populations and why. Epidemiological observations are made on
groups of people (populations) rather than individuals, while clinical observations are
made on individual patients. This is one reason why statistical methods are necessary for
analyzing epidemiological data.
The application of epidemiological principles and methods to the practice of
clinical medicine is called Clinical Epidemiology. Clinical decisions should be based on
sound scientific evidence (Evidence Based Medicine); this is the main justification for
clinical epidemiology. Evidence Based Medicine is a method of basing clinical decisions
on the best available scientific evidence. The main areas of study in clinical epidemiology
are: definitions of normality and abnormality, validity and accuracy of diagnostic and
screening tests, natural history and prognosis of diseases, effectiveness of therapies, and
prevention in clinical practice.

Uses of epidemiology
1. Description of health status of populations (i.e., community assessment).
2. Description of the natural history of diseases.
3. Identification of causes.
4. Identification of syndromes (i.e., grouping different manifestations under
one syndrome or, alternatively, splitting similar events into different categories).



5. Evaluation of clinical signs, symptoms and decision analyses (clinical
epidemiology).
6. Evaluation of interventions.
The common basis for these different applications of epidemiology is the study of
disease occurrence and its relation to various characteristics of individuals or their
environment.

Emergence of Epidemiology as a science:


Although several studies and ideas resembling epidemiological work have been
carried out over the centuries, they were not called epidemiology. Hippocrates, in his work "Airs,
Waters, and Places" written 2400 years ago, linked disease with the environment. James
Lind’s trial of fresh fruit against scurvy in 1747 could be the earliest example of a clinical
trial. William Farr working in the Office of the Registrar General in England in 1840s
demonstrated the effect of imprisonment on mortality. John Snow’s work “On the Mode
of Transmission of Cholera” in 1855 is considered a classic in the field of epidemic
investigation.
Epidemiology became established as a distinct, systematized body
of knowledge only after the Second World War. The period from the 1940s to the 1970s is
considered the golden era in the evolution of epidemiology. A few large studies
(particularly those involving smoking and lung cancer) provided the impetus for the
growth of epidemiology and the tremendous improvement in epidemiologic methods.
Several studies from this period are now classics in the field of epidemiology. The most
important of these classics are:
1. Smoking and lung cancer study: The association between smoking and lung cancer is
now considered almost causal. Since the first epidemiological studies published in the
1950s, several studies have demonstrated the association between smoking and lung
cancer. In particular, studies by Doll and Hill in 1964 are considered classics.
2. The Framingham Heart Study: This study began in 1948 and is still going strong 50
years later. The study was done to identify risk factors for Coronary Heart Disease (CHD)
and is a classic cohort (longitudinal) study. Framingham is a town in Massachusetts
(population of 28,000 when the study began). Thousands of the town residents were



examined for CHD and risk factors. Subsequently, they were offered complete
examination every 2 years since the study began. As new types of investigations appeared
on the scene, they have been added to the examination. The study findings have emerged
in a large series of reports over the years and have contributed tremendously to our
understanding of CHD and its risk factors. More than 1000 articles from FHS have been
published to date. The project has cost the American government $43 million. Analysis
of the Framingham data also paved the way for the evolution of complex statistical
modeling techniques like multivariate analyses.
3. Polio Vaccine Field Trial: The largest formal human experiment ever was done when
the Salk polio vaccine was put through a field trial in 1954, with nearly a million school
children as subjects. The study clearly demonstrated the protective efficacy of the vaccine
and provided the basis for an eradication program.

Epidemiology, health, and public health:


For a long time the predominant interest in epidemiology was infectious
disease, an interest that often increases dramatically during so-called epidemics.
During the last few decades, increasing attention has focused on the epidemiology of
chronic and malignant diseases, with the aim of decreasing morbidity and mortality from
these diseases. In this way, the objectives of epidemiologists are similar to those of the
clinician, both being concerned with the health of people. However, the clinician's main
unit of concern is the individual patient, whereas the epidemiologist's unit of concern is
the group of individuals.
Health itself is difficult to define. According to mainstream medicine, health is
the absence of disease. "Dis-ease," the opposite of "ease," is literally when something is
wrong with bodily function. Optimal health, however, may go beyond the mere absence
of disease. The World Health Organization (WHO) in 1948 defined health as "a state of
complete physical, mental, and social well-being and not merely the absence of disease or
infirmity." Nowadays, a new concept has been added to the definition of health, one
concerned with the human spirit and introducing the so-called "spiritual component of
health." In viewing health and disease on a population basis, epidemiologists study both



morbidity and mortality and try to gain insight into factors that increase or decrease
morbidity and mortality in a community.
In recent years, several studies have shown that morbidity and mortality from
diseases are related directly to the health-care system and, accordingly, to the state of
public health services in the community. The Institute of Medicine (1988) defined public
health as "organized community effort to prevent disease and promote health." By this
definition, the goals of public health are to reduce the burden of disease, disability, and
premature death in the population. Public health is composed of many different
disciplines, of which epidemiology is one of the most important. Other public health
disciplines include biostatistics, community health planning, health policy
development, public health administration, laboratory sciences, environmental health,
occupational health and safety, injury control, mental and child health, nutrition, and
health education.

Epidemiology and Preventive Medicine


Epidemiology is the central science of public health, and preventive medicine is a
clinical approach to public health practice. Epidemiology can provide the preventive
medicine practitioner with the following:
- Information on the state of the health of the population.
- Methods for identifying possible determinants of health and disease within
individuals.
- Appropriate population groups for interventions.
- Understanding the origins of public health recommendations.

BIBLIOGRAPHY
1. Doll R and Hill A. Mortality in relation to smoking: ten years' observations of British
doctors. BMJ 1964; 1:1399-1410 and 1460-1467.
2. Last JM. A dictionary of epidemiology. 2nd ed. Oxford, Oxford University Press, 1988.
3. Kannel WB, et al. Factors of risk in the development of coronary heart disease: six-
year follow-up experience: The Framingham study. Ann Intern Med 1961; 55:33-50.



4. Rothman KJ, Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincot-Raven, 1998.
5. Sackett DL. Clinical epidemiology: a basic science for clinical medicine. Boston, Little,
Brown and Co, 1985.
6. Tyler CW and Last JM. Epidemiology. In: Last JM and Wallace RB, editors. Maxcy-
Rosenau-Last Public Health & Preventive Medicine. 13th edition. Appleton & Lange,
1992.



CHAPTER 2
Measures of Disease Frequency
Measuring events (disease events or health events) is at the heart of epidemiology.
If one cannot quantify, one cannot do epidemiological research. One of the simplest
methods of measuring is simply counting (frequencies of events): for instance, how
many deaths occurred in a village during a year, how many cases of diabetes are reported
to the hospital every day, and so on.
While frequencies are useful in many ways, there is a great danger in going by
frequencies alone. The key in epidemiology is relating the frequency (the numerator) to an
appropriate population (the denominator). This is done by computing rates, ratios, and
proportions. The measure of disease frequency used for quantifying disease depends on
what question is being asked. Table 2.1 summarizes the three questions that are
commonly asked:
Table 2.1. Commonly asked questions and the corresponding measures of disease frequency.
1. How many people in a given population have the disease at this point in time? (Point prevalence)
2. How many people in a given population ever had the disease during a given period of time? (Period prevalence)
3. How many people in a given population newly developed the disease during a given period of time? (Incidence)

There is a lot of difference between these measures and researchers must be clear
as to what they are trying to measure. Often, the researcher finds the terms incidence and
prevalence being used synonymously in medical literature. This only reflects the
underlying lack of awareness of basic epidemiologic concepts.

Prevalence
The prevalence measure is called simply “prevalence” (P) (other terms in use
include prevalence rate and prevalence proportion). The prevalence is defined as the



number of persons with a disease at a specified point in time. When used without
qualification, prevalence usually refers to point prevalence.
P = (Number of persons having the disease at a specified point in time / Number of
individuals in the population at that point in time) X 1000
Example: suppose we are interested in finding out how many people living in a village
have diabetes. Let us assume we could perform the standard oral glucose tolerance test on
all people living in the village. If 100 out of 1000 villagers tested were positive for
diabetes, would this proportion (10%) be called incidence or prevalence? It is obviously a
prevalence measure, because the estimated 10% was arrived at by testing people at only
one point in time (a cross-sectional estimate). We have no idea as to when exactly this
10% actually became diabetic.
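The village example above can be sketched in a few lines of Python (the function name and layout are mine, for illustration only):

```python
def point_prevalence(cases, population, per=1000):
    """Point prevalence: existing cases divided by the population
    examined at the same point in time, scaled by a unit multiplier."""
    return cases / population * per

# Village example from the text: 100 of 1000 villagers test positive.
print(point_prevalence(100, 1000))           # 100.0 per 1000
print(point_prevalence(100, 1000, per=100))  # 10.0, i.e., 10%
```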
Most often, epidemiological work provides only prevalence estimates. True
incidence measures are hard to find because of the inherent difficulty in estimating
incidence. Factors which influence prevalence are the following:
i. Diagnostic methods: It is easy to appreciate that the prevalence of a disease
will depend on how the disease is diagnosed; the prevalence of diabetes as
measured by fasting blood sugar will be different from the prevalence as
measured by urine sugar.
ii. Population at risk: While computing prevalence, it is important to ensure
that only susceptible people are included in the denominator. For example, if the
prevalence of cervical cancer is being estimated, then the denominator should
include only women in a certain age group (say 25-69), because not all
groups are affected by the disease.
iii. Severity of the disease: If the disease in question is one which is very severe
(high case fatality) then the prevalence rate is likely to be low.
iv. Duration of the disease: If a disease lasts a long time, then the prevalence is
likely to be higher than if the disease lasts for only a short time period. This
is one reason why prevalence measures are easier to obtain for chronic diseases.
v. Incidence of the disease: It is obvious that if a disease has a high incidence,
and if it lasts for a reasonably long period, the prevalence is also likely to be
higher.



vi. Migration: Immigration (in-migration) of susceptible people is likely to
increase the prevalence while emigration (out-migration) of affected cases is
likely to decrease the prevalence.

Incidence
Incidence is defined simply as the number of new events (e.g., new cases of a
disease) in a defined population within a specified period of time.
Example: suppose we wished to know how many people in a given population newly
develop diabetes in a certain period of time. Let us say all people were screened at the
start of the calendar year (January) and 10% of 1000 are found to be diabetic. This means
that 900 people are non-diabetic, or healthy, at the start of the year. Let us say this
population of 900 is screened again for diabetes at the start of the next year (after one year)
and 9 people are found positive. This figure of 1% (9/900) is the one-year incidence of
diabetes in this population. It is clear that estimating incidence has a longitudinal (follow-
up) component, whereas prevalence has a cross-sectional component. Always
note that we cannot estimate incidence by just measuring disease at one point in time.
This is one reason why prevalence figures are more easily available than incidence figures.
In general, incidence is usually used to quantify the number of new cases in short
duration, acute illnesses, while prevalence is used to quantify chronic illness.
Epidemiologically, the measures of incidence are more sophisticated and descriptive than
prevalence. Incidence measures are particularly useful when causal associations are being
explored. The incidence can be calculated in two ways; either as incidence rate or as
cumulative incidence.

Incidence Rate (Incidence density)


The basic measure of disease occurrence is the “incidence rate” (I) (an alternative
term is incidence density). In calculating the incidence rate, the numerator is the number
of new events that occur in a defined period of time and the denominator is the
population at risk experiencing the event during this period of time. The most accurate way of
calculating the incidence rate is to calculate the person-time incidence rate. Each person in
the population contributes a certain amount of person-time to the denominator. For instance, if 100



persons are followed up or observed for 1 year, then the denominator would be 100 person-
years. Note that these 100 person-years could also be obtained if only one person were
followed up for 100 years.
Incidence rate = Number of people who get a disease in a specified period of time /
Sum of the length of time during which each person in the population is at
risk (total person-time of follow-up). For instance, if 1000 people were followed up for 3
years, and 3 people developed the disease, then the incidence rate will be 3 / 3000 person-
years = 1 per 1000 person-years. By convention, incidence density is also reported using
a population unit multiplier (like other rates) of 100, 1000, 10,000, or 100,000.
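The person-time calculation above can be illustrated with a short Python sketch (the function name is mine; the figures are the 1000-people, 3-year example):

```python
def incidence_rate(new_cases, person_time, per=1000):
    """Incidence density: new cases divided by total person-time at risk."""
    return new_cases / person_time * per

# 1000 people each followed for 3 years = 3000 person-years; 3 new cases.
total_person_years = sum(3 for _ in range(1000))  # person-by-person sum
print(incidence_rate(3, total_person_years))      # 1.0 per 1000 person-years
```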

Cumulative Incidence (Incidence proportion)


Cumulative incidence (CI), sometimes called incidence proportion, is a relatively
simpler measure than the incidence rate. Unlike the incidence rate, the
denominator includes only those persons at risk at the beginning of the study.
Cumulative incidence = (Number of people who get a disease in a specified period of time /
Number of people free of the disease in the population at the beginning of the period) X
1000.
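A minimal sketch of cumulative incidence in Python, using the diabetes screening example above (the function name is mine):

```python
def cumulative_incidence(new_cases, disease_free_at_start, per=1000):
    """Incidence proportion: new cases divided by the number of people
    free of the disease at the beginning of the period."""
    return new_cases / disease_free_at_start * per

# Diabetes example: 9 new cases among the 900 people who were
# disease-free at the first screening.
print(cumulative_incidence(9, 900, per=100))  # 1.0, i.e., a 1% one-year CI
```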
In general, incidence rates are considered more sophisticated and useful than
incidence proportions. This is because the word “rate” inherently means that the
dimension of time is involved; the incidence rate tells us something about the speed at which
events are occurring. In practice, however, the choice of an incidence measure depends
upon the event to be studied. Incidence proportion is well suited for the study of acute
diseases with a restricted risk period, such as the study of outbreaks. Incidence density, on
the other hand, seems better for studies of chronic diseases with an extended risk period. In
addition, incidence density is more suitable in dynamic population studies, where people
are free to leave and enter, while incidence proportion is well suited for studies of a fixed
population, where the membership is relatively stable. The advantage of using incidence
density in a dynamic population is that the researcher is able to estimate the risk
period correctly by adjusting follow-up time on a person-by-person basis. Note that such
an adjustment is possible with incidence proportion by using life tables or what is



called actuarial methods. These methods, however, are more tedious to calculate and are
less easily understood than person-time methods.

The interrelation among the three measures:


It was stated that the prevalence depends on the incidence and the duration of
the disease. In a stable situation this association may be expressed as follows, where D
indicates the average duration of the disease:
P / (1 – P) = I X D
The denominator on the left side of the above equation (1 – P) reflects the part of
the population that is free from the disease. It is included in the formula because only those
people who are free from the disease are at risk of getting it. For rare diseases, i.e., diseases
where P is low, the following approximation may be used:
P = I X D
The cumulative incidence depends on the incidence rate and the length of the
period (t) at risk. It is also affected by mortality from diseases other than the disease
studied. If this mortality from other diseases is disregarded, the following relation applies:
CI = 1 – exp (–I X t)
where t is the length of the period and “exp” indicates that the mathematical
constant e = 2.718 should be raised to the power given by the expression in parentheses.
For diseases with a low incidence rate, or when the period (t) is short, the following
approximation may be used:
CI = I X t
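These relations can be checked numerically with a short Python sketch (the incidence rate, disease duration, and period length below are assumed values chosen purely for illustration):

```python
import math

I = 0.01  # incidence rate: 10 per 1000 person-years (assumed)
D = 5     # average duration of disease, in years (assumed)
t = 2     # length of the at-risk period, in years (assumed)

# Prevalence: exact value obtained by solving P / (1 - P) = I * D for P,
# versus the rare-disease approximation P = I * D.
P_exact = (I * D) / (1 + I * D)
P_approx = I * D
print(round(P_exact, 4), P_approx)    # about 0.0476 vs 0.05

# Cumulative incidence: exact CI = 1 - exp(-I * t), versus the
# short-period approximation CI = I * t.
CI_exact = 1 - math.exp(-I * t)
CI_approx = I * t
print(round(CI_exact, 4), CI_approx)  # about 0.0198 vs 0.02
```

As expected, both approximations slightly overstate the exact values, and the gap grows as I X D or I X t gets larger.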

BIBLIOGRAPHY
1. Gerstman BB. Epidemiology kept simple: An introduction to classic and modern
epidemiology. 1st Edition. New York: John Wiley and Sons, Inc., 1998.
2. Friedman GD. Primer of Epidemiology, 4th Edition. McGraw Hill International
Editions, 1994.
3. Last JM. A Dictionary of Epidemiology, 3rd Edition. Oxford University Press, 1995.
4. Rothman KJ and Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincot-Raven, 1998.



CHAPTER 3
Measures of Association and Potential Impact
While measures of disease frequency like incidence and prevalence are useful in
estimating the magnitude of a given problem, they do not allow us to find out whether an
association exists between a factor and incidence or prevalence of a disease. For instance,
is smoking associated with lung cancer? Incidence of lung cancer alone will not tell us
whether an association exists. The researcher needs to compare the incidence of lung
cancer among smokers and non-smokers to establish an association. This process
involves computing measures of association and it involves comparing disease
occurrence (outcome) among those who are exposed with the occurrence among those
who are unexposed. In the example of smoking and lung cancer, this means comparing
the incidence of lung cancer (outcome) among smokers (exposed) and non-smokers
(unexposed). There are three commonly used measures of association: risk difference,
risk ratio and odds ratio.

1. Risk difference: This is also called excess risk. Risk difference is simply the
difference between the incidence among those who are exposed and the incidence
among those who are unexposed. For example, if the incidence of lung cancer
among smokers is 10 per 1000 and the incidence among non-smokers is 1 per
1000, then the risk difference is 10 per 1000 minus 1 per 1000 = 9 per 1000,
which means an excess risk of 9 per 1000 among smokers. This measure, however,
does not give us the magnitude of risk; it just tells us that there is a risk
difference, or excess risk, between the compared groups. To know the magnitude
of risk, it is better to use the relative risk or the odds ratio.

2. Risk Ratio (Relative Risk): This is nothing but the ratio of the incidence among
the exposed to the incidence among the unexposed. In the smoking example, this will be 10
per 1000 divided by 1 per 1000 = 10. In simple language, a relative risk (RR) of
10 means that smokers have a ten times higher risk of developing lung cancer
compared to non-smokers.



3. Odds Ratio (OR): The OR is usually computed in a case-control study, where it is
not possible to get the true risk (incidence), because we have a group of diseased
subjects (cases) and a group of non-diseased subjects (controls). The OR is simply
the odds of exposure among cases divided by the odds of exposure among
controls. Using a 2 by 2 table (where a = exposed cases, b = exposed controls,
c = unexposed cases, and d = unexposed controls), the OR is given by the
following formula:
OR = (a X d) / (b X c)
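The three measures can be sketched together in Python (the risks are the smoking example from the text; the 2 by 2 counts for the odds ratio are hypothetical numbers invented here for illustration):

```python
def risk_difference(risk_exposed, risk_unexposed):
    """Excess risk: incidence among exposed minus incidence among unexposed."""
    return risk_exposed - risk_unexposed

def risk_ratio(risk_exposed, risk_unexposed):
    """Relative risk: incidence among exposed over incidence among unexposed."""
    return risk_exposed / risk_unexposed

def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c), where a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)

# Smoking example, with risks expressed per 1000: 10 vs 1 per 1000.
print(risk_difference(10, 1))      # 9 per 1000
print(risk_ratio(10, 1))           # 10.0
# Hypothetical case-control counts:
print(odds_ratio(80, 20, 30, 70))  # (80*70)/(20*30), about 9.33
```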
The relative risk and the odds ratio tend to be very similar when the disease is
rare, as is the case for cancers. Risk ratios and odds ratios are extremely powerful measures of
association, and they indicate the presence or absence of an association between the
studied factor and the disease. For example, an RR above 1 implies a positive
association, an RR of 1 implies no association between the two variables, and an RR less than 1
implies a protective effect of the studied factor (Table 3.1). Note that the interpretations
presented in Table 3.1 also apply to estimated odds ratios.
Table 3.1. Direction of Association for ratio measures of association
Value of relative risk Direction of association Potential effect of exposure
Approximately 1 No association Neutral
Significantly greater than 1 Positive association Risk
Significantly less than 1 Negative association Benefit (protective effect)

Moreover, ratio measures of association quantify the strength of the association: the
farther the value is from 1, the stronger the association. Factors that double the risk of a
disease are considered moderately strong risk factors, and factors that quadruple the risk
of disease are considered strong risk factors (Table 3.2).
Table 3.2. Strength of Association for ratio measures of association
Relative Risk Strength of association
Greater than 4 Very strong
2 – 4 Moderately strong
Greater than 1, but less than 2 Weak



Measures of Potential Impact:
Measures of potential impact are used to reflect the expected reduction in a disease
when eliminating a risk factor associated with that disease and, thus, these measures are
useful in evaluating the potential benefit of a proposed intervention. This can be
expressed in two ways:
i. The attributable fraction in the population (AFp), and
ii. The attributable fraction in exposed cases (AFe).
Synonyms for attributable fraction are attributable proportion and etiologic fraction.

1. Attributable Fraction in the population (AFp):


The attributable fraction in the population, sometimes called the population
attributable risk (PAR), quantifies the expected reduction in new cases in the population
that would occur if the exposure (risk factor) were removed, assuming causality
between the exposure and the disease under study. In its simplest form, the population
attributable risk is calculated by using this formula:
AFp = (R – R0) / R
where R represents the rate or risk of disease in the population as a whole, and
R0 represents the rate or risk of disease in the unexposed group.
For example, if the risk of lung cancer in the population as a whole (R) = 0.4 per 1000
persons and the risk of lung cancer in nonsmokers (R0) = 0.07 per 1000 persons, then AFp
= (0.4 per 1000 persons – 0.07 per 1000 persons) / (0.4 per 1000 persons) = 0.83.
In words, 83% of the population cases are attributable to smoking.
An alternative formula for the calculation of the attributable fraction in the
population requires information about the proportion (P) of the population that is exposed
to the studied factor (smoking). This formula is:
AFp = P (RR – 1) / [1 + P (RR – 1)]
where RR represents the relative risk of disease comparing the exposed with the
unexposed group, and P represents the prevalence of the risk factor (smoking) in the
whole population.
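Both versions of the population attributable fraction can be sketched in Python (the function names are mine; the first call uses the lung-cancer figures from the text, while the 30% exposure prevalence and RR = 10 in the second call are assumed values for illustration):

```python
def afp_from_rates(risk_total, risk_unexposed):
    """AFp = (R - R0) / R, from the whole-population and unexposed risks."""
    return (risk_total - risk_unexposed) / risk_total

def afp_from_prevalence(p_exposed, rr):
    """AFp = P(RR - 1) / [1 + P(RR - 1)], from the exposure prevalence and
    the relative risk comparing exposed with unexposed."""
    excess = p_exposed * (rr - 1)
    return excess / (1 + excess)

# Text example: R = 0.4 and R0 = 0.07 (both per 1000 persons).
print(round(afp_from_rates(0.4, 0.07), 3))      # 0.825, i.e., about 83%
# Assumed figures: 30% of the population exposed, RR = 10.
print(round(afp_from_prevalence(0.30, 10), 2))  # about 0.73
```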
If all the confounders of the relation between the exposure and the disease have
been neutralized by adjustment, then the adjusted attributable fraction (AF) can be
calculated. The information needed to calculate this adjusted attributable risk for one



factor at a time is the value of the regression coefficient from the model, the coding
representing the levels of the risk factor used in fitting the model, and the number of
cases at each level. The equation used to calculate this adjusted attributable risk is:
PAR = 1 – ∑ (Pj / Rj)
Where,
PAR = population attributable risk.
∑ = sign of summation.
Pj = proportion of cases in each exposure category of the studied factor.
Rj = adjusted odds ratio for each exposure factor category.
The benefit of this formula, besides presenting the PAR in an adjusted form, is that it
does not require any information about the general population (we use neither the prevalence
of exposure nor the risk of disease in the whole population).
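A sketch of the adjusted formula in Python (the exposure levels, case proportions, and adjusted odds ratios below are hypothetical values invented for illustration):

```python
def adjusted_par(case_proportions, adjusted_ratios):
    """PAR = 1 - sum(Pj / Rj), where Pj is the proportion of cases in
    exposure level j and Rj is the adjusted odds ratio for that level."""
    return 1 - sum(p / r for p, r in zip(case_proportions, adjusted_ratios))

# Hypothetical smoking categories: never (reference, OR = 1),
# former (OR = 2), and current (OR = 5), with 20%, 30%, and 50%
# of the cases falling in each category, respectively.
print(round(adjusted_par([0.20, 0.30, 0.50], [1.0, 2.0, 5.0]), 2))  # 0.55
```

Note that only the distribution of the cases across exposure levels and the adjusted ratios are needed, which is the advantage described above.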

2. Attributable fraction among the exposed


The attributable fraction among the exposed represents the probability that an
exposed case developed the disease as a result of the risk factor in question. It can be
calculated by this formula:
AFe = (R1 – R0) / R1
where R1 represents the rate or risk of disease in the exposed group, and
R0 represents the rate or risk of disease in the unexposed group.
In the previous example, if the risk of lung cancer in smokers (R1) = 1.30 per 1000
persons and the risk of lung cancer in nonsmokers (R0) = 0.07 per 1000 persons, then
AFe = (1.30 per 1000 persons – 0.07 per 1000 persons) / (1.30 per 1000 persons) =
0.95. In words, 95% of the exposed cases under study are attributable to smoking.
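The calculation can be sketched in Python using the same figures (the function name is mine):

```python
def afe(risk_exposed, risk_unexposed):
    """Attributable fraction among the exposed: (R1 - R0) / R1."""
    return (risk_exposed - risk_unexposed) / risk_exposed

# Text example: lung-cancer risk of 1.30 per 1000 in smokers (R1)
# versus 0.07 per 1000 in nonsmokers (R0).
print(round(afe(1.30, 0.07), 2))  # 0.95
```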

BIBLIOGRAPHY
1. Brennan P and Croft P. Interpreting the results of observational research: chance is not
such a fine thing. BMJ 1994; 309:727-730.



2. Friedman GD. Primer of Epidemiology, 4th Edition. McGraw Hill International
Editions, 1994.
3. MacMahon B and Trichopoulos D. Epidemiology. Principles & Methods. 2nd Edition.
Little Brown and Co., 1996.
4. Rothman KJ. Causes. American Journal of Epidemiology 1976;104:587-592.
5. Rothman KJ and Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincot-Raven, 1998.



CHAPTER 4
Causality and Causal Inference
The epidemiologic meaning of cause suggests that a causal factor is any event,
condition, or characteristic that increases the likelihood (risk) of disease, all other things
being equal. To judge whether an exposure is a causal agent of a disease, the epidemiologist
should know that epidemiologic reasoning is divided into two distinct stages:
i. Statistical inference stage: This stage searches for a statistical association
between exposure and disease.
ii. Causal inference stage: This stage involves the derivation of the biological
meaning of the observed statistical association.
When the relationship between an exposure and a disease is studied in
populations (not in individuals), the exposure may or may not be found to be statistically
associated with the disease. If there is a statistical association, it can be causal or
non-causal in nature. Thus, not all associations are causal associations, but all causal
associations must be statistical associations.
In practice, it is difficult to separate non-causal associations from causal ones, but
with the help of some criteria it can be done. Some of the well-known criteria are:
1. Henle-Koch postulates (1840-1882), and
2. Hill’s criteria (1965).

Henle-Koch's Postulates
1. The agent should be present in every case of the disease under appropriate
circumstances.
2. The agent should not be present in any other disease.
3. The agent must be isolated from the body of the diseased individual in pure
culture, and it should induce disease in another susceptible animal.
It is quite clear that the Henle-Koch postulates are not compatible with the
current multifactorial model of causation (i.e., the multiple-causation theory of disease,
particularly for non-communicable diseases, where a single agent rarely exists). Thus,
these postulates are rarely used in practice.



Hill's Criteria for Causation
In his famous paper (Hill 1965) titled “The environment and disease: association
or causation?” Sir Austin Bradford Hill put forward nine conditions for separating causal
from non-causal associations:

1. Strength of the association: Strong associations are more likely to be causal than
weak associations. Weak associations are more likely to be explained by
undetected biases. The association between smoking and lung cancer (large
relative risks have been generated by several observational studies) is often used
as an example for this condition. Note that, while this criterion is reasonable, it
does not rule out the possibility of a weak association being causal.

2. Consistency of the association: Consistency refers to similar results emerging
from several studies done in different populations. Lack of consistency, however,
does not rule out a causal association.

3. Specificity of the association: This criterion requires a single cause to produce a
single effect. Several authors have found this to be a misleading criterion.
Smoking, for instance, causes lung cancer but it is also associated with several
other diseases.

4. Temporality: This criterion denotes the sequence of events with regard to time.
It is an absolute necessity for a causal association; the cause must precede the
effect. In case-control studies, however, we do not always know whether the exposure
preceded the disease under study. Temporality is therefore best established in
prospective studies, where the exposure is known to precede the occurrence of the
disease.

5. Biological gradient: This implies the presence of a dose-response relationship
(i.e., increasing dose leads to increasing disease frequency). For instance, the
higher the number of cigarettes smoked, the higher the risk of lung cancer. A
dose-response relationship can be tested statistically by the use of tests for trend,
sometimes called linear trend tests (a P for trend is now commonly reported in
regression analyses). Absence of a dose-response relationship, again, does not rule
out a causal association.
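A linear trend test of the kind mentioned above can be sketched as follows. This is a Cochran-Armitage-style test for trend across ordered exposure groups; the dose-group counts below are purely illustrative and are not taken from any real study:

```python
import math

# Cochran-Armitage-style test for linear trend across ordered exposure levels.
# Hypothetical data: cases and totals at increasing daily cigarette doses.
scores = [0, 1, 2, 3]             # ordered exposure scores (dose groups)
cases  = [10, 18, 30, 45]         # diseased subjects in each group
totals = [1000, 1000, 1000, 1000] # subjects in each group

C = sum(cases)
N = sum(totals)
p = C / N  # overall proportion diseased

# Trend statistic: observed minus expected score-weighted case count
T = (sum(c * x for c, x in zip(cases, scores))
     - p * sum(n * x for n, x in zip(totals, scores)))
var = p * (1 - p) * (sum(n * x * x for n, x in zip(totals, scores))
                     - sum(n * x for n, x in zip(totals, scores)) ** 2 / N)
z = T / math.sqrt(var)
print(round(z, 2))  # a large positive z suggests a rising dose-response trend
```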

6. Plausibility: This refers to the biological plausibility of the observed association.
There should be some biologically acceptable or relevant mechanism by which the
cause could produce the effect. But biological plausibility is a reflection of currently
available knowledge; it may change with time.

7. Coherence: Coherence implies that the association does not conflict with current
knowledge about the disease (its natural history, biology, etc.). For example, the
knowledge that smoking damages bronchial epithelium is compatible with the
association between smoking and lung cancer.

8. Experimental evidence: According to Hill, the strongest support for causation
may be revealed by experimental (clinical trial) evidence, where introduction or
removal of an agent can lead to a change in the effect. While it is agreed that
experimental studies offer stronger causal inference, it must be understood that
many research questions can never be studied using experiments (for obvious
ethical reasons).

9. Analogy: A previous experience can be used as an analogy to make a causal
inference. Hill uses the example of thalidomide; since we know that this drug
causes congenital anomalies, it is not difficult to accept that another drug could
cause anomalies.

These nine aspects have been used by several epidemiologists as criteria or a
checklist for deciding on causation; it is often said that all nine conditions are
necessary before causation can be inferred. Actually, Hill never used the word criteria
anywhere in his paper, nor did he intend to offer a list of necessary conditions. In his own
words: "Here are nine different viewpoints from all of which we should study association
before we cry causation. What I do not believe... is that we can usefully lay down some
hard-and-fast rules of evidence that must be obeyed before we accept cause and effect.
None of my nine viewpoints can bring indisputable evidence for or against the cause-and-
effect hypothesis."

BIBLIOGRAPHY
1. Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C. Estimating the population
attributable risk for multiple risk factors using case-control data. American Journal of
Epidemiology 1985;122:904-914.
2. Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965;58:295-300.
3. Rothman KJ. Causal Inference. Boston: Epidemiology Resources, 1988.
4. Susser M. What is a cause and how do we know one? A grammar for pragmatic
epidemiology. American Journal of Epidemiology 1991;133:635-648.



CHAPTER 5
Study Design
Epidemiological studies are classified simply as either observational studies or
experimental (intervention) studies. In observational studies, the investigator does not
manipulate any of the studied factors but only observes what is happening now (cross
sectional and prevalence studies), what happened in the past (case-control studies), or
what will happen in the future (prospective studies). In experimental studies, however,
the investigator manipulates the studied exposure. In practice, the ethical problems in
human experimentation and the cost involved in such studies almost invariably preclude
extensive use of this study design (see also "Ethical considerations in clinical research"
at the end of this chapter).
Most studies, therefore, are observational in nature. In an observational study, the
investigator only measures but does not intervene. For example, the rate of occurrence of
acute myocardial infarction among smokers may be compared to the rate among
nonsmokers; in this case, the investigator does not decide who smokes. Observational
designs range from relatively weak studies like ecological and cross sectional studies to
strong designs like case control and prospective cohort studies. Based on their
strengths and limitations, the pyramid below (Figure 5.1) illustrates the hierarchy of
strength of epidemiologic studies, where the tip of the pyramid represents the strongest
design and its base the weakest.
Figure 5.1. Diagram presenting the hierarchy of strength of epidemiologic studies.

[Pyramid, from tip (strongest) to base (weakest): RCT, Prospective, Case-control, Cross sectional, Ecological]



I. Observational studies:
1. Cross sectional studies
Cross sectional studies, also called descriptive studies, surveys, and prevalence
studies, are used to explore and describe the general pattern of disease or any health
related event in populations.
In this design, measurements are made on a population at one point in time.
Consider, for example, a survey done in a specified community to identify the number of
individuals with diabetes mellitus (DM). The people in the studied community are
screened with an oral glucose tolerance test (OGTT) at one point in time to measure what
is called the "point prevalence," or over a period (interval of time) to measure what is
called the "period prevalence," of DM among the studied population. Cross sectional
studies also allow the researcher to examine the frequency of DM in relation to age, sex,
socioeconomic status, and other risk factors for DM. Since there is no longitudinal
component, cross sectional surveys cannot possibly measure the incidence of any disease.
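The point-prevalence idea can be illustrated with a minimal calculation (the counts below are hypothetical, not from any actual survey):

```python
# Point prevalence: the proportion of a population with the condition at one
# point in time. All figures are hypothetical.
screened = 5000   # people given the OGTT in the survey
dm_cases = 350    # found to have diabetes mellitus at screening

point_prevalence = dm_cases / screened
print(f"{point_prevalence:.1%}")  # 7.0%
```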

Strengths and limitations of cross sectional studies:


Cross sectional studies are easy to do and tend to be economical since repeated
data collection is not done. They also yield useful data on prevalence of diseases and this
is often good enough to assess the health situation of the studied population.
The main problem with a cross sectional study stems from the fact that both the
exposure and the outcome are measured simultaneously. So, even if a strong association
is made out between an exposure and the outcome, it is not easy to determine which
occurred first, the exposure or the outcome. In other words, causal associations cannot be
made based on cross sectional data.
A cross sectional study is considered to be one of the weakest epidemiological
designs. The investigators merely describe the health status of a population or
characteristic of a number of patients. Description is usually done with respect to time,
place and person. Cross sectional studies are considered to be weak because they make
no attempt to link cause and effect and therefore no causal association can be determined.
These studies, however, are often the first step to a well designed epidemiological study.
They allow the investigator to generate and define a good hypothesis which can then be



tested using a better study design such as a case-control or prospective study. Case
reports and case series are other examples of descriptive studies. These reports offer only
limited information about a patient with an unexpected event, or about a group of patients
and their clinical characteristics and outcomes.

2. Ecological studies
Ecological studies are also weak designs. Here the units of study are populations
rather than individuals. For example, when coronary heart disease (CHD) prevalence
rates were compared between different countries, it was found that the CHD prevalence
rate was very low in countries with low mean serum cholesterol, while it was very high in
countries with high mean serum cholesterol. This ecological link paved the way for
intensive investigation into the association between serum cholesterol and CHD.
Another example is the ecological link between malaria incidence and the prevalence
of sickle cell disease: malaria is rare in areas where sickle cell disease is prevalent.
Similarly, the association between smoking and lung cancer was supported by the ecological
link between smoking and gender (males smoked more and had higher lung cancer rates).
Ecological studies can be useful in generating hypotheses, but no causal inference
can be drawn from them; an apparent ecological link may not be a true link, as it could be
confounded by several other factors. Because the units of study are populations rather than
individuals, it is difficult to control for the associated confounding factors.

3. Case Control Studies


Case control studies are backward-looking studies that compare a group of
people with disease (cases) with one or more groups of non-diseased people (controls). In a case
control design, sampling starts with diseased and non-diseased individuals. The exposure
status is then determined by looking backward in time (using documentation of exposures
or recall of historical events) (Figure 5.2). For this reason, case control studies are also
called retrospective studies.
The measure of association in a case control study is called an Odds Ratio (OR).
The OR is the ratio of the odds (chance) of exposure among cases to the odds of exposure
among controls. If the disease is rare, then the OR tends to be a good approximation of



the Relative Risk (RR). However, true incidence estimates cannot be generated from a
case control study.
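The OR computation can be illustrated with a hypothetical 2x2 table (the counts are invented for this sketch):

```python
# Odds ratio from a case-control 2x2 table (hypothetical counts):
#               exposed  unexposed
# cases            a=40       b=60
# controls         c=20       d=80
a, b, c, d = 40, 60, 20, 80

odds_cases    = a / b   # odds of exposure among cases
odds_controls = c / d   # odds of exposure among controls
OR = odds_cases / odds_controls   # equivalently the cross-product (a*d)/(b*c)
print(round(OR, 2))  # 2.67
```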
Case control studies are much simpler and easier to do than cohort
studies, and they are very cost-efficient. Unfortunately, lack of a clear understanding of the
case control methodology has led people to believe that it is a second-rate substitute for
the cohort study. In reality, case control designs have a sound theoretical basis, and well
designed case control studies can provide information as good as cohort studies.
Figure 5.2. Diagram presenting the design of case-control study.

Strengths and limitations of case-control studies:


Case control studies are the best design for investigating the etiology of rare
diseases such as cancers. If the etiology of such a rare disease were to be tested using a
cohort design, several thousand subjects would have to be followed up until enough of
them developed the disease. A case control study allows the investigator to simultaneously
explore multiple possible associations with a disease. Case control studies are also
remarkably cost-efficient as well as time saving.
The limitations of case-control studies include the possibility of various types of
bias. For example, if the control group that is selected for comparison
has a very low odds of exposure, then the resultant OR will be biased. The choice of
controls for such studies remains a challenge in designing these interesting studies. Also,
other types of bias, such as information bias (recall bias) and confounding, can make case
control studies difficult to handle. While confounding can be addressed by collecting data
on the known confounders for the studied disease and exposure and adjusting for them in
multiple regression models, the researcher will find it difficult to control recall bias,
since cases tend to recall events related to the disease more completely than controls do.
Case control studies, because they rely on the history of past exposure, also
suffer from the problem of unreliable data. Memory of many events fades, and if no
documentation of past exposure exists, the results of the study may be invalid. By
invalidity, we mean distortion of the study such that its results deviate from the truth.

4. Cohort Studies
Cohort studies are considered the strongest of all observational designs. A cohort
study is conceptually very straightforward. The idea is to measure and compare the
incidence of disease in two or more study cohorts. In epidemiology, a cohort is a group of
people who share a common experience, condition or characteristic. For example, a birth
cohort shares the same year of birth; a cohort of industrial workers shares the same
exposure in the working environment; and a cohort of smokers has smoking as the
common experience.
In cohort studies, there are usually an exposed and an unexposed cohort. The exposed
group or cohort comprises those individuals who have been exposed to some event or
condition, and the unexposed cohort comprises those who have not. For example, in the
classic Surgeon General's cohort study on smoking and
lung cancer, the exposure factor was smoking. A cohort of smokers and a cohort of
nonsmokers were followed up for a long period of time, and the incidence of lung cancer
was measured and compared between the exposed and unexposed cohorts. Normally, an
effort is made to match both cohorts with respect to age, sex and other important
variables; the only key difference between the two cohorts is the exposure (smoking).
Under these conditions, both cohorts are similar in everything except the exposure, which
adds to the strength of the study because confounding bias is less likely to occur. If,
however, the researcher were also to match on the exposure status (smoking), the cohort
study would be completely destroyed (there would be nothing to compare).



Cohort studies are usually prospective or forward-looking studies. They are also
called longitudinal studies (Figure 5.3). Disease-free cohorts are defined on the basis of
exposure status and are then followed up for long periods (the length of follow-up
depends on the natural history of the outcome disease and how rare the outcome is; when
cancers are the outcome, for example, a very long follow-up period of many years is
needed). New cases of the outcome disease are picked up during follow-up, and the
incidence of the disease is computed on the basis of exposure status. The incidence in
the exposed cohort is then compared with the incidence in the unexposed cohort; this
ratio is called the Relative Risk (RR) or Risk Ratio.
RR = Incidence in the exposed cohort /Incidence in the unexposed cohort
The relative risk is a measure of association between the exposure and the
outcome. The larger the RR, the stronger the association (See also Chapter 3). The cohort
study is the only study design in which the true incidence of a disease can be estimated.
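The RR formula can be illustrated with hypothetical cohort counts (invented for this sketch, not from any real study):

```python
# Relative risk from a cohort study (hypothetical follow-up data):
exposed_cases, exposed_total     = 90, 3000   # e.g., smokers
unexposed_cases, unexposed_total = 15, 3000   # e.g., nonsmokers

incidence_exposed   = exposed_cases / exposed_total       # 0.03
incidence_unexposed = unexposed_cases / unexposed_total   # 0.005
RR = incidence_exposed / incidence_unexposed
print(round(RR, 1))  # 6.0
```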
Figure 5.3. Diagram presenting the design of prospective cohort studies.

Strengths and limitations of prospective cohort studies:


Cohort studies are very strong designs, but they are very time consuming and
expensive. Since most diseases are rare, large cohorts have to be followed up for many
years to get good estimates of incidence and RR, which limits their feasibility.
Cohort studies have the major advantage of greater assurance that exposure preceded the
outcome (smoking preceded lung cancer). This clear temporal (time) sequencing is



extremely important while making causal inference (See also Hill’s criteria in Chapter 4).
Also, in cohort studies, the effect of a given exposure can be studied for multiple
outcomes at the same time. For example, in a cohort study on smoking, its association
can be studied with several outcomes: lung cancer, coronary heart disease, stroke, etc.

II. Experimental (intervention) studies:


There are two types of intervention studies:
i. Clinical trials, also called the randomized controlled trial (RCT).
ii. Community intervention trials.
In this section, only the RCT is discussed in detail, because of its importance in clinical
research and its strength relative to other study designs.
The Randomized Controlled Trial (RCT)
The RCT is widely held as the ultimate study design; the “gold standard” against
which all other designs are compared. The sequence of RCT design is shown in figure 5.4.
The subjects are usually chosen from a large number of potential subjects. Sampling
includes the use of a set of inclusion and exclusion criteria. After this, an informed
consent is obtained from each participant.
Randomization is then done to allocate subjects to either the treatment group or the
placebo group. Randomization achieves two important things: (i) allocation to different
groups (treatment and placebo) is done without bias because it is taken out of hands of
the investigator, and (ii) randomization distributes known and unknown confounders
equally between the two studied groups. Once randomization is done, intervention is
begun. Ideally, intervention, either taking the treatment or placebo, should be done in a
blinded fashion. The issues of blinding are discussed later.
In conducting a randomized clinical trial, the researcher should consider the
following four important points. The reader of any published RCT should also take these
points into consideration, and be able to appraise them on a scientific basis:
1. Appropriateness of the control group.
2. Randomized allocation.
3. Blinded intervention and blind ascertainment of outcome.
4. Data analysis by intention-to-treat principle.



Figure 5.4. Diagram presenting the design of RCT.

1. Appropriateness of the control group:


In an RCT, the test group takes the real treatment while the control group takes a
placebo; many authors argue that it is unethical to do a placebo controlled study when
some therapy is already existent. No patient should be denied some form of therapy even
if it is not very effective. To solve this ethical problem, the following requirements for
the test and control treatment should be considered:
i. They must be distinguishable from one another
ii. They must be medically justifiable
iii. There must be an ethical base for use of either treatment
iv. Either treatment must be acceptable to study patients and to the physicians
administering them
v. There must be reasonable doubt regarding the efficacy of the test treatment
vi. There should be reason to believe that the benefits will outweigh the risks of
the treatment.

2. Randomized allocation:
Once the eligible subject has agreed to participate in the trial, it is important that
assignment to treatment or control group is done in a manner that is free of any selection
bias. To avoid bias, neither the patient nor the physician should be aware of the group to



which the patient will be allocated. This is achieved by randomizing in a blinded fashion
(double blinding is best at this stage).
Randomization also ensures that the baseline characteristics of the test and the
control groups are more or less similar, providing a valid basis for comparison. If
allocation is not randomized, it is possible that subjects with favorable
characteristics will be allocated to the treatment group while those with less favorable
characteristics are allocated to the control group. In that case, the trial is not
randomized.
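A minimal sketch of simple random allocation to two arms follows. The subject IDs and group sizes are invented for illustration; a real trial would use a prepared, concealed allocation list held by a third party so that patient and physician stay blinded:

```python
import random

# Simple (unstratified) 1:1 randomization of eligible subjects to two arms.
random.seed(42)                       # fixed seed only to make the sketch reproducible
subjects = [f"S{i:03d}" for i in range(1, 21)]  # 20 hypothetical subject IDs
random.shuffle(subjects)              # random order, outside the investigator's hands

treatment = sorted(subjects[:10])     # first half -> treatment arm
placebo   = sorted(subjects[10:])     # second half -> placebo arm
print(len(treatment), len(placebo))   # 10 10
```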

3. Blinded intervention:
The aim of blinding is to ensure that outcome ascertainment is done without any
bias. Blinding is logistically difficult but essential. Some authors use the word "masking"
instead of blinding. A single-blinded trial is one in which the patient is not informed of
the treatment assignment. A double-blinded trial is one in which neither the patient nor
the physician responsible for the treatment is informed of the treatment assignment.
Sometimes the investigator analyzing the data is also kept unaware of the
treatment assignments; this is called a triple-blind technique. RCTs usually report the
effectiveness of blinding. Sometimes, known adverse effects of drugs may un-blind the
physician (e.g. bradycardia due to beta blockers). Ideally, data collection, measurement,
reading and classification procedures on individual patients should be made by persons
who are completely blinded. For instance, if chest radiographs have to be read, the films
can be sent to another site where they are read by radiologists who have no idea about the
patients or their treatment groups.
As far as possible, outcomes chosen should be objective and clinically relevant.
Outcomes should be capable of being observed in a blinded fashion. For instance, pain is
a very subjective outcome and difficult to measure in a blinded fashion. On the other
hand, if the outcome is a biochemical parameter, then it is objective and can be easily
measured in a blinded fashion.



4. Data analysis by intention-to-treat principle:
This is a very important issue in the analysis of RCT results. All patients allocated
to each arm of the treatment regimen are analyzed together as representing that treatment
arm, whether or not they received or completed the prescribed regimen. Failure to follow
this principle defeats the main purpose of randomization and can invalidate the results.
For instance, if a patient had been originally randomized to receive placebo and, for some
reason, actually ended up getting the study treatment, for the purposes of analysis this
patient will still be counted as belonging to the placebo group (effectiveness analysis).
Effectiveness analysis thus considers the results of all subjects according to their
originally assigned treatment groups, irrespective of failures in compliance,
discontinuation or other reasons for withdrawal. In contrast, efficacy analysis includes only
subjects who completed the clinical trial protocol and received the intended treatment.
From a physiological perspective, however, efficacy analysis is more pertinent.
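The intention-to-treat principle can be sketched as follows; the subject records below are invented for illustration, and each record carries the assigned arm, the arm actually received, and a binary outcome:

```python
# Intention-to-treat: analyze each subject by the arm ORIGINALLY ASSIGNED,
# regardless of what was actually received. Hypothetical records:
# (assigned arm, arm actually received, outcome: 1 = improved).
records = [
    ("treatment", "treatment", 1), ("treatment", "treatment", 0),
    ("treatment", "placebo",   1),   # crossover: still analyzed as treatment
    ("placebo",   "placebo",   0), ("placebo",   "placebo",   1),
    ("placebo",   "treatment", 0),   # crossover: still analyzed as placebo
]

def itt_rate(arm):
    grp = [r for r in records if r[0] == arm]   # group by ASSIGNED arm only
    return sum(r[2] for r in grp) / len(grp)

print(round(itt_rate("treatment"), 2), round(itt_rate("placebo"), 2))  # 0.67 0.33
```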



Sampling Methods:
In conducting any of the above mentioned studies, researchers rarely get the
opportunity to study the entire population. Instead, a sample of the population is studied.
Sampling offers the following advantages:
i. Reduced cost: Collecting data from a small fraction of the population
costs less than collecting it from the entire population.
ii. Greater speed: Data can be collected and summarized more quickly with a
sample than with a complete census.
iii. Greater scope: Collecting data from a sample can allow the researcher to get
information about more factors, with greater flexibility in the type of
information collected.
iv. Greater accuracy: Because of the reduced volume of work, the data collected
are expected to be of higher quality.
Sampling is the process of selecting units (e.g., people, organizations) from a
population of interest so that by studying the sample we may fairly generalize our results
back to the population from which they were chosen. Before selecting a sample, the
researcher must clearly define the population of interest. This population is called the
target population, internal population, or study base. The target population is chosen in a way to
answer the objectives of the study, and according to the sources of data available to the
researcher. A distinction is made between this internal (target) population and the larger
external population, about which additional generalization may be made. The extent to
which the sample reflects the target is called the study’s internal validity. The extent to
which the target population reflects the external population is called the study’s external
validity. By internal validity, we mean the extent to which the study is free from
systematic error, while external validity means the extent to which the
study findings can be generalized (Figure 5.5). In this figure, the space between the
smallest and intermediate circles represents the internal validity of the study, where the
space between the intermediate and the largest circles represents the external validity of
the study. More details about validity are discussed in Chapter 6.



Figure 5.5. Sampling and validity.

Sample

Target
Pop.

External
Pop.

Sometimes, the entire population will be sufficiently small, and the researcher can
include the entire population in the study. This type of research is called a census study
because data is gathered on every member of the population. Usually, the population is
too large for the researcher to attempt to survey all of its members. A small, but carefully
chosen sample can be used to represent the population. The sample reflects the
characteristics of the population from which it is drawn.
Sampling methods are simply classified as either probability or nonprobability. In
probability samples, each member of the population has a known, non-zero probability of
being selected. Probability
methods include random sampling, systematic sampling, stratified and cluster sampling.
In nonprobability sampling, however, members are selected from the population in some
nonrandom manner. These include convenience sampling, judgment sampling, quota
sampling, and snowball sampling. The advantage of probability sampling is that sampling
error can be calculated. In nonprobability sampling, the degree to which the sample
differs from the population remains unknown.



Probability sampling methods:
1. Simple random sampling
The basic sampling method is simple random sampling (SRS), which is a self-
weighting sampling design, meaning that each member of the population has an equal
and known chance of being included in the sample. A convenient way to draw a simple
random sample is to assign each population element a (pseudo) random number, sort the
data set according to the random numbers, and finally select the required sample size
from any sequential part of the population, normally beginning from the first element and
continuing until the desired sample size is reached.
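A minimal sketch of simple random sampling using Python's standard library (the frame of 1000 IDs is hypothetical):

```python
import random

# Simple random sampling: every member of the frame has an equal chance
# of selection. The "population" here is just hypothetical ID numbers.
random.seed(7)
population = list(range(1, 1001))        # sampling frame of 1000 members
sample = random.sample(population, 50)   # SRS without replacement

print(len(sample), len(set(sample)))     # 50 50  (no duplicates)
```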

2. Systematic random sampling


Systematic sampling means that every r-th element is selected across
the whole sampling frame. Sampling starts by selecting the first element at random from
the range [1, r]; elements are then selected at the sampling interval r up to the end of the
frame.
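The selection rule just described can be sketched as follows (the frame size and interval are invented for illustration):

```python
import random

# Systematic sampling: choose a random start in [1, r], then take every
# r-th element from the frame.
random.seed(3)
frame = list(range(1, 1001))   # hypothetical sampling frame of 1000 elements
r = 20                         # sampling interval -> sample of 1000/20 = 50
start = random.randint(1, r)   # random start within the first interval

sample = frame[start - 1::r]   # every r-th element from the random start
print(len(sample))  # 50
```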
A population may be arranged in a certain order that can be used for implicit
stratification. However, this sampling procedure should be approached with caution. If
the elements are in a random order, systematic sampling practically matches simple
random sampling. However, if the population contains some (hidden) order or sequence,
systematic sampling may yield a sample that consists of very similar elements that do not
reflect the true population variation and thus the sample can lead to erroneous or biased
results. In certain cases, proper arranging of the population can yield better samples than
the use of a random order or the use of simple random sampling. If, for example, the
focus of interest is a spatially correlated phenomenon among the population, systematic
sampling from the geographically sorted population register would produce samples that
would be much better distributed than ones obtained by simple random sampling.

3. Stratified sampling
Stratified sampling is a commonly used probability method that is superior to simple
random sampling because it reduces sampling error. In stratified sampling, the population
is first divided into mutually exclusive sub-populations known as strata. Then, different sampling



designs or sampling rates can be applied in different strata. Sometimes all elements in a
certain stratum must be investigated, as in a census, while sampling only may be applied
to another stratum. In household surveys, fairly basic demographic stratification criteria
are used, e.g. geographical area, age or gender.
Stratified sampling is often used when one or more of the strata in the
population have a low incidence relative to the others. The aim is to achieve
sub-populations that are as homogeneous as possible given the available information. Stratification is
necessary for skewed populations when simple random sampling or other self-weighting
designs are applied. Stratification basically requires that all population elements have
information permitting construction of the strata. Ultimately, each element can only
belong to one stratum.
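A sketch of proportionate stratified sampling, where each stratum is sampled separately at the same rate (the strata and sampling fraction are hypothetical):

```python
import random

# Proportionate stratified sampling: draw a simple random sample within
# each stratum at the same sampling fraction. Strata are hypothetical
# (e.g., geographical areas).
random.seed(11)
strata = {"urban": list(range(800)), "rural": list(range(200))}
rate = 0.10                       # 10% sampling fraction in every stratum

sample = {name: random.sample(members, int(len(members) * rate))
          for name, members in strata.items()}
print(len(sample["urban"]), len(sample["rural"]))  # 80 20
```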

4. Cluster sampling
In cluster sampling, either all elements from each selected cluster can be included
in the sample, or a sub-selection can be made from within the selected clusters. The
former case is called one-stage cluster sampling while the latter is known as two-stage
cluster sampling. An example of cluster sampling would be a study of work conditions
where the firms/enterprises are selected first and their employees are then selected for the
examination. A firm is a cluster and if all its employees are examined, this will be a one-
stage cluster design, but if only some of the employees are selected, the design will be a
two-stage cluster sampling. The process could comprise even more stages than this,
depending on the population structure.
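A sketch of two-stage cluster sampling along the lines of the firms/employees example above (all counts are invented):

```python
import random

# Two-stage cluster sampling: first select firms (clusters), then select
# employees within each chosen firm.
random.seed(5)
firms = {f"firm{i}": [f"firm{i}-emp{j}" for j in range(50)] for i in range(30)}

stage1 = random.sample(list(firms), 6)                     # 6 firms out of 30
stage2 = {f: random.sample(firms[f], 10) for f in stage1}  # 10 employees per firm

n_examined = sum(len(v) for v in stage2.values())
print(n_examined)  # 60
```

With one-stage cluster sampling, every employee of each selected firm would be examined instead of a sub-sample.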

Nonprobability sampling methods:


1. Convenience sampling
This type of sampling is used in exploratory research where the researcher is
interested in getting an inexpensive approximation of the truth. As the name implies, the
sample is selected because its members are convenient to reach. This nonprobability
method is often used during preliminary research efforts to get a gross estimate of the
results, without incurring the cost or time required to select a random sample.

Basic Concepts Of Modern Epidemiology Page 36


2. Judgment sampling
It is a common nonprobability method in which the researcher selects the sample
based on judgment. This is usually an extension of convenience sampling. For example, a
researcher may decide to draw the entire sample from one "representative" city, even
though the population includes all cities. When using this method, the researcher must be
confident that the chosen sample is truly representative of the entire population.

3. Quota sampling
This sampling method is the nonprobability equivalent of stratified sampling. As in
stratified sampling, the researcher first identifies the strata and their proportions as
they are represented in the population. Then convenience or judgment sampling is used to
select the required number of subjects from each stratum. This differs from stratified
sampling, in which the strata are filled by random sampling.

4. Snowball sampling
It is a special nonprobability method used when the desired sample characteristic
is rare, making it extremely difficult or cost-prohibitive to locate respondents. Snowball
sampling relies on referrals from initial subjects to generate additional subjects. While
this technique can dramatically lower search costs, it comes at the expense of introducing
bias, because the technique itself reduces the likelihood that the sample will represent a
good cross-section of the population.



Sample size calculation
Estimating sample size is a very important aspect of study design, because without
this calculation, sample size may be too large or too small. If sample size is too small, the
experiment will lack the precision to provide reliable answers to the questions it is
investigating. If sample size is too large, the needed time and cost will be increased.
The formulas used to calculate sample size vary from study to study and from
problem to problem. The sample size you need when the outcome is binary (in categories
such as diseased and non-diseased) is different from when your outcome is continuous.
For a continuous outcome, you need to specify the variability of your outcome measure
and how much of a change you would consider clinically relevant. For a binary outcome,
you still need to specify the clinically relevant change. But you don't need a measure of
variability. What you need instead is an estimate in your control group of the probability
for one level of your binary outcome, or you might need to specify the distribution
(prevalence) of your explanatory (independent) variable in the control group or, better, in
the target population. Accordingly, estimating sample size requires a priori information
in order to perform the calculation correctly. Most statistical programs now calculate the
sample size once you supply the required information, and many statistical websites also
perform this function; the site below is one example:
http://calculators.stat.ucla.edu/powercalc/
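For instance, the widely used formula for estimating a single proportion, n = z²p(1−p)/d², translates directly into code. This is a minimal Python sketch; the 20% expected prevalence and ±5% margin of error below are illustrative assumptions:

```python
import math

def sample_size_proportion(p, margin, z=1.96):
    """Sample size needed to estimate a proportion p with a given
    margin of error at 95% confidence: n = z^2 * p(1-p) / d^2."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

# Hypothetical example: expected prevalence 20%, margin of error +/-5%.
n = sample_size_proportion(0.2, 0.05)
# n = ceil(1.96^2 * 0.2 * 0.8 / 0.05^2) = ceil(245.86) = 246
```

Note how the required n is largest when p = 0.5, which is why 50% is often used as a conservative default when no prior estimate of prevalence exists.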
It is also desirable to determine, before conducting the study, whether your sample
size will be large enough to give a reasonable probability of detecting and estimating an
effect if one exists. This is accomplished by calculating the power of the study. The power is
not the probability that the study will estimate the true size of the effect correctly. Rather,
it is the probability that the study will yield a statistically significant departure from the
null association. The power of the study depends on the following points:
i. The acceptable level for the alpha or Type I error (the error of rejecting the
null hypothesis when it is true). By convention, this value is usually set at 0.05.
ii. The disease rate in the non-exposed group in a cohort study or the exposure
prevalence of the controls in a case-control study.



iii. The specified value of the relative risk under the alternative (non-null)
hypothesis.
iv. The size of the population being studied (sample size).
Once these quantities have been determined, standard formulas are then available to
calculate the statistical power of the study (see the above listed website).

Increasing the power of the study:


The main problem in designing and carrying out studies is to gain sufficient power. Study
power can be increased by:
1. Increase Sample Size.
2. Increase Alpha level.
α = 0.05 (2-tailed), N = 40, Power = 0.49
α = 0.10 (2-tailed), N = 40, Power = 0.62
3. Decrease Standard Deviations (i.e., use more homogeneous groups).
By using more homogeneous groups (in an experimental study), the relative effect
size increases. Similarly, increasing the reliability of the measures has the
same effect.
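The interplay of these three levers can be illustrated with a normal-approximation power calculation for a two-sample comparison of means. This is a simplified sketch, not the exact method behind the power figures quoted above; the effect size and group size used here are assumptions:

```python
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a
    mean difference delta, common SD, and n subjects per group."""
    nd = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5          # SE of the difference in means
    z_alpha = nd.inv_cdf(1 - alpha / 2)         # two-sided critical value
    return nd.cdf(abs(delta) / se - z_alpha)    # P(reject H0 | true delta)

# Illustrative (assumed) numbers: difference of 0.5 SD, 40 per group.
p1 = power_two_sample(delta=0.5, sd=1.0, n_per_group=40, alpha=0.05)
p2 = power_two_sample(delta=0.5, sd=1.0, n_per_group=40, alpha=0.10)
# Power rises as alpha rises, and would also rise with a larger n
# or a smaller SD (more homogeneous groups).
```

Rerunning the sketch with a larger `n_per_group` or smaller `sd` confirms the other two levers numerically.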
There are still some limitations when you calculate sample size and study power. These
limitations include:
i. They are based on the presumption that the purpose of the study is to make a
decision solely on the basis of the information obtained by the study, whereas
in practice the study findings are usually evaluated in the context of the
findings of previous studies.
ii. They assume that the purpose of the study is simply to distinguish between two,
and only two, competing hypotheses: the null and an alternative.
iii. Power calculations depend on an arbitrary definition of statistical significance
(the choice of the alpha error; by convention it is 0.05), which is increasingly
discouraged in epidemiology, in favor of a weight of evidence approach to
data interpretation.
iv. The choice of beta error rate is also arbitrary.
v. The choice of alternative values of relative risk is often little more than a guess.



Matching in epidemiologic studies
Matching refers to the selection of reference series (unexposed subjects in a cohort
study or controls in a case-control study) that is identical, or nearly so, to the index series
with respect to the distribution of one or more potentially confounding factors. Two types
of matching are known, individual matching and frequency matching; there is, however,
no important difference in the proper analysis of individually matched and frequency-matched
data. By individual matching, we mean matching a 30-year-old case, for example, to one
or more controls of nearly the same age. By frequency matching, we match a group of
cases aged 30-40 years to a group of controls in the same age group (10-year frequency
matching); the width of the age group may of course be less or more than 10 years. With
frequency matching, there may still be potential confounding bias during analysis of the data.
Matching of reference to index subjects represents an example of stratified
sampling methods with a forced pre-specified distribution of the index group. In the
matching process, the researcher should first know the following important points:
i. Matching on a factor may necessitate its control in the analysis, particularly in
the frequency matching data, because of the potential residual confounding.
ii. Matching is rarely the optimal stratified design, but it is by far the most common
and usually the most easily implemented one in epidemiology.
iii. If a large number of confounding factors is anticipated, matching may be
desirable to ensure an informative stratified analysis. However, attempting to
match on many variables may render the study expensive or make it
impossible to find matched subjects. Thus the most practical option is often to
match only on a few nominal-scale confounders, especially those with a large
number of possible values. Any remaining confounders can be controlled
along with the matching factors by stratification or regression methods.
iv. The chief importance of matching in observational studies stems from its
effect on study size efficiency (i.e., amount of information per subject), not on
validity.
v. In both cohort and case-control studies, matching may lead to either a gain or
loss in efficiency.



vi. In case-control studies, matching doesn’t eliminate confounding and can
introduce a selection bias, which behaves like confounding but is not a
reflection of original confounding. The bias can be and must be accounted for
in the subsequent analysis through control of the matching factors.
vii. In cohort studies, matching changes the distribution of the matching factors in
the entire source population. Matching in case-control studies affects only the
distribution of controls. If matching factors are associated with exposure, the
selection process will be differential with respect to both exposure and disease,
thereby resulting in selection bias.
viii. In case-control studies, if the matching factors are associated with the
exposure in the source population, matching requires control of the matching
factor in the subsequent analysis even if the matching factors are not risk
factors for the disease. However, in a cohort study without competing risks or
losses to follow-up, no additional action is required in the analysis to control
confounding by the matching factors.

Matching in case-control Studies


Matching is a useful means for improving study efficiency in terms of the amount
of information per subject studied, in some but not all situations. Case-control matching
is indicated for known confounders that are measured on a nominal scale, especially
those with many categories. The utility of a matching variable derives not from an ability
to prevent confounding, but from the enhanced efficiency that it sometimes affords for
the control of confounding. If the matching variable is a confounder whose distribution
differs dramatically between the groups, matching will create a constant ratio of controls
to cases across the matching-factor strata, and efficiency may be improved in this way.
In the following sections, the benefits and harms of matching for the efficiency of
case-control studies are discussed in some detail.

Benefits of matching in case-control study:


1. Matching in case-control studies can be considered a means of providing a more
efficient stratified analysis rather than a direct means of preventing confounding.



2. Matching forces the controls to have the same distribution of matching factors across
strata as the cases and hence prevents extreme departures from what would be the
optimal control distribution.
3. In studies that call for fine stratification and so yield sparse data, matching will more
often lead to an improvement in the efficiency when compared with unmatched
studies with the same stratification. However, stratification may still be necessary to
control the selection bias and any confounding left behind and matching will often
make the stratification more efficient.
4. If there is some flexibility in selecting cases as well as controls, efficiency can be
improved by altering the case distribution, as well as the control distribution, to
approach optimal distribution across strata.

Harms of Matching in case-control study:


1. Case-control matching on a nonconfounder will usually harm efficiency; the more
efficient strategy would then usually be neither to match nor to stratify on the factor.
2. The selection bias introduced into a case-control study by the matching process can
occur whether or not there is confounding in the source population; the bias is
generally in the direction of the null value of effect. If controls are selected to match
the cases on a factor that is correlated with the exposure, the crude exposure
frequency in controls will be distorted in the direction of similarity to that of the
cases and will differ from that of the source population. In this situation, the
resulting bias of the effect estimate does not depend on the direction of the
association between the exposure and the matching factor. If the matching factor is
not associated with the exposure, however, matching will not influence the exposure
distribution of controls, and therefore no bias is introduced by matching.

Matching in cohort studies


Matching unexposed to exposed subjects in a constant ratio can prevent
confounding of the crude risk difference and ratio, because such matching prevents
association between the exposure and the matching factor at the start of follow-up.
However, such matching is uncommon, because of the great expense of matching a large
cohort and because, if the exposure and the matching factor affect disease risk or
censoring, the original balance produced by matching will not extend to person and
person-time analyses. Thus, matching does not necessarily eliminate the need for control
of the matching factors in the subsequent analysis. Even if only pure count data and risks
are to be examined and no censoring occurs, control of any risk factors used for matching
will be necessary to obtain valid standard-deviation estimates for the risk-difference and
risk-ratio estimates. In contrast, case-control studies require control of matching factors
associated with exposure (rather than with risk).

Matching and efficiency in cohort studies


Although matching can often improve statistical efficiency in cohort studies by
reducing the standard deviation of effect estimates, such a benefit is not assured if
exposure is not randomized. In experimental design, matching is a protocol for treatment
assignment; in nonexperimental cohort studies, however, matching refers to a family of
protocols for subject selection rather than treatment assignment.

Overmatching
There are at least three forms of overmatching. The first is overmatching that
harms statistical efficiency, such as matching, in a case-control study, on a variable that
is associated with exposure but not with disease. The second is matching that harms
validity, such as matching on an intermediate factor between exposure and disease. The
third is matching that harms cost efficiency.

1. Matching that harms statistical efficiency:


Matching on a variable that is known to be associated with exposure but not with
the disease is expected to harm the statistical efficiency of the study. For example, if the
case and its matched control are either both exposed or both unexposed, such a pair of
subjects will not contribute any information to the analysis, and it harms not only
statistical but also cost efficiency. The extent to which information is lost by matching
depends on the degree of correlation between the matching factor and the exposure. A
strong correlate of exposure that has no relation to disease is the worst factor to match
on: it biases the result toward the null value. Accordingly, the investigator should apply
a practical rule when matching on any factor: do not match on a factor associated only
with exposure. In many situations, however, the potential matching factor will have at
least a very weak relation to the disease, so it will not be clear whether the factor needs
to be controlled as a confounder and whether matching will benefit statistical efficiency.
In such situations, considerations of cost efficiency and misclassification may predominate.

2. Matching that harms validity:


Matching that harms validity is liable to occur when the investigator matches on
an intermediate factor; see Overmatching bias below.

3. Matching and cost efficiency:


When matched and unmatched controls have equal cost and the potential matching
factor is to be treated purely as a confounder in the study analysis, with only
summarization (pooling) across the matching strata desired, it is recommended to avoid
matching on such a potential confounding factor unless it is expected to be a strong
disease risk factor with at least some association with exposure. However, when the
costs of matched and unmatched controls differ, efficiency calculations that take
account of the cost differences can be performed and used to choose a design strategy.

Overmatching bias
Matching on factors that are affected by the disease or the exposure will bias the
result. Matching on a factor affected by exposure but unrelated to disease in any way will
simply reduce statistical efficiency, whereas matching on intermediate factors, such as
symptoms or signs of the exposure or of the disease, will bias the crude and adjusted
effect estimates. As a general rule, before matching, the following two questions should
be answered: (i) what are the benefits, and (ii) what are the harms of matching for the
study results? Also, one should never match on factors that are affected by exposure only.



Ethical considerations in conducting epidemiologic research:
It is now obligatory that all clinical trials, as well as epidemiologic studies of
human beings, be reviewed and cleared by an ethics committee or Institutional
Review Board (IRB), which is now established in most universities and research
institutes concerned with human research. The aim of the committee or IRB is principally
to protect the human beings involved in research, observing the following principles:

Respect for Human Dignity:


This represents the cardinal principle of modern research ethics. It aspires to
protect the multiple and interdependent interests of the person, from bodily to
psychological to cultural integrity. This principle also forms the basis of most of the
ethical obligations listed below.

Respect for Free and Informed Consent:


The voluntary consent of the human subject is absolutely essential. This means
that the person involved should have legal capacity to give consent; should be so situated
as to be able to exercise free power of choice, without the intervention of any element of
force, fraud, deceit, duress, over-reaching form of constraint or coercion; and should have
sufficient knowledge and comprehension of the elements of the subject matter involved
as to enable him to make an understanding and enlightened decision.

Respect for Vulnerable Persons:


Respect for human dignity entails high ethical obligations towards vulnerable
persons (i.e., those whose diminished competence and/or decision-making capacity
makes them vulnerable). Children, institutionalized persons or others who are vulnerable are
entitled, on grounds of human dignity, caring, solidarity and fairness, to special
protection against abuse, exploitation or discrimination. Ethical obligations to vulnerable
individuals in the research enterprise will often translate into special procedures to protect
their interests.



Respect for Privacy and Confidentiality:
Respect for human dignity also implies the principles of respect for privacy and
confidentiality. In many cultures, privacy and confidentiality are considered fundamental
to human dignity. From this point of view, there are general guidelines for privacy and
confidentiality that should be respected, including:
i. Limiting access, control and dissemination of personal information to those who
have a legitimate need.
ii. Avoiding idle conversation about the patients.
iii. Using pseudonyms and altering other identifying data when presenting cases in
conference and teaching situations.
iv. Keeping the nominal data of the subjects in a special file, out of the reach of others.

Respect for Justice and Inclusiveness:


Justice connotes fairness and equity. Procedural justice requires that the ethics
review process have fair methods, standards and procedures for reviewing research
protocols, and that the process be effectively independent. Justice also concerns the
distribution of benefits and burdens of research. Distributive justice imposes duties
neither to neglect nor to discriminate against individuals and groups who may benefit
from advances in research. Compensatory justice, another form of justice, attempts to
compensate subjects for any losses that are not the consequence of their own actions.

Balancing Harms and Benefits:


Harms-benefits analysis primarily affects the welfare and rights of human subjects.
Thus the analysis, balance, and distribution of harms and benefits are essential to the
ethics of human research, and modern research ethics now requires a favorable
harms-benefits balance, aiming to minimize harm and maximize benefit.

Minimizing Harm:
This principle is directly related to harms-benefits analysis. It is the duty to avoid,
prevent or minimize harms to others. Research subjects must not be subjected to
unnecessary risks of harm, and their participation in research must be essential to
achieving scientifically important aims that cannot be realized without the participation
of human subjects. In addition, it should be kept in mind that the principle of minimizing
harm requires that the research involve the smallest number of human subjects and the
smallest number of tests on those subjects that will ensure scientifically valid data.

Maximizing Benefit:
Another principle related to the harms and benefits of research is beneficence. The
principle of beneficence imposes a duty to benefit others and, in research ethics, a duty to
maximize net benefits. Human research is always intended to produce benefits for the
subjects themselves, for other individuals or society as a whole, or for the advancement
of knowledge.

BIBLIOGRAPHY
1. Beaglehole R, Bonita R, Kjellstrom T. Basic Epidemiology. Geneva: World Health
Organization (WHO), 1993.
2. Friedman LM et al. Fundamentals of Clinical Trials. Boston: John Wright, 1985.
3. Gerstman BB. Epidemiology Kept Simple: An Introduction to Classic and Modern
Epidemiology. 1st Edition. New York: John Wiley & Sons, Inc., 1998.
4. Hulley SB. Designing Clinical Research. Baltimore: Williams & Wilkins, 1988.
5. MacMahon B, Trichopoulos D. Epidemiology: Principles and Methods. 2nd Edition.
Boston: Little, Brown and Co, 1996.
6. Meinert C. Clinical Trials. New York: Oxford University Press, 1986.
7. Rothman KJ, Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincott-Raven, 1998.
8. Varmus H, Satcher D. Ethical complexities of conducting research in developing
countries. NEJM 1997; 337:1003-1005.



CHAPTER 6
Precision and Validity in Epidemiologic Studies
Precision and validity are the most important topics in the design of epidemiologic
studies; they refer to the accuracy of such studies. This chapter will outline the
distinction between issues of precision, which involve random error, and validity, which
involves systematic error. The major validity issues include confounding bias, selection
bias and information bias. The design of epidemiologic studies should be based on the
need to minimize random and systematic error. This often involves a tradeoff between
competing study design considerations. For example, obtaining more detailed exposure
information may reduce systematic error (through reducing information bias) but may
increase random error (by reducing the possible study size). This chapter discusses all
these issues in detail, but without explaining sophisticated statistical methods, which
are beyond the scope of this book.

Precision
The precision of an epidemiologic study is determined by the degree to which
random error may contribute to the study results. In sampling contexts, the random error
is called sampling error. Sampling error is the degree to which a sample might differ from
the population. When inferring to the population, results are reported plus or minus the
sampling error. Sampling error gives us some idea of the precision of our statistical
estimate. A low sampling error means that we had relatively less variability or range in
the sampling distribution with regard to the target population. Sampling error calculation
is based on the standard deviation of the sample: the greater the sample standard
deviation, the greater the sampling error. The sampling error is also related to the
sample size. The greater the sample size, the smaller will be the error.
Example:
Suppose a sample of 100 subjects from a village is studied and the mean systolic
BP was found to be 110, is this the true mean of the entire village population? To answer
this question, we need to know what will happen if another sample of 100 is drawn from
the same village. The new mean may or may not be 110. Thus each time a fresh sample is
studied, different results may be obtained. This variation arises because of (a) variation
within the population and (b) sampling variation. Sampling variability arises because of chance;
by chance one may pick a lot of hypertensive individuals into the sample and this may
increase the mean systolic BP. The reverse could also happen; by chance one may pick
many people with low systolic BP. The best way of decreasing sampling variability is to
increase the sample size. If large samples are studied, the error will tend to be minimal.
If different samples yield different results, then how will we ever know the true
population value? In other words, will we ever know the truth when we study only
samples? To answer these, we need to understand the concept of confidence intervals.

Confidence interval
We know that most of the biological continuous variables are normally distributed
and the central value is the mean. Interestingly, if repeated samples are drawn from a
population and the mean (x) computed for each sample, it will be found that the means
themselves are normally distributed (most samples will yield means which are close to
some central value while a few will yield extreme values). The mean of all the sample
means will be the true population mean (X). The standard deviation of the sample is
denoted by (s). The standard deviation of the distribution of sample means is called the
standard error of the mean (SE). Mathematically, it is estimated by dividing the standard
deviation of the sample by the square root of the sample size (n).
SE = s / √n [formula when mean is estimated]
If one is estimating a proportion (p) instead of a mean, the SE is computed by dividing pq
(where q = 1 − p) by the sample size (n) and taking the square root of the result:
SE = √(pq / n) [formula when a proportion (percentage) is estimated]
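The two formulas translate directly into code; a minimal Python sketch, using the same illustrative numbers as the examples in this chapter:

```python
import math

def se_mean(s, n):
    """Standard error of a sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p*q/n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)

# Mean example: s = 10, n = 100        ->  SE = 1.0
# Proportion example: p = 0.2, n = 100 ->  SE = 0.04
```

Both functions show the same dependence on n: quadrupling the sample size halves the standard error.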
In fact, it is an important rule in statistics that even if the underlying population is
not normally distributed, the means of the samples themselves will be approximately
normally distributed if the sample sizes are large enough. This phenomenon is explained
by a statistical theorem called the Central Limit Theorem. It is this rule that permits us to
estimate the population parameter using sample results. It is the fundamental basis for
understanding confidence intervals.



If we knew the standard error, then we could assume that an interval of 1.96
standard errors around the mean, usually rounded off to 2, would encompass 95% of all
the sample means. When we work with samples, however, we only generate a sample
mean and a sample standard deviation. The only way to estimate the population mean is
to construct a confidence interval around the estimated sample mean, i.e., a range of
values around the estimate. This range, we hope, will include the real population mean
95% of the time. In other words, we will be 95% confident that the range around the
estimated (sample) mean includes the real population mean. Thus, the SE is computed
using the sample standard deviation (s) and sample size (n), and an interval of 2 SE
constructed around the sample mean will include the population mean 95% of the time.
To understand the issue of confidence interval, suppose that the sample mean (x)
systolic BP is 110 and sample standard deviation (s) is 10 and sample size (n) is 100,
then the standard error of the sample mean (SE) is 10 / √100 = 1, and two SE = 2.
Thus 110 ± 2 (i.e., 108 to 112) gives the lower and upper limits of the 95% confidence
interval (CI). The inference will be that the real population mean will be somewhere between 108
and 112 in 95 out of 100 samples. If 2.58 SE is used instead, a 99% CI can be
computed.
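The worked example above can be reproduced in a few lines of Python (using the rounded multiplier of 2, as in the text):

```python
import math

def confidence_interval(mean, s, n, z=1.96):
    """CI for a mean: mean +/- z * SE, where SE = s / sqrt(n).
    Use z = 1.96 for a 95% CI, z = 2.58 for a 99% CI."""
    se = s / math.sqrt(n)
    return mean - z * se, mean + z * se

# The systolic BP example from the text: mean 110, s = 10, n = 100.
low, high = confidence_interval(110, 10, 100, z=2)  # z rounded to 2
# -> (108.0, 112.0), matching the 95% CI worked out above
```

Substituting z = 2.58 in the same call widens the interval, illustrating why a 99% CI is always wider than a 95% CI for the same data.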
The importance of 95% CI is that it allows the researcher to guess at the
population value using only the sample value. In research papers, the 95% CI should
always be given along with the sample estimate. It is easy to understand that if the sample
size were to be as large as the population, then SE will be zero and the sample mean will
be the population mean. Thus, a larger sample size is associated with a narrower
confidence interval; hence the precision of the study estimate will be high, approaching
the truth, that is, the true population value.
Supposing the mean cholesterol value of a sample population was estimated to be
220 and reported as such, we would not know how close to the truth this estimate is. On
the other hand, if this estimate were reported as 220 (95% CI: 200-240), then we
know that the true value is somewhere between 200 and 240. If the 95% CI of this
estimate were 160-300, the usefulness of the estimate becomes doubtful. Such a wide 95%
CI may either mean a very small sample size or a very heterogeneous population.



Hypothesis testing
Hypothesis testing (tests of significance) involves ascertaining whether an
observed difference could have occurred purely due to chance. This probability is
quantified as a P-value. There are many tests of significance; the most popular ones are
the t-test, Z-test, Chi-square (χ²) test and Fisher's exact test. It must be remembered that
all tests cannot be used in all situations. The test must be carefully chosen after
considering issues like what type of data is being analyzed (continuous or discrete) and
what the sample size is. Tests that are used for continuous data are called “parametric
tests” and tests used for discrete data are called “non-parametric tests.” Parametric
implies that there is an underlying normal distribution assumption. Non-parametric tests
do not assume an underlying normal distribution. Figure 6.1 illustrates the probability
tests used by type of data.
Figure 6.1. Tests of significance according to the data.

Supposing the mean birth weight of babies in Cairo is known to be 3.6 kg and the
standard deviation 0.5. A sample of 100 babies from Nasr city (a district in Cairo) is
found to have a mean birth weight of 3.2 kg and a standard deviation of 0.4. Is the observed
difference (3.6 – 3.2) real, or is it due to sampling variability? In other words, is Nasr city
truly different from the rest of Cairo? To answer this question, we need to understand
hypothesis testing.
Hypothesis testing is a method by which we determine how likely it is that
observed differences in data are entirely due to sampling variability (chance) rather than
to underlying population differences. In hypothesis testing, we first start with the
assumption that the observed difference is not a real difference but produced merely due
to play of chance. This is called the null hypothesis. Then we try to disprove the null
hypothesis by calculating the probability of the observed difference being due to chance.
This probability is given by the P value. If the P value is lower than a predetermined
cutoff (0.05 by convention), then we infer that the observed difference is real and
cannot be explained purely by chance. The null hypothesis is thus rejected.
To understand the usefulness of P value to accept (i.e., the observed difference is
due to chance, sampling variability) or reject (difference is not due to chance) the null
hypothesis, we again fall back on the knowledge of normal distribution. We know that
95% of all the values in the data will fall within 2 standard deviations of the mean. The
probability of a value falling beyond 2 standard deviations on either side of the mean is
5% (0.05). To compute P value, the observed difference between 2 means is divided by
the standard error. The result (expressed as number of standard errors) is called the (Z)
value. If the Z value computed is greater than 1.96, then the P value is < 0.05. The
inference will be to reject the null hypothesis and accept that the observed difference is
real.
Comparing a sample mean with the population mean:

Z = [population mean (mean 1) – sample mean (mean 2)] / SE

Going back to the previous example of birth weights, the Z value is computed as follows:
Cairo mean (population) = 3.6
Nasr city mean (sample) = 3.2
Standard deviation (sample) = 0.4
Then;
Z = (3.6 – 3.2) / (0.4 / √100) = 0.4 / 0.04
The Z value thus obtained is 10. The probability of getting a Z of 10 is < 0.05 as the Z
value exceeds 1.96. Thus we infer that the studied Nasr city mean birth weight is truly
different from the rest of Cairo, the difference being statistically significant (P < 0.05).
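The steps above can be sketched in a few lines of code (a minimal illustration; the function name and structure are ours, not from any statistical library):

```python
import math

def z_statistic(pop_mean, sample_mean, sample_sd, n):
    """Z value: difference between the means divided by the standard error."""
    se = sample_sd / math.sqrt(n)        # SE of the sample mean
    return (pop_mean - sample_mean) / se

# Birth-weight example from the text: Cairo vs Nasr city.
z = z_statistic(3.6, 3.2, 0.4, 100)      # about 10, which exceeds 1.96
significant = abs(z) > 1.96              # True -> P < 0.05
```

Because the computed Z far exceeds 1.96, the sketch reproduces the conclusion reached above.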

Comparing two groups:
Suppose that we compare the treatment effect of 2 drugs on two samples. Group 1
gets drug A while Group 2 gets drug B. After therapy, we may wish to compare the mean
systolic BP of the two groups. Are these two means (x1 and x2) really different because
of the drug treatments or are they different because of chance variation? Here again, we
need to do a statistical test of significance. The observed difference between the two
means is divided by the SE of difference between two means. The resultant Z value is
used to obtain the P value. The exact method of computing the SE when two groups are
compared with respect to either means or proportions can be looked up in any standard
biostatistics text.
Hypotheses can also be tested by using the confidence interval. In the above
treatment example, this could be done by constructing a 95% CI around the difference in
the mean BP between the drug A group and the drug B group. If the confidence interval
includes zero, then the observed difference could be due to chance (the null hypothesis is
accepted). If, on the other hand, the 95% CI does not include zero, then the difference is
real (the null hypothesis is rejected). Both the Z test and the 95% CI are
based on the concepts underlying the normal distribution.
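Both approaches can be sketched together (the blood-pressure figures below are invented purely for illustration; they are not from the text):

```python
import math

def two_group_z_and_ci(m1, sd1, n1, m2, sd2, n2):
    """Z value and 95% CI for the difference between two sample means."""
    diff = m1 - m2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)   # SE of the difference
    z = diff / se
    ci = (diff - 1.96 * se, diff + 1.96 * se)    # 95% confidence interval
    return z, ci

# Hypothetical mean systolic BP after drug A vs drug B:
z, (lo, hi) = two_group_z_and_ci(130, 12, 100, 136, 14, 100)
# The CI excludes zero exactly when |z| > 1.96 (P < 0.05).
excludes_zero = lo > 0 or hi < 0
```

With these assumed numbers the interval lies entirely below zero, so the two conclusions (Z test and CI) agree, as the text states they must.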
Current epidemiological and statistical opinion favors the use of confidence
intervals over P-values, and many journals now strongly discourage the use of
P-values alone. The reasons for this shift include:
1. The P value tells us only whether there is a statistically significant difference or
not. A 95% CI, on the other hand, presents a range of values that tells us about the
size of the difference in outcomes between two groups, and allows us to draw our
own conclusions about the usefulness of the study result.
2. Confidence intervals retain information on the scale of measurement itself; P
values are more abstracted. For instance, if a clinical trial is being done to assess
the effectiveness of one drug over another, the difference between the two groups
can be presented as a blood pressure value (in mm Hg). The CI for this value will
also be presented as blood pressure in mm Hg.
3. Confidence intervals can also be used for testing hypotheses. For example, if cure
rates between two treatment groups of a clinical trial are compared, and if the

difference is 20%, this tells us that one drug has a 20% higher cure rate than the
other drug. If the 95% CI is computed for this observed 20% difference, and if it
is 5% to 35%, then this information can be actually used as a test of significance.
Since the CI does not include 0%, this observed difference of
20% can be reported as “statistically significant” and the P-value will be < 0.05.
4. P values do not take into account the size of the observed difference. For instance,
a small difference in a large study can have the same P value as a large difference
in a small study. In other words, if the study sample size is very large, even very
small differences will become statistically significant because of the size of the
sample (even if such a small difference does not have any clinical significance or
relevance).
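For the cure-rate illustration in point 3, a 95% CI for a difference in proportions can be sketched as follows (the cure rates and sample sizes are assumed values, chosen only to roughly reproduce the kind of interval discussed above):

```python
import math

def diff_proportion_ci(p1, n1, p2, n2, z=1.96):
    """95% CI for the difference between two proportions."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z * se, diff + z * se

# Assumed: 60% vs 40% cure rates in two groups of 100 patients each.
lo, hi = diff_proportion_ci(0.60, 100, 0.40, 100)
# The lower limit is above zero, so the 20% difference is significant.
```

Note how the interval is expressed in the same units as the outcome (a difference in cure rates), which is precisely the advantage argued in point 2.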

Validity
Validity, the other key component of study accuracy, is defined as the extent to
which a study is actually estimating what it is intended to estimate. It is essential to
differentiate this type of internal validity from that of external validity which refers to the
issue of generalization (extrapolation) of the study results.
The internal validity of any study is determined by the extent of systematic error
that is avoided or minimized. Systematic error (bias) can thus be distinguished from
random error. Random error can be reduced (and study precision can be improved) by
increasing the size of a study, whereas systematic error (bias) cannot be reduced by
increasing the study size, but only by changing the study design. Bias can occur in any
study design, although, as we will discuss in subsequent chapters, some designs are more
susceptible to certain types of bias. There are many different types of bias, but three
general forms are commonly distinguished: selection bias, information bias, and
confounding bias.

Confounding bias
Confounding can be thought of as a mixing of the effects of the exposure being
studied with effects of other factor(s) on risk of the studied outcome. These factors are
known as confounders. A confounder, if not adequately controlled in design or analysis,
may bias the exposure-disease association, making it either closer or farther from the null
than the true effect. Confounding may even reverse the apparent direction of an effect in
extreme situations. Confounding occurs when the exposed and non-exposed
subpopulations of the source population have different background disease risks; such
differences are caused by confounders. Similar problems may also occur in randomized
trials because randomization may fail, leaving the treatment groups with different
characteristics at the time that they enter the study, and because of differential loss and
non-compliance to the prescribed drug across treatment groups.
The following three preconditions are necessary for a factor to be a confounder.
1. A confounder is a factor that is predictive of disease in the absence of the
exposure under study.

2. A confounder must be associated with exposure in the source population at the
start of follow-up (i.e., at baseline).
3. A confounder should not be an intermediate in the causal pathway between
exposure and disease. For example, in a study of colon cancer among clerical
workers, it would be inappropriate to control for low physical activity if it was
considered that reduced physical activity was a consequence of being a clerical
worker, and hence a part of the causal chain leading from clerical work to colon
cancer. On the other hand, if low physical activity itself was of primary interest,
then this should be studied directly, and clerical work would be regarded as a
potential confounder if it also involved exposure to other risk factors for colon
cancer.
Control of Confounding
Two possible errors arise from confounding: making no
attempt to control for a confounder, and controlling for a non-confounder.
The former error is potentially more serious since it results in a biased effect estimate,
whereas controlling for a non-confounder that is not affected by exposure will not usually
bias the effect estimate but may reduce its precision. Confounding can be controlled in
the study design, in the analysis, or both. Control at the design stage is accomplished with
three main methods:
i. Randomized allocation of participants to exposure categories.
ii. Restriction of the study to narrow ranges of values of the potential confounders,
e.g., restricting the study to white males aged 35-54.
iii. Matching study subjects on potential confounders. For example, in a cohort
study one would match a white male non-exposed subject aged 35-39 with an
exposed white male aged 35-39. This will prevent age-sex-race confounding
in a cohort study, but is seldom done because it is often expensive and
time-consuming. In case-control studies, matching does not prevent confounding,
but does facilitate its control in the analysis.
Confounding can also be controlled in the analysis, although it may be desirable to
match on potential confounders in the design to optimize the efficiency of the analysis.
The analysis ideally should control simultaneously for all measurable confounding

factors. Control of confounding in the analysis involves stratifying the data according to
the levels of the confounder(s) and calculating an effect estimate that summarizes the
information across strata of the confounder(s). It is usually not possible to control
simultaneously for more than 2 or 3 confounders in a stratified analysis. This problem
can be mitigated to some extent by the use of multiple regression analysis modeling,
which allows for simultaneous control of more confounders by smoothing the data across
confounder strata. More details about how to assess and adjust the confounding factors in
any given epidemiologic studies are discussed in Chapter 8.
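The idea of summarizing an effect estimate across confounder strata can be illustrated with the Mantel-Haenszel pooled odds ratio (a minimal sketch; the 2x2 counts are hypothetical, not from the text):

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across 2x2 tables.
    Each stratum is (a, b, c, d): exposed cases, exposed controls,
    unexposed cases, unexposed controls."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Two hypothetical confounder strata, each with a stratum OR of 2.0:
pooled = mantel_haenszel_or([(80, 100, 20, 50), (10, 50, 40, 400)])
```

The pooled value (2.0 here) summarizes the stratum-specific estimates into a single confounder-adjusted figure, which is what a stratified analysis aims for.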

Selection Bias
Whereas confounding generally involves biases that are inherent in the source
population, selection bias involves biases arising from the procedures by which the study
participants are chosen from the source population. Thus, selection bias is not an issue in
a cohort study involving complete recruitment and follow-up because in this instance the
study cohort comprises the entire source population. However, selection bias can occur if
participation in the study or follow-up is incomplete.
Examples of selection bias in epidemiologic studies:
i. Non-response bias: This type of bias occurs when eligible study subjects
refuse to participate or discontinue participation once the study is in progress.
Withdrawal and loss to follow-up are common examples of this form of bias.
It is pertinent here to say that if the non-respondents are known to have the
same sociodemographic characteristics as the respondents, then selection bias
tends to have minimal effects on the study results. Conventionally, in
case-control studies a response rate of more than 75% is considered adequate,
although the possibility of bias affecting the findings remains. In a
cohort study, however, loss to follow-up of even a few studied subjects
still raises the suspicion of selection bias, because the incidence of the
disease among those lost is not known.
ii. Prevalence-incidence bias: This form of bias occurs when prevalent cases are
used to study disease etiology. The prevalent cases are more likely to be long
term survivors and, therefore, may represent a relatively mild form of the

disease in question. In studying disease etiology, it is almost always advisable
to use the incident cases.
iii. Hospital-based cases bias: This type of bias occurs particularly in
hospital-based epidemiologic studies. Hospital-based patients tend to have a severe
form of the disease in question and also tend to have unhealthy behaviors such
as smoking. Therefore, hospital cases are often atypical of the target
population they are supposed to represent.
iv. Publicity bias: Occurs in such conditions when the media direct attention of
people to a real or perceived health problem. This may stimulate more case
reporting, and so may result in an artifactual increase in disease occurrence. To
avoid this bias, data should be restricted to the period prior to the publicity
awareness.
v. Healthy Worker Effect: The most common example of selection bias in
occupational epidemiologic studies is the healthy worker effect. This
phenomenon is characterized typically by lower relative mortality, from all
causes combined, in an occupational cohort, and occurs because relatively
healthy individuals are likely to gain employment and to remain employed.
Thus, an occupational population is usually inherently non-comparable to the
general population, and bias will occur if this is not taken into account when
comparing the mortality (or disease incidence) of a particular workforce with
that of the general population. Including the person-time experience of every
person who had ever worked in a particular factory or industry (i.e., using the
incidence rather than prevalence measures) will minimize bias due to healthy
persons remaining in employment, but will not remove the bias due to initial
selection of healthy persons into employment.

Interrelations of confounding and selection bias


Selection bias and confounding are not always clearly demarcated. In particular,
selection bias can be viewed as a type of confounding, since both can often be reduced by
controlling for surrogates for the determinants of the bias (e.g., social class).
Unfortunately, selection that is affected by exposure or disease generates a bias that

cannot be reduced in this fashion. Some consider any bias that can be controlled in the
analysis as confounding. Other biases can then be categorized according to whether they
arise from the selection of study subjects (selection bias), or their classification
(information bias). Some aspects of the Healthy Worker Effect are perhaps better
thought of as problems of selection bias rather than confounding.

Information bias
Information bias is the result of misclassification of study participants with respect
to disease or exposure status. Thus, the concept of information bias refers to those people
actually included in the study. In contrast, selection bias refers to the selection of the
study participants from the source population, and confounding generally refers to non-
comparability of subgroups within the source population. Consequently, information bias
is also called misclassification bias, especially when the data are dichotomous. Sources of
information bias include measurement device defects, questionnaires and interviews that
do not measure what they intended to measure, inaccurate diagnostic procedures, and
incomplete or erroneous data sources. It is customary to consider two types of
misclassification: non-differential and differential.

Non-differential misclassification
Non-differential misclassification occurs when the probability of exposure
misclassification is the same for both groups being compared. This arises if exposed and
non-exposed persons are equally likely to be misclassified according to disease outcome,
or if diseased and non-diseased persons are equally likely to be misclassified according to
exposure. Non-differential misclassification of exposure usually, although not always,
biases the relative risk estimate towards the null value of 1.0. As a result,
non-differential information bias may produce false-negative findings, making a true
association between exposure and disease appear negligible.

Differential misclassification
Differential misclassification occurs when the probability of misclassification of
exposure is different in diseased and non-diseased persons, or the probability of

misclassification of disease is different in exposed and non-exposed persons. These
configurations can bias the observed effect estimate either toward or away from the null
value. For example, in a community-based case-control study of a severe disease, with a
control group selected from among community residents free of that disease, the recall of
occupational history and related exposures of controls might be different from that of the
cases. Cases (or proxy respondents) might have particular motivations to report specific
exposures, particularly if they had prior knowledge of presumed causal associations (e.g.,
asbestos as a well-known cause of lung cancer). In this situation, differential information
bias would occur, and it could bias the relative risk estimate (odds ratio) towards or away
from the null, depending on whether the controls were more or less likely to recall such
exposure than the cases. The
following are common forms of information bias suspected in the epidemiologic studies:
1. Recall bias: This type of bias is especially suspected in case-control studies, as
cases can often recall their history of exposure more thoroughly than controls. As
this form of bias usually belongs to differential misclassification, it tends to bias
the estimated odds ratio away from the null.
2. Diagnostic suspicion bias: This type of bias may occur when one study group
undergoes greater diagnostic scrutiny than the other group.
3. The clever Hans effect (obsequiousness bias): This bias is liable to occur when
the studied subjects systematically alter their responses in the direction they
perceive to be desired by the investigator. Similarly, the interviewer
may send nonverbal cues to study subjects, thereby influencing the subjects’
responses.

Assessing and minimizing information bias


The true extent of bias from exposure or disease misclassification can never be
known in any one study. In the absence of information to the contrary, we might be
tempted to assume that misclassifications of exposure and health outcome are both non-
differential and independent of each other, although there is usually no empirical
evidence to assess this assumption. Thus, every effort should be made during the
planning and implementation of a study to determine whether these assumptions are

supportable. Obvious examples are to ensure that the exposure assessment is performed
without the assessors having knowledge (blinded) of health status, conducting health
examinations blinded as to exposure status, and keeping study interviewers unaware of
the research hypotheses. Information bias can also be minimized by testing the reliability
of the study instruments, such as questionnaires, sphygmomanometers, and laboratory
tests used to collect study data, before implementation of the study. The reliability of
these instruments can be tested by the use of sensitivity and specificity tests.


CHAPTER 7
Standardization and Adjustment
In epidemiology, most rates, such as incidence, prevalence, and mortality rate, are
strongly age-dependent and greatly influenced by differences in age structure. The
comparison of such crude rates over time and between different populations may be very
misleading if the underlying age composition differs in the populations being compared.
To compensate for this difference in age, we have two general options. First, we can
restrict comparison to similar age subgroups (i.e., age-specific comparison). However,
this can be confusing, especially when many age strata exist and comparisons are made
between many different populations. Therefore, combining age-specific rates to derive a
single age-independent index (a single adjusted rate) may be more appropriate and
compensate for age differences in populations. The age adjustment (standardization) can
be achieved in several ways. In practice, the two most common approaches in use are the
direct and indirect weighting of strata-specific rates (direct and indirect standardization).
This chapter presents a brief description of the two methods of standardizations
(direct and indirect) when comparing epidemiologic or demographic events in different
populations, with simplified hypothetical examples. It also discusses the adjustment of
measures of association explaining the effect of confounding and interaction on these
measures, with simple and real clarifying examples.

Description of the Standardization methods:


1. Direct method:
In this method, we must:
i. Know the age-specific rates in the two populations under comparison; if these are
not known, we must use the indirect method.
ii. Borrow a reference population. This reference population may be a hypothetical
population, one of the two populations under comparison, or the total of the
compared populations.

Example: Suppose we have two populations A & B, the crude death rate for each of them
is more or less similar; 18/1000 for population A, and 17/1000 for populations B
(Table 7.1).
Table 7.1. Vital data of the two hypothetical populations A & B.

Age group     Population A                          Population B
categories    No.      Deaths   Age-specific rate   No.      Deaths   Age-specific rate
0-            400      2        5/1000              600      6        10/1000
10-           600      3        5/1000              1000     10       10/1000
60-           1000     31       31/1000             400      18       45/1000
Total         2000     36       18/1000             2000     34       17/1000

From this table, we have crude death rates of 18/1000 and 17/1000 for
populations A & B respectively. However, on looking carefully at this table, we observe
that the distribution (number) of the population in each age group is not the same
in the two compared populations, and this might confound the calculated crude
death rates. To overcome this problem, we must adjust (standardize) the death rate by
age, which represents the confounding factor in this example. Because the age-specific
death rates are known in this example for the two populations, we will use the direct
method of standardization. First, we take population A as the reference population and
then apply the number of people in each of its age groups to the age-specific death rate
of the corresponding age group in population B.

The adjusted death rate in population B is calculated using the following formula:

Adjusted rate (B) = Σ (age-specific death rate in B × No. of population in that age
category in A) / Total number of population in A, where Σ = summation.

= (10/1000 × 400 + 10/1000 × 600 + 45/1000 × 1000) / 2000 = 55/2000 = 27.5/1000

From this calculation, we conclude that the age-adjusted death rate in population B
(27.5/1000) is higher than its calculated crude rate (17/1000) and is also higher
than that of population A (18/1000).
Note that we can also use population B as the reference population and
apply the number of people in each of its age groups to the age-specific
death rate in population A (try to do it).
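The direct-method arithmetic above can be sketched in code (figures taken from Table 7.1, with population A as the reference; the function name is ours):

```python
def direct_adjusted_rate(ref_counts, study_rates):
    """Apply the study population's age-specific rates to the
    reference population's age structure."""
    expected = sum(n * r for n, r in zip(ref_counts, study_rates))
    return expected / sum(ref_counts)

ref_pop_A = [400, 600, 1000]               # age structure of A (Table 7.1)
rates_B   = [10/1000, 10/1000, 45/1000]    # age-specific death rates in B

adjusted_B = direct_adjusted_rate(ref_pop_A, rates_B)   # about 27.5 per 1000
```

Swapping the arguments (B's age structure with A's rates) performs the exercise suggested in the note above.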

2. Indirect method
The indirect method of standardization is used when we have no data about the
age-specific rates in one or both of the populations being compared, but the number of
people in each age group is known. In this situation, we borrow the age-specific death
rates from a reference (standard) population and apply them to the number of people in
each corresponding age group of the compared populations to obtain the expected
number of events. Finally, we divide the observed number of events by this expected
number and multiply by 100 to obtain the standardized ratio, known as the standardized
mortality ratio (SMR) when we deal with deaths. The adjusted rate using this indirect
method is obtained by multiplying the crude rate in the study population by the SMR.
The formulas summarising this method are:
aR (indirect) = cR × SMR
SMR = O / E
E = Σ Ri Ni
where:
aR : adjusted rate
cR : crude rate
SMR : standardized mortality ratio
O : observed number of events in the study population
E : expected number of events in the study population
Σ : summation
Ri : the rate in the ith stratum of the standard population
Ni : the number of people in the ith stratum of the study population
Example: We use the previous table (Table 7.1), without the number of deaths in each
age group and, accordingly, without any data about the age-specific rates.

Table 7.2. Vital data of the hypothetical populations A & B, without age-specific rates.

Age group     Population A                          Population B
categories    No.      Deaths   Age-specific rate   No.      Deaths   Age-specific rate
0-            400      ?        ?                   600      ?        ?
10-           600      ?        ?                   1000     ?        ?
60-           1000     ?        ?                   400      ?        ?
Total         2000     36       18/1000             2000     34       17/1000

In this example, we must borrow a third standard (reference) population with a known
age specific death rate to calculate the SMR (Table 7.3).
Table 7.3. Age-specific death rates of a hypothetical reference population.

Age group     No. of          No. of     Age-specific
categories    population      deaths     death rate
0-            5000            15         3/1000
10-           20000           360        18/1000
60-           10000           500        50/1000

Using the data from this hypothetical table, we can calculate the SMR in
populations A & B as follows:
The expected number of deaths in population A = 3/1000 × 400 + 18/1000 × 600 +
50/1000 × 1000 = 1.2 + 10.8 + 50 = 62.
The SMR in population A = 36 / 62 × 100 = 58%.
The expected number of deaths in population B = 3/1000 × 600 + 18/1000 × 1000 +
50/1000 × 400 = 1.8 + 18 + 20 = 39.8.
The SMR in population B = 34 / 39.8 × 100 = 85%.
Simply, since the SMR in population B (85%) is higher than that in population A
(58%), we can conclude that the risk of death is higher in population B. Similarly, when
we multiply the crude rate of each population by its measured SMR, the adjusted rate of
population B (about 15/1000) is higher than that of population A (about 10/1000).

Note that we can also use the age-specific death rates of one of the two
compared populations (if known) and apply them to the other to calculate the expected
number of deaths and, accordingly, the SMR. The SMR can then be read as follows:
- If the SMR is more than 100, this means that more events (deaths) are occurring in
the population than expected.
- If the SMR is less than 100, this means that fewer events (deaths) are occurring in the
population than expected.
When we apply the age-specific death rates of population A to population B,
we find the SMR for population B to be about 170%. This means that more
deaths are occurring in population B than would be expected, or that the death rate in
population B shows a 1.7-fold increase over population A. The adjusted rate in
population B, using the above-mentioned formula, is estimated to be about 29/1000.
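The SMR calculations above can be sketched as follows (reference rates from Table 7.3, age structures and observed deaths from Table 7.1; the function name is ours):

```python
def smr(observed, age_counts, ref_rates):
    """Standardized mortality ratio (%): observed / expected x 100."""
    expected = sum(n * r for n, r in zip(age_counts, ref_rates))
    return 100 * observed / expected

ref_rates = [3/1000, 18/1000, 50/1000]     # reference age-specific rates

smr_A = smr(36, [400, 600, 1000], ref_rates)   # about 58%
smr_B = smr(34, [600, 1000, 400], ref_rates)   # about 85%
```

Only the reference rates and each study population's age structure are needed, which is the practical appeal of the indirect method.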
Although these methods of standardization have been in common use since the
middle of the 19th century and help to give summary statements for unbiased
comparisons, they have a number of reported disadvantages, including the choice of
the standard (reference) population. Direct standardization may suffer from instability
of its estimate, particularly when the component rates are based on small numbers of
deaths. The indirect (SMR) method, however, produces greater numerical stability in
such conditions. Furthermore, the calculation of these standardized measures requires
the assumption of constant rate ratios, which is not always satisfied, particularly in the
presence of missing data.

Adjustment of measures of associations:
The measures of disease association in epidemiology include the prevalence ratio,
which is the ratio comparison of two prevalences, cumulative incidence ratio, incidence
rate ratio (relative risk), and the disease odds ratio. The disease odds ratio provides an
alternative to the prevalence ratio and cumulative incidence ratio as a ratio measure of
association when the data represent proportions. In epidemiologic studies, much is done during study design
and data collection in order to avoid or at least reduce the selection and information bias.
It is also important to eliminate or reduce the confounding bias while doing the statistical
analysis of the collected data. Confounding is a distortion of an association between an
exposure and a disease by one or more extraneous factors. If recognized, measures of
association should be adjusted for the biasing effect of the confounders. Generally, to
recognize the presence of confounding effect of one or more extraneous factors, the
studied data should be first stratified by the extraneous factor(s) to derive strata-specific
rates. After stratification, adjustments are done with respect to the extraneous factor of
interest, so that the estimated measure of association can be obtained without
confounding by this factor.
The crude measures of association between exposure (E) and disease (D) may
be confounded by another extraneous factor(s) called confounding factor(s) (F). As
mentioned before, confounding means a distortion of an association between E and D
brought about by an extraneous factor F (or extraneous factors F1, F2, F3, etc).
Together with selection and information bias, the confounding bias forms the three
main pillars of systematic error (bias) that may damage the results of any
epidemiologic research (see Chapter 8).
To clarify the confounding (confusion) bias, the following real example is of
value to describe how confounding and interaction affect the estimated measures of
association. Suppose we found a positive association between lung cancer (D) and
alcohol consumption (E) in some of cancer epidemiologic studies. This association
might be confounded by cigarette smoking. Cigarette smoking (F) and E are associated
(alcohol consumers are more likely to smoke than non-consumers are), and F and D
are associated (smoking is an independent risk factor for lung cancer). In addition,
smoking is not an intermediate in the causal chain between alcohol consumption and
lung cancer. Therefore, if we have no

Basic Concepts Of Modern Epidemiology Page 68


data about the smoking status of the studied subjects, the observed association may not
represent the true association between lung cancer and alcohol consumption because of
confounding bias. To judge whether factor F confounds the estimated measure of
association, we should stratify the data under study. By stratification, we mean
dividing the studied subjects into groups by the confounding factor(s); in this example,
we stratify subjects by smoking status (i.e., smokers and non-smokers). Accordingly, three
scenarios (A, B and C) may occur when we use such stratification:
A. The estimated measure of association is the same across the strata of F,
and the same as the crude measure. In this condition, neither
confounding nor interaction is suspected, and we can directly present the
estimated crude measure of association (Table 7.4).
B. The estimated measure of association differs across the strata of F and
from the crude measure. In this condition, an interaction
between F and E is suspected (effect modification), and an interaction test
(a chi-square test for interaction, also called a test of heterogeneity)
should be used to confirm such an effect (Table 7.5).
C. The estimated measure of association is the same across the strata of F,
but not the same as the crude measure. In this condition, F is a
confounding factor and must be controlled (adjusted) for during
statistical analysis to obtain the adjusted measure of association (Table 7.6).
It is pertinent here to note that, before we use the methods of confounder
adjustment, interaction between the two studied factors (E and F) must be ruled out.
Also, as a rule, if the adjusted risk alters the interpretation of the crude measure of
association, the adjustment is mandatory. If, on the other hand, the adjustment does
not alter the interpretation, the crude measure of association can be used.
In scenario C, the crude (unadjusted) measure of association, the relative
risk (RR) of lung cancer associated with alcohol consumption, is 4.0. However, when
we stratify the studied subjects by smoking status (i.e., F+ and F-), we find the risk
to be the same among smokers (RR = 1.0) and non-smokers (RR = 1.0) but different
from the crude risk. This means that smoking status is a confounding factor for the
association between lung cancer and alcohol consumption, and the estimated crude risk
(4.0) does not represent the true association. Therefore, we must control (adjust) for this
confounding effect and present the resulting adjusted risk as the true measure of
association. We can obtain the adjusted risk by using either regression analysis
models (multivariate regression analysis) or a weighted average of the stratum-specific
estimates based on the intra-stratum variance (the Mantel-Haenszel method).
A full explanation of these adjustment methods is beyond the scope of these notes.
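As a minimal sketch, however, the Mantel-Haenszel weighted average can be computed by hand. The short Python script below uses the Scenario C cell counts from Table 7.6; the adjusted estimate falls to about 1.0, in contrast to the crude RR of 4.0:

```python
# Mantel-Haenszel adjusted relative risk for Scenario C (Table 7.6).
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total)
strata = [
    (388, 1600, 48, 200),   # smokers (F+)
    (12, 400, 52, 1800),    # non-smokers (F-)
]

def mantel_haenszel_rr(strata):
    """Weighted average of stratum-specific RRs (cumulative-incidence data)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

rr_mh = mantel_haenszel_rr(strata)
print(f"Crude RR = 4.0; adjusted (Mantel-Haenszel) RR = {rr_mh:.2f}")  # 1.02
```

The adjusted RR of about 1.0 confirms that, once smoking is taken into account, alcohol consumption shows no association with lung cancer in these illustrative data.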
In scenario B, the crude RR of lung cancer associated with alcohol consumption is
4.0. By stratification, however, we find the risk among smokers to be 1.9, which is
different from that among non-smokers (RR = 9.4) as well as from the crude risk. The
observed heterogeneity of the estimated measures of association (crude and stratum-
specific) suggests that smoking is not a confounding factor in this scenario, but that it
modifies the effect of the exposure (alcohol consumption) on the disease (lung cancer).
This biological phenomenon is called effect modification and is related to a statistical
phenomenon called interaction. Interaction refers to a difference in the effect of one
factor according to the level of another factor, and it may have direct biological and
public health relevance. More details about interaction, effect modification and the
estimation of joint effects are discussed in Chapter 8.
Table 7.4. Scenario A.
                      Total subjects     Smokers (F+)      Non-smokers (F-)
                      E+      E-         E+      E-        E+      E-
D+                    400     100        320     80        80      20
D-                    1600    1900       480     720       1120    1180
Total                 2000    2000       800     800       1200    1200
Relative risk (RR*)   Crude RR = 4.0     RR (F+) = 4.0     RR (F-) = 4.0
RR* is the incidence rate among the exposed (E+) divided by the incidence rate among the non-exposed (E-).

Table 7.5. Scenario B.
                      Total subjects     Smokers (F+)      Non-smokers (F-)
                      E+      E-         E+      E-        E+      E-
D+                    400     100        24      50        376     10
D-                    1600    1900       376     1550      1224    390
Total                 2000    2000       400     1600      1600    400
Relative risk (RR*)   Crude RR = 4.0     RR (F+) = 1.9     RR (F-) = 9.4
RR* is the incidence rate among the exposed (E+) divided by the incidence rate among the non-exposed (E-).



Table 7.6. Scenario C.
                      Total subjects     Smokers (F+)      Non-smokers (F-)
                      E+      E-         E+      E-        E+      E-
D+                    400     100        388     48        12      52
D-                    1600    1900       1212    152       388     1748
Total                 2000    2000       1600    200       400     1800
Relative risk (RR*)   Crude RR = 4.0     RR (F+) = 1.0     RR (F-) = 1.0
RR* is the incidence rate among the exposed (E+) divided by the incidence rate among the non-exposed (E-).
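The crude and stratum-specific relative risks in Tables 7.4-7.6 can be reproduced with a minimal Python sketch, with the cell counts taken directly from the tables:

```python
# Crude and stratum-specific relative risks for the three scenarios
# (Tables 7.4-7.6). Each group: (cases E+, total E+, cases E-, total E-).
scenarios = {
    "A (Table 7.4)": {"crude": (400, 2000, 100, 2000),
                      "smokers": (320, 800, 80, 800),
                      "non-smokers": (80, 1200, 20, 1200)},
    "B (Table 7.5)": {"crude": (400, 2000, 100, 2000),
                      "smokers": (24, 400, 50, 1600),
                      "non-smokers": (376, 1600, 10, 400)},
    "C (Table 7.6)": {"crude": (400, 2000, 100, 2000),
                      "smokers": (388, 1600, 48, 200),
                      "non-smokers": (12, 400, 52, 1800)},
}

def rr(cases_e, total_e, cases_u, total_u):
    """Incidence among the exposed divided by incidence among the unexposed."""
    return (cases_e / total_e) / (cases_u / total_u)

for name, groups in scenarios.items():
    estimates = {g: round(rr(*cells), 1) for g, cells in groups.items()}
    print("Scenario", name, estimates)
```

Scenario A prints 4.0 for every group (neither confounding nor interaction), Scenario B prints stratum risks of 1.9 and 9.4 against a crude 4.0 (interaction), and Scenario C prints 1.0 in both strata against a crude 4.0 (confounding).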

BIBLIOGRAPHY
1. Ahlbom A, Norell S. Introduction to Modern Epidemiology. 2nd Edition. Epidemiology
Resources Inc., 1990; p. 30-35.
2. Gerstman BB. Epidemiology Kept Simple: An Introduction to Modern Epidemiology.
New York: John Wiley & Sons, 1998.
3. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective
studies of disease. Journal of the National Cancer Institute 1959; 22:719-748.
4. Miettinen OS. Confounding and effect modification. American Journal of
Epidemiology 1974; 100:350-353.
5. Rothman KJ, Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincott-Raven, 1998.



CHAPTER 8
Interaction and Effect Modification
The concept of effect modification refers to a condition where the exposure-
associated effect on disease risk varies by the level of some other factor.
Modification of an effect measure can be examined in relation to constitutional factors
(e.g., gender, ethnicity, genotype), lifestyle (e.g., smoking, alcohol use, diet), or many
other personal characteristics (e.g., medical history). In addition, in studies of
workplaces with multiple exposures, one agent might be considered a modifier of the
effect of another. In the latter instance, a research objective is often the assessment of
joint effects.
Discussions of effect modification and the estimation of joint effects are
often couched in terms of interaction, on either a population or an individual biological basis.
We will address these issues from a practical epidemiologic perspective, where the
underlying principle is to examine how a measure of the effect of a given exposure might
vary in relation to the presence/absence or level of another factor. The only distinction
between examining effect modification and estimating joint effects is one of emphasis.
In the case of effect modification, the focus is on whether a measure of the effect of
interest is uniform across levels of another factor. As such, estimating risks associated
with the effect modifier itself is not of primary interest. For example, we might
consider the risk of lung cancer associated with alcohol drinking in cigarette smokers and
non-smokers; distinguishing the unique influence of smoking on lung cancer would only
be a secondary goal. To assess effect modification, we examine the homogeneity of the
estimated measures across the levels of the modifier using a chi-square interaction
test. When estimating joint effects, however, we are typically interested in the individual
and combined effects of two or more exposures, whether or not there is an interaction
(effect modification) between the studied exposures.

Effect Modification
The concepts of effect modification and confounding are quite distinct. It should
therefore be recognized that an effect modifier may or may not be a confounder. For
example, the distribution of smoking across exposure levels may be identical, yet the rate
ratio for the exposure effect may vary by smoking status. In this situation, smoking
would not be a confounder, but it would be an effect modifier. The term statistical
interaction denotes a similar phenomenon for associations without causal connotation.
We will use the term effect modification in the subsequent discussion. However, both
effect modification and statistical interaction are merely statistical concepts that depend
on the measure used. In fact, all secondary risk factors modify either the rate ratio or the
rate difference, and uniformity over one measure implies non-uniformity over the other.
Thus, for example, an apparently additive joint effect implies a departure from a
multiplicative model.
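This measure dependence can be illustrated with two hypothetical strata whose baseline risks differ. The numbers below are invented purely for illustration:

```python
# Hypothetical illustration: a rate ratio that is uniform across strata
# forces the rate differences to be non-uniform when baseline risks differ.
baseline_risks = {"stratum 1": 0.01, "stratum 2": 0.10}  # risk in the unexposed
rate_ratio = 2.0                                         # same ratio in both strata

for name, r0 in baseline_risks.items():
    r1 = rate_ratio * r0         # risk in the exposed
    rd = round(r1 - r0, 2)       # rate difference for this stratum
    print(f"{name}: RR = {r1 / r0:.1f}, RD = {rd}")
```

Both strata share a rate ratio of 2.0, yet the rate differences (0.01 versus 0.1) differ tenfold: homogeneity on the ratio scale coexists with heterogeneity on the difference scale.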
Determining whether or not a factor is an effect modifier requires estimating an
effect measure (e.g., relative risk) for the exposure of interest separately for each level of
the presumed effect modifier. When there are multiple possible effect modifiers, such as
age, ethnicity, gender, or any of several lifestyle factors, effect modification is best examined for
each potential modifying variable separately. Prior selection of the potential effect modifiers
of greatest interest can simplify the task. Assessing effect modification for a subset
of modifying variables might then be carried out, with adjustment made for the other variables.
To illustrate, consider a study of lung cancer and occupational exposure to x-rays in
which there are data on age, sex and smoking. We would start an assessment of effect
modification by investigating associations of x-ray exposure with lung cancer,
stratified by age category, and then perform similar analyses by smoking status
and by sex. Let us assume that the exposure effect on disease varies most by smoking; we
would then repeat the analysis for smokers and non-smokers separately, while adjusting for
age and sex.
There are formal statistical tests of heterogeneity of effect across levels of another
factor from which one might infer whether there is evidence for effect modification.
However, concluding that there is no effect modification because the p-value is high can
be misleading. Most epidemiologic studies are not designed to examine effect
modification, and as such, may have inadequate study sizes within strata of an effect
modifier(s) to permit a meaningful statistical interpretation. Presentation of stratum-
specific effect estimates and their confidence intervals usually will help give a picture of
whether the data allow any inference about effect modification. However, formal
statistical tests may be warranted in situations where prior knowledge suggests likely
forms of effect modification (e.g., a harmful effect would only be anticipated among
smokers) and the study is intentionally designed to accommodate an analysis of effect
modification (e.g., sufficient numbers of smokers and non-smokers are selected).



Joint Effects
Two or more factors may combine to produce an effect that differs from what
would be anticipated by simply adding their individual effects. Before considering the
analytical strategy for investigating joint effects, it is worthwhile depicting some
exposure scenarios that might motivate estimation of joint effects. In the interest of
clarity, we will limit our discussion to the simplest case of two exposures, each either
present or absent.
The effects of two exposures can range, in a qualitative sense, from effect reversal
(profound antagonism) to very strong amplification (synergy). When expressed
quantitatively, joint effects are usually described in reference to either an additive or a
multiplicative model, where the observed joint effect may be smaller or greater than that
predicted by either model. The relative risks for two factors, each either present or absent (the
simplest case), under the additive and multiplicative models are, respectively:
RRadd = RR10 + RR01 – 1
RRmult = RR10 x RR01
where RRadd is the expected joint relative risk under the additive model; RRmult is the
expected joint relative risk under the multiplicative model; RR10 is the relative risk for
the first factor alone (first factor present, second absent); and RR01 is the relative risk
for the second factor alone. Note that we are expressing all joint
effects in terms of relative risk.
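The two reference models are easy to compute. The sketch below uses hypothetical independent relative risks of 2.0 and 3.0, chosen only for illustration:

```python
# Expected joint relative risks under the additive and multiplicative models,
# using hypothetical independent relative risks purely for illustration.
rr10 = 2.0   # RR for the first factor alone (factor 1 present, factor 2 absent)
rr01 = 3.0   # RR for the second factor alone

rr_add = rr10 + rr01 - 1    # RRadd  = RR10 + RR01 - 1
rr_mult = rr10 * rr01       # RRmult = RR10 x RR01

print("Additive expectation:", rr_add)         # 4.0
print("Multiplicative expectation:", rr_mult)  # 6.0
```

An observed joint relative risk (RR11) above 6.0 or below 4.0 would then suggest a departure from both reference models.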
The ability to assess effect modification and to estimate the joint effects of two or
more factors in epidemiologic studies helps to shed light on the underlying biological
mechanisms. On an epidemiologic basis, effect modification is suspected when the
observed joint effect of the two studied factors (RR11) is greater or less than the joint
effect predicted by the additive model (RR11 > RR10 + RR01 – 1, or RR11 < RR10 +
RR01 – 1) or by the multiplicative model (RR11 > RR10 x RR01, or RR11 < RR10 x
RR01). Such a departure from the model may suggest the presence of a biological
and/or public health interaction. An interesting example of effect modification is
described below.
The most readily understood situation is when there are two agents, each of which
can increase or decrease risk for the outcome of interest. For example, a recent
population-based case-control study conducted in Canada examined the effect of
lifestyle factors on the risk of adult leukemia (Kasim et al., 2005). Consistent with the
leukemia literature, the authors found an increased risk of acute myeloid leukemia (AML)
associated with cigarette smoking, particularly heavy smoking. The estimated odds ratio
was 1.43 (95% confidence interval (CI) = 1.03-1.80). This study also found the risk of
AML to increase significantly among obese subjects with a body mass index (BMI) ≥ 30
kg/m2 (OR = 1.58; 95% CI = 1.15-2.20). When the authors examined the joint effect of
smoking and obesity on the risk of AML, however, they found the positive association
between AML and heavy smoking to be absent among obese subjects (OR = 0.70; 95%
CI = 0.40-1.40). When the combined effect of these two factors on the risk of AML was
estimated, it was less than would be expected on the additive scale by about 68%,
indicating effect modification on the risk-difference scale. On a biological basis, the
authors explained this surprising finding by the fact that benzene in tobacco smoke is
considered to be one of the most important leukemogens. Benzene exerts its
leukemogenic effect through a number of benzene metabolites produced by the action of
hepatic cytochrome P-450 enzymes. Many obese subjects, however, are known to have
fatty liver with subsequent inflammation and injury (fatty liver hepatitis). Such obesity-
associated hepatic disorders can alter the activity of many metabolic enzymes, including
the cytochrome P-450 enzymes that are involved in benzene metabolism.
In this real example, if there were sufficiently large numbers of subjects, the level
of exposure for each factor might be considered (e.g., combinations of mild, moderate,
and heavy smokers; and overweight and obese categories). Even more complicated
exposure situations could be examined, but the precision (stability) of estimated
associations for each configuration will diminish as the number of possibilities increases
and hence the number of subjects in each level category is decreased. In practice,
epidemiologists tend to restrict such analyses to the simplest cases.

BIBLIOGRAPHY
1. Miettinen OS. Confounding and effect modification. American Journal of
Epidemiology 1974; 100:350-353.
2. Kasim K, Levallois P, Abdous B, Auger P, Johnson KC. Lifestyle factors and the risk
of adult leukemia in Canada. Cancer Causes and Control 2005; 16:489-500.
3. Rothman KJ, Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincott-Raven, 1998.



CHAPTER 9
Meta-Analysis
Meta-analysis is a relatively new form of research that was uncommon in the
literature in the early 1980s but is increasingly being used. Meta-analysis can be defined
as the process of combining information from several studies that address the same
question to produce a single estimate. When attempting to answer an important question,
such as the value of some new therapy, clinicians informally combine pieces of
knowledge that come from various sources. Meta-analysis is a systematic, quantitative
approach to this combination process. As such, it is an essential tool for evidence-based
medicine. Meta-analysis may be used to decide the best clinical approach to a problem
based on several related studies, to scrutinize studies to explain why research results
differ, and to identify new directions for research.
This chapter focuses on a real example (Glasziou and Mackerras, 1993) that
includes five different epidemiological studies that measure the role of vitamin A
supplementation in infectious disease. For the sake of simplicity, we assume that the
comparison of interest in all these five studies is between two groups of patients
randomly allocated to vitamin A supplementation (treatment arm) or a control treatment
(control arm). We further assume that the arms are being compared with respect to their
effect on some untoward event, the patient’s death.

Basic methods in conducting a meta-analysis:


Meta-analysis includes both qualitative and quantitative techniques. The
qualitative techniques aim at developing the research question to be analyzed, reviewing
the literature comprehensively for relevant studies, selecting an outcome variable
common to the studies, and evaluating the studies to identify similarities or to explain
differences among them.
The quantitative aspects of meta-analysis employ a variety of statistical tools.
There are two stages in a meta-analysis. First, we check that the studies are similar enough
to be legitimately combined; this step requires a statistical test of homogeneity, and
heterogeneous results make a meta-analysis difficult to carry out. Second, we
calculate the common estimate and its confidence interval. To do this we may have the
original data from all the studies, which we can combine into one large data file with
study as one of the variables, or we may have only summary statistics obtained from
publications.
From the statistical point of view, meta-analysis is a straightforward application of
multifactorial methods. We have several studies that address the same question, which
might be clinical trials or epidemiological studies, perhaps carried out in different
countries. Each study gives us an estimate of an effect. We assume that these estimates
are of the same underlying population value and that the comparison of interest in all
studies is between two randomly allocated groups of patients (treatment and control
groups). If these assumptions are satisfied, we combine the separate study estimates to
make a common estimate. This is a multifactorial analysis, in which the treatment or risk
factor is one predictor variable and the study is another, categorical, predictor variable.
If the outcome measure is continuous, such as the mean fall in blood pressure, we can
check that subjects are from the same population by analysis of variance, with treatment
or risk factor, study, and the interaction between them in the model. Multiple regression
analysis can also be used, remembering that study is a categorical variable and dummy
variables are required. We test the treatment-by-study interaction in the usual way. If
the interaction is significant, this indicates that the treatment effect (outcome) is not the
same in all studies, so we cannot combine the studies to estimate a
single treatment effect. However, we can think of these studies as a random sample of the
possible trials and estimate the mean treatment effect for this population of trials. This is
called the random effects model. Its confidence interval is usually much wider than that
found using the fixed effects model. If there is no interaction, then the data are consistent
with the treatment or risk factor effect being constant across studies. This is called a
fixed effects model. We can drop the interaction term from the model, and the treatment
or risk factor effect is then the estimate we want (Greenland, 1998).
If the outcome measure, on the other hand, is dichotomous (two categories), such
as survived or died, the estimate of the treatment or risk factor effect will be in the form
of an odds ratio. We can proceed in the same way as for a continuous outcome, using
logistic regression analysis. Several other methods exist for checking the homogeneity of
the odds ratios across studies, such as Woolf's test or that of Breslow and Day (1980).
Provided the odds ratios are homogeneous across studies, we can then estimate the
common odds ratio. This can be done using the Mantel-Haenszel method or by logistic
regression analysis.

A real example:
Glasziou and Mackerras (1993) carried out a meta-analysis of the role of vitamin A
supplementation in infectious disease. Table 9.1 shows the basic data needed from each study
in order to perform a meta-analysis. Essentially, these data consist of the number of
patients randomized and the number of untoward events (deaths) in the treatment and
control arms of each study.
Table 9.1. Data of the five studies involved in the real example.
                                    Vitamin A group       Control group
Study   Dose         Regime         Deaths    Number      Deaths    Number
1       200,000 IU   Six-monthly    101       12,991      130       12,209
2       200,000 IU   Six-monthly    39        7,076       41        7,006
3       8,333 IU     Weekly         37        7,764       80        7,755
4       200,000 IU   Four-monthly   152       12,541      210       12,264
5       200,000 IU   Once           138       3,786       167       3,411
The calculated odds ratios (ORs) and their 95% confidence intervals for the above-
mentioned five studies are presented in Table 9.2.
Table 9.2. Study odds ratios and their 95% confidence intervals.
Study    OR      95% CI
1        0.73    0.56-0.95
2        0.94    0.61-1.46
3        0.46    0.31-0.68
4        0.70    0.57-0.87
5        0.73    0.58-0.93
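The study-specific odds ratios in Table 9.2 and a pooled Mantel-Haenszel estimate can be reproduced directly from the counts in Table 9.1 with a short script (a Python sketch; the logistic-regression route described below would require a statistics package):

```python
# Reproducing Table 9.2 and a pooled estimate from the Table 9.1 counts.
# Per study: (deaths_vitA, n_vitA, deaths_control, n_control)
studies = [
    (101, 12991, 130, 12209),
    (39, 7076, 41, 7006),
    (37, 7764, 80, 7755),
    (152, 12541, 210, 12264),
    (138, 3786, 167, 3411),
]

def odds_ratio(d1, n1, d0, n0):
    """Odds ratio of death, treated versus control, from a 2x2 table."""
    return (d1 * (n0 - d0)) / ((n1 - d1) * d0)

for i, s in enumerate(studies, start=1):
    print(f"Study {i}: OR = {odds_ratio(*s):.2f}")

# Mantel-Haenszel common odds ratio across the five studies
num = sum(d1 * (n0 - d0) / (n1 + n0) for d1, n1, d0, n0 in studies)
den = sum((n1 - d1) * d0 / (n1 + n0) for d1, n1, d0, n0 in studies)
print(f"Common (MH) OR = {num / den:.2f}")  # 0.70
```

The Mantel-Haenszel common odds ratio of 0.70 agrees with the adjusted odds ratio obtained by logistic regression in the text.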

From this real example, the common odds ratio can be found in several ways. To
use logistic regression, we regress the event of death on vitamin A treatment and study.
We treat the treatment as a dichotomous variable, set to 1 if treated with vitamin A and 0 if
control. Further, we treat study as a categorical variable, so we create
dummy variables, study 1 to study 4, which are set to one for studies 1 to 4 respectively,
and to zero otherwise (i.e., stratification by study). We test the interaction by creating
another set of variables, the products of study 1 to study 4 and vitamin A. The logistic
regression model with interaction (i.e., the model with vitamin A, study and the interaction
between them) gives a chi-squared statistic for the model of 496.99 with 9 degrees of
freedom, which is highly significant. On the other hand, the logistic regression model
without the interaction terms (the model that includes only vitamin A and study) gives
490.33 with 5 degrees of freedom. The difference is 496.99 - 490.33 = 6.66 with 9 - 5 =
4 degrees of freedom, which has a P value of 0.15. This indicates no significant interaction,
and hence no evidence of heterogeneity among the study odds ratios, so we can drop
the interaction terms from the model. The adjusted odds ratio for vitamin A is then 0.70 (95%
confidence interval 0.62 to 0.79), with a P value of less than 0.0001.
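The P value quoted for the interaction test can be checked by hand: for an even number of degrees of freedom, the chi-square survival function has a closed form (a Poisson tail sum). A minimal Python sketch:

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of
    freedom, using the closed form exp(-x/2) * sum over k of (x/2)^k / k!."""
    assert df % 2 == 0 and df > 0
    lam = x / 2
    return math.exp(-lam) * sum(lam**k / math.factorial(k) for k in range(df // 2))

lr_stat = 496.99 - 490.33   # likelihood-ratio statistic for interaction, 6.66
df = 9 - 5                  # difference in model degrees of freedom
p = chi2_sf_even_df(lr_stat, df)
print(f"LR statistic = {lr_stat:.2f}, df = {df}, P = {p:.2f}")  # P = 0.15
```

This reproduces the P value of 0.15 reported in the text for the test of heterogeneity.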

Graphical Presentation
The results of a meta-analysis can also be presented in graphical form. The odds
ratios and their confidence intervals for the real example are shown in the figure below.
Each confidence interval is indicated by a line, and the odds ratio by the middle
of a square. The area of each square is proportional to the number of subjects in the
corresponding study; this makes study 2, with the widest confidence interval, appear
relatively unimportant and makes the overall estimate stand out.



Strengths of meta-analysis:
Using information from all of the available studies in the literature seems to be in
accordance with good scientific practice. The majority of randomized clinical trials
reported in oncology are too small to supply definitive answers to the questions of
clinical interest. The results of individual studies are often inconclusive. Thus, a meta-
analysis of all of the available studies may be the only way to achieve large enough
numbers of patients so that the power of the statistical tests ceases to be a limiting factor
and so that the effect estimate of the studied factor is precise.
Moreover, the results of individual studies are inevitably subject to random error:
the smaller the number of events in a trial, the larger the random error is likely to be. The
best way to reduce the impact of possible biases is to consider the results of all of the
related studies, because it is unlikely that they are all equally affected by the same sources of
bias.
The large number of patients typically available in meta-analyses makes it possible
to look at questions that cannot be reliably addressed in individual trials. For example,
meta-analyses can test whether the effect of a treatment depends on some patient
characteristics, or whether it is roughly constant regardless of all characteristics. This
question is often misleadingly addressed in individual studies by looking at the effect of a
treatment within patient subgroups. Finally, since individual trials are usually sized to
detect the main treatment effect, they are likely to miss real interactions, whereas a meta-
analysis may have sufficient statistical power to reveal them.

Limitations of meta-analysis:
The main problems of meta-analysis arise even before we begin the analysis of the
data. First, we must have a clear definition of the question so that we only include studies
which address this. For example, if we want to know whether lowering serum cholesterol
reduces mortality from coronary artery disease, we would not want to include a study
where the attempt to lower cholesterol failed. On the other hand, if we ask whether
dietary advice lowers mortality, we would include such a study. Which studies we
include may have a profound influence on the conclusions. Second, we must have all the
relevant studies. A simple literature search is not enough. Not all studies which have been
started are published; studies which produce significant differences are more likely to be
published than those which do not. Within a study, results which are significant may be
emphasized and parts of the data which produce no differences may be ignored by the
investigators as uninteresting. Publication of unfavorable results may be discouraged by
the sponsors of research. Researchers who are not native English speakers may feel that
publication in the English-language literature is more prestigious, as it will reach a wider
audience, and so try there first, publishing in their own language only if they cannot
publish in English. The English-language literature may thus contain more positive
results than other literatures do. The phenomenon by which significant and positive
results are more likely to be reported than non-significant and negative ones is called
publication bias. In addition to the above-mentioned limitations, some heterogeneity
may be expected among the individual study estimates. A finding of heterogeneity makes
meta-analysis difficult to carry out, but we can still use random-effects models to
overcome this problem.

Meta meta-analysis:
Meta meta-analysis means the evaluation of a meta-analysis. To evaluate any meta-
analysis study, the following criteria should be considered:
i. Methods of research: How did the researcher identify the relevant studies (i.e.,
was a computerized search or a review of citations of other review articles used?)
How complete was the search?
ii. Eligibility criteria: How did the authors of the meta-analysis decide which pieces
of research to include in their evaluation?
iii. Number of studies: How many research studies were selected for analysis?
iv. Outcome variable: Which variables were selected from each article to be included
in the statistical analysis? Each original article may provide several outcome
variables, not all of which may be included in the other articles or be of particular
interest to the meta-analysis.
v. Study design: What were the specific types of studies included in the analysis, and
how many of each type were used? Study designs may include different types of
studies such as case-control, prospective, etc.



vi. Results used in combining studies: Were the data from all studies pooled to arrive
at the conclusion or were the actual results of each study analyzed?
vii. Test of homogeneity: Were the risk estimates from all included studies examined
by a test of homogeneity? What was the result of this test?
viii. Statistical methods used: To enable the reader to verify the results of the analysis,
the researchers should indicate the specific statistical methods and the specific
computer program used for the analysis.

BIBLIOGRAPHY
1. Armitage P, Berry G. Statistical Methods in Medical Research. 3rd Edition. Oxford:
Blackwell, 1994.
2. Breslow NE, Day NE. Statistical Methods in Cancer Research. Volume II: The Design
and Analysis of Cohort Studies. Lyon: IARC, 1980.
3. Buyse M, Piedbois P, Carlson RW. Meta-analyses based on published data are
unreliable (letter). J Clin Oncol 1999; 17:1646-1647.
4. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials 1986;
7:177-188.
5. Easterbrook PJ, Berlin JA, Gopalan R, Mathews DR. Publication bias in clinical
research. Lancet 1991; 337:867-872.
6. Glasziou PP, Mackerras DEM. Vitamin A supplementation in infectious disease: a
meta-analysis. British Medical Journal 1993; 306:366-370.
7. Halvorsen KT. Combining results from independent investigations: meta-analysis in
medical research. In: Bailar JC et al., editors. Medical Uses of Statistics. NEJM Books,
1986.
8. Jones D. Meta-analysis: weighing the evidence. Stat Med 1995; 14:137-149.
9. Thompson SG. Controversies in meta-analysis: the case of the trials of serum
cholesterol reduction. Statistical Methods in Medical Research 1993; 2:173-192.



CHAPTER 10
Screening
Screening is the initial examination of an individual to detect disease not yet under
medical care. Screening may be concerned with different types of diseases, with the
purpose of separating apparently healthy individuals into groups with either a high or a low
probability of developing the disease for which the screening test is being used. If
screening is concerned with a single disease, it is called single-phasic screening; if it is
concerned with many diseases, it is called multiphasic screening. In addition, screening
tests may be classified according to their aim into diagnostic testing and treatment testing.
The former is designed to test apparently healthy individuals to detect cases, or those at
high risk of developing a suspected disease, while the latter is undertaken to evaluate the
patient's response to, and the effectiveness of, therapy for a disease.

Criteria for effective screening


A screening program is a major focus of efforts to promote health and prevent
disease. To be effective, however, the disease under investigation should fulfill the
following criteria:
1. The disease should have a considerable prevalence among the population
being screened.
2. The disease should be of sufficient concern to the community being
screened (i.e., it should have a public health significance).
3. The disease should have a treatment.
4. The disease should have a preclinical period.
In addition, it is of great importance to assure follow-up evaluation for screened
individuals showing a positive test.

Screening test characteristics


i. The test should be highly sensitive and specific (i.e., it should be valid and
accurate).
ii. The test should be acceptable to a large number of individuals.
iii. The test should be simple (i.e., it should be accomplished easily and quickly).



iv. The test should be harmless to the individual being screened.
v. The test should be relatively inexpensive
vi. The test should be reliable (i.e., it gives the consistent results on repeating the
test).

Screening test parameters


Screening test parameters are essential to measure the usefulness of the test. These
parameters are the following: sensitivity, specificity, positive predictive value
(PPV), and negative predictive value (NPV). To determine these parameters, a fourfold
(2X2) table is used (table 11.1).
Table 11.1 Results of a screening test for a disease in a population with known disease
status.

                          Diagnosis
Screening test    Diseased    Not diseased    Total
Positive          a           b               a + b
Negative          c           d               c + d
Total             a + c       b + d           N

a = the number of diseased persons with a positive screening test (the true-positives).
b = the number of persons without the disease but with a positive screening test (the false-positives).
c = the number of diseased persons but with a negative screening test (the false-negatives).
d = the number of persons without the disease and with a negative screening test (the true-negatives).
N = the total number of persons (a + b + c + d).

i. Sensitivity
Sensitivity is defined as the ability of the test to identify correctly those individuals
having the disease. Sensitivity is independent of the disease prevalence in the population
being tested. Sensitivity represents the ratio of the number of individuals with the disease
whose screening tests are positive to the total number of individuals with the disease
under study, and is usually expressed as a percentage. According to the fourfold
table, the sensitivity of the test is determined as:
Sensitivity (%) = a / (a + c) x 100



ii. Specificity
Specificity is defined as the ability of the test to identify correctly those
individuals not having the disease. Specificity is independent of the disease prevalence in
the population being tested. Specificity represents the ratio of the number of individuals
without the disease whose screening test is negative to the total number of individuals
without the disease under study, and is usually expressed as a percentage.
According to the fourfold table, the specificity of the test is determined as:
Specificity (%) = d / (b + d) x 100
iii. Positive predictive value
The positive predictive value of the test represents the ability of the test to identify
those individuals who truly have the disease (true-positives) among all individuals whose
screening tests are positive. It is the ratio of the number of individuals with the disease
whose screening tests are positive to the total number whose screening tests are positive
and is usually expressed as a percentage. The positive predictive value is affected by
disease prevalence and increases as the prevalence of the disease increases.
According to the fourfold table, the positive predictive value (PPV) of the test
is determined as:
PPV (%) = a / (a + b) x 100
iv. Negative predictive value
The negative predictive value of the test represents the ability of the test to identify
those individuals who truly do not have the disease (true-negatives) among all individuals
whose screening tests are negative. It is the ratio of the number of individuals without the
disease whose screening tests are negative to the total number whose screening tests are
negative, and is usually expressed as a percentage. The negative predictive value is
affected by disease prevalence and decreases as the prevalence of the
disease increases. According to the fourfold table, the negative predictive value (NPV)
of the test is determined as:
NPV (%) = d / (c + d) x 100
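The four parameters above can be computed directly from the fourfold table. The following Python sketch is illustrative only; the function name and the counts are made-up assumptions, not from the book:

```python
# Compute the four screening test parameters from the fourfold (2x2)
# table, using the same cell labels as table 11.1:
#   a = true positives,  b = false positives,
#   c = false negatives, d = true negatives.

def screening_parameters(a, b, c, d):
    """Return sensitivity, specificity, PPV and NPV as percentages."""
    return {
        "sensitivity": 100 * a / (a + c),
        "specificity": 100 * d / (b + d),
        "ppv": 100 * a / (a + b),
        "npv": 100 * d / (c + d),
    }

# Hypothetical counts: 90 true positives, 50 false positives,
# 10 false negatives, 850 true negatives.
params = screening_parameters(a=90, b=50, c=10, d=850)
print(round(params["sensitivity"], 1))  # sensitivity = 100 * 90 / (90 + 10)
print(round(params["ppv"], 1))          # PPV = 100 * 90 / (90 + 50)
```

Note that with these made-up counts the PPV (about 64%) is much lower than the sensitivity (90%), because the false positives in cell b dilute the pool of positive tests.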

It is pertinent here to note that both PPV and NPV are determined according to the
results of a subsequent confirmatory test. Finally, it is of great importance to point that



out that the probability of disease if the test is positive is related to the sensitivity and specificity
of the test and to the prevalence of the disease in the general population from
which the individuals being screened came. The rule that gives this conditional probability of disease for
an individual with a positive test is Bayes’ theorem. The probability of disease
given a positive test is called posterior probability and is increased by increasing disease
prevalence in the population, while the probability of disease prior to knowing a
screening test result is called prior probability. Bayes’ theorem is most useful in clinical
decision analysis and is being used in computer-assisted medical diagnoses.
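The dependence of the posterior probability on prevalence can be made concrete with Bayes' theorem. The Python sketch below is a hypothetical illustration; the sensitivity, specificity and prevalence values are assumed, not taken from the book:

```python
def posterior_probability(sensitivity, specificity, prevalence):
    """P(disease | positive test) by Bayes' theorem.

    All three arguments are proportions between 0 and 1.
    """
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# The same test (95% sensitive, 95% specific) applied at two prevalences:
low = posterior_probability(0.95, 0.95, 0.01)   # rare disease
high = posterior_probability(0.95, 0.95, 0.20)  # common disease

# The posterior probability rises with prevalence, as the text states.
assert low < high
```

With these assumed values, the posterior is only about 16% at 1% prevalence but exceeds 80% at 20% prevalence: a positive result from the very same test means very different things in the two populations.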

Screening in clinical practice


Screening patients for preclinical disease is an established part of day to day
medical practice. Routine recording of blood pressure, urine testing, and preoperative
chest radiography may all be regarded as screening activities. Increasingly, screening is
now being extended to people who have not themselves requested medical aid. For
example, general practitioners invite patients who would not otherwise be attending the
surgery to undergo tests such as cholesterol measurement and cervical cytology. This
places the physician in a different role, and there is a special obligation to ensure that
such screening is beneficial. To this end the following three questions must be answered,
for which epidemiological data are required.
Question I
Does earlier treatment improve the prognosis?
Lung cancers detected at an early stage in their development are more likely to be
surgically resectable. Moreover, it is possible to identify such tumors while they are still
asymptomatic by chest radiography and sputum cytology. However, a large study in the
United States failed to demonstrate any clear reduction in mortality from lung cancer
among heavy smokers who were offered screening every four months by radiography and
sputum cytology, despite the fact that more resectable tumors were detected in the
screened population. As this example shows, the outcome of screening must be judged in
terms of its effect on mortality or illness, and not simply by the number and severity of
cases identified. Assessing the benefits of early treatment is not always easy. One
potential source of error is the phenomenon known as lead time.



Suppose that we wish to explore the scope for reducing mortality from breast
cancer by early diagnosis. One approach might be to compare the survival of patients
whose tumors were detected at screening with that of women who only present once their
disease has become symptomatic. However, this could be misleading. Survival might be
longer in the screened women not because early treatment is beneficial, but simply
because their tumors are being diagnosed earlier in the natural history of their disease
(figure 11.1).
Figure 11.1 Lead time and screening.

With screening (a), disease is diagnosed earlier than without screening (b), and survival from
diagnosis is longer; this does not necessarily imply that the time course of the disease has been
modified.
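Lead time can be illustrated with toy numbers (purely hypothetical; none of these figures come from the book). In this sketch, screening advances the date of diagnosis but leaves the date of death unchanged, so survival measured from diagnosis looks longer even though the course of the disease is not modified:

```python
# All times are in years after the biological onset of the disease.
death_time = 10.0        # unchanged by treatment in this sketch
dx_symptomatic = 6.0     # diagnosis when symptoms appear (route b)
dx_screening = 2.0       # earlier diagnosis through screening (route a)

survival_symptomatic = death_time - dx_symptomatic
survival_screened = death_time - dx_screening
lead_time = dx_symptomatic - dx_screening

# The apparent survival gain equals the lead time exactly:
assert survival_screened - survival_symptomatic == lead_time
print(survival_symptomatic, survival_screened, lead_time)  # 4.0 8.0 4.0
```

The screened patient appears to survive twice as long from diagnosis, yet dies at exactly the same time: the entire "gain" is lead time.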

A further difficulty in comparisons of survival is that, apart from any effects of
treatment, cases detected at screening tend to be more slowly progressive. Patients with
aggressive disease are more likely to develop symptoms in the intervals between
screening examinations and therefore present spontaneously. Outcome is best assessed by
systematically comparing the morbidity and mortality of a screened population with that
of controls. Moreover, because people who attend for screening may have a different
incidence of disease from those who do not, it is important to measure outcome in all of
the population selected for screening and not only in those members who actually
undergo investigation. Women from social classes IV and V have the highest rates of
cervical cancer but the lowest uptake of cervical cytology. Thus an analysis restricted to
women undergoing cervical screening would tend to indicate lower mortality even if in
fact there was no advantage in early treatment.



Question II
Is a satisfactory screening test available?
Even if prognosis is improved by early treatment, screening is only worthwhile if a
satisfactory diagnostic test is available. The test must detect cases in sufficient numbers
and at acceptable cost, and it must not carry side effects that outweigh the benefits of
screening. Because a screening test must be inexpensive and easy to perform, it is not
usually the most valid diagnostic method for a disease. In screening, therefore, it has to
be accepted that some cases will remain undetected. As with all diagnostic tests, there is a
trade off between sensitivity and specificity, and the competing needs for each must be
balanced.
In addition to its sensitivity and specificity, the performance of a test is measured
by its predictive value. The predictive value of a positive result is the probability that a
person who reacts positively to the test actually has the disease. Predictive value varies
with the prevalence of disease in the population to which the test is applied. If the
prevalence is low then there are more false positive results than true positives, and
predictive value falls. At the extreme, if nobody has the disease then the predictive value
will be zero and all positive test results will be false positives. It follows that a test that
functions well in normal clinical practice will not necessarily be useful for screening
purposes. Sputum cytology has quite a high positive predictive value for bronchial
carcinoma in patients presenting with haemoptysis, but if it is used to screen
asymptomatic people most positive results will be false.
Because the average benefit to the individual from a screening program is usually
much smaller than from interventions in response to symptoms, screening tests need to be
safer than those used in normal clinical practice. The radiation dose from a chest x ray
examination is small, but if the investigation forms part of a screening program for
tuberculosis, then even the very small risk of complications may outweigh the benefits of
early diagnosis. As the prevalence of pulmonary tuberculosis in the general population
has declined, so mass radiographic screening has ceased to be justifiable.



Question III
What are the yields of the screening service?
The yield of a screening service is measured by the number of cases identified
whose prognosis is improved as a result of their early detection. This must be related to
the total number of tests performed. Theoretically, the yields of screening may be
improved by restricting it to high risk groups, as has been suggested in the screening of
infants for developmental and other abnormalities. But identifying relatively small high
risk groups among whom most cases will be found is rarely feasible. If uptake of a
screening procedure is low then yield will be correspondingly limited.
Ultimately the yields of a screening service have to be balanced against the costs,
in terms of staff and facilities, of screening and making the confirmatory diagnoses. For
breast cancer screening it has been found that identifying one case requires examining
170 women by palpation and mammography and taking nine biopsy specimens.
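The workload implied by those figures can be turned into simple arithmetic. In the sketch below, the ratios (170 examinations and 9 biopsies per case) come from the text, while the program size of 17,000 women is an assumed, hypothetical figure:

```python
women_per_case = 170   # examinations per case identified (from the text)
biopsies_per_case = 9  # biopsy specimens per case identified (from the text)

screened = 17_000      # hypothetical program size
cases_found = screened // women_per_case
biopsies_taken = cases_found * biopsies_per_case

print(cases_found, biopsies_taken)  # 100 cases identified, 900 biopsies taken
```

Scaling the yield this way makes the cost side of the balance concrete: every case identified carries a fixed overhead of examinations and confirmatory procedures.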

BIBLIOGRAPHY
1. Armitage P and Berry G. Statistical Methods in Medical Research, 3rd Ed. Blackwell,
Oxford, 1994.
2. Breslow NE and Day NE. Statistical methods in cancer research. Volume II: The
design and analysis of cohort studies. IARC, Lyon, 1980.
3. Gerstman BB. Epidemiology kept simple: An introduction to modern epidemiology.
New York: John Wiley & Sons, Inc., 1998.
4. Rothman KJ, Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincot-Raven, 1998.



CHAPTER 11
Epidemiologic Surveillance
Epidemiologic surveillance dates back to the time of John Graunt, who published
the Natural and Political Observations Made upon the Bills of Mortality in 1662.
Graunt’s approach to the analysis of death certificates (Bills of Mortality), namely that volumes
of data should be reduced to a few tables and that profit may be gained by analyzing
these tables, is consistent with the modern technique of population-based epidemiologic
surveillance.
In the subsequent 300 years, the focus of health research shifted to sample-based
studies: cross-sectional, cohort and case-control studies, and clinical trials. In recent
decades, however, awareness of the limitations of sample-based epidemiologic studies
has grown along with recognition of the importance of population-based surveillance
systems for measuring the health status of a population, for early warning of emerging
health risks and for program development. At the same time, biophysical and socio-
economic data have become of great importance in the understanding of relationships
among human health, risk factors and interventions. From this point of view, it
appears that epidemiologic surveillance may become the focus of ongoing health
research using such well maintained and well validated surveillance databases.

Definition of surveillance
Epidemiologic surveillance is defined as the ongoing systematic collection,
recording, analysis, interpretation, and dissemination of data reflecting the current health
status of a community or population. The scope of epidemiologic surveillance has
evolved from an initial focus on infectious disease monitoring and intervention to a more
inclusive scope covering influences on health status, including chronic diseases, injuries,
environmental exposures, and social factors.
Surveillance of an epidemic requires a very specific definition of what constitutes
a case that can be counted. The number of suspected cases, probable cases, and
confirmed cases of a disease are actively sought and monitored. The number of cases, and
the relationship between cases, is used during an outbreak investigation in an attempt to
identify causes and those at risk, and to implement an intervention. Surveillance does not



itself constitute investigation, research, risk management or evaluation, although it makes
a significant contribution of information that is essential to all of these. Surveillance may,
for example, generate hypotheses, which may later be tested by other methods.

Purpose of surveillance:
The most familiar purpose for surveillance is the identification, as rapidly as
possible, of unusual events, outbreaks of disease and emerging health issues. It is worth
noting that, although high quality surveillance data are always desirable, for these “early
warning” purposes, a balance must be struck between timeliness and high levels of
validity. Another significant role for surveillance is to inform decisions governing the
management of risks to health. This may involve public health programs, regulatory
action or public policy responses, all of which are exercises in evidence-based decision
making, with surveillance being one important source of evidence.
Merely monitoring the current status of disease prevalence, health indicators, or
social markers does not protect the health of a community. Careful monitoring, however,
creates a baseline measurement of threats to the public's health. It is this established
baseline that enables public health workers to notice when an anomaly occurs. A sharp
increase in the number of cases of a disease will instigate further investigation,
intervention, and prevention measures.
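The idea of a baseline that makes anomalies noticeable can be sketched as a simple threshold rule. This is an illustrative toy: the weekly counts are invented, and the mean-plus-two-standard-deviations rule is one common convention, not a method prescribed by the text:

```python
import statistics

def exceeds_baseline(history, current, z=2.0):
    """Flag a count exceeding the baseline mean by more than z standard deviations."""
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return current > mean + z * sd

# Invented weekly case counts forming the established baseline:
baseline = [12, 9, 11, 10, 8, 10, 12, 9]

print(exceeds_baseline(baseline, 30))  # a sharp increase -> True
print(exceeds_baseline(baseline, 12))  # within normal variation -> False
```

A count flagged by such a rule would not itself prove an outbreak; as the text notes, it would instigate further investigation.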

Types of surveillance:
Surveillance is based on both passive and active data collection processes. When a
clinician or laboratory encounters a patient or sample indicating the presence of certain
conditions or pathogens, there is a legal obligation to report the case to local public health
officials. The result is a passive monitoring of the levels of the disease in the community.
Active surveillance, on the other hand, is commonly referred to as "case finding." This
occurs when the data necessary to monitor levels of a medical or social condition are
sought out actively. This is accomplished through a variety of means, ranging from
clinical record reviews to community surveys.



Steps of epidemiologic surveillance
1. Data collection and recording
Epidemiologic surveillance uses a wide variety of data sources, depending upon
the circumstance under investigation. For communicable diseases, local and state health
departments typically rely on passive reporting. Other sources of data for epidemic
surveillance include birth and death certificates; sentinel surveillance sites (i.e., the use of
community-based health or occupational sites to monitor for specific health events);
cancer, birth defects, and other registries; health interview surveys; and hospital or
ambulatory care data collection systems.
The features that distinguish surveillance from other forms of health investigation
are that data are collected routinely, frequently or continuously, and they are generated
from the entire population or, less frequently, from a representative sample.
In infectious disease surveillance, as well as in chronic disease surveillance, it is
essential to clearly define the cases. Additionally, in chronic disease surveillance, it is
better to define cases according to the International Classification of Diseases, and for
cancer cases, it is also important to confirm the diagnosis by histopathological
examination. From this point of view, the following definitions are important to
know:
Case: A person who meets the case definition.
Case definition: A set of diagnostic criteria that must be fulfilled for a person to be regarded
as a case of a particular disease. A case definition can be based on clinical and/or laboratory
criteria or a combination of the two.
Suspect case: A case that is classified as suspected, usually on a clinical basis, for reporting
purposes.
Probable case: A case that is classified as probable on a clinical basis plus either an
epidemiological or a laboratory basis, for reporting purposes.
Confirmed case: A case that is classified as confirmed, usually on a laboratory basis, for
reporting purposes. In cancer, confirmation of the diagnosis is by histopathology only.
Epidemiologically-linked case: A case in which the patient has had contact with one or
more persons who either have/had the disease or have been exposed to a point source of
infection.



Laboratory-linked case: A case that is confirmed by one or more of the laboratory
methods listed in the case definition under Laboratory Criteria for Diseases. Although
other laboratory methods can be used in clinical diagnosis, only those listed are accepted
as laboratory confirmation for national reporting purposes.

2. Data analysis and interpretation


The process of surveillance includes not only data collection, but also integration,
analysis and interpretation to produce a "surveillance product" for a specific public health
purpose or policy objective, as well as the dissemination of that product to those who
need to know.

3. Information dissemination
An ongoing and timely information dissemination system helps to alert health
professionals and the general public about forthcoming health risks (e.g., risk assessment)
and to put our current knowledge of risk assessment and management into perspective, so that
the general public knows which health risks to avoid (e.g., publication of a “Handbook of
Health Risks”) and which healthy activities to pursue (e.g., publication of a “Handbook of
Healthy Practices”).

4. Public health practice


Surveillance information is actually utilized for the development and evaluation of
programs and policies and to increase the impact of surveillance activities on society.

5. Computer Technology
Automated search and linkage techniques are essential to retrieve information
from a vast array of data. Automated data analysis systems are also essential to produce
early warning signals for health and risk factor trends.

Developing a good surveillance system


The following activities are needed for setting up a good, comprehensive and long-
term surveillance system:



i. Conducting a series of round table discussion sessions to identify the
purposes and priorities for the comprehensive surveillance system.
ii. Conducting an extensive literature review and literature survey to
identify valid, reliable indicators of the biophysical and socio-
economic environments and health outcomes; using meta-analysis to
prioritize risk variables based on relative risks and attributable risks.
iii. Conducting a series of Delphi surveys (an initial survey to acquire
indicators and subsequent surveys to rank indicators) among cross-
disciplinary teams of experts to identify the set of indicator measures
favored by the experts for each of the health, risk and intervention
areas.
iv. Conducting a series of experts' consensus workshops to refine the set
of indicators and to develop ground rules and working definitions for
the early warning and program development system for the chosen
health outcomes.
v. Determining the availability of existing databases for these indicators,
how to access such databases and how multiple databases can become
part of a comprehensive surveillance system.
vi. Evaluating the quality and developing methods for improving the
quality of such existing databases.
vii. Identifying gaps in data availability and developing methods for
collecting additional information for the surveillance system.
viii. Repeating steps 2-4 to identify, rank and refine the set of statistics to be
generated from the surveillance system.
ix. Repeating steps 2-4 to identify, rank and refine the methods of using
surveillance data for public health.

The surveillance systems are now established in many countries not only for
infectious diseases but also for chronic disease. In Canada, for example, in addition to the
well developed surveillance system for most of the infectious and chronic diseases, the
national enhanced cancer surveillance system (NECSS) represents a good example of



cancer surveillance. This system collects data about cancer cases from eight Canadian
provinces. The system aims to describe trends in cancer incidence across Canada
and to study various risk factors associated with cancer development, particularly
those related to lifestyle and environmental exposures. Using the data collected by the
NECSS, many epidemiologic studies of risk factors associated with different types of
cancer have been conducted and published. In Egypt, through the use of a solid surveillance
system for poliomyelitis, the Egyptian Ministry of Health (MOH) succeeded in
eradicating this dangerous disease; the eradication was declared by the World
Health Organization (WHO) in 2006. Similarly, the Egyptian MOH has also
established surveillance systems for many infectious diseases of public health importance.
Nowadays, the challenge facing the Egyptian MOH is the establishment of a good
and effective surveillance system for chronic diseases, particularly cancer, diabetes
mellitus, hypertension and ischemic heart disease.

BIBLIOGRAPHY
1. Friis, R., and Sellers, T. (1996). Epidemiology for Public Health Practice. Gaithersburg,
MD: Aspen Publishers, Inc.
2. Gregg, M. (1996). Field Epidemiology. New York: Oxford University Press.
3. Jones J, Hunder D. Consensus methods for medical and health services research. Br
Med J 1995;311:376-80.
4. Lomas J. Research and evidence-based decision making. Aust NZ J Public Health
1997;21:439-40.
5. McNeil D. Epidemiological research methods. New York: John Wiley & Sons, 1996
6. Teutsch, S., and Churchill, R. (1994). Principles and Practice of Public Health
Surveillance. New York: Oxford University Press.



CHAPTER 12
Writing a Medical Research Paper
Scientific medical papers are reports on research findings which have scientific
value. Researchers frequently communicate the results of their work in research reports to
tell others what study they performed, why they did it, what they discovered, and what it
means. An objective of organizing a research paper is to allow people to read your work
selectively. The reader may be interested in just the methods, a specific result, the
interpretation, or perhaps just wants to see a summary of the paper to determine if it is
relevant to his/her study. Regardless of the specific discipline involved, the outline of all
research reports should contain the following sections: Title Page, Abstract, Introduction,
Methods and Materials, Results, Discussion, Conclusion, Acknowledgments, References
of the literature cited and Tables and Figures (and Appendices, if necessary). In this paper,
we thoroughly discuss each of the above listed sections helping medical researchers to
write their research paper in a scientific manner.

Title Page
The title page should include the research paper title, author, affiliated university
and/or institution, city, country, and date. Make the title of your study concise,
descriptive, and informative. Your title should indicate the nature of your research. For
example “Studies on adult leukemia” is not as descriptive as “Lifestyle factors and the
risk of adult leukemia.”

Abstract
An abstract is a concise, single-paragraph summary of your completed work. It can
be written in different ways: as a single paragraph or as a structured abstract. By a
structured abstract, we mean organizing the abstract in sections like those of the original paper
(some journals prefer this style of abstract, such as Cancer Causes and Control).
It is best to write your abstract after completing a draft of your scientific paper. The
Abstract is usually written last by the authors but it appears in the journal right below
the title. It should contain all essential information about the objective of the study, basic
information about the experiments and sufficient information about the results to make



the reader eager to read the entire manuscript. Each sentence must add new information
that is clearly stated. It should contain all important results information (i.e., positive as
well as the negative results).

Introduction
The purpose of an introduction is to acquaint the reader with the rationale behind
the work, with the intention of defending it. It places your work in a theoretical context,
and enables the reader to understand and appreciate your objectives. It explains what is
known about the subject from earlier work by various authors. If the subject has little
history, it explains why there is a need for this study and justifies it. The justification is
supported by literature references.
Scientifically, the introduction should present answers to the following questions:
What problem did you investigate? Why did you choose this subject, and why is it
important? What hypotheses did you test? Based upon your reading, what results did you
anticipate, and why? The introduction should address these and similar questions. To
tackle the last question, some literature (library) research will be necessary. If you
include information from other sources to explain what is currently known about the
topic and why you are anticipating certain results, be sure to cite those references in the
body of your paper. Assume that the reader is scientifically literate, but may not be
familiar with the specifics of your study.
The Introduction ends with objectives, e.g., “The objective of this study was to
examine the association between cigarette smoking, fruits and vegetables consumption
and the risk of adult leukemia”.
In general, the Introduction should have these four elements:
i. Background: Who else has done what? How? What have we done previously?
ii. The objectives of the work.
iii. The justification for these objectives: Why is the work important?
iv. Guidance to the reader. What should the reader watch for in the paper? What
are the interesting high points? What strategy did we use?



Methods and Materials
The objective of this section is to make it possible for interested readers to repeat
the experimental work and to compare the results they obtain with those reported by the
authors. It lists the materials (chemicals, raw materials, ingredients, instruments) used to
carry out the experiments. Unless the experiments have been described elsewhere, the
description should go into sufficient detail. The following questions to be clearly
answered in this section: How did you conduct your study? What equipment did you use?
What procedures did you follow? Since your procedures have been completed, report
them using past tense. You may use first person, active voice (We added 2 ml of water...)
or passive voice (Two ml of water were added...).
This section should be written in narrative, paragraph format, not as a list of numbered
steps, and should not include any results. Materials should not be listed separately, but
should be included in the description of the methods. Include criteria for selection
(inclusion and exclusion criteria) and an informed consent statement if human subjects
were used (ethical consideration). If using a standard method, you may cite the literature
reference and give only the details specific to your experiment. If the work is based on a
questionnaire or survey, include the blank questionnaire/survey as part of the Methods
section (or it is better to place it in an appendix and refer to it in the Methods section). To
enable the reader to verify the results of the analysis used, you should also indicate the
specific statistical methods and the specific computer program used for the analysis at the
end of this section.

Results
The purpose of a results section is to present and illustrate your findings. Make
this section a completely objective report of the results, and save all interpretation for the
discussion. The results of your research should be presented in a logical order, and you
should use the past tense when referring to your results. Use tables and figures (such as graphs)
to aid your reader to see and understand your results readily. Tables and figures should be
numbered and titled separately. This will enable you to refer to them in text quite easily
(Data in Table 3 examine the association between...). It is important to avoid the
following common mistakes while writing this important section:



i. Do not discuss or interpret your results, report background information, or
attempt to explain anything.
ii. Never include raw data or intermediate calculations in a research paper.
iii. Do not present only the positive results (Both positive and negative are of the
same importance).
iv. Do not present the same data more than once.
v. Text should complement any figures or tables, not repeat the same
information.
vi. Please do not confuse figures with tables - there is a difference.

Discussion
The purpose of this section is (i) to relate your results to existing knowledge, (ii) to make
clear how your results add to or modify existing knowledge, (iii) to speculate about what
remains unknown, and (iv) to suggest directions for future research. When writing this
important section, the following eight points should be considered, in this order:
i. Summary of your main results in a few sentences.
ii. Whether your results support or refute your initial hypothesis.
iii. Comparison of your results with those of other scientists performing
similar studies.
iv. The biological plausibility of your results.
v. Strengths and limitations of your research.
vi. Further studies that need to be performed in light of your research findings.
vii. The possible directions for future research.
viii. The theoretical implications or practical applications of your research.

Conclusion
The conclusions can be included in the discussion section. But if you have a separate
conclusions section, do not repeat the discussion points. Base your conclusions on the
evidence that you have presented in your paper. This section shows the ability of the
author(s) to observe and to creatively link individual observations to provide a coherent
story about the study and its meaning and benefits.

Acknowledgments
The Acknowledgments section provides an opportunity to thank those who assisted you:
data collectors, mentors, teachers, typists, the sponsors of a grant, other contributors
of materials, and even your internal reviewer for comments. You should keep this
section brief, but be sure to identify major contributions. An example of an
acknowledgment: "I thank the following for assistance, advice and guidance:
Dr. ……….. (my teacher), Mr.…….. (typist), etc."

References (Literature Cited)


This part is helpful to other researchers by directing them to important sources of
information. When you refer to the work of another scientist in your paper, you must
indicate the source of that information. That way, someone reading your paper will
realize that the information comes from another project. Failure to cite the work of
another scientist (that you used in writing your paper) results in a serious offense
(plagiarism) that is similar to stealing. Therefore, all information that is not from your
study and is not common knowledge must be acknowledged by a citation.
How to cite the references in the body of the text?
The preferred method of citing a reference in text in most scientific papers is the
author-date system. The citation (author last name and year of publication) should be
placed naturally into the flow of the sentence. If the name of the author appears as part of
the text, cite only the year of the publication. For example, “Kasim (2005) reported a
positive association between adult myeloid leukemia and cigarette smoking.” Otherwise,
place both the name and year in parentheses, as in “A positive association between adult
myeloid leukemia and cigarette smoking has been reported (Kasim, 2005).”
If there are two authors, cite them both, as "(Kasim and Levallois, 2004)." When
there are more than two authors, cite only the name of the first author and indicate the
rest by using "et al." (meaning "and others"), as in (Kasim et al., 2005). When a
reference has no individual author or the author is unknown, use the name of the agency
or group which published the document, or the name of the lead editor.
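As a sketch only, the author-date rules above could be automated with a small helper function (a hypothetical illustration, not part of any citation manager; the third author name below is made up):

```python
def format_citation(authors, year):
    """Return a parenthetical author-date citation from a list of
    author last names and a publication year, following the rules above."""
    if len(authors) == 1:
        label = authors[0]
    elif len(authors) == 2:
        label = f"{authors[0]} and {authors[1]}"
    else:
        # More than two authors: first author plus "et al."
        label = f"{authors[0]} et al."
    return f"({label}, {year})"

print(format_citation(["Kasim"], 2005))               # (Kasim, 2005)
print(format_citation(["Kasim", "Levallois"], 2004))  # (Kasim and Levallois, 2004)
print(format_citation(["Kasim", "Levallois", "Author"], 2005))  # (Kasim et al., 2005)
```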
All references cited in the body of your paper must be listed in your References
section, and all references in the list must be cited in the text. Sources not actually cited
should not be included in the Reference section. (This is different from a bibliography, in
which you list everything you read, whether or not you actually cited it in your paper).

How to list and write the cited references in the reference section?
References should be listed in alphabetical order, according to the first author's
last name. All types of references should be lumped together before you alphabetize. Do
not make separate lists for books, articles, etc. Works by the same person should be
arranged chronologically by the date of publication.
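The ordering rule above can be sketched in Python (the records below are invented for illustration and use a minimal made-up format, not a standard bibliographic one):

```python
# Each reference is a (first_author_last_name, year, full_entry) tuple.
refs = [
    ("Smith", 2003, "Smith J. ..."),
    ("Kasim", 2005, "Kasim K. ..."),
    ("Kasim", 2004, "Kasim K. ..."),
]

# Alphabetical by the first author's last name, then chronological for
# works by the same author, as described above.
refs.sort(key=lambda r: (r[0].lower(), r[1]))

print([(author, year) for author, year, _ in refs])
# [('Kasim', 2004), ('Kasim', 2005), ('Smith', 2003)]
```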

Appendix
Appendices contain supplemental information such as lists of terms, definitions, or
questionnaires that are useful but not essential to the body of the research paper. If you
have a large table of raw data, but most of it is not essential to the discussion in your
paper, you could include the complete table as an appendix. A smaller table with a subset
of data (or a summary of the data) could then be included in the body of your paper. If
you have more than one set of materials to include, give each a number: Appendix 1,
Appendix 2, etc.

Submission of your paper to a scientific journal
Before submitting your paper for publication in a scientific journal, you should visit the
website of the chosen journal to learn its requirements for accepting your paper for
review and eventual publication. To this end, many
journals require the above listed sections, submitted in the order listed, each section to
start on a new page. There are variations of course. Some journals call for a combined
results and discussion, for example, or include materials and methods after the body of
the paper etc. Generally, your paper should be neatly typed (double-spaced using 12 point
font, and printed with a letter quality printer), and carefully edited. Each section of the
paper should be clearly labeled with a section title.
When you decide to submit your Randomized Clinical Trial (RCT) to a relevant
international journal, you should follow the CONSORT statement, which includes a
22-item checklist and a flow diagram, while writing the manuscript that will be
reviewed by the journal's peer reviewers. Following the CONSORT statement will
increase the chance that your paper is accepted for publication in that journal. You
can find more details about the usefulness of the CONSORT statement at these web sites:
http://www.pubmedcentral.nih.gov/articlerender.
http://www.biomedcentral.com/1471-2288/1/2

How to write a research proposal
The purpose of a proposal is to state clearly and convincingly a research question
and the means you will use to go about answering it. A research study, like the human
body, has its own anatomical parts (structure). Each of these parts performs a function
(physiology) that adds to the strength and validity of the research.
Anatomy of the research:
The structure of the research is set out in its proposal (protocol), the written plan
of the study. A protocol helps the investigator organize the proposed research in a
logical, focused, and efficient way; it also helps in seeking grant funds. The
components of a protocol (research structure) are summarized under the
following points:
1. Introduction (background).
2. Materials and methods.
3. Projected Results and Benefits.
4. Time schedule.
5. Budget of the research.

1. Introduction:
State immediately the specific research project you want to investigate and then
relate it to a more general context. Present a brief background for the project by telling
the reader that facts A, B, and C are known, but that fact D is not known; your project will
fill in this gap, or will lead to progress in filling in this gap in knowledge (i.e., the study
rationale). Make sure that your central question is clearly stated, and that you have
sufficiently narrowed the focus to be able to answer your question in the time you have
available.

2. Materials and methods:


In this section, you convey the general approach you will take, the study site(s)
you will visit (research setting), the methods you will use to collect the data, and the
supplies and equipment you will need. Make sure you have obtained permission from
homeowners if you are going to use private land as your study site. When writing the
methodology of a research proposal, the following should be described in detail:
I. The research question:
It represents the objective of the study, and is defined as the uncertainty about a health issue
that the investigator wants to resolve. It often starts with a vague and general concern that
must be narrowed down to a concrete, researchable issue.
II. The design of the study:
Designing a study is a complex process that depends on the nature of the research
question. Studies fall into two major types, namely observational and experimental
studies, and each is classified into a number of designs. No one design is always
better than the others; for each research question, a judgment must be made as to which
design is the most efficient way to get a satisfactory answer.
III. Subjects of the study:
There are two major decisions to be made in choosing the study subjects:
1- Specifying the selection criteria (inclusion and exclusion).
2- Sampling.
IV. The variables:
The choice of variables, the characteristics of the study subjects, represents an important
issue in designing any research. In descriptive studies, the investigator looks at individual
variables, one at a time. In analytic studies, however, the investigator analyzes the
relationships among two or more variables in order to predict outcomes and to draw
inferences about cause and effect. In considering the association among two or more
variables, the studied variables will be either predictor (independent) or outcome
(dependent) variable.
In all cases, the variables of the study, whether independent or dependent, should be
defined clearly. It is better to indicate how these variables will be treated during
analysis (i.e., as either categorical or continuous variables). For
categorical variables, especially in analytic studies, the investigator should indicate the
reference category. Confounding variables should be also defined and presented in the
same manner as dependent and independent variables.

V. Statistical methods:
The study protocol should include a plan for managing and analyzing the study data. This
plan should discuss the following three important issues:
- Hypotheses.
The investigator must define clearly the null and alternative hypotheses of the study and
the proposed methods to test these hypotheses. The P value to be used for
statistical inference should also be specified.
- Sample size estimation.
It is very important to report how the sample size was calculated. It may be calculated on
the basis of disease prevalence (cross-sectional and descriptive studies) or exposure
prevalence (case-control studies); when the investigator has no data about either the
disease or the exposure in the studied population, the sample size can be calculated on
the basis of the study power. A sample size that gives a study power of 80% or more is
considered efficient and can give the estimated measures (RR, OR, etc.) high precision.
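As an illustration of the power-based approach, the per-group sample size for comparing two proportions can be sketched with the standard normal-approximation formula (a textbook sketch only, not a substitute for dedicated sample-size software; the 10% versus 20% prevalences below are hypothetical):

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for detecting a difference between two
    proportions p1 and p2, via the normal-approximation formula."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Detecting 10% vs 20% exposure prevalence with 80% power:
print(n_per_group(0.10, 0.20))  # 199 per group
```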
- Analytic approach.
Some details about the statistical methods should be presented. The statistical
software package (statistical program) to be used should also be indicated.

4. Time schedule:
Provide a detailed time schedule of field work, laboratory work, analysis, and writing up
the final report. The time schedule may be presented in a simple time table.

5. Budget of the research (Supplies and equipment needs list):


Be sure to include any items that may need to be purchased.

BIBLIOGRAPHY
1. American Psychological Association. Publication manual of the American
Psychological Association (4th ed.). Washington DC: APA. 1995.
2. Day RA. How to write and publish a scientific paper. ISI Press, Philadelphia. 1983.
3. Day RA and Gastel B. How to write and publish a scientific paper. 5th edition. 2005.
www.Cambridge.org/alerts.
4. Hopkins WG. Guidelines on style for scientific writing. Sportscience 1999;3(1).
5. Houp KW and Pearsall TE. Reporting technical information. Glencoe Press, Beverly
Hills, CA. 1977.
6. Lannon JM. Technical writing. Little, Brown and Company, Boston. 1979.
7. Mali P and Sykes RW. Writing and word processing for engineers and scientists.
McGraw-Hill, New York. 1985
8. Oxman AD, Cook DJ, Guyatt GH. Users' guides to the medical literature. VI. How to
use an overview. JAMA 1994; 272:1367-71.

CHAPTER 13
Reading and Criticizing a Medical Research Paper
While reading a medical research paper, first try to work out the meaning of all the
words and phrases, although a few technical terms in the methods section might remain
unclear. Now go back and read the whole paper, section by section, for comprehension.
Effective comprehension of what you read will enable you to comment on and criticize
the paper scientifically. It is also best to comment on and criticize the paper section by section.

1. Comprehension of the paper


In the introduction section, note how the context is set. What larger question is this
a part of? The author should summarize and comment on previous research, and you
should distinguish between previous research and the actual current study. What is the
hypothesis of the paper, and in what ways will it be tested?
In the materials and methods section, try to get a clear picture of what was done at
each step. What was actually measured? It is a good idea to make an outline and/or
sketch of the procedures and instruments. Keep notes of your questions; some of them
may be simply technical, but others may point to more fundamental considerations that
you will use for reflection and criticism below.
In the results section, look carefully at the figures and tables, as they are the heart
of most papers. A scientist will often read the figures and tables before deciding whether
it is worthwhile to read the rest of the article! What does it mean to “understand” a figure?
You understand a figure when you can redraw it and explain it in plain English words.
The Discussion contains the conclusions that the author would like to draw from
the data. In some papers, this section has a lot of interpretation and is very important. In
any case, this is usually where the author reflects on the work and its meaning in relation
to other findings and to the field in general.

2. Reflection and criticism


After you understand the research paper and can summarize it, then you can return
to broader questions and draw your own conclusions. It is very useful to keep track of
your questions as you go along, returning to see whether they have been answered. Here
are some questions that may be useful in analyzing and criticizing various kinds of
research papers:
2.1 Introduction
What is the overall purpose of the research?
How does the research fit into the context of its field? Is it, for example, attempting to
settle a controversy? Show the validity of a new technique? Open up a new field of
inquiry?
Do you agree with the author's rationale for studying the question in this way?
2.2 Methods
Were the measurements appropriate for the questions the researcher was approaching?
Often, researchers need to use “indicators” because they cannot measure something
directly. For example, the researcher can use birth weight of babies to indicate their
nutritional status. Were the measures in this research clearly related to the variables in
which the researchers (or you) were interested?
If human subjects were studied, do they fairly represent the populations under study?
2.3 Results
What is the one major finding?
Were enough of the data presented so that you feel you can judge for yourself how the
experiment turned out?
Did you see patterns or trends in the data that the author did not mention? Were there
problems that were not addressed?
2.4 Discussion
Do you agree with the conclusions drawn from the data?
Are there other factors (selection, information and confounding bias) that could have
influenced, or accounted for, the results?
Are the measured estimates precise?
Are these conclusions over-generalized or appropriately careful?
Does the author indicate how the work should be followed up on? Does the paper
generate new ideas?
What further experiments would you think of, to continue the research or to answer
remaining questions?

BIBLIOGRAPHY
1. Greenhalgh T and Taylor R. How to read a paper: Papers that go beyond numbers
(qualitative research). BMJ 1997;315:740-743.
2. Kinmonth A-L. Understanding and meaning in research and practice. Fam Pract
1995;12:1-2.
3. Laupacis A. Wells G. Richardson WS. Tugwell P. Users' guides to the medical
literature. V. How to use an article about prognosis. JAMA 1994;271:234-7.
4. Oxman AD, Cook DJ, Guyatt GH. Users' guides to the medical literature. VI. How to
use an overview. JAMA 1994;272:1367-71.

CHAPTER 14
Epidemiology Glossary
Accuracy: The combination of study precision and validity.
Absolute risk: The observed or calculated probability of an event in the population under
study.
Absolute risk difference: the difference in the risk for disease or death between an
exposed population and an unexposed population.
Absolute risk reduction (ARR): the difference in the absolute risk (rates of adverse
events) between study and control populations.
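The absolute risk reduction defined above can be shown with a small worked example (all counts below are hypothetical):

```python
def absolute_risk_reduction(events_control, n_control, events_treated, n_treated):
    """Absolute risk reduction: the adverse-event rate in the control
    population minus the rate in the study (treated) population."""
    control_rate = events_control / n_control
    treated_rate = events_treated / n_treated
    return control_rate - treated_rate

# 20/100 adverse events among controls vs 10/100 on treatment:
print(absolute_risk_reduction(20, 100, 10, 100))  # 0.20 - 0.10 = 0.10
```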
Adjustment: A summarizing procedure for a statistical measure in which the effects of
differences in composition of the populations being compared have been minimized by
statistical methods.
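One common form of adjustment is direct standardization, which can be sketched as follows (all rates and population weights below are hypothetical, for illustration only):

```python
def directly_standardized_rate(stratum_rates, standard_weights):
    """Directly standardized rate: each stratum-specific rate weighted
    by the standard population's share of that stratum."""
    return sum(rate * weight
               for rate, weight in zip(stratum_rates, standard_weights))

# Age-specific rates per 1000 (young, old) applied to a standard
# population that is 60% young and 40% old:
print(directly_standardized_rate([2.0, 10.0], [0.6, 0.4]))  # about 5.2 per 1000
```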
Allocation ratio: Relative size of comparison group to study group.
Association: Statistical dependence between two or more events, characteristics, or other
variables. An association may be fortuitous or may be produced by various other
circumstances; the presence of an association does not necessarily imply a causal
relationship.
Bias (Syn: systematic error): Deviation of results or inferences from the truth, or
processes leading to such deviation.
Blind(ed) study (Syn: masked study): A study in which observer(s) and/or subjects are
kept ignorant of the group to which the subjects are assigned, as in an experimental study,
or of the population from which the subjects come, as in a non experimental or
observational study. Where both observer and subjects are kept ignorant, the study is
termed a double-blind study. If the statistical analysis is also done in ignorance of the
group to which subjects belong, the study is sometimes described as triple blind. The
purpose of “blinding” is to eliminate sources of bias.
Case-series: Report of a number of cases of disease.
Case-control study: Retrospective comparison of exposures of persons with disease
(cases) with those of persons without the disease (controls).
Causality: The relating of causes to the effects they produce. Most of epidemiology
concerns causality and several types of causes can be distinguished. It must be
emphasized, however, that epidemiological evidence by itself is insufficient to establish
causality, although it can provide powerful circumstantial evidence.
Co-interventions: Interventions other than the treatment under study that are applied
differently to the treatment and control groups. Co-intervention is a serious problem
when double blinding is absent or when the use of very effective non-study treatments is
permitted.
Cohort study: Follow-up of exposed and non-exposed defined groups, with a
comparison of disease rates during the time covered.
Comparison group: Any group to which the index group is compared. It is usually
synonymous with control group.
Co-morbidity: Coexistence of a disease or diseases in a study participant in addition to
the index condition that is the subject of study.
Confidence interval (CI): The range of numerical values in which we can be confident
(to a computed probability, such as 90 or 95%) that the population value being estimated
will be found. Confidence intervals indicate the strength of evidence; where confidence
intervals are wide, they indicate less precise estimates of effect and vice versa.
Confidence limits: A range of values for the effect estimate within which the true effect
is thought to lie, with the specified level of confidence.
Confounding variable, Confounder: A variable that can cause or prevent the outcome
of interest, is not an intermediate variable, and is associated with the factor under
investigation. A confounding variable may be due to chance or bias. Unless it is possible to
adjust for confounding variables, their effects cannot be distinguished from those of
factor(s) being studied.
Dose-response relationship: A relationship in which change in amount, intensity, or
duration of exposure is associated with a change, either an increase or a decrease, in the risk.
Determinant: Any definable factor that effects a change in a health condition or other
characteristic.
Effectiveness: a measure of the benefit resulting from an intervention for a given health
problem under usual conditions of clinical care for a particular group; this form of
evaluation considers both the efficacy of an intervention and its acceptance by those to
whom it is offered, answering the question, "Does the practice do more good than harm
to people to whom it is offered?"
Efficacy: a measure of the benefit resulting from an intervention for a given health
problem under the ideal conditions of an investigation; it answers the question, "Does the
practice do more good than harm to people who fully comply with the
recommendations?"
Ethics: Epidemiologists are bound by ethical guidelines that dictate justice and fairness in
all aspects of their work. Epidemiologists must consider the fair and just treatment of
their study subjects, as well as accuracy and honesty in statistical analysis, interpretation
and dissemination of study data.
Exclusion Criteria: Conditions which preclude entrance of candidates into an
investigation even if they meet the inclusion criteria.
Experimental event rate (EER): The percentage of the intervention/exposed group who
experienced the outcome in question.
Exposure: Exposure is the generic term used to describe the effective presence of any
agent or factor that is thought to cause disease, e.g. toxic chemicals, dietary habits,
activity levels, microorganisms.
Follow-up: Observation over a period of time of an individual, group, or initially defined
population whose relevant characteristics have been assessed in order to observe changes
in health status or health-related variables.
Healthy worker effect: The lower mortality observed in occupational cohorts compared
with the external comparison population; it is usually attributed to the selection of the
fittest members of the population for employment.
Incidence: The number of new cases of illness commencing, or of persons falling ill,
during a specified time period in a given population.
Information bias: Bias arising from the misclassification of disease or exposure status.
Intention to treat analysis: A method for data analysis in a randomized clinical trial in
which individual outcomes are analyzed according to the group to which they have been
randomized, even if they never received the treatment they were assigned. By simulating
practical experience it provides a better measure of effectiveness (versus efficacy).

Interviewer bias: Systematic error due to interviewer's subconscious or conscious
gathering of selective data.
Lead-time bias: If prognosis study patients are not all enrolled at similar, well-defined
points in the course of their disease, differences in outcome over time may merely reflect
differences in duration of illness.
Likelihood ratio: Ratio of the probability that a given diagnostic test result will be
expected for a patient with the target disorder rather than for a patient without the
disorder.
Odds: A proportion in which the numerator contains the number of times an event occurs
and the denominator includes the number of times the event does not occur.
Odds Ratio (Syn: cross-product ratio, relative odds): A measure of the degree of
association; for example, the odds of exposure among the cases compared with the odds
of exposure among the controls.
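The cross-product calculation behind the odds ratio can be illustrated with a hypothetical 2x2 table (all counts invented for illustration):

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio as the cross-product ratio of a 2x2 table:
    (exposed cases x unexposed controls) / (unexposed cases x exposed controls)."""
    return ((exposed_cases * unexposed_controls)
            / (unexposed_cases * exposed_controls))

# 30 of 100 cases exposed vs 10 of 100 controls exposed:
print(round(odds_ratio(30, 70, 10, 90), 2))  # about 3.86
```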
Power: The likelihood that a study will yield a statistically significant finding in the
expected direction (when there is an actual effect).
Precision: The stability of an estimate of effect, as reflected in its confidence interval or
the range in which the best estimates of a true value approximate the true value.
Predictive value: In screening and diagnostic tests, the probability that a person with a
positive test is a true positive (i.e., does have the disease), or that a person with a negative
test truly does not have the disease. The predictive value of a screening test is determined
by the sensitivity and specificity of the test, and by the prevalence of the condition for
which the test is used.
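The dependence of predictive value on sensitivity, specificity, and prevalence follows from Bayes' theorem, sketched here with made-up test characteristics:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive,
    computed from sensitivity, specificity, and prevalence."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# A 90%-sensitive, 95%-specific test where prevalence is only 1%:
print(round(positive_predictive_value(0.90, 0.95, 0.01), 3))  # about 0.154
```

Note how a fairly accurate test still yields a low predictive value when the condition is rare, which is exactly the point of the definition above.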
Prevalence: The proportion of persons with a particular disease within a given
population at a given time.
Prognosis: The possible outcomes of a disease or condition and the likelihood that each
one will occur.
Prognostic factor: Demographic, disease-specific, or co-morbid characteristics
associated strongly enough with a condition's outcomes to predict accurately the eventual
development of those outcomes. As with risk factors, prognostic factors do not
necessarily imply a cause-and-effect relationship.

Prospective study: Study design where one or more groups (cohorts) of individuals who
have not yet had the outcome event in question are monitored for the number of such
events which occur over time.
Randomized controlled trial: Study design where treatments, interventions, or
enrollment into different study groups are assigned by random allocation rather than by
conscious decisions of clinicians or patients. If the sample size is large enough, this study
design avoids problems of bias and confounding variables by assuring that both known
and unknown determinants of outcome are evenly distributed between treatment and
control groups.
Recall bias: Systematic error due to the differences in accuracy or completeness of recall
to memory of past events or experiences.
Referral filter bias: The sequence of referrals that may lead patients from primary to
tertiary centers raises the proportion of more severe or unusual cases, thus increasing the
likelihood of adverse or unfavorable outcomes.
Relative risk (RR): The ratio of the probability of developing, in a specified period of
time, an outcome among those receiving the treatment of interest or exposed to a risk
factor, compared with the probability of developing the outcome if the risk factor or
intervention is not present.
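A worked example of the relative risk, using hypothetical cohort counts:

```python
def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Relative risk: the risk of the outcome in the exposed group
    divided by the risk in the unexposed group."""
    return (events_exposed / n_exposed) / (events_unexposed / n_unexposed)

# 40 outcomes among 100 exposed vs 20 among 100 unexposed:
print(relative_risk(40, 100, 20, 100))  # 0.40 / 0.20 = 2.0
```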
Relative risk reduction (RRR): The extent to which a treatment reduces a risk, in
comparison with patients not receiving the treatment of interest.
Response rate: The proportion of intended study subjects for whom information was
obtained.
Reproducibility (Repeatability, Reliability): The results of a test or measure are identical
or closely similar each time it is conducted.
Retrospective study: Study design in which individuals who have had the outcome
event in question are identified and analyzed after the outcomes have occurred.
Risk factor: Patient characteristics or factors associated with an increased probability of
developing a condition or disease in the first place.
Selection bias: A bias in assignment that arises from study design rather than by chance.
These can occur when the study and control groups are chosen so that they differ from
each other by one or more factors that may affect the outcome of the study.

Stratification: Division into groups. Stratification may also refer to a process to control
for differences in confounding variables, by making separate estimates for groups of
individuals who have the same values for the confounding variable.
Strength of inference: The likelihood that an observed difference between groups within
a study represents a real difference rather than mere chance or the influence of
confounding factors, based on both p values and confidence intervals. Strength of
inference is weakened by various forms of bias and by small sample sizes.
Surveillance: Surveillance is a systematic method for continuous monitoring of diseases
in a population, in order to be able to detect changes in disease patterns and then to
control them.
Survival curve: A graph of the number of events occurring over time or the chance of
being free of these events over time. The events must be discrete and the time at which
they occur must be precisely known. In most clinical situations, the chance of an outcome
changes with time. In most survival curves the earlier follow-up periods usually include
results from more patients than the later periods and are therefore more precise.
Validity: The extent to which a variable or intervention measures what it is supposed to
measure or accomplishes what it is supposed to accomplish.
The internal validity of a study refers to the integrity of the study design.
The external validity of a study refers to the generalization of study findings
(extrapolation).
Variance: A measure of the stability of the effect estimate which indicates the amount of
variation in the estimate that would be obtained if identical studies were repeated a large
number of times, with only chance variation among the studies.
