Khaled Kasim
Al-Azhar University
All content following this page was uploaded by Khaled Kasim on 08 January 2015.
FIRST EDITION
2010
Preface
This book presents the basic concepts of the essential epidemiologic topics that should be known not only by epidemiologists but also by everyone interested in medical research.
During my Ph.D. study of epidemiology at Laval University, Quebec, Canada, and my work as a postdoctoral researcher at Queen’s University Belfast, NI, UK, I wrote about many epidemiologic topics. I selected some of these topics, modified them to suit all readers concerned with medical research, and added some other important epidemiologic topics to complete this simple work. I have tried to present the selected topics in simple language, without sophisticated statistical details that non-specialists may find hard to understand. The book also presents the knowledge needed to acquire some important research skills, such as how to write, read, and critically appraise a medical research paper.
Of course, this book is in no way a substitute for the standard textbooks; my aim is to help medical researchers understand the new concepts of epidemiology and to aid them as they design, conduct, analyze, and write up their medical research theses and papers.
1. What is Epidemiology?
2. Measures of Disease Frequency
3. Measures of Association and Potential Impact
4. Causality and Causal Inference
5. Study Design
6. Precision and Validity in Epidemiologic Studies
7. Standardization and Adjustment
8. Interaction and Effect Modification
9. Meta-analysis
10. Screening
11. Epidemiologic Surveillance
12. Writing a Medical Research Paper
13. Reading and Criticizing a Medical Research Paper
14. Epidemiology Glossary
Uses of epidemiology
1. Description of health status of populations (i.e., community assessment).
2. Description of the natural history of diseases.
3. Identification of causes.
4. Identification of syndromes (i.e., grouping different manifestations under one syndrome or, alternatively, splitting similar events into different categories).
BIBLIOGRAPHY
1. Doll R and Hill AB. Mortality in relation to smoking: ten years' observations of British doctors. BMJ 1964; 1:1399-1410 and 1460-1467.
2. Last JM. A dictionary of epidemiology. 2nd ed. Oxford, Oxford University Press, 1988.
3. Kannel WB, et al. Factors of risk in the development of coronary heart disease: six-
year follow-up experience: The Framingham study. Ann Intern Med 1961; 55:33-50.
These measures differ considerably, and researchers must be clear about what they are trying to measure. Researchers often find the terms incidence and prevalence used synonymously in the medical literature; this merely reflects a lack of awareness of basic epidemiologic concepts.
Prevalence
The prevalence measure is simply called “prevalence” (P); other terms in use include prevalence rate and prevalence proportion. Prevalence is defined as the proportion of a defined population that has the disease at a specified point in time.
Incidence
Incidence is defined simply as the number of new events (e.g., new cases of a
disease) in a defined population within a specified period of time.
Example: Suppose we wish to know how many people in a given population newly develop diabetes in a certain period of time. Let us say all 1000 people in the population are screened at the start of the calendar year (January) and 10% of them (100 people) are found to be diabetic. This means that 900 people are non-diabetic (at risk) at the start of the year. Suppose these 900 people are screened again for diabetes at the start of the next year (after one year) and 9 are found to be positive. This figure of 1% (9/900, i.e., 10 per 1000) is the one-year incidence of diabetes in this population. Clearly, estimating incidence has a longitudinal (follow-up) component, whereas prevalence has a cross-sectional component. Always note that we cannot estimate incidence by measuring disease at only one point in time. This is one reason why prevalence figures are more readily available than incidence figures.
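The arithmetic of the diabetes example can be written out as a short Python sketch. The function names are illustrative only (they are not from any epidemiology library); the counts are those given in the example.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population that has the disease at one point in time."""
    return existing_cases / population


def cumulative_incidence(new_cases: int, population_at_risk: int) -> float:
    """Proportion of an initially disease-free group that develops the disease during follow-up."""
    return new_cases / population_at_risk


# Diabetes example: 1000 people screened in January.
screened = 1000
diabetic_at_start = 100                      # 10% of 1000
at_risk = screened - diabetic_at_start       # 900 non-diabetic people followed for one year
new_cases_after_one_year = 9

p = prevalence(diabetic_at_start, screened)
ci = cumulative_incidence(new_cases_after_one_year, at_risk)
print(f"Prevalence at baseline: {p:.1%}")           # 10.0%
print(f"One-year cumulative incidence: {ci:.1%}")   # 1.0%
```

Note that the denominator for incidence is the 900 people at risk, not the full 1000, because the 100 existing cases cannot become new cases.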
In general, incidence is usually used to quantify the number of new cases of short-duration, acute illnesses, while prevalence is used to quantify chronic illnesses. Epidemiologically, measures of incidence are more informative than prevalence, and they are particularly useful when causal associations are being explored. Incidence can be calculated in two ways: as an incidence rate or as a cumulative incidence.
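A minimal sketch contrasting the two incidence measures. The incidence rate divides new cases by the person-time at risk, whereas cumulative incidence divides them by the number of people at risk at the start of follow-up. The person-time figure below is an assumption added for illustration: it supposes that each new case of diabetes was diagnosed, on average, halfway through the year and stopped contributing time at risk from then on. The function names are hypothetical.

```python
def incidence_rate(new_cases: int, person_time: float) -> float:
    """New cases per unit of person-time at risk (e.g., per person-year)."""
    return new_cases / person_time


def cumulative_incidence(new_cases: int, population_at_risk: int) -> float:
    """New cases divided by the number of people at risk at the start of follow-up."""
    return new_cases / population_at_risk


# Based on the diabetes example: 900 people at risk, 9 new cases in one year.
# ASSUMPTION: each new case contributes half a year at risk (diagnosed mid-year on average);
# the 891 people who stay non-diabetic contribute one full year each.
new_cases = 9
person_years = 891 * 1.0 + new_cases * 0.5   # 895.5 person-years

rate = incidence_rate(new_cases, person_years)
risk = cumulative_incidence(new_cases, 900)
print(f"Incidence rate: {rate * 1000:.2f} per 1000 person-years")
print(f"Cumulative incidence: {risk * 1000:.2f} per 1000 persons over one year")
```

With short follow-up and a rare outcome the two measures are close; they diverge as follow-up lengthens or as more people leave the population at risk.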
BIBLIOGRAPHY
1. Gerstman BB. Epidemiology kept simple: An introduction to classic and modern epidemiology. 1st Edition. New York: John Wiley and Sons, 1998.
2. Friedman GD. Primer of Epidemiology, 4th Edition. McGraw Hill International Editions, 1994.
3. Last JM. A Dictionary of Epidemiology, 3rd Edition. Oxford University Press, 1995.
4. Rothman KJ and Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia: Lippincott-Raven, 1998.
1. Risk difference: This is also called excess risk. The risk difference is simply the difference in incidence between those who are exposed and those who are unexposed. For example, if the incidence of lung cancer among smokers is 10 per 1000 and the incidence among non-smokers is 1 per 1000, then the risk difference is 10 per 1000 minus 1 per 1000 = 9 per 1000, which means an excess risk of 9 per 1000 among smokers. This measure, however, does not express the relative magnitude (strength) of the risk; it only tells us the absolute excess risk between the compared groups. To express the strength of the association, it is better to use the relative risk or the odds ratio.
2. Risk Ratio (Relative Risk): This is simply the ratio of the incidence among the exposed to the incidence among the unexposed. In the smoking example, this is 10 per 1000 divided by 1 per 1000 = 10. In simple language, a relative risk (RR) of 10 means that smokers have ten times the risk of developing lung cancer compared with non-smokers.
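Both measures in the smoking example reduce to one subtraction and one division. The sketch below uses the incidence figures from the text; the helper names are hypothetical.

```python
def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    """Absolute excess risk in the exposed group."""
    return risk_exposed - risk_unexposed


def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Relative risk: the risk in the exposed as a multiple of the risk in the unexposed."""
    return risk_exposed / risk_unexposed


# Lung cancer incidence from the smoking example.
risk_smokers = 10 / 1000
risk_nonsmokers = 1 / 1000

rd = risk_difference(risk_smokers, risk_nonsmokers)
rr = risk_ratio(risk_smokers, risk_nonsmokers)
print(f"Risk difference: {rd * 1000:.0f} per 1000")   # 9 per 1000
print(f"Risk ratio: {rr:.1f}")                         # 10.0
```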
BIBLIOGRAPHY
1. Brennan P and Croft P. Interpreting the results of observational research: chance is not such a fine thing. BMJ 1994; 309:727-730.
Henle-Koch's Postulates
1. The agent should be present in every case of the disease under appropriate
circumstances.
2. The agent should not be present in any other disease.
3. The agent must be isolated from the body of the diseased individual in pure
culture, and it should induce disease in another susceptible animal.
It is quite clear that the Henle-Koch postulates are not compatible with the current multifactorial model of causation (i.e., the multiple causation theory of disease, which applies particularly to non-communicable diseases, where a single agent rarely exists). Thus, these postulates are rarely used in practice.
1. Strength of the association: Strong associations are more likely to be causal than
weak associations. Weak associations are more likely to be explained by
undetected biases. The association between smoking and lung cancer, for which several observational studies have reported large relative risks, is often used as an example of this criterion. Note that, while this criterion is reasonable, it does not rule out the possibility that a weak association is causal.
4. Temporality: This criterion denotes the sequence of events with regards to time.
It is an absolute necessity for a causal association; the cause must precede the
effect. In case-control studies, however, it is often not known for certain whether the exposure preceded the disease under study. Therefore, temporality is best established in prospective studies, where the exposure is known to precede the occurrence of disease.
7. Coherence: Coherence implies that the association does not conflict with current
knowledge about the disease (its natural history, biology, etc.). For example, the
knowledge that smoking damages bronchial epithelium is compatible with the
association between smoking and lung cancer.
BIBLIOGRAPHY
1. Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C. Estimating the population attributable risk for multiple risk factors using case-control data. American Journal of Epidemiology 1985; 122:904-914.
2. Hill AB. The environment and disease: association or causation? Proc R Soc Med 1965;
3. Rothman KJ. Causal Inference. Boston: Epidemiology Resources, 1988.
4. Susser M. What is a cause and how do we know one? A grammar for pragmatic epidemiology. American Journal of Epidemiology 1991; 133:635-648.
Figure: Schematic comparison of study designs (prospective, case-control, cross-sectional, and ecological).
2. Ecological studies
Ecological studies are also weak designs. Here the units of study are populations
rather than individuals. For example, when coronary heart disease (CHD) prevalence
rates were compared between different countries, it was found that the CHD prevalence
rate was very low in countries with low mean serum cholesterol, while it was very high in
countries with high mean serum cholesterol. This ecological link paved the way for
intensive investigation into the association between serum cholesterol and CHD.
Another example is the ecological link between malaria and sickle cell: the sickle cell trait is common in populations living in areas where malaria is (or was) endemic, an observation that suggested the trait protects against malaria. Also, the association between smoking and lung cancer was supported by an ecological link between the sexes: men, who smoked more than women, had higher lung cancer rates.
Ecological studies can be useful in generating hypotheses, but causal inferences cannot be drawn from them; an apparent ecological link may not be a true link, because it could be confounded by several other factors. As the units of study are populations rather than individuals, it is difficult to control for the associated confounding factors.
4. Cohort Studies
Cohort studies are considered the strongest of all observational designs. A cohort study is conceptually very straightforward: the idea is to measure and compare the incidence of disease in two or more study cohorts. In epidemiology, a cohort is a group of people who share a common experience, condition, or characteristic. For example, a birth cohort shares the same year of birth, a cohort of industrial workers shares the same exposure in the working environment, and a cohort of smokers has smoking as the common experience.
In cohort studies, there are usually an exposed and an unexposed cohort. The exposed cohort consists of individuals who have been exposed to some event or condition, and the unexposed cohort consists of individuals who have not been exposed to it. For example, in the classic cohort study of smoking and lung cancer among British doctors, the exposure factor was smoking. A cohort of smokers and a cohort of nonsmokers were followed up for a long period of time, and the incidence of lung cancer was measured and compared between the exposed and unexposed cohorts. Normally, an effort is made to match both cohorts with respect to age, sex, and other important variables, so that the only key difference between the two cohorts is the exposure (smoking). In that situation, the cohorts are similar in everything except exposure, which strengthens the study because confounding by the matched variables is unlikely. If, however, the researcher also matched on exposure status (smoking), the cohort study would be destroyed, because there would be nothing left to compare.
2. Randomized allocation:
Once an eligible subject has agreed to participate in the trial, it is important that assignment to the treatment or control group is done in a manner that is free of any selection bias. To avoid bias, neither the patient nor the physician should be aware of the group to which the patient will be assigned.
3. Blinded intervention:
The aim of blinding is to ensure that outcome ascertainment is done without any
bias. Blinding is logistically difficult but essential. Some authors use the word "masking"
instead of blinding. A single-blinded trial is one in which the patient is not informed of
the treatment assignment. A double-blinded trial is one in which neither the patient nor
the physician responsible for the treatment is informed of the treatment assignment.
Sometimes the investigator who analyzes the data is also unaware of the treatment assignments; this is called a triple-blind design. Reports of RCTs should ideally state how successful the blinding was. Sometimes, known adverse effects of drugs may unblind the physician (e.g., bradycardia due to beta blockers). Ideally, data collection, measurement, reading, and classification procedures on individual patients should be carried out by persons who are completely blinded. For instance, if chest radiographs have to be read, the films can be sent to another site where they are read by radiologists who know nothing about the patients or their treatment groups.
As far as possible, outcomes chosen should be objective and clinically relevant.
Outcomes should be capable of being observed in a blinded fashion. For instance, pain is
a very subjective outcome and difficult to measure in a blinded fashion. On the other
hand, if the outcome is a biochemical parameter, then it is objective and can be easily
measured in a blinded fashion.
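The text does not say how the random allocation sequence is generated. One common option, shown here as an assumption rather than as the author's method, is permuted-block randomization: participants are allocated in small blocks, each containing equal numbers of treatment and control assignments in random order, which keeps the group sizes balanced throughout recruitment. The function name, block size, and seed are hypothetical.

```python
import random
from typing import List, Optional


def block_randomization(n_participants: int, block_size: int = 4,
                        seed: Optional[int] = None) -> List[str]:
    """Return a treatment ('T') / control ('C') allocation list built from permuted blocks.

    Each block holds equal numbers of 'T' and 'C' in a random order.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")
    rng = random.Random(seed)
    allocation: List[str] = []
    while len(allocation) < n_participants:
        block = ["T"] * (block_size // 2) + ["C"] * (block_size // 2)
        rng.shuffle(block)           # random order within the block
        allocation.extend(block)
    return allocation[:n_participants]


# Generated in advance and held by someone not involved in recruiting patients.
schedule = block_randomization(20, block_size=4, seed=2010)
print(schedule)
print("Treatment:", schedule.count("T"), "Control:", schedule.count("C"))
```

Keeping the list away from the recruiting physicians is what prevents them from knowing the next assignment. With a fixed, small block size the last assignment in a block can sometimes be deduced, so larger or varying block sizes may be preferred in practice.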
Figure: The sample, the target population, and the external population.
Sometimes, the entire population will be sufficiently small, and the researcher can
include the entire population in the study. This type of research is called a census study
because data is gathered on every member of the population. Usually, the population is
too large for the researcher to attempt to survey all of its members. Instead, a small but carefully chosen sample can be used to represent the population. The sample should reflect the characteristics of the population from which it is drawn.
Sampling methods are broadly classified as either probability or nonprobability methods. In probability samples, each member of the population has a known, non-zero probability of being selected (in simple random sampling, each member has the same chance of being selected). Probability methods include simple random sampling, systematic sampling, stratified sampling, and cluster sampling. In nonprobability sampling, however, members are selected from the population in some nonrandom manner. These methods include convenience sampling, judgment sampling, quota sampling, and snowball sampling. The advantage of probability sampling is that the sampling error can be calculated. In nonprobability sampling, the degree to which the sample differs from the population remains unknown.
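A minimal sketch of two of the probability methods listed above, simple random sampling and systematic sampling, using Python's standard `random` module. The sampling frame of ID numbers and the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def simple_random_sample(frame: Sequence[int], n: int, seed: Optional[int] = None) -> List[int]:
    """Every member has the same chance of selection; sampling without replacement."""
    return random.Random(seed).sample(list(frame), n)


def systematic_sample(frame: Sequence[int], n: int, seed: Optional[int] = None) -> List[int]:
    """Select every k-th member after a random start within the first interval."""
    k = len(frame) // n                        # sampling interval
    start = random.Random(seed).randrange(k)   # random start in [0, k)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical sampling frame: ID numbers 1..1000.
frame = list(range(1, 1001))
srs = simple_random_sample(frame, 50, seed=1)
sys_sample = systematic_sample(frame, 50, seed=1)
print(sorted(srs)[:10])
print(sys_sample[:10])
```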
3. Stratified sampling
It is a commonly used probability method that is often superior to simple random sampling because it can reduce sampling error. In stratified sampling, the population is first divided into mutually exclusive sub-populations known as strata. Then, different sampling
4. Cluster sampling
In cluster sampling, either all elements from each selected cluster can be included
in the sample, or a sub-selection can be made from within the selected clusters. The
former case is called one-stage cluster sampling while the latter is known as two-stage
cluster sampling. An example of cluster sampling would be a study of work conditions
where the firms/enterprises are selected first and their employees are then selected for the
examination. A firm is a cluster and if all its employees are examined, this will be a one-
stage cluster design, but if only some of the employees are selected, the design will be a
two-stage cluster sampling. The process could comprise even more stages than this,
depending on the population structure.
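The following sketch illustrates stratified sampling and one- and two-stage cluster sampling with Python's `random` module. Because the description of how the strata are sampled is incomplete in the text, the stratified function assumes proportional allocation (each stratum's share of the sample equals its share of the population). The firms, employees, urban/rural strata, and function names are all hypothetical.

```python
import random
from typing import Dict, List, Optional


def stratified_sample(strata: Dict[str, List[str]], n: int,
                      seed: Optional[int] = None) -> Dict[str, List[str]]:
    """Simple random sample within each stratum, with n allocated in proportion to stratum size.

    Rounding can make the total differ slightly from n.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    return {name: rng.sample(members, round(n * len(members) / total))
            for name, members in strata.items()}


def cluster_sample(clusters: Dict[str, List[str]], n_clusters: int,
                   per_cluster: Optional[int] = None,
                   seed: Optional[int] = None) -> Dict[str, List[str]]:
    """One-stage if per_cluster is None (everyone in each selected cluster);
    two-stage if per_cluster is given (a random sub-sample within each selected cluster)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    if per_cluster is None:
        return {c: list(clusters[c]) for c in chosen}
    return {c: rng.sample(clusters[c], min(per_cluster, len(clusters[c]))) for c in chosen}


# Hypothetical firms (clusters) with 30 employees each, echoing the work-conditions example.
firms = {f"firm{i}": [f"firm{i}-emp{j}" for j in range(30)] for i in range(10)}
one_stage = cluster_sample(firms, n_clusters=3, seed=7)
two_stage = cluster_sample(firms, n_clusters=3, per_cluster=5, seed=7)

# Hypothetical strata: 600 urban and 400 rural residents.
strata = {"urban": [f"u{i}" for i in range(600)], "rural": [f"r{i}" for i in range(400)]}
by_stratum = stratified_sample(strata, n=100, seed=3)
print({k: len(v) for k, v in by_stratum.items()})   # {'urban': 60, 'rural': 40}
```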
3. Quota sampling
This sampling method is the nonprobability equivalent of stratified sampling. As in stratified sampling, the researcher first identifies the strata and their proportions as they are represented in the population. Then convenience or judgment sampling is used to select the required number of subjects from each stratum. This differs from stratified sampling, in which the strata are filled by random sampling.
4. Snowball sampling
It is a special nonprobability method used when the desired sample characteristic
is rare. It may be extremely difficult or cost prohibitive to locate respondents in these
situations. Snowball sampling relies on referrals from initial subjects to generate
additional subjects. While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself reduces the likelihood that the sample will represent a good cross-section of the population.
Overmatching
There are at least three forms of overmatching. The first refers to overmatching that harms statistical efficiency, such as, in a case-control study, matching on a variable that is associated with the exposure but not with the disease. The second refers to matching that harms validity, such as matching on an intermediate factor between exposure and disease. The third refers to matching that harms cost efficiency.
Overmatching bias
Matching on factors that are affected by the disease or by the exposure is rarely justified and can bias the results. Matching on a factor that is affected by the exposure but unrelated to the disease in any way will simply reduce statistical efficiency. Matching on intermediate factors, such as symptoms or signs of the exposure or of the disease, will bias both the crude and the adjusted effect estimates. As a general rule, two questions should be clearly answered before matching: (i) what are the benefits, and (ii) what are the harms of matching for the study results? Also, one should never match on factors that are affected by the exposure only.
Minimizing Harm:
This principle is directly related to harms-benefits analysis. It is the duty to avoid, prevent, or minimize harms to others. Research subjects must not be subjected to unnecessary risks of harm.
Maximizing Benefit:
Another principle related to the harms and benefits of research is beneficence. The
principle of beneficence imposes a duty to benefit others and, in research ethics, a duty to
maximize net benefits. Human research is intended to produce benefits for subjects themselves, for other individuals or society as a whole, or for the advancement of knowledge.
BIBLIOGRAPHY
1. Beaglehole R, Bonita R, Kjellstrom T. Basic Epidemiology. World Health
Organization (WHO), 1993.
2. Friedman LM et al. Fundamentals of Clinical Trials. Boston: John Wright, 1985.
3. Gerstman BB. Epidemiology kept simple: An introduction to classic and modern epidemiology. 1st Edition. New York: John Wiley and Sons, 1998.
4. Hulley SB. Designing Clinical Research. Baltimore: Williams & Wilkins, 1988.
5. MacMahon B and Trichopoulos D. Epidemiology: Principles & Methods. 2nd Edition. Little Brown and Co, 1996.
6. Meinert CL. Clinical Trials. New York: Oxford University Press, 1986.
7. Rothman KJ, Greenland S. Modern Epidemiology, 2nd Edition. Philadelphia: Lippincott-Raven, 1998.
8. Varmus H and Satcher D. Ethical complexities of conducting research in developing countries. NEJM 1997; 337:1003-1005.
Precision
The precision of an epidemiologic study is determined by the degree to which
random error may contribute to the study results. In sampling contexts, the random error
is called sampling error. Sampling error is the degree to which a sample might differ from
the population. When making inferences about the population, results are reported plus or minus the sampling error. Sampling error gives us some idea of the precision of our statistical estimate. A low sampling error means that there is relatively little variability in the sampling distribution of the estimate. Sampling error is calculated from the standard deviation of the sample: the greater the sample standard deviation, the greater the sampling error. Sampling error is also related to the sample size: the larger the sample, the smaller the error.
Example:
Suppose a sample of 100 subjects from a village is studied and the mean systolic BP is found to be 110 mmHg. Is this the true mean of the entire village population? To answer this question, we need to know what would happen if another sample of 100 were drawn from the same village. The new mean may or may not be 110 mmHg; each time a fresh sample is studied, a different result may be obtained. This variation is because of a) variation within
Confidence interval
Many continuous biological variables are approximately normally distributed, and the central value is the mean. Interestingly, if repeated samples are drawn from a population and the mean (x) is computed for each sample, the means themselves are found to be approximately normally distributed (most samples yield means close to some central value, while a few yield extreme values). The mean of all the sample means will be the true population mean (X). The standard deviation of the sample is denoted by (s). The standard deviation of the distribution of sample means, called the standard error (SE) of the mean, equals the population standard deviation (S) divided by the square root of the sample size (n). Because S is usually unknown, the SE is estimated by dividing the sample standard deviation (s) by the square root of the sample size:

$$SE = \frac{s}{\sqrt{n}} \qquad \text{(when a mean is estimated)}$$

If a proportion (p) is estimated instead of a mean, the SE is obtained by dividing pq (where q = 1 - p) by the sample size (n) and taking the square root of the result:

$$SE = \sqrt{\frac{pq}{n}} \qquad \text{(when a proportion (percentage) is estimated)}$$
In fact, it is an important rule in statistics that even if the underlying population is
not normally distributed, the means of the samples themselves will be approximately
normally distributed if the sample sizes are large enough. This phenomenon is explained
by a statistical theorem called the Central Limit Theorem. It is this rule that permits us to
estimate the population parameter using sample results. It is the fundamental basis for
understanding confidence intervals.
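The two standard-error formulas, together with the usual normal-approximation 95% confidence interval (estimate ± 1.96 SE, which relies on the Central Limit Theorem just described), can be sketched as follows. The inputs are the Nasr City birth-weight sample described in the next paragraph and the 10% diabetes prevalence (n = 1000) from the incidence example; the function names are hypothetical.

```python
import math
from typing import Tuple


def se_mean(sd: float, n: int) -> float:
    """Standard error of a sample mean: s / sqrt(n)."""
    return sd / math.sqrt(n)


def se_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * q / n), with q = 1 - p."""
    return math.sqrt(p * (1 - p) / n)


def ci95(estimate: float, se: float) -> Tuple[float, float]:
    """Approximate 95% confidence interval from the normal distribution: estimate ± 1.96 SE."""
    return estimate - 1.96 * se, estimate + 1.96 * se


# Mean birth weight 3.2 kg, SD 0.4 kg, n = 100.
se_bw = se_mean(0.4, 100)           # 0.04 kg
print(ci95(3.2, se_bw))             # roughly (3.12, 3.28)

# Prevalence p = 0.10 in n = 1000.
se_p = se_proportion(0.10, 1000)    # about 0.0095
print(ci95(0.10, se_p))             # roughly (0.081, 0.119)
```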
Suppose the mean birth weight of babies in Cairo is known to be 3.6 kg, with a standard deviation of 0.5 kg. A sample of 100 babies from Nasr City (a district in Cairo) is found to have a mean birth weight of 3.2 kg and a standard deviation of 0.4 kg. Is the observed difference (3.6 - 3.2 = 0.4 kg) real, or is it due to sampling variability? In other words, is Nasr City truly different from the rest of Cairo? To answer this question, we need to understand hypothesis testing.
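One way to answer the Nasr City question is a one-sample z-test, sketched below. This assumes that the Cairo standard deviation (0.5 kg) can be treated as the known population standard deviation; using the sample standard deviation (0.4 kg) instead would give a standard error of 0.04 kg and a z of -10. The two-sided p-value is computed from the standard normal distribution with `math.erfc`; the function name is hypothetical.

```python
import math
from typing import Tuple


def one_sample_z_test(sample_mean: float, pop_mean: float,
                      pop_sd: float, n: int) -> Tuple[float, float]:
    """Two-sided z-test of a sample mean against a known population mean and SD."""
    se = pop_sd / math.sqrt(n)                    # standard error of the mean
    z = (sample_mean - pop_mean) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = one_sample_z_test(sample_mean=3.2, pop_mean=3.6, pop_sd=0.5, n=100)
print(f"z = {z:.1f}, two-sided p = {p:.1e}")
```

A z of about -8 corresponds to a vanishingly small p-value, so chance alone is a very unlikely explanation for the 0.4 kg difference.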
Hypothesis testing is a method by which we determine how likely it is that observed differences in data are entirely due to sampling variability (chance) rather than to real differences.
Confounding bias
Confounding can be thought of as a mixing of the effect of the exposure being studied with the effects of other factor(s) on the risk of the studied outcome. These factors are known as confounders. A confounder, if not adequately controlled in the design or analysis, may bias the exposure-disease association, moving it either toward or away from the null relative to the true effect. In extreme situations, confounding may even reverse the apparent direction of an effect. Confounding occurs when the exposed and non-exposed subpopulations of the source population have different background disease risks; such differences are caused by confounders. Similar problems may also occur in randomized trials, because randomization may fail, leaving the treatment groups with different characteristics when they enter the study, and because of differential loss to follow-up and non-compliance with the prescribed treatment across treatment groups.
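The mixing of effects described above can be seen in a small hypothetical cohort stratified by age. Within each age stratum the exposure has no effect, but because the exposure is concentrated in the older (higher-risk) stratum, the crude risk ratio is far from 1. The Mantel-Haenszel summary risk ratio, a standard technique for stratified analysis that this section does not itself describe, removes the distortion. All counts and function names are invented for illustration.

```python
from typing import List, Tuple

# Each stratum: (cases_exposed, total_exposed, cases_unexposed, total_unexposed)
Stratum = Tuple[int, int, int, int]


def risk_ratio(a: int, n1: int, c: int, n0: int) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    return (a / n1) / (c / n0)


def crude_risk_ratio(strata: List[Stratum]) -> float:
    """Collapse all strata into one 2x2 table, ignoring the third factor."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return risk_ratio(a, n1, c, n0)


def mantel_haenszel_risk_ratio(strata: List[Stratum]) -> float:
    """Summary risk ratio: sum(a * n0 / T) / sum(c * n1 / T), where T is the stratum total."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


# Risk is 1% in the young and 10% in the old, whatever the exposure status,
# but 90% of the exposed are old and 90% of the unexposed are young.
young = (1, 100, 9, 900)
old = (90, 900, 10, 100)
strata = [young, old]

print("Stratum-specific RRs:", [round(risk_ratio(*s), 2) for s in strata])   # [1.0, 1.0]
print("Crude RR:", round(crude_risk_ratio(strata), 2))                         # 4.79
print("Mantel-Haenszel RR:", round(mantel_haenszel_risk_ratio(strata), 2))    # 1.0
```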
The following three preconditions are necessary for a factor to be a confounder.
1. A confounder is a factor that is predictive of disease in the absence of the
exposure under study.
Selection Bias
Whereas confounding generally involves biases that are inherent in the source
population, selection bias involves biases arising from the procedures by which the study
participants are chosen from the source population. Thus, selection bias is not an issue in
a cohort study involving complete recruitment and follow-up because in this instance the
study cohort comprises the entire source population. However, selection bias can occur if
participation in the study or follow-up is incomplete.
Examples of selection bias in epidemiologic studies:
i. Non-response bias: This type of bias occurs when eligible study subjects refuse to participate or discontinue participation once the study is in progress. Withdrawal bias and loss-to-follow-up bias are common examples of this form of bias. It is pertinent to note that if the non-respondents are known to have the same sociodemographic characteristics as the respondents, selection bias is likely to have only a minimal effect on the study results. Conventionally, a response rate of more than 75% is considered acceptable in case-control studies, but some suspicion that bias has affected the findings still remains. In cohort studies, however, losing even a few subjects to follow-up raises the suspicion of selection bias, because the incidence of disease among them is not known.
ii. Prevalence-incidence bias: This form of bias occurs when prevalent cases are used to study disease etiology. Prevalent cases are more likely to be long-term survivors and, therefore, may represent a relatively mild form of the disease.
Information bias
Information bias is the result of misclassification of study participants with respect
to disease or exposure status. Thus, the concept of information bias refers to those people
actually included in the study. In contrast, selection bias refers to the selection of the
study participants from the source population, and confounding generally refers to non-comparability of subgroups within the source population. Information bias is also called misclassification bias, especially when the data are dichotomous. Sources of information bias include defective measurement devices, questionnaires and interviews that do not measure what they are intended to measure, inaccurate diagnostic procedures, and incomplete or erroneous data sources. It is customary to distinguish two types of misclassification: non-differential and differential.
Non-differential misclassification
Non-differential misclassification occurs when the probability of misclassification is the same for both groups being compared. This arises if exposed and non-exposed persons are equally likely to be misclassified according to disease outcome, or if diseased and non-diseased persons are equally likely to be misclassified according to exposure. Non-differential misclassification of exposure usually, although not always, biases the relative risk estimate towards the null value of 1.0. As a result, non-differential information bias may produce false-negative findings, which should be kept in mind when interpreting studies that find a negligible association between exposure and disease.
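A hypothetical case-control example shows how non-differential misclassification pulls an estimate toward the null. Exposure is measured with the same sensitivity (0.8) and specificity (0.9) in cases and controls; these values, the counts, and the function names are assumptions for illustration. The sketch uses the odds ratio, the usual measure of association in case-control studies.

```python
from typing import Tuple


def misclassify(exposed: float, unexposed: float,
                sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Expected numbers classified as exposed / unexposed, given the true counts and the
    sensitivity and specificity of the exposure measurement."""
    classified_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    classified_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return classified_exposed, classified_unexposed


def odds_ratio(case_exp: float, case_unexp: float, ctrl_exp: float, ctrl_unexp: float) -> float:
    """Exposure odds ratio for a 2x2 case-control table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


# Hypothetical true exposure counts (exposed, unexposed).
cases = (300, 200)
controls = (200, 300)
true_or = odds_ratio(*cases, *controls)                                 # 2.25

# Non-differential error: identical sensitivity and specificity in cases and controls.
sensitivity, specificity = 0.8, 0.9
observed_cases = misclassify(*cases, sensitivity, specificity)          # about (260, 240)
observed_controls = misclassify(*controls, sensitivity, specificity)    # about (190, 310)
observed_or = odds_ratio(*observed_cases, *observed_controls)

print(f"True OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

The observed odds ratio (about 1.77) lies between the true value and 1.0, the pattern described in the text.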
Differential misclassification
Differential misclassification occurs when the probability of misclassification of exposure is different in diseased and non-diseased persons, or the probability of misclassification of disease is different in exposed and non-exposed persons.
BIBLIOGRAPHY
1. Ahlbom A and Steineck G. Aspects of misclassification of confounding factors.
American Journal of Industrial Medicine 1992; 21:107-112.
2. Breslow NE and Day NE. Statistical methods in cancer research. Vol I. The analysis of
case-control studies. Lyon: IARC Scientific Publication No. 32. 1980.
3. Chavance M, Dellatolas G, Lellouch J. Correlated nondifferential misclassification of
disease and exposure. International Journal of Epidemiology 1992; 21: 537-46.
4. Choi BC. Definition, sources, magnitude, effect modifiers, and strategies of reduction
of the healthy worker effect. Journal of Occupational Medicine 1992; 34:979-988.
5. Gardner MJ and Altman DG. Statistics with Confidence: confidence intervals and statistical guidelines. BMJ Publications, 1989.
6. Gardner MJ and Altman DG. Confidence intervals rather than P values. BMJ 1986; 292:746-50.
7. Greenland S and Robins JM. Confounding and misclassification. American Journal of Epidemiology 1985; 122:495-506.
8. Miettinen OS. Confounding and effect modification. American Journal of Epidemiology 1974; 100:350-353.
9. Monson RR. Occupational epidemiology, 2nd Ed. Boca Raton, Florida: CRC Press, 1990.
10. Robins JM. The control of confounding by intermediate variables. Stat Med 1989; 8:679-701.
11. Rothman KJ, Greenland S. Modern Epidemiology, 2nd Edition. Philadelphia: Lippincott-Raven, 1998.
From this table, the crude death rates are 18/1000 and 17/1000 for populations A and B, respectively. However, on looking carefully at this table, we observe that the distribution of the population across the age groups is not the same in the two populations, and this might confound the comparison of the crude death rates. To overcome this problem, we must adjust (standardize) the death rates for age, which is the confounding factor in this example. Because the age-specific death rates are known for both populations, we use the direct method of standardization. First, we take population A as the reference population; then we apply the number of people in each of its age groups to the age-specific death rate of the corresponding age group in population B.
The adjusted death rate in population B is calculated using the following formula:

$$\text{Adjusted rate}_B = \frac{\sum_i r_{B,i} \, N_{A,i}}{\sum_i N_{A,i}}$$

where Σ denotes summation over the age groups i, $r_{B,i}$ is the age-specific death rate of age group i in population B, and $N_{A,i}$ is the number of people in that age group in population A (so the denominator is the total population of A, 2000). Hence

$$\text{Adjusted rate}_B = \frac{\frac{10}{1000} \times 400 + \frac{10}{1000} \times 600 + \frac{45}{1000} \times 1000}{2000} = \frac{4 + 6 + 45}{2000} = \frac{55}{2000} = \frac{27.5}{1000}$$

Because population A is itself the reference, its age-adjusted rate equals its crude rate (18/1000). After adjustment for age, therefore, the death rate is higher in population B (27.5/1000) than in population A (18/1000).
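The direct method amounts to a weighted average of population B's age-specific rates, with population A's age structure as the weights. A minimal sketch using the numbers above (the function name is hypothetical):

```python
from typing import Sequence


def direct_adjusted_rate(study_rates: Sequence[float], reference_pop: Sequence[int]) -> float:
    """Apply the study population's age-specific rates to the reference population's age structure."""
    expected_deaths = sum(rate * pop for rate, pop in zip(study_rates, reference_pop))
    return expected_deaths / sum(reference_pop)


rates_B = [10 / 1000, 10 / 1000, 45 / 1000]   # age-specific death rates in population B
pop_A = [400, 600, 1000]                      # age structure of the reference population A

adjusted_B = direct_adjusted_rate(rates_B, pop_A)
print(f"Age-adjusted death rate in B: {adjusted_B * 1000:.1f} per 1000")   # 27.5 per 1000
```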
2. Indirect method
The indirect method of standardization is used when we have no data about the
age specific rate in one or the two populations being compared, but the number of
population in each age group should be known. In this condition, we borrow the age
specific death rates from a reference (standard) population and applying it to the number
of population in each corresponding age group of the compared populations to obtain
what is called the expected rates. Finally, we divide the observed rate by this calculated
expected rate and multiply by 100 to obtain the standardized ratio and it is known as
standardized mortality ratio (SMR) when we deal with deaths. The adjusted rate using
this indirect method is based on multiplying the crude rate in the study population by
SMR ratio. The formulas summarising this method are:
aR (indirect) = cR x SMR
SMR = O/E
E = ∑ Ri ni Where
aR : adjusted rate
cR : crude rate
SMR : Standardized mortality ratio.
O: observed number of events in the study population.
E : expected number of events in the study population.
∑ : summation.
Ri : the rate in ith stratum of the standard population.
Ni : the number of population in the ith stratum of the study population.
Example: We use the previous table (Table 7.1), without the number of deaths in each
age group categories and accordingly without any data about the age specific rates.
In this example, we must borrow a third standard (reference) population with a known
age specific death rate to calculate the SMR (Table 7.3).
Table 7.3. Age adjusted death rate of a hypothetical reference population.
Age group No. of No. of Age specific
categories population deaths death rate
0- 5000 15 3/1000
10- 20000 260 18/1000
60- 10000 500 50/1000
Using the data from this hypothetical table, we can calculate the SMR in the
population A & B as follow:
The expected death rate in population A = 3/1000 X 400 + 18/1000 X 600 + 50/1000 X
1000 = 1.2 + 10.8 + 50 = 62.
The SMR in population A = 36 / 62 X 100 = 58 %.
The expected death rate in population B = 3/1000 X 600 + 18/1000 X 1000 + 50/1000
X 400 = 1.8 + 18 + 20 = 39.8.
The SMR in population B = 34 /39.8 X 100 = 85 %.
Simply, since the SMR in population B (85%) is higher than that in population A
(58%), we can conclude that the risk of death is higher in population B. Similarly, when
we multiply the crude rate of each population by its SMR, the adjusted rate of
population B (17/1000 × 0.85, about 15/1000) is higher than that of population A
(18/1000 × 0.58, about 10/1000).
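The indirect method can be checked with a short Python sketch that reproduces the worked example. It assumes, as the calculations above imply, that each population consists only of the three age groups shown (2,000 persons each), so that the crude rate is the total observed deaths divided by the total number of persons; Table 7.1 itself is not reproduced here, and the names STANDARD_RATES, POPULATIONS and indirect_standardization are illustrative only.

```python
# Indirect standardization for the worked example (Tables 7.1 and 7.3).
# Assumption: each population consists only of the three age groups shown,
# so its crude rate = total observed deaths / total persons.

# Age-specific death rates R_i of the standard (reference) population, Table 7.3
STANDARD_RATES = {"0-": 3 / 1000, "10-": 18 / 1000, "60-": 50 / 1000}

# Study populations: persons n_i per age group and total observed deaths O
POPULATIONS = {
    "A": {"sizes": {"0-": 400, "10-": 600, "60-": 1000}, "observed": 36},
    "B": {"sizes": {"0-": 600, "10-": 1000, "60-": 400}, "observed": 34},
}


def indirect_standardization(sizes, observed, standard_rates):
    """Return (E, SMR as a ratio, crude rate, indirectly adjusted rate)."""
    expected = sum(standard_rates[group] * n for group, n in sizes.items())  # E = sum of R_i * n_i
    smr = observed / expected                     # SMR = O / E
    crude_rate = observed / sum(sizes.values())   # cR
    adjusted_rate = crude_rate * smr              # aR = cR x SMR
    return expected, smr, crude_rate, adjusted_rate


for name, pop in POPULATIONS.items():
    e, smr, cr, ar = indirect_standardization(pop["sizes"], pop["observed"], STANDARD_RATES)
    print(f"Population {name}: E = {e:.1f}, SMR = {smr * 100:.0f}%, "
          f"crude rate = {cr * 1000:.1f}/1000, adjusted rate = {ar * 1000:.1f}/1000")
```

Note that the SMR enters the adjusted-rate formula as a ratio (0.58, 0.85); multiplying by 100 is only for reporting it as a percentage.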
BIBLIOGRAPHY
1. Ahlbom A, Norell S. Introduction to modern epidemiology. 2nd Edition. Epidemiology
Resources Inc. 1990; pp. 30-35.
2. Gerstman BB. Epidemiology kept simple: An introduction to modern epidemiology.
New York: John Wiley & Sons, Inc., 1998.
3. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective
studies of disease. Journal of the National Cancer Institute 1959; 22:719-748.
4. Miettinen OS. Confounding and effect modification. American Journal of
Epidemiology 1974; 100:350-353.
5. Rothman KJ and Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincott-Raven, 1998.
Effect Modification
The concepts of effect modification and confounding are quite distinct. It should
therefore be recognized that an effect modifier may or may not be a confounder. For
example, the distribution of smoking across exposure levels may be identical, yet the rate
ratio for the exposure effect may vary by smoking status. In this situation, smoking
would not be a confounder, but would be an effect modifier. The term statistical
interaction denotes a similar phenomenon for associations without causal connotation.
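The smoking example can be made concrete with a small Python sketch. All counts below are hypothetical, invented only for illustration: smoking contributes the same share of person-time at both exposure levels, so it cannot confound the crude comparison, yet the exposure rate ratio is 4 among smokers and 2 among nonsmokers.

```python
# Hypothetical cohort illustrating the smoking example: smoking is distributed
# identically across exposure levels, yet the exposure rate ratio differs by smoking.
# Keys: (exposure, smoking status) -> (cases, person-years); all numbers are made up.
COHORT = {
    ("exposed", "smoker"): (40, 1000),
    ("exposed", "nonsmoker"): (20, 1000),
    ("unexposed", "smoker"): (10, 1000),
    ("unexposed", "nonsmoker"): (10, 1000),
}
STRATA = ("smoker", "nonsmoker")


def rate(cases, person_years):
    return cases / person_years


def smoker_share(exposure):
    """Proportion of person-time contributed by smokers at one exposure level."""
    smokers = COHORT[(exposure, "smoker")][1]
    return smokers / sum(COHORT[(exposure, s)][1] for s in STRATA)


def pooled(exposure):
    """Total cases and person-years at one exposure level, ignoring smoking."""
    cases = sum(COHORT[(exposure, s)][0] for s in STRATA)
    person_years = sum(COHORT[(exposure, s)][1] for s in STRATA)
    return cases, person_years


# Rate ratio for exposure within each smoking stratum
stratum_rr = {
    s: rate(*COHORT[("exposed", s)]) / rate(*COHORT[("unexposed", s)]) for s in STRATA
}
# Crude rate ratio, collapsing over smoking
crude_rr = rate(*pooled("exposed")) / rate(*pooled("unexposed"))

print("Smoker share:", smoker_share("exposed"), "vs", smoker_share("unexposed"))
print("Stratum-specific rate ratios:", stratum_rr)
print("Crude rate ratio:", crude_rr)
```

Because smoking is equally distributed, the crude ratio (3.0) is not distorted by smoking, but reporting it alone would hide that the effect among smokers is twice that among nonsmokers.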
We will use the term effect modification in the subsequent discussion. However, both
BIBLIOGRAPHY
1. Miettinen OS. Confounding and effect modification. American Journal of
Epidemiology 1974; 100:350-353.
A real example:
Glasziou and Mackerras (1993) carried out a meta-analysis of the role of vitamin A
supplementation in infectious disease. Table 9.1 shows the basic data needed for each study
in order to perform a meta-analysis. Essentially, these data consist of the number of
patients randomized and the number of untoward events (deaths) in the treatment and
control arms of each study.
Table 9.1. Data of the five studies included in the real example.
                                       Vitamin A group      Control group
Study   Dose          Regime          Deaths   Number      Deaths   Number
1       200,000 IU    Six-monthly     101      12,991      130      12,209
2       200,000 IU    Six-monthly     39       7,076       41       7,006
3       8,333 IU      Weekly          37       7,764       80       7,755
4       200,000 IU    Four-monthly    152      12,541      210      12,264
5       200,000 IU    Once            138      3,786       167      3,411
The calculated odds ratios (ORs) of the five studies and their 95% confidence intervals
are presented in Table 9.2.
Table 9.2. Odds ratios of the five studies and their 95% confidence intervals.
Study   OR     95% CI
1       0.73   0.56-0.95
2       0.94   0.61-1.46
3       0.46   0.31-0.68
4       0.70   0.57-0.87
5       0.73   0.58-0.93
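The values in Table 9.2 can be reproduced from the counts in Table 9.1 with a short Python sketch. The text does not say how the confidence intervals were computed; we assume Woolf's logit method (standard error of ln OR = sqrt(1/a + 1/b + 1/c + 1/d)), which matches the tabulated values to two decimals. The same counts also give a common odds ratio by the Mantel-Haenszel method, a standard alternative to the logistic regression approach described below; the function names are illustrative.

```python
import math

# Table 9.1: (study, vitamin A deaths, vitamin A number, control deaths, control number)
TABLE_9_1 = [
    (1, 101, 12991, 130, 12209),
    (2, 39, 7076, 41, 7006),
    (3, 37, 7764, 80, 7755),
    (4, 152, 12541, 210, 12264),
    (5, 138, 3786, 167, 3411),
]


def odds_ratio_ci(a, n1, c, n0, z=1.96):
    """Odds ratio with a Woolf (logit-based) 95% CI for one 2x2 table."""
    b, d = n1 - a, n0 - c                 # survivors in the vitamin A and control arms
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


def mantel_haenszel_or(table):
    """Common odds ratio across studies by the Mantel-Haenszel method."""
    numerator = denominator = 0.0
    for _, a, n1, c, n0 in table:
        b, d = n1 - a, n0 - c
        n = n1 + n0                       # total subjects in the study
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


for study, a, n1, c, n0 in TABLE_9_1:
    or_, lo, hi = odds_ratio_ci(a, n1, c, n0)
    print(f"Study {study}: OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(f"Mantel-Haenszel common OR = {mantel_haenszel_or(TABLE_9_1):.2f}")
```

The pooled estimate should be read alongside the study-specific intervals: study 2's interval includes 1, while the other four do not.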
In this real example, the common odds ratio can be found in several ways. To
use logistic regression, we regress the event of death on vitamin A treatment and study.
Treatment is coded as a dichotomous variable, set to 1 if treated with vitamin A and 0 if
control. Further, we treat the included studies as a categorical variable, so we create
Graphical Presentation
The results of a meta-analysis can also be presented in graphical form. The odds
ratios and their confidence intervals of the real example are shown in the figure below.
Each confidence interval is indicated by a line and each odds ratio by the middle
of a square. The area of the square is proportional to the number of subjects in the study;
this makes study 2, with the widest confidence interval, appear relatively unimportant and
makes the overall estimate stand out.
Limitations of meta-analysis:
The main problems of meta-analysis arise even before we begin the analysis of the
data. First, we must have a clear definition of the question so that we only include studies
which address this. For example, if we want to know whether lowering serum cholesterol
reduces mortality from coronary artery disease, we would not want to include a study
where the attempt to lower cholesterol failed. On the other hand, if we ask whether
dietary advice lowers mortality, we would include such a study. Which studies we
include may have a profound influence on the conclusions. Second, we must have all the
relevant studies. A simple literature search is not enough. Not all studies which have been
Meta meta-analysis:
Meta meta-analysis means the evaluation of a meta-analysis. To evaluate any meta-analysis
study, the following criteria should be considered:
i. Methods of research: How did the researcher identify the relevant studies (i.e.,
was a computerized search or a review of citations of other review articles used?)
How complete was the search?
ii. Eligibility criteria: How did the authors of the meta-analysis decide which pieces of research to
include in their evaluation?
iii. Number of studies: How many research studies were selected for analysis?
iv. Outcome variable: Which variables were selected from each article to be included
in the statistical analysis? Each original article may provide several outcome
variables, not all of which may be included in the other articles or be of particular
interest to the meta-analysis.
v. Study design: What were the specific types of studies included in the analysis, and
how many of each type were used? Study designs may include different types of
studies such as case-control, prospective, etc.
BIBLIOGRAPHY
1. Armitage P and Berry G. Statistical Methods in Medical Research, 3rd Ed. Blackwell,
Oxford, 1994.
2. Breslow NE and Day NE. Statistical methods in cancer research. Volume II: The
design and analysis of cohort studies. IARC, Lyon, 1987.
3. Buyse M, Piedbois P, Carlson RW. Meta-analyses based on published data are
unreliable (letter). J Clin Oncol 17:1646-1647, 1999.
4. DerSimonian R and Laird N. Meta-analysis in clinical trials. Control Clin Trials 7:177-
188, 1986.
5. Easterbrook PJ, Berlin JA, Gopalan R, and Matthews DR. Publication bias in clinical
research. Lancet 337:867-72, 1991.
6. Glasziou PP and Mackerras DEM. Vitamin A supplementation in infectious disease: a
meta-analysis. British Medical Journal 306:366-70, 1993.
7. Halvorsen KT. Combining results from independent investigations: meta-analysis in
medical research. In: Medical uses of statistics. Bailar JC et al., editors. NEJM Books,
1986.
8. Jones D. Meta-analysis: Weighing the evidence. Stat Med 14:137-149, 1995.
9. Thompson SG. Controversies in meta-analysis: the case of the trials of serum
cholesterol reduction. Statistical Methods in Medical Research 2:173-92, 1993.
i. Sensitivity
Sensitivity is defined as the ability of the test to correctly identify those individuals
who have the disease. Sensitivity is independent of the disease prevalence in the population
being tested. Sensitivity is the ratio of the number of individuals with the disease
whose screening tests are positive to the total number of individuals with the disease
under study, and it is usually expressed as a percentage. According to the fourfold
table, the sensitivity of the test is determined as:
$$\text{Sensitivity}\,(\%) = \frac{a}{a + c} \times 100$$
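The formula can be illustrated with a small Python sketch. The fourfold table is not reproduced in this section, so the cell layout below (a = true positives, b = false positives, c = false negatives, d = true negatives) is an assumption, chosen because it is the conventional one and agrees with Sensitivity = a / (a + c). The two populations are hypothetical: the same test is applied at 10% and at 1% prevalence, showing that sensitivity does not change with prevalence while the positive predictive value does.

```python
def screening_measures(a, b, c, d):
    """Validity measures, in percent, from a fourfold table.

    Assumed cell layout (rows = test result, columns = true disease status):
        a = test +, disease present   b = test +, disease absent
        c = test -, disease present   d = test -, disease absent
    """
    return {
        "sensitivity": a / (a + c) * 100,
        "specificity": d / (b + d) * 100,
        "PPV": a / (a + b) * 100,
        "NPV": d / (c + d) * 100,
    }


# Two hypothetical populations of 1,000 people screened with the same test,
# at 10% and 1% disease prevalence respectively.
high_prevalence = screening_measures(a=90, b=90, c=10, d=810)
low_prevalence = screening_measures(a=9, b=99, c=1, d=891)

for label, measures in (("prevalence 10%", high_prevalence), ("prevalence 1%", low_prevalence)):
    print(label, {name: round(value, 1) for name, value in measures.items()})
```

In both populations sensitivity is 90%, but the PPV falls from 50% to about 8% as the disease becomes rarer.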
It is pertinent here to note that both PPV and NPV are determined according to the
results of a subsequent confirmatory test. Finally, it is of great importance to point out that
Figure: Lead time. With screening (a), the disease is diagnosed earlier than without screening (b), and survival
from diagnosis is longer, but this does not necessarily imply that the time course of the disease has been
modified.
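The lead-time effect described in the figure can be shown with simple arithmetic in Python. The ages are hypothetical and chosen so that screening changes only the age at diagnosis, not the age at death.

```python
# Hypothetical ages (years) for one patient whose disease course is NOT
# altered by screening: only the date of diagnosis moves.
AGE_AT_SCREEN_DETECTION = 60     # diagnosis with screening, as in (a)
AGE_AT_CLINICAL_DIAGNOSIS = 63   # diagnosis from symptoms without screening, as in (b)
AGE_AT_DEATH = 66                # the same in both scenarios


def survival_from_diagnosis(age_at_diagnosis, age_at_death):
    return age_at_death - age_at_diagnosis


survival_screened = survival_from_diagnosis(AGE_AT_SCREEN_DETECTION, AGE_AT_DEATH)
survival_unscreened = survival_from_diagnosis(AGE_AT_CLINICAL_DIAGNOSIS, AGE_AT_DEATH)
lead_time = AGE_AT_CLINICAL_DIAGNOSIS - AGE_AT_SCREEN_DETECTION

print(f"Survival from diagnosis: {survival_screened} years with screening, "
      f"{survival_unscreened} years without")
print(f"Apparent gain = {survival_screened - survival_unscreened} years "
      f"= lead time of {lead_time} years")
```

Survival from diagnosis doubles, yet the patient dies at the same age: the whole apparent gain is the lead time.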
BIBLIOGRAPHY
1. Armitage P and Berry G. Statistical Methods in Medical Research, 3rd Ed. Blackwell,
Oxford, 1994.
2. Breslow NE and Day NE. Statistical methods in cancer research. Volume II: The
design and analysis of cohort studies. IARC, Lyon, 1987.
3. Gerstman BB. Epidemiology kept simple: An introduction to modern epidemiology.
New York: John Wiley & Sons, Inc., 1998.
4. Rothman KJ, Greenland S. Modern Epidemiology. 2nd Edition. Philadelphia:
Lippincott-Raven, 1998.
Definition of surveillance
Epidemiologic surveillance is defined as the ongoing systematic collection,
recording, analysis, interpretation, and dissemination of data reflecting the current health
status of a community or population. The scope of epidemiologic surveillance has
evolved from an initial focus on infectious disease monitoring and intervention to a more
inclusive scope covering the factors that influence health status, including chronic diseases,
injuries, environmental exposures, and social factors.
Surveillance of an epidemic requires a very specific definition of what constitutes
a case that can be counted. Suspected, probable, and confirmed cases of the disease are
actively sought and monitored. The number of cases, and the relationships between cases,
are used during an outbreak investigation in an attempt to identify causes and those at risk,
and to implement an intervention. Surveillance does not
Purpose of surveillance:
The most familiar purpose for surveillance is the identification, as rapidly as
possible, of unusual events, outbreaks of disease and emerging health issues. It is worth
noting that, although high quality surveillance data are always desirable, for these “early
warning” purposes, a balance must be struck between timeliness and high levels of
validity. Another significant role for surveillance is to inform decisions governing the
management of risks to health. This may involve public health programs, regulatory
action or public policy responses, all of which are exercises in evidence-based decision
making, with surveillance being one important source of evidence.
Merely monitoring the current status of disease prevalence, health indicators, or
social markers does not protect the health of a community. Careful monitoring, however,
creates a baseline measurement of threats to the public's health. It is this established
baseline that enables public health workers to notice when an anomaly occurs. A sharp
increase in the number of cases of a disease will instigate further investigation,
intervention, and prevention measures.
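The idea of a baseline that lets workers notice an anomaly can be sketched in Python. The rule used here (flag a count above the historical mean plus two standard deviations) is one simple illustrative choice, not a method prescribed in the text, and the weekly counts are hypothetical.

```python
import statistics


def exceeds_baseline(history, current, k=2.0):
    """Flag a count that exceeds the historical mean by more than k standard deviations.

    `history` holds counts for comparable past periods (the baseline);
    k = 2 is an illustrative choice, not a fixed standard.
    """
    baseline = statistics.mean(history)
    threshold = baseline + k * statistics.stdev(history)
    return current > threshold, threshold


# Hypothetical weekly case counts for the same weeks in previous years
history = [12, 15, 11, 14, 13, 16, 12, 14, 15, 13]

for week, count in enumerate([14, 16, 24], start=1):
    flagged, threshold = exceeds_baseline(history, count)
    status = "investigate" if flagged else "within baseline"
    print(f"Week {week}: {count} cases (threshold {threshold:.1f}) -> {status}")
```

A higher k gives fewer false alarms at the cost of slower detection, which mirrors the trade-off between timeliness and validity noted above.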
Types of surveillance:
Surveillance is based on both passive and active data collection processes. When a
clinician or laboratory encounters a patient or sample indicating the presence of certain
conditions or pathogens, there is a legal obligation to report the case to local public health
officials. The result is a passive monitoring of the levels of the disease in the community.
Active surveillance, on the other hand, is commonly referred to as "case finding." It
occurs when the data necessary to monitor levels of a medical or social condition are
actively sought out. This is accomplished through a variety of means, ranging from
clinical record reviews to community surveys.
3. Information dissemination
The ongoing and timely information dissemination system helps to alert health
professionals and the general public about forthcoming health risks (e.g. risk assessment)
and to put our current knowledge of risk assessment and management into perspective, so
that the general public knows what health risks to avoid (e.g. publication of “Handbook of
Health Risks”) and what healthy activities to pursue (e.g. publication of “Handbook of
Healthy Practices”).
5. Computer Technology
Automated search and linkage techniques are essential to retrieve information
from a vast array of data. Automated data analysis systems are also essential to produce
early warning signals for health and risk factor trends.
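A minimal sketch of automated linkage in Python, assuming the simplest deterministic approach: records from two sources are paired when their key fields agree exactly after trivial normalisation. The field names, identifiers and records are hypothetical; operational surveillance systems often use probabilistic linkage, which tolerates spelling and date errors, and this sketch does not.

```python
def normalise(value):
    """Lower-case and trim a field so trivial formatting differences do not block a match."""
    return str(value).strip().lower()


def link_records(source_a, source_b, keys=("last_name", "first_name", "birth_date")):
    """Deterministic linkage: pair records whose key fields agree exactly after normalisation."""
    index = {}
    for record in source_b:
        index.setdefault(tuple(normalise(record[k]) for k in keys), []).append(record)
    links = []
    for record in source_a:
        for match in index.get(tuple(normalise(record[k]) for k in keys), []):
            links.append((record["id"], match["id"]))
    return links


# Hypothetical extracts from two data sources (e.g. a registry and hospital records)
registry = [
    {"id": "R1", "last_name": "Tremblay", "first_name": "Marie", "birth_date": "1950-03-14"},
    {"id": "R2", "last_name": "Roy", "first_name": "Jean", "birth_date": "1948-11-02"},
]
hospital = [
    {"id": "H7", "last_name": "TREMBLAY ", "first_name": "marie", "birth_date": "1950-03-14"},
    {"id": "H9", "last_name": "Roy", "first_name": "Jean", "birth_date": "1948-11-20"},
]

print(link_records(registry, hospital))
```

R1 links to H7 despite differences in case and spacing, while R2 and H9 stay unlinked because their birth dates differ; a typing error of this kind is exactly what probabilistic methods are designed to handle.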
Surveillance systems are now established in many countries, not only for
infectious diseases but also for chronic diseases. In Canada, for example, in addition to the
well-developed surveillance system for most infectious and chronic diseases, the
National Enhanced Cancer Surveillance System (NECSS) represents a good example of
BIBLIOGRAPHY
1. Friis, R., and Sellers, T. (1996). Epidemiology for Public Health Practice. Gaithersburg,
MD: Aspen Publishers, Inc.
2. Gregg, M. (1996). Field Epidemiology. New York: Oxford University Press.
3. Jones J, Hunter D. Consensus methods for medical and health services research. Br
Med J 1995;311:376-80.
4. Lomas J. Research and evidence-based decision making. Aust NZ J Public Health
1997;21:439-40.
5. McNeil D. Epidemiological research methods. New York: John Wiley & Sons, 1996.
6. Teutsch, S., and Churchill, R. (1994). Principles and Practice of Public Health
Surveillance. New York: Oxford University Press.
Title Page
The title page should include the research paper title, author(s), affiliated university
and/or institution, city, country, and date. Make the title of your study concise,
descriptive, and informative. Your title should indicate the nature of your research. For
example, “Studies on adult leukemia” is not as descriptive as “Lifestyle factors and the
risk of adult leukemia.”
Abstract
An abstract is a concise summary of your completed work. It can be written either as a
single paragraph or as a structured abstract. By a structured abstract, we mean an abstract
organized in sections like those of the original paper (some journals, such as Cancer Causes
& Control, prefer this style of abstract). It is best to write your abstract after completing a
draft of your scientific paper: the abstract is usually written last by the authors, but it
appears in the journal right below the title. It should contain all essential information about
the objective of the study, basic information about the experiments and sufficient
information about the results to make
Introduction
The purpose of an introduction is to acquaint the reader with the rationale behind
the work, with the intention of defending it. It places your work in a theoretical context,
and enables the reader to understand and appreciate your objectives. It explains what is
known about the subject from earlier work by various authors. If the subject has little
history, it explains why there is a need for this study and justifies it. The justification is
supported by literature references.
Scientifically, the introduction should present answers to the following questions:
What problem did you investigate? Why did you choose this subject, and why is it
important? What hypotheses did you test? Based upon your reading, what results did you
anticipate, and why? The introduction should address these and similar questions. To
tackle the last question, some literature (library) research will be necessary. If you
include information from other sources to explain what is currently known about the
topic and why you are anticipating certain results, be sure to cite those references in the
body of your paper. Assume that the reader is scientifically literate, but may not be
familiar with the specifics of your study.
The Introduction ends with the objectives, e.g., “The objective of this study was to
examine the association between cigarette smoking, fruit and vegetable consumption
and the risk of adult leukemia”.
In general, the Introduction should have these four elements:
i. Background: Who else has done what? How? What have we done previously?
ii. The objectives of the work.
iii. The justification for these objectives: Why is the work important?
iv. Guidance to the reader. What should the reader watch for in the paper? What
are the interesting high points? What strategy did we use?
Results
The purpose of a results section is to present and illustrate your findings. Make
this section a completely objective report of the results, and save all interpretation for the
discussion. The results of your research should be presented in a logical order, and you
should use the past tense when referring to your results. Use tables and figures (such as graphs)
to help your reader see and understand your results readily. Tables and figures should be
numbered and titled separately. This will enable you to refer to them in the text quite easily
(e.g., "Data in Table 3 examine the association between..."). It is important to avoid the
following common mistakes while writing this important section:
Discussion
The purpose of this section is (i) to relate your results to existing knowledge, (ii) to make
clear how your results add to or modify existing knowledge, (iii) to speculate about what
remains unknown, and (iv) to suggest directions for future research. When writing this
important section, the following eight points should be considered in this order:
i. A summary of your main results in a few sentences.
ii. The consistency of the results with your initial hypothesis: do they support or reject it?
iii. A comparison of your results with the results of other scientists performing
similar studies.
iv. The biological plausibility of your results.
v. The strengths and limitations of your research.
vi. The further studies that need to be performed in light of your research findings.
vii. The possible directions for future research.
viii. The theoretical implications or practical applications of your research.
Conclusion
The conclusions can be included in the discussion section, but if you have a separate
conclusions section, do not repeat the discussion points. Base your conclusions on the
evidence that you have presented in your paper. This section shows the ability of the
author(s) to observe and to creatively link individual observations to provide a coherent
story about the study and its meaning and benefits.
How to list and write the cited references in the reference section?
References should be listed in alphabetical order, according to the first author's
last name. All types of references should be lumped together before you alphabetize. Do
not make separate lists for books, articles, etc. Works by the same person should be
arranged chronologically by the date of publication.
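The ordering rule above amounts to sorting one combined list on a two-part key, sketched here in Python with hypothetical entries. Specific citation styles add tie-breaking rules (for example, for co-authors or several works in the same year) that this sketch does not handle.

```python
# Hypothetical reference entries: (first author's last name, year, title)
references = [
    ("Kim", 2008, "A later paper by the same first author"),
    ("Adams", 2001, "A cohort report"),
    ("Kim", 1999, "An earlier paper by the same first author"),
    ("Brown", 2010, "A textbook"),
]

# One combined list (books, articles, etc. together), ordered alphabetically by the
# first author's last name (case-insensitive), then chronologically for the same author.
ordered = sorted(references, key=lambda ref: (ref[0].casefold(), ref[1]))

for last_name, year, title in ordered:
    print(f"{last_name} ({year}). {title}.")
```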
Appendix
Appendices contain supplemental information such as lists of terms, definitions, or
questionnaires that are useful but not essential to the body of the research paper. If you
have a large table of raw data, but most of it is not essential to the discussion in your
paper, you could include the complete table as an appendix. A smaller table with a subset
of data (or a summary of the data) could then be included in the body of your paper. If
you have more than one set of materials to include, give each a number: Appendix 1,
Appendix 2, etc.
1. Introduction:
State immediately the specific research project you want to investigate and then
relate it to a more general context. Present a brief background for the project by telling
the reader that facts A, B, and C are known, but that fact D is not known; your project will
fill this gap or will lead to progress in filling this gap in knowledge (i.e., the study
rationale). Make sure that your central question is clearly stated, and that you have
sufficiently narrowed the focus to be able to answer your question in the time you have
available.
4. Time schedule:
Provide a detailed time schedule of field work, laboratory work, analysis, and writing up
the final report. The time schedule may be presented as a simple timetable.
BIBLIOGRAPHY
1. American Psychological Association. Publication manual of the American
Psychological Association (4th ed.). Washington DC: APA. 1995.
2. Day RA. How to write and publish a scientific paper. ISI Press, Philadelphia. 1983.
3. Day RA and Gastel B. How to Write and Publish a Scientific Paper. 5th Edition. 2005.