
PSYCHOLOGICAL SCIENCE

General Article

SCIENCE AND ETHICS IN CONDUCTING, ANALYZING, AND REPORTING PSYCHOLOGICAL RESEARCH


By Robert Rosenthal
The relationship between scientific quality and ethical quality is considered for three aspects of the research process: conduct of the research, data analysis, and reporting of results. In the area of conducting research, issues discussed involve design, recruitment, causism, scientific quality, and costs and utilities. The discussion of data analysis considers data dropping, data exploitation, and meta-analysis. Issues regarding reporting of results include misrepresentation of findings, misrepresentation of credit, and failure to report results as a result of self-censoring or external censoring.


The purpose of this article is to discuss a number of scientific and ethical issues relevant to conducting, analyzing, and reporting psychological research. A central theme is that ethics and scientific quality are very closely interrelated. Everything else being equal, research that is of higher scientific quality is likely to be more ethically defensible. The lower the quality of the research, the less justified we are ethically to waste research participants' time, funding agencies' money, and journals' space. The higher the quality of the research, the better invested have been the time of the research participants, the funds of the granting agency, the space of the journals, and, not least, the general investment that society has made in supporting science and its practitioners.

CONDUCTING PSYCHOLOGICAL RESEARCH

Let us turn first to considerations of research design, procedures employed in a study, and the recruitment of human participants. In evaluating the ethical employment of our participants, we can distinguish issues of safety from more subtle issues of research ethics. Obviously, research that is unsafe for participants is ethically questionable. However, I propose that perfectly safe research in which no participant will be put at risk may also be ethically questionable because of the shortcomings of the design.

Issues of Design

Imagine that a research proposal that comes before an institutional review board proposes the hypothesis that private schools improve children's intellectual functioning more than public schools do. Children from randomly selected private and public schools are to be tested extensively, and the research hypothesis is to be tested by comparing scores earned by students from private versus public schools. The safety of the children to be tested is certainly not an issue, yet it can be argued that this research raises ethical issues because of the inadequacy of its design. The goal of the research is to learn about the causal impact on performance of private versus public schooling, but the design of the research does not permit reasonable causal inference because of the absence of randomization or even some reasonable attempt to consider plausible rival hypotheses (Cook & Campbell, 1979).

How does the poor quality of the design raise ethical objections to the proposed research? Because students', teachers', and administrators' time will be taken from potentially more beneficial educational experiences. Because the poor quality of the design is likely to lead to unwarranted and inaccurate conclusions that may be damaging to the society that directly or indirectly pays for the research. In addition, allocating time and money to this poor-quality science will serve to keep those finite resources of time and money from better quality science in a world that is undeniably zero-sum.

It should be noted that had the research question addressed been appropriate to the research design, the ethical issues would have been less acute. If the investigators had set out only to learn whether there were performance differences between students in private versus public schools, their design would have been perfectly appropriate to their question.
Address correspondence to Robert Rosenthal, Department of Psychology, Harvard University, 33 Kirkland St., Cambridge, MA 02138.

Issues of Recruitment

The American Psychological Association's (APA) Committee for the Protection of Human Participants in Research and its new incarnation, the Committee on Standards in Research, and such pioneer scholars of the topic as Herbert Kelman have thoughtfully considered a variety of ethical issues in the selection and recruitment of human participants (APA, 1982; Blanck, Bellack, Rosnow, Rotheram-Borus, & Schooler, 1992; Grisso et al., 1991; Kelman, 1968). Only a few comments need be made here. On the basis of several reviews of the literature, my friend and colleague Ralph Rosnow and I have proposed a number of procedures designed to reduce volunteer bias and therefore increase the generality of our research results (Rosenthal & Rosnow, 1975, 1991; Rosnow & Rosenthal, 1993). Employment of these procedures has led us to think of our human participants as another "granting agency," which, we believe, they are, since they must decide whether to grant us their time, attention, and cooperation. Part of our treating them as such is to give them information about the long-term benefits of the research. In giving prospective participants this information, we have a special obligation to avoid hyperclaiming.

Hyperclaiming

Hyperclaiming is telling our prospective participants, our granting agencies, our colleagues, our administrators, and ourselves that our research is likely to achieve goals it is, in fact, unlikely to achieve. Presumably our granting agencies, our colleagues, and our administrators are able to evaluate our claims and hyperclaims fairly well. However, our prospective participants are not; therefore, we should tell them what our research can actually accomplish rather than that it will yield the cure for panic disorder, depression, schizophrenia, or cancer.

Causism

Closely related to hyperclaiming is the phenomenon of causism. Causism refers to the tendency to imply a causal relationship where none has been established (i.e., where the data do not support it).
Causism: Characteristics and Consequences

Characteristics of causism include (a) the absence of an appropriate evidential base; (b) the presence of language implying cause (e.g., "the effect of," "the impact of," "the consequence of," "as a result of") where the appropriate language would have been "was related to," "was predictable from," or "could be inferred from"; and (c) self-serving benefits to the causist. Causism is self-serving because it makes the causist's result appear more important or fundamental than it really is. If a perpetrator of causism is unaware of the causism, its presence simply reflects poor scientific training. If the perpetrator is aware of the causism, it reflects blatantly unethical misrepresentation and deception.

Whereas well-trained colleagues can readily differentiate causist language from inferentially more accurate language, potential research participants or policymakers ordinarily cannot. When a description of a proposed research study is couched in causal language, that description represents an unfair recruitment device that is at best inaccurate, when it is employed out of ignorance, and at worst dishonest, when it is employed as hype to increase the participation rates of potential participants. As a member of an institutional review board, I regret that I have seen such use made of causist language in proposals brought before us.
Bad Science Makes for Bad Ethics


Causism is only one example of bad science. Poor quality of research design, poor quality of data analysis, and poor quality of reporting of the research all lessen the ethical justification of any type of research project. I believe this judgment applies not only when deception, discomfort, or embarrassment of participants is involved, but for even the most benign research experience for participants. If because of the poor quality of the science no good can come of a research study, how are we to justify the use of participants' time, attention, and effort and the money, space, supplies, and other resources that have been expended on the research project? When we add to the "no good can come of it" argument the inescapable zero-sum nature of time, attention, effort, money, space, supplies, and other resources, it becomes difficult to justify poor-quality research on any ethical basis.

For this reason, I believe that institutional review boards must consider the technical scientific competence of the investigators whose proposals they are asked to evaluate. Yes, that will increase the work required of board members and change boards' compositions somewhat to include a certain degree of methodological expertise. No, it will not always be easy to come to a decision about the scientific competence of an investigator and of a particular proposal, but then it is not always easy to come to a decision about the more directly ethical aspects of a proposal either.

Poor quality of research makes for poor quality of education as well. Especially when participation is quasi-coercive, the use of participants is usually justified in part by the fact that they will benefit educationally.
But if participants are required to participate in poor-quality research, they are likely to acquire only misconceptions about the nature of science and of psychology. When participants' scores on personality scales are correlated with their scores on standardized tests or course grades, and they are told that "this research is designed to learn the impact of personality on cognitive functioning," they have been poorly served educationally as part of having been misled scientifically.
Costs and Utilities

Payoffs for doing research

When individual investigators or institutional review boards are confronted with a questionable research proposal, they ordinarily employ a cost-utility analysis in which the costs of doing a study, including possible negative effects on participants, time, money, supplies, effort, and other resources, are evaluated simultaneously against such utilities as benefits to participants, to other people at other times, to science, to the world, or at least to the investigator. The potential benefits of higher quality studies and studies addressing more important topics are greater than the potential benefits of lower quality studies and studies addressing less important topics. Rosnow and I have often diagrammed this type of cost-utility analysis as a two-dimensional plane in which costs are one dimension and utilities the other (Rosenthal & Rosnow, 1984, 1991; Rosnow, 1990). Any study with high utility and low cost should be carried out forthwith. Any study with low utility and high cost should not be carried out. Studies in which costs equal utilities are very difficult to decide about.

Payoffs for failing to do research

However, Rosnow and I have become convinced that this cost-utility model is insufficient because it fails to consider the costs (and utilities) of not conducting a particular study (Rosenthal & Rosnow, 1984, 1991; Rosnow, 1990; Rosnow & Rosenthal, 1993). The failure to conduct a study that could be conducted is as much an act to be evaluated on ethical grounds as is conducting a study. The oncology group that may have a good chance of finding a cancer preventive but feels the work is dull and a distraction from their real interest is making a decision that is to be evaluated on ethical grounds as surely as the decision of a researcher to investigate tumors with a procedure that carries a certain risk. The behavioral researcher whose study may have a good chance of reducing violence or racism or sexism, but who refuses to do the study simply because it involves deception, has not solved an ethical problem but only traded in one for another.
The issues are, in principle, the same for the most basic as for the most applied research. In practice, however, it is more difficult to make even rough estimates of the probability of finding the cancer cure or the racism reducer for the more basic as compared with the more applied research.

This idea of lost opportunities has been applied with great eloquence by John Kaplan (1988), of the Stanford University Law School. The context of his remarks was the use of animals in research and the efforts of "animal rights" activists to chip away "at our ability to afford animal research. . . . [I]t is impossible to know the costs of experiments not done or research not undertaken. Who speaks for the sick, for those in pain, and for the future?" (p. 839).

In the examples considered so far, the costs of failing to conduct the research have accrued to future generations or to present generations not including the research participants themselves. But sometimes there are incidental benefits to research participants that are so important that they must be considered in the calculus of the good, as in the following example: I was asked once to testify to an institutional review board (not my own) about the implications of my research for the ethics of a proposed project on karyotyping. The study was designed to test young children for the presence of the XYY chromosome, which had been hypothesized to be associated with criminal behavior. The youngsters would be followed up until adulthood so that the correlation between chromosome type and criminal behavior could be determined. I was asked to talk about my research on interpersonal expectancy effects because it was feared that if the research were not done double-blind, the parents' or researchers' expectations for increased criminal behavior by the XYY males might become a self-fulfilling prophecy (Rosenthal, 1966; Rosenthal & Jacobson, 1968, 1992). A double-blind design should have solved that problem, but the board decided not to permit the research anyway. The enormous costs to the participants themselves of the study's not being done were not considered.

What were those costs? The costs were the loss of 20 years of free, high-quality pediatric care to children whose parents could never have afforded any high-quality pediatric care. Was it an ethically defensible decision to deprive scores or hundreds of children of medical care they would otherwise not have received in order to avoid having a double-blind design that had very little potential for actually harming the participants? At the very least, these costs of failing to do the research should have received full discussion. They did not.
DATA ANALYSIS AS AN ETHICAL ARENA

Data Dropping

Ethical issues in the analysis of data range from the very obvious to the very subtle.

Probably the most obvious and most serious transgression is the analysis of data that never existed (i.e., that were fabricated). Perhaps more frequent is the dropping of data that contradict the data analyst's theory, prediction, or commitment.

Outlier rejection

There is a venerable tradition in data analysis of dealing with outliers, or extreme scores, a tradition going back over 200 years (Barnett & Lewis, 1978). Both technical and ethical issues are involved. The technical issues have to do with the best ways of dealing with outliers without reference to the implications for the tenability of the data analyst's theory. The ethical issues have to do with the relationship between the data analyst's theory and the choice of method for dealing with outliers. For example, there is some evidence to suggest that outliers are more likely to be rejected if they are bad for the data analyst's theory but treated less harshly if they are good for the data analyst's theory (Rosenthal, 1978; Rosenthal & Rubin, 1971). At the very least, when outliers are rejected, that fact should be reported. In addition, it would be useful to report in a footnote the results that would have been obtained had the outliers not been rejected.

Subject selection

A different type of data dropping is subject selection, in which a subset of the data is not included in the analysis. In this case, too, there are technical issues and ethical issues. There may be good technical reasons for setting aside a subset of the data, for example, because the subset's sample size is especially small or because dropping the subset would make the data more comparable to some other research. However, there are also ethical issues, as when just those subsets are dropped that do not support the data analyst's theory. When a subset is dropped, we should be informed of that fact and what the results were for that subset. Similar considerations apply when the results for one or more variables are not reported.
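The recommendation above, that rejected outliers be identified and that the results with and without them be reported, is easy to build into an analysis. The following is a minimal sketch in Python; the two-group t test, the 2.5-standard-deviation cutoff, and the made-up scores are my own illustrative assumptions rather than anything prescribed in the article.

```python
import numpy as np
from scipy import stats

def t_test_with_and_without_outliers(group_a, group_b, z_cut=2.5):
    """Run the same two-group t test twice: once on all of the data and once
    with outliers (scores more than z_cut standard deviations from their own
    group's mean) set aside. Both results are returned so that both can be
    reported, along with the scores that were set aside."""
    def split(scores):
        x = np.asarray(scores, dtype=float)
        z = (x - x.mean()) / x.std(ddof=1)
        return x[np.abs(z) <= z_cut], x[np.abs(z) > z_cut]

    a_kept, a_out = split(group_a)
    b_kept, b_out = split(group_b)
    full = stats.ttest_ind(group_a, group_b)
    trimmed = stats.ttest_ind(a_kept, b_kept)
    return {
        "all data": (full.statistic, full.pvalue),
        "outliers removed": (trimmed.statistic, trimmed.pvalue),
        "scores set aside": np.concatenate([a_out, b_out]).tolist(),
    }

# Made-up scores in which one extreme value changes the picture:
print(t_test_with_and_without_outliers(
    [12, 14, 15, 13, 16, 14, 13, 15, 12, 60],
    [11, 10, 13, 12, 9, 11, 12, 10, 11, 13]))
```

Reporting the full dictionary, rather than only the more flattering of the two tests, is the point of the exercise.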
Exploitation Is Beautiful

That data dropping has ethical implications is fairly obvious. An issue that has more subtle ethical implications is exploitation. Exploiting research participants, students, postdoctoral fellows, staff, and colleagues is of course reprehensible. But there is a kind of exploitation to be cherished: the exploitation of data. Many of us have been taught that it is technically improper and perhaps even immoral to analyze and reanalyze our data in many ways (i.e., to snoop around in the data). We were taught to test the prediction with one particular preplanned test and take a result significant at the .05 level as our reward for a life well-lived. Should the result not be significant at the .05 level, we were taught, we should bite our lips bravely, take our medicine, and definitely not look further at our data. Such a further look might turn up results significant at the .05 level, results to which we were not entitled.

All this makes for a lovely morality play, and it reminds us of Robert Frost's poem about losing forever the road not taken, but it makes for bad science and for bad ethics. It makes for bad science because, while snooping does affect p values, it is likely to turn up something new, interesting, and important (Tukey, 1977). It makes for bad ethics because data are expensive in terms of time, effort, money, and other resources and because the antisnooping dogma is wasteful of time, effort, money, and other resources. If the research was worth doing, the data are worth a thorough analysis, being held up to the light in many different ways so that our research participants, our funding agencies, our science, and society will all get their time and their money's worth.

Before leaving this topic, I should repeat that snooping in the data can indeed affect the p value obtained, depending on how the snooping is done. But statistical adjustments, for example, Bonferroni adjustments (Estes, 1991; Howell, 1992; Rosenthal & Rubin, 1984), can be helpful here. Most important, replications will be needed, whether the data were snooped or not!
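As an illustration of the kind of adjustment just mentioned, here is a minimal sketch of the plain Bonferroni procedure; the five p values are invented, and the ordered Bonferroni procedures cited above (Rosenthal & Rubin, 1984) are more refined than this simple version.

```python
def bonferroni(p_values, alpha=0.05):
    """Simple Bonferroni adjustment for k unplanned (snooped) comparisons:
    each p value is multiplied by k (and capped at 1.0), which is equivalent
    to testing each comparison at alpha / k."""
    k = len(p_values)
    adjusted = [min(p * k, 1.0) for p in p_values]
    decisions = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, decisions

# Five p values turned up while exploring the data (hypothetical numbers):
snooped = [0.003, 0.020, 0.041, 0.180, 0.600]
print(bonferroni(snooped))
# Only the .003 result survives the adjustment; the .020 and .041 results no
# longer meet the .05 criterion once the number of looks is taken into
# account, and replication is still needed either way.
```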
Meta-Analysis as an Ethical Imperative


Meta-analysis is a set of concepts and procedures employed to summarize quantitatively any domain of research (Glass, McGaw, & Smith, 1981; Rosenthal, 1991). We know from both statistical and empirical research that, compared with traditional reviews of the literature, meta-analytic procedures are more accurate, comprehensive, systematic, and statistically powerful (Cooper & Rosenthal, 1980; Hedges & Olkin, 1985; Mosteller & Bush, 1954). Meta-analytic procedures use more of the information in the data, thereby yielding (a) more accurate estimates of the overall magnitude of the effect or relationship being investigated, (b) more accurate estimates of the overall level of significance of the entire research domain, and (c) more useful information about the variables moderating the magnitude of the effect or relationship being investigated.
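To make the mechanics of (a) and (b) concrete, here is a minimal sketch of one standard way of combining studies, a weighted mean of Fisher-transformed rs and a Stouffer-type combined Z; the three studies and their rs and ns are invented for illustration, and a real meta-analytic review would go on to examine moderators of the effect as well.

```python
import numpy as np
from scipy import stats

# Hypothetical summaries of three studies of the same relationship.
rs = np.array([0.30, 0.10, 0.24])   # effect size r reported by each study
ns = np.array([40, 120, 64])        # sample size of each study

# (a) Overall effect size: a weighted mean of the Fisher z-transformed rs,
#     each study weighted by n - 3 (the reciprocal of its z variance).
fisher_z = np.arctanh(rs)
weights = ns - 3
mean_r = np.tanh(np.sum(weights * fisher_z) / np.sum(weights))

# (b) Overall significance: a Stouffer-type combination that sums the
#     studies' standard normal deviates and divides by sqrt(k).
z_per_study = fisher_z * np.sqrt(ns - 3)    # approximate Z for each study's r
combined_z = z_per_study.sum() / np.sqrt(len(rs))
combined_p = stats.norm.sf(combined_z)      # one-tailed p for the combined Z

print(round(float(mean_r), 3), round(float(combined_z), 2), round(float(combined_p), 4))
```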
Retroactive increase of utilities

Meta-analysis allows us to learn more from our data and therefore has a unique ability to increase retroactively the benefits of the studies being summarized. The costs of time, attention, and effort of the human participants employed in the individual studies entering into the meta-analysis are all more justified when their data enter into a meta-analysis. That is because the meta-analysis increases the utility of all the individual studies being summarized. Other costs of individual studies (costs of funding, supplies, space, investigator time and effort, and other resources) are similarly more justified because the utility of individual studies is so increased by the borrowed strength obtained when information from more studies is combined in a sophisticated way. The failure to employ meta-analytic procedures when they could be used thus has ethical implications because the opportunity to increase the benefits of past individual studies has been forgone.

In addition, when public funds or other resources are employed by scientists to prepare reviews of literatures, it is fair to ask whether those resources are being used wisely or ethically. Now that we know how to summarize literatures meta-analytically, it seems hardly justified to review a quantitative literature in the pre-meta-analytic, prequantitative manner. Money that funds a traditional review is not available to fund a meta-analytic review. It should be noted that a meta-analytic review is a good deal more than simply an overall estimate of the size of the basic effect. In particular, meta-analytic reviews try to explain the inevitable variation in the size of the effect obtained in different studies.

Finally, it no longer seems acceptable to fund research studies that claim to contribute to the resolution of controversy (e.g., does Treatment A work?) unless the investigator has already conducted a meta-analysis to determine whether there really is a controversy. A new experiment to learn whether psychotherapy works in general is manifestly not worth doing given the meta-analytic results of Glass (1976) and his colleagues (Smith, Glass, & Miller, 1980). Until their meta-analytic work resolved the issue, the question of whether psychotherapy worked in general was indeed controversial. It is controversial no longer.

Pseudocontroversies

Meta-analysis resolves controversies primarily because it eliminates two common problems in the evaluation of replications. The first problem is the belief that when one study obtains a significant effect and a replication does not, we have a failure to replicate. That belief often turns out to be unfounded. A failure to replicate is properly measured by the magnitude of difference between the effect sizes of the two studies. The second problem is the belief that if there is a real effect in a situation, each study of that situation will show a significant effect. Actually, if the effect is quite substantial, say, r = .24, and each study employs a sample size of, say, 64, the power level is .50 (Cohen, 1962, 1988; Rosenthal, 1994; Sedlmeier & Gigerenzer, 1989).
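The power figure just quoted can be checked with a short calculation. The sketch below uses the Fisher z approximation to the test of a correlation and a two-tailed .05 criterion; these computational choices, and the use of scipy, are my assumptions about how one might verify the number, not part of the original argument.

```python
from math import atanh, sqrt
from scipy.stats import norm

def power_for_r(r, n, alpha=0.05):
    """Approximate power of the two-tailed test of a correlation r with
    sample size n, using the Fisher z transformation of r."""
    z_crit = norm.isf(alpha / 2)              # 1.96 for alpha = .05
    expected_z = atanh(r) * sqrt(n - 3)       # expected test statistic
    return norm.sf(z_crit - expected_z)

power = power_for_r(0.24, 64)
print(round(power, 2))          # close to .50, as stated in the text
print(round(power ** 2, 2))     # chance that two such studies both reach .05
print(round(power ** 3, 3))     # chance that three such studies all reach .05
```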
Given this situation, which is typical in psychology, there is only one chance in four that two investigations will both get results significant at the .05 level. If three studies were carried out, there would be only one chance in eight that all three studies would yield significant effects, even though we know the effect in nature is both real and important in magnitude.

Significance testing

Meta-analytic procedures and the meta-analytic worldview increase the utility of the individual study by their implications for how and whether we do significance testing. Good meta-analytic practice shows little interest in whether the results of an individual study were significant or not at any particular critical level. Rather than recording for a study whether it reached such a level, say, p = .05, two-tailed, meta-analysts record the actual level of significance obtained. This is usually done not by recording the p value but by recording the standard normal deviate that corresponds to the p value. Thus, a result significant at the .05 level, one-tailed, in the predicted direction is recorded as Z = +1.645. If it had been significant at the .05 level, one-tailed, but in the wrong or unpredicted direction, it would be recorded as Z = -1.645 (i.e., with a minus sign to indicate that the result is in the unpredicted direction). Signed normal deviates are an informative characteristic of the result of a study presented in continuous rather than in dichotomous form. Their use (a) increases the information value of a study, which (b) increases the utility of the study and, therefore, (c) changes the cost-utility ratio and, hence, the ethical value of the study.

Small effects are not small

Another way in which meta-analysis increases research utility and, therefore, the ethical justification of research studies is by providing accurate estimates of effect sizes, effect sizes that can be of major importance even when they are so small as to have an r² of .00. Especially when we have well-estimated effect sizes, it is valuable to assess their practical importance. The r² method of effect size estimation does a poor job of this because an r² of .00 can be associated with a treatment method that reduces death rates by as much as 7 per 100 lives lost (Rosenthal & Rubin, 1982). Once we are aware that effect size rs of .05, .10, and .20 (with r²s of .00, .01, and .04, respectively) may be associated with benefits equivalent to saving 5, 10, or 20 lives per 100 people, we can more accurately weigh the costs and utilities of undertaking any particular study.
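Both devices described above, recording each result as a signed standard normal deviate and translating a small r into a difference in outcome rates, can be made concrete in a few lines. The sketch below follows the binomial effect size display of Rosenthal and Rubin (1982), in which an effect size r corresponds to rates of .50 + r/2 versus .50 - r/2; the survival framing and the particular p value are illustrative assumptions.

```python
from scipy.stats import norm

def signed_z(p_one_tailed, predicted_direction=True):
    """Record a result as a signed standard normal deviate: positive if it is
    in the predicted direction, negative otherwise, so that p = .05 one-tailed
    becomes Z = +1.645 or Z = -1.645."""
    z = norm.isf(p_one_tailed)
    return z if predicted_direction else -z

def besd(r):
    """Binomial effect size display: an effect size r corresponds to 'success'
    rates of .50 + r/2 versus .50 - r/2 in the two groups."""
    treated, control = 0.5 + r / 2, 0.5 - r / 2
    return treated, control, round(100 * (treated - control))  # per 100 people

print(round(signed_z(0.05), 3))          # 1.645
for r in (0.05, 0.10, 0.20):
    print(r, round(r ** 2, 2), besd(r))  # r, r squared, and the display
```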
REPORTING PSYCHOLOGICAL RESEARCH

Misrepresentation of Findings

Mother Nature makes it hard enough to learn her secrets, without the additional difficulty of being misled by the report of findings that were not found or by inferences that are unfounded. Although all misrepresentations of findings are damaging to the progress of our science, some are more obviously unethical than others.

Intentional misrepresentation

The most blatant intentional misrepresentation is the reporting of data that never were (Broad & Wade, 1982). That behavior, if detected, ends (or ought to end) the scientific career of the perpetrator. A somewhat more subtle form of intentional misrepresentation occurs when investigators knowingly allocate to experimental or control conditions those participants whose responses are more likely to support the investigators' hypothesis. Another potential form of intentional misrepresentation occurs when investigators record the participants' responses without being blind to the participants' treatment condition, or when research assistants record the participants' responses knowing both the research hypothesis and the participants' treatment condition. Of course, if the research report specifically notes the failure to run blind, there is no misrepresentation, but the design is unwise if that failure could have been avoided.

Unintentional misrepresentation

Various errors in the process of data collection can lead to unintentional misrepresentation. Recording errors, computational errors, and data analytic errors can all lead to inaccurate results that are inadvertent misrepresentations (Broad & Wade, 1982; Rosenthal, 1966). We would not normally even think of them as constituting ethical issues except for the fact that errors in the data decrease the utility of the research and thereby move the cost-utility ratio (which is used to justify the research on ethical grounds) in the unfavorable direction. Some cases of misrepresentation (usually unintentional) are more subtle. The use of causist language, discussed earlier, is one example. Even more subtle is the case of questionable generalizability.

Questionable generalizability

Suppose we want to compare the rapport-creating ability of female and male psychotherapists, as defined by their patients' ratings. We have available three female and three male therapists, to each of whom 10 patients were assigned at random. An analysis of variance yields three sources of variance: sex of therapist (df = 1), therapists nested within sex (df = 4), and patients nested within therapists (df = 54). A common way to analyze such data would be to divide the MS for sex by the MS for patients to get an F test. In such a case, we have treated therapists as fixed effects.

When, in our report of the research, we describe our results as, say, F(1, 54) = 7.13, p = .01, we have done a study that is generalizable only to other patients treated by these six therapists, but not to any other therapists (Estes, 1991; Snedecor & Cochran, 1989).
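To make the generalizability point concrete, the sketch below contrasts the two F tests for the hypothetical therapist study. Only the degrees of freedom (1, 4, and 54) come from the example above; the mean squares are invented. Treating therapists as a random sample from a population of therapists, which is what licenses generalization to other therapists, tests sex of therapist against the mean square for therapists within sex rather than against the mean square for patients.

```python
from scipy.stats import f

# Degrees of freedom from the example: sex of therapist (1), therapists
# nested within sex (4), patients nested within therapists (54).
# The mean squares below are hypothetical.
ms_sex, df_sex = 142.0, 1
ms_therapists, df_therapists = 40.0, 4
ms_patients, df_patients = 20.0, 54

# Fixed-effects analysis (as in the text): F = MS(sex) / MS(patients),
# generalizable only to other patients of these six therapists.
F_fixed = ms_sex / ms_patients
p_fixed = f.sf(F_fixed, df_sex, df_patients)

# Random-effects analysis: therapists treated as a sample from a larger
# population of therapists, so sex is tested against MS(therapists within
# sex), and the conclusion can generalize to other therapists.
F_random = ms_sex / ms_therapists
p_random = f.sf(F_random, df_sex, df_therapists)

print(f"fixed:  F(1, 54) = {F_fixed:.2f}, p = {p_fixed:.3f}")
print(f"random: F(1, 4) = {F_random:.2f}, p = {p_random:.3f}")
```

With numbers like these, the fixed-effects test can look impressive while the random-effects test, the one that supports the broader claim, does not.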
Misrepresentation of Credit

I have been discussing misrepresentation of findings, or the issue of "what was really found?" In the present section, the focus is on the issue of "who really found it?"

Problems of authorship

Because so many papers in psychology, and the sciences generally, are multiauthored, it seems inevitable that there will be difficult problems of allocation of authorship credit. Who becomes a coauthor and who becomes a footnote? Among the coauthors, who is assigned first, last, or any other serial position in the listing? Such questions have been discussed in depth, and very general guidelines have been offered (APA, 1981, 1987; see also Costa & Gatz, 1992), but it seems that we could profit from further empirical studies in which authors, editors, referees, students, practitioners, and professors were asked to allocate authorship credit to people performing various functions in a scholarly enterprise.

Problems of priority

Problems of authorship are usually problems existing within research groups. Problems of priority are usually problems existing between research groups. A current example of a priority problem is the evaluation of the degree to which Robert C. Gallo and his colleagues were guilty of "intellectual appropriation" of a French research group's virus that was used to develop a blood test for HIV, the virus that is believed to cause AIDS (Palca, 1992). Priority problems also occur in psychology, where the question is likely to be not who first produced a virus but rather who first produced a particular idea.
Failing to Report or Publish

Sometimes the ethical question is not about the accuracy of what was reported or how credit should be allocated for what was reported, but rather about what was not reported and why it was not reported. The two major forms of failure to report, or censoring, are self-censoring and external censoring.

Self-censoring

Some self-censoring is admirable. When a study has been really badly done, it may be a service to the science and to society to simply start over. Some self-censoring is done for admirable motives but seems wasteful of information.
For example, some researchers feel they should not cite their own (or other people's) unpublished data because the data have not gone through peer review. I would argue that such data should indeed be cited and employed in meta-analytic computations as long as the data were well collected.

There are also less admirable reasons for self-censoring. Failing to report data that contradict one's earlier research, or one's theory or one's values, is poor science and poor ethics. One can always find or invent reasons why a study that came out unfavorably should not be reported: The subjects were just starting the course; the subjects were about to have an exam; the subjects had just had an exam; the subjects were just finishing the course; and so on. A good general policy, good for science and for its integrity, is to report all results shedding light on the original hypothesis or providing data that might be of use to other investigators. There is no denying that some results are more thrilling than others. If our new treatment procedure prevents or cures mental illness or physical illness, that fact may be worth more journal space or space in more prestigious journals than the result that our new treatment procedure does no good whatever. But that less thrilling finding should also be reported and made retrievable by other researchers who may need to know that finding.

External censoring

Both the progress and the slowing of progress in science depend on external censoring. It seems likely that sciences would be more chaotic than they are were it not for the censorship exercised by peers: by editors, by reviewers, and by program committees. All these gatekeepers help to keep the really bad science from clogging the pipelines of mainstream journals. There are two major bases for external censorship. The first is evaluation of the methodology employed in a research study. I strongly favor such external censorship. If the study is truly terrible, it probably should not be reported. The second major basis for external censorship is evaluation of the results. In my 35 years in psychology, I have often seen or heard it said of a study that "those results aren't possible" or "those results make no sense." Often when I have looked at such studies, I have agreed that the results are indeed implausible. However, that is a poor basis on which to censor the results. Censoring or suppressing results we do not like or do not believe to have high prior probability is bad science and bad ethics (Rosenthal, 1975, 1994).
CONCLUSION

The purpose of this article has been to discuss some scientific and ethical issues in conducting, analyzing, and reporting psychological research. A central theme has been that the ethical quality of our research is not independent of the scientific quality of our research. Detailing some of the specifics of this general theme has, I hope, served two functions. First, I hope it has comforted the afflicted by showing how we can simultaneously improve the quality of our science and the quality of our ethics. Second, and finally, I hope it has afflicted the comfortable by reminding us that in the matter of improving our science and our ethics, there are miles to go before we sleep.
Acknowledgments: This article is based on an address invited by the Board of Scientific Affairs of the American Psychological Association (APA) and presented at the annual meeting of APA, Washington, D.C., August 15, 1992. Preparation of this paper was supported in part by the Spencer Foundation; the content is solely the responsibility of the author. I thank Elizabeth Baldwin, Peter Blanck, and Ralph Rosnow for their encouragement and support.

REFERENCES
American Psychological Association. (1981). Ethical principles of psychologists. American Psychologist, 36, 633-638.
American Psychological Association. (1982). Ethical principles in the conduct of research with human participants. Washington, DC: Author.
American Psychological Association. (1987). Casebook on ethical principles of psychologists. Washington, DC: Author.
Barnett, V., & Lewis, T. (1978). Outliers in statistical data. New York: Wiley.
Blanck, P.D., Bellack, A.S., Rosnow, R.L., Rotheram-Borus, M.J., & Schooler, N.R. (1992). Scientific rewards and conflicts of ethical choices in human subjects research. American Psychologist, 47, 959-965.
Broad, W., & Wade, N. (1982). Betrayers of the truth. New York: Simon and Schuster.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Cooper, H.M., & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442-449.
Costa, M.M., & Gatz, M. (1992). Determination of authorship credit in published dissertations. Psychological Science, 3, 354-357.
Estes, W.K. (1991). Statistical models in behavioral research. Hillsdale, NJ: Erlbaum.
Glass, G.V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.
Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Grisso, T., Baldwin, E., Blanck, P.D., Rotheram-Borus, M.J., Schooler, N.R., & Thompson, T. (1991). Standards in research: APA's mechanism for monitoring the challenges. American Psychologist, 46, 758-766.
Hedges, L.V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Howell, D.C. (1992). Statistical methods for psychology (3rd ed.). Boston: PWS-Kent.
Kaplan, J. (1988). The use of animals in research. Science, 242, 839-840.
Kelman, H.C. (1968). A time to speak: On human values and social research. San Francisco: Jossey-Bass.
Mosteller, F., & Bush, R.R. (1954). Selected quantitative techniques. In G. Lindzey (Ed.), Handbook of social psychology: Vol. 1. Theory and method (pp. 289-334). Cambridge, MA: Addison-Wesley.
Palca, J. (1992). "Verdicts" are in on the Gallo probe. Science, 256, 735-738.
Rosenthal, R. (1966). Experimenter effects in behavioral research. New York: Appleton-Century-Crofts.
Rosenthal, R. (1975). On balanced presentation of controversy. American Psychologist, 30, 937-938.
Rosenthal, R. (1978). How often are our numbers wrong? American Psychologist, 33, 1005-1008.

Rosenthal, R. (1991). Meta-analytic procedures for social research (rev. ed.). Newbury Park, CA: Sage.
Rosenthal, R. (1994). On being one's own case study: Experimenter effects in behavioral research - 30 years later. In W.R. Shadish & S. Fuller (Eds.), The social psychology of science (pp. 214-229). New York: Guilford Press.
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom. New York: Holt, Rinehart & Winston.
Rosenthal, R., & Jacobson, L. (1992). Pygmalion in the classroom (expanded ed.). New York: Irvington.
Rosenthal, R., & Rosnow, R.L. (1975). The volunteer subject. New York: Wiley.
Rosenthal, R., & Rosnow, R.L. (1984). Applying Hamlet's question to the ethical conduct of research: A conceptual addendum. American Psychologist, 39, 561-563.
Rosenthal, R., & Rosnow, R.L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York: McGraw-Hill.
Rosenthal, R., & Rubin, D.B. (1971). Pygmalion reaffirmed. In J.D. Elashoff & R.E. Snow, Pygmalion reconsidered (pp. 139-155). Worthington, OH: C.A. Jones.
Rosenthal, R., & Rubin, D.B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.
Rosenthal, R., & Rubin, D.B. (1984). Multiple contrasts and ordered Bonferroni procedures. Journal of Educational Psychology, 76, 1028-1034.
Rosnow, R.L. (1990). Teaching research ethics through role-play and discussion. Teaching of Psychology, 17, 179-181.
Rosnow, R.L., & Rosenthal, R. (1993). Beginning behavioral research: A conceptual primer. New York: Macmillan.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309-316.
Smith, M.L., Glass, G.V., & Miller, T.I. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press.
Snedecor, G.W., & Cochran, W.G. (1989). Statistical methods (8th ed.). Ames: Iowa State University Press.
Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
