Science Guardian

Truth, beauty and paradigm power in science and society

I am Nicolaus Copernicus, and I approve of this blog

I am Richard Feynman and I approve of this blog

News, views and reviews measured against professional literature in peer reviewed journals (adjusted for design flaws and bias), well researched books, authoritative encyclopedias (not the bowdlerized Wiki entries on controversial topics) and the investigative reporting and skeptical studies of courageous original thinkers among academics, philosophers, researchers, scholars, authors, and journalists.

Supporting the right of exceptional minds to free speech, publication, media coverage and funding against the crowd prejudice, leadership resistance, monetary influences and internal professional politics of the paradigm wars of cancer, HIV(not)AIDS, evolution, global warming, cosmology, particle physics, macroeconomics, information technology, religions and cults, health, medicine, diet and nutrition.

***************************************************

HONOR ROLL OF SCIENTIFIC TRUTHSEEKERS

Halton C. Arp wki/obit/txt/vds/txt/txt/bk/bk, Henry Bauer txt/blg/ blg/bks/bk/txt/bk/vd, John Beard bk, Harvey Bialy bk/bk/txt/txt/rdo/vd, John Bockris bio/txt/ltr/bk, Donald W. Braben, Peter Breggin ste/fb/col/bks, Darin Brown txt/txt/txt/txt/txt/vd, Giordano Bruno bk/bio/bio, Frank R. Buianouckas, Stanislav Burzynski mov, Erwin Chargaff bio/bk/bio/prs, James Chin bk/vd, Nicolaus Copernicus bk, Mark Craddock, Francis Crick vd, Paul Crutzen, Marie Curie, Rebecca Culshaw txt/bk, Roger Cunningham, Charles Darwin txts/bk, Erasmus Darwin txt//bk/txt/hse/bks, Peter Duesberg ste/ste/bk/txt/vd/vd, Freeman Dyson, Albert Einstein, Richard Feynman bio, John Fewster, Rosalind Franklin, Bernard Forscher tx, Galileo Galilei, Walter Gilbert vd, Goethe bio/bk/bio, Nicolas Gonzalez tlk/rec/stetxt/txt, Alec Gordon, James Hansen, Etienne de Harven bk/txt/vd, Alfred Hassig intw/txt, Robert G. Houston txt, Steven Jonas vd, Edward Jenner txt, Benjamin Jesty, Adrian Kent vd, Thomas Kuhn, Fred Kummerow, Stefan Lanka txt/txt/vd, Serge Lang, John Lauritsen vd, Paul Lauterbur vd, Mark Leggett, Richard Lindzen, James Lovelock, Andrew Maniotis, Lynn Margulis, Barbara McClintock, Christi Meyer vd, George Miklos, Marco Mamone Capria, Peter Medawar, Luc Montagnier txt/txt/vd, Kary Mullis, Linus Pauling prs/vd/vd, Eric Penrose, Roger Penrose vd, Max Planck, Rainer Plaga, David Rasnick /vd, Robert Root-Bernstein vd, Sherwood Rowland, Otto Rossler, Harry Rubin, Marco Ruggiero txt/txt/intw/vd, Bertrand Russell Carl Sagan vd, Erwin Schrodinger, Fred Singer, Barbara Starfield txt, Gordon Stewart txt/txt, Richard Strohman, Thomas Szasz, Nicola Tesla bio/bio, Charles Thomas intw/vd, Frank Tipler, James Watson vd/vd, Alfred Wegener vd, Edward O. Wilson vd.

ACADEMICS, DOCTORS, AUTHORS, REPORTERS AND COMMENTATORS WHO HAVE NOBLY AIDED REVIEW OF THE STATUS QUO

Jad Adams bk, Marci Angell bk/txt/txt/txt, Clark Baker ste/txt/rdo/vd, James Blodgett, Tony Brown vd, Hiram Caton txt/txt/txt/bk/ste, Jonathan Collin ste , Marcus Cohen, David Crowe vd, Margaret Cuomo, Stephen Davis BK/BK,/rdo, Michael Ellner vd, Elizabeth Ely txt/txt/ste, Epicurus, Dean Esmay, Celia Farber bio/txt/txt/txt/vd, Jonathan Fishbein txt/txt/wk, T.C.Fry, Michael Fumento, Max Gerson txt, Charles Geshekter vd, Michael Geiger, Roberto Giraldo, David Healy txt, Bob Herbert, Mike Hersee ste/rdo, Neville Hodgkinson txt /vd, James P. Hogan, Richard Horton bio/vd/vd, Christopher Hitchens, Eric Johnson, Claus Jensen vd, Phillip Johnson, Coleman Jones vds, William Donald Kelley, Ernst T. Krebs Sr txt, Ernst T. Krebs Jr. txt,/bio/txt/txt/ltr, Paul Krugman, Brett Leung MOV/ste/txt/txt/tx+vd/txt, Katie Leishman, Anthony Liversidge blg/intv/intv/txt/txts/txt/intv/txt/vd/vd, Bruce Livesey txt, James W. Loewen, Frank Lusardi, Nathaniel Lehrman vd, Christine Maggiore bk/ste/rec/rdo/vd, Noreen Martin vd, Robert Maver txt/itw, Eric Merola MOV, Lady Mary Wortley Montagu, Michael Moore bio/MOV/MOV/MOV, Gordon Moran, Ralph Nader bk, Ralph Moss txt/blg/ste/bks, Gary Null /txt/rdo/vd, Dan Olmsted wki, Toby Ord vd, Charles Ortleb bk/txt/bk/intw/flm, Neenyah Ostrom bk, Dennis Overbye, Mehmet Dr Oz vd, Eleni Papadopulos-Eleopulos ste/vd, Maria Papagiannidou bk, Thomas Piketty bk/bk/bk/bk/bk/bk/bk/bk/bk/bk, Robert Pollin txt/vd/bk, Jon Rappoport bio/bk/bk/ste/bk/bk/vd, Janine Roberts bk/bk, Luis Sancho vd, Liam Scheff ste/txt/bk/bk/rdio/vd, John Scythes, Casper Schmidt txt/txt, Joan Shenton vd/vd, Joseph Sonnabend vd, John Stauber, David Steele, Joseph Stiglitz bk/txt, Will Storr rdo Wolfgang Streeck, James P. Tankersley ste, Gary Taubes vd, Mwizenge S. Tembo, John Tierney vd, Michael Tracey, Valendar Turner rec, Jesse Ventura bk, Michael Verney-Elliott bio/vds/vd, Voltaire, Walter Wagner, Andrew Weil vd, David Weinberger bio/bk/blg/blg/BK/bk/pds, Robert Willner bk/txt/txt/vd, Howard Zinn.

*****************************************************
I am Albert Einstein, and I heartily approve of this blog, insofar as it seems to believe both in science and the importance of intellectual imagination, uncompromised by out of date emotions such as the impulse toward conventional religious beliefs, national aggression as a part of patriotism, and so on.   As I once remarked, the further the spiritual evolution of mankind advances, the more certain it seems to me that the path to genuine religiosity does not lie through the fear of life, and the fear of death, and blind faith, but through striving after rational knowledge.   Certainly the application of the impulse toward blind faith in science whereby authority is treated as some kind of church is to be deplored.  As I have also said, the only thing that ever interfered with my learning was my education. I am Freeman Dyson, and I approve of this blog, but would warn the author that life as a heretic is a hard one, since the ignorant and the half informed, let alone those who should know better, will automatically trash their betters who try to enlighten them with independent thinking, as I have found to my sorrow in commenting on "global warming" and its cures.
Many people would die rather than think – in fact, they do so. – Bertrand Russell.

The progress of science is strewn, like an ancient desert trail, with the bleached skeletons of discarded theories which once seemed to possess eternal life. - Arthur Koestler

One should as a rule respect public opinion in so far as is necessary to avoid starvation and to keep out of prison. – Bertrand Russell

A sudden bold and unexpected question doth many times surprise a man and lay him open. – Sir Francis Bacon (1561 – 1626)

He who knows only his own side of the case, knows little of that. – John Stuart Mill

No problem can withstand the assault of sustained thinking. – Voltaire

Might the simple maxim, that honesty is the best policy be laid to heart! Might a sense of the true aims of life elevate the tone of politics and trade, till public and private honor become identical! – Margaret Fuller Ossoli

Although science has led to the generally high living standards that most of the industrialized world enjoys today, the astounding discoveries underpinning them were made by a tiny number of courageous, out-of-step, visionary, determined, and passionate scientists working to their own agenda and radically challenging the status quo. – Donald W. Braben

(Click for more Unusual Quotations on Science and Belief)

IMPORTANT: THIS SITE IS BEST VIEWED ONLY IN VERY LARGE FONT
All posts guaranteed fact checked according to reference level cited, typically the original journal studies. Further guide to site purpose, layout and how to print posts out is in the lower blue section at the bottom of the home page.

Sloppy science everywhere

Hotz at Journal initiates wave of media coverage of error in science

Hotter the field, the more bias

Most studies wrong

Attentive perusers of this modest blog may have noticed that we recently expanded its subhead to include the thought that while we base our critique of the public claims of Robert Gallo, Anthony Fauci, John P. Moore, Mark Wainberg, Nancy Padian and other highly decorated generals of the HIV∫AIDS salvation army on the peer-reviewed literature, a certain caveat is in order.

Not everything which finds its way into science and medical journals, even the top ones, is totally reliable, because even if the authors are not conscious of being emotionally flawed human beings subject to all the warping influences listed in the masthead above, their best efforts would still include bad design, inadvertent error and unconscious “data management”, perhaps because they make false assumptions at the start of the study, a habit which is universal in HIV∫AIDS.

As we have mentioned earlier one of the more distinguished scientists we have been privileged to interview, the renowned Harvard researcher and Nobel prize winner Walter Gilbert, once confided to us that whenever he embarked on a new investigation prompted by someone else’s paper he would always try to repeat the experiment himself, and was surprisingly often chagrined to find that he couldn’t.

And in our early efforts to report on the objections raised by the equally distinguished retrovirology researcher Peter Duesberg of Berkeley to the theoretical kite flown by Robert Gallo in 1984 in AIDS, the unlikely notion that the ugly and fatal new syndrome of immune collapse was caused by an infectious virus eventually labeled Human Immunodeficiency Virus, a notion unfortunately backed at once by the federal government and thus rendered sacrosanct, we were taken aback by the deep analysis of HIV∫AIDS papers that the Berkeley professor frequently explained to us privately. That analysis showed they were badly done, poorly argued and as a result entirely misleading, even if one accepted the uncritical assumption on which they were all based, that HIV was the right culprit for the new and appalling disease.

Politely ignoring a huge problem

We also noted, however, that in arguing against the HIV=AIDS paradigm, professor Duesberg did not at first rely on exposing the shoddiness of the papers that resulted from it. He would directly undermine the paradigm by accepting the data and conclusions of the literature, and then show how the paradigm did not stack up – in fact was contradicted by the very papers that were claimed to shore it up.

Only later was he forced to show how many major results were based on poorly designed studies which were misinterpreted, an obligation unfairly thrust upon him in answering the somewhat specious demand, “Well, if it isn’t HIV, what is it that causes AIDS, then?” The demand is specious because so much of the literature is based on the assumption that HIV is the villain in the drama that most of it will have to be redone without that assumption to nail down the real and obviously multiple causes of immune failure on all five continents, with all their disparate symptoms and epidemiology.

A mudslide of articles about error

Anyhow we are pleased to notice that a rash of articles came out this week publicizing this little noticed fact: it is not simply fraud which occasionally corrupts the peer-reviewed literature, it is the inadequacy of peer review, which often lets through papers whose conclusions are unreliable and which should have been corrected or redone.

Needless to say, science reporting in the media is horrendously incompetent: it mostly does not rise above the level of noting down and publishing what sources say, without that material passing through the critical faculties of the reporter, assuming these even exist, let alone being double checked with critics in the traditional manner observed in every other field of public affairs. One mark of this incompetence is that none of the top reporters whose specialty is HIV=AIDS, with the exception of HIV skeptic Celia Farber in Harpers, and of course HIV skeptic Liam Scheff elsewhere, has shown any interest whatsoever in the possibility that research in the field is questionable.

It is as if they either do not know, or have for some reason given the NIAID, under the firm control of Lasker winner Dr Anthony Fauci, a free pass, possibly one associated with the undeniable hostility of that public servant to such notions.

How wrong it is to assume that published, peer reviewed science is scripture engraved in tablets of stone is well known to those familiar with the Baltimore scandal, where Nobelist David Baltimore blocked retraction of an incorrect paper with his name on it for years, until three Congressional investigations finally prised his protective grip from it. Whether error or knowing fraud (by the lead author, not Baltimore) was involved was never made quite clear, but the subsequent book by Daniel Kevles exonerated Dr Baltimore sufficiently that, having been ignominiously kicked out of the presidency of Rockefeller University, the renowned researcher was eventually rehabilitated in the eyes of the public with the presidency of Caltech, from which he recently retired, and where Professor Kevles himself spent much of his career before moving to Yale.

Hotz’ hot column points to Ioannidis’s white hot essay

But fraud is not an interesting subject to contemplate, even if the cases of it which are occasionally exposed in the public prints are often spectacular, as in the case of the downfall of the Korean gentleman recently. The important point is that bad but not intentionally fraudulent science gets into print even in the top journals, as HIV/AIDS has shown in its own spectacular fashion, and science reporters seem universally unaware of this possibility. Now however, we have more than one article suddenly acknowledging this problem.

The first was last Friday, when the Wall Street Journal printed a column by Robert Lee Hotz, Most Science Studies Appear to Be Tainted By Sloppy Analysis, reporting on the work of John Ioannidis, an epidemiologist who studies research methods at the University of Ioannina School of Medicine in Greece and Tufts University in Medford, Mass.

Ioannidis has documented how the conclusions of thousands of peer-reviewed research papers may be invalid because the research is inept. In fact, he is the star of the Public Library of Science, where his stunningly honest essay of 2005, Why Most Published Research Findings Are False, is the most downloaded technical paper, which is clearly what prompted Hotz’ column.

The essay is a strong contrast with Tara C. Smith and Steven Novella’s froglike masterpiece, which we are deconstructing here when more important matters do not obtrude, as in the case of this exemplary piece of research-based, logically sound, statistically formulated and politically sophisticated scientific commentary:

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias….

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. …..

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced…..

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u.…..


Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28].


Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true….

Most Research Findings Are False for Most Research Designs and for Most Fields

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias

Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results.

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a “null field.” However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field….

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question….

Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research.

What matters is the totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve.

Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values—the pre-study odds—where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test [36].

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large.

Human error in papers

Here is how Hotz, in Most Science Studies Appear to Be Tainted By Sloppy Analysis, told the many readers of the pragmatic Wall Street Journal about this problem, thus ensuring that many investors, lawyers, and other people who need realistic information about scientific claims of world pandemics are now aware that scientists’ pronouncements, and their published literature, may have to be double checked for accuracy. The New York Times has a habit of not bothering to do so, not having the money or inclination to employ factcheckers: it trusts its reporters to get it right, since they have instant access, after all, to the top gurus of every field and, judging from their public appearances, do not appear to be overworked:

Most Science Studies Appear to Be Tainted By Sloppy Analysis

We all make mistakes and, if you believe medical scholar John Ioannidis, scientists make more than their fair share. By his calculations, most published research findings are wrong.

Dr. Ioannidis is an epidemiologist who studies research methods at the University of Ioannina School of Medicine in Greece and Tufts University in Medford, Mass. In a series of influential analytical reports, he has documented how, in thousands of peer-reviewed research papers published every year, there may be so much less than meets the eye.

These flawed findings, for the most part, stem not from fraud or formal misconduct, but from more mundane misbehavior: miscalculation, poor study design or self-serving data analysis. “There is an increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims,” Dr. Ioannidis said. “A new claim about a research finding is more likely to be false than true.”

The hotter the field of research the more likely its published findings should be viewed skeptically, he determined.


Hotz dug around and found plenty of agreement with what Ioannidis is saying, and plenty of material to confirm what the Greek American researcher has found in his many reports:

Take the discovery that the risk of disease may vary between men and women, depending on their genes. Studies have prominently reported such sex differences for hypertension, schizophrenia and multiple sclerosis, as well as lung cancer and heart attacks. In research published last month in the Journal of the American Medical Association, Dr. Ioannidis and his colleagues analyzed 432 published research claims concerning gender and genes (Nikolaos A. Patsopoulos, Athina Tatsioni and John Ioannidis, “Claims of Sex Differences: An Empirical Assessment in Genetic Associations,” Journal of the American Medical Association, August 2007).

Upon closer scrutiny, almost none of them held up. Only one was replicated.

What’s going wrong? The key problem is one most observers of science are well aware of: science advances hypothesis by hypothesis, which tends to translate into hope by hope, and the data tend to support a new hypothesis unless studies are carefully designed to banish that effect:

Statistically speaking, science suffers from an excess of significance. Overeager researchers often tinker too much with the statistical variables of their analysis to coax any meaningful insight from their data sets. “People are messing around with the data to find anything that seems significant, to show they have found something that is new and unusual,” Dr. Ioannidis said.
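
Readers who prefer arithmetic to assertion can watch the effect happen. Here is a minimal sketch of ours in Python (invented numbers, purely for illustration, and in no way Ioannidis’s or Hotz’s own code) showing how testing enough variables at the conventional p < 0.05 threshold will reliably dredge up “significant” findings from pure noise:

# A toy illustration, not Ioannidis's or Hotz's own code: compare two groups
# that differ in nothing at all on 100 invented "outcome" variables and count
# how many differences come out "statistically significant" at p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_per_group, n_outcomes = 100, 100

group_a = rng.normal(size=(n_outcomes, n_per_group))   # pure noise
group_b = rng.normal(size=(n_outcomes, n_per_group))   # pure noise

p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue
print(f"{(p_values < 0.05).sum()} of {n_outcomes} pure-noise comparisons "
      "pass the p < 0.05 bar")
# Roughly five will clear the bar by chance alone -- plenty of raw material
# for an eager investigator to write up as a new and unusual finding.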

In the U. S., research is a $55-billion-a-year enterprise that stakes its credibility on the reliability of evidence and the work of Dr. Ioannidis strikes a raw nerve. In fact, his 2005 essay “Why Most Published Research Findings Are False” remains the most downloaded technical paper that the journal PLoS Medicine has ever published.

“He has done systematic looks at the published literature and empirically shown us what we know deep inside our hearts,” said Muin Khoury, director of the National Office of Public Health Genomics at the U.S. Centers for Disease Control and Prevention. “We need to pay more attention to the replication of published scientific results.”

Every new fact discovered through experiment represents a foothold in the unknown. In a wilderness of knowledge, it can be difficult to distinguish error from fraud, sloppiness from deception, eagerness from greed or, increasingly, scientific conviction from partisan passion. As scientific findings become fodder for political policy wars over matters from stem-cell research to global warming, even trivial errors and corrections can have larger consequences.

Still, other researchers warn not to fear all mistakes. Error is as much a part of science as discovery. It is the inevitable byproduct of a search for truth that must proceed by trial and error. “Where you have new areas of knowledge developing, then the science is going to be disputed, subject to errors arising from inadequate data or the failure to recognize new matters,” said Yale University science historian Daniel Kevles. Conflicting data and differences of interpretation are common.

Now in his well worded piece Hotz comes to the point where HIV/AIDS critics will sit up and applaud (our boldface):

To root out mistakes, scientists rely on each other to be vigilant. Even so, findings too rarely are checked by others or independently replicated. Retractions, while more common, are still relatively infrequent. Findings that have been refuted can linger in the scientific literature for years to be cited unwittingly by other researchers, compounding the errors.

Stung by frauds in physics, biology and medicine, research journals recently adopted more stringent safeguards to protect at least against deliberate fabrication of data. But it is hard to admit even honest error. Last month, the Chinese government proposed a new law to allow its scientists to admit failures without penalty. Next week, the first world conference on research integrity convenes in Lisbon.

Overall, technical reviewers are hard-pressed to detect every anomaly. On average, researchers submit about 12,000 papers annually just to the weekly peer-reviewed journal Science. Last year, four papers in Science were retracted. A dozen others were corrected.

No one actually knows how many incorrect research reports remain unchallenged.

Earlier this year, informatics expert Murat Cokol and his colleagues at Columbia University sorted through 9.4 million research papers at the U.S. National Library of Medicine published from 1950 through 2004 in 4,000 journals. By raw count, just 596 had been formally retracted, Dr. Cokol reported.

“The correction isn’t the ultimate truth either,” Prof. Kevles said.

Well, how many were wrong? That is the unanswered question. If all the papers on HIV/AIDS were immediately retracted because HIV is clearly not involved in causing immune collapse, Science would be crippled as a reference source, and science would lose much of its credibility. An honest error on the part of the editors, perhaps, but inexcusable as long as they claim the role of the gatekeepers and the watchdogs of science.

All of this speaks for the credibility of the well qualified critics of the HIV=AIDS paradigm and the unusual attention they have paid to the quality of the research papers which support it, where they have found a remarkable level of data mismanagement, poor design and misleading conclusions. Yet their case is typically dismissed by paradigm defenders such as Tara Smith of Iowa, Steven Novella of Yale and John P. Moore of Weill Cornell with scorn and derision rather than scientific argument. The public likewise assumes that the literature is thoroughly validated by peer review.

Now the public has been informed by one prominent newspaper, perhaps the most trusted daily now, that something is rotten in the state of science, and that they should proceed with caution before dismissing all challenges to mainstream science as if they were all ignorant creationism. After all, it is clear by now that the paradigm that HIV causes AIDS would have been universally discredited long ago were it not propped up by papers which are themselves universally premised on the very assumption they are used to support.

What’s to be done?

Most people, including almost all the scientists in a field, are unlikely to examine a paper closely enough to find its faults. One wonders just how many beliefs would be dashed if they did. Dr Ioannidis has already found that the fashionable claim that disease risk varies between men and women according to their genes rests on 432 published claims, of which only one could be replicated.

It is difficult to know what to trust until all the papers on a topic are thoroughly reviewed for bias, and there is no field where bias is so blatant as HIV/AIDS, where scientists such as Moore and Wainberg are so proud of it that Wainberg has suggested imprisonment for those who question the paradigm.

In a later PLoS Medicine article earlier this year, Ramal Moonesinghe and Muin Khoury at the U.S. Centers for Disease Control and Prevention demonstrated that the likelihood of a published research result being true increases when that finding has been repeatedly replicated in multiple studies. The article is “Most Published Research Findings Are False — But a Little Replication Goes a Long Way.”

But with bias and preconceptions playing such a big part, repetition alone is obviously not enough. Raising the level of awareness among scientists and the public of the fallibility of science is key. Let’s hope that last week’s conference in the world capital of port, the most delicious of fortified wines, started some greater awareness of the problem and some improvement of the situation in science. The European Science Foundation and the Office of Research Integrity held a world conference on research integrity in Lisbon, Portugal, Sept. 16-19, 2007, which included papers on best practices, training researchers, and the role played by academic journals.

Gee, we wonder whether anyone mentioned HIV/AIDS in this context. Not only is it a field where bias in favor of the unproven and unsubstantiated hypothesis is so rife that every paper is imbued with it, and where researchers flaunt their bias as if it were a badge of honor, but as regards testing drugs there have not been any controls in any study since the AZT trial was called to a sudden halt twenty years ago. The benefit was so powerfully assumed by gay activists that they insisted the scientists release the drug immediately without further testing, since it would be unfair to withhold it from the placebo control group, who were already finding ways to take it.

This blatant lack of controls is one reason why the drugs in AIDS are not recognized as being as lethal as general studies of the welfare of patients show they are, with half of current AIDS deaths due to the drugs and not to AIDS proper, whatever the cause of that is.

Of course, to those unaware that the scientific literature is subject to human error, that last phrase will come as a surprise.

Here is Hotz’s piece for reference:
September 14, 2007

SCIENCE JOURNAL
By ROBERT LEE HOTZ

Most Science Studies
Appear to Be Tainted
By Sloppy Analysis
September 14, 2007; Page B1

We all make mistakes and, if you believe medical scholar John Ioannidis, scientists make more than their fair share. By his calculations, most published research findings are wrong.

Dr. Ioannidis is an epidemiologist who studies research methods at the University of Ioannina School of Medicine in Greece and Tufts University in Medford, Mass. In a series of influential analytical reports, he has documented how, in thousands of peer-reviewed research papers published every year, there may be so much less than meets the eye.

These flawed findings, for the most part, stem not from fraud or formal misconduct, but from more mundane misbehavior: miscalculation, poor study design or self-serving data analysis. “There is an increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims,” Dr. Ioannidis said. “A new claim about a research finding is more likely to be false than true.”

The hotter the field of research the more likely its published findings should be viewed skeptically, he determined.

Take the discovery that the risk of disease may vary between men and women, depending on their genes. Studies have prominently reported such sex differences for hypertension, schizophrenia and multiple sclerosis, as well as lung cancer and heart attacks. In research published last month in the Journal of the American Medical Association, Dr. Ioannidis and his colleagues analyzed 432 published research claims concerning gender and genes.
——————————-
RECOMMENDED READING

–by Robert Lee Hotz
Drs. Nikolaos A. Patsopoulos, Athina Tatsioni and John Ioannidis analyzed claims of genetic risk and sex differences in “Claims of Sex Differences: An Empirical Assessment in Genetic Associations,”3 (abstract; login required for full text) published in the Journal of the American Medical Association last month.
* * *
Dr. John Ioannidis argued that false findings may be the majority of published research claims, in “Why Most Published Research Findings Are False,”4 in the PLoS Medicine journal, in August 2005.
* * *
In another PLoS Medicine article earlier this year, Ramal Moonesinghe and Muin Khoury at the U.S. Centers for Disease Control and Prevention demonstrated that the likelihood of a published research result being true increases when that finding has been repeatedly replicated in multiple studies. The article is: “Most Published Research Findings Are False — But a Little Replication Goes a Long Way.”5
* * *
The Office of Research Integrity6 promotes integrity in biomedical and behavioral research supported by the U.S. Public Health Service at about 4,000 institutions world-wide.
* * *
The European Science Foundation and the Office of Research Integrity are holding a world conference on research integrity7 in Lisbon, Portugal, Sept. 16-19, 2007. The invited researchers will be presenting papers on best practices, training researchers, and the role played by academic journals.
———————————————————————

Upon closer scrutiny, almost none of them held up. Only one was replicated.

Statistically speaking, science suffers from an excess of significance. Overeager researchers often tinker too much with the statistical variables of their analysis to coax any meaningful insight from their data sets. “People are messing around with the data to find anything that seems significant, to show they have found something that is new and unusual,” Dr. Ioannidis said.

In the U. S., research is a $55-billion-a-year enterprise that stakes its credibility on the reliability of evidence and the work of Dr. Ioannidis strikes a raw nerve. In fact, his 2005 essay “Why Most Published Research Findings Are False” remains the most downloaded technical paper that the journal PLoS Medicine has ever published.

“He has done systematic looks at the published literature and empirically shown us what we know deep inside our hearts,” said Muin Khoury, director of the National Office of Public Health Genomics at the U.S. Centers for Disease Control and Prevention. “We need to pay more attention to the replication of published scientific results.”

Every new fact discovered through experiment represents a foothold in the unknown. In a wilderness of knowledge, it can be difficult to distinguish error from fraud, sloppiness from deception, eagerness from greed or, increasingly, scientific conviction from partisan passion. As scientific findings become fodder for political policy wars over matters from stem-cell research to global warming, even trivial errors and corrections can have larger consequences.

Still, other researchers warn not to fear all mistakes. Error is as much a part of science as discovery. It is the inevitable byproduct of a search for truth that must proceed by trial and error. “Where you have new areas of knowledge developing, then the science is going to be disputed, subject to errors arising from inadequate data or the failure to recognize new matters,” said Yale University science historian Daniel Kevles. Conflicting data and differences of interpretation are common.

To root out mistakes, scientists rely on each other to be vigilant. Even so, findings too rarely are checked by others or independently replicated. Retractions, while more common, are still relatively infrequent. Findings that have been refuted can linger in the scientific literature for years to be cited unwittingly by other researchers, compounding the errors.

Stung by frauds in physics, biology and medicine, research journals recently adopted more stringent safeguards to protect at least against deliberate fabrication of data. But it is hard to admit even honest error. Last month, the Chinese government proposed a new law to allow its scientists to admit failures without penalty. Next week, the first world conference on research integrity convenes in Lisbon.

Overall, technical reviewers are hard-pressed to detect every anomaly. On average, researchers submit about 12,000 papers annually just to the weekly peer-reviewed journal Science. Last year, four papers in Science were retracted. A dozen others were corrected.

No one actually knows how many incorrect research reports remain unchallenged.

Earlier this year, informatics expert Murat Cokol and his colleagues at Columbia University sorted through 9.4 million research papers at the U.S. National Library of Medicine published from 1950 through 2004 in 4,000 journals. By raw count, just 596 had been formally retracted, Dr. Cokol reported.

“The correction isn’t the ultimate truth either,” Prof. Kevles said.

Email me at ScienceJournal@wsj.com9.
URL for this article:

http://online.wsj.com/article/SB118972683557627104.html

Hyperlinks in this Article:
(1) http://forums.wsj.com/viewtopic.php?t=809
(2) http://forums.wsj.com/viewtopic.php?t=809
(3) http://jama.ama-assn.org/cgi/content/short/298/8/880
(4) http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371/journal.pmed.0020124
(5) http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10.1371%2Fjournal.pmed.0040028
(6) http://ori.dhhs.gov/
(7) http://www.esf.org/activities/esf-conferences/details/confdetail242/conference-information.html
(8) http://www.esf.org/activities/esf-conferences/details/confdetail242/invited-papers-biographies.html
(9) mailto:ScienceJournal@wsj.com
Copyright 2007 Dow Jones & Company, Inc. All Rights Reserved

Here for reference is the complete essay by Ioannidis, Why Most Published Research Findings Are False. The boldface is added by NAR to highlight key passages:
PLoS Medicine
A peer-reviewed, open-access journal published by the Public Library of Science

ESSAY

Why Most Published Research Findings Are False

John P. A. Ioannidis

Summary

There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.

Competing Interests: The author has declared that no competing interests exist.

Citation: Ioannidis JPA (2005) Why Most Published Research Findings Are False. PLoS Med 2(8): e124 doi:10.1371/journal.pmed.0020124

Published: August 30, 2005

Copyright: © 2005 John P. A. Ioannidis. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abbreviation: PPV, positive predictive value

John P. A. Ioannidis is in the Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece, and Institute for Clinical Research and Health Policy Studies, Department of Medicine, Tufts-New England Medical Center, Tufts University School of Medicine, Boston, Massachusetts, United States of America. E-mail: jioannid@cc.uoi.gr

Published research findings are sometimes refuted by subsequent evidence, with ensuing confusion and disappointment. Refutation and controversy is seen across the range of research designs, from clinical trials and traditional epidemiological studies [1–3] to the most modern molecular research [4,5]. There is increasing concern that in modern research, false findings may be the majority or even the vast majority of published research claims [6–8]. However, this should not be surprising. It can be proven that most claimed research findings are false. Here I will examine the key factors that influence this problem and some corollaries thereof.

Modeling the Framework for False Positive Findings

Several methodologists have pointed out [9–11] that the high rate of nonreplication (lack of confirmation) of research discoveries is a consequence of the convenient, yet ill-founded strategy of claiming conclusive research findings solely on the basis of a single study assessed by formal statistical significance, typically for a p-value less than 0.05. Research is not most appropriately represented and summarized by p-values, but, unfortunately, there is a widespread notion that medical research articles should be interpreted based only on p-values. Research findings are defined here as any relationship reaching formal statistical significance, e.g., effective interventions, informative predictors, risk factors, or associations. “Negative” research is also very useful. “Negative” is actually a misnomer, and the misinterpretation is widespread. However, here we will target relationships that investigators claim exist, rather than null findings.

It can be proven that most claimed research findings are false.

As has been shown previously, the probability that a research finding is indeed true depends on the prior probability of it being true (before doing the study), the statistical power of the study, and the level of statistical significance [10,11]. Consider a 2 × 2 table in which research findings are compared against the gold standard of true relationships in a scientific field. In a research field both true and false hypotheses can be made about the presence of relationships. Let R be the ratio of the number of “true relationships” to “no relationships” among those tested in the field. R is characteristic of the field and can vary a lot depending on whether the field targets highly likely relationships or searches for only one or a few true relationships among thousands and millions of hypotheses that may be postulated. Let us also consider, for computational simplicity, circumscribed fields where either there is only one true relationship (among many that can be hypothesized) or the power is similar to find any of the several existing true relationships. The pre-study probability of a relationship being true is R/(R + 1). The probability of a study finding a true relationship reflects the power 1 − β (one minus the Type II error rate). The probability of claiming a relationship when none truly exists reflects the Type I error rate, α. Assuming that c relationships are being probed in the field, the expected values of the 2 × 2 table are given in Table 1. After a research finding has been claimed based on achieving formal statistical significance, the post-study probability that it is true is the positive predictive value, PPV. The PPV is also the complementary probability of what Wacholder et al. have called the false positive report probability [10]. According to the 2 × 2 table, one gets PPV = (1 − β)R/(R − βR + α). A research finding is thus more likely true than false if (1 − β)R > α. Since usually the vast majority of investigators depend on α = 0.05, this means that a research finding is more likely true than false if (1 − β)R > 0.05.

Table 1. Research Findings and True Relationships
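
(To make the formula just quoted concrete, here is a minimal sketch of ours in Python, not anything from the paper, using Ioannidis’s own symbols: R for the pre-study odds of a true relationship, α for the significance threshold and β for the Type II error rate.)

# A minimal sketch of ours (not from the paper): the positive predictive value
# of a claimed finding, PPV = (1 - beta) * R / (R - beta * R + alpha),
# for a field whose pre-study odds of a true relationship are R.
def ppv(R, alpha=0.05, beta=0.20):
    return (1 - beta) * R / (R - beta * R + alpha)

for R in (1.0, 0.5, 0.1, 0.01):
    print(f"pre-study odds R = {R:<5}  PPV = {ppv(R):.2f}")
# With R = 1 and 80% power a positive claim is about 94% likely to be true;
# with R = 0.01, as in exploratory fishing expeditions, it is under 15%.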

What is less well appreciated is that bias and the extent of repeated independent testing by different teams of investigators around the globe may further distort this picture and may lead to even smaller probabilities of the research findings being indeed true. We will try to model these two factors in the context of similar 2 × 2 tables.

Bias

First, let us define bias as the combination of various design, data, analysis, and presentation factors that tend to produce research findings when they should not be produced. Let u be the proportion of probed analyses that would not have been “research findings,” but nevertheless end up presented and reported as such, because of bias. Bias should not be confused with chance variability that causes some findings to be false by chance even though the study design, data, analysis, and presentation are perfect. Bias can entail manipulation in the analysis or reporting of findings. Selective or distorted reporting is a typical form of such bias. We may assume that u does not depend on whether a true relationship exists or not. This is not an unreasonable assumption, since typically it is impossible to know which relationships are indeed true. In the presence of bias (Table 2), one gets PPV = ([1 − β]R + uβR)/(R + α − βR + u − uα + uβR), and PPV decreases with increasing u, unless 1 − β ≤ α, i.e., 1 − β ≤ 0.05 for most situations. Thus, with increasing bias, the chances that a research finding is true diminish considerably. This is shown for different levels of power and for different pre-study odds in Figure 1.
Figure 1. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Levels of Bias, u

Panels correspond to power of 0.20, 0.50, and 0.80.

Table 2. Research Findings and True Relationships in the Presence of Bias
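
(Again a quick sketch of ours, not Ioannidis’s, plugging illustrative numbers into the bias-adjusted formula above to show how quickly the bias term u erodes the chance that a claimed finding is true.)

# A sketch of ours of Ioannidis's bias-adjusted formula:
# PPV = ((1 - beta)R + u*beta*R) / (R + alpha - beta*R + u - u*alpha + u*beta*R)
def ppv_with_bias(R, u, alpha=0.05, beta=0.20):
    num = (1 - beta) * R + u * beta * R
    den = R + alpha - beta * R + u - u * alpha + u * beta * R
    return num / den

# Assume respectable pre-study odds (R = 0.5) and 80% power, then turn up the bias.
for u in (0.0, 0.1, 0.3, 0.5):
    print(f"bias u = {u:.1f}  PPV = {ppv_with_bias(0.5, u):.2f}")
# Output runs from about 0.89 with no bias down to about 0.46 at u = 0.5: with
# half the would-be negative analyses massaged into positives, a published
# claim is no better than a coin toss.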

Conversely, true research findings may occasionally be annulled because of reverse bias. For example, with large measurement errors relationships are lost in noise [12], or investigators use data inefficiently or fail to notice statistically significant relationships, or there may be conflicts of interest that tend to “bury” significant findings [13]. There is no good large-scale empirical evidence on how frequently such reverse bias may occur across diverse research fields. However, it is probably fair to say that reverse bias is not as common. Moreover measurement errors and inefficient use of data are probably becoming less frequent problems, since measurement error has decreased with technological advances in the molecular era and investigators are becoming increasingly sophisticated about their data. Regardless, reverse bias may be modeled in the same way as bias above. Also reverse bias should not be confused with chance variability that may lead to missing a true relationship because of chance.

Testing by Several Independent Teams

Several independent teams may be addressing the same sets of research questions. As research efforts are globalized, it is practically the rule that several research teams, often dozens of them, may probe the same or similar questions. Unfortunately, in some areas, the prevailing mentality until now has been to focus on isolated discoveries by single teams and interpret research experiments in isolation. An increasing number of questions have at least one study claiming a research finding, and this receives unilateral attention. The probability that at least one study, among several done on the same question, claims a statistically significant research finding is easy to estimate. For n independent studies of equal power, the 2 × 2 table is shown in Table 3: PPV = R(1 − β^n)/(R + 1 − [1 − α]^n − Rβ^n) (not considering bias). With increasing number of independent studies, PPV tends to decrease, unless 1 − β < α, i.e., typically 1 − β < 0.05. This is shown for different levels of power and for different pre-study odds in Figure 2. For n studies of different power, the term β^n is replaced by the product of the terms β_i for i = 1 to n, but inferences are similar.
Figure 2. PPV (Probability That a Research Finding Is True) as a Function of the Pre-Study Odds for Various Numbers of Conducted Studies, n

Panels correspond to power of 0.20, 0.50, and 0.80.

Table 3. Research Findings and True Relationships in the Presence of Multiple Studies
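
(The multi-team formula can be checked the same way; another small illustrative sketch of ours, with no bias term and assumed values for R and power.)

# A sketch of ours of the multi-team formula (no bias term):
# PPV = R(1 - beta^n) / (R + 1 - (1 - alpha)^n - R*beta^n)
def ppv_n_teams(R, n, alpha=0.05, beta=0.20):
    num = R * (1 - beta ** n)
    den = R + 1 - (1 - alpha) ** n - R * beta ** n
    return num / den

for n in (1, 5, 10, 25):
    print(f"{n:2d} team(s) probing the question  PPV = {ppv_n_teams(0.5, n):.2f}")
# From about 0.89 for a single study down to about 0.41 with 25 competing teams:
# the hotter the field, the less an isolated positive claim is worth (Corollary 6).
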
Corollaries

A practical example is shown in Box 1. Based on the above considerations, one may deduce several interesting corollaries about the probability that a research finding is indeed true.

Corollary 1: The smaller the studies conducted in a scientific field, the less likely the research findings are to be true. Small sample size means smaller power and, for all functions above, the PPV for a true research finding decreases as power decreases towards 1 − β = 0.05. Thus, other factors being equal, research findings are more likely true in scientific fields that undertake large studies, such as randomized controlled trials in cardiology (several thousand subjects randomized) [14] than in scientific fields with small studies, such as most research of molecular predictors (sample sizes 100-fold smaller) [15].

Corollary 2: The smaller the effect sizes in a scientific field, the less likely the research findings are to be true. Power is also related to the effect size. Thus research findings are more likely true in scientific fields with large effects, such as the impact of smoking on cancer or cardiovascular disease (relative risks 3–20), than in scientific fields where postulated effects are small, such as genetic risk factors for multigenetic diseases (relative risks 1.1–1.5) [7]. Modern epidemiology is increasingly obliged to target smaller effect sizes [16]. Consequently, the proportion of true research findings is expected to decrease. In the same line of thinking, if the true effect sizes are very small in a scientific field, this field is likely to be plagued by almost ubiquitous false positive claims. For example, if the majority of true genetic or nutritional determinants of complex diseases confer relative risks less than 1.05, genetic or nutritional epidemiology would be largely utopian endeavors.

Corollary 3: The greater the number and the lesser the selection of tested relationships in a scientific field, the less likely the research findings are to be true. As shown above, the post-study probability that a finding is true (PPV) depends a lot on the pre-study odds (R). Thus, research findings are more likely true in confirmatory designs, such as large phase III randomized controlled trials, or meta-analyses thereof, than in hypothesis-generating experiments. Fields considered highly informative and creative given the wealth of the assembled and tested information, such as microarrays and other high-throughput discovery-oriented research [4,8,17], should have extremely low PPV.

Corollary 4: The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true. Flexibility increases the potential for transforming what would be “negative” results into “positive” results, i.e., bias, u. For several research designs, e.g., randomized controlled trials [18–20] or meta-analyses [21,22], there have been efforts to standardize their conduct and reporting. Adherence to common standards is likely to increase the proportion of true findings. The same applies to outcomes. True findings may be more common when outcomes are unequivocal and universally agreed (e.g., death) rather than when multifarious outcomes are devised (e.g., scales for schizophrenia outcomes) [23]. Similarly, fields that use commonly agreed, stereotyped analytical methods (e.g., Kaplan-Meier plots and the log-rank test) [24] may yield a larger proportion of true findings than fields where analytical methods are still under experimentation (e.g., artificial intelligence methods) and only “best” results are reported. Regardless, even in the most stringent research designs, bias seems to be a major problem. For example, there is strong evidence that selective outcome reporting, with manipulation of the outcomes and analyses reported, is a common problem even for randomized trials [25]. Simply abolishing selective publication would not make this problem go away.

Corollary 5: The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true. Conflicts of interest and prejudice may increase bias, u. Conflicts of interest are very common in biomedical research [26], and typically they are inadequately and sparsely reported [26,27]. Prejudice may not necessarily have financial roots. Scientists in a given field may be prejudiced purely because of their belief in a scientific theory or commitment to their own findings. Many otherwise seemingly independent, university-based studies may be conducted for no other reason than to give physicians and researchers qualifications for promotion or tenure. Such nonfinancial conflicts may also lead to distorted reported results and interpretations. Prestigious investigators may suppress via the peer review process the appearance and dissemination of findings that refute their findings, thus condemning their field to perpetuate false dogma. Empirical evidence on expert opinion shows that it is extremely unreliable [28].


Corollary 6: The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
This seemingly paradoxical corollary follows because, as stated above, the PPV of isolated findings decreases when many teams of investigators are involved in the same field. This may explain why we occasionally see major excitement followed rapidly by severe disappointments in fields that draw wide attention. With many teams working on the same field and with massive experimental data being produced, timing is of the essence in beating competition. Thus, each team may prioritize on pursuing and disseminating its most impressive “positive” results. “Negative” results may become attractive for dissemination only if some other team has found a “positive” association on the same question. In that case, it may be attractive to refute a claim made in some prestigious journal. The term Proteus phenomenon has been coined to describe this phenomenon of rapidly alternating extreme research claims and extremely opposite refutations [29]. Empirical evidence suggests that this sequence of extreme opposites is very common in molecular genetics [29].
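
(Corollaries 1 and 2 can be made concrete with a little code as well. The sketch below, ours and not the essay’s, uses a standard power calculation from the statsmodels library for a two-arm study chasing a small effect and feeds the resulting power into the PPV formula; the effect size, sample sizes and pre-study odds are assumptions chosen purely for illustration.)

# A sketch of ours, not the essay's, linking sample size, power and PPV.
# The effect size (0.2 standard deviations), sample sizes and pre-study odds
# R = 0.5 are assumptions chosen purely for illustration.
from statsmodels.stats.power import TTestIndPower

def ppv(R, power, alpha=0.05):
    # PPV = (1 - beta)R / (R - beta*R + alpha), written in terms of power = 1 - beta
    return power * R / (power * R + alpha)

analysis = TTestIndPower()
for n_per_arm in (25, 100, 1000):
    power = analysis.power(effect_size=0.2, nobs1=n_per_arm, alpha=0.05)
    print(f"n = {n_per_arm:4d} per arm  power = {power:.2f}  PPV = {ppv(0.5, power):.2f}")
# Small studies chasing a small effect have little power, so even their
# "significant" findings have a poor chance of being true (Corollaries 1 and 2).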

These corollaries consider each factor separately, but these factors often influence each other. For example, investigators working in fields where true effect sizes are perceived to be small may be more likely to perform large studies than investigators working in fields where true effect sizes are perceived to be large. Or prejudice may prevail in a hot scientific field, further undermining the predictive value of its research findings. Highly prejudiced stakeholders may even create a barrier that aborts efforts at obtaining and disseminating opposing results. Conversely, the fact that a field is hot or has strong invested interests may sometimes promote larger studies and improved standards of research, enhancing the predictive value of its research findings. Or massive discovery-oriented testing may result in such a large yield of significant relationships that investigators have enough to report and search further and thus refrain from data dredging and manipulation.

Most Research Findings Are False for Most Research Designs and for Most Fields

In the described framework, a PPV exceeding 50% is quite difficult to get. Table 4 provides the results of simulations using the formulas developed for the influence of power, ratio of true to non-true relationships, and bias, for various types of situations that may be characteristic of specific study designs and settings. A finding from a well-conducted, adequately powered randomized controlled trial starting with a 50% pre-study chance that the intervention is effective is eventually true about 85% of the time. A fairly similar performance is expected of a confirmatory meta-analysis of good-quality randomized trials: potential bias probably increases, but power and pre-test chances are higher compared to a single randomized trial. Conversely, a meta-analytic finding from inconclusive studies where pooling is used to “correct” the low power of single studies is probably false if R ≤ 1:3. Research findings from underpowered, early-phase clinical trials would be true about one in four times, or even less frequently if bias is present. Epidemiological studies of an exploratory nature perform even worse, especially when underpowered, but even well-powered epidemiological studies may have only a one in five chance of being true, if R = 1:10. Finally, in discovery-oriented research with massive testing, where tested relationships exceed true ones 1,000-fold (e.g., 30,000 genes tested, of which 30 may be the true culprits) [30,31], the PPV for each claimed relationship is extremely low, even with considerable standardization of laboratory and statistical methods, outcomes, and reporting thereof to minimize bias.
Table 4. PPV of Research Findings for Various Combinations of Power (1 − β), Ratio of True to Not-True Relationships (R), and Bias (u)
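
For readers who want to check figures like those quoted above, here is a minimal Python sketch (mine, not part of the paper) applying the bias-adjusted PPV formula developed earlier, PPV = ([1 − β]R + uβR)/(R + α − βR + u − uα + uβR); the power/R/u combinations below are assumptions chosen to match the scenarios described in the preceding paragraph, not a reproduction of Table 4 itself.

    # Sketch only: bias-adjusted positive predictive value (PPV).
    # The (power, R, u) combinations are illustrative assumptions, not Table 4.
    def ppv(power, R, u, alpha=0.05):
        """Post-study probability that a claimed finding is true, allowing for bias u."""
        beta = 1.0 - power
        return (power * R + u * beta * R) / (R + alpha - beta * R + u - u * alpha + u * beta * R)

    scenarios = [
        ("Adequately powered RCT, little bias", 0.80, 1.0, 0.10),
        ("Well-powered exploratory epidemiological study", 0.80, 0.10, 0.30),
        ("Discovery-oriented research with massive testing", 0.20, 0.001, 0.80),
    ]

    for name, power, R, u in scenarios:
        print(f"{name}: PPV = {ppv(power, R, u):.3f}")

    # Approximate output: 0.850, 0.204, 0.001 -- about 85%, roughly one in five,
    # and an extremely low PPV, in line with the text above.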

Claimed Research Findings May Often Be Simply Accurate Measures of the Prevailing Bias

As shown, the majority of modern biomedical research is operating in areas with very low pre- and post-study probability for true findings. Let us suppose that in a research field there are no true findings at all to be discovered. The history of science teaches us that scientific endeavor has often in the past wasted effort in fields with absolutely no yield of true scientific information, at least based on our current understanding. In such a “null field,” one would ideally expect all observed effect sizes to vary by chance around the null in the absence of bias. The extent to which observed findings deviate from what is expected by chance alone would then be simply a pure measure of the prevailing bias.

For example, let us suppose that no nutrients or dietary patterns are actually important determinants for the risk of developing a specific tumor. Let us also suppose that the scientific literature has examined 60 nutrients and claims all of them to be related to the risk of developing this tumor with relative risks in the range of 1.2 to 1.4 for the comparison of the upper to lower intake tertiles. Then the claimed effect sizes are simply measuring nothing else but the net bias that has been involved in the generation of this scientific literature. Claimed effect sizes are in fact the most accurate estimates of the net bias. It even follows that between “null fields,” the fields that claim stronger effects (often with accompanying claims of medical or public health importance) are simply those that have sustained the worst biases.
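
A toy simulation (my own, with made-up numbers) makes the point concrete: if the true effect of every nutrient is null but the research and publication process adds a net bias of about 30% on the relative-risk scale, the “claimed” effect sizes cluster around 1.3 and therefore estimate the net bias itself.

    import numpy as np

    # Toy "null field" simulation; all numbers are illustrative assumptions.
    rng = np.random.default_rng(0)

    n_nutrients = 60
    true_log_rr = 0.0              # a genuine null field: no real effects at all
    net_bias_log = np.log(1.3)     # assumed net bias from selective analysis and reporting
    sampling_se = 0.05             # assumed sampling error of each log relative risk

    observed = np.exp(true_log_rr + net_bias_log + rng.normal(0.0, sampling_se, n_nutrients))
    print("claimed relative risks range:", observed.min().round(2), "to", observed.max().round(2))
    print("mean claimed relative risk:", observed.mean().round(2))   # ~1.3, i.e., the net bias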

For fields with very low PPV, the few true relationships would not distort this overall picture much. Even if a few relationships are true, the shape of the distribution of the observed effects would still yield a clear measure of the biases involved in the field. This concept totally reverses the way we view scientific results. Traditionally, investigators have viewed large and highly significant effects with excitement, as signs of important discoveries. Too large and too highly significant effects may actually be more likely to be signs of large bias in most fields of modern research. They should lead investigators to careful critical thinking about what might have gone wrong with their data, analyses, and results.

Of course, investigators working in any field are likely to resist accepting that the whole field in which they have spent their careers is a “null field.” However, other lines of evidence, or advances in technology and experimentation, may lead eventually to the dismantling of a scientific field. Obtaining measures of the net bias in one field may also be useful for obtaining insight into what might be the range of bias operating in other fields where similar analytical methods, technologies, and conflicts may be operating.

How Can We Improve the Situation?

Is it unavoidable that most research findings are false, or can we improve the situation? A major problem is that it is impossible to know with 100% certainty what the truth is in any research question. In this regard, the pure “gold” standard is unattainable. However, there are several approaches to improve the post-study probability.

First, better powered evidence, e.g., large studies or low-bias meta-analyses, may help, as it comes closer to the unknown “gold” standard. However, large studies may still have biases, and these should be acknowledged and avoided. Moreover, large-scale evidence is impossible to obtain for all of the millions and trillions of research questions posed in current research. Large-scale evidence should be targeted for research questions where the pre-study probability is already considerably high, so that a significant research finding will lead to a post-test probability that would be considered quite definitive. Large-scale evidence is also particularly indicated when it can test major concepts rather than narrow, specific questions. A negative finding can then refute not only a specific proposed claim, but a whole field or a considerable portion thereof. Selecting the performance of large-scale studies based on narrow-minded criteria, such as the marketing promotion of a specific drug, is largely wasted research. Moreover, one should be cautious that extremely large studies may be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null [32–34].
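
The last point is easy to demonstrate with a toy calculation (mine, with assumed numbers): with a million subjects per group, a difference of one-hundredth of a standard deviation, which few would consider meaningful, is detected as formally statistically significant essentially every time.

    import numpy as np

    # Toy illustration of a "significant but trivial" effect in a very large study.
    rng = np.random.default_rng(1)
    n = 1_000_000                              # assumed subjects per group
    a = rng.normal(0.00, 1.0, n)               # control group
    b = rng.normal(0.01, 1.0, n)               # assumed trivial true difference: 0.01 SD

    diff = b.mean() - a.mean()
    se = np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
    print(f"difference = {diff:.4f} SD, z = {diff / se:.1f}")   # z around 7: far beyond p = 0.05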

Second, most research questions are addressed by many teams, and it is misleading to emphasize the statistically significant findings of any single team. What matters is the totality of the evidence. Diminishing bias through enhanced research standards and curtailing of prejudices may also help. However, this may require a change in scientific mentality that might be difficult to achieve. In some research designs, efforts may also be more successful with upfront registration of studies, e.g., randomized trials [35]. Registration would pose a challenge for hypothesis-generating research. Some kind of registration or networking of data collections or investigators within fields may be more feasible than registration of each and every hypothesis-generating experiment. Regardless, even if we do not see a great deal of progress with registration of studies in other fields, the principles of developing and adhering to a protocol could be more widely borrowed from randomized controlled trials.

Finally, instead of chasing statistical significance, we should improve our understanding of the range of R values—the pre-study odds—where research efforts operate [10]. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained. As described above, whenever ethically acceptable, large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test [36].

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections [37], usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs. The wider field may yield some guidance for estimating this probability for the isolated research project. Experiences from biases detected in other neighboring fields would also be useful to draw upon. Even though these assumptions would be considerably subjective, they would still be very useful in interpreting research claims and putting them in context.

Box 1. An Example: Science at Low Pre-Study Odds

Let us assume that a team of investigators performs a whole genome association study to test whether any of 100,000 gene polymorphisms are associated with susceptibility to schizophrenia. Based on what we know about the extent of heritability of the disease, it is reasonable to expect that probably around ten gene polymorphisms among those tested would be truly associated with schizophrenia, with relatively similar odds ratios around 1.3 for the ten or so polymorphisms and with a fairly similar power to identify any of them. Then R = 10/100,000 = 10⁻⁴, and the pre-study probability for any polymorphism to be associated with schizophrenia is also R/(R + 1) = 10⁻⁴. Let us also suppose that the study has 60% power to find an association with an odds ratio of 1.3 at α = 0.05. Then it can be estimated that if a statistically significant association is found with the p-value barely crossing the 0.05 threshold, the post-study probability that this is true increases about 12-fold compared with the pre-study probability, but it is still only 12 × 10⁻⁴.
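
The arithmetic can be checked directly; here is a small sketch (mine) using the basic PPV formula from earlier in the paper, PPV = (1 − β)R/(R + α − βR).

    # Checking the Box 1 arithmetic in the absence of bias.
    R = 10 / 100_000                # ten truly associated polymorphisms among 100,000 tested
    alpha, power = 0.05, 0.60
    beta = 1 - power

    prior = R / (R + 1)                         # pre-study probability, about 1e-4
    ppv = power * R / (R + alpha - beta * R)    # post-study probability
    print(prior, ppv, ppv / prior)              # ~1e-4, ~12e-4, about a 12-fold increase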

Now let us suppose that the investigators manipulate their design, analyses, and reporting so as to make more relationships cross the p = 0.05 threshold even though this would not have been crossed with a perfectly adhered to design and analysis and with perfect comprehensive reporting of the results, strictly according to the original study plan. Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available “data mining” packages actually are proud of their ability to yield statistically significant results through data dredging. In the presence of bias with u = 0.10, the post-study probability that a research finding is true is only 4.4 × 10⁻⁴. Furthermore, even in the absence of any bias, when ten independent research teams perform similar experiments around the world, if one of them finds a formally statistically significant association, the probability that the research finding is true is only 1.5 × 10⁻⁴, hardly any higher than the probability we had before any of this extensive research was undertaken!
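
The u = 0.10 figure can likewise be checked with the bias-adjusted formula (again a sketch of mine, not code from the paper); the ten-team figure uses the multiple-study version of the formula and is not reproduced here.

    # Checking the Box 1 arithmetic with bias u = 0.10.
    R, alpha, power, u = 1e-4, 0.05, 0.60, 0.10
    beta = 1 - power

    ppv_biased = (power * R + u * beta * R) / (R + alpha - beta * R + u - u * alpha + u * beta * R)
    print(round(ppv_biased, 6))   # ~0.00044, i.e., about 4.4 × 10⁻⁴ as stated above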

1. Ioannidis JP, Haidich AB, Lau J (2001) Any casualties in the clash of randomised and observational evidence? BMJ 322:879–880.
2. Lawlor DA, Davey Smith G, Kundu D, Bruckdorfer KR, Ebrahim S (2004) Those confounded vitamins: What can we learn from the differences between observational versus randomised trial evidence? Lancet 363:1724–1727.
3. Vandenbroucke JP (2004) When are observational studies as credible as randomised trials? Lancet 363:1728–1731.
4. Michiels S, Koscielny S, Hill C (2005) Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 365:488–492.
5. Ioannidis JPA, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG (2001) Replication validity of genetic association studies. Nat Genet 29:306–309.
6. Colhoun HM, McKeigue PM, Davey Smith G (2003) Problems of reporting genetic associations with complex outcomes. Lancet 361:865–872.
7. Ioannidis JP (2003) Genetic associations: False or true? Trends Mol Med 9:135–138.
8. Ioannidis JPA (2005) Microarrays and molecular research: Noise discovery? Lancet 365:454–455.
9. Sterne JA, Davey Smith G (2001) Sifting the evidence—What’s wrong with significance tests. BMJ 322:226–231.
10. Wacholder S, Chanock S, Garcia-Closas M, El ghormli L, Rothman N (2004) Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. J Natl Cancer Inst 96:434–442.
11. Risch NJ (2000) Searching for genetic determinants in the new millennium. Nature 405:847–856.
12. Kelsey JL, Whittemore AS, Evans AS, Thompson WD (1996) Methods in observational epidemiology, 2nd ed. New York: Oxford U Press. 432 p.
13. Topol EJ (2004) Failing the public health—Rofecoxib, Merck, and the FDA. N Engl J Med 351:1707–1709.
14. Yusuf S, Collins R, Peto R (1984) Why do we need some large, simple randomized trials? Stat Med 3:409–422.
15. Altman DG, Royston P (2000) What do we mean by validating a prognostic model? Stat Med 19:453–473.
16. Taubes G (1995) Epidemiology faces its limits. Science 269:164–169.
17. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, et al. (1999) Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531–537.
18. Moher D, Schulz KF, Altman DG (2001) The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomised trials. Lancet 357:1191–1194.
19. Ioannidis JP, Evans SJ, Gotzsche PC, O’Neill RT, Altman DG, et al. (2004) Better reporting of harms in randomized trials: An extension of the CONSORT statement. Ann Intern Med 141:781–788.
20. International Conference on Harmonisation E9 Expert Working Group (1999) ICH Harmonised Tripartite Guideline. Statistical principles for clinical trials. Stat Med 18:1905–1942.
21. Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, et al. (1999) Improving the quality of reports of meta-analyses of randomised controlled trials: The QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354:1896–1900.
22. Stroup DF, Berlin JA, Morton SC, Olkin I, Williamson GD, et al. (2000) Meta-analysis of observational studies in epidemiology: A proposal for reporting. Meta-analysis of Observational Studies in Epidemiology (MOOSE) group. JAMA 283:2008–2012.
23. Marshall M, Lockwood A, Bradley C, Adams C, Joy C, et al. (2000) Unpublished rating scales: A major source of bias in randomised controlled trials of treatments for schizophrenia. Br J Psychiatry 176:249–252.
24. Altman DG, Goodman SN (1994) Transfer of technology from statistical journals to the biomedical literature. Past trends and future predictions. JAMA 272:129–132.
25. Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG (2004) Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291:2457–2465.
26. Krimsky S, Rothenberg LS, Stott P, Kyle G (1998) Scientific journals and their authors’ financial interests: A pilot study. Psychother Psychosom 67:194–201.
27. Papanikolaou GN, Baltogianni MS, Contopoulos-Ioannidis DG, Haidich AB, Giannakakis IA, et al. (2001) Reporting of conflicts of interest in guidelines of preventive and therapeutic interventions. BMC Med Res Methodol 1:3.
28. Antman EM, Lau J, Kupelnick B, Mosteller F, Chalmers TC (1992) A comparison of results of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 268:240–248.
29. Ioannidis JP, Trikalinos TA (2005) Early extreme contradictory estimates may appear in published research: The Proteus phenomenon in molecular genetics research and randomized trials. J Clin Epidemiol 58:543–549.
30. Ntzani EE, Ioannidis JP (2003) Predictive ability of DNA microarrays for cancer outcomes and correlates: An empirical assessment. Lancet 362:1439–1444.
31. Ransohoff DF (2004) Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 4:309–314.
32. Lindley DV (1957) A statistical paradox. Biometrika 44:187–192.
33. Bartlett MS (1957) A comment on D.V. Lindley’s statistical paradox. Biometrika 44:533–534.
34. Senn SJ (2001) Two cheers for P-values. J Epidemiol Biostat 6:193–204.
35. De Angelis C, Drazen JM, Frizelle FA, Haug C, Hoey J, et al. (2004) Clinical trial registration: A statement from the International Committee of Medical Journal Editors. N Engl J Med 351:1250–1251.
36. Ioannidis JPA (2005) Contradicted and initially stronger effects in highly cited clinical research. JAMA 294:218–228.
37. Hsueh HM, Chen JJ, Kodell RL (2003) Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. J Biopharm Stat 13:675–689.

2 Responses to “Sloppy science everywhere”

  1. MartinDKessler Says:

    Hi TS, The statement in your posting: “Studies have prominently reported such sex differences for hypertension, schizophrenia and multiple sclerosis, as well as lung cancer and heart attacks.” warrants comment. Three of the diseases are, with reasonable certainty, objectively diagnosable: hypertension, multiple sclerosis, and lung cancer. Schizophrenia is not. Psychiatry would like it to be a real brain disease – they declare it is – after all, it’s their “Sacred Symbol”.
    AIDS has an indirect objective method – one that has never been validated – and that’s scary too. Is AIDS really an immune problem, or was it just called one because it sounded good or looked like one? As we get more complex and sophisticated in our technical scientific/medical establishment, the average person is led to believe something really scientific is going on, when it is not. In AIDS (as in Psychiatry) inconvenient internal criticism is stifled – and anyone who tries to pull back the curtain (like Duesberg or Culshaw, and many others) gets no official publicity.

  2. Nick Naylor Says:

    Thanks TS, for highlighting this issue. Perhaps a slightly different take?

    Gore Vidal once sarcastically referred to “living in the glorious eternal American present” in commenting on the peculiar presupposition of leading media pundits that history is safe to ignore when assessing a societal problem.

    Given the scientific solutions to all problems promised 50 years ago, how did we wind up in this predicament, bouncing from crisis to crisis, not sure which ones are real or fake? Can a slight dip into history tell us anything about the HIV fiasco? After all, weren’t all diseases supposed to be cured by now thanks to the sequencing of the human genome? Talk about fallible predicting!

    The key, I think, isn’t “errors”, for they will be made; it’s more about the commodification of science, which, of course, results – as mass production is wont to do – in imitations and not the genuine article. For practitioners, it becomes an irrelevancy to assess the likelihood of “success”, since it is assured anyway once one is admitted to the club.

    Let’s consider post World War II “knowledge monopolists” sitting at the institutional levers of power, who prefer a compartmentalized “team approach” whereby experimentalists are divorced from the roots of that which they are investigating. Lives must be saved now, “no time” for idle contemplation, we are handing you this problem ready-made for your experimental solution. When the first polio vaccine resulted in an epidemic of polio, a warning was provided to society on what this new “hurry up and produce” science had in store. Was there “a remarkable level of data mismanagement, poor design and misleading conclusions” in the actual cell culture experiments that isolated the poliovirus? I don’t know. Here I prefer Feynman’s “served up ready made” wisecrack that criticized lack of fundamental understanding in physics as more important than errors per se. In biomedical research based on cell lines, post-1950 generations know of nothing else: this particular methodology is “biology”, an inherent restriction that I believe results in myopic puzzle solving. Working pathologists might say cell-line experimentalists are – notwithstanding credentials – out of their depth. They never know when they’re just chasing their own tails, even while publishing quite impressive papers.

    So, “science” should include a scientist analyzing the history of a problem. Cell-lines begat virus isolation, which begat HIV – a “myopic discovery” par excellence, given a total and complete failure to find this putative HIV in a single AIDS patient. But evidence of toxic exposures can be found aplenty. And cell-lines have NOT begat the promised cancer cures.

    The “error-problem” is there, as TS eloquently points out, but what if it is mostly a consequence of the manner in which work is organized in our increasingly dysfunctional (because they are) authoritarian places of employment? (Not to mention toxic as well.) Essentially the strategy is mindless production, more and more output, faster and faster lest we fall behind our competition. Cell-lines, of course, are well suited to this mind set.

    Thus, there’s no time to review the history of a line of enquiry to determine how likely the next step will be successful. And no time for a “science journalist” to actually know something about the field being reported on.

    “We scientists are doing our part, knowing that it’s most important to have an infinity of journal articles, with reams of inscrutable data that few users will ever have time to look at.”

    “We practicing physicians, in more privileged positions, can afford to look at it differently. So what if it’s brilliant research or a lot of junk. We’ll leave the journals stacked around in various places to remind visitors that we’re not out of touch with current wonder cures. But with colleagues, we’re honest with each other. Just cut to the chase, this drug for this virus? The research backs it up over here? Good. Now it’s time for golf.”
