Wednesday, June 17, 2015

The struggle to reproduce scientific results, and why scientists (and everyone else) should be happy about it.


Once again there has been a spate of articles in the "popular" and scientific press about issues of reproducibility of scientific studies, the reasons behind rising retraction rates, and incidences of fraud or increased carelessness among scientists. It has been repeatedly pointed out that scientists are just people, and thus subject to all of the same human failings (ego, unwillingness to accept that they are wrong, etc.), and that we should accept that science is flawed. This has also led some to ask whether the enterprise and practice of science is broken (also see this). However, I think that if you look carefully at what is being described, there are many reasons to suggest that the scientific endeavor is not only not broken, but is showing an increasing formalization of all of the self-correcting mechanisms normally associated with science. In other words, reasons to be optimistic!

Yes, scientists are (like all of us) individually biased. And as many authors have pointed out, in the current environment of scarce resources and the desire (and arguably the need, in order to secure those resources) for prestige, some scientists have cut corners, failed to do important controls (or full experiments) that they knew they should, or committed outright fraud. However, what I think about most these days is how quickly these issues are discovered (or uncovered) and described, both online and (albeit more slowly) in the formal scientific literature. This is a good thing.

Before I get into that, a few points. Articles like this in the New York Times may make it seem like retractions are reaching epidemic levels (for whichever of the possible reasons stated in the paragraph above). However, such a claim seems overly hyperbolic to me. Yes, there seem to be many retractions of papers, and thanks to sites like Retraction Watch, these can be identified far more easily. They have also pointed out that there seems to have been an increase in retraction rates during the past five years. I have not looked carefully at the numbers, and I have no particular reason to dispute them. Still, as I will discuss below, I am not worried by this; rather, it gives me optimism about the scientific process. Our ability (as a scientific community) to correct these mistakes is becoming quicker and more efficient.

First (and my apologies for having to state the obvious), the vast majority of scientists and scientific studies operate with deliberate care in experimental methodology. However, this does not mean mistakes do not happen: experiments and analyses may be incorrect, and interpretations (because of biases) may be skewed in individual studies. This is sometimes due to carelessness or a "rush to publish", but it may just as well be due to honest mistakes that would have happened anyway. As individuals we are imperfect. Scientists are as flawed as anyone else; it is the methodology and the enterprise as a whole (and the community) that are self-correcting.

Also (and I have not calculated the numbers), many of the studies reported on sites like Retraction Watch are actually corrections, where the study itself was not invalidated, but a particular issue with the paper was identified (which could be as simple as something being mislabeled). I should probably look up the ratio of retractions to corrections (and my guess is that someone has already done this).
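For anyone curious, once the notices are classified the ratio itself is trivial to compute. Here is a minimal sketch in Python, assuming a hypothetical CSV of notices; the file name ("notices.csv") and column name ("notice_type") are placeholders of my own invention, not a real Retraction Watch export format.

# Minimal sketch (hypothetical data): count retraction vs. correction notices
# and report their ratio. "notices.csv" and "notice_type" are placeholders,
# not a real Retraction Watch format.
import csv
from collections import Counter

counts = Counter()
with open("notices.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Classify each notice by its stated type, e.g. "retraction" or "correction".
        counts[row["notice_type"].strip().lower()] += 1

retractions = counts.get("retraction", 0)
corrections = counts.get("correction", 0)
print(f"{retractions} retractions, {corrections} corrections")
if corrections:
    print(f"retraction:correction ratio = {retractions / corrections:.2f}")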

One of the major issues brought up with respect to how science is failing is that the ability to replicate the results found in a previously published study can be low. As has been written about before (including on this blog), perfectly reproducing an experiment can be as difficult as getting the experiment to work in the first place (maybe harder). Even if the effect being measured is "true", subtle differences in experimental methodologies (that the researchers are unaware are different) can cause problems. Indeed, there are a number of instances where the experimental methodology used in the attempt to reproduce the original protocol was itself flawed (I have written about one such case here). While I could spend time quibbling about the methodology used to determine the numbers, there is no doubt that some fraction of published papers contain results from experiments that are not repeatable at all, or that are deeply confounded and therefore meaningless. I will say that most of the studies looking at this take a very broad view of "failure to replicate". However, I have no doubt that research into "replication" will increase, and this is a good thing. Indeed, I have no idea why studies like this would suggest that studies with "redundant" findings have "no enduring value".


So with all of these concerns, why am I optimistic? Mostly because I do not think that the actual rate of fraud or irreproducibility is increasing. Instead, I think that there has been an important change in how we read papers, detect and (most importantly) report problems with studies, and in the general process of post-publication peer review (PPPR). Sites like Retraction Watch, PubPeer, PubMed Commons, and F1000, as well as individual blogs (such as here and here), are enabling this. Not only are individual scientists working together to examine potential problems in individual studies, but these efforts often lead to important discussions about methodology and interpretation (often in the comments to individual posts). This does not mean that the people raising the comments/criticisms about potential problems are always correct themselves (they will have their own biases, of course), but potential issues can be raised, discussed by the community, and resolved. These discussions may lead to formal corrections or retractions in a few cases. Most of the time they lead to the recognition that the scope of impact of the study may ultimately be more limited than the authors of the original study suggested. It also (I hope) leads to new and interesting questions to be answered. Thus, the apparent increase in retraction rates and reproducibility issues most likely reflects an increased awareness of, and sensitivity to, these issues. This may be psychologically similar to crime rates: despite crime decreasing, increased scrutiny, vigilance, and reporting in our society make it seem like the opposite is happening.

I also want to note that despite comments I often see (on Twitter or in articles) that pre-publication peer review is failing completely (or is misguided), I think it remains a successful first round of filtering. Having served (currently and previously) as associate editor at a number of journals (and having reviewed papers for many, many more), I would estimate that 80% of the reviews I have seen from other reviewers (including for my own work) have been helpful, improving the science, the interpretation, and the clarity of communication. Indeed, as a current AE at both Evolution and The American Naturalist, the reviews for papers I handle are almost always highly professional, very constructive, and improve the final published papers. Usually this results from pointing out issues with the analysis, potential confounds in the experiments, and concerns with interpretation. Often one of the major issues relates to the last point: reviewers (and editors) can often reduce the over-interpretation and hyperbole that can be found in initial submissions. This does not mean I do not rage when I get highly critical reviews of my own work. Nor does it change my view that trying to evaluate and predict the value and impact of individual studies (and whether that should have any part in the review process and in selection for particular journals) remains deeply problematic. However, on balance, my experience is that pre-publication review does improve studies and remains an important part of the process. Further, I do not see (despite the many burdens on scientists' time, and the fact that reviewing remains unpaid work) evidence for a decline in the quality of reviews. Others (like here and here) have said similar things (and probably more eloquently).

As for the replication issues: these are clearly being taken seriously by many in the scientific community, including funding agencies, which are now setting aside resources specifically to address them. Moreover, many funding agencies are now not only requiring that papers they fund be made open access within 12 months of publication, but (along with a number of journals) are also requiring all raw data (and, I hope, soon the scripts for analyses) to be deposited in public repositories.


I think the scientific community still has a lot to do on all of these issues, and I would certainly like to see it happen faster (in particular with issues like publishing reviews alongside papers, and more oversight to make sure all raw data, full experimental methodology, and scripts are made available at the time of publication). However, in my opinion, it does seem like the concerns, the increased scrutiny, and the calls for increased openness are all a part of science and are being increasingly formalized. We are not waiting decades or centuries to correct our understanding based on poor assumptions (or incorrectly designed or analyzed experiments), but often just months or weeks. That is a reason to be optimistic.

note: Wow, it has been a long time since my last post. I can only say that I have moved from one university (Michigan State University) to another (McMaster University), and my family and I are in the process of moving as well (while it is only ~400 km, it is from the US to Canada... more on that in a future post). I will try to be much better after the move next week!