Wednesday, June 17, 2015

The struggle to reproduce scientific results and why (scientists and everyone else) should be happy about it.

Once again there has been a spate of articles in the "popular" and scientific press about issues of reproducibility of scientific studies, the reasons behind the increase in retraction rates, and incidences of fraud or increased carelessness among scientists. It has been repeatedly pointed out that scientists are just people, and thus subject to all of the same issues (ego, unwillingness to accept when they are wrong, etc.), and that we should accept that science is flawed. This has also led some to ask whether the enterprise and practice of science is broken (also see this). However, I think if you look carefully at what is being described, there are many reasons to suggest that the scientific endeavor is not only not broken, but is showing an increased formalization of all of the self-correcting mechanisms normally associated with science. In other words, reasons to be optimistic!

Yes, scientists are (like all of us) individually biased. And, as many authors have pointed out, in the current environment of scarce resources and the desire (and arguably the need, to secure those resources) for prestige, some scientists have cut corners, failed to do important controls (or full experiments) that they knew they should, or committed outright fraud. However, what I think about most these days is how quickly these issues are discovered (or uncovered) and described, both online and (albeit more slowly) in the formal scientific literature. This is a good thing.

Before I get into that, a few points. Articles like this one in the New York Times may make it seem like retractions are reaching epidemic levels (for whichever of the possible reasons stated in the paragraph above). However, such a claim seems overly hyperbolic to me. Yes, there seem to be many retractions of papers, and thanks to sites like Retraction Watch, these can be identified far more easily. They have also pointed out that there seems to have been an increase in retraction rates during the past five years. I have not looked carefully at the numbers, and I have no particular reason to dispute them. Still, as I will discuss below, I am not worried by this; rather, it brings me optimism about the scientific process. Our ability (as a scientific community) to correct these mistakes is becoming quicker and more efficient.

First (and my apologies for having to state the obvious), the vast majority of scientists conduct their studies with deliberate care and sound experimental methodology. However, this does not mean mistakes do not happen: experiments and analyses may be incorrect, and interpretations (because of biases) may be skewed in individual studies. This is sometimes due to carelessness or a "rush to publish", but it may just as well be due to honest mistakes that would have happened anyway. As individuals we are imperfect. Scientists are as flawed as any other people; it is the methodology and the enterprise as a whole (and the community) that is self-correcting.

Also (and I have not calculated the numbers), many of the studies reported on sites like Retraction Watch are actually corrections, where the study itself was not invalidated, but a particular issue with the paper was fixed (which could be as simple as something being mislabeled). I should probably look up the ratio of retractions to corrections (and my guess is that someone has already done this).

One of the major issues brought up with respect to how science is failing is that the ability to replicate the results found in a previously published study can be low. As has been written about before (including on this blog), perfectly reproducing an experiment can be as difficult as getting the experiment to work in the first place (maybe harder). Even if the effect being measured is "true", subtle differences in experimental methodologies (that the researchers are unaware are different) can cause problems. Indeed, there are a number of instances where the experimental methodology used in trying to reproduce the original protocol was itself flawed (I have written about one such case here). While I could spend time quibbling about the methodology used to determine the numbers, there is no doubt that some fraction of published papers report results that are not repeatable at all, or that are deeply confounded and meaningless. I will say that most of the studies looking at this take a very broad view of "failure to replicate". However, I have no doubt that research into "replication" will increase, and this is a good thing. Indeed, I have no idea why studies like this would suggest that studies with "redundant" findings have "no enduring value".

So with all of these concerns, why am I optimistic? Mostly because I do not think that the actual rate of fraud or irreproducibility is increasing. Instead, I think that there has been an important change in how we read papers, in how we detect and (most importantly) report problems with studies, and in the general process of post-publication peer review (PPPR). Sites like Retraction Watch, PubPeer, PubMed Commons, and F1000, as well as individual blogs (such as here and here), are enabling this. Not only are individual scientists working together to examine potential problems in individual studies, but these examinations often lead to important discussions around methodology and interpretation (often in the comments to individual posts). This does not mean that the people making the comments/criticisms about potential problems are always correct themselves (they have their own biases, of course), but potential issues can be raised, discussed by the community, and resolved. In a few cases this may lead to formal corrections or retractions. More often, it leads to the recognition that the scope of impact of the study may ultimately be more limited than the authors of the original study suggested. It also (I hope) leads to new and interesting questions to be answered. Thus, the apparent increase in retraction rates and reproducibility issues most likely reflects an increased awareness of, and sensitivity to, these issues. This may be psychologically similar to crime statistics: despite crime rates decreasing, increased scrutiny, vigilance, and reporting in our society make it seem like the opposite is happening.

I also want to note that despite comments I often see (on Twitter or in articles) that pre-publication peer review is failing completely (or is misguided), I think it remains a successful first round of filtering. From serving (currently and previously) as associate editor at a number of journals (and having reviewed papers for many, many more), I would estimate that 80% of the reviews I have seen from other reviewers (including for my own work) have been helpful and have improved the science, the interpretation, and the clarity of communication. Indeed, as a current AE at both Evolution and The American Naturalist, the reviews for papers I handle are almost always highly professional, very constructive, and improve the final published papers. Usually this results from pointing out issues of analysis, potential confounding issues in the experiments, and concerns with interpretation. Often one of the major issues relates to that last point: reviewers (and editors) can often reduce the over-interpretation and hyperbole found in initial submissions. This does not mean I do not rage when I get highly critical reviews of my own work. Nor does it change my view that trying to evaluate and predict the value and impact of individual studies (and whether that should have any part in the review process and in selection for particular journals) remains deeply problematic. However, on balance, my experience is that peer review does improve studies and remains an important part of the process. Further, I do not see (despite the many burdens on scientists' time, and the fact that reviewing remains unpaid work) evidence for a decline in the quality of reviews. Others (like here and here) have said similar things (and probably more eloquently).

As for the replication issues: these are clearly being taken seriously by many in the scientific community, including funding agencies, which are now setting aside resources specifically to address them. Moreover, many funding agencies are not only requiring that papers they fund be made open access within 12 months of publication, but (along with a number of journals) are also requiring all raw data (and, I hope soon, scripts for analyses) to be deposited in public repositories.

I think the scientific community still has a lot to do on all of these issues, and I would certainly like to see it happen faster (in particular with issues like publishing reviews alongside papers, and more oversight into making sure all raw data, full experimental methodology, and scripts are made available at the time of publication). However, in my opinion, the concerns, the increased scrutiny, and the calls for increased openness are all a part of science and are being increasingly formalized. We are not waiting decades or hundreds of years to correct our understanding based on poor assumptions (or incorrectly designed or analyzed experiments), but often just months or weeks. That is a reason to be optimistic.

note: Wow, it has been a long time since my last post. I can just say I have moved from one University (Michigan State University) to another (McMaster University) and my family and I are in the process of moving as well (while only ~400km, it is from the US to Canada...more on that in a future post). I will try to be much better after the move next week!


  1. With respect to some analysis of the rate of retractions, @jaimedash pointed me to the "back of the envelope" calculations discussed in this post by @EcoEvoGames

  2. Ivan Oransky @ivanoransky from Retraction Watch also pointed me to these papers. I had seen the PNAS paper (it was one of the papers I was thinking about with respect to "quibbling" about the numbers), but the one in PLoS Medicine was new to me. Thanks Ivan!

  3. An analysis (with data and code) of retraction rates

  4. Extrapolating the exponential fit to the PubMed retraction data leads to a 100% retraction rate at or around 2046.

    Obviously, it will not reach 100%, as several processes will intervene before that. But this steepness seems to indicate to me that there is now a sudden awareness that one can actually do something about flawed papers, though I'm not so sure about that.
    That being said, I'm not aware of any conclusive data on the matter. All we have is suggestive, which isn't really all that helpful.
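For concreteness, the kind of extrapolation being described (fit an exponential to yearly retraction rates, then solve for when the fitted curve reaches 100%) can be sketched as follows. The numbers below are hypothetical placeholders, not the actual PubMed data; only the method is illustrated.

```python
# Sketch of an exponential extrapolation of retraction rates.
# NOTE: the rates below are HYPOTHETICAL, made-up values, not PubMed data.
import math

years = [2000, 2003, 2006, 2009, 2012]
rates = [0.0001, 0.0002, 0.0004, 0.0008, 0.0016]  # fraction of papers retracted (made up)

# Ordinary least-squares fit of log(rate) = b * year + c.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(math.log(r) for r in rates) / n
b = sum((x - mean_x) * (math.log(r) - mean_y) for x, r in zip(years, rates)) \
    / sum((x - mean_x) ** 2 for x in years)
c = mean_y - b * mean_x

# Year at which the fitted exponential reaches a 100% retraction rate:
# solve exp(b * year + c) = 1, i.e. year = -c / b.
year_at_100 = -c / b
print(round(year_at_100))  # ~2040 with these made-up numbers
```

With rates that double every three years, the fit is exact and the "100% year" lands a few decades out, which is the shape of the argument being made above; the real data would of course give a different slope and crossing point.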

    1. I agree that the data are not all there yet, but I think that the analyses being done are helpful. I am certainly not saying that we should not be "shining a light" on these issues and identifying why there are so many retractions. As you have also commented on before, I think there are many issues remaining in the culture of science that are very poisonous to the enterprise of doing and communicating science. But I think that all of the incremental movement towards a true open science (in terms of reviewing, open data and analysis, etc.) is happening. Definitely too slowly, but still happening.

      I will happily admit that I may be living in an echo chamber on these issues, and I spend too much time talking to Titus Brown about it!

      Thanks again!

  5. Bjorn Brembs @brembs pointed me to his analysis summarized in this paper.
    and also discussed in his blog post

    I want to clarify, that I am not really arguing about rates of retraction, and my points about increased scrutiny (and communication about criticism of papers) was not meant to reflect journal "rank" or specific disciplines.

    Instead, I was trying to suggest that how we communicate our critical evaluations of papers is changing, mostly in a good way. Evaluation of papers (post review) was once only done and communicated in journal clubs, or discussed over a few glasses of fermented beverages at a conference. However, nothing (in terms of larger community-level discussion) came of this, and major problems remained in the literature. Only the most egregious issues were ever called out, and that often took many years. Often, when a paper was retracted or corrected, no one would notice.

    Now, PubPeer, F1000, PubMed Commons, and blogging allow for a community-based journal club of sorts. Indeed, when looking at the abstract for a paper on PubMed, you can easily see comments about the paper (and discussions) right there. That is fantastic progress; it makes science better and makes me optimistic that science is slowly heading in the right direction. Is it smooth and perfect? No, of course not. But more rapid self-correction (as a scientific community) is now possible.

    1. It would be good to get a survey of more senior people about that. I sort of doubt that fishy papers were only ever retracted in extreme cases, but in the absence of any data, I have to rely on anecdotes for this impression. I remember seeing retractions during my graduate studies (in the 1990s) of earlier papers, which had been withdrawn by the authors because they could not reproduce their own results. But I don't know how widespread this attitude was, that if something is not correct, it has to be fixed. I have this cliché that our elders were much more strict about all things moral than we are now, but even that may be wrong, as many clichés tend to be.

    2. One other thing: I sent the links before I became aware there were tweets in this thread that Twitter wasn't displaying at first. But shouldn't the insight from journal rank, namely that increased readership doesn't seem to have a big effect on retraction rate (it must have some effect, doesn't it?), indicate that increased online access merely increases the speed with which retractions happen, but not the rate?

    3. Bjoern,

      Thanks for your replies. I have to think through both of your arguments.
      From journal clubs back when I was a grad student (late 1990s, early 2000s), I remember a great deal of identifying fundamental flaws in papers, but after the meeting no one pursued it further. I am not sure whether this experience is generalizable at all.

      I think one could look directly at whether the rate or speed of retraction increased as a function of various online tools (from the papers themselves being online, to commenting tools, etc.). I will think about your argument more, though.

  6. So apparently I have been living under a rock for the past two months, as many others have pointed me to posts and articles with a point of view similar to the one I expressed here (the increased rate is not due to more fraud or mistakes, but to increased vigilance and increased communication of potential problems). Whether this viewpoint (which I still agree with) is ultimately borne out by the data, I hope we will know soon. In any case, here are some of the links, courtesy of Lenny Teytelman (@lteytelman):

    Another great one by Stephen Heard
    (there is also a collection of additional posts linked to here).

    and this collection of tweet based discussions (and links) from @edyong209

    Sorry for missing these!

  7. To expand on my Twitter comment: yes, I do mean skew the review. Not in the sense that the content of the post-pub review could be swayed by the social capital of the reviewers, but in the sense that scientists outside the specific field(s) to which the reviewed paper belongs may come away with a different understanding of the science. This relates to your comment on "not waiting decades... to correct our understanding". I think that while those in the field can reach a relatively quick resolution in their understanding, those outside will be waiting for the consensus to emerge through the traditionally published literature. "Outside" scientists are neither well connected enough nor do they have the time to collect and interpret the post-pub reviews (grey literature) to benefit from them. If anything, they will be improperly influenced by the social capital of the post-pub reviewers.

    In my perception, the tricky thing about scientific knowledge is the ever-evolving body of work which each scientist must navigate with their own wits and by identifying individual or group authorities. In the distributed world of post-pub review (as it stands at this moment), I think that social capital can modulate the way in which authority is accepted. (I wish I could cite some study here, but I'm just going on a personal judgement about people.) The most direct mechanism I can imagine for this modulation would be the number of individuals who read one's content and trust its judgement.

    I am not speaking against the use of post-pub review. In fact, as a graduate student, I am quite excited by the prospect of sifting through diverse and multidisciplinary scientific criticism of my future work (hopefully delivered in a manner that does not preclude future work but encourages new avenues of investigation). I see amazing examples of how post-pub review can clarify and impact the rigorous progression of science. I want to try to tie in the role of social capital in the enterprise of reviewing scientific literature, and to highlight that challenges remain in flattening the social landscape in the post-pub world. But yes... those are my expanded thoughts. Hope it clarifies things!

    1. Danny,

      I think I have a better understanding of your concern. However, would the "optimization" of evaluation that I was broadly speaking about not also improve the opportunity for public discourse (with the "lay" public as well as specialists)? While I agree that specialists may communicate in a way that is difficult to follow, it still may allow for quicker assessment of the study in question.

      Perhaps more importantly, the very act of having the discussion in the open may give more people in the public (including new people to the field) a deeper sense of how science is being done in this field, and some of the assumptions of both the specific field and the paper itself. This will potentially help as it will (hopefully) deepen the appreciation that science is not just a collection of facts, but that facts are interpreted both with respect to how they were collected and analyzed (i.e. experimental design) and from within the context of specific fields.

  8. Yes I agree (and am excited) that the conversations generated online by the "optimization" of evaluation (post pub review) provide opportunities for public discourse by all who have access to them. I appreciate the time taken to further explain the benefits of online discussions. As I am digesting these ideas I am coming to view online post pub reviews as (as you put it in an above comment) an online journal club. Furthermore as science is an ongoing discussion and interpretation of data, these online discussions also reflect the dynamic aspect of science.

    I still think that the social landscape is not adequately flattened (not that there are no avenues of discourse that exemplify the ideal). Also, many fields and their knowledgeable members are not represented. But I believe this is the infancy of these types of communications, and scientific culture will change to support more ways of disseminating and discussing findings. I think there is a great opportunity to extend the reach of, and access to, scientific discourse previously isolated in bars, conferences, and meeting rooms. Given the youth of this type of scientific discourse (in my mind I'm calling it public scientific correspondence), I think those who engage should be acutely aware of how authority is employed and how arguments are phrased. Perhaps organizations will set codes of best practice (or perhaps there has already been a blog discussion about this). Maybe graduate mentors will begin to incorporate principles of online engagement into their mentoring relationships.

    Anyways... there is a lot of food for thought in your post and comments, especially now that this post has been linked to all those previous blog discussions! I'm sure it's a sign that people are beginning to experiment with new ways of reaching peers and the public!