A blog about genes, and the crazy things they do to little critters (and I consider people a type of critter) in different environments and along with other genes.
Wednesday, June 17, 2015
The struggle to reproduce scientific results, and why scientists (and everyone else) should be happy about it.
Once again there has been a spate of articles in the "popular" and scientific press about issues of reproducibility of scientific studies, the reasons behind the increase in retraction rates, and incidences of fraud or increased carelessness among scientists. It has been repeatedly pointed out that scientists are just people, and thus subject to all of the same issues (ego, unwillingness to accept that they are incorrect, etc.), and that we should accept that science is flawed. This has also led some to ask whether the enterprise and practice of science is broken (also see this). However, I think that if you look carefully at what is being described, there are many reasons to suggest that the scientific endeavor is not only not broken, but is showing an increasing formalization of all of the self-correcting mechanisms normally associated with science. In other words, reasons to be optimistic!
Yes, scientists (like all of us) are individually biased. And, as many authors have pointed out, in the current environment of scarce resources and the desire (and arguably the need, in order to secure those resources) for prestige, some scientists have cut corners, failed to do important controls (or full experiments) that they knew they should have done, or committed outright fraud. However, what I think about most these days is how quickly these issues are discovered (or uncovered) and described, both online and (albeit more slowly) in the formal scientific literature. This is a good thing.
Before I get into that, a few points. Articles like this in the New York Times may make it seem like retractions are reaching epidemic levels (for whichever of the possible reasons stated in the paragraph above). However, such a claim seems overly hyperbolic to me. Yes, there seem to be many retractions of papers, and thanks to sites like Retraction Watch, these can be identified far more easily. They have also pointed out that there seems to have been an increase in retraction rates over the past five years. I have not looked carefully at the numbers, and I have no particular reason to dispute them. Still, as I will discuss below, this does not worry me; it makes me optimistic about the scientific process. Our ability (as a scientific community) to correct these mistakes is becoming quicker and more efficient.
First (and my apologies for having to state the obvious), the vast majority of scientists and scientific studies take deliberate care with experimental methodology. However, this does not mean mistakes do not happen: experiments and analyses may be incorrect, and interpretations (because of biases) may be skewed in individual studies. This is sometimes due to carelessness or a "rush to publish", but it may also be due to honest mistakes that would have happened anyway. As individuals we are imperfect. Scientists are as flawed as any other person; it is the methodology and the enterprise as a whole (and the community) that is self-correcting.
Also (and I have not calculated the numbers), many of the studies reported on sites like Retraction Watch are actually corrections, where the study itself was not invalidated, but a particular issue with the paper was addressed (which could be as simple as something being mislabeled). I should probably look up the ratio of retractions to corrections (and my guess is that someone has already done this).
One of the major issues brought up with respect to how science is failing is that the ability to replicate the results found in a previously published study can be low. As has been written before about this issue (including on this blog), perfectly reproducing an experiment can be as difficult as getting the experiment to work in the first place (maybe harder). Even if the effect being measured is "true", subtle differences in experimental methodologies (that the researchers are unaware of) can cause problems. Indeed, there are at least a number of instances where the experimental methodology used to try to reproduce the original protocol was itself flawed (I have written about one such case here). While I could spend time quibbling about the methodology used to determine the numbers, there is no doubt that some fraction of published papers report results that are not repeatable at all, or that are deeply confounded and meaningless. I will say that most of the studies looking at this take a very broad view of "failure to replicate". However, I have no doubt that research into "replication" will increase, and this is a good thing. Indeed, I have no idea why studies like this would suggest that studies with "redundant" findings would have "no enduring value".
So with all of these concerns, why am I optimistic? Mostly because I do not think that the actual rate of fraud or irreproducibility is increasing. Instead, I think that there has been an important change in how we read papers, detect and (most importantly) report problems with studies, and engage in the general process of post-publication peer review (PPPR). Sites like Retraction Watch, PubPeer, PubMed Commons, and F1000, as well as individual blogs (such as here and here), are enabling this. Not only are individual scientists working together to examine potential problems in individual studies, but these efforts often lead to important discussions around methodology and interpretation (often in the comments to individual posts). This does not mean that the people making the comments/criticisms about potential problems are always correct themselves (they have their own biases, of course), but potential issues can be raised, discussed by the community, and resolved. In a few cases these discussions may lead to formal corrections or retractions. Most of the time they lead to the recognition that the scope of impact of the study may ultimately be more limited than the authors of the original study suggested. It also (I hope) leads to new and interesting questions to be answered. Thus, the apparent increase in retraction rates and reproducibility issues most likely reflects an increased awareness of, and sensitivity to, these issues. This may be psychologically similar to how, despite crime rates decreasing, increased scrutiny, vigilance, and reporting in our society make it seem like the opposite is happening.
I also want to note that, despite comments I often see (on Twitter or in articles) that pre-publication peer review is failing completely (or is misguided), I think that it remains a successful first round of filtering. Having served (and continuing to serve) as an associate editor for a number of journals (and having reviewed papers for many, many more), I would estimate that 80% of the reviews I have seen from other reviewers (including for my own work) have been helpful, improving the science, the interpretation, and the clarity of communication. Indeed, as a current AE at both Evolution and The American Naturalist, the reviews for papers I handle are almost always highly professional, very constructive, and improve the final published papers. Usually this results from pointing out issues with the analysis, potential confounding issues in the experiments, and concerns with interpretation. Often one of the major issues relates to the last point: reviewers (and editors) can often reduce the over-interpretation and hyperbole found in initial submissions. This does not mean I do not rage when I get highly critical reviews of my own work. Nor does it mean that trying to evaluate and predict the value and impact of individual studies (and whether that should have any part in the review process and in selection for particular journals) is not deeply problematic; it remains so. However, on balance, my experience is that peer review does improve studies and remains an important part of the process. Further, I do not see evidence for a decline in the quality of reviews (despite the many burdens on scientists' time, and the fact that reviewing remains unpaid work). Others (like here and here) have said similar things (and probably more eloquently).
As for the replication issues: these are clearly being taken seriously by many in the scientific community, including funding agencies, which are now setting aside resources specifically to address them. Moreover, many funding agencies are not only requiring that the papers they fund be made open access within 12 months of publication, but (along with a number of journals) are requiring all raw data (and, I hope, soon the scripts for analyses) to be deposited in public repositories.
I think the scientific community still has a lot to do on all of these issues, and I would certainly like to see it happen faster (in particular on issues like publishing reviews alongside papers, and more oversight to make sure all raw data, full experimental methodology, and scripts are made available at the time of publication). However, in my opinion, the concerns, increased scrutiny, and calls for increased openness are all part of science and are being increasingly formalized. We are not waiting decades or centuries to correct our understanding based on poor assumptions (or incorrectly designed or analyzed experiments), but often just months or weeks. That is a reason to be optimistic.
note: Wow, it has been a long time since my last post. All I can say is that I have moved from one university (Michigan State University) to another (McMaster University), and my family and I are in the process of moving as well (while only ~400 km, it is from the US to Canada... more on that in a future post). I will try to be much better after the move next week!
Tuesday, October 22, 2013
How easy should it be to replicate scientific experiments?
The Economist just published a pair of articles broadly about the state of affairs in scientific research (and from their perspective, everything is in a tailspin): "How Science Goes Wrong" and "Trouble at the lab". Both articles are worth reading, although few will find themselves in agreement with all of their conclusions. Neither article takes very long to read, so I will not try to sum up all of the arguments here. For two very different perspectives on these articles, check out Jerry Coyne's blog post, which largely agrees with the statements they make. For an alternative perspective on why these articles missed the mark almost entirely, see the post by Chris Waters, my colleague here at Michigan State University. Chris points out that most studies do not represent a single experiment examining a particular hypothesis, but several independent lines of evidence pointing in a similar direction (or at least excluding other possibilities).
However, instead of going through all of the various arguments that have been made, I want to point out some (I think) overlooked issues about the replication of scientific experiments: principally, that it can be hard, and that even under extremely similar circumstances stochastic effects (sampling) may alter the results, at least somewhat.
Let's start by assuming that the original results are "valid", at least in the sense that there was no malfeasance (no results were faked), the experiments were done reasonably well (i.e. those performing the experiments did them carefully with appropriate controls), and the results were not subject to "spin", with no crucial data left out of the paper (that might negate the results of the experiments). In other words, ideally what we hope to see from scientists.
Now, I try to replicate the experiments. Maybe I believe strongly in the old adage "trust but verify" (in other words, be a skeptical midwesterner). Perhaps the experimental methods or results seem like a crucial starting point for a new line of research (or an alternative approach to answering questions that I am interested in).
So, I diligently read the methods of the paper summarizing the experiment (over and over and over again), get all of the components I need for the experiment, follow it as closely as possible, and.... I find I cannot replicate the results. What happened? Instead of immediately assuming the worst of the authors of the manuscript, perhaps consider some of the following as well.
1- The description of methodological detail in the initial study is incomplete (this has been, and remains, a common issue). The replication is then based on faulty assumptions introduced into the experiment because of missing information in the paper. Frankly, this is the norm in the scientific literature, and it is hardly a new thing. Whether I read papers from the 1940s, the 1970s, or the present, I generally find the materials and methods section lacking from the perspective of replication. While this should be an easy fix in this day and age (extended materials and methods included as supplementary material, or with the data itself when it is archived), it rarely is.
What should you do? Contact the authors! Get them on the phone. Email is often a good start, but a phone or Skype call can be incredibly useful for getting all of the details out of those who did the experiment. Many researchers will also invite you to come spend time in their lab to try out the experiment under their conditions, which can really help. It also (in my mind) suggests that they are trying to be completely above board and feel confident about their experimental methods, and likely their results as well. If they are not willing to communicate with you about their experimental methods (or to share data, or explain how they performed their analysis), then you probably have good reason to be skeptical about how they have done their work.
2- Death by a thousand cuts. One important issue (related to the above) is that it is almost impossible to perfectly replicate an experiment, ingredient for ingredient (what we call reagents). Maybe the authors used a particular enzyme. So you go ahead and order that enzyme, but it turns out to be from a different batch, and the company has changed the preservative used in the solution. Now, all of a sudden, the results stop working. Or maybe the enzyme itself is slightly different (in particular if you order it from a different company).
If you are using a model organism like a fruit fly, maybe the control (wild type) strain you have used is slightly different from the one in the original study. Indeed, in the post by Jerry Coyne mentioned above, he discusses three situations where he attempted to replicate other findings and failed to do so. However, in at least two of the cases I know about, it turned out that there were substantial differences in the wild type strains of flies used. Interesting arguments ensued; for a brief summary, check out Box 2 in this paper. I highly recommend reading the attempts at replication by Jerry Coyne and colleagues, and the responses (and additional experiments) by the authors of the original papers (in particular for the role of the tan gene in fruit fly pigmentation).
Assuming that the original results are valid but you cannot replicate them, does that invalidate the totality of the results? Not necessarily. However, it may well make the results far less generalizable, which is important to know and is an important part of the scientific process.
3- Sampling effects. Even if you follow the experimental protocol as closely as possible, with all of the same ingredients and strains of organisms (or cell types, or whatever you might be using), you may still find somewhat different results. Why? Stochasticity. Most scientists take at least some rudimentary courses in statistics, and one of the first topics they learn about is sampling. If you have a relatively small number of independent samples (a few fruit flies in your experimental group, compared to a small number in the control group), there is likely to be a lot of stochasticity in your results because of sampling. Thankfully, we have tools to quantify the uncertainty associated with this (in particular, standard errors and confidence intervals). However, many studies treat large quantitative differences as if they were essentially discrete ("compound A turns transcription of gene X off...."). Even if the effects are large, repeating the experiment may yield somewhat different results (a different estimate, even if the confidence intervals overlap).
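To make this concrete, here is a toy simulation (my own sketch, with entirely made-up numbers, not data from any study discussed here): a hypothetical compound has a fixed "true" effect on the expression of a gene, yet each small experiment returns a noticeably different estimate, simply because of sampling.

```python
# Toy illustration of sampling stochasticity (hypothetical values, not real data).
# The "true" effect of a made-up compound on gene expression never changes,
# but each small-sample experiment estimates it differently.
import numpy as np

rng = np.random.default_rng(1)

true_control_mean = 10.0   # hypothetical mean expression in controls
true_effect = -2.0         # hypothetical true reduction caused by the compound
sd = 3.0                   # biological + measurement noise
n = 8                      # individuals per group (small, as in many experiments)

for experiment in range(1, 6):
    control = rng.normal(true_control_mean, sd, n)
    treated = rng.normal(true_control_mean + true_effect, sd, n)
    diff = treated.mean() - control.mean()
    # standard error of the difference between two independent means
    se = np.sqrt(control.var(ddof=1) / n + treated.var(ddof=1) / n)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se   # approximate 95% CI
    print(f"run {experiment}: estimate = {diff:5.2f}, 95% CI = ({lo:5.2f}, {hi:5.2f})")
```

Run it with different seeds and the estimates wander around the assumed true value of -2, with the occasional run looking unimpressive despite a real and fairly large effect. That is sampling stochasticity, with no sloppiness or fraud anywhere in sight.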
If the way you assess "replication" is something like "compound A significantly reduced expression of gene X in the first experiment; does it also significantly reduce expression upon replication?", then you may be doomed to frequently failing to replicate results. Indeed, statistical significance (based on p-values, etc.) is a very poor tool for judging replication. Instead, you can ask whether the effect is in the same direction, and whether the confidence intervals around the initial estimate and the new estimate from the replication overlap.
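As a sketch of that kind of comparison (again my own illustration, with invented measurements rather than data from any real study), you can compute the effect estimate and an approximate 95% confidence interval for the original experiment and for the replication, then ask whether the effects point in the same direction and whether the intervals overlap:

```python
# Compare an "original" estimate with a "replication" by direction of effect
# and overlap of approximate 95% confidence intervals (all values invented).
import numpy as np

def estimate_with_ci(control, treated):
    """Difference in means (treated - control) with an approximate 95% CI."""
    control, treated = np.asarray(control, float), np.asarray(treated, float)
    diff = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / len(control) +
                 treated.var(ddof=1) / len(treated))
    return diff, diff - 1.96 * se, diff + 1.96 * se

# made-up expression measurements for the original study and the replication
orig = estimate_with_ci(control=[9.8, 11.2, 10.4, 9.1, 10.9, 10.2],
                        treated=[8.1, 7.9, 9.0, 8.6, 7.2, 8.8])
repl = estimate_with_ci(control=[10.5, 9.9, 10.1, 11.0],
                        treated=[9.4, 10.6, 8.7, 10.9])

same_direction = np.sign(orig[0]) == np.sign(repl[0])
cis_overlap = (orig[1] <= repl[2]) and (repl[1] <= orig[2])
print(f"original:    {orig[0]:6.2f}  CI ({orig[1]:6.2f}, {orig[2]:6.2f})")
print(f"replication: {repl[0]:6.2f}  CI ({repl[1]:6.2f}, {repl[2]:6.2f})")
print(f"same direction? {same_direction}   CIs overlap? {cis_overlap}")
```

In this invented example the replication's estimate is smaller and its interval spans zero, so it would fail a naive "significant both times" test; yet the effect is in the same direction and the intervals overlap with the original, which is a far more informative way to think about whether the result held up.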
Ask the authors of the original study for their data (if it is not already available on a data repository), so you can compute the appropriate estimates, and compare them to yours. How large was their sample size? How about yours? Can that explain the differences?
4- Finally, make sure you have done a careful job of replicating the initial experiment itself. I have seen a number of instances where it was not the initial results, but the replication itself, that was suspect.
Are there problems with replication in scientific studies? Yes. Are some of these due to the types of problems discussed in The Economist or on Retraction Watch? Of course. However, it is worth keeping in mind how hard it is to replicate findings, and this is one of the major reasons I think meta-analyses are so important. It also makes it clear why ALL scientists need to make their data available through disciplinary or data-type-specific repositories like DRYAD, NCBI GEO, and the Short Read Archive, or more general ones like figshare.
Monday, October 14, 2013
Fallout from John Bohannon's "Who's afraid of peer review"
As many, many scientists, librarians, and concerned folk who are interested in scientific publishing and the state of peer review are aware, the whole 'verse has been talking about the "news feature" in Science by John Bohannon entitled "Who's afraid of peer review?".
The basis of the article was a year-long "sting" operation on a "select" group of journals (that happened to be open access... more on this in a second), focusing in part on predatory/vanity journals. That is, some of the journals had the "air" of a real science journal, but in fact would publish the paper (any paper?) for a fee. Basically, Bohannon generated a set of faux scientific articles that at a first (and superficial) glance appeared to represent a serious study, but upon even modest examination it would be clear to the reader (i.e., reviewers and editors for the journal) that the experimental methodology was so deeply flawed that the results were essentially meaningless.
Bohannon reported that a large number of the journals he submitted to accepted the article, clearly demonstrating insufficient (or non-existent) peer review. This, and the headline, has apparently led to a large amount of popular press and many interviews (I only managed to catch the NPR one, I am afraid).
However, this sting immediately generated a great deal of criticism, both for the way it was carried out and, more importantly, for the way the results were interpreted. First and foremost (to many): ALL of the journals used were open access, and thus there was no control group of journals with the "traditional" subscription-based model (where libraries pay for subscriptions to the journals). In addition, the journals were sieved to over-represent the shadiest predatory journals; that is, they did not represent a random sample of open access journals. One thing that really pissed many people off (in particular among advocates of open access journals, but even beyond this group) was that Science (a very traditional subscription-based journal) used the summary headline "A spoof paper concocted by Science reveals little or no scrutiny at many open-access journals.", clearly implying that there is something fundamentally wrong with open access journals. There are a large number of really useful critiques of the article by Bohannon, including ones by Michael Eisen, the Martinez-Arias lab, Lenny Teytelman, Peter Suber, and Adam Gunn (the last including a list of other blogs and comments about it at the end). There is another list of responses found here as well. Several folks also suggested that some open access advocates were getting overly upset, as the sting was meant to focus on just the predatory journals. Read the summary line quoted above, as well as the article itself, and decide for yourself. I also suggest looking at some of the comment threads, as Bohannon joins in on the comments on Suber's post, and many of the "big" players are in on the discussion.
A number of folks (including myself) were also very frustrated with how Science (the magazine) presented this (and not just because of the summary line): making the "sting" appear scientifically rigorous in its methods, but then turning around and saying it is just a "news" piece whenever any methodological criticism is raised. For instance, when readers commented on both the lack of peer review and the biased sampling of journals used for the "sting" operation in Bohannon's article, this was the response from John Travis (managing editor of News for Science magazine):
I was most interested in the fact that Science (the journal) hosted an online panel consisting of Bohannon, Eisen, and David Roos (with Jon Cohen moderating) to discuss these issues. Much of it (especially the first half hour) is worth watching. I think it is important to point out that Bohannon suggests he did not realize how his use of only OA journals in the sting operation would be viewed. He suggests that he meant this largely as a sting of the predatory journals, and that if he did it again he would include subscription-based journals as a control group. You can watch it and decide for yourself.
The panelists also brought up two other important points that do not seem to get discussed as much in the context of open access vs. subscription models for paying for publication or for peer review.
First, many subscription-based journals (including Science) have page charges and/or figure charges that the author of the manuscript pays to the journal. As discussed among the panelists (and I have personal experience with paying for publication of my own research), these tend to be in the same ballpark as the publication fees for open access papers. Thus, the "charge" that the financial model of OA journals leads to more papers being accepted applies to many of the subscription journals as well (in particular for journals that are entirely online).
Second (and the useful point to come out of Bohannon's piece) is that there are clear problems with peer review being done sufficiently well. One suggestion, made by both Eisen and Roos (and suggested many times before), is that the reviews provided by the peer referees and the editor could be published alongside the accepted manuscript (or as supplemental data on figshare), so that all interested readers can assess the extent to which peer review was conducted. Indeed, there are a few journals which already do this, such as PeerJ, EMBO J, eLife, F1000 Research, Biology Direct and some other BMC-series journals (see here for an interesting example), Molecular Systems Biology, and the Copernicus journals. Thanks to folks on Twitter for helping me put together this list!
This latter point (providing the reviews alongside published papers) seems trivial to accomplish, and the reviewers' names could easily remain anonymous (or they could provide their names, gaining a degree of academic credit and credibility within the scientific community) if so desired. So why has this not happened for all scientific journals? I am quite curious about whether there are any reasons NOT to provide such reviews.
Wednesday, January 23, 2013
Some further thoughts on "risky" research and the culture of science
These are just some further thoughts on an old post regarding the New York Times article "Grant system leads cancer researchers to play it safe". In that post I mulled over the idea that the mentoring process for young scientists trains us (as a community) to be hyper-critical and skeptical. Now of course scientists are individuals, and we vary a lot. Indeed, there are lots of scientists who tend to be optimists and take their (and other people's) data at face value, while others spend their careers taking apart the ideas of others. There is of course room for all of these approaches. We need the creative spark of people to generate new models, and data to test them, and other scientists who test the logic or validity of these ideas and models. This is part of what makes the scientific process work so well. But how might this affect the potential funding of "risky" science? Given that there are limited resources available to fund scientific research, if one reviewer of a proposal is highly skeptical of the ideas while all of the other reviewers like them, will that be enough to have the proposal rejected for funding?
I am certain that if I polled many of my fellow scientists, they would all point to at least one proposal they submitted that failed to be funded based on one review, while all of the other reviewers loved it. It is not so different from going onto Rotten Tomatoes: there are some movies that many reviewers love while others hate them. Indeed, I have never seen a movie for which the reviews are in complete agreement. Not surprisingly, the same is true for the scientific review process (although I would hope for different reasons).
However, this has all made me think about the differences in the ways countries provide public funds for scientific research. In particular, in the U.S., the funding system tends to have strong "boom-bust" cycles, naturally tied to the economy as a whole, but also strongly tied to fads in scientific research (sometimes called "sexy science"). Now, we are only human, and nerdy as it may be, scientists can be enamoured of new and very interesting findings. Naturally this leads many other scientists to want to join this new area, and when grant proposals on this research are reviewed, the reviewers may themselves be entranced by the ideas and pin their own hopes for future research success on these new ideas, methods, or approaches.
Indeed, in my own field of genetics, I have watched such transformations occur numerous times in my relatively short experience working in the field. This has happened due to changes in technology as well as in statistical methodology (more on this in a future post). In each instance, the same basic pattern emerged. First there was almost unanimous excitement and hope that these new approaches would solve all sorts of persistent problems in the field (for instance, finding the set of genes that contribute to disease X). Shortly after, there were a few dissenting voices (largely ignored) that pointed out some of the shortcomings of the approach or method. Then, over the next 2-3 years, as more and more people used these approaches or methods (or tested these new ideas), more and more issues were uncovered. And just then, when hope was beginning to fade, a new idea/method/technology was discovered, and so the cycle continued....
So how does all of this affect the funding of "risky" research? Honestly, I do not know. But I think it is worth considering. Any thoughts?