The Economist just published a pair of articles, "How Science Goes Wrong" and "Trouble at the Lab", broadly about the state of affairs in scientific research (and from their perspective, everything is in a tailspin). Both articles are worth reading, although few will find themselves in agreement with all of their conclusions. Neither takes very long to read, so I will not try to summarize all of the arguments here. For two very different perspectives on these articles, check out Jerry Coyne's blog, which largely agrees with the statements they make, or, for an argument that the articles missed the mark almost entirely, see the post by Chris Waters, my colleague here at Michigan State University. Chris points out that most studies do not represent a single experiment examining a particular hypothesis, but several independent lines of evidence pointing in a similar direction (or at least excluding other possibilities).
However, instead of going through all of the various arguments that have been made, I want to point out some (I think) overlooked issues about replicating scientific experiments: principally, that it can be hard, and that even under extremely similar circumstances stochastic effects (sampling) may alter the results, at least somewhat.
Let's start by assuming that the original results are "valid", at least in the sense that there was no malfeasance (no results were faked), the experiments were done reasonably well (i.e., those performing the experiments did them carefully, with appropriate controls), and the results were not subject to "spin", with no crucial data (that might negate the conclusions) left out of the paper. In other words, ideally what we hope to see from scientists.
Now, I try to replicate the experiments. Maybe I believe strongly in the old adage "trust but verify" (in other words, I am a skeptical midwesterner). Perhaps the experimental methods or results seem like a crucial starting point for a new line of research (or an alternative approach to answering questions that I am interested in).
So, I diligently read the methods of the paper summarizing the experiment (over and over and over again), get all of the components I need, follow the protocol as closely as possible, and... I find I cannot replicate the results. What happened? Instead of immediately assuming the worst of the authors of the manuscript, perhaps consider some of the following as well.
1- The description of methodological detail in the initial study is incomplete (this has been, and remains, a common issue). The replication is then based on faulty assumptions introduced into the experiment because of information missing from the paper. Frankly, this is the norm in the scientific literature, and it is hardly a new thing. Whether I read papers from the 1940s, the 1970s, or the present, I generally find the materials and methods section lacking from the perspective of replication. While this should be an easy fix in this day and age (an extended materials and methods section included as supplementary material, or archived alongside the data itself), it rarely is.
What should you do? Contact the authors! Get them on the phone. Email is often a good start, but a phone or Skype call can be incredibly useful for getting all of the details out of those who did the experiment. Many researchers will also invite you to spend time in their lab to try out the experiment under their conditions, which can really help. It also (in my mind) suggests that they are trying to be completely above board and feel confident about their experimental methods, and likely their results as well. If they are not willing to communicate with you about their methods (or to share their data, or how they performed their analysis), you have good reason to be skeptical about how they have done their work.
2- Death by a thousand cuts. One important issue (related to the above) is that it is almost impossible to perfectly replicate an experiment ingredient for ingredient (what we call reagents). Maybe the authors used a particular enzyme. So you go ahead and order that enzyme, but it turns out to be from a different batch, and the company has changed the preservative used in the solution. All of a sudden, the experiment stops working. Or maybe the enzyme itself is slightly different (in particular if you order it from a different company).
If you are using a model organism like the fruit fly, maybe the control (wild-type) strain you used is slightly different from the one in the original study. Indeed, in the post by Jerry Coyne mentioned above, he discusses three situations where he attempted to replicate other findings and failed to do so. However, in at least two of the cases I know about, it turned out that there were substantial differences in the wild-type strains of flies used. Interesting arguments ensued; for a brief summary, check out box 2 in this paper. I highly recommend reading the attempts at replication by Jerry Coyne and colleagues, and the responses (and additional experiments) by the authors of the original papers (in particular on the role of the tan gene in fruit fly pigmentation).
Assuming that the original results are valid, but you cannot replicate them, does that invalidate the totality of the results? Not necessarily. However, it may well make the results far less generalizable, which is important to know and is itself an important part of the scientific process.
3- Sampling effects. Even if you follow the experimental protocol as closely as possible, with all of the same ingredients and strains of organisms (or cell types, or whatever you might be using), you may still find somewhat different results. Why? Stochasticity. Most scientists take at least some rudimentary courses in statistics, and one of the first topics they learn about is sampling. If you use a relatively small number of independent samples (a few fruit flies in your experimental group, compared to a small number in the control group), there is likely to be a lot of stochasticity in your results because of sampling alone. Thankfully, we have tools to quantify the uncertainty associated with this (in particular, standard errors and confidence intervals). However, many studies treat large quantitative differences as if they were essentially discrete ("compound A turns transcription of gene X off..."). Even if the effects are large, repeating the experiment may yield somewhat different results (a different estimate, even if the confidence intervals overlap).
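The point about sampling can be made concrete with a small simulation (a minimal sketch, not from any of the studies discussed; the effect sizes and sample sizes are made up). Two "replicates" of the exact same experiment, drawn from identical underlying distributions, still produce different effect estimates when the sample size is small:

```python
# Sketch: sampling stochasticity alone makes two identical experiments
# give different effect estimates. All numbers here are hypothetical.
import random
import statistics

random.seed(1)

def run_experiment(n, mu_control=10.0, mu_treated=12.0, sd=3.0):
    """Simulate expression measurements for control and treated groups,
    and return the estimated treatment effect (difference in means)."""
    control = [random.gauss(mu_control, sd) for _ in range(n)]
    treated = [random.gauss(mu_treated, sd) for _ in range(n)]
    return statistics.mean(treated) - statistics.mean(control)

# The true effect is 2.0 in both "labs"; with n = 8 per group,
# the two estimates can nonetheless differ noticeably.
original = run_experiment(n=8)
replicate = run_experiment(n=8)
print(f"original estimate:  {original:.2f}")
print(f"replicate estimate: {replicate:.2f}")
```

With a much larger n, the estimates settle down near the true effect, which is exactly why sample size matters for judging a "failed" replication.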
If the way you assess "replication" is something like "compound A significantly reduced expression of gene X in the first experiment; does it also significantly reduce expression upon replication?", then you may be doomed to frequently fail to replicate results. Indeed, statistical significance (based on p-values, etc.) is a very poor tool for this purpose. Instead, you can ask whether the effect is in the same direction, and whether the confidence intervals of the initial estimate and the new estimate from the replication overlap.
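That comparison can be sketched in a few lines (a simplified illustration with made-up numbers; CI overlap is a rough compatibility check, not a formal test of whether the two estimates differ):

```python
# Sketch: compare two effect estimates by direction and 95% CI overlap,
# rather than by whether each one crossed a significance threshold.
def ci_95(estimate, se):
    """Approximate 95% confidence interval for an effect estimate."""
    return (estimate - 1.96 * se, estimate + 1.96 * se)

def compatible(est1, se1, est2, se2):
    """Are the two estimates in the same direction, with overlapping CIs?"""
    same_direction = (est1 > 0) == (est2 > 0)
    lo1, hi1 = ci_95(est1, se1)
    lo2, hi2 = ci_95(est2, se2)
    overlap = (lo1 <= hi2) and (lo2 <= hi1)
    return same_direction and overlap

# Hypothetical case: a large original effect, and a smaller replicate
# estimate with a wide interval (small sample). The replicate's CI spans
# zero, so it would "fail" a significance criterion, yet the two
# estimates are entirely compatible.
print(compatible(est1=2.5, se1=0.5, est2=1.0, se2=0.8))  # True
```

Note that the replicate in this example (CI roughly -0.6 to 2.6) is "not significant" on its own, which is precisely why significance-based replication criteria are misleading.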
Ask the authors of the original study for their data (if it is not already available in a data repository), so you can compute the appropriate estimates and compare them to yours. How large was their sample size? How about yours? Could that explain the differences?
4- Finally, make sure you have done a careful job of replicating the initial experiment itself. I have seen a number of instances where it was not the initial results, but the replication itself, which was suspect.
Are there problems with replication in scientific studies? Yes. Are some of these due to the types of problems discussed in The Economist or on Retraction Watch? Of course. However, it is worth keeping in mind how hard it is to replicate findings, and this is one of the major reasons I think meta-analyses are so important. It also makes clear why ALL scientists need to make their data available through disciplinary or data-type-specific repositories like Dryad, NCBI GEO, and the Short Read Archive, or more general ones like figshare.