Thursday, October 24, 2013

My thoughts for the panel on "open access and the future of scholarly publishing"

On Tuesday, as part of Open Access Week, I participated in a panel, "Publishing, Authoring, and Teaching in the Evolving Open Access Environment: A Panel Discussion". While this is not a word-for-word write-up, it is more or less the gist of what I said.

When I was asked whether I would be willing to participate in a panel discussion here at Michigan State University on the role of open access journals and the future of academic publishing, I said yes. While I am not convinced I am particularly knowledgeable about the topic, I thought it provided an opportunity to collect my thoughts into a manifesto of "how I communicate science, and why I do it that way".

While I do tweet and blog about aspects of open science, including open access publishing, I am not one of its most outspoken advocates, and only a moderate practitioner. I publish, review and edit in/for open access journals, but not exclusively. I continue to publish in many "subscription journals" that represent the journals of record for my field, or those with some inferred "prestige". I do regularly discuss issues in open science, including open access publishing, with many folks, but as you will see I am not sure where I come down on it.

I work in the basic life sciences, at the interface of evolutionary biology, genetics and genomics. The norms of scholarly communication differ substantially from field to field in terms of what is considered productive scholarship, books vs. articles, authorship and a host of other issues. Even within the natural sciences, scholarly communication differs between biology and, say, physics. So my experience and understanding remain narrow, and I claim no expertise.

I think that the future of scholarly publishing will be open access, in some shape or form. That is, the majority of published manuscripts will eventually be freely available to anyone with internet access. How do we get there? I have no idea. Will this be due to broad mandates from funding agencies and universities to deposit manuscripts into repositories? Will journals generally agree to make content freely available after a fixed amount of time (6-12 months), so-called green open access? Or will gold open access, where authors pay to have their work reviewed or published, become the norm? Likely a combination of these and other approaches, but I am not good at such guesswork.

So why do we care about OA in the first place?

Several reasons.  

This has been discussed by many before, so my thoughts on this are brief. For a more detailed treatment, check out Peter Suber's book "Open Access".

If you happen to be on the MSU library site and click the faculty page, you will see on the right-hand side links to a number of things, including "Crisis in Scholarly Communication". The discussion on these pages is about increasingly difficult access to scholarly publications. The basic reason is that while academic library budgets tend to be relatively flat, the cost of subscriptions to academic journals continues to increase very rapidly. Often this is because many of the subscription (i.e. for-profit) publishers bundle journals. So if you want journal A, you also have to subscribe to the x (pick a country) journal of y (sub-field) of z (pick an organism).

Why should scholarly work be behind a paywall, and thus inaccessible?

This is particularly true for scientific (and medically relevant) work, where open access could benefit researchers, patients and doctors who would otherwise not have access. Open access allows the whole public to look at the research if they so choose. It also removes one small barrier in the perception of the ivory tower, and helps rebuild some trust with the public (more on this later).

Who is paying for the research? Generally not the publisher that makes money from the paper.

At least in the sciences, research is usually paid for by grants from federal agencies, and salaries are paid from those grants or by the university (such as MSU), whose funds come in part from state allocations and tuition dollars. The manuscript is then reviewed by referees, usually for free (as part of our scholarly role), and handled by a scientific editor (also not usually a paid position, at least for associate editors). Under the current system most referees get nothing (neither money nor any other incentive) for this essential service, and their pay comes from their institution (and does not depend upon them performing this service). The publisher may maintain the electronic system that shuttles the draft manuscript to the referees and, if the paper is accepted, performs copyediting and typesetting. There are exceptions to the rule (for instance, I had absolutely excellent editing advice, on both the writing and the communication of the science, from the managing editor of Trends in Genetics for a recent paper), but this has not generally been the case for me.

Thus the publisher makes a great deal of profit despite having done only a fraction of the work. Yet the publisher (not the authors) retains the copyright on the work. This is potentially a big problem.

How I got into this

I will tell you how I got engaged with the ideas of open access publishing, as a small part of the larger endeavor to make academic science more open, transparent and reproducible, so that scientific ideas and data are communicated more quickly and effectively. I will also describe many of the remaining stumbling blocks: views of open access journals specifically, and the nefarious concept of prestige in publishing and how it influences hiring, grants, and promotion.

The crisis in scholarship is much bigger than open access.

Why is there a crisis?

While open access to published work is certainly a factor in the crisis in scholarly communication, and the one I will speak the most about today, it is not the only factor (others include issues with peer review, reproducible research, sharing of data and code, etc.). There has been a slew of articles, including a few in The Economist and The Guardian in the last few weeks, on aspects of this crisis. Essentially they argue that scientific research is in a tailspin, that there is no effort to do quality science, and that everything is about the quantity of papers and prestige (i.e. spin), with little effort to make sure the work is valid, well reviewed or replicated. That is, the incentive system for scientists is completely out of whack with the process of good science.

Is there a crisis? I am a skeptic and a cynic, so I think there are some real concerns. However, the little optimist voice in my head also points out that there are great opportunities not only to resolve this crisis, but to make science and scholarly research better and far more dynamic. Indeed, there is a vocal and active community trying to make this all much better.

Before I delve into the specifics about the need for open access (and what might be stopping some of us from diving into it completely), I want to speak about the larger crisis of the scholarly enterprise in general, and how open science initiatives can fix it. In my mind this comes down to an issue of trust. Trust between collaborators. Trust between scientists working in the same field. Trust between researchers at different stages of their careers (graduate students, post-docs and PIs). Finally, we need trust between scientists and the general population. This means not just how the public perceives scientists (and scholars in general) and the work they do, but also doing our science in an open way that leads to the appropriate self-correcting mechanisms. However, even beyond the large number of anti-science and anti-intellectual movements out there, there has been a substantial loss of trust in how scholars operate, and what motivates us.

Consider the large number of research articles picked up by the popular press about "cures", or genes for "this, that and the other", only to have such research shown to be largely (or entirely) incorrect a few months later. Combine this with the many news articles that point out the lack of repeatability of scientific studies, or examples of scientific fraud and misconduct, and our lay audience (those who ultimately help pay for our research and salaries) is perhaps becoming quite skeptical. The lack of public access to the scientific literature due to paywalls (from subscription-based journals) is simply another large nail in the coffin for the trust that scientists and science communicators have been trying to build with the public.

Frankly, our motives are questioned, and not just by the public at large. They are also questioned by our graduate students. As undergraduates (or from watching nature shows) they see an amazing, wonderful universe to study, but when they come to do research as graduate students they realize that a business model has taken over, with a culture of "scientific stardom" as the goal for many. Worse than that, they see a perverse incentive system in which the quantity of publications and the prestige of where articles are published have taken hold, and the overall quality of scientific research is the perceived victim (has this been evaluated?). There are many folks to blame, including university administrators, the "high prestige" journals, etc., but we first need to look at ourselves (practicing scientists) for accepting and adopting this system as it developed into the status quo.

So how do we fix it?

Let's think about the aspects of open science. Not only do we need to communicate what we do more effectively, but we need to make everything far more transparent.

Open Science

Much has been written about open science and the open science movement. It has many goals, but I would say the two most important are to increase the transparency of the scientific process and to speed up (and open up) science communication. Its many aspects range from open "lab notebooks" (here and here), to submitting pre-prints (papers prior to peer review and formal acceptance at a journal) to repositories to speed up science communication, to increasing openness in peer review so that all can see the comments of the referees and editors. This can include pre- and post-publication review (PubPeer, PubMed Commons). It also includes sharing the raw data associated with research papers, as well as all of the details of how the analysis was performed (the computer code associated with it). It all comes down to the fact that the published paper is not the science itself, but a précis (or an advert) for the actual scientific work. All of the data, the work and even the peer review process itself are part of the scholarship of science. Making some or all of this available will not only help with transparency, but will speed up the scientific process. Having access to the raw data may also help to answer all sorts of new and interesting questions. I have always loved this quote by Sir William Bragg:

"The important thing in science is not so much to obtain new facts as to discover new ways of thinking about them"

Open Access and the entanglement of scholarly publishing with prestige and other incentives

Crossroads for publishing

At least in the life sciences, it is clear that publishing articles in "high profile" journals like Nature, Science and PNAS (and a few others) can make or break a career. The prestige associated with such articles can trump many things. Having such publications can open many doors in terms of jobs, grants, tenure, invited talks and more. Indeed, last week Nature published articles about just this phenomenon, the so-called "golden club". Not surprisingly, most of the traditional journals with such cachet are subscription-based, although at least two open access journals are certainly up there (PLoS Biology and eLife).

Since most of the prestigious journals are subscription-based, and there are such strong incentives to publish in them, it is very difficult for many researchers to move to publishing in open access journals (although they can still deposit papers in institutional or disciplinary repositories). If I had the opportunity to publish in Nature or Science, would I? Yes, precisely because I know that such a publication would open doors and aid in getting grants, promotions and raises. While I strongly support open access, I am frankly not above such concerns. Some of this may simply reflect my petty need for external validation of my science (which I can get over), but grants and raises potentially influence the quality of my life and my work. It is hard to pass that up.

Perceptions of open access journals

There remains a common misperception that many open access journals are nothing more than predatory or "vanity" publications with little or no rigorous peer review. A recent "sting" by the science journalist John Bohannon in Science has done little to help this perception. Too much has already been written about this article, mostly highly critical of his methods, his biased sampling approach and the lack of a control group. While it was presented as a news piece (not a scholarly article), Bohannon has stated on several occasions that his original plan was to submit it to PLoS One (an open access journal with peer review), so this remains an issue.

As Peter Suber (among many others) describes in his book, there are currently two models of open access. The first (so-called green OA) means that while papers may be published in subscription-based journals, a free version of the accepted manuscript (usually without the final copyediting and typesetting provided by the journal) is placed in a repository such as PubMed Central or an institutional archive. Often this free version is released only after a 6-12 month delay. Since this model of green OA is a required stipulation for projects funded by organizations such as the NIH, some journals (where the majority of authors are funded by such agencies) now simply make all of their content open access after a 12-month embargo.

The other major model for open access publishing is gold open access. In this case, once a manuscript has gone through peer review by expert referees and academic editors and is accepted, the authors are charged a fee for typesetting and (usually online) publication. Thus the authors, not the readers, are charged. In the life sciences the funds for this usually (more on this in a second) come from granting agencies, although many journals offer fee waivers (or charge no fees at all).

The concern with this, of course, is that it may create a perverse incentive system in which journals increase their acceptance rate to increase profit (~1-2K per accepted paper). Thus the rigor of peer review could suffer, resulting in so-called "vanity publications" that have the veneer of scientific rigor and peer review, but in fact lack both. Add to this the so-called predatory (scam) journals, which are much like other spam scams. Yet vanity journals already existed among subscription-model journals before open access journals did. And as the Bohannon sting has shown us, journals published by well-known publishers like Elsevier are not above being "stung" by accepting faux articles with obviously flawed methodologies. Beyond that, somewhere on the order of 70% of all open access journals have no author-side fees.

Despite this issue, and the existence of predatory and possible vanity journals (such as many of those found on Beall's list), the Open Access Scholarly Publishers Association has a code of conduct for journals, aimed at distinguishing reputable scholarly journals from predatory ones. From my perspective, the journal PLoS One, which in many ways represents the flagship of open access journals (peer review based entirely upon the technical soundness of the experiments and interpretation, not upon subjective assessment of novelty), was noted for how thorough its review process (and rejection) was. The other worthwhile point is to take a look at the re-analysis done by Brian Wood. It seems that the one thing journal impact factor might be useful for is predicting whether a reasonable amount of peer review might take place.

It is also worth pointing out that insufficient peer review occurs with subscription-based journals as well, and there is a history of so-called vanity journals even among subscription publishers. In addition, many subscription-based journals have page and figure charges that the authors must pay, so some of the same incentives apply to these journals. In my own experience, these page charges end up costing about the same as publication in open access journals. So many of the same charges leveled against OA journals can equally be leveled against such journals.

Why have I not embraced open access completely?

It is probably clear from my perspective on all of this that I am firmly in favour of open access models of publishing and, as I stated at the outset, I do think that this is where everything is heading, although by what model I am not sure.

So given all of this, why don't I publish exclusively in open access journals? Well, there are two reasons, or possibly one reason arising from two different parts of my mind.

The first relates to "establishment" journals. In my field, there are several well-regarded journals that have persisted for a very long time, some (such as Genetics) for close to a century. Publishing my work in journals such as Genetics or Evolution means that A) it has a natural readership; B) these are the same journals that shaped my understanding of the field during my intellectual development, so I have a fondness for them; C) while they may not have the cachet of Nature, Science and the like, there is no doubt that in my field they are well regarded; and D) they are also the journals of my professional societies, which I actively support and promote above and beyond their role in scholarly communication.

As for the second reason, I am not sure I am willing to be a martyr. In other words, I may be acting with a great deal of cowardice. Despite having been an editor at PLoS One for many years, and despite standing by the rigor of the reviews by my referees and myself, there is no doubt that many in the community still believe that it (as a journal) accepts anything. If I choose to publish all of my work (and that of my students and postdocs) in open access journals, I risk losing readership. If such views are held by university administrators, I risk losing salary raises, promotions and grants (depending on the panel).

Thus until the incentive system changes (and this can only happen through a concerted effort by university administrations, grant program officers and well-established scholars in each of our fields to embrace such changes), many researchers like myself will continue using this screwed-up system because of its incentives, risking further erosion of public trust. Is my halfway attitude a cop-out? Yes. Some horrible mix of rationalization, cowardice and avarice, I suppose. Am I likely to change my behaviour? Probably not until my mortgage is paid off and my kids have finished university.

Tuesday, October 22, 2013

How easy should it be to replicate scientific experiments?

The Economist just published a pair of articles broadly about the state of affairs in scientific research (and from their perspective everything is in a tailspin): "How Science Goes Wrong" and "Trouble at the lab". Both articles are worth reading, although few will find themselves in agreement with all of their conclusions. Neither article takes very long to read, so I will not try to sum up all of the arguments here. For two very different perspectives on these articles, check out Jerry Coyne's blog (he largely agrees with the statements they make) and, for a view that these articles missed the mark almost entirely, the post by Chris Waters, my colleague here at Michigan State University. Chris points out that most studies do not represent a single experiment examining a particular hypothesis, but several independent lines of evidence pointing in a similar direction (or at least excluding other possibilities).

However, instead of going through all of the various arguments that have been made, I want to point out some (I think) overlooked issues about the replication of scientific experiments. Principally, replication can be hard, and even under extremely similar circumstances stochastic (sampling) effects may alter the results, at least somewhat.

Let's start by assuming that the original results are "valid", at least in the sense that there was no malfeasance (no results were faked), the experiments were done reasonably well (i.e. those performing them did them well, with appropriate controls), the results were not subject to "spin", and no crucial data that might negate the results were left out of the paper. In other words, what we ideally hope to see from scientists.

Now suppose I try to replicate the experiments. Maybe I believe strongly in the old adage "trust but verify" (in other words, I am a skeptical midwesterner). Perhaps the experimental methods or results seem like a crucial starting point for a new line of research (or an alternative approach to answering questions that I am interested in).

So I diligently read the methods of the paper summarizing the experiment (over and over and over again), get all of the components I need, follow the protocol as closely as possible, and ... I find I cannot replicate the results. What happened? Instead of immediately assuming the worst of the authors of the manuscript, consider some of the following possibilities as well.

1- Description of methodological detail in the initial study is incomplete (this has been, and remains, a common issue). The replication rests on faulty assumptions introduced into the experiment because of missing information in the paper. Frankly, this is the norm in the scientific literature, and it is hardly new. Whether I read papers from the 1940s, the 1970s or the present, I generally find the materials and methods section lacking from the perspective of replication. While this should be an easy fix in this day and age (extended materials and methods included as supplementary materials, or with the data itself when it is archived), it rarely gets fixed.

What should you do? Contact the authors! Get them on the phone. Email is often a good start, but a phone or Skype call can be incredibly useful for getting all of the details out of those who did the experiment. Many researchers will also invite you to spend time in their lab to try out the experiment under their conditions, which can really help. It also (in my mind) suggests that they are trying to be completely above board and feel confident about their experimental methods, and likely their results as well. If they are not willing to communicate with you about their experimental methods (or to share data, or explain how they performed their analysis), you will probably be justified in feeling skeptical about how they have done their work.

2- Death by a thousand cuts. One important issue (related to the above) is that it is almost impossible to replicate an experiment perfectly, ingredient for ingredient (what we call reagents). Maybe the authors used a particular enzyme. So you go ahead and order that enzyme, but it turns out to be from a different batch, and the company has changed the preservative used in the solution. All of a sudden the experiment stops working. Maybe the enzyme itself is slightly different (in particular if you order it from a different company).

If you are using a model organism like a fruit fly, maybe the control (wild type) strain you used is slightly different from the one in the original study. Indeed, in the post by Jerry Coyne mentioned above, he discusses three situations where he attempted to replicate other findings and failed. However, in at least two of the cases I know about, it turned out that there were substantial differences in the wild type strains of flies used. Interesting arguments ensued; for a brief summary, check out box 2 in this paper. I highly recommend reading the attempts at replication by Jerry Coyne and colleagues, and the responses (and additional experiments) by the authors of the original papers (in particular on the role of the tan gene in fruit fly pigmentation).

Assuming that the original results are valid but you cannot replicate them, does that invalidate the totality of the results? Not necessarily. However, it may well make the results far less generalizable, which is important to know and an important part of the scientific process.

3- Sampling effects. Even if you follow the experimental protocol as closely as possible, with all of the same ingredients and strains of organisms (or cell types, or whatever you might be using), you may still find somewhat different results. Why? Stochasticity. Most scientists take at least some rudimentary courses in statistics, and one of the first topics they learn about is sampling. If you use a relatively small number of independent samples (a few fruit flies in your experimental group, compared to a small number in the control group), there is likely to be a lot of stochasticity in your results because of sampling. Thankfully we have tools to quantify aspects of the uncertainty associated with this (in particular standard errors and confidence intervals). However, many studies treat large quantitative differences as if they were essentially discrete ("compound A turns transcription of gene X off"). Even if the effects are large, repeating the experiment may give somewhat different results (a different estimate, even if the confidence intervals overlap).
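To make the sampling point concrete, here is a minimal Python simulation; it is a sketch under stated assumptions, not anything taken from the studies discussed. It assumes normally distributed measurements, a hypothetical true effect of -1.0 (compound A lowering expression of gene X) and a standard deviation of 1.0, and simply repeats the identical experiment many times. The helper names `run_experiment` and `replicate_spread` are made up for illustration.

```python
import random
import statistics


def run_experiment(n, true_effect, sd, rng):
    """Simulate one experiment with n control and n treated samples.

    Both groups are drawn from normal distributions with the same sd;
    the treated mean is shifted by true_effect. Returns the estimated
    difference in means and an approximate 95% confidence interval.
    """
    control = [rng.gauss(0.0, sd) for _ in range(n)]
    treated = [rng.gauss(true_effect, sd) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    # Standard error of the difference between two independent means.
    se = (statistics.variance(control) / n + statistics.variance(treated) / n) ** 0.5
    # 1.96 is the normal quantile; for small n a t quantile would give a wider interval.
    return diff, (diff - 1.96 * se, diff + 1.96 * se)


def replicate_spread(n, reps, true_effect, sd, seed):
    """Repeat the same experiment reps times and return the standard
    deviation of the resulting estimates (how much replicates disagree)."""
    rng = random.Random(seed)
    estimates = [run_experiment(n, true_effect, sd, rng)[0] for _ in range(reps)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Same true effect (-1.0), same noise (sd = 1.0); only the sample size differs.
    for n in (5, 50):
        print(n, round(replicate_spread(n, 1000, -1.0, 1.0, seed=2013), 3))
```

Nothing about the simulated experiment changes between runs except sample size, yet with 5 samples per group the replicate estimates should scatter with a standard deviation of roughly sd·√(2/n) ≈ 0.63, and with 50 per group roughly 0.2.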

If the way you assess "replication" is something like "compound A significantly reduced expression of gene X in the first experiment; does it also significantly reduce expression upon replication?", then you may be doomed to frequently failing to replicate results. Indeed, statistical significance (based on p-values, etc.) is a very poor tool for this. Instead, you can ask whether the effect is in the same direction, and whether the confidence intervals of the initial estimate and the new estimate overlap.
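The two ways of judging replication can be contrasted in a few lines of Python. This is a hypothetical sketch: the estimates and 95% confidence intervals are invented, and treating "the interval excludes zero" as "significant" assumes the usual correspondence between a two-sided test at the 5% level and a 95% interval. Interval overlap is a fairly lenient consistency check, not a formal test.

```python
def excludes_zero(ci):
    """True if a confidence interval excludes zero (roughly 'significant
    at the 5% level' when ci is a 95% interval)."""
    lo, hi = ci
    return lo > 0 or hi < 0


def replicated_by_significance(ci_orig, ci_rep):
    """The criterion the post warns against: both studies must be 'significant'."""
    return excludes_zero(ci_orig) and excludes_zero(ci_rep)


def replicated_by_estimation(est_orig, ci_orig, est_rep, ci_rep):
    """Same sign of effect, and the two confidence intervals overlap."""
    same_direction = (est_orig > 0) == (est_rep > 0)
    overlap = ci_orig[0] <= ci_rep[1] and ci_rep[0] <= ci_orig[1]
    return same_direction and overlap


# Hypothetical numbers: compound A lowers expression of gene X in both studies,
# but the replication's interval just crosses zero.
original = (-1.2, (-2.0, -0.4))
replication = (-0.7, (-1.5, 0.1))

print(replicated_by_significance(original[1], replication[1]))  # False
print(replicated_by_estimation(*original, *replication))        # True
```

Under the significance criterion this counts as a failed replication; under the estimation criterion the two studies agree on direction and their intervals overlap.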

Ask the authors of the original study for their data (if it is not already available on a data repository), so you can compute the appropriate estimates, and compare them to yours. How large was their sample size? How about yours? Can that explain the differences?
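If the original authors share summary statistics (means, standard deviations and sample sizes), a rough comparison can be sketched as below. This assumes two independent groups per study and a normal approximation; the function names and all numbers are hypothetical, and a proper meta-analysis would go further. The idea is to express the gap between the two estimates in units of their combined standard error, so that a small original study and a larger replication can be compared on an equal footing.

```python
import math


def diff_and_se(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Difference in means (treated minus control) and its standard error,
    computed from the summary statistics of two independent groups."""
    diff = mean_t - mean_c
    se = math.sqrt(sd_t ** 2 / n_t + sd_c ** 2 / n_c)
    return diff, se


def discrepancy_z(orig, rep):
    """Gap between two independent estimates in units of their combined SE.
    Values near 0 mean the studies agree closely; larger magnitudes mean a
    gap bigger than sampling alone would readily produce."""
    (d1, se1), (d2, se2) = orig, rep
    return (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical summary statistics: the original used 6 flies per group,
# the replication 20 per group.
orig = diff_and_se(4.0, 1.0, 6, 5.5, 1.0, 6)    # diff -1.5, se ~0.58
rep = diff_and_se(4.8, 1.0, 20, 5.4, 1.0, 20)   # diff ~-0.6, se ~0.32
print(round(discrepancy_z(orig, rep), 2))        # -1.37
```

Here the gap of 0.9 between the estimates is only about 1.4 combined standard errors, so sampling alone could plausibly explain it, largely because six flies per group leave the original estimate with wide uncertainty.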

4- Finally, make sure you have done a careful job of replicating the initial experiment itself. I have seen a number of instances where it was not the initial results, but the replication itself, that was suspect.

Are there problems with replication in scientific studies? Yes. Are some of them due to the types of problems discussed in The Economist or on Retraction Watch? Of course. However, it is worth keeping in mind how hard it is to replicate findings, and this is one of the major reasons I think meta-analyses are so important. It also makes it clear why ALL scientists need to make their data available through disciplinary or data-type-specific repositories like Dryad, NCBI GEO and the Short Read Archive, or more general ones like figshare.

Monday, October 14, 2013

Fallout from John Bohannon's "Who's afraid of peer review"

As many scientists, librarians and other folks interested in scientific publishing and the state of peer review are aware, the whole 'verse' has been talking about the "news feature" in Science by John Bohannon entitled "Who's afraid of peer review?".

The basis of the article was a year-long "sting" operation on a "select" group of journals (which happened to be open access; more on this in a second), focusing in part on predatory/vanity journals. That is, some of the journals had the "air" of a real science journal, but in fact would publish the paper (any paper?) for a fee. Bohannon generated a set of faux scientific articles that at first (and superficial) glance appeared to represent a serious study, but upon even modest examination it would be clear to the reader (i.e. the reviewers and editors for the journal) that the experimental methodology was so deeply flawed that the results were essentially meaningless.

Bohannon reported that a large number of the journals he submitted to accepted this article, clearly demonstrating insufficient (or non-existent) peer review. This, along with the headline, has apparently led to a great deal of popular press coverage and many interviews (I only managed to catch the NPR one, I am afraid).

However, this sting immediately generated a great deal of criticism, both for the way it was carried out and, more importantly, for the way the results were interpreted. First and foremost (to many), ALL of the journals used were open access, so there was no control group of journals with the "traditional" subscription-based model (where libraries pay for subscriptions to the journals). In addition, the journals were sieved in a way that over-represented the shadiest predatory journals. That is, they did not represent a random sample of open access journals. One thing that really pissed many people off (in particular among advocates of open access journals, but even beyond this group) was that Science (a very traditional subscription-based journal) used the summary headline: "A spoof paper concocted by Science reveals little or no scrutiny at many open-access journals.", clearly implying that there was something fundamentally wrong with open access journals. There are a large number of really useful critiques of Bohannon's article, including ones by Michael Eisen, the Martinez-Arias lab, Lenny Teytelman, Peter Suber and Adam Gunn (including a list of other blogs and comments about it at the end). There is another list of responses found here as well. Several folks also suggested that some open access advocates were getting overly upset, as the sting was meant to focus only on predatory journals. Read the summary line from the article quoted above, as well as the article itself, and decide for yourself. I also suggest looking at some of the comment threads, as Bohannon joins in on the comments on Suber's post, and many of the "big" players are in on the discussion.

A number of folks (including myself) were also very frustrated with how Science (the magazine) presented this (and not just the summary line): making the "sting" appear scientifically rigorous in its methods, but then turning around and calling it just a "news" piece whenever methodological criticism came up. For instance, when readers commented on both the lack of peer review and the biased sampling of journals used for Bohannon's "sting" operation, this was the response by John Travis (managing editor of News for Science magazine):

I was most interested in the fact that Science (the journal) held an online panel consisting of Bohannon, Eisen and David Roos (with Jon Cohen moderating) to discuss these issues. Much of it (especially the first half hour) is worth watching. I think it is important to point out that Bohannon suggests he did not realize how his use of only OA journals in the sting operation would be viewed. He suggests that he meant this largely as a sting of the predatory journals, and that if he did it again he would include subscription-based journals as a control group. You can watch it and decide for yourself.

The panelists also brought up two other important points that do not seem to get discussed as much in the context of open access vs. subscription models of paying for publication or for peer review.

First, many subscription-based journals (including Science) have page and/or figure charges that the author of the manuscript pays to the journal. As discussed among the panelists (and as I know from personal experience paying for publication of my own research), these tend to be in the same ballpark as the fees for publishing open access papers. Thus the "charge" that the financial model of OA journals would lead to more papers being accepted applies equally to many subscription journals (in particular those that are entirely online).

Second (and this is the useful point to come out of Bohannon's piece), there are clear problems with peer review not being done sufficiently well. One suggestion made by both Eisen and Roos (and many times before) is that the reviews provided by the referees and the editor could be published alongside the accepted manuscript (or as supplemental data on figshare), so that all interested readers can assess the extent to which peer review was conducted. Indeed, a few journals already do this, such as PeerJ, EMBO J, eLife, F1000 Research, Biology Direct and some other BMC-series journals (see here for an interesting example), Molecular Systems Biology, and the Copernicus journals. Thanks to folks on twitter for helping me put together this list!

This latter point (providing the reviews alongside published papers) seems so trivial to accomplish, and the reviewers could easily remain anonymous if so desired (or they could provide their names, gaining a degree of academic credit and lending credibility to the scientific community). So why has this not happened for all scientific journals? I am quite curious whether there are any reasons NOT to provide such reviews.