Genes Gone Wild

Thursday, July 16, 2015

The great divide in how researchers manipulate fruit flies

How do you manipulate your Drosophila: paint brush or watchmaker's forceps? You tell me!

I bet when you read "manipulate" you thought I meant genetic manipulation, eh?

So last night while I was "pushing flies" (collecting flies for a genetic cross, i.e. a mating scheme), I was switching back and forth between my two favorite instruments (a paint brush with short hair and forceps) to move around anesthetized flies (sorting based on genotype, sex, etc). For some reason (even though I have been a fly pusher for 18 years) I started to think about whether the brush or forceps was more efficient. After all, I have sorted (at a rough guess) several hundred thousand flies (or more?) during this time. Small efficiencies could save huge amounts of time.

I generally use forceps when I am collecting small numbers of distinct individuals (based on genotype & sex) from a large pile on the anesthesia plate (a plate with a porous surface through which we pump carbon dioxide to knock out flies). I use the brush when I am collecting many more individuals, and they make up (relatively speaking) more of the pile, like for "virgin" collecting (collecting very young females who have not yet mated with a male, and so are not storing sperm).

So I wanted to know from fly peeps around the world: paint brush? forceps? other? Why this choice?

Wednesday, June 17, 2015

The struggle to reproduce scientific results and why we (scientists and everyone else) should be happy about it.

Once again there has been a spate of articles in the "popular" and scientific press about issues of reproducibility of scientific studies, the reasons behind the increase in retraction rates, and incidences of fraud or increased carelessness among scientists. It has been repeatedly pointed out that scientists are just people, and thus subject to all of the same issues (ego, unwillingness to accept they are incorrect, etc), and that we should accept that science is flawed. This has also led some to ask whether the enterprise and practice of science is broken (also see this). However, I think if you look carefully at what is being described there are many reasons to suggest that the scientific endeavor is not only not broken, but is showing an increased formalization of all of the self-correcting mechanisms normally associated with science. In other words, reasons to be optimistic!

Yes, scientists are (like all of us) individually biased. And as many authors have pointed out, in the current environment of both scarce resources and desire (and arguable need, to secure resources) for prestige, some scientists have cut corners, failed to do important controls (or full experiments) that they knew they should have done, or committed outright fraud. However, what I think about most these days is how quickly these issues are discovered (or uncovered) and described both online and (albeit more slowly) in the formal scientific literature. This is a good thing.

Before I get into that, a few points. Articles like this in the New York Times may make it seem like retractions are reaching epidemic levels (for whichever of the possible reasons stated in the paragraph above). However, such a claim seems hyperbolic to me. Yes, there seem to be many retractions of papers, and thanks to sites like Retraction Watch, these can be identified far more easily. They have also pointed out that there seems to be an increase in retraction rates during the past five years. I have not looked carefully at the numbers, and I have no particular reason to dispute them. Still, as I will discuss below, I am not worried by this; instead it brings me optimism about the scientific process. Our ability (as a scientific community) to correct these mistakes is becoming quicker and more efficient.

First (and my apologies for having to state the obvious), the vast majority of scientific studies are done (and the vast majority of scientists work) with deliberate care for experimental methodology. However, this does not mean mistakes do not happen: experiments and analyses may be incorrect, and biased interpretations may be present in individual studies. This is sometimes due to carelessness or a "rush to publish", but it may as well be due to honest mistakes that would have happened anyway. As individuals we are imperfect. Scientists are as flawed as any other person; it is the methodology and enterprise as a whole (and the community) that is self-correcting.

Also (and I have not calculated the numbers), many of the studies reported on sites like Retraction Watch are actually corrections, where the study itself was not invalidated, but a particular issue with the paper (which could be as simple as something being mis-labeled) was fixed. I should probably look up the ratio of retractions:corrections (and my guess is that someone has already done this).

One of the major issues that is brought up with respect to how science is failing is that the ability to replicate the results found in a previously published study can be low. As has been written about before (including on this blog), perfectly reproducing an experiment can be as difficult as trying to get the experiment to work in the first place (maybe harder). Even if the effect being measured is "true", subtle differences in experimental methodologies (that the researchers are unaware of) can cause problems. Indeed, there are a number of instances where the experimental methodology trying to reproduce the original protocol was flawed (I have written about one such case here). While I could spend time quibbling about the methodology used to determine the numbers, there is no doubt that there is some fraction of published papers where the results from experiments are not repeatable at all, or are deeply confounded and meaningless. I will say that most of the studies looking at this take a very broad view of "failure to replicate". However, I have no doubt that research into "replication" will increase, and this is a good thing. Indeed, I have no idea why studies like this would suggest that studies with "redundant" findings would have "no enduring value".

So with all of these concerns, why am I optimistic? Mostly because I do not think that the actual rate of fraud or irreproducibility is increasing. Instead, I think that there has been an important change in how we read papers, in how we detect and (most importantly) report problems with studies, and in the general process of post publication peer review (PPPR). Sites like Retraction Watch, PubPeer, PubMed Commons and F1000, as well as individual blogs (such as here and here), are enabling this. Not only are individual scientists working together to examine potential problems in individual studies, but these efforts often lead to important discussions around methodology and interpretation (often in the comments to individual posts). This does not mean that the people making the comments/criticisms about potential problems are always correct themselves (they will have their own biases of course), but potential issues can be raised, discussed by the community and resolved. These may lead to formal corrections or retractions in a few cases. Most of the time it leads to the recognition that the scope of impact of the study may ultimately be more limited than the authors of the original study suggested. It also (I hope) leads to new and interesting questions to be answered. Thus, the apparent increase in retraction rates and reproducibility issues most likely reflects an increased awareness and sensitivity to these issues. This may be (psychologically) similar to how, despite crime rates decreasing, increased scrutiny, vigilance and reporting in our society make it seem like the opposite is happening.

I also want to note that despite comments I often see (on twitter or in articles) that pre-publication peer review is failing completely (or is misguided), I think that it remains a successful first round of filtering. Based on my experience serving (currently and previously) as associate editor at a number of journals (and having reviewed papers for many, many more), I would estimate that 80% of the reviews I have seen from others (including for my own work) have been helpful, improving the science, the interpretation and the clarity of communication. Indeed, as a current AE at both Evolution and The American Naturalist, I find that the reviews for papers I handle are almost always highly professional and very constructive, and that they improve the final published papers. Usually this results from pointing out issues of analysis, potential confounding issues in the experiments and concerns with interpretation. Often one of the major issues relates to the last point: reviewers (and editors) can often reduce the over-interpretation and hyperbole that can be found in initial submissions. This does not mean I do not rage when I get highly critical reviews of my own work. I also think that trying to evaluate and predict the value and impact of individual studies (and whether that has any part in the review process and selection for particular journals) remains deeply problematic. However, on balance my experience is that peer review does improve studies and remains an important part of the process. Further, I do not see (despite the many burdens on the time of scientists, and that reviewing remains unpaid work) evidence for a decline in quality of reviews. Others (like here and here) have said similar things (and probably more eloquently).

As for the replication issues, these are clearly being taken seriously by many in the scientific community, including funding agencies which are now setting aside resources specifically to address them. Moreover, many funding agencies are now not only requiring that papers funded by them be made open access within 12 months of publication, but (along with a number of journals) are requiring all raw data (and I hope soon scripts for analyses) to be deposited in public repositories.

I think the scientific community still has a lot to do on all of these issues, and I would certainly like to see it happen faster (in particular with issues like publishing reviews of papers along with the paper, and more oversight to make sure all raw data, full experimental methodology and scripts are made available at the time of publication). However, in my opinion, it does seem like the concerns, increased scrutiny and calls for increased openness are all a part of science and are being increasingly formalized. We are not waiting decades or hundreds of years to correct our understanding based on poor assumptions (or incorrectly designed or analyzed experiments), but often just months or weeks. That is a reason to be optimistic.

note: Wow, it has been a long time since my last post. I can just say I have moved from one University (Michigan State University) to another (McMaster University) and my family and I are in the process of moving as well (while only ~400km, it is from the US to Canada...more on that in a future post). I will try to be much better after the move next week!

Saturday, September 27, 2014

Sufficient biological replication is essential for differential expression analysis of RNA-seq

I just took part in a twitter discussion about the trade-offs between sequencing depth and number of independent biological replicates (per treatment group) for differential gene expression analysis. There are applications of RNA-seq where sequencing deeply (more than say 50 million reads for a given sample) can be important for discovery. However, most researchers I interact with are interested at some level in differential expression among groups (different genotypes, species, tissues, etc). As with everything else that requires making estimates and quantifying uncertainty for those estimates (minimally necessary for differential expression), you need independent biological samples within each group. The ENCODE guidelines suggest a minimum of 2 biological replicates per treatment group (well they do not say "biological" replicates, but I will give them the benefit of the doubt).

However, numerous studies have demonstrated (both by simulation and by rarefaction analysis) that 2 is rarely sufficient (see links below). I have no idea where ENCODE got this number from. Generally you want to aim for 4 or more for simple experimental designs. These studies also demonstrate that on balance, beyond a certain read depth per sample (somewhere between 10-25 million reads per sample) there are diminishing returns for rare transcripts (in terms of differential expression), and that it is better to do more independent biological replication (say 5 samples each at 20 million reads) rather than more depth (2 independent biological samples at 50 million reads each). The exact number depends on a number of factors including biological variability (and measurement error) within groups, as well as experimental design. A number of tools have been developed to help folks with figuring out optimal designs.
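
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is an illustration only and is not taken from any of the studies linked in this post. It assumes counts for a gene follow a negative binomial distribution, so that the squared coefficient of variation is 1/mu + dispersion, and it uses the delta-method approximation that the variance of the log count is roughly that squared CV. The function names, the dispersion of 0.1 and the abundance of 1 read per million are all hypothetical choices.

```python
import math

def se_log_group_mean(n_reps, reads_millions, reads_per_million, dispersion):
    """Approximate standard error of a group's mean log-expression for one gene.

    Assumes negative binomial counts: CV^2 = 1/mu + dispersion, where mu is the
    expected count and dispersion reflects variability among biological replicates.
    """
    mu = reads_millions * reads_per_million  # expected reads for this gene per sample
    cv_squared = 1.0 / mu + dispersion       # counting noise + biological noise
    return math.sqrt(cv_squared / n_reps)    # averaging over independent replicates

def se_log_fold_change(n_reps, reads_millions, reads_per_million, dispersion):
    """Approximate standard error of the log fold change between two equal groups."""
    return math.sqrt(2.0) * se_log_group_mean(
        n_reps, reads_millions, reads_per_million, dispersion
    )

# Hypothetical rare transcript (1 read per million) with dispersion 0.1.
# Both designs spend the same total of 100 million reads per group.
many_shallow = se_log_fold_change(5, 20, 1.0, 0.1)  # 5 replicates x 20M reads
few_deep = se_log_fold_change(2, 50, 1.0, 0.1)      # 2 replicates x 50M reads
print(f"5 x 20M: {many_shallow:.3f}   2 x 50M: {few_deep:.3f}")
```

Under these assumptions the design with more replicates gives the smaller standard error, because extra depth only shrinks the 1/mu counting term while extra replicates shrink the counting and the biological terms together. Real designs should of course be checked with tools built for the purpose and with dispersion estimates from comparable data.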

Here are just a few such studies (there are many more, just wanted a handful for the moment).

http://www.ncbi.nlm.nih.gov/pubmed/24319002
http://www.ncbi.nlm.nih.gov/pubmed/25246651
http://www.ncbi.nlm.nih.gov/pubmed/22985019
http://www.ncbi.nlm.nih.gov/pubmed/22268221
http://www.ncbi.nlm.nih.gov/pubmed/23497356

Check out
http://bfg.oxfordjournals.org/content/early/2011/12/30/bfgp.elr041.full.pdf+html
for a brief and succinct discussion of these and other issues.

And yes, depending on your questions, read length (and PE or SE) also contribute!

Wednesday, September 24, 2014

Implementing Discovery

This post is the first (of two) about my suggestions for how to implement "open discovery" for answering scientific questions, but in a way that does not completely alienate current professional scientists. This matters in particular because of the current system in which "credit" for answering questions translates to prestige, which directly translates to tangible materialistic considerations (raises, being invited to give talks, grants, employment).

Some Background

Early last week I saw a tweet by @caseybergman:

@cdessimoz @MVickySchneider my thinking is heavily influenced by @michael_nielsen's book "Reinventing Discovery" http://t.co/yUvT3gldNb 2/2
— Casey Bergman (@caseybergman) September 12, 2014

This was posted in the context of how Casey plans to implement his scientific research in the coming months and years. Casey was one of two folks who introduced me to twitter as a serious means for scientific communication, and I have gotten a lot out of our one-on-one conversations, so I went ahead and read the book he mentioned by Michael Nielsen (http://michaelnielsen.org/), Reinventing Discovery. I was very inspired by the book.

I am not very efficient at summarizing books, but you can read the first chapter for free online. It does a good job of summarizing the main message of the book. Essentially, scientific discovery can be profoundly changed for the better (and in particular made much more efficient and productive), by opening up the ongoing research endeavour to the world, for any and all to actively and concurrently participate in. This goes well beyond (but does include) sharing all data, source code and manuscripts, which (if done) is after most of the actual research has been completed. The approach advocated in the book is about setting up the important problems in the field (with some progress of research on those problems), and inviting all scientists (professional and lay scientists alike) to participate in answering the questions.

One of the main examples that Nielsen cites throughout the book is Gowers' Polymath project, which used an open collaborative framework (on Gowers' blog) to find a mathematical proof that had previously eluded the mathematical community. I will not go into the details from the book or blog post, but will just point out that this turned out to work very well, and efficiently, and went from asking the question (with some progress Gowers had made) to answering it in less than 40 days. Not surprisingly, this was followed up by writing up the results in a scientific paper.

The book is full of various examples of how scientifically or computationally challenging problems have been addressed in the open on the internet. I will let you read the book and decide for yourself. While it is fair to say I was already primed (see here and here) for such a vision of scientific discovery, I did realize how much more potential there was.

But...

Such an approach flies in the face of how many (including within the scientific community) perceive how science is done, and how credit for solving scientific problems is garnered. While both the process of scientific discovery, and communication of those discoveries has changed a great deal over the past few hundred years, it is fairly clear that more "recently" (post World War II anyways) a particular system has been built up for professional academic scientists (those who do their research and teach at Universities and other institutes of higher learning).

There are two parts to the "standardization" (or possibly calcification) of scientific discovery and communication that bear considering, and why an immediate transformation to a completely open process of scientific discovery may not be easy (at least from the perspective of academic scientists).

Current practices in scientific communication

First is the means of scientific communication that has become accepted (and calcified). When an individual or group of scientists working on a particular problem have made (in their eyes) sufficient progress on a problem they will usually communicate this in the form of a scientific paper. This is commonly done via submitting a manuscript to a journal. An editor sends it out for peer review (to other experts in the field), who evaluate it for technical correctness, soundness of logic, and for many journals for "novelty" of the ideas and findings. If some of these criteria are not met, the manuscript may be recommended for rejection, otherwise for corrections (revisions), or for acceptance. For a given manuscript this process may repeat several times at the same journal, or at different journals (if rejected from the first journal).

Scientific prestige and current measures of productivity do translate to material benefits (for the scientist).

Assuming the paper is accepted, it is published in a scientific journal. The publication of the article itself, the place it is published and the attention it receives (both in the scientific literature via citations, or in the popular press) all can "matter" for the nebulous ideas of "credit" and "prestige" for the scientists who did the work, and wrote the paper. Indeed these ideas of "credit/prestige" are at the heart of how scientists are evaluated at universities. Our employment (getting a job in the first place), career advancement, salaries, garnering grant support, etc. can all depend (to varying degrees) on where and how much we publish. These are proxies for "research productivity".

Who gets credit for scientific breakthroughs.

The other piece of this (and related to the idea of prestige) is the idea that much of the scientific work, and the "important breakthroughs", are done by lone individuals or small research groups. These "important breakthroughs" are often popularized in textbooks and the media as having come out of nowhere (i.e. that the research is completely unlike what has been done before). However, most of the time it is pretty clear that (like with general relativity, or Natural Selection), related ideas and concepts were percolating in the scientific community. That is not to play down the genius of folks like Einstein or Darwin, just that these breakthroughs rarely occur in an extreme intellectual vacuum.

The problem is that, even in modern Universities, research institutes and funding agencies, these sorts of ideas persist, and the prestige for addressing a particular research problem goes to one or a few people. This is despite the fact that much highly related work (that set up the conditions for the breakthrough) happened before. Even for multi-authored papers, it is usually just a few of the authors that garner the credit for the findings/discoveries. In my field this is usually the people who are the first and final authors on the manuscript. The prestige associated with this leads to all sorts of benefits (like the ones mentioned above), as well as being invited to give talks around the world at conferences and universities. Thus there is a real materialistic benefit in modern academic science for garnering this prestige (and getting the right position as an author).

The problem is that what defines author position can vary considerably among fields. In my field the first author position is usually for the person who has contributed the most work and insight to the research in the paper, and the last position is for the "senior scientist" in whose lab the work was done (and who usually garnered the funds, and sometimes came up with the ideas). However, the difference in contribution between the first and second author (or subsequent sets of authors) is rarely quantified, or clear.

This difference in authorship position means a great deal for material concerns to the participating scientists though. Being first author on a paper in a prestigious journal (even if there are 40 authors on the paper) may be necessary (although rarely sufficient) to get a job at a major research university. However being third author on each of three papers in that same journal (even if there are only 4 authors on each of those papers), will not carry nearly as much weight.

Thus an open collaborative discovery system for scientific research is at odds with the current (socially constructed) system of academic rewards for professional scientists. This is by no means insurmountable. While "Reinventing Discovery" only touched on some possible solutions, in the coming days I will post about one possible idea towards meeting such goals, in a way that can be easily integrated into the current system of publishing (but maybe not rewards).

Saturday, May 10, 2014

Can we really "afford" not to estimate effect sizes?

A recent post over on DATA COLADA suggested that the sample sizes required to estimate effect sizes appropriately are prohibitive for most experiments. In particular this is the point they made:

"Only kind of the opposite because it is not that we shouldn’t try to estimate effect sizes; it is that, in the lab, we can’t afford to."

In response to their post, it has already been pointed out that even if an individual estimate of effect size (from a single study) has a high degree of statistical uncertainty (in terms of wide confidence limits), a combination of such estimates across many studies (in a meta-analysis) would actually have pretty reasonable uncertainty (far smaller than for any single experiment).
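
As a quick illustration of that point, here is a minimal Python sketch of fixed-effect (inverse-variance) pooling, which is one common way of combining estimates in a meta-analysis. The function name and the four "studies" are hypothetical numbers made up for the example, and a real meta-analysis would also need to consider heterogeneity among studies (e.g. a random-effects model).

```python
import math

def pool_fixed_effect(estimates, standard_errors):
    """Inverse-variance weighted average of effect sizes and its standard error."""
    weights = [1.0 / se**2 for se in standard_errors]  # precise studies count more
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect sizes from four small studies, each individually imprecise.
estimates = [0.30, 0.55, 0.10, 0.45]
standard_errors = [0.25, 0.30, 0.28, 0.26]

pooled, pooled_se = pool_fixed_effect(estimates, standard_errors)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

The pooled standard error is always smaller than that of the most precise single study, which is why reporting each study's estimate and its uncertainty is what makes the later synthesis possible.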

I think that there are a couple of other basic points to be made as well.

1) There is no reason not to report the effect size and its confidence intervals. Both can be readily computed. Even if some folks still like to focus on trying to read tea-leaves from p-values, the effect size helps in the interpretation of the biological (or other) effect of the particular variables under investigation.

2) The main argument from the DATA COLADA blog post seems to be from the simulation they summarize in figure 2. There is an important point to be made from this. For all the sample sizes they investigate in figure 2, the confidence intervals do not overlap with zero. So the effect sizes (for all of the sample sizes reported) also demonstrate the "significant effect", but with considerable additional information. In other words, there is no loss of information by just reporting the effect sizes with confidence intervals. You can always do a formal significance test as well, although most of the time it will not provide further insight.

3) The "acceptable" width of the 95% confidence intervals is a discipline specific issue (and the CIs are a function of not just sample size, but of the standard deviation for the observed data itself).
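
To show how little work the reporting takes, here is a minimal Python sketch that computes a standardized effect size (Cohen's d) and an approximate 95% confidence interval for two groups. It uses the common large-sample approximation for the standard error of d and a normal critical value of 1.96; the function name and the measurements are hypothetical, and for very small samples an exact (noncentral t) interval or a bootstrap would be a better choice.

```python
import math
import statistics

def cohens_d_with_ci(group_a, group_b, z=1.96):
    """Cohen's d (group_b minus group_a) with an approximate confidence interval."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_b) - statistics.mean(group_a)
    # Pooled variance, weighting each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    d = mean_diff / math.sqrt(pooled_var)
    # Large-sample approximation to the standard error of d.
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b)))
    return d, d - z * se, d + z * se

# Hypothetical measurements for a control and a treatment group.
control = [4.1, 5.0, 4.6, 5.3, 4.8, 4.4, 5.1, 4.7]
treatment = [5.2, 5.9, 5.4, 6.1, 5.0, 5.7, 5.5, 6.0]

d, low, high = cohens_d_with_ci(control, treatment)
print(f"d = {d:.2f}, approx. 95% CI = ({low:.2f}, {high:.2f})")
```

The width of the interval depends on both the sample sizes and (through the pooled standard deviation) the variability of the data, which is the dependence noted in point 3.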

So please report effect sizes and confidence intervals. Your field needs it far more than another example of P < 0.05.

Tuesday, February 25, 2014

Why would any scientists fuss over making their data public and accessible?

Well colour me naive. When PLoS announced their new data archiving policy a few days ago, I hardly felt like it was a "big deal". Providing a public archive of raw data used in a published study seems like a no-brainer (except in very limited circumstances with medical records, location data for at risk species, etc), and is becoming standard practice, right? Clearly my naivete knows no bounds, given some of the really negative reactions to it (here is one example, while the issue is discussed a bit more broadly here).

In the fields in which I am most active (at the intersection between Genomics, Genetics and Evolution), there have been numerous recent (and successful?) efforts to make sure data associated with studies becomes archived and publicly available in repositories. While they (the repositories) are not perfect, data archiving seemed to be working and generally useful, and I was always happy to do so myself. Yes, some of the issues with getting the data and meta-data formatted for NCBI GEO (or SRA) could be annoying at times, but this was such a minor concern relative to all of the other efforts in collecting and analyzing the data and writing the manuscript (and getting it accepted for publication) that the day spent so that it would be available to other researchers long term seemed pretty minor. Other scientists have always sent me reagents and (when they could find it) data, so this seemed like an easy way to be helpful and in line with the scientific process (and hopefully progress).

More importantly, I have tried to get data from other researchers over the years (with huge numbers of "old hard drive failures" always seeming to be the reason why it could not be made available to me). I have recently been involved with a large meta-analysis of previously published data. Rarely were the raw data available, and because only summary statistics were available (rarely with associated measures of statistical uncertainty), we were very limited in what analyses we could do. There would be so much more we could do if the raw data had been available.

So, I do not want other researchers to have to deal with these frustrations because of me. By archiving data generated by myself or members of my lab, other researchers can get it without hassling me, and I do not have to worry about finding it at a later date (like 10 years down the road), when it may take far more time to recover than putting it in a repository in the first place.

When most of the journals in evolutionary biology simultaneously started a data archiving policy (generally associated with DRYAD) a few years ago, I was quite happy. Not only did I put data from new studies up in DRYAD, but also from my older studies (see here). I naively expected most evolutionary biologists to do the same. After all, there are many long term data sets in evolutionary biology that would be of great value, in particular for studies of natural selection, and estimating G matrices, where there is still much active methodological development. Some of the published data sets required heroic efforts to generate, and would be a huge community resource.

So I was a little surprised when DRYAD was not rapidly populated by all of these great legacy datasets. I think that folks "hoarding" data are a very small minority, and the majority of folks were just very busy, and this did not seem like a pressing issue to them. In any case, I have also spent some effort at my institution (Michigan State University) discussing the importance of data archiving with students. All of the benefits seem obvious: making our science more open, and making our data available for those who may be able to address interesting and novel questions in the future. Fundamentally, it is the data and the analysis (and interpretation) that represent much of the science we do. Our scientific papers represent a summary of this work. Better to have it all (data, analysis and interpretation) available, no?

So, when PLoS made the announcement, this seemed like par for the course in biology. Funding agencies are mandating data management and sharing plans, other journals too. So who could be either shocked or dismayed by this?

Like I said, I can be naive.

Even after reading (and re-reading) posts (above) and discussion threads about concerns, I am still baffled. Yes there will be a few "corner cases" where privacy, safety or conservation concerns need to be considered. However for the vast majority of studies in biology this is not the case, or the data can be stripped of identifiers or variables to alleviate such issues, at least in many of these situations.

So what's the problem?
Does it require a little work on the part of the authors of the studies? Perhaps a little. However, I always remind folks in the lab that the raw data they generate, and the scripts they use for analysis will be made available. I find that to my benefit the scripts are much easier to read. Furthermore, keeping these issues in my mind makes it that much easier to get it organized for archiving. The readme files we generate make sure we do not forget what variable names mean, or other such issues. Handling data transformations or removing outliers in scripts means we can always go back and double check the influence of those observations.

In their post, DrugMonkey suggests that the behavioural data they generate in their lab is too difficult to organize in such a fashion as to be broadly useful. While I agree that the raw video (if that is what they are collecting) still remains difficult (although perhaps figshare could be used, which is what we will try for our behavioural trials), we find that the text files from our "event recordings" are very easy to post, organize and generate meta-data for. Does the data need to be parsed from its "raw" format, to one more useful for analysis? Sure, but we will also supply (as we do in our DRYAD data packages) the scripts to do so. Perhaps there is something I am missing about their concern. However, I do not concede their point about the difficulties about organizing their data for archiving. How hard should it really be to make such files useful to other experts in the field?

Even for simulations, we supply our scripts, configuration files, and sometimes data generated from the simulation (to replicate key figures that would take too long to generate by replicating the whole simulation).

It is always worth reminding ourselves (as scientists) of this quote (attributed to Sir William Bragg):

"The important thing in science is not so much to obtain new facts as to discover new ways of thinking about them"

Monday, November 11, 2013

pictures in my head: What is that on the wing of the fly? What does it tell us about adaptation?

Over the past week or so, an absolutely amazing image has made the rounds on the internet of a fly (Goniurellia tridens) with markings on its wings that remind many viewers of ants.

Goniurellia tridens is a 3-in-1 insect [photo: Peter Roosenschoon] pic.twitter.com/i8ThAOkrvN
— Ziya Tong (@ziyatong) November 4, 2013

As described in these blog posts and articles (Anna Zacharias, Jerry Coyne, Morgan Jackson, Andrew Revkin, Joe Hanson, also here and here) about it, the assumption is that these images are used by this fly to mimic an ant (or more likely a spider; more on this below), and so ward off potential predators. However, there has been relatively little discussion about the context in which the fly uses them (but see Morgan Jackson's post), or demonstration of their adaptive utility. As pointed out by many evolutionary biologists, and discussed in detail by Gould and Lewontin in one of the most famous papers in evolutionary biology (The Spandrels of San Marco and the Panglossian paradigm: A critique of the Adaptationist programme), it is easy to make a "just so" adaptive story, but as scientists we need to perform critical experiments demonstrating the adaptive utility of this picture on the wing.

As numerous commenters on the blogs and on Twitter have pointed out, this fly is part of the family of true fruit flies (Tephritidae), which includes several species that are known to startle jumping spiders (causing them to make a short retreat). This retreat is likely because the flies have evolved to mimic aggressive behaviours of the spiders themselves. This work was initially described over 25 years ago in a pair of papers in Science (one by Erik Greene, Larry Orsak and Douglas Whitman, the other by Monica Mather and Bernard Roitberg). These papers beautifully demonstrate the adaptive utility of markings on the wing combined with a rowing action of the wings that together could achieve this mimicry. Neither the markings on the wings nor the rowing behaviour alone was sufficient to induce the aversion behaviour in the spiders (the retreat). Indeed, those of us who took biology courses in university in the early to mid 1990s probably remember being taught this example. What's more, it seems to be fairly widespread among species in this family of flies (each research group used a different species of Tephritid fly and spider). Another paper (Oren Hasson 1995) tested about 18 different species of jumping spiders with the medfly (also a Tephritid), and showed that most spiders responded with the retreat as well. This suggests that this adaptive combination of wing morphology and behaviour is probably pretty ancient.

Here I want to show you a video of a picture-winged fly, with a jumping spider. This fly is from a totally different family of flies (the picture-winged flies Ulidiidae (formerly Otitidae)) than the ones discussed above (Tephritids), but apparently does the same thing to startle jumping spiders (as a way of escaping being eaten) as the true fruit flies.

A few years ago, when I was hosting a lab bbq in my backyard, we were lucky enough to get to watch the intricate little behavioural "routine" between a fly and a jumping spider (in this case the bold jumping spider, Phidippus audax). The spider approached the fly and got into its attack posture, then the fly did its "wing rowing" display, the spider "retreated" (took a short jump back), and the fly took off, successfully avoiding being eaten. Not too shabby, plus how often do you get to watch this for real?

Two years ago I got to watch this happen again, and this time I happened to have some collecting vials. So I collected the flies! I then put the fly in a small dish with a jumping spider (the zebra spider Salticus scenicus) so I could get some simple video of it. Here it is in all its grainy, low quality glory.

Given that I am not a great entomologist, I sent a picture of the fly off to a colleague (Jim Parsons, our collection manager in the MSU Entomology department), and he pointed out to me that this was not a true fruit fly (Tephritid) at all, but a picture-winged fly (from the family Ulidiidae). This particular fly is called Delphinia picta.

This was clearly really exciting, as it shows the potential for a whole other group of flies demonstrating a similar set of anti-predation behaviours. While both of these families belong to the same superfamily, their last common ancestor probably lived about 75 million years ago (give or take several million years). Is this an example of two different groups of animals independently adapting the same way (convergence) to a similar selective pressure (not getting eaten)? Or is it an adaptation that has survived for millions of years across many species? Or, finally, do some aspects of the behaviours and wing spots allow this to evolve as an anti-predator adaptation over and over again (parallelism)? Whatever it is, it suggests that something even deeper and cooler has happened in evolution, and it will be great to figure this out (hint to new graduate students seeking projects!). As my colleague Rich Lenski mentioned to me (when I showed him this video), it also makes one think carefully about the appropriate "null hypothesis" regarding putative adaptations!

In my lab, one of the things we study is the fly Drosophila melanogaster, and how it evolves in response to potential predators, including jumping spiders. Drosophila is the little fly that you used in high school or university biology. Many call it a fruit fly, even though it isn't (pomace fly and vinegar fly are both used as its common names). In Drosophila we have never observed this kind of behaviour at all. However, Drosophila does display a pretty wide range of behaviours, and we are writing up a paper about it right now. For a taste of some of it, check out my graduate student's poster over on figshare describing some of the behaviours.

Let me know if you want more, and maybe I can post some additional video. In the meantime, to whet your appetite, here is another related video that we posted a while ago to YouTube (of flies with a mantid). The action starts at about 2:30 into the video.