A recent post over on DATA COLADA suggested that the sample sizes required to estimate effect sizes appropriately are prohibitive for most experiments. In particular, this is the point they made:
"Only kind of the opposite because it is not that we shouldn’t try to estimate effect sizes; it is that, in the lab, we can’t afford to."
In response to their post, it has already been pointed out that even if an individual estimate of effect size (from a single study) has a high degree of statistical uncertainty (wide confidence limits), a combination of such estimates across many studies (in a meta-analysis) would have quite reasonable uncertainty, far smaller than for any single experiment.
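To make that point a bit more concrete, here is a minimal sketch in Python of fixed-effect, inverse-variance pooling, which is one standard way such estimates are combined (the per-study numbers are made up for illustration and are not taken from any of the posts discussed here). Each study's effect size is weighted by the inverse of its squared standard error, and the pooled estimate ends up with a much smaller standard error than any single study.

```python
import numpy as np

# Hypothetical effect size estimates (e.g., Cohen's d) and their
# standard errors from five small studies; values are made up.
d  = np.array([0.45, 0.30, 0.60, 0.38, 0.52])
se = np.array([0.22, 0.25, 0.24, 0.21, 0.23])

# Fixed-effect inverse-variance weights
w = 1.0 / se**2

# Pooled estimate and its standard error
d_pooled  = np.sum(w * d) / np.sum(w)
se_pooled = np.sqrt(1.0 / np.sum(w))

# Approximate 95% confidence intervals (normal approximation)
print("Single-study CI half-widths:", np.round(1.96 * se, 2))
print("Pooled estimate: %.2f, 95%% CI: [%.2f, %.2f]"
      % (d_pooled, d_pooled - 1.96 * se_pooled, d_pooled + 1.96 * se_pooled))
```

Running this, the pooled interval is several times narrower than any of the single-study intervals, which is the whole point of the meta-analytic argument.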
I think that there are a couple of other basic points to be made as well.
1) There is no reason not to report the effect size and its confidence intervals. They can be readily computed (a short sketch follows this list), so why not report them? Even if some folks still like to read tea leaves from p-values, the effect size helps in interpreting the biological (or other) effect of the particular variables under investigation.
2) The main argument from the DATA COLADA blog post seems to rest on the simulation they summarize in figure 2, and two points follow from it. First, for all the sample sizes they investigate in figure 2, the confidence intervals do not overlap zero, so the effect sizes (for all of the sample sizes reported) also demonstrate the "significant effect", but with considerable additional information. Second, there is no loss of information in just reporting the effect sizes with confidence intervals: you can always do a formal significance test as well, although most of the time it will not provide further insight.
3) The "acceptable" width of the 95% confidence intervals is a discipline specific issue (and the CIs are a function of not just sample size, but of the standard deviation for the observed data itself).
So please report effect sizes and confidence intervals. Your field needs it far more than another example of P < 0.05.
Empirical research is not really my strong suit, but wouldn't cross-validation somewhat counter the argument that sample sizes are too small to meaningfully estimate effect size?
Sorry it took so long to get back to you.
Cross-validation gets at the other side of statistics, namely making predictions. Generally speaking, cross-validation is one form of resampling that works well, particularly when there is sufficient data to split it into training and testing samples.
In some sense, though, even the non-parametric bootstrap can be considered a one-fold form of cross-validation, and it can be used to estimate confidence intervals for estimates of effect sizes.
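For what it is worth, here is a minimal sketch of that last idea in Python (simulated data and percentile intervals, just to show the mechanics): resample each group with replacement, recompute the effect size each time, and take the 2.5th and 97.5th percentiles of the bootstrap distribution as an approximate 95% confidence interval.

```python
import numpy as np

rng = np.random.default_rng(2)

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    s_pooled = np.sqrt(((n1 - 1) * np.var(x, ddof=1) +
                        (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2))
    return (np.mean(x) - np.mean(y)) / s_pooled

# Simulated data standing in for two experimental groups
x = rng.normal(0.5, 1.0, 40)
y = rng.normal(0.0, 1.0, 40)

# Non-parametric bootstrap: resample each group with replacement
boot = np.array([
    cohens_d(rng.choice(x, size=len(x), replace=True),
             rng.choice(y, size=len(y), replace=True))
    for _ in range(5000)
])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"Observed d = {cohens_d(x, y):.2f}, bootstrap 95% CI [{lo:.2f}, {hi:.2f}]")
```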