Some thoughts about measuring the goodness of peer review…

A few weeks ago I attended a postdoc training about responsible conduct in research. A major focus of the event was an emphasis on being unbiased and avoiding any conflict of interest when reviewing a manuscript or a grant. Naturally, that state seems very much desirable. However, some of the case studies we discussed left me with a bad aftertaste: it seemed as if the concern about conflict or bias was massively outweighing the fact that peer review can also provide added value to science. In my (limited) personal experience with peer review, I have found reviews that were comprehensive and thoughtful (even if they were negative) much more valuable and constructive for my research than any of the 3-liners declaring my paper to be excellent. This dichotomy got me thinking about the purpose of peer review and its relationship to science and the publishing process. Here are a couple of points I’ve come up with:

First off, there seems to be a general agreement that the current peer review system is broken.

The current peer review system is broken... (picture by Nesster via flickr)

Aside from more exotic (read: not widely used) forms of peer review, such as open peer review by F1000 or informal peer-review through pre-print servers, such as bioRxiv, more and more traditional journals are responding to the demands of the scientific community and are experimenting with new forms of review that aim to reduce biases. To name a few recent examples:
-> This year the Nature Publishing Group decided that most of its journals should join a growing list of publications that offer double-blind peer review (i.e. where the authors don’t know the reviewers and the reviewers don’t know the authors; you can read some thoughts on this here and here).
-> Similarly, according to an informal survey on Twitter, the Genetics Society of America (GSA) is apparently following in the footsteps of the EMBO Journals and is considering transparent peer review (i.e. where every paper comes with a supplementary file that contains reviews and editorial letters).
-> And maybe the most interesting development has been an announcement by Molecular Cell, where the editors have decided to blind themselves to the names of submitting authors during their initial assessment of papers.

Yet what remains unclear is just how broken the system is, and how much improvement these measures will bring. There seems to be only a limited effort to develop a metric that quantifies the impact of biases on acceptance, or that measures the effect of a new peer review mode. Certainly, there have been some attempts, such as the New Scientist’s 2010 analysis of papers on iPS cells, which found that US-based scientists got their papers published faster than their foreign colleagues, or two studies that debated whether the representation of female first authors increased after a journal introduced double-blind review (yes and no). But overall, the best current measure for the success of a given mode of review seems to be whether or not authors “accept” that mode (i.e. whether they opt in or opt out). This seems ridiculous. Surely, journals must be sitting on a wealth of data about rejected and accepted papers pre- and post-change, and it must be possible to develop a more telling metric.

Finding the champion of unbiased review will be tricky in the absence of a good metric (photo by Manitoba Coupon Maven via flickr)

To be clear: I don’t know what the ideal metric would be. Maybe the representation of authors who are female/foreign-sounding/famous/junior amongst accepted papers should be the same as amongst the submitted ones? But this would likely not be the case even for perfectly unbiased peer review. As Boyan Garvalov put it: “renowned scientists [comment: or scientists in the Western world with access to better resources] have not won their reputations on the lottery, they have earned them through talent and hard work; […] So good scientists will continue to publish in good journals” – regardless of the peer review system. Alternatively, maybe one could compare the representation of different author groups across journals with different types of peer-review, but one would need to be careful that the representation of these groups be the same in the fields the different journals cater to.
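To make the first idea concrete, here is a minimal sketch of how one might compare the share of each author group among accepted papers with its share among submitted papers. Everything in it is an assumption for illustration: the function name `representation_ratio`, the group labels, and the numbers are made up and do not come from any journal. As argued above, a ratio different from 1 would not by itself demonstrate bias.

```python
from collections import Counter


def representation_ratio(submitted, accepted):
    """Return, per group, (share among accepted) / (share among submitted).

    `submitted` and `accepted` are lists with one group label per paper.
    A ratio of 1.0 means the group is equally represented in both sets.
    """
    if not submitted or not accepted:
        raise ValueError("both lists must contain at least one paper")

    submitted_counts = Counter(submitted)
    accepted_counts = Counter(accepted)

    ratios = {}
    for group, count in submitted_counts.items():
        share_submitted = count / len(submitted)
        share_accepted = accepted_counts.get(group, 0) / len(accepted)
        ratios[group] = share_accepted / share_submitted
    return ratios


# Hypothetical numbers: 100 submissions, 20 acceptances.
submitted = ["junior"] * 60 + ["senior"] * 40
accepted = ["junior"] * 9 + ["senior"] * 11

ratios = representation_ratio(submitted, accepted)
for group, ratio in sorted(ratios.items()):
    print(f"{group}: {ratio:.2f}")
```

With these invented numbers, junior authors come out at about 0.75 and senior authors at about 1.375. Computing the same ratios before and after a journal changes its review mode would be one crude way to see whether anything shifts, subject to all the caveats in the paragraph above.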

In short: there seems to be no good metric (or at least I have not come across one). Yet, the absence of a metric is problematic: it makes it near impossible to assess whether a given measure is successful in alleviating bias. (For example, I’d be curious to know if there is a larger impact when editors are blinded to author names or when reviewers are blinded.) In addition, in the absence of a metric we might end up chasing the ideal of perfect peer review (which may or may not exist), instead of focusing on what peer review should actually be about: improving the reviewed science.


One thought on “Some thoughts about measuring the goodness of peer review…”

  1. In my opinion, the only reliable measure of peer review is the same as the only reliable measure of scientific articles: independent critical assessment. Such assessment requires that journals publish peer reviews and facilitate transparent discussions.

    Double-blind peer review is not always possible and is always incompatible with preprints. I think it is a step in the wrong direction.
