Why don’t negative results get published?

On a recent AskMe thread discussing a Science article on gender and collective intelligence, someone commented:

I read an article not too long ago about how studies that find fewer/no gender differences are significantly less likely to be published, and are often actively discouraged from publication. I thought I’d saved it, but I didn’t. Anyone know what I’m talking about?

Well, I don’t know the specific article, but there’s little doubt that this is true throughout science. Publishing negative results just doesn’t happen very often. Historically, I suppose there were reasons for this. As I’m banging my head against a problem, I may try 10 different approaches before finding one that works well. If each of those failures was a paper or even a paragraph, it would have made old paper journal subscriptions rather unwieldy and much less useful.

Now that everything is online, though, a handful of scientists are starting to stand up and say “hey, we should be announcing our failures as well, so others aren’t doomed to make the same mistakes”. In my opinion, these people have an excellent point.

There are two major ways that this can come about. The first is to encourage more openness when publishing papers. In the methods, or at least the supplement, authors could include a decent description of which techniques turned out not to be helpful and why they might have failed. This isn’t common practice now, mostly for reasons of communication and reputation. Journal articles are always written as though the experiments were a nice, linear process. We did A, then B, then got result C. This isn’t a very accurate description of the process, and everyone knows it, but it makes those involved look smart. (I suppose if you’re clawing your way towards tenure or angling to land a good post-doc position, you don’t necessarily want to broadcast your failures.) The more valid claim is that writing articles this way makes for a nice, easy-to-communicate story. Still, there’s no reason why more comprehensive supplements shouldn’t be added.

The second way to better announce negative results is to practice open-notebook science, where methods and raw data are published to the web in real time (or after a short delay). What’s holding this back is that scientists worry that by revealing too much, they will give their competitors a leg up and let them publish the next big paper first. In this era of crushingly low paylines, where fewer than 20% of NIH grant applications get funded, things have gotten pretty cutthroat. Stories of being “scooped” abound, and although some people feel that these claims are exaggerated, it can happen, sometimes with career-tanking results.

So to make a long story short: no, negative results aren’t often published, even though doing so would probably be a boon to the scientific enterprise as a whole. The good news is that there’s a pretty strong movement underway that is slowly making science more open, transparent, and reproducible.

Dumping my code: the good, the bad, and the ugly

Writing code for use in biological research is a lot different from writing code for a software company. First of all, 99% of the time, I’m the only person who will ever see it. Lots of my work involves one-off scripts to parse data that I’ll never deal with again. Furthermore, cleaning up code and testing it prior to release is hard, time-consuming work. If it’s not moving me towards meaningful results or a publication, there just isn’t much incentive to do it.

The irony of all this is that reproducibility is one of the foundations of science, and better code release would go a long way towards making that a reality in today’s high-throughput world. Furthermore, without great open-source projects like R/Bioconductor and BioPerl, bioinformatics would just be crawling along as everyone wasted valuable time trying to build everything up from foundations.

All of this had been floating around in the back of my mind for some time when I stumbled across Matt Might’s post describing the CRAPL. He proposes that we stop worrying about quality and release even our ugly, hacky code, provided that it works. The CRAPL itself is a parody of more established licenses and contains gems like these:

- “The Author” probably refers to the caffeine-addled graduate student that got the Program to work moments before a submission deadline.

. . .

- You agree to hold the Author free from shame, embarrassment or ridicule for any hacks, kludges or leaps of faith found within the Program.

So in the spirit of the CRAPL, I’m starting to dump some of the useful code snippets I’ve written over the years in various places. I’ll package up a few of the larger ones and post them on my GitHub page, but most of the little one-off scripts will just be slapped up here on the blog, under the codedump tag.

I’ll just go ahead and throw it all under the MIT License so you can use it and abuse it any way you want. If I can save someone else a few hours of work, it’s worth it. Happy Hacking!

Lessons for article recommendation services

Today someone proposed the creation of a sub-reddit where scientists could recommend papers to each other. While it’s a nice thought, I can almost guarantee that it’s going to be a failed effort. There are already sites like Faculty of 1000, which try to use panels of experts to recommend good papers. In my experience, they mostly fail at listing things that I want to read.

The main reason such sites are useless is that we scientists are uber-specialized, so what you think is the greatest paper ever will likely hold very little interest for me. It’s not that I don’t want to read about cool discoveries in other fields, it’s just that I don’t have time to. Until they invent the Matrix-esque brain-jack for rapid learning, I have to prioritize my time, and my field and my work will always come first.

There are only two kinds of systems I’ve found that work well. The first is recommendation systems based on what you’ve read in the past and what your colleagues are reading. CiteULike, for example, recommends users who have bookmarked papers similar to yours, and perusing their libraries gives me an excellent source of material. The other quality source of recommendations is FriendFeed, where I can subscribe to the feeds of other bioinformaticians with similar interests, and we can swap links to papers and comments about those papers.
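To make the first kind of system concrete, here is a minimal sketch of "find users whose bookmarks look like mine". I have no idea how CiteULike actually computes its recommendations; this simply assumes each user's library is a set of paper IDs and scores overlap with Jaccard similarity. The function names, user names, and paper IDs are all made up for illustration.

```python
def jaccard(a, b):
    """Overlap between two sets of bookmarked papers, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_users(me, libraries, top_n=3):
    """Rank other users by how much their bookmarks overlap with mine."""
    mine = libraries[me]
    scores = [
        (jaccard(mine, papers), user)
        for user, papers in libraries.items()
        if user != me
    ]
    # Highest overlap first; ties broken alphabetically by user name.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(user, score) for score, user in scores[:top_n]]


# Hypothetical libraries: user name -> set of bookmarked paper IDs.
libraries = {
    "me": {"p1", "p2", "p3"},
    "alice": {"p2", "p3", "p4"},
    "bob": {"p9"},
    "carol": {"p1", "p2", "p3", "p5"},
}

for user, score in similar_users("me", libraries):
    print(f"{user}: {score:.2f}")
```

The users at the top of the list are the ones whose libraries are worth browsing. Someone in a different sub-field, like the hypothetical "bob" here, scores zero and drops to the bottom, which is the whole point.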

Both of these systems are all about building micro-communities, with a focus that you can’t achieve in larger communities like Reddit. In this way, it’s sort of like a decentralized version of departmental journal clubs, or specialized scientific conferences. Any site that ignores the value of creating this type of community is pretty much doomed to failure from the start.

(reposted from my personal blog)
