Fun with Pubmed

After seeing Neil’s post about increasingly novel findings in the literature, I decided to modify the code a little bit and do some searches of my own. These graphs show the number of times that each term appeared in either the title or abstract of an article listed in PubMed. I limited the results to the last 60 years (well, okay, 59).

First of all, it appears that having achieved better living through chemistry, we’ve moved squarely into the age of biology.

If I want to blend in with the crowd, I should call myself a computational biologist, not a bioinformatician:

Finally, it’s all composed of the same stuff, but DNA beats genome handily.

These were just three queries off the top of my head, and are not intended to be rigorous analyses.
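For anyone curious, counts like these can be pulled from NCBI's E-utilities. This is a minimal sketch of the kind of query involved, not the exact code from Neil's post; the endpoint and parameters are the standard esearch ones, and the term syntax restricts hits to title/abstract for a single publication year.

```python
import urllib.parse

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_url(term, year):
    """Build an esearch URL counting title/abstract hits for one publication year."""
    params = {
        "db": "pubmed",
        "term": f"{term}[Title/Abstract] AND {year}[PDAT]",
        "rettype": "count",
    }
    return ESEARCH + "?" + urllib.parse.urlencode(params)
```

Looping `build_url` over, say, `range(1950, 2010)`, fetching each URL, and parsing the `<Count>` element out of the XML response gives the per-year series behind graphs like these.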

Quick Links

Using Impact Factor is the lazy way out

I help moderate our graduate program’s weekly journal club, and some of the faculty involved are proposing strict guidelines on which journals can be used. Specifically, they’d like to restrict it to only journals that have an Impact Factor of ten or higher. I think that’s a horrible idea, so I responded thusly:

I oppose excluding papers based on impact factor, because it’s a seriously flawed metric. As an example, the journal “Acta Crystallographica A” has a current impact factor of 49.93.[1] Of 72 articles published in 2008, 71 garnered no more than three citations each, while a single article racked up 5,624 citations and skewed the metric.
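The skew in that example is easy to see with toy numbers. Here the citation counts are hypothetical, chosen only to mirror the distribution described above: an IF-style average is dragged far above what a typical article in the journal actually receives.

```python
# Hypothetical citation counts mirroring the Acta Cryst. A example:
# 71 articles with at most three citations each, plus one massive outlier.
citations = [3] * 71 + [5624]

mean = sum(citations) / len(citations)           # what an IF-style average sees
median = sorted(citations)[len(citations) // 2]  # what a typical article gets

print(mean, median)  # ~81.1 vs 3
```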

On the other hand, a new breed of journals (like PLoS One) publish a huge number of papers and rely on post-publication statistics to measure impact. This inflates the denominator of the metric and leads to a low IF, even though there are certainly some fine papers in that journal.

As further evidence, a 2009 paper performed Principal Component Analysis on 39 different metrics of scholarly impact.[2] The authors concluded that Impact Factor sits at the periphery of these rankings, which should lead us to carefully consider how much we rely upon it.

In short, I don’t think we as moderators should take the lazy way out and allow only so-called “top-tier” journals. Surely, out of all of us, at least one or two can spare 5 minutes each week to skim the article and judge it on its merits.


Q: Is Ozzy the first rock star to have his full genome sequenced?

Conde: Yes, as far as I know. I can definitely tell you he’s the first prince of darkness to have his genome sequenced and analyzed.

— Jorge Conde, in SciAm

This came through my feed reader today and I had to laugh. As we might expect, they didn’t find all that much that was interesting in Ozzy’s genome, but it sure was great publicity for Knome and Cofactor.

With the announcement of the PGP1000 and the recent release from the 1000 Genomes Project, the number of personal genomes has now easily reached several thousand. When I started graduate school 5 years ago, a single genome cost well over a million dollars and sequencing one guaranteed you the cover of Nature or Science. Now it’s just something that moderately wealthy people do on a lark.

Why don’t negative results get published?

On a recent AskMe thread discussing a Science article on gender and collective intelligence, someone commented:

I read an article not too long ago about how studies that find fewer/no gender differences are significantly less likely to be published, and are often actively discouraged from publication. I thought I’d saved it, but I didn’t. Anyone know what I’m talking about?

Well, I don’t know the specific article, but there’s little doubt that this is true throughout science. Publishing negative results just doesn’t happen very often. Historically, I suppose there were reasons for this. As I’m banging my head against a problem, I may try 10 different approaches before finding one that works well. If each of those failures was a paper or even a paragraph, it would have made old paper journal subscriptions rather unwieldy and much less useful.

Now that everything is online, though, a handful of scientists are starting to stand up and say “hey, we should be announcing our failures as well, so others aren’t doomed to make the same mistakes”. In my opinion, these people have an excellent point.

So there are two major ways that this can come about. The first is to encourage more openness when publishing papers. In the methods, or at least the supplement, authors could include a decent description of which techniques turned out not to be helpful and why they might have failed. This isn’t common practice now, mostly for reasons of communication and reputation. Journal articles are always written as though the experiments were a nice, linear process: we did A, then B, then got result C. This isn’t a very accurate description of the process, and everyone knows it, but it makes those involved look smart. (I suppose if you’re clawing your way towards tenure or angling to land a good post-doc position, you don’t necessarily want to broadcast your failures.) The more valid claim is that writing articles this way makes for a nice, easy-to-communicate story. Still, there’s no reason why more comprehensive supplements shouldn’t be added.

The second way to better announce negative results is to practice open-notebook science, where methods and raw data are published to the web in real time (or after a short delay). What’s holding this back is that scientists worry that by revealing too much, their competitors will get a leg up and publish the next big paper before they can. In this era of crushingly low paylines, where less than 20% of NIH grant applications get funded, things have gotten pretty cutthroat. Stories of being “scooped” abound, and although some people feel that these claims are exaggerated, it can happen, sometimes with career-tanking results.

So to make a long story short, no, negative results aren’t often published, even though doing so would probably be a boon to scientific enterprise as a whole. The good news is that there’s a pretty strong movement underway that is slowly making science more open, transparent, and reproducible.

The problem of prioritization

Dan Koboldt just put up a great post about sifting through the hundreds of mutations that we’re finding in each genome to find those that actually, y’know, mean something.

we are facing two significant challenges. First, identifying the subset of variants that have functional significance – separating the wheat from the chaff, if you will. Second, understanding how these functional variants contribute to a phenotype. This is soon to be the frontier in genetics and genomics.

I couldn’t agree more, and since my comment on his post got to be a little longer than I intended, I decided to reproduce it (edited slightly) over here.

I’ve been tackling similar ideas as part of my thesis work. Specifically, we’ve been developing tools that go beyond simple recurrence and look at mutational patterns that can give insight into the significance and functional role of mutations.

The easiest one to think about is mutual exclusivity. If I have part of an oncogenic pathway with two genes (A and B), then we expect that mutations in either one may be enough to disrupt the system, and there will be no selective pressure for mutation in the other. So if we assay a panel of tumors and see that half the tumors have a mutation in gene A, and the other half have a mutation in gene B, with no overlap, it’s quite likely that the mutations play similar functional roles. By detecting these patterns, we can create testable hypotheses about how genes interact, even if they’re not represented in functional databases.
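As a toy illustration of that pattern (this is not the actual tool from my thesis work, just a sketch of the idea), one can compare the observed number of co-mutated tumors against what independence would predict; a real analysis would follow up with a significance test such as Fisher's exact test.

```python
def exclusivity_counts(mut_a, mut_b):
    """Return the observed number of co-mutated tumors and the count
    expected if the two genes were mutated independently."""
    n = len(mut_a)
    observed = sum(a and b for a, b in zip(mut_a, mut_b))
    expected = sum(mut_a) * sum(mut_b) / n
    return observed, expected

# Hypothetical panel of 10 tumors: gene A mutated in the first five,
# gene B in the other five, with no overlap.
gene_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
gene_b = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

obs, exp = exclusivity_counts(gene_a, gene_b)
# obs = 0 vs exp = 2.5: far fewer co-mutations than chance predicts,
# hinting that the two genes disrupt the same pathway.
```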

It’s also important to remember that pathways can be disrupted in multiple ways. Exome sequencing to find point mutations may not be enough, as we know that copy-number alterations may lead to altered expression levels, or aberrant methylation may cause dysregulation. An integrative approach is going to be key as we move forward.

So my point is, yeah, there are absolutely people working on improving this process and doing a better job of prioritizing these mutations for in vivo validation. It’s an exciting place to be working right now, as it’s a major bottleneck preventing us from translating ubiquitous sequencing into personalized medicine. I’m glad to be working here in the thick of it.

Scientific Software

Two fairly interesting articles about scientific software appeared in Nature News this week. The first is an exhortation to scientists that can be summed up as: “Your code is good enough – publish it already!” (sounds familiar)

The second is a nice piece by Zeeya Merali warning about the dangers that lurk in scientific programs. The article gives a few examples of papers that had to be retracted because of buggy software, and then does a decent job of summing up many of the problems: untested code, poor documentation, and journals that don’t require code release along with publications.

While she talks about some potential solutions, including openness, better training, and collaborations with trained computer scientists, I feel like she glosses over a crucial point: The reason that we don’t have better scientific software is because it isn’t well-incentivized by the scientific community.

Grants are awarded for work on sexy diseases, not for reliable and robust software engineering. Don’t get me wrong – efforts like BioPerl and Bioconductor are fantastic community resources, but I’d argue that they’re examples of how good things can be despite a lack of systematic support. Allocating serious funding to groups that can produce platforms of solid, well-tested bioinformatics code would go a long way towards helping data science keep pace with the deluge of biological data that’s surging towards us.

Bug Report

computer bug
This is a moth that shorted out a relay in one of the world’s first computers.

A professor of mine once flashed this image up on the screen and told us that when a moth causes problems, it’s a bug. Those mistakes in your program? Those are errors, and they’re your fault.

image via the Wikimedia commons

Microbial Medicine

Today Jonathan Eisen makes some good points about probiotics being sold like snake oil miracle cures.

I’ll agree that it’s a growing problem. I often find people on message boards advocating probiotic treatments, despite the lack of evidence-based support for such actions. I always tell them the same thing: Popping some Lactobacillus acidophilus isn’t likely to hurt, but there’s little proof that it will help you, either.

Lactobacillus acidophilus

Someone was even asking about fecal transplant bacteriotherapy the other day, after reading Carl Zimmer’s excellent article about how it can treat severe Clostridium difficile.

Look, I totally understand why people grasp at straws and search for anything that might help. Living with a chronic condition that has no cure sucks. Wasting your money on treatments that don’t help, though, will only make you lighter in the wallet and still sick. The problem with probiotics in this case is two-fold:

The first problem is that we just don’t understand the gut well enough to start trying to therapeutically change the composition of your microflora. It’s an enormously complex ecosystem, and we’re just now dipping our toes into the water with initiatives like the Human Microbiome Project. Scientists are studying this with a great deal of interest, though. In fact, the other day I stumbled across a study showing that by treating rats with antibiotics, then doing a fecal transplant, a team could create long-term alterations in the rats’ microbiota. This is pretty exciting stuff, but it’s only the first of many advances that will be needed to prove the feasibility of bacteriotherapy. The next steps will be to figure out which microbes we want in our guts, which we don’t, and how we can create a stable environment for those beneficial species. This will undoubtedly take a while.

The other problem with using probiotics to treat gut conditions is that most of these disorders (IBS/IBD/Crohn’s) have both genetic and environmental factors that contribute to the aetiology. I’m most familiar with Crohn’s Disease, which occurs when cells in the intestinal walls aren’t able to keep microbes out properly, so the immune system kicks into overdrive. So far, it hasn’t been linked to any specific species, which seems to make it a poor target for microbial therapy. Add in the fact that there are often systemic inflammatory problems like arthritis and back problems, and it’s not at all clear that we can treat this by just swapping around the microbes in the gut.

So to sum things up, using probiotics to seriously treat disease will have to wait until we better understand both the diseases in question and host/microbe interactions. That said, research into the microbiome is enormously promising, and I do believe that microbial medicine will someday be commonplace. It’s just that we still have a long way to go.

image source

Transparent histograms redux

The other day, I posted an R function that simulates transparency where histograms overlap. In the comments, Alice left this tip:

You could also use transparent colors, which could mean less programming effort:
e.g. col=rgb(0, 1, 0, 0.5) … the 4th argument is the degree of transparency, alpha

*smacks forehead* That’s a lot simpler than my solution. I didn’t know R supported transparency like that. So instead of using the function I wrote, I could have just done something like this:

a = rnorm(1000, 3, 1)
b = rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=TRUE, col=rgb(0, 1, 0, 0.5))  # alpha of 0.5 keeps the overlap visible

And gotten the following output:
transparent histograms

Much simpler.
