Using BioRuby to fetch citations from PubMed

December 13th, 2010

As I write my thesis, I’m pulling together information from multiple papers that I’ve been an author on. These papers are in different journals, with wildly different citation styles. Within my thesis, though, citations need to be presented in a consistent format.

Now, if I had all of these citations stored in a reference manager, it would be a piece of cake to just export them all in a common format. Unfortunately, I wasn’t the first author on some of these papers, so I don’t have that data. Fortunately, with PubMed and a little BioRuby, it’s not too difficult to get it.

Assuming that the paper is archived in PubMed Central, there is a link on the right-hand side of the page titled “References for this PMC Article”. If you click on it, then tweak the display options at the top, you can retrieve the PMIDs for all of the articles that were cited.

Save those as a list, one PMID per line, then use the following BioRuby code to pull down each citation in BibTeX format, for easy import into a citation manager:


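#!/usr/bin/ruby

require "rubygems"
require "bio"

# ARGV[0] is a text file containing one PMID per line.
File.foreach(ARGV[0]) do |line|
  id = line.strip                    # drop the trailing newline
  next if id.empty?                  # skip blank lines
  entry = Bio::PubMed.query(id)      # fetch the MEDLINE-format record
  medline = Bio::MEDLINE.new(entry)  # parse the record
  reference = medline.reference      # convert it to a Bio::Reference
  puts reference.bibtex              # print it as a BibTeX entry
end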
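
A list saved from the browser can pick up blank lines, stray whitespace, or duplicate IDs. Here is a minimal sketch of tidying it before querying, assuming PMIDs are plain strings of digits. Note that clean_pmids is a hypothetical helper name of my own, not part of BioRuby:

```ruby
# Hypothetical helper (not part of BioRuby): normalize a saved PMID list.
# Assumes each PMID is a string made up only of digits.
def clean_pmids(lines)
  lines.map { |line| line.strip }          # drop newlines and surrounding spaces
       .select { |id| id =~ /\A\d+\z/ }    # keep only all-digit entries
       .uniq                               # keep each PMID once, in original order
end
```

Something like clean_pmids(File.readlines(ARGV[0])) would then give a list of IDs that can be fed to the query loop one at a time.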

I’ll spare you the story of how I started out trying to use regexen to parse through the text of the citations. I pulled together something that sort of worked, but it required a separate regex for each journal, and often returned multiple results that I had to manually disambiguate. Yuck.

Thanks go to Martin over at FriendFeed for letting me know that PMC had citation info.

