Bioinformatics in five years

Over at BioStar, Keith asked:

In five years time, how would the bioinformatics landscape be and what will probably be the main focus(es) in bioinformatics i.e the hottest areas in bioinformatics?

Perhaps you’re looking for daring predictions, but I see lots of incremental progress, especially on the following fronts:

If by “hottest”, you mean “number of employees”, I think that there will be a large number of openings for Masters-level (or lower) bioinformatics staff. These are the folks who will handle routine munging of huge data sets at most sequencing centers. At present, a lot of that is still handled by either PhDs or grad students. As tools and standards get entrenched, though, you’ll see more and more offloaded to technical staff.

There’s bound to be a lot of movement in the health informatics field, building tools that can take in your personal genome sequence and spit out useful medical advice (in a format that’s useful to both patients and clinicians). This involves not only genomics skills, but also mining of the medical literature and building useful and searchable databases.

Though systems biology has been muted a little as the hype wears off, it’s poised to undergo a huge leap forward. With high-throughput data from tens or hundreds of thousands of cells, our models of how the cell works at a network or pathway level are only going to improve.

Other things that will be in demand:

Database and other “big data” skills – how are you going to store and access data from millions of genomes? We’re talking petabytes of information here.

Visualization – the larger the data gets, the less we’re able to really wrap our heads around it. A few good pictures can often tell us more than a million lines of data.

Truly interdisciplinary scientists. Not CS people who picked up a little bit of biology, or Bio majors who hack a little Perl. We’re going to see the first generation of scientists who have really been trained to straddle the boundary between the two. They’re going to be well-poised not only to do solid research on their own, but also to be the lynchpins of successful collaborations.

Now, if you asked me where I saw the state of genomics in 5 years, or the state of cancer research, I think I’d have much bigger, bolder predictions. I just don’t see that the basic computational and statistical skillsets that bioinformaticians use today are likely to change tremendously. They’ll just get applied to bigger data, become more parallel, and be more in demand.


It’s a cliché that too little American talent goes into science, and that too many people go into banking, and that our education system is said to be failing because of those effects. To some substantial effects it must be true.

I found myself on a business trip to Europe and lucky for me I was sitting in business class. Seated not far from me in business class was a young woman who had graduated from Harvard 14 months before and who was working for a major financial institution and she was traveling to Europe and when people travel for that major financial institution they travel in business class. I like to walk when I’m on a plane, so I wandered. I walked back to coach and on that airplane in coach was a distinguished physicist who I had known when I was president of Harvard who I think probably is close to even money to win a Nobel Prize one day. He was going to a conference like professors of physics do and he was going like professors of physics like him go, which is in coach. And I didn’t say anything to either of them, but I thought to myself there was something odd about the reward structure of our society.

–Lawrence Summers

as quoted here

Lessons for article recommendation services

Today someone proposed the creation of a sub-reddit where scientists could recommend papers to each other. While it’s a nice thought, I can almost guarantee that it’s going to be a failed effort. There are already sites like Faculty of 1000, which try to use panels of experts to recommend good papers. In my experience, they mostly fail at listing things that I want to read.

The main reason such sites are useless is that we scientists are uber-specialized, so what you think is the greatest paper ever will likely have very little interest for me. It’s not that I don’t want to read about cool discoveries in other fields, it’s just that I don’t have time to. Until they invent the matrix-esque brain-jack for rapid learning, I have to prioritize my time, and my field and my work will always come first.

There are only two systems I’ve found that work well. The first is the kind of recommendation system based on what you’ve read in the past, and what your colleagues are reading. CiteULike, for example, recommends users who have bookmarked papers similar to yours, and perusing their libraries gives me an excellent source of material. The other quality source of recommendations is FriendFeed, where I can subscribe to the feeds of other bioinformaticians with similar interests, and we can swap links to papers and comments about those papers.

Both of these systems are all about building micro-communities, with a focus that you can’t achieve in larger communities like Reddit. In this way, it’s sort of like a decentralized version of departmental journal clubs, or specialized scientific conferences. Any site that ignores the value of creating this type of community is pretty much doomed to failure from the start.
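For the curious, the first kind of system (finding users whose bookmarks overlap with yours, then browsing their libraries) can be sketched in a few lines of Python. To be clear, this is only an illustration of the general idea, not how CiteULike actually works: I’m assuming Jaccard overlap as the similarity measure, and the function names, user names, and paper IDs are all made up.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of papers two libraries have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_users(me: str, libraries: dict[str, set[str]],
                  top_n: int = 3) -> list[tuple[str, float]]:
    """Rank other users by how much their library overlaps with mine."""
    mine = libraries[me]
    scores = [(user, jaccard(mine, papers))
              for user, papers in libraries.items() if user != me]
    # Drop users with no overlap at all; they tell us nothing.
    scores = [(user, score) for user, score in scores if score > 0]
    # Highest overlap first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


def suggest_papers(me: str, libraries: dict[str, set[str]],
                   top_n: int = 3) -> list[str]:
    """Suggest unread papers from the most similar users' libraries."""
    mine = libraries[me]
    weights: dict[str, float] = {}
    for user, score in similar_users(me, libraries, top_n):
        for paper in libraries[user] - mine:
            # A paper counts for more when bookmarked by closer neighbours.
            weights[paper] = weights.get(paper, 0.0) + score
    return sorted(weights, key=lambda paper: (-weights[paper], paper))


# Hypothetical bookmark libraries, keyed by user name.
libraries = {
    "me": {"paper_a", "paper_b", "paper_c"},
    "alice": {"paper_a", "paper_b", "paper_d"},
    "bob": {"paper_c", "paper_e"},
    "carol": {"paper_x", "paper_y"},
}

print(similar_users("me", libraries))   # alice first, then bob; carol is dropped
print(suggest_papers("me", libraries))  # paper_d ranks ahead of paper_e
```

Notice that carol, who shares nothing with me, never shows up. That is the micro-community effect in miniature: the recommendations only ever come from people who already read what I read.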

(reposted from my personal blog)