A few months ago, I meandered over to Rice to see a talk from Mark Maxham, who manages the instrument software group at PacBio. He gave a good basic overview of their technology, which works by fixing DNA polymerase molecules at the bottom of tiny wells, which they call zero-mode waveguides.
The size of these wells is key, because they’re smaller than the wavelength of the light being beamed in. Thus, the laser illuminates only the very bottom portion of the well, right where the polymerase sits. Then they feed in single-stranded DNA and fluorescently labeled nucleotides, and watch the bases being incorporated in real time.
Before I dive into the details of the sequencer, let me say that he talked a lot about the engineering challenges posed by working on such a minute scale, and it’s quite impressive that the system works at all. In addition to having super-sensitive CCDs to capture the tiny amounts of fluorescence given off as a base is incorporated, they have to hold those sensors almost perfectly still to avoid crosstalk between wells. Then, once they get the input from the sensors, they use Hidden Markov Model-based algorithms to process it and extract the signal from all the noise.
All of that is just engineering talk, though. The real question is, as a consumer of genomic data, what does this give me, and how does it stack up to the other sequencing platforms that are in development?
Read lengths on the PacBio machine are impressive – Maxham showed data from the E. coli genome where they had average read length of 500bp, and maximum read lengths of 3.2 kbp. That’s long enough to get across almost any repeat region, meaning that doing assembly from this platform will be a breeze. In order to get even longer reads for scaffolding, they also have a “pulsed mode” where the laser turns on and off at regular intervals. This produces reads containing short gaps and doubles the maximum read length to about 6.4 kbp.
The limiting factor in the length of reads they get from the system turns out to be the polymerase. The laser light used for detection heats up the system and “burns out” the polymerase after a while. In order to increase this read length, they’re going to need to either engineer a more robust polymerase or produce more sensitive CCDs so that they can lower the intensity of the laser. I’m sure they’re working on both approaches.
Another nice feature about the PacBio system is the ability to circularize reads by placing “dumbells” (which they call SMRTbells) on both ends of a double-stranded read.
Poor naming choice aside, it’s a clever little system. They prime inside one of the dumbells and as the polymerase reads this dsDNA, it’s unzipped into a circular fragment. The polymerase will happily read this fragment over and over. until the aforementioned burnout occurs. This gives very deep coverage of short sequences, and drives the error rate to almost nil.
I asked if there was any sequence specificity, and whether it has trouble handling any particular sequence motifs. (454, for example, has poor handling of homopolymer runs). He replied no, and that the only thing he knew of was that it performs slightly better on GC-rich content. I don’t know this for a fact, but I’d hypothesize that because of the extra hydrogen bond, the guanines and cytosines spend a little bit more time inside the active site of the polymerase, which gives a stronger signal and makes detection easier.
In read length and accuracy, this platform is impressive, but the big question is going to be what their throughput and price are like. Mark wasn’t allowed to go into detail on those points, but I hope that we’ll be hearing more from PacBio at the AGBT conference, which is coming up at the end of the month.
1) Eid, et al. Real-Time DNA Sequencing from Single Polymerase Molecules. Science doi:10.1126/science.1162986
2) PacBio Website