2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1), (3)
Winter 2021

Nicholas Eriksson, Lior Pachter, Yumi Mitsuya, Soo-Yon Rhee, Chunlin Wang, Baback Gharizadeh, Mostafa Ronaghi, Robert W. Shafer, Niko Beerenwinkel. Viral population estimation using pyrosequencing. PLoS Comput Biol, 4(4):e1000074. 2008.

Download from publisher: https://dx.plos.org/10.1371/journal.pcbi.1000074

The diversity of virus populations within single infected hosts presents a major 
difficulty for the natural immune response as well as for vaccine design and
antiviral drug therapy. Recently developed pyrophosphate-based sequencing
technologies (pyrosequencing) can be used for quantifying this diversity by
ultra-deep sequencing of virus samples. We present computational methods for the 
analysis of such sequence data and apply these techniques to pyrosequencing data 
obtained from HIV populations within patients harboring drug-resistant virus
strains. Our main result is the estimation of the population structure of the
sample from the pyrosequencing reads. This inference is based on a statistical
approach to error correction, followed by a combinatorial algorithm for
constructing a minimal set of haplotypes that explain the data. Using this set of
explaining haplotypes, we apply a statistical model to infer the frequencies of
the haplotypes in the population via an expectation-maximization (EM) algorithm. 
We demonstrate that pyrosequencing reads allow for effective population
reconstruction by extensive simulations and by comparison to 165 sequences
obtained directly from clonal sequencing of four independent, diverse HIV
populations. Thus, pyrosequencing can be used for cost-effective estimation of
the structure of virus populations, promising new insights into viral
evolutionary dynamics and disease control strategies.
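
The frequency-estimation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes error-free reads, ignores read positions and lengths, and takes as input a hypothetical list `compat` in which each read is represented only by the indices of the candidate haplotypes it is compatible with. The function name and parameters are invented for illustration.

```python
def em_haplotype_frequencies(compat, n_haplotypes, n_iter=500, tol=1e-12):
    """Estimate haplotype frequencies with a simple EM iteration.

    compat[r] lists the haplotype indices that read r is compatible with
    (an assumed, simplified input; real reads would need an error model).
    """
    if any(len(c) == 0 for c in compat):
        raise ValueError("every read must match at least one haplotype")

    # Start from the uniform distribution over candidate haplotypes.
    freqs = [1.0 / n_haplotypes] * n_haplotypes

    for _ in range(n_iter):
        # E-step: split each read among its compatible haplotypes
        # in proportion to the current frequency estimates.
        counts = [0.0] * n_haplotypes
        for c in compat:
            total = sum(freqs[h] for h in c)
            for h in c:
                counts[h] += freqs[h] / total

        # M-step: new frequencies are the normalized expected counts.
        new_freqs = [x / len(compat) for x in counts]

        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break

    return freqs


# Toy data: 6 reads match only haplotype 0, 2 match only haplotype 1,
# and 2 are ambiguous between the two.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 2
estimate = em_haplotype_frequencies(reads, n_haplotypes=2)
print(estimate)
```

In this toy case the ambiguous reads are shared out in the same 3:1 ratio as the unambiguous ones, so the estimate settles near 0.75 and 0.25.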