2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2018

Ngan Nguyen, Glenn Hickey, Daniel R. Zerbino, Brian Raney, Dent Earl, Joel Armstrong, W. James Kent, David Haussler, Benedict Paten. Building a pan-genome reference for a population. Journal of Computational Biology, 22(5):387-401. 2015.


A reference genome is a high-quality individual genome that is used as a
coordinate system for the genomes of a population, or genomes of closely related
subspecies. Given a set of genomes partitioned by homology into alignment blocks,
we formalize the problem of ordering and orienting the blocks such that the
resulting ordering maximally agrees with the underlying genomes' ordering and
orientation, creating a pan-genome reference ordering. We show this problem is
NP-hard, but also demonstrate, empirically and within simulations, the
performance of heuristic algorithms based upon a cactus graph decomposition to
find locally maximal solutions. We describe an extension of our Cactus software
to create a pan-genome reference for whole-genome alignments, and demonstrate how
it can be used to create novel genome browser visualizations using human
variation data as a test. In addition, we test the use of a pan-genome for
describing variations and as a reference for read mapping.
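
The ordering problem in the abstract can be illustrated with a small sketch. It is not the formulation or the algorithm from the paper: the agreement score below (counting block pairs whose relative order and orientation the reference reproduces), the signed-integer encoding of blocks, and the names `agreement` and `best_ordering` are all assumptions made for illustration. The exhaustive search is only feasible for a handful of blocks, which is in line with the NP-hardness result and explains why the paper turns to heuristics.

```python
from itertools import permutations, product


def agreement(reference, genomes):
    """Toy score: number of block pairs in the genomes that the reference
    reproduces in order and orientation (illustrative assumption only).

    Blocks are signed integers: +k is block k forward, -k is block k reversed.
    Each block is assumed to occur at most once per genome.
    """
    pos = {abs(b): i for i, b in enumerate(reference)}
    sign = {abs(b): (1 if b > 0 else -1) for b in reference}
    score = 0
    for genome in genomes:
        for i in range(len(genome)):
            for j in range(i + 1, len(genome)):
                a, b = genome[i], genome[j]
                if abs(a) not in pos or abs(b) not in pos:
                    continue
                # Orientation of each block relative to the reference.
                rel_a = (1 if a > 0 else -1) * sign[abs(a)]
                rel_b = (1 if b > 0 else -1) * sign[abs(b)]
                # Same strand as the reference and same order.
                if rel_a == 1 and rel_b == 1 and pos[abs(a)] < pos[abs(b)]:
                    score += 1
                # Opposite strand and reversed order (genome read backwards).
                elif rel_a == -1 and rel_b == -1 and pos[abs(a)] > pos[abs(b)]:
                    score += 1
    return score


def best_ordering(blocks, genomes):
    """Exhaustively try every order and orientation of the blocks."""
    best, best_score = None, -1
    for perm in permutations(blocks):
        for signs in product((1, -1), repeat=len(perm)):
            candidate = [s * b for s, b in zip(signs, perm)]
            candidate_score = agreement(candidate, genomes)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best, best_score


# Three toy genomes over blocks 1..3; the third carries an inversion of block 2.
genomes = [[1, 2, 3], [1, 2, 3], [1, -2, 3]]
reference, score = best_ordering([1, 2, 3], genomes)
```

In this toy input the majority ordering wins: the inverted block in the third genome costs two of its three pairs, while adopting the inversion in the reference would cost pairs in both other genomes. A reference and its reverse complement always score the same under this measure.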