Bioinformatics seminar

Tue 18 Oct. 2011, 17:20

Title: Feng et al. Inference of isoforms from short sequence reads
Speaker: Martin Králik

Due to alternative splicing events in eukaryotic species, the
identification of mRNA isoforms (or splicing variants) is a difficult
problem. Traditional experimental methods for this purpose are
time-consuming and not cost-effective. The emerging RNA-Seq technology
offers a potentially effective way to address this problem. Although the advantages
of RNA-Seq over traditional methods in transcriptome analysis have been
confirmed by many studies, the inference of isoforms from millions of
short sequence reads (e.g., Illumina/Solexa reads) has remained
computationally challenging. In this work, we propose a method to
calculate the expression levels of isoforms and infer isoforms from short
RNA-Seq reads using exon-intron boundary, transcription start site (TSS)
and poly-A site (PAS) information. We first formulate the relationship
among exons, isoforms, and single-end reads as a convex quadratic program,
and then use an efficient algorithm (called IsoInfer) to search for
isoforms. IsoInfer can calculate the expression levels of isoforms
accurately if all the isoforms are known and infer novel isoforms from
scratch. Our experimental tests on known mouse isoforms with both
simulated expression levels and reads demonstrate that IsoInfer is able to
calculate the expression levels of isoforms with accuracy comparable to
the state-of-the-art statistical method while running 60 times faster.
Moreover, our tests on both simulated and real reads show that it achieves
good precision and sensitivity in inferring isoforms when given accurate
exon-intron boundary, TSS, and PAS information, especially for isoforms
whose expression levels are significantly high. The software is publicly
available for free at ~jianxing/IsoInfer.html.
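
To make the quadratic-programming idea in the abstract concrete, the sketch below estimates isoform expression levels when the isoforms are already known. It is a simplified illustration, not IsoInfer's actual formulation: it assumes that the observed read density on each exon equals the sum of the expression levels of the isoforms containing that exon, and it ignores junction reads, exon lengths, and noise weighting. The function name `estimate_expression`, the inclusion matrix, and the toy densities are all hypothetical. The problem solved is: minimize the squared difference between predicted and observed exon densities, subject to non-negative expression levels, which is a convex quadratic program. Projected gradient descent is used here only to keep the example dependency-free.

```python
def estimate_expression(inclusion, density, steps=5000):
    """Least-squares isoform expression under a non-negativity constraint.

    inclusion[j][i] is 1 if (hypothetical) isoform i contains exon j, else 0.
    density[j] is the observed read density on exon j.
    Returns one non-negative expression level per isoform.
    """
    n_exons = len(inclusion)
    n_iso = len(inclusion[0])

    # The gradient of ||A x - d||^2 is 2 A^T (A x - d). Its Lipschitz constant
    # is at most 2 * ||A||_F^2, so this step size guarantees convergence.
    frobenius_sq = sum(a * a for row in inclusion for a in row)
    step = 1.0 / (2.0 * frobenius_sq)

    x = [0.0] * n_iso
    for _ in range(steps):
        # Residual between predicted and observed density on every exon.
        residual = [
            sum(inclusion[j][i] * x[i] for i in range(n_iso)) - density[j]
            for j in range(n_exons)
        ]
        grad = [
            2.0 * sum(inclusion[j][i] * residual[j] for j in range(n_exons))
            for i in range(n_iso)
        ]
        # Gradient step followed by projection onto x >= 0.
        x = [max(0.0, x[i] - step * grad[i]) for i in range(n_iso)]
    return x


if __name__ == "__main__":
    # Toy gene with three exons and two isoforms (made-up data):
    # isoform 0 uses exons 0, 1, 2; isoform 1 skips exon 1.
    inclusion = [
        [1, 1],
        [1, 0],
        [1, 1],
    ]
    # Densities consistent with expression levels 3 and 5.
    density = [8.0, 3.0, 8.0]
    print(estimate_expression(inclusion, density))
```

In this toy case the skipped exon carries only isoform 0's reads, so its density pins down that isoform's level, and the shared exons determine the other. Inferring unknown isoforms, which the abstract also describes, requires an additional search over candidate exon combinations and is not covered by this sketch.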