Janik Sielemann, Katharina Sielemann, Broňa Brejová, Tomas Vinar, Cedric Chauve. plASgraph - using graph neural networks to detect plasmid contigs from an assembly graph. Technical Report 2022.05.24.493339, bioRxiv, 2022.
Download from publisher: https://doi.org/10.1101/2022.05.24.493339
Identification of plasmids from sequencing data is an important and challenging problem related to the spread of antimicrobial resistance and other One-Health issues. In our work, we provide a new architecture for identifying plasmid contigs in fragmented genome assemblies built from short-read data. Unlike previous machine-learning approaches for this problem, which classify individual contigs separately, we employ graph neural networks (GNNs) to include information from the assembly graph. Propagation of information from nearby nodes in the graph allows accurate classification of even short contigs that are difficult to classify based on sequence features or database searches alone. Our new species-agnostic software tool plASgraph outperforms the recently developed PlasForest, which uses database searches to supplement sequence-based features. Since our tool does not rely on existing plasmid databases, it is more suitable for classification of contigs in novel species and discovery of previously unknown plasmid sequences. Our tool can also be trained on a specific species, and in that scenario it outperforms mlplasmids trained on the same species. On one hand, our work provides a new, accurate, and easy-to-use tool for plasmid classification; on the other hand, it serves as a motivation for more widespread use of GNNs in bioinformatics, such as in pangenome sequence analysis, where sequence graphs serve as a fundamental data structure.
Availability: https://github.com/cchauve/plASgraph
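The idea that information from neighbouring nodes of the assembly graph can help classify short, ambiguous contigs can be illustrated with a toy example. The sketch below is not the plASgraph architecture: it is a minimal, hand-written neighbour-averaging step in plain Python, and the function name `propagate_scores`, the `self_weight` parameter, the contig names, and the initial scores are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def propagate_scores(
    scores: Dict[str, float],
    edges: List[Tuple[str, str]],
    self_weight: float = 0.5,
    rounds: int = 1,
) -> Dict[str, float]:
    """Blend each contig's score with the mean score of its graph neighbours.

    Illustrative stand-in for one message-passing step; a trained GNN
    would use learned weights instead of a fixed average.
    """
    # Build an undirected adjacency map from the assembly-graph edges.
    neighbours = {node: set() for node in scores}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    current = dict(scores)
    for _ in range(rounds):
        updated = {}
        for node, own in current.items():
            adjacent = neighbours[node]
            if not adjacent:
                # Isolated contigs keep their own score unchanged.
                updated[node] = own
                continue
            mean_adjacent = sum(current[n] for n in adjacent) / len(adjacent)
            updated[node] = self_weight * own + (1 - self_weight) * mean_adjacent
        current = updated
    return current


# Hypothetical plasmid scores: "c3" is a short contig with an uninformative
# score of 0.5, linked to two contigs that look plasmid-like.
initial = {"c1": 0.9, "c2": 0.8, "c3": 0.5, "c4": 0.1}
links = [("c1", "c3"), ("c2", "c3")]
smoothed = propagate_scores(initial, links)
print(smoothed)
```

In this toy setting the ambiguous contig `c3` moves towards the scores of its plasmid-like neighbours, while the isolated contig `c4` is left untouched. A real GNN learns how to weight and combine neighbour features rather than averaging fixed scores.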