Janik Sielemann, Katharina Sielemann, Broňa Brejová, Tomas Vinar, Cedric Chauve. plASgraph - using graph neural networks to detect plasmid contigs from an assembly graph. Technical Report 2022.05.24.493339, bioRxiv, 2022.

Download from publisher: https://doi.org/10.1101/2022.05.24.493339

Abstract:

Identification of plasmids from sequencing data is an important and 
challenging problem related to antimicrobial resistance spread and other 
One-Health issues. In our work, we provide a new architecture for 
identifying plasmid contigs in fragmented genome assemblies built from 
short-read data. Unlike previous machine-learning approaches for this 
problem, which classify individual contigs separately, we employ graph 
neural networks (GNNs) to include information from the assembly graph. 
Propagation of information from nearby nodes in the graph allows accurate 
classification of even short contigs that are difficult to classify based 
on sequence features or database searches alone.
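
To make the idea of propagating information along the assembly graph concrete, the following is a minimal sketch in Python. It is not the plASgraph model: it uses a fixed, untrained mean-aggregation step rather than a learned GNN layer, and the toy graph, the per-contig scores, and the propagate() helper are all hypothetical names and values chosen for illustration.

```python
# Toy illustration only: NOT the plASgraph architecture. The graph, the
# scores, and the propagate() helper are hypothetical.

def propagate(scores, edges, rounds=1, self_weight=0.5):
    """Mix each node's score with the mean score of its graph neighbours."""
    # Build an undirected adjacency list from the edge list.
    neighbours = {node: set() for node in scores}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    current = dict(scores)
    for _ in range(rounds):
        updated = {}
        for node, own in current.items():
            if neighbours[node]:
                mean_nb = sum(current[n] for n in neighbours[node]) / len(neighbours[node])
                updated[node] = self_weight * own + (1 - self_weight) * mean_nb
            else:
                updated[node] = own  # an isolated contig keeps its own score
        current = updated
    return current


# Assumed per-contig plasmid scores from sequence features alone; the short
# contig starts at an uninformative 0.5.
initial_scores = {
    "contig_1": 0.9,
    "contig_2": 0.8,
    "short_contig": 0.5,
    "contig_4": 0.1,
}
# Assumed assembly-graph adjacencies between contigs.
edges = [("contig_1", "short_contig"), ("contig_2", "short_contig")]

smoothed = propagate(initial_scores, edges)
print(smoothed["short_contig"])
```

In this toy setting, the short contig's score moves toward that of its two high-scoring neighbours, which is the intuition behind using graph context for contigs that are hard to classify in isolation. A trained GNN would learn how to weight and transform neighbour information instead of averaging it with a fixed weight.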

Our new species-agnostic software tool plASgraph outperforms the recently 
developed PlasForest, which uses database searches to supplement 
sequence-based features. Since our tool does not rely on existing plasmid databases, 
it is more suitable for classification of contigs in novel species and 
discovery of previously unknown plasmid sequences. Our tool can also be 
trained on a specific species, and in that scenario it outperforms 
mlplasmids trained on the same species.

Our work provides a new, accurate, and easy-to-use tool for plasmid 
classification. It also serves as a motivation for more widespread use of 
GNNs in bioinformatics, such as in pangenome sequence analysis, where 
sequence graphs serve as a fundamental data structure.

Availability: https://github.com/cchauve/plASgraph