2-AIN-506, 2-AIN-252: Seminar in Bioinformatics (2), (4)
Summer 2026
Abstract

James Robertson, Kyrylo Bessonov, Justin Schonfeld, John H. E. Nash. Universal whole-sequence-based plasmid typing and its utility to prediction of host range and epidemiological surveillance. Microb Genom, 6(10). 2020.

Download from publisher: http://mgen.microbiologyresearch.org/pubmed/content/journal/mgen/10.1099/mgen.0.000435 PubMed

Abstract:

Bacterial plasmids play a large role in allowing bacteria to adapt to changing 
environments and can pose a significant risk to human health if they confer 
virulence and antimicrobial resistance (AMR). Plasmids differ significantly in 
the taxonomic breadth of host bacteria in which they can successfully replicate; 
this is commonly referred to as 'host range' and is usually described in 
qualitative terms of 'narrow' or 'broad'. Understanding the host range potential 
of plasmids is of great interest due to their ability to disseminate traits such 
as AMR through bacterial populations and into human pathogens. We developed the 
MOB-suite to facilitate characterization of plasmids and introduced a 
whole-sequence-based classification system based on clustering complete plasmid 
sequences using Mash distances (https://github.com/phac-nml/mob-suite). We 
updated the MOB-suite database from 12 091 to 23 671 complete sequences, 
representing 17 779 unique plasmids. With advances in new algorithms for rapidly 
calculating average nucleotide identity (ANI), we compared clustering 
characteristics using two different distance measures - Mash and ANI - and three 
clustering algorithms on the unique set of plasmids. The plasmid nomenclature is 
designed to group together highly similar plasmids that are unlikely to have 
multiple representatives within a single cell. Based on our results, we 
determined that clusters generated using Mash and complete-linkage clustering at 
a Mash distance of 0.06 resulted in highly homogeneous clusters while maintaining 
cluster size. The taxonomic distribution of plasmid biomarker sequences for 
replication and relaxase typing, in combination with MOB-suite 
whole-sequence-based clusters, has been examined in detail for all high-quality 
publicly available plasmid sequences. We have incorporated prediction of plasmid 
replication host range into the MOB-suite based on observed distributions of 
these sequence features in combination with known plasmid hosts from the 
literature. Host range is reported as the highest taxonomic rank that covers all 
of the plasmids which share replicon or relaxase biomarkers or belong to the same 
MOB-suite cluster code. Reporting host range based on these criteria allows for 
comparisons of host range between studies and provides information for plasmid 
surveillance.
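
The two computational ideas in the abstract, complete-linkage clustering cut at a Mash distance of 0.06 and host range reported as a shared taxonomic rank, can be illustrated with a minimal sketch. This is not the MOB-suite implementation: the function names, the rank list, the plasmid identifiers and the distance values below are all hypothetical, and the host-range function assumes that "the highest taxonomic rank that covers all of the plasmids" means the most specific taxon that still contains every observed host.

```python
from itertools import combinations

# Hypothetical rank order, from broadest to most specific.
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def complete_linkage_clusters(names, dist, threshold=0.06):
    """Agglomerative complete-linkage clustering cut at `threshold`.

    `dist` maps frozenset({a, b}) to a pairwise distance (for example a
    Mash distance). Two clusters are merged only while the largest
    distance between any of their members is at or below the threshold.
    """
    clusters = [{n} for n in names]

    def linkage(c1, c2):
        # Complete linkage: the distance of the farthest member pair.
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        (i, j), d = min(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def host_range(lineages):
    """Return (rank, taxon) for the most specific rank shared by all hosts.

    Each lineage is a dict mapping rank name to taxon name. Returns None
    when the hosts do not even share the broadest rank in RANKS.
    """
    shared = None
    for rank in RANKS:
        taxa = {lineage.get(rank) for lineage in lineages}
        if len(taxa) == 1 and None not in taxa:
            shared = (rank, taxa.pop())
        else:
            break
    return shared


# Toy data (invented values, not taken from the paper).
toy_names = ["pA", "pB", "pC", "pD"]
toy_dist = {
    frozenset(("pA", "pB")): 0.01,
    frozenset(("pA", "pC")): 0.05,
    frozenset(("pB", "pC")): 0.07,
    frozenset(("pA", "pD")): 0.30,
    frozenset(("pB", "pD")): 0.31,
    frozenset(("pC", "pD")): 0.29,
}
toy_hosts = [
    {"phylum": "Proteobacteria", "class": "Gammaproteobacteria",
     "order": "Enterobacterales", "family": "Enterobacteriaceae",
     "genus": "Escherichia", "species": "Escherichia coli"},
    {"phylum": "Proteobacteria", "class": "Gammaproteobacteria",
     "order": "Enterobacterales", "family": "Enterobacteriaceae",
     "genus": "Klebsiella", "species": "Klebsiella pneumoniae"},
]

toy_clusters = complete_linkage_clusters(toy_names, toy_dist)
toy_range = host_range(toy_hosts)
```

In the toy data, pC lies within 0.06 of pA but not of pB, so complete linkage keeps it out of the pA/pB cluster; single linkage would have merged it. This is one reason complete linkage tends to produce more homogeneous clusters. The two toy hosts diverge at genus level, so the reported host range is the family they share.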