Multi-locus sequence typing (MLST) has proven its usefulness for the molecular typing of bacteria over the last 15 years. Classical MLST schemes typically define seven loci (housekeeping genes), which are sequenced using Sanger technology. Each unique sequence at a locus is assigned an allele number, and bacterial strains are identified by their allelic profile, i.e. the combination of the seven allele numbers.
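To make allele numbering concrete, here is a minimal Python sketch (not the BIONUMERICS implementation). Each previously unseen sequence at a locus receives the next allele number, and an isolate's allelic profile is the tuple of its allele numbers in a fixed locus order. The locus names, the toy sequences and the three-locus scheme are hypothetical; classical schemes use seven loci.

```python
from typing import Dict, List, Tuple


class LocusAlleles:
    """Allele registry for one locus: each new sequence gets the next number."""

    def __init__(self) -> None:
        self._numbers: Dict[str, int] = {}

    def allele_number(self, sequence: str) -> int:
        seq = sequence.upper()
        if seq not in self._numbers:
            # Numbers are assigned in order of first appearance: 1, 2, 3, ...
            self._numbers[seq] = len(self._numbers) + 1
        return self._numbers[seq]


def allelic_profile(
    registries: Dict[str, LocusAlleles], isolate: Dict[str, str], order: List[str]
) -> Tuple[int, ...]:
    """Combine the allele numbers of all loci, in a fixed locus order."""
    return tuple(registries[name].allele_number(isolate[name]) for name in order)


# Hypothetical three-locus scheme with toy sequences.
ORDER = ["locA", "locB", "locC"]
registries = {name: LocusAlleles() for name in ORDER}
iso1 = {"locA": "ACGT", "locB": "GGCC", "locC": "TTAA"}
iso2 = {"locA": "ACGT", "locB": "GGCA", "locC": "TTAA"}  # differs at locB only
p1 = allelic_profile(registries, iso1, ORDER)
p2 = allelic_profile(registries, iso2, ORDER)
print(p1, p2)  # (1, 1, 1) (1, 2, 1)
```

Because numbers are handed out in order of first appearance, two laboratories running this sketch independently would number alleles differently, which is why a shared, curated nomenclature matters.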
As next-generation sequencing, a fast and cost-effective way to sequence bacterial genomes, increasingly replaces Sanger sequencing, conventional MLST can be extended to whole genome MLST (wgMLST). Because wgMLST considers many more loci (typically 1,500 to 4,000), it offers a much higher typing resolution.
In contrast to whole genome SNP analysis, wgMLST is based on the concept of allelic variation: any change within a locus, whether a point mutation, a recombination event, or an insertion or deletion spanning multiple positions, yields a new allele and is therefore counted as a single evolutionary event. This approach may be biologically more relevant than approaches that consider only point mutations.
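The difference between counting SNPs and counting alleles can be illustrated with a small sketch. It assumes hypothetical loci whose sequences are already aligned and of equal length (real data with indels would need an alignment step). A recombination that introduced four substitutions in one locus adds four to the SNP distance but only one to the allelic distance.

```python
from typing import Dict


def snp_distance(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count differing positions over shared loci (sequences assumed aligned)."""
    total = 0
    for locus in a.keys() & b.keys():
        sa, sb = a[locus], b[locus]
        if len(sa) != len(sb):
            raise ValueError(f"{locus}: sequences are not aligned")
        total += sum(x != y for x, y in zip(sa, sb))
    return total


def allelic_distance(a: Dict[str, str], b: Dict[str, str]) -> int:
    """Count shared loci whose alleles differ, however many positions changed."""
    return sum(a[locus] != b[locus] for locus in a.keys() & b.keys())


x = {"locus1": "AAAAAAAAAA", "locus2": "CCCCCCCCCC", "locus3": "GGGGG"}
y = {
    "locus1": "AATTTTAAAA",  # recombination: 4 substitutions in one locus
    "locus2": "CCCCACCCCC",  # single point mutation
    "locus3": "GGGGG",       # identical
}
print(snp_distance(x, y), allelic_distance(x, y))  # 5 2
```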
The major drawback of the technique is that it requires allele curation. Without suitable automated tools, maintaining consistent allele assignments for thousands of loci would be a daunting task. To address this, BIONUMERICS provides automated curation tools.
BIONUMERICS for whole genome multi-locus sequence typing
The wgMLST curator and WGS tools plugins
The wgMLST curator plugin offers a full range of automated curation tools needed to set up and maintain a wgMLST schema for any organism of choice.
- Automatic naming of alleles
- Automatic sequence type assignment
- Creation of sub-schemes such as MLST, eMLST, rMLST starting from wgMLST
- Access to quality control tools
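Creating a sub-scheme and assigning sequence types can be sketched as selecting a subset of loci from a wgMLST profile and numbering each new combination of allele numbers. The locus names and the `SequenceTypeRegistry` class are hypothetical. A purely local registry like this would not reproduce the public sequence type numbers used by schemes synchronized with PubMLST or BIGSdb.

```python
from typing import Dict, List, Optional, Tuple

WgProfile = Dict[str, Optional[int]]  # locus -> allele number (None = not called)


def subscheme_profile(wg: WgProfile, loci: List[str]) -> Optional[Tuple[int, ...]]:
    """Extract a sub-scheme profile; None if any of its loci is not called."""
    calls = [wg.get(locus) for locus in loci]
    if any(call is None for call in calls):
        return None
    return tuple(call for call in calls if call is not None)


class SequenceTypeRegistry:
    """Hypothetical local registry: each new profile gets the next ST number."""

    def __init__(self) -> None:
        self._types: Dict[Tuple[int, ...], int] = {}

    def assign(self, profile: Tuple[int, ...]) -> int:
        if profile not in self._types:
            self._types[profile] = len(self._types) + 1
        return self._types[profile]


MLST_LOCI = ["m1", "m2", "m3"]  # hypothetical sub-scheme loci
wg_a = {"m1": 3, "m2": 1, "m3": 7, "x1": 12}
wg_b = {"m1": 3, "m2": 1, "m3": 7, "x1": 15}  # differs only outside the sub-scheme
wg_c = {"m1": 3, "m2": None, "m3": 7}         # incomplete sub-scheme profile

registry = SequenceTypeRegistry()
profile_a = subscheme_profile(wg_a, MLST_LOCI)
profile_b = subscheme_profile(wg_b, MLST_LOCI)
assert profile_a is not None and profile_b is not None
print(registry.assign(profile_a), registry.assign(profile_b))  # 1 1
print(subscheme_profile(wg_c, MLST_LOCI))  # None
```

Isolates `wg_a` and `wg_b` are distinguishable at wgMLST level but share a sequence type in the sub-scheme, which illustrates how the typing level determines resolution.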
While access to the curated, organism-specific reference database is restricted to a limited number of people, most users only need the WGS tools plugin. This plugin provides a fully automated pipeline for identifying alleles from whole genome sequence data.
BIONUMERICS uses two methods to identify alleles:
- based on a de novo assembly, followed by a BLAST search
- using an assembly-free approach, i.e. directly from the sequence reads
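The assembly-free idea can be illustrated with a deliberately simplified k-mer sketch: count the k-mers in the reads (on both strands), then call the allele whose k-mers are all present with a minimum coverage. This is a toy under stated assumptions, not the algorithm BIONUMERICS uses; `k`, `min_cov`, the alleles and the reads are illustrative values.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def count_read_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count k-mers on both strands, since reads may come from either strand."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
        counts.update(kmers(reverse_complement(read), k))
    return counts


def call_allele(alleles: Dict[int, str], read_kmers: Counter, k: int,
                min_cov: int = 2) -> Optional[int]:
    """Return the single allele whose k-mers all reach min_cov, else None."""
    hits = [number for number, seq in alleles.items()
            if len(seq) >= k and all(read_kmers[km] >= min_cov for km in kmers(seq, k))]
    return hits[0] if len(hits) == 1 else None


alleles = {1: "ACGTACGGTCA", 2: "ACGTTCGGTCA"}   # toy alleles, one position apart
reads = ["ACGTACGGTCA", "ACGTACGG", "GTACGGTCA"]  # toy reads drawn from allele 1
counts = count_read_kmers(reads, k=5)
print(call_allele(alleles, counts, k=5))  # 1
```

Returning `None` when no allele or several alleles match mirrors the general need to flag ambiguous or novel calls rather than guess.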
In BIONUMERICS, demanding calculations such as de novo assemblies can be performed on an external calculation engine. You can choose between a virtually setup-free, pay-per-use cloud solution (e.g. via Amazon) and a local deployment, e.g. on a computer cluster (requires custom services).
From within BIONUMERICS, jobs can be submitted to the calculation engine and their results retrieved, including quality control parameters. Only the wgMLST allelic profiles are stored in the BIONUMERICS database, as character sets, resulting in a lightweight and responsive strain database.
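As an illustration of the kind of quality control parameter a calculation engine might return alongside allele calls, the sketch below computes the assembly N50, a commonly used assembly metric. The `JobResult` container and its fields are hypothetical; this page does not specify which QC parameters BIONUMERICS reports. Only the allele calls are kept as the stored character set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobResult:
    """Hypothetical container for what a calculation engine might return."""
    contig_lengths: List[int]
    allele_calls: Dict[str, Optional[int]] = field(default_factory=dict)


def n50(contig_lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the assembly."""
    if not contig_lengths:
        raise ValueError("empty assembly")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative contig lengths")


def qc_summary(result: JobResult) -> Dict[str, int]:
    lengths = result.contig_lengths
    return {"contigs": len(lengths), "total_length": sum(lengths), "n50": n50(lengths)}


def character_set(result: JobResult) -> Dict[str, Optional[int]]:
    """Only the allelic profile is kept; the assembly itself is not stored."""
    return dict(result.allele_calls)


job = JobResult(contig_lengths=[100, 200, 300, 400],
                allele_calls={"locA": 1, "locB": None})
print(qc_summary(job))     # {'contigs': 4, 'total_length': 1000, 'n50': 300}
print(character_set(job))  # {'locA': 1, 'locB': None}
```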
Typing schemes on different levels
Based on the loci included in the wgMLST scheme, additional typing schemes can be defined on different levels, e.g. core genome MLST (cMLST) or ribosomal MLST (rMLST).
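One common way to derive a core scheme is to keep the loci that have an allele call in at least a given percentage of genomes. The sketch below assumes that definition and a hypothetical `min_percent` threshold; it is not necessarily how the core schemes listed on this page were built. The threshold comparison uses integer arithmetic to avoid floating-point rounding at the boundary.

```python
from typing import Dict, List, Optional

Profile = Dict[str, Optional[int]]  # locus -> allele number (None = not called)


def core_loci(profiles: List[Profile], min_percent: int = 95) -> List[str]:
    """Loci called in at least min_percent % of the genomes, sorted by name."""
    n = len(profiles)
    if n == 0:
        return []
    all_loci = sorted({locus for p in profiles for locus in p})
    core = []
    for locus in all_loci:
        present = sum(1 for p in profiles if p.get(locus) is not None)
        # Integer form of: present / n >= min_percent / 100
        if 100 * present >= min_percent * n:
            core.append(locus)
    return core


genomes = [
    {"a": 1, "b": 2, "c": None},
    {"a": 1, "b": 3, "c": 4},
    {"a": 2, "b": None},
    {"a": 1, "b": 1, "c": 5},
]
print(core_loci(genomes, min_percent=100))  # ['a']
print(core_loci(genomes, min_percent=75))   # ['a', 'b']
```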
The character views in BIONUMERICS offer a flexible way to select the set of loci used for typing, cluster analysis (e.g. minimum spanning trees) or any of the statistical tests available in the software.
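To show how the choice of loci affects clustering, this sketch computes pairwise allelic distances over a selected set of loci, ignoring loci with missing calls, and builds a minimum spanning tree with a naive Prim's algorithm. Isolate and locus names are hypothetical, and ties are simply broken by name order; BIONUMERICS applies its own MST rules, which are not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, Optional[int]]  # locus -> allele number (None = not called)


def allele_distance(a: Profile, b: Profile, loci: List[str]) -> int:
    """Selected loci where both isolates are called and the alleles differ."""
    return sum(
        1 for locus in loci
        if a.get(locus) is not None and b.get(locus) is not None
        and a[locus] != b[locus]
    )


def minimum_spanning_tree(profiles: Dict[str, Profile],
                          loci: List[str]) -> List[Tuple[str, str, int]]:
    """Naive Prim's algorithm; ties are broken by isolate name order."""
    names = sorted(profiles)
    if not names:
        return []
    in_tree = [names[0]]
    edges: List[Tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        best: Optional[Tuple[str, str, int]] = None
        for u in sorted(in_tree):
            for v in names:
                if v in in_tree:
                    continue
                d = allele_distance(profiles[u], profiles[v], loci)
                if best is None or d < best[2]:
                    best = (u, v, d)
        assert best is not None
        edges.append(best)
        in_tree.append(best[1])
    return edges


isolates = {
    "S1": {"l1": 1, "l2": 1, "l3": 1},
    "S2": {"l1": 1, "l2": 2, "l3": 1},
    "S3": {"l1": 2, "l2": 2, "l3": 3},
}
print(minimum_spanning_tree(isolates, ["l1", "l2", "l3"]))
# [('S1', 'S2', 1), ('S2', 'S3', 2)]
print(minimum_spanning_tree(isolates, ["l1"]))
# [('S1', 'S2', 0), ('S1', 'S3', 1)]
```

Restricting `loci` to a subset (here only `l1`) changes both the distances and the tree topology, which is the effect of selecting a different set of loci before clustering.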
Why use BIONUMERICS for your wgMLST applications?
BIONUMERICS offers you:
- A fully automated pipeline for wgMLST
- Integrated calculation engine: cloud-based or set up on premises
- Lightweight sample database
- Flexible selection of loci
Currently, fully functional wgMLST schemas are available for the following organisms:
| Organism | Total # loci | Subschemas (# loci) | Reference genomes |
|---|---:|---|---:|
| Acinetobacter baumannii | 5,633 | wgMLST (5,619); MLST PubMLST Oxford* (7); MLST PubMLST Pasteur* (7) | 1,734 |
| Bacillus cereus | 30,363 | wgMLST (30,356); MLST PubMLST* (7) | 372 |
| Bacillus subtilis | 7,753 | wgMLST (7,746); MLST PubMLST* (7) | 101 |
| Burkholderia cepacia complex | 45,472 | wgMLST (45,465); MLST PubMLST* (7) | 336 |
| Brucella spp. | 5,325 | TRUNK (3,246); inopinata (4,122); microti (4,102); pinnipedialis (4,133); vulpis (3,891); ceti (4,119); suis (4,191); abortus (4,096); neotomae (4,068); canis (4,183); melitensis (4,217); ovis (4,110); MLST PubMLST 9 loci* (9); MLST PubMLST 21 loci* (21) | 372 |
| Campylobacter coli / C. jejuni | 3,529 | wgMLST loci (3,522); Core Oxford* (1,343); rMLST (52); MLST jejuni PubMLST* (7) | 96 |
| Citrobacter spp. | 29,176 | wgMLST (29,169); MLST freundii PubMLST* (7) | 134 |
| Clostridium difficile | 8,745 | wgMLST loci (8,712); core (1,999); MLST PubMLST* (7); MLST Pasteur (7); PubMLST other loci (13); CWP cluster genes PubMLST* (6) | 259 |
| Cronobacter spp. | 15,739 | wgMLST loci (15,727); cogMLST loci (1,865); Gcog loci (222); Ext-MLST loci (10); Tax-MLST loci (9); O-serotype loci (2); MLST PubMLST* (7) | 78 |
| Enterobacter cloacae | 15,612 | wgMLST (15,605); MLST PubMLST* (7) | 846 |
| Enterococcus faecalis | 5,292 | wgMLST (5,285); MLST PubMLST* (7) | 493 |
| Enterococcus faecium | 5,496 | wgMLST (5,489); MLST PubMLST* (7) | 510 |
| Enterococcus raffinosus | 4,470 | wgMLST (4,470) | 4 |
| Escherichia coli / Shigella | 17,380 | wgMLST loci (17,350); core Enterobase (2,513); MLST Pasteur (8); MLST Whittam (15); MLST PubMLST Achtman (7) | 289 |
| Francisella tularensis | 2,808 | wgMLST (2,808); core (1,147) | 226 |
| Klebsiella aerogenes | 14,229 | wgMLST (14,222); MLST PubMLST* (7) | 116 |
| Klebsiella oxytoca | 16,277 | wgMLST (16,270); MLST PubMLST* (7) | 84 |
| Klebsiella pneumoniae | 19,729 | wgMLST loci (19,720); Core (634); MLST Pasteur** (7); Wzc (1); wzi** (1) | 67 |
| Lactobacillus sanfranciscensis | 1,797 | wgMLST (1,797) | 20 |
| Legionella pneumophila | 5,777 | wgMLST loci (5,770); core (1,521); SBT (7) | 32 |
| Leuconostoc spp. | 16,274 | wgMLST (16,274) | 113 |
| Listeria monocytogenes | 4,804 | wgMLST loci (4,797); Core Pasteur** (1,748); MLST PubMLST** (7) | 150 |
| Micrococcus spp. | 8,207 | wgMLST (8,207) | 35 |
| Mycobacterium bovis | 4,701 | wgMLST (4,701) | 93 |
| Mycobacterium kansasii | 8,629 | wgMLST (8,629) | 28 |
| Mycobacterium leprae | 2,237 | wgMLST (2,237) | 5 |
| Mycobacterium tuberculosis | 4,032 | Core (v2) (2,891) | 46 |
| Neisseria gonorrhoeae | 2,431 | wgMLST (2,424); MLST PubMLST* (7) | 11 |
| Neisseria meningitidis | 2,909 | wgMLST (2,887); core (1,587); eMLST partial genes PubMLST* (20); MLST PubMLST* (7) | 357 |
| Pasteurella multocida | 3,803 | wgMLST (3,789); MLST multihost PubMLST* (7); MLST RIRDC PubMLST* (7) | 157 |
| Proteus vulgaris | 5,201 | wgMLST (5,201) | 7 |
| Pseudomonas aeruginosa | 15,143 | wgMLST (15,136); MLST PubMLST* (7) | 400 |
| Salmonella enterica | 15,874 | wgMLST (15,867); Core Enterobase (3,002); MLST PubMLST* (7); MLST PubMLST Achtman (7) | 260 |
| Serratia marcescens | 9,377 | wgMLST (9,377) | 299 |
| Staphylococcus aureus | 3,904 | wgMLST loci (3,897); Core loci (1,861); MLST PubMLST* (7) | 31 |
| Staphylococcus epidermidis | 4,877 | wgMLST (4,870); MLST PubMLST* (7) | 312 |
| Staphylococcus pseudintermedius | 4,431 | wgMLST (4,424); MLST PubMLST* (7) | 118 |
| Stenotrophomonas maltophilia | 17,603 | wgMLST (17,596); MLST PubMLST* (7) | 179 |
| Streptococcus agalactiae | 4,052 | wgMLST (4,045); MLST PubMLST* (7) | 153 |
| Streptococcus mitis / oralis | 10,389 | wgMLST (10,382); MLST oralis PubMLST* (7) | 171 |
| Streptococcus pyogenes | 2,735 | wgMLST (2,728); MLST PubMLST* (7) | 304 |
| Weissella spp. | 24,004 | wgMLST (24,004) | 80 |
*: Allelic profiles are synchronized with the public nomenclature hosted on pubmlst.org
**: Allelic profiles are synchronized with the public nomenclature hosted on bigsdb.pasteur.fr
All wgMLST schemas listed above are automatically curated. The last curation was performed on 2020-12-20.
The following wgMLST schemas are currently under development and will be available soon:
- Bordetella pertussis
- Clostridium botulinum
- Mycobacterium abscessus
- Vibrio spp.
- Yersinia spp.
If you are working on one of these organisms, or on any other organism not in this list, please do not hesitate to contact us about a possible collaboration on wgMLST analysis.
Want to try it out for yourself?
You can! Several wgMLST tutorials are available to guide you through the process.