Create MLST Scheme tool leads to strains being linked to wrong sequence type in some cases
Issue description
The “Create MLST Scheme” tool in affected software versions associates sequences with alleles based on the position of each sequence in the list of selected sequences. As a result, sequences are associated with the correct allele only if:
- there is one sequence present for every allelic number in the profile, and
- the sequences are listed in the same order as the allelic numbers in the profile (i.e. the allelic number in the profile corresponds to the position of the allele sequence in the sequence list).
Otherwise, sequences will be attached to the wrong alleles, and the association of sequence types with alleles will be incorrect.
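The difference between position-based and number-based association can be illustrated with a small sketch. This is a hypothetical model, not the CLC Workbench's internal code: the function names, the `(name, residues)` tuples and the `<gene>_<allele number>` naming convention are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# A selected sequence as (name, residues); names like "adk_3" are hypothetical.
Sequence = Tuple[str, str]


def associate_by_position(allele_numbers: List[int],
                          selected: List[Sequence]) -> Dict[int, Optional[Sequence]]:
    """Model of the affected behaviour: the i-th selected sequence is attached
    to the i-th allelic number, whatever allele it actually represents."""
    result: Dict[int, Optional[Sequence]] = {n: None for n in allele_numbers}
    for number, seq in zip(allele_numbers, selected):
        result[number] = seq
    return result


def associate_by_number(allele_numbers: List[int],
                        selected: List[Sequence]) -> Dict[int, Optional[Sequence]]:
    """Model of correct matching: the allele number is read from the (assumed)
    '<gene>_<number>' sequence name and used to look up the sequence."""
    by_number = {int(name.rsplit("_", 1)[1]): (name, residues)
                 for name, residues in selected}
    return {n: by_number.get(n) for n in allele_numbers}
```

With a profile listing alleles 1, 2 and 3 but sequences selected only for alleles 1 and 3, the position-based model attaches allele 3's sequence to allele 2 and leaves allele 3 empty, while the number-based model keeps each sequence with its own allele.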
Tools that download and import schemes directly from public resources such as PubMLST are not affected by this issue. These are the “Download MLST Schemes (PubMLST)” and “Download Other MLST Scheme” tools provided by the CLC Microbial Genomics Module and the CLC MLST Module.
Expected impact and recommendations
Impact
We expect many MLST schemes created with “Create MLST Scheme” in the software versions listed below, where sequences were added to genes in the profile, to be affected. Wherever an affected scheme is used, whether in an affected or a later software version, isolates are likely to be assigned the wrong sequence type.
Recommendations
If an MLST scheme was created using an affected software version (see below), and sequences were added to genes, we recommend checking that:
- there are as many sequences listed as there are alleles in the profile, and
- the positions of the allele sequences in the sequence list match the allelic numbers in the profile as expected.
This can be done by opening the scheme and going to the Allele Table view. If you find your MLST scheme is affected by this problem, please discard the scheme and disregard the sequence types in any results that have been generated using it.
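If the allelic numbers and sequence names for a gene are copied out of the Allele Table view, the two checks above can be expressed as a short script. This is a minimal sketch under stated assumptions: `check_gene` is a hypothetical helper, and it assumes each sequence is named `<gene>_<allele number>`; adapt the name comparison to whatever naming your sequences actually use.

```python
from typing import List


def check_gene(gene: str, profile_alleles: List[int],
               sequence_names: List[str]) -> List[str]:
    """Return a list of problems for one gene; an empty list means both checks pass.

    Assumes (hypothetically) that each sequence is named '<gene>_<allele number>'.
    """
    problems: List[str] = []
    # Check 1: one sequence per allele in the profile.
    if len(sequence_names) != len(profile_alleles):
        problems.append(f"{gene}: {len(sequence_names)} sequences "
                        f"for {len(profile_alleles)} alleles")
    # Check 2: the sequence at each position belongs to the allele at that position.
    for position, (allele, name) in enumerate(zip(profile_alleles, sequence_names), start=1):
        if name != f"{gene}_{allele}":
            problems.append(f"{gene}: position {position} holds {name!r}, "
                            f"expected allele {allele}")
    return problems
```

For example, `check_gene("adk", [1, 2, 3], ["adk_1", "adk_3"])` reports both a count mismatch and that position 2 holds the sequence for allele 3.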
New schemes, unaffected by this problem, may be set up in one of the following ways:
- Download a scheme directly from a public resource, if one is available, using the “Download MLST Schemes (PubMLST)” or “Download Other MLST Scheme” tool.
- Create a new scheme using an updated version of the software that is not affected by this problem when it becomes available.
- Create a new scheme using an affected software version, but either do not add sequences to it, or when adding sequences, ensure there is a sequence for each allele in the profile and that the allelic sequences are in the same order as expected by the profiles. This can be done by:
  - Selecting individual sequence elements in the correct order in the tool wizard, or
  - Importing into the CLC Workbench a file containing all the allelic sequences in the expected order. If no sequence data is available for a particular allele, the file should still contain a sequence name for it, with no sequence listed under it. This results in an empty sequence for that allele within the imported sequence list, or
  - Editing the allele list in the profile outside the CLC Workbench so that it contains only the entries that will have sequences associated, in the relevant order.
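An ordered import file of the kind described above can be generated rather than assembled by hand. The sketch below assumes the file is in FASTA format and that sequences are named `<gene>_<allele number>`; both are assumptions, and `ordered_fasta` is a hypothetical helper. Alleles without sequence data get a header line with nothing beneath it, matching the empty-entry approach described above.

```python
from typing import Dict, List


def ordered_fasta(gene: str, allele_numbers: List[int],
                  sequences: Dict[int, str]) -> str:
    """Build FASTA text with one record per allele, in profile order.

    Alleles missing from `sequences` get a header line but no sequence lines,
    so the record is kept as a placeholder at the right position.
    """
    lines: List[str] = []
    for number in allele_numbers:
        lines.append(f">{gene}_{number}")   # hypothetical naming convention
        residues = sequences.get(number, "")
        if residues:
            lines.append(residues)
    return "\n".join(lines) + "\n"
```

For example, `ordered_fasta("adk", [1, 2, 3], {1: "ACGT", 3: "TTGA"})` produces three records with an empty record for allele 2; the text can then be written to a file with `open(path, "w").write(...)` and imported into the Workbench.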
Affected software and versions
- CLC Microbial Genomics Module 1.1 through 4.5
- CLC MLST Module – all versions up to 1.9.1
This issue was addressed in CLC Microbial Genomics Module 4.8 and CLC MLST Module 1.9.2.