CLC Microbial Genomics Module latest improvements

CLC Microbial Genomics Module 25.0

Released on December 3, 2024

New features

New tools

De Novo Assemble Small Genome enables de novo assembly of short-read sequencing data using the third-party algorithm SPAdes. The tool is designed for assembling microbial isolates and employs SPAdes version 4.0.0.
Classify Whole Metagenome Data facilitates taxonomic classification of whole metagenome data using a fast and memory-efficient algorithm. It supports analysis with large, comprehensive reference indexes, enabling accurate species-level resolution on laptops.
Create Whole Metagenome Index creates an index from a genome reference database, which can be used by the new Classify Whole Metagenome Data tool.

New reference data

QMM-H. The QIAGEN Microbial Metagenome - Human Host database, available for download using Download Curated Microbial Reference Database, is a comprehensive microbial reference database that can be used for classification of whole metagenome data using the new Classify Whole Metagenome Data tool. The database contains RefSeq genomes of archaea, bacteria, viruses, protozoa, and fungi, and UniVec_Core sequences.
Chicken Gut. The taxonomic profiling Chicken Gut database, available for download from Download Curated Microbial Reference Database, contains metagenomic-assembled genomes from chicken gut samples, curated and hosted by MGnify, EMBL-EBI.
Pig Gut. The taxonomic profiling Pig Gut reference database, available for download from Download Curated Microbial Reference Database, contains metagenomic-assembled genomes from pig gut samples, curated and hosted by MGnify, EMBL-EBI.
QIAseq xHYB HepC Panel. This Reference Data Set, available for download from the Reference Data Manager, facilitates analysis of data generated using the QIAseq xHYB HepC Panel. It is designed for use with the Analyze QIAseq xHYB Viral Panel Data (Human host) template workflow.
For Download MLST Scheme, four new schemes have been added: Avibacterium paragallinarum MLST, Neisseria spp. N. meningitidis cgMLST v3, Pasteurella multocida cgMLST (draft), Streptococcus mitis MLST. The following schemes have been removed: Escherichia spp. wgMLST, Neisseria spp. N. meningitidis cgMLST v2.
For Download Pathogen Reference Database, 18 new species have been added: Legionella feeleii, Legionella anisa, Legionella cherrii, Legionella bozemanae, Neisseria bacilliformis, Neisseria elongata, Neisseria cinerea, Neisseria oralis, Neisseria perflava, Neisseria subflava, Neisseria weaveri, Vibrio mimicus, Vibrio metoecus, Vibrio alginolyticus, Vibrio antiquarius, Vibrio diabolicus, Haemophilus influenzae, Streptococcus mutans.

Improvements

Workflows

The following Template Workflows have been updated: Analyze QIAseq xHYB Viral Panel Data (Human host), Compare Variants Across Samples, Data QC and OTU Clustering, Data QC and Remove Background Reads, Data QC and Taxonomic Profiling, Detect Amplicon Sequence Variants and Assign Taxonomies, Find QIAseq xHYB AMR Markers (Human host), QC, Assemble and Bin Pangenomes, Type a Known Species, Type Among Multiple Species.

General updates:

Reports generated by individual tools are now placed in a 'QC & Reports' folder.
The names of input and output elements have been updated.
The locked/unlocked status of selected parameters indicating whether these can be changed when launching the workflow has been updated.
QC summary items are now configured using Create Sample Report, which has been included in all workflows.
Sample and combined report outputs have been curated to include the most relevant sections, and workflows now only output one sample report per batch and one combined report per run.

Changes to selected template workflows

Refine Abundance Table has been added to the following template workflows: Data QC and OTU Clustering, Detect Amplicon Sequence Variants and Assign Taxonomies, Data QC and Taxonomic Profiling.
In the Estimate Alpha and Beta Diversity template workflow, Remove OTUs with Low Abundance, which has been moved to the Legacy Tools folder, has been replaced by Refine Abundance Table.
An update has been made to the Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) template workflow making it more efficient.

Analyze QIAseq xHYB Viral Panel Data (Human host) template workflow

Analysis of data generated using the QIAseq xHYB HepC Panel is now supported by the Analyze QIAseq xHYB Viral Panel Data (Human host) template workflow when combined with the QIAseq xHYB HepC Panel Reference Data. The workflow is also available from the QIAseq Panel Analysis Assistant.
The template workflow has been updated to use the most recent version of the QIAseq xHYB Viral Panels Reference Data Set, version 1.4.
Refine Abundance Table has been added to the workflow. All abundance table outputs are now aggregated on species-level.
Remove Marginal Variants has been replaced by Filter on Custom Criteria.

Performance

The speed of Annotate with DIAMOND has been significantly improved, particularly for genome-sized datasets.
The speed and memory usage of OTU Clustering have been improved for datasets containing many reads that are nested within longer reads.

Other improvements

Create SNP Tree
- The trees produced by Create SNP Tree have branch lengths that approximate the number of SNPs, rather than the expected number of substitutions per site. Branch lengths can be displayed using settings in the 'Branch layout' side panel palette of the tree editor.
- The accuracy of trees built using Create SNP Tree's Maximum Likelihood tree algorithm has been improved. This is achieved by adding columns of conserved nucleotides to the alignment, to account for the fact that most positions are not SNPs, and to ensure that the frequencies used by substitution models are representative of the full dataset.
- The option to 'Ignore positions with deletions' has been removed. The tool has always ignored these positions when using the Maximum Likelihood tree algorithm. It now always ignores these positions, which are enriched for false positive SNPs caused by missed structural variant calls in repeat regions.
- The option to define the Tree view setting has been removed. This aligns with the behavior of other tools. When opening the output tree, the default tree view setting will be applied.
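The idea behind adding conserved columns for the Maximum Likelihood algorithm can be illustrated with a minimal sketch. It assumes a SNP-only alignment held as a dictionary of equal-length strings and a hypothetical `conserved_counts` mapping that says how many invariant A, C, G, and T positions exist outside the SNP sites. How Create SNP Tree actually selects and counts these columns is not described in these notes, so the names and data below are illustrative only.

```python
from collections import Counter


def pad_with_conserved_columns(snp_alignment, conserved_counts):
    """Append invariant columns to every sequence of a SNP-only alignment.

    snp_alignment: dict of sample name -> string of SNP alleles (equal lengths).
    conserved_counts: hypothetical dict of base -> number of invariant sites.
    """
    # Every sample receives the same padding, so the new columns carry no
    # phylogenetic signal but do shift base frequencies and per-site rates.
    padding = "".join(base * n for base, n in sorted(conserved_counts.items()))
    return {name: seq + padding for name, seq in snp_alignment.items()}


def base_counts(alignment):
    """Count A, C, G, and T over all sequences of an alignment."""
    counts = Counter("".join(alignment.values()))
    return {base: counts[base] for base in "ACGT"}


snps = {"s1": "AC", "s2": "GC", "s3": "AT"}
padded = pad_with_conserved_columns(snps, {"A": 3, "C": 2, "G": 2, "T": 3})
print(base_counts(snps))
print(base_counts(padded))
```

In the toy data, the SNP-only alignment contains twice as many A and C as G and T, whereas the padded alignment reflects the composition of the invariant sites as well, which is the effect the improvement above aims for.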
For the SNP Tree Variants view, the Show Column palette in the Side Panel has been reinstated.
Download Curated Microbial Reference Database now supports downloading whole metagenome indexes. The available index type - whole metagenome or taxonomic profiling - depends on the selected database.
In the Abundance table Stacked Visualization view, with the Side Panel setting Chart Type, in addition to scaling by percentage where all bars have the same height of 100%, it is now possible to scale by counts, where the bar heights are proportional to the number of counts.
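The difference between the two scaling modes can be sketched as follows. This is a minimal illustration, assuming the abundance data is available as a dictionary of per-sample feature counts. The function name `scale_bars` and the mode labels are hypothetical and do not correspond to Workbench settings or APIs.

```python
def scale_bars(counts_per_sample, mode="percentage"):
    """Return the segment heights of a stacked bar chart for each sample.

    mode="percentage": every bar sums to 100.
    mode="counts": every bar sums to the total count of its sample.
    """
    scaled = {}
    for sample, counts in counts_per_sample.items():
        if mode == "counts":
            scaled[sample] = dict(counts)
        elif mode == "percentage":
            total = sum(counts.values())
            scaled[sample] = {
                feature: (100.0 * count / total if total else 0.0)
                for feature, count in counts.items()
            }
        else:
            raise ValueError(f"unknown mode: {mode}")
    return scaled


table = {"A": {"x": 30, "y": 70}, "B": {"x": 5, "y": 15}}
print(scale_bars(table, "percentage"))
print(scale_bars(table, "counts"))
```

With percentage scaling, samples A and B produce bars of equal height even though A has five times as many counts. With count scaling, that difference in sequencing depth is visible.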
In the report created by Find Best References using Read Mapping, the 'Assemblies' and 'References' tables now include a 'Name' column composed of the following attributes from the reference database: Latin name and Assembly ID for the 'Assemblies' table, and Latin name and Name for the 'References' table.
Abundance tables can now be used as input for Filter on Custom Criteria.
Taxonomic Profiling
- The abundance table will always be output; previously, this was optional.
- The default naming of outputs of Taxonomic Profiling within workflows has been updated. The term 'Database matches' has been replaced by 'Reference database reads', the term 'Host matches' has been replaced by 'Host reads', and the term 'Unclassified reads' has been replaced by 'Unmapped reads'.
For Create K-mer Tree, the option to define the Tree view setting has been removed. This aligns with the behavior of other tools. When opening the output tree, the default tree view setting will be applied.
Add Metadata to Abundance Table now only accepts Abundance Tables as input.
Various minor improvements

Bug fixes

Fixed an issue where Find Resistance with ShortBRED was unable to detect small markers in the QMI-AR Peptide Marker Database. For additional information, see https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/find-resistance-with-shortbred-is-unable-to-detect-short-amr-markers/.
Fixed an issue where Alpha diversity plots could not be opened if the underlying data contained one or more 'NaN' values.
Fixed an issue in the QC, Assemble, and Bin Pangenomes template workflow that caused the binned reads output to sometimes lack a paired status.
Fixed an issue with abundance tables where columns for aggregated features were added to the right-hand side of the table instead of the left-hand side, and where the location of columns in the table could change when aggregation settings were updated.
Fixed an issue in Combine Reports where values originating from the Find Best References using Read Mapping report were reported as zeros if all values in a column were integers. This occurred when the combined report was generated from sample reports or other combined reports. The issue did not occur when the Find Best References using Read Mapping report was used directly as input for the combined reports.
Corrected a typo in the Annotate with BLAST parameters. The label, previously displayed as 'Nuclotide sequence list,' has been updated to the correct spelling, 'Nucleotide sequence list.' This change applies to both the tool wizard and Command Line Tools commands.

Changes

Third party version updates

minimap2 has been upgraded to version 2.28. Results of Classify Long Read Amplicons, which uses minimap2, may consequently change.
DIAMOND has been upgraded to version 2.1.8. The following tools make use of DIAMOND, so results of these may consequently change: Annotate with DIAMOND, Annotate CDS with Best DIAMOND Hit, Find Resistance with ShortBRED, Create MLST Scheme, Create DIAMOND Index.

Functionality retirement

The following tool has been retired: Extract Regions from Tracks (legacy).
The following template workflow has been retired: Analyze Viral Hybrid Capture Panel Data (legacy). Instead, use the Analyze QIAseq xHYB Viral Panel Data (Human host) template workflow.

Advance Notice

Remove OTUs with Low Abundance has been moved to the Legacy Tools folder of the Workbench Toolbox and will be retired in a future version of the software. We recommend using Refine Abundance Table instead.

Reference data update, September 2024

Released on September 10, 2024

QIAseq xHYB Viral Panels reference data set: The QIAseq xHYB taxonomic profiling index has been updated to include at least one representative per viral species.
This addresses an issue where species like mumps, which were not previously represented in the taxonomic profiling index, could go undetected when this reference data set was used with the template workflow, Analyze QIAseq xHYB Viral Panel Data (Human host).

CLC Microbial Genomics Module 24.1.1

Released on July 9, 2024

Change

This update fixes an issue where it was possible to launch an analysis from the CLC Genomics Workbench using CLC Microbial Genomics Module 24.1 when the CLC Genomics Server still had CLC Microbial Genomics Server Extension 24.0 installed.
We recommend that sites using a CLC Genomics Server with the CLC Microbial Genomics Server Extension installed upgrade to this version.

Compatibility

At time of release, CLC Microbial Genomics Module 24.1.1 is available for CLC Genomics Workbench 24.0.1. It will be compatible with later releases in this major release line.
At time of release, CLC Microbial Genomics Server Extension 24.1.1 is available for CLC Genomics Server 24.0.1. It will be compatible with later releases in this major release line.

CLC Microbial Genomics Module 24.1

Released on June 25, 2024

New workflow and reference data

The template workflow Analyze QIAseq xHYB Mycobacterium Tuberculosis Panel Data (Human host) reports spoligotype, lineage, and drug resistance for QIAseq xHYB Mycobacterium tuberculosis Panel data.
The new template workflow comes with a complementary reference data set, QIAseq xHYB Mycobacterium tuberculosis Panel, available from the Reference Data Manager. The workflow is also available from the QIAseq Panel Analysis Assistant.
The RefSeq Prokaryotic 16S database contains RefSeq-curated 16S ribosomal RNA sequences from bacteria and archaea. The database is available for download from Download Amplicon-Based Reference Database.

New tools

Spoligotype Mycobacterium Tuberculosis facilitates spoligotyping of M. tuberculosis isolates from NGS reads. Based on the variability of 43 spacer oligonucleotides, the tool reports the spoligotype in binary and octal code, SIT, lineage, and sublineage.
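The relation between the binary and octal representations can be sketched as follows, using the commonly used convention for 43-spacer spoligotypes: spacers 1 to 42 are read as 14 triplets, each converted to one octal digit, and spacer 43 is appended unchanged as the fifteenth digit. This is an illustration of the notation only. The function name is hypothetical, and the sketch does not show how the tool derives the binary pattern from reads.

```python
def spoligotype_binary_to_octal(binary):
    """Convert a 43-character binary spoligotype into its 15-digit octal code."""
    if len(binary) != 43 or set(binary) - {"0", "1"}:
        raise ValueError("expected 43 characters, each '0' or '1'")
    # Spacers 1-42 form 14 triplets; each triplet becomes one octal digit.
    triplets = [binary[i:i + 3] for i in range(0, 42, 3)]
    octal = "".join(str(int(triplet, 2)) for triplet in triplets)
    # Spacer 43 is kept as a single 0 or 1 at the end.
    return octal + binary[42]


print(spoligotype_binary_to_octal("1" * 43))  # 777777777777771
```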
Refine Abundance Table facilitates the aggregation and filtering of abundance tables. Besides the reduced abundance table, the tool produces a report with the 50 most abundant aggregated features. For example, an abundance table with phylogenetic taxonomy can be aggregated at the genus level and filtered to include only genera with a certain abundance level. The report will then contain a list of the 50 most abundant genera.
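The genus-level example can be sketched in a few lines. This is a minimal illustration, assuming an abundance table represented as a list of (taxonomy, count) pairs. The function name, the `min_abundance` parameter, and the toy data are hypothetical and are not the tool's actual parameters.

```python
from collections import defaultdict


def aggregate_and_filter(table, level, min_abundance, top_n=50):
    """Aggregate counts at a taxonomic level, filter, and list the top features.

    table: list of (taxonomy dict, count) pairs.
    Returns (refined table as dict, list of the top_n most abundant features).
    """
    totals = defaultdict(int)
    for taxonomy, count in table:
        totals[taxonomy[level]] += count
    # Keep only aggregated features that reach the abundance threshold.
    refined = {name: c for name, c in totals.items() if c >= min_abundance}
    # Most abundant first; ties are broken alphabetically.
    top = sorted(refined.items(), key=lambda item: (-item[1], item[0]))[:top_n]
    return refined, top


table = [
    ({"genus": "Escherichia", "species": "E. coli"}, 40),
    ({"genus": "Escherichia", "species": "E. albertii"}, 10),
    ({"genus": "Bacillus", "species": "B. subtilis"}, 5),
    ({"genus": "Vibrio", "species": "V. mimicus"}, 20),
]
refined, top = aggregate_and_filter(table, "genus", min_abundance=10)
print(top)
```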
Join Nearby Variants merges variants that are more than one nucleotide apart but within three nucleotides of each other into larger microhaplotypes. The merge is done regardless of zygosity and frequency, and the tool is therefore most suited for data from monoploid organisms, such as most viruses and bacteria. Merging nearby variants allows amino acid changes to be more precisely determined by the Amino Acid Changes tool.
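The grouping step can be sketched as a simple chaining of variant positions. The sketch assumes that variants are represented by their start positions only and that any two consecutive variants at most `max_distance` nucleotides apart belong to the same microhaplotype. How the tool measures distance for multi-nucleotide variants, and how it treats directly adjacent variants, is not specified here, so both are assumptions of this illustration.

```python
def join_nearby_variants(positions, max_distance=3):
    """Group variant positions into candidate microhaplotypes.

    Consecutive positions no more than max_distance apart are chained
    into the same group, regardless of zygosity or frequency.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= max_distance:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


print(join_nearby_variants([100, 102, 105, 120]))
# [[100, 102, 105], [120]]
```

Because the three variants at 100, 102, and 105 fall within two neighboring codons, reporting them as one microhaplotype lets a downstream amino acid annotation evaluate the combined change rather than each variant in isolation.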
Create Report from Table enables the transformation of table elements into a report. As an example, a variant table can be converted into a report featuring a list of variants, which can then be included in a sample report.

Improvements

On PCoA plots, the axis titles now display two decimal places.
Download MLST Schemes will automatically skip generating a minimum spanning tree for MLST schemes containing more than 65,536 profiles. This addresses an issue with generating large trees and ensures smoother and more efficient handling of large MLST schemes. In addition, the tool has been updated with several minor improvements.
In the template workflow Create MLST Scheme with Sequence Types, only the report from Type with MLST Scheme is used as input to Combine Reports. Previously, the Add Typing Result to MLST report was also provided as input even though it is not supported for Combine Reports.
Improvements for Find Resistance with PointFinder:
- Speed improvements for cases where there are many variants for one gene.
- The tool now uses a slower but more sensitive read mapping mode, which is able to classify reads as supporting/not supporting resistance with greater accuracy.
Bin Pangenomes by Taxonomy: Report content and formatting have been updated, and the report can now be used as input to the tools Create Sample Report and Combine Reports.
Bin Pangenomes by Sequence:
- Report content and formatting have been updated, and the report can now be used as input to the tools Create Sample Report and Combine Reports.
- The tool has been optimized to handle datasets of up to 100,000 contigs.
Add Metadata to Abundance Table now generates a new abundance table that includes the added metadata. Previously, the input table would be modified.

Bug fixes

Find Resistance with PointFinder:
- Fixed an issue where the tool, in rare cases, would state that a resistance-causing insertion was present when it was not.
- Fixed an issue where the tool, in cases where a read had a single nucleotide deletion that conferred resistance, would report all later resistance-conferring deletions on the same gene as present even when they were not. Of the PointFinder databases currently available with Download Resistance Database, this issue affects only Klebsiella and Tuberculosis. For these databases, the later deletions in a gene always support resistance to the same drug. Therefore, this issue does not change the overall drug resistance status. However, it does affect the read counts for each resistance in the report and the number of resistant variants in the table view, and may therefore seem to give more support for the resistance than is warranted.
- Fixed an issue where the tool, in cases where reads contained a deletion, would sometimes also report database variants contained within that deletion even though only the larger deletion was supported by the data.
- Fixed an issue where the tool would sometimes fail when reads mapped to a region with multiple overlapping deletions.
Fixed an issue that caused Download MLST Scheme and Import MLST Schemes to fail for schemes with sequence types containing ambiguous bases. Note that the downstream tool Type with MLST Scheme does not support ambiguous bases and such sequence types will effectively be ignored during analysis.
Fixed an issue that caused Taxonomic Profiling to sometimes fail when reads mapped with unaligned ends across the junction of circular genomes.

Changes

The ARES database is no longer available from Download Resistance Database.

Advance Notice

The template workflow Analyze Viral Hybrid Capture Panel Data has been moved to the Legacy Tools folder of the Workbench Toolbox and will be retired in a future version of the software. We recommend instead using the template workflow Analyze QIAseq xHYB Viral Panel Data.
Extract Regions from Tracks has been moved to the Legacy Tools folder of the Workbench Toolbox and will be retired in a future version of the software. We recommend using Extract Reads.

Compatibility

At time of release, CLC Microbial Genomics Module 24.1 is available for CLC Genomics Workbench 24.0.1. It will be compatible with later releases in this major release line.
At time of release, CLC Microbial Genomics Server Extension 24.1 is available for CLC Genomics Server 24.0.1. It will be compatible with later releases in this major release line.

Database update, March 2024

Released on March 21, 2024

The UNITE databases available from Download Amplicon-Based Reference Database have been updated to UNITE version 9.0.
The curated ViraCura HPV databases available from Download Curated Microbial Reference Database have been updated to ViraCura version 1.2.
For Download MLST Scheme, six new schemes have been added: Borrelia spp. cgMLST, Leptospira cgMLST, Neisseria spp. N. gonorrhoeae cgMLST v2, Yersinia Y.enterocolitica cgMLST, Yersinia Y.pseudotuberculosis cgMLST, and Yersinia Yersinia cgMLST. In addition, the following scheme has been removed: Neisseria spp. N. meningitidis cgMLST v1.
For Download Pathogen Reference Database, one new species has been added to the list of available pathogens: Enterococcus hirae.

CLC Microbial Genomics Module 24.0.1

Released on March 12, 2024

Improvements

Analyze QIAseq xHYB Viral Panel Data (Human host) and Analyze Viral Hybrid Capture Panel Data (legacy) template workflows have been updated to support input of multiple sequence lists per sample.
When pressing the "Extract Reads from Selection" button on an abundance table, a progress dialog is now shown, and it is possible to cancel the calculation.

Bug fixes

Fixed an issue affecting Create SNP Tree where selecting the 'Maximum Likelihood' tree construction algorithm caused incorrect branch lengths, which could result in an inaccurate tree representation. This issue did not impact the topology of the tree.
Fixed an issue that caused Bin Pangenomes by Sequence to sometimes fail.
Fixed an issue where, when Taxonomic Profiling was provided with two or more sequence lists of input, the three optional outputs - 'Reads matching the reference database', 'Reads matching the host genome', and 'Unclassified reads' - would only contain reads from the first input sequence list.
Fixed an issue causing Download Resistance Database to not work when run on the CLC Genomics Server.
Fixed an issue where Bin Pangenomes by Sequence would not output all binned reads, but only reads from the first read mapping. Similarly, read values in the report would only refer to reads in the first read mapping.
Fixed an issue causing Classify Long Read Amplicons to sometimes fail.
Fixed an issue causing Download MLST Schemes to sometimes fail if multiple schemes were downloaded simultaneously.
Fixed an issue causing the History view of the EC database available from Download Ontology Database to be empty.
Fixed an issue with the Database Builder where the column order would change when aggregating the table by taxonomy. As a result of this fix, the ability introduced in CLC Microbial Genomics Module 24.0 to reorder columns in the side panel has been removed.
Fixed an issue where the PFAM link in the Annotate CDS with PFAM Domains results table did not work. The link is now provided in the separate 'Link to PFAM' column.
Fixed an issue in the Alpha Diversity workflow element where the parameter 'Sample with replacement' could not be adjusted.
Fixed an issue where filtering options of Identify Pathways could not be configured in a workflow.
Fixed an issue where an error could be shown when Identify Pathways was the first tool in a workflow.
Fixed an issue that prevented users from restoring CLC Standard Settings for abundance tables in stacked visualization view.
Fixed an issue where combining Classify Long Read Amplicons reports would result in a combined report where taxonomic levels did not appear in canonical order.
Various minor bug fixes

Changes

Add Metadata to Abundance Table can no longer be included in a workflow. Running the tool in a workflow was mistakenly enabled for Microbial Genomics Module 24.0, but it would result in the workflow failing.

Advance Notice

The template workflow Analyze Viral Hybrid Capture Panel Data has been moved to the Legacy Tools folder of the Workbench Toolbox and will be retired in a future version of the software. We recommend instead using the template workflow Analyze QIAseq xHYB Viral Panel Data.
Extract Regions from Tracks has been moved to the Legacy Tools folder of the Workbench Toolbox and will be retired in a future version of the software. We recommend using Extract Reads instead.
From end of May 2024, the ARES database will no longer be available from Download Resistance Database.

Compatibility

At time of release, CLC Microbial Genomics Module 24.0.1 is available for CLC Genomics Workbench 24.0.1. It will be compatible with later releases in this major release line.

At time of release, CLC Microbial Genomics Server Extension 24.0.1 is available for CLC Genomics Server 24.0.1. It will be compatible with later releases in this major release line.

Database update, February 2024

Released on February 23, 2024

The MetaCyc database is available for download via Download Pathway Database.

Database updates, January 2024

Released on January 22, 2024

The Virulence Factor database (VFDB) is available for download via Download Resistance Database.

Released on January 12, 2024

The MetaCyc database is no longer available for download via Download Pathway Database.

CLC Microbial Genomics Module 24.0

Released on January 9, 2024

New features and improvements

New tool: Classify Long Read Amplicons allows for taxonomic profiling of amplicon data, including 16S rRNA, obtained with Oxford Nanopore or PacBio long-read sequencing platforms.
For Create SNP Tree, the SNP Tree Variants view offers a 'By sample' SNP representation where the alleles for each sample are displayed in separate columns.
For Find Best References using Read Mapping, the new option to treat each assembly ID as a reference introduces support for segmented virus genomes like influenza A and B.
For Create Sample Report, selected quality control metrics are supported for the following tools: Assign Taxonomies to Sequences in Abundance Table, Classify Long Read Amplicons, Find Best Matches using K-mer Spectra, Find Best References using Read Mapping, Identify Viral Integration Sites, Mask Low-Complexity Regions, OTU Clustering, and Taxonomic Profiling.
OTU Clustering:
- The option to set the output name has been removed. This aligns with the behavior of other tools.
- The default names of the various tool outputs have been modified to better reflect their content.
- Reports from this tool can now be used as input to the tools Create Sample Report and Combine Reports.
- The formatting of the report has been updated, and some sections of the report have been renamed. The report contains information about merged overlapping paired reads. This content was previously provided in a separate report.
Taxonomic Profiling:
- The tool now supports multiple sequence lists per sample.
- Auto-detect paired distance calculation is now based on uniquely mapped reads only. Previously, all mapped reads were considered.
- The name of the parameter 'Adjust read count abundances' has been changed to 'Adjust for read length variation'. The previous name suggested that only the abundance calculation was affected, but the option actually changes the calculation of both abundance and coverage.
Detect Amplicon Sequence Variants:
- The tool now supports multiple sequence lists per sample.
- Minor improvements to the algorithm may marginally change results.
Abundance tables:
- Sorting samples by metadata will not only group samples based on metadata values but now also sort those groups based on the values. Moreover, sorting by numeric attributes will now follow the correct numeric order.
- The dropdown menu for aggregating features by taxonomic level will display options in the taxonomic order Kingdom, Phylum, Class, Order, Family, Genus, and Species, rather than alphabetically.
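The change in sample sorting can be illustrated with a short sketch that groups samples by metadata value and orders the groups numerically where possible. It assumes metadata values arrive as strings and that a value counts as numeric when it parses as a float. That rule, and the function names, are assumptions of this illustration and not a description of the Workbench implementation.

```python
from itertools import groupby


def metadata_sort_key(value):
    """Sort numeric values numerically and before non-numeric text values."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


def group_and_sort(sample_metadata):
    """Return [(value, [sample names])] with groups ordered by value."""
    ordered = sorted(
        sample_metadata.items(),
        key=lambda item: (metadata_sort_key(item[1]), item[1], item[0]),
    )
    return [
        (value, [name for name, _ in members])
        for value, members in groupby(ordered, key=lambda item: item[1])
    ]


samples = {"S1": "10", "S2": "9", "S3": "100", "S4": "9"}
print(group_and_sort(samples))
# [('9', ['S2', 'S4']), ('10', ['S1']), ('100', ['S3'])]
```

A plain text sort of the same values would give the order 10, 100, 9, which is the behavior the numeric ordering avoids.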
Database Builder:
- The Database Builder now uses 'Yes', 'Partially', and 'No' instead of checkboxes to indicate whether a genome is included in the current selection. When data is aggregated by, for example, family, and only some members of the family are included, the column will display 'Partially'. The 'Show rows' setting in the Side Panel with Included/Excluded checkboxes has been removed because the same effect can be achieved by filtering at the top of the table on the 'Included' column.
- Two new columns, TaxID and Species TaxID, are available for filtering. The columns are also included in the downloaded reference database.
- The dropdown menu for aggregating rows on taxonomy will display options in the taxonomic order Kingdom, Phylum, Class, Order, Family, Genus, and Species, rather than alphabetically.
Alpha Diversity and Beta Diversity produce empty outputs if no plot can be generated. Previously, no output was created, which could be problematic in a workflow setting.
DIAMOND related:
- DIAMOND has been upgraded to version 2.0.14. This third-party code is used by the following tools, the output of which may consequently change: Annotate with DIAMOND, Annotate CDS with Best DIAMOND Hit, Find Resistance with ShortBRED, Create MLST Scheme, Create DIAMOND Index.
- The DIAMOND 'fast' sensitivity mode has been added to tools that use DIAMOND: Annotate with DIAMOND, Annotate CDS with Best DIAMOND Hit, Find Resistance with ShortBRED, Create MLST Scheme.
- Annotate CDS with Best DIAMOND Hit now uses the DIAMOND 'iterate' option. This will search the query database with increasing sensitivity, only searching queries at the user-specified sensitivity if they do not produce a significant hit at a lower sensitivity search. This change generally makes the tool run faster.
- Annotate CDS with Best DIAMOND Hit has been optimized to run substantially faster when multiple sequences are provided as input.
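The iterative search strategy described for the 'iterate' option can be sketched generically. This is not DIAMOND code: the `search` callable, the level names, and the toy data are hypothetical placeholders that only illustrate why escalating sensitivity per query tends to save time.

```python
def iterative_search(queries, search, levels, target_level):
    """Search with increasing sensitivity, stopping per query at the first hit.

    search(query, level) is a hypothetical callable returning a hit or None.
    levels is ordered from fastest to most sensitive; target_level is the
    highest sensitivity that will be tried.
    """
    stop = levels.index(target_level)
    hits = {}
    remaining = list(queries)
    for level in levels[: stop + 1]:
        unresolved = []
        for query in remaining:
            hit = search(query, level)
            if hit is not None:
                hits[query] = (level, hit)
            else:
                unresolved.append(query)
        # Only queries without a significant hit move on to the slower level.
        remaining = unresolved
        if not remaining:
            break
    return hits, remaining


# Toy setup: the lowest level at which each query yields a hit (None = never).
levels = ["fast", "default", "sensitive"]
needed = {"q1": "fast", "q2": "sensitive", "q3": None}
calls = []


def fake_search(query, level):
    calls.append((query, level))
    need = needed[query]
    if need is not None and levels.index(level) >= levels.index(need):
        return "hit-for-" + query
    return None


hits, unresolved = iterative_search(["q1", "q2", "q3"], fake_search, levels, "sensitive")
print(hits, unresolved, len(calls))
```

In the toy run, q1 is resolved in the fast pass and is never searched again, so only the harder queries pay the cost of the more sensitive passes.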
For antibiotic resistance databases and resistance abundance tables, antibiotic resistance ontology (ARO) links now point to EMBL-EBI's Ontology Lookup Service OLS4 instead of the previous OLS3.
Find Resistance with ShortBRED:
- Reports from this tool can now be used as input to the tools Create Sample Report and Combine Reports.
- Report content and formatting have been updated.
For Find Resistance with Nucleotide Database, parameters 'Minimum identity' and 'Minimum length' can now be set to 0% to skip filtering of results by these metrics.
Download Custom Microbial Reference Database now uses less memory.
PERMANOVA Analysis has been updated to support running the tool in a workflow.
Differential Abundance Analysis:
- The parameter label 'Metadata factor' has been renamed to 'Test differential abundance due to'.
- The tool has been updated to support running it in a workflow.
When exporting the Identified Pathways Table, sample-specific columns will include the name of the sample.

Template workflows

Analyze QIAseq xHYB Viral Panel Data (Human host) and Find QIAseq xHYB AMR Markers (Human host):
- The workflows are now also available from the QIAseq Panel Analysis Assistant.
- "(Human host)" has been appended to the template workflow names to reflect that these workflows were designed for analysis of human host data. Instructions for how to modify the workflows to work with non-human samples are found in the corresponding sections of the CLC Microbial Genomics Module user manual.
Detect Amplicon Sequence Variants and Assign Taxonomies: Adapter trimming is now optional.
Analyze Viral Hybrid Capture Panel Data and Analyze QIAseq xHYB Viral Panel Data (Human host): The step Map Reads to Reference has been removed as it has become redundant. Additionally, for Low Frequency Variant Detection, the Minimum frequency has been increased from 1% to 10%. Given that the downstream step Remove Marginal Variants filters out variants below 20%, this will not impact results for default settings.
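The reasoning behind the threshold change can be checked with a small sketch that models both steps as plain frequency filters. This is a simplification: the variant records and function name are hypothetical, and the sketch does not model any effect the detection threshold may have on variant calling itself.

```python
def filter_by_frequency(variants, minimum_percent):
    """Keep variants whose frequency is at least minimum_percent."""
    return [v for v in variants if v["frequency"] >= minimum_percent]


variants = [
    {"name": "v1", "frequency": 2.5},
    {"name": "v2", "frequency": 12.0},
    {"name": "v3", "frequency": 20.0},
    {"name": "v4", "frequency": 85.0},
]

# Old defaults: detect at 1%, then remove variants below 20%.
old_result = filter_by_frequency(filter_by_frequency(variants, 1.0), 20.0)
# New defaults: detect at 10%, then remove variants below 20%.
new_result = filter_by_frequency(filter_by_frequency(variants, 10.0), 20.0)

print([v["name"] for v in old_result])
print([v["name"] for v in new_result])
```

Any variant between 1% and 10% is discarded by the 20% filter either way, so both pipelines keep the same variants as long as the downstream threshold stays at or above 10%.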

Database and reference data updates and changes

Greengenes2. The Greengenes 16S database available from Download Amplicon-Based Reference Database has been updated to Greengenes2 version 2022.10.
Download MLST Scheme supports download of MLST schemes hosted by Institut Pasteur. This adds the following schemes: Bordetella pertussis cgMLST, Bordetella MLST, Corynebacterium diphtheriae complex cgMLST, Corynebacterium diphtheriae complex MLST, Corynebacterium ulcerans cgMLST, Klebsiella pneumoniae species complex MLST, and Listeria spp. MLST. In addition, the following PubMLST-hosted scheme has been added: Serratia spp. MLST.
The name of the ARG-ANNOT database available from Download Resistance Database has been changed from AR Marker Database to ARG-ANNOT Peptide Marker Database.
The name of the Enzyme Commission number (EC) database available from Download Ontology Database now includes the EC database version.

Bug fixes

Fixed an issue causing the legend on EC functional profile abundance table stacked visualization to show the wrong labels when the data was aggregated by feature.
Fixed an issue causing Bin Pangenomes by Sequence and Taxonomic Profiling to ignore the read orientation on paired read sequence lists and assume that it was forward-reverse. The tools now utilize the read orientation setting on the input sequence lists.
Fixed an issue where Create Taxonomic Profiling Index would fail if one or more sequences in the reference database did not have a taxonomy annotation.
Fixed an issue causing Find Resistance with Nucleotide Database to fail if the Phenotype column of the provided database contained colons.
Fixed an issue that prevented users from restoring CLC Standard Settings for the Database Builder, abundance tables, the PCoA 3D plot, the Sunburst plot, and the Identified Pathways Graph.
Fixed an issue causing Annotate with DIAMOND to sometimes fail.
Fixed an issue causing Annotate with DIAMOND to produce CDS annotations with invalid GO-term links.
Fixed an issue where export of Antimicrobial resistance abundance tables to .biom format would fail.
Fixed an issue where Import PICRUSt2 Multiplication Table run on the CLC Genomics Server would display an error message even though the import was successful.
Fixed an issue causing Build Functional Profile to double-count GO terms if the same term was annotated twice on the same CDS.
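The double-counting fix for Build Functional Profile can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function name count_go_terms and the input layout (a mapping from a CDS identifier to the list of GO terms annotated on it) are assumptions made for illustration only. Collapsing the annotations of each CDS to a set before counting means that a term annotated twice on the same CDS contributes once.

```python
from collections import Counter


def count_go_terms(cds_annotations):
    """Count how many CDSs carry each GO term, counting a term once per CDS.

    cds_annotations is a hypothetical mapping: CDS id -> list of GO term ids.
    """
    counts = Counter()
    for go_terms in cds_annotations.values():
        # set() removes repeated annotations of the same term on one CDS
        counts.update(set(go_terms))
    return counts


example = {
    "cds_1": ["GO:0008150", "GO:0008150", "GO:0003674"],  # term annotated twice
    "cds_2": ["GO:0008150"],
}
print(count_go_terms(example))
```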

Changes

In the SNP Tree Variants view it is no longer possible to select/deselect columns in the Side Panel.
Find Resistance with Nucleotide DB has been renamed to Find Resistance with Nucleotide Database. Likewise, the tool parameter 'DB' has been renamed to 'Database'.
The following tools have been moved to new locations in the Toolbox:
- Mask Low-Complexity Regions is now under Utility Tools. It was previously under Microbial Genomics Module | Tools. The Tools folder has been removed.
- Create Result Metadata Table, Extend Result Metadata Table, and Use Genome as Result are now under Utility Tools | Result Metadata. They were previously under Microbial Genomics Module | Typing and Epidemiology | Result Metadata.

Known issue

On Mac computers with an Intel processor, running CLC Genomics Workbench 24.0, it is not possible to open abundance tables in Sunburst view. Attempting to do so will cause the Workbench to shut down. This issue was addressed in CLC Genomics Workbench 24.0.1.

Functionality retirement

The following tool has been retired: Convert Abundance Table to Experiment

Advance Notice

The template workflow Analyze Viral Hybrid Capture Panel Data has been moved to the Legacy Tools folder of the Workbench Toolbox and will be retired in a future version of the software. We recommend instead using the template workflow Analyze QIAseq xHYB Viral Panel Data.
Extract Regions from Tracks has been moved to the Legacy Tools folder of the Workbench Toolbox and will be retired in a future version of the software. We recommend using Extract Reads instead.
From the end of May 2024, the ARES database will no longer be available from Download Resistance Database.

CLC Microbial Genomics Module 23.0.3

Released on December 4, 2023

Bug fixes

Updated Download MLST Scheme to align with recent requirement changes at PubMLST, the provider of the MLST schemes downloaded by this tool.
Fixed an issue causing the legend on EC functional profile abundance table stacked visualization to show the wrong labels when the data was aggregated by feature.
Fixed an issue causing Import MLST Scheme to fail if the import file contained empty lines.

Database updates, November 2023

Released on November 27, 2023

Download MLST Scheme: 10 new schemes have been added: Bacillus licheniformis MLST, Blastocystis spp. ST3 MLST, Blastocystis spp. ST4 MLST, Campylobacter jejuni/coli C. jejuni / C. coli cgMLST v2.0, Escherichia spp. wgMLST, Mycoplasma hyosynoviae MLST, Proteus spp. MLST, Proteus spp. Virulence, Streptococcus uberis cgMLST, Vibrio parahaemolyticus cgMLST.
In addition, the scheme Campylobacter jejuni/coli C. jejuni / C. coli cgMLST v1 was renamed to Campylobacter jejuni/coli C. jejuni / C. coli cgMLST v1.0, and the following two invalid schemes were removed: Anaplasma phagocytophilum ankA, Anaplasma phagocytophilum groEL.
The Enzyme Commission number (EC) database available from Download Ontology Database was updated to ENZYME version 08-Nov-2023.

Database update, October 2023

Released on October 11, 2023

UHGG. The taxonomic profiling UHGG reference database available from Download Curated Microbial Reference Database has been updated to version 2.0.1_1. The update addresses an issue where the previous version of the database contained 200+ genomes where all or large parts of the sequences consisted of N's. When using this database for Taxonomic Profiling, zero reads would map to the stretches of N's. Consequently, the affected strains or species would go undetected by the analysis, or their abundance counts would be underestimated. For additional information, see https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/uhgg-database-for-taxonomic-profiling-contains-sequences-with-long-unintentional-stretches-of-ns/.

Database and reference data updates, September 2023

Released on September 8, 2023

Fixed an issue where Download Ontology Database would fail for download of the Gene Ontology database with Pfam2GO mappings.

CLC Microbial Genomics Module 23.0.2

Released on August 3, 2023

Improvements

Create Taxonomic Profiling Index
- Reduced disk space usage.
- Increased speed when creating an index from a sequence list with many duplicate sequences.
- Reduced memory usage when creating an index from a sequence list with many short sequences.
- When used in a workflow, the tool now only outputs a report when configured to.
The speed of Alpha Diversity has been substantially improved for analyses where the setting "Sample with replacement" is not checked. This improvement may change the alpha diversity results slightly.

Bug fixes

Fixed an issue for Download Custom Microbial Reference Database where choosing "Build database from taxonomic lineages" in combination with the inclusion criteria "One per genus" would sometimes give an incorrect genome selection. See https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/download-custom-microbial-reference-database-one-per-genus-option-produces-erroneous-genome-selection/.
Find Resistance with PointFinder
- Fixed an issue where the tool would report detection of an insertion from the applied database when what was in the data was a longer insertion starting with the same inserted sequence.
- Fixed an issue where the tool would report only one insertion or deletion present for cases where multiple insertions and deletions were present within the same reads.
- Fixed an issue where the tool would by mistake accept single sequence inputs. This led to an analysis error as such input is not supported.

Fixed an issue where the ResFinder database available from Download Resistance Database would have double entries.
Fixed an issue where Beta Diversity would fail on Bray-Curtis and Jaccard results for abundance tables where two or more samples had all zero abundance values.
Fixed an issue where Annotate CDS with Best DIAMOND Hit would produce results with invalid GO-term links.
Build Functional Profile
- Fixed an issue where the tool would proceed and silently ignore part of the input when sequences in the read mapping and reference did not match up. Now, for such instances the tool will fail as a result of incompatible inputs.
- Fixed an issue where the tool would fail for circular genomes.
Fixed an issue where Differential Abundance Analysis would allow selection of a metadata attribute where all samples had the same value. This would cause the analysis to fail.
Fixed an issue where extracting sequences from an MLST scheme allele table using "Extract Selected Alleles" would result in a sequence list without locus annotations.
Fixed an issue with abundance tables where information would sometimes go missing when zooming in on the bar chart.
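The Beta Diversity fix listed above concerns samples whose abundance values are all zero. A minimal Python sketch shows why such samples are a corner case: with the usual definitions, both the Bray-Curtis and the Jaccard distance divide by a quantity that is zero when both samples are empty. The function names and the choice to return 0.0 for two empty samples are assumptions for illustration; the release notes do not state how the tool resolves this case.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        # Both samples are all zero: the usual formula would divide by zero.
        # Returning 0.0 is an assumed convention, not documented tool behavior.
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def jaccard(a, b):
    """Jaccard distance based on presence/absence of features."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        # No feature is present in either sample (assumed convention as above).
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


print(bray_curtis([0, 0, 0], [0, 0, 0]))
print(jaccard([1, 1, 0], [0, 1, 1]))
```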

Database updates, July 2023

Released on July 4, 2023

The Virulence Factor database (VFDB) is no longer available from Download Resistance Database.
Fixed an issue for the QMI-AR and CARD databases available from Download Resistance Database where versioning was missing from the database names.

Database updates, June 2023

Released on June 23, 2023

The following databases available from Download Resistance Database have been updated:
- ShortBRED Marker Databases
  - QMI-AR: QMI-AR Peptide Marker Database 7.0 contains peptide markers from the following databases: CARD 3.2.6, ARG-ANNOT V6_JULY2019, NCBI Bacterial Antimicrobial Resistance Reference Gene Database 2023-04-17.1, and ResFinder 4.3.1.
  - CARD: The CARD Peptide Marker Database was updated to CARD version 3.2.6.
- Nucleotide Databases
  - QMI-AR: QMI-AR Nucleotide Database 7.0 contains nucleotide sequences from the following databases: CARD 3.2.6, ARG-ANNOT V6_JULY2019, NCBI Bacterial Antimicrobial Resistance Reference Gene Database 2023-04-17.1, and ResFinder 4.3.1.
  - CARD: The CARD Nucleotide Database was updated to CARD version 3.2.6.
  - VFDB: The VFDB Nucleotide Database was updated to VFDB version 2023_04_17.
- Point Mutation Databases
  - PointFinder: The PointFinder databases were updated to PointFinder version 3.0.1.
  - Fixed an issue for the PointFinder databases that contain insertions and deletions, where all insertions and some deletions were incorrectly represented in the database reference sequences. See https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/pointfinder-databases-contain-incorrect-insertions-and-deletions/.
Download MLST Scheme: Five new schemes have been added: Anaplasma phagocytophilum ankA, Anaplasma phagocytophilum groEL, Lactococcus garvieae MLST, Neisseria spp. Ng_pblaST, Staphylococcus aureus cgMLST. Haemophilus influenzae cgMLST was renamed to Haemophilus influenzae cgMLST v1.
Download Pathogen Reference Database: The list of available pathogens has been updated with three new species: Kosakonia_oryzendophytica, Kosakonia_oryziphila, and Phytobacter_massiliensis. In addition, the Enterobacter genus entry was replaced by 14 species: Enterobacter_asburiae, Enterobacter_bugandensis, Enterobacter_cancerogenus, Enterobacter_chengduensis, Enterobacter_chuandaensis, Enterobacter_cloacae, Enterobacter_hormaechei, Enterobacter_kobei, Enterobacter_ludwigii, Enterobacter_mori, Enterobacter_oligotrophicus, Enterobacter_roggenkampii, Enterobacter_sichuanensis, Enterobacter_soli.
Fixed an issue for Download Custom Microbial Reference Database where the Database Builder "In RefSeq" column would state "Yes" also for genomes that were previously in RefSeq, but have since been removed. The column will now state "Yes" only for genomes currently in RefSeq.
If your creation of a custom reference database included filtering on the "In RefSeq" column or used one of the Quick Selection RefSeq options (which rely on the "In RefSeq" column), your reference database will likely contain more genomes than you intended.
Reasons provided by RefSeq as to why genomes were removed include "from large multi-isolate project", "contaminated", and "many frameshifted proteins".

Database and reference data updates, May 2023

Released on May 31, 2023

Fixed an issue where Download Ontology Database would fail for download of the Enzyme Commission Numbers (EC) database.

Reference data update, April 2023

Released on April 17, 2023

QIAseq xHYB Viral Panels QIAGEN reference data set: The viral reference database elements were updated to better reflect genome accession numbers used for original kit design. As an example, the respiratory reference database now contains the SARS-CoV subspecies “Severe acute respiratory syndrome coronavirus 2” (isolate Wuhan-Hu-1) instead of “SARS coronavirus Tor2”. Also, in addition to the existing Species TaxID annotations, the reference databases now contain TaxID annotations.

CLC Microbial Genomics Module 23.0.1

Released on March 20, 2023

Bug fixes

Fixed a metadata-related issue that impacted all tools producing abundance tables: Detect Amplicon Sequence Variants, OTU Clustering, Build Functional Profile, Find Resistance with ShortBRED, and Taxonomic Profiling.
When an abundance table was based on samples with multiple metadata associations, metadata attributes and values would sometimes get mixed up. This had several implications for the affected tables and downstream analysis:
- The abundance table functionality 'Aggregate samples' would use wrong values for some of the metadata attributes.
- Differential abundance analyses created from affected abundance tables would be based on wrong values and therefore results would be wrong, see https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/differential-abundance-analysis-may-use-wrong-values-for-samples-with-multiple-metadata-associations/.
- Heat maps based on affected abundance tables would show wrong metadata values.
  Important: The solution does not correct existing abundance tables and downstream analysis results. Affected tables and downstream results should be rerun. To check if your results are affected, open the underlying samples, and go to Elements Info View by clicking on the Show Elements Info icon under the View area. Under ‘Metadata’, check to see if you have more than one metadata association. If you see only one metadata association, your results are not affected. If you see two or more metadata associations, we recommend that you rerun the analysis.
Fixed an issue for EC functional profile abundance tables where sub-tables aggregated on feature (Level 1-4) or sample values could not be opened in stacked visualization view.
Fixed an issue where a mixture of number and text-based metadata attributes on samples could make different types of analysis fail.
Fixed an issue where Find Resistance with Nucleotide DB would by mistake accept more than one database as parameter input. If you selected more than one database, only the first database would have been used. The tool now accepts only one database.

Database and reference data updates, March 2023

Released on March 16, 2023

QMI-PTDB: New versions of the QIAGEN Microbial Insights - Prokaryotic Taxonomy Databases are available from Download Curated Microbial Reference Database. Genomes and annotations are based on NCBI Reference Sequence Database (RefSeq) release 216 and provided with Genome Taxonomy Database (GTDB) v207 taxonomy. The larger database, QMI-PTDB Genus 2.0, represents all genera with a varying number of species per genus. The smaller database, QMI-PTDB Family 2.0, represents all families with a varying number of genera per family. Both databases include species commonly included in microbial reference standards.
UHGG: The Unified Human Gastrointestinal Genome database available from Download Curated Microbial Reference Database has been updated to UHGG version 2.0.1.
ARES: The ARES resistance marker database available from Download Resistance Database has been updated to ARESdb version 3.1.
Download Pathogen Reference Database: The list of available pathogens has been updated with 11 new species: Burkholderia cepacia complex, Listeria innocua, Mannheimia haemolytica, Pasteurella multocida, Pluralibacter gergoviae, Stenotrophomonas maltophilia, Streptococcus equi, Streptococcus suis, Treponema pallidum, Vibrio fluvialis, and Vibrio metschnikovii. In addition, the Neisseria genus item was replaced by four species: Neisseria lactamica, Neisseria meningitidis, Neisseria polysaccharea, Neisseria gonorrhoeae.
Download MLST Scheme: Eight new schemes have been added: Bacillus cereus B. cereus cgMLST, Burkholderia mallei cgMLST, Enterococcus faecium MLST (Bezdíček et al.), Haemophilus influenzae cgMLST, Mycobacteroides abscessus complex cgMLST, Mycoplasma genitalium MG191-MG309 typing, Mycoplasma genitalium MLST, and Neisseria spp. N. meningitidis cgMLST v2.

CLC Microbial Genomics Module 23.0

Released on January 17, 2023

New features

The tool Detect Amplicon Sequence Variants infers sample sequence variants for amplicon data down to single nucleotide resolution.
The tool Assign Taxonomies to Sequences in Abundance Table assigns taxonomies from a taxonomic index to sequences in an abundance table, e.g. an amplicon sequence variant abundance table.

New workflows and reference data

Detect Amplicon Sequence Variants and Assign Taxonomies - a workflow for preparing reads, detecting amplicon sequence variants from amplicon sequence data, and assigning taxonomies to these.
Find QIAseq xHYB AMR Markers - a workflow for finding antimicrobial resistance markers from QIAseq xHYB AMR Panel data. A complementary reference data set, QIAseq xHYB AMR Panel, is available from the Reference Data Manager.
Analyze QIAseq xHYB Viral Panel Data - a workflow for performing taxonomic profiling, and mapping viral reads for variant calling for QIAseq xHYB Respiratory, Viral STI, Adventitious Agent, and MPXV Panel data. A complementary reference data set, QIAseq xHYB Viral Panels, is available from the Reference Data Manager.
The reference data set QIAseq xHYB Viral Panels has been updated (v1.2) to have four reference_database elements, one for each of the supported QIAseq xHYB panels (Respiratory, Viral STI, Adventitious Agent, and MPXV). This makes for a more targeted analysis than with the previous QIAseq xHYB Viral Panel v1.1, which had one element covering all four panels. The reference data set is for use with the template workflow Analyze QIAseq xHYB Viral Panel Data and is available from the Reference Data Manager.

Improvements and changes

Taxonomic Profiling uses mapping quality scores for better precision. Reads mapping with a low mapping quality score are assigned to a higher taxonomy level. The output "Reads matching the reference database" is annotated with a mapping quality score for each read.
Files containing amplicon sequence variant counts in tabular format (.xls/.xlsx/.csv) can be imported as abundance tables using Standard import.
Merge Abundance Tables has been extended to support amplicon sequence variants abundance tables.
Find Best References Using Read Mapping will handle duplicate references. If same-name references have identical sequences, only one of these will be included in analysis. If same-name references have different sequences, they will be renamed to ensure unique names.
The Phenotype and Phenotype ARO columns have been removed from the ARES Database downloaded with Download Resistance Database as these annotations are no longer available from the source.
The Identified Pathways table can be exported to Excel and other tabular formats.
Extract Regions from Tracks was moved to the Toolbox folder Tools.
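The duplicate-reference handling described above (same-name references with identical sequences are collapsed, same-name references with different sequences are renamed) can be sketched in Python. This is an illustration under stated assumptions, not the tool's code: the function name resolve_duplicate_references, the (name, sequence) tuple layout, and the numeric suffix used for renaming are all hypothetical, and the sketch keeps the first occurrence under its original name, which may differ from what the tool does.

```python
def resolve_duplicate_references(references):
    """Collapse or rename same-name references.

    references is a list of (name, sequence) tuples. Returns a new list in
    which identical same-name entries appear once and differing same-name
    entries get a hypothetical "_<n>" suffix.
    """
    seen = {}  # name -> list of distinct sequences already kept for that name
    resolved = []
    for name, sequence in references:
        variants = seen.setdefault(name, [])
        if sequence in variants:
            continue  # identical duplicate: keep only the first copy
        variants.append(sequence)
        if len(variants) == 1:
            resolved.append((name, sequence))
        else:
            # Different sequence under an existing name: rename it. The suffix
            # scheme is assumed and does not guard against clashes with
            # references that already carry such a name.
            resolved.append((f"{name}_{len(variants)}", sequence))
    return resolved


refs = [("A", "ACGT"), ("A", "ACGT"), ("A", "ACGA"), ("B", "TT")]
print(resolve_duplicate_references(refs))
```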

Bug fixes

Fixed an issue where Find Best References Using Read Mapping would use only the first References and Host references inputs, see https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/find-best-references-using-read-mapping-uses-only-first-references-and-host-references-inputs/.
Fixed an issue in Differential Abundance Analysis that affected the estimation of dispersion estimates including information from nearby features. This leads to slightly different p-values produced by the tool.
Fixed an issue with logistic regression for Bin Pangenomes by Taxonomy. This change may in some cases lead to a change in bin assignments for contigs.
Fixed an issue where Find Resistance with Nucleotide DB would by mistake accept more than one database as parameter input. If you selected more than one database, only the first database would have been used. The tool now accepts only one database.
Fixed a rare issue where Differential Abundance Analysis could fail with the message "Error setting up ensemble".
Fixed an issue where pathway views could not be closed if the underlying pathway database was not saved.
Fixed an issue where applying advanced filtering on the MetaCyc database and the Identified Pathways tables would give unexpected results.

Advance notice

The tool Convert Abundance Table to Experiment has been moved to the Legacy folder of the Workbench Toolbox and will be retired in a future version of the software. Instead of converting your abundance table to an experiment before running statistical tests, use the tools Differential Abundance Analysis and Create Heat Map for Abundance Table on the abundance table as is.

CLC Microbial Genomics Module 22.1.2

Released on March 20, 2023

Bug fixes

Fixed a metadata-related issue that impacted all tools producing abundance tables: Detect Amplicon Sequence Variants, OTU Clustering, Build Functional Profile, Find Resistance with ShortBRED, and Taxonomic Profiling.
When an abundance table was based on samples with multiple metadata associations, metadata attributes and values would sometimes get mixed up. This had several implications for the affected tables and downstream analysis:
- The abundance table functionality 'Aggregate samples' would use wrong values for some of the metadata attributes.
- Differential abundance analyses created from affected abundance tables would be based on wrong values and therefore results would be wrong, see https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/differential-abundance-analysis-may-use-wrong-values-for-samples-with-multiple-metadata-associations/.
- Heat maps based on affected abundance tables would show wrong metadata values.
  Important: The solution does not correct existing abundance tables and downstream analysis results. Affected tables and downstream results should be rerun. To check if your results are affected, open the underlying samples, and go to Elements Info View by clicking on the Show Elements Info icon under the View area. Under ‘Metadata’, check to see if you have more than one metadata association. If you see only one metadata association, your results are not affected. If you see two or more metadata associations, we recommend that you rerun the analysis.

Fixed an issue where Find Best References Using Read Mapping would use only the first References and Host references inputs, see https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/find-best-references-using-read-mapping-uses-only-first-references-and-host-references-inputs/.
Fixed an issue for EC functional profile abundance tables where sub-tables aggregated on feature (Level 1-4) or sample values could not be opened in stacked visualization view.
Fixed an issue where a mixture of number and text-based metadata attributes on samples could make different types of analysis fail.
Fixed an issue where Find Resistance with Nucleotide DB would by mistake accept more than one database as parameter input. If you selected more than one database, only the first database would have been used. The tool now accepts only one database.

CLC Microbial Genomics Module 22.1.1

Released on November 17, 2022

Bug fixes

Fixed an issue where Download Resistance Database would fail for ResFinder database download.
Fixed an issue where combining Taxonomic Profiling reports would result in a combined report where taxonomic levels did not appear in canonical order.
Download Custom Microbial Reference Database
- Fixed an issue where the information in the database builder on which you base your manual assembly ID selection would be outdated.
- Fixed an issue where the tool would not report if one or more of the specified Assembly IDs could not be downloaded. Information about failed download is now presented in the tool log.
Various minor bug fixes

Database and reference data updates, October 2022

Released on October 11, 2022

New Monkeypox virus (MPXV) and Outgroup (MOCOVA) databases for broad MPXV/MOCOVA species, sub-lineage, and variant identification are available from Download Curated Microbial Reference Database.
The curated ViraCura HPV v1.1 databases were updated with additional reference sequences.
The reference dataset QIAseq xHYB Viral Panels, available from the Reference Data Manager, was updated with the Monkeypox viral reference sequence. The reference dataset can be used in combination with the Analyze Viral Hybrid Capture Panel Data template workflow to analyze data generated with the QIAseq xHYB Viral Panels, now including the QIAseq xHYB MPXV Panel and the QIAseq xHYB MPXV Spike-In Panel. The name of the reference dataset was previously QIAseq xHYB Viral Respiratory Panel.

CLC Microbial Genomics Module 22.1

Released on June 15, 2022

New features

A new set of tools for the prediction of biochemical pathways from functional abundance tables and differential abundance tables is now available. This set of tools is comprised of:

Download Pathway Database to download the MetaCyc pathway database.
Identify Pathways which takes an abundance table or a differential abundance table with EC terms as input and predicts the presence of biochemical pathways in the sample or the up/down regulation of pathways between groups of samples.

The pathway database and the pathway calls can be visualized as simple pathway graphs to put the EC terms into the context of their biochemical compounds. In a pathway calling result, the widths and hues of EC terms can be adjusted to display the abundance or fold changes in the abundance, p-values and max group means.

Improvements, Changes and Bug fixes

Annotate with BLAST and Annotate with DIAMOND

A new option "Adjust CDS to open reading frame" is now available. When enabled, a local alignment will be extended or reduced such that the region starts with a start codon, ends with a stop codon, and does not contain any internal stop codons, while matching the length of the reference sequence within ten percent.
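The conditions named for "Adjust CDS to open reading frame" can be expressed as a small Python check. This is a sketch only, not the tool's algorithm: it validates a candidate region rather than extending or reducing an alignment, and it assumes the standard stop codons, ATG as the only start codon, and a reference length given in nucleotides. The function name is_valid_orf and its parameters are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)
START_CODONS = {"ATG"}               # alternative start codons are ignored here


def is_valid_orf(region, reference_length, tolerance=0.10):
    """Return True if region meets the four conditions described above.

    region is a nucleotide string; reference_length is the reference
    sequence length in nucleotides (assumed unit).
    """
    region = region.upper()
    if len(region) < 6 or len(region) % 3 != 0:
        return False
    codons = [region[i:i + 3] for i in range(0, len(region), 3)]
    if codons[0] not in START_CODONS:
        return False  # must start with a start codon
    if codons[-1] not in STOP_CODONS:
        return False  # must end with a stop codon
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        return False  # no internal stop codons allowed
    # Length must match the reference within the tolerance (ten percent)
    return abs(len(region) - reference_length) <= tolerance * reference_length


print(is_valid_orf("ATGAAATAA", 9))
print(is_valid_orf("ATGTAAAAATAA", 12))
```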

Download Custom Microbial Reference Database

Removed selection by clicking. Users must now always use the "Include"/"Exclude" buttons when building databases.
Changed the protocol to connect with NCBI from ftp to https. This enables faster downloads of reference sequences.
Fixed an issue where the parameters used for downloading the database were not added correctly to the downloaded sequence list.

MLST

Enabled the download of the "Listeria spp. cgMLST1748" MLST scheme using the Download MLST Scheme tool.
The Import MLST Scheme tool now adds more information into the log informing about skipped headers and fields during the import process.

Identify Viral Integration Sites

Fixed an issue in the tool where the number of detected breakpoints was reported as minus one to indicate that no breakpoints had been found. The number of detected breakpoints is now strictly larger than 0.

Various minor improvements

Improved the stability for detecting the CPU type on Apple systems.
Fixed an issue with the progress reporting of the Download Curated Microbial Reference Database tool.
Improved error handling in PERMANOVA analysis when the tree and the sequences in the abundance table do not match or the distances for other reasons become zero.
Fixed an issue where aggregating samples for EC abundance tables would fail with a null pointer exception.

CLC Microbial Genomics Module 22.0.1

Released on March 14, 2022

Bug fixes

Fixed a bug causing the following tools to run slowly or fail on Apple M1: Annotate with DIAMOND, Annotate CDS with Best DIAMOND Hit, Create DIAMOND Index, Create MLST Scheme, and Taxonomic Profiling.

CLC Microbial Genomics Module 22.0

Released on January 11, 2022

Updated to be compatible with CLC Genomics Workbench 22 and CLC Genomics Server 22.

Improvements

Download Amplicon-Based Reference Database now allows for downloading specific versions of amplicon-based reference databases, e.g. UNITE can now be downloaded at 97% and 99% clustering levels independently. SILVA v138.1 is also now available for download.
Updated DIAMOND to v2.0.13. This version has improved sensitivity and performance, enabling BLAST-like searches to be completed many times faster. Workbench tools that use DIAMOND: Annotate with DIAMOND, Annotate CDS with Best DIAMOND Hit, Find Resistance with ShortBRED, and Create MLST Scheme.
The workflow Data QC and Clean Host DNA has been generalised to select reads of organisms of interest while discarding reads of some background organisms, e.g. a host. To reflect this, the workflow is now called Data QC and Remove Background Reads. The workflow can also be used to generate input for the Compare Variants Across Samples workflow.

Bug fixes

Fixed an issue where a Null Pointer Exception could be encountered in the wizard of the Differential Abundance Analysis tool when some samples were not associated with metadata for the selected metadata group.
Fixed an issue where PCoA and Alpha diversity plots produced from abundance tables with many taxa or species could be extremely slow.
Fixed an issue where the wizards of some download tools in workflows were only available after running the respective tool as a standalone tool. This affects Download Protein Database and Download Ontology Database.
Fixed an issue where the character "%" in the name of a Taxonomic Profiling Index caused the Taxonomic Profiling tool to crash.

Changes

Removed the file input for Download Custom Microbial Reference Database to simplify the wizard. To work with the new interface, the content of the files can be copied and pasted into the respective fields in the wizard.
To highlight that the Large MLST toolset is generally applicable to 7-gene, core genome, and whole genome MLST schemes, the word "Large" has been removed from the names of all tools in the toolset. All names now just refer to MLST.
All Workflows from the Microbial Genomics Module are now available in the Template Workflows section under Microbial Workflows.

CLC Microbial Genomics Module 21.1.1

Released on January 11, 2022

Improvements

Changed the color highlighting in a typing result where now the typing status is colored green, yellow or red to indicate the quality of the typing run. The sequence type calls are not colored anymore.
The Download Curated Microbial Reference Database and Download Protein Database tools are now available in View mode.

Bug fixes

Fixed several issues with the "Download Pathogen Reference Database" tool, including all the issues mentioned on https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/download-pathogen-database-returns-too-few-reference-sequences/. The tool now only allows for downloading one specific set of pathogens including sequence attributes from https://www.ncbi.nlm.nih.gov/pathogens/organisms. It now also uses the latest version of the sequence attributes available. Other sets of pathogens previously available via this tool, e.g. all viruses, can be downloaded using the Download Custom Microbial Reference Database tool.
Fixed an issue with Large MLST Schemes where the genetic code used to generate a Large MLST Scheme would not be reported if "Check codon positions" had been disabled when constructing the scheme.
Fixed an issue in the Multi-VCF exporter where the export would fail if the first variant track was empty.
Fixed an issue where Excel files could not properly be imported as OTU abundance tables if the taxonomies were 7-step taxonomies rather than QIIME formatted.
Fixed an issue with the Identify Viral Integration Site viewer where single-chromosome host genomes would disappear when zooming in.

CLC Microbial Genomics Module 21.1

Released on June 28, 2021

New features

New features improving viral analysis capabilities

Identify Viral Integration Sites makes it possible to detect viral integration events in host genomes based on hybrid capture data. The tool provides a circular, interactive, and zoomable host/virus genome viewer that makes it possible to inspect the integration events in detail, featuring coverage, unaligned end coverage, and broken pair coverage visualizations and CDS track overlays. Nearby genes are reported in a synchronized table.
Find Best References using Read Mapping serves as a high-quality alternative to the Find Best Matches Using Kmer Spectra tool, specifically for small genomes where the kmer statistic can be imprecise.
The Analyze Viral Hybrid Capture Panel Data workflow is available for the identification of the most prevalent viral species and its variants from a sample analyzed using hybrid capture technology.

New features improving amplicon based metagenomics capabilities

Import PICRUSt2 Multiplication Table (beta) is available to import data for functional inference or copy number normalization of 16S or ITS sequencing data.
Infer Functional Profile (beta) is available for the inference of functional profiles from 16S or ITS sequencing data.
Normalize OTU Table by Copy Number (beta) is available to correct relative species abundances using predicted rRNA copy numbers.
Download Ontology Database is available for downloading Gene Ontology (GO) and Enzyme Commission (EC) number databases. This tool replaces the Download GO database tool.
Building functional profiles with EC numbers is now available with the Build Functional Profile tool.
Download Protein Database now provides access to
- a UniRef50 Database annotated with both GO and EC terms.
- a UniRef90 subset including all proteins associated with GO or EC terms.
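The idea behind Normalize OTU Table by Copy Number (beta) can be illustrated with a minimal sketch. It assumes that the correction divides each read count by the predicted rRNA copy number of the taxon and then rescales to relative abundances; the function and parameter names are hypothetical and do not reflect the module's implementation.

```python
def normalize_by_copy_number(counts, copy_numbers, default_copy_number=1.0):
    """Return copy-number-corrected relative abundances.

    counts:       dict mapping taxon name -> read count
    copy_numbers: dict mapping taxon name -> predicted rRNA copy number
    Taxa without a prediction fall back to default_copy_number (an assumption).
    """
    corrected = {}
    for taxon, count in counts.items():
        copies = copy_numbers.get(taxon, default_copy_number)
        if copies <= 0:
            raise ValueError(f"copy number for {taxon!r} must be positive")
        corrected[taxon] = count / copies

    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    # Rescale so that the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in corrected.items()}


if __name__ == "__main__":
    counts = {"Taxon A": 700, "Taxon B": 100}
    copies = {"Taxon A": 7, "Taxon B": 1}
    print(normalize_by_copy_number(counts, copies))
```

In this example Taxon A contributes seven times as many reads as Taxon B but also carries seven rRNA copies, so both end up with the same corrected relative abundance.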

Other new features

The Compare Variants Across Samples workflow is now available for analysing variants from a single species across many samples. The main results are a combined variant track for all samples and a SNP tree.
Download Curated Microbial Reference Database is now available to download specific reference databases as sequence lists or taxonomic profiling index files. This tool replaces the Download Microbial Reference Database tool. Currently available databases are:
- QIAGEN Microbial Insights - Prokaryotic Taxonomy Database (QMI-PTDB)
- QIAGEN Microbial Insights - Prokaryotic Taxonomy Database (QMI-PTDB) optimized for 16GB of memory
- Unclustered RVDB (https://rvdb.dbi.udel.edu/)
- Clustered RVDB (https://rvdb.dbi.udel.edu/)
Download Custom Microbial Reference Database is now available for the easy customization of microbial reference databases. This tool replaces the Download Microbial Reference Database tool and is more flexible, as it supports server and workflow execution and offers an improved interface to preselect references and taxonomic clades.
Identify Large MLST Scheme from Genomes is available to select the best matching Large MLST scheme for a set of input genomes.

Improvements, Changes and Bug fixes

Workflows

The Type a Known Species workflow and the Type Among Multiple Species workflow now use the Large MLST toolset. Note that the change is fully compatible with classical 7-gene MLST schemes but requires the schemes to be downloaded with the Download Large MLST Scheme tool or imported with the Import Large MLST Scheme tool.
Fixed sample names in workflow reports from Type Among Multiple Species, Type a Known Species, Create Large MLST Scheme with Sequence Types, and Map to Specified Reference.
The trimming reports in the Map to Specified Reference workflow are now placed into a sample specific folder together with the remaining sample specific output.

Split Sequence List

The tool now supports a subsequent iterate block on the resulting output elements.
When using the partition number based option, a metadata column with the chunk id will be added to the optional metadata table describing the output objects.
Fixed an issue that would cause the tool to fail when a selected column contained empty strings and "Create metadata table" had been selected.

Create Annotated Sequence List

The tool now supports annotations using arbitrary columns for matching metadata to sequences.
The tool is now able to handle larger annotation files.
The name of the output of the tool is now the name of the first input element with a "(Metadata Annotated)" extension.
The tool now ignores "Name" columns specified in metadata tables. If the names of the sequences must be changed, the "Batch rename" tool should be used.
"Taxonomy" columns from metadata tables are now parsed into the standard 7-step taxonomy, even if given in a QIIME format.
The report now contains the taxonomy information also for taxonomies specified in metadata tables.
When allowing the tool to overwrite existing data, it is now possible to disallow overwriting non-empty data with empty values. This is specifically useful when the data comes from metadata tables produced with the Split Sequence List tool, which contain empty fields for conflicting metadata.
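The conversion of QIIME formatted taxonomies into a 7-step taxonomy mentioned above can be sketched as follows. The sketch assumes the common QIIME convention of rank prefixes (k__, p__, c__, o__, f__, g__, s__) separated by semicolons; the function name and the choice of a plain list as output are hypothetical and not taken from the module.

```python
# Rank prefixes in the order kingdom, phylum, class, order, family, genus, species.
RANK_PREFIXES = ("k", "p", "c", "o", "f", "g", "s")


def qiime_to_seven_step(taxonomy):
    """Parse a QIIME-style taxonomy string into a list of seven rank names.

    Ranks that are missing or empty in the input are returned as empty strings.
    """
    ranks = {prefix: "" for prefix in RANK_PREFIXES}
    for field in taxonomy.split(";"):
        prefix, separator, name = field.strip().partition("__")
        if separator and prefix in ranks:
            ranks[prefix] = name.strip()
    return [ranks[prefix] for prefix in RANK_PREFIXES]


if __name__ == "__main__":
    example = "k__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__"
    print(qiime_to_seven_step(example))
```

Returning a fixed-length list keeps entries that are only defined at higher levels aligned with fully resolved ones.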

Bin Pangenomes by Sequence

Extended parallelism for faster processing.
Fixed an issue where more threads than available could be used by the tool.

Create SNP Tree

A new option "Ignore positions with deletions" has been added to ignore SNP sites where at least one sample contains a deletion.
Fixed an issue that occurred when some samples contained deletions at the tested SNP positions. This led to an alignment object containing symbols that looked like gaps ("-") but were invalid, which in some cases caused an error when visualizing the tree.
This bug would also lead to slightly wrong base frequencies when creating Maximum Likelihood trees using the "Felsenstein 81", "HKY", or "General Time Reversible" models. The "Jukes Cantor" (which is default) and "Kimura 80" models assume equal base frequencies, and are thus not affected by this.
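The effect of the "Ignore positions with deletions" option can be illustrated with a small sketch that drops every SNP column in which at least one sample has a gap. The data layout (one string of SNP bases per sample) and the function name are assumptions made for this example only.

```python
def drop_positions_with_deletions(alignment, gap="-"):
    """Remove SNP columns where at least one sample contains a deletion.

    alignment: dict mapping sample name -> string of bases at the SNP positions.
    All strings must have the same length.
    """
    lengths = {len(sequence) for sequence in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all samples must cover the same SNP positions")
    n_positions = lengths.pop() if lengths else 0

    # Keep only columns that are free of gaps in every sample.
    keep = [
        i for i in range(n_positions)
        if all(sequence[i] != gap for sequence in alignment.values())
    ]
    return {
        name: "".join(sequence[i] for i in keep)
        for name, sequence in alignment.items()
    }


if __name__ == "__main__":
    snps = {"sample1": "ACGT", "sample2": "A-GT", "sample3": "ACG-"}
    print(drop_positions_with_deletions(snps))
```

After filtering, only the four nucleotide symbols remain in this example, so base frequencies estimated from the alignment are not affected by gap symbols.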

Annotate with BLAST/Annotate with DIAMOND

Fixed an issue where using the Combine Reports tool on reports generated by the tools would not produce any output on the Genomics Server.
Fixed an issue that caused erroneous combined reports to be produced when combining already combined reports containing sections with Annotate with DIAMOND or Annotate with BLAST results.
Fixed an issue when combining reports from the tools when the hit results were empty.

Other updates

Create K-mer Tree now displays a warning when selecting one or more input sequence lists that contain neither Assembly ID annotations nor read groups.
Taxonomic Profiling reports can now be combined with the Combine Reports and Create Sample Report tools.
A bug has been fixed where aggregating the "Database builder" table on Species level in the Download Microbial Reference Database tool would result in an error. Note that the functionality has been replaced by Download Custom Microbial Reference Database.
When exporting an abundance table as BIOM file, an option has been added to use the name column instead of the often cryptic ID column.
Fixed an issue where OTU abundance tables could not be imported on the server.
Fixed an issue that caused the Create Large MLST Scheme tool to fail with a cryptic error message when all genes had been discarded.

Functionality retirement

All NGS MLST tools are now considered legacy tools which means they will be removed in a future release. The functionality of the NGS MLST tools has been replaced by the "Large MLST" toolset to be found in the toolbox under Typing and Epidemiology. Note that MLST Schemes compatible with the "Large MLST" set of tools can be obtained with the "Download Large MLST Scheme" and "Import Large MLST Scheme" tools. Both tools support classical 7-gene MLST schemes, core genome and whole genome MLST schemes. The Type a Known Species and Type Among Multiple Species workflows have been updated to work with the "Large MLST" toolset instead. If you are concerned about this change or if you need help migrating to the new set of tools please get in touch via ts-bioinformatics@qiagen.com.
Download GO Database has been replaced by the more general Download Ontology Database tool which allows for downloading both GO and EC databases.
Download Microbial Reference Database has been replaced by Download Custom Microbial Reference Database and Download Curated Microbial Reference Database for constructing custom reference databases and for downloading curated reference databases, respectively.

CLC Microbial Genomics Module 21.0

Released on January 12, 2021

New features

New features improving functional annotation and exploration capabilities

Import RNAcentral Database makes it possible to import sequences from rnacentral.org, and join them with GO associations. The resulting database can be used together with Annotate with BLAST for annotating nucleotide sequences with non-coding genes and after mapping reads to such annotated sequences for functional profiling using the Build Functional Profile tool.
Create DIAMOND Index supports transfer of metadata such as GO-terms, and the resulting index files also have a default tabular view showing the names and metadata for the sequence entries. The resulting index can be used together with Annotate with DIAMOND and Annotate CDS with Best DIAMOND Hit for annotating nucleotide sequences with coding genes and GO-terms and, after mapping reads to such annotated sequences, for functional profiling using the Build Functional Profile tool.
Download Protein Database has been updated so that the SwissProt and UniRef50 databases are annotated with GO-terms when downloaded.
A new default view has been added for the GO database. This makes it possible to search the different GO-terms in the ontology and browse their descriptions and their relations. For example, the Gene Ontology visualization can be reduced to the significant entries of a differential abundance analysis table by making use of "Select names in other views" and filtering the GO view by "Current selection".

New tools for improved handling of sequence lists

Mask Low Complexity Regions can be used to mask or annotate regions of nucleotide or protein sequences with low complexity. When creating a taxonomic profiling index from such masked sequences, the number of false positive hits in a taxonomic profiling may be reduced.
Create Annotated Sequence List is a general sequence annotation tool that can be used to set up many different databases (annotated sequence lists) used in the CLC Microbial Genomics Module. This tool can also be used in a workflow to create annotated sequence lists, with the ability to connect to many relevant downstream analysis tools. This is a more versatile solution than using a New | Sequence List element.
Split Sequence List can be used to separate sequence lists based on metadata annotation values, e.g. a Microbial Reference Database could be split up based on the values in the Assembly ID column, or it could be split into a specified number of equally sized lists, where the order of sequences within these lists can be optionally randomized. This tool is able to produce a metadata table that contains columns and values from the sequences' metadata annotations that are consistent with the specified splitting scheme.
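The two splitting schemes described for Split Sequence List, by metadata value and into a fixed number of equally sized lists, can be sketched as follows. Sequences are represented here as plain dictionaries, and all names are hypothetical; this illustrates the idea rather than the tool's implementation.

```python
import random
from collections import defaultdict


def split_by_metadata(records, key):
    """Group records (dicts) by the value stored under a metadata key."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(key, "")].append(record)
    return dict(groups)


def split_into_chunks(items, n_chunks, shuffle=False, seed=None):
    """Split items into n_chunks lists whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    items = list(items)
    if shuffle:
        # Optional randomization of the order before chunking.
        random.Random(seed).shuffle(items)

    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for chunk_id in range(n_chunks):
        size = base + (1 if chunk_id < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    records = [
        {"name": "contig1", "Assembly ID": "GCF_1"},
        {"name": "contig2", "Assembly ID": "GCF_1"},
        {"name": "contig3", "Assembly ID": "GCF_2"},
    ]
    print(split_by_metadata(records, "Assembly ID"))
    print(split_into_chunks(range(7), 3))
```

The position of a chunk in the returned list corresponds to the kind of chunk id that could be recorded in an accompanying metadata table.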

Improvements, Changes and Bug fixes

OTU Taxonomy annotations

The OTU Taxonomy field on sequence lists has been removed and replaced by the more general Taxonomy field. This changes the following behavior of OTU clustering results:

In sunburst plots, OTUs defined at higher levels are now shown on their respective level rather than on the strain level.
QIIME formatted taxonomies will be converted to standard 7-step taxonomies.

Bin Pangenomes by Taxonomy

The tool now has separate output channels for read mappings using unbinned contigs as references, and for read mappings using all contigs as references.
Fixed an issue where contigs without mapped reads were not listed in the report.

Create Large MLST Scheme

The tool now also accepts individual nucleotide sequences as input.
The default search parameter for DIAMOND has been changed to More Sensitive to reflect the best practices for creating Large MLST Schemes.

Other updates

Fixed an issue with Extract Reads from Selection for abundance tables. Previously, the tool would not extract the correct reads if the table contained taxonomies specified at different levels. Also, it would not recognize and match on Assembly IDs, if those were specified. Additionally, when selecting multiple entries, the tool would not extract the union of the selection. Note that it is necessary to rerun the Taxonomic Profiling tool in order for the "database matches" table to have Assembly IDs added.
Irrelevant log messages from the Add Typing Results to Large MLST Scheme tool have been removed.
Fixed an issue in Find Prokaryotic Genes where open reading frames of open-ended CDSs could be incorrectly identified.
Fixed an issue with Annotate with DIAMOND where clicking "Next" when configuring the workflow element was not possible.
Fold changes and p-values from Differential Abundance Analysis will change by a small amount compared with the values generated previously (typically <0.01%) due to improvements to the underlying statistical model.
When downloading virus data, the Download Microbial Reference Database and Download Pathogen Database tools print warnings to check the minimum sequence length parameters.

Other Changes

The license feature name for this product has changed. Previously it was CLC_MICROBIAL_GENOMICS_MODULE, now it is CLC_GENOMICS_PREMIUM_MODULES. If you have a network license for this product and you have configured access, or other restrictions, relating to licenses for this product, the license feature must be updated in the CLC License Server configuration.

Functionality retirement

Tools

Create Microbial Reference Database
Create Amplicon-Based Reference Database
Create Gene Database

The above have been replaced by the new, general purpose Create Annotated Sequence List tool.

CLC Microbial Genomics Module 20.1.1

Released on October 22, 2020

Improvements and bug fixes

Taxonomic Profiling

Fixed an issue where multiple host genome indices could be selected as input, but only the first was used. Only one host genome index can now be supplied to the Taxonomic Profiling tool. The Create Taxonomic Profiling Index tool can be used to combine multiple sequence lists into a single, combined index.
Fixed an issue where, when a single taxon is the most dominant constituent of a metagenomics sample, the most abundant taxon was removed, when it should not have been. This problem was most likely to affect low complexity samples, that is, samples with 1 or very few taxa present, and that have varying read lengths in the input data. The issue was unlikely to affect high complexity samples.
Fixed an issue where the tool would incorrectly disqualify contigs of reference sequences if reads were longer than the contig.
Correcting abundance values to account for a skewed read distribution between taxa is now optional.
The confidence score has been removed from the abundance table output.

Other tools

Fixed an issue where Download MLST Schemes (PubMLST) stopped working due to a recent change in the XML format used by PubMLST.

Fixed an issue where Bin Pangenomes By Taxonomy would fail when using a Taxonomic Profiling index containing entries without taxonomies.

CLC Microbial Genomics Module 20.1

Released on June 23, 2020

New Features

Large MLST tools, workflows and resources

All tools for creating and using Large MLST Schemes are now out of beta.

A tutorial called Working with large MLST schemes is now available, covering using Large MLST Scheme tools with a focus on scheme creation, modification, and isolate typing.

A workflow Create Large MLST Scheme with Sequence Types is now provided for creating high-quality schemes with sequence types. This workflow includes the Create Large MLST Scheme tool, which has been substantially revised for this release, including supporting the creation of schemes from organisms with spliced genes.

Type with Large MLST Scheme

Typing is now faster and there is a smaller memory footprint.
Noise from spurious kmer hits has been reduced.
A new parameter, "Minimum kmer ratio", has been added to allow low-confidence allele hits to be removed.
When supplying the output report from this tool to the Extend Result Metadata Table tool, three new fields are added to the Result Metadata Table, "Typing Status", "Sequence Type" and "MLST Scheme".
The number of reported sequence types is now restricted to 100.
The assembly status is now automatically detected. The "Assemble reads" option is thus no longer presented in the wizard.
Read coverage for the alleles is now automatically detected.
Fixed an issue where the tool could detect novel alleles with ambiguous symbols when typing an assembly.
Fixed an issue where the tool would fail with an error if no MLST scheme had been supplied.

Add Typing Results to Large MLST Scheme

Performance has been improved.
Allele lengths are now checked, and outliers due to length are automatically discarded.
The handling of novel allele names has been improved.
Metadata of an associated metadata table is now transferred to the output scheme.

Create Large MLST Scheme

This tool has been substantially revised for this release. Among other changes, it now supports the creation of schemes from organisms with spliced genes. Please refer to the manual for usage details.

Below is a list of concerns addressed in this update:

Typos and number formatting in the report output have been corrected.
Handling of ambiguous sequence types has been improved.
Fixed an issue where the tool would fail when searching for unannotated genes and at least one contig did not contain a single DIAMOND hit.
Fixed an issue where the tool would fail when the stop codon coincided with the end of a contig.
Fixed an issue where the tool would fail when DIAMOND detected a frameshift in a gene.
Fixed an issue where the locus type annotations "AMR related" and "Virulence related" were not added when creating a scheme without sequence types.
Fixed an issue where the tool would not accept a sequence list without CDS annotations if it was added as the first input element, even when several input elements with CDS annotation had been selected.

Download Large MLST Scheme

A wider range of pubMLST schemes are now handled.
Handling of ambiguous sequence types has been improved.
Downloading schemes is now more robust.
Various other minor improvements.

Working with large MLST schemes

Large MLST schemes can now be exported.
The loading time of large MLST schemes has been improved.
Fixed an issue where the clustering options for the creation of subschemes or reclustering had been disabled in certain cases.
Fixed an issue in the visualization of the minimum spanning trees where a null pointer exception would occur when shift-dragging before selecting any node.

Other new tools

Two new tools for annotating nucleotide sequences, such as de novo contigs or genomes, from a candidate set of reference sequences are available under the Functional Analysis folder:

Annotate with DIAMOND adds CDS annotations to DNA sequences based on matches found using DIAMOND. Matching can be done against a list of proteins, CDS annotations from an annotated genomic sequence, or a DIAMOND index.
Annotate with BLAST annotates a sequence list based on matches found using BLAST, where matching can be done against a list of proteins, a list of nucleotide sequences, annotations from an annotated genomic sequence or existing BLAST databases.

Extend Result Metadata Table replaces the Add To Result Metadata Table tool, providing similar functionality, but in a form that can be included in workflows intended for use on QIAGEN CLC Genomics Cloud Engine.

Improvements, Changes and Bug fixes

Taxonomic Profiling can now be used to analyze metagenomic data produced with long read technologies.

The Type a Known Species, Type among Multiple Species and Map to Specified Reference workflows can now be run on the QIAGEN CLC Genomics Cloud Engine. Changes introduced to support this were the inclusion of control flow elements in the workflow design and the inclusion of the new Extend Result Metadata Table tool.

The Create Kmer Tree and Create SNP Tree tools can now handle input data that is associated with several metadata tables. The use of the Result Metadata Table is now optional.

The removal and reporting of duplicate sequences by Create Taxonomic Profiling Index has been improved.

Assembly grouping options have been added to Find Prokaryotic Genes, allowing the grouping of input sequences to be specified.

Download Microbial Reference Database

Fixed an issue where not all available genomes from NCBI were shown.
Stability when communicating with the NCBI servers has been improved.
The speed when loading the selection table has been improved.
Duplicate sequences are no longer removed from input sequence sets, allowing more general use of this tool for browsing and downloading sequences from the NCBI.
The "Sequence list(s)" option for providing sequences to be appended to the downloaded sequence list has been removed. Please see our FAQ entry "How can I concatenate sequence lists and when do I need to?" if you have been using this option previously.
The Database Selection table now offers the possibility of de-/selecting several references with a single click.

ARES AMR Database

The data in the overview have been corrected and consolidated at the species level.
Fixed an issue where duplicate names in a created PointFinder table caused the Find Resistance with PointFinder tool to fail.

Legacy tools and workflows

The following tools and workflows have been moved to the Legacy folder of the Workbench Toolbox, with "(legacy)" appended to their original names. They will be removed in a future version of the software.

Add To Result Metadata Table (legacy). Please use the new Extend Result Metadata Table tool instead.
Type a Known Species (legacy), Type among Multiple Species (legacy) and Map to Specified Reference (legacy) workflows. These workflows contain the Add To Result Metadata Table (legacy) tool. Please use the new workflows, of the same original names, described above, which make use of the new Extend Result Metadata Table tool.

CLC Microbial Genomics Module 20.0

Released on December 11, 2019

New Features

Five new tools are available for working with core genome (cg) and whole genome (wg) MLST schemes. Three of these tools create MLST schemes:

Download Large MLST Scheme for downloading MLST schemes from PubMLST.org. It currently supports a range of cgMLST, eMLST and traditional 7-gene MLST schemes.
Import Large MLST Scheme for importing MLST schemes from plain-text files in pubMLST format, i.e. one tsv file with the profile information and one FASTA file per locus containing the alleles.
Create Large MLST Scheme (beta) facilitates the construction of cg/wg MLST schemes starting from a sequence list with CDS annotations.
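The plain-text layout accepted by Import Large MLST Scheme (one tsv file with the profile information and one FASTA file per locus) can be read with a few lines of code. The sketch below assumes that the profile table has a header row with an "ST" column followed by one column per locus, and that the caller supplies the locus names; these details and all function names are assumptions for illustration only.

```python
import csv
import io


def read_profiles(tsv_text, loci):
    """Map each sequence type (ST) to its tuple of allele numbers."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["ST"]: tuple(row[locus] for locus in loci) for row in reader}


def read_fasta(fasta_text):
    """Map each allele name to its sequence (multi-line records are joined)."""
    alleles, name = {}, None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            alleles[name] = ""
        elif name is not None:
            alleles[name] += line.upper()
    return alleles


def sequence_type_for(profiles, observed):
    """Return the ST whose allele profile equals the observed one, else None."""
    for sequence_type, profile in profiles.items():
        if profile == tuple(observed):
            return sequence_type
    return None


if __name__ == "__main__":
    tsv = "ST\tadk\tfumC\n1\t10\t11\n2\t10\t12\n"
    profiles = read_profiles(tsv, ["adk", "fumC"])
    print(sequence_type_for(profiles, ["10", "12"]))
```

Allele numbers are kept as strings so that non-numeric allele designations, if present in a scheme, pass through unchanged.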

For typing and extending schemes, two new tools are available:

Type With Large MLST Scheme (beta) uses WGS reads or assemblies of bacterial isolate samples to identify known or novel sequence types and novel alleles.
Add Typing Results To Large MLST Scheme (beta) allows the addition of new sequence types, alleles and metadata to an existing scheme.

The MLST schemes feature minimum spanning tree and heat map visualizations which are synchronized with the typing results, facilitating the analysis of typing results in relation to the scheme it has been typed with.

Improvements, Changes and Bugfixes

Resistance Detection Tools and Databases

The “Filter overlaps” option of Find Resistance With Nucleotide DB now prioritizes BLAST hits by the number of aligned nucleotides (similarity*length) and only reports the best hits irrespective of the “Predicted Phenotype”.
The QMI-AR database has been updated to version 2019-11.
The CARD database has been updated to version 3.0.5.
When downloading multiple resistance databases, the Download Resistance Database tool no longer stops if one of the downloads fails. It will continue to attempt to download the other databases.
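The prioritization used by the "Filter overlaps" option, ranking BLAST hits by the number of aligned nucleotides (similarity*length), can be sketched as a greedy selection. The sketch assumes that two hits overlap as soon as they share a position on the query and that lower-ranked overlapping hits are dropped; the actual overlap criterion of the tool is not stated here, and the Hit class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    gene: str
    start: int        # 0-based start on the query, inclusive
    end: int          # end on the query, exclusive
    identity: float   # fraction of identical positions, 0..1

    @property
    def aligned_nucleotides(self):
        # Score used for ranking: similarity * alignment length.
        return self.identity * (self.end - self.start)


def filter_overlaps(hits):
    """Keep the best-scoring hit among any set of overlapping hits."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.aligned_nucleotides, reverse=True):
        if all(hit.end <= other.start or hit.start >= other.end for other in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


if __name__ == "__main__":
    hits = [
        Hit("geneA", 0, 1000, 0.99),
        Hit("geneB", 500, 1200, 0.95),
        Hit("geneC", 1300, 1500, 1.00),
    ]
    print([hit.gene for hit in filter_overlaps(hits)])
```

Note that the ranking uses only the alignment score; no phenotype information enters the comparison, in line with the behavior described above.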

Alpha Diversity

The calculation of the percentiles of the Alpha Diversity Box Plot editor has been changed to match the default implementation of the “quantile” method in R.
The whiskers in the box plot visualization of the alpha diversity are now restricted to the last data point within the selected interquartile range, according to https://en.wikipedia.org/wiki/Box_plot.
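Both changes can be made concrete with a short sketch. The default "quantile" method in R is type 7, which interpolates linearly between order statistics; the whisker rule below uses the conventional multiplier of 1.5 times the interquartile range as an assumed default, since the selected value in the editor may differ. Function names are hypothetical.

```python
import math


def quantile_type7(values, p):
    """Quantile as computed by R's default method (type 7)."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    h = (len(data) - 1) * p
    lower = math.floor(h)
    upper = min(lower + 1, len(data) - 1)
    # Linear interpolation between the two neighboring order statistics.
    return data[lower] + (h - lower) * (data[upper] - data[lower])


def whiskers(values, multiplier=1.5):
    """Return the most extreme data points that lie within the fences."""
    data = sorted(values)
    q1 = quantile_type7(data, 0.25)
    q3 = quantile_type7(data, 0.75)
    iqr = q3 - q1
    low = min(v for v in data if v >= q1 - multiplier * iqr)
    high = max(v for v in data if v <= q3 + multiplier * iqr)
    return low, high


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 100]
    print(quantile_type7(sample, 0.5))
    print(whiskers(sample))
```

With the sample above, the value 100 lies outside the upper fence, so the upper whisker ends at the last data point inside it instead of extending to the fence itself.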

Bin Pangenomes by Taxonomy

The report generated by Bin Pangenomes by Taxonomy now contains the name of the taxonomy of a bin rather than the taxonomic level in the “Taxonomy” column, and the contig and nucleotide counts have been corrected.
Fixed an issue where the tool failed if several files containing contigs were used as a parameter.

Bin Pangenomes by Sequence

Fixed a bug in Bin Pangenomes by Sequence that caused the tool to stall when some of the reads giving rise to a contig had been removed prior to binning.
Fixed an issue that caused the tool to crash when two input connections in a workflow contained different types of data.

DIAMOND

The DIAMOND tool has been updated to v0.9.26. To run this version, CPUs supporting AVX instructions are now also required for Linux based operating systems, and all existing DIAMOND index files will have to be recreated. DIAMOND is used in:
Improved progress reporting for tools running DIAMOND.

The Typing and Epidemiology tools are no longer beta-status tools.

The QIAseq 16S/ITS Demultiplexer tool has been updated with new barcodes to support the latest QIAGEN QIAseq 16S/ITS Region Panels kit.

For the Differential Abundance Analysis tool, the order of comparisons in an “Across Groups” analysis has been changed to match the sign of the fold change of the “All group pairs” or “Against control group” analyses.

CLC Microbial Genomics Module 4.8

Released on September 19, 2019

Resistance analysis updates: New and updated databases are available for download via the Download Resistance Database tool:

The integrated ARES database, containing resistance conferring genes and mutations, with the respective empirical predictive performance data.
An updated QMI-AR database where a serious issue has been fixed which caused some of the resistance genes originating from ResFinder to be associated with the wrong ARO number. Further details …
Additional and updated PointFinder databases.

Find Resistance with ShortBRED

The computational performance of the tool has been improved for the case where a sequence list of reads matching ShortBRED markers is requested.
The column names of the summary table output from the tool have been renamed from ‘Number of reads’ and ‘Number of unique reads’ to ‘Number of markers’ and ‘Number of reads’, respectively.

When using the Find Resistance with Nucleotide DB tool with the Virulence Factor Database and using Add To Result Metadata Table, the corresponding column names have been adjusted to ‘Virulence found’ instead of ‘Resistance Found’.

A new tutorial for profiling antimicrobial resistance genes in isolate and metagenomic samples of NGS reads is provided.

Additional bugfixes and improvements:

Fixed a serious issue where the Create MLST Scheme tool would wrongly associate allelic sequences with profiles in some cases. Further details …
For the stacked bar chart visualization of abundance tables it is now possible to customize the legend by selecting the number of items to be shown, with a default value of 10, and to specify the depth of the taxonomy if the view is aggregated.
Fixed a bug in the Alpha Diversity Box Plot Editor which caused the visualization to crash for an aggregated abundance table.
A bug has been fixed which caused the QIAseq 16S/ITS Demultiplexer tool to crash when the first column contained numerical values.
Fixed a bug that appeared when the tool Download Microbial Reference Database was added to a workflow.
When running the tool Remove OTUs with Low Abundance in a workflow, an error message has been added when multiple input files are encountered.

CLC Microbial Genomics Module 4.5

Released on June 27, 2019

New Features

Five new resistance databases are now accessible through the Download Resistance Database tool:

Two peptide databases of antibiotic resistance markers for the Find Resistance with ShortBRED tool:
- The QIAGEN Microbial Insight – Antimicrobial Resistance (QMI-AR) database
- The Comprehensive Antimicrobial Resistance Database (CARD) (McMaster University)
Three nucleotide databases of antibiotic resistance genes for the Find Resistance with Nucleotide DB tool:
- The QIAGEN Microbial Insight – Antimicrobial Resistance (QMI-AR) database
- The Comprehensive Antimicrobial Resistance Database (CARD) (McMaster University)
- The Virulence Factor Database (VFDB) (CAMS&PUMC – China)

Improvements, Changes and Bugfixes

Find Resistance with ShortBRED

The tool now supports two new marker databases: QMI-AR and CARD.
The tool now supports more comprehensive metadata including Compound ARO, Gene annotation depth and information about the antibiotic class to which the marker confers resistance.
In the optional sequence-list tool output, reads are annotated with phenotype and other information from markers they aligned to.
The tool can now output a sortable and searchable result table.

Alpha diversity

Alpha diversity measures can now be calculated on a specified level of the taxonomy.
Alpha diversity results can now be visualized as a box plot at a given rarefaction level where the samples are grouped by their metadata.
In the box plot visualization p-value statistics for the Kruskal-Wallis and the Mann-Whitney U tests are available for comparing groups of samples.
The Alpha diversity result can now only be visualized with the ‘Alpha Diversity Graph’; the similar ‘Line Graph’ has been removed.
The options ‘X-axis at zero’, ‘Y-axis at zero’, and ‘Show as histogram’ of the ‘Alpha Diversity Graph’ have been removed.
Fixed a bug preventing the tool from being executed in some workflows.
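Calculating alpha diversity on a specified level of the taxonomy amounts to aggregating abundances to that level before computing the measure. The sketch below uses Shannon entropy as one example measure and represents taxonomies as tuples ordered from kingdom to species; both choices, as well as the function names, are assumptions for illustration.

```python
import math
from collections import Counter


def aggregate_to_level(abundances, level):
    """Sum counts of all entries that share the first `level` taxonomy ranks.

    abundances: dict mapping a taxonomy tuple (kingdom, phylum, ...) -> count
    level:      number of ranks to keep (1 = kingdom, 2 = phylum, ...)
    """
    totals = Counter()
    for taxonomy, count in abundances.items():
        totals[tuple(taxonomy[:level])] += count
    return dict(totals)


def shannon_entropy(counts):
    """Shannon entropy (natural logarithm) of a collection of counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts)


if __name__ == "__main__":
    table = {
        ("Bacteria", "Firmicutes", "Bacilli"): 10,
        ("Bacteria", "Firmicutes", "Clostridia"): 10,
        ("Bacteria", "Proteobacteria", "Gammaproteobacteria"): 20,
    }
    phylum_level = aggregate_to_level(table, 2)
    print(shannon_entropy(phylum_level.values()))
    print(shannon_entropy(table.values()))
```

In this example the two Firmicutes classes collapse into one phylum-level entry, so the diversity computed at phylum level is lower than at class level.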

Beta diversity

The beta diversity result can now be visualized as a 2D plot of the principal coordinates.
In order to explore variations in community structure, samples can now additionally be colored by the aggregate abundance of user-selected taxonomic groups.
Side panels now have a ‘Show point’ option so that the samples to include in the plot can be selected.

Find Prokaryotic Genes

The tool can now create a single or multiple gene structure models when the input consists of several assemblies or contig bins.
The tool now provides an option to annotate open-ended sequences (e.g. short contigs).
The tool now has options to save and reuse models.
It is no longer a beta status tool.

Download Protein Database

The UniRef50 and SwissProt databases have been updated to version 2019_03.
The UniRef90 and UniRef100 databases can no longer be downloaded.

Bin Pangenomes by Sequence

Fixed a bug where the tool would produce only a few bins for large contig lists (in the order of 20,000 contigs or more).
Fixed a rare bug where the tool would crash when the coverage on a contig was best explained by a single Poisson function.

Download Microbial Reference Database

Fixed an issue where the tool would fail when providing a sequence list without accession IDs as a parameter while selecting the option to ‘Include only plasmids’ or ‘Exclude all plasmids’.
Fixed a bug which led to wrong numbers in the taxonomic summary table of the reports for the Download Microbial Reference Database and the Taxonomic Profiling tools.
Fixed an issue where duplicate sequences in the sequence list provided were not removed.
Fixed a bug where the statistics in the report were wrong.

Additional improvements

The Find Resistance with ResFinder tool has been renamed: Find Resistance with Nucleotide DB. This tool now offers support for the new antimicrobial resistance databases, QMI-AR and CARD, and the virulence factor database VFDB (see above).
A search field has been added in the side panel of visualizations with associated metadata information in order to enable easy access to the metadata category of interest.
The folders ‘Functional Analysis’ and ‘Drug Resistance Analysis’ have been moved from ‘Metagenomics’ to the general ‘Microbial Genomics Module’ folder to highlight that these tools can be used with both metagenomic and isolate data.
The input field for the parameter MLST Scheme Name has been shortened to fit within the default window size of the Create MLST Scheme tool wizard.
Fixed a potential bug that could cause the OTU Clustering tool to incorrectly count the number of paired reads mapping in the forward or reverse orientation to a reference OTU. While this would not affect the result of the clustering, it may have prevented some OTUs from being reverse-complemented in the output.
Fixed an issue where creating a sunburst plot for an aggregated abundance table with the button ‘Create Abundance Subtable’ would fail when at least one row in the original abundance table had the entry “N/A” in the taxonomy column.

CLC Microbial Genomics Module 4.1

Released on January 31, 2019

Metagenomics - Amplicon-Based Analysis

The OTU Clustering tool has a new option for specifying if non-merged paired-end reads should be included in the analysis. This option is off (unchecked) by default, as including only merged reads improves analysis run time. The Data QC and OTU Clustering workflow now also includes only merged reads in the OTU clustering analysis step. To run the workflow with all reads, a copy of the workflow must be created and this option enabled in that copy.
The "Similarity Percentage" parameter can now be adjusted when launching the Data QC and OTU clustering workflow.
Fixed a bug where action buttons underneath tables would not be accessible if the table view was too narrow.

Metagenomics - Taxonomic Analysis

Fixed an issue that caused the Bin Pangenomes by Taxonomy tool to crash when the input index did not contain any taxonomy information.
Updated two parameter names in the Bin Pangenomes by Taxonomy tool to better reflect the allowed input data type.
The Bin Pangenomes by Taxonomy tool can now output a specified number of best bins (defined as those with the largest coverage of the reference genome) individually to facilitate subsequent analyses.
Fixed a bug causing some of the plasmid-related information to be lost during the binning step of the Bin Pangenomes by Taxonomy tool and the QC, Assemble and Bin Pangenomes workflow.
Fixed a broken help link for the QC, Assemble and Bin Pangenomes workflow.
Fixed the Bin Pangenomes by Sequence tool so it can process all and only the relevant input object types.
The Taxonomic Profiling abundance table has a new button, "Extract Reads from Selection", for the extraction of reads uniquely associated with specified rows in the table.
It is now possible to specify the host genome index when running the Data QC and Taxonomic Profiling workflow by enabling the option "Filter host reads".

Metagenomics - Functional Analysis

The Build Functional Profile tool can now output a DIAMOND hits functional profile.
Fixed a bug in the Find Prokaryotic Genes tool that affected genes spanning the origin of circular chromosomes, which would have the annotated CDS region spanning the whole circular chromosome.
Fixed a bug that would cause the tool Annotate CDS with Best DIAMOND Hit to stall when running DIAMOND in 'sensitive' or 'more sensitive' mode in a Gx Workbench running on Windows 10.

Metagenomics - Drug Resistance Analysis

Fixed a bug where accession number hyperlinks in the Find Resistance with ResFinder output table could not be properly exported to Excel.

Fixed a bug with the tool Use Genome as Result - and the workflow using the tool called Map to Specified Reference - when the genome name contains a colon " : ".
Fixed a problem where the Download MLST Schemes (PubMLST) tool did not format the MLST schemes properly, resulting in inconclusive MLST assignments when using the downloaded schemes for typing.

CLC Microbial Genomics Module 4.0

Released on November 28, 2018

New tools for Metagenomics

Create Taxonomic Profiling Index, a tool to index reference sequences for use with the Taxonomic Profiling tool.
Create DIAMOND Index, a tool for computing a DIAMOND index which can be used as input to Annotate CDS with Best DIAMOND Hit.
Bin Pangenomes by Sequence, a tool to group contigs and reads typically of a shotgun metagenomic sample uniquely based on sequence and coverage similarity.
Bin Pangenomes by Taxonomy, a tool to group contigs and reads typically of a shotgun metagenomic sample according to their taxonomic relationship.
QC, Assemble and Bin Pangenomes, a workflow for pre-processing and assembling whole-genome shotgun sequencing reads, and for binning contigs/reads according to taxonomic association and sequence similarity.
Drug resistance analysis, a new area that collects tools for antibiotic resistance analysis:
- Find Resistance with PointFinder, for identifying antimicrobial resistance mutations present in an isolate sample using an antibiotic variant database especially designed by QIAGEN.
- Find Resistance with ShortBRED, for identifying antimicrobial resistance genes and quantifying their relative abundance in a metagenomic sample using a peptide marker database especially designed by QIAGEN.
- The tool Find Resistance has been renamed to Find Resistance with ResFinder.

New features and improvements: Functional Analysis

The tool Annotate CDS with Best DIAMOND Hit has new options to run in standard, sensitive and more sensitive modes.
We improved the accuracy of the BLAST search in the Annotate CDS with Best BLAST Hit tool.
Improved the Sunburst plot to allow graphical export with the legend.
Three vector formats (.ps, .eps, .svg) have been added to the export sunburst dialog.
Stacked bar charts now also show the relative abundance when hovering over the chart.

Improvements for Databases

Renamed the Download Database for Find Resistance tool to Download Resistance Database.
- The tool Download Resistance Database now provides the user with a peptide marker database to be used with the tool Find Resistance with ShortBRED.
- The tool Download Resistance Database now provides the user access to a genes sequence list for alleles to be used with the Find Resistance with PointFinder tool.
The tool Create Microbial Reference Database has been renamed to Download Microbial Reference Database.
- The Download Microbial Reference Database tool now has new options for including only plasmid sequences, only genomic sequences, or both types of sequences.
- The Download Microbial Reference Database tool now downloads sequences without any annotation by default. An option has been added to also download available annotations together with the sequences.
Removed "All Plasmids" option from Download Pathogen Reference Database tool.

Bug fixes

The QIAseq 16S/ITS Demultiplexer now removes extra leading and trailing spaces from user-defined barcodes.
Changed the output names of the QIAseq 16S/ITS Demultiplexer tool to follow the "sample_region" format, since the previous format (region_sample) would cause sample names to be removed in OTU abundance tables.
Fixed an issue with the Annotate CDS with Best DIAMOND Hit tool where it failed with "RC = 132" on older Mac computers (pre 2014).
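
The two QIAseq 16S/ITS Demultiplexer fixes in this list can be illustrated with a minimal sketch. It is not the module's implementation: the function names normalize_barcode and output_name, and the example barcodes, sample, and region, are hypothetical.

```python
def normalize_barcode(barcode: str) -> str:
    """Remove leading and trailing whitespace from a user-defined barcode."""
    return barcode.strip()


def output_name(sample: str, region: str) -> str:
    """Build an output name in the sample_region order (sample name first)."""
    return f"{sample}_{region}"


# Hypothetical barcodes as they might be pasted into a wizard field.
barcodes = ["  ACGTAC", "TTGACA  ", " GGCATT "]
cleaned = [normalize_barcode(b) for b in barcodes]

# Hypothetical sample and region labels.
name = output_name("Sample1", "V4")
print(cleaned, name)
```

Putting the sample name first keeps it as the leading part of the output name, which is the part the note above says was being lost in OTU abundance tables.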

CLC Microbial Genomics Module 3.6.1

Released on October 10, 2018

Bug fixes

Fixed an issue where the Download Database for Find Resistance tool failed due to an update at ResFinder.
Fixed an issue with the handling of duplicate database entries in the Create Microbial Reference Database tool. This fix greatly improves the speed of the tool when handling data added during recent NCBI updates.

CLC Microbial Genomics Module 3.6

Released on September 13, 2018

Improvements

It is now possible to import a custom MLST profile using the Create MLST Scheme tool.
In the Add NGS MLST Report to Scheme tool it is now possible to add more than one report, and therefore more than one sequence type, to a scheme at a time.
Warning messages in Add NGS MLST Reports to Scheme and Merge MLST Schemes now appear when the specified report/schemes to add/merge are incompatible.
The protein accession ID links in the DIAMOND result table now point to UniProtKB instead of NCBI.
The QIAseq 16S/ITS Demultiplexer tool now adds region information to the read group in the element info output. Thus the OTU Clustering tool adds region information as metadata in the abundance table to allow data aggregation based on this metadata category.
In Abundance tables, headers of the columns displaying abundances for each sample have been reverted to show the sample name first. This improves clarity when showing an Abundance table with multiple samples.

Bug fixes

Fixed a bug in Add Sequence to MLST tool, where the steps defining the sequences to be added were not updated after changing the specified MLST scheme.
Fixed a bug causing the Find Prokaryotic Genes tool to fail when a large number of sequences are provided as input.
Fixed a bug causing the parameter validation of the QIAseq 16S/ITS Demultiplexer tool to fail when it is included in a workflow.

CLC Microbial Genomics Module 3.5

Released on June 28, 2018

New tools

Annotate CDS with Best DIAMOND Hit - an efficient alternative to Annotate CDS with Best BLAST Hit allowing the annotation of large data sets, even on desktop machines.
Download Protein Database - five protein databases are available to download using this tool: COG, SwissPROT, UniRef-50, UniRef-90, and UniRef-100.
Find Prokaryotic Genes (beta) - a tool for identifying and annotating prokaryotic genome or contig sequences with predicted gene and CDS regions.
QIAseq 16S/ITS Demultiplexer - a tool for demultiplexing reads generated using QIAseq 16S/ITS Screening and Region panels.

Improvements

Abundance tables now have the following buttons:
- Create Abundance Subtable replaces Create Abundance Table from Selection and will create a table from selected rows.
- Create Sequence Sublist (available for OTU abundance tables only) will create a sequence list from selected rows.
- Create Normalized Abundance Subtable will create a table normalized on a single row for which all abundance values are non-zero.
The Annotate CDS with Best BLAST Hit, Annotate CDS with Best DIAMOND Hit and Annotate CDS with Pfam Domains tools now create a copy of the input instead of modifying it.
The Annotate CDS with Best BLAST Hit, Annotate CDS with Best DIAMOND Hit and Annotate CDS with Pfam Domains tools can now optionally output a table summarizing information about the annotations added to the sequence list.
The Create Microbial Reference Database tool now includes an option to use a QIAGEN-compiled set of GenBank assembly IDs pre-selected to represent the full NCBI list of genomes. The optimized database is particularly well-suited for running the Taxonomic Profiling tool on a laptop computer with 16GB of RAM.
The Taxonomic Profiling tool now qualifies reference genomes automatically without hard thresholds for minimum number of reads or minimum coverage, exploring the potential mapping positions more exhaustively.
The Taxonomic Profiling tool has a new option called "Minimum seed length" that allows users to define the desired balance between precision (higher length) and recall (lower length).
In OTU abundance tables, headers of the columns displaying abundances for each sample now include the sample name for clarity.
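
The idea behind normalizing an abundance table on a single row, as mentioned for the Create Normalized Abundance Subtable button above, can be sketched as follows. These notes do not spell out the exact calculation, so the sketch assumes that each sample's abundances are divided by that sample's value in the reference row. The function name normalize_on_row and the example table are hypothetical.

```python
def normalize_on_row(table, reference):
    """Divide every value by the reference row's value in the same sample.

    table maps a feature name to a list of abundances, one per sample.
    The reference row must be non-zero in every sample, otherwise the
    division is undefined.
    """
    ref = table[reference]
    if any(value == 0 for value in ref):
        raise ValueError("reference row must be non-zero in every sample")
    return {
        feature: [value / r for value, r in zip(values, ref)]
        for feature, values in table.items()
    }


# Hypothetical table with two samples and a spike-in used as reference row.
table = {
    "spike_in": [100, 50],
    "OTU_1": [200, 25],
    "OTU_2": [0, 100],
}
normalized = normalize_on_row(table, "spike_in")
print(normalized)
```

Under this assumption, the requirement that all values in the chosen row are non-zero follows directly from the division.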

Changes

In workflows, the PERMANOVA Analysis and Convert Abundance Table to Experiment tools no longer accept as input abundance tables generated by tools within the same workflow. Abundance tables must now exist prior to launching any workflow containing either of these tools. Existing workflows where either of these tools is configured to take in abundance tables generated by other tools in the same workflow will need to be re-designed.
The folder 'Amplicon-Based OTU Clustering' has been renamed to 'Amplicon-Based Analysis'.
In the Databases folder, the 'Taxonomic Profiling' subfolder was renamed to 'Taxonomic Analysis'.

Bug fixes

Fixed a bug that caused the ID column to display incorrect information on aggregated Abundance Tables.
Fixed an issue that would make the OTU Clustering tool stall frequently or fail when running with the "Fuzzy match duplicates" option enabled.
Fixed an issue that would affect the OTU Clustering report when run with the option "Allow creation of new OTUs" disabled: "Total predicted OTUs" and "De novo OTUs" are now showing correct values. More specifically, the "Total predicted OTUs" would erroneously include some OTUs to which no input read was assigned. This would in turn cause an overestimation of the "De novo OTUs" value, which is computed as the difference between the "Total predicted OTUs" and the "OTUs based on database" values.
Fixed a bug that occurred in the rare cases where identical subsequences (contigs) with different taxonomies were found in a database for the Taxonomic Profiling tool. The taxonomy of the identical contigs is now set to the lowest common ancestor.
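
As an illustration of the lowest common ancestor rule in the last fix, the sketch below treats each taxonomy as a root-to-leaf list of names and keeps the longest shared prefix. This is a generic way to compute a lowest common ancestor, not necessarily how the Taxonomic Profiling tool does it. The function name and the example lineages are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of several root-to-leaf lineages."""
    common = []
    for ranks in zip(*lineages):
        # Stop at the first level where the lineages disagree.
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return common


# Two hypothetical taxonomies attached to identical contigs.
lineage_a = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae",
             "Escherichia", "Escherichia coli"]
lineage_b = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae",
             "Shigella", "Shigella flexneri"]

lca = lowest_common_ancestor([lineage_a, lineage_b])
print(lca[-1])
```

In this example the two lineages first differ at the genus level, so the shared taxonomy ends at the family.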

CLC Microbial Genomics Module 3.0.1

Released on May 15, 2018

Improvements

The De Novo Assemble Metagenome tool is now in the Metagenomics folder, and the tools in the Databases folder have been reorganized in application-specific subfolders.
An informative error message is now produced by the OTU Clustering tool if the option "Similarity percent specified by OTU database" is selected and a database is chosen that does not specify a similarity percentage.
Decimal values are now supported for the "Similarity percentage" parameter in the OTU Clustering tool.
The Find Best Matches using K-mer Spectra tool now stops if only a few reads can be mapped, and an error message describing the problem is presented.
The Find Best Matches using K-mer Spectra, Identify MLST Scheme from Genomes, Identify MLST, and Extract Regions from Tracks tools, as well as the Type Among Multiple Species and Type a Known Species workflows, can now deal with references that have more than one chromosome or lists of contigs.
The memory handling of the Taxonomic Profiling tool when working with large numbers of reads has been improved.

Bug fixes

Fixed an issue in the OTU Clustering tool that would cause a paired read that had been merged to be filtered out if one of the members of the pair contained sequencing errors.
Fixed an issue where domain annotations added by the Annotate CDS with Pfam Domain tool started one amino acid later than expected.
Fixed an issue where the nodes in a K-mer tree referred to individual sequences instead of assemblies. This caused problems if bacteria with more than one chromosome were included for analysis.
Fixed a bug in the Differential Abundance Analysis tool where the most recent value of the "Metadata factor" parameter was not retained when configuring the tool in a workflow.

CLC Microbial Genomics Module 3.0

Released on November 21, 2017

New features

The Create SNP Tree tool can now output a new SNP Matrix that contains a pairwise comparison of SNP differences between all pairs of samples included in the analysis.
- The matrix supports coloring of individual table cells for easy identification of related strains.
- It is possible to highlight samples with fewer SNP differences than an adjustable threshold.
A new Multi-VCF format in the Export menu makes it possible to export multiple samples' variant tracks into one VCF file, provided that they have the same reference genome.
A new option in the Data section of Abundance Table Settings side panel allows for hiding entries with incomplete taxonomy for the taxonomic level chosen to aggregate the data.
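
The SNP Matrix described in this list can be pictured with a small sketch that counts pairwise differences and reports pairs below a threshold. It assumes each sample is represented by a string of base calls at the same ordered SNP positions. The function names, sample names, and calls are hypothetical, and this is not the Create SNP Tree implementation.

```python
from itertools import combinations


def snp_matrix(snp_calls):
    """Count differing positions between every pair of samples."""
    names = sorted(snp_calls)
    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        differences = sum(x != y for x, y in zip(snp_calls[a], snp_calls[b]))
        matrix[a][b] = matrix[b][a] = differences
    return matrix


def related_pairs(matrix, threshold):
    """List sample pairs with fewer SNP differences than the threshold."""
    names = sorted(matrix)
    return [(a, b) for a, b in combinations(names, 2)
            if matrix[a][b] < threshold]


# Hypothetical base calls at five shared SNP positions.
calls = {"iso1": "ACGTA", "iso2": "ACGTT", "iso3": "TCGAT"}
matrix = snp_matrix(calls)
close = related_pairs(matrix, threshold=2)
print(matrix["iso1"], close)
```

The matrix is symmetric with zeros on the diagonal, so only one value needs to be computed per pair.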

Changes

Updated the Alpha Diversity tool to handle a lower detection limit per feature in an abundance table.
The optional output of a Distance Matrix from the Beta Diversity tool is changed from being a simple table object to now being a SNP Matrix object.

Improvements

The Taxonomic Profiling tool has been improved, allowing higher detection rates at an equivalent level of false positives.
The Taxonomic Profiling tool can be configured by the users according to two new options: the minimum number of reads, and minimum coverage criteria necessary for the read to be assigned.
The Differential Abundance Analysis tool has been updated such that:
- It has an extra option for the comparison of all groups against one specific group within a metadata factor.
- It can perform an ANOVA-like comparison.
The Create SNP Tree tool now also supports construction of Maximum Likelihood phylogenies:
- Users can choose whether to run a Neighbor-Joining algorithm or a Maximum Likelihood algorithm.
- Users can optionally output an alignment of the concatenated SNPs that are used in the construction of the SNP tree.
Trees produced with the Create SNP Tree and Create K-mer Tree tools are now multifurcating.

Bug fixes

Fixed a bug that caused bacterial assemblies of type "acidobacteria" and viral assemblies of type "dsDNA viruses, no RNA stage" to not be shown by the Create Microbial Reference Database tool.
Fixed a bug causing the annotation columns "Assembly ID" and "FTP Path" to disappear in sequence lists downloaded with the Create Microbial Reference Database tool.
Updated the manual to be more specific about downloading viruses from NCBI with the Create Microbial Reference Database tool.
Fixed a bug that caused the Create Microbial Reference Database tool to not download taxonomies for all entries in some cases.
Fixed a bug caused by NCBI renaming a column in one of their files and leading the Download Pathogen Reference Database tool to fail.
Renamed the "Set of species" option in Download Pathogen Reference Database to "By Kingdom/Domain".
Fixed a bug in the OTU Clustering tool causing the Merge Paired Reads Report to not be output when the input contains both merged and non-merged sequence lists.
Fixed a bug in Align OTUs with MUSCLE that would cause the tool to incorrectly select the most abundant in some cases.
The Differential Abundance Analysis now accepts metadata groups with only one replicate.
Added a popup menu for selecting and deselecting all samples in Stack and Sunburst visualizations of abundance tables.
Upgraded the Neighbor Joining algorithm in the Create SNP Tree tool to use less memory.
Updated the Create SNP Tree and Create K-mer Tree tools so that trees with negative branch length are not allowed.
Fixed an issue with the Biom importer when run through the Cosmos ID plugin.
Updated manual with special system requirements.

CLC Microbial Genomics Module 2.5.5

Released on October 10, 2018

Bug fixes

Fixed an issue where the Download Database for Find Resistance tool failed due to an update at ResFinder.
Fixed an issue with the handling of duplicate database entries in the Create Microbial Reference Database tool. This fix greatly improves the speed of the tool when handling data added during recent NCBI updates.

CLC Microbial Genomics Module 2.5.4

Released on June 28, 2018

Improvements

In OTU abundance tables, headers of the columns displaying abundances for each sample now include the sample name for clarity.
OTU abundance tables now have a Create Sequence List from Selection button that will create a sequence list from selected rows.

Bug fixes

Fixed a bug that caused the ID column to display incorrect data on aggregated Abundance Tables.
Fixed an issue that would make the OTU Clustering tool stall frequently or fail when running with the "Fuzzy match duplicates" option enabled.
Fixed an issue that would affect the OTU Clustering report when run with the option "Allow creation of new OTUs" disabled: "Total predicted OTUs" and "De novo OTUs" are now showing correct values. More specifically, the "Total predicted OTUs" would erroneously include some OTUs to which no input read was assigned. This would in turn cause an overestimation of the "De novo OTUs" value, which is computed as the difference between the "Total predicted OTUs" and the "OTUs based on database" values.

CLC Microbial Genomics Module 2.5.3

Released on May 15, 2018

Improvements

An informative error message is now produced by the OTU Clustering tool if the option "Similarity percent specified by OTU database" is selected and a database is chosen that does not specify a similarity percentage.
The Find Best Matches using K-mer Spectra tool now stops if only a few reads can be mapped, and an error message describing the problem is presented.
The Find Best Matches using K-mer Spectra, Identify MLST Scheme from Genomes, Identify MLST, and Extract Regions from Tracks tools, as well as the Type Among Multiple Species and Type a Known Species workflows, can now deal with references that have more than one chromosome or lists of contigs.

Bug fixes

Fixed an issue in the OTU Clustering tool that would cause a paired read that had been merged to be filtered out if one of the members of the pair contained sequencing errors.
Fixed an issue where domain annotations added by the Annotate CDS with Pfam Domain tool started one amino acid later than expected.
Fixed an issue where the nodes in a K-mer tree referred to individual sequences instead of assemblies. This caused problems if bacteria with more than one chromosome were included for analysis.
Fixed a bug in the Differential Abundance Analysis tool where the most recent value of the "Metadata factor" parameter was not retained when configuring the tool in a workflow.

CLC Microbial Genomics Module 2.5.2

Released on December 5, 2017

Bug fixes

Fixed a bug that caused bacterial assemblies of type "acidobacteria" and viral assemblies of type "dsDNA viruses, no RNA stage" to not be shown by the Create Microbial Reference Database tool.
Fixed a bug causing the annotation columns "Assembly ID" and "FTP Path" to disappear in sequence lists downloaded with the Create Microbial Reference Database tool.
Updated the manual to be more specific about downloading viruses from NCBI with the Create Microbial Reference Database tool.
Fixed a bug that caused the Create Microbial Reference Database tool to not download taxonomies for all entries in some cases.
Fixed a bug caused by NCBI renaming a column in one of their files and leading the Download Pathogen Reference Database tool to fail.
Fixed a bug in Align OTUs with MUSCLE that would cause the tool to incorrectly select the most abundant in some cases.
Fixed an issue with the Biom importer when run through the Cosmos ID plugin.
Updated manual with special system requirements.

CLC Microbial Genomics Module 2.5.1

Released on September 11, 2017

Bug fixes

Fixed an issue in the Create Microbial Reference Database tool that led to incorrect taxonomies being assigned when "Viruses" was selected in the "Select NCBI sources" section of the wizard.
Fixed a bug that caused the OTU clustering tool to fail in rare cases.

CLC Microbial Genomics Module 2.5

Released on August 16, 2017

New features

New import and export feature of abundance tables in the biological observation matrix (biom) file format. This allows users to share and use their data with analysis tools from CosmosID, or to visualize an abundance table from CosmosID using the MGM tools:
- The new importer supports version 1.0 and 2.1 of the biom file format.
- The new exporter supports version 2.1 of the biom file format.
The manual section about the Taxonomic Profiling tool has been updated to reflect the current intended use of the tool.
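
For orientation, version 1.0 of the biom format is JSON-based, while the 2.x versions are stored as HDF5. The sketch below builds a minimal sparse table in the 1.0 layout with the Python standard library and expands it back into a dense matrix. The OTU names, sample names, counts, and the generated_by value are made up, and the helper to_dense is hypothetical rather than part of any importer.

```python
import json

# Minimal BIOM 1.0 style document with two observations and two samples.
table = {
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "example script",
    "date": "2017-08-16T00:00:00",
    "rows": [
        {"id": "OTU_1",
         "metadata": {"taxonomy": ["k__Bacteria", "p__Firmicutes"]}},
        {"id": "OTU_2", "metadata": None},
    ],
    "columns": [
        {"id": "SampleA", "metadata": None},
        {"id": "SampleB", "metadata": None},
    ],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 2],
    # Sparse entries are [row index, column index, value].
    "data": [[0, 0, 12], [0, 1, 3], [1, 1, 7]],
}


def to_dense(biom):
    """Expand the sparse data of a BIOM 1.0 style dict into a dense matrix."""
    n_rows, n_cols = biom["shape"]
    dense = [[0] * n_cols for _ in range(n_rows)]
    for row, col, value in biom["data"]:
        dense[row][col] = value
    return dense


text = json.dumps(table)            # serialize as it would be written to a file
dense = to_dense(json.loads(text))  # parse and expand
print(dense)
```

Entries missing from the sparse data are treated as zero, which keeps files small for abundance tables where most features are absent from most samples.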

Changes

The tools Optional Merge Paired Reads and Fixed Length Trimming have been moved to the Legacy Tools folder of the toolbox as they are no longer needed for the OTU Clustering tool. They will be completely removed in a future release of the software.
The Optional Merge Paired Reads and Fixed Length Trimming steps have been removed from the Data QC and OTU Clustering workflow because the OTU Clustering tool can now merge paired reads and does not require fixed-length sequences as input.
The Taxonomic Profiling tool now allows the user to optionally "Estimate paired end distances" as a pre-processing step, and its performance has been improved.

Improvements

The OTU Clustering tool can now also handle fungal Internal Transcribed Spacer (ITS) amplicon sequences:
- The algorithm has been improved to handle variable-length data like fungal ITS sequences, which makes the Fixed Length Trimming tool redundant.
- The OTU Clustering tool now handles OTUs with reads mapping in both forward and backward orientation for taxonomic assignment. This kind of mixed orientation data now also works with the "Allow creation of new OTUs" option enabled.
- After loading the read sequences, the tool now attempts to merge any overlapping paired-end reads, thus making the Optional Merge Paired Reads tool redundant. The parameters for the alignment of reads are now part of the "OTU Clustering" wizard. OTU clustering is performed on all reads, i.e., both reads that are merged and reads that could not be merged.
- The tool can process both paired-end and single-end data files at the same time.
The Taxonomic Profiling reference database index management has been improved, in that it includes messages/warnings in the wizard about indexing, and generates a new CLC folder called "CLC_MgmReferenceCache" designated for the storage of index files.
The Download Database for Find Resistance tool has been updated to point to the newest version of the database.

Bug fixes

Fixed a bug that caused the "Create Abundance Table from Selection" button to fail due to duplicated names while aggregating on taxonomy.
Fixed a bug that caused the Data QC and Clean Host DNA, Data QC and Taxonomic Profiling, Type a Known Species, and Type Among Multiple Species workflows to not run on CLC Genomics Server without the Biomedical extension enabled.
Fixed a bug that caused Add Metadata to Abundance Table to throw a NullPointerException when opening Excel files with empty cells.
Fixed a bug that caused the Create SNP Tree tool to fail when analyzing read mappings whose genomes are comparable but have chromosomes in a different order.
Fixed a bug that caused the Find Resistance tool to not report all BLAST hits when the gene database contains more than 250 genes.
Fixed a bug causing Stacked Charts to throw an out of bounds exception when changing from "Bar Chart" to "Area Chart".
Fixed a bug that made the Create Microbial Reference Database tool crash when filtering, sorting, and aggregating a selection table.
Fixed a bug causing the "File with accession number" option in the Create Microbial Reference Database tool to have no effect.
Minor bug fixes

CLC Microbial Genomics Module 2.0

Released on March 2, 2017

New features

New tool for Taxonomic Profiling of whole metagenome shotgun sequencing datasets.
- All existing visualizations (stacked bar charts, stacked area charts, sunburst charts and heat maps) have been updated to work with the output from this tool.
- All existing abundance analysis tools (Alpha Diversity, Beta Diversity, PERMANOVA Analysis and Differential Abundance Analysis) have been updated to work with the output from this tool.
Three new workflows for host DNA removal, taxonomic profiling and downstream analysis of whole metagenome shotgun sequencing datasets.
New tool for easily creating custom microbial reference genome databases for use in taxonomic profiling and microbial isolate typing: Create Microbial Reference Database.

Changes

The plugin Toolbox has been largely restructured in order to make it more intuitive to navigate. Microbiome analysis tools are now categorized into four folders: Amplicon-based OTU Clustering, Taxonomic Analysis, Functional Analysis, and Abundance Analysis. All database management tools have been collected in the top-level folder Databases.
The two tools Download Bacterial Genomes from NCBI and Download Pathogen Reference Databases have been merged into one tool called Download Pathogen Reference Database.
Three tools have been renamed:
- Set Up Resistance Gene Database's new name is Set Up Gene Database.
- Download OTU Reference Database's new name is Download Amplicon-Based Reference Database.
- Format Reference Database's new name is Set Up Amplicon-Based Reference Database.

Improvements

The speed of searches for data elements with associations to specified metadata, from within a Result Metadata Table, has been greatly improved. To enable metadata related searches to work after upgrading to the Microbial Genomics Module 2.0, indices for the locations containing the relevant data will need to be rebuilt.
The OTU Clustering tool now handles OTUs with reads mapping in both the forward and backward orientation for taxonomic assignment. Note that this kind of data should not be used with the "Allow creation of new OTUs" option, as the orientation of the new OTUs will not be inferred consistently.
When aggregating an abundance table, for example by class, a new column called "Class (Aggregated)" containing the class names is created. This name will be used in subsequent analysis outputs to avoid very long feature names in abundance tables and downstream analysis tools, e.g., heat maps.
The Set Up Microbial Reference Database tool now has an option to update the Latin name of each sequence in a given sequence list with the content of the source annotation of the sequence.
The Set Up Microbial Reference Database tool now also recognizes "Latin name" as a special metadata column name, making it easier to set up custom databases with meaningful sequence names.
The Download Pathogen Reference Database tool now corrects corrupt Latin names of sequences by replacing them with the content of the source annotation in the downloaded GenBank files.
Axes in PCoA plots output from the Beta Diversity tool can now be replaced by metadata columns in order to make clustering correlated with specific metadata more visible.
The Differential Abundance Analysis tool now checks the input metadata and displays a warning directly in the wizard if singularities or linear dependencies are found.
Added a new column to the result metadata table, "Best match, average coverage", which will help identifying samples that have been sequenced with insufficient depth.
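
The aggregation improvement above, where aggregating by class produces a "Class (Aggregated)" column, can be sketched as a sum of abundances grouped by one taxonomic level. The rank list, the "N/A" label for rows without a name at that level, the function name, and the example rows are assumptions made for illustration, not details taken from the module.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species"]


def aggregate_by_rank(rows, rank):
    """Sum per-sample abundances of all rows sharing a name at the given rank.

    rows is a list of (lineage, counts) pairs, where lineage is a tuple of
    names ordered as in RANKS and counts holds one value per sample.
    Returns the new column header and a dict of aggregated counts.
    """
    depth = RANKS.index(rank)
    totals = {}
    for lineage, counts in rows:
        # Rows whose taxonomy stops above the chosen rank get a placeholder.
        key = lineage[depth] if len(lineage) > depth else "N/A"
        current = totals.setdefault(key, [0] * len(counts))
        for i, count in enumerate(counts):
            current[i] += count
    return f"{rank} (Aggregated)", totals


# Hypothetical abundance rows for two samples.
rows = [
    (("Bacteria", "Firmicutes", "Bacilli", "Lactobacillales"), [10, 2]),
    (("Bacteria", "Firmicutes", "Bacilli", "Bacillales"), [5, 1]),
    (("Bacteria", "Firmicutes", "Clostridia"), [3, 4]),
    (("Bacteria", "Firmicutes"), [1, 0]),
]
header, aggregated = aggregate_by_rank(rows, "Class")
print(header, aggregated)
```

Using the short class name as the feature label, rather than the full lineage, is what keeps labels compact in downstream views such as heat maps.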

Bug fixes

Fixed a bug in abundance tables that caused read names to be appended to the aggregated taxonomy in rare cases when aggregating on higher phylogeny levels.

CLC Microbial Genomics Module 1.6.2

Released on March 6, 2017

Improvements

The OTU Clustering tool now handles OTUs with reads mapping in both the forward and backward orientation for taxonomic assignment. Note that this kind of data should not be used with the "Allow creation of new OTUs" option, as the orientation of the new OTUs will not be inferred consistently.

Bug fixes

Fixed a serious bug that made all downloads on Windows machines with the Download Bacterial Genomes from NCBI and Download Pathogen Reference Databases tools fail.
Fixed a bug in the Download MLST Schemes (PubMLST) tool that caused an error when starting the tool. This error emerged after PubMLST migrated to a new server.
Fixed a bug in the De Novo Assemble Metagenome tool that caused some contigs to be duplicated exactly.
Fixed a bug in the Alpha Diversity tool that sometimes caused a miscalculation due to a numerical overflow when using Simpson's diversity index.
Fixed a bug that caused the Annotate CDS with Pfam Domains tool to not give an output when the input only had one CDS annotation.
Fixed a bug that caused some MLST schemes to throw an error when shown in a table view.
Fixed a bug that sometimes caused sunburst charts to hide high-abundance features in the 'Other' category. Sunburst charts now display the 100 most abundant features and group all other features into 'Other'.
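
The Simpson's diversity index fix in this list concerns numerical overflow. These notes do not say where the overflow occurred or which form of the index the Alpha Diversity tool uses, so the sketch below only illustrates how this class of error can arise. It assumes the common form 1 minus the sum of squared proportions, and the helper wrap_int32 is a hypothetical emulation of 32-bit signed arithmetic.

```python
def simpson_index(counts):
    """Simpson's diversity index as 1 - sum(p_i^2), computed on proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one count must be positive")
    return 1.0 - sum((n / total) ** 2 for n in counts)


def wrap_int32(value):
    """Emulate two's complement wrap-around of a 32-bit signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


# Hypothetical read counts for two features.
counts = [60000, 40000]

# Squaring a count in 32-bit arithmetic wraps around to a negative number.
wrapped_square = wrap_int32(counts[0] * counts[0])

# Working with proportions keeps every intermediate value between 0 and 1.
index = simpson_index(counts)
print(wrapped_square, round(index, 2))
```

Python integers do not overflow, which is why the wrap-around has to be emulated here. In languages with fixed-width integers, the squared count would silently become negative unless a wider type or proportions are used.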

CLC Microbial Genomics Module 1.6

Released on September 15, 2016

Updated for compatibility with CLC Genomics Workbench 9.5, Biomedical Genomics Workbench 3.5 and CLC Genomics Server 8.5.

CLC Microbial Genomics Module 1.5.1

Released on August 30, 2016

Bug fixes

Fixed a bug that caused the tool Find Best Matches using K-mer Spectra to fail in some cases when run against a single reference genome.
Fixed a bug that prevented users from saving the view settings of rarefaction plots.

CLC Microbial Genomics Module 1.5

Released on July 12, 2016

New features

With the new tool Download Pathogen Reference Databases, users can now easily download prebuilt reference databases for typing of the following pathogens:
- Salmonella enterica
- Listeria monocytogenes
- Escherichia coli and Shigella
- Campylobacter jejuni
- Acinetobacter baumannii
- Klebsiella pneumoniae
Custom reference databases for typing microbial isolates can be set up using the new tool Set Up Pathogen Reference Database.
Annotating references in existing reference databases with metadata is also enabled by the new tool Set Up Pathogen Reference Database.
Custom gene databases for antimicrobial resistance typing can be set up using the new tool Set Up Resistance Gene Database.
Functionality to check microbial isolate samples for contamination and low quality has been added to the tool Find Best Matches Using K-mer Spectra.
Statistical differential abundance analysis of taxonomic and functional entities across samples or groups of samples is enabled by the new tool Differential Abundance Analysis.
Hierarchical clustering of both samples and features in abundance tables produced by OTU clustering or whole metagenome functional analysis is enabled by the new tool Create Heat Map for Abundance Table.

Improvements

Taxonomic assignment to microbial isolate samples in databases downloaded by the Download and Set Up Pathogen Reference Database tools is now done to the species level, and not just genus level as it was previously.
The Create K-mer Tree tool now includes a default K-mer tree layout that makes it easier to identify a suitable common reference in the tree.
The Create SNP Tree tool now includes a default SNP tree layout that visualizes useful analysis results and serves as a good starting point to find your own favorite layout.
The Create K-mer Tree and Create SNP Tree tools now accept input samples that are associated to multiple metadata tables when a Result Metadata Table is also supplied.
The Find Best Matches using K-mer Spectra tool has been changed to use the Z-score rather than the number of matching k-mers to select best matches in order to remove a bias towards larger genomes.
The Find Best Matches using K-mer Spectra tool has been changed to use both the forward and reverse strand of the supplied references to enable a more accurate best-match detection.
In Stacked Bar Chart and Area Chart visualizations of abundance tables:
- samples can now be sorted according to their names or according to associated metadata.
- features (taxonomic or functional entities) can now be sorted according to their abundance or name.
- the “Other” feature category can now be hidden in both the plot and in the legend of the plot.
- samples and groups of samples can now be renamed by clicking their names in the side panel.
In PCoA plots, samples and groups of samples can now be renamed by clicking their names in the side panel.
In Alpha diversity plots, the look of each line (representing a sample) can now be configured based on the associated metadata.
Alpha diversity plots now include a legend that can be set up based on the available metadata.
In resistance gene databases, the metadata associated to each gene can now be viewed and edited in the table view.
When a SNP tree is built based on input with no SNPs detected between three or more samples, a warning is now issued.
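
To illustrate why ranking references by a Z-score instead of by the raw number of matching k-mers can remove a bias towards larger genomes, here is a minimal sketch. These notes do not specify how the module computes its Z-score; the binomial null model, the function `match_z_score`, and all numbers below are assumptions made purely for illustration.

```python
import math


def match_z_score(matches, sample_kmers, ref_kmers, kmer_space):
    """Z-score of the observed match count under an assumed binomial null.

    Under the null, each sample k-mer hits the reference by chance with
    probability p = ref_kmers / kmer_space, so larger references are
    expected to collect more chance matches.
    """
    p = ref_kmers / kmer_space
    expected = sample_kmers * p
    std = math.sqrt(sample_kmers * p * (1 - p))
    return (matches - expected) / std


# Toy numbers (hypothetical): a small k-mer space makes the effect visible.
KMER_SPACE = 4 ** 8
SAMPLE_KMERS = 1_000
references = {
    "small_genome": {"ref_kmers": 5_000, "matches": 600},
    "large_genome": {"ref_kmers": 40_000, "matches": 650},
}

best_by_count = max(references, key=lambda r: references[r]["matches"])
best_by_z = max(
    references,
    key=lambda r: match_z_score(
        references[r]["matches"], SAMPLE_KMERS, references[r]["ref_kmers"], KMER_SPACE
    ),
)
print(best_by_count)  # large_genome
print(best_by_z)      # small_genome
```

In this toy case the larger reference wins on raw count simply because it contains more k-mers, while the normalized score favors the reference whose match count most exceeds what chance alone would produce.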

Bug fixes

Fixed a bug that caused the Build Functional Profile tool to run very slowly on input located on a CLC Genomics Server.
Fixed a bug that caused check marks showing which references a sample had been mapped against to be reset in Result Metadata Tables when running the Re-map Samples to Specified Reference workflow or running the Use Genome as Result tool.
Fixed a bug in the Convert Abundance Table to Experiment tool that caused names and taxonomies in the resulting Experiment Tables to be meaningless when OTU Tables based on the SILVA database were used.
Fixed a bug that caused the tool Add to Result Metadata Table to fail frequently when run on a CLC Genomics Server.

Changes

The Type A Single Species workflow has been renamed to Type a Known Species.
The Re-map Samples to Specified Reference workflow has been renamed to Map to Specified Reference.
The Type Among Multiple Species and Type a Known Species workflows will by default check for low quality and contamination.
The Type Among Multiple Species and Type a Known Species workflows now output the best matching reference in the supplied reference database, not just the best matching reference in the database with an associated MLST type.
All ready-to-use workflows have been moved to dedicated workflow folders in the Microbial Genomics Module folder in the toolbox.
The Alpha Diversity tool now outputs a plot for each selected distance measure, not a single report containing all plots.

Retired tools

The Convert Abundance Table to Experiment tool is now marked as a legacy tool and will be removed from the module in the next feature release.

CLC Microbial Genomics Module 1.4

Released on July 12, 2016

New features

With the new tool Download Pathogen Reference Databases, users can now easily download prebuilt reference databases for typing of the following pathogens:
- Salmonella enterica
- Listeria monocytogenes
- Escherichia coli and Shigella
- Campylobacter jejuni
- Acinetobacter baumannii
- Klebsiella pneumoniae
Custom reference databases for typing microbial isolates can be set up using the new tool Set Up Pathogen Reference Database.
Annotating references in existing reference databases with metadata is also enabled by the new tool Set Up Pathogen Reference Database.
Custom gene databases for antimicrobial resistance typing can be set up using the new tool Set Up Resistance Gene Database.
Functionality to check microbial isolate samples for contamination and low quality has been added to the tool Find Best Matches Using K-mer Spectra.

Improvements

Taxonomic assignment to microbial isolate samples in databases downloaded by the Download and Set Up Pathogen Reference Database tools is now done to the species level, and not just genus level as it was previously.
The Create K-mer Tree tool now includes a default K-mer tree layout that makes it easier to identify a suitable common reference in the tree.
The Create SNP Tree tool now includes a default SNP tree layout that visualizes useful analysis results and serves as a good starting point to find your own favorite layout.
The Create K-mer Tree and Create SNP Tree tools now accept input samples that are associated to multiple metadata tables when a Result Metadata Table is also supplied.
The Find Best Matches using K-mer Spectra tool has been changed to use the Z-score rather than the number of matching k-mers to select best matches in order to remove a bias towards larger genomes.
The Find Best Matches using K-mer Spectra tool has been changed to use both the forward and reverse strand of the supplied references to enable a more accurate best-match detection.
In Stacked Bar Chart and Area Chart visualizations of abundance tables:
- samples can now be sorted according to their names or according to associated metadata.
- features (taxonomic or functional entities) can now be sorted according to their abundance or name.
- the “Other” feature category can now be hidden in both the plot and in the legend of the plot.
- samples and groups of samples can now be renamed by clicking their names in the side panel.
In PCoA plots, samples and groups of samples can now be renamed by clicking their names in the side panel.
In Alpha diversity plots, the look of each line (representing a sample) can now be configured based on the associated metadata.
Alpha diversity plots now include a legend.
In resistance gene databases, the metadata associated to each gene can now be viewed and edited in the table view.
When a SNP tree is built based on input with no SNPs detected between three or more samples, a warning is now issued.
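
The benefit of matching against both the forward and reverse strand of a reference can be shown with a small sketch. It assumes plain set-based k-mer matching on A/C/G/T sequences; the helper names (`reverse_complement`, `kmers`, `shared_kmers`) are hypothetical and do not reflect how the tool is implemented.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(read, reference, k, both_strands=True):
    """Count distinct read k-mers found in the reference.

    With both_strands=True, k-mers from the reverse-complemented reference
    are included as well, so reads from either strand can match.
    """
    ref_kmers = kmers(reference, k)
    if both_strands:
        ref_kmers |= kmers(reverse_complement(reference), k)
    return len(kmers(read, k) & ref_kmers)


reference = "AAAAACCCCC"
read = "GGGTTT"  # originates from the reverse strand of the reference

print(shared_kmers(read, reference, k=4, both_strands=False))  # 0
print(shared_kmers(read, reference, k=4, both_strands=True))   # 3
```

A read sequenced from the reverse strand shares no k-mers with the forward reference alone, but is recovered once both strands are considered.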

Bug fixes

Fixed a bug that caused the Build Functional Profile tool to run very slowly on input located on a CLC Genomics Server.
Fixed a bug that caused check marks showing which references a sample had been mapped against to be reset in Result Metadata Tables when running the Re-map Samples to Specified Reference workflow or running the Use Genome as Result tool.
Fixed a bug in the Convert Abundance Table to Experiment tool that caused names and taxonomies in the resulting Experiment Tables to be meaningless when OTU Tables based on the SILVA database were used.
Fixed a bug that caused the tool Add to Result Metadata Table to fail frequently when run on a CLC Genomics Server.

Changes

The Type A Single Species workflow has been renamed to Type a Known Species.
The Re-map Samples to Specified Reference workflow has been renamed to Map to Specified Reference.
The Type Among Multiple Species and Type a Known Species workflows will by default check for low quality and contamination.
The Type Among Multiple Species and Type a Known Species workflows now output the best matching reference in the supplied reference database, not just the best matching reference in the database with an associated MLST type.
All ready-to-use workflows have been moved to dedicated workflow folders in the Microbial Genomics Module folder in the toolbox.
The Alpha Diversity tool now outputs a plot for each selected distance measure, not a single report containing all plots.

Retired tools

The Convert Abundance Table to Experiment tool is now marked as a legacy tool and will be removed from the module in the next feature release.

CLC Microbial Genomics Module 1.3.1

Released on May 10, 2016

Bug fixes

Fixed a bug that caused result metadata tables to not be properly saved when they were updated as part of running a workflow.
Adapted the “Download Bacterial Genomes from NCBI” tool to a new format in a file downloaded from NCBI.

Improvements

Rewrote a misleading error message that appeared when the Download OTU Reference Database tool was not able to contact the online QIAGEN resources.
Added GPU requirements to the System Requirements for viewing PCoA 3D plots.

CLC Microbial Genomics Module 1.3

Released on March 31, 2016

Bug fixes

Fixed a bug in the De Novo Assemble Metagenome tool that caused excessive memory usage when using multiple input files.

Improvements

Improved FeatureIDs in experiments generated using the “Convert Abundance Table to Experiment” tool.
The name of the annotation column in experiments generated using the “Convert Abundance Table to Experiment” tool now depends on the type of the abundance table.
Improved error messages and warnings in the wizard for the Build Functional Profile tool.

CLC Microbial Genomics Module 1.2.2

Released on May 10, 2016

Bug fixes

Added a report output to the Add to Result Metadata Table tool. Please make sure to add this output to all workflows you run on a CLC Genomics Server setup to make them run through without errors.
Fixed a bug that caused result metadata tables to not be properly saved when they were updated as part of running a workflow.
Adapted the “Download Bacterial Genomes from NCBI” tool to a new format in a file downloaded from NCBI.

Improvements

Rewrote a misleading error message that appeared when the Download OTU Reference Database tool was not able to contact the online QIAGEN resources.
Added GPU requirements to the System Requirements for viewing PCoA 3D plots.

CLC Microbial Genomics Module 1.2.1

Released on March 31, 2016

Bug fixes

Fixed a bug in the De Novo Assemble Metagenome tool that caused excessive memory usage when using multiple input files.

Improvements

Improved FeatureIDs in experiments generated using the “Convert Abundance Table to Experiment” tool.
The name of the annotation column in experiments generated using the “Convert Abundance Table to Experiment” tool now depends on the type of the abundance table.

CLC Microbial Genomics Module 1.2

Released on February 29, 2016

New features

Functional profiling of whole metagenome datasets based on Pfam domains, GO terms and BLAST hits
Whole metagenome de novo assembler
Annotation of CDS with Pfam domains and GO terms
Annotation of CDS with Best BLAST hits using predefined or custom databases

Improvements

Swapped the Trim Sequences tool and the Optional Merge Paired Reads tool in the Data QC and OTU Clustering ready-to-use workflow in order to merge more identical amplicon reads. This may lead to different results in some analyses.
Improved the tolerance of the Download Bacteria Genomes from NCBI tool towards unstable FTP connections with NCBI.
Enabled graphical export of Bar Chart, Area Chart, Sunburst Chart and PCoA Chart visualizations of abundance tables.
Added legends to Bar Chart and Area Chart visualizations of abundance tables.
Improved the speed and compute resource requirements of the OTU Clustering tool.
The OTU Clustering tool now reverse-complements reference OTUs when most reads map to the reverse strand.
Improved the length of the trimmed reads output by the Fixed Length Trimming tool on datasets with a large read length standard deviation.
The OTU Clustering tool now produces a summary report that can be used to evaluate the quality of the input data and the OTU clustering.
The Optional Merge Paired Reads tool now produces a summary report.
The Fixed Length Trimming tool now produces a summary report.
Activated links to the manual from ready-to-use workflow wizards.
Updated the UNITE database that is downloaded by the Download OTU Reference Database tool to the latest version.

Bug fixes

Adapted the Download Bacteria Genomes from NCBI tool to a new structure of the NCBI ftp site.
Fixed a bug in the Fixed Length Trimming tool that caused a wrong automatic length calculation when run on inputs with a very large number of reads.
Fixed a bug in the Fixed Length Trimming tool, the Optional Merge Paired Reads tool and the Filter Samples Based on Number of Reads tool that caused the history entries of output from these tools to be inconsistent.

Changes

Placed all tools in the Microbial Genomics Module into a single folder in the toolbox with subfolders ‘OTU Clustering’, ‘Typing and Epidemiology’, ‘Whole Metagenome Analysis’ and ‘General Tools’.

CLC Microbial Genomics Module 1.1

Released on October 15, 2015

New features

Determination of MLST for NGS samples
Identification of antimicrobial resistance genes
Construction of SNP trees from NGS reads
SNP tree variants differentiating between two sub-trees can be displayed easily
Construction of K-mer trees from genomes and NGS samples
Access sample metadata and analysis results in a table
Metadata is automatically transferred to SNP trees and K-mer trees
Three template workflows provided for routine typing

Improvements

Added help buttons in all editors
The Format Reference Database tool was improved to handle malformed input better
Improved parameter descriptions and mouse-over texts in several places

Bug fixes

Fixed a bug preventing usage of metadata with only 2 values in the Permanova and Convert to Experiment wizards
Fixed a bug that caused all CSV files imported to the workbench to be imported as OTU abundance tables
The Chimera crossover cost parameter in OTU clustering now only takes integer values
Added a check to prevent the user from running “Reference based OTU clustering” without an “OTU database”

Changes

The Estimate Alpha and Beta Diversities workflow no longer outputs an alignment as it was not of any use for the user.