
CLC Microbial Genomics Module latest improvements

CLC Microbial Genomics Module 22.1

Released on June 15, 2022

New features

A new set of tools for the prediction of biochemical pathways from functional abundance tables and differential abundance tables is now available. This set of tools is comprised of:


The pathway database and the pathway calls can be visualized as simple pathway graphs to put the EC terms into the context of their biochemical compounds. In a pathway calling result, the widths and hues of EC terms can be adjusted to display the abundance or fold changes in the abundance, p-values and max group means.

Improvements, Changes and Bug fixes 

Annotate with BLAST and Annotate with DIAMOND

Download Custom Microbial Reference Database

MLST

Identify Viral Integration Sites

Various minor improvements

CLC Microbial Genomics Module 22.0.1

Released on March 14, 2022

Bug fixes

 

CLC Microbial Genomics Module 22.0

Released on January 11, 2022

Updated to be compatible with CLC Genomics Workbench 22 and CLC Genomics Server 22.

Improvements

Bug fixes

Changes

 

CLC Microbial Genomics Module 21.1.1

Released on January 11, 2022

Improvements

Bug fixes

 

CLC Microbial Genomics Module 21.1

Released on June 28, 2021

New features

New features improving viral analysis capabilities

New features improving amplicon based metagenomics capabilities

Other new features

Improvements, Changes and Bug fixes

Workflows

Split Sequence List

Create Annotated Sequence List

Bin Pangenomes by Sequence

Create SNP Tree

Annotate with BLAST/Annotate with DIAMOND

Other updates

Functionality retirement

 

CLC Microbial Genomics Module 21.0

Released on January 12, 2021

New features

New features improving functional annotation and exploration capabilities
New tools for improved handling of sequence lists

Improvements, Changes and Bug fixes

OTU Taxonomy annotations

The OTU Taxonomy field on sequence lists has been removed and replaced by the more general Taxonomy field. This changes the following behavior of OTU clustering results:

Bin Pangenomes by Taxonomy
Create Large MLST Scheme
Other updates

Other Changes

Functionality retirement

Tools

The above have been replaced by the new, general purpose Create Annotated Sequence List tool.

 

CLC Microbial Genomics Module 20.1.1

Released on October 22, 2020

Improvements and bug fixes

Taxonomic Profiling
Other tools

Fixed an issue where Download MLST Schemes (PubMLST) stopped working due to a recent change in the XML format used by PubMLST.

Fixed an issue where Bin Pangenomes By Taxonomy would fail when using a Taxonomic Profiling index containing entries without taxonomies.

 

CLC Microbial Genomics Module 20.1

Released on June 23, 2020

New Features

Large MLST tools, workflows and resources

All tools for creating and using Large MLST Schemes are now out of beta.

A tutorial called Working with large MLST schemes is now available, covering using Large MLST Scheme tools with a focus on scheme creation, modification, and isolate typing.

A workflow Create Large MLST Scheme with Sequence Types is now provided for creating high-quality schemes with sequence types. This workflow includes the Create Large MLST Scheme tool, which has been substantially revised for this release, including supporting the creation of schemes from organisms with spliced genes.

Type with Large MLST Scheme

Add Typing Results to Large MLST Scheme

Create Large MLST Scheme

This tool has been substantially revised for this release. Among other changes, it now supports the creation of schemes from organisms with spliced genes. Please refer to the manual for usage details.

Below is a list of concerns addressed in this update:

Download Large MLST Scheme

Working with large MLST schemes

Other new tools

Two new tools for annotating nucleotide sequences, such as de novo contigs or genomes, from a candidate set of reference sequences are available under the Functional Analysis folder:

Extend Result Metadata Table This replaces the Add To Result Metadata Table tool, providing similar functionality, but in a form that can be included in workflows intended for use on QIAGEN CLC Genomics Cloud Engine.

Improvements, Changes and Bug fixes

Taxonomic Profiling can now be used to analyze metagenomic data produced with long read technologies.

The Type a Known Species, Type among Multiple Species and Map to Specified Reference workflows can now be run on the QIAGEN CLC Genomics Cloud Engine. Changes introduced to support this were the inclusion of control flow elements in the workflow design and the inclusion of the new Extend Result Metadata Table tool.

The Create Kmer Tree and Create SNP Tree tools can now handle input data that is associated with several metadata tables. The use of the Result Metadata Table is now optional.

The removal and reporting of duplicate sequences by Create Taxonomic Profiling Index has been improved.

Assembly grouping options have been added to Find Prokaryotic Genes, allowing the grouping of input sequences to be specified.

 

Download Microbial Reference Database

ARES AMR Database

Legacy tools and workflows

The following tools and workflows have been moved to the Legacy folder of the Workbench Toolbox, with “(legacy)” appended to their original names. They will be removed in a future version of the software.

 

CLC Microbial Genomics Module 20.0

Released on December 11, 2019

New Features

Five new tools are available for working with core genome (cg) and whole genome (wg) MLST schemes. Three tools create MLST schemes:

For typing and extending schemes, two new tools are available:

The MLST schemes feature minimum spanning tree and heat map visualizations which are synchronized with the typing results, facilitating the analysis of typing results in relation to the scheme they have been typed with.

Improvements, Changes and Bug fixes

Resistance Detection Tools and Databases

Alpha Diversity

Bin Pangenomes by Taxonomy

Bin Pangenomes by Sequence

DIAMOND

The Typing and Epidemiology tools are no longer beta-status tools.

The QIAseq 16S/ITS Demultiplexer tool has been updated with new barcodes to support the latest QIAGEN QIAseq 16S/ITS Region Panels kit.

For the Differential Abundance Analysis tool, the order of comparisons in an “Across Groups” analysis has been changed to match the sign of the fold change of the “All group pairs” or “Against control group” analyses.

CLC Microbial Genomics Module 4.8

Released on September 19, 2019

Resistance analysis updates: New and updated databases are available for download via the Download Resistance Database tool:

Find Resistance with ShortBRED

When using the Find Resistance with Nucleotide DB tool with the Virulence Factor Database and using Add To Result Metadata Table, the corresponding column names have been adjusted to ‘Virulence found’ instead of ‘Resistance Found’.

A new tutorial for profiling antimicrobial resistance genes in isolate and metagenomic samples of NGS reads is provided.

Additional bug fixes and improvements:

CLC Microbial Genomics Module 4.5

Released on June 27, 2019

New Features

Five new resistance databases are now accessible through the Download Resistance Database tool:

Improvements, Changes and Bug fixes

Find Resistance with ShortBRED

Alpha diversity

Beta diversity

Find Prokaryotic Genes

Download Protein Database

Bin Pangenomes by Sequence

Download Microbial Reference Database

Additional improvements

CLC Microbial Genomics Module 4.1

Released on January 31, 2019

Metagenomics – Amplicon-Based Analysis

  • The OTU Clustering tool has a new option for specifying if non-merged paired-end reads should be included in the analysis. This option is off (unchecked) by default, as including only merged reads improves analysis run time. The Data QC and OTU Clustering workflow now also includes only merged reads in the OTU clustering analysis step. To run the workflow with all reads, a copy of the workflow must be created and this option enabled in that copy.
  • The “Similarity Percentage” parameter can now be adjusted when launching the Data QC and OTU clustering workflow.
  • Fixed a bug where action buttons underneath tables would not be accessible if the table view was too narrow.

Metagenomics – Taxonomic Analysis

Metagenomics – Functional Analysis

  • The Build Functional Profile tool can now output a DIAMOND hits functional profile.
  • Fixed a bug in the Find Prokaryotic Genes tool that affected genes spanning the origin of circular chromosomes, which would have the annotated CDS region spanning the whole circular chromosome.
  • Fixed a bug that would cause the tool Annotate CDS with Best DIAMOND Hit to stall when running DIAMOND in ‘sensitive’ or ‘more sensitive’ mode in a Gx Workbench running on Windows 10.

Metagenomics – Drug Resistance Analysis

  • Fixed a bug affecting the Use Genome as Result tool, and the Map to Specified Reference workflow that uses it, when the genome name contains a colon (“:”).
  • Fixed a problem where the Download MLST Schemes (PubMLST) tool did not format the MLST schemes properly, resulting in non-conclusive MLST assignments when using the downloaded schemes for typing.

CLC Microbial Genomics Module 4.0

Released on November 28, 2018

New tools for Metagenomics

New features and improvements: Functional Analysis

  • The tool Annotate CDS with Best DIAMOND Hit has new options to run in standard, sensitive and more sensitive modes.
  • We improved the accuracy of the BLAST search in the Annotate CDS with Best BLAST Hit tool.
  • Improved the Sunburst plot to allow graphical export with the legend.
  • Three vector formats (.ps, .eps, .svg) have been added to the export sunburst dialog.
  • Stacked bar charts now also show the relative abundance when hovering over the chart.

Improvements for Databases

Bug fixes

  • The QIAseq 16S/ITS Demultiplexer now removes extra leading and trailing spaces from user-defined barcodes.
  • Changed the output names of the QIAseq 16S/ITS Demultiplexer tool to follow the “sample_region” format, since the previous format (“region_sample”) would cause sample names to be removed in OTU abundance tables.
  • Fixed an issue with the Annotate CDS with Best DIAMOND Hit tool where it failed with “RC = 132” on older Mac computers (pre-2014).

CLC Microbial Genomics Module 3.6.1

Released on October 10, 2018

Bug fixes

CLC Microbial Genomics Module 3.6

Released on September 13, 2018

Improvements

  • It is now possible to import a custom MLST profile using the Create MLST Scheme tool.
  • In the Add NGS MLST Report to Scheme tool it is now possible to add more than one report, and therefore more than one sequence type, to a scheme at a time.
  • Warning messages in Add NGS MLST Reports to Scheme and Merge MLST Schemes now appear when the specified report/schemes to add/merge are incompatible.
  • The protein accession ID links in the DIAMOND result table now point to UniProtKB instead of NCBI.
  • The QIAseq 16S/ITS Demultiplexer tool now adds region information to the read group in the element info output. Thus the OTU Clustering tool adds region information as metadata in the abundance table to allow data aggregation based on this metadata category.
  • In Abundance tables, headers of the columns displaying abundances for each sample have been reverted to show the sample name first. This improves clarity when showing an Abundance table with multiple samples.

Bug fixes

  • Fixed a bug in Add Sequence to MLST tool, where the steps defining the sequences to be added were not updated after changing the specified MLST scheme.
  • Fixed a bug causing the Find Prokaryotic Genes tool to fail when a large number of sequences are provided as input.
  • Fixed a bug causing the parameter validation of the QIAseq 16S/ITS Demultiplexer tool to fail when it is included in a workflow.

CLC Microbial Genomics Module 3.5

Released on June 28, 2018

New tools

  • Annotate CDS with Best DIAMOND Hit – an efficient alternative to Annotate CDS with Best BLAST Hit allowing the annotation of large data sets, even on desktop machines.
  • Download Protein Database – five protein databases are available to download using this tool: COG, SwissPROT, UniRef-50, UniRef-90, and UniRef-100.
  • Find Prokaryotic Genes (beta) – a tool for identifying and annotating prokaryotic genome or contig sequences with predicted gene and CDS regions.
  • QIAseq 16S/ITS Demultiplexer – a tool for demultiplexing reads generated using QIAseq 16S/ITS Screening and Region panels.

Improvements

  • Abundance tables now have the following buttons:
    • Create Abundance Subtable replaces Create Abundance Table from Selection and will create a table from selected rows.
    • Create Sequence Sublist (available for OTU abundance tables only) will create a sequence list from selected rows.
    • Create Normalized Abundance subtable will create a table normalized on a single row for which all abundance values are nonzero.
  • The Annotate CDS with Best BLAST Hit, Annotate CDS with Best DIAMOND Hit and Annotate CDS with Pfam Domains tools now create a copy of the input instead of modifying it.
  • The Annotate CDS with Best BLAST Hit, Annotate CDS with Best DIAMOND Hit and Annotate CDS with Pfam Domains tools now optionally output a table summarizing information about the annotations added to the sequence list.
  • The Create Microbial Reference Database now includes an option to use a QIAGEN compiled set of Genbank assembly IDs pre-selected to represent the full NCBI list of genomes. The optimized database is particularly well-suited for running the Taxonomic Profiling tool on a laptop computer with 16GB of RAM.
  • The Taxonomic Profiling tool now qualifies reference genomes automatically without hard thresholds for minimum number of reads or minimum coverage, exploring the potential mapping positions more exhaustively.
  • The Taxonomic Profiling tool has a new option called “Minimum seed length” that allows users to define the desired balance between precision (higher length) and recall (lower length).
  • In OTU abundance tables, headers of the columns displaying abundances for each sample now include the sample name for clarity.

Changes

  • In workflows, the PERMANOVA Analysis and Convert Abundance Table to Experiment tools no longer accept as input abundance tables generated by tools within the same workflow. Abundance tables must now exist prior to launching any workflow containing either of these tools. Existing workflows where either of these tools is configured to take in abundance tables generated by other tools in the same workflow will need to be re-designed.
  • The folder ‘Amplicon-Based OTU Clustering’ has been renamed to ‘Amplicon-Based Analysis’.
  • In the Databases folder, the ‘Taxonomic Profiling’ subfolder was renamed to ‘Taxonomic Analysis’.

Bug fixes

  • Fixed a bug that caused the ID column to display incorrect information on aggregated Abundance Tables.
  • Fixed an issue that would make the OTU Clustering tool stall frequently or fail when running with the “Fuzzy match duplicates” option enabled.
  • Fixed an issue that would affect the OTU Clustering report when run with the option “Allow creation of new OTUs” disabled: “Total predicted OTUs” and “De novo OTUs” are now showing correct values. More specifically, the “Total predicted OTUs” would erroneously include some OTUs to which no input read was assigned. This would in turn cause an overestimation of the “De novo OTUs” value, which is computed as the difference between the “Total predicted OTUs” and the “OTUs based on database” values.
  • Fixed a bug that occurred in the rare cases where identical subsequences (contigs) with different taxonomies were found in a database for the Taxonomic Profiling tool. The taxonomy of the identical contigs is now set to the lowest common ancestor.
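
As an illustration of the lowest common ancestor rule mentioned in the last fix above, the following Python sketch reduces several lineages to the ranks they share. The semicolon-separated lineage format and the function name `lowest_common_ancestor` are assumptions made for this example; they are not taken from the module itself.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of ranks across all lineages."""
    if not lineages:
        return ""
    # Split each lineage into its ranks (assumed to be ';'-separated).
    split = [[rank.strip() for rank in lineage.split(";")] for lineage in lineages]
    common = []
    # zip stops at the shortest lineage, so no rank is read out of range.
    for ranks in zip(*split):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return "; ".join(common)


# Two identical contigs annotated with different (hypothetical) taxonomies.
taxonomies = [
    "Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Escherichia",
    "Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacterales; Enterobacteriaceae; Shigella",
]
print(lowest_common_ancestor(taxonomies))
```

With these two lineages the shared taxonomy ends at the family level, since the genera differ.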

CLC Microbial Genomics Module 3.0.1

Released on May 15, 2018

Improvements

Bug fixes

  • Fixed an issue in the OTU Clustering tool that would cause a paired read that had been merged to be filtered out if one of the members of the pair contained sequencing errors.
  • Fixed an issue where domain annotations added by the Annotate CDS with Pfam Domain tool started one amino acid later than expected.
  • Fixed an issue where the nodes in a K-mer tree referred to individual sequences instead of assemblies. This caused problems if bacteria with more than one chromosome were included for analysis.
  • Fixed a bug in the Differential Abundance Analysis tool where the most recent value of the “Metadata factor” parameter was not retained when configuring the tool in a workflow.

CLC Microbial Genomics Module 3.0

Released on November 21, 2017

New features

  • The Create SNP Tree tool can now output a new SNP Matrix that contains a pairwise comparison of SNP differences between any pair of all samples included in the analysis.
    • The matrix supports coloring of individual table cells for easy identification of related strains.
    • It is possible to highlight samples with fewer SNP differences than an adjustable threshold.
  • A new Multi-VCF format in the Export menu makes it possible to export multiple samples’ variant tracks into one VCF file, provided that they have the same reference genome.
  • A new option in the Data section of Abundance Table Settings side panel allows for hiding entries with incomplete taxonomy for the taxonomic level chosen to aggregate the data.
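
The pairwise comparison behind a SNP matrix, and the threshold-based highlighting described above, can be sketched in Python as follows. This is a minimal illustration only: representing each sample as a string of bases at shared SNP positions, and the names `snp_matrix` and `related_pairs`, are assumptions for the example rather than a description of how the Create SNP Tree tool works internally.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count positions where two equal-length SNP strings differ."""
    if len(a) != len(b):
        raise ValueError("SNP strings must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def snp_matrix(samples):
    """Build a symmetric matrix of pairwise SNP differences."""
    names = list(samples)
    matrix = {name: {other: 0 for other in names} for name in names}
    for first, second in combinations(names, 2):
        d = snp_distance(samples[first], samples[second])
        matrix[first][second] = d
        matrix[second][first] = d
    return matrix


def related_pairs(matrix, threshold):
    """List sample pairs with fewer SNP differences than the threshold."""
    return [(a, b) for a, b in combinations(list(matrix), 2) if matrix[a][b] < threshold]


# Hypothetical samples: bases observed at eight shared SNP positions.
samples = {
    "isolate_A": "ACGTACGT",
    "isolate_B": "ACGTACGA",
    "isolate_C": "TCGAACTA",
}
matrix = snp_matrix(samples)
print(matrix["isolate_A"]["isolate_B"])
print(related_pairs(matrix, threshold=2))
```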

Changes

  • Updated the Alpha Diversity tool to handle a lower detection limit per feature in an abundance table.
  • The optional output of a Distance Matrix from the Beta Diversity tool is changed from being a simple table object to now being a SNP Matrix object.

Improvements

  • The Taxonomic Profiling tool has been improved, allowing higher detection rates at an equivalent level of false positives.
  • The Taxonomic Profiling tool has two new user-configurable options: the minimum number of reads and the minimum coverage required for reads to be assigned.
  • The Differential Abundance Analysis tool has been updated such that:
    • It has an extra option for the comparison of all groups against one specific group within a metadata factor.
    • It can perform an ANOVA-like comparison.
  • The Create SNP Tree tool now also supports construction of Maximum Likelihood phylogenies:
    • Users can choose whether to run a Neighbor-Joining algorithm or a Maximum Likelihood algorithm.
    • Users can optionally output an alignment of the concatenated SNPs that are used in the construction of the SNP tree.
  • Trees produced with the Create SNP Tree and Create K-mer Tree tools are now multifurcating.

Bug fixes

  • Fixed a bug that caused bacterial assemblies of type “acidobacteria” and viral assemblies of type “dsDNA viruses, no RNA stage” to not be shown by the Create Microbial Reference Database tool.
  • Fixed a bug causing the annotation columns “Assembly ID” and “FTP Path” to disappear in sequence lists downloaded with the Create Microbial Reference Database tool.
  • Updated the manual to be more specific about downloading viruses from NCBI with the Create Microbial Reference Database tool.
  • Fixed a bug that caused the Create Microbial Reference Database tool to not download taxonomies for all entries in some cases.
  • Fixed a bug caused by NCBI renaming a column in one of their files and leading the Download Pathogen Reference Database tool to fail.
  • Renamed the “Set of species” option in Download Pathogen Reference Database to “By Kingdom/Domain”.
  • Fixed a bug in the OTU Clustering tool causing the Merge Paired Reads Report to not be output when the input contains both merged and non-merged sequence lists.
  • Fixed a bug in Align OTUs with MUSCLE that would cause the tool to incorrectly select the most abundant OTUs in some cases.
  • The Differential Abundance Analysis now accepts metadata groups with only one replicate.
  • Added a popup menu for selecting and deselecting all samples in Stack and Sunburst visualizations of abundance tables.
  • Upgraded the Neighbor Joining algorithm in the Create SNP Tree tool to use less memory.
  • Updated the Create SNP Tree and Create K-mer Tree tools so that trees with negative branch length are not allowed.
  • Fixed an issue with the Biom importer when run through the Cosmos ID plugin.
  • Updated manual with special system requirements.

CLC Microbial Genomics Module 2.5.5

Released on October 10, 2018

Bug fixes

CLC Microbial Genomics Module 2.5.4

Released on June 28, 2018

Improvements

  • In OTU abundance tables, headers of the columns displaying abundances for each sample now include the sample name for clarity.
  • OTU abundance tables now have a Create Sequence List from Selection button that will create a sequence list from selected rows.

Bug fixes

  • Fixed a bug that caused the ID column to display incorrect data on aggregated Abundance Tables.
  • Fixed an issue that would make the OTU Clustering tool stall frequently or fail when running with the “Fuzzy match duplicates” option enabled.
  • Fixed an issue that would affect the OTU Clustering report when run with the option “Allow creation of new OTUs” disabled: “Total predicted OTUs” and “De novo OTUs” are now showing correct values. More specifically, the “Total predicted OTUs” would erroneously include some OTUs to which no input read was assigned. This would in turn cause an overestimation of the “De novo OTUs” value, which is computed as the difference between the “Total predicted OTUs” and the “OTUs based on database” values.
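
The relationship between the three report values in the last fix above can be shown with a small Python sketch. The OTU identifiers and read counts are invented for illustration, and `summarize_otus` is a hypothetical helper, not part of the module.

```python
def summarize_otus(read_counts, database_otus):
    """Compute the three report values from per-OTU read counts.

    read_counts:   mapping of OTU id -> number of reads assigned to it
    database_otus: set of OTU ids that originate from the reference database
    """
    # Only OTUs with at least one assigned read count as predicted.
    populated = {otu for otu, reads in read_counts.items() if reads > 0}
    total_predicted = len(populated)
    based_on_database = len(populated & database_otus)
    return {
        "Total predicted OTUs": total_predicted,
        "OTUs based on database": based_on_database,
        # De novo OTUs is the difference between the two values above.
        "De novo OTUs": total_predicted - based_on_database,
    }


# Illustrative data: ref_3 received no reads and must not be counted.
read_counts = {"ref_1": 120, "ref_2": 45, "ref_3": 0}
database_otus = {"ref_1", "ref_2", "ref_3"}
report = summarize_otus(read_counts, database_otus)
print(report)
```

If the empty `ref_3` were included in the total but not in the database-based count, the difference would report one de novo OTU even though none exists, which is the kind of overestimation the fix describes.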

CLC Microbial Genomics Module 2.5.3

Released on May 15, 2018

Improvements

Bug fixes

  • Fixed an issue in the OTU Clustering tool that would cause a paired read that had been merged to be filtered out if one of the members of the pair contained sequencing errors.
  • Fixed an issue where domain annotations added by the Annotate CDS with Pfam Domain tool started one amino acid later than expected.
  • Fixed an issue where the nodes in a K-mer tree referred to individual sequences instead of assemblies. This caused problems if bacteria with more than one chromosome were included for analysis.
  • Fixed a bug in the Differential Abundance Analysis tool where the most recent value of the “Metadata factor” parameter was not retained when configuring the tool in a workflow.

CLC Microbial Genomics Module 2.5.2

Released on December 5, 2017

Bug fixes

  • Fixed a bug that caused bacterial assemblies of type “acidobacteria” and viral assemblies of type “dsDNA viruses, no RNA stage” to not be shown by the Create Microbial Reference Database tool.
  • Fixed a bug causing the annotation columns “Assembly ID” and “FTP Path” to disappear in sequence lists downloaded with the Create Microbial Reference Database tool.
  • Updated the manual to be more specific about downloading viruses from NCBI with the Create Microbial Reference Database tool.
  • Fixed a bug that caused the Create Microbial Reference Database tool to not download taxonomies for all entries in some cases.
  • Fixed a bug caused by NCBI renaming a column in one of their files and leading the Download Pathogen Reference Database tool to fail.
  • Fixed a bug in Align OTUs with MUSCLE that would cause the tool to incorrectly select the most abundant OTUs in some cases.
  • Fixed an issue with the Biom importer when run through the Cosmos ID plugin.
  • Updated manual with special system requirements.

CLC Microbial Genomics Module 2.5.1

Released on September 11, 2017

Bug fixes

  • Fixed an issue in the Create Microbial Reference Database tool that led to incorrect taxonomies being assigned when “Viruses” was selected in the “Select NCBI sources” section of the wizard.
  • Fixed a bug that caused the OTU clustering tool to fail in rare cases.

CLC Microbial Genomics Module 2.5

Released on August 16, 2017

New features

  • New import and export feature of abundance tables in the biological observation matrix (biom) file format. This allows users to share and use their data with analysis tools from CosmosID, or to visualize an abundance table from CosmosID using the MGM tools:
    • The new importer supports version 1.0 and 2.1 of the biom file format.
    • The new exporter supports version 2.1 of the biom file format.
  • The manual section about the Taxonomic Profiling tool has been updated to reflect the current intended use of the tool.
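
For readers unfamiliar with the biom format, version 1.0 is JSON-based, while version 2.1 is stored as HDF5. The Python sketch below reads a small hand-written version 1.0 document into a dense table. The sample content and the function name `read_biom_v1` are illustrative assumptions; the sketch does not reflect how the module's importer is implemented.

```python
import json

# A miniature, hand-written biom 1.0 document (illustrative values only).
BIOM_TEXT = """
{
  "id": null,
  "format": "Biological Observation Matrix 1.0.0",
  "format_url": "http://biom-format.org",
  "type": "OTU table",
  "generated_by": "example",
  "date": "2017-08-16T00:00:00",
  "rows": [
    {"id": "OTU_1", "metadata": null},
    {"id": "OTU_2", "metadata": null}
  ],
  "columns": [
    {"id": "Sample_A", "metadata": null},
    {"id": "Sample_B", "metadata": null}
  ],
  "matrix_type": "sparse",
  "matrix_element_type": "int",
  "shape": [2, 2],
  "data": [[0, 0, 5], [0, 1, 1], [1, 1, 7]]
}
"""


def read_biom_v1(text):
    """Return (row ids, column ids, dense table) from biom 1.0 JSON text."""
    doc = json.loads(text)
    row_ids = [row["id"] for row in doc["rows"]]
    col_ids = [col["id"] for col in doc["columns"]]
    n_rows, n_cols = doc["shape"]
    if doc["matrix_type"] == "sparse":
        # Sparse data is a list of [row index, column index, value] triples.
        table = [[0] * n_cols for _ in range(n_rows)]
        for r, c, value in doc["data"]:
            table[r][c] = value
    else:
        # Dense data is already a list of rows.
        table = [list(row) for row in doc["data"]]
    return row_ids, col_ids, table


row_ids, col_ids, table = read_biom_v1(BIOM_TEXT)
for otu, counts in zip(row_ids, table):
    print(otu, dict(zip(col_ids, counts)))
```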

Changes

  • The tools Optional Merge Paired Reads and Fixed Length Trimming have been moved to the Legacy Tools folder of the toolbox as they are no longer needed for the OTU Clustering tool. They will be completely removed in a future release of the software.
  • The Optional Merge Paired Reads and Fixed Length Trimming steps have been removed from the Data QC and OTU Clustering workflow because the OTU Clustering tool can now merge paired reads and does not require fixed-length sequences as input.
  • The Taxonomic Profiling tool now allows the user to optionally “Estimate paired end distances” as a pre-processing step, and its performance has been improved.

Improvements

  • The OTU Clustering tool can now also handle fungal Internal Transcribed Spacer (ITS) amplicon sequences:
    • The algorithm has been improved to handle variable length data like fungal ITS sequences, which makes the Fixed Length Trimming tool redundant.
    • The OTU Clustering tool now handles OTUs with reads mapping in both forward and backward orientation for taxonomic assignment. This kind of mixed orientation data now also works with the “Allow creation of new OTUs” option enabled.
    • After loading the read sequences, the tool now attempts to merge any overlapping paired-end reads, thus making the Optional Merge Paired Reads tool redundant. The parameters for the alignment of reads are now part of the “OTU Clustering” wizard. OTU clustering is performed on all reads, i.e., both reads that are merged and reads that could not be merged.
    • The tool can process both paired-end and single-end data files at the same time.
  • The Taxonomic Profiling reference database index management has been improved, in that it includes messages/warnings in the wizard about indexing, and generates a new CLC folder called “CLC_MgmReferenceCache” designated for the storage of index files.
  • The Download Database for Find Resistance tool has been updated to point to the newest version of the database.

Bug fixes

  • Fixed a bug that caused the “Create Abundance Table from Selection” button to fail due to duplicated names while aggregating on taxonomy.
  • Fixed a bug that caused the Data QC and Clean Host DNA, Data QC and Taxonomic Profiling, Type a Known Species, and Type Among Multiple Species workflows to not run on CLC Genomics Server without the Biomedical extension enabled.
  • Fixed a bug that caused Add Metadata to Abundance Table to throw a NullPointerException when opening Excel files with empty cells.
  • Fixed a bug that caused the Create SNP Tree tool to fail when analyzing read mappings whose genomes are comparable but have chromosomes in a different order.
  • Fixed a bug that caused the Find Resistance tool to not report all BLAST hits when the gene database contains more than 250 genes.
  • Fixed a bug causing Stacked Charts to throw an out of bounds exception when changing from “Bar Chart” to “Area Chart”.
  • Fixed a bug that made the Create Microbial Reference Database tool crash when filtering, sorting and aggregating a selection table.
  • Fixed a bug causing the “File with accession number” option in the Create Microbial Reference Database tool to have no effect.
  • Minor bug fixes

CLC Microbial Genomics Module 2.0

Released on March 2, 2017

New features

  • New tool for Taxonomic Profiling of whole metagenome shotgun sequencing datasets.
    • All existing visualizations (stacked bar charts, stacked area charts, sunburst charts and heat maps) have been updated to work with the output from this tool.
    • All existing abundance analysis tools (Alpha Diversity, Beta Diversity, PERMANOVA Analysis and Differential Abundance Analysis) have been updated to work with the output from this tool.
  • Three new workflows for host DNA removal, taxonomic profiling and downstream analysis of whole metagenome shotgun sequencing datasets:
  • New tool for easily creating custom microbial reference genome databases for use in taxonomic profiling and microbial isolate typing: Create Microbial Reference Database.

Changes

  • The plugin Toolbox has been largely restructured in order to make it more intuitive to navigate. Microbiome analysis tools are now categorized into four folders: Amplicon-based OTU Clustering, Taxonomic Analysis, Functional Analysis, and Abundance Analysis. All database management tools have been collected in the top-level folder Databases.
  • The two tools Download Bacterial Genomes from NCBI and Download Pathogen Reference Databases have been merged into one tool called Download Pathogen Reference Database.
  • Three tools have been renamed:

Improvements

  • The speed of searches for data elements with associations to specified metadata, from within a Result Metadata Table, has been greatly improved. To enable metadata related searches to work after upgrading to the Microbial Genomics Module 2.0, indices for the locations containing the relevant data will need to be rebuilt.
  • The OTU Clustering tool now handles OTUs with reads mapping in both the forward and backward orientation for taxonomic assignment. Note that this kind of data should not be used with the “Allow creation of new OTUs” option, as the orientation of the new OTUs will not be inferred consistently.
  • When aggregating an abundance table, for example by class, a new column called “Class (Aggregated)” containing the class names is created. This name will be used in subsequent analysis outputs to avoid very long feature names in abundance tables and downstream analysis tools, e.g., heat maps.
  • The Set Up Microbial Reference Database tool now has an option to update the latin name of each sequence in a given sequence list with the content of the source annotation of the sequence.
  • The Set Up Microbial Reference Database tool now also recognizes “Latin name” as a special metadata column name, making it easier to set up custom databases with meaningful sequence names.
  • The Download Pathogen Reference Database tool now corrects corrupt latin names of sequences by replacing them with the content of the source annotation in the downloaded genbank files.
  • Axes in PCoA plots output from the Beta Diversity tool can now be replaced by metadata columns in order to make clustering correlated with specific metadata more visible.
  • The Differential Abundance Analysis tool now checks the input metadata and displays a warning directly in the wizard if singularities or linear dependencies are found.
  • Added a new column to the result metadata table, “Best match, average coverage”, which will help identify samples that have been sequenced with insufficient depth.
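
The aggregation behavior described in the improvements above, where aggregating by class yields a “Class (Aggregated)” column, can be sketched in Python as follows. The row layout and the function name `aggregate_by_rank` are assumptions made for this example, and the abundance values are invented.

```python
from collections import defaultdict


def aggregate_by_rank(rows, rank):
    """Sum per-sample abundances over all features sharing the same rank value."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = row["taxonomy"].get(rank, "Unknown")
        for sample, count in row["abundance"].items():
            totals[key][sample] += count
    # The short aggregated name replaces the full taxonomy as the feature label.
    column = f"{rank} (Aggregated)"
    return [{column: name, **dict(samples)} for name, samples in sorted(totals.items())]


# Illustrative abundance table with two samples, S1 and S2.
rows = [
    {"taxonomy": {"Class": "Bacilli", "Genus": "Listeria"}, "abundance": {"S1": 10, "S2": 0}},
    {"taxonomy": {"Class": "Bacilli", "Genus": "Staphylococcus"}, "abundance": {"S1": 4, "S2": 6}},
    {"taxonomy": {"Class": "Gammaproteobacteria", "Genus": "Escherichia"}, "abundance": {"S1": 1, "S2": 9}},
]
aggregated = aggregate_by_rank(rows, "Class")
for line in aggregated:
    print(line)
```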

Bug fixes

  • Fixed a bug in abundance tables that caused read names to be appended to the aggregated taxonomy in rare cases when aggregating on higher phylogeny levels.

CLC Microbial Genomics Module 1.6.2

Released on March 6, 2017

Improvements

  • The OTU Clustering tool now handles OTUs with reads mapping in both the forward and backward orientation for taxonomic assignment. Note that this kind of data should not be used with the “Allow creation of new OTUs” option, as the orientation of the new OTUs will not be inferred consistently.

Bug fixes

  • Fixed a serious bug that made all downloads on Windows machines with the Download Bacterial Genomes from NCBI and Download Pathogen Reference Databases tools fail.
  • Fixed a bug in the Download MLST Schemes (PubMLST) tool that caused an error when starting the tool. This error emerged after PubMLST migrated to a new server.
  • Fixed a bug in the De Novo Assemble Metagenome tool that caused some contigs to be duplicated exactly.
  • Fixed a bug in the Alpha Diversity tool where a numerical overflow sometimes caused a miscalculation when using Simpson’s diversity index.
  • Fixed a bug that caused the Annotate CDS with Pfam Domains tool to not give an output when the input only had one CDS annotation.
  • Fixed a bug that caused some MLST schemes to throw an error when shown in a table view.
  • Fixed a bug that sometimes caused sunburst charts to hide high-abundance features in the ‘Other’ category. Sunburst charts now display the 100 most abundant features and group all other features into ‘Other’.
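
The grouping rule in the last item (keep the most abundant features, pool the rest into ‘Other’) can be sketched in Python as follows. This is a minimal illustration, not the module's implementation; the function name, the dictionary input and the example values are hypothetical.

```python
def group_top_features(abundances, top_n=100, other_label="Other"):
    """Keep the `top_n` most abundant features and pool the rest into one entry."""
    # Sort features from most to least abundant.
    ranked = sorted(abundances.items(), key=lambda item: item[1], reverse=True)
    grouped = dict(ranked[:top_n])
    # Everything below the cut-off is summed into a single category.
    remainder = sum(count for _, count in ranked[top_n:])
    if remainder > 0:
        grouped[other_label] = grouped.get(other_label, 0) + remainder
    return grouped


# Made-up abundances; top_n is lowered to 3 to keep the example small.
example = {"A": 50, "B": 30, "C": 15, "D": 4, "E": 1}
top3 = group_top_features(example, top_n=3)
print(top3)  # {'A': 50, 'B': 30, 'C': 15, 'Other': 5}
```

Ranking by abundance before cutting means a high-abundance feature can never end up in the pooled category, which is the behavior the fix describes.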

CLC Microbial Genomics Module 1.6

Released on September 15, 2016

Updated for compatibility with CLC Genomics Workbench 9.5, Biomedical Genomics Workbench 3.5 and CLC Genomics Server 8.5.

CLC Microbial Genomics Module 1.5.1

Released on August 30, 2016

Bug fixes

  • Fixed a bug that caused the tool Find Best Matches using K-mer Spectra to fail in some cases when run against a single reference genome.
  • Fixed a bug that prevented users from saving the view settings of rarefaction plots.

CLC Microbial Genomics Module 1.5

Released on July 12, 2016

New features

  • With the new tool Download Pathogen Reference Databases, users can now easily download prebuilt reference databases for typing of the following pathogens:
    • Salmonella enterica
    • Listeria monocytogenes
    • Escherichia coli and Shigella
    • Campylobacter jejuni
    • Acinetobacter baumannii
    • Klebsiella pneumoniae
  • Custom reference databases for typing microbial isolates can be set up using the new tool Set Up Pathogen Reference Database.
  • Annotating references in existing reference databases with metadata is also enabled by the new tool Set Up Pathogen Reference Database.
  • Custom gene databases for antimicrobial resistance typing can be set up using the new tool Set Up Resistance Gene Database.
  • Functionality to check microbial isolate samples for contamination and low quality has been added to the tool Find Best Matches Using K-mer Spectra.
  • Statistical differential abundance analysis of taxonomic and functional entities across samples or groups of samples is enabled by the new tool Differential Abundance Analysis.
  • Hierarchical clustering of both samples and features in abundance tables produced by OTU clustering or whole metagenome functional analysis is enabled by the new tool Create Heat Map for Abundance Table.

Improvements

  • Taxonomic assignment to microbial isolate samples in databases downloaded by the Download and Set Up Pathogen Reference Database tools is now done to the species level, and not just genus level as it was previously.
  • The Create K-mer Tree tool now includes a default K-mer tree layout that makes it easier to identify a suitable common reference in the tree.
  • The Create SNP Tree tool now includes a default SNP tree layout that visualizes useful analysis results and serves as a good starting point to find your own favorite layout.
  • The Create K-mer Tree and Create SNP Tree tools now accept input samples that are associated with multiple metadata tables when a Result Metadata Table is also supplied.
  • The Find Best Matches using K-mer Spectra tool has been changed to use the Z-score rather than the number of matching k-mers to select best matches in order to remove a bias towards larger genomes.
  • The Find Best Matches using K-mer Spectra tool has been changed to use both the forward and reverse strand of the supplied references to enable a more accurate best-match detection.
  • In Stacked Bar Charts and Area Charts visualizations of abundance tables,
    • samples can now be sorted according to their names or according to associated metadata.
    • features (taxonomic or functional entities) can now be sorted according to their abundance or name.
    • the “Other” feature category can now be hidden in both the plot and in the legend of the plot.
    • samples and groups of samples can now be renamed by clicking their names in the side panel.
  • In PCoA plots, samples and groups of samples can now be renamed by clicking their names in the side panel.
  • In Alpha diversity plots, the look of each line (representing a sample) can now be configured based on the associated metadata.
  • Alpha diversity plots now include a legend that can be set up based on the available metadata.
  • In resistance gene databases, the metadata associated with each gene can now be viewed and edited in the table view.
  • When a SNP tree is built based on input with no SNPs detected between three or more samples, a warning is now issued.

Bug fixes

Changes

  • The Type A Single Species workflow has been renamed to Type a Known Species.
  • The Re-map Samples to Specified Reference workflow has been renamed to Map to Specified Reference.
  • The Type Among Multiple Species and Type a Known Species workflows now check for low quality and contamination by default.
  • The Type Among Multiple Species and Type a Known Species workflows now output the best matching reference in the supplied reference database, not just the best matching reference in the database with an associated MLST type.
  • All ready-to-use workflows have been moved to dedicated workflow folders in the Microbial Genomics Module folder in the toolbox.
  • The Alpha Diversity tool now outputs a plot for each selected distance measure, not a single report containing all plots.

Retired tools

CLC Microbial Genomics Module 1.4

Released on July 12, 2016

New features

Improvements

  • Taxonomic assignment to microbial isolate samples in databases downloaded by the Download and Set Up Pathogen Reference Database tools is now done to the species level, and not just genus level as it was previously.
  • The Create K-mer Tree tool now includes a default K-mer tree layout that makes it easier to identify a suitable common reference in the tree.
  • The Create SNP Tree tool now includes a default SNP tree layout that visualizes useful analysis results and serves as a good starting point to find your own favorite layout.
  • The Create K-mer Tree and Create SNP Tree tools now accept input samples that are associated with multiple metadata tables when a Result Metadata Table is also supplied.
  • The Find Best Matches using K-mer Spectra tool has been changed to use the Z-score rather than the number of matching k-mers to select best matches in order to remove a bias towards larger genomes.
  • The Find Best Matches using K-mer Spectra tool has been changed to use both the forward and reverse strand of the supplied references to enable a more accurate best-match detection.
  • In Stacked Bar Charts and Area Charts visualizations of abundance tables,
    • samples can now be sorted according to their names or according to associated metadata.
    • features (taxonomic or functional entities) can now be sorted according to their abundance or name.
    • the “Other” feature category can now be hidden in both the plot and in the legend of the plot.
    • samples and groups of samples can now be renamed by clicking their names in the side panel.
  • In PCoA plots, samples and groups of samples can now be renamed by clicking their names in the side panel.
  • In Alpha diversity plots, the look of each line (representing a sample) can now be configured based on the associated metadata.
  • Alpha diversity plots now include a legend.
  • In resistance gene databases, the metadata associated with each gene can now be viewed and edited in the table view.
  • When a SNP tree is built based on input with no SNPs detected between three or more samples, a warning is now issued.

Bug fixes

Changes

  • The Type A Single Species workflow has been renamed to Type a Known Species.
  • The Re-map Samples to Specified Reference workflow has been renamed to Map to Specified Reference.
  • The Type Among Multiple Species and Type a Known Species workflows now check for low quality and contamination by default.
  • The Type Among Multiple Species and Type a Known Species workflows now output the best matching reference in the supplied reference database, not just the best matching reference in the database with an associated MLST type.
  • All ready-to-use workflows have been moved to dedicated workflow folders in the Microbial Genomics Module folder in the toolbox.
  • The Alpha Diversity tool now outputs a plot for each selected distance measure, not a single report containing all plots.

Retired tools

CLC Microbial Genomics Module 1.3.1

Released on May 10, 2016

Bug fixes

  • Fixed a bug that caused result metadata tables to not be properly saved when they were updated as part of running a workflow.
  • Adapted the “Download Bacterial Genomes from NCBI” tool to a new format in a file downloaded from NCBI.

Improvements

  • Rewrote a misleading error message that appeared when the Download OTU Reference Database tool was not able to contact the online QIAGEN resources.
  • Added GPU requirements to the System Requirements for viewing PCoA 3D plots.

CLC Microbial Genomics Module 1.3

Released on March 31, 2016

Bug fixes

  • Fixed a bug in the De Novo Assemble Metagenome tool that caused excessive memory usage when using multiple input files.

Improvements

  • Improved FeatureIDs in experiments generated using the “Convert Abundance Table to Experiment” tool.
  • The name of the annotation column in experiments generated using the “Convert Abundance Table to Experiment” tool now depends on the type of the abundance table.
  • Improved error messages and warnings in the wizard for the Build Functional Profile tool.

CLC Microbial Genomics Module 1.2.2

Released on May 10, 2016

Bug fixes

  • Added a report output to the Add to Result Metadata Table tool. Please make sure to add this output to all workflows you run on a CLC Genomics Server setup to make them run through without errors.
  • Fixed a bug that caused result metadata tables to not be properly saved when they were updated as part of running a workflow.
  • Adapted the “Download Bacterial Genomes from NCBI” tool to a new format in a file downloaded from NCBI.

Improvements

  • Rewrote a misleading error message that appeared when the Download OTU Reference Database tool was not able to contact the online QIAGEN resources.
  • Added GPU requirements to the System Requirements for viewing PCoA 3D plots.

CLC Microbial Genomics Module 1.2.1

Released on March 31, 2016

Bug fixes

  • Fixed a bug in the De Novo Assemble Metagenome tool that caused excessive memory usage when using multiple input files.

Improvements

  • Improved FeatureIDs in experiments generated using the “Convert Abundance Table to Experiment” tool.
  • The name of the annotation column in experiments generated using the “Convert Abundance Table to Experiment” tool now depends on the type of the abundance table.

CLC Microbial Genomics Module 1.2

Released on February 29, 2016

New features

  • Functional profiling of whole metagenome datasets based on Pfam domains, GO terms and BLAST hits
  • Whole metagenome de novo assembler
  • Annotation of CDS with Pfam domains and GO terms
  • Annotation of CDS with Best BLAST hits using predefined or custom databases

Improvements

  • Swapped the Trim Sequences tool and the Optional Merge Paired Reads tool in the Data QC and OTU Clustering ready-to-use workflow in order to merge more identical amplicon reads. This may change the results of some analyses.
  • Improved the tolerance of the Download Bacteria Genomes from NCBI tool towards unstable FTP connections with NCBI.
  • Enabled graphical export of Bar Chart, Area Chart, Sunburst Chart and PCoA Chart visualizations of abundance tables.
  • Added legends to Bar Chart and Area Chart visualizations of abundance tables.
  • Improved the speed and compute resource requirements of the OTU Clustering tool.
  • The OTU Clustering tool now reverse-complements reference OTUs when most reads map in the reverse strand.
  • Improved the length of the trimmed reads output by the Fixed Length Trimming tool on datasets with a large read length standard deviation.
  • The OTU Clustering tool now produces a summary report that can be used to evaluate the quality of the input data and the OTU clustering.
  • The Optional Merge Paired Reads tool now produces a summary report.
  • The Fixed Length Trimming tool now produces a summary report.
  • Activated links to the manual from ready-to-use workflow wizards.
  • Updated the UNITE database that is downloaded by the Download OTU Reference Database tool to the latest version.
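
The reverse-complement behavior of the OTU Clustering tool mentioned in this list can be illustrated with a short Python sketch. It is a simplified picture under stated assumptions: the function names are hypothetical, and "most reads" is interpreted here as strictly more reverse-strand than forward-strand reads, which may not match the tool's actual criterion.

```python
# Translation table for DNA bases; N is kept as N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_reference(seq, forward_reads, reverse_reads):
    """Flip the reference if strictly more reads map to the reverse strand (assumed rule)."""
    if reverse_reads > forward_reads:
        return reverse_complement(seq)
    return seq


# Made-up reference and read counts.
oriented = orient_reference("ATGCCN", forward_reads=3, reverse_reads=12)
print(oriented)  # NGGCAT
```

Flipping the reference rather than every read keeps the reads untouched while still giving a consistent orientation for the resulting OTU sequence.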

Bug fixes

  • Adapted the Download Bacteria Genomes from NCBI tool to a new structure of the NCBI FTP site.
  • Fixed a bug in the Fixed Length Trimming tool that caused a wrong automatic length calculation when run on inputs with a very large number of reads.
  • Fixed a bug in the Fixed Length Trimming tool, the Optional Merge Paired Reads tool and the Filter Samples Based on Number of Reads tool that caused the history entries of output from these tools to be inconsistent.

Changes

  • Placed all tools in the Microbial Genomics Module into a single folder in the toolbox with subfolders ‘OTU Clustering’, ‘Typing and Epidemiology’, ‘Whole Metagenome Analysis’ and ‘General Tools’.

CLC Microbial Genomics Module 1.1

Released on October 15, 2015

New features

  • Determination of MLST for NGS samples
  • Identification of antimicrobial resistance genes
  • Construction of SNP trees from NGS reads
  • SNP tree variants differentiating between two sub-trees can be displayed easily
  • Construction of K-mer trees from genomes and NGS samples
  • Access sample metadata and analysis results in a table
  • Metadata is automatically transferred to SNP trees and K-mer trees
  • Three template workflows provided for routine typing

Improvements

  • Added help buttons in all editors
  • The Format Reference Database tool was improved to handle malformed input better
  • Improved parameter descriptions and mouse-over texts in several places

Bug fixes

  • Fixed a bug preventing usage of metadata with only 2 values in the Permanova and Convert to Experiment wizards
  • Fixed a bug that caused all CSV files imported to the workbench to be imported as OTU abundance tables
  • The Chimera crossover cost parameter in OTU clustering now only takes integer values
  • Added a check to prevent the user from running “Reference based OTU clustering” without an “OTU database”

Changes

  • The Estimate Alpha and Beta Diversities workflow no longer outputs an alignment as it was not of any use for the user.