CLC Microbial Genomics Module latest improvements
CLC Microbial Genomics Module 22.1
Released on June 15, 2022
New features
A new set of tools for the prediction of biochemical pathways from functional abundance tables and differential abundance tables is now available. The set comprises:
- Download Pathway Database to download the MetaCyc pathway database.
- Identify Pathways which takes an abundance table or a differential abundance table with EC terms as input and predicts the presence of biochemical pathways in the sample or the up/down regulation of pathways between groups of samples.
The pathway database and the pathway calls can be visualized as simple pathway graphs to put the EC terms into the context of their biochemical compounds. In a pathway calling result, the widths and hues of EC terms can be adjusted to display the abundance or fold changes in the abundance, p-values and max group means.
Improvements, Changes and Bug fixes
Annotate with BLAST and Annotate with DIAMOND
- A new option “Adjust CDS to open reading frame” is now available. When enabled, a local alignment will be extended or reduced such that the region starts with a start codon, ends with a stop codon and does not contain any internal stop codons, while matching the length of the reference sequence within ten percent.
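The conditions listed for “Adjust CDS to open reading frame” can be illustrated with a minimal Python sketch. It is not the module's implementation: the function name, the set of start codons, and the choice to measure the reference length in nucleotides are all assumptions made for illustration.

```python
# Illustrative sketch only; names and codon sets are assumptions, not the tool's code.
START_CODONS = {"ATG", "GTG", "TTG"}  # assumed: common prokaryotic start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code stop codons


def is_acceptable_orf(region: str, reference_length: int, tolerance: float = 0.10) -> bool:
    """Check the conditions described above for a candidate CDS region.

    `reference_length` is assumed to be given in nucleotides.
    """
    region = region.upper()
    # A complete ORF needs at least a start and a stop codon, in frame.
    if len(region) < 6 or len(region) % 3 != 0:
        return False
    codons = [region[i:i + 3] for i in range(0, len(region), 3)]
    if codons[0] not in START_CODONS or codons[-1] not in STOP_CODONS:
        return False
    # No stop codon is allowed between the first and the last codon.
    if any(codon in STOP_CODONS for codon in codons[1:-1]):
        return False
    # The region length must be within `tolerance` of the reference length.
    return abs(len(region) - reference_length) <= tolerance * reference_length
```

A region such as `ATGAAATAA` compared against a reference of 9 nucleotides passes all checks, while a region with an internal `TAA` does not.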
Download Custom Microbial Reference Database
- Removed selection by clicking. The “Include”/“Exclude” buttons must now always be used when building databases.
- Changed the protocol to connect with NCBI from ftp to https. This enables faster downloads of reference sequences.
- Fixed an issue where the parameters used for downloading the database were not added correctly to the downloaded sequence list.
MLST
- Enabled the download of the “Listeria spp. cgMLST1748” MLST scheme using the Download MLST Scheme tool.
- The Import MLST Scheme tool now writes more information to the log about headers and fields skipped during the import process.
Identify Viral Integration Sites
- Fixed an issue in the tool where a value of minus one for the number of detected breakpoints indicated that no breakpoints had been found. The number of detected breakpoints is now strictly larger than 0.
Various minor improvements
- Improved the stability for detecting the CPU type on Apple systems.
- Fixed an issue with the progress reporting of the Download Curated Microbial Reference Database tool.
- Improved error handling in PERMANOVA analysis when the tree and the sequences in the abundance table do not match, or when the distances become zero for other reasons.
- Fixed an issue where aggregating samples for EC abundance tables would fail with a null pointer exception.
CLC Microbial Genomics Module 22.0.1
Released on March 14, 2022
Bug fixes
- Fixed a bug causing the following tools to run slowly or fail on Apple M1: Annotate with DIAMOND, Annotate CDS with Best DIAMOND Hit, Create DIAMOND Index, Create MLST Scheme and Taxonomic Profiling.
CLC Microbial Genomics Module 22.0
Released on January 11, 2022
Updated to be compatible with CLC Genomics Workbench 22 and CLC Genomics Server 22.
Improvements
Bug fixes
Changes
- Removed the file input for Download Custom Microbial Reference Database to simplify the wizard. To work with the new interface, the content of the files can be copied and pasted into the respective fields in the wizard.
- Because the Large MLST toolset is generally applicable to 7-gene, core genome and whole genome MLST schemes, the word “Large” has been removed from all tools of the toolset. All names now just refer to MLST.
- All Workflows from the Microbial Genomics Module are now available in the Template Workflows section under Microbial Workflows.
CLC Microbial Genomics Module 21.1.1
Released on January 11, 2022
Improvements
Bug fixes
CLC Microbial Genomics Module 21.1
Released on June 28, 2021
New features
New features improving viral analysis capabilities
- Identify Viral Integration Sites makes it possible to detect viral integration events in host genomes based on hybrid capture data. The tool provides a circular, interactive and zoomable host/virus genome viewer that makes it possible to inspect the integration events in detail, featuring coverage, unaligned end coverage and broken pair coverage visualizations and CDS track overlays. Nearby genes are reported in a synchronized table.
- Find Best References using Read Mapping serves as a high-quality alternative to the Find Best Matches Using Kmer Spectra tool, specifically for small genomes where the kmer statistic can be imprecise.
- The Analyze Viral Hybrid Capture Panel Data workflow is available for the identification of the most prevalent viral species and its variants from a sample analyzed using hybrid capture technology.
New features improving amplicon based metagenomics capabilities
Other new features
- The Compare Variants Across Samples workflow is now available for analysing variants from a single species across many samples. The main results are a combined variant track for all samples and a SNP tree.
- Download Curated Microbial Reference Database is now available to download specific reference databases as sequence lists or taxonomic profiling index files. This tool replaces the Download Microbial Reference Database tool. Currently available databases are:
– QIAGEN Microbial Insights – Prokaryotic Taxonomy Database (QMI-PTDB)
– QIAGEN Microbial Insights – Prokaryotic Taxonomy Database (QMI-PTDB) optimized for 16GB of memory
– Unclustered RVDB (https://rvdb.dbi.udel.edu/)
– Clustered RVDB (https://rvdb.dbi.udel.edu/)
- Download Custom Microbial Reference Database is now available for the easy customization of microbial reference databases. This tool replaces the Download Microbial Reference Database tool and is more flexible, as it supports workflow execution and offers an improved interface to preselect references and taxonomic clades.
- Identify Large MLST Scheme from Genomes is available to select the best matching Large MLST scheme for a set of input genomes.
Improvements, Changes and Bug fixes
Workflows
- The tool now supports a subsequent iterate block on the resulting output elements.
- When using the partition number based option, a metadata column with the chunk id will be added to the optional metadata table describing the output objects.
- Fixed an issue that would cause the tool to fail when a selected column contained empty strings and “Create metadata table” has been selected.
- The tool now supports annotations using arbitrary columns for matching metadata to sequences.
- The tool is now able to handle larger annotation files.
- The name of the output of the tool is now the name of the first input element with a “(Metadata Annotated)” extension.
- The tool now ignores “Name” columns specified in metadata tables. If the names of the sequences must be changed the “Batch rename” tool should be used.
- “Taxonomy” columns from metadata tables are now parsed into the standard 7-step taxonomy, even if given in a QIIME format.
- The report now contains the taxonomy information also for taxonomies specified in metadata tables.
- When allowing the tool to overwrite existing data, it is now possible to disallow overwriting non-empty data with empty values. This is specifically useful when the data comes from metadata tables produced with the Split Sequence List tool, which contains empty fields for conflicting metadata.
- Extended parallelism for faster processing.
- Fixed an issue where more threads than available could be used by the tool.
- A new option “Ignore positions with deletions” has been added to ignore SNP sites where at least one sample contains a deletion.
- Fixed an issue with the tool when some samples contain deletions at the tested SNP positions. This would lead to an alignment object containing symbols which looked like gaps (“-”) but were invalid, leading to an error in the visualization of the tree in some cases.
This bug would also lead to slightly wrong base frequencies when creating Maximum Likelihood trees using the “Felsenstein 81”, “HKY”, or “General Time Reversible” models. The “Jukes Cantor” (which is default) and “Kimura 80” models assume equal base frequencies, and are thus not affected by this.
- Fixed an issue where using the Combine Reports tool on reports generated by the tools would not produce any output on the Genomics Server.
- Fixed an issue that caused erroneous combined reports being produced when combining already combined reports containing sections with Annotate With DIAMOND or Annotate with BLAST results.
- Fixed an issue when combining reports from the tools when the hit results were empty.
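The effect of the “Ignore positions with deletions” option mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, each sample is represented as a string with one character per tested SNP position, and `-` marks a deletion.

```python
from typing import Dict


def drop_positions_with_deletions(snps_by_sample: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Remove every SNP position at which at least one sample has a deletion.

    Hypothetical helper: each value holds one character per tested SNP
    position, and all values must have the same length.
    """
    sequences = list(snps_by_sample.values())
    lengths = {len(seq) for seq in sequences}
    if len(lengths) > 1:
        raise ValueError("all samples must cover the same SNP positions")
    n_positions = lengths.pop() if lengths else 0
    # Keep a position only if no sample carries the gap symbol there.
    keep = [i for i in range(n_positions) if all(seq[i] != gap for seq in sequences)]
    return {name: "".join(seq[i] for i in keep) for name, seq in snps_by_sample.items()}
```

With three samples `AC-T`, `GCGA` and `A-GT`, only the first and last positions survive, because each of the two middle positions contains a deletion in one sample.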
Other updates
- Create K-mer Tree now displays a warning when selecting one or more input sequence lists that contain neither Assembly ID annotations nor read groups.
- Taxonomic Profiling reports can now be combined with the Combine Reports and Create Sample Report tools.
- A bug has been fixed where aggregating the “Database builder” table on Species level in the Download Microbial Reference Database tool would result in an error. Note that the functionality has been replaced by Download Custom Microbial Reference Database.
- When exporting an abundance table as BIOM file, an option has been added to use the name column instead of the often cryptic ID column.
- Fixed an issue where OTU abundance tables could not be imported on the server.
- Fixed an issue that caused the Create Large MLST Scheme to fail with a cryptic error message when all genes have been discarded.
Functionality retirement
- All NGS MLST tools are now considered legacy tools which means they will be removed in a future release. The functionality of the NGS MLST tools has been replaced by the “Large MLST” toolset to be found in the toolbox under Typing and Epidemiology. Note that MLST Schemes compatible with the “Large MLST” set of tools can be obtained with the “Download Large MLST Scheme” and “Import Large MLST Scheme” tools. Both tools support classical 7-gene MLST schemes, core genome and whole genome MLST schemes. The Type a Known Species and Type Among Multiple Species workflows have been updated to work with the “Large MLST” toolset instead. If you are concerned about this change or if you need help migrating to the new set of tools please get in touch via ts-bioinformatics@qiagen.com.
- Download GO Database has been replaced by the more general Download Ontology Database tool which allows for downloading both GO and EC databases.
- Download Microbial Reference Database has been replaced by Download Custom Microbial Reference Database and Download Curated Microbial Reference Database for constructing custom reference databases and for downloading curated reference databases, respectively.
CLC Microbial Genomics Module 21.0
Released on January 12, 2021
New features
New features improving functional annotation and exploration capabilities
- Mask Low Complexity Regions can be used to mask or annotate regions of nucleotide or protein sequences with low complexity. When creating a taxonomic profiling index from such masked sequences, the number of false positive hits in a taxonomic profiling may be reduced.
- Create Annotated Sequence List is a general sequence annotation tool that can be used to set up many different databases (annotated sequence lists) used in the CLC Microbial Genomics Module. This tool can also be used in a workflow to create annotated sequence lists, with the ability to connect to many relevant downstream analysis tools. This is a more versatile solution than using a New | Sequence List element.
- Split Sequence List can be used to separate sequence lists based on metadata annotation values, e.g. a Microbial Reference Database could be split up based on the values in the Assembly ID column, or it could be split into a specified number of equally sized lists, where the order of sequences within these lists can be optionally randomized. This tool is able to produce a metadata table that contains columns and values from the sequences’ metadata annotations that are consistent with the specified splitting scheme.
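Splitting into a specified number of equally sized lists, as described for Split Sequence List, can be sketched as follows. The function name and parameters are hypothetical, and shuffling the sequences before splitting is only one possible reading of the optional randomization.

```python
import random
from typing import List, Optional, Sequence


def split_into_chunks(names: Sequence[str], n_chunks: int,
                      shuffle: bool = False, seed: Optional[int] = None) -> List[List[str]]:
    """Split `names` into `n_chunks` lists whose sizes differ by at most one."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    items = list(names)
    if shuffle:
        # Assumption: randomization is applied before the split.
        random.Random(seed).shuffle(items)
    base, extra = divmod(len(items), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

For seven sequences and three chunks this yields lists of sizes 3, 2 and 2.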
Improvements, Changes and Bug fixes
OTU Taxonomy annotations
The OTU Taxonomy field on sequence lists has been removed and replaced by the more general Taxonomy field. This changes the following behavior of OTU clustering results:
- In sunburst plots, OTUs defined at higher levels are now shown on their respective level rather than on the strain level.
- QIIME formatted taxonomies will be converted to standard 7-step taxonomies.
- The tool now has separate output channels for read mappings using unbinned contigs as references, and for read mappings using all contigs as references.
- Fixed an issue where contigs without mapped reads were not listed in the report.
- The tool now also accepts individual nucleotide sequences as input.
- The default search parameter for DIAMOND has been changed to More Sensitive to reflect the best practices for creating Large MLST Schemes.
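As an illustration of the conversion of QIIME formatted taxonomies to 7-step taxonomies mentioned above, the following Python sketch parses a QIIME-style string with rank prefixes (`k__`, `p__`, ..., `s__`) into seven fields. The function name and the representation of the result as a list of seven strings are assumptions; the module's internal representation is not described here.

```python
from typing import List

# Kingdom, Phylum, Class, Order, Family, Genus, Species
RANK_PREFIXES = ["k", "p", "c", "o", "f", "g", "s"]


def parse_qiime_taxonomy(text: str) -> List[str]:
    """Return the 7 ranks of a QIIME-style taxonomy string; missing ranks are empty."""
    ranks = {prefix: "" for prefix in RANK_PREFIXES}
    for field in text.split(";"):
        field = field.strip()
        # A field looks like "g__Streptococcus": one-letter rank, two underscores, name.
        if len(field) >= 3 and field[1:3] == "__" and field[0] in ranks:
            ranks[field[0]] = field[3:].strip()
    return [ranks[prefix] for prefix in RANK_PREFIXES]
```

Ranks that are absent or empty in the input, such as a trailing `s__`, are returned as empty strings so that the result always has seven entries.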
Other updates
- Fixed an issue with Extract Reads from Selection for abundance tables. Previously, the tool would not extract the correct reads if the table contained taxonomies specified at different levels. Also, it would not recognize and match on Assembly IDs, if those were specified. Additionally, when selecting multiple entries, the tool would not extract the union of the selection. Note that it is necessary to rerun the Taxonomic Profiling tool in order for the “database matches” table to have Assembly IDs added.
- Irrelevant log messages from the Add Typing Results to Large MLST Scheme tool have been removed.
- Fixed an issue in Find Prokaryotic Genes where open reading frames of open-ended CDSs could be incorrectly identified.
- Fixed an issue with Annotate with DIAMOND where clicking “Next” when configuring the workflow element was not possible.
- Fold changes and p-values from Differential Abundance Analysis will change by a small amount compared with the values generated previously (typically <0.01%) due to improvements to the underlying statistical model.
- When downloading virus data, the Download Microbial Reference Database and Download Pathogen Database tools print warnings to check the minimum sequence length parameters.
Other Changes
- The license feature name for this product has changed. Previously it was CLC_MICROBIAL_GENOMICS_MODULE, now it is CLC_GENOMICS_PREMIUM_MODULES. If you have a network license for this product and you have configured access, or other restrictions, relating to licenses for this product, the license feature must be updated in the CLC License Server configuration.
Functionality retirement
- Create Microbial Reference Database
- Create Amplicon-Based Reference Database
- Create Gene Database
The above have been replaced by the new, general purpose Create Annotated Sequence List tool.
CLC Microbial Genomics Module 20.1.1
Released on October 22, 2020
Improvements and bug fixes
Taxonomic Profiling
- Fixed an issue where multiple host genome indices could be selected as input, but only the first was used. Only one host genome index can now be supplied to the Taxonomic Profiling tool. The Create Taxonomic Profiling Index tool can be used to combine multiple sequence lists into a single, combined index.
- Fixed an issue where, when a single taxon is the most dominant constituent of a metagenomics sample, the most abundant taxon was removed, when it should not have been. This problem was most likely to affect low complexity samples, that is, samples with 1 or very few taxa present, and that have varying read lengths in the input data. The issue was unlikely to affect high complexity samples.
- Fixed an issue where the tool would incorrectly disqualify contigs of reference sequences if reads were longer than the contig.
- Correcting abundance values to account for a skewed read distribution between taxa is now optional.
- The confidence score has been removed from the abundance table output.
Fixed an issue where Download MLST Schemes (PubMLST) stopped working due to a recent change in the XML format used by PubMLST.
Fixed an issue where Bin Pangenomes By Taxonomy would fail when using a Taxonomic Profiling index containing entries without taxonomies.
CLC Microbial Genomics Module 20.1
Released on June 23, 2020
New Features
All tools for creating and using Large MLST Schemes are now out of beta.
A tutorial called Working with large MLST schemes is now available, covering using Large MLST Scheme tools with a focus on scheme creation, modification, and isolate typing.
A workflow Create Large MLST Scheme with Sequence Types is now provided for creating high-quality schemes with sequence types. This workflow includes the Create Large MLST Scheme tool, which has been substantially revised for this release, including supporting the creation of schemes from organisms with spliced genes.
Type with Large MLST Scheme
- Typing is now faster and there is a smaller memory footprint.
- Noise from spurious kmer hits has been reduced.
- A new parameter, “Minimum kmer ratio”, has been added to allow low-confidence allele hits to be removed.
- When supplying the output report from this tool to the Extend Result Metadata Table tool, three new fields are added to the Result Metadata Table, “Typing Status”, “Sequence Type” and “MLST Scheme”.
- The number of reported sequence types is now restricted to 100.
- The assembly status is now automatically detected. The “Assemble reads” option is thus no longer presented in the wizard.
- Read coverage for the alleles is now automatically detected.
- Fixed an issue where the tool could detect novel alleles with ambiguous symbols when typing an assembly.
- Fixed an issue where the tool would fail with an error if no MLST scheme had been supplied.
Add Typing Results to Large MLST Scheme
- Performance has been improved.
- Allele lengths are now checked, and outliers due to length are automatically discarded.
- The handling of novel allele names has been improved.
- Metadata of an associated metadata table is now transferred to the output scheme.
Create Large MLST Scheme
This tool has been substantially revised for this release. Among other changes, it now supports the creation of schemes from organisms with spliced genes. Please refer to the manual for usage details.
Below is a list of concerns addressed in this update:
- Typos and number formatting in the report output have been corrected.
- Handling of ambiguous sequence types has been improved.
- Fixed an issue where the tool would fail when searching for unannotated genes and at least one contig did not contain a single DIAMOND hit.
- Fixed an issue where the tool would fail when the stop codon coincided with the end of a contig.
- Fixed an issue where the tool would fail when DIAMOND detected a frameshift in a gene.
- Fixed an issue where the locus type annotations “AMR related” and “Virulence related” were not added when creating a scheme without sequence types.
- Fixed an issue where the tool would not accept a sequence list without CDS annotations if it was added as the first input element, even when several input elements with CDS annotation had been selected.
Download Large MLST Scheme
- A wider range of pubMLST schemes are now handled.
- Handling of ambiguous sequence types has been improved.
- Downloading schemes is now more robust.
- Various other minor improvements.
Working with large MLST schemes
- Large MLST schemes can now be exported.
- The loading time of large MLST schemes has been improved.
- Fixed an issue where the clustering options for the creation of subschemes or reclustering had been disabled in certain cases.
- Fixed an issue in the visualization of the minimum spanning trees where a null pointer exception would occur when shift-dragging before selecting any node.
Two new tools for annotating nucleotide sequences, such as de novo contigs or genomes, from a candidate set of reference sequences are available under the Functional Analysis folder:
- Annotate with DIAMOND: Adds CDS annotations to DNA sequences based on matches found using DIAMOND. Matching can be done against a list of proteins, CDS annotations from an annotated genomic sequence, or a DIAMOND index.
- Annotate with BLAST: Annotates a sequence list based on matches found using BLAST, where matching can be done against a list of proteins, a list of nucleotide sequences, annotations from an annotated genomic sequence or existing BLAST databases.
Extend Result Metadata Table: This replaces the Add To Result Metadata Table tool, providing similar functionality, but in a form that can be included in workflows intended for use on QIAGEN CLC Genomics Cloud Engine.
Improvements, Changes and Bug fixes
Taxonomic Profiling can now be used to analyze metagenomic data produced with long read technologies.
The Type a Known Species, Type among Multiple Species and Map to Specified Reference workflows can now be run on the QIAGEN CLC Genomics Cloud Engine. Changes introduced to support this were the inclusion of control flow elements in the workflow design and the inclusion of the new Extend Result Metadata Table tool.
The Create Kmer Tree and Create SNP Tree tools can now handle input data that is associated with several metadata tables. The use of the Result Metadata Table is now optional.
The removal and reporting of duplicate sequences by Create Taxonomic Profiling Index has been improved.
Assembly grouping options have been added to Find Prokaryotic Genes, allowing the grouping of input sequences to be specified.
Download Microbial Reference Database
- Fixed an issue where not all available genomes from NCBI were shown.
- Stability when communicating with the NCBI servers has been improved.
- The speed when loading the selection table has been improved.
- Duplicate sequences are no longer removed from input sequence sets, allowing more general use of this tool for browsing and downloading sequences from the NCBI.
- The “Sequence list(s)” option for providing sequences to be appended to the downloaded sequence list has been removed. Please see our FAQ entry “How can I concatenate sequence lists and when do I need to?” if you have been using this option previously.
- The Database Selection table now offers the possibility of de-/selecting several references with a single click.
ARES AMR Database
- The data in the overview have been corrected and consolidated at the species level.
- Fixed an issue where duplicate names in a created PointFinder table caused the Find Resistance with PointFinder tool to fail.
The following tools and workflows have been moved to the Legacy folder of the Workbench Toolbox, with “(legacy)” appended to their original names. They will be removed in a future version of the software.
CLC Microbial Genomics Module 20.0
Released on December 11, 2019
New Features
Five new tools are available for working with core genome (cg) and whole genome (wg) MLST schemes. Three tools are available for creating MLST schemes:
- Download Large MLST Scheme for downloading MLST schemes from PubMLST.org. It currently supports a range of cgMLST, eMLST and traditional 7-gene MLST schemes.
- Import Large MLST Scheme for importing MLST schemes from plain-text files in pubMLST format, i.e. one tsv file with the profile information and one FASTA file per locus containing the alleles.
- Create Large MLST Scheme (beta) facilitates the construction of cg/wg MLST schemes starting from a sequence list with CDS annotations.
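The plain-text layout described above for importing schemes (one tab-separated profile file plus one FASTA file per locus) can be read with a short Python sketch. The function names are hypothetical, the profile identifier column is assumed to be called `ST`, and file contents are passed as strings to keep the example self-contained.

```python
import csv
import io
from typing import Dict


def parse_profiles(tsv_text: str, id_column: str = "ST") -> Dict[str, Dict[str, str]]:
    """Map each profile identifier to its remaining columns (locus -> allele number)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    profiles = {}
    for row in reader:
        profiles[row[id_column]] = {k: v for k, v in row.items() if k != id_column}
    return profiles


def parse_fasta(fasta_text: str) -> Dict[str, str]:
    """Map each allele name (first word of the header) to its sequence."""
    alleles: Dict[str, str] = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            alleles[name] = ""
        elif name is not None:
            # Sequences may be wrapped over several lines.
            alleles[name] += line
    return alleles
```

A profile table with the header `ST`, `abcZ`, `adk` and a FASTA file for the `abcZ` locus would give one dictionary of profiles and one dictionary of allele sequences, which together describe the scheme.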
For typing and extending schemes, two new tools are available
The MLST schemes feature minimum spanning tree and heat map visualizations which are synchronized with the typing results, facilitating the analysis of typing results in relation to the scheme it has been typed with.
Improvements, Changes and Bugfixes
Resistance Detection Tools and Databases
- The “Filter overlaps” option of Find Resistance With Nucleotide DB now prioritizes BLAST hits by the number of aligned nucleotides (similarity*length) and only reports the best hits irrespective of the “Predicted Phenotype”.
- The QMI-AR database has been updated to version 2019-11.
- The CARD database has been updated to version 3.0.5.
- When downloading multiple resistance databases, the Download Resistance Database tool no longer stops if one of the downloads fails. It will continue to attempt to download the other databases.
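The prioritization used by the “Filter overlaps” option (similarity*length) can be sketched in Python. Only the scoring formula comes from the note above; the `Hit` class, the use of half-open coordinates, and the greedy strategy for resolving overlaps are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    name: str
    start: int         # inclusive start on the query
    end: int           # exclusive end on the query
    similarity: float  # fraction between 0 and 1

    @property
    def aligned_nucleotides(self) -> float:
        # Score from the release note: similarity * length.
        return self.similarity * (self.end - self.start)


def filter_overlaps(hits: List[Hit]) -> List[Hit]:
    """Keep the best-scoring hit of every group of overlapping hits (greedy sketch)."""
    kept: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h.aligned_nucleotides, reverse=True):
        # Accept the hit only if it does not overlap any better hit already kept.
        if all(hit.end <= k.start or hit.start >= k.end for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)
```

Note that a longer hit with lower similarity can outrank a shorter, more similar one: a hit of length 150 at 80% similarity scores 120 and displaces an overlapping hit of length 100 at 90% similarity, which scores 90.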
Alpha Diversity
- The calculation of the percentiles of the Alpha Diversity Box Plot editor has been changed to match the default implementation of the “quantile” method in R.
- The whiskers in the boxplot visualization of the alpha diversity are now restricted to the last data point within the selected interquartile range, according to https://en.wikipedia.org/wiki/Box_plot.
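The two box plot changes above can be illustrated with a small Python sketch. R's default `quantile` method is type 7 (linear interpolation between order statistics); the whisker rule shown uses a multiplier `k` of the interquartile range, where the default of 1.5 is a conventional value assumed here rather than taken from the tool.

```python
import math
from typing import Sequence, Tuple


def quantile_type7(values: Sequence[float], p: float) -> float:
    """Quantile with linear interpolation, as in R's default (type 7)."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    h = (len(data) - 1) * p
    lower = math.floor(h)
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (h - lower) * (data[upper] - data[lower])


def whiskers(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return the most extreme data points lying within k * IQR of the quartiles."""
    q1 = quantile_type7(values, 0.25)
    q3 = quantile_type7(values, 0.75)
    iqr = q3 - q1
    inside = [v for v in values if q1 - k * iqr <= v <= q3 + k * iqr]
    return min(inside), max(inside)
```

For the values 1, 2, 3, 4 and 100, the quartiles are 2 and 4, so the whiskers end at the data points 1 and 4 and the value 100 is left as an outlier.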
Bin Pangenomes by Taxonomy
- The report generated by Bin Pangenomes by Taxonomy now contains the name of the taxonomy of a bin rather than the taxonomic level in the “Taxonomy” column, and the contig and nucleotide counts have been corrected.
- Fixed an issue where the tool failed if several files containing contigs were used as a parameter.
Bin Pangenomes by Sequence
- Fixed a bug in Bin Pangenomes by Sequence that caused the tool to stall when some of the reads giving rise to a contig had been removed prior to binning.
- Fixed an issue that caused the tool to crash when two input connections in a workflow contained different types of data.
DIAMOND
- The DIAMOND tool has been updated to v0.9.26. To run this version, CPUs supporting AVX instructions are now also required for Linux based operating systems, and all existing DIAMOND index files will have to be recreated. DIAMOND is used in:
- Improved progress reporting for tools running DIAMOND.
The Typing and Epidemiology tools are no longer beta-status tools.
The QIAseq 16S/ITS Demultiplexer tool has been updated with new barcodes to support the latest QIAGEN QIAseq 16S/ITS Region Panels kit.
For the Differential Abundance Analysis tool, the order of comparisons in an “Across Groups” analysis has been changed to match the sign of the fold change of the “All group pairs” or “Against control group” analyses.
CLC Microbial Genomics Module 4.8
Released on September 19, 2019
Resistance analysis updates: New and updated databases are available for download via the Download Resistance Database tool:
- The integrated ARES database, containing resistance conferring genes and mutations, with the respective empirical predictive performance data.
- An updated QMI-AR database where a serious issue has been fixed which caused some of the resistance genes originating from ResFinder to be associated with the wrong ARO number. Further details …
- Additional and updated PointFinder databases.
Find Resistance with ShortBRED
- The computational performance of the tool has been improved for the case where a sequence list of reads matching ShortBRED markers is requested.
- The column names of the summary table output from the tool have been renamed from ‘Number of reads’ and ‘Number of unique reads’ to ‘Number of markers’ and ‘Number of reads’, respectively.
When using the Find Resistance with Nucleotide DB tool with the Virulence Factor Database and using Add To Result Metadata Table, the corresponding column names have been adjusted to ‘Virulence found’ instead of ‘Resistance Found’.
A new tutorial for profiling antimicrobial resistance genes in isolate and metagenomic samples of NGS reads is provided.
Additional bugfixes and improvements:
- Fixed a serious issue where the Create MLST Scheme tool would wrongly associate allelic sequences with profiles in some cases. Further details …
- For the stacked bar chart visualization of abundance tables it is now possible to customize the legend by selecting the number of items to be shown, with a default value of 10, and to specify the depth of the taxonomy if the view is aggregated.
- Fixed a bug in the Alpha Diversity Box Plot Editor which caused the visualization to crash for an aggregated abundance table.
- A bug has been fixed which caused the QIAseq 16S/ITS Demultiplexer tool to crash when the first column contained numerical values.
- Fixed a bug that appeared when the tool Download Microbial Reference Database was added to a workflow.
- When running the tool Remove OTUs with Low Abundance in a workflow, an error message has been added when multiple input files are encountered.
CLC Microbial Genomics Module 4.5
Released on June 27, 2019
New Features
Five new resistance databases are now accessible through the Download Resistance Database tool:
- Two peptide databases of Antibiotic Resistance markers for the Find Resistance with ShortBRED tool:
– The QIAGEN Microbial Insight – Antimicrobial Resistance (QMI-AR) database
– The Comprehensive Antimicrobial Resistance Database (CARD) (McMaster University)
- Three nucleotide databases of Antibiotic Resistance genes for the Find Resistance with Nucleotide DB tool:
– The QIAGEN Microbial Insight – Antimicrobial Resistance (QMI-AR) database
– The Comprehensive Antimicrobial Resistance Database (CARD) (McMaster University)
– The Virulence Factor Database (VFDB) (CAMS&PUMC – China)
Improvements, Changes and Bugfixes
Find Resistance with ShortBRED
- The tool now supports two new marker databases: QMI-AR and CARD.
- The tool now supports more comprehensive metadata including Compound ARO, Gene annotation depth and information about the antibiotic class to which the marker confers resistance.
- In the optional sequence-list tool output, reads are annotated with phenotype and other information from markers they aligned to.
- The tool can now output a sortable and searchable result table.
Alpha diversity
- Alpha diversity measures can now be calculated on a specified level of the taxonomy.
- Alpha diversity results can now be visualized as a box plot at a given rarefaction level where the samples are grouped by their metadata.
- In the box plot visualization, p-value statistics for the Kruskal-Wallis and the Mann-Whitney U tests are available for comparing groups of samples.
- The alpha diversity result can now only be visualized with the ‘Alpha Diversity Graph’; the similar ‘Line Graph’ has been removed.
- The options ‘X-axis at zero’, ‘Y-axis at zero’, and ‘Show as histogram’ of the ‘Alpha Diversity Graph’ have been removed.
- Fixed a bug preventing the tool from being executed in some workflows.
Beta diversity
- The beta diversity result can now be visualized as a 2D plot of the principal coordinates.
- In order to explore variations in community structure, samples can now additionally be colored by the aggregate abundance of user-selected taxonomic groups.
- Side panels now have a ‘Show point’ option so that the samples to include in the plot can be selected.
Find Prokaryotic Genes
- The tool can now create a single or multiple gene structure models when the input consists of several assemblies or contig bins.
- The tool now provides an option to annotate open-ended sequences (e.g. short contigs).
- The tool now has options to save and reuse models.
- It is no longer a beta status tool.
Download Protein Database
- The UniRef50 and SwissProt databases have been updated to version 2019_03.
- The UniRef90 and UniRef100 databases can no longer be downloaded.
Bin Pangenomes by Sequence
- Fixed a bug where the tool would produce only a few bins for large contig lists (in the order of 20,000 contigs or more).
- Fixed a rare bug where the tool would crash when the coverage on a contig was best explained by a single Poisson function.
Download Microbial Reference Database
- Fixed an issue where the tool would fail when providing a sequence list without accession IDs as a parameter while selecting the option to ‘Include only plasmids’ or ‘Exclude all plasmids’.
- Fixed a bug that led to wrong numbers in the taxonomic summary table of the reports for the Download Microbial Reference Database and the Taxonomic Profiling tools.
- Fixed an issue where duplicate sequences in the sequence list provided were not removed.
- Fixed a bug where the statistics in the report were wrong.
Additional improvements
- The Find Resistance with ResFinder tool has been renamed: Find Resistance with Nucleotide DB. This tool now offers support for the new antimicrobial resistance databases, QMI-AR and CARD, and the virulence factor database VFDB (see above).
- A search field has been added in the side panel of visualizations with associated metadata information in order to enable easy access to the metadata category of interest.
- The folders ‘Functional Analysis’ and ‘Drug Resistance Analysis’ have been moved from ‘Metagenomics’ to the general ‘Microbial Genomics Module’ folder to highlight that these tools can be used with both metagenomic and isolate data.
- The input field for the parameter MLST Scheme Name has been shortened to fit within the default window size of the Create MLST Scheme tool wizard.
- Fixed a potential bug that could cause the OTU Clustering tool to incorrectly count the number of paired reads mapping in the forward or reverse orientation to a reference OTU. While this would not affect the result of the clustering, it may prevent some OTUs from being reverse-complemented in the output.
- Fixed an issue where creating a sunburst plot for an aggregated abundance table with the button ‘Create Abundance Subtable’ would fail when at least one row in the original abundance table had the entry “N/A” in the taxonomy column.
CLC Microbial Genomics Module 4.1
Released on January 31, 2019
Metagenomics – Amplicon-Based Analysis
- The OTU Clustering tool has a new option for specifying if non-merged paired-end reads should be included in the analysis. This option is off (unchecked) by default, as including only merged reads improves analysis run time. The Data QC and OTU Clustering workflow now also includes only merged reads in the OTU clustering analysis step. To run the workflow with all reads, a copy of the workflow must be created and this option enabled in that copy.
- The “Similarity Percentage” parameter can now be adjusted when launching the Data QC and OTU clustering workflow.
- Fixed a bug where action buttons underneath tables would not be accessible if the table view was too narrow.
Metagenomics – Taxonomic Analysis
Metagenomics – Functional Analysis
- The Build Functional Profile tool can now output a DIAMOND hits functional profile.
- Fixed a bug in the Find Prokaryotic Genes tool that affected genes spanning the origin of circular chromosomes, which would have the annotated CDS region spanning the whole circular chromosome.
- Fixed a bug that would cause the tool Annotate CDS with Best Diamond Hit to stall when running Diamond in ‘sensitive’ or ‘more sensitive’ mode in a Gx Workbench running on Windows 10.
Metagenomics – Drug Resistance Analysis
- Fixed a bug with the tool Use Genome as Result – and the workflow using the tool called Map to Specified Reference – when the genome name contains a colon (“:”).
- Fixed a problem where the Download MLST Schemes (PubMLST) tool did not format the MLST schemes properly resulting in non-conclusive MLST assignments when using the downloaded schemes for typing.
CLC Microbial Genomics Module 4.0
Released on November 28, 2018
New tools for Metagenomics
- Create Taxonomic Profiling Index, a tool to index reference sequences for use with the Taxonomic Profiling tool.
- Create DIAMOND Index, a tool for computing a DIAMOND index which can be used as input to Annotate CDS with Best DIAMOND Hit.
- Bin Pangenomes by Sequence, a tool to group contigs and reads typically of a shotgun metagenomic sample uniquely based on sequence and coverage similarity.
- Bin Pangenomes by Taxonomy, a tool to group contigs and reads typically of a shotgun metagenomic sample according to their taxonomic relationship.
- QC, Assemble and Bin Pangenomes, a workflow for pre-processing and assembly of whole-genome shotgun sequencing reads, and bin contigs/reads according to taxonomic association and sequence similarity.
- Drug resistance analysis, a new area that collects tools for antibiotic resistance analysis:
- Find Resistance with PointFinder, for identifying antimicrobial resistance mutations present in an isolate sample using an antibiotic variant database especially designed by QIAGEN.
- Find Resistance with ShortBRED, for identifying antimicrobial resistance genes and quantifying their relative abundance in a metagenomic sample using a peptide marker database especially designed by QIAGEN.
- The tool Find Resistance has been renamed to Find Resistance with ResFinder.
New features and improvements: Functional Analysis
- The tool Annotate CDS with Best DIAMOND Hit has new options to run in standard, sensitive and more sensitive modes.
- We improved the accuracy of the BLAST search in the Annotate CDS with Best BLAST Hit tool.
- Improved the Sunburst plot to allow graphical export with the legend.
- Three vector formats (.ps, .eps, .svg) have been added to the export sunburst dialog.
- Stacked bar charts now also show the relative abundance when hovering over the chart.
Improvements for Databases
Bug fixes
CLC Microbial Genomics Module 3.6.1
Released on October 10, 2018
Bug fixes
CLC Microbial Genomics Module 3.6
Released on September 13, 2018
Improvements
- It is now possible to import a custom MLST profile using the Create MLST Scheme tool.
- In the Add NGS MLST Report to Scheme tool it is now possible to add more than one report, and therefore more than one sequence type, to a scheme at a time.
- Warning messages in Add NGS MLST Reports to Scheme and Merge MLST Schemes now appear when the specified report/schemes to add/merge are incompatible.
- The protein accession ID links in the DIAMOND result table now point to UniProtKB instead of NCBI.
- The QIAseq 16S/ITS Demultiplexer tool now adds region information to the read group in the element info output. Thus the OTU Clustering tool adds region information as metadata in the abundance table to allow data aggregation based on this metadata category.
- In Abundance tables, headers of the columns displaying abundances for each sample have been reverted to show the sample name first. This improves clarity when showing an Abundance table with multiple samples.
Bug fixes
- Fixed a bug in Add Sequence to MLST tool, where the steps defining the sequences to be added were not updated after changing the specified MLST scheme.
- Fixed a bug causing the Find Prokaryotic Genes tool to fail when a large number of sequences are provided as input.
- Fixed a bug causing the parameter validation of the QIAseq 16S/ITS Demultiplexer tool to fail when it is included in a workflow.
CLC Microbial Genomics Module 3.5
Released on June 28, 2018
New tools
- Annotate CDS with Best DIAMOND Hit – an efficient alternative to Annotate CDS with Best BLAST Hit allowing the annotation of large data sets, even on desktop machines.
- Download Protein Database – five protein databases are available to download using this tool: COG, SwissPROT, UniRef-50, UniRef-90, and UniRef-100
- Find Prokaryotic Genes (beta) – a tool for identifying and annotating prokaryotic genome or contig sequences with predicted gene and CDS regions.
- QIAseq 16S/ITS Demultiplexer – a tool for demultiplexing reads generated using QIAseq 16S/ITS Screening and Region panels.
Improvements
- Abundance tables now have the following buttons:
- Create Abundance Subtable replaces Create Abundance Table from Selection and will create a table from selected rows.
- Create Sequence Sublist (available for OTU abundance tables only) will create a sequence list from selected rows.
- Create Normalized Abundance subtable will create a table normalized on a single row for which all abundance values are non-zero.
- The Annotate CDS with Best BLAST Hit, Annotate CDS with Best DIAMOND Hit and Annotate CDS with Pfam Domains tools now create a copy of the input instead of modifying it.
- The Annotate CDS with Best BLAST Hit, Annotate CDS with Best DIAMOND Hit and Annotate CDS with Pfam Domains tools now optionally output a table summarizing information about the annotations added to the sequence list.
- The Create Microbial Reference Database now includes an option to use a QIAGEN compiled set of Genbank assembly IDs pre-selected to represent the full NCBI list of genomes. The optimized database is particularly well-suited for running the Taxonomic Profiling tool on a laptop computer with 16GB of RAM.
- The Taxonomic Profiling tool now qualifies reference genomes automatically without hard thresholds for minimum number of reads or minimum coverage, exploring the potential mapping positions more exhaustively.
- The Taxonomic Profiling tool has a new option called “Minimum seed length” that allows users to define the desired balance between precision (higher length) and recall (lower length).
- In OTU abundance tables, headers of the columns displaying abundances for each sample now include the sample name for clarity.
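The normalization behind the Create Normalized Abundance subtable button is not spelled out in these notes. The sketch below assumes it means dividing each sample's abundances by the value of the chosen reference row in that sample, which would also explain why the reference row must be non-zero in every sample. The function name, the table layout and the values are hypothetical.

```python
def normalize_on_row(table, reference):
    """Divide every abundance by the reference row's value in the same sample.

    `table` maps a feature name to a list of per-sample abundances.
    This is an assumed interpretation, not the module's documented method.
    """
    ref = table[reference]
    if any(value == 0 for value in ref):
        raise ValueError("reference row must be non-zero in every sample")
    return {
        feature: [value / r for value, r in zip(values, ref)]
        for feature, values in table.items()
    }


# Hypothetical abundance table with three samples.
abundance = {
    "Bacteroides": [10, 40, 5],
    "Escherichia": [20, 20, 10],
    "Prevotella": [0, 60, 15],
}
normalized = normalize_on_row(abundance, "Escherichia")
print(normalized["Bacteroides"])
```

Under this assumption, a row such as "Prevotella" could not serve as the reference because it is zero in the first sample.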
Changes
- In workflows, the PERMANOVA Analysis and Convert Abundance Table to Experiment tools no longer accept as input abundance tables generated by tools within the same workflow. Abundance tables must now exist prior to launching any workflow containing either of these tools. Existing workflows where either of these tools is configured to take in abundance tables generated by other tools in the same workflow will need to be re-designed.
- The folder ‘Amplicon-Based OTU Clustering’ has been renamed to ‘Amplicon-Based Analysis’.
- In the Databases folder, the ‘Taxonomic Profiling’ subfolder was renamed to ‘Taxonomic Analysis’.
Bug fixes
- Fixed a bug that caused the ID column to display incorrect information on aggregated Abundance Tables.
- Fixed an issue that would make the OTU Clustering tool stall frequently or fail when running with the “Fuzzy match duplicates” option enabled.
- Fixed an issue that would affect the OTU Clustering report when run with the option “Allow creation of new OTUs” disabled: “Total predicted OTUs” and “De novo OTUs” are now showing correct values. More specifically, the “Total predicted OTUs” would erroneously include some OTUs to which no input read was assigned. This would in turn cause an overestimation of the “De novo OTUs” value, which is computed as the difference between the “Total predicted OTUs” and the “OTUs based on database” values.
- Fixed a bug that would happen in the rare cases where identical subsequences (contigs) with different taxonomies were found in a database for the Taxonomic Profiling tool. The taxonomy of the identical contigs is now set to the lowest common ancestor.
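To illustrate what setting a taxonomy to the lowest common ancestor means, here is a minimal sketch that assumes each taxonomy is an ordered list of ranks from domain downwards. The function name and the example lineages are illustrative and do not reflect how the Taxonomic Profiling tool stores taxonomies internally.

```python
def lowest_common_ancestor(lineages):
    """Return the longest prefix of ranks shared by all lineages."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Two identical contigs annotated with different taxonomies (illustrative).
lineage_a = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae", "Escherichia"]
lineage_b = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae", "Shigella"]

lca = lowest_common_ancestor([lineage_a, lineage_b])
print(lca[-1])
```

In this example the two lineages diverge at the genus level, so the shared taxonomy ends at the family.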
CLC Microbial Genomics Module 3.0.1
Released on May 15, 2018
Improvements
Bug fixes
- Fixed an issue in the OTU Clustering tool that would cause a paired read that had been merged to be filtered out if one of the members of the pair contained sequencing errors.
- Fixed an issue where domain annotations added by the Annotate CDS with Pfam Domain tool started one amino acid later than expected.
- Fixed an issue where the nodes in a K-mer tree referred to individual sequences instead of assemblies. This caused problems if bacteria with more than one chromosome were included for analysis.
- Fixed a bug in the Differential Abundance Analysis tool where the most recent value of the “Metadata factor” parameter was not retained when configuring the tool in a workflow.
CLC Microbial Genomics Module 3.0
Released on November 21, 2017
New features
- The Create SNP Tree tool can now output a new SNP Matrix that contains a pairwise comparison of SNP differences between any pair of all samples included in the analysis.
- The matrix supports coloring of individual table cells for easy identification of related strains.
- It is possible to highlight samples with fewer SNP differences than an adjustable threshold.
- A new Multi-VCF format in the Export menu makes it possible to export multiple samples’ variant tracks into one VCF file, provided that they have the same reference genome.
- A new option in the Data section of Abundance Table Settings side panel allows for hiding entries with incomplete taxonomy for the taxonomic level chosen to aggregate the data.
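The SNP Matrix described above holds pairwise SNP differences between samples. The following sketch shows the general idea, assuming each sample is represented by its alleles at the same ordered set of SNP positions. The sample names, allele strings and function names are hypothetical, and this is not the Create SNP Tree tool's implementation.

```python
from itertools import combinations


def snp_distance_matrix(snp_alleles):
    """Build a symmetric matrix of pairwise SNP differences."""
    names = sorted(snp_alleles)
    matrix = {name: {other: 0 for other in names} for name in names}
    for a, b in combinations(names, 2):
        differences = sum(
            1 for x, y in zip(snp_alleles[a], snp_alleles[b]) if x != y
        )
        matrix[a][b] = matrix[b][a] = differences
    return matrix


def pairs_below(matrix, threshold):
    """List sample pairs with fewer SNP differences than the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(matrix), 2)
        if matrix[a][b] < threshold
    ]


# Hypothetical alleles at five shared SNP positions.
samples = {
    "isolate_1": "ACGTA",
    "isolate_2": "ACGTT",
    "isolate_3": "TCATT",
}
matrix = snp_distance_matrix(samples)
close_pairs = pairs_below(matrix, threshold=2)
print(close_pairs)
```

Pairs returned by `pairs_below` correspond to the samples that would be highlighted for a given threshold.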
Changes
- Updated the Alpha Diversity tool to be able to handle a lower detection limit per feature in an abundance table.
- The optional output of a Distance Matrix from the Beta Diversity tool is changed from being a simple table object to now being a SNP Matrix object.
Improvements
- The Taxonomic Profiling tool has been improved, allowing higher detection rates at an equivalent level of false positives.
- The Taxonomic Profiling tool can be configured by the users according to two new options: the minimum number of reads, and minimum coverage criteria necessary for the read to be assigned.
- The Differential Abundance Analysis tool has been updated such that:
- It has an extra option for the comparison of all groups against one specific group within a metadata factor.
- It can perform an ANOVA-like comparison.
- The Create SNP Tree tool now also supports construction of Maximum Likelihood phylogenies:
- Users can choose whether to run a Neighbor-Joining algorithm or a Maximum Likelihood algorithm.
- Users can optionally output an alignment of the concatenated SNPs that are used in the construction of SNP tree.
- Trees produced with the Create SNP Tree and Create K-mer Tree tools are now multifurcating.
Bug fixes
- Fixed a bug that caused bacterial assemblies of type “acidobacteria” and viral assemblies of type “dsDNA viruses, no RNA stage” to not be shown by the Create Microbial Reference Database tool.
- Fixed a bug causing the annotation columns “Assembly ID” and “FTP Path” to disappear in sequence lists downloaded with the Create Microbial Reference Database tool.
- Updated the manual to be more specific about downloading viruses from NCBI with the Create Microbial Reference Database tool.
- Fixed a bug that caused the Create Microbial Reference Database tool to not download taxonomies for all entries in some cases.
- Fixed a bug where NCBI renaming a column in one of their files caused the Download Pathogen Reference Database tool to fail.
- Renamed the “Set of species” option in Download Pathogen Reference Database to “By Kingdom/Domain”.
- Fixed a bug in the OTU Clustering tool causing the Merge Paired Reads Report to not be output when the input contains both merged and non-merged sequence lists.
- Fixed a bug in Align OTUs with MUSCLE that would cause the tool to incorrectly select the most abundant OTUs in some cases.
- The Differential Abundance Analysis now accepts metadata groups with only one replicate.
- Added a popup menu for selecting and deselecting all samples in the Stack and Sunburst visualizations of abundance tables.
- Upgraded the Neighbor Joining algorithm in the Create SNP Tree tool to use less memory.
- Updated the Create SNP Tree and Create K-mer Tree tools so that trees with negative branch length are not allowed.
- Fixed an issue with the Biom importer when run through the Cosmos ID plugin.
- Updated manual with special system requirements.
CLC Microbial Genomics Module 2.5.5
Released on October 10, 2018
Bug fixes
CLC Microbial Genomics Module 2.5.4
Released on June 28, 2018
Improvements
- In OTU abundance tables, headers of the columns displaying abundances for each sample now include the sample name for clarity.
- OTU abundance tables now have a Create Sequence List from Selection button that creates a sequence list from selected rows.
Bug fixes
- Fixed a bug that caused the ID column to display incorrect data on aggregated Abundance Tables.
- Fixed an issue that would make the OTU Clustering tool stall frequently or fail when running with the “Fuzzy match duplicates” option enabled.
- Fixed an issue that would affect the OTU Clustering report when run with the option “Allow creation of new OTUs” disabled: “Total predicted OTUs” and “De novo OTUs” are now showing correct values. More specifically, the “Total predicted OTUs” would erroneously include some OTUs to which no input read was assigned. This would in turn cause an overestimation of the “De novo OTUs” value, which is computed as the difference between the “Total predicted OTUs” and the “OTUs based on database” values.
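The relationship between the three report values can be written out as simple arithmetic. The sketch below shows one plausible way the overestimate described above could arise, using hypothetical read counts. It is an illustration of the stated formula only, not the OTU Clustering tool's code.

```python
def de_novo_otus(total_predicted, database_based):
    """De novo OTUs = Total predicted OTUs - OTUs based on database."""
    return total_predicted - database_based


# Hypothetical reads assigned to each reference OTU, with
# "Allow creation of new OTUs" disabled (so no de novo OTUs are expected).
reads_per_otu = {"ref_otu_1": 120, "ref_otu_2": 0, "ref_otu_3": 45}

otus_with_reads = sum(1 for reads in reads_per_otu.values() if reads > 0)

# Counting every OTU, including the one without reads, inflates the total.
inflated = de_novo_otus(len(reads_per_otu), otus_with_reads)

# Counting only OTUs with assigned reads gives the expected value.
corrected = de_novo_otus(otus_with_reads, otus_with_reads)
print(inflated, corrected)
```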
CLC Microbial Genomics Module 2.5.3
Released on May 15, 2018
Improvements
Bug fixes
- Fixed an issue in the OTU Clustering tool that would cause a paired read that had been merged to be filtered out if one of the members of the pair contained sequencing errors.
- Fixed an issue where domain annotations added by the Annotate CDS with Pfam Domain tool started one amino acid later than expected.
- Fixed an issue where the nodes in a K-mer tree referred to individual sequences instead of assemblies. This caused problems if bacteria with more than one chromosome were included for analysis.
- Fixed a bug in the Differential Abundance Analysis tool where the most recent value of the “Metadata factor” parameter was not retained when configuring the tool in a workflow.
CLC Microbial Genomics Module 2.5.2
Released on December 5, 2017
Bug fixes
CLC Microbial Genomics Module 2.5.1
Released on September 11, 2017
Bug fixes
- Fixed an issue in the Create Microbial Reference Database tool that led to incorrect taxonomies being assigned when “Viruses” was selected in the “Select NCBI sources” section of the wizard.
- Fixed a bug that caused the OTU clustering tool to fail in rare cases.
CLC Microbial Genomics Module 2.5
Released on August 16, 2017
New features
- New import and export feature of abundance tables in the biological observation matrix (biom) file format. This allows users to share and use their data with analysis tools from CosmosID, or to visualize an abundance table from CosmosID using the MGM tools:
- The new importer supports version 1.0 and 2.1 of the biom file format.
- The new exporter supports version 2.1 of the biom file format.
- The manual section about the Taxonomic Profiling tool has been updated to reflect the current intended use of the tool.
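For readers unfamiliar with the biom format, version 1.0 is a JSON document, so a sparse table can be read with the standard library alone. The sketch below is not the module's importer: the function name and the example table are hypothetical, and it handles only the sparse matrix layout. Version 2.1 of the format is HDF5-based and would need an HDF5 library instead.

```python
import json


def read_sparse_biom(text):
    """Convert a sparse BIOM 1.0 JSON document into
    {observation id: {sample id: count}}."""
    doc = json.loads(text)
    if doc.get("matrix_type") != "sparse":
        raise ValueError("this sketch only handles sparse matrices")
    rows = [row["id"] for row in doc["rows"]]
    columns = [col["id"] for col in doc["columns"]]
    table = {row: {col: 0 for col in columns} for row in rows}
    # Each sparse entry is [row index, column index, value].
    for row_index, col_index, value in doc["data"]:
        table[rows[row_index]][columns[col_index]] = value
    return table


# A tiny hypothetical BIOM 1.0 document.
example = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "hypothetical example",
    "date": "2017-08-16T00:00:00",
    "rows": [{"id": "OTU_1", "metadata": None},
             {"id": "OTU_2", "metadata": None}],
    "columns": [{"id": "Sample_A", "metadata": None},
                {"id": "Sample_B", "metadata": None}],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 2],
    "data": [[0, 0, 5], [1, 1, 12]],
})
table = read_sparse_biom(example)
print(table)
```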
Changes
- The tools Optional Merge Paired Reads and Fixed Length Trimming have been moved to the Legacy Tools folder of the toolbox as they are no longer needed for the OTU Clustering tool. They will be completely removed in a future release of the software.
- The Optional Merge Paired Reads and Fixed Length Trimming steps have been removed from the Data QC and OTU Clustering workflow because the OTU Clustering tool can now merge paired reads and does not require fixed-length sequences as input.
- The Taxonomic Profiling tool now allows the user to optionally “Estimate paired end distances” as a pre-processing step, and its performance has been improved.
Improvements
- The OTU Clustering tool can now also handle fungal Internal Transcribed Spacer (ITS) amplicon sequences:
- The algorithm has been improved to handle variable length data like fungal ITS sequences, which makes the Fixed Length Trimming tool redundant.
- The OTU Clustering tool now handles OTUs with reads mapping in both forward and backward orientation for taxonomic assignment. This kind of mixed orientation data now also works with the “Allow creation of new OTUs” option enabled.
- After loading the read sequences, the tool now attempts to merge any overlapping paired-end reads, thus making the Optional Merge Paired Reads tool redundant. The parameters for the alignment of reads are now part of the “OTU Clustering” wizard. OTU clustering is performed on all reads, i.e., both reads that are merged and reads that could not be merged.
- The tool can process both paired-end and single-end data files at the same time.
- The Taxonomic Profiling reference database index management has been improved, in that it includes messages/warnings in the wizard about indexing, and generates a new CLC folder called “CLC_MgmReferenceCache” designated for the storage of index files.
- The Download Database for Find Resistance tool has been updated to point to the newest version of the database.
Bug fixes
- Fixed a bug that caused the “Create Abundance Table from Selection” button to fail due to duplicated names while aggregating on taxonomy.
- Fixed a bug that caused the Data QC and Clean Host DNA, Data QC and Taxonomic Profiling, Type a Known Species, and Type Among Multiple Species workflows to not run on CLC Genomics Server without the Biomedical extension enabled.
- Fixed a bug that caused Add Metadata to Abundance Table to throw a NullPointerException when opening Excel files with empty cells.
- Fixed a bug that caused the Create SNP Tree tool to fail when analyzing read mappings whose genomes are comparable but have chromosomes in a different order.
- Fixed a bug that caused the Find Resistance tool to not report all BLAST hits when the gene database contains more than 250 genes.
- Fixed a bug causing Stacked Charts to throw an out of bounds exception when changing from “Bar Chart” to “Area Chart”.
- Fixed a bug that made the Create Microbial Reference Database tool crash when filtering, sorting, and aggregating a selection table.
- Fixed a bug causing the “File with accession number” option in the Create Microbial Reference Database tool to be without effect.
- Minor bug fixes
CLC Microbial Genomics Module 2.0
Released on March 2nd, 2017
New features
- New tool for Taxonomic Profiling of whole metagenome shotgun sequencing datasets.
- All existing visualizations (stacked bar charts, stacked area charts, sunburst charts and heat maps) have been updated to work with the output from this tool.
- All existing abundance analysis tools (Alpha Diversity, Beta Diversity, PERMANOVA Analysis and Differential Abundance Analysis) have been updated to work with the output from this tool.
- Three new workflows for host DNA removal, taxonomic profiling and downstream analysis of whole metagenome shotgun sequencing datasets:
- New tool for easily creating custom microbial reference genome databases for use in taxonomic profiling and microbial isolate typing: Create Microbial Reference Database.
Changes
- The plugin Toolbox has been largely restructured in order to make it more intuitive to navigate. Microbiome analysis tools are now categorized into four folders: Amplicon-based OTU Clustering, Taxonomic Analysis, Functional Analysis, and Abundance Analysis. All database management tools have been collected in the top-level folder Databases.
- The two tools Download Bacterial Genomes from NCBI and Download Pathogen Reference Databases have been merged into one tool called Download Pathogen Reference Database.
- Three tools have been renamed:
Improvements
- The speed of searches for data elements with associations to specified metadata, from within a Result Metadata Table, has been greatly improved. To enable metadata related searches to work after upgrading to the Microbial Genomics Module 2.0, indices for the locations containing the relevant data will need to be rebuilt.
- The OTU Clustering tool now handles OTUs with reads mapping in both the forward and backward orientation for taxonomic assignment. Note that this kind of data should not be used with the “Allow creation of new OTUs” option, as the orientation of the new OTUs will not be inferred consistently.
- When aggregating an abundance table, for example by class, a new column called “Class (Aggregated)” containing the class names is created. This name will be used in subsequent analysis outputs to avoid very long feature names in abundance tables and downstream analysis tools, e.g., heat maps.
- The Set Up Microbial Reference Database tool now has an option to update the Latin name of each sequence in a given sequence list with the content of the source annotation of the sequence.
- The Set Up Microbial Reference Database tool now also recognizes “Latin name” as a special metadata column name, making it easier to set up custom databases with meaningful sequence names.
- The Download Pathogen Reference Database tool now corrects corrupt Latin names of sequences by replacing them with the content of the source annotation in the downloaded GenBank files.
- Axes in PCoA plots output from the Beta Diversity tool can now be replaced by metadata columns in order to make clustering correlated with specific metadata more visible.
- The Differential Abundance Analysis tool now checks the input metadata and displays a warning directly in the wizard if singularities or linear dependencies are found.
- Added a new column to the result metadata table, “Best match, average coverage”, which helps identify samples that have been sequenced with insufficient depth.
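The aggregation of an abundance table by a taxonomic level, as in the “Class (Aggregated)” example above, can be sketched as follows. The row layout, the function name and the values are hypothetical. Only the “<level> (Aggregated)” column naming comes from the note above.

```python
def aggregate_by_rank(rows, rank):
    """Sum per-sample abundances of all rows sharing the same value at `rank`."""
    totals = {}
    for row in rows:
        name = row["taxonomy"].get(rank, "N/A")
        current = totals.setdefault(name, [0] * len(row["abundances"]))
        for i, value in enumerate(row["abundances"]):
            current[i] += value
    column = f"{rank} (Aggregated)"
    return [
        {column: name, "abundances": values}
        for name, values in sorted(totals.items())
    ]


# Hypothetical abundance table with two samples per row.
rows = [
    {"taxonomy": {"Class": "Bacilli", "Genus": "Lactobacillus"},
     "abundances": [10, 4]},
    {"taxonomy": {"Class": "Bacilli", "Genus": "Streptococcus"},
     "abundances": [5, 6]},
    {"taxonomy": {"Class": "Clostridia", "Genus": "Clostridium"},
     "abundances": [1, 9]},
]
aggregated = aggregate_by_rank(rows, "Class")
print(aggregated)
```

Using the short aggregated name as the feature label keeps downstream outputs such as heat maps readable.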
Bug fixes
- Fixed a bug in abundance tables that caused read names to be appended to the aggregated taxonomy in rare cases when aggregating on higher phylogeny levels.
CLC Microbial Genomics Module 1.6.2
Released on March 06, 2017
Improvements
- The OTU Clustering tool now handles OTUs with reads mapping in both the forward and backward orientation for taxonomic assignment. Note that this kind of data should not be used with the “Allow creation of new OTUs” option, as the orientation of the new OTUs will not be inferred consistently.
Bug fixes
- Fixed a serious bug that made all downloads on Windows machines with the Download Bacterial Genomes from NCBI and Download Pathogen Reference Databases tools fail.
- Fixed a bug in the Download MLST Schemes (PubMLST) tool that caused an error when starting the tool. This error emerged after PubMLST migrated to a new server.
- Fixed a bug in the De Novo Assemble Metagenome tool that caused some contigs to be duplicated exactly.
- Fixed a bug in the Alpha Diversity tool that sometimes caused a miscalculation due to a numerical overflow when using Simpson’s diversity index.
- Fixed a bug that caused the Annotate CDS with Pfam Domains tool to not give an output when the input only had one CDS annotation.
- Fixed a bug that caused some MLST schemes to throw an error when shown in a table view.
- Fixed a bug that sometimes caused sunburst charts to hide high-abundance features in the ‘Other’ category. Sunburst charts now display the 100 most abundant features and group all other features into ‘Other’.
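The sunburst behaviour described in the last item, showing the most abundant features and grouping the rest into ‘Other’, can be sketched as below. The function name and the example abundances are hypothetical. The default limit of 100 comes from the note above.

```python
def top_features_with_other(abundances, limit=100):
    """Keep the `limit` most abundant features and sum the rest as 'Other'."""
    ranked = sorted(abundances.items(), key=lambda item: item[1], reverse=True)
    shown = dict(ranked[:limit])
    remainder = sum(value for _, value in ranked[limit:])
    if remainder > 0:
        shown["Other"] = remainder
    return shown


# Hypothetical feature abundances; a small limit keeps the example short.
features = {
    "Bacteroides": 500,
    "Prevotella": 300,
    "Escherichia": 50,
    "Shigella": 5,
}
grouped = top_features_with_other(features, limit=2)
print(grouped)
```

Ranking by abundance before truncating is what keeps high-abundance features out of the ‘Other’ category.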
CLC Microbial Genomics Module 1.6
Released on September 15, 2016
Updated for compatibility with CLC Genomics Workbench 9.5, Biomedical Genomics Workbench 3.5 and CLC Genomics Server 8.5.
CLC Microbial Genomics Module 1.5.1
Released on August 30, 2016
Bug fixes
- Fixed a bug that caused the tool Find Best Matches using K-mer Spectra to fail in some cases when run against a single reference genome.
- Fixed a bug that prevented users from saving the view settings of rarefaction plots.
CLC Microbial Genomics Module 1.5
Released on July 12, 2016
New features
- With the new tool Download Pathogen Reference Databases, users can now easily download prebuilt reference databases for typing of the following pathogens:
- Salmonella enterica
- Listeria monocytogenes
- Escherichia coli and Shigella
- Campylobacter jejuni
- Acinetobacter baumannii
- Klebsiella pneumoniae
- Custom reference databases for typing microbial isolates can be set up using the new tool Set Up Pathogen Reference Database.
- Annotating references in existing reference databases with metadata is also enabled by the new tool Set Up Pathogen Reference Database.
- Custom gene databases for antimicrobial resistance typing can be set up using the new tool Set Up Resistance Gene Database.
- Functionality to check microbial isolate samples for contamination and low quality has been added to the tool Find Best Matches Using K-mer Spectra.
- Statistical differential abundance analysis of taxonomic and functional entities across samples or groups of samples is enabled by the new tool Differential Abundance Analysis.
- Hierarchical clustering of both samples and features in abundance tables produced by OTU clustering or whole metagenome functional analysis is enabled by the new tool Create Heat Map for Abundance Table.
Improvements
- Taxonomic assignment to microbial isolate samples in databases downloaded by the Download and Set Up Pathogen Reference Database tools is now done to the species level, and not just genus level as it was previously.
- The Create K-mer Tree tool now includes a default K-mer tree layout that makes it easier to identify a suitable common reference in the tree.
- The Create SNP Tree tool now includes a default SNP tree layout that visualizes useful analysis results and serves as a good starting point to find your own favorite layout.
- The Create K-mer Tree and Create SNP Tree tools now accept input samples that are associated to multiple metadata tables when a Result Metadata Table is also supplied.
- The Find Best Matches using K-mer Spectra tool has been changed to use the Z-score rather than the number of matching k-mers to select best matches in order to remove a bias towards larger genomes.
- The Find Best Matches using K-mer Spectra tool has been changed to use both the forward and reverse strand of the supplied references to enable a more accurate best-match detection.
- In Stacked Bar Charts and Area Charts visualizations of abundance tables,
- samples can now be sorted according to their names or according to associated metadata.
- features (taxonomic or functional entities) can now be sorted according to their abundance or name.
- the “Other” feature category can now be hidden in both the plot and in the legend of the plot.
- samples and groups of samples can now be renamed by clicking their names in the side panel.
- In PCoA plots, samples and groups of samples can now be renamed by clicking their names in the side panel.
- In Alpha diversity plots, the look of each line (representing a sample) can now be configured based on the associated metadata.
- Alpha diversity plots now include a legend that can be set up based on the available metadata.
- In resistance gene databases, the metadata associated to each gene can now be viewed and edited in the table view.
- When a SNP tree is built based on input with no SNPs detected between three or more samples, a warning is now issued.
Bug fixes
Changes
- The Type A Single Species workflow has been renamed to Type a Known Species.
- The Re-map Samples to Specified Reference workflow has been renamed to Map to Specified Reference.
- The Type Among Multiple Species and Type a Known Species workflows will by default check for low quality and contamination.
- The Type Among Multiple Species and Type a Known Species workflows now output the best matching reference in the supplied reference database, not just the best matching reference in the database with an associated MLST type.
- All ready-to-use workflows have been moved to dedicated workflow folders in the Microbial Genomics Module folder in the toolbox.
- The Alpha Diversity tool now outputs a plot for each selected distance measure, not a single report containing all plots.
CLC Microbial Genomics Module 1.4
Released on July 12, 2016
New features
Improvements
- Taxonomic assignment to microbial isolate samples in databases downloaded by the Download and Set Up Pathogen Reference Database tools is now done to the species level, and not just genus level as it was previously.
- The Create K-mer Tree tool now includes a default K-mer tree layout that makes it easier to identify a suitable common reference in the tree.
- The Create SNP Tree tool now includes a default SNP tree layout that visualizes useful analysis results and serves as a good starting point to find your own favorite layout.
- The Create K-mer Tree and Create SNP Tree tools now accept input samples that are associated to multiple metadata tables when a Result Metadata Table is also supplied.
- The Find Best Matches using K-mer Spectra tool has been changed to use the Z-score rather than the number of matching k-mers to select best matches in order to remove a bias towards larger genomes.
- The Find Best Matches using K-mer Spectra tool has been changed to use both the forward and reverse strand of the supplied references to enable a more accurate best-match detection.
- In Stacked Bar Chart and Area Chart visualizations of abundance tables:
  - samples can now be sorted according to their names or according to associated metadata.
  - features (taxonomic or functional entities) can now be sorted according to their abundance or name.
  - the “Other” feature category can now be hidden in both the plot and the legend of the plot.
  - samples and groups of samples can now be renamed by clicking their names in the side panel.
- In PCoA plots, samples and groups of samples can now be renamed by clicking their names in the side panel.
- In Alpha diversity plots, the look of each line (representing a sample) can now be configured based on the associated metadata.
- Alpha diversity plots now include a legend.
- In resistance gene databases, the metadata associated with each gene can now be viewed and edited in the table view.
- When a SNP tree is built based on input with no SNPs detected between three or more samples, a warning is now issued.
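To illustrate why a Z-score can reduce the bias towards larger genomes mentioned above for the Find Best Matches using K-mer Spectra tool, here is a minimal Python sketch. It is not the module's implementation: the binomial null model, the background match probability `p_background`, and all function names are assumptions made purely for illustration. The sketch also collects k-mers from both the forward and reverse strand of a reference.

```python
from math import sqrt

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers_both_strands(seq, k):
    """Distinct k-mers found on the forward and the reverse strand."""
    seq = seq.upper()
    strands = (seq, reverse_complement(seq))
    return {s[i:i + k] for s in strands for i in range(len(s) - k + 1)}


def match_z_score(matches, n_ref_kmers, p_background):
    """Z-score of a match count under a binomial null (hypothetical model).

    Assumes each of the reference's n_ref_kmers k-mers matches the sample
    by chance with probability p_background.
    """
    expected = n_ref_kmers * p_background
    sd = sqrt(n_ref_kmers * p_background * (1.0 - p_background))
    return (matches - expected) / sd


# Made-up numbers: the large reference has more raw matches simply
# because it contains more k-mers, but its Z-score is lower.
small_z = match_z_score(matches=900, n_ref_kmers=1000, p_background=0.1)
large_z = match_z_score(matches=1500, n_ref_kmers=10000, p_background=0.1)
best = "small" if small_z > large_z else "large"
```

With these made-up numbers, ranking by raw match count would pick the larger reference, while ranking by Z-score picks the smaller one, whose matches are far in excess of what chance alone would explain.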
Bug fixes
Changes
- The Type A Single Species workflow has been renamed to Type a Known Species.
- The Re-map Samples to Specified Reference workflow has been renamed to Map to Specified Reference.
- The Type Among Multiple Species and Type a Known Species workflows will by default check for low quality and contamination.
- The Type Among Multiple Species and Type a Known Species workflows now output the best matching reference in the supplied reference database, not just the best matching reference in the database with an associated MLST type.
- All ready-to-use workflows have been moved to dedicated workflow folders in the Microbial Genomics Module folder in the toolbox.
- The Alpha Diversity tool now outputs a plot for each selected distance measure, not a single report containing all plots.
CLC Microbial Genomics Module 1.3.1
Released on May 10, 2016
Bug fixes
- Fixed a bug that caused result metadata tables to not be properly saved when they were updated as part of running a workflow.
- Adapted the “Download Bacterial Genomes from NCBI” tool to a new format in a file downloaded from NCBI.
Improvements
- Rewrote a misleading error message that appeared when the Download OTU Reference Database tool was not able to contact the online QIAGEN resources.
- Added GPU requirements to the System Requirements for viewing PCoA 3D plots.
CLC Microbial Genomics Module 1.3
Released on March 31, 2016
Bug fixes
- Fixed a bug in the De Novo Assemble Metagenome tool that caused excessive memory usage when using multiple input files.
Improvements
- Improved FeatureIDs in experiments generated using the “Convert Abundance Table to Experiment” tool.
- The name of the annotation column in experiments generated using the “Convert Abundance Table to Experiment” tool now depends on the type of the abundance table.
- Improved error messages and warnings in the wizard for the Build Functional Profile tool.
CLC Microbial Genomics Module 1.2.2
Released on May 10, 2016
Bug fixes
- Added a report output to the Add to Result Metadata Table tool. Please make sure to add this output to all workflows you run on a CLC Genomics Server setup to make them run through without errors.
- Fixed a bug that caused result metadata tables to not be properly saved when they were updated as part of running a workflow.
- Adapted the “Download Bacterial Genomes from NCBI” tool to a new format in a file downloaded from NCBI.
Improvements
- Rewrote a misleading error message that appeared when the Download OTU Reference Database tool was not able to contact the online QIAGEN resources.
- Added GPU requirements to the System Requirements for viewing PCoA 3D plots.
CLC Microbial Genomics Module 1.2.1
Released on March 31, 2016
Bug fixes
- Fixed a bug in the De Novo Assemble Metagenome tool that caused excessive memory usage when using multiple input files.
Improvements
- Improved FeatureIDs in experiments generated using the “Convert Abundance Table to Experiment” tool.
- The name of the annotation column in experiments generated using the “Convert Abundance Table to Experiment” tool now depends on the type of the abundance table.
CLC Microbial Genomics Module 1.2
Released on February 29, 2016
New features
- Functional profiling of whole metagenome datasets based on Pfam domains, GO terms and BLAST hits
- Whole metagenome de novo assembler
- Annotation of CDS with Pfam domains and GO terms
- Annotation of CDS with Best BLAST hits using predefined or custom databases
Improvements
- Swapped the Trim Sequences tool and the Optional Merge Paired Reads tool in the Data QC and OTU Clustering ready-to-use workflow in order to merge more identical amplicon reads. This may lead to different results in some analyses.
- Improved the tolerance of the Download Bacteria Genomes from NCBI tool towards unstable FTP connections with NCBI.
- Enabled graphical export of Bar Chart, Area Chart, Sunburst Chart and PCoA Chart visualizations of abundance tables.
- Added legends to Bar Chart and Area Chart visualizations of abundance tables.
- Improved the speed and compute resource requirements of the OTU Clustering tool.
- The OTU Clustering tool now reverse-complements reference OTUs when most reads map to the reverse strand.
- Improved the length of the trimmed reads output by the Fixed Length Trimming tool on datasets with a large read length standard deviation.
- The OTU Clustering tool now produces a summary report that can be used to evaluate the quality of the input data and the OTU clustering.
- The Optional Merge Paired Reads tool now produces a summary report.
- The Fixed Length Trimming tool now produces a summary report.
- Activated links to the manual from ready-to-use workflow wizards.
- Updated the UNITE database that is downloaded by the Download OTU Reference Database tool to the latest version.
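The reverse-complementing of reference OTUs mentioned above can be pictured with a small Python sketch. This is only an illustration, not the OTU Clustering tool's code: the function names, the "+"/"-" strand labels, and the reading of "most reads" as a strict majority are all assumptions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_reference(reference, read_strands):
    """Flip the reference if a strict majority of reads map to '-'.

    read_strands is a hypothetical list with one '+' or '-' entry per
    mapped read. Ties and empty lists leave the reference unchanged.
    """
    reverse_hits = sum(1 for strand in read_strands if strand == "-")
    if reverse_hits > len(read_strands) / 2:
        return reverse_complement(reference)
    return reference


# Two of three reads map to the reverse strand, so the reference is flipped.
flipped = orient_reference("AACG", ["-", "-", "+"])
# A tie leaves the reference as it is.
unchanged = orient_reference("AACG", ["+", "-"])
```

Orienting the reference this way means the reads and the reported OTU sequence end up on the same strand.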
Bug fixes
- Adapted the Download Bacteria Genomes from NCBI tool to a new structure of the NCBI FTP site.
- Fixed a bug in the Fixed Length Trimming tool that caused a wrong automatic length calculation when run on inputs with a very large number of reads.
- Fixed a bug in the Fixed Length Trimming tool, the Optional Merge Paired Reads tool and the Filter Samples Based on Number of Reads tool that caused the history entries of output from these tools to be inconsistent.
Changes
- Placed all tools in the Microbial Genomics Module into a single folder in the toolbox with subfolders ‘OTU Clustering’, ‘Typing and Epidemiology’, ‘Whole Metagenome Analysis’ and ‘General Tools’.
CLC Microbial Genomics Module 1.1
Released on October 15, 2015
New features
- Determination of MLST for NGS samples
- Identification of antimicrobial resistance genes
- Construction of SNP trees from NGS reads
- SNP tree variants differentiating between two sub-trees can be displayed easily
- Construction of K-mer trees from genomes and NGS samples
- Access sample metadata and analysis results in a table
- Metadata is automatically transferred to SNP trees and K-mer trees
- Three template workflows provided for routine typing
Improvements
- Added help buttons in all editors
- The Format Reference Database tool was improved to handle malformed input better
- Improved parameter descriptions and mouse-over texts in several places
Bug fixes
- Fixed a bug preventing usage of metadata with only 2 values in the Permanova and Convert to Experiment wizards
- Fixed a bug that caused all csv-files imported to the workbench to be imported as OTU abundance tables.
- The Chimera crossover cost parameter in OTU clustering now only takes integer values.
- Added a check to prevent the user from running “Reference based OTU clustering” without an “OTU database”.
Changes
- The Estimate Alpha and Beta Diversities workflow no longer outputs an alignment, because it was of no use to the user.