Latest improvements for QIAGEN CLC Genomics Workbench
QIAGEN CLC Genomics Workbench 24.0.2
Release date: 2024-08-22
Improvements
- The SAM, BAM and CRAM Mapping Files importer can import files containing paired reads with equivalent read groups that have different read group IDs.
- BLAST searches against the core_nt database can be launched using BLAST at NCBI.
- Import Tracks from File has been updated to also support version 100 of the COSMIC variation database.
- Calculations using Whole Genome Coverage Analysis now include the contribution of long reads (>= 5000 bp).
- Various minor improvements
Bug fixes
- Fixed an issue where sequences could not be retrieved from a BLAST result using the right-click option "Extract and Open" when the BLAST database being referred to was located in a CLC Server BLAST database location.
- Fixed the following issues affecting Differential Expression in Two Groups and Differential Expression for RNA-Seq when the "Downweight outliers" option was enabled and the experimental design was unbalanced in the following ways:
- When one group contained only a small fraction of the total number of samples, and that group behaved differently to others in the experiment, that group could be downweighted such that it was effectively excluded from the fitting process. This led to inaccurate differential expression results for comparisons involving that group.
- When one group contained just a single replicate, the analysis could fail with an error.
- Small adjustments have also been made to the "Downweight outliers" option, which may slightly improve its performance in larger experimental designs where no outliers are present.
- Fixed an issue causing the Workbench to freeze when running Normalize or Transform if the input to the tool was open in the View Area.
- Fixed an issue where the Workbench would freeze if settings in the Expression Browser Side Panel "Expression values" or "Grouping" palettes were repeatedly updated.
- Fixed an issue that could cause Low Frequency Variant Detection to run out of memory on RNA-Seq data when the data was being processed on a system with many cores.
- Fixed an issue where the header row in the Summary statistics table was missing in PDF exports of Map Reads to Reference reports.
- Fixed an issue where the action buttons for accepting or rejecting a certificate in the "Untrusted Certificate" dialog were not readily accessible.
- Fixed an issue where an error arose if the only element in a folder was deleted at the same time that new elements were being dropped into that folder.
- Various minor bug fixes
Reference data related
- Effective June 26, 2024, Download Pfam Database downloads Pfam 37.0. This update also affects downloads using earlier versions of the CLC Genomics Workbench.
Changes
The Java version bundled with CLC Genomics Workbench 24.0.2 is Java 17.0.12, where we use the JRE from the Azul OpenJDK builds.
Advanced notice
- The Sequence Representation right-click menu option in the Workbench Navigation Area is planned for removal in the 25.0 release. Functionality to base sequence display names on accession, Latin names or common names will remain available in the Preference settings and in the Sequence settings palette in the Side Panel of relevant data elements.
- Plugin note: The older tools distributed in the SignalP and TMHMM plugin, Signal Peptide Prediction (SignalP 4.1) and Transmembrane Helix Prediction (TMHMM 2.0), will not be included in the next major release of this plugin. No changes are planned with regards to distribution of the newer tools, Signal Peptide Prediction (SignalP 6.0) and Transmembrane Helix Prediction (DeepTMHMM).
QIAGEN CLC Genomics Workbench 24.0.1
Release date: 2024-03-12
Improvements
- Font related improvements affecting Windows setups, and particularly those with multiple screens connected, where one of them is high resolution.
- Axis ranges can be adjusted in the Volcano plot view of Statistical Comparison Table and Tracks.
- The SAM, BAM and CRAM Mapping Files importer can be used in workflows, either using the dedicated import element or using on-the-fly import.
- The Element Bio, MGI/BGI, PacBio Onso, and Singular importers accept FASTQ files with read numbers indicated using either _R1/_R2 or _1/_2 in the file names.
- The Element Bio, MGI/BGI, PacBio Onso, and Singular importers now skip problematic FASTQ files. Previously the import of all the files would fail if a single problematic file was encountered.
- In the template workflow Identify DNA Germline Variants, only the graphical report from QC for Sequencing Reads is used as input to Create Sample Report. Previously, the Supplementary Report was also provided as input, but information from that report was redundant.
- The template workflow RNA-Seq and Differential Gene Expression Analysis has been updated:
- It is now simpler to select reference data from any CLC File Location.
- Combine Reports has been added.
- More of the results are written to subfolders.
- The License Manager Settings dialog now makes clear that network licenses cannot be borrowed when a custom username is configured for connecting to a CLC Network License Manager.
- Various minor improvements
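The file name conventions mentioned above for the Element Bio, MGI/BGI, PacBio Onso, and Singular importers (_R1/_R2 or _1/_2) can be illustrated with a minimal Python sketch. This is not the importers' actual logic: the regular expression, the function name `pair_fastq_files` and the rule of using the last matching token are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pattern: a read-number token (_R1/_R2 or _1/_2) that is
# immediately followed by an underscore or a dot.
READ_TOKEN = re.compile(r"_R?([12])(?=[_.])")

def pair_fastq_files(names):
    """Group file names by sample key, returning {key: {read_number: name}}."""
    pairs = defaultdict(dict)
    for name in names:
        matches = list(READ_TOKEN.finditer(name))
        if not matches:
            continue  # no read number found; skipped in this sketch
        token = matches[-1]  # assume the last token is the read number
        # The key is the file name with the read-number token removed.
        key = name[:token.start()] + name[token.end():]
        pairs[key][int(token.group(1))] = name
    return dict(pairs)

result = pair_fastq_files(
    ["s1_R1.fastq", "s1_R2.fastq", "s2_1.fq.gz", "s2_2.fq.gz"]
)
```

Here both naming styles end up grouped under one key per sample, with read 1 and read 2 stored side by side.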
Bug fixes
- Fixed an issue where the Maximum Likelihood Phylogeny tool produced incorrect branch lengths, which could result in an inaccurate tree representation. This issue did not impact the topology of the tree.
- Fixed an issue where the Maximum Likelihood Phylogeny tool for protein alignments would return the initial tree rather than the maximum likelihood tree for certain protein substitution models.
- Fixed an issue that caused BLAST at NCBI to search the Prokaryota nt database when the Viruses nt database had been selected.
- Fixed an issue that caused Copy Number Variant Detection (CNVs) to fail with the error "Controls not compatible with target regions" when there were targets on a circular chromosome that was not the last chromosome analyzed.
- Fixed an issue that caused Amino Acid Changes to incorrectly report that some amino acid codons overlapped more than one variant. This happened when two adjacent amino acid codons each overlapped one variant, and the variants were within two base pairs of each other.
- Fixed an issue causing Create Sample Report and Combine Reports to sometimes fail when run using both a QC graphical report and QC supplementary report as input.
- Fixed an issue causing the Expression Browser plot to incorrectly draw the connection between pairs of groups where a gene is differentially expressed and one of the groups contained only one sample.
- Fixed an issue where the name of a sequence list created using the MGI/BGI importer with the "Join lanes" option checked would not include the sample ID if the sample ID in the FASTQ file name was placed after the lane information.
- Fixed an issue where adding a sequence to some stand-alone read mappings containing Sanger sequence data could lead to an error when Compactness was set to 'Not compact' and the trace data was visible in the mapping.
- Various minor bug fixes
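The Amino Acid Changes fix above concerns two adjacent codons that each overlap one variant, where the variants lie within two base pairs of each other. A minimal Python sketch of that situation follows. It assumes a forward-strand coding region without introns, 0-based coordinates and single-base variants; the function name `variants_per_codon` is hypothetical and this is not the tool's implementation.

```python
from collections import defaultdict

def variants_per_codon(cds_start, variant_positions):
    """Map each codon index to the variant positions that fall inside it.

    Simplifying assumptions: forward strand, no introns, 0-based
    coordinates, single-base variants.
    """
    codons = defaultdict(list)
    for pos in variant_positions:
        codon_index = (pos - cds_start) // 3
        codons[codon_index].append(pos)
    return dict(codons)

# Two variants one base apart that sit on either side of a codon boundary.
hits = variants_per_codon(cds_start=100, variant_positions=[102, 103])
```

In this example each of the two adjacent codons contains exactly one variant, so neither should be reported as overlapping more than one variant.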
Reference data related
Under the Download Genomes tab of the Reference Data Manager, the reference sequence "Homo sapiens - hg38_no_alt_analysis_set" is available along with associated variant data:
- Dbsnp (common) variants - dbSNP common version 151
- Dbsnp variants - dbSNP version 156
- Clinical associated variants - the most recent monthly release available
Changes
The Java version bundled with CLC Genomics Workbench 24.0.1 is Java 17.0.9, where we use the JRE from the Azul OpenJDK builds.
Plugin notes
Changes in CLC Genomics Workbench 24.0.1 address issues affecting functionality provided by the CLC Microbial Genomics Module:
- Fixed an issue where opening the Sunburst view of an abundance table using an Intel-based Mac caused the Workbench to shut down.
Advanced notice
The Sequence Representation functionality, which allows the display name for sequence elements in the Navigation Area to be based on the accession, Latin name or common name of that sequence, has been marked as legacy functionality. Setting the display name for sequence elements in the Navigation Area to the accession, Latin name or common name in a sequence record using the Sequence Representation functionality, via a right-click menu or in Preference settings, will be retired in a future release.
QIAGEN CLC Genomics Workbench 24.0
Release date: 2024-01-09
New features and improvements
Summary reports
Create Sample Report and Combine Reports have been substantially improved.
Report Content
- A summary section is included, providing an overview of potentially low quality samples.
- More sections from reports generated by Trim Reads and QC for Targeted Sequencing are included.
- Reports generated by Homology Based Cloning are supported.
- Improved content layout in sample reports.
Configuration
- Content is customizable. Sections to include and their order can be specified.
- Quality assessment criteria in Create Sample Report can be assigned traffic light colors. Combine Reports uses this information, providing a way to quickly assess overall sample quality in combined sample reports.
- Create Sample Report has support for additional quality control summary items for reports produced by QC for Sequencing Reads, QC for Read Mapping, QC for Targeted Sequencing, and RNA-Seq Analysis.
- Configurations can be re-used in future runs of Create Sample Report and Combine Reports.
- Modify Report Type: A tool for changing the type of a report, affecting where the contents of that report will be included in a sample or combined report.
- Sample names to use in sample reports are configurable.
Changes
These improvements have resulted in the following changes relative to earlier versions:
- When reports that are not supported by Create Sample Report and Combine Reports are provided as input, these tools will fail. Previously, the tools would run, but would ignore the unsupported reports. In practice, this change is most likely to be noticed in the context of workflows where output channels for unsupported reports have been connected to the input channel of a Create Sample Report or Combine Reports workflow element. Unsupported reports cannot be entered as input in the tools' launch wizards.
- Sample reports and combined reports are no longer supported as input to Create Sample Report.
- Information from reports from Map Reads to Reference, Map Reads to Contigs and Map Bisulfite Reads is now presented in separate sections. Previously that information was included in the "Read mapping summary" section.
- Renaming within sample reports and combined reports
- "Methylation levels" is now "Call methylation levels".
- "Duplicated mapped reads" is now "Remove duplicate mapped reads".
- "Variants" is now "Create variant track statistics report".
- "QC summary" is now "Quality control" (combined reports only).
Due to these updates, sample reports and combined reports created in earlier versions of the software should not be used as input to Combine Reports in CLC Genomics Workbench 24.0 or above.
Workflows
- QIAseq Panel Analysis Assistant: Provides access to workflows for analyzing data generated using QIAseq panels and kits, as well as associated functionality, such as downloading reference data and creating customized copies of workflows.
- Two new control flow elements:
- Branch on Sequence Count - used to control the downstream processing of a sequence list depending on the number of sequences in that list.
- Branch on Sample Quality - used to control the downstream processing of any data element based on quality criteria available in sample reports.
- The workflow build id is included in the Workflow details section of the History view for data elements generated using installed workflows. Previously only the workflow name and workflow version were reported.
- Workflow inputs can be preconfigured with files stored on AWS S3. This is of particular relevance when using reference data stored on AWS S3.
- Options have been added to jump to the workflow element at the source or destination of a connection between elements.
Import and export
New import and export functionality
- Element Bio: Imports fastq format files produced by Element Biosciences.
- PacBio Onso: Imports fastq format files produced by PacBio Onso.
- Singular: Imports fastq format files produced by Singular Genomics.
- Ultima: Imports CRAM format files produced by Ultima Genomics.
- SAM/BAM/CRAM Mapping Files: CRAM import functionality added to the earlier SAM/BAM Mapping Files importer.
- Read mappings can be exported to CRAM format.
- Data in public AWS S3 buckets can be accessed.
Other import and export improvements
- All importers of fastq format files now annotate sequences with UMIs if UMI information is detected in read headers.
- The Illumina importer supports fastq format files with more than 2 billion reads. Such files are imported into multiple, smaller sequence lists.
- The MGI/BGI importer is more flexible in the ways it can determine the files to pair together when importing paired reads.
- The MGI/BGI importer supports joining lanes.
- SAM and BAM files can be imported from AWS S3 buckets.
- Drag-and-drop can be used for selecting files in import tools.
- When exporting heterozygous insertions or deletions to VCF as symbolic alleles, Export VCF no longer creates a non-symbolic VCF line for the reference allele.
- Export history to PDF includes information about the workflow that produced the data element.
- Non-CLC format files can be directly saved to disk from the Navigation Area using the "Save to disk..." option in the right-click menu or by dragging and dropping from the Navigation Area to a file browser.
- Non-CLC format files in a CLC File Location can be opened in a relevant program by dragging them from the Navigation Area to a program icon in a toolbar, or similar, on systems that support this. This adds to the existing functionality where double clicking on such files in the Navigation Area leads these to be opened in a relevant program on systems that support this.
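The item above about importing very large fastq files into multiple, smaller sequence lists can be pictured as chunking a stream of records. The Python sketch below only illustrates that general idea. The function name `chunked` and the chunk size are hypothetical, and the size of the lists actually produced by the importer is not stated in these notes.

```python
from itertools import islice

def chunked(records, max_per_list):
    """Yield successive lists holding at most max_per_list records each."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, max_per_list))
        if not chunk:
            return  # input exhausted
        yield chunk

# Seven toy records split into lists of at most three records.
parts = list(chunked(range(7), 3))
```

Because the input is consumed lazily, the same pattern works for streams that are too large to hold in memory at once.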
Usability
- A View Settings menu has replaced the "Save View" functionality at the bottom of Side Panels of elements open for viewing. The updated functionality makes it simple to save, apply and manage view settings for elements and element types.
- The font size of the Navigation Area, Toolbox tab and Favorites tab can be increased and decreased.
- A new Track List can be created by selecting track elements in the Navigation Area and dragging them onto a track based on a compatible reference genome that is open in Track view.
- Search functionality is available in the Reference Data Manager.
Table related
- Column order can be adjusted by moving the corresponding column names up or down in the Side Panel. Previously re-ordering could be done only by dragging a column to the desired location within the table view.
- The column order for most table types can be saved and applied as view settings. Previously, column order could be adjusted when viewing tables, but the revised order could not be saved for later use or for use with other tables of the same type.
- Sets of criteria used to filter tables can be saved as Filter Sets. These can be reapplied to other tables in a single click, or exported and imported, for sharing purposes.
- When exporting an element with no contents to Excel format (.xlsx, .xls), the sheet created contains column headers. Previously, column headers were not included in this case.
- Expression data tables containing more than 1 million rows can now be sorted.
BLAST
- The list of databases available using BLAST at NCBI has been expanded, including the addition of the experimental taxonomic nt databases 'Eukaryota nt (nt_euk)', 'Prokaryota (bacteria and archaea) nt (nt_prok)' and 'Viruses nt (nt_viruses)'.
- Spaces in the names of BLAST database locations and folders in the path are supported.
Reference data related
The following improvements refer to data available via the Reference Data Manager:
- In Download Genomes, “Dbsnp variants” for Homo sapiens hg19 and hg38 has been updated from dbSNP version 151 to 156.
- Under the QIAGEN Sets tab:
- The following Reference Data Elements have been added:
- Version refseq_GRCh38.p14_no_alt_analysis_set of RefSeq Genes, CDS and mRNA elements.
- Version 20231112_hg38_no_alt_analysis_set for Clinvar.
- Version 20231009_hg38_no_alt_analysis_set for Gene Ontology.
- Version dbsnp_common_v151_ucsc_hg38_no_alt_analysis_set for dbSNP Common. Note that this contains variants on the alt contigs, where the earlier versions 151_refseq_hg38_no_alt_analysis_set, which was based on NCBI's dbSNP Common, and 151_ensembl_hg38_no_alt_analysis_set did not.
- Version dbsnp_common_v151_ucsc_hg19 for dbSNP Common.
- Reference Data Sets referring to Reference Data Elements listed above have been updated to refer to the newly added versions.
- The RNA trim adapter list included in multimodal Reference Data Sets has been updated.
- Items under the Previous Reference Data Sets tab and Previous Reference Elements tab from earlier releases are no longer available for download. Data that has already been downloaded is not affected. It will still be listed and can still be deleted using functionality in the Reference Data Manager.
Other new features and improvements
- Custom color gradients can be defined, including specifying the type and number of boundaries in the gradient, and colors to use at those boundaries.
- The Volcano plot view of Statistical Comparison Table and Tracks supports interactive customization for generation of publication-ready figures, with features' coloring determined by p-values and fold changes, including fading of non-significant features and flexible color gradient definition.
- Reads can be extracted from read mappings based on the orientation they were mapped in using Extract Reads, Create Reads Track from Selection and Extract from Selection.
- Reads can be extracted from read mappings as broken pairs when only one read of a pair matches the extraction criteria using Extract from Selection, bringing this tool's option in line with the options available in Extract Reads and Create Reads Track from Selection. The organization of the options in the Extract from Selection wizard has been updated accordingly.
- The wizard layout and options for Filter on Custom Criteria have been improved.
- Filter criteria configured in Filter on Custom Criteria can be re-used in future runs of the tool.
- Annotate with Nearby Information, previously named Annotate with Nearby Gene Information, can use any annotation track for annotating input annotation tracks.
- Substantial speed improvements for Detect and Refine Fusion Genes when the "Detect with novel exon boundaries" option is enabled and the reference sequence contains thousands of chromosomes.
- In variant tracks, replacements consisting of a combination of deletions and SNVs show the SNVs aligned to the right and the deletions aligned to the left, consistent with the general representation of SNVs and deletions in read mapping tracks. Previously, such replacements in variant tracks had SNVs aligned to the left and deletions aligned to the right.
- The names of outputs generated by Homology Based Cloning now contain the name of each of the sequences included in the cloning reaction.
- Improvements to the placement of amino acids in the Amino Acid tracks produced by Amino Acid Changes.
- A warning is shown when read mappings containing Oxford Nanopore or PacBio long reads are provided as input to Fixed Ploidy Variant Detection, Low Frequency Variant Detection or Basic Variant Detection. These tools are not recommended for use with such data.
- Read mapping tracks containing long reads (> 10 kbp) load faster and are more responsive. This update can affect the order in which reads are presented compared to when opened using earlier versions of the software.
- Speed improvements when working with large numbers of chromosomes as references (e.g. hundreds of thousands). Examples of tools affected include Convert to Tracks, Create Mapping Graph, and Identify Graph Threshold Areas.
- Outlier calculation has been improved for Combine Reports to be insensitive to rounding. As a result, existing combined reports that contain box plots and outliers may show fewer outliers.
- Large reports open faster.
- When connected to a CLC Server, subfolders of CLC Server File System Locations that you have access to are listed at the top of the containing folder in the Workbench Navigation Area. Previously subfolders were not ordered according to access level.
- The history for elements created using an external application includes the version of the external application used.
- CLC File Locations can be removed and re-indexed when the Workbench is being run in Viewing Mode.
- Fixed a rare issue that would result in an error dialog being shown during a drag and drop operation in the Navigation Area.
- New policy property: 'run_on_workbench_when_server_is_available'. When set to 'deny', server-enabled tools and workflows cannot be run locally on the CLC Genomics Workbench when it is connected to a CLC Genomics Server. The default is set to 'allow'.
- Various minor improvements
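The Combine Reports item above says that outlier calculation is now insensitive to rounding. These notes do not describe how this is done. The Python sketch below shows one common way to make a box plot fence comparison tolerant of floating-point rounding, and is offered only as an assumption-laden illustration: the function name `is_outlier`, the fence values and the tolerance are all hypothetical.

```python
import math

def is_outlier(value, lower_fence, upper_fence, rel_tol=1e-9):
    """Return True if value lies outside the fences.

    Values that equal a fence up to a small relative tolerance are
    treated as lying on the fence, so they are not flagged.
    """
    on_lower = math.isclose(value, lower_fence, rel_tol=rel_tol)
    on_upper = math.isclose(value, upper_fence, rel_tol=rel_tol)
    if on_lower or on_upper:
        return False
    return value < lower_fence or value > upper_fence

on_fence = 0.1 + 0.2          # slightly above 0.3 in floating point
naive = on_fence > 0.3        # a strict comparison flags the value
tolerant = is_outlier(on_fence, 0.1, 0.3)
```

A strict comparison flags the value that sits on the fence, while the tolerant version does not, which is consistent with fewer outliers being reported.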
Bug fixes
- Fixed an issue that caused Annotate with Repeat and Homopolymer Information to fail when annotating variants in the second to last position on a chromosome.
- Fixed an issue that caused Annotate with Repeat and Homopolymer Information to not annotate variants, when a homopolymer or repeat spanned the origin of a circular reference sequence.
- Fixed an issue in the QC for Targeted Sequencing report section "Minimum coverage of target regions positions", where the reported percentage of positions with a certain coverage or higher only included positions where coverage was greater than threshold values. Now positions that have a coverage that is equal to or greater than threshold values are included.
- Fixed a rare issue that caused Fixed Ploidy Variant Detection and Low Frequency Variant Detection to incorrectly assign variants that should have been heterozygous as homozygous. This happened when the quality scores for the different nucleotides at a variant position had non-overlapping distributions.
- Fixed an issue where references that contained * and/or = within their name were skipped when importing SAM or BAM mapping files.
- Fixed an issue where checking the "Create subfolders per batch unit" option in the MGI/BGI importer had no effect.
- Fixed an issue that could cause VCF export to fail when exporting fusion tracks containing fusions that were not annotated as PASS in the filter column.
- Fixed an issue where paired reads with unaligned ends overlapping within an insertion were not shown in different colors for forward and reverse reads after the option to show the strands was selected in the Side Panel view settings.
- Fixed an issue where the axis scale range for plots in reports exported to PDF format could sometimes differ from the range seen when viewing that plot in a CLC Workbench.
- Fixed an issue that resulted in some box plots from combined reports not being included in reports when exported to PDF.
- Fixed an issue where infinite values were included in report plots when the report was exported to PDF format.
- Fixed an issue where providing incomplete metadata (e.g. a column name missing) when launching a workflow containing an Iterate element would lead to the analysis stalling, instead of failing with an error.
- Sorting Local Search results by size now sorts them in numerical order. Previously, the sorting was alphabetical.
- Fixed an issue in Download BLAST Databases that caused part of the description of the available databases to be hidden in the launch wizard.
- Fixed an issue affecting the Illumina and MGI/BGI importers where, when all the read files provided as input were compressed with zip, the "Paired reads" option was disabled.
- Fixed an issue in the Workflow Manager where multiple workflow installer files (.cpw) could be selected at the same time. Selection is now restricted to a single workflow file per installation action.
- Fixed an issue where retired Reference Data Elements could be listed under the QIAGEN Sets tab of the Reference Data Manager even though they were no longer available for download.
- Various minor bug fixes
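The QC for Targeted Sequencing fix above changes the comparison from "greater than" to "equal to or greater than" the threshold. The difference can be shown with a minimal Python sketch. The function name `fraction_at_or_above` and the toy coverage values are hypothetical, and this is not the tool's implementation.

```python
def fraction_at_or_above(coverages, threshold):
    """Fraction of positions whose coverage is >= threshold."""
    if not coverages:
        return 0.0
    return sum(1 for c in coverages if c >= threshold) / len(coverages)

coverages = [5, 10, 10, 20]   # toy per-position coverage values
inclusive = fraction_at_or_above(coverages, 10)
# The earlier behavior, counting only positions strictly above the threshold.
exclusive = sum(1 for c in coverages if c > 10) / len(coverages)
```

With these values the inclusive count covers three of four positions, whereas the strict comparison covers only one, so positions sitting exactly at the threshold account for the difference.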
Changes
- The following tool names have been updated to better reflect their functionality:
- The SAM/BAM Mapping Files importer is now called SAM/BAM/CRAM Mapping Files.
- The PacBio importer is now called PacBio Long Reads.
- Annotate with Nearby Gene Information is now called Annotate with Nearby Information.
- The following tools have been moved in the Toolbox:
- Create Sample Report is under Utility Tools | Reports. It was previously under Quality Control.
- Combine Reports is under Utility Tools | Reports. It was previously under Quality Control.
- Annotate with Nearby Information is under Utility Tools | Annotate and Filter. It was previously under Epigenomics Analysis.
- De Novo Assembly no longer accepts PacBio and PacBio HiFi long reads. For de novo assembly of long reads, use tools from the Long Read Support plugin.
- Map Reads to Reference no longer uses a specialised mapping algorithm when mapping PacBio reads. For this data type, we recommend using Map Long Reads to Reference, available from the Long Read Support plugin.
- The SRA BLAST database can no longer be used with BLAST at NCBI, because NCBI no longer supports searching against that database via their API.
- BLAST has been upgraded to BLAST+ 2.14.0. BLAST+ changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- The SRA toolkit has been updated to version 3.0.2.
- The Java version bundled with CLC Genomics Workbench 24.0 is Java 17.0.8.1, where we use the JRE from the Azul OpenJDK builds.
- Dedicated installers for Intel and ARM-based Mac systems are available.
Functionality retirement
- The options "Minimum read count fusion gene table" and "Create fusion gene table" have been removed from RNA-Seq Analysis. For fusion detection, we recommend Detect and Refine Fusion Genes.
The following tools have been retired:
- QIAGEN GeneReader importer (Legacy)
General information
Important: For network licenses, CLC Network License Manager 5.5.3 or above, when available, is needed. See the latest improvements page for release notes. Upgrading the software is described in the manual. Get the CLC Network License Manager installers for the latest version via the website. The version running on a system can be found in the VERSION.txt file in the installation folder of the CLC Network License Manager.
Plugin notes
This section includes information not included elsewhere. Please see the dedicated plugin latest improvements pages for details about improvements to other plugins. Links to plugin latest improvements pages are provided on plugin product webpages.
- Tools for analyzing long reads are available from the Long Read Support plugin.
- Various improvements to the Navigation Tools plugin, including bookmarking items in CLC Server File System Locations, renaming bookmarks, and easier navigation to bookmarked items.
- Tools delivered by the Vector NTI import plugin are now legacy. This plugin will be retired in future.
Advanced notice
The Sequence Representation right-click menu option in the Workbench Navigation Area is planned for removal in the 25.0 release. Functionality to base sequence display names on accession, Latin names or common names will remain available in the Preference settings and in the Sequence settings palette in the Side Panel of relevant data elements.