Latest improvements for QIAGEN CLC Genomics Server
QIAGEN CLC Genomics Server 25.0
Release date: 2024-12-03
Server specific
New features
Expanded support for encrypted communication and certificate management
- HTTPS communication between the CLC Genomics Server and its client software is supported out of the box using a self-signed certificate for new installations.
- HTTPS communication between a master CLC Genomics Server and its execution nodes is supported.
- Tools are available in the web administrative interface for:
- Configuring and monitoring communication status between a master and its execution nodes.
- Viewing and managing certificates.
- Updating the Tomcat configuration file.
- Updating self-signed certificates.
- Information about expired certificates and certificates within 30 days of expiring is available:
- In the new "Trusted certificates validity" section of the diagnostics report generated by the check setup tool.
- In the new Certificate Management tool. Expiry dates that have already passed, or that fall within 30 days of the present date, are shown in red text.
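The 30-day expiry rule described above can be illustrated with a minimal Python sketch. This is not the server's implementation: the function name `expiry_status`, the status labels, and the choice to treat the 30-day boundary as inclusive are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Warning window stated in the release notes.
WARNING_WINDOW = timedelta(days=30)


def expiry_status(expiry: date, today: date) -> str:
    """Classify a certificate expiry date (hypothetical helper).

    Assumption: a certificate expiring exactly 30 days from today
    still counts as "expiring soon".
    """
    if expiry < today:
        return "expired"
    if expiry - today <= WARNING_WINDOW:
        return "expiring soon"
    return "valid"


if __name__ == "__main__":
    today = date(2024, 12, 3)
    # Hypothetical certificate names and dates, for illustration only.
    for name, expiry in [
        ("old-node", date(2024, 11, 1)),
        ("master", date(2024, 12, 20)),
        ("new-node", date(2025, 6, 1)),
    ]:
        print(name, expiry_status(expiry, today))
```

A script like this could be used to double-check expiry dates copied from the diagnostics report; the first two statuses correspond to the dates the Certificate Management tool shows in red.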
Integrated support for analysis automation configuration
When logged into the web client as an administrative user, a new tab is present under Extensions called Analysis automation. When the Analysis Automation Server Plugin, available via QIAGEN Discovery Bioinformatics Services, is installed, this tab contains the settings needed to configure automated submission of analyses as sample data becomes available from a sequencing machine.
Improvements
- The organization of the Server settings area under the Job Processing tab has been improved.
- Create Pairwise Comparison and Proteolytic Cleavage are available on the CLC Genomics Server. Previously these could only be run on a CLC Workbench.
CLC Genomics Cloud related
The improvements in this section are relevant when the CLC Cloud Server Plugin is installed and access to an AWS account with CLC Genomics Cloud infrastructure has been configured.
- The organization of the CLC Genomics Cloud tab contents in the web administrative interface has been improved, and the EC2 instance type and AWS Batch queue name are now included.
- Separate drop-down menus for the AWS Connection and AWS Batch queue are presented when creating or editing cloud presets. Previously, there was a single drop-down list with each relevant combination.
- Under Global Permissions | Cloud presets, the AWS Connection name is included beside the cloud preset name.
Bug fixes
- Various minor bug fixes
Shared with CLC Workbenches
New tools and workflows
Long read handling
New tools and template workflows for analyzing long, next-generation sequencing reads:
- Import Oxford Nanopore Reads
- Map Long Reads to Reference
- Structural Variant Caller for Long Reads
- RNA-Seq Analysis for Long Reads
- De Novo Assemble Long Reads
- Polish Contigs with Reads
- De Novo Assemble Long Reads and Polish with Short Reads template workflow
The tools listed above were formerly distributed in the Long Read Support plugin. Improvements and bug fixes made relative to earlier releases:
- De Novo Assemble Long Reads
- Hifiasm has been updated to version 0.19.9. This third-party de novo assembler is used for assembling PacBio HiFi reads. Results may consequently differ compared with results from earlier versions.
- Raven has been updated to version 1.8.3. This third-party de novo assembler is used for assembling Oxford Nanopore or PacBio non-HiFi reads. Changes to polishing within the tool now lead to references with fewer insertions when compared to the reads.
- In the assembly graph output, hovering the mouse cursor over a contig reveals a tooltip containing the contig's name and length.
- Map Long Reads to Reference
- Reads that wrap around a chromosome more than once are now counted as Unmapped reads and reported as such in the report. Previously, these reads were silently ignored.
- In the stand-alone read mapping output, individual read mappings are now assigned the suffix '_mapping', instead of ' mapping'. This change improves compatibility with downstream tools.
- minimap2 has been upgraded to version 2.28. This third-party mapper is used by the following tools: Map Long Reads to Reference and RNA-Seq Analysis for Long Reads. Results may thus differ compared with results from earlier versions.
- For RNA-Seq Analysis for Long Reads, the log file includes the number of reads that wrap around a chromosome more than once. As in earlier versions, these reads count towards expression but are not included in the reads track output.
- Polish with Reads has been renamed to Polish Contigs with Reads to clarify its intended purpose.
- Fixed an issue causing Structural Variant Caller for Long Reads to fail when a stand-alone read mapping was provided as input.
Other new functionality
- Collapse Overlapping Annotations collapses overlapping annotations in an annotation track into a single annotation.
- Resize Annotations allows adjustment of the 5' and/or 3' positions of annotations in an annotation track.
- Remove Information from Track refines annotation, expression, statistical comparison, and variant tracks by keeping or removing selected information.
- Create Report from Table creates a report based on the content of the table view of a data element. The information to include in the report is configurable. This tool was formerly distributed in the CLC Microbial Genomics Module.
- Create Sample Level Heat Map for RNA-Seq creates a heat map of sample distances from RNA-seq data.
- Import Expression Data imports RNA-seq expression values from Excel, CSV or TSV files. This tool was formerly distributed in the Ingenuity Pathway Analysis plugin.
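To illustrate the general idea behind collapsing overlapping annotations, here is a minimal interval-merging sketch in Python. It is a textbook approach, not the tool's actual algorithm. It assumes annotations can be reduced to `(start, end)` pairs with inclusive ends, and it ignores strand, annotation type, and names, which the real tool may take into account.

```python
def collapse_overlapping(intervals):
    """Merge overlapping (start, end) intervals into single intervals.

    Assumptions: ends are inclusive, so two intervals overlap when they
    share at least one position; directly adjacent intervals such as
    (1, 5) and (6, 9) are left separate.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


if __name__ == "__main__":
    print(collapse_overlapping([(100, 200), (150, 300), (400, 500)]))
    # prints [(100, 300), (400, 500)]
```

Sorting first means each interval only needs to be compared with the most recently merged one, which keeps the sketch to a single pass.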
Improvements
Detect and Refine Fusion Genes
Updates to this tool are likely to result in differences in the identified fusions compared to results from earlier versions.
- New options allow filtering detected fusions using lists of genes and/or fusions. This can be useful for removing known false positives, or for restricting fusion detection to genes or fusions of interest.
- Fusions are now supported by fusion spanning reads in addition to fusion crossing reads. Fusion spanning reads are included in the p-value and Z-score calculations.
- All fusions identified in the detection step can be included in the fusion WT track output. This can be useful for:
- Investigating why certain fusions have been filtered away before the refinement step.
- Identifying possible fusions based on paired reads that are mapped as broken pairs, where the breakpoint location could not be determined.
- Exon skippings and novel exon boundaries are no longer detected by default.
- Options have been renamed and reorganized in various wizard steps to better reflect their functionality.
- The following options have been removed:
- Ignore fusions of overlapping genes on opposite strands. We recommend using the new filter feature.
- Only use fusion primer reads. We recommend filtering the reads using Filter on Custom Criteria prior to running Detect and Refine Fusion Genes.
- Maximum distance for broken pair fusions. This option had no impact on results.
- The naming patterns for the tool's outputs have been updated.
- The table view of fusion tracks has been improved:
- It includes an "IPA gene view" column containing links out to QIAGEN Ingenuity Pathway Analysis when these are available, providing additional information about the fusions.
- The columns have been renamed and rearranged.
- The default view has the most typically used columns visible, with other columns initially hidden.
- The Gene column has been removed.
- In the report, the "Discarded base breakpoints" column in the "Unaligned Ends" table has been renamed to "Discarded breakpoints".
Performance
- The runtimes of Copy Number Variant Detection (Targeted), QC for Targeted Sequencing, QC for Read Mapping and QC for Sequencing Reads have been substantially improved.
- The speed of searching and filtering variant tracks has been improved.
- The speed of Annotate with Exon Numbers, Annotate with Overlap Information, and Filter Based on Overlap has been improved.
- Maximum Likelihood Phylogeny uses less memory for large alignments than previously. Memory savings are greatest for alignments of a few long sequences.
Import
- The Tracks importer for GFF3 format files:
- Disallows mismatched chromosome lengths between the file and the provided reference.
- Supports all chromosome aliases as defined by UCSC.
- Uses the Sequence Ontology version 2024-06-05 for identifying gene-like and transcript-like annotation types.
- Includes pseudogenes in the (Gene) aggregated track.
- VCF import
- Supports all chromosome aliases as defined by UCSC.
- DUP:TANDEM symbolic alleles are imported as insertions in the variant track output.
- Three FASTQ header formats containing UMI information are now supported.
- When importing GenBank format files using Standard Import, ncRNA and rRNA annotations are named using information from one of the following qualifiers, considered in this order: "gene", "locus_tag", "product", "protein_id", "transcript_id", "note". Previously, they were always named using information from the "note" qualifier.
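The qualifier priority order described for ncRNA and rRNA annotations can be sketched as a simple first-match lookup. This Python snippet is illustrative only: the function name `annotation_name` and the dictionary representation of qualifiers are assumptions, and returning `None` when no listed qualifier is present is a placeholder, since the release notes do not say what the importer does in that case.

```python
# Qualifier order as listed in the release notes.
QUALIFIER_ORDER = (
    "gene", "locus_tag", "product", "protein_id", "transcript_id", "note",
)


def annotation_name(qualifiers):
    """Return the value of the first non-empty qualifier in priority order.

    `qualifiers` is a hypothetical dict mapping qualifier names to values.
    Returns None if none of the listed qualifiers is present (assumption).
    """
    for key in QUALIFIER_ORDER:
        value = qualifiers.get(key)
        if value:
            return value
    return None


if __name__ == "__main__":
    # "locus_tag" wins over "note" because it comes earlier in the order.
    print(annotation_name({"note": "16S ribosomal RNA", "locus_tag": "b0201"}))
    # prints b0201
```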
Reports
- Combine Reports and Create Sample Report include options for including the number and percentage of mapped and unmapped bases from Map Reads to Reference reports.
- The JSON exporter includes the passed/uncertain/failed status of quality conditions from sample and combined reports.
- QC for Sequencing Reads reports the percentage of reads with average quality higher than 20, 25, 30 and 35. The reported values can be used as QC thresholds in Create Sample Report.
- In the results report from Copy Number Variant Detection (Targeted), the genome and chromosome plots have been improved, including:
- Updates to axis labels.
- An improved color scheme.
- Coloring CNVs according to whether they are gains (red) or losses (blue).
- Values with decimals in the Trim Reads report are now always reported to 2 decimal places.
- The shade of red used in report plots has been adjusted to make it easier to discern from other colors.
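As a rough illustration of the new QC for Sequencing Reads metric, the following Python sketch computes the percentage of reads whose average quality exceeds each threshold. It rests on assumptions: "average quality" is taken to be the arithmetic mean of a read's Phred scores, "higher than" is treated as a strict comparison, and the function name `percent_reads_above` is hypothetical. The tool's exact definition may differ.

```python
# Thresholds listed in the release notes.
THRESHOLDS = (20, 25, 30, 35)


def percent_reads_above(read_qualities, thresholds=THRESHOLDS):
    """Percentage of reads with mean quality strictly above each threshold.

    `read_qualities` is a list of per-read lists of Phred scores.
    Empty reads are skipped (assumption).
    """
    means = [sum(q) / len(q) for q in read_qualities if q]
    if not means:
        return {t: 0.0 for t in thresholds}
    return {
        t: 100.0 * sum(m > t for m in means) / len(means)
        for t in thresholds
    }


if __name__ == "__main__":
    # Four toy reads with mean qualities 30, 22, 38 and 12.
    reads = [[30, 30, 30], [20, 22, 24], [36, 38, 40], [10, 12, 14]]
    print(percent_reads_above(reads))
    # prints {20: 75.0, 25: 50.0, 30: 25.0, 35: 25.0}
```

Values computed this way could be compared against the QC thresholds configured in Create Sample Report.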
Other new features and improvements
- Annotate with Repeat and Homopolymer Information
- The repeat and homopolymer detection has been improved. This may lead to differences in results compared to those from earlier versions.
- The reference sequence is tested for homopolymers and repeats both 5' and 3' of a variant. Previously, the reference sequence was only tested to the 3' side of a variant. When different homopolymers are found on both sides of a variant, information about the longest of these is kept. The same is true for repeats detected on both sides of a variant. When a homopolymer is found on one side of a variant and a repeat on the other, information about both is retained.
- Length and sequence information about homopolymers and repeats is added to annotations in the variant track output.
- The maximum number of mismatches allowed in a homopolymer/repeat can now be specified when launching the tool.
- QC for Targeted Sequencing
- Mappings containing long reads are processed efficiently.
- Information about coverage from broken pairs and non-specific reads is included in the per-region statistics track.
- Both mean and median coverage are included in the gene coverage track. Previously, only the mean was reported, but it was mislabelled as the median (see the Bug fixes section below).
- QC for Sequencing Reads can efficiently process long reads.
- Filter Based on Overlap includes new options for keeping or removing parts of annotations in an annotation track that are also present in other tracks. Existing options have been renamed to better reflect their functionality.
- Filter on Custom Criteria accepts Sequence Lists as input.
- Merge Annotation Tracks can merge tracks containing similar types of annotations, such as various gene or RNA types.
- Create Consensus Sequences from Variants
- The relevant IUPAC ambiguity code is inserted instead of N in the consensus sequence when multiple SNVs are present in the same position.
- For overlapping insertions, only the most frequent insertion is included in the consensus sequence. Previously, all the overlapping insertions would be added consecutively.
- The Motif Search tool
- Regular expressions that include the character "?" are now supported, allowing for lookahead expressions.
- The table output includes name and motif columns when motif lists are used as input, and a motif column when a single sequence is analyzed.
- A report can be output that shows the number of matches for the searched motifs.
- Multiple sequence alignments can be provided as input.
- Reads tracks
- An option has been added to the Track layout Side Panel palette for highlighting unaligned ends.
- Lines indicating insertions with a frequency below the threshold set in the Hide insertions below (%) Side Panel option have been made thinner and are hidden when zoomed out to a level where other types of variants are no longer shown.
- In the Volcano plot view of Statistical Comparison Table and Tracks, the default color for down-regulated features is now blue, and the default color for up-regulated features is red. Previously the default was red for down-regulated features and blue for up-regulated features.
- The table view of annotation tracks now includes the annotation type.
- Heat map elements have a table view containing the underlying values.
- Various minor improvements
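The change to Create Consensus Sequences from Variants listed above, where an IUPAC ambiguity code is inserted instead of N, can be illustrated with the standard IUPAC nucleotide code table. The Python sketch below maps a set of alleles to its code. It is not the tool's implementation: the function name `ambiguity_code` is hypothetical, and which alleles the tool considers at a position (for example, whether the reference allele is included) is not specified in these notes.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def ambiguity_code(alleles):
    """Return the IUPAC code representing all given single-base alleles."""
    return IUPAC[frozenset(a.upper() for a in alleles)]


if __name__ == "__main__":
    print(ambiguity_code(["A", "G"]))       # prints R
    print(ambiguity_code(["C", "T", "G"]))  # prints B
```

With this table, N is only produced when all four bases are present, which is the most specific representation the IUPAC alphabet allows.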
Bug fixes
- Detect and Refine Fusion Genes
- Fixed an issue where, when multiple possible fusions had been detected for the same pair of genes, some fusions would be incorrectly reported as having no support. See the public notification about this issue.
- Fixed an issue where 0 fusion crossing reads were assigned to fusions with the breakpoint located after the last exon of the highest-priority transcript, or the first transcript if no priorities were available.
- Fixed an issue causing the tool to fail if the input mRNA track did not contain any features of type mRNA.
- QC for Targeted Sequencing
- Fixed an issue where the mean coverage in the gene coverage track was labelled as the median coverage.
- Fixed an issue affecting the coverage report and the per-region statistics track, where coverage on insertions from reads that ended inside the insertions was ignored.
- Fixed an issue affecting the coverage report, where coverage on insertions was counted twice on overlapping targets.
- Trim Reads
- Fixed an issue that could cause automatic read-through adapter trimming to give different results when input sequence lists were not provided in the same order. This could happen when the read-through sequence was not the same in individual sequence lists. Automatic read-through adapter trimming is now done individually on each input sequence list.
- Fixed an issue where the consensus sequence calculated from all identified read-through sequences was reported instead of the sequence that was actually used for automatic read-through adapter trimming.
- Annotate with Repeat and Homopolymer Information
- Fixed an issue that caused the tool to fail when annotating variants in the second to last position on a chromosome.
- Fixed an issue where annotations were not added to variants located in a homopolymer or repeat region spanning the origin of a circular reference sequence.
Other bug fixes
- Fixed an issue causing the outer ends of very long unaligned ends to not be rendered in reads tracks. This would happen on Oxford Nanopore and PacBio long read mappings in regions where no reads were aligned to the reference.
- Fixed an issue for phylogenetic trees where the horizontal scale bar would change, even though 'Fixed width on zoom' was selected and the zoom only affected the tree vertically.
- Fixed an issue where the SAM/BAM/CRAM Mapping Files and Ultima importers allowed reference synonyms when importing CRAM files, causing the importers to fail. The importers no longer allow reference synonyms for CRAM files.
- Fixed an issue causing VCF import to fail when importing VCF files with symbolic alleles having SVLEN=0. These alleles are now imported to an annotation track and assigned length zero.
- Fixed an issue causing reads wrapping around circular reference sequences more than once to be counted as mapped reads, even though they are actually discarded. These reads are now included in the unmapped reads count in the reports from Map Reads to Reference, Map Bisulfite Reads to Reference, Map Reads to Contigs, and the SAM/BAM/CRAM Mapping Files importer.
- Track-based output from Map Reads to Contigs can no longer be selected when the option to update contigs has been enabled. Previously, a reads track could be produced but the contigs in it were not updated based on the new read mapping information.
- Fixed an issue causing Standard Import of GenBank format files to not support files where the SOURCE or ORGANISM field was immediately followed by the ORIGIN field.
- Fixed an issue affecting the output from Create K-medoids Clustering for RNA-Seq where line graph legends were not displayed when Cluster 1 contained more than 10 genes.
- Fixed an issue causing local BLAST jobs to fail when searching a database located on a Windows file share.
- Fixed an issue with Search for Sequences at NCBI affecting searches using a list of terms separated using OR, commas or spaces. If the list contained one or more terms that could not be found, no results were returned, even for terms that could be found.
- A small improvement was made affecting Create Alignment, Assemble Sequences, and Assemble Sequences to Reference when lower alignment accuracy options are selected. This change is not expected to affect results for most analyses.
- Fixed an issue where a palette in the Side Panel would disappear if, when moving it, it was placed onto its existing location.
- Fixed an issue where an error arose if the only element in a folder was deleted at the same time that new elements were being dropped into that folder.
- Various bug fixes
Changes
Tool and settings
- Create Heat Map for RNA-Seq has been renamed to Create Feature Level Heat Map for RNA-Seq. Option names have been updated for consistency with Create Sample Level Heat Map for RNA-Seq.
- Copy Number Variant Detection (CNVs) has been renamed to Copy Number Variant Detection (Targeted).
- The Core nucleotide BLAST database (core_nt) is the default for blastn and tblastx searches using BLAST at NCBI. Previously the default was Nucleotide collection (nr/nt).
- The Illumina importer no longer supports .txt files.
- The column Linkage in variant tracks has been removed. That column was always empty.
- The Translocation and Total (Translocation) rows are no longer included in the "Variants table" in the report generated using InDels and Structural Variants. These rows were always empty as translocations are not called by this tool.
Third party version updates
- The Java version bundled with CLC Genomics Server 25.0 is Java 21.0.4, using the JRE from the Azul OpenJDK builds.
- The hmmsearch programme used in Pfam domain search has been updated to version 3.4.
- Trim Sequences has been updated with build 10.1 of the UniVec database.
- The restriction site database REBASE has been updated to version 408.
- BLAST has been upgraded to BLAST+ 2.15.0. BLAST+ changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
Functionality retirement
- Remove Information from Variants. Remove Information from Track has replaced this tool.
Plugin Retirements
- Ingenuity Pathway Analysis Server Plugin. Functionality from this integration plugin is now available via the Biomedical Genomics Analysis Server Plugin.
- Long Read Support Server Plugin. Functionality from this plugin is now available directly in the CLC Genomics Server. See the "New tools and workflows" section under "Shared with CLC Workbenches", above.
Advance notice
Legacy tools
Legacy status tools will be retired in a future version of the software:
- Correct Long Reads (legacy). Note that this legacy tool is affected by the upgrade of the third party mapper, minimap2, to version 2.28.
Compatibility
The following are the corresponding client applications for CLC Genomics Server 25.0:
- CLC Genomics Workbench 25.0
- CLC Main Workbench 25.0
- CLC Server Command Line Tools 25.0
CLC Server Command Line Tools
Please see the CLC Genomics Server 25.0 listings above for the details about the new tools and features listed here.
These are the draft release notes for CLC Server Command Line Tools 25.0, due for release on December 3, 2024.
Installers for this product are available as "early access" via links at the bottom of this page. These products are not supported, and we recommend they are not used in production during the early access period.
New tools
Long read handling
- correct_long_reads
- long_de_novo_assembly
- long_read_mapping
- long_rna_seq
- long_structural_variant
- ngs_import_nanopore
- polish_with_reads
Utility tools
- collapse_overlap_annotations
- create_report_table
- remove_information_from_tracks
- resize_annotations
Other tools
- create_sample_heatmap_for_rnaseq
- create_pairwise_comparison
- import_expression_and_metadata
- proteolytic_cleavage
New and updated options for existing tools
- anno_with_repeat_and_homopoly_info
- option added: --max-mismatch
- detect_and_refine_fusion_genes
- options added:
- --fusion-filter-action
- --fusion-filter-names
- --fusion-filter-tables
- --gene-filter-action
- --gene-filter-names
- --gene-filter-tracks
- --include-all-fusions
- --report-significant-breakpoints
- --require-both-genes
- options removed:
- --fusion-primer-reads
- --ignore-overlapping-opposite-strands
- --maximum-broken-pair-end-to-fusion-
- --skip-nonsignificant-breakpoints-
- install_workflow
- option added: -f, --force
- option added: -u, --update
- motif_search
- option added: --create-report
Commands removed
- remove_information_from_variants. The functionality provided by that tool is now available via the new utility tool remove_information_from_tracks.