Server specific
New features and improvements
Reorganization of web administrative interface
The top level tabs in the CLC Genomics Server web administrative interface have been renamed and the contents under them re-organized. From left to right in the interface:
- Element info – Functionality for working with data stored in CLC Server Locations, including setting group-level permissions on folders.
- Configuration – Configuration functionality, including for setting up CLC Server Locations, configuring the server setup, configuring settings for external data, configuring user authentication, etc.
- Management – Functionality for managing the CLC Server, including downloading licenses, stopping, restarting and putting the server in maintenance mode, as well as access to the queue, audit log, etc.
- Extensions – Configuration relating to extending the functionality of the CLC Server, such as downloading and managing plugins, and configuring External Applications.
New functionality in the web administrative interface
- Group-level permissions on CLC Server File System Locations and their contents can be configured via the web administrative interface. Previously, configuration could be done only using a CLC Workbench client.
- Recycle bins can be emptied via the web interface. When logged in as an administrative user, any individual recycle bin can be emptied, or all recycle bins in a File System Location can be emptied in a single action.
Other new features
- Access to Amazon S3 and BaseSpace
- Admin level access can be granted to specified groups for installing and configuring workflows, and configuring and enabling external applications via the web administrative interface.
External applications improvements
- External Applications can now be configured with a failure strategy.
- The audit log entries for external applications now include the native process command line and the exit value.
- Improvements to error handling when external applications fail, when run directly as a tool or within a workflow context, including that any results files produced despite the job failure will be posted. In such cases, the std out and error files can contain valuable troubleshooting information.
- For external applications in workflows, the final, substituted command line executed by the external application and the native process exit code are posted to the workflow log. In addition, every error contained in the server result (thus also all failures triggered), is posted to the workflow log. Previously, only the first error thrown to stop the process was posted.
- History information can optionally be added to CLC data elements created by external applications.
Other improvements
- Tomcat has been updated to version 9.0.46. Custom port and SSL settings will need to be reconfigured after this upgrade.
- Improvements to the display of history information for data elements. The History tab is now under the top level Element info tab.
- The contents of tool and workflow logs, generated when an analysis is run, can be viewed directly under the Element info tab.
- The audit log shows the XML serialized version of any exception.
- Various minor improvements
Bug fixes
- Fixed an issue that could cause the Main configuration tab of the web administrative interface to occasionally be blank after logging in.
- Fixed an issue where the maintenance mode banner, expected in the top right hand corner, was sometimes not shown when it should have been.
- Fixed an issue with the maintenance mode banner message where the reason for maintenance mode/restart was not correctly displayed.
- Fixed an issue where Find and Model Structure could not be run on a CLC Genomics Server.
- Various minor bugfixes
Functionality removal
The Genomics Analysis Portal client is no longer provided in the CLC Genomics Server distribution.
Changes
Shared with CLC Workbenches
New features and improvements
The new Sequence Lists folder under Toolbox | Utility Tools contains tools for working with sequence lists. This includes existing tools, with new names and expanded functionality, as well as new tools:
- Split Sequence List New tool: Splits up nucleotide or peptide sequence lists. The output can be a specified number of lists, lists containing a specified number of sequences, or lists containing sequences with particular attribute values, such as terms in the description.
- Update Sequence Attributes in Lists New tool: Updates and adds information about the sequences in a list. For example, descriptions can be updated, or new information types can be added based on information provided in an Excel file.
- Create Sequence List Existing tool. Create new sequence lists from sequence elements and/or sequence list elements. Previously available only from the File | New menu.
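Two of the splitting modes described for Split Sequence List (a specified number of lists, or lists containing a specified number of sequences) can be illustrated with a minimal Python sketch. This is not the tool's implementation: the function names are hypothetical, and how the tool distributes sequences across output lists is an assumption here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_n_lists(items: Sequence[T], n_lists: int) -> List[List[T]]:
    """Split items into n_lists contiguous lists of near-equal size.

    Assumption: earlier lists receive the extra items when the total
    does not divide evenly.
    """
    if n_lists < 1:
        raise ValueError("n_lists must be at least 1")
    base, extra = divmod(len(items), n_lists)
    result: List[List[T]] = []
    start = 0
    for i in range(n_lists):
        size = base + (1 if i < extra else 0)
        result.append(list(items[start:start + size]))
        start += size
    return result


def split_by_size(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into lists holding at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


sequences = ["seq1", "seq2", "seq3", "seq4", "seq5"]
two_lists = split_into_n_lists(sequences, 2)
# [['seq1', 'seq2', 'seq3'], ['seq4', 'seq5']]
pairs = split_by_size(sequences, 2)
# [['seq1', 'seq2'], ['seq3', 'seq4'], ['seq5']]
```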
Other new functionality
- MGI/BGI importer An importer for MGI/BGI fastq format files.
- Rename Sequences in Lists Rename sequences within sequence lists by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- Rename Elements Rename elements by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- A Heat map graphics exporter has been introduced for exporting heat maps to graphics file formats.
- Files containing tab separated values (.tsv) can be imported as tables using Standard Import.
- Export VDJ tools Exports T-Cell VDJ repertoire in txt format.
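The kind of regular-expression renaming offered by Rename Sequences in Lists and Rename Elements can be sketched in Python. The example uses Python's `re` syntax, which may differ from the syntax the tools accept. The helper name and the sample names are hypothetical.

```python
import re
from typing import Iterable, List


def rename_all(names: Iterable[str], pattern: str, replacement: str) -> List[str]:
    """Apply one regex substitution to every name (hypothetical helper)."""
    regex = re.compile(pattern)
    return [regex.sub(replacement, name) for name in names]


names = ["sample_01_R1", "sample_02_R1"]

# Remove a trailing suffix.
without_suffix = rename_all(names, r"_R1$", "")
# ['sample_01', 'sample_02']

# Replace part of a name, reusing a captured group.
shortened = rename_all(names, r"^sample_(\d+)", r"S\1")
# ['S01_R1', 'S02_R1']

# Add characters to the front of every name.
prefixed = rename_all(names, r"^", "run7_")
# ['run7_sample_01_R1', 'run7_sample_02_R1']
```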
RNA-Seq and Expression Analysis improvements
Demultiplex Reads
- Demultiplex Reads now supports setting barcodes from table elements in addition to importing barcodes from local files.
- The barcode import table format has been extended to support additional columns.
- When multiple elements are provided as input, the information in the Preview pane includes information obtained from across these. Previously, only the first input element was used for this.
- BLAST has been upgraded to BLAST+ 2.12.0, which includes a number of improvements and bug fixes. A full list of BLAST+ 2.12.0 changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- The list of databases available using BLAST at NCBI has been expanded, including the addition of ‘16S ribosomal RNA sequences (Bacteria and Archaea)’ and ‘28S ribosomal RNA sequences from Fungi type and reference material (LSU)’.
- When BLAST at NCBI is used with multiple query sequences, the job will continue even if particular sequences fail due to a problem. Results for successful searches (including those with no hits) are returned. Sequences missing from the results due to problems are recorded in the job log.
- Searches against the Patented protein sequences database using BLAST at NCBI work once again. Previously, these searches always failed, with a dialog message saying only that no hits were found even though an error was returned by the NCBI. For affected searches, the error was reported in the job log.
- Fixed an issue affecting BLAST HSP Tables where the calculation of percent overlap between BLAST hits in the reverse direction and the query sequence was based on a sequence length that was 2 base pairs too short, leading to incorrect values.
- Improvements have been made to make it less likely that a “CPU usage limit was exceeded” error will be returned when running blastp, blastx, tblastn or tblastx using BLAST at NCBI.
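The behavior described above for BLAST at NCBI with multiple query sequences (continue when one query fails, return results for the rest, record failures in the job log) follows a common pattern that can be sketched as below. This is only an illustration: `search_all` and `fake_search` are hypothetical names, and no real NCBI call is made.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def search_all(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Run `search` per query, collecting results and logging failures."""
    results: Dict[str, List[str]] = {}
    job_log: List[str] = []
    for query in queries:
        try:
            # A query with no hits is still a successful search.
            results[query] = search(query)
        except Exception as exc:
            # A failed query is logged and the job carries on.
            job_log.append(f"{query}: search failed ({exc})")
    return results, job_log


def fake_search(query: str) -> List[str]:
    """Stand-in for a remote search (hypothetical)."""
    if query == "bad":
        raise RuntimeError("timeout")
    return ["hit"] if query == "q1" else []


results, job_log = search_all(["q1", "bad", "q2"], fake_search)
# results -> {'q1': ['hit'], 'q2': []}
# job_log -> ['bad: search failed (timeout)']
```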
Importer and exporter improvements
- Multiple tables can be exported to a single file when using the following exporters: Tab delimited text, Annotation tab delimited text, Table CSV, Annotation CSV.
- A new custom reads option has been added to the Illumina importer. These extended options for fastq file import support 10X data. For example, it is now possible to import three fastq files containing R1, R2, and I1 as paired reads, where I1 is added in front of R1.
- When exporting variant tracks to VCF format, variants that fall under thresholds to be exported can now optionally be excluded entirely from the resulting VCF file.
- When using the VCF export setting for complex variant representation “Reference overlap and depth estimate”, complex overlapping reference alleles are now exported with a homozygous reference genotype.
- The list of supported GVF attributes in column 9 has been expanded when importing GVF files using the GFF2/GTF/GVF track importer.
- 1000 Genomes annotations are now better supported by the GFF2/GTF/GVF track importer.
- The Zygosity field is now included when exporting to GVF format.
- A subset of columns to export can be specified when exporting Mapping Coverage data.
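The custom reads layout mentioned above for the Illumina importer, where I1 is added in front of R1 and the result is paired with R2, can be pictured with a small Python sketch. It assumes that the index read's sequence and quality string are simply prepended to those of R1. The `Read` type and function names are hypothetical and do not reflect the importer's internals.

```python
from typing import NamedTuple, Tuple


class Read(NamedTuple):
    """One fastq record reduced to the fields needed here (hypothetical)."""
    name: str
    sequence: str
    quality: str


def prepend_index(r1: Read, i1: Read) -> Read:
    """Return R1 with the I1 sequence and qualities placed in front."""
    return Read(r1.name, i1.sequence + r1.sequence, i1.quality + r1.quality)


def make_pair(r1: Read, r2: Read, i1: Read) -> Tuple[Read, Read]:
    """Build a read pair where the first member is I1 followed by R1."""
    return prepend_index(r1, i1), r2


first, second = make_pair(
    Read("read1", "ACGT", "IIII"),
    Read("read1", "TTGG", "FFFF"),
    Read("read1", "GA", "##"),
)
# first.sequence -> 'GAACGT', first.quality -> '##IIII'
# second is R2, unchanged
```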
Other improvements
- Copy Number Variant Detection (CNVs) can use coverage tables generated by QC for Targeted Sequencing as control mappings. Read mappings can still be used as control mappings.
- Copy Number Variant Detection (CNVs) allows different fold-change thresholds for deletions and amplifications.
- When working with paired reads, Trim Reads allows the trimming of a fixed number of bases to apply to only read 1, only read 2, or both reads of each pair.
- An option has been added to Extract Reads or Create Reads Track from Selection to allow just one member of a pair to be extracted when only one meets the extraction criteria.
- Extract Reads accepts stand alone read mappings in addition to reads tracks as input.
- Create Sample Report can take both the Graphical and the Supplementary Report created by QC for Sequencing Reads as input.
- An option has been added to Amino Acid Changes for using one letter amino acid codes in HGVS annotations.
- Filter on Custom Criteria now accepts expression tracks as input.
- In Quantify miRNA the option to select strand-specific analysis has been removed. The analysis is now always strand-specific.
- Remove Duplicate Mapped Reads considers if reads are duplicates based on the start position of reads instead of both start and end. This allows reads that have undergone quality trimming to be recognised as duplicates.
- The distance to consider around an intron-exon boundary when using Predict Splice Site Effect can be specified. Previously a length of 2 was always used.
- Create Mapping Graph can now generate graphs for forward read coverage and reverse read coverage.
- The Sample Reads tool is now named Subsample Sequence List. Peptide sequence lists are now accepted by this tool, in addition to nucleotide sequence lists.
- When a Track List and the tracks it refers to are copied in a single operation, the new copy of the Track List will refer to the new copies of the tracks. Previously, the new Track List continued to refer to the original tracks.
- For workflows with paired read import as part of the workflow run, and when the workflow is launched in batch mode, or contains Iterate elements, paired read handling is now the same as for the relevant NGS importer tools (Illumina, Fasta, Sanger) themselves, irrespective of how batch units are defined or organized. Previously when batch units were based on data organization and all files were in the same folder, each file was treated as a separate batch unit, irrespective of whether the Paired option was checked.
- Memory usage when launching workflows in batch mode has been improved.
- Trim Sequences specifies which version of the UniVec database was used, both in the report and in the history of the trimmed sequences output.
- The few tools that directly manipulate input elements, instead of generating a new element containing the changes as output, now generate a new element as output when used within a workflow. This allows them to be handled like any other tool in a workflow context.
- In addition to sequence elements, Add attB Sites accepts sequence lists with fewer than 10,000 sequences as input.
- Internal compression of CLC data has been improved. Elements created with this version of the software, with compression enabled, can be opened in version 21.0.5 and higher. Data must be exported or saved as uncompressed if sharing data with earlier versions of the software.
- Various minor improvements
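The change to Remove Duplicate Mapped Reads noted above (duplicates identified by start position rather than by both start and end) can be illustrated with a minimal sketch. It is not the tool's algorithm: the `MappedRead` type, the choice to keep the first read seen, and the omission of strand and pairing are simplifying assumptions.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class MappedRead(NamedTuple):
    """A mapped read reduced to name and coordinates (hypothetical)."""
    name: str
    reference: str
    start: int
    end: int


def remove_duplicates(reads: Iterable[MappedRead], use_end: bool = False) -> List[MappedRead]:
    """Keep the first read seen for each duplicate key.

    With use_end=False the key is (reference, start); with use_end=True
    the end position is part of the key as well.
    """
    seen: Set[Tuple] = set()
    kept: List[MappedRead] = []
    for read in reads:
        key = (read.reference, read.start, read.end) if use_end else (read.reference, read.start)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    MappedRead("a", "chr1", 100, 250),
    MappedRead("b", "chr1", 100, 230),  # same start as "a", shorter after trimming
    MappedRead("c", "chr1", 180, 330),
]

by_start = [r.name for r in remove_duplicates(reads)]
# ['a', 'c']
by_start_and_end = [r.name for r in remove_duplicates(reads, use_end=True)]
# ['a', 'b', 'c']
```

In this sketch, read "b" is treated as a duplicate of "a" only when the end position is ignored, which is the situation described for quality-trimmed reads.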
Bug fixes
- Fixed an issue in Create Box Plot where percentiles reported in the history of a box plot element were off by one. For example, the “25%-ile” value was given the 24th percentile value. The correct values were used in the plots themselves.
- Fixed an issue in Demultiplex Reads where dual barcodes were not allowed to have mismatches in both barcodes.
- Fixed an issue in Demultiplex Reads where dual barcodes could previously be selected in random combinations. Dual barcodes are now handled in pairs.
- When using the “Genome annotated with genes only” in RNA-Seq Analysis, the range of annotation track types that can be used has been expanded. This includes the use of CDS annotation tracks, among others.
- Fixed an issue in Create Sample Report where, when QC thresholds had been specified for Trim Reads, wrong values from the Trim Reads report were shown in table 1.1 Quality Control of the sample report.
- Fixed an issue that caused Create Sample Report to fail when input reports did not contain values for specified QC thresholds.
- Fixed an issue in Combine Reports and Create Sample Report where the “Mean coverage per target” section would report coverages 10x too high when including a report from QC for Targeted Sequencing.
- Fixed an issue in VCF export where, in rare cases, variants below a specified minimum allele fraction threshold were not removed.
- Fixed an issue affecting Local Realignment where large indels upstream of a target region were sometimes not used when provided as guidance variants.
- Fixed an issue that in rare cases could cause Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection to fail on very high coverage samples when the “Remove pyro-errors variants” option was enabled.
- Fixed an issue where Remove Duplicate Mapped Reads did not always de-duplicate paired reads with read-through correctly.
- Fixed an issue where Remove Duplicate Mapped Reads did not always de-duplicate reverse mapping single-end reads correctly.
- Fixed an issue affecting QC for Targeted Sequencing, where it failed with an error when an RNA-Seq read mapping containing paired reads was provided as input.
- Fixed an issue in Filter on Custom Criteria where numeric annotations were sometimes not allowed to be filtered using numerical operators such as “<”, “>”, “=”.
- Fixed an issue in Trio Analysis where, in rare cases, inconsistent zygosity between mother and father could lead to a wrong annotation of inheritance. Trio Analysis now reports inheritance as ‘Inconsistent zygosity’ if zygosity or the number of alleles is inconsistent between child, mother or father.
- Fixed an issue with VCF files exported from the CLC Genomics Workbench, where fusions that had one breakpoint in common were represented in a way that prevented QIAGEN Clinical Insight Interpret from displaying the counts.
- Fixed an issue causing Quantify miRNA to fail when there were empty entries in the Accession column of miRBase.
- Fixed an issue where the names of outputs from Output elements attached directly to an Iterate element in workflows were not as intended when the metadata ({3}) placeholder was used. We generally recommend that the specific input number(s) to include in output names are specified when configuring workflows that contain control flow elements.
- Fixed an issue where the content of the recycle bin was not shown correctly after the recycle bin had been emptied.
- Various bug fixes
Changes
- The Sample Reads tool is now named Subsample Sequence List and is located under the Utility Tools | Sequence Lists subfolder of the Toolbox. The functionality of this tool has been expanded. See the Improvements listing above, or refer to the manual.
- The Extract Annotations tool is now named Extract Annotated Regions.
- The tool Set Up Experiment is now named Set Up Microarray Experiment.
- The “Number of duplicates distribution” section has been removed from the report produced by Remove Duplicate Mapped Reads.
- When exporting BAM files, file names are limited to a maximum of 254 characters.
- Input modifying tools within workflows generate an output element instead of directly modifying the input provided. Workflows containing these tools may need to be edited.
The following tools are now legacy tools and will be retired in a future version of the software:
Functionality retirement
The following tools have been retired:
- Create Track from Experiment (legacy)
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
- Roche 454 NGS import (legacy)
- Create Combined RNA-Seq Report (legacy)
- Remove Reference Variants (legacy)
Compatibility
The following are the corresponding client applications for CLC Genomics Server 22.0:
- CLC Genomics Workbench 22.0
- CLC Main Workbench 22.0
- CLC Command Line Tools 22.0
CLC Genomics Server 22.0 is compatible with GCE version 22.0.
Please see the CLC Genomics Server 22.0 listings above for the details about the new tools and features listed here.
- install_plugin_download_and_restart
- list_plugins_download
- list_installed_plugins
- empty_recycle_bins
- split_sequence_list
- update_seq_attrs_in_list
- rename_seqs_in_seq_list
- rename_elements
- ngs_import_mgi_bgi
- history_add
- amino_acid_changes
  - option added: --one-letter-codon
- cnv_detection
  - option added: --minimum-fold-change-amplification
  - option added: --minimum-fold-change-deletion
  - option removed: --minimum-fold-change-magnitude
- create_sequence_statistics
  - option added: --extinction-coefficient
- differential_expression_rna_seq
  - option added: --metadata-table-tsv
- extract_overlapping_reads
  - option added: --only-matching-read-pair
- mapping_graph_tracks
  - option added: --forward-read-coverage
  - option added: --reverse-read-coverage
- ngs_import_illumina
  - option added: --reads-options
  - option added: --use-reads-options
- predict_splice_site
  - option added: --splice-window-size
- process_tagged_sequences
  - option added: --barcode-table-element
- quantify_small_rna
  - option removed: --strand-specific
- rna_seq
  - option added: --count-paired-reads-as-two
  - option added: --ignore-broken-pairs
  - option removed: --broken-pair-countingscheme
- changed: sample_reads
- trim
  - option added: --first-read-trim
  - option added: --second-read-trim
Export options added:
- heatmap_graphics
- immune_rept_vdj_tools
Import option added
- If options with identical names are found within an element of a workflow, each of the corresponding CLC Server Command Line Tools parameters will have a number appended to ensure uniqueness.
- Job status updates are now more frequent and the time needed for handling finished jobs has been reduced.
- empty_recycle_bin Replaced by new command empty_recycle_bins
- list_plugins See new commands for several commands relating to plugin administration
- create_combined_rnaseq_report
- experiment_to_track
- filter_reference_variants
- ngs_import_roche454
- small_rna_annotate
- small_rna_sampling
- Fixed an issue where find_structure_algo was not listed as an available tool when it should have been.
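The numbering of identically named workflow options, mentioned earlier in this command line tools listing, can be pictured with a short Python sketch. The exact naming scheme is not stated in these notes, so the "-1", "-2" suffix format, the numbering order, and the function name are assumptions made for illustration only.

```python
from collections import Counter
from typing import Iterable, List


def make_unique(names: Iterable[str]) -> List[str]:
    """Append a running number to every name that occurs more than once.

    Assumed scheme: duplicates become name-1, name-2, ... in order of
    appearance, and names that occur once are left unchanged.
    """
    names = list(names)
    totals = Counter(names)
    seen: Counter = Counter()
    unique: List[str] = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            unique.append(f"{name}-{seen[name]}")
        else:
            unique.append(name)
    return unique


options = make_unique(["min-length", "output", "min-length"])
# ['min-length-1', 'output', 'min-length-2']
```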