Latest improvements for QIAGEN CLC Genomics Workbench
QIAGEN CLC Genomics Workbench 23.0.5
Improvements
- Detect and Refine Fusion Genes has a new option allowing fusions of overlapping genes on opposite strands to be reported.
- Loading reports with very large tables takes less time.
- Previously, when annotation tracks were exported to BED format files, the Score column in the exported file contained only 0 values. Now, if the annotation track contains a Score column, those values are reported in the Score column of the exported file. (This does not affect expression tracks, where the expression value is exported as the score.)
- VCF Import can import VCF files with an unexpected number of values in CLCAD2 or AD. This includes VCF files produced by VarScan2.
- Various minor improvements
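To check the BED export change described above on an exported file, a short script can read the Score column, which is the fifth column in BED format. The following is a minimal sketch only: the helper name read_bed_scores and the example records are hypothetical and do not come from the Workbench.

```python
import io


def read_bed_scores(handle):
    """Return (name, score) pairs from a BED stream; the score is the fifth column."""
    scores = []
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and optional track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # records with fewer than five columns carry no score
        scores.append((fields[3], float(fields[4])))
    return scores


# Hypothetical exported content: feature names and scores are made up.
example = io.StringIO(
    "chr1\t100\t200\tfeatureA\t0\t+\n"
    "chr1\t300\t450\tfeatureB\t712\t-\n"
)
scores = read_bed_scores(example)
# True would suggest the file was written without annotation track scores.
all_zero = all(score == 0 for _, score in scores)
print(scores, all_zero)
```

If all_zero is True for a track that is known to have a Score column, the file was probably exported with an earlier version.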
Bug fixes
- Fixed an issue that could cause Copy Number Variant Detection (CNVs) to give wrong results when targets were overlapping and coverage tables were used as control mappings, see https://digitalinsights.qiagen.com/technical-support/faq/important-clc-notifications/copy-number-variant-detection-cnvs-can-give-wrong-results-when-targets-overlap-and-coverage-tables-are-used-as-controls/.
- Fixed an issue causing Annotate with Nearby Gene Information to report incorrect nearest-gene information for the last gene (3') on a given chromosome.
- Fixed an issue causing Detect and Refine Fusion Genes to fail if the provided mRNA track contained transcripts annotated with priorities and the track was imported using the GFF3 importer.
- Fixed an issue causing Demultiplex Reads to fail with an error if the Edit/Up/Down buttons in the wizard were used when no tag was selected and the Reset button had earlier been pressed.
- Fixed an issue causing SAM/BAM import to fail when the provided reference element contained one or more circular sequences, but these sequences were not marked as circular in the SAM/BAM file and one or more reads mapped with unaligned ends at the beginning of the read.
- Fixed an issue causing Standard Import of GenBank format to import qualifiers' values as annotations surrounded by quotes. The surrounding quotes are now removed.
- Fixed an issue causing GFF3 export to fail when sequence annotations included features with incorrectly formatted frame qualifiers. Now, such frame qualifiers are ignored.
- Fixed an issue that could cause QC for Sequencing Reads to fail when provided with more than one sequence list, and one or more of those sequence lists contained very few sequences.
- Fixed an issue causing Create Sample Report and Combine Reports to fail if an input report was named “report”.
Data related updates
From September 19, 2023, Download Pfam Database downloads Pfam 36.0. This update also affects download using earlier versions of the CLC Genomics Workbench.
Plugin notes
Import Immune Reference Segments, delivered by Biomedical Genomics Analysis and CLC Single Cell Analysis Module, can now import V segments in IMGT format that end in the conserved amino acid. Previously, these segments were silently ignored.
QIAGEN CLC Genomics Workbench 23.0.4
Improvements
- Download BLAST Databases is more resilient to interrupted connections and similar issues when downloading large databases.
Bug fixes
- Fixed an issue where workflows containing a BAM export element could not be launched from CLC Genomics Workbench 23.0.3 to run on a CLC Genomics Server due to an error reported after selecting an export destination in the launch wizard ("The parameter 'Export destination' File not found.")
- Fixed an issue causing workflows to fail if they contained multiple Filter on Custom Criteria elements connected to a single downstream element, and one or more of the Filter on Custom Criteria outputs was empty.
- Fixed an issue causing QC for Read Mapping to report the number of unaligned ends instead of the number of reads with unaligned ends. This could cause “read count” and “% of all mapped reads” to be too high.
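The difference between counting unaligned ends and counting reads with unaligned ends can be illustrated with a small sketch. It assumes that unaligned ends are represented as soft clips (S) in SAM CIGAR strings. That is an assumption about the representation, not a description of how QC for Read Mapping is implemented, and the CIGAR strings are made up.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def unaligned_end_count(cigar):
    """Number of read ends (0, 1 or 2) that are soft-clipped in a CIGAR string."""
    ops = [op for _, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may flank soft clips; ignore them when locating the ends.
    ops = [op for op in ops if op != "H"]
    if not ops:
        return 0
    count = 0
    if ops[0] == "S":
        count += 1
    if len(ops) > 1 and ops[-1] == "S":
        count += 1
    return count


cigars = ["100M", "5S95M", "5S90M5S", "95M5S"]  # made-up reads
ends = sum(unaligned_end_count(c) for c in cigars)
reads_with_ends = sum(1 for c in cigars if unaligned_end_count(c) > 0)
print(ends, reads_with_ends)
```

A read clipped at both ends contributes two unaligned ends but only one read, which is why counting ends would inflate a "read count" and the percentage derived from it.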
Data related changes
ClinVar data for hg38 and dbSNP data for hg38 and hg19 made available via Download Genomes have been updated. These changes affect data available via CLC Genomics Workbench 22.x and 23.x and took effect on May 15, 2023.
Details of these updates:
- "Clinical associated variants" (ClinVar) - now accesses the latest releases from the NCBI for hg38. This data source was already used for hg19. Previously, Ensembl was the source for hg38.
- "Dbsnp variants" (dbSNP) now accesses version 151 from the NCBI. Previously, Ensembl was the source of this data for hg38. UCSC was the source for hg19.
- "Dbsnp (common) variants" (dbSNP common) now accesses version 151 from the NCBI. Previously, this option was not available for hg38. UCSC was the source for hg19.
QIAGEN CLC Genomics Workbench 23.0.3
Improvements
- The SAM and BAM exporters have a new option relevant where there are one or more circular reference sequences. The new option, "Export reads spanning the origin of circular chromosomes as unmapped", is checked by default, making the default behavior of these exporters match that of CLC Genomics Workbench 22.x and earlier. This update changes the default behavior of these exporters relative to CLC Genomics Workbench 23.0.1 and 23.0.2. In those versions, reads that span the origin are exported as extending beyond the end of the reference. That behavior corresponds to unchecking the new option.
- PacBio SAM/BAM files with Platform Model (PM) set to HIFI are imported as HiFi reads without having to check the "Mark as HiFi reads" option.
- Producing an Amino Acid Track is now optional in Amino Acid Changes.
- Various minor improvements
Bug fixes
- Fixed an issue affecting the homopolymer trimming options of Trim Reads. When enabled, homopolymers that started with 9 identical bases followed by a different base were not trimmed. Other homopolymers were trimmed as expected. This update may affect the number of reads trimmed in a given dataset, and thus could lead to differences in results from downstream analyses, relative to earlier software versions.
- Fixed an issue causing Detect and Refine Fusion Genes to fail on certain data sets.
- Fixed an issue causing RNA-Seq Analysis to fail when reads mapped to a gene located close to the origin of a circular chromosome.
- Fixed an issue that caused Join Alignments to fail in CLC Genomics Workbench 23.0.1 and 23.0.2.
- Fixed an issue causing SAM/BAM export to fail when reference sequence names contained commas, brackets or other characters not in the set of allowed characters according to the SAM format specification. These characters are now replaced by an underscore in the exported file.
- Fixed an issue causing import of SAM/BAM files to fail when they contained a Platform (PL) but no Platform Model (PM) in the header. This affected the PacBio importer, the Ion Torrent importer and Standard Import of reads from SAM/BAM files.
- Fixed an issue that caused the value selection range of gradient sliders in the Side Panel to be displayed incorrectly when using Locale Settings that do not use '.' (dot) as a decimal separator.
- Fixed an issue where lines in PDFs containing history information were not wrapped, resulting in the ends of long lines being missing from the exported document.
- Fixed an issue that caused VCF Export to fail when exporting fusions that had two or more filter criteria listed in the Filter column.
- Fixed an issue that caused Low Frequency Variant Detection, Fixed Ploidy Variant Detection, and Basic Variant Detection to fail when the end of a mapped read supported a deletion, and there was support in other reads for a variant at the subsequent position. This issue has only been observed for RNA-Seq data where splicing combined with primer trimming could lead to this situation.
- Fixed an issue causing Extract Reads to not correctly extract reads overlapping annotated regions that cross the origin of circular chromosomes when the type of overlap was set to "Span region" or "No overlap".
- Fixed issues in read mapping track tooltips affecting positions in paired reads that contained different bases:
- When viewing the reads as pairs, one of the 2 bases would be chosen to represent both reads for the purposes of the counts in the tooltip. Now, such ambiguous bases are represented by their IUPAC codes and counted accordingly. Thus, the tooltip now corresponds to the bases shown in the read mapping track.
- When viewing the individual reads making up the pairs, just one of the two bases would be counted, thereby undercounting the total number of bases at such positions.
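The two counting modes described for read mapping track tooltips can be sketched as follows. This is an illustration using the standard IUPAC two-base ambiguity codes, not the Workbench's own implementation. The function name pair_symbol and the example observations are hypothetical.

```python
from collections import Counter

# Standard IUPAC codes for single bases and two-base ambiguities.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}


def pair_symbol(base1, base2):
    """Symbol representing one position covered by both reads of a pair."""
    return IUPAC[frozenset((base1.upper(), base2.upper()))]


# Made-up observations at one position: (base in read 1, base in read 2).
pairs = [("A", "A"), ("A", "G"), ("G", "A"), ("C", "T")]

# Viewing reads as pairs: one symbol per pair, ambiguity shown as an IUPAC code.
as_pairs = Counter(pair_symbol(b1, b2) for b1, b2 in pairs)

# Viewing individual reads: every base of every read is counted.
as_reads = Counter(base for pair in pairs for base in pair)
print(as_pairs, as_reads)
```

In the pair view each pair contributes one count, while in the individual read view the total equals twice the number of overlapping pairs.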
Plugin notes
Fixed an issue affecting Immune Repertoire Analysis, delivered by Biomedical Genomics Analysis, and Single Cell V(D)J-Seq Analysis, delivered by CLC Single Cell Analysis Module. The tools failed if there were reads where the region that aligned to a C segment was contained within the region that aligned to a J segment.
QIAGEN CLC Genomics Workbench 23.0.2
Improvements and bug fixes
- The runtime of Amino Acid Changes has been significantly improved.
- Fixed an issue in the Trim Reads report where the numbers of “Trimmed (broken pairs)” were not reported per sequence list provided as input, but were instead added together incrementally. The number of reported “Trimmed reads” decreased correspondingly. The issue would occur when paired reads from more than one sequence list were trimmed and broken read pairs were produced.
- Fixed a rare issue that could cause Trim Reads to retain a wrong part of a read if the read was both trimmed based on quality scores and adapter read-through.
- Fixed an issue causing the Demultiplex Reads tool to always demultiplex based on a sequence structure of "barcode, sequence". Adjustments to the tag list, such as adding a linker or placing the barcode at the end, were ignored. This issue did not affect the tool when run in a workflow context.
- Fixed an issue that could cause Detect and Refine Fusion Genes to fail on Windows when either the dataset was large or fusion genes with many possible transcripts were detected.
- Fixed an issue that could cause VCF Export to fail when exporting filtered annotation tracks that were empty.
- Fixed an issue that caused fragment lengths to be incorrect in tables from Design Primers. The values are re-calculated when a table is opened, hence previous designs do not need to be repeated.
- Fixed an issue causing the download of the QIAseq xHYB Viral Panels reference data set to fail on Windows.
- Fixed a rare issue where Rebuild Index could not repair a corrupt search index.
- Various minor bug fixes
QIAGEN CLC Genomics Workbench 23.0.1
Improvements and bug fixes
- Fixed an issue affecting Trim Reads, where the wrong part of a read was retained if the read was both trimmed to a fixed length and also trimmed by another method from the opposite end of the read.
- Fixed an issue affecting Trim Reads when both adapter trimming using a trim adapter list and fixed length trimming were selected. This issue could cause the resulting trimmed reads to be shorter than expected.
- Fixed an issue where fusion plots created by Detect and Refine Fusion Genes were omitted in the report and were not accessible via the fusion track table.
- Fixed an issue where workflows containing a Branch on Coverage element would fail for read mappings with no zero coverage regions when using reports output by QC for Read Mapping.
- Fixed an issue where dates indicated with forward slashes in CSV format files were not recognized as dates by Import Metadata.
- Fixed an issue where the history entry in a sequence list after sorting always stated the sorting was based on length, even when the sorting was based on name or marked status.
- Fixed an issue causing Annotate with GFF/GTF/GVF file to fail when the option "Ignore duplicate annotation" was checked.
- Fixed an issue causing Standard Import of GenBank format to stall if qualifier names spanned more than one line.
- Various minor improvements
Please see the release notes for CLC Genomics Workbench 23.0, below, for a full list of changes since the last general release of this software.
QIAGEN CLC Genomics Workbench 23.0
New tools
- Homology Based Cloning - Design cloning experiments for cloning methods relying on homologous ends, such as Gibson Assembly®.
- Create K-medoids Clustering for RNA-Seq finds clusters of features, e.g., genes/transcripts/miRNAs etc, whose expressions behave similarly, for example first increasing over time and then decreasing. The tool produces a Clustering Collection which contains a Sankey plot showing how these features move between clusters under different conditions, for example different treatments. A line graph representation of features from individual clusters or pairs of clusters is present as well.
New tools coming from plugins
- Detect and Refine Fusion Genes - Find fusion genes in RNA-Seq data by identifying potential fusions and then refining that list by evaluation of the evidence for each fusion. This is an updated version of the tool formerly distributed in the Biomedical Genomics Analysis plugin. The updates made are listed in an Improvements section below.
- Target Region Coverage Analysis - Analyze and compare coverage from multiple samples. This tool was formerly distributed in the Biomedical Genomics Analysis plugin.
- Create Consensus Sequences from Variants – Create consensus sequences from a variant track and a reference sequence. This tool was formerly distributed in the Biomedical Genomics Analysis plugin.
- Annotate with GFF/GVF/GTF file - Add annotations from a GFF, GVF or GTF format file onto sequences, individual or in sequence lists. This tool was formerly distributed in the Annotate with GFF file plugin.
Other new functionality and improvements
RNA-Seq analysis tools
- New tutorial: Get hands-on experience with new RNA-Seq analysis functionality, including Create K-medoids Clustering for RNA-Seq (see New Tools above), with the RNA-Seq analysis with four tissues and six timepoints tutorial.
- Improvements to RNA-Seq Analysis:
- Substantial speed improvements. Reads that map to multiple transcripts or genes will be distributed differently than earlier due to different choices of random seed in the new implementation. The algorithm is still deterministic.
- Transcripts are no longer renamed in Transcript Expression (TE) output unless renaming is necessary to avoid duplicate names. Previously, transcripts were renamed to the gene name plus a number e.g. "BRCA_1". This change means that TE tracks in this version of the software cannot typically be used together with TE tracks generated using older versions to produce Heat Maps, PCA plots, Expression, etc.
- Reports UMI fragment counts when relevant. UMI counts are included in the Fragment statistics section of the report if the input reads are annotated with UMIs by tools from the Biomedical Genomics Analysis plugin, and if the library type is set to 3' sequencing for RNA-Seq Analysis.
- Improvements to Heat Maps:
- Samples can be ordered by the Tree, Sample, or Active metadata layer options, or any individual metadata entry.
- Optimize tree layouts - a new option for reordering features to produce a top-left to bottom-right diagonal.
- The order of the metadata categories can be adjusted. This order is reflected in the legend.
- Metadata categories are alphabetically sorted.
- The Expression Browser includes a new plot for visualizing genes across samples and contrasts. Metadata categories are sorted alphabetically.
- Venn diagrams support four and five groups. Previously up to 3 were supported. Tooltips indicate which groups are part of a specific intersection.
- PCA plots produced by PCA for RNA-Seq:
- Have two table views. The first table view shows the loadings of the principal components. The second table view shows the coordinates of the points.
- The order of the metadata categories in 2D PCA plots can be adjusted. This order is reflected in the legend.
miRNA analysis tools
- Quantify miRNA:
- Handles custom databases containing duplicated names.
- Does not allow custom databases containing sequences longer than 60bp. This avoids misallocation of reads to sequences that are similar to small RNAs.
- When adding multiple inputs to Extract IsomiR Counts, the extracted expression tables contain an entry for the combined set of IsomiRs identified among the samples, making them compatible for analysis in Differential Expression in Two Groups and Differential Expression for RNA-Seq.
Differential Expression for RNA-Seq and Differential Expression in Two Groups
- A new option for creating a subset has been added to the miRNA Statistical Comparison Table produced by Differential Expression for RNA-Seq and Differential Expression in Two Groups.
- It is possible to downweight outliers. This option is disabled by default and recommended only when the results seem enriched for genes that are expressed at anomalously high levels in a small proportion of samples.
- The Max Group Means column of Statistical Comparison Tracks and Tables now shows TPM instead of RPKM. Note that this column is used for filtering data in tools such as Create Heat Map for RNA-Seq and the Pathway Analysis tool of the Ingenuity Pathway Analysis plugin.
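Because filters that use the Max Group Means column now operate on TPM rather than RPKM, existing thresholds may need revisiting. The sketch below uses the commonly cited textbook definitions of RPKM and TPM to show how the two scales differ. Whether the Workbench computes them in exactly this way (for example, which lengths are used) is not stated here, and the counts and lengths are made up.

```python
def rpkm(counts, lengths):
    """Reads per kilobase of feature per million mapped reads."""
    total = sum(counts)
    return [c / (l / 1e3) / (total / 1e6) for c, l in zip(counts, lengths)]


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


# Made-up read counts and feature lengths (bp) for three features.
counts = [100, 300, 600]
lengths = [1000, 2000, 3000]

rpkm_values = rpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
print(rpkm_values)
print(tpm_values)
```

TPM values always sum to one million within a sample, whereas the sum of RPKM values varies between samples, so a cutoff chosen on one scale does not carry over unchanged to the other.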
Detect and Refine Fusion Genes
This is an updated version of Detect and Refine Fusion Genes, formerly distributed in the Biomedical Genomics Analysis plugin. The updates listed here are relative to the version distributed with Biomedical Genomics Analysis 22.2.
- Fusions will not be called for overlapping genes.
- Novel exon boundary improvements:
- Options have been expanded to allow for detecting fusions with a single fusion partner ("Detect with novel exon boundaries") as well as detecting those with 2 fusion partners ("Allow fusions with novel exon boundaries in both genes").
- The "Detect exon skippings" option supports detection of fusions with novel exon boundaries.
- An option has been added to omit non-significant breakpoints from the report.
- A minimum Z-score can now be specified for use when evaluating evidence for a fusion.
- Speed improvements
- The option "Allow fusions with novel exon boundaries in both genes" now defaults to false to reduce the number of false positive fusions. Setting it to true is useful for exhaustive searches of novel fusions.
- Changes to the maximum number of equivalent matches to the reference allowed for a single read to be retained:
- When remapping reads to a fusion chromosome, the maximum number is now 30. Previously it was 10.
- When searching for unaligned ends, the maximum number remains unchanged, as 10.
- The option "Maximum number of hits for a read" has been removed. Its value was ignored in previous versions.
- Fusions from mRNA transcripts without an associated gene in the Gene track are not used when detecting fusions. mRNA transcript features must have a gene id in one of the following columns to be matched with the associated gene: "Parent", "gene_id" or "gene_name".
- Fixed an issue where paired end reads were treated as single end reads when the option to "Only use fusion primer reads" was enabled.
- Fixed an issue where unaligned ends could be too long or too short for reads containing insertions and deletions. This change may lead to small differences in results compared to earlier versions, expected to be due to a decrease in the false positives and false negatives reported.
Bisulfite mapping
- Map Reads to Bisulfite Reference speed improvement. This is data dependent, with about a 50% improvement likely for most data sets. This speed up might change the details of results very slightly.
- Call Methylation Level speed improvement. This speedup might, in some cases, change results very slightly.
- Import of read mappings from SAM/BAM now uses methylation information from the optional SAM tags XR for read conversion and XG for reference conversion. The recognized values are "CT" and "GA". Support for these tags is added so that information is not lost if a bisulfite mapping is exported and then re-imported.
- Export of read mappings to SAM/BAM format now includes details on bisulfite conversion. These are specified using the SAM tags XR for read conversion and XG for reference conversion. The possible values of these tags are "CT" and "GA". This is provided for increased compatibility with third party tools.
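For third party scripts that consume exported bisulfite mappings, the XR and XG tags can be read from the optional fields of a SAM record, which follow the 11 mandatory columns and take the form TAG:TYPE:VALUE. The sketch below is an illustration only: it assumes the tags are written with the string type (Z), the helper name bisulfite_tags is hypothetical, and the record is made up.

```python
def bisulfite_tags(sam_line):
    """Return the XR (read conversion) and XG (reference conversion) tags of a SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # Optional fields start after the 11 mandatory SAM columns.
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        if tag in ("XR", "XG"):
            if value not in ("CT", "GA"):
                raise ValueError(f"unexpected {tag} value: {value}")
            tags[tag] = value
    return tags


# Made-up alignment record; the Z type for XR/XG is an assumption.
record = "\t".join([
    "read1", "0", "chr1", "101", "60", "8M", "*", "0", "0",
    "TTGATTGA", "IIIIIIII", "NM:i:1", "XR:Z:CT", "XG:Z:GA",
])
tags = bisulfite_tags(record)
print(tags)
```

Records without these tags simply yield an empty dictionary, which is what a non-bisulfite mapping would produce.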
Workflows
- Branch on Coverage - a new workflow control flow element where the downstream processing of read mappings can be controlled based on coverage values within reports.
- Import with Metadata - new template workflow that imports sequence data into sequence lists and associates the imported elements to a CLC Metadata Table containing descriptive information for each sample.
- Workflows containing Demultiplex Reads elements and workflows containing Split Sequence List elements can be run in Batch mode.
- Barcodes can be preconfigured in Demultiplex Reads elements in workflows.
- Workflow Export elements can be preconfigured to export to locations on AWS S3.
- When Annotate with Overlap Information is included more than once in the same workflow, columns with overlap information are now always added in the same order. Previously, concurrency issues could cause column order to be different between different runs.
Search for Reads in SRA
- Technical reads can be downloaded in addition to biological reads. The reads to import, as well as the read structure and orientation are configurable.
- When multiple accessions are provided in an Accession query field, each is searched for separately. Previously only entries containing all the accessions entered were returned.
- An estimate of the disk space and final size of imported sequence lists is no longer provided in the wizard, but further information about space requirements has been added to the manual.
- A troubleshooting section has been added to the manual.
Read mappings
- Read mapping speed on Apple Silicon processors has been improved. Read mapping results are not affected by this. Tools benefiting from this change include Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
- In stand-alone read mappings and read mapping tracks, deletions are now highlighted in the coverage graph and in the shown reads.
- For stand-alone read mappings, a "Match coloring" side panel provides the colors applied to reads when the compactness level is set to "Packed".
Import and export
- VCF Import:
- Supports symbolic alleles for inversions (<INV>), insertions (<INS>), deletions (<DEL>) and tandem duplications (<TANDEM:DUP>). Symbolic alleles that do not contain sequence information or are longer than 100,000 base pairs are imported to annotation tracks instead of variant tracks. Previously symbolic alleles were not imported.
- Improved handling of variants with multiple loci encoded in the same VCF record.
- VCF Export supports symbolic allele representation for insertions (<INS>), deletions (<DEL>) and tandem duplications (<TANDEM:DUP>). (Inversions (<INV>) were already supported.) With the exception of deletions, variants in annotation tracks are always exported as symbolic alleles. Deletions in annotation tracks and variants in variant tracks above a specified size are also exported as symbolic alleles. The default size is 1000 bp, which corresponds with the QCI Interpret requirement that InDels > 1000 bp must be represented as symbolic alleles.
- The PacBio importer supports HiFi reads.
- The read length when exporting to FASTQ format files has been increased from 524,288 bp to 16,777,216 bp.
- SAM/BAM Mapping Files importer:
- Performance improvements
- The circular flag of references is now retained.
- Import Tracks from File has been updated to show a warning if the file is not imported.
- GFF3 Export retains the case of attribute headers. Previously, all headers were adjusted to lower case during export.
- The history information of elements imported using Standard Import includes the specific importer used (e.g. "CSV table importer", "Fasta Importer", etc).
- Standard Import can be used to import files from AWS S3 locations.
- When exporting images to bitmap-based formats, the Screen resolution and High resolution options are now bounded so the maximum supported number of pixels will not be exceeded.
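The VCF Import and VCF Export items above refer to symbolic alleles, which appear in the ALT column (the fifth column) of a VCF record enclosed in angle brackets. The sketch below shows one way a script could recognize them and apply a size threshold like the 1000 bp export default. The helper names are hypothetical, the records are made up, and treating "above a specified size" as strictly greater than the threshold is an assumption.

```python
def is_symbolic(alt):
    """True if an ALT value is a symbolic allele such as <DEL> or <INS>."""
    return alt.startswith("<") and alt.endswith(">")


def symbolic_alts(vcf_line):
    """Return the symbolic alleles listed in the ALT column of one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    return [alt for alt in fields[4].split(",") if is_symbolic(alt)]


def export_as_symbolic(variant_length, threshold=1000):
    """Assumed rule: variants longer than the threshold use symbolic representation."""
    return variant_length > threshold


# Made-up records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
records = [
    "chr1\t1000\t.\tA\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=3500",
    "chr1\t5000\t.\tG\tGA,<INS>\t.\tPASS\t.",
    "chr1\t7000\t.\tC\tT\t.\tPASS\t.",
]
found = [symbolic_alts(r) for r in records]
print(found)
```

Explicit alleles such as GA or T are left untouched, so the same function can be used to split a file into records that need structural variant handling and those that do not.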
Sequence Lists
- Checkboxes can be enabled to select sequences within the graphical view of sequence lists. Lists can be sorted based on whether they are marked or not, and marked sequences can be deleted.
- In the Annotation Table view, the following changes have been made to the right-click menu:
- The underlying sequence of selected annotations can be deleted.
- The names of the sequences that the selected annotations are on can be copied to the clipboard.
- The option to export to gff now exports to GFF3 format - Export Selected to GFF3 File. This option has also been updated in the Annotation Table view of individual sequence elements.
- In the Table view, selected sequences can be deleted, and the names of selected sequences can be copied.
- Various minor improvements to labels in right-click menus.
CLC Metadata Tables
- When launching analyses in Batch mode, or when launching workflows with an Iterate element, CLC Metadata Tables with data associated can be used directly as input. Each row in the CLC Metadata Table is a batch unit, with data elements associated to a row, of a type compatible as input to the analysis, being the default contents of a batch unit. When launching workflows, the column to base the batch units on can be specified.
- New options for editing CLC Metadata Tables, including for adding content from other CLC Metadata Tables or Excel, CSV or TSV files. Rows in a CLC Metadata Table can also be selected and used to make a new CLC Metadata Table.
- When associating data automatically to CLC Metadata Tables, a preview of the associations that will be made is shown in the wizard.
Other improvements
- Search for Sequences at UniProt has been substantially updated and improved. Changes include new search fields and more informative results, including links to PubMed entries.
- Quick Search and Local Search have been substantially improved. Please refer to the documentation for details.
- The overviews of batch units shown when launching tools or workflows in Batch mode and when launching workflows with control flow elements (Iterate, Collect and Distribute) have been aligned. In the latter, the contents of batch units can now be adjusted by including or excluding elements based on a part of their name. Previously this was only possible when launching analyses in Batch mode. Right-click options to remove batch units or to remove particular data elements from a batch unit have been removed.
- When Low Frequency Variant Detection, Fixed Ploidy Variant Detection or Basic Variant Detection was used with a mapping realigned using Local Realignment with a guidance variant track, it was possible for partial insertions to be called. Now, the full insertion must be present within at least one, individual read for it to be reported.
- QC for Targeted Sequencing:
- Can report coverage statistics per gene.
- Supports analysis of read mappings generated by RNA-Seq Analysis.
- The hg38 masking track GenomeReferenceConsortium_masking_hg38_no_alt_analysis_set is provided via the Reference Data Manager as a reference element, and is part of reference sets that use the "hg38_no_alt_analysis_set" genome sequence. It contains regions defined by the Genome Reference Consortium and primarily serves to remove false duplications, including one affecting the gene U2AF1. It is intended for use with Map Reads to Reference.
- Annotate with Exon Numbers:
- Can add exon numbers to elements in annotation, expression and statistical comparison tracks. Previously only variant tracks could be annotated with exon numbers.
- Adds exon numbers when input elements start outside an exon but still overlap the exon.
- Adds all exons when multiple exons overlap a single input element.
- Allows annotation with exons from only one transcript or CDS.
- Filter on Custom Criteria can be used to filter Statistical Comparison Tracks, Statistical Comparison Tables, IsomiR tables, and miRNA Seed Tables.
- Demultiplex Reads has been updated to:
- Report barcodes without any matched reads.
- Show the barcode names in the history.
- Reports from Create Sample Report and Combine Reports generated using RNA-Seq reports now include the percentage of reads mapped to exons in the Fragment counting statistics table.
- In Create Sample Report, the percentage of target region positions with coverage above a set threshold can be used as a QC metric.
- QC for Sequencing Reads processes only the first 100,000 base pairs in long reads. Previously, the tool would fail when provided with very long reads.
- Local Realignment no longer realigns reads into regions with no coverage, such as introns in RNA-Seq read mappings.
- Remove Duplicate Mapped Reads uses an improved method to identify duplicate reads when handling paired end reads. In general, this improvement results in slightly more reads being considered duplicates.
- The options for extracting reads according to their location relative to features in an overlap track have been expanded in Extract Reads. Previously reads had to lie fully within an annotated region to be extracted. Now, in addition to that condition, options are provided for extracting any overlapping reads, extracting only reads that fully span annotated regions or extracting all reads except those that overlap with annotations in the overlap track.
- Assemble Sequences to Reference supports alignment of reads that span the origin of a circular reference.
- Secondary Peak Calling has a new option "Peak detection stringency".
- The report from Copy Number Variant Detection (CNVs):
- Includes a table showing the number of genes affected by CNV calls.
- Contains new coverage plots at genome and chromosome levels.
- The Trim Reads report now includes statistics for the number of reads in intact pairs and in broken pairs.
- Updated restriction site database to REBASE 2022-06-30.
- The Identify Known Mutations from Mappings output channel names when used in a workflow have been improved. The elements produced by the tool have not been changed.
- While viewing data, in most situations, tooltips can be suppressed by holding down the Ctrl key. Similarly those tooltips can be displayed immediately, instead of a moment after the mouse cursor stops moving, by holding down the Shift key.
- The Welcome Center content has been updated to focus on information helpful when getting started using the Workbench.
- Third party plugins for CLC Workbenches can be installed when the Workbench is running in Viewing Mode.
- A button has been added to the top Toolbar for contacting our Support team.
- Various minor improvements
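The four ways of relating reads to annotated regions described for Extract Reads above (fully within, any overlap, fully spanning, and no overlap) can be expressed as simple interval tests. The sketch below uses half-open (start, end) coordinates on a linear reference and does not handle regions crossing the origin of circular chromosomes. The mode keys and function names are hypothetical and are not the option names used in the tool.

```python
def within(read, region):
    """Read lies fully inside the region."""
    return region[0] <= read[0] and read[1] <= region[1]


def overlaps(read, region):
    """Read shares at least one position with the region."""
    return read[0] < region[1] and region[0] < read[1]


def spans(read, region):
    """Read covers the region from start to end."""
    return read[0] <= region[0] and region[1] <= read[1]


def extract(reads, regions, mode):
    """Select reads by their relation to any region; mode keys are hypothetical."""
    if mode == "none":
        return [r for r in reads if not any(overlaps(r, g) for g in regions)]
    test = {"within": within, "overlap": overlaps, "span": spans}[mode]
    return [r for r in reads if any(test(r, g) for g in regions)]


# Made-up coordinates: one annotated region and four reads.
regions = [(100, 200)]
reads = [(110, 150), (90, 120), (80, 220), (300, 350)]
for mode in ("within", "overlap", "span", "none"):
    print(mode, extract(reads, regions, mode))
```

In this example the first read is the only one extracted in the "within" mode, the first three in the "overlap" mode, the third in the "span" mode, and the last in the "none" mode.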
Bug fixes
- Low Frequency Variant Detection, Fixed Ploidy Variant Detection and Basic Variant Detection:
- Fixed an issue that in very rare cases caused insertions to be called twice. Now, the same insertion is always only included once in the variant track.
- Fixed an issue in the remove pyro-error variants filter. Previously, the frequency threshold for removing pyro-error variants was ignored and more variants than intended were removed. The filter is generally only used for Ion Torrent data. This fix may result in a small improvement to the precision of variant detection.
- Fixed a rare issue affecting variant calling in very low coverage regions, where a variant could be reported that was not present in any single read in the mapping.
- Fixed an issue causing Map Reads to Reference to fail if a masking track covering a whole chromosome was provided as input.
- RNA-Seq Analysis
- Fixed an issue where reads were not counted as unique for a transcript in the GE track table, if the read could map in multiple ways to the same transcript, but only to that transcript.
- Fixed an issue that could lead to an IndexOutOfBounds error when the option "Calculate expression for genes without transcripts" was selected, and two or more genes had the same name, and at least one of these had no transcripts, and the Region column of the table view of the gene track contained the text "join", ">", or "<" (i.e., the genes had splice structure or uncertain end positions).
- Fixed an issue where the gene identifier would be removed from the statistical comparison track and tables produced by the Differential Expression for RNA-Seq tool when it was not recognized to be an Ensembl gene identifier.
- Fixed an issue in Differential Expression in Two Groups and Differential Expression for RNA-Seq that affected the estimation of dispersions when including information from nearby genes. As a result of the fix, the p-values produced by these two tools are slightly different.
- Fixed an issue affecting Extract Consensus Sequence where annotations transferred from the reference sequence to the consensus sequence could be wrongly positioned if the read mapping had an insertion in a region that was removed due to low coverage.
- Fixed an issue where, if two genes had the same name and overlapped, their transcripts might become assigned to only one of the genes. The fix only applies when the gene and transcript annotations are imported from GFF3.
- Fixed an issue affecting the naming of outputs from Local Realignment when the tool was provided with multiple read mappings as input and not run in batch mode. Each resulting realigned read mapping is now named after the corresponding input. Previously all the realigned read mappings were named after the first read mapping in the set of inputs.
- QC for Sequencing Reads
- Fixed an issue in the report where the graph for R1 nucleotide contributions would be truncated to only show the same number of nucleotides as the R2 plot.
- Fixed an issue where the median read length in the supplementary report could be incorrect when the number of reads was very low. The median reported in the graphical report was correct.
- Amino Acid Changes
- Fixed an issue causing the output from the tool to be named after the reference data instead of the input data.
- Fixed an issue that caused the transcripts and proteins listed in the Coding region change and Amino acid change columns in the annotated variant track output to be inconsistently ordered.
- Fixed an issue in the Trim Reads report, where the number of reads under "No trim" could be incorrect when "Remove fixed number of bases" was enabled.
- Fixed an issue causing Show Enzymes Cutting Inside/Outside Selection to give wrong results when the selection crossed the junction of a circular sequence and a desired number of cut sites outside the selection was not specified.
- Fixed an issue in VCF Export, where specified minimum ploidy was not always enforced for complex variants. The issue would only occur when an allele had first been removed from a locus to adhere to the specified maximum ploidy.
- Fixed an issue where the wrong entry in a trim adapter list would be opened for editing if the list had been sorted or filtered.
- Fixed a rare issue in K-means/medoids clustering where a gene could be output in multiple clusters. This would occur when genes with identical expressions were chosen to be medoids, and so would only happen when K was comparable to the number of genes with unique expressions across samples.
- Fixed issues with Quantify miRNA where:
- It would fail on paired reads if using spike-ins.
- Opening a sequence list to view it would cause this tool to fail if that same sequence list had been used as input.
- In the report from Create Sample Report the value column in the summary table is coloured green or yellow according to whether the threshold is met. Previously, the threshold column was coloured.
- Workflow related
- Fixed an issue affecting the location of outputs generated from a workflow element that was also linked to a Collect and Distribute element. In cases where the output folder name was defined using the {input} or {2} placeholder, these outputs were sometimes all saved to the first folder created, instead of to different folders as intended.
- Fixed an issue where default names were applied to outputs from Output elements attached directly to an Iterate element in workflows, even when naming placeholders had been configured.
- Fixed an issue affecting workflows with nested Iterate elements where results from the outer level of iteration flowed into a Collect and Distribute element. Any output elements generated in the inner iteration, which should have been saved, were lost.
- Fixed an issue where unlocked options for on-the-fly importers in a workflow would be locked if the Input element was re-opened for editing.
- Fixed an issue affecting the "Highlight used elements" view setting of the workflow editor, where most elements, not just the unconnected ones, were grayed out when this option was selected.
- Fixed an issue where exporting information from an Expression Browser element to Excel with the "Export table as currently shown" option resulted in information from cells containing very long entries, such as GO Biological processes, being truncated.
- Fixed issues affecting the right-click option to "Extract Sequence…" over a sequence track in a track list containing an annotation track:
- The "Extract annotations" setting was having no effect. Even when unchecked, annotations were included in the output.
- The "Extract annotations" option was disabled when no sequence region had been selected before running "Extract Sequence". In this situation, the option is now enabled, and turned on by default.
- Fixed an issue where setting the subtree line color for a particular node of a phylogenetic tree would result in that color also being applied to neighboring nodes.
- Fixed an issue that could cause the Workbench to freeze when exporting elements with certain view settings to graphics formats, for example read mappings with compactness set to "Not compact" and the Sequence layout set to "No wrap".
- Fixed an issue where legends added to heat maps could sometimes be placed on top of an existing legend.
- Fixed an issue affecting the visualization of certain read mapping tracks, where when zoomed out, some empty space appeared at the top of the mapping when the option "Float variant reads to top" was selected.
- Fixed an issue in reads tracks tooltips where insertions could be reported as present in more than 100% of the reads.
- Fixed an issue affecting hyperlinked table entries, where HTML tags were sometimes included as text in the information exported to Excel or CSV formats.
- Fixed an issue where upgrading on Windows systems could be blocked due to a locked file.
- Fixed an issue where text in installer screens was not visible when installing the software in 'dark mode' on Linux.
- Various other minor bug fixes
Changes
- Tools in the RNA-Seq and Small RNA Analysis folder of the Toolbox have been rearranged into subfolders related to the natural flow of an analysis.
- The Cloning tool has been renamed to Restriction Based Cloning.
- The "Disconnect paired reads" viewing option for read mapping tracks and stand-alone read mappings has been replaced by the option "Show strands of paired reads". The behaviour of the new option is like the old one except that the members of each pair are connected by a blue line.
- Indexes used for searching are not the same as the ones used in earlier versions. New indexes are automatically established for each available CLC Workbench Location when installing version 23.0 for the first time. So for Workbenches, this change will be seamless in most cases. However, if you later run an old version of a CLC Workbench and save new data elements to a CLC Workbench location, a search from the newer version of the software will not find those unless you manually re-index your CLC Workbench locations.
- When tools or workflows are run in Batch mode, "Create subfolders per batch unit" is selected by default. Previously this option was not selected by default.
- PFAM accessions in the results table created by PFAM Domain Search are linked to PFAM entries hosted by InterPro. Links generated in older versions of the software were to the Pfam website, which is being decommissioned.
- AWS Connections:
- An AWS region can be specified in the AWS connection settings. When upgrading from an earlier version with AWS connections already defined, the region will be set to us-east-1 by default. This can be changed by editing the connection. The region setting is primarily relevant if you plan to submit analyses from a CLC Workbench with the CLC Cloud Module installed to run on a CLC Genomics Cloud setup.
- Information about AWS Connections now includes whether the connection is valid for submitting jobs to a CLC Genomics Cloud, in addition to whether the connection is valid for accessing files on AWS S3.
- The Java version bundled with CLC Genomics Workbench 23.0 is Java 17.0.4, where we use the JRE from the Azul OpenJDK builds.
Legacy tools and functionality
The following tools have been moved to the Legacy folder of the Workbench Toolbox and will be retired in a future version of the software:
- QIAGEN GeneReader importer (Legacy)
Functionality retirement
The following tools have been retired:
- Batch Rename (Legacy)
- Compare Sample Variant Tracks (Legacy)
- Empirical Analysis of DGE (Legacy)
Plugin notes
Plugin retirements
- Annotate with GFF file plugin. The tool Annotate with GFF/GVF/GTF file is now available directly in the Workbench.
- Haplotype Calling (beta). Functionality from this plugin is now in the Biomedical Genomics Analysis plugin.
QIAGEN CLC Genomics Workbench 22.0.3
Improvements
- When exporting images to bitmap-based formats, the Screen resolution and High resolution options are now bounded so the maximum supported number of pixels will not be exceeded.
- Search for Sequences in Uniprot has been updated to reflect changes at UniProt Knowledgebase (UniProtKB). This tool is broken in earlier releases, including CLC Genomics Workbench 22.0, 22.0.1 and 22.0.2.
- Third party plugins for CLC Workbenches can be installed when the Workbench is running in Viewing Mode.
- Various minor improvements
Bug fixes
- Fixed an issue causing Map Reads to Reference to fail if a masking track covering a whole chromosome was provided as input.
- Fixed an issue in RNA-Seq Analysis that could lead to an IndexOutOfBounds error when the option “Calculate expression for genes without transcripts” was selected, and two or more genes had the same name, and at least one of these had no transcripts, and the Region column of the table view of the gene track contained the text “join”, “>”, or “<” (i.e., the genes had splice structure or uncertain end positions).
- Fixed an issue affecting Extract Consensus Sequence where annotations transferred from the reference sequence to the consensus sequence could be wrongly positioned if the read mapping had an insertion in a region that was removed due to low coverage.
- Amino Acid Changes
- Fixed an issue causing the output from the tool to be named after the reference data instead of the input data.
- Fixed an issue that caused the transcripts and proteins listed in the Coding region change and Amino acid change columns in the annotated variant track output to be inconsistently ordered.
- Fixed issues with Quantify miRNA where:
- It would fail on paired reads if using spike-ins.
- Opening a sequence list to view it would cause this tool to fail if that same sequence list had been used as input.
- Fixed issues affecting the right-click option to “Extract Sequence…” over a sequence track in a track list containing an annotation track:
- The “Extract annotations” setting was having no effect. Even when unchecked, annotations were included in the output.
- The “Extract annotations” option was disabled when no sequence region had been selected before running “Extract Sequence”. In this situation, the option is now enabled, and turned on by default.
- Fixed an issue where setting the subtree line color for a particular node of a phylogenetic tree would result in that color also being applied to neighboring nodes.
- Fixed an issue that could cause the Workbench to freeze when exporting elements with certain view settings to graphics formats, for example read mappings with compactness set to "Not compact" and the Sequence layout set to "No wrap".
- Fixed an issue causing Standard Import of GenBank format to stall if qualifier names spanned more than one line.
- Fixed an issue where unlocked options for on-the-fly importers in a workflow would be locked if the Input element was re-opened for editing.
- Fixed an issue where upgrading on Windows systems could be blocked due to a locked file.
- Fixed an issue where text in installer screens was not visible when installing the software in 'dark mode' on Linux.
Changes
- The Java version bundled with CLC Genomics Workbench 22.0.3 is 11.0.17, where we use the JRE from the Azul Zulu Builds of OpenJDK.
- PFAM accessions in the results table created by PFAM Domain Search are linked to PFAM entries hosted by InterPro. Links generated in older versions of the software were to the Pfam website, which is being decommissioned.
QIAGEN CLC Genomics Workbench 22.0.2
Bug fixes
- Fixed an issue affecting Local Realignment where false positive insertions and deletions could be introduced in read mappings when a guidance variant was provided that had a sequence similar to the reference sequence immediately after the variant. See Important QIAGEN CLC software notifications for further details.
- Fixed an issue affecting the naming of outputs when Local Realignment was provided with multiple read mappings as input. When the tool was not run in Batch mode, a read mapping was generated per input, as expected, but all the realigned read mappings were named after the first read mapping in the set of inputs. Now each is named after the corresponding input.
- Fixed an issue with Predict Secondary Structure (RNA), where the Secondary Structure 2D View did not show colors when coloring by "Base pair probability" or "Unpaired probability".
- Fixed an issue where Template Workflows delivered by a plugin could be listed twice in the Toolbox after an unsuccessful plugin installation or removal.
- Fixed an issue that could result in an error like "no zstd-jni in java.library.path" when importing files or undertaking other activities involving the compression of CLC data elements.
Plugin notes
- After upgrading the CLC Genomics Workbench:
- Immune Repertoire Analysis will be able to identify clonotypes from data with a varied read structure. This tool is available when the Biomedical Genomics Analysis plugin is installed.
- Single Cell TCR-Seq Analysis will be able to identify clonotypes from data with a varied read structure. This tool is available when the CLC Single Cell Analysis Module is installed with access to a valid CLC Genomics Premium Modules license.
The results from these tools may be slightly altered as a result of the underlying improvement.
- Fixed an issue where the table view of a Venn Diagram could not be shown for data imported using the Import Expression Data tool of the Ingenuity Pathway Analysis plugin. The issue arose when a gene or transcript could not be matched to a location on the genome and the option "Unmatched genes/transcripts" was set to "Include".
QIAGEN CLC Genomics Workbench 22.0.1
Improvements
- The RNA-Seq Analysis Report has been updated to include new text in the "Adapter read-through" section. The additional text explains that trimming the start of paired-end reads (5' trim) can lead to spurious detection of adapter read-through.
- Update Sequence Attributes in Lists now accepts attribute information from comma separated (.csv) and tab separated (.tsv) format files.
- Volcano plots no longer include points where the fold change or y-axis component has a NaN value.
- When two or more reports produced by the same tool are used as input to Create Sample Report, and they contain conflicting values, the tool now fails. Previously, the results from the first report were used.
- When reports that contain identical tables and values are input to Create Sample Report, the values are included once in the report. Previously, a row per input report was included, even when the values were identical.
- Various minor improvements
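The Create Sample Report behavior described above (identical values included once, conflicting values causing a failure) can be pictured with a minimal Python sketch. This is an illustration only, not the Workbench's implementation: the function name `merge_report_values` and the representation of a report as a dictionary of metric names and values are assumptions made for the example.

```python
def merge_report_values(reports):
    """Merge metric -> value dictionaries from reports produced by the same tool.

    Hypothetical helper: identical values are kept once, and conflicting
    values raise an error instead of silently using the first report.
    """
    merged = {}
    for report in reports:
        for metric, value in report.items():
            if metric in merged and merged[metric] != value:
                raise ValueError(
                    f"Conflicting values for {metric!r}: "
                    f"{merged[metric]!r} vs {value!r}"
                )
            merged[metric] = value
    return merged


# Identical values appear once in the merged result.
same = merge_report_values([{"reads": 1000}, {"reads": 1000, "pairs": 480}])
print(same)
```

Failing on a conflict, rather than keeping the first value, makes it visible that the inputs disagree.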
Bug fixes
- Fixed an issue in Copy Number Variant Detection (CNVs), where a subset of region CNVs were not correctly calculated on chromosomes with low coverage target regions in the control samples. The issue led to overly large region CNVs. As gene-level CNVs are calculated from region-level CNVs, these were also affected. When only few target regions had low coverage in control samples, results were likely not affected.
- Fixed an issue that could cause the Workbench to crash on macOS Monterey when using the Welcome Center or viewing sunburst plots (which can be generated using tools in the CLC Microbial Genomics Analysis Module).
- Fixed an issue with the side panel settings where the label coloring of some restriction sites would always be black and the sorting would not be alphabetical when saved view settings were applied to a sequence.
- Fixed an issue where the wrong entry in a trim adapter list would be opened for editing if the list had been sorted or filtered.
- Fixed an issue that caused the Workbench to crash when installed on some Linux flavors if certain actions were taken and data compression was enabled.
- Fixed an issue in Maximum Likelihood Phylogeny that in rare situations led to tree construction never completing.
- Fixed an issue with the Create BLAST Database tool which could fail if the underlying native BLAST tool reported warnings.
- Fixed an issue where, when exporting elements to an S3 location in batch mode, the first element was exported successfully but the rest failed with an error.
- Fixed an issue with Split Sequence List when splitting based on attribute values, where an error resulted if the sequence list had more than 1000 distinct values for the selected attribute.
- Fixed an issue where a workflow could fail or produce an incorrect folder structure for outputs when it contained Split Sequence List, with splitting based on attribute values, and the name of at least one of the groups contained a forward slash "/" character.
- Fixed an issue affecting the naming of workflow outputs defined using a naming pattern of the form {input:N} or its equivalent {2:N}, e.g. {input:1} or {2:1}. The intended input name was not the one used to form the output element names when the workflow included an Iterate element, and import was done on the fly, and the batch units were defined based on the organization of the data, and:
- Each batch unit was a set of paired reads, or
- Each batch unit was a folder.
- Fixed an issue where the median read length in the supplementary report produced by QC for Sequencing Reads could be incorrect when the number of reads was very low. The median reported in the graphical report was correct.
- Fixed an issue with Create Sample Report affecting samples where reads mapped to only one chromosome. For such samples, the QC metric "% reads mapped in target region" was not included in the Quality Control table, even when that option had been selected when launching the tool.
- Various minor bug fixes
Changes
- The Java version bundled with CLC Genomics Server 22.0.1 is Java 11.0.14.1, where we use the JRE from the Azul OpenJDK builds.
QIAGEN CLC Genomics Workbench 22.0
New features and improvements
Template workflows
Template workflows are available from the Toolbox. These can be run as they are, or can be copied and customized for your needs. The template workflows provided are:
- Identify DNA Germline Variants - a workflow for detecting germline variants in DNA sequence data.
- RNA-Seq and Differential Gene Expression Analysis - a workflow for running RNA-Seq analyses on individual samples and then carrying out differential expression across specified groups represented by those samples.
- Prepare Raw Data - a workflow for quality reporting and adapter trimming of reads, intended for use before further analysis of that data.
Tools for annotating and manipulating sequence lists
The new Sequence Lists folder under Toolbox | Utility Tools contains tools for working with sequence lists. This includes existing tools, with new names and expanded functionality, as well as new tools:
- Split Sequence List: New tool. Splits up nucleotide or peptide sequence lists. The output can be a specified number of lists, lists containing a specified number of sequences, or lists containing sequences with particular attribute values, such as terms in the description.
- Update Sequence Attributes in Lists: New tool. Updates and adds information about the sequences in a list. For example, descriptions can be updated, or new information types can be added based on information provided in an Excel file.
- Create Sequence List: Existing tool. Creates new sequence lists from sequence elements and/or sequence list elements. Previously available only from the File | New menu.
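The three output modes described for Split Sequence List can be sketched in Python as follows. This is only an illustration of the idea, not how the tool is implemented: the function names, the use of dictionaries as sequence records, and the choice of contiguous, near-equal chunks when a number of lists is specified are all assumptions.

```python
from collections import defaultdict


def split_into_n_lists(seqs, n):
    """Split into exactly n lists of near-equal size (assumed: contiguous chunks)."""
    base, extra = divmod(len(seqs), n)
    out, start = [], 0
    for i in range(n):
        end = start + base + (1 if i < extra else 0)
        out.append(seqs[start:end])
        start = end
    return out


def split_by_size(seqs, size):
    """Split into lists holding at most `size` sequences each."""
    return [seqs[i:i + size] for i in range(0, len(seqs), size)]


def split_by_attribute(seqs, attribute):
    """Group sequences by the value of one attribute (hypothetical record layout)."""
    groups = defaultdict(list)
    for seq in seqs:
        groups[seq.get(attribute)].append(seq)
    return dict(groups)


# Hypothetical records: five sequences from two organisms.
records = [
    {"name": f"seq{i}", "organism": "A" if i % 2 == 0 else "B"}
    for i in range(5)
]
print([len(part) for part in split_into_n_lists(records, 4)])
print([len(part) for part in split_by_size(records, 2)])
print(sorted(split_by_attribute(records, "organism")))
```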
Access to Amazon S3 and BaseSpace
- Import data from Amazon S3 or BaseSpace when launching workflows
- Save workflow results to Amazon S3
- Access data in BaseSpace and configure a regional BaseSpace location
Other new functionality
- MGI/BGI importer: An importer for MGI/BGI fastq format files.
- Rename Sequences in Lists: Rename sequences within sequence lists by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- Rename Elements: Rename elements by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- A Heat map graphics exporter has been introduced for exporting heat maps to graphics file formats.
- Files containing tab separated values (.tsv) can be imported as tables using Standard Import.
- Export VDJ tools: Exports T-Cell VDJ repertoire in txt format.
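As an illustration of the renaming operations mentioned for Rename Sequences in Lists and Rename Elements (adding characters, or replacing parts of names using regular expressions), here is a small Python sketch. It uses Python's `re` syntax, which may differ from the regular expression syntax accepted by the Workbench, and the function names and example names are hypothetical.

```python
import re


def replace_in_names(names, pattern, replacement):
    """Replace every match of a regular expression in each name."""
    return [re.sub(pattern, replacement, name) for name in names]


def add_affixes(names, prefix="", suffix=""):
    """Add fixed text to the start and/or end of each name."""
    return [f"{prefix}{name}{suffix}" for name in names]


# Hypothetical read-file style names.
names = ["sample1_R1_001", "sample2_R1_001"]

# Remove a trailing "_R1_<digits>" part, then add a prefix.
trimmed = replace_in_names(names, r"_R1_\d+$", "")
print(add_affixes(trimmed, prefix="run42_"))
```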
Improved menu organization and tool access
- New top level menus:
- Connections: For tools and functionality relevant to connections to other systems, such as a CLC Genomics Server.
- Utilities: For general tools and functionality such as search tools, the Plugin Manager, Workflow Manager and Reference Data Manager.
- Improvements to the contents and order of tools in other top level menus
- The Favorites tab, where favorite tools and frequently used tools are listed for easy access, is now available in the Launch dialog and the workflow Add Elements dialog, in addition to the Toolbox area at the bottom left of the Workbench.
RNA-Seq and Expression Analysis improvements
- Support for chimeric protocols has been improved in RNA-Seq Analysis.
- RNA-Seq Analysis now reports biotypes with frequency 0 in the "Distribution of biotypes" table.
- When included in workflows, PCA for RNA-Seq and Create Heat Map for RNA-Seq can be run using just one sample as input, thus enabling their use for both multi-sample and single-sample analyses.
- The statistical comparisons generated by Differential Expression for RNA-Seq and Differential Expression in Two Groups include the Biotype when it is available from the expression samples used as input.
- When using Create Expression Browser with miRNA input, the miRBase ID is preserved.
- When performing Differential Expression for RNA-Seq on miRNA "group on mature" expression tables, the miRBase ID is now exposed in the Statistical Comparison Tables.
Demultiplex Reads
- Demultiplex Reads now supports setting barcodes from table elements in addition to importing barcodes from local files.
- The barcode import table format has been extended to support additional columns.
- Barcode columns can be sorted.
- When running a workflow that contains a Demultiplex Reads element, the workflow wizard can calculate a preview and remove barcodes, similar to what is seen when running Demultiplex Reads directly from the Toolbox.
- When multiple elements are provided as input, the information in the Preview pane includes information obtained from across these. Previously, only the first input element was used for this.
BLAST related updates
- BLAST has been upgraded to BLAST+ 2.12.0, which includes a number of improvements and bug fixes. A full list of BLAST+ 2.12.0 changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- The list of databases available using BLAST at NCBI has been expanded, including the addition of ‘16S ribosomal RNA sequences (Bacteria and Archaea)’ and ‘28S ribosomal RNA sequences from Fungi type and reference material (LSU)’.
- When BLAST at NCBI is used with multiple query sequences, the job will continue even if particular sequences fail due to a problem. Results for successful searches (including those with no hits) are returned. Sequences missing from the results due to problems are recorded in the job log.
- Searches against the Patented protein sequences database using BLAST at NCBI work once again. Previously, these searches always failed, with a dialog message saying only that no hits were found even though an error was returned by the NCBI. For affected searches, the error was reported in the job log.
- Fixed an issue affecting BLAST HSP Tables where the calculation of percent overlap between BLAST hits in the reverse direction and the query sequence was based on a sequence length that was 2 base pairs too short, leading to incorrect values.
- Improvements have been made to make it less likely that a "CPU usage limit was exceeded" error will be returned when running blastp, blastx, tblastn or tblastx using BLAST at NCBI.
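To give a feel for the size of the error in the BLAST HSP Tables percent overlap fix listed above, the following Python sketch compares a percentage computed with the full query length against one computed with a length that is 2 base pairs too short. The formula used here (overlapping bases divided by query length) is an assumption for illustration; the exact calculation used by the Workbench is not described in these notes, and the numbers are made up.

```python
def percent_overlap(overlap_bases, query_length):
    """Assumed formula: share of the query covered by the hit, in percent."""
    return 100.0 * overlap_bases / query_length


query_length = 100   # hypothetical query length in base pairs
overlap_bases = 49   # hypothetical number of overlapping bases

correct = percent_overlap(overlap_bases, query_length)
too_short = percent_overlap(overlap_bases, query_length - 2)

print(correct)    # 49.0
print(too_short)  # 50.0
```

Under this assumption the effect is largest for short queries, since 2 base pairs are then a larger share of the length.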
Importer and exporter improvements
- Multiple tables can be exported to a single file when using the following exporters: Tab delimited text, Annotation tab delimited text, Table CSV, Annotation CSV.
- A new custom reads option has been added to the Illumina importer. Extended options for fastq file import have been added to support 10X data. For example, it is now possible to import three fastq files with R1, R2, and I1 as paired reads, where I1 is added in front of R1.
- When exporting variant tracks to VCF format, variants that fall under thresholds to be exported can now optionally be excluded entirely from the resulting VCF file.
- When using the VCF export setting for complex variant representation "Reference overlap and depth estimate", complex overlapping reference alleles are now exported with a homozygous reference genotype.
- The list of supported GVF attributes in column 9 has been expanded when importing GVF files using the GFF2/GTF/GVF track importer.
- 1000 Genomes annotations are now better supported by the GFF2/GTF/GVF track importer.
- The Zygosity field is now included when exporting to GVF format.
- A subset of columns to export can be specified when exporting Mapping Coverage data.
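The custom reads option for the Illumina importer, where I1 is added in front of R1 and the result is paired with R2, can be sketched in Python as below. This is a simplified illustration, not the importer's implementation. It assumes that reads are held as (name, sequence, quality) tuples, that the three files list reads in the same order, and that the quality string of I1 is prepended along with its bases. The function names are hypothetical.

```python
def prepend_index(i1_read, r1_read):
    """Return R1 with the I1 bases and qualities added in front (assumed behavior)."""
    name, seq, qual = r1_read
    _, index_seq, index_qual = i1_read
    return (name, index_seq + seq, index_qual + qual)


def build_pairs(r1_reads, r2_reads, i1_reads):
    """Pair (I1 + R1) with R2, assuming all three inputs are in the same order."""
    return [
        (prepend_index(i1, r1), r2)
        for r1, r2, i1 in zip(r1_reads, r2_reads, i1_reads)
    ]


# Hypothetical single read from each of the three fastq files.
r1 = [("read1", "ACGTACGT", "IIIIIIII")]
r2 = [("read1", "TTGCA", "IIIII")]
i1 = [("read1", "GGCC", "FFFF")]

pairs = build_pairs(r1, r2, i1)
print(pairs[0][0])
```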
Other improvements
- Copy Number Variant Detection (CNVs) can use coverage tables generated by QC for Targeted Sequencing as control mappings. Read mappings can still be used as control mappings.
- Copy Number Variant Detection (CNVs) allows different fold-change thresholds for deletions and amplifications.
- When working with paired reads, Trim Reads allows the trimming of a fixed number of bases to apply to only read 1, only read 2, or both reads of each pair.
- An option has been added to Extract Reads or Create Reads Track from Selection to allow just one member of a pair to be extracted when only one meets the extraction criteria.
- Extract Reads accepts stand-alone read mappings in addition to reads tracks as input.
- Create Sample Report can take both the Graphical and the Supplementary Report created by QC for Sequencing Reads as input.
- An option has been added to Amino Acid Changes for using one letter amino acid codes in HGVS annotations.
- Filter on Custom Criteria now accepts expression tracks as input.
- In Quantify miRNA the option to select strand-specific analysis has been removed. The analysis is now always strand-specific.
- Remove Duplicate Mapped Reads considers if reads are duplicates based on the start position of reads instead of both start and end. This allows reads that have undergone quality trimming to be recognised as duplicates.
- The distance to consider around an intron-exon boundary when using Predict Splice Site Effect can be specified. Previously a length of 2 was always used.
- A choice of extinction coefficients has been introduced in Create Sequence Statistics.
- Create Mapping Graph can now generate graphs for forward read coverage and reverse read coverage.
- The Sample Reads tool is now named Subsample Sequence List and is located under the Utility Tools | Sequence Lists subfolder of the Toolbox. Peptide sequence lists are now accepted by this tool, in addition to nucleotide sequence lists. Existing workflows containing a Sample Reads element can be upgraded as normal. The element in the workflow will remain named "Sample Reads", as seen in the Workflow Editor.
- In the tooltip displayed when moving the mouse cursor over reads in a fully zoomed-in view of a read mapping track, the number of reads supporting a deletion is displayed, in addition to the number of reads supporting particular base calls.
- When viewing data, tabs within the same tab area can be re-ordered by drag and drop.
- When a Track List and the tracks it refers to are copied in a single operation, the new copy of the Track List will refer to the new copies of the tracks. Previously, the new Track List continued to refer to the original tracks.
- For workflows with paired read import as part of the workflow run, and when the workflow is launched in batch mode, or contains Iterate elements, paired read handling is now the same as for the relevant NGS importer tools (Illumina, Fasta, Sanger) themselves, irrespective of how batch units are defined or organized. Previously when batch units were based on data organization and all files were in the same folder, each file was treated as a separate batch unit irrespective of whether the Paired option was checked.
- In a workflow, Extract Annotated Regions (formerly Extract Annotations) can be connected to many more downstream tools than earlier.
- A Create Sequence List workflow element is available, replacing the New | Sequence List element. Create Sequence List can be connected to many more tools downstream than the earlier element.
- Memory usage when launching workflows in batch mode has been improved.
- Trim Sequences specifies which version of the UniVec database was used, both in the report and in the history of the trimmed sequences output.
- When the option to create a log is enabled when launching analyses in batch mode, a log file is created for each batch unit, as well as a combined log for all the analyses. Previously, only the combined log was generated.
- The table search criteria "is in list" and "is not in list" can be used with integers without specifying thousands separators in the search term.
- The few tools that directly manipulate input elements, instead of generating a new element containing the changes as output, now generate a new element as output when used within a workflow. This allows them to be handled like any other tool in a workflow context.
- In addition to sequence elements, Add attB Sites accepts sequence lists with fewer than 10,000 sequences as input.
- Internal compression of CLC data has been improved. Elements created with this version of the software, with compression enabled, can be opened in version 21.0.5 and higher. Data must be exported or saved as uncompressed if sharing data with earlier versions of the software.
- Various minor improvements
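The change to Remove Duplicate Mapped Reads described above, where duplicates are identified from the start position rather than from both start and end, can be illustrated with a minimal Python sketch. It is not the tool's algorithm. It assumes single-end, forward-strand reads represented as dictionaries, keeps the first read seen for each key, and includes chromosome and strand in the key as an assumption. The function name and the `use_end` switch are hypothetical.

```python
def remove_duplicates(reads, use_end=False):
    """Keep the first read for each mapping key.

    With use_end=True the key includes the end position (the earlier
    behavior as described); with use_end=False only the start is used.
    """
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["strand"], read["start"])
        if use_end:
            key += (read["end"],)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Two copies of the same fragment; the second lost 10 bases to quality trimming.
reads = [
    {"name": "r1", "chrom": "chr1", "strand": "+", "start": 100, "end": 250},
    {"name": "r2", "chrom": "chr1", "strand": "+", "start": 100, "end": 240},
]

print(len(remove_duplicates(reads, use_end=True)))   # both reads kept
print(len(remove_duplicates(reads, use_end=False)))  # trimmed copy recognized
```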
Bug fixes
- Fixed an issue in Create Box Plot where percentiles reported in the history of a box plot element were off by one. For example, the "25%-ile" value was given the 24th percentile value. The correct values were used in the plots themselves.
- Fixed an issue in Demultiplex Reads where dual barcodes were not allowed to have mismatches in both barcodes.
- Fixed an issue in Demultiplex Reads where dual barcodes could previously be selected in random combinations. Dual barcodes are now handled in pairs.
- When using the "Genome annotated with genes only" in RNA-Seq Analysis, the range of annotation track types that can be used has been expanded. This includes the use of CDS annotation tracks, among others.
- Fixed an issue in Create Sample Report where, when QC thresholds had been specified for Trim Reads, wrong values from the Trim Reads report were shown in table 1.1 Quality Control of the sample report.
- Fixed an issue that caused Create Sample Report to fail when input reports did not contain values for specified QC thresholds.
- Fixed an issue in Combine Reports and Create Sample Report where the "Mean coverage per target" section would report coverages 10x too high when including a report from QC for Targeted Sequencing.
- Fixed an issue in VCF export where, in rare cases, variants below a specified minimum allele fraction threshold were not removed.
- Fixed an issue affecting Local Realignment where large indels upstream of a target region were sometimes not used when provided as guidance variants.
- Fixed an issue that in rare cases could cause Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection to fail on very high coverage samples when the "Remove pyro-errors variants" option was enabled.
- Fixed an issue where Remove Duplicate Mapped Reads did not always de-duplicate paired reads with read-through correctly.
- Fixed an issue where Remove Duplicate Mapped Reads did not always de-duplicate reverse mapping single-end reads correctly.
- Fixed an issue affecting QC for Targeted Sequencing, where it failed with an error when an RNA-Seq read mapping containing paired reads was provided as input.
- Fixed an issue in Filter on Custom Criteria where numeric annotations were sometimes not allowed to be filtered using numerical operators such as "<", ">", "=".
- Fixed an issue in Trio Analysis where, in rare cases, inconsistent zygosity between mother and father could lead to a wrong annotation of inheritance. Trio Analysis now reports inheritance as 'Inconsistent zygosity' if zygosity or the number of alleles is inconsistent between child, mother or father.
- Fixed an issue with VCF files exported from the CLC Genomics Workbench, where fusions that had one breakpoint in common were represented in a way that prevented QIAGEN Clinical Insight Interpret from displaying the counts.
- Fixed an issue causing Quantify miRNA to fail when there were empty entries in the Accession column of miRBase.
- Fixed an issue causing BLAST hits with an identity below 40% to be shown in black even if the threshold for coloring was set lower than this.
- Fixed an issue where threshold values for color selectors in the side panel of the View Area could not be adjusted.
- Fixed an issue where specifying the color range values for heat maps in the side panel settings did not work.
- Fixed an issue where the names of outputs from Output elements attached directly to an Iterate element in workflows were not as intended when the metadata ({3}) placeholder was used. We generally recommend that the specific input number(s) to include in output names are specified when configuring workflows that contain control flow elements.
- An element's position within a folder in the Navigation Area can be controlled when copy/pasting, with the pasted element appearing above a selected element in the same folder. This fixes an issue introduced in CLC Genomics Workbench 12.0, where pasted elements were always placed at the bottom of the list in a folder when pasting.
- Fixed an issue where the content of the recycle bin was not shown correctly after the recycle bin had been emptied.
- Various bug fixes
Changes
- The Sample Reads tool is now named Subsample Sequence List and is located under the Utility Tools | Sequence Lists subfolder of the Toolbox. The functionality of this tool has been expanded. See the Improvements listing above, or refer to the manual.
- The Extract Annotations tool is now named Extract Annotated Regions.
- The tool Set Up Experiment is now named Set Up Microarray Experiment.
- The Track Tools folder, containing tools for working with track elements, has been moved from the top level of the Toolbox to under the Utility Tools folder. Correspondingly, the workflow element for creating track lists is under the Utility Tools folder in the Add Elements dialog and no longer under the "New" list.
- The workflow element for creating sequence lists is under the Utility Tools folder. It no longer appears under the "New" list in the Add Elements dialog.
- The “Number of duplicates distribution” section has been removed from the report produced by Remove Duplicate Mapped Reads.
- When exporting BAM files, file names are limited to a maximum of 254 characters.
- Input modifying tools within workflows generate an output element instead of directly modifying the input provided. Workflows containing these tools may need to be edited.
- The Cut, Copy and Paste buttons have been removed from the toolbar. These functionalities are still available using items under the Edit menu or standard keyboard shortcuts.
- The Restore option under the Edit menu, for moving elements back out of the recycle bin, is now called Restore from Recycle bin.
- The Empty option under the Edit menu, for emptying the recycle bin, is now called Empty Recycle bin.
- The Java version bundled with CLC Genomics Workbench 22.0 is Java 11.0.10, where we use the JRE from AdoptOpenJDK.
Legacy tools
The following tools have been moved to the Legacy folder of the Workbench Toolbox and will be retired in a future version of the software:
- Batch Rename (legacy). This tool has been replaced by two new tools: Rename Elements, for renaming data elements, and Rename Sequences in Lists, for renaming sequences within sequence lists.
- Empirical Analysis of DGE (legacy)
Functionality retirement
The following tools have been retired:
- Create Track from Experiment (legacy)
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
- Roche 454 NGS import (legacy)
- Create Combined RNA-Seq Report (legacy)
- Remove Reference Variants (legacy)
The right-click option "Run in batch mode (legacy)" for launching installed, multi-input workflows in Batch mode has been retired. Workflows can be launched in batch mode using standard launch functionality.
QIAGEN CLC Genomics Workbench 21.0.6
Improvements
- When using the VCF export setting for complex variant representation "Reference overlap and depth estimate", complex overlapping reference alleles are now exported with a homozygous reference genotype.
- When exporting variant tracks to VCF format, variants that fall under thresholds to be exported can now optionally be excluded entirely from the resulting VCF file.
Bug fixes
- Fixed an issue in Copy Number Variant Detection (CNVs), where a subset of region CNVs were not correctly calculated on chromosomes with low coverage target regions in the control samples. The issue led to overly large region CNVs. As gene-level CNVs are calculated from region-level CNVs, these were also affected. When only a few target regions had low coverage in control samples, results were likely not affected.
- Fixed an issue that could cause the Workbench to crash on macOS Monterey when using the Welcome Center or viewing sunburst plots (which can be generated using tools in the CLC Microbial Genomics Analysis Module).
- Fixed an issue with the side panel settings where the label coloring of some restriction sites would always be black and the sorting would not be alphabetical when saved view settings were applied to a sequence.
- Fixed an issue that caused Create Sample Report to fail when input reports did not contain values for specified QC thresholds.
- Fixed an issue in VCF export where, in rare cases, variants below a specified minimum allele fraction threshold were not removed.
- Fixed an issue where the wrong entry in a trim adapter list would be opened for editing if the list had been sorted or filtered.
- Fixed an issue in Maximum Likelihood Phylogeny that in rare situations led to tree construction never completing.
- Fixed an issue with the Create BLAST Database tool which could fail if the underlying native BLAST tool reported warnings.
- Improvements have been made to make it less likely that a "CPU usage limit was exceeded" error will be returned when running blastp, blastx, tblastn or tblastx using BLAST at NCBI.
- Fixed an issue affecting the naming of workflow outputs defined using a naming pattern of the form {input:N} or its equivalent {2:N}, e.g. {input:1} or {2:1}. The intended input name may not have been the one used to form the output element names when the workflow included an Iterate element, and the batch units were folders, defined based on the organization of the input data, and import was done on the fly.
- An element's position within a folder in the Navigation Area can be controlled when copy/pasting, with the pasted element appearing above a selected element in the same folder. This fixes an issue introduced in CLC Genomics Workbench 12.0, where pasted elements were always placed at the bottom of the list in a folder when pasting.
- Various minor bug fixes
QIAGEN CLC Genomics Workbench 21.0.5
Improvements
- GO Annotation File (GAF) 2.2 files can now be imported.
- Amino Acid Changes provides c. annotations for intronic regions when "Use transcript priorities" is enabled.
- Create Expression Clone (LR) output names include the Destination vector name first, followed by the Entry clone name. Previously the naming pattern was inconsistent.
- The order of samples in reports generated by Create Sample Report when run inside workflows is now consistent between workflow runs. Previously, when multiple Collect and Distribute elements connected to a Create Sample Report element, the order of the samples in the report could differ between workflow runs.
Bug fixes
- Fixed an issue where housekeeping gene normalization was never used by Differential Expression in Two Groups.
- Server import/export locations can again be pre-configured as export destinations in workflow export elements when logged into a CLC Genomics Server. This functionality was present in earlier release lines, but was not present in earlier 21.x versions.
- Fixed an issue where TPM expression values were incorrectly reported when using the RNA-Seq Analysis tool with Library Type set to 3' sequencing. Previously TPM was calculated per million mapped reads, instead of per million exonic reads. This resulted in TPMs that summed to less than 1 million, and made TPMs less comparable between libraries that had different proportions of intronic and/or intergenic fragments. Additional notes:
- This issue did not affect results of downstream analyses using tools in the RNA-Seq and Small RNA Analysis folder, i.e. PCA, Heat Map, Venn diagrams, and Gene Set testing.
- This issue did affect the TPM values of the Quantify QIAseq UPX 3' workflow, delivered by the Biomedical Genomics Analysis plugin.
- Right click menu options relating to exporting graph views that were missing in earlier CLC Genomics Workbench releases in the 21.x release line have been restored. This includes the "Open Graph in New View" and "Export Graph to Comma-separated File" options.
- Fixed an issue causing Copy Number Variant Detection (CNVs) to fail if the order of the chromosomes in the gene and target region annotation tracks differed.
- Fixed an issue where editing the reverse strand of a DNA sequence could result in extra bases being inserted.
- Fixed an issue in Extract Consensus Sequence so that insertions are no longer added in low-coverage regions.
- Fixed an issue affecting the graphical display of BLAST results, where HSPs matching the start of a subject sequence were shown as matching the entire sequence. This affected only the graphical overview. Information in the HSP table was correct.
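The TPM fix listed above for the 3' sequencing library type can be illustrated with a small sketch. This is not the Workbench implementation: it assumes that TPM for 3' libraries is a per-gene read count scaled to one million of a chosen denominator, and the gene names, counts and the `tpm` helper are invented for illustration. It shows only why dividing by all mapped reads gives values that sum to less than 1 million, while dividing by exonic reads does not.

```python
def tpm(counts, denominator):
    """Scale per-gene read counts to 'per million' of the given denominator."""
    return {gene: n * 1_000_000 / denominator for gene, n in counts.items()}


# Hypothetical library: reads assigned to the exons of each gene.
exonic_counts = {"geneA": 600, "geneB": 300, "geneC": 100}
exonic_total = sum(exonic_counts.values())   # 1000 exonic reads
intronic_intergenic = 250                    # mapped, but not exonic
mapped_total = exonic_total + intronic_intergenic

old = tpm(exonic_counts, mapped_total)   # previous behaviour: per million mapped reads
new = tpm(exonic_counts, exonic_total)   # corrected behaviour: per million exonic reads

print(sum(old.values()))  # 800000.0
print(sum(new.values()))  # 1000000.0
```

With the old denominator, the sum depends on the fraction of intronic and intergenic reads in each library, which is why values were less comparable between libraries.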
Advanced notice
The following will be removed in a future release of the software:
- Create Combined RNA-Seq Report (legacy)
- Create Track from Experiment (legacy)
- Remove Reference Variants (legacy)
- Reverse Sequence (legacy)
- Roche 454 NGS import (legacy)
- Tools under the Small RNA Analysis (legacy) folder:
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
- Right-click functionality for launching installed, multi-input workflows in Batch mode.
QIAGEN CLC Genomics Workbench 21.0.4
Improvements
- The stability of SRA Download has been improved, providing better support of large downloads, particularly on systems with less stable network connections.
- The speed of RNA-Seq Analysis jobs has been improved. Workflows containing an RNA-Seq Analysis element will need to be updated.
- InDels and Structural Variants now discards reads that are longer than 5000 bp. Long reads could previously cause the tool to fail and are of minimal value for structural variant detection based on unaligned ends. Workflows containing an InDels and Structural Variants element will need to be updated.
- Various minor improvements
Bug fixes
- Fixed an issue where InDels and Structural Variants calculated variant ratios incorrectly in cases where multiple breakpoints supported the variant.
- Fixed an issue that caused Annotate with Repeat and Homopolymer Information to fail when variants were within 10 base pairs of chromosome ends.
- Multiple SAM/BAM mapping files can again be selected when launching the Import SAM/BAM Mapping Files tool. Only single files could be selected in versions 21.0.1, 21.0.2 and 21.0.3.
- Fixed an issue that caused Download Blast Databases to occasionally fail when downloading a subset of databases from NCBI.
- Fixed an issue where deleting a portion of a sequence could cause an error to be reported.
- Fixed an issue where an error was produced when working with outputs of Design Primers when options in the side panel under Primer info were selected. The error appeared when the mouse cursor was subsequently moved over the sequence.
- Fixed an issue where an error resulted when trying to do the following actions when working with sequences in sequence lists: Insert Restriction Site Before/After Selection, Digest All and Create Restriction Map, Show Enzymes Cutting Inside/Outside Selection.
- Fixed an issue that caused an occasional error to arise when deleting bases in a circular sequence that also have restriction sites.
- Fixed an issue affecting Cloning Editor where switching between linear and circular sequence representations could cause the editor to report an error.
- Fixed an issue where Cloning Editor occasionally reported an error when running in Viewing Mode.
- Fixed issues present when working with Sanger assemblies when the "Lock top sequence" viewing option was enabled. Issues included not being able to scale the trace data, problems selecting bases in the sequences, and the right-click menu not offering the expected options.
- Fixed an issue where dragging the edge of contigs in Sanger assemblies beyond the viewing area caused the horizontal scroll bar to keep scrolling.
- Fixed an issue affecting Batch Rename, where if the term to be replaced in element names was not present, the names were deleted.
- Fixed an issue affecting naming patterns in export tools, and in workflow output and export elements, where upper case text within curly brackets in these patterns was translated to lower case when naming the outputs.
- Fixed an issue affecting workflows containing Iterate elements when the names of input data elements contained characters considered special by the operating system (e.g. on Windows : < > | ). When affected by this issue, no outputs would be produced or just a Workflow Result Metadata table would be produced.
- Fixed an issue that caused workflows with a Demultiplex Reads element to fail if the tag list options in that element were unlocked and no other tools in the workflow had unlocked parameters.
- Fixed an issue affecting workflows containing Iterate elements connected to more than one downstream element, where the batch overview step in the launch wizard displayed the same input objects multiple times. This was a problem in presentation only. It did not affect the analyses.
- Fixed an issue affecting the Table view of long Sequence Lists, where the contents of the 'Linear' column could be missing for some sequences.
- Fixed an issue where table export of a Venn diagram did not export all the columns selected in the wizard.
- Fixed an issue affecting metadata tables where the paths to data elements that had been moved were not updated to reflect the new location.
- Fixed an issue where Import Tracks from File did not retain COSMIC links in variant tracks imported from VCF.
- Fixed an issue affecting Import Tracks from File where importing COSMIC variation database did not support the QIAGEN reference set Homo_sapiens_sequence_hg38_no_alt_analysis_set.
- Various minor bug fixes
Plugin notes
Changes have been made that improve the speed of jobs on the CLC Genomics Cloud Engine (GCE) that involve large CLC data elements. This improvement affects systems where the Cloud Plugin is installed and Cloud Connection settings have been configured.
Changes
The location of reference data available for download from QIAGEN via the CLC Genomics Workbench (e.g. QIAGEN reference sets, protein and resistance databases) is changing. The sites to include when configuring firewall settings for networks that use a whitelist approach are:
- reference.clcbio.com
- reference.clcbio.com.s3-website.eu-central-1.amazonaws.com
- genomics-cloud-reference-data-eu-central-1.s3-website.eu-central-1.amazonaws.com
No configuration changes are needed in the CLC Genomics Workbench itself. The full list of sites the software accesses is available in our FAQ entry: Which internet addresses does CLC software need access to?
QIAGEN CLC Genomics Workbench 21.0.3
Bug fixes
- Fixed an issue that caused metadata layers to be displayed incorrectly on heat maps produced by Create Heat Map for RNA-Seq. This issue affects analyses run using CLC Genomics Workbench 21.0.1 or 21.0.2, whether the tool is run independently or included in workflows. We recommend deleting heat maps produced by affected software, and re-running the analyses. Please see the notification about this issue, which includes details about how to check if your results are affected.
QIAGEN CLC Genomics Workbench 21.0.2
Improvements and bug fixes
- Fixed an issue that prevented the use of saved view settings that included additional restriction enzymes (configured via the side panel settings). This issue could cause an error message to arise when the saved view settings were applied directly to an open sequence or sequence list, or when such view settings were applied generally for sequences in the Workbench Preferences.
- Fixed an issue where changes to the list of motifs, made via the side panel settings of sequences, were not saved when that view setting was saved.
- Fixed an issue where the axis on a log plot could show the wrong value when zoomed out to the point where there were multiple orders of magnitude between the indicators.
- Fixed an issue where Demultiplex Reads run within a workflow context could not be run on paired reads where the barcode was defined on the mate.
- Various minor improvements
QIAGEN CLC Genomics Workbench 21.0.1
Bug fixes
- Fixed an issue introduced in CLC Genomics Workbench 21.0, where stand-alone read mappings could not be opened.
- The Java version bundled with CLC Genomics Workbench 21.0.1 is Java 11.0.6, where we use the JRE from AdoptOpenJDK. This addresses an issue present in Java 11.0.8, included with CLC Genomics Workbench 21.0, which could cause Java applications on mac systems to crash when switching keyboard layouts.
Please see the release notes for CLC Genomics Workbench 21.0, below, for a full list of changes since the last general release of this software.
QIAGEN CLC Genomics Workbench 21.0
New features and improvements
Full workflow support for Sanger sequence analysis
New features have been introduced, and improvements made, to support automated analyses of Sanger trace data using workflows.
Trim Sequences
- Trim Sequences can be used in workflows.
- Trim Sequences can be run on the CLC Genomics Server.
- A new sequence element containing the trimmed sequences is output. Previously, the input was modified and saved.
- A report can be generated containing a summary of the number of reads trimmed and the reasons for the trimming. This report is supported by the Combine Reports tool.
- The UniVec database used in this tool has been updated to version 10.0 of UniVec_Core.
Other improvements supporting trace data analysis in workflows
- Trace data can be imported using on-the-fly import in workflows.
- Improved output naming by the Assemble Sequences to Reference and Assemble Sequences tools: The sample name is included in the file name and the sequence names in the output.
- Metadata-based naming is supported in workflows run in batch mode or with Iterate control flow elements through the use of new placeholders: {metadata} and {metadata:<columnname>}.
- The Secondary Peak Calling tool no longer modifies the input data element, but instead produces new elements as output. Note: Because of this change, workflows containing this tool that were created in older versions of the software must be manually updated. The old workflow element must be replaced by a new one. The recommended upgrade path for installed workflows containing the Secondary Peak Calling tool is to save a copy of the workflow in the Navigation Area using CLC Genomics Workbench 20.x, and then open and manually update that workflow in CLC Genomics Workbench 21.0. The new workflow can then be installed, if desired.
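As a rough illustration of the metadata-based naming mentioned above, the sketch below expands `{metadata:<columnname>}` placeholders from one row of a metadata table. It is a sketch under assumptions, not a description of how the Workbench resolves names: the `expand_metadata` helper, the column names and the choice to leave unknown placeholders untouched are all hypothetical, and the plain `{metadata}` placeholder is not handled.

```python
import re


def expand_metadata(pattern, row):
    """Replace each {metadata:<columnname>} with the value from `row`.

    `row` is a dict mapping metadata column names to values for one batch unit.
    """
    def substitute(match):
        column = match.group(1)
        # Assumption: a placeholder naming an unknown column is left as-is.
        return str(row.get(column, match.group(0)))

    return re.sub(r"\{metadata:([^{}]+)\}", substitute, pattern)


# Hypothetical metadata row for one batch unit.
row = {"Sample ID": "S01", "Tissue": "liver"}
print(expand_metadata("{metadata:Sample ID}_{metadata:Tissue}_assembly", row))
# S01_liver_assembly
```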
New tools
- Create Sample Report creates a summary report of selected information from multiple reports relating to a single sample. Specific types of information can be specified for inclusion in the Quality Control section.
- Extract IsomiR Counts extracts information from the read mappings of each miRNA or other custom added database type, e.g. piRNA, and collects the information across all mappings in a table that can be exported.
- Annotate with Repeat and Homopolymer Information adds annotations to variants by appending two new columns with information about repeat and homopolymer status.
- Merge Variant Tracks merges multiple variant tracks into a single track. Options are available for appending annotations from overlapping variants.
Extract IsomiR Counts, Annotate with Repeat and Homopolymer Information and Merge Variant Tracks were previously available via the Biomedical Genomics Analysis plugin.
Workflow related
- When a workflow with Export elements is run in batch mode, the exported files from each batch run can be saved to separate folders.
- BED and VCF format files can be imported on-the-fly in workflows.
- On-the-fly import can be used without metadata when running workflows in batch mode, and when running workflows containing a single Iterate element.
- Name placeholders for output elements and export elements have been updated, and the naming of outputs of workflows run in batch mode can be more finely controlled.
- Improvements for Workflow Input elements:
- Workflow Input elements can be configured to limit the data input method to either selection of data elements from the Navigation Area or selection of files to be imported using on-the-fly import. The default is to allow the input method to be chosen when launching the workflow.
- Workflow Input elements can be configured to limit the on-the-fly import types available when launching the workflow. Parameters of selected importers can also be locked or unlocked, as desired, defining whether the setting is configurable when launching the workflow.
- Additional configuration options for Iterate and Collect and Distribute workflow elements are available.
- When a workflow with Iterate elements is run with the "Batch" checkbox checked, the "Batch identifier" column in the Workflow Result Metadata table will contain the combined batch identifier, reflecting all levels of batching and iterations.
- The following tools are available to be included in workflows:
Performance improvements
- Improved the performance of the alignment editor for large alignments.
- Create Tree is significantly faster when creating large trees when using the Jukes-Cantor distance measure.
- Fixed an issue that could cause the CLC Workbench to become unresponsive when exporting large binding site tables generated by Find Binding Sites and Create Fragments.
- Improved the performance of searching for data elements associated with a metadata table when using the "Find Associated Data" button.
- Fixed an error that occurred when displaying very large trees.
- 2-dimensional PCA plots with many metadata categories load significantly faster.
- Performance has been improved when tools generating a large number of sequences (for example, Trim Reads) are run on a system with many threads.
- Substantial speed improvements have been made to the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools.
- The speed of Demultiplex Reads has been improved when it is run on machines with many cores.
- The speed of Copy Number Variant Detection (CNVs) has been improved.
- When running Map Reads to Reference using a reference that has been downloaded through the Download Genomes functionality, it is now faster to determine whether an already cached reference index can be re-used.
- Performance improvements have been made for the calculation of generalized linear models in Differential Expression for RNA-Seq and Differential Expression in Two Groups. This can lead to slightly different results, with changes typically smaller than one part in ten thousand.
Working with tables
- Column order can be adjusted when viewing tables, and the revised column order will be respected when exporting the open table to, for example, csv or excel format files.
- Tables in reports can be opened in a new tab: right click -> Open Table.
- Tables can be exported using a right-click option: "File" -> "Export Table". The export takes into account filtering, ordering and deselection of columns.
Export
- Exported files can be saved into subfolders of the selected output area by using a forward slash character / at the start of the custom file name definition.
- Graphics export of Tracks, Track lists, Sequences, Alignments and Read mappings is supported as a standard export, which can be embedded into workflows and executed on a CLC Genomics Server. This feature is intended for high-throughput applications. For other applications, we recommend the existing graphic export tool.
- The naming pattern for files exported using the fastq exporter has been updated to be in line with the naming format the Illumina importer expects. The exported file names now end with "_R1.fastq" and "_R2.fastq". Previously, the extension used was ".R1.fastq" when exporting a single file; if pairs were exported to two files, the second file had the extension ".R2.fastq". (The first "." in the original naming has been replaced by an "_".)
- Export VCF has been updated:
- It supports the export of CNV and fusion data.
- If multiple elements have been selected for export, there is an option for exporting them to a single file.
- It uses the value "." to represent missing variant annotations.
- Special characters in variant annotations are exported using percent encoding, as specified in VCF 4.3.
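The two annotation conventions listed above for Export VCF can be sketched as follows. This is a minimal sketch, not the exporter's code: the `encode_annotation` helper is hypothetical, and the set of escaped characters is the one given in the VCF 4.3 specification for characters with special meaning (whether the Workbench encodes exactly this set is an assumption).

```python
# Percent encodings listed in the VCF 4.3 specification.
VCF_ESCAPES = {
    "%": "%25",
    ":": "%3A",
    ";": "%3B",
    "=": "%3D",
    ",": "%2C",
    "\r": "%0D",
    "\n": "%0A",
    "\t": "%09",
}


def encode_annotation(value):
    """Return a VCF-safe string for one annotation value."""
    if value is None or value == "":
        return "."  # missing annotation
    # Encode character by character so an existing "%" is escaped only once.
    return "".join(VCF_ESCAPES.get(ch, ch) for ch in str(value))


print(encode_annotation(None))        # .
print(encode_annotation("A;B=1,2"))   # A%3BB%3D1%2C2
print(encode_annotation("50%"))       # 50%25
```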
Illumina importer
- The "Paired reads" option is enabled by default.
- Improved validation when the "Paired reads" option is enabled. The names of the pairs of files are validated as follows:
- If the file names follow the Illumina naming format, the two files are required to have the same sample name and lane.
- If the file names do not follow the Illumina naming format, but _R1/_R2 is detected in the names, the first file must contain _R1 and the second file must contain _R2.
- If the "Join reads from different lanes" option is enabled, the detected lane, in the format _L001, must be the same for both files.
- If a pair of files does not meet the requirements above, a message is printed in the log and the pair of files is skipped.
- Improved naming of the imported elements:
- If the imported files follow the Illumina naming format, the imported elements no longer contain the _R1_001 suffix.
- Otherwise, if _R1 / _R2 is detected in the names of the files, it is removed from the name of the imported elements.
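The pair validation rules above can be sketched roughly as below. This is an illustration under assumptions, not the importer's logic: the regular expression is one plausible reading of the Illumina naming format (`<sample>_S<n>_L<lane>_R1|R2_001.fastq[.gz]`), the function names are hypothetical, and accepting pairs with no naming cues at all is an assumption, since the notes do not cover that case.

```python
import re

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_R1|R2_001.fastq[.gz]
ILLUMINA = re.compile(
    r"^(?P<sample>.+)_S\d+_L(?P<lane>\d{3})_R[12]_001\.fastq(\.gz)?$"
)
LANE = re.compile(r"_L(\d{3})")


def detected_lane(name):
    """Return the lane digits from a _L001-style token, or None."""
    match = LANE.search(name)
    return match.group(1) if match else None


def is_valid_pair(first, second, join_lanes=False):
    """Apply the pairing rules from the notes to two file names."""
    m1, m2 = ILLUMINA.match(first), ILLUMINA.match(second)
    if m1 and m2:
        # Illumina naming: same sample name and same lane required.
        ok = m1["sample"] == m2["sample"] and m1["lane"] == m2["lane"]
    elif any(tag in name for name in (first, second) for tag in ("_R1", "_R2")):
        # Other naming with _R1/_R2: first file is _R1, second is _R2.
        ok = "_R1" in first and "_R2" in second
    else:
        ok = True  # assumption: nothing to validate without naming cues
    if join_lanes and detected_lane(first) != detected_lane(second):
        ok = False
    return ok


pairs = [
    ("liver_S1_L001_R1_001.fastq.gz", "liver_S1_L001_R2_001.fastq.gz"),
    ("liver_S1_L001_R1_001.fastq.gz", "liver_S1_L002_R2_001.fastq.gz"),
    ("run7_R2.fastq", "run7_R1.fastq"),
]
for first, second in pairs:
    if not is_valid_pair(first, second):
        # Mirrors the note: a message is logged and the pair is skipped.
        print(f"Skipping pair: {first} / {second}")
```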
Create Protein Report
Updates have been made relating to the BLAST functionality in Create Protein Report:
- The default expect value (e-value) for BLAST searches at the NCBI is 0.05, aligning with the defaults used at the NCBI.
- The top 10 BLAST alignments are included in the report, where previously it was the top 100. The full BLAST report continues to be available by clicking on the results in the report, and the full BLAST hit table continues to be included in the report.
- Results of searches against local sequences or databases can no longer be included in the report. (The standard BLAST tool remains available for running local searches.)
Local Realignment
- A restriction has been removed from Local Realignment that prevented paired reads from being realigned when that realignment would change which read was left-most on the reference. The overall effect of this change is to increase the likelihood of detecting insertions in rare cases.
- Improvements have been made when realigning large insertions at the beginning of reads.
- The "Allow guidance insertion mismatches" and "Maximum guidance-variant length" options are enabled only when a guidance-variant track is provided.
- Fixed an issue that caused reads with unaligned ends stretching over a chromosome boundary to be removed from the mapping.
QC for Targeted Sequencing
- A new option in QC for Targeted Sequencing allows a custom list of coverage levels to be specified.
- The report includes the complete set of chromosomes in the "Targeted region overview" section when using references with up to 200 chromosomes. Previously the limit was 100 chromosomes. This change means the hg38_no_alt_analysis_set reference data set, available from the Reference Data Manager, is now supported.
- The report has been extended with values reporting the number and percentage of base positions in target regions with coverage above or equal to the minimum threshold.
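The coverage values described above (number and percentage of target positions at or above a threshold, for a custom list of coverage levels) amount to a simple count. The sketch below is illustrative only: the `coverage_summary` helper and the per-base depths are invented, and the actual tool works on read mappings and target region tracks rather than a plain list.

```python
def coverage_summary(coverage, levels):
    """Count target positions with depth >= each level.

    coverage: per-base read depth for every position in the target regions.
    levels:   custom list of coverage thresholds.
    Returns {level: (number_of_positions, percentage_of_positions)}.
    """
    total = len(coverage)
    summary = {}
    for level in levels:
        n = sum(1 for depth in coverage if depth >= level)
        pct = 100.0 * n / total if total else 0.0
        summary[level] = (n, pct)
    return summary


# Hypothetical per-base depths across a small target region.
depths = [0, 5, 12, 30, 30, 45, 100, 8]
for level, (n, pct) in coverage_summary(depths, [1, 10, 30]).items():
    print(f">= {level}x: {n} positions ({pct:.1f}%)")
# >= 1x: 7 positions (87.5%)
# >= 10x: 5 positions (62.5%)
# >= 30x: 4 positions (50.0%)
```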
Working with a CLC Server
- The CLC Server connection dialog will present information like the version and port of the selected CLC Server prior to login, when that information is available.
- The CLC Server connection dialog will close automatically after the "Log In" button is clicked. The login process runs in the background, indicated by a flashing server icon in the lower left corner of the Workbench.
- When the Workbench loses connection to a CLC Server, it attempts to reestablish the connection. Open views of data stored on the CLC Server are not closed.
- When selecting files stored on a CLC Server, only files with the relevant extensions are listed, together with their date of last modification and the size of the files.
Other improvements
- Improved the alignment quality for read mappings by removing aligned ends with an alignment score of zero. As a result, some alignments will be shorter and may be filtered away because they no longer pass the minimum length fraction criterion. Tools benefiting from this change include Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
- Option names and other information in the wizards for the Trim Reads tool and the corresponding workflow element have been updated for clarity and consistency.
- De Novo Assembly reports can be used as input to the Combine Reports tool.
- A new option, "Filter on average expression for FDR correction", is available in Differential Expression for RNA-Seq and Differential Expression in Two Groups. When checked, automatic, independent filtering prior to FDR correction is carried out, with the aim of increasing power.
- A Chromosome Table View is available for tracks and track lists, providing a chromosome-level summary of the data contained in the track or track list.
- Stand-alone Read Mapping, Contig and BLAST Graphics views support wrapped sequence layouts. The relevant option is available in the side panel. This may be of particular interest when working with Sanger trace data.
- Reference data downloaded via Download Genomes includes the version number as part of the name.
- The behavior of track views when making selections in the neighborhood of insertions has been improved.
- Import Metadata uses the name of the imported spreadsheet when naming the resulting metadata table.
- The element History view has been updated, and its performance has been improved when handling many history entries.
- When hovering the mouse cursor over a Sequence List in the Navigation Area, the tooltip includes information about the sequencing platform, if this information is available.
- Improved the rendering of annotations in the Export Graphics tool when used with the "Export whole area" option.
- When configuring the Demultiplex Reads tool, tags can be moved up and down.
- The list of file types automatically associated with the Workbench has been updated to only include CLC files (.clc). Previously, on Mac OS only, the Workbench was also associated with the set of file types that can be imported using the 'Standard Import' tool. The Workbench can still be associated with any given file type using the standard tools of the operating system in question.
- Annotate with Overlap Information and Filter Based on Overlap count insertions and zero-length annotations as overlapping a region when they overlap either border. For example, an insertion located exactly on the border of a gene is considered to overlap the gene.
- Data from the BGISEQ platform is supported for download using the Search for Reads in SRA tool.
- The SRA toolkit has been updated to version 2.10.7.
- Plots and tables generated by QC for Sequencing Reads have better usability, especially when working with long reads. Tables with more than 500 data points now show the first 100 entries and then bin remaining data points, based on range. In graphs, end positions with a coverage below 0.005% across the reads are not included.
- QC for Sequencing Reads reports the QC metrics separately for different read types found in the input data: unpaired reads, R1 reads and R2 reads.
- In Quantify miRNA, the minimum value for the setting "Minimum sequence length", used for seed counting, has been changed to 8. (The seed is a 7 nucleotide sequence from positions 2-8 on the mature miRNA.)
- The Quantify miRNA output tables, "Grouped on mature" and "Grouped on seed", contain links to miRBase.
- A new section has been added to the Call Methylation Levels report containing details of read conversion and direction.
- Remove Duplicate Mapped Reads outputs reads in a deterministic order.
- The "Reads trimmed (%)" column in the "Trim summary" section of the Combine Reports output has been removed as it was a duplicate of the "Reads after trim (%)" column.
- Custom attributes can be configured in a data location such that attribute values are not copied when copying data elements.
- Data locations can be added when using the Workbench in Viewing Mode.
- Various minor improvements
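The border rule described above for Annotate with Overlap Information and Filter Based on Overlap can be illustrated with a small Python sketch. It assumes 0-based, half-open coordinates and a hypothetical helper named `overlaps`; the Workbench's internal coordinate handling may differ.

```python
def overlaps(ann_start, ann_end, region_start, region_end):
    """Decide whether an annotation overlaps a region.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    A zero-length annotation (ann_start == ann_end), such as an insertion,
    sits between two bases and counts as overlapping when it lies inside
    the region or exactly on either border.
    """
    if ann_start == ann_end:
        return region_start <= ann_start <= region_end
    return ann_start < region_end and ann_end > region_start


# Hypothetical gene covering positions [100, 200).
print(overlaps(100, 100, 100, 200))  # True: insertion on the left border
print(overlaps(200, 200, 100, 200))  # True: insertion on the right border
print(overlaps(99, 99, 100, 200))    # False: insertion outside the gene
```

Annotations with a non-zero length still need to share at least one base with the region in this sketch.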
Bug fixes
- Fixed an issue affecting Filter on Custom Criteria when included in a workflow with the filtering step option unlocked. If filter criteria were updated, added, or removed in the launch wizard, the updated criteria were not used in the first run of the workflow with these updated values. Instead, the old criteria were used in that run. In subsequent runs, the updated values were used.
- Fixed an issue in Filter on Custom Criteria where, after loading annotations in the wizard, comparison operators for existing filter criteria would be set to defaults, while the original values of those operators, configured before the annotations were loaded, were actually used in the analysis.
- Fixed an issue affecting read mappings where a short deletion was preferred to a mismatch for equal scoring alignments. Tools benefiting from this change include Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
- Fixed an issue in Trim Reads where length filters were applied before automatic read-through adapter trimming was done, if it was enabled. This could result in reads shorter than minimum length settings being included in the output.
- Fixed an issue affecting Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection, where forward coverage or reverse coverage could be reported as being higher than it was when looking for very low frequency variants with very low minimum count values.
- Fixed an issue affecting annotations of restriction sites for enzymes cutting within the recognition site, where the arms indicating the cut site spanned large sequence regions, instead of indicating the cut site.
- Fixed an issue where the Search for Sequences at NCBI tool would occasionally return fewer rows than configured in the 'Number of hits' preference setting.
- Fixed an issue where IonTorrent SAM files with special characters in the sample name could not be imported to separate folders.
- Fixed an issue where Map Reads to Reference could occasionally ignore reads when encountering a read with an unaligned end that wraps twice around a chromosome.
- Fixed an issue in Quantify miRNA where the isomiRs associated with a reference miRNA were not all consistently named using the miRBase isomiR nomenclature (http://www.mirbase.org/help/nomenclature.shtml).
- Fixed an issue with Quantify miRNA, where, when used in a workflow, the unmapped reads output channel could not be connected to an input channel expecting a nucleotide sequence list.
- Fixed an issue in Create Heat Map for RNA-Seq affecting the "Fixed number of features" option, where one member of the set of most variable genes or transcripts was missing from those used in the analysis, with a slightly less variable feature included instead.
- Fixed an issue in Create Heat Map for RNA-Seq, where the "Filter by statistics" option could not be used with miRNA expression data.
- Fixed an issue in Create Heat Map for RNA-Seq, where the history of heat maps did not include the name or the version of the tool used.
- Fixed an issue where RNA-Seq Analysis failed if a read mapped across 2 exons of a gene, where those 2 exons spanned the origin of a chromosome.
- Fixed an issue where RNA-Seq Analysis failed if a gene or mRNA spanned the origin of a chromosome and that chromosome was marked as linear. We now ignore these mRNAs.
- Fixed an extremely rare issue where RNA-Seq Analysis could fail when the positions of genes (or transcripts) were defined with respect to a sequence that was not part of the genome. An example of this kind of annotation is the remote entry identifier allowed by the GenBank flat file format (see http://www.insdc.org/files/feature_table.html#3.4). These genes and transcripts are now filtered away prior to the tool being run.
- Fixed an issue that caused Combine Reports to occasionally fail when combining reports with summary information shown as plots.
- Fixed an issue with Combine Reports where, when combining RNA-Seq reports, warning messages for the "Distribution of biotypes" section could be present when they should not have been.
- Fixed an issue in the Navigation Area that caused move operations to be converted to copy operations if the Navigation Area was refreshed before the move was completed.
- Fixed an issue where "Reset tree topology" in the phylogenetic tree editor could fail in some cases when input sequences were extracted from (at least) two different trees.
- Fixed an issue where a wrongly formatted VCF file could make the VCF importer terminate instead of writing the error to the log.
- Fixed an issue where Transcription Factor ChIP-Seq would exit with an error when given a read mapping with a circular reference sequence with coverage across all bases.
- Fixed an issue affecting the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools, where complex indels were reported in regions where the reference had a sequence of Ns. This error was introduced in CLC Genomics Workbench 20.0.2.
- Fixed an issue that could cause De Novo Assembly to occasionally fail when assembling paired data with both the "Auto detect paired distances" and "Map reads back to contigs (slow)" options enabled.
- Fixed an issue with links to HGNC in gene tracks imported from GFF3 files using "Import Tracks from File" and in some RefSeq gene tracks provided via the Reference Data Manager.
- Fixed an issue where external files could not be opened on Windows Server 2019.
- Fixed an issue where, on Mac OS, clicking CLC URLs ("clc://...") would open an older version of the Workbench even after installing a new version.
- Fixed an issue where a path specified in the path.properties configuration file was not properly interpreted if it was specified in 'Windows syntax', e.g. "x:\myDrive\temp".
- Fixed an issue in the Excel importer, where the presence of certain formulas prevented successful import.
- Fixed an issue where, if BLAST at NCBI failed with an error, no error would be shown and instead no hits were returned.
- Fixed an issue where Track Lists sometimes could not display reference data.
- Fixed an issue where the Ctrl+F keyboard shortcut did not activate the 'Find' side panel when viewing a track list.
- Fixed an issue where some workflows using a Collect and Distribute element with multiple output channels did not pass the correct inputs to a tool after the Collect and Distribute element.
- Fixed an issue where compressed data elements sometimes appeared to have "Compression enabled: No" in the Element Info tab.
- Fixed an issue that caused the updating of plugins containing workflows (e.g. Biomedical Genomics Analysis, CLC Microbial Genomics Module) to be slow.
- Various minor bug fixes
Changes
Other changes
- The Java version bundled with CLC Genomics Workbench 21.0 is Java 11.08, where we use the JRE from AdoptOpenJDK.
- The read mapping tool used by various tools in the CLC Genomics Workbench (e.g. Map Reads to Reference, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference) has been updated for this release and corresponds to the version in CLC Assembly Cell 5.2.1. Other binaries are unchanged and continue to correspond to the versions in CLC Assembly Cell 5.1.1.
- The default base name for the element being exported is designated using the placeholder {name}, instead of {input}. The numeric equivalent, {1}, is unchanged. The default export naming pattern has correspondingly been changed to {name}.{extension}. This change also applies to exports configured in External Applications.
- The default expect value (e-value) for BLAST at NCBI is 0.05 and the maximum number of hits is 5000, aligning with the defaults used at the NCBI.
- Changes have been made to the handling of sequence identifiers when using Create BLAST Database. This change allows continued flexibility in the naming of sequences used for making these databases, avoiding direct exposure to limitations present in the underlying BLAST+ program, makeblastdb, such as not allowing long or duplicate sequence names. Further details are provided in our FAQ area.
- The option "Reports originate from a single sample" has been removed from the Combine Reports tool. For generation of a single sample combined report, please use the new Create Sample Report tool.
- The "Chromosome M name" option in Trio Analysis has been renamed to "Chromosome MT name", with default value "MT" instead of "M".
- The creation of Workflow Result Metadata tables is optional when running workflows on the CLC Genomics Server.
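The export naming placeholders described above can be illustrated with a small Python sketch. The function `expand_export_name` is hypothetical and handles only the three placeholders named in these notes ({name}, {1} and {extension}); it is not the Workbench's own implementation, which may support further placeholders.

```python
import re


def expand_export_name(pattern, element_name, extension):
    """Expand {name}, {1} and {extension} in an export naming pattern.

    Hypothetical helper for illustration only. Any other placeholder is
    left untouched.
    """
    values = {"name": element_name, "1": element_name, "extension": extension}
    # A single regex pass ensures that inserted text is never re-expanded.
    return re.sub(r"\{(name|1|extension)\}", lambda m: values[m.group(1)], pattern)


print(expand_export_name("{name}.{extension}", "sample reads", "fa"))  # sample reads.fa
print(expand_export_name("{1}.{extension}", "sample reads", "fa"))     # sample reads.fa
```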
Functionality retirement
Tools
- Reverse Sequence
Plugin notes
Plugin retirements
- The Ingenuity Variant Analysis plugin has been retired. The Ingenuity Variant Analysis service has been replaced by QCI Interpret Translational. Please email ts-bioinformatics@qiagen.com if you need further information about this.
- The Advanced Structural Variant Detection (beta) plugin has been retired. An improved tool, Structural Variant Caller, is available in the Biomedical Genomics Analysis plugin.
- The functionality of the External Applications Client Plugin is now built into CLC Workbenches, so this plugin has been retired.
Advanced notice
The following will be removed in a future release of the software:
- Create Combined RNA-Seq Report (legacy)
- Create Track from Experiment (legacy)
- Remove Reference Variants (legacy)
- Reverse Sequence (legacy)
- Roche 454 NGS import (legacy)
- Tools under the Small RNA Analysis (legacy) folder:
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
QIAGEN CLC Genomics Workbench 20.0.5
Improvements and Changes
- The QC for Targeted Sequencing report now includes the complete set of chromosomes in the "Targeted region overview" section when using references with up to 200 chromosomes. Previously the limit was 100. This change means the hg38_no_alt_analysis_set reference data set, available from the Reference Data Manager, is now supported.
- The SRA toolkit has been updated to version 2.10.7.
- The maximum number of hits returned by BLAST at NCBI is now 5000, aligning with the defaults used at the NCBI.
- The speed of Copy Number Variant Detection (CNVs) has been improved.
- Various minor improvements
Bug fixes
- Fixed an issue affecting Filter on Custom Criteria when included in a workflow with the filtering step option unlocked. If filter criteria were updated, added, or removed in the launch wizard, the updated criteria were not used in the first run of the workflow with these updated values. Instead, the old criteria were used in that run. In subsequent runs, the updated values were used.
- Fixed an issue where Map Reads to Reference could occasionally ignore reads when encountering a read with an unaligned end that wraps twice around a chromosome.
- Fixed an issue affecting annotations of restriction sites for enzymes cutting within the recognition site, where the arms indicating the cut site spanned large sequence regions, instead of indicating the cut site.
- Fixed an issue in Local Realignment that caused reads with unaligned ends stretching over a chromosome boundary to be removed from the mapping.
- Fixed an issue that could cause the CLC Workbench to become unresponsive when exporting large binding sites tables generated by the Find Binding Sites and Create Fragments tool.
- Fixed an issue with Quantify miRNA, where, when used in a workflow, the unmapped reads output channel could not be connected to an input channel expecting a nucleotide sequence list.
- Fixed an issue in Create Heat Map for RNA-Seq affecting the "Fixed number of features" option, where one member of the set of most variable genes or transcripts was missing from those used in the analysis, with a slightly less variable feature included instead.
- Fixed an issue in Create Heat Map for RNA-Seq, where the "Filter by statistics" option could not be used with miRNA expression data.
- Fixed an issue where the history of heat maps produced by Create Heat Map for RNA-Seq did not include the name or the version of the tool.
- Fixed an issue where, if BLAST at NCBI failed with an error, no error would be shown and instead no hits were returned.
- Fixed an issue with Combine Reports where, when combining RNA-Seq reports, warning messages for the "Distribution of biotypes" section could be present when they should not have been.
- Fixed an issue that caused the updating of plugins containing workflows (e.g. Biomedical Genomics Analysis, CLC Microbial Genomics Module) to be slow.
- Fixed a bug where some workflows using a Collect and Distribute element with multiple output channels did not pass the correct inputs to a tool after the Collect and Distribute element.
- Various minor bug fixes
QIAGEN CLC Genomics Workbench 20.0.4
Improvements
- The NCBI nucleotide database "Betacoronavirus", focused on SARS-CoV-2, has been added to BLAST at NCBI.
- Names of extracted consensus sequences now have the pattern: "<input name> <reference name> consensus". This affects outputs from the Extract Consensus Sequence tool, as well as consensus sequences generated using right-click menu options and buttons.
- The COSMIC track importer has been updated to support version 91 of the COSMIC Mutation Data format. Due to insufficient information in version 90, we are not able to support that particular version. Older versions are still supported.
- The vertical range when viewing chromatograms has been increased, allowing the full view of all peaks, even in noisy data containing many small peaks.
- The SRA Toolkit has been updated to version 2.10.5 on Linux and Mac OS X. This improves stability in environments with unreliable network connections.
Bug fixes
- Fixed an issue where options of Extract Annotations were not configurable in workflows.
- Fixed an issue with Filter on Custom Criteria where filtering on the Regions column using operators >=, <=, and != would remove all variants or annotations.
- Fixed an issue that could occasionally cause the InDels and Structural Variants tool to fail with an error message about a breakpoint missing at a particular location.
- Fixed an issue where unnecessary, empty output folders could be generated by analyses run in batch mode. This happened when the "Create subfolders per batch unit" option was enabled, and the "Include" or "Exclude" field had been used to specify elements within each batch unit to analyze, such that some batch units were empty.
- Fixed an issue where the Standard Importer for Fasta Alignments would fail when importing multiple files.
- Fixed an issue where, when importing a Clone Manager file (.cm5), a sequence length annotation would be ignored and the full sequence imported.
- Fixed an issue where the order of genes was not preserved when copying from a heat map.
- Fixed an issue that could occasionally cause the export to PDF format of some reports to fail with an error.
- Various minor bug fixes
Advanced notice
The following will be removed in a future release of the software:
- Create Combined RNA-Seq Report (legacy)
- Create Track from Experiment (legacy)
- Remove Reference Variants (legacy)
- Reverse Sequence (legacy)
- Roche 454 NGS import (legacy)
- Tools under the Small RNA Analysis (legacy) folder:
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
The "Run in Batch Mode..." functionality for installed workflows with multiple inputs will be retired in a future release. Workflows with multiple inputs can now be launched in batch mode by checking the "Batch" checkbox when selecting the input data.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 20.0.3
Bug fixes
- Fixed an issue with Create BLAST Database, where BLAST databases could not be created if the sequence set included entries with identifiers longer than 50 characters or of a form similar to PDB identifiers. This was due to new requirements introduced with NCBI BLAST+ 2.8.1. To address this, we have replaced the BLAST+ 2.9.0 makeblastdb tool, used by Create BLAST Database, with the version from BLAST+ 2.6.0. This is the same version used in CLC Genomics Workbench 12.x. This change does not affect the searching of BLAST databases using this or earlier supported versions.
- Fixed an issue affecting Search for Sequences in Uniprot where sequences could not be downloaded due to changes to the Uniprot database format. Uniprot entries downloaded now may have small differences in the content of some fields relative to downloads made in the past.
- Fixed an issue that sometimes prevented the selection of individual data elements as inputs to a workflow, allowing only folders to be specified, when the "Batch" box was checked.
- Fixed an issue where the output naming pattern of a workflow was not properly respected if output channels of one or more elements were connected to both a Collect and Distribute element and to an Output element.
- Fixed an issue where, after installing a workflow with bundled data and restarting the Workbench, the installed workflow could not be run, with a message in the wizard incorrectly reporting that the bundled data "is missing".
- Fixed an issue that prevented printing to certain types of printers on Windows 10 systems.
- Fixed an issue that could prevent the export of graphics in PDF format, if an older version of the software had previously been used to export graphics.
- Fixed an issue where sequence annotations were sometimes incorrectly rendered at some zoom levels for sequences, sequence lists and whole genome alignments.
- Fixed an issue that could cause scrolling through data in an open tab to continue for some time when a selection in an editor was dragged to somewhere outside the editor and the mouse button then released.
- Fixed an issue where the overlap of a small minority of annotations or variants was incorrectly determined. A visible outcome of this was in the rendering of annotations and variants in some editors. In the small number of affected positions, annotations and variants could be displayed in multiple vertical layers, instead of beside one another, or appeared to be "hopping" vertically when you scrolled in the editor.
Other known manifestations, addressed by this fix, occurred in tools delivered by plugins:
- Biomedical Genomics Analysis when installed on CLC Genomics Workbench 20.0, 20.0.1 or 20.0.2
- Results of Annotate RNA Variants, a tool included in the Perform QIAseq Multimodal Analysis (Illumina) and Perform QIAseq RNA Fusion XP Analysis workflows. The problem could affect the following annotations on variants called at a small number of specific positions: "Matches known intron", "Possible splice signatures" and "Conserved splice signature". Further details are available, including information relating to the expected (very small) magnitude of the problem.
- Transcript Discovery when installed on CLC Genomics Workbench 20.0.2 or earlier versions
- The Transcript Discovery tool occasionally identified an incorrect exon boundary. Due to the expected level of sensitivity and precision of this tool, we expect this to have very little impact in practice.
- Various minor bug fixes
Advanced notice
The following will be removed in a future release of the software:
- Create Combined RNA-Seq Report (legacy)
- Create Track from Experiment (legacy)
- Remove Reference Variants (legacy)
- Reverse Sequence (legacy)
- Roche 454 NGS import (legacy)
- Tools under the Small RNA Analysis (legacy) folder:
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
The "Run in Batch Mode..." functionality for installed workflows with multiple inputs will be retired in a future release. Workflows with multiple inputs can now be launched in batch mode by checking the "Batch" checkbox when selecting the input data.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 20.0.2
Improvements
- When the "3' sequencing" Library type setting of RNA-Seq Analysis is selected when analyzing reads that have been annotated with UMIs by tools of the Biomedical Genomics Analysis plugin, expression values in the GE track are based on the number of distinct UMIs for each gene, rather than the number of reads.
- The running time of Fixed Ploidy Variant Detection and Low Frequency Variant Detection has been substantially improved for some data sets with large numbers of differences from the reference in localized regions of the read mapping.
- Metadata associations are now preserved when importing .zip files containing both metadata tables and associated data.
- Various minor improvements
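The UMI-based expression values described above for the "3' sequencing" library type can be illustrated with a short Python sketch contrasting read counts with distinct UMI counts per gene. The function name `count_expression` and the input layout (one (gene, UMI) pair per mapped read) are assumptions for this example, not the format used by RNA-Seq Analysis.

```python
from collections import defaultdict


def count_expression(reads):
    """Return (read_counts, umi_counts) per gene.

    `reads` is a hypothetical iterable of (gene, umi) pairs, one per read.
    """
    read_counts = defaultdict(int)
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1
        umis_per_gene[gene].add(umi)  # repeated UMIs are stored only once
    umi_counts = {gene: len(umis) for gene, umis in umis_per_gene.items()}
    return dict(read_counts), umi_counts


# Made-up reads: three reads for GENE_A share one UMI.
reads = [
    ("GENE_A", "ACGT"), ("GENE_A", "ACGT"), ("GENE_A", "ACGT"),
    ("GENE_A", "TTGA"), ("GENE_B", "CCAG"),
]
read_counts, umi_counts = count_expression(reads)
print(read_counts)  # {'GENE_A': 4, 'GENE_B': 1}
print(umi_counts)   # {'GENE_A': 2, 'GENE_B': 1}
```

Counting distinct UMIs means that reads sharing a UMI within a gene contribute once to that gene's value.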
Bug fixes
- Fixed a bug in Create Heat Map for RNA-Seq affecting the "Fixed number of features" option, where one member of the set of most variable genes or transcripts was missing from those used in the analysis, with a slightly less variable feature included instead.
- Fixed an issue where some plots, including volcano plots and scatter plots, sometimes showed an incorrect tooltip when hovering over a data point with the mouse.
- Fixed a bug that occurred when workflows in the Navigation Area created in CLC Genomics Workbench 12.x (or older versions) were updated, where the "Unmapped Reads" output of Map Reads to Reference was no longer passed on to downstream workflow elements after the update. Installed workflows were not affected by this bug.
- Fixed an issue affecting workflows with Iterate elements, where an error occurred if there were more than 350 iteration rounds when the workflow was run on the QIAGEN CLC Genomics Workbench.
- Fixed an issue affecting workflows run in batch mode or that included Iterate elements, where if two input elements with the same name but in different folders were selected from the Navigation Area, and the "Use organisation of input data" option was chosen, then the inputs were allocated to the same batch, even though the batch overview showed that they would be in different batches. Now, they will be allocated to different batches, as shown in the batch overview.
- Fixed an issue where data elements selected in the Navigation Area were sometimes incorrectly preselected when browsing for inputs in wizard steps of tools and workflows.
- Fixed a bug that prevented installed workflows with configuration options and at least one locked input from being configured through the Workflow Manager.
General information
NCBI plans to change their BLAST database folder structure in early February 2020. When that happens, the Download BLAST Databases tool of QIAGEN CLC Genomics Workbench 20.0.2, and future updates, will list the new, dbV5 BLAST databases for download. The Create BLAST Database tool will continue to create dbV4 databases. Databases of either version can be searched using the BLAST programs included in the QIAGEN CLC Genomics Workbench 20.x release line.
Advanced notice
The following will be removed in a future release of the software:
- Create Combined RNA-Seq Report (legacy)
- Create Track from Experiment (legacy)
- Remove Reference Variants (legacy)
- Reverse Sequence (legacy)
- Roche 454 NGS import (legacy)
- Tools under the Small RNA Analysis (legacy) folder:
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
The "Run in Batch Mode..." functionality for installed workflows with multiple inputs will be retired in a future release. Workflows with multiple inputs can now be launched in batch mode by checking the "Batch" checkbox when selecting the input data.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 20.0.1
Improvements
- Reports generated by Call Methylation Levels can now be used as inputs to the Combine Reports tool.
- Import of Gene Ontology Annotation files now supports files that include a byte order mark (BOM).
- The SRA toolkit has been updated from version 2.8.0 to 2.9.6.
- The hmmsearch program, used by Pfam Domain Search, is now 64-bit on all platforms. Previously a 32-bit version was distributed for use on macOS.
- Information from the File Info field is now included when printing from the Element info view.
Bug fixes
- Fixed a bug in the Primer Designer, where clicking on each potential primer starting position did not highlight the primer region.
- Fixed a bug in the advanced filter of the table view of sequence lists, where days and months could not be matched with corresponding values in the "Modified" column.
- Fixed a bug where sequences could not be downloaded from NCBI through the 'Show BLAST Hit Table' view.
- Fixed a bug in Experiments, where very long column group headers could cause an "out of memory" error on macOS.
- Fixed a bug where the name of the gene was not displayed when hovering over a point in the Volcano Plot View of Statistical Comparison tracks.
- When exporting Oxford Nanopore alignments in SAM or BAM format, the platform specification is now exported as "ONT". Previously it was exported as "NANOPORE".
- Fixed an issue in the element history of a result generated by Extract Annotations, where the reference sequence track could appear to have been used for extracting annotations, even when it was not used.
- Fixed a bug in workflows using control flow elements, where some outputs were not saved if the same output channel was connected directly to both an output element and to a Collect and Distribute element. This problem occurred if the combined lengths of the input filenames for a given run exceeded 250 characters, and batch units were defined on the basis of the organization of the input data.
- Fixed a bug in workflows containing two Iterate elements connected to each other, where it was not possible to select a metadata column for the inner Iterate element in the batch configuration step of the workflow wizard if two or more connections were made from the output channel of that element.
- Various minor bug fixes
Advanced notice
The following will be removed in a future release of the software:
- Create Combined RNA-Seq Report (legacy)
- Create Track from Experiment (legacy)
- Remove Reference Variants (legacy)
- Reverse Sequence (legacy)
- Roche 454 NGS import (legacy)
- Tools under the Small RNA Analysis (legacy) folder:
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
The "Run in Batch Mode..." functionality for installed workflows with multiple inputs will be retired in a future release. Workflows with multiple inputs can now be launched in batch mode by checking the "Batch" checkbox when selecting the input data.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 20.0
New features
Workflows
- NGS sequence data can be imported on the fly, as an initial action when a workflow is run, avoiding the need to import the data prior to launching the workflow.
- When launching workflows, batch units can now be defined using metadata, supplied either as a CLC metadata table or by selecting an Excel format file containing information about the data.
- Workflows with multiple inputs, where those inputs should be matched with each other, can now be launched in batch mode, making use of the ability to define batch units based on metadata. For example, a workflow where sets of reads should be mapped to different reference sequences can now be launched in batch mode.
- Two new workflow elements have been introduced, Iterate and Collect and Distribute, which allow workflows to be designed where the execution of different parts of a workflow can be finely controlled. For example, using these elements, a single workflow can contain an RNA-Seq analysis step, typically run once per sample, as well as a Differential Expression for RNA-Seq step, typically run once for a set of samples. Similarly, a single workflow can be designed to run batches of trio analyses, producing cohort-level reports as outputs.
- Workflows now produce a Workflow Result Metadata table, which contains one row per output, with the relevant data element associated with that row. When launched in batch mode, the batch the row relates to is clearly indicated.
Epigenomics analysis
- Tools for detecting peaks in sequencing data are now available from a new 'Advanced Peak Shape Tools' folder found in the Epigenomics Analysis folder of the Toolbox:
- Learn Peak Shape Filter
- Apply Peak Shape Filter
- Score Regions
These tools were formerly available via the Advanced Peak Shape Tools plugin (beta).
- A tool for detecting evidence for histone acetylation marks in genes or other predefined genomic regions is now in the Epigenomics Analysis folder of the Toolbox:
- Histone ChIP-Seq
This tool was formerly available via the Histone ChIP-Seq plugin.
miRNA analysis (small RNA)
- Tools for analyzing miRNA data are now available under the RNA-Seq and Small RNA Analysis folder of the Toolbox:
- Quantify miRNA
- Annotate with RNAcentral Accession Numbers
- Create Combined miRNA Report
These tools were formerly available via the Biomedical Genomics Analysis plugin.
- MirBase is now available through the Reference Data Manager as a Reference Data Element.
Protein structure and homology
- Generate Biomolecule A new tool available from the side panel of Molecule Projects allowing biomolecules to be generated or extracted based on symmetry information in PDB files.
- Find and Model Structure A new tool that finds suitable protein structures for representing a given protein sequence. From the resulting table, a structure model (homology model) of the sequence can be created by one click using one of the found protein structures as template.
- Molecule structures in a Molecule Project can now be exported to a PDB format file.
- Search for PDB Structures at NCBI is now available when running the Workbench in Viewing Mode.
Import and export
- Reports can now be exported in JSON format and in PDF format.
- A new option in the Illumina importer, "Join reads from different lanes", merges fastq files from the same sequencing run but from different lanes into a single sequence list when enabled.
- It is now possible to select input files from multiple folders when importing high-throughput sequencing data.
- Create Expression Browser can now use tables imported from CSV or Excel format files as an annotation resource. Using such tables, sorting and filtering can be done according to numeric annotation values as well as textual annotations.
- When exporting to PDF, there is now an option to export the history of the report.
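The "Join reads from different lanes" option above can be pictured with a minimal Python sketch. It assumes the common Illumina file naming pattern (for example SampleA_S1_L001_R1_001.fastq.gz) and groups files whose names differ only in the lane token. The function name and the grouping rule are illustrative assumptions; the importer's actual logic is not described in these notes.

```python
import re
from collections import defaultdict

# Lane token such as "_L001" followed by another underscore-separated field.
# Assumes the usual Illumina naming pattern; other names are left unchanged.
_LANE = re.compile(r"_L\d{3}(?=_)")


def group_by_lane_free_name(file_names):
    """Group fastq file names that differ only in their lane token."""
    groups = defaultdict(list)
    for name in sorted(file_names):
        lane_free = _LANE.sub("", name, count=1)
        groups[lane_free].append(name)
    return dict(groups)


files = [
    "SampleA_S1_L002_R1_001.fastq.gz",
    "SampleA_S1_L001_R1_001.fastq.gz",
    "SampleB_S2_L001_R1_001.fastq.gz",
]
for merged_name, members in group_by_lane_free_name(files).items():
    print(merged_name, "<-", members)
```

Each group would correspond to one merged sequence list. Whether two files really come from the same sequencing run cannot be told from the names alone, so this check is left out of the sketch.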
Other new tools
- Combine Reports Summarizes information from multiple reports and produces a single report. It can be used for combining different report types for a single sample, or combining reports for a set of samples.
- Create Variant Track Statistics Report Creates a summary report for different types of variants in variant tracks.
RNA-seq Analysis
- A new option, "Library type setting", in the RNA-Seq Analysis tool offers the selection of "Bulk", for analysis of samples where reads are expected to be uniformly distributed across the full length of transcripts, or "3' sequencing", which tailors the output and report quality control for samples generated using low input 3' sequencing applications. "Bulk" is the default, and corresponds to the behavior of the tool in previous software versions.
- The definition of "Maximum number of hits for a read" in the RNA-Seq Analysis tool has been simplified. It now refers to the number of distinct places on the reference that a read maps best to. Previously, a more complex definition was used, involving checking for matches against genes and then against intergenic regions, with rules applied to the results.
- The report generated by RNA-Seq Analysis now includes the percentage of reads mapped to transcripts of particular length ranges, aiding the interpretation of the "Coverage along normalized transcript length" graph.
Trim Reads
- Sequences can now be trimmed to a fixed length from either the 3' or the 5' end.
- New options have been added to allow homopolymer trimming to be finely tuned.
- If "Trim ambiguous nucleotides" is enabled, ambiguous characters (e.g. N) at the end of sequences are removed, even if the number of these characters is lower than the limit set. Previously, such characters were left in place if their number was lower than the limit.
- When included in a workflow, Trim Reads now always produces an output when an output element is connected to it. This includes the following situations:
- Where no reads have been trimmed (either because all trimming options were deselected, or because none of the trim options matched any of the reads). In this case, the "Trimmed sequences" output will contain all input reads, "Discarded sequences" will be empty, and "Percentage trimmed" will be 100% in the report.
- Where all reads have been trimmed. In this case, the "Discarded sequences" output will contain all input reads, "Trimmed sequences" will be empty, and "Percentage trimmed" will be 0%.
BLAST
- A new option for the BLAST tool, "Filter out redundant results", culls HSPs on a per subject sequence basis when enabled, removing HSPs that are completely enveloped by another HSP.
- The NCBI blast executables have been updated to version 2.9.0.
- The option "Choose filter to mask low complexity regions" has been renamed to "Mask low complexity regions".
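The culling performed by "Filter out redundant results" can be sketched in Python as follows. This is only an illustration under stated assumptions: HSPs are represented as dictionaries with hypothetical keys (subject, start, end, score), "enveloped" is interpreted as interval containment on a single coordinate axis, and identical intervals are resolved in favor of the higher score. The notes do not say which coordinates or tie-breaks the tool uses.

```python
from collections import defaultdict


def filter_enveloped_hsps(hsps):
    """Drop HSPs fully contained in another HSP on the same subject."""
    by_subject = defaultdict(list)
    for hsp in hsps:
        by_subject[hsp["subject"]].append(hsp)

    kept = []
    for subject_hsps in by_subject.values():
        # Longest first (then highest score), so an enveloping HSP is
        # always examined before the HSPs it covers.
        ordered = sorted(
            subject_hsps,
            key=lambda h: (h["start"] - h["end"], -h["score"]),
        )
        survivors = []
        for hsp in ordered:
            enveloped = any(
                s["start"] <= hsp["start"] and hsp["end"] <= s["end"]
                for s in survivors
            )
            if not enveloped:
                survivors.append(hsp)
        kept.extend(survivors)
    return kept


hsps = [
    {"subject": "s1", "start": 10, "end": 100, "score": 50},
    {"subject": "s1", "start": 20, "end": 80, "score": 40},   # inside 10-100
    {"subject": "s1", "start": 90, "end": 150, "score": 30},  # only overlaps
    {"subject": "s2", "start": 20, "end": 80, "score": 40},   # other subject
]
for hsp in filter_enveloped_hsps(hsps):
    print(hsp["subject"], hsp["start"], hsp["end"])
```

Comparing only against surviving HSPs is sufficient because containment is transitive: anything covered by a removed HSP is also covered by the HSP that removed it.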
New options in other analysis tools
- A new option for Local Realignment called "Allow guidance insertion mismatches" allows reads to be realigned using guidance insertions that have mismatches relative to the read sequences. This option is enabled by default.
- The creation of reads tracks (mappings) is now optional in the RNA-Seq Analysis tool.
- A new option in Copy Number Variant Detection (CNVs) called "Merge overlapping targets" allows overlapping target regions to be merged into one larger target region. CNV calls are made on this larger region.
- A new option called "Report unmethylated cytosines" is available for the Call Methylation Levels tool. When enabled, methylation levels are reported for all sites with read coverage, rather than only for sites with methylated cytosines.
- Two new options in Create Mapping Graph are available for generating coverage tracks for reads that mapped best to a single location on the reference sequence: "Specific read coverage" and "Paired read specific coverage".
Improvements
Workflows
- All installed workflows can now be updated in a single operation from the Workflow Manager using the new Update All Workflows button.
- Placeholder-based naming of outputs in workflows can now be configured at a finer level: the {input} or {2} placeholder is now replaced by the name of the first workflow input by default. This can then be further configured to use the names of other inputs by specifying them by number after a colon in the placeholder. For example: {2:1,3} would be replaced by the names of workflow inputs number 1 and 3. Previously, a workflow output configured as {2} was replaced by a concatenation of all the workflow input names.
- The listing of items in the "Add Element" dialog in the Workflow Editor has been improved: installed workflows are no longer listed and searches no longer return tools where the search term matches only tooltip text.
- When running a workflow configured to use reference data, the Reference Data Set selection step has been updated to show the list of preconfigured elements in the tooltip.
- The "Export to PDF" tool can now be used in workflows to export reports in PDF format.
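The placeholder-based output naming described in the Workflows improvements above can be illustrated with a minimal Python sketch. Only the {2}, {input} and {2:1,3} forms come from these notes; the function name, the regular expression and the separator used to join several input names are assumptions made for illustration.

```python
import re

# Matches {2}, {input}, {2:1,3} or {input:1,3}. This pattern is an assumption
# based on the forms shown in the notes, not the Workbench's own parser.
_PLACEHOLDER = re.compile(r"\{(?:2|input)(?::(\d+(?:,\d+)*))?\}")


def expand_input_placeholder(template, input_names, separator="-"):
    """Replace the input placeholder with workflow input names.

    Without a colon, the first input name is used. With a colon, the
    1-based input numbers listed after it are used. The separator is
    hypothetical.
    """
    def substitute(match):
        numbers = match.group(1)
        if numbers is None:
            return input_names[0]
        picked = [input_names[int(n) - 1] for n in numbers.split(",")]
        return separator.join(picked)

    return _PLACEHOLDER.sub(substitute, template)


names = ["reads_A", "reference_B", "targets_C"]
print(expand_input_placeholder("{2} mapping", names))
print(expand_input_placeholder("{2:1,3} mapping", names))
```

With three workflow inputs, the first call uses only the first input name, while the second combines the names of inputs 1 and 3.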
Performance improvements
- Mapping of NGS reads on multicore systems is now approximately 25% faster. Tools benefiting from this improvement include Map Reads to References, Map Reads to Contigs and Map Bisulfite Reads to Reference.
- Saving analysis results to an SSD is now considerably faster.
- The import of ZIP files has been improved: temporary objects are cleaned up during the import process, reducing the required disk space.
- Moving and deleting many elements at once is now faster.
- Emptying the Recycle bin now takes place in the background.
- Messages from tools are no longer presented in the form of black bubbles in the Processes area. Messages are still written to the log.
- Basic Variant Detection, Fixed Ploidy Variant Detection, and Low Frequency Variant Detection have been optimized to work on machines with lower memory. The changes are most noticeable in situations where coverage is high or where many variants are called.
- Improved memory handling when working with read mappings with very high coverage.
- There are general performance improvements in the following areas:
- The Navigation Area
- BLAST and Add attB Sites tools when using large sequence lists
- Opening large protein sequences
- Making BLAST databases where most sequences have the same name.
Demultiplex Reads
- Sequences with a single mismatch to a barcode and that can be grouped unambiguously can now be demultiplexed.
- Demultiplex Reads is now multithreaded for faster execution.
- The percentage of reads in each group is now reported to one decimal place in the report.
- The percentage of reads not grouped is no longer included in the "Reads per barcode" plot in the report.
- Various other minor improvements
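The single-mismatch grouping described for Demultiplex Reads above can be sketched as follows. This is a simplified Python illustration, not the tool's implementation: barcodes are assumed to have the same length as the observed tag, mismatches are counted as Hamming distance, and the names `hamming` and `assign_barcode` are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(tag, barcodes, max_mismatches=1):
    """Return the barcode name for an observed tag, or None if not grouped.

    An exact match wins. Otherwise the tag is grouped only if exactly one
    barcode lies within max_mismatches, i.e. the grouping is unambiguous.
    """
    for name, seq in barcodes.items():
        if seq == tag:
            return name
    close = [
        name
        for name, seq in barcodes.items()
        if len(seq) == len(tag) and hamming(seq, tag) <= max_mismatches
    ]
    return close[0] if len(close) == 1 else None


barcodes = {"bc1": "ACGT", "bc2": "ACGA", "bc3": "TTTT"}
for tag in ["ACGT", "TTTA", "ACGC", "GGGG"]:
    print(tag, "->", assign_barcode(tag, barcodes))
```

In this example "TTTA" is one mismatch away from bc3 only, so it can be grouped, whereas "ACGC" is one mismatch away from both bc1 and bc2 and is therefore left ungrouped.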
QC reports
- Plots in the "Per-base analysis" section of the graphical report produced by QC for Sequencing Reads no longer include a value for base position 0. Values at position 0 in these plots previously were not meaningful.
- Base position numbering now starts with 1 in the coverage table of the supplementary report produced by the QC for Sequencing Reads tool. Previously the base position numbers started at 0.
- The reports produced by the QC for Read Mapping and QC for Targeted Sequencing tools now also include the median coverage.
- The coverage report generated by QC for Targeted Sequencing now includes the total length of target region positions with coverage below the specified level.
- QC for Targeted Sequencing has been updated:
- Reads mapping across the origin of circular references are now counted. (These were previously ignored.)
- Relevant warning messages are written to the log and the report when a target region track contains regions that overlap or that span the origin of circular references.
- An issue has been addressed where the number of mapped bases reported in the Target Region Overview section was not correct for tracks containing overlapping regions.
Tracks and track lists
- When hovering over a position in a Reads track that is shown in non-aggregated view, a tooltip appears showing the read counts for each observed nucleotide in that position, together with the directions of the reads that contain that nucleotide.
- When opening a track list, the first variant track is no longer opened if it is already open in an editor.
- In a track view or track list view, the "Location" field in the side panel now accepts ranges that include spaces. For example, "X: 70,832,863 - 70,842,697". Previously, spaces were not supported.
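As an illustration of the more tolerant "Location" field described above, the following Python sketch parses a range that may contain spaces and thousands separators. The regular expression and the function name `parse_location` are assumptions for illustration; they do not reflect how the Workbench parses the field.

```python
import re

# Chromosome name, colon, start, dash, end; whitespace is allowed around
# each part and commas are allowed inside the numbers.
_LOCATION = re.compile(r"^\s*([^:\s]+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_location(text):
    """Parse 'chrom: start - end' into a (chrom, start, end) tuple."""
    match = _LOCATION.match(text)
    if match is None:
        raise ValueError(f"Unrecognized location: {text!r}")
    chrom, start, end = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


print(parse_location("X: 70,832,863 - 70,842,697"))
print(parse_location("X:70832863-70842697"))
```

Both calls yield the same chromosome and coordinates, which is the behavior the note describes for ranges written with or without spaces.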
Import and export
- When importing BED files using the Import Tracks tool, only the first three columns (chromosome, start and end positions) are now required to match UCSC specifications for the BED format. Remaining columns that do not match these requirements will be imported as Var1, Var2, etc.
- The CSV importer has been updated:
- Values no longer need to be enclosed in quotation marks in the CSV file to be successfully imported.
- Data values starting with a numeric character but also containing non-numeric characters are now interpreted as text. Such values were previously converted to numbers and then only imported up to the first non-numeric character.
- The import of Nexus files has been updated to more closely match the format specifications.
- When selecting files to import from an import/export directory via a CLC Workbench, right-clicking on a folder name now brings up a menu with the options: "Add the content of a directory" or "Add the full content (recursively) of a directory".
- The "Excel 2010" and "Excel 97-2007" exporters now export NaN and +/-Infinity values to #N/A.
- When importing multiple files using the Standard Import, the process ends with an error if at least one of the files failed to import. The details of which file failed and why can be seen in the log.
- The GenBank exporter now replaces any spaces in annotation names with underscores.
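The change in how the CSV importer treats values that start with a digit but also contain other characters can be illustrated with a short Python sketch. Both functions are hypothetical approximations of the behavior described above, and the example value "7q11.23" is chosen only for illustration.

```python
import re

_LEADING_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?")


def interpret_value(raw):
    """Behavior as now described: numeric only if the whole value parses."""
    value = raw.strip()
    try:
        return float(value)
    except ValueError:
        return value


def interpret_value_old(raw):
    """Earlier behavior as described: keep only the leading numeric part."""
    value = raw.strip()
    match = _LEADING_NUMBER.match(value)
    return float(match.group()) if match else value


for raw in ["42", "7q11.23", "gene1"]:
    print(repr(raw), "old:", interpret_value_old(raw), "new:", interpret_value(raw))
```

A value such as "7q11.23" is kept as text by the first function, while the second shows how it would have been cut off at the first non-numeric character.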
Searching
- It is now possible to search for values contained within metadata tables.
- Elements in the CLC_References locations can now be found by searches.
- Elements in a Recycle Bin can now be found by searches.
Metadata related
- Metadata tables can be moved to a new File Location while maintaining metadata associations.
- Three matching schemes are now available for associating data with metadata, based on matching data element names with values in the metadata key column: Exact, Prefix and Suffix. Suffix is a new option, where matches are sought starting from the end of data element names. Prefix, previously named Partial, looks for matches from the beginning of data element names. Exact, as in earlier versions, seeks exact matches between data element names and metadata key column values.
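The three matching schemes above can be sketched in Python as simple string comparisons. This is one plausible reading of the description rather than the Workbench's implementation: the function name `find_key` is hypothetical, and preferring the longest matching key when several keys match is an assumption.

```python
def find_key(element_name, keys, scheme):
    """Return the metadata key matching a data element name, or None.

    scheme is "exact", "prefix" or "suffix". When several keys match,
    the longest one is returned (an assumption made for this sketch).
    """
    if scheme == "exact":
        candidates = [k for k in keys if element_name == k]
    elif scheme == "prefix":
        candidates = [k for k in keys if element_name.startswith(k)]
    elif scheme == "suffix":
        candidates = [k for k in keys if element_name.endswith(k)]
    else:
        raise ValueError(f"Unknown scheme: {scheme!r}")
    return max(candidates, key=len) if candidates else None


keys = ["S1", "S10", "tumor"]
print(find_key("S10", keys, "exact"))
print(find_key("S10_R1.fastq", keys, "prefix"))
print(find_key("run3_tumor", keys, "suffix"))
```

The prefix example shows why a tie-break is needed: both "S1" and "S10" are prefixes of "S10_R1.fastq".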
Create Box Plot
- Create Box Plot now calculates the median and percentile values in the same way as the "quantile" method in R. This aligns with the way these values are calculated by other tools in the QIAGEN CLC Genomics Workbench.
- Whiskers of boxplots now range from the lowest data point within 1.5 times the inter quartile range (IQR) of the lower quartile and the highest data point within 1.5 IQR of the upper quartile. Previously, they extended 1.5 times the length of the box (IQR).
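The whisker definition above can be made concrete with a small Python sketch. It assumes that the "quantile" method in R refers to R's default (type 7, linear interpolation between order statistics); the notes do not state which type is used, and the function names are hypothetical.

```python
def quantile_type7(sorted_values, p):
    """Quantile by linear interpolation, as in R's default quantile type 7."""
    n = len(sorted_values)
    h = (n - 1) * p
    lo = int(h)  # floor, since h is never negative
    hi = min(lo + 1, n - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def box_plot_stats(values):
    """Quartiles plus whiskers at the most extreme points within 1.5 IQR."""
    data = sorted(values)
    q1 = quantile_type7(data, 0.25)
    median = quantile_type7(data, 0.5)
    q3 = quantile_type7(data, 0.75)
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Lowest data point no more than 1.5 IQR below the lower quartile.
        "lower_whisker": min(v for v in data if v >= q1 - 1.5 * iqr),
        # Highest data point no more than 1.5 IQR above the upper quartile.
        "upper_whisker": max(v for v in data if v <= q3 + 1.5 * iqr),
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```

For this data set the value 30 lies more than 1.5 IQR above the upper quartile, so the upper whisker stops at 8 and 30 would be drawn as an outlier.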
Improvements to other analysis tools
- The algorithm used to auto-detect paired distances when mapping NGS reads has been improved. Tools benefiting from this include Map Reads to References, RNA-Seq Analysis, Map Reads to Contigs and Map Bisulfite Reads to Reference.
- Improvements to SRA download:
- The temporary disk space needed to download data has been reduced significantly.
- Technical reads are now discarded.
- Orphan reads are now put into a separate output for paired data.
- When importing multiple files containing sequencing reads (QIAGEN GeneReader, Illumina, PacBio, Fasta Read, Ion Torrent) or when importing SAM format files, a single problematic file does not stop the import. The import process now continues with the next file if it encounters a file that could not be imported.
- The "Chromosome coverages" section in the results report produced by the Copy Number Variant Detection (CNVs) tool is now a table.
- The "CPM" expression option in the side panel setting of the Expression Browser has been renamed "CPM (TMM-adjusted)" to reflect how it is calculated.
- The TMM Normalization used in the Expression Browser and in Create Heat Map for RNA-Seq, PCA for RNA-Seq, Differential Expression in Two Groups, and Differential Expression for RNA-Seq, has been changed. This change involves how a reference column is selected for TMM Normalization. It is unlikely to lead to noticeable differences in results. Changes are most likely to occur in situations where the majority of transcripts/genes have zero expression.
- The "All group pairs" and "Across groups (ANOVA-like)" comparisons in Differential Expression for RNA-Seq now compare expressions in the same direction. Previously, the fold changes reported by these 2 tests for the same data, entered in an identical order, had opposite signs.
- The long form of the HGVS nomenclature for DNA is now used by the Amino Acid Changes tool for annotating coding region changes: the bases of deletions and duplications not longer than 50 nucleotides are included, and repeated sequences are reported using the insertion form.
- Exon information added by Annotate with Exon Numbers now includes a blank entry if a variant is located in an intronic region. For locations with multiple isoforms annotated, this gives a one to one relationship between the number of exon annotations and the number of isoforms.
- InDels and Structural Variants now consistently assigns a count of 1 for a paired read, leading to improved statistics. Previously, regions where the R1 and R2 reads overlapped were assigned a count of 2.
- Filter Variants on Custom Criteria now prints a message to the log if any columns specified in the criteria are not present in the data.
- The QC for Targeted Sequencing tool now sets the direction of each read in a pair independently, which can lead to more accurate forward and reverse coverage values in some situations.
- Identify Shared Variants now reports homozygous sample frequency, heterozygous sample frequency and mean allele frequency.
Other improvements
- Outputs of tools provided by plugins now include the plugin name and version in the element history.
- A new option when right-clicking on a table cell, Edit | Copy Cell, allows individual cells to be copied to the clipboard. Previously, only whole rows could be copied.
- Tool and workflow logs now display an "Elapsed time" column.
- In the tree view of phylogenetic trees, the "Reset Tree Topology" button will now also uncollapse any collapsed nodes.
- The name of a non-default workspace is now shown in the Workbench title bar.
- The table view ("Show Table") of plots has been improved in the cases where multiple data series are shown in the plot. The table now includes all of the x values from all data series, instead of the x values from just the first data series. If a data series is missing a y value for a specific x value, then the entry in the table will be empty.
- The maximum size of a plot in a report displayed in the Workbench has been increased to 800 pixels, and the width/height ratio has been changed from 2/3 to 1/2.
- The ranking of search results in Quick Launch has been improved.
- CLC URLs have been made more compact.
- The icon for the sequence view has been changed for protein sequences, so it is possible to distinguish protein sequence views from nucleotide sequence views based on the icon.
- In the Reference Data Management dialog, the "usable free space" is shown instead of the previous "free space".
- In the Batch Rename tool, the option 'Replace part of the name' fields have changed from 'From' and 'To' to 'Replace' and 'With' for clarification.
Bug fixes
- Fixed a rare issue that could cause some jobs to fail when multiple instances of Filter Variants on Custom Criteria were run simultaneously.
- Fixed an issue causing the file chooser dialog on Windows systems to freeze when selecting bzip2 format files for import.
- Fixed a bug where failing import of Illumina .fastq files could leave files in the temporary files directory.
- Fixed a bug where the track views and table views of Statistical Comparison tracks did not synchronize to show the same genomic location when an annotation was selected in one of the views.
- Fixed an issue in the "Duplicated sequences" section of the QC for Sequencing Reads graphical report, where the relative sequence count for the duplicate count of 100 was incorrectly reported in the field for the duplicate count of 99.
- Fixed an issue affecting Mac OS X setups with accessibility settings enabled, where the "Replace Selection with Sequence" functionality available from within the Cloning editor could fail with an error.
- Fixed a bug where workflow installer files did not include the specified icon.
- Fixed a bug where Expression Browsers could not be displayed or exported if they contained GO annotation values that included parentheses but no database reference.
- Fixed an issue in RNA-Seq Analysis where an error message was produced if the value entered for "Minimum read count fusion gene table" was 1 and no fusions were found.
- Fixed an issue where multiple target tracks could be selected when running QC for Targeted Sequencing in a workflow context. If done, only the first target track selected was used. Now, only one target region can be selected.
- Fixed an issue in Copy Number Variant Detection (CNVs) algorithm reports, where values in the "Start BIC" and "End BIC" columns in section 3.1 were truncated to a maximum of 4 digits in the integer part. The underlying calculations were not affected.
- Fixed an issue where the Gene Set Test tool did not exclude relevant GO terms as computationally inferred if there were parentheses in the GO annotation description.
- Fixed an issue in the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools that in rare cases could result in the QUAL value reported being slightly different between runs.
- Fixed an issue in Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection where the tool could continue to use the CPU and write to disk even after a job was cancelled.
- Fixed an issue in the GFF2 importer with different representation of stop codons in the CDS regions due to differences in input formats.
- Fixed an issue in Map Reads to Reference where the summary statistics table in the report did not include paired read statistics for mappings with paired end reads if no reads were mapped in intact pairs.
- Various minor bugfixes
Changes
- The Java version bundled with QIAGEN CLC Genomics Workbench 20.0 is Java 11, where we use the JRE from AdoptOpenJDK.
- Using Local Search, searches for sequences with a specific length or length range now only return individual sequence elements that meet the search requirements. Previously, searches were also done within other types of elements, e.g. sequence lists, read mappings, etc.
- The "Create index" option of the BAM exporter can no longer be used in combination with zip compression or choosing to output the results as a single file.
- Options relating to import of paired reads have been removed from the Ion Torrent importer.
- The tool Remove Orphan Reference Variants is now called Remove Homozygous Reference Variants.
- Reads tracks (mappings) are no longer generated by the RNA-Seq Analysis tool by default. Enable the new "Create reads track" output option when launching the tool if reads tracks should be created.
- Some folders in the Toolbox have been renamed and reorganized:
- The RNA-Seq Analysis folder is now called RNA-Seq and Small RNA Analysis, and this folder contains tools for both these areas of analysis.
- The Microarray and Small RNA Analysis folder is now called Microarray Analysis and contains tools relevant to that area of analysis.
- The Quality Control folder is now just above the Resequencing Analysis folder. Previously it was within the Resequencing Analysis folder.
- The Help -> Tutorials menu item has been replaced with Help -> Online Tutorials, which opens the online tutorials in a browser.
- The naming of some outputs from some tools has been updated:
- Demultiplex Reads
- Grouped reads Now: <sample name> <Barcode name> Previously: <Barcode name>
- Ungrouped reads Now: <sample name> Not grouped Previously: Not grouped
- Report Now: <sample name> Demultiplex Reads report Previously: Demultiplex Reads report
- Where multiple sequence lists are provided as input, the name of the first selected sample is used as the sample name.
- Trim Reads
- Trimmed, paired sequences Now: <sample name> (paired, trimmed pairs) Previously: <sample name> (paired) trimmed (paired)
- Trimmed, broken pairs Now: <sample name> (paired, trimmed orphans) Previously: <sample name> (paired, trimmed orphans)
- Discarded sequences Now: <sample name> (discarded) Previously: <sample name> (discarded)
- Report Now: <sample name> report Previously: <sample name>(trim report)
- Where multiple sequence lists are provided as input to Trim Reads, the name of the first selected sample is used in the output names.
- RNA-Seq analysis tool and Map Reads to Reference
- Output names have been shortened: the content of the last set of parentheses of the input name is replaced in the output name by a new tag denoting the specific type of output. Previously, tags were added to the input names when forming the output name.
- The word "un-mapped" has been replaced with "unmapped" in output names.
- When unmapped reads outputs are added to metadata tables the inputs are associated with, they are now assigned the metadata role "Unmapped reads".
- The following have been moved to the Legacy folder of the Workbench Toolbox and "(legacy)" appended to their names. They will be removed in a future version of the software.
- Create Combined RNA-Seq Report: The new Combine Reports tool includes this functionality, and should be used to combine RNA-Seq reports.
- Create Track from Experiment
- Remove Reference Variants
- Reverse Sequence
- The Small RNA Analysis folder, containing the following tools:
- Extract and Count
- Annotate and Merge Counts
- Download miRBase
- Remove Reference Variants The functionality of this tool can be replicated using "Filter Variants on Custom Criteria" with relevant criteria. To remove reference variants where the alternate allele has already been filtered away, use the new tool Remove Homozygous Reference Variants.
Functionality retirement
The following tools have been retired:
- Identify Differentially Expressed Gene Groups and Pathways (legacy)
- Add Fold Changes (legacy)
- Add Information from Overlapping Genes (legacy)
- Create Fold Change Track (legacy)
- Download Reference Genome Data (legacy)
The import of the following formats is no longer supported:
- qseq
- scarf
Plugin notes
New plugins
- Navigation Tools Provides the functionality formerly provided by the Bookmark Navigator and Recent Items Navigator plugins.
- SignalP and TMHMM Provides the functionality formerly provided by the SignalP and TMHMM plugins.
Plugin retirements
Functionality of the following plugins has been integrated into the QIAGEN CLC Genomics Workbench and can be found under the Epigenomics Analysis area of the Toolbox:
- Histone ChIP-Seq
- Advanced Peak Shape Tools
The following plugins have been retired, with their functionality being provided by a new plugin:
- Bookmark Navigator
- Recent Items Navigator
- SignalP
- TMHMM
The following plugins have been retired and their functionality is no longer available through the QIAGEN CLC Genomics Workbench:
- PPfold
- TRANSFAC
Advanced notice
The following will be removed in a future release of the software:
- Create Combined RNA-Seq Report (legacy)
- Create Track from Experiment (legacy)
- Remove Reference Variants (legacy)
- Reverse Sequence (legacy)
- Roche 454 NGS import (legacy)
- Tools under the Small RNA Analysis (legacy) folder:
- Extract and Count (legacy)
- Download miRBase (legacy)
- Annotate and Merge Counts (legacy)
The "Run in Batch Mode..." functionality for installed workflows with multiple inputs will be retired in a future release. Workflows with multiple inputs can now be launched in batch mode by checking the "Batch" checkbox when selecting the input data.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 12.0.4
Improvements
- Download Blast Databases now retrieves databases from the dedicated version 4 archive ("v4" folder) at the NCBI. From February 4, 2020, this tool in older versions of the software retrieves version 5 (dbV5) blast databases, which cannot be searched using the BLAST search tool in this release line or earlier lines. (QIAGEN CLC Genomics Workbench 20.0 and higher can be used to download and search dbV5 blast databases.)
- The hmmsearch program, used by Pfam Domain Search, is now 64-bit on all platforms. Previously a 32-bit version was distributed for use on macOS.
- Various minor improvements
Bug fixes
- Fixed a bug where sequences could not be downloaded from NCBI through the 'Show BLAST Hit Table' view.
- Fixed a bug in the Primer Designer, where clicking on each potential primer starting position did not highlight the primer region.
QIAGEN CLC Genomics Workbench 12.0.3
Improvements
- Improved the stability and performance when using tools that communicate with the NCBI. This will provide noticeable improvements when retrieving large numbers of search results or downloading large numbers of sequences using tools like Search for Sequences at NCBI.
- Improved the placement of figures when reports are exported to PDF.
- The Basic Variant Detection, the Fixed Ploidy Variant Detection and the Low Frequency Variant Detection tools have been updated: Variants extending up to 50 nucleotides beyond either end of a target region are now reported in full, while variants extending even further will include only the first 50 nucleotides beyond the target region. Insertions at the right hand border of a target region are now considered to be a variant within the target region.
- Improved the stability of workflow execution when the data is placed on a Network File System (NFS).
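The reporting rule for variants that extend beyond a target region, described above for the variant detection tools, can be sketched as a simple clipping operation. The Python below is an illustration only: the function name `reported_span` is hypothetical, coordinates are assumed to be 1-based and inclusive, and the variant is assumed to overlap the target.

```python
def reported_span(variant_start, variant_end, target_start, target_end, flank=50):
    """Clip a variant overlapping a target to at most `flank` bases beyond it.

    Coordinates are 1-based and inclusive (an assumption for this sketch).
    """
    if variant_end < target_start or variant_start > target_end:
        raise ValueError("variant does not overlap the target region")
    return (
        max(variant_start, target_start - flank),
        min(variant_end, target_end + flank),
    )


# Target region 1000-2000.
print(reported_span(1990, 2030, 1000, 2000))  # extends 30 bases beyond the target
print(reported_span(1990, 2100, 1000, 2000))  # extends 100 bases beyond the target
```

The first variant stays within 50 nucleotides of the target end and is kept in full; the second is cut off 50 nucleotides beyond the target end.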
Bug fixes
- Fixed an issue causing the Bonferroni and FDR multiple testing corrections of the Differential Expression for RNA-Seq and Differential Expression in Two Groups tools to be calculated using a greater number of tests than were actually performed, resulting in the corrections being too strict.
- Fixed an issue where a large nucleotide sequence could be detected as a protein sequence when being extracted from a BLAST database.
- Fixed an issue in the Local Realignment tool that could affect the re-alignment of mappings created using the Create UMI Reads tool of the Biomedical Genomics Analysis plugin. This issue would sometimes lead a small minority of reads to be re-aligned differently in different runs.
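To see why counting too many tests makes a multiple testing correction too strict, as in the Bonferroni and FDR fix listed above, consider the standard Bonferroni adjustment, where each p-value is multiplied by the number of tests and capped at 1. The Python sketch below uses made-up p-values and a hypothetical function name; it illustrates the general formula, not the Workbench's code.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-adjusted p-values: min(1, p * number of tests)."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]


p_values = [0.001, 0.01, 0.04]  # illustrative values only

# Using the number of tests actually performed (3).
print(bonferroni(p_values))

# Using an inflated test count (10): every adjusted value becomes larger.
print(bonferroni(p_values, n_tests=10))
```

With the inflated count, each adjusted p-value is larger than it should be, so fewer results pass a given significance threshold. FDR corrections also depend on the number of tests, so they are affected in a similar way.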
Advanced notice
The following functionalities of NGS importers will be retired and will not be available in the next major release of the software:
- Support for paired-end reads in the Ion Torrent importer
- Import of SCARF and QSEQ format files in the Illumina importer
The following tools will be removed in a future release of the software:
- Compare Sample Variant Tracks
- Create Track from Experiment
- Identify Differentially Expressed Gene Groups and Pathways
- Add Fold Changes
- Add Information from Overlapping Genes
- Create Fold Change Track
- Download Reference Genome Data (The functionality via the Reference Data Manager is unaffected by this.)
The PPfold plugin will be retired as of the next major release of the CLC Workbenches and Servers.
The TRANSFAC plugin will be retired as of the next major release of the CLC Workbenches and Servers.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 12.0.2
Please also refer to the Latest Improvements listing for QIAGEN CLC Genomics Workbench 12.0.1 below to see all the changes that have taken place since QIAGEN CLC Genomics Workbench 12.0.
Bug fixes
- Fixed an issue where the installation of large plugins occasionally failed.
- Fixed an issue with Ion Torrent importer workflow elements where an error arose if no linker sequence was entered when configuring the element.
- Fixed an issue where an error could occasionally arise during workflow validation, for example when adding or connecting workflow elements in the workflow editor.
- Fixed an issue where analyses with a read mapping step could occasionally fail with an error if different references were being used at the same time, and the reference cache (the temporary disk space used for reference data structure files) exceeded the configured size limit.
Improvements
The maximum amount of temporary disk space for the read mapping reference cache (the temporary disk space used for reference data structure files) has been increased to 16 GB. It was previously set to 8 GB.
Advanced notice
Support for paired-end reads in the Ion Torrent importer will be retired and will not be available in the next major release of the software.
The following tools will be removed in a future release of the software:
- Compare Sample Variant Tracks
- Create Track from Experiment
- Identify Differentially Expressed Gene Groups and Pathways
- Add Fold Changes
- Add Information from Overlapping Genes
- Create Fold Change Track
- Download Reference Genome Data (The functionality via the Reference Data Manager is unaffected by this.)
The PPfold plugin will be retired as of the next major release of the QIAGEN CLC Workbenches and Servers.
The TRANSFAC plugin will be retired as of the next major release of the QIAGEN CLC Workbenches and Servers.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 12.0.1
Improvements
- Identify Shared Variants can now run on a single variant track, so that when included in a workflow, it no longer requests more than one workflow connection (variant track) in order to compare variants shared across samples.
- Improved time performance of the Basic Variant Detection, Fixed Ploidy Variant Detection, and Low Frequency Variant Detection tools when running on empty non-circular chromosomes.
- The following tools have been standardized to present options in the same order in wizards, irrespective of whether the tool is launched via the Toolbox or in the context of a workflow: Basic Variant Detection, Fixed Ploidy Variant Detection, Low Frequency Variant Detection, InDels and Structural Variants, Remove Marginal Variants.
- Import of VCF files with reference overlap representation and filtered variants present is now supported. In QIAGEN CLC Genomics Workbench 12.0, filtered variant positions could interfere with reference overlap processing when importing VCF files.
- In tracks/track lists, entering "chr4" or "4:" in the location field now searches for a chromosome named "4", "chr4" or "chromosome_4", instead of selecting the 4th chromosome in the drop-down menu.
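The location field behavior described in the last item above can be sketched as a lookup by name rather than by position. This is only an illustration, not the Workbench implementation: the function names are hypothetical, and the case-insensitive comparison is an assumption.

```python
import re
from typing import List, Optional


def candidate_names(query: str) -> List[str]:
    """Return the chromosome names that a query such as "chr4" or "4:" may refer to."""
    token = query.strip().rstrip(":")
    # Drop a leading "chromosome_" or "chr" prefix to get the bare label, e.g. "4".
    bare = re.sub(r"^(chromosome_|chr)", "", token, flags=re.IGNORECASE)
    return [bare, f"chr{bare}", f"chromosome_{bare}"]


def find_chromosome(query: str, chromosomes: List[str]) -> Optional[str]:
    """Look the query up by name, never by its position in the drop-down list."""
    # Case-insensitive matching is an assumption made for this sketch.
    by_lower = {name.lower(): name for name in chromosomes}
    for candidate in candidate_names(query):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return None
```

With chromosomes listed as 1, 2, 3, X, 4, the query "4:" resolves to the chromosome named "4" rather than to "X", which is the 4th entry in the list.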
Bug fixes
- Fixed an issue where variants of exactly the same size and location as a target region would not be called. This issue affected variant calling when the "Restrict calling to target regions" parameter was used in Basic Variant Detection, Fixed Ploidy Variant Detection, or Low Frequency Variant Detection. For example, a SNV would not be called in a target region of size 1 that covered the SNV, but would be called in a target region of size >1. Further details.
- Reference variants without exact matching non-reference variants are now retained if they partially overlap non-reference variants. This may affect users of the following tools: Annotate with Flanking Sequence, Annotate with Conservation Score, Annotate with Exon Numbers, Remove Variants Present in Control Reads, Remove Marginal Variants, Remove Orphan Reference Variants, Filter against Known Variants, Filter Based on Overlap, GO Enrichment Analysis, Link Variants to 3D Protein Structure, Predict Splice Site Effect, TRIO Analysis, Identify Shared Variants, Add Information from Overlapping Genes (legacy), Compare Simple Variant Tracks (legacy) and Remove Variants Found in Allele Frequency Community (from the Ingenuity Variant Analysis plugin). Further details.
- Columns in variant tables are now ordered more consistently.
- The BLAST Selection Against NCBI and BLAST Selection against Local Data options are once again available in the right-click context menu for sequence selections in a sequence view.
- Fixed a bug where the Local Realignment tool would fail when run in multiple inputs mode.
- Fixed an issue with the Amino Acid Changes tool where it would fail if there were circular chromosomes and the gene flanking option was enabled. Flanking checks have now been disabled for any exons/CDSs that span a circular chromosome origin.
- When enabling prioritization in Amino Acid Changes, the tool now annotates the highest prioritized transcript of all genes at a variant's position. Previously it only annotated the highest prioritized transcript for one gene on each strand.
- Fixed an issue where the RNA-Seq Analysis tool could fail when run using genes and transcripts from the Transcript Discovery tool (from the Transcript Discovery plugin) if these contained a "biotype" column.
- Fixed a rare issue where RNA-Seq Analysis would fail if a read that mapped to the start of a circular chromosome had an unaligned region at the start long enough that, had it aligned, it would have wrapped around the origin of the circular chromosome.
- Fixed a bug where, when running the Identify Graph Threshold Areas tool with a window size greater than 1, the last interval found was one nucleotide shorter than expected.
- Fixed a bug where the Download BLAST Databases tool sometimes failed with an error during download.
- Fixed a concurrency bug in the Copy Number Variant Detection tool, which very rarely resulted in the tool reporting all low-coverage targets on one or more chromosomes as false positive deletions.
- Fixed an issue causing the Workbench to freeze when showing a heat map where all entries were identical.
- Fixed an issue where the PDF export of reports did not contain column headers if the first header was empty.
- Fixed an issue where text files did not have the expected ".txt" extension after a "Tab delimited text" export with the "Output as a single file" option selected.
- The BED Export tool will now export empty values in the "name" column as dots "." and the BED Import tool now interprets dots "." in the "name" column as empty values.
- Fixed an issue where the Excel and PDF export of reports failed for reports that contained empty tables.
- Fixed an issue with the Search for Sequences at NCBI and Search for Reads in SRA tools, where the search could fail when the Workbench had been configured to use a proxy server.
- Fixed an issue where the table from Create Pairwise Comparison would not update correctly when the number of decimals to be shown was changed. However, the table cannot display more than 4 decimal places, regardless of the user settings.
- Fixed a bug where, in tables in reports, numeric columns with at least five "?" would be left aligned instead of right aligned.
- Fixed a bug where, when launching a QIAGEN CLC Workbench using a CLC URL pointing to a data element stored on a CLC Server location, the data element was not opened automatically inside the Workbench after start-up.
- Workflows from Biomedical Genomics Workbench 5.x that contained tools for copy number variant detection can now be updated. Some of these workflows could not be updated in QIAGEN CLC Genomics Workbench 12.0.
- Fixed an issue where, when used on miRNA data, the tools Differential Expression for RNA-Seq and Differential Expression in Two Groups would report two "FDR p-value" columns. The second column is now correctly labeled "Bonferroni corrected p-value".
- Fixed an issue caused by a bug in Java 10 where files resulting from an analysis were occasionally corrupted. Often this affected analysis log files, but could affect other outputs. This affected GPFS file systems, and could affect other distributed file systems. The QIAGEN CLC Genomics Workbench 12 release line uses Java 10 and while we have worked around this issue so that analysis results files should no longer be affected by it, we recommend that the software itself not be installed directly on a GPFS file system.
Advanced notice
Support for paired-end reads in the Ion Torrent importer will be retired and will not be available in the next major release of the software.
The following tools will be removed in a future release of the software:
- Compare Sample Variant Tracks
- Create Track from Experiment
- Identify Differentially Expressed Gene Groups and Pathways
- Add Fold Changes
- Add Information from Overlapping Genes
- Create Fold Change Track
- Download Reference Genome Data (The functionality via the Reference Data Manager is unaffected by this.)
The PPfold plugin will be retired as of the next major release of the QIAGEN CLC Workbenches and Servers.
The TRANSFAC plugin will be retired as of the next major release of the QIAGEN CLC Workbenches and Servers.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 12.0
New features
Reference Data Manager
- A reference data management tool is now available via the Workbench interface. It can be used for finding, downloading and managing reference data, as well as downloading and managing sets of reference data, which can then be used when configuring the reference data to be used in workflows.
- Reference data can be downloaded from public repositories such as Ensembl and sets of reference data of relevance to biomedical and panel data analysis can be downloaded from a QIAGEN repository.
- If the Workbench is connected to a QIAGEN CLC Genomics Server, and that server has been configured to allow storage of reference data, then data can be downloaded using the QIAGEN CLC Genomics Server or grid nodes.
- In some data selection wizard steps, two tabs are now available: the original Navigation Area view, where data of appropriate type to use as input are presented, and a new Reference Data tab. Under the Reference Data tab, reference data elements obtained using the QIAGEN Sets or Custom Sets tabs of the Reference Data Manager can be selected as input.
- A new concept called a Workflow Role has been introduced, allowing workflow input elements to be linked to data elements of a Reference Data Set. Workflow input elements can be configured with a Workflow Role within a workflow design. Data in QIAGEN Sets have Workflow Roles pre-assigned. Workflow Roles can be assigned to other data elements using functionality found under the Custom Sets tab of the Reference Data Manager.
Bisulfite Sequencing Analysis
- Three tools for analyzing cytosine methylation data are now available from a folder under the Epigenomics Analysis folder of the Toolbox: Map Bisulfite Reads to Reference, Call Methylation Levels, and Create RRBS-fragment Track. These tools reveal methylated cytosines genome wide and at single base level resolution, support statistical comparison between samples accommodating different experimental designs, and support reduced representation sequencing. These tools were formerly available via the Bisulfite Sequencing plugin, but are now integrated into the Workbench.
- The Map Bisulfite Reads to Reference tool offers the option to enable global alignments to produce read mappings with no unaligned ends, which was not formerly possible.
- The default "cost of insertions and deletions" in the Map Bisulfite Reads to Reference tool is now "affine". This improves results on internal benchmarks because it breaks a symmetry in the default "linear" scoring for reads ending in homopolymers (which are abundant in bisulfite mapping due to in-silico conversion of the reads and references to a 3 letter alphabet). This symmetry meant that either a mismatch or an insertion could be introduced at the ends of some of these reads without changing the mapping score. In practice the mismatch is more plausible, and this is favored by the affine penalties.
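The symmetry described in the last item above can be illustrated with a small scoring sketch. The cost values below are purely illustrative assumptions chosen to reproduce a tie. They are not the defaults of the Map Bisulfite Reads to Reference tool, and the function names are hypothetical.

```python
def linear_score(matches, mismatches, gap_lengths,
                 match=1, mismatch_cost=3, gap_cost=3):
    """Linear scheme: every inserted or deleted base costs the same amount."""
    return matches * match - mismatches * mismatch_cost - sum(gap_lengths) * gap_cost


def affine_score(matches, mismatches, gap_lengths,
                 match=1, mismatch_cost=3, gap_open=5, gap_extend=1):
    """Affine scheme: opening a gap carries an extra cost, extending it is cheaper."""
    gap_penalty = sum(gap_open + gap_extend * length for length in gap_lengths)
    return matches * match - mismatches * mismatch_cost - gap_penalty


# Two candidate alignments of the same 50-base read ending in a homopolymer:
# one places a mismatch near the end, the other a single-base insertion.
with_mismatch = dict(matches=49, mismatches=1, gap_lengths=[])
with_insertion = dict(matches=49, mismatches=0, gap_lengths=[1])

linear_tie = linear_score(**with_mismatch) == linear_score(**with_insertion)
affine_prefers_mismatch = affine_score(**with_mismatch) > affine_score(**with_insertion)
```

Under these assumed costs the linear scheme scores both alignments identically, so the mapper has no basis for choosing between them, while the affine scheme scores the mismatch alignment higher.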
Other new tools
- Import Primer Pairs for importing primer pair locations from a generic text format file or from a QIAGEN gene panel primer file. This tool was formerly only available in the Biomedical Genomics Workbench.
- Import QIAGEN GeneReader for importing QIAGEN GeneReader data.
- Copy Number Variant Detection (CNVs) for detecting copy number variations (CNVs) from targeted resequencing experiments. Using read mappings and target regions as input, it produces amplification and deletion annotations. It is available under Toolbox | Resequencing. This tool was formerly only available in the Biomedical Genomics Workbench.
- Remove Information from Variants for removing annotations on variants. It can be found under the folder Toolbox | Resequencing Analysis | Variant Annotation. This tool was formerly only available in the Biomedical Genomics Workbench.
- Remove Orphan Reference Variants for removing reference allele variants that lack a corresponding non-reference variant allele.
- Differential Expression in Two Groups to be used instead of the more general Differential Expression for RNA-seq tool for testing differential expression between a single treatment group and a control group. Both these tools take the same input, but Differential Expression in Two Groups does not require a metadata table to describe the experimental design.
- The Batch Rename tool, available under the Utility folder of the Toolbox, allows sets of data elements, or members of a data element (e.g. sequences in an alignment or reads within a read mapping) to be renamed. Changes can be simple, like adding text to the start or end of names, or more complex changes, using regular expressions or custom options. This tool was formerly delivered in the Batch Rename plugin, but is now integrated into the Workbench.
- An item called CLC Server Connection has been introduced under the File menu. This launches a dialog for connecting from the Workbench to a QIAGEN CLC Server. This functionality was formerly delivered in the QIAGEN CLC Workbench Client Plugin, but is now integrated into the Workbench.
Welcome Center
The new Welcome Center is presented when the Workbench is first started up. It provides an overview of QIAGEN CLC Genomics Workbench functionality, news as well as links to additional information, data sets, and tutorials, making it easy to start working with the Workbench.
Improvements
Toolbox reorganization
The following tools have changed name:
- Coverage Analysis to Whole Genome Coverage Analysis
- Filter Marginal Variant Calls to Remove Marginal Variants
- Filter Reference Variants to Remove Reference Variants
- Create Detailed Mapping Report to QC for Read Mapping
- Create Statistics for Target Regions to QC for Targeted Sequencing
- Identify Candidate Variants to Filter Variants on Custom Criteria
- The Extract Reads Based on Overlap tool has been renamed to Extract Reads. In Extract Reads, the "Overlap tracks" parameter is now optional, so all reads in a mapping can be easily extracted if desired. The Extract Reads tool can also generate either reads tracks or sequence lists as output.
The folders and locations of tools provided under the Workbench Toolbox have been updated, better reflecting the purposes of the tools. To easily find specific tools and their new locations, please run the Launch tool in the Workbench.
- Tools formerly found under Toolbox | NGS Core Tools are now distributed in other folders to better reflect their purpose. Some also have a new name.
- Map Reads to Reference, Local Realignment, Merge Read Mappings, Remove Duplicate Mapped Reads, and Extract Consensus Sequence are in the Resequencing Analysis folder.
- Trim Reads and Demultiplex Reads are in the Prepare Sequencing Data folder.
- Sample Reads is in the Utility folder.
- Merge Overlapping Pairs is in the Legacy folder.
- QC for Read Mapping (previously Create Detailed Mapping Report) is in the Resequencing Analysis folder.
- QC for Sequencing Reads (previously Create Sequencing QC Report) is in the Prepare Sequencing Data folder.
- Variant detection tools are now all found under the folder Toolbox | Resequencing Analysis | Variant Detection.
- The Annotate and Filter Variants folder has been replaced by two folders, with some tools from other folders being included here also:
- Variant Annotation, which contains Annotate from Known Variants, Remove Information from Variants, Annotate with Conservation Scores, Annotate with Exon Numbers, Annotate with Flanking Sequences
- Variant Filtering, which contains Filter Variants on Custom Criteria, Filter against Known Variants, Remove Marginal Variants, Remove Reference Variants, Remove Variants Present in Control Reads (formerly called Filter against Control Reads)
- A new folder called Quality Control has been introduced under Resequencing Analysis. It contains QC for Targeted Sequencing, QC for Read Mapping, Whole Genome Coverage Analysis.
- The Compare Variants folder has been renamed Variant Comparisons, and contains Identify Shared Variants (formerly called Compare Variants within Group), Identify Enriched Variants in Case vs Control Groups (formerly called Fisher Exact Test), Trio Analysis.
- The folder under Molecular Biology Tools called Sequencing Data Analysis is now called Sanger Sequencing Analysis.
- A folder called Utility Tools has been introduced, which contains the tools Batch Rename, Extract Annotations, Sample Reads and Extract Reads (formerly called Extract Reads Based on Overlap).
- A folder called Prepare Sequencing Data has been introduced, which contains QC for Sequencing Reads (formerly called Create Sequencing QC Report), Trim Reads and Demultiplex Reads.
- The Toolbox | Workflows folder has been renamed Installed Workflows.
- Workflows delivered by some QIAGEN plugins are placed under Toolbox | Ready-To-Use Workflows. This folder appears only when at least one Ready-To-Use-Workflow is installed.
Using the Launch tool to find a tool will work with both the new and previous name.
Tracks improvements
- The side panel for a Reads Track now has a legend showing coloring information for the different read types. The legend also allows for customization of read colors.
- A new location field makes it easier to navigate tracks. Track locations can be specified using range, positions, chromosome names ("MT:", "5:"), and gene/transcript names ("BRCA2", "DHFR-001").
- Reads tracks now have a new coverage graph located above the reads, instead of the overflow graph that was previously placed below the reads.
- Reads tracks now have a vertical scrollbar to make it easier to navigate through high-coverage regions.
- When hovering the cursor over selected track types, a set of action buttons appears under the track name. These can be used to open the table view or to jump to the next or previous element.
- For annotation and variant tracks, the table view is now synchronized, so that making a selection on the track view will select the corresponding rows in the table.
- Variant tracks now include Forward coverage and Reverse coverage annotations.
- For large variant tracks, it is now possible to limit the corresponding table view by making a selection on the track.
- It is now possible to Copy, BLAST or Open in a New View a selected portion of a read from a read track.
- A new option allows users to extract a selected sequence from a track list: Right click a selection made on a sequence track and select "Extract Sequence".
- Overlapping variants are now always shown in the same order when a variant track is re-opened: reference variants first, followed by the other variants in lexicographic order of their alteration string (T > G). SNVs are therefore ordered (top to bottom) by A, C, G, T.
- Track lists now display additional information about an annotation when hovering on it with the mouse cursor: the name and strand of the annotation, which exon is currently being hovered over and the position of the mouse cursor relative to the start of the annotation. This information is available in the ruler shown in the reference track of a track list, as well as in the lower right corner of the workbench.
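The display order of overlapping variants described above can be sketched as a sort key. This is a minimal illustration rather than the Workbench code. The Variant type, its field names and the exact form of the alteration string are assumptions.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    reference_allele: str  # allele in the reference, e.g. "T"
    allele: str            # observed allele, e.g. "G"
    is_reference: bool     # True for a reference variant


def display_order(variants: List[Variant]) -> List[Variant]:
    """Reference variants first, then lexicographic order of the alteration string."""
    return sorted(
        variants,
        # False sorts before True, so reference variants come first.
        key=lambda v: (not v.is_reference, f"{v.reference_allele} > {v.allele}"),
    )


overlapping = [
    Variant("T", "G", False),
    Variant("T", "A", False),
    Variant("T", "T", True),
    Variant("T", "C", False),
]
ordered_alleles = [v.allele for v in display_order(overlapping)]
```

Because Python's sort is deterministic for a given key, the same input always yields the same top-to-bottom order, which is the property the item above describes.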
Standalone read mapping improvement: Read mappings now have a new coverage graph located above the reads. The overflow graph at the bottom has been removed.
RNA-Seq Analysis tool improvements
- The RNA-Seq Analysis tool supports the alignment and quantification of reads that wrap around the ends of circular chromosomes.
- The tool caches the data structure used by the read mapper to map reads to known mRNA annotations. This reduces run time by up to 3 minutes per sample, with the greatest benefits being observed when using large numbers of mRNA annotations on systems with few CPU cores.
- A new row has been added to the "Strand specificity" section of the report produced by the tool. The row contains the number of "Reads with known strand", which is used in determining the percentage of reads ignored due to being on the wrong strand.
- The "Detected transcripts" column has been renamed to "Uniquely identified transcripts" for both the gene-level and transcript-level expression tracks.
- For the Statistical comparison track, the Volcano plot view has an option making it possible to visualize the smallest p-values (including p-value=0).
- The "Reference Sequence" section of the report now lists the number and length of all chromosomes used during read mapping. Previously it reported only the length and number of chromosomes with at least one gene or transcript.
- The RNA-Seq Analysis and Map Reads to Reference tools can now share cached copies of the read mapper indexes. This means that the average run time over many samples will be reduced if both tools are frequently used.
- The RNA-Seq Analysis tool is now more efficient when handling large references, particularly when batch processing samples.
- Read mappings produced by the RNA-Seq Analysis tool previously ignored deletions and insertions at exon-intron boundaries. This meant that such deletions/insertions would not be detectable in downstream variant calling. The tool has been updated to keep the deletions and insertions in the mapping, implicitly favoring the hypothesis of a deletion/insertion over a novel splice junction. This change does not affect expression levels.
Amino Acid Changes tool improvements
- The Amino Acid Changes tool previously used square brackets to describe coding region and amino acid changes when a single variant affected multiple transcripts or proteins, e.g., NM_207170.3:c.[140C>T]; NM_015484.4:c.[266C>T]. These brackets have now been removed (e.g., NM_207170.3:c.140C>T; NM_015484.4:c.266C>T) to comply with the HGVS standards, which reserve the brackets for the reporting of alleles. These changes are also reported by the variant callers when run on a standalone read-mapping with CDS annotations.
- The tool describes replacements in the compact format preferred by HGVS (112_117delinsTG). Previously the description included the reference sequence (112_117delAGGTCAinsTG). These changes are also reported by the variant callers when run on a standalone read-mapping with CDS annotations.
- The 3' rule from the HGVS recommendations is now applied to c. annotations of variants. Similarly, in p. annotations (protein-level HGVS), insertions that are in fact duplications are now annotated as duplications.
- The tool uses all positions covered by a variant when describing coding region changes, in accordance with HGVS recommendations. Previously the tool restricted its change descriptions to positions within a transcript (if supplied) or CDS. This fix will therefore mainly affect the descriptions of deletions that partially overlap a transcript. These changes are also reported by the variant callers when run on a standalone read-mapping with CDS annotations.
- An option can add c. annotations (HGVS DNA-level) for variants that are within a certain distance from the transcript boundaries. The distance can be configured but defaults are set to 5 kb upstream and 3 kb downstream.
- An option in the Amino Acid Changes tool allows users to output an HGVS-compliant variant track.
- An option allows the prioritization of a single transcript when several annotations are available for one variant.
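For users who need to compare older annotation strings with the new HGVS-style output, the two notation changes described above (bracket removal and compact delins) can be sketched with regular expressions. The function names are hypothetical and the patterns only cover the forms shown in the examples above. This is not how the tool itself generates the descriptions.

```python
import re


def strip_allele_brackets(description: str) -> str:
    """Turn "c.[140C>T]" into "c.140C>T" (same for "p.")."""
    return re.sub(r"\b([cp])\.\[([^\]]+)\]", r"\1.\2", description)


def compact_delins(description: str) -> str:
    """Turn "112_117delAGGTCAinsTG" into "112_117delinsTG"."""
    # Drops the deleted reference bases listed between "del" and "ins".
    return re.sub(r"del[ACGTN]+ins", "delins", description)


old_style = "NM_207170.3:c.[140C>T]; NM_015484.4:c.[266C>T]"
new_style = strip_allele_brackets(old_style)
compact = compact_delins("112_117delAGGTCAinsTG")
```

Descriptions that are already in the new format pass through both functions unchanged.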
VCF importer and exporter improvements
- The VCF exporter and importer have been improved and now support VCF v4.2.
- The VCF Export "Enforce diploid" option has been replaced with an improved and more general "Enforce ploidy" option, set by default to 2. This option gives more control over the exported genotype and better compatibility with external applications such as Ingenuity Variant Analysis.
- Four complex variant representations can now be handled by the VCF importer and exporter, including the common reference overlap representation.
- The VCF exporter has an option to write variant annotations as INFO fields.
- In the VCF importer, we fixed an issue with the import of INFO IDs that contained non-alphabetical characters.
BED importer and exporter improvements
- The BED exporter now replaces spaces in feature names with underscores, since white space is not allowed in the BED feature names.
- The BED file exporter now always exports to BED12 format.
- The BED importer limit for name lengths has been raised from 80 to 256 characters.
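As an illustration of the exporter behavior described above, the sketch below writes a single-block feature as a 12-column BED line and replaces spaces in the name with underscores. The function names are hypothetical, and the values chosen for the thickStart, thickEnd, itemRgb and block columns are assumptions for a simple feature, not necessarily what the Workbench writes.

```python
def bed_name(name: str) -> str:
    """Replace spaces, which are not allowed in BED feature names, with underscores."""
    return name.replace(" ", "_")


def bed12_line(chrom: str, start: int, end: int, name: str,
               score: int = 0, strand: str = ".") -> str:
    """Format one single-block feature as a tab-separated BED12 line."""
    fields = [
        chrom, start, end,   # chrom, chromStart, chromEnd (0-based, end exclusive)
        bed_name(name),      # name
        score, strand,       # score, strand
        start, end,          # thickStart, thickEnd (assumed equal to the feature)
        "0",                 # itemRgb (assumed: no color)
        1,                   # blockCount
        end - start,         # blockSizes
        0,                   # blockStarts, relative to chromStart
    ]
    return "\t".join(str(field) for field in fields)


line = bed12_line("chr1", 100, 200, "my gene")
```

Here the feature named "my gene" is written with the name "my_gene" in the fourth column.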
Various tools improvements
- The Local Realignment tool has been optimized to run more quickly.
- The De Novo Assembly tool has been updated to use the same version of the read mapper as the one used by the Map Reads to Contigs tool. This typically leads to more accurate mappings. For larger assemblies the run time is expected to decrease on average, but for small assemblies run time is likely to increase.
- Filter Against Known Variants no longer adds duplicate annotations from known variants tracks. In addition, Overlap, Exact match and Partial MNV match annotations are now always added to the output variant track.
- The Import Ion Torrent and Import PacBio tools support import of reads from SAM or BAM format files. Mapping information is discarded during this import. To import a read mapping from SAM or BAM format files, use the existing Import | SAM/BAM Mapping Files... tool.
- Handling of RNA-Seq reads by the InDels and Structural Variants tool has been improved. This change affects breakpoint p-values and as a result, affects the number of breakpoints and variants reported. In addition, we have improved the calculations of the values reported for the "perfectly mapped" and "not perfectly mapped" breakpoint annotations.
- Improvements in the Differential Expression for RNA-Seq tool
- It now accepts RNA-Seq panel samples (including QIAseq panel samples) as input and offers additional normalization options.
- It now outputs statistical comparison tables in addition to the statistical comparison tracks. Tables offer the same functionality as the tracks, except for the track view.
- Improvements to the Annotate with Overlap Information tool:
- It now adds "Fold Change" annotations if you annotate with a Statistical Comparison track.
- It now has an option to "Keep only one copy of duplicate annotations".
- NCBI API keys for E-utilities can now be entered under Preferences | Advanced | NCBI API Key. This may be of particular interest where multiple machines with QIAGEN CLC Workbenches installed use the same IP address. If an NCBI API key is provided, it is used when running the following tools: Search for Reads in SRA, Search for Sequences at NCBI, Search for PDB Structures at NCBI.
- The Search for Sequences in UniProt tool is now available in Viewing Mode.
- The Search for Sequences in UniProt tool has been updated to use the HTTPS protocol, which is now required by UniProt.
- When using Search for Sequences at NCBI or Search for Sequences in UniProt to download sequences, the name of the searched database is now reported in the History view. If multiple sequences are downloaded, the resulting Sequence List now also has details in the History view.
- When right-clicking a CDS annotation on a stand-alone sequence, the option "Translate CDS/ORF..." gives a choice between translating using a selected code translation table or by extracting the translation code from the annotation itself if this information is available.
- The Download BLAST Databases tool now requires less disk space when downloading and installing BLAST databases.
- The history information associated with results from the BLAST and BLAST at NCBI tools now includes the version of the BLAST software used for the search.
- The Reverse Sequence tool now names the output sequence using the input sequence name followed by "-R".
- The tool called Replace Selection With Sequence which appears in context menus for sequences in the cloning tool will now be disabled when the sequence is linear but a selection spanning the end to start position has been made. Reasons why sequences cannot be marked as circular or linear are now described more clearly in the tooltip.
- For the Gene Set Test tool, the columns "Occurrences in all genes", "Genes (universe)", "Occurrences in subset", "Genes (subset)" have been renamed to "Detected Genes", "Detected Genes (Names)", "DE Genes", "DE Genes (Names)".
- For the GO Enrichment Analysis tool, the columns "Occurrences in all genes", "Genes (universe)", "Occurrences in sample", "Genes (overlap)" have been renamed to "Matched Genes", "Matched Genes (Names)", "Genes with Variations", "Genes with Variations (Names)".
Overall improvements
- Data created in QIAGEN CLC Genomics Workbench 12.0 will be internally compressed by default. Options are available for exporting data without this compression or for disabling it entirely.
- Tooltips on data elements in the Navigation Area show the following additional information: type of the element (e.g. Sequence List), file size, and compression status.
- On macOS, the standard file browser is now used for browsing files. Previously, a third-party library was used.
- A new option allows users to "Export table as currently shown" - including all filter settings and potential additional columns selected using the Side Panel.
- A new filtering button in all tables allows users to display in the table view only pre-selected rows.
- The 'is in list' table filter now supports tabs as a list separator. This makes it possible to paste rows from Excel into the search field.
- The Import tool allows for entering folder paths in the File name field.
- It is now easier to reorder items in the Navigation Area. It was not previously possible to change the order of adjacent folders.
- When starting the workbench with a "clc://" argument, the requested element is now selected in the Navigation Area.
- The Show History View no longer has a restriction on the number of elements shown.
- Workbench response times after logging into a QIAGEN CLC Genomics Server have been improved in the situation where many server jobs submitted from the Workbench had completed since the last login.
- A message now warns if a bug report fails to reach Support using Help | Contact Support.
- Searching in the Navigation Area is now available when using the workbench in Viewing Mode.
- The Create installer for workflows dialog has additional fields for specifying information about the workflow's author.
- A "Check for updates" functionality is now available from the Help menu.
- Improved error message when attempting to save a file that is not the newest copy.
- Sequence annotations where the strand is not known are now drawn without an arrow to distinguish them from annotations on the plus strand.
- Various minor improvements.
Bug fixes
- Fixed an issue affecting the Map Reads to Reference tool when it was included in a workflow, where if the References parameter was connected to an input, and a masking track was configured, an error was reported stating that the masking track was incompatible with the reference genome, whether or not it was compatible.
- Fixed an issue with the Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools that caused a small minority of variants to go unreported under certain conditions expected to arise rarely.
- Fixed a bug where in some cases, the Search for Reads in SRA tool would not fetch the final page of results.
- Fixed a bug in the Identify Candidate Variants tool (now called Filter Variants on Custom Criteria), where no results were returned when one or more criteria used a comparison operator with more than one term (e.g. ">=", "abs value <").
- Fixed a bug in the Import Tracks tool where one-nucleotide exons would be skipped during import of GTF files. As a consequence of this fix, import of UCSC SNPs typed as exons is no longer supported.
- Fixed a bug where the "Unaligned end" field provided in the Breakpoint track output of the InDels and Structural Variants tool was left blank when the value should have been "Mixed consensus" on all but one chromosome. The field is now filled for all chromosomes.
- Fixed a problem introduced in QIAGEN CLC Genomics Workbench 11.0 where launching a tool from the Quick Launch window after sorting led to the wrong tool being started.
- Fixed an issue where the Motif Search updated the history of the input file even when no changes to the input data element were made.
- Fixed an issue where it was possible to create an empty alignment editor, causing the Workbench to crash.
- Fixed a bug that caused import of empty text files to stall.
- Fixed an issue found in the History of a result generated by the Extract Annotations tool, that would incorrectly show that a reference sequence track was used when it was not.
- Fixed an issue where specifying a reference cache size greater than 2GB was not possible when using a readmapper.properties file.
- The mapping tool used in tools involving a mapping stage, such as Map Reads to Reference, Map Reads to Contigs and RNA-Seq Analysis, has been updated:
- Fixed an issue that led to some deletions being reported as multiple, separate deletions instead of a single, larger deletion when affine gap costs were used.
- Fixed a very rare bug in the read mapper, where an alignment with a leading unaligned end could get a wrong score.
- On Windows 10 and Windows Server 2016, it now runs with 'below normal' as the priority. Previously, it ran with 'normal' priority.
- Fixed the links to the AmiGO Gene Ontology website used for GO annotations.
- Fixed the links to the HGNC (HUGO Gene Nomenclature Committee) website.
- On Windows 10 and Windows Server 2016, the underlying program launched when running the Sample Reads tool now runs with 'below normal' as the priority. Previously, it ran with 'normal' priority.
- On Windows 10 and Windows Server 2016, the default BLAST database location will now be either 'C:\Users\USERNAME\My Documents\CLCdatabases' or 'C:\Users\USERNAME\Documents\CLCdatabases'. Previously it was 'C:\Users\USERNAME\CLCdatabases'. When upgrading from earlier versions, an existing BLAST database location will not be modified and will continue to work.
- On some Windows 10 and Windows Server 2016 systems, the log files, user settings file, and workflow files, normally stored in 'C:\Users\USERNAME\AppData\Roaming\CLC bio\Workbench' were installed in 'C:\Users\USERNAME\Application Data\CLC bio\Workbench'. Now we instead store these files in '%APPDATA%\CLC bio\Workbench'. User settings and workflow files will be automatically copied to the new location if they were previously stored in 'C:\Users\USERNAME\Application Data\CLC bio\Workbench'.
- Various minor bug fixes.
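The affine gap cost fix listed above concerns how a run of deleted bases is scored. As a minimal sketch, assuming illustrative open and extension costs (these are not the Workbench's actual parameters, and the function names are hypothetical), the following shows why a single long deletion is cheaper than several shorter deletions covering the same number of bases when gap costs are affine:

```python
def affine_gap_cost(length, gap_open=5, gap_extend=1):
    """Cost of one gap of `length` bases under an affine model.

    The open cost is paid once per gap and the extension cost once per base.
    gap_open and gap_extend are illustrative values only.
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


def total_gap_cost(gap_lengths, gap_open=5, gap_extend=1):
    """Sum the affine cost over several separate gaps."""
    return sum(affine_gap_cost(n, gap_open, gap_extend) for n in gap_lengths)


single = total_gap_cost([6])        # one 6-base deletion
split = total_gap_cost([2, 2, 2])   # three separate 2-base deletions
print(single, split)
```

With these assumed costs, the single deletion is penalized once for opening a gap, whereas the split representation pays the open cost three times.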
Changes
- Workflows installed on an earlier Workbench version can be installed on a new major release line by copying the installed Workflow in the earlier Workbench version, saving the copy in the Workbench Navigation Area and then opening this copy in the new Workbench version. The workflow can then be installed if desired.
- The underlying read mapper and de novo binaries included in the QIAGEN CLC Genomics Workbench 12.0 are from QIAGEN CLC Assembly Cell 5.1.1.
- The following tools have been moved to the Legacy folder of the Workbench Toolbox:
- Download Reference Genome Data: Download of reference data from public repositories such as Ensembl is now available from within the new Reference Data Manager
- Compare Sample Variant Tracks
- Merge Overlapping Pairs
- Create Track from Experiment
- The following tools were previously available in the Biomedical Genomics Workbench and have been placed in the Legacy folder of the QIAGEN CLC Genomics Workbench Toolbox:
- Identify Differentially Expressed Gene Groups and Pathways
- Add Fold Changes
- Add Information from Overlapping Genes: Using the Annotate With Overlap Information tool followed by the Remove Orphan Reference Variants tool provides the same functionality
- Create Fold Change Track
- The Import SOLiD tool has been retired. It was previously in Legacy Tools. As a consequence:
- The tools Map Reads to Reference, Map Reads to Contigs, Trim Reads, De Novo Assembly, Extract and Count, and Annotate and Merge no longer have special handling of SOLiD colorspace data. They will continue to work as expected for SOLiD data, but will not make use of color information to correct for phase shifts.
- Import | SAM/BAM Mapping Files and Standard Import | Reads from SAM/BAM files no longer allow import of data where colorspace information is provided in the form of CS flags and sequence data is omitted (SEQ = "*").
- Export | SAM, Export | BAM, and Export | Fastq no longer have special handling of SOLiD colorspace data. They will continue to work as expected for SOLiD data, but will not make use of color information to correct for phase shifts.
- The *.cas importer found in Import | Standard Import no longer allows the import of read mappings where SOLiD color information has been used as part of the mapping algorithm.
- The Import Tracks tool no longer supports the import of files in Complete Genomics master VAR file format. To import such files, it is necessary to first convert them to VCF using the tools provided by Complete Genomics.
- The column "Ignored reads (wrong strand)" has been removed from the "Strand specificity" section of the report produced by the Create Combined RNA-Seq Report tool. The column has been removed to better fit the report's purpose of only providing high-level relevant information.
- The format of the serverinfo.properties file has been changed. The new format supports up to 8 servers.
- The "Whole Genome shotgun-reads (wgs)" database has been removed from the BLAST at NCBI tool. Growth in the database means that specialized variants of BLAST are now required for search. More details on these can be found here.
Plugin notes
New plugins
- Biomedical Genomics Analysis 1.0: Installing this plugin on a QIAGEN CLC Genomics Workbench provides the functionality formerly available by running a Biomedical Genomics Workbench and installing the now-retired plugin, QIAseq Targeted Panel Analysis.
Plugin retirements
- Bisulfite Sequencing: The tools delivered by this plugin have been integrated into the Workbench and can be found in the Toolbox under the folder Epigenomics Analysis | Bisulfite Sequencing.
- CLC Workbench Client Plugin: The CLC Server Connection item under the File menu has replaced the need for this plugin.
- Batch Rename: The Batch Rename tool, formerly delivered by this plugin, is now available directly in the Workbench under the Utility Tools folder.
- QIAseq Targeted Panel Analysis and QIAGEN GeneRead Panel Analysis Plugin: These plugins were formerly available for use on Biomedical Genomics Workbenches only. Their functionality is now available via the new Biomedical Genomics Analysis plugin when installed on a QIAGEN CLC Genomics Workbench.
Advanced notice
The following tools will be removed in a future release of the software:
- Compare Sample Variant Tracks
- Create Track from Experiment
- Identify Differentially Expressed Gene Groups and Pathways
- Add Fold Changes
- Add Information from Overlapping Genes
- Create Fold Change Track
- Download Reference Genome Data (The functionality via the Reference Data Manager is unaffected by this.)
The PPfold plugin will be retired as of the next major release of the CLC Workbenches and Servers.
The TRANSFAC plugin will be retired as of the next major release of the CLC Workbenches and Servers.
If you are concerned about these proposed changes, please contact our Support team by emailing ts-bioinformatics@qiagen.com.
QIAGEN CLC Genomics Workbench 11.0.2
Improvements
- NCBI API keys for E-utilities can now be entered under Preferences | Advanced | NCBI API Key. This may be of particular interest where multiple machines with QIAGEN CLC Workbenches installed use the same IP address. If an NCBI API key is provided, it is used when running the following tools: Search for Reads in SRA, Search for Sequences at NCBI, Search for PDB Structures at NCBI.
- The Search for Sequences in Uniprot tool now uses the HTTPS protocol.
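For readers unfamiliar with NCBI API keys: E-utilities requests accept the key as an `api_key` query parameter. The sketch below only builds such a request URL and makes no network call. How the Workbench itself passes the key is not described in these notes, so the helper name `build_esearch_url` and the placeholder key are purely illustrative.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(db, term, api_key=None):
    """Build an ESearch URL, adding the api_key parameter only when a key is given.

    Hypothetical helper for illustration; not part of the Workbench.
    """
    params = {"db": db, "term": term}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# "YOUR_KEY" is a placeholder, not a real key.
url = build_esearch_url("sra", "Escherichia coli", api_key="YOUR_KEY")
print(url)
```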
Bug fixes
- Fixed a bug where the "Unaligned end" field provided in the Breakpoint track output of the Indel and Structural Variants tool was left blank when the value should have been "Mixed consensus" on all but one chromosome. The field is now filled for all chromosomes.
- Fixed an issue with the Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools that caused a small minority of variants to go unreported under certain conditions expected to arise rarely.
- Fixed an issue affecting the Map Reads to Reference tool when it was included in a workflow, where if the References parameter was connected to an input, and a masking track was configured, an error was reported stating that the masking track was incompatible with the reference genome, whether or not it was compatible.
- Fixed a bug in the Identify Candidate Variants tool where no results were returned when one or more criteria used a comparison operator with more than one term (e.g. ">=", "abs value <").
- Fixed an issue where specifying a reference cache size greater than 2GB was not possible when using a readmapper.properties file.
- Links to the HGNC (HUGO Gene Nomenclature Committee) website are now working again.
- Fixed the links to the AmiGO Gene Ontology website used for GO annotations.
- Fixed a bug where in some cases, the Search for Reads in SRA... tool would not fetch the final page of results.
Advanced notice
- SOLiD colorspace data support, including import, is not available in the next major release line of the software.
- Roche 454 NGS import is now a legacy tool. We have retained it in the next major release line of the software, but it may be retired in a future release.
If you are concerned about these changes, please contact the QIAGEN Bioinformatics team (ts-bioinformatics@qiagen.com).
QIAGEN CLC Genomics Workbench 11.0.1
Improvements
- Implemented the 3' HGVS compliance rule for c. annotation of variants:
- When doing c. annotations (DNA-level HGVS) we annotate insertions that really are duplications as such.
- For c. annotations we furthermore fulfill the 3' rule for insertions, deletions and duplications.
- When determining amino acid changes, the 3' rule is applied to the DNA change first. This may shift a variant in or out of the coding region, and that will affect whether or not we consider it as an amino acid change. The 3' rule for p. annotations was previously fulfilled and is not affected by this fix.
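The 3' rule described above can be illustrated with a small sketch. It assumes a plus-strand transcript (so 3' means towards higher coordinates), 0-based half-open coordinates, and a non-empty inserted sequence. The function names are hypothetical and this is not the Workbench's implementation.

```python
def shift_deletion_3prime(ref, start, end):
    """Shift the deletion ref[start:end] as far 3' as possible without changing the result."""
    while end < len(ref) and ref[start] == ref[end]:
        start += 1
        end += 1
    return start, end


def shift_insertion_3prime(ref, pos, inserted):
    """Shift an insertion placed before ref[pos] as far 3' as possible.

    The inserted sequence is rotated as it moves so the edited sequence is unchanged.
    """
    while pos < len(ref) and inserted[0] == ref[pos]:
        inserted = inserted[1:] + ref[pos]
        pos += 1
    return pos, inserted


def is_duplication(ref, pos, inserted):
    """An insertion is a duplication if it copies the bases immediately 5' of it."""
    n = len(inserted)
    return pos >= n and ref[pos - n:pos] == inserted


ref = "ACGTTTTGCA"
print(shift_deletion_3prime(ref, 3, 4))      # deleting the first T of the T run
pos, ins = shift_insertion_3prime(ref, 3, "T")
print(pos, ins, is_duplication(ref, pos, ins))
```

In this example, deleting any one T from the run gives the same sequence, so the deletion is reported at the last T. The inserted T is likewise moved to the end of the run, where it is recognized as a duplication.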
Bug fixes
- Fixed a bug in the VCF (Variant Calling Format) file format exporter that affected the QUAL score of the variant. Previously, the variant QUAL score was set to be the maximum QUAL score of all alleles (regardless of whether it was a reference allele or not). In some instances, e.g., when there are two alleles and one has a poor QUAL score, this choice was suboptimal. Instead, the variant QUAL score is now chosen as the maximum QUAL score among all non-reference alleles.
- Fixed an issue where the RNA-Seq Analysis tool would show an error if the first chromosome or contig contained no transcripts and the "Calculate expression for genes without transcripts" option was used.
- Fixed an issue where the RNA-Seq Analysis tool would sometimes generate TE tracks that could not be used in downstream tools. The error occurred when the "Calculate expression for genes without transcripts" option was used on a gene track where two genes had the same name, one of the genes contained the other, and neither gene had a transcript.
- Fixed an issue with the Trim Reads tool used in a workflow with multiple Trim adapter lists as input: all but the first list input were previously silently ignored, but the workflow now gives users a warning message.
- Fixed an issue where a Trim Adapter List containing an adapter with the setting "Discard the read (end matches at 3')" was imported incorrectly.
- Fixed an issue that could cause some third party plugins to fail trying to retrieve the fastq exporter.
- Fixed an issue where domain annotations added by the Pfam Domain Search tool started one amino acid later than expected. The corresponding start position in the table produced by the tool was correct.
- Fixed an issue with the advanced table filter functionality that prevented the removal of empty entries from columns expected to contain text.
- Fixed an issue where Excel formatted files (.xls, .xlsx) could not be imported as Trim Adapter Lists.
- Fixed a license issue causing workbenches to not start properly on Turkish Operating Systems.
- Fixed an issue causing the license assistant dialog and EULA dialog to be too big for smaller screens.
- Fixed an issue where weblinks to Uniprot sequences led to the homepage.
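The change to the exported VCF QUAL score described in the bug fixes above can be sketched as follows. The dictionary of per-allele QUAL scores and the function names are hypothetical and are used only to contrast the old and new selection rules.

```python
def variant_qual_previous(allele_quals):
    """Old rule: maximum QUAL over all alleles, reference allele included."""
    return max(allele_quals.values())


def variant_qual_current(allele_quals, reference_allele):
    """New rule: maximum QUAL over the non-reference alleles only."""
    non_ref = [q for allele, q in allele_quals.items() if allele != reference_allele]
    return max(non_ref) if non_ref else None


# Illustrative data: a well-supported reference allele and a weak alternative allele.
quals = {"A": 200.0, "G": 35.0}
print(variant_qual_previous(quals))
print(variant_qual_current(quals, reference_allele="A"))
```

Under the old rule the weak alternative allele would have inherited the high score of the reference allele. Under the new rule the exported QUAL reflects the alternative allele itself.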
Advanced notice
- SOLiD colorspace data support, including import, will be retired and will not be available in the next major release of the software.
- Complete Genomics support, including import, will be retired and will not be available in the next major release of the software.
- Roche 454 NGS import is now a legacy tool. We plan to retain it in the next major release of the software, but it may be retired in a future release.
If you are concerned about the proposed changes, please contact our Support team (AdvancedGenomicsSupport@qiagen.com).
QIAGEN CLC Genomics Workbench 11.0.0
Improvements and new features
- Trim Reads:
- The Trim Sequences tool under the NGS Core Tools section of the Toolbox has been renamed to Trim Reads.
- A new option has been added to the Trim Reads tool: "Automatic read-through adapter trimming". This option makes it possible to automatically identify overlap in paired reads and will trim the region that is not part of that overlap. This option is turned on by default. This new default affects workflows that include Trim Reads (or by its former name: Trim Sequences); the parameter will be turned on and locked by default.
- Trimming adapter:
- The New Trim Adapter List dialog has been updated to a new and more user-friendly interface.
- It is now possible to reverse complement an adapter sequence with a "Reverse Complement" button to the right of the sequence field.
- It is now possible to specify whether the trim should be performed on all reads, or only on the first or second read of a pair.
- A visual shows the adapter and the sequence being trimmed in relation to the rest of the sequence depending on the option chosen when an adapter is found.
- Fastq Export:
- Paired sequence lists can now be exported to 2 fastq formatted files, one file containing the first member of each pair, the other containing the second member. This is now the default for Fastq Export when exporting paired data.
- The option "Output as single file" is now disabled by default.
- The introduction of the new default setting "Export paired sequence lists to two files", has the implication that existing workflows that include a fastq export step will be in a state of conflict after they are updated for use on this release. This is because this option is not compatible with the option to "Output as single file", which was turned on by default in earlier versions. Affected workflows must be edited to either remove the option "Export paired sequence lists to two files" or the option "Output as single file". Messaging about this is provided when upgrading affected workflows.
- RNA-Seq Analysis:
- RPKM is now always calculated when running the RNA-Seq Analysis tool with the options "Genome annotated with genes only" and "One reference sequence per transcript".
- The default for the reference type parameter is now "Genome annotated with genes and transcripts".
- In the RNA-Seq Analysis tool, the option "Calculate RPKM for genes without transcripts" has been renamed to "Calculate expression for genes without transcripts".
- The behavior of the RNA-Seq Analysis tool has been changed when the option "Genome annotated with genes and transcripts" is used together with the option "Calculate expression for genes without transcripts":
- The counts of genes without transcripts are calculated. Previously only the TPM and RPKM were calculated.
- For a gene without a corresponding transcript, where that gene is overlapped by the intron of another gene, reads aligning to this region are counted towards the expression of the gene without the transcript. Previously such reads were counted as belonging to the intronic region of the overlapping gene.
- A single-exon transcript for each gene without transcripts is now added to the output TE track.
- Workbenches without a license can now be run in Viewing Mode. In this mode, data can be viewed, imported and exported. Plugins needed for viewing certain data types can be installed. Viewing mode, with its added functionalities, replaces Limited Mode.
- A dialog is now presented on startup if there are installed workflows that need to be updated before they can be run. The information about what to do when a workflow needs to be updated has been improved.
- The history of a data element can now be exported as a CSV format file.
- The Extract Consensus Sequence tool can now be connected in a workflow to many more tools that take nucleotide data as inputs, including the Map Reads to Reference and Map Reads to Contigs tools.
- An option to include reads that partially overlap variants has been added to the Identify Known Mutations from Sample Mappings tool, enabling detection of variants that are longer than the reads.
- The Identify Known Mutations from Sample Mappings tool has been made slightly more strict when handling insertions and replacements, requiring reads to overlap adjacent reference positions to be counted as fully covering the variant.
- The speed of the Illumina High-Throughput Sequencing Import has been substantially improved. The largest gains are seen on paired read files compressed by gzip with speed improvements of up to 30%.
- Changed amino acid colors to better suit users with various forms of color blindness.
- The Download Pfam Database tool now downloads version 31. Updates can now be made independently of the release of QIAGEN CLC Genomics Workbench, so the version available for download could change over time from the one recorded here.
- In table views, it is now possible to filter columns with the filters "Is in list" and "Is not in list" when the values are numbers.
- When exporting files to SAM or BAM format files, information is now entered into the optional fields NM (edit distance) and MD (mismatch string).
- The filter terms for the Identify Candidate Variants tool now include the numeric operators '>=', '<=', 'abs value >=' and 'abs value <='.
- Importing a GO annotation file with the Standard Import tool, specifying the format "Generic annotation file for expression data", now fails with an informative warning if any of the GO annotations are truncated.
- Warnings are now reported if truncated GO annotations are found when opening data created by the Create Expression Browser tool.
- The 'Expression Browser Table' (output from the Create Expression Browser tool) now preserves sorting when changing the grouping, if sorting is not on any of the grouped columns.
- NCBI BLAST executables have been upgraded to version 2.6.0.
- All wizard steps are now shown in the wizard sidebar when starting a tool or workflow.
- Visualization of features that wrap around the origin of circular sequences has been improved for sequences and tracks.
- Table filtering and search now interpret thousands and decimal separators in the same manner as the displayed table. Previously US punctuation was always used. This change means that if a table displays numbers in the form "123.456,7" then it is possible to find numbers less than ten by searching for "< 10,0" or "<10", but not "<10.0". If the table displays numbers in the form "123,456.7" then it is possible to find numbers less than ten by searching for "<10.0" and "<10", but not "<10,0".
- When a tool is disabled in a right-click context menu, hovering the mouse over the tool name will now reveal why a tool was disabled in most cases.
- The help window can now be closed by pressing the escape key.
- The Download Reference Genome Data tool now downloads genome annotations from GFF3 files instead of GTF files, as was previously the case. Genome annotations for Homo sapiens versions hg18 and hg19 are still downloaded as GTF files, as these are not available as GFF3 files.
- HTML formatting tags are now removed during export of data to Excel .xlsx or .xls format. This change does not affect the export of hyperlinks.
- The history information for data generated using the Identify Candidate Variants tool now includes a match criteria field, recording if the option 'match all' or 'match any' was used.
- For Reads tracks, the side panel option "Highlight reverse paired reads" is now enabled by default.
- For stand-alone read mappings, read pairs with reverse orientation are now highlighted with a lighter blue color. This is identical to the 'Highlight reverse paired reads' option for reads tracks.
- Parameters for the Trim Sequences tool are now shown in the same order when running the tool from the Toolbox or within a workflow.
- The column headings in the table containing statistics for each mapping, optionally produced by the Create Detailed Mapping Report tool, have been made more descriptive.
- The Search for Reads in SRA tool now reports in the top left corner the number of rows being displayed.
- Communication of error messages from the NCBI when running the Search for Reads in SRA tool has been improved.
- Map Reads to Reference now outputs an empty read mapping and report when the input contains 0 reads.
- A warning message is now presented when the tool Extract Sequences is run with the "Extract to single sequences" option selected and more than 100 sequences would result.
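The separator-aware table filtering described in the improvements above can be illustrated with a short sketch. It assumes a simple rule: strip the thousands separator, then treat the decimal separator as the decimal point. The function name is hypothetical, and the Workbench's own parsing may differ in detail.

```python
def parse_displayed_number(text, decimal_sep=".", thousands_sep=","):
    """Parse a number written with the given separators (illustrative rule only)."""
    cleaned = text.strip().replace(thousands_sep, "")
    cleaned = cleaned.replace(decimal_sep, ".")
    return float(cleaned)


# Table displaying numbers as "123.456,7": comma is the decimal separator.
print(parse_displayed_number("123.456,7", decimal_sep=",", thousands_sep="."))
print(parse_displayed_number("10,0", decimal_sep=",", thousands_sep="."))
print(parse_displayed_number("10.0", decimal_sep=",", thousands_sep="."))

# Table displaying numbers as "123,456.7": period is the decimal separator.
print(parse_displayed_number("123,456.7"))
```

Under this rule, a search term of "10.0" in a table that uses the comma as decimal separator is read as one hundred rather than ten. This is consistent with the note that "<10.0" does not find numbers less than ten in such a table.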
Changes
- The Roche 454 and SOLiD Import tools have been moved to the Legacy folder of the Workbench Toolbox.
- The option "Search on both strands" has been removed in the Trim Reads tool (formerly named Trim Sequences) and the Extract and Count tool.
- The Search for Sequences at NCBI tool now uses accession.version identifiers instead of GI numbers, as GI numbers are being phased out by NCBI (see https://ncbiinsights.ncbi.nlm.nih.gov/2016/07/15/ncbi-is-phasing-out-sequence-gis-heres-what-you-need-to-know/).
- The Create Mapping Graph tool has been modified so that the coverage of overlapping paired end reads is now only counted as one in the overlapping region, instead of two as done previously.
- Removed the line "Total consensus length" from Detailed Mapping Report when using a Read Mapping Track as input, as these tracks do not contain consensus information.
- Clicking "Select genes in other views" in a Volcano Plot with an empty selection no longer gives an error message.
- The SAM and BAM Mapping Files importer now fails if there are reads with more than one primary alignment where both are marked as being the first in a pair or both are marked as being second in a pair.
- Scrolling in a table now scrolls a fixed number of pixels, and not a fixed number of rows or columns.
- The Extract Consensus Sequence tool can no longer process protein BLAST results.
- The "Adapter trimming" section of the Workbench Preferences has been removed. This section supported functionality that was already retired.
- The "Help" and "Reset" buttons in pop-up dialogs are now buttons with text labels. They were previously buttons with icons.
- The GCG sequence exporter has been removed. The GCG alignment exporter is unaffected by these changes.
- The underlying read mapper and de novo binaries included in QIAGEN CLC Genomics Workbench 11.0 are from QIAGEN CLC Assembly Cell 5.0.5.
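The change to the Create Mapping Graph tool listed above, where overlapping paired end reads contribute a coverage of one rather than two in the overlapping region, can be sketched as follows. Reads are represented as hypothetical 0-based half-open intervals. This is an illustration of the counting rule, not the tool's implementation.

```python
def coverage_per_read(length, pairs):
    """Previous behavior: each read of a pair is counted separately."""
    coverage = [0] * length
    for pair in pairs:
        for start, end in pair:
            for pos in range(max(start, 0), min(end, length)):
                coverage[pos] += 1
    return coverage


def coverage_per_pair(length, pairs):
    """Current behavior: a position covered by both reads of a pair counts once."""
    coverage = [0] * length
    for (s1, e1), (s2, e2) in pairs:
        covered = set(range(s1, e1)) | set(range(s2, e2))
        for pos in covered:
            if 0 <= pos < length:
                coverage[pos] += 1
    return coverage


# One pair whose reads overlap at positions 4 and 5.
pairs = [((0, 6), (4, 10))]
print(coverage_per_read(10, pairs))
print(coverage_per_pair(10, pairs))
```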
Bug fixes
- Fixed an issue with the Create Statistics for Target Region tool where "GC %" was reported as a ratio. It is now reported as a percentage.
- Fixed an issue where paired distances were calculated incorrectly for paired reads in Forward-Reverse orientation where there is adapter read-through. Paired distances can be seen in the report from the Map Reads to Reference tool and the RNA-Seq Analysis tool. The paired distance calculation is also used by the "auto-detect paired distances" option in these tools, although this issue is unlikely to affect the inferred distances.
- Fixed an issue with the Amino Acid Changes tool when used with a circular sequence with a CDS annotation placed across the origin. Variants outside such a wrapped annotation could previously be incorrectly annotated with coding region changes.
- Fixed an issue with the Amino Acid Changes tool when used with a circular sequence with an intron across the origin. Previously, nearby variants were not annotated with coding region changes. Now, variants in such introns and that are within 2 nucleotides of the nearest exon will be annotated with coding region changes, if such changes are identified.
- Fixed a bug where the Amino Acid Changes tool would in some cases use the CDS reference instead of the RNA reference for annotating coding region changes. This would happen if the RNA and CDS annotations could not be matched, and it could cause variants in UTR regions to not be reported. The matching has now been improved by supporting the 'parent' field used by the GFF3 file format to pair CDS and RNA references.
- Fixed a bug in the RNA-Seq Analysis tool where, when run in "Genes and transcripts" mode, and using "Total counts" as Expression value, the expression values reported for GE tracks would not include shared exon counts. Downstream analyses based on the Set Up Experiment tool could be affected by this issue. Using affected GE tracks as input to the following tools would *not* affect their results: Differential Expression for RNA-Seq, Create Heat Map for RNA-Seq and PCA for RNA-Seq.
- Fixed an issue where the option to run the Differential Expression for RNA-Seq tool in batch mode was made available, leading to an error if it was selected.
- Fixed an issue where it was possible to start the Create Heat Map for RNA-Seq tool with invalid parameters that would cause the tool to fail.
- Fixed an issue where the number of input samples to the Map Reads to Reference and Map Reads to Contigs tools would be silently limited to 120. The execution is now aborted with a warning message. Each analysis must be started with a maximum of 120 samples.
- Fixed an issue with the mapping tool in the Workbench, which is used in tools involving a mapping stage, such as Map Reads to Reference, Map Reads to Contigs and RNA-Seq Analysis, where length and similarity fraction cut-offs were in some cases ignored for reads longer than 500 bp.
- Fixed an issue with the InDels and Structural Variants tool that caused it to crash if it encountered a particular set of conditions relating to reads with deletions.
- Fixed an issue with the InDels and Structural Variants tool where duplicate breakpoints and variants were reported if reads mapping as broken pairs were included in the analysis.
- Fixed an issue where filtering a log for a job that was still running would result in error dialogs.
- Fixed an issue that had previously prevented configuration of the export option "Output as single file" in workflows.
- Fixed an issue where data exported with gzip or zip compression did not have the .gz or .zip suffix appended to the filename when earlier exports had been made with the same name and export location specified.
- An issue has been fixed so that it is now possible to export in BAM format reads that contain synonyms, for instance 'X' as synonym for 'N'.
- Fixed a bug that caused the fasta exporter to fail when exporting read mappings where one or more reference sequences had no reads mapped to them.
- Fixed an issue that could cause exports of reports with line graphs to fail.
- Fixed an issue where resetting the default parameter values when configuring the Identify Candidate Variants tool did not work.
- Fixed an issue that would prevent the Trim Sequences tool being run with certain length filter settings.
- Fixed an issue where the option to "Highlight reverse paired reads" in the side panel of a reads track would cause paired end reads to be colored incorrectly if the reads completely overlapped, as would happen in the case of adapter read-through.
- Fixed a bug where a cell containing multiple hyperlinked URLs caused export to Excel 2010 or Excel 97-2007 format to fail. Such cell contents are now written in plain text.
- Contigs with Gap annotations covering regions longer than 10 bp can now be successfully exported to AGP format. Sequences containing such gaps will be split into separate contigs on export. This issue will be particularly of interest to those using the Join Contigs tool of the QIAGEN CLC Genome Finishing Module.
- Fixed an issue where the Low Frequency Variant Detection tool could return NaN for the Probability value in rare instances for small datasets.
- Improved performance for several tools when handling genomes with many chromosomes. Examples include Annotate with Overlap Information, the BED Exporter, Filter Annotations On Name, and Motif Search.
Plugin Notes
- Licenses for commercial modules are no longer required to install a module on a Workbench nor to view data generated by tools of a commercial module.
- The flexibility associated with network module licenses has been improved. Workbench module licenses provided via a CLC License Server are now initially loaded only when a tool provided by that module is launched. Such licenses are returned when 4 hours have elapsed since the last module tool was launched from that Workbench.
Advanced notice
- SOLiD colorspace data support, including import, will be retired and will not be available in the next major release of the software.
- Roche 454 NGS import has been moved to the Legacy Tools folder and will be removed in a future release, but will still be available in the next major release of the software.
If you are concerned about the proposed changes, please contact our Support team (AdvancedGenomicsSupport@qiagen.com).
QIAGEN CLC Genomics Workbench 10.1.3
Improvements
- NCBI API keys for E-utilities can now be entered under Preferences | Advanced | NCBI API Key. This may be of particular interest where multiple machines with QIAGEN CLC Workbenches installed use the same IP address. If an NCBI API key is provided, it is used when running the following tools: Search for Reads in SRA, Search for Sequences at NCBI, Search for PDB Structures at NCBI.
- The Search for Sequences in Uniprot tool now uses the HTTPS protocol.
Bug fixes
- Fixed a bug where the "Unaligned end" field provided in the Breakpoint track output of the Indel and Structural Variants tool was left blank when the value should have been "Mixed consensus" on all but one chromosome. The field is now filled for all chromosomes.
- Fixed an issue with the Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools that caused a small minority of variants to go unreported under certain conditions expected to arise rarely.
- Links to the HGNC (HUGO Gene Nomenclature Committee) website are now working again.
- Fixed the links to the AmiGO Gene Ontology website used for GO annotations.
- Fixed a bug where in some cases, the Search for Reads in SRA... tool would not fetch the final page of results.
- Weblinks to UniProt sequences are now working again.
- Fixed an issue where domain annotations added by the Pfam Domain Search tool started one amino acid later than expected. The corresponding start position in the table produced by the tool was correct.
- Fixed an issue where the RNA-Seq Analysis tool would sometimes generate TE tracks that could not be used in downstream tools. The error occurred when the "Calculate expression for genes without transcripts" option was used on a gene track where two genes had the same name, one of the genes contained the other, and neither gene had a transcript.
- Fixed an issue where the RNA-Seq Analysis tool would show an error if the first chromosome or contig contained no transcripts and the "Calculate expression for genes without transcripts" option was used.
Advanced notice
- SOLiD colorspace data support, including import, will not be available from QIAGEN CLC Genomics Workbench 12.0 onwards.
QIAGEN CLC Genomics Workbench 10.1.2
Improvements and changes
RNA-seq analysis:
- Fixed a bug in the RNA-Seq Analysis tool where, when run in "Genes and transcripts" mode, and using "Total counts" as Expression value, the expression values reported for GE tracks would not include shared exon counts. Downstream analyses based on the Set Up Experiment tool could be affected by this issue. Using affected GE tracks as input to the following tools would *not* affect their results: Differential Expression for RNA-Seq, Create Heat Map for RNA-Seq and PCA for RNA-Seq.
- The behavior of the RNA-Seq Analysis tool has been changed when the option “Genome annotated with genes and transcripts” is used together with the option “Calculate expression for genes without transcripts”:
- The counts of genes without transcripts are calculated. Previously only the TPM and RPKM were calculated.
- For a gene without a corresponding transcript, where that gene is overlapped by the intron of another gene, reads aligning to this region are counted towards the expression of the gene without the transcript. Previously such reads were counted as belonging to the intronic region of the overlapping gene.
- A single-exon transcript for each gene without transcripts is now added to the output TE track.
General:
- Fixed an issue where the number of input samples to the Map Reads to Reference and Map Reads to Contigs tools would be silently limited to 120. The execution is now aborted with a warning message. Each analysis can be started with at most 120 samples.
- Improved the information about what to do when a workflow needs to be updated.
Bug fixes
- Fixed an issue with the mapping tool in the Workbench, which is used in tools involving a mapping stage, such as Map Reads to References, Map Reads to Contigs and RNA-Seq Analysis, where length and similarity fraction cut-offs in some cases were ignored for reads longer than 500bp.
- Fixed a bug in the Amino Acid Changes tool where the CDS reference was used instead of the RNA reference when annotating coding region changes if the RNA and CDS annotations could not be matched. This could result in variants in UTR regions not being reported. The matching has been improved by supporting the 'parent' field used by the GFF3 file format to pair CDS and RNA references.
- Fixed an issue where the option to "Highlight reverse paired reads" in the side panel of a reads track would cause paired end reads to be colored incorrectly if the reads completely overlapped, as would happen in the case of adapter read-through.
- Fixed a bug in the Add Information about Amino Acid Changes tool where the CDS reference was used instead of the RNA reference when annotating coding region changes if the RNA and CDS annotations could not be matched. This could result in variants in UTR regions not being reported. The matching has been improved by supporting the 'parent' field used by the GFF3 file format to pair CDS and RNA references.
- Fixed an issue with the InDels and Structural Variants tool where duplicate breakpoints and variants were reported if reads mapping as broken pairs were included in the analysis.
- Fixed an issue with the InDels and Structural Variants tool that caused it to crash if it encountered a particular set of conditions relating to reads with deletions.
- Fixed an issue where the Low Frequency Variant Detection tool could return NaN for the Probability value in rare instances for small datasets.
- Various minor bugfixes
Advanced notice
Support for SOLiD colorspace data will be phased out over the next 12 months. If you are concerned about the proposed change, please contact our Support team (AdvancedGenomicsSupport@qiagen.com).
QIAGEN CLC Genomics Workbench 10.1.1
Genomics Workbench
Bug fixes
- Fixed an issue introduced in QIAGEN CLC Genomics Workbench 9.5 causing the Merge Annotation Tracks tool to fail when used on tracks with more than 6 chromosomes.
- Fixed an issue with the Cloning tool introduced in the QIAGEN CLC Genomics Workbench 10.1 where the tool could not be launched.
Advanced notice
- Support for SOLiD colorspace data will be phased out over the next 18 months. If you are concerned about the proposed change, please contact our Support team (AdvancedGenomicsSupport@qiagen.com).
QIAGEN CLC Genomics Workbench 10.1.0
New features
- Added keyboard shortcuts to change editor views. Ctrl + Shift + PageUp and Ctrl + Shift + PageDown now change the current view of the currently focused editor.
- New keyboard shortcuts are available for navigation within the workbench:
- Navigate between open tabs with Ctrl + Page Down and Ctrl + Page Up (Windows/Linux/Mac). On laptops without Page Up/Down keys, the shortcuts are Ctrl+fn+arrow up/down.
- Return focus to the navigation area with Alt + Home.
- We have made the following improvements to tab presentation in the View area of the Workbench:
- Tabs show more of the name of the opened object.
- Tabs now open from the top left corner to the right and down.
- Tabs always stay in the same position when another tab is selected or a new tab is opened.
- A new submenu has been added to the right-click menu on tabs to select between the open tabs.
- Anonymous Workbench usage information can now be shared with us to help us improve our products and offerings. Information about what is collected and how to opt out is provided when the updated Workbench is launched. Further details are available in the manual.
Improvements
- New and improved Save View Settings dialog. This new dialog can be used for saving, applying, importing and exporting side panel views.
- Improvement to the PCA plot generated by the PCA for RNA-Seq tool, so that all points are visible with default side panel view settings. Previously the standard view settings could hide points with missing metadata.
- Stability improvements to SRA search:
- Fixed an issue that could cause the SRA search to prematurely time out with the message "java.com.SocketTimeoutException".
- Fixed an issue that could cause the SRA search view to display an error when trying to show results.
- When importing tracks, the history of the track now contains the full path name of the imported file.
- Opening large pairwise comparisons generated by the Create Pairwise Comparison tool is now faster.
- Improved messaging when installing workflows on a QIAGEN CLC Genomics Server from the Workbench.
Bug fixes
- Fixed an issue with the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools that could cause the count and frequency values to be too low for a small subset of those variants that are contained within a larger variant region (e.g. an MNV or deletion). For a variant to be affected by this problem, there needed to be at least two other potential variants nearby that were disregarded during the variant calling process. This circumstance and our testing suggest this is a rare issue.
- Fixed a bug that in some cases would result in incorrect BaseQRankSum values being reported in the outputs of the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools.
- Fixed an issue where switching to the Heat Map view on an Experiment would give an error when no Heat Map existed.
- Fixed an issue where the GFF3 Exporter could generate invalid GFF3 for features of length 0.
- Fixed a rare issue that could cause GenBank export to fail.
- Fixed an issue with the Realign Selection tool where clicking on the ? button to see the manual information resulted in an error. This tool is launched from the right-click context menu for selections in sequence alignments.
- Fixed an issue where workflows run in batch mode would fail in the case where no results are saved to the Navigation Area and only one file is exported per batch unit.
- Fixed an issue introduced in QIAGEN CLC Genomics Workbench 10.0.1 where the Download Sequences from NCBI tool would continue to indicate it was searching when in fact the search had finished and no items were found.
Advanced notice
- Support for SOLiD colorspace data will be phased out over the next 18 months. If you are concerned about the proposed change, please contact our Support team (AdvancedGenomicsSupport@qiagen.com).
QIAGEN CLC Genomics Workbench 10.0.1
Bug fixes
- Fixed an issue where the Search for Sequences at NCBI tool did not display the first and last search result in each page of results returned.
QIAGEN CLC Genomics Workbench 10.0
New tools for RNA-Seq
- Create Combined RNA-Seq Report - makes it possible to join multiple reports generated by the RNA-Seq Analysis tool into one combined overview report.
- PCA for RNA-Seq* - clusters samples in 2D or 3D. Known metadata about each sample is added as an overlay.
- Differential Expression for RNA-Seq* - uses multi-factorial statistics based on a negative binomial GLM.
- Create Heat Map for RNA-Seq* - simultaneously clusters samples and features. Known metadata about each sample is added as an overlay.
- Create Expression Browser* - allows expression values, statistical results, and gene annotations to be viewed together.
- Create Venn Diagram for RNA-Seq* - shows differentially expressed genes shared between experimental conditions.
- Gene Set Test - tests the output from the Differential Expression for RNA-Seq tool for overrepresented gene sets (such as Gene Ontology terms) using a hypergeometric test.
- Import | RNA Spike-ins - for importing RNA spike-in sequences and concentration data.
*Tools marked with an asterisk were available to earlier Workbench versions via the Advanced RNA-Seq plugin. They can now be found in the Toolbox in the RNA-Seq Analysis folder.
These tools automatically account for differences due to sequencing depth, removing the need to normalize input data. They work with existing RNA-seq TE and GE tracks. Changes made in this release mean that outputs from the Differential Expression for RNA-Seq tool can now be used as inputs to the Extract Annotations and Extract Reads Based on Overlap tools.
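As a rough illustration of the hypergeometric test named for the Gene Set Test tool above, the sketch below computes an over-representation p-value from four counts. The function name and the example numbers are hypothetical. How the tool defines its background set or handles multiple testing is not described in these notes, so none of that is modeled here.

```python
from math import comb


def hypergeom_enrichment_p(population, in_set, drawn, drawn_in_set):
    """P(X >= drawn_in_set) when `drawn` genes are sampled without replacement
    from `population` genes, of which `in_set` belong to the gene set."""
    upper = min(in_set, drawn)
    total = comb(population, drawn)
    # Sum the probabilities of seeing drawn_in_set or more set members.
    tail = sum(
        comb(in_set, k) * comb(population - in_set, drawn - k)
        for k in range(drawn_in_set, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 genes overall, 5 in the gene set,
# 4 differentially expressed genes, 3 of which are in the set.
p = hypergeom_enrichment_p(population=20, in_set=5, drawn=4, drawn_in_set=3)
print(round(p, 4))
```

A small p-value suggests that the gene set contains more differentially expressed genes than would be expected by chance under this sampling model.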
RNA-Seq Analysis
- The RNA-Seq Analysis tool now supports RNA spike-ins, such as ERCC and SIRV, for quality control. This makes it possible to validate RNA-Seq experiments by comparing known spike-in concentrations to measured transcript concentrations. Spike-ins can be imported using the new RNA Spike-ins Import tool.
- The RNA-Seq Analysis report has been revised and updated:
- We now show the distribution of the biotypes that the reads mapped to.
- The strand specificity of the mapped reads is now reported.
- Transcript coverage plots make it possible to detect and visualize 5' and 3' coverage bias.
- For paired-end reads, we now detect and warn about potential adapter read-through.
- A biotype column is now available in the Expression Track tables produced by the RNA-Seq Analysis tool, when biotype information is available.
- The Mapping options of the RNA-Seq Analysis tool, "Map to gene regions only" and "Also map to inter-genic regions", have been removed. The tool now runs by mapping reads to the full reference supplied, which is equivalent to choosing the recommended "Also map to inter-genic regions" option in earlier versions.
- The RNA-Seq Analysis tool now always uses the "Expression level" option "Use EM estimation (recommended)" to quantify expression. This is more accurate than the previous default option. Differences are especially noticeable for Transcript Expression (TE) tracks.
- The RNA-Seq Analysis quantification by EM estimation now runs faster.
- In RNA-Seq analyses, reads that map uniquely to a genome position are now always marked as unique. Previously, a uniquely mapped read would be marked as ambiguous if it mapped to a position with multiple overlapping genes.
- Exon IDs will no longer be included in the ENSEMBL column of transcript expression (TE) tracks generated by the RNA-Seq Analysis tool. Gene and transcript names will continue to be listed and hyperlinked in this column.
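To give a feel for what EM estimation means in this context, the sketch below runs a generic textbook expectation-maximization loop that shares reads compatible with several transcripts among them. It is not the algorithm used by the RNA-Seq Analysis tool: the function name and input layout are hypothetical, transcript lengths are ignored, and every read is assumed to be compatible with at least one transcript.

```python
def em_abundances(read_compat, n_transcripts, iterations=100):
    """Estimate relative transcript abundances from read compatibility sets.

    read_compat: list of lists; each inner list holds the indices of the
    transcripts a read is compatible with.
    """
    theta = [1.0 / n_transcripts] * n_transcripts  # start from uniform
    for _ in range(iterations):
        counts = [0.0] * n_transcripts
        # E-step: split each read among its transcripts in proportion to theta.
        for compat in read_compat:
            if not compat:
                continue  # reads with no compatible transcript are skipped
            weight = sum(theta[t] for t in compat)
            for t in compat:
                counts[t] += theta[t] / weight
        # M-step: new abundances are the normalized expected counts.
        total = sum(counts)
        theta = [c / total for c in counts]
    return theta


# Hypothetical data: 6 reads unique to transcript 0, 2 unique to
# transcript 1, and 4 reads compatible with both.
reads = [[0]] * 6 + [[1]] * 2 + [[0, 1]] * 4
theta = em_abundances(reads, n_transcripts=2)
print([round(x, 3) for x in theta])
```

The shared reads end up divided according to the evidence from the uniquely assigned reads, which is why this kind of estimation matters most for transcript-level (TE) values, where overlapping isoforms share many reads.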
Import/Export
- A tool to import PacBio data is now available at Import | PacBio.
- Usability aspects of data association using the Import Metadata tool have been improved, including adding a preview of data items to be associated with particular metadata rows.
- FASTA is now the default format the first time the Import | Tracks tool is invoked (was GFF2/GTF/GVF in earlier versions).
- The GFF2/GTF/GVF tracks importer can no longer be used to import GFF3 format files. The new GFF3 tracks importer should be used for this instead.
- The GFF3 importer has been updated with respect to the handling of CDS features. In earlier versions, CDSs with different IDs but the same parent gene would always be merged into the same CDS feature during import. This behavior will still occur in cases where all CDSs in the GFF3 file either have unique IDs or no IDs. For GFF3 files where there are any CDSs with identical IDs, then only CDSs with the same ID are merged into a single feature.
- The Import | Tracks tool now accepts files with a .fna extension.
- The display of the types of files to import using the Import | Tracks tool has been improved.
- The speed of importing to tracks where the original file contains data relating to many chromosomes has been substantially improved.
- RNA tracks imported from GFF3 format files are now colored according to their biotype.
- The Cosmic option of the Import | Tracks tool is now more flexible with regard to the column headings in the files being imported.
- An exporter has been added to export annotations on sequences or tracks to Generic Feature Format Version 3 (GFF3) format.
- An option has been added to create an index file when exporting to BAM format.
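The CDS merging rule described above for the GFF3 importer can be sketched as a grouping step. This is only an illustration of the stated rule, not the importer's code: the record layout and function name are hypothetical, and the handling of CDSs that lack an ID in a file where other CDSs share IDs is an assumption, since the notes do not cover that case.

```python
from collections import Counter, defaultdict


def group_cds_features(cds_records):
    """Group CDS records into merged features following the rule described above.

    cds_records: list of dicts with keys 'id' (str or None), 'parent' (str)
    and 'span' ((start, end) tuple). This record layout is hypothetical.
    """
    id_counts = Counter(r["id"] for r in cds_records if r["id"] is not None)
    has_shared_ids = any(n > 1 for n in id_counts.values())

    groups = defaultdict(list)
    for rec in cds_records:
        if has_shared_ids and rec["id"] is not None:
            # Some IDs repeat in this file: merge only CDSs sharing an ID.
            key = ("id", rec["id"])
        else:
            # All IDs unique or absent: merge by parent, as in earlier versions.
            # Assumption: ID-less records in a file with shared IDs also
            # fall back to their parent.
            key = ("parent", rec["parent"])
        groups[key].append(rec["span"])
    return dict(groups)


unique_ids = [
    {"id": "cds1", "parent": "geneA", "span": (1, 100)},
    {"id": "cds2", "parent": "geneA", "span": (200, 300)},
]
shared_ids = [
    {"id": "cdsX", "parent": "geneA", "span": (1, 100)},
    {"id": "cdsX", "parent": "geneA", "span": (200, 300)},
    {"id": "cdsY", "parent": "geneA", "span": (150, 260)},
]
merged_by_parent = group_cds_features(unique_ids)
merged_by_id = group_cds_features(shared_ids)
print(merged_by_parent)
print(merged_by_id)
```

In the first input every CDS has a unique ID, so both segments are merged under their shared parent. In the second, the repeated ID switches the whole file to ID-based merging, so the two alternative CDSs of the same gene stay separate.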
BLAST
The list of BLAST databases for use with the BLAST at NCBI tool has been updated:
- Added “RefSeq representative genomes” database.
- Removed “New or revised GenBank sequences (month)”. This is no longer supported by the NCBI.
- Changed the name “References mRNA sequences” to “References RNA sequences”. The database that is searched remains the same as before.
- Changed “16S ribosomal RNA sequences” database to now search the “rRNA_typestrains/prokaryotic_16S_ribosomal_RNA” database, as listed on the NCBI website. It previously queried “TL/16S_ribosomal_RNA_Bacteria_and_Archaea”.
- Fixed the configuration of the “Human genomic plus transcripts” and “Mouse genomic plus transcripts” databases to reflect their new location. Searching these previously returned an error.
New features and improvements
- Toolbox rearrangement: the expression analysis tools are now in two top-level folders: "RNA-Seq Analysis" and "Microarray and Small RNA Analysis". The former top level Toolbox folder Transcriptomics Analysis has been removed.
- When working with Gene Sets that refer to Gene Ontology terms, gene annotations are now automatically propagated to parent Gene Ontology terms. This improvement affects the tools: Hypergeometric Tests on Annotations and Gene Set Enrichment Analysis (GSEA).
- The mapping tool in the Workbench, which is used in tools involving a mapping stage, such as Map Reads to References, Map Reads to Contigs and RNA-Seq Analysis, has been updated. The update includes improved read mapping quality and speed (especially for longer reads), improved memory performance for the index building stage, and various minor bug fixes. The new mapping tool corresponds to the clc_mapper tool included in Assembly Cell 5.0.3, planned for release in March 2017.
- The default value for the parameter "Maximum guidance-variant length" in the Local Realignment tool has been changed to 200 (was 100). This change applies to all ready-to-use workflows and when the tool is launched directly.
- The Basic Variant Detection tool will no longer report N as an alternative allele when there is an ambiguous base at a variant position.
- The report generated by the tool Create Statistics for Target Regions now includes a "≥" sign instead of a ">" sign.
- The "Additional Reporting" options in the Create Sequencing QC Report tool, "Quality analysis" and "Over-representation analysis" have been removed. These outputs are now generated by default.
- A PubMed search option has been added to the Search for Reads in SRA tool. This returns only those runs that are associated with a PubMed abstract or full-text article.
- Support has been added for 'negative lookahead' in Java regular expressions used with the Motif Search tool.
- For new or existing sequence lists the sequencing platform can now be specified via the Read Group setting of the Element Info view.
- It is now possible to right-click on a table cell and filter table rows based on the value of that cell by choosing options under the new context menu section called "Table filters". This change applies to all tables where advanced filtering is available.
- The speed of sorting and loading tracks has been greatly improved. Due to these changes, tracks created with this or later versions of the Workbench cannot be used with older Workbenches. Backwards compatibility has been maintained: tracks created using older versions of the Workbench can continue to be used.
- The speed of searches for data elements with associations to specified metadata, from within a Metadata Table, has been greatly improved. To enable metadata-related searches to work after upgrading to QIAGEN CLC Genomics Workbench 10.0, indices for the locations containing the relevant data will need to be rebuilt.
- Columns with position information in the table produced by the Find Broken Pair Mates tool now sort numerically rather than alphabetically. Alphabetic sorting for these columns was introduced with the QIAGEN CLC Genomics Workbench 9.0. Earlier versions had numerical sorting.
- Tutorial windows are no longer blocked when a wizard is open.
- Various minor improvements
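The negative lookahead construct mentioned above for the Motif Search tool is written `(?!...)`. The tool itself uses Java regular expressions; the same construct exists with the same syntax in Python's `re` module, which is used here purely for illustration. The motif and sequence are made up for the example.

```python
import re

# Match the motif "ATG" only when it is NOT immediately followed by "TAA".
# (?!...) is the negative lookahead; it checks what follows without consuming it.
pattern = re.compile(r"ATG(?!TAA)")

sequence = "ATGTAACCATGGCAATGTAG"
starts = [m.start() for m in pattern.finditer(sequence)]
print(starts)  # [8, 14]
```

The first ATG (position 0) is skipped because it is directly followed by TAA, while the occurrences at positions 8 and 14 are reported.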
Bug fixes and changes
- Fixed an issue where the index building stage of the Map Reads to References and the Map Reads to Contigs tools was not taking into account the maxcores setting in the cpu.properties file, where this had been configured.
- Fixed an issue where sequence circularity was not reported in the output from the Map Reads to References tool.
- Fixed a bug in the Create Detailed Mapping Report tool, which sometimes reported incorrect read counts for circular sequences.
- Fixed an issue where the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools reported homozygous reference insertions in cases where a heterozygous variant was possible but the insertion variant was disregarded during filtering.
- Fixed an issue where the Identify Known Mutations from Sample Mappings tool would fail if it was part of a workflow and it received multiple sample mappings as input.
- Fixed an issue with the Annotation Table view of a sequence where it was possible to change the types of annotations displayed at the same time as an annotation was being edited, which could lead to an error being thrown or the wrong annotation being changed.
- Fixed an issue with GenBank and EMBL exports where quoting specifications were not being conformed to.
- Fixed an issue with Primer Tables where an error resulted if either the option "Save Primer(s) Fwd, Rev" or "Save Fragment" was chosen and then the save operation was stopped by clicking on the Cancel button.
- Fixed an issue where in some cases filtering tables for empty values would not produce any results.
- Fixed an issue where advanced filtering did not work when looking for rows with cells containing multiple values using the filtering term "=" (equals).
- Fixed an issue where a workflow containing an export step that failed did not provide any indication that a problem had occurred.
- A sporadic Java issue that led to errors including the text "java.lang.ClassCastException: sun.awt.image.BufImgSurfaceData cannot be cast to sun.java2d.xr.XRSurfaceData" has been addressed through an upgrade to Java. This issue was primarily seen when using the Workbench remotely on Linux systems.
- Fixed a problem with the identification of the correct sequence types from MLST schemes in cases where the schemes contained blank characters. This issue affected workbenches with CLC MLST or QIAGEN CLC Microbial Genomics Module installed.
- Various minor bugfixes.
Retirement
- The GFF exporter has been retired and is no longer available. The new GFF3 exporter should be used instead.
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools have been retired and have been removed from the Legacy folder of the Toolbox.
- Tools in the Expression Profiling by Tags folder under the Toolbox | Legacy area have been retired and this folder has been removed. The tools retired are Extract and Count Tags, Create Virtual Tag List and Annotate Tag Experiment.
Plugin notes
- The Advanced RNA-Seq plugin has been retired. The tools from this plugin have been integrated into the software. Please see the New Tools for RNA-Seq section for more details.
Other notifications
- An option to opt out of providing anonymous usage information to QIAGEN has been added to the Workbench Preferences. We are not yet collecting any usage information so opting in or out does not have any effect at this time.
QIAGEN CLC Genomics Workbench 9.5.5
Improvements
- When importing tracks, the history of the track now contains the full path name of the imported file.
Bug fixes
- Fixed an issue with the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools that could cause the count and frequency values to be too low for a small subset of those variants that are contained within a larger variant region (e.g., an MNV or deletion). For a variant to be affected by this problem, there needed to be at least two other potential variants nearby that were disregarded during the variant calling process. This circumstance and our testing suggest this is a rare issue.
- Fixed a bug that in some cases would result in incorrect BaseQRankSum values being reported in the outputs of the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools.
- Fixed an issue that could cause the SRA search view to display an error when trying to show results.
- Fixed a problem with the identification of the correct sequence types from MLST schemes in cases where the schemes contained blank characters. This issue affected Workbenches with the CLC MLST or QIAGEN CLC Microbial Genomics Module installed.
QIAGEN CLC Genomics Workbench 9.5.4
Improvements
- In cases where tools within a workflow have been renamed, it is now possible to filter for original tool names within the workflow configuration view of the workflow editing tool.
Bug fixes
- A timeout that caused jobs to fail after 24 hours, introduced as part of optimizations to run on multiple threads in QIAGEN CLC Genomics Workbench 9.5, has been extended to 7 weeks. The tools affected by this change are Annotate from Known Variants, Filter against Known Variants, Filter against Control Reads, Annotate with Exon Number, Annotate with Flanking Sequences, Filter Marginal Variant Calls, Compare Sample Variant Tracks, Trio Analysis, GO enrichment Analysis, Amino Acid Changes, Annotate with Conservation Score, Predict Splice Site Effect, Link Variants to 3D Protein Structure, Merge Annotation Tracks, Create Statistics for Target Regions, Fisher Exact Test, Annotate with Overlap Information, Filter Based on Overlap, Filter Reference Variants, Identify Candidate Variants, Coverage Analysis, and InDels and Structural Variants.
- Fixed an issue in the RNA-Seq Analysis tool where running in EM mode, with a "Strand specific" setting of "Forward" or "Reverse" would result in the second read of a pair mapped as a broken pair being counted incorrectly if that read was mapped outside a region annotated as a transcript.
- Fixed an issue where an error arose when using the RNA-Seq Analysis tool with the EM option and a strand specific setting of "Forward" or "Reverse" in cases where the second read of a mapped broken pair mapped to the opposite strand of the strand specific setting.
- Fixed an issue with the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools where the forward and/or reverse count for a longer variant, supported by paired reads with both children having the same direction, could be too low. The forward count and reverse count are now reported correctly.
- Fixed an issue with the InDels and Structural Variants tool where an incorrect insertion could be called when the optimal alignment of a read's unaligned end around the breakpoint included a gap in the insertion sequence.
- Fixed an issue in the InDels and Structural Variants tool that could occasionally terminate the analysis of large read mappings prematurely.
- Fixed an issue with the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools where the count and read count could be reported as marginally higher than they actually were in a small minority of cases. For the affected variants, this could then also result in variant frequencies being reported that were slightly higher than they should have been, in some cases above 100%. Variants affected by this issue are a small subset of variants where the variant affected overlapped another potential variant and where only the affected variant was then reported. This change could lead to a small decrease in the number of variants reported compared to earlier versions of the CLC software, due to a variant no longer passing the count or read count filtering constraints. The impact of this change is expected to be low. For example, in our tests, for a particular analysis that reported 250,000 variants, 30 fewer were reported with the same parameters and filters applied after this fix was implemented.
- Fixed an issue where the Basic Variant Detection, Fixed Ploidy Variant Detection, Low Frequency Variant Detection and Local Realignment tools could fail if a deletion was encountered at the end of a match between a read and the reference in the mapping used as input.
- Fixed an issue in the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools where the tools could stop with an error. The problem arose when a read split up within a mapping (e.g. to map to separate exons) was split into 4 or more parts, and at least 4 of those parts would map within a region of adjacent variants being considered as a possible multiple nucleotide variant (MNV). This infrequent problem was most likely to occur when using high coverage RNA-Seq mappings and looking for variants occurring at low frequency. It was introduced in the previous bugfix release of the QIAGEN CLC Genomics Workbench, version 9.5.3.
Advanced notice
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools will be removed from the Server and Workbenches in March 2017.
- Support for some older operating systems (OS), listed below, will be discontinued in March 2017. Software released at that time and later may still run without issue, but problems experienced due to using an unsupported OS will not be addressed. If you are concerned about the proposed change, please contact our Support team (AdvancedGenomicsSupport@qiagen.com), letting them know the OS being used and the products you are running on that OS.
- Windows: Windows Vista and Windows Server 2008
- Mac: Mac OS X 10.7 and 10.8
- Linux: Red Hat Enterprise Linux 5, SUSE Linux Enterprise Server 10 and 11 and Fedora 6 through 21
QIAGEN CLC Genomics Workbench 9.5.3
Improvements
- Server import and export locations shown in Workbench wizards now have tooltips giving the path to those locations.
- Various other minor improvements (e.g. improved tooltips)
Bug fixes
For the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools, the following have been addressed:
- Fixed an issue where the coverage of a longer variant that contained another variant was reported for both the longer variant and the contained variant. The coverage for the contained variant is now reported correctly.
- Fixed an issue affecting coverage calculation for SNVs without immediately adjacent variants when using paired read data: if the second read of a pair containing the variant did not meet the requirements of the quality filter, neither the first nor second read of that pair contributed to the coverage calculated for the variant.
- Fixed an issue where, for an SNV without immediately adjacent variants, overlapping reads of a pair that had conflicting base calls for that variant position contributed to the values calculated for coverage, read coverage, and read count of that variant.
- Fixed a bug where count, read count, and forward- and reverse read count could be incorrect for variants found in overlapping regions of a pair of reads and where the variant was originally identified as being adjacent to one or more other variants.
The above issues, including information on the products affected, are described on the public notification page: Coverage and count reporting for variants in certain circumstances are incorrect
For the Identify Known Mutations from Sample Mappings tool, the following issues have been addressed:
- Fixed an issue with the Identify Known Mutations from Sample Mappings tool where reads in a sample mapping were not identified as supporting the presence of a known variant in cases where the first position of the variant region in the mapped read contained a gap.
- Fixed an issue with the Identify Known Mutations from Sample Mappings tool where a read containing a variant longer than a known variant being tested for was counted as supporting the known variant in cases where the first part of the read’s variant sequence is identical to that of the known variant.
- Fixed an issue in the Identify Known Mutations from Sample Mappings tool where overlapping reads of a pair having conflicting base calls for a variant position could contribute to the coverage calculated for that variant.
Advanced notice
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools will be removed from the Server and Workbenches in early 2017.
- The Expression Profiling by Tags tools (Extract and Count Tags, Create Virtual Tag List, and Annotate Tag Experiment) are scheduled to be removed from the Server and Workbench in spring, 2017.
- Support for some older operating systems (OS), listed below, will be discontinued in early 2017. Software released at that time and later may still run without issue, but problems experienced due to using an unsupported OS will not be addressed. If you are concerned about the proposed change, please contact our Support team (AdvancedGenomicsSupport@qiagen.com), letting them know the OS being used and the products you are running on that OS.
- Windows: Windows Vista and Windows Server 2008
- Mac: Mac OS X 10.7 and 10.8
- Linux: Red Hat Enterprise Linux 5, SUSE Linux Enterprise Server 10 and 11 and Fedora 6 through 21
QIAGEN CLC Genomics Workbench 9.5.2
Improvements
- SRA download functionality has been updated to support the upcoming NCBI transition to HTTPS.
- Updated the restriction enzyme list from REBASE.
Bug fixes
- Fixed a bug where running two or more concurrent instances of RNA-Seq Analysis with EM quantification could in some cases lead to incorrect results or error messages.
- Fixed an issue with running BLAST on macOS Sierra.
- Updated PFAM links reported by the Pfam Domain Search tool.
- Fixed an issue introduced in QIAGEN CLC Genomics Workbench 9.5 where enzymes listed alphabetically after RdeGBI were missing methylation information.
- Various minor bugfixes.
Advance Notice
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools will be removed from the Server and Workbenches in early 2017.
- The Expression Profiling by Tags tools (Extract and Count Tags, Create Virtual Tag List, and Annotate Tag Experiment) are scheduled to be removed from the Server and Workbench in spring, 2017.
- Support for some older operating systems (OS), listed below, will be discontinued in early 2017. Software released at that time and later may still run without issue, but problems experienced due to using an unsupported OS will not be addressed. If you are concerned about the proposed change, please contact our Support team (AdvancedGenomicsSupport@qiagen.com), letting them know the OS being used and the products you are running on that OS.
- Windows: Windows Vista and Windows Server 2008
- Mac: Mac OS X 10.7 and 10.8
- Linux: Red Hat Enterprise Linux 5, SUSE Linux Enterprise Server 10 and 11 and Fedora 6 through 21
QIAGEN CLC Genomics Workbench 9.5.1
Bug fixes
- Fixed a serious issue that could arise when using Import | Illumina or Import | Ion Torrent to import gzip or bzip2 compressed files using QIAGEN CLC Genomics Workbench 9.5.
- Fixed a problem where launching a tool from the Quick Launch window after sorting led to the wrong tool being started.
QIAGEN CLC Genomics Workbench 9.5
New features and improvements
New tools
- "Search for Reads in SRA..." allows search and download of reads from the SRA database. It is available from the Download button.
- A new GFF3 importer is available as an option in the Import -> Tracks tool.
- "Identify Known Mutations from Sample Mappings" can be used to look up known genomic variants in read mappings. Available from the "Resequencing Analysis" tool folder.
- "Identify Candidate Variants" can be used to identify and extract variants that fulfill certain criteria. Available from the "Annotate and Filter Variants" tool folder.
- A new option "Use EM estimation (recommended)" was added to the RNA-Seq Analysis tool. This enables the use of an expectation-maximization algorithm to distribute ambiguous reads between isoforms/genes.
Improvements
Resequencing
- The Local Realignment tool has a new option that can allow the use of guidance variants longer than 100bp.
- The InDels and Structural Variants tool now offers the option to include reads mapped as broken pairs in the analysis.
- The InDels and Structural Variants tool now offers the option for consensus calculation to ignore reads if their relative coverage or quality scores are too low.
- The COSMIC importer has been updated to support the latest version of the COSMIC database, release v77.
- Improved performance of a number of tools when run on systems with multiple cores: Annotate from Known Variants, Filter against Known Variants, Filter against Control Reads, Annotate with Exon Number, Annotate with Flanking Sequences, Filter Marginal Variant Calls, Compare Sample Variant Tracks, Trio Analysis, GO enrichment Analysis, Amino Acid Changes, Annotate with Conservation Score, Predict Splice Site Effect, Link Variants to 3D Protein Structure, Merge Annotation Tracks, Create Statistics for Target Regions, Fisher Exact Test, Annotate with Overlap Information, Filter Based on Overlap, Identify Candidate Variants, and Filter Reference Variants.
RNA-Seq
- The tools "Filter Based On Overlap" and "Annotate with Overlap Information" now work with the Statistical Comparison Tracks produced by the Advanced RNA-Seq plugin.
- It is now possible to export expression tracks in BED format. The expression value will be exported as the score.
- Expression tracks are now colored according to a log-scale to ease visual interpretation of expression data.
- The track view of expression tracks dynamically rescales to make best use of the screen real estate. This change brings expression tracks in line with other track types.
Launch
- The Quick Launch tool is now found under the Toolbox menu instead of the View menu and a button called Launch that brings up this tool has been added to the toolbar.
- Analyses can now be launched on data elements listed in the results table of a Local Search by selecting the elements of interest, right clicking with the mouse and navigating through the context menu that appears.
Workflows
- Workflow outputs can now be configured so that subfolders to contain the outputs are created.
- New placeholders are available when defining the names of workflow outputs: {user}, {host}, and for elements of the timestamp of the output object, {year}, {month}, {day}, {hour}, {minute}, {second}.
- Placeholders within workflow output names that were previously available only as digits can now be specified using written names: {name} is a synonym for {1} and {input} is a synonym for {2}.
- When using the {2} placeholder for custom naming in workflow output elements, only unlocked inputs will be included in the generated name.
- In the generated pdf showing all the configured parameters of a workflow, entries for parameters connected to a tool or an input element now list the names of the defining elements. Previously the parameter listings for such elements were left blank.
- The History view of data elements created using a workflow now includes information about the workflow that created them.
- Where a tool name has been altered in a workflow, the original name is now included alongside the changed name when exporting workflow parameters.
- The order of the tools in the workflow "Add Element" menu now matches the order in the Workbench Toolbox menu.
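The placeholder rules for workflow output names described above can be illustrated with a minimal Python sketch. This is not how the Workbench implements naming: the function `expand_output_name`, the `(input_name, is_locked)` pair representation, the "_" separator used to join several input names, and the zero-padded timestamp fields are all assumptions made for illustration.

```python
import re
from datetime import datetime


def expand_output_name(template, output_name, inputs, user, host, timestamp):
    """Expand placeholders in a workflow output name template.

    Hypothetical helper for illustration only; not a CLC API.
    inputs is a list of (input_name, is_locked) pairs.
    """
    # Only unlocked inputs contribute to {2} / {input}.
    # Joining several names with "_" is an assumption of this sketch.
    unlocked = "_".join(name for name, is_locked in inputs if not is_locked)
    values = {
        "1": output_name, "name": output_name,
        "2": unlocked, "input": unlocked,
        "user": user, "host": host,
        # Zero-padded timestamp fields are an assumption of this sketch.
        "year": f"{timestamp.year:04d}",
        "month": f"{timestamp.month:02d}",
        "day": f"{timestamp.day:02d}",
        "hour": f"{timestamp.hour:02d}",
        "minute": f"{timestamp.minute:02d}",
        "second": f"{timestamp.second:02d}",
    }
    # Replace known placeholders in one pass; unknown ones are left untouched.
    return re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), template)


example = expand_output_name(
    "{name}_{input}_{user}_{year}-{month}-{day}",
    "Variants",
    [("sampleA", False), ("hg38", True)],  # hg38 is locked, so it is skipped
    "alice",
    "node1",
    datetime(2016, 9, 30, 8, 5, 7),
)
print(example)  # Variants_sampleA_alice_2016-09-30
```

Because {name} and {1}, and {input} and {2}, map to the same values, templates written with either form expand identically in this sketch.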
Metadata
- A "Remove Association(s)" option for removing metadata associations from selected data elements has been added to the right-click context menu in the Metadata Elements view.
- In the Metadata Find Associated Data view it is now also possible to use Find in Navigation Area when multiple rows are selected.
- When importing metadata from a spreadsheet with formulas in it, the result of the evaluation of the formula (as displayed in Excel) is now imported rather than the formula itself.
- Improved workflow validation to aid the user in identifying inputs that will be ignored due to the configuration of the workflow elements.
General
- The Trim Sequences tool under Toolbox | NGS Core Tools now handles ambiguity codes in the adapter/primer sequences.
- The upper limit for the "discard reads above length" option of the Trim Sequences tool under Toolbox | NGS Core Tools has been raised from 8000 to 99,999.
- The Identify Graph Thresholds tool can now be configured to work on specified regions only.
- A new option in the Sample Reads tool makes it possible to choose whether sampling should be deterministic or random.
- The "Sort folder" tool now uses numerical sorting for filenames prefixed with a number.
- New placeholders are available when defining the names of exporter outputs: {user}, {host}, and for elements of the timestamp of the output object, {year}, {month}, {day}, {hour}, {minute}, {second}.
- Placeholders within export output names that were previously available only as digits can now be specified using written names: {input} is a synonym for {1}, {extension} is a synonym for {2} and {counter} is a synonym for {3}.
- The Identify Graph Thresholds tool can now be run using only a lower or upper threshold limit, rather than having to specify both.
- The Extract Consensus Sequence tool now outputs a sequence list for all results. Previously, when running this tool directly, if the result was a single sequence, it would output a sequence, not a sequence list. (Nothing has changed when this tool is run as part of a workflow, where sequence lists were always generated.)
- The Extract Consensus Sequence tool no longer displays a message bubble after each contig has been processed.
- The Download Reference Genome Data manager now shows version numbers for several genome annotations.
- The list of enzymes pre-installed in the workbench has been updated from REBASE.
- Read group details are now shown on the Element Info view of sequence lists.
- The "Unknown" feature category can now be hidden in tree editors.
- The option "is not in list" has been introduced as a new table filtering option.
- All NCBI server communication is now encrypted. (NCBI will be moving all web services to the HTTPS protocol on September 30, 2016).
- GenBank import now also allows for file names with 'GBFF' extension.
- Standard deviations in reports are now being calculated with a different algorithm than previously. This will have no noticeable effect in the overwhelming majority of cases.
- The History view of variant, feature and expression tracks that are created by pressing the "Create Track from Selection" button on the table view of an existing track will now include details of any advanced filtering that was applied to the table at the time the new track was made.
- Three prime UTR and five prime UTR tracks from the "Download Reference Genome Data" tool are now named with distinct suffixes. Previously the track names ended with "_utr" and "_utr-1".
- Improved the progress reporting for the import of large, gzip compressed Illumina and Ion-Torrent files.
- General speed and usability improvements.
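The difference between plain alphabetical sorting and the numerical sorting now used by the "Sort folder" tool for names prefixed with a number can be sketched in Python as follows. This is only an illustration: the function `numeric_prefix_key` is hypothetical, and placing names without a numeric prefix after the numbered ones is an assumption, not documented Workbench behavior.

```python
import re


def numeric_prefix_key(filename):
    """Sort key that compares a leading number by value, not by characters.

    Hypothetical helper for illustration only; not a CLC API.
    """
    match = re.match(r"\d+", filename)
    if match:
        # Names with a numeric prefix are ordered by that number first.
        return (0, int(match.group()), filename)
    # Assumption of this sketch: names without a prefix come afterwards.
    return (1, 0, filename)


names = ["10_sample", "2_sample", "1_sample", "control"]

alphabetical = sorted(names)
numerical = sorted(names, key=numeric_prefix_key)

print(alphabetical)  # ['10_sample', '1_sample', '2_sample', 'control']
print(numerical)     # ['1_sample', '2_sample', '10_sample', 'control']
```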
Changes
Retirements
- The Expression Profiling by Tags folder has been moved to the Legacy Tools section of the toolbox.
Bug fixes
- Fixed an issue with the tools "Extract from Selection" and "Extract Reads Based on Overlap" so that they now correctly extract mapped reads that extend over the (arbitrarily chosen) ends of the 1D representation of a circular genome.
- Fixed an issue where the Motif Search tool was incorrectly reporting all match accuracies as either 0% or 100%.
- Fixed an issue that caused characters in sequence names to be rendered incorrectly when a report was exported to Excel.
- The Download Reference Genome Data manager will now always refer to the latest version of the Ensembl GRCh38 Genome. It was previously locked to Ensembl version 82.
- Fixed an issue where, when searching for both read1 and read2 in a broken pair, the "Find Broken Pair Mates" tool reported that the mate of read1 was itself. The tool now correctly shows that the mate of read1 is read2.
- The Create New Sequence List button in a broken pair table now works when multiple read groups are involved.
- Fixed an issue that would make the wizard for the Demultiplex Reads tool fail if an invalid bar code was entered.
- Fixed a bug that made graphics export of some plots from reports fail.
- Fixed a rarely occurring bug where rendering of a Read Mapping in a Track List would fail.
- The Find Binding Sites and Create Fragments tools now properly display mismatches when the primer input is in lower-case.
- Fixed an issue where, when viewing statistical comparison tracks together with read mapping tracks, the statistical comparison track annotations could sometimes be rendered as offset from their true genomic positions.
- Fixed a memory leak in the Extract Consensus Sequence tool.
- Fixed an issue where sequences of length zero would cause the Create BLAST Database tool to throw an error. Such sequences are now skipped and will not be included in the final database.
- The Illumina High-Throughput Sequencing Import tool now correctly warns that zip files with multiple entries are not supported.
- Tables with more than 126 million entries now show a warning that they contain too much data to display instead of leaving the table empty.
- Fixed an issue in the wizard for the Create Entry Clone tool where previously used data located on an unmounted location would result in an error message being shown.
- Fixed an issue where right-clicking on a graph in a report and choosing to show "Report", "History" or "Element Info" triggered an error.
- Fixed an issue where it took a long time to open a workbench if it had previously been closed while displaying an open table editor that had been sorted.
- Fixed a bug in the "Manage Enzymes" wizard that prevented a user from cancelling the action if "Save as new enzyme list" was enabled.
- Fixed a rare issue where, after the right-click context menu option "Delete selection" was used, some annotations could go missing from sequences that had more than 1000 annotations of a given type before the deletion.
- Fixed an issue with links in tables to the PDB and dbEST databases.
- Fixed a rare issue that caused the "Extract from selection" right-click option not to work for a reads track in a track list.
- Fixed a bug where exporting to Wiggle on systems with specific system locales would produce files that could not be re-imported.
- Fixed an issue with the Import Metadata tool where, if a spreadsheet had already been loaded, then selecting the same spreadsheet again did not reload the spreadsheet content.
- Fixed an issue affecting the launching of workflows with multiple inputs in batch, where the workflow execution wizard did not update correctly when another metadata spreadsheet was selected.
- Fixed an issue where, when the Processes tab was hidden and then shown again, any processes listed before the tab was hidden were no longer shown.
- Fixed an issue where the save wizard dialog did not pre-select "Save in input folder" when that option was the most recently used one.
- Fixed an issue that could arise when migrating a workflow containing Create Sequencing QC Report where the workflow was originally created using the QIAGEN CLC Genomics Workbench 6.5 or older.
- Fixed an issue that made "Download and save" fail when invoked on a Blast editor.
- Updated the URL to use for links to UniProt databases.
- Various minor bugfixes.
Notice
From now on, only 64 bit versions of the QIAGEN CLC Genomics Server, QIAGEN CLC Genomics Workbench, Biomedical Genomics Workbench, CLC Bioinformatics Database and QIAGEN CLC Assembly Cell will be made available. 32 bit versions of these are discontinued.
Advance Notice
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools are scheduled to be removed from the Server and Workbench in spring, 2017.
- The Expression Profiling by Tags tools (Extract and Count Tags, Create Virtual Tag List, and Annotate Tag Experiment) are scheduled to be removed from the Server and Workbench in spring, 2017.
QIAGEN CLC Genomics Workbench 9.0.1
Bug fixes
- Fixed an issue with the RNA-Seq Analysis tool that could arise when the "Genomes annotated with genes and transcripts" option was chosen: If two or more genes had the same name, and a transcript could be assigned to each from the mRNA track, then the value in the "Transcripts annotated" column in the GE track and in the TE track was 0. Furthermore, all counts for such genes were reported as zero, even when there were reads mapping to them.
- Fixed an issue that arose when executing workflows with multiple inputs in batch, where changes to pre-defined, fixed inputs specified during the launch process were not applied.
- Fixed an exception in the Read Mapping Editor that could arise when working with mappings to circular references.
- Fixed an issue where the Motif Search tool incorrectly reported all match accuracies as either 0% or 100%.
- Fixed an issue where sorting a folder while saving into it could trigger an error.
- Fixed a bug in the batch mode dialog that would lead to an error when problems related to the underlying file or data location were encountered.
- Fixed broken help link in Ion Torrent importer.
Advance notice
- From the autumn 2016 release, only 64 bit versions of the QIAGEN CLC Genomics Server, QIAGEN CLC Genomics Workbench, Biomedical Genomics Workbench, CLC Bioinformatics Database and QIAGEN CLC Assembly Cell will be made available. 32 bit versions of these will be discontinued from that time.
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools will be removed from the Server and Workbenches in early 2017.
QIAGEN CLC Genomics Workbench 9.0
New tools
Import Metadata - basic and easy metadata import. This tool supplements the tools available in the Metadata Table Editor.
Improvements
Workflow
- Workflow inputs can now be ordered via the Workflow Editor, affecting the order that input information is requested when setting up a Workflow run.
- Workflows with multiple input elements, where all input elements will be changed per batch, can now be launched in batch by right-clicking on the installed workflow name and choosing the option "Run in Batch Mode...".
- Tools in a Workflow that have been renamed will have both the new tool name and the original tool name displayed in the Workflow Configuration Editor.
- Made it possible to select files located on a CLC Server when using exporters in the workflow configuration editor of a CLC Workbench.
RNA-seq
- The RNA-Seq Analysis tool now computes Transcripts Per Million (TPM) values, which appear as an additional column in expression tracks.
- Faster analysis of multiple samples in the RNA-Seq Analysis tool due to caching of reference index files.
- Performance improvements for Expression Tracks in RNA-Seq.
- Expression track table views now have two new buttons for selecting or copying genes/transcripts names.
- Expression tracks now contain links to external databases when available.
- Transcript level expression tracks now contain the gene name for each transcript.
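As background for the TPM column mentioned above, the commonly used definition of Transcripts Per Million can be sketched in Python. This is the textbook formula, not necessarily the exact computation performed by the RNA-Seq Analysis tool (which may, for example, treat transcript lengths or ambiguous reads differently); the function name and input dictionaries are hypothetical.

```python
def transcripts_per_million(read_counts, lengths_bp):
    """Compute TPM from per-transcript read counts and lengths in bp.

    Standard definition, shown for illustration only; the Workbench's
    own calculation may differ in detail.
    """
    # Reads per base: normalizes each count by transcript length.
    rates = {t: read_counts[t] / lengths_bp[t] for t in read_counts}
    total = sum(rates.values())
    if total == 0:
        # No reads at all: report zero for every transcript.
        return {t: 0.0 for t in rates}
    # Scale so that the values across all transcripts sum to one million.
    return {t: rate / total * 1_000_000 for t, rate in rates.items()}


counts = {"tx_a": 100, "tx_b": 300}
lengths = {"tx_a": 1000, "tx_b": 1000}
tpm = transcripts_per_million(counts, lengths)
print(tpm)
```

With equal transcript lengths, as in this example, the TPM values are proportional to the read counts; with unequal lengths, longer transcripts are down-weighted.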
Mapping
- Match score can now be specified in the Map Reads to Reference and Map Reads to Contigs tools.
- Map Reads to Reference now outputs an empty read mapping and report when nothing mapped, and empty unmapped reads if everything mapped.
- Fixed threads being leaked in Map Reads to Reference when caching of indexed reference sequences was used.
Track
- The tool Create Mapping Graph can now create a coverage graph over the start positions of reads in a read mapping.
- Improved error messaging when trying to import malformed fasta files into tracks.
Metadata
- The use of partial or exact matching schemes can be chosen when associating data with metadata using the Associate Data Automatically option.
- It is now possible to change the type of a metadata column, even if it already contains values. Conversion is only possible when all existing values in the given column can be converted to the new data type.
- Usability enhancements in the Metadata Table Editor.
General
- Fixed an issue with the VCF exporter that resulted in inconsistent information being output to the exported VCF. The metadata field "##reference" now contains a human-readable string representation identifying the reference genome the exported variants are based on. The metadata field "##fileOrigin" was added to contain a human-readable string representation identifying the exported variant track.
- Performance optimization for sizing phylogenetic trees by metadata.
- The 3D Protein Structure Database has been updated. Please use the Download 3D Protein Structure Database tool to work with the latest version.
- The Download Pfam Database tool has been updated to download version 29.
- Improvements to the way Ensembl IDs are parsed to links in tables: stable Ensembl IDs are now correctly parsed to links for all Ensembl-supported organisms (Ensembl release 83).
- Substantial speed improvements to BAM export.
- The options for saving the output from "batch jobs" have been improved. Outputs can now be saved into a specified single folder in addition to the other established save options.
- All Excel sheets in a document are now imported and each sheet has a table created for its contents.
- The CSV, HTML and Excel table/tabular exporters now use "Inf" and "NaN" values to replace the ambiguous "?".
- In the wizard for exporting a table in CSV format, when not exporting all columns, it is now possible to cancel or go back to the previous step while selected columns are loading.
- SAM records with CIGAR strings with no aligned residues can now be handled when importing SAM/BAM files.
- An option has been added to allow the same print settings to be applied to all reports being exported to pdf format in a given export run.
- GFF Track Import now supports spaces in annotation names.
- The "Manage Resources" tab has been removed from the Plugin Manager.
Changes
Workflow related
- The Create Scatter Plot tool is now Workflow enabled.
- The Create MA Plot tool is now Workflow enabled.
General
- The naming rules for the outputs of several tools have been changed to align with those applied by most other tools. The tools affected by these changes are: Local Realignment, Low Frequency Variant Detection, Fixed Ploidy Variant Detection, Basic Variant Detection as well as the legacy variant detection tools: Probabilistic Variant Detection and Quality-based Variant Detection.
- The BaseQRankSum value for variants is now negative to indicate that the qualities for the variant are below those for the reference allele. The BaseQRankSum is now calculated as a positive value when the qualities for the variant are above those for the reference allele.
- Export to clc format now truncates very long filenames.
- Versions of individual tools are now reported in the history of output objects.
- The default view of the expression tracks has been changed: the table view opens first by default, and some columns are hidden, to simplify the view.
- For the NGS importers, the paired reads minimum and maximum default interval has been updated to 1 - 1000.
- Plots without any data points will now be skipped when rendering reports.
- The annotations "Known variation", "Validated by other experiment", "Ancestral allele", and "Phenotype related", created by variant track import are not used and have therefore been removed from variant tracks.
- The Detailed Mapping Report statistics table now shows previously missing values for regions with partial coverage. For fully covered regions these values cannot be calculated, and empty strings are replaced with coverage minimum, average and standard deviation. Numeric sorting is retained by inserting NaN values instead of empty strings, where calculations cannot be made.
- RPM package installers for Linux are no longer available.
- Associate Data Automatically accepts data elements (not folders) as input.
- The 'Database Fields' label shown in the 'Show Element Info' view has been renamed to 'Local Attribute Fields'.
- The "Metadata Role Override" parameter that was visible when creating Workflows has been removed.
- The user can no longer uncheck "export all columns" for input objects that do not support this option. This applies to command line functionality as well, where the user will now receive an error if this is attempted.
Retirements
- The ChIP-Seq Analysis (legacy) tool has been retired.
Bug fixes
- Fixed a bug where the download buttons on the BLAST result table view failed for nucleotide sequences.
- Fixed an error when running merge overlapping pairs on extremely short reads.
- BED Export: when exporting block list entries (such as connected exons from mRNA tracks), positions were absolute, but are now relative to the 'chromStart' position.
- Fixed a frame offset bug that occurred when translating reverse complemented CDS regions into protein sequences.
- In heat maps it is now possible again to show the color legend to the left and right of the heat map.
- Fixed an off-by-one error for read start positions in the 'Find Broken Pair Mates...' output table.
- Fixed a bug that caused the Excel importer to use column names as cell values of the first row.
- Fixed an issue where open tabs were not correctly ordered after splitting view horizontally or vertically using the View menu or keyboard shortcuts.
- Fixed an issue where an error was reported if the local realignment tool detected an insertion followed by a deletion in the original mapping. Such positions are now ignored.
- Fixed an issue where Workflows were not able to remove intermediate data from permission enabled locations unless the top folder was writable.
- Fixed a bug that prevented the output from certain tools from being used as input in the "References" channel of the Map Reads to Reference tool when used in workflows.
- Fixed an issue where the "Show results" option in the Processes tab would lead to an error if the results dataset was very large.
- Fixed an issue so that double clicking on clc:// urls on Mac OS X now opens the data element in a view in an installed CLC Workbench.
- Fixed a bug where the Reference Data Manager failed to open when the CLC_References folder was located on a resource (e.g. an external disk) that was not available at the time (e.g. because the disk was unplugged or disconnected).
- Fixed an issue where an error arose when renewing a borrowed network license.
- Fixed a bug that led to the creation of an empty folder for each excluded batch unit.
- Fixed a bug that led to the inclusion of the number of excluded batch units in the count of the total number of batch units to be processed.
- Added missing percentage signs for identities and gaps in Blast text exports.
- A Workbench Data Location pointing at a file on the system instead of a folder will now appear as unavailable in the Workbench Navigation area instead of throwing an error.
- Fixed an issue where, when the InDels and Structural Variants tool was added to a workflow, the "P-value Threshold" parameter did not show up in the Select settings wizard step under "Significance of unaligned ends breakpoints".
- Fixed a bug where it was possible to type non-number characters into a number field when starting up a job in the Workbench.
- An error was previously thrown when encountering blank annotation-values. Blank values are now ignored.
- Fixed an error that could appear when moving the mouse over annotations in a sequence annotation table.
- Fixed an issue with Open Copy of Workflow so it now works on all workspaces rather than just the first workspace.
- Fixed an issue that could lead to an error when a job status description changed while a full description was being generated.
- Fixed an issue with handling dates when importing metadata from Excel format files using the Metadata Table Editor.
- Fixed a bug that was causing missing report text lines.
- Fixed an issue where, when the option to "Skip these updates" was checked in the plugin update information window, this information was not saved. This led to the same plugin update information being presented after each Workbench restart if the plugins were not updated.
- The "Extract and Count" tool in Small RNA analysis now only accepts sequences and sequence lists. Previously, it incorrectly accepted standalone read mappings or small RNA samples as well.
- Fixed an error that occurred when pressing the Print button in the Help dialog (Mac OS X only).
- Fixed an issue where the text area in error dialogs did not expand vertically when the dialog was expanded.
- Fixed an issue where sub-jobs were not resumed after pausing and resuming a batch process.
- Fixed an issue where the workflow installer creation keyboard shortcut could be used when it should have been disabled.
- Fixed a rare issue that could be triggered by switching editor view with a double click.
- Fixed an issue that caused the 'Use random codon' parameter in the tool "Reverse Translate" to report a null-error.
- A bug was fixed where no BaseQRankSum was calculated for insertions of length 1.
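The BED Export fix in the list above concerns the block fields of the BED format, where block start positions are given relative to chromStart rather than as absolute genomic positions. A minimal Python sketch of that convention follows; the function `bed12_blocks` and the exon coordinates are hypothetical and are not taken from the Workbench.

```python
def bed12_blocks(exons):
    """Derive BED block fields from exon intervals.

    exons: list of (start, end) pairs in 0-based, half-open genomic
    coordinates. Hypothetical helper for illustration only.
    """
    exons = sorted(exons)
    chrom_start = exons[0][0]
    chrom_end = exons[-1][1]
    # Block sizes are simply the exon lengths.
    block_sizes = [end - start for start, end in exons]
    # Block starts are offsets from chromStart, so the first is always 0.
    block_starts = [start - chrom_start for start, _ in exons]
    return chrom_start, chrom_end, len(exons), block_sizes, block_starts


exons = [(1000, 1100), (1500, 1600), (2000, 2300)]
chrom_start, chrom_end, block_count, block_sizes, block_starts = bed12_blocks(exons)

print(chrom_start, chrom_end, block_count)    # 1000 2300 3
print(",".join(str(s) for s in block_sizes))   # 100,100,300
print(",".join(str(s) for s in block_starts))  # 0,500,1000
```

An export that wrote absolute positions would have produced 1000,1500,2000 in the last field, which BED readers would interpret as offsets from chromStart and so place the blocks outside the feature.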
Plugin updates and fixes
All plugins need to be installed in the new Workbench for compatibility reasons.
Changes to freely available plugins
- A new RNA-Seq analysis plugin is now available: Advanced RNA-Seq
- Ingenuity Pathway Analysis: Expanded with a new tool supporting Statistical Comparisons and a Ready-to-Use Workflow for statistical analysis, visualization, and upload to QIAGEN IPA.
- Annotate with GFF: Now supports spaces in annotation names.
- Batch rename: Fixed an issue where a warning was displayed for entries not modified.
- The RNA-Seq Legacy plugin has been retired.
Compatibility
- This release can be used with QIAGEN CLC Genomics Server 8.0.
Advance notice
From the autumn 2016 release, only 64 bit versions of the QIAGEN CLC Genomics Server, QIAGEN CLC Genomics Workbench, Biomedical Genomics Workbench, CLC Bioinformatics Database and QIAGEN CLC Assembly Cell will be made available. 32 bit versions of these will be discontinued from that time.
QIAGEN CLC Genomics Workbench 8.5.4
All changes in this release have also been fixed on the QIAGEN CLC Genomics Workbench 10.x and 9.5.x lines at time of writing, with the exception of the one release note marked with an asterisk. That issue was fixed for QIAGEN CLC Genomics Workbench 10.0 and will be fixed in a future release from the QIAGEN CLC Genomics Workbench 9.5.x line.
Improvements
- All NCBI server communication is now encrypted (uses HTTPS).
- Updated the URL to use for links to UniProt databases.
- Updated BLAST executables to be compatible with macOS Sierra. This change only affects Mac users.
Bug fixes
- For the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools:
- Fixed an issue where the count and read count could be reported as marginally higher than they actually were in a small minority of cases. For the affected variants, this could then also result in variant frequencies being reported that were slightly higher than they should have been, in some cases above 100%. Variants affected by this issue are a small subset of variants where the variant affected overlapped another potential variant and where only the affected variant was then reported. This change could lead to a small decrease in the number of variants reported compared to earlier versions of the CLC software, due to a variant no longer passing the count or read count filtering constraints. The impact of this change is expected to be low. For example, in our tests, for a particular analysis that reported 250,000 variants, 30 fewer were reported with the same parameters and filters applied after this fix was implemented.
- Fixed an issue where the coverage of a longer variant that contained another variant was reported for both the longer variant and the contained variant. The coverage for the contained variant is now reported correctly.
- Fixed a bug where count, read count, and forward- and reverse read count could be incorrect for variants found in overlapping regions of a pair of reads and where the variant was originally identified as being adjacent to one or more other variants.
- Fixed an issue affecting coverage calculation for SNVs without immediately adjacent variants when using paired read data: if the second read of a pair containing the variant did not meet the requirements of the quality filter, neither the first nor second read of that pair contributed to the coverage calculated for the variant.
- Fixed an issue where for a SNV without immediate neighboring variants, overlapping reads of a pair that had conflicting base calls for that variant position contributed to the values calculated for coverage, read coverage, and read count of that variant.
- Fixed an issue where the forward and/or reverse count for a longer variant, supported by paired reads with both children having the same direction, could be too low. The forward count and reverse count are now reported correctly.
- Fixed an issue with the InDels and Structural Variants tool where an incorrect insertion could be called when the optimal alignment of a read's unaligned end around the breakpoint included a gap in the insertion sequence.
- Fixed an issue where, when searching for both read1 and read2 in a broken pair, the Find Broken Pair Mates tool reported that the mate of read1 was itself. The tool now correctly shows that the mate of read1 is read2.
- Fixed a rare issue where some annotations could go missing when the right-click context menu option "Delete selection" was used on a sequence that had more than 1000 annotations of a given type before the deletion.
- Fixed a bug in the Manage Enzymes wizard that prevented a user from cancelling the action if "Save as new enzyme list" was enabled.
- Fixed a problem with the identification of the correct sequence types from MLST schemes in cases where the schemes contained blank characters. This issue affected Workbenches with the CLC MLST Module or QIAGEN CLC Microbial Genomics Module installed.
QIAGEN CLC Genomics Workbench 8.5.3
Bug fixes
- Fixed an issue with the RNA-Seq Analysis tool that could arise when the "Genomes annotated with genes and transcripts" option was chosen: If two or more genes had the same name, and a transcript could be assigned to each from the mRNA track, then the value in the "Transcripts annotated" column in the GE track and in the TE track was 0. Furthermore, all counts for such genes were reported as zero, even when there were reads mapping to them.
- Fixed an issue where the Motif Search tool incorrectly reported all match accuracies as either 0% or 100%.
- Fixed a bug that made the help for tree side panel settings inaccessible when the workbench was run in limited (evaluation) mode.
Advanced notice
- From the autumn 2016 release, only 64 bit versions of the QIAGEN CLC Genomics Server, QIAGEN CLC Genomics Workbench, Biomedical Genomics Workbench, CLC Bioinformatics Database and QIAGEN CLC Assembly Cell will be made available. 32 bit versions of these will be discontinued from that time.
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools will be removed from the Workbench in early 2017.
- The tools in the Expression Profiling by Tags section of the Toolbox will be removed in early 2017. Tools affected: Extract and Count Tags, Create Virtual Tag List, and Annotate Tag Experiment.
QIAGEN CLC Genomics Workbench 8.5.2
Improvement
- Performance optimization for sizing phylogenetic trees by metadata.
Bug fixes
- Fixed an issue with handling dates when importing metadata from Excel format files using the Metadata Table Editor.
- Create Detailed Mapping Report tool: the detailed mapping report statistics table is now showing previously missing values for regions with partial coverage. For fully covered regions these values cannot be calculated, and empty strings are replaced with coverage minimum, average and standard deviation. Numeric sorting is retained by inserting NaN values instead of empty strings, where calculations cannot be made.
- Fixed an error when running Merge Overlapping Pairs on extremely short reads.
- Fixed a bug that was causing missing report text lines.
- SAM/BAM import now supports SAM records whose CIGAR string leaves no residues for aligning.
- The Download Pfam Database tool has been updated to download version 29.
- Fixed a frame offset bug that occurred when translating reverse complemented CDS regions into protein sequences.
- Fixed an issue where Workflows were not able to remove intermediate data from permission enabled locations unless the top folder was writable.
- The "Metadata Role Override" parameter that was visible when creating Workflows has been removed.
- Fixed an issue that caused the 'Use random codon' parameter in the Reverse Translate tool to report a null-error.
- Fixed threads being leaked in Map Reads to Reference when caching of indexed reference sequences was used.
- Fixed an issue where Map Reads to Reference would, in rare circumstances, report a persistence error.
- When the InDels and Structural Variants tool was added to a workflow, the "P-value Threshold" parameter did not show up in the Select settings wizard step under "Significance of unaligned ends breakpoints". This has been fixed.
- BED Export: when exporting block list entries (such as connected exons from mRNA tracks), positions were absolute. This has been fixed: positions are now relative to the 'chromStart' position.
- Fixed a bug where the download buttons in the BLAST result table view failed for nucleotide sequences.
- Fixed an issue with renewing a borrowed license.
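The BED Export fix in the list above concerns block list entries. In the BED12 layout, the blockStarts field holds offsets relative to chromStart rather than absolute coordinates. A minimal Python sketch of that conversion follows, assuming 0-based, half-open exon intervals. The function name `bed12_blocks` is hypothetical and this is not the Workbench's exporter code.

```python
def bed12_blocks(exons):
    """Convert absolute exon intervals into BED12 block fields.

    exons: list of (start, end) tuples in absolute chromosome
    coordinates, 0-based and half-open. Hypothetical helper used
    only for illustration.
    """
    if not exons:
        raise ValueError("at least one exon is required")
    exons = sorted(exons)
    chrom_start = exons[0][0]
    chrom_end = max(end for _, end in exons)
    # blockSizes are lengths; blockStarts are offsets from chromStart.
    block_sizes = [end - start for start, end in exons]
    block_starts = [start - chrom_start for start, _ in exons]
    return chrom_start, chrom_end, block_sizes, block_starts


start, end, sizes, starts = bed12_blocks([(1000, 1200), (1500, 1650), (2000, 2300)])
print(start, end, sizes, starts)
```

With relative offsets, the first block always starts at 0, which is a quick sanity check for an exported file.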
QIAGEN CLC Genomics Workbench 8.5.1
Bug fixes
- Fixed a bug where the "Search for sequences at NCBI" tool would fail to download nucleotide sequences with the error message "The following sequences were not downloaded correctly: ...".
- Fixed a problem with the BLAST at NCBI step of the Create Protein Report tool.
- Fixed an issue leading to an error during VCF export where the data involved had originally been imported from VCF files and the values in the QUAL field were integers.
- Export of floating-point (decimal) numbers to VCF format was previously dependent on the specified locale. This has been fixed so that the decimal separator is now always a point.
- When doing automatic association of metadata, the log now shows which metadata rows were not associated with any data.
- Fixed a bug that prevented the metadata manual information from being accessed from within the Workbench.
- Fixed a bug where doing automatic association using a metadata table stored on a CLC Server would fail.
- Automatic association of metadata now handles association based on a prefix of data names rather than exact matching to the whole data name.
- A metadata table no longer needs a key column for its rows to be manually associated with data elements.
- An option to override metadata roles previously visible in the configuration of Workflow outputs was removed.
- Fixed an issue that caused locked parameters to be overwritten by a previously entered value during workflow execution.
- Fixed an error happening when a Workbench Data Location was pointing at a file on the system instead of a folder. It will now appear as unavailable in the Workbench Navigation area.
- Enabled tooltips for all parameters when configuring and executing workflows.
- The login process from a Workbench to a CLC Server must now complete before a clc URL is opened.
- Fixed a problem on Macs where the Workbench was not recognized as a custom protocol handler for clc:// URLs.
- Resolved a rarely occurring exception that could be triggered by switching editor view with a double click.
- Fixed a problem where after import of a large volume of data, using the "Show results" option in the process tab resulted in an error.
- Fixed an error that occurred when pressing the Print button in the Help dialog (Mac OS X only).
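The VCF export fix in the list above makes the decimal separator independent of the locale. The Python sketch below illustrates the general idea only. It is not Workbench code, and the helper name `format_vcf_float` and its `digits` parameter are hypothetical. It relies on the fact that Python's built-in `format()` with the `f` presentation type does not consult the process locale.

```python
def format_vcf_float(value, digits=4):
    """Format a float for a VCF field with '.' as the decimal separator.

    format() with the 'f' presentation type ignores the process locale
    (unlike the 'n' type or locale.format_string), so the result is the
    same on every system. Hypothetical helper used for illustration.
    """
    text = format(value, f".{digits}f")
    # Trim trailing zeros, but keep at least one digit after the point.
    text = text.rstrip("0")
    if text.endswith("."):
        text += "0"
    return text


for qual in (0.25, 30.0, 1234.5):
    print(format_vcf_float(qual))
```

Writing numbers this way keeps exported files readable by parsers that expect a point, whatever the regional settings of the exporting machine.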
Changes
- In the output from the Trio Analysis tool, the inheritance option "Accumulative" has been renamed to "Recessive".
QIAGEN CLC Genomics Workbench 8.5
New features and improvements
- The Sequencing QC report now contains the total number of reads in the summary.
- Numerical comparison operators => and <= have been added to the filter tool for tables.
- Quality scores (QUAL) are now calculated and added as annotations for variants. These values are included in VCF exports.
- Batching on selected elements is now possible: it used to be restricted to selected folders.
- The Search for Sequences at NCBI tool now has an option to search the EST database.
- Improved memory management when handling large report elements.
- Improved use of multiple cores when running the Create Detailed Mapping Report.
- Improved use of multiple cores in the InDels and Structural Variants tool.
- The output of the Reverse Complement Sequence now gets the suffix "-RC" attached to the name of the input. It used to be "-1".
- The Hierarchical Clustering of Samples tool can now be executed as part of a workflow and can be executed on a QIAGEN CLC Genomics Server.
- The fastq exporter can now export sequences up to 500Kbp. The limit used to be 32Kbp.
- Tooltips on leaves of phylogenetic trees now display a description of the attached sequence.
- Numbers are no longer appended to the names of Workflow elements when creating a copy of a Workflow using "Open Copy of Workflow".
- Metadata Management. Keep track of input files and import meta information for your samples.
Changes
- The tool "ChIP-Seq Analysis" has been renamed to "Transcription Factor ChIP-Seq".
Bug fixes
- Fixed a SOLiD NGS importer bug where import of very low quality, colorspace-encoded, paired-end sequence reads in fastq format could lead to paired sequence lists where the wrong reads were marked as pairs.
- Fixed an issue with the Map Reads to Contigs tool that could be extremely slow when included in workflows with multiple inputs.
- Fixed a bug in the Annotate and Merge Counts tool where the Feature ID of mature 3' small RNAs in the "grouped on mature" tables was incorrect if the input data type was an Experiment.
- Fixed an issue where some filtering operations, such as "doesn't contain", did not act correctly when filtering table cells that contained multiple pieces of information.
- Fixed the automatically generated link to the COSMIC website, which previously led to a retired page.
- Fixed an issue where annotations that spanned the ends of a circular sequence would be incorrectly placed in the Circular Sequence View.
- Fixed a bug that caused the workbench to freeze if certain sequences were displayed in circular view with radial rendering of labels.
- Fixed an issue whereby Create Box Plot and Principal Component Analysis could sometimes be run with illegal arguments, leading to an error message.
- Fixed a bug in the Predict Secondary Structure tool when the option to calculate the partition function was selected for long molecules (>1000 nucleotides).
- Fixed errors that prevented the side panel options of the gel view of a sequence list from being correctly applied and stored.
- The list of Illumina adapter sequences has been removed from the Genomics Workbench.
- Fixed an issue where one could not zoom in after zooming out fully on very large workflows.
- Fixed an issue that prevented a root folder on Windows drives from being used as a File Location.
- Fixed an issue where updating an existing installation on Windows would result in the .vmoptions file being deleted, which makes the Workbench run with the default Java configuration.
- Fixed exported reports having the wrong author in certain situations.
QIAGEN CLC Genomics Workbench 8.0.3
Bug fixes
- Fixed a read mapper bug that caused some reads to be incorrectly reported as unmapped when global alignment was selected.
- Fixed an issue with the sort order for paired reads in SAM/BAM exports in high coverage regions.
- Fixed a SOLiD NGS importer bug where import of very low quality, colorspace-encoded, paired-end sequence reads in fastq format could lead to paired sequence lists where the wrong reads were marked as pairs.
- Fixed an issue where the Local Realignment tool when run with RNA-seq mapping could occasionally report a match that did not meet internal requirements as a valid match. This had a downstream effect when variant calling tools were run, and then failed upon encountering such a position. This issue has also been addressed in this release.
- Fixed an issue where the Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection tools would stop with an error when encountering a place in a read mapping containing a match that did not meet internal requirements of a valid match.
- Selecting an entry in a Blast results table could highlight the wrong alignment in the Blast editor, if the table had been filtered or sorted.
- Fixed an issue where an error would arise when using the Design Primers editor and clicking on an annotation on the sequence.
- Fixed a bug that caused the mapper to enter an infinite loop if a reference of length 0 was used.
- Fixed a rare bug that sometimes made the read mapper halt prematurely when several seeds were identified at the same reference position.
- Fixed a rare issue where the Workbench would display an error message when installing a 3rd party licensed plugin.
- Fixed an issue where an error would arise in some view types when a region of a sequence had been selected and then the "Zoom to selection" tool was used.
Improvements
- The Illumina import now shows the file name on top of the process bar during the import.
- In the "SAM/BAM Mapping Files" import tool, any inconsistencies between the reference sequences in the BAM file and the reference sequences in the CLC software that are provided for the import of the BAM file are now highlighted in red in the "References in files" table.
QIAGEN CLC Genomics Workbench 8.0.2
Bug fixes
- Fixed an issue with running BLAST at NCBI where an NCBI-generated error about their CPU usage limit being exceeded was not being reported transparently and a result of "no hits" was being reported instead.
- Added a workaround for a Java issue that occasionally resulted in the Workbench displaying an uninformative error and requiring a restart to continue working.
- The InDels and Structural Variants tool can now better detect variants when using target regions near the edge of the regions.
- Fixed a rare error in the Create Statistics for Target Regions tool. The error resulted in a failure when a target region only included the very last nucleotide of a chromosome.
- The relative read direction filter in the Low Frequency Variant Detection tool is less strict on variants with large coverage.
- The variant callers could enter an infinite loop for certain inputs. This fix adds a check that was missing from the previous fix for this problem.
- Fixed bug in which Local Realignment could produce an illegal read mapping. This only happened for RNA-data.
- The variant caller will now fail if it encounters an illegal RNA read mapping. If the variant caller fails with such a message, and if it was run on locally realigned data, then we suggest re-running the local realignment to avoid the error.
- The Reverse Translate tool ignored any genetic code specified in the codon frequency tables. All reverse translation would thus default to the standard genetic code.
- Fixed the incorrect display of "Supported format" when exporting elements from either the Folder Editor or the Local Search Editor.
- Fixed an issue where the wrong file could be saved when editing a file found via the Local Search Editor.
- Plots inside reports are now shown with their saved side panel settings.
- Fixed saving different line colors in plots through the side panel.
- Side panel option to show legends for a plot with more than 10 samples is now enabled.
- Fixed an issue that led to an error when rendering plots for empty data sets.
- Fixed text inside variant boxes in the track view sometimes having a small font size.
- When installing a workflow with bundled data, it is no longer possible to select a read-only folder for storing the data.
- Transcriptomics experiment and sample tables can now be sorted, even with large numbers of rows.
- Fixed an issue where the "Empty Recycle Bin" option was sometimes incorrectly unavailable.
- A fix was applied to avoid an exception in circumstances when the cleanup of downloaded files from BLAST failed.
QIAGEN CLC Genomics Workbench 8.0.1
New features and improvements
- The former plugin "Duplicate Mapped Reads Removal" is now integrated under the name "Remove Duplicate Mapped Reads" and can be found in the NGS Core toolbox. Users who had previously installed this plugin need to uninstall it.
- Link Variants to 3D Protein Structure now generates full biomolecules: even if the PDB template only contains a single subunit, the full multimer can be generated from symmetry information. This makes it possible to locate variants on the interface between protein chains.
- The filtering option in the Create Track from Experiment tool only considered the predicted fold-changes in the positive direction, so features that were reduced in expression were filtered out. This has now been fixed.
- BLAST has been upgraded to BLAST+ 2.2.30 that includes a number of improvements and bug fixes. A full list of BLAST+ 2.2.30 changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- Transcriptomics experiment and sample tables can now be sorted, even with large numbers of rows.
- Particular annotation types (columns) can now be specified for export in Excel, HTML and tab delimited formats.
- Added column to output of "Annotate and Merge Counts" indicating 3' or 5' direction when using "grouping on mature" parameter.
- Increased the performance for gzip export.
- The results of BLAST searches now include a new view, the Blast Hit Table.
Bug fixes
- Fixed an issue with running blast searches at the NCBI where an NCBI-generated error about their CPU usage limit being exceeded was not being reported transparently and a result of "no hits" was being reported instead.
- Fixed an error that in rare cases would result in a division by zero error message when selecting rows in the Annotation Table view.
- Fixed an error that made it impossible to add an annotation via the Annotation Table view if the table is empty.
- Fixed a rare problem where a track list of reads tracks and graph tracks would break.
- Fixed an error affecting the "Cut Sequence Before/After Selection" tool in the Cloning editor.
- Fixed a bug where a left-click quickly followed by right-click was interpreted as double-click on OS X (in the persistence search result list, in the toolbox tree, and in the workflow editor).
- Fixed an error that occurred when running the Create Sequencing QC Report tool and requesting quality analysis reporting.
- Fixed an error that prevented the import of adapters from csv format.
- Fixed the SOLiD NGS importer to correctly import basespace encoded sequences in fastq files. It is still assumed that sequences originate from colorspace.
- It is now possible to filter tables based on content in the 'Link Variants to 3D Protein Structure' column.
- Fixed a rare error that caused the Amino Acid Change tool to crash if a CDS feature was less than 3 bases long.
- Fixes and updates for automated genome downloads (Zea mays, C. elegans).
- Fixed a bug in the probabilistic variant caller that caused it to fail for certain input.
QIAGEN CLC Genomics Workbench 8.0
New features and improvements
- New tools:
- Create Track from Experiment. This tool makes it possible to convert Experiments to Tracks. In the Experiment, the results of the statistical analysis are annotated on the experiment as additional columns. It can be advantageous to visualize the results of the statistical analysis as tracks.
- Link Variants to 3D Protein Structure makes it possible to visualize amino acid changes on 3D protein structures. After running the tool on a variant table, variants can be visualized on 3D structures. 3D models are automatically built using structural templates from the PDB. The new tool can be found under 'Resequencing Analysis | Functional Consequences | Link Variants to 3D Protein Structure'.
- The Map Reads to Reference tool now supports both linear gap cost parameters and affine gap cost parameters. The addition of affine gap cost support allows you to get more accurate results for reads with stretches of insertions or deletions.
- The read mapper used in the RNA-Seq Analysis tool has been upgraded to use the new read mapper described above. This upgrade enables you to run RNA-seq Analysis with as little as 6 GB RAM and at the same time improves your end results. However, you cannot yet use affine gap cost parameters in your RNA-Seq analysis.
- Performance of the Merge Read Mappings tool has been improved, especially in situations where the number of reference sequences is very large, such as when merging reads mapped against de novo assembly results.
- The tool Amino Acid Changes has been expanded with an extra output that makes it possible to visualize amino acid changes in track format. The amino acid color schemes can be changed in the Side Panel under "Track layout" and "Amino acids track".
- Chromosome bands/cytogenetic ideograms can now be downloaded to the Workbench via the Download function. The ideogram can be added to track lists to get a better overview of the data.
- Tracks:
- Improved resource management: Makes it more efficient to work with tracks involving large numbers of reference sequences. This typically applies to situations where a reference genome is not available, such as when tracks are based on de-novo assembly results.
- Consistent output when enriching variant tracks and annotation tracks with extra table columns. Output tracks from these tools now have the same number of added table columns and the columns will always be in the same order. Previously, if an added column had empty values for all variant rows, it would have been removed from the final table, resulting in varying number and relative order of additional columns when multiple samples were processed with the same tools/workflows. All columns are retained now, facilitating downstream processing of exported tables, and providing immediate visual reference as to which enrichment/annotation tools have been applied, even if they did not produce any results for a particular sample.
- Tables for variant tracks and annotation tracks can now sort and filter columns with cells containing multiple numbers.
- Improved the track viewer for variant tracks to show the sequence alteration on the rendered variant.
- Improved performance of creating variant tracks and annotation tracks.
- Graph tracks now show negative values filled upwards to y=0 (as expected).
- Workflows:
- When installing a workflow in the workflow manager, the newly installed workflow is automatically selected.
- The "Run" button in the workflow editor does not require a saved workflow anymore to be enabled.
- In the execution wizard of a workflow the "Reset to default" button is now active.
- All icons in the workflow editor are now on the left side.
- Introduction of snippets: Parts of workflows can now be saved as a snippet and reused in other workflows.
- Installed workflows: It is now possible to create a copy of an installed workflow and open the copy in the view area by clicking once and then right-clicking on the installed workflow in the toolbox. This brings up the option "Open Copy of Workflow".
- MA plots, scatter plots and histograms can now accept expression tracks as input.
- An extra optional output called "Create coverage graph", that shows the coverage in each position of the targets, has been added to the tool Create Statistics for Target Regions.
- Increased decimals for numbers when exporting table to CSV, tab delimited text, and Excel.
- Improved reporting of errors related to low disk space.
- New features for the 3D molecule viewer:
- Align to Existing Sequence makes it possible to connect a 3D protein chain to a sequence, sequence list, or an existing alignment
- Transfer Annotations makes it possible to create atom groups from sequence annotations (and vice versa) for connected sequences.
- Improved layout of the property viewer.
- Improved PDB import of water molecules, DNA/RNA, and saccharides.
- When importing PDB files, the resulting Molecule Project now contains citation information (PDB ID and primary reference), which can be found in the 'Show History' view.
- Batching: Processes tab and analysis execution logs now display batch names in addition to analysis names for enhanced clarity.
- The External Application Client Plugin is now available directly from the Workbench Plugin Manager.
- Multiple target region tracks can now be specified for the InDels and Structural Variants tool.
Bug fixes
- Fixed an error resulting in billions of reads being silently dropped when producing large read mappings against large counts of reference sequences. The error involves a read count overflow and the dropping of at least 2 billion reads per failure instance.
- Fixed display problem in read mappings showing too many hidden insertions (as vertical black lines) in certain overlapping paired reads.
- Fixed problem with links and text in tables that were being cut off when succeeding a link.
- Restriction site analysis: The values in the "Cut position(s)" column of the restriction site analysis table now behave like numbers instead of text, so sorting and filtering work as expected.
- The tool Identify Graph Threshold Areas can now use negative values to define its threshold.
- Workflows:
- In the workflow editor the "Reset to default" now always reverts to the right names.
- In the workflow editor the validation is now correctly triggered when changing the configuration of an input element.
- The workflow editor can now open workflows in which the graphical view of the workflow is corrupt.
- Fixed an exception which could occur during workflow migration.
- Data with the same name can now be bundled multiple times in a workflow installer.
- Previously when a plugin contained custom actions and a workflow, the workflow could not be installed. This has been fixed.
- Fixed problem with unlocked output names that previously could not be configured during execution of a workflow.
- A workflow with configured data from a server is now automatically validated when connected to the server (when opened in the editor). Previously the workflow had to be closed and reopened first.
- The original workflow file included in a workflow installer can now be exported directly without having to restart the workbench in advance.
- A problem with saved table settings that sometimes did not work has been fixed. The bug fix includes a more robust/generic way of saving table settings with different columns. To fix this problem, existing saved table settings should first be loaded on an object where it works (i.e. has the same columns as when it was saved); and then the table settings should be saved with the old name to overwrite the settings.
- Fixed an error that could cause batch processing to open all results rather than saving them.
- Fixed problem with import of BED files using external applications.
- SAM/BAM import will no longer fail for alignments with POS = 0, but instead import them as though they were unmapped.
- Fixed problem going back in the wizard for the "Find Binding Site and Create Fragments" tool.
- Fixed error occurring when removing an unsaved reads track from a track list.
- Metadata for phylogenetic trees: A bug has been fixed with import of metadata containing column names with colons.
- Fixed error when showing protein translations of annotations shorter than 3 bases.
- Search for PDB Structures at NCBI has been fixed to correctly show PDB deposit date and organism type.
- Fixed a bug in the Mapping Coverage exporter.
- Fixed reads tracks reads-amount indicators (the numbers between the reads track and the box with the tracks name and number of reads) that sometimes wrongly said 0.
- Small RNA Analysis -> Annotate and Merge Counts: When you choose to create a “grouped on mature” output, the small RNAs are grouped by both the 5’ and the 3’ mature sequences separately in the “grouped on mature” output. The column heading has therefore been changed to show "Mature" instead of "Mature 5' ".
- When using the RNA-Seq Analysis tool with the "One reference sequence per transcript" option, the "Maximum number of hits for a read" option was sometimes not taken into account for multi-hit reads. This has been fixed.
- Two problems with the F1 help have been fixed: 1) when pressing F1 in a workflow tool wizard, more than one help window appeared, and 2) problems showing help when pressing the F1 key in tool wizards.
- Fixed a bug that in some cases caused an error when annotating read sequence lists with the GFF/GTF/GVF annotation tool.
- Amino Acid Change tool: In cases where an mRNA track does not overlap all annotations in the CDS track, "Coding Region Changes" were not added to variants overlapping a CDS but not overlapping an mRNA annotation. This has been fixed.
- Variant callers and the "Amino Acid Changes" tool: "Coding Region Changes" were not added to variants overlapping an mRNA annotation but not a CDS annotation. This has been fixed.
- Fixed an error that in rare cases would prevent creation of tracks from reference sequences.
- Hypergeometric test on annotations: Fixed a rare error that occurred for some data sets containing annotations of the form: '1234 // abc'.
- Fixed a bug in the QC report creation step of the ChIP-seq analysis.
- Fixed a bug for color space reads in the RNA-Seq Analysis tool that caused only exon-exon matches to be reported.
- Fixed an issue where an XSQ file containing both base space and color space versions of the same reads was incorrectly imported into a single sequence list, resulting in each read appearing twice.
- The alignment editor view and alignment primer design view now have independent settings.
- Fixed an issue with mapping of paired-end reads, where these were erroneously reported as broken pairs when the fragment size derived from the alignments of the two ends of the pair was longer than the reference sequence.
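The first bug fix in the list above mentions a read count overflow that dropped at least 2 billion reads per failure instance. That figure is consistent with a 32-bit signed counter, which wraps after 2,147,483,647. The counter type is an assumption here, since the release note does not state it. The Python sketch below simulates such a wraparound and shows one way to guard against it. The helpers `to_int32` and `checked_add` are hypothetical and do not come from the CLC software.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647


def to_int32(value):
    """Wrap an arbitrary integer the way a 32-bit signed counter would."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def checked_add(count, increment):
    """Add to a counter, raising instead of silently wrapping."""
    total = count + increment
    if total > INT32_MAX:
        raise OverflowError(f"read count {total} exceeds 32-bit signed range")
    return total


# One read past the maximum wraps to a negative value.
print(to_int32(INT32_MAX + 1))
```

A checked addition of this kind turns a silent loss of data into a visible error, which is usually the safer failure mode for read counts.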
Changes
- Contigs coming from the de novo assembler will now have underscores in their names rather than spaces.
Plugin updates and bug fixes
- The TRANSFAC Plugin has been updated and now has two modes of operation: "Classic" and "Genomic". The Classic mode is the legacy mode taking sequences as input and annotating these sequences. The new Genomic mode takes regions on a genome (an annotations track) as input. In both modes it is now possible to specify global thresholds of similarity score which can be used to filter the annotations included in the output.
- A bug has been fixed with import of metadata containing column names with colons in the Metadata Import plugin.
Compatibility
- This release can be used with QIAGEN CLC Genomics Server 7.0
- This release is using the read mapping and de novo assembler that corresponds to QIAGEN CLC Assembly Cell 4.3.
QIAGEN CLC Genomics Workbench 7.5.5
Bug fixes
- Fixed a read mapper bug that caused some reads to be incorrectly reported as unmapped when global alignment was selected.
- Fixed a bug that caused the mapper to enter an infinite loop if a reference of length 0 was used.
- Fixed a rare bug that sometimes made the read mapper halt prematurely when several seeds were identified at the same reference position.
- Fixed a SOLiD NGS importer bug where import of very low quality, colorspace-encoded, paired-end sequence reads in fastq format could lead to paired sequence lists where the wrong reads were marked as pairs.
- Fixed sort order for paired reads in SAM/BAM exports in high coverage regions.
- The analysis/workflow execution system now handles search algorithms specially so that search results are not modified. This eliminates a host of concurrency issues.
- Fixed an issue where selecting an entry in a Blast results table could highlight the wrong alignment in the Blast editor, if the table had been filtered or sorted.
- Minor improvements in persistence.
QIAGEN CLC Genomics Workbench 7.5.4
Bug fixes
- Fixed an issue that caused the Reverse Translate tool to ignore the genetic code specified in the codon frequency tables such that the reverse translation used the standard genetic code.
- Fixed an issue introduced by a fix in the Genomics Workbench 7.5.3 restricting the use of the Create Statistics for Target Regions tool on tracks containing a larger number of nucleotides (>2147483647 bp) than could be supported for coverage table output. This check is no longer applied if coverage table output is not requested.
- Fixed bug in which Local Realignment could produce an illegal read mapping. This only happened for RNA-data.
- The variant caller will now fail if it encounters an illegal RNA read mapping. If the variant caller fails with such a message, and if it was run on locally realigned data, then we suggest re-running the local realignment to avoid the error.
- Read-only folders are no longer offered as potential locations to save data bundled with a Workflow.
- Side panel option to show legends for a plot with more than 10 samples is now enabled.
- Fixed saving different line colors in plots through the side panel.
- Plots inside reports are now shown with their saved side panel settings.
- The automated paired distance estimate can no longer exceed the maximum distance accepted by read mapper (100,000 bp).
- Fixed an error that occurred when hovering the mouse cursor over the edge of a read mapping.
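The 2147483647 bp limit mentioned above for Create Statistics for Target Regions is the largest value a signed 32-bit integer can hold (2^31 - 1). The following Python sketch only illustrates what happens to a counter of that width when it is pushed past the limit. Whether the Workbench stores positions in such a type is an assumption, not something these notes state, and `wrap_int32` is a hypothetical helper.

```python
import ctypes

# Largest value representable by a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return the value as a signed 32-bit integer would store it (with wraparound)."""
    return ctypes.c_int32(value).value


print(INT32_MAX)                  # 2147483647
print(wrap_int32(INT32_MAX + 1))  # -2147483648
```

A value one above the limit wraps to a large negative number, which is why sizes beyond this bound cannot be represented in such a counter.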
QIAGEN CLC Genomics Workbench 7.5.3
Bug fixes
- The filtering option in the Create Track from Experiment tool only considered the predicted fold-changes in the positive direction, so features that were reduced in expression were filtered out. This has now been fixed.
- When using the RNA-Seq Analysis tool with the "One reference sequence per transcript" option, the "Maximum number of hits for a read" option was sometimes not taken into account for multi-hit reads. This has been fixed.
- Fixed an issue with mapping of paired-end reads, where these were erroneously reported as broken pairs when the fragment size derived from the alignments of the two ends of the pair was longer than the reference sequence.
- Fixed an error affecting the "Cut Sequence Before/After Selection" tool in the Cloning editor.
- Fixed an issue with running blast searches at the NCBI where an NCBI-generated error about their CPU usage limit being exceeded was not being reported transparently and a result of "no hits" was being reported instead.
- Fixed a bug in the probabilistic variant caller that caused it to fail for certain input.
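The Create Track from Experiment fix above concerns filtering on fold change in both directions. A minimal sketch of the idea follows, assuming fold changes are stored as signed values (negative for reduced expression). The names `passes_fold_change_filter` and `min_abs_fold_change` are hypothetical and not taken from the Workbench.

```python
def passes_fold_change_filter(fold_change: float, min_abs_fold_change: float) -> bool:
    """Keep a feature if it changes by at least the threshold in either direction."""
    return abs(fold_change) >= min_abs_fold_change


# Hypothetical example data: feature name -> signed fold change.
features = {"geneA": 3.2, "geneB": -4.1, "geneC": 1.1}
kept = sorted(name for name, fc in features.items()
              if passes_fold_change_filter(fc, 2.0))
print(kept)  # ['geneA', 'geneB']
```

Comparing against the absolute value is what keeps down-regulated features such as `geneB`; a check of the form `fold_change >= threshold` would drop them.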
QIAGEN CLC Genomics Workbench 7.5.2
Bug fixes
- Fixed an error resulting in billions of reads being silently dropped when producing large read mappings against large counts of reference sequences. The error involves a read count overflow and the dropping of at least 2 billion reads per failure instance.
- Fixed error when removing an unsaved reads track from a track list.
- Fixed display problem showing too many hidden insertions in certain overlapping paired reads.
- Metadata for phylogenetic trees: A bug has been fixed with import of metadata containing column names with colons.
- Fixed a bug in the Mapping Coverage exporter.
- Fixed the read count indicators on reads tracks (the numbers between the reads track and the box with the track's name and number of reads), which sometimes wrongly showed 0.
- Small RNA Analysis -> Annotate and Merge Counts: When you choose to create a “grouped on mature” output, the small RNAs are grouped by both the 5’ and the 3’ mature sequences separately in the “grouped on mature” output. The column heading has therefore been changed to show "Mature" instead of "Mature 5'".
- Two problems with the F1 help have been fixed: 1) pressing F1 in a workflow tool wizard opened more than one help window, and 2) there were problems showing help when pressing the F1 key in tool wizards.
- Amino Acid Change tool: In cases where an mRNA track does not overlap all annotations in the CDS track, "Coding Region Changes" were not added to variants that overlap a CDS but not an mRNA annotation. This has been fixed.
- The Low Frequency Variant caller could end up in an infinite loop in certain corner cases. This is now fixed.
- Fixed "Export Graphics" default save-as directory.
- Fixed problem with import of BED files using external applications.
- Hypergeometric test on annotations: Fixed a rare error that occurred for some datasets containing annotations of the form: '1234 // abc'.
- Fixed a bug in the QC report creation step of the ChIP-seq analysis.
- Fixed error when showing protein translations of annotations shorter than 3 bases.
- Fixed a bug for color space reads in the RNA-Seq Analysis tool that caused only exon-exon matches to be reported.
- Fixed problem going back in the wizard for the "Find Binding Site and Create Fragments" tool.
- Fixed a bug that in some cases caused an error when annotating read sequence lists with the GFF/GTF/GVF annotation tool.
- Fixed an issue where an XSQ file containing both base space and color space versions of the same reads was incorrectly imported into the same sequence list, resulting in each read appearing twice.
Plugin updates and bug fixes
- A bug has been fixed with import of metadata containing column names with colons in the Metadata Import plugin.
Compatibility
- This release can be used with QIAGEN CLC Genomics Server 6.5.3
- This release is using the read mapping and de novo assembler that corresponds to QIAGEN CLC Assembly Cell 4.3.
QIAGEN CLC Genomics Workbench 7.5.1
New features and improvements
- "Filter Annotations on Name" can now insert names to filter on from significantly bigger files. Previously the limit for the file size was 10KB; this has now been increased to 20MB.
- RNA-Seq Analysis: The ENSEMBL gene id of each gene, where available, has been added as an additional column to the gene expression track output.
- Improved performance of the ChIP-seq Analysis tool for genomes with a large number of chromosomes.
- It is now possible to run a workflow without an optional input.
Bug fixes
- A bug has been fixed in the Set Up Experiment tool. Exon-related expression values can now only be selected when present in the individual samples.
- When creating a subset of a paired experiment, the sub-experiment no longer appeared as being paired. This bug has been fixed and sub-experiments created in previous versions should recover the pairing information when accessed with this version of the workbench.
- Pfam filtering bug fixed. Previously, Pfam only reported the first domain of each type in a query and as a consequence many domains were missed. We recommend that users whose research depends on Pfam annotations re-run the tool on their data.
- The AAC tool did not annotate variants in 3' UTR with their DNA-level change using the HGVS c.xxx format. This affects any analysis done with Gx 7.5 or earlier based on ENSEMBL CDS tracks from older versions. The AAC analysis should be redone using Gx 7.5.1 for correct annotation. Important: Please also check the description in the Gx 7.5 release notes of a bug fix in the translation of CDS annotations to protein sequences that was wrong in cases where the reading frame was not +1 or -1 in CDS annotations imported from ENSEMBL.
- Fixed problem importing VCF files using the AO and RO genotype field.
- Fixed problem importing certain VCF files.
- Fixed a bug in the 'Maximum Likelihood Phylogeny' tool that failed when generating bootstrap values for certain input alignments.
- Fixed problem with scrolling to the relevant files when selecting objects as parameters in tool wizards.
- The Blast text results have been improved so they show the correct query and subject positions regardless of strand.
- Fixed a problem that prevented BLAST operations when choosing to run these on the CLC Server.
- Fixed problem with import of read mappings with supplementary alignments. When importing read mappings with supplementary alignments, supplementary alignments are not imported. Previously import of such read mappings caused import errors.
- Fixed rare problem with coverage that could occur in zoomed out reads tracks containing wrapped paired reads.
- Fixed rare error when sorting experiment tables.
- Fixed a bug in the Annotate and Merge Counts tool that in rare cases resulted in incorrect sorting and a crash.
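Regarding the read mapping import fix above: in SAM/BAM files, supplementary alignments are marked by bit 0x800 in the FLAG field (the second column of a SAM record). The sketch below shows how such records can be skipped when reading SAM text. It is an illustration only, not the Workbench's importer, and `keep_record` is a hypothetical helper name.

```python
SUPPLEMENTARY = 0x800  # SAM FLAG bit for supplementary alignments


def keep_record(sam_line: str) -> bool:
    """Return False for supplementary alignment records, True otherwise."""
    if sam_line.startswith("@"):
        return True  # header lines carry no FLAG field
    flag = int(sam_line.split("\t")[1])
    return not (flag & SUPPLEMENTARY)


# Hypothetical example records: one primary, one supplementary alignment.
lines = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r1\t2048\tchr2\t500\t60\t4M\t*\t0\t0\tACGT\tIIII",
]
kept = [line for line in lines if keep_record(line)]
print(len(kept))  # 2
```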
QIAGEN CLC Genomics Workbench 7.5
New features and improvements
- New tools:
- New variant callers (Resequencing analysis):
- Three new tools for detecting variants are available in the "Variant Detectors" toolbox under "Resequencing Analysis": Basic Variant Detection, Fixed Ploidy Variant Detection and Low Frequency Variant Detection. The Basic Variant Detection and Fixed Ploidy Variant Detection are similar in nature to the Quality-based and Probabilistic Variant Detection tools respectively. The main difference is that all filters, previously employed in either the Quality-based or Probabilistic Variant Detector, are now available in all three variant callers, in addition to a new filter: the relative read direction filter. The Low Frequency Variant Detection tool is a new statistics-based tool for detecting low frequency variants e.g. in mixed tissue cancer or mixed population samples. The Quality-based and Probabilistic Variant Detection tools have been moved to the "Legacy tools" folder in the toolbox, and will eventually be retired. Please note that any benchmarking done for your own purpose using these tools should be repeated when you switch to the new variant callers. We recommend that you read the Special notes upgrading to Genomics Workbench 7.5 for further information.
- Improved read mapper and a tool for downsampling (NGS Core Tools):
- Memory usage reduced for the read mapper, enabling mapping against human genomes on a modern notebook.
- Caching of reference index files improves the speed when the same reference is used repeatedly for read mapping.
- The new "Sample Reads" tool can be used to downsample large sets of reads for all types of NGS analysis.
- New ChIP-Seq tools (Epigenomics Analysis):
- The ChIP-Seq Analysis tool found in the toolbox under "Epigenomics Analysis" has been replaced with the plugin "Peak Shape ChIP-Seq Analysis" (that has been renamed to "ChIP-Seq Analysis"). The old "ChIP-Seq Analysis" tool has been renamed to "ChIP-Seq Analysis (legacy)" and moved to the new "Legacy tools" folder in the toolbox. The new ChIP-Seq Analysis tool uses a new approach to identify genomic regions with significantly enriched read coverage and a read distribution with a characteristic shape. The parametrization of the algorithm is done automatically by learning the characteristic shape of the signal from the data, making the algorithm intuitive and easily understandable.
- The "Annotate with Nearby Gene Information" tool can be used to annotate ChIP-seq peaks with the nearest gene upstream and downstream, based on the start position of the gene. The resulting annotations are provided in the same format as in the legacy ChIP-seq Analysis.
- A new folder called “Legacy tools” has been added to the toolbox. The "ChIP-Seq Analysis (legacy)" has been moved to this folder along with the Probabilistic Variant Detection and the Quality Based Variant Detection tools.
- Workflows:
- The input information is now shown in the preview dialog and also exported to all formats.
- It is now possible to edit the workflow input name by right-clicking on the input name in the workflow.
- Tools with object parameters now accept multiple inputs. This applies to e.g. Trio Analysis that now can be run in a workflow using child, mother, and father as input in the same workflow.
- Workflows as such can have multiple inputs (though this will disable the batch functionality).
- Data can now be directly bundled with a workflow installation. This means that reference data can be packed and shared together with a workflow (only recommended for small data).
- A workflow input can be pre-configured. If kept unlocked, it can be used to give a default when executing the workflow.
- A text field has been added to the side panel, where you can search for elements in the workflow. A found element will be centered and highlighted.
- A new editor was added to the workflow to make it easier to check the configuration. The new editor can be accessed from the lower left corner of the View Area and lists all configuration parameters.
- Workflows can be packaged with a plugin and will get installed simultaneously with the plugin.
- Workflows installed on the server now have an overlay icon in the workbench, to make them easily distinguishable from workflows installed in the workbench.
- The execution of a workflow in the workbench and on the server has been unified to have the same behavior regarding logs, intermediate results and output naming.
- Locked settings in the workflow wizard are now again hidden by default when executing the workflow, to give a cleaner, simpler look to the configuration. When expanding, all parameters are displayed.
- One tool can now receive input from two different sources: 1) a reads track that holds the data to be analyzed (in this case, reads that are to be locally realigned), and 2) a parameter that can have different functions depending on the tool it is connected to (e.g. an InDel track can be used as a guidance track for the local realignment; in other situations the parameter track could be used for annotation or could provide a reference sequence).
- New workflow-enabled tools:
- Create sequence statistics.
- NGS Importers.
- Protein Analysis, Pfam domain search:
- Pfam Domain Search now uses HMMER3 and the latest Pfam database that can be downloaded with the new tool "Download Pfam Database".
- Searching multiple sequences is significantly faster.
- New filters are available in the improved Pfam Domain Search tool to enable generation of the same results as the online tool.
- 3D Molecule Viewer:
- Protein Structure Alignment - high quality structural alignment of whole protein chains or selected regions of a protein, available from the Side Panel of Molecule Projects.
- Project Tree improvements - new ways of selecting nearby atoms. Improved visualization of intermolecular bonds. Atom groups are now stored on the Molecule Project and can be renamed. Labels on custom atom groups now show residue names if applicable.
- New molecule color scheme where only the carbon atom color is varied.
- 3D view state - all 3D visualization settings (including custom atom groups) can now be stored on a molecule project and shared with others.
- Molecule Preparation. Many improvements, including better handling of partial charges and more recognized chemical groups.
- The option "Fix maximum of coverage graph" has been added to the Side Panel for reads tracks. This allows direct comparison of the coverage of the individual reads tracks.
- Local Search enabled from the menu bar now includes filtering on "Path".
- Advanced filtering on tables now includes the option to filter on a space-, comma- or semicolon-delimited list of terms.
- Zoom tools redesign: The “Zoom to selection” feature is now also available for sequences, sequence lists, alignments and read mappings.
- The tracks info panel, with track names in the left side of the track, now wraps information instead of showing a scroll bar.
- Saving/applying side-panel settings for tables now works for different tables that share some columns.
- Graph Tracks can now be exported to Wiggle file; the span option is now supported in the Wiggle import.
- "Filter Based on Overlap" and "Annotate with Overlap Information" now accept Expression tracks as input.
- SAM/BAM import. It is now possible to choose to ignore unmapped reads when importing SAM/BAM files.
- Fisher's Exact Test. Added options for correction of p-value for multiple testing; Bonferroni correction and False Discovery Rate (FDR).
- Speedup: newly created expression tracks will display graph faster.
- Copy operations can now be stopped.
- Output from "Reverse Translate" can now be a Sequence List.
- Import of Example Data and imports done through dragging files into the workbench and dropping them in the Navigation Area will no longer block the user interface while executing. Instead, the import happens as a background process that can be monitored and controlled via the Processes tab in the lower left corner.
- CLC Workbenches now support high resolution displays, such as Apple Retina displays, for all data shown in the View Area (including tooltips).
- Small improvements of the de novo assembler speed.
- Improved error message in the Empirical Analysis of DGE tool in case of invalid expression values in experiments (occurs rarely).
- More informative naming of coding region translations produced by the tool Translate to Protein. The name for a coding region translation consists of the name of the input sequence followed by the annotation type and finally the annotation name.
- Genetic codes: The list of NCBI translation tables has been expanded to include translation table 25 "Candidate Division SR1 and Gracilibacteria" (See: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c).
- Improved error messages due to low disk space.
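The multiple testing corrections added to Fisher's Exact Test (Bonferroni and FDR, listed above) can be sketched as follows. This is a plain Python illustration, not the Workbench's code. The FDR variant shown is the Benjamini-Hochberg procedure, which is an assumption, since these notes do not say which FDR method is used. Function names are hypothetical.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

Bonferroni controls the chance of any false positive and is the more conservative of the two; the Benjamini-Hochberg adjustment controls the expected proportion of false positives among the reported results.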
Bug fixes
- Translation of CDS annotations to protein sequences was wrong in cases where the reading frame was not +1 or -1 in CDS annotations imported from ENSEMBL. This error affected the Translate to Protein tool, translation functionality in sequence viewers and their context menus, as well as the Amino Acid Consequences (AAC) variant annotation tool. We highly recommend redoing the AAC analysis for correct variant annotation, as CDS tracks typically are created from ENSEMBL data.
- A bug in the Fisher Exact Test tool that in some cases caused incorrect counting has been fixed. The Fisher Exact Test algorithm now checks if a case variant also exists in a control variant as a different type (e.g. an SNV variant can exist as part of an MNV variant). Note that variants only found in the control tracks are no longer included in the output.
- The right-click menu on certain annotations in tracks was not working when viewing a single track. This has been fixed.
- Icons in the workflow editor are now scaled consistently when zooming in or out.
- Several issues with the validation display in the workflow editor have been fixed.
- A bug has been fixed in the workflow configuration wizard. Previously the input was not taken into account when deciding which parameters were enabled.
- Fixed problem where the "space" key did not trigger "Find Conflict" in the stand-alone read mapping editor.
- Fixed stand-alone read mappings not showing mismatches and insertions in the overflow graph.
- Fixed a bug in the de novo assembler and legacy read mapper which could cause a crash due to a collision of temporary file names.
- Fixed a bug which caused the de novo assembler to crash in rare cases on systems running Windows. Tools depending on read mapping might also have been affected by this.
- NGS import tools now work when run via CLC Server.
- 'Replace input sequences with result' in Cloning Editor no longer fails.
- A bug has been fixed in the Local Realignment tool. The bug materializes in extremely rare cases when applying the variant callers on locally realigned RNA-seq mappings with spliced reads. On these mappings, local realignment could generate invalid spliced reads (after local realignment, you could have spliced reads with segments that overlapped).
- Fixed minor annotation track rendering problem on tracks having peaks with large amounts of overlapping features.
- Fixed bug which caused "Merge Annotation Tracks" to output wrong "Origin tracks" annotations in some situations.
- Fixed bug causing occasional error when the same track list element was opened more than once.
- Fixed "Export Graphics" default save-as directory.
- Multi-sequence BLAST search results (BLAST tables) are now exportable as plain text.
- Due to a change in the COSMIC format QIAGEN CLC Genomics Workbench could not import COSMIC data. This has been fixed. Through Import->Tracks we now support the following COSMIC databases in tsv format, which can be manually downloaded from the COSMIC ftp site (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/download):
- Complete COSMIC data
- Complete mutation data
- All Mutation in census genes
Changes
- To improve the stability of workflows: If a variant caller finds no variants, an empty track is produced, rather than no output.
- Due to the upgrade to Java 7, Windows Server 2003 and OS X 10.5.8 and 10.6 are no longer supported by Oracle. Hence, the system requirements are now: Linux, Windows Vista, Windows 7, Windows 8 or Windows Server 2008, or Mac OS X 10.7 or later.
- As of June 2014, COSMIC download requires registration. This means that COSMIC is no longer part of the resources that can be downloaded with the Download Genome Data Tool. You can still register at the COSMIC website, download the file to your computer, and use the Import Tracks tool to import the data.
- In Experiments, the column previously named "Total intron reads" is now named "Total intron-exon reads", and the column previously named "Unique intron reads" is now named "Unique intron-exon reads". These new headings will appear in Experiment tables for both newly and previously created Experiments.
Compatibility
- This release can be used with QIAGEN CLC Genomics Server 6.5.
- This release is using the read mapping and de novo assembler that corresponds to QIAGEN CLC Assembly Cell 4.3.
QIAGEN CLC Genomics Workbench 7.0.4
Bug fixes
- Fixed a bug in RNA-Seq Analysis regarding the calculation of RPKM. This error was introduced with the new RNA-Seq tool in QIAGEN CLC Genomics Workbench 7.0. When calculating RPKM, the total number of gene reads was used instead of total exon reads. This will only have a significant impact when many intron reads are mapped to the gene. With this release we have fixed the bug. Users whose analyses rely on RPKM values calculated with QIAGEN CLC Genomics Workbench 7.0 - 7.0.3 should refer to our public notification about this issue to get further details, including how to determine if re-running RNA-seq analyses will be necessary and a work-around if this will not be possible. The Legacy RNA-Seq plugin is not affected by this bug.
- Fixed a bug in the Filter against Control Reads tool which meant that variants of type "Replacement" that also introduce an insertion were not properly removed by the filter, even if there were reads supporting them. We recommend that all customers who have relied on this tool for processing data in QIAGEN CLC Genomics Workbench 7.0.X run the tool again in version 7.0.4.
- Fixed error that caused selections in views not to be centered in the middle of the view.
- Fixed bug that caused a crash in the Reassemble Contigs tool
- Fixed bug that made the Workbench crash when viewing tables under certain circumstances
- Fixed problem with "Find" on stand-alone read mappings with a circular reference and sequence lists containing circular sequences.
- Fixed bug that sometimes caused the workbench to crash when running "Local Realignment" on mappings generated with other mappers and imported as BAM files.
- Fixed problem with some parts of a workflow not being executed if there were multiple branches in the workflow
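To illustrate the size of the RPKM error described above, the sketch below uses the commonly cited RPKM definition (reads per kilobase of exon model per million mapped reads). The function name, parameter names and all numbers are hypothetical, and the exact normalization used by the Workbench is an assumption here.

```python
def rpkm(reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    return reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads in exons plus 300 reads in introns.
exon_reads = 500
gene_reads = 800  # exon reads + intron reads
exon_length_bp = 2000
total_mapped_reads = 10_000_000

print(rpkm(exon_reads, exon_length_bp, total_mapped_reads))  # 25.0
print(rpkm(gene_reads, exon_length_bp, total_mapped_reads))  # 40.0
```

With few intron reads the two values are close, which matches the note that the impact is only significant for genes with many intron reads.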
Changes
- Users running RNA-seq analyses with only gene annotations can now choose whether to calculate the RPKM for these genes (i.e. genes without transcripts) or not.
QIAGEN CLC Genomics Workbench 7.0.3
Bug fixes
- Fixed problem with the amino acid changes tool that reported all variants within coding regions as non-synonymous. This error was introduced with Genomics Workbench 7.0.2
QIAGEN CLC Genomics Workbench 7.0.2
Bug fixes
- Fixed problems causing an error when trying to uninstall plugins.
- Fixed issue with pausing and resuming running processes
QIAGEN CLC Genomics Workbench 7.0.1
New features and improvements
- Improved parameter specification for RNA-seq Analysis
- It is now possible to perform both batched and non-batched import of VCF files without genotype information
- Statistical Analysis: Improved reporting of invalid input to the tools "On Gaussian Data" and "On Proportions"
- Fasta export:
- Fasta export with trimming is now much faster and consumes less memory
- Fasta export now reports progress while executing
- When the "Remove trimmed regions" option is set, the Fasta export will ignore sequences in which all nucleotides are covered by a Trim annotation
- Translate to Protein (Batch Process):
- The wizard now has options for specifying whether to translate the coding regions or extract translations from the annotations
- The log has been made more detailed and informative
- If the result is just a single protein sequence, the output will be just that, otherwise all sequences are output as a list
- If the tool estimates that the number of protein sequences to be produced is greater than 1,000,000, it will create protein sequences without history, and it will not copy the common name, latin name, and taxonomy fields
- The PDB importer has improved support for custom residues
Changes
- When importing a VCF file, if multiple count tags are present, the tags are prioritized in the following order: 1) CLCAD2, 2) AD, 3) AO
- In the "Amino Acid Changes" tool, the description of coding region changes at the DNA level now complies with HGVS recommended nomenclature with regard to variants in untranslated regions. Examples: "c.-4A>C" describes a SNV four bases upstream of the start codon, while "c.*4A>C" describes a SNV four bases downstream of the stop codon
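The HGVS-style coding positions in the examples above ("c.-4A>C" and "c.*4A>C") can be derived from an offset relative to the start codon. The sketch below is an illustration under stated assumptions: offsets are 0-based with 0 at the A of the ATG, `cds_length` includes the stop codon, and the helper names are hypothetical rather than part of the Workbench.

```python
def hgvs_coding_position(offset_from_cds_start: int, cds_length: int) -> str:
    """Convert a 0-based offset from the A of the start codon to an HGVS c. position."""
    if offset_from_cds_start < 0:
        # 5' UTR: the base immediately upstream of the start codon is -1.
        return f"-{-offset_from_cds_start}"
    if offset_from_cds_start >= cds_length:
        # 3' UTR: the base immediately downstream of the stop codon is *1.
        return f"*{offset_from_cds_start - cds_length + 1}"
    # Coding sequence: positions are 1-based, there is no position 0.
    return str(offset_from_cds_start + 1)


def hgvs_snv(offset_from_cds_start: int, cds_length: int, ref: str, alt: str) -> str:
    """Format a single nucleotide variant at the DNA (coding) level."""
    return f"c.{hgvs_coding_position(offset_from_cds_start, cds_length)}{ref}>{alt}"


print(hgvs_snv(-4, 300, "A", "C"))   # c.-4A>C
print(hgvs_snv(303, 300, "A", "C"))  # c.*4A>C
```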
Bug fixes
- Fixed problems where the workbench window would not position itself correctly on startup
- After annotating variants with the tool "Annotate from Known Variants" a small fraction of the MNVs disappeared. This has now been fixed
- The tool "Restriction Site Analysis" ignored the selected number of cut sites in GWB 7.0. This has been fixed
- Fixed table filter being invisible for vertically split tables
- Fixed problem where "InDels and Structural Variation" would throw ArrayIndexOutOfBounds exceptions for certain data
- Fixed missing icon for "Metadata Import" in the phylogenetic tree table
- Fixed "Filter Based on Overlap" accepting expression tracks as inputs but not knowing how to handle them
- Fixed typo in "RNA-seq Analysis" that was visible in workflows
- Fixed error in mapping long reads as part of de-novo assembly, Read Mapping Legacy plugin, RNA-Seq Legacy plugin, and Transcript Discovery plugin
- Multi BLAST results table: the missing "Description (E-value)" field is displayed again in the table output
- A rare error has been fixed in the Secondary Peak Calling tool
- An issue with workflow connections not being displayed properly has been fixed
- Fixed a bug that in certain cases made the De Novo Assembly fail
- Fixed a bug that in certain cases made the RNA-Seq Analysis fail
- Fixed a bug that made access to data impossible because of a failed rename operation
- Fixed a bug that made PDB import fail on workstations with Turkish Locale settings
- A problem importing Ensembl version 75 files has been addressed. If you have previously imported Ensembl version 75 files, please see the FAQ entry for full details of what to do
- Fixed a bug in labeling of phylogenetic trees. Newly created trees were labeled with saved general tree settings and subtree texts
QIAGEN CLC Genomics Workbench 7.0
New features and improvements
- New beta plugin available: memory-efficient read mapper that significantly reduces the memory requirements for read mapping.
- RNA-Seq on tracks: A substantial update of the popular RNA-Seq Analysis tool together with new statistical tools for analysis of differential expression form a great improvement for all users working with RNA-Seq.
- The output of the RNA-Seq Analysis is based on tracks and includes tracks with the read mapping, expression values and fusion genes. Tracks from different samples can be shown in one track list, enabling richer visual comparison across samples.
- The gene-level and transcript-level expression results are now output as two different tracks and can be used together for visualization. Downstream analysis can be performed on either.
- A new column "Relative RPKM" on the transcript-level expression track can be used to see the relative expression of alternative transcripts for a gene.
- Experiments based on the new expression tracks can be used for browsing the track list with read mappings and annotations.
- It is now possible to map the reads against the full genome as well as gene regions.
- The new read mapping algorithm introduced with QIAGEN CLC Genomics Workbench 6.5 is now also used for RNA-Seq. This means that mapping is faster but for some data sets it will also require more memory. For a human data set using the latest annotation sets (obtained through the Download Reference Genome Data), there is a minimum requirement of 16 GB of RAM and we recommend 24 GB of RAM. If this causes problems, it is still possible to make use of the old RNA-Seq Analysis tool which is available as a plugin.
- The wizard has been redesigned to make use of tracks and includes a more explicit way of controlling what reference annotations should be used (if any).
- If you have an annotated reference sequence that you have used for RNA-Seq, you can convert it to tracks using Convert to Tracks.
- The fusion genes table has been changed into an annotation track which can be used to browse the read mappings in a track list.
- Variant tracks can be annotated with expression values from expression tracks.
- New RNA-Seq tutorials available at http://clcbio.com/tutorials
- New statistical test based on EdgeR:
- The tools available for statistical analysis of differential expression have been extended to also include the 'Exact Test' (developed by Robinson and Smyth and implemented in the EdgeR Bioconductor package). The test is applicable to comparisons of pairs of groups and implicitly performs TMM normalization.
- New functionality for phylogenetic trees (was previously part of a beta plugin)
- Greatly enhanced viewer for visualizing and working with phylogenetic trees. The viewer allows the user to rapidly create high-quality, publication-ready figures of the trees.
- Large trees are made easy to explore using different zoom functionalities and a small minimap of the entire tree. The viewer also comes with two alternative tree layouts, namely circular layouts and radial layouts, which are great for visualizing very large trees.
- Supports importing, editing and visualization of metadata associated with nodes in phylogenetic trees.
- Tool to reconstruct phylogenetic trees based on k-mers. This approach avoids the computationally intensive step of constructing a multiple alignment of the input sequences. The k-mer based reconstruction tool is especially useful for whole genome phylogenetic reconstruction where the genomes are closely related.
- Tool performing a statistical evaluation of different substitution models to be used with maximum likelihood tree construction. The output of this tool is a report that lists the recommended settings to be used when constructing phylogenetic trees based on maximum likelihood.
- Added an option for using the Kimura 80 substitution model when creating trees with distance based methods.
- Distance-based tree reconstruction methods can now reconstruct trees from protein alignments using the Jukes-Cantor substitution model or the Kimura protein ML distance estimate.
- A user defined start tree can now be supplied to the ML inference tool.
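The k-mer based tree reconstruction mentioned above works from alignment-free distances between sequences. A minimal sketch of one such distance follows: a weighted Jaccard distance on k-mer counts. The distance measure actually used by the Workbench is not stated in these notes, so this choice, the function names and the default k are all assumptions for illustration.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a: str, b: str, k: int = 3) -> float:
    """Weighted Jaccard distance between the k-mer count profiles of two sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    shared = sum((ca & cb).values())  # per k-mer minimum of the two counts
    total = sum((ca | cb).values())   # per k-mer maximum of the two counts
    if total == 0:
        raise ValueError("both sequences are shorter than k")
    return 1.0 - shared / total


print(kmer_distance("ACGTACGT", "ACGTACGT"))  # 0.0
print(kmer_distance("ACGT", "ACGA", k=2))     # 0.5
```

A pairwise matrix of such distances can then be passed to a distance-based method such as neighbor joining, with no multiple alignment required.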
- Complete redesign of the graphical user interface including:
- New tool bar graphics
- New product logos and colors, including splash screen
- New background graphics on canvas and in dialogs
- Tool bar has been re-organized
- New tab design. Aligning the look and feel across platforms, which is particularly important to Mac users as split screen used to take up a lot of screen space.
- New zoom tools. Easy adjustment of zoom speed for improved zooming of huge sequences.
- Detachable Side Panel editors. A Side Panel editor can be dragged from the main workbench window and dropped outside the workbench, making a separate window e.g. on a second screen, if available.
- New concept for Side Panels and Views:
- Support for multiple screens: views can be moved to a different screen by dragging the tab of the view
- Side Panels now consist of palettes that can be organized in groups, and the order can be customized
- Palettes can be detached and placed anywhere on the screen
- Navigation Area and Toolbox can be minimized to allow more pixels for displaying data
- Zoom tools redesign:
- The zoom tools have been re-organized and placed closer to the data
- A zoom slider shows the present zoom level and can be used to adjust zoom
- Detailed zoom is a new feature that allows zooming in and out in very small increments by dragging the zoom slider and moving the cursor above it. An expert feature for e.g. large tracks.
- Zoom to selection button now available for track views
- Copying data in the Navigation Area runs much faster and uses less memory than before. This is a great improvement which also kicks in when moving data between a QIAGEN CLC Genomics Server and a Workbench.
- Tracks:
- The speed of the Annotate with Known Variants and Filter against Known Variants tools has been greatly improved when using a large reference database like dbSNP.
- Table filtering of tracks: it is now possible to use "overlaps" and "doesn't overlap" when filtering on the region column. This allows for quicker inspection if any of the variants or annotations overlap a particular position.
- Tooltips on variant tracks in track lists now include the number of variants in the track.
- The Identify Graph Threshold Areas tool is now capable of identifying intervals with higher-than-average reads. This is obtained by setting a “window-size” parameter in the "Identify Graph Threshold Areas" wizard that specifies the width of the window around every position that is used to calculate an average value for that position.
- Previously, when importing variants from VCF files and from UCSC, a small number of variants were ignored: they contained reference bases at the ends and were therefore not proper replacements or MNVs. These variants are now trimmed and properly imported. This also affects the Download Reference Genome Data tool.
- When selecting Import -> Track, it is now possible to import several files into a single track in one step with the newly added batch mode function. Please be aware that this is not possible if you work with VCF files without genotype information.
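The trimming of imported variants that carry reference bases at their ends (see the VCF/UCSC import item above) can be sketched as follows. This is an illustration only: the exact rules the Workbench applies are not described in these notes, and the function name and the 1-based position convention are assumptions.

```python
def trim_variant(position: int, reference: str, allele: str) -> tuple[int, str, str]:
    """Strip bases that the reference and the variant allele share at either end.

    position is the (assumed 1-based) start of `reference` on the chromosome.
    Returns the adjusted position and the trimmed reference and allele strings.
    """
    # Trim the shared suffix first; in repeats this keeps the variant at the
    # leftmost possible position.
    while reference and allele and reference[-1] == allele[-1]:
        reference, allele = reference[:-1], allele[:-1]
    # Trim the shared prefix and shift the start position accordingly.
    while reference and allele and reference[0] == allele[0]:
        reference, allele = reference[1:], allele[1:]
        position += 1
    return position, reference, allele
```

For example, a record with reference ACGT and allele ATGT starting at position 100 reduces to a single-base change of C to T at position 101.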
- Workflows:
- Bulk configuration of elements. This makes it possible to set the same reference data for multiple elements at once.
- Workflows can be added inside a workflow. The inner workflow is "unfolded" into the single elements.
- Parameters can now be renamed in the editor by the creator during configuration of the elements.
- During workflow installation the wizard now allows editing of the name of the installed workflow.
- Workflows with invalid/unknown elements are laid out more neatly and consistently.
- The Side Panel now has an option to display rulers in the editor to better indicate the size of a workflow (particularly when exporting).
- Fit Width now fits the entire workflow in the editor by zooming out.
- The Side Panel has a new section, "Minimap", which shows an outline of the whole workflow. It can be used to navigate the workflow in the view and also supports zooming.
- The design of the workflow editor can be changed via the Side Panel (the old design options in the preferences have been removed).
- Better validation when configuring parameters in workflows
- If a tool receives inputs from at least two tools, the inputs can now be ordered via the context menu on the connections or the input part of the target element.
- The name of an output in the workflow can be set by configuring the output element
- Parameters of a workflow run can now be exported to various formats via the wizard
- It is now possible to reset a reference parameter. Previously, this was only possible by removing the whole element and adding it again.
- In the workbench the installed workflows are now sorted alphabetically.
- The graphics export of a workflow now takes the scale into account, and it is possible to export either the whole workflow or only the current view.
- A cpw file can now be dragged into the workflow manager and will be installed.
- Further speed improvements on working with larger workflows in the editor
- New tools that are now workflow-enabled:
- Create Track List. (With the requirement that all tracks must also be a workflow output.)
- Annotate with Flanking Sequences
- Convert from Tracks
- All tools in Statistical Tests
- 3D structure viewer:
- Property viewer - a new tab in the Side Panel. Shows detailed information about the atom under the mouse pointer. If multiple atoms are selected (Ctrl-click), the distance (two atoms selected), angle (three atoms selected) or dihedral angle (four atoms selected) formed by the selected atoms is shown in the Property viewer. If a molecule is selected in the Project Tree, meta-data relating to the molecule is shown in the Property viewer.
- Issues List. Issues related to the molecule structures and their chemistry representation are listed in the Issues List view on Molecule Projects. If seen in split-view with the 3D viewer, a selected issue will zoom to the atoms involved in the issue and select them in the 3D view.
- General improvements to the PDB importer
- Double-clicking an entry in the Project Tree will zoom-to-fit on the molecule or atom group.
- When selecting atoms (by mouse clicking or from sequence), the atom context (whole residue or molecule) will also be shown in the 3D view. From context menu on 'Current' selection, an atom group can be generated either from the exact selection or from the selection plus the context (whole residues or molecules).
- Create statistics for target regions:
- New options to ignore non-specific matches and broken pair reads have been added.
- Tables reporting base-level coverage and specificity have been added to the report. Moreover, bases in overlapping paired reads now only contribute with a coverage and count of one, as opposed to previously two.
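The change in how overlapping paired reads contribute to coverage can be illustrated with a small sketch: the two reads of a pair are merged before counting, so a base covered by both mates adds one to the coverage rather than two. The interval representation (half-open, 0-based) and the function names are assumptions made for this example.

```python
from collections import Counter


def merge_intervals(intervals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage(fragments: list[list[tuple[int, int]]]) -> Counter:
    """Per-position coverage where each fragment counts at most once per base.

    Each fragment is the list of intervals of its reads: one interval for a
    single read, two for a read pair.
    """
    depth: Counter = Counter()
    for reads in fragments:
        for start, end in merge_intervals(reads):
            for pos in range(start, end):
                depth[pos] += 1
    return depth
```

With a pair covering positions 0-4 and 3-7 plus a single read covering 4-5, positions 3 and 4 of the pair overlap but still count once, so position 4 has a coverage of two (pair plus single read), not three.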
- Amino acid changes:
- There are two new columns reporting amino acid changes for the longest transcript. Previously, amino acid changes would be reported for all transcripts, and this information is still available, but many users prefer just to use the longest transcript, and this information is now available in two new columns: one for the change on the protein level, and one for the change on the coding DNA level.
- Variants up- and downstream of the coding regions are now annotated with a coding DNA position as long as they are inside the transcript. In order for this to be reported, the amino acid changes tool has to be supplied with an mRNA track, which will be used to determine whether the variant is included in the transcript.
- Extract consensus sequence:
- Is now able to copy annotations from both existing consensus sequence and the reference sequence.
- When extracting consensus sequence from a mapping, conflict and low coverage annotations now include the position on the reference.
- New "Ambiguous base" annotations are added (when "Add annotations" is enabled in output options) for ambiguous consensus symbols.
- Removals caused by filtering are now explicitly annotated as "Removal" and not erroneously as "Deletion". “Deletion” is now reserved for annotation of areas where the reads agree that a gap exists although the gap is not found in the reference sequence. The new Removals are named "Removed by filter" and qualified by a "Cause of removal=filtering" qualifier to make them distinguishable from removals caused by low coverage.
- Quality scores are now always computed for the consensus sequence when quality scores are available in the input read-mapping no matter the state of the "Use Quality Score" input parameter.
- Read mappings can now be exported to a tabular file including detailed per-base information on coverage and nucleotide composition including insertions and deletions.
- Trim annotations can be used to trim off sequences when exporting to fasta.
- Secondary peak calling has been improved: it now only detects peaks that have a distinct peak shape, and only peaks that fall within the same interval as the top peak are called. In addition, trim annotations are taken into account so that no peaks are called within trimmed regions. This greatly reduces false positive calls. Finally, the annotations now include information about the secondary peak's fraction of the maximum peak height.
- Advanced table filter now includes an option to search for "starts with" in addition to just "contains"
- Limitations on export of Excel 2010 files (xlsx) are removed:
- Multiple tables can be exported to one xlsx file
- Reports can be exported to xlsx
- Hyperlinks are preserved in xlsx files
- SignalP prediction has been updated to be server-, batch- and workflow enabled.
- Folder contents view: subfolders will display how many items they have
- Policy settings now also control the use of the "Download Reference Genome" tool (using the online_search key)
- Assemble Sequences tools now accept sequence lists as input.
- REBASE restriction enzyme list updated to version 310.
- InDels and Structural Variants: We now require higher similarity for short alignments than for longer alignments for accepted matches.
- Quality-based and Probabilistic Variant Detection: The homopolymer filter has been improved so that longer variants no longer are lost when it is applied.
- ChIP-Seq tool now reports if the background distribution cannot be calculated properly.
Bug fixes
- Workflow:
- In the editor, the "Fit Width" zooming was shown as active but behaved as "100%" zooming. The "100%" zooming is therefore now shown as active instead of "Fit Width".
- Added or connected elements are now placed near where you connect or add them, even when zoomed near or far.
- It was possible to Undo the action of adding a workflow output, but it was not possible to Redo afterwards.
- The right-hand icons in an element now scale with zooming.
- The log of a workflow run on the server now contains the same information as when run in the workbench.
- When configuring elements in the editor, the "Reset to CLC Standards" button is now functional and will reset the parameters to their default values. When configuring during installation or execution the button is disabled.
- The log of a server executed workflow now states when the workflow has been cancelled.
- A workflow with elements which provide additional inputs could not be batched.
- Crash when adding data to experiments (e.g. by running any of the statistical analysis tools) has been fixed.
- Track visualization: various bug fixes of track visualization.
- Various bugs in the extract consensus sequence tool have been fixed.
- Tracks with many "chromosomes" took up extra disk space. These are now more compressed.
- In Reads Tracks, if no quality score information is available, all residues are given the worst value (0) rather than the highest (64), which was the case before.
- Fixed crash when creating a detailed mapping report.
- Read mapping and variant detection wizards were unresponsive when using input with many reference sequences.
- When translating to protein, ambiguous nucleotides potentially resulting in stop codons were not translated properly, and only the codons resulting in an amino acid were represented in the protein. Now the stop codons are also represented by an X in the protein sequence.
- A problem with filtering and sorting the BLAST output table has been solved. Some of the columns were treated as text instead of numeric.
- Restriction maps, histogram data, and primer tables could not be exported to csv or similar. This has been fixed.
- When setting up an experiment, samples in groups are now ordered according to how they appear when selected as inputs (in the same order).
- When using the “Find Open Reading Frame” tool, the input sequence was reported incorrectly in the ORF table. This has been fixed.
- Fixed problems with Excel export that failed when special characters were used in the name.
- Some of the tooltips associated with table column headers did not match the right column header. This has been fixed.
- A bug in the reported number of "perfect mapped" and "non-perfect mapped" reads has been corrected.
- InDels and Structural Variants:
- Fixed problem with detection of deletions in GWB versions 6.5.1 and 6.5.2
- The reported allele sequence for the special kind of insertions that are tandem duplications was wrong. This has been fixed.
- Fixed a bug in the code that ensures that the similarity required for mapping an unaligned end is applied only to the aligned part of the unaligned end and not to the full unaligned end. This is important for insertions, where the part of the unaligned end that is inserted should not be considered.
- A corner-case in the InDels and Structural Variation tool for inputs containing long unaligned ends has been fixed (previously, when running the InDel and Structural Variants tool, errors occurred for some read mapping data sets).
- Fixed problem where "Indels and Structural Variation" would break on certain inputs.
- Extract Consensus Sequence: When using ambiguity codes for conflict resolution the filters (noise threshold and minimum nucleotide count) were always applied globally and not only in the presence of a conflict. Now the filters are only applied when there is an actual conflict.
Changes
- The two tools in the "Multiplexing" folder in the toolbox category NGS Core Tools have been changed:
- "Process Tagged Sequences" has been renamed to "Demultiplex Reads" and is now directly in the NGS Core Tools folder.
- "Sort Sequences by Name" has been moved to the Sequencing Data Analysis folder.
- The De novo assembly legacy plugin has been discontinued and is no longer available for this release.
- Motif search: annotations created by the motif search are of type "Motif" instead of "Region"
- The “Download Genome” tool found under “Download” in the toolbar has been renamed to “Download Reference Genome Data” to make clear that the “Download Reference Genome Data” can be used to download e.g. annotations and variant data as well as reference genomes.
- The “Fasta” importer found in the toolbar under “Import” has been renamed to “Fasta Read Files” to stress that this importer preferably should be used to import read files rather than reference sequences. The reason for this is that the “Fasta read files” import option allows the read names to be included, whereas the descriptions from the fasta files are ignored. Hence, as the standard import function keeps the descriptions in addition to the read names, we recommend using the Standard Importer for import of other fasta format data (such as reference sequences).
Compatibility
- This release can be used with QIAGEN CLC Genomics Server 6.0
- This release is using the read mapping and de novo assembler that corresponds to QIAGEN CLC Assembly Cell 4.2.1
QIAGEN CLC Genomics Workbench 6.5.2
Bug fixes
- Fixed: Error message when running analysis on experiments (statistical tests, clustering etc.)
- Fixed: track lists would sometimes be rendered empty when scrolling tracks with insertions.
- Fixed problem of unresponsive dialogs when running analysis with many reference sequences.
- Fixed a problem in track lists that caused the view to crash when there is an insertion at the very end of a chromosome.
- The folder used by the Workbench for storing log and settings files on Windows 8 has been updated to follow conventions for Windows 8 which is identical to Windows 7. Any existing settings will be copied to the new location automatically.
- Fixed various problems related to launching the Workbench through Java Webstart.
- Fixed: Opening a search view for searching sequences at NCBI would sometimes fail.
- Fixed: The Target Regions Statistics tool did not handle annotations covering the starting point of circular reference sequences properly. If you are using the tool with annotations spanning across the starting point of a circular reference, we recommend re-running the analysis.
- In BAM files created by BWA, non-specific reads are now recognized as such during import. Previously, they were treated as unique reads.
- Improved stability of Probabilistic variant detection on huge data sets.
- Fixed various stability and performance problems of Maximum likelihood phylogeny.
- Fixed problem that caused a crash with extract consensus sequence tool with certain parameter configurations and with read mappings with no reads.
- Fixed crash of detailed mapping report tool with certain data sets.
- Fixed error in importing SOLiD XSQ files.
- Fixed an error when importing BAM files, including problems regarding download of reference sequences.
- Fixed a read mapper error occurring under special circumstances when excluding regions of a reference when mapping reads.
- Fixed a bug in the Assemble Sequences tool causing some data sets to produce inferior contigs.
Compatibility
- This release is based on QIAGEN CLC Assembly Cell 4.2.1
- This release can be used with QIAGEN CLC Genomics Server 5.5.X.
QIAGEN CLC Genomics Workbench 6.5.1
New features
- VCF export allows you to enforce diploid reporting of the variants. This will enable the VCF files to be parsed with other software relying on each line to report two alleles. As part of this, the CLCAD field is replaced with CLCAD2 (read more in the user manual). If you use VCF export in workflows, please see this special note.
- Heat maps: It is now possible to show a legend on a heat map.
Changes
- Variant comparison tools are workflow-enabled
- When importing Genbank nucleotide sequences, the Workbench will determine whether it is DNA or RNA based on the sequence rather than the description in the file.
Bug fixes
- Fixed: An important issue with the interpretation of ensembl-style gtf files when using the Download Genomes functionality or the Import Tracks functionality of the Genomics Workbench. This issue only affects version 6.5 of the Genomics Workbench. If you have downloaded gene annotations using Download Genomes or have chosen to import ensembl-style gtf annotation files using the tool Import | Tracks using version 6.5 of the Genomics Workbench, then we highly recommend that you delete the annotation tracks you have generated, and perform the download or import again. Annotations from earlier versions of the Workbench are not affected by this issue.
- Fixed inconsistencies when importing variant files from UCSC, affecting variants on the negative strand where the allele sequence is longer than one base. This affects dbSNP tracks downloaded using the Download Genome tool, and we highly recommend that you delete any variant tracks imported or downloaded from UCSC, and perform the import or download again.
- Fixed: Filter Against Control Reads was using only the first control reads track, if multiple ones were selected. The issue affected both 6.0 and 6.5 versions. If you used multiple control read tracks simultaneously to filter variants, we strongly recommend that you redo the analysis.
- Fixed: Trimming sequencing data for vector contamination from UniVec failed
- Fixed: It was not possible to proceed in the ChIP-Seq Analysis wizard without a control sample.
- Fixed: GFF export failed.
- SAM/BAM import: reads mapped to reference sequences that were not provided during import are no longer included in the list of unmapped reads. Instead they are in a separate list.
- Improvement of information displayed in license dialogs
- Fixed: in track views, coloring reads on quality score did not affect unaligned ends.
- Fixed: in track views, coloring reads on quality score did not work properly for paired reads.
- Improvements and fixes to the Indel and Structural Variation tool:
- Improved the detection of insertions and deletions from self-mapping evidence particularly relevant for amplicon data
- Fixed: a bug which caused some variants to be called as 'replacements' that should be called as 'insertions' or 'deletions'
- Fixed: a bug which caused structural variations to go undetected for long unaligned ends
- Fixed: In the trim dialog, it was not possible to de-select an adapter list without resetting all settings to default.
- Fixed: In Trio Analysis, homozygous variants on chr Y and MT and male X were wrongly marked as de novo mutations when not found in the father. The parameters for Trio Analysis have been changed as part of this.
- Fixed: In Trio Analysis, variant tracks with "unknown" zygosity values would be accepted, creating misleading results. Read more in the FAQ.
- Fixed: SAM and BAM export now supports direct gzip and zip compression of the files.
- Fixed: Copying information from the Folder Contents view did not work.
- Fixed: Local Realignment fails on certain data sets
- Fixed: out of memory error when performing bootstrapping with ML tree construction methods.
QIAGEN CLC Genomics Workbench 6.5
New features
- Variant detection:
- New tool for adjusting read mappings through local realignment. The Local Realignment tool has the option to realign unaligned ends, realignment with a guidance variant track (e.g. obtained from external resources such as dbSNP, through the Indels and Structural Variants tool described below or from analysis of other read mappings) and allows for realignment of multiple samples. Has previously been available as a beta plugin.
- New tool for detecting structural variants (detects insertions and deletions, intra-chromosomal translocations, tandem duplications and inversions) working on "unaligned ends (soft clippings)". Has previously been available as a beta plugin.
- Important changes to variant reporting: adjacent variants are now reported as one variant instead of linked variants.
- A new variant filter has been added to both “Probabilistic Variant Detection” and “Quality-based Variant Detection”: “Ignore variants in non-specific regions”. This new filter ensures that variants in regions covered by just a few non-specific reads are ignored.
- Probabilistic Variant Detection: A new threshold filter, “Required variant count”, has been added to the wizard. This filter ensures that only variants present in a number of reads that exceeds the specified threshold are called.
- Quality-based Variant Detection: Addition of a new column that reports hyper-allelic status of variants. This is based on the specified threshold “Maximum expected allele” in the “Set genome information” wizard under “Ploidy”. The output in the table is “Yes” or “No” with respect to whether the threshold has been exceeded.
- A new column has been added to the variant track table that describes the length of the insertions, deletions, and replacements. This makes it possible to filter on the length of e.g. insertions/deletions.
- VCF export is now using genotype fields. The tag CLCAD is used for count of a variant, and PL is used for coverage. In this version, one variant track will result in one VCF file.
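The change to variant reporting listed above, where adjacent variants are reported as one variant, can be pictured with the following sketch. The notes do not describe how the Workbench decides that variants belong together; the sketch assumes, purely for illustration, that substitutions at directly consecutive positions on the same chromosome are joined. The `Variant` class and `merge_adjacent` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chromosome: str
    position: int      # 1-based start position (assumed convention)
    reference: str
    allele: str


def merge_adjacent(variants: list[Variant]) -> list[Variant]:
    """Join substitutions that directly follow each other into a single variant.

    Intended for SNVs and MNVs only; insertions and deletions are not handled.
    """
    merged: list[Variant] = []
    for v in sorted(variants, key=lambda x: (x.chromosome, x.position)):
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.chromosome == v.chromosome
            and last.position + len(last.reference) == v.position
        ):
            # Extend the previous variant instead of reporting a second one.
            merged[-1] = Variant(
                last.chromosome,
                last.position,
                last.reference + v.reference,
                last.allele + v.allele,
            )
        else:
            merged.append(v)
    return merged
```

The length of the merged allele is then also what a length column, such as the new one in the variant track table, would let a user filter on.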
- Variant annotation:
- New tool for comparing variants between two samples
- Filter against known variants and Annotate from known variants: An MNV in the input track can be annotated as a partial match of an SNV in the track of known variants, if the SNV is a subset of the MNV.
- Filter against known variants: There is a new option to let MNVs be annotated as an exact match if several SNVs in the track of known variants can be joined to represent the full MNV allele sequence input track.
- When running the “Annotate with overlap information” tool using an annotation track as input and a variant track as parameter track, the column describing the specific variant in the Track Table now shows the position and description of the variants. The variant description also appears in the track tooltips when holding the mouse over the variants.
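The exact and partial matching of MNVs against known SNVs described in the two known-variants items above can be sketched as follows. This is a simplified illustration, assuming the known variants are available as a mapping from position to the set of known alternative bases; the function name and the returned labels are not taken from the Workbench.

```python
def classify_mnv(mnv_position: int, mnv_allele: str, known_snvs: dict[int, set[str]]) -> str:
    """Classify an MNV against known SNVs as 'exact', 'partial' or 'none'.

    known_snvs maps a position to the set of known alternative bases there.
    """
    if not mnv_allele:
        return "none"
    matched = [
        base in known_snvs.get(mnv_position + offset, set())
        for offset, base in enumerate(mnv_allele)
    ]
    if all(matched):
        return "exact"      # the known SNVs can be joined into the full MNV
    if any(matched):
        return "partial"    # at least one known SNV is a subset of the MNV
    return "none"
```

Here an MNV "GT" at position 100 is an exact match when G is known at position 100 and T at position 101, and only a partial match when just one of the two is known.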
- Workflows:
- Automatic adjustment of layout in workflows. It is now (again) possible to adjust the connected workflow elements automatically. Right click in the workflow editor to access a menu with the option "Layout". Clicking on "Layout" will adjust the layout of the workflow. The layout can also be adjusted with the quick command Shift + Alt + L.
- Automatic update of tools in workflows. Tools in existing workflows will automatically be updated when opened from the Navigation Area. If new parameters have been added to the updated version of the tool, these will be used with their default settings. A workflow can be kept in its original form by saving the updated workflow with a new name as this will ensure that the old workflow is kept rather than being overwritten.
- Information: In the “Manage Workflows” dialog a new tab has been added providing information about the workflow (such as when it was built and which version of the workbench was used).
- Highlight used elements: In the Side Panel under “View mode” it is now possible to select “Highlight used elements”, which will show all elements that are used in the workflow. Unused elements are grayed out. The “Highlight used elements” can also be activated with the quick command Alt + Shift + U.
- Highlight Subsequent Path: Is a further development of the old option “Mark Subsequent Path”. If you right click on the name of one of the tools in a workflow, it is possible to select “Highlight Subsequent Path”, which will highlight the path in the workflow from the tool that was clicked on and further downstream. All other elements in the workflow will be grayed out.
- Create Installer: A workflow can now be installed directly from the workbench. This can be done with the “Create Installer” button (or the quick command Alt + Shift + I). Three options exist in the “Create Installer” dialog: 1) Install the workflow on your local computer, 2) Install the workflow on the current server (requires that you are logged on to the server and that you are the administrator), and 3) Create an installer file to install it on another computer.
- Export can now be part of workflows.
- Workflow enabled elements can be dragged directly from the Toolbox into the workflow editor.
- Workflow images can be copied from the editor by using Ctrl + C (copy) and pasted into the desired destination with the Ctrl + V command.
- All elements can be removed from the workflow with the shortcut Alt + Shift + R.
- Previously, when running the “ChIP-Seq Analysis” tool, the result would be a copy of the read mapping with annotations added. Now the annotations are added to the read mapping used as input. Workflows using the "ChIP-Seq Analysis" tool must be manually updated, deleting the ChIP-Seq workflow element and adding it again.
- Reinstallation of modified workflows can now be done directly with the “Create Installer” function. A pop-up dialog provides the option to make "forced import" of an already installed workflow.
- Speed improvements in the workflow editor mean that the user experience when editing large workflows has been greatly improved.
- New tools that are now workflow-enabled:
- Classical Sequence Analysis, Alignments and Trees
- Classical Sequence Analysis, General Sequence Analysis
- Classical Sequence Analysis, Nucleotide Analysis
- Molecular Biology tools, Sequencing Data Analysis
- Track Tools, Annotate and Filter
- Track Tools, Graphs
- Resequencing Analysis, Compare Variants
- Transcriptomics Analysis, General Plots
- De Novo Sequencing
- 3D Molecule Viewing: The integrated viewer of structure files, the 3D Molecule Viewer, has been completely redesigned. The Molecule Viewer offers a range of tools for inspection and visualization of proteins and other molecules stored in structure files from the Protein Data Bank (PDB).
- De novo assembly
- New tool: Map Reads to Contigs. This tool allows mapping of reads to contigs. This can be relevant in situations where contigs have been imported from an external source, the output from a de novo assembly is contigs with no read mapping, or if you wish to map a new set of reads or a subset of reads to the contigs.
- Scaffolds can be exported in AGP format: scaffolded contigs are exported as individual contigs and not as a single scaffold with N's inserted in between contigs. This allows for submission-ready data.
- Great performance improvement when updating the contig sequence based on reads that are mapped back to contigs.
- Tracks: Several new features have been added
It is now possible to:
- When there are more reads than can be shown in the available view area, an overflow graph will be displayed below the reads. The overflow graph was previously shown in grey. Now the overflow graph is shown in the same colors as the sequences. Hence, it is now possible to distinguish forward, reverse and paired reads in the overflow graph as well as mismatches in reads.
- Insertions from variant tracks and reads tracks can now be shown in tracks.
- For variant tracks, a new side-panel option “Insertion” allows the user to select whether to display insertions or not.
- For reads tracks insertions seen in more than a given percentage of reads are shown. The default percentage is 1%, setting it to 0% will show all insertions (like the cluster editor) and setting it to 100% will show no insertions.
- Insertions in reads tracks that are present at a frequency below the specified threshold are shown with a vertical line in the reads to indicate its location.
- Reads tracks now also have a mouse-over tooltip that provides information about insertions at specific positions. This tooltip reports the number of reads that contain the insertion and lists what was inserted.
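The threshold behaviour for insertions in reads tracks can be summarized in a small sketch. It follows the description above (insertions seen in more than the given percentage of reads are shown, the rest are indicated by a vertical line); the function name, the return labels and the use of the number of covering reads as the denominator are assumptions.

```python
def insertion_display(reads_with_insertion: int, reads_covering: int,
                      threshold_percent: float = 1.0) -> str:
    """Decide how an insertion in a reads track is drawn.

    Returns 'shown' when the insertion is seen in more than threshold_percent
    of the covering reads, otherwise 'marker' (a vertical line in the reads).
    Returns 'none' when there is no insertion or no coverage at the position.
    """
    if reads_covering <= 0 or reads_with_insertion <= 0:
        return "none"
    frequency = 100.0 * reads_with_insertion / reads_covering
    return "shown" if frequency > threshold_percent else "marker"
```

Because the comparison is strict, a threshold of 0% shows every insertion and a threshold of 100% shows none, matching the behaviour described for the two extremes.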
- Extract reads from read tracks in two different ways:
- Extract sequences from tracks. Allows extraction of all reads as single sequences or as sequence lists.
- Extract from selection. Allows the creation of a reads track containing only reads that fall within the selected region, and of specific types.
- Four new options are available in the Side Panel for Track layout when viewing a reads track:
- Show quality scores: Makes it possible to adjust the colors of the residues based on their quality scores. In cases where no quality scores are available, blue (the color normally used for residues with a low quality score) is used as default color for such residues.
- Matching residues as dots: Replaces matching residues with dots in reads tracks. This option makes it easier to spot variants.
- Show read type specific coverage: When enabled, the coverage graph that summarizes those reads that could not be explicitly shown is now replaced by one coverage graph for each read type found in the Reads track. This can be used for easy and visual comparison of the strand specific coverage.
- Only show coverage graph: When enabled, only the coverage graph is shown and no reads are shown.
- A new tool has been included: “Identify Graph Threshold Areas”. This tool uses graph tracks as input to identify graph regions that fall within certain limits (thresholds that have been specified by the user).
- Extract annotations from track. This tool makes it very easy to extract parts of a sequence (or several sequences) based on its annotations.
- Create a track list with the shortcut Ctrl + L
- The create histogram tool now also accepts graph tracks as input.
- The error message "Too much data for rendering. Either zoom in to view data, or adjust data aggregation threshold" has now been added to the big grey box that appears in cases where a track cannot be shown. Previously only a big grey box was shown with no further explanation.
- Opening a large table view of a variant track is no longer blocking the user interface. It is running in the background, and it is possible to stop loading the data by closing the table view.
- The Coverage analysis tool is a new tool that can find regions in a read mapping where the coverage is suddenly dropping or rising.
- The "Assemble Sequences" and "Assemble Sequences to Reference" tools are now batch, server and workflow enabled.
- Assemble Sequences: Trimming is no longer integrated with the “Assemble Sequences” tool. This means that trimming must be done separately with the “Trim Sequences” tool.
- Export framework redesigned
- Export of multiple files: you can export several files in one go. The naming of the file will default to the name used in the Navigation Area of the Workbench, but the user can specify a naming pattern to use instead.
- Export formats: A new column “Exports selected” has been added to the “Select exporter” table that indicates with a “Yes”, “No” or “Partly” whether the currently selected element can be exported with the given exporters. Partly means that you have made a selection of elements where only some of them can be exported by the selected exporter.
- Improved usability with a quick-select dialog for choosing the right export format. The dialog includes a description of each exporter that can be quickly filtered.
- Export can be integrated into workflows
- Support for direct compression of exported files in zip and gzip.
- Previously, VCF export required the user to know that both a variant track and a sequence track should be selected before exporting. This has changed, so that the user only has to select the variant track as input, and the sequence track is supplied as a parameter. This means it is more obvious that it should be selected, and it also means that the choice of sequence track will be remembered for the next VCF export.
- The folder view has been improved with the following:
- It is now possible to drag and drop objects from the folder editor. This will create a copy of the objects at the selected destination.
- Attribute columns will be left empty if the attribute has not been defined (previously attribute values that had not been defined were set to 0 and checkboxes were shown as unchecked).
- A new column showing the first 50 residues of each sequence has been added as an option.
- The column with the name “Length” has been renamed to “Size”.
- The column “Size” shows the length of a sequence, or for sequence lists, the number of sequences e.g.:
- Sequence or contig lists: the number of sequences/contigs
- Read mappings: the length of the consensus sequence
- De novo assemblies: the length of the reference
- Alignments: the length of the alignment
- The Side Panel “Save/Restore Settings” function has been expanded with a new feature:
- The “Save/Restore Settings” function (found at the top of the Side Panel) has been redesigned. It is now possible to save settings in two different ways. 1) The settings can be saved for the element type in general, e.g. for a track you would save settings “For Track View in General”. The settings are then applied each time you open an element of the same type, which in this case means each time a track is opened from the Navigation Area. These “general” settings are user specific and will not be saved with or exported with the element. 2) The settings can be saved with the specific element only, e.g. for a track you would save settings “On This Track Only”. The settings are saved with only this element (and will be exported with the element if you later select to export the element to another destination).
- Alignments: If you have one particular sequence that you would like to use as a reference sequence, it can be useful to move this to the top. This can now be done automatically by right clicking on the sequence name and then selecting “Move Sequence to Top”.
- The Sequence List Table has been improved with a new feature. A new column showing the first 50 residues of each sequence has been added as an option.
- SOLiD import now accepts XSQ files
- The following Plug-ins are now fully integrated in the Workbench:
- InDels and Structural Variation (old plugin name: "Structural Variation")
- Local Realignment
- Extract Annotations
- The tomato genome, Solanum lycopersicum SL2.40.18, is now available in the Download Genome tool.
- Phylogenetic trees:
- Create Tree now supports the Kimura 2-parameter substitution model for DNA sequences and Kimura's distance estimate for protein sequences (Kimura 1983).
- It is now possible to construct Maximum Likelihood phylogenies from protein sequences.
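As an illustration of the Kimura 2-parameter model mentioned in the list above, the following is a minimal sketch of the standard distance formula, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. It is not the Workbench implementation: the function name `k2p_distance` and the choice to skip gaps and ambiguity codes are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Hypothetical helper for illustration only; positions where either
    sequence has a gap or ambiguity code are skipped (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the distance to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, `k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` has one transition in ten positions and gives roughly 0.112.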
Improvements
- Scrolling in read mappings: Scrolling through read mappings with the mouse wheel can now be done faster. Shift + Alt + Mouse wheel scroll is 10x as fast as Alt + Mouse wheel scroll. When zoomed all the way in, each mouse wheel step scrolls 10 rows.
- Sort Sequences by Name: The multiplexing tool now allows a delimiter between group names in the “Sort Sequences by Name” wizard. This means that the selected group names are separated by an underscore. Previously all selected group names were merged without any delimiter.
- Cloning: The cloning editor can now work without having a designated vector. In essence this means that when no vector is selected you go directly in “Stitch mode” when a fragment has been selected, whereas you go in “Cloning mode” when a cloning vector and a fragment are selected.
- Renaming of data in the Navigation Area by clicking twice has been improved. Previously, you could accidentally enter rename mode when the intention was to get focus in the Navigation Area. Now, you can only trigger rename by clicking when the Navigator has focus.
- Filter Annotations on Name: The wizard layouts for the tool when used directly, as opposed to when included in a workflow, have been standardized.
- Extract consensus sequence tool:
- It is now possible to use the quality scores when resolving conflicts or disagreements between reads with “Insert ambiguity codes”. Previously, “Use quality scores” could only be selected when using the “Vote” option for conflict resolution.
- Low coverage regions are now annotated in the consensus sequence produced.
- The Extract Consensus Sequence dialog is now shown when extracting the consensus sequence when right-clicking a selection on the reference sequence in the mapping view, enabling the user to extract part of the consensus sequence.
- The Extract Consensus Sequence dialog is now shown when extracting the consensus sequence when right-clicking the name of the consensus or reference sequence, or when clicking the Extract Consensus button in the mapping table. The right-click menu option on the consensus to Open Sequence Including Gaps has been removed, since this functionality is now covered by the Extract Consensus Sequence tool.
- When using the “Translate to protein” tool, the max limit has been raised to 1GB.
- The sequence action "Open Copy" has been removed and "Open This Sequence" has been renamed to "Open Sequence".
- The alignment tool is now more memory efficient.
- Tables: Improved auto-adjustment of the column width (based on content and number of columns).
- Read mapping: The speed of running a read mapping against a masked reference has been improved significantly. When mapping reads to a reference sequence, it is possible to map reads to only selected annotated regions of the reference (= masking). Previously masking of a reference was performed by replacing the masked out nucleotides with N's. The new masking method discards the masked out nucleotides by splitting the reference into separate sequences. Hence, the masked out sequences are completely ignored in the analysis. The remaining sequence fragments are positioned according to the original unmasked reference sequence.
- Read mapping: The status bar in the lower right corner now shows the corresponding positions on the reference/contig sequence.
- The read mapper will now place ambiguous gaps to the left, as opposed to the right, to ensure better concordance with common variant databases.
- BLAST has been upgraded to BLAST+ 2.2.28 that includes a number of improvements and bug fixes. A full list of BLAST+ 2.2.28 changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- Usability improvement of simple table filtering:
- A dedicated filter button has been added to apply the filter directly without having to wait until the filter is automatically applied
- For tables with more than 10000 rows, the filter will not be applied automatically after a delay. Instead, a help text asks the user to apply the filter through the "Filter" button. This avoids premature filtering before entry of the filter text has completed. Since filtering can take some seconds for large tables, this used to be an annoyance because the user would have to wait until filtering was done to complete the entry.
- Phylogenetic trees:
- Bootstrapping with the "Maximum Likelihood Phylogeny" is now possible.
- Bootstrap values are now displayed in percent instead of absolute numbers.
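The masked read mapping improvement in the list above describes splitting the reference into separate sequences while keeping positions relative to the original unmasked reference. A minimal sketch of that idea follows; it is only an illustration of the coordinate bookkeeping, not the Workbench's mapper, and the names `split_reference` and `to_reference_position` as well as the 0-based, end-exclusive region convention are assumptions.

```python
def split_reference(reference, keep_regions):
    """Split a reference into fragments covering only the kept regions.

    keep_regions is a list of (start, end) tuples, 0-based and
    end-exclusive (a convention chosen for this sketch).
    Returns a list of (offset, fragment_sequence) pairs, where offset
    is the fragment's start in the original unmasked reference.
    """
    fragments = []
    for start, end in sorted(keep_regions):
        fragments.append((start, reference[start:end]))
    return fragments


def to_reference_position(fragments, fragment_index, local_position):
    """Translate a position inside a fragment back to the original reference."""
    offset, sequence = fragments[fragment_index]
    if not 0 <= local_position < len(sequence):
        raise IndexError("position outside fragment")
    return offset + local_position


reference = "ACGTACGTACGT"
fragments = split_reference(reference, [(2, 5), (8, 11)])
# A hit at local position 2 of the second fragment is reported
# at position 10 of the original reference.
position = to_reference_position(fragments, 1, 2)
```

Because each fragment carries its own offset, the masked-out nucleotides never have to be stored or scanned, yet reported positions still agree with the unmasked reference.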
Bug fixes
- Numbering of amino acids when calculating amino acid changes was wrong for coding regions spanning the starting point of circular chromosomes. We recommend running amino acid calculation again. Please note that the actual amino acid change is called correctly, only the numbering is affected.
- PDF export of the history of a result did not include the name and version number of the Workbench that produced the result.
- Phylogenetic trees:
- The Jukes-Cantor distance estimate now ignores all positions containing gaps in pairwise alignments.
- Disabled substitution rate estimation when the corresponding option is deselected by the user in the Maximum Likelihood Phylogeny tool.
- Fixed a bug that caused branch lengths to be estimated incorrectly for ML trees.
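To illustrate the gap handling described for the Jukes-Cantor distance estimate above, here is a minimal sketch using the standard DNA form of the formula, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. The function name `jukes_cantor_distance` and the use of "-" as the gap character are assumptions for this example; it is not the Workbench's code.

```python
import math


def jukes_cantor_distance(seq1, seq2, gap="-"):
    """Jukes-Cantor distance for two aligned DNA sequences.

    Positions where either sequence has a gap are ignored entirely,
    so they count neither as matches nor as mismatches.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a != gap and b != gap
    ]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)
```

With gapped positions ignored, `jukes_cantor_distance("A-CG", "ATCG")` is zero because the only differing column contains a gap.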
Changes
- Option to calculate RPKM values for genes without associated transcripts has been added to RNA-seq analysis.
- System requirements for Linux have changed. From this release, SuSE is supported from version 10.2. This was previously version 10.0.
- Secondary Peak Calling: The parameter “Fraction of max peak height for calling”, in the “Secondary Peak Calling” wizard, has been changed to use the interval 0-1 with 0.2 as default setting. Previously the interval was 0-100 with 20 as default setting.
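If you keep older Secondary Peak Calling settings noted down on the previous 0-100 scale, they need to be rescaled before being entered in the new 0-1 interval. The sketch below shows the conversion; `percent_to_fraction` is a hypothetical helper for illustration and is not part of the Workbench.

```python
def percent_to_fraction(old_value):
    """Convert a value from the old 0-100 scale to the new 0-1 scale."""
    if not 0 <= old_value <= 100:
        raise ValueError("old value must be in the interval 0-100")
    return old_value / 100


# The old default of 20 corresponds to the new default of 0.2.
new_default = percent_to_fraction(20)
```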
QIAGEN CLC Genomics Workbench 6.0.5
Bug fixes
- Fixed problems downloading and importing COSMIC variation data introduced in QIAGEN CLC Genomics Workbench 6.0.4: Sex chromosomes and mitochondrial genome were not annotated. We recommend everybody having downloaded or imported COSMIC variations with QIAGEN CLC Genomics Workbench 6.0.4 to re-do the download or import and re-run all analysis where this COSMIC variant track has been used.
- Various minor bug fixes.
QIAGEN CLC Genomics Workbench 6.0.4
Bug fixes
- Fixed problems with annotation file download and import with genomes with more than 22 autosomes. Read more in the FAQ
- Fixed problem in workflows: it was not possible to configure all elements when running a workflow that branched after the input element.
- Fixed issue with automated association of chromosome names during import of track data for some non-human organisms.
- Fixed problem when trying to start ChIP-Seq analysis on a QIAGEN CLC Genomics Server
- Fixed error in Find primer binding sites
- Fixed error in Quality-based Variant Detection
- Fixed problem with zero coverage not reported properly in target region statistics
QIAGEN CLC Genomics Workbench 6.0.3
Bug fixes
- Detailed mapping report: better labeling of plots
- The Create Statistics for Target Regions tool begins counting the reference positions at 0 rather than at 1. This causes a discrepancy with the reference position reported in other tools.
- Description text in progress area is now making full use of available width of the progress area
- Fixed errors relating to exporting graphics of read mappings
- Handling of line breaks in annotation notes improved
- On Linux: User interface text has been changed to not use bold font to make a better visual appearance
- ChIP-Seq annotations were not added when running ChIP-Seq on the Genomics Server. The fix means that workflows using ChIP-Seq will be broken and need to be re-configured by deleting the ChIP-Seq element and adding it again.
- Create mapping graph tracks caused problems when part of workflows
- Fixed error that caused variant detection to crash
QIAGEN CLC Genomics Workbench 6.0.2
Improvements
- An update to the de novo assembly algorithm means that it will only include Ns in the contigs when doing scaffolding, or if the reads themselves contain Ns. Previously, ambiguities in the graph behind the assembly resulted in regions of Ns, but these have turned out to be problematic for customers submitting their results to NCBI, so the algorithm is now taking extra care to avoid this.
- VCF export: the header now states the name and version of the software that produced the VCF file, and the identifier of the originating variant track is encoded as a CLC URL in the header. By default, the Workbench installer associates the CLC URL with the Workbench, so that it can directly open the file. Alternatively, the id can be pasted into the search field in the Workbench to retrieve it.
- Fragments generated from restriction site analysis can now be opened in batch. When multiple rows in the fragment table are selected, the right-click menu option will now create a sequence list with all the selected fragments.
- For alignments, mappings, BLAST results and other sequence views, the right-click options to Open Copy of Sequence and Open This Sequence have been merged to Open Sequence. If a copy should be created, use Save As with the new sequence, or drag it into a folder in the Navigation Area.
Bug fixes
- Import or download of UCSC variant tracks was only done partially with no warning to the user. Only variants on chr1 were annotated. This has now been fixed, but we strongly recommend all users downloading or importing variant data from UCSC using Genomics Workbench 6.0 to re-run the import/download using the new version.
- Trio analysis tool did not report a reference allele as a de novo mutation, even if both mother and father only had variant alleles at this position. This has now been fixed so that reference alleles are not considered special when analyzing the inheritance.
- The RNA-Seq Analysis produced only single reads in the unmapped reads list. This has now been fixed, and we encourage customers using paired reads as input and performing downstream analysis of the unmapped reads to rerun the RNA-Seq Analysis.
- In the GO Enrichment Analysis tool for variant data, some columns were missing. This has now been fixed.
- When trimming paired data, section 4 in the report did not show the right number of reads used as input.
- Several errors related to workflow configuration and execution have been fixed.
- Errors related to managing Workspaces have been fixed.
- An error occurring when using variant tracks from old versions in the Compare Variants tool has been fixed.
- Annotations were added by the Find Open Reading Frames tool, even though the option to add annotations was not selected. This is now fixed.
- Fixed an out-of-memory problem in the Create Alignment tool.
- The result of the Target Regions Statistics tool is now named after the input file.
QIAGEN CLC Genomics Workbench 6.0.1
Improvements
- The RNA-Seq tool supports strand-specific mapping of paired reads.
- Improved performance of showing and exporting coverage graphs
- Added Legal and Tabloid formats for printing
- Made the reporting of automatic pair distance ranges for de novo assembly and read mapping more user friendly
Bug fixes
- Workflows including variant detection need to be upgraded. The variant detection elements need to be re-created and connected.
- Fixed error in probabilistic variant detection that caused it to crash.
- Fixed an error in the trim report: When several trim methods were chosen, the numbers did not accurately reflect the number of sequences trimmed in each step.
- Fixed an error in the figure showing the paired distance in the RNA-Seq results report
- Fixed an error when translating DNA to protein. When more than 10 sequences were produced, the resulting protein sequence included X instead of * as stop symbol. We advise customers to re-run any analyses with the translation tool when using more than 10 sequences as input.
- Non-specifically mapped reads (multihit reads) were colored red and green instead of yellow in packed view and when disconnect pairs is enabled. This is now fixed.
- Fixed error in target region statistics when some regions were 0 bases long.
- Links to the reference sequence were missing from the history of mapping results. This is now fixed.
- Unmapped reads from de novo assemblies were not passed on to the next element in a workflow. This is now fixed.
- Various minor bug-fixes
QIAGEN CLC Genomics Workbench 6.0
New features
- Workflow: there are several important new features for workflows
- It is possible to control which parameters should be locked or unlocked. This means that the creator of the workflow can decide which parameters should be left open for adjustment when the workflow is executed.
- Usability of workflow editor greatly improved
- There is a grid for helping layout
- Visual indication of the number of possible connections as input to a workflow element
- Visual indication to show if parameters have been changed
- Handles for dragging and connecting elements have been made more clear
- Side Panel for controlling grid layout and switching on compact visualization of workflow elements
- You can highlight selected paths in a workflow
- It is now an option to attach the workflow design file with the installer to allow users to edit the workflow
- There is a special icon for workflows in the Toolbar to make the creation and installation of workflows more visible
- Several tools are now workflow-enabled:
- Workflow compatibility: with this release, all of the tools in the Resequencing folder and the Trim tool have changed. This is mainly due to the change in the variant format (explained below). Workflows using these tools need to be updated by deleting the tool, adding it again and restoring the connections and parameters that have been modified. When you open the workflow editor, the workflow elements that need to be updated are highlighted in red. For installed workflows, this needs to be done in the original workflow design, and the installer needs to be re-built and installed again. We are sorry for the inconvenience caused by this, and we are working on a solution to make the upgrade mechanisms for the next release much smoother.
- Variant detection and resequencing
- New variant data format. We recommend all users of the variant detection tools to read the change notes in the manual for this release. The main features are:
- Variants are reported with one entry per allele. This means that heterozygous variants are represented as two lines, including one line for the reference allele.
- Variants were previously joined to form MNVs. The MNV concept has been replaced by linkage groups that mark that two variants have been observed together and assures that tools like Amino Acid Changes will produce correct results.
- As a consequence, the variant types have been updated.
- As a consequence of the new data format, the Filter against Variant Database tool has been updated and renamed to Filter against Known Variants:
- The auto-link feature is now obsolete
- There are now three modes of filtering (learn more here). The filter for exact matches replaces the Haplotype Comparison tool, which has been removed from this release.
- New tool for annotating variants with flanking sequence from the reference
- New tool for removing reference allele variants
- De novo assembly
- Automatic paired distance estimation is now part of the de novo assembly
- Guidance only option is now able to use single reads as well as paired reads
- The number of Ns deriving from ambiguities in the graph data structure built by the assembler is reduced. Note that this does not refer to Ns inserted as part of scaffolding.
- Fixed problem causing scaffold annotations to be removed when updating contig sequences based on mapping
- Improved the scaffolding accuracy for overlapping contigs.
- Mapping reads to circular chromosomes is now fully supported
- Visualizations of reads that map across the starting point of the sequence are shown both at the start and end of the reference sequence, marked with >> to indicate that the alignment continues at the other end.
- All algorithms and exporters support circular mappings
- When downloading genomes using the Download Genome tool, circular chromosomes are marked as circular. If this information is important for the further analysis, please download or import a new copy of the reference genome, since this information is not part of existing tracks. Circular and linear versions of the same chromosome are compatible when used in comparisons and for track list visualization.
- New tool for extracting consensus sequence from a read mapping or BLAST result:
- A number of options for handling low-coverage regions, including putting in Ns or splitting the consensus sequence
- Ability to decide for ambiguity or voting scheme taking quality scores into account when dealing with conflicts. A noise threshold can be added for the ambiguity option.
- Consensus sequences are annotated with important events (low-coverage regions and conflicts).
- Ability to run in batch and be part of workflows
- New tool for merging overlapping pairs
- Tracks
- Scrolling in mapping tracks can be done by pressing Alt while scrolling with scroll wheel or track pad
- Vertical zooming in graph tracks can be achieved by pressing Alt while scrolling with scroll wheel or track pad
- Vertical panning in graph tracks can be achieved by pressing Alt while using the Pan tool
- VCF export of variant tracks: Please note that you have to select both the variant track and the reference genome sequence track before you click Export.
- Trim:
- Runs on multiple cores. This will greatly speed up trim on computers with multiple cores.
- The definition of adapters for adapter trim has moved from the preferences to its own file in the Navigation Area. This makes it easier to manage large sets of adapters, it solves some usability problems related to the old dialog, and it makes it possible to work with adapter trim from the QIAGEN CLC Genomics Server Command Line Tools. Adapters can be imported directly using the standard import framework, or they can be created from scratch by adding them manually in the adapter list editor.
- Target region statistics:
- The minimum coverage value is used throughout the coverage report and tracks for defining low coverage thresholds
- Additional table and plot in the report showing how many target regions have a certain percentage of the region above the low coverage threshold.
- Additional information in the track: median coverage and fraction of fragment covered by the minimum coverage
- New output type: per-base coverage table can now be created
- Detailed mapping report includes more information:
- The tables for non-specific and non-perfect matches display the fraction of all mapped reads in addition to the number of reads
- Overview plot of lengths of insertions and deletions in the read alignments
- Tables and plots showing differences between reads and reference for each base.
- Information about quality score distribution for matches and mismatches
- Distribution of mismatches on read position
- Information about number of reads with unaligned ends and distribution of lengths of the unaligned ends
- Mapping views:
- New option to disconnect pairs in the view. This is particularly useful for overlapping pairs which can be hard to tell apart in packed view.
- Small RNA annotation:
- miRBase can now also be imported from a file. Previously only direct download was possible.
- Grouping on mature now uses all mature sequences, not only the 5' end.
- Statistics on ambiguities are now available in the annotation summary report.
- RNA-Seq: fusion gene table has been changed to list broken pairs rather than gene combinations. The pairs can be extracted to a sequence list for further investigation.
- Import of tabular mapping files is no longer supported. This format was produced by the early Illumina pipelines (with Eland) and this is no longer relevant. The SAM format has taken the place as the de facto standard for mapping data.
- Toolbox improvements:
- New Favorite Toolbox: A new tab next to the Toolbox holds
- Frequently used tools. This is automatically updated based on which tools are used most frequently.
- Favorite tools: Right-clicking a tool in the Toolbox allows you to add a tool to your favorites list.
- Quick launch of tools: Pressing Ctrl + Shift + T shows a dialog for easy typing and launching tools.
- Relevant view settings are now copied when switching between different views of the same data. As an example: if you have specified a set of restriction enzymes to display in the circular view of a sequence and switch to the linear view, these enzymes will also be listed in the Side Panel here. Note that the Side Panel settings are only copied to the new view if they have been changed by the user in the old view.
- Performance when sorting large tables has been improved
- Rename can now be done through right-click menu in Navigation Area.
- Annotations on circular sequences:
- When shown in linear view they have a cleaner appearance. Before, there was a line from beginning to end of the annotation, and this has now been removed.
- When shown in circular view, it is no longer displayed as a joined annotation over the start point but as a continuous annotation.
- Alignments: The performance of the algorithm for running multiple alignments has been improved and now runs on multiple cores.
- Find Open Reading Frames can be run in batch and workflows
- Translate to protein can be run in batch and workflows
- Restriction map: Excel export now creates a sheet for both the cut sites table and the restriction map.
- Alignments can be used as input for finding primer binding sites.
- Export now has progress and can be canceled.
- BLAST results and 3D structures can be exported as text.
- Batching: Previously, results were saved in the same folder as the input data. This can now be changed and a new save folder can be specified. Sub-folders will be created for each batch unit.
- Export to fastq now supports sequences up to 32k in length
- The limit for the cloning editor has been increased to 6,000,000 bases (from 4,000,000 bases).
- Naming of output from de novo assembly and read mapping made consistent
- Create Expression Clone (LR) from Gateway Cloning produces sequence object rather than a list
- Shortcut key for Translate to Protein has changed from Ctrl + Shift + T into Ctrl + Shift + P.
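The circular chromosome support listed above notes that a read mapping across the starting point of the sequence is shown at both the end and the start of the reference. The following minimal sketch shows how such an alignment can be described as two linear segments. It is an illustration only: the function name `circular_segments` and the 1-based, inclusive coordinates are assumptions, not the Workbench's internal representation.

```python
def circular_segments(start, length, reference_length):
    """Return the linear segment(s) covered by an alignment on a circular reference.

    start is the 1-based position of the first aligned base. The result
    is a list of (start, end) tuples, 1-based and inclusive. An alignment
    that crosses the starting point of the sequence yields two segments:
    one at the end of the reference and one at its beginning.
    """
    if not 1 <= start <= reference_length:
        raise ValueError("start must lie within the reference")
    if not 1 <= length <= reference_length:
        raise ValueError("length must be between 1 and the reference length")
    end = start + length - 1
    if end <= reference_length:
        return [(start, end)]
    # The alignment wraps around: split it at the end of the reference.
    return [(start, reference_length), (1, end - reference_length)]


# A 10-base alignment starting at position 95 of a 100-base circular
# reference covers positions 95-100 and then continues at 1-4.
segments = circular_segments(95, 10, 100)
```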
Bug fixes
- Fixed a number of mapper errors causing the mapper to crash.
- Fixed a problem in the read mapper when estimating paired distances. This led to very few reads mapping as pairs.
- Fix to the proxy settings recognition meaning that Download Genomes and download of BLAST databases now work when there is a proxy setup.
- Fixed problem of not correctly formatting qualifiers in EMBL export.
- Fixed a problem sorting sequence lists on modification date.
- Test on proportions: Fixed an error caused by the wrong group being used as reference, which means that the positive values should have been negative and vice versa.
- Various bug fixes.
Plug-in releases
Structural variation plug-in has been updated with a completely new algorithm based on unaligned ends (soft clipping). The plug-in is still in beta. Read more in the user manual.
QIAGEN CLC Genomics Workbench 5.5.2
Bug fixes
- Fixed a number of mapper errors causing the mapper to crash.
- Fixed a problem in the read mapper when estimating paired distances. This led to very few reads mapping as pairs.
- Fix to the proxy settings recognition meaning that Download Genomes and download of BLAST databases now work when there is a proxy setup.
- Fixed problem of not correctly formatting qualifiers in EMBL export.
- Fixed a problem sorting sequence lists on modification date.
- Test on proportions: Fixed an error caused by the wrong group being used as reference, which means that the positive values should have been negative and vice versa.
- Various bug fixes.
QIAGEN CLC Genomics Workbench 5.5.1
Improvements
- Improved accuracy of read mapping
Bug fixes
- Important: In Genomics Workbench 5.5, the Process Tagged Sequences tool would sometimes switch the sample names of the results. We strongly recommend everybody to update to the new version, and re-run all analyses made with this tool in Genomics Workbench 5.5.
- Fixed: Various read mapper bug-fixes that made the read mapper crash on certain data sets
- Fixed: Workflows would fail when intermediate results were empty (e.g. if no variants were found and a variant track was used for subsequent analysis).
- Fixed: Consensus generation when creating standard read mappings was slow in Genomics Workbench 5.5
- Fixed: Some IonTorrent sff files would fail to import on Windows.
- Various bug fixes
QIAGEN CLC Genomics Workbench 5.5
New features
Resequencing tools
- New variant caller: Probabilistic variant detection.
- This is based on a probabilistic model in contrast to the quality-based variant caller that is based on quality analysis and cut-offs.
- Supports genomes with a ploidy of 1, 2, 3 or 4.
- Pre-filtering for non-specific matches and intact pairs
- Post-filtering of homopolymer regions and forward/reverse reads balance
- The current SNP and DIP detection tools are merged into one: Quality-based Variant Detection
- Pre-filtering for non-specific matches and intact pairs
- Post-filtering of homopolymer regions and forward/reverse reads balance
- Target regions statistics (previously a plug-in) is now integrated into the Workbench
- A new parameter: Minimum coverage that will report the fraction of each region that is covered by at least this number of reads
- Works on tracks: the regions of interest are defined in a track and the resulting per-region table is reported as a track
- Annotation and filtering tools for variants
- Annotate and filter against database variants (dbSNP, 1000 genomes or other databases that can be downloaded or imported)
- Filtering of marginal variant calls based on average base quality, forward/reverse reads balance and frequency
- Annotating variants with exon numbers
- Variant comparison
- Compare variants within group: Find variants that are shared between a number of samples
- Fisher exact test: Compare variants between case and control groups to find variants that are more common in the case than in the control
- Trio analysis: Compare child-father-mother variants to enable studies of inherited and de novo mutations
- Filter against control reads: Compare a variant track against a control sample to remove variants that are also present in the control
- Filter on haplotype comparison: Identifies variants that have the same haplotype in two samples.
- Functional consequences of variants
- GO enrichment analysis: This tool can be used to investigate the effect of candidate variants by analyzing the affected genes for a common functional role.
- Amino acid changes: Classify synonymous and non-synonymous variants and see the effect on the protein.
- Annotate with conservation scores: Annotate a variant with a score from conservation tracks that can be imported into the Workbench.
- Predict splice site effect: A simple investigation to see if the variant is within two bases of an intron-exon boundary
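The splice site check described above, whether a variant lies within two bases of an intron-exon boundary, can be sketched as follows. This is a simplified illustration rather than the Workbench's algorithm: the function name `near_splice_site`, the 1-based inclusive exon coordinates, and the interpretation of "within two bases" as the two exonic and two intronic bases on either side of each boundary are all assumptions. Strand is ignored.

```python
def near_splice_site(position, exons, window=2):
    """Check whether a position is within `window` bases of an intron-exon boundary.

    exons is a list of (start, end) tuples for one transcript, 1-based
    and inclusive. The start of the first exon and the end of the last
    exon are transcript ends, not intron-exon boundaries, so they are
    not tested.
    """
    exons = sorted(exons)
    last = len(exons) - 1
    for index, (start, end) in enumerate(exons):
        # Boundary between the preceding intron and this exon's first base.
        if index > 0 and start - window <= position <= start + window - 1:
            return True
        # Boundary between this exon's last base and the following intron.
        if index < last and end - window + 1 <= position <= end + window:
            return True
    return False


exons = [(100, 200), (300, 400)]
# Position 202 is the second base of the intron that follows the first exon.
hit = near_splice_site(202, exons)
# Position 203 is three bases into the intron, so it is not flagged.
miss = near_splice_site(203, exons)
```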
Download of reference genome and annotations
- Integrated download of reference genome sequences and annotations for selected organisms
- Example: for human hg19, you can directly download sequences, genes and transcripts, and variants from 1000 Genomes, HapMap, COSMIC and dbSNP (incl. common SNPs).
Tracks
- Genomic information for re-sequencing analysis can now be stored as tracks.
- Great power for comparison and visualization because different kinds of data (reads, variants, genes etc) are not bundled into one static file but are separated into one file per data type. This means that different data sources can be compared and visualized in a flexible way.
- Track lists provide a mechanism for combining several de-coupled tracks into one list for visualization purposes while retaining the individual files that contain the data
- All tools for re-sequencing have options to create and use tracks (e.g. read mapping, variant detection etc). More tools will be re-designed to work with tracks later.
- Tools for converting between standard sequences and mappings and tracks:
- Convert tracks to sequences, mappings etc
- Convert sequences, mappings and annotations to tracks
- Tools for filtering, annotating and merging tracks
- Support for importing files as tracks from a number of new formats:
- Fasta
- VCF
- BED
- Wiggle
- UCSC table format
- GFF / GTF and GVF
- Complete Genomics master var files
Workflow
- Workflows can be built in the Workbench to combine various tools from the Toolbox into one analysis, connecting the output from one tool to the input from another
- Workflows can be distributed and installed either in the Workbench or in the QIAGEN CLC Genomics Server
- The creator of the workflow can configure parameters for the workflow and these will be fixed when the workflow is distributed and installed
- The creator of the workflow decides which outputs from the tools are saved and which are discarded
- Workflows can be run in batch, which makes them a powerful way to process large numbers of samples through the same pipeline.
New read mapper
- Great improvement of speed for mapping (white paper to be released soon)
- Support for complex genomes with many repeats
- Re-design of wizard for read mapping to make it simpler and easier to use. Options to control consensus sequence building and annotating with conflict annotations have been removed, since they have very little relevance for the amounts of data created by NGS platforms today
- Color space mapping is still performed with the old mapper
- Automatic calculation of paired distance (only for base space data)
- Report includes percentage of reads instead of only counts
- Changed strategy for placement of gaps: previous versions tried to cluster gaps into as few units as possible. This would sometimes cause problems for variant calling because this would in some situations place the gaps differently from read to read.
- Please note that the memory requirements are different than for the old mapper. The memory requirements depend largely on the size of the reference genome. We will soon update our system requirements page to reflect this.
Sequencing QC report: Create summary statistics for sequencing data in various ways:
- General statistics on read length etc
- Quality statistics on quality scores
- Over-representation analysis of subsequences
- Analysis of duplicated reads
Re-organization of menus in general to be more genomics focused
- Classical sequencing tools organized into a subfolder (for gene and protein analysis, alignments and trees etc)
- Molecular biology tools like cloning, PCR primer design, Sanger sequencing analysis etc moved to a special folder
- Two new folders for core NGS and core track tools
- Application-specific folders for the various types of NGS applications: resequencing, de novo sequencing, transcriptomics and epigenomics
- Search tools moved to the Download menu (available from the top menu and the Tool bar)
- Different importers integrated into one menu, including the new track import. The Vector NTI import has been moved into a plug-in that can easily be installed from the plug-in manager.
- The Local Search has been moved from the Search menu (now renamed to Download) and into the Edit menu
Special notes for customers already using the Genomics Gateway plug-in
- New search tool in track list editor
- New navigation and position panel at the top of the Side Panel in the track editor
- Download tool for downloading genomic data replaces Ensembl download tool
- Unlimited number of chromosomes in tracks
- More streamlined conversion tools:
- Convert tracks to sequences, mappings etc
- Convert sequences, mappings and annotations to tracks
- Export tracks to gff, sam
- Print and graphics export of tracks
- New tool for filtering marginal variant calls
- New tool for annotating against database variants
Plugin updates
- Genomics Gateway plug-in is integrated into the standard Genomics Workbench and Server.
- Probabilistic variant detection Plug-in is integrated into the standard Genomics Workbench and Server.
- Sequencing QC plug-in is integrated into the standard Genomics Workbench and Server.
- Target regions statistics plug-in is integrated into the standard Genomics Workbench and Server
- Grid integration plug-in is integrated into the general server plug-in. If a grid preset is present on the server, the Grid option becomes available in the Workbench dialog.
- Old read mapper made available as a legacy plug-in that customers can download. This facilitates compatibility of results with previous versions and can be used when memory requirements for the new mapper are too large.
- Beta read mapper is integrated into the standard Genomics Workbench and Server.
- Biobase Genome Trax is redesigned and split into two:
- For downloading data (requires a download license)
- For annotating a variant track (requires an online license)
QIAGEN CLC Genomics Workbench 5.1.5
Improvements
- Ion Torrent paired protocols are now supported for both fastq and sff files. Read more...
- MiSeq multiplexed data directly supported. This means that the barcoded samples are recognized on import and the reads are grouped accordingly. The reads from the same sample will be grouped in its own sequence list.
- New broken pair mate locater tool for getting overview of where the mates of broken pairs in a selected region are mapped. It includes the possibility to extract a sequence list with the broken pairs.
- Aligned fasta import and export is now supported (see http://www.bioperl.org/wiki/FASTA_multiple_alignment_format). A consequence of this is that the standard fasta import of sequences will refuse to import sequences that contain gaps, assuming they should be imported as alignments instead.
- User manual includes a section on which tools benefit from computers with multiple cores.
- The license order ID is visible in the License Manager, both for static and network licenses. For security reasons, the last 10 characters of the ID are masked. This will prevent unauthorized persons from copying the license order ID to another computer, but will allow the CLC staff to identify the license used.
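The aligned fasta behaviour noted above (sequences containing gaps are treated as alignment candidates rather than plain sequences) can be illustrated with a small sketch. It assumes "-" is the gap character and uses hypothetical helper names; the importer's actual rules are not described in these notes.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a mapping of record name to sequence."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def contains_gaps(records: Dict[str, str], gap_char: str = "-") -> bool:
    """Return True if any sequence contains the gap character (assumed to be '-')."""
    return any(gap_char in seq for seq in records.values())
```

A file for which `contains_gaps` returns True would, under this assumption, be one to import as an alignment rather than as a sequence list.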
Bug fixes
- Fixed: ChIP-Seq Analysis would sometimes yield no results when the FDR could not be estimated. This error was introduced with Genomics Workbench 5.0.1. If you have had ChIP-Seq samples where no peaks were reported, we recommend re-running the analysis with the new version.
- Fixed: Cloning bug: when performing restriction cloning in regions with single-stranded DNA, you would get an error.
- Fixed: 454 paired data import: quality scores on the second part of the read were not imported.
QIAGEN CLC Genomics Workbench 5.0.1
Plug-in updates
Probabilistic Variant Detection Plug-in updated
- There is a new filter that requires sequencing reads from both strands to call a variant
- The forward and reverse coverage for each allele is reported in the output
Bug fixes
- Fixed: Downloading of protein sequences from NCBI fails.
- Fixed: Calculation of cDNA-level changes in variant detection fails in some situations.
- Fixed: Trimming tool in Sequencing Data Analysis (not for High-throughput Sequencing data) does not add annotation to sequences when the full sequence should be discarded.
- Fixed: Opening external files (e.g. pdf files or Word documents) with spaces in the file name does not work on Windows.
QIAGEN CLC Genomics Workbench 5.0
New features
- New de novo assembler.
- Scaffolding is integrated into the assembly. This means better resolution of contigs and insertion of Ns when two contigs cannot be joined in sequence but there is pair information that connects them.
- New extended report for the assembly with information about nucleotide distribution, contig length measurements and scaffolding regions.
- User interface improvements: Wizard re-designed to better reflect the process of the assembly. The parameters related to the mapping step are only available when the user chooses to map the reads back to the contigs.
- New parameter for specifying the maximum bubble size. There is a default value which is automatically calculated based on the input data.
- New white paper with benchmarks and results from quality control.
- The old de novo assembler is available as a plug-in. At the end of 2012, the plug-in will be discontinued, so it should only be used for backwards compatibility with results from older runs or if the new assembler fails.
- Printing and pdf export of read mappings: the mappings are now wrapped to make better use of the paper.
- SNP and DIP detection results include cDNA-level numbering and variant information compatible with www.hgvs.org/
- SAM files exported from the Workbench now include basic information about read groups. Furthermore, read orientation for paired reads is now preserved when exporting to SAM and BAM files.
- Improved exploitation of multi-core machines in read-mapping, RNA-Seq, and de-novo assembly.
- Improved performance and memory management for high-throughput analyses in general.
- Usability of the Close icon on tabs has been improved: it is more responsive, and clicking it to close a tab can no longer accidentally start a drag-and-drop movement.
- "Show" submenu has been removed from File Menu, and the right-click menu now includes only the relevant views and editors. This provides a better overview.
- The behavior of the Close Other Tabs function has changed so that it will close all views, regardless of the way the view area is split.
- The most common annotation types are assigned a special color per default. Other annotation types previously got the same color. This has been extended so that the Workbench attempts to find a special color for each type.
- VectorNTI import is no longer in a separate plug-in but part of the Workbench. The functionality remains unchanged.
Plugin updates
- Genomics Gateway plug-in updated
- New tools for analyzing variants in groups of samples, enabling systematic analysis of genetic variants for whole genome, exome or targeted approaches.
- Find Common Variations in Group. This can be used to find common variants in a group of variant tracks.
- Fisher Exact Test. Compares two groups of variant tracks (e.g. for case-control studies). Using the Fisher Exact test, you can see which variants are more common in the case group than in the control group.
- Filter against Control Reads. This can be used to compare a single case variant track against a negative control from the same sample. It will check whether a certain number of the reads in the control sample have the same allele present as in the case variant.
- New tools for functional annotation of variants
- GO Enrichment Analysis for identifying significant gene ontology terms, which are annotated to genes having at least one variation.
- Annotation with Conservation Scores. By importing a conservation score track (e.g. PhyloP Scores), variants can be annotated with a conservation score. Variants with a high score are assumed to alter functionally important regions.
- New data structure.
- All tracks are now saved as single files, and you can create a Track List to visualize them together.
- A tool is available for data conversion from track sets to single tracks
- New organization of the "Tool box" to provide a better overview
- Support for batching and running tools on a Genomics Server
- The Track List view supports drag and drop for adding and re-arranging tracks
- Several Graph tracks can be created and displayed
- Probabilistic Variant Detection Plug-in updated
- The probability used as threshold for the algorithm is now reported in the output
- Variants are reported with cDNA-level numbering and variant information compatible with www.hgvs.org/
- Additional Alignments Plug-in updated
- The algorithms have been updated to the most recent versions
- The list of algorithms has been reduced to two for compatibility reasons
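The Fisher Exact Test comparison listed under the Genomics Gateway update above can be sketched for a single variant as a one-sided test on a 2x2 table of sample counts. This is a textbook hypergeometric calculation under the assumption that each sample either carries the variant or does not; the function and parameter names are illustrative, and the plug-in's actual inputs and output columns may differ.

```python
from math import comb


def fisher_one_sided(case_with: int, case_without: int,
                     control_with: int, control_without: int) -> float:
    """One-sided Fisher exact p-value for 'variant is more common in cases'."""
    n_case = case_with + case_without
    n_control = control_with + control_without
    n_with = case_with + control_with          # samples carrying the variant
    total = comb(n_case + n_control, n_with)   # ways to choose the carriers
    p = 0.0
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    for k in range(case_with, min(n_case, n_with) + 1):
        p += comb(n_case, k) * comb(n_control, n_with - k) / total
    return p
```

For example, a variant seen in all 3 case samples and in none of 3 controls gives `fisher_one_sided(3, 0, 0, 3)`, which is 1/20.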
QIAGEN CLC Genomics Workbench 4.9
New plug-ins and plug-in updates
- New plug-in released: Ab Initio Transcript Discovery
Brand new tool for transcript discovery. Based on gapped alignments of RNA-Seq data, the plug-in identifies new transcripts and creates or extends annotations on the reference sequence that can be used for measuring gene expression using the RNA-Seq Analysis tool of the Genomics Workbench. The plug-in provides functionality a la Cufflinks/TopHat. Note that this used to be called the Large Gap Mapper plug-in.
- Genomics Gateway plug-in updated
- New refiner: variant frequency. This allows you to filter a variation track, so that only the variants that have a frequency above a user-defined threshold remain. Note that the filter only applies to the frequency of non-reference alleles.
- Performance improvements when visualizing read tracks
- Fixed: CDS annotations from Ensembl did not include start codons
- Fixed: Some variation tracks were not always recognized as variations. This means that the variation-specific refiners could not be used.
- Fixed: Table view of annotation tracks could have a very large number of columns that are now combined into one column.
- Fixed: There was an error when closing a view without saving changes. This could lead to subsequent errors when trying to rename tracks.
- MLST module updated
- Possible to download MLST schemes from any web site compatible with mlstDBnet
- When a new allele is called because the sequencing reads are not long enough, this is reported in the isolate view rather than “New allele”
- Structural variation plug-in updated
- Only detection of insertions, deletions and interchromosomal variations is now supported.
- The plug-in has a problem with repeats. The best way to work around this is to ignore non-specific matches when doing the mapping, to run the structural variant detection with a very stringent p-value cutoff and filter repeats out afterwards if possible (this could be by refinement with the microsatellite track from Biobase or another repeat track using the Genomics Gateway).
- Integration of exporter to export results in circos format.
See a list of all plug-ins here.
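The variant frequency refiner mentioned for the Genomics Gateway plug-in can be pictured as a simple filter. The sketch below is one reading of the note: entries for the reference allele pass through untouched and non-reference alleles are kept only when their frequency is strictly above the threshold. The `Variant` record, its field names and the strict comparison are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    position: int
    allele: str
    is_reference: bool   # True if this entry describes the reference allele
    frequency: float     # allele frequency in percent (assumed unit)


def filter_by_frequency(variants: List[Variant], threshold: float) -> List[Variant]:
    """Keep reference entries and non-reference alleles above the threshold."""
    return [v for v in variants
            if v.is_reference or v.frequency > threshold]
```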
New and improved features
- Process tagged sequences
- A summary report is now available with an overview of the number of reads per bar code.
- You can search for barcodes (MIDs) on both strands, supporting new 454 protocol.
- Core management: you can restrict the maximum number of cores that the Workbench is allowed to use. This can be useful when the Workbench is running on a system with shared resources where other applications need reliable access to CPU when the Workbench is doing analyses. This is mainly an issue for the De novo assembly and Read Mapping algorithms but the restriction applies to all algorithms that use several cores.
- Multi-site Gateway Cloning. You can perform multi-site gateway cloning and in a few clicks create your expression clones with multiple fragments. The existing Gateway Cloning tool has been expanded so that you can easily recombine several fragments as well as continue using it for the standard Gateway Cloning.
- Find Binding Sites and Create Fragments improved:
- If your template sequence contains ambiguity nucleotides (like N, Y etc), these will no longer count as mismatches when checking your primers. Note that the primer base still needs to be covered by the ambiguity symbol (e.g. a T would still be a mismatch if the template sequence has an R, which means either A or G).
- Fixed: When using multiple template sequences, the choices to open or annotate a fragment from the fragment table did not work properly. They always applied to the first sequence although the fragment was located on another sequence (as indicated in the table).
- Exporting fastq format no longer includes redundant name of the read in the quality score line. Now the name only appears once per read.
- Enhancing the nomenclature of reporting amino acid changes in variant detection:
- p. prefix included
- ? used for unknown (rather than non-standard “Unknown”)
- = used to denote an allele which agrees with the reference sequence (rather than missing entries or entries like Ala45Ala)
- [...] used around ,-separated lists of changes, each change coming from a different CDS annotation
- [...];[...] scheme used to separate multiple alleles at same site
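The ambiguity rule described for Find Binding Sites and Create Fragments above (a primer base matches a template ambiguity symbol only if the symbol covers that base) can be sketched with the standard IUPAC nucleotide codes. The function name is hypothetical, the primer is assumed to contain only A, C, G and T, and both strings are assumed to be given in the same orientation.

```python
# Standard IUPAC nucleotide codes and the bases each one covers.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer: str, template_site: str) -> int:
    """Count primer bases not covered by the template symbol at the same position."""
    if len(primer) != len(template_site):
        raise ValueError("primer and template site must have the same length")
    return sum(
        1
        for p, t in zip(primer.upper(), template_site.upper())
        if p not in IUPAC[t]
    )
```

Here a T against an R counts as one mismatch, while an A against an R or any base against an N counts as none.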
Bug fixes
- Fixed: Import of SOLiD data failed when multiple sets of paired data were selected.
- Fixed: Annotations spanning the sequence from start to end were not displayed correctly when the sequence was wrapped. The annotation was only displayed on the first line.
- Fixed: Set-up experiment would crash when using many samples.
- Fixed: Calculation of consensus sequence in read mappings: Sometimes a majority of gaps would be ignored and a base erroneously introduced in the consensus sequence. It occurs when 1) there is no coverage in an initial segment of the reference sequence, and 2) a gap is encountered in the global read alignment. From that point onwards, gap counts are included in the consensus vote, but they are taken from the start of the mapping (where they are all 0), so they are out of sync with associated base counts. High gap counts would then kick in further downstream, possibly making the consensus a gap where it should not be. We recommend checking your mapping results manually if you rely on using the consensus sequence for further analysis.
- Fixed: importing adapters for trimming and barcodes for de-multiplexing did not work properly for CSV files and empty rows in Excel files were not allowed.
- Fixed: Motif search did not exclude regions with Ns when the option “Exclude matches in N-regions for simple motifs” was selected.
QIAGEN CLC Genomics Workbench 4.8
New plug-ins and plug-in updates
- New scaffolding de novo assembler released as a beta plug-in.
- New read mapper released as a beta plug-in. First version without color space support.
- New probabilistic variant detection released as a beta plug-in.
- Genomics Gateway beta plug-in updated:
- Direct download of annotations from Ensembl through the Workbench.
- Support for importing zipped data
- Multiple files can be imported in one go
- Conservation track from UCSC can now be imported
- Common SNP track from UCSC can now be imported
- Tools to merge and copy tracks
- New refiner to extract a subset of genes from a gene track (look for the Name filter refiner)
- SpliceSite refiner to annotate variations that affect exon/intron boundaries
- Various bugs fixed
- Check the updated manual
- Structural variation beta plug-in released in version 2:
- Now support for inter-chromosomal structural variations
- Works with mappings created using the Large Gap Mapper beta plug-in
- Check the updated manual
New and improved features
- De novo assembly improvements:
- Word size can now be manually adjusted
- When update contigs is not selected, the resulting mapping table will also include contigs where no reads map back. This means that the number of rows in the table will be identical to the number of “Simple contigs” produced by the de novo assembler. Previously contigs with fewer than two reads mapped back would be omitted from the table.
- Merge Mapping Results will produce a mapping table when mapping tables are provided as input
- New button to extract a subset of mappings from a mapping table
- Mapping tables now include a row for reference sequences where no reads map. This is done to provide consistency of results. Opening such an entry in the table will just open the reference sequence in the table.
- You can switch between compactness levels by pressing the Alt key while scrolling with your mouse or touchpad.
- SNP detection no longer ignores ambiguity bases in the reads. Each ambiguity code is treated as a separate variant; no merging of the possible variants covered by each ambiguity code is attempted (this typically only has an effect when using Sanger sequencing data since standard NGS platforms do not use ambiguity base calls).
- Translation in the Side Panel of nucleotide sequences now includes options to translate “All forward” or “All reverse” reading frames.
- Conflict table view of read mappings: reference positions also reported in addition to the consensus sequence position.
- Alignments: it is now possible to copy all annotations from one sequence to other sequences in the alignment.
- Cloning editor: number of restriction cut sites and motifs are shown separately for the sequence currently displayed and for all sequences in the list.
- Restriction enzymes updated with latest REBASE version.
- Clean-up of the Workbench window so that it no longer holds information about which Workspace is active. This information is now displayed with check boxes in the Workspace menu.
- SAM import and export format is now described in detail in the user manual.
Bug fixes
- Fixed: Orientation of SOLiD mate-pair data was not set correctly on import. This meant that the reads were marked as broken pairs after mapping. We strongly recommend that all users re-run the import if using SOLiD mate-pair data.
- Fixed: Virtual tag lists created with RNA failed
- Fixed: For circular molecules, the Find Open Reading Frames tool did not find reading frames on the negative strand. We recommend that users rerun any reading frame analyses on circular molecules.
- Fixed: Experiments tables can now be exported in Excel and csv formats
- Fixed: BLAST searches at NCBI always searched nr or nt, regardless of which database was specified. This has been a problem since the release of QIAGEN CLC Genomics Workbench 4.7
- Fixed: If a combination of trim options is used, like quality trim or length trim in addition to adapter trimming on both strands, the reads could end up reverse complemented.
- Fixed: Import of paired data generated by Illumina Casava 1.8 did not match the pairs correctly. Users are advised to re-import and re-analyze all data imported from Casava 1.8.
- Fixed: Pattern discovery wizard failed when the tool was run for the second time.
- Fixed: De novo assembly sometimes failed on Mac OS 10.7 Lion.
- Fixed: Errors for read mappings with the text “premature end of .cas file” have now been fixed. This has only been a problem on Windows.
- Fixed: Certain annotation types were mapped to generic annotation types when exporting sequences in Genbank format.
QIAGEN CLC Genomics Workbench 4.7.2
Bug fixes
- Fixed: A cache-related bug which would sometimes result in errors when running large jobs.
- Fixed: The UniProt search has been updated to reflect URL-changes at uniprot.org.
- Fixed: A problem with interpretation of broken pairs on re-import from SAM format files.
- Fixed: A problem with microarray experiments where large experiments could not be analyzed.
QIAGEN CLC Genomics Workbench 4.7.1
Bug fixes
- De novo assembly produced empty results
- Paired distances for read mapping were not recorded correctly in history
- Read mapping in batch: the minimum and maximum paired distance fields were enabled even though the “Override” checkbox was unchecked
- Improved performance of packed view rendering
- Various minor bug-fixes
QIAGEN CLC Genomics Workbench 4.7
New and improved features
- Mapping
- Packed view of mapping data: a great improvement of the default way of visualizing NGS reads when mapped to a reference.
- New mapping data format supports multiple alignments and allows for import and full visualization of Complete Genomics evidence files in SAM format
- New plug-in for gapped read mapping of e.g. cDNA to genomes
- New plugin to detect Structural variation
- Action to detect structural genomic variation from paired read information
- Action to detect copy-number variation (CNV) from coverage information
- New and more flexible data structures to store information about paired data
- All history entries will from now on include the version number of the software
- Previous limit at 2 billion for the maximum number of reads in one analysis has been removed.
- Reporting of amino acid changes in SNP and DIP detection now follows recommended nomenclature more closely w.r.t. changes that affect start codons and changes that cause indels at the amino acid level.
- Performance of Excel 2010 exporter improved in terms of speed and memory requirements
- When using a license server, the Workbench user can now specify a custom user name which can be checked in the license server configuration. This makes it possible to get more fine-grained control of the users of the license server.
- Export of trace data in scf format.
- BLAST
- BLAST parameters have been changed so that the number of threads is 1 by default (there is a bug in the BLAST code provided by NCBI which makes it fail on certain data sets when using multiple threads)
- The “Mask lower case” option has been removed
- Tool to download BLAST databases from NCBI within the Workbench
- The BLAST Database Manager has been improved to show when referred databases are missing
Bug fixes
- Fixed: References between SNP tables and mapping results were broken when exporting-importing data.
- Fixed: Summary mapping report did not mention customized mapping parameters when running in batch mode
- Fixed: Various SAM/BAM import and export errors
- Fixed: When running adapter trimming searching on both strands, some reads were marked as “reversed” in the result. The only consequence is incorrect reporting of the number of forward and reverse reads in the mapping results.
- Fixed: Mapping report did not calculate read length for individual reads when using paired data
- Fixed: Alignment-based primer design failed for columns with many gaps
- Fixed: “Find Binding Sites and Create Fragments” did not find binding sites where the primer extended the 5′ end of the template sequence
- Fixed: DIP detection would crash on large data sets
- Fixed: Import of certain special Genbank files failed
- Fixed: RNA-Seq report with paired data: total number of reads was counted as pairs rather than individual reads
- Changed: RNA-seq transcript variants are named using ‘.’ rather than ‘_’. Note that it is not possible to create transcript-level experiments based on old and new samples
- Changed: Label of Illumina quality score selector has been changed to reflect the 1.8 update of the Illumina pipeline which now uses quality scores on the Phred scale
- Various minor bug fixes
QIAGEN CLC Genomics Workbench 4.6.1
Bug fixes
- RNA-Seq would crash when selecting prokaryote as organism type
QIAGEN CLC Genomics Workbench 4.6
New and improved features
- Import of Ion Torrent data. A special importer has been made for Ion Torrent data in fastq or sff format. Read more.
- Checkbox for reporting merged SNPs. Read more.
- An adapter trim setting for SOLiD 50bp small RNA reads has been added. Read more about adapter trimming.
- SNP detection: When minimum paired coverage is set, reads from broken pairs will be completely ignored. Read more.
- DIP detection: Reporting of amino acid changes better reflects nomenclature consensus.
- RNA-Seq: the transcript-level sample includes a column for the ratio of unique to total transcript reads. Note that this means that results generated with this version cannot be used in older versions. Read more.
- Better support for color space SAM/BAM files.
- Local BLAST is faster when blasting against small databases
- Export in color space fastq format. When data is marked as color space, exporting in fastq format will produce a file with color encoding rather than bases.
- The plug-in used by the Workbench can now be installed using the Download Plug-ins tab in the Plug-in Manager.
Bug fixes
- Fixed: In certain situations, the data-specific parameters in read mapping did not take effect
- Fixed: When running read mapping in batch, the data-specific parameters were only applied to the first data set in the batch
- Fixed: Local BLAST did not work on Mac OS 10.5
- Fixed: Read mapping and RNA-Seq crashed because the reference could not be found.
- Fixed: Joined annotations did not get the right offset when inserting a sequence in the cloning editor.
- Fixed: Import of csfasta paired data crashed when one read had a dot in the beginning of the sequence.
- Import of paired qseq files: the read pairs are now joined correctly when importing paired qseq files
- Fixed: Import of GO annotation files did not work
- When processing tagged paired data sets, the resulting files were not marked as paired. This meant that subsequent analyses did not make use of the paired information.
- Various minor bug fixes
QIAGEN CLC Genomics Workbench 4.5.1
Bug fixes
- Fixed issue with synonymous overhang characters in cloning editor
- Graphics export now works with restriction sites shown
- Gene Ontology Annotations can now be imported
- ChIP-seq analysis adjusted for the use of gapped aligner – ChIP-seq analysis with previous version should be redone
- Improved support for Mac OS X systems with Japanese language
- Improved support for systems with 512 MB of memory or less
- Blast: Fixed issue with BLAST database creation taking too long under certain circumstances
- Blast: Fixed issue concerning errors when input sequence names contained illegal characters
- Blast: Fixed issue with Extract And Open option being erroneously disabled
- Blast: Option to enter custom Entrez Query limits in Blast at NCBI re-introduced.
- Blast: Improved speed when using Blast results as input to wizards.
QIAGEN CLC Genomics Workbench 4.5
New features:
- Batching functionality of all high-throughput sequencing tools. It is now possible to start batch runs, e.g. running 12 samples through RNA-Seq Analysis in one go. Read more.
- RNA-seq: transcript-level expression values and support for paired data
- Included option to use paired information in RNA-seq. Read more.
- Expression values can now be stratified into transcript level expression values, both for single and paired reads. Read more.
- SOLiD data: new algorithm for mapping reads allows much higher fraction of reads to be mapped. Rather than a score limit, you now specify the stringency of the mapping using length and similarity fractions. Read more.
- Similarity fraction for mapping of long reads is now available as a user-specified option (this was previously automatically set). Read more.
- Simple reporting of putative gene fusions when using paired data. Read more.
- Note about compatibility: Results from earlier versions should not be compared with results from this version.
- BLAST tools have been redesigned
- New Blast Database manager for easy administration and management of local BLAST databases. Read more.
- More user-friendly way of creating and accessing local BLAST databases. Read more.
- Much more stable design of both BLAST at NCBI and Local BLAST when running large data sets.
- The SNP Annotation using BLAST tool has been discontinued.
- See migration notes for using your old databases here.
- SOLiD data: new algorithm for mapping reads allows much higher fraction of the reads to be mapped.
- Rather than a score limit, you now specify the stringency of the mapping using length and similarity fractions. Read more.
- Note about compatibility: Results from earlier versions should not be compared with results from this version.
- Multiplexing: Process tagged sequencing data
- It is now possible to import and use a file with bar codes and sample names. This makes it easier to process data with a high number of multiplexed samples. Read more.
- You can specify separate output folders for each sample, making it convenient to batch process the subsequent analyses.
- High-throughput Sequencing Import includes an option to place data into sub-folders (useful for batching subsequent analyses)
- Cloning tool re-designed to make it easier and faster to perform restriction cloning. Read more.
- Restriction sites used to select target vector and fragment. Read more.
- Sequences can now be displayed in circular mode in the cloning editor. Read more.
- Only one sequence displayed at a time (there is a list at the top of the view to switch between sequences). Read more.
- Option to clone several fragments and adjust overhangs and orientation in one dialog. Read more.
- New cloning tutorial available for a quick introduction. Read more.
- Improved layout of restriction site annotations
- Linear view: There is a new option for displaying labels as “Stacked” which means that the labels of overlapping cut sites can be discriminated. Read more.
- Circular view: There is a “Radial” option that will place restriction sites (and annotations) as close to the sequence as possible with a radial layout. Read more.
- Improved layout of general annotations
- Linear view: There is an option to separate restriction sites and annotations in separate layers.
- Circular view: There is a “Radial” option that will place annotations (and restriction sites) as close to the sequence as possible with a radial layout.
- Motif search available in Side Panel
- Dynamic annotations will be added for motifs defined in the Side Panel (similar to restriction sites). Read more.
- Use motif lists to add your own motifs to the Side Panel.
- Annotation table now available for sequence lists, mappings, mapping tables, BLAST results and alignments. Read more.
- SNP detection reports adjacent SNPs within the same codon as one SNP. Read more.
- De novo assembly: post-processing options when mapping reads back to contig sequences have been expanded. It is now possible to preserve the original contig sequences from the assembler (they used to be replaced by the consensus sequence from the mapping). Read more.
- Support for exporting tables as tab-delimited files.
- Audit option: manual editing of sequences will be recorded with an annotation on the sequence (this has to be switched on in the Preferences dialog). Read more.
- The default database of restriction enzymes can be expanded (requires manual edit of database file). Read more.
- The default set of codon frequencies can be expanded (requires manual edit of table files). Read more.
- Improved option to export and import Side Panel settings. Read more.
- Memory allocation: the default memory allocation for the Workbench changes from 75% to 50% of available physical memory with a maximum at 50 GB.
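The new default memory allocation rule described in the last item can be illustrated with a small sketch. The function name and the use of gigabytes as the unit are assumptions made for illustration; only the 50% share and the 50 GB cap come from the note above, and the Workbench's actual startup logic may differ.

```python
def default_memory_allocation_gb(physical_gb: float) -> float:
    """Return an illustrative default allocation in GB.

    Hypothetical helper: 50% of physical memory, capped at 50 GB,
    as stated in the release note. Not the Workbench's own code.
    """
    if physical_gb < 0:
        raise ValueError("physical memory cannot be negative")
    return min(0.5 * physical_gb, 50.0)


if __name__ == "__main__":
    for installed in (8, 64, 256):
        print(installed, "GB installed ->", default_memory_allocation_gb(installed), "GB allocated")
```

Under this rule, machines with more than 100 GB of physical memory all end up at the 50 GB cap.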
Bug fixes
- SNP detection bug with corrupt complementary CDS annotations.
- SNP detection: color correction errors now count when filtering SNPs (this has become important with the new mapping algorithm for SOLiD data).
- The molecular weight calculation for the sequence statistics report is more accurate and is now reported for both single- and double-stranded molecules.
- Various bug-fixes
QIAGEN CLC Genomics Workbench 4.0.3
Improvements:
- Enhanced usability of GSEA analysis wizard: The “Remove duplicates” option is now a check box to switch on and off. Before, the choice of switching off was implicit by choosing Feature ID as the identifier. Now this is explicit using a check box.
- Improved performance rendering large tables, particularly those with html formatting.
Bug fixes
- SNP and DIP detection previously ignored overlapping pairs. Now they count (as one read) if they fulfill the quality criteria (SNP detection). In cases where the two parts of the pair disagree, the pair does not count. We recommend running all SNP and DIP detections based on overlapping pairs data sets again (this would be the case if the minimum distance when mapping the reads is lower than two times the read length). There is no need to re-run mappings – just the SNP/DIP detection.
- ChIP-Seq: the “nearest gene” reported was not always correct. This was the case for the last peak on each chromosome and also in cases where the order of the gene annotations in the reference file did not correspond to the order of the annotations on the actual sequence. We recommend running all ChIP-Seq Analyses again to get the correct reporting of nearest genes. There is no need to re-run the mappings.
- SNP Annotation Using BLAST failed with certain query sequences (the result could not be shown)
- Fixed crash of 454 import on certain Linux and Mac systems
- SOLiD import accepts read names with -P2 at the end
- Improved import of SAM/BAM files:
- Better support for files from SOLiD Bioscope
- Preliminary support for Complete Genomics files (The actual alignment is not represented completely – insertions that relate to a consensus sequence will be represented as unaligned ends in the imported mappings. This should be taken into account when looking for variations.)
- In the Sequencing Data Analysis-> Assemble Sequences to Reference, the conflict resolution was disabled when not including a reference sequence in the output.
- When importing sequences from Genbank files, mRNA annotations now prefer taking name after “locus_tag” rather than “product”.
- Various minor bug fixes
QIAGEN CLC Genomics Workbench 4.0.2
Bug fixes:
- Fixed error when importing 454 SFF files
- Fixed error when importing SOLiD data with quality scores when the reads had “.”
- Fixed error mapping large data sets on Windows 64-bit systems
- Fixed error when opening tables generated by the Transfac plug-in and the primer search tool
- Fixed errors when running analyses on experiments generated from RNA-Seq results
- Genbank export: annotations on the negative strand were not in the right order
- Fixed memory and performance issues related to import of many sequences, e.g. from ACE files.
- Various minor bug fixes
QIAGEN CLC Genomics Workbench 4.0.1
New features:
- Improved performance of table filtering. Removed limit on the number of rows that can be filtered.
- Option to search for read names in mapping results (and also sequence lists and BLAST results).
- Improved performance of conflict table.
- Better layout of graphics export and printing of mapping results: reference and consensus sequence repeated to provide an orientation context on all pages.
- Extracting consensus sequence of mapping tables is now running in the background to provide a better user experience.
Bug-fixes:
- Problem regarding mapping of base-space data erroneously in color space. Under special circumstances, the user settings file contained the wrong default parameters and caused the mapping to be in color space rather than base space. We recommend running mappings performed in Genomics Workbench 4.0 again with Genomics Workbench 4.0.1.
- Fixed problem with SNP detection on large data sets suddenly running very slow.
- Scalability improvements in mapping and de-novo assembly with drastic improvements in performance
- Fixed various problems regarding editing alignment and read mappings.
- In the detailed mapping report, the zero coverage section was empty when there was only one reference sequence.
- Various smaller bug fixes.
QIAGEN CLC Genomics Workbench 4.0
New features:
- Small RNA Analysis
- Brand new tool for analyzing small RNA (including miRNA) data sets
- Adapter trimming
- Counting of tags
- Annotation using miRBase and other resources
- Visualization of miRNA variants
- Expression analysis
- Renaming and redefining concepts
- Reference assembly -> Read mapping. We adjust to the common term used today for aligning sequencing reads to a reference sequence.
- Contig -> Read mapping. The result of read mapping was previously called a contig (i.e. the alignment of reads to a reference sequence). Now, the term “contig” is used exclusively for results from de novo assembly. The result of mapping reads is called a “read mapping”.
- Paired-end -> Paired. We now distinguish during import between Paired-end and Mate pair data. Once imported, there is no difference, and they are both called “Paired”.
- Trim redesign
- Brand new adapter trimming including library of adapters
- Performance improved
- Multiple data sets supported as input
- Summary report of the trimming
- Improved SAM/BAM import
- BAM format now supported, both import and export
- More robust implementation
- Better performance
- Preview panel making it easier to match reference and SAM/BAM file
- Reference sequence name spaces automatically converted to underscores when comparing with SAM/BAM file
- High-throughput Sequence Data Import
- Gzip support
- SOLiD fastq format supported (when downloading SOLiD data from Sequence Read Archive, SRA). Read more
- 454 paired data: Support for both FLX and Titanium linkers (also the possibility to add custom non-palindromic linkers). Read more
- Improved support for SOLiD paired-end data. Read more
- Support for data from Illumina Pipeline 1.5. Read more
- Import of tabular alignment files: it is now possible to specify a read name from the file to be imported with the read. Read more
- Better compression of reference sequences (lower memory footprint and disk space usage)
- Performance improvement of read mapping algorithm
- Improved memory management in general: lower memory footprint and shorter management overhead pauses.
- Improved memory handling of large tabular data sets.
- RNA-Seq:
- New de novo assembler has replaced the old one, making the de novo assembly plug-in obsolete. Read more
- SNP and DIP detection
- Dialog usability improved by adding an advanced panel for advanced users
- Minimum counts have been made more clear by creating a Minimum and Sufficient count
- Contig report has been renamed to Detailed Mapping Report and has been split up to support data with many reference sequences (e.g. when mapping against contigs from de novo assembly). Read more.
- Redesign of product graphics
- Improved consistency of data handling including faster listing of folder contents
- Performance when saving small files significantly improved
- Performance of ACE export improved, especially for long reference sequences or read mapping tables.
- Sequence annotations are packed to lower memory footprint and disk space usage, especially for SNP, DIP, and Conflict annotations.
- Improved performance of reading data files from shared drives.
- REBASE collection of enzymes updated to latest version
- BLAST: In the overview BLAST table, it is now possible to extract query sequences. Read more
- Process tagged sequences: it is now possible to input barcodes on a comma-separated list. Read more
- Folder structure (expanded/collapsed folders) is preserved through the life-time of a wizard (e.g. when selecting input data and reference for read mapping)
- Find in Side Panel: separators are allowed when performing position search (e.g. 1.000.000 or 1,000,000 or 1’000’000 or 1 000 000). Read more
- It is now possible to pause and restart processes involving read mapping and de novo assembly (except the accelerated mapping part of the analyses). Read more
- Normalization of expression data: it is now possible to do “Reads per 1,000,000”-style normalization of count-based data. Read more
- New preference group called “Data” to hold information about adapter sequences and Gateway cloning primer additions. Read more
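As a rough illustration of the “Reads per 1,000,000”-style normalization listed above, the sketch below scales raw counts so that each sample sums to one million. The function name and the dictionary-based input are assumptions for the example; the Workbench's own normalization options may differ in detail.

```python
from typing import Dict


def per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw feature counts so that the sample total becomes 1,000,000.

    Hypothetical helper illustrating count-based normalization;
    not taken from the Workbench itself.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot normalize a sample with zero total count")
    return {feature: count * 1_000_000 / total for feature, count in counts.items()}


if __name__ == "__main__":
    sample = {"geneA": 250, "geneB": 750}
    print(per_million(sample))
```

Scaling every sample to the same total makes counts comparable between samples sequenced to different depths.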
Bug-fixes:
- Print of folder content now takes settings in the Side Panel into account
- Process tagged sequences of paired data: it was not possible to specify one read without sequence (necessary for Illumina barcodes using paired data)
- Better memory handling in conflict table
- Read mapping: fixed windows errors on large data sets, fixed color space errors
- RNA-Seq: max number of mismatches when running color space data could be set to three in the dialog but did not take effect. Now the limit at 2 is enforced in the dialog.
- Find in Side Panel: spaces are now allowed
- Genbank import: sequence name (LOCUS) was truncated to 18 characters
QIAGEN CLC Genomics Workbench 3.7.1
Bug fixes:
- Fixed error concerning naming of dots in PCA plot
- RNA-seq: reads that extend over more than two exons are now shown correctly
- Error in folder editor that prevented all elements to be shown is fixed
- Documentation on trim using quality scores has been updated
- Names of results from reference assemblies are now named according to the input data
- Fixed error preventing manual editing of contigs under special circumstances
- Various bug fixes
QIAGEN CLC Genomics Workbench 3.7
New features
- Global alignment for long reads when running reference assembly algorithm
- Gapped color-space alignment when running reference assembly
- Significantly improved speed of all operations with large data sets
- RNA-Seq analysis:
- Performance optimization: A run of 44 million reads against the mouse genome now takes 32 minutes on an eight-core computer with 32 GB RAM. This used to take more than two hours. With the previous version, a lot of small temporary files were created and deleted, which took a long time and impacted the computer’s general responsiveness. In comparison, only a small fraction of those temporary files are created with the new version.
- New option to specify minimum required exon-overlap of reads spanning an exon-exon junction. Read more…
- New RNA-Seq report which gives statistical overview of the assembly process. Read more…
- Result table now reports number of exon-exon- and intron-exon junction spanning reads.
- Result table now reports chromosome location of genes. Read more…
- Visualization of reads that span exon-exon junctions. Read more…
- Reads mapping equally well to intron-exon and exon-exon boundaries are now identified as unique exon-exon spanning reads.
- RPKM is better defined in the user manual. Read more…
- Default setting for multi-hits is now 10 as in the Mortazavi paper Read more…
- Very short reads are now assembled allowing more mismatches.
- Expression analysis:
- Volcano plots: you can now choose the values to plot on the x-axis. Choose between “Difference” and “Fold change”. Read more…
- Table view of bar plots shows the same intervals as are shown in the bar plot.
- Generic importer for expression array data in tabular format. Read more…
- Generic importer for expression experiment annotation data in tabular format. Read more…
- Gene Ontology (GO) files can now be used to annotate an expression experiment. Read more…
- Tag profiling: You are no longer allowed to annotate tag samples, only experiments
- Side panel of experiment table has been re-organized to provide better overview. Read more…
- Import high-throughput sequencing data
- Import tool moved from Toolbox to File menu and tool bar. Read more…
- Import and export of the SAM alignment format. Read more…
- Import of alignment data in tab-delimited format, including the ELAND alignment format. Read more…
- Import of Illumina QSEQ file format. Read more…
- Linker in 454 data is also found for non-perfect matches Read more…
- Enhanced visualization of contigs:
- Un-aligned nucleotides on the inside of paired-end reads are now shown
- Paired-end reads have a single line connecting the pair rather than gaps
- Drag handles to move the aligned/unaligned border are only shown when you can see the bases of the reads. This means that you need to have zoomed in to 100% or more and chosen Compactness levels “Not compact” or “Low”. Otherwise the handles for dragging are not available (this is done in order to make the visual overview simpler). Read more…
- It is possible to display pairs that overlap
- The unassembled reads from an assembly now preserve their paired-end status (this also means that you can get two lists – one with pairs and one with the remainder of the broken pairs)
- SNP detection output table now reports if multiple non-synonymous SNPs exist in same codon
- SNP detection dialog: Quality filtering is no longer disabled when quality scores are missing. Due to performance issues it is not possible to check if quality scores are present. The SNP detection will just omit the quality score filtering if quality scores are not present.
- SNP detection: possible to detect variants with frequency less than 1 percent.
- Contig report now includes information about coverage for both covered regions and whole reference. Read more…
- Opening consensus sequence including gaps will also put Ns before the consensus sequence starts and after it ends
- The trim functionality now includes the option to trim away a predefined number of nucleotides from either end of a read. Read more…
- Gateway cloning. Simple and easy-to-use support for creating Gateway entry and expression clones. Read more…
- Search for matches among all your saved primers. The Find Binding Sites tool has been greatly improved to now allow you to search among all your primers. In addition, you also get a tabular output of the binding sites and possible fragments. Read more…
- In silico PCR: create PCR product based on primer pair and template sequence (including primer extensions). As part of the improved Find Binding Sites and Create Fragments tool, you can extract the PCR product from the list of fragments through a right-click menu. Read more…
- Check primer specificity. As part of the improved Find Binding Sites and Create Fragments tool, you can search with a primer pair in a list of potential target sequences and see an overview table of binding sites and mismatches as well as potential PCR fragments. Read more…
- Deployment
- You can set a path to the default data location used when the Workbench starts for the first time. This is a feature to help system administrators control where new installations per default save their data. Read more…
- Support for removing tools accessing the internet (NCBI BLAST, update notifications etc). Read more…
- General import and export
- Support for import of complex regions from GFF files
- Export tables and reports in Excel format.
- Import section of user manual re-structured to provide better overview. Read more… Expression data importers are now described in technical detail in a separate section. Read more…
- You can now export multiple sequence lists in fasta format
- Forced import of zip files is now supported (it will force import the contents of the zip file)
- The standard import now accepts gzip and tar files as well as zip
- If a forced import fails, there will be more technical information about what went wrong, allowing you to identify bad formatting of the import files
- Both the Genbank and GFF importers now make several attempts at naming genes that do not have a gene name. They will iteratively try the following qualifiers: “product”, “locus_tag”, “protein_id” and “transcript_id”
- When importing genbank files where the length stated does not match the actual sequence, a warning is shown but the sequence is accepted.
- When exporting in csv format, the Locale settings are used to determine whether comma or semi-colons should be used as delimiter (comma used for US locales)
- GFF plug-in has been updated to accept complex annotations
- Miscellaneous
- Advanced retyping of annotations using the annotation table. Read more…
- Improved reporting of situations when a full disk prevents saving of data
- Downloading sequences using drag and drop from the search table no longer creates a “Downloading…” node in the folder. The download process can be monitored in the Processes tab.
- Primer design now supports PCR fragments longer than 5000 bp.
- Extract Sequences moved from File menu to Toolbox-> General Sequence Analysis. Read more…
- Better progress feedback on various dialogs
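The fallback naming used by the Genbank and GFF importers (described under “General import and export” above) can be sketched as follows. The qualifier order comes from the note; the function name, the dictionary representation of qualifiers, and the use of a “gene” qualifier as the primary name are assumptions for illustration only.

```python
from typing import Dict, Optional

# Order of fallback qualifiers as listed in the release note.
FALLBACK_QUALIFIERS = ("product", "locus_tag", "protein_id", "transcript_id")


def pick_gene_name(qualifiers: Dict[str, str]) -> Optional[str]:
    """Return a name for a gene annotation, trying fallbacks in order.

    Hypothetical helper: assumes the primary name lives in a "gene"
    qualifier, which is an assumption and not stated in the note.
    """
    primary = qualifiers.get("gene")
    if primary:
        return primary
    for key in FALLBACK_QUALIFIERS:
        value = qualifiers.get(key)
        if value:
            return value
    return None  # no usable qualifier found


if __name__ == "__main__":
    print(pick_gene_name({"locus_tag": "b0001", "protein_id": "NP_414542.1"}))
```

Trying the qualifiers in a fixed order keeps the naming deterministic when an annotation carries several of them.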
Bug-fixes
- Problem with order of genes when setting up RNA-Seq experiments. If the order of input sequences was not the same for all samples, the experiment would be wrong.
- Fixed wrong orientation of SOLiD mate-pair data
- Fixed problem with naming of tabs. The fix means that on Windows and Linux unsaved data now gets a * rather than make the tab name bold and italics. (This has always been the behavior on Mac OS X).
- Fixed problem displaying the “Copying…” label when copying data and then updating the folder
- Misleading label when assembling reads shorter than 15 bp. Now it says that these reads will be ignored in assembly
QIAGEN CLC Genomics Workbench 3.6.5
New features
- Export of annotations in GFF format (note that annotations with joined regions are not supported)
- Export of sequence data in fastq format
- Now possible to perform detailed manual editing of contigs with up to 100,000 reads
- Improved performance when zooming large contigs displaying a coverage graph
- Now possible to change the linker used when importing 454 paired-end data
Bug-fixes
- Fixed problems importing expression annotation files
- Fixed error when trimming for vector sequences
- Fixed tblastn numbering issue
- Various bug-fixes
This update is recommended for all users.
QIAGEN CLC Genomics Workbench 3.6.1
Issues resolved
- Problem when adding annotations to an Illumina array file
- Error handling annotated tag-data
- DNA Strider files could lose their name upon import
- Rare misplacement of annotations when editing very large sequences
- Problem when importing color space data alongside a .cas file
QIAGEN CLC Genomics Workbench 3.6
New features
- Tag profiling: tag-based transcriptomics. Read more…
- ChIP-Seq analysis is now able to (optionally) use a control sample. Read more…
- Advanced view of elements in a folder including batch editing. Read more…
- Create new contig from selection. Read more…
- Import high-throughput sequencing data: you can now import without quality scores. Read more…
- Reference assembly of short reads: user can now choose between local and global alignment Read more…
- Reference and de novo assembly output options have been changed so that you no longer need to decide whether you want a contig table or single contigs. Whenever more than one contig is produced, the Workbench automatically creates a contig table Read more…
- Contig report for reference assemblies: GC content of the reference sequence now included
- Extract sequences improvements Read more…
- Now contig tables, overview BLAST tables and RNA-Seq samples can be used
- User feedback in the dialog is improved
- Problem with extracting paired-end reads correctly is fixed
- mRNA Sequencing tool changed name to RNA-Seq Analysis to reflect the consensus about this naming in the NGS community
- Heat maps and clustering improved:
- You can now perform different clusterings on an experiment and save them all. In the Side Panel you can switch between the different clusterings to show the corresponding heat map. Read more…
- Terminology change in the clustering dialogs: “similarity measure” and “cluster distance metric” are replaced by “distance measure” and “cluster linkage”, respectively.
- Annotating samples or experiments for expression analysis:
- This is now possible even if the number of features doesn’t match the number of annotations
- You can now decide which column in the annotation file to use for matching to the sample or experiment. Read more…
- Because of this extra option, you can no longer include an annotation file when setting up an experiment. You need to add the annotations afterwards
- Microarray import improved: added support for import of more versions of native Illumina BeadChip and GEO expression files
- “Find” in text view now accepts Enter as command to find the next hit
- Importing VectorNTI archives previously resulted in a sequence list. Now it imports as single sequences.
- Import list of sequences in csv format: each line in the file represents a sequence with name, optional description, and sequence. Typically useful for importing lists of oligos.
- You can now drag results from NCBI searches into the view area to open directly (previously you could only drag into a folder to save)
Bug fixes
- Assembly against many reference sequences could run out of memory. This has been significantly improved.
- Integration with the Genomics Server: fixed an error when selecting contigs from a contig table for analysis. This is no longer possible (i.e. you have to save the contig first).
- Microarray import: Fixed a bug that prevented import of expression data with white spaces in column names.
- Various bug fixes
QIAGEN CLC Genomics Workbench 3.5.1
Issues resolved
- Rare failure when importing very large Illumina files
- Memory problem when mapping against many (>20,000) references
- Rare concurrency issue when translating DNA->protein in e.g. SNP detection
- Problem rendering scatter plots without lines
- Graphics export of contigs
- ChIP-seq table did not show the right distance to nearest gene
QIAGEN CLC Genomics Workbench 3.5
Data formats
- Data generated with version 3.5 cannot be read in earlier versions
New features
- New ChIP seq tool
- Contig report that records various statistics and graphs for contigs, including e.g. N75, N50 and N25 statistics, coverage distribution, contig size distributions.
- Extension of RNA-seq functionality to also handle color space data
- RNA-seq now outputs and can use unique and total gene/exon reads as well as median coverage as measures of expression.
- Implementations of statistical tests for comparing expression levels of count-based expression measures as may be produced in RNA-seq
- Kal’s test for differences of proportions in single sample to single sample comparisons.
- Baggerley’s test for differences of proportions in two groups with replicates comparisons.
- New filter options in SNP and DIP detection.
- SNP and DIP detection: as supplement to minimum variant frequency in percent, you can also specify a minimum variant count.
- SNP detection: just as DIP detection there is a maximum coverage filter
- SNP detection: there is now a “ploidy” setting just as for DIPs. This is used to mark SNPs as “complex”. The “Genetic code” drop-down box has been moved to step 3.
- Alignment of SNP and DIP tabular output to allow for easy merging of SNP and DIP tables into complete variance tables
- Support for Sanger Institute defined FASTQ and new Illumina format (QSEQ)
- Import of NGS data now allows discarding of sequence names for large savings in disc space and processing time
- Performance optimization when adding sequences to a sequence list. This now works for NGS data also.
- SNP and DIP detection can now be performed directly on RNA-seq output contig tables
- Exporting coverage graph to csv file now has an option to include or exclude gaps. Excluding gaps will make the file use the reference sequence coordinates
- Much improved memory performance and processing time of large data sets
- Improved performance when handling trace data. Trace data now takes up 50% less disk space. This means that the data is opened and saved much faster and less memory is used.
- You can now specify minimum length of contigs to be reported in de novo assembly
- “Reverse Contig” has been renamed to “Reverse Complement Contig.” Functionality is un-changed.
- Import of Illumina expression bead arrays and bead array annotation files
- Import of Affymetrix Chp files (CHP / PSI)
- Transformation of expression values now supports square root transformation.
- Better feedback on processes: there is a tool tip showing details and start time.
- Translation of DNA to protein in sequence views can now be set to follow existing CDS/ORF annotations.
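For readers unfamiliar with the N75, N50 and N25 statistics in the new contig report, here is a minimal sketch using the common definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches the given fraction of the total assembly length. The function name is an assumption, and the report's exact definition should be checked against the user manual.

```python
from typing import Iterable


def nx_statistic(lengths: Iterable[int], fraction: float) -> int:
    """Length of the contig at which the cumulative length of the
    longest contigs first reaches `fraction` of the total length.

    Hypothetical helper using the common N50-style definition.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]  # unreachable for valid input; kept for safety


if __name__ == "__main__":
    contigs = [100, 80, 60, 40, 20]
    for label, frac in (("N25", 0.25), ("N50", 0.50), ("N75", 0.75)):
        print(label, nx_statistic(contigs, frac))
```

With this definition N25 >= N50 >= N75, since a larger fraction requires including shorter contigs.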
Bug fixes
- Fixed error when trimming reads for vectors
- Fixed out-of-memory error in mRNA sequencing
- Fixed error in mRNA sequencing when gene annotations were present outside the reference sequence
- Fixed error when parsing files from Clone Manager (cm5-files)
- UniProt search works again
Note
- This version introduces a new data format which is not readable by older versions of the software.
QIAGEN CLC Genomics Workbench 3.2.0
New features
- DIP detection – automatic examination and reporting of insertions/deletions in reference assembly contigs. In the Toolbox under High-Throughput Sequencing. Can be used together with SNP detection to systematically examine positions where the reads differ from the reference sequence. This eliminates the need for manually inspecting gaps and conflicts in the contig. Learn more…
- 15% less disk space usage of imported NGS data sets.
- 25% faster assembly of NGS data sets.
Bug fixes
- Under certain circumstances, trim failed on Mac OS X
- mRNA Sequencing: Downstream/upstream options should be disabled when using un-annotated reference sequences
- Color space information now shown per default for mixed data sets including color space reads
- De novo assembly report: the number of reverse matches was sometimes reported as negative
- Corrections to the ACE export
- Better performance of files with many annotations
- Fixed an error in RNA Structure Evaluation
- Fixed error and improved performance of Join Sequences tool
- Fixed error in Find Binding Sites on Sequence: no longer distinguish between lower and upper case
- Various small fixes
QIAGEN CLC Genomics Workbench 3.1.0
New features
- Support for reference assembly of SOLiD data in color space (learn more). You need to reimport your data to make use of color space.
- Viewing of color space data in contig results (learn more).
- Option of using non-annotated sequences (e.g. EST-library) for RNA-seq (learn more).
Bug fixes
- Assembly and mRNA sequencing errors (“Empty match not allowed” and “Could not read from temporary file”) fixed
- Under special circumstances, quality scores were not aligned correctly
- SNP detection with an RNA sequence as reference failed
- SNP detection performance for annotated sequences improved
- Find in the Side Panel did not support spaces when searching for annotations
- In the cloning editor under special circumstances, an error occurred when replacing a selection with fragment
- Sequence statistics: codon counts were not correct when using RNA sequences
QIAGEN CLC Genomics Workbench 3.0.1
Updates
- Fixed an error when trimming NGS data
- Fixed an error in the contig view when deleting a sequence that was selected
- Fixed an error when changing the filter of a sorted table
- Fixed error when assembling a mix of paired ends and single reads under special circumstances
- Fixed error in import of cas file based on SOLiD data from the CLC NGS Cell
- Fixed a rare error when running SNP detection on a contig table
- Made mRNA Sequencing accept a sequence list as reference
- Fixed table view of contigs: sometimes an empty entry would appear which did not reflect the reads at the current position
QIAGEN CLC Genomics Workbench 3.0
New features
Transcriptomics
- Support for both microarray- and sequencing-based (RNA-Seq) expression data
- Visualization: Interactive heat map, table and scatter plot views
- Transformation and normalization tools
- Quality control tools including principal component analysis, MA- and boxplots
- Experimental design tools for two- or multiple group comparisons
- T-tests and ANOVA analysis with support for paired/repeated measures
- Multiple testing corrected p-values (Bonferroni and/or FDR)
- Clustering algorithms: hierarchical clustering, k-means and Partitioning Around Medoids (PAM) with support for various distance and linkage measures.
- Ability to import NetAffx annotation arrays and add annotation to experiments
- Tools for Gene Set Enrichment Analysis (GSEA) and for Hyper-Geometric based tests for overrepresented annotation categories (e.g. ‘GO’stats or specific protein pathways).
- Ability to work with Expression Arrays and RNA-seq results at the same time, enabling comparison of results
- Facility for annotating sequences from GFF or GTF files (as used by Ensembl and the UCSC Genome Browser), useful for annotating reference genomes before assembly
- Statistics on numbers of matching and unique gene, exon and exon-exon boundary spanning reads
- Calculation of gene expression measures (RPKM) from mRNA sequence data and generation of gene expression profiles (RNA-Seq analysis)
- Discovery of novel transcripts/exons through mapping of mRNA reads to whole chromosomes or genomes, comparing matches with known exons
- Interactive views of assemblies and derived gene expression data
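As a rough illustration of the RPKM measure mentioned above (reads per kilobase of exon model per million mapped reads), the standard formula can be sketched in Python. The function and parameter names are hypothetical, and the Workbench's exact read-counting rules are not described in these notes.

```python
def rpkm(gene_reads: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    Standard formula: RPKM = 10^9 * C / (N * L), where
      C = reads mapped to the gene's exons   (gene_reads)
      N = total mapped reads in the sample   (total_mapped_reads)
      L = summed exon length in base pairs   (exon_length_bp)
    All names here are illustrative, not Workbench identifiers.
    """
    if exon_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon length and total mapped reads must be positive")
    return gene_reads * 1e9 / (exon_length_bp * total_mapped_reads)


# Example: 500 reads on a gene with 2,000 bp of exons, 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
```

Normalizing by both exon length and sequencing depth is what makes the values comparable between genes and between samples.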
Assembly
- Long reads assembly significantly faster
- No upper limit on number of reads in de novo assembly (there is still a limit regarding the size of the genome)
- New simple output option for de novo assembly: only generate consensus sequence instead of full contigs. At the last step of the de novo assembly wizard, you can now choose between “Full contigs” and “Simple contig sequences”. The latter option will result in a sequence list with all the consensus sequences. This is much faster and less demanding for the computer. You can always create full contigs later by running a reference assembly with the consensus sequences as references.
- Quality of trimming for contamination from own sequences improved. It is now possible to trim off smaller primer sequences.
- SNP detection:
- Accepts multiple contigs and table of contigs (the table output includes a new column for the name of the contig)
- For coding regions (annotated with CDS/ORF annotations): changes on the amino acid level as a consequence of a SNP are now reported (both in the table and in the annotations).
- General performance improvements
- Right-clicking a graph (e.g. coverage) on a contig lets you export the data points to a csv file.
- Contig table shows Latin and common name of reference sequences. This is beneficial if you perform a reference assembly against references from different species.
- Multiplexing – Process Tagged Sequences now has an option to filter away groups with few sequences. This is an advantage if you have very ambiguous barcode definitions where sequencing errors would lead to a lot of “false” groups. These groups can now be filtered because of their small size. (The option is called “Minimum number of sequences” and is found in the third step of the wizard.)
- Coverage info is now included when you export a table of contigs in ace format. (It contains a “Contig Tag” of type comment (a CT clause) containing a textual description of the coverage in the form “Average coverage: 14.65”.)
- Coverage info is put into the description of consensus sequences extracted from a table of contigs (this means that if you export to fasta, this information will be included).
- Importing assemblies with more than one contig creates multi contig tables (ace and cas file import)
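The “Minimum number of sequences” option for Process Tagged Sequences described above can be pictured with the following minimal Python sketch. It assumes the barcode is a fixed-length prefix of each read; the function names and the grouping logic are hypothetical and are not the Workbench's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_barcode(reads: Iterable[str], barcode_length: int) -> Dict[str, List[str]]:
    """Group reads by their leading barcode and strip the barcode from each read."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_length], read[barcode_length:]
        groups[barcode].append(insert)
    return dict(groups)


def drop_small_groups(groups: Dict[str, List[str]], min_sequences: int) -> Dict[str, List[str]]:
    """Keep only groups with at least `min_sequences` reads.

    Small groups are typically "false" groups created by sequencing
    errors in the barcode, so they are filtered by size.
    """
    return {bc: seqs for bc, seqs in groups.items() if len(seqs) >= min_sequences}


reads = ["ACGTAAAA", "ACGTCCCC", "ACGTGGGG", "ACTTTTTT"]  # last read: barcode error
groups = group_by_barcode(reads, barcode_length=4)
kept = drop_small_groups(groups, min_sequences=2)
```

Here the single read carrying the erroneous barcode `ACTT` forms a group of size one and is dropped, while the three `ACGT` reads are kept.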
Improved user experience of processes
- Non-modal feedback from processes:
- When there is a message (e.g. from a BLAST search: no hits found)
- If you have chosen to save the results in the last step of the wizard, you will be notified when the process is done.
- Processes running on the CLC Science Server will notify when they are done.
- Possibility to open results by clicking the button next to the process
- Possibility to find and select results in the Navigation Area by clicking the button next to the process
- You can see a log of your process by clicking the button next to the process (even if you did not choose to see the log in the last step of the wizard)
Support for interacting with CLC Science Server
- Read more at http://www.clcbio.com/index.php?id=1260
3D editor re-design
- The 3D editor now allows you to select individual structure subunits, residues, active sites, disulfide bridges and even atoms, and to customize their appearance
General improvements
- Limited mode: when using a license server – if there are no more licenses left, you can still access your data. The Workbench will then run in Limited mode where only a few tools are available (corresponds to the tools found in CLC Sequence Viewer). Click “Limited Mode” in the license dialog.
- Tables:
- New advanced filter to use numerical data for filtering and to combine several filter criteria. Click the small button next to the normal filter to see the advanced filter.
- Visual feedback when sorting and filtering tables
- Improved automatic detection of column width
- Performance of graphs and plots improved
- Local BLAST is upgraded to use NCBI BLAST version 2.2.19
- More elaborate error reports including error logs
- You can specify which folder the Workbench should use for temporary files
- Extract sequences from a sequence list, contig or alignment by right-clicking the white empty space. You will then be able to extract the sequences into a list or as separate sequences.
- The “Find” option in the Side Panel of sequence views automatically detects if you have entered a position instead of a sequence.
Plug-ins
- Extract Annotations plug-in has been improved:
- Possibility to specify the naming of the sequences (based on annotation name, type etc)
- Performance improvements to make it possible to extract annotations of large genomes.
- MLST plug-in: various bug fixes
Bug fixes
- Locale settings were not automatically set correctly on the first start-up. The locale settings determine whether . or , should be used as the decimal separator. For new installations of the Workbench, it will now be set to the locale of the computer’s operating system. For existing installations, you will have to change this in the Edit->Preferences dialog.
- Fixed problem when BLASTing with an empty sequence
- Various performance improvements and bug fixes
QIAGEN CLC Genomics Workbench 2.1.1
Updates
- Reference assembly: fixed an error which meant that, in some cases, reference assembly produced different results depending on the amount of memory available.
- SNP Detection: Reads were dismissed because of gaps even though the reference sequence also had gaps.
- The Side Panel’s Find only highlighted the first hit. This is now fixed.
- Fixed error when importing 454 fna/qual files
- Extract sequences: fixed an error when extracting paired-ends sequences from contigs and sequence lists
- Local BLAST: solved a problem with applying command-line parameters; a checkbox now determines whether command-line options should take effect
- BLAST: it was possible to use a BLAST result as input and database
- Trace data: fixed an error when deleting parts of an unsaved sequence with traces
- Better performance when zooming a dot plot
- Better performance when using the Side Panel’s Find in large contigs and sequence lists
- When right-clicking a CDS annotation and translating into protein, gaps were erroneously introduced into the protein sequence
- There was an error related to selecting sequences in the Cloning editor
- Multi-select (using Ctrl / Command key) did not work for sequence lists
- Various bug fixes
QIAGEN CLC Genomics Workbench 2.1
Updates
- Support for paired-end Sanger reads
- Support for paired-end FASTA reads
- Improved user interface of High-throughput Sequencing Data import dialog
- Assembly report includes information about assembly parameters
- Corrected error when opening multiple consensus sequences
- Fixed problem with import of NGS data in FASTA format
- Improved error handling for assembly
- Fixed issue with contig selections while scrolling
- Corrected error introduced by overlapping mate-pairs
QIAGEN CLC Genomics Workbench 2.0.4
Updates
- Fixed problems when assembling large or mixed data sets
- Ensured correct setting of limit for assembly of short reads
QIAGEN CLC Genomics Workbench 2.0.3
Updates
- Fixed problems with de novo assembly
- Status properly updated when a conflict is resolved
- Assembly programs now run on older versions of Linux
QIAGEN CLC Genomics Workbench 2.0.2
Updates
- Fixed problems when scrolling very large sequences
- Fixed problem when importing very large GenBank files
- Improved possibilities for navigating contigs
- Improved stability when importing non-standard data
- Improved memory handling and stability of assembly algorithms
- Support for import of Illumina long insert paired-end data
QIAGEN CLC Genomics Workbench 2.0.1
New features
General performance improvements
- Improved performance when handling large data sets
High-throughput Sequencing Assembly
- Support for reference assembly against the human genome (i.e. reference sequences of any size)
- New and much faster algorithm for assembling short reads (less than 55 nucleotides)
- Significant performance improvements of reference assembly.
- True support for reference assembly of mixed data sets in one go. Sequencing data from different platforms (and both single and paired ends) can now be assembled together. Previously this could be accomplished by making separate assemblies and joining the contigs afterwards, but now this process is automated. Read more…
- Reference sequences can be masked based on annotations. This could be used to e.g. mask off repeat regions or only include exons in the assembly. The reference sequences have to be annotated in order to use masking. Read more…
- Assembly report includes the number of contigs produced
- Contigs from a reference assembly can also be shown in an overview table. This was previously only possible for De novo assembly. In the last step of the reference assembly wizard, there is an option: Create overview table including all contigs. Read more…
Import and export
- Support for large amounts of Sanger sequencing data with trace information. Using the Import functionality under High-throughput Sequencing, you can import large numbers of files, e.g. abi files. This imports the quality scores but discards the trace data, producing a sequence list in the Workbench that makes it possible to assemble thousands of Sanger reads. Read more
- SOLiD import of paired-ends data improved. In some cases paired-ends data also contains single reads which are now removed during import.
- Possible to import cas files created by the CLC NGS Cell (the command-line version of the assembly algorithms of the QIAGEN CLC Genomics Workbench)
- Contigs can be exported in ACE format
- Improvement of ACE file importer
- Trim information in sff files can be used during import
- Support for import of SCARF files (from Illumina Genome Analyzer systems)
- Export of graph data points in csv format
Various high-throughput sequencing improvements
- SNP detection now also reports position relative to the reference sequence as well as the consensus sequence. The table includes both positions by default (can be checked on and off), and the user decides where annotations should be added. Read more…
- SNP detection table includes information about the name of annotations covering the SNPs. Previously only the annotation type was reported.
- Trimming now also supports paired-ends data. If one of the reads in a pair is trimmed off, the whole pair will be removed.
- Partially matched reads are reported as a graph along the contig.
- Possibility to open consensus sequence with gaps. Right-click the label of the consensus sequence in the contig view and select: Open Copy of Sequence Including Gaps. The gaps will be represented by Ns in the new sequence.
- Dynamic consensus graph removed from contig view. Since contigs now have a “real” consensus sequence which is also updated to reflect changes in the reads, the dynamic consensus sequence which is switched on in the Side Panel has been removed.
- Annotations can be transferred from reference to consensus sequence in bulk. Right-click one of the annotations and choose “Copy to Consensus Sequence” or “Copy Annotations of Type xx to Consensus Sequence”.
- Multiplexing now also possible for paired-end reads
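The paired-end trimming rule described above (if one read of a pair is trimmed off, the whole pair is removed) can be sketched in Python as follows. Only the pair-removal rule comes from these notes; the quality-based end trimming, the thresholds and all names are hypothetical stand-ins for whatever trimming criteria are actually configured.

```python
from typing import List, Optional, Tuple

Read = Tuple[str, List[int]]  # (sequence, per-base quality scores)


def trim_read(read: Read, min_quality: int, min_length: int) -> Optional[Read]:
    """Trim low-quality bases from both ends; return None if too little remains."""
    seq, quals = read
    start, end = 0, len(seq)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    if end - start < min_length:
        return None  # the read is trimmed off entirely
    return seq[start:end], quals[start:end]


def trim_pairs(pairs: List[Tuple[Read, Read]],
               min_quality: int = 20,
               min_length: int = 3) -> List[Tuple[Read, Read]]:
    """Trim both reads of each pair; drop the pair if either read is lost."""
    kept = []
    for first, second in pairs:
        t1 = trim_read(first, min_quality, min_length)
        t2 = trim_read(second, min_quality, min_length)
        if t1 is not None and t2 is not None:
            kept.append((t1, t2))
    return kept


pairs = [
    (("ACGTA", [30, 30, 30, 30, 5]), ("TTGCA", [30] * 5)),  # both reads survive
    (("ACGTA", [5, 5, 5, 5, 5]), ("TTGCA", [30] * 5)),      # first read is lost
]
trimmed = trim_pairs(pairs)
```

Dropping the whole pair keeps the remaining data strictly paired, so downstream steps never see an orphaned mate.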
Plug-in updates
- New plug-in! GFF/GTF support: You can now annotate a sequence using a GFF/GTF file. The plug-in is available for all Workbenches (not CLC Sequence Viewer). Once installed, you find it in Toolbox->General Sequence Analysis-> Annotate from GFF/GTF File. Read more…
- Extract annotations plug-in updated: it now uses the name of the annotation as the name of the new sequence.
Annotation handling
- Annotation table has been greatly improved:
- supports very long, heavily annotated genomes
- usability of the filtering has been improved with feedback on the filtering process
- Advanced renaming options. Read more…
Bug fixes
- Fixed bugs related to contig editing.
- Various bug fixes.
- Fixed problem with import in 2.0 release.
QIAGEN CLC Genomics Workbench 1.1.1
New features
- Scrollbars can be adjusted manually
Problems fixed
- Fixed problems when aligning sequences with lowercase characters
- Fixed import of trace files without quality scores
- Fixed problem when removing location
- A new sequence list can be created from a selection in the table view
- Better memory handling and management of large contigs
- User definable scrollbar areas for contig views
- A few other minor bugs have been fixed.
QIAGEN CLC Genomics Workbench 1.1
New features
- Increased speed of de novo assembly
- Option of generating a contig table as a result of de novo assembly. This way the workspace is not polluted by a large number of contigs.
- Multi contig table has options for opening contigs or extracting consensus sequences for further analysis
- Much smoother scrolling on contigs when there is very high coverage
Problems fixed
- Problems with import of .ACE files
- Problems with excessive generation of files when doing de novo assembly of short read data
- Problems with the use of quality scores in SNP detection