*Tools marked with an asterisk were available in earlier Workbench versions via the Advanced RNA-Seq plugin. They can now be found in the Toolbox in the RNA-Seq Analysis folder.
These tools automatically account for differences in sequencing depth, so input data does not need to be normalized. They work with existing RNA-seq transcript expression (TE) and gene expression (GE) tracks. As a result of changes in this release, outputs from the Differential Expression for RNA-Seq tool can now be used as input to the Extract Reads Based on Overlap tool.
The three Identify and Annotate Differentially Expressed Genes and Pathways workflows, for human, mouse, and rat, have been replaced by three new workflows with the same names. The new workflows make use of the new RNA-seq tools (see the New Tools section).
RNA-Seq Analysis
- The RNA-Seq Analysis tool now supports RNA spike-ins, such as ERCC and SIRV, for quality control. This makes it possible to validate RNA-Seq experiments by comparing known spike-in concentrations to measured transcript concentrations. Spike-ins can be imported using the new RNA Spike-ins Import tool.
- The RNA-Seq Analysis report has been revised and updated:
  - We now show the distribution of the biotypes that the reads mapped to.
  - The strand specificity of the mapped reads is now reported. Ready-to-use workflows listed under the “Whole Transcriptome Sequencing” folder of the Workbench Toolbox now support strand-specific RNA-seq protocols by allowing the “Strand Specific” parameter to be set.
  - Transcript coverage plots make it possible to detect and visualize 5′ and 3′ coverage bias.
  - For paired-end reads, we now detect and warn about potential adapter read-through.
- A biotype column is now available in the Expression Track tables produced by the RNA-Seq Analysis tool, when biotype information is available.
- The “Map to gene regions only” and “Also map to inter-genic regions” mapping options of the RNA-Seq Analysis tool have been removed. The tool now always maps reads to the full reference supplied, which is equivalent to choosing the recommended “Also map to inter-genic regions” option in earlier versions.
- The RNA-Seq Analysis tool now always uses the “Expression level” option “Use EM estimation (recommended)” to quantify expression. This is more accurate than the previous default option. Differences are especially noticeable for Transcript Expression (TE) tracks.
- The RNA-Seq Analysis quantification by EM estimation now runs faster.
- In RNA-Seq analyses, reads that map uniquely to a genome position are now always marked as unique. Previously, a uniquely mapped read would be marked as ambiguous if it mapped to a position with multiple overlapping genes.
- Exon IDs are no longer included in the ENSEMBL column of transcript expression (TE) tracks generated by the RNA-Seq Analysis tool. Gene and transcript names are still listed and hyperlinked in this column.
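Spike-in quality control, as described above, comes down to asking how closely measured expression follows the known input concentrations. The sketch below is one common way to express that as a single number, not the metric the RNA-Seq Analysis tool reports: it computes the Pearson correlation between log2 known concentrations and log2 measured values. The function name, the dictionary inputs keyed by spike-in ID, and the pseudocount are all assumptions made for illustration.

```python
import math


def spike_in_log_correlation(
    known: dict[str, float],
    measured: dict[str, float],
    pseudocount: float = 1.0,
) -> float:
    """Pearson correlation of log2(known) vs log2(measured + pseudocount).

    Hypothetical helper: only spike-ins present in both dictionaries are
    compared. Known concentrations must be positive; the pseudocount keeps
    spike-ins with zero measured expression from producing log2(0).
    """
    shared = sorted(set(known) & set(measured))
    if len(shared) < 2:
        raise ValueError("need at least two spike-ins present in both inputs")
    xs = [math.log2(known[s]) for s in shared]
    ys = [math.log2(measured[s] + pseudocount) for s in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when either side is constant")
    return cov / math.sqrt(var_x * var_y)
```

A value close to 1 means measured expression scales with the known spike-in concentrations. Values well below 1 may point to a problem with the library or the quantification.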
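"EM estimation" refers to expectation-maximization. In general terms, EM distributes reads that are compatible with several transcripts in proportion to the current abundance estimates, then re-estimates abundances from those fractional assignments. The sketch below shows only that generic technique and is not the Workbench's implementation. It assumes that each read has already been reduced to a list of compatible transcript IDs and that all transcripts have equal effective length. The function name and inputs are hypothetical.

```python
def em_transcript_counts(
    read_compat: list[list[str]],
    max_iter: int = 1000,
    tol: float = 1e-10,
) -> dict[str, float]:
    """Estimate expected read counts per transcript with a basic EM.

    Hypothetical helper: each entry of `read_compat` lists the transcripts a
    read is compatible with. Transcript lengths are ignored (assumed equal).
    """
    if not read_compat:
        return {}
    if any(not comp for comp in read_compat):
        raise ValueError("every read needs at least one compatible transcript")
    transcripts = sorted({t for comp in read_compat for t in comp})
    n_reads = len(read_compat)
    # Start from uniform abundances.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(max_iter):
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates.
        counts = dict.fromkeys(transcripts, 0.0)
        for comp in read_compat:
            total = sum(theta[t] for t in comp)
            for t in comp:
                counts[t] += theta[t] / total
        # M-step: new abundances are the expected fraction of reads per transcript.
        new_theta = {t: counts[t] / n_reads for t in transcripts}
        converged = max(abs(new_theta[t] - theta[t]) for t in transcripts) < tol
        theta = new_theta
        if converged:
            break
    return {t: theta[t] * n_reads for t in transcripts}
```

For example, with three reads unique to transcript A, one read unique to B, and four reads compatible with both, the estimates converge to about 6 reads for A and 2 for B. The shared reads are split 3:1, matching the ratio of the unique evidence, rather than evenly.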
Import/Export
- A tool to import PacBio data is now available at Import | PacBio.
- The usability of data association in the Import Metadata tool has been improved. For example, a preview now shows the data items that will be associated with each metadata row.
- FASTA is now the default format the first time the Import | Tracks tool is invoked (it was GFF2/GTF/GVF in earlier versions).
- The GFF2/GTF/GVF tracks importer can no longer be used to import GFF3 format files. The new GFF3 tracks importer should be used for this instead.
- The GFF3 importer has been updated with respect to the handling of CDS features. In earlier versions, CDSs with different IDs but the same parent gene were always merged into a single CDS feature during import. This still happens when no two CDSs in the GFF3 file share an ID, that is, when every CDS has either a unique ID or no ID. For GFF3 files in which two or more CDSs share an ID, only CDSs with the same ID are merged into a single feature.
- The Import | Tracks tool now accepts files with a .fna extension.
- The way the file types available for import are displayed in the Import | Tracks tool has been improved.
- Importing files that contain data for many chromosomes into tracks is now substantially faster.
- RNA tracks imported from GFF3 format files are now colored according to their biotype.
- The Cosmic option of the Import | Tracks tool is now more flexible with regard to the column headings in the files being imported.
- An exporter has been added to export annotations on sequences or tracks to Generic Feature Format Version 3 (GFF3) format.
- A text exporter has been added.
- An option has been added to create an index file when exporting to BAM format.
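The revised GFF3 CDS merging rule described above can be expressed as a small grouping function. The sketch below is a hypothetical model of that rule and not the importer's code. It assumes each CDS line has already been parsed into its ID (or `None`), its Parent, and its coordinates. The described rule does not cover CDSs without an ID in a file where some IDs are shared, so in this sketch such CDSs are kept as separate features.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CdsRow:
    """Hypothetical parsed GFF3 CDS line."""
    cds_id: Optional[str]  # value of the ID attribute, or None if absent
    parent: str            # value of the Parent attribute
    start: int
    end: int


def merge_cds_rows(rows: list[CdsRow]) -> list[list[CdsRow]]:
    """Group CDS rows into the features they would be merged into."""
    id_counts = Counter(r.cds_id for r in rows if r.cds_id is not None)
    shared_ids = any(n > 1 for n in id_counts.values())
    groups: dict[tuple, list[CdsRow]] = defaultdict(list)
    for i, r in enumerate(rows):
        if not shared_ids:
            # No ID is shared: merge all CDSs with the same parent gene.
            key = ("parent", r.parent)
        elif r.cds_id is not None:
            # Some IDs are shared: merge only CDSs with the same ID.
            key = ("id", r.cds_id)
        else:
            # Assumption: an ID-less CDS stays on its own in this case.
            key = ("row", i)
        groups[key].append(r)
    return list(groups.values())
```

Because the choice between the two modes depends on the whole file, the function first scans all IDs and only then assigns rows to groups.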
New features and improvements
- Two new human reference data sets are available for download from the Reference Data Manager. One is based on Ensembl 86 and the other is based on RefSeq GRCh38.p9.
- The former top-level Toolbox folder “Expression Analysis” has been removed. The expression analysis tools are now in two top-level folders: “RNA-Seq Analysis” and “Microarray and Small RNA Analysis”.
- When working with Gene Sets that refer to Gene Ontology terms, gene annotations are now automatically propagated to parent Gene Ontology terms. This affects the following tools: Identify Differentially Expressed Gene Groups and Pathways, Hypergeometric Tests on Annotations, and Gene Set Enrichment Analysis (GSEA).
- The mapping tool in the Workbench, which is used by tools that include a mapping stage, such as Map Reads to Reference, Map Reads to Contigs, and RNA-Seq Analysis, has been updated. The update brings improved read mapping quality and speed (especially for longer reads), improved memory performance during index building, and various minor bug fixes. The new mapping tool corresponds to the clc_mapper tool included in Assembly Cell 5.0.3, planned for release in March 2017.
- Fixed an issue where sequence circularity was not reported in the output from the Map Reads to Reference tool.
- The default value of the “Maximum guidance-variant length” parameter in the Local Realignment tool has been changed to 200 (was 100). This change applies to all ready-to-use workflows and when the tool is launched directly.
- The Basic Variant Detection tool will no longer report N as an alternative allele when there is an ambiguous base at a variant position.
- Default values for two parameters of the InDels and Structural Variants tool have been changed when the tool is run as part of a ready-to-use workflow: “Minimum quality score” has been changed to 20 (was 0), and “Minimum consensus coverage” has been changed to 0.1 (was 0.0). Default values have not been changed in the case where the tool is launched directly.
- The report generated by the QC for Targeted Sequencing tool now uses a “≥” sign instead of a “>” sign.
- The “Additional Reporting” options in the QC for Sequencing Reads tool, “Quality analysis” and “Over-representation analysis”, have been removed. These outputs are now generated by default.
- A PubMed search option has been added to the Search for Reads in SRA tool. This returns only those runs that are associated with a PubMed abstract or full-text article.
- The Motif Search tool now supports negative lookahead in Java regular expressions.
- For new or existing sequence lists, the sequencing platform can now be specified via the Read Group setting in the Element Info view.
- It is now possible to right-click on a table cell and filter table rows based on the value of that cell by choosing options under the new context menu section called “Table filters”. This change applies to all tables where advanced filtering is available.
- The speed of sorting and loading tracks has been greatly improved. Due to these changes, tracks created with this or later versions of the Workbench cannot be used with older Workbenches. Backwards compatibility has been maintained: tracks created using older versions of the Workbench can continue to be used.
- Searching from within a Metadata Table for data elements associated with specified metadata is now much faster. For metadata-related searches to work after upgrading to Biomedical Workbench 4.0, the indices for the locations containing the relevant data must be rebuilt.
- Tutorial windows are no longer blocked when a wizard is open.
- Less temporary space is now consumed when downloading data via the Reference Data Manager.
- Various minor improvements.
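Propagating Gene Ontology annotations to parent terms, as described for Gene Sets above, means that a gene annotated to a term is also counted for every ancestor of that term. The sketch below illustrates that idea and is not the Workbench's implementation. It assumes a hypothetical `parents` mapping from each term to its direct parents, already restricted to whichever relationship types should be followed, and it assumes the graph is acyclic, as the Gene Ontology is. The term names in the usage note are invented.

```python
from collections import defaultdict


def _with_ancestors(
    term: str,
    parents: dict[str, set[str]],
    cache: dict[str, frozenset[str]],
) -> frozenset[str]:
    """Return `term` plus all of its ancestors in the acyclic term graph."""
    if term not in cache:
        result = {term}
        for parent in parents.get(term, ()):
            result |= _with_ancestors(parent, parents, cache)
        cache[term] = frozenset(result)
    return cache[term]


def propagate_gene_sets(
    gene_terms: dict[str, set[str]],
    parents: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Map each term to the genes annotated to it directly or via a descendant."""
    cache: dict[str, frozenset[str]] = {}
    term_genes: dict[str, set[str]] = defaultdict(set)
    for gene, terms in gene_terms.items():
        for term in terms:
            for t in _with_ancestors(term, parents, cache):
                term_genes[t].add(gene)
    return dict(term_genes)
```

For example, if `T_child` has parent `T_mid`, both `T_mid` and `T_other` have parent `T_root`, gene `g1` is annotated to `T_child`, and gene `g2` is annotated to `T_other`, then `T_root` ends up with both genes. Caching the ancestor sets avoids re-walking shared parts of the graph for every annotation.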
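Negative lookahead, now supported in Motif Search, lets a pattern require that a match is not immediately followed by some text, without consuming that text. The `(?!...)` syntax is the same in `java.util.regex` and in Python's `re` module, so the short sketch below uses Python to show the behaviour. The motif itself ("TATA" not directly followed by "A") is a made-up example.

```python
import re

# Hypothetical motif: "TATA" that is NOT immediately followed by "A".
# The (?!A) part checks the next character without including it in the match.
MOTIF = re.compile(r"TATA(?!A)")


def find_motif_starts(sequence: str) -> list[int]:
    """Return 0-based start positions of non-overlapping motif matches."""
    return [m.start() for m in MOTIF.finditer(sequence)]


print(find_motif_starts("TATAAGGTATAC"))  # [7]
```

The "TATA" at position 0 is rejected because an "A" follows it. A "TATA" at the very end of a sequence does match, because nothing follows it.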
Bug fixes and changes
- In all ready-to-use workflows containing the Map Reads to Reference tool, the default value of the “Cost of insertions and deletions” parameter has been changed to “affine” (was “linear”). Default values have not been changed when the tool is launched directly.
- Fixed an issue where the index building stage of the Map Reads to Reference tool did not take into account the maxcores setting in the cpu.properties file, where this had been configured.
- Fixed a bug in the QC for Read Mapping tool, which sometimes reported incorrect read counts for circular sequences.
- Fixed an issue where the Basic Variant Detection, Low Frequency Variant Detection and Fixed Ploidy Variant Detection tools reported homozygous reference insertions in cases where a heterozygous variant was possible but the insertion variant was disregarded during filtering.
- Fixed an issue where the Identify Known Mutations from Sample Mappings tool failed when it was part of a workflow and received multiple sample mappings as input.
- Fixed an issue with the Annotation Table view of a sequence where it was possible to change the types of annotations displayed at the same time as an annotation was being edited, which could lead to an error being thrown or the wrong annotation being changed.
- Fixed an issue where GenBank and EMBL exports did not conform to the quoting specifications of those formats.
- Fixed an issue with Primer Tables where an error resulted if either the option “Save Primer(s) Fwd, Rev” or “Save Fragment” was chosen and then the save operation was stopped by clicking on the Cancel button.
- Fixed an issue where in some cases filtering tables for empty values would not produce any results.
- Fixed an issue where advanced filtering did not work when looking for rows with cells containing multiple values using the filtering term “=” (equals).
- Fixed an issue where a workflow containing an export step that failed did not provide any indication that a problem had occurred.
- A sporadic Java issue that caused errors including the text “java.lang.ClassCastException: sun.awt.image.BufImgSurfaceData cannot be cast to sun.java2d.xr.XRSurfaceData” has been addressed by upgrading Java. This issue was seen primarily when using the Workbench remotely on Linux systems.
- Fixed a problem with identifying the correct sequence types from MLST schemes when the schemes contained blank characters. This issue affected Workbenches with the QIAGEN CLC Microbial Genomics Module installed.
- Various minor bug fixes.
Retirement
- The GFF exporter has been retired and is no longer available. The new GFF3 exporter should be used instead.
- The Probabilistic Variant Detection (legacy) and Quality-based Variant Detection (legacy) tools have been retired and have been removed from the Legacy folder of the Toolbox.
- Tools in the Expression Profiling by Tags folder under the Toolbox | Legacy area have been retired and this folder has been removed. The tools retired are Extract and Count Tags, Create Virtual Tag List and Annotate Tag Experiment.
- The Trim Primers of Mapped Reads tool has been retired and removed from the Toolbox. To trim primers from mapped reads, please use the Trim Primers and their Dimers from Mapping tool, which is distributed with the “QIAGEN GeneRead Panel Analysis Plugin”.
Plugin notes
- The Advanced RNA-Seq plugin has been retired. The tools from this plugin have been integrated into the software. Please see the New Tools for RNA-Seq section for more details.
Other notifications
- An option to opt out of providing anonymous usage information to QIAGEN has been added to the Workbench Preferences. We are not yet collecting any usage information so opting in or out does not have any effect at this time.