New features and improvements
Tools for annotating and manipulating sequence lists
The new Sequence Lists folder under Toolbox | Utility Tools contains tools for working with sequence lists. This includes existing tools, with new names and expanded functionality, as well as new tools:
- Split Sequence List New tool: Splits up nucleotide or peptide sequence lists. The output can be a specified number of lists, lists containing a specified number of sequences, or lists containing sequences with particular attribute values, such as terms in the description.
- Update Sequence Attributes in Lists New tool: Updates and adds information about the sequences in a list. For example, descriptions can be updated, or new information types can be added based on information provided in an Excel file.
- Create Sequence List Existing tool: Creates new sequence lists from sequence elements and/or sequence list elements. Previously available only from the File | New menu.
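The three output modes described for Split Sequence List can be pictured with a small Python sketch. This is illustrative only and is not the Workbench's implementation: the function names, the (name, description) tuples standing in for sequences, and the way sequences are distributed across lists are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical stand-in for a sequence: (name, description).
Seq = Tuple[str, str]


def split_into_n_lists(seqs: Sequence[Seq], n: int) -> List[List[Seq]]:
    """Split into n lists whose sizes differ by at most one (assumed policy)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(seqs), n)
    lists, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        lists.append(list(seqs[start:start + size]))
        start += size
    return lists


def split_by_size(seqs: Sequence[Seq], size: int) -> List[List[Seq]]:
    """Split into lists holding at most `size` sequences each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(seqs[i:i + size]) for i in range(0, len(seqs), size)]


def split_by_attribute(
    seqs: Sequence[Seq], key: Callable[[Seq], str]
) -> Dict[str, List[Seq]]:
    """Group sequences by an attribute value, e.g. a term in the description."""
    groups: Dict[str, List[Seq]] = defaultdict(list)
    for seq in seqs:
        groups[key(seq)].append(seq)
    return dict(groups)


example = [
    ("s1", "human"), ("s2", "mouse"), ("s3", "human"),
    ("s4", "rat"), ("s5", "mouse"),
]
by_count = split_into_n_lists(example, 2)
by_size = split_by_size(example, 2)
by_description = split_by_attribute(example, key=lambda seq: seq[1])
```

With the five example sequences, the first call yields two lists, the second yields lists of at most two sequences, and the third yields one list per distinct description.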
Other new functionality
- Rename Sequences in Lists Rename sequences within sequence lists by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- Rename Elements Rename elements by adding or removing characters, or replacing parts of names, optionally using regular expressions.
- Data can be imported from Amazon S3 when launching workflows, and workflow results can be saved to Amazon S3.
- A Heat map graphics exporter has been introduced for exporting heat maps to graphics file formats.
- Files containing tab separated values (.tsv) can be imported as tables using Standard Import.
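The renaming options described above for Rename Sequences in Lists and Rename Elements (adding characters, removing characters, and replacing parts of names with regular expressions) can be sketched as follows. The sketch uses Python's `re` module purely for illustration; the regular expression syntax accepted by the Workbench may differ, and the `rename` helper and the example names are hypothetical.

```python
import re
from typing import List


def rename(
    names: List[str],
    pattern: str,
    replacement: str,
    prefix: str = "",
    suffix: str = "",
) -> List[str]:
    """Replace regex matches in each name, then add an optional prefix/suffix."""
    compiled = re.compile(pattern)
    return [f"{prefix}{compiled.sub(replacement, name)}{suffix}" for name in names]


names = ["sample_001_R1", "sample_002_R1"]

# Remove a trailing "_R<digits>" and add a prefix.
renamed = rename(names, r"_R\d+$", "", prefix="run7_")

# Replace part of a name while reusing a captured group.
shortened = rename(names, r"^sample_(\d+)", r"S\1")
```

Here `renamed` holds `run7_sample_001` and `run7_sample_002`, and `shortened` holds `S001_R1` and `S002_R1`.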
Improved menu organization and tool access
- New top level menus:
- Connections For tools and functionality relevant to connections to other systems, such as a CLC Genomics Server.
- Utilities For general tools and functionality such as search tools, the Plugin Manager and Workflow Manager.
- Improvements to the contents and order of tools in other top level menus
- The Favorites tab, where favorite and frequently used tools are listed for easy access, is now available in the Launch dialog and the workflow Add Elements dialog, in addition to the Toolbox area at the bottom left of the Workbench.
BLAST related updates
- BLAST has been upgraded to BLAST+ 2.12.0, which includes a number of improvements and bug fixes. A full list of BLAST+ 2.12.0 changes can be viewed at http://www.ncbi.nlm.nih.gov/books/NBK131777.
- The list of databases available using BLAST at NCBI has been expanded, including the addition of ‘16S ribosomal RNA sequences (Bacteria and Archaea)’ and ‘28S ribosomal RNA sequences from Fungi type and reference material (LSU)’.
- When BLAST at NCBI is used with multiple query sequences, the job will continue even if particular sequences fail due to a problem. Results for successful searches (including those with no hits) are returned. Sequences missing from the results due to problems are recorded in the job log.
- Searches against the Patented protein sequences database using BLAST at NCBI work once again. Previously, these searches always failed, and the dialog reported only that no hits were found, even though NCBI had returned an error. For affected searches, the error was reported in the job log.
- Fixed an issue affecting BLAST HSP Tables where the percent overlap between BLAST hits in the reverse direction and the query sequence was calculated using a sequence length that was 2 base pairs too short, leading to incorrect values.
- Improvements have been made to make it less likely that a “CPU usage limit was exceeded” error will be returned when running blastp, blastx, tblastn or tblastx using BLAST at NCBI.
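To see why a sequence length that is 2 base pairs too short matters for the percent overlap values mentioned in the BLAST HSP Tables fix above, consider the following arithmetic sketch. The formula (overlap length divided by sequence length) and the numbers are assumptions chosen for illustration; the notes do not state the exact calculation the Workbench uses.

```python
def percent_overlap(overlap_len: int, seq_len: int) -> float:
    """Percent of a sequence covered by a hit (assumed formula, for illustration)."""
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    return 100.0 * overlap_len / seq_len


query_len = 100   # hypothetical query length in base pairs
overlap_len = 49  # hypothetical number of query positions covered by the hit

expected = percent_overlap(overlap_len, query_len)     # 49.0
skewed = percent_overlap(overlap_len, query_len - 2)   # 50.0, length 2 bp too short
```

Under these assumptions the shortened length inflates the reported value, and the effect is proportionally larger for short sequences.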
Other improvements
- When viewing data, tabs within the same tab area can be re-ordered by drag and drop.
- Multiple tables can be exported to a single file when using the following exporters: Tab delimited text, Annotation tab delimited text, Table CSV, Annotation CSV.
- A choice of extinction coefficients has been introduced in Create Sequence Statistics.
- A Create Sequence List workflow element is available, replacing the New | Sequence List element. Create Sequence List can be connected to many more tools downstream than the earlier element.
- In a workflow, Extract Annotated Regions (formerly Extract Annotations) can be connected to many more downstream tools than earlier.
- Trim Sequences specifies which version of the UniVec database was used, both in the report and in the history of the trimmed sequences output.
- When the option to create a log is enabled when launching analyses in batch mode, a log file is created for each batch unit, as well as a combined log for all the analyses. Previously, only the combined log was generated.
- The table search criteria “is in list” and “is not in list” can be used with integers without specifying thousands separators in the search term.
- The few tools that directly manipulate input elements, instead of generating a new element containing the changes as output, now generate a new element as output when used within a workflow. This allows them to be handled like any other tool in a workflow context.
- In addition to sequence elements, Add attB Sites accepts sequence lists with fewer than 10,000 sequences as input.
- Internal compression of CLC data has been improved. Elements created with this version of the software, with compression enabled, can be opened in version 21.0.5 and higher. Data must be exported or saved as uncompressed if sharing data with earlier versions of the software.
- Various minor improvements
Bug fixes
- Fixed an issue in Create Box Plot where percentiles reported in the history of a box plot element were off by one. For example, the “25%-ile” value was given the 24th percentile value. The correct values were used in the plots themselves.
- Fixed an issue causing BLAST hits with an identity below 40% to be shown in black even if the threshold for coloring was set lower than this.
- Fixed an issue where threshold values for color selectors in the side panel of the View Area could not be adjusted.
- Fixed an issue where specifying the color range values for heat maps in the side panel settings did not work.
- Fixed an issue where the names of outputs from Output elements attached directly to an Iterate element in workflows were not as intended when the metadata ({3}) placeholder was used. We generally recommend specifying the input number(s) to include in output names when configuring workflows that contain control flow elements.
- An element’s position within a folder in the Navigation Area can be controlled when copy/pasting, with the pasted element appearing above a selected element in the same folder. This fixes an issue introduced in CLC Main Workbench 8.1, where pasted elements were always placed at the bottom of the list in a folder when pasting.
- Fixed an issue where the content of the recycle bin was not shown correctly after the recycle bin had been emptied.
- Various minor bug fixes
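The Create Box Plot fix above concerns a label being paired with the neighboring percentile's value. The sketch below shows how the 24th and 25th percentiles of the same data differ. It uses Python's `statistics.quantiles` on made-up data and makes no claim about how the Workbench computes percentiles or what caused the mismatch.

```python
import statistics

# Made-up data: the integers 0..100, so the k-th percentile is simply k.
data = list(range(101))

# With n=100, quantiles() returns the 99 cut points for percentiles 1..99.
cut_points = statistics.quantiles(data, n=100, method="inclusive")


def percentile(k: int) -> float:
    """Return the k-th percentile (1 <= k <= 99) from the cut points."""
    return cut_points[k - 1]


p25 = percentile(25)  # value that belongs under a "25%-ile" label
p24 = percentile(24)  # value the label would show if it were off by one
```

On real data the gap between adjacent percentiles is usually small, which is why a mismatch of this kind is easy to overlook in a history entry.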
Changes
- The Extract Annotations tool is now named Extract Annotated Regions.
- The tool Set Up Experiment is now named Set Up Microarray Experiment.
- The workflow element for creating sequence lists is under the Utility Tools folder. It no longer appears under the “New” list in the Add Elements dialog.
- Input modifying tools within workflows generate an output element instead of directly modifying the input provided. Workflows containing these tools may need to be edited.
- The Cut, Copy and Paste buttons have been removed from the toolbar. These functions remain available from the Edit menu and via standard keyboard shortcuts.
- The Restore option under the Edit menu, for moving elements back out of the recycle bin, is now called Restore from Recycle bin.
- The Empty option under the Edit menu, for emptying the recycle bin, is now called Empty Recycle bin.
- The Java version bundled with CLC Main Workbench 22.0 is Java 11.0.10, using the JRE from AdoptOpenJDK.
Legacy tools
The following tool has been moved to the Legacy folder of the Workbench Toolbox and will be retired in a future version of the software:
Retirements
- The right-click option “Run in batch mode (legacy)” for launching installed, multi-input workflows in batch mode has been retired. Workflows can be launched in batch mode using standard launch functionality.