Restrictions on sequence names when creating BLAST databases

Issue description

New requirements for sequence identifiers, introduced with NCBI BLAST+ 2.8.1, are affecting the creation of BLAST databases in some versions of QIAGEN CLC software. (See the “Affected software and tools” section below.)

These requirements include a 50-character limit for local accessions and stringent checks on the format of accessions that resemble PDB identifiers. The effect of these changes can be exacerbated in the QIAGEN CLC software because of how we modify sequence identifiers to avoid duplicates, which are not accepted by makeblastdb, the NCBI BLAST+ program for creating BLAST databases.

Recommendations

Upgrade your software to a version where this restriction is not present. See the “Affected software and tools” section below for details.

If upgrading is not an option, the following work-arounds are available:

  1. Download a pre-formatted database from the NCBI using Download BLAST Database.
  2. Install an older version of a QIAGEN CLC Workbench and use its BLAST-related tools. See the linked FAQ about how to get installers for older versions of the software.

    BLAST databases made using older versions of the QIAGEN CLC Workbenches and QIAGEN CLC Genomics Server can be searched using tools in the affected versions of the software.

  3. Use Batch Rename to rename the sequences with unique identifiers shorter than 50 characters.

    Here, we recommend avoiding underscores if you will be building BLAST databases using affected versions of the software, as these can lead to identifiers appearing to be malformed PDB identifiers, which will cause makeblastdb (the NCBI tool used by Create BLAST Database) to fail.

  4. Create or obtain BLAST databases from another source, and place these in a location your QIAGEN CLC software knows about (e.g. by using Manage BLAST Databases for QIAGEN CLC Workbenches, or by configuring the QIAGEN CLC Genomics Server accordingly).
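
Before renaming sequences as described in work-around 3, it can help to find out which identifiers are likely to cause trouble. The following Python sketch is one possible pre-check on the headers of a FASTA file. It is not part of the QIAGEN CLC software and does not reproduce the checks that makeblastdb performs; it only applies the recommendations given in this article (unique identifiers, shorter than 50 characters, no underscores). The function names `read_fasta_ids` and `check_ids` are hypothetical, and the sketch assumes the identifier is the first whitespace-delimited token of each header line.

```python
from collections import Counter

# Limit stated in this article; identifiers should be shorter than this.
MAX_ID_LENGTH = 50


def read_fasta_ids(lines):
    """Yield the identifier of each FASTA header line.

    Assumption: the identifier is the first whitespace-delimited
    token after the '>' character.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            yield header.split()[0] if header else ""


def check_ids(ids):
    """Return (identifier, problem) tuples based on this article's advice."""
    ids = list(ids)
    problems = []
    for seq_id in ids:
        if not seq_id:
            problems.append((seq_id, "empty identifier"))
        if len(seq_id) >= MAX_ID_LENGTH:
            problems.append((seq_id, "not shorter than 50 characters"))
        if "_" in seq_id:
            problems.append((seq_id, "contains an underscore"))
    # Report each duplicated identifier once, with its number of occurrences.
    for seq_id, count in Counter(ids).items():
        if count > 1:
            problems.append((seq_id, f"duplicated {count} times"))
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_ids.py sequences.fasta
    with open(sys.argv[1]) as handle:
        for seq_id, problem in check_ids(read_fasta_ids(handle)):
            print(f"{seq_id}\t{problem}")
```

Identifiers reported by such a check are candidates for renaming with Batch Rename. An empty report does not guarantee that makeblastdb will accept every identifier.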

 

Affected software and tools

This problem affects only the following QIAGEN CLC software, where BLAST+ 2.9.0 was included:

Software affected

This issue was addressed in CLC Genomics Workbench 20.0.3, CLC Main Workbench 20.0.3 and CLC Genomics Server 20.0.3, where we replaced the BLAST+ 2.9.0 makeblastdb tool, used by Create BLAST Database, with makeblastdb from BLAST+ 2.6.0. This is the same version used in CLC Genomics Workbench 12.x. This change does not affect the searching of BLAST databases.

Earlier product versions (before 20.0) are not affected by this problem. Such versions include programs from BLAST+ 2.6.0 and earlier, which predate the restrictions on sequence identifiers that underlie this issue.

Tools affected