New requirements for sequence identifiers, introduced with NCBI BLAST+ 2.8.1, are affecting the creation of BLAST databases in some versions of QIAGEN CLC software. (See the “Affected software and tools” section below.)
These requirements include a 50-character limit for local accessions and stringent checks on the format of accessions that resemble PDB identifiers. The effect of these changes can be exacerbated in the QIAGEN CLC software because of how we handle sequence identifiers to avoid duplicates, which are not accepted by makeblastdb, the NCBI BLAST+ program for creating BLAST databases.
Upgrade your software to a version where this restriction is not present. See the “Affected software and tools” section below for details.
If upgrading is not an option, the following workarounds are available:
BLAST databases made using older versions of the QIAGEN CLC Workbenches and QIAGEN CLC Genomics Server can be searched using tools in the affected versions of the software.
We recommend avoiding underscores if you will be building BLAST databases using affected versions of the software, as these can make identifiers appear to be malformed PDB identifiers, which causes makeblastdb (the NCBI tool used by Create BLAST Database) to fail.
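As an illustration of the two conditions described above, the following minimal Python sketch flags FASTA identifiers that are longer than 50 characters or that contain an underscore. It is a heuristic based only on the description in this article, not a reimplementation of the checks makeblastdb performs, and the function names (`check_identifier`, `check_fasta_headers`) are hypothetical. Because the QIAGEN CLC software may adjust identifiers to avoid duplicates, an identifier that passes this check is not guaranteed to be accepted.

```python
MAX_LOCAL_ID_LENGTH = 50  # limit for local accessions, as stated in this article


def check_identifier(seq_id):
    """Return a list of reasons why seq_id might be rejected (heuristic only)."""
    problems = []
    if len(seq_id) > MAX_LOCAL_ID_LENGTH:
        problems.append("longer than 50 characters")
    if "_" in seq_id:
        # Assumption: any underscore is treated as risky, following the
        # recommendation above; makeblastdb's real PDB check is more specific.
        problems.append("contains an underscore")
    return problems


def check_fasta_headers(lines):
    """Map each flagged identifier in FASTA-formatted lines to its problems."""
    flagged = {}
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            # The identifier is the first whitespace-delimited token of the header.
            seq_id = fields[0] if fields else ""
            problems = check_identifier(seq_id)
            if problems:
                flagged[seq_id] = problems
    return flagged


# Made-up example records, for illustration only.
example = [
    ">1ABC_chain_extra",
    "ACGT",
    ">sample-1",
    "ACGT",
    ">" + "x" * 51,
    "ACGT",
]

for seq_id, problems in check_fasta_headers(example).items():
    print(seq_id, "->", ", ".join(problems))
```

Running a check like this on sequence names before creating a database can help identify which sequences to rename.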
This problem affects only the following QIAGEN CLC software, where BLAST+ 2.9.0 was included:
This issue was addressed in CLC Genomics Workbench 20.0.3, CLC Main Workbench 20.0.3 and CLC Genomics Server 20.0.3, where we replaced the BLAST+ 2.9.0 makeblastdb tool, used by Create BLAST Database, with makeblastdb from BLAST+ 2.6.0. This is the same version used in CLC Genomics Workbench 12.x. This change does not affect the searching of BLAST databases.
Earlier product versions (before 20.0) are not affected by this problem. Such versions include programs from BLAST+ 2.6.0 and earlier, which predate the restrictions relating to sequence identifiers that underlie this issue.