
Download Pathogen Database returns too few reference sequences

Issue description

Several issues have been identified in the Download Pathogen Database tool. As a result, only a subset of the intended data is downloaded, and the data that is downloaded comes from outdated sources.
This issue affects all searches where “NCBI Pathogen Detection” has been selected.
Depending on the choice made when downloading a pathogen reference database:

Results of downstream analyses that use affected reference data sets should be interpreted with caution, as they are based on a partial, possibly outdated, data set. For example, the use of incomplete genomic assemblies could lead to incorrect strain identification, and recent outbreak data may be missing entirely.

Recommendations

For retrieving genomic data, we recommend using the Download Custom Microbial Reference Database tool, which is not affected by this issue.

To download genomes from NCBI RefSeq

To download data from NCBI Pathogen Detection

The downloaded data can now be annotated using the Create Annotated Sequence List tool by matching on the “Assembly ID” column.
The metadata file from NCBI can be used for this after renaming it so that its suffix is .txt. For example, the file mentioned above should be renamed to “PDG000000003.1542.metadata.txt”.
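The renaming step can also be scripted. The following is a minimal sketch in Python, not part of the tool itself. It assumes the downloaded metadata file ends in a single suffix such as .tsv (an assumption, since the original file name is not shown here), and it copies the file rather than renaming it in place so that the original download is kept. The function name copy_with_txt_suffix is hypothetical.

```python
from pathlib import Path
import shutil


def copy_with_txt_suffix(metadata_path):
    """Copy a metadata file next to the original with its last suffix set to .txt.

    Assumes the file name ends in a real suffix such as ".tsv"; only the part
    after the final dot is replaced.
    """
    src = Path(metadata_path)
    dst = src.with_suffix(".txt")
    if dst != src:
        shutil.copyfile(src, dst)
    return dst


# Example call (the .tsv input name is an assumption):
# copy_with_txt_suffix("PDG000000003.1542.metadata.tsv")
```

The copy can then be selected as the metadata file in the Create Annotated Sequence List tool.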

Note that renaming the column headers, e.g. “asm_level” to “Assembly Level”, will make the results more readable.
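Header renaming can be done in a text editor or spreadsheet, or with a short script. The sketch below is one possible approach in Python, assuming the metadata file is tab-separated with the column names on the first line. The function name rename_headers is hypothetical, and the mapping contains only the “asm_level” example given above; further entries would need to be added as desired.

```python
# Mapping from original to readable header names.
# Only the asm_level entry comes from the text; extend as needed.
HEADER_NAMES = {"asm_level": "Assembly Level"}


def rename_headers(src_path, dst_path, names=HEADER_NAMES):
    """Copy a tab-separated file, replacing first-line column names found in `names`."""
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        header = src.readline()
        if header:
            # Keep the original line ending so the rest of the file is unchanged.
            body = header.rstrip("\r\n")
            ending = header[len(body):]
            columns = [names.get(col, col) for col in body.split("\t")]
            dst.write("\t".join(columns) + ending)
        # Data rows are copied through untouched.
        for line in src:
            dst.write(line)
```

Columns that are not listed in the mapping keep their original names, so the file can be processed again after adding more entries.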

Affected software

This issue was addressed in MGM 21.1.1.