Extract Reads from Selection in abundance tables not working as expected in some circumstances

Issue description

When rows in an abundance table generated using tools of the QIAGEN CLC Microbial Genomics Module are selected, and the Extract Reads from Selection button is used, the resulting sequence list may not always contain just the reads assigned to the selected row(s) in the table.

There are two different situations where problems are known to arise:

1) When extracting reads for a row with an incomplete taxonomy (marked by the identifier “Unknown”).

The sequence list generated will contain all the reads assigned to the selected row in the abundance table, as well as reads assigned to any of its children in the taxonomy. For example, a sequence list generated by extracting reads for an entry called Firmicutes (Unknown) will also contain reads from rows representing levels under that phylum, such as Lactobacillales (Unknown).
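
The difference between the expected result and the behaviour described above can be illustrated with a toy model. This is not the module's implementation: the read identifiers, the `assignments` mapping and both function names are made up for illustration, and each row is represented simply as a tuple of taxon names from the phylum downwards.

```python
# Toy data (invented): each read is assigned to exactly one abundance table row.
# A row is modelled as a lineage tuple, from the phylum downwards.
assignments = {
    "read1": ("Firmicutes",),                               # Firmicutes (Unknown)
    "read2": ("Firmicutes",),                               # Firmicutes (Unknown)
    "read3": ("Firmicutes", "Bacilli", "Lactobacillales"),  # Lactobacillales (Unknown)
    "read4": ("Proteobacteria",),                           # unrelated phylum
}


def expected_extraction(row):
    """Reads assigned to exactly the selected row."""
    return sorted(r for r, lineage in assignments.items() if lineage == row)


def described_extraction(row):
    """Reads assigned to the selected row or to any row beneath it in the taxonomy."""
    return sorted(r for r, lineage in assignments.items() if lineage[:len(row)] == row)


print(expected_extraction(("Firmicutes",)))   # ['read1', 'read2']
print(described_extraction(("Firmicutes",)))  # ['read1', 'read2', 'read3']
```

In this sketch, selecting the Firmicutes (Unknown) row would be expected to return two reads, whereas the behaviour described above also returns the read assigned to Lactobacillales (Unknown).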

2) When extracting reads when several rows are selected in the abundance table.

If all the selected rows are at the same taxonomic level, the selection works as expected, i.e. the union of the reads associated with the selected rows is extracted. However, if rows at different taxonomic levels are selected, the reads returned can differ, depending on the nature of the selection.

 

Recommendations

We recommend that the Extract Reads from Selection button is not used when working with incomplete taxonomies in the affected software.

Please check whether the list of extracted reads generated using this functionality contains the number of reads reported in the abundance table.
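
One possible way to carry out this check outside the Workbench is sketched below. It assumes that the extracted sequence list has been exported to a FASTA file, that the expected number is copied by hand from the abundance table, and that the table value counts individual reads. The function names `count_fasta_reads` and `check_extraction` are illustrative and not part of the software.

```python
from pathlib import Path


def count_fasta_reads(fasta_path):
    """Count records in a FASTA file by counting header lines (those starting with '>')."""
    with Path(fasta_path).open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def check_extraction(fasta_path, expected_reads):
    """Compare the exported read count with the value from the abundance table.

    Returns a tuple (matches, observed), where `observed` is the number of
    records found in the FASTA file.
    """
    observed = count_fasta_reads(fasta_path)
    return observed == expected_reads, observed
```

For example, `check_extraction("extracted_reads.fa", 1250)` (a hypothetical file name and count) returns `(False, n)` when the exported list holds `n` reads rather than 1250, which suggests that reads from other rows were included in the extraction.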

 

Affected software

QIAGEN CLC Microbial Genomics Module 4.1 through 20.1.1.