QIAGEN

Region and gene-level CNV detection problems when low coverage control targets are present

Issue description

An issue has been identified in the Copy Number Variant Detection (CNV) tool in CLC Genomics Workbench and the CNV and LOH Detection tool in the Biomedical Genomics Analysis plugin. It can affect Region-level CNV and Gene-level CNV results for chromosomes that contain targets with low coverage in the control sample(s). Target-level CNV results are not affected.

Regions are calculated by splitting the chromosome at positions where scores calculated for a sliding window change abruptly between two targets, for example, when targets change from having a positive fold change to a negative fold change. These scores should be computed for all targets, so that the chromosome can be split into regions between any two targets. Instead, we count the number of targets on the chromosome whose coverage exceeds the low coverage cutoff (N) and then calculate scores only for the first N targets on the chromosome, regardless of which targets actually have low coverage. As a result, no region breakpoints are considered after the first N targets, and the remainder of the chromosome is never split into smaller regions. For example, if 10% of the targets on a chromosome have coverage below the cutoff, region breakpoints are only considered within the first 90% of targets along that chromosome.
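The effect of the defect can be illustrated with a small Python sketch. This is a simplified model, not QIAGEN's implementation: the score (mean log2 fold change in a three-target window), the sign-change rule for breakpoints, the cutoff value, and all function names are assumptions chosen for illustration, and the sketch does not model how the real tool treats low coverage targets when scoring. It only shows how using the count N as a prefix length hides breakpoints near the end of the chromosome.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    control_coverage: float  # coverage of this target in the control sample(s)
    log2_fc: float           # log2 fold change, case vs. control


def window_score(targets: List[Target], i: int, half_width: int = 1) -> float:
    """Illustrative score: mean log2 fold change in a window centred on target i."""
    lo = max(0, i - half_width)
    window = targets[lo:i + half_width + 1]
    return sum(t.log2_fc for t in window) / len(window)


def breakpoints(targets: List[Target], n_scored: int) -> List[int]:
    """Return each i where the chromosome is split between targets i and i+1
    (score changes sign), scoring only the first n_scored targets."""
    scores = [window_score(targets, i) for i in range(n_scored)]
    return [i for i in range(len(scores) - 1)
            if (scores[i] > 0) != (scores[i + 1] > 0)]


def breakpoints_affected(targets: List[Target], cutoff: float) -> List[int]:
    # Defect as described: N is the number of targets whose coverage exceeds
    # the low coverage cutoff, but N is then used as a prefix length, so only
    # the first N targets along the chromosome are scored.
    n = sum(1 for t in targets if t.control_coverage > cutoff)
    return breakpoints(targets, n)


def breakpoints_fixed(targets: List[Target]) -> List[int]:
    # Corrected behaviour: every target is scored, so a split can fall
    # between any two adjacent targets.
    return breakpoints(targets, len(targets))


# Hypothetical chromosome: gain, loss, gain; 3 of 10 targets below cutoff 10.
coverage = [5, 40, 38, 45, 4, 50, 41, 39, 44, 3]
fold_changes = [3, 3, 3, -2, -2, -2, -2, 3, 3, 3]
chrom = [Target(c, f) for c, f in zip(coverage, fold_changes)]

print(breakpoints_fixed(chrom))         # [2, 6]
print(breakpoints_affected(chrom, 10))  # [2]
```

With three of the ten targets below the cutoff, N is 7, so the gain that starts at the eighth target is never scored and the affected version reports one breakpoint instead of two. Fewer breakpoints mean fewer, larger regions, which is consistent with the symptom of too few Region-level CNV results.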

Where only a few targets in the control sample(s) have coverage below the specified low coverage cutoff, the risk due to this issue is quite low. As the number of targets below the specified low coverage cutoff increases, so too does the chance that regions of interest are overlooked or misreported. When the issue occurs, too few Region-level CNV results and too many Gene-level CNVs will be produced.
Region-level CNV and Gene-level CNV results are not affected by this issue if no low coverage target regions are identified in the control read mapping(s).

Recommendations

  1. When running Copy Number Variant Detection (CNV) or CNV and LOH Detection, use a target region track specific to your experiment, thereby reducing the chance that targets with low coverage are included in the analysis.
  2. Avoid increasing the low coverage cutoff so much that a substantial number of targets are considered to have low coverage. Low coverage target regions can be identified by the comment “Low coverage target” in the Target-level CNV track.
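To judge how much of each chromosome is exposed to this issue, the targets flagged “Low coverage target” can be counted per chromosome. The sketch below assumes the Target-level CNV track has been exported to a tab-separated file with a chromosome column and a comments column; the column names `Chromosome` and `Comments` and the function name are hypothetical and should be adjusted to the actual export. Following the region-splitting explanation above, in affected versions the returned fraction equals the share of targets at the end of each chromosome for which no region breakpoints are considered.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def unscored_fraction(track: TextIO,
                      chrom_col: str = "Chromosome",  # hypothetical column name
                      comment_col: str = "Comments",  # hypothetical column name
                      ) -> Dict[str, float]:
    """Fraction of targets per chromosome whose comment marks them as
    "Low coverage target" in a tab-separated export (assumed layout)."""
    total: Counter = Counter()
    low: Counter = Counter()
    for row in csv.DictReader(track, delimiter="\t"):
        chrom = row[chrom_col]
        total[chrom] += 1
        if "Low coverage target" in (row.get(comment_col) or ""):
            low[chrom] += 1
    return {chrom: low[chrom] / total[chrom] for chrom in total}


# Tiny in-memory example standing in for an exported track.
example = io.StringIO(
    "Chromosome\tStart\tComments\n"
    "1\t100\t\n"
    "1\t200\tLow coverage target\n"
    "1\t300\t\n"
    "1\t400\t\n"
    "2\t100\t\n"
)
print(unscored_fraction(example))  # {'1': 0.25, '2': 0.0}
```

A chromosome with a high fraction is a candidate for re-running the analysis on a fixed version or with a target region track specific to the experiment.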

Affected software

This issue was fixed in CLC Genomics Workbench 21.0.6 and 22.0.1, CLC Genomics Server 21.0.6 and 22.0.1, and Biomedical Genomics Analysis plugin 21.2.1 and 22.0.1.