Genomics analysis of wastewater for SARS-CoV-2 tracking

Author: QIAGEN Digital Insights

Did you know SARS-CoV-2 is shed in the feces of individuals with symptomatic or asymptomatic infection? Viral particles shed into wastewater via the sewer system are no longer infectious but can still be measured. Therefore, recent public health monitoring efforts target sewers to identify known genotypes of SARS-CoV-2. Genotyping by sequencing SARS-CoV-2 from wastewater correlates with sequencing results in patients in the wastewater catchment area, providing an efficient monitoring tool for viral epidemiology. Wastewater is readily available at sewage plants, and collection of wastewater samples avoids biases associated with sampling from hospitals or testing facilities (1).

PCR approaches are highly effective on well-targeted variants, and multiplexing strategies that target several mutations simultaneously can unravel the mutation patterns of circulating variants. However, NGS approaches can find new variants, increase the sensitivity of variant detection and provide an unbiased representation of the variants circulating in populations. NGS is also used for whole-genome SNP analysis in local epidemiological analyses, such as hospital infection control and local outbreak tracing.

Whether you use Oxford Nanopore, Illumina, PacBio or Ion Torrent technology, and ARTIC or vendor-designed panels, QIAGEN CLC Genomics Workbench provides standard SARS-CoV-2 analysis workflows that can easily be adapted to any platform, protocol and application by exchanging workflow elements, primer design files or parameter settings.

The general approach of the workflows is mapping the reads to a reference, calling variants, generating a consensus sequence and generating outputs that enable efficient review of results, including cross-sample comparison. See (2) for examples of building workflows.
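The consensus step of this approach can be illustrated with a minimal Python sketch. It is not the algorithm used by QIAGEN CLC Genomics Workbench. It assumes only single-nucleotide variants, a per-base coverage list, and two illustrative thresholds (`min_coverage`, `min_frequency`) that are hypothetical defaults rather than values from the workflows.

```python
from dataclasses import dataclass


@dataclass
class Snv:
    position: int     # 1-based position on the reference
    ref: str          # reference base at that position
    alt: str          # alternative base observed in the reads
    frequency: float  # fraction of reads supporting alt (0.0 to 1.0)


def build_consensus(reference, coverage, variants,
                    min_coverage=10, min_frequency=0.5):
    """Apply majority SNVs to the reference and mask low-coverage bases with N.

    The thresholds are illustrative assumptions, not CLC defaults.
    """
    if len(coverage) != len(reference):
        raise ValueError("coverage must have one value per reference base")

    consensus = list(reference)
    for variant in variants:
        idx = variant.position - 1
        if reference[idx] != variant.ref:
            raise ValueError(f"reference mismatch at position {variant.position}")
        # Only variants supported by enough of the reads enter the consensus.
        if variant.frequency >= min_frequency:
            consensus[idx] = variant.alt

    # Positions without enough reads are reported as unknown (N).
    for idx, depth in enumerate(coverage):
        if depth < min_coverage:
            consensus[idx] = "N"
    return "".join(consensus)


# Toy example: an 8-base "genome" with one majority and one minority variant.
toy_consensus = build_consensus(
    reference="ACGTACGT",
    coverage=[20, 20, 20, 5, 20, 20, 20, 20],
    variants=[Snv(2, "C", "T", 0.9), Snv(6, "C", "G", 0.2)],
)
```

Masking low-coverage positions with N instead of keeping the reference base avoids reporting a reference call where the sample provides no evidence.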

When working with several samples, multi-FASTA export of consensus sequences, as well as PDF export of the quality report, is easily accomplished.

Typically, the generated consensus sequences are manually submitted to Nextclade and Pangolin to annotate the samples with the latest phylogenetic lineage information.

For high-throughput use, any manual steps introduce errors and inefficiencies. QIAGEN CLC Genomics Server software can automate lineage annotation through its “external applications” functionality, where regularly updated Docker images of Nextclade or Pangolin can be included in CLC workflows (Figure 1). For other examples of external applications, see (3).

Figure 1. a) Using QIAGEN CLC Genomics Server, Nextclade and Pangolin Docker images are added to CLC as an “external application” so that their functionality can be integrated into CLC workflows to assign lineage information to the sample. b) Example output of the Nextclade functionality and c) example output of the Pangolin functionality of the CLC workflow shown.
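Once lineage assignments are produced automatically, they can be summarized across a batch of samples. The Python sketch below counts samples per lineage from a CSV report. It assumes the report has `taxon` and `lineage` columns, as in Pangolin's lineage report; the sample names and the `count_lineages` helper are hypothetical, and the example rows are invented purely for illustration.

```python
import csv
import io
from collections import Counter

# Invented example rows; a real report would be read from the Pangolin output file.
EXAMPLE_REPORT = """taxon,lineage
sample_01,B.1.1.7
sample_02,B.1.617.2
sample_03,B.1.1.7
"""


def count_lineages(report_text):
    """Return a Counter mapping lineage name to number of samples.

    Assumes a CSV with a header row containing a 'lineage' column.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    return Counter(row["lineage"] for row in reader)


lineage_counts = count_lineages(EXAMPLE_REPORT)
for lineage, n_samples in lineage_counts.most_common():
    print(f"{lineage}\t{n_samples}")
```

To process a file on disk instead, the text could be obtained with `open(path, newline="").read()` before calling the function.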

The server software is also well-suited for handling many workflow executions in parallel, as it has a “scheduler” functionality that manages the execution queue. This queuing ability ensures that parallel workflow execution is coordinated, and individual steps do not interfere with each other by competing for computational resources. External applications can also be executed in the cloud by using QIAGEN CLC Genomics Cloud Engine, reducing local hardware needs to a minimum. QIAGEN CoV-2 Insights service is an instance of this architecture, available if you wish to use this pipeline without setting up the software on your own.

These bioinformatic workflows work well when it can be assumed that a single dominant strain is in circulation. However, when a novel strain is emerging and several candidates need to be monitored, a better strategy is to test the reads directly for evidence of marker mutations. The “Identify Known Mutations from Sample Mappings” tool serves this purpose by monitoring predefined reference positions in read mappings. As input, it takes the read mapping and a variant track holding the specific variants you wish to test for. For each variant, it reports whether the variant was detected, whether coverage was sufficient at that position, and the frequency and other statistics of the variant in the sample.

By applying the tool in series, with one variant track for each SARS-CoV-2 strain you wish to monitor, you can test for evidence of many strains in a single workflow (Figure 2). The workflow can then be run on batches of samples simultaneously, providing a fully scalable solution that only needs updating when new strains are expected to enter the population.

Figure 2. A QIAGEN CLC Genomics Workbench workflow that interrogates each sample's read mapping to a SARS-CoV-2 reference at the genomic positions that define known variants of the virus. The workflow can be executed in batch mode to monitor many samples simultaneously.
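The idea of testing predefined positions for marker mutations, one panel per strain, can be sketched in Python as follows. This is a simplified illustration, not the implementation of the “Identify Known Mutations from Sample Mappings” tool. It assumes the read mapping has already been reduced to per-position base counts, and the `Marker` type, the thresholds, the strain names and the positions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str      # label for the mutation (hypothetical)
    position: int  # 1-based reference position
    alt: str       # base that defines the marker mutation


def check_markers(allele_counts, markers, min_coverage=10, min_frequency=0.05):
    """Report coverage, frequency and detection status for each marker.

    allele_counts maps position -> {base: read count}. Thresholds are
    illustrative assumptions.
    """
    results = []
    for marker in markers:
        counts = allele_counts.get(marker.position, {})
        coverage = sum(counts.values())
        alt_reads = counts.get(marker.alt, 0)
        frequency = alt_reads / coverage if coverage else 0.0
        sufficient = coverage >= min_coverage
        results.append({
            "marker": marker.name,
            "coverage": coverage,
            "sufficient_coverage": sufficient,
            "frequency": frequency,
            "detected": sufficient and alt_reads > 0 and frequency >= min_frequency,
        })
    return results


def screen_strains(allele_counts, strain_panels):
    """Apply check_markers once per strain panel, mirroring the serial workflow."""
    return {strain: check_markers(allele_counts, markers)
            for strain, markers in strain_panels.items()}


# Toy data: base counts at three positions and two hypothetical strain panels.
toy_counts = {100: {"A": 90, "T": 10}, 200: {"G": 4}, 300: {"C": 50}}
toy_panels = {
    "strain_A": [Marker("m1", 100, "T"), Marker("m2", 200, "A")],
    "strain_B": [Marker("m3", 300, "T")],
}
toy_report = screen_strains(toy_counts, toy_panels)
```

Reporting coverage sufficiency separately from detection lets a reviewer distinguish "mutation absent" from "position not adequately sequenced", which matters for mixed, low-titer samples such as wastewater.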

References:

  1. Wurtz, N., et al. (2021). Monitoring the Circulation of SARS-CoV-2 Variants by Genomic Analysis of Wastewater in Marseille, South-East France. Pathogens 10, 1042. https://doi.org/10.3390/pathogens10081042
  2. Theiagen Consulting LLC video tutorials on how to build workflows in CLC Genomics Workbench for SARS-CoV-2 analysis
  3. Theiagen Consulting LLC video tutorials on how to include external applications in the CLC Genomics Server: a) RAxML b) MAFFT c) iVar

Learn more about the capabilities of QIAGEN CLC Genomics Workbench Premium and download your free trial today.