During the current pandemic, continually monitoring viral genomes for new mutations has become fundamental to guiding decisions. The combined efforts of labs across the world have generated enormous amounts of SARS-CoV-2 sequencing data that must be analyzed and placed into the broader context of the pandemic.
QIAGEN has several resources to support SARS-CoV-2 data analysis. These include the CoV-2 Insights Service for genomic surveillance, which offers full bioinformatics analysis for QIAGEN’s QIAseq SARS-CoV-2 Primer Panel, Ion Torrent’s Ion AmpliSeq SARS-CoV-2 Research Panel or the Illumina panels. If your panel data is not currently supported by this prebuilt solution, don’t worry: you can easily analyze any panel data by creating a simple workflow in QIAGEN CLC Genomics Workbench.
Here we show an example of building a workflow in QIAGEN CLC Genomics Workbench, with the Long Read Support plugin, to process long reads generated by Oxford Nanopore Technologies sequencing. The workflow is simple and takes the data from raw reads to variant calls. Using sequencing data from the University of Exeter (Baker et al., 2020), we provide an example analysis by examining the mutation signatures at different time points in the pandemic.
Download reference data as a GenBank file from NCBI and extract annotations using the ‘Convert to Tracks’ tool.
Figure 1 shows an example of a simple workflow, which consists of the following steps:
The called variants can be visualized in a track list containing the reference genome and the variant tracks. This view makes it easy to monitor new mutations, and an amino acid track helps distinguish synonymous from non-synonymous mutations. In Figure 2, we show a subset of nine variant tracks from samples collected at various time points in the pandemic. The tracks span March 2020 to December 2020 and are sorted chronologically by sample collection date from top to bottom. The view shows variants accumulating over time.
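To illustrate what distinguishing synonymous from non-synonymous mutations means, here is a minimal Python sketch. It is not part of QIAGEN CLC Genomics Workbench and does not reproduce how the amino acid track is computed. It assumes the standard genetic code, and the function name `classify_codon_change` and the example codons are hypothetical, chosen only for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# The third base varies fastest, so the amino acid string lines up
# with TTT, TTC, TTA, TTG, TCT, ... ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Label a codon substitution as synonymous or non-synonymous.

    Hypothetical helper for illustration: it translates both codons
    and compares the resulting amino acids.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


if __name__ == "__main__":
    # AAT (Asn) -> TAT (Tyr) changes the encoded amino acid.
    print(classify_codon_change("AAT", "TAT"))
    # TTT (Phe) -> TTC (Phe) leaves the protein unchanged.
    print(classify_codon_change("TTT", "TTC"))
```

A synonymous change leaves the encoded protein unchanged, which is why the amino acid track is useful for picking out the mutations that alter the protein.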
The latest data set, from December 12, 2020, is a sequencing run of the B.1.1.7 strain. We identify the strain by adding the amino acid changes to the track list. In Figure 3, two of the variants characteristic of this strain can be seen, namely N501Y and P681H in the spike protein.
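The same check can be sketched outside the track list. The Python snippet below is an illustration only: it parses amino acid change labels such as N501Y and reports which of the two spike changes named above appear in a sample. The helper names, the marker set and the example sample list are assumptions made for this sketch. Two changes are not a complete definition of the strain, and this is not how QIAGEN CLC Genomics Workbench assigns it.

```python
import re

# Labels such as "N501Y": reference residue, position, new residue.
AA_CHANGE = re.compile(r"^([A-Z*])(\d+)([A-Z*])$")

# The two spike changes named in the text. This is an illustrative
# subset, not a full definition of the strain.
MARKER_CHANGES = {"N501Y", "P681H"}


def parse_aa_change(label):
    """Split a label like 'N501Y' into ('N', 501, 'Y')."""
    match = AA_CHANGE.match(label)
    if match is None:
        raise ValueError(f"not an amino acid change: {label!r}")
    ref, position, alt = match.groups()
    return ref, int(position), alt


def markers_found(sample_changes, markers=MARKER_CHANGES):
    """Return the marker changes present in a sample, sorted by position."""
    found = set(sample_changes) & set(markers)
    return sorted(found, key=lambda label: parse_aa_change(label)[1])


if __name__ == "__main__":
    # Hypothetical list of spike amino acid changes called in one sample.
    sample = ["D614G", "P681H", "N501Y"]
    print(markers_found(sample))
```

Matching on the full label rather than the position alone matters here, because a different substitution at the same residue would not be the characteristic change.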
We visualize the viral evolution in a SNP tree, using one of the oldest samples (2020-03-25) as the root.
As you can see, constructing a workflow for the analysis of SARS-CoV-2 variants in QIAGEN CLC Genomics Workbench is quick and easy. For a sample of 400,000 reads, the entire workflow shown here runs in less than 5 minutes on a standard laptop, from input to variant calling. This allows for great scalability and efficiency in sample analysis.
Learn more and sign up for a free trial today.
Reference:
Baker, Dave J. et al. (2020) CoronaHiT: Large scale multiplexing of SARS-CoV-2 genomes using Nanopore sequencing. bioRxiv 2020.06.24