Highlights of the latest update to QIAGEN CLC Microbial Genomics Module 20.1

Author:

QIAGEN Digital Insights

Using viral reference databases for phylogeny construction and taxonomic profiling of samples with low viral load

This blog tutorial highlights several recent improvements in the latest update to QIAGEN CLC Microbial Genomics Module 20.1. The update includes improved usability in the Download Microbial Reference Database tool and improved support for long reads in Taxonomic Profiling. Some of the improvements include:

  • Faster load times for the selection table, which now loads in just seconds
  • Full access to the latest assemblies from NCBI with a taxonomy-aware download selection
  • No deduplication: The tool no longer removes duplicate sequences, as this functionality has been moved to Create Taxonomic Profiling Index

With the 20.1 update, it is now easy to customize the Microbial Reference Database to fit your needs. Here we demonstrate two use cases:

  • Visualizing phylogenetic relationships of all coronavirus genomes
  • Creating a taxonomic profiling index of all viral genomes and carrying out taxonomic profiling of viral metagenome samples containing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in a few simple steps

Visualizing phylogenetic relationships made easy

The updated downloader makes it simple to visualize phylogenetic relationships. To create a dendrogram of the four coronavirus genera, we first create a microbial database containing only coronaviruses:

  • Run the Download Microbial Reference Database tool to load the Database builder
  • Filter the table to show only entries where the Taxonomy column contains ‘coronaviridae’
  • Aggregate rows on Genus; we observe five samples which do not include the genus
  • Use Quick Selection: “Complete genomes in RefSeq” to quickly select all complete coronavirus genomes

Approximately 200 references remained and were downloaded with a minimum contig length of 1000. The five samples with an unknown genus were included in the downloaded database.
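The selection steps above amount to a substring filter on the Taxonomy column followed by a group-by on Genus. The Python sketch below illustrates that logic only; it does not call the module, and the rows, field names (taxonomy, genus, complete_refseq) and accessions are hypothetical stand-ins for the Database builder table.

```python
from collections import Counter

# Hypothetical rows standing in for the Database builder selection table.
# Accessions and field names are invented for illustration.
assemblies = [
    {"accession": "EX_0001", "taxonomy": "Nidovirales; Coronaviridae; Betacoronavirus",
     "genus": "Betacoronavirus", "complete_refseq": True},
    {"accession": "EX_0002", "taxonomy": "Nidovirales; Coronaviridae; Alphacoronavirus",
     "genus": "Alphacoronavirus", "complete_refseq": True},
    {"accession": "EX_0003", "taxonomy": "Nidovirales; Coronaviridae",
     "genus": "", "complete_refseq": True},
    {"accession": "EX_0004", "taxonomy": "Nidovirales; Coronaviridae; Betacoronavirus",
     "genus": "Betacoronavirus", "complete_refseq": False},
    {"accession": "EX_0005", "taxonomy": "Articulavirales; Orthomyxoviridae; Alphainfluenzavirus",
     "genus": "Alphainfluenzavirus", "complete_refseq": True},
]


def select_references(rows, taxonomy_term):
    """Keep complete RefSeq entries whose taxonomy contains the term (case-insensitive)."""
    term = taxonomy_term.lower()
    return [r for r in rows if term in r["taxonomy"].lower() and r["complete_refseq"]]


def count_by_genus(rows):
    """Count entries per genus; entries with no genus are grouped under '(no genus)'."""
    return dict(Counter(r["genus"] or "(no genus)" for r in rows))


selected = select_references(assemblies, "coronaviridae")
genus_counts = count_by_genus(selected)
print(genus_counts)
```

Entries without a genus are kept rather than dropped, mirroring the choice above to include the five references of unknown genus in the download.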

The phylogenies of the downloaded database of assemblies can be easily visualized using Create K-mer Tree. In Create K-mer Tree, select the downloaded database of coronavirus genomes. The dendrogram shown was created with default settings, except “Only index k-mers with prefix” was left blank due to the short length of coronavirus genomes.

Figure 1 shows a circular dendrogram with added genus metadata. For ease of viewing, 50% of both the alphacoronavirus and betacoronavirus genomes have been excluded from the tree.

In the tree, the five references without a genus are selected and their branches are shown in dark blue. From the tree, we can see that three of these references cluster with the betacoronavirus, one clusters with the alphacoronavirus and one clusters between alphacoronavirus and gammacoronavirus.

This highlights a quick and easy way to download a database of viral genomes, and how to use the database to create a phylogeny. The phylogeny can then be used to resolve samples of unknown genus.
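To give an intuition for how a k-mer-based comparison can place a reference of unknown genus, the Python sketch below computes a Jaccard distance between k-mer sets and picks the closest labelled sequence. This is a generic illustration, not the algorithm used by Create K-mer Tree; the toy sequences, names and the choice of k = 4 are assumptions made for the example.

```python
def kmers(seq, k):
    """Return the set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=4):
    """1 - |shared k-mers| / |all k-mers|; 0.0 means identical k-mer content."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def nearest_labelled(query, labelled, k=4):
    """Name of the labelled sequence with the smallest k-mer distance to the query."""
    return min(labelled, key=lambda name: jaccard_distance(query, labelled[name], k))


# Toy sequences (hypothetical, far shorter than real genomes).
labelled = {
    "toy_alpha": "ACGTACGTACGT",
    "toy_beta": "TTTTGGGGCCCC",
}
unknown = "ACGTACGTACGA"

closest = nearest_labelled(unknown, labelled)
print(closest, jaccard_distance(unknown, labelled[closest]))
```

A tree-building tool goes further by computing all pairwise distances and clustering them, but the nearest-neighbour view is what lets a reader assign the unlabelled branches to a genus by eye.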

 

Figure 1. Dendrogram of the four coronavirus genera.

 

Create K-mer Tree also works with reads. In the next section, we demonstrate how to create a taxonomic profile with metagenome samples.

Create a taxonomic profiling index and detect abundance of coronavirus in metagenome samples with low coronavirus copy number

With the recent updates to the Download Microbial Reference Database and Taxonomic Profiling functions in QIAGEN CLC Microbial Genomics Module, it is now fast and easy to detect coronavirus presence in metagenome samples containing only a few virus reads. Taxonomic profiling now also supports long reads such as those generated by Oxford Nanopore and PacBio sequencing technologies.

For the first-time setup, we create a viral database:

  • Run the Download Microbial Reference Database tool to load the Database builder
  • Filter the table to show only entries where the Taxonomy column contains ‘virae’ – we skip the remaining virus kingdom in the interest of speed
  • Use Quick Selection: “Complete genomes in RefSeq” to quickly select all complete viral genomes

All complete virus genomes to date, approximately 18,500, remained and were downloaded with a minimum contig length of 1000.

The downloaded database was used to create a taxonomic profiling index using default settings.

The analysis can be carried out in a simple workflow using the curated Microbial Reference Database and human genome to create a Taxonomic Profiling index for host genome filtering (Figure 2).

Figure 2. Taxonomic Profiling workflow.

 

Results are presented from three different studies with a low fraction of viral reads (Table 1).

  • SRR10948550: Long read sequencing using Oxford Nanopore (1)
  • SRR11092061: Paired end sequencing using Illumina HiSeq 3000 (2)
  • ERR4385803: Paired end sequencing using Illumina HiSeq 2500 (gut virome sample – negative for SARS-CoV-2)

Virus abundance values have been aggregated to species level, and the table has been filtered to abundance >10. The % viral reads is the percentage of reads in the sample matching the virus database.
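The post-processing described here can be sketched in a few lines of Python: sum abundances per species, keep species above the threshold, and express matched reads as a percentage of all reads. This is an illustrative sketch rather than the module's implementation; it assumes abundance behaves like an additive count, and the species names and numbers are made up.

```python
from collections import defaultdict


def aggregate_to_species(hits, min_abundance=10):
    """Sum (species, abundance) pairs per species and keep totals > min_abundance."""
    totals = defaultdict(int)
    for species, abundance in hits:
        totals[species] += abundance
    return {s: n for s, n in totals.items() if n > min_abundance}


def percent_viral_reads(viral_reads, total_reads):
    """Percentage of reads in the sample that matched the virus database."""
    return 100.0 * viral_reads / total_reads


# Hypothetical per-strain hits; two strains of "Species A" are merged.
hits = [("Species A", 8), ("Species A", 5), ("Species B", 10), ("Species C", 3)]
profile = aggregate_to_species(hits)
print(profile)
print(percent_viral_reads(45, 1_000_000))
```

Note that the filter is strict: a species with an abundance of exactly 10 is dropped, matching the ">10" cutoff stated above.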

Table 1. Abundances for the different samples (results have been aggregated to species level)

| Sample | % viral reads | Species | Taxonomy | Abundance |
| --- | --- | --- | --- | --- |
| SRR10948550 | 1.0556 | Severe acute respiratory syndrome-related coronavirus | Orthornavirae; Pisuviricota; Pisoniviricetes; Nidovirales; Coronaviridae; Betacoronavirus; Severe acute respiratory syndrome-related coronavirus | 985 |
| | | Ambystoma tigrinum virus | Bamfordvirae; Nucleocytoviricota; Megaviricetes; Pimascovirales; Iridoviridae; Ranavirus; Ambystoma tigrinum virus | 39 |
| | | Common midwife toad virus | Bamfordvirae; Nucleocytoviricota; Megaviricetes; Pimascovirales; Iridoviridae; Ranavirus; Common midwife toad virus | 26 |
| SRR11092061 | 0.0045 | Severe acute respiratory syndrome-related coronavirus | Orthornavirae; Pisuviricota; Pisoniviricetes; Nidovirales; Coronaviridae; Betacoronavirus; Severe acute respiratory syndrome-related coronavirus | 1304 |
| | | Spodoptera frugiperda rhabdovirus | Orthornavirae; Negarnaviricota; Monjiviricetes; Mononegavirales; Rhabdoviridae; Spodoptera frugiperda rhabdovirus | 822 |
| | | Saccharomyces 20S RNA narnavirus | Orthornavirae; Lenarviricota; Amabiliviricetes; Wolframvirales; Narnaviridae; Narnavirus; Saccharomyces 20S RNA narnavirus | 336 |
| | | Stenotrophomonas virus SMA7 | Loebvirae; Hofneiviricota; Faserviricetes; Tubulavirales; Inoviridae; Subteminivirus; Stenotrophomonas virus SMA7 | 126 |
| | | Influenza A virus | Orthornavirae; Negarnaviricota; Insthoviricetes; Articulavirales; Orthomyxoviridae; Alphainfluenzavirus; Influenza A virus | 112 |
| | | Nipah henipavirus | Orthornavirae; Negarnaviricota; Monjiviricetes; Mononegavirales; Paramyxoviridae; Henipavirus; Nipah henipavirus | 48 |
| | | Common midwife toad virus | Bamfordvirae; Nucleocytoviricota; Megaviricetes; Pimascovirales; Iridoviridae; Ranavirus; Common midwife toad virus | 12 |
| | | Inoviridae sp | Loebvirae; Hofneiviricota; Faserviricetes; Tubulavirales; Inoviridae; Inoviridae sp | 12 |
| ERR4385803 | 0.6578 | Gokushovirus WZ-2015a | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; Gokushovirus WZ-2015a | 19753 |
| | | Human gut gokushovirus | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; Human gut gokushovirus | 3883 |
| | | Microviridae sp | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; Microviridae sp | 1726 |
| | | Microviridae | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae | 47 |

The negative control sample ERR4385803 correctly reports no coronavirus. The abundance of virus was correctly reported in both positive samples (Table 1).

References:

  1. Zhou, P. et al. (2020) A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579(7798): 270-273.
  2. Chan, J.F.W. et al. (2020) A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet 395(10223): 514-523.