This blog tutorial highlights several recent improvements in the latest update to QIAGEN CLC Microbial Genomics Module 20.1. The update includes improved usability in the Download Microbial Reference Database tool and improved support for long reads in Taxonomic Profiling. Some of the improvements include:
With the 20.1 update, it is now easy to customize the Microbial Reference Database to fit your needs. Here we demonstrate two use cases:
The updated downloader makes it simple to visualize phylogenetic relationships. To create a dendrogram of the four coronavirus genera, we first create a microbial database containing only coronavirus:
Approximately 200 references remained and were downloaded with a minimum contig length of 1000. The five samples with an unknown genus were included in the downloaded database.
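To make the minimum contig length setting concrete, here is a minimal Python sketch of a length filter applied to FASTA text. It is an illustration only, not the downloader's implementation: the function names `parse_fasta` and `filter_min_length` and the toy records are assumptions, and the tool may apply the threshold differently (for example per assembly rather than per contig).

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {record name: sequence}."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a placeholder name if the header line is empty.
            name = header[0] if header else f"unnamed_{len(records)}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def filter_min_length(records: dict[str, str], min_length: int = 1000) -> dict[str, str]:
    """Keep only sequences that are at least min_length bases long."""
    return {n: s for n, s in records.items() if len(s) >= min_length}


# Toy input, invented for illustration: one 1200-base and one 300-base contig.
toy_fasta = ">contig_long\n" + "ACGT" * 300 + "\n>contig_short\n" + "ACGT" * 75 + "\n"
kept = filter_min_length(parse_fasta(toy_fasta), min_length=1000)
```

With the toy input, only `contig_long` passes the 1000-base threshold.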
The phylogenies of the downloaded database of assemblies can be easily visualized using Create K-mer Tree. In Create K-mer Tree, select the downloaded database of coronavirus genomes. The dendrogram shown was created with default settings, except "Only index k-mers with prefix" was left blank due to the short length of coronavirus genomes.
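The idea behind a k-mer based tree can be sketched in a few lines of Python: each genome is reduced to its set of k-mers, and genomes that share more k-mers are treated as closer. The sketch below uses a Jaccard distance on toy sequences. It is a simplified stand-in and not the algorithm used by Create K-mer Tree; the function names, the choice of k = 4 and the sequences are all assumptions made for illustration.

```python
from itertools import combinations


def kmer_set(sequence: str, k: int = 4) -> set[str]:
    """Return the set of all k-mers in a sequence (case-insensitive)."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B|; 0.0 means identical k-mer content."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(genomes: dict[str, str], k: int = 4) -> dict[tuple[str, str], float]:
    """Compute the k-mer Jaccard distance for every pair of genomes."""
    profiles = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    return {
        (x, y): jaccard_distance(profiles[x], profiles[y])
        for x, y in combinations(sorted(profiles), 2)
    }


# Toy sequences, invented for illustration only.
toy_genomes = {
    "ref_A": "ATGGCGTACGTTAGCATGGCGTACG",
    "ref_B": "ATGGCGTACGTTAGCATGGCGTTCG",  # one base different from ref_A
    "ref_C": "TTTTCCCCGGGGAAAATTTTCCCCG",  # unrelated sequence
}
distances = pairwise_distances(toy_genomes)
```

A distance matrix like this could then be passed to any hierarchical clustering method to draw a dendrogram. In the toy data, `ref_A` and `ref_B` end up closer to each other than either is to `ref_C`.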
Figure 1 shows a circular dendrogram with added genus metadata. For ease of viewing, 50% of both the alphacoronavirus and betacoronavirus genomes have been excluded from the tree.
In the tree, the five references without a genus are selected and their branches are shown in dark blue. From the tree, we can see that three of these references cluster with the betacoronaviruses, one clusters with the alphacoronaviruses, and one clusters between the alphacoronaviruses and the gammacoronaviruses.
This highlights a quick and easy way to download a database of viral genomes, and how to use the database to create a phylogeny. The phylogeny can then be used to resolve samples of unknown genus.
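Reading a genus off the tree amounts to asking which labeled references an unlabeled genome sits closest to. A deliberately simplified version of that reasoning is a nearest-neighbor lookup, sketched below in Python. The reference names and distance values are hypothetical, and inspecting the full tree, as done above, gives more context than a single nearest neighbor (for example, it reveals a genome that falls between two genera).

```python
def assign_genus(distances_to_refs: dict[str, float], genus_of: dict[str, str]) -> str:
    """Return the genus of the labeled reference with the smallest distance."""
    if not distances_to_refs:
        raise ValueError("no labeled references to compare against")
    nearest = min(distances_to_refs, key=distances_to_refs.get)
    return genus_of[nearest]


# Hypothetical labels and distances, invented for illustration only.
genus_of = {
    "ref_alpha_1": "Alphacoronavirus",
    "ref_beta_1": "Betacoronavirus",
    "ref_gamma_1": "Gammacoronavirus",
}
distances_for_unknown = {"ref_alpha_1": 0.62, "ref_beta_1": 0.18, "ref_gamma_1": 0.71}
predicted = assign_genus(distances_for_unknown, genus_of)
```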
Create K-mer Tree also works with reads. In the next section, we demonstrate how to create a taxonomic profile from metagenome samples.
With the recent updates to the Download Microbial Reference Database and Taxonomic Profiling functions in QIAGEN CLC Microbial Genomics Module, it is now fast and easy to detect coronavirus presence in metagenome samples containing only a few virus reads. Taxonomic profiling now also supports long reads such as those generated by Oxford Nanopore and PacBio sequencing technologies.
For the first-time setup, we create a viral database:
All complete virus genomes to date, approximately 18,500, remained and were downloaded with a minimum contig length of 1000.
The downloaded database was used to create a taxonomic profiling index using default settings.
The analysis can be carried out in a simple workflow using the curated Microbial Reference Database and human genome to create a Taxonomic Profiling index for host genome filtering (Figure 2).
Results are presented from three different studies with a low fraction of viral reads (Table 1).
Virus abundance values have been aggregated to species level, and the table has been filtered to abundance >10. The % viral reads is the percentage of reads in the sample matching the virus database.
Sample | % viral reads | Species | Taxonomy | Abundance |
SRR10948550 | 1.0556 | Severe acute respiratory syndrome-related coronavirus | Orthornavirae; Pisuviricota; Pisoniviricetes; Nidovirales; Coronaviridae; Betacoronavirus; Severe acute respiratory syndrome-related coronavirus | 985 |
SRR10948550 | 1.0556 | Ambystoma tigrinum virus | Bamfordvirae; Nucleocytoviricota; Megaviricetes; Pimascovirales; Iridoviridae; Ranavirus; Ambystoma tigrinum virus | 39 |
SRR10948550 | 1.0556 | Common midwife toad virus | Bamfordvirae; Nucleocytoviricota; Megaviricetes; Pimascovirales; Iridoviridae; Ranavirus; Common midwife toad virus | 26 |
SRR11092061 | 0.0045 | Severe acute respiratory syndrome-related coronavirus | Orthornavirae; Pisuviricota; Pisoniviricetes; Nidovirales; Coronaviridae; Betacoronavirus; Severe acute respiratory syndrome-related coronavirus | 1304 |
SRR11092061 | 0.0045 | Spodoptera frugiperda rhabdovirus | Orthornavirae; Negarnaviricota; Monjiviricetes; Mononegavirales; Rhabdoviridae; Spodoptera frugiperda rhabdovirus | 822 |
SRR11092061 | 0.0045 | Saccharomyces 20S RNA narnavirus | Orthornavirae; Lenarviricota; Amabiliviricetes; Wolframvirales; Narnaviridae; Narnavirus; Saccharomyces 20S RNA narnavirus | 336 |
SRR11092061 | 0.0045 | Stenotrophomonas virus SMA7 | Loebvirae; Hofneiviricota; Faserviricetes; Tubulavirales; Inoviridae; Subteminivirus; Stenotrophomonas virus SMA7 | 126 |
SRR11092061 | 0.0045 | Influenza A virus | Orthornavirae; Negarnaviricota; Insthoviricetes; Articulavirales; Orthomyxoviridae; Alphainfluenzavirus; Influenza A virus | 112 |
SRR11092061 | 0.0045 | Nipah henipavirus | Orthornavirae; Negarnaviricota; Monjiviricetes; Mononegavirales; Paramyxoviridae; Henipavirus; Nipah henipavirus | 48 |
SRR11092061 | 0.0045 | Common midwife toad virus | Bamfordvirae; Nucleocytoviricota; Megaviricetes; Pimascovirales; Iridoviridae; Ranavirus; Common midwife toad virus | 12 |
SRR11092061 | 0.0045 | Inoviridae sp | Loebvirae; Hofneiviricota; Faserviricetes; Tubulavirales; Inoviridae; Inoviridae sp | 12 |
ERR4385803 | 0.6578 | Gokushovirus WZ-2015a | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; Gokushovirus WZ-2015a | 19753 |
ERR4385803 | 0.6578 | Human gut gokushovirus | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; Human gut gokushovirus | 3883 |
ERR4385803 | 0.6578 | Microviridae sp | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; Microviridae sp | 1726 |
ERR4385803 | 0.6578 | Microviridae | Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae | 47 |
The negative control sample ERR4385803 correctly reports no coronavirus. The abundance of virus was correctly reported in both positive samples (Table 1).
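For readers who export the abundance table and want to repeat these checks outside the Workbench, the following Python sketch shows the two steps described for Table 1: rolling abundances up to species level with a >10 filter, and testing each sample for coronavirus by looking for Coronaviridae in the taxonomy string. The function names and the strain-level rows are hypothetical; the per-sample rows are the top row of each sample in Table 1. This is a sketch of the bookkeeping, not of how Taxonomic Profiling computes abundance.

```python
from collections import defaultdict


def aggregate_to_species(rows, min_abundance=10):
    """Sum abundances per species and keep species with abundance > min_abundance.

    rows: iterable of (species_name, abundance) pairs; a species may appear
    several times, for example once per strain.
    """
    totals = defaultdict(int)
    for species, abundance in rows:
        totals[species] += abundance
    return {s: a for s, a in totals.items() if a > min_abundance}


def coronavirus_abundance(rows):
    """Total abundance of rows whose taxonomy lineage includes Coronaviridae."""
    total = 0
    for taxonomy, abundance in rows:
        lineage = [rank.strip() for rank in taxonomy.split(";")]
        if "Coronaviridae" in lineage:
            total += abundance
    return total


# Hypothetical strain-level rows, invented to show the roll-up and the >10 filter.
strain_rows = [("Virus X", 8), ("Virus X", 7), ("Virus Y", 4)]
species_table = aggregate_to_species(strain_rows)

# Top row of each sample in Table 1, as (taxonomy, abundance).
sars_lineage = (
    "Orthornavirae; Pisuviricota; Pisoniviricetes; Nidovirales; Coronaviridae; "
    "Betacoronavirus; Severe acute respiratory syndrome-related coronavirus"
)
gokusho_lineage = (
    "Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; "
    "Gokushovirus WZ-2015a"
)
table_1_top_rows = {
    "SRR10948550": [(sars_lineage, 985)],
    "SRR11092061": [(sars_lineage, 1304)],
    "ERR4385803": [(gokusho_lineage, 19753)],
}
corona_by_sample = {
    sample: coronavirus_abundance(rows) for sample, rows in table_1_top_rows.items()
}
```

Here `species_table` keeps only the hypothetical "Virus X" (8 + 7 = 15), and `corona_by_sample` gives 985 and 1304 for the two positive samples and 0 for the negative control, matching Table 1.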