CLC Single Cell Analysis Module (commercial plugin)
CLC Single Cell Analysis Module provides tools and workflows for processing single-cell data. The module is a part of QIAGEN CLC Genomics Premium, our complete, full-feature package for ‘omics data analysis.
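The following analyses are supported: single-cell RNA-seq (scRNA-seq), spatial transcriptomics, immune repertoire (scVDJ-seq), single-cell ATAC-seq (scATAC-seq) and hashtag analysis.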
Analyses can be performed from raw FASTQ or from imported pre-processed data. Importers are provided for different expression and peak matrix formats, spatial transcriptomics data, clonotypes, and cell annotations and clusters.
The tools provided by the module can be found under the Single Cell Analysis folder in the Toolbox. Read more details in the manual.
Watch this webinar to learn how to use CLC Single Cell Analysis Module for single-cell gene expression analysis starting from either FASTQ or expression data, and try these tutorials on training a cell type classifier or performing velocity analysis.
Preparing the reads
The first step in any single-cell analysis that starts from reads is to annotate the reads with the cell barcode and, optionally, the Unique Molecular Identifier (UMI) and hashtag. The annotated reads can then be processed further, as described below, depending on the type of data.
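To make the annotation step concrete, the sketch below shows how a cell barcode and UMI might be read off a read pair. It assumes a 10x Chromium 3' v3-style layout, where the first 16 bases of Read 1 are the cell barcode and the next 12 are the UMI. The function `annotate_read`, the `AnnotatedRead` record and the barcode whitelist are hypothetical and do not describe how the module is implemented.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Assumed read layout (10x Chromium 3' v3-style Read 1): barcode, then UMI.
BARCODE_LEN = 16
UMI_LEN = 12


@dataclass
class AnnotatedRead:
    name: str
    barcode: str
    umi: str
    sequence: str  # biological (cDNA) sequence, taken from Read 2


def annotate_read(name: str, read1: str, read2: str,
                  whitelist: Set[str]) -> Optional[AnnotatedRead]:
    """Hypothetical helper: attach the cell barcode and UMI from Read 1 to Read 2.

    Returns None if Read 1 is too short or the barcode is not in the whitelist.
    No barcode error correction is attempted.
    """
    if len(read1) < BARCODE_LEN + UMI_LEN:
        return None
    barcode = read1[:BARCODE_LEN]
    umi = read1[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
    if barcode not in whitelist:
        return None
    return AnnotatedRead(name, barcode, umi, read2)


# Example: a read pair whose barcode is on the whitelist.
whitelist = {"AAACCCAAGAAACACT"}
read = annotate_read("r1", "AAACCCAAGAAACACT" + "GCTAGCTAGCTA" + "TTTT",
                     "ACGTACGTAC", whitelist)
print(read)
```

Production pipelines usually also rescue barcodes that are one mismatch away from a whitelisted barcode; that step is left out here for brevity.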
CLC Single Cell Analysis Module supports scRNA-seq data analysis.
Creating the expression data matrix
The annotated reads are mapped to a reference using the Read mapping and counting tool.
The reads are mapped to the transcriptome, the genome and any provided spike-ins. Mapping to the genome in addition to the transcriptome reduces noise, because reads originating outside annotated transcripts are not forced onto them. Multi-mapping reads are placed using an expectation-maximization (EM) approach. A comprehensive report describes the types of features in the data (mRNA, lncRNA, etc.) and the correlation between measured expression and known spike-in concentrations.
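The EM idea for multi-mapping reads can be illustrated with a generic sketch, which is not the module's implementation: each read is represented by the features it is compatible with, and its count is repeatedly split across those features in proportion to their current estimated abundances until the abundances stop changing. The function name `em_counts` and the simplification of ignoring feature length are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def em_counts(reads: List[List[str]], max_iter: int = 1000,
              tol: float = 1e-10) -> Dict[str, float]:
    """Expected read counts per feature, resolving multi-mapping reads by EM.

    `reads` holds, for each read, the IDs of the features it aligns to equally
    well (a uniquely mapped read has a single entry). Feature length is ignored.
    """
    features = sorted({f for compat in reads for f in compat})
    # Start from uniform relative abundances.
    theta = {f: 1.0 / len(features) for f in features}
    counts: Dict[str, float] = {}
    for _ in range(max_iter):
        # E-step: split each read across its features in proportion to abundance.
        counts = defaultdict(float)
        for compat in reads:
            total = sum(theta[f] for f in compat)
            for f in compat:
                counts[f] += theta[f] / total
        # M-step: new abundances are the expected counts, normalized.
        new_theta = {f: counts[f] / len(reads) for f in features}
        converged = max(abs(new_theta[f] - theta[f]) for f in features) < tol
        theta = new_theta
        if converged:
            break
    return dict(counts)


# Three reads unique to A, one unique to B, two shared between A and B.
reads = [["A"], ["A"], ["A"], ["B"], ["A", "B"], ["A", "B"]]
print(em_counts(reads))
```

In this example the unique reads pull the estimate towards A, so the two shared reads end up split 3:1 in favour of A, giving expected counts of about 4.5 for A and 1.5 for B instead of the 4 and 2 that an even split would give.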
The first part of the analysis includes quality control and normalization steps.
Empty droplet detection, as well as doublet removal, is recommended for droplet-based approaches such as 10x Genomics.
The Quality Control (QC) steps let you set thresholds on various criteria that determine which cells are considered high-quality and kept for downstream analysis. Some of the plots produced are shown below.
Figure 1. Plots from the quality control report.
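As an illustration of threshold-based QC, the sketch below computes three metrics commonly used for this purpose (total UMIs, number of detected genes and mitochondrial fraction) and keeps the cells that pass all thresholds. The choice of metrics, the default threshold values, the function names and the `MT-` gene prefix (the human naming convention) are assumptions; the module's own criteria and defaults are described in the manual.

```python
from typing import Dict, List, Tuple

Counts = Dict[str, int]  # gene -> UMI count for one cell


def qc_metrics(counts: Counts, mito_prefix: str = "MT-") -> Tuple[int, int, float]:
    """Return (total UMIs, genes detected, fraction of UMIs from mitochondrial genes)."""
    total = sum(counts.values())
    detected = sum(1 for c in counts.values() if c > 0)
    mito = sum(c for gene, c in counts.items() if gene.startswith(mito_prefix))
    return total, detected, (mito / total if total else 0.0)


def filter_cells(cells: Dict[str, Counts], min_umis: int = 500,
                 min_genes: int = 200, max_mito: float = 0.2) -> List[str]:
    """Return the barcodes of cells that pass all three (hypothetical) thresholds."""
    kept = []
    for barcode, counts in cells.items():
        total, detected, mito_frac = qc_metrics(counts)
        if total >= min_umis and detected >= min_genes and mito_frac <= max_mito:
            kept.append(barcode)
    return kept


# Toy data, with thresholds lowered to suit the tiny example.
cells = {
    "cell1": {"GAPDH": 10, "ACTB": 5, "MT-CO1": 1},
    "cell2": {"GAPDH": 2, "MT-CO1": 8},  # mostly mitochondrial: likely damaged
}
print(filter_cells(cells, min_umis=5, min_genes=2, max_mito=0.2))  # ['cell1']
```

For mouse data the mitochondrial prefix would typically be `mt-` rather than `MT-`.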
Normalization affects all downstream analyses. The normalization step can also remove batch effects.
Figure 2. Batch correction: Before batch correction, several clusters are observed for each sample. After batch correction, clusters contain a mixture of both samples.
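Normalization methods vary. A common baseline, shown here only as an illustration and not necessarily what the module does, scales each cell's counts to the same total and then log-transforms them, so that cells sequenced to different depths become comparable. The function name and the `target_sum` of 10,000 are illustrative choices, and batch correction is a separate step that is not shown.

```python
import math
from typing import Dict

Counts = Dict[str, float]


def normalize_log1p(cells: Dict[str, Counts],
                    target_sum: float = 10_000.0) -> Dict[str, Counts]:
    """Library-size normalization followed by log(1 + x), per cell."""
    normalized = {}
    for barcode, counts in cells.items():
        total = sum(counts.values())
        scale = target_sum / total if total else 0.0
        normalized[barcode] = {gene: math.log1p(c * scale) for gene, c in counts.items()}
    return normalized


# cell_b has the same composition as cell_a but twice the sequencing depth.
cells = {"cell_a": {"g1": 1, "g2": 3}, "cell_b": {"g1": 2, "g2": 6}}
norm = normalize_log1p(cells)
print(norm["cell_a"] == norm["cell_b"])  # True
```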
RNA velocity is a powerful method for analyzing time-resolved phenomena, such as embryogenesis or tissue regeneration. Velocity analysis is performed automatically when starting an analysis from FASTQ, but requires spliced and unspliced counts to be present when starting from an imported matrix. The resulting velocity matrix is used for scoring velocity genes and creating phase portraits. Together with cell clusters and/or annotations, the matrix aids the analysis of per-gene contributions to cell transitions.
Figure 3. Top: Browsable phase portrait with learned dynamics and steady-state ratio. There is one plot for each gene. Phase portraits can be overlaid with cell annotations and clusters, and data from gene expression, velocity and peak matrices. Here, the cells are colored by cell type. Bottom: UMAP showing velocity arrows, where cells are colored by the inferred latent time.
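A phase portrait plots unspliced against spliced counts for one gene. In the simple steady-state model of RNA velocity, the steady-state ratio γ is the slope of that relationship, and a cell's velocity for the gene is its unspliced count minus γ times its spliced count: positive values suggest the gene is being induced, negative values that it is being repressed. The sketch below fits γ by least squares through the origin on a chosen set of cells. This is a simplification (tools such as scVelo fit only extreme quantiles of the data), the function names are hypothetical, and it does not reproduce the learned dynamics shown in the figure.

```python
from typing import List, Sequence


def steady_state_gamma(u: Sequence[float], s: Sequence[float]) -> float:
    """Least-squares slope through the origin of unspliced (u) vs spliced (s) counts."""
    num = sum(ui * si for ui, si in zip(u, s))
    den = sum(si * si for si in s)
    if den == 0:
        raise ValueError("spliced counts are all zero; gamma is undefined")
    return num / den


def steady_state_velocity(u: Sequence[float], s: Sequence[float],
                          gamma: float) -> List[float]:
    """Per-cell velocity v = u - gamma * s under the steady-state model."""
    return [ui - gamma * si for ui, si in zip(u, s)]


# Fit gamma on cells assumed to be near steady state, then score other cells.
gamma = steady_state_gamma([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(gamma)  # 0.5
print(steady_state_velocity([1.0, 3.0, 0.5], [2.0, 2.0, 4.0], gamma))  # [0.0, 2.0, -1.5]
```

The second cell has more unspliced RNA than the steady state predicts (positive velocity, gene switching on), while the third has less (negative velocity, gene switching off).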
UMAP and tSNE are de facto standards for visualizing single-cell expression data. Our interactive 2D and 3D visualizations can be overlaid with cell annotations and clusters, and data from gene expression, velocity and peak matrices.
Figure 4. UMAP of single-cell data from 8393 liver cells (MacParland et al., 2018), colored by predicted cell types.
Manual annotation can be performed with just a few clicks using the Lasso tool, and comprehensive filtering and selection options make it easy to select the relevant cells. The plot editor supports both expression analysis and manual annotation, and offers many visualization options.
Clustering is performed using the graph-based Leiden algorithm.
Cell type prediction is traditionally performed on clusters of cells. However, this has the disadvantage that errors in clustering, or simply too coarsely grained clusters, can lead to imprecise annotations. CLC Single Cell Analysis Module provides a pre-trained classifier that annotates individual cells. The classifier has been trained on large single-cell projects from human and mouse experiments, annotated with the QIAGEN Cell Ontology.
Figure 5. The QIAGEN Cell Ontology browser aids selection of cell types when performing manual curation. The ontology is also supported in the pre-trained classifier.
Differential gene expression between pairs of selected clusters, or between a cluster and the rest of the cells, can be launched quickly from the UMAP or tSNE plot editor. The expression of the resulting genes can then be visualized in various expression plots (volcano plot, heat map, dot plot, violin plot). Differential expression results can be used for GO analysis to guide further manual cluster annotation, and pathway analysis can be performed by uploading them to QIAGEN Ingenuity Pathway Analysis (IPA).
Figure 6. Heat Map, Dot Plot and Violin Plot of data where cell types were predicted using our pre-trained classifier.
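A central quantity in a cluster-versus-rest comparison is the fold change in mean expression between the two groups of cells. The sketch below computes a log2 fold change with a pseudocount. It omits the statistical test and multiple-testing correction that a real differential expression analysis needs, and it is not the module's method; the function name and the pseudocount of 1 are illustrative choices.

```python
import math
from typing import Dict, List


def log2_fold_change(expr: List[Dict[str, float]], labels: List[str], gene: str,
                     cluster: str, pseudocount: float = 1.0) -> float:
    """log2((mean in cluster + pc) / (mean in all other cells + pc)) for one gene.

    `expr` holds one {gene: normalized expression} dict per cell; genes missing
    from a cell's dict are treated as zero.
    """
    inside = [cell.get(gene, 0.0) for cell, lab in zip(expr, labels) if lab == cluster]
    outside = [cell.get(gene, 0.0) for cell, lab in zip(expr, labels) if lab != cluster]
    if not inside or not outside:
        raise ValueError("both the cluster and the rest must contain cells")
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    return math.log2((mean_in + pseudocount) / (mean_out + pseudocount))


expr = [{"CD3E": 7.0}, {"CD3E": 7.0}, {"CD3E": 2.0}, {}]
labels = ["T cell", "T cell", "B cell", "B cell"]
print(log2_fold_change(expr, labels, "CD3E", "T cell"))  # 2.0
```

The pseudocount keeps the ratio finite when a gene is not expressed at all in one of the groups.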
CLC Single Cell Analysis Module offers analysis of spatial transcriptomics data produced by 10x Genomics Visium Spatial Gene Expression and processed using Space Ranger.
Space Ranger outputs can be imported, and the resulting Spatial Transcriptomics plot can be used to visualize and interact with the spatial transcriptomics data. UMAP and tSNE plots can be associated with the Spatial Transcriptomics plot, so that selections made in one plot are also made in the other, and the same visualization options are applied to both plots.
Figure 7. The source of colors in the Spatial Transcriptomics plot at the top is controlled from the Side Panel of the UMAP plot at the bottom. Lasso selection in either plot is reflected in both plots.
CLC Single Cell Analysis Module offers analysis of clonotypes at the cellular level from scVDJ-seq (scTCR-seq and scBCR-seq) data. Clonotypes are predicted from cell contigs assembled de novo from the annotated reads.
Chains are identified and the V, D, J, C and CDR3 regions are annotated. The predicted clonotypes can be visualized as alignments between the assembled contigs and the annotated regions. Sankey plots show how the V, D, J, C and CDR3 regions form the clonotypes for the different chains.
Filters on different criteria can be applied, e.g., retain only cells for which scRNA-seq data is available, remove non-productive clonotypes, keep only specific chains, etc.
Clonotypes can also be combined and compared across sample/treatment and other groupings.
Reports summarize the identified clonotypes, with information about diversity, V, D, J and C gene usage, CDR3 length distribution and clonotype frequency.
Figure 8. Plots from the immune repertoire reports.
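Clonotype frequency, diversity and CDR3 length distribution can all be computed from per-cell data. The sketch below uses the Shannon index as the diversity measure; which indices the module's reports use is not stated here, so treat both the choice of index and the function names as assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def clonotype_frequencies(clonotypes: Iterable[str]) -> Dict[str, float]:
    """Fraction of cells carrying each clonotype (one label per cell)."""
    counts = Counter(clonotypes)
    n = sum(counts.values())
    return {c: k / n for c, k in counts.items()}


def shannon_diversity(freqs: Dict[str, float]) -> float:
    """Shannon index H = -sum(p * ln p): 0 for a single clonotype, higher when more diverse."""
    return sum(-p * math.log(p) for p in freqs.values() if p > 0)


def cdr3_length_distribution(cdr3s: Iterable[str]) -> Counter:
    """Number of CDR3 amino acid sequences of each length."""
    return Counter(len(seq) for seq in cdr3s)


cells = ["clonotype1", "clonotype1", "clonotype2", "clonotype2"]
freqs = clonotype_frequencies(cells)
print(freqs)                     # {'clonotype1': 0.5, 'clonotype2': 0.5}
print(shannon_diversity(freqs))  # ln 2, about 0.693
print(cdr3_length_distribution(["CASSLGQGYEQYF", "CASSPGTEAFF"]))
```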
Finally, the clonotypes can be converted to cell annotations and overlaid in UMAP and tSNE plots from scRNA-seq data.
Figure 9. UMAP of scRNA-seq data. Top: Cells are colored by the predicted V gene for the TRB chain from matched scTCR-seq data. Clonotyped cells for which the TRB clonotype could not be identified are annotated as “None”. Bottom: Cells are colored by the predicted cell type. Note that, as expected, only the T cells have matching clonotypes.
CLC Single Cell Analysis Module supports analysis of scATAC-seq data. Peak calling is performed on deduplicated mappings obtained from the annotated reads. Peaks are then annotated with nearby genes and scanned for transcription factor (TF) motifs, and a comprehensive QC report is produced. Read mappings and graph tracks can be split by groups of cells and visualized together in a genome browser. It is also possible to generate UMAP and tSNE plots from the peak matrix.
Figure 10. Peak count analysis showing genome browser view (top), the UMAP plot calculated on the count matrix (bottom left), and tables containing information on TF motif binding and nearby genes (bottom right).
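Deduplication and building a peak-by-cell count matrix can be sketched as follows, assuming each mapping has been reduced to a fragment with a cell barcode, chromosome and 0-based, half-open start/end coordinates. Fragments with identical barcode and coordinates are treated as duplicates and kept once, and the remaining fragments are counted per cell for each peak they overlap. The data layout, the exact-duplicate rule and the function names are simplifying assumptions rather than the module's implementation, and peak calling itself is not shown.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Fragment(NamedTuple):
    barcode: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


Peak = Tuple[str, int, int]  # (chrom, start, end), half-open


def deduplicate(fragments: List[Fragment]) -> List[Fragment]:
    """Keep the first fragment for each (barcode, chrom, start, end)."""
    seen = set()
    unique = []
    for frag in fragments:
        if frag not in seen:  # NamedTuples hash and compare by value
            seen.add(frag)
            unique.append(frag)
    return unique


def count_in_peaks(fragments: List[Fragment],
                   peaks: List[Peak]) -> Dict[Peak, Dict[str, int]]:
    """Sparse peak-by-cell matrix: peak -> {barcode: overlapping fragment count}."""
    matrix: Dict[Peak, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for frag in fragments:
        for peak in peaks:  # linear scan; real tools use an interval index
            chrom, p_start, p_end = peak
            if frag.chrom == chrom and frag.start < p_end and p_start < frag.end:
                matrix[peak][frag.barcode] += 1
    return {peak: dict(cells) for peak, cells in matrix.items()}


frags = [
    Fragment("CELL1", "chr1", 100, 250),
    Fragment("CELL1", "chr1", 100, 250),  # duplicate of the fragment above
    Fragment("CELL2", "chr1", 180, 300),
    Fragment("CELL2", "chr2", 100, 200),
]
unique = deduplicate(frags)
print(count_in_peaks(unique, [("chr1", 200, 400)]))
```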
CLC Single Cell Analysis Module supports hashtag analysis for a wide variety of data types.
Hashtags from annotated reads are mapped to cell annotations. UMAP and tSNE plots produced using scRNA-seq or scATAC-seq data can be colored using these cell annotations, revealing which cells contained which hashtags, and in what amounts. Alternatively, the cell annotations can be used to further demultiplex scRNA-seq, scVDJ-seq and scATAC-seq into samples, if hashtags have been used for sample multiplexing.
Figure 11. UMAP of scRNA-seq data from two samples multiplexed using TotalSeq. Cells are colored by sample. Top left: Multiplexed data. Top right: Demultiplexed data. Red cells represent unidentified samples (no matching hashtag). Bottom: Batch-corrected demultiplexed data.
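A minimal way to turn per-cell hashtag counts into sample assignments is to give each cell the hashtag that dominates its hashtag counts, and to flag cells with too few hashtag counts or no dominant hashtag. The rule, the thresholds, the labels `Unidentified` and `Ambiguous`, and the function names are hypothetical; dedicated demultiplexing methods model the background hashtag signal statistically.

```python
from collections import defaultdict
from typing import Dict, List


def assign_hashtag(counts: Dict[str, int], min_total: int = 10,
                   min_fraction: float = 0.6) -> str:
    """Assign a cell to its dominant hashtag, or flag it (hypothetical rule)."""
    total = sum(counts.values())
    if total < min_total:
        return "Unidentified"  # too little hashtag signal
    best, best_count = max(counts.items(), key=lambda kv: kv[1])
    if best_count / total >= min_fraction:
        return best
    return "Ambiguous"  # no single dominant hashtag, e.g. a possible doublet


def demultiplex(cells: Dict[str, Dict[str, int]]) -> Dict[str, List[str]]:
    """Group cell barcodes by their assigned hashtag (i.e. sample)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, counts in cells.items():
        groups[assign_hashtag(counts)].append(barcode)
    return dict(groups)


cells = {
    "CELL1": {"HTO_A": 95, "HTO_B": 5},
    "CELL2": {"HTO_A": 3, "HTO_B": 80},
    "CELL3": {"HTO_A": 40, "HTO_B": 45},
    "CELL4": {"HTO_A": 1},
}
print(demultiplex(cells))
```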
The workflows are designed to perform most of the available analysis steps. For example, the gene expression workflows produce a UMAP plot annotated with automatically predicted cell types and clusters; a Dot Plot, Heat Map and Violin Plot showing the expression of highly variable genes in each cell; and, optionally, velocity estimates if spliced and unspliced counts are available.
We frequently release updates and improvements, such as new features and bug fixes. For a complete overview, please visit the latest improvements page.