Author: QIAGEN Digital Insights
May 7, 2024

Immune Repertoire Analysis Showdown: Speed, Ease, Accuracy

Your tools can make or break your analysis, so we pitted the top B-cell receptor reconstruction tools against each other.

As treatments are increasingly tailored to the individual for greater efficacy, understanding the immune repertoire becomes more critical. B-cell receptor (BCR) reconstruction plays a crucial role in characterizing the immune system’s response to various stimuli. Our immune system produces B cells, which carry unique receptors on their surfaces. These receptors act like keys, specifically recognizing and binding to foreign invaders such as viruses or bacteria. By reconstructing BCRs from single-cell RNA-seq (scRNA-seq) data, we can gain valuable information on the diversity and specificity of the immune response at the single-cell level.

Have you ever had to put a jigsaw puzzle together? That’s what happens when BCRs are reconstructed from scRNA-seq data. Choosing the right tool to assemble your numerous small pieces of data then becomes vital for achieving accurate results. But with so many software options out there, which one takes the crown?

Benchmarking

A recent publication (1) compared the performance of several popular tools for BCR reconstruction from scRNA-seq data (2–7).

Abdul R. Estalefi and Mathias Østergaard Mikkelsen, students at the Department of Biological and Chemical Engineering at Aarhus University, replicated the study and added QIAGEN CLC Genomics Workbench v.23.0.5 (CLC), equipped with the Biomedical Genomics Analysis plugin, to the mix. Specifically, they used the Immune Repertoire Analysis tool of CLC, which was developed for bulk RNA-seq data and is also included in the CLC Single Cell Analysis Module. Each single cell was treated as a separate sample. See the complete workflow in Figure 1.

Analyzed datasets

The following dataset types were analyzed:

Real data: Datasets with BCR sequences from actual immune cells (plasmablasts) obtained from earlier studies

Simulated data: Datasets mimicking real-world scenarios with mutations in the BCR genes (heavy and light chains).

Testimonial: CLC Genomics Workbench offers a complete platform, fully optioned and easy to use.

Results

The results, shown in Figure 2, indicate that:

  1. CLC achieved the highest average score and performed well across all real and simulated datasets, followed by BASIC and BALDR.
  2. CLC and BRACER excelled at reconstructing receptors in simulated datasets with added mutations.
  3. CLC was the easiest to set up and use. Its user-friendly point-and-click interface requires no extra software installation or coding.
  4. CLC was resource-efficient, as it completed the analysis on a standard laptop. See the minimum requirements to run CLC.

Conclusion

When working on BCR reconstruction (or NGS data in general), you want a tool with everything you might need. CLC offers a comprehensive toolset for immune repertoire analysis of single-cell data, among other applications.

You also need a software package that is easy to set up, does not require coding and works across various hardware. That way, there is no need to invest in new hardware or spend weeks or months learning a programming language. This makes it easier to get started and lets you generate insights from your data right away. Remember: the right tools can take your research to the next level.

Learn more or request a trial of CLC Genomics Workbench, your all-in-one toolkit.

References:

  1. Andreani T, et al. NAR Genom Bioinform. 2022;4(3).
  2. Bolotin DA, et al. Nat Methods. 2015;12:380–381.
  3. Canzar S, et al. Bioinformatics. 2017;33:425–427.
  4. Lindeman I, et al. Nat Methods. 2018;15:563–565.
  5. Upadhyay AA, et al. Genome Med. 2018;10:20.
  6. Rizzetto S, et al. Bioinformatics. 2018;34:2846–2847.
  7. Song L, et al. Nat Methods. 2021;18:627–630.

Author acknowledgments: We thank Dr. Tommaso Andreani, Senior Principal Data Scientist at Sanofi, for continuous support and encouragement during this study.

Figure 1. CLC workflow used for dataset analysis.
The samples were handled in parallel using the Iterate element, and the results were aggregated for all samples using the Collect and Distribute elements. FASTQs pre-processed with the Trim Reads tool were used as workflow inputs. The per-sample part of the workflow consisted of three steps: (1) de novo assembly, (2) consensus sequence extraction and (3) Immune Repertoire Analysis. IMGT Human BCR segments were used as reference segments. Compare Immune Repertoires produced the final output containing the clonotypes across all samples, which were exported to .csv to compare with the truth and compute the scores. A MacBook® Pro 2021 with an M1 Pro processor and 32 GB RAM was used to run the workflow.
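
For readers who want to try the last step themselves, comparing an exported .csv with the truth can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names `cell_id` and `sequence`, the helper names and the exact-match score are all hypothetical, and the score is a simple stand-in rather than the metric used in the study.

```python
import csv
import io


def load_sequences(handle, id_column, seq_column):
    """Map each cell ID to its upper-cased sequence, read from a CSV file handle."""
    reader = csv.DictReader(handle)
    return {row[id_column]: row[seq_column].strip().upper() for row in reader}


def exact_match_score(reconstructed, truth):
    """Fraction of ground-truth cells whose reconstructed sequence matches exactly.

    Assumption: exact string identity is used here for simplicity; it is not
    necessarily the scoring scheme applied in the benchmark.
    """
    if not truth:
        raise ValueError("ground truth is empty")
    hits = sum(1 for cell, seq in truth.items() if reconstructed.get(cell) == seq)
    return hits / len(truth)


# Tiny made-up inputs, for illustration only (not data from the study).
truth_csv = io.StringIO("cell_id,sequence\ncell1,ACGT\ncell2,GGCC\n")
recon_csv = io.StringIO("cell_id,sequence\ncell1,acgt\ncell2,GGCA\n")

truth = load_sequences(truth_csv, "cell_id", "sequence")
recon = load_sequences(recon_csv, "cell_id", "sequence")
score = exact_match_score(recon, truth)
print(f"Exact-match score: {score:.2f}")
```

With real exports, the in-memory strings would be replaced by `open("clonotypes.csv", newline="")` (a hypothetical file name), and the column names adjusted to match the actual export.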

Figure 2. Heat map showing each method's individual and average scores (y-axis) for the four datasets (x-axis). Leiden, Canzar and Upadhyay used plasmablast SMART-seq datasets with Sanger-sequenced ground truths. As previously described, SHM consisted of simulated datasets of somatic hypermutations in heavy and light chains (1).
