
Author: QIAGEN Digital Insights


June 21, 2024

Using NGS to identify vector integration sites in the host genome

The integration of viruses, retroviruses, transposable elements or vectors into host genomes is a central feature of genome biology and bioengineering. Molecular characterization of insertion sites is one of the most important steps for ensuring that this integration is safe and as intended, and it is also a powerful tool for genetic screening strategies. Current ways to characterize these insertion events include inverse PCR (iPCR), targeted locus amplification (TLA) and next-generation sequencing (NGS) hybridization capture.

The purpose of characterization is to find all vector integration events, including incomplete events such as partial or rearranged insertions. Common analysis methods for hybridization capture data first map reads to the inserted sequence and then map the reads' "unaligned" ends (the portions that do not match the insert) to the host genome. Where applicable, they also record discordant read pairs, in which one mate maps to the inserted sequence and the other to the host genome (e.g., TDNAscan (1)).
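To make the sequential approach concrete, here is a minimal sketch of its first step: scanning reads already aligned to the insert for long unaligned (soft-clipped) ends and for mates that landed on another reference. The `Alignment` record, the `"vector"`/`"host"` labels and the `min_clip` threshold are hypothetical simplifications; a real pipeline would read SAM/BAM records and then map the clipped ends to the host genome.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical, simplified alignment record (a real pipeline would read SAM/BAM).
@dataclass
class Alignment:
    read_name: str
    reference: str                        # illustrative labels: "vector" or "host"
    cigar: str                            # SAM-style CIGAR string, e.g. "30S70M"
    mate_reference: Optional[str] = None  # where the mate mapped, if paired

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clips(cigar: str) -> Tuple[int, int]:
    """Return the (left, right) soft-clipped lengths of a CIGAR string."""
    # Hard clips (H) may sit outside soft clips, so ignore them.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    left = ops[0][0] if ops and ops[0][1] == "S" else 0
    right = ops[-1][0] if ops and ops[-1][1] == "S" else 0
    return left, right

def junction_evidence(alignments: List[Alignment], insert_ref: str = "vector",
                      min_clip: int = 20) -> Tuple[List[str], List[str]]:
    """Collect insert-mapped reads with a long unaligned end, and insert-mapped
    reads whose mate mapped to a different reference (discordant pairs)."""
    unaligned_ends, discordant = [], []
    for aln in alignments:
        if aln.reference != insert_ref:
            continue
        if max(soft_clips(aln.cigar)) >= min_clip:
            # In the sequential approach, these clipped ends are next mapped to the host.
            unaligned_ends.append(aln.read_name)
        if aln.mate_reference is not None and aln.mate_reference != insert_ref:
            discordant.append(aln.read_name)
    return unaligned_ends, discordant

alns = [
    Alignment("r1", "vector", "30S70M"),
    Alignment("r2", "vector", "100M", mate_reference="host"),
    Alignment("r3", "vector", "100M", mate_reference="vector"),
]
print(junction_evidence(alns))  # (['r1'], ['r2'])
```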

New methods of vector integration site characterization

The CLC Identify Viral Integration Sites tool takes this a step further by supporting sequence capture enrichment protocols. These protocols enrich for the inserted sequences and capture chimeric reads and discordant read pairs. The tool has previously been used to identify human papillomavirus (HPV) integration sites (2).

Reads are mapped simultaneously against the host genome and a database of inserted virus or vector sequences. The Find Best References Using Read Mapping tool searches this database for the best match to the inserted sequence, which is then used as the reference. Because the reads are mapped to the virus and host genomes at the same time, CLC can generate faster and more accurate results than sequential mapping.
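The two ideas in this paragraph, picking the best-matching insert from a database and evaluating each read against host and insert together, can be illustrated with a small sketch. It deliberately swaps in a simple shared-k-mer score instead of read mapping, so it is not how the CLC tools work internally; the function names, `k` and `min_hits` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def best_reference(reads: List[str], candidates: Dict[str, str],
                   k: int = 15) -> Tuple[str, Dict[str, int]]:
    """Score each candidate insert by how many read k-mers it contains; return the best."""
    read_kmers: Counter = Counter()
    for read in reads:
        read_kmers.update(kmers(read, k))
    scores = {}
    for name, seq in candidates.items():
        ref_kmers = kmers(seq, k)
        scores[name] = sum(n for km, n in read_kmers.items() if km in ref_kmers)
    return max(scores, key=scores.get), scores

def assign_read(read: str, host_kmers: Set[str], insert_kmers: Set[str],
                k: int = 15, min_hits: int = 3) -> str:
    """Compare one read against host and insert at the same time."""
    read_kmers = kmers(read, k)
    host_hits = len(read_kmers & host_kmers)
    insert_hits = len(read_kmers & insert_kmers)
    if host_hits >= min_hits and insert_hits >= min_hits:
        return "chimeric"  # candidate junction read spanning host and insert
    if host_hits >= min_hits:
        return "host"
    if insert_hits >= min_hits:
        return "insert"
    return "unassigned"
```

Scoring each read against both references in one pass is what lets a junction-spanning read be recognized as chimeric directly, rather than only after a second mapping round.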

For reads that map to the host genome, the unaligned ends are collected and mapped against the inserted sequence, and vice versa. For broken read pairs whose orientation matches the direction of the unaligned ends, the tool analyzes the breakpoint information to identify host and virus reads. Users can customize the tool's parameters to balance sensitivity and specificity.
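A minimal sketch of how clipped-end and broken-pair evidence might be combined into breakpoint calls, assuming the clipped ends have already been mapped to the insert. The records, the clustering rule and the orientation check (forward mates upstream of a right-clipped breakpoint, reverse mates downstream of a left-clipped one) are a common heuristic chosen for illustration; `window`, `max_dist`, `min_clips` and `min_pairs` are hypothetical knobs, not the CLC tool's parameter names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class ClipEvent:
    """A host-mapped read whose unaligned end maps to the insert (hypothetical record)."""
    chrom: str
    pos: int   # host coordinate where the read stops aligning
    side: str  # "left" or "right": which end of the read is clipped

@dataclass(frozen=True)
class BrokenMate:
    """The host-mapped mate of a pair whose other mate maps to the insert."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"

def cluster_positions(positions: List[int], window: int) -> List[List[int]]:
    """Group sorted positions, starting a new cluster when the gap exceeds window."""
    clusters: List[List[int]] = []
    for p in sorted(positions):
        if clusters and p - clusters[-1][-1] <= window:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

def call_breakpoints(clips: List[ClipEvent], mates: List[BrokenMate], window: int = 5,
                     max_dist: int = 500, min_clips: int = 2,
                     min_pairs: int = 1) -> List[Tuple[str, int, str, int, int]]:
    by_side: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for c in clips:
        by_side[(c.chrom, c.side)].append(c.pos)
    calls = []
    for (chrom, side), positions in sorted(by_side.items()):
        for cluster in cluster_positions(positions, window):
            if len(cluster) < min_clips:
                continue
            bp = cluster[len(cluster) // 2]  # middle clip position as the breakpoint
            # Count broken pairs whose orientation points toward the clipped end.
            if side == "right":
                pairs = sum(1 for m in mates if m.chrom == chrom and m.strand == "+"
                            and 0 <= bp - m.pos <= max_dist)
            else:
                pairs = sum(1 for m in mates if m.chrom == chrom and m.strand == "-"
                            and 0 <= m.pos - bp <= max_dist)
            if pairs >= min_pairs:
                calls.append((chrom, bp, side, len(cluster), pairs))
    return calls
```

Raising `min_clips` or `min_pairs` trades sensitivity for specificity; lowering them does the opposite.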

Let’s see how those tools perform with real data.

Vector integration site characterization with CLC

We used both Identify Viral Integration Sites and Find Best References Using Read Mapping to identify integration sites in the genome of Arabidopsis thaliana. The data were Illumina paired-end (PE) reads generated by Inagaki et al. (3) using sequence capture probes targeting the T-DNA. In the original study, the identified integration sites were verified by PCR. Sun et al. (1) later reanalyzed the data with new bioinformatic tools and found evidence of additional T-DNA integration sites. Many of these integration events were scrambled or partial.

CLC precisely identified the original PCR-verified pCAMBIA3300 integrations reported by Inagaki et al. (Table 1, yellow), using the TAIR10 Arabidopsis reference as the host genome. It also identified the additional T-DNA integration sites found in the more sophisticated reanalysis by Sun et al. (Table 1, green). Finally, CLC detected ten new integration sites (Table 1, blue), which gives finer resolution of local integration events and a more comprehensive characterization of the genome-wide integration landscape.

Table 1. Summary of the called insertions, their orientation and the underlying supporting evidence. Read mappings to the inserted genome and to the host genome are also available (Figures 1 and 2). A Circos plot provides an interactive graphical view of the table and the read mappings, with concentric layers showing coverage, broken pairs, unaligned ends and their starting points (Figures 3 and 4).
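One simple way to sort a set of calls into the kind of categories Table 1 color-codes (PCR-verified, found in a reanalysis, new) is to match each call against the earlier call sets within a small coordinate tolerance. This is a hypothetical sketch, not the procedure behind Table 1; the function names, the `tolerance` value and all coordinates below are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]  # (chromosome, position)

def near(site: Site, known: Iterable[Site], tolerance: int) -> bool:
    """True if site lies within tolerance bp of any known site on the same chromosome."""
    return any(site[0] == k[0] and abs(site[1] - k[1]) <= tolerance for k in known)

def categorize_calls(calls: List[Site], pcr_verified: List[Site],
                     reanalysis: List[Site], tolerance: int = 10) -> Dict[Site, str]:
    """Label each call by the first earlier call set it matches."""
    labels = {}
    for site in calls:
        if near(site, pcr_verified, tolerance):
            labels[site] = "PCR-verified"  # cf. yellow rows
        elif near(site, reanalysis, tolerance):
            labels[site] = "reanalysis"    # cf. green rows
        else:
            labels[site] = "new"           # cf. blue rows
    return labels

# Toy coordinates for illustration only (not the study's data).
calls = [("chr1", 1003), ("chr2", 2000), ("chr3", 3000)]
print(categorize_calls(calls, pcr_verified=[("chr1", 1000)], reanalysis=[("chr2", 1995)]))
```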

By using the right tools, you can accurately identify the T-DNA sequence, precise breakpoints, variants and rearrangements, with far less risk of missing integration sites. This helps keep your work regulation-compliant, with all genetic modifications characterized at the molecular level.

Iterative mappings, read extractions, de novo assemblies and realignments to the vector and host genomes let you reconstruct the integration events at the identified sites. These steps resolve complex, partial and scrambled integration events that rearrange the genomic context, producing a complete picture ready for analysis. With its suite of advanced NGS analysis tools, CLC Genomics Workbench Premium remains the clear choice for safe, effective and efficient research.
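The de novo assembly step can be illustrated with a naive greedy overlap assembler: reads extracted around a breakpoint are repeatedly merged by their longest suffix-prefix overlap until one contig spanning host and insert remains. This is a textbook technique, far simpler than the assemblers in CLC Genomics Workbench, and it assumes error-free reads from one strand (no reverse complements); `min_len` is a hypothetical parameter.

```python
from typing import List

def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that matches a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: List[str], min_len: int = 10) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    seqs = [s for s in seqs if not any(s != o and s in o for o in seqs)]  # drop contained reads
    while len(seqs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = seqs[best_i] + seqs[best_j][best_len:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)] + [merged]
    return seqs
```

Greedy merging is easy to follow but can mis-join reads in repeats, which is one reason real pipelines realign reads to the assembled junction afterwards.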


Figure 1. Read mapping to inserted genome of sample SRR2077990 pCAMBIA3300-pFWA-HTB2-CFP_18, near the Left Border Repeat of the T-DNA, showing unaligned ends originating from the host genome.

Figure 2. Read mapping to host genome of sample SRR2077990 pCAMBIA3300-pFWA-HTB2-CFP_18, at two integration sites of the Left Border Repeat of the T-DNA (Chr1 28314915 and, in the opposite direction, 28314970), showing unaligned ends originating from the T-DNA.

Figure 3. Circos plot of the insertion site calls in sample SRR2077990 pCAMBIA3300-pFWA-HTB2-CFP_18, with concentric layers showing coverage, broken pairs, unaligned ends and their starting points.

Figure 4. Circos plot of the insertion site calls in sample SRR2077990 pCAMBIA3300-pFWA-HTB2-CFP_18 at positions Chr1 28314915 and, in the opposite direction, 28314970. Position 28314970 is the region shown in Figures 1 and 2 for the insert and host genomes, respectively. Concentric layers show coverage, broken pairs, unaligned ends and their starting points.

References

1. Sun L, et al. TDNAscan: A Software to Identify Complete and Truncated T-DNA Insertions. Front Genet. 2019;10:685. doi: 10.3389/fgene.2019.00685.
2. Shen-Gunther J, Cai H, Wang Y. HPV Integration Site Mapping: A Rapid Method of Viral Integration Site (VIS) Analysis and Visualization Using Automated Tools in CLC Microbial Genomics. Int J Mol Sci. 2022;23(15):8132. doi: 10.3390/ijms23158132.
3. Inagaki S, Henry IM, Lieberman MC, Comai L. High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture. PLoS One. 2015;10(10):e0139672. doi: 10.1371/journal.pone.0139672.
