4 reasons you should use manually curated data

Author:

QIAGEN Digital Insights

Why manually curated data is essential to convert data into knowledge

Are you a researcher or data scientist working in drug discovery? If so, you depend on data to help you achieve unique insights by revealing patterns across experiments. Yet, not all data are created equal. The quality of the data you use to inform your research is essential. For example, if you acquire data using natural language processing (NLP) or text mining, you may have a broad pool of data, but at the cost of a relatively high error rate (1).

As a drug development researcher, you’re also familiar with freely available datasets from public ‘omics data repositories. You rely on them to help you gain insights for your preclinical programs. These open-access datasets aggregated in portals such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) contain data from thousands of samples used to validate or redirect the discovery of gene signatures, biomarkers and therapies. In theory, access to so much experimental data should be an asset. But because the data are unintegrated and inconsistent, they are not directly usable. So in practice, it’s costly, time-consuming and inefficient to spend hours sifting through these portals for the information needed to clean up the data before you can use them.

Data you can use right away

Imagine how transformative it would be if you had direct access to ‘usable data’ that you could immediately understand and work with, without searching for additional information or having to clean and structure it. Data that are comprehensive yet accurate, reliable and analysis-ready. Data you can begin converting into knowledge right away to drive your biomedical discoveries.

Creating usable data 

Data curation has become an essential requirement in producing usable data. Data scientists spend an estimated 80% of their time collecting, cleaning and processing data, leaving less than 20% of their time for analyzing the data to generate insights (2,3). But data curation is not just time-consuming. It’s costly and challenging to scale as well, particularly if legacy datasets must be revised to match updated curation standards.

What if there were a team of experts to take on the manual curation of the data you need so researchers like you could focus on making discoveries?

Our experts have been curating biomedical and clinical data for over 25 years. We’ve made massive investments in a biomedical and clinical knowledge base that contains millions of manually reviewed findings from the literature, plus information from commonly used third-party databases and ‘omics dataset repositories. Our human-certified data enable you to generate insights rather than collect and clean data. With our knowledge base and databases, scientists like you can generate high-quality, novel hypotheses quickly and efficiently while using innovative and advanced approaches, including artificial intelligence.

 

Figure 1.  Our workflow for processing ‘omics data.

 

4 advantages of manually curated data 

Our 200 dedicated curation experts follow seven best practices for manual curation. Why do we apply so much manual effort to data curation? Based on our principles and practices for manual curation, here are the top four reasons manually curated data are fundamental to your research success:

1. Metadata fields are unified, not redundant

Author-submitted metadata vary widely. Manual curation of field names enforces alignment to a set of well-defined standards. Our curators identify hundreds of columns containing frequently used information across studies and combine these data into unified columns to enhance cross-study analyses. This unification is evident in our TCGA metadata dictionary, for example, where we unified into a single field the five different fields that were used to indicate TCGA samples with a cancer diagnosis in a first-degree family member.
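
As an illustration only, a unification step of this kind can be sketched in a few lines of pandas. The field names below are hypothetical, not the actual TCGA submission fields or our dictionary’s terms:

```python
import pandas as pd

# Hypothetical redundant field names; real submissions use
# study-specific names that curators map by hand.
FAMILY_HISTORY_FIELDS = [
    "family_history_cancer",
    "first_degree_relative_cancer_dx",
    "relative_family_cancer_history",
    "family_cancer_hx",
    "cancer_first_degree_relative",
]

def unify_family_history(metadata: pd.DataFrame) -> pd.DataFrame:
    """Collapse redundant family-history columns into one unified field.

    For each sample, keep the first non-null value found across the
    redundant source columns, then drop the originals.
    """
    present = [c for c in FAMILY_HISTORY_FIELDS if c in metadata.columns]
    if not present:
        return metadata
    unified = metadata[present].bfill(axis=1).iloc[:, 0]
    return metadata.drop(columns=present).assign(
        family_history_first_degree_cancer=unified
    )
```

The hard part, of course, is not the merge itself but deciding, study by study, which author-submitted columns actually mean the same thing; that judgment is what the manual step provides.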

2. Data labels are clear and consistent

Unfortunately, published datasets commonly provide vague abbreviations as labels for patient groups, tissue types, drugs or other key elements. To develop successful hypotheses from these data, it’s critical that you understand the intended meaning of, and relationships among, the labels. Our curators take the time to investigate each study and apply labels precisely and accurately, so that you can group and compare the data in the study with other relevant studies.
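
A toy version of that normalization step, with made-up abbreviations and mappings rather than our actual controlled vocabularies, might look like this:

```python
# Hypothetical mapping from vague author-submitted labels to
# controlled-vocabulary terms; in real curation each abbreviation is
# resolved by reading the study itself, not by a fixed lookup table.
LABEL_VOCABULARY = {
    "ctrl": "control",
    "veh": "vehicle-treated",
    "tx": "treated",
    "nsclc": "non-small cell lung carcinoma",
}

def normalize_label(raw: str) -> str:
    """Map a raw sample-group label to a controlled-vocabulary term.

    Unknown labels are returned unchanged and flagged, mirroring the
    point that ambiguous cases need a human curator, not an algorithm.
    """
    key = raw.strip().lower()
    if key not in LABEL_VOCABULARY:
        print(f"Needs curator review: {raw!r}")
        return raw
    return LABEL_VOCABULARY[key]

print(normalize_label("CTRL"))  # -> control
print(normalize_label("KD-2"))  # -> flagged for curator review
```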

3. Additional contextual information and analysis

Properly labeled data enable scientifically meaningful comparisons between sample groups to reveal biomarkers. Our scientists are committed to expert manual curation and scientific review, which includes generating statistical models to reveal differential expression patterns. In addition to calculating differential expression between sample groups defined by the authors, our scientists perform custom statistical comparisons to support additional insights from the data.
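
Our statistical pipeline isn’t shown here, but as a generic illustration of a per-gene two-group comparison, here is a minimal sketch using a plain Welch t-test on a hypothetical log2 expression matrix (production analyses typically use dedicated differential-expression methods):

```python
import numpy as np
from scipy import stats

def differential_expression(expr, group_a, group_b):
    """Per-gene comparison between two sample groups.

    expr: log2 expression matrix, shape (n_genes, n_samples).
    group_a / group_b: column indices for the two groups.
    Returns (log2 fold change, Welch t-test p-value) per gene.
    """
    a, b = expr[:, group_a], expr[:, group_b]
    log2_fc = a.mean(axis=1) - b.mean(axis=1)  # values already on log2 scale
    _, pvals = stats.ttest_ind(a, b, axis=1, equal_var=False)
    return log2_fc, pvals

# Toy example: 3 genes, 6 samples (columns 0-2 vs. 3-5).
rng = np.random.default_rng(0)
expr = rng.normal(8.0, 1.0, size=(3, 6))
expr[0, 3:] += 2.0  # spike one gene in the second group
fc, p = differential_expression(expr, [0, 1, 2], [3, 4, 5])
print(fc, p)
```

The statistics are routine; what manual curation contributes is the sample grouping itself, since a comparison is only as meaningful as the labels that define its groups.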

4. Author errors are detected

No matter how consistent data labels are, NLP processes cannot identify misassigned sample groups, and such errors are devastating to data analysis. Unfortunately, it’s not unheard of for data to be rendered uninterpretable by conflicts between the sample labels in a publication and those in its corresponding entry in a public ‘omics data repository. As shown in Figure 2, for a given patient ID, both ‘Age’ and ‘Genetic Subtype’ are mismatched between the study’s GEO entry and the publication table. Which sample labels are correct? Our curators identify these issues and work with authors to correct errors before including the data in our databases.

Figure 2. In this submission to NCBI GEO, the ages of the various patients conflict between the GEO submission and the associated publication. What’s more, the genetic subtype labels are mixed up. Without resolving these errors, the data cannot be used. This attention to detail is required, and can only be achieved with manual curation.
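
A simplified check of the kind implied by Figure 2 can diff the two metadata sources directly. The patient IDs and values below are invented for illustration; in practice the inputs would come from a GEO series matrix and a paper’s supplementary table:

```python
import pandas as pd

# Hypothetical metadata as recorded in the repository entry and in the
# associated publication, indexed by patient ID.
geo = pd.DataFrame(
    {"patient_id": ["P1", "P2"], "age": [54, 61], "genetic_subtype": ["A", "B"]}
).set_index("patient_id")
paper = pd.DataFrame(
    {"patient_id": ["P1", "P2"], "age": [61, 54], "genetic_subtype": ["B", "A"]}
).set_index("patient_id")

# compare() returns only the cells where the two sources disagree
# (labeled 'self' for geo, 'other' for paper); any non-empty result is
# a conflict to raise with the study's authors.
conflicts = geo.compare(paper)
print(conflicts)
```

Detecting the mismatch is mechanical; deciding which source is correct is not, which is why our curators resolve these conflicts with the authors rather than algorithmically.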

 

At the core of our curation process, curators apply scientific expertise, controlled vocabularies and standardized formatting to all applicable metadata. The result is that you can quickly and easily find all applicable samples across data sources using simplified search criteria.

Dig deeper into the value of QIAGEN Digital Insights’ manual curation process 

Ready to incorporate the reliable biomedical, clinical and ‘omics data we’ve developed using manual curation best practices into your research? Explore our QIAGEN knowledge base and databases, and request a consultation to find out how our manually curated data will save you time and enable you to develop more reliable hypotheses, faster. Learn more about the costs of free data in our industry report, and download our unique and comprehensive metadata dictionary of clinical covariates to experience first-hand just how valuable manual curation really is.

References:

  1. Callahan TJ, Tripodi IJ, Pielke-Lombardo H, Hunter LE. Knowledge-based biomedical data science. Annu Rev Biomed Data Sci. 2020;3:23–41.
  2. Sarih H, Tchangani AP, Medjaher K, Pere E. Data preparation and preprocessing for broadcast systems monitoring in PHM framework. 6th International Conference on Control, Decision and Information Technologies (CoDIT). 2019:1444–1449.
  3. Analytics India Magazine. Big data to good data: Andrew Ng urges ML community to be more data-centric and less model-centric. 06/04/2021. https://analyticsindiamag.com/big-data-to-good-data-andrew-ng-urges-ml-community-to-be-more-data-centric-and-less-model-centric/