How expert-curated cancer data from COSMIC and HSMD can help biopharmaceutical researchers identify and validate targets faster and optimize clinical trial design.
In cancer drug discovery and development, data is king. From identifying potential molecular targets to helping predict drug toxicity and optimizing clinical trial design, high-quality data can significantly improve the efficiency and success rate of bringing new cancer therapies to market.
The Catalogue Of Somatic Mutations In Cancer (COSMIC) and the Human Somatic Mutation Database (HSMD) are two expert-curated somatic mutation databases licensed exclusively through QIAGEN. They enable biopharmaceutical researchers to avoid pitfalls in early cancer drug discovery, confidently qualify candidate drug targets, and accelerate indication expansion and repurposing of existing cancer therapies.
In this blog, we take a closer look at COSMIC and HSMD for biopharmaceutical research, providing an overview of the expert curation processes, the types of data found in each database, and examples of how this data can be applied throughout the cancer drug discovery and development pipeline.
COSMIC is an expert-curated knowledge base providing data on somatic variants in cancer, supported by a comprehensive suite of tools for interpreting genomic data, discerning the impact of somatic alterations on disease, and facilitating translational research. Thousands of cancer and biopharmaceutical researchers and clinicians use the catalogue daily to quickly find information in an immense pool of data curated from more than 29,000 scientific publications and large studies.
COSMIC integrates somatic data from multiple sources published around the world and allows researchers to access and scrutinize information about somatic mutations and their impact in cancer. Over the past two decades, COSMIC has been diligently collecting, cleaning, and organizing genomic data and associated metadata from cancer studies published in scientific literature and various bioinformatics sources. This data is then translated into a standardized format, integrated, and made available to the research community through well-structured datasets and user-friendly data exploration websites and tools.
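To make the idea of translating heterogeneous records into a standardized format more concrete, here is a minimal Python sketch. It is not COSMIC's actual pipeline or schema: the `SomaticVariant` class, the input field names (`gene`, `symbol`, `protein_change`, `aa_change`, `tumor_type`, `site`), and the source labels are all hypothetical, chosen only to illustrate how differently formatted records might be mapped onto one shape and then merged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SomaticVariant:
    """Hypothetical standardized record (not COSMIC's real schema)."""
    gene: str
    protein_change: str
    tumor_type: str


def normalize_record(raw: dict) -> SomaticVariant:
    """Map a raw record with source-specific field names onto one shape."""
    # Each source is assumed to use one of two illustrative naming styles.
    gene = (raw.get("gene") or raw.get("symbol") or "").strip().upper()
    change = (raw.get("protein_change") or raw.get("aa_change") or "").strip()
    if change and not change.startswith("p."):
        change = "p." + change  # use a single protein-level prefix everywhere
    tumor = (raw.get("tumor_type") or raw.get("site") or "").strip().lower()
    return SomaticVariant(gene, change, tumor)


def integrate(batches: dict[str, list[dict]]) -> dict[SomaticVariant, set[str]]:
    """Merge records from several sources, tracking which sources report each variant."""
    merged: dict[SomaticVariant, set[str]] = {}
    for source, records in batches.items():
        for raw in records:
            merged.setdefault(normalize_record(raw), set()).add(source)
    return merged


# Two made-up sources describing the same variant in different formats.
example_batches = {
    "paper_a": [{"gene": " braf ", "protein_change": "V600E", "tumor_type": "Skin"}],
    "study_b": [{"symbol": "BRAF", "aa_change": "p.V600E", "site": "skin"}],
}
example_merged = integrate(example_batches)
```

In this toy example both inputs collapse to a single standardized record with two supporting sources. Real curation involves far more than string cleanup, including expert review of the underlying publications.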
In addition to the main catalogue of somatic mutations, six accompanying resources focus on different aspects of oncology (Figure 1). The Cancer Gene Census (CGC) and Cancer Mutation Census (CMC) provide additional annotations on the roles of genes and mutations in oncogenesis. These annotations follow a defined set of rules and require sufficient evidence, obtained through dedicated literature curation and analysis of the content of the core catalogue.
→ View the complete database numbers in the latest COSMIC v99 (December 2023) here.
Figure 1. COSMIC’s seven key resources for understanding cancer and improving cancer patient care. The main catalogue of somatic mutations is supported by six further resources that add layers of knowledge, helping to interpret the impact of somatic mutations on cancer development and presenting available therapeutic options (graphic from Sondka et al. 2024).
COSMIC’s workflows for manually curating cancer genetic data have been built to deliver high-quality, biologically and clinically relevant data to the research community. Different data sources and types of curated data require different approaches (Figure 2), but all of them share common core elements.
Figure 2. COSMIC data curation flowchart. Depending on the data source and curation objectives, there are three main curation paths in COSMIC (graphic from Sondka et al. 2024).
HSMD is a web-based application that allows biopharmaceutical researchers and clinical NGS testing labs to harness genetic insights from QIAGEN’s real-world oncology dataset combined with knowledge from two decades of expert curation.
The latest version of HSMD focuses on providing deep insight into small variants (such as SNVs, indels, and frameshifts) as well as fusions and copy number variants that have been clinically observed or curated from scientific literature, helping users better understand and define precise function and actionability. This expert-curated resource contains content from over 547,000 real-world clinical oncology cases combined with content from the QIAGEN Knowledge Base (QKB), providing gene-level, alteration-level, and disease-level information.
HSMD enables users to easily search and explore mutational characteristics across genes, synthesize key findings from drug labels, clinical trials, and professional guidelines, and receive detailed annotations for each observed variant (Figure 3).
Figure 3. HSMD home screen. HSMD enables users to search by gene, alteration, disease, drugs, and clinical trials.
HSMD leverages variant content from two sources: expert-curated content from the QIAGEN Knowledge Base (QKB) and data from real-world oncology cases sourced from our professional clinical interpretation services (Figure 4).
When a variant has been “clinically observed,” it means our professional clinical interpretation service has encountered this alteration in a real-world clinical case. For these variants, QIAGEN's team has assessed the clinical and biological relevance and calculated the gene and variant prevalence across observed tumor types. Conversely, content from the QKB is proactively curated from scientific literature; therefore, not all variants have yet been directly clinically observed by our professional clinical interpretation services.
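As a rough illustration of what calculating gene and variant prevalence across tumor types can mean, the Python sketch below counts the fraction of cases of each tumor type that carry a given gene alteration or a specific variant. This is an assumption about the general arithmetic, not a description of QIAGEN's method: the function name, the tuple layout, and the example cases are invented, and the sketch assumes every case contributes at least one row so that it is counted in the denominator.

```python
from collections import defaultdict


def prevalence(observations):
    """Compute gene- and variant-level prevalence per tumor type.

    observations: iterable of (case_id, tumor_type, gene, alteration) tuples.
    Returns two dicts mapping keys to the fraction of cases of that tumor type:
      gene_prev[(tumor_type, gene)]
      variant_prev[(tumor_type, gene, alteration)]
    """
    cases_per_tumor = defaultdict(set)
    cases_per_gene = defaultdict(set)
    cases_per_variant = defaultdict(set)

    for case_id, tumor, gene, alteration in observations:
        # Sets ensure a case is counted once even if it appears in several rows.
        cases_per_tumor[tumor].add(case_id)
        cases_per_gene[(tumor, gene)].add(case_id)
        cases_per_variant[(tumor, gene, alteration)].add(case_id)

    gene_prev = {
        key: len(ids) / len(cases_per_tumor[key[0]])
        for key, ids in cases_per_gene.items()
    }
    variant_prev = {
        key: len(ids) / len(cases_per_tumor[key[0]])
        for key, ids in cases_per_variant.items()
    }
    return gene_prev, variant_prev


# Made-up example cases, for illustration only.
example_observations = [
    ("c1", "melanoma", "BRAF", "V600E"),
    ("c2", "melanoma", "BRAF", "V600K"),
    ("c3", "melanoma", "NRAS", "Q61R"),
    ("c4", "colorectal", "BRAF", "V600E"),
    ("c4", "colorectal", "KRAS", "G12D"),
]
example_gene_prev, example_variant_prev = prevalence(example_observations)
```

With these invented cases, two of the three melanoma cases carry a BRAF alteration, while only one of the three carries BRAF V600E specifically, which is the distinction between gene-level and variant-level prevalence.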
Figure 4. HSMD curation workflow. HSMD contains content from the QKB, which pulls information from all public and proprietary databases, clinical articles for the most relevant cancer genes, and thousands of clinical articles for somatic genes. This content is then curated using artificial intelligence (AI) approaches, manual curation, or a combination of both, and all of it goes through rigorous quality control to ensure consistency, accuracy, and reproducibility. In addition, HSMD contains content from over 500,000 somatic mutations submitted to QIAGEN's professional variant interpretation service, QCI Precision Insights (formerly N-of-One). This de-identified patient data provides even greater insight into real-world clinical cases.
COSMIC and HSMD are two expert-curated databases licensed exclusively through QIAGEN that enable biopharmaceutical companies to improve the drug discovery process, develop more effective clinical trials, and enhance the treatment of rare cancers. To learn more about how your research team can use COSMIC and HSMD, visit our product webpage or click the button below for a free trial and personal consultation with our biopharmaceutical research experts.