How expert-curated cancer data from COSMIC and HSMD can help biopharmaceutical researchers identify and validate targets faster and optimize clinical trial design.
In cancer drug discovery and development, data is king. From identifying potential molecular targets to helping predict drug toxicity and optimizing clinical trial design, high-quality data can significantly improve the efficiency and success rate of bringing new cancer therapies to market.
The Catalogue Of Somatic Mutations In Cancer (COSMIC) and the Human Somatic Mutation Database (HSMD) are two expert-curated somatic mutation databases exclusively licensed through QIAGEN that enable biopharmaceutical researchers to avoid pitfalls in early cancer drug discovery, confidently qualify candidate drug targets, and accelerate indication expansion and repurposing of existing cancer therapies.
In this blog, we take a closer look at COSMIC and HSMD for biopharmaceutical research, providing an overview of their expert curation processes, the types of data found in each database, and examples of how this data can be applied across the cancer drug discovery and development pipeline.
COSMIC is an expert-curated knowledge base providing data on somatic variants in cancer, supported by a comprehensive suite of tools for interpreting genomic data, discerning the impact of somatic alterations on disease, and facilitating translational research. Thousands of cancer and biopharmaceutical researchers and clinicians use the catalogue daily to quickly access information from an immense pool of data curated from over 29,000 scientific publications and large studies.
COSMIC integrates somatic data from multiple sources published around the world and allows researchers to access and scrutinize information about somatic mutations and their impact in cancer. Over the past two decades, COSMIC has been diligently collecting, cleaning, and organizing genomic data and associated metadata from cancer studies published in scientific literature and various bioinformatics sources. This data is then translated into a standardized format, integrated, and made available to the research community through well-structured datasets and user-friendly data exploration websites and tools.
In addition to the main catalogue of somatic mutations, six further resources focus on different aspects of oncology (Figure 1). Among them, the Cancer Gene Census (CGC) and the Cancer Mutation Census (CMC) annotate the roles of genes and mutations in oncogenesis. These annotations follow a defined set of rules and require sufficient evidence, obtained through dedicated literature curation and analysis of the content of the core catalogue.
Complete database statistics are available for the latest release, COSMIC v99 (December 2023).
Figure 1. COSMIC’s 7 key resources for understanding cancer and improving cancer patient care. The main catalogue of somatic mutations is supported by six further resources that together add layers of knowledge to help interpret the impact of somatic mutations on cancer development and to present available therapeutic options (graphic from Sondka et al. 2024).
COSMIC’s workflows for manually curating cancer genetic data are built to deliver high-quality, biologically and clinically relevant data to the research community. Different data sources and types of curated data require different approaches (Figure 2), but every path shares common core elements.
Figure 2. COSMIC data curation flowchart. Depending on the data source and curation objectives, there are three main curation paths in COSMIC (graphic from Sondka et al. 2024).
HSMD is a web-based application that allows biopharmaceutical researchers and clinical NGS testing labs to harness genetic insights from QIAGEN’s real-world oncology dataset combined with knowledge from two decades of expert curation.
In the latest version of HSMD, the resource focuses on providing deep insight into variants that have been clinically observed or curated from scientific literature, including small variants such as SNVs, indels, and frameshifts, as well as fusions and copy number variants, to help users better understand and define their precise function and actionability. This expert-curated resource contains content from over 547,000 real-world clinical oncology cases combined with content from the QIAGEN Knowledge Base (QKB), providing gene-level, alteration-level, and disease-level information.
HSMD enables users to easily search and explore mutational characteristics across genes, synthesize key findings from drug labels, clinical trials, and professional guidelines, and receive detailed annotations for each observed variant (Figure 3).
Figure 3. HSMD home screen. HSMD enables users to search by gene, alteration, disease, drugs, and clinical trials.
HSMD leverages variant content from two sources: expert-curated content from the QIAGEN Knowledge Base (QKB) and data from real-world oncology cases sourced from our professional clinical interpretation services (Figure 4).
When a variant has been “clinically observed,” it means our professional clinical interpretation service has encountered this alteration in a real-world clinical case. For these variants, QIAGEN's team has assessed the clinical and biological relevance and calculated the gene and variant prevalence across observed tumor types. Conversely, content from the QKB is proactively curated from scientific literature; therefore, not all variants have yet been directly clinically observed by our professional clinical interpretation services.
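To make the idea of prevalence across tumor types concrete, here is a minimal sketch of one common way to compute it: the fraction of cases of each tumor type that carry a given variant (variant level) or any variant in a given gene (gene level). This is an illustrative assumption, not QIAGEN's actual method. The case records, the "GENE change" string format, and the function names `variant_prevalence` and `gene_prevalence` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, de-identified toy records: (case_id, tumor_type, observed variants).
# Illustrative only; this is not real HSMD data or an HSMD export format.
CASES = [
    ("case-1", "NSCLC", {"EGFR L858R", "TP53 R273H"}),
    ("case-2", "NSCLC", {"KRAS G12C"}),
    ("case-3", "NSCLC", {"EGFR L858R"}),
    ("case-4", "Colorectal", {"KRAS G12D", "APC R1450*"}),
    ("case-5", "Colorectal", {"KRAS G12C"}),
]


def variant_prevalence(cases, variant):
    """Return {tumor_type: fraction of cases of that type carrying `variant`}."""
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for _case_id, tumor_type, variants in cases:
        totals[tumor_type] += 1
        if variant in variants:
            carriers[tumor_type] += 1
    return {t: carriers[t] / totals[t] for t in totals}


def gene_prevalence(cases, gene):
    """Return {tumor_type: fraction of cases with at least one variant in `gene`}.

    Assumes each variant string starts with the gene symbol, e.g. "KRAS G12C".
    Each case is counted at most once, even if it has several variants in the gene.
    """
    totals = defaultdict(int)
    carriers = defaultdict(int)
    for _case_id, tumor_type, variants in cases:
        totals[tumor_type] += 1
        if any(v.split(" ")[0] == gene for v in variants):
            carriers[tumor_type] += 1
    return {t: carriers[t] / totals[t] for t in totals}


if __name__ == "__main__":
    print(variant_prevalence(CASES, "EGFR L858R"))
    print(gene_prevalence(CASES, "KRAS"))
```

With these toy records, EGFR L858R appears in two of three NSCLC cases, and KRAS variants appear in both colorectal cases. Counting each case once per tumor type, rather than counting individual variant calls, keeps a case with two variants in the same gene from inflating gene-level prevalence.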
Figure 4. HSMD curation workflow. HSMD contains content from the QKB, which pulls information from all public and proprietary databases, clinical articles for the most relevant cancer genes, and thousands of clinical articles for somatic genes. Curation is performed using artificial intelligence (AI) approaches, manual curation, or a combination of both. All content then goes through rigorous quality control to ensure consistency, accuracy, and reproducibility. In addition, HSMD contains content from over 500,000 somatic mutations submitted to QIAGEN's professional variant interpretation service, QCI Precision Insights (formerly N-of-One). These de-identified patient data provide even greater insight into real-world clinical cases.
COSMIC and HSMD are two expert-curated databases licensed exclusively through QIAGEN that enable biopharmaceutical companies to improve the drug discovery process, develop more effective clinical trials, and enhance the treatment of rare cancers. To learn more about how your research team can use COSMIC and HSMD, visit our product webpage or request a free trial and personal consultation with our biopharmaceutical research experts.
Two expert-curated databases exclusively licensed through QIAGEN link sequence-level somatic mutation data to detailed molecular information about functional and clinical impacts, as well as implications for druggability and relevant clinical trials. The two databases, the Catalogue Of Somatic Mutations In Cancer (COSMIC) and the Human Somatic Mutation Database (HSMD), enable biopharmaceutical researchers to avoid pitfalls in early cancer drug discovery and development, confidently qualify candidate drug targets, and accelerate indication expansion and repurposing of existing cancer therapies.
In this blog, learn more about high-level applications of COSMIC and HSMD in cancer drug discovery and development pipelines.
The Catalogue Of Somatic Mutations In Cancer (COSMIC) is the most detailed and comprehensive resource for exploring the effect of somatic mutations in human cancer. Developed and maintained by the Wellcome Sanger Institute, the latest release, COSMIC v99 (December 2023), includes over 6 million coding mutations across 1.5 million tumor samples, curated from over 29,000 publications. In addition to coding mutations, COSMIC covers all the genetic mechanisms by which somatic mutations promote cancer, including non-coding mutations, gene fusions, copy-number variants, and drug-resistance mutations.
COSMIC integrates somatic data from multiple sources published around the world and allows researchers to access and scrutinize information about somatic mutations and their impact in cancer. Over the past two decades, COSMIC, through predominantly manual curation workflows, has been diligently collecting, cleaning, and organizing genomic data and associated metadata from cancer studies published in scientific literature and various bioinformatics sources. This data is then translated into a standardized format, integrated, and made available to the research community through well-structured datasets and user-friendly data exploration websites and tools.
The Human Somatic Mutation Database (HSMD) is a relatively new somatic mutation database from QIAGEN (released in 2019) that combines over two decades of expert curation and data from scientific literature, on- and off-label therapies and clinical trials, and real-world clinical oncology cases. In the latest release, HSMD 3.0 (November 2023), the database contains manually curated, detailed molecular information on over 1.8 million somatic variants, with more than 430,000 observed in real clinical cases, as well as data from over 545,000 real-world clinical oncology cases.
Unique to HSMD is the availability of data from clinically observed variants. When a variant has been “clinically observed,” it means QIAGEN’s professional clinical interpretation service (previously N-of-One) has encountered this alteration in a real-world clinical case. For these variants, QIAGEN assesses the clinical and biological relevance and calculates the gene and variant prevalence across observed tumor types.
Easy to search with new content added weekly, HSMD enables researchers to explore key genes or mutations with driving properties or clinical relevance and search for associated treatment options, off-label therapies, resistance markers, and regional and/or disease-specific clinical trials.
While similar, COSMIC and HSMD differ in their applications for cancer drug discovery and development. As a result, biopharmaceutical researchers can use both databases to support different workflow stages.
COSMIC is a valuable resource for cancer researchers and drug discovery efforts. Here are several ways in which the COSMIC database can be used to support exploratory research in cancer drug discovery:
HSMD is a valuable resource for biopharmaceutical researchers, facilitating the confident evaluation of cancer-related genetic variations by granting access to real-world data. Here are several ways in which HSMD supports cancer drug clinical development and post-market research.
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. When it was created in 2004 by researchers with the Cancer Genome Project in conjunction with the Sanger Institute, COSMIC was set up with a big ambition: to be the source of all cancer genomic knowledge.
Today, COSMIC contains nearly 24 million genomic variants across 6,800 precise forms of human cancer. It is the most expansive, expert-curated knowledge hub available for somatic NGS data analysis and interpretation. From molecular pathologists matching mutations to targeted therapies to bioinformaticians looking for patterns of DNA mutations in cancer cells, COSMIC is an excellent resource for identifying and understanding cancer mutations.
Now, as the demand for precision oncology increases, so does the need for a comprehensive cancer genomic knowledge base. Here are five reasons why you should be using COSMIC for biopharmaceutical research.
Precision is crucial in developing biopharmaceuticals. Unlike other somatic databases, COSMIC is meticulously and rigorously curated by a team of highly trained, PhD-level experts. This manual curation, the gold standard in genomic curation, ensures that every data point undergoes human scrutiny, giving scientists unparalleled confidence in the accuracy and consistency of the data they rely on. Through comprehensive literature searches, COSMIC’s experts have curated, standardized, and cataloged mutation data, phenotype information, and clinical details from over 1.5 million cancer samples and 29,000 peer-reviewed papers to date.
COSMIC provides an unmatched level of traceability for every data point, empowering scientists with transparency and fostering confidence in the presented evidence. With COSMIC, there is no 'black box'; each piece of information can be traced back to its source, providing users with complete visibility into its origins. This complete transparency is invaluable for biopharmaceutical scientists, especially when dealing with rare variants or variants of unknown significance. In these cases, users can independently assess each piece of data, exercising their judgment on whether to agree or disagree with COSMIC’s data for a particular variant.
In the pursuit of precision oncology, biopharmaceutical scientists must address a wide range of questions about somatic alterations as druggable targets. COSMIC stands as the largest repository of comprehensive genomic, phenotypic, and mutational characteristics of cancers. With COSMIC, you can obtain the most exhaustive information available on mutations associated with a specific cancer type, the frequency and tumor distribution of a specific alteration, driver oncogenic events, candidate therapeutic targets, and much more.
Furthermore, COSMIC’s Actionability functionality assists scientists in tracking and exploring drugs in various stages of development, monitoring the progress of clinical trials, and investigating drugs repurposed to target specific mutations.
And unlike other databases relying on volunteers, COSMIC is continually updated by its team of dedicated expert scientists, ensuring you have access to the accurate and up-to-date insights necessary to advance your translational research efforts.
In the dynamic field of biopharmaceuticals, adaptability is essential. COSMIC offers exceptional flexibility, enabling users to customize their data mining, visualization, and manipulation processes. COSMIC can be seamlessly integrated into your IT systems, allowing automatic updates or scheduled integration of newly released datasets to align with your individual workflow. COSMIC also allows you to customize filters according to your pipeline and fully integrate its data with proprietary databases to obtain a single comprehensive view. With COSMIC, you can easily align the data precisely with your unique research processes, enhancing your ability to extract actionable insights.
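As an illustration of custom filtering combined with proprietary data, here is a minimal sketch that reads a tab-separated mutation table, keeps rows for genes of interest above a sample-count threshold, and attaches an in-house program status to each row. The column names (`GENE_SYMBOL`, `MUTATION_AA`, `PRIMARY_SITE`, `SAMPLE_COUNT`), the sample rows, the internal target table, and the function `filter_and_merge` are hypothetical assumptions and may not match the headers of actual COSMIC download files; check the files in the release you license before adapting it.

```python
import csv
import io

# Hypothetical tab-separated export. Column names and values are illustrative
# and are NOT guaranteed to match real COSMIC download files.
COSMIC_EXPORT = """GENE_SYMBOL\tMUTATION_AA\tPRIMARY_SITE\tSAMPLE_COUNT
BRAF\tp.V600E\tskin\t120
BRAF\tp.V600E\tthyroid\t95
KRAS\tp.G12C\tlung\t80
TP53\tp.R175H\tbreast\t40
"""

# Hypothetical proprietary table: in-house program status keyed by gene symbol.
INTERNAL_TARGETS = {"BRAF": "lead optimization", "KRAS": "hit-to-lead"}


def filter_and_merge(tsv_text, genes, min_samples, internal):
    """Keep rows for `genes` with at least `min_samples` samples and attach in-house status."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    merged = []
    for row in reader:
        if row["GENE_SYMBOL"] not in genes:
            continue  # gene not in the pipeline's filter list
        if int(row["SAMPLE_COUNT"]) < min_samples:
            continue  # below the evidence threshold chosen for this pipeline
        # Join with the proprietary table; genes without an internal program get "none".
        merged.append({**row, "INTERNAL_STATUS": internal.get(row["GENE_SYMBOL"], "none")})
    return merged


if __name__ == "__main__":
    for r in filter_and_merge(COSMIC_EXPORT, {"BRAF", "KRAS"}, 90, INTERNAL_TARGETS):
        print(r["GENE_SYMBOL"], r["MUTATION_AA"], r["PRIMARY_SITE"], r["INTERNAL_STATUS"])
```

Called with a threshold of 90, the function keeps the two BRAF rows (120 and 95 samples) and drops KRAS p.G12C (80 samples). Using a dictionary lookup for the proprietary table keeps the merge to a single pass over the export.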
In biopharmaceutical research, credibility is earned through adoption. Over 50,000 molecular pathology labs, clinicians, bioinformaticians, and researchers worldwide trust and use COSMIC. It has also been cited in over 10,000 publications. Its extensive usage attests to its accuracy, consistency, and reliability. Recognized in the AMP/ASCO/CAP guidelines as a foundational evidence source for somatic variant assessments, COSMIC allows biopharmaceutical scientists to align their work with the highest standards in the field.
Trying COSMIC in your lab is easy. Simply visit the official COSMIC website, scroll to the bottom of the page, and "Request A Demo". One of our experts will contact you immediately about scheduling a free demo of COSMIC using your lab's data.