Two expert-curated databases exclusively licensed through QIAGEN link sequence-level somatic mutation data to detailed molecular information about functional and clinical impacts, as well as implications for druggability and relevant clinical trials. The two databases, the Catalogue Of Somatic Mutations In Cancer (COSMIC) and the Human Somatic Mutation Database (HSMD), enable biopharmaceutical researchers to avoid pitfalls in early cancer drug discovery and development, confidently qualify candidate drug targets, and accelerate indication expansion and repurposing of existing cancer therapies.
In this blog, learn more about the high-level applications of COSMIC and HSMD in cancer drug discovery and development pipelines.
The Catalogue Of Somatic Mutations In Cancer (COSMIC) is the most detailed and comprehensive resource for exploring the effect of somatic mutations in human cancer. Developed and maintained by Wellcome Sanger Institute, the latest release, COSMIC v99 (December 2023), includes over 6 million coding mutations across 1.5 million tumor samples, curated from over 29,000 publications. In addition to coding mutations, COSMIC covers all the genetic mechanisms by which somatic mutations promote cancer, including non-coding mutations, gene fusions, copy-number variants and drug-resistance mutations.
COSMIC integrates somatic data from multiple sources published around the world and allows researchers to access and scrutinize information about somatic mutations and their impact in cancer. Over the past two decades, COSMIC, through predominantly manual curation workflows, has been diligently collecting, cleaning, and organizing genomic data and associated metadata from cancer studies published in scientific literature and various bioinformatics sources. This data is then translated into a standardized format, integrated, and made available to the research community through well-structured datasets and user-friendly data exploration websites and tools.
The Human Somatic Mutation Database (HSMD) is a relatively new somatic mutation database from QIAGEN (released in 2019) that combines over two decades of expert curation and data from scientific literature, on- and off-label therapies and clinical trials, and real-world clinical oncology cases. In the latest release, HSMD 3.0 (November 2023), the database contains manually curated, detailed molecular information on over 1.8 million somatic variants, with more than 430,000 observed in real clinical cases, as well as data from over 545,000 real-world clinical oncology cases.
Unique to HSMD is the availability of data from clinically observed variants. When a variant has been “clinically observed,” it means QIAGEN’s professional clinical interpretation service (previously N-of-One) has encountered this alteration in a real-world clinical case. For these variants, QIAGEN assesses the clinical and biological relevance and calculates the gene and variant prevalence across observed tumor types.
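To make the idea of variant prevalence across tumor types concrete, here is a minimal Python sketch. It is not QIAGEN's method or the HSMD data model: the function name, the record layout and the placeholder variant labels are all assumptions made for illustration, and the cases are invented toy data.

```python
from collections import defaultdict


def variant_prevalence(cases, variant):
    """Return {tumor_type: fraction of cases carrying `variant`}.

    `cases` is an iterable of (tumor_type, set_of_variants) pairs.
    This layout is a hypothetical simplification for illustration.
    """
    totals = defaultdict(int)    # number of cases seen per tumor type
    carriers = defaultdict(int)  # number of those cases carrying the variant
    for tumor_type, variants in cases:
        totals[tumor_type] += 1
        if variant in variants:
            carriers[tumor_type] += 1
    return {t: carriers[t] / totals[t] for t in totals}


# Invented toy cases with placeholder labels (not real clinical data)
cases = [
    ("tumor_type_1", {"VAR_A"}),
    ("tumor_type_1", {"VAR_B"}),
    ("tumor_type_2", {"VAR_A", "VAR_C"}),
    ("tumor_type_2", {"VAR_D"}),
    ("tumor_type_2", {"VAR_C", "VAR_D"}),
    ("tumor_type_2", {"VAR_E"}),
]

prevalence = variant_prevalence(cases, "VAR_A")
```

In this toy set, `VAR_A` appears in one of two `tumor_type_1` cases and one of four `tumor_type_2` cases, so the same variant has a different prevalence in each tumor type.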
Easy to search with new content added weekly, HSMD enables researchers to explore key genes or mutations with driving properties or clinical relevance and search for associated treatment options, off-label therapies, resistance markers, and regional and/or disease-specific clinical trials.
While similar, COSMIC and HSMD differ in their applications for cancer drug discovery and development. As a result, biopharmaceutical researchers can use both databases to support different workflow stages.
COSMIC is a valuable resource for cancer researchers and drug discovery efforts. Here are several ways in which the COSMIC database can be used to support exploratory research in cancer drug discovery:
HSMD is a valuable resource for biopharmaceutical researchers, facilitating the confident evaluation of cancer-related genetic variations by granting access to real-world data. Here are several ways in which HSMD supports cancer drug clinical development and post-market research.
COSMIC and HSMD are two expert-curated databases licensed exclusively through QIAGEN that enable biopharmaceutical companies to improve the drug discovery process, develop more effective clinical trials, and enhance the treatment of rare cancers. To learn more about how your research team can use COSMIC and HSMD, visit our product webpage or request a free trial and personal consultation with our biopharmaceutical research experts.
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer. When it was created in 2004 by researchers with the Cancer Genome Project in conjunction with the Sanger Institute, COSMIC was set up with a big ambition: to be the source of all cancer genomic knowledge.
Today, COSMIC contains nearly 24 million genomic variants across 6,800 precise forms of human cancer. It is the most expansive, expert-curated knowledge hub available for somatic NGS data analysis and interpretation. From molecular pathologists matching mutations to targeted therapies to bioinformaticians looking for patterns of DNA mutations in cancer cells, COSMIC is an excellent resource for identifying and understanding cancer mutations.
Now, as the demand for precision oncology increases, so does the need for a comprehensive cancer genomic knowledge base. Here are 5 reasons why you should be using COSMIC for biopharmaceutical research.
Precision is crucial in developing biopharmaceuticals. Unlike other somatic databases, COSMIC is meticulously and rigorously curated by a team of highly trained, PhD-level experts. This manual curation, the gold standard in genomic curation, ensures that every data point undergoes human scrutiny, giving scientists unparalleled confidence in the accuracy and consistency of the data they rely on. Through comprehensive literature searches, COSMIC’s experts have curated, standardized, and cataloged mutation data, phenotype information, and clinical details from over 1.5 million cancer samples and 29,000 peer-reviewed papers to date.
COSMIC provides an unmatched level of traceability for every data point, empowering scientists with transparency and fostering confidence in the presented evidence. With COSMIC, there is no 'black box'; each piece of information can be traced back to its source, providing users with complete visibility into its origins. This complete transparency is invaluable for biopharmaceutical scientists, especially when dealing with rare variants or variants of unknown significance. In these cases, users can independently assess each piece of data, exercising their judgment on whether to agree or disagree with COSMIC’s data for a particular variant.
In the pursuit of precision oncology, biopharmaceutical scientists must address a wide range of questions about somatic alterations as druggable targets. COSMIC stands as the largest repository of comprehensive genomic, phenotypic, and mutational characteristics of cancers. With COSMIC, you can obtain the most exhaustive information available on mutations associated with a specific cancer type, the frequency and tumor distribution of a specific alteration, driver oncogenic events, candidate therapeutic targets, and much more.
Furthermore, COSMIC’s Actionability functionality assists scientists in tracking and exploring drugs in various stages of development, monitoring the progress of clinical trials, and investigating drugs repurposed to target specific mutations.
And unlike other databases relying on volunteers, COSMIC is continually updated by its team of dedicated expert scientists, ensuring you have access to the accurate and up-to-date insights necessary to advance your translational research efforts.
In the dynamic field of biopharmaceuticals, adaptability is essential. COSMIC offers exceptional flexibility, enabling users to customize their data mining, visualization, and manipulation processes. COSMIC can be seamlessly integrated into your IT systems, allowing automatic updates or scheduled integration of newly released datasets to align with your individual workflow. COSMIC also allows you to customize filters according to your pipeline and fully integrate its data with proprietary databases to obtain a single comprehensive view. With COSMIC, you can easily align the data precisely with your unique research processes, enhancing your ability to extract actionable insights.
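As a rough illustration of what pipeline-specific filters and integration with in-house data can look like, the Python sketch below filters a handful of mutation records and joins them to internal annotations by gene. It does not use any COSMIC API or file format: the field names (`gene`, `primary_site`, `sample_count`), the function name and all values are hypothetical.

```python
def filter_and_join(public_rows, inhouse_by_gene, min_samples=2, tissue=None):
    """Keep rows passing the filters and attach in-house annotations.

    public_rows: list of dicts with hypothetical keys
        'gene', 'mutation', 'primary_site', 'sample_count'.
    inhouse_by_gene: dict mapping gene -> dict of proprietary fields.
    """
    merged = []
    for row in public_rows:
        if row["sample_count"] < min_samples:
            continue  # drop mutations seen in too few samples
        if tissue is not None and row["primary_site"] != tissue:
            continue  # optional tissue filter
        extra = inhouse_by_gene.get(row["gene"], {})
        merged.append({**row, **extra})  # single combined record
    return merged


# Invented placeholder records (not real COSMIC content)
public_rows = [
    {"gene": "GENE_A", "mutation": "p.X1Y", "primary_site": "lung", "sample_count": 12},
    {"gene": "GENE_B", "mutation": "p.Z2W", "primary_site": "lung", "sample_count": 1},
    {"gene": "GENE_C", "mutation": "p.Q3R", "primary_site": "skin", "sample_count": 30},
]
inhouse_by_gene = {"GENE_A": {"program": "P-001", "assay_available": True}}

merged = filter_and_join(public_rows, inhouse_by_gene, min_samples=2, tissue="lung")
```

Keeping the filters as function parameters means the same join can be rerun unchanged whenever a newly released dataset is loaded.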
In biopharmaceutical research, credibility is earned through adoption. Over 50,000 molecular pathology labs, clinicians, bioinformaticians, and researchers worldwide trust and use COSMIC. It has also been cited in over 10,000 publications. Its extensive usage attests to its accuracy, consistency, and reliability. Recognized in the AMP/ASCO/CAP guidelines as a foundational evidence source for somatic variant assessments, COSMIC allows biopharmaceutical scientists to align their work with the highest standards in the field.
Trying COSMIC in your lab is easy. Simply visit the official COSMIC website, scroll to the bottom of the page, and select "Request A Demo". One of our experts will contact you about scheduling a free demo of COSMIC using your lab's data.
Here's the scenario: you're a passionate scientist in a leading pharmaceutical company hoping to uncover a transformative drug candidate. Naturally, you use artificial intelligence (AI) to help you target the most promising leads. After weeks of dedicated work, you start to realize that something seems a little off with your results. Maybe you recognize that the algorithm-proposed drug candidate has a history of poor tolerance in human clinical trials. Or perhaps the drug candidate fails to reproduce even the most basic PK/PD modeling results in vitro. Just like you, many drug discovery researchers have found themselves misled by the results proposed by AI.
Even with state-of-the-art algorithms, outcomes of AI for drug discovery heavily depend on the data and context backing them up. Many researchers, just like you, are seeking ways to navigate the intricacies and challenges of this rapidly evolving field. The path to successful AI-driven drug discovery may appear complex, but with the right guidance, AI can significantly enhance both the efficiency and effectiveness of your drug discovery journey.
Here are 3 of our best secrets to help ensure your success when using AI for drug discovery:
1. Start with quality data
The foundation of any successful AI model lies in the quality of its training data. Inconsistent or noisy biomedical data can introduce biases, potentially making the AI model veer off course. Imagine trying to master a language using an inaccurate dictionary; the outcome would be a garbled mess.
Similarly, training an AI model on low-quality biomedical data can lead to misguided conclusions. Data quality, integrity and relevance are paramount. Using expert-curated databases ensures the model begins with accurate and comprehensive knowledge.
That's where our QIAGEN Biomedical Knowledge Base (BKB) database comes in. Curated by experts and continuously updated, QIAGEN BKB ensures you equip your AI models with the best possible start. It offers a strong foundation for building knowledge graphs and data models. Just as a building's strength depends on its foundation, your AI model's efficacy depends on starting with quality data.
2. Root AI inferences in real biological contexts
The power of AI lies in its ability to process vast amounts of information quickly. But it's worth remembering that an AI model, regardless of its sophistication, doesn't inherently understand the complexities of human biology. It sees numbers, patterns and correlations but not causations.
An AI model might draw associations that, at a glance, seem significant. However, without the biological context, these associations can be misleading. To avoid chasing after false positives, it's crucial to ensure the AI's conclusions are rooted within the biological realities.
Here's the good news: QIAGEN BKB and QIAGEN Ingenuity Pathway Analysis (IPA) have built-in causality. With IPA you can quickly check the conclusions your AI generates. IPA's intuitive graphical interface provides visual pathways, disease networks, upstream regulators/downstream effects and isoform-level differential expression analysis, all with the ability to bring in primary datasets for custom-tailored analyses.
3. Validate findings with peer-reviewed research
Science, at its core, thrives on collaboration, verification and iteration. A discovery today can be the stepping stone for a revolutionary breakthrough tomorrow. AI can be a potent tool in accelerating these discoveries, but its suggestions need validation.
While using AI for drug discovery can uncover potential candidates, it's essential to validate these findings using published, peer-reviewed studies. Not only does this process lend credibility to your findings, but it also provides invaluable insights. For instance, understanding which cell lines have been used in previous studies can guide your preclinical testing, ensuring you're on the right track.
For this crucial step, QIAGEN OmicSoft's curated omics data collection is your ally, especially for enterprises in need of high-quality multi-omics datasets. You can tap into a comprehensive landscape of sources, offering validation from published studies beyond just a single public repository. The collection bridges the gap between AI predictions and experimental data, helping you construct disease models and digital twins of cells, organs and organisms.
Validating your cell line selection is also a critical factor for successful preclinical research. Using ATCC Cell Line Land, you can access authenticated cell line ‘omics data to make informed decisions before purchasing cell lines, helping to streamline your workflows, save time and resources, and enhance the predictability and reproducibility of your studies.
You can be confident in steering your research in the right direction with AI, provided you eliminate guesswork and maximize efficiency by using quality biomedical data, ensure biological soundness of AI results and validate your findings. By applying these three powerful tweaks to your AI, you'll surely revolutionize your drug discovery by spotting promising leads much quicker.
We design our QIAGEN Digital Insights knowledge and software with your success in mind.
After all, the future of new therapies is waiting, and we want to ensure you're well-equipped to lead the way. Want to uncover more secrets to drive drug discovery success from our experts?
Continue reading to see how QIAGEN can power your research.
Looking to collaborate further? Fast track your analysis with QIAGEN Discovery Bioinformatics Services.
Some human studies may be unfeasible or unethical, making cross-species research critical for drug discovery and biomarker validation. Cross-species research is a crucial method to collect data to examine potential toxicity for a candidate drug, determine the efficacious doses that may be suitable for humans, identify potential biomarkers for a disease of interest or a therapeutic response and understand the mechanisms of disease or treatment. While animal models and humans have similar anatomy and physiology, the subtle differences among organisms in the animal kingdom need to be considered and data collected must be interpreted using a meaningful method.
Using QIAGEN IPA, you can perform comparative analyses across various animal models, even combining different time points, treatments, tissues and cell types with data generated from a wide variety of ‘omics technologies (RNA-seq, scRNA-seq, proteomics, metabolomics, etc.). In this training, you will learn how to:
1. Generate activity heatmaps and expression charts comparing different pathways and regulatory networks across different species
2. Use Activity Plot, Pattern Search and Analysis Match to compare your own data against thousands of pre-curated, pre-analyzed public datasets representing an array of disease states and other biological conditions
3. Create expression and correlation plots using pre-curated and pre-analyzed public data to validate and confirm findings derived from a comparison analysis
Please consider reviewing the below tutorials before this meeting.
https://qiagen.showpad.com/share/SQjinvdxIs1iGhL8H3eOd
Research and development for new drugs and disease treatments can be lengthy and costly. Indication expansion can help broaden the impact of a new drug that has already been through the arduous R&D process for a disease or cancer. Drug repurposing takes this concept further by looking for other diseases with similar drug target biology. The logic is that if two diseases share similar target biology, they may benefit from the same treatment.
During this training, we'll cover skills such as:
• Querying QIAGEN OmicSoft Lands data from sources like TCGA or ICGC and exploring the incidence of a cancer-driving somatic mutation that is targeted by a treatment
• Creating a cohort of patients within OmicSoft with a disease-causing mutation and wild type for the gene of interest
• Generating survival curves for each mutant and wild type group for various indications in the relevant OmicSoft Land
• Using QIAGEN Ingenuity Pathway Analysis (IPA) to generate a mechanism of action network for the drug's target
• Exploring various network overlay features to enable in silico testing and combination drug partner investigation
• Searching for publicly available datasets relevant to your chosen indication
• Comparing the expression profile from our disease state with other publicly available analyses to find other indications or diseases that share similar biology
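The survival-curve step in the list above can be pictured with a plain Kaplan-Meier estimate computed separately for mutant and wild-type cohorts. The Python sketch below is an illustrative assumption, not how OmicSoft computes its curves: the function name and the follow-up times are invented, and a real analysis would also include a statistical comparison such as a log-rank test.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times:  follow-up durations (any consistent unit).
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= (n_at_risk - deaths) / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # events and censored subjects leave the risk set
    return curve


# Invented toy cohorts (months of follow-up, event flags)
mutant_curve = kaplan_meier([2, 4, 6, 8], [1, 1, 0, 1])
wild_type_curve = kaplan_meier([5, 9, 12, 15], [1, 0, 1, 0])
```

Each curve only steps down at times where an event was observed. Censored patients shrink the risk set without lowering the survival estimate.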
You need knowledge of biomedical relationships for innovative data- and analytics-driven drug discovery. Yet this knowledge is locked in thousands of publications and dozens of databases. Collecting, structuring and integrating this knowledge is a challenging task that is time- and resource-consuming.
What if you could break knowledge silos and confidently power your drug discovery with data science using a high-quality and industry-validated source of structured and integrated biomedical relationships?
We are excited to introduce QIAGEN Biomedical Knowledge Base, the leading source of knowledge about biomedical relationships, manually structured and integrated from thousands of sources by experts. It is a vast collection of diverse causal relationships between genes, diseases, drugs, targets, functions, toxicological processes and more, all of which are enriched with full context. QIAGEN Biomedical Knowledge Base delivers high-quality data ideally suited for major data science-driven drug discovery applications. These include knowledge graph construction and analysis, analytics- and artificial intelligence (AI)-driven target identification and drug repositioning, development of target, disease and drug intelligence portals, disease subtype and biomarker identification and many more.
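For readers new to knowledge graphs, the Python sketch below shows the general idea of building a graph from subject-relation-object triples and querying it for drugs whose targets are linked to a disease. It does not reflect the QIAGEN Biomedical Knowledge Base schema or any QIAGEN API: the relation names (`inhibits`, `associated_with`), the function names and the handful of textbook triples are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_graph(triples):
    """Build {subject: {relation: set(objects)}} from (s, r, o) triples."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        graph[subject][relation].add(obj)
    return graph


def drugs_for_disease(graph, disease):
    """Return sorted drugs that inhibit any gene associated with `disease`."""
    hits = set()
    for node, edges in graph.items():
        for gene in edges.get("inhibits", ()):
            # .get avoids creating new keys in the defaultdict while iterating
            if disease in graph.get(gene, {}).get("associated_with", ()):
                hits.add(node)
    return sorted(hits)


# A few illustrative triples (hypothetical relation names, toy-sized graph)
triples = [
    ("imatinib", "inhibits", "ABL1"),
    ("imatinib", "inhibits", "KIT"),
    ("ABL1", "associated_with", "chronic myeloid leukemia"),
    ("KIT", "associated_with", "gastrointestinal stromal tumor"),
    ("vemurafenib", "inhibits", "BRAF"),
    ("BRAF", "associated_with", "melanoma"),
]

graph = build_graph(triples)
candidates = drugs_for_disease(graph, "gastrointestinal stromal tumor")
```

The two-hop query (drug to gene to disease) is the simplest form of the repositioning question. A curated graph adds context such as direction of effect, tissue and evidence source to each edge.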
QIAGEN Biomedical Knowledge Base fuels QIAGEN Ingenuity Pathway Analysis (IPA), our premier ‘omics data analysis and interpretation software. This is data you know well, and now you can access it directly.
"For over 20 years, we have been assembling the world's leading source of molecular knowledge and data used to inform decisions from bench to bedside. This knowledge and data power market-leading products such as QIAGEN IPA, QIAGEN OmicSoft, QCI Interpret and online databases like HGMD and HSMD," said Dr. Jonathan Sheldon, Senior VP of QIAGEN Digital Insights. "Previously, our focus was to make our knowledge and data solely accessible through our industry-leading applications. Now, in addition, we are unlocking and giving the keys to our knowledge and data to fuel drug discovery with data science. The data is in a format and structure that makes it easy to integrate our reliable molecular data into data science projects within pharma and biotech."
Using QIAGEN Biomedical Knowledge Base, you’ll make biomedical discoveries that are:
See how QIAGEN Biomedical Knowledge Base empowers you to leverage biomedical knowledge graph analysis, fuel your data- and analytics-driven drug discovery and transform your research. Learn more and request your trial today.