You’re a data scientist working on drug discovery, watching the landscape of your discipline shift dramatically with the introduction of accessible generative AI. It seems like everyone is promising an incredible AI-powered future to anyone who uses their tool. But is AI really that useful? What objective, concrete gains can you expect to see in your research?
You may be familiar with the QIAGEN Biomedical KB-HD, formerly known as the BKB. It’s the leading manually curated biomedical knowledge base, boasting over 24 million high-quality relationships created over 4000 human-years of manual work. But it takes time to deliver validated, full-context relationships.
Our new offering, the QIAGEN Biomedical KB-AI, uses generative AI data curation to create the largest biomedical knowledge graph on the market. Quality is great, but sometimes, you need the kind of quantity that only AI can enable. Let's get into it.
AI data curation helps create large, streamlined and timely knowledge graphs that cover more ground than any human-curated effort could. If you're looking for the whole picture, you might just find it in an AI-curated knowledge graph.
Here's the scenario: you're a passionate scientist in a leading pharmaceutical company hoping to uncover a transformative drug candidate. Naturally, you use artificial intelligence (AI) to help you target the most promising leads. After weeks of dedicated work, you start to realize that something seems a little off with your results. Maybe you recognize that the algorithm-proposed drug candidate has a history of poor tolerance in human clinical trials. Or perhaps the drug candidate fails to reproduce even the most basic PK/PD modeling results in vitro. Just like you, many drug discovery researchers have found themselves misled by the results proposed by AI.
Even with state-of-the-art algorithms, outcomes of AI for drug discovery heavily depend on the data and context backing them up. Many researchers, just like you, are seeking ways to navigate the intricacies and challenges of this rapidly evolving field. The path to successful AI-driven drug discovery may appear complex, but with the right guidance, AI can significantly enhance both the efficiency and effectiveness of your drug discovery journey.
Here are 3 of our best secrets to help ensure your success when using AI for drug discovery:
1. Start with quality data
The foundation of any successful AI model lies in the quality of its training data. Inconsistent or noisy biomedical data can introduce biases, potentially making the AI model veer off course. Imagine trying to master a language using an inaccurate dictionary; the outcome would be a garbled mess.
Similarly, training an AI model on low-quality biomedical data can lead to misguided conclusions. Data quality, integrity and relevance are paramount. Using expert-curated databases ensures the model begins with accurate and comprehensive knowledge.
That's where our QIAGEN Biomedical Knowledge Base (BKB) comes in. Curated by experts and continuously updated, QIAGEN BKB ensures you equip your AI models with the best possible start. It offers a strong foundation for building knowledge graphs and data models. Just as a building's strength depends on its foundation, your AI model's efficacy depends on starting with quality data.
2. Root AI inferences in real biological contexts
The power of AI lies in its ability to process vast amounts of information quickly. But it's worth remembering that an AI model, regardless of its sophistication, doesn't inherently understand the complexities of human biology. It sees numbers, patterns and correlations but not causations.
An AI model might draw associations that, at a glance, seem significant. However, without the biological context, these associations can be misleading. To avoid chasing after false positives, it's crucial to ensure the AI's conclusions are rooted within the biological realities.
Here's the good news: QIAGEN BKB and QIAGEN Ingenuity Pathway Analysis (IPA) have built-in causality. With IPA you can quickly check the conclusions your AI generates. IPA's intuitive graphical interface provides visual pathways, disease networks, upstream regulators/downstream effects and isoform-level differential expression analysis, all with the ability to bring in primary datasets for custom-tailored analyses.
3. Validate findings with peer-reviewed research
Science, at its core, thrives on collaboration, verification and iteration. A discovery today can be the stepping stone for a revolutionary breakthrough tomorrow. AI can be a potent tool in accelerating these discoveries, but its suggestions need validation.
While using AI for drug discovery can uncover potential candidates, it's essential to validate these findings using published, peer-reviewed studies. Not only does this process lend credibility to your findings, but it also provides invaluable insights. For instance, understanding which cell lines have been used in previous studies can guide your preclinical testing, ensuring you're on the right track.
For this crucial step, QIAGEN OmicSoft's curated omics data collection is your ally, especially for enterprises in need of high-quality multi-omics datasets. You can tap into a comprehensive landscape of sources, offering validation from published studies beyond just a single public repository. The collection bridges the gap between AI predictions and experimental data to construct disease models and digital twins of cells, organs and organisms.
Validating your cell line selection is also a critical factor for successful preclinical research. Using ATCC Cell Line Land, you can access authenticated cell line ‘omics data to make informed decisions before purchasing cell lines, helping to streamline your workflows, save time and resources, and enhance the predictability and reproducibility of your studies.
You can be confident in steering your research in the right direction with AI, provided you eliminate guesswork and maximize efficiency by using quality biomedical data, ensure the biological soundness of AI results and validate your findings. By applying these three tweaks to your AI workflow, you can spot promising leads much more quickly.
We design our QIAGEN Digital Insights knowledge and software with your success in mind.
After all, the future of new therapies is waiting, and we want to ensure you're well-equipped to lead the way. Want to uncover more secrets to drive drug discovery success from our experts?
Continue reading to see how QIAGEN can power your research.
Looking to collaborate further? Fast track your analysis with QIAGEN Discovery Bioinformatics Services.
If you’re working in pharma or biotech, you likely rely on artificial intelligence (AI) to help you identify new drug targets or plausible biomarkers for disease within large data sets. Yet AI alone isn't enough. A large proportion of biomedical data contain errors and are unstructured. For AI models to provide reliable insights, the underlying data must be of ‘high quality’, meaning it’s accurate, comprehensive, up-to-date and standardized.
Jesper Ryge (Idorsia Pharmaceuticals), Alex Jarasch (Neo4j) and Venkatesh Moktali (QIAGEN Digital Insights) come together to showcase the practical applications of high-quality biomedical relationships data from the QIAGEN Biomedical Knowledge Base (BKB) to accelerate, improve and transform research in drug discovery and pharmaceutical development. By applying AI to a gene-disease knowledge graph, they identify promising drug targets and key mechanisms underlying diseases. A brief introduction to Neo4j shows how graph-centric analysis and visualizations facilitate the effective exploration of large knowledge graphs like BKB. This integration of high-quality curated data, AI-driven analysis and advanced visualization provides valuable insights and accelerates the progress of precision medicine.
In this webinar, you’ll learn how you can:
Don't miss this chance to learn how to supercharge your AI toolbox to transform your drug discovery.
Biomedical relationships knowledge is now required for innovative data- and analytics-driven drug discovery. It powers biomedical knowledge graph analysis, artificial intelligence (AI)-driven target identification and many more applications.
In this one-hour training, you’ll get an introduction to QIAGEN Biomedical Knowledge Base. You’ll learn how to tackle applications you can’t achieve with the QIAGEN Ingenuity Pathway Analysis (IPA) graphical user interface, or which can be done quicker and with more flexibility when performed programmatically. You’ll learn how to perform queries such as:
• Quickly find the shortest connections between genes/proteins/metabolites of interest in the context of a specific disease
• Systematically build a network using a short list of genes/proteins/metabolites/chemicals
• Recreate a drug mechanism of action
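As a rough illustration of what such programmatic queries can look like, here is a minimal sketch in plain Python. It does not use the QIAGEN Biomedical Knowledge Base API; the relationship rows, entity names (`GENE_A`, `DRUG_1`, `disease_x`) and helper functions are all hypothetical placeholders. It shows a breadth-first search for the shortest connection between two entities within one disease context, plus a simple way to build a network from a short seed list.

```python
from collections import deque

# Hypothetical toy relationships: (source, relation, target, disease context).
# These rows are illustrative placeholders, not real curated findings.
RELATIONSHIPS = [
    ("GENE_A", "activates", "GENE_B", "disease_x"),
    ("GENE_B", "inhibits", "GENE_C", "disease_x"),
    ("GENE_A", "binds", "GENE_D", "disease_y"),
    ("GENE_D", "activates", "GENE_C", "disease_y"),
    ("GENE_C", "phosphorylates", "GENE_E", "disease_x"),
    ("DRUG_1", "inhibits", "GENE_A", "disease_x"),
]


def build_adjacency(relationships, disease=None):
    """Return an undirected adjacency map, optionally limited to one disease context."""
    adjacency = {}
    for source, _relation, target, context in relationships:
        if disease is not None and context != disease:
            continue
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def shortest_connection(adjacency, start, goal):
    """Breadth-first search: return the shortest node path, or None if unconnected."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbors so the result is deterministic when paths tie in length.
        for neighbor in sorted(adjacency[node]):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def seed_network(relationships, seeds):
    """Keep only the relationships whose two endpoints are both in the seed list."""
    seed_set = set(seeds)
    return [row for row in relationships if row[0] in seed_set and row[2] in seed_set]


if __name__ == "__main__":
    graph_x = build_adjacency(RELATIONSHIPS, disease="disease_x")
    print(shortest_connection(graph_x, "DRUG_1", "GENE_E"))
    print(seed_network(RELATIONSHIPS, ["GENE_A", "GENE_B", "GENE_C"]))
```

Tracing the path from a drug to a downstream entity in this way is one simple reading of "recreating a mechanism of action"; a real analysis would also use relationship direction, effect and supporting evidence, which this sketch ignores.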
Please note: Based on the feedback we receive from you, the registrants, we may modify the topics we cover to ensure we discuss material that’s most relevant to you.
If you’re working in pharma or biotech, artificial intelligence (AI) is no stranger. You likely use it to help you identify new targets to explore for a therapeutic area, for drug repurposing or to identify plausible biomarkers for your disease of interest. You may think using AI is enough and will have all the answers if there are enough data. However, there’s a big problem with that assumption.
Limitations of AI-derived biomedical data
Biomedical data contain errors and are mainly unstructured. Removing errors and structuring the data so they can address specific questions is essential, yet far beyond current natural language processing (NLP) approaches and generative AI models with large memories. For AI models to provide insight, the underlying data must be ‘high quality’: accurate, complete and comprehensive, up-to-date and standardized.
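To make those criteria concrete, here is a minimal sketch of automated checks on relationship records. It is only an illustration and not a description of any QIAGEN process: the field names, the synonym table, the entity names and the ten-year staleness threshold are all assumptions. It flags incomplete or stale records, standardizes entity names and removes duplicates. Checks like these only catch formal problems; whether a relationship is scientifically accurate still needs expert review.

```python
from datetime import date

# Hypothetical synonym table mapping free-text names to one canonical label.
SYNONYMS = {"gene-a": "GENE_A", "gene a": "GENE_A", "gene_b": "GENE_B"}
# Assumed record schema; real schemas will differ.
REQUIRED_FIELDS = ("source", "relation", "target", "evidence", "curated_on")


def standardize(name):
    """Map a name to its canonical label, or return it stripped if unknown."""
    cleaned = name.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def check_record(record, today, max_age_days=3650):
    """Return a list of quality problems found in one relationship record."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    curated_on = record.get("curated_on")
    if curated_on and (today - curated_on).days > max_age_days:
        problems.append("stale")
    return problems


def clean(records, today):
    """Standardize names, set aside problem records and drop duplicates."""
    seen, kept, rejected = set(), [], []
    for record in records:
        problems = check_record(record, today)
        if problems:
            rejected.append((record, problems))
            continue
        key = (standardize(record["source"]), record["relation"], standardize(record["target"]))
        if key in seen:
            continue  # same relationship already kept under a canonical name
        seen.add(key)
        kept.append({**record, "source": key[0], "target": key[2]})
    return kept, rejected


if __name__ == "__main__":
    # Made-up example records for illustration only.
    example = [
        {"source": "gene-a", "relation": "activates", "target": "GENE_B",
         "evidence": "ref-1", "curated_on": date(2024, 1, 1)},
        {"source": "Gene A", "relation": "activates", "target": "gene_b",
         "evidence": "ref-2", "curated_on": date(2024, 1, 1)},
        {"source": "GENE_C", "relation": "inhibits", "target": "",
         "evidence": "ref-3", "curated_on": date(2024, 1, 1)},
    ]
    kept, rejected = clean(example, today=date(2025, 1, 1))
    print(kept)
    print(rejected)
```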
To complicate matters, scientific knowledge evolves daily, and the genetic bases of hundreds of diseases are identified each year. So the amount of biomedical data is constantly growing and, well… there’s a lot of it. Yet we still don’t know what 99% of our DNA even does. So with all the groundbreaking discoveries yet to be made, you don’t want to miss anything that will help you make your next big discovery.
Like panning for gold
Can you reconcile your need for data that’s accurate yet also complete? How do you find the needles in the haystack yet ensure you won’t miss valuable data that could give you unique insights? What’s the best way to convert biomedical data into biomedical knowledge?
And, even if the data you’ve got ticks all those boxes, there’s always the question of accessibility. How are you going to access it? And how much will you have access to? What if you only want a small slice of the data? Are there access models that will accommodate your specific needs, whether big or small?
To turn data into usable information that creates knowledge, it must be honed, fine-tuned and polished by a human. This produces high-quality data and is the backbone of our knowledge and database offerings, such as our premier QIAGEN Biomedical Knowledge Base. These offerings are trusted by over 90,000 scientists worldwide, in over 4000 accounts, to make confident decisions.
As leaders of this augmented scientific data collection approach, we’re excited by the development of AI tools for curation and continue to evaluate and evolve our technology to take advantage of beneficial advancements. We apply state-of-the-art AI to maximize the completeness of evidence in our knowledge base. But for scientific interpretation, scalable content quality is ultimately essential.
AI + manual curation = Accurate and complete biomedical data
And it’s core to what we do best.
Our curation team scales with today’s growth in scientific publishing because we leverage NLP and other technologies to speed curation but still rely on human certification of biological findings to ensure quality. With domain-specific analytics, you can compute over our unparalleled knowledge base of high-quality evidence, something AI cannot infer.
Imagine having 25 years of curation experience and 200 experts at your disposal
Our experience and findings show the quality of AI and machine-generated content is not good enough for scientific purposes. We regularly identify many false positives and false negatives from machine-only curation. That’s why we’ve been perfecting our market-leading ‘augmented molecular intelligence’ approach for over two decades and leverage 200+ PhD scientists to work alongside machines to verify and improve the utility of the content to drive sound research hypotheses.
Our human curation team enables us to:
Access the data your way
Yet, having a collection of high-quality and reliable data alone isn’t enough. It’s got to be accessible when you need it, how you need it.
That’s why we’ve developed API access to QIAGEN Biomedical Knowledge Base. Now you can rest easy with data that’s not just reliable; it’s also available the way you want it, from the entire knowledge base to just the right slice for your project.
That’s all possible with data that’s easy to access any way you’d like it.
Learn about how flexible access to QIAGEN Biomedical Knowledge Base will open doors to reliable data that deliver true insights. With >35 million findings, >2.1 million entities and >24 million unique relationships, it’s got data that will fuel your data- and analytics-driven drug discovery, at whatever scale you need. Request a consultation to discover how this powerful tool will transform your drug discovery research.
For those of us working in pharma drug or biomarker discovery, artificial intelligence (AI) plays a vital role in how we collect biological and pharmacological data. It's not only used in each step of the drug design pipeline; it also helps ensure safer and more effective drugs in preclinical trials, while dramatically reducing development costs (1,2).
Yet there's a huge and potentially dangerous disadvantage when using AI-derived data: the question of their accuracy.
The unfortunate side effect of AI
Imagine you're a bioinformatician supporting discovery research projects in pharma. You work with biologists on experiments to prioritize leads for further drug development. You do a full analysis of existing data to help define which drug targets have the highest likelihood for therapeutic success. You use an AI-derived knowledge base to pull available 'omics data from a range of dataset repositories, and match that data with your company's internal data.
You analyze the data with your biologist colleagues to generate hypotheses, and define experiments to validate those hypotheses. After six months of costly but failed experiments, you realize something was off in the initial analysis and that your hypotheses were entirely misguided. After backtracking, you discover the AI-derived data were inconsistent in the annotated disease state, resulting in a complete misinterpretation of the data.
Now you've spent half a year, thousands of dollars and countless hours of research on following a dead lead. And your team has nothing to show for it.
AI-derived data: Does it make sense?
In the past few months, you've probably read countless news stories about ChatGPT. It's a powerful tool that uses AI to generate detailed answers to virtually any question you throw at it. Yet, a recognized drawback is that these answers are often factually inaccurate. Try asking it to write your bio, or the bio of your best friend. It will generate a lot of false information that may nonetheless appear plausible to people who don't know you or your friend.
ChatGPT is just one example of how AI can be an impressive tool, but one that should be handled with extreme caution. Because, how can you trust insights or hypotheses derived from information that only might be accurate? Or partially accurate? Or worse still, completely inaccurate?
The answer is to couple AI with human-certified, manual curation.
We all recognize the incredible power and potential of AI to collect and bring together seemingly relevant data. Yet, ‘omics and biological relationships data are complex and nuanced and require context that AI-derived data alone can’t provide.
As Figure 1 demonstrates, without the human 'magic touch' of aligning, correcting errors and removing irrelevant data, AI-derived data alone leaves you with a jumble of information that may or may not be accurate, which could send you down a rabbit hole in pursuit of your next biomarker or target discovery.
Figure 1. Decision tree for using AI-derived data.
We're confident that by using our manually curated, human-certified ‘omics data, you’ll quickly gain reliable insights to generate and confirm your hypotheses. We offer you direct access to the most extensive collections of integrated and standardized 'omics and biological relationships data, manually curated by a team of MS- and PhD-certified experts. In short, we find errors and correct them to ensure the data you work with are reliable and accurate.
This means that when you use our manually curated 'omics and biological relationships data, you'll avoid the stressful and frustrating consequences of being led astray by inaccurate data riddled with inconsistencies and errors.
Don't let bad data compromise your projects. And don't waste time fixing and cleaning the data yourself. Get direct access to 'golden' data that deliver true and immediate insights. Ready-to-use, manually curated data that are cleaned of errors and inconsistencies.
“Truth, like gold, is to be obtained not by its growth, but by washing away from it all that is not gold.”
Leo Tolstoy
We wash away the 'dirt' so you can mine and collect clean and golden data.
References:
You need biomedical relationships knowledge for innovative data- and analytics-driven drug discovery. Yet this knowledge is locked in thousands of publications and dozens of databases. Collecting, structuring and integrating this knowledge is a challenging task that is time- and resource-consuming.
What if you could break knowledge silos and confidently power your drug discovery with data science using a high-quality and industry-validated source of structured and integrated biomedical relationships?
We are excited to introduce QIAGEN Biomedical Knowledge Base, the leading knowledge about biomedical relationships, manually structured and integrated from thousands of sources by experts. It is a vast collection of diverse causal relationships between genes, diseases, drugs, targets, functions, toxicological processes and more, all of which are enriched with full context. QIAGEN Biomedical Knowledge Base delivers high-quality data ideally suited for major data science-driven drug discovery applications. These include knowledge graph construction and analysis, analytics- and artificial intelligence (AI)-driven target identification and drug repositioning, development of target, disease and drug intelligence portals, disease subtype and biomarker identification and many more.
QIAGEN Biomedical Knowledge Base fuels QIAGEN Ingenuity Pathway Analysis (IPA), our premier ‘omics data analysis and interpretation software. This is data you know well, and now you can access it directly.
"For over 20 years, we have been assembling the world's leading source of molecular knowledge and data used to inform decisions from bench to bedside. This knowledge and data power market-leading products such as QIAGEN IPA, QIAGEN OmicSoft, QCI Interpret and online databases like HGMD and HSMD," said Dr. Jonathan Sheldon, Senior VP of QIAGEN Digital Insights. "Previously, our focus was to make our knowledge and data solely accessible through our industry-leading applications. Now, in addition, we are unlocking and giving the keys to our knowledge and data to fuel drug discovery with data science. The data is in a format and structure that makes it easy to integrate our reliable molecular data into data science projects within pharma and biotech."
Using QIAGEN Biomedical Knowledge Base, you’ll make biomedical discoveries that are:
See how QIAGEN Biomedical Knowledge Base empowers you to leverage biomedical knowledge graph analysis, fuel your data- and analytics-driven drug discovery and transform your research. Learn more and request your trial today.
Though not technically summer, on May 25th, the EU's General Data Protection Regulation (GDPR) took effect, creating a global ripple effect. The law impacts the world of clinical decision support software because it stipulates the “right to explanation” around automated decision-making (i.e., algorithms) and the expected consequences of applying those decisions. This requirement for transparency does not bode well for the walled-off “black box” approach to clinical decision support. For another perspective, read this contributed piece in The Pathologist, written by our own Ramon Felciano, in which he positions QCI as an enabling tool to transition to precision medicine in a cost-effective, scalable, and transparent way.
Artificial intelligence (AI) was frequently in the news over the past few months. In particular, we saw quite a few stories about IBM’s Watson and its limitations in beating cancer. Though Watson has not yet lived up to its promise of generating insights and identifying new approaches to cancer treatment, there remains hope in the industry that AI will eventually revolutionize medicine, whether through data pattern recognition, its impact on pharmaceutical development or, someday, even cancer treatment. In the meantime, we at QIAGEN continue to focus on our clinical decision support tools (big data, informatics and augmented intelligence) to improve test interpretation and accuracy of results.
QIAGEN was in the news as well.
Our second consecutive win during AMP Europe’s Battle of the Bioinformatics Pipeline event was covered in GenomeWeb; we published our own recap of the results to provide additional detail and background around standardizing variant interpretation and reporting. Finally, we recently hosted three international OmicSoft User Group Meetings:
We hope you had a wonderful summer, and we look forward to the busier pace and renewed activity that fall brings.