If you're working in pharma or biotech, you probably rely on artificial intelligence (AI) to help you identify new drug targets or disease biomarkers within large datasets. As pharma scientists, we know AI is becoming standard practice. But what happens if your AI models are accidentally fed data from unreliable sources? Imagine wasting months of time and resources chasing a drug target that turns out to be a flop. Ouch.
The pitfalls of unreliable, unstructured biomedical data are not hypothetical; they're a stark reality that we and other researchers around the globe face. For AI models to provide reliable insights, the underlying data must be 'high quality', meaning accurate, comprehensive, up-to-date and standardized.
Leave AI errors in the dust with expert-curated biomedical data
Enter QIAGEN Biomedical Knowledge Base (BKB), a repository of meticulously curated biomedical relationship data. We've built BKB with a keen, expert eye to ensure that molecular connections and relationships are extracted only from the highest-quality sources, such as peer-reviewed scientific literature.
With continuous updates and enriched contextual information, such as tissue specificity and relationship directionality, we make sure you always have access to the most current and reliable data.
As a data scientist, you can mine BKB through database files or directly through API integration. BKB data is particularly useful for building graph-based models, which handle interconnected, heterogeneous biological data far better than simple relational databases do. With API integration, we can even help you customize BKB to fit seamlessly into your internal workflows.
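To illustrate why graph structures suit this kind of data, here is a minimal Python sketch that parses a few tab-separated relationship records into an adjacency map in which drugs, genes, diseases and functions sit side by side. The column names and records are hypothetical; they are not the actual BKB file format or API.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated export, one relationship per row.
# Column names are illustrative only, not the real BKB file format.
SAMPLE_TSV = (
    "source\tsource_type\trelation\ttarget\ttarget_type\n"
    "cetuximab\tdrug\ttargets\tEGFR\tgene\n"
    "EGF\tgene\tactivates\tEGFR\tgene\n"
    "EGFR\tgene\tassociated_with\tcolorectal cancer\tdisease\n"
    "EGFR\tgene\tinhibits\tapoptosis\tfunction\n"
)


def load_graph(tsv_text):
    """Build an adjacency map (entity -> [(relation, neighbor)]) plus entity types."""
    adjacency = defaultdict(list)
    entity_types = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        adjacency[row["source"]].append((row["relation"], row["target"]))
        entity_types[row["source"]] = row["source_type"]
        entity_types[row["target"]] = row["target_type"]
    return dict(adjacency), entity_types


graph, types = load_graph(SAMPLE_TSV)
print(graph["EGFR"], types["apoptosis"])
```

Because every entity type lives in the same structure, a single traversal can hop from a drug to its target and on to diseases or functions without joining separate tables, which is the flexibility the paragraph above refers to.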
What discoveries are waiting for you with the power of QIAGEN BKB?
Here are 3 ideas to get you started:
1. Identify high-quality drug targets using inferred causal interactions
Literature is full of publications showcasing associations between biomarkers, receptors and disease. But how do you know if that relationship is strong or weak within the context of finding a good target for drug development?
Using knowledge graphs powered by BKB data, you can thoroughly explore the causal relationships between your target and a disease before you ever set foot in the lab. Start with a high-level view of your target's pharmacological landscape, then zoom in on the pathways directly linking the target to diseases. From there, you can overlay pathways and functions to pinpoint the mechanisms that connect a target to a disease, such as activation or inhibition of signaling pathways, or RNA binding.
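One way to make this kind of causal exploration concrete is a path search over signed, directed edges, where +1 stands for activation and -1 for inhibition, and the product of signs along a path gives its predicted net effect. The sketch below uses a small hypothetical graph with placeholder node names; a real analysis would also weigh evidence and experimental context.

```python
from collections import deque

# Toy signed, directed edges (hypothetical): +1 = activates, -1 = inhibits.
EDGES = {
    ("TARGET", "KINASE_A"): +1,
    ("KINASE_A", "CYTOKINE_X"): +1,
    ("TARGET", "PHOSPHATASE_B"): -1,
    ("PHOSPHATASE_B", "CYTOKINE_X"): -1,
    ("CYTOKINE_X", "DISEASE"): +1,
}


def causal_paths(edges, start, end, max_edges=4):
    """Return (path, net_sign) for every simple directed path from start to end."""
    successors = {}
    for (src, dst), sign in edges.items():
        successors.setdefault(src, []).append((dst, sign))
    results = []
    queue = deque([([start], 1)])
    while queue:
        path, sign = queue.popleft()
        node = path[-1]
        if node == end:
            results.append((path, sign))
            continue
        if len(path) - 1 >= max_edges:  # path already has max_edges edges
            continue
        for nxt, edge_sign in successors.get(node, []):
            if nxt not in path:  # skip cycles
                queue.append((path + [nxt], sign * edge_sign))
    return results


for path, sign in causal_paths(EDGES, "TARGET", "DISEASE"):
    print(" -> ".join(path), "(promotes)" if sign > 0 else "(suppresses)")
```

In this toy graph, both routes from TARGET to DISEASE come out positive, so the target would be predicted to promote the disease; agreement across independent paths is one simple signal that inhibiting the target might be worth testing.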
2. Explore the current clinical landscape around your target
QIAGEN Biomedical Knowledge Base contains a wealth of clinical trial information connecting drug targets to their indications. Use knowledge graphs to navigate intuitively from your target to ongoing and completed drug trials, including whether each drug is an agonist or antagonist, its indication and the trial phase.
Using BKB, you can efficiently prioritize diseases for drug development within the current clinical landscape surrounding your target.
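As a rough sketch of that prioritization, the Python below summarizes hypothetical trial records by indication (drugs in development, highest phase reached, agonist or antagonist) and then ranks the indications. The record fields, values and the ranking rule (least advanced competition first) are all assumptions for illustration; a team could just as reasonably favor indications with late-phase validation.

```python
from collections import defaultdict

# Hypothetical drug-trial records for one target; fields and values are illustrative.
TRIALS = [
    {"drug": "drug_1", "action": "antagonist", "indication": "psoriasis", "phase": 3},
    {"drug": "drug_2", "action": "antagonist", "indication": "psoriasis", "phase": 2},
    {"drug": "drug_3", "action": "agonist", "indication": "lupus", "phase": 1},
    {"drug": "drug_4", "action": "antagonist", "indication": "asthma", "phase": 2},
]


def summarize_by_indication(trials):
    """Per indication: drugs in trials, highest phase reached, mechanisms of action."""
    summary = defaultdict(lambda: {"drugs": set(), "max_phase": 0, "actions": set()})
    for trial in trials:
        entry = summary[trial["indication"]]
        entry["drugs"].add(trial["drug"])
        entry["max_phase"] = max(entry["max_phase"], trial["phase"])
        entry["actions"].add(trial["action"])
    return dict(summary)


def rank_indications(summary):
    """Least advanced, least crowded indications first (one possible criterion)."""
    return sorted(summary, key=lambda ind: (summary[ind]["max_phase"], len(summary[ind]["drugs"])))


summary = summarize_by_indication(TRIALS)
for indication in rank_indications(summary):
    info = summary[indication]
    print(indication, "max phase:", info["max_phase"], "mechanisms:", sorted(info["actions"]))
```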
3. Build disease interactomes using protein-protein interactions
What if there are promising disease indications for your target that have not yet been described in the literature? You can use BKB data to build a knowledge graph of the protein-protein interaction (PPI) network surrounding your target. Use the PPI graph like a roadmap: rank the proteins by their network distance from your target, then determine which diseases are associated with those proteins.
This may even lead you to less obvious disease indications for your target, positioning your new drug to reach the market faster.
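The roadmap idea can be sketched as a breadth-first search that counts how many protein-protein hops separate each protein from the target, followed by ranking diseases by their closest associated protein. The edges, protein names and disease associations below are hypothetical placeholders.

```python
from collections import deque

# Hypothetical undirected PPI edges and protein-disease associations.
PPI = [("TARGET", "P1"), ("TARGET", "P2"), ("P1", "P3"), ("P3", "P4")]
DISEASES = {
    "P1": {"disease_A"},
    "P2": {"disease_B"},
    "P4": {"disease_C", "disease_A"},
}


def bfs_distances(edges, start):
    """Shortest hop distance from start to every reachable protein."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def rank_diseases(edges, disease_map, target):
    """Rank diseases by the distance of their closest associated protein to the target."""
    dist = bfs_distances(edges, target)
    best = {}
    for protein, diseases in disease_map.items():
        if protein not in dist:
            continue  # protein not connected to the target's network
        for disease in diseases:
            best[disease] = min(best.get(disease, float("inf")), dist[protein])
    # Ties broken alphabetically only to keep the output deterministic.
    return sorted(best.items(), key=lambda item: (item[1], item[0]))


print(rank_diseases(PPI, DISEASES, "TARGET"))
```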
"If 80 percent of our work is data preparation, then ensuring data quality is the important work of a machine learning team" (1).
Andrew Ng, Founder & CEO of Landing AI
Don't miss your chance to supercharge your AI toolbox and transform your drug discovery
During a recent webinar we hosted, a unique and talented group of experts came together to share their insights. They used gene-disease knowledge graphs created by combining AI with BKB to reveal promising novel drug targets for neuroinflammatory disease and other devastating autoimmune disorders.
This webinar is an opportunity to learn directly from the experts how best to use BKB to enhance and advance your drug discovery strategy. See BKB in action in a real-world drug discovery example. Watch the webinar now.
Continue your journey with us and explore how you can combine AI and QIAGEN Biomedical Knowledge Base to maximize your drug discovery efforts while minimizing your timelines:
QIAGEN BKB powers the leading pathway analysis software in life science research
Looking for a transformative software suite that already integrates QIAGEN Biomedical Knowledge Base? Was that a ‘yes, please’?
QIAGEN Ingenuity Pathway Analysis (IPA) merges the advanced analytics of BKB with an intuitive user design, all without ever having to write a line of code. Learn more.
References:
If you’re working in pharma or biotech, artificial intelligence (AI) is no stranger to you. You likely use it to identify new targets in a therapeutic area, to find drug repurposing opportunities or to identify plausible biomarkers for your disease of interest. You may assume that AI alone is enough and that, given enough data, it will have all the answers. However, there’s a big problem with that assumption.
Limitations of AI-derived biomedical data
Biomedical data contain errors and are mostly unstructured. Removing those errors and structuring the data so they can address specific questions is essential, yet it remains far beyond current natural language processing (NLP) approaches and generative AI models with large memories. For AI models to provide insight, the underlying data must be ‘high quality’: accurate, but also complete and comprehensive, up-to-date and standardized.
To complicate matters, scientific knowledge evolves daily, and the genetic basis of hundreds of diseases is identified each year. So the amount of biomedical data is constantly growing and, well… there’s a lot of it. Yet we still don’t know what 99% of our DNA even does. With so many groundbreaking discoveries yet to be made, you don’t want to miss anything that could lead to your next big one.
Like panning for gold
Can you reconcile your need for data that’s accurate yet also complete? How do you find the needles in the haystack yet ensure you won’t miss valuable data that could give you unique insights? What’s the best way to convert biomedical data into biomedical knowledge?
And, even if the data you’ve got ticks all those boxes, there’s always the question of accessibility. How are you going to access it? And how much will you have access to? What if you only want a small slice of the data? Are there access models that will accommodate your specific needs, whether big or small?
To turn data into usable information and, ultimately, knowledge, it must be honed, fine-tuned and polished by a human. This produces high-quality data, which is the core and backbone of our knowledge and database offerings, such as our premier QIAGEN Biomedical Knowledge Base. These offerings are trusted by over 90,000 scientists worldwide, across more than 4,000 accounts, to make confident decisions.
As leaders of this augmented scientific data collection approach, we’re excited by the development of AI tools for curation and continue to evaluate and evolve our technology to take advantage of beneficial advancements. We apply state-of-the-art AI to maximize the completeness of evidence in our knowledge base. But for scientific interpretation, scalable content quality is ultimately essential.
AI + manual curation = Accurate and complete biomedical data
And it’s core to what we do best.
Our curation team keeps pace with today’s growth in scientific publishing because we use NLP and other technologies to speed up curation, while still relying on human certification of biological findings to ensure quality. With domain-specific analytics, you can compute over our unparalleled knowledge base of high-quality evidence, which AI alone cannot infer.
Imagine having 25 years of curation experience and 200 experts at your disposal
Our experience and findings show that the quality of AI- and machine-generated content is not good enough for scientific purposes. We regularly identify many false positives and false negatives from machine-only curation. That’s why we’ve spent over two decades perfecting our market-leading ‘augmented molecular intelligence’ approach, in which 200+ PhD scientists work alongside machines to verify content and improve its utility for driving sound research hypotheses.
Our human curation team enables us to:
Access the data your way
Yet, having a collection of high-quality and reliable data alone isn’t enough. It’s got to be accessible when you need it, how you need it.
That’s why we’ve developed API access to QIAGEN Biomedical Knowledge Base. Now you can rest easy with data that’s not just reliable; it’s also available the way you want it, from the entire knowledge base to just the right slice for your project.
That’s all possible with data that’s easy to access any way you’d like it.
Learn about how flexible access to QIAGEN Biomedical Knowledge Base will open doors to reliable data that deliver true insights. With >35 million findings, >2.1 million entities and >24 million unique relationships, it’s got data that will fuel your data- and analytics-driven drug discovery, at whatever scale you need. Request a consultation to discover how this powerful tool will transform your drug discovery research.
Have you ever done a Google search to find a restaurant or look up what your favorite actor is up to? Most of us have, and therefore understand the benefit of knowledge graphs, possibly without even knowing it. When you do a search on a platform like Google, the information box displayed in the results is made possible by a knowledge graph (1).
Because of their power and versatility, knowledge graphs are rapidly being adopted by the pharmaceutical industry to accelerate data-science-driven drug discovery. They facilitate integration across multiple data types and sources, such as molecular, clinical trial and drug label data. This enables powerful algorithms to work on various types of data at once, for applications ranging from prioritizing novel disease targets to predicting previously unknown drug-disease associations.
What is a knowledge graph?
A knowledge graph combines entities of various types in one network. These entities are connected by multiple types of relationships. Both entities and relationships can also carry additional attributes. Entities and attributes may also be part of an ontology (2, 3).
Figure 1. A simple example of a knowledge graph.
In the biomedical domain, entities represented in a knowledge graph can be, for example, molecules, biological functions and diseases or phenotypes. Relationships include molecular interactions, gene-function associations and drug-target interactions, among others. Both entities and relationships are supported by underlying scientific evidence. Simple graphs are undirected, while more powerful graphs include directed, causal relationships that allow causal inference.
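The definition above (typed entities, typed relationships, attributes on both and optional directionality) maps naturally onto a small data model. Below is one possible Python representation with hypothetical class and field names; it is not how QIAGEN stores its graph. The `neighbors` query follows directed relationships only from source to target and undirected ones in both directions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "gene", "drug", "disease", "function"


@dataclass
class Relationship:
    source: Entity
    target: Entity
    kind: str  # e.g. "targets", "binds", "associated_with"
    directed: bool  # causal relationships are directed
    attributes: dict = field(default_factory=dict)  # e.g. effect, context, evidence


@dataclass
class KnowledgeGraph:
    relationships: list = field(default_factory=list)

    def add(self, rel):
        self.relationships.append(rel)

    def neighbors(self, entity, rel_kind=None):
        """Entities reachable in one step, honoring direction for directed edges."""
        found = []
        for rel in self.relationships:
            if rel_kind is not None and rel.kind != rel_kind:
                continue
            if rel.source == entity:
                found.append(rel.target)
            elif not rel.directed and rel.target == entity:
                found.append(rel.source)
        return found


egfr = Entity("EGFR", "gene")
grb2 = Entity("GRB2", "gene")
cetuximab = Entity("cetuximab", "drug")
nsclc = Entity("non-small cell lung carcinoma", "disease")

kg = KnowledgeGraph()
kg.add(Relationship(cetuximab, egfr, "targets", directed=True, attributes={"effect": "inhibition"}))
kg.add(Relationship(egfr, nsclc, "associated_with", directed=False))
kg.add(Relationship(egfr, grb2, "binds", directed=False))
print([e.name for e in kg.neighbors(egfr)])
```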
Knowledge graph analytics
In drug discovery, knowledge graphs are used for target prioritization and drug repurposing. These tasks frequently involve link prediction approaches that allow the prediction and scoring of relationships between entities that were not explicitly present in the graph before. Artificial intelligence (AI)-inspired methods that have been used for this purpose include tensor factorization (4) and various deep-learning algorithms (see (5) for an example).
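Tensor factorization and deep learning don't fit in a short example, so the sketch below swaps in a much simpler, classic link-prediction heuristic, the Adamic-Adar index: score(u, v) is the sum, over neighbors w shared by u and v, of 1 / log(deg(w)), so pairs with many shared neighbors score higher and highly connected neighbors count for less. It only illustrates the idea of scoring relationships that are absent from the graph and is not the method used in the cited work; the entities are placeholders.

```python
import math

# Hypothetical undirected drug-gene and disease-gene edges.
EDGES = [
    ("drug_X", "GENE_1"), ("drug_X", "GENE_2"),
    ("disease_D", "GENE_1"), ("disease_D", "GENE_2"), ("disease_D", "GENE_3"),
    ("drug_Y", "GENE_3"), ("drug_Y", "GENE_4"),
]


def build_neighbors(edges):
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def adamic_adar(neighbors, u, v):
    """Sum of 1/log(degree) over neighbors shared by u and v (shared ones have degree >= 2)."""
    shared = neighbors[u] & neighbors[v]
    return sum(1.0 / math.log(len(neighbors[w])) for w in shared)


def predict_links(edges, candidates):
    """Score candidate pairs that are not yet linked, highest score first."""
    neighbors = build_neighbors(edges)
    scored = [
        (u, v, adamic_adar(neighbors, u, v))
        for u, v in candidates
        if v not in neighbors[u]
    ]
    return sorted(scored, key=lambda s: s[2], reverse=True)


for u, v, score in predict_links(EDGES, [("drug_Y", "disease_D"), ("drug_X", "disease_D")]):
    print(u, v, round(score, 3))
```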
The QIAGEN biomedical knowledge graph
QIAGEN Biomedical Knowledge Base is ideally suited to build a large-scale biomedical knowledge graph. It is founded on a vast collection of diverse relationships between biomedical entities of various types. The relationships were manually curated from peer-reviewed biomedical literature and integrated from third-party databases with the highest accuracy.
In a knowledge graph constructed from QIAGEN Biomedical Knowledge Base, the main entities connected by relationships are molecules, drugs, targets, diseases, variants, biological functions, pathways, locations and more. The relationships have multiple attributes, including relationship type, direction, effect, context and source. Causality of the relationships is represented through direction. Causal relationships frequently carry information about the direction of effect (activation and inhibition) that can be leveraged in powerful analytics. Relationships are annotated with the full experimental context (e.g., tissue or organism). Entities also have attributes; for example, they are mapped to public identifiers and synonyms to support data integration.
Figure 2. Example of a sub-graph constructed from the QIAGEN biomedical knowledge graph. In this knowledge graph representation, gene and gene product entities are aggregated at the ortholog cluster level. Relationships between the same entities and with the same type, direction and effect are aggregated as well. Cetuximab is a metastatic colorectal cancer drug. EGFR is a target of cetuximab. Molecular interactions in the graph enable you to reconstruct a pathway between EGF, EGFR and the pathological process metastasis. EGFR is also a known member of the canonical pathway Colorectal Cancer Metastasis Signaling. In addition to metastatic colorectal cancer, genetic alterations of EGFR are involved in other diseases, for example non-small cell lung carcinoma. Activation of cell proliferation and inhibition of apoptosis by EGFR are known oncology mechanisms.
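The aggregation described in the Figure 2 caption (genes collapsed to ortholog clusters, and relationships with the same endpoints, type, direction and effect merged) can be pictured as a group-by over individual findings. The ortholog mapping, tuple layout and findings below are hypothetical; the sketch counts how many findings support each aggregated relationship and treats undirected relationships as independent of endpoint order.

```python
from collections import Counter

# Hypothetical mapping from human and mouse gene symbols to ortholog clusters.
ORTHOLOG_CLUSTER = {
    "EGFR": "EGFR", "Egfr": "EGFR",
    "EGF": "EGF", "Egf": "EGF",
    "GRB2": "GRB2", "Grb2": "GRB2",
}

# Hypothetical individual findings: (source, target, type, directed, effect).
FINDINGS = [
    ("EGF", "EGFR", "activation", True, "+"),
    ("Egf", "Egfr", "activation", True, "+"),
    ("EGF", "EGFR", "activation", True, "+"),
    ("EGFR", "GRB2", "binding", False, None),
    ("Grb2", "Egfr", "binding", False, None),
]


def aggregate(findings):
    """Merge findings into relationships and count the supporting findings for each."""
    counts = Counter()
    for source, target, rel_type, directed, effect in findings:
        a = ORTHOLOG_CLUSTER.get(source, source)
        b = ORTHOLOG_CLUSTER.get(target, target)
        if not directed:
            a, b = sorted((a, b))  # undirected: endpoint order should not matter
        counts[(a, b, rel_type, directed, effect)] += 1
    return counts


for key, n in aggregate(FINDINGS).items():
    print(key, "supported by", n, "findings")
```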
QIAGEN knowledge graph research for drug discovery
We actively use our QIAGEN biomedical knowledge graph in drug discovery projects in collaboration with industry partners, and develop new knowledge graph analysis approaches.
For example, we developed a machine learning approach for link prediction (6) that uses our knowledge graph to identify and prioritize genes and biological functions for a given disease. Using our biomedical knowledge graph and this machine-learning approach (7), we prioritized genes linked to known clinical manifestations of COVID-19 and built networks connecting those genes to SARS-CoV-2 viral proteins via protein-protein interactions. Based on these networks, we identified about 450 drugs potentially interfering with viral-host interactions, 54 of which were involved in clinical trials against COVID-19. We further used this approach and our QIAGEN biomedical knowledge graph to develop over 1500 machine-learning-generated disease networks, such as this one on pulmonary hypertensive arterial disease.
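A heavily simplified sketch of the network step in that COVID-19 work: keep host proteins that lie within a few protein-protein hops on a route from a prioritized gene to a viral protein, then list drugs that target any of them. All identifiers are placeholders, the two-hop cutoff is an arbitrary assumption, and this is not the published method.

```python
from collections import deque

# Hypothetical inputs; every identifier is a placeholder.
PRIORITIZED = {"HOST_1", "HOST_2"}  # e.g. genes linked to clinical manifestations
VIRAL = {"VIRAL_A", "VIRAL_B"}
PPI = [("HOST_1", "HOST_3"), ("HOST_3", "VIRAL_A"), ("HOST_2", "VIRAL_B"), ("HOST_4", "HOST_5")]
DRUG_TARGETS = {"drug_1": {"HOST_3"}, "drug_2": {"HOST_2"}, "drug_3": {"HOST_5"}}


def distances(neighbors, sources):
    """Multi-source BFS: hop distance from the nearest source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def connecting_network(ppi, seeds, viral, max_hops=2):
    """Host proteins on a route of at most max_hops hops from a seed gene to a viral protein."""
    neighbors = {}
    for a, b in ppi:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    from_seed = distances(neighbors, seeds)
    from_viral = distances(neighbors, viral)
    return {
        node for node in from_seed
        if node in from_viral and node not in viral
        and from_seed[node] + from_viral[node] <= max_hops
    }


def candidate_drugs(drug_targets, network):
    """Drugs with at least one target inside the connecting network."""
    return sorted(drug for drug, targets in drug_targets.items() if targets & network)


network = connecting_network(PPI, PRIORITIZED, VIRAL)
print(sorted(network), candidate_drugs(DRUG_TARGETS, network))
```

A node qualifies when its distance from the nearest prioritized gene plus its distance from the nearest viral protein stays within the cutoff, which avoids enumerating every path explicitly.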
Learn more about how QIAGEN Biomedical Knowledge Base enables biomedical knowledge graph construction and analysis to fuel your data- and analytics-driven drug discovery. Request a trial to discover how this powerful tool will transform your drug discovery research.
References