With the scientific research community publishing over two million peer-reviewed articles every year since 2012 (1) and next-generation sequencing fueling a data explosion, the need for comprehensive yet accurate, reliable and analysis-ready information on the path to biomedical discoveries is now more pressing than ever.
Manual curation has become an essential requirement in producing such data. Data scientists spend an estimated 80% of their time collecting, cleaning and processing data, leaving less than 20% of their time for analyzing the data to generate insights (2,3). But manual curation is not just time-consuming. It is costly and challenging to scale as well.
We at QIAGEN take on the task of manual curation so researchers like you can focus on making discoveries. Our human-certified data enables you to concentrate on generating insights rather than collecting data. QIAGEN has been curating biomedical and clinical data for over 25 years. We've made massive investments in a biomedical and clinical knowledge base that contains millions of manually reviewed findings from the literature, plus information from commonly-used third-party databases and 'omics dataset repositories. With our knowledge and databases, scientists can generate high-quality, novel hypotheses quickly and efficiently, while using innovative and advanced approaches, including artificial intelligence.
Here are seven best practices for manual curation that QIAGEN's 200 dedicated curation experts follow, which we presented at the November 2021 Pistoia Alliance event.
These principles ensure that our knowledge base and integrated 'omics database deliver timely, highly accurate, reliable and analysis-ready data. In our experience, 40% of public 'omics datasets include typos or other potentially critical errors in an essential element (cell lines, treatments, etc.); 5% require us to contact the authors to resolve inconsistent terms, mislabeled treatments or infections, inaccurate sample groups or errors mapping subjects to samples. Thanks to our stringent manual curation processes, we can correct such errors.
Our extensive investment in high-quality manual curation means that scientists like you don't need to spend 80% of your time aggregating and cleaning data. We've scaled our rigorous manual curation procedures to collect and structure accurate and reliable information from many different sources, from journal articles to drug labels to 'omics datasets. In short, we accelerate your journey to comprehensive yet accurate, reliable and analysis-ready data.
Ready to get your hands on reliable biomedical, clinical and 'omics data that we've manually curated using these best practices? Learn about QIAGEN knowledge and databases, and request a consultation to find out how our accurate and reliable data will save you time and get you quick answers to your questions.
References:
As a researcher, it’s an enormous task to acquire knowledge and insight from the sea of biological data and complex interactions involved in a specific research topic. With QIAGEN Bioinformatics’ Ingenuity Pathway Analysis (IPA), we make it easier.
With the comprehensive, manually curated content of the Ingenuity Knowledge Base, combined with powerful algorithms, IPA provides advanced analysis capabilities to help scientists understand the biological context of expression analysis experiments. With IPA, you can identify the most significant pathways, and discover novel regulatory networks and causal relationships associated with your experimental data.
In the past several months there have been over 500 citations for Ingenuity Pathway Analysis, demonstrating how this tool helps put biological data in context to gain insight. Here, we round up just a few of them to offer a sense of the diverse research for which Ingenuity Pathway Analysis makes a difference.
Increasing evidence indicates that miR-301a is a potential oncogenic microRNA and that its genetic ablation reduces Kras-driven lung tumorigenesis in mice. A recent Molecular Cancer paper describes how researchers from China studied the role of miR-301a in host antitumor immunity.
After differentially expressed genes (DEGs) of two mouse models (with or without miR-301a) were identified from RNA-seq data, IPA was used to identify gene networks. The five most highly implicated IPA networks related to cell cycle and immune response were merged, and it was discovered that IFNG (IFN-γ) and CTNNB1 (β-catenin) were in the core modules within the entire network. This discovery led to further investigation of these genes, which enabled the researchers to find that miR-301a deficiency recruits immune cells to the tumor microenvironment, resulting in higher IFN-γ expression in early lung tumorigenesis. Additionally, miR-301a directly targets Runx3 mRNA, a negative regulator of the β-catenin pathway. After further experiments, the authors conclude that miR-301a facilitates antitumor immunity in the tumor microenvironment via Runx3 suppression during lung tumorigenesis.
In a Nature Communications paper, scientists from the Broad Institute and the Worcester Polytechnic Institute looked into transcriptional dynamics of macrophages infected with Candida albicans. IPA was used to investigate biological relationships, canonical pathways and upstream regulators of differentially expressed genes in macrophages either exposed to or infected with C. albicans.
Using IPA, the group was able to assess the overlap between significant DEGs and an extensively curated database of target genes for each of several hundred known regulatory proteins. The researchers found that transcriptomes of infected macrophages and phagocytosed C. albicans displayed tightly coordinated shifts in gene expression, and they established an approach for studying host-pathogen trajectories to resolve heterogeneity in dynamic populations.
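For readers curious about what an overlap assessment of this kind can look like, here is a minimal Python sketch. It scores the overlap between a DEG list and one regulator's target set with a right-tailed hypergeometric test (equivalent to a one-sided Fisher's exact test). This is a generic illustration, not IPA's implementation: the function name overlap_p_value, the gene labels and the background set are all hypothetical.

```python
from math import comb


def overlap_p_value(degs, targets, background):
    """Right-tailed hypergeometric p-value for the overlap between a DEG
    list and a regulator's known target genes.

    Hypothetical helper for illustration only; not part of IPA.
    """
    background = set(background)
    # Only genes present in the background universe are counted.
    degs = set(degs) & background
    targets = set(targets) & background

    universe_size = len(background)
    n_targets = len(targets)
    n_degs = len(degs)
    n_overlap = len(degs & targets)

    # Probability of seeing at least n_overlap targets among n_degs genes
    # drawn at random, without replacement, from the background.
    tail = sum(
        comb(n_targets, i) * comb(universe_size - n_targets, n_degs - i)
        for i in range(n_overlap, min(n_targets, n_degs) + 1)
    )
    return tail / comb(universe_size, n_degs)


if __name__ == "__main__":
    # Made-up gene labels; real analyses use thousands of genes.
    background = [f"GENE_{i}" for i in range(10)]
    regulator_targets = ["GENE_0", "GENE_1", "GENE_2"]
    degs = ["GENE_0", "GENE_1", "GENE_5"]
    print(overlap_p_value(degs, regulator_targets, background))
```

A small p-value suggests that the regulator's targets appear among the DEGs more often than chance alone would explain. In practice this test would be repeated for each regulator, with a multiple-testing correction applied across the results.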
A group of collaborating researchers from China recently published their findings on the signaling pathway network profile of human ovarian cancers. They used IPA to mine signaling pathway networks with nearly 1200 differentially expressed mitochondrial proteins, and they compared the pathway and network changes between ovarian cancers and controls. Their results were experimentally validated using qRT-PCR and Western blot. The scientific data generated in this study may lead to the discovery of pathway- and network-based disease and treatment biomarkers for ovarian cancers, and potentially novel molecular mechanisms and therapeutic targets for this disease.
Metal oxide nanoparticles (NPs) are widely used in industry despite little knowledge about the cellular pathways involved in their potential toxicity. Collaborating scientists from France and Ireland published in Cell Biology and Toxicology results of their gene expression study, showing expression changes in rat macrophages upon exposure to metal oxide NPs. IPA was used to identify top canonical pathways influenced by the exposure, notably eIF2 signaling involved in protein homeostasis.
If QIAGEN’s IPA is helping you make strides in your research, we would love to hear about it. Please contact us to share your story, or just to request a free trial!
OncoLand is a sophisticated oncology-focused database designed to accelerate cancer research. Integrating published research and large consortium cancer datasets with robust data visualization and discovery tools, OncoLand saves you valuable time and resources in your pursuit of actionable discoveries.
Start your free trial of OncoLand today!
OmicSoft DiseaseLand is the disease-focused platform designed to help you harness thousands of disease-focused datasets to accelerate biological discovery. Bringing together robust data visualization and analytics tools, OmicSoft “Land” technology rapidly connects you to the most relevant insights. Explore gene expression data at the gene, transcript, and exon level, as well as whole-transcriptome analyses of differential expression. Every project in DiseaseLand is carefully curated and processed through common pipelines, allowing you to quickly find the most relevant results across projects. DiseaseLand saves you valuable time and resources, enabling you to focus on your research.
With Rare Disease Day 2017 approaching on February 28, we want to show our support. As such, we'd like to share a few statistics that shed light on the state of rare disease today and help increase our understanding of it.
It’s clear that there’s plenty of work to be done! We are proud to be a friend of Rare Disease Day 2017, organized by EURORDIS. EURORDIS is doing important work to unify international efforts to understand rare disease. This year’s slogan is “with research, possibilities are limitless.”
We couldn’t agree more. We believe that research and increased access to comprehensive tools will mitigate the many hardships suffered by those with rare disease and their families.
In fact, earlier this year, we partnered with the Rare Genomics Institute by offering them access to our Hereditary Disease Solution. By helping to raise awareness and expand efforts to combat rare disease, we hope that the rare disease community finds unity, solace and eventually, cures.
We launched Regulator Effects in Ingenuity Pathway Analysis (IPA) last year, and it’s been so gratifying to see how helpful this new feature has been for users of the application. It integrates results from our Upstream Regulator and Downstream Effects tools, creating hypotheses to explain what’s going on upstream that may be causing a phenotype or other functional outcome. With Regulator Effects, IPA users can identify potential mechanisms behind a phenotype, identify drug targets, and determine the biological impact of upstream molecules according to the genes they regulate.
Already, a number of publications have described interpretation advances based on Regulator Effects. Below you can find a short description of two of these articles. If you'd like to learn more about IPA, you can sign up for our webinars here.
CEMP1 Induces Transformation in Human Gingival Fibroblasts
First author: Mercedes Bermúdez
In this PLoS One paper, scientists from Mexico and the U.S. investigated the potential of CEMP1, an important regulator in tooth formation, for bone regeneration and other treatments. Starting with gene expression profiles, they found that CEMP1 modifies several genes, many of them linked to oncogenesis. “We also determined that the region spanning the CEMP1 locus is commonly amplified in a variety of cancers, and finally we found significant overexpression of CEMP1 in leukemia, cervix, breast, prostate and lung cancer,” the authors report. “Our findings suggest that CEMP1 exerts modulation of a number of cellular genes, cellular development, cellular growth, cell death, and cell cycle, and molecules associated with cancer.” The scientists note that further study is needed to determine whether CEMP1 is a novel oncogene or simply a passenger linked to another driver gene.
The team used IPA to analyze the genes identified by gene expression analysis. With Upstream Regulator Analysis, they were able to identify two potential upstream regulators: a beta-catenin protein involved in the Wnt signaling pathway, and a transcription factor known to mediate apoptosis and cell proliferation.
Inhibiting the Mammalian Target of Rapamycin Blocks the Development of Experimental Cerebral Malaria
First author: Emile Gordon
Scientists from the National Institute of Allergy and Infectious Diseases published in mBio their findings from studying a mouse model of cerebral malaria. They tested rapamycin, an mTOR inhibitor, and determined that outcomes improved significantly when mice were treated with the medication within four days of infection. “Treatment with rapamycin increased survival, blocked breakdown of the blood-brain barrier and brain hemorrhaging, decreased the influx of both CD4+ and CD8+ T cells into the brain and the accumulation of parasitized red blood cells in the brain,” the team reports.
They analyzed transcriptional patterns caused by rapamycin to understand its effect, which suggested that leukocyte activity in the brain was blocked by the medication. “Remarkably, animals were protected against cerebral malaria even though rapamycin treatment significantly increased the inflammatory response induced by infection in both the brain and spleen,” the scientists note.
The team used IPA to identify enriched pathways, characterize networks, and predict upstream regulator effects for the differentially expressed genes they found in a comparison of untreated to treated mice. Regulator Effects features allowed the scientists to predict networks interrupted by rapamycin, identifying cellular invasion and lymphocyte proliferation, among others, as key functions inhibited by the treatment.