With the scientific research community publishing over two million peer-reviewed articles every year since 2012 (1) and next-generation sequencing fueling a data explosion, the need for comprehensive yet accurate, reliable and analysis-ready information on the path to biomedical discoveries is now more pressing than ever.

Manual curation has become an essential requirement in producing such data. Data scientists spend an estimated 80% of their time collecting, cleaning and processing data, leaving less than 20% of their time for analyzing the data to generate insights (2,3). But manual curation is not just time-consuming. It is costly and challenging to scale as well.

We at QIAGEN take on the task of manual curation so researchers like you can focus on making discoveries. Our human-certified data enables you to concentrate on generating insights rather than collecting data. QIAGEN has been curating biomedical and clinical data for over 25 years. We've made massive investments in a biomedical and clinical knowledge base that contains millions of manually reviewed findings from the literature, plus information from commonly used third-party databases and 'omics dataset repositories. With our knowledge and databases, scientists can generate high-quality, novel hypotheses quickly and efficiently, while using innovative and advanced approaches, including artificial intelligence.

At the November 2021 Pistoia Alliance event, we presented seven best practices for manual curation that QIAGEN's 200 dedicated curation experts follow:

  1. Efficient yet thorough information capture: Understanding articles is time-consuming, so efficiency is imperative. All essential elements must be captured in a single reading. But because critical information may be distributed throughout the article, curators must read it in its entirety to deliver accurate findings and context.
  2. Standardization: We use an ontology of more than 2 million concepts and dozens of relationship types to capture information. Wherever possible, data are mapped to public identifiers to enhance interoperability.
  3. Triaging: Document selection is fundamental to efficient manual curation and helps avoid reading articles that lack useful information. We identify relevant sources using criteria such as novelty, use automation to prioritize articles for manual curation, and rely on delivery workflows to orchestrate the work.
  4. Training: For consistency, we use internally developed curation protocols, training documents and editorial reviews. Trainees receive continuous feedback for several months before advancing to our production environment.
  5. Tooling: Good curation tools are fundamental to accuracy and efficiency. Our internally created tools ensure we capture information consistently through guided forms, pulldown menus, constraints on slots and other features.
  6. Revisions: Knowledge constantly evolves and needs to be updated based on new evidence. Articles may become deprecated or have corrections published, and drug labels and guidelines undergo revisions. Our workflows deal with all of these situations.
  7. Quality control: We measure and safeguard accuracy through QC checks built into our curation tools, editor reviews, author error reviews and database consistency checks.

These principles ensure that our knowledge base and integrated 'omics database deliver timely, highly accurate, reliable and analysis-ready data. In our experience, 40% of public 'omics datasets include typos or other potentially critical errors in an essential element (cell lines, treatments, etc.); 5% require us to contact the authors to resolve inconsistent terms, mislabeled treatments or infections, inaccurate sample groups or errors mapping subjects to samples. Thanks to our stringent manual curation processes, we can correct such errors.

Our extensive investment in high-quality manual curation means that scientists like you don't need to spend 80% of their time aggregating and cleaning data. We've scaled our rigorous manual curation procedures to collect and structure accurate and reliable information from many different sources, from journal articles to drug labels to 'omics datasets. In short, we accelerate your journey to comprehensive yet accurate, reliable and analysis-ready data.

Ready to get your hands on reliable biomedical, clinical and 'omics data that we've manually curated using these best practices? Learn about QIAGEN knowledge and databases, and request a consultation to find out how our accurate and reliable data will save you time and get you quick answers to your questions.

References:

  1. The STM Report 2018: An overview of scholarly and scientific publishing. https://www.stm-assoc.org/2018_10_04_STM_Report_2018.pdf
  2. H. Sarih, A. P. Tchangani, K. Medjaher and E. Pere (2019) Data preparation and preprocessing for broadcast systems monitoring in PHM framework. 2019 6th International Conference on Control, Decision and Information Technologies (CoDIT), 1444–1449. DOI:10.1109/CoDIT.2019.8820370
  3. Big data to good data: Andrew Ng urges ML community to be more data-centric and less model-centric (06/04/2021) https://analyticsindiamag.com/big-data-to-good-data-andrew-ng-urges-ml-community-to-be-more-data-centric-and-less-model-centric/

Researchers across the world are using Ingenuity Pathway Analysis to accelerate their work in a variety of applications, including the role of a specific miRNA in tumorigenesis, host-pathogen interactions, ovarian cancer and nanoparticle toxicity.

For researchers, acquiring knowledge and insight from the sea of biological data and complex interactions involved in a specific research topic is an enormous task. With QIAGEN Bioinformatics’ Ingenuity Pathway Analysis (IPA), we make it easier.

With the comprehensive, manually curated content of the Ingenuity Knowledge Base, combined with powerful algorithms, IPA provides advanced analysis capabilities to help scientists understand the biological context of expression analysis experiments. With IPA, you can identify the most significant pathways, and discover novel regulatory networks and causal relationships associated with your experimental data.

In the past several months there have been over 500 citations for Ingenuity Pathway Analysis, demonstrating how this tool helps put biological data in context to gain insight. Here, we round up just a few of them to offer a sense of the diverse research for which Ingenuity Pathway Analysis makes a difference.

miR-301a promotes lung tumorigenesis by suppressing Runx3

First author: Xun Li

Increasing evidence indicates that miR-301a is a potential oncogenic microRNA and that its genetic ablation reduces Kras-driven lung tumorigenesis in mice. A recent Molecular Cancer paper describes how researchers from China studied the role of miR-301a in host antitumor immunity.

After differentially expressed genes (DEGs) of two mouse models (with or without miR-301a) were identified from RNA-seq data, IPA was used to identify gene networks. The five most highly implicated IPA networks related to cell cycle and immune response were merged, revealing IFNG (IFN-γ) and CTNNB1 (β-catenin) as core modules within the entire network. Further investigation of these genes enabled the researchers to find that miR-301a deficiency recruits immune cells to the tumor microenvironment, resulting in higher IFN-γ expression in early lung tumorigenesis. Additionally, miR-301a directly targets the mRNA of Runx3, a negative regulator of the β-catenin pathway. After further experiments, the authors conclude that miR-301a suppresses antitumor immunity in the tumor microenvironment via Runx3 suppression during lung tumorigenesis.

Coordinated host-pathogen transcriptional dynamics revealed using sorted subpopulations and single macrophages infected with Candida albicans

First author: José F. Muñoz

In a Nature Communications paper, scientists from the Broad Institute and Worcester Polytechnic Institute examined the transcriptional dynamics of macrophages infected with Candida albicans. IPA was used to investigate biological relationships, canonical pathways and upstream regulators of differentially expressed genes in macrophages either exposed to or infected with C. albicans.

Using IPA, the group was able to assess the overlap between significant DEGs and an extensively curated database of target genes for each of several hundred known regulatory proteins. The researchers found that transcriptomes of infected macrophages and phagocytosed C. albicans displayed tightly coordinated shifts in gene expression, and they established an approach for studying host-pathogen trajectories to resolve heterogeneity in dynamic populations.

Signaling pathway network alterations in human ovarian cancers identified with quantitative mitochondrial proteomics

First author: Na Li

A group of collaborating researchers from China recently published their findings on the signaling pathway network profile of human ovarian cancers. They used IPA to mine signaling pathway networks with nearly 1,200 differentially expressed mitochondrial proteins, and they compared the pathway and network changes between ovarian cancers and controls. Their results were experimentally validated using qRT-PCR and Western blot. The scientific data generated in this study may lead to the discovery of pathway- and network-based disease and treatment biomarkers for ovarian cancers, and potentially novel molecular mechanisms and therapeutic targets for this disease.

Protein and lipid homeostasis altered in rat macrophages after exposure to metallic oxide nanoparticles

First author: Zahra Doumandji

Metal oxide nanoparticles (NPs) are widely used in industry despite little knowledge about the cellular pathways involved in their potential toxicity. Collaborating scientists from France and Ireland published the results of their gene expression study in Cell Biology and Toxicology, showing expression changes in rat macrophages upon exposure to metal oxide NPs. IPA was used to identify the top canonical pathways influenced by the exposure, notably eIF2 signaling, which is involved in protein homeostasis.

If QIAGEN’s IPA is helping you make strides in your research, we would love to hear about it. Please contact us to share your story, or just to request a free trial!

Need access to 75,000+ human 'omics datasets for your oncology research?

Discover a world of oncology-focused datasets, analysis tools, and biomarker breakthroughs!

OncoLand is a sophisticated oncology-focused database designed to accelerate cancer research. By integrating published research and large consortium cancer datasets with robust data visualization and discovery tools, OncoLand saves you valuable time and resources in your pursuit of actionable discoveries.

Start your free trial of OncoLand today!

Searching for new biomarkers for your NGS panel and assay design?

Discover a world of disease-focused datasets, analysis tools, and biomarker breakthroughs!

OmicSoft DiseaseLand is the disease-focused platform designed to help you harness thousands of disease-focused datasets to accelerate biological discovery. Bringing together robust data visualization and analytics tools, OmicSoft “Land” technology rapidly connects you to the most relevant insights. Explore gene expression data at the gene, transcript, and exon level, as well as whole-transcriptome analyses of differential expression. Every project in DiseaseLand is carefully curated and processed through common pipelines, allowing you to quickly find the most relevant results across projects. DiseaseLand saves you valuable time and resources, enabling you to focus on your research.

Start your free trial of DiseaseLand today!

Research to beat rare diseases

With Rare Disease Day 2017 approaching on February 28, we want to show our support. We'd like to share a few statistics to help shed light on, and increase understanding of, the state of rare disease today.

It’s clear that there’s plenty of work to be done! We are proud to be a friend of Rare Disease Day 2017, organized by EURORDIS. EURORDIS is doing important work to unify international efforts to understand rare disease. This year’s slogan is “with research, possibilities are limitless.”

We couldn’t agree more. We believe that research and increased access to comprehensive tools will mitigate the many hardships suffered by those with rare disease and their families.

In fact, earlier this year, we partnered with the Rare Genomics Institute by offering them access to our Hereditary Disease Solution. By helping to raise awareness and expand efforts to combat rare disease, we hope that the rare disease community finds unity, solace and eventually, cures.

We launched the Regulator Effects in Ingenuity Pathway Analysis (IPA) last year, and it’s been so gratifying to see how helpful this new feature has been for users of the application. It integrates results from our Upstream Regulator and Downstream Effects tools, creating hypotheses to explain what’s going on upstream that may be causing a phenotype or other functional outcome. With Regulator Effects, IPA users can identify potential mechanisms behind a phenotype, identify drug targets, and determine the biological impact of upstream molecules according to the genes they regulate.

Already, a number of publications have described interpretation advances based on Regulator Effects. Below you can find a short description of two of these articles. If you'd like to learn more about IPA, you can sign up for our webinars here.


CEMP1 Induces Transformation in Human Gingival Fibroblasts
First author: Mercedes Bermúdez

In this PLoS One paper, scientists from Mexico and the U.S. investigated the potential of CEMP1, an important regulator in tooth formation, for bone regeneration and other treatments. Starting with gene expression profiles, they found that CEMP1 modulates the expression of several genes, many of them linked to oncogenesis. “We also determined that the region spanning the CEMP1 locus is commonly amplified in a variety of cancers, and finally we found significant overexpression of CEMP1 in leukemia, cervix, breast, prostate and lung cancer,” the authors report. “Our findings suggest that CEMP1 exerts modulation of a number of cellular genes, cellular development, cellular growth, cell death, and cell cycle, and molecules associated with cancer.” The scientists note that further study is needed to determine whether CEMP1 is a novel oncogene or simply a passenger linked to another driver gene.

The team used IPA to analyze the genes identified by gene expression analysis. With Upstream Regulator Analysis, they identified two potential upstream regulators: β-catenin, a protein involved in the Wnt signaling pathway, and a transcription factor known to mediate apoptosis and cell proliferation.


Inhibiting the Mammalian Target of Rapamycin Blocks the Development of Experimental Cerebral Malaria
First author: Emile Gordon

Scientists from the National Institute of Allergy and Infectious Diseases published in mBio their findings from studying a mouse model of cerebral malaria. They tested rapamycin, an mTOR inhibitor, and found that outcomes improved significantly when mice were treated with the drug within four days of infection. “Treatment with rapamycin increased survival, blocked breakdown of the blood-brain barrier and brain hemorrhaging, decreased the influx of both CD4+ and CD8+ T cells into the brain and the accumulation of parasitized red blood cells in the brain,” the team reports.

To understand the drug's effect, they analyzed the transcriptional patterns induced by rapamycin, which suggested that it blocked leukocyte activity in the brain. “Remarkably, animals were protected against cerebral malaria even though rapamycin treatment significantly increased the inflammatory response induced by infection in both the brain and spleen,” the scientists note.

The team used IPA to identify enriched pathways, characterize networks, and predict upstream regulator effects for the differentially expressed genes they found when comparing untreated with treated mice. The Regulator Effects feature allowed the scientists to predict networks interrupted by rapamycin, identifying cellular invasion and lymphocyte proliferation, among others, as key functions inhibited by the treatment.

