While there is great interest in the scientific community in investigating drug targets and biomarkers from public immuno-oncology data, such investigation is hindered by the difficulty of finding and combining related datasets to perform large-scale meta-analyses. This webinar will focus on how high-quality curated genomic repositories such as the QIAGEN OmicSoft Lands database immediately allow in-depth investigations across diverse data sources (GEO, CPTAC, TCGA, GTEx and many more) to discover and validate candidate checkpoint inhibitor drug targets and biomarkers.

In this webinar, users will learn how to leverage solutions from QIAGEN Digital Insights to discover biomarkers, validate targets, and identify variants. Specifically, users will learn how to:
1. Locate public studies of interest using OmicSoft DiseaseLand.
2. Investigate the expression of genes of interest across different treatments, disease states, etc.
3. Identify variants of interest for candidate biomarkers and targets using the Human Gene Mutation Database.
4. Leverage the QIAGEN Knowledgebase in Ingenuity Pathway Analysis to explore and extend findings from OmicSoft DiseaseLand and the Human Gene Mutation Database.
5. Use additional methods designed for data scientists to access data from OmicSoft, the Human Gene Mutation Database, and Ingenuity Pathway Analysis.

Cancer outcome is influenced by both the tumor microenvironment and host immune response. Using QIAGEN OmicSoft Studio to access public data from The Cancer Genome Atlas (TCGA) and our human Single Cell Lands collection, you’ll learn how to:
• View host immune response clusters across TCGA samples
• Identify differentially expressed immunomodulators across sample groups
• Visualize single-cell dimension reduction maps and overlay expression data
• Identify potential biomarkers whose expression correlates or anti-correlates with target genes
• Validate new biomarkers using custom queries and TCGA survival data

In this 90-minute training, you'll learn how to make discoveries related to drug treatment, toxicology and target safety assessment using QIAGEN Ingenuity Pathway Analysis (IPA) and QIAGEN OmicSoft Lands.

Using public data from GTEx (normal tissue), GEO, cancer collections and more, you'll learn how to use OmicSoft Lands to:
• Investigate a drug target or biomarker expression across different normal tissues, disease conditions, treatments and more
• Correlate the expression of two or more genes
• Identify a list of genes or biomarkers specific to treatment, disease, normal tissue, cell type and more

Using findings from peer-reviewed publications and other sources, we'll explore with you how to use QIAGEN IPA to:
• Study the impact of targeting a gene/protein on different toxicological and biological functions
• Derive toxicity findings for a gene of interest from QIAGEN IPA's knowledgebase
• Identify and study toxicity-related pathways, regulators and functions for an internal dataset or a public dataset
• Compare different drug treatments, other conditions or multi-omics data for novel discoveries

Research and development for new drugs and disease treatments can be lengthy and costly. Indication expansion can help broaden the impact of a new drug that has already been through the arduous R&D process for a disease or cancer. Drug repurposing can take this concept and expand on it by looking for other diseases with similar drug target biology. The logic is if they share similar target biology, they may benefit from the same treatment.

During this training, we'll cover skills such as:
• Querying QIAGEN OmicSoft Lands data from sources like TCGA or ICGC and exploring the incidence of a cancer-driving somatic mutation that is targeted by a treatment
• Creating cohorts of patients within OmicSoft that carry a disease-causing mutation or are wild type for the gene of interest
• Generating survival curves for the mutant and wild-type groups for various indications in the relevant OmicSoft Land
• Using QIAGEN Ingenuity Pathway Analysis (IPA) to generate a mechanism of action network for the drug's target
• Exploring various network overlay features to enable in silico testing and combination drug partner investigation
• Searching for publicly available datasets relevant to your chosen indication
• Comparing the expression profile from our disease state with other publicly available analyses to find other indications or diseases that share similar biology

Single-cell RNA-sequencing (scRNA-seq) is widely used to study tissue heterogeneity, identify novel cell types, study pathogenic mechanisms, develop targeted therapies (including immunotherapy) and more. Accordingly, scientists have deposited a tremendous amount of scRNA-seq data into public domains like GEO.

In this training, you will learn how to:

· Locate public single-cell studies of interest to you using QIAGEN OmicSoft Single Cell Lands

· Study different cell types by dimension reduction plots (for example, t-SNE, UMAP)

· Investigate expression of genes of interest across different cell types (Violin plots, overlay expression on cluster)

· Identify key pathways and regulators from scRNA-seq data using QIAGEN IPA

In this 90-minute training on QIAGEN OmicSoft and Ingenuity Pathway Analysis (IPA), we’ll cover how to easily query public data (GEO, SRA and more) related to inflammatory conditions to:
• Rapidly query and identify public datasets that fit our search criteria
• Discover and validate biomarker expression in disease tissue, different treatments and response groups
• Identify a list of biomarkers specific to a condition (non-responders, disease-specific, cell-type specific and more)
• Confirm condition-specific biomarkers through gene expression heatmaps
• Investigate biological mechanisms through network study

Additional QIAGEN Digital Insights scientists will be on the call to answer your questions and help with other inquiries, such as how to install the software.

In this training, we will focus on how you can use QIAGEN OmicSoft Studio and QIAGEN Ingenuity Pathway Analysis (IPA) to discover new biomarkers, validate (or study) drug targets and identify novel mechanisms of action with your own and/or public checkpoint inhibitor datasets from resources like GEO, SRA, TCGA and more.

In this training, you’ll learn how to:

· Investigate the expression of a gene/biomarker/drug target across different treatments and diseases

· Derive a biomarker/gene signature from a specific condition (for example, non-responders of a drug, or a particular disease/disease subtype and others)

· Correlate expression of multiple genes and biomarkers

· Compare different experimental groups (e.g., your own data and/or public data) at both the levels of gene expression and pathways/regulatory networks activity

Single-cell RNA-sequencing (scRNA-seq) is widely used to investigate tissue heterogeneity, identify novel cell types, study pathogenic mechanisms, develop targeted therapy (including immunotherapy) and more. Accordingly, a tremendous amount of scRNA-seq data has been deposited to public domains like GEO.

In this training, you will learn how to:

· Locate public single-cell studies of interest using QIAGEN OmicSoft Single Cell Lands

· Study different cell types by dimension reduction plots (for example, t-SNE, UMAP)

· Investigate expression of genes of interest across different cell types (Violin plots, overlay expression on cluster)

· Identify key pathways and regulators from scRNA-seq data using QIAGEN Ingenuity Pathway Analysis (IPA)

Why manually curated data is essential to convert data into knowledge

Are you a researcher or data scientist working in drug discovery? If so, you depend on data to help you achieve unique insights by revealing patterns across experiments. Yet, not all data are created equal. The quality of data you use to inform your research is essential. For example, if you acquire data using natural language processing (NLP) or text mining, you may have a broad pool of data, but at the high cost of a relatively large number of errors (1).

As a drug development researcher, you’re also familiar with freely available datasets from public ‘omics data repositories. You rely on them to help you gain insights for your preclinical programs. These open-source datasets aggregated in portals such as The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) contain data from thousands of samples used to validate or redirect the discovery of gene signatures, biomarkers and therapies. In theory, access to so much experimental data should be an asset. But, because the data are unintegrated and inconsistent, they are not directly usable. So in practice, it’s costly, time-consuming and utterly inefficient to spend hours sifting through these portals to find the information required to clean up these data so you can use them.

Data you can use right away

Imagine how transformative it would be if you had direct access to ‘usable data’ that you could immediately understand and work with, without searching for additional information or having to clean and structure it. Data that is comprehensive yet accurate, reliable and analysis-ready. Data you can begin converting into knowledge right away to drive your biomedical discoveries.

Creating usable data 

Data curation has become an essential requirement in producing usable data. Data scientists spend an estimated 80% of their time collecting, cleaning and processing data, leaving less than 20% of their time for analyzing the data to generate insights (2,3). But data curation is not just time-consuming. It’s costly and challenging to scale as well, particularly if legacy datasets must be revised to match updated curation standards.

What if there were a team of experts to take on the manual curation of the data you need so researchers like you could focus on making discoveries?

Our experts have been curating biomedical and clinical data for over 25 years. We’ve made massive investments in a biomedical and clinical knowledge base that contains millions of manually reviewed findings from the literature, plus information from commonly used third-party databases and ‘omics dataset repositories. Our human-certified data enables you to generate insights rather than collect and clean data. With our knowledge and databases, scientists like you can generate high-quality, novel hypotheses quickly and efficiently while using innovative and advanced approaches, including artificial intelligence.

Figure 1. Our workflow for processing 'omics data.

4 advantages of manually curated data 

Our 200 dedicated curation experts follow these seven best practices for manual curation. Why do we apply so much manual effort to data curation? Based on our principles and practices for manual curation, here are the top reasons manually curated data is fundamental to your research success:

1. Metadata fields are unified, not redundant

Author-submitted metadata vary widely. Manual curation of field names can enforce alignment to a set of well-defined standards. Our curators identify hundreds of columns containing frequently used information across studies and combine these data into unified columns to enhance cross-study analyses. This unification is evident in our TCGA metadata dictionary, for example, where we unified into a single field the five different fields that were used to indicate TCGA samples with a cancer diagnosis of a first-degree family member.
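
As a rough illustration of what unifying redundant fields can involve, the sketch below collapses several author-submitted columns into one value drawn from a controlled vocabulary. The five field names and the vocabulary are hypothetical, chosen only for illustration; they are not the actual TCGA field names, and the code is not the curation workflow described in this article.

```python
# Hypothetical author-submitted field names (assumption: the real TCGA
# field names are not listed in this article).
REDUNDANT_FIELDS = [
    "family_history_cancer",
    "relative_cancer_dx",
    "first_degree_relative_cancer",
    "fam_hx_of_cancer",
    "family_cancer_indicator",
]

# Hypothetical controlled vocabulary: free-text variants map to one label.
CONTROLLED_VALUES = {
    "yes": "Yes", "y": "Yes", "true": "Yes",
    "no": "No", "n": "No", "false": "No",
}


def unify_family_history(sample: dict) -> str:
    """Collapse the redundant fields of one sample into a single value."""
    values = set()
    for field in REDUNDANT_FIELDS:
        raw = sample.get(field)
        if raw is None:
            continue
        key = str(raw).strip().lower()
        if key in CONTROLLED_VALUES:
            values.add(CONTROLLED_VALUES[key])
    if not values:
        return "Not Reported"
    if len(values) > 1:
        # Fields disagree: flag the sample for a human curator to review.
        return "Conflict"
    return values.pop()
```

Returning an explicit "Conflict" value instead of picking a winner reflects the point of the section: disagreements between fields are a case for human review, not automatic resolution.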

2. Data labels are clear and consistent

Unfortunately, it’s common that published datasets provide vague abbreviations as labels for patient groups, tissue type, drugs or other main elements. If you want to develop successful hypotheses from these data, it’s critical you understand the intended meaning and relationship among labels. Our curators take the time to investigate each study and precisely and accurately apply labels so that you can group and compare the data in the study with other relevant studies.

3. Additional contextual information and analysis

Properly labeled data enables scientifically meaningful comparisons between sample groups to reveal biomarkers. Our scientists are committed to expert manual curation and scientific review, which includes generating statistical models to reveal differential expression patterns. In addition to calculating differential expression between sample groups defined by the authors, our scientists perform custom statistical comparisons to support additional insights from the data.

4. Author errors are detected

No matter how consistent data labels are, NLP processes cannot identify misassigned sample groups, and such errors are devastating to data analysis. Unfortunately, it’s not unheard of that data are rendered uninterpretable due to conflicts in sample labeling presented in a publication versus its corresponding entry in a public ‘omics data repository. As shown in Figure 2, for a given Patient ID, both ‘Age’ and ‘Genetic Subtype’ are mismatched between the study’s GEO entry and publication table; which sample labels are correct? Our curators identify these issues and work with authors to correct errors before including the data in our databases.

Figure 2. In this submission to NCBI GEO, the ages of the various patients conflict between the GEO submission and the associated publication. What’s more, the genetic subtype labels are mixed up. Without resolving these errors, the data cannot be used. This attention to detail is required, and can only be achieved with manual curation.
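
The kind of conflict described above can at least be surfaced programmatically once both sources have been transcribed, even though deciding which label is correct still requires a curator and, often, the authors. The following is a minimal sketch under that assumption; the function name, record layout and example values are hypothetical and are not taken from Figure 2.

```python
def find_label_conflicts(geo_records, publication_records, fields):
    """Return (patient_id, field, geo_value, publication_value) for each mismatch."""
    conflicts = []
    # Only patients present in both sources can be compared.
    shared_ids = sorted(geo_records.keys() & publication_records.keys())
    for patient_id in shared_ids:
        geo = geo_records[patient_id]
        pub = publication_records[patient_id]
        for field in fields:
            if geo.get(field) != pub.get(field):
                conflicts.append((patient_id, field, geo.get(field), pub.get(field)))
    return conflicts


# Made-up example values for illustration only.
geo = {
    "P01": {"Age": 54, "Genetic Subtype": "A"},
    "P02": {"Age": 61, "Genetic Subtype": "B"},
}
pub = {
    "P01": {"Age": 45, "Genetic Subtype": "B"},
    "P02": {"Age": 61, "Genetic Subtype": "B"},
}

conflicts = find_label_conflicts(geo, pub, ["Age", "Genetic Subtype"])
for conflict in conflicts:
    print(conflict)
```

A check like this only reports that two sources disagree. It cannot say which one is right, which is why the resolution step remains manual.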

 

At the core of our curation process, curators apply scientific expertise, controlled vocabularies and standardized formatting to all applicable metadata. The result is that you can quickly and easily find all applicable samples across data sources using simplified search criteria.

Dig deeper into the value of QIAGEN Digital Insights’ manual curation process 

Ready to incorporate into your research the reliable biomedical, clinical and ‘omics data we’ve developed using manual curation best practices?  Explore our QIAGEN knowledge and databases, and request a consultation to find out how our manually curated data will save you time and enable you to develop quicker, more reliable hypotheses. Learn more about the costs of free data in our industry report and download our unique and comprehensive metadata dictionary of clinical covariates to experience first-hand just how valuable manual curation really is.

References:

  1. Callahan TJ, Tripodi IJ, Pielke-Lombardo H, Hunter LE. Knowledge-based biomedical data science. Annu Rev Biomed Data Sci. 2020; 3:23–41.
  2. Sarih, A. P. Tchangani, K. Medjaher and E. Pere. Data preparation and preprocessing for broadcast systems monitoring in PHM framework. 6th International Conference on Control, Decision and Information Technologies (CoDIT). 2019; 1444–1449.
  3. Big data to good data: Andrew Ng urges ML community to be more data-centric and less model-centric (06/04/2021) https://analyticsindiamag.com/big-data-to-good-data-andrew-ng-urges-ml-community-to-be-more-data-centric-and-less-model-centric/