integrated omics data - QIAGEN Digital Insights

Build confidence and efficiency into your preclinical pipeline with traceable and high-quality cell line ‘omics data from a trusted source

If you’re a biologist studying gene function to identify new biomarkers or targets, or testing drug metabolism and toxicity, cell lines are your wingman. They provide the foundation for your experiments, helping you predict clinical response or develop new in vitro models of disease subtypes or patient segments. Yet to get full value from a cell line, you need to understand its origin, genome and gene expression pattern, so you can identify the optimal cell lines for your preclinical studies.

Unfortunately, cell line misidentification, genomic aberrations and microbial contamination can compromise the value of cell lines for your experiments (1, 2). A further complication is that current ‘omics data repositories do not contain data on many cell lines. If you’re not studying cancer, you might find yourself lost since most data repositories are missing non-oncology cell line data. You’ll often come up empty-handed when searching for datasets on mouse cell lines, too. This means you need research funds not only to acquire cell lines for your experiments but also to sequence and characterize the genes in those cell lines before you can even begin experiments. This ends up consuming a lot of time and resources, slowing down your drug development research.

Build efficiency into your preclinical pipeline with high-quality cell line ‘omics data

If you depend on cell lines to support your drug development pipeline, we’ve got news that will transform your research.

We’re now offering a new tool that will streamline how you obtain and use cell line ‘omics data to plan and design your preclinical experiments. We partnered with ATCC to establish a database of transcriptomic (RNA-seq) and genomic (whole exome sequencing) datasets from the most highly utilized human and mouse cell lines and primary tissues and cells in ATCC’s collection. These include the most common cell lines, as well as novel cell lines that do not have publicly available transcriptomic data, offering unique data you can’t get your hands on elsewhere. This database is ATCC Cell Line Land.

Cell line ‘omics data from a credible source to ensure reliable research results

ATCC offers credible, authenticated and characterized cell lines, primary tissue and primary cells to enable reproducible research results. The advantage of working with ‘omics data from ATCC cell lines is that the data is derived from cell lines cultured in ISO-compliant conditions. This means the data is reliable and comes from pure, uncontaminated samples. The transcriptome or exome of each cell line or tissue is then sequenced, and the resulting ‘omics data is processed and curated using our rigorous methods for ‘omics data curation, structuring and integration.

“Every dataset we produce can be traced to a physical lot of cells in our biorepository. Since there are no questions about reproducibility and traceability of those materials, you end up with maximum data provenance.”

– Jonathan Jacobs, PhD, Senior Director of Bioinformatics, ATCC

Manually curated, integrated cell line ‘omics data

The ATCC Cell Line Land datasets are processed following the same high-quality data standards we apply to all QIAGEN OmicSoft Lands collections of ‘omics data, which integrate datasets from the largest public ‘omics data repositories using controlled vocabularies and extensive manual curation. Metadata for ATCC Cell Line Land datasets include standard culture conditions, extraction protocols, sample preparation and NGS library preparation. This consistency in curation increases confidence and enables flexible integration for bioinformatics projects, including AI/ML applications. You can use the data to answer your key research questions, explore genes of interest and investigate mutations that may be important to your in vitro experiments (Figure 1).

Figure 1. Box plot showing MYC gene expression across different cell lines in the human cell line collection of ATCC Cell Line Land. This example shows how you can use ATCC Cell Line Land to quickly find cell lines with either high or low expression for a gene of interest.
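The Figure 1 use case is picking high- and low-expression cell lines for one gene. The minimal Python sketch below shows that selection step. It is not an ATCC Cell Line Land or OmicSoft API call. It assumes, hypothetically, that you already hold per-sample expression values for the gene, grouped by cell line in a plain dictionary. It then ranks the lines by their median, which is the center line of each box in a box plot. The function name, the dictionary layout and the toy cell line names are illustrative placeholders, not real data.

```python
from statistics import median


def rank_by_expression(values, n=2):
    """Return (highest, lowest): the n cell lines with the highest and the
    lowest median expression of one gene.

    `values` is a hypothetical layout, {cell_line_name: [replicate values]};
    it is not an ATCC Cell Line Land or OmicSoft export format.
    """
    # Median per cell line (the center line of each box in a box plot);
    # cell lines without any values are skipped.
    medians = {line: median(reps) for line, reps in values.items() if reps}
    # Sort cell lines from highest to lowest median expression.
    ordered = sorted(medians, key=lambda line: medians[line], reverse=True)
    highest = ordered[:n]
    lowest = list(reversed(ordered))[:n]
    return highest, lowest


if __name__ == "__main__":
    # Toy placeholder values, not real expression data.
    toy = {
        "line_A": [9.1, 8.7, 9.4],
        "line_B": [2.0, 2.3],
        "line_C": [5.5, 5.1, 5.8],
        "line_D": [1.2, 0.9, 1.1],
    }
    print(rank_by_expression(toy))  # prints (['line_A', 'line_C'], ['line_D', 'line_B'])
```

Using the median rather than the mean keeps a single outlier replicate from pushing a cell line into the high or low group.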

Tell us, we’re listening: What ATCC cell line data is most valuable to you?

The data in ATCC Cell Line Land is continually growing, with quarterly releases that add ‘omics data on 1,000 new samples each year. What’s more, the data grows based on what you, as a researcher, need most. Our team uses your requests to prioritize the cell lines added to the ATCC Cell Line Land collection, as well as the types of experimental data curated and included in the database. These may include compound treatments with IC50 values, or stimulations with cytokine measurements or other parameters. Contact us with your ideas.

Get in touch

Luckily, you no longer need to waste time and money dealing with public portals or sequencing cell lines yourself. Speed up cell line characterization and efficiently plan your in vitro experiments with high-quality, manually curated cell line ‘omics data from ATCC Cell Line Land. Learn more about how ATCC Cell Line Land and our other integrated ‘omics data collections help you quickly glean insights from ‘omics data. Is your focus cancer research? Explore how ATCC Cell Line Land complements the Cancer Cell Line Encyclopedia (CCLE) data in QIAGEN OmicSoft OncoLand.

Learn more and request a consultation to explore how ATCC Cell Line Land will streamline your in vitro experiments and accelerate your drug discovery. Read this press release to find out more about our partnership with ATCC.

References:

1. Freshney RI. Cell line provenance. Cytotechnology. 2002;39(2):55.
2. Didion JP, et al. SNP array profiling of mouse cell lines identifies their strains of origin and reveals cross-contamination and widespread aneuploidy. BMC Genomics. 2014;15(1):847.

Looking for flexible ways to integrate high-quality ‘omics data? Execute detailed queries via API, download flat files or use the GUI. The choice is yours.

Picture this: Your team has excellent in silico data indicating a new compound your company is developing inhibits a particular growth factor. You're tasked with delivering a report summarizing the expression pattern of genes connected to this factor in different tissues and across diseases. Your mission is clear: Find experimental evidence for the transcriptional activity of this growth factor in the context of disease or treatment, and summarize the tissue specificity.

You begin by searching public 'omics data repositories to find possibly relevant datasets, but it's like searching for a needle in a haystack: There is missing metadata, unclear experimental conditions and inconsistent terms. The metadata are so unclear that you have to discard dataset after dataset.

You weed through dozens of public datasets one by one to collect just a few comparable analyses, spending several months on daily data retrieval, cleaning, sorting and labeling of each sample. Finally, you've assembled a collection supporting your gene in the context of neurological disease. Yet when you check the data against the source publications, you find that many of the experiments were performed under entirely different conditions and many are irrelevant.

Two steps forward and one step back

Months have passed and your report is still full of holes. With a nagging feeling you've let your stakeholders down, you have no choice but to ask for an extended deadline. Then, you go back to where you started and try to fill in the gaps. Frustrated and disappointed, you think: Did I work for years to earn my PhD only to spend most of my time searching for and cleaning data?

Feeling like a slave to 'omics data management

If you're a bioinformatician or a data scientist working in pharma, this scenario may sound familiar. You need 'omics data to help you generate high-quality novel hypotheses for your R&D colleagues to explore. Your organization must stay ahead of its competition, so it needs you to develop hypotheses quickly and efficiently.

Flexible data access is one way to achieve this, but it comes at a cost in time, money and resources. You must invest heavily to maintain your data infrastructure, and you must search carefully and continuously for new data to add so you can run the large-scale queries your projects require. Worse, gaps and inconsistencies in dataset metadata often lead to misleading query results that could negatively impact your research. Even data from valuable consortia falls out of date because ingesting the latest updates and unifying them with your schemas is so laborious.

Goodbye 'omics data management, hello unique and reliable insights

What if you no longer had to retrieve, ingest and maintain databases containing public 'omics data riddled with inconsistencies? How might you reinvest your time in worthwhile tasks to accelerate R&D initiatives? What if you had flexible access to comprehensive, structured, highly granular databases of integrated, disease-relevant 'omics data collected from thousands of publications?

We bet you'd feel empowered to dig deeper into research questions. You'd have more time to focus on the science behind the data rather than locating data, scrutinizing its quality and cleaning up metadata inconsistencies. And you'd be able to deliver reliable reports faster, filled with unique and valuable insights your R&D colleagues can run with.

Introducing flexible API access to QIAGEN OmicSoft Land data

With API access to manually curated QIAGEN OmicSoft integrated 'omics data, you'll overcome public 'omics data hurdles to easily drive new discoveries and validation in drug development. Our curation process delivers consistent and extensive metadata across datasets and ensures reliable insights. This enables you to perform efficient, effective and targeted queries of data slices across our pre-structured database.

QIAGEN OmicSoft API delivers access to highly structured data and metadata in OmicSoft Lands (Figures 1 and 2). API access allows you to perform large and complex cross-database multi-omics queries without maintaining your own database. You can also explore the data through flat-file exports into your own database or through a GUI for 'omics visualization.

Figure 1. Full data delivery via flat files for data scientists. The flat file format is ideal for integration into internal databases, programmatic high-throughput integrative analysis, and machine learning applications. Benefits include a highly structured export of all data and metadata in QIAGEN OmicSoft Lands, including 'omics data tables and comparison results along with their metadata.

Figure 2. QIAGEN OmicSoft Land content available through programmatic access. Explore over 650,000 samples across hundreds of diseases, tissues and cell types. Find all datasets with matching criteria and download 'omics results from all matching samples.

QIAGEN OmicSoft API is ideal for interactive and programmatic data querying for integrative analysis and machine learning applications. You can use it to identify and download cell-level expression from all curated single-cell RNA-seq projects, for example for specific cell types (Figure 3), or to identify potential gene signatures in cell lines (Figure 4).

Figure 3. Violin plot of single-cell RNA-seq data from QIAGEN OmicSoft Single Cell Lands, retrieved using the OmicSoft Land APIs. Gene-level RPM-normalized data for CD8A were retrieved for all curated cell types matching “T cell” from any breast or lung tissue with annotated disease “cancer” or “carcinoma”.
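The Figure 3 query amounts to a metadata filter followed by a grouping step. The sketch below reproduces that logic in plain Python under stated assumptions. It is not the OmicSoft Land API. The `Cell` record, its field names, the helper functions and the matching rules are all hypothetical stand-ins for curated annotations. The filter matches "T cell" as a whole phrase rather than as a bare substring, so mast cells are not accidentally included.

```python
import re
from dataclasses import dataclass


@dataclass
class Cell:
    # Hypothetical per-cell record: curated annotations plus one gene's value.
    cell_type: str
    tissue: str
    disease: str
    cd8a_rpm: float  # RPM-normalized CD8A expression for this cell


# Whole-phrase match, so "mast cell" is not mistaken for a T cell.
T_CELL = re.compile(r"\bt cell\b", re.IGNORECASE)
TISSUES = {"breast", "lung"}
DISEASE_TERMS = ("cancer", "carcinoma")


def matches(cell):
    """Apply the Figure 3 criteria to one cell's annotations."""
    disease = cell.disease.lower()
    return (
        T_CELL.search(cell.cell_type) is not None
        and cell.tissue.lower() in TISSUES
        and any(term in disease for term in DISEASE_TERMS)
    )


def cd8a_by_cell_type(cells):
    """Group CD8A values of matching cells by cell type (one violin per group)."""
    groups = {}
    for cell in cells:
        if matches(cell):
            groups.setdefault(cell.cell_type, []).append(cell.cd8a_rpm)
    return groups
```

In the post, the selection was done through the OmicSoft Land APIs. The sketch only illustrates the filtering logic on data you already hold in memory.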

Figure 4. Example of a simple query you can run with the OmicSoft Lands API: finding the top genes co-expressed with SERPINB7 in the Cancer Cell Line Encyclopedia (CCLE).
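A co-expression query like the one in Figure 4 can be approximated in three steps. Correlate one gene's expression with every other gene across the same set of cell lines, then sort by correlation and keep the strongest positive values. The sketch below does this under two assumptions. First, the input is a hypothetical in-memory mapping from gene symbol to expression values aligned by cell line. Second, it uses Pearson correlation. The post does not say which statistic the OmicSoft query actually uses, so treat this as an illustration rather than a reproduction. The function names are likewise hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and of squared deviations from the mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return None  # correlation is undefined for constant expression
    return cross / sqrt(ss_x * ss_y)


def top_coexpressed(expr, query, k=10):
    """Rank genes by correlation with `query` across the same cell lines.

    `expr` is a hypothetical {gene: [values aligned by cell line]} mapping.
    """
    target = expr[query]
    scores = []
    for gene, values in expr.items():
        if gene == query:
            continue
        r = pearson(target, values)
        if r is not None:
            scores.append((gene, r))
    # Strongest positive correlations first.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]
```

At the scale of a full cell line collection, a vectorized library such as NumPy would be the natural choice. Plain Python keeps the sketch dependency-free.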

Get in touch

Learn about the integrated 'omics data collections in our QIAGEN OmicSoft DiseaseLand, OncoLand and Single Cell Land, which now offer flexible API access to enable your large-scale, complex queries. Have questions or research projects you'd like to discuss? Request a consultation so we can help you find the right type of access and 'omics data collection for your research goals. Get in touch with us at bioinformaticssales@qiagen.com to discuss your specific research requirements.
