Picture this: Your team has excellent in silico data indicating a new compound your company is developing inhibits a particular growth factor. You're tasked with delivering a report summarizing the expression pattern of genes connected to this factor in different tissues and across diseases. Your mission is clear: Find experimental evidence for the transcriptional activity of this growth factor in the context of disease or treatment, and summarize the tissue specificity.
You begin by searching public 'omics data repositories to find possibly relevant datasets, but it's like searching for a needle in a haystack: There are missing metadata, unclear experimental conditions and inconsistent terms. The metadata are so unclear that you have to discard dataset after dataset.
You weed through dozens of public datasets one by one just to collect a few comparable analyses, spending several months on daily data retrieval, cleaning, sorting and categorizing labels for each sample. Finally, you've got a collection of datasets supporting a role for your gene in neurological disease. Yet when you scrutinize the data against the source publications, you see that many of the experiments were performed under entirely different conditions, and many are irrelevant.
Two steps forward and one step back
Months have passed and your report is still full of holes. With a nagging feeling you've let your stakeholders down, you have no choice but to ask for an extended deadline. Then, you go back to where you started and try to fill in the gaps. Frustrated and disappointed, you think: Did I work for years to earn my PhD only to spend most of my time searching for and cleaning data?
Feeling like a slave to 'omics data management
If you're a bioinformatician or a data scientist working in pharma, this scenario may sound familiar. You need 'omics data to help you generate high-quality novel hypotheses for your R&D colleagues to explore. Your organization must remain ahead of your competition, so they need you to develop hypotheses quickly and efficiently.
Flexible data access is one way you can achieve this, but its advantages cost time, money and resources. You must invest heavily to maintain your data infrastructure, and search carefully and consistently for new data to add, so you can make the large-scale queries your projects require. Worse, gaps and inconsistencies in dataset metadata often return misleading results that could negatively impact your research. Even data from valuable consortia fall out of date because of the effort required to ingest the latest updates and unify them with your schemas.
Goodbye 'omics data management, hello unique and reliable insights
What if you no longer had to retrieve, ingest and maintain databases containing public 'omics data riddled with inconsistencies? How might you reinvest your time in worthwhile tasks to accelerate R&D initiatives? What if you had flexible access to comprehensive, structured, highly granular databases of integrated, disease-relevant 'omics data collected from thousands of publications?
We bet you'd feel empowered to dig deeper into research questions. You'd have more time to focus on the science behind the data rather than locating data, scrutinizing its quality and cleaning up the various metadata inconsistencies. Instead of spending time ingesting and cleaning data, you'd be able to more quickly deliver reliable reports filled with unique and valuable insights your R&D colleagues can run with.
Introducing flexible API access to QIAGEN OmicSoft Land data
With API access to manually curated QIAGEN OmicSoft integrated 'omics data, you'll overcome public 'omics data hurdles to easily drive new discoveries and validation in drug development. Our curation process delivers consistent and extensive metadata across datasets and ensures reliable insights. This enables you to perform efficient, effective and targeted queries of data slices across our pre-structured database.
QIAGEN OmicSoft API delivers access to highly structured data and metadata in OmicSoft Lands (Figures 1 and 2). API access allows you to perform large and complex cross-database multi-omics queries without maintaining your own database. You can also explore the data through file exports to your own database or a GUI for 'omics visualization.
Figure 1. Full data delivery via flat files for data scientists. The flat file format is ideal for integration into internal databases, programmatic high-throughput integrative analysis, and machine learning applications. It provides a highly structured export of all data and metadata in QIAGEN OmicSoft Lands, including 'omics data tables and comparison results.
Figure 2. QIAGEN OmicSoft Land content available through programmatic access. Explore over 650,000 samples across hundreds of diseases, tissues and cell types. Find all datasets with matching criteria and download 'omics results from all matching samples.
QIAGEN OmicSoft API is ideal for interactive and programmatic data querying for integrative analysis and machine learning applications. You can use it to identify and download cell-level expression from all curated single-cell RNA-seq projects, including specific cell types (Figure 3) or to identify potential gene signatures in cell lines (Figure 4).
Figure 3. Violin plot of single-cell RNA-seq data from QIAGEN OmicSoft Single Cell Lands, retrieved using the OmicSoft Land APIs. Gene-level RPM-normalized data for CD8A were retrieved for all curated cell types matching “T cell” from any breast or lung tissue with annotated disease “cancer” or “carcinoma”.
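The selection behind a plot like Figure 3 can be illustrated with a minimal sketch. It does not call the OmicSoft API; the sample records, field names and values below are hypothetical stand-ins for whatever a query returns, and the matching rules are one plausible reading of the caption.

```python
import re

# Hypothetical sample records. Field names and values are illustrative only
# and are NOT the actual OmicSoft Land schema.
samples = [
    {"sample_id": "S1", "cell_type": "CD8-positive T cell", "tissue": "lung",
     "disease": "lung adenocarcinoma", "CD8A": 5.2},
    {"sample_id": "S2", "cell_type": "mast cell", "tissue": "lung",
     "disease": "lung adenocarcinoma", "CD8A": 0.0},
    {"sample_id": "S3", "cell_type": "regulatory T cell", "tissue": "breast",
     "disease": "breast cancer", "CD8A": 1.1},
    {"sample_id": "S4", "cell_type": "T cell", "tissue": "blood",
     "disease": "normal", "CD8A": 4.0},
    {"sample_id": "S5", "cell_type": "T cell", "tissue": "lung",
     "disease": "normal control", "CD8A": 3.3},
    {"sample_id": "S6", "cell_type": "B cell", "tissue": "breast",
     "disease": "breast carcinoma", "CD8A": 0.2},
]

# Word boundaries keep "mast cell" from matching "t cell" by accident.
T_CELL = re.compile(r"\bt cell\b", re.IGNORECASE)
TISSUES = {"breast", "lung"}
DISEASE_TERMS = ("cancer", "carcinoma")


def matches(record):
    """Return True if the record meets all three criteria from the caption."""
    cell_ok = T_CELL.search(record["cell_type"]) is not None
    tissue_ok = record["tissue"].lower() in TISSUES
    # Substring match on purpose, so "adenocarcinoma" counts as "carcinoma".
    disease = record["disease"].lower()
    disease_ok = any(term in disease for term in DISEASE_TERMS)
    return cell_ok and tissue_ok and disease_ok


selected = [r for r in samples if matches(r)]

# Group CD8A values by tissue, the shape a violin plot would consume.
by_tissue = {}
for r in selected:
    by_tissue.setdefault(r["tissue"], []).append(r["CD8A"])
```

The point of the sketch is that consistent, curated labels are what make such a three-way filter reliable; with free-text metadata, each criterion would need its own cleanup step first.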
Figure 4. Example of a simple query you can run with the OmicSoft Lands API: finding the top genes co-expressed with SERPINB7 in the Cancer Cell Line Encyclopedia (CCLE).
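As a rough illustration of what a co-expression ranking like the one in Figure 4 involves, here is a minimal sketch. It assumes Pearson correlation as the metric (the article does not say which measure is used), and the expression values and the gene names GENE_A, GENE_B and GENE_C are made up for the example.

```python
import math

# Hypothetical expression matrix: gene -> values across the same five cell lines.
# The numbers and the GENE_* names are invented for illustration.
expression = {
    "SERPINB7": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_A": [2.1, 3.9, 6.2, 8.0, 9.8],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [3.0, 1.0, 4.0, 1.0, 5.0],
}


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant profile has no defined correlation; treat it as 0 here.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_coexpressed(matrix, target, n=10):
    """Rank all other genes by correlation with the target, highest first."""
    scores = [
        (gene, pearson(matrix[target], values))
        for gene, values in matrix.items()
        if gene != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


top = top_coexpressed(expression, "SERPINB7", n=2)
```

In this toy data, GENE_A tracks SERPINB7 closely and ranks first, while GENE_B is perfectly anti-correlated and ranks last. On real data, a rank-based measure such as Spearman correlation may be a better fit for skewed expression values.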
Get in touch
Learn about the integrated 'omics data collections in our QIAGEN OmicSoft DiseaseLand, OncoLand and Single Cell Land, which now offer flexible API access to enable your large-scale, complex queries. Have questions or research projects you'd like to discuss? Request a consultation so we can help you find the right type of access and 'omics data collection for your research goals. Get in touch with us at bioinformaticssales@qiagen.com to discuss your specific research requirements.