
Latest improvements for QIAGEN OmicSoft Lands


QIAGEN OmicSoft Lands

Release date: 2024-02-01

OmicSoft Lands Release 2023R4

Highlights

  • Several updates for proteomics data, including data from The Cancer Genome Atlas (TCGA), the Human Protein Atlas (HPA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC)
  • Hundreds of new datasets with over 1000 new statistical comparisons revealing differentially expressed genes and proteins
  • 215 new samples added to ATCC Cell Line Land, bringing the total number of samples to 1353

 

General updates

Land Database Version Cleanup

With our upcoming update to OmicSoft Lands, we will retire legacy database versions from hosted Lands servers. Customers with dedicated installations should review the list of available databases and remove any unused legacy versions. In most cases, the recommended version is Human Genome version 38 with gene model GenCode.V33 (“B38_GC33 Lands”). Removing legacy versions reduces confusion for users who are unsure which database to search for relevant information.

Attend live and on-demand webinars

QIAGEN®'s expert Field Application Scientists routinely hold online training sessions for new and advanced users of OmicSoft Lands data, showing how these resources can be used to answer scientific questions. Upcoming webinars and recordings of previous webinars are available here: https://digitalinsights.qiagen.com/webinars-and-events/

Update to the latest OmicSoft Suite version to access the latest features

OmicSoft Suite will soon roll out a new version that will significantly reduce the loading time and memory footprint of Single Cell Lands. Updates include new visualizations and features that cannot be accessed in earlier versions. Contact ts-bioinformatics@qiagen.com to learn more.

 

BodyMap updates

Human Protein Atlas

Figure 1. Samples available from the Human Protein Atlas Land with RNA-seq or TMA protein data, grouped by TissueCategory and colored by tissue.

The HPA project aims to map all human proteins in cells, tissues, and organs. A key focus of this effort is the use of both transcriptomics and tissue microarray (TMA) proteomics.

HPA data are now available for Human Genome version 38/OmicSoftGencode.V33, including 134 samples that were not available in HPA_B37. This release includes 226 samples with complete transcriptomic profiles (transcript, gene, exon junction, exon, mutation, and fusion) processed with the OmicSoft RNA-seq pipeline, and 280 samples with protein profiles from TMA assays.

 

Metadata curation improvements

Metadata definitions can be found as tooltips in OmicSoft Studio.

New metadata fields: SampleMaterial, TissueDescription, AgeSummary, CellStructure, LibraryKit, Molecule, Description, SampleProcessingLocation, SampleAlias, SampleDescription, and RIN

 

Renamed metadata fields:

Old Field Name      New Field Name
Land Sample Type    OncoSampleType
Reference           CellLineReference
Sample Type         SampleType
ErxID               SrxID
EnaSample           SampleAccession
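Saved scripts or analyses that reference the old HPA_B37 column names will need to be updated. A minimal Python sketch, assuming the sample metadata has been exported to a table whose column headers you can read as a list; the `rename_columns` helper and the example headers are hypothetical and not part of OmicSoft Studio:

```python
# Old HPA_B37 field names mapped to their new names, as listed in the table above.
HPA_FIELD_RENAMES = {
    "Land Sample Type": "OncoSampleType",
    "Reference": "CellLineReference",
    "Sample Type": "SampleType",
    "ErxID": "SrxID",
    "EnaSample": "SampleAccession",
}


def rename_columns(headers):
    """Return a new header list with old field names replaced by their new names.

    Hypothetical helper: headers that are not in the mapping pass through unchanged.
    """
    return [HPA_FIELD_RENAMES.get(name, name) for name in headers]


if __name__ == "__main__":
    # Illustrative headers only; "SampleID" and "Tissue" are placeholders.
    old_headers = ["SampleID", "Sample Type", "EnaSample", "Tissue"]
    print(rename_columns(old_headers))
    # → ['SampleID', 'SampleType', 'SampleAccession', 'Tissue']
```

Fields removed from HPA_B37 have no new name, so they are not in the mapping and simply keep their old header.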

Removed metadata fields (from HPA_B37): AssayName, EnaRun, ExtractName, FastqUri, MaterialType, NominalLength, Orientation, ProtocolRef, ReadIndex1BaseCoord, ScanName, SpotLength, TechnologyType, TermSourceRef, Land Tissue, Tumor Or Normal, Read Length, Coverage_GeneWith10RPKM, Coverage_GeneWith10RPKMRate, Coverage_GeneWith1RPKM, Coverage_GeneWith1RPKMRate, RNASeq Mapped Read Count, RNASeq Mapping Rate [Percent], CellOrigin, and Selected.

Note: samples hpa_cerebellum_GLUC_cells_cytoplasm_membrane and hpa_cerebellum_GLUC_cells_nucleus refer to an unknown “GLUC” cell type, which could not be further identified. These samples are curated as “CellType: other cell” and “CellDescription: GLUC cell”.

 

OncoLand updates

ClinicalProteomicTumor

ClinicalProteomicTumor integrates studies focused on cancer proteomics from CPTAC and other repositories, including additional data such as transcriptomics and somatic variation.

This release adds 408 samples and 374 comparisons from 2 projects, focusing on lung adenocarcinoma (paired tumor-normal) and colon adenocarcinoma. These new studies include mass spectrometry (MS) proteomics, RNA-seq, and somatic mutation profiling data.

Figure 2. Correlation scatter plot of statistical comparisons of primary lung tumor vs paired normal lung samples in PDC000153, assessing RNA (X-axis) vs MS protein (Y-axis) fold change.
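The agreement that Figure 2 visualizes can be summarized with a single Pearson correlation between paired fold changes. A minimal sketch, assuming you have log2 fold changes for RNA and for MS protein keyed by gene symbol; the helper names, gene names, and values below are hypothetical placeholders, not data from PDC000153:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rna_protein_correlation(rna_lfc, protein_lfc):
    """Correlate RNA vs protein fold changes over genes present in both dicts.

    Returns (r, number_of_shared_genes).
    """
    shared = sorted(set(rna_lfc) & set(protein_lfc))
    xs = [rna_lfc[g] for g in shared]      # X-axis: RNA fold change
    ys = [protein_lfc[g] for g in shared]  # Y-axis: MS protein fold change
    return pearson_r(xs, ys), len(shared)


if __name__ == "__main__":
    # Hypothetical values; GENE_D has no protein measurement and is skipped.
    rna = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0, "GENE_D": 0.5}
    protein = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 6.0}
    r, n = rna_protein_correlation(rna, protein)
    print(round(r, 3), n)  # → 1.0 3
```

Restricting to genes measured on both platforms matters because MS proteomics typically quantifies fewer genes than RNA-seq.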

 

OncoHuman

OncoHuman is the unified repository of oncology transcriptomics projects from thousands of studies requested by OmicSoft users.

Figure 3. New samples in OncoHuman, grouped by DiseaseState and colored by TissueCategory.

 

This release adds 3285 samples and 819 comparisons from 88 datasets on the following topics:

  • Urinary cancer (bladder): GSE125547, GSE145260, GSE147938, GSE149582, GSE169455, GSE130455, GSE135527, GSE136401, GSE151505, GSE156461, GSE163899, GSE176178, GSE179440, GSE183777, GSE186609, GSE186610, GSE198607, GSE199471, GSE77952, and GSE86411
  • Urinary cancer (kidney): GSE133446, GSE146354, GSE150404, GSE153965, GSE180925, GSE98334, GSE133460, GSE138869, GSE151419, GSE151423, GSE165728, GSE171358, GSE180820, and GSE197047
  • Colorectal cancer: GSE97023, GSE86557, GSE86559, GSE86562, GSE86563, and GSE86564
  • B-cell acute lymphoblastic leukemia (B-ALL): GSE236141, GSE236138, GSE236138, GSE212312, GSE212312, GSE212209, GSE207057, and GSE206258
  • T cell lymphoma (TCL): GSE126768, GSE50803, and GSE63122
  • Cutaneous T cell lymphoma (CTCL): GSE157442, GSE162137, and GSE119345
  • Diffuse large B cell lymphoma (DLBCL): GSE215900, GSE212746, GSE171763, GSE207388, and GSE182359
  • Non-small cell lung cancer (NSCLC): GSE118370, GSE144945, GSE174330, GSE1987, and GSE211044
  • Small cell lung cancer (SCLC): GSE126353, GSE142024, GSE144457, GSE151000, GSE151904, GSE155923, GSE159801, GSE205880, GSE210101, GSE210104, and GSE210110
  • Lung adenocarcinoma (LUAD): GSE75037 and GSE94601
  • Neuroblastoma: GSE148263 and GSE89413
  • CRISPR knockout: GSE144972, GSE144972, GSE160393, GSE174256, GSE175787, GSE182487, GSE183592, GSE185318, GSE189552, GSE189553, GSE195675, GSE199538, and GSE210870
  • Drug toxicity: GSE102006

 

Removed/reprocessed datasets or comparisons

No datasets were removed from this release.

As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE159472, GSE57611, GSE34171, GSE34171, GSE34171, GSE4475, GSE1299, GSE1299, GSE1299, GSE45419, GSE8096, GSE28448, GSE1133, GSE1822, GSE1825, GSE2067, and GSE229.

 

The Cancer Genome Atlas 

Figure 4. New MS proteomics comparisons enable rapid biomarker discovery. (A) Hundreds of new statistical comparisons of MS proteomics data are available for BRCA, COAD, OV, and READ cancer types. (B) Differential protein expression analysis between HER2-positive and HER2-negative breast cancer proteomics samples reveals upregulation of ERBB2 (HER2), GRB7 (adjacent to ERBB2 in the genome), and MIEN1. (C) MIEN1 protein is significantly upregulated in many statistical comparisons, including in breast cancers. (D) Sample-level protein expression of MIEN1 is higher on average in HER2-positive breast cancer samples than in HER2-negative samples.

The Cancer Genome Atlas (TCGA) is a landmark pan-cancer genomics program profiling thousands of tumors across 33 cancer types. QIAGEN OmicSoft enhanced the utility of these data with major curation and unification efforts. Learn more about the curation and available metadata dictionary here: https://digitalinsights.qiagen.com/news/blog/discovery/a-better-way-to-explore-tcga-data/

With this release, TCGA metadata have been updated to the current formatting standards used in other OncoLands (project-level metadata, spaces removed from column names, reformatted AgeAtDiagnosis). New proteomics data and comparisons were added, and methylation array data were uplifted from Human Genome version 37.

 

New data:

  • MS proteomics data added for COAD and READ samples
  • Statistical comparisons for BRCA, OV, COAD, and READ based on proteomics data
  • Methylation array data uplifted from Human.B37 array data

 

New curated metadata fields:

  • PairingStatus and PairingType fields added
  • MutationStatus[EGFR] added for relevant samples

 

Additional metadata changes:

  • SmokingStatus and surgery terms are now aligned to the OmicSoft controlled vocabulary (CV)
  • Several fields formatted to the current OmicSoft standard, including MutationStatus[RET], PR[Staining][%][MetastaticBreastCarcinoma], PR[Staining][%], HER2[Staining][%][MetastaticBreastCarcinoma], HER2[Staining][%], ER[Staining][%][MetastaticBreastCarcinoma], ER[Staining][%], MIB1Positive[%], PRAD_Cell2015_TumorCellularity[%], and NecrosisContent[Total][%]

 

Known issues:

For some samples, the TCGA-reported immunohistochemistry (IHC) statuses for ER, HER2, and PR are not consistent with the reported staining intensity. Interpret these samples with caution.

SubjectID      ER[Status][IHC]   ER[Staining][%]   ER[Status][IHC][Score]
TCGA-LL-A73Y   Negative          NA                3
TCGA-AC-A3W6   Positive          90–99             1

 

SubjectID      HER2[Status][IHC]   HER2[Staining][%]   HER2[Status][IHC][Score]
TCGA-AO-A03M   Negative            30–39               2
TCGA-AC-A3W6   Negative            NA                  3

 

SubjectID      PR[Status][IHC]   PR[Staining][%]   PR[Status][IHC][Score]
TCGA-BH-A0DQ   Positive          70–79             1
TCGA-S3-AA12   Negative          NA                2


DiseaseLand updates

HumanDisease

HumanDisease is the unified repository of non-oncology disease omics projects from thousands of studies requested by OmicSoft users.

Figure 5. New samples in HumanDisease (excluding control samples), grouped by DiseaseState and colored by TissueCategory.

 

This release adds 2962 samples and 985 comparisons from 63 datasets, including studies on:

  • Gene knockout/CRISPR/siRNA experiments: GSE102004, GSE103309, GSE104690, GSE110268, GSE114918, GSE115311, GSE115357, GSE115973, GSE116934, GSE120396, GSE120397, GSE122334, GSE124636, GSE128177, GSE130391, GSE134386, and GSE134662
  • Graft-versus-host disease (GVHD) and transplant: GSE193309, GSE157959, GSE158834, GSE159853, GSE163568, GSE164425, GSE165025, GSE166024, GSE174020, GSE181741, GSE181804, GSE181813, GSE181981, GSE182173, GSE182223, GSE182440, GSE182610, GSE182612, GSE182649, and GSE182678
  • Parkinson's disease (PD): GSE135998, GSE144127, GSE150686, GSE150735, GSE151190, GSE152939, GSE153657, GSE154577, and GSE157538
  • Alcohol dependence: GSE193977, GSE208347, GSE218048, GSE222889, GSE223390, GSE225159, GSE74324, and GSE74991

 

Removed/reprocessed datasets or comparisons

The following samples were temporarily removed from GSE7307 GPL570 because the treatment term “SDF” is underspecified: GSM175963, GSM175964, GSM175965, GSM175971, GSM175972, and GSM175997.

No datasets were removed in this release.

As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE102956, GSE102956, GSE117588, GSE13837, GSE56495, GSE54363, GSE13837, GSE46451, GSE74235, GSE95588, GSE49454, GSE65391, GSE97165, GSE175913, GSE179448, GSE6269, GSE6269, GSE41332, GSE102875, and GSE121137.

 

MouseDisease

MouseDisease is the unified repository comprising thousands of studies exploring mouse models of human disease, requested by OmicSoft users.

Figure 6. New samples in MouseDisease, grouped by DiseaseState and colored by TissueCategory.

This release adds 252 samples and 164 comparisons from 20 datasets, focusing on neurodegenerative disorders, autoimmunity, and liver diseases: GSE111988, GSE113815, GSE120937, GSE122367, GSE124426, GSE125708, GSE128724, GSE135167, GSE138992, GSE141663, GSE148784, GSE157766, GSE158259, GSE161160, GSE163568, GSE174214, GSE179441, GSE180808, GSE185327, and GSE191131.

 

Removed/reprocessed datasets or comparisons

No datasets were removed for this release.

As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE52403, GSE106463, GPL17021, GSE113951, GPL19057, GSE109125, GSE104817, GSE58813, and GSE30431.

 

RatDisease 

RatDisease is the unified repository comprising thousands of studies exploring rat models of human disease, requested by OmicSoft users.

Figure 7. New samples in RatDisease, grouped by DiseaseState and colored by TissueCategory.

This release adds 1189 samples and 602 comparisons from 18 datasets, focusing on compound profiling and toxicity: GSE172109, GSE124431, GSE10015, GSE10015, GSE113000, GSE132815, GSE34250, GSE47875, GSE49473, GSE53082, GSE63902, and GSE74676.

 

Removed/reprocessed datasets or comparisons

No datasets were removed for this release.

As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE194074.

 

ATCC Land updates

ATCC Human

This release adds 215 samples, bringing the total to 1353 samples from 315 unique cell lines.

Figure 8. Distribution of samples in ATCC_Human_B38_GC33, grouped by DiseaseCategory and colored by TissueCategory.

 

ATCC Mouse

This release adds 84 new samples, bringing the total to 177 samples from 47 unique cell lines.

Figure 9. Distribution of samples in ATCC_Mouse_B38, grouped by DiseaseState and colored by TissueCategory.

 

Highlighted ATCC topics

This release begins the addition of lung cell lines from the ATCC collection.

Figure 10. Distribution of lung samples in ATCC_Human_B38_GC33, grouped by DiseaseState and colored by CellLine.

 

Combine QIAGEN Signaling Pathway search with ATCC Cell Line Lands to identify cell lines that exhibit similar expression profiles within a pathway of interest.

Figure 11. Subset of a heatmap of QIAGEN Sonic Hedgehog Signaling Pathway gene expression in respiratory samples from ATCC_Human_B38_GC33. Hierarchical clustering groups samples with similar gene expression profiles.
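To illustrate the general technique behind the clustering in Figure 11 (not the implementation OmicSoft Studio uses), here is a minimal sketch of average-linkage agglomerative clustering with a "1 minus Pearson correlation" distance. It assumes each sample's expression values for the pathway genes are given in the same gene order. It stops once a requested number of clusters remains instead of building a full dendrogram. The helper names, sample names, and values are hypothetical placeholders.

```python
import math
from itertools import combinations


def correlation_distance(a, b):
    """1 - Pearson correlation between two expression profiles (range 0 to 2)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(va * vb)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage.

    profiles: dict mapping sample name -> list of gene expression values.
    Repeatedly merges the two closest clusters until n_clusters remain.
    """
    names = list(profiles)
    # Precompute pairwise distances between individual samples.
    dist = {}
    for a, b in combinations(names, 2):
        d = correlation_distance(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster sample pairs.
            pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        # combinations() yields i < j, so deleting index j leaves i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical expression of four pathway genes in four samples.
    profiles = {
        "S1": [1, 2, 3, 4],
        "S2": [2, 3, 4, 5],
        "S3": [4, 3, 2, 1],
        "S4": [5, 4, 3, 1],
    }
    print(average_linkage(profiles, 2))  # → [['S1', 'S2'], ['S3', 'S4']]
```

A correlation distance groups samples by the shape of their expression profile across the pathway rather than by absolute expression level, which is why S1 and S2 cluster together despite differing by a constant offset.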