Release date: 2024-02-01
With our upcoming update to OmicSoft Lands, we will retire legacy versions of databases from hosted Lands servers. Customers with dedicated installations should review the list of available databases and remove any legacy versions that are no longer in use. In most cases, the recommended version is Human Genome version 38 with gene model GenCode.V33 (“B38_GC33 Lands”). Removing legacy versions will reduce confusion for users who are unsure which database to search.
The expert Field Application Scientists of QIAGEN® routinely hold online trainings for new and advanced users of OmicSoft Lands data, showcasing the use of these resources to answer scientific questions. Upcoming webinars and recordings of previous webinars are listed here: https://digitalinsights.qiagen.com/webinars-and-events/
OmicSoft Suite will soon roll out a new version that will significantly reduce the loading time and memory footprint of Single Cell Lands. Updates include new visualizations and features that cannot be accessed in earlier versions. Contact ts-bioinformatics@qiagen.com to learn more.
Figure 1. Samples available from the Human Protein Atlas Land with RNA-seq or TMA protein data, grouped by TissueCategory and colored by tissue.
The HPA project aims to map all human proteins in cells, tissues, and organs. A key focus of this effort is the use of both transcriptomics and tissue microarray (TMA) proteomics.
HPA data are now available for Human Genome version 38/OmicSoftGencode.V33, including 134 samples that were not available in HPA_B37. This release includes 226 samples with complete transcriptomic profiles (transcript, gene, exon junction, exon, mutation, and fusion) processed using the OmicSoft RNA-seq pipeline, and 280 samples with protein profiles from TMA assays.
Metadata definitions can be found as tooltips in OmicSoft Studio.
New metadata fields: SampleMaterial, TissueDescription, AgeSummary, CellStructure, LibraryKit, Molecule, Description, SampleProcessingLocation, SampleAlias, SampleDescription, and RIN
Renamed metadata fields:
Old Field Name | New Field Name
Land Sample Type | OncoSampleType
Reference | CellLineReference
Sample Type | SampleType
ErxID | SrxID
EnaSample | SampleAccession
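For scripts that still use the HPA_B37 field names, the renames in the table above can be applied as a simple lookup. The following is a minimal sketch, assuming sample metadata has been exported and loaded as one dictionary per sample; the function name `rename_fields` and the example record are hypothetical and not part of OmicSoft software.

```python
# Old -> new metadata field names, copied from the table above.
RENAMED_FIELDS = {
    "Land Sample Type": "OncoSampleType",
    "Reference": "CellLineReference",
    "Sample Type": "SampleType",
    "ErxID": "SrxID",
    "EnaSample": "SampleAccession",
}


def rename_fields(record: dict) -> dict:
    """Return a copy of one metadata record with old field names replaced.

    Fields that were not renamed are passed through unchanged.
    """
    return {RENAMED_FIELDS.get(key, key): value for key, value in record.items()}


# Hypothetical record: the values are placeholders, not real HPA metadata.
old_record = {"Sample Type": "tissue", "ErxID": "example-id", "Tissue": "lung"}
new_record = rename_fields(old_record)
print(new_record)
```

Keeping the mapping in one dictionary means a downstream script only needs a single change if further fields are renamed in a later release.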
Removed metadata fields (from HPA_B37): AssayName, EnaRun, ExtractName, FastqUri, MaterialType, NominalLength, Orientation, ProtocolRef, ReadIndex1BaseCoord, ScanName, SpotLength, TechnologyType, TermSourceRef, Land Tissue, Tumor Or Normal, Read Length, Coverage_GeneWith10RPKM, Coverage_GeneWith10RPKMRate, Coverage_GeneWith1RPKM, Coverage_GeneWith1RPKMRate, RNASeq Mapped Read Count, RNASeq Mapping Rate [Percent], CellOrigin, and Selected
Note: samples hpa_cerebellum_GLUC_cells_cytoplasm_membrane and hpa_cerebellum_GLUC_cells_nucleus refer to an unknown “GLUC” cell type, which could not be further identified. These samples are curated as “CellType: other cell” and “CellDescription: GLUC cell”.
ClinicalProteomicTumor integrates studies focused on cancer proteomics from CPTAC and other repositories, including additional data such as transcriptomics and somatic variation.
This release adds 408 samples and 374 comparisons from 2 projects, focusing on lung adenocarcinoma (paired tumor-normal) and colon adenocarcinoma. These new studies include mass spectrometry (MS) proteomics, RNA-seq, and somatic mutation profiling data.
Figure 2. Correlation scatter plot of statistical comparisons of primary lung tumor vs paired normal lung samples in PDC000153, assessing RNA (X-axis) vs MS protein (Y-axis) fold change.
OncoHuman is the unified repository of oncology transcriptomics projects from thousands of studies requested by OmicSoft users.
Figure 3. New samples in OncoHuman, grouped by DiseaseState and colored by TissueCategory.
This release adds 3285 samples and 819 comparisons from 88 datasets on the following topics:
No datasets were removed from this release.
As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE159472, GSE57611, GSE34171, GSE4475, GSE1299, GSE45419, GSE8096, GSE28448, GSE1133, GSE1822, GSE1825, GSE2067, and GSE229.
Figure 4. New MS proteomics comparisons enable rapid biomarker discovery. (A) Hundreds of new statistical comparisons of MS proteomics data are available for BRCA, COAD, OV, and READ cancer types. (B) Differential protein expression analysis between HER2-positive and HER2-negative breast cancer proteomics samples reveals upregulation of ERBB2 (HER2), GRB7 (adjacent to ERBB2 in the genome), and MIEN1. (C) MIEN1 protein is significantly upregulated in many statistical comparisons, including in breast cancers. (D) Sample-level protein expression of MIEN1 is higher on average in HER2-positive breast cancer samples than in HER2-negative samples.
The Cancer Genome Atlas (TCGA) is a landmark pan-cancer genomics program profiling thousands of tumors across 33 cancer types. QIAGEN OmicSoft enhanced the utility of these data with major curation and unification efforts. Learn more about the curation and available metadata dictionary here: https://digitalinsights.qiagen.com/news/blog/discovery/a-better-way-to-explore-tcga-data/
With this release, TCGA metadata are updated to the current formatting standards used in other OncoLands (project-level metadata, spaces removed from column names, reformatted AgeAtDiagnosis). New proteomics data and comparisons were added, and methylation array data were uplifted from Human Genome version 37.
For some samples, the TCGA-reported immunohistochemistry (IHC) statuses for ER, HER2, and PR are not consistent with the reported staining intensity. Interpret these samples with caution.
SubjectID | ER[Status][IHC] | ER[Staining][%] | ER[Status][IHC][Score]
TCGA-LL-A73Y | Negative | NA | 3
TCGA-AC-A3W6 | Positive | 90–99 | 1
SubjectID | HER2[Status][IHC] | HER2[Staining][%] | HER2[Status][IHC][Score]
TCGA-AO-A03M | Negative | 30–39 | 2
TCGA-AC-A3W6 | Negative | NA | 3
SubjectID | PR[Status][IHC] | PR[Staining][%] | PR[Status][IHC][Score]
TCGA-BH-A0DQ | Positive | 70–79 | 1
TCGA-S3-AA12 | Negative | NA | 2
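Users who want to screen their own sample selections for similar discordance could start from a check like the one below. This is only an illustrative sketch: the release notes do not state the criterion used to identify inconsistent samples, so the thresholds in the hypothetical `is_discordant` function (a Negative status with a score of 2 or higher, or a Positive status with a score of 1 or lower) are assumptions chosen so that the six example rows above are flagged. They are not clinical scoring guidance.

```python
# (SubjectID, marker, IHC status, IHC score) rows copied from the tables above.
ROWS = [
    ("TCGA-LL-A73Y", "ER", "Negative", 3),
    ("TCGA-AC-A3W6", "ER", "Positive", 1),
    ("TCGA-AO-A03M", "HER2", "Negative", 2),
    ("TCGA-AC-A3W6", "HER2", "Negative", 3),
    ("TCGA-BH-A0DQ", "PR", "Positive", 1),
    ("TCGA-S3-AA12", "PR", "Negative", 2),
]


def is_discordant(status: str, score: int) -> bool:
    """Flag a status/score pair that looks inconsistent.

    ASSUMPTION: the cutoffs below are illustrative only and are not
    taken from TCGA or OmicSoft documentation.
    """
    if status == "Negative":
        return score >= 2
    if status == "Positive":
        return score <= 1
    return False  # other or missing statuses are not evaluated


flagged = [
    (subject, marker)
    for subject, marker, status, score in ROWS
    if is_discordant(status, score)
]
print(flagged)
```

Any real screen should use cutoffs agreed with a pathologist or taken from the relevant scoring guidelines, since scoring conventions differ between ER, PR, and HER2.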
HumanDisease is the unified repository of non-oncology disease omics projects from thousands of studies requested by OmicSoft users.
Figure 5. New samples in HumanDisease (excluding control samples), grouped by DiseaseState and colored by TissueCategory.
This release adds 2962 samples and 985 comparisons from 63 datasets, including studies on:
The following samples were temporarily removed from GSE7307 GPL570 because the treatment term “SDF” is underspecified: GSM175963, GSM175964, GSM175965, GSM175971, GSM175972, and GSM175997.
No datasets were removed in this release.
As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE102956, GSE117588, GSE13837, GSE56495, GSE54363, GSE46451, GSE74235, GSE95588, GSE49454, GSE65391, GSE97165, GSE175913, GSE179448, GSE6269, GSE41332, GSE102875, and GSE121137.
MouseDisease is the unified repository comprising thousands of studies exploring mouse models of human disease, requested by OmicSoft users.
Figure 6. New samples in MouseDisease, grouped by DiseaseState and colored by TissueCategory.
This release adds 252 samples and 164 comparisons from 20 datasets, focusing on neurodegenerative disorders, autoimmunity and liver diseases: GSE111988, GSE113815, GSE120937, GSE122367, GSE124426, GSE125708, GSE128724, GSE135167, GSE138992, GSE141663, GSE148784, GSE157766, GSE158259, GSE161160, GSE163568, GSE174214, GSE179441, GSE180808, GSE185327, and GSE191131.
No datasets were removed for this release.
As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE52403, GSE106463, GPL17021, GSE113951, GPL19057, GSE109125, GSE104817, GSE58813, and GSE30431.
RatDisease is the unified repository comprising thousands of studies exploring rat models of human disease, requested by OmicSoft users.
Figure 7. New samples in RatDisease, grouped by DiseaseState and colored by TissueCategory.
This release adds 1189 samples and 602 comparisons from 18 datasets, focusing on compound profiling and toxicity: GSE172109, GSE124431, GSE10015, GSE10015, GSE113000, GSE132815, GSE34250, GSE47875, GSE49473, GSE53082, GSE63902, and GSE74676.
No datasets were removed for this release.
As part of our standard review process, comparisons for the following already-landed projects were revised and can be found with an updated “OSModifiedDate”: GSE194074.
This release adds 215 samples, bringing the total to 1353 samples from 315 unique cell lines.
Figure 8. Distribution of samples in ATCC_Human_B38_GC33, grouped by DiseaseCategory and colored by TissueCategory.
This release adds 84 new samples, bringing the total to 177 samples from 47 unique cell lines.
Figure 9. Distribution of samples in ATCC_Mouse_B38, grouped by DiseaseState and colored by TissueCategory.
This release begins the addition of lung cell lines from the ATCC collection.
Figure 10. Distribution of lung samples in ATCC_Human_B38_GC33, grouped by DiseaseState and colored by CellLine.
Combine the ability to search for QIAGEN Signaling Pathways with ATCC Cell Line Lands to identify cell lines exhibiting similar expression profiles within a pathway of interest.
Figure 11. Subset of a heatmap of the QIAGEN Sonic Hedgehog Signaling Pathway gene expression from respiratory samples from ATCC_Human_B38_GC33. Hierarchical clustering enables the grouping of samples with similar gene expression profiles.