Latest improvements for QIAGEN OmicSoft Lands
Release date: 2024-05-15
OmicSoft Lands Release 2024R1
Highlights
- New breast cancer proteomics studies added to ClinicalProteomicTumor
- CCLE Land updated with new variant information and metadata
- Over 8000 new samples added to OncoLand
- Over 5000 samples added to DiseaseLand
OncoLand updates
ClinicalProteomicTumor
ClinicalProteomicTumor integrates studies focused on cancer proteomics from CPTAC and other repositories, including additional data such as transcriptomics and somatic variation.
This release adds 155 samples and 96 comparisons from PDC000120, focusing on multiple subtypes of breast cancer. These new studies include MS proteomics, RNA-seq, miRNA-seq, and somatic mutation profiling data.
Figure 1. New Samples in ClinicalProteomicTumor from PDC000120, grouped by GeneticSubtype and colored by OncoSampleType.
With this new dataset, as with other datasets in ClinicalProteomicTumor, you can mine the collection of pre-computed comparisons to reveal differentially regulated genes and proteins that can be evaluated as candidate targets or biomarkers, and then confirm the findings at the sample level.
Figure 2. Differential expression of genes between triple-receptor negative breast cancer (TNBC) and non-TNBC samples in PDC000120 at the protein and gene levels. (A) Comparison of differential expression at the RNA-seq and protein levels reveals multiple candidate markers of TNBC. (B) Sample-level expression of PPP1R14C at the RNA and protein levels confirms increased levels in TNBC samples.
OncoHuman
OncoHuman is the unified repository of oncology transcriptomics projects from thousands of studies requested by OmicSoft users.
Figure 3. New samples in OncoHuman, grouped by DiseaseState and colored by OncoSampleType.
This release adds 7066 samples and 1187 comparisons from 81 datasets on the following topics:
- Colorectal cancer: GSE100179, GSE113513, GSE131353, GSE133057, GSE14095, GSE140973, GSE161158, GSE164191, GSE193814, GSE200129, GSE216455, GSE37175, GSE37178, GSE64857, GSE71187, GSE73255, GSE75315, GSE81653, and GSE97689
- Stomach cancer: GSE115637, GSE116167, GSE118916, GSE125177, GSE128459, GSE130823, GSE160116, GSE96667, GSE96668, and GSE98708
- Pancreas cancer and esophagus cancer: GSE157096, GSE161533, and GSE221250
- Cutaneous T-cell lymphoma (CTCL): GSE180574, GSE181117, and GSE181118
- Other datasets: GSE107170, GSE117970, GSE117970, GSE123285, GSE126464, GSE131592, GSE132707, GSE132966, GSE132966, GSE140186, GSE142720, GSE145148, GSE147745, GSE151423, GSE151825, GSE162669, GSE16757, GSE172153, GSE173771, GSE178998, GSE179443, GSE185824, GSE19977, GSE200146, GSE20017, GSE204862, GSE210274, GSE212248, GSE214846, GSE222334, GSE223655, GSE226448, GSE230453, GSE39791, GSE43362, GSE45267, GSE45434, GSE45435, GSE46581, GSE51697, GSE62743, GSE64041, GSE78806, GSE80774, GSE80774, and GSE89377
Removed/reprocessed datasets or comparisons
SRP017465, ERP003613 GPL11154, E-MTAB-2836 GPL16791, and GSE5057 GPL96 were removed due to redundancy with other lands (HumanDisease and HPA).
As part of our standard review process, comparisons for the following already landed projects were revised and can be found with an updated “OSModifiedDate”: E-MTAB-3610, E-MTAB-62, E-MTAB-783, E-MTAB-8412, GSE100025, GSE10021, GSE100705, GSE101833, GSE103340, GSE104922, GSE105402, GSE108088, GSE108286, GSE108345, GSE10843, GSE112282, GSE112369, GSE1133, GSE113970, GSE114012, GSE114564, GSE115544, GSE116305, GSE116437, GSE116438, GSE116439, GSE116440, GSE116441, GSE116442, GSE116443, GSE116444, GSE116445, GSE116446, GSE116447, GSE116448, GSE116449, GSE116450, GSE116451, GSE118171, GSE126109, GSE129696, GSE1323, GSE134147, GSE146361, GSE146687, GSE1474, GSE147971, GSE155343, GSE165914, GSE166716, GSE170999, GSE175787, GSE17714, GSE180440, GSE18088, GSE183202, GSE183777, GSE184398, GSE19114, GSE19188, GSE195984, GSE19860, GSE20124, GSE202434, GSE20462, GSE209746, GSE22821, GSE22984, GSE27157, GSE28567, GSE28645, GSE28709, GSE29288, GSE30543, GSE32036, GSE32323, GSE32474, GSE32989, GSE35159, GSE35896, GSE36552, GSE41035, GSE41445, GSE42937, GSE4342, GSE45052, GSE47992, GSE48213, GSE48276, GSE48433, GSE51447, GSE52219, GSE52329, GSE55624, GSE57083, GSE58326, GSE62080, GSE66514, GSE69795, GSE70691, GSE73318, GSE73360, GSE73526, GSE76402, GSE80606, GSE81089, GSE81980, GSE83129, GSE85465, GSE8596, GSE87419, GSE89127, GSE9031, GSE90592, GSE90681, GSE94304, GSE94669, GSE95499, GSE9677, GSE97023, GSE98383, PRJEB25780, and PRJNA816986.
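Because revised projects carry an updated “OSModifiedDate”, they can be picked out of a metadata export by date. The following is a minimal sketch only: the row layout, the `ProjectName` key, the placeholder project names, the cutoff date, and the assumption that OSModifiedDate is stored as an ISO `YYYY-MM-DD` string are all hypothetical and should be adapted to your own export.

```python
from datetime import date

# Hypothetical rows from a project metadata export. The real export layout
# and the OSModifiedDate format (assumed here to be ISO YYYY-MM-DD) may differ.
projects = [
    {"ProjectName": "EXAMPLE_1", "OSModifiedDate": "2024-04-30"},
    {"ProjectName": "EXAMPLE_2", "OSModifiedDate": "2023-11-02"},
]


def revised_since(rows, cutoff):
    """Return project names whose OSModifiedDate is on or after the cutoff."""
    return [
        row["ProjectName"]
        for row in rows
        if date.fromisoformat(row["OSModifiedDate"]) >= cutoff
    ]


# Assumed cutoff for illustration, not an official release date.
cutoff_date = date(2024, 1, 1)
print(revised_since(projects, cutoff_date))
```

The same filter applies to the revised-project lists in the other Lands described in these notes.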
OncoMouse
Figure 4. New samples in OncoMouse, grouped by DiseaseState and colored by TissueCategory.
This release adds 1135 samples and 470 comparisons from 20 datasets, including GSE112585, GSE122774, GSE143253, GSE145573, GSE149175, GSE149178, GSE168846, GSE173107, GSE184599, GSE202940, GSE203260, GSE205644, GSE218161, GSE235599, GSE237098, GSE242835, GSE25671, GSE85385, and GSE85507.
Removed/reprocessed datasets or comparisons
No datasets were removed for this release.
As part of our standard review process, metadata (and comparisons, where applicable) for the following already landed projects were revised and can be found with an updated “OSModifiedDate”: GSE102416, GSE103712, GSE106683, GSE112174, GSE112973, GSE126080, GSE135691, GSE135785, GSE26410, GSE30865, GSE42708, GSE43803, GSE56252, GSE65503, GSE67497, GSE68162, GSE69290, GSE69544, GSE69688, GSE71908, GSE83915, GSE89077, GSE89823, GSE94133, GSE97133, and GSE97452.
CCLE/DepMap
The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. OmicSoft's CCLE Land provides analysis and visualization of DNA copy number, mRNA expression, mutation data, and more, for 1879 cancer cell lines.
Figure 5. CCLE Land cell line distribution, grouped by DiseaseCategory and colored by TissueCategory.
With this release, new samples were added (based on the DepMap 2023Q4 release) and new pharmacological drug response profiling data were added to the metadata.
In addition, cell line descriptions for DiseaseState, TissueCategory, and OncoSampleType now follow the OmicSoft curation standard, making them consistent with cell lines in OncoHuman and other Lands.
New data
- A total of 48 new samples were added.
- CRISPR gene dependency experiments were updated to CHRONOS data.
- A total of 49 new DNA-seq somatic mutation samples were added.
- A total of 61 new CNV samples were added, and all CNV data were updated with the current data as inferred from WGS, WES, or SNP array data.
Key metadata changes
- Histology and DiseaseLocation[PrimarySite] were recurated entirely from literature.
- DiseaseState, Tissue, and OncoSampleType for each cell line were updated according to the current OmicSoft standards.
- Fields were renamed to be consistent with other OmicSoft Lands.
Old Field Name | New Field Name |
---|---|
(none; new field) | CatalogNumber |
(none; new field) | TumorType[DepMap] |
(none; new field) | TreatmentHistory |
Lineage[DepMap] | OncoTreeLineage |
DiseaseState[Cellosaurus] | OncoTreeDisease |
DiseaseState[Cellosaurus][NCItCode] | OncoTreeCode |
LineageSubtype[DepMap];DiseaseSubtype | OncoTreeDiseaseSubtype |
LineageMolecularSubtype[DepMap] | GeneticSubtype[DepMap][Legacy] |
LineageSubSubtype[DepMap] | LineageSubSubtype[DepMap][Legacy] |
Age[years] | AgeAtSampling[years] |
AgeCategory | AgeCategoryAtSampling |
MicrosatelliteInstability[MSI][CCLE] | MicrosatelliteInstability[MSI][Status][CCLE] |
MicrosatelliteInstability[MSI][GDSC] | MicrosatelliteInstability[MSI][Status][GDSC] |
GeneDependency[XPR1][PMID:35437317] | GeneDependency[XPR1][PMID35437317] |
CCLEName | CellLineName[CCLE] |
CellLineSource | BiomaterialProvider |
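Scripts or saved queries that still use the old CCLE field names can be updated with a simple lookup built from the table above. This is a minimal sketch, assuming each metadata record is available as a Python dictionary keyed by field name; the `rename_fields` helper is hypothetical and not part of any OmicSoft API. The `LineageSubtype[DepMap];DiseaseSubtype` row is left out because it is unclear from the table whether it names one old field or two.

```python
# Old-to-new field names copied from the table above.
# The "LineageSubtype[DepMap];DiseaseSubtype" row is intentionally omitted
# (ambiguous: it may refer to one field or to two separate fields).
FIELD_RENAMES = {
    "Lineage[DepMap]": "OncoTreeLineage",
    "DiseaseState[Cellosaurus]": "OncoTreeDisease",
    "DiseaseState[Cellosaurus][NCItCode]": "OncoTreeCode",
    "LineageMolecularSubtype[DepMap]": "GeneticSubtype[DepMap][Legacy]",
    "LineageSubSubtype[DepMap]": "LineageSubSubtype[DepMap][Legacy]",
    "Age[years]": "AgeAtSampling[years]",
    "AgeCategory": "AgeCategoryAtSampling",
    "MicrosatelliteInstability[MSI][CCLE]": "MicrosatelliteInstability[MSI][Status][CCLE]",
    "MicrosatelliteInstability[MSI][GDSC]": "MicrosatelliteInstability[MSI][Status][GDSC]",
    "GeneDependency[XPR1][PMID:35437317]": "GeneDependency[XPR1][PMID35437317]",
    "CCLEName": "CellLineName[CCLE]",
    "CellLineSource": "BiomaterialProvider",
}


def rename_fields(record):
    """Return a copy of a metadata record with legacy field names replaced.

    Fields that are not listed in FIELD_RENAMES are kept unchanged.
    """
    return {FIELD_RENAMES.get(key, key): value for key, value in record.items()}


# Hypothetical record using the old names.
old_record = {"CCLEName": "EXAMPLE_LINE", "Age[years]": 50, "Histology": "example"}
print(rename_fields(old_record))
```

Returning a new dictionary rather than editing in place keeps the original record available for comparison while you migrate.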
Known issues
- Cancerous and normal/non-tumor cell lines originating from the same individual share the same SubjectID but have different DiseaseState values.
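If your analysis groups cell lines by SubjectID, it may help to list the subjects affected by this known issue first. The sketch below assumes sample metadata is available as a list of dictionaries with `SubjectID` and `DiseaseState` keys; the sample rows and the helper name are hypothetical.

```python
from collections import defaultdict


def subjects_with_mixed_disease_state(samples):
    """Map each SubjectID that has more than one DiseaseState to its sorted values."""
    states_by_subject = defaultdict(set)
    for sample in samples:
        states_by_subject[sample["SubjectID"]].add(sample["DiseaseState"])
    return {
        subject: sorted(states)
        for subject, states in states_by_subject.items()
        if len(states) > 1
    }


# Hypothetical metadata rows for illustration only.
samples = [
    {"SubjectID": "S1", "DiseaseState": "example carcinoma"},
    {"SubjectID": "S1", "DiseaseState": "normal control"},
    {"SubjectID": "S2", "DiseaseState": "example carcinoma"},
]
print(subjects_with_mixed_disease_state(samples))
```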
DiseaseLand updates
HumanDisease
HumanDisease is the unified repository of non-oncology disease omics projects from thousands of studies requested by OmicSoft users.
Figure 6. New samples in HumanDisease (excluding control samples), grouped by DiseaseState and colored by TissueCategory.
This release adds 4982 samples and 1108 comparisons from 72 datasets, including studies on:
- Schizophrenia: GSE202537, GSE235055, GSE226233, GSE206720, GSE184102, GSE182370, GSE155067, GSE132689, and GSE118941
- Depressive disorder: GSE178071, GSE178071, GSE193417, GSE99725, GSE135524, GSE128387, GSE85333, and GSE17440
- Eye disease, retinal degeneration, and retina profiling: GSE102485, GSE131877, GSE132828, GSE142333, GSE144785, GSE151610, GSE154684, GSE164884, GSE176513, GSE180705, GSE186751, GSE201219, GSE201219, GSE227975, GSE75990, GSE94437, and GSE98370
- CRISPR KO: GSE132704, GSE141171, GSE143371, GSE221916, GSE221916, GSE232818, GSE239367, and GSE246263
- Atopic dermatitis: GSE137430, GSE141570, GSE141571, GSE185764, GSE208405, GSE224783, and GSE237920
- Other topics: GSE99454, GSE198449, GSE155700, GSE162955, GSE24265, GSE209552, GSE137856, GSE19205, GSE206213, GSE48761, GSE52285, E-MTAB-12067, GSE124197, GSE141910, PXD038846, and GSE219278
Removed/reprocessed datasets or comparisons
The following datasets were removed from DiseaseLand, as they are duplicated in OncoHuman: GSE48953 GPL9115, GSE63816 GPL11154, GSE65185 GPL11154, GSE67501 GPL14951, GSE76340 GPL10558, GSE76340 GPL6947, and GSE79338 GPL11154.
As part of our standard review process, comparisons for the following already landed projects were revised and can be found with an updated “OSModifiedDate”: E-MTAB-1895, GSE100261, GSE101126, GSE102293, GSE102498, GSE103060, GSE109140, GSE11227, GSE117469, GSE118882, GSE120396, GSE12161, GSE12261, GSE124173, GSE124392, GSE12815, GSE129247, GSE130737, GSE13139, GSE137338, GSE13736, GSE143453, GSE144108, GSE144274, GSE144715, GSE145303, GSE145898, GSE147404, GSE150540, GSE151924, GSE154613, GSE155326, GSE159676, GSE164457, GSE16706, GSE17482, GSE177029, GSE17814, GSE194086, GSE205976, GSE206088, GSE206529, GSE20739, GSE216997, GSE21980, GSE22956, GSE23289, GSE24345, GSE26295, GSE27507, GSE28786, GSE29903, GSE30780, GSE32443, GSE34074, GSE37147, GSE37693, GSE39180, GSE40281, GSE41861, GSE43692, GSE44037, GSE45133, GSE45357, GSE4635, GSE50892, GSE51392, GSE53201, GSE54937, GSE57148, GSE57893, GSE60217, GSE6092, GSE60937, GSE62253, GSE6280, GSE62974, GSE64605, GSE65561, GSE65790, GSE66597, GSE66785, GSE67596, GSE71216, GSE71831, GSE71862, GSE72633, GSE73650, GSE75362, GSE75363, GSE75886, GSE75940, GSE83476, GSE85799, GSE86884, GSE87534, GSE87554, GSE90028, GSE92354, GSE92724, GSE93902, GSE95038, GSE95431, GSE96962, GSE97469, GSE994, and GSE99999.
MouseDisease
MouseDisease is the unified repository comprising thousands of studies exploring mouse models of human disease, requested by OmicSoft users.
Figure 7. New samples in MouseDisease, grouped by DiseaseCategory and colored by TissueCategory.
This release adds 642 samples and 374 comparisons from 31 datasets, including studies on:
- Sleep disorder: GSE166831, GSE211088, and GSE211301
- Schizophrenia: GSE218742, GSE207669, GSE209673, GSE197888, GSE181522, and GSE181285
- Depressive disorder: GSE218742, GSE207669, GSE209673, GSE197888, GSE181522, and GSE181285
- Hemophilia: GSE106436
- Other topics: GSE173926, GSE182698, GSE211982, GSE137595, GSE196266, GSE124197, GSE5296, GSE95653, GSE96055, ERP112950, GSE185476, GSE179802, GSE158777, GSE166412, GSE171852, GSE104036, GSE112348, GSE200575, GSE205958, GSE214701, and GSE221379
Removed/reprocessed datasets or comparisons
No datasets were removed for this release.
As part of our standard review process, comparisons for the following already landed projects were revised and can be found with an updated “OSModifiedDate”: E-MTAB-5326, GSE100635, GSE106463, GSE107655, GSE109055, GSE109329, GSE112116, GSE114838, GSE118628, GSE126454, GSE132040, GSE134226, GSE134659, GSE135442, GSE146074, GSE147034, GSE148084, GSE160020, GSE1623, GSE180493, GSE19286, GSE25765, GSE25766, GSE25767, GSE25890, GSE25926, GSE27382, GSE31928, GSE32078, GSE32936, GSE34889, GSE37746, GSE41044, GSE42813, GSE48200, GSE48217, GSE51969, GSE60413, GSE63062, GSE65094, GSE71379, GSE72069, GSE75000, GSE76811, GSE76812, GSE85409, GSE87212, GSE87317, GSE89412, GSE95401, GSE96694, GSE97353, GSE97806, GSE98423, PRJNA556537, and SRP100399.
ATCC Land updates
ATCC Human
This release adds 215 samples, bringing the total to 1568 samples from 341 unique cell lines.
Figure 8. Distribution of samples in ATCC_Human_B38_GC33, grouped by DiseaseCategory and colored by TissueCategory.
ATCC Mouse
This release adds 21 samples, bringing the total to 198 samples from 49 unique cell lines.
Figure 9. Distribution of samples in ATCC_Mouse_B38, grouped by DiseaseState and colored by TissueCategory.
ATCC update highlights
With this latest release, you can quickly mine statistical comparisons to reveal differentially expressed genes between pairs of cell lines from the same tissue.
Figure 10. Comparison bubble plot displaying fold change (x-axis) and significance (size of bubble) for the expression of DNMT3A.
These new comparison data can be combined with RNA-seq expression data and mutation data to quickly identify the best cell line for your research.
Figure 11. RNA-Seq Mutation Genome Browser View for Flt3 in a subset of cell lines from hematologic samples from ATCC_Mouse_B38. Click on the interactive plot to highlight mutations of interest and explore the underlying sample metadata.
General updates
Updates to OmicSoft Lands flat file schemas
With this latest release, several improvements have been made to the flat file exports of the Lands and to data queries via the OmicSoft Lands API. Improvements include unification of the project_id field name across tables, consistent use of snake_case across all clinical_triplets attributes, availability of a persistent comparison_index, and unification of field types across databases.
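Downstream pipelines that read the flat files may want a quick check that their expected column names match the new conventions. The following is a rough sketch under stated assumptions: the table layout passed in, the `samples` and `comparisons` table names, and every column name other than `project_id` and `comparison_index` are hypothetical, and the actual schemas should be taken from the flat file documentation.

```python
import re

# Lowercase words separated by single underscores, e.g. "project_id".
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def find_schema_issues(tables):
    """Return (table, column, reason) tuples for names that break the conventions.

    tables maps a table name to its list of column names.
    """
    issues = []
    for table, columns in tables.items():
        for column in columns:
            normalized = column.replace("_", "").lower()
            if normalized == "projectid" and column != "project_id":
                # A project identifier spelled differently from project_id.
                issues.append((table, column, "expected project_id"))
            elif table == "clinical_triplets" and not SNAKE_CASE.fullmatch(column):
                issues.append((table, column, "not snake_case"))
    return issues


# Hypothetical column lists, e.g. as expected by a script written for older exports.
expected_columns = {
    "samples": ["project_id", "sample_id"],
    "comparisons": ["ProjectID", "comparison_index"],
    "clinical_triplets": ["project_id", "subjectId", "clinical_value"],
}
for issue in find_schema_issues(expected_columns):
    print(issue)
```

The snake_case check is limited to clinical_triplets because that is the only table for which these notes state the convention.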
Land Database Version Cleanup
We recommend that customers with dedicated installations review the list of available databases and remove any legacy versions that are not being used.
In most cases, the recommended version is Human Genome version 38 and gene model GenCode.V33 (“B38_GC33 Lands”). Removing unused legacy versions will reduce confusion for users who are unsure which database to search for relevant information.
Attend live and on-demand webinars
The expert Field Application Scientists of QIAGEN® routinely hold online trainings for new and advanced users of OmicSoft Lands data, showcasing the use of these resources to answer scientific questions. Upcoming webinars and recordings of previous webinars are available here: https://digitalinsights.qiagen.com/webinars-and-events/
Update to the latest OmicSoft Suite version to access the latest features
OmicSoft Suite updates significantly reduce the loading time and memory footprint of Single Cell Lands. Updates include new visualizations and features that cannot be accessed in earlier versions. Contact ts-bioinformatics@qiagen.com to learn more.