CONTENTS
- 1. Introduction
- 2. Installation
- 3. Data Scenario and Preparation
- 4. IOBR Pipeline Introduction
- 5. Functional Modules
- 5.1 Signature And TME Deconvolution Module
- 5.2 Phenotype Module
- 5.3 Mutation Module
- 5.4 Model Construction Module
- 6. Demonstration of IOBR Pipeline with An Example Dataset
- 7. Summary
- 8. Session Information
- 9. Citing IOBR
- REFERENCES
1. Introduction
Recent advance in next generation sequencing has triggered the rapid
accumulation of publicly available multi-omics data1. The
application of integrated omics to exploring robust signatures for
clinical translation is increasingly highlighted in immuno-oncology,but
raises computational and biological challenges2. This
vignette aims to demonstrate how to utilize the package named IOBR to
perform multi-omics immuno-oncology biological research to decode tumor
microenvironment and signatures for clinical translation. This R package
integrates 8 published methodologies for decoding tumor microenvironment
(TME) contexture: CIBERSORT
, TIMER
,
xCell
, MCPcounter
, ESITMATE
,
EPIC
, IPS
, quanTIseq
. Moreover,
255 published signature gene sets were collected by IOBR, involving
tumor microenvironment, tumor metabolism, m6A, exosomes, microsatellite
instability, and tertiary lymphoid structure. Run the function
signature_collection_citation
to attain the source papers
and the function signature_collection
returns the detail
signature genes of all given signatures. Subsequently, IOBR adopts three
computational methods to calculate the signature score, comprising
PCA
,z-score
, and ssGSEA
. To note,
IOBR collected and employed multiple approaches for variable transition,
visualization, batch survival analysis, feature selection, and
statistical analysis. Batch analysis and visualization of corresponding
results is supported. The details of how IOBR works is described
below.
Graphical abstract outlines the workflow of IOBR package.
2. Installation
It is essential that you have R 3.6.3 or above already installed on your computer or server. IOBR is a pipeline that utilizes many other R packages that are currently available from CRAN, Bioconductor and GitHub.
Before installing IOBR, please properly install all dependencies
including tibble
, survival
,
survminer
, limma
, limSolve
,
GSVA
, e1071
, preprocessCore
,
ggplot2
, ggpubr
and so on. It is an easy way
to install them by executing the following command in your R
session:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
<-c('tibble', 'survival', 'survminer', 'sva', 'limma', "DESeq2","devtools",
depens'limSolve', 'GSVA', 'e1071', 'preprocessCore', 'ggplot2', "biomaRt",
'ggpubr', "devtools", "tidyHeatmap", "caret", "glmnet", "ppcor", "timeROC","pracma")
for(i in 1:length(depens)){
<-depens[i]
depenif (!requireNamespace(depen, quietly = TRUE))
::install(depen,update = FALSE)
BiocManager }
Then, you can start to install IOBR from github by typing the following code into your R session:
if (!requireNamespace("remotes", quietly = TRUE)) install("remotes")
if (!requireNamespace("IOBR", quietly = TRUE))
::install_github("IOBR/IOBR",ref="master") remotes
Load the IOBR package in your R session after the installation is complete:
library(IOBR)
library(tidyverse)
library(tidyHeatmap)
library(maftools)
library(ggpubr)
library(ggplot2)
library(survival)
3. Data Scenario and Preparation
3.1 Gene-expression Data Derived from RNAseq
Preparation of Gene-expression data derived from RNAseq.
For transcriptomic data of TCGA data sets, we strongly recommend user
to use UCSCXenaTools R
package. Here, we download counts data of TCGA-STAD from UCSC using UCSCXenaTools R package.
If you use it in published research, please cite: Wang et al.,
(2019). The UCSCXenaTools R package: a toolkit for accessing genomics
data from UCSC Xena platform, from cancer multi-omics to single-cell
RNA-seq. Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627. If you are not
online or can not access to UCSC, we have prepared expression data of
TCGA-STAD in IOBR package.data(eset_stad)
could be applied
to loading gene expression data.
if (!requireNamespace("UCSCXenaTools", quietly = TRUE))
::install("UCSCXenaTools")
BiocManagerlibrary(UCSCXenaTools)
#> Warning: package 'UCSCXenaTools' was built under R version 4.2.1
#> =========================================================================================
#> UCSCXenaTools version 1.4.8
#> Project URL: https://github.com/ropensci/UCSCXenaTools
#> Usages: https://cran.r-project.org/web/packages/UCSCXenaTools/vignettes/USCSXenaTools.html
#>
#> If you use it in published research, please cite:
#> Wang et al., (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data
#> from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq.
#> Journal of Open Source Software, 4(40), 1627, https://doi.org/10.21105/joss.01627
#> =========================================================================================
#> --Enjoy it--
# NOTE: This process may take a few minutes which depends on the internet connection speed. Please wait for its completion.
<-XenaGenerate(subset = XenaCohorts =="GDC TCGA Stomach Cancer (STAD)") %>%
eset_stadXenaFilter(filterDatasets = "TCGA-STAD.htseq_counts.tsv") %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
#> This will check url status, please be patient.
#> All downloaded files will under directory C:\Users\inter\AppData\Local\Temp\Rtmpgrjpa0.
#> The 'trans_slash' option is FALSE, keep same directory structure as Xena.
#> Creating directories for datasets...
#> Downloading TCGA-STAD.htseq_counts.tsv.gz
#> Warning in download.file(url, destfile, ...): downloaded length 15804857 !=
#> reported length 53326928
#> Warning in download.file(url, destfile, ...): URL 'https://gdc-hub.s3.us-
#> east-1.amazonaws.com:443/download/TCGA-STAD.htseq_counts.tsv.gz': Timeout of 60
#> seconds was reached
#> ==> Trying #2
1:5,1:5]
eset_stad[#> # A tibble: 5 × 5
#> Ensembl_ID `TCGA-D7-5577-01A` `TCGA-D7-6818-01A` TCGA-BR-795…¹ TCGA-…²
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 ENSG00000000003.13 11.3 10.6 10.2 10.8
#> 2 ENSG00000000005.5 1 0 0 2
#> 3 ENSG00000000419.11 11.2 9.96 10.2 11.6
#> 4 ENSG00000000457.12 9.13 8.77 8.73 10.1
#> 5 ENSG00000000460.15 8.42 8.89 8.46 9.87
#> # … with abbreviated variable names ¹`TCGA-BR-7958-01A`, ²`TCGA-D7-8572-01A`
Transform gene expression matrix into TPM format, and conduct subsequent annotation.
# Remove the version numbers in Ensembl ID.
$Ensembl_ID<-substring(eset_stad$Ensembl_ID, 1, 15)
eset_stad<-column_to_rownames(eset_stad, var = "Ensembl_ID")
eset_stad# Revert back to original format because the data from UCSC was log2(x+1)transformed.
<-(2^eset_stad)+1
eset_stad
# In this process, *biomaRt* R package is utilized to acquire the gene length of each Ensembl ID and calculate the TPM of each sample. If identical gene symbols exists, these genes would be ordered by the mean expression levels. The gene symbol with highest mean expression level is selected and remove others.
# NOTE: This process may take a few minutes which depends on the internet connection speed. Please wait for its completion.
<-count2tpm(countMat = eset_stad, idType = "Ensembl", org="hsa", source = "default" )
eset_stad1:5,1:5]
eset_stad[#> TCGA-D7-5577-01A TCGA-D7-6818-01A TCGA-BR-7958-01A TCGA-D7-8572-01A
#> MT-CO1 19600.53 57604.74 30611.19 35684.14
#> MT-CO3 17239.27 25291.91 20113.80 16945.16
#> MT-ND4 12377.47 24725.91 17311.51 19804.20
#> MT-CO2 15897.82 33685.69 20257.41 15774.81
#> MT-RNR2 16547.18 36068.29 14697.32 11877.90
#> TCGA-VQ-A91Z-01A
#> MT-CO1 27435.75
#> MT-CO3 30689.71
#> MT-ND4 29343.94
#> MT-CO2 22850.07
#> MT-RNR2 26176.65
3.2 Gene-expression Data Derived from Array
Obtain dataset from GEO Gastric cancer:GSE100935 using GEOquery R package.
if (!requireNamespace("GEOquery", quietly = TRUE))
::install("GEOquery")
BiocManagerlibrary("GEOquery")
# NOTE: This process may take a few minutes which depends on the internet connection speed. Please wait for its completion.
<-getGEO(GEO = "GSE100935",
eset_geogetGPL = F,
destdir = "./")
<-eset_geo[[1]]
eset <-exprs(eset)
eset 1:5,1:5]
eset[#> GSM2696792 GSM2696793 GSM2696794 GSM2696795 GSM2696796
#> 1007_s_at 10.261650 10.740100 10.694980 10.692940 10.565560
#> 1053_at 8.365080 8.463314 8.535923 8.091345 7.280246
#> 117_at 7.212391 5.832635 5.616512 8.090054 4.959763
#> 121_at 7.377803 7.549826 8.186557 7.459644 7.782259
#> 1255_g_at 6.699024 2.621356 2.422049 2.423371 2.603957
Annotate genes in expression matrix and remove duplicate genes.
# Load the annotation file `anno_hug133plus2` in IOBR.
head(anno_hug133plus2)
#> # A tibble: 6 × 2
#> probe_id symbol
#> <fct> <fct>
#> 1 1007_s_at MIR4640
#> 2 1053_at RFC2
#> 3 117_at HSPA6
#> 4 121_at PAX8
#> 5 1255_g_at GUCA1A
#> 6 1294_at MIR5193
# Conduct gene annotation using `anno_hug133plus2` file; If identical gene symbols exists, these genes would be ordered by the mean expression levels. The gene symbol with highest mean expression level is selected and remove others.
<-anno_eset(eset = eset,
esetannotation = anno_hug133plus2,
symbol = "symbol",
probe = "probe_id",
method = "mean")
1:5, 1:5]
eset[#> GSM2696792 GSM2696793 GSM2696794 GSM2696795 GSM2696796
#> SH3KBP1 14.30308 14.56398 14.37668 14.47983 14.56702
#> RPL41 14.36095 14.32783 14.33181 14.34614 14.35773
#> LOC101928826 14.18638 14.38247 14.34530 14.26433 14.23477
#> EEF1A1 14.13245 14.15141 14.08980 14.05879 14.09277
#> B2M 14.17932 14.25636 14.01158 14.12871 14.05124
# Another annotation file `anno_illumina` is also provided by IOBR for convenient annotation of Illumina gene expression arrays
head(anno_illumina)
#> # A tibble: 6 × 2
#> probe_id symbol
#> <fct> <fct>
#> 1 ILMN_1725881 LOC23117
#> 2 ILMN_1910180 HS.575038
#> 3 ILMN_1804174 FCGR2B
#> 4 ILMN_1796063 TRIM44
#> 5 ILMN_1811966 LOC653895
#> 6 ILMN_1668162 DGAT2L3
4. IOBR Pipeline Introduction
Pipeline diagram dipict major functions contained in IOBR which are categorized into four functional modules.
!
Outline of all functions for data preparation and analyses of each module.
- Data Preparation: data annotation and transformation
count2tpm()
: transform count data of RNA sequencing into TPM data.anno_eset()
: annotate the normalized genes expression matrix, including RNAseq and array (Affymetrix or Illumina).remove_duplicate_genes()
: remove the genes annotated with the duplicated symbol after normalization and retain only the symbol with highest expression level.
- TME Deconvolution Module: integrate multiple algorithms to decode immune contexture
deconvo_tme()
: decode the TME infiltration with different deconvolution methodologies, based on bulk RNAseq, microarray or single cell RNAseq data.generateRef()
: generate a novel gene reference matrix for a specific feature such as infiltrating cell, through the SVR and lsei algorithm.
- Signature Module: calculate signature scores, estimate phenotype related signatures and corresponding genes, and evaluate signatures generated from single-cell RNA sequencing data
calculate_sig_score()
: estimate the interested signatures enrolled in IOBR R package, which involves TME-associated, tumor-metabolism, and tumor-intrinsic signatures.feature_manipulation()
: manipulate features including the cell fraction and signatures generated from multi-omics data for latter analysis and model construction. Remove missing values, outliers and variables without significant variance.format_signatures()
: generate the object ofcalculate_sig_score()
function, by inputting a data frame with signatures as column names of corresponding gene sets, and return a list contain the signature information for calculating multiple signature scores.format_msigdb()
: transform the signature gene sets data with gmt format, which is not included in the signature collection and might be downloaded in the MSgiDB website, into the object ofcalculate_sig_score()
function.- Batch Analysis and Visualization: batch survival analysis and batch correlation analysis and other batch statistical analyses
batch_surv
: batch survival analysis of multiple continuous variables including varied signature scores.subgroup_survival
: batch survival analysis of multiple categorized variables with different number of subgroups.batch_cor()
: batch analysis of correlation between two continuous variables using Pearson correlation coefficient or Spearman’s rank correlation coefficient .batch_wilcoxon()
: conduct batch wilcoxon analyses of binary variables.batch_pcc()
: batch analyses of Partial Correlation coefficient(PCC) between continuous variables and minimize the interference derived from confounding factors.iobr_cor_plot()
: visualization of batch correlation analysis of signatures from ‘sig_group’. Visualize the correlation between signature or phenotype with expression of gene sets in target signature is also supported.cell_bar_plot()
: batch visualization of TME cell fraction, supporting input of deconvolution results from ‘CIBERSORT’, ‘EPIC’ and ‘quanTIseq’ methodologies to further compare the TME cell distributions within one sample or among different samples.
- Signature Associated Mutation Module: identify and analyze mutations relevant to targeted signatures
make_mut_matrix()
: transform the mutation data with MAF format(contain the columns of gene ID and the corresponding gene alterations which including SNP, indel and frameshift) into a mutation matrix in a suitable manner for further investigating signature relevant mutations.find_mutations()
: identify mutations associated with a distinct phenotype or signature.
- Model Construction Module: feature selection and fast model construct to predict clinical phenotype
BinomialModel()
: select features and construct a model to predict a binary phenotype.PrognosticMode()
: select features and construct a model to predict clinical survial outcome.
5. Functional Modules
IOBR consists of four functional modules, comprising decoding immune contexture (TME deconvolution module), estimation of signature scores, phenotype related signatures and corresponding genes, and signatures generated from single-cell sequencing data (signature module), analysis of signature associated mutations (signature associated mutation module) and fast model construction (model construction module). Details of each module and functional codes are described below. Hence, let us follow the analytic pipeline of IOBR step by step.
5.1 Signature And TME Deconvolution Module
5.2.1 Signature Estimation
IOBR integrates 255 published signature gene sets, involving tumor
microenvironment, tumor metabolism, m6A, exosomes, microsatellite
instability, and tertiary lymphoid structure. Running the function
signature_collection_citation
to attain the source papers.
The function signature_collection
returns the detail
signature genes of all given signatures.
1) Signature collection
Obtain the included signatures first.The signature collection is mainly classified into 3 categories: TME-associated, tumor-metabolism, and tumor-intrinsic signatures.
# Return available parameter options of signature estimation.
signature_score_calculation_methods#> PCA ssGSEA z-score Integration
#> "pca" "ssgsea" "zscore" "integration"
#TME associated signatures
names(signature_tme)[1:20]
#> [1] "CD_8_T_effector" "DDR"
#> [3] "APM" "Immune_Checkpoint"
#> [5] "CellCycle_Reg" "Pan_F_TBRs"
#> [7] "Histones" "EMT1"
#> [9] "EMT2" "EMT3"
#> [11] "WNT_target" "FGFR3_related"
#> [13] "Cell_cycle" "Mismatch_Repair"
#> [15] "Homologous_recombination" "Nucleotide_excision_repair"
#> [17] "DNA_replication" "Base_excision_repair"
#> [19] "TMEscoreA_CIR" "TMEscoreB_CIR"
#Metabolism related signatures
names(signature_metabolism)[1:20]
#> [1] "Cardiolipin_Metabolism"
#> [2] "Cardiolipin_Biosynthesis"
#> [3] "Cholesterol_Biosynthesis"
#> [4] "Citric_Acid_Cycle"
#> [5] "Cyclooxygenase_Arachidonic_Acid_Metabolism"
#> [6] "Prostaglandin_Biosynthesis"
#> [7] "Purine_Biosynthesis"
#> [8] "Pyrimidine_Biosynthesis"
#> [9] "Dopamine_Biosynthesis"
#> [10] "Epinephrine_Biosynthesis"
#> [11] "Norepinephrine_Biosynthesis"
#> [12] "Fatty_Acid_Degradation"
#> [13] "Fatty_Acid_Elongation"
#> [14] "Fatty_Acid_Biosynthesis"
#> [15] "Folate_One_Carbon_Metabolism"
#> [16] "Folate_biosynthesis"
#> [17] "Gluconeogenesis"
#> [18] "Glycolysis"
#> [19] "Glycogen_Biosynthesis"
#> [20] "Glycogen_Degradation"
#Signatures associated with biomedical basic research: such as m6A and exosomes
names(signature_tumor)
#> [1] "Nature_metabolism_Hypoxia"
#> [2] "Winter_hypoxia_signature"
#> [3] "Hu_hypoxia_signature"
#> [4] "Molecular_Cancer_m6A"
#> [5] "MT_exosome"
#> [6] "SR_exosome"
#> [7] "Positive_regulation_of_exosomal_secretion"
#> [8] "Negative_regulation_of_exosomal_secretion"
#> [9] "Exosomal_secretion"
#> [10] "Exosome_assembly"
#> [11] "Extracellular_vesicle_biogenesis"
#> [12] "MC_Review_Exosome1"
#> [13] "MC_Review_Exosome2"
#> [14] "CMLS_Review_Exosome"
#> [15] "Ferroptosis"
#> [16] "EV_Cell_2020"
#signature collection including all aforementioned signatures
names(signature_collection)[1:20]
#> [1] "CD_8_T_effector" "DDR"
#> [3] "APM" "Immune_Checkpoint"
#> [5] "CellCycle_Reg" "Pan_F_TBRs"
#> [7] "Histones" "EMT1"
#> [9] "EMT2" "EMT3"
#> [11] "WNT_target" "FGFR3_related"
#> [13] "Cell_cycle" "Mismatch_Repair"
#> [15] "Homologous_recombination" "Nucleotide_excision_repair"
#> [17] "DNA_replication" "Base_excision_repair"
#> [19] "TMEscoreA_CIR" "TMEscoreB_CIR"
#citation of signatures
1:20,]
signature_collection_citation[#> # A tibble: 20 × 6
#> Signatures `Published year` Journal Title PMID DOI
#> <chr> <dbl> <chr> <chr> <chr> <chr>
#> 1 CD_8_T_effector 2018 Nature TGFβ… 2944… 10.1…
#> 2 DDR 2018 Nature TGFβ… 2944… 10.1…
#> 3 APM 2018 Nature TGFβ… 2944… 10.1…
#> 4 Immune_Checkpoint 2018 Nature TGFβ… 2944… 10.1…
#> 5 CellCycle_Reg 2018 Nature TGFβ… 2944… 10.1…
#> 6 Pan_F_TBRs 2018 Nature TGFβ… 2944… 10.1…
#> 7 Histones 2018 Nature TGFβ… 2944… 10.1…
#> 8 EMT1 2018 Nature TGFβ… 2944… 10.1…
#> 9 EMT2 2018 Nature TGFβ… 2944… 10.1…
#> 10 EMT3 2018 Nature TGFβ… 2944… 10.1…
#> 11 WNT_target 2018 Nature TGFβ… 2944… 10.1…
#> 12 FGFR3_related 2018 Nature TGFβ… 2944… 10.1…
#> 13 Cell_cycle 2018 Nature TGFβ… 2944… 10.1…
#> 14 Mismatch_Repair 2018 Nature TGFβ… 2944… 10.1…
#> 15 Homologous_recombination 2018 Nature TGFβ… 2944… 10.1…
#> 16 Nucleotide_excision_repair 2018 Nature TGFβ… 2944… 10.1…
#> 17 DNA_replication 2018 Nature TGFβ… 2944… 10.1…
#> 18 Base_excision_repair 2018 Nature TGFβ… 2944… 10.1…
#> 19 TMEscoreA_CIR 2019 Cancer Immunol… Tumo… 3084… 10.1…
#> 20 TMEscoreB_CIR 2019 Cancer Immunol… Tumo… 3084… 10.1…
2) Methods for signature calculation
Three methodologies were adopted in the process of signature score evaluation, comprising Single-sample Gene Set Enrichment Analysis (ssGSEA), Principal component analysis (PCA), and Z-score.
The input data prepared in IOBR is a matrix (log2(TPM+1) transformed) containing 98 TCGA-STAD samples, with genes in rows and samples in columns. The row name must be HGNC symbols and the column name must be sample names.
# Load the `eset_stad` test data of gene expression matrix in IOBR
data("eset_stad")
1:5, 1:5]
eset_stad[#> TCGA-B7-5818 TCGA-BR-4187 TCGA-BR-4201 TCGA-BR-4253 TCGA-BR-4256
#> MT-CO1 15.18012 15.55806 14.60960 14.63728 15.23528
#> MT-CO3 14.75536 15.19199 14.55337 13.54925 14.30425
#> MT-ND4 14.19637 15.61564 15.80262 14.98329 14.83764
#> MT-CO2 15.10790 15.50514 15.43261 14.52009 14.65806
#> MT-RNR2 14.22690 14.71157 13.48096 13.32553 13.55689
Calculate TME associated signatures-(through PCA method).
<-calculate_sig_score(pdata = NULL,
sig_tmeeset = eset_stad,
signature = signature_tme,
method = "pca",
mini_gene_count = 2)
1:5, 1:10]
sig_tme[#> # A tibble: 5 × 10
#> Index ID CD_8_…¹ DDR APM Immun…² CellC…³ Pan_F…⁴ Histo…⁵ EMT1
#> <int> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 TCGA-B7-5818 2.44 -2.94 1.60 1.08 -3.29 -1.48 3.25 -1.16
#> 2 2 TCGA-BR-4187 -0.490 -6.86 -0.468 0.129 -0.105 1.89 1.28 2.48
#> 3 3 TCGA-BR-4201 1.11 4.39 -0.650 1.17 1.82 0.466 -0.184 1.47
#> 4 4 TCGA-BR-4253 7.56 12.9 3.87 5.65 2.50 -3.78 2.15 0.179
#> 5 5 TCGA-BR-4256 2.74 4.25 4.06 3.46 1.47 3.11 -0.706 1.42
#> # … with abbreviated variable names ¹CD_8_T_effector, ²Immune_Checkpoint,
#> # ³CellCycle_Reg, ⁴Pan_F_TBRs, ⁵Histones
Estimate TME associated signatures-(through ssGSEA method).
<-calculate_sig_score(pdata = NULL,
sig_tmeeset = eset_stad,
signature = signature_tme,
method = "ssgsea",
mini_gene_count = 5)
1:5, 1:10]
sig_tme[#> # A tibble: 5 × 10
#> ID Index CD_8_T_ef…¹ DDR APM Immun…² CellC…³ Pan_F…⁴ EMT1 EMT2
#> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-5818 1 0.456 0.368 0.555 0.340 0.362 0.479 0.414 0.385
#> 2 TCGA-BR-4187 2 0.334 0.343 0.537 0.278 0.419 0.513 0.516 0.472
#> 3 TCGA-BR-4201 3 0.400 0.370 0.548 0.307 0.443 0.512 0.504 0.463
#> 4 TCGA-BR-4253 4 0.539 0.394 0.573 0.453 0.442 0.431 0.419 0.335
#> 5 TCGA-BR-4256 5 0.447 0.362 0.569 0.381 0.452 0.529 0.503 0.455
#> # … with abbreviated variable names ¹CD_8_T_effector, ²Immune_Checkpoint,
#> # ³CellCycle_Reg, ⁴Pan_F_TBRs
Analyze all collected signature scores (integrating three methods: PCA, ssGSEA and z-score).
<-calculate_sig_score(pdata = NULL,
sig_reseset = eset_stad,
signature = signature_collection,
method = "integration",
mini_gene_count = 2)
1:5,1:5]
sig_res[#> # A tibble: 5 × 5
#> ID Index CD_8_T_effector_PCA DDR_PCA APM_PCA
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-5818 1 2.44 -2.94 1.60
#> 2 TCGA-BR-4187 2 -0.490 -6.86 -0.468
#> 3 TCGA-BR-4201 3 1.11 4.39 -0.650
#> 4 TCGA-BR-4253 4 7.56 12.9 3.87
#> 5 TCGA-BR-4256 5 2.74 4.25 4.06
The signature gene sets derived from GO, KEGG, HALLMARK and REACTOME datasets.
IOBR also enrolls the signature gene sets, containing GO, KEGG, HALLMARK and REACTOME gene sets obtained from MsigDB. The ssGSEA method is recommended for these signature estimation. This process may take a while for big datasets or calculating a large number of signatures.
<-calculate_sig_score(pdata = NULL,
sig_hallmarkeset = eset_stad,
signature = hallmark,
method = "ssgsea",
mini_gene_count = 2)
1:5,1:5]
sig_hallmark[#> # A tibble: 5 × 5
#> ID Index HALLMARK_TNFA_SIGNALING_VIA_NFKB HALLMARK_HYPOXIA HALLMAR…¹
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-5818 1 0.911 0.794 0.887
#> 2 TCGA-BR-4187 2 0.923 0.865 0.889
#> 3 TCGA-BR-4201 3 0.980 0.870 0.936
#> 4 TCGA-BR-4253 4 0.952 0.754 0.858
#> 5 TCGA-BR-4256 5 1.01 0.863 0.907
#> # … with abbreviated variable name ¹HALLMARK_CHOLESTEROL_HOMEOSTASIS
5.2.2 Signatures Derived From Single Cell RNAseq
The recent advances in single-cell analysis make it a popular alternative to determine cell markers and gene signatures for phenotype. However, the significantly expensive cost and high requirement for starting tumor material still limited its widespread utility. Therefore, the attainable bulk sequencing larger dataset is still a need in validating the clinical significance of these signatures.
1) Methods for signature estimation
Given that single-cell sequencing have a high resolution for
dissecting signature genes of cells, we provide mutiple methodologies to
extract cell signaturte genes from the single-cell sequencing data
(inputTPM
or inputcounts
). Additionally, the
linear svr
algorithm of CIBERSORT or lsei
algorithm is adopted in the bulk-seq data analysis to verify the
clinical significance of the targeted cells identified by single-cell
RNA sequencing.
Data Sources: Lineage-dependent gene expression programs influence the immune landscape of colorectal cancer. Nat Genet 2020 Method: 10X 3’ Data operation process: Cell filtering: 1. Only immune cells and stromal cells detected in the tumor are retained.
- Select the cell type with the cell volume greater than 500.
Gene filtering: Genes detected in more than 5% of cells.
load("E:/18-Github/Organization/backup-data/Park_et_al_pseudo_bulk_count.rda")
load("E:/18-Github/Organization/backup-data/Park_et_al_pdata.rda")
load("E:/18-Github/Organization/backup-data/Park_et_al_pseudo_bulk_TPM.rda")
<- GenerateRef(dat = inputTPM,
immune_feature_limma pheno = pdatacounts$cluster,
method = "limma",
FDR = 0.05)
#> NULL
<- GenerateRef(dds = inputcounts,
immune_feature_DESeq2 pheno = pdatacounts$cluster,
dat = inputTPM,
method = "DESeq2",
FDR = 0.05)
#> NULL
<- immune_feature_DESeq2$reference_matrix
reference <- immune_feature_DESeq2$condition_number
condition_number <- immune_feature_DESeq2$G top_probe
data("eset_stad")
<-deconvo_tme(eset = eset_stad,
svmreference = reference,
method = "svm",
arrays = FALSE,
absolute.mode = FALSE,
# abs.method = "sig.score",
perm = 1000)
head(svm)
#> # A tibble: 0 × 0
<-deconvo_tme(eset = eset_stad,
lseireference = reference,
method = "lsei",
arrays = FALSE,
scale_reference = T,
perm = 1000)
head(lsei)
#> # A tibble: 6 × 21
#> ID Bcell…¹ Bcell…² Bcell…³ Myelo…⁴ Myelo…⁵ Myelo…⁶ Myelo…⁷ Strom…⁸ Strom…⁹
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-… 0 0 0.0182 0 0 0 0.305 0.676 0
#> 2 TCGA-… 0 0 0.0174 0 0 0 0.229 0.753 0
#> 3 TCGA-… 0 0 0.0325 0 0 0 0.251 0.716 0
#> 4 TCGA-… 0 0 0.0311 0 0 0 0.339 0.431 0
#> 5 TCGA-… 0 0 0.0242 0 0 0 0.302 0.674 0
#> 6 TCGA-… 0 0 0.0100 0 0 0 0.279 0.711 0
#> # … with 11 more variables: StromalcellsStalklikeECs_lsei <dbl>,
#> # StromalcellsStromal2_lsei <dbl>, StromalcellsStromal3_lsei <dbl>,
#> # StromalcellsTiplikeECs_lsei <dbl>, TcellsCD4Tcells_lsei <dbl>,
#> # TcellsCD8Tcells_lsei <dbl>, TcellsgammadeltaTcells_lsei <dbl>,
#> # TcellsNKcells_lsei <dbl>, TcellsRegulatoryTcells_lsei <dbl>,
#> # TcellsTfollicularhelpercells_lsei <dbl>, TcellsThelper17cells_lsei <dbl>,
#> # and abbreviated variable names ¹BcellsCD19CD20B_lsei, …
2) Signature-phenotype correlation
Following is the example to explore the correlation between TLS score with tumor molecular characteristics, tumor microenvironment, and clinical phenotype of patients. B cells and tertiary lymphoid structures promote immunotherapy response
# Construct the signature list as an object first. The TLs signature mentioned above is included in the 'signature_collection' in IOBR.
$TLS_Nature
signature_collection
# It is recommended to calculate other included signatures simultaneously, to systematically decode the correlation between TLS score and other tumor microenvironment characteristics.
# Data of mice received checkpoint blockade is adopted in this exploration.
<-calculate_sig_score(pdata = NULL,
sig_reseset = eset_GSE63557,
signature = signature_collection,
method = "integration",
mini_gene_count = 2)
<-merge(pdata_GSE63557,sig_res,by="ID",all =FALSE)
inputwilcox.test(input$TLS_Nature_PCA~input$BOR_binary)
#>
#> Wilcoxon rank sum exact test
#>
#> data: input$TLS_Nature_PCA by input$BOR_binary
#> W = 0, p-value = 1.083e-05
#> alternative hypothesis: true location shift is not equal to 0
5.1.1 Available Methods to Decode TME Contexture
tme_deconvolution_methods#> MCPcounter EPIC xCell CIBERSORT
#> "mcpcounter" "epic" "xcell" "cibersort"
#> CIBERSORT Absolute IPS ESTIMATE SVR
#> "cibersort_abs" "ips" "estimate" "svr"
#> lsei TIMER quanTIseq
#> "lsei" "timer" "quantiseq"
# Return available parameter options of deconvolution methods
If you use this package in your work, please cite both our package and the method(s) you are using.
Licenses of the deconvolution methods
method | license | citation |
---|---|---|
CIBERSORT | free for non-commerical use only | Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., … Alizadeh, A. A. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nature Methods, 12(5), 453–457. https://doi.org/10.1038/nmeth.3337 |
ESTIMATE | free (GPL2.0) | Vegesna R, Kim H, Torres-Garcia W, …, Verhaak R. (2013). Inferring tumour purity and stromal and immune cell admixture from expression data. Nature Communications 4, 2612. http://doi.org/10.1038/ncomms3612 |
quanTIseq | free (BSD) | Finotello, F., Mayer, C., Plattner, C., Laschober, G., Rieder, D., Hackl, H., …, Sopper, S. (2019). Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome medicine, 11(1), 34. https://doi.org/10.1186/s13073-019-0638-6 |
TIMER | free (GPL 2.0) | Li, B., Severson, E., Pignon, J.-C., Zhao, H., Li, T., Novak, J., … Liu, X. S. (2016). Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biology, 17(1), 174. https://doi.org/10.1186/s13059-016-1028-7 |
IPS | free (BSD) | P. Charoentong et al., Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade. Cell Reports 18, 248-262 (2017). https://doi.org/10.1016/j.celrep.2016.12.019 |
MCPCounter | free (GPL 3.0) | Becht, E., Giraldo, N. A., Lacroix, L., Buttard, B., Elarouci, N., Petitprez, F., … de Reyniès, A. (2016). Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biology, 17(1), 218. https://doi.org/10.1186/s13059-016-1070-5 |
xCell | free (GPL 3.0) | Aran, D., Hu, Z., & Butte, A. J. (2017). xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biology, 18(1), 220. https://doi.org/10.1186/s13059-017-1349-1 |
EPIC | free for non-commercial use only (Academic License) | Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E., & Gfeller, D. (2017). Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. ELife, 6, e26476. https://doi.org/10.7554/eLife.26476 |
Licenses of the signature-esitmation method
method | license | citation |
---|---|---|
GSVA | free (GPL (>= 2)) | Hänzelmann S, Castelo R, Guinney J (2013). “GSVA: gene set variation analysis for microarray and RNA-Seq data.” BMC Bioinformatics, 14, 7. doi: 10.1186/1471-2105-14-7, http://www.biomedcentral.com/1471-2105/14/7 |
The input data is a matrix (log2(TMP+1) transformed) containing 98 TCGA-STAD samples, with genes in rows and samples in columns. The row name must be HGNC symbols and the column name must be sample names.
# Load the `eset_stad` test data of gene expression matrix in IOBR
data("eset_stad")
1:5, 1:5]
eset_stad[#> TCGA-B7-5818 TCGA-BR-4187 TCGA-BR-4201 TCGA-BR-4253 TCGA-BR-4256
#> MT-CO1 15.18012 15.55806 14.60960 14.63728 15.23528
#> MT-CO3 14.75536 15.19199 14.55337 13.54925 14.30425
#> MT-ND4 14.19637 15.61564 15.80262 14.98329 14.83764
#> MT-CO2 15.10790 15.50514 15.43261 14.52009 14.65806
#> MT-RNR2 14.22690 14.71157 13.48096 13.32553 13.55689
Check detail parameters of the function
help(deconvo_tme)
#> starting httpd help server ... done
1) Method 1: CIBERSORT
<-deconvo_tme(eset = eset_stad, method = "cibersort", arrays = FALSE, perm = 200 )
cibersort#>
#> >>> Running CIBERSORT
head(cibersort)
#> # A tibble: 6 × 26
#> ID B_cel…¹ B_cel…² Plasm…³ T_cel…⁴ T_cel…⁵ T_cel…⁶ T_cel…⁷ T_cel…⁸ T_cel…⁹
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-… 0.0323 0 0 0.193 0 0.105 0.0755 0.0175 0.0956
#> 2 TCGA-… 0.0866 0 0 0.0872 0 0.143 0.0745 0 0
#> 3 TCGA-… 0.0474 0 0.00644 0.0286 0 0.215 0.0586 0 0
#> 4 TCGA-… 0.0125 0 0.00257 0.224 0 0 0.301 0.0119 0.00332
#> 5 TCGA-… 0.0544 0 0.00923 0.0936 0 0.176 0.0760 0.0327 0.00548
#> 6 TCGA-… 0.0246 0 0.00162 0.124 0 0.110 0.0931 0.0414 0.0172
#> # … with 16 more variables: T_cells_gamma_delta_CIBERSORT <dbl>,
#> # NK_cells_resting_CIBERSORT <dbl>, NK_cells_activated_CIBERSORT <dbl>,
#> # Monocytes_CIBERSORT <dbl>, Macrophages_M0_CIBERSORT <dbl>,
#> # Macrophages_M1_CIBERSORT <dbl>, Macrophages_M2_CIBERSORT <dbl>,
#> # Dendritic_cells_resting_CIBERSORT <dbl>,
#> # Dendritic_cells_activated_CIBERSORT <dbl>,
#> # Mast_cells_resting_CIBERSORT <dbl>, Mast_cells_activated_CIBERSORT <dbl>, …
<-cell_bar_plot(input = cibersort[1:12,], title = "CIBERSORT Cell Fraction")
res#> There are seven categories you can choose: box, continue2, continue, random, heatmap, heatmap3, tidyheatmap
#> >>>>=== Palette option for random: 1: palette1; 2: palette2; 3: palette3; 4: palette4
2) Method 2: EPIC
help(deconvo_epic)
<-deconvo_tme(eset = eset_stad, method = "epic", arrays = FALSE)
epic#>
#> >>> Running EPIC
#> Warning in IOBR::EPIC(bulk = eset, reference = ref, mRNA_cell = NULL, scaleExprs = TRUE): The optimization didn't fully converge for some samples:
#> TCGA-BR-6705; TCGA-BR-7958; TCGA-BR-8366; TCGA-BR-A4J4; TCGA-BR-A4J9; TCGA-CD-A4MG; TCGA-FP-8209; TCGA-HU-A4G9
#> - check fit.gof for the convergeCode and convergeMessage
#> Warning in IOBR::EPIC(bulk = eset, reference = ref, mRNA_cell = NULL, scaleExprs
#> = TRUE): mRNA_cell value unknown for some cell types: CAFs, Endothelial - using
#> the default value of 0.4 for these but this might bias the true cell proportions
#> from all cell types.
head(epic)
#> # A tibble: 6 × 9
#> ID Bcells…¹ CAFs_…² CD4_T…³ CD8_T…⁴ Endot…⁵ Macro…⁶ NKcell…⁷ other…⁸
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-5818 0.0226 0.0206 0.163 0.110 0.124 0.0146 6.22e- 9 0.546
#> 2 TCGA-BR-4187 0.0511 0.0254 0.189 0.107 0.182 0.0153 3.32e-10 0.430
#> 3 TCGA-BR-4201 0.0435 0.0245 0.242 0.0726 0.146 0.0132 1.85e- 9 0.458
#> 4 TCGA-BR-4253 0.0353 0.0163 0.212 0.169 0.128 0.0156 3.99e-10 0.424
#> 5 TCGA-BR-4256 0.0330 0.0226 0.208 0.111 0.177 0.0149 9.89e-11 0.434
#> 6 TCGA-BR-4257 0.0145 0.0223 0.205 0.102 0.140 0.0147 2.89e-10 0.500
#> # … with abbreviated variable names ¹Bcells_EPIC, ²CAFs_EPIC, ³CD4_Tcells_EPIC,
#> # ⁴CD8_Tcells_EPIC, ⁵Endothelial_EPIC, ⁶Macrophages_EPIC, ⁷NKcells_EPIC,
#> # ⁸otherCells_EPIC
3) Method 3: MCPcounter
<-deconvo_tme(eset = eset_stad, method = "mcpcounter")
mcp#>
#> >>> Running MCP-counter
head(mcp)
#> # A tibble: 6 × 11
#> ID T_cel…¹ CD8_T…² Cytot…³ B_lin…⁴ NK_ce…⁵ Monoc…⁶ Myelo…⁷ Neutr…⁸ Endot…⁹
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-… 2.26 2.81 1.59 1.62 0.387 3.73 1.96 1.89 2.24
#> 2 TCGA-… 2.07 3.14 1.53 3.25 0.487 4.49 2.27 2.75 3.75
#> 3 TCGA-… 2.28 1.11 1.96 3.46 1.15 4.23 1.98 2.88 3.24
#> 4 TCGA-… 3.69 4.66 3.83 2.98 1.12 5.48 2.45 2.27 2.82
#> 5 TCGA-… 2.71 3.40 2.26 3.01 0.800 4.88 1.93 3.18 3.79
#> 6 TCGA-… 2.06 2.54 1.76 1.68 0.489 4.11 2.62 2.30 2.67
#> # … with 1 more variable: Fibroblasts_MCPcounter <dbl>, and abbreviated
#> # variable names ¹T_cells_MCPcounter, ²CD8_T_cells_MCPcounter,
#> # ³Cytotoxic_lymphocytes_MCPcounter, ⁴B_lineage_MCPcounter,
#> # ⁵NK_cells_MCPcounter, ⁶Monocytic_lineage_MCPcounter,
#> # ⁷Myeloid_dendritic_cells_MCPcounter, ⁸Neutrophils_MCPcounter,
#> # ⁹Endothelial_cells_MCPcounter
4) Method 4: xCELL
<-deconvo_tme(eset = eset_stad, method = "xcell",arrays = FALSE) xcell
head(xcell)
#> # A tibble: 6 × 68
#> ID aDC_x…¹ Adipo…² Astro…³ B-cel…⁴ Basoph…⁵ CD4+_…⁶ CD4+_n…⁷ CD4+_T…⁸
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-58… 0.189 0 0.00159 0.0664 1.20e- 1 0.00551 8.79e- 3 8.22e-19
#> 2 TCGA-BR-41… 0.0604 0.00896 0.132 0.0599 8.80e-18 0.0210 7.86e-19 0
#> 3 TCGA-BR-42… 0.0993 0.00142 0.116 0.0504 0 0.0319 0 1.90e-19
#> 4 TCGA-BR-42… 0.311 0 0 0.202 7.63e- 2 0.140 5.24e-19 3.74e- 2
#> 5 TCGA-BR-42… 0.315 0.00768 0.127 0.0458 2.00e- 1 0.0307 0 8.33e-19
#> 6 TCGA-BR-42… 0.0805 0 0.0508 0.0499 4.21e- 2 0.0412 1.11e-18 9.19e- 4
#> # … with 59 more variables: `CD4+_Tcm_xCell` <dbl>, `CD4+_Tem_xCell` <dbl>,
#> # `CD8+_naive_T-cells_xCell` <dbl>, `CD8+_T-cells_xCell` <dbl>,
#> # `CD8+_Tcm_xCell` <dbl>, `CD8+_Tem_xCell` <dbl>, cDC_xCell <dbl>,
#> # Chondrocytes_xCell <dbl>, `Class-switched_memory_B-cells_xCell` <dbl>,
#> # CLP_xCell <dbl>, CMP_xCell <dbl>, DC_xCell <dbl>,
#> # Endothelial_cells_xCell <dbl>, Eosinophils_xCell <dbl>,
#> # Epithelial_cells_xCell <dbl>, Erythrocytes_xCell <dbl>, …
5) Method 5: ESTIMATE
<-deconvo_tme(eset = eset_stad, method = "estimate")
estimate#> [1] "Merged dataset includes 10156 genes (256 mismatched)."
#> [1] "1 gene set: StromalSignature overlap= 139"
#> [1] "2 gene set: ImmuneSignature overlap= 140"
head(estimate)
#> # A tibble: 6 × 5
#> ID StromalScore_estimate ImmuneScore_estimate ESTIMATEScor…¹ Tumor…²
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-5818 -111. 1762. 1651. 0.662
#> 2 TCGA-BR-4187 2054. 2208. 4262. 0.334
#> 3 TCGA-BR-4201 1411. 1770. 3181. 0.479
#> 4 TCGA-BR-4253 483. 2905. 3389. 0.451
#> 5 TCGA-BR-4256 1659. 2541. 4200. 0.342
#> 6 TCGA-BR-4257 831. 1722. 2553. 0.557
#> # … with abbreviated variable names ¹ESTIMATEScore_estimate,
#> # ²TumorPurity_estimate
6) Method 6: TIMER
<-deconvo_tme(eset = eset_stad, method = "timer", group_list = rep("stad",dim(eset_stad)[2]))
timer#> [1] "Outlier genes: FTL IGF2 IGHA1 IGHM IGKC IGKV4-1 LYZ MT-ATP6 MT-CO1 MT-CO2 MT-CO3 MT-CYB MT-ND1 MT-ND2 MT-ND3 MT-ND4 MT-ND4L MT-RNR1 MT-RNR2 MT-TP PGC"
head(timer)
#> # A tibble: 6 × 7
#> ID B_cell_TIMER T_cell_CD4_TIMER T_cell_CD…¹ Neutr…² Macro…³ DC_TI…⁴
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-5818 0.0954 0.130 0.191 0.114 0.0389 0.515
#> 2 TCGA-BR-4187 0.0983 0.131 0.202 0.125 0.0627 0.508
#> 3 TCGA-BR-4201 0.0996 0.122 0.200 0.127 0.0551 0.510
#> 4 TCGA-BR-4253 0.101 0.131 0.240 0.129 0.0498 0.531
#> 5 TCGA-BR-4256 0.0945 0.133 0.213 0.137 0.0581 0.520
#> 6 TCGA-BR-4257 0.0907 0.126 0.199 0.121 0.0513 0.515
#> # … with abbreviated variable names ¹T_cell_CD8_TIMER, ²Neutrophil_TIMER,
#> # ³Macrophage_TIMER, ⁴DC_TIMER
7) Method 7: quanTIseq
<-deconvo_tme(eset = eset_stad, tumor = TRUE, arrays = FALSE, scale_mrna = TRUE, method = "quantiseq")
quantiseq#>
#> Running quanTIseq deconvolution module
#> Gene expression normalization and re-annotation (arrays: FALSE)
#> Removing 17 noisy genes
#> Removing 15 genes with high expression in tumors
#> Signature genes found in data set: 138/138 (100%)
#> Mixture deconvolution (method: lsei)
#> Deconvolution sucessful!
head(quantiseq)
#> # A tibble: 6 × 12
#> ID B_cel…¹ Macro…² Macro…³ Monoc…⁴ Neutr…⁵ NK_ce…⁶ T_cel…⁷ T_cel…⁸ Tregs…⁹
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-… 0.00897 0.0943 0.0299 0 0.00387 0.0216 0 0.0190 0.0176
#> 2 TCGA-… 0.0269 0.0155 0.169 0 0.0301 0.0119 0.0105 0.0164 0.0217
#> 3 TCGA-… 0.0160 0.0570 0.0661 0 0.0478 0.00943 0 0.00449 0.0398
#> 4 TCGA-… 0.0240 0.171 0.128 0 0.0685 0.00548 0 0.103 0.0660
#> 5 TCGA-… 0.0129 0.126 0.0922 0 0.0668 0.0122 0 0.0315 0.0551
#> 6 TCGA-… 0.0148 0.0685 0.0673 0 0.0277 0.0125 0.0298 0.0156 0.0226
#> # … with 2 more variables: Dendritic_cells_quantiseq <dbl>,
#> # Other_quantiseq <dbl>, and abbreviated variable names ¹B_cells_quantiseq,
#> # ²Macrophages_M1_quantiseq, ³Macrophages_M2_quantiseq, ⁴Monocytes_quantiseq,
#> # ⁵Neutrophils_quantiseq, ⁶NK_cells_quantiseq, ⁷T_cells_CD4_quantiseq,
#> # ⁸T_cells_CD8_quantiseq, ⁹Tregs_quantiseq
<-cell_bar_plot(input = quantiseq[1:12, ], title = "quanTIseq Cell Fraction")
res#> There are seven categories you can choose: box, continue2, continue, random, heatmap, heatmap3, tidyheatmap
#> >>>>=== Palette option for random: 1: palette1; 2: palette2; 3: palette3; 4: palette4
8) Method 8: IPS
<-deconvo_tme(eset = eset_stad, method = "ips", plot= FALSE)
ipshead(ips)
#> # A tibble: 6 × 7
#> ID MHC_IPS EC_IPS SC_IPS CP_IPS AZ_IPS IPS_IPS
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 TCGA-B7-5818 3.56 1.09 -1.37 -0.576 2.70 9
#> 2 TCGA-BR-4187 3.37 1.18 -1.71 -0.308 2.53 8
#> 3 TCGA-BR-4201 3.04 1.22 -1.62 -0.439 2.20 7
#> 4 TCGA-BR-4253 3.79 1.63 -1.74 -1.25 2.42 8
#> 5 TCGA-BR-4256 3.65 1.40 -1.95 -0.847 2.25 8
#> 6 TCGA-BR-4257 3.11 1.23 -1.74 -0.529 2.07 7
9) Combination of above deconvolution results
<-cibersort %>%
tme_combineinner_join(.,mcp,by = "ID") %>%
inner_join(.,xcell,by = "ID") %>%
inner_join(.,epic,by = "ID") %>%
inner_join(.,estimate,by = "ID") %>%
inner_join(.,timer,by = "ID") %>%
inner_join(.,quantiseq,by = "ID") %>%
inner_join(.,ips,by = "ID")
dim(tme_combine)
#> [1] 98 138
5.2 Phenotype Module
5.2.3 Explore Phenotype Relevant Signatures
1) Load data
Data of IMvigor210 immunotherapy cohort was used for batch analysis of phenotype-related signatures.
## Load the signatures and Cell fractions calculated in previous steps by IOBR.
data("imvigor210_sig")
1:5, 1:10]
imvigor210_sig[#> # A tibble: 5 × 10
#> ID B_cel…¹ B_cel…² Plasm…³ T_cel…⁴ T_cel…⁵ T_cel…⁶ T_cel…⁷ T_cel…⁸ T_cel…⁹
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 SAMf2… 0.0290 0.0457 0 0 0.132 0.239 0.0456 0 0
#> 2 SAM69… 0.0812 0.00127 0 0 0 0.196 0.0387 0.00823 0
#> 3 SAMc1… 0.0124 0 0.00127 0 0.169 0.317 0.0892 0 0
#> 4 SAM85… 0 0 0.00119 0 0 0.319 0.0443 0.0138 0
#> 5 SAMf2… 0 0 0.00949 0 0 0.392 0.0449 0 0
#> # … with abbreviated variable names ¹B_cells_naive_CIBERSORT,
#> # ²B_cells_memory_CIBERSORT, ³Plasma_cells_CIBERSORT, ⁴T_cells_CD8_CIBERSORT,
#> # ⁵T_cells_CD4_naive_CIBERSORT, ⁶T_cells_CD4_memory_resting_CIBERSORT,
#> # ⁷T_cells_CD4_memory_activated_CIBERSORT,
#> # ⁸T_cells_follicular_helper_CIBERSORT,
#> # ⁹`T_cells_regulatory_(Tregs)_CIBERSORT`
# Check therapeutic response and survival outcome in the phenotype data.
data("imvigor210_pdata")
1:5, 1:5]
imvigor210_pdata[#> # A tibble: 5 × 5
#> ID BOR BOR_binary OS_days OS_status
#> <chr> <chr> <chr> <chr> <chr>
#> 1 SAM00b9e5c52da9 NA NA 57.166324439999997 1
#> 2 SAM0257bbbbd388 SD NR 469.15811100000002 1
#> 3 SAM025b45c27e05 PD NR 263.16221766000001 1
#> 4 SAM032c642382a7 PD NR 74.907597539999998 1
#> 5 SAM04c589eb3fb3 NA NA 20.698151939999999 0
2)Signature groups
Run sig_group()
function for latter batch analysis of
response associated signatures.
# The signature group list.
names(sig_group)
#> [1] "tumor_signature" "EMT" "io_biomarkers"
#> [4] "immu_microenvironment" "immu_suppression" "immu_exclusion"
#> [7] "immu_exhaustion" "TCR_BCR" "tme_signatures1"
#> [10] "tme_signatures2" "Bcells" "Tcells"
#> [13] "DCs" "Macrophages" "Neutrophils"
#> [16] "Monocytes" "CAFs" "NK"
#> [19] "tme_cell_types" "CIBERSORT" "MCPcounter"
#> [22] "EPIC" "xCell" "quanTIseq"
#> [25] "ESTIMATE" "IPS" "TIMER"
#> [28] "fatty_acid_metabolism" "hypoxia_signature" "cholesterol_metabolism"
#> [31] "Metabolism" "hallmark" "hallmark1"
#> [34] "hallmark2" "hallmark3" "Rooney_et_al"
#> [37] "Bindea_et_al" "Li_et_al" "Peng_et_al"
#> [40] "CAF_singleCell"
# The tumor-relevant signatures in first group.
names(sig_group)[1]
#> [1] "tumor_signature"
1]]
sig_group[[#> [1] "CellCycle_Reg"
#> [2] "Cell_cycle"
#> [3] "DDR"
#> [4] "Mismatch_Repair"
#> [5] "Histones"
#> [6] "Homologous_recombination"
#> [7] "Nature_metabolism_Hypoxia"
#> [8] "Molecular_Cancer_m6A"
#> [9] "MT_exosome"
#> [10] "Positive_regulation_of_exosomal_secretion"
#> [11] "Ferroptosis"
#> [12] "EV_Cell_2020"
# The signatures associated with immunotherapy biomarkers.
names(sig_group)[3]
#> [1] "io_biomarkers"
3]]
sig_group[[#> [1] "TMEscore_CIR" "TMEscoreA_CIR"
#> [3] "TMEscoreB_CIR" "T_cell_inflamed_GEP_Ayers_et_al"
#> [5] "CD_8_T_effector" "IPS_IPS"
#> [7] "Immune_Checkpoint" "Exhausted_CD8_Danaher_et_al"
#> [9] "Pan_F_TBRs" "Mismatch_Repair"
#> [11] "APM"
# The signatures of immunosuppression.
names(sig_group)[5]
#> [1] "immu_suppression"
5]]
sig_group[[#> [1] "Pan_F_TBRs"
#> [2] "Fibroblasts_MCPcounter"
#> [3] "Immune_Checkpoint"
#> [4] "Exhausted_CD8_Danaher_et_al"
#> [5] "MDSC_Wang_et_al"
#> [6] "Macrophages_M2_cibersort"
#> [7] "Tregs_quantiseq"
#> [8] "T_cells_regulatory_(Tregs)_CIBERSORT"
3) Identify phenotype relevant signatures
Construct phenotype group and bath visualize correlation between therapy response and selected signatures.
<-imvigor210_pdata[!imvigor210_pdata$BOR_binary=="NA",c("ID","BOR","BOR_binary")]
pdata_group<-iobr_cor_plot(pdata_group = pdata_group,
resid1 = "ID",
feature_data = imvigor210_sig,
id2 = "ID",
target = NULL,
group = "BOR_binary",
is_target_continuous = FALSE,
padj_cutoff = 1,
index = 1,
category = "signature",
signature_group = sig_group[c(1,3,5)],
ProjectID = "IMvigor210",
palette_box = "paired1",
palette_corplot = "pheatmap",
palette_heatmap = 2,
feature_limit = 26,
character_limit = 30,
show_heatmap_col_name = FALSE,
show_col = FALSE,
show_plot = TRUE)
#> [1] ">>> Processing signature: tumor_signature"
#> [1] ">>> Processing signature: io_biomarkers"
#> [1] ">>> Processing signature: immu_suppression"
#> [1] ">>> Proportion of two groups:"
#> NR R
#> 230 68
#>
#> NR R
#> 230 68
head(res)
#> # A tibble: 6 × 8
#> sig_names p.value NR R stati…¹ p.adj log10…² stars
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 Mismatch_Repair 0.00000921 -0.142 0.479 -0.620 1.22e-4 5.04 ****
#> 2 Cell_cycle 0.0000105 -0.143 0.484 -0.627 1.22e-4 4.98 ****
#> 3 DDR 0.0000152 -0.140 0.474 -0.614 1.22e-4 4.82 ****
#> 4 Homologous_recombination 0.0000331 -0.134 0.453 -0.587 1.79e-4 4.48 ****
#> 5 Histones 0.0000372 -0.127 0.429 -0.556 1.79e-4 4.43 ****
#> 6 CD_8_T_effector 0.000911 -0.104 0.351 -0.455 3.64e-3 3.04 ***
#> # … with abbreviated variable names ¹statistic, ²log10pvalue
4) Visualize phenotype relevant signature genes
Estimate and visualize the expression of response relevant signature genes.
1:5, 1:6]
imvigor210_eset[#> SAMf2ce197162ce SAM698d8d76b934 SAMc1b27bc16435 SAM85e41e7f33f9
#> LOC100093631 3.1009398 2.8207636 3.7058993 2.81012848
#> LOC100126784 -2.6237747 -4.2560520 -5.4104447 -2.07600356
#> LOC100128108 -1.5017841 0.5200520 -0.7665885 -2.07600356
#> LOC100128288 -0.3361981 -1.2204281 -1.9510131 -1.25886761
#> LOC100128361 0.2545468 0.2923847 -0.2009913 -0.02537748
#> SAMf275eb859a39 SAM7f0d9cc7f001
#> LOC100093631 4.0102463 5.24697867
#> LOC100126784 -3.6376118 -2.23243417
#> LOC100128108 -1.9495558 0.08949393
#> LOC100128288 0.3320146 -0.33431378
#> LOC100128361 0.7698920 -1.16830383
<-iobr_cor_plot(pdata_group = pdata_group,
resid1 = "ID",
feature_data = imvigor210_eset,
id2 = "ID",
target = NULL,
group = "BOR_binary",
is_target_continuous = FALSE,
padj_cutoff = 1,
index = 2,
category = "gene",
signature_group = signature_collection[c(1:2,4)],
ProjectID = "IMvigor210",
palette_box = "paired1",
palette_corplot = "pheatmap",
palette_heatmap = 4,
feature_limit = 26,
character_limit = 30,
show_heatmap_col_name = FALSE,
show_col = FALSE,
show_plot = TRUE)
#> [1] ">>> Processing signature: CD_8_T_effector"
#> [1] ">>> Processing signature: DDR"
#>
#> NR R
#> 230 68
#> [1] ">>> Processing signature: Immune_Checkpoint"
#> [1] ">>> Proportion of two groups:"
#> NR R
#> 230 68
#>
#> NR R
#> 230 68
head(res)
#> # A tibble: 6 × 8
#> sig_names p.value NR R statistic p.adj log10pvalue stars
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 RDM1 0.00000113 -0.147 0.499 -0.646 0.000205 5.95 ****
#> 2 ERCC4 0.00000301 -0.154 0.522 -0.677 0.000272 5.52 ****
#> 3 FANCI 0.00000643 -0.140 0.473 -0.612 0.000388 5.19 ****
#> 4 EME1 0.0000173 -0.132 0.446 -0.578 0.000783 4.76 ****
#> 5 POLE 0.0000246 -0.133 0.451 -0.584 0.000890 4.61 ****
#> 6 BLM 0.0000295 -0.130 0.441 -0.571 0.000891 4.53 ****
5) Estimate lncRNA associated signatures
## Load the signatures and Cell fractions calculated in previous steps by IOBR.
head(as_tibble(imvigor210_sig))
#> # A tibble: 6 × 456
#> ID B_cel…¹ B_cel…² Plasm…³ T_cel…⁴ T_cel…⁵ T_cel…⁶ T_cel…⁷ T_cel…⁸ T_cel…⁹
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 SAMf2… 0.0290 0.0457 0 0 0.132 0.239 0.0456 0 0
#> 2 SAM69… 0.0812 0.00127 0 0 0 0.196 0.0387 0.00823 0
#> 3 SAMc1… 0.0124 0 0.00127 0 0.169 0.317 0.0892 0 0
#> 4 SAM85… 0 0 0.00119 0 0 0.319 0.0443 0.0138 0
#> 5 SAMf2… 0 0 0.00949 0 0 0.392 0.0449 0 0
#> 6 SAM7f… 0.0582 0.129 0 0 0 0.372 0.0349 0.0269 0
#> # … with 446 more variables: T_cells_gamma_delta_CIBERSORT <dbl>,
#> # NK_cells_resting_CIBERSORT <dbl>, NK_cells_activated_CIBERSORT <dbl>,
#> # Monocytes_CIBERSORT <dbl>, Macrophages_M0_CIBERSORT <dbl>,
#> # Macrophages_M1_CIBERSORT <dbl>, Macrophages_M2_CIBERSORT <dbl>,
#> # Dendritic_cells_resting_CIBERSORT <dbl>,
#> # Dendritic_cells_activated_CIBERSORT <dbl>,
#> # Mast_cells_resting_CIBERSORT <dbl>, Mast_cells_activated_CIBERSORT <dbl>, …
# Check therapeutic response and survival outcome in the phenotype data.
head(as_tibble(imvigor210_pdata))
#> # A tibble: 6 × 12
#> ID BOR BOR_b…¹ OS_days OS_st…² Mutat…³ Neo_a…⁴ CD_8_…⁵ Immun…⁶ Pan_F…⁷
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
#> 1 SAM00b9… NA NA 57.166… 1 NA NA -53.0 3.97 353.03…
#> 2 SAM0257… SD NR 469.15… 1 18 4.6862… -64.8 -7.37 608.03…
#> 3 SAM025b… PD NR 263.16… 1 1 0.3137… -81.9 -4.13 -1083.…
#> 4 SAM032c… PD NR 74.907… 1 44 6.1960… 99.9 31.4 -547.2…
#> 5 SAM04c5… NA NA 20.698… 0 50 NA -27.0 -2.71 11876.…
#> 6 SAM0571… SD NR 136.01… 1 2 1.4705… 101. 12.2 -587.5…
#> # … with 2 more variables: Mismatch_Repair <chr>, TumorPurity <dbl>, and
#> # abbreviated variable names ¹BOR_binary, ²OS_status, ³Mutation_Load,
#> # ⁴Neo_antigen_Load, ⁵CD_8_T_effector, ⁶Immune_Checkpoint, ⁷Pan_F_TBRs
5.2.4 Identify Signatures Relevant to Targeted Signature
1) Construct phenotype group
Construct the pdata_group
as a phenotype data frame of
patients.
# The data have been saved in `imvigor210_pdata`.
# head(imvigor210_pdata)
<-as.data.frame(imvigor210_pdata[,c("ID","Pan_F_TBRs")])
pdata_group$Pan_F_TBRs<-scale(as.numeric(pdata_group$Pan_F_TBRs))
pdata_grouphead(pdata_group)
#> ID Pan_F_TBRs
#> 1 SAM00b9e5c52da9 0.2681507
#> 2 SAM0257bbbbd388 0.4618378
#> 3 SAM025b45c27e05 -0.8229645
#> 4 SAM032c642382a7 -0.4156874
#> 5 SAM04c589eb3fb3 9.0208975
#> 6 SAM0571f17f4045 -0.4462966
5.2.5 Batch Analyses and Visualization
IOBR provide multiple batch analytic functions for statistical analysis, data operation and transformation. For instance,batch survival analysis could be employed for batch analysis of the best cutoff value for subsequent batch survival analysis, varied batch statistical tests, etc.
1) Batch survival analyses
<-imvigor210_pdata %>%
inputfilter(!is.na(.$OS_days)) %>%
filter(!is.na(.$OS_status)) %>%
mutate(OS_days = as.numeric(.$OS_days)) %>%
mutate(OS_status = as.numeric(.$OS_status)) %>%
::select(ID,OS_days,OS_status) %>%
dplyrinner_join(.,imvigor210_sig, by= "ID")
<- batch_surv(pdata = input,
res variable = c(4:ncol(input)),
time = "OS_days",
status = "OS_status")
#> Warning in coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
#> Loglik converged before variable 1 ; coefficient may be infinite.
head(res)
#> # A tibble: 6 × 5
#> ID P HR CI_low_0.95 CI_up_0.95
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 62 0.0000226 3381. 78.8 145000.
#> 2 169 0.000131 0.814 0.733 0.905
#> 3 18 0.000165 0.0014 0 0.0421
#> 4 327 0.000385 1.32 1.13 1.53
#> 5 197 0.000629 0.886 0.827 0.950
#> 6 132 0.000812 0.856 0.781 0.937
2) Subgroup survival analyses
##loading data and filter NA
data(subgroup_data)
<- subgroup_data %>%
input filter(time > 0) %>%
filter(!is.na(status)) %>%
filter(!is.na(AJCC_stage))
##for binary variable
<- subgroup_survival(pdata = input,
res time = "time",
status = "status",
variable = c("ProjectID", "AJCC_stage"),
object = "score_binary" )
res#> P HR CI_low_0.95 CI_up_0.95
#> ProjectID_Dataset1 1.197083e-05 3.0901 1.8648 5.1205
#> ProjectID_Dataset2 2.250787e-03 2.1049 1.3057 3.3931
#> ProjectID_Dataset3 1.396953e-01 2.4431 0.7467 7.9940
#> ProjectID_Dataset4 2.402609e-03 2.0312 1.2854 3.2097
#> ProjectID_Dataset5 6.923325e-06 4.6375 2.3759 9.0520
#> AJCC_stage_1 2.533212e-01 1.7307 0.6753 4.4354
#> AJCC_stage_2 2.319962e-05 2.9450 1.7858 4.8569
#> AJCC_stage_3 2.482420e-05 2.2821 1.5551 3.3490
#> AJCC_stage_4 5.104331e-02 1.5941 0.9979 2.5466
3) Batch statistical analyses
data("imvigor210_sig")
data("imvigor210_pdata")
# For analyses of binary variables
<-merge(imvigor210_pdata[,c("ID","BOR_binary")],imvigor210_sig,by="ID",all = F)
input<-input[!input$BOR_binary=="NA",]
input1:5,1:5]
input[#> ID BOR_binary B_cells_naive_CIBERSORT B_cells_memory_CIBERSORT
#> 2 SAM0257bbbbd388 NR 0.00000000 0.03161044
#> 3 SAM025b45c27e05 NR 0.00000000 0.00000000
#> 4 SAM032c642382a7 NR 0.08022624 0.00000000
#> 6 SAM0571f17f4045 NR 0.31954131 0.13731249
#> 7 SAM065890737112 R 0.10875636 0.00000000
#> Plasma_cells_CIBERSORT
#> 2 0.03799257
#> 3 0.00167134
#> 4 0.01413286
#> 6 0.00000000
#> 7 0.01061609
<-batch_wilcoxon(data = input,
restarget = "BOR_binary",
feature = colnames(imvigor210_sig)[3:ncol(imvigor210_sig)])
#>
#> NR R
#> 230 68
head(res, n=10)
#> # A tibble: 10 × 8
#> sig_names p.value NR R stati…¹ p.adj log10…² stars
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 Mismatch_Repair 9.21e-6 -0.111 0.246 -0.357 0.00138 5.04 ****
#> 2 Cell_cycle 1.05e-5 -0.297 0.673 -0.969 0.00138 4.98 ****
#> 3 TMEscoreB_plus 1.35e-5 0.149 -0.432 0.581 0.00138 4.87 ****
#> 4 DNA_replication 1.38e-5 -0.147 0.353 -0.500 0.00138 4.86 ****
#> 5 DDR 1.52e-5 -0.229 0.532 -0.760 0.00138 4.82 ****
#> 6 Nucleotide_excision_re… 2.16e-5 -0.106 0.244 -0.350 0.00154 4.67 ****
#> 7 Homologous_recombinati… 3.31e-5 -0.118 0.272 -0.390 0.00154 4.48 ****
#> 8 HALLMARK_E2F_TARGETS 3.33e-5 0.231 0.256 -0.0251 0.00154 4.48 ****
#> 9 T_cells_follicular_hel… 3.48e-5 0.0164 0.0380 -0.0216 0.00154 4.46 ****
#> 10 HALLMARK_WNT_BETA_CATE… 3.70e-5 0.144 0.132 0.0119 0.00154 4.43 ****
#> # … with abbreviated variable names ¹statistic, ²log10pvalue
#Note: Features with statistically significant p value could be used to construct model in the next module.
<-as.character(res[res$p.value<0.05,"sig_names"])
model_feas
# For analyses of continuous variables
# head(imvigor210_sig)
<-batch_cor(data = imvigor210_sig,
restarget = "Pan_F_TBRs",
feature = colnames(imvigor210_sig)[3:ncol(imvigor210_sig)],
method = "spearman")
head(res, n=10)
#> # A tibble: 10 × 6
#> sig_names p.value stati…¹ p.adj log10…² stars
#> <chr> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 Normal_mucosa_Bindea_et_al 2.28e-146 0.924 1.03e-143 146. ****
#> 2 TMEscoreB_CIR 6.06e-145 0.922 1.37e-142 144. ****
#> 3 Fibroblasts_MCPcounter 1.07e-144 0.922 1.62e-142 144. ****
#> 4 CAF_Peng_et_al 2.48e-127 0.901 2.81e-125 127. ****
#> 5 EMT2 7.72e-124 0.896 7.00e-122 123. ****
#> 6 HALLMARK_EPITHELIAL_MESENCHYMAL_TR… 1.63e-123 0.895 1.23e-121 123. ****
#> 7 HALLMARK_MYOGENESIS 2.02e-113 0.879 1.31e-111 113. ****
#> 8 HALLMARK_APICAL_JUNCTION 8.79e-112 0.876 4.98e-110 111. ****
#> 9 TMEscoreB_plus 2.82e-106 0.866 1.42e-104 106. ****
#> 10 StromalScore_estimate 1.21e-104 0.863 5.46e-103 104. ****
#> # … with abbreviated variable names ¹statistic, ²log10pvalue
4) Batch analyses of Pearson’s correlation coefficient
Detail code to perform batch_pcc()
function for batch
analyses of Partial Correlation coefficient(PCC) is shown below.
Herein, we utilized the data of the Pan_F_TBRs relevant signatures after adjusting TumorPurity as an example input. The major limitation of this method is that it may merely suitable for correlation estimation between two continuous variables.
<- imvigor210_pdata[, c("ID", "TumorPurity", "Pan_F_TBRs")] %>%
pdata_group rename(target = Pan_F_TBRs) %>% mutate(target = as.numeric(target))
<- batch_pcc(pdata_group = pdata_group,
res id1 = "ID",
feature_data = imvigor210_sig,
id2 = "ID",
interferenceid = "TumorPurity",
target = "target",
method = "pearson")
head(res, n = 8)
#> # A tibble: 8 × 6
#> sig_names p.value statistic p.adj log10pvalue stars
#> <chr> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 Epithelial_cells_xCell 1.26e-25 0.522 5.72e-23 24.9 ****
#> 2 Keratinocytes_xCell 1.24e-16 0.425 2.82e-14 15.9 ****
#> 3 MHC_Class_I 6.63e-15 0.402 1.01e-12 14.2 ****
#> 4 APM 1.78e-13 0.382 2.03e-11 12.7 ****
#> 5 Sebocytes_xCell 2.38e-13 0.380 2.17e-11 12.6 ****
#> 6 Riboflavin_Metabolism 9.02e-12 0.355 6.84e-10 11.0 ****
#> 7 HALLMARK_PI3K_AKT_MTOR_SIGNALING 1.13e-11 0.354 7.35e-10 10.9 ****
#> 8 HALLMARK_P53_PATHWAY 1.98e-11 0.350 1.13e- 9 10.7 ****
5.3 Mutation Module
5.3.1 Load Mutation MAF data
The input Mutation matrix with MAF data format is loaded for further analyses.
MAF data of TCGA-STAD was obtained from UCSC Xena website
# help("make_mut_matrix")
<-"E:/18-Github/Organization/IOBR/tests/TCGA.STAD.mutect.c06465a3-50e7-46f7-b2dd-7bd654ca206b.DR-10.0.somatic.maf"
maf_file<-make_mut_matrix(maf = maf_file,
mut_listisTCGA = T,
category = "multi")
#> -Reading
#> -Validating
#> -Silent variants: 70967
#> -Summarizing
#> --Possible FLAGS among top ten genes:
#> TTN
#> MUC16
#> SYNE1
#> FLG
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 10.5s elapsed (11.2s cpu)
#> Frame_Shift_Del Frame_Shift_Ins In_Frame_Del
#> 18418 4461 692
#> In_Frame_Ins Missense_Mutation Nonsense_Mutation
#> 268 109669 6011
#> Nonstop_Mutation Splice_Site Translation_Start_Site
#> 107 2445 106
#> DEL INS SNP
#> 19387 4900 117890
#> Aggregation function missing: defaulting to length
#> Aggregation function missing: defaulting to length
#> Aggregation function missing: defaulting to length
#> Aggregation function missing: defaulting to length
# If "multi" is set in above "category" parameter, four data frames will be returned, which evaluate all the mutations of every gene or estimate only SNP, indel and frameshift as follow:
# NOTE: UCSCXenaTools can be applied to access variant data of TCGA data sets.
<-XenaGenerate(subset = XenaCohorts =="GDC TCGA Stomach Cancer (STAD)") %>%
var_stadXenaFilter(filterDatasets = "TCGA-STAD.mutect2_snv.tsv") %>%
XenaQuery() %>%
XenaDownload() %>%
XenaPrepare()
#> This will check url status, please be patient.
#> All downloaded files will under directory C:\Users\inter\AppData\Local\Temp\Rtmpgrjpa0.
#> The 'trans_slash' option is FALSE, keep same directory structure as Xena.
#> Creating directories for datasets...
#> Downloading TCGA-STAD.mutect2_snv.tsv.gz
head(var_stad)
#> # A tibble: 6 × 11
#> Sample_ID gene chrom start end ref alt Amino…¹ effect filter dna_vaf
#> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 TCGA-CD-A… C1or… chr1 2.19e6 2.19e6 G - p.P72R… frame… PASS 0.137
#> 2 TCGA-CD-A… ERRF… chr1 8.01e6 8.01e6 C T p.P327P synon… panel… 0.341
#> 3 TCGA-CD-A… CLCN6 chr1 1.18e7 1.18e7 G A p.S486N misse… PASS 0.507
#> 4 TCGA-CD-A… PRAM… chr1 1.28e7 1.28e7 G A p.G341R misse… panel… 0.2
#> 5 TCGA-CD-A… PRAM… chr1 1.32e7 1.32e7 G T p.P148H misse… PASS 0.263
#> 6 TCGA-CD-A… CELA… chr1 1.55e7 1.55e7 G A p.G59R misse… PASS 0.0672
#> # … with abbreviated variable name ¹Amino_Acid_Change
# Then function `make_mut_matrix` can be used to transform data frame into mutation matrix
<-make_mut_matrix(mut_data = var_stad,
mut_list2category = "multi",
Tumor_Sample_Barcode = "Sample_ID",
Hugo_Symbol = "gene",
Variant_Classification = "effect",
Variant_Type = "Variant_Type")
#> Aggregation function missing: defaulting to length
#> Aggregation function missing: defaulting to length
#> Aggregation function missing: defaulting to length
#> Aggregation function missing: defaulting to length
# NOTE: IOBR provides mutation matrix(`tcga_stad_var`), if MAF data or UCSC can not be accessed.
1:5,1:5]
tcga_stad_var[#> ABCA12 ABCA13 ABCA2 ABCB1 ABCC9
#> TCGA-3M-AB46 0 0 0 0 0
#> TCGA-3M-AB47 0 0 0 0 0
#> TCGA-B7-5816 0 2 0 1 1
#> TCGA-B7-5818 0 0 0 0 0
#> TCGA-B7-A5TI 0 0 0 1 0
5.3.2 Analyze Signature Associated Mutations
Utilize sig_group()
function for batch analyses and
visualization of response associated signatures.
names(mut_list)
#> [1] "all" "snp" "indel" "frameshift"
$all[1:5, 1:10]
mut_list#> A1BG A1CF A2M A2ML1 A3GALT2 A4GALT A4GNT AAAS AACS AADAC
#> TCGA-3M-AB46 1 1 0 0 0 0 0 0 0 0
#> TCGA-3M-AB47 0 0 0 0 0 0 0 0 0 0
#> TCGA-B7-5816 0 0 1 0 0 0 0 1 0 0
#> TCGA-B7-5818 0 0 0 0 0 0 0 0 0 0
#> TCGA-B7-A5TI 0 0 0 0 0 0 0 0 0 0
# choose SNP mutation matrix as input
<-mut_list$snp
mut# NOTE: If the maximum of mutation counts of a single gene of a person in is over 4, the mutation data would be standardized according to following principles.
# mut[mut>=3&mut<=5]<-3
# mut[mut>5]<-4
# Each gene of every samples is categorized as binary variables(mutation or non-mutation) in the MAF data.
##########################
<-find_mutations(mutation_matrix = mut,
ressignature_matrix = tcga_stad_sig,
id_signature_matrix = "ID",
signature = "CD_8_T_effector",
min_mut_freq = 0.01,
plot = TRUE,
method = "Wilcoxon",
save_path = paste0("CD_8_T_effector-relevant-mutations"),
palette = "jco",
show_plot = T)
#> [1] ">>> Result of Wilcoxon test (top 10): "
#> p.value names statistic adjust_pvalue
#> PIK3CA 1.921035e-10 PIK3CA 4125 9.605174e-08
#> TCHH 1.961642e-05 TCHH 3312 4.904106e-03
#> SPEG 3.532750e-05 SPEG 1947 5.887916e-03
#> LRP1 7.511741e-05 LRP1 2649 9.389676e-03
#> WDFY3 1.257659e-04 WDFY3 2964 1.257659e-02
#> ARID1A 2.468609e-04 ARID1A 4878 2.057174e-02
#> PLXNA4 4.215809e-04 PLXNA4 3638 3.011292e-02
#> ANK3 6.399572e-04 ANK3 4446 3.933979e-02
#> DMD 7.364591e-04 DMD 5311 3.933979e-02
#> PLEC 8.026240e-04 PLEC 5562 3.933979e-02
5.4 Model Construction Module
For effective application of the signatures in clinical interpretation, IOBR also provides functions for feature selection, robust biomarker identification, and model construction based on priorly identified phenotype associated biomarkers in 5.2.1 Explore Phenotype Relevant Signatures. For predictive model construction.
data("imvigor210_sig")
data("imvigor210_pdata")
# For analyses of binary variables
<-imvigor210_pdata %>%
input::select(ID,BOR_binary) %>%
dplyrinner_join(.,imvigor210_sig,by="ID") %>%
filter(!is.na(.$BOR_binary)) %>%
filter(!.$BOR_binary=="NA")
# Feature engineering
<-batch_wilcoxon(data = as.data.frame(input),
restarget = "BOR_binary",
feature = colnames(input)[3:ncol(input)])
#>
#> NR R
#> 230 68
head(res)
#> # A tibble: 6 × 8
#> sig_names p.value NR R stati…¹ p.adj log10…² stars
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 Mismatch_Repair 9.21e-6 -0.111 0.246 -0.357 0.00138 5.04 ****
#> 2 Cell_cycle 1.05e-5 -0.297 0.673 -0.969 0.00138 4.98 ****
#> 3 TMEscoreB_plus 1.35e-5 0.149 -0.432 0.581 0.00138 4.87 ****
#> 4 DNA_replication 1.38e-5 -0.147 0.353 -0.500 0.00138 4.86 ****
#> 5 DDR 1.52e-5 -0.229 0.532 -0.760 0.00138 4.82 ****
#> 6 Nucleotide_excision_repair 2.16e-5 -0.106 0.244 -0.350 0.00154 4.67 ****
#> # … with abbreviated variable names ¹statistic, ²log10pvalue
<-as.character(res[res$p.value<0.05,]$sig_names)
model_feas
<-as.data.frame(imvigor210_sig)
input<-colnames(input)[colnames(input)%in%model_feas]
feas<-input[, c("ID",feas)]
input
# target data
<- imvigor210_pdata[!imvigor210_pdata$BOR_binary=="NA",c("ID","BOR_binary")]
pdata_group $BOR_binary <- ifelse(pdata_group$BOR_binary == "R", 1, 0)
pdata_group
#Feature selection
<- BinomialModel(x = input,
binomial_result y = pdata_group,
seed = "123456",
scale = TRUE,
train_ratio = 0.7,
nfold = 10,
plot = T)
#> NULL
#> NULL
#> NULL
#> NULL
#> NULL
plot(binomial_result$lasso_result$model)
plot(binomial_result$ridge_result$model)
lapply(binomial_result[1:2], function(z)z$AUC)
#> $lasso_result
#> lambda.min lambda.1se
#> train 0.8911641 0.5
#> test 0.7017974 0.5
#>
#> $ridge_result
#> lambda.min lambda.1se
#> train 0.7915115 0.5
#> test 0.7263072 0.5
For prognostic model construction.
data("imvigor210_pdata")
<-imvigor210_pdata %>%
pdata_prog::select(ID, OS_days, OS_status) %>%
dplyrfilter(!is.na(.$OS_days)) %>%
filter(!.$OS_days=="NA") %>%
:: rename(time = OS_days) %>%
dplyr:: rename(status = OS_status) %>%
dplyrmutate(time = as.numeric(.$time)) %>%
mutate(status = as.numeric(.$status))
<-as.data.frame(pdata_prog)
pdata_prog<-as.data.frame(imvigor210_sig)
input
<- PrognosticModel(x = input,
prognostic_result y = pdata_prog,
scale = T,
seed = "123456",
train_ratio = 0.8,
nfold = 10,
plot = T)
#> NULL
#> NULL
#> NULL
plot(prognostic_result$lasso_result$model)
plot(prognostic_result$ridge_result$model)
6. Demonstration of IOBR Pipeline with An Example Dataset
In order to find best predictive or prognostic signature, we recommend Blasso package to determine the most robust markers. If Blasso R package is utilized in your published research, please cite: DQ Zeng, ZL Ye, et al., Macrophage correlates with immunophenotype and predicts anti-PD-L1 response of urothelial cancer. Theranostics 2020; 10(15):7002-7014. doi:10.7150/thno.46176
# Download package through this website (http://research-pub.gene.com/IMvigor210CoreBiologies/packageVersions/) and install it.
if (!requireNamespace("IMvigor210CoreBiologies", quietly = TRUE))
install.packages("/home/stu10/IOBR/IMvigor210CoreBiologies_1.0.0.tar.gz", repos=NULL)
library(IMvigor210CoreBiologies)
# Preparing expression data for TME and signatures deconvolution
###############################
data(cds)
head(fData(cds))
<-counts(cds)
expMatrix###############################
<-fData(cds)[,c("entrez_id","length","symbol")]
eff_length2 rownames(eff_length2)<-eff_length2$entrez_id
head(eff_length2)
###############################
<- rownames(expMatrix)
feature_ids # trim the expression matrix and effetive gene length
<- expMatrix[feature_ids %in% rownames(eff_length2),]
expMatrix <- match(rownames(expMatrix), rownames(eff_length2))
mm <- eff_length2[mm, ]
eff_length2 #################################
# transform count data to TPM
<- expMatrix / eff_length2$length
x <- t( t(x) / colSums(x) ) * 1e6
eset summary(duplicated(rownames(eset)))
#################################
# annotating expression set
<-IOBR::anno_eset(eset = eset,
esetannotation = eff_length2,
symbol = "symbol",
probe = "entrez_id",
method = "mean")
1:5,1:5] eset[
<-"blca"
tumor_typeif(max(eset)>100) eset<-log2(eset+1)
<-deconvo_tme(eset = eset,method = "cibersort",arrays = FALSE,perm = 1000 )
cibersort<-deconvo_tme(eset = eset,method = "epic",arrays = FALSE)
epic <-deconvo_tme(eset = eset,method = "mcpcounter") mcp
Step 1. TME cell estimation
<-deconvo_tme(eset = eset,method = "xcell",arrays = FALSE) xcell
<-deconvo_tme(eset = eset,method = "estimate")
estimate <-deconvo_tme(eset = eset,method = "timer",group_list = rep(tumor_type,dim(eset)[2]))
timer <-deconvo_tme(eset = eset,method = "quantiseq", tumor = TRUE, arrays = FALSE, scale_mrna = TRUE)
quantiseq<-deconvo_tme(eset = eset,method = "ips",plot= FALSE)
ips
<-cibersort %>%
tme_combineinner_join(.,mcp,by = "ID") %>%
inner_join(.,xcell,by = "ID") %>%
inner_join(.,epic,by = "ID") %>%
inner_join(.,estimate,by = "ID") %>%
inner_join(.,quantiseq,by = "ID") %>%
inner_join(.,timer,by = "ID") %>%
inner_join(.,ips,by = "ID")
if("Index"%in%colnames(tme_combine)) tme_combine<-tme_combine[,-which(colnames(tme_combine)=="index")]
Step-2. Collected gene Signatures estimation
<-calculate_sig_score(pdata = NULL,
sig_reseset = eset,
signature = signature_collection,
method = "pca",
mini_gene_count = 2)
Step 3: Evaluation of Gene signatures and pathway obtained from MsigDB of Broad Institute.
<-calculate_sig_score(pdata = NULL,
sig_go_keggeset = eset,
signature = c(hallmark,go_bp,go_cc,go_mf,kegg,reactome),
method = "ssgsea",
mini_gene_count = 2)
#########################################
<-tme_combine %>%
tme_sig_cominner_join(.,sig_res,by = "ID") %>%
inner_join(.,sig_go_kegg,by = "ID")
save(tme_sig_com,file = paste0("0-IMvigor210-Merge-TME-Signature-Hallmark-GO-KEGG-Reactome.RData"))
#########################################
#Preparing phenotype data
#########################################
<-pData(cds)
pdatacolnames(pdata)<-gsub(colnames(pdata),pattern = " ",replacement = "_")
<-rownames_to_column(pdata[,c("binaryResponse",
pdata"FMOne_mutation_burden_per_MB",
"Neoantigen_burden_per_MB",
"censOS","os")],var = "ID")
head(pdata)
colnames(pdata)<-c("ID","BOR_binary","TMB","TNB","status","time")
<-pdata[!is.na(pdata$BOR_binary),]
pdata$BOR_binary<-ifelse(pdata$BOR_binary=="CR/PR","R","NR")
pdata
# Feature engineering
# Step 1: Selecting features associated with treatment response (BOR based on RECIST)
<-merge(pdata[,c("ID","BOR_binary")],tme_sig_com,by="ID",all = FALSE)
input1<-batch_wilcoxon(data = input1,
restarget = "BOR_binary",
feature = colnames(input1)[3:ncol(input1)])
<-as.character(res[res$p.value<0.05,]$sig_names)
feas1
# Step 2: Selecting features associated with treatment outcome (Overall survival)
<-pdata %>%
input2filter(!is.na(.$time)) %>%
filter(!is.na(.$status)) %>%
mutate(time = as.numeric(.$time)) %>%
mutate(status = as.numeric(.$status)) %>%
::select(ID,time,status) %>%
dplyrinner_join(.,tme_sig_com, by= "ID")
<- batch_surv(pdata = input2,
res variable = c(4:ncol(input2)),
time = "time",
status = "status")
<-as.character(res[res$P < 0.01,]$ID)
feas2
<-intersect(feas1, feas2)
feas<-input2[,c("ID","time","status")]
target<-input2[,c("ID",feas)]
fea_data
# Integrating LASSO cox regression and bootstrapping algorithm to find best prognostic features
######################################
if (!requireNamespace("Blasso", quietly = TRUE))
::install_github("DongqiangZeng0808/Blasso")
devtoolslibrary("Blasso")
#For prognostic biomarkers
<-best_predictor_cox(target_data = target,
resfeatures = fea_data,
status = "status",
time = "time",
nfolds = 10,
permutation = 1000,
show_progress = FALSE)
7. Summary
IOBR provides four major analytic modules allowing effective and flexible analysis of tumor immunologic, clinical, genomics, and single-cell data. There might be some limitations in this package, we will put our efforts into its improvement for more functionality in the future. Any questions, bug reports, or suggestions about IOBR R package is appreciated. Please contact us through E-mail to dongqiangzeng0808@gmail.com or yzlsibyl@163.com.
8. Session Information
sessionInfo()
#> R version 4.2.0 (2022-04-22 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=Chinese (Simplified)_China.utf8
#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=Chinese (Simplified)_China.utf8
#>
#> attached base packages:
#> [1] grid stats graphics grDevices utils datasets methods
#> [8] base
#>
#> other attached packages:
#> [1] GEOquery_2.64.2 Biobase_2.56.0 BiocGenerics_0.42.0
#> [4] UCSCXenaTools_1.4.8 maftools_2.12.0 forcats_0.5.1
#> [7] stringr_1.4.0 purrr_0.3.4 readr_2.1.2
#> [10] tidyr_1.2.0 tidyverse_1.3.2 IOBR_0.99.9
#> [13] survival_3.4-0 ggpubr_0.4.0 ggplot2_3.3.6
#> [16] dplyr_1.0.10 tibble_3.1.7 tidyHeatmap_1.8.1
#> [19] ComplexHeatmap_2.12.0
#>
#> loaded via a namespace (and not attached):
#> [1] R.methodsS3_1.8.2 ragg_1.2.2
#> [3] bit64_4.0.5 knitr_1.39
#> [5] irlba_2.3.5 DelayedArray_0.22.0
#> [7] R.utils_2.11.0 data.table_1.14.2
#> [9] rpart_4.1.16 KEGGREST_1.36.2
#> [11] RCurl_1.98-1.7 doParallel_1.0.17
#> [13] generics_0.1.3 timereg_2.0.2
#> [15] preprocessCore_1.58.0 ScaledMatrix_1.4.0
#> [17] cowplot_1.1.1 RSQLite_2.2.14
#> [19] future_1.26.1 proxy_0.4-27
#> [21] bit_4.0.4 tzdb_0.3.0
#> [23] xml2_1.3.3 lubridate_1.8.0
#> [25] ggsci_2.9 SummarizedExperiment_1.26.1
#> [27] assertthat_0.2.1 viridis_0.6.2
#> [29] gargle_1.2.0 xfun_0.31
#> [31] hms_1.1.1 jquerylib_0.1.4
#> [33] evaluate_0.15 DNAcopy_1.70.0
#> [35] fansi_1.0.3 dendextend_1.15.2
#> [37] dbplyr_2.2.0 readxl_1.4.0
#> [39] km.ci_0.5-6 DBI_1.1.2
#> [41] geneplotter_1.74.0 htmlwidgets_1.5.4
#> [43] googledrive_2.0.0 stats4_4.2.0
#> [45] ellipsis_0.3.2 corrplot_0.92
#> [47] backports_1.4.1 ggcorrplot_0.1.3
#> [49] annotate_1.74.0 sparseMatrixStats_1.8.0
#> [51] MatrixGenerics_1.8.0 vctrs_0.4.1
#> [53] SingleCellExperiment_1.18.0 ROCR_1.0-11
#> [55] remotes_2.4.2 Cairo_1.5-15
#> [57] prettydoc_0.4.1 abind_1.4-5
#> [59] cachem_1.0.6 withr_2.5.0
#> [61] checkmate_2.1.0 vroom_1.5.7
#> [63] cluster_2.1.3 crayon_1.5.2
#> [65] genefilter_1.78.0 glmnet_4.1-4
#> [67] edgeR_3.38.1 pkgconfig_2.0.3
#> [69] labeling_0.4.2 GenomeInfoDb_1.32.2
#> [71] nlme_3.1-157 nnet_7.3-17
#> [73] globals_0.15.0 rlang_1.0.6
#> [75] lifecycle_1.0.3 modelr_0.1.8
#> [77] rsvd_1.0.5 cellranger_1.1.0
#> [79] GSVA_1.44.2 matrixStats_0.62.0
#> [81] graph_1.74.0 Matrix_1.5-3
#> [83] KMsurv_0.1-5 carData_3.0-5
#> [85] Rhdf5lib_1.18.2 zoo_1.8-10
#> [87] reprex_2.0.1 base64enc_0.1-3
#> [89] GlobalOptions_0.1.2 googlesheets4_1.0.0
#> [91] png_0.1-7 viridisLite_0.4.1
#> [93] rjson_0.2.21 bitops_1.0-7
#> [95] R.oo_1.25.0 rhdf5filters_1.8.0
#> [97] Biostrings_2.64.0 blob_1.2.3
#> [99] DelayedMatrixStats_1.18.0 shape_1.4.6
#> [101] parallelly_1.32.0 jpeg_0.1-9
#> [103] rstatix_0.7.0 S4Vectors_0.34.0
#> [105] ggsignif_0.6.3 beachmat_2.12.0
#> [107] scales_1.2.0 lpSolve_5.6.15
#> [109] plyr_1.8.7 memoise_2.0.1
#> [111] GSEABase_1.58.0 magrittr_2.0.3
#> [113] zlibbioc_1.42.0 compiler_4.2.0
#> [115] RColorBrewer_1.1-3 clue_0.3-61
#> [117] DESeq2_1.36.0 cli_3.4.1
#> [119] XVector_0.36.0 listenv_0.8.0
#> [121] patchwork_1.1.1 htmlTable_2.4.0
#> [123] Formula_1.2-4 MASS_7.3-56
#> [125] mgcv_1.8-40 limSolve_1.5.6
#> [127] tidyselect_1.2.0 stringi_1.7.6
#> [129] textshaping_0.3.6 highr_0.9
#> [131] yaml_2.3.5 BiocSingular_1.12.0
#> [133] locfit_1.5-9.5 latticeExtra_0.6-29
#> [135] survMisc_0.5.6 sass_0.4.1
#> [137] tools_4.2.0 future.apply_1.9.0
#> [139] parallel_4.2.0 circlize_0.4.15
#> [141] rstudioapi_0.13 foreach_1.5.2
#> [143] foreign_0.8-82 gridExtra_2.3
#> [145] prodlim_2019.11.13 farver_2.1.0
#> [147] pec_2022.05.04 digest_0.6.29
#> [149] lava_1.6.10 pracma_2.3.8
#> [151] quadprog_1.5-8 ppcor_1.1
#> [153] Rcpp_1.0.9 GenomicRanges_1.48.0
#> [155] car_3.1-0 broom_0.8.0
#> [157] httr_1.4.3 survminer_0.4.9
#> [159] AnnotationDbi_1.58.0 colorspace_2.0-3
#> [161] rvest_1.0.2 XML_3.99-0.10
#> [163] fs_1.5.2 IRanges_2.30.0
#> [165] splines_4.2.0 systemfonts_1.0.4
#> [167] xtable_1.8-4 jsonlite_1.8.0
#> [169] R6_2.5.1 Hmisc_4.7-0
#> [171] pillar_1.8.1 htmltools_0.5.2
#> [173] glue_1.6.2 fastmap_1.1.0
#> [175] BiocParallel_1.30.3 class_7.3-20
#> [177] codetools_0.2-18 mvtnorm_1.1-3
#> [179] utf8_1.2.2 lattice_0.20-45
#> [181] bslib_0.3.1 sva_3.35.2
#> [183] numDeriv_2016.8-1.1 curl_4.3.2
#> [185] magick_2.7.3 timeROC_0.4
#> [187] limma_3.52.1 rmarkdown_2.14
#> [189] munsell_0.5.0 e1071_1.7-11
#> [191] GetoptLong_1.0.5 rhdf5_2.40.0
#> [193] GenomeInfoDbData_1.2.8 iterators_1.0.14
#> [195] HDF5Array_1.24.1 reshape2_1.4.4
#> [197] haven_2.5.0 gtable_0.3.1
9. Citing IOBR
In that IOBR incorporating some algorithms and methodologies from many other packages, please cite corresponding papers alongside IOBR when using the functions contains these algorithms. The detailed paper sources and respective licenses were listed in the subset of 5.1.3 Available Methods to Decode TME Contexture.
If IOBR R package is utilized in your published research, please cite:
- DQ Zeng, ZL Ye, GC Yu, …, WJ Liao*. (2020). IOBR: Multi-omics Immuno-Oncology Biological Research to decode tumor microenvironment and signatures. [doi: 10.1101/2020.12.14.422647]
Please cite the following papers appropriately for TME deconvolution algorithm if used:
- CIBERSORT: Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., … Alizadeh, A. A. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nature Methods, 12(5), 453–457. https://doi.org/10.1038/nmeth.3337
- ESTIMATE: Vegesna R, Kim H, Torres-Garcia W, …, Verhaak R.*(2013). Inferring tumour purity and stromal and immune cell admixture from expression data. Nature Communications 4, 2612. http://doi.org/10.1038/ncomms3612
- quanTIseq: Finotello, F., Mayer, C., Plattner, C., Laschober, G., Rieder, D., Hackl, H., …, Sopper, S.* (2019). Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome medicine, 11(1), 34. https://doi.org/10.1186/s13073-019-0638-6
- Li, B., Severson, E., Pignon, J.-C., Zhao, H., Li, T., Novak, J., … Liu, X. S.* (2016). Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biology, 17(1), 174.
- IPS: P. Charoentong et al.*, Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade. Cell Reports 18, 248-262 (2017). https://doi.org/10.1016/j.celrep.2016.12.019
- MCPCounter: Becht, E., Giraldo, N. A., Lacroix, L., Buttard, B., Elarouci, N., Petitprez, F., … de Reyniès, A*. (2016). Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biology, 17(1), 218. https://doi.org/10.1186/s13059-016-1070-5
- xCell: Aran, D., Hu, Z., & Butte, A. J.* (2017). xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biology, 18(1), 220. https://doi.org/10.1186/s13059-017-1349-1
- EPIC: Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E., & Gfeller, D*. (2017). Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. ELife, 6, e26476. https://doi.org/10.7554/eLife.26476
For signature score estimation, please cite corresponding literature below:
- ssgsea: Barbie, D.A. et al (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(5):108-112.
- gsva: Hänzelmann, S., Castelo, R. and Guinney, J. (2013). GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14(1):7.
- zscore: Lee, E. et al (2008). Inferring pathway activity toward precise disease classification. PLoS Comp Biol, 4(11):e1000217.
For the datasets enrolled in IOBR, please cite the data sources:
- UCSCXena: Wang et al.,et al (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Journal of Open Source Software, 4(40), 1627
- TLSscore: Helmink BA, Reddy SM, Gao J, et al. B cells and tertiary lymphoid structures promote immunotherapy response. Nature. 2020 Jan;577(7791):549-555.
- IMvigor210 immuntherapy cohort: Mariathasan S, Turley SJ, Nickles D, et al. TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature. 2018 Feb 22;554(7693):544-548.
- HCP5: Kulski, J.K. Long Noncoding RNA HCP5, a Hybrid HLA Class I Endogenous Retroviral Gene: Structure, Expression, and Disease Associations. Cells 2019, 8, 480.
- HCP5: Li, Y., Jiang, T., Zhou, W. et al. Pan-cancer characterization of immune-related lncRNAs identifies potential oncogenic biomarkers. Nat Commun 11, 1000 (2020).
- HCP5: Sun J, Zhang Z, Bao S, et alIdentification of tumor immune infiltration-associated lncRNAs for improving prognosis and immunotherapy response of patients with non-small cell lung cancerJournal for ImmunoTherapy of Cancer 2020;8:e000110.
- LINC00657: Feng Q, Zhang H, Yao D, Chen WD, Wang YD. Emerging Role of Non-Coding RNAs in Esophageal Squamous Cell Carcinoma. Int J Mol Sci. 2019 Dec 30;21(1):258. doi: 10.3390/ijms21010258.
- LINC00657: Qin X, Zhou M, Lv H, Mao X, Li X, Guo H, Li L, Xing H. Long noncoding RNA LINC00657 inhibits cervical cancer development by sponging miR-20a-5p and targeting RUNX3. Cancer Lett. 2020 Oct 28:S0304-3835(20)30578-4. doi: 10.1016/j.canlet.2020.10.044.
- LINC00657: Zhang XM, Wang J, Liu ZL, Liu H, Cheng YF, Wang T. LINC00657/miR-26a-5p/CKS2 ceRNA network promotes the growth of esophageal cancer cells via the MDM2/p53/Bcl2/Bax pathway. Biosci Rep. 2020;40(6):BSR20200525.
- TCGA.STAD: Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature. 2014 Sep 11;513(7517):202-9. doi: 10.1038/nature13480.
- TCGA.STAD MAF data: https://api.gdc.cancer.gov/data/c06465a3-50e7-46f7-b2dd-7bd654ca206b
REFERENCES
- Newman, A. M., Liu, C. L., Green, M. R., Gentles, A. J., Feng, W., Xu, Y., … Alizadeh, A. A. (2015). Robust enumeration of cell subsets from tissue expression profiles. Nature Methods, 12(5), 453–457.
- Vegesna R, Kim H, Torres-Garcia W, …, Verhaak R.*(2013). Inferring tumour purity and stromal and immune cell admixture from expression data. Nature Communications 4, 2612.
- Rieder, D., Hackl, H., …, Sopper, S.* (2019). Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome medicine, 11(1), 34.
- Li, B., Severson, E., Pignon, J.-C., Zhao, H., Li, T., Novak, J., … Liu, X. S.* (2016). Comprehensive analyses of tumor immunity: implications for cancer immunotherapy. Genome Biology, 17(1), 174.
- P. Charoentong et al.*, Pan-cancer Immunogenomic Analyses Reveal Genotype-Immunophenotype Relationships and Predictors of Response to Checkpoint Blockade. Cell Reports 18, 248-262 (2017).
- Becht, E., Giraldo, N. A., Lacroix, L., Buttard, B., Elarouci, N., Petitprez, F., … de Reyniès, A*. (2016). Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression. Genome Biology, 17(1), 218.
- Aran, D., Hu, Z., & Butte, A. J.* (2017). xCell: digitally portraying the tissue cellular heterogeneity landscape. Genome Biology, 18(1), 220.
- Racle, J., de Jonge, K., Baumgaertner, P., Speiser, D. E., & Gfeller, D*. (2017). Simultaneous enumeration of cancer and immune cell types from bulk tumor gene expression data. ELife, 6, e26476.
- Barbie, D.A. et al (2009). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(5):108-112.
- Hänzelmann, S., Castelo, R. and Guinney, J. (2013). GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinformatics, 14(1):7.
- Lee, E. et al (2008). Inferring pathway activity toward precise disease classification. PLoS Comp Biol, 4(11):e1000217.
- Wang et al.,et al (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Journal of Open Source Software, 4(40), 1627
- Helmink BA, Reddy SM, Gao J, et al. B cells and tertiary lymphoid structures promote immunotherapy response. Nature. 2020 Jan;577(7791):549-555.
- Mariathasan S, Turley SJ, Nickles D, et al. TGFβ attenuates tumour response to PD-L1 blockade by contributing to exclusion of T cells. Nature. 2018 Feb 22;554(7693):544-548.
- Kulski, J.K. Long Noncoding RNA HCP5, a Hybrid HLA Class I Endogenous Retroviral Gene: Structure, Expression, and Disease Associations. Cells 2019, 8, 480.
- Li, Y., Jiang, T., Zhou, W. et al. Pan-cancer characterization of immune-related lncRNAs identifies potential oncogenic biomarkers. Nat Commun 11, 1000 (2020).
- Sun J, Zhang Z, Bao S, et alIdentification of tumor immune infiltration-associated lncRNAs for improving prognosis and immunotherapy response of patients with non-small cell lung cancerJournal for ImmunoTherapy of Cancer 2020;8:e000110.
- Feng Q, Zhang H, Yao D, Chen WD, Wang YD. Emerging Role of Non-Coding RNAs in Esophageal Squamous Cell Carcinoma. Int J Mol Sci. 2019 Dec 30;21(1):258. doi: 10.3390/ijms21010258.
- Qin X, Zhou M, Lv H, Mao X, Li X, Guo H, Li L, Xing H. Long noncoding RNA LINC00657 inhibits cervical cancer development by sponging miR-20a-5p and targeting RUNX3. Cancer Lett. 2020 Oct
- Zhang XM, Wang J, Liu ZL, Liu H, Cheng YF, Wang T. LINC00657/miR-26a-5p/CKS2 ceRNA network promotes the growth of esophageal cancer cells via the MDM2/p53/Bcl2/Bax pathway. Biosci Rep. 2020;40(6):BSR20200525.
- Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature. 2014 Sep 11;513(7517):202-9. doi: 10.1038/nature13480.