Chapter 9 TME and genomic interaction
9.2 Genomic data preparation
In this section, we use the MAF data of the TCGA-STAD cohort as an example dataset. It is available from several sources; this chapter shows a few ways to obtain it.
9.2.1 Using TCGAbiolinks to download genomic data
The Bioconductor package TCGAbiolinks provides an R interface to the GDC data portal, which hosts up-to-date TCGA data. You can check for and install this package with the following code.
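A minimal sketch of the check-and-install step, assuming the package is installed from Bioconductor through BiocManager. The helper `install_if_missing()` is hypothetical (it is not part of any package) and only wraps the usual `requireNamespace()` check.

```r
# Install a Bioconductor package only when it is missing.
# `install_if_missing` is a hypothetical helper, not part of any package.
install_if_missing <- function(pkg, installer = BiocManager::install) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))  # already installed, nothing to do
  }
  installer(pkg)
  invisible(TRUE)
}

# Run the installation only in an interactive session, so that sourcing
# this file from a script never triggers a download.
if (interactive()) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  install_if_missing("TCGAbiolinks")
}
```

Because R evaluates default arguments lazily, `BiocManager::install` is only looked up when a package is actually missing.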
Then you can query, download and prepare the required dataset.
library(TCGAbiolinks)

query <- GDCquery(
  project       = "TCGA-STAD",
  data.category = "Simple Nucleotide Variation",
  access        = "open",
  data.type     = "Masked Somatic Mutation",
  workflow.type = "Aliquot Ensemble Somatic Variant Merging and Masking"
)
GDCdownload(query)
maf <- GDCprepare(query)
This maf object is a data.frame; you can use read.maf() from the R package maftools to convert it to a MAF object.
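The conversion can be sketched as follows. The column list reflects the mandatory MAF fields named in the maftools documentation; `as_maf_object()` is a hypothetical helper added here only to fail early with a readable message, and the final call is skipped when `maf` or maftools is not available.

```r
# Columns that maftools::read.maf() expects in a MAF table.
required_cols <- c(
  "Hugo_Symbol", "Chromosome", "Start_Position", "End_Position",
  "Reference_Allele", "Tumor_Seq_Allele2",
  "Variant_Classification", "Variant_Type", "Tumor_Sample_Barcode"
)

# Hypothetical helper: check the columns, then convert to a MAF object.
as_maf_object <- function(maf_df) {
  missing_cols <- setdiff(required_cols, colnames(maf_df))
  if (length(missing_cols) > 0) {
    stop("MAF table lacks columns: ", paste(missing_cols, collapse = ", "))
  }
  maftools::read.maf(maf = maf_df)
}

# `maf` is assumed to be the data.frame returned by GDCprepare().
if (exists("maf") && requireNamespace("maftools", quietly = TRUE)) {
  maf_obj <- as_maf_object(maf)
  print(maf_obj)
}
```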
In this example, we extract the SNPs from the TCGA-STAD MAF file and then transform them into a non-negative matrix.
## [1] "maf"
## -Validating
## -Silent variants: 45460
## -Summarizing
## --Possible FLAGS among top ten genes:
## TTN
## MUC16
## SYNE1
## FLG
## -Processing clinical data
## --Missing clinical data
## -Finished in 8.120s elapsed (2.130s cpu)
##        Frame_Shift_Del        Frame_Shift_Ins           In_Frame_Del
##                  21547                   4526                   1196
##           In_Frame_Ins      Missense_Mutation      Nonsense_Mutation
##                    106                 102137                   5669
##       Nonstop_Mutation            Splice_Site Translation_Start_Site
##                    117                   2242                    107
##    DEL    INS    ONP    SNP
##  22997   4675     10 109965
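To make the transformation concrete, here is a minimal base-R sketch on a toy MAF-like table. It is not the code that produced the output above (IOBR ships a helper, make_mut_matrix(), which is presumably what was used there); the toy values are invented. The orientation, with samples in rows and genes in columns, is an assumption.

```r
# Toy MAF-like table; the values are illustrative, not TCGA data.
toy_maf <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S1", "S2", "S2", "S3", "S3"),
  Hugo_Symbol          = c("TP53", "TTN", "TP53", "PIK3CA", "TTN", "TTN"),
  Variant_Type         = c("SNP", "SNP", "DEL", "SNP", "SNP", "SNP"),
  stringsAsFactors     = FALSE
)

# Keep single-nucleotide variants only.
snp <- toy_maf[toy_maf$Variant_Type == "SNP", ]

# Count SNPs per sample (rows) and gene (columns).
# Fixing the factor levels keeps samples or genes without any SNP
# in the matrix as all-zero rows or columns.
counts <- table(
  factor(snp$Tumor_Sample_Barcode, levels = unique(toy_maf$Tumor_Sample_Barcode)),
  factor(snp$Hugo_Symbol, levels = sort(unique(toy_maf$Hugo_Symbol)))
)

# Drop the "table" class to get a plain matrix of non-negative counts.
mut_toy <- unclass(counts)
names(dimnames(mut_toy)) <- NULL
mut_toy
```

Each entry is the number of SNPs a sample carries in a gene, so the matrix is non-negative by construction. A binary version can be obtained with `mut_toy > 0`.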
9.3 Identifying Mutations Associated with TME
The TME signature scores computed from the TCGA-STAD expression matrix are merged with the mutation matrix. The Cuzick or Wilcoxon test is then used to identify mutations associated with microenvironmental factors. CD_8_T_effector is the target signature in this example.
library(IOBR)

# `mut` is the SNP mutation matrix prepared in section 9.2
data("tcga_stad_sig", package = "IOBR")
res <- find_mutations(
  mutation_matrix     = mut,
  signature_matrix    = tcga_stad_sig,
  id_signature_matrix = "ID",
  signature           = "CD_8_T_effector",
  palette             = "nrc",
  min_mut_freq        = 0.01,
  plot                = TRUE,
  jitter              = TRUE,
  point.alpha         = 0.25
)
## [1] ">>>> Result of Cuzick Test"
##             p.value  names statistic adjust_pvalue
## PIK3CA 6.279087e-10 PIK3CA  6.183261  3.139543e-07
## PLXNA4 8.196539e-05 PLXNA4  3.938580  2.049135e-02
## DMD    3.514677e-04    DMD  3.574075  3.882570e-02
## SPEG   3.665757e-04   SPEG  3.563047  3.882570e-02
## AHNAK2 3.882570e-04 AHNAK2  3.547940  3.882570e-02
## TCHH   6.571311e-04   TCHH  3.406867  5.476093e-02
## ABCC9  1.043516e-03  ABCC9  3.278524  7.453684e-02
## ANK3   1.470480e-03   ANK3  3.180447  7.713409e-02
## WDFY3  1.690248e-03  WDFY3  3.139867  7.713409e-02
## LRP1   1.762002e-03   LRP1  3.127666  7.713409e-02
## [1] ">>> Result of Wilcoxon test (top 10)"
##             p.value  names statistic adjust_pvalue
## PIK3CA 3.045532e-11 PIK3CA      4006  1.522766e-08
## TCHH   2.585594e-05   TCHH      2929  6.463985e-03
## PLXNA4 5.257043e-05 PLXNA4      2904  7.754311e-03
## LRP1   6.203449e-05   LRP1      2330  7.754311e-03
## RNF213 1.066555e-04 RNF213      3827  1.066555e-02
## SPEG   1.809850e-04   SPEG      1854  1.439761e-02
## WDFY3  2.420595e-04  WDFY3      2626  1.439761e-02
## DMD    2.430706e-04    DMD      4995  1.439761e-02
## ANK3   2.591569e-04   ANK3      3845  1.439761e-02
## AHNAK2 3.456706e-04 AHNAK2      4828  1.728353e-02
## All mutation types: mut.
## Warning: You defined `cell_fun` for a heatmap with more than 100 rows or
## columns, which might be very slow to draw. Consider to use the
## vectorized version `layer_fun`.
## All mutation types: mut.
## Warning: You defined `cell_fun` for a heatmap with more than 100 rows or
## columns, which might be very slow to draw. Consider to use the
## vectorized version `layer_fun`.
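The per-gene comparison behind the Wilcoxon table can be illustrated with base R on simulated data. This is a sketch under stated assumptions, not the internals of find_mutations(): the data are simulated, the gene names are placeholders, the frequency filter mirrors the `min_mut_freq` argument only in spirit, and the Benjamini-Hochberg adjustment is an assumption (it appears consistent with the adjusted values shown above). The Cuzick trend test is not reproduced here.

```r
set.seed(1)

# Simulated data: 40 samples, a continuous signature score and a 0/1
# mutation matrix (samples in rows, genes in columns).
n <- 40
mut_bin <- matrix(
  rbinom(n * 3, size = 1, prob = 0.3),
  nrow = n,
  dimnames = list(paste0("S", seq_len(n)), c("GENE_A", "GENE_B", "GENE_C"))
)
score <- rnorm(n) + 1.5 * mut_bin[, "GENE_A"]  # GENE_A shifts the score

# Keep genes mutated in at least `min_mut_freq` of the samples.
min_mut_freq <- 0.01
keep <- colMeans(mut_bin > 0) >= min_mut_freq
mut_bin <- mut_bin[, keep, drop = FALSE]

# Wilcoxon rank-sum test per gene: mutated vs. wild-type samples.
p_values <- apply(mut_bin, 2, function(status) {
  wilcox.test(score[status > 0], score[status == 0])$p.value
})

# Collect the results and adjust for multiple testing (BH, assumed).
res_toy <- data.frame(
  names         = names(p_values),
  p.value       = p_values,
  adjust_pvalue = p.adjust(p_values, method = "BH")
)
res_toy <- res_toy[order(res_toy$p.value), ]
res_toy
```

With real data, the rows of the mutation matrix and the signature table must first be matched by sample ID before any test is run.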
9.6 Other methods to obtain genomic data
9.6.1 Using TCGAmutations
As its name suggests, the R package TCGAmutations provides pre-compiled, curated somatic mutations from 33 TCGA cohorts, along with relevant clinical information for all sequenced samples. You can install it in much the same way as TCGAbiolinks.
if (!requireNamespace("TCGAmutations", quietly = TRUE))
  BiocManager::install("PoisonAlien/TCGAmutations")
It's quite simple to use:
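A minimal sketch of that usage, assuming the tcga_available() and tcga_load() functions exported by TCGAmutations. `load_tcga_maf()` is a hypothetical wrapper that returns NULL when the package is not installed, so the snippet can be sourced safely.

```r
# Hypothetical wrapper around TCGAmutations::tcga_load().
# Returns a MAF object, or NULL when TCGAmutations is not installed.
load_tcga_maf <- function(study) {
  if (!requireNamespace("TCGAmutations", quietly = TRUE)) {
    message("TCGAmutations is not installed; returning NULL")
    return(NULL)
  }
  TCGAmutations::tcga_load(study = study)
}

# List the available cohorts, then load the STAD cohort.
if (requireNamespace("TCGAmutations", quietly = TRUE)) {
  print(TCGAmutations::tcga_available())
}
maf <- load_tcga_maf("STAD")
```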
9.6.2 Using maftools
If you already use the R package maftools, you can access and load the data in a similar way (TCGAmutations and maftools were written by the same author).
# The repo can be changed from "github" to "gitee",
# which may be faster in mainland China
maftools::tcgaAvailable(repo = "github")
maftools::tcgaLoad("STAD", repo = "github")
MAF data transformation
To prepare the data for downstream analysis, we need to extract the SNV data from the MAF object and transform it into a non-negative matrix.
9.7 References
Wang et al. (2019). The UCSCXenaTools R package: a toolkit for accessing genomics data from UCSC Xena platform, from cancer multi-omics to single-cell RNA-seq. Journal of Open Source Software, 4(40), 1627. https://doi.org/10.21105/joss.01627
Gu, Z. (2022). Complex Heatmap Visualization. iMeta.
Mayakonda, A. et al. (2018). Maftools: efficient and comprehensive analysis of somatic variants in cancer. Genome Research.