7 Examples
7.1 From downloading data to TME
7.1.1 Data download
7.1.1.1 Prepare the SRR list
- Retrieve the SRR accessions for PRJNA1161405 from NCBI SRA: https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA1161405.
- Save the accessions (one per line) into PRJNA1161405.txt and upload it to path/to/PRJNA1161405/.
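Before downloading, it can help to confirm that the list really holds one run accession per line. The following is a minimal sketch, assuming every accession has the form SRR followed by digits. The function name check_srr_list is hypothetical and is not part of SRA Toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report lines that are not a bare SRR accession.
# Assumes one accession per line (e.g. SRR35344563); blank lines, stray
# whitespace, and Windows line endings are reported as invalid too.
check_srr_list() {
    local list_file="$1"
    # grep -v selects the non-matching lines; -n prefixes each with its line number.
    # grep exits 0 only when at least one such (invalid) line exists.
    if grep -vnE '^SRR[0-9]+$' "${list_file}"; then
        echo "Invalid lines found in ${list_file}" >&2
        return 1
    fi
    echo "OK: every line in ${list_file} looks like an SRR accession"
}
```

Usage: source the snippet, then run check_srr_list PRJNA1161405.txt from the directory that holds the list.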
7.1.1.2 High-speed download with prefetch
Requires the SRA Toolkit to be installed and on your PATH.
# (Optional) load/activate your environment
# module load sra-tools # or: conda activate sra-tools
cd path/to/PRJNA1161405
# Download all SRR runs listed in PRJNA1161405.txt into the current directory
prefetch -O ./ --option-file PRJNA1161405.txt
7.1.1.3 Convert .sra to FASTQ with fasterq-dump
This loop finds each run directory produced by prefetch and converts its .sra file to paired FASTQ files.
folder="path/to/PRJNA1161405"
cd "${folder}"
# After the cd, run directories are matched relative to the current directory
for dir in SRR*; do
    if [[ -d "${dir}" ]]; then
        dir_name="$(basename "${dir}")"
        input_file="${dir}/${dir_name}.sra"
        # -3 (--split-3): paired reads go to _1/_2 files, unpaired reads to a separate file
        # -p: show progress, -e 64: threads, -O .: write output to the current directory
        fasterq-dump -3 "${input_file}" -p -e 64 -O .
    fi
done
7.1.1.4 Multi-thread compression with pigz
Compress all .fastq files in the folder using 8 threads.
cd path/to/PRJNA1161405
for file in SRR*.fastq; do
    if [ -f "$file" ]; then
        # -p 8: use 8 compression threads
        pigz -p 8 "$file"
    fi
done
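After compression, each paired-end run should have both a _1 and a _2 file. The following small sketch spots runs whose second mate is missing. It assumes the files are named <run>_1.fastq.gz and <run>_2.fastq.gz, as produced by the steps above. The function name find_unpaired_runs is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print runs that have a _1 file but no _2 file.
# Assumes paired-end files named <run>_1.fastq.gz and <run>_2.fastq.gz.
# It does not detect a run whose _1 file is the one that is missing.
find_unpaired_runs() {
    local dir="${1:-.}" r1 run missing=0
    for r1 in "${dir}"/SRR*_1.fastq.gz; do
        [ -e "${r1}" ] || continue              # the glob matched nothing
        run="$(basename "${r1}" _1.fastq.gz)"
        if [ ! -f "${dir}/${run}_2.fastq.gz" ]; then
            echo "${run}"
            missing=1
        fi
    done
    # Exit status 0 when every _1 file has a mate, 1 otherwise
    return "${missing}"
}
```

Usage: find_unpaired_runs path/to/PRJNA1161405 prints one run accession per incomplete pair and prints nothing when all pairs are complete.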
7.1.1.5 (Optional) Direct downloads from ENA FTP with curl
If you prefer pulling FASTQ files directly from ENA:
#!/usr/bin/env bash
set -euo pipefail
# Normal samples
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/063/SRR35344563/SRR35344563_1.fastq.gz -o SRR35344563_GSM8516765_Normal4_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/063/SRR35344563/SRR35344563_2.fastq.gz -o SRR35344563_GSM8516765_Normal4_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/061/SRR35344561/SRR35344561_1.fastq.gz -o SRR35344561_GSM8516763_Normal2_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/061/SRR35344561/SRR35344561_2.fastq.gz -o SRR35344561_GSM8516763_Normal2_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/060/SRR35344560/SRR35344560_1.fastq.gz -o SRR35344560_GSM8516762_Normal1_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/060/SRR35344560/SRR35344560_2.fastq.gz -o SRR35344560_GSM8516762_Normal1_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/062/SRR35344562/SRR35344562_1.fastq.gz -o SRR35344562_GSM8516764_Normal3_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/062/SRR35344562/SRR35344562_2.fastq.gz -o SRR35344562_GSM8516764_Normal3_Homo_sapiens_RNA-Seq_2.fastq.gz
# HCC samples
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/068/SRR35344568/SRR35344568_1.fastq.gz -o SRR35344568_GSM8516770_HCC3_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/068/SRR35344568/SRR35344568_2.fastq.gz -o SRR35344568_GSM8516770_HCC3_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/069/SRR35344569/SRR35344569_1.fastq.gz -o SRR35344569_GSM8516771_HCC4_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/069/SRR35344569/SRR35344569_2.fastq.gz -o SRR35344569_GSM8516771_HCC4_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/070/SRR35344570/SRR35344570_1.fastq.gz -o SRR35344570_GSM8516772_HCC5_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/070/SRR35344570/SRR35344570_2.fastq.gz -o SRR35344570_GSM8516772_HCC5_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/071/SRR35344571/SRR35344571_1.fastq.gz -o SRR35344571_GSM8516773_HCC6_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/071/SRR35344571/SRR35344571_2.fastq.gz -o SRR35344571_GSM8516773_HCC6_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/072/SRR35344572/SRR35344572_1.fastq.gz -o SRR35344572_GSM8516774_HCC7_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/072/SRR35344572/SRR35344572_2.fastq.gz -o SRR35344572_GSM8516774_HCC7_Homo_sapiens_RNA-Seq_2.fastq.gz
# CLD samples
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/073/SRR35344573/SRR35344573_1.fastq.gz -o SRR35344573_GSM8516775_CLD1_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/073/SRR35344573/SRR35344573_2.fastq.gz -o SRR35344573_GSM8516775_CLD1_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/074/SRR35344574/SRR35344574_1.fastq.gz -o SRR35344574_GSM8516776_CLD2_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/074/SRR35344574/SRR35344574_2.fastq.gz -o SRR35344574_GSM8516776_CLD2_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/075/SRR35344575/SRR35344575_1.fastq.gz -o SRR35344575_GSM8516777_CLD3_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/075/SRR35344575/SRR35344575_2.fastq.gz -o SRR35344575_GSM8516777_CLD3_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/076/SRR35344576/SRR35344576_1.fastq.gz -o SRR35344576_GSM8516778_CLD4_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/076/SRR35344576/SRR35344576_2.fastq.gz -o SRR35344576_GSM8516778_CLD4_Homo_sapiens_RNA-Seq_2.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/077/SRR35344577/SRR35344577_1.fastq.gz -o SRR35344577_GSM8516779_CLD5_Homo_sapiens_RNA-Seq_1.fastq.gz
curl -L ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR353/077/SRR35344577/SRR35344577_2.fastq.gz -o SRR35344577_GSM8516779_CLD5_Homo_sapiens_RNA-Seq_2.fastq.gz
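The URLs above share one pattern: the first six characters of the accession, then a subdirectory made of 0 plus the last two digits, then the accession itself. The following sketch derives the URL for the 11-character accessions used here (SRR plus eight digits). The function name ena_fastq_url is hypothetical. Accessions of other lengths follow a different subdirectory rule, which this sketch rejects rather than guesses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the ENA FTP URL for one mate of a run.
# Only handles 11-character accessions (e.g. SRR35344563), matching the
# URLs listed above; anything else is rejected.
ena_fastq_url() {
    local acc="$1" mate="$2"
    if [[ ! "${acc}" =~ ^SRR[0-9]{8}$ ]]; then
        echo "unsupported accession: ${acc}" >&2
        return 1
    fi
    local prefix="${acc:0:6}"     # e.g. SRR353
    local subdir="0${acc: -2}"    # e.g. 063 (0 + last two digits)
    echo "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/${prefix}/${subdir}/${acc}/${acc}_${mate}.fastq.gz"
}

# Example loop, commented out so that sourcing this snippet downloads nothing.
# The output names here are simplified and omit the GSM/sample labels used above.
# for acc in SRR35344560 SRR35344561; do
#     for mate in 1 2; do
#         curl -L "$(ena_fastq_url "${acc}" "${mate}")" -o "${acc}_${mate}.fastq.gz"
#     done
# done
```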
7.1.2 From FASTQ to TME - runall
7.1.2.1 Salmon mode
iobrpy runall \
--mode salmon \
--outdir "/path/to/outdir" \
--fastq "/path/to/fastq" \
--threads 8 \
--batch_size 1 \
--index "/path/to/salmon/index" \
--project SRR
7.1.2.2 STAR mode
iobrpy runall \
--mode star \
--outdir "/path/to/outdir" \
--fastq "/path/to/fastq" \
--threads 8 \
--batch_size 1 \
--index "/path/to/star/index" \
--project SRR
7.2 TPM conversion
This page shows four common entry points to a TPM matrix and the final log2(x+1) transform you should apply after each path.
Quick rule of thumb
- Raw counts → TPM: use count2tpm.
- Salmon quant → TPM: use prepare_salmon.
- Gene-expression tables (e.g., arrays) → gene-level matrix: use anno_eset to map/aggregate to symbols.
- Mouse → Human: use mouse2human_eset to map symbols.
- After any of the above, run log2_eset.
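To make the final step concrete: log2(x+1) maps 0 to 0 and compresses the range of large TPM values. The following rough awk sketch shows the arithmetic. It assumes a comma-separated matrix with a header row, gene identifiers in the first column, and no quoted fields. It illustrates the transform only and makes no claim about how log2_eset is implemented. The function name log2p1_csv is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only: apply log2(x+1) to every value column of a CSV.
# Assumes a header row, gene identifiers in column 1, and unquoted fields.
log2p1_csv() {
    awk -F',' -v OFS=',' '
        NR == 1 { print; next }    # keep the header row unchanged
        {
            # awk log() is the natural log, so divide by log(2) to get log2
            for (i = 2; i <= NF; i++)
                $i = sprintf("%.4f", log($i + 1) / log(2))
            print
        }' "$1"
}
```

For example, a TPM of 3 becomes 2.0000 and a TPM of 0 stays 0.0000.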
7.2.1 From count matrix to TPM
# 1) counts → TPM
iobrpy count2tpm \
-i MyProj.STAR.count.tsv.gz \
-o TPM_matrix.csv \
--idtype ensembl \
--org hsa \
--remove_version
# (Optional) Add effective transcript lengths if available:
# --effLength_csv efflen.csv --id id --length eff_length --gene_symbol symbol
# 2) TPM → log2(x+1)
iobrpy log2_eset \
-i TPM_matrix.csv \
-o TPM_matrix.log2.csv
7.2.2 From Salmon matrix to TPM
# 1) Salmon TPM (gene/transcript) → cleaned gene-level TPM
iobrpy prepare_salmon \
-i MyProj_salmon_tpm.tsv.gz \
-o TPM_matrix.csv \
--return_feature symbol \
--remove_version
# 2) TPM → log2(x+1)
iobrpy log2_eset \
-i TPM_matrix.csv \
-o TPM_matrix.log2.csv
7.2.3 From gene-expression matrix to gene-level matrix with annotation (anno_eset)
Use when your input is an expression table that needs ID mapping / de-duplication (e.g., microarray probes → symbols, or TPM tables with mixed identifiers).
# Map/aggregate to symbols using a built-in annotation set
iobrpy anno_eset \
-i expression_matrix.csv \
-o expression_anno.csv \
--annotation anno_grch38 \
--symbol symbol \
--probe id \
--method mean \
--remove_version
# Alternative platform example:
# iobrpy anno_eset -i expression_matrix.csv -o expression_anno.csv \
# --annotation anno_hug133plus2 --symbol symbol --probe id --method mean
# if your input was already TPM-like, finish with log2(x+1)
iobrpy log2_eset \
-i expression_anno.csv \
-o expression_anno.log2.csv
7.2.4 Mouse → Human gene conversion (mouse2human_eset)
Two common modes:
# Matrix mode: rows = mouse gene symbols, columns = samples
iobrpy mouse2human_eset \
-i mouse_matrix.tsv \
-o human_matrix.tsv \
--is_matrix \
--verbose
# Table mode: has a symbol column (e.g., SYMBOL); will de-duplicate then map
iobrpy mouse2human_eset \
-i mouse_table.csv \
-o human_matrix.csv \
--column_of_symbol SYMBOL \
--verbose
# log2(x+1) after mapping (matrix-mode output shown; for table mode use human_matrix.csv)
iobrpy log2_eset \
-i human_matrix.tsv \
-o human_matrix.log2.tsv
7.3 From TPM to TME
This page takes a TPM matrix and runs downstream TME analyses: signature scoring, immune deconvolution (multiple methods), clustering, and ligand–receptor scoring.
7.3.1 Inputs
- TPM matrix: TPM_matrix.csv
- (Optional) log2 transform: if desired, apply:
iobrpy log2_eset \
-i TPM_matrix.csv \
-o TPM_matrix.log2.csv
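A quick sanity check before profiling: in a TPM matrix that has not been log-transformed, each sample column should sum to roughly one million, although gene-level filtering or aggregation can bring the total below that. The following sketch prints per-sample totals. It assumes a comma-separated file with a header row, gene identifiers in the first column, and no quoted fields. The function name tpm_column_sums is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<sample><TAB><column total>" for each sample.
# Assumes a header row, gene identifiers in column 1, and unquoted fields.
tpm_column_sums() {
    awk -F',' '
        NR == 1 {
            # remember the sample names from the header row
            for (i = 2; i <= NF; i++) name[i] = $i
            ncol = NF
            next
        }
        { for (i = 2; i <= NF; i++) total[i] += $i }
        END {
            for (i = 2; i <= ncol; i++)
                printf "%s\t%.1f\n", name[i], total[i]
        }' "$1"
}
```

Totals far from one million often mean the matrix holds raw counts or values that were already log-transformed.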
7.3.2 All-in-one TME profiling - tme_profile
tme_profile wraps the following functions into one command:
- Signature scoring → calculate_sig_score
- Immune deconvolution (six methods) → cibersort, IPS, estimate, mcpcounter, quantiseq, epic
- Ligand–receptor scoring → LR_cal
It also merges the deconvolution outputs into a single table.
Not included: deside and any clustering (tme_cluster, nmf).
Tip: You can either run each function step-by-step (see the sections below for individual commands and options), or use tme_profile to execute the full chain in one go.
7.3.2.1 Minimal usage
iobrpy tme_profile \
-i TPM_matrix.csv \
-o out/tme \
--threads 1
7.3.3 Immune deconvolution
Choose one or several methods below; each writes one result file.
7.3.3.1 CIBERSORT
iobrpy cibersort \
-i TPM_matrix.csv \
-o cibersort.csv \
--perm 100 \
--QN True \
--absolute False \
--abs_method sig.score \
--threads 1
7.3.3.2 quanTIseq
iobrpy quantiseq \
-i TPM_matrix.csv \
-o quantiseq.csv \
--signame TIL10 \
--method lsei \
--tumor \
--arrays \
--scale_mrna
7.3.3.3 EPIC
iobrpy epic \
-i TPM_matrix.csv \
-o epic.csv \
--reference TRef
7.3.3.4 ESTIMATE
iobrpy estimate \
-i TPM_matrix.csv \
-o estimate.csv \
--platform affymetrix
7.3.3.5 MCPcounter
iobrpy mcpcounter \
-i TPM_matrix.csv \
-o mcpcounter.csv \
--features HUGO_symbols
7.3.3.6 IPS
iobrpy IPS \
-i TPM_matrix.csv \
-o IPS.csv
7.3.3.7 DeSide
iobrpy deside \
--model_dir path/to/your/DeSide_model \
-i TPM_matrix.csv \
-o deside.csv \
--result_dir path/to/your/plot/folder \
--exp_type TPM \
--scaling_by_constant \
--transpose \
--print_info
7.3.4 TME clustering
You can cluster samples by cell fractions or signature scores.
7.3.4.1 k-means with KL index auto-k (recommended)
iobrpy tme_cluster \
-i cibersort.csv \
-o tme_cluster.csv \
--features 1:22 \
--id "ID" \
--min_nc 2 \
--max_nc 5 \
--print_result \
--scale
7.3.4.2 NMF clustering (auto-k, excluding k=2)
iobrpy nmf \
-i cibersort.csv \
-o path/to/your/result/folder \
--kmax 10 \
--features 1:22 \
--skip_k_2
7.3.5 Ligand–receptor scoring
Compute bulk ligand–receptor interaction scores from TPM:
iobrpy LR_cal \
-i TPM_matrix.csv \
-o LR_score.csv \
--data_type "tpm" \
--id_type "symbol" \
--cancer_type pancan \
--verbose