runall - From FASTQ to TME
runall
--mode {salmon|star} (required)
--outdir <DIR> (required): root output directory
--fastq <DIR> (required): forwarded to fastq_qc --path1_fastq
--threads <INT> (per-block): CPU/concurrency control, set via block-level flags (e.g., fastq_qc --num_threads, batch_salmon --num_threads, batch_star_count --num_threads, merge_salmon --num_processes, cibersort --threads, calculate_sig_score --parallel_size)
--batch_size <INT> (per-block): batch size, set via block-level flags (e.g., fastq_qc --batch_size, batch_salmon --batch_size, batch_star_count --batch_size)
--resume: skip steps whose outputs already exist
--dry_run: print planned commands without executing them
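The top-level flags above can be assembled programmatically, for example when launching runs from a notebook or a scheduler script. The following is a minimal sketch, not part of the tool: the helper name build_runall_argv and the executable name iobrpy are assumptions for illustration, and only flags listed above are used.

```python
def build_runall_argv(mode, outdir, fastq, resume=False, dry_run=False, exe="iobrpy"):
    """Assemble an argv list for runall from the top-level flags listed above.

    `exe` is an assumed entry-point name; adjust it to however the tool
    is installed on your system.
    """
    if mode not in ("salmon", "star"):
        raise ValueError(f"--mode must be 'salmon' or 'star', got {mode!r}")
    argv = [exe, "runall", "--mode", mode, "--outdir", outdir, "--fastq", fastq]
    if resume:
        argv.append("--resume")
    if dry_run:
        argv.append("--dry_run")
    return argv


argv = build_runall_argv("salmon", "results", "raw_fastq", dry_run=True)
print(" ".join(argv))
```

Starting with dry_run=True is a cheap way to inspect the planned commands; the same list can then be passed to subprocess.run(argv, check=True) without --dry_run.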
All-in-one TME profiling - tme_profile
tme_profile
-i/--input <CSV|TSV[.gz]> (required): TPM matrix (genes × samples)
-o/--output <DIR> (required): root output directory
--threads <int> (default: 1): threads for scoring/deconvolution
From FASTQ through FASTQ Quality Control and Salmon/STAR to TPM
fastq_qc
--path1_fastq <DIR> (required): raw FASTQ directory
--path2_fastp <DIR> (required): output directory for fastp results (01-qc/)
--num_threads <int> (default: 8)
--suffix1 <str> (default: _1.fastq.gz): forward read suffix
--batch_size <int> (default: 5)
--se: single-end mode
--length_required <int> (default: 50)
Notes: writes per-sample *_fastp.html/json; if multiqc is present, it is invoked automatically and also writes 01-qc/multiqc_report/multiqc_fastp_report.html.
Salmon mode
batch_salmon
--index <DIR> (required): salmon index
--path_fq <DIR> (required): directory of FASTQs (after fastq_qc)
--path_out <DIR> (required): output root (e.g., 02-salmon/)
--suffix1 <str> (default: _1.fastq.gz)
--batch_size <int> (default: 1): concurrent samples (processes)
--num_threads <int> (default: 8): threads per salmon process
--gtf <FILE>: optional GTF passed to salmon -g for gene-level quantification
Behavior: safe R1-to-R2 inference; per-sample task.complete; progress reporting; preflight prints the salmon version and index metadata keys.
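The R1-to-R2 inference mentioned above is not specified further here. One plausible reading of "safe" is that only the trailing read suffix is rewritten, so a digit 1 inside the sample name is never touched. The sketch below illustrates that idea under this assumption; infer_r2 is a hypothetical helper, not the tool's own function.

```python
from pathlib import Path


def infer_r2(r1_path, suffix1="_1.fastq.gz"):
    """Return the mate (R2) path for an R1 file by rewriting only the suffix."""
    r1_path = Path(r1_path)
    name = r1_path.name
    if not name.endswith(suffix1):
        raise ValueError(f"{name} does not end with {suffix1!r}")
    if "1" not in suffix1:
        raise ValueError("suffix1 must contain a '1' to derive the R2 suffix")
    # Swap the first '1' of the suffix: _1.fastq.gz -> _2.fastq.gz,
    # _R1_001.fastq.gz -> _R2_001.fastq.gz.
    suffix2 = suffix1.replace("1", "2", 1)
    sample = name[: -len(suffix1)]
    return r1_path.with_name(sample + suffix2)


print(infer_r2("fq/S1_1.fastq.gz"))
```

A naive str.replace on the whole filename would turn S1_1.fastq.gz into S2_1.fastq.gz, which is why the sample name is split off before the suffix is rewritten.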
merge_salmon
--path_salmon <DIR> (required): root containing per-sample salmon outputs (searched recursively)
--project <STR> (required): prefix for outputs
--num_processes <int> (default: CPU count): I/O threads
Output: <project>_salmon_tpm.tsv.gz and <project>_salmon_count.tsv.gz under --path_salmon, with a progress display and a head preview.
prepare_salmon
-i/--input <TSV|TSV.GZ> (required): Salmon-combined gene TPM table
-o/--output <CSV/TSV> (required): cleaned TPM matrix (genes × samples)
-r/--return_feature {ENST|ENSG|symbol} (default: symbol): which identifier to keep
--remove_version: strip version suffix from gene IDs (e.g., ENSG000001.12 to ENSG000001)
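The effect of --remove_version can be illustrated with a short regular expression. This is a minimal sketch of the transformation described above, not the tool's code; strip_version is a hypothetical name.

```python
import re

# A trailing dot followed by digits, e.g. ".12" in "ENSG000001.12".
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(gene_id):
    """Drop a trailing numeric version suffix, leaving other IDs unchanged."""
    return _VERSION_SUFFIX.sub("", gene_id)


for gene_id in ["ENSG000001.12", "ENSG000001", "TP53"]:
    print(gene_id, "->", strip_version(gene_id))
```

Because the pattern is anchored at the end of the string, IDs that carry extra tokens after the version are left as they are and would need separate handling.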
STAR mode
batch_star_count
--index <DIR> (required): STAR genomeDir
--path_fq <DIR> (required): directory of FASTQs (after fastq_qc)
--path_out <DIR> (required): outputs (e.g., 02-star/)
--suffix1 <str> (default: _1.fastq.gz)
--batch_size <int> (default: 1)
--num_threads <int> (default: 8)
Notes: generates a sorted BAM and _ReadsPerGene.out.tab per sample, plus a summary of paths.
merge_star_count
--path <DIR> (required): directory containing multiple *_ReadsPerGene.out.tab
--project <STR> (required): output prefix
Output: <project>.STAR.count.tsv.gz (gzipped TSV with gene IDs as rows and samples as columns)
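To make the merge step concrete, the sketch below combines several ReadsPerGene.out.tab tables into one genes × samples table. STAR writes four tab-separated columns (gene ID, unstranded counts, and counts for each strand) preceded by four N_ summary rows. Which count column merge_star_count reads is not stated here, so the default of the unstranded column is an assumption, as are the function names.

```python
import csv
import io


def read_counts(handle, column=1):
    """Parse one ReadsPerGene.out.tab stream into {gene_id: count}.

    column=1 selects the unstranded counts (an assumption for this sketch).
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):  # skip N_unmapped, N_multimapping, ...
            continue
        counts[row[0]] = int(row[column])
    return counts


def merge_counts(per_sample):
    """Combine {sample: {gene: count}} into a header and gene rows."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    header = ["gene_id"] + samples
    # Genes missing from a sample are filled with 0.
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return header, rows


summary = "N_unmapped\t5\t5\t5\nN_multimapping\t1\t1\t1\nN_noFeature\t2\t9\t3\nN_ambiguous\t0\t0\t0\n"
s1 = io.StringIO(summary + "ENSG01\t10\t0\t10\nENSG02\t3\t1\t2\n")
s2 = io.StringIO(summary + "ENSG01\t7\t7\t0\n")
header, rows = merge_counts({"S1": read_counts(s1), "S2": read_counts(s2)})
print(header)
print(rows)
```

For stranded libraries, passing column=2 or column=3 to read_counts selects the matching strand-specific counts instead.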
count2tpm
-i/--input <CSV/TSV[.gz]> (required): raw count matrix (genes × samples)
-o/--output <CSV/TSV> (required): output TPM matrix
--effLength_csv <CSV>: optional effective-length file with columns id, eff_length, symbol
--idtype {ensembl|entrez|symbol|mgi} (default: ensembl)
--org {hsa|mmus} (default: hsa)
--id <str> (default: id): ID column name in --effLength_csv
--length <str> (default: eff_length): length column
--gene_symbol <str> (default: symbol): gene symbol column
--check_data: check and drop missing/invalid entries before conversion
--remove_version: strip version suffix from gene IDs
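The count-to-TPM conversion follows the standard definition: divide each gene's count by its effective length, then rescale the resulting rates so that they sum to one million within a sample. The sketch below applies that formula to a single sample. It is a simplified illustration with hypothetical names; dropping genes that lack a positive length is an assumption loosely modeled on --check_data.

```python
def counts_to_tpm(counts, eff_length):
    """Convert {gene: count} for one sample into {gene: TPM}."""
    # Reads per base; genes without a positive effective length are dropped.
    rates = {
        gene: count / eff_length[gene]
        for gene, count in counts.items()
        if eff_length.get(gene, 0) > 0
    }
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in rates}
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


tpm = counts_to_tpm({"A": 10, "B": 20, "C": 5}, {"A": 1000, "B": 2000})
print(tpm)
```

In this example gene B has twice the reads of gene A but is also twice as long, so both end up with the same TPM; gene C has no length entry and is left out.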
(Optional) Mouse to Human symbol mapping
mouse2human_eset
-i/--input <CSV|TSV|TXT[.gz]> (required): input expression matrix or table
-o/--output <CSV|TSV|TXT[.gz]> (required): converted matrix indexed by human symbols (genes × samples)
--is_matrix: treat input as a matrix (rows = mouse gene symbols, columns = samples); if omitted, runs in table mode
--column_of_symbol <str> (required in table mode): column name that contains mouse gene symbols
--sep <,|\t>: override input separator; if omitted, inferred from the input file extension
--out_sep <,|\t>: override output separator; if omitted, inferred from the output file extension
--verbose: print shapes and basic run info
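Several commands in this reference infer the field separator from the file extension unless an override is given. The exact mapping is not documented here; the sketch below assumes .csv means comma and .tsv or .txt means tab, with an optional .gz ending ignored. infer_sep is a hypothetical helper.

```python
def infer_sep(path, override=None):
    """Pick a field separator from a file name unless one is given explicitly."""
    if override is not None:
        return override
    name = path.lower()
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # look at the extension under the compression
    if name.endswith(".csv"):
        return ","
    if name.endswith((".tsv", ".txt")):  # assumed to be tab-delimited
        return "\t"
    raise ValueError(f"cannot infer separator from {path!r}; pass one explicitly")


print(repr(infer_sep("tpm_matrix.csv.gz")))
print(repr(infer_sep("tpm_matrix.txt")))
```

Raising on unknown extensions, instead of silently guessing, makes a wrong delimiter show up as an immediate error and not as a one-column table later on.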
(Optional) Annotate / de‑duplicate
anno_eset
-i/--input <CSV/TSV/TXT> (required)
-o/--output <CSV/TSV/TXT> (required)
--annotation {anno_hug133plus2|anno_rnaseq|anno_illumina|anno_grch38} (required unless using external file)
--annotation-file <pkl/csv/tsv/xlsx>: external annotation (overrides built-in)
--annotation-key <str>: key to pick a table if the external .pkl stores a dict of DataFrames
--symbol <str> (default: symbol): column used as gene symbol
--probe <str> (default: id): column used as probe/feature ID
--method {mean|sd|sum} (default: mean): duplicate-ID aggregation
--remove_version: strip version suffix from gene IDs
Signature scoring
calculate_sig_score
-i/--input <CSV/TSV/TXT> (required)
-o/--output <CSV/TSV/TXT> (required)
--signature <one or more groups> (required; space- or comma-separated; all uses every group)
Groups: go_bp, go_cc, go_mf, signature_collection, signature_tme, signature_sc, signature_tumor, signature_metabolism, kegg, hallmark, reactome
--method {pca|zscore|ssgsea|integration} (default: pca)
--mini_gene_count <int> (default: 3)
--adjust_eset: apply extra filtering after log2 transform
--parallel_size <int> (default: 1): threads for scoring (PCA/zscore/ssGSEA)
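The --signature value accepts space-separated tokens, comma-separated tokens, or the keyword all. The sketch below shows one way such input could be expanded into a clean list of group names. It mirrors the description above, but the function itself is hypothetical and the handling of unknown names is an assumption.

```python
SIGNATURE_GROUPS = [
    "go_bp", "go_cc", "go_mf", "signature_collection", "signature_tme",
    "signature_sc", "signature_tumor", "signature_metabolism",
    "kegg", "hallmark", "reactome",
]


def expand_signature_groups(tokens):
    """Expand command-line tokens such as ['go_bp,kegg', 'hallmark'] into group names."""
    names = [name for token in tokens for name in token.split(",") if name]
    if "all" in names:
        return list(SIGNATURE_GROUPS)
    unknown = [name for name in names if name not in SIGNATURE_GROUPS]
    if unknown:
        raise ValueError(f"unknown signature group(s): {', '.join(unknown)}")
    # Keep first-seen order and drop duplicates.
    return list(dict.fromkeys(names))


print(expand_signature_groups(["go_bp,kegg", "hallmark"]))
```

Rejecting unknown names up front avoids running a long scoring job that silently skips a misspelled group.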
Deconvolution / scoring
cibersort
-i/--input <CSV/TSV> (required)
-o/--output <CSV/TSV> (required)
--perm <int> (default: 100)
--QN <True|False> (default: True): quantile normalization
--absolute <True|False> (default: False): absolute mode
--abs_method {sig.score|no.sumto1} (default: sig.score)
--threads <int> (default: 1)
Output: columns are suffixed with _CIBERSORT, index name is ID, separator inferred from output extension.
quantiseq
-i/--input <CSV/TSV> (required; genes × samples)
-o/--output <TSV> (required)
--arrays: perform quantile normalization for arrays
--signame <str> (default: TIL10)
--tumor: remove genes highly expressed in tumors
--scale_mrna: enable mRNA scaling (otherwise raw signature proportions)
--method {lsei|hampel|huber|bisquare} (default: lsei)
--rmgenes <str> (default: unassigned; allowed: default, none, or comma-separated list)
epic
-i/--input <CSV/TSV> (required; genes × samples)
-o/--output <CSV/TSV> (required)
--reference {TRef|BRef|both} (default: TRef)
estimate
-i/--input <CSV/TSV/TXT> (required; genes × samples)
-p/--platform {affymetrix|agilent|illumina} (default: affymetrix)
-o/--output <CSV/TSV/TXT> (required)
Output is transposed; columns are suffixed with _estimate; index label is ID; separator inferred from extension.
mcpcounter
-i/--input <TSV> (required; genes × samples)
-f/--features {affy133P2_probesets|HUGO_symbols|ENTREZ_ID|ENSEMBL_ID} (required)
-o/--output <CSV/TSV> (required)
Output: columns are suffixed with _MCPcounter; index label is ID; separator inferred from extension.
IPS
-i/--input <matrix> (required)
-o/--output <file> (required)
No extra flags (the expression matrix yields IPS sub-scores and a total score).
deside (deep learning–based deconvolution)
-m/--model_dir <dir> (required): path to the pre-downloaded DeSide model directory
-i/--input <CSV/TSV> (required): rows = genes, columns = samples
-o/--output <CSV> (required)
--exp_type {TPM|log_space|linear} (default: TPM). TPM: already log2 processed; log_space: log2(TPM+1); linear: linear space (TPM/counts)
--gmt <file1.gmt file2.gmt ...>: one or more optional GMT files for pathway masking
--method_adding_pathway {add_to_end|convert} (default: add_to_end)
--scaling_by_constant, --scaling_by_sample, --one_minus_alpha: optional scaling/transforms
--print_info: verbose logs
--add_cell_type: append predicted cell-type labels
--transpose: use if your file is samples × genes
-r/--result_dir <dir>: optional directory to save result plots/logs
Clustering / decomposition
tme_cluster
-i/--input <CSV/TSV/TXT> (required): input table for clustering. Expected shape: first column = sample ID (use --id if it is not the first column), remaining columns = features.
-o/--output <CSV/TSV/TXT> (required): output file for clustering results.
--features <spec>: select feature columns by 1-based inclusive range, e.g. 1:22 (intended for CIBERSORT outputs; exclude the sample ID column when counting).
--pattern <regex>: alternatively, select features by a regex on column names (e.g. ^CD8|^NK).
Tip: use one of --features or --pattern.
--id <str> (default: first column): column name containing sample IDs.
--scale / --no-scale: toggle z-score scaling of features (default according to the help text: True).
--min_nc <int> (default: 2): minimum number of clusters to try.
--max_nc <int> (default: 6): maximum number of clusters to try.
--max_iter <int> (default: 10): maximum iterations for k-means.
--tol <float> (default: 1e-4): convergence tolerance for centroid updates.
--print_result: print intermediate KL scores and cluster counts.
--input_sep <str> (default: auto): input delimiter (e.g. , or \t); auto-detected if unset.
--output_sep <str> (default: auto): output delimiter; inferred from filename if unset.
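The --features range is 1-based, inclusive, and counted without the sample ID column, which is easy to get wrong by one. The sketch below shows how a spec such as 1:3 maps onto column names under those rules. select_features is a hypothetical helper, not the tool's parser.

```python
def select_features(columns, spec, id_col=None):
    """Resolve a 1-based inclusive 'start:end' spec against the non-ID columns."""
    id_col = columns[0] if id_col is None else id_col
    features = [col for col in columns if col != id_col]
    start, end = (int(part) for part in spec.split(":"))
    if not 1 <= start <= end <= len(features):
        raise ValueError(f"range {spec!r} is outside 1:{len(features)}")
    # Convert the 1-based inclusive range to a Python slice.
    return features[start - 1:end]


columns = ["ID", "B_cells_CIBERSORT", "T_cells_CIBERSORT", "NK_cells_CIBERSORT", "RMSE_CIBERSORT"]
print(select_features(columns, "1:3"))
```

Here 1:3 picks the three cell-type columns and leaves out the trailing quality column, which is the typical reason to restrict the range for deconvolution outputs.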
nmf
-i/--input <CSV/TSV> (required): matrix to factorize; first column should be sample names (index).
-o/--output <DIR> (required): directory to save results.
--kmin <int> (default: 2): minimum k (inclusive).
--kmax <int> (default: 8): maximum k (inclusive).
--features <spec>: 1-based inclusive selection of feature columns (e.g. 2-10 or 1:5), typically cell-type columns.
--log1p: apply log1p to the input (useful for counts).
--normalize: L1 row normalization (each sample sums to 1).
--shift <float> (default: None): if data contain negatives, add a constant to make all values non-negative.
--random-state <int> (default: 42): random seed for NMF.
--max-iter <int> (default: 1000): NMF max iterations.
--skip_k_2: skip evaluating k = 2 when searching for the best k.
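NMF requires a non-negative matrix, which is what --shift, --log1p, and --normalize prepare. The sketch below applies the three steps to one sample row. The order used here (shift, then log1p, then L1 normalization) is an assumption, since the reference does not state it, and preprocess_row is a hypothetical name.

```python
import math


def preprocess_row(values, shift=None, log1p=False, normalize=False):
    """Prepare one sample row for NMF: optional shift, log1p, and L1 normalization."""
    row = [float(v) for v in values]
    if shift is not None:
        row = [v + shift for v in row]
    if min(row) < 0:
        raise ValueError("NMF needs non-negative input; consider a larger shift")
    if log1p:
        row = [math.log1p(v) for v in row]
    if normalize:
        total = sum(row)
        if total > 0:  # leave all-zero rows unchanged
            row = [v / total for v in row]
    return row


row = preprocess_row([-1, 1, 2], shift=1, normalize=True)
print(row)
```

With a shift of 1 the row becomes 0, 2, 3, and L1 normalization then rescales it so that the sample sums to 1.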
Ligand–receptor
LR_cal
-i/--input <CSV/TSV> (required): expression matrix (genes × samples).
-o/--output <CSV/TSV> (required): file to save LR scores.
--data_type {count|tpm} (default: tpm): type of the input matrix.
--id_type <str> (default: ensembl): gene ID type expected by the LR backend. Choices: ensembl, entrez, symbol, mgi.
--cancer_type <str> (default: pancan): cancer-type network to use.
--verbose: verbose logging.