4 From FASTQ to TME — runall

4.1 How `runall` passes options

runall defines a small set of top-level options (e.g., --mode, --outdir, --fastq, --threads, --batch_size). Any option it does not recognize is forwarded to the sub-step that accepts it, so runall stays flexible as the sub-commands evolve.
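
The forwarding pattern described above can be illustrated with Python's standard argparse module. This is only a minimal sketch of the general technique, not the actual iobrpy implementation: the function name split_options is hypothetical, and the sketch does not route each forwarded option to a specific sub-step.

```python
import argparse

def split_options(argv):
    """Separate the top-level options from everything else on the command line."""
    parser = argparse.ArgumentParser(prog="runall-sketch", add_help=False)
    parser.add_argument("--mode", choices=["salmon", "star"], required=True)
    parser.add_argument("--outdir", required=True)
    parser.add_argument("--fastq", required=True)
    parser.add_argument("--threads", type=int, default=1)
    parser.add_argument("--batch_size", type=int, default=1)
    # parse_known_args returns the unrecognized arguments instead of raising an error
    known, forwarded = parser.parse_known_args(argv)
    return known, forwarded

known, forwarded = split_options([
    "--mode", "salmon", "--outdir", "out", "--fastq", "fq", "--threads", "8",
    "--index", "/path/to/salmon/index", "--project", "MyProj",
])
print(known.mode, known.threads)
print(forwarded)  # ['--index', '/path/to/salmon/index', '--project', 'MyProj']
```

The leftover list is what a wrapper would hand on to its sub-commands.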

Below are two fully wired workflows handled by iobrpy runall.

4.2 Salmon mode

iobrpy runall \
  --mode salmon \
  --outdir "/path/to/outdir" \
  --fastq "/path/to/fastq" \
  --threads 8 \
  --batch_size 1 \
  --index "/path/to/salmon/index" \
  --project MyProj

4.3 STAR mode

iobrpy runall \
  --mode star \
  --outdir "/path/to/outdir" \
  --fastq "/path/to/fastq" \
  --threads 8 \
  --batch_size 1 \
  --index "/path/to/star/index" \
  --project MyProj

4.4 Option legend for the `runall` examples

4.4.1 Common options

--mode {salmon|star} — Select backend (Salmon quant vs. STAR align+count)
--outdir <DIR> — Root output directory (creates the standardized layout)
--fastq <DIR> — Raw FASTQ dir, forwarded to fastq_qc --path1_fastq
--threads <INT> / --batch_size <INT> — Global concurrency / batching
--resume — Skip steps whose outputs already exist
--dry_run — Print planned commands without executing
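
When runall is launched from a script, the common options above can be assembled into an argument list. The helper below is a hypothetical convenience (build_runall_command is not part of iobrpy); it only uses the flags listed in this section and passes everything else through unchanged.

```python
import shlex

def build_runall_command(mode, outdir, fastq, threads=8, batch_size=1,
                         resume=False, dry_run=False, extra=()):
    """Assemble an `iobrpy runall` argument list from the common options."""
    if mode not in ("salmon", "star"):
        raise ValueError("mode must be 'salmon' or 'star'")
    cmd = ["iobrpy", "runall", "--mode", mode, "--outdir", str(outdir),
           "--fastq", str(fastq), "--threads", str(threads),
           "--batch_size", str(batch_size)]
    if resume:
        cmd.append("--resume")
    if dry_run:
        cmd.append("--dry_run")
    # Mode-specific or sub-step options are appended as given
    cmd.extend(extra)
    return cmd

cmd = build_runall_command(
    "salmon", "/path/to/outdir", "/path/to/fastq", dry_run=True,
    extra=["--index", "/path/to/salmon/index", "--project", "MyProj"],
)
# Print a shell-quoted version of the command for inspection
print(shlex.join(cmd))
```

The list can then be passed to subprocess.run(cmd, check=True). Starting with dry_run=True is a cheap way to review the planned commands before committing compute time.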

4.4.2 Salmon-only

--index <DIR> — Salmon index for batch_salmon
--project <STR> — Prefix for merged outputs in merge_salmon
--return_feature {symbol|ENSG|ENST} — Output gene ID type in prepare_salmon
--remove_version — Strip version suffix in prepare_salmon

4.4.3 STAR-only

--index <DIR> — STAR genomeDir for batch_star_count
--project <STR> — Prefix for merged counts in merge_star_count
--idtype {ensembl|entrez|symbol|mgi} — Gene ID type for count2tpm
--org {hsa|mmus} — Organism for count2tpm
--remove_version — Strip version suffix before count2tpm

4.4.4 Signature scoring

--method {integration|pca|zscore|ssgsea} — Scoring method for calculate_sig_score
--signature <set> — Which signature set to use (all, etc.)
--mini_gene_count <INT> — Minimum genes per signature
--adjust_eset — Extra filtering after log transform

4.4.5 Deconvolution

--perm <INT> / --QN {true|false} — CIBERSORT permutations / quantile normalization
--platform <STR> — ESTIMATE platform
--features HUGO_symbols — MCPcounter feature type
--arrays --tumor --scale_mrna — quanTIseq options
--reference {TRef|BRef|both} — EPIC reference profile

4.4.6 Ligand–receptor

--data_type {tpm|count} — Input matrix type for LR_cal
--id_type {symbol|ensembl|...} — Gene ID type for LR_cal
--verbose — Verbose logging

4.5 Expected layout

# Salmon mode:
/path/to/outdir
|-- 01-qc
|   |-- <sample>_1.fastq.gz
|   |-- <sample>_2.fastq.gz
|   |-- <sample>_fastp.html
|   |-- <sample>_fastp.json
|   |-- <sample>.task.complete
|   `-- multiqc_report
|       `-- multiqc_fastp_report.html
|-- 02-salmon
|   |-- <sample>
|   |   `-- quant.sf
|   |-- MyProj_salmon_count.tsv.gz
|   `-- MyProj_salmon_tpm.tsv.gz
|-- 03-tpm
|   |-- prepare_salmon.csv
|   `-- tpm_matrix.csv
|-- 04-signatures
|   `-- calculate_sig_score.csv
|-- 05-tme
|   |-- cibersort_results.csv
|   |-- epic_results.csv
|   |-- quantiseq_results.csv
|   |-- IPS_results.csv
|   |-- estimate_results.csv
|   |-- mcpcounter_results.csv
|   `-- deconvo_merged.csv
`-- 06-LR_cal
    `-- lr_cal.csv
# STAR mode:
/path/to/outdir
|-- 01-qc
|   |-- <sample>_1.fastq.gz
|   |-- <sample>_2.fastq.gz
|   |-- <sample>_fastp.html
|   |-- <sample>_fastp.json
|   |-- <sample>.task.complete
|   `-- multiqc_report
|       `-- multiqc_fastp_report.html
|-- 02-star
|   |-- <sample>/
|   |-- <sample>__STARgenome/
|   |-- <sample>__STARpass1/
|   |-- <sample>_STARtmp/
|   |-- <sample>_Aligned.sortedByCoord.out.bam
|   |-- <sample>_Log.final.out
|   |-- <sample>_Log.out
|   |-- <sample>_Log.progress.out
|   |-- <sample>_ReadsPerGene.out.tab
|   |-- <sample>_SJ.out.tab
|   |-- <sample>.task.complete
|   |-- .batch_star_count.done
|   |-- .merge_star_count.done
|   `-- MyProj.STAR.count.tsv.gz
|-- 03-tpm
|   |-- count2tpm.csv
|   `-- tpm_matrix.csv
|-- 04-signatures
|   `-- calculate_sig_score.csv
|-- 05-tme
|   |-- cibersort_results.csv
|   |-- epic_results.csv
|   |-- quantiseq_results.csv
|   |-- IPS_results.csv
|   |-- estimate_results.csv
|   |-- mcpcounter_results.csv
|   `-- deconvo_merged.csv
`-- 06-LR_cal
    `-- lr_cal.csv
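
After a run, the layout above can be checked programmatically. The following is a small sketch using only the directory and file names shown in the trees; the function name missing_outputs and the choice of which files count as "key" outputs are assumptions made for illustration.

```python
from pathlib import Path

# Step directories taken from the two layouts above
EXPECTED_DIRS = {
    "salmon": ["01-qc", "02-salmon", "03-tpm", "04-signatures", "05-tme", "06-LR_cal"],
    "star": ["01-qc", "02-star", "03-tpm", "04-signatures", "05-tme", "06-LR_cal"],
}
# Final per-step tables that are named identically in both modes
KEY_FILES = [
    "03-tpm/tpm_matrix.csv",
    "04-signatures/calculate_sig_score.csv",
    "05-tme/deconvo_merged.csv",
    "06-LR_cal/lr_cal.csv",
]

def missing_outputs(outdir, mode):
    """Return the expected directories and key files that are absent under outdir."""
    root = Path(outdir)
    missing = [d for d in EXPECTED_DIRS[mode] if not (root / d).is_dir()]
    missing += [f for f in KEY_FILES if not (root / f).is_file()]
    return missing
```

An empty list from missing_outputs("/path/to/outdir", "salmon") suggests that every step wrote its final table; it does not validate the contents of those files.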

4.6 Output Reference

4.6.1 Standard layout (produced by `iobrpy runall`)

01-qc/ — fastp outputs; a resume flag .fastq_qc.done is written when the step completes.
02-salmon/ or 02-star/ — quantification/alignment + merged matrices; resume flags like .batch_salmon.done, .merge_salmon.done, or .merge_star_count.done.
03-tpm/ — unified TPM matrix tpm_matrix.csv. For Salmon mode it comes from prepare_salmon; for STAR mode it comes from count2tpm.
04-signatures/ — signature scoring results (file: calculate_sig_score.csv).
05-tme/ — deconvolution outputs from multiple methods + deconvo_merged.csv.
06-LR_cal/ — ligand–receptor results lr_cal.csv.
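
The resume flags make it easy to see how far a run has progressed. The sketch below checks only the flag files named in this document; the helper completed_steps is hypothetical, and other steps may write flags that are not listed here.

```python
from pathlib import Path

# Flag files mentioned in this document, relative to the output root
RESUME_FLAGS = {
    "salmon": [
        "01-qc/.fastq_qc.done",
        "02-salmon/.batch_salmon.done",
        "02-salmon/.merge_salmon.done",
    ],
    "star": [
        "01-qc/.fastq_qc.done",
        "02-star/.batch_star_count.done",
        "02-star/.merge_star_count.done",
    ],
}

def completed_steps(outdir, mode):
    """Map each known flag path to True if the flag file exists under outdir."""
    root = Path(outdir)
    return {flag: (root / flag).exists() for flag in RESUME_FLAGS[mode]}
```

Steps whose flag is reported as False are presumably the ones a --resume run would execute again.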

4.6.2 Salmon mode (`02-salmon/`)

Per-sample Salmon folders containing quant.sf (from batch_salmon). A .batch_salmon.done flag is written after completion.
Merged matrices (from merge_salmon):
- <PROJECT>_salmon_tpm.tsv[.gz]
- <PROJECT>_salmon_count.tsv[.gz]
  A .merge_salmon.done flag is written after completion.
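
To make the relationship between the per-sample quant.sf files and the merged matrices concrete, here is a minimal sketch that collects the TPM column of every quant.sf into one nested dictionary. It relies on the standard Salmon quant.sf header (Name, Length, EffectiveLength, TPM, NumReads); it is not the merge_salmon implementation, and the function names are hypothetical.

```python
import csv
from pathlib import Path

def read_quant_tpm(quant_sf):
    """Return {transcript name: TPM} from one tab-separated quant.sf file."""
    with open(quant_sf, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row["Name"]: float(row["TPM"]) for row in reader}

def merge_tpm(salmon_dir):
    """Build {name: {sample: TPM}} from every <sample>/quant.sf under salmon_dir."""
    merged = {}
    for quant in sorted(Path(salmon_dir).glob("*/quant.sf")):
        sample = quant.parent.name  # the folder name is taken as the sample ID
        for name, tpm in read_quant_tpm(quant).items():
            merged.setdefault(name, {})[sample] = tpm
    return merged
```

Swapping "TPM" for "NumReads" would give the count-based counterpart.
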
03-tpm/prepare_salmon.csv — cleaned genes × samples TPM matrix produced by prepare_salmon (default --return_feature symbol unless overridden).
03-tpm/tpm_matrix.csv — log2(x+1) matrix produced by log2_eset from prepare_salmon.csv.
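
The log2(x+1) transform itself is simple. As a worked illustration (a sketch only, not the log2_eset code), a TPM of 3 maps to log2(4) = 2 and a TPM of 0 stays at 0. The gene and sample names below are made up.

```python
import math

def log2_plus_one(matrix):
    """Apply log2(x + 1) to every value of a {gene: {sample: value}} matrix."""
    return {
        gene: {sample: math.log2(value + 1.0) for sample, value in row.items()}
        for gene, row in matrix.items()
    }

# Illustrative values only
tpm = {"GENE_A": {"S1": 3.0, "S2": 0.0}}
print(log2_plus_one(tpm))  # {'GENE_A': {'S1': 2.0, 'S2': 0.0}}
```

Adding 1 before taking the logarithm keeps zero-expression genes defined and maps them to 0.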

4.6.3 STAR mode (`02-star/`)

Per-sample STAR outputs (BAM, logs, *_ReadsPerGene.out.tab, etc.).
Merged counts (from merge_star_count):
- <PROJECT>.STAR.count.tsv.gz
  A .merge_star_count.done flag is written after completion.
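
For readers who want to inspect a single *_ReadsPerGene.out.tab file, the sketch below parses STAR's standard four-column format: gene ID, unstranded counts, and counts for each of the two strand orientations, preceded by summary rows whose names start with N_. Which column merge_star_count uses is not stated here, so the default of column 1 (unstranded) is an assumption, and the function name is hypothetical.

```python
import csv

def read_reads_per_gene(path, column=1):
    """Return {gene ID: count} from one ReadsPerGene.out.tab file.

    column=1 selects the unstranded counts; 2 and 3 select the stranded ones.
    """
    counts = {}
    with open(path, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            # Skip blank lines and the N_unmapped / N_multimapping / ... summary rows
            if not row or row[0].startswith("N_"):
                continue
            counts[row[0]] = int(row[column])
    return counts
```

Calling this once per sample and collecting the results by gene ID yields a genes x samples count matrix.
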
03-tpm/count2tpm.csv — TPM matrix produced by count2tpm from the merged STAR ReadsPerGene count matrix.
03-tpm/tpm_matrix.csv — log2(x+1) matrix produced by log2_eset from count2tpm.csv.
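
For orientation, the usual count-to-TPM conversion divides each count by the gene length, then rescales the resulting rates so they sum to one million per sample. The sketch below shows that arithmetic for one sample. It is not the count2tpm code: the function name is hypothetical, and the gene lengths are supplied by the caller, whereas where count2tpm obtains its lengths is not covered here.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert {gene: count} to {gene: TPM} given {gene: length in bp}.

    Genes without a positive length are dropped from the result.
    """
    rate = {
        gene: counts[gene] / lengths_bp[gene]
        for gene in counts
        if lengths_bp.get(gene, 0) > 0
    }
    total = sum(rate.values())
    if total == 0:
        return {gene: 0.0 for gene in rate}
    return {gene: r / total * 1e6 for gene, r in rate.items()}

# Illustrative numbers: both genes have the same reads-per-base rate
print(counts_to_tpm({"A": 10, "B": 20}, {"A": 1000, "B": 2000}))
```

In this example both genes end up with equal TPM because gene B has twice the reads but is twice as long.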

4.6.4 Signatures (`04-signatures/`)

calculate_sig_score.csv — per-sample pathway/signature scores. Columns correspond to the selected signature set and method (integration, pca, zscore, or ssgsea).

4.6.5 Deconvolution (`05-tme/`)

Each method writes a single table named <method>_results.csv:

cibersort_results.csv — columns suffixed with _CIBERSORT. Note whether --perm and --QN were used.
quantiseq_results.csv — quanTIseq fractions. Document the chosen --method {lsei|hampel|huber|bisquare} and flags like --arrays, --tumor, --scale_mrna, --signame.
epic_results.csv — EPIC fractions; record the reference profile used (--reference {TRef|BRef|both}).
estimate_results.csv — ESTIMATE immune/stromal/purity scores; columns suffixed _estimate.
mcpcounter_results.csv — MCPcounter scores; columns suffixed _MCPcounter.
IPS_results.csv — IPS sub-scores and total score.

Merged table: deconvo_merged.csv is produced by runall after all deconvolution methods finish. It normalizes the sample index to a column named ID and outer-joins the per-method tables on that ID.
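
An outer join on ID keeps every sample that appears in at least one method's table and leaves gaps where a method has no row for that sample. The sketch below shows the idea with plain dictionaries. It is not the runall code, and the column names in the example are illustrative, following the suffix conventions above.

```python
def outer_join_by_id(tables):
    """Outer-join {method: [row dicts with an 'ID' key]} on the ID column.

    Returns (columns, rows); values missing for a sample are left as "".
    """
    columns = ["ID"]
    merged = {}
    for rows in tables.values():
        for row in rows:
            target = merged.setdefault(row["ID"], {"ID": row["ID"]})
            for key, value in row.items():
                if key == "ID":
                    continue
                if key not in columns:
                    columns.append(key)
                target[key] = value
    return columns, [{c: r.get(c, "") for c in columns} for r in merged.values()]

columns, rows = outer_join_by_id({
    "cibersort": [{"ID": "S1", "B_cells_CIBERSORT": "0.1"}],
    "estimate": [{"ID": "S1", "ImmuneScore_estimate": "120.5"},
                 {"ID": "S2", "ImmuneScore_estimate": "98.0"}],
})
print(columns)
print(rows)
```

Here S2 is kept even though only one method reported it, which is the behavior that distinguishes an outer join from an inner join.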

4.6.6 Ligand–receptor (`06-LR_cal/`)

lr_cal.csv — ligand–receptor scoring table from LR_cal. Record the --data_type {count|tpm} and the --id_type you used.
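
Several entries above ask you to record the options that produced a result. One lightweight way to do this is a small JSON file stored next to the outputs. The helper and the file name run_parameters.json below are hypothetical; iobrpy does not necessarily write such a file itself.

```python
import json
from pathlib import Path

def write_run_parameters(outdir, parameters, filename="run_parameters.json"):
    """Write the options used for a run as sorted, indented JSON under outdir."""
    path = Path(outdir) / filename
    path.write_text(json.dumps(parameters, indent=2, sort_keys=True))
    return path

def read_run_parameters(outdir, filename="run_parameters.json"):
    """Load the parameters written by write_run_parameters."""
    return json.loads((Path(outdir) / filename).read_text())
```

For example, write_run_parameters("/path/to/outdir/06-LR_cal", {"data_type": "tpm", "id_type": "symbol"}) keeps the LR_cal settings alongside lr_cal.csv.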
