Annotates an expression matrix with gene symbols using provided annotation data, filters out missing or invalid symbols, handles duplicate gene entries, and removes uninformative rows. The function supports multiple aggregation methods for resolving duplicate gene symbols.
Arguments
- eset
Expression matrix or ExpressionSet object containing gene expression data.
- annotation
Data frame containing annotation information for probes. Built-in options include `anno_hug133plus2`, `anno_rnaseq`, and `anno_illumina`.
- symbol
Character string specifying the column name in `annotation` that represents gene symbols. Default is `"symbol"`.
- probe
Character string specifying the column name in `annotation` that represents probe identifiers. Default is `"probe_id"`.
- method
Character string specifying the aggregation method for duplicate gene symbols. Options are `"mean"`, `"sum"`, or `"sd"`. Default is `"mean"`.
Details
The function performs the following operations:
Filters probes with missing symbols or labeled as `"NA_NA"`
Matches probes between expression set and annotation data
Merges annotation with expression data
Handles duplicate gene symbols using specified aggregation method
Removes rows with all zeros, all NAs, or missing values in the first column
Examples
# Create a small example expression matrix
eset_mat <- matrix(runif(100), nrow = 10, ncol = 10)
rownames(eset_mat) <- paste0("Probe", 1:10)
colnames(eset_mat) <- paste0("Sample", 1:10)
# Create a matching annotation data frame
anno_df <- data.frame(
probe_id = paste0("Probe", 1:10),
symbol = c("Gene1", "Gene1", "Gene2", "Gene3", "Gene4",
"Gene5", "Gene6", "Gene7", "Gene8", "Gene9")
)
# Annotate
result <- anno_eset(eset = eset_mat, annotation = anno_df)
#> ℹ Row number of original eset: 10
#> ✔ 100% of probes in expression set were annotated
#> ℹ Found 1 duplicate symbol, using "mean" method
#> ℹ Row number after filtering duplicated gene symbol: 9
head(result)
#> Sample1 Sample2 Sample3 Sample4 Sample5 Sample6 Sample7
#> Gene1 0.7619853 0.6753624 0.66960355 0.852364600 0.8665602 0.3637427 0.7706230
#> Gene2 0.8609894 0.7453418 0.81146366 0.668323644 0.3138426 0.6979466 0.9576697
#> Gene3 0.6735550 0.9425989 0.75916680 0.511314597 0.9594658 0.6844865 0.1586884
#> Gene8 0.1072947 0.5656543 0.92369921 0.003896343 0.8083863 0.8868626 0.9758897
#> Gene6 0.8917137 0.2594267 0.05088011 0.820474512 0.3839367 0.1372436 0.8697061
#> Gene5 0.6931989 0.2977231 0.38090370 0.903362288 0.5314093 0.5546818 0.8731514
#> Sample8 Sample9 Sample10
#> Gene1 0.38917032 0.95657976 0.6232614
#> Gene2 0.09292582 0.37215774 0.9079464
#> Gene3 0.16180921 0.96261534 0.5694433
#> Gene8 0.56028054 0.50815808 0.7819689
#> Gene6 0.41525745 0.40994593 0.7620283
#> Gene5 0.34181444 0.06126261 0.1168276