In this example workflow, we demonstrate the single-cell classifier described in our recent preprint.
For illustration, we use a T cell dataset that we recently published. The TPM expression matrix can be downloaded here.
suppressMessages(library(ggplot2))
suppressMessages(library(tidyverse))
suppressMessages(library(scibet))
suppressMessages(library(viridis))
suppressMessages(library(ggsci))
path_da <- "~/test.rds.gz"
expr <- readr::read_rds(path_da)
In the expression matrix (TPM), rows should be cells and the last column should be "label".
expr[1:10, 1:10]
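A quick sanity check of this layout can be useful before gene selection. The sketch below uses base R on a small made-up data frame (`toy_expr`, with hypothetical gene names and labels) rather than the downloaded matrix, and it assumes that every column except the last holds numeric expression values for one gene. The helper `check_expr_layout` is illustrative only and is not part of scibet.

```r
# Toy stand-in for the TPM matrix: rows are cells, columns are genes,
# and the last column holds the cell-type label. All values are made up.
toy_expr <- data.frame(
  GeneA = c(10.5, 0, 3.2, 7.1),
  GeneB = c(0, 4.4, 0, 1.9),
  label = c("CD4", "CD8", "CD4", "CD8"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a scibet function): checks the expected layout
# of a data frame whose last column is the label.
check_expr_layout <- function(expr) {
  n <- ncol(expr)
  genes <- expr[, -n, drop = FALSE]
  c(
    label_is_last     = identical(colnames(expr)[n], "label"),
    genes_numeric     = all(vapply(genes, is.numeric, logical(1))),
    no_missing_labels = !anyNA(expr[[n]])
  )
}

layout_ok <- check_expr_layout(toy_expr)
print(layout_ok)
```

If any entry is FALSE, the matrix probably needs to be reshaped, for example by moving the label column to the end, before it is passed on.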
Based on our unified model, we developed the E-test for supervised gene selection. This
step is implemented in the SelectGene function. Our use of the E-test assumes that there
is no heterogeneity within each population, so 𝑆 can be calculated directly by feeding the
corresponding 𝐸 into the 𝑆-𝐸 formula.
etest_gene <- SelectGene(expr, k = 50)
etest_gene
## [1] "CXCL13" "CCR7" "FGFBP2" "SELL" "CCL4" "GZMH"
## [7] "GZMB" "IFNG" "CCL3" "FCGR3A" "GPR15" "RGS1"
## [13] "GZMK" "CX3CR1" "KLRC2" "CXCR6" "GZMA" "RGS2"
## [19] "MAL" "TMIGD2" "LEF1" "KLRC1" "PLEK" "HAVCR2"
## [25] "KLRF1" "GNLY" "S100B" "CD160" "NR4A2" "KLRG1"
## [31] "ITGAE" "NR4A1" "FOS" "FCER1G" "LDLRAP1" "NKG7"
## [37] "HLA-DRB5" "VCAM1" "CCL5" "CST7" "PDCD1" "FCRL6"
## [43] "C1orf162" "CD82" "TNFAIP3" "GPR183" "LAT2" "CD69"
## [49] "S1PR5" "PLAC8"
To verify these genes, we can examine their expression patterns across different
cell types with Marker_heatmap.
Marker_heatmap(expr, etest_gene)
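The same kind of check can be done numerically by summarising each selected gene per cell type. The following is a minimal base R sketch on made-up data, not a description of how Marker_heatmap works internally. The names `toy_expr` and `toy_genes` are hypothetical, and the log2(TPM + 1) transform is an assumption chosen for illustration.

```r
# Toy data: rows are cells, the last column is the label. Values are made up.
toy_expr <- data.frame(
  GeneA = c(10, 0, 6, 2),
  GeneB = c(0, 4, 0, 8),
  GeneC = c(1, 1, 1, 1),
  label = c("CD4", "CD8", "CD4", "CD8"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for a vector of selected genes such as etest_gene.
toy_genes <- c("GeneA", "GeneB")

# Log-transform the selected genes (assumed transform: log2(TPM + 1)),
# then average within each label.
log_expr <- log2(toy_expr[, toy_genes, drop = FALSE] + 1)
mean_by_label <- aggregate(
  log_expr,
  by = list(label = toy_expr$label),
  FUN = mean
)
print(mean_by_label)
```

A gene that separates cell types well should show clearly different means across the rows of `mean_by_label`, which is the pattern to look for in the heatmap as well.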