In this example workflow, we demonstrate SciBet, a single-cell classifier that we recently developed and described in our preprint.

To get started, we use a T cell dataset that we recently published. The TPM expression matrix can be downloaded here.

Library

suppressMessages(library(ggplot2))
suppressMessages(library(tidyverse))
suppressMessages(library(scibet))
suppressMessages(library(viridis))
suppressMessages(library(ggsci))

Load the data

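# Path to the downloaded TPM expression matrix (adjust to where you saved it)
path_da <- "~/test.rds.gz"
# Pass the path positionally: newer readr versions name this argument `file`, not `path`
expr <- readr::read_rds(path_da)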

The expression matrix (TPM) should have cells as rows and genes as columns, and its last column must be named "label" and hold each cell's cell type.
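Many single-cell tools store expression as a genes-by-cells matrix instead. The minimal sketch below converts such a matrix into the layout described above. It assumes a numeric matrix with gene names as row names and a vector of cell-type labels in the same order as its columns. The helper name `to_scibet_layout` and the toy values are hypothetical, and the sketch does not address whether SciBet also accepts other input types.

```r
# Hypothetical helper (not part of SciBet): convert a genes x cells TPM matrix
# plus per-cell labels into the cells x genes layout with a trailing "label" column.
to_scibet_layout <- function(tpm, labels) {
  stopifnot(is.matrix(tpm), ncol(tpm) == length(labels))
  expr <- as.data.frame(t(tpm))  # cells become rows, genes become columns
  expr$label <- labels           # appended as the last column
  expr
}

# Toy example (made-up values): 2 genes x 3 cells
tpm <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
              dimnames = list(c("CCR7", "GZMB"), c("cell1", "cell2", "cell3")))
expr_toy <- to_scibet_layout(tpm, c("naive", "naive", "effector"))
expr_toy
```

Transposing with `t()` keeps the gene names as column names, so the result can be checked with `expr_toy[1:3, 1:3]` just like the real matrix.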

expr[1:10, 1:10]

E(ntropy)-test for supervised gene selection

Based on our unified model, we developed the E-test for supervised gene selection, which is implemented in the SelectGene function. The E-test assumes that there is no heterogeneity within each population, so $S$ can be calculated directly by feeding the corresponding $E$ into the $S$-$E$ formula.
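To make the idea more concrete, here is a minimal sketch of an entropy-based gene score in base R. It is a hypothetical illustration, not SciBet's SelectGene implementation. It assumes, for illustration only, that the S-E formula is the differential entropy of an exponential distribution with mean E, S = 1 + ln E, with a pseudocount of 1 added to avoid ln 0. Each gene is scored by how much the entropy of the pooled population exceeds the cell-count-weighted entropy of the individual populations. Because the logarithm is concave, this difference is never negative, and genes whose mean expression differs most between populations score highest. The function name `entropy_gene_score` is made up for this example.

```r
# Hypothetical sketch, NOT SciBet's SelectGene(): ranks genes by entropy reduction,
# assuming S = 1 + log(E + 1) (exponential-distribution entropy; the pseudocount
# of 1 is an assumption of this sketch).
entropy_gene_score <- function(expr, k = 50) {
  labels <- expr$label
  genes <- setdiff(colnames(expr), "label")
  mat <- as.matrix(expr[, genes, drop = FALSE])

  # Cells per population and mean expression E_i of every gene in each population
  # (rows of group_means: populations; columns: genes)
  n_i <- as.vector(rowsum(rep(1, nrow(mat)), labels))
  group_means <- rowsum(mat, labels) / n_i
  p_i <- n_i / sum(n_i)

  entropy <- function(E) 1 + log1p(E)

  # Pooled entropy minus the population-weighted within-population entropy
  pooled <- entropy(colMeans(mat))
  within <- colSums(entropy(group_means) * p_i)
  score <- pooled - within

  # Names of the k genes with the largest entropy reduction
  names(sort(score, decreasing = TRUE))[seq_len(min(k, length(score)))]
}
```

Called as `entropy_gene_score(expr, k = 50)`, it returns gene names in the same form as `etest_gene`, but its ranking need not match SelectGene.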

# Select the top k = 50 informative genes with the E-test
etest_gene <- SelectGene(expr, k = 50)
etest_gene
##  [1] "CXCL13"   "CCR7"     "FGFBP2"   "SELL"     "CCL4"     "GZMH"    
##  [7] "GZMB"     "IFNG"     "CCL3"     "FCGR3A"   "GPR15"    "RGS1"    
## [13] "GZMK"     "CX3CR1"   "KLRC2"    "CXCR6"    "GZMA"     "RGS2"    
## [19] "MAL"      "TMIGD2"   "LEF1"     "KLRC1"    "PLEK"     "HAVCR2"  
## [25] "KLRF1"    "GNLY"     "S100B"    "CD160"    "NR4A2"    "KLRG1"   
## [31] "ITGAE"    "NR4A1"    "FOS"      "FCER1G"   "LDLRAP1"  "NKG7"    
## [37] "HLA-DRB5" "VCAM1"    "CCL5"     "CST7"     "PDCD1"    "FCRL6"   
## [43] "C1orf162" "CD82"     "TNFAIP3"  "GPR183"   "LAT2"     "CD69"    
## [49] "S1PR5"    "PLAC8"

To check these genes, we can examine their expression patterns across cell types with the Marker_heatmap function.

Marker_heatmap(expr, etest_gene)
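If you want the numbers behind such a plot, for example to build a custom figure, one option is to average the log-transformed expression of each selected gene within each cell type. The sketch below is a hypothetical helper, not how Marker_heatmap is implemented; the name `marker_means` and the log1p transform are assumptions of this sketch.

```r
# Hypothetical helper (not Marker_heatmap): mean log1p(TPM) of the selected genes
# within each cell type.
marker_means <- function(expr, genes) {
  genes <- intersect(genes, colnames(expr))  # ignore genes absent from the matrix
  mat <- log1p(as.matrix(expr[, genes, drop = FALSE]))
  n_i <- as.vector(rowsum(rep(1, nrow(mat)), expr$label))
  rowsum(mat, expr$label) / n_i              # rows: cell types; columns: genes
}
```

The resulting cell-type-by-gene matrix can be drawn with base R, for example `heatmap(marker_means(expr, etest_gene), scale = "column")`, or reshaped into long format for ggplot2.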