datasets

We provide a pre-trained model covering 30 major human cell types, built from 42 scRNA-seq datasets. This model can serve as a reference for single-cell identification.

30_major_human_cell_types

test_data

Load the SciBet package.

suppressMessages(library(tidyverse))
suppressMessages(library(scibet))

Load the pre-trained SciBet model.

model <- readr::read_csv("~/major_human_cell_types.csv")

## Warning: Missing column names filled in: 'X1' [1]

## Parsed with column specification:
## cols(
##   .default = col_double(),
##   X1 = col_character()
## )

## See spec(...) for full column specifications.

model <- pro.core(model)
query <- readr::read_rds("~/TEST.rds.gz")

The query set should contain TPM values, with cells as rows and genes as columns.

query[1:10, 1:10]
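Before classifying, it can help to confirm that a query table follows this layout. The following is a minimal sketch in base R, not part of SciBet: `toy_query` and `check_query_layout` are hypothetical names, the gene names and values are made up, and the trailing `label` column is assumed to match the layout of the test data used here.

```r
# Toy stand-in for a query set: 3 cells (rows) x 2 genes (columns),
# plus a trailing `label` column. All names and values are made up.
toy_query <- data.frame(
  GeneA = c(10.5, 0, 3.2),
  GeneB = c(0, 7.1, 1.4),
  label = c("T cell", "B cell", "T cell"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if every column except the label column
# is numeric and non-negative, as expected for TPM values.
check_query_layout <- function(df, label_col = "label") {
  expr <- df[, setdiff(colnames(df), label_col), drop = FALSE]
  all(vapply(expr, is.numeric, logical(1))) && all(as.matrix(expr) >= 0)
}

layout_ok <- check_query_layout(toy_query)
```

A check like this only catches obvious layout problems, such as a transposed matrix with gene names stored in a character column; it does not verify that the values are actually TPM.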

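Query cells can be identified with the function LoadModel, which turns the model into a prediction function.

# Keep the original labels, then drop the label column (the last column).
ori_label <- query$label
query <- query[,-ncol(query)]

# Build the prediction function from the model and apply it to the query cells.
prd <- LoadModel(model)
label <- prd(query)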

In this example, the classification accuracy is 92%.

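# Total number of query cells.
num1 <- length(ori_label)
# Number of cells whose predicted label matches the original label.
num2 <- tibble(
  ori = ori_label,
  prd = label
) %>%
  dplyr::filter(ori == prd) %>%
  nrow(.)

# Fraction of correctly classified cells.
num2/num1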

## [1] 0.92
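A single accuracy figure does not show which cell types are confused with each other. The sketch below, in base R, computes the same kind of overall accuracy together with a confusion table and per-type recall. It uses made-up toy labels (`toy_ori`, `toy_prd`) rather than the real `ori_label` and `label`, so the numbers are illustrative only.

```r
# Toy labels standing in for the original and predicted labels; values are made up.
toy_ori <- c("T cell", "B cell", "T cell", "NK cell", "B cell")
toy_prd <- c("T cell", "B cell", "NK cell", "NK cell", "B cell")

# Overall accuracy: fraction of cells whose predicted label matches the original.
accuracy <- mean(toy_ori == toy_prd)

# Confusion table: rows are original labels, columns are predicted labels.
confusion <- table(ori = toy_ori, prd = toy_prd)

# Per-type recall: correct calls divided by the number of cells of each original type.
# A type that is never predicted has no matching column, so its recall is 0.
types <- rownames(confusion)
recall <- vapply(types, function(ct) {
  if (ct %in% colnames(confusion)) confusion[ct, ct] / sum(confusion[ct, ]) else 0
}, numeric(1))
```

With real data, replacing `toy_ori` and `toy_prd` by `ori_label` and `label` would give the corresponding table for the test set, assuming both are vectors of the same length.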

January 1, 2020