GitHub - SONGDONGYUAN1994/scsampler (original) (raw)
scSampler
Overview
scSampler
is a Python pacakge for fast diversity-preserving subsampling of large-scale single-cell transcriptomic data.
Installation
Please install it from PyPI:
Quick start
First we load all modules.
import numpy as np import pandas as pd import scanpy as sc from time import time from scsampler import scsampler sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3) sc.logging.print_header() sc.settings.set_figure_params(dpi=80, facecolor='white')
Read in data
The example data can be downloaded from https://doi.org/10.5281/zenodo.5811787 in the anndata
format by scanpy
. Here we use the ~68'000 PBMC cells. Please modify the path as your own path.
adata = sc.read_h5ad('/home/dongyuan/scSampler/data/final_h5ad/pbmc68k.h5ad')
anndata as input
Subsample 10% cells and return a new anndata. The space is top PCs.
adata_sub = scsampler(adata, fraction = 0.1, copy = True)
If you want to speed it up, you can use the random_split
. It will lead to slightly less optimal result, of course.
start = time()
adata_sub = scsampler(adata, fraction = 0.1, obsm = 'X_pca', copy = True, random_split = 16)
end = time()
print(end - start)
matrix as input
You can also use the numpy.ndarray
as the input.
mat = adata.obsm['X_pca']
print(type(mat))
res = scsampler(mat, fraction = 0.1, copy = True, random_split = 16)
subsample_index = res[1]
subsample_mat = res[0]
Contact
Any questions or suggestions on scSampler
are welcomed! If you have any questions, please report it on issues or contact Dongyuan (dongyuansong@ucla.edu).