GitHub - caokai1073/Pamona: The software of Pamona, a partial manifold alignment algorithm.
Pamona
Paper
Manifold alignment for heterogeneous single-cell multi-omics data integration using Pamona
The implementation is based on UnionCom and SCOT.
Environment
python >= 3.6
numpy >= 1.18.5
scikit-learn >= 0.23.2
umap-learn >= 0.3.10
Cython >= 0.29.21
scipy >= 1.4.1
matplotlib >= 3.3.1
POT >= 0.7.0
Install
Pamona is available on the Python Package Index (PyPI); the latest version is 0.0.1. To install it with pip, run:
pip install pamona
Integrate data
Each row should contain the measured values for a single cell, and each column should contain the values of a feature across cells.
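Many single-cell tools store matrices as features by cells (for example genes by cells), which is the transpose of the layout described above. The sketch below uses a hypothetical helper, to_cells_by_features, on random toy matrices to show the expected orientation. It is not part of Pamona.

```python
import numpy as np

def to_cells_by_features(matrix, n_cells):
    """Return `matrix` oriented as (n_cells, n_features).

    Hypothetical helper: transposes a features-by-cells matrix when its
    second axis, rather than its first, matches the known number of cells.
    """
    matrix = np.asarray(matrix, dtype=float)
    if matrix.shape[0] == n_cells:
        return matrix
    if matrix.shape[1] == n_cells:
        return matrix.T
    raise ValueError(f"no axis of shape {matrix.shape} has length {n_cells}")

rng = np.random.default_rng(0)
# Toy modality 1: 100 cells x 20 features, already cells-by-features
data1 = rng.normal(size=(100, 20))
# Toy modality 2 stored as 50 features x 80 cells (e.g. genes x cells)
raw2 = rng.normal(size=(50, 80))
data2 = to_cells_by_features(raw2, n_cells=80)

# The datasets may differ in both cell count and feature count
print(data1.shape, data2.shape)  # (100, 20) (80, 50)
```

If a matrix is square, the helper cannot distinguish the two orientations and returns it unchanged, so check such inputs by hand.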
from pamona import Pamona
import numpy as np

# Load the two partially overlapping scGEM modalities and their cell-type labels
data1 = np.loadtxt("./scGEM/methylation_partial.txt")
data2 = np.loadtxt("./scGEM/expression_partial.txt")
type1 = np.loadtxt("./scGEM/methylation_type_partial.txt")
type2 = np.loadtxt("./scGEM/expression_type_partial.txt")
# np.int was removed in NumPy 1.24; the builtin int gives the same integer labels
type1 = type1.astype(int)
type2 = type2.astype(int)

Pa = Pamona.Pamona(n_shared=[138], Lambda=10, output_dim=5)  # shared cell number 138 is estimated by SPL
integrated_data, T = Pa.run_Pamona([data1, data2])

# Evaluate the alignment
Pa.test_LabelTA(integrated_data[0], integrated_data[-1], type1, type2)
Pa.alignment_score(integrated_data[0], integrated_data[-1][0:142], data2_specific=integrated_data[-1][142:177])

# Plot the datasets before and after integration
Pa.Visualize([data1, data2], integrated_data, mode='UMAP')  # without datatype
Pa.Visualize([data1, data2], integrated_data, [type1, type2], mode='UMAP')  # with datatype
The mode argument of Visualize accepts "PCA", "TSNE", or "UMAP" (default: "PCA").
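Judging by its name, Pa.test_LabelTA reports a label transfer accuracy, but we have not checked how it is computed. As an assumption, the sketch below shows one common form of that metric: each cell of the second dataset takes the majority cell type of its k nearest neighbors in the first dataset's embedding, and the score is the fraction of cells whose transferred label matches their own. The function name label_transfer_accuracy and the k-NN majority vote are illustrative choices, not Pamona's API.

```python
import numpy as np

def label_transfer_accuracy(emb1, emb2, labels1, labels2, k=1):
    """Fraction of cells in emb2 whose majority label among their k nearest
    neighbors in emb1 equals their own label (generic sketch)."""
    emb1, emb2 = np.asarray(emb1, float), np.asarray(emb2, float)
    labels1, labels2 = np.asarray(labels1), np.asarray(labels2)
    # Pairwise squared Euclidean distances, shape (n2, n1)
    d2 = ((emb2[:, None, :] - emb1[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest emb1 cells for every emb2 cell
    nn = np.argsort(d2, axis=1)[:, :k]
    predicted = []
    for row in labels1[nn]:
        values, counts = np.unique(row, return_counts=True)
        predicted.append(values[np.argmax(counts)])  # majority vote; ties go to the smallest label
    return float(np.mean(np.array(predicted) == labels2))

# Toy embeddings: two well-separated clusters shared by both datasets
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels1 = np.repeat([0, 1], 30)
labels2 = np.repeat([0, 1], 20)
emb1 = centers[labels1] + rng.normal(scale=0.5, size=(60, 2))
emb2 = centers[labels2] + rng.normal(scale=0.5, size=(40, 2))
print(label_transfer_accuracy(emb1, emb2, labels1, labels2, k=5))  # 1.0
```

Because the two toy clusters are far apart, every cell receives its own label and the score is 1.0; on a poorly aligned embedding, neighbors from other cell types pull the score down.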
Example of a disagreement matrix built from prior information
If cell types are available, users can incorporate them as prior information as follows:
gamma = 0.5  # gamma is a parameter ranging from 0 to 1.
# A larger gamma gives more importance to the matching of prior information.
DM = np.ones((len(data1), len(data2)))
for i in range(len(data1)):
    for j in range(len(data2)):
        if type1[i] == type2[j]:
            DM[i][j] = gamma
...
Pa = Pamona.Pamona(..., M=DM, ...)
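The double loop above visits every pair of cells in Python, which becomes slow for large datasets. The sketch below builds the same matrix with NumPy broadcasting: entry (i, j) is gamma when cell i of the first dataset and cell j of the second share a type, and 1 otherwise. disagreement_matrix is a hypothetical helper name; its output could be passed as M in place of the loop-built DM.

```python
import numpy as np

def disagreement_matrix(type1, type2, gamma=0.5):
    """Entry (i, j) is gamma when type1[i] == type2[j], and 1 otherwise."""
    type1 = np.asarray(type1)
    type2 = np.asarray(type2)
    same_type = type1[:, None] == type2[None, :]  # boolean matrix of shape (n1, n2)
    return np.where(same_type, gamma, 1.0)

# Toy cell-type labels for two datasets with 4 and 3 cells
type1 = np.array([0, 1, 1, 2])
type2 = np.array([1, 2, 0])
DM = disagreement_matrix(type1, type2, gamma=0.5)
print(DM)
```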
Examples on simulations and real datasets (Jupyter notebooks)
- Integration of Simulation 1 in Pamona paper
- Integration of Simulation 2 in Pamona paper
- Integration of scGEM data
- Integration of three datasets of scNMT data
- Integration of PBMC data with cell types as prior information
Parameters of class Pamona
The list of parameters is given below:
- data: list of numpy arrays, [dataset1, dataset2, ...], each of shape (n_samples, n_features); the datasets may differ in both dimensions. The datasets to be integrated.
- n_shared: int, default: the number of cells in the smallest dataset. Number of cells shared between datasets (the example above passes it as a list, n_shared=[138]).
- epsilon: float, default 0.001. Regularization parameter of the partial-GW framework.
- n_neighbors: int, default 30. Number of neighbors in the k-NN graph.
- Lambda: float, default 1.0. Manifold alignment parameter that trades off aligning corresponding cells against preserving local geometry.
- output_dim: int, default 30. Dimension of the common embedding space produced by the manifold alignment.
- M: numpy array, optional, default None. Disagreement matrix encoding prior information; for two datasets it has shape (len(data1), len(data2)), as in the example above.
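epsilon is described here only as the regularization parameter of the partial-GW framework. Assuming it acts as an entropic regularization weight (an assumption, not something this README states), the sketch below illustrates that role with plain balanced Sinkhorn iterations. These solve standard entropic optimal transport, not the partial Gromov-Wasserstein problem Pamona uses, and the function name sinkhorn is illustrative. Smaller regularization concentrates the coupling on the lowest-cost pairs; larger regularization spreads it out.

```python
import numpy as np

def sinkhorn(cost, epsilon, n_iter=1000, tol=1e-9):
    """Balanced entropic OT between uniform marginals (illustrative sketch)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weights on the first point set
    b = np.full(m, 1.0 / m)  # uniform weights on the second point set
    K = np.exp(-cost / epsilon)  # Gibbs kernel; smaller epsilon gives a sharper kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u_prev = u
        u = a / (K @ v)    # rescale rows to match a
        v = b / (K.T @ u)  # rescale columns to match b
        if np.max(np.abs(u - u_prev)) < tol:
            break
    return u[:, None] * K * v[None, :]  # coupling diag(u) K diag(v)

# Two copies of the same four points: the ideal matching is the diagonal
x = np.linspace(0.0, 1.0, 4)
cost = (x[:, None] - x[None, :]) ** 2
for eps in (1.0, 0.01):
    P = sinkhorn(cost, eps)
    # Mass placed on matched (diagonal) pairs grows toward 1 as eps shrinks
    print(eps, round(float(np.trace(P)), 3))
```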
The other parameters include:
- virtual_cells: int, default 1. Number of virtual cells.
- max_iter: int, default 1000. Maximum number of iterations of the partial-GW framework.
- tol: float, default 1e-9. Convergence tolerance at which the partial-GW iterations stop.
- manual_seed: int, default 666. Random seed.
- mode: {'connectivity', 'distance'}, default 'distance'. Type of matrix returned for the k-NN graph: 'connectivity' returns a connectivity matrix of ones and zeros, and 'distance' returns the distances between neighbors according to the given metric.
- metric: str, default 'minkowski'. Distance metric used to find the k nearest neighbors of each sample.
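The mode and metric options are worded like those of scikit-learn's kneighbors_graph, so we assume they behave the same way. The NumPy-only sketch below, built around a hypothetical helper knn_graph, shows the difference between the two modes using the Minkowski metric with p=2 (Euclidean distance, scikit-learn's default exponent). Each point is excluded from its own neighbor list. It is a re-implementation for illustration, not Pamona's code.

```python
import numpy as np

def knn_graph(X, n_neighbors, mode="distance", p=2):
    """Dense k-NN graph: row i is non-zero at the n_neighbors nearest other points.

    mode='connectivity' stores 1.0 for each neighbor; mode='distance' stores
    the Minkowski-p distance to that neighbor.
    """
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])
    dist = (diff ** p).sum(axis=-1) ** (1.0 / p)  # Minkowski distance
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbor
    nn = np.argsort(dist, axis=1)[:, :n_neighbors]
    graph = np.zeros_like(dist)
    rows = np.arange(len(X))[:, None]
    if mode == "connectivity":
        graph[rows, nn] = 1.0
    elif mode == "distance":
        graph[rows, nn] = dist[rows, nn]
    else:
        raise ValueError("mode must be 'connectivity' or 'distance'")
    return graph

# Four points on a line; each point's single nearest neighbor is marked
X = np.array([[0.0], [1.0], [3.0], [6.0]])
print(knn_graph(X, 1, mode="connectivity"))
print(knn_graph(X, 1, mode="distance"))
```

Both modes select the same neighbors; they differ only in whether the edge weights are ones or the neighbor distances.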