Removal of batch effects using distribution-matching residual networks
Uri Shaham et al. Bioinformatics. 2017.
Abstract
Motivation: Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error combines systematic components, which originate from the measuring instrument, with random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq (scRNA-seq), are plagued with systematic errors that may severely affect statistical analysis if the data are not properly calibrated.
Results: We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual neural network (ResNet), trained to minimize the Maximum Mean Discrepancy (MMD) between the multivariate distributions of two replicates measured in different batches. We apply our method to mass cytometry and scRNA-seq datasets, and demonstrate that it effectively attenuates batch effects.
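As a rough illustration of the two ingredients named above, the following NumPy sketch computes a biased empirical estimate of the squared Maximum Mean Discrepancy and applies a residual map of the form x + f(x). The Gaussian kernel, its bandwidth `sigma`, the layer sizes, the synthetic data and the hand-set weights are all assumptions made for this example; in the paper the network is trained, and the abstract does not specify these details.

```python
import numpy as np


def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def mmd_squared(x, y, sigma=1.0):
    """Biased empirical estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, sigma)
    k_yy = gaussian_kernel(y, y, sigma)
    k_xy = gaussian_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


def residual_block(x, w1, b1, w2, b2):
    """Map x to x + f(x), where f is a small two-layer ReLU network."""
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return x + hidden @ w2 + b2


# Synthetic stand-ins for two replicates (assumption: a constant shift of 0.5
# per marker plays the role of the batch effect).
rng = np.random.default_rng(0)
n_cells, n_markers, n_hidden = 200, 5, 8
target = rng.normal(size=(n_cells, n_markers))
source = rng.normal(size=(n_cells, n_markers)) + 0.5

# Hand-set parameters for illustration only: with zero weights the block is
# the identity plus the output bias, which here undoes the shift. A trained
# network would instead learn its parameters by minimizing mmd_squared.
w1 = np.zeros((n_markers, n_hidden))
b1 = np.zeros(n_hidden)
w2 = np.zeros((n_hidden, n_markers))
b2 = np.full(n_markers, -0.5)
calibrated = residual_block(source, w1, b1, w2, b2)

mmd_before = mmd_squared(source, target)
mmd_after = mmd_squared(calibrated, target)
print(f"MMD^2 before: {mmd_before:.4f}, after: {mmd_after:.4f}")
```

Because the block adds f(x) to its input, setting all weights and biases to zero leaves the data unchanged, so the map only needs to learn the (presumably small) correction between batches.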
Availability and implementation: Our code and data are publicly available at https://github.com/ushaham/BatchEffectRemoval.git.
Contact: yuval.kluger@yale.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Figures
Fig. 1
Calibration of CyTOF data. Projection of the source (red) and target (blue) samples on the first two principal components of the target data. Left: before calibration. Right: after calibration
Fig. 2
A typical ResNet block
Fig. 3
Quality of calibration in terms of the marginal distribution of each marker. Empirical cumulative distribution functions of the first three markers in the CyTOF calibration experiment. In each plot the full, dashed and dotted curves correspond to the target, source and calibrated source samples, respectively. For each marker the full and dotted curves are substantially closer than the full and dashed curves
Fig. 4
Calibration of CyTOF data: CD8+ T-cells (red) and target (blue) samples in the (CD28, GzB) plane. Left: before calibration. Center: calibration using MLP. Right: calibration using ResNet
Fig. 5
Histograms of the 25 _P_-values of Kolmogorov-Smirnov tests, comparing the distributions of the calibrated data with the target distribution of each of the 25 markers
Fig. 6
Calibration of scRNA-seq. Top: _t_-SNE plots before (left) and after (right) calibration using MMD-ResNet. Bottom: Calibration of cells with high expression of Prkca. _t_-SNE plots before calibration (left), after calibration using ComBat (middle) and MMD-ResNet (right)
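The per-marker comparisons described in the captions of Figs. 3 and 5 rest on empirical cumulative distribution functions and the two-sample Kolmogorov-Smirnov test. The sketch below computes only the KS statistic (the largest gap between two ECDFs) for each marker on synthetic data; the data, the shift and the "calibrated" stand-in are assumptions for illustration, not the paper's results. The _P_-values shown in Fig. 5 would additionally require the KS null distribution, which a library routine such as `scipy.stats.ks_2samp` provides.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between ECDFs."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    # The maximum gap is attained at one of the observed points.
    grid = np.concatenate([a, b])
    ecdf_a = np.searchsorted(a, grid, side="right") / a.size
    ecdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(ecdf_a - ecdf_b)))


# Synthetic example with three hypothetical markers.
rng = np.random.default_rng(0)
shift = 1.0  # assumed batch effect
target = rng.normal(size=(500, 3))
source = rng.normal(size=(500, 3)) + shift
calibrated = source - shift  # stand-in for the output of a calibration map

ks_before = [ks_statistic(source[:, j], target[:, j]) for j in range(3)]
ks_after = [ks_statistic(calibrated[:, j], target[:, j]) for j in range(3)]
for j in range(3):
    print(f"marker {j}: KS before = {ks_before[j]:.3f}, after = {ks_after[j]:.3f}")
```

A smaller statistic after calibration corresponds to the calibrated-source ECDF lying closer to the target ECDF, which is the pattern the Fig. 3 caption describes.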