Model-based clustering and outlier detection with missing data
Aitken A (1926) A series formula for the roots of algebraic and transcendental equations. Proc R Soc Edinb 45(1):14–22
Biernacki C, Celeux G, Govaert G (2003) Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Comput Stat Data Anal 41(3–4):561–575
Böhning D, Dietz E, Schaub R, Schlattmann P, Lindsay BG (1994) The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Ann Inst Stat Math 46(2):373–388
Buck S (1960) A method of estimation of missing values in multivariate data suitable for use with an electronic computer. J R Stat Soc B 22:302–306
Coretto P, Hennig C (2016) Robust improper maximum likelihood: tuning, computation, and a comparison with other methods for robust Gaussian clustering. J Am Stat Assoc 111(516):1648–1659
Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodol) 39(1):1–22
García-Escudero LA, Gordaliza A, Matrán C, Mayo-Iscar A et al (2008) A general trimming approach to robust cluster analysis. Ann Stat 36(3):1324–1345
Genz A, Bretz F, Miwa T, Mi X, Leisch F, Scheipl F, Hothorn T (2019) mvtnorm: multivariate normal and t distributions. R package version 1.0-10
Ghahramani Z, Jordan MI (1994) Learning from incomplete data. Technical report, USA
Karlis D, Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Comput Stat Data Anal 41(3–4):577–590
Kaufman L, Rousseeuw P (1987) Clustering by means of medoids. In: Dodge Y (ed) Statistical data analysis based on the L1-norm and related methods, pp 405–416
Lin TI (2014) Learning from incomplete data via parameterized t mixture models through eigenvalue decomposition. Comput Stat Data Anal 71:183–195
Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 81(4):633–648
Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K, Studer M, Roudier P (2016) cluster: cluster analysis extended Rousseeuw et al. R package version 2.0.4
McNicholas PD, Murphy TB, McDaid AF, Frost D (2010) Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Comput Stat Data Anal 54(3):711–723
Meng XL, Rubin DB (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80(2):267–278
Novi Inverardi PL, Taufer E (2020) Outlier detection through mixtures with an improper component. Electron J Appl Stat Anal 13(1):146–163
Peel D, McLachlan GJ (2000) Robust mixture modelling using the t distribution. Stat Comput 10(4):339–348
Punzo A, McNicholas PD (2016) Parsimonious mixtures of multivariate contaminated normal distributions. Biom J 58(6):1506–1537
Punzo A, Tortora C (2021) Multiple scaled contaminated normal distribution and its application in clustering. Stat Model 21(4):332–358
Punzo A, Mazza A, McNicholas PD (2016) ContaminatedMixt: an R package for fitting parsimonious mixtures of multivariate contaminated normal distributions. arXiv preprint arXiv:1606.03766
Serafini A, Murphy TB, Scrucca L (2020) Handling missing data in model-based clustering. arXiv preprint arXiv:2006.02954
Titterington DM, Smith AFM, Makov UE (1985) Statistical analysis of finite mixture distributions. Wiley, Chichester
Tortora C, ElSherbiny A, Browne RP, Franczak BC, McNicholas PD, Amos DD (2020) MixGHD: model based clustering, classification and discriminant analysis using the mixture of generalized hyperbolic distributions. https://CRAN.R-project.org/package=MixGHD. R package version 2.3.4
van Buuren S, Groothuis-Oudshoorn K (2011) mice: multivariate imputation by chained equations in R. J Stat Softw 45(3):1–67
Wang WL, Lin TI (2015) Robust model-based clustering via mixtures of skew-t distributions with missing information. Adv Data Anal Classif 9(4):423–445
Wang H, Zhang Q, Luo B, Wei S (2004) Robust mixture modelling using multivariate \(t\)-distribution with missing information. Pattern Recognit Lett 25(6):701–710
Wei Y, Tang Y, McNicholas PD (2019) Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data. Comput Stat Data Anal 130:18–41
Wilks SS (1932) Moments and distributions of estimates of population parameters from fragmentary samples. Ann Math Stat 3(3):163–195
Yu C, Chen K, Yao W (2015) Outlier detection and robust mixture modeling using nonconvex penalized likelihood. J Stat Plan Inference 164:27–38