Mia Hubert | KU Leuven
Papers by Mia Hubert
Analytica Chimica Acta, 2011
The minimum covariance determinant (MCD) method is a robust estimator of multivariate location and scatter (Rousseeuw, 1984). The MCD is highly resistant to outliers, and it is often applied by itself and as a building block for other robust multivariate methods. Computing the exact MCD is very hard, so in practice one resorts to approximate algorithms. Most often the …
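As context for this abstract: the approximate algorithms it refers to are available in standard software, e.g. scikit-learn's MinCovDet, which implements the FAST-MCD algorithm of Rousseeuw and Van Driessen. A minimal sketch, with illustrative data and an illustrative chi-square cutoff:

```python
# Fit an approximate MCD and use the robust distances to flag outliers.
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[1, 0.5], [0.5, 1]], size=200)
X[:10] += 8  # contaminate 5% of the observations with a shifted cluster

mcd = MinCovDet(support_fraction=0.75, random_state=0).fit(X)
print("robust location:", mcd.location_)
print("robust scatter:\n", mcd.covariance_)

# Squared robust Mahalanobis distances; 13.82 is the chi^2_2 0.999 quantile.
d2 = mcd.mahalanobis(X)
print("flagged outliers:", np.where(d2 > 13.82)[0])
```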
Abstract. Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR) are the two most popular regression techniques in chemometrics. They both fit a linear relationship between two sets of variables. The responses are usually low-dimensional whereas the regressors are very numerous compared to the number of observations. In this paper we compare two recent robust PCR and PLSR methods and …
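For readers who want to see the two classical techniques side by side (the robust variants compared in the paper are not shown here), a minimal sketch with scikit-learn; the synthetic data and the choice of five components are illustrative:

```python
# Contrast classical PCR (PCA followed by least squares) with PLSR.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
n, p = 40, 200                     # far more regressors than observations
X = rng.standard_normal((n, p))
y = X[:, :5] @ rng.standard_normal(5) + 0.1 * rng.standard_normal(n)

pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)
pls = PLSRegression(n_components=5).fit(X, y)
print("PCR  R^2:", pcr.score(X, y))
print("PLSR R^2:", pls.score(X, y))
```

The design difference the sketch exposes: PCR chooses components to explain variance in the regressors alone, while PLSR chooses them to covary with the response, which is why PLSR often needs fewer components for the same fit.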
Computational Statistics & Data Analysis, 2007
http://dx.doi.org/10.1080/10408340600969403, Jan 12, 2007
Journal of Computational and Graphical Statistics, Jul 1, 2012
Theory and Applications of Recent Robust Methods, 2004
Oberwolfach Reports, 2000
Abstract. Extraction of information about the distribution underlying a high-dimensional data set is a formidable, complex problem dominating modern nonparametric statistics. Two general strategies are (i) to extract merely qualitative information, such as modality, or …
We construct classifiers for multivariate and functional data, aiming to combine affine invariance, robustness, and computational feasibility. The recent approach of Li et al. (2012, JASA) is affine invariant but performs poorly with depth functions that become zero outside the convex hull of the data. To address this, the bagdistance (bd) is proposed, based on halfspace depth. It satisfies most of the properties of a norm but is able to reflect asymmetry. Rather than transforming the data to their depths, we propose the DistSpace transform, based on bd or a measure of outlyingness, followed by k-nearest neighbor (kNN) classification of the transformed data points. This combines affine invariance with the simplicity, general applicability and robustness of kNN. The proposal is compared with other methods in experiments with real and simulated data.
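A minimal sketch of the DistSpace idea: represent each observation by its vector of distances to the classes, then run kNN in that low-dimensional space. As a stand-in for the bagdistance, this sketch uses a robust Mahalanobis distance via MinCovDet; the dataset and neighbor count are illustrative:

```python
import numpy as np
from sklearn.covariance import MinCovDet
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
classes = np.unique(y)

# One robust fit per class; each point is mapped to its vector of robust
# distances to the classes -- the DistSpace representation.
fits = [MinCovDet(random_state=0).fit(X[y == c]) for c in classes]
D = np.column_stack([np.sqrt(f.mahalanobis(X)) for f in fits])

knn = KNeighborsClassifier(n_neighbors=5).fit(D, y)
print("training accuracy in DistSpace:", knn.score(D, y))
```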
outl(x) = sup over unit directions a of |aᵀx − m(aᵀX)| / s(aᵀX), where m(·) and s(·) are univariate robust estimators of location and scale. In order to obtain robust estimates of the covariance matrix, we want to concentrate on those data points with small outlyingness. We consider two options. A first approach consists of downweighting all observations according to their outlyingness. We will call this estimator weighted Stahel-Donoho …
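A minimal sketch of this first, downweighting approach, approximating the Stahel-Donoho outlyingness by a finite set of random projections; the hard-rejection weight function and the number of directions are illustrative choices, not those of the paper:

```python
import numpy as np
from scipy.stats import median_abs_deviation

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 3))
X[:10] += 6  # a few outliers

# Random unit directions a; outl(x) = max_a |a'x - med(a'X)| / mad(a'X).
A = rng.standard_normal((500, 3))
A /= np.linalg.norm(A, axis=1, keepdims=True)
P = X @ A.T  # all projections, shape (n, 500)
out = np.max(np.abs(P - np.median(P, axis=0))
             / median_abs_deviation(P, axis=0, scale="normal"), axis=1)

# Illustrative hard-rejection weights: keep the 90% least outlying points.
w = (out <= np.quantile(out, 0.9)).astype(float)
mu = np.average(X, axis=0, weights=w)
Xc = X - mu
S = (Xc * w[:, None]).T @ Xc / w.sum()  # weighted scatter estimate
print("weighted Stahel-Donoho location:", mu)
```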
Journal of the American Statistical Association
Statistical Methods & Applications, 2015
Journal of Machine Learning Research
Recent results about the robustness of kernel methods involve the analysis of influence functions. By definition the influence function is closely related to leave-one-out criteria. In statistical learning, the latter are often used to assess the generalization of a method. In statistics, the influence function is used in a similar way to analyze the statistical efficiency of a method. Links between both worlds are explored. The influence function is related to the first term of a Taylor expansion. Higher-order influence functions are calculated. A recursive relation between these terms is found, characterizing the full Taylor expansion. It is shown how to evaluate influence functions at a specific sample distribution to obtain an approximation of the leave-one-out error. A specific implementation is proposed using an L1 loss in the selection of the hyperparameters and a Huber loss in the estimation procedure. The parameter in the Huber loss controlling the degree of robustness is optimized …
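A minimal sketch of the Huber loss referred to in the last sentences: quadratic for small residuals, linear for large ones, with the threshold parameter (here called beta, an illustrative name) controlling the degree of robustness:

```python
import numpy as np

def huber(r, beta):
    """Huber loss: 0.5*r^2 for |r| <= beta, linear growth beyond."""
    a = np.abs(r)
    return np.where(a <= beta, 0.5 * r**2, beta * (a - 0.5 * beta))

r = np.linspace(-3, 3, 7)
for beta in (0.5, 1.0, 2.0):
    print(f"beta={beta}:", np.round(huber(r, beta), 2))
```

Smaller beta behaves more like an L1 loss and downweights large residuals more aggressively; larger beta approaches ordinary least squares.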
Computational Statistics & Data Analysis, 2015
Lecture Notes in Statistics, 1996