Spatial Trimming, with Applications to Robustify Sample Spatial Quantile and Outlyingness Functions, and to Construct a New Robust Scatter Estimator

A robust sample spatial outlyingness function

Journal of Statistical Planning and Inference, 2013

Sample quantile, rank, and outlyingness functions play long-established roles in univariate exploratory data analysis. In recent years, various multivariate generalizations have been formulated, among which the "spatial" approach has become especially well developed, including fully affine equivariant/invariant versions with but modest computational burden (
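To make the spatial approach concrete, here is a minimal Python sketch (assuming NumPy) of the sample spatial rank and outlyingness functions in their basic, only orthogonally invariant form: $R_n(x) = n^{-1}\sum_i S(x - X_i)$ with spatial sign $S(v) = v/\lVert v\rVert$, and outlyingness $O_n(x) = \lVert R_n(x)\rVert$. The function names are hypothetical, and the robustified (spatially trimmed) and affine invariant versions discussed above are not reproduced.

```python
import numpy as np

def spatial_sign(v):
    """Spatial sign S(v) = v / ||v|| (row-wise), with S(0) = 0 by convention."""
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.where(norms > 0, norms, 1.0)  # avoid dividing by zero
    return np.where(norms > 0, v / safe, 0.0)

def spatial_rank(x, data):
    """Sample spatial (centered) rank R_n(x) = (1/n) * sum_i S(x - X_i)."""
    return spatial_sign(x - data).mean(axis=0)

def spatial_outlyingness(x, data):
    """Sample spatial outlyingness O_n(x) = ||R_n(x)||, a value in [0, 1]."""
    return float(np.linalg.norm(spatial_rank(x, data)))

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 2))
print(spatial_outlyingness(np.zeros(2), data))             # small: central point
print(spatial_outlyingness(np.array([10.0, 10.0]), data))  # near 1: outlying point
```

Values near 0 mark central points and values near 1 mark points far outside the data cloud; an affine invariant variant would first standardize the data with a scatter estimate.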

Robust Multivariate Location Estimation in the Existence of Casewise and Cellwise Outliers

Mathematics and Statistics, 2021

Multivariate outliers can occur in two forms, casewise and cellwise. Collected data typically contain an unknown proportion and unknown types of outliers, which can jeopardize location estimation and affect research findings. When both forms coexist in the same data set, the traditional distance-based trimmed mean and the coordinate-wise trimmed mean do not estimate location well: the distance-based trimmed mean suffers from cellwise outliers left over after trimming, whereas the coordinate-wise trimmed mean is affected by the additional casewise outliers. This paper therefore proposes a new robust multivariate location estimator, the α-distance-based trimmed median, to deal with both types of outliers simultaneously. Simulated data were used to illustrate the feasibility of the new procedure by comparing it with the classical mean, the classical median and the α-distance-based trimmed mean. As expected, the classical mean performed best on clean data but poorly on contaminated data. Meanwhile, the classical median outperformed the distance-based trimmed mean when both casewise and cellwise outliers were present, but was still affected by their combined effect. Based on the simulation results, the proposed α-distance-based trimmed median yields better location estimates on contaminated data than the other three estimators considered, and can thus mitigate the effect of outliers.
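The contrast between distance-based and coordinate-wise trimming can be sketched in Python (assuming NumPy). This is not the paper's α-distance-based trimmed median, whose exact definition is not given here: the distance (Euclidean distance after centering at the coordinatewise median and scaling each column by its MAD) and all function names are hypothetical choices for illustration. Passing `center=np.median` gives a trimmed-median flavor of the distance-based rule.

```python
import numpy as np

def robust_distances(X):
    """Hypothetical distance: Euclidean norm after centering at the
    coordinatewise median and scaling each column by its MAD."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    mad = np.where(mad > 0, mad, 1.0)  # guard against zero spread
    return np.linalg.norm((X - med) / mad, axis=1)

def distance_trimmed(X, alpha, center=np.mean):
    """Drop the floor(alpha * n) rows with the largest distances, then apply
    `center` column-wise to the remaining rows."""
    n = X.shape[0]
    k = int(np.floor(alpha * n))
    keep = np.argsort(robust_distances(X))[: n - k]
    return center(X[keep], axis=0)

def coordinatewise_trimmed_mean(X, alpha):
    """Trim floor(alpha * n) values from each tail of every column separately."""
    n = X.shape[0]
    k = int(np.floor(alpha * n))
    Xs = np.sort(X, axis=0)
    return Xs[k : n - k].mean(axis=0)

rng = np.random.default_rng(1)
# 100 clean rows plus 10 casewise outliers
X = np.vstack([rng.standard_normal((100, 3)), np.full((10, 3), 50.0)])
print(distance_trimmed(X, alpha=0.2))                    # distance-based trimmed mean
print(distance_trimmed(X, alpha=0.2, center=np.median))  # distance-based trimmed median
print(coordinatewise_trimmed_mean(X, alpha=0.1))
```

With purely casewise contamination all three stay near the origin; cellwise contamination spread over many rows is where the distance-based rule starts leaving outlying cells behind.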

On the statistical efficiency of robust estimators of multivariate location

Statistical Methodology, 2011

The univariate median is a well-known location estimator, which is $\sqrt{n}$-consistent, asymptotically Gaussian and affine equivariant. It is also a robust estimator of location with the highest possible asymptotic breakdown point (50%). While several versions of the multivariate median have been proposed and extensively studied in the literature, many of these statistical properties of the univariate median fail to hold for some of them. Among multivariate medians, the affine equivariant versions of the spatial and co-ordinatewise medians have a 50% asymptotic breakdown point and an asymptotically Gaussian distribution. The minimum covariance determinant (MCD) estimator is another widely used robust estimator of multivariate location, which is also affine equivariant, has a 50% asymptotic breakdown point, and is asymptotically Gaussian. In this article, we make a comparative study of the efficiencies of the affine equivariant versions of the spatial and co-ordinatewise medians and of MCD and related estimators considered in the literature.
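The spatial median minimizes $\sum_i \lVert X_i - m\rVert$, and a common way to compute it is Weiszfeld's fixed-point iteration. The Python sketch below (assuming NumPy) is the plain, only orthogonally equivariant version, with a crude guard for iterates that land on a data point; the affine equivariant versions compared in the article additionally transform the data first. The function name is hypothetical.

```python
import numpy as np

def spatial_median(X, tol=1e-8, max_iter=500):
    """Weiszfeld iteration for argmin_m sum_i ||X_i - m||."""
    m = np.median(X, axis=0)  # start from the coordinatewise median
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(X - m, axis=1), 1e-12)  # crude guard
        w = 1.0 / d
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()  # weighted mean step
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rng = np.random.default_rng(2)
X = np.vstack([rng.standard_normal((95, 2)), np.full((5, 2), 1000.0)])
print(X.mean(axis=0))        # dragged toward the outliers (about 50 per coordinate)
print(np.median(X, axis=0))  # coordinatewise median
print(spatial_median(X))     # stays near the origin
```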

Robust Location and Scatter Estimators in Multivariate Analysis

Frontiers in Statistics, 2006

The sample mean vector and the sample covariance matrix are the cornerstone of classical multivariate analysis. They are optimal when the underlying data are normal, but they are notorious for being extremely sensitive to outliers and heavy-tailed noise. This article surveys robust alternatives to these classical location and scatter estimators and discusses their applications in multivariate data analysis.

Minimum Covariance Determinant-Based Quantile Robust Regression-Type Estimators for Mean Parameter

Mathematical Problems in Engineering

Robust regression tools are commonly used to develop regression-type ratio estimators with traditional measures of location whenever data are contaminated with outliers. Recently, researchers extended this idea and developed regression-type ratio estimators through robust minimum covariance determinant (MCD) estimation. In this study, quantile regression with MCD-based measures of location is used to propose a class of quantile regression-type mean estimators. The mean squared errors (MSEs) of the proposed estimators are also obtained. The proposed estimators are compared with the reviewed class of estimators through a simulation study, and two real-life applications are included. To assess the presence of outliers in these real-life applications, the Dixon chi-squared test is used. The quantile regression-type estimators are found to perform better than some existing estimators.
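For orientation, the classical regression-type estimator of a population mean is $t = \bar y + b(\mu_X - \bar x)$, where $\mu_X$ is the known population mean of the auxiliary variable. The Python sketch below (assuming NumPy) takes the slope from an approximate linear quantile regression fitted by iteratively reweighted least squares on the check loss, and lets robust location estimates be plugged in through `loc_x` and `loc_y`. It does not reproduce the paper's MCD-based estimators or their MSE expressions; all names are hypothetical.

```python
import numpy as np

def quantile_slope(x, y, tau=0.5, n_iter=100, eps=1e-6):
    """Approximate slope of the tau-th linear quantile regression of y on x
    (IRLS: rho_tau(r) = w * r**2 with w = (tau or 1 - tau) / |r|)."""
    A = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(A, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        r = y - A @ beta
        w = np.where(r >= 0, tau, 1.0 - tau) / np.maximum(np.abs(r), eps)
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
    return beta[1]

def regression_type_mean(x, y, mu_x, slope, loc_x=None, loc_y=None):
    """t = loc_y + slope * (mu_x - loc_x); defaults are the sample means,
    but robust location estimates can be passed in instead."""
    loc_x = np.mean(x) if loc_x is None else loc_x
    loc_y = np.mean(y) if loc_y is None else loc_y
    return loc_y + slope * (mu_x - loc_x)

x = np.arange(20.0)
y = 2.0 + 3.0 * x
y_out = y.copy()
y_out[19] += 100.0  # one gross outlier in the study variable

b_ls = np.polyfit(x, y_out, 1)[0]  # least-squares slope, pulled by the outlier
b_q = quantile_slope(x, y_out, tau=0.5)
print(b_ls, b_q)
print(regression_type_mean(x, y, mu_x=10.0, slope=quantile_slope(x, y)))  # about 32
```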

Robust Estimation of Multivariate Location and Scatter in the Presence of Missing Data

Journal of the American Statistical Association, 2012

Multivariate location and scatter matrix estimation is a cornerstone of multivariate data analysis. We consider this problem when the data may contain independent cellwise and casewise outliers. Flat data sets with a large number of variables and a relatively small number of cases are commonplace in modern statistical applications. In these cases global down-weighting of an entire case, as performed by traditional robust procedures, may lead to poor results. We highlight the need for a new generation of robust estimators that can efficiently deal with cellwise outliers and at the same time show good performance under casewise outliers.
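Why casewise down-weighting struggles on flat data can be seen with a quick calculation: if each of $p$ cells in a row is independently contaminated with probability $\varepsilon$, a row is clean with probability $(1-\varepsilon)^p$, so with $\varepsilon = 0.05$ and $p = 10$ about 40% of rows contain at least one outlying cell. Below is a minimal Python sketch (assuming NumPy) of a simple univariate cell filter that flags cells by a MAD-based robust z-score and replaces them with NaN rather than discarding whole rows. The cutoff of 3 and the function names are assumptions for illustration, not the estimators discussed in the paper.

```python
import numpy as np

def flag_cellwise(X, cutoff=3.0):
    """Boolean mask of cells whose robust z-score exceeds the cutoff."""
    med = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - med), axis=0)  # normal-consistent MAD
    scale = np.where(scale > 0, scale, 1.0)  # guard against zero spread
    return np.abs(X - med) / scale > cutoff

def filter_cells(X, cutoff=3.0):
    """Replace flagged cells by NaN instead of dropping entire rows."""
    Xf = X.astype(float, copy=True)
    Xf[flag_cellwise(X, cutoff)] = np.nan
    return Xf

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5))
X[3, 0] = 0.0   # an ordinary cell
X[3, 2] = 50.0  # a single cellwise outlier in row 3
Xf = filter_cells(X)
print(np.isnan(Xf[3, 2]), np.isnan(Xf[3, 0]))  # True False
print(np.nanmedian(Xf, axis=0))                # NaN-aware coordinatewise median
print(1 - 0.95 ** 10)                          # about 0.40
```

Row 3 keeps its remaining cells, whereas a casewise rule would discard or down-weight the whole row.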

Multivariate Location: Robust Estimators And Inference

Journal of Modern Applied Statistical Methods, 2004

It has long been known that under arbitrarily small departures from normality, the sample mean can have poor efficiency relative to various alternative estimators. In the multivariate case, (affine equivariant) estimators have been proposed for dealing with this problem, but a comparison of various estimators by Masse and Plante (2003) indicates that the small-sample efficiency of some recently derived methods is rather poor. This article reports that a skipped mean, where outliers are removed via a projection-type outlier detection method, is found to be more satisfactory. The more obvious method for computing a confidence region based on the skipped estimator (using a slight modification of the method in Liu & Singh, 1997) is found to be unsatisfactory except in the bivariate case, at least when the sample size is small. A much more effective method is to use the Bonferroni inequality in conjunction with a standard percentile bootstrap technique applied to the marginal distributions.
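The skipped mean can be sketched in Python (assuming NumPy) as follows. The outlier rule is a simplified projection-type check: it centers the data at the coordinatewise median, projects every point onto each line through the center and a data point, and flags points that a MAD-median rule marks as outlying in any projection. The center, the rule, the cutoff of 2.5 and the function names are assumptions, not necessarily the exact method of the article, and the bootstrap confidence regions are not shown.

```python
import numpy as np

def projection_outliers(X, cutoff=2.5):
    """Flag rows of X that are outlying in at least one projection."""
    c = np.median(X, axis=0)  # assumed center: coordinatewise median
    Z = X - c
    flagged = np.zeros(len(X), dtype=bool)
    for z in Z:
        norm = np.linalg.norm(z)
        if norm == 0:
            continue
        p = Z @ (z / norm)  # positions of all points along this direction
        med = np.median(p)
        mad = np.median(np.abs(p - med)) / 0.6745  # normal-consistent MAD
        if mad > 0:
            flagged |= np.abs(p - med) / mad > cutoff  # MAD-median rule
    return flagged

def skipped_mean(X, cutoff=2.5):
    """Mean of the points not flagged by projection_outliers."""
    return X[~projection_outliers(X, cutoff)].mean(axis=0)

rng = np.random.default_rng(4)
X = np.vstack([rng.standard_normal((50, 2)), np.full((5, 2), 20.0)])
flags = projection_outliers(X)
print(flags[-5:])       # the five planted outliers are flagged
print(skipped_mean(X))  # close to the origin, unlike X.mean(axis=0)
```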

A note on the robustness of multivariate medians

Statistics & Probability Letters, 1999

In this note we investigate the extent to which some of the fundamental properties of the univariate median are retained by different multivariate versions of the median, with special emphasis on robustness and breakdown properties. We show that transformation retransformation medians, which are affine equivariant, $n^{1/2}$-consistent and asymptotically normally distributed under standard regularity conditions, can also be very robust with high breakdown points. We prove that with an appropriate adaptive choice of the transformation matrix based on a high breakdown estimate of the multivariate scatter matrix (e.g. an S-estimate or the minimum covariance determinant estimate), the finite sample breakdown point of a transformation retransformation median will be as high as $n^{-1}[(n-d+1)/2]$, where $n$ is the sample size, $d$ is the dimension of the data, and $[x]$ denotes the largest integer smaller than or equal to $x$. This implies that as $n \to \infty$, the asymptotic breakdown point of a transformation retransformation median can be made equal to 50% in any dimension, just like the univariate median. We present a brief comparative study of the robustness properties of different affine equivariant multivariate medians using an illustrative example.
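A transformation retransformation (TR) median can be sketched by standardizing the data with a scatter matrix $\Sigma$, taking the spatial median of the transformed points and mapping it back. The Python sketch below (assuming NumPy) uses the symmetric inverse square root $A = \Sigma^{-1/2}$ as one possible transformation. The adaptive choice described above would supply a high breakdown scatter estimate (e.g. S or MCD); the usage example uses the sample covariance only to check affine equivariance, so it has no robustness. The helper names are hypothetical; `tr_breakdown_point` simply evaluates $n^{-1}[(n-d+1)/2]$.

```python
import numpy as np

def spatial_median(Y, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for the spatial median of the rows of Y."""
    m = np.median(Y, axis=0)
    for _ in range(max_iter):
        d = np.maximum(np.linalg.norm(Y - m, axis=1), 1e-12)
        m_new = (Y / d[:, None]).sum(axis=0) / (1.0 / d).sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def tr_spatial_median(X, scatter):
    """Transform with A = scatter^(-1/2), take the spatial median, map back."""
    vals, vecs = np.linalg.eigh(scatter)
    A = vecs @ np.diag(vals ** -0.5) @ vecs.T     # symmetric inverse square root
    A_inv = vecs @ np.diag(vals ** 0.5) @ vecs.T  # its inverse
    return A_inv @ spatial_median(X @ A.T)

def tr_breakdown_point(n, d):
    """Finite sample breakdown point n^{-1} [(n - d + 1) / 2]."""
    return ((n - d + 1) // 2) / n

rng = np.random.default_rng(7)
X = rng.standard_normal((50, 2))
B = np.array([[2.0, 1.0], [0.0, 3.0]])
b = np.array([5.0, -1.0])
Y = X @ B.T + b
m_x = tr_spatial_median(X, np.cov(X, rowvar=False))
m_y = tr_spatial_median(Y, np.cov(Y, rowvar=False))
print(np.allclose(m_y, B @ m_x + b, atol=1e-5))  # affine equivariance check
print(tr_breakdown_point(100, 2))                # 0.49
```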

Controlling the size of multivariate outlier tests with the MCD estimator of scatter

Statistics and Computing, 2009

Multivariate outlier detection requires computation of robust distances to be compared with appropriate cutoff points. In this paper we propose a new calibration method for obtaining reliable cutoff points of distances derived from the MCD estimator of scatter. These cutoff points are based on a more accurate estimate of the extreme tail of the distribution of robust distances. We show that our procedure gives reliable tests of outlyingness in almost all situations of practical interest, provided that the sample size is not much smaller than 50. Therefore, it is a considerable improvement over all the available MCD procedures, which are unable to provide good control over the size of multiple outlier tests for the data structures considered in this paper.

Reducing the mean squared error of quantile-based estimators by smoothing

TEST, 2013

Many univariate robust estimators are based on quantiles. As already theoretically pointed out by , smoothing the empirical distribution function with an appropriate kernel and bandwidth can reduce the variance and mean squared error (MSE) of some quantile-based estimators in small data sets. In this paper we apply this idea to several robust estimators of location, scale and skewness. We propose a robust bandwidth selection and bias reduction procedure. We show that this smoothing method indeed leads to smaller MSEs, also on contaminated data sets. In particular, we obtain better performance for the medcouple, a robust measure of skewness that can be used for outlier detection in skewed distributions.
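The smoothing idea can be illustrated by replacing the empirical distribution function with a kernel-smoothed version $\hat F_h(t) = n^{-1}\sum_i \Phi\big((t - x_i)/h\big)$ and inverting it. The Python sketch below (assuming NumPy) uses a Gaussian kernel and bisection. The bandwidth `h` is supplied by the user, since the paper's robust bandwidth selection and bias reduction are not reproduced, and the function name is hypothetical.

```python
import math
import numpy as np

def smoothed_quantile(x, q, h, n_iter=100):
    """q-quantile of the Gaussian-kernel-smoothed empirical CDF, by bisection."""
    x = np.asarray(x, dtype=float)
    norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

    def smoothed_cdf(t):
        return norm_cdf((t - x) / h).mean()

    lo, hi = x.min() - 10.0 * h, x.max() + 10.0 * h  # F_h(lo) ~ 0, F_h(hi) ~ 1
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if smoothed_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(smoothed_quantile(x, 0.5, h=0.5))  # 0 up to rounding, by symmetry
rng = np.random.default_rng(6)
sample = rng.standard_normal(15)
print(np.quantile(sample, 0.75), smoothed_quantile(sample, 0.75, h=0.3))
```

Unlike the step-function sample quantile, the smoothed estimate varies continuously with the data, which is the source of the variance reduction in small samples.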