Fast and robust bootstrap

Robust Location and Scatter Estimators in Multivariate Analysis

Frontiers in Statistics, 2006

The sample mean vector and the sample covariance matrix are the cornerstones of classical multivariate analysis. They are optimal when the underlying data are normal. However, they are notoriously sensitive to outliers and heavy-tailed noise. This article surveys robust alternatives to these classical location and scatter estimators and discusses their applications to multivariate data analysis.

Multivariate Location: Robust Estimators And Inference

Journal of Modern Applied Statistical Methods, 2004

It has long been known that under arbitrarily small departures from normality, the sample mean can have poor efficiency relative to various alternative estimators. In the multivariate case, (affine equivariant) estimators have been proposed for dealing with this problem, but a comparison of various estimators by Masse and Plante (2003) indicates that the small-sample efficiency of some recently derived methods is rather poor. This article reports that a skipped mean, where outliers are removed via a projection-type outlier detection method, is found to be more satisfactory. The more obvious method for computing a confidence region based on the skipped estimator (using a slight modification of the method in Liu & Singh, 1997) is found to be unsatisfactory except in the bivariate case, at least when the sample size is small. A much more effective method is to use the Bonferroni inequality in conjunction with a standard percentile bootstrap technique applied to the marginal distributions.
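The last approach in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the article's procedure: it assumes the coordinatewise median as a stand-in estimator (the article uses a skipped mean), and the function name `bonferroni_percentile_ci` and its defaults are hypothetical.

```python
import numpy as np

def bonferroni_percentile_ci(x, estimator=np.median, alpha=0.05,
                             n_boot=2000, seed=0):
    """Simultaneous percentile-bootstrap intervals, one per marginal.

    `estimator` must accept an `axis` argument (assumption: a
    coordinatewise location estimator such as np.median or np.mean).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    # Resample whole rows so dependence between coordinates is kept.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = np.stack([estimator(x[i], axis=0) for i in idx])  # (n_boot, p)
    # Bonferroni: each of the p marginal intervals uses level alpha / p,
    # so the joint coverage is at least 1 - alpha.
    a = alpha / p
    lower = np.quantile(boot, a / 2, axis=0)
    upper = np.quantile(boot, 1 - a / 2, axis=0)
    return lower, upper
```

Calling `bonferroni_percentile_ci(data)` on an `(n, p)` array returns two length-`p` arrays of lower and upper limits. The Bonferroni adjustment is conservative, which is the price paid for not having to model the joint bootstrap distribution.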

MULTIVARIATE REGRESSION S-ESTIMATORS FOR ROBUST ESTIMATION AND INFERENCE

2005

In this paper we consider S-estimators for multivariate regression. We study the robustness of the estimators in terms of their breakdown point and influence function. Our results extend results on S-estimators in the context of univariate regression and multivariate location and scatter. Furthermore we develop a fast and robust bootstrap method for the multivariate S-estimators to obtain in-

A comparative study of some robust methods for coefficient-estimation in linear regression

Computational Statistics & Data Analysis, 1997

Robust regression estimators are known to perform well in the presence of outliers. Although theoretical properties of these estimators have been derived, there is always a need for empirical results to assist their implementation in practical situations. A simulation study of four robust alternatives to the least-squares method is presented within a set of error distributions which includes many outlier-generating models. The robustness and efficiency features of the methods are exhibited, some finite-sample results are discussed in combination with asymptotic properties, and the relative merits of the estimators are viewed in connection with the tail-length of the underlying error distribution.

Robust model selection using fast and robust bootstrap

Computational Statistics & Data Analysis, 2008

Robust model selection procedures control the undue influence that outliers can have on the selection criteria by using both robust point estimators and a bounded loss function when measuring either the goodness-of-fit or the expected prediction error of each model. Furthermore, to avoid favoring over-fitting models, these two measures can be combined with a penalty term for the size of the model. The expected prediction error conditional on the observed data may be estimated using the bootstrap. However, bootstrapping robust estimators becomes extremely time-consuming on moderate- to high-dimensional data sets. It is shown that the expected prediction error can be estimated using a very fast and robust bootstrap method, and that this approach yields a consistent model selection method that is computationally feasible even for a relatively large number of covariates. Moreover, as opposed to other bootstrap methods, this proposal avoids the numerical problems associated with the small bootstrap samples required to obtain consistent model selection. The proposed model selection method is investigated through a simulation study, while its feasibility and good performance on moderately large regression models are illustrated on several real data examples.

A New Robust Method for Estimating Linear Regression Model in the Presence of Outliers

Pacific Journal of Science and Technology, 2018

Ordinary least-squares (OLS) estimators for a linear model are very sensitive to unusual values in the design space or outliers among the response values. Even a single atypical value may have a large effect on the parameter estimates. In this paper, we propose a new class of robust regression methods for the classical linear regression model. The proposed method was developed using regularization methods that allow one to handle a variety of inferential problems where there are more covariates than cases. Specifically, each outlying point in the data is estimated using a case-specific parameter. Penalized estimators are often suggested when the number of parameters in the model exceeds the number of observed data points. In light of this, we propose the use of ridge regression for estimating the case-specific parameters. The proposed robust regression method was validated using Monte Carlo datasets with varying proportions of outliers. The proposed method was also compared with some existing robust methods. Assessment based on breakdown point and efficiency indicated the superiority of the proposed method over the existing methods considered.

Robust Estimation of Multivariate Location and Scatter in the Presence of Missing Data

Journal of the American Statistical Association, 2012

Multivariate location and scatter matrix estimation is a cornerstone of multivariate data analysis. We consider this problem when the data may contain independent cellwise and casewise outliers. Flat data sets with a large number of variables and a relatively small number of cases are commonplace in modern statistical applications. In these cases, global down-weighting of an entire case, as performed by traditional robust procedures, may lead to poor results. We highlight the need for a new generation of robust estimators that can efficiently deal with cellwise outliers and at the same time show good performance under casewise outliers.

Inference for robust canonical variate analysis

2010

We consider the problem of optimally separating two multivariate populations. Robust linear discriminant rules can be obtained by replacing the empirical means and covariance in the classical discriminant rules by S or MM-estimates of location and scatter. We propose to use a fast and robust bootstrap method to obtain inference for such a robust discriminant analysis. This is useful since classical bootstrap methods may be unstable as well as extremely time-consuming when robust estimates such as S or MM-estimates are involved. In particular, fast and robust bootstrap can be used to investigate which variables contribute significantly to the canonical variate, and thus the discrimination of the classes. Through bootstrap, we can also examine the stability of the canonical variate. We illustrate the method on some real data examples.

On the statistical efficiency of robust estimators of multivariate location

Statistical Methodology, 2011

The univariate median is a well-known location estimator, which is √n-consistent, asymptotically Gaussian and affine equivariant. It is also a robust estimator of location with the highest asymptotic breakdown point (i.e., 50%). While several versions of the multivariate median have been proposed and extensively studied in the literature, many of the aforesaid statistical properties of the univariate median fail to hold for some of those multivariate medians. Among multivariate medians, the affine equivariant versions of the spatial and co-ordinatewise medians have a 50% asymptotic breakdown point and an asymptotically Gaussian distribution. The minimum covariance determinant (MCD) estimator is another widely used robust estimator of multivariate location, which is also affine equivariant, has a 50% asymptotic breakdown point, and has a Gaussian asymptotic distribution. In this article, we make a comparative study of the efficiencies of the affine equivariant versions of the spatial and co-ordinatewise medians and the efficiencies of the MCD and related estimators considered in the literature.
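For readers unfamiliar with the two medians named above, here is a minimal sketch of their plain (not affine equivariant) versions. It is an illustration under stated assumptions, not the article's estimators: the affine equivariant variants studied there need an extra data-driven transformation that is omitted here, and the function names and tolerances are hypothetical. The spatial median is computed with Weiszfeld-type fixed-point iterations.

```python
import numpy as np

def coordinatewise_median(x):
    """Median taken separately in each coordinate."""
    return np.median(np.asarray(x, dtype=float), axis=0)

def spatial_median(x, tol=1e-8, max_iter=500):
    """Point minimising the sum of Euclidean distances to the rows of x."""
    x = np.asarray(x, dtype=float)
    m = np.median(x, axis=0)  # robust starting value
    for _ in range(max_iter):
        d = np.linalg.norm(x - m, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        w = 1.0 / d
        # Weiszfeld update: weighted mean with weights 1 / distance.
        m_new = (w[:, None] * x).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

On a sample in which a minority of rows has been replaced by a distant cluster, both functions should stay near the centre of the majority, whereas `x.mean(axis=0)` is pulled toward the cluster. This is the bounded-bias behaviour that a high breakdown point describes.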