The forward search: Theory and data analysis

The Forward Search

Compstat, 2002

The paper is based on an introductory talk to a session on the forward search at Compstat 2002 in Berlin. Four examples are considered: multiple regression, response transformation, generalized linear models with gamma errors and multivariate data, exemplified by measurements on Swiss banknotes. The method reveals the effect of multiple outliers on inferences about suitable models for the data. References are given to more extensive treatments in joint work with Marco Riani.

Asymptotic Analysis of the Forward Search

SSRN Electronic Journal, 2000

The Forward Search is an iterative algorithm concerned with detection of outliers and other unsuspected structures in data. This approach has been suggested, analysed and applied for regression models in the monograph . An asymptotic analysis of the Forward Search is made. The argument involves theory for a new class of weighted and marked empirical processes, quantile process theory, and a fixed point argument to describe the iterative element of the procedure.

A robust procedure based on forward search to detect outliers

It is now widely recognized that the presence of outliers or errors in the data collection process can affect the results of any statistical analysis. The effect is likely to be even more severe in complex surveys such as a census. In the context of the VI Italian agriculture census, ISTAT used a robust procedure based on the Forward Search to detect cases in which the information collected by the census did not agree with that coming from the General Agency for Agricultural Subsidies (AGEA). The controls concerned total agricultural area (SAT), used agricultural area (SAU), and land for vineyards and olive groves. The outliers were referred to regional subject matter experts for further investigation. This process has significantly improved the quality of the data in both the Agriculture census and AGEA. This paper summarizes how ISTAT tackled the problems of data correction and control, discusses the methodological problems found d...

Extensions of the Forward Search to Time Series

Studies in Nonlinear Dynamics & Econometrics, 2000

This paper extends the forward search technique to the analysis of structural time series data. It provides a series of powerful new forward plots that use information from the whole sample to display the effect of each observation on a wide variety of aspects of the fitted model and shows how the forward search, free from masking and swamping problems, can detect the main underlying features of the series under study (masked multiple outliers, level shifts or transitory changes). The effectiveness of the suggested approach is shown through the analysis of real and simulated data.

The Forward Search for Very Large Datasets

Journal of Statistical Software, 2015

The identification of atypical observations and the immunization of data analysis against both outliers and failures of modeling are important aspects of modern statistics. The forward search is a graphics-rich approach that leads to the formal detection of outliers and to the detection of model inadequacy combined with suggestions for model enhancement. The key idea is to monitor quantities of interest, such as parameter estimates and test statistics, as the model is fitted to data subsets of increasing size. In this paper we propose some computational improvements of the forward search algorithm and we provide a recursive implementation of the procedure which exploits the information of the previous step. The output is a set of efficient routines for fast updating of the model parameter estimates, which do not require any data sorting, and fast computation of likelihood contributions, which do not require matrix inversion or QR decomposition. It is shown that the new algorithms enable a reduction of the computation time by more than 80%. Furthermore, the running time now increases almost linearly with the sample size. All the routines described in this paper are included in the FSDA toolbox for MATLAB, which is freely downloadable from the internet.
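The idea of reusing the previous step instead of refitting from scratch can be illustrated with the textbook recursive least squares update. The following is a minimal Python sketch, not the FSDA routines themselves: the function name `rls_add`, the simulated data and the starting size `m0` are assumptions made for illustration, and the sketch covers only the case where a single observation joins the subset.

```python
import numpy as np

def rls_add(beta, P, x_new, y_new):
    """Update LS coefficients and P = (X'X)^{-1} when one row (x_new, y_new) is added.

    Hypothetical helper for illustration; uses the Sherman-Morrison identity,
    so no matrix inversion is needed after the first step.
    """
    Px = P @ x_new
    denom = 1.0 + x_new @ Px
    resid = y_new - x_new @ beta            # prediction error before the update
    beta_new = beta + Px * (resid / denom)
    P_new = P - np.outer(Px, Px) / denom    # rank-one update of (X'X)^{-1}
    return beta_new, P_new

# Simulated data (assumed values, for illustration only)
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# One explicit inversion on the first m0 rows, then rank-one updates only
m0 = 10
P = np.linalg.inv(X[:m0].T @ X[:m0])
beta = P @ X[:m0].T @ y[:m0]
for i in range(m0, n):
    beta, P = rls_add(beta, P, X[i], y[i])

# Reference fit on the full data for comparison
beta_direct = np.linalg.lstsq(X, y, rcond=None)[0]
```

Each update costs on the order of p squared operations rather than a full refit. When units leave the subset as well as join it, a matching downdate would also be needed, which this sketch omits.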

Building Regression Models with the Forward Search

Journal of Computing and Information Technology, 2007

We give an example of the use of the forward search in building a regression model. The standard backwards elimination of variables is supplemented by forward plots of added variable t statistics that exhibit the effect of each observation on the process of model building. Attention is also paid to the effect of individual observations on selection of a transformation. Variable selection using AIC is mentioned, as is the analysis of multivariate data.

A Detection Measure of Outliers Based on Forward Search Approach for Cox-Regression Model

STATISTIKA: Journal of Theoretical Statistics and Its Applications, 2014

This paper focuses on identifying possible outliers based on the Cox regression model. The forward search method has been applied in several studies involving regression-based models such as linear regression and generalized linear models. The method starts with a pre-selected subset of a data set. It then moves forward through the data by adding observations one by one, and the progressive changes in the values of statistics are noted. In this paper, we extend the application of the forward search to survival data analysis. Currently, graphical methods are used to detect any significant changes in the values of the statistics. We propose a measure that may help determine which observations are outliers.

Some Methods of Detection of Outliers in Linear Regression Model

An outlier is an observation that deviates markedly from the majority of the data. Detecting outliers is important for identifying which observations have the greatest influence on the parameter estimates. Several methods for the detection of outliers are available in the literature, and a good number of test statistics for detecting outliers have been developed. As an alternative to detection, outliers can also be handled through robust regression techniques such as M-estimators and Least Median of Squares (LMS). Robust regression provides parameter estimates that are insensitive to the presence of outliers and also helps to detect outlying observations. More recently, the Forward Search (FS) method has been developed, in which a small, robustly chosen subset of observations is used to fit a model by Least Squares (LS). Further observations are then included in subsequent steps. This forward search procedure provides a wealth of information not only for outlier detection but, much more importantly, on the effect of each observation on aspects of inferences about the model. It also clearly reveals the masking problem, if present in the data.
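The procedure described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than any author's implementation: the function name `forward_search`, the LMS-like start from random elemental subsets, the number of starts and the simulated data are all hypothetical choices, and the monitored quantity is simply the smallest absolute residual among units outside the subset.

```python
import numpy as np

def forward_search(X, y, n_starts=200, seed=0):
    """Simplified forward search for linear regression (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Robust start: among random subsets of size p, keep the exact fit with
    # the smallest median squared residual over all units (LMS-like criterion).
    best, best_crit = None, np.inf
    for _ in range(n_starts):
        idx = rng.choice(n, size=p, replace=False)
        try:
            b = np.linalg.solve(X[idx], y[idx])
        except np.linalg.LinAlgError:
            continue  # singular elemental subset, try another
        crit = np.median((y - X @ b) ** 2)
        if crit < best_crit:
            best, best_crit = idx, crit

    subset = np.sort(best)
    entry_order, min_outside = [], []
    for m in range(p, n):
        # Fit by least squares on the current subset, score all units
        b = np.linalg.lstsq(X[subset], y[subset], rcond=None)[0]
        r2 = (y - X @ b) ** 2
        outside = np.setdiff1d(np.arange(n), subset)
        min_outside.append(np.sqrt(r2[outside].min()))
        # Next subset: the m + 1 units with the smallest squared residuals
        new_subset = np.sort(np.argsort(r2)[: m + 1])
        entry_order.append(np.setdiff1d(new_subset, subset))
        subset = new_subset
    return entry_order, np.array(min_outside)

# Simulated example (assumed values): 55 clean units plus 5 shifted outliers
rng = np.random.default_rng(42)
n = 60
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
outliers = np.arange(55, 60)
y[outliers] += 15.0

entry_order, min_outside = forward_search(X, y)
late_entries = np.concatenate(entry_order[-5:])  # units joining in the last 5 steps
```

In this setup the contaminated units should be the last to join, and the monitored residual should jump once only outliers remain outside the subset. Plotting `min_outside` against subset size gives a basic forward plot.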

Forward search outlier detection in data envelopment analysis

In this paper we tackle the problem of outlier detection in data envelopment analysis (DEA). We propose a procedure where we merge the super-efficiency DEA and the forward search. Since DEA provides efficiency scores which are not parameters to fit the model to the data, we introduce a distance, to be monitored along the search. This distance is obtained through the integration of a regression model and the super-efficiency DEA. We simulate a Cobb–Douglas production function and we compare the super-efficiency DEA and the forward search analysis in both uncontaminated and contaminated settings. For inference about outliers, we exploit envelopes obtained through Monte Carlo simulations.
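The simulation setting can be illustrated with a short Python sketch. It does not implement super-efficiency DEA or the distance proposed in the paper; it only generates a Cobb-Douglas technology with one-sided inefficiency and a contaminated copy, and fits a log-linear regression to each. All parameter values, the half-normal inefficiency term, the contamination scheme and the helper `loglinear_fit` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
A, alpha, beta = 2.0, 0.4, 0.5               # assumed technology parameters

# Two inputs and the deterministic Cobb-Douglas frontier
x1 = rng.uniform(1.0, 10.0, size=n)
x2 = rng.uniform(1.0, 10.0, size=n)
frontier = A * x1**alpha * x2**beta

# Uncontaminated outputs: one-sided (half-normal) inefficiency below the frontier
u = np.abs(rng.normal(scale=0.2, size=n))
y_clean = frontier * np.exp(-u)

# Contaminated outputs: 10 units placed well above the frontier (assumed scheme)
y_cont = y_clean.copy()
bad = rng.choice(n, size=10, replace=False)
y_cont[bad] = 3.0 * frontier[bad]

def loglinear_fit(y):
    """Least squares fit of log y on log x1 and log x2 (hypothetical helper)."""
    Z = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
    return np.linalg.lstsq(Z, np.log(y), rcond=None)[0]

coef_clean = loglinear_fit(y_clean)   # intercept, alpha estimate, beta estimate
coef_cont = loglinear_fit(y_cont)
```

Comparing `coef_clean` with `coef_cont` gives a rough sense of how a few units above the frontier can affect a regression fitted to all the data, which is the kind of effect a forward search is designed to reveal.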