Marco Riani | Università degli Studi di Parma (Italy)
Papers by Marco Riani
Statistical Methods & Applications
The paper introduces an automatic procedure for the parametric transformation of the response in regression models to approximate normality. We consider the Box–Cox transformation and its generalization to the extended Yeo–Johnson transformation which allows for both positive and negative responses. A simulation study illuminates the superior comparative properties of our automatic procedure for the Box–Cox transformation. The usefulness of our procedure is demonstrated on four sets of data, two including negative observations. An important theoretical development is an extension of the Bayesian Information Criterion (BIC) to the comparison of models following the deletion of observations, the number deleted here depending on the transformation parameter.
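As a rough illustration of what such response transformations do (this is not the automatic procedure of the paper, and SciPy implements only the standard Yeo–Johnson, not the extended version), the sketch below estimates the Box–Cox and Yeo–Johnson parameters by maximum likelihood on synthetic responses; all data and values are illustrative assumptions.

```python
# Minimal sketch: maximum-likelihood estimation of Box-Cox and Yeo-Johnson
# transformation parameters with SciPy (illustrative only; not the paper's
# automatic procedure or its extended Yeo-Johnson transformation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y_pos = np.exp(rng.normal(size=200))      # positive, right-skewed response
y_mixed = rng.normal(size=200) ** 3       # response with negative values

y_bc, lam_bc = stats.boxcox(y_pos)        # Box-Cox requires y > 0
y_yj, lam_yj = stats.yeojohnson(y_mixed)  # Yeo-Johnson allows y <= 0

print(f"Box-Cox lambda:     {lam_bc:.3f}")
print(f"Yeo-Johnson lambda: {lam_yj:.3f}")
```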
Pattern Recognition, 2018
Monitoring the properties of single-sample robust analyses of multivariate data as a function of breakdown point or efficiency leads to the adaptive choice of the best values of these parameters, eliminating arbitrary decisions about their values and so increasing the quality of estimators. Monitoring the trimming proportion in robust cluster analysis likewise leads to improved estimators. We illustrate these procedures on a sample of 424 cows with bovine phlegmon. For clustering we use a method which includes constraints on the eigenvalues of the dispersion matrices, so avoiding thread-shaped clusters. The "car-bike" plot reveals the stability of clustering as the trimming level changes. The pattern of clusters and outliers alters appreciably for low levels of trimming.
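A minimal sketch of the monitoring idea, assuming scikit-learn's MCD estimator as a stand-in for the robust multivariate estimators discussed in the paper: refit over a grid of support fractions (the complement of the trimming/breakdown level) and watch how many observations are flagged as outlying. Data, grid and cutoff are illustrative assumptions.

```python
# Illustrative "monitoring" loop: refit a robust multivariate estimator over a
# grid of breakdown values and track the robust Mahalanobis distances
# (scikit-learn's MCD is assumed here, not the routines used in the paper).
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(180, 3)),
               rng.normal(6, 1, size=(20, 3))])   # bulk of data plus outliers

for h in (0.6, 0.7, 0.8, 0.9):                    # support fraction ~ 1 - trimming
    mcd = MinCovDet(support_fraction=h, random_state=0).fit(X)
    d2 = mcd.mahalanobis(X)                       # squared robust distances
    n_flag = int(np.sum(d2 > 16.27))              # approx. chi2_3 0.999 cutoff
    print(f"support_fraction={h:.1f}: flagged {n_flag} observations")
```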
Outlook on Agriculture, 2016
The objective of this paper was to test whether investing activity in the futures markets of different commodities (grains, sugar, coffee, cotton, cocoa, livestock) could be identified as a source of the increasing level and volatility of agricultural commodity prices. The causal link between trading activity and market factors (returns, volatility) can be investigated using weekly data, usually derived from the Commitment of Traders Reports released by the US Commodity Futures Trading Commission (CFTC), or daily data expressed as the ratio of volume to open interest (VOIR). To increase the power of the estimation process and investigate the role of causal variables in determining the trends of all the market factors, the authors tested the estimates obtained by seemingly unrelated regression (SUR). One innovation is the evaluation of the inverse relationships between market factors and causal variables: the market factors were also tested as causal variables, avoiding giving priority to only one side of the relationship in the sense of Granger causality. The lack of significance revealed by the Granger causality test on weekly models could be due to the inappropriate frequency of the information. The ratio of volume to open interest in futures contracts performs better than other parameters extensively adopted in the literature, most likely because its daily frequency provides statistical evidence of phenomena whose effects are subsumed within weekly intervals. The estimations for the daily model provide statistical evidence of a mutual relationship only between trading activity and realized volatility; no causal relationships were found for returns. The behaviour of all 12 futures markets examined is quite similar and uniform with respect to the scale of the coefficients and their temporal profile.
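For readers who want to reproduce the two basic ingredients, the sketch below computes the volume/open-interest ratio (VOIR) and runs a pairwise Granger causality test with statsmodels on synthetic daily data; the column names are hypothetical and the SUR system actually estimated by the authors is not reproduced.

```python
# Rough sketch of two ingredients mentioned above: the volume / open-interest
# ratio (VOIR) and a Granger causality test between trading activity and
# realized volatility; data are simulated and column names are placeholders.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "volume": rng.poisson(10_000, 500),
    "open_interest": rng.poisson(50_000, 500),
    "realized_vol": np.abs(rng.normal(0.02, 0.005, 500)),
})
df["voir"] = df["volume"] / df["open_interest"]   # trading-activity proxy

# Does VOIR Granger-cause realized volatility (up to 5 lags)?
res = grangercausalitytests(df[["realized_vol", "voir"]].to_numpy(), maxlag=5)
for lag, (tests, _) in res.items():
    print(f"lag {lag}: ssr F-test p-value = {tests['ssr_ftest'][1]:.3f}")
```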
Springer Series in Statistics, 2004
In spite of its practical relevance, clustering of discrete multivariate observations has received relatively little attention within the classification literature. The traditional approach has been to compute suitable measures of pairwise dissimilarity, such as the simple matching coefficient, and then to use these measures as input for hierarchical clustering algorithms. Hierarchical agglomeration also plays an important role in the clustering algorithm of Friedman and Meulman (2004), which can cope with categorical information. The main problem with hierarchical algorithms is that they rapidly become computationally unacceptable and provide results that are difficult to represent as the number of objects grows. Therefore, they may be inappropriate for the analysis of large or even moderate data sets, like those encountered in marketing research or service quality evaluation. The k-modes algorithm of Huang (1998) and Chaturvedi et al. (2001) is a notable exception, trying to combine...
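The traditional approach described above can be sketched in a few lines: compute simple matching dissimilarities between categorical profiles and feed them to an agglomerative clustering. The data, linkage method and cluster count below are placeholders chosen for illustration only.

```python
# Small sketch of the traditional approach: simple matching dissimilarities
# for categorical profiles fed to average-linkage hierarchical clustering.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
X = rng.integers(0, 3, size=(12, 6))       # 12 objects, 6 categorical attributes

# simple matching dissimilarity = share of attributes that differ
d = pdist(X, metric=lambda a, b: np.mean(a != b))
Z = linkage(d, method="average")
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```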
Springer Series in Statistics, 2000
In all examples in previous chapters it was assumed that the errors of observation were either normally distributed or, in Chapter 4, could be made approximately so by transformation. This chapter extends the class of models for the forward search to include generalized linear models. We give examples in which the errors of observation have the gamma distribution. For this continuous distribution the results are similar to those for the normal distribution. We also give examples of discrete data from the Poisson distribution and from the binomial. Interest again is in the relationship between the distribution of the response and the values of one or more explanatory variables. The distribution which is most unlike the normal is that for binary data, that is, binomial observations with one trial at each combination of factors.
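As a minimal illustration of the three GLM families mentioned (gamma, Poisson and binomial), the sketch below fits each to synthetic data with statsmodels; the forward-search machinery of the chapter is not included, and coefficients, links and sample sizes are assumptions for demonstration.

```python
# Minimal sketch: fitting gamma, Poisson and binomial GLMs with statsmodels
# on simulated data (no forward search involved).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))
eta = X @ np.array([0.5, 1.0, -0.7])             # linear predictor

y_gamma = rng.gamma(shape=2.0, scale=np.exp(eta) / 2.0)   # mean = exp(eta)
y_pois = rng.poisson(np.exp(eta))
y_bin = rng.binomial(1, 1 / (1 + np.exp(-eta)))

for y, fam in [(y_gamma, sm.families.Gamma(sm.families.links.Log())),
               (y_pois, sm.families.Poisson()),
               (y_bin, sm.families.Binomial())]:
    res = sm.GLM(y, X, family=fam).fit()
    print(type(fam).__name__, np.round(res.params, 2))
```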
There are several methods for obtaining very robust estimates of regression parameters that asymptotically resist 50% of outliers. Differences in the behaviour of these algorithms depend on the distance between the regression data and the outliers. We introduce a parameter λ that defines a parametric path in the space of models and enables us to study, in a systematic way, the properties of estimators as the groups of data move from being far apart to close together. We examine, as a function of λ, the variance and squared bias of five estimators, and we also consider their power when used in the detection of outliers. This systematic approach provides tools for gaining knowledge and better understanding of the properties of robust estimators.
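A toy version of the idea, under assumed notation: let a contaminating group slide from far away towards the main data and monitor how an ordinary least-squares fit and a robust (Huber) fit react. The particular parametrisation of the path, the estimators compared and the simulated data are illustrative assumptions, not the λ-path or the five estimators studied in the paper.

```python
# Toy lambda-path: a group of vertical outliers moves from far away (lambda=0)
# to overlapping the main data (lambda=1); the true intercept is 0, so the
# printed intercepts show how much each estimator is pulled by the bad group.
import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(5)
n, n_out, beta = 100, 20, 2.0

for lam in (0.0, 0.25, 0.5, 0.75, 1.0):       # 0 = groups far apart, 1 = merged
    shift = 10 * (1 - lam)                    # vertical distance of the bad group
    x = rng.uniform(0, 1, n)
    y = beta * x + rng.normal(0, 0.2, n)
    x_out = rng.uniform(0, 1, n_out)
    y_out = beta * x_out + shift + rng.normal(0, 0.2, n_out)
    X = np.r_[x, x_out].reshape(-1, 1)
    Y = np.r_[y, y_out]
    ols = LinearRegression().fit(X, Y)
    hub = HuberRegressor().fit(X, Y)
    print(f"lambda={lam:.2f}: OLS intercept {ols.intercept_:.2f}, "
          f"Huber intercept {hub.intercept_:.2f}")
```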
The problem of robust estimation and multivariate outlier detection of the term structure of default intensity is considered. Both the multivariate Vasicek and CIR models, embedding the Kalman filter algorithm in a forward search context, are used to estimate default intensity. The focus is not on the estimation of credit models including jumps, but on the automatic detection of masked multiple outliers in multivariate time series. Both simulated and real market credit spread time series are analyzed. In order to make inference on outliers, confidence envelopes which are virtually independent of the estimated parameters are introduced. The output is not only a unique default intensity term structure curve, as often used in the financial literature, but a robust confidence interval within which default intensity is likely to stay.
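The sketch below shows the filtering ingredient only: a bare-bones univariate Kalman filter for a discretised Vasicek-type latent intensity observed with noise. The multivariate Vasicek/CIR models and the forward-search wrapper of the paper are not reproduced, and all parameter values are illustrative assumptions.

```python
# Bare-bones univariate Kalman filter for a toy Vasicek-type latent
# default-intensity process observed with noise.
import numpy as np

rng = np.random.default_rng(6)
T, kappa, theta, sigma, obs_sd = 300, 0.1, 0.03, 0.004, 0.002

# simulate latent intensity x_t and noisy observed spreads y_t
x = np.empty(T)
x[0] = theta
for t in range(1, T):
    x[t] = x[t - 1] + kappa * (theta - x[t - 1]) + sigma * rng.normal()
y = x + obs_sd * rng.normal(size=T)

# Kalman filter recursions
m, P = theta, sigma ** 2                  # initial state mean and variance
filtered = np.empty(T)
for t in range(T):
    m_pred = m + kappa * (theta - m)      # predict
    P_pred = (1 - kappa) ** 2 * P + sigma ** 2
    K = P_pred / (P_pred + obs_sd ** 2)   # update with observation y_t
    m = m_pred + K * (y[t] - m_pred)
    P = (1 - K) * P_pred
    filtered[t] = m

print("last filtered intensity:", round(filtered[-1], 4))
```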
The methods of very robust regression resist up to 50% of outliers. The algorithms for very robust regression rely on selecting numerous subsamples of the data. New algorithms for LMS and LTS estimators that have increased computational efficiency due to improved combinatorial sampling are proposed. These and other publicly available algorithms are compared for outlier detection. Timings and estimator quality are also considered. An algorithm using the forward search (FS) has the best properties for both size and power of the outlier tests.
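To make the subsampling idea concrete, here is a toy LTS-style fit: draw many elemental subsets, score each candidate line by its trimmed sum of squared residuals, and keep the best. Concentration steps and the improved combinatorial sampling proposed in the paper are deliberately omitted; data and trimming size are assumptions for illustration.

```python
# Toy LTS via elemental subsets: fit lines through random pairs of points and
# keep the one with the smallest sum of the h smallest squared residuals.
import numpy as np

rng = np.random.default_rng(7)
n, h = 100, 75                                  # h = number of residuals kept
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, n)
y[:15] += 20                                    # contaminate 15 observations
X = np.c_[np.ones(n), x]

best, best_beta = np.inf, None
for _ in range(500):                            # random elemental subsets
    idx = rng.choice(n, size=2, replace=False)
    try:
        beta = np.linalg.solve(X[idx], y[idx])
    except np.linalg.LinAlgError:
        continue
    score = np.sort((y - X @ beta) ** 2)[:h].sum()
    if score < best:
        best, best_beta = score, beta
print("LTS-style fit (intercept, slope):", np.round(best_beta, 2))
```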
New robust dynamic plots for regression mixture detection
Description: Forward search approach to robust analysis in linear and generalized linear regression models.
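A conceptual sketch of a regression forward search, assuming a deliberately crude starting subset: refit at each step, add the observation with the smallest squared residual, and monitor residuals along the search. The FSDA toolbox implements the full methodology; this sketch only conveys the mechanics, and all data and choices below are assumptions.

```python
# Conceptual forward search for regression: grow a fitting subset one unit at
# a time and monitor the largest absolute residual among included points;
# outliers typically enter last and inflate the monitored value.
import numpy as np

rng = np.random.default_rng(8)
n = 60
x = rng.uniform(0, 10, n)
y = 3.0 + 1.5 * x + rng.normal(0, 1, n)
y[-5:] += 12                                    # a few outliers
X = np.c_[np.ones(n), x]

subset = list(np.argsort(np.abs(y - np.median(y)))[:5])   # crude initial subset
trace = []
while len(subset) < n:
    beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
    r2 = (y - X @ beta) ** 2
    trace.append(np.sqrt(r2[subset].max()))
    out = [i for i in range(n) if i not in subset]
    subset.append(out[int(np.argmin(r2[out]))])            # add closest unit
print("monitored residual, last 5 steps:", np.round(trace[-5:], 2))
```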
Metron, 2021
Starting with the 2020 volume, the journal Metron has decided to celebrate the centenary of its foundation with three special issues. This volume is dedicated to robust statistics. A striking feature of most applied statistical analyses is the use of methods that are well known to be sensitive to outliers or to other departures from the postulated model. Robust statistical methods provide useful tools for reducing this sensitivity, through the detection of the outliers by first fitting the majority of the data and then flagging deviant data points. The six papers in this issue cover a wide range of topics across the field of robustness. This editorial first provides some facts about the history and current state of robust statistics and then summarizes the contents of each paper.