LASSO Research Papers

This paper presents an approach for fast modeling and identification of robot dynamics. By using a data-driven machine learning approach, the process is simplified considerably compared with the conventional analytical method. Regressor selection is performed with the Lasso (l1-norm penalized least squares regression). The method is explained with a simple example of a two-link direct-drive robot and further demonstrated by applying it to a three-link belt-driven robot. Promising results have been demonstrated.
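
As a rough illustration of the regressor-selection step, the sketch below applies the lasso to an overcomplete library of candidate dynamics terms for a single joint. The trajectory, the candidate terms, and the coefficients are all synthetic stand-ins, not the paper's robot data.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic single-joint trajectory standing in for measured robot data.
t = np.linspace(0, 10, 500)
q, qd, qdd = np.sin(t), np.cos(t), -np.sin(t)   # angle, velocity, acceleration

# Overcomplete library of candidate regressors; only a few enter the true model.
X = np.column_stack([qdd, qd, np.sign(qd), np.sin(q), np.cos(q),
                     q, q**2, qd**2, qd * qdd, np.ones_like(t)])
true_coef = np.array([1.2, 0.3, 0.5, 2.0, 0, 0, 0, 0, 0, 0])
torque = X @ true_coef + 0.01 * rng.standard_normal(t.size)

# l1-penalized least squares drives irrelevant regressor weights to zero.
fit = Lasso(alpha=0.01).fit(X, torque)
selected = np.flatnonzero(fit.coef_)
print("selected regressor columns:", selected)
print("estimated coefficients    :", np.round(fit.coef_[selected], 3))
```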

This article introduces the sparse group fused lasso (SGFL) as a statistical framework for segmenting sparse regression models with multivariate time series. To compute solutions of the SGFL, a nonsmooth and nonseparable convex program, we develop a hybrid optimization method that is fast, requires no tuning parameter selection, and is guaranteed to converge to a global minimizer. In numerical experiments, the hybrid method compares favorably to state-of-the-art techniques with respect to computation time and numerical accuracy; benefits are particularly substantial in high dimension. The method's statistical performance is satisfactory in recovering nonzero regression coefficients and excellent in change point detection. An application to air quality data is presented. The hybrid method is implemented in the R package sparseGFL available on the author's Github page.
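
The abstract does not reproduce the SGFL objective, but under the usual sparse group fused lasso formulation (an assumption here, not a quote from the paper), the program being solved has the form

$$\min_{\beta_1,\dots,\beta_T}\ \frac{1}{2}\sum_{t=1}^{T}\big(y_t - x_t^\top \beta_t\big)^2 \;+\; \lambda_1 \sum_{t=1}^{T}\lVert \beta_t\rVert_1 \;+\; \lambda_2 \sum_{t=2}^{T}\lVert \beta_t - \beta_{t-1}\rVert_2 ,$$

where the l1 term enforces sparsity within segments and the unsquared l2 fusion term is nonsmooth and nonseparable across time, forcing the coefficient vector to stay constant except at a small number of change points; this structure is what defeats off-the-shelf solvers and motivates the hybrid method.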

Inflation forecasting is an important but difficult task. In this paper, we explore advances in machine learning (ML) methods and the availability of new and rich datasets to forecast US inflation over a long period of out-of-sample observations. Despite the skepticism in the previous literature, we show that ML models with a large number of covariates are systematically more accurate than the benchmarks for several forecasting horizons, both in the 1990s and the 2000s. The ML method that deserves the most attention is the random forest, which dominated all other models in several cases. The good performance of the random forest is due not only to its specific method of variable selection but also to the potential nonlinearities between past key macroeconomic variables and inflation. The results are robust to inflation measures, different samples, levels of macroeconomic uncertainty, and periods of recession and expansion.
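
A minimal sketch of the forecasting exercise, assuming a generic macro panel and a single forecast horizon; the series names, lag structure, and sample split below are illustrative, not the paper's dataset or design.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Toy stand-in for a large macro panel (the paper uses rich US datasets).
T, K = 300, 40
panel = pd.DataFrame(rng.standard_normal((T, K)),
                     columns=[f"macro_{i}" for i in range(K)])
inflation = (0.4 * panel["macro_0"].shift(1)
             + np.tanh(panel["macro_1"].shift(1))   # a nonlinearity RF can exploit
             + 0.1 * rng.standard_normal(T))

# One-step-ahead forecast: predictors are lagged panel values.
X, y = panel.shift(1).dropna(), inflation.dropna()
X, y = X.align(y, join="inner", axis=0)
split = int(0.8 * len(y))
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X.iloc[:split], y.iloc[:split])
pred = rf.predict(X.iloc[split:])
print(f"out-of-sample RMSE: {np.sqrt(np.mean((pred - y.iloc[split:])**2)):.3f}")
```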

The LASSO is a penalized regression method which simultaneously performs shrinkage and variable selection. The output produced by the LASSO consists of a piecewise linear solution path, starting with the null model and ending with the full least squares fit, as the value of a tuning parameter is decreased. The performance of the selected model therefore depends greatly on the choice of this parameter. This paper attempts to provide an overview of methods which are available to select the value of the tuning parameter for either prediction or variable selection purposes. A simulation study provides a comparison of these methods and assesses their performance.
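
The two selection goals can be made concrete with scikit-learn, which exposes both cross-validated and information-criterion-based choices of the tuning parameter; the data here are simulated, and the specific settings are illustrative rather than drawn from the paper's simulation study.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LassoLarsIC

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=5.0, random_state=0)

# Cross-validation targets prediction accuracy ...
cv = LassoCV(cv=10).fit(X, y)
# ... while BIC-type criteria lean toward sparser, selection-oriented fits.
bic = LassoLarsIC(criterion="bic").fit(X, y)

print(f"lambda by 10-fold CV: {cv.alpha_:.4f} "
      f"({np.sum(cv.coef_ != 0)} nonzero coefficients)")
print(f"lambda by BIC       : {bic.alpha_:.4f} "
      f"({np.sum(bic.coef_ != 0)} nonzero coefficients)")
```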

We study the asymptotic properties of the Adaptive LASSO (adaLASSO) in sparse, high-dimensional, linear time-series models. The adaLASSO is a one-step implementation of the family of folded concave penalized least-squares estimators. We assume that both the number of covariates in the model and the number of candidate variables can increase with the sample size (polynomially or geometrically); in other words, we allow the number of candidate variables to be larger than the number of observations. We show that the adaLASSO consistently chooses the relevant variables as the number of observations increases (model selection consistency) and has the oracle property, even when the errors are non-Gaussian and conditionally heteroskedastic. This allows the adaLASSO to be applied to a myriad of problems in empirical finance and macroeconomics. A simulation study shows that the method performs well in very general settings with t-distributed and heteroskedastic errors as well as with highly correlated regressors. Finally, we consider an application to forecasting monthly US inflation with many predictors. The model estimated by the adaLASSO delivers forecasts superior to traditional benchmark competitors such as autoregressive and factor models.
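
The adaLASSO's one-step construction is easy to sketch: a pilot estimate yields weights, and the weighted l1 problem reduces to an ordinary lasso after rescaling the design. The sketch below uses an OLS pilot and i.i.d. errors for simplicity, whereas the paper's theory covers non-Gaussian, conditionally heteroskedastic errors and far more candidates than observations.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(2)

# Sparse linear design: only the first 3 of 30 candidates matter.
n, p = 400, 30
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [1.0, -0.8, 0.5]
y = X @ beta + rng.standard_normal(n)

# Step 1: pilot estimate (OLS here; lasso or ridge are common when p is large).
pilot = LinearRegression().fit(X, y).coef_
w = 1.0 / (np.abs(pilot) + 1e-8)        # adaptive weights

# Step 2: weighted lasso, implemented by rescaling column j by 1/w_j.
fit = Lasso(alpha=0.05).fit(X / w, y)
coef = fit.coef_ / w                    # undo the rescaling
print("selected variables:", np.flatnonzero(coef))
```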

A common issue with high-dimensional gene expression data is that many genes may not be relevant to the disease under study. Genes naturally have a pathway structure, where a pathway contains several genes sharing a biological function. Gene selection has proved to be an effective way to improve the results of many classification methods, and it is of great interest to incorporate pathway knowledge into gene selection. In this paper, a weighted sparse support vector machine is proposed, with the aim of identifying genes and pathways, by combining the support vector machine with a weighted L1-norm penalty. Experimental results based on three public gene expression datasets show that the proposed method significantly outperforms three competitor methods in terms of classification accuracy, G-mean, and area under the curve. In addition, the results demonstrate that the top identified genes and pathways are biologically related to the cancer type. Thus, the proposed method can be useful for cancer classification using DNA gene expression data in real clinical practice.
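
A minimal sketch of the idea, assuming pathway-derived weights are given: a weighted l1 penalty can be emulated by rescaling each feature by the inverse of its weight before fitting an l1-penalized linear SVM. The data and weights below are synthetic, and this is a generic stand-in, not the authors' exact formulation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Toy stand-in for gene expression data.
X, y = make_classification(n_samples=120, n_features=200, n_informative=10,
                           random_state=0)
w = np.ones(X.shape[1]); w[:50] = 0.5   # hypothetical: down-weight (favor) pathway genes

# A weighted penalty sum_j w_j |beta_j| equals an ordinary l1 penalty after
# rescaling feature j by 1/w_j.
svm = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.1,
                max_iter=5000).fit(X / w, y)
coef = svm.coef_.ravel() / w            # map back to the original scale
print("number of selected genes:", np.count_nonzero(coef))
```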

This study constructs a checklist of characteristics that profile, in a differentiated way, the manifestation of the various forms of intimate partner violence (Violencia Entre Parejas, VEP) in the household (psychological, economic, physical, and sexual) that affect women. To this end, starting from the Ecological Approach to Violence, which holds that VEP must be analyzed through individual, household, and societal factors, a series of probabilistic models was estimated. Given the high dimensionality of the phenomenon, the LASSO estimator was used, which allowed variable selection and thereby the construction of the checklist. With these variables, a Relative Dominance Index (Índice de Dominancia Relativa, IDR) was then constructed, which ranks the variables by their contribution to the total variance for each type of VEP. The results suggest that the variables contributing the most are: the experience of having suffered violence during adolescence, on the part of both the affected woman and her spouse (intergenerational violence, or vertical transmission); the level of violence in the community environment (a community contagion effect, or horizontal transmission); and other associated variables, such as the number of the woman's marriages and unions, the spouse's lack of access to education, and the age difference between partners, among others. Finally, using machine learning oversampling strategies, probability scores for suffering each type of VEP were estimated from the variables selected by the LASSO estimator.
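
A compact sketch of the selection-plus-oversampling pipeline described above, assuming synthetic survey-like data; the SMOTE oversampler and the l1-penalized logistic model below stand in for the study's estimators and are not its exact specification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE   # requires the imbalanced-learn package

# Toy imbalanced data standing in for the household/individual/community covariates.
X, y = make_classification(n_samples=1000, n_features=60, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)   # rebalance classes

# l1-penalized logistic regression plays the role of the LASSO selector and
# yields probability scores for each observation.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X_res, y_res)
print("variables kept for the checklist:", np.flatnonzero(clf.coef_))
print("example risk score:", clf.predict_proba(X[:1])[0, 1].round(3))
```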

The lasso in the ethno-cultural tradition of the Alans-Ossetians. The lasso was an important tool among the pastoral nomadic tribes of the Great Steppe. According to Latin sources, the Alans were experts in the use of the lasso. This article collects the written and material evidence for the use of the lasso among most of the Northern Iranian tribes. There is also enough evidence to believe that the Ossetians, despite the fact that they lost the possibility of large-scale horse breeding due to the lack of pastures, maintained the use of the lasso as a tool and a weapon. This is also reflected in the Ossetian Nart Sagas, where the lasso is frequently mentioned.

We show that high-dimensional models produce, on average, smaller forecasting errors for macroeconomic variables when we consider a large set of predictors. Our results also show that a good selection of the adaptive LASSO hyperparameters reduces forecast errors.

Because church music served no liturgical function in the Zwinglian cantons of the Old Swiss Confederacy, music was largely pushed into the private sphere. From 1600 on, the thirst for music in parts gave rise to the development of Collegia Musica in the reformed cities of German-speaking Switzerland. These organizations recruited their members from the middle classes, who enjoyed a musical education from a young age. The statutes of the Collegia Musica show that their members brought their own instruments and sheet music to the Collegium and occasionally took them home as well. Gifts of sheet music, instruments or money from individual donors to the Aargauer Musikkollegium demonstrate the widespread importance of music within the middle class, and documents from Winterthur show that certain musical townspeople made extensive donations of music to the Collegia. The breadth of the musical repertoire documented is quite astonishing, ranging from music published in German-speaking regions and northern Italy to compositions from Catholic areas and even some pieces from the 16th century. At the same time, however, figures such as the Zurich merchant Salomon Ott (1653-1711) owned the latest musical prints and also favored much-frowned-upon dance music.

We backtest 59 instruments and investigate the predictability of daily returns using Bayesian variable selection methods. Through these models we show the importance of variable selection and the reduction of overfitting. We also visualize how the driving factors of daily returns from different classes vary over time. Predicting the magnitude of daily returns is again confirmed to be a hard task, but we show that for some instruments it is possible to achieve above-average hit rates that could lead to profitable strategies. Simulation results show that predictability levels of daily returns also vary over time.
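
The abstract does not name the specific Bayesian variable selection models used. As one lightweight, runnable proxy for the idea, the sketch below uses automatic relevance determination (ARD) regression, a Bayesian scheme that learns a per-coefficient prior precision and prunes irrelevant signals; the data and factor names are synthetic.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(3)

# Toy daily-return prediction problem: few of many candidate signals matter.
n, p = 500, 25
X = rng.standard_normal((n, p))                              # candidate factors
y = 0.3 * X[:, 0] - 0.2 * X[:, 3] + rng.standard_normal(n)   # noisy returns

# ARD switches irrelevant factors off via per-coefficient prior precisions.
ard = ARDRegression().fit(X, y)
kept = np.flatnonzero(np.abs(ard.coef_) > 1e-3)
print("factors retained:", kept)

# Directional "hit rate" (the paper evaluates this out of sample, per instrument).
hits = np.mean(np.sign(ard.predict(X)) == np.sign(y))
print(f"in-sample hit rate: {hits:.2%}")
```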

(1) Background: Medical imaging provides quantitative and spatial information to evaluate treatment response in the management of patients with non-small cell lung cancer (NSCLC). High throughput extraction of radiomic features on these images can potentially phenotype tumors non-invasively and support risk stratification based on survival outcome prediction. The prognostic value of radiomics from different imaging modalities and time points prior to and during chemoradiation therapy of NSCLC, relative to conventional imaging biomarker or delta radiomics models, remains uncharacterized. We investigated the utility of multitask learning of multi-time point radiomic features, as opposed to single-task learning, for improving survival outcome prediction relative to conventional clinical imaging feature model benchmarks. (2) Methods: Survival outcomes were prospectively collected for 45 patients with unresectable NSCLC enrolled on the FLARE-RT phase II trial of risk-adaptive chemoradiation and optional consolidation PD-L1 checkpoint blockade (NCT02773238). FDG-PET, CT, and perfusion SPECT imaging were performed pretreatment and at week 3 mid-treatment, and 110 IBSI-compliant pyradiomics shape-/intensity-/texture-based features were extracted from the metabolic tumor volume. Outcome modeling consisted of a fused Laplacian sparse group LASSO with component-wise gradient boosting survival regression in a multitask learning framework. Testing performance under stratified 10-fold cross-validation was evaluated for multitask learning radiomics of different imaging modalities and time points. Multitask learning models were benchmarked against conventional clinical imaging and delta radiomics models and evaluated with the concordance index (c-index) and index of prediction accuracy (IPA). (3) Results: FDG-PET radiomics had higher prognostic value for overall survival in test folds (c-index 0.71 [0.67, 0.75]) than CT radiomics (c-index 0.64 [0.60, 0.71]) or perfusion SPECT radiomics (c-index 0.60 [0.57, 0.63]). Multitask learning of pre-/mid-treatment FDG-PET radiomics (c-index 0.71 [0.67, 0.75]) outperformed benchmark clinical imaging (c-index 0.65 [0.59, 0.71]) and FDG-PET delta radiomics (c-index 0.52 [0.48, 0.58]) models. Similarly, the IPA for multitask learning FDG-PET radiomics (30%) was higher than for the clinical imaging (26%) and delta radiomics (15%) models. Radiomics models performed consistently under different voxel resampling conditions. (4) Conclusion: Multitask learning radiomics for outcome modeling provides a clinical decision support platform that leverages longitudinal imaging information. This framework can reveal the relative importance of different imaging modalities and time points when designing risk-adaptive cancer treatment strategies.
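
The paper's fused Laplacian sparse group LASSO with gradient boosting is specialized, but the core multitask idea (selecting features jointly across related outcomes) can be illustrated with scikit-learn's MultiTaskLasso; the patient count and feature dimension below mirror the abstract, while the data themselves are simulated and the two "tasks" are hypothetical risk targets.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.default_rng(4)

# Toy stand-in: 45 patients, 110 radiomic features, two related targets
# (e.g. risk scores tied to pre- and mid-treatment imaging).
n, p = 45, 110
X = rng.standard_normal((n, p))
B = np.zeros((p, 2)); B[:5] = rng.standard_normal((5, 2))  # shared signal rows
Y = X @ B + 0.5 * rng.standard_normal((n, 2))

# The l2/l1 block penalty selects the same features for every task, the
# joint-selection flavor of multitask learning.
mtl = MultiTaskLasso(alpha=0.1).fit(X, Y)
shared = np.flatnonzero(np.linalg.norm(mtl.coef_, axis=0))
print("features selected jointly across tasks:", shared)
```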

Multinomial Logistic Regression (MLR) has been advocated for developing clinical prediction models that distinguish between three or more unordered outcomes. We present a full-factorial simulation study to examine the predictive performance of MLR models in relation to the relative size of outcome categories, the number of predictors, and the number of events per variable. It is shown that MLR estimated by maximum likelihood yields overfitted prediction models in small to medium-sized data sets. In most cases, the calibration and overall predictive performance of the multinomial prediction model are improved by using penalized MLR. Our simulation study also highlights the importance of events per variable in the multinomial context, as well as the total sample size. As expected, our study demonstrates the need for optimism correction of the predictive performance measures when developing a multinomial logistic prediction model. We recommend the use of penalized MLR when prediction models are developed in small data sets or in medium-sized data sets with a small total sample size (i.e., when the sizes of the outcome categories are balanced). Finally, we present a case study in which we illustrate the development and validation of penalized and unpenalized multinomial prediction models for predicting malignancy of ovarian cancer.
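
A minimal sketch of the contrast studied here: maximum-likelihood versus penalized multinomial logistic fits on a small three-class data set. scikit-learn's C parameter is an inverse penalty strength, so a very large C approximates the unpenalized ML fit; the data are simulated, and the ridge penalty stands in for whichever penalization scheme the paper evaluates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Small data set with three unordered outcome categories, mimicking a low
# events-per-variable setting (sizes illustrative).
X, y = make_classification(n_samples=150, n_features=20, n_informative=5,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Nearly unpenalized ML fit versus a ridge-penalized fit; smaller C means
# stronger shrinkage of the multinomial coefficients.
ml_fit = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)
penalized = LogisticRegression(C=0.1, max_iter=5000).fit(X, y)

print("max |coef|, near-ML fit  :", np.abs(ml_fit.coef_).max().round(2))
print("max |coef|, penalized fit:", np.abs(penalized.coef_).max().round(2))
```

The shrunken coefficients of the penalized fit are what tame the too-extreme predicted probabilities the abstract describes.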

As one important means of ensuring secure operation of a power system, contingency selection and ranking methods need to be rapid and accurate. A novel method based on the least absolute shrinkage and selection operator (Lasso) algorithm is proposed in this paper for online static security assessment (OSSA). The assessment is based on a security index, which is used to select and screen contingencies. First, the multi-step adaptive Lasso (MSA-Lasso) regression algorithm, whose predictive performance is advantageous, is introduced. Then, an OSSA module is proposed to evaluate and select contingencies under different load conditions. In addition, the Lasso algorithm is employed to predict the security index of each power system operating state, taking bus voltages and power flows into account, according to Newton-Raphson load flow (NRLF) analysis in post-contingency states. Finally, numerical results from applying the proposed approach to the IEEE 14-bus, 118-bus, and 300-bus test systems demonstrate the accuracy and rapidity of OSSA.

In this paper we show the validity of the adaptive LASSO procedure in estimating stationary ARDL(p,q) models with GARCH innovations. We show that, given a set of initial weights, the adaptive LASSO selects the relevant variables with probability converging to one. We then show that the estimator is oracle, meaning that its distribution converges to that of the oracle-assisted least squares, i.e., the least squares estimator calculated as if we knew the set of relevant variables beforehand. Finally, we show that the LASSO estimator can be used to construct the initial weights. The performance of the method in finite samples is illustrated using Monte Carlo simulation.

Aging is one of the chief biomedical problems of the 21st century. After decades of basic research on biogerontology (the science of aging), the aging process still remains an enigma. Although hundreds of "theories" on aging have been formulated, and many fundamental insights have been gained about age-related changes and about genetic as well as environmental interventions that change the pace of aging, the actual why and how of aging remain enigmatic. In the post-genomic era there is an exponential increase in data. As a consequence, it is a challenge to utilize all of this information and derive meaningful knowledge about biological phenomena. No individual scientist, group, or consortium is capable of keeping up even within their own field; all are overwhelmed by the explosion of data. Machine learning applied to biological data has the potential to solve this and to cause a paradigm shift from hypothesis-driven research (which predominates in biological research, including biogerontology) to data-driven research.

This dissertation addresses this problem. In particular, it proposes and executes the use of machine learning on existing data to predict drivers of aging (and therefore helps to distinguish causes from consequences), interventions to counteract aging, and specific hypotheses to fill research gaps that require experimental validation.

The objective of this project is therefore to build computational models based on data relevant to the phenomenon of aging and to predict as many of its aspects and dimensions as possible (thus elucidating their relations to each other). For converting between and sorting within dimensions relevant to aging, different machine learning models are evaluated. Once models are built, it can be determined how much they explain different aspects of aging. Those models will also be capable of specifying which features are most relevant for prediction (in both classification and regression). It is possible to train models that incorporate age-related changes based on transcriptomic, proteomic, metabolomic, epigenomic, and morphological data and their combinations. Machine learning is further used to convert between and within them.

This work focuses on three types of predictors, with which discoveries are subsequently made using statistical and learning algorithms. The first model (a lifespan predictor) is trained to predict lifespan based on genotype, environment, and combinations thereof. It is useful for predicting lifespan-extending interventions at the population level. The second model (an age predictor) is trained to predict age given features measured on individuals. This is useful for identifying biomarkers of aging and for determining the effects of interventions at the level of individuals. The third model predicts functions/regulations of biological entities with regard to the aging process based on heterogeneous data such as ontologies and diverse omics, including time-series gene expression profiles (which can be visualized as plots) and linked data. It is used to understand the role of genes and proteins, as well as perhaps other entities such as small molecules, including lipids and other metabolites. Functions of proteins that are still unknown, especially those involved in yeast lipid metabolism and its regulation, can be predicted. For this purpose we primarily use yeast as a model organism, as well as data on humans. Other biomedical model organisms might be added if found beneficial.

The novel aspects of this research are, for instance, that 1) aging is investigated systematically in an unbiased, data-driven approach; 2) lifespan is predicted as continuous values; 3) age is predicted by combining multiple omics data; and 4) functions and regulations of biological entities like genes are predicted with high confidence from heterogeneous data sources.

This thesis discovered that genetics is the most important feature of lifespan determination. According to the best-performing models, phenotypic features related to lipids and membranes, such as vacuolar morphology and autophagy activity, are important for lifespan determination. An age predictor based on transcriptomics and proteomics can determine age highly accurately; its selected features are associated with both translation and lipid metabolism. Among the top selected features are transcripts of genes whose deletion leads to abnormal vacuolar morphology, as well as targets of Opi1. Opi1 itself and its regulators were found to be differentially regulated post-transcriptionally or post-translationally. Lastly, a function predictor for genes was created that achieved exceptional accuracy in classifying aging genes. It learned, for instance, that piecemeal autophagy of the nucleus is strongly predictive of aging-suppressor genes, while cytoplasmic translation is strongly predictive of gerontogenes.
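
As a toy illustration of the age-predictor idea described above (the data, dimensions, and feature names here are hypothetical; the thesis combines transcriptomic and proteomic measurements), one can fit a regressor to chronological age and rank features by importance to nominate biomarker candidates:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)

# Hypothetical omics matrix: samples x transcripts, with known ages.
n, p = 200, 500
omics = rng.standard_normal((n, p))
age = 30 + 5 * omics[:, 0] - 3 * omics[:, 1] + rng.standard_normal(n)

# Train an age predictor, then read off which features drive its predictions.
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(omics, age)
top = np.argsort(model.feature_importances_)[::-1][:5]
print("top age-associated features (column indices):", top)
```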

The change-point problem is reformulated as a penalized likelihood estimation problem. A new non-convex penalty function is introduced to allow consistent estimation of the number of change points, and their locations and sizes. Penalized likelihood methods based on LASSO and SCAD penalties may not satisfy such a property. The asymptotic properties for the local solutions are established and numerical studies are conducted to highlight their performance. An application to copy number variation is discussed.
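
For context, the LASSO-based baseline that the paper contrasts itself with can be sketched directly: detecting mean change points becomes a lasso problem on jump sizes once the piecewise-constant mean is written through a cumulative-sum (lower-triangular) design. The signal and threshold below are synthetic, and the paper's own non-convex penalty is deliberately different from this.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)

# Piecewise-constant signal with two true change points (at 60 and 130).
y = np.concatenate([np.zeros(60), 2 * np.ones(70), 0.5 * np.ones(70)])
y += 0.3 * rng.standard_normal(y.size)

# mu_t = sum_{s<=t} theta_s, so a lower-triangular design turns change-point
# detection into sparse estimation of the jump sizes theta.
n = y.size
X = np.tril(np.ones((n, n)))
fit = Lasso(alpha=0.05, fit_intercept=False, max_iter=50000).fit(X, y)
jumps = np.flatnonzero(np.abs(fit.coef_[1:]) > 0.1) + 1  # skip the level term
print("estimated change points near:", jumps)
```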

Prognostic models for survival outcomes are often developed by fitting standard survival regression models, such as the Cox proportional hazards model, to representative datasets. However, these models can be unreliable if the datasets contain few events, which may be the case if either the disease or the event of interest is rare. Specific problems include predictions that are too extreme, and poor discrimination between low-risk and high-risk patients. The objective of this paper is to evaluate three existing penalised methods that have been proposed to improve predictive accuracy. In particular, ridge, lasso and the garotte, which use penalised maximum likelihood to shrink coefficient estimates and in some cases omit predictors entirely, are assessed using simulated data derived from two clinical datasets. The predictions obtained using these methods are compared with those from Cox models fitted using standard maximum likelihood. The simulation results suggest that Cox models fi...
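
A minimal sketch of the shrinkage being evaluated, assuming the lifelines package and its bundled Rossi recidivism data rather than the paper's clinical datasets: the penalizer shrinks the Cox coefficients, and l1_ratio interpolates between ridge (0) and lasso (1).

```python
from lifelines import CoxPHFitter        # assumes the lifelines package
from lifelines.datasets import load_rossi

df = load_rossi()                        # small bundled survival data set

# Standard maximum-likelihood Cox fit versus a lasso-penalized fit.
mle = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")
lasso = CoxPHFitter(penalizer=0.1, l1_ratio=1.0).fit(
    df, duration_col="week", event_col="arrest")

print("ML coefficients       :", mle.params_.round(2).to_dict())
print("penalized coefficients:", lasso.params_.round(2).to_dict())
```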

We show that data-driven instrument selection based on the LASSO estimator can perform well compared with the usual ad hoc instrument set for single-equation estimation of a forward-looking Phillips curve, when the overall identification condition is strong or when the instruments are not very weak. We conclude that, in the face of model uncertainty and/or some potentially weak instruments within a large number of candidates, data-driven selection may provide a disciplined and more reliable estimation strategy.
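
A stylized sketch of the two-step idea, on synthetic data: the lasso disciplines the first-stage instrument set, and standard 2SLS follows with the selected instruments. Nothing here reproduces the paper's Phillips-curve specification.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(7)

# Toy IV setup: many candidate instruments, few relevant for the endogenous x.
n, m = 500, 30
Z = rng.standard_normal((n, m))
u = rng.standard_normal(n)                                # structural error
x = Z[:, 0] + 0.8 * Z[:, 1] + u + rng.standard_normal(n)  # endogenous regressor
y = 0.5 * x + u                                           # structural equation

# Step 1: lasso on the first stage picks a disciplined instrument set.
sel = np.flatnonzero(LassoCV(cv=5).fit(Z, x).coef_)
Zs = Z[:, sel]

# Step 2: standard 2SLS with the selected instruments.
x_hat = Zs @ np.linalg.lstsq(Zs, x, rcond=None)[0]
beta_iv = np.linalg.lstsq(x_hat[:, None], y, rcond=None)[0]
print("selected instruments:", sel, "| 2SLS estimate:", beta_iv.round(3))
```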

Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC) [In IEEE Conference on Computer Vision and Pattern Recognition, CVPR (2009) 2790-2797] to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.
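
The SSC recipe the paper builds on is short enough to sketch: regress each point on all the others with an l1 penalty, symmetrize the coefficient magnitudes into an affinity matrix, and apply spectral clustering. The dimensions, noise level, and penalty below are illustrative choices, not the paper's experimental settings.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(8)

# Two random 2-dimensional subspaces inside R^10, 40 noisy points each.
def subspace_points(k=40, d=2, D=10, noise=0.02):
    basis = np.linalg.qr(rng.standard_normal((D, d)))[0]
    return (basis @ rng.standard_normal((d, k))).T + noise * rng.standard_normal((k, D))

X = np.vstack([subspace_points(), subspace_points()])
n = X.shape[0]

# SSC: write each point as a sparse combination of the other points; points
# in the same subspace tend to select each other.
C = np.zeros((n, n))
for i in range(n):
    idx = np.delete(np.arange(n), i)
    lasso = Lasso(alpha=0.01, max_iter=10000).fit(X[idx].T, X[i])
    C[i, idx] = lasso.coef_

affinity = np.abs(C) + np.abs(C).T          # symmetrized coefficient magnitudes
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print("cluster sizes:", np.bincount(labels))
```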

In the applications of finite mixture of regression models, a large number of covariates are often used and their contributions toward the response variable vary from one component to another of the mixture model. This creates a complex variable selection problem. Existing methods, such as AIC and BIC, are computationally expensive as the number of covariates and the components in the mixture model increase. In this paper, we introduce a penalized likelihood approach for variable selection in finite mixture of regression models. The new method introduces a penalty which depends on the sizes of regression coefficients and the mixture structure. The new method is shown to have the desired sparsity property. A data adaptive method for selecting tuning parameters, and an EM-algorithm for efficient numerical computations are developed. Simulations show that the method has very good performance with much lower demand on computing power. The new method is also illustrated by analyzing a re...
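
The abstract does not state the penalty, but one plausible form of the penalized log-likelihood for a K-component mixture of regressions (a sketch under assumptions, reflecting the dependence on coefficient sizes and mixture structure that the abstract mentions) is

$$\tilde\ell(\boldsymbol\theta) \;=\; \sum_{i=1}^{n} \log\!\Big(\sum_{k=1}^{K} \pi_k\, \phi\big(y_i;\, \mathbf{x}_i^\top\boldsymbol\beta_k,\, \sigma_k^2\big)\Big) \;-\; \sum_{k=1}^{K} \pi_k \sum_{j=1}^{p} p_{\lambda}\big(|\beta_{kj}|\big),$$

where phi is the normal density and p_lambda is a lasso-type penalty; weighting each component's penalty by its mixing proportion pi_k is one way to make the penalty depend on the mixture structure, and the maximization is carried out by an EM algorithm with a penalized M-step.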

In this paper we adopt Adaptive Lasso techniques in vector Multiplicative Error Models (vMEM), and we show that they provide asymptotic consistency in variable selection and the same efficiency as if the set of true predictors were known in advance (the oracle property). A Monte Carlo exercise demonstrates the good performance of this approach, and an empirical application shows its effectiveness in studying the network of volatility spillovers among European financial indices during and after the sovereign debt crisis. We conclude by demonstrating the superior volatility forecasting ability of Adaptive Lasso techniques also when a common trend is removed prior to the multivariate volatility spillover analysis.

We evaluate the predictive performance of a variety of value-at-risk (VaR) models for a portfolio consisting of five assets. Traditional VaR models such as historical simulation with bootstrap and filtered historical simulation methods are considered. We suggest a new method for estimating value at risk: the filtered historical simulation GJR-GARCH method, based on bootstrapping the standardized GJR-GARCH residuals. The predictive performance is evaluated in terms of three criteria: tests of unconditional coverage, independence, and conditional coverage, and a quadratic loss function. The results show that the classical methods are inefficient under moderate departures from normality and that the new method produces the most accurate forecasts of extreme losses.
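
A compact sketch of the proposed estimator, assuming the arch package and a synthetic return series (not the paper's five-asset portfolio): fit a GJR-GARCH(1,1), bootstrap the standardized residuals, and rescale by the one-step-ahead volatility forecast to read off the VaR quantile.

```python
import numpy as np
from arch import arch_model   # assumes the arch package

rng = np.random.default_rng(9)
returns = rng.standard_t(df=6, size=1500)   # toy daily returns, in percent

# GJR-GARCH(1,1): o=1 adds the asymmetric (leverage) term.
res = arch_model(returns, p=1, o=1, q=1, dist="normal").fit(disp="off")

# Filtered historical simulation: bootstrap the standardized residuals and
# rescale them by the one-step-ahead volatility forecast.
std_resid = res.resid / res.conditional_volatility
sigma_fwd = np.sqrt(res.forecast(horizon=1).variance.values[-1, 0])
draws = rng.choice(std_resid[~np.isnan(std_resid)], size=10000, replace=True)
var_99 = -np.quantile(draws * sigma_fwd, 0.01)
print(f"one-day 99% VaR (FHS, GJR-GARCH): {var_99:.2f}%")
```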

In this paper, we aim to bring together into one common framework various advances in factor-based hedge fund replication. Our replication methodology relies on a set of investable dynamic risk factors extracted from futures contract prices and on an automatic variable and model selection procedure. The methodology is then validated by creating out-of-sample replicating portfolios for the monthly returns of more than 7,000 hedge funds ranging from 2006 to 2012 and under the assumption of transaction costs. Our results suggest that hedge fund replication is on average possible and works best for liquid strategies.