J. Dorronsoro | Universidad Autónoma de Madrid
Papers by J. Dorronsoro
Proceedings of the 4th international conference conference on Computer systems and technologies e-Learning - CompSysTech '03, 2003
Natural images form a very small subset of all images. In spite of this, the direct computation of their block densities is not possible. On the other hand, the existence of various successful image compression methods, in particular fractal compression, indicates that compression somehow captures and uses at least part of the natural image statistics. In this work we show how hash-based fractal image compression can be used to derive quite precise estimates of the entropies of 4 x 4 patches of natural images. We state that, to first order, the probability density factorizes into the probability densities of the contrast, the brightness and the index of the codebook blocks.
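A minimal sketch of the first-order factorization idea stated in the abstract: approximate the entropy of 4 x 4 patches as the sum of the entropies of the patch brightness, contrast and codebook index, each estimated from an empirical histogram. The random "image", the bin counts and the stand-in index variable are assumptions for illustration; the paper's actual estimates come from a hash-based fractal codebook.

```python
# Hedged sketch: first-order patch entropy as a sum of marginal entropies.
# The synthetic image and the random "codebook index" are illustrative
# assumptions, not the paper's hash-based fractal compression pipeline.
import numpy as np

def entropy(samples, bins=64):
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))               # bits per patch

rng = np.random.default_rng(0)
image = rng.random((256, 256))
patches = np.array([image[i:i + 4, j:j + 4].ravel()
                    for i in range(0, 256, 4) for j in range(0, 256, 4)])

brightness = patches.mean(axis=1)
contrast = patches.std(axis=1)
index = rng.integers(0, 256, size=len(patches))  # stand-in for codebook block indices

H = entropy(brightness) + entropy(contrast) + entropy(index)
print(f"approximate first-order patch entropy: {H:.2f} bits")
```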
Journal of Modern Power Systems and Clean Energy, 2018
General noise cost functions have recently been proposed for support vector regression (SVR). When applied to tasks whose underlying noise distribution is similar to the one assumed for the cost function, these models should perform better than classical SVR. On the other hand, uncertainty estimates for SVR have received somewhat limited attention in the literature until now and still have unaddressed problems. Keeping this in mind, three main goals are addressed here. First, we propose a framework that combines general noise SVR models with the Naive Online R Minimization Algorithm (NORMA) as optimization method, and then gives non-constant error intervals dependent on the input data, aided by the use of clustering techniques. We give the theoretical details required to implement this framework for Laplace, Gaussian, Beta, Weibull and Marshall-Olkin generalized exponential distributions. Second, we test the proposed framework on two real-world regression problems using data from two public competitions about solar energy. Results show the validity of our models and an improvement over classical SVR. Finally, in accordance with the principle of reproducible research, we make sure that the data and model implementations used for the experiments are easily and publicly accessible.
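A minimal sketch of an online kernel SVR in the spirit of NORMA, i.e. stochastic gradient descent on the regularized risk in an RKHS: each sample shrinks the existing coefficients and, if the loss is active, adds a new kernel center. The epsilon-insensitive loss stands in for the general noise costs of the paper, and the kernel width, learning rate and regularization values are illustrative assumptions.

```python
# Hedged sketch of NORMA-style online kernel SVR with epsilon-insensitive loss.
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def norma_svr(X, y, eta=0.1, lam=0.01, eps=0.1, gamma=1.0):
    centers, alphas = [], []
    for x_t, y_t in zip(X, y):
        f_t = sum(a * rbf(c, x_t, gamma) for a, c in zip(alphas, centers))
        # Shrink past coefficients (regularization step).
        alphas = [(1.0 - eta * lam) * a for a in alphas]
        # Subgradient step on the epsilon-insensitive loss at (x_t, y_t).
        err = y_t - f_t
        if err > eps:
            alphas.append(eta)       # under-prediction: push f up
            centers.append(x_t)
        elif err < -eps:
            alphas.append(-eta)      # over-prediction: push f down
            centers.append(x_t)
    def predict(x):
        return sum(a * rbf(c, x, gamma) for a, c in zip(alphas, centers))
    return predict

# Toy usage on a 1-D regression task.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
model = norma_svr(X, y)
print(model(np.array([0.5])))
```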
Renewable Energy and Power Quality Journal, 2014
The growing presence of solar energy in the electrical systems of many countries has made its accurate forecasting an important issue. In this work we explore the application of Support Vector Regression (SVR), an advanced machine learning modelling tool, to forecast the daily photovoltaic generation of Spain. Given the very large geographical spread of photovoltaic installations, we use as input features NWP forecasts of relevant meteorological variables for the entire Iberian Peninsula. The input dimension is thus very large but, while further work is needed, our results show SVR to be an effective tool to deal with the problem's underlying dimension, yield useful forecasts and provide some insights into the relationship between NWP and actual solar energy production.
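A small sketch of the kind of setup described above: flattened NWP grids as very wide input vectors, standardization, and a Gaussian-kernel SVR on top. The array shapes and hyperparameters are assumptions chosen only to make the example runnable, not the paper's actual configuration.

```python
# Hedged sketch of daily PV forecasting from gridded NWP features with SVR.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n_days, n_grid_points, n_variables = 365, 250, 4            # assumed sizes
X = rng.normal(size=(n_days, n_grid_points * n_variables))  # flattened NWP fields
y = rng.uniform(0, 1, size=n_days)                          # normalized daily PV output

model = make_pipeline(StandardScaler(),
                      SVR(kernel="rbf", C=10.0, epsilon=0.01, gamma="scale"))
model.fit(X[:300], y[:300])
print(model.score(X[300:], y[300:]))                        # R^2 on held-out days
```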
IEEE transactions on neural networks and learning systems, 2012
In this brief, we give a new proof of the asymptotic convergence of the sequential minimal optimization (SMO) algorithm for both the most violating pair and second order rules to select the pair of coefficients to be updated. The proof is more self-contained, shorter, and simpler than previous ones and has a different flavor, partially building upon Gilbert's original convergence proof of his algorithm to solve the minimum norm problem for convex hulls. It is valid for both support vector classification (SVC) and support vector regression, which are formulated under a general problem that encompasses them. Moreover, this general problem can be further extended to also cover other support vector machine (SVM)-related problems such as ν-SVC or one-class SVMs, while the convergence proof of the slight variant of SMO needed for them remains basically unchanged.
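A minimal sketch of the "most violating pair" working set selection that SMO uses for the SVC dual, the first-order rule mentioned above (not the paper's proof machinery). Notation is assumed: dual variables alpha, labels y in {-1, +1}, gradient G of the dual objective, and box constraint C.

```python
# Hedged sketch of most-violating-pair (MVP) selection in SMO for SVC.
import numpy as np

def most_violating_pair(alpha, y, G, C, tol=1e-3):
    # i may move "up" (increase y_i * alpha_i), j may move "down".
    up = ((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0))
    low = ((y == 1) & (alpha > 0)) | ((y == -1) & (alpha < C))
    F = -y * G                           # per-sample violation scores
    i = int(np.argmax(np.where(up, F, -np.inf)))
    j = int(np.argmin(np.where(low, F, np.inf)))
    if F[i] - F[j] < tol:                # KKT conditions met up to tol: stop
        return None
    return i, j
```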
Journal of Biomolecular Screening, 2013
High-content screening (HCS) allows the exploration of complex cellular phenotypes by automated microscopy and is increasingly being adopted for small interfering RNA genomic screening and phenotypic drug discovery. We introduce a series of cell-based evaluation metrics that have been implemented and validated in a mono-parametric HCS for regulators of the membrane trafficking protein caveolin 1 (CAV1) and have also proved useful for the development of a multiparametric phenotypic HCS for regulators of cytoskeletal reorganization. Imaging metrics evaluate imaging quality such as staining and focus, whereas cell biology metrics are fuzzy logic-based evaluators describing complex biological parameters such as sparseness, confluency, and spreading. The evaluation metrics were implemented in a data-mining pipeline, which first filters out cells that do not pass a quality criterion based on imaging metrics and then uses cell biology metrics to stratify cell samples to allow further analysis of homogeneous cell populations. Use of these metrics significantly improved the robustness of the monoparametric assay tested, as revealed by an increase in Z' factor, Kolmogorov-Smirnov distance, and strict standard mean difference. Cell biology evaluation metrics were also implemented in a novel supervised learning classification method that combines them with phenotypic features in a statistical model that exceeded conventional classification methods, thus improving multiparametric phenotypic assay sensitivity.
Industrial Applications of Neural Networks, 1998
Lecture Notes in Computer Science, 1995
Fast automatic methods of architecture selection are of great interest for use in local, dynamical modeling. A general procedure to select an optimal network architecture is proposed in the case of RBF nets. Taking as a starting point the universal approximation properties of RBF networks and the natural interpretation of their weights, a search for the simplest, best generalizing network ...
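A minimal sketch of the general flavor of such a search: grow the number of RBF centers and keep the smallest network whose validation error is best. The k-means centers, ridge-regularized output layer and search range are assumptions made only to give a runnable illustration; they are not the paper's actual selection procedure.

```python
# Hedged sketch: pick the smallest RBF network that generalizes best on validation data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def rbf_features(X, centers, gamma=1.0):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def select_rbf_size(X_tr, y_tr, X_va, y_va, max_centers=30):
    best = (np.inf, None)
    for m in range(1, max_centers + 1):
        centers = KMeans(n_clusters=m, n_init=5, random_state=0).fit(X_tr).cluster_centers_
        out = Ridge(alpha=1e-3).fit(rbf_features(X_tr, centers), y_tr)
        err = mean_squared_error(y_va, out.predict(rbf_features(X_va, centers)))
        if err < best[0]:
            best = (err, m)
    return best    # (validation MSE, number of centers)
```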
Lecture Notes in Computer Science, 1997
In one form or another, noise is present in almost any signal processing problem. Moreover, its profile is very often essentially unknown. Efficient automatic methods for its detection and filtering are thus very useful, and more so if they rely only on the signal's internal structure. In this note we will show how one such method can be obtained through ...
Neural Networks: Artificial Intelligence and Industrial Applications, 1995
The relationship between the quality of state space reconstruction and the accuracy in time series forecasting is analyzed. The averaged scalar product of the dynamical system flow vectors has been used to give a degree of determinism to the selected state space reconstruction. This value helps distinguish between those regions of the state space where predictions will be accurate and those where they are not. A time series measured in an industrial environment where noise is present is used as an example. It is shown that the prediction methods used to estimate future values play a less important role than a good reconstruction of the state space itself.
Conference proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Conference, 2009
Implicit Wiener series are a powerful tool to build Volterra representations of time series with any degree of nonlinearity. A natural question is then whether higher order representations yield more useful models. In this work we shall study this question for ECoG data channel relationships in epileptic seizure recordings, considering whether quadratic representations yield more accurate classifiers than linear ones. To do so we first show how to derive statistical information on the Volterra coefficient distribution and how to construct seizure classification patterns over that information. As our results illustrate, a quadratic model seems to provide no advantages over a linear one. Nevertheless, we shall also show that the interpretability of the implicit Wiener series provides insights into the inter-channel relationships of the recordings.
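A small sketch contrasting a linear and a quadratic (second-order Volterra-style) representation of how one signal window predicts another, the kind of comparison discussed above. The window length, ridge penalty and synthetic signals are assumptions for illustration and have nothing to do with the actual ECoG recordings.

```python
# Hedged sketch: linear vs. quadratic Volterra-style features for cross-channel prediction.
import numpy as np
from itertools import combinations_with_replacement
from sklearn.linear_model import Ridge

def volterra_features(window, order=2):
    feats = list(window)                                   # first-order terms
    if order >= 2:                                         # second-order cross terms
        feats += [window[i] * window[j]
                  for i, j in combinations_with_replacement(range(len(window)), 2)]
    return np.array(feats)

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.6 * x + 0.3 * np.roll(x, 1) ** 2 + 0.05 * rng.normal(size=2000)

L = 5                                                      # window length (assumed)
rows = range(L - 1, len(x))
X1 = np.array([volterra_features(x[t - L + 1:t + 1], order=1) for t in rows])
X2 = np.array([volterra_features(x[t - L + 1:t + 1], order=2) for t in rows])
target = y[L - 1:]
for name, Xf in [("linear", X1), ("quadratic", X2)]:
    score = Ridge(alpha=1.0).fit(Xf[:1500], target[:1500]).score(Xf[1500:], target[1500:])
    print(name, round(score, 3))
```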
2010 International Joint Conference on Neural Networks (IJCNN 2010), 2010
The idea is to scale down the initial sample convex hulls by means of a parameter λ in such a way that the shape of the original hull is preserved. ...
Artificial Neural Networks and Machine Learning - ICANN 2012. 22nd International Conference on Artificial Neural Networks, 2012
Physical Review E, 1994
The averaged scalar product, P, of the dynamical system flow vectors evaluated along the trajectory is shown to be a very simple, efficient, and useful quantity for the purpose of selecting adequate embedding dimensions and time delays. The effectiveness ...
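A minimal sketch of one plausible reading of this quantity: delay-embed the series, approximate flow vectors by successive differences, and average the cosine between consecutive flow vectors, so that values near 1 suggest a smooth, deterministic-looking reconstruction. The normalization and the toy signal are assumptions; the paper's exact definition may differ.

```python
# Hedged sketch of an averaged scalar product P over a delay embedding.
import numpy as np

def delay_embed(x, dim, tau):
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def averaged_scalar_product(x, dim, tau):
    Y = delay_embed(x, dim, tau)
    V = np.diff(Y, axis=0)                                   # approximate flow vectors
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-12)
    return np.mean(np.sum(V[:-1] * V[1:], axis=1))           # mean cosine of consecutive flows

# Example: compare embedding dimensions for a noisy sine wave.
t = np.linspace(0, 40 * np.pi, 4000)
x = np.sin(t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
for dim in (2, 3, 5):
    print(dim, round(averaged_scalar_product(x, dim, tau=10), 3))
```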
Artificial Intelligence in Real-Time Control 1992, 1993
In this paper a detailed description of the MIP project is presented. MIP (Intelligent Process Monitoring) is a real-time expert system that monitors, diagnoses, and generates suggestions in real time about optimization and stability concerning the operation of a petrochemical plant. The MIP software architecture is based on the concept of a Blackboard. The Blackboard is the central mechanism for information exchange between the modules of the system, and is also the only centralized knowledge representation scheme within the system. MIP uses a hierarchical knowledge representation with four levels of abstraction, with different knowledge sources responsible for maintaining each of these levels. MIP has been deployed and in continuous use since March 1991 in an acrylonitrile plant of REPSOL QUIMICA, S.A. at Tarragona (Spain), with reported success from both the technical and the economic points of view.
Lecture Notes in Computer Science, 2013
The increasing importance of solar energy has made the accurate forecasting of radiation an important issue. In this work we apply Support Vector Regression to downscale and improve 3-hour accumulated radiation forecasts for two locations in Spain. We either use direct 3-hour SVR-refined forecasts or first build global accumulated daily predictions and disaggregate them into 3-hour values, with both approaches outperforming the base forecasts. We also interpolate the 3-hour forecasts into hourly values using a clear sky radiation model. Here again the disaggregated SVR forecasts perform better than the base ones, but the SVR advantage is now less marked. This may be because the clear sky assumption made for interpolation is not adequate for cloudy days, or because the underlying clear sky model is not adequate. In any case, our study shows that machine learning methods or, more generally, hybrid artificial intelligence systems are quite relevant for solar energy prediction.
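A minimal sketch of the kind of disaggregation step mentioned above: split a daily accumulated radiation value into hourly values proportionally to a clear sky profile. The simplified solar-geometry proxy and the latitude are assumptions for illustration, not the clear sky model actually used in the paper.

```python
# Hedged sketch: disaggregate a daily radiation total into hourly values
# proportionally to a simplified clear sky profile.
import numpy as np

def clear_sky_profile(day_of_year, latitude_deg, hours=np.arange(24)):
    lat = np.radians(latitude_deg)
    decl = np.radians(23.45) * np.sin(2 * np.pi * (284 + day_of_year) / 365.0)
    hour_angle = np.radians(15.0 * (hours - 12))
    cos_zenith = (np.sin(lat) * np.sin(decl)
                  + np.cos(lat) * np.cos(decl) * np.cos(hour_angle))
    return np.clip(cos_zenith, 0.0, None)        # relative clear sky radiation, zero at night

def disaggregate_daily(daily_total, day_of_year, latitude_deg=40.4):
    profile = clear_sky_profile(day_of_year, latitude_deg)
    weights = profile / profile.sum()
    return daily_total * weights                 # hourly values summing to the daily total

print(disaggregate_daily(daily_total=5.8, day_of_year=172).round(3))
```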
Lecture Notes in Computer Science, 2012
In this work we will apply sparse linear regression methods to forecast wind farm energy production using numerical weather prediction (NWP) features over several pressure levels, a problem where pattern dimension can become very large. We shall place sparse regression in the context of proximal optimization, which we shall briefly review, and we shall show how sparse methods outperform other models while at the same time shedding light on the most relevant NWP features and on their predictive structure.
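A minimal sketch of sparse regression via proximal optimization: ISTA (proximal gradient descent) for the Lasso objective (1/2n)||y - Xw||^2 + lam * ||w||_1, with soft-thresholding as the proximal operator of the l1 norm. The step size, penalty and synthetic data are illustrative assumptions, not the paper's wind power setup.

```python
# Hedged sketch: ISTA / proximal gradient descent for the Lasso.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam=0.1, n_iter=500):
    n, d = X.shape
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)   # 1 / Lipschitz constant of the gradient
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Toy example: only a few of many NWP-like features are truly relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
w_true = np.zeros(50)
w_true[[3, 17, 41]] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.1 * rng.normal(size=300)
w_hat = ista_lasso(X, y)
print(np.flatnonzero(np.abs(w_hat) > 0.1))         # indices of recovered features
```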
Neural Computation, 1993
We present two methods for the prediction of coupled time series. The first one is based on modeling the series by a dynamic system with a polynomial format. This method can be formulated in terms of learning in a recurrent network, for which we give a computationally effective algorithm. ...
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2000
This paper reports on the evaluation of different machine learning techniques for the automated classification of coding gene sequences obtained from several organisms in terms of their functional role as adhesins. Diverse, biologically meaningful, sequence-based features were extracted from the sequences and used as inputs to the in silico prediction models. Another contribution of this work is the generation of potentially novel and testable predictions about the surface protein DGF-1 family in Trypanosoma cruzi. Finally, these techniques are potentially useful for the automated annotation of known adhesin-like proteins from the trans-sialidase surface protein family in T. cruzi, the etiological agent of Chagas disease.