Marcos Gestal | Universidade da Coruña
Papers by Marcos Gestal
PeerJ, 2016
The design of experiments and the validation of the results achieved with them are vital in any research study. This paper focuses on the use of different Machine Learning approaches for regression tasks in the field of Computational Intelligence, and especially on a correct comparison between the results provided by different methods, as those techniques are complex systems that require further study to be fully understood. A methodology commonly accepted in Computational Intelligence is implemented in an R package called RRegrs. This package includes ten simple and complex regression models to carry out predictive modeling using Machine Learning and well-known regression algorithms. The framework for experimental design presented herein is evaluated and validated against RRegrs. Our results are different for three out of five state-of-the-art simple datasets and it can be stated that the selection of the best model according to our proposal is statistically significant a...
Engineering Computations, 2016
Purpose – The purpose of this paper is to assess the quality of commercial lubricant oils. A spectroscopic method was used in combination with multivariate regression techniques (ordinary multivariate multiple regression, principal components analysis, partial least squares, and support vector regression (SVR)). Design/methodology/approach – The rationale behind the use of SVR was the fuzzy characteristics of the signal and its inherent ability to find nonlinear, global solutions in highly complex dimensional input spaces. Thus, SVR makes it possible to extract useful information from calibration samples and so to characterize physical-chemical properties of the lubricant oils. Findings – A dataset of 42 spectra measured from oil standards was studied to assess the concentration of copper in the oils and, thus, evaluate the wear of the machinery. It was found that the use of SVR was very advantageous for obtaining a regression model. Originality/value – The use of genetic algorit...
Soft Computing, 2015
The interpretation of the results in a classification problem can be enhanced, especially in image texture analysis problems, by feature selection techniques that reveal which features contribute most to the classification performance. This paper presents an evaluation of a number of feature selection techniques for classification in a biomedical image texture dataset (2-DE gel images), with the aim... (Communicated by I. R. Ruiz.)
Given the background of the use of Neural Networks in problems of apple juice classification, this paper aims at implementing a newly developed method in the field of machine learning: Support Vector Machines (SVM). To that end, a hybrid model that combines genetic algorithms and support vector machines is suggested in such a way that, when SVM is used as the fitness function of the Genetic Algorithm (GA), the most representative variables for a specific classification problem can be selected.
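The hybrid scheme described in this abstract (a genetic algorithm whose fitness function is the accuracy of a classifier trained on the selected variables) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the data are synthetic, the function names (`make_data`, `centroid_accuracy`, `select_features`) are hypothetical, and a nearest-centroid classifier stands in for the SVM so that the example runs with the Python standard library only.

```python
import random


def make_data(rng, n=60, n_noise=4):
    """Synthetic two-class data: features 0 and 1 are informative, the rest are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 1.0), rng.gauss(-2.0 * label, 1.0)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


def centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Stand-in fitness (NOT an SVM): validation accuracy of a nearest-centroid
    classifier that only sees the features whose mask bit is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_val, y_val):
        pred = min(
            centroids,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == t
    return correct / len(y_val)


def select_features(X_train, y_train, X_val, y_val, rng,
                    pop_size=20, generations=15, p_mut=0.1):
    """Genetic algorithm over binary feature masks with elitism."""
    n = len(X_train[0])

    def fitness(mask):
        return centroid_accuracy(X_train, y_train, X_val, y_val, mask)

    def tournament(population):
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    # The all-features mask is seeded so the result is never worse than the baseline.
    population = [[1] * n] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1, p2 = tournament(population), tournament(population)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best, fitness(best)


rng = random.Random(0)
X, y = make_data(rng)
X_train, y_train, X_val, y_val = X[:40], y[:40], X[40:], y[40:]
mask, score = select_features(X_train, y_train, X_val, y_val, rng)
baseline = centroid_accuracy(X_train, y_train, X_val, y_val, [1] * len(X[0]))
print("selected mask:", mask)
print("validation accuracy:", score, "all-features baseline:", baseline)
```

To get closer to the setting of the abstract, `centroid_accuracy` would be replaced by the validation accuracy of an SVM trained on the masked columns. Because the mask is chosen on the validation split, the reported accuracy is optimistic; a separate held-out test set would be needed for an unbiased estimate.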
2.1 Introduction. As time passes, the complexity of the problems tackled by the different branches of science grows without interruption. Alongside this growth come the time and effort required to solve those problems with classical techniques, either because it is not initially known how to obtain the solution or because, even when it is known, applying it is also highly complex. However, a careful look at our surroundings can reveal the solution to this difficulty. Perhaps the greatest challenge that any system can pose is the survival of the organisms or species that inhabit it. And Nature has been providing a multitude of valid solutions to that challenge since the beginning of time [1].
Evolutionary computation can be viewed as a set of techniques conceptually inspired by biological processes. Among those techniques, one of the most widely used is Genetic Programming. It follows Darwin's laws to find the solution to a problem. The overall objective of this book is to provide a roadmap toward finding computational solutions to various problems. To this end, a general introduction is presented of the methods and techniques associated with Genetic Programming. In this context, this book ought not to be viewed as an exhaustive or comprehensive analysis of the techniques presented herein. Instead, it should be considered as a reference that introduces the terminology, key concepts, and basic bibliography that can serve as a starting point, so that the reader will be better equipped to subsequently pursue more deeply those topics that may be of special interest. In particular, this book is designed for those people who are interested in new approaches to problem solving as well as for researchers who hope to initiate research directions along the lines of evolutionary computing or closely connected areas.
Current Topics in Medicinal Chemistry, 2013
Journal of Theoretical Biology, 2014
Cell death (CD) is a dynamic biological function involved in physiological and pathological processes. Due to the complexity of CD, there is a demand for fast theoretical methods that can help to find new CD molecular targets. The current work presents the first classification model to predict CD-related proteins based on Markov Mean Properties. These protein descriptors have been calculated with the MInD-Prot tool using the topological information of the amino acid contact networks of 2423 protein chains, five atom physicochemical properties and the protein 3D regions. The Machine Learning algorithms from Weka were used to find the best classification model for CD-related protein chains using all 20 attributes. The most accurate algorithm to solve this problem was K*. After several feature subset methods were applied, the best model found is based on only 11 variables and is characterized by an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.992 and a true positive rate (TP Rate) of 88.2% (validation set). A total of 7409 protein chains labeled with "unknown function" in the PDB Databank were analyzed with the best model in order to predict the CD-related biological activity.
Thus, several proteins have been predicted to have CD-related function in Homo sapiens: 3DRX, involved in virus-host interaction biological process and protein homooligomerization; 4DWF, involved in cell differentiation, chromatin modification, DNA damage response and protein stabilization; 1IUR, involved in ATP binding and chaperone binding; 1J7D, involved in DNA double-strand break processing, histone ubiquitination and nucleotide-binding oligomerization; 1UTU, linked with DNA repair and regulation of transcription; 3EEC, participating in the cellular membrane organization, egress of virus within host cell, class mediator resulting in cell cycle arrest, negative regulation of ubiquitin-protein ligase activity involved in mitotic cell cycle and apoptotic process. Other proteins from bacteria predicted as CD-related are 2G3V, a CAG pathogenicity island protein 13 from Helicobacter pylori; 4G5A, a hypothetical protein in Bacteroides thetaiotaomicron; 1YLK, involved in the nitrogen metabolism of Mycobacterium tuberculosis; and 1XSV, with possible DNA/RNA binding domains. The results demonstrate that CD-related proteins can be predicted using molecular information encoded in the protein 3D structure. Thus, the current work shows that new molecular targets involved in cell-death processes can be predicted.
Data mining, a part of the Knowledge Discovery in Databases (KDD) process, is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. Analyses of biomedical data have evolved towards genome-wide and high-throughput approaches, thus generating great amounts of data for which data mining is essential. Therefore, a novel approach based on genetic programming is presented, with the aim of weighting the importance of the different variables contained in the data generated in the biomedical field. This approach was applied to SNP data from Galician schizophrenia patients, showing the possibilities it offers.
In this paper, the influence of textural information is studied in two-dimensional electrophoresis gel images. A Genetic Algorithm-based feature selection technique is used in order to select the most representative textural features and reduce the original set (296 features) to a more efficient subset. This method makes use of a Support Vector Machines classifier. Different experiments have been performed: the pattern set has been divided into two parts (training and validation), extracting a total of 30%, 20% and 0% of the training data, and 10-fold cross validation is used for validation. Extracting 0% means that the training set itself is used for validation. For each division, 10 different trials have been done. Experiments have been carried out in order to measure the behaviour of the system and to obtain the most representative textural features for the classification of proteins in two-dimensional gel electrophoresis images. This information can be useful for a protein segm...
Evolutionary computation can be viewed as a set of techniques conceptually inspired by biological processes. Among those techniques, genetic algorithms are some of the most widely used. They follow Darwin's laws to find the solution to a problem. The overall objective of this book is to provide a "roadmap" toward finding computational solutions to various problems. To this end, a general introduction is presented of the methods and techniques associated with genetic algorithms. In this context, this book ought not to be viewed as an exhaustive or comprehensive analysis of the techniques presented herein. Instead, it should be considered as a reference that introduces the terminology, key concepts, and basic bibliography that can serve as a starting point, so that the reader will be better equipped to subsequently pursue more deeply those topics that may be of special interest. In particular, this book is designed for those people who are interested in new approaches to problem solving as well as for researchers who hope to initiate research directions along the lines of evolutionary computing or closely connected areas.