Basilio Sierra - Academia.edu
Papers by Basilio Sierra
Sensors
Coffee Leaf Rust (CLR) is a fungal epidemic disease that has been affecting coffee trees around the world since the 1980s. Early diagnosis of CLR would contribute strategically to minimizing its impact on crops and, therefore, to protecting farmers' profitability. In this research, a cyber-physical data-collection system was developed by integrating Remote Sensing and Wireless Sensor Networks to gather data on a test-bench coffee crop during the development of CLR. The system automatically collects, structures, and stores (locally and remotely) reliable multi-type data from different field sensors, Red-Green-Blue (RGB) cameras, and multi-spectral cameras (RE and RGN). In addition, a data-visualization dashboard was implemented to monitor the data-collection routines in real time. Operating the data-collection system produced a three-month dataset that can be used to train machine learning models for CLR diagnosis. This result validates that the de...
In this paper we present a multiclassifier approach for multilabel document classification problems, where a set of k-NN classifiers is used to predict the category of text documents, each classifier being trained on a different database obtained from the original training database by random subsampling. Bayesian voting is applied to combine the predictions generated by the multiclassifier. Throughout the classification process, a reduced-dimension vector representation obtained by Singular Value Decomposition (SVD) is used for both training and testing documents. The good results of our experiments indicate the potential of the proposed approach.
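The combination scheme described above can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: it assumes documents are already mapped to SVD-reduced dense vectors, uses cosine similarity, handles a single label per document, and stands in for the paper's Bayesian voting with a sum of Laplace-smoothed k-NN log-probabilities. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_posterior(train, x, k, classes):
    """Laplace-smoothed class probabilities from the k most similar cases."""
    neighbours = sorted(train, key=lambda case: cosine(case[0], x), reverse=True)[:k]
    counts = Counter(label for _, label in neighbours)
    total = len(neighbours) + len(classes)
    return {c: (counts[c] + 1) / total for c in classes}


def ensemble_predict(train, x, n_classifiers=10, sample_frac=0.8, k=3, seed=0):
    """Combine k-NN classifiers, each built on a random subsample of `train`,
    by summing their log-probabilities (a naive-Bayes-style product of votes)."""
    rng = random.Random(seed)
    classes = sorted({label for _, label in train})
    size = min(len(train), max(k, int(sample_frac * len(train))))
    scores = defaultdict(float)
    for _ in range(n_classifiers):
        subsample = rng.sample(train, size)  # random subsampling of the training set
        for c, p in knn_posterior(subsample, x, k, classes).items():
            scores[c] += math.log(p)
    return max(classes, key=lambda c: scores[c])


# Toy vectors standing in for SVD-reduced document representations.
train = [([1.0, 0.1], "sports"), ([0.9, 0.2], "sports"),
         ([0.8, 0.0], "sports"), ([1.0, 0.3], "sports"),
         ([0.1, 1.0], "politics"), ([0.2, 0.9], "politics"),
         ([0.0, 0.8], "politics"), ([0.3, 1.0], "politics")]
print(ensemble_predict(train, [0.95, 0.05]))  # prints: sports
```

Summing log-probabilities means a single very confident member cannot outvote the rest on its own, while smoothing keeps any class from receiving a probability of exactly zero.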
Applied Sciences, 2020
In industrial applications of data science and machine learning, most steps of a typical pipeline focus on optimizing measures of model fitness to the available data. Data preprocessing, by contrast, is often ad hoc and not based on the optimization of quantitative measures. This paper proposes using optimization in the preprocessing step, specifically studying a time series joining methodology, and introduces an error function to measure the adequacy of the joining. Experiments show how the method allows preprocessing errors to be monitored over different time slices, indicating when the preprocessing may need to be retrained. Thus, this contribution helps quantify the implications of data preprocessing on the results of data analysis and machine learning methods. The methodology is applied to two case studies: synthetic simulation data with controlled distortions, and a real industrial process.
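The abstract does not define the joining method or the error function, so the following Python sketch only illustrates the general idea under stated assumptions: two sorted sensor series are joined by nearest timestamp within a tolerance, and a hypothetical per-slice error (mean time gap, with unmatched rows penalised at the tolerance) is tracked so that a rising value can flag when the joining should be revisited. Function names and data are made up.

```python
from bisect import bisect_left


def join_nearest(left, right, tolerance):
    """Join each (t, value) in `left` to the `right` sample closest in time.

    Both series must be sorted by timestamp. Returns (t, left_value,
    right_value, gap) rows, with right_value and gap set to None when the
    closest right sample is farther away than `tolerance`.
    """
    right_times = [t for t, _ in right]
    joined = []
    for t, value in left:
        i = bisect_left(right_times, t)
        # The closest right sample is either just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(right)]
        best = min(candidates, key=lambda j: abs(right_times[j] - t), default=None)
        if best is not None and abs(right_times[best] - t) <= tolerance:
            joined.append((t, value, right[best][1], abs(right_times[best] - t)))
        else:
            joined.append((t, value, None, None))
    return joined


def join_error_by_slice(joined, slice_len, tolerance):
    """Hypothetical error per time slice: mean time gap of the joined rows,
    counting an unmatched row as the worst acceptable gap (`tolerance`)."""
    slices = {}
    for t, _, _, gap in joined:
        slices.setdefault(int(t // slice_len), []).append(tolerance if gap is None else gap)
    return {k: sum(g) / len(g) for k, g in sorted(slices.items())}


sensor_a = [(0, 1.0), (10, 1.1), (20, 1.2), (30, 1.3)]
sensor_b = [(1, 5.0), (9, 5.1), (25, 5.2), (31, 5.3)]
joined = join_nearest(sensor_a, sensor_b, tolerance=3)
print(join_error_by_slice(joined, slice_len=20, tolerance=3))  # {0: 1.0, 1: 2.0}
```

Here the second slice scores worse because the sample at t=20 finds no partner within the tolerance; in a monitoring setting, such a jump would be the signal to re-tune the preprocessing.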
Applied Sciences, 2020
Agricultural activity has always been threatened by pests and diseases that prevent the proper development of crops and negatively affect farmers' finances. One of these is Coffee Leaf Rust (CLR), a fungal epidemic disease that affects coffee trees and causes massive defoliation. For example, this disease has been affecting coffee trees in Colombia (the third largest coffee producer worldwide) since the 1980s, causing devastating losses of 70% to 80% of the harvest. Failure to detect pathogens at an early stage can result in infestations that destroy plantations and significantly reduce the commercial value of the products. The most common way to detect this disease is to walk through the crop and perform a visual inspection. To address this problem, several research studies have shown that technological methods can help identify these pathogens. Our contribution is an experiment that includ...
Engineering Applications of Artificial Intelligence, 2015
In this paper a new machine learning approach to the coreference resolution task is presented. The approach consists of a multi-classifier system that classifies mention-pairs in a reduced dimensional vector space. The vector representation of mention-pairs is generated from a rich set of linguistic features, and the Singular Value Decomposition (SVD) technique is used to generate the reduced dimensional vector space. The approach is applied to the OntoNotes v4.0 Release Corpus, using the column-format files of the CoNLL-2011 coreference resolution shared task. The results show that the reduced representation obtained by SVD is well suited to classifying mention-pair vectors. Moreover, the multi-classifier plays an important role in improving the results.
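To make the SVD reduction step concrete, here is a dependency-free Python sketch that approximates the top right singular vectors of a mention-pair feature matrix by power iteration on AᵀA with deflation, then projects each vector into the reduced space. It is an illustration under assumptions, not the authors' pipeline: the linguistic features are replaced by a toy matrix, the function names are hypothetical, and in practice a library SVD routine would normally be used instead.

```python
import math
import random


def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def top_right_singular_vectors(A, rank, iters=200, seed=0):
    """Approximate the top `rank` right singular vectors of A (rows = mention-pair
    feature vectors) by power iteration on A^T A, removing found directions."""
    rng = random.Random(seed)
    At = transpose(A)
    vectors = []
    for _ in range(rank):
        v = [rng.random() for _ in range(len(A[0]))]
        for _ in range(iters):
            w = matvec(At, matvec(A, v))  # w = A^T A v
            for u in vectors:             # deflation: project out earlier vectors
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm == 0:
                break
            v = [a / norm for a in w]
        vectors.append(v)
    return vectors


def project(x, vectors):
    """Coordinates of feature vector x in the reduced space spanned by `vectors`."""
    return [sum(a * b for a, b in zip(x, u)) for u in vectors]


# Toy mention-pair feature matrix (rows = pairs, columns = features).
pairs = [[1.0, 1.0, 0.0], [0.9, 1.1, 0.1], [0.0, 0.1, 1.0]]
basis = top_right_singular_vectors(pairs, rank=2)
reduced = [project(row, basis) for row in pairs]
```

A classifier (or the multi-classifier described above) would then be trained on `reduced` rather than on the original feature vectors.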
Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, 2015
In this paper a different machine learning approach to the coreference resolution task is presented. The approach consists of a multi-classifier system that classifies mention-pairs in a reduced dimensional vector space. The vector representation of mention-pairs is generated from a rich set of linguistic features, and the SVD technique is used to generate the reduced dimensional vector space. The approach is applied to the OntoNotes v4.0 Release Corpus, using the column-format files of the CoNLL-2011 coreference resolution shared task. The results show that the reduced representation obtained by SVD is well suited to classifying mention-pair vectors. Moreover, we can state that the multi-classifier plays an important role in improving the results.
The continuously growing amount of multimedia content has enabled the application of image content retrieval solutions to different domains. Botanical scientists work on the classification of plant species in order to infer the knowledge that allows them to advance their environmental research. Manually annotating existing and newly created plant datasets is an enormous task that becomes more and more tedious as new images are added daily. In this paper we present an automatic system for identifying plants based not only on the content of the images but also on their associated metadata. The problem is defined as classification plus fusion, where the images representing different parts of a plant are considered independently. The promising results highlight the potential of applying computer vision solutions to the botanical domain.
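The "classification plus fusion" scheme can be sketched as late fusion: each image of a plant (leaf, flower, and so on) is classified on its own, and the per-image class scores are then combined. The Python sketch below assumes a weighted sum as the fusion rule and uses made-up scores; the paper's actual classifiers, fusion rule, and use of metadata are not reproduced here, and `fuse_scores` is a hypothetical name.

```python
from collections import defaultdict


def fuse_scores(per_image_scores, weights=None):
    """Late fusion: combine class-score dicts produced independently for each
    image of the same plant by a weighted sum, and return species ranked
    from most to least likely. Organs missing from `weights` get weight 1.0."""
    weights = weights or {}
    total = defaultdict(float)
    for organ, scores in per_image_scores:
        w = weights.get(organ, 1.0)
        for species, s in scores.items():
            total[species] += w * s
    return sorted(total, key=total.get, reverse=True)


# Made-up scores from two independent per-organ classifiers.
observation = [("leaf", {"Quercus ilex": 0.6, "Quercus robur": 0.4}),
               ("flower", {"Quercus ilex": 0.3, "Quercus robur": 0.7})]
print(fuse_scores(observation))                   # ['Quercus robur', 'Quercus ilex']
print(fuse_scores(observation, {"leaf": 3.0}))    # ['Quercus ilex', 'Quercus robur']
```

Weighting lets a more reliable organ classifier dominate; metadata-based scores could in principle be added as one more entry in `per_image_scores`.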
IEEE Transactions on Multimedia, 2014
Context categorization is a fundamental prerequisite for multi-domain multimedia content analysis applications. Most feature extraction methods require prior knowledge to decide whether they are suitable for a specific domain and to optimize their input parameters. In this paper, we introduce a new color image context categorization method (DITEC) based on the trace transform. We also analyze the distortions produced by the parameters that determine the sampling of the discrete trace transform. The dimensionality of the resulting trace transform signal is reduced through statistical descriptors that preserve the underlying information. Moreover, Feature Subset Selection (FSS) is applied both to improve classification performance and to shorten the final descriptor. The extracted features are highly discriminative for content categorization. The theoretical properties of the method are analyzed and experimentally validated on two different datasets.
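As a rough illustration of the trace transform and of descriptor-based dimensionality reduction, the Python sketch below samples a grayscale image along lines parameterised by angle θ and offset ρ using nearest-neighbour lookup, applies a functional to each line, and summarises each angle's signal by its mean and standard deviation. The choice of functional (`sum`, which makes this a discrete Radon transform), the per-angle mean/std descriptor, and the grayscale list-of-rows input are all assumptions; DITEC's actual functionals, colour handling and FSS step are not shown. The integer sampling grid is exactly the kind of parameter whose distortions the paper analyses.

```python
import math
import statistics


def trace_transform(image, n_angles=8, functional=sum):
    """Discrete trace transform of a 2-D grayscale image (list of rows).

    For each angle theta and integer offset rho, sample the image along the
    line {(x, y) : x*cos(theta) + y*sin(theta) = rho} (coordinates centred on
    the image) with nearest-neighbour lookup, then apply `functional`.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    radius = int(math.ceil(math.hypot(cx, cy)))
    result = []
    for a in range(n_angles):
        theta = math.pi * a / n_angles
        c, s = math.cos(theta), math.sin(theta)
        row = []
        for rho in range(-radius, radius + 1):
            samples = []
            for t in range(-radius, radius + 1):  # walk along the line
                x = round(cx + rho * c - t * s)
                y = round(cy + rho * s + t * c)
                if 0 <= x < w and 0 <= y < h:
                    samples.append(image[y][x])
            row.append(functional(samples) if samples else 0)
        result.append(row)
    return result


def statistical_descriptor(trace):
    """Compress each angle's trace signal to (mean, population std dev)."""
    return [(statistics.fmean(row), statistics.pstdev(row)) for row in trace]


flat = [[1] * 5 for _ in range(5)]
trace = trace_transform(flat, n_angles=4)
print(trace[0])  # [0, 5, 5, 5, 5, 5, 0]
descriptor = statistical_descriptor(trace)
```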
Detection and description of keypoints in an image is a well-studied problem in Computer Vision, and some methods such as SIFT, SURF or ORB are computationally very efficient. This paper proposes a solution for a particular case study, the object recognition of industrial parts, based on hierarchical classification. Reducing the number of instances leads to better performance, which is precisely what hierarchical classification aims for. We show that this method performs better than using a single method such as ORB, SIFT or FREAK, although it is somewhat slower.
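The benefit of hierarchical classification, comparing a query against fewer instances, can be shown with a two-level Python sketch: a coarse step picks the nearest group centroid, and a fine step matches the query only against the instances in that group. This is an assumption-laden illustration, not the paper's method: each part is summarised by a single hypothetical descriptor vector instead of sets of SIFT/ORB/FREAK keypoints, and the group and part names are invented.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


class HierarchicalClassifier:
    """Two-level classifier: pick a coarse group first, then compare the query
    only against the instances of that group."""

    def __init__(self, samples):
        # samples: list of (descriptor_vector, group, part_label)
        self.groups = {}
        for x, group, label in samples:
            self.groups.setdefault(group, []).append((x, label))
        self.centroids = {g: centroid([x for x, _ in items])
                          for g, items in self.groups.items()}

    def predict(self, x):
        # Level 1: nearest group centroid.
        group = min(self.centroids, key=lambda g: math.dist(self.centroids[g], x))
        # Level 2: nearest instance within the chosen group only.
        _, label = min(self.groups[group], key=lambda item: math.dist(item[0], x))
        return group, label


samples = [([0.0, 0.0], "bolts", "M6"), ([0.0, 1.0], "bolts", "M8"),
           ([10.0, 10.0], "brackets", "L-40"), ([10.0, 11.0], "brackets", "L-60")]
clf = HierarchicalClassifier(samples)
print(clf.predict([0.1, 0.9]))  # ('bolts', 'M8')
```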
Lecture Notes in Computer Science, 2010
In this paper we present the first machine learning approach to resolving pronominal anaphora in the Basque language. We consider different classifiers in order to find the system that best fits the characteristics of the language under examination. We do not restrict our study to the classifiers typically used for this task; we also consider others, such as Random Forest or VFI, in order to make a general comparison. We describe the feature vector obtained with our linguistic processing system and analyze the contribution of different subsets of features, as well as the weight of each feature used in the task.
Applied Soft Computing, 2011
This article presents a multiclassifier approach for multiclass/multilabel document categorization problems. For the categorization process, we use a reduced vector representation obtained by SVD for training and testing documents, and a set of k-NN classifiers to predict the category of test documents; each k-NN classifier uses a reduced database subsampled from the original training database. To perform multilabel classification, a new approach based on Bayesian weighted voting is also presented. The good results obtained in the experiments give an indication of the potential of the proposed approach.
Proceedings of the 4th …, 2007
In this article a multiclassifier approach to word sense disambiguation (WSD) problems is presented, where a set of k-NN classifiers is used to predict the category (sense) of each word. Bayesian voting is applied to combine the predictions generated by the multiclassifier. Throughout the classification process, a reduced dimensional vector representation obtained by Singular Value Decomposition (SVD) is used. Each word is considered an independent classification problem, so a different parameter setting, selected after a tuning phase, is applied to each word. The approach has been applied to the lexical sample WSD subtask of SemEval-2007 (task 17).
This paper analyzes the impact of dimensionality reduction techniques on the text categorization of documents written in Basque. Classification techniques such as Naïve Bayes, Winnow, SVMs and k-NN were selected, and the Singular Value Decomposition (SVD) dimensionality reduction technique, together with lemmatization and noun selection, was used in our experiments. The results show that the approach combining SVD and k-NN on a lemmatized corpus gives the best accuracy of all, by a remarkable margin.
Proceedings of the Eighth International Conference on Computational Semantics - IWCS-8 '09, 2009
In this paper a multiclassifier-based approach to a word sense disambiguation (WSD) problem is presented. A vector representation is used for training and testing cases, and the Singular Value Decomposition (SVD) technique is applied to reduce its dimension. The approach consists of creating a set of k-NN classifiers and combining their predictions to give a final word sense prediction for each case to be classified. The combination is done by applying a Bayesian voting scheme. The approach has been applied to a database of 100 words made available by the organizers of the lexical sample WSD subtask of SemEval-2007 (task 17). Each word was considered an independent classification problem, and a methodological parameter tuning phase was applied to optimize the parameter setting for each word. The results achieved are among the best, which encourages applying the approach to other WSD tasks.
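The per-word tuning phase can be pictured with a short Python sketch: each target word is treated as its own classification problem, and a parameter (here, assumed to be the number of neighbours k of a plain k-NN classifier) is chosen by leave-one-out accuracy on that word's training cases. The tuned parameter, the selection criterion, the Euclidean distance and the toy "bank" data are all assumptions for illustration; the function names are hypothetical.

```python
import math
from collections import Counter


def knn_label(train, x, k):
    """Majority label among the k nearest training cases (Euclidean distance)."""
    neighbours = sorted(train, key=lambda case: math.dist(case[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def loo_accuracy(train, k):
    """Leave-one-out accuracy of k-NN on one word's training cases."""
    hits = 0
    for i, (x, y) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        hits += knn_label(rest, x, k) == y
    return hits / len(train)


def tune_per_word(datasets, candidate_ks=(1, 3, 5)):
    """Pick, for each word independently, the k with the best LOO accuracy
    (ties go to the earlier candidate)."""
    return {word: max(candidate_ks, key=lambda k: loo_accuracy(cases, k))
            for word, cases in datasets.items()}


# Toy two-sense data with one noisy case inside the "finance" cluster.
cases = [((0, 0), "finance"), ((0, 1), "finance"), ((1, 0), "finance"),
         ((1, 1), "finance"), ((5, 5), "river"), ((5, 6), "river"),
         ((6, 5), "river"), ((6, 6), "river"), ((0.5, 0.5), "river")]
print(tune_per_word({"bank": cases}))  # {'bank': 3}
```

With k=1 the noisy case misleads every nearby "finance" example, so a larger k wins for this word; another word with cleaner data could end up with a different setting.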
Intelligent Autonomous Systems, 2000
There are different approaches to mobile robot navigation. Landmark-based localization has proven to be an alternative to simple dead-reckoning, but landmarks are often environment-specific, and recognition algorithms are computationally very expensive. This paper presents an approach to landmark-based navigation that uses emergency exit panels and corridors as cues, without odometric information. Experiments are carried out to verify appart...
In this paper we present an appearance-based method for topologically localising the robot. The information extracted from the image sequences is just an approximation of the colour probability density function, estimated using a non-parametric clustering paradigm (the Self-Organising Map neural network), together with the information obtained from the segmentation of a single image, i.e. the pixel ratio
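The idea of approximating a colour probability density with a Self-Organising Map can be sketched in Python: a small 1-D SOM is trained on RGB pixels, and the density is then read off as the fraction of pixels for which each unit is the best-matching unit. The map size, learning-rate and neighbourhood schedules, and the hit-histogram reading of the density are assumptions for illustration, not the parameters used in the paper; the pixel-ratio segmentation feature is not covered.

```python
import math
import random


def train_som(pixels, n_units=8, epochs=20, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D Self-Organising Map on RGB pixels (tuples scaled to [0, 1])."""
    rng = random.Random(seed)
    units = [list(rng.choice(pixels)) for _ in range(n_units)]
    steps, step = epochs * len(pixels), 0
    for _ in range(epochs):
        order = pixels[:]
        rng.shuffle(order)
        for p in order:
            frac = step / steps
            lr = lr0 * (1 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1 - frac), 0.5)  # shrinking neighbourhood width
            bmu = min(range(n_units), key=lambda i: math.dist(units[i], p))
            for i in range(n_units):
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                units[i] = [u + lr * h * (c - u) for u, c in zip(units[i], p)]
            step += 1
    return units


def colour_density(pixels, units):
    """Approximate the colour pdf as the fraction of pixels won by each unit."""
    counts = [0] * len(units)
    for p in pixels:
        counts[min(range(len(units)), key=lambda i: math.dist(units[i], p))] += 1
    return [c / len(pixels) for c in counts]


pixels = [(1.0, 0.0, 0.0)] * 30 + [(0.0, 0.0, 1.0)] * 10
units = train_som(pixels, n_units=4, epochs=5)
density = colour_density(pixels, units)
```

Two images taken at the same place should yield similar `density` vectors, which is what makes such a summary usable as an appearance signature for topological localisation.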
Knowledge-Based Intelligent Information & Engineering Systems, 2000
The k-Nearest-Neighbour (KNN) decision rule has often been used in Pattern Recognition problems. One of the difficulties that arises when utilizing this technique is that each of the labeled samples is given equal importance in deciding the class memberships of the pattern to be classified, regardless of the "typicalness" of each of the neighbours.
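One way to take "typicalness" into account is to weight each neighbour's vote. The abstract does not say how typicalness is measured, so the Python sketch below assumes a simple leave-one-out definition: the fraction of a training sample's own k nearest neighbours that share its label. A mislabelled or atypical sample then contributes little or nothing to the vote. Function names and data are hypothetical.

```python
import math
from collections import defaultdict


def typicalness(train, k):
    """Hypothetical typicalness score: fraction of each sample's own k nearest
    neighbours (excluding itself) that share its label."""
    scores = []
    for i, (x, y) in enumerate(train):
        others = [train[j] for j in range(len(train)) if j != i]
        neighbours = sorted(others, key=lambda s: math.dist(s[0], x))[:k]
        agree = sum(1 for _, label in neighbours if label == y)
        scores.append(agree / max(len(neighbours), 1))
    return scores


def weighted_knn(train, weights, x, k):
    """k-NN vote in which each neighbour contributes its typicalness weight
    instead of a uniform vote of 1."""
    ranked = sorted(range(len(train)), key=lambda i: math.dist(train[i][0], x))[:k]
    votes = defaultdict(float)
    for i in ranked:
        votes[train[i][1]] += weights[i]
    return max(votes, key=votes.get)


train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"),
         ((0.5, 0.5), "B")]  # atypical point lying inside cluster A
weights = typicalness(train, k=3)
print(weights[6])                                  # 0.0
print(weighted_knn(train, weights, (0.4, 0.4), k=3))  # A
```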
International Conference on Enterprise Information Systems, 2001