Feature subset selection for learning preferences
Related papers
Feature subset selection for learning preferences: a case study
2004
In this paper we tackle a real-world problem: the search for a function to evaluate the merits of beef cattle as meat producers. The independent variables represent a set of measurements taken on live animals, while the outputs cannot be captured with a single number, since the available experts tend to assess each animal in a relative way, comparing it with the other animals in the same batch. Therefore, this problem cannot be solved by means of regression methods; our approach is to learn the preferences of the experts when they order small groups of animals. Thus, the problem can be reduced to a binary classification, and can be dealt with by a Support Vector Machine (SVM) improved with the use of a feature subset selection (FSS) method. We develop a method based on Recursive Feature Elimination (RFE) that employs an adaptation of a metric-based method devised for model selection (ADJ). Finally, we discuss the extension of the resulting method to more general settings, and provide a comparison with other possible alternatives.
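The reduction this abstract describes, turning expert orderings into a binary classification problem, can be sketched as follows. The function name, toy measurements and preference pairs are illustrative, not taken from the paper; any binary classifier (such as a linear SVM) could then be trained on the resulting examples.

```python
# Sketch: reducing preference judgments to binary classification.
# For each judgment "animal a is preferred to animal b", emit the
# difference vector (x_a - x_b) with label +1 and the mirrored pair
# (x_b - x_a) with label -1.

def pairwise_examples(features, preferences):
    """features: dict id -> list of measurements.
    preferences: list of (preferred_id, other_id) pairs."""
    X, y = [], []
    for a, b in preferences:
        diff = [fa - fb for fa, fb in zip(features[a], features[b])]
        X.append(diff)
        y.append(+1)
        X.append([-d for d in diff])  # mirrored example
        y.append(-1)
    return X, y

# Toy batch of three animals with two measurements each,
# ordered by the expert as a > b > c.
feats = {"a": [3.0, 1.0], "b": [2.0, 1.5], "c": [1.0, 0.5]}
prefs = [("a", "b"), ("b", "c")]
X, y = pairwise_examples(feats, prefs)
print(X, y)  # 4 difference vectors with alternating labels
```

Because each judgment yields a mirrored pair, the resulting training set is balanced by construction.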
Analysis of Supervised Feature Selection Techniques on Animal Husbandry Dataset
International Journal of Computer Applications, 2018
Data mining techniques have become an obvious need for today's high-dimensional animal-industry data. In the last decade, almost every aspect of animal-related activity has been captured and stored in local or central data repositories. Due to complex animal traits such as efficiency, growth, health, stress, behavior and adaptation, data mining in this area is a challenge that can be performed optimally only with a reduced number of relevant features. In this paper, a comparative analysis of various feature selection techniques, based on several performance-measuring parameters, is presented using an animal husbandry dataset. This research finds that the J48 classifier performs better than other traditional classification approaches.
Trait Selection for Assessing Beef Meat Quality Using Non-linear SVM
2004
In this paper we show that it is possible to model the sensory impressions of consumers about beef meat. This is not a straightforward task: when we aim to induce a function that maps object descriptions into ratings, we must consider that consumers' ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we had to use a special-purpose SVM polynomial kernel. The training data set collects the ratings of panels of experts and consumers; the meat was provided by 103 bovines of 7 Spanish breeds with different carcass weights and aging periods. Additionally, to gain insight into consumer preferences, we used feature subset selection tools. The result is that aging is the most important trait for improving consumers' appreciation of beef meat.
A Comprehensive Feature Selection Approach for Machine Learning
International Journal of Distributed Artificial Intelligence, 2021
In machine learning, the important underlying input variables must be known, or else the predicted outcome variable will never match the target outcome variable. Machine learning tools are used in many applications where the underlying scientific model is inadequate. Unfortunately, establishing any kind of mathematical relationship is difficult, and as a result the choice of variables to incorporate during training becomes a major issue, as it affects the accuracy of the results. Another important issue is to find the cause behind the phenomenon and the major factors that affect the outcome variable. The aim of this article is to develop an approach that is not specific to a particular tool but gives accurate results under all circumstances. This paper proposes a model that filters out irrelevant variables irrespective of the type of dataset the researcher uses. This approach provides parameters for determining the quality of the data used...
Feature selection is one of the important issues in the domains of system modelling, data mining and pattern recognition. Subset selection evaluates a subset of features as a group for suitability prior to applying a learning algorithm. Subset selection algorithms can be broken into wrapper, filter and hybrid categories. The literature surveyed in relation to this is given as follows.
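Of the categories above, the filter approach is the simplest to illustrate: features are ranked by a statistic computed independently of any learner, and the top-k are kept. The sketch below uses absolute Pearson correlation with the class labels as the ranking statistic; the function names and toy data are illustrative, not from any of the papers listed here.

```python
# Filter-style feature selection: rank features by |correlation| with
# the class labels, keep the top-k. No learning algorithm is involved.

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(X, y, k):
    """X: list of samples (lists of feature values); y: class labels."""
    n_feat = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_feat)]
    ranked = sorted(range(n_feat), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Features 0 and 2 track the class perfectly; feature 1 only loosely.
X = [[0, 5, 1], [1, 3, 0], [0, 6, 1], [1, 2, 0]]
y = [0, 1, 0, 1]
print(filter_select(X, y, 2))  # indices of the two best features
```

A wrapper method would instead score each candidate subset by training the target classifier on it, which is costlier but accounts for feature interactions.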
Recursive Feature Selection with Significant Variables of Support Vectors
Computational and Mathematical Methods in Medicine, 2012
The development of DNA microarrays lets researchers screen thousands of genes simultaneously and helps determine high- and low-expression genes in normal and diseased tissues. Selecting relevant genes for cancer classification is an important issue. Most gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to select genes. However, the parameter setting may not be compatible with the selected classification algorithm. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in a support vector machine. We compared its performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared on extensive simulation experiments and on analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM, and is capable of attaining good classification performance when the variations of informative and noninformative genes differ. In the analysis of the two microarray datasets, the proposed method identifies fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.
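The recursive-elimination loop behind SVMRFE, mentioned in several of these abstracts, can be sketched in a few lines: train a linear model, drop the feature with the smallest absolute weight, repeat until the desired number of features remains. To keep the sketch dependency-free, a perceptron stands in for the SVM; the elimination loop itself is the same, and the data and names are illustrative.

```python
# RFE sketch: repeatedly train a linear model and eliminate the
# feature whose weight has the smallest magnitude.

def perceptron(X, y, epochs=20):
    """Train a simple linear model; y in {+1, -1}. Stand-in for an SVM."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for row, label in zip(X, y):
            if label * sum(wi * xi for wi, xi in zip(w, row)) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, row)]
    return w

def rfe(X, y, n_keep):
    active = list(range(len(X[0])))  # surviving feature indices
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w = perceptron(Xa, y)
        worst = min(range(len(active)), key=lambda i: abs(w[i]))
        del active[worst]            # eliminate the weakest feature
    return active

# Feature 0 separates the classes; features 1 and 2 carry no signal.
X = [[2, 0.1, 1], [1.5, -0.2, 1], [-2, 0.1, 1], [-1.7, -0.1, 1]]
y = [1, 1, -1, -1]
print(rfe(X, y, 1))  # index of the single surviving feature
```

Retraining after every elimination is what distinguishes RFE from a one-shot ranking: the weights are re-estimated on the reduced feature set at each step.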
Analyzing Sensory Data Using Non-linear Preference Learning with Feature Subset Selection
2004
The quality of food can be assessed from different points of view. In this paper, we deal with those aspects that can be appreciated through sensory impressions. When we aim to induce a function that maps object descriptions into ratings, we must consider that consumers' ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we propose to learn from consumers' preference judgments instead of using an approach based on regression. This requires the use of special-purpose kernels and feature subset selection methods. We illustrate the benefits of our approach on two families of real-world databases.
A Simple Evaluation Model for Feature Subset Selection Algorithms
INTELIGENCIA ARTIFICIAL, 2006
The aim of Feature Subset Selection (FSS) algorithms is to select, from the original set of features describing a data set, a subset of features according to some importance criterion. To accomplish this task, FSS removes irrelevant and/or redundant features, as they may decrease data quality and degrade several desired properties of classifiers induced by supervised learning algorithms. As learning the best subset of features is an NP-hard problem, FSS algorithms generally use heuristics to select subsets. It is therefore important to empirically evaluate the performance of these algorithms. However, this evaluation needs to be multicriteria, i.e., it should take several properties into account. This work describes a simple model we have proposed to evaluate FSS algorithms which considers two properties: the predictive performance of the classifier induced using the subset of features selected by each FSS algorithm, and the reduction in the number of features. Another multicriteria performance evaluation model, based on rankings, which makes it possible to consider any number of properties, is also presented. The models are illustrated by their application to four well-known FSS algorithms and two versions of a new FSS algorithm we have developed.
Wrappers for Feature Subset Selection
Artificial Intelligence, 1997
In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
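The wrapper idea described in this abstract, scoring candidate subsets with the target learner itself, can be sketched as a greedy forward search. Here a 1-nearest-neighbour classifier is evaluated under leave-one-out, adding at each step the feature that raises accuracy most; the learner, search strategy and toy data are illustrative choices, not the paper's exact setup.

```python
# Wrapper sketch: greedy forward selection, scoring each candidate
# subset by the leave-one-out accuracy of a 1-NN classifier on it.

def loo_accuracy(X, y, subset):
    correct = 0
    for i in range(len(X)):
        # nearest neighbour of sample i among the others, using `subset`
        dists = [
            (sum((X[i][j] - X[k][j]) ** 2 for j in subset), k)
            for k in range(len(X)) if k != i
        ]
        _, nearest = min(dists)
        correct += y[nearest] == y[i]
    return correct / len(X)

def forward_select(X, y, n_keep):
    chosen = []
    while len(chosen) < n_keep:
        best = max(
            (f for f in range(len(X[0])) if f not in chosen),
            key=lambda f: loo_accuracy(X, y, chosen + [f]),
        )
        chosen.append(best)
    return chosen

# Feature 0 matches the classes; feature 1 is misleading noise.
X = [[0, 9], [0, 1], [1, 8], [1, 2]]
y = [0, 0, 1, 1]
print(forward_select(X, y, 1))  # index of the selected feature
```

Because every candidate subset triggers a full evaluation of the learner, wrappers are tailored to the algorithm and dataset, which is exactly the interaction the abstract argues a selection method should consider, at the cost of far more computation than a filter.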
Feature selection plays an important role in the data mining process. It is needed to deal with the excessive number of features, which can become a computational burden on the learning algorithms. It is also necessary even when computational resources are not scarce, since it improves the accuracy of machine learning tasks, as we will see in the upcoming sections. In this review, we discuss the different feature selection approaches, the relation between them and the various machine learning algorithms, and compare the existing approaches with one another. I wrote this report as part of my MSc degree in the data mining programme at the University of East Anglia.