Rob Potharst - Academia.edu
Papers by Rob Potharst
The research of this thesis focuses on the development of methods for building intelligent classification systems. Classification in this thesis refers to the assignment of objects to one of several predefined groups or classes to which those objects may belong. This assignment is performed on the basis of a number of characteristics (or attributes) of these objects. A procedure that describes the assignment of a relevant object to some class in terms of the attribute values of that object is called a classifier or classification rule. This thesis considers classification rules of two types: decision tree classifiers and neural network classifiers. Both types are important alternatives to the classical classifiers based on a linear or quadratic discriminant function.
In many classification problems the domains of the attributes and the classes are linearly ordered. Since the known decision tree methods generate non-monotone trees, these methods are not suitable for monotone classification problems. We already provided order-preserving tree-generation algorithms for multi-attribute classification problems with k linearly ordered classes in a previous paper. For real-world datasets it is important to consider approximate solutions to handle problems like speed, tree-size and noise. In this report we develop a new decision tree algorithm that generates quasi-monotone decision trees. This algorithm outperforms classical algorithms such as those of Quinlan with respect to prediction, and beats algorithms that generate strictly monotone decision trees with respect to speed. This report contains proofs of all presented results.
Lecture Notes in Computer Science, 2001
In this paper a new classification algorithm based upon frequent patterns is proposed. A frequent pattern is a generalization of the concept of a frequent item set, used in association rule mining. First, the collection of frequent patterns in the training set is constructed. For each frequent pattern, the support and the confidence are determined and registered. Choosing an appropriate data structure allows us to keep the full collection of frequent patterns in memory. The proposed classification method makes direct use of this collection. This method turns out to be competitive with a well-known classifier like C4.5 and other comparable methods. For large data sets it seems to be a very appropriate method.
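The idea of classifying directly from a collection of frequent patterns can be sketched as follows. This is a minimal illustration under assumed details, not the paper's implementation: the function names (`frequent_patterns`, `classify`), the use of a plain dictionary in place of the paper's compact in-memory data structure, and the confidence-sum scoring rule are all assumptions.

```python
from itertools import combinations
from collections import defaultdict

def frequent_patterns(rows, min_support):
    """Collect all item patterns meeting min_support in (items, label) rows,
    recording overall support and per-class counts for each pattern."""
    counts = defaultdict(lambda: [0, defaultdict(int)])
    for items, label in rows:
        for r in range(1, len(items) + 1):
            for pat in combinations(sorted(items), r):
                entry = counts[pat]
                entry[0] += 1               # support of the pattern
                entry[1][label] += 1        # co-occurrence with this class
    return {p: (n, dict(by_class)) for p, (n, by_class) in counts.items()
            if n >= min_support}

def classify(patterns, items):
    """Score each class by the summed confidence of the patterns that
    are contained in the record; predict the best-scoring class."""
    scores = defaultdict(float)
    item_set = set(items)
    for pat, (support, by_class) in patterns.items():
        if set(pat) <= item_set:
            for label, cnt in by_class.items():
                scores[label] += cnt / support   # confidence of pattern -> label
    return max(scores, key=scores.get) if scores else None
```

The exponential enumeration over all subsets is only workable for short records; it merely illustrates how support and confidence feed the final decision.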
Lecture Notes in Computer Science, 1999
In many classification problems the domains of the attributes and the classes are linearly ordered. For such problems the classification rule often needs to be order-preserving, or monotone as we call it. Since the known decision tree methods generate non-monotone trees, these methods are not suitable for monotone classification problems. We provide an order-preserving tree-generation algorithm for multi-attribute classification problems.
Direct marketing firms want to transfer their message as efficiently as possible in order to obtain a profitable long-term relationship with individual customers. Much attention has been paid to address selection of existing customers and to identifying new profitable prospects. Less attention has been paid to the optimal frequency of the contacts with customers. We provide a decision support system that helps the direct mailer to determine the mailing frequency for active customers. The system observes the mailing pattern of these customers in terms of the well-known R(ecency), F(requency) and M(onetary) variables. The underlying model is based on an optimization model for the frequency of direct mailings. The system provides the direct mailer with tools to define preferred response behavior and advises the direct mailer on the mailing strategy that will steer the customers towards this preferred response behavior.
European Journal of Operational Research, 2007
In this paper various ensemble learning methods from machine learning and statistics are considered and applied to the customer choice modeling problem. The application of ensemble learning usually improves the prediction quality of flexible models like decision trees. We give experimental results for two real-life marketing datasets using decision trees, ensemble versions of decision trees and the logistic regression model, which is a standard approach for this problem. The ensemble models are found to improve upon individual decision trees and to outperform logistic regression. Next, an additive decomposition of the prediction error of a model, the bias/variance decomposition, is considered. A model with a high bias lacks the flexibility to fit the data well. A high variance indicates that a model is unstable with respect to different datasets. Decision trees have a high variance component and a low bias component in the prediction error, whereas logistic regression has a high bias component and a low variance component. It is shown that ensemble methods aim at minimizing the variance component in the prediction error while leaving the bias component unaltered. Bias/variance decompositions for all models on both customer choice datasets are given to illustrate these concepts.
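The bias/variance decomposition discussed above can be estimated empirically for squared error by fitting the same learner on several training sets and comparing its predictions at fixed test points. The sketch below is a generic illustration under that setup, not the paper's procedure; the function name and interface (`fit(dataset)` returning a prediction function) are assumptions.

```python
def bias_variance(fit, datasets, x_test, y_true):
    """Empirical bias^2 and variance of a learner under squared error.
    `fit(dataset)` must return a callable that predicts for one input."""
    preds = [[fit(ds)(x) for x in x_test] for ds in datasets]
    n_models, n_points = len(preds), len(x_test)
    mean_pred = [sum(p[i] for p in preds) / n_models for i in range(n_points)]
    # bias^2: squared gap between the average model and the truth
    bias2 = sum((m - y) ** 2 for m, y in zip(mean_pred, y_true)) / n_points
    # variance: average squared spread of the models around their mean
    var = sum(sum((p[i] - mean_pred[i]) ** 2 for p in preds) / n_models
              for i in range(n_points)) / n_points
    return bias2, var
```

Averaging many such models (bagging-style ensembling) shrinks the variance term while leaving the bias term essentially unchanged, which is the effect the paper measures.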
Engineering Applications of Artificial Intelligence, 2009
Rob Potharst, Arie Ben-David and Michiel van Wezel. ... Daniels and Kamp (1999) adapted a feed-forward neural network approach for producing monotone classifications. Kramer et al. ...
Decision Support Systems, 2008
We create a support system for predicting end prices on eBay. The end price predictions are based on the item descriptions found in the item listings of eBay, and on some numerical item features. The system uses text mining and boosting algorithms from the field of machine learning. Our system substantially outperforms the naive method of predicting the category mean price. Moreover, interpretation of the model enables us to identify influential terms in the item descriptions and shows that the item description is more influential than the seller feedback rating, which was shown to be influential in earlier studies.
Report Econometric Institute Erasmus University Rotterdam, 2002
Direct marketing firms want to transfer their message as efficiently as possible in order to obtain a profitable long-term relationship with individual customers. Much attention has been paid to address selection of existing customers and to identifying new profitable prospects. Less attention has been paid to the optimal frequency of the contacts with customers. We provide a decision support system that helps the direct mailer to determine the mailing frequency for active customers. The system observes the mailing pattern of these customers in terms of the well-known R(ecency), F(requency) and M(onetary) variables. The underlying model is based on an optimization model for the frequency of direct mailings. The system provides the direct mailer with tools to define preferred response behavior and advises the direct mailer on the mailing strategy that will steer the customers towards this preferred response behavior.
Econometric Institute Report, 2009
The monotonicity constraint is a common side condition imposed on modeling problems as diverse as hedonic pricing, personnel selection and credit rating. Experience tells us that it is not trivial to generate artificial data for supervised learning problems when the monotonicity constraint holds. Two algorithms are presented in this paper for such learning problems. The first one can be used to generate random monotone data sets without an underlying model, and the second can be used to generate monotone decision tree models. If needed, noise can be added to the generated data. The second algorithm makes use of the first one. Both algorithms are illustrated with an example.
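One simple way to obtain a dataset that is monotone by construction, in the spirit of the model-based generator described above, is to label random points with a random monotone rule. The weighted-sum threshold rule below is an assumed stand-in for the paper's generators, and all names and parameters are hypothetical.

```python
import random

def random_monotone_dataset(n, dim, num_classes, seed=0):
    """Generate n points in [0,1]^dim labeled by a random monotone
    (weighted-sum threshold) rule, so the dataset is monotone by construction."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(dim)]      # nonnegative => monotone score
    cutoffs = sorted(rng.uniform(0, sum(weights)) for _ in range(num_classes - 1))
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(dim)]
        score = sum(w * xi for w, xi in zip(weights, x))
        label = sum(score > c for c in cutoffs)       # class index grows with score
        data.append((x, label))
    return data
```

Because the weights are nonnegative and the cutoffs are sorted, any point that componentwise dominates another can never receive a lower label; noise could then be injected by randomly perturbing some labels afterwards.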
Ordinal data sets often contain a certain amount of non-monotone noise. This paper proposes three algorithms for removing these non-monotonicities by relabeling the noisy instances. The first one is a naive algorithm. The second one is a refinement of this naive algorithm which minimizes the difference between the old and the new label. The third one is optimal in the sense that the number of unchanged instances is maximized. The last algorithm is a refinement of the second. In addition, the runtime complexities are discussed.
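For the simplest setting, a single totally ordered attribute, a naive repair in the spirit of the first algorithm can be sketched as below. This one-dimensional illustration is an assumption: the paper's algorithms work on general ordinal data, and the function name and the raise-to-running-maximum rule are hypothetical.

```python
def relabel_naive(data):
    """Naive monotone repair for (attribute, label) pairs with a totally
    ordered attribute: scan in attribute order and raise any label that
    falls below the running maximum seen so far."""
    repaired, running_max = [], None
    for attr, label in sorted(data):
        if running_max is not None and label < running_max:
            label = running_max     # relabel upwards; may change more labels than needed
        running_max = label
        repaired.append((attr, label))
    return repaired
```

The result is always monotone, but the scheme neither minimizes label changes nor maximizes the number of unchanged instances, which is exactly what the refined second and third algorithms address.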
Artificial Intelligence Research, 2015
Operations Research/Computer Science Interfaces Series, 1997
In this report we discuss the use of two simple classifiers to initialise the input-to-hidden layer of a one-hidden-layer neural network. These classifiers divide the input space into convex regions that can be represented by membership functions. These functions are then used to determine the weights of the first layer of a feedforward network. Keywords and phrases: mapping decision trees onto neural networks, simple perceptrons, LVQ-networks, initialisation of feedforward networks.
Business Applications and Computational Intelligence
Starting with a review of some classical quantitative methods for modeling customer behavior in the brand choice situation, some new methods are explained which are based on recently developed techniques from data mining and artificial intelligence: boosting and/or stacking neural network models. The main advantage of these new methods is the gain in predictive performance that is often achieved, which in a marketing setting directly translates into increased reliability of expected market share estimates. The new models are applied to a well-known data set containing scanner data on liquid detergent purchases. The performance of the new models on this data set is compared with results from the marketing literature. Finally, the developed models are applied to some practical marketing issues such as predicting the effect of different pricing schemes upon market share.
Principles of Data Mining and Knowledge Discovery, 1997
Decision tree methods constitute an important and much used technique for classification problems. When such trees are used in a Data Mining and Knowledge Discovery context, ease of interpretation of the resulting trees is an important requirement to be met. Decision trees with tests based on a single variable, as produced by methods such as ID3 and C4.5, often require a large number of tests to achieve an acceptable accuracy. This makes interpretation of these trees, which is an important reason for their use, disputable. Recently, a number of methods for constructing decision trees with multivariate tests have been presented. Multivariate decision trees are often smaller and more accurate than univariate trees; however, the use of linear combinations of the variables may result in trees that are hard to interpret. In this paper we consider trees with tests based on combinations of at most two variables. We show that bivariate decision trees are an interesting alternative to both uni- and multivariate trees, especially with respect to ease of interpretation.
Journal of Discrete Mathematics, 2013
This paper proposes a new proof of Dilworth's theorem. The proof is based upon the min-flow/max-cut property in flow networks. In relation to this proof, a new method to find both a Dilworth decomposition and a maximal antichain is presented.
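Dilworth's theorem states that the minimum number of chains covering a finite poset equals the size of a maximum antichain. A standard computational route, closely related to but not identical with the paper's min-flow/max-cut construction, reduces the minimum chain cover to maximum bipartite matching (minimum path cover = n minus maximum matching); the sketch below illustrates that reduction under the assumption that `less` is the full, transitive comparability relation.

```python
def min_chain_cover(n, less):
    """Minimum number of chains covering a poset on elements 0..n-1,
    where less(u, v) is the (transitive) strict order relation.
    Computed via Hungarian-style augmenting paths on the bipartite
    graph of comparable pairs: answer = n - max matching."""
    match_to = [-1] * n                  # match_to[v]: left vertex matched to v

    def augment(u, seen):
        for v in range(n):
            if less(u, v) and not seen[v]:
                seen[v] = True
                if match_to[v] == -1 or augment(match_to[v], seen):
                    match_to[v] = u
                    return True
        return False

    matching = sum(augment(u, [False] * n) for u in range(n))
    return n - matching
```

By Dilworth's theorem the returned value also equals the size of a maximum antichain, which gives a quick cross-check on small examples.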
Artificial Intelligence Research, 2013
Ordinal decision problems are very common in real life. As a result, ordinal classification models have drawn much attention in recent years. Many ordinal problem domains assume that the output is monotonically related to the input, and some ordinal data mining models ensure this property while classifying. However, no one has ever reported how accurate these models are in the presence of varying levels of non-monotone noise. To do so, researchers need an easy-to-use tool for generating artificial ordinal datasets which contain both an arbitrary monotone pattern and user-specified levels of non-monotone noise. An algorithm that generates such datasets is presented here in detail for the first time. Two versions of the algorithm are discussed. The first is more time consuming: it generates purely monotone datasets as the base of the computation, after which non-monotone noise is incrementally inserted into the dataset. The second version is basically similar, but significantly faster: it begins with the generation of almost monotone datasets before introducing the noise. Theoretical and empirical studies of the two versions are provided, showing that the second, faster, algorithm is sufficient for almost all practical applications. Some useful information about the two algorithms and suggestions for further research are also discussed.
This paper proposes a new algorithm for target selection. This algorithm collects all frequent patterns (equivalent to frequent item sets) in a training set. These patterns are stored efficiently using a compact data structure called a trie. For each pattern the relative frequency of the target class is determined. Target selection is achieved by matching the candidate records with the patterns in the trie. A score for each record results from this matching process, based upon the frequency values in the trie. The records with the best score values are selected. We have applied the new algorithm to a large data set containing the results of a number of mailing campaigns by a Dutch charity organization. Our algorithm turns out to be competitive with logistic regression and superior to CHAID.
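The trie-based storage and scoring described above can be sketched roughly as follows. This is an illustration only: the class name, the field names, and the scoring rule (summing each matched pattern's relative target frequency) are assumptions, not the paper's exact design.

```python
class PatternTrie:
    """Trie over sorted item patterns, each leaf-of-path node recording
    how often the pattern occurred and how often with the target class."""
    def __init__(self):
        self.children = {}
        self.support = 0      # times this exact pattern was inserted
        self.hits = 0         # of those, times the record was target-class

    def insert(self, pattern, is_target):
        node = self
        for item in sorted(pattern):
            node = node.children.setdefault(item, PatternTrie())
        node.support += 1
        node.hits += int(is_target)

    def score(self, record):
        """Sum relative target frequencies over all stored patterns
        contained in the record, by walking the trie along subsequences
        of the sorted record items."""
        items = sorted(record)
        total = 0.0

        def walk(node, start):
            nonlocal total
            if node.support:
                total += node.hits / node.support
            for i in range(start, len(items)):
                child = node.children.get(items[i])
                if child is not None:
                    walk(child, i + 1)

        walk(self, 0)
        return total
```

Sorting items on insertion and lookup means each pattern has a single canonical path, so matching a candidate record only explores subsequences of its sorted items rather than all subsets of the trie.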