Szymon Jaroszewicz - Academia.edu
Papers by Szymon Jaroszewicz
2013 IEEE 13th International Conference on Data Mining Workshops, 2013
ABSTRACT Uplift modeling is a branch of machine learning which aims to predict not the class itself, but the difference between the class variable behavior in two groups: treatment and control. Objects in the treatment group have been subject to some action, while objects in the control group have not. By including the control group, it is possible to build a model which predicts the causal effect of the action for a given individual. In this paper we present a variant of Support Vector Machines designed specifically for uplift modeling. The SVM optimization task has been reformulated to explicitly model the difference in class behavior between two datasets. The model predicts whether a given object will have a positive, neutral or negative response to a given action, and by tuning a parameter of the model the analyst is able to influence the relative proportion of neutral predictions and thus the sensitivity of the model. We adapt the dual coordinate descent method to efficiently solve our optimization task. Finally, the proposed method is compared experimentally with other uplift modeling approaches.
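The idea of predicting a difference between treatment and control, rather than the class itself, can be illustrated with a minimal sketch. This is not the uplift SVM from the abstract: it simply estimates, per segment, the response rate in the treatment group minus the response rate in the control group, and then labels each segment positive, neutral or negative. The names `segment_uplift`, `label` and `margin`, as well as the toy records, are hypothetical and chosen only for illustration; `margin` loosely plays the role of the parameter that controls how many predictions come out neutral.

```python
from collections import defaultdict


def segment_uplift(records):
    """Estimate uplift per segment from (segment, treated, response) records.

    response is 0 or 1; uplift is the treatment response rate minus the
    control response rate. This is a naive illustration, not the uplift SVM.
    """
    # counts[segment][treated] = [number of responders, number of objects]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, response in records:
        cell = counts[segment][bool(treated)]
        cell[0] += response
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_pos, t_n), (c_pos, c_n) = groups[True], groups[False]
        if t_n == 0 or c_n == 0:
            continue  # uplift is undefined unless both groups are observed
        uplift[segment] = t_pos / t_n - c_pos / c_n
    return uplift


def label(u, margin=0.1):
    """Map an uplift estimate to positive / neutral / negative.

    A larger margin (a hypothetical tuning parameter) yields more neutral labels.
    """
    if u > margin:
        return "positive"
    if u < -margin:
        return "negative"
    return "neutral"


# Toy data, invented for illustration only.
records = [
    ("young", True, 1), ("young", True, 1),
    ("young", False, 0), ("young", False, 1),
    ("old", True, 0), ("old", True, 1),
    ("old", False, 1), ("old", False, 1),
]
uplift = segment_uplift(records)
labels = {segment: label(u) for segment, u in uplift.items()}
```

On the toy data, the "young" segment responds better when treated and the "old" segment responds worse, so the two segments receive opposite labels. Raising `margin` widens the neutral band.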
We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing of all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses
Journal of Statistical Software, 2014
Proceedings of the 2010 IEEE International Conference on Data Mining, 2010
Information Sciences, 2016
Discrete Applied Mathematics, Nov 30, 2004
Intelligent Data Analysis, 2008
Journal of Telecommunications and Information Technology, 2012
Journal of Statistical Software, 2014
Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04, 2004
Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05, 2005
Lecture Notes in Computer Science, 2004
Proceedings 1999 29th IEEE International Symposium on Multiple-Valued Logic (Cat. No.99CB36329), 1999
ABSTRACT In this paper we present a new axiomatization of the notion of entropy of functions between finite sets and we introduce and axiomatize the notion of conditional entropy between functions. The results can be directly applied to logic functions, which can be regarded as functions between finite sets. Our axiomatizations are based on properties of entropy with regard to operations commonly applied to discrete functions and are related to the usage of entropy as a measure of the energy dissipated by circuits that implement discrete functions.
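As a rough illustration of what an entropy of a function between finite sets can look like, the sketch below computes the Shannon entropy of the output values of a function when every input in the domain is equally likely, and a conditional entropy via H(f | g) = H(f, g) - H(g). Whether this coincides with the notion axiomatized in the paper is an assumption; the function names and the two-input logic functions are hypothetical examples.

```python
from collections import Counter
from math import log2


def function_entropy(f, domain):
    """Shannon entropy (in bits) of f's output over a uniformly weighted finite domain.

    Assumption: entropy of a function is taken as the entropy of the
    partition of the domain into preimages of f.
    """
    counts = Counter(f(x) for x in domain)
    n = sum(counts.values())
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(f, g, domain):
    """H(f | g) = H(f, g) - H(g), where (f, g) maps x to the pair (f(x), g(x))."""
    joint = function_entropy(lambda x: (f(x), g(x)), domain)
    return joint - function_entropy(g, domain)


# All inputs of a two-variable logic function.
domain = [(a, b) for a in (0, 1) for b in (0, 1)]


def and_gate(x):
    return x[0] & x[1]


def xor_gate(x):
    return x[0] ^ x[1]


def first_input(x):
    return x[0]


h_and = function_entropy(and_gate, domain)
h_xor = function_entropy(xor_gate, domain)
h_xor_given_first = conditional_entropy(xor_gate, first_input, domain)
h_and_given_first = conditional_entropy(and_gate, first_input, domain)
```

Under these assumptions, XOR has a balanced output and so a higher entropy than AND, and knowing the first input removes part of the uncertainty about the AND output but none of the uncertainty about the XOR output.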
Proceedings 31st IEEE International Symposium on Multiple-Valued Logic, 2001
Proceedings 30th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2000), 2000
ABSTRACT A weak decomposition of an incompletely specified function f is a decomposition of some completion of f. Using a graph-theoretical characterization of functions that admit such decompositions, we present a technique derived from the Apriori algorithm that allows a data mining approach to identifying these decompositions.
Proceedings 32nd IEEE International Symposium on Multiple-Valued Logic, 2002
Given a medical data set containing genetic descriptions of sodium-sensitive and non-sensitive patients, we examine it using several techniques: induction of decision rules, naive Bayes classifier, voting perceptron classifier, decision trees, and SVM classifier. We specifically focus on induction of decision rules and so-called Pareto-optimal rules, which are of great interpretative value for physicians. We find statistically relevant combinations of attributes that affect sodium sensitivity.