Szymon Jaroszewicz - Academia.edu

Papers by Szymon Jaroszewicz

Support Vector Machines for Uplift Modeling

2013 IEEE 13th International Conference on Data Mining Workshops, 2013

Uplift modeling is a branch of machine learning which aims to predict not the class itself, but the difference between the class variable behavior in two groups: treatment and control. Objects in the treatment group have been subject to some action, while objects in the control group have not. By including the control group it is possible to build a model which predicts the causal effect of the action for a given individual. In this paper we present a variant of Support Vector Machines designed specifically for uplift modeling. The SVM optimization task has been reformulated to explicitly model the difference in class behavior between two datasets. The model predicts whether a given object will have a positive, neutral or negative response to a given action, and by tuning a parameter of the model the analyst is able to influence the relative proportion of neutral predictions and thus the sensitivity of the model. We adapt the dual coordinate descent method to efficiently solve our optimization task. Finally, the proposed method is compared experimentally with other uplift modeling approaches.
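
The idea of predicting a difference between treatment and control, rather than the class itself, can be illustrated with a minimal sketch. It is not the SVM formulation from the paper: it simply estimates uplift per segment as the difference in response rates between the two groups. The function names, the record layout, and the `threshold` parameter (a stand-in for the model parameter that controls how many predictions come out neutral) are all assumptions made for illustration.

```python
from collections import defaultdict


def segment_uplift(records):
    """Estimate uplift per segment as P(response | treatment) - P(response | control).

    `records` is an iterable of (segment, treated, responded) tuples.
    The layout and the segment-based estimator are illustrative assumptions,
    not the method described in the abstract.
    """
    # counts[segment][treated] = [number of responders, number of objects]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(responded)
        cell[1] += 1

    uplift = {}
    for segment, groups in counts.items():
        (t_pos, t_n), (c_pos, c_n) = groups[True], groups[False]
        if t_n == 0 or c_n == 0:
            continue  # uplift is undefined unless both groups are present
        uplift[segment] = t_pos / t_n - c_pos / c_n
    return uplift


def classify(uplift_value, threshold=0.1):
    """Map an uplift estimate to a positive, neutral or negative response.

    A larger (hypothetical) threshold yields more neutral predictions.
    """
    if uplift_value > threshold:
        return "positive"
    if uplift_value < -threshold:
        return "negative"
    return "neutral"
```

Raising `threshold` widens the band of estimates labeled neutral, which loosely mirrors the sensitivity trade-off mentioned in the abstract.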

Accurate Schema Matching on Streams

We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing of all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses

PaCAL: A Python Package for Arithmetic Computations with Random Variables

Journal of Statistical Software, 2014

An Inclusion-Exclusion Result for Boolean Polynomials and Its Applications in Data Mining

The Goodman–Kruskal Coefficient and Its Applications in Genetic Diagnosis of Cancer

Decision Trees for Uplift Modeling

Proceedings of the 2010 IEEE International Conference on Data Mining, 2010

Verifying social network models of Wikipedia knowledge community

Information Sciences, 2016

Measures on Boolean polynomials and their applications in data mining

Discrete Applied Mathematics, Nov 30, 2004

Approximating Representations for Large Numerical Databases

Schema matching on streams with accuracy guarantees

Intelligent Data Analysis, 2008

Uplift Modeling in Direct Marketing

Journal of Telecommunications and Information Technology, 2012

Interestingness of frequent itemsets using Bayesian networks as background knowledge

Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04, 2004

Fast discovery of unexpected patterns in data, relative to a Bayesian network

Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05, 2005

A Metric Approach to Building Decision Trees Based on Goodman-Kruskal Association Index

Lecture Notes in Computer Science, 2004

On axiomatization of conditional entropy of functions between finite sets

Proceedings 1999 29th IEEE International Symposium on Multiple-Valued Logic (Cat. No.99CB36329), 1999

In this paper we present a new axiomatization of the notion of entropy of functions between finite sets and we introduce and axiomatize the notion of conditional entropy between functions. The results can be directly applied to logic functions, which can be regarded as functions between finite sets. Our axiomatizations are based on properties of entropy with regard to operations commonly applied to discrete functions and are related to the usage of entropy as a measure of the energy dissipated by circuits that implement discrete functions.

An axiomatization of generalized entropy of partitions

Proceedings 31st IEEE International Symposium on Multiple-Valued Logic, 2001

Data mining of weak functional decompositions

Proceedings 30th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2000), 2000

A weak decomposition of an incompletely specified function f is a decomposition of some completion of f. Using a graph-theoretical characterization of functions that admit such decompositions, we present a technique derived from the Apriori algorithm that allows a data mining approach to identifying these decompositions.

On functions defined on free Boolean algebras

Proceedings 32nd IEEE International Symposium on Multiple-Valued Logic, 2002

Mining interesting rules and patterns for salt sensitivity of blood pressure

Given a medical data set containing genetic descriptions of sodium-sensitive and non-sensitive patients, we examine it using several techniques: induction of decision rules, a naive Bayes classifier, a voting perceptron classifier, decision trees, and an SVM classifier. We specifically focus on induction of decision rules and so-called Pareto-optimal rules, which are of large interpretative value for physicians. We find statistically relevant combinations of attributes which affect sodium sensitivity.
