Szymon Jaroszewicz - Academia.edu

Papers by Szymon Jaroszewicz

2013 IEEE 13th International Conference on Data Mining Workshops, 2013

ABSTRACT Uplift modeling is a branch of Machine Learning which aims to predict not the class itself, but the difference between the class variable behavior in two groups: treatment and control. Objects in the treatment group have been subject to some action, while objects in the control group have not. By including the control group, it is possible to build a model which predicts the causal effect of the action for a given individual. In this paper we present a variant of Support Vector Machines designed specifically for uplift modeling. The SVM optimization task has been reformulated to explicitly model the difference in class behavior between two datasets. The model predicts whether a given object will have a positive, neutral or negative response to a given action, and by tuning a parameter of the model the analyst is able to influence the relative proportion of neutral predictions and thus the sensitivity of the model. We adapt the dual coordinate descent method to efficiently solve our optimization task. Finally, the proposed method is compared experimentally with other uplift modeling approaches.
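To make the treatment/control idea concrete, the following is a minimal sketch of uplift estimation. It is not the SVM formulation from the paper: it swaps in the simplest possible estimator, the difference of per-segment response rates between the two groups. The function names, the toy data, and the `margin` threshold (a hypothetical stand-in for the parameter that controls the share of neutral predictions) are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_response_rates(rows):
    """Return {segment: fraction of positive outcomes} for (segment, outcome) rows."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [positives, total]
    for segment, outcome in rows:
        counts[segment][0] += outcome
        counts[segment][1] += 1
    return {seg: pos / total for seg, (pos, total) in counts.items()}


def estimate_uplift(treatment_rows, control_rows):
    """Uplift per segment: treated response rate minus control response rate."""
    treated = fit_response_rates(treatment_rows)
    control = fit_response_rates(control_rows)
    # Only segments observed in both groups can be compared.
    return {seg: treated[seg] - control[seg] for seg in treated.keys() & control.keys()}


def classify(uplift, margin):
    """Map an uplift estimate to a positive / neutral / negative prediction.

    `margin` is a hypothetical threshold: a larger value yields more
    neutral predictions, a smaller one makes the rule more sensitive.
    """
    if uplift > margin:
        return "positive"
    if uplift < -margin:
        return "negative"
    return "neutral"


# Toy data (invented): (segment, outcome) pairs, outcome 1 = responded.
treatment = [("young", 1), ("young", 1), ("young", 0), ("young", 1),
             ("old", 0), ("old", 0), ("old", 1), ("old", 0)]
control = [("young", 0), ("young", 1), ("young", 0), ("young", 0),
           ("old", 1), ("old", 1), ("old", 0), ("old", 0)]

uplift = estimate_uplift(treatment, control)
for segment in sorted(uplift):
    print(segment, uplift[segment], classify(uplift[segment], margin=0.3))
```

With `margin=0.3` the "old" segment is predicted neutral; lowering the margin to 0.1 turns it into a negative prediction, which illustrates how a single parameter can trade neutral predictions for sensitivity.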

We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, exact calculation of these similarities requires processing of all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses

Journal of Statistical Software, 2014

Proceedings of the 2010 IEEE International Conference on Data Mining, 2010

Information Sciences, 2016

Discrete Applied Mathematics, Nov 30, 2004

Intelligent Data Analysis, 2008

Journal of Telecommunications and Information Technology, 2012

Journal of Statistical Software, 2014

Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04, 2004

Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining - KDD '05, 2005

Lecture Notes in Computer Science, 2004

Proceedings 1999 29th IEEE International Symposium on Multiple-Valued Logic (Cat. No.99CB36329), 1999

ABSTRACT In this paper we present a new axiomatization of the notion of entropy of functions between finite sets and we introduce and axiomatize the notion of conditional entropy between functions. The results can be directly applied to logic functions, which can be regarded as functions between finite sets. Our axiomatizations are based on properties of entropy with regard to operations commonly applied to discrete functions and are related to the usage of entropy as a measure of the energy dissipated by circuits that implement discrete functions.
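As a rough illustration of what the entropy of a function between finite sets can look like, the sketch below uses one common reading: the Shannon entropy of the partition of the domain into preimages of the function's values, with every input weighted equally. This is an assumption made for illustration; the axiomatization in the paper may characterize a more general family of entropies. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def function_entropy(domain, f):
    """Shannon entropy (in bits) of the partition of `domain` induced by f.

    Each block of the partition is the preimage of one output value,
    and every element of the domain is weighted equally.
    """
    counts = Counter(f(x) for x in domain)
    n = len(domain)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(domain, f, g):
    """H(f | g) computed as H(f, g) - H(g), where (f, g) maps x to the pair (f(x), g(x))."""
    joint = function_entropy(domain, lambda x: (f(x), g(x)))
    return joint - function_entropy(domain, g)


# Two-input logic functions viewed as functions between finite sets.
inputs = [(a, b) for a in (0, 1) for b in (0, 1)]


def and_gate(x):
    return x[0] & x[1]


def xor_gate(x):
    return x[0] ^ x[1]


def first_input(x):
    return x[0]


print(function_entropy(inputs, and_gate))
print(function_entropy(inputs, xor_gate))
print(conditional_entropy(inputs, and_gate, first_input))
```

Under this reading, XOR splits the four inputs evenly and so has entropy of one bit, while AND, whose output is 0 for three of the four inputs, has lower entropy.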

Proceedings 31st IEEE International Symposium on Multiple-Valued Logic, 2001

Proceedings 30th IEEE International Symposium on Multiple-Valued Logic (ISMVL 2000), 2000

ABSTRACT A weak decomposition of an incompletely specified function f is a decomposition of some completion of f. Using a graph-theoretical characterization of functions that admit such decompositions, we present a technique derived from the Apriori algorithm that allows a data mining approach to identifying these decompositions.

Proceedings 32nd IEEE International Symposium on Multiple-Valued Logic, 2002

Given a medical data set containing genetic descriptions of sodium-sensitive and non-sensitive patients, we examine it using several techniques: induction of decision rules, naive Bayes classifier, voting perceptron classifier, decision trees, and SVM classifier. We specifically focus on induction of decision rules and so-called Pareto-optimal rules, which are of large interpretative value for physicians. We find statistically relevant combinations of attributes that affect sodium sensitivity.
