Bassem Sayrafi | Birzeit University
Papers by Bassem Sayrafi
We establish a link between measures and certain types of inference systems, and we illustrate this connection with examples that occur in computing applications, especially in the areas of databases and data mining.
Proceedings of the 2005 ACM symposium on …, Jan 1, 2005
We study the problem of one-dimensional selectivity estimation in relational databases. We introduce a new type of histogram based on information theory. We compare our histogram against a large number of other techniques on a wide array of datasets. We observe that our histograms have the best overall accuracy on the real datasets. We also observe that the accuracy ranking of all methods varies significantly across datasets. As such, our results are not consistent with several conclusions drawn in past literature. Thus, we believe a gap exists in the past accuracy characterization.
Proceedings of the 1st …, Jan 1, 2005
We study the relative effectiveness and the efficiency of computing support-bounding rules that can be used to prune the search space in algorithms that solve the frequent item-sets mining problem (FIM). We develop a formalism wherein these rules can be stated and analyzed using the concept of differentials and density functions of the support function. We derive a general bounding theorem, which provides lower and upper bounds on the supports of item-sets in terms of the supports of their subsets. Since, in general, many lower and upper bounds exist for the support of an item-set, we show how to determine the best bounds. The result of this optimization shows that the best bounds are among those that involve the supports of all the strict subsets of an item-set of a particular size q. These bounds are determined on the basis of so-called q-rules. In this way, we derive the bounding theorem established by Calders. For these types of bounds, we consider how they compare relative to each other, and in so doing determine the best bounds. Since determining these bounds is combinatorially expensive, we study heuristics that efficiently produce bounds that are usually the best. These heuristics always produce the best bounds on the support of item-sets for basket databases that satisfy independence properties. In particular, we show that for an item-set I, determining which bounds to compute to obtain the best lower and upper bounds on freq(I) can be done in time O(|I|). Even though, in practice, basket databases do not have these independence properties, we argue that our analysis carries over to a much larger set of basket databases where local "near" independence holds. Finally, we conduct an experimental study using real basket databases, where we compute upper bounds in the context of generalizing the Apriori algorithm. Both the analysis and the study confirm that the q-rule (q odd and larger than 1) will almost always do better than the 1-rule (Apriori rule) on large, dense basket databases. Our experiment re...
* The first two authors were supported by NSF Grant IIS-0082407.
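The abstract above does not state the bounds themselves, so the following is only a rough sketch of the idea, not the authors' formulation. It assumes the familiar Apriori monotonicity bound for the 1-rule and, for the general case, the inclusion-exclusion bounds known from the literature on non-derivable item-sets; reading "q" as the number of items of I left out of the anchor set X is an assumption made here for illustration. The names `DB`, `support`, `apriori_upper_bound`, and `inclusion_exclusion_bound` are hypothetical.

```python
from itertools import combinations

# Toy basket database (hypothetical data, for illustration only).
DB = [frozenset("abc"), frozenset("ab"), frozenset("ac"),
      frozenset("bc"), frozenset("a")]


def support(db, itemset):
    """Number of baskets that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for basket in db if s <= basket)


def apriori_upper_bound(db, itemset):
    """Apriori (1-rule) bound: supp(I) <= min over i in I of supp(I - {i})."""
    items = frozenset(itemset)
    return min(support(db, items - {i}) for i in items)


def inclusion_exclusion_bound(db, itemset, x):
    """Bound on supp(I) from the supports of all J with X <= J < I.

    Assumes X is a strict subset of I. Returns ("upper", value) when
    |I - X| is odd and ("lower", value) when it is even.
    """
    items, x = frozenset(itemset), frozenset(x)
    free = sorted(items - x)
    total = 0
    for k in range(len(free)):  # k < len(free) keeps J a strict subset of I
        for extra in combinations(free, k):
            j = x | frozenset(extra)
            sign = (-1) ** (len(items - j) + 1)
            total += sign * support(db, j)
    kind = "upper" if len(free) % 2 == 1 else "lower"
    return kind, total


print(apriori_upper_bound(DB, "abc"))
print(inclusion_exclusion_bound(DB, "abc", ""))
print(inclusion_exclusion_bound(DB, "abc", "a"))
```

On this toy database the bound that uses all strict subsets of {a, b, c} is tighter than the Apriori bound, which is the kind of comparison the abstract describes; the example says nothing about how the bounds behave on real data.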
We propose a mathematical framework for measures to study constraints and bounds on measurements of data used in the domains of databases and data mining. This framework is significant in that it facilitates the use of two tools for knowledge extraction, ...
Information Systems, Jan 1, 2008
We study the implication problem of measure-based constraints. These constraints are formulated in a framework for measures generalizing that for mathematical measures. Measures arise naturally in a wide variety of domains. We show that measure constraints, for particular measures, correspond to constraints that occur in relational databases, data mining applications, cooperative game theory, and in the Dempster-Shafer and possibility theories of reasoning about uncertainty. We prove that the implication problem for measure constraints is in general decidable. We introduce inference systems for particular classes of measure constraints and show that some of these are complete, yielding tractability for the corresponding implication problem.
Workshop on Causality and Causal Discovery, London, …, Jan 1, 2004
Proceedings of the twenty-fourth ACM …, Jan 1, 2005
Differential constraints are a class of finite difference equations specified over functions from the powerset of a finite set into the reals. We characterize the implication problem for such constraints in terms of lattice decompositions, and give a sound and complete set of inference rules. We relate differential constraints to a subclass of propositional logic formulas, allowing us to show that the implication problem is coNP-complete. Furthermore, we apply the theory of differential constraints to the problem of concise representations in the frequent itemset problem by linking differential constraints to disjunctive rules. We also establish a connection to relational databases by associating differential constraints with positive boolean dependencies.
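As a hedged illustration of how a finite difference over the powerset can express a disjunctive rule, the sketch below evaluates the alternating sum of f(X ∪ W) over all subsets W of a set of items Y, with f taken to be the support function of a toy basket database. This restricted form (single items in Y) and the names `DB`, `support`, and `finite_difference` are assumptions made here; the paper's own definition may be more general. By inclusion-exclusion, the sum counts the baskets that contain X but none of the items in Y, so it is zero exactly when every basket containing X contains at least one item of Y.

```python
from itertools import combinations

# Toy basket database (hypothetical data, for illustration only).
DB = [frozenset("ab"), frozenset("ac"), frozenset("abc"), frozenset("b")]


def support(db, itemset):
    """Number of baskets that contain every item of `itemset`."""
    s = frozenset(itemset)
    return sum(1 for basket in db if s <= basket)


def finite_difference(f, x, ys):
    """Alternating sum of f(X | W) over all subsets W of Y - X.

    For the support function this equals the number of baskets that
    contain X and none of the items in Y.
    """
    x = frozenset(x)
    ys = sorted(set(ys) - x)
    total = 0
    for k in range(len(ys) + 1):
        for w in combinations(ys, k):
            total += (-1) ** k * f(x | frozenset(w))
    return total


def supp(itemset):
    return support(DB, itemset)


# Zero: every basket containing 'a' also contains 'b' or 'c'.
print(finite_difference(supp, "a", "bc"))
# Nonzero: the basket {b} contains 'b' but not 'a'.
print(finite_difference(supp, "b", "a"))
```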