Parallel bottom-up processing of datalog queries

Foundations and Trends® in Databases, 2012

In recent years, we have witnessed a revival of the use of recursive queries in a variety of emerging application domains such as data integration and exchange, information extraction, networking, and program analysis. A popular language used for expressing these queries is Datalog. This paper surveys for a general audience the Datalog language, recursive query processing, and optimization techniques. This survey differs from prior surveys written in the eighties and nineties in its comprehensiveness of topics, its coverage of recent developments and applications, and its emphasis on features and techniques beyond "classical" Datalog which are vital for practical applications. Specifically, the topics covered include the core Datalog language and various extensions, semantics, query optimizations, magic-sets optimizations, incremental view maintenance, aggregates, negation, and types. We conclude the paper with a survey of recent systems and applications that use Datalog and recursive queries.

On Decompositions of Chain Datalog Programs into

1995

As an approach to optimization, this paper examines the decomposition of chain Datalog programs into P (left-)linear sequences of 1-rule programs. The notion of P (left-)linear, introduced here, encompasses numerous special (left-)linear forms and includes the traditional (left-)linear as a subcase. The decompositions are first characterized in terms of properties of associated context-free languages. More specific characterizations are provided for three types of P (left-)linear decompositions with 1-rule components, and the corresponding decision problems are considered. Finally, arbitrarily large, inherently nondecomposable, P-linear size-prime programs are exhibited.

Extending the power of datalog recursion

The VLDB Journal, 2012

Supporting aggregates in recursive logic rules represents a very important problem for Datalog. To solve this problem, we propose a simple extension, called Datalog^FS (Datalog extended with frequency support goals), that supports queries and reasoning about the number of distinct variable assignments satisfying given goals, or conjunctions of goals, in rules. This monotonic extension greatly enhances the power of Datalog, while preserving (i) its declarative semantics and (ii) its amenability to efficient implementation via differential fixpoint and other optimization techniques presented in the paper. Thus, Datalog^FS enables the efficient formulation of queries that could not be expressed efficiently or could not be expressed at all in Datalog with stratified negation and aggregates. In fact, using a generalized notion of multiplicity called frequency, we show that diffusion models and PageRank computations can be easily expressed and efficiently implemented using Datalog^FS.

Linearisability on datalog programs

Theoretical Computer Science, 2003

Linear Datalog programs are programs whose clauses have at most one intensional atom in their bodies. We explore syntactic classes of Datalog programs (syntactically non-linear) which turn out to express no more than the queries expressed by linear Datalog programs. In particular, we investigate linearisability of (database queries corresponding to) piecewise linear Datalog programs and chain queries: a) We prove that piecewise linear Datalog programs can always be transformed into linear Datalog programs, by virtue of a procedure which performs the transformation automatically. The procedure relies upon conventional logic program transformation techniques. b) We identify a new class of linearisable chain queries, referred to as pseudo-regular, and prove their linearisability constructively, by generating, for any given pseudo-regular chain query, the Datalog program corresponding to it.
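The definition of linearity in this abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the `Atom` and `Rule` types, the helper names, and the `edge`/`path` example programs are hypothetical, and the sketch assumes that a predicate counts as intensional exactly when it appears in the head of some rule (facts are assumed to be stored separately).

```python
from typing import List, NamedTuple, Set, Tuple


class Atom(NamedTuple):
    predicate: str
    args: Tuple[str, ...]


class Rule(NamedTuple):
    head: Atom
    body: Tuple[Atom, ...]


def intensional_predicates(program: List[Rule]) -> Set[str]:
    # Assumption: a predicate is intensional iff some rule defines it.
    return {rule.head.predicate for rule in program}


def is_linear(program: List[Rule]) -> bool:
    # Linear: every rule body contains at most one intensional atom.
    idb = intensional_predicates(program)
    return all(
        sum(atom.predicate in idb for atom in rule.body) <= 1
        for rule in program
    )


# Hypothetical example: two formulations of transitive closure.
base = Rule(Atom("path", ("X", "Y")), (Atom("edge", ("X", "Y")),))

# path(X,Z) :- edge(X,Y), path(Y,Z).   one intensional body atom
linear_tc = [
    base,
    Rule(
        Atom("path", ("X", "Z")),
        (Atom("edge", ("X", "Y")), Atom("path", ("Y", "Z"))),
    ),
]

# path(X,Z) :- path(X,Y), path(Y,Z).   two intensional body atoms
nonlinear_tc = [
    base,
    Rule(
        Atom("path", ("X", "Z")),
        (Atom("path", ("X", "Y")), Atom("path", ("Y", "Z"))),
    ),
]
```

Under these assumptions, `is_linear(linear_tc)` holds and `is_linear(nonlinear_tc)` does not, although both programs compute the same `path` relation. This is the sense in which a syntactically non-linear program can express a query that a linear program also expresses.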

Efficiently computable datalog^E programs

Datalog∃ is the extension of Datalog that allows existentially quantified variables in rule heads. This language is highly expressive and enables easy and powerful knowledge modeling, but the presence of existentially quantified variables makes reasoning over Datalog∃ undecidable in the general case. The results in this paper enable powerful, yet decidable and efficient reasoning (query answering) on top of Datalog∃ programs. On the theoretical side, we define the class of parsimonious Datalog∃ programs, and show that it allows decidable and efficiently computable reasoning. Unfortunately, we can demonstrate that recognizing parsimony is undecidable. However, we single out Shy, an easily recognizable fragment of parsimonious programs, that significantly extends both Datalog and Linear-Datalog∃, while preserving the same (data and combined) complexity of query answering as Datalog, despite the addition of existential quantifiers. On the practical side, we implement a bottom-up evaluation strategy for Shy programs inside the DLV system, enhancing the computation with a number of optimization techniques to obtain DLV∃, a powerful system for answering conjunctive queries over Shy programs, which is profitably applicable to ontology-based query answering. Moreover, we carry out an experimental analysis, comparing DLV∃ against a number of state-of-the-art systems for ontology-based query answering. The results confirm the effectiveness of DLV∃, which outperforms all other systems in the benchmark domain.

Compiling data-parallel Datalog

Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction, 2021

Datalog allows intuitive declarative specification of logical inference tasks while enjoying efficient implementation via state-of-the-art engines such as LogicBlox and Soufflé. These engines enable high-performance implementation of complex logical tasks including graph mining, program analysis, and business analytics. However, all efficient modern Datalog solvers make use of shared memory, and present inherent challenges to scalability. In this paper we leverage recent insights in parallel relational algebra and present a methodology for constructing data-parallel deductive databases. Our approach leverages recent developments in parallelizing relational algebra to create an efficient data-parallel semantics for Datalog. Based on our methodology, we have implemented the first MPI-based data-parallel Datalog solver. Our experiments demonstrate comparable performance and improved single-node scalability versus Soufflé, a state-of-the-art solver.

Circumscribing DATALOG: Expressive Power and Complexity

Theoretical Computer Science, 1998

In this paper we study a generalization of DATALOG, the language of function-free definite clauses. It is known that standard DATALOG semantics (i.e., least Herbrand model semantics) can be obtained by regarding programs as theories to be circumscribed with all predicates to be minimized. The extension proposed here, called DATALOG^CIRC, consists in considering the general form of circumscription, where some predicates are minimized, some predicates are fixed, and some vary. We study the complexity and the expressive power of the language thus obtained. We show that this language (and, actually, its non-recursive fragment) is capable of expressing all the queries in DB-co-NP and, as such, is much more powerful than standard DATALOG, whose expressive power is limited to a strict subset of PTIME queries. Both data and combined complexities of answering DATALOG^CIRC queries are studied. Data complexity is proved to be co-NP-complete. Combined complexity is shown to be in general hard for co-NE and complete for co-NE in the case of Herbrand bases containing k distinct constant symbols, where k is bounded.

Datalog and emerging applications

Proceedings of the 2011 international conference on Management of data - SIGMOD '11, 2011

We are witnessing an exciting revival of interest in recursive Datalog queries in a variety of emerging application domains such as data integration, information extraction, networking, program analysis, security, and cloud computing. This tutorial briefly reviews the Datalog language and recursive query processing and optimization techniques, then discusses applications of Datalog in three application domains: data integration, declarative networking, and program analysis. Throughout the tutorial, we use LogicBlox, a commercial Datalog engine for enterprise software systems, to allow the audience to walk through code examples presented in the tutorial.

Related papers