Clemens Ley - Profile on Academia.edu

Papers by Clemens Ley

Forward looking logics and automata

This thesis is concerned with extending properties of regular word languages to richer structures. We consider intricate properties like the relationship between one-way and two-way temporal logics, minimization of automata, and the ability to effectively characterize logics. We investigate whether these properties can be extended to tree languages or word languages over an infinite alphabet. It is known that linear temporal logic (LTL) is as expressive as first-order logic over finite words [Kam68, GPSS80]. LTL is a unidirectional logic, that can only navigate forwards in a word, hence it is quite surprising that it can capture all of first-order logic. In fact, one of the main ideas of the proof of [GPSS80] is to show that the expressiveness of LTL is not increased if modalities for navigating backwards are added. It is also known that an extension of bidirectional LTL to ordered trees, called Conditional XPath, is first-order complete [Mar04]. We investigate whether the unidirectional fragment of Conditional XPath is also first-order complete. We show that this is not the case. In fact we show that there is a strict hierarchy of expressiveness consisting of languages that are all weaker than first-order logic. Unidirectional Conditional XPath is contained in the lowest level of this hierarchy. In the second part of the thesis we consider data word languages. That is, word languages over an infinite alphabet. We extend the theorem of Myhill and Nerode to a class of automata for data word languages, called deterministic finite memory automata (DMA). We give a characterization of the languages that are accepted by DMA, and also provide an algorithm for minimizing DMA. Finally we extend theorems of Büchi, Schützenberger, McNaughton, and Papert to data word languages.
A theorem of Büchi states that a language is regular iff it can be defined in monadic second-order logic. Schützenberger, McNaughton, and Papert have provided an effective characterization of first-order logic, that is, an algorithm for deciding whether a regular language can be defined in first-order logic. We provide a counterpart of Büchi’s theorem for data languages. More precisely we define a new logic and we show that it has the same expressiveness as non-deterministic finite memory automata. We then turn to a smaller class of data languages, those that are recognized by algebraic objects called orbit finite data monoids. We define a second new logic and show that it can define precisely the languages accepted by orbit finite data monoids. We provide an effective characterization of a first-order variant of this second logic, as well as of restrictions of first-order logic, such as its two variable fragment and local variants.
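To make the notion of a unidirectional logic concrete, here is a minimal sketch of future-only LTL evaluated over a finite word. The nested-tuple encoding of formulas and the function name `holds` are assumptions made for this illustration and do not come from the thesis. The evaluator only ever inspects positions at or after the current one, which is what "navigating forwards" means here.

```python
from typing import Sequence, Set, Tuple, Union

# Hypothetical encoding: an atomic proposition is a string, and a compound
# formula is a nested tuple such as ("until", f, g).
Formula = Union[str, Tuple]


def holds(formula: Formula, word: Sequence[Set[str]], i: int = 0) -> bool:
    """Evaluate a future-only LTL formula at position i of a non-empty finite word."""
    if isinstance(formula, str):            # atomic proposition
        return formula in word[i]
    op = formula[0]
    if op == "not":
        return not holds(formula[1], word, i)
    if op == "and":
        return holds(formula[1], word, i) and holds(formula[2], word, i)
    if op == "next":                        # strong next: a successor must exist
        return i + 1 < len(word) and holds(formula[1], word, i + 1)
    if op == "until":                       # f U g: g at some j >= i, f on i..j-1
        for j in range(i, len(word)):
            if holds(formula[2], word, j):
                return True
            if not holds(formula[1], word, j):
                return False
        return False
    raise ValueError(f"unknown operator: {op!r}")


# Each position of the word is the set of propositions true at that position.
word = [{"a"}, {"a"}, {"b"}]
print(holds(("until", "a", "b"), word))
print(holds(("next", ("next", "b")), word))
```

A backward modality such as "since" would need to look at positions before `i`; the results cited in the abstract say that adding such modalities does not increase expressiveness over finite words.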

Lecture Notes in Computer Science, 2010

The relationship between automata and logics has been investigated since the 1960s. In particular, it was shown how to determine, given an automaton, whether or not it is definable in first-order logic with label tests and the order relation, and for first-order logic with the successor relation. In recent years, there has been much interest in languages over an infinite alphabet. Kaminski and Francez introduced a class of automata called finite memory automata (FMA), that represent a natural analog of finite state machines. An FMA can use, in addition to its control state, a (bounded) number of registers to store and compare values from the input word. The class of data languages recognized by FMA is incomparable with the class of data languages defined by first-order formulas with the order relation and an additional binary relation for data equality. We first compare the expressive power of several variants of FMA with several data word logics. Then we consider the corresponding decision problem: given an automaton A and a logic, can the language recognized by A be defined in the logic? We show that it is undecidable for several variants of FMA, and then investigate the issue in detail for deterministic FMA. We show the problem is decidable for first-order logic with local data comparisons, an analog of first-order logic with successor. We also show instances of the problem for richer classes of first-order logic that are decidable. Organization: Section 1 explains the automata and logic formalisms that are the core topic of this paper, and their relationships. Section 2 gives undecidability results for several powerful models. Section 3 gives decidable criteria for non-uniform first-order definability within certain classes of memory automata. Section 4 gives a decision procedure for first-order definability with only local data comparisons.
Section 5 investigates the broader question of deciding first-order definability with unrestricted data comparisons. We do not resolve this question, but provide effective necessary conditions and effective sufficient criteria. Section 6 gives conclusions. All proofs can be found in the appendix.
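The register mechanism described above can be illustrated with a small simulator. This is a simplified sketch of a deterministic register automaton, not the exact definition of Kaminski and Francez: the transition encoding, the function name `run_dma`, and the example automaton are all assumptions made for illustration. A transition is chosen by the control state together with the index of the register that equals the current input value (or `None` if no register matches), and it may store the current value in a register.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

# Key: (control state, index of the register equal to the input value, or None).
# Value: (next control state, register to overwrite with the input, or None).
Delta = Dict[Tuple[str, Optional[int]], Tuple[str, Optional[int]]]

_EMPTY = object()  # marks a register that holds no value yet


def run_dma(delta: Delta, start: str, accepting: Set[str], n_registers: int,
            word: Sequence[Hashable]) -> bool:
    """Run a deterministic register automaton on a word of data values."""
    state = start
    registers: List[object] = [_EMPTY] * n_registers
    for value in word:
        # Compare the input value against the stored values.
        match = next((k for k, r in enumerate(registers) if r == value), None)
        if (state, match) not in delta:
            return False                    # no applicable transition: reject
        state, store = delta[(state, match)]
        if store is not None:
            registers[store] = value
    return state in accepting


# Example language: "the first data value occurs again later in the word".
delta: Delta = {
    ("q0", None): ("q1", 0),     # remember the first value in register 0
    ("q1", None): ("q1", None),  # a different value: keep waiting
    ("q1", 0): ("q2", None),     # the first value reappeared
    ("q2", None): ("q2", None),
    ("q2", 0): ("q2", None),
}

print(run_dma(delta, "q0", {"q2"}, 1, [3, 7, 3]))
print(run_dma(delta, "q0", {"q2"}, 1, [3, 7, 8]))
```

The automaton has three control states and one register, yet it handles words over an unbounded set of data values, because it only ever tests values for equality.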

Proceedings of the VLDB Endowment, 2012

We study verification of systems whose transitions consist of accesses to a Web-based data-source. An access is a lookup on a relation within a relational database, fixing values for a set of positions in the relation. For example, a transition can represent access to a Web form, where the user is restricted to filling in values for a particular set of fields. We look at verifying properties of a schema describing the possible accesses of such a system. We present a language where one can describe the properties of an access path, and also specify additional restrictions on accesses that are enforced by the schema. Our main property language, AccLTL, is based on a first-order extension of linear-time temporal logic, interpreting access paths as sequences of relational structures. We also present a lower-level automaton model, A-automata, which AccLTL specifications can compile into. We show that AccLTL and A-automata can express static analysis problems related to "querying wi...

Semantic Web Information Management, 2009

SPARQL has become the gold-standard for RDF query languages. Nevertheless, we believe there is further room for improving RDF query languages. In this chapter, we investigate the addition of rules and quantifier alternation to SPARQL. That extension, called SPARQLog, extends previous RDF query languages by arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. In addition SPARQLog is aware of important RDF features such as the distinction between blank nodes, literals and IRIs or the RDFS vocabulary. The semantics of SPARQLog is closed (every answer is an RDF graph), but lifts RDF's restrictions on literal and blank node occurrences for intermediary data. We show how to define a sound and complete operational semantics that can be implemented using existing logic programming techniques. While SPARQLog is Turing complete, we identify a decidable (in fact, polynomial time) fragment SwARQLog ensuring polynomial data-complexity inspired from the notion of super-weak acyclicity in data exchange. Furthermore, we prove that SPARQLog with no universal quantifiers in the scope of existential ones (∀∃ fragment) is equivalent to full SPARQLog in presence of graph projection. Thus, the convenience of arbitrary quantifier alternation comes, in fact, for free. These results, though here presented in the context of RDF querying, apply similarly also in the more general setting of data exchange.

Lecture Notes in Computer Science, 2008

We introduce the recursive, rule-based RDF query language RDFLog. RDFLog extends previous RDF query languages by arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. In addition RDFLog is aware of important RDF features such as the distinction between blank nodes, literals and URIs or the RDFS vocabulary. The semantics of RDFLog is closed (every answer is an RDF graph), but lifts RDF's restrictions on literal and blank node occurrences for intermediary data. We show how to define a sound and complete operational semantics that can be implemented using existing logic programming techniques. Using RDFLog we classify previous approaches to RDF querying along their support for blank node construction and show equivalence between languages with full quantifier alternation and languages with only ∀∃ rules.

Citeseer

Start of work: September 2004. Submission: July 2005. Supervisors: Prof. Dr. François Bry, Dr. Sebastian Schaffert ... This project thesis investigates error handling for the Web Query Language Xcerpt (cf. http://www.xcerpt.org). In addition, a formal definition of the Xcerpt syntax is given. ... After a short introduction to Xcerpt and parsing, several error management techniques known from literature are presented. They are compared regarding their usefulness for Xcerpt, and one technique is recommended as most suitable for Xcerpt.

Alberto Mendelzon Workshop on Foundations of Databases, 2010

Abstract. We provide a Myhill-Nerode-like theorem that characterizes the class of data languages recognized by deterministic finite-memory automata (DMA). As a byproduct of this characterization result, we obtain a canonical representation for any DMA-recognizable language. We then show that this canonical automaton is minimal in a strong sense: it has the minimal number of control states and also the minimal amount of internal storage. We finally show how this minimal automaton can be computed.

SPARQLog: SPARQL with Rules and Quantification

Semantic Web Information Management, 2010

Marx and de Rijke have shown that the navigational core of the W3C XML query language XPath is not first-order complete – that is, it cannot express every query definable in first-order logic over the navigational predicates. How can one extend XPath to get a first-order complete language? Marx has shown that Conditional XPath – an extension of XPath with an “Until” operator – is first-order complete. The completeness argument makes essential use of the presence of upward axes in Conditional XPath. We examine whether it is possible to get “forward-only” languages that are first-order complete for XML Boolean queries. It is easy to see that a variant of the temporal logic CTL* is first-order complete; the variant has path quantifiers for downward, leftward and rightward paths, while along a path one can check arbitrary formulas of linear temporal logic (LTL). This language has two major disadvantages: it requires path quantification in both horizontal directions (in particular, it r...

The notion of orbit finite data monoid was recently introduced by Bojańczyk as an algebraic object for defining recognizable languages of data words. Following Büchi’s approach, we introduce a variant of monadic second-order logic with data equality tests that captures precisely the data languages recognizable by orbit finite data monoids. We also establish, following this time the approach of Schützenberger, McNaughton and Papert, that the first-order fragment of this logic defines exactly the data languages recognizable by aperiodic orbit finite data monoids. Finally, we consider another variant of the logic that can be interpreted over generic structures with data. The data languages defined in this variant are also recognized by unambiguous finite memory automata.

Abstract. The notion of orbit finite data monoid was recently introduced by Bojańczyk as an algebraic object for defining recognizable languages of data words. Following Büchi’s approach, we introduce the new logic ‘rigidly guarded MSO’ and show that the data languages definable in this logic are exactly those recognizable by orbit finite data monoids. We also establish, following this time the approach of Schützenberger, McNaughton and Papert, that the first-order variant of this logic defines exactly the languages recognizable by aperiodic orbit finite data monoids. Finally, we give a variant of the logic that captures the larger class of languages recognized by non-deterministic finite memory automata.

Abstract. For reasoning on the Web, Datalog is lacking data extraction and value invention. This article proposes to overcome these limitations with “simulation unification” and “RDFLog”. Simulation unification is a non-standard unification inspired from regular path queries. Like standard unification, it yields bindings for variables in both terms to unify. Unlike standard unification, it does not try to make the two terms identical but instead to embed the query into the data. Simulation unification is decidable. Without variables, it has polynomial complexity. With variables it is, like standard unification, NP-complete. We identify a number of interesting special cases of unification, e.g., in presence or absence of term injectivity. In particular, we show that simulation unification without term injectivity on tree data is linear and in presence of injectivity it is still polynomial even on unordered trees in contrast to the NP-complete unordered tree inclusion problem. RDFLog...

Abstract. RDF data is set apart from relational or XML data by its support of rich existential information in the form of blank nodes. Where in SQL databases null values are scoped over a single tuple, blank nodes in RDF can span over any number of statements and thus can be seen as existentially quantified variables. Blank node querying is considered in most RDF query languages, but blank node construction, i.e., the introduction of new blank nodes has been mostly ignored (e.g., in Triple) or treated in a very limited form (e.g., in SPARQL). In this paper, we classify three kinds of blank nodes in RDF query languages and introduce the recursive, rule-based RDF query language RDFLog. RDFLog is the first RDF query language with full arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. RDFLog is also aware of important RDF features such as the distinction between blank nodes, literals and URIs or the RDFS voca...

This survey article introduces the essential concepts and methods underlying rule-based query languages. It covers four complementary areas: declarative semantics based on adaptations of mathematical logic, operational semantics, complexity and expressive power, and optimisation of query evaluation. The treatment of these areas is foundation-oriented, the foundations having resulted from over four decades of research in the logic programming and database communities on combinations of query languages and rules. These results have later formed the basis for conceiving, improving, and implementing several Web and Semantic Web technologies, in particular query languages such as XQuery or SPARQL for querying relational, XML, and RDF data, and rule languages like the “Rule Interchange Framework (RIF)” currently being developed in a working group of the W3C. Coverage of the article is deliberately limited to declarative languages in a classical setting: issues such as query answerin...

Chaining with Memory in Xcerpt

Moving from single-rule Xcerpt programs as described in previous deliverables to full Xcerpt programs requires addressing the issue of efficient rule chaining. In this deliverable, we first survey existing approaches for efficient rule chaining (using some form of memoization) in logic programming and then briefly outline first results and challenges when extending these results to Xcerpt.

RDF is an emerging knowledge representation formalism proposed by the W3C. A central feature of RDF is blank nodes, which allow asserting the existence of an entity without naming it. Despite the importance of blank nodes for RDF, many existing RDF query languages have only insufficient support for blank nodes. We propose a query language for RDF, called RDFLog, with extensive blank node support. The evaluation of RDFLog may be reduced to the evaluation of Datalog. This allows standard database technology to be applied to querying RDF. Our experimental evaluation shows that our implementation scales well, even for large data sets. The core feature of the reduction is Skolemisation and a new form of un-Skolemisation. In contrast to previous definitions of un-Skolemisation, our un-Skolemisation has desirable symmetric properties to those of the Skolemisation. We define a hierarchy of syntactical restrictions of RDFLog with lower expressivity but better complexity, thereby showing the computati...
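The general idea of Skolemising an existential rule and then mapping Skolem terms back to blank nodes can be sketched as follows. This is a generic textbook-style illustration, not the un-Skolemisation defined in the paper: the example rule, the predicate names, the Skolem function name `sk_parent`, and both function names are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[object, str, object]


def apply_skolemised_rule(facts: Set[Triple]) -> Set[Triple]:
    """Apply: forall x exists y. (x type Person) -> (x hasParent y).

    The existential variable y is replaced by the Skolem term
    ("sk_parent", x), so the rule becomes an ordinary Datalog-style rule.
    """
    derived = set(facts)
    for s, p, o in facts:
        if p == "type" and o == "Person":
            derived.add((s, "hasParent", ("sk_parent", s)))
    return derived


def unskolemise(facts: Set[Triple]) -> Set[Triple]:
    """Replace each distinct Skolem term by a fresh blank node label."""
    labels: Dict[tuple, str] = {}

    def rename(term: object) -> object:
        if isinstance(term, tuple):         # Skolem terms are tuples here
            if term not in labels:
                labels[term] = f"_:b{len(labels)}"
            return labels[term]
        return term

    # Sort for a deterministic assignment of blank node labels.
    return {(rename(s), p, rename(o)) for s, p, o in sorted(facts, key=repr)}


facts = {("alice", "type", "Person"), ("bob", "type", "Person")}
result = unskolemise(apply_skolemised_rule(facts))
for triple in sorted(result):
    print(triple)
```

Because the Skolem term depends on `x`, each person gets a distinct blank node as parent; a rule with the quantifiers in the other order (exists y forall x) would instead use a single constant for all of them.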

We provide a Myhill-Nerode-like theorem that characterizes the class of data languages recognized by deterministic finite-memory automata (DMA). As a byproduct of this characterization result, we obtain a canonical representation for any DMA-recognizable language. We then show that this canonical automaton is minimal in a strong sense: it has the minimal number of control states and also the minimal amount of internal storage.

We study verification of systems whose transitions consist of accesses to a Web-based data-source. An access is a lookup on a relation within a relational database, fixing values for a set of positions in the relation. For example, a transition can represent access to a Web form, where the user is restricted to filling in values for a particular set of fields. We look at verifying properties of a schema describing the possible accesses of such a system. We present a language where one can describe the properties of an access path, and also specify additional restrictions on accesses that are enforced by the schema. Our main property language, AccLTL, is based on a first-order extension of linear-time temporal logic, interpreting access paths as sequences of relational structures. We also present a lower-level automaton model, A-automata, which AccLTL specifications can compile into. We show that AccLTL and A-automata can express static analysis problems related to "querying with limited access patterns" that have been studied in the database literature in the past, such as whether an access is relevant to answering a query, and whether two queries are equivalent in the accessible data they can return. We prove decidability and complexity results for several restrictions and variants of AccLTL, and explain which properties of paths can be expressed in each restriction.
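The notion of an access, a lookup that fixes values for some positions of a relation, can be pictured with a short sketch. The relation contents, the dictionary-based binding, and the function names `access` and `access_path` are assumptions for illustration only; the paper's formal model and the AccLTL language are not reproduced here.

```python
from typing import Dict, Hashable, List, Sequence, Set, Tuple

Row = Tuple[Hashable, ...]
Binding = Dict[int, Hashable]  # position in the relation -> required value


def access(relation: Set[Row], binding: Binding) -> List[Row]:
    """Return the rows that agree with the binding on every bound position."""
    return sorted(row for row in relation
                  if all(row[pos] == val for pos, val in binding.items()))


def access_path(db: Dict[str, Set[Row]],
                path: Sequence[Tuple[str, Binding]]) -> List[List[Row]]:
    """Perform a sequence of accesses and collect the response to each one."""
    return [access(db[name], binding) for name, binding in path]


# A hypothetical Web form over a Flight(origin, destination) relation that
# requires the user to fill in the origin field.
db = {"Flight": {("LHR", "JFK"), ("LHR", "CDG"), ("CDG", "JFK")}}
path = [("Flight", {0: "LHR"}), ("Flight", {0: "CDG"})]
for response in access_path(db, path):
    print(response)
```

In this picture, the second access uses a value (`"CDG"`) that was returned by the first one, which is the kind of dependency between accesses that properties of access paths can talk about.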

