Clemens Ley | University of Oxford
Papers by Clemens Ley
This thesis is concerned with extending properties of regular word languages to richer structures. We consider intricate properties like the relationship between one-way and two-way temporal logics, minimization of automata, and the ability to effectively characterize logics. We investigate whether these properties can be extended to tree languages or word languages over an infinite alphabet. It is known that linear temporal logic (LTL) is as expressive as first-order logic over finite words [Kam68, GPSS80]. LTL is a unidirectional logic that can only navigate forwards in a word, so it is quite surprising that it can capture all of first-order logic. In fact, one of the main ideas of the proof of [GPSS80] is to show that the expressiveness of LTL is not increased if modalities for navigating backwards are added. It is also known that an extension of bidirectional LTL to ordered trees, called Conditional XPath, is first-order complete [Mar04]. We investigate whether the unidirectional fragment of Conditional XPath is also first-order complete. We show that this is not the case. In fact, we show that there is a strict hierarchy of expressiveness consisting of languages that are all weaker than first-order logic. Unidirectional Conditional XPath is contained in the lowest level of this hierarchy. In the second part of the thesis we consider data word languages, that is, word languages over an infinite alphabet. We extend the theorem of Myhill and Nerode to a class of automata for data word languages, called deterministic finite memory automata (DMA). We give a characterization of the languages that are accepted by DMA, and also provide an algorithm for minimizing DMA. Finally, we extend theorems of Büchi, Schützenberger, McNaughton, and Papert to data word languages. A theorem of Büchi states that a language is regular iff it can be defined in monadic second-order logic. Schützenberger, McNaughton, and Papert have provided an effective characterization of first-order logic, that is, an algorithm for deciding whether a regular language can be defined in first-order logic. We provide a counterpart of Büchi’s theorem for data languages. More precisely, we define a new logic and show that it has the same expressiveness as non-deterministic finite memory automata. We then turn to a smaller class of data languages, those that are recognized by algebraic objects called orbit finite data monoids. We define a second new logic and show that it can define precisely the languages accepted by orbit finite data monoids. We provide an effective characterization of a first-order variant of this second logic, as well as of restrictions of first-order logic, such as its two-variable fragment and local variants.
Lecture Notes in Computer Science, 2010
Proceedings of the VLDB Endowment, 2012
We study verification of systems whose transitions consist of accesses to a Web-based data source. An access is a lookup on a relation within a relational database, fixing values for a set of positions in the relation. For example, a transition can represent access to a Web form, where the user is restricted to filling in values for a particular set of fields. We look at verifying properties of a schema describing the possible accesses of such a system. We present a language where one can describe the properties of an access path, and also specify additional restrictions on accesses that are enforced by the schema. Our main property language, AccLTL, is based on a first-order extension of linear-time temporal logic, interpreting access paths as sequences of relational structures. We also present a lower-level automaton model, A-automata, which AccLTL specifications can compile into. We show that AccLTL and A-automata can express static analysis problems related to "querying wi...
Semantic Web Information Management, 2009
Lecture Notes in Computer Science, 2008
Citeseer
Start of work: September 2004. Submission of work: July 2005. Supervisors: Prof. Dr. François Bry, Dr. Sebastian Schaffert ... This project thesis investigates error handling for the Web Query Language Xcerpt (cf. http://www.xcerpt.org). In addition, a formal definition of the Xcerpt syntax is given. ... After a short introduction to Xcerpt and parsing, several error management techniques known from the literature are presented. They are compared regarding their usefulness for Xcerpt, and one technique is recommended as most suitable for Xcerpt.
Alberto Mendelzon Workshop on Foundations of Databases, 2010
Abstract. We provide a Myhill-Nerode-like theorem that characterizes the class of data languages recognized by deterministic finite-memory automata (DMA). As a byproduct of this characterization result, we obtain a canonical representation for any DMA-recognizable language. We then show that this canonical automaton is minimal in a strong sense: it has the minimal number of control states and also the minimal amount of internal storage. We finally show how this minimal automaton can be computed.
Semantic Web Information Management, 2010
SPARQL has become the gold standard for RDF query languages. Nevertheless, we believe there is further room for improving RDF query languages. In this chapter, we investigate the addition of rules and quantifier alternation to SPARQL. That extension, called SPARQLog, extends previous RDF query languages by arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. In addition, SPARQLog is aware of important RDF features such as the distinction between ...
Marx and de Rijke have shown that the navigational core of the W3C XML query language XPath is not first-order complete; that is, it cannot express every query definable in first-order logic over the navigational predicates. How can one extend XPath to get a first-order complete language? Marx has shown that Conditional XPath, an extension of XPath with an “Until” operator, is first-order complete. The completeness argument makes essential use of the presence of upward axes in Conditional XPath. We examine whether it is possible to get “forward-only” languages that are first-order complete for XML Boolean queries. It is easy to see that a variant of the temporal logic CTL* is first-order complete; the variant has path quantifiers for downward, leftward and rightward paths, while along a path one can check arbitrary formulas of linear temporal logic (LTL). This language has two major disadvantages: it requires path quantification in both horizontal directions (in particular, it r...
The notion of orbit finite data monoid was recently introduced by Bojańczyk as an algebraic object for defining recognizable languages of data words. Following Büchi’s approach, we introduce a variant of monadic second-order logic with data equality tests that captures precisely the data languages recognizable by orbit finite data monoids. We also establish, following this time the approach of Schützenberger, McNaughton and Papert, that the first-order fragment of this logic defines exactly the data languages recognizable by aperiodic orbit finite data monoids. Finally, we consider another variant of the logic that can be interpreted over generic structures with data. The data languages defined in this variant are also recognized by unambiguous finite memory automata.
Abstract. The notion of orbit finite data monoid was recently introduced by Bojańczyk as an algebraic object for defining recognizable languages of data words. Following Büchi’s approach, we introduce the new logic ‘rigidly guarded MSO’ and show that the data languages definable in this logic are exactly those recognizable by orbit finite data monoids. We also establish, following this time the approach of Schützenberger, McNaughton and Papert, that the first-order variant of this logic defines exactly the languages recognizable by aperiodic orbit finite data monoids. Finally, we give a variant of the logic that captures the larger class of languages recognized by non-deterministic finite memory automata.
Abstract. For reasoning on the Web, Datalog is lacking data extraction and value invention. This article proposes to overcome these limitations with “simulation unification” and “RDFLog”. Simulation unification is a non-standard unification inspired by regular path queries. Like standard unification, it yields bindings for variables in both terms to unify. Unlike standard unification, it does not try to make the two terms identical but instead to embed the query into the data. Simulation unification is decidable. Without variables, it has polynomial complexity. With variables it is, like standard unification, NP-complete. We identify a number of interesting special cases of unification, e.g., in the presence or absence of term injectivity. In particular, we show that simulation unification without term injectivity on tree data is linear, and in the presence of injectivity it is still polynomial even on unordered trees, in contrast to the NP-complete unordered tree inclusion problem. RDFLog...
Abstract. RDF data is set apart from relational or XML data by its support of rich existential information in the form of blank nodes. Where in SQL databases null values are scoped over a single tuple, blank nodes in RDF can span over any number of statements and thus can be seen as existentially quantified variables. Blank node querying is considered in most RDF query languages, but blank node construction, i.e., the introduction of new blank nodes, has been mostly ignored (e.g., in Triple) or treated in a very limited form (e.g., in SPARQL). In this paper, we classify three kinds of blank nodes in RDF query languages and introduce the recursive, rule-based RDF query language RDFLog. RDFLog is the first RDF query language with full arbitrary quantifier alternation: blank nodes may occur in the scope of all, some, or none of the universal variables of a rule. RDFLog is also aware of important RDF features such as the distinction between blank nodes, literals and URIs or the RDFS voca...
This survey article introduces the essential concepts and methods underlying rule-based query languages. It covers four complementary areas: declarative semantics based on adaptations of mathematical logic, operational semantics, complexity and expressive power, and optimisation of query evaluation. The treatment of these areas is foundation-oriented, the foundations having resulted from over four decades of research in the logic programming and database communities on combinations of query languages and rules. These results have later formed the basis for conceiving, improving, and implementing several Web and Semantic Web technologies, in particular query languages such as XQuery or SPARQL for querying relational, XML, and RDF data, and rule languages like the “Rule Interchange Framework (RIF)” currently being developed in a working group of the W3C. Coverage of the article is deliberately limited to declarative languages in a classical setting: issues such as query answerin...
Moving from single-rule Xcerpt programs as described in previous deliverables to full Xcerpt programs requires addressing the issue of efficient rule chaining. In this deliverable, we first survey existing approaches for efficient rule chaining (using some form of memoization) in logic programming and then briefly outline first results and challenges when extending these results to Xcerpt.
RDF is an emerging knowledge representation formalism proposed by the W3C. A central feature of RDF is blank nodes, which allow asserting the existence of an entity without naming it. Despite the importance of blank nodes for RDF, many existing RDF query languages have only insufficient support for blank nodes. We propose a query language for RDF, called RDFLog, with extensive blank node support. The evaluation of RDFLog may be reduced to the evaluation of Datalog. This allows applying standard database technology to querying RDF. Our experimental evaluation shows that our implementation scales well, even for large data sets. The core feature of the reduction is Skolemisation and a new form of un-Skolemisation. In contrast to previous definitions of un-Skolemisation, our un-Skolemisation has desirable symmetric properties to those of the Skolemisation. We define a hierarchy of syntactical restrictions of RDFLog with lower expressivity but better complexity, thereby showing the computati...