Canonical form of SPARQL Patterns · Issue #483 · RDFLib/rdflib
I'm currently running well over 1M SPARQL queries as part of a machine learning algorithm. As this takes a while, I thought about caching the results of SPARQL queries. The problem is that different SPARQL queries can use different variable names but otherwise be isomorphic. Example:
select * where { ?s foo:bar foo:bla }
is isomorphic to
select * where { ?s2 foo:bar foo:bla }
For quick lookups in a cache it would be cool to have a canonical form of a SPARQL pattern, very much like what #441 (rdflib.compare.to_canonical_graph(g1)) provides for rdflib.Graph.
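To make the idea concrete, here is a minimal sketch of such a canonical cache key in plain Python. It is a brute-force illustration, not what rdflib does. The names canonical_key and is_var are hypothetical, and it assumes a pattern is a list of (s, p, o) string tuples in which variables start with "?". It tries every renaming of the variables and keeps the lexicographically smallest result, so it is only usable for patterns with a handful of variables.

```python
from itertools import permutations


def is_var(term):
    # Assumption: variables are strings starting with "?"; all else is constant.
    return isinstance(term, str) and term.startswith("?")


def canonical_key(triples):
    """Return a hashable key that is equal for patterns that differ
    only in their variable names (hypothetical helper, brute force)."""
    variables = sorted({t for triple in triples for t in triple if is_var(t)})
    names = ["?v%d" % i for i in range(len(variables))]
    best = None
    # Try every assignment of canonical names and keep the smallest pattern.
    for perm in permutations(names):
        rename = dict(zip(variables, perm))
        candidate = tuple(sorted(
            {tuple(rename.get(t, t) for t in triple) for triple in triples}
        ))
        if best is None or candidate < best:
            best = candidate
    return best


# The two example queries from above map to the same key.
key1 = canonical_key([("?s", "foo:bar", "foo:bla")])
key2 = canonical_key([("?s2", "foo:bar", "foo:bla")])
print(key1 == key2)
```

Because terms are treated as plain strings, a variable in the predicate position is no problem here. When using such a key for caching, the cached bindings would still have to be mapped back to the variable names of the query that is actually asked.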
A SPARQL query's pattern part can be represented as an rdflib.Graph which contains Variables. By replacing Variables with BNodes (using the variable name as the BNode id) one gets pretty close to a graph that the to_canonical_graph algorithm could be applied to, with one exception: BNodes can't be used as predicates (RDF Concepts).
As this is out of spec, I guess it's OK that this fails:
In [1]: from rdflib import *
INFO:rdflib:RDFLib Version: 4.2.1-dev
In [2]: from rdflib.compare import *
In [3]: g = Graph()
In [4]: g.add((BNode('v1'), BNode('v2'), URIRef('foo')))
In [5]: to_canonical_graph(g)
KeyError                                  Traceback (most recent call last)
[...]
/usr/local/lib/python2.7/site-packages/rdflib/compare.pyc in _canonicalize_bnodes(self, triple, labels)
    456         for term in triple:
    457             if isinstance(term, BNode):
--> 458                 yield BNode(value="cb%s" % labels[term])
    459             else:
    460                 yield term
KeyError: rdflib.term.BNode('v2')
Nevertheless, this is quite close to a cool feature, and graph canonicalization isn't exactly the easiest problem to think about. Would it be possible to slightly adapt the RGDA1 algorithm to support BNodes in the predicate position as well, thereby also making it fit for SPARQL patterns? Maybe @jimmccusker has an idea on this?
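One possible workaround that would not need a change to the algorithm might be to reify each triple pattern, so that variables only ever end up in the object position. The following is just a sketch of that encoding in plain Python, under my own assumptions: reify_pattern and as_bnode are hypothetical helpers, terms are plain strings, variables start with "?", and blank nodes are written with a "_:" prefix. I have not checked how a graph built from the result behaves with to_canonical_graph.

```python
def as_bnode(term):
    # Assumption: "?name" marks a variable; it is turned into the label "_:name".
    if isinstance(term, str) and term.startswith("?"):
        return "_:" + term[1:]
    return term


def reify_pattern(triples):
    """Encode every (s, p, o) pattern as a statement node with three edges,
    so that no blank node is left in a predicate position."""
    encoded = []
    for i, (s, p, o) in enumerate(triples):
        stmt = "_:stmt%d" % i  # one fresh blank-node label per triple pattern
        encoded.append((stmt, "rdf:subject", as_bnode(s)))
        encoded.append((stmt, "rdf:predicate", as_bnode(p)))
        encoded.append((stmt, "rdf:object", as_bnode(o)))
    return encoded


# The failing case from above: a variable in the predicate position.
for triple in reify_pattern([("?v1", "?v2", "foo")]):
    print(triple)
```

The statement labels depend on the order of the triples, but since they are blank nodes a canonicalization step should be free to relabel them.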