ConjunctiveGraph doesn't handle parsing datasets with default graphs properly · Issue #436 · RDFLib/rdflib

When ConjunctiveGraph.parse is called, it wraps its underlying store in a regular Graph instance. This causes problems for parsers of datasets, e.g. NQuads, TriG and JSON-LD.

Specifically, the triples in the default graph of a dataset haphazardly end up in bnode-named contexts.

Example:

    import sys
    from rdflib import *

    cg = ConjunctiveGraph()
    cg.parse(format="nquads", data=u"""
    <http://example.org/a> <http://example.org/ns#label> "A" .
    <http://example.org/b> <http://example.org/ns#label> "B" <http://example.org/b/> .
    """)
    assert len(cg.default_context) == 1  # fails

While I've attempted to overcome this by using the underlying graph.store in these parsers, they cannot access the default_context of ConjunctiveGraph through this store. It is there in the underlying store, but its identifier is inaccessible to the parser without further changes to the parse method of ConjunctiveGraph.

This becomes tricky because the contract for ConjunctiveGraph's parse method is:

    Parse source adding the resulting triples to its own context
    (sub graph of this graph).

    See :meth:`rdflib.graph.Graph.parse` for documentation on arguments.

    :Returns:

    The graph into which the source was parsed. In the case of n3
    it returns the root context.

I am not sure how we can change this behaviour, since client code may rely on this. We could either add a new method, e.g. parse_dataset, or a flag. That would not be obvious to all users though, and somehow I would like to change the behaviour to handle datasets as well. It is always possible to get/create a named graph from a conjunctive graph and parse data into that.

I have gotten further by adding publicID=cg.default_context.identifier to the parse invocation. This causes the TriG parser to behave properly (and it is easy to adapt the nquads parser to work from there on). But I am not sure if this is a wise solution to the problem.
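To make the mechanism easier to discuss, here is a toy model in plain Python. It is not rdflib's code: `ToyStore`, `ToyConjunctiveGraph`, `toy_dataset_parser` and the `public_id` argument are all hypothetical stand-ins, and the model assumes that parse creates a fresh bnode-named context unless an identifier is passed in. Under those assumptions it shows both the failure and the effect of passing the default context's identifier.

```python
import itertools

_bnode_ids = itertools.count()


def fresh_bnode():
    """Return a new blank-node-like identifier (toy stand-in)."""
    return "_:b%d" % next(_bnode_ids)


class ToyStore:
    """Maps a context identifier to the set of triples in that context."""

    def __init__(self):
        self.contexts = {}

    def add(self, triple, context_id):
        self.contexts.setdefault(context_id, set()).add(triple)


def toy_dataset_parser(quads, store, sink_id):
    """Quads without a graph name go to whatever context the sink names."""
    for s, p, o, g in quads:
        store.add((s, p, o), g if g is not None else sink_id)


class ToyConjunctiveGraph:
    def __init__(self):
        self.store = ToyStore()
        self.default_id = fresh_bnode()

    def parse(self, quads, public_id=None):
        # Assumption: without an explicit identifier, the data is parsed
        # into a new context of its own, as the parse contract describes.
        sink_id = public_id if public_id is not None else fresh_bnode()
        toy_dataset_parser(quads, self.store, sink_id)
        return sink_id

    def default_size(self):
        return len(self.store.contexts.get(self.default_id, ()))


QUADS = [
    ("ex:a", "ex:label", "A", None),     # default-graph statement
    ("ex:b", "ex:label", "B", "ex:b/"),  # named-graph statement
]

broken = ToyConjunctiveGraph()
broken.parse(QUADS)
print(broken.default_size())  # 0: the default-graph triple went elsewhere

fixed = ToyConjunctiveGraph()
fixed.parse(QUADS, public_id=fixed.default_id)
print(fixed.default_size())  # 1: the triple is in the default context
```

In the model, the parser never learns the default context's identifier unless the caller supplies it, which is the same gap the publicID workaround papers over.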

I'll mull more on this given time, but it would be good to have more people consider a proper revision of the parsing mechanism for datasets.

This underlies the problems described in #432 and #433 (and is related to #428).

(Obviously, this in turn causes the serializers for the same formats to emit unexpected bnode-named graphs when data has been read through these parsers.)