Change some more internal usages of ConjunctiveGraph to Dataset to silence warnings by ashleysommer · Pull Request #2867 · RDFLib/rdflib (original) (raw)

This was originally a very simple patch to change the use of ConjuctiveGraph to Dataset in the hextuples and nquads graph parsers.

The motivation was to silence the deprecation warnings that are emitted when simply parsing a nquads file or hextuples file, even if you were not personally using a ConjunctiveGraph, the warnings were being emitted for the use of ConjunctiveGraph within the parsers.

One of the key differences between ConjuctiveGraph and Dataset is how the default_graph works. When creating a new ConjunctiveGraph (cg), you can pass an identifier that becomes cg.identifier and also becomes the identifier of the default_graph. It uses a BNode as the identifier for the default graph if you don't pass one. A named graph with BNode as identifier is actually considered an "unnamed graph".

On the other hand, you cannot pass an identifier to a Dataset. The Dataset's internal identifier is always a new BNode, (that isn't even used for anything_. The identifier of ds.default_graph is the constant DATASET_DEFAULT_GRAPH_ID ("urn:x-rdflib:default"). So the difference is ConjunctiveGraph by default has an Unnamed graph as its default_graph, and Dataset has a Named Graph (with a URI) as its default_graph.

Changing ConjunctiveGraph to Dataset in the Hextuples and NQuads parsers inadvertently exposed an underlying existing bug in both parsers. If the destination graph (aka, the "sink") already had a Named default_graph assigned (like in the case of every Dataset), the parser would ignore that and parse into a new default_graph, causing one extra graph in the resulting dataset than you were expecting. That was fixed in this PR.

Another key difference between ConjuctiveGraph and Dataset is that ConjunctiveGraph always sets default_union to False unless you force it True, but Dataset is easily selectable, with the default False.
So for compatibility sake, we set default_union to True on the usage of Dataset in these two parsers.

Finally, after those changes were made I started getting failing tests in the seemingly unrelated JSON-LD Serializer. This was caused by the change of nquads to use Dataset in the parser. Specifically, it was an "Unnamed default-graph" vs "Named default-graph" issue. There was an existing bug in the JSON-LD serializer that treated all named-graphs as "non-default" graphs, where it wraps them in a @graph{} wrapper and adds an @id with the named-graph identifier. All "Unnamed" graphs were treated as "default-graph" and were added to the top-level context in the JSON-LD file. The problem is it was treating the Dataset named graph with name DATASET_DEFAULT_GRAPH_ID as a "non-default" graph because it has a URI as its identifier. This was triggering a lot of JSON-LD serializer tests to fail because the expected output was different.

So this PR fixes that issue in the JSON-LD serializer too, so it now treats Dataset default_graph as an un-named graph.
Note, this is directly related to #2445 which still needs to be fixed.