BREAKING CHANGE: Don't use publicID as the name for the default graph. by aucampia · Pull Request #2406 · RDFLib/rdflib

You are right: the test you shared passes on main and fails on this PR branch. On main, publicID is used as the identifier for the default graph in addition to being the base for relative URI resolution in source documents. On this PR branch, it is used only as the base for relative URI resolution.
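To make the difference concrete, here is a toy model of the two behaviours. It is not rdflib code: the dict-based dataset, the `load_default_triples` helper, the flag name, and the example IRIs are all assumptions made up for illustration.

```python
# Toy model only: a dataset is a dict from graph name to a set of triples,
# with None standing in for the unnamed default graph. Not rdflib code.
DEFAULT = None


def load_default_triples(dataset, triples, public_id, name_default_by_public_id):
    """Add triples that the source document placed in its default graph.

    name_default_by_public_id=True mimics the behaviour described for main,
    False mimics the behaviour described for this PR branch.
    """
    target = public_id if name_default_by_public_id else DEFAULT
    dataset.setdefault(target, set()).update(triples)
    return dataset


# Hypothetical input: one triple from the source's default graph.
triples = {("ex:alice", "ex:knows", "ex:bob")}
public_id = "http://example.org/base/"  # hypothetical value

main_like = load_default_triples({}, triples, public_id, True)
pr_like = load_default_triples({}, triples, public_id, False)

print("main-like graph names:", list(main_like))
print("PR-like graph names:", list(pr_like))
```

In the main-like case the triple ends up in a graph named after publicID and the default graph stays empty. In the PR-like case it ends up in the unnamed default graph.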

Do note that this PR, together with every issue it addresses, is labelled as a breaking change. With that as the baseline, and given the issues this PR tries to address, what we need to answer is:

  1. How should parse work? This is a more critical question than how parse works in main, because the breaking change label clearly signals that something will be different from the main branch.
  2. What should we do about the issues this tries to address?

If parse should work the way it works in main, we close the issues and move on, because then they are not issues. If, however, the issues are valid, then parse should not work the way it works in main and we have to change how it works. The question then is only what we should change it to or, more pertinently, how we make it better than it is.

I think the way it works in main, i.e. using publicID as the name of the graph that triples inside the default graph are loaded into, is wrong. This is also not what the documentation of publicID suggests will happen.

- ``publicID``: the logical URI to use as the document base. If None
specified the document location is used (at least in the case where
there is a document location).

Given that description, what I would expect is that publicID is what relative URIs will be resolved against. That is also what it is used for, and I think it is the only thing it should be used for. I don't see why the base for relative URI resolution should be the same as the name of the graph that default triples are loaded into. That graph should not be named to begin with, because as per the spec, the default graph has no name:
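As a rough illustration of what "the base for relative URI resolution" means, here is a minimal sketch using only Python's standard library. It does not call rdflib, and it makes no claim about how rdflib's parsers resolve references internally. The base IRI and the relative references are hypothetical values.

```python
from urllib.parse import urljoin

# Hypothetical base IRI, standing in for a value passed as publicID.
public_id = "http://example.org/docs/data.ttl"

# Relative references as they might appear in a source document.
relative_refs = ["alice", "#bob", "../people/carol"]

# Resolve each reference against the base, following RFC 3986 rules.
resolved = {ref: urljoin(public_id, ref) for ref in relative_refs}

for ref, iri in resolved.items():
    print(f"<{ref}> -> <{iri}>")
```

Nothing in this resolution step says anything about which graph the resulting triples belong to, which is the point being argued: the base and the graph name are separate concerns.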

https://www.w3.org/TR/rdf11-concepts/#section-dataset

Exactly one default graph, being an RDF graph. The default graph does not have a name and may be empty.

Another way to think about this is, what is worse:

  1. Not having the content of the default graph in sources loaded into the default graph of Dataset/ConjunctiveGraph
  2. Having to explicitly load something into a sub-graph if you don't want it in the default graph

I think 1 is worse.