Strategies for breaking changes · RDFLib/rdflib · Discussion #1841

While I like some of your suggestions in theory, I'm not sure how practical they
are in the near term. Maybe after a lot of cleanup and rearchitecting we can
consider them, but they really amount to almost a complete rewrite.

For now, the reality is that we have one version number for RDFLib, and we are
somewhat trying to follow semver, though I think we should make a firm
commitment to it. I will try to clarify exactly what this commitment is in the
documentation when I have time, but at the very least, major version 6 releases
should be backwards compatible in the interface provided by modules directly
under rdflib, excluding rdflib.plugins.

Consider the following real-world examples of what we have to deal with:

  1. Reworking the Query.serialize interface:
    Make serialize() on a CONSTRUCT result act like normal g.serialize() #1834
  2. Fixing the skolemization URI for RDFLib:
    Fix for #1824 s,http://rdlib.net,http://rdflib.net,g #1901
  3. Reworking Dataset interface and removing ConjunctiveGraph:
    dataset re-work for 7.0 release #1814

Our options here are, as far as I can see:

For #1:

For #2:

For #3 it comes down to much the same thing: if we can keep the change under
the surface and maintain the public contracts, we can merge the bulk of the
changes early and then, once we are ready, make a very minimal change to release
a new version.

Specifically addressing your feedback:

I don't think you need to add under private names -- if it's code that's ready
for prime time, give it a public name.

The problem is that if we add it under a public name, it becomes a public
contract, and to me, the less we commit to, the easier things are to maintain.
Also, in the cases in question it is not about adding a new function; it is
about updating existing functions that are quite core to RDFLib.

You can put the updated class/function/value under a namespace package or in
the same module with a name that indicates the version.

We currently have one version for RDFLib, and proliferating this into multiple
dimensions will not make things much easier in my view, neither for users of
RDFLib nor for the maintainers. It will complicate inheritance hierarchies, it
will be confusing for users, who will have to recall which version of
Graph.serialize is current at the moment, and it will make upgrading quite
noisy. I would want to see some prior art on this before considering it.

Deprecate the specific functionality that's going away by clearly adding a
message with DeprecationWarning or something else if appropriate: the dead
date for the old functionality can be firmed up as late as needed. You're
obliged to keep the versioned names around for a while, of course, but you can
deprecate those too eventually.

If we are following semver, which on paper we are, then we are obliged to keep
it around for as long as the major version stays the same, if removing it would
break backwards compatibility.
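For the deprecation route, the standard-library warnings module is enough. A
minimal sketch, using hypothetical old_function/new_function names: the old
entry point keeps working for the rest of the major version but emits a
DeprecationWarning.

```python
import warnings


def new_function(text: str) -> str:
    """Hypothetical replacement API."""
    return text.strip()


def old_function(text: str) -> str:
    """Hypothetical API slated for removal in the next major version."""
    warnings.warn(
        "old_function() is deprecated and will be removed in the next major "
        "version; use new_function() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this line
    )
    # Delegate so the old name keeps working until the major version bump.
    return new_function(text)
```

Python hides DeprecationWarning by default unless it is triggered from code in
__main__, but test runners such as pytest display it, which is usually where
downstream projects will notice it.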

  • Use feature flags. We could use global flags in a hidden namespace to
    control behaviour. ...

Please don't do this. Would create headaches for dealing with multiple
unrelated packages both depending on RDFLib, but only working for one or the
other value for a flag.

These flags would not be for public consumption; they would be entirely for
internal use. If someone is fiddling with private variables, they should not
expect continued support or reliability, as these are not part of our contract
with RDFLib users. Also, it could be as simple as one global value that is
essentially the major version under which we are operating, and we could then
quite easily run our test suite for V7 and V8 by just mutating it in conftest.py.