Strategies for breaking changes · RDFLib/rdflib · Discussion #1841 (original) (raw)

While I like some of your suggestions in theory, I'm not sure how practical it
is in the near term, maybe after a lot of cleanup and rearchitecting we can
consider it, but it really does amount to a complete rewrite almost.

For now we have the reality that we have one version number for RDFLib, and we
are somewhat trying to follow semver, though I think we should make a firm
commitment to it. I will try and clarify exactly what this commitment is in
documentation when I have time, but at the very least major version 6 releases
should be backwards compatible on the interface provided by modules directly
under rdflib and outside of rdflib.plugins.

Consider the following real world examples of what we have to deal with:

Reworking the Query.serialize interface:
Make serialize() on a CONSTRUCT result act like normal g.serialize() #1834
Fixing the skolemization URI for RDFLib:
Fix for #1824 s,http://rdlib.net,http://rdflib.net,g #1901
Reworking Dataset interface and removing ConjunctiveGraph:
dataset re-work for 7.0 release #1814

Our options here are, as far as I can see:

For #1:

Change Query.serialize and commit to releasing V7.
Possibly the simplest option, but the downside here is that every release will
likely be a major version release.
Keep a PR around for many months, rebasing it all the time until we are
actually prepared to release RDFLib 7.
This is what is happening now and it is quite resource intensive to do.
Create rdflib.query.v2, rdflib.query.QueryV2,
rdflib.query.Query.serialize_v2,
I think this complicates things too much and I have not really seen this
pattern being used that successfully using python. I would not mind seeing
some examples of it being used in practice, in theory it sounds good, but this
actually adds many dimensions to the problems we have. Say we go for
versioning the function, now every function that calls it needs to be
versioned if it has to somehow expose some functionality from it, the end
result may be a quite complicated and unmanagable proliferation.
Merge the fix under rdflib.query.Query._serialize_v7, with
rdflib.query.Query.serialize implemented on top of it - and then once we
release V7 just replace rdflib.query.Query.serialize with
rdflib.query.Query._serialize_v7.
This to me reduces the actual change that is needed to release V7 compared to
keeping the whole PR around, but also allows us flexibility in when we want to
release V7.

For #2:

Change the skolemization URI and commit to releasing V7.
See previous concern.
Keep a PR around for many months, rebasing it all the time until we are
actually prepared to release RDFLib 7.
See previous concern.
Versioned modules/classes/functions.
See previous concern.
Use _SKOLEM_AUTHORITY instead of "http://rdlib.net/" - and then set
_SKOLEM_AUTHORITY based on the value of some private global variable, it
could be an explicit feature flag, it could be something as simple as
rdflib._internal.version[0]. We can already test it, and when we are ready
to commit to releasing it then it is a very minimal change.

For #3 it more or less comes down to the same, if we can somehow keep the change
under the surface and maintain public contracts we can merge the bulk of the
changes early, and then once we are ready make a very minimal change to release
a new version.

Specifically addressing your feedback:

I don't think you need to add under private names -- if it's code that's ready
for prime time, give it a public name.

The problem is, if we we make it under a public name it becomes a public
contract, and to me the less we are committing to the easier thigns are to
maintain. Also in the cases in question it is not about adding a new function,
it is about updating existing functions that are quite core to RDFLib.

You can put the updated class/function/value under a namespace package or in
the same module with a name that indicates the version.

We currently have one version for RDFLib, prolifierating this into multiple
dimensions will not make things that much easier in my view, neither for users
of RDFLib or for the maintainers. It will complicate inheritence heirarchies, it
will be strange for users who now have to recall which version of
Graph.serialize is actual at the moment, it will make upgrading quite noisy. I
would want to see some prior art on this before considering it.

Deprecate the specific functionality that's going away by clearly adding a
message with DeprecationWarning or something else if appropriate: the dead
date for the old functionality can be firmed up as late as needed. You're
obliged to keep the versioned names around for a while, of course, but you can
deprecate those too eventually.

If we are following semver, which we are on paper, we are obliged to keep it
around as long as we have the same major version if removing it will break
backwards compatibility.

Use feature flags. We could use global flags in a hidden namespace to
control behaviour. ...

Please don't do this. Would create headaches for dealing with multiple
unrelated packages both depending on RDFLib, but only working for one or the
other value for a flag.

These flags would not be for public consumption, they would be entirely for
internal use, if someone is fiddling with private variables then they should not
expect continued support or reliability as these are not part of our contract
with RDFLib users. Also it could be as simple as one global value which is
essentially the major version under which we are operating, and we can then
quite easily run our test suite for V7 and V8 by just mutating it in conftest.py.