Implement RDF Patch serializer by recalcitrantsupplant · Pull Request #2877 · RDFLib/rdflib (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation11 Commits8 Checks20 Files changed

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

recalcitrantsupplant

Supports serialization from Dataset instances only; triples and quads within a Dataset are supported.

Summary of changes

Three methods to create RDF Patches from RDFLib Datasets:

  1. Serialize a Dataset as an addition patch
  2. Serialize a Dataset as a delete patch
  3. Create a patch representing the difference between a Dataset instance and a target Dataset instance

Basic usage:

  1. ds.serialize(format="patch", operation="add")
  2. ds.serialize(format="patch", operation="remove")
  3. ds1.serialize(format="patch", target=ds2)

Complete examples are provided in an example script.

Checklist

@recalcitrantsupplant

…nstances only; triples and quads within a Dataset are supported.

@recalcitrantsupplant

@recalcitrantsupplant

edmondchuc

edmondchuc

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in our chat, this version of the serializer could function as a utility function rather than a serializer, performing one of two operations:

This approach allows us to reserve the patch serializer for RDF patches on an RDFLib store with change tracking enabled. The backend would track changes applied to the store through a series of RDF patches, and the serializer would output these patches sequentially.

I'm not suggesting we need to implement this, but just toying with what the potential RDFLib public interfaces may look like with a complete RDF patch support.

@recalcitrantsupplant

@ashleysommer

@ashleysommer

The mypy issues in main should be resolved now. I've resynced this PR, hopefully it parses now.

@ashleysommer

@recalcitrantsupplant
Still 4 mypy errors related to this PR:

poetry run python -m mypy --show-error-context --show-error-codes --junit-xml=test_reports/3.9-macos-latest-mypy-junit.xml
  rdflib/plugins/serializers/patch.py: note: In member "serialize" of class "PatchSerializer":
  rdflib/plugins/serializers/patch.py:30: error: Signature of "serialize" incompatible with supertype "Serializer"  [override]
  rdflib/plugins/serializers/patch.py:30: note:      Superclass:
  rdflib/plugins/serializers/patch.py:30: note:          def serialize(self, stream: IO[bytes], base: Optional[str] = ..., encoding: Optional[str] = ..., **args: Any) -> None
  rdflib/plugins/serializers/patch.py:30: note:      Subclass:
  rdflib/plugins/serializers/patch.py:30: note:          def serialize(self, stream: IO[bytes], base: Optional[str] = ..., encoding: Optional[str] = ..., operation: Optional[str] = ..., target: Optional[Graph] = ..., header_id: Optional[str] = ..., header_prev: Optional[str] = ...) -> Any
  rdflib/plugins/serializers/patch.py: note: In function "serialize":
  rdflib/plugins/serializers/patch.py:60: error: "Graph" has no attribute "get_context"  [attr-defined]
  rdflib/plugins/serializers/patch.py: note: In member "serialize" of class "PatchSerializer":
  rdflib/plugins/serializers/patch.py:74: error: "Graph" has no attribute "contexts"  [attr-defined]
  rdflib/plugins/serializers/patch.py: note: In member "_patch_row" of class "PatchSerializer":
  rdflib/plugins/serializers/patch.py:88: error: "Graph" has no attribute "default_context"  [attr-defined]
  Found 4 errors in 1 file (checked 401 source files)

@recalcitrantsupplant

@recalcitrantsupplant

@coveralls

Coverage Status

coverage: 90.611% (+0.02%) from 90.595%
when pulling e58efff on recalcitrantsupplant:david/rdf-patch-serialiser
into 0c11deb on RDFLib:main.

@recalcitrantsupplant

@ashleysommer

@recalcitrantsupplant I thought this was complete and passing tests and ready to merge, however after merging it there are now some test failures appearing in main, and I've verified locally it has something to do with the roundtrip tests.
As this .patch parser is a registered RDFLib parser plugin, the test suite includes it in the roundtrip tests. There are a bunch of .nt->.patch conversions and .n3->.patch roundtrip tests that are failing with odd errors.
I don't have time to troubleshoot it now.

As @edmondchuc mentioned above, I don't know what use this is as a regular RDFLib serializer, because you really need to associate it with actions such as Add or Remove or target, which are not defined by default when doing operations like roundtrip. This really makes sense as a helper function, that allows you to script the creation of a .patch file.

To allow main to pass tests again, I'm going to revert this merge. We can revisit it again soon.

@ashleysommer

@recalcitrantsupplant Another update, I haven't revered this yet.
I found that if I add a fallback to find cases where operation is not passed, but target is also not passed, then default to "add" operation. With simple change, now all roundtrip tests are passing. Does that sound like a sensible change to you? I would assume anyone tring to naively serialize a graph or Dataset to .patch format without an operation is going to by default want the "add" patch operation.

This prevents unexpectedly getting a .patch file that contains only the header block and no content (thats why the test cases were failing.)

@ashleysommer

The issue described in the previous two comments is now fixed by #2898 so tests in main are now passing.