Implement RDF Patch serializer by recalcitrantsupplant · Pull Request #2877 · RDFLib/rdflib (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation11 Commits8 Checks20 Files changed
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Supports serialization from Dataset instances only; triples and quads within a Dataset are supported.
Summary of changes
Three methods to create RDF Patches from RDFLib Datasets:
- Serialize a Dataset as an addition patch
- Serialize a Dataset as a delete patch
- Create a patch representing the difference between a Dataset instance and a target Dataset instance
Basic usage:
ds.serialize(format="patch", operation="add")
ds.serialize(format="patch", operation="remove")
ds1.serialize(format="patch", target=ds2)
Complete examples are provided in an example script.
Checklist
- Checked that there aren't other open pull requests for
the same change. - Checked that all tests and type checking passes.
- If the change adds new features or changes the RDFLib public API:
Created an issue to discuss the change and get in-principle agreement.Discussed w/ maintainers- Considered adding an example in
./examples
.
- If the change has a potential impact on users of this project:
- Updated relevant documentation to avoid inaccuracies.
- Considered adding additional documentation.
- Considered granting push permissions to the PR branch,
so maintainers can fix minor issues and keep your PR up to date.
…nstances only; triples and quads within a Dataset are supported.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in our chat, this version of the serializer could function as a utility function rather than a serializer, performing one of two operations:
- Add or Remove Operation: Takes a dataset with an
add
orremove
operation parameter and generates an RDF patch. - Difference Operation: Takes two datasets and performs a
difference
operation, with the second dataset as the "target".
This approach allows us to reserve the patch serializer for RDF patches on an RDFLib store with change tracking enabled. The backend would track changes applied to the store through a series of RDF patches, and the serializer would output these patches sequentially.
I'm not suggesting we need to implement this, but just toying with what the potential RDFLib public interfaces may look like with a complete RDF patch support.
The mypy issues in main
should be resolved now. I've resynced this PR, hopefully it parses now.
@recalcitrantsupplant
Still 4 mypy errors related to this PR:
poetry run python -m mypy --show-error-context --show-error-codes --junit-xml=test_reports/3.9-macos-latest-mypy-junit.xml
rdflib/plugins/serializers/patch.py: note: In member "serialize" of class "PatchSerializer":
rdflib/plugins/serializers/patch.py:30: error: Signature of "serialize" incompatible with supertype "Serializer" [override]
rdflib/plugins/serializers/patch.py:30: note: Superclass:
rdflib/plugins/serializers/patch.py:30: note: def serialize(self, stream: IO[bytes], base: Optional[str] = ..., encoding: Optional[str] = ..., **args: Any) -> None
rdflib/plugins/serializers/patch.py:30: note: Subclass:
rdflib/plugins/serializers/patch.py:30: note: def serialize(self, stream: IO[bytes], base: Optional[str] = ..., encoding: Optional[str] = ..., operation: Optional[str] = ..., target: Optional[Graph] = ..., header_id: Optional[str] = ..., header_prev: Optional[str] = ...) -> Any
rdflib/plugins/serializers/patch.py: note: In function "serialize":
rdflib/plugins/serializers/patch.py:60: error: "Graph" has no attribute "get_context" [attr-defined]
rdflib/plugins/serializers/patch.py: note: In member "serialize" of class "PatchSerializer":
rdflib/plugins/serializers/patch.py:74: error: "Graph" has no attribute "contexts" [attr-defined]
rdflib/plugins/serializers/patch.py: note: In member "_patch_row" of class "PatchSerializer":
rdflib/plugins/serializers/patch.py:88: error: "Graph" has no attribute "default_context" [attr-defined]
Found 4 errors in 1 file (checked 401 source files)
coverage: 90.611% (+0.02%) from 90.595%
when pulling e58efff on recalcitrantsupplant:david/rdf-patch-serialiser
into 0c11deb on RDFLib:main.
@recalcitrantsupplant I thought this was complete and passing tests and ready to merge, however after merging it there are now some test failures appearing in main
, and I've verified locally it has something to do with the roundtrip tests.
As this .patch
parser is a registered RDFLib parser plugin, the test suite includes it in the roundtrip tests. There are a bunch of .nt
->.patch
conversions and .n3
->.patch
roundtrip tests that are failing with odd errors.
I don't have time to troubleshoot it now.
As @edmondchuc mentioned above, I don't know what use this is as a regular RDFLib serializer, because you really need to associate it with actions such as Add
or Remove
or target
, which are not defined by default when doing operations like roundtrip. This really makes sense as a helper function, that allows you to script the creation of a .patch
file.
To allow main
to pass tests again, I'm going to revert this merge. We can revisit it again soon.
@recalcitrantsupplant Another update, I haven't revered this yet.
I found that if I add a fallback to find cases where operation
is not passed, but target
is also not passed, then default to "add"
operation. With simple change, now all roundtrip tests are passing. Does that sound like a sensible change to you? I would assume anyone tring to naively serialize a graph or Dataset to .patch
format without an operation is going to by default want the "add"
patch operation.
This prevents unexpectedly getting a .patch
file that contains only the header block and no content (thats why the test cases were failing.)
The issue described in the previous two comments is now fixed by #2898 so tests in main
are now passing.