feat: deterministic longturtle serialisation using RDF canonicalization + n-triples sort by edmondchuc · Pull Request #3008 · RDFLib/rdflib
This PR improves upon #2997 by removing the bespoke sorting technique for object blank nodes. Instead, it applies the RGDA1 graph canonicalisation algorithm and then sorts the resulting N-Triples lines as strings. Fixes #1890.
The sorted N-Triples lines must be read back in with `skolemize=True` so that the blank node identifiers assigned by the canonicalisation algorithm are preserved rather than replaced with fresh ones during parsing.
Now that we can sort reliably by the blank node identifiers, this implementation works for all blank node positions in an RDF statement, no matter if it's in the subject or object position. It even works for blank nodes at the top-level.
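The sorting step can be illustrated with a minimal, standard-library-only sketch. It is not the code in this PR: it assumes the canonicalisation step has already run, and the blank node labels (`_:c14n0`, `_:c14n1`), the `example.org` IRIs and the helper name `sort_ntriples` are hypothetical stand-ins chosen for illustration.

```python
def sort_ntriples(nt_text: str) -> list[str]:
    """Return the non-empty N-Triples lines of nt_text, deduplicated and sorted."""
    lines = {line.strip() for line in nt_text.splitlines() if line.strip()}
    return sorted(lines)


# Two serialisations of the same graph, with statements in a different order.
# The blank node labels are assumed to be canonical already (hypothetical labels).
doc_a = """\
_:c14n0 <http://example.org/name> "Alice" .
<http://example.org/s> <http://example.org/knows> _:c14n0 .
_:c14n1 <http://example.org/name> "Bob" .
"""

doc_b = """\
_:c14n1 <http://example.org/name> "Bob" .
_:c14n0 <http://example.org/name> "Alice" .
<http://example.org/s> <http://example.org/knows> _:c14n0 .
"""

# Once the labels are stable, a plain string sort yields the same line order
# regardless of the order in which the statements were originally written.
sorted_a = sort_ntriples(doc_a)
sorted_b = sort_ntriples(doc_b)
```

Because the comparison is purely lexical, it does not matter whether a blank node appears in the subject or the object position; the only requirement is that equivalent graphs receive identical labels before sorting.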
@ajnelson-nist, I've added your blank node test from Sort Turtle output (#1978) and it's now passing, yay!
Update: this also fixes the bug where semicolons were doubled up when the subject is a top-level blank node. See 412fb5d.
Checklist
- Checked that there aren't other open pull requests for the same change.
- Checked that all tests and type checking pass.
- If the change adds new features or changes the RDFLib public API:
  - Created an issue to discuss the change and get in-principle agreement.
  - Considered adding an example in `./examples`.
- If the change has a potential impact on users of this project:
  - Added or updated tests that fail without the change.
  - Updated relevant documentation to avoid inaccuracies.
  - Considered adding additional documentation.
- Considered granting push permissions to the PR branch, so maintainers can fix minor issues and keep your PR up to date.