feat: deterministic longturtle serialisation using RDF canonicalization + n-triples sort by edmondchuc · Pull Request #3008 · RDFLib/rdflib

This PR builds on #2997. It replaces the bespoke sorting of object blank nodes with a plain sort of the n-triples lines, taken after the RGDA1 graph canonicalisation algorithm has been applied. Fixes #1890.
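To illustrate the sorting step on its own, here is a minimal sketch that uses only the standard library, not the serializer code from this PR. It assumes the graph has already been canonicalised, so the blank node labels are stable; `_:cb0` and `_:cb1` are hypothetical stand-ins for those labels, and `sort_ntriples` is a name made up for this example.

```python
from typing import List


def sort_ntriples(document: str) -> List[str]:
    """Return the non-empty N-Triples lines of `document` in sorted order."""
    return sorted(line for line in document.splitlines() if line.strip())


# Two serialisations of the same (already canonicalised) graph that differ
# only in line order. The labels _:cb0 and _:cb1 are hypothetical.
first = """\
_:cb0 <http://example.org/name> "Alice" .
<http://example.org/a> <http://example.org/knows> _:cb0 .
_:cb1 <http://example.org/name> "Bob" .
"""
second = """\
_:cb1 <http://example.org/name> "Bob" .
_:cb0 <http://example.org/name> "Alice" .
<http://example.org/a> <http://example.org/knows> _:cb0 .
"""

# With stable labels, a plain string sort gives the same line order for both.
assert sort_ntriples(first) == sort_ntriples(second)
```

Because the sort is a plain string comparison, it does not need to know whether a blank node sits in the subject or the object position.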

The sorted n-triples lines must be read back in with skolemize=True so that the blank node identifiers assigned by the canonicalisation algorithm are preserved.
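As a rough picture of why skolemising helps: blank node labels are local to a document, so a parser may assign fresh identifiers on load, whereas an IRI is kept as written. The toy sketch below is not rdflib's implementation. It rewrites labels into IRIs under a hypothetical prefix and back again, and it assumes that no literal in the data contains text that looks like a blank node label. `SKOLEM_PREFIX`, `skolemize_line` and `de_skolemize_line` are names invented for this example.

```python
import re

# Hypothetical prefix for this sketch; rdflib uses its own genid namespace.
SKOLEM_PREFIX = "https://example.org/.well-known/genid/"

# A blank node label at the start of a line or directly after whitespace.
# Assumption: no literal contains text that looks like " _:label".
BNODE = re.compile(r"(?<!\S)_:([A-Za-z0-9]+)")

# An IRI that was produced by skolemize_line below.
SKOLEM_IRI = re.compile("<" + re.escape(SKOLEM_PREFIX) + r"([A-Za-z0-9]+)>")


def skolemize_line(line: str) -> str:
    """Replace each blank node label with an IRI that embeds the label."""
    return BNODE.sub(lambda m: "<" + SKOLEM_PREFIX + m.group(1) + ">", line)


def de_skolemize_line(line: str) -> str:
    """Turn the skolem IRIs back into blank nodes with their original labels."""
    return SKOLEM_IRI.sub(lambda m: "_:" + m.group(1), line)


line = "_:cb0 <http://example.org/knows> _:cb1 ."
skolemized = skolemize_line(line)

# The label survives the round trip because it travelled inside an IRI.
assert de_skolemize_line(skolemized) == line
```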

Now that we can sort reliably by the blank node identifiers, this implementation works for blank nodes in any position of an RDF statement, whether subject or object. It even works for top-level blank nodes.

@ajnelson-nist, I've added your blank node test from Sort Turtle output (#1978) and it's now passing, yay!

Update: this also fixes the bug where semicolons were doubled up when the subject is a top-level blank node. See 412fb5d.

Checklist