NTriples Serializer emits invalid bnode identifiers · Issue #212 · RDFLib/rdflib
The NTriples serializer reuses the terms' n3() function, which for a BNode returns the UUID-based bnode identifier, itself prefixed by an underscore, so the serialized label starts with "_:_". BNode identifiers therefore contain underscores and dashes, both of which are invalid in NTriples blank node names, and both (rightfully) cause rdflib's NTriples parser to reject the data.
To reproduce:
    from rdflib.graph import Graph
    from rdflib.term import BNode, URIRef

    g = Graph()
    g.add((BNode(), URIRef('http://foobar/something'), BNode()))
    Graph().parse(data=g.serialize(format='nt'), format='nt')

    ParseError: Invalid line: '_:_9bd619ea-117a-41d7-93b3-1bd1622b4660 <http://foobar/something> _:_1b2da19f-8a5c-4f4d-9461-7f1214137746 .'
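To see why the parser rejects this line, the blank node labels can be checked against the name production of the original N-Triples grammar (name ::= [A-Za-z][A-Za-z0-9]*). This is only a minimal sketch: the helper is_valid_nt_bnode_label is hypothetical and not part of rdflib, and the second label below is a made-up example of a compliant identifier.

```python
import re

# name ::= [A-Za-z][A-Za-z0-9]*  (original N-Triples grammar)
NT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_valid_nt_bnode_label(label):
    """Hypothetical helper (not an rdflib API).

    Returns True if `label`, given without the leading '_:',
    is a valid N-Triples blank node name.
    """
    return NT_NAME.fullmatch(label) is not None


# Label taken from the error message above: leading underscore and dashes.
print(is_valid_nt_bnode_label("_9bd619ea-117a-41d7-93b3-1bd1622b4660"))  # False

# Made-up label: starts with a letter, contains only letters and digits.
print(is_valid_nt_bnode_label("A9bd619ea117a41d793b31bd1622b4660"))  # True
```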
Proposed fix consists of two parts:
- Change rdflib.term._unique_id() to return "A" instead of an underscore, so that bnode ids always start with a letter.
- Instead of using uuid4 as the serial number generator, use a variant that removes all dashes from a uuid:
    from uuid import uuid4

    def uuid4_ncname():
        """
        Generates UUID4-based but ncname-compliant identifiers.
        """
        # unicode() is the Python 2 builtin; on Python 3 use str() instead.
        return unicode(uuid4()).replace('-', '')
See gnowsis@4072188
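The two parts of the proposed fix can be sketched together as follows. This is a standalone Python 3 illustration, not the code from the linked commit: make_bnode_id is a hypothetical name, and the "A" prefix is passed as a default argument here rather than coming from rdflib.term._unique_id().

```python
import re
from uuid import uuid4


def uuid4_ncname():
    """Python 3 variant of the proposed generator: a UUID4 without dashes."""
    return str(uuid4()).replace("-", "")


def make_bnode_id(prefix="A"):
    """Hypothetical helper: letter prefix plus a dash-free UUID4.

    The prefix matters because the hex digits of a UUID may begin
    with a digit, which an N-Triples name must not start with.
    """
    return prefix + uuid4_ncname()


label = make_bnode_id()

# The label matches the N-Triples name production [A-Za-z][A-Za-z0-9]*.
assert re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", label)

print(len(label))  # 33: one prefix character plus 32 hex digits
```

Keeping the prefix separate from the serial number generator mirrors the split in the proposal: one change guarantees a leading letter, the other removes the dashes.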