Some special characters might be parsed wrongly (?) · Issue #1655 · RDFLib/rdflib

It seems that some special characters in RDF literals are not preserved after parsing but are instead translated into something faulty. So far, I have found the following ones:

n3_test.nt:

<http:s> <http:o1> "\n" .
<http:s> <http:o2> "\f" .
<http:s> <http:o3> "\b" .
<http:s> <http:o4> "\\r" .
<http:s> <http:o5> "\\\r" .

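from rdflib import Graph
from rdflib import term

# Parse the N-Triples file shown above (format is inferred from the .nt extension).
g = Graph()
g.parse("n3_test.nt")

# Print every triple in N3 form; each object is expected to be a Literal.
for s, p, o in g:
    assert isinstance(o, term.Literal)
    print("{s} {p} {o}".format(s=s.n3(), p=p.n3(), o=o.n3()))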

Sorted Output:

<http:s> <http:o1> """
"""
<http:s> <http:o2> ""
<http:s> <http:o3> "
<http:s> <http:o4> "\\\r"
<http:s> <http:o5> "\\\r"

We see, for example, that "\\r" and "\\\r" result in the same literal, and I am not sure whether this is the expected behavior. Unfortunately, some DBpedia logs contain such characters, and currently I cannot parse them correctly.
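For comparison, here is a minimal sketch of how I read the escape rules in the N-Triples grammar (`ECHAR` for `\t \b \n \r \f \" \' \\`, and `UCHAR` for `\uXXXX` and `\UXXXXXXXX`). It uses only the Python standard library and is not rdflib's implementation. The helper name `decode_nt_string` is hypothetical, and the sketch assumes it is given only the text between the quotes of a literal.

```python
import re

# ECHAR escapes from the N-Triples grammar: \t \b \n \r \f \" \' \\
_ECHAR = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
          '"': '"', "'": "'", "\\": "\\"}

# One escape sequence: \uXXXX, \UXXXXXXXX, or a backslash followed by any single character.
_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.DOTALL)


def decode_nt_string(body):
    """Decode the text between the quotes of an N-Triples literal (hypothetical helper)."""
    def replace(match):
        hex4, hex8, char = match.groups()
        if hex4 or hex8:
            # UCHAR: the hex digits are a Unicode code point.
            return chr(int(hex4 or hex8, 16))
        if char in _ECHAR:
            return _ECHAR[char]
        raise ValueError("invalid escape sequence: \\" + char)
    # Escapes are consumed left to right, so "\\" is handled before what follows it.
    return _ESCAPE.sub(replace, body)


# The five literal bodies from n3_test.nt, written as raw strings so that
# Python itself does not interpret the backslashes.
for body in [r"\n", r"\f", r"\b", r"\\r", r"\\\r"]:
    print(body, "->", repr(decode_nt_string(body)))
```

Under this reading, `"\\r"` decodes to a backslash followed by the letter `r`, while `"\\\r"` decodes to a backslash followed by a carriage return, so I would not expect the two literals to come out identical.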

Is there a trick or a proper way to get around this?