Some special characters might be parsed wrongly (?) · Issue #1655 · RDFLib/rdflib
It seems that some special characters in RDF literals are not preserved after parsing, but are instead translated into something faulty. So far, I have found the following ones:
n3_test.nt:
<http:s> <http:o1> "\n" .
<http:s> <http:o2> "\f" .
<http:s> <http:o3> "\b" .
<http:s> <http:o4> "\\r" .
<http:s> <http:o5> "\\\r" .
from rdflib import Graph
from rdflib import term
g = Graph()
g.parse("n3_test.nt")
for s, p, o in g:
    assert (type(o) == term.Literal)
    print("{s} {p} {o}".format(s=s.n3(), p=p.n3(), o=o.n3()))
Sorted Output:
<http:s> <http:o1> """
"""
<http:s> <http:o2> ""
<http:s> <http:o3> "
<http:s> <http:o4> "\\\r"
<http:s> <http:o5> "\\\r"
We see, e.g., that "\\r" and "\\\r" result in the same literal, and I am not sure if this is the expected behavior. Unfortunately, some DBpedia logs contain such characters, and currently I cannot parse them correctly.
Is there a trick or a proper way to get around this?
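For comparison, here is a minimal sketch of what I would expect each literal to decode to, based on my reading of the escape rules in the N-Triples grammar (`\t \b \n \r \f \" \' \\` plus `\uXXXX` and `\UXXXXXXXX`). It uses only the standard library, and `decode_nt_literal` is a hypothetical helper written for this illustration, not part of rdflib. Printing with `repr` makes the control characters visible, which `n3()` output on a terminal does not.

```python
import re

# String escapes as I read them from the N-Triples grammar (ECHAR).
_ECHAR = {
    "t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}

# A single left-to-right pass: an escaped backslash is consumed first, so the
# character after it cannot be misread as the start of another escape.
_ESCAPE = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|([tbnrf"\'\\]))')


def decode_nt_literal(lexical: str) -> str:
    """Decode the text between the quotes of an N-Triples literal (hypothetical helper)."""
    def repl(match):
        if match.group(1):  # \uXXXX
            return chr(int(match.group(1), 16))
        if match.group(2):  # \UXXXXXXXX
            return chr(int(match.group(2), 16))
        return _ECHAR[match.group(3)]
    return _ESCAPE.sub(repl, lexical)


# The five literals from n3_test.nt, exactly as written in the file.
for raw in [r"\n", r"\f", r"\b", r"\\r", r"\\\r"]:
    print(f"{raw!r:>10} -> {decode_nt_literal(raw)!r}")
```

Under these assumptions, `"\\r"` decodes to a backslash followed by the letter `r`, while `"\\\r"` decodes to a backslash followed by a carriage return, so I would expect the two to be different literals.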