Bad UTF-8 handling for NTriple parser · Issue #400 · RDFLib/rdflib

The current N-Triples parser does not handle UTF-8 strings properly.

The following triple triggers an error:
<http://linkedgeodata.org/triplify/user22701> <http://www.w3.org/2000/01/rdf-schema#label> "CiaránMooney" .

when executing this code:

line = codecs.open("file", "r", "utf-8").readline()
parser = NTriples.NTriplesParser(sink=mySink)
parser.parsestring(line)  # where line is the previous triple

The parser fails to parse the "á" character.
Modifying the parse method of the NTriplesParser class fixes this bug.

Current:

def parse(self, f):
    """Parse f as an N-Triples file."""
    if not hasattr(f, 'read'):
        raise ParseError("Item to parse must be a file-like object.")

    f = ascii(f)
    ...

Fix:

def parse(self, f):
    """Parse f as an N-Triples file."""
    if not hasattr(f, 'read'):
        raise ParseError("Item to parse must be a file-like object.")

    f = codecs.getreader("utf-8")(f)
    ...
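
To illustrate the difference the fix makes, here is a minimal standalone sketch that uses only the standard library, not rdflib itself. It assumes that `ascii(f)` behaves like an ASCII-decoding stream reader; that is my reading of the current code, not something verified here. The helper `read_first_line` is hypothetical and exists only for this illustration.

```python
import codecs
import io

# The triple from this report, encoded as UTF-8 bytes (as it would be on disk).
TRIPLE = (
    '<http://linkedgeodata.org/triplify/user22701> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "CiaránMooney" .\n'
)
RAW = TRIPLE.encode("utf-8")


def read_first_line(raw, encoding):
    """Hypothetical helper: wrap a byte stream in a decoding reader
    and return its first line as text."""
    reader = codecs.getreader(encoding)(io.BytesIO(raw))
    return reader.readline()


# Assumption: an ASCII reader stands in for what `ascii(f)` does today.
try:
    read_first_line(RAW, "ascii")
    ascii_ok = True
except UnicodeDecodeError:
    # "á" is stored as two non-ASCII bytes in UTF-8, so strict ASCII decoding fails.
    ascii_ok = False

# The reader used in the proposed fix decodes the same bytes without error.
utf8_line = read_first_line(RAW, "utf-8")
```

With the UTF-8 reader the line comes back intact, while the ASCII reader raises `UnicodeDecodeError` on the "á" bytes.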

Hope this helps.

AVL