Correctly handle \u in Notation 3 files. by amandasaurus · Pull Request #426 · RDFLib/rdflib

The parser was using a case-insensitive regex, so it was confusing \u with \U.

Sec 6.4 of the W3C Turtle spec (http://www.w3.org/TR/turtle/#sec-escapes) says that a numeric escape is either \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits). The current code matches these with a regex, but the regex has the case-insensitive flag set. So if a \uXXXX in the Turtle file happens to be followed by four more hex digits, the 8-digit regex for \U matches it too, and the parser consumes 8 characters rather than 4.
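A minimal sketch of the failure mode, for illustration only. The two patterns and the `unescape` helper are hypothetical stand-ins, not the actual rdflib code; they only assume that the 8-digit form is tried before the 4-digit form.

```python
import re

# Hypothetical patterns for illustration; not rdflib's actual source.
LONG_ESCAPE = r"\\U([0-9A-Fa-f]{8})"   # \UXXXXXXXX, 8 hex digits
SHORT_ESCAPE = r"\\u([0-9A-Fa-f]{4})"  # \uXXXX, 4 hex digits


def unescape(text, flags=0):
    """Replace \\UXXXXXXXX and \\uXXXX escapes with the characters they name."""
    def to_char(match):
        return chr(int(match.group(1), 16))

    # The 8-digit form is tried first, as assumed above.
    text = re.sub(LONG_ESCAPE, to_char, text, flags=flags)
    return re.sub(SHORT_ESCAPE, to_char, text, flags=flags)


# A 4-digit escape (U+0001) followed by the literal text "F600".
sample = r"\u0001F600"

# Case-sensitive matching: only \u0001 is consumed, "F600" stays literal.
correct = unescape(sample)

# Case-insensitive matching: the \U pattern also matches "\u" and
# swallows all 8 hex digits, yielding the single code point U+1F600.
buggy = unescape(sample, flags=re.IGNORECASE)
```

Here `correct` is two parts (U+0001 followed by the text `F600`), while `buggy` is a single, different character. Dropping the case-insensitive flag is enough to keep the two escape forms apart, because the hex-digit class already lists both cases explicitly.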

I've included tests that demonstrate this.