Correctly handle \u in Notation 3 files. by amandasaurus · Pull Request #426 · RDFLib/rdflib
It was using a case-insensitive regex, so it was confusing \u with \U.
Section 6.4 of the W3C Turtle spec (http://www.w3.org/TR/turtle/#sec-escapes) says that a Unicode escape is either \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits). The current code uses a regex, but that regex has the case-insensitive flag set. So if the Turtle file contains a \uXXXX escape, the 8-character regex meant for \U will also match it, and the parser will try to pull in 8 characters rather than 4.
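To make the failure concrete, here is a minimal sketch assuming a single-pass regex substitution. The names BUGGY, FIXED and unescape are hypothetical and are not rdflib's actual code. With re.IGNORECASE, the \U branch also matches a lowercase \u, so whenever 8 hex characters follow, it consumes all 8 instead of 4:

```python
import re

# Hypothetical patterns for illustration only; not taken from rdflib's source.
# Per the Turtle spec: \u takes exactly 4 hex digits, \U exactly 8.

# Buggy: IGNORECASE makes the \U branch match a lowercase \u as well.
BUGGY = re.compile(r"\\U([0-9a-f]{8})|\\u([0-9a-f]{4})", re.IGNORECASE)

# Fixed: case-sensitive escape letters; the hex class lists both cases explicitly.
FIXED = re.compile(r"\\U([0-9A-Fa-f]{8})|\\u([0-9A-Fa-f]{4})")


def unescape(text, pattern):
    """Replace \\uXXXX / \\UXXXXXXXX escapes in text using the given pattern."""

    def decode(match):
        # Exactly one of the two groups matched; pick whichever is set.
        digits = match.group(1) or match.group(2)
        return chr(int(digits, 16))

    return pattern.sub(decode, text)


# r"caf\u00e9dead" is the escape \u00e9 followed by the plain text "dead".
print(unescape(r"caf\u00e9dead", FIXED))  # cafédead
```

With the buggy pattern, the same input is read as the single escape \u00e9dead, i.e. code point 0x00E9DEAD, which is outside the Unicode range, so chr() raises ValueError. Inputs with fewer than 8 hex characters after \u, such as r"\u0041", happen to decode correctly either way, which may be why the bug can go unnoticed.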
I've included tests that demonstrate this.