Unicode tests fail in python3.3 when using a non-UTF-8 locale · Issue #344 · RDFLib/rdflib
For example, run
cd build/src; LC_ALL=C python3.3 ./run_tests.py
(Note: this matters for Linux packagers, because most Linux packaging runs in a "clean" environment with the POSIX locale set, and we'd like to run the tests where possible.)
You will get several errors like:
ERROR: test.test_dawg.test_dawg((rdflib.term.URIRef('http://www.w3.org/2009/sparql/docs/tests/data-sparql11/functions/manifest#strlang03'), 'STRLANG() TypeErrors', None, 'file:///home/cwalton/Development/rdflib-4.0.1/build/src/test/DAWG/data-sparql11/functions/data.ttl', [], 'file:///home/cwalton/Development/rdflib-4.0.1/build/src/test/DAWG/data-sparql11/functions/strlang03.rq', rdflib.term.URIRef('file:///home/cwalton/Development/rdflib-4.0.1/build/src/test/DAWG/data-sparql11/functions/strlang03.srx'), True),)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.3/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/test/test_dawg.py", line 412, in query_test
    res = Result.parse(open(resfile[7:]), format='xml')
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/rdflib/query.py", line 197, in parse
    return parser.parse(source, **kwargs)
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/rdflib/plugins/sparql/results/xmlresults.py", line 34, in parse
    return XMLResult(source)
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/rdflib/plugins/sparql/results/xmlresults.py", line 40, in __init__
    xmlstring = source.read()
  File "/usr/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1097: ordinal not in range(128)
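The traceback shows open() being called without an encoding, so python3.3 decodes the file with the locale's preferred encoding, which is ASCII under LC_ALL=C. Here is a minimal sketch of that failure and of a read that does not depend on the locale. The names read_text and sample_path are hypothetical stand-ins, not code from test_dawg.py, and encoding="ascii" is passed explicitly to simulate the C locale (newer Python versions may coerce the C locale to UTF-8 on their own).

```python
import io
import os
import tempfile


def read_text(path, encoding):
    """Hypothetical helper: read a whole file as text with an explicit encoding."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


# Hypothetical stand-in for a .srx result file that contains non-ASCII text.
fd, sample_path = tempfile.mkstemp(suffix=".srx")
with os.fdopen(fd, "wb") as handle:
    handle.write(u"<literal>caf\u00e9</literal>".encode("utf-8"))

# encoding="ascii" simulates the default that open() picks under LC_ALL=C.
try:
    read_text(sample_path, "ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Naming the encoding makes the read independent of the locale.
text = read_text(sample_path, "utf-8")
os.remove(sample_path)
```

io.open is used here because it accepts the encoding argument on both python2.7 and python3.3.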
The trivial fix of setting the character encoding on the open() statements in test_dawg.py allows the tests to work in python3.3; but once the encoding is set that way, one test in python2.7 fails with this output:
ERROR: test.test_dawg.test_dawg((rdflib.term.URIRef(u'http://raw.github.com/RDFLib/rdflib/master/test/DAWG/rdflib/manifest.ttl#unicode'), 'Unicode in SPARQL queries', None, 'file:///home/cwalton/Development/rdflib-4.0.1/test/DAWG/rdflib/unicode.ttl', [], 'file:///home/cwalton/Development/rdflib-4.0.1/test/DAWG/rdflib/unicode.rq', rdflib.term.URIRef(u'file:///home/cwalton/Development/rdflib-4.0.1/test/DAWG/rdflib/unicode.srx'), True),)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cwalton/Development/rdflib-4.0.1/test/test_dawg.py", line 384, in query_test
    res2 = g.query(codecs.open(query[7:], encoding="utf-8").read(), base=urljoin(query, '.'))
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/graph.py", line 1045, in query
    query_object, initBindings, initNs, **kwargs))
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/plugins/sparql/processor.py", line 72, in query
    parsetree = parseQuery(strOrQuery)
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/plugins/sparql/parser.py", line 1034, in parseQuery
    return Query.parseString(q, parseAll=True)
  File "/usr/lib64/python2.7/site-packages/pyparsing.py", line 1031, in parseString
    loc, tokens = self._parse( instring, 0 )
<snip a bunch of pyparsing internals>
  File "/usr/lib64/python2.7/site-packages/pyparsing.py", line 695, in wrapper
    ret = func(*args[limit[0]:])
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/plugins/sparql/parser.py", line 300, in <lambda>
    lambda x: rdflib.Literal(decodeStringEscape(x[0][1:-1])))
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/py3compat.py", line 129, in decodeStringEscape
    return s.decode('string-escape')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
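This UnicodeEncodeError follows from python2.7 semantics: the query text is now a unicode object, and calling .decode('string-escape') on a unicode object first encodes it implicitly with the ASCII codec. One possible direction, sketched here as an assumption and not as rdflib's actual fix, is to expand the escapes on the text itself so that no bytes round trip is involved. The name decode_string_escape_text is hypothetical, and the escape table is limited to the sequences SPARQL allows in string literals.

```python
import re

# Backslash escapes permitted in SPARQL string literals: \t \n \r \b \f \" \' \\
_ESCAPES = {
    "t": "\t", "n": "\n", "r": "\r", "b": "\b", "f": "\f",
    '"': '"', "'": "'", "\\": "\\",
}
_ESCAPE_RE = re.compile(r"\\(.)", re.DOTALL)


def decode_string_escape_text(s):
    """Hypothetical helper: expand backslash escapes in a text string."""
    def replace(match):
        # Unknown escapes are left untouched in this sketch.
        return _ESCAPES.get(match.group(1), match.group(0))
    return _ESCAPE_RE.sub(replace, s)
```

Because the substitution works on characters and never encodes, non-ASCII input such as u"caf\u00e9" passes through unchanged while the escapes around it are still expanded.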