Unicode tests fail in python3.3 when using a non-UTF-8 locale · Issue #344 · RDFLib/rdflib

For example, run:

cd build/src; LC_ALL=C python3.3 ./run_tests.py

(Note: this is somewhat important for Linux packagers, because most Linux packaging runs in a "clean" environment with the POSIX locale set, and we'd like to run the tests where possible.)

You will get several errors like:

ERROR: test.test_dawg.test_dawg((rdflib.term.URIRef('http://www.w3.org/2009/sparql/docs/tests/data-sparql11/functions/manifest#strlang03'), 'STRLANG() TypeErrors', None, 'file:///home/cwalton/Development/rdflib-4.0.1/build/src/test/DAWG/data-sparql11/functions/data.ttl', [], 'file:///home/cwalton/Development/rdflib-4.0.1/build/src/test/DAWG/data-sparql11/functions/strlang03.rq', rdflib.term.URIRef('file:///home/cwalton/Development/rdflib-4.0.1/build/src/test/DAWG/data-sparql11/functions/strlang03.srx'), True),)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.3/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/test/test_dawg.py", line 412, in query_test
    res = Result.parse(open(resfile[7:]), format='xml')
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/rdflib/query.py", line 197, in parse
    return parser.parse(source, **kwargs)
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/rdflib/plugins/sparql/results/xmlresults.py", line 34, in parse
    return XMLResult(source)
  File "/home/cwalton/Development/rdflib-4.0.1/build/src/rdflib/plugins/sparql/results/xmlresults.py", line 40, in __init__
    xmlstring = source.read()
  File "/usr/lib/python3.3/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9 in position 1097: ordinal not in range(128)
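
The traceback is consistent with how Python 3.3 picks a default: open() without an encoding argument uses locale.getpreferredencoding(False), which is ASCII under LC_ALL=C. A minimal sketch of that behaviour follows. It passes the encoding explicitly to simulate the C locale, and the file contents and the names read_text and demo are made up for illustration (they are not part of rdflib or its test suite).

```python
import io
import os
import tempfile


def read_text(path, encoding):
    # io.open accepts encoding= on both Python 2.7 and Python 3.x.
    with io.open(path, encoding=encoding) as f:
        return f.read()


def demo():
    # Hypothetical stand-in for a .srx result file containing non-ASCII text.
    fd, path = tempfile.mkstemp(suffix=".srx")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(u"<literal>caf\u00e9</literal>".encode("utf-8"))

        # Simulates what a bare open() does under LC_ALL=C on python3.3.
        try:
            read_text(path, "ascii")
            ascii_ok = True
        except UnicodeDecodeError:
            ascii_ok = False

        # Naming the encoding removes the dependency on the locale.
        text = read_text(path, "utf-8")
    finally:
        os.remove(path)
    return ascii_ok, text
```

Calling demo() shows the ASCII read failing and the UTF-8 read succeeding on the same file, independent of the locale the interpreter was started in.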

The trivial fix of setting the character encoding on the open() calls in test_dawg.py lets the tests pass in python3.3; but once the encoding is set this way, one test in python2.7 fails with this output:

ERROR: test.test_dawg.test_dawg((rdflib.term.URIRef(u'http://raw.github.com/RDFLib/rdflib/master/test/DAWG/rdflib/manifest.ttl#unicode'), 'Unicode in SPARQL queries', None, 'file:///home/cwalton/Development/rdflib-4.0.1/test/DAWG/rdflib/unicode.ttl', [], 'file:///home/cwalton/Development/rdflib-4.0.1/test/DAWG/rdflib/unicode.rq', rdflib.term.URIRef(u'file:///home/cwalton/Development/rdflib-4.0.1/test/DAWG/rdflib/unicode.srx'), True),)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/cwalton/Development/rdflib-4.0.1/test/test_dawg.py", line 384, in query_test
    res2 = g.query(codecs.open(query[7:], encoding="utf-8").read(), base=urljoin(query, '.'))
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/graph.py", line 1045, in query
    query_object, initBindings, initNs, **kwargs))
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/plugins/sparql/processor.py", line 72, in query
    parsetree = parseQuery(strOrQuery)
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/plugins/sparql/parser.py", line 1034, in parseQuery
    return Query.parseString(q, parseAll=True)
  File "/usr/lib64/python2.7/site-packages/pyparsing.py", line 1031, in parseString
    loc, tokens = self._parse( instring, 0 )
<snip a bunch of pyparsing internals>
  File "/usr/lib64/python2.7/site-packages/pyparsing.py", line 695, in wrapper
    ret = func(*args[limit[0]:])
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/plugins/sparql/parser.py", line 300, in <lambda>
    lambda x: rdflib.Literal(decodeStringEscape(x[0][1:-1])))
  File "/home/cwalton/Development/rdflib-4.0.1/rdflib/py3compat.py", line 129, in decodeStringEscape
    return s.decode('string-escape')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
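
The UnicodeEncodeError inside a decode call is what Python 2 does when .decode() is invoked on a unicode object: it first encodes the text implicitly with the default ASCII codec, and that step fails on non-ASCII characters. One possible way around it, sketched here as an assumption and not as rdflib's actual fix, is to process the escapes directly on text so that no byte codec is involved. The helper name decode_string_escape and the escape table (the SPARQL ECHAR set) are illustrative.

```python
import re

# Hypothetical helper, not rdflib's implementation.
# Covers the SPARQL ECHAR escapes: \t \b \n \r \f \\ \" \'
_ESCAPES = {
    "t": "\t",
    "b": "\b",
    "n": "\n",
    "r": "\r",
    "f": "\f",
    "\\": "\\",
    '"': '"',
    "'": "'",
}


def decode_string_escape(s):
    """Expand backslash escapes in text, leaving non-ASCII characters untouched."""
    # Unknown escapes are left as they are (the whole match is returned).
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), s)
```

Because the substitution works on text only, a literal such as u"caf\u00e9\\n" keeps its accented character and gets a real newline, with no implicit ASCII round trip.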