Issue 4382: test_dbm_dumb fails due to character encoding issue on Mac OS X
test_dbm_dumb fails due to what appears to be a character encoding issue on Mac OS X:
Majestix:Python-3.0rc3 martina$ DYLD_FRAMEWORK_PATH=/Users/martina/Downloads/Python-3.0rc3: ./python.exe -E -bb ./Lib/test/regrtest.py -l test_dbm_dumb
test_dbm_dumb
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2510>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2550>> ignored
Exception UnicodeEncodeError: UnicodeEncodeError('charmap', "'ü', (3072, 1)\n", 2, 3, 'character maps to <undefined>') in <bound method _Database.close of <dbm.dumb._Database object at 0x6a2550>> ignored
test test_dbm_dumb failed -- errors occurred; run in verbose mode for details
1 test failed:
    test_dbm_dumb
Example of verbose output (other testcases are similar):
======================================================================
ERROR: test_dumbdbm_creation (test.test_dbm_dumb.DumbDBMTestCase)
Traceback (most recent call last):
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/test/test_dbm_dumb.py", line 41, in test_dumbdbm_creation
    f.close()
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/dbm/dumb.py", line 228, in close
    self._commit()
  File "/Users/martina/Downloads/Python-3.0rc3/Lib/dbm/dumb.py", line 116, in _commit
    f.write("%r, %r\n" % (key.decode('Latin-1'), pos_and_siz_pair))
  File "./Lib/io.py", line 1491, in write
    b = encoder.encode(s)
  File "./Lib/encodings/mac_roman.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xbc' in position 2: character maps to <undefined>
The Mac Roman encoding comes into play because _commit opens _dirfile without explicitly specifying an encoding. io.open then falls back to the result of locale.getpreferredencoding, which returns mac-roman here:
Majestix:Python-3.0rc3 martina$ DYLD_FRAMEWORK_PATH=/Users/martina/Downloads/Python-3.0rc3: ./python.exe -c "import locale;print(locale.getpreferredencoding())"
mac-roman
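The encoding failure can be reproduced without dbm.dumb or a Mac. The following is a minimal sketch, assuming the offending test key is the UTF-8 encoding of 'ü' (an assumption, but one consistent with the '\xbc' at position 2 in the traceback). It builds the same index line that _commit writes and encodes it with the mac_roman codec by hand:

```python
# Assumption: the test key is 'ü' encoded as UTF-8, i.e. b'\xc3\xbc'.
key = "ü".encode("utf-8")
pos_and_siz_pair = (3072, 1)

# Same formatting expression as in dbm/dumb.py's _commit (see traceback).
line = "%r, %r\n" % (key.decode("Latin-1"), pos_and_siz_pair)

# Encode the line the way a text file opened with the mac-roman default would.
error = None
try:
    line.encode("mac_roman")
except UnicodeEncodeError as exc:
    error = exc

if error is not None:
    # Reports the offending character and its index within the line.
    print("cannot encode %r at position %d" % (line[error.start], error.start))
```

Decoding the two UTF-8 bytes as Latin-1 yields U+00C3 and U+00BC. Mac Roman has a mapping for the first but not for the second, which is why the error points at position 2.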
Two issues:
- since dumb.py handles encoding explicitly, shouldn't it specify the encoding for _dirfile as well? (or use a binary file; but this could cause new line-ending troubles...)
- is mac-roman really the appropriate choice for locale.getpreferredencoding? This is on Mac OS X 10.5, not Mac OS 9... The preferred encoding for Mac OS X should be utf-8, not some legacy encoding...
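The first suggestion can be sketched as follows. This is only an illustration under stated assumptions, not a patch against dumb.py: write_index and read_index are hypothetical helpers, and the index file name is made up. The point is that Latin-1 maps every byte value to a character and back, so a key decoded with Latin-1 can always be written to a file opened with encoding="Latin-1", regardless of the locale's preferred encoding:

```python
import ast
import os
import tempfile


def write_index(path, index):
    """Hypothetical helper: write index lines with an explicit encoding."""
    # Passing encoding="Latin-1" bypasses locale.getpreferredencoding().
    with open(path, "w", encoding="Latin-1") as f:
        for key, pos_and_siz_pair in index.items():
            f.write("%r, %r\n" % (key.decode("Latin-1"), pos_and_siz_pair))


def read_index(path):
    """Hypothetical helper: read the index lines back into a dict."""
    index = {}
    with open(path, "r", encoding="Latin-1") as f:
        for line in f:
            key, pos_and_siz_pair = ast.literal_eval(line.rstrip())
            index[key.encode("Latin-1")] = pos_and_siz_pair
    return index


# Assumption: a key like the one that triggers the failure above.
index = {"ü".encode("utf-8"): (3072, 1)}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dir")  # made-up file name
    write_index(path, index)
    roundtrip = read_index(path)

print(roundtrip == index)
```

Keeping the file in text mode also preserves the existing newline handling, which addresses the line-ending concern raised for the binary-file alternative.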
This seems to be related to r67310, which was intended to fix issue #3799:
http://svn.python.org/view/python/branches/py3k/Lib/dbm/dumb.py?rev=67310&r1=63662&r2=67310