Issue 5274: sys.exc_info()[1] - different handling from str() and unicode() - py 2.6

Recently I noticed somewhat surprising behaviour of sys.exc_info() in Python 2.6.1 (compared to 2.5.4). Code like:

    try:
        my_text = unicode(open("test_file.txt").read(), "utf-8")

raises UnicodeDecodeError, because the text file is wrongly read with the utf-8 codec (test_file.txt contains the string "abšcd" encoded in windows-1250); this error is caught by the except clause. The two versions of Python then differ in how they handle sys.exc_info(). The following lines:

    print sys.exc_info()[1]
    print repr(sys.exc_info()[1])
    print str(sys.exc_info()[1])
    print unicode(sys.exc_info()[1])

result in Python 2.5 in:

    'utf8' codec can't decode byte 0x9a in position 2: unexpected code byte
    UnicodeDecodeError('utf8', 'ab\x9acd', 2, 3, 'unexpected code byte')
    'utf8' codec can't decode byte 0x9a in position 2: unexpected code byte
    'utf8' codec can't decode byte 0x9a in position 2: unexpected code byte

In Python 2.6 it is:

    'utf8' codec can't decode byte 0x9a in position 2: unexpected code byte
    UnicodeDecodeError('utf8', 'ab\x9acd', 2, 3, 'unexpected code byte')
    'utf8' codec can't decode byte 0x9a in position 2: unexpected code byte
    ('utf8', 'ab\x9acd', 2, 3, 'unexpected code byte')

This is rather confusing: I'd expect str() and unicode() to return equivalent content, which is not the case here. (The second item, "ab\x9acd", is the whole content of the file being read, which is normally quite a bit longer than this sample.)
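One possible way to sidestep the difference is to build the message from the attributes that UnicodeDecodeError exposes (encoding, object, start, end, reason) rather than relying on str() or unicode() of the exception. The following is only a minimal sketch under some assumptions: it is written for a current Python 3 interpreter (where the input is a bytes object and the reported reason text may differ from the 2.x output above), and describe_decode_error and try_decode are hypothetical helper names, not part of any library.

```python
import sys


def describe_decode_error(exc, context=10):
    """Build a short message from a UnicodeDecodeError's attributes.

    Hypothetical helper: only a small window of bytes around the
    failing position is shown, so a large input is not dumped whole.
    """
    lo = max(exc.start - context, 0)
    hi = exc.end + context
    window = exc.object[lo:hi]
    return "%s codec: %s at position %d-%d near %r" % (
        exc.encoding, exc.reason, exc.start, exc.end, window)


def try_decode(data, encoding="utf-8"):
    """Return (text, None) on success or (None, message) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError:
        # sys.exc_info()[1] is the exception instance being handled.
        return None, describe_decode_error(sys.exc_info()[1])


if __name__ == "__main__":
    # "abšcd" encoded in windows-1250, then wrongly decoded as utf-8.
    raw = "ab\u0161cd".encode("cp1250")
    text, message = try_decode(raw)
    print(message)
```

Because the message is assembled explicitly, it does not depend on how a particular Python version implements the string conversion of the exception, and the context parameter limits how much of the file content ends up in the output.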

Regards vbr