[Python-Dev] Known doctest bug with unicode?
Adam Olsen rhamph at gmail.com
Fri Apr 18 18:05:19 CEST 2008
On Fri, Apr 18, 2008 at 8:27 AM, Jeroen Ruigrok van der Werven <asmodai at in-nomine.org> wrote:
# vim: set fileencoding=utf-8 :

kanamap = { u'あ': 'a' }

def transpose(word):
    """Convert a word in kana to its equivalent Hepburn romanisation.

    >>> transpose(u'あ')
    'a'
    """
    transposed = ''
    for character in word:
        transposed += kanamap[character]
    return transposed

if __name__ == '__main__':
    import doctest
    doctest.testmod()

doctest:

[16:24] [ruigrok at akuma] (1) {20} % python trans.py
**********************************************************************
File "trans.py", line 11, in __main__.transpose
Failed example:
    transpose(u'あ')
Exception raised:
    Traceback (most recent call last):
      File "doctest.py", line 1212, in __run
        compileflags, 1) in test.globs
      File "<doctest __main__.transpose[0]>", line 1, in <module>
        transpose(u'あ')
      File "trans.py", line 16, in transpose
        transposed += kanamap[character]
    KeyError: u'\xe3'
**********************************************************************
1 items had failures:
   1 of   1 in __main__.transpose
***Test Failed*** 1 failures.

normal interpreter:

>>> from trans import transpose
>>> transpose(u'あ')
'a'

What you've got is an 8-bit string containing a unicode literal. Since this gets past the module's compilation stage, doctest passes it to the compiler again, and this time the encoding defaults to iso-8859-1. Thus u'あ'.encode('utf-8').decode('latin-1') -> u'\xe3\x81\x82', and the lookup fails on the first character, u'\xe3'.
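
To make the byte-level effect concrete, here is a minimal sketch. It is written for Python 3 rather than the 2.x interpreter in the report, so it only reproduces the encode/decode round trip, not doctest's own behaviour. The helper first_missing_key is a hypothetical name introduced for illustration.

```python
# Python 3 sketch: reproduce the mis-decoding at the byte level.
# Assumption: this mimics what happens to the literal, not doctest internals.

kana = '\u3042'                          # the character あ (U+3042)
utf8_bytes = kana.encode('utf-8')        # the bytes stored in the source file
misread = utf8_bytes.decode('latin-1')   # same bytes read as iso-8859-1

kanamap = {'\u3042': 'a'}


def first_missing_key(word, mapping):
    """Return the first character of word that is not a key in mapping.

    Hypothetical helper: returns None when every character is present.
    """
    for character in word:
        if character not in mapping:
            return character
    return None


# One kana becomes three unrelated Latin-1 characters, and the first of
# them is the key that the traceback complains about.
print(repr(utf8_bytes))
print(repr(misread))
print(repr(first_missing_key(misread, kanamap)))
```

The correctly decoded string has every character in kanamap, while the misread one fails on its first character, which matches the KeyError in the report.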
Possible solutions:
- Make the docstring itself unicode, assuming doctest allows this.
- Call doctest explicitly, giving it the correct encoding.
- See if you can put an encoding declaration in the doctest itself.
- Make doctest smarter, so that it can grab the original module's encoding.
- Wait until 3.0, where this is hopefully fixed by making doctests use unicode by default?
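
As a rough illustration of the first option, the sketch below gives the function a u"""...""" docstring and runs its examples explicitly. It is written for Python 3, where the u prefix is accepted and every docstring is already text. Whether the 2.x doctest treats a unicode docstring the same way is exactly the open assumption noted in the list. run_transpose_doctest is a hypothetical helper name.

```python
import doctest

kanamap = {u'あ': 'a'}


def transpose(word):
    u"""Convert a word in kana to its equivalent Hepburn romanisation.

    >>> transpose(u'あ')
    'a'
    """
    transposed = ''
    for character in word:
        transposed += kanamap[character]
    return transposed


def run_transpose_doctest():
    """Hypothetical helper: run transpose's docstring examples explicitly."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        transpose.__doc__,          # already text, so no second decoding step
        {'transpose': transpose},   # globals visible to the examples
        'transpose',                # test name used in reports
        None,                       # filename (not needed here)
        0,                          # line number offset
    )
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    # summarize() returns a result whose first two fields are
    # (failed, attempted).
    return runner.summarize(verbose=False)


if __name__ == '__main__':
    print(run_transpose_doctest())
```

Handing doctest a docstring that is already decoded sidesteps the second compilation of raw bytes, which is where the iso-8859-1 default crept in.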
-- Adam Olsen, aka Rhamphoryncus