[Python-Dev] Known doctest bug with unicode?

Adam Olsen rhamph at gmail.com
Fri Apr 18 18:05:19 CEST 2008


On Fri, Apr 18, 2008 at 8:27 AM, Jeroen Ruigrok van der Werven <asmodai at in-nomine.org> wrote:

    # vim: set fileencoding=utf-8 :

    kanamap = {
        u'あ': 'a'
    }

    def transpose(word):
        """Convert a word in kana to its equivalent Hepburn romanisation.

        >>> transpose(u'あ')
        'a'
        """
        transposed = ''
        for character in word:
            transposed += kanamap[character]
        return transposed

    if __name__ == '__main__':
        import doctest
        doctest.testmod()

doctest:

    [16:24] [ruigrok at akuma] (1) {20} % python trans.py
    **********************************************************************
    File "trans.py", line 11, in __main__.transpose
    Failed example:
        transpose(u'あ')
    Exception raised:
        Traceback (most recent call last):
          File "doctest.py", line 1212, in __run
            compileflags, 1) in test.globs
          File "<doctest __main__.transpose[0]>", line 1, in <module>
            transpose(u'あ')
          File "trans.py", line 16, in transpose
            transposed += kanamap[character]
        KeyError: u'\xe3'
    **********************************************************************
    1 items had failures:
       1 of   1 in __main__.transpose
    ***Test Failed*** 1 failures.

normal interpreter:

    >>> from trans import transpose
    >>> transpose(u'あ')
    'a'

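What you've got is an 8-bit string (the docstring) containing a unicode literal. The docstring gets past the module's compilation stage as raw UTF-8 bytes, so doctest passes those bytes to the compiler again, and this time the compiler defaults to iso-8859-1. Thus u'あ'.encode('utf-8').decode('latin-1') -> u'\xe3\x81\x82', and the lookup fails on the first of those three characters, which is the u'\xe3' in your KeyError.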
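
The round trip described above can be reproduced outside doctest. This is only a minimal sketch, written in Python 3 syntax (where the encode/decode calls behave the same way); the variable names are mine and not taken from trans.py:

```python
# Sketch of the mis-decoding: UTF-8 bytes read back as iso-8859-1 (latin-1).
# Variable names are illustrative only.
kana = u'\u3042'                 # the single character あ
kanamap = {kana: 'a'}

utf8_bytes = kana.encode('utf-8')          # three bytes: e3 81 82
misdecoded = utf8_bytes.decode('latin-1')  # each byte becomes one character

print(ascii(misdecoded))          # escaped form of the mis-decoded string
print(len(kana), len(misdecoded)) # one character in, three characters out
print(misdecoded[0] in kanamap)   # the first character is not a key
```

The first character of the mis-decoded string is u'\xe3', which is why the dictionary lookup raises KeyError on exactly that value.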

Possible solutions:

  1. Make the docstring itself unicode, assuming doctest allows this.
  2. Call doctest explicitly, giving it the correct encoding.
  3. See if you can put an encoding declaration in the doctest itself.
  4. Make doctest smarter, so that it can grab the original module's encoding.
  5. Wait until 3.0, where this is hopefully fixed by making doctests use unicode by default?
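
As a rough illustration of option 1, the docstring can be given a u prefix so that the example reaches doctest as unicode text rather than as bytes. This is a sketch under assumptions, not a confirmed fix: whether Python 2's doctest accepts a unicode docstring is exactly the open question in that list item, and on Python 3 the prefix is accepted but redundant, so running it there only shows the shape of the change. The explicit DocTestParser/DocTestRunner calls are my choice, used so the check does not depend on which module is `__main__`:

```python
# -*- coding: utf-8 -*-
# Sketch of option 1: a unicode docstring (u prefix), checked explicitly.
import doctest

kanamap = {u'\u3042': 'a'}  # u'\u3042' is あ


def transpose(word):
    u"""Convert a word in kana to its equivalent Hepburn romanisation.

    >>> transpose(u'あ')
    'a'
    """
    transposed = ''
    for character in word:
        transposed += kanamap[character]
    return transposed


# Build a DocTest from the docstring and run it, instead of doctest.testmod().
parser = doctest.DocTestParser()
test = parser.get_doctest(transpose.__doc__, {'transpose': transpose},
                          'transpose', None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, results.attempted)
```

If the docstring is handed to doctest as text, the literal inside the example is compiled from characters rather than from UTF-8 bytes, so there is nothing left to mis-decode.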

-- Adam Olsen, aka Rhamphoryncus


