[Python-Dev] len(chr(i)) = 2?

Amaury Forgeot d'Arc amauryfa at gmail.com
Tue Nov 23 20:19:28 CET 2010


2010/11/23 Alexander Belopolsky <alexander.belopolsky at gmail.com>:

> This discussion motivated me to start looking into how well Python
> library itself is prepared to deal with len(chr(i)) = 2.  I was not
> surprised to find that textwrap does not handle the issue that well:
>
> >>> len(wrap(' \U00010140' * 80, 20))
> 12
> >>> len(wrap(' \U00000140' * 80, 20))
> 8
>
> That module should probably be rewritten to properly implement the
> Unicode line breaking algorithm
> <http://unicode.org/reports/tr14/tr14-22.html>.
>
> Yet finding a bug in a str object method after a 5 min review was a
> bit discouraging:
>
> >>> 'xyz'.center(20, '\U00010140')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: The fill character must be exactly one character long
>
> Given the apparent difficulty of writing even basic text processing
> algorithms in presence of surrogate pairs, I wonder how wise it is to
> expose Python users to them.

This was already discussed two years ago:

http://mail.python.org/pipermail/python-dev/2008-July/080900.html

So yes, wrap() and center() should be fixed.
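
For readers without a narrow build at hand, the 12-versus-8 line counts quoted above can be reproduced approximately. The sketch below is only an illustration under stated assumptions: utf16_units() and greedy_line_count() are hypothetical helpers, not part of textwrap, and the greedy loop is a simplification rather than textwrap's actual algorithm. It assumes that len() on a narrow build counted UTF-16 code units, so a character outside the BMP such as U+10140 measured as 2.

```python
def utf16_units(s):
    """Number of UTF-16 code units in s (hypothetical helper).

    Assumption: this matches what len() reported on a narrow build.
    """
    return len(s.encode("utf-16-le")) // 2


def greedy_line_count(words, width, measure):
    """Count the lines a simple greedy wrapper would produce (hypothetical helper).

    Word widths come from measure(); words are separated by one space.
    """
    lines = 0
    used = 0  # width consumed on the current line; 0 means no line is open yet
    for word in words:
        w = measure(word)
        if used and used + 1 + w <= width:
            used += 1 + w  # one separating space plus the word
        else:
            lines += 1  # start a new line with this word
            used = w
    return lines


bmp = ["\u0140"] * 80         # U+0140 is one UTF-16 code unit
astral = ["\U00010140"] * 80  # U+10140 is a surrogate pair in UTF-16

print(utf16_units("\U00010140"))                   # 2
print(greedy_line_count(bmp, 20, utf16_units))     # 8
print(greedy_line_count(astral, 20, utf16_units))  # 12
```

Measured in code units, only 7 of the two-unit words fit in 20 columns instead of 10, which is where the extra lines come from; a wrapper that measured code points would give 8 lines in both cases.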

-- Amaury Forgeot d'Arc


