msg70258
Author: Ignas Mikalajūnas (ignas)
Date: 2008-07-25 15:58
Not all combinations of unicode/non-unicode parameters work for ljust,
center and rjust. Passing a unicode character to them as a parameter when
the string is ascii fails with an error.

This doctest fails in 3 places, though I would expect it to pass:

def doctest_strings():
    """
    >>> uni = u"a"
    >>> ascii = "a"

    >>> uni.center(5, ascii)
    u'aaaaa'
    >>> uni.center(5, uni)
    u'aaaaa'
    >>> ascii.center(5, ascii)
    'aaaaa'
    >>> ascii.center(5, uni)
    u'aaaaa'

    >>> uni.ljust(5, ascii)
    u'aaaaa'
    >>> uni.ljust(5, uni)
    u'aaaaa'
    >>> ascii.ljust(5, ascii)
    'aaaaa'
    >>> ascii.ljust(5, uni)
    u'aaaaa'

    >>> uni.rjust(5, ascii)
    u'aaaaa'
    >>> uni.rjust(5, uni)
    u'aaaaa'
    >>> ascii.rjust(5, ascii)
    'aaaaa'
    >>> ascii.rjust(5, uni)
    u'aaaaa'
    """
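A minimal workaround sketch on Python 2, assuming the behaviour reported
above (the helper name center_compat is hypothetical, not part of the
report): promote the byte string to unicode when the fillchar is unicode,
so the ascii/unicode combination no longer raises.

def center_compat(s, width, fillchar=' '):
    # unicode.center() already accepts an ascii str fillchar, so only the
    # str-string/unicode-fillchar combination needs special handling.
    if isinstance(s, str) and isinstance(fillchar, unicode):
        s = s.decode('ascii')
    return s.center(width, fillchar)

assert center_compat('a', 5, u'a') == u'aaaaa'   # the case that fails natively
assert center_compat(u'a', 5, 'a') == u'aaaaa'   # already works without help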
|
|
msg82514
Author: Ezio Melotti (ezio.melotti)
Date: 2009-02-20 06:13
Indeed this behavior doesn't seem to be documented.

When the string is unicode and the fillchar non-unicode, Python implicitly
tries to decode the fillchar (and possibly raises a TypeError if it's not
in range(0, 128)):

>>> u'x'.center(5, 'y')  # unicode string, non-unicode (str) fillchar
u'yyxyy'                 # the fillchar is decoded

When the string is non-unicode it only accepts a non-unicode fillchar
(e.g. 'x'.center(5, 'y')) and it raises a TypeError if the fillchar is
unicode:

>>> 'x'.center(5, u'y')  # non-unicode (str) string, unicode fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: center() argument 2 must be char, not unicode

If it tries to decode the fillchar when the string is unicode, it could
also try to encode the unicode fillchar (and possibly raise a TypeError)
when the string is non-unicode.

Py3, instead, seems to have the opposite behavior: it implicitly encodes
unicode fillchars into byte strings when the string is a byte string, but
it doesn't decode a byte fillchar if the string is unicode:

>>> b'x'.center(5, 'y')  # byte string, unicode fillchar
b'yyxyy'                 # the fillchar is encoded
>>> 'x'.center(5, b'y')  # unicode string, byte fillchar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character cannot be converted to Unicode

The doc [1] says that "The methods on bytes and bytearray objects don’t
accept strings as their arguments, just as the methods on strings don’t
accept bytes as their arguments.", so b'x'.center(5, 'y') should probably
raise an error on Py3 (I could open a new issue for this).

[1]: http://docs.python.org/3.0/library/stdtypes.html#bytes-and-byte-array-methods - in the note
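To make the asymmetry described above easy to reproduce, here is a small
Python 2 probe (illustrative only, the variable names are mine) that walks
the four string/fillchar type combinations and records which one raises:

cases = [
    (u'x', u'y'),  # unicode string, unicode fillchar
    (u'x', 'y'),   # unicode string, str fillchar (fillchar gets decoded)
    ('x', 'y'),    # str string, str fillchar
    ('x', u'y'),   # str string, unicode fillchar (raises TypeError)
]
for s, fill in cases:
    try:
        outcome = repr(s.center(5, fill))
    except TypeError as exc:
        outcome = 'TypeError: %s' % exc
    print '%-6r %-6r -> %s' % (s, fill, outcome)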
|
|
msg82660
Author: Raymond Hettinger (rhettinger)
Date: 2009-02-24 07:19
In Py2.x, I think the desired behavior should match str.join(): if either
input is unicode, the output is unicode; if both are ascii, ascii should
come out.

For Py3.x, I think the goal was to have str.join() enforce that both
inputs are unicode. If either is bytes, then you have to know the
encoding.
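For reference, the str.join() behaviour being pointed to looks like this
on Python 2 (illustrative):

# both inputs ascii str -> str comes out
assert type(''.join(['a', 'b'])) is str
# one input unicode -> the whole result is promoted to unicode
assert ''.join(['a', u'b']) == u'ab'
assert type(''.join(['a', u'b'])) is unicode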
|
|
msg83669
Author: STINNER Victor (vstinner)
Date: 2009-03-17 12:10
About Python3, bytes.center accepts unicode as second argument, which is
an error for me:

>>> b"x".center(5, b"\xe9")
b'\xe9\xe9x\xe9\xe9'
>>> b"x".center(5, "\xe9")
b'\xe9\xe9x\xe9\xe9'

The second example must fail with a TypeError. str.center has the right
behaviour:

>>> "x".center(5, "\xe9")
'ééxéé'
>>> "x".center(5, b"\xe9")
TypeError: The fill character cannot be converted to Unicode
|
|
msg87121
Author: STINNER Victor (vstinner)
Date: 2009-05-04 12:36
haypo> About Python3, bytes.center accepts unicode as second argument,
haypo> which is an error for me

Ok, it's fixed thanks to r71013 (issue #5499).
|
|
msg87122
Author: STINNER Victor (vstinner)
Date: 2009-05-04 12:38
This issue only concerns Python 2.x; Python 3.x has the right behaviour:
it disallows mixing bytes with characters.
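A quick check of that behaviour on Python 3 (once the r71013 fix is in),
illustrative only: both mixed fillchar calls should raise TypeError.

for call in (lambda: b"x".center(5, "y"),   # bytes string, str fillchar
             lambda: "x".center(5, b"y")):  # str string, bytes fillchar
    try:
        call()
    except TypeError:
        pass
    else:
        raise AssertionError("mixing bytes and str in the fillchar should fail")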
|
|
msg87123
Author: STINNER Victor (vstinner)
Date: 2009-05-04 12:58
The question is why str.{ljust,rjust,center} don't accept a unicode
argument, whereas unicode.{ljust,rjust,center} accept an ASCII string.

Other string methods accept a unicode argument, like str.count() (it
encodes the unicode string to bytes using the utf8 charset). To be
consistent with the other string methods, str.{ljust,rjust,center} should
accept a unicode string and convert it to a byte string using utf8, like
str.count() does. But I hate such implicit conversion (I prefer the
Python3 way: disallow mixing bytes and characters), so I will not
contribute such a patch. Can you write such a patch?

--

str.{ljust,rjust,center} use PyArg_ParseTuple(args, "n|c:...", ...) and
getarg('c'), which only accepts a string of 1 byte.

unicode.{ljust,rjust,center} use
PyArg_ParseTuple(args, "n|O&:...", ..., convert_uc, ...) where convert_uc
looks something like:

def convert_uc(o):
    try:
        u = unicode(o)
    except:
        raise TypeError("The fill character cannot be converted to Unicode")
    if len(u) != 1:
        raise TypeError("The fill character must be exactly one character long")
    return u[0]

convert_uc() accepts a byte string of 1 ASCII character.

string_count() uses PyArg_ParseTuple(args, "O...", ...) and then tests the
substring type.
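A pure-Python model of the change described above, as a sketch only (the
helper name str_center_utf8 is made up, and this is not the actual C
patch): accept a unicode fillchar in str.center() by encoding it with
utf8, the way str.count() is said to handle unicode arguments.

def str_center_utf8(s, width, fillchar=' '):
    # Encode a unicode fillchar to bytes first; note that a non-ascii
    # character becomes more than one byte in utf8 and is then rejected,
    # like any other fill string that isn't exactly one byte long.
    if isinstance(fillchar, unicode):
        fillchar = fillchar.encode('utf8')
    if len(fillchar) != 1:
        raise TypeError("The fill character must be exactly one character long")
    return s.center(width, fillchar)

assert str_center_utf8('x', 5, u'y') == 'yyxyy'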
|
msg123483
Author: Alexander Belopolsky (belopolsky)
Date: 2010-12-06 18:25
As a feature request for 2.x, I think this should be rejected. Any
objections? The "behavior" part seems to have been fixed.
|
|