msg241800
Author: Marc-Andre Lemburg (lemburg)
Date: 2015-04-22 13:23
In Python 2, the unicode() constructor does not accept bytes arguments unless an encoding argument is given:

>>> unicode(u'abcäöü'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

In Python 3, the str() constructor masks this programming error by returning the repr() of the bytes object:

>>> str('abcäöü'.encode('utf-8'))
"b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"

I think it would be more helpful to raise an error that points the programmer to the most likely missing encoding argument.

Also note that you get different output when the encoding argument is set:

>>> str('abcäöü'.encode('utf-8'), 'utf-8')
'abcäöü'

I know this is documented, but it is still not very helpful and can easily hide errors.
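A minimal Python 3 sketch of the behaviour described above, plus a hypothetical to_text() helper (not part of Python or any library) that behaves the way this message proposes: it refuses bytes unless an encoding is passed. The built-in str() itself is not changed.

```python
def to_text(value, encoding=None, errors="strict"):
    """Hypothetical strict conversion helper; NOT the built-in str().

    Bytes-like input must come with an explicit encoding, otherwise a
    TypeError points the caller at the missing argument.
    """
    if isinstance(value, (bytes, bytearray)):
        if encoding is None:
            raise TypeError(
                f"{type(value).__name__} needs an explicit encoding, "
                "e.g. to_text(data, 'utf-8')"
            )
        return str(value, encoding, errors)
    return str(value)


data = "abcäöü".encode("utf-8")

# Built-in: without an encoding, str() returns the repr() text
# (and emits BytesWarning under -b, or raises it under -bb).
print(str(data))           # b'abc\xc3\xa4\xc3\xb6\xc3\xbc'
# Built-in: with an encoding, str() decodes.
print(str(data, "utf-8"))  # abcäöü

# The hypothetical helper fails loudly when the encoding is missing.
try:
    to_text(data)
except TypeError as exc:
    print("TypeError:", exc)
```

The design choice is the one argued for in the message: bytes-to-text conversion becomes explicit at the call site, while non-bytes values still go through str().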
|
|
msg241802
Author: Eryk Sun (eryksun)
Date: 2015-04-22 13:39
bytes.__str__ can already raise either a warning (-b):

>>> str('abcäöü'.encode('utf-8'))
__main__:1: BytesWarning: str() on a bytes instance
"b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"

or an error (-bb), which applies equally to implicit conversion by print():

>>> print('abcäöü'.encode('utf-8'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
BytesWarning: str() on a bytes instance
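The three modes can be compared side by side by running the same one-liner in fresh interpreters. This is a sketch; run_with() and SNIPPET are hypothetical names, and the snippet uses ASCII-only bytes so the command line is not affected by the locale.

```python
import subprocess
import sys

# print() of a bytes object makes an implicit str() call on it.
SNIPPET = "print(b'abc')"


def run_with(flag=None):
    """Run SNIPPET in a fresh interpreter, optionally with -b or -bb."""
    cmd = [sys.executable]
    if flag is not None:
        cmd.append(flag)
    cmd += ["-c", SNIPPET]
    return subprocess.run(cmd, capture_output=True, text=True)


for flag in (None, "-b", "-bb"):
    result = run_with(flag)
    label = flag or "no flag"
    print(f"{label}: exit={result.returncode}, "
          f"BytesWarning on stderr: {'BytesWarning' in result.stderr}")
```

Without a flag the repr is printed silently; with -b it is printed together with a BytesWarning on stderr; with -bb nothing is printed and the interpreter exits with a BytesWarning traceback.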
|
|
msg241803
Author: Serhiy Storchaka (serhiy.storchaka)
Date: 2015-04-22 13:43
In Python 2, the unicode() constructor accepts a bytes argument if it is decodable with sys.getdefaultencoding():

>>> unicode(b'abc')
u'abc'
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding("utf-8")
>>> unicode(u'abcäöü'.encode('utf-8'))
u'abc\xe4\xf6\xfc'

In Python 3, the str() constructor does not accept bytes arguments if Python is run with the -bb option.
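Python 2's implicit coercion can be emulated in Python 3 to see why the first example above fails and the second succeeds. This is a hypothetical helper, not a standard API. Python 3's sys.getdefaultencoding() always returns 'utf-8', so the sketch takes the codec as a parameter and defaults to 'ascii', the usual Python 2 value.

```python
import sys


def py2_style_unicode(value, default_encoding="ascii"):
    """Hypothetical emulation of Python 2's implicit unicode(value) decoding.

    Python 2 decoded byte strings with sys.getdefaultencoding(), normally
    'ascii'; default_encoding stands in for that setting.
    """
    if isinstance(value, bytes):
        # Strict decoding: undecodable bytes raise UnicodeDecodeError.
        return value.decode(default_encoding)
    return str(value)


print(sys.getdefaultencoding())    # utf-8 on Python 3
print(py2_style_unicode(b"abc"))   # abc

encoded = "abcäöü".encode("utf-8")
try:
    py2_style_unicode(encoded)      # like the default 'ascii' setting
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc)

# Like sys.setdefaultencoding("utf-8") in the session above.
print(ascii(py2_style_unicode(encoded, "utf-8")))  # 'abc\xe4\xf6\xfc'
```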
|
|
msg241804
Author: R. David Murray (r.david.murray)
Date: 2015-04-22 13:52
str accepting bytes and returning the repr was a conscious design choice, as evidenced by the -bb option, and I'm sure there is code that is both unintentionally and *intentionally* using this, despite the warning. Unless we want to discuss making the -bb behavior the default in a future version of Python, this issue should be closed.
|
|
msg241808
Author: Marc-Andre Lemburg (lemburg)
Date: 2015-04-22 14:48
On 22.04.2015 15:52, R. David Murray wrote:
> str accepting bytes and returning the repr was a conscious design choice, as evidenced by the -bb option, and I'm sure there is code that is both unintentionally and *intentionally* using this, despite the warning. Unless we want to discuss making the -bb behavior the default in a future version of Python, this issue should be closed.

I guess that would be helpful, yes.

Here's the original patch which introduced -b and -bb:

http://bugs.python.org/issue1392

This was Guido's answer back then:

"""
I'll look at the patches later, but we've gone over this before on the list. str() of *any* object needs to return *something*. Yes, it's unfortunate that this masks bugs in the transitional period, but it really is the best thing in the long run. We had other exceptional treatment for str vs. bytes (e.g. the comparison was raising TypeError for a while) and we had to kill that too.
"""

I'm not sure what the "transitional period" refers to, though. It's 8 years later now and it doesn't look like str(bytes_object) will go away as a source of subtle bugs anytime soon :-)
|
|
msg241811
Author: R. David Murray (r.david.murray)
Date: 2015-04-22 15:11
Yeah, that's why I run tests with -bb myself. Except that there was a bug in -W/-bb handling that meant I wasn't really doing so... and that bit me, because at least one buildbot really does run with -bb, and it complained. (Although in that case the 'bug' was benign, since it was just optional debug print output for which the repr of the bytes was actually fine.)
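Because a flag can be set while its effect is not, a test runner might check the behaviour directly instead of trusting the command line. A minimal sketch; bytes_warnings_are_errors() is a hypothetical helper, and the runtime warnings-filter override it guards against is one possible cause of "not really running with -bb", not necessarily the bug mentioned above.

```python
import sys


def bytes_warnings_are_errors():
    """Return True only if str() on bytes really raises in this process.

    sys.flags.bytes_warning is 2 under -bb, but a warnings filter installed
    at runtime (e.g. warnings.simplefilter("default")) can still replace
    the "error" action, so the check also exercises the real behaviour.
    """
    if sys.flags.bytes_warning < 2:
        return False  # interpreter was not started with -bb
    try:
        str(b"")
    except BytesWarning:
        return True
    return False  # -bb was given, but a filter downgraded the error


print("effective -bb:", bytes_warnings_are_errors())
```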
|
|
msg241867
Author: Antoine Pitrou (pitrou)
Date: 2015-04-23 16:34
> I'm not sure what the "transitional period" refers to, though.

The Python 2 -> Python 3 migration.

> It's 8 years later now and it doesn't look like str(bytes_object) will go away as a source of subtle bugs anytime soon

str(bytes_object) is perfectly reasonable when logging stuff, for example.

Recommend closing.
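For the logging case, one option (a sketch, not something proposed in the thread) is to request the repr explicitly with %r. The logged text is the same as what str(bytes_object) produces, but no str() call on bytes happens, so -b and -bb stay quiet. The logger name "bytes-demo" is hypothetical.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("bytes-demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger

payload = "abcäöü".encode("utf-8")
# %r formats with repr(): no bytes.__str__ call, so no BytesWarning
# under -b and no exception under -bb.
logger.debug("received %r", payload)

print(stream.getvalue(), end="")  # DEBUG received b'abc\xc3\xa4\xc3\xb6\xc3\xbc'
```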
|
|
msg241869
Author: Guido van Rossum (gvanrossum)
Date: 2015-04-23 16:57
It would be unacceptable if print(b) were to raise an exception. The reason the transitional period is long is just that people are still porting Python 2 code.
|
|