Issue 24025: str(bytes_obj) should raise an error

Created on 2015-04-22 13:23 by lemburg, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (8)
msg241800 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-22 13:23
In Python 2, the unicode() constructor does not accept bytes arguments, unless an encoding argument is given:

    >>> unicode(u'abcäöü'.encode('utf-8'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)

In Python 3, the str() constructor masks this programming error by returning the repr() of the bytes object:

    >>> str('abcäöü'.encode('utf-8'))
    "b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"

I think it would be more helpful to point the programmer to the most probably missing encoding argument by raising an error.

Also note that you get a different output with the encoding argument set:

    >>> str('abcäöü'.encode('utf-8'), 'utf-8')
    'abcäöü'

I know this is documented, but it is still not very helpful and can easily hide errors.
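A small Python 3 sketch of the behaviour described above. The first assertions only restate what the message shows: str() without an encoding returns the repr() text, while str(..., 'utf-8') or bytes.decode() actually decodes. The strict_str() helper is hypothetical and not part of Python or of any patch in this issue; it only illustrates, at user level, the kind of error the proposal asks for.

```python
def strict_str(obj, encoding=None, errors="strict"):
    """Hypothetical helper: like str(), but rejects bytes without an encoding.

    Illustration of the behaviour proposed in this issue only;
    the built-in str() is unchanged.
    """
    if encoding is None:
        if isinstance(obj, (bytes, bytearray)):
            # Point the caller at the probably missing encoding argument.
            raise TypeError(
                f"{type(obj).__name__} object passed to strict_str() without "
                "an encoding; use obj.decode(encoding) or pass encoding=..."
            )
        return str(obj)
    # With an encoding, behave exactly like str(obj, encoding, errors).
    return str(obj, encoding, errors)


data = 'abcäöü'.encode('utf-8')

# Built-in behaviour: without an encoding, str() returns the repr() text ...
assert str(data) == repr(data) == "b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"
# ... while an explicit encoding (or bytes.decode) really decodes the bytes.
assert str(data, 'utf-8') == data.decode('utf-8') == 'abcäöü'

# The hypothetical strict variant decodes when given an encoding ...
assert strict_str(data, 'utf-8') == 'abcäöü'
# ... and refuses bytes when the encoding is missing.
try:
    strict_str(data)
except TypeError as exc:
    print("TypeError:", exc)
```

Non-bytes objects still go through plain str(), so only the bytes-without-encoding case changes.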
msg241802 - Author: Eryk Sun (eryksun) * (Python triager) Date: 2015-04-22 13:39
bytes.__str__ can already produce either a warning (-b):

    >>> str('abcäöü'.encode('utf-8'))
    __main__:1: BytesWarning: str() on a bytes instance
    "b'abc\\xc3\\xa4\\xc3\\xb6\\xc3\\xbc'"

or an error (-bb), which applies equally to implicit conversion by print():

    >>> print('abcäöü'.encode('utf-8'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    BytesWarning: str() on a bytes instance
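Because -b and -bb are interpreter command-line options, a minimal way to reproduce the behaviour above from a script is to start a child interpreter with the flag. This sketch assumes a CPython 3 interpreter available as sys.executable; run_snippet() is a hypothetical helper name. The last call shows that an explicit repr() is not affected, since only the str() conversion of bytes is checked.

```python
import subprocess
import sys


def run_snippet(flag, code):
    """Hypothetical helper: run `code` in a fresh interpreter started with `flag`."""
    return subprocess.run(
        [sys.executable, flag, "-c", code],
        capture_output=True,
        text=True,
    )


# -b: print() still writes the repr() text, but a BytesWarning goes to stderr.
warned = run_snippet("-b", "print(b'abc')")
print(warned.returncode, warned.stdout.strip(), "|", warned.stderr.strip())

# -bb: the same implicit str() conversion raises BytesWarning as an exception.
failed = run_snippet("-bb", "print(b'abc')")
print(failed.returncode, failed.stderr.strip().splitlines()[-1])

# An explicit repr() does not go through bytes.__str__, even under -bb.
explicit = run_snippet("-bb", "print(repr(b'abc'))")
print(explicit.returncode, explicit.stdout.strip())
```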
msg241803 - Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-04-22 13:43
In Python 2, the unicode() constructor accepts a bytes argument if it is decodable with sys.getdefaultencoding().

    >>> unicode(b'abc')
    u'abc'
    >>> import sys
    >>> reload(sys)
    <module 'sys' (built-in)>
    >>> sys.setdefaultencoding("utf-8")
    >>> unicode(u'abcäöü'.encode('utf-8'))
    u'abc\xe4\xf6\xfc'

In Python 3, the str() constructor does not accept bytes arguments if Python is run with the -bb option.
msg241804 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-04-22 13:52
str accepting bytes and returning the repr was a conscious design choice, as evidenced by the -bb option, and I'm sure there is code that is both unintentionally and *intentionally* using this, despite the warning. Unless we want to discuss making the -bb behavior the default in a future version of python, this issue should be closed.
msg241808 - Author: Marc-Andre Lemburg (lemburg) * (Python committer) Date: 2015-04-22 14:48
On 22.04.2015 15:52, R. David Murray wrote:
> str accepting bytes and returning the repr was a conscious design choice, as evidenced by the -bb option, and I'm sure there is code that is both unintentionally and *intentionally* using this, despite the warning. Unless we want to discuss making the -bb behavior the default in a future version of python, this issue should be closed.

I guess that would be helpful, yes.

Here's the original patch which introduced -b and -bb:

http://bugs.python.org/issue1392

This was Guido's answer back then:

"""
I'll look at the patches later, but we've gone over this before on the list. str() of *any* object needs to return *something*. Yes, it's unfortunate that this masks bugs in the transitional period, but it really is the best thing in the long run. We had other exceptional treatment for str vs. bytes (e.g. the comparison was raising TypeError for a while) and we had to kill that too.
"""

I'm not sure what the "transitional period" refers to, though. It's 8 years later now and it doesn't look like str(bytes_object) will go away as a source of subtle bugs anytime soon :-)
msg241811 - Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-04-22 15:11
Yeah, that's why I run tests with -bb myself. Except that there was a bug in -W/-bb handling that meant I wasn't really...and that bit me because there is at least one buildbot that really does, and it complained... (Although in that case the 'bug' was really benign, since it was just optional debug print output for which the repr of the bytes was actually fine.)
msg241867 - Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2015-04-23 16:34
> I'm not sure what the "transitional period" refers to, though.

The Python 2 -> Python 3 migration.

> It's 8 years later now and it doesn't look like str(bytes_object) will go away as a source of subtle bugs anytime soon

str(bytes_object) is perfectly reasonable when logging stuff, for example.

Recommend closing.
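As an illustration of the logging use case, here is a sketch using the standard logging module. The logger name "bytes_demo" and the in-memory StringIO handler are assumptions made only so the output can be inspected. With a %s placeholder, the bytes argument goes through str() and yields the repr() text (emitting BytesWarning under -b); a %r placeholder yields the same text through repr() and is not affected by -b/-bb.

```python
import io
import logging

# Log into an in-memory stream so the formatted messages can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("bytes_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

payload = b"\x00\xffHELLO"

# %s goes through str(payload): the repr() text, plus a BytesWarning under -b.
logger.info("received %s", payload)
# %r goes through repr(payload) directly and never triggers BytesWarning.
logger.info("received %r", payload)

print(stream.getvalue(), end="")
```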
msg241869 - Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2015-04-23 16:57
It would be unacceptable if print(b) were to raise an exception. The reason the transitional period is long is just that people are still porting Python 2 code.
History
Date | User | Action | Args
2022-04-11 14:58:15 | admin | set | github: 68213
2015-04-23 16:57:52 | gvanrossum | set | status: pending -> closed; assignee: gvanrossum; messages: +
2015-04-23 16:34:30 | pitrou | set | status: open -> pending; nosy: + pitrou, gvanrossum; messages: +; superseder: py3k-pep3137: issue warnings / errors on str(bytes()) and similar operations; resolution: rejected
2015-04-22 15:11:24 | r.david.murray | set | messages: +
2015-04-22 14:48:38 | lemburg | set | messages: +
2015-04-22 13:52:34 | r.david.murray | set | nosy: + r.david.murray; messages: +
2015-04-22 13:43:36 | serhiy.storchaka | set | nosy: + serhiy.storchaka; messages: +
2015-04-22 13:39:03 | eryksun | set | nosy: + eryksun; messages: +
2015-04-22 13:23:32 | lemburg | create |