Issue 6108: unicode(exception) and str(exception) should return the same message on Py2.6
Created on 2009-05-26 04:53 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (25)
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-05-26 04:52
On Python 2.5 str(exception) and unicode(exception) return the same text:
>>> err
UnicodeDecodeError('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')
>>> str(err)
"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)"
>>> unicode(err)
u"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)"
On Python 2.6 unicode(exception) returns unicode(exception.args):
>>> err
UnicodeDecodeError('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')
>>> str(err)
"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)"
>>> unicode(err)
u"('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')"
This seems to affect only exceptions with more than 1 arg (e.g. UnicodeErrors and SyntaxErrors). KeyError is also different (the '' are missing with unicode()).
Note that when an exception like ValueError() is instantiated with more than 1 arg even str() returns str(exception.args) on both Py2.5 and Py2.6.
Probably str() checks the number of args before returning a specific message and if it doesn't match it returns str(self.args). unicode() instead seems to always return unicode(self.args) on Py2.6.
Attached there's a script that prints the repr(), str() and unicode() of some exceptions, run it on Py2.5 and Py2.6 to see the differences.
Author: Jean-Paul Calderone (exarkun) *
Date: 2009-09-13 05:00
Perhaps also worth noting is that in Python 2.4 as well, str(exception) and unicode(exception) returned the same thing. Unlike some other exception changes in 2.6, this doesn't seem to be a return to older behavior, but just a new behavior. (Or maybe no one cares about that; just wanted to point it out, though.)
Author: Antoine Pitrou (pitrou) *
Date: 2009-09-13 14:36
Looks like a potentially annoying bug to me.
Author: Barry A. Warsaw (barry) *
Date: 2009-09-29 18:31
Since we do not yet have a patch for this, I'm knocking it off the list for 2.6.3. It seems like an annoying loss of compatibility, but do we have any reports of it breaking real-world code?
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-11-12 05:22
I added the output of unicode_exceptions.py on Py2.6 and a testcase (against the trunk) that fails for 5 different exceptions, including the IOError mentioned in #6890 (also added to unicode_exceptions.py). The problem has been introduced by #2517.
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-12 03:21
In r64791, BaseException gained a new __unicode__ method that does the equivalent of the following things:
- if the number of args is 0, returns u''
- if it's 1 returns unicode(self.args[0])
- if it's >1 returns unicode(self.args)
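The three cases above can be modelled in a few lines of Python. This is only a sketch of the behaviour as described, not the C code from r64791. The names base_exception_unicode and text_type are hypothetical; text_type stands in for Python 2's unicode so that the snippet also runs on Python 3.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3: str is already the text type


def base_exception_unicode(args):
    """Model of the generic conversion described above (hypothetical helper)."""
    if len(args) == 0:
        return text_type()
    if len(args) == 1:
        return text_type(args[0])
    # More than one argument: the whole tuple is converted, so any
    # formatting done by a subclass's __str__ is not used.
    return text_type(args)


print(repr(base_exception_unicode(())))
print(repr(base_exception_unicode(("foo",))))
print(repr(base_exception_unicode(("ascii", 0, 1))))
```

The last call shows why multi-argument exceptions are affected: the result is the text of the args tuple, not a formatted message.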
Before this, BaseException only had a __str__ method, so unicode(e) (with e being an exception derived from BaseException) called:
- e.__str__().decode(), if e didn't implement __unicode__
- e.__unicode__(), if e implemented a __unicode__ method
Now, all the derived exceptions that don't implement their own __unicode__ method inherit the "generic" __unicode__ of BaseException, and they use that instead of falling back on __str__. This is generally ok if the number of args is 0 or 1, but if there are more args, there's usually some specific formatting in the __str__ method that is lost when BaseException.__unicode__ returns unicode(self.args).
Possible solutions:
- implement a __unicode__ method that does the equivalent of calling unicode(str(self)) (i.e. converting to unicode the message returned by __str__ instead of converting self.args);
- implement a __unicode__ method that formats the message as __str__ does, for all the exceptions with a __str__ that does some specific formatting.
Attached there's a proof of concept (.diff) where I tried to implement the first method with UnicodeDecodeError. This method can be used as long as __str__ always returns only ascii.
The patch seems to work fine for me (note: this is my first attempt to use the C API). If the approach is correct I can do the same for the other exceptions too and submit a proper patch.
Author: Antoine Pitrou (pitrou) *
Date: 2009-12-12 15:53
> In r64791, BaseException gained a new __unicode__ method that does the equivalent of the following things:
It remains to be seen why that behaviour was chosen. Apparently Nick implemented it. IMO __unicode__ should have the same behaviour as __str__. There's no reason to implement two different formatting algorithms.
Author: Alyssa Coghlan (ncoghlan) *
Date: 2009-12-13 01:09
Following this down the rabbit hole a little further: Issue #2517 (the origin of my checkin) was just a restoration of the __unicode__ slot implementation that had been ripped out in r51837 due to Issue #1551432.
At the time of the r64791 checkin, BaseException_str and BaseException_unicode were identical aside from the type of object returned (checking SVN head shows they're actually still identical).
However, it looks like several exceptions with __str__ overrides (i.e. Unicode[Encode/Decode/Translate]Error_str, EnvironmentError_str, WindowsError_str, SyntaxError_str, KeyError_str) are missing corresponding __unicode__ overrides, so invoking unicode() on them falls back to the BaseException_unicode implementation instead of using the custom formatting behaviour of the subclass.
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-13 01:39
> IMO __unicode__ should have the same behaviour as __str__. There's no reason to implement two different formatting algorithms.
If BaseException has both methods, they both have to be overridden by derived exceptions in order to have the same behaviour. The simplest way to do it is to convert the string returned by __str__ to unicode, as I did in .diff. If you have better suggestions let me know.
Author: Antoine Pitrou (pitrou) *
Date: 2009-12-13 01:52
Well the obvious problem with this approach is that it won't work if __str__() returns a non-ascii string. The only working solution would be to replicate the functioning of __str__() in each __unicode__() implementation.
Author: Alyssa Coghlan (ncoghlan) *
Date: 2009-12-13 04:24
As Antoine said, there's a reason BaseException now implements both __str__ and __unicode__ and doesn't implement the latter in terms of the former - it's the only way to consistently support Unicode arguments that can't be encoded to an 8-bit ASCII string:
>>> str(Exception(u"\xc3\xa0"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> unicode(Exception(u"\xc3\xa0"))
u'\xc3\xa0'
For some of the exception subclasses that will always return ASCII (e.g. KeyError, which calls repr() on its arguments), defining __unicode__ in terms of __str__ as Ezio suggests will work.
For others (as happened with BaseException itself), the __unicode__ method will need to be a reimplementation that avoids trying to encode potentially non-ASCII characters into an 8-bit ASCII string.
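A small sketch of the limitation discussed here: turning the result of __str__ into text through the ASCII codec (which is what Python 2's unicode() does with an 8-bit string by default) only works when that result is pure ASCII. The byte strings below are illustrative values, and decode_like_py2_unicode is a hypothetical helper, not part of any patch.

```python
def decode_like_py2_unicode(byte_string):
    """Hypothetical helper: decode an 8-bit string with the ASCII codec."""
    return byte_string.decode("ascii")


# An ASCII-only message (such as repr-based text) decodes without trouble.
ascii_message = b"'missing key'"
print(decode_like_py2_unicode(ascii_message))

# A message holding UTF-8 encoded non-ASCII bytes cannot be decoded.
non_ascii_message = b"f\xc3\xb6\xc3\xb6"
try:
    decode_like_py2_unicode(non_ascii_message)
except UnicodeDecodeError as exc:
    print("decode failed: %s" % exc.reason)
```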
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-13 04:39
What you said is only a special case, and I agree that the solution introduced with r64791 is correct for that. However, that fix has the side effect of breaking the code in other situations.
To summarize the possible cases and the behaviours I prepared the following list (odd numbers -> BaseException; even numbers -> any exception with overridden __str__ and no __unicode__):
1) 0 args, e = Exception():
   py2.5  : str(e) -> ''; unicode(e) -> u''
   py2.6  : str(e) -> ''; unicode(e) -> u''
   desired: str(e) -> ''; unicode(e) -> u''
   Note: this is OK
2) 0 args, e = MyException(), with overridden __str__:
   py2.5  : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   py2.6  : str(e) -> 'ascii' or error; unicode(e) -> u''
   desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   Note: py2.5 behaviour is better: if __str__ returns an ascii string (including ''), unicode(e) should return the same string decoded, if __str__ returns a non-ascii string, both should raise an error.
3a) 1 str arg, e = Exception('foo'):
   py2.5  : str(e) -> 'foo'; unicode(e) -> u'foo'
   py2.6  : str(e) -> 'foo'; unicode(e) -> u'foo'
   desired: str(e) -> 'foo'; unicode(e) -> u'foo'
   Note: this is OK
3b) 1 non-ascii unicode arg, e = Exception(u'föö'):
   py2.5  : str(e) -> error; unicode(e) -> error
   py2.6  : str(e) -> error; unicode(e) -> u'föö'
   desired: str(e) -> error; unicode(e) -> u'föö'
   Note: py2.6 behaviour is better: unicode(e) should return u'föö'
4) 1 unicode arg, e = MyException(u'föö'), with overridden __str__:
   py2.5  : str(e) -> error or 'ascii'; unicode(e) -> error or u'ascii'
   py2.6  : str(e) -> error or 'ascii'; unicode(e) -> u'föö'
   desired: str(e) -> error or 'ascii'; unicode(e) -> error or u'ascii'
   Note: py2.5 behaviour is better: if __str__ returns an ascii string str(e) should work, otherwise it should raise an error. unicode(e) should return the ascii string decoded or an error, not the arg.
5) >1 args of any type, e = Exception('foo', u'föö', 5):
   py2.5  : str(e) -> "('foo', u'f\xf6\xf6', 5)"; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   py2.6  : str(e) -> "('foo', u'f\xf6\xf6', 5)"; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   desired: str(e) -> "('foo', u'f\xf6\xf6', 5)"; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   Note: this is OK
6) >1 args of any type, e = MyException('foo', u'föö', 5), with overridden __str__:
   py2.5  : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   py2.6  : str(e) -> 'ascii' or error; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   Note: py2.5 behaviour is better: if __str__ returns an ascii string, unicode(e) should return the same string decoded, if __str__ returns a non-ascii string, both should raise an error.
As you can see, your example corresponds just to the case 3b) (now fixed), but cases 2, 4, 6 are now broken.
Making this list allowed me to come up with a new patch that seems to solve all the problems (2, 4 and 6, while leaving 3b as it is now). The only exception is KeyError: if we want it to print the repr, then KeyError_unicode should be implemented, but I think that Python only calls str() so it's probably not necessary.
Attached new patch that passes all the tests in issue6108_testcase except for KeyError. Unless you disagree with the 'desired behaviours' that I listed, this patch should fix the issue.
Author: Robert Collins (rbcollins) *
Date: 2009-12-13 04:44
"2) 0 args, e = MyException(), with overridden __str__:
   py2.5  : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   py2.6  : str(e) -> 'ascii' or error; unicode(e) -> u''
   desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   Note: py2.5 behaviour is better: if __str__ returns an ascii string (including ''), unicode(e) should return the same string decoded, if __str__ returns a non-ascii string, both should raise an error."
I'm not sure how you justify raising an unnecessary error when trying to stringify an exception as being 'better'.
str should not decode its arguments if they are already strings: they may be valid data for the user even if they are not decodable (and note that an implicit decode may try to decode('ascii'), which is totally useless).
str and unicode are /different/ things, claiming they have to behave the same is equivalent to claiming either that we don't need unicode, or that we don't need binary data.
Surely there is space for both things, which does imply that unicode(str(e)) != unicode(e).
Why should that be the same anyway?
Author: Alyssa Coghlan (ncoghlan) *
Date: 2009-12-13 04:49
I agree the 2.6 implementation creates backwards compatibility problems with subclasses that only override __str__ that we didn't recognise at the time.
An alternative approach that should work even for the KeyError case is for BaseException_unicode to check explicitly for the situation where the __str__ slot has been overridden but __unicode__ is still the BaseException version and invoke "PyObject_Unicode(PyObject_Str(self))" when it detects that situation.
That way subclasses that only override __str__ would continue to see the old behaviour, while subclasses that don't override either would continue to benefit from the new behaviour.
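The check proposed above can be sketched in pure Python. This is a hedged model, not the C patch: SketchBase stands in for BaseException, to_text() plays the role of __unicode__, and defining_class, OnlyStrOverridden and text_type are hypothetical names introduced only for this illustration.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def defining_class(cls, name):
    """Return the first class in the MRO that defines `name` itself."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None


class SketchBase(object):
    """Stand-in for BaseException; to_text() plays the role of __unicode__."""

    def __init__(self, *args):
        self.args = args

    def __str__(self):
        if len(self.args) == 0:
            return ""
        if len(self.args) == 1:
            return str(self.args[0])
        return str(self.args)

    def to_text(self):
        cls = type(self)
        str_overridden = defining_class(cls, "__str__") is not SketchBase
        text_inherited = defining_class(cls, "to_text") is SketchBase
        if str_overridden and text_inherited:
            # The subclass customised __str__ only: reuse its formatting.
            return text_type(str(self))
        if len(self.args) == 0:
            return text_type()
        if len(self.args) == 1:
            return text_type(self.args[0])
        return text_type(self.args)


class OnlyStrOverridden(SketchBase):
    def __str__(self):
        return "custom message for %d args" % len(self.args)


print(SketchBase("foo", 5).to_text())
print(OnlyStrOverridden("foo", 5).to_text())
```

In this model a subclass that overrides only __str__ gets its own message back from to_text(), while the base class keeps the generic conversion of its arguments.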
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-13 04:51
Assume the case of e = MyException() (note: 0 args) with a __str__ that returns a default message. Now, if the message is ascii, str(e) works and the user sees the default message, but unicode(e) returns a not-so-useful empty string. On the other hand, if __str__ returns a non-ascii string, then it's wrong in the first place, because str(e) will fail and returning an empty string with unicode(e) is not going to help.
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-13 04:56
> An alternative approach that should work even for the KeyError case is for BaseException_unicode to check explicitly for the situation where the __str__ slot has been overridden but __unicode__ is still the BaseException version and invoke "PyObject_Unicode(PyObject_Str(self))" when it detects that situation.
This is even better, I'll try to do it.
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-13 06:01
Here is a new patch (-3.patch) that checks if __str__ has been overridden and calls PyObject_Unicode(PyObject_Str(self)).
All the tests (including the one with KeyError) in issue6108_testcase.diff now pass.
If the patch is OK I'll make sure that the tests cover all the possible cases that I listed and possibly add a few more before the commit.
Author: Antoine Pitrou (pitrou) *
Date: 2009-12-13 12:52
You should check the return value from PyObject_Str().
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-20 19:29
I created a comprehensive set of tests to check all the possibilities that I listed in and updated the patch for Objects/exceptions.c. Without the patch all the test_*with_overridden__str_ and test_builtin_exceptions fail, both on 2.6 and on trunk; with the patch all the tests pass. The code in exceptions.c now does the equivalent of unicode(e.__str__()) instead of unicode(str(e)). If e.__str__() returns a non-ascii unicode string, unicode() now shows the message instead of raising an error.
Author: Antoine Pitrou (pitrou) *
Date: 2009-12-21 11:54
> I created a comprehensive set of tests to check all the possibilities that I listed in and updated the patch for Objects/exceptions.c.
Great! Small thing: in tests, you should use setUp() to initialize test data rather than __init__().
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-21 16:00
I updated the patch and moved the helper class outside the __init__.
Author: Antoine Pitrou (pitrou) *
Date: 2009-12-21 16:05
This looks fine, modulo the slight style issue mentioned on IRC. Please commit after you fix it. (This is assuming all tests pass, of course!)
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-21 17:40
This should be the final patch (-6.patch). I updated the comments, checked that (some of) the tests fail without the patch, that they (all) pass with it and that there are no leaks. I plan to backport this on 2.6 and possibly port the tests to py3k and 3.1.
Author: Antoine Pitrou (pitrou) *
Date: 2009-12-21 18:34
It's ok for me.
Author: Ezio Melotti (ezio.melotti) *
Date: 2009-12-24 23:03
Fixed in r77045 (trunk) and r77046 (release26-maint). No need to port it to py3k since unicode() is gone.
History
Date | User | Action | Args
2022-04-11 14:56:49 | admin | set | nosy: + benjamin.peterson; github: 50358
2009-12-24 23:03:08 | ezio.melotti | set | status: open -> closed; messages: +; keywords: - needs review; resolution: accepted -> fixed; stage: commit review -> resolved
2009-12-21 18:34:06 | pitrou | set | resolution: accepted; messages: +; stage: patch review -> commit review
2009-12-21 17:40:36 | ezio.melotti | set | files: + issue6108-6.patch; resolution: accepted -> (no value); messages: +; stage: commit review -> patch review
2009-12-21 16:05:58 | pitrou | set | resolution: accepted; messages: +; stage: patch review -> commit review
2009-12-21 16:00:29 | ezio.melotti | set | files: + issue6108-5.patch; messages: +
2009-12-21 11:54:50 | pitrou | set | messages: +
2009-12-20 19:29:52 | ezio.melotti | set | files: + issue6108-4.patch; messages: +
2009-12-13 12:52:51 | pitrou | set | messages: +
2009-12-13 06:01:54 | ezio.melotti | set | files: + issue6108-3.patch; messages: +
2009-12-13 04:56:44 | ezio.melotti | set | messages: +
2009-12-13 04:51:43 | ezio.melotti | set | messages: +
2009-12-13 04:49:12 | ncoghlan | set | messages: +
2009-12-13 04:44:57 | rbcollins | set | nosy: + rbcollins; messages: +
2009-12-13 04:39:31 | ezio.melotti | set | keywords: + needs review; files: + issue6108-2.patch; messages: +; stage: needs patch -> patch review
2009-12-13 04:24:37 | ncoghlan | set | messages: +
2009-12-13 01:52:48 | pitrou | set | messages: +
2009-12-13 01:39:56 | ezio.melotti | set | messages: +
2009-12-13 01:09:45 | ncoghlan | set | messages: +
2009-12-12 15:53:07 | pitrou | set | nosy: + ncoghlan; messages: +
2009-12-12 03:21:58 | ezio.melotti | set | files: + issue6108.diff; assignee: ezio.melotti; messages: +; keywords: + patch
2009-11-12 05:25:54 | ezio.melotti | set | files: + unicode_exceptions.py
2009-11-12 05:24:18 | ezio.melotti | set | keywords: - patch; files: + output_on_py26.txt
2009-11-12 05:23:02 | ezio.melotti | set | files: - unicode_exceptions.py
2009-11-12 05:22:50 | ezio.melotti | set | files: + issue6108_testcase.diff; priority: high -> release blocker; title: unicode(exception) behaves differently on Py2.6 when len(exception.args) > 1 -> unicode(exception) and str(exception) should return the same message on Py2.6; messages: +; keywords: + patch
2009-09-29 18:31:08 | barry | set | priority: release blocker -> high; nosy: + barry; messages: +
2009-09-16 19:59:17 | georg.brandl | link |
2009-09-16 19:41:34 | georg.brandl | set | priority: high -> release blocker
2009-09-13 14:36:34 | pitrou | set | priority: high; versions: + Python 2.7; nosy: + pitrou; messages: +; stage: needs patch
2009-09-13 05:00:41 | exarkun | set | nosy: + exarkun; messages: +
2009-09-12 08:45:38 | cvrebert | set | nosy: + cvrebert
2009-05-29 07:37:10 | georg.brandl | link |
2009-05-26 04:53:13 | ezio.melotti | create |