Issue 6108: unicode(exception) and str(exception) should return the same message on Py2.6

Created on 2009-05-26 04:53 by ezio.melotti, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (25)

msg88330 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-05-26 04:52

On Python 2.5 str(exception) and unicode(exception) return the same text:

>>> err
UnicodeDecodeError('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')
>>> str(err)
"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)"
>>> unicode(err)
u"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)"

On Python 2.6 unicode(exception) returns unicode(exception.args):

>>> err
UnicodeDecodeError('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')
>>> str(err)
"'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)"
>>> unicode(err)
u"('ascii', '\xc3\xa0', 0, 1, 'ordinal not in range(128)')"

This seems to affect only exceptions with more than 1 arg (e.g. UnicodeErrors and SyntaxErrors). KeyError is also different (the '' are missing with unicode()).

Note that when an exception like ValueError() is instantiated with more than 1 arg even str() returns str(exception.args) on both Py2.5 and Py2.6.

Probably str() checks the number of args before returning a specific message and if it doesn't match it returns str(self.args). unicode() instead seems to always return unicode(self.args) on Py2.6.

Attached there's a script that prints the repr(), str() and unicode() of some exceptions, run it on Py2.5 and Py2.6 to see the differences.
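
For readers without a Python 2 interpreter, the str() side of this report can still be checked on Python 3, where unicode() no longer exists. This is only a small sketch, not the attached script; the argument values are taken from the example above or made up for illustration.

```python
# Python 3 sketch: str() on built-in exceptions with several arguments.
# (On Python 3 the object argument of UnicodeDecodeError must be bytes.)
err = UnicodeDecodeError("ascii", b"\xc3\xa0", 0, 1, "ordinal not in range(128)")

# UnicodeDecodeError has its own __str__ that formats the five arguments.
formatted = str(err)

# ValueError has no __str__ of its own, so with more than one argument
# the result is the string form of the args tuple.
generic = str(ValueError("foo", "bar"))

# KeyError shows the repr of a single key, hence the quotes.
key_text = str(KeyError("missing"))

print(formatted)
print(generic)
print(key_text)
```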

msg92561 - (view)

Author: Jean-Paul Calderone (exarkun) * (Python committer)

Date: 2009-09-13 05:00

Perhaps also worth noting is that in Python 2.4 as well, str(exception) and unicode(exception) returned the same thing. Unlike some other exception changes in 2.6, this doesn't seem to be a return to older behavior, but just a new behavior. (Or maybe no one cares about that; just wanted to point it out, though.)

msg92568 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-09-13 14:36

Looks like a potentially annoying bug to me.

msg93313 - (view)

Author: Barry A. Warsaw (barry) * (Python committer)

Date: 2009-09-29 18:31

Since we do not yet have a patch for this, I'm knocking it off the list for 2.6.3. It seems like an annoying loss of compatibility, but do we have any reports of it breaking real-world code?

msg95158 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-11-12 05:22

I added the output of unicode_exceptions.py on Py2.6 and a testcase (against the trunk) that fails for 5 different exceptions, including the IOError mentioned in #6890 (also added to unicode_exceptions.py). The problem has been introduced by #2517.

msg96281 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-12 03:21

In r64791, BaseException gained a new __unicode__ method that does the equivalent of the following things:

if the number of args is 0, returns u''
if it's 1, returns unicode(self.args[0])
if it's >1, returns unicode(self.args)

Before this, BaseException only had a __str__ method, so unicode(e) (with e being an exception derived from BaseException) called:

e.__str__().decode(), if e didn't implement __unicode__
e.__unicode__(), if e implemented a __unicode__ method

Now, all the derived exceptions that don't implement their own __unicode__ method inherit the "generic" __unicode__ of BaseException, and they use that instead of falling back on __str__. This is generally ok if the number of args is 0 or 1, but if there are more args, there's usually some specific formatting in the __str__ method that is lost when BaseException.__unicode__ returns unicode(self.args).
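
The three rules above can be modelled in a few lines. This is a Python 3 sketch of the behaviour as described in this message, not the C code from r64791; generic_text is a made-up name, and str() stands in for Python 2's unicode().

```python
def generic_text(args):
    """Model of the generic conversion: '', the single arg, or the tuple."""
    if len(args) == 0:
        return ""
    if len(args) == 1:
        return str(args[0])
    return str(tuple(args))


# An exception whose own __str__ formats its five arguments.
err = UnicodeDecodeError("ascii", b"\xc3\xa0", 0, 1, "ordinal not in range(128)")

specific = str(err)               # the message produced by the subclass
generic = generic_text(err.args)  # what the generic rules produce instead

print(specific)
print(generic)
```

The second line printed is just the args tuple, which is the loss of formatting reported in this issue.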

Possible solutions:

implement a __unicode__ method that does the equivalent of calling unicode(str(self)) (i.e. converting to unicode the message returned by __str__ instead of converting self.args);
implement a __unicode__ method that formats the message the way __str__ does, for all the exceptions with a __str__ that does some specific formatting.

Attached there's a proof of concept (.diff) where I tried to implement the first method with UnicodeDecodeError. This method can be used as long as __str__ always returns only ascii.

The patch seems to work fine for me (note: this is my first attempt to use the C API). If the approach is correct I can do the same for the other exceptions too and submit a proper patch.

msg96297 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-12-12 15:53

> In r64791, BaseException gained a new __unicode__ method that does the equivalent of the following things:

It remains to be seen why that behaviour was chosen. Apparently Nick implemented it. IMO __unicode__ should have the same behaviour as __str__. There's no reason to implement two different formatting algorithms.

msg96313 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2009-12-13 01:09

Following this down the rabbit hole a little further: Issue #2517 (the origin of my checkin) was just a restoration of the __unicode__ slot implementation that had been ripped out in r51837 due to Issue #1551432.

At the time of the r64791 checkin, BaseException_str and BaseException_unicode were identical aside from the type of object returned (checking SVN head shows they're actually still identical).

However, it looks like several exceptions with __str__ overrides (i.e. Unicode[Encode/Decode/Translate]Error_str, EnvironmentError_str, WindowsError_str, SyntaxError_str, KeyError_str) are missing corresponding __unicode__ overrides, so invoking unicode() on them falls back to the BaseException_unicode implementation instead of using the custom formatting behaviour of the subclass.

msg96314 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-13 01:39

> IMO __unicode__ should have the same behaviour as __str__. There's no reason to implement two different formatting algorithms.

If BaseException has both methods, derived exceptions have to override both of them in order to get the same behaviour. The simplest way to do it is to convert the string returned by __str__ to unicode, as I did in .diff. If you have better suggestions let me know.

msg96315 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-12-13 01:52

Well the obvious problem with this approach is that it won't work if str() returns a non-ascii string. The only working solution would be to replicate the functioning of str() in each unicode() implementation.

msg96318 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2009-12-13 04:24

As Antoine said, there's a reason BaseException now implements both __str__ and __unicode__ and doesn't implement the latter in terms of the former - it's the only way to consistently support Unicode arguments that can't be encoded to an 8-bit ASCII string:

>>> str(Exception(u"\xc3\xa0"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
>>> unicode(Exception(u"\xc3\xa0"))
u'\xc3\xa0'

For some of the exception subclasses that will always return ASCII (e.g. KeyError, which calls repr() on its arguments), defining __unicode__ in terms of __str__ as Ezio suggests will work.

For others (as happened with BaseException itself), the __unicode__ method will need to be a reimplementation that avoids trying to encode potentially non-ASCII characters into an 8-bit ASCII string.
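
The encoding step behind that traceback can be reproduced on Python 3 with an explicit encode() call. This is a sketch of the underlying failure only; ascii() is used here as a stand-in for Python 2's repr(), which likewise escaped non-ASCII characters.

```python
text = "\xc3\xa0"  # the same two non-ASCII code points as in the session above

reason = None
try:
    # Python 2's str() had to squeeze the text into an ASCII byte string.
    text.encode("ascii")
    direct_ok = True
except UnicodeEncodeError as exc:
    direct_ok = False
    reason = str(exc)

# An escaped representation is plain ASCII, so the same step succeeds.
escaped = ascii(text)
escaped_bytes = escaped.encode("ascii")

print(direct_ok, reason)
print(escaped_bytes)
```

This is why a message built from repr() of the arguments (the KeyError case) is safe to pass through an 8-bit string, while the raw argument is not.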

msg96319 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-13 04:39

What you said is only a special case, and I agree that the solution introduced with r64791 is correct for that. However, that fix has the side effect of breaking the code in other situations.

To summarize the possible cases and the behaviours I prepared the following list (odd numbers -> BaseException; even numbers -> any exception with overridden __str__ and no __unicode__):

1) 0 args, e = Exception():
   py2.5  : str(e) -> ''; unicode(e) -> u''
   py2.6  : str(e) -> ''; unicode(e) -> u''
   desired: str(e) -> ''; unicode(e) -> u''
   Note: this is OK

2) 0 args, e = MyException(), with overridden __str__:
   py2.5  : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   py2.6  : str(e) -> 'ascii' or error; unicode(e) -> u''
   desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   Note: py2.5 behaviour is better: if __str__ returns an ascii string (including ''), unicode(e) should return the same string decoded, if __str__ returns a non-ascii string, both should raise an error.

3a) 1 str arg, e = Exception('foo'):
   py2.5  : str(e) -> 'foo'; unicode(e) -> u'foo'
   py2.6  : str(e) -> 'foo'; unicode(e) -> u'foo'
   desired: str(e) -> 'foo'; unicode(e) -> u'foo'
   Note: this is OK

3b) 1 non-ascii unicode arg, e = Exception(u'föö'):
   py2.5  : str(e) -> error; unicode(e) -> error
   py2.6  : str(e) -> error; unicode(e) -> u'föö'
   desired: str(e) -> error; unicode(e) -> u'föö'
   Note: py2.6 behaviour is better: unicode(e) should return u'föö'

4) 1 unicode arg, e = MyException(u'föö'), with overridden __str__:
   py2.5  : str(e) -> error or 'ascii'; unicode(e) -> error or u'ascii'
   py2.6  : str(e) -> error or 'ascii'; unicode(e) -> u'föö'
   desired: str(e) -> error or 'ascii'; unicode(e) -> error or u'ascii'
   Note: py2.5 behaviour is better: if __str__ returns an ascii string str(e) should work, otherwise it should raise an error. unicode(e) should return the ascii string decoded or an error, not the arg.

5) >1 args of any type, e = Exception('foo', u'föö', 5):
   py2.5  : str(e) -> "('foo', u'f\xf6\xf6', 5)"; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   py2.6  : str(e) -> "('foo', u'f\xf6\xf6', 5)"; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   desired: str(e) -> "('foo', u'f\xf6\xf6', 5)"; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   Note: this is OK

6) >1 args of any type, e = MyException('foo', u'föö', 5), with overridden __str__:
   py2.5  : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   py2.6  : str(e) -> 'ascii' or error; unicode(e) -> u"('foo', u'f\xf6\xf6', 5)";
   desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   Note: py2.5 behaviour is better: if __str__ returns an ascii string, unicode(e) should return the same string decoded, if __str__ returns a non-ascii string, both should raise an error.

As you can see, your example corresponds just to the case 3b) (now fixed), but cases 2, 4, 6 are now broken.

Making this list allowed me to come up with a new patch that seems to solve all the problems (2, 4 and 6, while leaving 3b as it is now). The only exception is KeyError: if we want it to print the repr, then KeyError_unicode should be implemented, but I think that Python only calls str() so it's probably not necessary.

Attached new patch that passes all the tests in issue6108_testcase except for KeyError. Unless you disagree with the 'desired behaviours' that I listed, this patch should fix the issue.

msg96321 - (view)

Author: Robert Collins (rbcollins) * (Python committer)

Date: 2009-12-13 04:44

"2) 0 args, e = MyException(), with overridden __str__:
   py2.5  : str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   py2.6  : str(e) -> 'ascii' or error; unicode(e) -> u''
   desired: str(e) -> 'ascii' or error; unicode(e) -> u'ascii' or error;
   Note: py2.5 behaviour is better: if __str__ returns an ascii string (including ''), unicode(e) should return the same string decoded, if __str__ returns a non-ascii string, both should raise an error."

I'm not sure how you justify raising an unnecessary error when trying to stringify an exception as being 'better'.

__str__ should not decode its arguments if they are already strings: they may be valid data for the user even if they are not decodable (and note that an implicit decode may try to decode('ascii'), which is totally useless).

str and unicode are /different/ things, claiming they have to behave the same is equivalent to claiming either that we don't need unicode, or that we don't need binary data.

Surely there is space for both things, which does imply that unicode(str(e)) != unicode(e).

Why should that be the same anyway?

msg96322 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2009-12-13 04:49

I agree the 2.6 implementation creates backwards compatibility problems with subclasses that only override __str__, problems that we didn't recognise at the time.

An alternative approach that should work even for the KeyError case is for BaseException_unicode to check explicitly for the situation where the __str__ slot has been overridden but __unicode__ is still the BaseException version and invoke "PyObject_Unicode(PyObject_Str(self))" when it detects that situation.

That way subclasses that only override __str__ would continue to see the old behaviour, while subclasses that don't override either would continue to benefit from the new behaviour.
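
A rough Python 3 model of that check, for illustration only: the real change would live in the C function BaseException_unicode, and to_text, generic_text and ConfigError are hypothetical names invented for this sketch.

```python
def generic_text(exc):
    """Generic conversion for exceptions that do not override __str__."""
    if len(exc.args) == 0:
        return ""
    if len(exc.args) == 1:
        return str(exc.args[0])
    return str(exc.args)


def to_text(exc):
    """Reuse the subclass's __str__ when it has one, else stay generic."""
    if type(exc).__str__ is not BaseException.__str__:
        return type(exc).__str__(exc)
    return generic_text(exc)


class ConfigError(Exception):
    """Hypothetical subclass that only overrides __str__."""

    def __str__(self):
        return "bad value %r for option %r" % (self.args[1], self.args[0])


print(to_text(ConfigError("port", "abc")))  # custom formatting is kept
print(to_text(ValueError("foo", "bar")))    # generic tuple form
print(to_text(KeyError("missing")))         # KeyError's own __str__ is used
```

The identity test on the __str__ attribute plays the role of the slot comparison described above; it treats built-in overrides such as KeyError the same way as user-defined ones.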

msg96323 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-13 04:51

Assume the case of e = MyException() (note: 0 args) with a __str__ that returns a default message. Now, if the message is ascii, str(e) works and the user sees the default message, but unicode(e) returns a not-so-useful empty string. On the other hand, if __str__ returns a non-ascii string, then it's wrong in the first place, because str(e) will fail and returning an empty string with unicode(e) is not going to help.

msg96324 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-13 04:56

> An alternative approach that should work even for the KeyError case is for BaseException_unicode to check explicitly for the situation where the __str__ slot has been overridden but __unicode__ is still the BaseException version and invoke "PyObject_Unicode(PyObject_Str(self))" when it detects that situation.

This is even better, I'll try to do it.

msg96325 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-13 06:01

Here is a new patch (-3.patch) that checks if __str__ has been overridden and calls PyObject_Unicode(PyObject_Str(self)).

All the tests (including the one with KeyError) in issue6108_testcase.diff now pass.

If the patch is OK I'll make sure that the tests cover all the possible cases that I listed and possibly add a few more before the commit.

msg96331 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-12-13 12:52

You should check the return value from PyObject_Str().

msg96717 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-20 19:29

I created a comprehensive set of tests to check all the possibilities that I listed and updated the patch for Objects/exceptions.c. Without the patch, all the test_*_with_overridden___str__ tests and test_builtin_exceptions fail, both on 2.6 and on trunk; with the patch, all the tests pass. The code in exceptions.c now does the equivalent of unicode(e.__str__()) instead of unicode(str(e)). If e.__str__() returns a non-ascii unicode string, unicode() now shows the message instead of raising an error.
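
The difference between unicode(str(e)) and unicode(e.__str__()) can be imitated on Python 3 by making the ASCII round trip explicit. This is a model under stated assumptions, not the patched C code: py2_style_str mimics the requirement that Python 2's str() return an 8-bit ASCII string, and LocalizedError is a hypothetical exception.

```python
class LocalizedError(Exception):
    """Hypothetical exception whose __str__ returns non-ASCII text."""

    def __str__(self):
        return "f\xf6\xf6"


def py2_style_str(exc):
    # Model of Python 2 str(): the result must fit in an ASCII byte string.
    return type(exc).__str__(exc).encode("ascii")


def unicode_of_str(exc):
    # Model of unicode(str(e)): goes through the ASCII byte string.
    return py2_style_str(exc).decode("ascii")


def unicode_of_dunder_str(exc):
    # Model of unicode(e.__str__()): uses the returned text directly.
    return type(exc).__str__(exc)


e = LocalizedError()
print(unicode_of_dunder_str(e))
try:
    unicode_of_str(e)
except UnicodeEncodeError as exc:
    print("round trip failed:", exc)
```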

msg96742 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-12-21 11:54

> I created a comprehensive set of tests to check all the possibilities that I listed and updated the patch for Objects/exceptions.c.

Great! Small thing: in tests, you should use setUp() to initialize test data rather than __init__().
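
For reference, a minimal sketch of the setUp() pattern in unittest. The test class and its checks are made up for illustration and are not the tests from the patch.

```python
import unittest


class ExceptionTextTests(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets fresh data.
        self.err = ValueError("foo", "bar")

    def test_args_are_kept(self):
        self.assertEqual(self.err.args, ("foo", "bar"))

    def test_str_of_several_args(self):
        self.assertEqual(str(self.err), "('foo', 'bar')")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExceptionTextTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Overriding __init__() on a TestCase means dealing with the constructor arguments that the test loader passes in, which setUp() avoids.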

msg96755 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-21 16:00

I updated the patch and moved the helper class outside the __init__.

msg96756 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-12-21 16:05

This looks fine, modulo the slight style issue mentioned on IRC. Please commit after you fix it. (This is assuming all tests pass, of course!)

msg96761 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-21 17:40

This should be the final patch (-6.patch). I updated the comments, checked that (some of) the tests fail without the patch, that they (all) pass with it and that there are no leaks. I plan to backport this on 2.6 and possibly port the tests to py3k and 3.1.

msg96762 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2009-12-21 18:34

It's ok for me.

msg96872 - (view)

Author: Ezio Melotti (ezio.melotti) * (Python committer)

Date: 2009-12-24 23:03

Fixed in r77045 (trunk) and r77046 (release26-maint). No need to port it to py3k since unicode() is gone.

History

Date | User | Action | Args
2022-04-11 14:56:49 | admin | set | nosy: + benjamin.peterson; github: 50358
2009-12-24 23:03:08 | ezio.melotti | set | status: open -> closed; messages: +; keywords: - needs review; resolution: accepted -> fixed; stage: commit review -> resolved
2009-12-21 18:34:06 | pitrou | set | resolution: accepted; messages: +; stage: patch review -> commit review
2009-12-21 17:40:36 | ezio.melotti | set | files: + issue6108-6.patch; resolution: accepted -> (no value); messages: +; stage: commit review -> patch review
2009-12-21 16:05:58 | pitrou | set | resolution: accepted; messages: +; stage: patch review -> commit review
2009-12-21 16:00:29 | ezio.melotti | set | files: + issue6108-5.patch; messages: +
2009-12-21 11:54:50 | pitrou | set | messages: +
2009-12-20 19:29:52 | ezio.melotti | set | files: + issue6108-4.patch; messages: +
2009-12-13 12:52:51 | pitrou | set | messages: +
2009-12-13 06:01:54 | ezio.melotti | set | files: + issue6108-3.patch; messages: +
2009-12-13 04:56:44 | ezio.melotti | set | messages: +
2009-12-13 04:51:43 | ezio.melotti | set | messages: +
2009-12-13 04:49:12 | ncoghlan | set | messages: +
2009-12-13 04:44:57 | rbcollins | set | nosy: + rbcollins; messages: +
2009-12-13 04:39:31 | ezio.melotti | set | keywords: + needs review; files: + issue6108-2.patch; messages: +; stage: needs patch -> patch review
2009-12-13 04:24:37 | ncoghlan | set | messages: +
2009-12-13 01:52:48 | pitrou | set | messages: +
2009-12-13 01:39:56 | ezio.melotti | set | messages: +
2009-12-13 01:09:45 | ncoghlan | set | messages: +
2009-12-12 15:53:07 | pitrou | set | nosy: + ncoghlan; messages: +
2009-12-12 03:21:58 | ezio.melotti | set | files: + issue6108.diff; assignee: ezio.melotti; messages: +; keywords: + patch
2009-11-12 05:25:54 | ezio.melotti | set | files: + unicode_exceptions.py
2009-11-12 05:24:18 | ezio.melotti | set | keywords: - patch; files: + output_on_py26.txt
2009-11-12 05:23:02 | ezio.melotti | set | files: - unicode_exceptions.py
2009-11-12 05:22:50 | ezio.melotti | set | files: + issue6108_testcase.diff; priority: high -> release blocker; title: unicode(exception) behaves differently on Py2.6 when len(exception.args) > 1 -> unicode(exception) and str(exception) should return the same message on Py2.6; messages: +; keywords: + patch
2009-09-29 18:31:08 | barry | set | priority: release blocker -> high; nosy: + barry; messages: +
2009-09-16 19:59:17 | georg.brandl | link | issue6890 superseder
2009-09-16 19:41:34 | georg.brandl | set | priority: high -> release blocker
2009-09-13 14:36:34 | pitrou | set | priority: high; versions: + Python 2.7; nosy: + pitrou; messages: +; stage: needs patch
2009-09-13 05:00:41 | exarkun | set | nosy: + exarkun; messages: +
2009-09-12 08:45:38 | cvrebert | set | nosy: + cvrebert
2009-05-29 07:37:10 | georg.brandl | link | issue5274 superseder
2009-05-26 04:53:13 | ezio.melotti | create |