Issue 2517: Error when printing an exception containing a Unicode string
Created on 2008-03-30 23:13 by christoph, last changed 2022-04-11 14:56 by admin. This issue is now closed.
Messages (36)
Author: Christoph Burgmer (christoph)
Date: 2008-03-30 23:13
Python seems to have problems when an exception is thrown that contains non-ASCII text as a message and is converted to a string.
>>> try:
...     raise Exception(u'Error when printing ü')
... except Exception, e:
...     print e
...
Traceback (most recent call last):
  File "<stdin>", line 4, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 20: ordinal not in range(128)
See http://www.stud.uni-karlsruhe.de/~uyhc/de/content/python-and-exceptions-containing-unicode-messages
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-03-30 23:21
That is because Python encodes its error messages as ASCII by default, and "ü" is not in ASCII. You can fix this by using "print unicode_msg.encode("utf-8")" or something similar.
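For readers on a current interpreter, the failure can be reproduced by spelling out the ASCII encoding step. This is a minimal Python 3 sketch, not the Python 2 code path itself: in Python 2 the print statement performed the encoding implicitly, whereas here it is called explicitly. The variable names are illustrative.

```python
# Python 3 sketch: make the implicit ASCII encoding from Python 2 explicit.
message = "Error when printing \u00fc"  # same text as in the report

try:
    message.encode("ascii")  # roughly what Python 2 attempted before printing
    failure = None
except UnicodeEncodeError as exc:
    # Record which codec failed and at which index, as in the reported traceback.
    failure = (exc.encoding, exc.start, exc.reason)

# Encoding explicitly with UTF-8, as suggested in the reply, succeeds.
utf8_bytes = message.encode("utf-8")
```

The recorded position matches the "position 20" in the reported traceback, because "ü" is the 21st character of the message.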
Author: Christoph Burgmer (christoph)
Date: 2008-03-31 09:47
To be more precise: I see no way to convert the encapsulated non-ASCII data from the string in an easy way. Taking e from my last post, none of the following will work:
str(e)              # UnicodeDecodeError
e.__str__()         # UnicodeDecodeError
e.__unicode__()     # AttributeError
unicode(e)          # UnicodeDecodeError
unicode(e, 'utf8')  # TypeError
My workaround for now is to raise the exception with an already converted string (see the link I provided).
But as the tutorials speak of simply "print e" I guess the behaviour described above is some kind of a bug.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-03-31 11:58
Use: print unicode(e.message).encode("utf-8")
Author: Christoph Burgmer (christoph)
Date: 2008-03-31 12:19
Thanks, this does work.
But, where can I find the piece of information you just gave to me in the docs? I couldn't find any interface definition for Exceptions.
Furthermore, will this be regarded as a bug? From [1] I understand that "unicode(e)" and "unicode(e, 'utf8')" are supposed to work. No limitations are made on the type of the object. And I suppose that unicode() is the exact equivalent of str() in that it copes with unicode strings. It seems needlessly cumbersome that the string representation of an Exception cannot be returned as a Unicode string when its content is non-ASCII, whereas this kind of simple string conversion is expected to work with ASCII text.
Please reopen if my report does have a point.
[1] http://docs.python.org/lib/built-in-funcs.html
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-03-31 16:13
Note the interpreter cannot print the exception either:
>>> raise Exception(u'Error when printing ü')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
Exception>>>
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-03-31 21:36
I am going to reopen this issue for Py3k. The recommended encoding for Python source files in 2.x is ASCII; I wouldn't say correctly dealing with non-ASCII exceptions is fully supported. In 3.x, however, the recommended encoding is UTF-8, so this should work.
In Py3k, str(e) does work correctly (str is unicode in Py3k), and that will have to be used because the message attribute is gone in 3.x. However, the problem Amaury pointed out is not fixed. Exceptions that cannot be encoded into ASCII are silently not printed. I think a warning should at least be printed.
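The Py3k behaviour described here can be checked on any current Python 3 interpreter. The snippet below is a small sketch with illustrative variable names; it only shows that str() returns the text unchanged and that the 2.x-only message attribute no longer exists.

```python
# Python 3 sketch: str() on an exception returns the text without any encoding step.
err = Exception("Error when printing \u00fc")

text = str(err)                        # str is a Unicode type in Python 3
first_arg = err.args[0]                # the constructor argument is kept as-is
has_message = hasattr(err, "message")  # the Python 2 'message' attribute was removed
```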
Author: Christoph Burgmer (christoph)
Date: 2008-03-31 22:19
Though I welcome the reopening of the bug for Python 3.0, I must say that plans not to fix a core element rather surprise me.
I never believed Python to be a programming language with good Unicode integration. Several points were missing that would've been nice or even essential to have for good development with Unicode, most ignored for the sake of maintaining backward compatibility. This though is not the fault of the Unicode class itself and supporting packages.
Some modules like the one for CSV are lacking full Unicode support. But nevertheless the basic Python would always give you the possibility to use Unicode in (at least) a consistent way. For me raising exceptions does count as basic support like this.
So I still hope to see this solved for the 2.x versions which I read will be maintained even after the release of 3.0.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-03-31 22:30
> I never believed Python to be a programming language with good Unicode integration. Several points were missing that would've been nice or even essential to have for good development with Unicode, most ignored for the sake of maintaining backward compatibility. This though is not the fault of the Unicode class itself and supporting packages.
Many (including myself) agree with you. That's pretty much the whole point of Py3k. We want to fix the Python "warts" which can only be fixed by breaking backwards compatibility.
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *
Date: 2008-03-31 23:10
Even in 2.5, __str__ is allowed to return a Unicode object; we could change BaseException_str this way:
Index: exceptions.c
--- exceptions.c (revision 61957)
+++ exceptions.c (working copy)
@@ -108,6 +104,11 @@
         break;
     case 1:
         out = PyObject_Str(PyTuple_GET_ITEM(self->args, 0));
+        if (out == NULL && PyErr_ExceptionMatches(PyExc_UnicodeEncodeError))
+        {
+            PyErr_Clear();
+            out = PyObject_Unicode(PyTuple_GET_ITEM(self->args, 0));
+        }
         break;
     default:
         out = PyObject_Str(self->args);
Then str(e) still raises UnicodeEncodeError, but unicode(e) returns the original message.
But I would like the opinion of an experienced core developer...
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-04-01 02:22
After thinking some more, I'm going to add 2.6 to this. I'm attaching a patch for the trunk (it can be merged in Py3k, and maybe 2.5) which displays a UnicodeWarning when an Exception cannot be displayed due to encoding issues.
Georg, can you review Amaury's and my patches? Also, would mine be a candidate for 2.5 backporting?
Author: Antoine Pitrou (pitrou) *
Date: 2008-04-01 06:41
Shouldn't it be an exception rather than a warning? The fact that an exception can be downgraded to a warning (and thus involuntarily silenced) is a bit disturbing IMHO.
Another possibility would be to display the warning, and then to encode the exception message again in "replace" or "ignore" mode rather than "strict" mode. That way exception messages are always displayed, but not always properly. The ASCII part of the message is generally useful, since it gives the exception name and most often the reason too.
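The second suggestion (warn, then encode again with a lenient error handler) can be sketched as follows. This is a Python 3 illustration under stated assumptions, not the patch discussed in this issue: display_text is a hypothetical helper name, and the real display code lives in C.

```python
import warnings


def display_text(message, encoding="ascii"):
    """Hypothetical helper: encode strictly, else warn and fall back to 'replace'."""
    try:
        return message.encode(encoding)  # "strict" is the default error mode
    except UnicodeEncodeError:
        warnings.warn(
            "exception message could not be encoded as %s" % encoding,
            UnicodeWarning,
        )
        # Unencodable characters become '?', so the ASCII part stays readable.
        return message.encode(encoding, "replace")


# Capture the warning so that both effects can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    shown = display_text("Error when printing \u00fc")
```

With "ignore" instead of "replace" the unencodable character would be dropped rather than replaced; either way the message is always displayed, though not always faithfully.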
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-04-01 12:44
Have you looked at PyErr_Display? There are many, many possible exceptions, and it ignores them all because "too many callers rely on this." So, I think all we can do is warn. I will look into encoding the message differently.
Author: Christoph Burgmer (christoph)
Date: 2008-04-02 17:42
JFTR:
print unicode(e.message).encode("utf-8") only works in Python 2.5, not in earlier versions.
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-04-02 20:57
We can't do much about that because only security fixes are backported to versions before 2.5.
Author: Simon Cross (hodgestar)
Date: 2008-06-09 13:39
One of the examples Christoph tried was
unicode(Exception(u'\xe1'))
which fails quite oddly with:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)
The reason for this is that Exception lacks a __unicode__ method implementation, so unicode(e) does something like unicode(str(e)), which attempts to convert the exception arguments to the default encoding (almost always ASCII) and fails.
Fixing this seems quite important. It's common to want to raise errors with non-ASCII characters (e.g. when the data which caused the error contains such characters). Usually the code raising the error has no way of knowing how the characters should be encoded (exceptions can end up being written to log files, displayed in web interfaces, that sort of thing). This means raising exceptions with unicode messages. Using unicode(e.message) is unattractive since it won't work in 3.0 and also does not duplicate str(e)'s handling of the other exception init arguments.
I'm attaching a patch which implements __unicode__ for BaseException. Because of the lack of a tp_unicode slot to mirror the tp_str slot, this breaks the test that calls unicode(Exception). The existing test for unicode(e) does unicode(Exception(u"Foo")), which is a bit of a non-test. My patch adds a test of unicode(Exception(u'\xe1')), which fails without the patch.
A quick look through trunk suggests implementing tp_unicode actually wouldn't be a huge job. My worry is that this would constitute a change to the C API for PyObjects and has little chance of acceptance into 2.6 (and in 3.0 all these issues disappear anyway). If there is some chance of acceptance, I'm willing to write a patch that adds tp_unicode.
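The point about str(e)'s handling of the other exception init arguments can be illustrated with the zero, one, and several argument cases. The sketch below runs on Python 3, where the same three-way distinction applies; it is only an illustration of the behaviour, not of the attached patch.

```python
# Python 3 sketch: str() on an exception depends on how many arguments it was given.
no_args = str(Exception())                 # empty string
one_arg = str(Exception("\u00e1"))         # the single argument, converted to text
two_args = str(Exception("\u00e1", 42))    # the text form of the whole args tuple
```

A workaround that reads only a single message attribute would reproduce the one-argument case but not the other two.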
Author: David Fraser (davidfraser)
Date: 2008-06-09 15:53
Aha - the __unicode__ method was previously there in Python 2.5, and was ripped out because of the unicode(Exception) problem. See http://bugs.python.org/issue1551432.
The reversion is in http://svn.python.org/view/python/trunk/Objects/exceptions.c?rev=51837&r1=51770&r2=51837
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-06-09 15:56
On Mon, Jun 9, 2008 at 8:40 AM, Simon Cross <report@bugs.python.org> wrote:
> Simon Cross <hodgestar@gmail.com> added the comment:
> One of the examples Christoph tried was
> unicode(Exception(u'\xe1'))
> which fails quite oddly with:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)
> The reason for this is that Exception lacks a __unicode__ method implementation, so unicode(e) does something like unicode(str(e)), which attempts to convert the exception arguments to the default encoding (almost always ASCII) and fails.
What version are you using? In Py3k, str is unicode so __str__ can return a unicode string.
> Fixing this seems quite important. It's common to want to raise errors with non-ASCII characters (e.g. when the data which caused the error contains such characters). Usually the code raising the error has no way of knowing how the characters should be encoded (exceptions can end up being written to log files, displayed in web interfaces, that sort of thing). This means raising exceptions with unicode messages. Using unicode(e.message) is unattractive since it won't work in 3.0 and also does not duplicate str(e)'s handling of the other exception init arguments.
> I'm attaching a patch which implements __unicode__ for BaseException. Because of the lack of a tp_unicode slot to mirror the tp_str slot, this breaks the test that calls unicode(Exception). The existing test for unicode(e) does unicode(Exception(u"Foo")), which is a bit of a non-test. My patch adds a test of unicode(Exception(u'\xe1')), which fails without the patch.
> A quick look through trunk suggests implementing tp_unicode actually wouldn't be a huge job. My worry is that this would constitute a change to the C API for PyObjects and has little chance of acceptance into 2.6 (and in 3.0 all these issues disappear anyway). If there is some chance of acceptance, I'm willing to write a patch that adds tp_unicode.
Email Python-dev for permission.
Author: Simon Cross (hodgestar)
Date: 2008-06-09 16:03
Concerning http://bugs.python.org/issue1551432:
I'd much rather have working unicode(e) than working unicode(Exception). Calling unicode(C) on any class C which overrides __unicode__ is broken without tp_unicode anyway.
Author: Simon Cross (hodgestar)
Date: 2008-06-09 16:11
Benjamin Peterson wrote:
> What version are you using? In Py3k, str is unicode so __str__ can return a unicode string.
I'm sorry it wasn't clear. I'm aware that this issue doesn't apply to Python 3.0. I'm testing on both Python 2.5 and Python 2.6 for the purposes of the bug.
The code I'm developing hits these issues through database exceptions with unicode messages raised inside MySQLdb on Python 2.5.
The patch I submitted is against trunk.
Author: Marc-Andre Lemburg (lemburg) *
Date: 2008-06-09 16:20
Removing 3.0 from the versions list.
Author: David Fraser (davidfraser)
Date: 2008-06-09 19:04
So I've got a follow-up patch that adds tp_unicode. Caveat that I've never done anything like this before and it's almost certain to be wrong.
It does however generate the desired result in this case :-)
Author: Benjamin Peterson (benjamin.peterson) *
Date: 2008-06-09 19:09
On Mon, Jun 9, 2008 at 2:04 PM, David Fraser <report@bugs.python.org> wrote:
> David Fraser <davidf@sjsoft.com> added the comment:
> So I've got a follow-up patch that adds tp_unicode. Caveat that I've never done anything like this before and it's almost certain to be wrong.
Unfortunately, adding a slot is a bit more complicated. You have to deal with inheritance and such. Have a look in typeobject.c for all the gory details. I'd recommend you write to python-dev before going on the undertaking, though.
> It does however generate the desired result in this case :-)
> Added file: http://bugs.python.org/file10562/tp_unicode_exception.patch
Author: Alyssa Coghlan (ncoghlan) *
Date: 2008-06-11 09:32
As far as I am concerned, the implementation of PyObject_Unicode in object.c has a bug in it: it should NEVER be retrieving __unicode__ from the instance object. The implementation of PyObject_Format in abstract.c shows the correct way to retrieve a pseudo-slot method like __unicode__ from an arbitrary object.
Line 482 in object.c is the offending line: func = PyObject_GetAttr(v, unicodestr);
Fix that bug, then add a __unicode__ method back to Exception objects and you will have the best of both worlds.
Author: Marc-Andre Lemburg (lemburg) *
Date: 2008-06-11 09:47
On 2008-06-11 11:32, Nick Coghlan wrote:
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
> As far as I am concerned, the implementation of PyObject_Unicode in object.c has a bug in it: it should NEVER be retrieving __unicode__ from the instance object. The implementation of PyObject_Format in abstract.c shows the correct way to retrieve a pseudo-slot method like __unicode__ from an arbitrary object.
The only difference I can spot is that the PyObject_Format() code special cases non-instance objects.
> Line 482 in object.c is the offending line: func = PyObject_GetAttr(v, unicodestr);
> Fix that bug, then add a __unicode__ method back to Exception objects and you will have the best of both worlds.
I'm not sure whether that would really solve anything.
IMHO, it's better to implement the tp_unicode slot and then check that before trying .__unicode__ (as mentioned in the comment in PyObject_Unicode()).
Author: Alyssa Coghlan (ncoghlan) *
Date: 2008-06-11 10:02
Here's the key difference with the way PyObject_Format looks up the pseudo-slot method:
PyObject *method = _PyType_Lookup(Py_TYPE(obj),
str__format__);
_PyType_Lookup instead of PyObject_GetAttr - so unicode(Exception) would only look for type.__unicode__ and avoid getting confused by the utterly irrelevant Exception.__unicode__ method (which is intended only for printing Exception instances, not for printing the Exception type itself).
You then need the PyInstance_Check/PyObject_GetAttr special case for retrieving the bound method because _PyType_Lookup won't work on classic class instances.
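The difference between the two lookups can be sketched in pure Python. This is a rough Python 3 analogue, not the C code: lookup_on_type is a hypothetical helper that imitates what _PyType_Lookup(Py_TYPE(obj), name) does by searching only the object's type and its bases, and __str__ stands in for __unicode__, which does not exist in Python 3.

```python
class Demo:
    def __str__(self):
        return "text for a Demo instance"


def lookup_on_type(obj, name):
    """Hypothetical helper: search type(obj) and its bases, never obj itself."""
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return None


# Plain attribute access on the class finds the method meant for instances ...
via_getattr = getattr(Demo, "__str__")
# ... while a type-based lookup on the class consults type(Demo), i.e. `type`.
via_type = lookup_on_type(Demo, "__str__")

try:
    via_getattr()  # fails: there is no instance to pass as self
    called_ok = True
except TypeError:
    called_ok = False

# For an instance, the type-based lookup still finds the class's own method.
via_instance = lookup_on_type(Demo(), "__str__")
```

Here the getattr-style lookup on the class returns a method that cannot be called without an instance, which is the same kind of confusion described for unicode(Exception); the type-based lookup avoids it.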
Author: Simon Cross (hodgestar)
Date: 2008-06-11 11:54
Attached a patch which implements Nick Coghlan's suggestion. All existing tests in test_exceptions.py and test_unicode.py pass as does the new unicode(Exception(u"\xe1")) test.
Author: Alyssa Coghlan (ncoghlan) *
Date: 2008-06-11 14:15
Minor cleanup of Simon's patch attached - aside from a couple of unneeded whitespace changes, it all looks good to me.
Not checking it in yet, since it isn't critical for this week's beta release - I'd prefer to leave it until after that has been dealt with.
Author: Marc-Andre Lemburg (lemburg) *
Date: 2008-06-11 14:33
On 2008-06-11 16:15, Nick Coghlan wrote:
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
> Minor cleanup of Simon's patch attached - aside from a couple of unneeded whitespace changes, it all looks good to me.
> Not checking it in yet, since it isn't critical for this week's beta release - I'd prefer to leave it until after that has been dealt with.
> Added file: http://bugs.python.org/file10585/exception-unicode-with-type-fetch-no-whitespace-changes.diff
That approach is fine as well.
I still like the idea to add a tp_unicode slot, though, since that's still missing for C extension types to benefit from.
Perhaps we can have both ?!
Author: Alyssa Coghlan (ncoghlan) *
Date: 2008-06-11 14:49
I'm not sure adding a dedicated method slot would be worth the hassle involved - Py3k drops back to just the tp_str slot anyway, and the only thing you gain with a tp_unicode slot over _PyType_Lookup of a __unicode__ attribute is a small reduction in memory usage and a slight speed increase.
Author: Simon Cross (hodgestar)
Date: 2008-06-11 14:53
Re:
> Minor cleanup of Simon's patch attached - aside from a couple of unneeded whitespace changes, it all looks good to me.
> Not checking it in yet, since it isn't critical for this week's beta release - I'd prefer to leave it until after that has been dealt with.
Thanks for the clean-up, Nick. The mixture of tabs and spaces in the current object.c was unpleasant :/.
Author: Marc-Andre Lemburg (lemburg) *
Date: 2008-06-11 16:50
On 2008-06-11 16:49, Nick Coghlan wrote:
> Nick Coghlan <ncoghlan@gmail.com> added the comment:
> I'm not sure adding a dedicated method slot would be worth the hassle involved - Py3k drops back to just the tp_str slot anyway, and the only thing you gain with a tp_unicode slot over _PyType_Lookup of a __unicode__ attribute is a small reduction in memory usage and a slight speed increase.
AFAIK, _PyType_Lookup will only work for base types, i.e. objects subclassing from object. C extension types often do not inherit from object, since the attribute access mechanisms and object creation are a lot simpler when not doing so.
Author: Simon Cross (hodgestar)
Date: 2008-06-19 08:24
Just prodding the issue again now that the betas are out. What's the next step?
Author: Alyssa Coghlan (ncoghlan) *
Date: 2008-07-07 12:53
Adding this to my personal to-do list for the next beta release.
Author: Alyssa Coghlan (ncoghlan) *
Date: 2008-07-08 14:14
Fixed in 64791.
Blocked from being merged to Py3k (since there is no longer a __unicode__ special method).
For MAL: the PyInstance_Check included in the patch for the benefit of classic classes defined in Python code also covers all of the classic C extension classes which are not instances of object.
Author: Piotr Dobrogost (piotr.dobrogost)
Date: 2019-01-10 21:18
Benjamin Peterson in comment https://bugs.python.org/issue2517#msg64771 wrote:
"That is because Python encodes it's error messages as ASCII by default…"
Could somebody please point where in the source code of Python 2 this happens?
History
Date | User | Action | Args
2022-04-11 14:56:32 | admin | set | github: 46769
2019-01-10 21:18:29 | piotr.dobrogost | set | messages: +
2019-01-08 22:02:42 | piotr.dobrogost | set | nosy: + piotr.dobrogost
2009-09-17 10:12:59 | ezio.melotti | set | nosy: + ezio.melotti
2008-07-08 14:15:53 | ncoghlan | set | status: open -> closed; resolution: fixed
2008-07-08 14:14:15 | ncoghlan | set | messages: +
2008-07-07 12:53:19 | ncoghlan | set | priority: normal -> critical; assignee: georg.brandl -> ncoghlan; messages: +
2008-06-19 08:24:25 | hodgestar | set | messages: +
2008-06-13 02:22:12 | ggenellina | set | nosy: + ggenellina
2008-06-11 16:50:26 | lemburg | set | messages: +
2008-06-11 14:53:04 | hodgestar | set | messages: +
2008-06-11 14:49:20 | ncoghlan | set | messages: +
2008-06-11 14:33:57 | lemburg | set | messages: +
2008-06-11 14:15:11 | ncoghlan | set | files: + exception-unicode-with-type-fetch-no-whitespace-changes.diff; messages: +
2008-06-11 11:54:40 | hodgestar | set | files: + exception-unicode-with-type-fetch.diff; messages: +
2008-06-11 10:02:03 | ncoghlan | set | messages: +
2008-06-11 09:47:17 | lemburg | set | messages: +
2008-06-11 09:32:22 | ncoghlan | set | nosy: + ncoghlan; messages: +
2008-06-09 19:09:03 | benjamin.peterson | set | messages: +
2008-06-09 19:04:55 | davidfraser | set | files: + tp_unicode_exception.patch; messages: +
2008-06-09 16:20:45 | lemburg | set | nosy: + lemburg; messages: +; versions: - Python 3.0
2008-06-09 16:11:28 | hodgestar | set | messages: +
2008-06-09 16:03:25 | hodgestar | set | messages: +
2008-06-09 15:56:37 | benjamin.peterson | set | messages: +
2008-06-09 15:53:25 | davidfraser | set | messages: +
2008-06-09 15:44:46 | davidfraser | set | nosy: + davidfraser
2008-06-09 13:39:22 | hodgestar | set | files: + exception-unicode.diff; nosy: + hodgestar; messages: +
2008-04-02 20:57:53 | benjamin.peterson | set | messages: +
2008-04-02 17:42:23 | christoph | set | messages: +
2008-04-01 12:44:42 | benjamin.peterson | set | messages: +
2008-04-01 06:41:07 | pitrou | set | nosy: + pitrou; messages: +
2008-04-01 02:22:28 | benjamin.peterson | set | files: + unicode_exception_warning.patch; versions: + Python 2.6, Python 2.5; nosy: + georg.brandl; messages: +; assignee: georg.brandl; keywords: + patch
2008-03-31 23:10:48 | amaury.forgeotdarc | set | messages: +
2008-03-31 22:30:42 | benjamin.peterson | set | messages: +
2008-03-31 22:19:26 | christoph | set | messages: +
2008-03-31 21:36:01 | benjamin.peterson | set | status: closed -> open; priority: normal; resolution: not a bug -> (no value); messages: +; versions: + Python 3.0, - Python 2.5, Python 2.4
2008-03-31 16:13:59 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: +
2008-03-31 12:19:24 | christoph | set | messages: +
2008-03-31 11:58:21 | benjamin.peterson | set | messages: +
2008-03-31 09:47:43 | christoph | set | messages: +
2008-03-30 23:21:25 | benjamin.peterson | set | status: open -> closed; resolution: not a bug; messages: +; nosy: + benjamin.peterson
2008-03-30 23:13:52 | christoph | create |