Issue 2517: Error when printing an exception containing a Unicode string

Created on 2008-03-30 23:13 by christoph, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (36)

msg64770 - (view)

Author: Christoph Burgmer (christoph)

Date: 2008-03-30 23:13

Python seems to have problems when an exception is thrown that contains non-ASCII text as a message and is converted to a string.

    >>> try:
    ...     raise Exception(u'Error when printing ü')
    ... except Exception, e:
    ...     print e
    ...
    Traceback (most recent call last):
      File "<stdin>", line 4, in ?
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 20: ordinal not in range(128)

See http://www.stud.uni-karlsruhe.de/~uyhc/de/content/python-and-exceptions-containing-unicode-messages
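A minimal sketch of the failure mode reported above, written for Python 3 (an assumption, since the report concerns Python 2's implicit ASCII encoding). Encoding the same message with the ASCII codec explicitly fails at the same character. The helper name `ascii_encode_failure` is hypothetical.

```python
MESSAGE = u'Error when printing \xfc'  # same text as in the report


def ascii_encode_failure(text):
    """Return (position, reason) of the first ASCII encoding failure, or None."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        # exc.start is the index of the first character that cannot be encoded.
        return exc.start, exc.reason
    return None


failure = ascii_encode_failure(MESSAGE)
```

Here `failure[0]` is the index of "ü" in the message, which matches the position quoted in the traceback.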

msg64771 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-03-30 23:21

That is because Python encodes its error messages as ASCII by default, and "ü" is not in ASCII. You can fix this by using print unicode_msg.encode("utf-8") or something similar.

msg64779 - (view)

Author: Christoph Burgmer (christoph)

Date: 2008-03-31 09:47

To be more precise: I see no way to convert the encapsulated non-ASCII data from the string in an easy way. Taking e from my last post, none of the following will work:

    str(e)              # UnicodeDecodeError
    e.__str__()         # UnicodeDecodeError
    e.__unicode__()     # AttributeError
    unicode(e)          # UnicodeDecodeError
    unicode(e, 'utf8')  # TypeError

My solution around this right now is raising an exception with an already converted string (see the link I provided).

But as the tutorials speak of simply "print e" I guess the behaviour described above is some kind of a bug.

msg64781 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-03-31 11:58

Use: print unicode(e.message).encode("utf-8")
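The same idea can be sketched for Python 3, where `e.message` no longer exists and `str(e)` already returns text: encode the exception text explicitly rather than relying on a default codec. The helper name `exception_as_bytes` is hypothetical, and the Python 3 target is an assumption.

```python
def exception_as_bytes(exc, encoding='utf-8'):
    """Encode an exception's text explicitly instead of relying on a default codec."""
    # On Python 3, str(exc) is already text. The Python 2.5 spelling from the
    # thread was unicode(e.message).encode("utf-8").
    return str(exc).encode(encoding)


encoded = exception_as_bytes(Exception(u'Error when printing \xfc'))
```

The result is a byte string that can be written to any byte-oriented stream without a further implicit conversion.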

msg64782 - (view)

Author: Christoph Burgmer (christoph)

Date: 2008-03-31 12:19

Thanks, this does work.

But, where can I find the piece of information you just gave to me in the docs? I couldn't find any interface definition for Exceptions.

Furthermore, will this be regarded as a bug? From [1] I understand that "unicode(e)" and "unicode(e, 'utf8')" are supposed to work. No limitations are made on the type of the object. And I suppose that unicode() is the exact equivalent of str() in that it copes with unicode strings. It seems needlessly cumbersome that the string representation of an Exception cannot be returned as a Unicode string when its content is non-ASCII, whereas this kind of simple string conversion is expected to work for ASCII text.

Please reopen if my report does have a point.

[1] http://docs.python.org/lib/built-in-funcs.html

msg64786 - (view)

Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)

Date: 2008-03-31 16:13

Note the interpreter cannot print the exception either:

    >>> raise Exception(u'Error when printing ü')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    Exception>>>

msg64793 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-03-31 21:36

I am going to reopen this issue for Py3k. The recommended encoding for Python source files in 2.x is ASCII; I wouldn't say correctly dealing with non-ASCII exceptions is fully supported. In 3.x, however, the recommended encoding is UTF-8, so this should work.

In Py3k,

    str(e)  # str is unicode in Py3k

does work correctly, and that will have to be used because the message attribute is gone in 3.x. However, the problem Amaury pointed out is not fixed: exceptions that cannot be encoded into ASCII are silently not printed. I think a warning should at least be printed.
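A short sketch of the Py3k behaviour described here, assuming a Python 3 interpreter: `str(e)` returns the text unchanged, and the exception has no `message` attribute.

```python
try:
    raise Exception(u'Error when printing \xfc')
except Exception as exc:
    # str() returns text on Python 3, so no implicit ASCII encoding takes place.
    text = str(exc)
    # The message attribute discussed earlier in the thread is absent on Python 3.
    has_message_attribute = hasattr(exc, 'message')
```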

msg64794 - (view)

Author: Christoph Burgmer (christoph)

Date: 2008-03-31 22:19

Though I welcome the reopening of the bug for Python 3.0, I must say that plans not to fix a core element rather surprise me.

I never believed Python to be a programming language with good Unicode integration. Several points were missing that would've been nice or even essential to have for good development with Unicode, most ignored for the sake of maintaining backward compatibility. This though is not the fault of the Unicode class itself and supporting packages.

Some modules like the one for CSV are lacking full Unicode support. But nevertheless the basic Python would always give you the possibility to use Unicode in (at least) a consistent way. For me raising exceptions does count as basic support like this.

So I still hope to see this solved for the 2.x versions which I read will be maintained even after the release of 3.0.

msg64795 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-03-31 22:30

> I never believed Python to be a programming language with good Unicode integration. Several points were missing that would've been nice or even essential to have for good development with Unicode, most ignored for the sake of maintaining backward compatibility. This though is not the fault of the Unicode class itself and supporting packages.

Many (including myself) agree with you. That's pretty much the whole point of Py3k. We want to fix the Python "warts" which can only be fixed by breaking backwards compatibility.

msg64797 - (view)

Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer)

Date: 2008-03-31 23:10

Even in 2.5, __str__ is allowed to return a Unicode object; we could change BaseException_str this way:

Index: exceptions.c

--- exceptions.c (revision 61957)
+++ exceptions.c (working copy)
@@ -108,6 +104,11 @@
         break;
     case 1:
         out = PyObject_Str(PyTuple_GET_ITEM(self->args, 0));

             PyErr_ExceptionMatches(PyExc_UnicodeEncodeError))

Then str(e) still raises UnicodeEncodeError, but unicode(e) returns the original message.

But I would like the opinion of an experienced core developer...

msg64798 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-04-01 02:22

After thinking some more, I'm going to add 2.6 to this. I'm attaching a patch for the trunk (it can be merged in Py3k, and maybe 2.5) which displays a UnicodeWarning when an Exception cannot be displayed due to encoding issues.

Georg, can you review Amaury's and my patches? Also, would mine be a candidate for 2.5 backporting?
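The warning approach can be sketched at the Python level as follows. This is only an illustration of the idea, not the attached C patch: the helper name `encode_for_display` and the warning text are hypothetical.

```python
import warnings


def encode_for_display(text, encoding='ascii'):
    """Encode text for display, warning instead of failing silently."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Tell the user that something was dropped rather than printing nothing.
        warnings.warn(
            'exception message could not be encoded as %s' % encoding,
            UnicodeWarning,
        )
        return None


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = encode_for_display(u'Error when printing \xfc')
```

With the filter set to "always", `caught` holds one `UnicodeWarning` and `result` is `None`. Under the default filters the warning is shown once per location.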

msg64802 - (view)

Author: Antoine Pitrou (pitrou) * (Python committer)

Date: 2008-04-01 06:41

Shouldn't it be an exception rather than a warning? The fact that an exception can be downgraded to a warning (and thus involuntarily silenced) is a bit disturbing IMHO.

Another possibility would be to display the warning, and then to encode the exception message again in "replace" or "ignore" mode rather than "strict" mode. That way exception messages are always displayed, but not always properly. The ASCII part of the message is generally useful, since it gives the exception name and most often the reason too.
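The difference between the error modes mentioned above can be seen with `encode` directly. A small sketch, using the message from the original report:

```python
text = u'Error when printing \xfc'

# "replace" substitutes a question mark for each character that cannot be encoded.
replaced = text.encode('ascii', 'replace')

# "ignore" drops such characters entirely.
ignored = text.encode('ascii', 'ignore')
```

In both modes the ASCII part of the message survives, which is the property this suggestion relies on.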

msg64807 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-04-01 12:44

Have you looked at PyErr_Display? There are many, many possible exceptions, and it ignores them all because "too many callers rely on this." So, I think all we can do is warn. I will look into encoding the message differently.

msg64866 - (view)

Author: Christoph Burgmer (christoph)

Date: 2008-04-02 17:42

JFTR:

print unicode(e.message).encode("utf-8") only works for Python 2.5, not downwards.

msg64876 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-04-02 20:57

We can't do much about that because only security fixes are backported to versions < 2.5.

msg67863 - (view)

Author: Simon Cross (hodgestar)

Date: 2008-06-09 13:39

One of the examples Christoph tried was

unicode(Exception(u'\xe1'))

which fails quite oddly with:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)

The reason for this is Exception lacks a __unicode__ method implementation so that unicode(e) does something like unicode(str(e)) which attempts to convert the exception arguments to the default encoding (almost always ASCII) and fails.

Fixing this seems quite important. It's common to want to raise errors with non-ASCII characters (e.g. when the data which caused the error contains such characters). Usually the code raising the error has no way of knowing how the characters should be encoded (exceptions can end up being written to log files, displayed in web interfaces, that sort of thing). This means raising exceptions with unicode messages. Using unicode(e.message) is unattractive since it won't work in 3.0 and also does not duplicate str(e)'s handling of the other exception init arguments.

I'm attaching a patch which implements __unicode__ for BaseException. Because of the lack of a tp_unicode slot to mirror tp_str slot, this breaks the test that calls unicode(Exception). The existing test for unicode(e) does unicode(Exception(u"Foo")) which is a bit of a non-test. My patch adds a test of unicode(Exception(u'\xe1')) which fails without the patch.

A quick look through trunk suggests implementing tp_unicode actually wouldn't be a huge job. My worry is that this would constitute a change to the C API for PyObjects and has little chance of acceptance into 2.6 (and in 3.0 all these issues disappear anyway). If there is some chance of acceptance, I'm willing to write a patch that adds tp_unicode.
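The effect of a `__unicode__` method can be sketched in pure Python with an exception subclass. This is a hypothetical illustration, not the attached C patch: the class name `UnicodeFriendlyError` is invented, and the argument handling is an assumption modelled on how str(e) treats zero, one, or several arguments. On Python 2, unicode(e) would call the method. On Python 3 it is an ordinary method, and str(e) already returns text.

```python
class UnicodeFriendlyError(Exception):
    """Hypothetical exception type that can render its arguments as text."""

    def __unicode__(self):
        # Handle zero, one, or several arguments without encoding anything
        # to the default codec.
        if len(self.args) == 0:
            return u''
        if len(self.args) == 1:
            return u'%s' % (self.args[0],)
        return u'%s' % (self.args,)


rendered = UnicodeFriendlyError(u'\xe1').__unicode__()
```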

msg67865 - (view)

Author: David Fraser (davidfraser)

Date: 2008-06-09 15:53

Aha - the __unicode__ method was previously there in Python 2.5, and was ripped out because of the unicode(Exception) problem. See http://bugs.python.org/issue1551432.

The reversion is in http://svn.python.org/view/python/trunk/Objects/exceptions.c?rev=51837&r1=51770&r2=51837

msg67867 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-06-09 15:56

On Mon, Jun 9, 2008 at 8:40 AM, Simon Cross <report@bugs.python.org> wrote:

> Simon Cross <hodgestar@gmail.com> added the comment:
>
> One of the examples Christoph tried was
>
> unicode(Exception(u'\xe1'))
>
> which fails quite oddly with:
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)
>
> The reason for this is Exception lacks a __unicode__ method implementation so that unicode(e) does something like unicode(str(e)) which attempts to convert the exception arguments to the default encoding (almost always ASCII) and fails.

What version are you using? In Py3k, str is unicode so __str__ can return a unicode string.

> Fixing this seems quite important. It's common to want to raise errors with non-ASCII characters (e.g. when the data which caused the error contains such characters). Usually the code raising the error has no way of knowing how the characters should be encoded (exceptions can end up being written to log files, displayed in web interfaces, that sort of thing). This means raising exceptions with unicode messages. Using unicode(e.message) is unattractive since it won't work in 3.0 and also does not duplicate str(e)'s handling of the other exception init arguments.
>
> I'm attaching a patch which implements __unicode__ for BaseException. Because of the lack of a tp_unicode slot to mirror tp_str slot, this breaks the test that calls unicode(Exception). The existing test for unicode(e) does unicode(Exception(u"Foo")) which is a bit of a non-test. My patch adds a test of unicode(Exception(u'\xe1')) which fails without the patch.
>
> A quick look through trunk suggests implementing tp_unicode actually wouldn't be a huge job. My worry is that this would constitute a change to the C API for PyObjects and has little chance of acceptance into 2.6 (and in 3.0 all these issues disappear anyway). If there is some chance of acceptance, I'm willing to write a patch that adds tp_unicode.

Email Python-dev for permission.

msg67868 - (view)

Author: Simon Cross (hodgestar)

Date: 2008-06-09 16:03

Concerning http://bugs.python.org/issue1551432:

I'd much rather have working unicode(e) than working unicode(Exception). Calling unicode(C) on any class C which overrides __unicode__ is broken without tp_unicode anyway.

msg67869 - (view)

Author: Simon Cross (hodgestar)

Date: 2008-06-09 16:11

Benjamin Peterson wrote:

> What version are you using? In Py3k, str is unicode so __str__ can return a unicode string.

I'm sorry it wasn't clear. I'm aware that this issue doesn't apply to Python 3.0. I'm testing on both Python 2.5 and Python 2.6 for the purposes of the bug.

The code I'm developing hits these issues with database exceptions that carry unicode messages and are raised inside MySQLdb on Python 2.5.

The patch I submitted is against trunk.

msg67870 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2008-06-09 16:20

Removing 3.0 from the versions list.

msg67874 - (view)

Author: David Fraser (davidfraser)

Date: 2008-06-09 19:04

So I've got a follow-up patch that adds tp_unicode. Caveat that I've never done anything like this before and it's almost certain to be wrong.

It does however generate the desired result in this case :-)

msg67875 - (view)

Author: Benjamin Peterson (benjamin.peterson) * (Python committer)

Date: 2008-06-09 19:09

On Mon, Jun 9, 2008 at 2:04 PM, David Fraser <report@bugs.python.org> wrote:

> David Fraser <davidf@sjsoft.com> added the comment:
>
> So I've got a follow-up patch that adds tp_unicode. Caveat that I've never done anything like this before and it's almost certain to be wrong.

Unfortunately, adding a slot is a bit more complicated. You have to deal with inheritance and such. Have a look in typeobject.c for all the gory details. I'd recommend you write to python-dev before going on the undertaking, though.

> It does however generate the desired result in this case :-)
>
> Added file: http://bugs.python.org/file10562/tp_unicode_exception.patch


msg67944 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2008-06-11 09:32

As far as I am concerned, the implementation of PyObject_Unicode in object.c has a bug in it: it should NEVER be retrieving __unicode__ from the instance object. The implementation of PyObject_Format in abstract.c shows the correct way to retrieve a pseudo-slot method like __unicode__ from an arbitrary object.

Line 482 in object.c is the offending line: func = PyObject_GetAttr(v, unicodestr);

Fix that bug, then add a __unicode__ method back to Exception objects and you will have the best of both worlds.

msg67946 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2008-06-11 09:47

On 2008-06-11 11:32, Nick Coghlan wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:
>
> As far as I am concerned, the implementation of PyObject_Unicode in object.c has a bug in it: it should NEVER be retrieving __unicode__ from the instance object. The implementation of PyObject_Format in abstract.c shows the correct way to retrieve a pseudo-slot method like __unicode__ from an arbitrary object.

The only difference I can spot is that the PyObject_Format() code special cases non-instance objects.

> Line 482 in object.c is the offending line: func = PyObject_GetAttr(v, unicodestr);
>
> Fix that bug, then add a __unicode__ method back to Exception objects and you will have the best of both worlds.

I'm not sure whether that would really solve anything.

IMHO, it's better to implement the tp_unicode slot and then check that before trying .__unicode__ (as mentioned in the comment in PyObject_Unicode()).

msg67947 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2008-06-11 10:02

Here's the key difference with the way PyObject_Format looks up the pseudo-slot method:

    PyObject *method = _PyType_Lookup(Py_TYPE(obj),
                      str__format__);

_PyType_Lookup instead of PyObject_GetAttr - so unicode(Exception) would only look for type.__unicode__ and avoid getting confused by the utterly irrelevant Exception.__unicode__ method (which is intended only for printing Exception instances, not for printing the Exception type itself).

You then need the PyInstance_Check/PyObject_GetAttr special case for retrieving the bound method because _PyType_Lookup won't work on classic class instances.
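The distinction between the two lookups can be approximated in pure Python. This is a rough analogue only, not the C implementation: `lookup_on_object` stands in for PyObject_GetAttr, `lookup_on_type` stands in for _PyType_Lookup on the object's type, and both helper names and the `Sample` class are hypothetical.

```python
class Sample(Exception):
    """Hypothetical exception class with a method meant for instances only."""

    def __unicode__(self):
        return u'instance text'


def lookup_on_object(obj, name):
    # Ordinary attribute access: for a class object this also finds
    # methods that are meant for its instances.
    return getattr(obj, name, None)


def lookup_on_type(obj, name):
    # Search only the type of obj and that type's bases, ignoring
    # whatever obj itself defines.
    for klass in type(obj).__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return None


found_on_class_object = lookup_on_object(Sample, '__unicode__')
found_on_class_type = lookup_on_type(Sample, '__unicode__')
found_on_instance_type = lookup_on_type(Sample(u'x'), '__unicode__')
```

Looking up on the class object finds the instance method, which is the confusion described above. Looking up on the type of the class finds nothing, while looking up on the type of an instance finds the method as intended.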

msg67950 - (view)

Author: Simon Cross (hodgestar)

Date: 2008-06-11 11:54

Attached a patch which implements Nick Coghlan's suggestion. All existing tests in test_exceptions.py and test_unicode.py pass as does the new unicode(Exception(u"\xe1")) test.

msg67974 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2008-06-11 14:15

Minor cleanup of Simon's patch attached - aside from a couple of unneeded whitespace changes, it all looks good to me.

Not checking it in yet, since it isn't critical for this week's beta release - I'd prefer to leave it until after that has been dealt with.

msg67980 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2008-06-11 14:33

On 2008-06-11 16:15, Nick Coghlan wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:
>
> Minor cleanup of Simon's patch attached - aside from a couple of unneeded whitespace changes, it all looks good to me.
>
> Not checking it in yet, since it isn't critical for this week's beta release - I'd prefer to leave it until after that has been dealt with.
>
> Added file: http://bugs.python.org/file10585/exception-unicode-with-type-fetch-no-whitespace-changes.diff

That approach is fine as well.

I still like the idea to add a tp_unicode slot, though, since that's still missing for C extension types to benefit from.

Perhaps we can have both ?!

msg67984 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2008-06-11 14:49

I'm not sure adding a dedicated method slot would be worth the hassle involved - Py3k drops back to just the tp_str slot anyway, and the only thing you gain with a tp_unicode slot over _PyType_Lookup of a __unicode__ attribute is a small reduction in memory usage and a slight speed increase.

msg67985 - (view)

Author: Simon Cross (hodgestar)

Date: 2008-06-11 14:53

Re:

> Minor cleanup of Simon's patch attached - aside from a couple of unneeded whitespace changes, it all looks good to me.
>
> Not checking it in yet, since it isn't critical for this week's beta release - I'd prefer to leave it until after that has been dealt with.

Thanks for the clean-up, Nick. The mixture of tabs and spaces in the current object.c was unpleasant :/.

msg67994 - (view)

Author: Marc-Andre Lemburg (lemburg) * (Python committer)

Date: 2008-06-11 16:50

On 2008-06-11 16:49, Nick Coghlan wrote:

> Nick Coghlan <ncoghlan@gmail.com> added the comment:
>
> I'm not sure adding a dedicated method slot would be worth the hassle involved - Py3k drops back to just the tp_str slot anyway, and the only thing you gain with a tp_unicode slot over _PyType_Lookup of a __unicode__ attribute is a small reduction in memory usage and a slight speed increase.

AFAIK, _PyType_Lookup will only work for base types, i.e. objects subclassing from object. C extension types often do not inherit from object, since the attribute access mechanisms and object creation are a lot simpler when not doing so.

msg68394 - (view)

Author: Simon Cross (hodgestar)

Date: 2008-06-19 08:24

Just prodding the issue again now that the betas are out. What's the next step?

msg69384 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2008-07-07 12:53

Adding this to my personal to-do list for the next beta release.

msg69436 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2008-07-08 14:14

Fixed in 64791.

Blocked from being merged to Py3k (since there is no longer a __unicode__ special method).

For MAL: the PyInstance_Check included in the patch for the benefit of classic classes defined in Python code also covers all of the classic C extension classes which are not instances of object.

msg333419 - (view)

Author: Piotr Dobrogost (piotr.dobrogost)

Date: 2019-01-10 21:18

Benjamin Peterson in comment https://bugs.python.org/issue2517#msg64771 wrote:

"That is because Python encodes it's error messages as ASCII by default…"

Could somebody please point where in the source code of Python 2 this happens?

History

Date | User | Action | Args
2022-04-11 14:56:32 | admin | set | github: 46769
2019-01-10 21:18:29 | piotr.dobrogost | set | messages: +
2019-01-08 22:02:42 | piotr.dobrogost | set | nosy: + piotr.dobrogost
2009-09-17 10:12:59 | ezio.melotti | set | nosy: + ezio.melotti
2008-07-08 14:15:53 | ncoghlan | set | status: open -> closed; resolution: fixed
2008-07-08 14:14:15 | ncoghlan | set | messages: +
2008-07-07 12:53:19 | ncoghlan | set | priority: normal -> critical; assignee: georg.brandl -> ncoghlan; messages: +
2008-06-19 08:24:25 | hodgestar | set | messages: +
2008-06-13 02:22:12 | ggenellina | set | nosy: + ggenellina
2008-06-11 16:50:26 | lemburg | set | messages: +
2008-06-11 14:53:04 | hodgestar | set | messages: +
2008-06-11 14:49:20 | ncoghlan | set | messages: +
2008-06-11 14:33:57 | lemburg | set | messages: +
2008-06-11 14:15:11 | ncoghlan | set | files: + exception-unicode-with-type-fetch-no-whitespace-changes.diff; messages: +
2008-06-11 11:54:40 | hodgestar | set | files: + exception-unicode-with-type-fetch.diff; messages: +
2008-06-11 10:02:03 | ncoghlan | set | messages: +
2008-06-11 09:47:17 | lemburg | set | messages: +
2008-06-11 09:32:22 | ncoghlan | set | nosy: + ncoghlan; messages: +
2008-06-09 19:09:03 | benjamin.peterson | set | messages: +
2008-06-09 19:04:55 | davidfraser | set | files: + tp_unicode_exception.patch; messages: +
2008-06-09 16:20:45 | lemburg | set | nosy: + lemburg; messages: +; versions: - Python 3.0
2008-06-09 16:11:28 | hodgestar | set | messages: +
2008-06-09 16:03:25 | hodgestar | set | messages: +
2008-06-09 15:56:37 | benjamin.peterson | set | messages: +
2008-06-09 15:53:25 | davidfraser | set | messages: +
2008-06-09 15:44:46 | davidfraser | set | nosy: + davidfraser
2008-06-09 13:39:22 | hodgestar | set | files: + exception-unicode.diff; nosy: + hodgestar; messages: +
2008-04-02 20:57:53 | benjamin.peterson | set | messages: +
2008-04-02 17:42:23 | christoph | set | messages: +
2008-04-01 12:44:42 | benjamin.peterson | set | messages: +
2008-04-01 06:41:07 | pitrou | set | nosy: + pitrou; messages: +
2008-04-01 02:22:28 | benjamin.peterson | set | files: + unicode_exception_warning.patch; versions: + Python 2.6, Python 2.5; nosy: + georg.brandl; messages: +; assignee: georg.brandl; keywords: + patch
2008-03-31 23:10:48 | amaury.forgeotdarc | set | messages: +
2008-03-31 22:30:42 | benjamin.peterson | set | messages: +
2008-03-31 22:19:26 | christoph | set | messages: +
2008-03-31 21:36:01 | benjamin.peterson | set | status: closed -> open; priority: normal; resolution: not a bug -> (no value); messages: +; versions: + Python 3.0, - Python 2.5, Python 2.4
2008-03-31 16:13:59 | amaury.forgeotdarc | set | nosy: + amaury.forgeotdarc; messages: +
2008-03-31 12:19:24 | christoph | set | messages: +
2008-03-31 11:58:21 | benjamin.peterson | set | messages: +
2008-03-31 09:47:43 | christoph | set | messages: +
2008-03-30 23:21:25 | benjamin.peterson | set | status: open -> closed; resolution: not a bug; messages: +; nosy: + benjamin.peterson
2008-03-30 23:13:52 | christoph | create |