[Python-Dev] unicode Exception messages in py2.7

Chris Barker chris.barker at noaa.gov
Thu Nov 14 18:32:10 CET 2013


Folks,

(note this is about 2.7 -- sorry, but a lot of us still use that! I can only assume that in 3.* this is a non-issue)

I just discovered an issue that's been around a long time:

If you create an Exception with a unicode object for the message, the message can be silently dropped if it cannot be encoded to ASCII (or, more properly, the default encoding).

In my use case, I was parsing a text file (utf-8) and wanted a bit of that text to be part of the Exception message: there was an error reading the file, and I wanted the user to see the text surrounding the ill-formatted part of the file.

What I got was a blank message, and it took a lot of poking at it to figure out why.
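The underlying failure can be reproduced without raising anything. This is a minimal sketch, assuming the default encoding is ASCII (as on a stock 2.7 install); the sample line and line number are made up for illustration. As far as I can tell, the strict encode below is the same conversion that str() attempts on a unicode message:

```python
# Hypothetical sample text; any non-ASCII character triggers the problem.
line = u"10:00 caf\xe9"
msg = u"Problem with line %i: %s This is not a valid time slot" % (3, line)

try:
    # Strict ASCII encoding: fails on the first non-ASCII character.
    msg.encode('ascii')
    encoded_ok = True
except UnicodeEncodeError as err:
    encoded_ok = False
    print("cannot encode: %s" % err.reason)
```

When that conversion fails while an uncaught exception is being reported, there is no byte string to print, which would explain the blank message.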

My solution was:

    msg = u"Problem with line %i: %s This is not a valid time slot" % (linenum, line)
    raise ValueError(msg.encode('ascii', 'ignore'))

which is really pretty painfully clunky.

This is an issue brought up in various tutorial and blog posts, and all the solutions I've seen involve some similar clunkiness.

I also found this issue in the issue tracker:

http://bugs.python.org/issue2517

Which was resolved years ago, but as far as I can tell, only solved the problem of being able to do:

unicode(an_exception)

and get the proper unicode message object. But we still can't raise the darn thing and expect the user to see the message.

Why is this the case? I can print a unicode object to the terminal, why can't raising an Exception print a unicode object?

I can imagine that for backward compatibility, or maybe for non-unicode terminals, or ???, Exceptions do need to print as ascii. However, having a message simply get swallowed up and disappear seems like the wrong solution.

So a proposal:

Use 'replace' mode for the encoding to the default, and at least the user would see SOMETHING of the message. In a common case, it would be a lot of ascii, and in the worst case it would be a lot of question marks -- still better than a totally blank message.

Another option would be to use str(repr(the_message)) so the user would get the escaped version. Though I think that would be uglier.
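To make the trade-off concrete, here is a rough sketch comparing the error handlers on one message. The helper name ascii_message and the sample text are made up for illustration. The 'backslashreplace' handler stands in for the repr() idea: it gives similar escapes, but it is not identical to str(repr(...)), which would also wrap the text in u'...' quotes.

```python
msg = u"Problem with line 3: caf\xe9 This is not a valid time slot"

def ascii_message(text, errors):
    """Encode text to ASCII bytes with the given error handler (hypothetical helper)."""
    return text.encode('ascii', errors)

ignored = ascii_message(msg, 'ignore')            # non-ASCII characters are dropped
replaced = ascii_message(msg, 'replace')          # non-ASCII characters become '?'
escaped = ascii_message(msg, 'backslashreplace')  # non-ASCII characters become \xNN escapes
```

With 'ignore' the reader cannot tell that anything was lost; 'replace' at least marks the spot; the escaped form keeps all the information but is harder to read.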

What am I missing? This seems so obvious, and easy to do (though maybe it's buried in the C implementation of Exceptions)

-Chris

--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov


