[Python-Dev] unicode Exception messages in py2.7
Chris Barker chris.barker at noaa.gov
Thu Nov 14 18:32:10 CET 2013
Folks,
(note this is about 2.7 -- sorry, but a lot of us still use that! I can only assume that in 3.* this is a non-issue)
I just discovered an issue that's been around a long time:
If you create an Exception with a unicode object for the message, the message can be silently dropped if it cannot be encoded to ASCII (or, more properly, the default encoding).
In my use case, I was parsing a UTF-8 text file and wanted a bit of that text to be part of the Exception message (for an error reading the file, I wanted the user to see the text surrounding the ill-formatted part of the file).
What I got was a blank message, and it took a lot of poking at it to figure out why.
My solution was:
    msg = u"Problem with line %i: %s This is not a valid time slot" % (linenum, line)
    raise ValueError(msg.encode('ascii', 'ignore'))
which is really pretty painfully clunky.
This is an issue brought up in various tutorial and blog posts, and all the solutions I've seen involve some similar clunkiness.
I also found this issue in the issue tracker:
http://bugs.python.org/issue2517
Which was resolved years ago, but as far as I can tell, only solved the problem of being able to do:
    unicode(an_exception)
and get the proper unicode message object. But we still can't raise the darn thing and expect the user to see the message.
Why is this the case? I can print a unicode object to the terminal, why can't raising an Exception print a unicode object?
I can imagine that, for backward compatibility, or maybe for non-unicode terminals, or some other reason, Exceptions do need to print as ASCII. However, having a message simply get swallowed up and disappear seems like the wrong solution.
Auto-conversion to a default encoding is fraught with problems across the board -- I know that. I also know that too much code would break too often if we didn't have auto-conversion.
For the most part, the auto-conversion uses 'strict' mode -- I generally dislike this, as it means code crashes when odd stuff gets introduced after testing, but I can see why it is done.
I can also see why, for raising Exceptions, the decision was made to swallow that error, so that the actual Exception intended is raised, rather than a new UnicodeEncodeError.
But combining 'strict' with ignoring the encoding exception seems like the worst of both worlds.
So a proposal:
Use 'replace' mode for the encoding to the default, and at least the user would see SOMETHING of the message. In a common case, it would be a lot of ASCII, and in the worst case it would be a lot of question marks -- still better than a totally blank message.
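To make the difference concrete, here is a minimal sketch that models the two behaviours. It is not the interpreter's actual code: `current_display` and `proposed_display` are hypothetical names, the sample message is made up, and ASCII is assumed to be the default encoding. It should run unchanged on 2.7 and on 3.3+.

```python
# Hypothetical model only: the real logic lives inside the interpreter,
# not in these two functions.

def current_display(message, encoding="ascii"):
    """Model of the behaviour described above: strict encoding, error swallowed."""
    try:
        return message.encode(encoding, "strict")
    except UnicodeEncodeError:
        # The encoding error is dropped, and the whole message goes with it.
        return b""


def proposed_display(message, encoding="ascii"):
    """Model of the proposal: each unencodable character becomes '?'."""
    return message.encode(encoding, "replace")


# Made-up sample: one non-ASCII character in an otherwise ASCII message.
msg = u"Problem with line 3: caf\xe9 This is not a valid time slot"

print(repr(current_display(msg)))   # empty: the message is lost
print(repr(proposed_display(msg)))  # message survives with one '?'
```

With a single accented character in the line, the strict-and-swallow model loses the entire message, while the 'replace' model loses only that one character.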
Another option would be to use str(repr(the_message)) so the user would get the escaped version. Though I think that would be uglier.
What am I missing? This seems so obvious, and easy to do (though maybe it's buried in the C implementation of Exceptions).
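For comparison, an escaped form can also be produced without the u'...' wrapper that repr() adds on 2.7, by encoding with the 'backslashreplace' error handler. This is only a sketch with made-up sample text, set next to the 'ignore' workaround used earlier; it is a swapped-in technique, not what str(repr(...)) itself does.

```python
# Made-up sample text containing two non-ASCII characters.
msg = u"caf\xe9 costs 3\u20ac"

# Escaped form: non-ASCII characters become backslash escapes.
escaped = msg.encode("ascii", "backslashreplace")

# The 'ignore' workaround: non-ASCII characters vanish without a trace.
ignored = msg.encode("ascii", "ignore")

print(repr(escaped))
print(repr(ignored))
```

The escaped form is noisier to read, but unlike 'ignore' it keeps enough information to work out what the original characters were.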
-Chris
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception