Issue 8313: message in unittest tracebacks

Created on 2010-04-05 10:56 by michael.foord, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name	Uploaded	Description
unittest2-issue-8313.patch	gthb, 2010-05-04 13:58	Patch: more useful standard assertion messages on unicode comparisons
traceback_unicode.patch	vstinner, 2010-05-04 15:17

Messages (13)
msg102368	Author: Michael Foord (michael.foord) *	Date: 2010-04-05 10:56
>>> import unittest
>>> class Foo(unittest.TestCase):
...     def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
...
>>> unittest.main(exit=False)
F
======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "", line 2, in test_fffd
AssertionError:

----------------------------------------------------------------------
Ran 1 test in 0.001s

The problem with creating unicode tracebacks is that they could fail when being output on terminals not capable of showing the full range of unicode characters (the default terminal on Windows is CP1252). This can already happen with Unicode messages that aren't part of the traceback.

Detecting the 'unprintable' message before calling into traceback and replacing it with the repr of the unicode is one possibility.
msg104937	Author: Gunnlaugur Thor Briem (gthb)	Date: 2010-05-04 13:58
Replacing the message with its repr seems to me at least strongly preferable to the current “hide it all” behavior. :)

Better, msg.encode('ascii', 'backslashreplace') does what repr does with unencodable characters, but does not add the quotes, so the behavior is only different when it needs to be.

Better still, 'ascii' need not be hardcoded. I'm attaching a patch that sets the encoding from an environment variable, defaulting to 'ascii', and encodes the message with 'backslashreplace'. This makes unicode string equality assertions much more useful for me.

The encoding could also be configurable by some clean hook for test runners to use. unit2 could have a command-line parameter, and TextTestRunner could use stream.encoding if not None (or PYTHONIOENCODING on Python 3).

Ideally messages should not be forced to be 8-bit strings by the failure exception class, but I suppose that's a bigger change than you would want to make.

The downside of using backslashreplace (or repr, for that matter) is that it does not preserve lengths, so the diff markers can get misaligned. I find that an acceptable tradeoff, but 'replace' is another option that preserves lengths, at least more often.
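The three options weighed in this message can be compared side by side. This is only a sketch for Python 3, where ascii() is used as a stand-in for Python 2's repr() of a unicode object; the sample string is an assumption chosen for illustration, not taken from the patch.

```python
# Python 3 sketch. ascii() stands in for Python 2's repr() of a unicode object.
message = "caf\u00e9 \ufffd"  # illustrative sample, not from the patch

with_repr = ascii(message)
with_backslash = message.encode("ascii", "backslashreplace").decode("ascii")
with_replace = message.encode("ascii", "replace").decode("ascii")

# Print each rendering with its length to show which ones keep the
# original character count (relevant for aligned diff markers).
for label, text in [("repr", with_repr),
                    ("backslashreplace", with_backslash),
                    ("replace", with_replace)]:
    print("%-17s %-20s length %d" % (label, text, len(text)))
```

Only 'replace' keeps the length of the six-character sample, at the cost of turning every non-ASCII character into the same "?".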
msg104940	Author: Michael Foord (michael.foord) *	Date: 2010-05-04 14:10
Sounds like a good solution - I'll look at this, thanks.
msg104943	Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *	Date: 2010-05-04 14:31
Very recently, changed regrtest.py to use 'backslashreplace' when printing errors. This issue seems very similar.
msg104946	Author: STINNER Victor (vstinner) *	Date: 2010-05-04 15:01
The example raises an AssertionError(u'\n- \ufffd+ \ufffd\ufffd'), which is converted to a string by traceback.format_exception(). This function fails in _some_str() on the str(value) call. You can reproduce the error with:

>>> str(AssertionError(u"\xe9"))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

> The problem with creating unicode tracebacks is that they could
> fail when being output on terminals not capable of showing
> the full range of unicode characters (the default terminal
> on Windows is CP1252).

The problem is not related to the terminal encoding: str(value) uses Python's default encoding (ASCII by default).

Python 3 is not affected because str(AssertionError("\xe9")) doesn't raise any error: it returns "\xe9".
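The difference between the two Python versions can be illustrated on Python 3 alone. The sketch below assumes that an explicit strict ASCII encode is a fair stand-in for the implicit encoding Python 2 performed inside str(); ascii_strict is a hypothetical helper name.

```python
exc = AssertionError("\xe9")

# Python 3: str() of an exception returns text, so nothing is encoded here.
text = str(exc)

def ascii_strict(value):
    """Encode to ASCII strictly; return the error instead of raising it."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError as err:
        return err

# Roughly the step Python 2 performed implicitly with its default encoding.
result = ascii_strict(text)
print(type(result).__name__, result.reason)
```

The failure happens while building the string, before anything is written to a terminal, which is the point made above.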
msg104947	Author: STINNER Victor (vstinner) *	Date: 2010-05-04 15:04
> Very recently, changed regrtest.py to use
> 'backslashreplace' when printing errors. This issue seems
> very similar

Issue #8533 is not directly related because in this issue the error occurs before writing the traceback to the terminal.
msg104949	Author: STINNER Victor (vstinner) *	Date: 2010-05-04 15:17
The attached patch fixes the _some_str() function of the traceback module: it encodes a unicode exception message to ASCII using the backslashreplace error handler.

ASCII is not the best choice, but str(unicode(...)) also uses ASCII (the default encoding), and we don't know the terminal encoding in traceback. We cannot do better here in Python 2 (without breaking a lot of APIs...).

The right fix is to use Python 3, which formats a traceback to unicode (unicode characters of the error message are kept unchanged). The choice of the encoding and error handler is made only at the end, when writing the output to the terminal, which is the right thing to do.
msg104989	Author: STINNER Victor (vstinner) *	Date: 2010-05-05 00:32
> The downside of using backslashreplace (or repr, for that matter) is
> that it does not preserve lengths, so the diff markers can get
> misaligned. I find that an acceptable tradeoff, but 'replace' is
> another option that preserves lengths, at least more often.

'replace' loses important information: if the test is about the unicode string content, we will be unable to see the error data.

Result of the first example with my patch (backslashreplace):

======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "x.py", line 3, in test_fffd
    def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
AssertionError:
- \ufffd+ \ufffd\ufffd

Result of the first example with the 'replace' error handler:

======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "x.py", line 3, in test_fffd
    def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
AssertionError:
- ?+ ??

(but this example is irrelevant because U+FFFD is the unicode replacement character :-D)

If nobody complains about my patch, I will commit it to Python trunk (only). You can still reimplement the fail() method to encode the message using a more relevant encoding and/or error handler.
msg105011	Author: Michael Foord (michael.foord) *	Date: 2010-05-05 10:19
I would prefer to try str(...) first and only attempt to convert to unicode and do the backslash replace if the str(...) call fails.
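The order of operations requested here (plain str() first, escaping only as a fallback) can be sketched as below. This is a Python 3 analogue written for illustration, not the committed patch: some_str is a hypothetical helper, a strict ASCII encode stands in for Python 2's str(), and the final placeholder is modelled on the traceback module's "unprintable" fallback.

```python
def some_str(value):
    """Best-effort ASCII rendering of an exception value (hypothetical helper)."""
    try:
        text = str(value)
    except Exception:
        # Nothing usable could be obtained from the object at all.
        return "<unprintable %s object>" % type(value).__name__
    try:
        # Stand-in for Python 2's str(): succeeds only for pure-ASCII messages,
        # in which case the text is returned untouched.
        text.encode("ascii")
        return text
    except UnicodeEncodeError:
        # Fallback: keep the message, escaping what ASCII cannot represent.
        return text.encode("ascii", "backslashreplace").decode("ascii")

print(some_str(AssertionError("plain message")))
print(some_str(AssertionError("\n- \ufffd+ \ufffd\ufffd")))
```

Messages that were already printable pass through unchanged, so the escaping only shows up where the old behaviour would have hidden the message.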
msg105022	Author: STINNER Victor (vstinner) *	Date: 2010-05-05 12:46
Committed: r80777 (trunk) and r80779 (2.6); blocked: r80778 (py3k). Open a new issue if you would like to use something better than ASCII+backslashreplace in unittest (using runner stream encoding?).
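The follow-up idea mentioned here, using the runner's stream encoding instead of ASCII, could be sketched like this on Python 3. write_message is a hypothetical helper, and falling back to ASCII when the stream reports no encoding is an assumption made for the sketch.

```python
import io

def write_message(stream, msg):
    """Write msg, escaping characters the stream's encoding cannot represent."""
    # Fall back to ASCII when the stream has no usable encoding attribute.
    encoding = getattr(stream, "encoding", None) or "ascii"
    stream.write(msg.encode(encoding, "backslashreplace").decode(encoding))

# A Latin-1 stream keeps U+00E9 as is and escapes only U+FFFD.
raw = io.BytesIO()
latin1_stream = io.TextIOWrapper(raw, encoding="latin-1")
write_message(latin1_stream, "\xe9 \ufffd")
latin1_stream.flush()
print(raw.getvalue())
```

Compared with always encoding to ASCII, only the characters the destination really cannot show get escaped.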
msg150127	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2011-12-23 02:32
http://pypi.python.org/pypi/unittest2 says "There are several places in unittest2 (and unittest) that call str(...) on exceptions to get the exception message. This can fail if the exception was created with non-ascii unicode. This is rare and I won't address it unless it is actually reported as a problem for someone." It is a problem for us now that we've re-rooted all our TestCases on top of unittest2 at work. :) The solution I'm leaning towards is monkey-patching the new traceback._some_str implementation in at unittest2 import time.
msg150128	Author: Gregory P. Smith (gregory.p.smith) *	Date: 2011-12-23 02:32
We're on Python 2.6; otherwise this would be a moot point. But you might want to include something like that in a new unittest2 backport release.
msg150172	Author: Michael Foord (michael.foord) *	Date: 2011-12-23 15:45
traceback patch looks good. Thanks for the unittest2 patch as well.

History
Date	User	Action	Args
2022-04-11 14:56:59	admin	set	github: 52560
2011-12-23 15:45:17	michael.foord	set	messages: +
2011-12-23 02:32:50	gregory.p.smith	set	messages: +
2011-12-23 02:32:04	gregory.p.smith	set	nosy: + gregory.p.smith; messages: +
2010-05-05 12:46:29	vstinner	set	status: open -> closed; resolution: fixed; messages: +
2010-05-05 10:19:56	michael.foord	set	messages: +
2010-05-05 10:06:18	michael.foord	set	messages: -
2010-05-05 10:00:40	michael.foord	set	messages: +
2010-05-05 00:32:21	vstinner	set	messages: +
2010-05-04 15:17:05	vstinner	set	files: + traceback_unicode.patch; messages: +
2010-05-04 15:04:09	vstinner	set	messages: +
2010-05-04 15:01:48	vstinner	set	nosy: amaury.forgeotdarc, vstinner, ezio.melotti, michael.foord, gthb; messages: +; components: + Unicode; versions: - Python 3.2
2010-05-04 14:31:03	amaury.forgeotdarc	set	nosy: + amaury.forgeotdarc, vstinner; messages: +
2010-05-04 14:10:48	michael.foord	set	messages: +
2010-05-04 13:58:50	gthb	set	files: + unittest2-issue-8313.patch; nosy: + gthb; messages: +; keywords: + patch
2010-04-05 10:56:28	michael.foord	create