| msg102368 - (view) |
Author: Michael Foord (michael.foord) *  |
Date: 2010-04-05 10:56 |
| >>> import unittest
>>> class Foo(unittest.TestCase):
...     def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
...
>>> unittest.main(exit=False)
F
======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "", line 2, in test_fffd
AssertionError:
----------------------------------------------------------------------
Ran 1 test in 0.001s

The problem with creating unicode tracebacks is that they could fail when being output on terminals not capable of showing the full range of unicode characters (the default terminal on Windows is CP1252). This can already happen with Unicode messages that aren't part of the traceback.

Detecting the 'unprintable' message before calling into traceback and replacing it with the repr of the unicode is one possibility. |
|
|
| msg104937 - (view) |
Author: Gunnlaugur Thor Briem (gthb) |
Date: 2010-05-04 13:58 |
| Replacing the message with its repr seems to me at least strongly preferable to the current “hide it all” behavior. :)

Better, msg.encode('ascii', 'backslashreplace') does what repr does with unencodable characters, but does not add the quotes, so the behavior is only different when it needs to be.

Better still, 'ascii' need not be hardcoded. I'm attaching a patch that sets the encoding from an environment variable, defaulting to 'ascii', and encodes the message with 'backslashreplace'. This makes unicode string equality assertions much more useful for me.

The encoding could also be configurable by some clean hook for test runners to use. unit2 could have a command-line parameter, and TextTestRunner could use stream.encoding if not None (or PYTHONIOENCODING on Python 3).

Ideally messages should not be forced to be 8-bit strings by the failure exception class, but I suppose that's a bigger change than you would want to make.

The downside of using backslashreplace (or repr, for that matter) is that it does not preserve lengths, so the diff markers can get misaligned. I find that an acceptable tradeoff, but 'replace' is another option that preserves lengths, at least more often. |
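A minimal sketch of the idea described above, not the attached patch itself. The thread does not name the environment variable, so `UNITTEST_MSG_ENCODING` and the helper `encode_message` are hypothetical names chosen for illustration.

```python
import os

# Hypothetical variable name, chosen for this sketch only; the thread does
# not say which variable the attached patch reads.
ENCODING_ENV_VAR = "UNITTEST_MSG_ENCODING"


def encode_message(msg, environ=None):
    """Encode a text message, escaping characters the target encoding lacks."""
    if environ is None:
        environ = os.environ
    # Fall back to ASCII when the variable is unset, as the message proposes.
    encoding = environ.get(ENCODING_ENV_VAR, "ascii")
    return msg.encode(encoding, "backslashreplace")
```

With the variable unset, `encode_message(u"caf\xe9", environ={})` escapes the accented character; with it set to `utf-8`, the same call encodes the character directly and nothing is escaped.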
|
|
| msg104940 - (view) |
Author: Michael Foord (michael.foord) *  |
Date: 2010-05-04 14:10 |
| Sounds like a good solution - I'll look at this, thanks. |
|
|
| msg104943 - (view) |
Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) *  |
Date: 2010-05-04 14:31 |
| Very recently, changed regrtest.py to use 'backslashreplace' when printing errors. This issue seems very similar |
|
|
| msg104946 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-04 15:01 |
| The example raises an AssertionError(u'\n- \ufffd+ \ufffd\ufffd') which is converted to a string by traceback.format_exception(). This function fails in _some_str() on the str(value) instruction. You can reproduce the error with:

>>> str(AssertionError(u"\xe9"))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 0: ordinal not in range(128)

> The problem with creating unicode tracebacks is that they could
> fail when being output on terminals not capable of showing
> the full range of unicode characters (the default terminal
> on Windows is CP1252).

The problem is not related to the terminal encoding: str(value) uses Python's default encoding (ASCII by default).

Python 3 is not affected because str(AssertionError("\xe9")) doesn't raise any error: it returns "\xe9". |
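To make the failing step concrete, here is a small sketch written for Python 3. It imitates the implicit strict ASCII encoding that Python 2's str() applies to a unicode message. The helper names `python2_style_str` and `describe_failure` are hypothetical and belong to no library.

```python
def python2_style_str(message):
    """Mimic Python 2's str() on a unicode message: strict ASCII encoding."""
    return message.encode("ascii")


def describe_failure(message):
    """Return (encoding, position) of the first unencodable character, or None."""
    try:
        python2_style_str(message)
    except UnicodeEncodeError as exc:
        # The codec reports which encoding failed and where, independent of
        # any terminal: no output has been attempted at this point.
        return exc.encoding, exc.start
    return None
```

The error comes from the codec alone, which matches the point that the terminal encoding is not involved.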
|
|
| msg104947 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-04 15:04 |
| > Very recently, changed regrtest.py to use
> 'backslashreplace' when printing errors. This issue seems
> very similar

Issue #8533 is not directly related because in this issue the error occurs before writing the traceback to the terminal. |
|
|
| msg104949 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-04 15:17 |
| The attached patch fixes the _some_str() function of the traceback module: it encodes a unicode exception message to ASCII using the backslashreplace error handler.

ASCII is not the best choice, but str(unicode(...)) also uses ASCII (the default encoding) and we don't know the terminal encoding in traceback. We cannot do better here in Python 2 (without breaking a lot of APIs...).

The right fix is to use Python 3, which formats a traceback to unicode (unicode characters of the error message are kept unchanged). The choice of the encoding and error handler is made only at the end, when writing the output to the terminal, which is the right thing to do. |
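The "choose the encoding only at the end" approach can be sketched in Python 3 with the standard io module. The function name `write_text` is a hypothetical helper, and the in-memory byte buffer stands in for a terminal.

```python
import io


def write_text(text, encoding, errors):
    """Write text to a byte stream, picking encoding and error handler last."""
    raw = io.BytesIO()  # stand-in for the terminal's byte stream
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="")
    stream.write(text)   # the traceback stays text up to this point
    stream.flush()
    stream.detach()      # keep `raw` usable after the wrapper is discarded
    return raw.getvalue()
```

The same text can then be written to an ASCII stream with `backslashreplace` or to a UTF-8 stream unchanged, without the formatting code knowing which one it will be.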
|
|
| msg104989 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-05 00:32 |
| > The downside of using backslashreplace (or repr, for that matter) is
> that it does not preserve lengths, so the diff markers can get
> misaligned. I find that an acceptable tradeoff, but 'replace' is
> another option that preserves lengths, at least more often.

'replace' loses important information: if the test is about the unicode string content, we will be unable to see the error data.

Result of the first example with my patch (backslashreplace):

======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "x.py", line 3, in test_fffd
    def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
AssertionError: - \ufffd+ \ufffd\ufffd

Result of the first example with the 'replace' error handler:

======================================================================
FAIL: test_fffd (__main__.Foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "x.py", line 3, in test_fffd
    def test_fffd(self): self.assertEqual(u'\ufffd', u'\ufffd\ufffd')
AssertionError: - ?+ ??

(but this example is irrelevant because U+FFFD is the unicode replacement character :-D)

If nobody complains about my patch, I will commit it to Python trunk (only). You can still reimplement the fail() method to encode the message using a more relevant encoding and/or error handler. |
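Since the U+FFFD example does not show the tradeoff well, here is a small sketch with two strings that differ in a single accented character. The helper `render` and the sample strings are made up for illustration.

```python
def render(text, errors):
    """Encode text to ASCII with the given error handler; return it as text."""
    return text.encode("ascii", errors).decode("ascii")


expected = u"r\xe9sum\xe9"  # e-acute in both places
actual = u"r\xe8sum\xe9"    # e-grave in the first place

# 'replace' keeps the length but maps both accents to '?'.
replaced = (render(expected, "replace"), render(actual, "replace"))

# 'backslashreplace' changes the length but keeps the two strings distinct.
escaped = (render(expected, "backslashreplace"), render(actual, "backslashreplace"))
```

With 'replace' the two sides of the failed assertion become identical, so the report no longer shows what differed. With 'backslashreplace' they stay distinguishable, at the cost of longer lines.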
|
|
| msg105011 - (view) |
Author: Michael Foord (michael.foord) *  |
Date: 2010-05-05 10:19 |
| I would prefer to try str(...) first and only attempt to convert to unicode and do the backslash replace if the str(...) call fails. |
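A rough sketch of that ordering, written so it runs on Python 3. Because Python 3's str() does not fail on non-ASCII messages, the hypothetical class `Py2StyleError` simulates the Python 2 failure, and the fallback reads the message from `args`, whereas the message above describes converting the value to unicode. This is an illustration under those assumptions, not the committed patch.

```python
class Py2StyleError(Exception):
    """Hypothetical stand-in: str() encodes the message strictly to ASCII,
    as Python 2 does for an exception built from a unicode message."""

    def __str__(self):
        text = self.args[0]
        text.encode("ascii")  # raises UnicodeEncodeError on non-ASCII text
        return text


def some_str(value):
    """Return a printable form of an exception value without raising."""
    try:
        return str(value)  # first choice: the plain conversion
    except Exception:
        pass
    try:
        # Fallback only when str() failed: escape what ASCII cannot hold.
        text = value.args[0]
        return text.encode("ascii", "backslashreplace").decode("ascii")
    except Exception:
        # Last resort when no usable message can be recovered.
        return "<unprintable %s object>" % type(value).__name__
```

Messages that str() can already handle pass through untouched, so the escaping only changes output that would otherwise have been lost.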
|
|
| msg105022 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2010-05-05 12:46 |
| Committed: r80777 (trunk) and r80779 (2.6); blocked: r80778 (py3k). Open a new issue if you would like to use something better than ASCII+backslashreplace in unittest (using runner stream encoding?). |
|
|
| msg150127 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-12-23 02:32 |
| http://pypi.python.org/pypi/unittest2 says "There are several places in unittest2 (and unittest) that call str(...) on exceptions to get the exception message. This can fail if the exception was created with non-ascii unicode. This is rare and I won't address it unless it is actually reported as a problem for someone." It is a problem for us now that we've re-rooted all our TestCases on top of unittest2 at work. :) The solution I'm leaning towards is monkey-patching the new traceback._some_str implementation in at unittest2 import time. |
|
|
| msg150128 - (view) |
Author: Gregory P. Smith (gregory.p.smith) *  |
Date: 2011-12-23 02:32 |
| We're on Python 2.6; otherwise this would be a moot point. But you might want to include something like that in a new unittest2 backport release. |
|
|
| msg150172 - (view) |
Author: Michael Foord (michael.foord) *  |
Date: 2011-12-23 15:45 |
| traceback patch looks good. Thanks for the unittest2 patch as well. |
|
|