msg163227 - (view) |
Author: Ev Kontsevoy (ekontsevoy) |
Date: 2012-06-19 22:18 |
When calling connection.iterdump() on a database with non-ASCII string values, the following exception is raised:
----------------------------------------------------
  File "/python-2.7.3/lib/python2.7/sqlite3/dump.py", line 56, in _iterdump
    yield("{0};".format(row[0]))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 48-51: ordinal not in range(128)
----------------------------------------------------
The older versions used the following (safer) version in /python-2.7.3/lib/python2.7/sqlite3/dump.py:56:
    yield("%s;" % row[0]) |
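A minimal reproduction sketch of the report above, assuming an in-memory database with a single table. The helper name dump_statements, the table name t, and the sample value are hypothetical and not taken from the report. On an affected 2.7.3 interpreter the iterdump() loop would be expected to raise the UnicodeEncodeError shown; on an unaffected interpreter (including Python 3) it should simply return the dump.

```python
# -*- coding: utf-8 -*-
import sqlite3


def dump_statements(text_value):
    """Store one text value in an in-memory table and return the SQL dump.

    Hypothetical helper for illustration; it is not part of sqlite3.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(u"CREATE TABLE t (name TEXT)")
        conn.execute(u"INSERT INTO t (name) VALUES (?)", (text_value,))
        conn.commit()
        # iterdump() yields the statements needed to rebuild the database.
        return list(conn.iterdump())
    finally:
        conn.close()


# Non-ASCII sample value (an assumption; any non-ASCII text should do).
statements = dump_statements(u"caf\u00e9 \u2107")
for statement in statements:
    print(statement)
```

The unicode literals keep the script valid on both 2.7 and 3.3+, so the same file can be used to compare interpreters.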
|
|
msg163230 - (view) |
Author: Ev Kontsevoy (ekontsevoy) |
Date: 2012-06-19 22:53 |
Proposed fix: maybe yield(u"%s;" % row[0]) or simply row[0] + ";"? |
|
|
msg163235 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2012-06-20 00:48 |
It's not clear to me why the behavior differs. Hopefully Eric will explain. For 2.7 we should probably just revert the change to the yield statement to restore the previous behavior, unless format can be fixed. |
|
|
msg163237 - (view) |
Author: Ev Kontsevoy (ekontsevoy) |
Date: 2012-06-20 00:57 |
If the behavior of string.format() can be fixed to act identically to u"%s" % "" that would be simply wonderful! Currently at work we have a rule in place: to never use string.format() since it cannot be used for anything but constants due to encoding exceptions. |
|
|
msg163239 - (view) |
Author: Eric V. Smith (eric.smith) *  |
Date: 2012-06-20 01:02 |
Could you reproduce this in a short script that doesn't use sqlite? I'm looking for something like:
    str = 'some-string'
    "{0}".format(str)
Also: is that the entire traceback? I don't see how format could be invoking a codec. Maybe the error occurs when writing it to stdout, or some other operation that's encoding? |
|
|
msg163241 - (view) |
Author: Ev Kontsevoy (ekontsevoy) |
Date: 2012-06-20 01:09 |
I am attaching a death.py file, which dies on string.format(). The stack trace above is at the full depth. Python doesn't print anything from inside of format(). |
|
|
msg163243 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2012-06-20 01:49 |
>>> print('{}'.format(u'\u2107'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2107' in position 0: ordinal not in range(128)
>>> print('%s' % u'\u2107')
ℇ
(You get the exception without the print as well, just in case that isn't clear.)
Ah, and now I see why this is true. The '%s' gets implicitly coerced to unicode. So, it is not a bug in format, and the yield statement change should be reverted. You can use format if you just always make your format input strings unicode strings (which you should be doing anyway, especially now that python3.3 will allow the 'u' prefix...that is, such code will be forward-compatible with Python3). |
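A small sketch of the workaround described above, assuming a unicode template is acceptable at the call site. The variable and function names are illustrative only. As the session above suggests, on Python 2 a byte-string template passed to format() ends up encoding the argument with the ASCII codec, while '%s' promotes the whole result to unicode; the explicit encode below mimics that failing step so the sketch behaves the same on 2.7 and 3.3+.

```python
# -*- coding: utf-8 -*-
# Sketch only: names here are illustrative, not taken from sqlite3/dump.py.

value = u"\u2107"  # the non-ASCII character used in the session above

# With a unicode template, both formatting styles return text
# on Python 2.7 and on Python 3.3+.
percent_result = u"%s;" % value
format_result = u"{0};".format(value)


def fails_ascii_encoding(text):
    """Return True if encoding text as ASCII raises UnicodeEncodeError."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


# True for this value: it cannot be represented as ASCII bytes, which is
# why a byte-string template cannot hold it on Python 2.
needs_unicode_template = fails_ascii_encoding(value)
```

Using unicode templates throughout keeps the two formatting styles interchangeable, which is the forward-compatible habit recommended above.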
|
|
msg163244 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2012-06-20 01:50 |
Or use 'from __future__ import unicode_literals'. |
|
|
msg163246 - (view) |
Author: R. David Murray (r.david.murray) *  |
Date: 2012-06-20 01:58 |
Note that this is a regression in 2.7.3 relative to 2.7.2, which is why I'm marking it as high priority. |
|
|
msg179614 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2013-01-11 02:12 |
New changeset 2a417ad8bfbf by R David Murray in branch '2.7':
#15109: revert '%'->'format' changes in 4b105d328fe7 to fix regression.
http://hg.python.org/cpython/rev/2a417ad8bfbf |
|
|