msg274222 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-02 10:19 |
The "us-ascii" encoding is an alias of the Python ASCII encoding. The PyUnicode_AsEncodedString() and PyUnicode_Decode() functions have a fast-path for the "ascii" string, but not for "us-ascii". The attached patch also uses the fast-path for "us-ascii". It's a more generic change than issue #27915. The "us-ascii" name is common in the email and xml.etree modules.

Other changes in the patch:

* Rewrite _Py_normalize_encoding() as a C implementation of encodings.normalize_encoding(). For example, " utf-8 " is now normalized to "utf_8", so the fast path is now used for more name variants of the same encoding.
* Avoid strcpy() when encoding is NULL: call the UTF-8 codec directly.
* Reorder encodings: UTF-8, ASCII, MBCS, Latin1, UTF-16.
* Remove the fast-path for UTF-32: seriously, nobody uses this codec. Latin9 is much faster but has no fast-path. |
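As a rough illustration of the normalization described in the first bullet, the Python sketch below collapses each run of non-alphanumeric characters into a single underscore and lowercases the result. The function name `normalize_name` is hypothetical, and this is not the C code from the patch; only `codecs.lookup()` at the end is a real standard-library call.

```python
import codecs


def normalize_name(encoding: str) -> str:
    """Hypothetical sketch: collapse separator runs to '_' and lowercase."""
    chars = []
    pending_sep = False
    for ch in encoding:
        if ch.isalnum() or ch == ".":
            # Emit one underscore per run of separators, never a leading one.
            if pending_sep and chars:
                chars.append("_")
            chars.append(ch.lower())
            pending_sep = False
        else:
            pending_sep = True
    return "".join(chars)


normalized = normalize_name(" utf-8 ")
us_ascii = normalize_name("US-ASCII")

# "us-ascii" is registered as an alias of the ASCII codec.
canonical = codecs.lookup("us-ascii").name
```

With names reduced to one spelling like this, a fast path only has to compare against a small set of strings.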
|
|
msg274232 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-09-02 11:37 |
See also get_standard_encoding() in Python/codecs.c. I suppose it is faster. UTF-32 is rarely used as an external encoding, but it is still used as an internal encoding in some programming languages and libraries (e.g. wchar_t* in C and std::wstring in C++ on Linux). The codec itself is very fast. I would add a fast path for all UTF encodings (except UTF-7). |
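To make the UTF-32 point concrete: the codec maps every code point to exactly four bytes, which is what makes it suitable as a fixed-width internal representation. A small sketch using only str.encode(); the sample string is an arbitrary choice:

```python
# Three code points: ASCII, Latin-1 range, and one outside the BMP.
text = "a\u00e9\U0001F600"

utf32 = text.encode("utf-32-le")  # explicit byte order, so no BOM is written
utf16 = text.encode("utf-16-le")  # the non-BMP character needs a surrogate pair
utf8 = text.encode("utf-8")

sizes = (len(utf32), len(utf16), len(utf8))
fixed_width = len(utf32) == 4 * len(text)
```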
|
|
msg274455 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-09-05 22:48 |
New changeset 99818330b4c0 by Victor Stinner in branch 'default': Issue #27938: Add a fast-path for us-ascii encoding https://hg.python.org/cpython/rev/99818330b4c0 |
|
|
msg274456 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-05 22:51 |
> See also get_standard_encoding() in Python/codecs.c. I suppose it is faster.

I understand that PyCodec_SurrogatePassErrors() is already called with a normalized encoding name. With my enhanced _Py_normalize_encoding(), strange syntaxes like " utf 8 " also take the fast path.

> UTF-32 is rarely used as external encoding, but ...

Ok, I used the same design as get_standard_encoding() to match the "utf" prefix, so having a fast-path for UTF-16 and UTF-32 doesn't add a new strcmp() for "latin9". I pushed my change, so I close the issue. |
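The prefix-matching idea can be sketched in Python as follows. This is only an illustration of the design, not the C code that was pushed: the function name `fast_path_codec`, the returned tags, and the exact alias lists are assumptions, and MBCS is left out for brevity.

```python
def fast_path_codec(normalized: str):
    """Hypothetical dispatch: return a codec tag, or None for the slow path."""
    if normalized.startswith("utf"):
        rest = normalized[3:]
        if rest.startswith("_"):
            rest = rest[1:]  # accept both "utf8" and "utf_8"
        if rest in ("8", "16", "32"):
            return "utf-" + rest
        return None  # e.g. "utf_7": no fast path
    # Names without the "utf" prefix skip the UTF checks entirely and
    # only pay for the comparisons below.
    if normalized in ("ascii", "us_ascii"):
        return "ascii"
    if normalized in ("latin1", "latin_1", "iso_8859_1", "iso8859_1"):
        return "latin-1"
    return None


results = [fast_path_codec(n) for n in ("utf_8", "utf32", "us_ascii", "latin9", "utf_7")]
```

Because the UTF variants are all handled behind one prefix test, adding UTF-16 and UTF-32 costs nothing extra for a name such as "latin9".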
|
|
msg274512 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-09-06 04:23 |
It seems this change is the cause of the FreeBSD buildbot failures. From memory, both failing cases involve sending or receiving non-ASCII bytes in child Python processes.

http://buildbot.python.org/all/builders/AMD64%20FreeBSD%20CURRENT%20Non-Debug%203.x/builds/110/steps/test/logs/stdio

======================================================================
FAIL: test_non_ascii (test.test_cmd_line_script.CmdLineTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/test_cmd_line_script.py", line 517, in test_non_ascii
    rc, stdout, stderr = assert_python_ok(script_name)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/support/script_helper.py", line 139, in assert_python_ok
    return _assert_python(True, *args, **env_vars)
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/support/script_helper.py", line 125, in _assert_python
    err))
AssertionError: Process return code is 1
command line: ['/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/python', '-X', 'faulthandler', '-I', './@test_60885_tmp\udce7w\udcf0.py']

stdout:
---
---

stderr:
---
UnicodeEncodeError: 'ascii' codec can't encode character '\xe7' in position 17: ordinal not in range(128)
---

======================================================================
FAIL: test_nonascii (test.test_readline.TestReadline)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/home/buildbot/python/3.x.koobs-freebsd-current.nondebug/build/Lib/test/test_readline.py", line 203, in test_nonascii
    self.assertIn(b"text 't\\xeb'\r\n", output)
AssertionError: b"text 't\\xeb'\r\n" not found in bytearray(b"^A^B^B^B^B^B^B^B\t\tx\t\r\n[\x07\r\x07\x07\x07\x07\x07\x07\x07\x07x[\x08\x07\r\nresult \'x[\'\r\nhistory \'x[\'\r\n") |
|
|
msg274692 - (view) |
Author: Kubilay Kocak (koobs)  |
Date: 2016-09-07 00:52 |
Re-open and assign for regressions. Observed in all koobs-freebsd* buildbots (9/10/11) and build types. Issue is in the default branch (add version 3.7).

First failing test run: http://buildbot.python.org/all/builders/AMD64%20FreeBSD%20CURRENT%20Non-Debug%203.x/builds/110 |
|
|
msg274720 - (view) |
Author: Martin Panter (martin.panter) *  |
Date: 2016-09-07 02:56 |
Koobs, if you can, it would be good to understand where the failure is. My guess is that Python doesn’t like running a non-ASCII filename. The following is hopefully a simplified version of the test_cmd_line_script test case:

import os, subprocess, sys
script_name = os.fsdecode(b'./\xE7w\xF0.py')
script_file = open(script_name, 'w', encoding='utf-8')
script_file.write('print(ascii(__file__))\n')
script_file.close()
cmd_line = [sys.executable, '-X', 'faulthandler', '-I', script_name]
env = os.environ.copy()
env['TERM'] = ''
proc = subprocess.Popen(cmd_line,
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    env=env)
out, err = proc.communicate()
print(proc.returncode)  # Should be 0 but FreeBSD has 1
print(repr(err))  # Error is about encoding 0xE7 with ASCII
print(repr(out))  # If executed, this would be the file name

Hopefully fixing the above problem will help with the test_readline failure. The readline test case does Readline (tab) completions involving non-ASCII text, and it seems that the Python completion routine is no longer being called. |
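For readers unfamiliar with the '\udce7' and '\udcf0' characters in the failing command line, the sketch below shows the surrogateescape mechanism that produces them when undecodable bytes are decoded as ASCII. It illustrates the mechanism only and makes no claim about the actual cause of the FreeBSD failure; the ASCII codec is picked here as an assumption so that the result does not depend on the local filesystem encoding.

```python
raw = b"./\xe7w\xf0.py"

# Bytes >= 0x80 cannot be decoded as ASCII; surrogateescape maps each one
# to a lone surrogate in the range U+DC80..U+DCFF instead of failing.
name = raw.decode("ascii", "surrogateescape")

# Encoding with the same error handler restores the original bytes.
roundtrip = name.encode("ascii", "surrogateescape")

# A strict ASCII encode of the same string is rejected.
try:
    name.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```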
|
|
msg274732 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-07 03:12 |
Sorry, but I don't have enough information to fix the issue. I don't see how my change can break the two failing tests. Could you please try to collect more information manually? |
|
|
msg274796 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2016-09-07 11:15 |
Maybe the Windows buildbot failures are related:

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/8294/steps/test/logs/stdio

======================================================================
FAIL: test_create_at_shutdown_without_encoding (test.test_io.PyTextIOWrapperTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_io.py", line 3174, in test_create_at_shutdown_without_encoding
    self.assertIn(self.shutdown_error, err.decode())
AssertionError: 'LookupError: unknown encoding: ascii' not found in 'Exception ignored in: <bound method C.__del__ of <__main__.C object at 0x000000000123BF60>>\r\nTraceback (most recent call last):\r\n File "", line 12, in __del__\r\n File "C:\\buildbot.python.org\\3.x.kloth-win64\\build\\lib\\_pyio.py", line 1934, in __init__\r\n File "C:\\buildbot.python.org\\3.x.kloth-win64\\build\\lib\\encodings\\__init__.py", line 158, in _alias_mbcs\r\nImportError: sys.meta_path is None, Python is likely shutting down'
---------------------------------------------------------------------- |
|
|
msg274831 - (view) |
Author: Steve Dower (steve.dower) *  |
Date: 2016-09-07 16:29 |
The Windows buildbot failures are partly my fault and partly Ben's fault (I created a new error message - Ben added it to the wrong test), so I'll go and prevent the error message. No idea on the other issue. It doesn't repro for me, but since it seems to be FreeBSD readline related that isn't a surprise. |
|
|
msg274834 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-07 16:46 |
> FAIL: test_create_at_shutdown_without_encoding (test.test_io.PyTextIOWrapperTest)

Steve fixed it:
---
changeset:   103229:47b4dbd451f5
tag:         tip
user:        Steve Dower <steve.dower@microsoft.com>
date:        Wed Sep 07 09:31:52 2016 -0700
files:       Lib/encodings/__init__.py Lib/test/test_io.py
description:
Issue #27959: Prevent ImportError from escaping codec search function
---

Its new search function now catches ImportError as expected. |
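The general pattern behind that fix can be sketched as below: a codec search function that returns None when an import fails, so callers see the usual LookupError rather than an ImportError. The function `demo_search`, the encoding name, and the module name are all hypothetical; this is not the code from Lib/encodings/__init__.py.

```python
import codecs
import importlib


def demo_search(name):
    """Hypothetical search function that never lets ImportError escape."""
    if name != "demo_alias_27938":
        return None
    try:
        # Hypothetical module name; it does not exist, so the import fails.
        module = importlib.import_module("hypothetical_missing_codec_module")
        return module.getregentry()
    except ImportError:
        # Imports may fail, for example during interpreter shutdown.
        # Returning None tells the codec registry "not found here".
        return None


codecs.register(demo_search)

try:
    codecs.lookup("demo_alias_27938")
    outcome = "found"
except LookupError:
    outcome = "LookupError"
except ImportError:
    outcome = "ImportError"
```

The registry treats a None result as "keep searching" and raises LookupError itself once every search function has declined.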
|
|
msg275574 - (view) |
Author: Roundup Robot (python-dev)  |
Date: 2016-09-10 06:14 |
New changeset 3b185df3a3e2 by Victor Stinner in branch 'default': Fix check_force_ascii() https://hg.python.org/cpython/rev/3b185df3a3e2 |
|
|
msg275576 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-10 06:31 |
> New changeset 3b185df3a3e2 by Victor Stinner in branch 'default':
> Fix check_force_ascii()
> https://hg.python.org/cpython/rev/3b185df3a3e2

@koobs: That's my tiny gift for your birthday. Happy Birthday! ;-)

(It should fix FreeBSD buildbots.) |
|
|
msg275577 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2016-09-10 06:33 |
Sorry for the little breakage of FreeBSD buildbots, it seems to be ok now ;-) |
|
|
msg275625 - (view) |
Author: Kubilay Kocak (koobs)  |
Date: 2016-09-10 11:14 |
@Victor I was just checking this issue to copy the test command, to provide results to you both when I saw the lovely surprise. Thank you :) |
|
|
msg308374 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-12-15 10:19 |
New changeset 297fd876aad8ef443d8992618de22c46dbda258b by Victor Stinner (Ville Skyttä) in branch 'master': bpo-28393: Update encoding lookup docs wrt bpo-27938 (#4871) https://github.com/python/cpython/commit/297fd876aad8ef443d8992618de22c46dbda258b |
|
|
msg308392 - (view) |
Author: STINNER Victor (vstinner) *  |
Date: 2017-12-15 14:23 |
New changeset 77bf6da7258b4a312e224860ea50ac010aa17c1e by Victor Stinner (Miss Islington (bot)) in branch '3.6': bpo-28393: Update encoding lookup docs wrt bpo-27938 (GH-4871) (#4881) https://github.com/python/cpython/commit/77bf6da7258b4a312e224860ea50ac010aa17c1e |
|
|