| msg327945 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 08:50 |
| `asyncio.create_subprocess_exec` accepts a list of `str` as arguments, which leads to a `UnicodeEncodeError`. I think it should accept only bytes, shouldn't it? |
|
|
| msg327953 |
Author: Andrew Svetlov (asvetlov) *  |
Date: 2018-10-18 09:50 |
| A list of strings works both on my local Linux box and in the CPython test suite. Please provide more info about the error; a stack trace would help. |
|
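
A minimal sketch of what Andrew describes, assuming a POSIX system and Python 3.7+ (for `asyncio.run`): `create_subprocess_exec` takes each argument either as `str`, which `subprocess` converts to bytes itself, or as `bytes`, which is handed to the OS unchanged. The child program (a Python one-liner echoing `sys.argv[1]`) and the helper names `run`/`main` are only illustrative.

```python
import asyncio
import os
import sys

# Illustrative child program: writes its first argument back to stdout.
ECHO = "import sys; sys.stdout.write(sys.argv[1])"


async def run(program, *args):
    # Spawn the child and collect its stdout as raw bytes.
    proc = await asyncio.create_subprocess_exec(
        program, *args, stdout=asyncio.subprocess.PIPE
    )
    out, _ = await proc.communicate()
    return proc.returncode, out


async def main():
    # str arguments: subprocess converts them to bytes before exec().
    from_str = await run(sys.executable, "-c", ECHO, "hello")
    # bytes arguments: passed through unchanged (assumes POSIX).
    from_bytes = await run(os.fsencode(sys.executable), b"-c", ECHO.encode(), b"hello")
    return from_str, from_bytes


from_str, from_bytes = asyncio.run(main())
print(from_str)    # (0, b'hello')
print(from_bytes)  # (0, b'hello')
```

With a plain ASCII argument both forms behave identically, which matches Andrew's observation; the difference only shows up for non-ASCII `str` arguments under a non-UTF-8 locale.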
|
| msg327954 |
Author: STINNER Victor (vstinner) *  |
Date: 2018-10-18 09:54 |
| Hi Remy,

> Asyncio.create_subprocess_exec accepts a list of str as parameter which lead to UnicodeEncodeError I think it should accept only bytes shouldn't it?

Can you elaborate? On which OS? What is your error message? Can you paste a traceback? |
|
|
| msg327955 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 10:04 |
| > List of strings works on both my local Linux box and CPython test suite.

Indeed, that's why I posted this bug report: in my opinion it should work only with byte strings.

> Can you elaborate? On which OS? What is your error message? Can you paste a traceback?

If you try to pass a non-ASCII string on a Linux box, for instance, you might get a `UnicodeEncodeError`. Let me try to provide you with a script to reproduce this error. |
|
|
| msg327962 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 10:53 |
| I thought this would be enough to reproduce the issue. However, it seems that if the system encoding is UTF-8, it works properly. Here is the traceback I had:
```
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 69: ordinal not in range(128)
  File "worker.py", line 393, in <module>
    return_code = loop.run_until_complete(main(loop))
  File "asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "worker.py", line 346, in main
    '-f mp4', '-o', '{}/{}.mp4'.format(download_tempdir, video_id))
  File "worker.py", line 268, in run_command
    proc = await create
  File "asyncio/subprocess.py", line 225, in create_subprocess_exec
    stderr=stderr, **kwds)
  File "asyncio/base_events.py", line 1191, in subprocess_exec
    bufsize, **kwargs)
  File "asyncio/unix_events.py", line 191, in _make_subprocess_transport
    **kwargs)
  File "asyncio/base_subprocess.py", line 39, in __init__
    stderr=stderr, bufsize=bufsize, **kwargs)
  File "asyncio/unix_events.py", line 697, in _start
    universal_newlines=False, bufsize=bufsize, **kwargs)
  File "python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "python3.6/subprocess.py", line 1267, in _execute_child
    restore_signals, start_new_session, preexec_fn)
```
|
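
The failing step can be reproduced without a subprocess or a Docker image, as a sketch: on POSIX, `subprocess` turns each `str` argument into bytes with `os.fsencode()`, i.e. the filesystem encoding plus the `surrogateescape` error handler. Forcing the `ascii` codec by hand (an assumption standing in for the Docker image's `ANSI_X3.4-1968` locale) raises the same exception, and its `start` attribute tells you which character of which argument is to blame. The helper `first_unencodable` and the file name are hypothetical.

```python
from typing import Optional, Tuple


def first_unencodable(arg: str, encoding: str = "ascii") -> Optional[Tuple[int, str]]:
    """Return (position, character) of the first character `encoding` cannot encode.

    Mimics the str -> bytes conversion os.fsencode() applies on POSIX
    (codec + "surrogateescape"), but with an explicitly chosen codec instead
    of the process's actual filesystem encoding.
    """
    try:
        arg.encode(encoding, "surrogateescape")
    except UnicodeEncodeError as exc:
        print(exc)  # same message format as in the traceback
        return exc.start, arg[exc.start]
    return None


# Hypothetical output path containing one non-ASCII character (U+00E9).
path = "/tmp/downloads/vid\u00e9o.mp4"
result = first_unencodable(path)
# prints: 'ascii' codec can't encode character '\xe9' in position 18: ordinal not in range(128)
# result == (18, "\u00e9"); under "utf-8" the same path encodes fine.
```

Applied to each element of the real argument list, this points at the argument whose 70th character (position 69) is the `é`.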
|
| msg327964 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 11:03 |
| I am adding the following info. If I run the following on the Docker image where I got the error:
```
import sys
import locale
print(sys.getdefaultencoding())
print(locale.getpreferredencoding())
```
I get:
```
utf-8
ANSI_X3.4-1968
```
While if I run it on my machine I get:
```
utf-8
UTF-8
```
I don't know how to force the usage of the former locally to reproduce. Setting `LC_ALL=C` and `LANG=C` didn't do the trick. |
|
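
For this kind of report, a slightly extended version of the snippet above may help (a sketch, not part of the original thread): per Victor's later explanation, the value that governs how `subprocess` encodes `str` arguments is `sys.getfilesystemencoding()`, and on Python 3.7+ `sys.flags.utf8_mode` shows whether the UTF-8 Mode is active. The `getattr` fallback keeps it working on 3.6, which lacks that flag; `encoding_report` is a hypothetical name.

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings that matter when passing str arguments to a child process."""
    return {
        # Default codec of str.encode()/bytes.decode(); always 'utf-8' on Python 3.
        "default": sys.getdefaultencoding(),
        # Locale encoding (ANSI_X3.4-1968, i.e. ASCII, in the failing Docker image).
        "preferred": locale.getpreferredencoding(False),
        # Codec and error handler os.fsencode() uses for arguments, paths, env vars.
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # 1 when the UTF-8 Mode is enabled (Python 3.7+); 0 otherwise or on 3.6.
        "utf8_mode": getattr(sys.flags, "utf8_mode", 0),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```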
|
| msg327965 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 11:06 |
| Here we go:
```
$ python3.7 demo.py
utf-8
UTF-8
Traceback (most recent call last):
  File "demo.py", line 21, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 568, in run_until_complete
    return future.result()
  File "demo.py", line 14, in main
    sys.stdout.write(out.decode('utf-8'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
```
|
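
Note that in this `demo.py` traceback the exception is raised by `sys.stdout.write()`, i.e. by the ASCII encoder of the parent's own `sys.stdout`, not while spawning the child. A sketch of one way around that particular failure, assuming the child's output should simply be forwarded verbatim: write the raw bytes to the binary buffer underneath the text stream. An `io.TextIOWrapper` over `io.BytesIO` stands in for an ASCII-configured `sys.stdout`; `forward_bytes` is a hypothetical helper name.

```python
import io


def forward_bytes(stream: io.TextIOWrapper, data: bytes) -> None:
    """Write raw bytes underneath a text stream, skipping its encoder."""
    stream.flush()  # keep ordering with any text already written
    stream.buffer.write(data)
    stream.buffer.flush()


# Stand-in for an ASCII-configured sys.stdout.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
# What proc.communicate() might return for a non-ASCII file name.
child_output = "vid\u00e9o\n".encode("utf-8")

try:
    # Same pattern as demo.py line 14: decode, then re-encode via the text layer.
    fake_stdout.write(child_output.decode("utf-8"))
except UnicodeEncodeError as exc:
    print("text write failed:", exc)

forward_bytes(fake_stdout, child_output)
print(fake_stdout.buffer.getvalue())  # b'vid\xc3\xa9o\n'
```

With the real process this would be `forward_bytes(sys.stdout, out)`; whether the terminal then displays the UTF-8 bytes correctly depends on the terminal, not on Python.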
|
| msg327966 |
Author: Andrew Svetlov (asvetlov) *  |
Date: 2018-10-18 11:07 |
| I think you'll get the same error with a `subprocess.run()` call if your current locale is not UTF-8. I don't recall the details, but the Internet has a lot of info about setting the locale per user and system-wide. |
|
|
| msg327967 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 11:08 |
| I believe Python 3.7 brings explicit Unicode encoding/decoding. If `create_subprocess_exec` can fail depending on the environment, I believe we should not try to encode the command-line arguments but rather require them to be bytes. |
|
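
What Rémy proposes can already be done on the caller's side, as a sketch assuming POSIX: encode every argument to bytes yourself with an explicit codec, so the result no longer depends on the container's locale. `to_bytes_args` and the program name `downloader` are hypothetical, and UTF-8 is an assumption about what the child program expects.

```python
import os
from typing import List, Sequence, Union

Arg = Union[str, bytes, os.PathLike]


def to_bytes_args(args: Sequence[Arg], encoding: str = "utf-8") -> List[bytes]:
    """Encode each argument with an explicit codec instead of the locale's."""
    out = []
    for arg in args:
        if isinstance(arg, os.PathLike):
            arg = os.fspath(arg)  # may yield str or bytes
        if isinstance(arg, str):
            # "surrogateescape" restores bytes that os.fsdecode() smuggled into str.
            arg = arg.encode(encoding, "surrogateescape")
        out.append(arg)  # bytes pass through unchanged
    return out


argv = to_bytes_args(["downloader", "-o", "/tmp/vid\u00e9o.mp4"])
print(argv)  # [b'downloader', b'-o', b'/tmp/vid\xc3\xa9o.mp4']
# Then, inside a coroutine: await asyncio.create_subprocess_exec(*argv, ...)
```

This keeps the decision about the byte encoding in the caller's hands rather than making `asyncio` reject `str` outright, which would break the many callers for whom the locale is already UTF-8.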
|
| msg327970 |
Author: STINNER Victor (vstinner) *  |
Date: 2018-10-18 11:57 |
| I added the UTF-8 Mode for you, for the Docker use case: `python3.7 -X utf8`. Using that, Python ignores your locale and speaks UTF-8. What is your locale? Try the `locale` command. |
|
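
A quick way to confirm the effect of `-X utf8` from Python itself, as a sketch assuming Python 3.7+ (the environment variable `PYTHONUTF8=1` is the documented equivalent of the option): start a child interpreter with the option and ask which mode and filesystem encoding it ended up with. `PROBE` is just an illustrative one-liner.

```python
import subprocess
import sys

PROBE = "import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())"

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", PROBE],
    stdout=subprocess.PIPE,
    universal_newlines=True,  # decode the child's (ASCII-only) output as text
    check=True,
)
print(result.stdout.strip())  # 1 utf-8
```

In the Docker image, the same probe run without `-X utf8` should report the ASCII filesystem encoding seen in the traceback.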
|
| msg327974 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 12:40 |
| Here is the locale output:
```
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
```
|
|
| msg327975 |
Author: STINNER Victor (vstinner) *  |
Date: 2018-10-18 12:41 |
| > LC_CTYPE="POSIX"

I modified Python 3.7.1 to enable the UTF-8 Mode when the LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if the LC_CTYPE is "C". |
|
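
The 3.7.1 behaviour can be checked by starting a child interpreter under `LC_ALL=POSIX`, the same `LC_CTYPE` as in the Docker image. A sketch assuming a POSIX system and Python 3.7.1 or later; `-E` makes the child ignore `PYTHON*` variables such as `PYTHONUTF8`, so only the locale decides whether the UTF-8 Mode is enabled.

```python
import os
import subprocess
import sys

PROBE = "import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())"

# Copy the current environment but force the POSIX locale for every category.
env = dict(os.environ, LC_ALL="POSIX")

result = subprocess.run(
    [sys.executable, "-E", "-c", PROBE],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    env=env,
    check=True,
)
# On 3.7.1+ the POSIX locale is expected to turn the UTF-8 Mode on automatically.
print(result.stdout.strip())
```

Under Python 3.7.0 the same probe would only report the UTF-8 Mode for `LC_ALL=C`, which is the gap the change closes.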
|
| msg327976 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 12:43 |
| Unicode is complicated; the answer is somewhere in here: https://unicodebook.readthedocs.io/ Sorry for the bother. I thought it was a bug, but apparently it's a feature. Thank you for your help, and thank you for making Python better. |
|
|
| msg327977 |
Author: STINNER Victor (vstinner) *  |
Date: 2018-10-18 12:44 |
| This issue is not an asyncio bug: the bug occurs in subprocess. It is not a subprocess bug either: subprocess works as expected, it encodes Unicode with `sys.getfilesystemencoding()` (see `os.fsencode()`). The bug is that you use non-ASCII strings whereas your filesystem encoding is ASCII.

You have different options to fix *your* issue:

* Use a UTF-8 locale
* Enable the Python 3.7 UTF-8 Mode
* Wait for Python 3.7.1 (which automatically enables the UTF-8 Mode for LC_CTYPE="POSIX")

Note: You might want to read my ebook http://unicodebook.readthedocs.io/ which explains how to deal with Unicode. |
|
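
Victor's explanation can be turned into a pre-flight check, as a hedged sketch: on POSIX, `os.fsencode()` encodes with `sys.getfilesystemencoding()` and `sys.getfilesystemencodeerrors()`, so attempting that same conversion up front reports which arguments would trigger the `UnicodeEncodeError` before any process is spawned. `unencodable_args` is a hypothetical helper; its optional `encoding` parameter only exists to simulate another locale, such as the ASCII one in the Docker image.

```python
import sys
from typing import Iterable, List, Optional, Union


def unencodable_args(
    args: Iterable[Union[str, bytes]], encoding: Optional[str] = None
) -> List[str]:
    """Return the str arguments that could not be converted to bytes.

    Defaults to the filesystem encoding, i.e. what os.fsencode() would use.
    """
    encoding = encoding or sys.getfilesystemencoding()
    errors = sys.getfilesystemencodeerrors()
    bad = []
    for arg in args:
        if isinstance(arg, bytes):
            continue  # bytes are passed to exec() unchanged
        try:
            arg.encode(encoding, errors)
        except UnicodeEncodeError:
            bad.append(arg)
    return bad


cmd = ["downloader", "-o", "/tmp/vid\u00e9o.mp4"]  # hypothetical command line
print(len(unencodable_args(cmd, encoding="ascii")))  # 1
print(len(unencodable_args(cmd, encoding="utf-8")))  # 0
```

Calling it without `encoding` before `create_subprocess_exec` would make the worker fail with a clear message (or fall back to one of the options above) instead of a traceback deep inside `subprocess`.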
|
| msg327978 |
Author: Rémy Hubscher [:natim] (natim) * |
Date: 2018-10-18 12:44 |
| > I modified Python 3.7.1 to enable the UTF-8 Mode when the LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if the LC_CTYPE is "C"

OK, works for me, thanks :) |
|
|