Issue 35014: asyncio subprocess accepts string as parameter which lead to UnicodeEncodeError


Created on 2018-10-18 08:50 by natim, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
demo.py, uploaded by natim, 2018-10-18 10:53
Messages (15)
msg327945 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 08:50
asyncio.create_subprocess_exec accepts a list of str as arguments, which leads to UnicodeEncodeError. I think it should accept only bytes, shouldn't it?
msg327953 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2018-10-18 09:50
A list of strings works both on my local Linux box and in the CPython test suite. Please provide more info about the error; a stack trace would help.
msg327954 - Author: STINNER Victor (vstinner) (Python committer) Date: 2018-10-18 09:54
Hi Remy,
> Asyncio.create_subprocess_exec accepts a list of str as parameter which lead to UnicodeEncodeError I think it should accept only bytes shouldn't it?
Can you elaborate? On which OS? What is your error message? Can you paste a traceback?
msg327955 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 10:04
> List of strings works on both my local Linux box and CPython test suite.
Indeed, that's why I posted this bug report: in my opinion it should only work with byte strings.
> Can you elaborate? On which OS? What is your error message? Can you paste a traceback?
If you try to pass a string containing non-ASCII characters on a Linux box, for instance, you might get a UnicodeEncodeError. Let me try to provide you with a script to reproduce this error.
msg327962 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 10:53
I thought this would be sufficient to reproduce the issue. However, it seems that if the system encoding is UTF-8, it works properly. Here is the traceback I had:
```
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 69: ordinal not in range(128)
  File "worker.py", line 393, in <module>
    return_code = loop.run_until_complete(main(loop))
  File "asyncio/base_events.py", line 467, in run_until_complete
    return future.result()
  File "worker.py", line 346, in main
    '-f mp4', '-o', '{}/{}.mp4'.format(download_tempdir, video_id))
  File "worker.py", line 268, in run_command
    proc = await create
  File "asyncio/subprocess.py", line 225, in create_subprocess_exec
    stderr=stderr, **kwds)
  File "asyncio/base_events.py", line 1191, in subprocess_exec
    bufsize, **kwargs)
  File "asyncio/unix_events.py", line 191, in _make_subprocess_transport
    **kwargs)
  File "asyncio/base_subprocess.py", line 39, in __init__
    stderr=stderr, bufsize=bufsize, **kwargs)
  File "asyncio/unix_events.py", line 697, in _start
    universal_newlines=False, bufsize=bufsize, **kwargs)
  File "python3.6/subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "python3.6/subprocess.py", line 1267, in _execute_child
    restore_signals, start_new_session, preexec_fn)
```
msg327964 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 11:03
I am adding the following info. If I run the following on the Docker image where I got the error:
```
import sys
import locale
print(sys.getdefaultencoding())
print(locale.getpreferredencoding())
```
I get:
utf-8
ANSI_X3.4-1968
While if I run it on my machine, I get:
utf-8
UTF-8
I don't know how to force the former (ANSI_X3.4-1968) locally to reproduce the error. Setting LC_ALL=C and LANG=C didn't do the trick.
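The script above prints `sys.getdefaultencoding()`, which is always `utf-8` on Python 3, plus the locale's preferred encoding. Since subprocess encodes `str` arguments with the same encoding `os.fsencode()` uses, a slightly extended diagnostic could report the filesystem encoding and the UTF-8 Mode flag too. A minimal sketch, assuming Python 3.7+ (for `sys.flags.utf8_mode`); the function name `encoding_report` is hypothetical:

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings that matter when passing str arguments to a child process."""
    return {
        # Always 'utf-8' on Python 3; unrelated to the locale.
        "default": sys.getdefaultencoding(),
        # Derived from the locale (LC_CTYPE); do_setlocale=False avoids changing it.
        "preferred": locale.getpreferredencoding(False),
        # The encoding os.fsencode() applies to str paths and arguments.
        "filesystem": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
        # 1 if the Python 3.7+ UTF-8 Mode is active, else 0.
        "utf8_mode": sys.flags.utf8_mode,
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

Running it in both environments would show whether the `filesystem` entry differs, which is the value that matters for the error in the traceback above.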
msg327965 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 11:06
Here we go:
```
$ python3.7 demo.py
utf-8
UTF-8
Traceback (most recent call last):
  File "demo.py", line 21, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.7/asyncio/base_events.py", line 568, in run_until_complete
    return future.result()
  File "demo.py", line 14, in main
    sys.stdout.write(out.decode('utf-8'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 1: ordinal not in range(128)
```
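In this second traceback the failure is not inside subprocess: it is `sys.stdout.write(out.decode('utf-8'))` that fails, because the text stream behind `sys.stdout` encodes with ASCII. One possible workaround, sketched below under the assumption that the child's output should be forwarded unchanged, is to write the raw bytes to the binary buffer underneath the text stream. The helper name `write_child_output` is hypothetical, and the demo simulates an ASCII stdout with `io.TextIOWrapper` instead of touching the real one:

```python
import io
import sys


def write_child_output(raw: bytes, stream=None) -> None:
    """Forward a child's raw stdout bytes without re-encoding them.

    Decoding and then writing to a text stream whose encoding is ASCII fails
    for non-ASCII characters; writing to the underlying binary buffer skips
    the text layer's encoder entirely.
    """
    stream = sys.stdout if stream is None else stream
    stream.flush()            # keep ordering with any text already written
    stream.buffer.write(raw)
    stream.buffer.flush()


if __name__ == "__main__":
    raw = "vidéo\n".encode("utf-8")   # what a child process might print
    # Simulate an ASCII-encoded stdout like the one in the Docker image.
    fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
    try:
        fake_stdout.write(raw.decode("utf-8"))   # same pattern as demo.py line 14
        fake_stdout.flush()
    except UnicodeEncodeError as exc:
        print("text write failed:", exc.reason)
    write_child_output(raw, fake_stdout)
    print(fake_stdout.buffer.getvalue())
```

This only sidesteps the stdout side; whatever reads the bytes afterwards still has to interpret them as UTF-8.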
msg327966 - Author: Andrew Svetlov (asvetlov) (Python committer) Date: 2018-10-18 11:07
I think you'll get the same error from a `subprocess.run()` call if your current locale is not UTF-8. I don't recall the details, but the Internet has a lot of info about setting the locale per user and system-wide.
msg327967 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 11:08
I believe Python 3.7 brings explicit Unicode encoding/decoding. If create_subprocess_exec can fail depending on the environment, I believe we should not try to encode the command-line arguments but rather require them to be bytes.
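For reference, on POSIX `asyncio.create_subprocess_exec()` (like `subprocess`) already accepts `bytes` arguments, so a caller who wants to avoid locale-dependent encoding can encode explicitly. A minimal sketch, assuming a POSIX system and Python 3.7+; the child script and the `echo_arg` name are hypothetical and only echo the argument back:

```python
import asyncio
import sys

# Child script: write argv[1] back as raw bytes. os.fsencode() reverses the
# child's own surrogateescape decoding of argv, so the bytes round-trip.
CHILD = "import os, sys; sys.stdout.buffer.write(os.fsencode(sys.argv[1]))"


async def echo_arg(arg: bytes) -> bytes:
    """Pass a pre-encoded bytes argument to a child process and read it back."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD, arg,
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out


if __name__ == "__main__":
    # Encoding explicitly means the parent never consults its locale for this argument.
    raw = "vidéo.mp4".encode("utf-8")
    print(asyncio.run(echo_arg(raw)))
```

Requiring bytes everywhere would break the common case, so encoding at the call site is an opt-in alternative rather than a replacement for `str` support.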
msg327970 - Author: STINNER Victor (vstinner) (Python committer) Date: 2018-10-18 11:57
I added the UTF-8 Mode for you, for the Docker use case: python3.7 -X utf8. Using that, Python ignores your locale and speaks UTF-8. What is your locale? Try the "locale" command.
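To confirm that the flag takes effect, a child interpreter can be started with `-X utf8` and asked what it sees. A minimal sketch, assuming Python 3.7+ (needed both for `-X utf8` and for `sys.flags.utf8_mode`); `probe_utf8_mode` and `PROBE` are hypothetical names:

```python
import subprocess
import sys

# Ask the child which mode and filesystem encoding it ended up with.
PROBE = "import sys; print(sys.flags.utf8_mode, sys.getfilesystemencoding())"


def probe_utf8_mode(extra_args=("-X", "utf8")) -> str:
    """Run the current interpreter with extra_args and return the probe's output."""
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(probe_utf8_mode())   # "1 utf-8"
```

UTF-8 Mode forces the filesystem encoding to UTF-8 regardless of the locale, which is why the output does not depend on the environment the probe runs in.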
msg327974 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 12:40
Here are the locale settings:
```
LANG=
LANGUAGE=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
```
msg327975 - Author: STINNER Victor (vstinner) (Python committer) Date: 2018-10-18 12:41
> LC_CTYPE="POSIX"
I modified Python 3.7.1 to enable the UTF-8 Mode when LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if LC_CTYPE is "C".
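The version difference described above can be written as a small lookup. This is a deliberately simplified, hypothetical model of the automatic UTF-8 Mode rule for the 3.7 series only: it ignores `-X utf8`, `PYTHONUTF8`, and PEP 538 locale coercion, and it makes no claim about later Python releases:

```python
def utf8_mode_auto_enabled(lc_ctype: str, version: tuple) -> bool:
    """Hypothetical model: is UTF-8 Mode turned on automatically for this LC_CTYPE?

    Covers only Python 3.7.x as described in this thread; explicit settings
    (-X utf8, PYTHONUTF8) and other releases are out of scope.
    """
    if version[:2] != (3, 7):
        raise ValueError("model only covers Python 3.7.x")
    if version < (3, 7, 1):
        return lc_ctype == "C"            # 3.7.0: only the "C" locale
    return lc_ctype in ("C", "POSIX")     # 3.7.1+: "POSIX" as well


if __name__ == "__main__":
    for ver in ((3, 7, 0), (3, 7, 1)):
        print(ver, utf8_mode_auto_enabled("POSIX", ver))
```

Under this model, the `LC_CTYPE="POSIX"` environment posted above only gets UTF-8 Mode automatically from 3.7.1 on.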
msg327976 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 12:43
Unicode is complicated; the answer is somewhere here: https://unicodebook.readthedocs.io/ Sorry for the bother. I thought it was a bug, but apparently it's a feature. Thank you for your help, and thank you for making Python better.
msg327977 - Author: STINNER Victor (vstinner) (Python committer) Date: 2018-10-18 12:44
This issue is not an asyncio bug: the error occurs in subprocess. It is not a subprocess bug either: subprocess works as expected, encoding Unicode arguments with sys.getfilesystemencoding() (see os.fsencode()). The problem is that you use non-ASCII strings while your filesystem encoding is ASCII.
You have several options to fix *your* issue:
* Use a UTF-8 locale
* Enable the Python 3.7 UTF-8 Mode
* Wait for Python 3.7.1 (which automatically enables the UTF-8 Mode for LC_CTYPE="POSIX")
Note: You might want to read my ebook http://unicodebook.readthedocs.io/ which explains how to deal with Unicode.
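The encoding step described above can be reproduced without spawning anything. In the sketch below, `encode_like_fsencode` is a hypothetical helper that approximates what `os.fsencode()` does to a single argument, but takes the encoding as a parameter instead of reading `sys.getfilesystemencoding()`, so an ASCII filesystem encoding can be simulated on any machine. The path is made up:

```python
import sys


def encode_like_fsencode(arg, encoding: str, errors: str = "surrogateescape") -> bytes:
    """Approximate os.fsencode() for one argument, with an explicit encoding.

    bytes pass through unchanged; str is encoded with the given codec and the
    surrogateescape handler (the usual filesystem error handler on POSIX).
    """
    if isinstance(arg, bytes):
        return arg
    return arg.encode(encoding, errors)


if __name__ == "__main__":
    print("this interpreter:", sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())
    path = "/tmp/vidéo.mp4"                      # hypothetical output path
    print(encode_like_fsencode(path, "utf-8"))   # b'/tmp/vid\xc3\xa9o.mp4'
    try:
        encode_like_fsencode(path, "ascii")      # like the Docker image's locale
    except UnicodeEncodeError as exc:
        print("fails as in the traceback:", exc)
```

surrogateescape only rescues lone surrogates produced by earlier decoding, so a real `'é'` still raises under ASCII, matching the `'ascii' codec can't encode character '\xe9'` error in the first traceback.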
msg327978 - Author: Rémy Hubscher [:natim] (natim) Date: 2018-10-18 12:44
> I modified Python 3.7.1 to enable the UTF-8 Mode when the LC_CTYPE is "POSIX". In Python 3.7.0, the UTF-8 Mode is only enabled if the LC_CTYPE is "C"
OK, works for me, thanks :)
History
Date User Action Args
2022-04-11 14:59:07 admin set github: 79195
2018-10-18 12:44:24 natim set messages: +
2018-10-18 12:44:22 vstinner set messages: +
2018-10-18 12:43:29 natim set status: open -> closed, resolution: not a bug, messages: +, stage: resolved
2018-10-18 12:41:17 vstinner set messages: +
2018-10-18 12:40:04 natim set messages: +
2018-10-18 11:57:42 vstinner set messages: +
2018-10-18 11:08:40 natim set messages: +
2018-10-18 11:07:30 asvetlov set messages: +
2018-10-18 11:06:49 natim set messages: +
2018-10-18 11:03:03 natim set messages: +
2018-10-18 10:53:44 natim set files: + demo.py, messages: +
2018-10-18 10:04:25 natim set messages: +
2018-10-18 09:54:30 vstinner set nosy: + vstinner, messages: +
2018-10-18 09:50:45 asvetlov set messages: +
2018-10-18 08:50:17 natim create