Issue 17110: sys.argv docs should explain how to handle encoding issues

Created on 2013-02-03 04:01 by ncoghlan, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name         Uploaded                      Description
Issue17110.patch  sreepriya, 2014-03-17 23:01   Documentation for proper encoding of command line arguments.
Pull Requests
URL       Status  Linked
PR 12602  merged  methane, 2019-03-28 12:27
PR 12626  merged  miss-islington, 2019-03-30 05:32
Messages (11)
msg181239 - (view) Author: Alyssa Coghlan (ncoghlan) * (Python committer) Date: 2013-02-03 04:01
The sys.argv docs [1] currently do not mention that the arguments are Unicode strings decoded from bytes provided by the OS. They also don't explain how to correct a decoding error by reversing Python's implicit conversion and redoing it based on the application's knowledge of the correct encoding, as described at [2].

[1] http://docs.python.org/3/library/sys#sys.argv
[2] http://stackoverflow.com/questions/6981594/sys-argv-as-bytes-in-python-3k/
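As a rough illustration of the "reverse and redo" idea described in this message, the sketch below recovers the original bytes with os.fsencode() and decodes them again with an encoding the application knows to be correct. The helper name redecode_arg is hypothetical, and the simulation step assumes a POSIX system where the filesystem encoding uses the surrogateescape error handler.

```python
import os


def redecode_arg(arg: str, encoding: str) -> str:
    """Undo the interpreter's implicit decoding of one argument and
    decode the recovered bytes with an application-chosen encoding.

    Hypothetical helper, not part of the standard library.
    """
    raw = os.fsencode(arg)       # recover the bytes the OS passed in
    return raw.decode(encoding)  # redo the decoding with the known encoding


# Simulate what sys.argv would hold if Latin-1 bytes were passed to an
# interpreter whose filesystem encoding cannot decode them (POSIX assumption).
simulated_arg = os.fsdecode(b"\xe9l\xe9phant")
fixed = redecode_arg(simulated_arg, "latin-1")
```

In a real program the call would look like redecode_arg(sys.argv[1], "latin-1"), with the encoding taken from whatever the application knows about its input.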
msg213674 - (view) Author: Sreepriya Chalakkal (sreepriya) * Date: 2014-03-15 19:12
I tried running the following code with Python 3.4:

    import sys
    print(sys.argv[1])
    print(b'bytes')

I ran it as follows, passing an argument in a different encoding:

    $ python ~/a.py `echo priya|iconv -t latin1`
    priya
    bytes

No Unicode encode error was generated! Is it because the problem is fixed?
msg213699 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-16 01:54
> There was no unicode encode error generated! Is it because the problem
> is fixed?

No, it's not fixed. First, it seems you are testing with Python 2 (otherwise you would get "b'bytes'", not "bytes"). Python 2 won't have a problem here, since it treats everything as bytestrings. Second, to demonstrate the issue you must pass a non-ASCII string. For example:

    $ ./python a.py `echo éléphant|iconv -t latin1`
    Traceback (most recent call last):
      File "a.py", line 4, in <module>
        print(sys.argv[1])
    UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 0: surrogates not allowed
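The error in the traceback can be reproduced without a shell. The sketch below decodes Latin-1 bytes as UTF-8 with the surrogateescape error handler, as a stand-in for what the interpreter does to undecodable argument bytes on Unix, and then attempts the strict UTF-8 encode that printing would require. The variable names are illustrative only.

```python
# Latin-1 bytes for "elephant" with accented e's; not valid UTF-8.
raw = b"\xe9l\xe9phant"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate U+DCNN.
decoded = raw.decode("utf-8", errors="surrogateescape")

# A strict UTF-8 encode rejects lone surrogates, which is the failure
# shown in the traceback.
reason = None
try:
    decoded.encode("utf-8")
except UnicodeEncodeError as exc:
    reason = exc.reason

# Encoding with the same error handler restores the original bytes.
restored = decoded.encode("utf-8", errors="surrogateescape")
```

Here decoded starts with '\udce9', the same character named in the error message, and restored equals raw.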
msg213911 - (view) Author: Sreepriya Chalakkal (sreepriya) * Date: 2014-03-17 23:01
You are right. Instead of running ./python inside the python directory, I ran the system's default Python, which is an older version! Based on the Stack Overflow link given, I tried to write some documentation. I am attaching the patch!
msg214022 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-03-18 21:33
Hmm, I'm not sure where those explanations belong, but I'm not sure they should be in the sys module docs (especially as they are quite lengthy, and they also apply to other data such as os.environ). Perhaps the Unicode HOWTO?
msg339175 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2019-03-30 05:32
New changeset 38f4e468d4b55551e135c67337c18ae142193ba8 by Inada Naoki in branch 'master': bpo-17110: doc: add note how to get bytes from sys.argv (GH-12602) https://github.com/python/cpython/commit/38f4e468d4b55551e135c67337c18ae142193ba8
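For readers who want to see what "get bytes from sys.argv" can look like in practice, here is a minimal sketch built on os.fsencode(). The wrapper name argv_as_bytes is hypothetical. It relies on the Unix behaviour discussed in this issue, where arguments are decoded with the filesystem encoding and the surrogateescape error handler.

```python
import os
import sys


def argv_as_bytes(argv=None):
    """Return command-line arguments as the bytes the OS provided.

    Hypothetical convenience wrapper; defaults to sys.argv.
    """
    if argv is None:
        argv = sys.argv
    # os.fsencode reverses the filesystem-encoding decode, including
    # any surrogate-escaped bytes.
    return [os.fsencode(arg) for arg in argv]
```

Calling argv_as_bytes() with no argument converts the running program's own sys.argv. Passing a list makes the helper easy to exercise in isolation.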
msg339176 - (view) Author: miss-islington (miss-islington) Date: 2019-03-30 05:38
New changeset 5b80cb5584a72044424f2d82d0ae79c720f24c47 by Miss Islington (bot) in branch '3.7': bpo-17110: doc: add note how to get bytes from sys.argv (GH-12602) https://github.com/python/cpython/commit/5b80cb5584a72044424f2d82d0ae79c720f24c47
msg371778 - (view) Author: Manuel Jacob (mjacob) * Date: 2020-06-17 22:03
The actual startup code uses Py_DecodeLocale() for converting argv from bytes to unicode. Since which Python version is it guaranteed that Py_DecodeLocale() and os.fsencode() roundtrip?
msg371788 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-06-18 01:42
There is no strict guarantee. I think ASCII, UTF-8, and latin1 with surrogateescape guarantee a roundtrip. Other legacy encodings like cp932 may not roundtrip. But it is not a huge problem, because typically only Windows uses them.

On Windows:
* wchar_t is used in most cases, instead of fsencoding
* fsencoding is now UTF-8 by default

In other words, if you are using a legacy encoding on Unix, it may not roundtrip.
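The roundtrip property mentioned here can be probed at the Python codec level. The sketch below is only an approximation: the startup code uses Py_DecodeLocale(), which goes through the C library rather than Python's codecs, so results for legacy encodings may differ from what this check reports. The function name roundtrips and the sample inputs are made up for illustration.

```python
def roundtrips(raw: bytes, encoding: str) -> bool:
    """Return True if raw survives decode followed by encode, both
    using the surrogateescape error handler.

    Hypothetical helper; it exercises Python's codecs, not the C
    locale functions used during interpreter startup.
    """
    text = raw.decode(encoding, errors="surrogateescape")
    return text.encode(encoding, errors="surrogateescape") == raw


# Sample byte strings: plain ASCII, Latin-1 text, and every byte value.
samples = [b"plain", b"\xe9l\xe9phant", bytes(range(256))]

results = {
    enc: all(roundtrips(sample, enc) for sample in samples)
    for enc in ("ascii", "utf-8", "latin-1")
}
```

The same function can be pointed at a legacy codec such as cp932 with byte strings of interest to see whether a particular input survives the roundtrip.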
msg371802 - (view) Author: Manuel Jacob (mjacob) * Date: 2020-06-18 09:48
If the encoding supports it, since which Python version do Py_DecodeLocale() and os.fsencode() roundtrip? The background of my question is that Mercurial goes some extra rounds to determine the correct encoding to emulate what Py_EncodeLocale() would do: https://www.mercurial-scm.org/repo/hg/file/5.4.1/mercurial/pycompat.py#l157 . If os.fsencode() could be used, it would simplify the code. Mercurial supports Python 3.5+.
msg371806 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2020-06-18 11:19
> Manuel Jacob <me@manueljacob.de> added the comment:
>
> If the encoding supports it, since which Python version do
> Py_DecodeLocale() and os.fsencode() roundtrip?

Maybe since Python 3.2. FWIW, fsencode was added by Victor in https://bugs.python.org/issue8514

> The background of my question is that Mercurial goes some extra rounds to
> determine the correct encoding to emulate what Py_EncodeLocale() would do:
> https://www.mercurial-scm.org/repo/hg/file/5.4.1/mercurial/pycompat.py#l157
> . If os.fsencode() could be used, it would simplify the code. Mercurial
> supports Python 3.5+.

I think it is the right approach. One of the important use cases of os.fsencode is using a file path from sys.argv even if it cannot be decoded by the filesystem encoding.
History
Date User Action Args
2022-04-11 14:57:41 admin set github: 61312
2020-06-18 11:19:04 methane set messages: +
2020-06-18 09:48:29 mjacob set messages: +
2020-06-18 01:42:25 methane set messages: +
2020-06-17 22:03:03 mjacob set nosy: + mjacob; messages: +
2019-03-30 06:25:04 methane set status: open -> closed; stage: patch review -> resolved; resolution: fixed; versions: + Python 3.7, Python 3.8, - Python 3.2, Python 3.3, Python 3.4
2019-03-30 05:38:17 miss-islington set nosy: + miss-islington; messages: +
2019-03-30 05:32:36 miss-islington set pull_requests: + pull_request12559
2019-03-30 05:32:11 methane set nosy: + methane; messages: +
2019-03-28 12:27:55 methane set stage: needs patch -> patch review; pull_requests: + pull_request12542
2014-03-18 21:33:01 pitrou set messages: +
2014-03-18 08:56:15 andyma set nosy: + andyma
2014-03-17 23:01:03 sreepriya set files: + Issue17110.patch; keywords: + patch; messages: +
2014-03-16 01:54:01 pitrou set nosy: + pitrou; messages: +
2014-03-15 19:12:56 sreepriya set nosy: + sreepriya; messages: +
2013-02-03 04:27:08 Arfrever set nosy: + Arfrever
2013-02-03 04:01:11 ncoghlan create