Issue 10039: python é.py fails with UnicodeEncodeError if PYTHONFSENCODING is used

Created on 2010-10-07 01:25 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name: redecode_filename.patch
Uploaded: vstinner, 2010-10-07 01:25
Messages (7)
msg118089 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-07 01:25
If a program name contains a non-ASCII character in its name and/or full path and PYTHONFSENCODING is set to an encoding different from the locale encoding, Python fails to open the program.

Example in the utf-8 locale:

    $ PYTHONFSENCODING=ascii ./python é.py
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)

This issue is similar to #9992 and #10014.

Solutions: remove the PYTHONFSENCODING environment variable, or redecode the filename from the locale encoding to the filesystem encoding. The attached patch implements the latter.

--

We may also redecode Py_GetProgramName().
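For readers who want to see the failure and the "redecode" idea in isolation, here is a minimal Python sketch. It is not the attached patch (which changes the interpreter's startup code); the helper name `redecode` and its parameters are hypothetical, and the use of the `surrogateescape` error handler is an assumption about how undecodable bytes would be carried through.

```python
def redecode(name: str, decoded_with: str, target: str) -> str:
    """Undo a decode done with `decoded_with`, then decode again with `target`.

    Hypothetical helper for illustration only. Bytes that `target` cannot
    decode are kept as lone surrogates via the surrogateescape handler.
    """
    raw = name.encode(decoded_with, "surrogateescape")
    return raw.decode(target, "surrogateescape")


# The failure from the report: U+00E9 has no ASCII representation.
try:
    "\xe9.py".encode("ascii")
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Command-line bytes as a utf-8 locale would deliver them.
name_from_locale = b"\xc3\xa9.py".decode("utf-8", "surrogateescape")

# Redecode for an ASCII filesystem encoding, then get the raw bytes back.
name_for_fs = redecode(name_from_locale, "utf-8", "ascii")
roundtrip = name_for_fs.encode("ascii", "surrogateescape")
```

After redecoding, encoding the name with the filesystem encoding no longer raises, and the original bytes reach the operating system unchanged.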
msg118436 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-10-12 17:06
I don’t understand why reading a filename would not respect the envvar stating the filesystem encoding.
msg118444 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2010-10-12 17:29
Éric, if you are saying, "the user asked for it, it *should* fail", then that is indeed one of the arguments put forward in issue 9992 where this was discussed. But I think the emerging consensus is that it is better to just avoid the problem by always using the locale on Unix, and solve the problem that PYTHONFSENCODING was supposed to solve in a different way (by always using utf-8 on OSX and unicode on Windows).
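To check which encodings a given interpreter actually uses, a short sketch along these lines may help. The function name `describe_encodings` is hypothetical; the values it reports depend on the platform, the locale, and the Python version, so no particular output is assumed here.

```python
import locale
import sys


def describe_encodings() -> dict:
    """Collect the encodings relevant to filename handling (illustrative helper)."""
    return {
        "platform": sys.platform,
        # Encoding Python uses to convert between str filenames and OS bytes.
        "filesystem_encoding": sys.getfilesystemencoding(),
        # Encoding derived from the user's locale settings.
        "locale_encoding": locale.getpreferredencoding(False),
    }


info = describe_encodings()
for key, value in info.items():
    print(f"{key}: {value}")
```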
msg118445 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-10-12 17:39
> if you are saying, "the user asked for it, it *should* fail", then
> that is indeed one of the arguments put forward in issue 9992 where
> this was discussed.

You could put it that way, thanks for phrasing my thoughts :)

> But I think the emerging consensus is that it is better to just avoid
> the problem by always using the locale on Unix,

*displays his lack of knowledge* Is it always correct to decode a filename with the locale encoding on Unix? Can’t each filesystem have its own encoding?

> and solve the problem that PYTHONFSENCODING was supposed to solve in a
> different way (by always using utf-8 on OSX and unicode on Windows).

If there is a better alternate way, let’s go for it, and maybe remove PYTHONFSENCODING altogether, since it’s new in 3.2.

Thanks for explaining! I’ll repay your time by reviewing the doc patches.
msg118492 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-13 00:25
> Is it always correct to decode a filename with the locale encoding
> on Unix?

Do you know something better than the locale encoding? I don't.

> Can’t each filesystem have its own encoding?

Yes, but how do you get the encoding of each filesystem? I think that few or no applications support such a case without mojibake. Backup programs can use the "raw" (bytes) API of Python 3 to avoid all encoding issues.

--

As R. David Murray wrote, read issue #9992 if you would like to know more about this problem and the different proposed solutions. I voted for the removal of PYTHONFSENCODING, which fixes most issues.
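As an illustration of the "raw" (bytes) API mentioned above, here is a minimal sketch. The helper names `make_file` and `list_raw_names` are hypothetical. Passing a bytes path to `os.listdir()` or `open()` makes Python hand the name to the operating system as bytes rather than as a decoded str, which is what lets a backup-style tool sidestep the question of which encoding a filename is in.

```python
import os
import tempfile


def make_file(directory: str, raw_name: bytes) -> bytes:
    """Create an empty file whose name is given as raw bytes; return its path."""
    path = os.path.join(os.fsencode(directory), raw_name)
    with open(path, "wb"):
        pass
    return path


def list_raw_names(directory: str) -> list:
    """Return entry names as bytes; a bytes argument yields bytes results."""
    return sorted(os.listdir(os.fsencode(directory)))


with tempfile.TemporaryDirectory() as tmp:
    # b"\xc3\xa9" is the UTF-8 encoding of U+00E9; the code never decodes it.
    make_file(tmp, b"\xc3\xa9.txt")
    names = list_raw_names(tmp)
```

Some filesystems normalize or restrict file names, so the bytes read back can differ from the bytes written on those systems.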
msg118593 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-10-13 22:20
Fixed by r85430 (remove PYTHONFSENCODING), see #9992.
msg119039 - (view) Author: Éric Araujo (eric.araujo) * (Python committer) Date: 2010-10-18 17:03
> Do you know something better than the locale encoding? I don't.

Neither do I, sorry.

>> Can’t each filesystem have its own encoding?
> Yes, but how do you get the encoding of each filesystem?

If I really had to, on Linux I could parse the output of the mount command, but this could get messy quickly, and of course is not okay for official Python.

> Backup programs can use the "raw" (bytes) API of Python 3 to avoid
> all encoding issues.

Neat!

> As wrote R. David Murray, read issue #9992 if you would like to know
> more about this problem and the different proposed solutions.

I did so, thanks for the pointer and all the explanations.
History
Date User Action Args
2022-04-11 14:57:07 admin set github: 54248
2010-10-18 17:03:42 eric.araujo set messages: +
2010-10-13 22:20:20 vstinner set status: open -> closed, resolution: fixed, messages: +
2010-10-13 00:25:38 vstinner set messages: +
2010-10-12 17:39:45 eric.araujo set messages: +
2010-10-12 17:29:26 r.david.murray set nosy: + r.david.murray, messages: +
2010-10-12 17:06:01 eric.araujo set nosy: + eric.araujo, messages: +
2010-10-07 11:33:05 vstinner link issue10014 dependencies
2010-10-07 01:25:31 vstinner create