Issue 10039: python é.py fails with UnicodeEncodeError if PYTHONFSENCODING is used

Created on 2010-10-07 01:25 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
redecode_filename.patch	vstinner, 2010-10-07 01:25	

Messages (7)
msg118089	Author: STINNER Victor (vstinner) *	Date: 2010-10-07 01:25
If a program name contains a non-ascii character in its name and/or full path and PYTHONFSENCODING is set to an encoding different than the locale encoding, Python fails to open the program.

Example in the utf-8 locale:

    $ PYTHONFSENCODING=ascii ./python é.py
    UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)

This issue is similar to #9992 and #10014.

Solutions: remove PYTHONFSENCODING environment variable or redecode the filename from the locale encoding to the filesystem encoding. Attached patch implements the latter.

--

We may also redecode Py_GetProgramName().
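The failure and the "redecode" idea described above can be reproduced in pure Python. This is only a minimal sketch, not the attached patch: the helper name `redecode` is hypothetical, and the use of the `surrogateescape` error handler is an assumption about how undecodable bytes would be preserved.

```python
def redecode(name, locale_encoding, fs_encoding):
    """Hypothetical helper: re-decode a name from the locale encoding
    to the filesystem encoding, keeping undecodable bytes as surrogates."""
    raw = name.encode(locale_encoding, "surrogateescape")
    return raw.decode(fs_encoding, "surrogateescape")


name = "é.py"

# A strict encode with the overriding encoding fails, as in the report.
try:
    name.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = str(exc)

# Redecoding from the locale encoding (utf-8 here) to ascii keeps the
# original bytes recoverable instead of raising.
redecoded = redecode(name, "utf-8", "ascii")
recovered = redecoded.encode("ascii", "surrogateescape")
```

Here `recovered` holds the same bytes the shell passed on the command line, so the file could still be opened by its byte name.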
msg118436	Author: Éric Araujo (eric.araujo) *	Date: 2010-10-12 17:06
I don’t understand why reading a filename would not respect the envvar stating the filesystem encoding.
msg118444	Author: R. David Murray (r.david.murray) *	Date: 2010-10-12 17:29
Éric, if you are saying, "the user asked for it, it should fail", then that is indeed one of the arguments put forward in issue 9992 where this was discussed. But I think the emerging consensus is that it is better to just avoid the problem by always using the locale on Unix, and solve the problem that PYTHONFSENCODING was supposed to solve in a different way (by always using utf-8 on OSX and unicode on Windows).
msg118445	Author: Éric Araujo (eric.araujo) *	Date: 2010-10-12 17:39
> if you are saying, "the user asked for it, it should fail", then
> that is indeed one of the arguments put forward in issue 9992 where
> this was discussed.
You could put it that way, thanks for phrasing my thoughts :)

> But I think the emerging consensus is that it is better to just avoid
> the problem by always using the locale on Unix,
*displays his lack of knowledge* Is it always correct to decode a filename with the locale encoding on Unix? Can’t each filesystem have its own encoding?

> and solve the problem that PYTHONFSENCODING was supposed to solve in a
> different way (by always using utf-8 on OSX and unicode on Windows).
If there is a better alternate way, let’s go for it, and maybe remove PYTHONFSENCODING altogether, since it’s new in 3.2.

Thanks for explaining! I’ll repay your time by reviewing the doc patches.
msg118492	Author: STINNER Victor (vstinner) *	Date: 2010-10-13 00:25
> Is it always correct to decode a filename with the locale encoding
> on Unix?
Do you know something better than the locale encoding? I don't.

> Can’t each filesystem have its own encoding?
Yes, but how do you get the encoding of each filesystem? I think that few or no applications support such a case without mojibake. Backup programs can use the "raw" (bytes) API of Python 3 to avoid all encoding issues.

--

As R. David Murray wrote, read issue #9992 if you would like to know more about this problem and the different proposed solutions. I voted for removal of PYTHONFSENCODING, which fixes most issues.
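As an illustration of the "raw" (bytes) API mentioned above, here is a minimal sketch. It assumes a POSIX-like system; the helper name `list_raw` and the temporary file are hypothetical and exist only for the demonstration.

```python
import os
import tempfile


def list_raw(directory):
    """Hypothetical helper: return directory entries as bytes,
    so no filename decoding takes place."""
    return os.listdir(os.fsencode(directory))


with tempfile.TemporaryDirectory() as tmp:
    # Build the path from bytes so the name is never decoded.
    raw_name = "é.py".encode("utf-8")
    path = os.path.join(os.fsencode(tmp), raw_name)
    with open(path, "wb") as handle:
        handle.write(b"print('hello')\n")

    entries = list_raw(tmp)
    all_bytes = all(isinstance(entry, bytes) for entry in entries)
```

Because `os.listdir()` returns `bytes` when given a `bytes` path, a backup tool written this way can copy names through unchanged, whatever encoding the filesystem uses.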
msg118593	Author: STINNER Victor (vstinner) *	Date: 2010-10-13 22:20
Fixed by r85430 (remove PYTHONFSENCODING), see #9992.
msg119039	Author: Éric Araujo (eric.araujo) *	Date: 2010-10-18 17:03
> Do you know something better than the locale encoding? I don't.
Neither do I, sorry.

>> Can’t each filesystem have its own encoding?
> Yes, but how do you get the encoding of each filesystem?
If I really had to, on Linux I could parse the output of the mount command, but this could get messy quickly, and of course is not okay for official Python.

> Backup programs can use the "raw" (bytes) API of Python 3 to avoid
> all encoding issues.
Neat!

> As R. David Murray wrote, read issue #9992 if you would like to know
> more about this problem and the different proposed solutions.
I did so, thanks for the pointer and all the explanations.

History
Date	User	Action	Args
2022-04-11 14:57:07	admin	set	github: 54248
2010-10-18 17:03:42	eric.araujo	set	messages: +
2010-10-13 22:20:20	vstinner	set	status: open -> closed; resolution: fixed; messages: +
2010-10-13 00:25:38	vstinner	set	messages: +
2010-10-12 17:39:45	eric.araujo	set	messages: +
2010-10-12 17:29:26	r.david.murray	set	nosy: + r.david.murray; messages: +
2010-10-12 17:06:01	eric.araujo	set	nosy: + eric.araujo; messages: +
2010-10-07 11:33:05	vstinner	link	issue10014 dependencies
2010-10-07 01:25:31	vstinner	create