msg118089
Author: STINNER Victor (vstinner)
Date: 2010-10-07 01:25
If a program's name or full path contains a non-ASCII character and PYTHONFSENCODING is set to an encoding different from the locale encoding, Python fails to open the program. Example in a UTF-8 locale:

$ PYTHONFSENCODING=ascii ./python é.py
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 0: ordinal not in range(128)

This issue is similar to #9992 and #10014. Solutions: remove the PYTHONFSENCODING environment variable, or redecode the filename from the locale encoding to the filesystem encoding. The attached patch implements the latter.

--

We may also redecode Py_GetProgramName().
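A minimal Python-level sketch of the "redecode" idea, assuming the argv bytes were first decoded with the locale encoding and that the resulting str is later encoded with the filesystem encoding using the surrogateescape error handler (which is what os.fsencode() does on POSIX). The helper name `redecode` is hypothetical; the attached patch works at the C level and may differ in detail.

```python
def redecode(name: str, locale_encoding: str, fs_encoding: str) -> str:
    """Hypothetical helper: re-decode a str that was decoded from argv
    bytes with locale_encoding so that it round-trips through fs_encoding."""
    # Recover the exact bytes the OS passed in argv.
    raw = name.encode(locale_encoding, "surrogateescape")
    # Decode them again with the filesystem encoding; undecodable bytes
    # become lone surrogates (PEP 383) instead of raising.
    return raw.decode(fs_encoding, "surrogateescape")


name = "é.py"  # what argv decoding produces under a UTF-8 locale
try:
    name.encode("ascii")  # roughly what opening the script attempted
except UnicodeEncodeError as exc:
    print("without redecoding:", exc)

fixed = redecode(name, "utf-8", "ascii")
print(repr(fixed))
# Encoding with the filesystem encoding now yields the original bytes.
print(fixed.encode("ascii", "surrogateescape"))
```

The design point is that the original bytes survive the round trip, so the file can be opened even though the name is not representable in the filesystem encoding.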
|
|
msg118436
Author: Éric Araujo (eric.araujo)
Date: 2010-10-12 17:06
I don’t understand why reading a filename would not respect the envvar stating the filesystem encoding. |
|
|
msg118444
Author: R. David Murray (r.david.murray)
Date: 2010-10-12 17:29
Éric, if you are saying, "the user asked for it, it *should* fail", then that is indeed one of the arguments put forward in issue 9992, where this was discussed. But I think the emerging consensus is that it is better to just avoid the problem by always using the locale on Unix, and solve the problem that PYTHONFSENCODING was supposed to solve in a different way (by always using utf-8 on OSX and unicode on Windows).
|
|
msg118445
Author: Éric Araujo (eric.araujo)
Date: 2010-10-12 17:39
> if you are saying, "the user asked for it, it *should* fail", then
> that is indeed one of the arguments put forward in issue 9992 where
> this was discussed.

You could put it that way, thanks for phrasing my thoughts :)

> But I think the emerging consensus is that it is better to just avoid
> the problem by always using the locale on Unix,

*displays his lack of knowledge* Is it always correct to decode a filename with the locale encoding on Unix? Can’t each filesystem have its own encoding?

> and solve the problem that PYTHONFSENCODING was supposed to solve in a
> different way (by always using utf-8 on OSX and unicode on Windows).

If there is a better alternative, let’s go for it, and maybe remove PYTHONFSENCODING altogether, since it’s new in 3.2. Thanks for explaining! I’ll repay your time by reviewing the doc patches.
|
|
msg118492
Author: STINNER Victor (vstinner)
Date: 2010-10-13 00:25
> Is it always correct to decode a filename with the locale encoding
> on Unix?

Do you know something better than the locale encoding? I don't.

> Can’t each filesystem have its own encoding?

Yes, but how do you get the encoding of each filesystem? I think that few or no applications support such a case without mojibake. Backup programs can use the "raw" (bytes) API of Python 3 to avoid all encoding issues.

--

As R. David Murray wrote, read issue #9992 if you would like to know more about this problem and the different proposed solutions. I voted for the removal of PYTHONFSENCODING, which fixes most issues.
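A small sketch of what "using the raw (bytes) API" can look like for a backup-style tree walk: passing a bytes path to os.walk() makes it return bytes names, so nothing is ever decoded. The function name `list_tree_bytes` is hypothetical, and the demo assumes a POSIX filesystem such as those on Linux, which accept file names that are not valid UTF-8.

```python
import os
import tempfile
from typing import List


def list_tree_bytes(root: bytes) -> List[bytes]:
    """Hypothetical helper: return every file path under root as raw
    bytes, without ever decoding a file name."""
    paths = []
    # A bytes root makes os.walk yield bytes dirpath and filenames.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return paths


with tempfile.TemporaryDirectory() as tmp:
    root = os.fsencode(tmp)
    # b"caf\xe9" is Latin-1 'café' and is NOT valid UTF-8.
    odd = os.path.join(root, b"caf\xe9.txt")
    with open(odd, "wb") as f:
        f.write(b"data")
    for path in list_tree_bytes(root):
        print(path)
```

Because the paths stay bytes end to end, the locale and filesystem encodings never come into play; os.fsdecode() is only needed if a name has to be displayed.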
|
|
msg118593
Author: STINNER Victor (vstinner)
Date: 2010-10-13 22:20
Fixed by r85430 (remove PYTHONFSENCODING), see #9992. |
|
|
msg119039
Author: Éric Araujo (eric.araujo)
Date: 2010-10-18 17:03
> Do you know something better than the locale encoding? I don't.

Neither do I, sorry.

>> Can’t each filesystem have its own encoding?
> Yes, but how do you get the encoding of each filesystem?

If I really had to, on Linux I could parse the output of the mount command, but this could get messy quickly, and of course is not okay for official Python.

> Backup programs can use the "raw" (bytes) API of Python 3 to avoid
> all encoding issues.

Neat!

> As R. David Murray wrote, read issue #9992 if you would like to know
> more about this problem and the different proposed solutions.

I did so, thanks for the pointer and all the explanations.
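To illustrate why parsing mount information gets messy, here is a rough sketch that reads text in the /proc/mounts format (device, mount point, type, options, dump, pass) and collects the `iocharset=` option that some filesystems such as vfat or iso9660 accept. The function name `mount_charsets` is hypothetical. Most Unix filesystems (ext4, for example) store names as raw bytes and declare no encoding at all, the absence of the option does not reveal the kernel's default, and mount points containing spaces appear octal-escaped (`\040`), which this sketch does not decode.

```python
def mount_charsets(mounts_text: str) -> dict:
    """Hypothetical helper: map mount point -> iocharset option for the
    mounts that declare one; all other mounts are simply skipped."""
    charsets = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        _device, mount_point, _fstype, options = fields[:4]
        for opt in options.split(","):
            if opt.startswith("iocharset="):
                charsets[mount_point] = opt.split("=", 1)[1]
    return charsets


sample = """/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /media/usb vfat rw,uid=1000,iocharset=utf8,shortname=mixed 0 0
"""
print(mount_charsets(sample))  # {'/media/usb': 'utf8'}
```

Even this simple case shows the problem: the root filesystem yields no answer, so a program would still have to fall back on the locale encoding.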
|
|