Issue 10600: surrogateescape'd paths not readable on Windows XP.

Created on 2010-12-01 23:24 by ideasman42, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description
utf8_surrogateescape.py ideasman42,2010-12-01 23:24 Testfile for surrogateescape'd path not being writable.
Messages (5)
msg123022 - (view) Author: Campbell Barton (ideasman42) * Date: 2010-12-01 23:24
Attached is a script that works on Linux but not on Windows XP 32-bit with Python 3.1.3. The problem is that the path can be written to when specified as bytes, but fails when the path is surrogateescape'd.
msg123023 - (view) Author: Campbell Barton (ideasman42) * Date: 2010-12-01 23:27
Note: this bug was reported to me by a user running Windows 7, 64-bit.
msg123035 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2010-12-02 02:03
Using the surrogateescape error handler to decode a Windows path is not a good idea. On Windows, the problem is not decoding a path (ANSI => wide char) but encoding one (wide char => ANSI) in order to call a function that expects a bytes path encoded in the ANSI code page. surrogateescape is only useful for the *decode* operation, to store undecodable bytes as special characters. Why do you decode a Windows path using UTF-8? UTF-8 is not used, by default, as an ANSI code page. But first, why do you manipulate bytes paths on Windows? If you would like a portable program that supports UNIX/BSD (bytes) and Windows (unicode) paths with a single type, you should use str instead of bytes, because Unicode (with surrogateescape) is a superset of bytes. Python 3.2 has the os.fsencode() and os.fsdecode() functions to do that easily (to decode/encode UNIX/BSD paths).
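A minimal sketch of the round trip described above, assuming a made-up file name (`caf\xff.txt`) that contains a byte that is not valid UTF-8. It is only an illustration, not the attached test script. The `os.fsdecode()`/`os.fsencode()` part is guarded to POSIX systems because the filesystem encoding and error handler differ on Windows.

```python
import os

# Hypothetical file name with a byte (0xff) that is not valid UTF-8.
raw = b"caf\xff.txt"

# Decoding with surrogateescape stores the undecodable byte as a lone
# surrogate (U+DCFF) instead of raising UnicodeDecodeError.
text = raw.decode("utf-8", "surrogateescape")

# Encoding with the same codec and the same handler restores the bytes.
restored = text.encode("utf-8", "surrogateescape")

# os.fsdecode()/os.fsencode() (Python 3.2+) do the same thing using the
# filesystem encoding. Guarded to POSIX, where surrogateescape is the handler.
fs_roundtrip = None
if os.name == "posix":
    fs_roundtrip = os.fsencode(os.fsdecode(raw))
```

On POSIX systems `fs_roundtrip` should equal `raw` whatever the locale is, which is what makes str usable as a single path type there.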
msg123056 - (view) Author: Campbell Barton (ideasman42) * Date: 2010-12-02 05:22
This bug is with blender3d, where the paths are stored internally in C as simple char arrays (bytes). We could expose all path names as bytes too through our C/Python API; this would at least be a 1:1 mapping. However, I'd prefer using strings if possible. Since Blender projects need to be portable (compress entire projects and run them on different systems), we can't ensure the native fs encoding is used. So surrogateescape seems to work very well, except for this one case I've run into, on Windows only.
msg123057 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2010-12-02 06:14
This is not a bug. You can't use an arbitrary codec (such as UTF-8) with the surrogateescape error handler and expect that opening the file will produce the correct filename. This won't work on Unix, in the general case, either. The surrogateescape error handler will work correctly in this setup only when used with the filesystem encoding.
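The codec mismatch can be sketched as follows. This is an illustration under stated assumptions: the file name is made up, and Latin-1 merely stands in for a filesystem encoding that differs from the UTF-8 used to decode the path.

```python
# Hypothetical on-disk name: the bytes happen to be valid UTF-8.
raw = b"caf\xc3\xa9.txt"

# The application decodes the path as UTF-8 with surrogateescape.
text = raw.decode("utf-8", "surrogateescape")

# Same codec both ways: the original bytes come back.
same_codec = text.encode("utf-8", "surrogateescape")

# Different codec on the way out (Latin-1 as an assumed filesystem
# encoding): the result no longer matches the name stored on disk.
other_codec = text.encode("latin-1", "surrogateescape")

# A lone surrogate produced by surrogateescape cannot be encoded at all
# without the handler.
try:
    b"\xff".decode("utf-8", "surrogateescape").encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

The round trip is only guaranteed when decoding and encoding use the same codec, which for file access means the filesystem encoding.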
History
Date User Action Args
2022-04-11 14:57:09 admin set github: 54809
2010-12-02 06:14:04 loewis set status: open -> closed; nosy: + loewis; messages: +; resolution: not a bug
2010-12-02 05:22:30 ideasman42 set messages: +
2010-12-02 02:03:06 vstinner set nosy: + vstinner; messages: +
2010-12-01 23:27:44 ideasman42 set messages: +
2010-12-01 23:24:55 ideasman42 create