[Python-Dev] PEP 383 and GUI libraries
Michael Foord fuzzyman at voidspace.org.uk
Fri May 1 11:06:08 CEST 2009
- Previous message: [Python-Dev] PEP 383 and GUI libraries
- Next message: [Python-Dev] PEP 383 and GUI libraries
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Zooko O'Whielacronx wrote:
[snip...] Would it be possible for Python unicode objects to have a flag indicating whether the 'python-escape' error handler was present? That would serve the same purpose as my "faileddecode" flag above, and would basically allow me to use the Python APIs directly and make all this work-around code disappear.
Failing that, I can't see any way to use the os.listdir() in its unicode-oriented mode to satisfy Tahoe's requirements. If you take the above code and then add the fact that you want to use the faileddecode flag when encoding the d argument to os.listdir(), then you get this code: [2].

Oh, I just realized that I could use the PEP 383 os.listdir(), like this:

    def listdir(d):
        fse = sys.getfilesystemencoding()
        if fse == 'utf-8b':
            fse = 'utf-8'
        ns = []
        for fn in os.listdir(d):
            bytes = fn.encode(fse, 'python-escape')
            try:
                ns.append(FName(bytes.decode(fse, 'strict')))
            except UnicodeDecodeError:
                ns.append(FName(fn.decode('utf-8', 'python-escape'), faileddecode=True))
        return ns

(And I guess I could define listdir() like this only on the non-unicode-safe platforms, as above.) However, that strikes me as even more horrible than the previous "listdir()" work-around, in part because it means decoding, re-encoding, and re-decoding every name, so I think I would stick with the previous version.
The current unicode mode would skip the filenames you are interested in (those that fail to decode correctly), so you would have been forced to use the bytes mode. If you need access to the original bytes then you should continue to do this. PEP 383 is entirely neutral for your use case as far as I can see.
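For illustration, the bytes-mode approach could look roughly like the sketch below. It assumes Python 3.2 or later (for os.fsencode()); the helper name list_names and the (bytes, text-or-None) return shape are hypothetical and are not taken from Tahoe's code.

```python
import os
import sys


def list_names(directory, encoding=None):
    """Return (raw_bytes, text_or_None) pairs for the entries of directory.

    Hypothetical helper: text is None when the raw name does not decode
    strictly in the given encoding, so the original bytes are never lost.
    """
    encoding = encoding or sys.getfilesystemencoding()
    results = []
    # A bytes path makes os.listdir() return the names as bytes.
    for raw in os.listdir(os.fsencode(directory)):
        try:
            text = raw.decode(encoding, "strict")
        except UnicodeDecodeError:
            text = None  # plays the role of the "faileddecode" flag
        results.append((raw, text))
    return results
```

Each name is decoded only once here, and the caller still holds the original bytes for any entry that failed to decode.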
Michael
Oh, one more note: for Tahoe's purposes you can, in all of the code above, replace ".decode('utf-8', 'python-replace')" with ".decode('windows-1252')" and it works just as well. While UTF-8b seems like a really cool hack, and it would produce more legible results if utf-8-encoded strings were partially corrupted, I guess I should just use 'windows-1252' which is already implemented in Python 2 (as well as in all other software in the world).
I guess this means that PEP 383, which I have approved of and liked so far in this discussion, would actually not help Tahoe at all and would in fact harm Tahoe -- I would have to remember to detect and work-around the automatic 'utf-8b' filesystem encoding when porting Tahoe to Python 3. If anyone else has a concrete, real use case which would be helped by PEP 383, I would like to hear about it. Perhaps Tahoe can learn something from it.

Oh, if this PEP could be extended to add a flag to each unicode object indicating whether it was created with the python-escape handler or not, then it would be useful to me.

Regards,
Zooko

[1] http://mail.python.org/pipermail/python-dev/2009-April/089020.html
[2] http://allmydata.org/trac/tahoe/attachment/ticket/534/fsencode.3.py
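As a rough sketch of how such a flag could be approximated without changing the unicode type: the handler discussed in this thread as 'python-escape' shipped in Python 3.1 under the name 'surrogateescape', and it maps each undecodable byte 0x80..0xFF to a lone surrogate in U+DC80..U+DCFF. Scanning for that range therefore tells you whether the handler fired. The function name has_escaped_bytes below is hypothetical.

```python
def has_escaped_bytes(name):
    """Return True if name contains lone surrogates left by 'surrogateescape'.

    Hypothetical helper: it inspects the decoded string, so no per-object
    flag is required.
    """
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in name)


# b"\xe9" is not valid UTF-8 here, so it is escaped rather than rejected.
raw = b"caf\xe9.txt"
name = raw.decode("utf-8", "surrogateescape")

print(has_escaped_bytes(name))                       # True
print(name.encode("utf-8", "surrogateescape") == raw)  # True
```

The check costs one pass over each name, which is cheaper than the encode/strict-decode round trip in the listdir() above.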
-- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog