[Python-Dev] Python-3.0, unicode, and os.environ
Terry Reedy tjreedy at udel.edu
Tue Dec 9 00:58:09 CET 2008
M.-A. Lemburg wrote:
On Sun, Dec 7, 2008 at 3:53 PM, Terry Reedy <tjreedy at udel.edu> wrote:
    try:
        files = os.listdir(somedir, errors='strict')
    except OSError as e:
        log()
        files = os.listdir(somedir)
If that error parameter is the same as in unicode(value, errors), then this would be a useful feature:
Except that unicode becomes str in 3.0, that is exactly my intention.
People could then choose among the already existing error handlers ('strict', 'ignore', 'replace', 'xmlcharrefreplace') or register their own via the codecs module.
These could be passed through from listdir or getenv to str.
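To make the pass-through concrete, here is a minimal sketch of what the error handlers do when applied directly to bytes. The handler name 'hexescape' and the sample file name are made up for illustration; only codecs.register_error and bytes.decode are existing APIs.

```python
import codecs


def hex_escape_errors(exc):
    """Hypothetical handler: show each undecodable byte as a \\xNN escape."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("\\x%02x" % b for b in bad)
    # Return the replacement text and the position where decoding resumes.
    return replacement, exc.end


# 'hexescape' is an illustrative name, not a built-in handler.
codecs.register_error("hexescape", hex_escape_errors)

raw_name = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

ignored = raw_name.decode("utf-8", "ignore")
replaced = raw_name.decode("utf-8", "replace")
escaped = raw_name.decode("utf-8", "hexescape")

try:
    raw_name.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

A listdir or getenv that accepted an errors argument would only need to hand the name on to the decoding step, as above.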
[Side questions:
- 'xmlcharrefreplace' is not in the 3.0 LibRef doc or doc string. Should it be, or is 'xmlcharrefreplace' an addition for a later version?
- A garbage value for errors (such as 'blah') is silently ignored (so I cannot test the above). Intended or a bug?]
Someone else proposed a new option 'warn', which Guido has accepted as the default instead of the current 'ignore'. It could not be passed through (unless str were changed or something registered). I believe the implementation of that would be to call str with 'strict' but catch errors and warn instead. Whether there should be one warning for each problematic bytes value encountered or one for each listdir (or whatever) call, possibly with the number of problems, I leave to others to decide.
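A rough sketch of the 'warn' idea, assuming the one-warning-per-call variant with a count of the problems. The function name decode_names, the default encoding, and the choice of UnicodeWarning are my assumptions, not anything that has been decided.

```python
import warnings


def decode_names(raw_names, encoding="utf-8"):
    """Decode strictly; skip undecodable names and warn once per call."""
    decoded = []
    skipped = 0
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError:
            skipped += 1
    if skipped:
        # One warning for the whole call, carrying the number of problems.
        warnings.warn(
            "%d name(s) could not be decoded as %s" % (skipped, encoding),
            UnicodeWarning,
        )
    return decoded


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    names = decode_names([b"ok.txt", b"bad\xff.txt"])
```

Warning once per problematic name instead would just mean moving the warnings.warn call into the except clause.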
Such application specific error handlers could then also apply whatever fancy round-trip safe encoding of non-decodable bytes to Unicode escapes, private code points, etc. as seen fit by the application.
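As an illustration of such a round-trip scheme, the sketch below maps each undecodable byte to a private-use code point and back. The base offset U+E000, the handler name 'pua-escape', and the helpers to_text and to_bytes are all hypothetical choices. The scheme is not fully safe: a name that legitimately contains characters in that private-use range would be altered on the way back.

```python
import codecs

PUA_BASE = 0xE000  # assumed offset into the Private Use Area


def pua_decode_errors(exc):
    """Hypothetical handler: map each undecodable byte to U+E000 + byte."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    return "".join(chr(PUA_BASE + b) for b in bad), exc.end


codecs.register_error("pua-escape", pua_decode_errors)


def to_text(raw, encoding="utf-8"):
    """Decode bytes, escaping undecodable bytes into private code points."""
    return raw.decode(encoding, "pua-escape")


def to_bytes(text, encoding="utf-8"):
    """Reverse to_text: turn escaped code points back into raw bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if PUA_BASE <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out.extend(ch.encode(encoding))
    return bytes(out)


raw = b"caf\xe9.txt"
text = to_text(raw)
restored = to_bytes(text)
```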
Perhaps we should also add an 'encoding' parameter that can be set on a per-directory basis (if necessary) and defaults to the global file system encoding.
That could also be passed through, but I will let others make the argument for it.
If an application hits a directory that is known to cause problems, it could then choose to receive the file names in a different, more suitable encoding. This allows implementing fallback mechanisms with a list of common encodings for a locale.
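The fallback mechanism could look something like the sketch below, working on names obtained as bytes. The function name decode_with_fallbacks and the particular encoding list are assumptions for illustration; a real list would depend on the locale.

```python
def decode_with_fallbacks(raw, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in order; return (text, encoding that worked)."""
    for enc in encodings:
        try:
            return raw.decode(enc, "strict"), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding in %r could decode %r" % (encodings, raw))


ascii_name = decode_with_fallbacks(b"readme.txt")
legacy_name = decode_with_fallbacks(b"caf\xe9.txt")
```

Since latin-1 maps every byte to a code point, keeping it last means the loop always produces some result, at the cost of possibly guessing wrong.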
Terry Jan Reedy