[Python-3000] Unicode and OS strings

Michael Urman murman at gmail.com
Fri Sep 21 17:22:29 CEST 2007


On 9/21/07, Jim Jewett <jimjjewett at gmail.com> wrote:

> (Outside ASCII), if you treat sys.argv as text, that is probably impossible without filesystem support. Before python even sees the data, the terminal itself is allowed to change between canonical equivalents, which have different binary representations.
>
> It does sound like we need a way to get to the original bytes, similar to sys.stdin.buffer. Is it reasonable to expose sys.argv.buffer?
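
To make the point about canonical equivalents concrete, here is a minimal sketch using the standard unicodedata module. It shows that the composed and decomposed forms of the same visible name encode to different bytes. The filename is a made-up example, not one taken from the thread.

```python
import unicodedata

# Hypothetical filename: "é" can be a single code point or "e" plus a combining accent.
name = "caf\u00e9.txt"

composed = unicodedata.normalize("NFC", name)    # uses U+00E9
decomposed = unicodedata.normalize("NFD", name)  # uses U+0065 U+0301

# The two strings are canonically equivalent text...
same_text = unicodedata.normalize("NFC", decomposed) == composed

# ...but they encode to different UTF-8 bytes, which a byte-oriented
# filesystem treats as two different names.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

print(same_text, composed_bytes, decomposed_bytes)
```

If anything between the shell and the program swaps one form for the other, a lookup by the decoded name can miss the file that was actually named on the command line.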

If there's not something straightforward to put in the ... below that would allow simple iteration and processing of all files passed on the command line, preferably interchangeably on both unix (where filenames cannot necessarily be converted to Unicode) and Windows NT and up (where filenames cannot necessarily be represented by bytestrings, and arguments don't necessarily come in as bytes), then I will be one of many disappointed people.

    arguments = ... # something equivalent to (python 2.x on unix) sys.argv[1:]
    for filename in arguments:
        ...
        archive.add(filename) # definitely - akin to open(file)
        ...
        print(filename, file=listing) # maybe - this makes too many assumptions

Obviously simple things like replacing an un(de/en)codable character with '?' will fail. While they could be partially worked around by using glob (assuming a one-to-one replacement, as processed by the OS), that's just asking for unwitting corner-case behavior when one file's name nearly matches another's, differing only in the replaced character.
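
A small sketch of that corner case, using two hypothetical Latin-1 encoded names that are not valid UTF-8. It is only an illustration of the collision, not a description of what any particular OS or Python version does with sys.argv.

```python
from fnmatch import fnmatchcase

# Two hypothetical names that differ in a single undecodable byte.
raw_a = b"caf\xe9.txt"
raw_b = b"caf\xe8.txt"

# Decoding with a replacement character maps both names to the same text,
# so the original bytes can no longer be recovered from it.
shown_a = raw_a.decode("utf-8", errors="replace")
shown_b = raw_b.decode("utf-8", errors="replace")
collide = shown_a == shown_b

# Falling back to a glob-style pattern matches both files, not just one.
pattern = "caf?.txt"
candidates = [raw_a.decode("latin-1"), raw_b.decode("latin-1")]
matches = [c for c in candidates if fnmatchcase(c, pattern)]

print(collide, len(matches))
```

Once both names collapse to the same text, nothing in the replaced string says which of the two files the user meant.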

I don't have a preference among sys.argv[1:] doing this as it always has on unix (and as it tends to within a single locale on Windows); the introduction of a new sys.arguments (either [0:] or [1:]); or even some simple map(encode_step, sys.argv[1:]). Of course the problem with encode_step is that unless it is a no-op on Windows, it can break filenames as badly as decoding them will on unix, unless the common OS interfaces all reverse the process (in which case doing it manually is never necessary).
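
One way such an encode_step might look is sketched below. It leans on os.fsencode and the surrogateescape error handler, both of which were added in Python 3 releases later than this thread, so treat it as an assumption about a possible shape rather than anything that existed at the time. The function name comes from the paragraph above; its body is hypothetical.

```python
import os
import sys

def encode_step(arg):
    """Hypothetical helper: return a value the OS file APIs accept unchanged."""
    if sys.platform == "win32":
        # Windows arguments arrive as text, so this is a no-op there.
        return arg
    # Elsewhere, go back to bytes. os.fsencode uses the surrogateescape
    # handler on POSIX, so bytes that could not be decoded are restored exactly.
    return os.fsencode(arg)

# Something equivalent to the "arguments = ..." line in the loop above.
arguments = [encode_step(a) for a in sys.argv[1:]]
```

The no-op branch on Windows is the point raised above: encoding there could lose characters that the wide-character APIs would otherwise pass through intact.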

Michael

Michael Urman


