[Python-3000] Unicode and OS strings
Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Sat Sep 22 10:18:34 CEST 2007
On Fri, 21-09-2007 at 10:00 -0400, Jim Jewett wrote:
> Is it reasonable to expose sys.argv.buffer? (Since this would be bytes rather than text, I assume this would be a single array, rather than a list of already separated arguments.)
On Unix the arguments are already separated at the OS level. It is usually the shell that splits them when they were typed with spaces in between (it also interprets quotes and other syntax). The execve() system call takes them as separate strings, and the program receives them that way.
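A minimal sketch of that division of labour, using Python's standard library. shlex.split only approximates POSIX shell word splitting, and the child process here is just the current interpreter used as a stand-in for an arbitrary program; the command line and argument values are made up for illustration.

```python
import shlex
import subprocess
import sys

# Roughly what a POSIX shell does with a typed command line:
# split on unquoted whitespace and remove the quotes.
command_line = "prog 'hello world' second"
argv = shlex.split(command_line)

# Passing a list bypasses the shell entirely. Each list element is handed
# to the child as one argument, embedded spaces included.
child = subprocess.run(
    [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)",
     "hello world", "second"],
    capture_output=True, text=True, check=True,
)
received = int(child.stdout)  # number of arguments the child saw after "-c"
```

The child counts two arguments, not three, because nothing between the parent and the child ever re-splits "hello world".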
Each Unix argument is a null-terminated array of bytes, i.e. only 0 bytes are disallowed, and the OS does not mangle the contents.
Of course people typically interpret these bytes as characters in a guessed encoding, and the encoding is always a superset of ASCII.
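To illustrate what goes wrong when the guess is off, and one way to keep the original bytes recoverable, here is a sketch using the surrogateescape error handler that later Python 3 releases provide. The argument bytes and the choice of UTF-8 as the guessed encoding are assumptions for the example only.

```python
# Hypothetical argument bytes: "café" encoded as Latin-1, which is not valid UTF-8.
raw_arg = b"caf\xe9"

# Strict decoding under the guessed encoding fails on such input.
try:
    raw_arg.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate
# in the range U+DC80..U+DCFF instead of raising.
text_arg = raw_arg.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text_arg.encode("utf-8", "surrogateescape")
```

The resulting string is not well-formed Unicode, but nothing is lost: the bytes the OS delivered can always be recovered from it.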
On Windows the arguments are not separated: the whole command line is a single string, with spaces and any quotes left for the program to interpret as argument separators (unless something has changed in the last 10 years). I believe it is an array of 16-bit code units, typically meant to be interpreted as UTF-16, but with no check that it is a well-formed UTF-16 sequence. I suppose that any 16-bit value except 0 is allowed, but I'm not sure.
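A small sketch of what "16-bit code units that need not be well-formed UTF-16" means in practice. The code units below are invented for the example and are not read from a real Windows command line; the surrogatepass handler for UTF-16 assumes Python 3.4 or later.

```python
import struct

# Hypothetical command-line contents as raw 16-bit code units:
# 'a', a lone high surrogate (0xD800), 'b'.
units = [0x0061, 0xD800, 0x0062]
raw = struct.pack("<3H", *units)  # little-endian, as on Windows

# A strict UTF-16 decoder rejects the unpaired surrogate.
try:
    raw.decode("utf-16-le")
    well_formed = True
except UnicodeDecodeError:
    well_formed = False

# surrogatepass lets the lone surrogate through as a code point,
# so every code unit survives a round trip.
lenient = raw.decode("utf-16-le", "surrogatepass")
restored = lenient.encode("utf-16-le", "surrogatepass")
```

So a program that wants to preserve such a command line exactly has to treat it as a sequence of code units, not as validated Unicode text.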
-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/