[Python-3000] Unicode and OS strings
"Martin v. Löwis" martin at v.loewis.de
Fri Sep 14 14:32:59 CEST 2007
Are you sure that "strings in an unknown encoding" are conceptually strings and not rather bytes?
For file names, most definitely. For command line arguments, I am fairly sure: the argc/argv calling convention does not allow for arbitrary bytes.
And what if we skillfully conserve unknown bytes in a private use or surrogate area and the application author actually knows the encoding and wants correctly decoded strings?
They can then easily round-trip it to the encoding it should have had:
good_string = sys.argv[bad_string_index].encode(
    sys.argv_encoding, "pua-replace").decode(real_encoding)
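As a rough illustration of the same round-trip: `sys.argv_encoding` and the "pua-replace" error handler above are names proposed in this message, not existing APIs. The sketch below assumes a Python version that ships the "surrogateescape" error handler (PEP 383), which smuggles undecodable bytes through as lone surrogates in a similar spirit. The helper name `redecode` and the sample bytes are hypothetical.

```python
def redecode(bad_string, assumed_encoding, real_encoding):
    """Recover the original bytes, then decode them with the right codec.

    Assumes bad_string was produced by decoding with assumed_encoding and
    the "surrogateescape" handler, so undecodable bytes survive as lone
    surrogates (U+DC80..U+DCFF) and can be turned back into bytes.
    """
    raw = bad_string.encode(assumed_encoding, "surrogateescape")
    return raw.decode(real_encoding)


# Simulate an argument whose bytes are Latin-1 while the runtime assumed UTF-8.
raw_arg = "L\xf6wis".encode("latin-1")                   # b'L\xf6wis'
bad_string = raw_arg.decode("utf-8", "surrogateescape")  # 'L\udcf6wis'

# The application author knows the real encoding and recovers the text.
good_string = redecode(bad_string, "utf-8", "latin-1")
```

The point of the design is that nothing is lost at startup: the badly decoded string still carries every original byte, so an application that knows better can fix it up, while everyone else never notices.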
However, we are talking about borderline cases here - in most cases, Python will just do the right thing. Special cases aren't special enough to break the rules.
Regards, Martin