[Python-3000] Unicode and OS strings

Jim Jewett jimjjewett at gmail.com
Fri Sep 21 17:01:24 CEST 2007


On 9/21/07, Paul Moore <p.f.moore at gmail.com> wrote:

> On 21/09/2007, Jim Jewett <jimjjewett at gmail.com> wrote:
> > (Outside ASCII), if you treat sys.argv as text, that is probably
> > impossible without filesystem support. Before python even sees the
> > data, the terminal itself is allowed to change between canonical
> > equivalents, which have different binary representations.
>
> Please note - this statement is Unix specific. The situation on Windows is entirely different (the fact that the CRT on Windows emulates some aspects of the Unix semantics is not relevant here - you need to understand the underlying OS model).

No; it is a consequence of Unicode. The command shell (or other program launcher) has the same freedom.

If you are using text (as opposed to bytes), then À can be either U+00C0 or <U+0041, U+0300>. If the file system makes a distinction, then it is using bytes, and any program interacting with it *needs* to use bytes too.
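
To make the two spellings of À concrete, here is a minimal sketch using the standard library's unicodedata module. The variable names are only illustrative, and UTF-8 is an assumed encoding chosen for the demonstration; the same point holds for any encoding that can represent both sequences.

```python
import unicodedata

precomposed = "\u00c0"   # À as the single code point U+00C0
decomposed = "A\u0300"   # U+0041 followed by U+0300 COMBINING GRAVE ACCENT

# Compared code point by code point, the two strings differ...
print(precomposed == decomposed)    # False

# ...and so do their encoded bytes (UTF-8 assumed here).
print(precomposed.encode("utf-8"))  # b'\xc3\x80'
print(decomposed.encode("utf-8"))   # b'A\xcc\x80'

# Normalizing both to one form makes them compare equal as text.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed
print(nfc_equal, nfd_equal)         # True True
```

A file system that treats these two names as different files is comparing the byte sequences, not the text.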

-jJ


