On 9 Jan 2014 22:25, "Kristján Valur Jónsson" <kristjan@ccpgames.com> wrote:
\>
\>
\>
\> > -----Original Message-----
\> > From: Victor Stinner \[mailto:victor.stinner@gmail.com\]
\> > Sent: 9\. janúar 2014 13:51
\> > To: Kristján Valur Jónsson
\> > Cc: Antoine Pitrou; python-dev@python.org
\> > Subject: Re: \[Python-Dev\] Python3 "complexity"
\> >
\> > 2014/1/9 Kristján Valur Jónsson <kristjan@ccpgames.com>:
\> > > This definition is funny, because according to Wikipedia, it is a
\> > > "superset" of 8859-1 (latin1)
\> >
\> > Bytes 0x80..0x9f are unassigned in ISO/IEC 8859-1... but are assigned in
\> > (IANA's) ISO-8859-1.
\> >
\> > Python implements the latter, ISO-8859-1.
\> >
\> > Wikipedia says "This encoding is a superset of ISO 8859-1, but differs from
\> > the IANA's ISO-8859-1".
\> >
\>
\> Thanks. That's entirely non-confusing :)
\> "ISO-8859-1 is the IANA preferred name for this standard when supplemented with the C0 and C1 control codes from ISO/IEC 6429."
\>
\> So anyway, yes, Python's "latin1" encoding does cover the entire range of 256 byte values. But on Windows we use cp1252 instead, which does not,
\> but instead defines useful and common Windows characters in many of the control character slots.
\> Hence the need for "surrogateescape" to be able to round-trip characters.
\>
\> Again, this is non-obvious, and going by my experience with cp1252, I had no way of guessing that the "subset", i.e. latin1, would indeed cover the whole range. Two things, then, that I have learned since my initial foray into parsing ASCII files with Python 3: surrogateescape, and "latin1 in Python == IANA's ISO-8859-1, which does indeed define the whole 8-bit range".
http://python-notes.curiousefficiency.org/en/latest/python3/text\_file\_processing.html is currently linked from the Unicode HOWTO. However, I'd be happy to offer it for direct inclusion to help make it more discoverable.
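
For anyone who wants to check the behaviour discussed above for themselves, here is a minimal sketch. It assumes CPython's bundled "latin-1" and "cp1252" codecs; the variable names are just illustrative. It shows that latin-1 decodes every byte value, that cp1252 rejects a few of them, and that the "surrogateescape" error handler lets those bytes survive a decode/encode round trip.

```python
# Every possible byte value, 0x00 through 0xFF.
all_bytes = bytes(range(256))

# latin-1 maps each byte to the code point with the same number,
# so decoding cannot fail and encoding gives back the original bytes.
latin1_text = all_bytes.decode("latin-1")
assert latin1_text.encode("latin-1") == all_bytes

# cp1252 leaves some slots undefined; collect the byte values that
# the codec refuses to decode under the default "strict" handler.
undefined_in_cp1252 = []
for value in range(256):
    try:
        bytes([value]).decode("cp1252")
    except UnicodeDecodeError:
        undefined_in_cp1252.append(value)

# surrogateescape smuggles each undecodable byte through as a lone
# surrogate (U+DC80..U+DCFF) and turns it back into the same byte
# when encoding with the same handler.
escaped_text = all_bytes.decode("cp1252", errors="surrogateescape")
round_tripped = escaped_text.encode("cp1252", errors="surrogateescape")
assert round_tripped == all_bytes

print([hex(value) for value in undefined_in_cp1252])
```

Note that the escaped text contains lone surrogates, so it is only suitable for passing data through, not for display or for encoding with a strict UTF-8 encoder.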
Cheers,
Nick.
\>
\> K