[Python-Dev] Python and the Unicode Character Database

Alexander Belopolsky alexander.belopolsky at gmail.com
Thu Dec 2 22:34:34 CET 2010

On Thu, Dec 2, 2010 at 1:55 PM, Antoine Pitrou <solipsis at pitrou.net> wrote: ..

> I don't think so. str.split() and str.splitlines() are also defined in conformance to the SPEC, AFAIK. They certainly try to.

You are joking, right? Where exactly does Unicode specify something like this:

>>> ''.join('𐌀𐌁𐌂'.split('\udf00\ud800'))
'𐌁𐌂'

?
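The output above assumes a narrow (UTF-16) build of Python 3.1/3.2, where each of '𐌀', '𐌁', '𐌂' (U+10300 to U+10302) is stored as a surrogate pair, so the separator '\udf00\ud800' matches across the boundary between the first two characters. Since Python 3.3 (PEP 393) strings hold whole code points, and the same call returns '𐌀𐌁𐌂' unchanged. The following is a minimal sketch that emulates the narrow-build behavior on a modern interpreter; the helpers to_narrow and from_narrow are hypothetical names for illustration, not part of any library.

```python
# Emulate a narrow (UTF-16) build on Python 3.3+ by spelling every astral
# character as two surrogate code points, one per UTF-16 code unit.
# to_narrow/from_narrow are hypothetical helpers for illustration only.

def to_narrow(s: str) -> str:
    """Return s with each astral character split into its surrogate pair."""
    data = s.encode("utf-16-le", "surrogatepass")
    return "".join(
        chr(int.from_bytes(data[i:i + 2], "little"))
        for i in range(0, len(data), 2)
    )


def from_narrow(s: str) -> str:
    """Recombine adjacent surrogate pairs into single astral characters."""
    # The UTF-16 decoder joins valid pairs; surrogatepass keeps lone ones.
    data = s.encode("utf-16-le", "surrogatepass")
    return data.decode("utf-16-le", "surrogatepass")


s = "\U00010300\U00010301\U00010302"  # '𐌀𐌁𐌂'
# Low (trailing) surrogate of U+10300, then high (leading) surrogate of U+10301.
sep = "\udf00\ud800"

# Python 3.3+: three code points, no match, the string comes back unchanged.
wide = "".join(s.split(sep))

# Narrow build: six code units; the separator straddles characters 1 and 2.
units = to_narrow(s)  # '\ud800\udf00\ud800\udf01\ud800\udf02'
narrow = from_narrow("".join(units.split(sep)))

print(ascii(wide))    # '\U00010300\U00010301\U00010302'
print(ascii(narrow))  # '\U00010301\U00010302'
```

Joining the pieces drops the separator, so the two leftover halves (U+D800 from the first character, U+DF01 from the second) fuse into U+10301 and '𐌀' silently disappears.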

OK, splitting on a given separator has very little to do with Unicode or the UCD, but str.splitlines() makes absolutely no attempt to conform to Unicode Standard Annex #14 ("Unicode Line Breaking Algorithm"). Wait, UAX #14 is actually relevant to the textwrap module, which has seen very little change since the 2.x days. So what exactly does str.splitlines() do? And which part of the Unicode standard defines how it differs from str.split(.., '\n')? The reference manual does not help me here either:

"""
str.splitlines([keepends])

    Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true.
"""

http://docs.python.org/dev/library/stdtypes.html#str.splitlines
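For comparison, the current stdtypes documentation does list the line boundaries str.splitlines() recognizes: \n, \r, \r\n, \v, \f, \x1c, \x1d, \x1e, \x85, \u2028 and \u2029 (it notes that \v and \f were added in Python 3.2). The sketch below is a pure-Python re-implementation written against that documented table, assuming it is complete; it is not CPython's actual code, and splitlines_sketch is a hypothetical name. It also shows how the result differs from str.split('\n').

```python
# Single-character line boundaries listed for str.splitlines() in the current
# stdtypes documentation; "\r\n" is handled separately as one boundary.
# splitlines_sketch is a hypothetical re-implementation, not CPython's code.
LINE_BREAKS = frozenset("\n\r\v\f\x1c\x1d\x1e\x85\u2028\u2029")


def splitlines_sketch(s: str, keepends: bool = False) -> list[str]:
    lines, start, i = [], 0, 0
    while i < len(s):
        if s.startswith("\r\n", i):
            width = 2          # CR LF counts as a single boundary
        elif s[i] in LINE_BREAKS:
            width = 1
        else:
            i += 1
            continue
        lines.append(s[start:i + width] if keepends else s[start:i])
        i += width
        start = i
    if start < len(s):         # no empty item after a final line break
        lines.append(s[start:])
    return lines


text = "one\ntwo\r\nthree\rfour\u2028five\x85six\n"

print(text.split("\n"))
# ['one', 'two\r', 'three\rfour\u2028five\x85six', '']
print(splitlines_sketch(text))
# ['one', 'two', 'three', 'four', 'five', 'six']
assert splitlines_sketch(text) == text.splitlines()
```

Splitting on '\n' leaves the '\r' of a CR LF pair in place, ignores every other boundary, and yields a trailing empty string; splitlines() does none of these.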
