[Python-Dev] Divorcing str and unicode (no more implicit conversions).
"Martin v. Löwis" martin at v.loewis.de
Mon Oct 24 23:06:38 CEST 2005
M.-A. Lemburg wrote:
There seems to be a general misunderstanding here: even if you have UCS4 storage, it is still possible to slice a Unicode string in a way which makes rendering it correctly impossible.
Unicode has the concept of combining code points, e.g. you can store an "é" (e with an accent) as "e" + "'". Now if you slice off the accent, you'll break the character that you encoded using combining code points.
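To make the combining-code-point case concrete, here is a minimal sketch. It assumes Python 3 syntax (where str is a Unicode string) and uses U+0301 COMBINING ACUTE ACCENT as the accent; the variable names are only illustrative.

```python
import unicodedata

# "é" stored as a single precomposed code point (U+00E9)
composed = "\u00e9"

# The same character stored as "e" followed by a combining acute accent (U+0301)
decomposed = "e\u0301"

# The decomposed form is two code points, regardless of storage width
decomposed_length = len(decomposed)

# NFC normalization shows both spellings denote the same character
same_character = unicodedata.normalize("NFC", decomposed) == composed

# Slicing off the last code point drops the accent and leaves a bare "e"
sliced = decomposed[:1]

# A non-zero combining class marks the second code point as a combining mark
accent_is_combining = unicodedata.combining(decomposed[1]) != 0

print(decomposed_length, same_character, repr(sliced), accent_is_combining)
```

The slice is a perfectly valid string, but it no longer represents the character that was stored, which is the breakage described above.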
While this is all true, I agree with Neil that it should do whatever it does consistently across implementations, i.e. len("\U00010000") should always give the same result, and I think this result should always be 1.
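As a rough illustration of why the result can differ, the sketch below counts the same character in code points and in UTF-16 code units. It assumes Python 3.3 or later, where len() counts code points; narrow (UCS2) builds of the time stored this character as a surrogate pair and reported a length of 2.

```python
# A single character outside the Basic Multilingual Plane
s = "\U00010000"

# Number of code points (what len() reports on Python 3.3+)
code_points = len(s)

# Number of 16-bit code units needed in UTF-16: a surrogate pair
utf16_units = len(s.encode("utf-16-le")) // 2

# Number of bytes needed in UTF-8, for comparison
utf8_bytes = len(s.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # 1 2 4
```

A length of 2 corresponds to counting UTF-16 code units rather than code points, which is the implementation detail that should not leak into len().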
How to best implement this efficiently is an entirely different question, as is the question whether you can render arbitrary substrings in a meaningful way.
Regards, Martin