[I18n-sig] Re: [Python-Dev] Unicode debate

Just van Rossum just@letterror.com
Wed, 3 May 2000 08:50:11 +0100


[MAL]

I just wanted to point out that the argument "slicing doesn't work with UTF-8" is moot.

[Just]

And failed...

[Tim]

He succeeded for me. Blind slicing doesn't always "work right" no matter what encoding you use, because "work right" depends on semantics beyond the level of encoding. UTF-8 is no worse than anything else in this respect.

But the discussion was at the level of encoding! Still, it is worse, since an arbitrary UTF-8 slice may result in two illegal strings -- whereas slicing "e`" results in two perfectly legal strings, at the encoding level. Had he used surrogates as an example, he would've been right... (But even that is an encoding issue.)
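
The encoding-level distinction can be sketched in a few lines. This is only an illustration, and it makes some assumptions: it uses present-day Python 3 str/bytes semantics rather than anything available when this was written, it takes "e`" to mean "e" followed by U+0300 COMBINING GRAVE ACCENT, and is_valid_utf8 is a hypothetical helper name invented for the example.

```python
# Sketch only: assumes Python 3 str/bytes semantics.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Assumption: "e`" stands for "e" + U+0300 COMBINING GRAVE ACCENT.
text = "e\u0300"
encoded = text.encode("utf-8")  # 3 bytes: 'e', then a two-byte sequence

# Slice between code points: the accent is separated from its base
# letter, but each half is still a legal string at the encoding level.
left, right = text[:1], text[1:]
code_point_halves_ok = (
    is_valid_utf8(left.encode("utf-8"))
    and is_valid_utf8(right.encode("utf-8"))
)

# Slice the UTF-8 bytes at an arbitrary offset: cutting inside the
# two-byte sequence leaves a truncated lead byte on one side and a
# stray continuation byte on the other.
byte_left, byte_right = encoded[:2], encoded[2:]
byte_halves_ok = (is_valid_utf8(byte_left), is_valid_utf8(byte_right))

print(code_point_halves_ok)  # True
print(byte_halves_ok)        # (False, False)
```

Both slices are wrong at the semantic level, but only the byte-level slice produces data that a strict decoder rejects.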

Just