[Python-Dev] Internal representation of strings and Micropython

Paul Sokolovsky pmiscml at gmail.com
Wed Jun 4 17:53:52 CEST 2014


Hello,

On Thu, 5 Jun 2014 01:00:52 +1000 Chris Angelico <rosuav at gmail.com> wrote:

> On Thu, Jun 5, 2014 at 12:49 AM, Paul Sokolovsky <pmiscml at gmail.com> wrote:
> >> > But you need non-ASCII characters to display a title of MP3
> >> > track.
> >
> > Yes, but to display a title, you don't need to do codepoint access
> > at random - you need to either take a block of memory (length in
> > bytes) and do something with it (pass to a C function, transfer
> > over some bus, etc.), or iterate in order over codepoints in a
> > string. All these operations are as efficient (O-notation) for
> > UTF-8 as for UTF-32.
>
> Suppose you have a long title, and you need to abbreviate it by
> dropping out words (delimited by whitespace), such that you keep the
> first word (always) and the last (if possible) and as many as
> possible in between. How are you going to write that? With PEP 393 or
> UTF-32 strings, you can simply record the index of every whitespace
> you find, count off lengths, and decide what to keep and what to
> ellipsize.
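For reference, the index-recording approach described in the quoted paragraph could look roughly like the sketch below. The function name `abbreviate`, the `max_len` parameter, the `"..."` marker, and the greedy left-to-right choice of middle words are all assumptions made for illustration; the thread does not specify them.

```python
def abbreviate(title, max_len, ellipsis="..."):
    """Shorten a title by dropping whole words (illustrative sketch only)."""
    # Record (start, end) code point indices of every whitespace-delimited word.
    spans = []
    start = None
    for i, ch in enumerate(title):
        if ch.isspace():
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(title)))
    if not spans:
        return ""

    words = [title[a:b] for a, b in spans]
    full = " ".join(words)
    if len(full) <= max_len:
        return full

    first, last = words[0], words[-1]
    if len(words) == 1:
        # The first word is always kept, even if it is too long.
        return first

    # Length of "first ... last", the shortest form that keeps the last word.
    used = len(first) + 1 + len(ellipsis) + 1 + len(last)
    if used > max_len:
        # The last word does not fit: keep only the first word.
        return first + " " + ellipsis

    # Greedily keep as many middle words as fit, in order.
    kept = [first]
    for word in words[1:-1]:
        if used + 1 + len(word) > max_len:
            break
        kept.append(word)
        used += 1 + len(word)
    return " ".join(kept + [ellipsis, last])
```

With these assumptions, `abbreviate("The Quick Brown Fox Jumps Over The Lazy Dog", 20)` yields `"The Quick ... Dog"`. The indices here are code point offsets, which is the random-access pattern the quoted paragraph relies on.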

I'll submit an angry bug report along the lines of "WWWHAT, it's 3.5 and there's still no str.isplit()??!!11", then do it with re.finditer() (while submitting another report on the inconsistent naming scheme).
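As a rough illustration of the re.finditer() route: str.isplit() does not exist in the standard library, so the `isplit` generator below is a hypothetical stand-in, a minimal sketch that assumes whitespace-delimited words are what is wanted. It walks the string once, in order, and never needs random access by code point index.

```python
import re

# Hypothetical helper: a lazy counterpart to str.split() with no arguments.
_WORD = re.compile(r"\S+")

def isplit(text):
    """Yield whitespace-delimited words one at a time, left to right."""
    for match in _WORD.finditer(text):
        yield match.group()
```

For ordinary text, `list(isplit(s))` should agree with `s.split()`, although the two may differ on unusual separator characters, since they do not share a definition of whitespace.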

[]

-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com


