[Python-Dev] Internal representation of strings and Micropython (original) (raw)

Serhiy Storchaka storchaka at gmail.com
Wed Jun 4 18:49:18 CEST 2014

Previous message: [Python-Dev] Internal representation of strings and Micropython
Next message: [Python-Dev] Internal representation of strings and Micropython
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

04.06.14 18:38, Paul Sokolovsky написав(ла):

Any non-trivial text parsing uses indices or regular expressions (and regular expressions themself use indices internally). I keep hearing this stuff, and unfortunately so far don't have enough time to collect all that stuff and provide detailed response. So, here's spur of the moment response - hopefully we're in the same context so it is easy to understand. So, gentlemen, you keep mixing up character-by-character random access to string and taking substrings of a string. Character-by-character random access imply that you would need to scan thru (possibly almost) all chars in a string. That's O(N) (N-length of string). With varlength encoding (taking O(N) to index arbitrary char), there's thus concern that this would be O(N^2) op. But show me real-world case for that. Common usecase is scanning string left-to-right, that should be done using iterator and thus O(N). Right-to-left scanning would be order(s) of magnitude less frequent, as and also handled by iterator.

html.HTMLParser, json.JSONDecoder, re.compile, tokenize.tokenize don't use iterators. They use indices, str.find and/or regular expressions. Common use case is quickly find substring starting from current position using str.find or re.search, process found token, advance position and repeat.

Previous message: [Python-Dev] Internal representation of strings and Micropython
Next message: [Python-Dev] Internal representation of strings and Micropython
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

More information about the Python-Dev mailing list