[Python-Dev] Internal representation of strings and Micropython

Chris Angelico rosuav at gmail.com
Thu Jun 5 01:05:33 CEST 2014


On Thu, Jun 5, 2014 at 8:52 AM, Paul Sokolovsky <pmiscml at gmail.com> wrote:

> "Well" is subjective (or should be defined formally based on the requirements). With my MicroPython hat on, an implementation which receives a string, transcodes it, leading to bigger size, just to immediately transcode back and send out - is awful, environment unfriendly implementation ;-).

Be careful of confusing correctness and performance, though. The transcoding you describe is inefficient, but (presumably) correct; something that's fast but wrong is straight-up buggy. You can always fix inefficiency in a later release, but buggy behaviour sometimes is relied on (which is why ECMAScript still exposes UTF-16 to scripts, and why Windows window messages have a WPARAM and an LPARAM, and why Python's threading module has duplicate names for a lot of functions, because it's just not worth changing). I'd be much more comfortable releasing something where "everything works fine, but if you use astral characters in your strings, memory usage blows out by a factor of four" (or "... the len() function takes O(N) time") than one where "everything works fine as long as you use BMP only, but SMP characters result in tests failing".
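
To make the difference concrete, here's a rough sketch in Python 3. The helper names utf8_len and utf16_unit_len are made up for illustration; they aren't MicroPython or CPython APIs, and this isn't meant to show how either implementation actually stores strings. The first helper is the "slow but correct" option: it counts code points in UTF-8 by walking every byte. The second is the "fast but wrong" option: it reports UTF-16 code units, which matches the character count only while everything stays in the BMP.

```python
def utf8_len(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes.

    Every byte has to be examined, so this is O(N) in the byte length.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


def utf16_unit_len(text: str) -> int:
    """Count UTF-16 code units, i.e. what a length stored per 16-bit unit reports."""
    return len(text.encode("utf-16-le")) // 2


bmp = "caf\u00e9"        # four characters, all inside the BMP
astral = "a\U0001F600b"  # three characters, one of them outside the BMP

# Both approaches agree on BMP-only text.
print(utf8_len(bmp.encode("utf-8")), utf16_unit_len(bmp))

# On astral text the UTF-8 walk still counts characters, while the
# UTF-16 count includes both halves of the surrogate pair.
print(utf8_len(astral.encode("utf-8")), utf16_unit_len(astral))
```

The first version costs time on every call but never gives a wrong answer; the second is constant-time if the unit count is stored, but it's exactly the "works fine as long as you use BMP only" behaviour.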

ChrisA


