[Python-Dev] Internal representation of strings and Micropython

Donald Stufft donald at stufft.io
Wed Jun 4 03:46:22 CEST 2014


I think UTF8 is the best option.

On Jun 3, 2014, at 9:17 PM, Steven D'Aprano <steve at pearwood.info> wrote:

There is a discussion over at MicroPython about the internal representation of Unicode strings. Micropython is aimed at embedded devices, and so minimizing memory use is important, possibly even more important than performance.

(I'm not speaking on their behalf, just commenting as an interested outsider.)

At the moment, their Unicode support is patchy. They are talking about either:

* Having a build-time option to restrict all strings to ASCII-only.

  (I think what they mean by that is that strings will be like Python 2 strings, ASCII-plus-arbitrary-bytes, not actually ASCII.)

* Implementing Unicode internally as UTF-8, and giving up O(1) indexing operations.

https://github.com/micropython/micropython/issues/657
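To put rough numbers on the memory side of this trade-off, here is a small sketch comparing the size of the same text stored as UTF-8 and stored with a fixed four bytes per code point (the simplest layout that keeps O(1) indexing). The helper name encoded_sizes is made up for this illustration, and the fixed-width layout is only an assumed point of comparison, not something MicroPython is known to be considering.

```python
def encoded_sizes(text):
    """Return the byte length of text under UTF-8 and under a fixed 4-byte layout."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-32-le" uses exactly 4 bytes per code point and writes no BOM.
        "utf-32": len(text.encode("utf-32-le")),
    }


print(encoded_sizes("hello"))    # {'utf-8': 5, 'utf-32': 20}
print(encoded_sizes("héllo €"))  # {'utf-8': 10, 'utf-32': 28}
```

For mostly-ASCII text, UTF-8 stays close to one byte per character, which is presumably why it is attractive on a memory-constrained device.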

Would either of these trade-offs be acceptable while still claiming "Python 3.4 compatibility"? My own feeling is that O(1) string indexing operations are a quality of implementation issue, not a deal breaker to call it a Python. I can't see any requirement in the docs that str[n] must take O(1) time, but perhaps I have missed something.
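To make concrete what giving up O(1) indexing means, here is a minimal sketch of indexing by code point into a UTF-8 buffer. The function utf8_index is hypothetical and is not taken from MicroPython; it assumes the buffer holds valid UTF-8 and only handles non-negative indices. Because code points occupy one to four bytes, the n-th one can only be found by scanning from the start, so the cost grows with n.

```python
def utf8_index(buf, n):
    """Return the n-th code point of UTF-8 bytes `buf` as a str, by linear scan."""
    if n < 0:
        raise IndexError("negative indices are not handled in this sketch")
    count = -1    # index of the code point currently being scanned
    start = None  # byte offset where the n-th code point begins
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; anything else starts a code point.
        if byte & 0xC0 != 0x80:
            count += 1
            if count == n:
                start = pos
            elif count == n + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    # The n-th code point is the last one in the buffer.
    return buf[start:].decode("utf-8")


text = "héllo €"
data = text.encode("utf-8")
print(utf8_index(data, 1))  # é
print(utf8_index(data, 6))  # €
```

The result matches text[n] for every valid n; only the running time differs from a fixed-width representation.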

-- Steven

