[Python-Dev] Internal representation of strings and Micropython

Paul Sokolovsky pmiscml at gmail.com
Thu Jun 5 00:52:53 CEST 2014


Hello,

On Wed, 04 Jun 2014 18:04:52 -0400 Terry Reedy <tjreedy at udel.edu> wrote:

On 6/4/2014 5:14 PM, Paul Sokolovsky wrote:

> That said, and unlike previous attempts to develop a small Python
> implementations (which of course existed), we're striving to be
> exactly a Python language implementation, not a Python-like language
> implementation. As there's no formal, implementation-independent
> language spec, what constitutes a compatible language
> implementation is subject to opinions, and we welcome and
> appreciate independent review, like this thread did.
>
>> Realistically, most Python code that works on Python 3.4 won't work
>> on Micropython (for various reasons, not just the string behavior)
>> and neither does it need to.
>
> That's true. However, as was said, we're striving to provide a
> compatible implementation, and compatibility claims must be
> validated. While we have simple "in-house" testsuite, more serious
> compatibility validation requires running a testsuite for reference
> implementation (CPython), and that's gradually being approached.

I would call what you are doing a 'Python 3.n subset, with

Thanks, that's what we call it ourselves in the docs linked in the original message, and we use n=4. Note that being a subset is not a design requirement, but there's a higher-priority requirement of staying lean, so realistically uPy will always stay a subset.

limitations', where n should be a specific number, which I would urge should be at least 3, if not 4 ('yield from'). To me, that would mean that every Micropython program (that does not use a clearly non-Python addon like inline assembly) would run the same* on CPython 3.n. Conversely, a Python 3.n program should either run the same* on MicroPython as CPython, or raise. What most to avoid is giving different* answers.

That's a nice aim, but we don't have enough resources to implement it, so we would appreciate any help from interested parties.

*'same' does not include timing differences or normal float variations or bug fixes in MicroPython not in CPython.
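
A minimal sketch of the "run the same, or raise, but never give a different answer" criterion described above. Everything here is an illustrative assumption: `classify`, `ascii_only_upper`, and `bytewise_upper` are hypothetical names, and the two candidate functions merely stand in for a restricted implementation; none of this is MicroPython code.

```python
def run(func, *args):
    """Run func and capture either its result or the type of exception raised."""
    try:
        return ("ok", func(*args))
    except Exception as exc:
        return ("raised", type(exc).__name__)


def classify(reference, candidate, *args):
    """Compare a candidate against a reference: "same", "raised", or "different"."""
    ref = run(reference, *args)
    cand = run(candidate, *args)
    if cand == ref:
        return "same"
    if cand[0] == "raised":
        return "raised"      # acceptable: the limitation is loud
    return "different"       # the case to avoid: a silently different answer


def ascii_only_upper(s):
    """Hypothetical restricted upper(): refuses anything outside ASCII."""
    if not s.isascii():
        raise NotImplementedError("non-ASCII text is not supported")
    return s.upper()


def bytewise_upper(s):
    """Hypothetical restricted upper(): only maps a-z and ignores the rest."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


print(classify(str.upper, ascii_only_upper, "abc"))
print(classify(str.upper, ascii_only_upper, "straße"))
print(classify(str.upper, bytewise_upper, "straße"))
```

Under this reading, the first two outcomes fit the definition of a subset with limitations, while the third does not.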

As for unicode: I would see ascii-only (very limited codepoints) or bare utf-8 (limited speed == expanded time) as possibly fitting the definition above. Just be clear what the limitations are. And accept that there will be people who do not bother to read the limitations and then complain when they bang into them.

PS. You do not seem to be aware of how well the current PEP 393 implementation works. If you are going to write any more about it, I suggest you run Tools/Stringbench/stringbench.py for timings.
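
To illustrate the "limited speed == expanded time" trade-off of bare utf-8: with no side index, finding the i-th code point means scanning from the start of the buffer. The sketch below assumes well-formed UTF-8 input; `utf8_char_at` is a hypothetical helper written for this illustration, not an API of CPython or MicroPython.

```python
def utf8_char_at(buf: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 data by a linear scan."""
    if index < 0:
        raise IndexError("negative indexes are not handled in this sketch")
    seen = -1      # number of code points started so far, minus one
    start = None   # byte offset where the wanted code point begins
    for pos, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("string index out of range")
    return buf[start:].decode("utf-8")


data = "héllo".encode("utf-8")
print(utf8_char_at(data, 1))
```

The cost grows with the index, whereas a fixed-width representation can compute the offset directly.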

"Well" is subjective (or should be defined formally based on the requirements). With my MicroPython hat on, an implementation which receives a string, transcodes it, leading to a bigger size, just to immediately transcode it back and send it out, is an awful, environment-unfriendly implementation ;-).

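A rough sketch of the size growth referred to above, assuming the PEP 393 rule that a string is stored with 1, 2, or 4 bytes per code point depending on its widest character. The helper names are hypothetical, and the figures cover character data only: object headers and any cached encodings are ignored.

```python
def pep393_payload_bytes(s: str) -> int:
    """Character-data size under PEP 393's fixed-width kinds (headers ignored)."""
    if not s:
        return 0
    widest = max(map(ord, s))
    if widest < 0x100:
        width = 1      # Latin-1 range
    elif widest < 0x10000:
        width = 2      # Basic Multilingual Plane
    else:
        width = 4      # anything beyond the BMP
    return width * len(s)


def utf8_payload_bytes(s: str) -> int:
    """Character-data size when the text is kept as UTF-8."""
    return len(s.encode("utf-8"))


text = "a" * 99 + "€"   # one non-Latin-1 character widens the whole string
print(utf8_payload_bytes(text), pep393_payload_bytes(text))
```

For this input the UTF-8 form takes 102 bytes and the fixed-width form 200 bytes, which is the kind of expansion that matters on a memory-constrained target.
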
-- 
Best regards,
 Paul                          mailto:pmiscml at gmail.com


