[Python-Dev] Trying to focus the whole bytes/str formatting discussion

Nick Coghlan ncoghlan at gmail.com
Mon Jan 13 12:53:41 CET 2014

Previous message: [Python-Dev] Trying to focus the whole bytes/str formatting discussion
Next message: [Python-Dev] Python advanced debug support (update frame code)

On 13 January 2014 08:46, Brett Cannon <brett at python.org> wrote:

I don't know about the rest of you but I feel like the discussion is heading off the rails (if it hasn't already jumped the tracks). Let's try to bring this back around to something actionable which people can focus their energy on as the amount of developer time spent arguing could have led to several coded-up solutions.

I see it as a practicality-beats-purity (PBP) vs. explicit-is-better-than-implicit (EIBTI) argument.

The PBP group want bytes.format() (just assume I include interpolation support if you want that) to work as close to a drop-in replacement for current str.format() use in Python 2 to ease porting. The argument is that code looks cleaner and the number of changes in Python 2 code being ported to Python 3 is much smaller.

The EIBTI group are willing to support PEP 460 but beyond that don't want to have in Python itself anything for bytes.format() which takes in a string and spits out bytes. It's bytes in->bytes out and not bytes & str in->bytes out as the PBP group is after. The EIBTI group are arguing that letting str into bytes.format() and then automatically be converted to strict ASCII leads to conflating the text/bytes divide as well as being too magical, e.g. what if you actually wanted UTF-16 for your number string instead of ASCII; the EIBTI group wants to force people to make a decision. They are also less concerned with making users update Python 2 code to handle this as it already needs to be updated for other Python 3 things anyway.

From where I'm sitting, the EIBTI group and their PEP 460 proposal from Antoine (and no longer Victor) are not controversial. Everyone seems to agree that PEP 460 at minimum is acceptable and should happen for Python 3.5. The people with the uphill battle and something to prove are those arguing for str in->bytes out support in bytes.format(). The added features that the PBP group want are the ones being argued over.

As the onus is on the PBP group to convince the EIBTI group (or Guido), I think the PBP group should code up a solution that does what they want and put it on PyPI to see what the community thinks. If the PBP group wants to convince the EIBTI group that str in->bytes out for bytes.format() is critical in getting a key group of users to start using Python 3 then I think that needs to be demonstrated through real-world usage by some people.

Note that I am now fine with Guido's more lenient proposal so long as explicitly bytes-only formatb and formatb_map methods are also included.

That would give us the following situation in 3.5:

    Text interpolation: str.__mod__, str.format, str.format_map
    ASCII compatible interpolation: bytes.__mod__, bytes.format, bytes.format_map
    Arbitrary binary interpolation: bytes.formatb, bytes.formatb_map

Those are all reasonable operations for the language to support natively, and by providing convenient access to all three, we avoid the attractive nuisance that would be created by providing only ASCII interpolation without providing strict binary interpolation (since people would inevitably use the former when they should really be using the latter, because interpolation is such a convenient construct), while still addressing the interests of both groups (people like me and Antoine that like PEP 460 as it stands, as well as those that favour the ASCII encoding features).
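To make the distinction between the last two families concrete, here is a rough sketch in today's Python. The names interpolate_binary and interpolate_ascii are hypothetical stand-ins invented for this illustration; they are not the proposed methods and do not implement the format mini-language, only bare {} placeholders.

```python
def interpolate_binary(template: bytes, *args) -> bytes:
    """Hypothetical stand-in for a strict, bytes-only formatb.

    Each b"{}" placeholder is filled with the raw bytes of the matching
    argument. Anything that does not expose the buffer interface
    (str, int, ...) is rejected instead of being encoded implicitly.
    """
    pieces = template.split(b"{}")
    if len(pieces) - 1 != len(args):
        raise ValueError("placeholder count does not match argument count")
    out = bytearray(pieces[0])
    for arg, tail in zip(args, pieces[1:]):
        out += memoryview(arg)  # TypeError for str, int and other non-buffers
        out += tail
    return bytes(out)


def interpolate_ascii(template: bytes, *args) -> bytes:
    """Hypothetical stand-in for the lenient, ASCII compatible variant."""
    converted = [
        arg if isinstance(arg, (bytes, bytearray, memoryview))
        else str(arg).encode("ascii")  # strict: non-ASCII text raises
        for arg in args
    ]
    return interpolate_binary(template, *converted)


print(interpolate_ascii(b"Content-Length: {}\r\n", 42))
print(interpolate_binary(b"{}{}", b"\x00\x01", b"\xff"))
try:
    interpolate_binary(b"Content-Length: {}\r\n", 42)
except TypeError as exc:
    print("rejected:", exc)
```

The strict variant never has to pick an encoding, which is why it stays safe for arbitrary binary data; the lenient one bakes in the ASCII assumption.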

It's only the introduction of ASCII compatible interpolation support without binary interpolation support that I am adamantly opposed to - that's the kind of attractive nuisance that leads to people inappropriately using ASCII compatible only APIs and then discovering that their code breaks when confronted with ASCII incompatible encodings like UTF-16, Shift-JIS and ISO-2022.
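A small illustration of that failure mode, using only explicit encode calls (the "count=" payload is made up for the example): splicing ASCII-encoded digits into UTF-16 data does not raise anything, it just produces different text.

```python
prefix = "count=".encode("utf-16-le")

# Correct: the number is encoded with the same codec as the rest of the payload.
good = prefix + str(42).encode("utf-16-le")

# What ASCII compatible interpolation amounts to: ASCII digits spliced in.
bad = prefix + str(42).encode("ascii")

print(good.decode("utf-16-le"))        # count=42
print(ascii(bad.decode("utf-16-le")))  # 'count=\u3234'
```

The two ASCII bytes b"42" are read back as a single UTF-16 code unit, so the error passes silently until someone looks at the decoded text.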

Originally I was opposed to the idea entirely, but then Antoine wrote the binary only version of PEP 460 and I found it to be a very elegant solution that didn't compromise the Python 3 text model. As long as this pure API remains available in some form (such as formatb and formatb_map methods), then I'm OK with the ASCII only version existing in parallel - at that point, it is analogous to all the other existing bytes methods that assume the use of ASCII compatible data.

** The caveat **

However, note that there were two significant issues that were raised in the recent broader discussions. PEP 460 only tackles the more tractable of the two: the fact that Twisted and Mercurial both consider bytes.__mod__ support a blocker for switching to Python 3. That's a useful discussion to have, but it's important for people to realise that the mod-formatting feature is utterly irrelevant to the concerns Armin Ronacher raised in http://lucumr.pocoo.org/2014/1/5/unicode-in-2-and-3/ that kicked off this whole recent spate of interest in the topic.

Obviously, I disagree with his conclusions (and personally wish Python 2 Unicode experts would show a little more humility in trying to understand the core team's motivations for Python 3 design decisions rather than assuming that we're clueless idiots that decided to maintain 4 parallel branches in Subversion for a couple of years just because we thought it might be fun), but I can certainly understand his pain.

I'm the one who actually made the changes to restore dual bytes/unicode support in urllib.parse for Python 3 (one of Armin's favourite examples of the difficulty of writing that kind of code using the Python 3 text model), and I agree entirely with Armin's assessment of that code: it isn't pretty, and it wasn't fun to write. Yes, I got it to work, and yes, it was satisfying when the tests finally passed, and yes there is now a smaller number of cases where errors will pass silently, but that's far from the same thing as finding the process of getting there a pleasant one, or considering the result an elegant approach to porting hybrid APIs from Python 2 such that bytes in = bytes out and str in = str out.

The only difference between Armin and myself in this respect is that I know the reasons for the changes to the text model, and I think the increased difficulty in implementing that particular use case was worth it, given the pay-off in finally being able to remove the implicit encoding and decoding operations from the text model. (Note that the unicode input handling in urlparse in Python 2 breaks entirely if you turn off implicit decoding. You can still get hits from the cache, but if you have to actually parse anything, it will fail: http://python-notes.curiousefficiency.org/en/latest/python3/binary_protocols.html#couldn-t-the-implicit-decoding-just-be-disabled-in-python-2)
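For anyone who hasn't used the hybrid behaviour, this is what it looks like from the outside in Python 3's urllib.parse (the example.com URLs are placeholders):

```python
from urllib.parse import urljoin, urlsplit

text_result = urlsplit("http://example.com/path?q=1")
bytes_result = urlsplit(b"http://example.com/path?q=1")

print(text_result.netloc)   # str in = str out
print(bytes_result.netloc)  # bytes in = bytes out

# Mixing the two is an error rather than an implicit decode.
try:
    urljoin("http://example.com/", b"path")
except TypeError as exc:
    print("mixing rejected:", exc)
```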

The fact remains, however, that in Python 2 the code you need for that kind of hybrid API was easy to write - you just made all your internal constants 8-bit strings, and the implicit decoding to Unicode took care of the case of str inputs. There are still valid use cases for such hybrid APIs, even in Python 3 (urllib.parse is one of them), and the reason I helped Benno start the asciicompat project (https://github.com/jeamland/asciicompat) is because I want to make that kind of code almost as effortless as it was in Python 2 - all you should need to do is make your constants asciistr instances rather than builtin bytes or str objects.
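As a sketch of the extra work involved today, here is a hypothetical hybrid helper (strip_fragment is invented for this example and is not taken from urllib.parse or asciicompat). With only the builtin types, every internal constant has to be selected in both a text and a binary form:

```python
def strip_fragment(url):
    """Hypothetical hybrid helper: str in = str out, bytes in = bytes out."""
    # The constant must match the input type; Python 3 will not convert it.
    marker = "#" if isinstance(url, str) else b"#"
    return url.partition(marker)[0]


print(strip_fragment("http://example.com/a#top"))
print(strip_fragment(b"http://example.com/a#top"))

# A single str constant no longer works for binary input:
try:
    b"http://example.com/a#top".partition("#")
except TypeError as exc:
    print("no implicit conversion:", exc)
```

One selection like this is tolerable; the pain comes from repeating it for every constant in a real parser.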

My ambition here is not "good enough to get people to stop complaining", it's "there's no actual reason Python 3 needs to be worse at this than Python 2, it just doesn't need to be part of the core builtin types, because we're in a better position to fix interoperability issues now that we don't have to deal with the close coupling between str and unicode that existed in Python 2, and the bytes type will generally play nice with anything that exposes the PEP 3118 buffer interface".
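A quick illustration of that last point, using only standard library types: bytes operations such as join accept any object that exposes the buffer interface, not just bytes itself.

```python
import array

parts = [
    memoryview(b"ab"),
    bytearray(b"cd"),
    array.array("B", [101, 102]),  # the byte values of "e" and "f"
]

joined = b"".join(parts)
print(joined)  # b'abcdef'
```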

Cheers, Nick.

-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
