[Python-Dev] PEP 460 reboot

M.-A. Lemburg mal at egenix.com
Mon Jan 13 10:06:00 CET 2014


On 13.01.2014 07:51, Nick Coghlan wrote:

[Using a new asciistr type]

> The key thing that the text model change in Python 3 enabled is for us to use the type system to help with managing the complexity of dealing with text encodings. We've got a long way with just the two pure types, and no additional types that straddle the binary/text boundary the way the Python 2 str type did.
>
> Unlike introducing new ASCII-only operations to the bytes type, adding new types specifically for dealing with ASCII compatible formats (especially starting life as a third party library) isn't compromising the Python 3 text model, it's embracing it and making it work for us (which is why I've been suggesting that it be considered since at least 2010).
>
> The problem with "str" in Python 2 was that one type was used to represent too many things with serious semantic differences. The ongoing attempts to reintroduce that ambiguity to the core bytes type rather than exploring the creation of new types and then filing bugs for any interoperability issues those attempts uncover in the core types represents one of the worst cases of paradigm lock that I have ever seen :P

In theory this sounds nice, but in practice you often run into the issue that when you pass such a str subtype to a function that works on str, the function doesn't return the str subtype as its result, but a new plain str object instead.

As a result, you have to keep track of which operations work on your str-subtype alone and which convert it back to a str, making the approach infeasible for all but the most basic uses.
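To illustrate the point, here is a minimal sketch in Python 3. The `AsciiStr` class is a hypothetical stand-in for the kind of subtype under discussion, not an existing library type; the sketch only shows that common str operations hand back plain str objects.

```python
class AsciiStr(str):
    """Hypothetical str subtype that only accepts ASCII text (illustration only)."""

    def __new__(cls, value=""):
        # Validate on construction; raises UnicodeEncodeError for non-ASCII input.
        value.encode("ascii")
        return super().__new__(cls, value)


s = AsciiStr("header")

# Record the type name each operation returns.
results = {
    "upper": type(s.upper()).__name__,
    "concat": type(s + ": value").__name__,
    "slice": type(s[:3]).__name__,
    "format": type("{}".format(s)).__name__,
}

for operation, type_name in results.items():
    print(operation, "->", type_name)
```

Every operation listed returns a plain str, so the subtype would have to override each method (and keep doing so as str grows new ones) to preserve its type.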

This is why we try to make the basic types as useful as possible for everyone. It's also the main reason why subtyping 8-bit strings and Unicode in Python 2 wasn't a popular sport :-)

Leaving aside the discussion about str and bytes, I think PEP 460 has a lot of potential for making life easier for people dealing with binary data: the formatting codes for the bytes format methods could be extended to include the struct module's features, with the struct module then turning into a proxy for these new format methods (much like we did with the string module when string methods were introduced).
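For reference, this is the kind of struct module functionality meant here, as a small sketch using the existing `struct` API. The record layout (a 16-bit id followed by a 32-bit value) is made up for illustration, and no syntax for the proposed bytes format codes is assumed.

```python
import struct

# Hypothetical record layout: big-endian unsigned 16-bit id, signed 32-bit value.
RECORD_FORMAT = ">Hi"

# Pack Python integers into their binary representation.
packed = struct.pack(RECORD_FORMAT, 460, -1)

# Unpack the bytes back into a tuple of Python integers.
record_id, value = struct.unpack(RECORD_FORMAT, packed)

# Size in bytes of one record (no padding with an explicit byte order).
size = struct.calcsize(RECORD_FORMAT)

print(packed, record_id, value, size)
```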

BTW: There's a little-known trick in Python 2 which also lets you disable the string to Unicode coercion: all you have to do is set the default encoding to "undefined" (see site.py:setencoding()). Python 2 will then raise a UnicodeError whenever coercion would trigger. I added that codec to experiment with this scenario in the early days of the Unicode integration.
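The "undefined" codec itself can be tried out directly. The sketch below runs on Python 3, where the codec still ships in the standard encodings package; it does not reproduce the Python 2 default-encoding setup from site.py, it only shows that any use of the codec raises UnicodeError. The helper name `fails_with_unicode_error` is made up for this example.

```python
def fails_with_unicode_error(operation):
    """Return True if calling `operation` raises UnicodeError (hypothetical helper)."""
    try:
        operation()
    except UnicodeError:
        return True
    return False


# Both directions are refused by the "undefined" codec.
encode_refused = fails_with_unicode_error(lambda: "abc".encode("undefined"))
decode_refused = fails_with_unicode_error(lambda: b"abc".decode("undefined"))

print(encode_refused, decode_refused)
```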

-- Marc-Andre Lemburg eGenix.com

