[Python-3000] PEP 3137 plan of attack

Christian Heimes lists at cheimes.de
Wed Oct 10 21:08:27 CEST 2007


Guido van Rossum wrote:

Definitely not. basestring is for text strings. We could even decide to remove it; we should instead have ABCs for this purpose.

I'm going to provide a patch which rips basestring out, k? Somebody has to write a fixer for 2to3 which replaces code like isinstance(egg, basestring) with isinstance(egg, str).
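A minimal sketch of the rewrite such a fixer would have to perform. It does not use the 2to3 fixer API; it relies only on the standard tokenize module, and the function name replace_basestring is hypothetical. Working on tokens rather than raw text means string literals and comments that happen to contain "basestring" are left alone.

```python
import io
import tokenize


def replace_basestring(source):
    """Return source with every NAME token `basestring` renamed to `str`.

    Hypothetical helper for illustration only; a real 2to3 fixer would
    work on the 2to3 parse tree instead of a token stream.
    """
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only identifiers are rewritten; strings and comments have
        # other token types and pass through unchanged.
        if tok.type == tokenize.NAME and tok.string == "basestring":
            tok = tok._replace(string="str")
        tokens.append(tok)
    # untokenize() rebuilds the text from the recorded token positions,
    # so the original spacing between tokens is preserved.
    return tokenize.untokenize(tokens)


print(replace_basestring("isinstance(egg, basestring)\n"))
```

The sketch renames the bare name everywhere it appears as an identifier, which is broader than the isinstance() case mentioned above.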

You mean 'formerly', not 'formally' :-) I prefer to just call these by their C names (PyString) to be precise, as the C names aren't changing (at least not yet ;-).

Oh, formerly ... right. The current state of the names is very confusing. It's going to cost me some cups of coffee.

str    - PyUnicode
bytes  - PyString
buffer - PyBytes

No, that's spelled out in the PEP. Those should all stay. (If you see a method that's not listed in the PEP, ask me about it before deleting it. :-)

Doh, I should have read the PEP again before asking the question.

I have a question about one point. The PEP states "They accept anything that implements the PEP 3118 buffer API for bytes arguments, and return the same type as the object whose method is called ("self")". Which types implement the buffer API? PyString and PyBytes, but not PyUnicode?

For now, PyString methods accept PyUnicode objects as arguments and vice versa, but PyBytes doesn't accept unicode. Do I understand correctly that PyString must not accept PyUnicode?

>>> b"abc".count("b")
1
>>> "abc".count(b"b")
1
>>> buffer(b"abc").count("b")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: can't use str as char buffer
>>> buffer(b"abc").count(b"b")
1
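For comparison, here is a small sketch of the behavior the quoted PEP sentence describes, written against a released Python 3 interpreter rather than the branch discussed here. It assumes the mutable type called buffer in this thread is the one that later shipped as bytearray; the variable names are only illustrative.

```python
data = b"abc"

# Any object that exports the buffer API is accepted as an argument.
counts = [data.count(arg) for arg in (b"b", bytearray(b"b"), memoryview(b"b"))]

# The result has the same type as "self", whatever the argument types are.
replaced_bytes = b"abc".replace(bytearray(b"b"), b"x")
replaced_bytearray = bytearray(b"abc").replace(b"b", b"x")

# Text is rejected instead of being converted implicitly.
try:
    data.count("b")
    mixed_accepted = True
except TypeError:
    mixed_accepted = False

print(counts, type(replaced_bytes), type(replaced_bytearray), mixed_accepted)
```

In this sketch, mixing text and bytes raises TypeError in both directions, which is the strict separation asked about above.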

Several people have noted the same issue. My goal is to remove this behavior completely. I don't know how much it will take; these bootstrap issues are always hard to debug and sometimes hard to fix.

I tried to debug and fix it but I gave up after half an hour.

I am looking into this a bit right now; I suspect it's got to do with some types that still return a PyString from their repr(). I noticed that even removing .encode() from PyString breaks about 5 tests.

Great!

I have a patch that renames PyString -> bytes and PyBytes -> buffer while keeping str8 as an alias for bytes until str8 is removed. It's based on Alexandre's patch, which is itself partly based on my patch. It breaks a hell of a lot, but it could give you a head start.

>>> b''
b''
>>> type(b'')
<type 'bytes'>
>>> type(b'') is str8
True
>>> type(b'') is bytes
True
>>> type(buffer(b''))
<type 'buffer'>

I'll keep working on the patch.

Crys


