[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]
Guido van Rossum guido at python.org
Wed Feb 15 00:13:41 CET 2006
On 2/13/06, Barry Warsaw <barry at python.org> wrote:
> This makes me think I want an unsigned byte type, which b[0] would
> return. In another thread I think someone mentioned something about
> fixed width integral types, such that you could have an object that
> was guaranteed to be 8-bits wide, 16-bits wide, etc. Maybe you also
> want signed and unsigned versions of each. This may seem like YAGNI
> to many people, but as I've been working on a tightly
> embedded/extended application for the last few years, I've
> definitely had occasions where I wish I could more closely and more
> directly model my C values as Python objects (without using the
> standard workarounds or writing my own C extension types).
So I take it that the specific properties you want to model are the overflow behavior, right? N-bit unsigned is defined as arithmetic mod 2**N; N-bit signed is a bit trickier to define but similar. These never overflow but instead just throw away bits in an exactly specified manner (2's complement arithmetic).
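A minimal sketch of these two wrap-around rules in modern Python 3. The helper names `wrap_unsigned` and `wrap_signed` are hypothetical, not an existing API: unsigned N-bit values are reduced mod 2**N, and the signed variant reinterprets the top bit as the sign, as in 2's complement.

```python
def wrap_unsigned(value: int, bits: int) -> int:
    """Hypothetical helper: N-bit unsigned semantics, i.e. value mod 2**bits."""
    return value % (1 << bits)


def wrap_signed(value: int, bits: int) -> int:
    """Hypothetical helper: map value into [-2**(bits-1), 2**(bits-1)) via 2's complement."""
    u = wrap_unsigned(value, bits)
    # If the sign bit is set, the stored bit pattern denotes a negative number.
    if u >= 1 << (bits - 1):
        u -= 1 << bits
    return u


print(wrap_unsigned(255 + 1, 8))  # 0
print(wrap_unsigned(-1, 8))       # 255
print(wrap_signed(127 + 1, 8))    # -128
print(wrap_signed(-129, 8))       # 127
```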
While I personally am comfortable with writing (x+y) & 0xFFFF (for 16-bit unsigned), I can see that someone who spends a lot of time doing arithmetic in this field might want specialized types.
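A specialized type of that kind might look roughly like the following Python 3 sketch. The class `UInt16` is hypothetical; it just applies the same `& 0xFFFF` mask after every operation so the caller doesn't have to.

```python
class UInt16:
    """Hypothetical 16-bit unsigned integer with wrap-around arithmetic."""

    __slots__ = ("value",)
    MASK = 0xFFFF  # keep only the low 16 bits

    def __init__(self, value: int) -> None:
        # Same idiom as (x + y) & 0xFFFF, applied whenever a value is created.
        self.value = int(value) & self.MASK

    @staticmethod
    def _coerce(other) -> int:
        return other.value if isinstance(other, UInt16) else int(other)

    def __add__(self, other) -> "UInt16":
        return UInt16(self.value + self._coerce(other))

    __radd__ = __add__

    def __sub__(self, other) -> "UInt16":
        return UInt16(self.value - self._coerce(other))

    def __mul__(self, other) -> "UInt16":
        return UInt16(self.value * self._coerce(other))

    __rmul__ = __mul__

    def __int__(self) -> int:
        return self.value

    def __eq__(self, other):
        if isinstance(other, (UInt16, int)):
            return self.value == self._coerce(other)
        return NotImplemented

    def __repr__(self) -> str:
        return f"UInt16({self.value})"


print(UInt16(0xFFFF) + 1)  # UInt16(0)
print(UInt16(0) - 1)       # UInt16(65535)
```

The design choice here is to mask on construction, so every arithmetic result is automatically reduced without repeating the mask at each call site.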
But I'm not sure that that's what the Numeric folks want -- I believe they're more interested in saving space, not in the mod 2**N properties. So (here I'm to some extent guessing) they have different array types whose elements are ints or floats of various widths; I'm guessing they also have scalars of those widths for consistency or to guide the creation of new arrays from scalars. I wouldn't be surprised if, rather than requiring N-bit 2's complement, they would prefer more flexible control over overflow -- e.g. ignore, warn, error, turn into NaN, etc.
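To make the "more flexible control over overflow" idea concrete, here is a hypothetical Python 3 sketch. It is not Numeric's (or any library's) actual API. `add_with_policy` and its policy names are invented for illustration, mirroring the options listed above (ignore, warn, error, NaN) for a signed fixed-width add.

```python
import math
import warnings

POLICIES = ("ignore", "warn", "raise", "nan")  # hypothetical policy names


def add_with_policy(x: int, y: int, bits: int = 8, policy: str = "ignore"):
    """Add two signed `bits`-wide ints, handling overflow according to `policy`."""
    if policy not in POLICIES:
        raise ValueError(f"unknown overflow policy: {policy!r}")
    lo = -(1 << (bits - 1))     # e.g. -128 for 8 bits
    hi = (1 << (bits - 1)) - 1  # e.g.  127 for 8 bits
    result = x + y
    if lo <= result <= hi:
        return result           # no overflow: every policy agrees
    if policy == "raise":
        raise OverflowError(f"{x} + {y} does not fit in {bits} signed bits")
    if policy == "nan":
        return math.nan
    if policy == "warn":
        warnings.warn(f"{x} + {y} overflowed {bits} signed bits",
                      RuntimeWarning, stacklevel=2)
    # "ignore" and "warn" both fall back to 2's complement wrap-around.
    return (result - lo) % (1 << bits) + lo


print(add_with_policy(100, 28))                # -128 (silently wrapped)
print(add_with_policy(100, 28, policy="nan"))  # nan
```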
> But anyway, without hyper-generalizing, it's still worth asking
> whether a bytes type is just a container of byte objects, where the
> contained objects would be distinct, fixed 8-bit unsigned integral
> types.
There's certainly a point to treating bytes as ints; I don't know if it's more compelling than treating them as unit-length bytes objects. But if we decide that the bytes type contains ints, b[0] should return a plain int (whose value necessarily is in range(0, 256)), not some new unsigned-8-bit type. And creating a bytes object from a list of ints should accept any input values as long as their index value is in that same range.
I.e. bytes([1, 2L]) should be the same as bytes([1L, 2]); and bytes([-1]) should raise a ValueError.
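Python 3's built-in bytes type ended up with essentially these semantics (although it is immutable; bytearray is the mutable counterpart), and since Python 3 has a single int type, the bytes([1, 2L]) question no longer arises. The sketch below exercises that built-in behavior; `Seven` and `make_bytes` are hypothetical helpers used only for the demonstration.

```python
class Seven:
    """Hypothetical int-like object; bytes() converts it through __index__()."""

    def __index__(self) -> int:
        return 7


def make_bytes(values):
    """Build bytes from ints, reporting out-of-range values instead of raising."""
    try:
        return bytes(values)
    except ValueError as exc:
        return f"ValueError: {exc}"


b = bytes([1, 2, 255])
print(type(b[0]).__name__, b[0])  # int 1
print(bytes([Seven(), True]))     # b'\x07\x01'
print(make_bytes([-1]))           # rejected: outside range(0, 256)
print(make_bytes([256]))          # rejected as well
```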
> > There's also the consideration for APIs that, informally, accept
> > either a string or a sequence of objects. Many of these exist, and
> > they are probably all being converted to support unicode as well as
> > str (if it makes sense at all). Should a bytes object be considered as
> > a sequence of things, or as a single thing, from the POV of these
> > types of APIs? Should we try to standardize how code tests for the
> > difference? (Currently all sorts of shortcuts are being taken, from
> > isinstance(x, (list, tuple)) to isinstance(x, basestring).)
>
> I think bytes objects are very much like string objects today --
> they're the photons of Python since they can act like either
> sequences or scalars, depending on the context. For example, we have
> code that needs to deal with situations where an API can return
> either a scalar or a sequence of those scalars. So we have a utility
> function like this:
>
>     def thingiter(obj):
>         try:
>             it = iter(obj)
>         except TypeError:
>             yield obj
>         else:
>             for item in it:
>                 yield item
>
> Maybe there's a better way to do this, but the most obvious problem
> is that (for our use cases), this fails for strings because in this
> context we want strings to act like scalars. So we add a little test
> just before the "try:" like "if isinstance(obj, basestring): yield
> obj". But that's yucky.
>
> I don't know what the solution is -- if there /is/ a solution short
> of special case tests like above, but I think the key observation is
> that sometimes you want your string to act like a sequence and
> sometimes you want it to act like a scalar. I suspect bytes objects
> will be the same way.
I agree it's icky, and I'd rather not design APIs like that -- but I can't help it that others continue to want to use that idiom. I also agree that most likely we'll want to treat bytes the same as strings here. But no basestring (bytes are mutable and don't behave like sequences of characters).
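A Python 3 rendering of the thingiter helper quoted above, with bytes treated like strings as suggested here. Python 3 has no basestring, so the scalar types are listed explicitly; the `SCALAR_TYPES` tuple is an assumption about which types a given API wants to treat as single values. It is still the special-case test discussed above, just kept in one place.

```python
from typing import Any, Iterator

# Assumed set of iterable types that should nevertheless act as scalars.
SCALAR_TYPES = (str, bytes, bytearray)


def thingiter(obj: Any) -> Iterator[Any]:
    """Yield obj itself if it is a scalar, otherwise yield each of its items."""
    if isinstance(obj, SCALAR_TYPES):
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        yield obj  # not iterable: treat as a single value
    else:
        yield from it


print(list(thingiter(42)))      # [42]
print(list(thingiter([1, 2])))  # [1, 2]
print(list(thingiter("abc")))   # ['abc']
print(list(thingiter(b"abc")))  # [b'abc']
```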
-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)