[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

Tue Feb 14 05:59:03 CET 2006

On Feb 13, 2006, at 7:29 PM, Guido van Rossum wrote:

> There's one property that bytes, str and unicode all share: type(x[0])
> == type(x), at least as long as len(x) >= 1. This is perhaps the
> ultimate test for string-ness.

But not perfect, since of course other containers can contain objects  
of their own type too.  But it leads to an interesting issue...

> Or should b[0] be an int, if b is a bytes object? That would change
> things dramatically.

This makes me think I want an unsigned byte type, which b[0] would  
return.  In another thread I think someone mentioned something about  
fixed width integral types, such that you could have an object that  
was guaranteed to be 8-bits wide, 16-bits wide, etc.   Maybe you also  
want signed and unsigned versions of each.  This may seem like YAGNI  
to many people, but as I've been working on a tightly embedded/ 
extended application for the last few years, I've definitely had  
occasions where I wish I could more closely and more directly model  
my C values as Python objects (without using the standard workarounds  
or writing my own C extension types).

But anyway, without hyper-generalizing, it's still worth asking  
whether a bytes type is just a container of byte objects, where the  
contained objects would be distinct, fixed 8-bit unsigned integral  
types.

> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects. Many of these exist, and
> they are probably all being converted to support unicode as well as
> str (if it makes sense at all). Should a bytes object be considered as
> a sequence of things, or as a single thing, from the POV of these
> types of APIs? Should we try to standardize how code tests for the
> difference? (Currently all sorts of shortcuts are being taken, from
> isinstance(x, (list, tuple)) to isinstance(x, basestring).)

I think bytes objects are very much like string objects today --  
they're the photons of Python since they can act like either  
sequences or scalars, depending on the context.  For example, we have  
code that needs to deal with situations where an API can return  
either a scalar or a sequence of those scalars.  So we have a utility  
function like this:

def thingiter(obj):
     try:
         it = iter(obj)
     except TypeError:
         yield obj
     else:
         for item in it:
             yield item

Maybe there's a better way to do this, but the most obvious problem  
is that (for our use cases), this fails for strings because in this  
context we want strings to act like scalars.  So we add a little test  
just before the "try:" like "if isinstance(obj, basestring): yield  
obj".  But that's yucky.

I don't know what the solution is -- if there /is/ a solution short  
of special case tests like above, but I think the key observation is  
that sometimes you want your string to act like a sequence and  
sometimes you want it to act like a scalar.  I suspect bytes objects  
will be the same way.

-Barry