[Python-Dev] bytes type discussion

Wed Feb 15 00:35:14 CET 2006

On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:

> I'm about to send 6 or 8 replies to various salient messages in the
> PEP 332 revival thread. That's probably a sign that there's still a
> lot to be sorted out. In the mean time, to save you reading through
> all those responses, here's a summary of where I believe I stand.
> Let's continue the discussion in this new thread unless there are
> specific hairs to be split in the other thread that aren't addressed
> below or by later posts.
>
> Non-controversial (or almost):
>
> - we need a new PEP; PEP 332 won't cut it
>
> - no b"..." literal
>
> - bytes objects are mutable
>
> - bytes objects are composed of ints in range(256)
>
> - you can pass any iterable of ints to the bytes constructor, as long
> as they are in range(256)

Sounds like array.array('B').

Will the bytes object support the buffer interface?  Will it accept  
objects supporting the buffer interface in the constructor (or a  
class method)?  If so, will it be a copy or a view?  Current  
array.array behavior says copy.
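
For reference, a minimal sketch of the array.array('B') behavior referred to here. The names source and duplicate are illustrative only. It shows that values outside range(256) are rejected, and that constructing from an existing array copies the data rather than sharing it.

```python
from array import array

# Unsigned-byte array built from an iterable of ints in range(256).
source = array('B', [10, 20, 30])

# Values outside range(256) are rejected.
try:
    array('B', [256])
    rejected = False
except OverflowError:
    rejected = True

# Constructing from an existing array copies the data.
duplicate = array('B', source)
duplicate[0] = 99  # does not affect source
```

After this runs, source still holds its original values, which is the "copy" semantics in question.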

> - longs or anything with an __index__ method should do, too
>
> - when you index a bytes object, you get a plain int

When slicing a bytes object, do you get another bytes object or a
list?  If it's a bytes object, is it a copy or a view?  Current
array.array behavior says copy.
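
A small sketch of what array.array does today for indexing and slicing, as a point of comparison only. The variable names are made up, and nothing here says how a future bytes type would behave.

```python
from array import array

data = array('B', [10, 20, 30, 40])

# Indexing yields a plain int.
first = data[0]

# Slicing yields a new array of the same type, not a list.
part = data[1:3]

# The slice is a copy: mutating it leaves the original untouched.
part[0] = 0
```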

> - repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
>
> Somewhat controversial:
>
> - it's probably too big to attempt to rush this into 2.5
>
> - bytes("abc") == bytes(map(ord, "abc"))
>
> - bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128,  
> 256])

It would be VERY controversial if ord('\xff') == 256 ;)
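
A quick check of the arithmetic, as a sketch with an illustrative variable name: the largest value that fits in range(256) is 255.

```python
# Map each character of the two-character string to its ordinal.
values = [ord(c) for c in "\x80\xff"]

# Every value must fall inside range(256), so 256 itself is excluded.
all_in_range = all(v in range(256) for v in values)
```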

> Very controversial:
>
> - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"  
> argument
>
> - bytes(u"abc") == bytes("abc") # for ASCII at least
>
> - bytes(u"\x80\xff") raises UnicodeError
>
> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
>
> Martin von Loewis's alternative for the "very controversial" set is to
> disallow an encoding argument and (I believe) also to disallow Unicode
> arguments. In 3.0 this would leave us with s.encode(<encoding>) as the
> only way to convert a string (which is always unicode) to bytes. The
> problem with this is that there's no code that works in both 2.x and
> 3.0.

Given a base64 or hex string, how do you get a bytes object out of  
it?  Currently str.decode('base64') and str.decode('hex') are good  
solutions to this... but you get a str object back.
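
One possible route, sketched with the existing binascii module instead of the str.decode codecs. The sample inputs and variable names are illustrative, and the array('B') at the end is only a stand-in for a bytes constructor that does not exist yet.

```python
import binascii
from array import array

# Decode textual hex and base64 into raw binary data.
raw_from_hex = binascii.unhexlify("616263")
raw_from_b64 = binascii.a2b_base64("YWJj")

# Stand-in for the proposed bytes type: an unsigned-byte array
# built from the decoded data.
as_ints = array('B', raw_from_hex).tolist()
```

This still goes through an intermediate object returned by the decoder, so the question of a direct path to a bytes object remains open.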

-bob