[Python-Dev] Byte string class hierarchy

"Martin v. Löwis" martin at v.loewis.de
Thu Aug 19 00:38:31 CEST 2004


Jack Jansen wrote:
> genericbytes
>     mutablebytes
>     bytes
>         genericstring
>             string
>             unicode

I think this hierarchy is wrong. unicode is not a specialization of
genericbytes: a unicode string is made out of characters, not out
of bytes.
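
A minimal sketch of the character/byte distinction, assuming a modern
Python 3 interpreter, where str is the Unicode type and bytes is the byte
type (both names postdate this thread). The same five characters occupy a
different number of bytes depending on the encoding chosen:

```python
# Assumes Python 3: str holds Unicode characters, bytes holds raw bytes.
text = "L\u00f6wis"                  # five characters; \u00f6 is o-umlaut

utf8 = text.encode("utf-8")          # o-umlaut takes two bytes in UTF-8
utf16 = text.encode("utf-16-le")     # two bytes per character for this text

char_count = len(text)               # counts characters
utf8_len = len(utf8)                 # counts bytes
utf16_len = len(utf16)               # counts bytes

print(char_count, utf8_len, utf16_len)   # 5 6 10
```

The character count is a property of the string; the byte count exists
only once an encoding has been picked.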

> The basic type for all bytes, buffers and strings is genericbytes. This 
> abstract base type is neither mutable nor immutable, and has the 
> interface that all of the types would share. Mutablebytes adds slice 
> assignment and such. Bytes, on the other hand, adds hashing. 

There is a debate on whether such a type is really useful. Why do you
need hashing on bytes?

> genericstring is the magic stuff that's there already that makes unicode 
> and string interoperable for hashing and dict keys and such.

Interoperability, in Python, does not necessarily involve a common base
type.
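
As an illustration of that point with different types than the ones under
discussion (an assumption made for the example's sake, run on Python 3):
int and float share no base class other than object, yet they interoperate
as dict keys because equal values compare equal and hash equally.

```python
# int and float have no common base type apart from object.
shared_bases = set(int.__mro__) & set(float.__mro__)

# Equal values hash equally, so either type can look up the other's key.
hashes_match = hash(1) == hash(1.0)

table = {1: "one"}
found = table[1.0]        # a float key finds the entry stored under an int

print(shared_bases, hashes_match, found)
```

The interoperability comes from the __eq__ and __hash__ protocols, not
from inheritance.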

> Casting to a basetype is always free and doesn't copy anything

And, of course, there is no casting at all in Python.

> Operations like concatenation return the most specialised class. 

Assuming the hierarchy at the top of your message, what does that mean?
Suppose I want to concatenate unicode and string: which of them is
more specialized?
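
For comparison, a small sketch of how Python 3 (which postdates this
thread) treats the mixed case: it does not pick a "more specialized"
result type at all, and instead refuses to concatenate text and bytes
until one side is explicitly encoded or decoded.

```python
# Python 3 behaviour: mixing str and bytes in "+" raises TypeError.
try:
    result = "abc" + b"def"
    error_type = None
except TypeError as exc:
    result = None
    error_type = type(exc)

# An explicit decode (or encode) makes the intended result type unambiguous.
as_text = "abc" + b"def".decode("ascii")
as_bytes = "abc".encode("ascii") + b"def"

print(error_type, as_text, as_bytes)
```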

> Read() is guaranteed only to return genericbytes, but if you open a file 
> in textmode they'll returns strings, and we should add the ability to 
> open files for unicode and probably mutablebytes too. 

I think Guido's proposal is that read(), in text mode, returns Unicode
strings, and (probably) that there is no string type in Python anymore.
read() on binary files would return a mutable byte array.
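
A sketch of how this looks in Python 3 as eventually released, which
differs in one detail from the proposal as described here: text-mode
read() returns a Unicode str, binary-mode read() returns an immutable
bytes object, and a mutable bytearray can be filled with readinto(). The
file name and contents below are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")   # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("L\u00f6wis")                # five characters, six UTF-8 bytes

    with open(path, "r", encoding="utf-8") as f:
        text = f.read()                      # text mode: str (Unicode)

    with open(path, "rb") as f:
        raw = f.read()                       # binary mode: immutable bytes

    buf = bytearray(os.path.getsize(path))   # mutable buffer, sized to the file
    with open(path, "rb") as f:
        n = f.readinto(buf)                  # fills buf in place, returns count

print(type(text).__name__, type(raw).__name__, n)
```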

Regards,
Martin

