[Python-Dev] methods on the bytes object

Mon May 1 22:54:01 CEST 2006

This discussion seems to have gotten a bit out of hand. I believe it
belongs on the python-3000 list.

As a quick commentary, I see good points made by both sides. My
personal view is that we should *definitely* not introduce a third
type, and that *most* text-based activities should be done in the
(Unicode) string domain.

That said, I expect a certain amount of parsing to happen on bytes
objects -- for example, I would say that CPython's current parser is
parsing bytes since its input is UTF-8. There are also plenty of
text-based socket protocols that are explicitly defined in terms of
octets (mostly containing ASCII bytes only); I can see why some people
would want to write handlers that parse the bytes directly.
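
As a rough sketch of what such a handler could look like, here is a
parser for an HTTP-style status line that never leaves the bytes
domain. It assumes the bytes methods (split, rstrip) and int() on
bytes as found in released Python 3, which may differ from the minimal
type discussed here; parse_status_line is a made-up name used only for
illustration.

```python
def parse_status_line(line):
    """Split an HTTP-style status line given as bytes (hypothetical helper)."""
    # Strip the trailing CRLF, then split on the first two spaces only,
    # so a reason phrase containing spaces stays in one piece.
    version, code, reason = line.rstrip(b"\r\n").split(b" ", 2)
    # int() accepts ASCII digits supplied as bytes, so no decode is needed.
    return version, int(code), reason


result = parse_status_line(b"HTTP/1.1 404 Not Found\r\n")
```

Only the status code is converted; the version and reason phrase stay
as bytes, and decoding is left to the caller if it is ever needed.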

But instead of analyzing or arguing the situation to death, I'd like
to wait until we have a Py3k implementation that implements something
approximating the proposed end goal, where 'str' represents Unicode
characters, and 'bytes' represents bytes, and we have separate I/O
APIs for binary (bytes) and character (str) data. I'm hoping to make
some progress towards this goal in the p3yk (sic) branch. It appears
that before we can switch the meaning of 'str' we will first have to
implement the new I/O library, which is what I'm focusing on right
now. I already have a fairly minimal but functional bytes type, which
I'll modify as I go along and understand more of the requirements.
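
To illustrate the intended split between binary and character I/O,
here is a small sketch. It assumes the io module as it exists in
released Python 3 (io.BytesIO and io.TextIOWrapper), not the
work-in-progress library mentioned above, so take it as an
approximation of the idea rather than the actual design.

```python
import io

# Binary layer: reads return bytes objects.
raw = io.BytesIO("caf\u00e9\n".encode("utf-8"))
data = raw.read()

# Character layer: wrap the same binary stream with an explicit encoding;
# reads now return str objects.
raw.seek(0)
text = io.TextIOWrapper(raw, encoding="utf-8")
line = text.readline()
```

The same underlying data comes back as six bytes from the binary layer
and as five characters from the text layer, since the accented letter
takes two bytes in UTF-8.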

--
--Guido van Rossum (home page: http://www.python.org/~guido/)