[Python-Dev] methods on the bytes object

"Martin v. Löwis" martin at v.loewis.de
Sun Apr 30 20:52:02 CEST 2006


Josiah Carlson wrote:
>> I think what you are missing is that algorithms that currently operate
>> on byte strings should be reformulated to operate on character strings,
>> not reformulated to operate on bytes objects.
> 
> By "character strings" can I assume you mean unicode strings which
> contain data, and not some new "character string" type?

I mean unicode strings, period. I can't imagine what "unicode strings
which do not contain data" could be.

> I know I must
> have missed some conversation. I was under the impression that in Py3k:
> 
> Python 1.x and 2.x str -> mutable bytes object

No. Python 1.x and 2.x str -> str, and Python 2.x unicode -> str.
In addition, a bytes type is added, so that, where a str holds binary data,
Python 1.x and 2.x str -> bytes

The problem is that the current string type is used both to represent
bytes and characters. Current applications of str need to be studied,
and converted appropriately, depending on whether they use
"str-as-bytes" or "str-as-characters". The "default", in some
sense of that word, is that str applications are assumed to operate
on character strings; this is achieved by making string literals
objects of the character string type.
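
To make the distinction concrete, here is a minimal sketch. It assumes the semantics Python 3 eventually shipped with, which were not settled when this was written; the sample values are invented for illustration.

```python
# Assumption: Python 3 semantics as eventually released.
chars = "abc"       # plain literal: a character string (str)
octets = b"abc"     # b-prefixed literal: a bytes object

# The two types never compare equal, even with "the same" content.
same = (chars == octets)

# Indexing a str yields a one-character str; indexing bytes yields an int.
first_char = chars[0]
first_octet = octets[0]

# One character may occupy several bytes once encoded.
e_acute = "\u00e9"
n_chars = len(e_acute)
n_bytes = len(e_acute.encode("utf-8"))
```

Code that indexes, measures or compares its data this way is what has to be sorted into "str-as-bytes" or "str-as-characters".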

> I was also under the impression that str.encode(...) -> bytes,
> bytes.decode(...) -> str

Correct.
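
A minimal sketch of that round trip, assuming Python 3 semantics as eventually released. The sample text and the choice of UTF-8 are illustrative assumptions, not something fixed in this discussion.

```python
# Assumption: Python 3 semantics; the text and the codec are examples only.
text = "L\u00f6wis"                 # character string (str)

data = text.encode("utf-8")         # str.encode(...) -> bytes
round_trip = data.decode("utf-8")   # bytes.decode(...) -> str

print(type(data).__name__, type(round_trip).__name__)
```

The encoding has to be named (or defaulted) at each crossing; nothing converts between the two types implicitly.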

> and that there would be some magical argument
> to pass to file or open: open(fn, 'rb', magical_parameter).read() ->
> bytes.

I think the precise details of that are still unclear. But yes,
the plan is to have two file modes: one that returns character
strings (type 'str') and one that returns type 'bytes'.
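
As a sketch of the two modes, here is how it looks under the API Python 3 eventually settled on, where the mode string selects the return type and the text mode takes an encoding argument. This is an assumption relative to the plan above, whose details were still open; the file name and contents are made up.

```python
import os
import tempfile

# Assumption: Python 3 open() semantics; "sample.txt" is a hypothetical file.
with tempfile.TemporaryDirectory() as tmp:
    fn = os.path.join(tmp, "sample.txt")

    # Write known bytes so both reads below see the same content.
    with open(fn, "wb") as f:
        f.write("na\u00efve\n".encode("utf-8"))

    # Binary mode: read() returns bytes, no decoding is applied.
    with open(fn, "rb") as f:
        raw = f.read()

    # Text mode: read() returns str, decoded with the given encoding.
    with open(fn, "r", encoding="utf-8") as f:
        text = f.read()

print(type(raw).__name__, type(text).__name__)
```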

> I mention this because I do binary data handling, some ''.join(...) for
> IO buffers as Guido mentioned (because it is the fastest string
> concatenation available in Python 2.x), and from this particular
> conversation, it seems as though Python 3.x is going to lose
> some expressiveness and power.

You certainly need a "concatenate a list of bytes objects into a single
bytes object" operation. Apparently, Guido assumes that this can be done
through bytes().join(...); I personally feel that this is an
over-generalization: if the only practical application of .join is the
empty bytes object as separator, I think the method should be omitted.

Perhaps

  bytes(...)

or

  bytes.join(...)

could work?
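
For reference, a sketch of the concatenation being discussed, written with the spellings that exist in Python 3 as eventually released: the empty bytes object as separator, and the same method called through the type. Treat this as an assumption relative to the proposals above, not as their outcome; the chunk values are invented.

```python
# Assumption: Python 3 semantics; the chunks stand in for I/O buffers.
chunks = [b"GET / ", b"HTTP/1.0", b"\r\n\r\n"]

# Empty bytes object as the separator.
joined = b"".join(chunks)

# The same method called through the type, separator passed explicitly.
via_type = bytes.join(b"", chunks)

# Alternative: accumulate into a mutable buffer, then freeze it.
buf = bytearray()
for chunk in chunks:
    buf += chunk
frozen = bytes(buf)

print(joined)
```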

Regards,
Martin

