[issue18468] re.group() should never return a bytearray

Guido van Rossum report at bugs.python.org
Thu Jul 18 17:02:17 CEST 2013


Guido van Rossum added the comment:

> Ezio Melotti added the comment:
[...]
> IIUC the advantage of changing the behavior is that it won't keep the target string alive anymore, but on the other hand is not backward compatible and makes things more difficult for people who want the same type back.

Everyone seems to be afraid of backward compatibility here. I will
take full responsibility, so let's just discuss what's the better API,
regardless of what we did (and in 99% of the cases it's the same
anyway).

"People who want the same type back" -- there is no evidence that
anyone wants this. "People who want a bytes object" -- this is
definitely a valid use case.

> If people always want bytes back regardless of the input, they can convert the input or output to bytes explicitly.

But this requires an extra copy if the input is a bytearray. I suspect
this might be the most commonly used non-bytes non-str target in
Python 3 programs, and we are striving to support bytearray as input
in as many places as possible where plain bytes is accepted. But
generally getting bytearray as output requires a different API, e.g.
recv_into().
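
As a rough illustration of the point above (a sketch, not code from the
tracker): io.BytesIO.readinto() stands in here for socket.recv_into(), and
the payload and pattern are made up. The regex runs directly on the
bytearray, so the input is never copied to bytes first; the result is wrapped
in bytes() only so the snippet behaves the same whichever type group()
returns.

```python
import io
import re

# Hypothetical payload; readinto() plays the role of recv_into() here.
payload = b"HTTP/1.1 200 OK\r\n"
stream = io.BytesIO(payload)

buf = bytearray(64)          # preallocated, reusable buffer
n = stream.readinto(buf)     # fills buf in place, returns the byte count

# Match directly against the bytearray: no bytes(buf) copy of the input.
m = re.match(rb"HTTP/(\d\.\d) (\d+)", buf)

# Normalise explicitly so the result is bytes regardless of what
# group() hands back for a bytearray target.
status = bytes(m.group(2))
```

Only the small matched group is copied here, not the whole buffer.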

I think a very reasonable general rule is that for functions that take
either str or bytes and adjust their output to the input type, if
their input is one of the bytes alternatives (bytearray, memoryview,
array.array('b'), maybe others) the output is always a bytes object.

The reason is that while the buffer API makes it easy to access the
underlying bytes from C, it doesn't give you a way to create a new
object of the same type (except by slicing, which doesn't always
apply, e.g. os.listdir()). So for creating return values that match a
memoryview (or bytearray, etc.) input, the only reasonable thing is to
return a bytes object.
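
A minimal sketch of that rule in pure Python, assuming a made-up helper
called first_token() (it is not part of any library): str input gives str
output, and every bytes-like input is read through a memoryview and answered
with a plain bytes object.

```python
from array import array


def first_token(data):
    """Hypothetical helper: return everything before the first space."""
    if isinstance(data, str):
        return data.split(" ", 1)[0]

    # Any other input is assumed to support the buffer protocol
    # (bytes, bytearray, memoryview, array.array('b'), ...).
    view = memoryview(data).cast("B")
    end = len(view)
    for i, byte in enumerate(view):
        if byte == 0x20:         # ASCII space
            end = i
            break

    # We cannot construct "an object of the caller's type" in general,
    # so the result is always built as bytes.
    return bytes(view[:end])


results = [
    first_token("foo bar"),
    first_token(b"foo bar"),
    first_token(bytearray(b"foo bar")),
    first_token(memoryview(b"foo bar")),
    first_token(array("b", b"foo bar")),
]
```

Every bytes-like call above yields the same bytes value, whatever the input
type was.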

(FWIW os.listdir() violates this too -- os.listdir(b'.') returns a
list of bytes objects, while os.listdir(bytearray(b'.')) returns a
list of str objects. This seems to be caused by reversed logic -- it probably
tests "if the type is bytes" rather than "if the type isn't str" for
the output type, even though it does the right thing with the
input...)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18468>
_______________________________________

