[Python-ideas] duck typing for io write methods

Fri Jun 14 11:00:16 CEST 2013

Steven D'Aprano <steve at ...> writes:

> 
> On 14/06/13 00:41, Wolfgang Maier wrote:
> 
> >>> It's funny you mention that difference since that was how I came across my
> >>> issue. I was looking for a way to get back the Python 2.7 behaviour
> >>> bytes('1234')
> >>> '1234'
> >>
> >> You mean other than using the bytes literal b'1234' instead of a
> >> string literal? Bytes and text are different things in Python 3,
> >> whereas the 2.x "bytes" was just an alias for "str".
> >>
> >
> > Well, I was illustrating the case with a literal integer, but, of course, I
> > was thinking of cases with references:
> > a=1234
> > str(a).encode() # gives b'1234' in Python3, but converting your int to str
> > first, just to encode it again to bytes seems weird
> 
> On the contrary, it is the most natural way to do it. Converting objects
> directly to bytes is not conceptually obvious. I can think of at least
> TWELVE obvious ways which the int 4 might convert to bytes (displaying in
> all hex, rather than the more compact but less consistent forms):
> 
> # Treat it as an 8-bit, 16-bit, 32-bit or 64-bit integer:
> b'\x04'

this is what's currently happening with:
bytes([4])

> b'\x00\x04'
> b'\x04\x00'
> b'\x00\x00\x00\x04'
> b'\x04\x00\x00\x00'
> b'\x00\x00\x00\x00\x00\x00\x00\x04'
> b'\x04\x00\x00\x00\x00\x00\x00\x00'
>

these would be ways to make bytes([seq of ints]) work with numbers > 255,
which is currently not possible. Maybe bytes([seq of ints]) could take an
additional encoding argument that specifies how many bytes to reserve per int.
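
For a single int, something close to this already exists as int.to_bytes()
(available since Python 3.2), where the caller states the width and byte
order explicitly. Below is a minimal sketch of how the fixed-width forms
listed above can be produced with it. The helper pack_ints is a hypothetical
name used only for illustration, not an existing API:

```python
# Fixed-width encodings of the int 4 via int.to_bytes (Python 3.2+).
n = 4
one_byte = n.to_bytes(1, "big")      # b'\x04'
two_be = n.to_bytes(2, "big")        # b'\x00\x04'
two_le = n.to_bytes(2, "little")     # b'\x04\x00'
four_le = n.to_bytes(4, "little")    # b'\x04\x00\x00\x00'


def pack_ints(seq, width, byteorder="big"):
    """Hypothetical helper: encode every int in seq using `width` bytes."""
    return b"".join(i.to_bytes(width, byteorder) for i in seq)


# 300 does not fit in one byte, but it does fit in two (0x012c).
packed = pack_ints([4, 300], 2)      # b'\x00\x04\x01\x2c'
```

A width argument like this is roughly what an extra parameter to
bytes([seq of ints]) would have to express.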

> # Convert it to the string '4' first, then encode to bytes
> # as UTF-8, UTF-16, or UTF-32:
> b'\x34'
> b'\x00\x34'
> b'\x34\x00'
> b'\x34\x00\x00\x00'
> b'\x00\x00\x00\x34'
>

this is what str(int).encode() does, but it is quite roundabout, since it
first builds a full-blown Python string object and then encodes that to
bytes again. What should be done, I think, is that an int_to_byte()
function or method converts each decimal digit of an int to its ASCII code
and turns the result into bytes. Of course, this would be done in C, so the
only high-level object ever generated would be the final bytes object.
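
To show what I mean by converting digit by digit, here is a rough
pure-Python sketch. The name int_to_ascii_bytes is made up for this example,
and a real version would live in C; this one only demonstrates that no str
object is needed along the way:

```python
def int_to_ascii_bytes(n):
    """Hypothetical stand-in for the proposed C-level conversion."""
    if n == 0:
        return b"0"
    negative = n < 0
    n = abs(n)
    digits = bytearray()
    while n:
        n, d = divmod(n, 10)
        digits.append(48 + d)    # 48 is the ASCII code of '0'
    if negative:
        digits.append(45)        # 45 is the ASCII code of '-'
    digits.reverse()             # digits were collected least significant first
    return bytes(digits)


a = 1234
result = int_to_ascii_bytes(a)   # same bytes as str(a).encode('ascii')
```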

> The actual behaviour, where bytes(4) => b'\x00\x00\x00\x00', I consider to
> be neither obvious nor especially useful. If bytes were mutable, then
> bytes(4) would be a useful way to initialise a block of four bytes for
> later modification. But they aren't, so I don't really see the point. The
> obvious way to get four NUL bytes is surely b'\0'*4, so it's also
> redundant.
> 
> That you can't even subclass int and override it, like you can override
> every other dunder method (__str__, __repr__, __add__, __mul__, etc.)
> strikes me as astonishingly weird and in violation of the Zen:
> 
> Special cases aren't special enough to break the rules.
> 
> I imagine that the code for the bytes builtin looks something like this in
> pseudo-code:
> 
> if isinstance(arg, int):
>      special case int
> elif isinstance(arg, str):
>      special case str
> else:
>      call __bytes__ method
> 
> I don't think it would affect performance very much, if at all, if it were
> changed to:
> 
> if type(arg) is int:
>      special case int
> elif type(arg) is str:
>      special case str
> else:
>      call __bytes__ method
> 
> ints and strs will have to grow a dunder method in order to support
> inheritance, but the implementation could be as simple as:
> 
> # on int:
> def __bytes__(self):
>      return bytes(int(self))
> 
> # on str:
> def __bytes__(self, encoding):
>      return bytes(str(self), encoding)
> 
> Of course, I may have missed some logic for the current behaviour.
> 
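
To make the proposal concrete, here is a rough pure-Python sketch of the
type()-based dispatch described above. It is not CPython's actual code; the
names make_bytes and Label are invented for the example, and defaulting the
encoding to UTF-8 is my own assumption:

```python
def make_bytes(arg, encoding=None):
    """Hypothetical sketch of the proposed dispatch, not the real builtin."""
    if type(arg) is int:
        # Exact int: keep the current zero-filled behaviour.
        return bytes(arg)
    if type(arg) is str:
        # Exact str: encode (assumption: UTF-8 when no encoding is given).
        return bytes(arg, encoding or "utf-8")
    to_bytes = getattr(type(arg), "__bytes__", None)
    if to_bytes is not None:
        # Subclasses and other objects get to decide for themselves.
        return to_bytes(arg)
    return bytes(arg)


class Label(int):
    """Example int subclass that overrides the conversion."""

    def __bytes__(self):
        return str(int(self)).encode("ascii")


plain = make_bytes(4)            # b'\x00\x00\x00\x00'
custom = make_bytes(Label(4))    # b'4'
```

Only exact ints and strs take the fast path here, so a subclass such as
Label is free to define its own conversion.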

I find the current implementation very disturbing, too, and would very much
favour a solution like yours. Nick argued that it would slow down native str
and bytes unduly, and I'm in no position to argue against this. He's
probably thought it through more deeply than we could, but, yes, the current
way is against the Zen.
Wolfgang