[Python-Dev] What type of object mmap.read_byte should return on py3k?

Sat Feb 28 13:01:09 CET 2009

Hirokazu Yamamoto wrote:
> Hello. I noticed that mmap.read_byte returns a 1-length unicode string
> on py3k. This seemed strange to me, so I created an issue on the bug
> tracker (http://bugs.python.org/issue5391), and Martin suggested that it
> be discussed on python-dev. I'll quote the bug tracker messages here.
> 
> I wrote:
>> On Python3000, mmap.read_byte returns str not bytes, and mmap.write_byte
>> accepts str. Is this intended behavior?
>>
>>>>> import mmap
>>>>> m = mmap.mmap(-1, 10)
>>>>> type(m.read_byte())
>> <class 'str'>
>>>>> m.write_byte("a")
>>>>> m.write_byte(b"a")
>>
>> Another possibility: read_byte() returns an int that represents the
>> byte, and write_byte() accepts an int that represents the byte (like
>> b"abc"[0], which returns an int, not 1-length bytes).
> 
> Martin wrote:
>> Indeed, I think it should use the "b" code, instead of the "c" code.
>> Please discuss this on python-dev, though.
>>
>> It might not be ok to backport this to 3.0, since it may break existing
>> code.
> 
>> Furthermore, all other uses of the "c" code might need to be
>> reconsidered.

It certainly seems like mmap should be playing in an all-bytes world
(where only already encoded strings are allowed). On the specific
question of whether it would be better for read_byte()/write_byte() to
use 1-length bytes objects or integers, I have no strong opinion (the
former is closer to the 2.x API, the latter is more consistent with the
operation of the 3.x bytes class).
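
To make the two candidate return types concrete, here is a minimal sketch
that uses index and slice access on an anonymous map in place of
read_byte() itself. The helper names are hypothetical and exist only for
illustration; they are not part of the mmap module, and this assumes a
current Python 3 interpreter.

```python
import mmap


def read_byte_as_int(m, pos):
    # Integer option: mirrors b"abc"[0], which yields an int.
    return m[pos]


def read_byte_as_bytes(m, pos):
    # 1-length bytes option: mirrors b"abc"[0:1], which yields bytes.
    return m[pos:pos + 1]


# Anonymous 10-byte map, as in the session quoted above.
m = mmap.mmap(-1, 10)
m[0:3] = b"abc"

as_int = read_byte_as_int(m, 0)      # int in range 0..255
as_bytes = read_byte_as_bytes(m, 0)  # bytes object of length 1

m.close()
```

Either form round-trips cleanly: bytes([as_int]) gives the 1-length bytes
object, and as_bytes[0] gives the integer.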

However, as Martin says, it wouldn't be reasonable to backport such a
fix to 3.0 - the associated API changes would almost certainly break
otherwise working code.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------