Bytes indexing returns an int

Robin Becker robin at reportlab.com
Wed Jan 8 06:05:49 EST 2014


On 07/01/2014 19:48, Serhiy Storchaka wrote:
........
> data[0] == b'\xE1'[0] works as expected in both Python 2.7 and 3.x.
>
>
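A minimal sketch of Serhiy's point, assuming Python 3 for the test run. Indexing both sides makes the comparison int vs int in 3.x and one-character str vs one-character str in 2.7, so the same expression works in both. Comparing an indexed byte against a bytes literal is where the two versions diverge.

```python
data = b'\xE1abc'

# Python 3: data[0] and b'\xE1'[0] are both the int 225.
# Python 2.7: both sides are the one-character str '\xe1'.
same_in_both = data[0] == b'\xE1'[0]
print(same_in_both)

# True on 2.7, but False on 3.x, because an int never equals a bytes object.
diverges = data[0] == b'\xE1'
print(diverges)
```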
I have been porting a lot of Python-2-only code to run on both Python 2.7 and 3.3 
for a few months now. Bytes indexing was a particular problem. PDF uses quite a lot 
of single-byte indicators, so code like

if text[k] == 'R':
    .....

or

dispatch_dict.get(text[k],error)()

is much harder to make compatible because of this issue. I think this change was 
a mistake.
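One commonly used workaround, which is not the approach taken in this post, is to slice instead of index. In 3.x text[k:k+1] is a one-byte bytes object, and in 2.7 it is a one-character str. Either way it matches a b'R' literal, so both the comparison and the dictionary dispatch behave the same. The handler names below are hypothetical and only illustrate the pattern.

```python
def handle_r():
    # hypothetical handler for an 'R' indicator
    return 'reference'

def handle_error():
    # hypothetical fallback for unknown indicators
    return 'error'

# b'R' is a str in 2.7 and bytes in 3.x, matching what a one-byte slice returns.
dispatch_dict = {b'R': handle_r}

text = b'12 0 R'
k = 5
c = text[k:k + 1]          # slice, not index, so this is never an int
print(c == b'R')           # True in both 2.7 and 3.x
print(dispatch_dict.get(c, handle_error)())
```

One difference from plain indexing is that slicing past the end returns b'' instead of raising IndexError, so the error path has to handle an empty key.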

To get round this, I have tried the following class to resurrect the old-style 
behaviour:

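import sys
isPy3 = sys.version_info[0] >= 3

if isPy3:
    class RLBytes(bytes):
        '''simply ensures that B[x] returns a bytes type object and not an int'''
        def __getitem__(self, x):
            if isinstance(x, int):
                # integer index: bytes.__getitem__ returns an int, so rebuild a 1-byte object
                return RLBytes([bytes.__getitem__(self, x)])
            else:
                # slice: rewrap the result so it keeps the old-style indexing too
                return RLBytes(bytes.__getitem__(self, x))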

I'm not sure that covers all possible cases, but it works for my dispatching 
cases. Unfortunately you can't just assign to __class__ on an existing bytes 
object to change its behaviour, so you have to copy the text.
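A usage sketch for the class above, assuming Python 3 (the isPy3 guard is dropped). It shows that an existing bytes object cannot be retyped in place, so the data is copied once with RLBytes(...). After that, both the comparison and the dictionary dispatch from the start of the post work with bytes keys. The handler names are hypothetical. Iteration is one case the class does not cover: __iter__ is inherited from bytes, so looping still yields ints.

```python
class RLBytes(bytes):
    '''ensures that B[x] returns a bytes object and not an int (Python 3)'''
    def __getitem__(self, x):
        if isinstance(x, int):
            # integer index: wrap the int in a list to get a 1-byte object
            return RLBytes([bytes.__getitem__(self, x)])
        # slice: rewrap so the result keeps the same indexing behaviour
        return RLBytes(bytes.__getitem__(self, x))

def handle_r():
    # hypothetical handler, for illustration only
    return 'reference'

def handle_error():
    # hypothetical fallback for unknown indicators
    return 'error'

dispatch_dict = {b'R': handle_r}

raw = b'12 0 R'
try:
    raw.__class__ = RLBytes   # CPython refuses this for bytes instances
    retyped = True
except TypeError:
    retyped = False           # ... so the text has to be copied instead
text = RLBytes(raw)

print(text[5] == b'R')                              # True
print(dispatch_dict.get(text[5], handle_error)())   # reference
print(list(text)[:2])                               # [49, 50]: iteration still gives ints
```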

I find a lot of the "so glad we got rid of byte strings" fervour a bit silly. 
Bytes, chars, words etc. were around long before Unicode. Byte strings could 
already represent Unicode in efficient ways that happened to be useful for 
western languages. Having two string types is inconvenient and error-prone; 
swapping their labels and making subtle changes is a real pain.
-- 
Robin Becker