Working with bytes.

Piet van Oostrum piet at cs.uu.nl
Mon Apr 5 10:37:09 EDT 2004


>>>>> anton at vredegoor.doge.nl (Anton Vredegoor) (AV) wrote:

AV> Piet van Oostrum <piet at cs.uu.nl> wrote:

>> Which includes quite a few NON-ASCII characters. 
>> So what is ASCII-compliant about it?
>> You can't store 7 bits per byte and still be ASCII-compliant. At least if
>> you don't want to include control characters.

AV> Thanks, and yes you are right. I thought that getting rid of control
AV> codes just meant switching to the high bit codes, but of course
AV> control codes are part of the lower bit population and can't be
AV> removed that way. Worse than that: high bit codes are not
AV> ASCII-compliant at all!

AV> However the code below has the 8th and 7th bit always set to 0 and 1
AV> respectively, so it should produce ASCII-compliant output using 6 bits
AV> per byte.

Except that the highest code you get is 0177 (octal, i.e. 127 decimal),
which is DEL and is also a control code. Storing 6 bits per byte is exactly
what BASE64 does, so why reinvent the wheel?
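
To make the point concrete, here is a minimal sketch of the range such a
scheme produces. The helper name pack6 is made up for illustration and is
not the code that was posted; it only assumes the layout described above
(8th bit 0, 7th bit 1, six payload bits). The base64 module is the standard
library one.

```python
import base64

def pack6(value):
    """Hypothetical helper: put a 6-bit value (0..63) into a byte whose
    8th bit is 0 and whose 7th bit is 1."""
    if not 0 <= value < 64:
        raise ValueError("value must fit in 6 bits")
    return 0b01000000 | value

# Every byte such a scheme can produce.
codes = [pack6(v) for v in range(64)]
lowest, highest = min(codes), max(codes)

# The top of the range is octal 0177, which is DEL, a control code.
includes_del = 0o177 in codes

# BASE64 also carries 6 bits per output character, but draws them from
# letters, digits, '+' and '/', so no control codes appear.
encoded = base64.b64encode(b"abc")
```

The range runs from 64 ('@') to 127 (DEL), so one of the 64 code points is
still a control character, whereas the BASE64 alphabet avoids that.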

AV> I wonder whether it would be possible to use more than six bits per
AV> byte but less than seven? There seem to be some character codes left
AV> and these could be used too?

Yes, you could in principle use all 94 printable ASCII characters (not
counting the space). There is a scheme called btoa that encodes 4 bytes
into 5 ASCII characters by using BASE85, but I have never seen a Python
implementation of it. It shouldn't be difficult, however.
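
A rough sketch of the BASE85 idea, for one 4-byte group only. The function
names encode_group and decode_group are invented for this example, and the
alphabet ('!' through 'u') is an assumption based on the usual btoa/Ascii85
convention. Padding of a short final group and the 'z' shorthand for four
zero bytes are left out, so this is not a full btoa implementation.

```python
import math

FIRST_CHAR = 33  # '!'; assumed alphabet is the 85 characters '!'..'u'

def encode_group(four_bytes):
    """Encode exactly 4 bytes as 5 printable ASCII characters."""
    if len(four_bytes) != 4:
        raise ValueError("expected exactly 4 bytes")
    n = int.from_bytes(four_bytes, "big")
    digits = []
    for _ in range(5):
        n, d = divmod(n, 85)   # peel off base-85 digits, least significant first
        digits.append(d)
    return "".join(chr(FIRST_CHAR + d) for d in reversed(digits))

def decode_group(five_chars):
    """Inverse of encode_group for a valid 5-character group."""
    if len(five_chars) != 5:
        raise ValueError("expected exactly 5 characters")
    n = 0
    for ch in five_chars:
        d = ord(ch) - FIRST_CHAR
        if not 0 <= d < 85:
            raise ValueError("character outside the assumed alphabet")
        n = n * 85 + d
    return n.to_bytes(4, "big")  # raises OverflowError if n >= 2**32

# Why 5 characters suffice: 85**5 is just above 2**32.
fits = 85 ** 5 >= 2 ** 32

# Bits carried per output character, compared with the ceiling for a
# 94-character alphabet.
bits_per_char = 32 / 5
ceiling_94 = math.log2(94)
```

This gives 6.4 bits per character, between BASE64's 6 and the roughly 6.55
that 94 characters would allow at best. Later Python versions (3.4 and up)
ship base64.a85encode and base64.b85encode in the standard library.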
-- 
Piet van Oostrum <piet at cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP]
Private email: P.van.Oostrum at hccnet.nl


