Working with bytes.
Anton Vredegoor
anton at vredegoor.doge.nl
Mon Apr 5 05:29:14 EDT 2004
Piet van Oostrum <piet at cs.uu.nl> wrote:
>AV>
[snip]
>Which includes quite a few NON-ASCII characters.
>So what is ASCII-compliant about it?
>You can't store 7 bits per byte and still be ASCII-compliant. At least if
>you don't want to include control characters.
Thanks, and yes you are right. I thought that getting rid of control
codes just meant switching to the high bit codes, but of course
control codes are part of the lower bit population and can't be
removed that way. Worse than that: high bit codes are not
ASCII-compliant at all!
However, the code below always sets the 8th and 7th bits to 0 and 1
respectively, so it should produce ASCII-compliant output using 6 bits
per byte.
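As a quick sanity check on that claim, here is a minimal sketch (Python 3, not part of the original post) that enumerates every byte with bit 8 clear and bit 7 set and tests it against `string.printable`. The names `codes` and `non_printable` are only illustrative.

```python
import string

# Every byte of the form 0b01xxxxxx: bit 8 clear, bit 7 set, six payload bits.
codes = [0b01000000 | payload for payload in range(64)]

# string.printable holds the printable ASCII characters plus whitespace.
printable = set(string.printable.encode("ascii"))
non_printable = [c for c in codes if c not in printable]

print(min(codes), max(codes))  # 64 127
print(non_printable)           # [127]
```

So the range is 64..127, which stays inside 7-bit ASCII. The one exception to "no control characters" is 127 (DEL), which appears whenever all six payload bits are 1.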
I wonder whether it would be possible to use more than six bits per
byte but less than seven? There seem to be some character codes left
and these could be used too?
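To put a number on "more than six but less than seven": ASCII has 95 printable characters (0x20 to 0x7E), so the ceiling is log2(95), about 6.57 bits per character. The sketch below (Python 3, not part of the original post) computes that ceiling and, as one existing example of such an encoding, measures what Base85 from the standard library `base64` module achieves. Base85 is swapped in here for illustration only; it is not the scheme used in this post.

```python
import base64
import math

# Printable ASCII runs from 0x20 (space) to 0x7E (~): 95 characters.
printable_count = 0x7E - 0x20 + 1
bits_per_char_limit = math.log2(printable_count)

# Base85 maps every 4 input bytes to 5 printable characters.
data = bytes(range(20))
encoded = base64.b85encode(data)
assert base64.b85decode(encoded) == data

bits_per_char = len(data) * 8 / len(encoded)
print(printable_count, bits_per_char_limit, bits_per_char)
```

With 20 input bytes the encoded form is 25 characters, i.e. 32/5 = 6.4 bits per character, which sits between the 6 bits of the scheme here and the 6.57-bit ceiling.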
Anton
from itertools import islice

def _bits(i):
    return [('01'[i>>j & 1]) for j in range(8)][::-1]

_table = dict([(chr(i),_bits(i)) for i in range(256)])

def _bitstream(bytes):
    for byte in bytes:
        for bit in _table[byte]:
            yield bit

def _drop_first_two(gen):
    while 1:
        gen.next()
        gen.next()
        for x in islice(gen,6):
            yield x

def sixes(bytes):
    """ stream normal bytes to bytes where bits 8,7 are 0,1 """
    gen = _bitstream(bytes)
    while 1:
        R = list(islice(gen,6))
        if not R: break
        s = '01'+ "".join(R) + '0' * (6-len(R))
        yield chr(int(s,2))

def eights(bytes,n):
    """ the reverse of the sixes function :-| """
    gen = _bitstream(bytes)
    df = _drop_first_two(gen)
    for i in xrange(n):
        s = ''.join(islice(df,8))
        yield chr(int(s,2))

def test():
    from random import randint
    size = 20
    R = [chr(randint(0,255)) for i in xrange(size)]
    bytes = ''.join(R)
    sx = ''.join(sixes(bytes))
    check = ''.join(eights(sx,size))
    assert check == bytes
    print sx

if __name__ == '__main__':
    test()
output:
VMtdh[LII~Qexdyg}xFRhXRIVx
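
For readers on Python 3, where `gen.next()`, `xrange` and the `print` statement no longer exist, the same 6-bits-per-byte idea could be sketched with an integer bit accumulator instead of a stream of '0'/'1' characters. This is an assumed port and not the code from the post; it reuses the names `sixes` and `eights` but takes and returns `bytes` objects rather than generators.

```python
def sixes(data: bytes) -> bytes:
    """Pack 8-bit bytes into bytes of the form 0b01xxxxxx (6 payload bits)."""
    out = bytearray()
    acc = 0    # bit accumulator
    nbits = 0  # number of unread bits held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            out.append(0x40 | ((acc >> nbits) & 0x3F))
        acc &= (1 << nbits) - 1  # keep only the unread bits
    if nbits:
        # Zero-pad the last partial group on the right, as the post's code does.
        out.append(0x40 | ((acc << (6 - nbits)) & 0x3F))
    return bytes(out)


def eights(packed: bytes, n: int) -> bytes:
    """Reverse of sixes(): recover the first n original bytes."""
    out = bytearray()
    acc = 0
    nbits = 0
    for byte in packed:
        acc = (acc << 6) | (byte & 0x3F)  # drop the fixed '01' prefix
        nbits += 6
        while nbits >= 8 and len(out) < n:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1
    return bytes(out)


data = b"Working with bytes."
sx = sixes(data)
assert eights(sx, len(data)) == data
print(sx.decode("ascii"))
```

As in the original, `eights` needs the original length `n`, because the padding bits in the final output byte are otherwise indistinguishable from data.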