Bits and bytes?

Daniel Fackrell dfackrell at DELETETHIS.linuxmail.org
Thu Jun 13 10:18:19 EDT 2002


Guyon,

If the total output from your algorithm is not a multiple of 8 bits,
you'll probably have to pad it up to the next byte boundary (generally
done with 0 bits) before you can write it to a file, as files these days
are always (?) stored as a whole number of 8-bit bytes.  In fact, that
generally goes for data in memory as well.
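
As a rough sketch of that padding step (assuming a modern Python 3, and
with pack_bits being a name I made up for illustration, not anything from
a library), something like this would turn a string of '0'/'1' characters
into bytes:

```python
def pack_bits(bits):
    """Pack a string of '0'/'1' characters into bytes.

    The final byte is padded on the right with 0 bits.  Returns the bytes
    plus the number of padding bits that were added.
    """
    pad = (-len(bits)) % 8              # 0 bits needed to reach a byte boundary
    padded = bits + "0" * pad
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, pad


# The 7-bit example from the quoted message gets one padding bit.
data, pad = pack_bits("1001101")
```

A decoder cannot tell padding bits from real ones on its own, so you
would likely want to store the pad count (or the total bit count)
somewhere, for instance in a small header.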

I would avoid, however, any padding inside the bit stream, as you'll lose
effective compression.  Better to treat it as if there were no byte
boundaries at all until you need to write it to disk.
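
Treating the output as one continuous bit stream might look roughly like
the following.  This is only a sketch under my own assumptions: BitWriter
and its methods are hypothetical names, and the codes in the usage lines
are made-up values, not real Huffman codes.

```python
class BitWriter:
    """Accumulate variable-length codes with no padding between them."""

    def __init__(self):
        self.buffer = 0          # pending bits, held as an integer
        self.count = 0           # how many bits are pending (always < 8 between calls)
        self.out = bytearray()   # completed bytes

    def write(self, code, length):
        """Append the low `length` bits of `code`, most significant bit first."""
        self.buffer = (self.buffer << length) | (code & ((1 << length) - 1))
        self.count += length
        while self.count >= 8:   # emit every complete byte
            self.count -= 8
            self.out.append((self.buffer >> self.count) & 0xFF)
        self.buffer &= (1 << self.count) - 1   # keep only the leftover bits

    def finish(self):
        """Pad the last partial byte with 0 bits and return all the bytes."""
        if self.count:
            self.out.append((self.buffer << (8 - self.count)) & 0xFF)
            self.buffer = 0
            self.count = 0
        return bytes(self.out)


# Three made-up codes of 3, 2 and 2 bits give the stream 1001101.
w = BitWriter()
w.write(0b100, 3)
w.write(0b11, 2)
w.write(0b01, 2)
packed = w.finish()
```

Padding only happens in finish(), so the codes themselves sit back to
back regardless of where the byte boundaries fall.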

Maybe this, combined with the other ideas presented, will help you out.

--
Daniel Fackrell (dfackrell at linuxmail.org)
When we attempt the impossible, we can experience true growth.

"Guyon Morée" <gumuz at looze.net> wrote in message
news:3d08839a$0$226$4d4ebb8e at news.nl.uu.net...
> ok, really a lot of thanx for your help, this is awesome! :D
>
> but, let me explain what i am actually trying to do here and why i think
> both of your solutions won't work for me.
>
> i am trying to implement my own sort of huffman encoding/compression.
> converting ascii characters to a compressed bitstream requires me to
> write it at bit-level. Bill's solution might work in some way, but my
> sequence of bits isn't always divisible by eight. a word like 'ape'
> (which is 3x8=24 bits) might become 1001101, which is 7 bits.... catch
> my drift? strings are no option as this would have no positive effect
> on the 'real size'; there's no compression.
>
> i hope this makes it a bit clearer for you. anyway, thanx a lot.
>
> guyon




