xor: how come so slow?

Steven D'Aprano steve at REMOVE-THIS-cybersource.com.au
Sat Oct 18 23:01:50 EDT 2008


On Sat, 18 Oct 2008 09:16:11 +1300, Lawrence D'Oliveiro wrote:

> In message <Jlj*lmIps at news.chiark.greenend.org.uk>, Sion Arrowsmith
> wrote:
> 
>> Maybe it should be "fewer random data".
> 
> Except these days we tend to think of "data" being, say, more like
> "flour" than "bees", so it's "less data", like "less flour", rather than
> like "fewer bees". :)
> 
>> After all, each byte in the block is discrete.
> 
> Data can come in fractional bits. That's how compression works.

Compression works by removing redundant data so the same amount of useful 
information requires fewer bits. If you don't believe me, try compressing 
a single bit and see if you get a "fractional bit". I leave it to you to 
choose whether the bit should be on or off.

That's why compressing data twice doesn't gain much, if anything. Try 
compressing a typical JPEG image, chances are it won't get any smaller. 
That's because JPEGs already remove the redundancy in the data, 
compressing it. With no redundancy, there can be no compression.

Due to the pigeon-hole principle, any reversible compression scheme must 
cause some data to be expanded rather than compressed. (If not, then 
there must be two different pieces of data that compress to the same 
thing, which means the scheme is not reversible.) If there were 
fractional bits, that wouldn't apply, because you can always divide a 
pigeon-hole (a bit) into two smaller pigeon-holes.



-- 
Steven



More information about the Python-list mailing list