Compression of random binary data

jonas.thornvall at gmail.com jonas.thornvall at gmail.com
Tue Jul 12 13:35:16 EDT 2016


Den tisdag 12 juli 2016 kl. 17:12:01 UTC+2 skrev Steven D'Aprano:
> On Wed, 13 Jul 2016 12:24 am, jonas.thornvall at gmail.com wrote:
> 
> > Den måndag 11 juli 2016 kl. 20:38:51 UTC+2 skrev Steven D'Aprano:
> >> On Tue, 12 Jul 2016 03:52 am, jonas.thornvall at gmail.com wrote:
> >> 
> >> > What kind of statistic law or mathematical conjecture  or is it even a
> >> > physical law is violated by compression of random binary data?
> >> 
> >> The pigeon hole principle. If you have 100 pigeon holes, and 101 pigeons,
> >> then clearly at least one pigeon hole must have two pigeons in it.
> [...]
> > But it seem your reasoning is based upon interpretation of the actual
> > digits, bits and bytes value.
> 
> Not at all. If you think that, you've misread my example. There's no
> interpretation of the bytes: they are just 8-bit numbers from 0 to 255. You
> cannot losslessly compress all 256 of them to just four 2-bit numbers.
> 
> 
> > There could be different interpretation 
> > worlds of course you would have to chose one using digits, An 
> > interpretationworld here could be reading out different word lengths of
> > the dataset and maybe a lookup table.
> 
> Any lookup table you have counts as part of the compressed data.
> 
> 
> > But it could also be arithmetic rules that magically recreate a number
> > from a number of folds or difference of folds.
> 
> Oh, sure, if you believe in magic, anything is possible. Just close your
> eyes, click your heels together, and wish really, really hard.
> 
> Suppose I could compress ANY random data, no matter what, down to 10% of the
> original size. Okay, let's start with a million bits of data. Compress it
> down to 100,000 bits.
> 
> But I believe that I can compress *anything*, any random collection of data.
> Okay, let me compress it again. Now I have 10,000 bits.
> 
> Compress it again. Now I have 1,000 bits.
> 
> Compress it again. Now I have 100 bits.
> 
> Compress it again. Now I have 10 bits.
> 
> Compress it again. Now I have 1 bit, either a 0 or a 1.
> 
> 
> Can you not see how absurd this is? I have claimed that I can take *any*
> random set of data, and by compressing it again and again and again,
> compress it down to ONE BIT, either a 0 or a 1, WITHOUT LOSS. Somehow I
> have to take that 0 bit and uncompress it back to the Complete Works Of
> William Shakespeare, and *also* uncompress it back to the recent Deadpool
> movie, AND uncompress it back to last year's Ant Man movie, AND uncompress
> it back to some funny picture of a cat.
> 
> How can I possibly know which of the billions and billions of different
> files this 0 bit represents?
> 
> If you pass me a 0 bit, and say "uncompress this", and I get The Lord Of The
> Rings novels, and then you pass me another 0 bit, and I uncompress it and
> get The Hobbit, well, how did I tell the two bits apart? They're both zero.
> 
> 
> 
> The alternative is to say, it doesn't matter how clever you are, you can't
> compress *everything*. There are some things that simply won't compress.
> Eventually you get something no longer compresses. If you could compress
> EVERYTHING, then you could compress the compressed data, and compress the
> compressed-compressed data, and so on, until you've got only a single bit.
> And that is ridiculous.
> 
> 
> 
> -- 
> Steven
> “Cheer up,” they said, “things could be worse.” So I cheered up, and sure
> enough, things got worse.

No it is only compressible down to a limit given by the algorithm.



More information about the Python-list mailing list