Compression of random binary data

Ben Bacarisse ben.usenet at bsb.me.uk
Thu Oct 26 18:53:30 EDT 2017


Gregory Ewing <greg.ewing at canterbury.ac.nz> writes:

> Ben Bacarisse wrote:
>> The trouble is a pedagogic one.  Saying "you can't compress random data"
>> inevitably leads (though, again, this is just my experience) to endless
>> attempts to define random data.
>
> It's more about using terms without making sure everyone agrees
> on the definitions being used.
>
> In this context, "random data" really means "uniformly distributed
> data", i.e. any bit sequence is equally likely to be presented as
> input. *That's* what information theory says can't be compressed.

But that has to be about the process that gives rise to the data, not
the data themselves.  No finite collection of bits has the property you
describe.

If I say "here is some random data...", you can't tell whether or not it
came from a random source.  I can, as a parlour trick, compress and
recover this "random data" because I chose it.
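The parlour trick can be sketched in a few lines of Python.  This is just
an illustration, not anyone's real scheme: the "compressor" cheats by
special-casing one string chosen in advance (the CHOSEN constant here is
an arbitrary made-up value), shrinking that one input while growing every
other input by a byte.

```python
# A toy "compressor" that special-cases one pre-chosen string.
# CHOSEN is an arbitrary random-looking value picked in advance.
CHOSEN = b"\x9f\x03\xa7\x1c\x5e\xd2\x40\xbb"

def compress(data: bytes) -> bytes:
    if data == CHOSEN:
        return b"\x00"           # the chosen string shrinks to one byte
    return b"\x01" + data        # everything else grows by one byte

def decompress(data: bytes) -> bytes:
    if data == b"\x00":
        return CHOSEN
    return data[1:]
```

The round trip is lossless for every input, but only the one string the
trickster chose beforehand actually gets shorter.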

A source of randomness can be defined, but "random data" is much more
elusive.

>> I think "arbitrary data" (thereby including the results of compression
>> by said algorithm) is the best way to make progress.
>
> I'm not sure that's much better, because it doesn't home in
> on the most important thing, which is the probability
> distribution.

I think the argument that you can't compress arbitrary data is simpler
to make.  You don't have to define it (except in the very simplest
terms) and it's obvious that it includes the results of previous
compressions.
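The argument is just the pigeonhole principle, and a quick sketch makes
it concrete: there are 2**n bit strings of length n but only 2**n - 1
strings of length strictly less than n, so no lossless compressor can
map every n-bit input to a shorter output.

```python
# Pigeonhole counting: strings shorter than n bits can't cover
# all 2**n strings of exactly n bits.
def strings_of_length(n: int) -> int:
    return 2 ** n

def strings_shorter_than(n: int) -> int:
    # lengths 0 .. n-1, i.e. 1 + 2 + 4 + ... + 2**(n-1) = 2**n - 1
    return sum(2 ** k for k in range(n))

for n in range(1, 17):
    assert strings_shorter_than(n) == strings_of_length(n) - 1
```

Since the outputs of a lossless compressor must all be distinct, at
least one n-bit input must map to an output of n bits or more - and
that holds just as well when the input is itself a compressed file.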

-- 
Ben.
