Compression of random binary data

Sun Oct 29 11:17:53 EDT 2017

Gregory Ewing <greg.ewing at canterbury.ac.nz> writes:

> Ben Bacarisse wrote:
>> But that has to be about the process that gives rise to the data, not
>> the data themselves.
>
>> If I say: "here is some random data..." you can't tell if it is or is
>> not from a random source.  I can, as a parlour trick, compress and
>> recover this "random data" because I chose it.
>
> Indeed. Another way to say it is that you can't conclude
> anything about the source from a sample size of one.
>
> If you have a large enough sample, then you can estimate
> a probability distribution, and calculate an entropy.
>
>> I think the argument that you can't compress arbitrary data is simpler
>> ...  it's obvious that it includes the results of previous
>> compressions.
>
> What? I don't see how "results of previous compressions" comes
> into it. The source has an entropy even if you're not doing
> compression at all.

Maybe we are taking at cross purposes.  A claim to be able to compress
arbitrary data leads immediately to the problem that iterating the
compression will yield zero-size results.  That, to me, is a simpler
argument that talking about data from a random source.

-- 
Ben.