Compression of random binary data

Gregory Ewing greg.ewing at canterbury.ac.nz
Sat Oct 28 22:00:23 EDT 2017


Ben Bacarisse wrote:
> But that has to be about the process that gives rise to the data, not
> the data themselves.

> If I say: "here is some random data..." you can't tell if it is or is
> not from a random source.  I can, as a parlour trick, compress and
> recover this "random data" because I chose it.

Indeed. Another way to say it is that you can't conclude
anything about the source from a sample size of one.

If you have a large enough sample, then you can estimate
a probability distribution from it and calculate an entropy.
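
As a rough sketch of that kind of estimate: the function name
empirical_entropy, the choice of single bytes as the symbols, and the
treatment of symbols as independent are all assumptions made for this
illustration, not anything specified in this thread.

```python
import math
from collections import Counter


def empirical_entropy(sample: bytes) -> float:
    """Estimate Shannon entropy in bits per byte from byte frequencies.

    Hypothetical helper: treats each byte as an independent symbol,
    so any structure spanning several bytes is ignored.
    """
    if not sample:
        return 0.0
    counts = Counter(sample)  # how often each byte value occurs
    total = len(sample)
    # H = -sum(p * log2(p)) over the observed relative frequencies p
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(empirical_entropy(b"abab"))            # two equally frequent symbols
    print(empirical_entropy(bytes(range(256))))  # every byte value exactly once
```

Note that a one-byte sample always yields an estimate of zero, whatever
source produced it, which is the "sample size of one" problem again. The
number only becomes meaningful as the sample grows.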

> I think the argument that you can't compress arbitrary data is simpler
> ...  it's obvious that it includes the results of previous
> compressions.

What? I don't see how "results of previous compressions" comes
into it. The source has an entropy even if you're not doing
compression at all.

-- 
Greg


