Compression of random binary data

Ian Kelly ian.g.kelly at gmail.com
Wed Oct 25 09:27:59 EDT 2017


On 10/24/17, Richard Damon <Richard at damon-family.org> wrote:
> My understanding of the 'Random Data Compressibility' challenge is
> that it requires that the compression take ANY/ALL strings of up to N
> bits, and generate an output stream no longer than the input stream, and
> sometimes less.

That's incorrect, at least for the challenge being discussed. Here's the link:

http://marknelson.us/2012/10/09/the-random-compression-challenge-turns-ten/

The challenge is just to take a single known file of a million random
digits and make it smaller, including the size of the decompresser, and
without hiding data. So far, in 15 years, nobody has succeeded even at
this, and nobody knows whether it's impossible. For instance, it may be
the case that the data in the file happens to be the nth prime, in
which case it could simply be compressed to the number n with a
decompresser that calculates primes.
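
For illustration, here's a toy Python sketch of that idea. The index N
below is purely hypothetical; actually finding such an index for the
real million-digit file is the hard, unsolved part:

# Hypothetical "decompresser": store only an index n and regenerate the
# data by computing the n-th prime and emitting its decimal digits.
def nth_prime(n):
    # Trial division -- fine for small n, hopeless for a prime with a
    # million digits, but enough to show the idea.
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

N = 10000                     # hypothetical stored index
data = str(nth_prime(N))      # the "decompressed" digit string
print(len(data), data[:20])   # prints: 6 104729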

> It admits that given non-uniformly distributed data, it is
> possible to compress some patterns, the common ones, and expand others,
> the uncommon ones, to lower the net average length. What it says can't
> be done is to have a compression method that compresses EVERY input
> pattern. That is where the 'Pigeon Hole' principle comes into play, which
> the people who claim to be able to compress random data like to ignore
> or just attempt to say doesn't apply.

There is a second challenge on that page that is as you state, but the
page admits that this second challenge is a bit of a troll, since it is
provably impossible.
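
To make the pigeonhole argument concrete, here's a tiny counting sketch
(n = 8 is just an arbitrary example length):

# There are 2**n bit strings of length exactly n, but only 2**n - 1
# strings of any shorter length (lengths 0 through n-1), so an
# injective compressor cannot map every length-n input to a strictly
# shorter output.
n = 8
inputs = 2 ** n                          # strings of length exactly n
outputs = sum(2 ** k for k in range(n))  # all strictly shorter strings
print(inputs, outputs)                   # 256 255 -- one input is always left over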


