Compression of random binary data

Nobody nobody at nowhere.invalid
Mon Jul 11 14:56:09 EDT 2016


On Mon, 11 Jul 2016 10:52:08 -0700, jonas.thornvall wrote:

> What kind of statistic law or mathematical conjecture  or is it even a
> physical law is violated by compression of random binary data?

You can't create an invertible mapping between a set with 2^N elements
(e.g. the set of all N-bit binary sequences) and any set with fewer than
2^N elements (e.g. the set of all M-bit binary sequences for M<N).

Lossless compression requires an invertible mapping.

For any lossless compression algorithm, there will always be inputs where
the output is larger than the input, even if only by a single bit.
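To make the counting argument concrete, a quick sketch (the choice of
N=8 is just for illustration): there are 2^N inputs of length N, but
only 2^N - 1 binary strings strictly shorter than N bits, so no
injective mapping into the shorter strings can exist.

```python
# Pigeonhole sketch: count all binary strings shorter than N bits
# and compare against the number of N-bit inputs.
N = 8
inputs = 2 ** N                          # all N-bit sequences: 256
shorter = sum(2 ** m for m in range(N))  # lengths 0..N-1: 1+2+...+128 = 255

print(inputs, shorter)  # 256 255
# One string too few: any lossless map must expand at least one input.
assert shorter == inputs - 1
```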

Practical lossless compression schemes operate by mapping likely inputs to
short outputs and unlikely inputs to longer outputs, resulting in outputs
which are /on average/ shorter than the inputs.
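You can see this trade-off with the standard-library zlib module (an
illustration, not a proof): random bytes typically come out slightly
*larger* once DEFLATE's framing overhead is added, while highly
repetitive input compresses dramatically.

```python
import os
import zlib

random_data = os.urandom(10_000)          # an "unlikely" (incompressible) input
repetitive_data = b"abc" * 3_333 + b"a"   # a "likely" (highly regular) input

# Random data: DEFLATE finds nothing to exploit, so overhead dominates.
print(len(zlib.compress(random_data)))      # a little over 10000
# Repetitive data: long back-references shrink it to a tiny fraction.
print(len(zlib.compress(repetitive_data)))  # far under 10000
```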

Lossy compression can achieve as much compression as you want, provided
that you're willing to tolerate the resulting loss of information.



