Compression of random binary data

Ian Kelly ian.g.kelly at gmail.com
Thu Oct 26 21:26:11 EDT 2017


On Thu, Oct 26, 2017 at 2:38 PM,  <danceswithnumbers at gmail.com> wrote:
>
> Thomas Jollans
>
> On 2017-10-25 23:22, danceswi... at gmail.com wrote:
>> With every transform the entropy changes,
>
> That's only true if the "transform" loses or adds information.
>
> If it loses information, that's lossy compression, which is only useful
> in very specific (but also extremely common) circumstances.
>
> If it adds information, that's poetry, not compression.
>
>
>
> Not true! You can transform a stream losslessly and change its
> entropy without adding or removing information. These two 16-bit
> streams are the same, but one is reversed:
>
> 1111000010101011
> This equals 61611 in decimal.
> Using the digit alphabet 0-6, it can be represented in
> log2(7) * 5 = 14.0367746103 bits.
>
>
> 1101010100001111
> This equals 54543 in decimal.
> Using the digit alphabet 0-5, it can be represented in
> log2(6) * 5 = 12.9248125036 bits.
>
> In reality you can express 54543 with 10 bits.
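
(As far as I can tell, the "equals" lines read each bit string as a
base-2 integer, and the bit figures come from counting decimal digits.
A quick sketch to check that reading, assuming that's what was meant:)

py> int('1111000010101011', 2)  # first stream as a base-2 integer
61611
py> int('1101010100001111', 2)  # the reversed stream
54543
py> import math
py> round(math.log2(7) * 5, 10)  # five digits, each drawn from 0-6
14.0367746103
py> round(math.log2(6) * 5, 10)  # five digits, each drawn from 0-5
12.9248125036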

Whatever you're calculating here, it isn't Shannon entropy. Shannon
entropy is correctly calculated for a data source, not an individual
message, but if we assume the two bit strings above to be the result
of a Bernoulli process with probabilities matching the observed bit
frequencies (9/16 ones, 7/16 zeros), then the total entropy of 16
events is:

py> import math
py> 16 * (-9/16 * math.log2(9/16) - 7/16 * math.log2(7/16))
15.81919053261596

Approximately 15.8 bits. This is the same no matter what order the
events occur in.
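
Here's the same computation as a small function (my own helper, not
anything standard beyond the math module), which makes it easy to
confirm that both orderings give the same figure:

py> import math
py> def bernoulli_entropy_bits(bits):
...     """Total entropy of a bit string, treating each bit as an
...     independent Bernoulli event with p = frequency of 1s."""
...     n = len(bits)
...     p = bits.count('1') / n
...     # Sum -q*log2(q) over both outcomes, skipping q == 0.
...     h = sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)
...     return n * h
...
py> bernoulli_entropy_bits('1111000010101011')
15.81919053261596
py> bernoulli_entropy_bits('1101010100001111')
15.81919053261596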


