Compression of random binary data

Steve D'Aprano steve+python at pearwood.info
Sat Oct 28 20:32:52 EDT 2017


On Sun, 29 Oct 2017 07:03 am, Peter Pearson wrote:

> On Thu, 26 Oct 2017 19:26:11 -0600, Ian Kelly <ian.g.kelly at gmail.com> wrote:
>>
>> . . . Shannon entropy is correctly calculated for a data source,
>> not an individual message . . .
> 
> Thank you; I was about to make the same observation.  When
> people talk about the entropy of a particular message, you
> can bet they're headed for confusion.

I don't think that's right. The entropy of a single message is a well-defined
quantity, formally called the self-information. The entropy of a data source
is the expected value of the self-information, taken over all possible
messages coming from that data source.

https://en.wikipedia.org/wiki/Self-information
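
To make the distinction concrete, here is a small Python sketch (the 90/10
biased coin is just an invented example, not anything from the thread): the
source entropy is the probability-weighted average of the surprisal of each
possible outcome.

    from math import log2

    def self_information(p):
        # Surprisal of one outcome with probability p, in bits.
        return log2(1/p)

    def source_entropy(probabilities):
        # Entropy of the source: the expected value of the
        # self-information over all possible outcomes.
        return sum(p * self_information(p) for p in probabilities)

    # A (made-up) biased coin that shows Heads 90% of the time:
    print(self_information(0.9))       # ~0.152 bits -- Heads is unsurprising
    print(self_information(0.1))       # ~3.322 bits -- Tails is surprising
    print(source_entropy([0.9, 0.1]))  # ~0.469 bits per toss, on average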

We can consider the entropy of a data source as analogous to the population
mean, and the entropy of a single message as the sample mean. A single
message is like a sample taken from the set of all possible messages.

Self-information is sometimes also called "surprisal" as it is a measure of
how surprising an event is. If your data source is a fair coin toss, then
actually receiving a Heads has a self-information ("entropy") of 1 bit.
That's not very surprising. If your data source is a pair of fair,
independent dice, then the self-information of a particular roll, say the
first die showing a two and the second a four, is 5.170 bits. It's a
logarithmic scale, not linear: if the probability of a message or event is p,
the self-information of that event is log_2(1/p) bits.
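
As a quick sanity check, both numbers above fall straight out of that formula
(this assumes the 5.170 figure is for one specific ordered outcome of the two
dice, which has probability 1/36):

    from math import log2

    print(log2(1 / (1/2)))   # fair coin, Heads: 1.0 bit
    print(log2(1 / (1/36)))  # first die two, second die four: ~5.170 bits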



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



