Compression of random binary data
Peter J. Holzer
hjp-usenet3 at hjp.at
Tue Oct 24 16:09:56 EDT 2017
On 2017-10-23 04:21, Steve D'Aprano <steve+python at pearwood.info> wrote:
> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>>
> If the probabilities of certain codes (either single codes, or sequences
> of codes) are unequal, then you can take advantage of that by encoding
> the common cases into a short representation, and the uncommon and rare
> cases into a longer representation. As you say:
>
>
>> Otherwise, if ( 0, 0 ) is much more frequent,
>> we can encode ( 0, 0 ) by "0" and
>>
>> ( 0, 1 ) by "101",
>> ( 1, 0 ) by "110", and
>> ( 1, 1 ) by "111".
>>
>> And we could then use /less/ than two bits on the
>> average.
>
> That's incorrect. On average you use 2.5 bits.
>
> (1 * 1 bit + 3 * 3 bits, divided by four possible outcomes, makes 2.5 bits.)
I disagree. If the distribution is not uniform, then the average needs to
take the different probabilities into account.
Let's assume that (0, 0) has a probability of 80 %, (0, 1) a probability
of 10 %, and (1, 0) and (1, 1) a probability of 5 % each.
Then the average length is
0.8 * 1 bit + 0.1 * 3 bits + 0.05 * 3 bits + 0.05 * 3 bits = 1.4 bits.
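A quick sanity check in Python makes this concrete. Note that a distribution must sum to 1, so the figures below (80 %, 10 %, 5 %, 5 %) are example values I picked; the codewords are the ones from Stefan's example:

```python
# Prefix code from Stefan's example: the common symbol gets the short codeword.
code = {(0, 0): "0", (0, 1): "101", (1, 0): "110", (1, 1): "111"}

# Sanity check: no codeword is a prefix of another, so the code is
# uniquely decodable.
words = list(code.values())
assert not any(a != b and b.startswith(a) for a in words for b in words)

def expected_length(probs):
    """Average codeword length in bits for the given symbol probabilities."""
    assert abs(sum(probs.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * len(code[sym]) for sym, p in probs.items())

# Uniform distribution: the three long codewords dominate.
uniform = {sym: 0.25 for sym in code}
print(expected_length(uniform))  # 2.5 -- matches Steve's figure

# Skewed distribution (example values chosen so they sum to 1).
skewed = {(0, 0): 0.8, (0, 1): 0.1, (1, 0): 0.05, (1, 1): 0.05}
print(expected_length(skewed))   # about 1.4 -- less than 2 bits per symbol
```

So the same code is worse than 2 bits/symbol on uniform input but better than 2 bits/symbol once (0, 0) is common enough.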
hp
--
   _  | Peter J. Holzer    | Curse of electronic word processing:
|_|_) |                    | you keep polishing your text until the
| |   | hjp at hjp.at      | parts of the sentence no longer fits
__/   | http://www.hjp.at/ | together. -- Ralph Babel