python implementation of a new integer encoding algorithm.

Dave Angel davea at davea.name
Tue Feb 17 10:18:33 EST 2015


On 02/17/2015 09:58 AM, Chris Angelico wrote:
> On Wed, Feb 18, 2015 at 1:50 AM, Dave Angel <davea at davea.name> wrote:
>> But the first thing I'd expect to see would be a target estimate of the
>> anticipated distribution of number values/magnitudes.  For example, if a
>> typical integer is 1500 bits, plus/minus 200 bits, I'd probably try encoding
>> in base 65535, and have a terminator character of 0xffff.
>
> Sure. Not sure how you'd cope with an interior FFFF in the stream
> without drastically losing efficiency, though.

That's why it was base 65535, not 65536.
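
A minimal sketch of that scheme, for illustration (function names are mine, not from the thread): each digit is in the range 0..0xFFFE, so the 16-bit value 0xFFFF never occurs inside a number and can safely serve as the terminator.

```python
import struct

BASE = 65535        # digits run 0..0xFFFE
TERMINATOR = 0xFFFF  # never a valid digit, so it can end a number

def encode(n):
    """Encode a non-negative integer as 16-bit base-65535 digits,
    least significant digit first, ending with the 0xFFFF terminator."""
    digits = []
    while True:
        n, d = divmod(n, BASE)
        digits.append(d)
        if n == 0:
            break
    digits.append(TERMINATOR)
    return struct.pack('<%dH' % len(digits), *digits)

def decode(data, offset=0):
    """Decode one integer starting at offset; return (value, next_offset)."""
    value = 0
    place = 1
    while True:
        (d,) = struct.unpack_from('<H', data, offset)
        offset += 2
        if d == TERMINATOR:
            return value, offset
        value += d * place
        place *= BASE
```

Since decode returns the offset just past the terminator, several encoded integers can be concatenated and read back in sequence, which is the point of having an unambiguous terminator in the first place.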

>
>> An interesting point of history.  At the time of Huffman's paper, it was
>> provable that it was the best possible lossless compression.  But there were
>> some unwritten assumptions, such as that each character would be encoded
>> with a whole number of bits.  Changing that assumption makes arithmetic
>> coding possible.  Next assumption was that probabilities of each character
>> didn't depend on context.  Changing that assumption enables LZ.  And so on.
>
> Didn't LZ predate arithmetic coding?

I really don't know.  I looked up Huffman's article (this was before the 
Internet; I found it in a library) in the 1952 IRE journal (I believe 
that's where it was).  I probably read it around 1980.

But arithmetic coding and LZ both came later to me, and that's the order 
I saw them in.

> But yes, removing those
> assumptions has enabled some pretty amazing alternatives. Just
> describing arithmetic coding to them is enough to blow most people's
> minds. (Especially my sister. She hates infinity, mainly - I think -
> because it doesn't fit inside her brain. I troll her occasionally with
> Hilbert's Hotel or anything involving the notion of different sizes of
> infinity.)


-- 
DaveA


