python implementation of a new integer encoding algorithm.

Dave Angel davea at davea.name
Wed Feb 18 17:19:48 EST 2015


On 02/18/2015 02:55 PM, janhein.vanderburg at gmail.com wrote:
> Op woensdag 18 februari 2015 17:47:49 UTC+1 schreef Dave Angel:
>> On 02/18/2015 03:59 AM, janhein.vanderburg at gmail.com wrote:
>>
>>
>>> encoding individual integers optimally without any assumptions about their values.
>>>
>>
>> Contradiction in terms.
>>
>> --
>> DaveA
>
> Not.
> Jan-Hein.
>

Then you had better define your new word "optimal" to us.  I decided to 
try your algorithm for all the values between 0 and 999999.  One million 
values, and the 7bit encoding[1] beat yours for 950081 of them.  The 
rest were tied.  Yours never was shorter.

For a uniform distribution of those particular values, the average 
number of bytes used by yours was 3.933568 bytes, and by 7bit encoding 
was 2.983487

For the second and third million, yours are all 4 bytes, while 7bit uses 
3. Beyond 2097152, 7bit uses 4 bytes, same as you.  Between 16 and 17 
million, you average 4.156865, while 7bit is a constant 4.0.

After that, I started spot-checking.  I went up to 100 billion, and for 
none of those I tried did your algorithm take fewer bytes than 7bit.


So how is yours optimal?  Over what range of values?

I'm not necessarily doubting it, just challenging you to provide a data 
sample that actually shows it.  And of course, I'm not claiming that 
7bit is in any way optimal.  You cannot define optimal without first 
defining the distribution.

[1] by 7bit, I'm referring to the one apparently used in MIDI encoding, 
where 7 bits of each byte hold the value, and the 8th bit is zero, 
except for the last byte, where the 8th bit is one.  So 3 bytes can 
encode 21 bits, or up to 2**21 - 1.

-- 
DaveA



More information about the Python-list mailing list