python implementation of a new integer encoding algorithm.

Dave Angel davea at davea.name
Wed Feb 18 08:54:29 EST 2015


On 02/18/2015 04:04 AM, janhein.vanderburg at gmail.com wrote:
> On Tuesday, February 17, 2015 at 3:35:16 PM UTC+1, Chris Angelico wrote:
>> Oh, incidentally: If you want a decent binary format for
>> variable-sized integer, check out the MIDI spec.
>
> I did some time ago, thanks, and it is indeed a decent format.
> I also looked at variations of that approach.
> None of them beats

Define "beats."  You might mean beats in simplicity, or in elegance, or 
in clarity of code.  But you probably mean in space efficiency, or 
"compression."  But that's meaningless without a target distribution of 
values that you expect to encode.

For example, if 99.9% of your values are going to be less than 255, then 
the most efficient byte encoding would be one that simply stores a value 
less than 255, and starts with an FF for larger values.  It's almost 
irrelevant how it encodes those larger values.

On the other hand, if most values are going to be in the 10,000 to 
20,000 bit size range, and a few will be much smaller, and a few will be 
very much larger, then it would be very practical to start with a size 
field, say 16 bits, followed by the raw packed data.  Naturally, the 
size field would need to have an escape value that indicates a larger 
field was needed.  In fact, the size field could be encoded in a 
7bits-per-byte manner, so it would encode an arbitrary sized number as well.


> "my" concept of two counters that cooperatively specify field lengths and represented integer values.
>
>> I've tried to read through the original algorithm description, but I'm
>> not entirely sure: How many payload bits per transmitted byte does it
>> actually achieve?
>
> I don't think that payload bits per byte makes sense in this concept.
>

Correct.  Presumably one means average payload bits per byte.

First one would have to define what the "standard" unencoded variable 
length integer format was.  Then one could call that size the payload 
size.  Then, in order to compute an average, one would have to specify 
an expected, or target distribution of values.  One then compares and 
averages the payload size for each typical value with the encoded size.

-- 
DaveA



More information about the Python-list mailing list