python implementation of a new integer encoding algorithm.

Thu Feb 19 13:46:23 EST 2015

On 02/19/2015 01:32 PM, Ian Kelly wrote:
> On Thu, Feb 19, 2015 at 11:24 AM, Dave Angel <davea at davea.name> wrote:
>> Here's a couple of ranges of output, showing that the 7bit scheme does
>> better for values between 384 and 16379.
>>
>> 382 2 80fe --- 2 7e82
>> 383 2 80ff --- 2 7f82
>> 384 3 810000 --- 2 0083
>> 384  jan grew 3 810000
>> 385 3 810001 --- 2 0183
>> 386 3 810002 --- 2 0283
>> 387 3 810003 --- 2 0383
>> 388 3 810004 --- 2 0483
>> 389 3 810005 --- 2 0583
>>
>> 16380 3 813e7c --- 2 7cff
>> 16380  jan grew 3 813e7c
>> 16380 7bit grew 2 7cff
>> 16381 3 813e7d --- 2 7dff
>> 16382 3 813e7e --- 2 7eff
>> 16383 3 813e7f --- 2 7fff
>> 16384 3 813e80 --- 3 000081
>> 16384 7bit grew 3 000081
>> 16385 3 813e81 --- 3 010081
>> 16386 3 813e82 --- 3 020081
>> 16387 3 813e83 --- 3 030081
>> 16388 3 813e84 --- 3 040081
>> 16389 3 813e85 --- 3 050081
>>
>> In all my experimenting, I haven't found any values where the 7bit scheme
>> does worse.  It seems likely that for extremely large integers, it will, but
>> if those are to be the intended distribution, the 7bit scheme could be
>> replaced by something else, like just encoding a length at the beginning,
>> and using raw bytes after that.
>
> It looks like you're counting whole bytes, not bits. That would be
> important since the "difficult" encoding uses fractional bytes.
>

Not in the implementation that was shared.  I've only seen one set of
Python code for "difficult", and it worked strictly in whole bytes, as
I said earlier in the message you quoted from.
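
For reference, here is a minimal sketch of the 7bit scheme that
reproduces the right-hand column of the quoted table.  The function
names are illustrative, and the byte layout (little-endian 7-bit groups,
with the high bit marking the final byte) is an assumption inferred from
the hex values quoted above, not taken from the original code.

```python
def encode_7bit(n):
    """Encode a non-negative int as little-endian 7-bit groups.

    Assumption: the high bit (0x80) marks the final byte, as inferred
    from the hex values in the quoted table.
    """
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while n >= 0x80:
        out.append(n & 0x7F)   # low 7 bits, high bit clear: more bytes follow
        n >>= 7
    out.append(n | 0x80)       # last group, high bit set: terminator
    return bytes(out)


def decode_7bit(data):
    """Inverse of encode_7bit; stops at the first byte with the high bit set."""
    n = 0
    shift = 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            return n
        shift += 7
    raise ValueError("missing terminator byte")


# Print value, length in whole bytes, and hex, like the quoted table.
for value in (383, 384, 16383, 16384):
    enc = encode_7bit(value)
    print(value, len(enc), enc.hex())
```

Under that assumption the lengths are counted in whole bytes: 2 bytes up
to 16383, and 3 bytes starting at 16384.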

Naturally, I question whether the original description makes sense at
sub-byte granularity, since it was claimed that these encodings are NOT
for lists or sequences of arbitrary integers, but only for a single
integer at a time, presumably mixed with other data which may or may
not suit bit-level encoding.

-- 
DaveA