python implementation of a new integer encoding algorithm.

Dave Angel davea at davea.name
Thu Feb 19 13:41:48 EST 2015


On 02/19/2015 01:34 PM, Chris Angelico wrote:
> On Fri, Feb 20, 2015 at 5:24 AM, Dave Angel <davea at davea.name> wrote:
>> In all my experimenting, I haven't found any values where the 7bit scheme
>> does worse.  It seems likely that for extremely large integers, it will, but
>> if those are to be the intended distribution, the 7bit scheme could be
>> replaced by something else, like just encoding a length at the beginning,
>> and using raw bytes after that.
>
> Encoding a length (as varlen) and then using eight bits to the byte
> thereafter is worse for small numbers,

I only suggested this for the case where the distribution turns out to
be dominated by extremely large numbers, large enough that the 7bit
scheme stops paying off.

As I (and others) have said many times, making it optimal means making 
some assumptions about the distribution of likely values.
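
To make the trade-off concrete, here is a rough sketch that encodes the
same integer both ways and compares the byte counts. The names
encode_varlen and encode_prefixed are made up for illustration, and I'm
assuming the usual 7bit layout (low-order group first, high bit set when
more bytes follow) plus a big-endian raw payload; the scheme discussed
in this thread may differ in detail.

```python
def encode_varlen(n):
    """7 payload bits per byte, low-order group first (assumed layout)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)   # high bit set: more bytes follow
        else:
            out.append(low)          # high bit clear: last byte
            return bytes(out)


def encode_prefixed(n):
    """Varlen-encoded byte count, then the value as raw big-endian bytes."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    return encode_varlen(len(raw)) + raw


# Compare encoded sizes for the largest value of each bit width.
for bits in (7, 8, 32, 56, 64, 128, 1024):
    n = (1 << bits) - 1
    print(bits, len(encode_varlen(n)), len(encode_prefixed(n)))
```

Swapping the tuple of bit widths for a sample of the values you actually
expect shows which scheme wins for your distribution.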

> breaks even around 2**56, and
> then is better. So unless your numbers are mainly going to be above
> 2**56, it's better to just use varlen for the entire number. On the
> other hand, if you have to stream this without over-reading (imagine
> streaming from a TCP/IP socket; you want to block until you have the
> whole number, but not block after that), it may be more efficient to
> take the length, and then do a blocking read for the main data,
> instead of a large number of single-byte reads. But on the gripping
> hand, you can probably just do those one-byte reads and rely on (or
> implement) lower-level buffering.
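
For the streaming case, a minimal sketch of both read strategies over a
buffered binary stream might look like the following. read_exact,
read_varlen and read_prefixed are hypothetical helper names, the byte
layout is the same assumed one (low-order 7bit group first, big-endian
raw payload), and io.BytesIO stands in for whatever buffered object you
would get from the socket.

```python
import io


def read_exact(stream, n):
    """Read exactly n bytes from a buffered binary stream, or raise."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("stream ended in the middle of a number")
    return data


def read_varlen(stream):
    """Decode a 7bit varlen integer with one-byte reads."""
    result = 0
    shift = 0
    while True:
        byte = read_exact(stream, 1)[0]
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:          # high bit clear: this was the last byte
            return result


def read_prefixed(stream):
    """Decode a varlen byte count, then fetch the payload in one read."""
    length = read_varlen(stream)
    return int.from_bytes(read_exact(stream, length), "big")


# 300 as varlen, 300 as length-prefixed, then 5 as varlen, back to back.
stream = io.BytesIO(b"\xac\x02" + b"\x02\x01\x2c" + b"\x05")
values = [read_varlen(stream), read_prefixed(stream), read_varlen(stream)]
print(values)
```

Neither decoder consumes bytes past the end of its own number, so the
next value stays available to the next call. With a real socket the
buffered layer may still pull extra bytes into its own buffer, which is
harmless as long as every read goes through that same object.
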
>
> Ask not the python-list for advice, because they will say both "yes"
> and "no" and "maybe"... because they will say all three of "yes",
> "no", "maybe", and "you don't need to do that"... erm, AMONG our
> responses will be such diverse elements as...
>
> ChrisA
>


-- 
DaveA


