python implementation of a new integer encoding algorithm.

janhein.vanderburg at gmail.com
Wed Feb 18 03:59:51 EST 2015


On Tuesday, February 17, 2015 at 3:13:41 PM UTC+1, Dave Angel wrote:
> This is a fine forum for such a discussion.  I for one would love to 
> participate.  However, note that it isn't necessarily true that "the 
> smaller the better" is a good algorithm.  In context, there are 
> frequently a number of tradeoffs, even ignoring processor time (as you 
> imply).
Thanks, Dave. About those trade-offs:

I agree that variable-length encoding generally forces the programmer to process the records in a file sequentially.
Random access to individual data items in such a stream is simply not an option, unless all items have been indexed in a directory that accompanies the data character stream.
So if you need random access to individual items, put a size limit on each data item.
My applications do not allow me to do that, because I would always be too restrictive and get caught in the "640k should be enough for any programmer" trap.
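
To make the trade-off concrete, here is a minimal sketch. It uses the well-known LEB128 varint scheme purely as a stand-in for any variable-length encoding; it is not the new algorithm this thread is about. The function names (encode_varint, decode_varint, build_stream, nth_sequential, nth_indexed) are made up for this illustration.

```python
def encode_varint(n):
    """Encode a non-negative int as LEB128: 7 data bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(buf, pos=0):
    """Decode one integer starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def build_stream(values):
    """Concatenate the encodings and record where each item starts."""
    stream = bytearray()
    offsets = []  # the accompanying "directory"
    for v in values:
        offsets.append(len(stream))
        stream += encode_varint(v)
    return bytes(stream), offsets


def nth_sequential(stream, k):
    """Without a directory: decode and skip the k items before item k."""
    pos = 0
    for _ in range(k):
        _, pos = decode_varint(stream, pos)
    return decode_varint(stream, pos)[0]


def nth_indexed(stream, offsets, k):
    """With a directory: jump straight to item k."""
    return decode_varint(stream, offsets[k])[0]
```

Without the offsets list, reaching item k means decoding every item before it; with it, one lookup suffices, at the cost of storing and maintaining the directory. Note that no item has a size limit here: Python integers of any magnitude go through the same code.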

> So going back to your problem, and assuming that the other issues are 
> moot, what's your proposal?  Are you compressing relative to a straight 
> binary form of storage?  Are you assuming anything about the relative 
> likelihood of various values of integers?  Do you provide anything to 
> allow for the possibility that your prediction for probability 
> distribution isn't met?

I'm not compressing sequences of integer encodings but encoding individual integers optimally without any assumptions about their values.


