Python implementation of a new integer encoding algorithm.

Dave Angel davea at davea.name
Tue Feb 17 09:12:56 EST 2015


On 02/17/2015 06:22 AM, janhein.vanderburg at gmail.com wrote:
> In http://optarbvalintenc.blogspot.nl/ I propose a new way to encode arbitrarily valued integers, along with Python code that can be used as a reference for practical implementations of codecs.
>
> The encoding algorithm itself is optimized for transmission and storage requirements without any processor load consideration whatsoever.
>
> The next step is the development of the Python code that minimizes processor requirements without compromising the algorithm.
>
> Is this the right forum to ask for help with this step or to obtain references to more suitable platforms?
>

This is a fine forum for such a discussion.  I for one would love to 
participate.  However, note that it isn't necessarily true that "the 
smaller the better" makes for a good algorithm.  In context, there 
are frequently a number of tradeoffs, even ignoring processor time 
(as you imply).

Many years ago I worked on some code that manipulated the intermediate 
files for a compiler.  I had no say in the format, I just had to deal 
with it.

They had a field type called a "compressed integer."  It could vary 
between one byte and, I think, about six.  The theory was that it 
took less space than the equivalent fixed-size integer format.  The 
catch from my point of view was that these integers could appear in 
the middle of a struct, so access to the later fields of the struct 
required a dynamic calculation.  This put a huge onus on my code to 
read or write the data serially.  I ended up solving it by writing 
code that generated 40 thousand lines of C++ header and body code, so 
that the rest of the code didn't care.
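
For anyone who hasn't met such a field: the exact compiler format 
doesn't matter here, but a typical scheme of that kind packs seven 
payload bits per byte and uses the high bit as a continuation flag 
(the varint/LEB128 family).  A minimal Python sketch of that general 
idea (illustrative only, not the actual format I dealt with):

def encode_varint(n):
    """Encode a non-negative integer, low 7 bits first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)          # final byte: high bit clear
            return bytes(out)

def decode_varint(data, offset=0):
    """Decode one varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7

Note that decode_varint has to return the next offset: the caller 
can't know where the following field starts without decoding this 
one, which is exactly what forced my code into serial access.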

Was it worth it?  To reduce the size of some files that only lived a few 
seconds on disk?  I seriously doubt it.  But I learned a lot in the process.

On another project, the goal was to be able to recover data from 
archives in spite of physical damage to some files.  So I had to add 
selective redundancy for that.  In the process, I also compressed the 
data, but confined the compression algorithm to relatively small 
pieces of data and labeled those pieces independently, so that any 
single decompression error affected only one unit of data.
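
In Python terms the idea looks something like this.  A minimal 
sketch, assuming zlib and a simple length-plus-CRC framing (the real 
format had more to it, and the chunk size here is just an assumed 
value):

import struct
import zlib

CHUNK_SIZE = 64 * 1024   # assumed unit size

def write_chunks(stream, data):
    """Compress data in independent, individually labeled pieces."""
    for i in range(0, len(data), CHUNK_SIZE):
        comp = zlib.compress(data[i:i + CHUNK_SIZE])
        # Label each piece with its length and a CRC of the
        # compressed bytes, so damage is detectable per chunk.
        stream.write(struct.pack("<II", len(comp), zlib.crc32(comp)))
        stream.write(comp)

def read_chunks(stream):
    """Yield decompressed chunks; a damaged chunk yields None."""
    while True:
        header = stream.read(8)
        if len(header) < 8:
            return
        length, crc = struct.unpack("<II", header)
        comp = stream.read(length)
        if zlib.crc32(comp) != crc:
            yield None        # one bad unit; the rest still decode
        else:
            yield zlib.decompress(comp)

A bad CRC costs you one unit of data instead of everything after it. 
(A damaged length field would still break the framing; a real format 
needs resync markers or redundant headers on top of this.)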

So going back to your problem, and assuming that the other issues are 
moot, what's your proposal?  Are you compressing relative to a 
straight binary form of storage?  Are you assuming anything about the 
relative likelihood of various integer values?  Do you provide 
anything to allow for the possibility that your predicted probability 
distribution isn't met?

-- 
DaveA


