key/value store optimized for disk storage
Steve Howell
showell30 at yahoo.com
Fri May 4 02:29:28 EDT 2012
On May 3, 11:03 pm, Paul Rubin <no.em... at nospam.invalid> wrote:
> Steve Howell <showel... at yahoo.com> writes:
> > Sounds like a useful technique. The text snippets that I'm
> > compressing are indeed mostly English words, and 7-bit ascii, so it
> > would be practical to use a compression library that just uses the
> > same good-enough encodings every time, so that you don't have to write
> > the encoding dictionary as part of every small payload.
>
> Zlib stays adaptive, the idea is just to start with some ready-made
> compression state that reflects the statistics of your data.
>
> > Sort of as you suggest, you could build a Huffman encoding for a
> > representative run of data, save that tree off somewhere, and then use
> > it for all your future encoding/decoding.
>
> Zlib is better than Huffman in my experience, and Python's zlib module
> already has the right entry points. Looking at the docs,
> Compress.flush(Z_SYNC_FLUSH) is the important one. I did something like
> this before and it was around 20 lines of code. I don't have it around
> any more but maybe I can write something else like it sometime.
>
> > Is there a name to describe this technique?
>
> Incremental compression maybe?
Many thanks, this is getting me on the right path:

import zlib

compressor = zlib.compressobj()
s = compressor.compress("foobar")
s += compressor.flush(zlib.Z_SYNC_FLUSH)
# Snapshot both the output so far and the compressor state, so two
# different continuations can share the "foobar" history.
s_start = s
compressor2 = compressor.copy()

s += compressor.compress("baz")
s += compressor.flush(zlib.Z_FINISH)
print zlib.decompress(s)  # foobarbaz

s = s_start
s += compressor2.compress("spam")
s += compressor2.flush(zlib.Z_FINISH)
print zlib.decompress(s)  # foobarspam
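The same idea of reusing trained compressor state is also exposed directly by zlib's preset-dictionary support: the `zdict` argument to `compressobj`/`decompressobj` (Python 3.3+) primes both ends with shared sample text, so no per-payload dictionary is stored in the output. A minimal sketch, assuming Python 3 bytes semantics and a made-up sample corpus (`SHARED_DICT` is purely illustrative; real code would build it from representative payloads):

```python
import zlib

# Hypothetical sample of representative text; in practice this would be
# built from a corpus of the actual small payloads being compressed.
SHARED_DICT = b"the quick brown fox jumps over the lazy dog foobar baz spam"

def compress_small(payload):
    # A fresh compressor primed with the shared dictionary, so common
    # substrings can be encoded as back-references into SHARED_DICT.
    c = zlib.compressobj(zdict=SHARED_DICT)
    return c.compress(payload) + c.flush()

def decompress_small(blob):
    # The decompressor must be primed with the exact same dictionary.
    d = zlib.decompressobj(zdict=SHARED_DICT)
    return d.decompress(blob) + d.flush()

original = b"foobar baz spam"
blob = compress_small(original)
assert decompress_small(blob) == original
```

Both sides must agree on the dictionary bytes out of band, which fits the key/value-store use case where the same process (or schema version) writes and reads the records.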