[Tutor] Possible Useless Python Challenge.
Deirdre Saoirse Moen
deirdre@deirdre.net
Mon, 7 May 2001 22:00:17 -0700
At 10:38 PM -0500 5/7/01, Rob Andrews wrote:
>Sometimes I barely understand a word of what's currently being discussed
>and still think it sounds nifty. This is such a time.
Well, to grossly simplify the two types of compression:
Huffman uses a variable number of bits to represent characters --
more common letters are represented by fewer bits; less common by
more bits, on the theory that, on average, you'll save lots of bits.
In its simplest form (but not a very common implementation these
days), it uses three bits to store tree depth and the remaining bits
to store tree location, i.e., for characters "etaion" appearing in
descending of frequency:
000 e
/ \
001 t a
/ \ /
010 i o n
So, e = 000, t = 0011, a = 0010, i = 01011, o = 01010, n = 01001
In this form, you have to store the dictionary (i.e. the tree
diagram) as a part of the compressed document, so the doc has to be
fairly long (a few K) for the compression to be effective.
Other forms use two trees and/or a known dictionary.
Lempel-Ziv is compression based on pairs of characters that appear
together commonly.
http://www.faqs.org/faqs/compression-faq/part2/section-1.html
> > I've written both Lempel-Ziv and Huffman compressors and
> > decompressors, but not LZH per se and not in Python. Given that
>> Python has all the requisite bitwise operators, it would be possible,
>> but it'd probably be better to write a wrapper for an existing set of
> > C functions.
--
_Deirdre Stash-o-Matic: http://weirdre.com http://deirdre.net
"I love deadlines. I like the whooshing sound they make as they fly by."
- Douglas Adams