[Tutor] Possible Useless Python Challenge.

Deirdre Saoirse Moen deirdre@deirdre.net
Mon, 7 May 2001 22:00:17 -0700


At 10:38 PM -0500 5/7/01, Rob Andrews wrote:
>Sometimes I barely understand a word of what's currently being discussed
>and still think it sounds nifty. This is such a time.

Well, to grossly simplify the two types of compression:

Huffman uses a variable number of bits to represent characters -- 
more common letters are represented by fewer bits; less common by 
more bits, on the theory that, on average, you'll save lots of bits. 
In its simplest form (but not a very common implementation these 
days), it uses three bits to store tree depth and the remaining bits 
to store tree location, i.e., for characters "etaion" appearing in 
descending of frequency:

000				e
                               /   \
001                         t       a
                           /   \    /
010                      i     o   n

So, e = 000, t = 0011, a = 0010, i = 01011, o = 01010, n = 01001

In this form, you have to store the dictionary (i.e. the tree 
diagram) as a part of the compressed document, so the doc has to be 
fairly long (a few K) for the compression to be effective.

Other forms use two trees and/or a known dictionary.

Lempel-Ziv is compression based on pairs of characters that appear 
together commonly.

http://www.faqs.org/faqs/compression-faq/part2/section-1.html


>  > I've written both Lempel-Ziv and Huffman compressors and
>  > decompressors, but not LZH per se and not in Python. Given that
>>  Python has all the requisite bitwise operators, it would be possible,
>>  but it'd probably be better to write a wrapper for an existing set of
>  > C functions.

-- 
_Deirdre     Stash-o-Matic: http://weirdre.com      http://deirdre.net
"I love deadlines. I like the whooshing sound they make as they fly by."
                                                          - Douglas Adams