A little amusing Python program

Andrew Dalke dalke at dalkescientific.com
Fri Oct 5 19:24:48 EDT 2001


Jeff Sandys:
>> Another program shown at an AI conference was a
>> document classifier.  To determine which folder to add
>> the document to, it simply compared the size of the
>> tarred folders before and after adding the document.

Tom Good:
>I don't get that last part.  How does comparing the size of the
>folders before and after do anything useful?  Wouldn't all of the
>folders increase by the size of the file?

I'm assuming they were also compressed.  Depending on the
compression scheme, if a lot of text (words, phrases) is
shared between the documents, then closely related text
should compress better than more distantly related text.
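
As a rough sketch of that idea (my guess at how such a classifier
might work, not the program shown at the conference), the code below
uses zlib as the compressor and picks the folder whose compressed size
grows least when the document is appended.  The names compressed_size,
classify and the sample folder texts are all made up for illustration.

```python
import zlib


def compressed_size(text):
    """Return the number of bytes zlib needs for the given text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def classify(document, folders):
    """Pick the folder whose compressed size grows least.

    `folders` maps a folder name to the concatenated text it already
    holds.  This is a hypothetical stand-in for "tar and compress the
    folder, then compare sizes before and after".
    """
    def growth(name):
        existing = folders[name]
        before = compressed_size(existing)
        after = compressed_size(existing + "\n" + document)
        return after - before

    return min(folders, key=growth)


# Tiny, invented sample data.
folders = {
    "cooking": ("Preheat the oven to 350 degrees. Mix the flour, butter "
                "and sugar in a bowl. Bake for twenty minutes until "
                "golden brown."),
    "python": ("Import the module and call the function. A list "
               "comprehension builds a new list from an iterable. "
               "Define a class with def methods."),
}

recipe = ("Mix the flour, butter and sugar, then bake in the oven "
          "until golden brown.")
print(classify(recipe, folders))
```

With texts this short the differences are only a few bytes, so treat
it as a toy; real folders would give the compressor far more shared
vocabulary to work with.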

But that makes a lot of assumptions about the compression, like
that it doesn't reset its state after a given amount of data.
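
One related limit is easy to show with zlib: DEFLATE only looks back
through a 32 KB sliding window, so shared text further back than that
no longer helps.  This is a sketch under my own assumptions, using
random bytes as a stand-in "document" so that the only possible saving
comes from the repeat; extra_bytes and the sizes are invented for the
demonstration.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable


def random_bytes(n):
    """Return n pseudo-random bytes (essentially incompressible)."""
    return bytes(rng.getrandbits(8) for _ in range(n))


def extra_bytes(existing, new):
    """Growth in compressed size when `new` is appended to `existing`."""
    before = len(zlib.compress(existing, 9))
    after = len(zlib.compress(existing + new, 9))
    return after - before


document = random_bytes(20_000)
padding = random_bytes(40_000)

# The earlier copy is 20,000 bytes back: inside the 32 KB window.
near = extra_bytes(document, document)

# The earlier copy is 60,000 bytes back: outside the window.
far = extra_bytes(document + padding, document)

print("repeat within window adds", near, "bytes")
print("repeat beyond window adds", far, "bytes")
```

The first number should be a small fraction of the document size and
the second close to the full 20,000 bytes, which is why a scheme like
this would need a compressor with a long memory for large folders.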

                    Andrew
                    dalke at dalkescientific.com






