[Spambayes] Re: Practical Applications

Tim Peters tim.one@comcast.net
Mon, 23 Sep 2002 11:25:25 -0400


[Gary Robinson]
> Suppose I wanted to build a practical application on spambayes that
> integrated with Outlook but it just couldn't be done as elegantly with
> separate Python code communicating with Outlook via COM.
>
> Would the open source license spambayes is using allow me to translate
> spambayes into Visual Basic for Applications?????

Sure.

> I love Python and hate the idea of VBA but whatever gets the job done
> best...
>
> also, if translation is allowable, about how many lines of code does
> spambayes have?

Note that it would take a great many more lines of Visual Basic.

C:\Code\spambayes>wc *.py
    292    1296   11204 Options.py
    243     862    8603 TestDriver.py
    173     639    6355 Tester.py
    221     597    5874 cdb.py
    733    3939   30411 classifier.py
     98     346    2527 cmp.py
    314    1072    9089 hammie.py
    142     438    3977 hammiesrv.py
    253    1853   11434 heapq.py
     12      43     297 hmm.py
    101     448    3404 loosecksum.py
     89     257    2227 mboxcount.py
    174     457    4209 mboxtest.py
     91     285    2729 mboxutils.py
    172     736    6177 neilfilter.py
     59     156    1490 neiltrain.py
     10      29     250 pik.py
    582    2286   22269 pop3proxy.py
     72     166    1712 randomtrain.py
     96     361    2777 rates.py
    171     643    5094 rebal.py
    498    1954   16656 sets.py
      8      11     156 setup.py
    100     303    2445 split.py
    110     367    2909 splitn.py
    117     386    3203 splitndirs.py
     23      49     575 temp.py
    213     685    5980 timcv.py
    122     334    3141 timtest.py
   1053    5754   40349 tokenizer.py
     99     272    2794 unheader.py
   6441   27024  220317 total

C:\Code\spambayes>

The only "core code" there is:

   1053    5754   40349 tokenizer.py
    733    3939   30411 classifier.py

These have huge comment-line to code ratios, because comment blocks discuss
dozens of alternatives, and even include some full "this way and that way"
comparative test-run output.
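
For anyone who wants a rough number for that ratio, here is a minimal sketch. It is not part of spambayes; the function name comment_ratio and its counting rules are assumptions made for illustration. It treats only lines that begin with '#' as comments, so docstrings and trailing comments are counted as code.

```python
def comment_ratio(source):
    """Return (comment_lines, code_lines) for a string of Python source.

    Deliberately crude: a line is a comment only if it starts with '#'
    once leading whitespace is stripped. Blank lines are skipped, and
    docstrings and trailing comments are counted as code.
    """
    comments = code = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither
        if stripped.startswith("#"):
            comments += 1
        else:
            code += 1
    return comments, code
```

Reading a file and passing its text in, e.g. comment_ratio(open("tokenizer.py").read()), would give the two counts to compare.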

The other files are in support of data massaging, test driving, and
analyzing test output.

Oops:  sets.py and heapq.py don't fit that description; they implement set
and priority queue datatypes.

My classifier.py is bigger than anyone else's at the moment, because it
contains full implementations of Graham's scheme, the scheme on your
webpage, and the central-limit scheme.

pop3proxy.py is a very cool way to hook up any mail agent to the classifier
(you point your mail reader at *it*, and it sucks down the email from your
real pop server, adds a header line with the classifier's spam judgement,
and passes the messages so augmented on to your mail reader).
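
The "adds a header line" step can be sketched with the standard email package. This is only an illustration of the idea, not the code in pop3proxy.py: the header name X-Hypothetical-Spam-Judgement, the function add_judgement, and the keyword-matching toy_is_spam stand-in for the classifier are all assumptions.

```python
import email

# Assumed header name for illustration; the source does not give the real one.
HEADER_NAME = "X-Hypothetical-Spam-Judgement"


def toy_is_spam(text):
    """Stand-in for the real classifier: flags a message on one keyword."""
    return "viagra" in text.lower()


def add_judgement(raw_message, is_spam=toy_is_spam):
    """Return raw_message with one extra header carrying the verdict."""
    msg = email.message_from_string(raw_message)
    # Drop any copy of the header already present (e.g. forged by the
    # sender); deleting a missing header is a no-op.
    del msg[HEADER_NAME]
    msg[HEADER_NAME] = "Yes" if is_spam(raw_message) else "No"
    return msg.as_string()
```

A proxy would apply something like add_judgement to each message it retrieves from the real server before handing it to the mail reader, which can then filter on that header.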