[Spambayes] Re: Practical Applications
Tim Peters
tim.one@comcast.net
Mon, 23 Sep 2002 11:25:25 -0400
[Gary Robinson]
> Suppose I wanted to build a practical application on spambayes that
> integrated with Outlook but it just couldn't be done as elegantly with
> separate Python code communicating with Outlook via COM.
>
> Would the open source license spambayes is using allow me to translate
> spambayes into Visual Basic for Applications?????
Sure.
> I love Python and hate the idea of VBA but whatever gets the job done
> best...
>
> also, if translation is allowable, about how many lines of code does
> spambayes have?
Note that a VBA translation would take a great many more lines than the Python does.
C:\Code\spambayes>wc *.py
292 1296 11204 Options.py
243 862 8603 TestDriver.py
173 639 6355 Tester.py
221 597 5874 cdb.py
733 3939 30411 classifier.py
98 346 2527 cmp.py
314 1072 9089 hammie.py
142 438 3977 hammiesrv.py
253 1853 11434 heapq.py
12 43 297 hmm.py
101 448 3404 loosecksum.py
89 257 2227 mboxcount.py
174 457 4209 mboxtest.py
91 285 2729 mboxutils.py
172 736 6177 neilfilter.py
59 156 1490 neiltrain.py
10 29 250 pik.py
582 2286 22269 pop3proxy.py
72 166 1712 randomtrain.py
96 361 2777 rates.py
171 643 5094 rebal.py
498 1954 16656 sets.py
8 11 156 setup.py
100 303 2445 split.py
110 367 2909 splitn.py
117 386 3203 splitndirs.py
23 49 575 temp.py
213 685 5980 timcv.py
122 334 3141 timtest.py
1053 5754 40349 tokenizer.py
99 272 2794 unheader.py
6441 27024 220317 total
C:\Code\spambayes>
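If you don't have `wc` on hand, you can approximate the counts above in Python. This is a minimal sketch and not part of spambayes. The helper names `wc` and `wc_all` are hypothetical. It counts newline characters, whitespace-separated words, and raw bytes, which are the three columns `wc` prints.

```python
import glob
import os


def wc(path):
    """Return (lines, words, bytes) for one file, roughly as `wc` counts them."""
    with open(path, "rb") as f:
        data = f.read()
    # Lines = newline characters, words = whitespace-separated runs, bytes = raw size.
    return data.count(b"\n"), len(data.split()), len(data)


def wc_all(pattern="*.py"):
    """Print per-file counts plus a total line, like `wc *.py`; return the totals."""
    totals = [0, 0, 0]
    for path in sorted(glob.glob(pattern)):
        counts = wc(path)
        for i, n in enumerate(counts):
            totals[i] += n
        print("%7d %7d %7d %s" % (counts + (os.path.basename(path),)))
    print("%7d %7d %7d total" % tuple(totals))
    return tuple(totals)
```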
The only "core code" there is:
1053 5754 40349 tokenizer.py
733 3939 30411 classifier.py
These have huge comment-to-code ratios, because their comment blocks discuss
dozens of alternatives, and some even include complete output from
comparative "this way vs. that way" test runs.
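One rough way to measure such a ratio is to use Python's `tokenize` module. This sketch assumes that a "comment line" is a line holding only a `#` comment and no code. Docstrings count as code here, so the result understates prose-heavy files. The function name is hypothetical.

```python
import io
import tokenize

# Token types that carry no code of their own.
_LAYOUT = (tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
           tokenize.DEDENT, tokenize.ENDMARKER)


def count_comment_and_code_lines(source):
    """Return (comment-only lines, code lines) for a string of Python source."""
    comment_lines = set()
    code_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_lines.add(tok.start[0])
        elif tok.type not in _LAYOUT:
            # Multi-line tokens (e.g. triple-quoted strings) cover every line they span.
            code_lines.update(range(tok.start[0], tok.end[0] + 1))
    # A trailing comment on a code line counts as code, not as a comment line.
    return len(comment_lines - code_lines), len(code_lines)
```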
The other files support data massaging, test driving, and analyzing test
output.
Oops: sets.py and heapq.py implement set and priority-queue datatypes.
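Both datatypes are now available in Python's standard library, as the `heapq` module and the built-in `set` type. This is a minimal sketch of a priority queue built on `heapq`. The `PriorityQueue` class is hypothetical and does not reproduce the interface of the heapq.py listed above.

```python
import heapq


class PriorityQueue:
    """Minimal min-priority queue built on the stdlib heapq module."""

    def __init__(self):
        self._heap = []
        self._count = 0  # tie-breaker: equal priorities pop in insertion order

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, self._count, item))
        self._count += 1

    def pop(self):
        """Remove and return (priority, item) with the smallest priority."""
        priority, _, item = heapq.heappop(self._heap)
        return priority, item

    def __len__(self):
        return len(self._heap)
```

The counter in each tuple keeps `heapq` from ever comparing two items directly, so items need not be orderable themselves.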
My classifier.py is bigger than anyone else's at the moment, because it
contains full implementations of Graham's scheme, the scheme on your
webpage, and the central-limit scheme.
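To show how the first two schemes differ, here is a sketch of their combining formulas as they are commonly described. Graham's scheme multiplies the per-word spam probabilities. Robinson's scheme compares geometric means. This is an illustration, not the code in classifier.py, and the central-limit scheme is not shown. The sketch assumes the word probabilities have already been computed and clamped away from 0 and 1, and that the list is non-empty. The default of 15 clues follows Graham's "A Plan for Spam". All function names are hypothetical.

```python
import heapq
import math


def most_extreme(probs, n=15):
    """Pick the n word probabilities farthest from the neutral 0.5."""
    return heapq.nlargest(n, probs, key=lambda p: abs(p - 0.5))


def graham_combine(probs):
    """Graham-style combining: prod(p) / (prod(p) + prod(1 - p))."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def robinson_combine(probs):
    """Robinson-style combining via geometric means, rescaled to [0, 1]."""
    n = len(probs)
    P = 1.0 - math.prod(1.0 - p for p in probs) ** (1.0 / n)  # spamminess
    Q = 1.0 - math.prod(probs) ** (1.0 / n)                  # hamminess
    S = (P - Q) / (P + Q)                                    # in [-1, 1]
    return (1.0 + S) / 2.0
```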
pop3proxy.py is a very cool way to hook up any mail agent to the classifier:
you point your mail reader at *it*, and it sucks down the email from your
real POP server, adds a header line with the classifier's spam judgement,
and passes the messages so augmented on to your mail reader.
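The header-adding step can be sketched on raw message bytes. This is a hypothetical illustration, not pop3proxy.py's code. The header name `X-Spam-Judgement`, the 0.90 threshold, and the function name are all assumptions, and the score is assumed to come from the classifier already. The sketch relies only on the rule that a message's header block ends at the first empty line. It leaves everything else byte-for-byte unchanged, which suits a proxy.

```python
def add_judgement_header(raw, score, threshold=0.90, name=b"X-Spam-Judgement"):
    """Insert one header line into a raw message, leaving the rest untouched."""
    # Hypothetical header name and threshold; not taken from pop3proxy.py.
    verdict = b"spam" if score >= threshold else b"ham"
    line = name + b": " + verdict + b"; score=" + ("%.2f" % score).encode("ascii")
    # Headers end at the first empty line. POP3 delivers CRLF line endings,
    # but bare LF is accepted too.
    for sep in (b"\r\n\r\n", b"\n\n"):
        idx = raw.find(sep)
        if idx != -1:
            eol = sep[: len(sep) // 2]
            return raw[:idx] + eol + line + raw[idx:]
    # No body at all: append the new header after the existing ones.
    eol = b"\r\n" if b"\r\n" in raw else b"\n"
    if raw and not raw.endswith(b"\n"):
        raw += eol
    return raw + line + eol
```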