[Python-Dev] GBayes design

M.-A. Lemburg mal@egenix.com
Thu, 05 Sep 2002 19:19:57 +0200


Raymond Hettinger wrote:
> Is it too late to challenge a core design decision?
> 
> Instead of multiplying probabilities, use fuzzy logic methods.
> Classify the indicators into damning, strong, weak, neutral, ...
> 
> After counting the number of indicators in each class, make
> a spam/ham decision that can be easily tweaked.  This would
> make it easy to implement variations of Tim's recent clear
> win, where additional indicators are gathered until the
> balance shifts sharply to one side.
> 
> Some other advantages are:
> -- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... )
> -- avoids mathematical issues with indicators not being independent
> -- allows the addition of non-token-based indicators.  For instance,
>     a preponderance of caps would be a weak indicator.  The presence
>     of caps separated by spaces would be a strong indicator.
> -- the decision logic would be more intuitive
> -- avoids the issue of having equal amounts of spam and ham in
>     the sample
> 
> The core concept would stay the same -- it's really just a shift from
> continuous to discrete.
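
For illustration, the counting scheme proposed above might be sketched
roughly as follows. The cut-off values, class weights, margin, and
function names are all assumptions made up for this example; nothing
here comes from Tim's code.

```python
from collections import Counter

# Hypothetical class weights; tweak these to change the decision rule.
CLASS_WEIGHTS = {"damning": 4, "strong": 2, "weak": 1, "neutral": 0}


def classify_indicator(p):
    """Map one indicator's spam probability to a (class, side) pair."""
    strength = abs(p - 0.5)
    # Hypothetical cut-offs on the distance from the neutral value 0.5.
    if strength >= 0.49:
        label = "damning"
    elif strength >= 0.35:
        label = "strong"
    elif strength >= 0.15:
        label = "weak"
    else:
        return ("neutral", None)
    return (label, "spam" if p > 0.5 else "ham")


def score_vector(probs):
    """Count how many indicators fall into each (class, side) bucket."""
    return Counter(classify_indicator(p) for p in probs)


def decide(probs, margin=3):
    """Return 'spam', 'ham', or 'unsure' from the weighted bucket counts."""
    balance = 0
    for (label, side), count in score_vector(probs).items():
        sign = {"spam": 1, "ham": -1, None: 0}[side]
        balance += sign * CLASS_WEIGHTS[label] * count
    if balance >= margin:
        return "spam"
    if balance <= -margin:
        return "ham"
    return "unsure"
```

The score vector stays readable (so many damning, so many strong, ...),
and the decision rule can be changed by editing the weights or the
margin without touching the per-indicator probabilities.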

Hmm, there's nothing discrete about fuzzy logic (ok, this
claim is 0.65% true ;-)

The problem is more about multi-dimensional optimization where
you are interested in distilling several different inputs
into one value.

A weighted average is the simplest form to use here and there
are various multi-dimensional optimization algorithms around
to aid in finding the "optimal" weights.
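
A minimal sketch of that idea, assuming each message has already been
reduced to a fixed-length tuple of inputs in [0, 1] and labelled 1
(spam) or 0 (ham). The exhaustive grid search below is only a stand-in
for a real optimization algorithm, and the names and candidate weights
are hypothetical:

```python
from itertools import product


def weighted_average(inputs, weights):
    """Distill several inputs into one value using the given weights."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, inputs)) / total


def grid_search_weights(samples, labels, candidates=(0.0, 0.5, 1.0, 2.0)):
    """Try every combination of candidate weights and keep the one with
    the smallest squared error against the labels."""
    best_weights, best_error = None, float("inf")
    for weights in product(candidates, repeat=len(samples[0])):
        if sum(weights) == 0:
            continue  # an all-zero weighting is undefined
        error = sum(
            (weighted_average(x, weights) - y) ** 2
            for x, y in zip(samples, labels)
        )
        if error < best_error:
            best_weights, best_error = weights, error
    return best_weights, best_error
```

The grid grows exponentially with the number of inputs, so this is only
practical for a handful of them; a proper optimizer would replace the
inner loop.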

Another approach would be using a shallow neural network.

The only "problem" with these is that Tim generates a
variable number of inputs, AFAICT, so that you'd have
to use some preprocessing to make the number of inputs
constant.
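
One possible preprocessing step, sketched under the assumption that
each input is a per-indicator spam probability: keep the most extreme
values and pad with a neutral 0.5 when a message yields too few. The
function name and the default size are made up for this example.

```python
def fixed_length_inputs(probs, size=8, neutral=0.5):
    """Return exactly `size` probabilities: the ones farthest from
    `neutral` first, padded with `neutral` if there are too few."""
    ranked = sorted(probs, key=lambda p: abs(p - neutral), reverse=True)
    selected = ranked[:size]
    return selected + [neutral] * (size - len(selected))
```

Padding with 0.5 means the missing slots carry no evidence either way,
so a short message is not pushed toward spam or ham by the padding.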

Would make a nice internship project, I guess :-)

-- 
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting:                               http://www.egenix.com/
Python Software:                    http://www.egenix.com/files/python/