[Spambayes] training
Tim Stone - Four Stones Expressions
tim at fourstonesExpressions.com
Wed Feb 19 08:23:09 EST 2003
2/19/2003 4:48:36 AM, "Mark Hammond" <mhammond at skippinet.com.au> wrote:
>> > Which, coincidently, leads us to what I have been advocating
>> > for some time <wink>.
>>
>> :)
>>
>> > The core spambayes code should persist
>> > the word database as now, but also a basic "message
>> > database".
>>
>> Do you mean one like pop3proxy's cache? i.e. one that
>> expires messages over a certain age?
>
>I actually just meant a simple msg_id->trained_as_spam dictionary - just a
>memory that a message had previously been trained as ham/spam, so a need to
>untrain and multiple requests for the same message can be detected. This is
>user-proof in the face of I-double-click-everywhere type users <wink>
This is a great idea. The filesystem-based stuff (pop3proxy) will need to
keep a permanent copy of mails that have been trained in order for this to
work, but I don't have a problem with that.
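A minimal sketch of the msg_id->trained_as_spam memory Mark describes. It shows how a second, identical training request becomes a no-op and how a conflicting request triggers an untrain step first. TrainingRecord, train(), and the learn/unlearn callables are hypothetical names, not existing Spambayes APIs. A real unlearn needs the message text, not just its id, which is why pop3proxy would have to keep the trained mails around.

```python
from typing import Callable, Dict

class TrainingRecord:
    """Hypothetical 'message database': remembers how each msg_id was trained."""

    def __init__(self) -> None:
        self.trained: Dict[str, bool] = {}  # msg_id -> trained_as_spam

    def classify_request(self, msg_id: str, as_spam: bool) -> str:
        """Return 'new', 'duplicate' or 'untrain_first' for a training request."""
        previous = self.trained.get(msg_id)
        if previous is None:
            return "new"
        if previous == as_spam:
            return "duplicate"
        return "untrain_first"

def train(record: TrainingRecord, msg_id: str, as_spam: bool,
          learn: Callable[[str, bool], None],
          unlearn: Callable[[str, bool], None]) -> str:
    # learn/unlearn are hypothetical stand-ins for the classifier's own calls.
    action = record.classify_request(msg_id, as_spam)
    if action == "duplicate":
        return action  # I-double-click-everywhere user: nothing to do
    if action == "untrain_first":
        unlearn(msg_id, not as_spam)  # undo the earlier, opposite training
    learn(msg_id, as_spam)
    record.trained[msg_id] = as_spam
    return action
```

Training the same message as spam twice calls learn once. Retraining it as ham then calls unlearn (as spam) followed by learn (as ham).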
>
>> > If this sounds OK, I've a further idea I will expand in email :)
>
>I meant to say "private email", but the list is quiet at the moment
><wink>...
>
>I was thinking that we could possibly abstract the database out one step
>more. Have a single "database manager" that maintains a few 'databases' -
>really just discrete tables, with no joins, in standard database parlance.
>What I'm trying to get at is that if we could have 2 dictionaries (existing
>word dictionary, plus one more "msg_id->how_was_trained") stored in a single
>file, and maybe even the possibility of additional "application defined"
>dictionaries (such as random config info) in that same file, life would be
>pretty peachy :)
>
>If we talk in terms of pickles, imagine:
>database['bayes'] = existing_bayes_pickle
>database['training'] = dict_I_proposed_above
>database['outlook_ui'] = dict_for_outlook_ui_options
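A minimal sketch of the single-file "database manager" idea in pickle terms: several independent tables (plain dicts, no joins) live under named keys and are pickled together into one file. DatabaseManager and its methods are hypothetical names for illustration. The write-then-rename step is an added assumption, chosen so a crash mid-save cannot corrupt the existing file.

```python
import os
import pickle
import tempfile
from typing import Any, Dict

class DatabaseManager:
    """Hypothetical manager: named tables pickled together into one file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.tables: Dict[str, Dict[Any, Any]] = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.tables = pickle.load(f)

    def table(self, name: str) -> Dict[Any, Any]:
        # Create the table on first use, e.g. 'bayes', 'training', 'outlook_ui'.
        return self.tables.setdefault(name, {})

    def save(self) -> None:
        # Pickle to a temporary file in the same directory, then atomically
        # replace the old database file with it.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.tables, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp, self.path)

# Usage: the core and an application sharing one file.
path = os.path.join(tempfile.mkdtemp(), "spambayes.db")
db = DatabaseManager(path)
db.table("training")["<msg-1@example.com>"] = True
db.table("outlook_ui")["filter_enabled"] = True
db.save()
```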
We might replace Options.py with a pickled dictionary pointed to by this
dictionary. Or at least the user configurable stuff. The configurator for
bayescustomize.ini is an enormous pain, and getting worse as I try to write
'installers' for various pop3 mailers.
>
>And 'database' is pickled. I see no reason this couldn't also work for
>bsddb. I am proposing that Corpus.py automatically manage the 'bayes' and
>'training' keys of the database, but leave others for applications. Bayes
>itself persists the entire database. Some naming convention would be just
>fine too :)
Very kewl ideas.
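A minimal sketch of the same layout on a dbm-style store, with 'bayes' and 'training' reserved for the core code and application tables kept apart by a naming convention. It uses the standard shelve module, which picks whatever dbm backend is available. Python's bsddb module itself was removed in Python 3. CORE_TABLES, app_table_name(), store_table(), and the "app:table" prefix are hypothetical choices, not an agreed Spambayes convention.

```python
import os
import shelve
import tempfile
from typing import Any, Dict

CORE_TABLES = frozenset({"bayes", "training"})  # managed by the core code

def app_table_name(app: str, table: str) -> str:
    # Hypothetical convention: 'app:table' can never collide with core names.
    return f"{app}:{table}"

def store_table(db: shelve.Shelf, name: str, table: Dict[Any, Any],
                core: bool = False) -> None:
    if name in CORE_TABLES and not core:
        raise KeyError(f"{name!r} is reserved for the core code")
    # Assign the whole dict: without writeback=True, mutating db[name]
    # in place would not be written back to disk.
    db[name] = table

path = os.path.join(tempfile.mkdtemp(), "spambayes")
with shelve.open(path) as db:
    store_table(db, "training", {"<msg-1@example.com>": True}, core=True)
    store_table(db, app_table_name("outlook_ui", "options"), {"filter_enabled": True})
with shelve.open(path) as db:
    print(sorted(db.keys()))  # ['outlook_ui:options', 'training']
```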
Getting-over-my-God's-gift-to-opensourcedness-ly, TimS
>
>Never-satisfied-ly,
>
>Mark.
>
c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org