[Spambayes] Question about training via the web interface

Skip Montanaro skip at pobox.com
Wed Apr 14 22:15:43 EDT 2004


>>>>> "Tony" == Tony Meyer <tameyer at ihug.co.nz> writes:

    Tony> I do think that it would be interesting to have sb_server somehow
    Tony> able to use tte.py.  Offhand, I'm not all that sure what the best
    Tony> way to offer this would be, though.

Perhaps defer actual training until the user clicks a "train" button?
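A minimal sketch of the deferral idea follows. It assumes the review page simply records each user classification instead of training right away, then hands the whole batch to a train-to-exhaustion routine when "train" is clicked. `PendingTraining` and the `trainer` callable are hypothetical names used only for illustration. They are not part of sb_server or tte.py.

```python
from typing import Callable, List, Tuple

# (message text, is_spam) pairs collected from the review page
Batch = List[Tuple[str, bool]]


class PendingTraining:
    """Hypothetical buffer: classifications are queued, not trained on, until flush()."""

    def __init__(self, trainer: Callable[[Batch], None]) -> None:
        self._trainer = trainer  # e.g. a train-to-exhaustion routine (assumed)
        self._pending: Batch = []

    def record(self, message: str, is_spam: bool) -> None:
        # Called as the user marks each message; no training happens here.
        self._pending.append((message, is_spam))

    def flush(self) -> int:
        # Called when the user clicks "train": pass the whole batch at once
        # and return how many messages were handed over.
        batch, self._pending = self._pending, []
        if batch:
            self._trainer(batch)
        return len(batch)
```

With this arrangement, one page of reviewed messages becomes a single batch for the trainer rather than a stream of one-message training calls.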

    Tony> You feed tte.py a few hundred messages at a time, yes?  How well
    Tony> do you think it would work if you feed it a smaller number (60, or
    Tony> whatever it is that sb_server displays on a single page)?

Yes, at the moment I'm feeding it about 1000 messages total.  As for
training speed, feeding it 60 messages would be a breeze.  As for
classification quality, I suspect it would be about the same as normal
training on a similar number of messages.

Here's a thought...  Instead of blasting through your entire training set
all at once, break it into chunks, say 100 messages each.  t-t-e on the
first set, then using the resulting database t-t-e on the second set, etc.
My guess is that after training to exhaustion on set 1, more messages in set
2 will score properly on the first pass and not need to be used as training
fodder.  The result might be a faster run time for the entire set and a
smaller database.
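The chunking idea can be sketched roughly as follows. This is a simplified illustration, not tte.py itself. `ToyClassifier` is a hypothetical stand-in that only counts tokens (it is not the SpamBayes classifier), the 0.2/0.9 cutoffs are illustrative, and `train_to_exhaustion` and `chunked_tte` are hypothetical names. The point it shows is that each chunk is trained against the database left behind by earlier chunks, so messages that already score properly are never used as training fodder.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Message = Tuple[Sequence[str], bool]  # (tokens, is_spam)


class ToyClassifier:
    """Hypothetical stand-in: per-token spam/ham document counts only."""

    def __init__(self) -> None:
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def learn(self, tokens: Sequence[str], is_spam: bool) -> None:
        if is_spam:
            self.spam.update(set(tokens))
            self.nspam += 1
        else:
            self.ham.update(set(tokens))
            self.nham += 1

    def score(self, tokens: Sequence[str]) -> float:
        # Mean per-token spam probability; 0.5 if no token has been seen.
        probs = []
        for t in set(tokens):
            s = self.spam[t] / max(self.nspam, 1)
            h = self.ham[t] / max(self.nham, 1)
            if s + h:
                probs.append(s / (s + h))
        return sum(probs) / len(probs) if probs else 0.5


def scores_properly(clf: ToyClassifier, tokens: Sequence[str], is_spam: bool,
                    ham_cutoff: float = 0.2, spam_cutoff: float = 0.9) -> bool:
    p = clf.score(tokens)
    return p >= spam_cutoff if is_spam else p <= ham_cutoff


def train_to_exhaustion(clf: ToyClassifier, batch: List[Message],
                        max_rounds: int = 10) -> int:
    """Repeatedly pass over batch, training only on messages that score wrongly."""
    trained = 0
    for _ in range(max_rounds):
        mistakes = 0
        for tokens, is_spam in batch:
            if not scores_properly(clf, tokens, is_spam):
                clf.learn(tokens, is_spam)
                mistakes += 1
        trained += mistakes
        if mistakes == 0:  # everything in this batch now scores properly
            break
    return trained


def chunked_tte(messages: List[Message], chunk_size: int = 100):
    """Train to exhaustion chunk by chunk, reusing one classifier throughout."""
    clf = ToyClassifier()
    trained = 0
    for start in range(0, len(messages), chunk_size):
        trained += train_to_exhaustion(clf, messages[start:start + chunk_size])
    return clf, trained
```

Comparing the training count (and classifier size) from `chunked_tte(msgs, chunk_size=100)` against a single chunk covering the whole set, on a real corpus, would be one way to check the guess; the sketch itself makes no claim about which comes out ahead.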

maybe...

Skip
