[Spambayes] Question about training via the web interface
Skip Montanaro
skip at pobox.com
Wed Apr 14 22:15:43 EDT 2004
>>>>> "Tony" == Tony Meyer <tameyer at ihug.co.nz> writes:
Tony> I do think that it would be interesting to have sb_server somehow
Tony> able to use tte.py. Offhand, I'm not all that sure what the best
Tony> way to offer this would be, though.
Perhaps defer actual training until the user clicks a "train" button?
Tony> You feed tte.py a few hundred messages at a time, yes? How well
Tony> do you think it would work if you feed it a smaller number (60, or
Tony> whatever it is that sb_server displays on a single page)?
Yes, at the moment I'm feeding it about 1000 messages total. As for
training speed, feeding it 60 messages would be a breeze. As for
classification quality, I suspect it would be about the same as normal
training on a similar number of messages.
Here's a thought... Instead of blasting through your entire training set
all at once, break it into chunks of, say, 100 messages each. Train to
exhaustion (t-t-e) on the first set, then, using the resulting database,
t-t-e on the second set, and so on.
My guess is that after training to exhaustion on set 1, more messages in set
2 will score properly on the first pass and not need to be used as training
fodder. The result might be a faster run time for the entire set and a
smaller database.
maybe...
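Roughly what I have in mind, as an untested-in-anger sketch. The classifier
here is a toy stand-in, and every name in it (ToyClassifier,
train_to_exhaustion, chunked_tte, the cutoffs, the smoothing constants) is
made up for illustration; none of it is the tte.py or SpamBayes API. The
only point is the control flow: train just the messages that score wrong,
repeat until a pass has no misses, and carry the same database from chunk
to chunk.

```python
import math
from collections import Counter


class ToyClassifier:
    """Toy token classifier for illustration only (not the SpamBayes API)."""

    def __init__(self):
        self.spam, self.ham = Counter(), Counter()
        self.nspam = self.nham = 0

    def train(self, tokens, is_spam):
        # Count each distinct token once per training event.
        (self.spam if is_spam else self.ham).update(set(tokens))
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1

    def _token_prob(self, tok):
        s, h = self.spam[tok], self.ham[tok]
        n = s + h
        if n == 0:
            return 0.5  # unseen token: no evidence either way
        spam_freq = s / max(self.nspam, 1)
        ham_freq = h / max(self.nham, 1)
        p = spam_freq / (spam_freq + ham_freq)
        # Pull p toward 0.5 when the token has been seen only a few times
        # (0.45 and 0.5 are illustrative constants).
        return (0.45 * 0.5 + n * p) / (0.45 + n)

    def score(self, tokens):
        """Return a spam score in (0, 1); 0.5 means no evidence."""
        log_odds = 0.0
        for tok in set(tokens):
            f = self._token_prob(tok)
            log_odds += math.log(f / (1.0 - f))
        log_odds = max(-30.0, min(30.0, log_odds))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-log_odds))


def train_to_exhaustion(clf, messages, ham_cutoff=0.2, spam_cutoff=0.9,
                        max_rounds=20):
    """Train only on messages that score wrong; repeat until none do.

    messages is a list of (tokens, is_spam) pairs. Returns the number of
    training events performed. max_rounds bounds the loop in case the
    set never converges.
    """
    trained = 0
    for _ in range(max_rounds):
        misses = 0
        for tokens, is_spam in messages:
            s = clf.score(tokens)
            scored_ok = s >= spam_cutoff if is_spam else s <= ham_cutoff
            if not scored_ok:
                clf.train(tokens, is_spam)
                misses += 1
        trained += misses
        if misses == 0:
            break
    return trained


def chunked_tte(clf, messages, chunk_size=100):
    """Run train_to_exhaustion chunk by chunk, reusing the same classifier."""
    total = 0
    for start in range(0, len(messages), chunk_size):
        total += train_to_exhaustion(clf, messages[start:start + chunk_size])
    return total
```

Comparing the return value of chunked_tte(clf, msgs) against a single
train_to_exhaustion(clf, msgs) on a fresh classifier would be one crude way
to check whether the chunked run really needs fewer training events.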
Skip