[Spambayes] More on Training Disparity Issues

Tony Meyer tameyer at ihug.co.nz
Mon Jul 19 11:41:41 CEST 2004


> I occasionally load the Review messages page -- it works 
> fine with this version of ZoneAlarm -- but it already shows
> mostly spam.

There are some options that may help with this.  You can set the default
action for each category (defaulting to "discard" for spam, for example),
and you can have different actions for messages scoring below and above a
threshold ("train as spam" for all spam below 0.7 and "discard" for all spam
above 0.7, for example).  You can also set the number of messages of each
type (ham, spam, unsure) to display per page (this defaults to something
like 100,000, so in all likelihood every message for that day is shown).
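
As a rough illustration of the threshold idea, here is a minimal sketch.  It
is not SpamBayes code: the function name default_spam_action is hypothetical,
and treating a score of exactly 0.7 as "train as spam" is my assumption, since
the option is only described above in terms of "below" and "above".

```python
def default_spam_action(score, threshold=0.7):
    """Pick a default review-page action for a message classified as spam.

    Hypothetical helper: scores above the threshold are discarded, and
    everything else (including a score equal to the threshold, which is
    an assumption) is queued to be trained as spam.
    """
    if score > threshold:
        return "discard"
    return "train as spam"


# Example scores for three messages already classified as spam.
actions = [default_spam_action(s) for s in (0.55, 0.7, 0.99)]
```

The point is only that the high-scoring spam gets a default that needs no
attention, so the messages worth looking at are the ones near the threshold.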

> In today's listing of untrained messages (for me, Sunday -- 
> here -- isn't a "typical" day; most days, I get more hams and
> unsures), the Review Messages page has 34 messages classified
> as unsure, 15 hams, and nearly 1200 classified as
> spam.

For example, here you could set the rows per category to something like 50,
so that most of the spam stays off the first page (you would only see it by
clicking through to the next page).

> I have to click on some of the messages and look at the View 
> Message screen.  No problem, of course, except that it's extra
> steps -- remember the volume of mail that I'm dealing with
> -- that I don't have to take if, instead, I'm looking at
> the hard copy of the actual message in my inbox and can see 
> everything at a glance.

The SMTP proxy is one way around this.  It's an alternative training method
where you forward messages ("as an attachment" with Netscape) to a special
address (e.g. spambayes-spam at localhost or spambayes-ham at localhost) and do
the training that way.  It's (currently) one message at a time, though, so
perhaps not a convenient solution.
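
To show what "forwarding as an attachment" amounts to, here is a sketch that
builds such a message with Python's standard email package.  It only
constructs the message and does not send it; the sender address, the subject
lines and the helper name build_training_forward are placeholders I made up,
and the training address is the example one mentioned above.

```python
from email.message import EmailMessage


def build_training_forward(original, training_address, sender):
    """Wrap `original` as a message/rfc822 attachment for training.

    Hypothetical helper: it only builds the forwarding message, so how
    it is delivered to the SMTP proxy is left to your mailer.
    """
    outer = EmailMessage()
    outer["From"] = sender
    outer["To"] = training_address
    outer["Subject"] = "Fwd: " + str(original["Subject"])
    outer.set_content("Forwarded for training.")
    # Attaching a message object produces a message/rfc822 part, which
    # keeps the original headers intact.
    outer.add_attachment(original)
    return outer


# A stand-in for a spam message sitting in the inbox.
spam = EmailMessage()
spam["From"] = "someone@example.com"
spam["Subject"] = "Cheap watches"
spam.set_content("Buy now!")

forward = build_training_forward(
    spam, "spambayes-spam@localhost", "me@example.com")
```

Forwarding inline instead would lose or rewrite the original headers, which
is presumably why the attachment form is the one to use.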

A better alternative, but one that's not quite ready yet, would be the
pop3dnd script (it's in the source archive, but really needs a bit more work
and a lot more testing before it's ready for prime time and a binary
release).  This lets you do drag-and-drop training in your regular mailer
(without actually integrating into the mailer, so it still doesn't matter
which mailer you use).  Not much use now, but some day...

> 2 - Here is an item maybe you or someone could fix:
> 
> Remember, my lists of untrained messages are very long.  If I 
> am on the Review screen and click on the message and look at
> it in the View Message screen, your program doesn't register
> a placeholder.  So, rather than returning to that
> message in the list, so that I can keep scanning subject 
> lines, when I return from the View Message screen to the Review
> page, I am sent back to the top of the list.

I've added a feature request to the tracker for this, so I will get to it at
some point (unless someone beats me to it).

[ 993679 ] Remember place on review page when viewing message
<http://sourceforge.net/tracker/index.php?func=detail&aid=993679&group_id=61702&atid=498106>

> Let's assume that, for example, all the messages in my Unsure 
> folder are spam. Using the "Train on a message, mbox file or
> dbx file" section of the Web Interface page, can I simply browse
> to and upload the Unsure folder and click the "Train as Spam"
> button, and it will be as effective as if a) I had trained
> on those messages from the Review messages page, and as good 
> as if b) I had displayed the page source for those messages and 
> cut-and-pasted them in their entirety into the "Train on a
> message ..." box?

Yes, that's correct.

> 2 - There's no way to sort POP3 messages by ascending/descending
> order of spam score using the Web Interface, is there?  I believe
> I've noticed some conversation about this capability in Outlook,
> but I can't figure out any way to do it with the Web Interface.

There is, with some catches.  To do this you turn on the advanced option to
add the "score" header to messages (in the "Headers" section on the Advanced
Configuration page), and the advanced option to show the "score" column in
the review page (in the "Interface Options" section on the Advanced
Configuration page).

The main catch is a bug in all the current releases: the scores are sorted
as strings rather than as numbers, so the order goes something like
"1, 10, 11, 12, ..., 2, 20, ...".  I fixed this a few days ago, but the
change won't make it out into a released version for a while.  Apart from
that, it should mostly work.
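
The string-versus-number difference is easy to see in plain Python.  This is
just an illustration of the general problem, not the actual SpamBayes code
or the actual fix, and the sample values are made up:

```python
# Scores as they might arrive from a header, i.e. as text.
scores = ["2", "10", "1", "20", "11", "3"]

# Sorting the strings compares character by character, so "10" sorts
# before "2" because "1" < "2".
as_strings = sorted(scores)

# Converting each value to a number for the comparison gives the
# order a person would expect.
as_numbers = sorted(scores, key=float)

print(as_strings)  # ['1', '10', '11', '2', '20', '3']
print(as_numbers)  # ['1', '2', '3', '10', '11', '20']
```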

> 3 - I'm not sure which folders are which.  I may have missed 
> this in the help files and FAQs, but it would be helpful to
> have a list of file names and functions and the typical
> folders in which they reside -- a sort of "typical"
> SpamBayes hierarchy.  It might be necessary to do different 
> versions for the different flavors of SpamBayes, but you'll
> know where and what and how.

I'll try to remember to add this to the appropriate documentation.  It
seems like a sensible addition - thanks!

> This list is exceptionally helpful

We try - note that there are periods when all the regular people doing the
answering get quite busy at the same time, and so replies are slow to come.
They do come eventually, though.  (And this is not one of those times.)  The
difference in time zones (e.g. I'm in NZ; I think Adam is in the US) can
also mean that responses are delayed sometimes.

=Tony Meyer

---
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes. This
way, you get everyone's help, and avoid a lack of replies when I'm busy.


