[spambayes-bugs] [ spambayes-Bugs-1007285 ] unable to train empty mail

SourceForge.net noreply at sourceforge.net
Wed Nov 3 05:18:08 CET 2004


Bugs item #1007285, was opened at 2004-08-12 02:00
Message generated for change (Settings changed) made by anadelonbrin
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=1007285&group_id=61702

Category: pop3proxy
Group: Source code 1.0rc1
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Fredrik (fjoll)
Assigned to: Nobody/Anonymous (nobody)
Summary: unable to train empty mail

Initial Comment:
I've recently begun to receive empty emails and
SpamBayes is unable to train on these :(

they look like this:

X-Apparently-To: xxxxx at xxxxx via 206.190.37.64; Wed, 11
Aug 2004 05:47:47 -0700
X-Originating-IP: [213.214.194.102]
Return-Path: <vhjlvtbr at inbox.ru>
Received: from 213.214.194.102  (EHLO smtp2.home.se)
(213.214.194.102)
  by mta442.mail.scd.yahoo.com with SMTP; Wed, 11 Aug
2004 05:47:47 -0700
Received: from lns-th2-5f-81-56-228-176.adsl.proxad.net
not authenticated [81.56.228.176]
	by smtp2.home.se with NetMail SMTP Agent $Revision:  
3.22.1.8  $ on Novell NetWare;
	Wed, 11 Aug 2004 14:23:31 +0200
X-Message-Info: y[1
X-Spambayes-Classification: unsure
Subject: unsure
X-Spambayes-MailId: 1092232483

------
Since I get a pretty hefty number of these, it sure would
be nice to be able to filter them. I'm running pop3-proxy
rc2 from source on a GNU system.

----------------------------------------------------------------------

Comment By: Fredrik (fjoll)
Date: 2004-08-12 10:41

Message:
Logged In: YES 
user_id=940770

"By "unable to train these" I assume you mean that no matter
how many of them you train on, SpamBayes still classifies
them as unsure."

Exactly, sorry for not making it clearer. And thanks for
the tip. I'll try it out and see how it works.

----------------------------------------------------------------------

Comment By: Kenny Pitt (kpitt)
Date: 2004-08-12 02:45

Message:
Logged In: YES 
user_id=859086

By "unable to train these" I assume you mean that no matter
how many of them you train on, SpamBayes still classifies
them as unsure.

This is a known issue with extremely short messages,
particularly in this case where there is no body at all. The
problem is that there are so few clues available for
SpamBayes to use in classifying the message.

There are a couple of options that IIRC are not enabled by
default, but may provide additional clues to help classify
these messages. One of them is available from the Advanced
Configuration page, but the other must be manually added to
the config file so I'll just show both of them that way.

In your SpamBayes data directory along with your training
databases, you should find a "bayescustomize.ini" file. I
run Windows so I'm afraid I don't know what path is used on
Linux, but I assume it's somewhere under your HOME
directory. In this file, look for a section starting with
"[Tokenizer]". If that heading is not present then add it.
In that section, add the option settings shown below:

[Tokenizer]
mine_received_headers:True
record_header_absence:True

The mine_received_headers option adds tokens derived from
the Received: headers, and the record_header_absence option
adds tokens to indicate when common headers such as From:,
To:, or Subject: are missing.
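
If you would rather script the change than edit the file by hand,
something like the following might do it. This is only a rough
sketch using Python's standard configparser module, not anything
shipped with SpamBayes. The function name enable_tokenizer_options
is made up for illustration, and you have to supply the real
location of bayescustomize.ini yourself. Note that configparser
drops comments and lowercases option names when it rewrites a
file, so back the file up first.

```python
import configparser


def enable_tokenizer_options(ini_path):
    """Turn on the two [Tokenizer] options in the given ini file.

    Hypothetical helper: ini_path must point at your own
    bayescustomize.ini; no default location is assumed here.
    """
    # ":" is listed first so the file is written back in the same
    # "name:value" style used in the settings shown in this comment.
    parser = configparser.ConfigParser(delimiters=(":", "="),
                                       interpolation=None)
    parser.read(ini_path)  # a missing file is simply treated as empty

    # Add the section heading if it is not already present.
    if not parser.has_section("Tokenizer"):
        parser.add_section("Tokenizer")

    parser.set("Tokenizer", "mine_received_headers", "True")
    parser.set("Tokenizer", "record_header_absence", "True")

    # Caution: rewriting the file discards any comments it contained.
    with open(ini_path, "w") as f:
        parser.write(f, space_around_delimiters=False)
```

Call it with the full path to your own bayescustomize.ini, then
restart the proxy so the new settings are picked up.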

I can't guarantee that this will improve your results, but
it might.  I notice in particular that the message you
included has no From: header, and recording the absence of
that header could quickly become a strong spam clue since
very few legitimate messages will be missing the From: header.
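
To give a feel for the kind of clue this produces, here is a small
sketch using Python's standard email module. It is not the
SpamBayes tokenizer itself: the function name, the list of headers
checked, and the "noheader:" token spelling are assumptions made
for illustration, and the sample is a cut-down version of the
message in the original report.

```python
from email import message_from_string

# Assumption: an illustrative list of headers to check, not the
# list SpamBayes actually uses.
COMMON_HEADERS = ("from", "to", "subject")


def absent_header_tokens(raw_message, headers=COMMON_HEADERS):
    """Return one token per listed header missing from the message."""
    msg = message_from_string(raw_message)
    # Header lookup in the email module is case-insensitive.
    return ["noheader:" + name for name in headers
            if msg.get(name) is None]


# A few of the headers from the reported message, with an empty body.
SAMPLE = (
    "Return-Path: <vhjlvtbr at inbox.ru>\n"
    "X-Originating-IP: [213.214.194.102]\n"
    "Subject: unsure\n"
    "\n"
)

tokens = absent_header_tokens(SAMPLE)
```

For the sample above this yields tokens for the missing From: and
To: headers but none for Subject:, which is present. Once a few
such messages have been trained as spam, tokens like these give
the classifier something to work with even when the body is empty.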

Unfortunately, if this doesn't help then there probably
isn't much else we can do. SpamBayes can only work with the
data that is available in the message, and there just isn't
much data here to work with.

----------------------------------------------------------------------
