Outlook add-in bug (Was: [Spambayes] Tokenizing ideas (images, attachments))

Harri Pesonen harri.pesonen at wicom.com
Wed Aug 27 16:55:57 EDT 2003


I have now realized that this is in fact a more serious problem. Because my
Possible (unsure) folder is also in Personal Folders, and the add-in does
not handle the HTML body correctly there, moving an e-mail from
Possible to the Spam folder does not train it properly. I just got the
following message in Possible:

<html>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<body>
<center><!--ld1u552wqc3g--><a
href="http://www.greatdf45.com/host/default.asp?ID=omni"><img
src="http://bigsalesxz.com/pics/gv1.gif" height="270"
width="405"></a></center>
</html></body>

Again, no text. I moved it to the Spam folder, where it got 100%
probability. I checked the spam clues for this message in both the Possible
and Spam folders, and they do not show the message text at all. Only when I
moved it into Inbox (where it got 1%) did the clues show the above text.

Clearly a bug, which can be avoided by creating the Spam and Unsure folders
in the Mailbox folder (where Inbox is). It seems that Personal Folders are
not supported correctly.

I moved the message again to Inbox, just to see the spam clues:

Spam Score: 0.00268853

word                                spamprob         #ham  #spam
'*H*'                               0.995217            -      -
'*S*'                               0.000594429         -      -
'url:gif'                           0.0266272           8      0
'url:default'                       0.0348837           6      0
'url:bigsalesxz'                    0.0918367           2      0
'url:greatdf45'                     0.0918367           2      0
'url:gv1'                           0.0918367           2      0
'url:omni'                          0.0918367           2      0
'url:pics'                          0.0918367           2      0
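
The spamprob values in the table can be reproduced from the #ham and #spam
columns. The sketch below is my reading of the numbers, not code taken from
the add-in: it assumes a Robinson-style adjustment with a prior of 0.5 for
unknown words and a prior strength of 0.45. The function name and the
message totals (nham, nspam) are made up for illustration; with #spam = 0
the totals do not affect the result.

```python
def spamprob(ham_count, spam_count, nham, nspam, s=0.45, x=0.5):
    """Adjusted spam probability for one token.

    s (prior strength) and x (prior for unknown words) are assumed
    values that happen to match the clue table; nham and nspam are
    the total numbers of trained ham and spam messages.
    """
    ham_ratio = ham_count / nham if nham else 0.0
    spam_ratio = spam_count / nspam if nspam else 0.0
    total = ham_ratio + spam_ratio
    # Raw probability from the counts alone; fall back to the prior
    # when the token has never been seen.
    raw = spam_ratio / total if total else x
    n = ham_count + spam_count
    # Pull the raw estimate toward the prior, weighted by the evidence.
    return (s * x + n * raw) / (s + n)


# Hypothetical totals; any positive values give the same result here.
for ham in (8, 6, 2):
    print(ham, round(spamprob(ham, 0, nham=100, nspam=100), 7))
# 8 0.0266272
# 6 0.0348837
# 2 0.0918367
```

So a token seen only twice, both times in ham, already pulls the score
strongly towards ham.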

It is clear that moving the message back to Inbox twice corrupted the
database. Moving it to the Spam folder does not remove the ham status from
these url tokens. How to fix this... I created the Spam and Unsure folders
in Mailbox, and moved the message to Spam. It got 95%. Now all these
urls have 1 in both #ham and #spam. Then I added one folder in Personal
Folders as a filter folder, and moved the message there and back. Now these
urls have 0 in #ham and 2 in #spam. :-)
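
The counts I saw are consistent with each corrective move undoing one ham
training and adding one spam training. A minimal sketch of that bookkeeping
follows. TokenCounts and its methods are hypothetical names for
illustration, not the add-in's actual classes.

```python
from collections import defaultdict


class TokenCounts:
    """Toy per-token ham/spam counters (hypothetical, for illustration)."""

    def __init__(self):
        self.ham = defaultdict(int)
        self.spam = defaultdict(int)

    def train(self, tokens, is_spam):
        table = self.spam if is_spam else self.ham
        for tok in set(tokens):  # count each token once per message
            table[tok] += 1

    def untrain(self, tokens, is_spam):
        table = self.spam if is_spam else self.ham
        for tok in set(tokens):
            if table[tok] > 0:  # never go below zero
                table[tok] -= 1

    def reclassify(self, tokens, was_spam, is_spam):
        # Remove the old training first, then add the new one.
        self.untrain(tokens, was_spam)
        self.train(tokens, is_spam)

    def counts(self, tok):
        return (self.ham[tok], self.spam[tok])


db = TokenCounts()
msg = ["url:greatdf45", "url:bigsalesxz"]

# The message was trained as ham twice (moved to Inbox twice).
db.train(msg, is_spam=False)
db.train(msg, is_spam=False)
print(db.counts("url:greatdf45"))  # (2, 0)

# Each move to a spam folder fixes only one of the ham trainings.
db.reclassify(msg, was_spam=False, is_spam=True)
print(db.counts("url:greatdf45"))  # (1, 1)
db.reclassify(msg, was_spam=False, is_spam=True)
print(db.counts("url:greatdf45"))  # (0, 2)
```

If the add-in cannot read the message body in a folder, the untrain step
sees no url tokens at all, which would explain why the ham counts stayed.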

Btw, see url:greatdf45: it again has digits at the end of the domain name.
If a spammer has tens of thousands of these, it would be better to
remove the trailing digits when tokenizing. The other url, bigsalesxz,
appends the letters xz. The tokenizer should remove these as well, so
that it yields url:bigsales. But how to do that is another question.
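
The digit part of the idea is easy to sketch. The code below is only an
illustration in Python 3, not the Spambayes tokenizer: url_host_tokens is a
made-up helper that strips trailing digits from each label of the host
name. It deliberately leaves letter suffixes such as xz alone, because I do
not know a reliable rule for those.

```python
import re
from urllib.parse import urlparse

_TRAILING_DIGITS = re.compile(r"\d+$")


def url_host_tokens(url):
    """Return 'url:' tokens for the host name, minus trailing digits.

    Hypothetical helper for illustration; the real tokenizer also
    emits tokens for the path and query, which are ignored here.
    """
    host = urlparse(url).hostname or ""
    tokens = []
    for label in host.split("."):
        if not label:
            continue
        stripped = _TRAILING_DIGITS.sub("", label)
        # Keep all-digit labels (e.g. parts of an IP address) intact.
        tokens.append("url:" + (stripped or label))
    return tokens


print(url_host_tokens("http://www.greatdf45.com/host/default.asp?ID=omni"))
# ['url:www', 'url:greatdf', 'url:com']
print(url_host_tokens("http://bigsalesxz.com/pics/gv1.gif"))
# ['url:bigsalesxz', 'url:com']
```

Whether this helps or hurts would have to be tested against real ham,
because legitimate domains can end in digits too.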

Harri

-----Original Message-----
From: Harri Pesonen 
Sent: 27 August 2003 11:18
To: spambayes at python.org
Subject: RE: [Spambayes] Tokenizing ideas (images, attachments)


It seems that the plugin shows the html body correctly when I check the
messages in Inbox. When I check the messages in Spam folder, which is in
Personal Folders, then it shows only the plain text. So no big problem
here, because it works in Inbox.

-----Original Message-----
From: Meyer, Tony [mailto:T.A.Meyer at massey.ac.nz] 
Sent: 27 August 2003 10:48
To: Harri Pesonen
Cc: spambayes at python.org
Subject: RE: [Spambayes] Tokenizing ideas (images, attachments)


> I use the latest 0.7. I have done some Outlook programming as 
> well, and I know that there is a separate HTMLBody property, 
> I guess that the plugin just gets the Body property, that is 
> empty in this case.

No, the plugin definitely looks at the html as well.  If you look at the
clues for a text/plain message, do you see the message body in the
stream?  If you look at the clues for a text/html message, do you see
the body in the stream?  It looks more like something is awry with the
plug-in/setup.

> Thanks for the :1 explanation.

No worries.

=Tony Meyer


