[spambayes-bugs] [ spambayes-Feature Requests-1206807 ] "Trojan text"
SourceForge.net
noreply at sourceforge.net
Mon May 23 06:39:33 CEST 2005
Feature Requests item #1206807, was opened at 2005-05-23 16:33
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=1206807&group_id=61702
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Matt (matthew_levine)
Assigned to: Nobody/Anonymous (nobody)
Summary: "Trojan text"
Initial Comment:
Some spam includes long sections of text from random
sources, such as excerpts from classic novels or books of
quotations, so that plenty of normal (i.e. hammy) words
help it get past filters. The actual spam content
consists of URLs and possibly images.
An obvious solution would be to search the URLs for spam
clues, and you already have this as an experimental
feature. However, that feature only works for emails that
are below a certain threshold of tokens, and the phony
text could easily push a message over that threshold. So I
suggest that the feature should either be able to check
URLs in all messages, or also kick in when certain
conditions indicate the likely presence of "Trojan text,"
such as a high number of ham words along with linked
images.
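
The trigger condition suggested above could be sketched
roughly as follows. This is only an illustration, not
SpamBayes code: the function name should_check_urls, the
ham_words set, and both thresholds are hypothetical, and a
real filter would use its trained token database rather
than a plain word set.

```python
import re

# Hypothetical patterns and thresholds, chosen for illustration only.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z']+")


def should_check_urls(body, ham_words, min_words=50, min_ham_fraction=0.8):
    """Return True if the body looks like filler prose wrapped around links.

    ham_words is an assumed set of words already known to be hammy.
    """
    # Without any URL or linked image there is nothing to look up.
    if not (URL_RE.search(body) or IMG_RE.search(body)):
        return False
    # Remove markup and URLs so that only the visible prose is counted.
    prose = URL_RE.sub(" ", TAG_RE.sub(" ", body)).lower()
    words = WORD_RE.findall(prose)
    if len(words) < min_words:
        return False
    hammy = sum(1 for w in words if w in ham_words)
    return hammy / len(words) >= min_ham_fraction
```

A message that passes this check would then have its URLs
examined regardless of how many tokens it contains.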
Additionally, I suggest that when this feature causes a
message to be registered as spam, SpamBayes should
not be spam-trained on the "Trojan text," because it was
inserted specifically to throw off spam filters, so the filter
should work better if it's ignored.
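
One way to picture the "don't train on the Trojan text"
idea is to drop tokens that already look strongly hammy
before the message is trained as spam. The sketch below
assumes a hypothetical spam_prob mapping from token to
spam probability and a made-up ham_cutoff; it does not
reflect how SpamBayes training actually works.

```python
def tokens_for_spam_training(tokens, spam_prob, ham_cutoff=0.2):
    """Keep only tokens that are not already strongly hammy.

    spam_prob is an assumed mapping of token -> spam probability.
    Unknown tokens are treated as neutral (0.5) and are kept.
    """
    return [t for t in tokens if spam_prob.get(t, 0.5) > ham_cutoff]


# Example: the filler word is dropped, the URL clue and the
# previously unseen token are kept for training.
probs = {"whale": 0.05, "url:example": 0.97}
kept = tokens_for_spam_training(["whale", "url:example", "unseen"], probs)
```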
----------------------------------------------------------------------
>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-05-23 16:39
Message:
The experimental (available with 1.0.4 or 1.1a1) URL
slurping options do more or less what you describe. Please
feel free to try them out, suggest any specific
improvements to them, and let us know whether or not they
improve your results.
Identifying text that doesn't fit with the rest of the
message is fairly complicated; DSPAM has a "noise"
detection algorithm that does this. We may try something
similar at some point.
----------------------------------------------------------------------