[Spambayes] Inspecting images (was: SpamBayes toHandleEmbeddedImages)
FreeMJ at HotPop.com
Thu Oct 27 05:55:00 CEST 2005
Ken,
Please post the entire Spambayes Clues listing, so I can see what Spambayes
is doing with all the erroneous ham text that's included at the bottom of
your e-mail message example.
Thanks,
FMJ
-----Original Message-----
From: spambayes-bounces at python.org [mailto:spambayes-bounces at python.org] On
Behalf Of Ken Gordon
Sent: Wednesday, October 26, 2005 8:26 AM
To: spambayes at python.org; <FreeMJ at HotPop.com>
Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
HandleEmbeddedImages)
There's a lot more to spambayes than just evaluating content. Here's the SB
Evidence header from a recent spam. But for 'charset', very little of this
has to do with the content, yet it was correctly classified as spam.
> X-Spambayes-Evidence: '*H*': 0.00; '*S*': 1.00; 'received:192.168.1':
> 0.10; 'subject:skip:B 10': 0.16; 'received:192.168': 0.20;
> 'received:192': 0.21; 'url:www': 0.23; 'content-type:image/jpeg':
> 0.34; 'to:addr:none': 0.38; 'header:Return-Path:1': 0.38;
> 'header:MIME-Version:1': 0.61; 'url:': 0.64; 'x-mailer:none': 0.71;
> 'to:no real name:2**0': 0.72; 'from:name:\x1b$b5z at nf`1{\x1b(b': 0.84;
> 'message-id:@imx100522.ath.cx': 0.84; 'received:imx100522.ath.cx':
> 0.84; 'url:fetish': 0.84; 'received:192.168.1.11': 0.91;
> 'received:kick': 0.91; 'content-type:multipart/related': 0.92;
> 'received:210.153': 0.93; 'received:ath.cx': 0.93; 'url:cc': 0.93;
> 'virus:src="cid:': 0.95; 'content-type/type:multipart/alternative':
> 0.96; 'received:cx': 0.97; 'email addr:yahoo.co.jp': 0.99; 'skip:\x1b
> 80': 0.99; 'from:addr:yahoo.co.jp': 1.00; 'from:charset:iso-2022-jp':
> 1.00; 'skip:\x1b 60': 1.00; 'skip:\x1b 30': 1.00; 'skip:\x1b 20':
> 1.00; 'skip:\x1b 50': 1.00; 'subject:$': 1.00; 'received:210': 1.00;
> 'charset:iso-2022-jp': 1.00; 'subject:\x1b$': 1.00;
> 'subjectcharset:iso-2022-jp': 1.00
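
For anyone who wants to look at a header like the one above programmatically, here is a minimal Python sketch. It assumes the header value is a list of quoted-token/probability pairs separated by semicolons, as in the example; the names `parse_evidence` and `strongest_spam_clues` are made up for illustration and are not part of SpamBayes.

```python
import re

# One clue looks like  'token': 0.84  with the token in single or double quotes.
_CLUE = re.compile(r"""('(?:[^'\\]|\\.)*'|"(?:[^"\\]|\\.)*"):\s*([0-9.]+)""")


def parse_evidence(value):
    """Return (token, probability) pairs from an X-Spambayes-Evidence value.

    Assumption: clues are formatted as quoted token, colon, number.
    Folded header lines are unwrapped by collapsing runs of whitespace.
    """
    value = " ".join(value.split())
    return [(tok[1:-1], float(prob)) for tok, prob in _CLUE.findall(value)]


def strongest_spam_clues(clues, threshold=0.9):
    """Keep clues at or above the threshold, skipping the *H*/*S* summary entries."""
    return [(t, p) for t, p in clues if p >= threshold and t not in ("*H*", "*S*")]


sample = ("'*H*': 0.00; '*S*': 1.00; 'url:www': 0.23; "
          "'virus:src=\"cid:': 0.95; 'charset:iso-2022-jp': 1.00")
for token, prob in strongest_spam_clues(parse_evidence(sample)):
    print("%-25s %.2f" % (token, prob))
```

Run against the full header, this makes it easy to see that most of the high-scoring clues are header-derived tokens (received:, from:, charset:) rather than body words.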
On 2005 Oct 25, at 8:37, <FreeMJ at HotPop.com> wrote:
> How? Technically speaking, what could your SpamBayes installation be
> doing differently? These are ALL ham words, so how is it that your
> e-mail could be classifying all of this as Spam? If it is, I suspect
> you're losing a lot of legitimate e-mail with it.
>
> FMJ
>
> -----Original Message-----
> From: Ken Gordon [mailto:ksg at telusplanet.net]
> Sent: Monday, October 24, 2005 8:58 PM
> To: FreeMJ at HotPop.com
> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
> HandleEmbeddedImages)
>
> My installation of SpamBayes catches nearly all of these. I don't see
> one a month outside of the Spam folder.
>
> ---
> Ken Gordon
> (780) 628-2758
> http://www.wolfe-gordon.ca
> On 2005 Oct 24, at 20:18, <FreeMJ at HotPop.com> wrote:
>
>> Hi Tony,
>> The problem is that they keep changing the meaningless text at the
>> bottom of the e-mail to confuse the spam filter. They're picking
>> hammy words, and, as you can see, it's a highly effective technique.
>> In other words, NONE of the "Tokens" should actually be
>> "Significant"; it's the image that needs to be scored in this case.
>> Here are the SpamBayes clues for one of the e-mails:
>>
>> Combined Score: 3% (0.0330173)
>> Internal ham score (*H*): 0.999976
>> Internal spam score (*S*): 0.0660102
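
To show how the three numbers above relate, here is a minimal Python sketch of chi-squared combining as I understand SpamBayes to do it by default. The function names `combine` and `chi2_score` are hypothetical, and the sketch leaves out the underflow handling a real implementation needs for long messages. The last step, (S - H + 1) / 2, can be checked against the figures in this listing.

```python
import math


def chi2Q(x2, v):
    """Probability that a chi-squared variable with v (even) degrees of
    freedom is at least x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def combine(S, H):
    """Fold the internal spam (S) and ham (H) scores into one 0..1 score."""
    return (S - H + 1.0) / 2.0


def chi2_score(probs):
    """Score a message from its token spam probabilities (each strictly
    between 0 and 1). Assumption: no underflow protection, sketch only."""
    n = len(probs)
    if n == 0:
        return 0.5
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return combine(S, H)


# Internal scores reported in the clues listing.
print(round(combine(0.0660102, 0.999976), 4))
```

With H this close to 1, the many low-probability (hammy) tokens pull the combined score down to about 3% even though S is not zero.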
>>
>> # ham trained on: 14237
>> # spam trained on: 20138
>>
>> 150 Significant Tokens
>> token spamprob #ham #spam
>> 'sender:no real name:2**0' 0.0277535 2187 88
>> 'dismissed' 0.0374933 314 17
>> 'raising' 0.0417704 313 19
>> 'lives' 0.0580962 1012 88
>> 'ill' 0.0613924 1084 100
>> 'said' 0.0677803 6498 668
>> 'two' 0.08226 5200 659
>> 'put' 0.0828439 2632 336
>> 'were' 0.0845653 6094 796
>> 'recalled' 0.0862187 92 12
>> 'town' 0.0883783 600 82
>> 'being' 0.0894639 4312 599
>> 'letter' 0.093344 1595 232
>> 'unless' 0.0960663 687 103
>> 'stephan' 0.0968154 15 2
>> 'face' 0.0986506 1397 216
>> 'who' 0.0991493 8031 1250
>> 'knows' 0.102049 574 92
>> 'anyone' 0.104976 1828 303
>> 'them' 0.106325 4690 789
>> 'think' 0.107446 3584 610
>> 'keep' 0.109385 2517 437
>> 'him' 0.111552 2631 467
>> 'suspicions' 0.113796 40 7
>> 'went' 0.11401 1331 242
>> 'sound' 0.116592 596 111
>> 'care' 0.117491 1244 234
>> 'going' 0.119623 3503 673
>> 'sort' 0.119677 511 98
>> 'his' 0.119861 5717 1101
>> 'remained' 0.11998 271 52
>> 'heavily' 0.123551 232 46
>> 'last' 0.126157 5241 1070
>> 'subject:: ' 0.134951 9110 2010
>> 'voice' 0.135891 644 143
>> 'walk' 0.140296 339 78
>> 'everyone' 0.140502 1225 283
>> 'whatever' 0.141645 618 144
>> 'overdosed' 0.142155 48 11
>> 'mother' 0.144908 510 122
>> 'way' 0.146154 3458 837
>> 'was' 0.146612 8939 2172
>> 'would' 0.146893 7679 1870
>> 'but' 0.14865 8435 2083
>> 'past' 0.155513 1932 503
>> 'duty' 0.15756 326 86
>> 'been' 0.158577 6937 1849
>> 'away' 0.159247 1632 437
>> 'soon' 0.16154 1021 278
>> 'header:In-Reply-To:1' 0.162139 1791 490
>> 'made' 0.163602 3467 959
>> 'true' 0.164161 566 157
>> 'too' 0.164462 2199 612
>> 'then' 0.167186 3519 999
>> 'road' 0.169212 459 132
>> 'covington' 0.170591 18 5
>> 'firmly' 0.171729 69 20
>> 'received' 0.172468 1646 485
>> 'yes' 0.17276 275 81
>> 'other' 0.174723 6686 2002
>> 'offered' 0.177462 702 214
>> 'saw' 0.178119 738 226
>> 'might' 0.184601 2399 768
>> 'hotel' 0.185114 203 65
>> 'thought' 0.186457 1287 417
>> 'her' 0.187192 2831 922
>> 'indeed' 0.18721 191 62
>> 'lie' 0.188538 165 54
>> 'filled' 0.188682 329 108
>> 'assorted' 0.198662 32 11
>> 'intent' 0.199592 596 210
>> 'manner' 0.200765 192 68
>> 'second' 0.203991 1311 475
>> 'let' 0.207891 1835 681
>> 'much' 0.210328 3345 1260
>> 'back' 0.211425 3207 1216
>> 'place' 0.214507 1704 658
>> 'out' 0.216398 6503 2540
>> 'little' 0.218176 2273 897
>> 'within' 0.218497 1940 767
>> 'occupied' 0.218989 56 22
>> 'never' 0.222876 2224 902
>> 'take' 0.223351 4101 1668
>> 'subject:-' 0.223886 2564 1046
>> 'find' 0.224822 2482 1018
>> 'play' 0.230279 518 219
>> 'skip:n 10' 0.233772 2561 1105
>> 'eyes' 0.234231 294 127
>> 'that' 0.245614 11155 5137
>> 'thoughts' 0.250399 193 91
>> 'observed' 0.252899 109 52
>> 'not' 0.253605 9451 4542
>> 'have' 0.260054 10350 5145
>> 'myself' 0.268888 281 146
>> 'with' 0.272839 10712 5685
>> 'skip:r 10' 0.274264 4752 2540
>> 'look' 0.276317 1963 1060
>> 'can' 0.286752 7254 4125
>> 'guided' 0.29442 24 14
>> 'all' 0.300499 8283 5033
>> 'resign' 0.304561 39 24
>> 'contracts' 0.313223 163 105
>> 'subject:Alert' 0.322897 61 41
>> 'upon' 0.326586 853 585
>> 'skip:i 10' 0.332672 4717 3326
>> 'for' 0.339583 12494 9087
>> 'topics' 0.371008 114 95
>> 'the' 0.371613 13338 11157
>> 'above' 0.380529 678 589
>> 'header:Return-Path:1' 0.635635 6219 15346
>> 'consults' 0.695316 3 10
>> 'comparative' 0.728703 17 65
>> 'earnest' 0.747547 24 101
>> 'friendship' 0.796906 6 34
>> 'blush' 0.797234 13 73
>> 'skip:7 70' 0.805302 5 30
>> 'expedition' 0.825248 9 61
>> 'from:addr:g.wcvbss' 0.844828 0 1
>> 'from:addr:netnitco.net' 0.844828 0 1
>> 'from:name:raymond goins' 0.844828 0 1
>> 'lensalizarin' 0.844828 0 1
>> "m'scorset" 0.844828 0 1
>> 'message-id:@icsp.net' 0.844828 0 1
>> 'ownthat' 0.844828 0 1
>> 'prominents' 0.844828 0 1
>> 'roadsthat' 0.844828 0 1
>> 'sender:addr:athenet.net' 0.844828 0 1
>> 'sender:addr:h.nnq' 0.844828 0 1
>> 'subject:< ' 0.844828 0 1
>> 'subject:Stiles' 0.844828 0 1
>> 'totrue' 0.844828 0 1
>> 'virus:src="cid:' 0.888282 111 1250
>> 'congenial' 0.905802 5 70
>> 'taters' 0.907976 1 16
>> 'skip:7 90' 0.908163 0 2
>> 'header:Received:2' 0.914966 886 13487
>> 'diem' 0.92631 3 56
>> 'subject:CBXC' 0.949438 0 4
>> 'rotund' 0.952904 1 33
>> 'blushingly' 0.958716 0 5
>> 'refolding' 0.969799 0 7
>> 'egress' 0.970088 1 53
>> 'to:name:freemj' 0.988432 0 19
>> 'septennial' 0.990405 0 23
>> 'veal' 0.993066 0 32
>> 'youll' 0.993469 0 34
>> 'subject:Stock' 0.99571 0 52
>> 'casteth' 0.995868 0 54
>> 'cutlet' 0.996894 0 72
>> 'to:addr:hotpop.com' 0.997792 23 14803
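
The spamprob column in the table above can be reproduced from the #ham and #spam counts and the training totals. The following Python sketch assumes the Robinson-style adjustment with an unknown-word strength of 0.45 and an unknown-word probability of 0.5; I believe those are the SpamBayes defaults, but treat them as assumptions. The function name `token_spamprob` is made up for illustration.

```python
UNKNOWN_WORD_STRENGTH = 0.45  # assumed default
UNKNOWN_WORD_PROB = 0.5       # assumed default


def token_spamprob(ham_count, spam_count, nham, nspam,
                   s=UNKNOWN_WORD_STRENGTH, x=UNKNOWN_WORD_PROB):
    """Estimate the spam probability of one token.

    ham_count/spam_count: trained messages of each kind containing the token.
    nham/nspam: total trained ham and spam messages.
    """
    n = ham_count + spam_count
    if n == 0:
        return x  # never seen: fall back to the unknown-word probability
    ham_ratio = ham_count / nham
    spam_ratio = spam_count / nspam
    prob = spam_ratio / (ham_ratio + spam_ratio)
    # Pull the raw ratio toward x; the pull weakens as evidence (n) grows.
    return (s * x + n * prob) / (s + n)


NHAM, NSPAM = 14237, 20138  # training totals from the listing
print(round(token_spamprob(314, 17, NHAM, NSPAM), 6))  # 'dismissed'
print(round(token_spamprob(0, 1, NHAM, NSPAM), 6))     # 'from:addr:g.wcvbss'
```

This also explains why every token seen once in spam and never in ham sits at the same value (0.844828): a single sighting is only allowed to move the estimate part of the way from 0.5 toward 1.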
>>
>> Message Stream
>> Return-Path: <H.jykqli at valkyrie.net>
>> Received: from 38.113.3.52 (unknown [200.107.173.172])
>> by mx1.hotpop.com (Postfix) with SMTP
>> id 5B8A0E8304; Sun, 23 Oct 2005 23:49:29 +0000 (UTC)
>> Received: from spellbound.gape.jeffersonian.gauguin.es
>> ([200.107.173.172]
>> helo=scatterbrain.mail.elknet.net) by smtp9.bt.com with esmtp
>> id 0X162p-8865LL-80; Mon, 24 Oct 2005 01:48:41 +0100
>> Message-Id: <8927397790.37444460700 at icsp.net>
>> Sender: H.nnq at athenet.net
>> Date: Sun, 23 Oct 2005 20:42:41 -0400
>> In-Reply-To: Your message of "Sun, 23 Oct 2005 20:46:41 -0400."
>> <98802417987115.YV37184 at joel.renaissance.arden.net>
>> From: "Raymond Goins" <G.wcvbss at netnitco.net>
>> To: "Freemj" <freemj at hotpop.com>
>> Subject: Fwd: Stock - Alert-CBXC< Neil Stiles
>> MIME-Version: 1.0
>> Content-Type: multipart/related;
>> boundary="--ZZR8PVzcRDTpf2Pu68MQiz"
>> X-HotPOP-Delivered-To: freemj at hotpop.com
>>
>>
>> negligiblestymie breakwatergrist m'scorset
>>
>>
>>
>> We went to the triumph comparative at egress diem then a mouldy sort
>> of establishment have my place so I blushingly offered to resign it
>> The septennial who made as much of my going away as if I were going
>> to China received me as an was dismissed and other topics occupied us
>> he remained so seldom raising
>> his eyes unless to
>> true Rosanne was suspicions arose within me that it was an ill
>> assorted friendship that he never thought of being observed by anyone
>> but was so intent upon her and upon his ownthat I received soon
>> recalled me to myself and put me in the road back to the hotel I was
>> so filled with the play and with the past for it was in a manner
>> Everyone who knows you consults with you and is guided by you Stephan
>> but on second thoughts I shall keep him to take care of me
>> and refolding the letter it would be insupportable to me to think of
>> I am in earnest at last so youll soon have to arrange our contracts
>> and to bind us firmly to them been overdosed with taters I commanded
>> him in my deepest voice to order a veal cutlet and potatoes Yes I am
>> on an expedition of duty My mother lives a little way out of town and
>> the roadsthat I received soon recalled me to myself and put me in the
>> road back to the hotel for I saw a faint blush in her face you would
>> have let me find it out for myself that would not lie too heavily
>> upon her purse and to do my duty in it whatever it might be and the
>> prominents walk and the congenial sound of the rotund casteth
>> hovering above them all
>> 7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg7Mik
>> U
>> R
>> q
>> reWfg7M6dwtJ4t1Fxn
>> as he can look at me out of his two eyes Is he indeed said Mr
>> Covington
>>
>> <HTML><HEAD>
>> <META http-equiv=Content-Type content="text/html;
>> charset=windows-1252"> <TITLE>lensalizarin impregnatecost</TITLE>
>> </HEAD> <BODY> <TABLE BORDER="0" CELLPADDING="0" CELLSPACING="0">
>> <TR><TD><font></font><font></font>
>> <BR><STRONG></STRONG><IMG
>> SRC="cid:lTN1QnT11CtJIk8H6J5X7INGgMff2pS at prairieweb.com" border="0"
>> ALT="negligiblestymie breakwatergrist m'scorset">
>> <BR><STRONG></STRONG><font></font><FONT face="Verdana"
>> size=1><FONT></FONT></font></TD></TR><TR><TD><FONT
>> size=1><BR><BR><font></font><STRONG></STRONG>We went to the triumph
>> comparative at egress diem then a mouldy sort of
>> establishment<BR>have my place so I blushingly offered to resign it
>> <STRONG></STRONG><STRONG></STRONG>The septennial who made as much of
>> my going away as if I were going to China received me as an<BR>was
>> dismissed and other topics occupied us he remained so seldom raising
>> his eyes unless to</FONT></TD></TR><TR><TD><FONT size=1>true Rosanne
>> was suspicions arose within me that it was an ill assorted
>> friendship <BR>that he never thought of being observed by anyone but
>> was so intent upon her and upon his own<FONT
>> SIZE=2></FONT><font></font>that I received soon recalled me to
>> myself and put me in the road back to the hotel<BR>I was so filled
>> with the
>> play and with the past for it was in a manner<BR>Everyone who
>> knows you
>> consults with you and is guided by you Stephan <BR>but on second
>> thoughts I shall keep him to take care of me
>> </FONT></TD></TR><TR><TD><FONT size=1>and refolding the letter it
>> would be
>> insupportable to me to think of <BR>I am in earnest at last so
>> youll soon
>> have to arrange our contracts and to bind us firmly to
>> them<font></font><BR>been overdosed with taters I commanded him in
>> my deepest voice to order a veal cutlet and potatoes<BR>Yes I am on
>> an expedition of duty My mother lives a little way out of town and
>> the roads<font></font><FONT SIZE=2></FONT>that I received soon
>> recalled me to myself and put me in the road back to the
>> hotel<BR>for I saw a faint blush in her face you would have let me
>> find it out for myself <font></font>that would not lie too heavily
>> upon her purse and to do my duty in it whatever it might be <BR>and
>> the prominents walk and the congenial sound of the rotund casteth
>> hovering above them all
>> <BR>7iVHrKDJTsgBJsJa4Nezv5RgkNpN5NYq6gowYZF0z3De6QLplaiyWM4rm4wSXsXeg
>> 7
>> M
>> ikURq
>> reWfg7M6dwtJ4t1Fxn<BR>as he can look at me out of his two eyes Is he
>> indeed said Mr Covington </FONT></TD></TR></TABLE> </BODY> </HTML>
>>
>> All Message Tokens
>> 187 unique tokens
>>
>> 'above'
>> 'all'
>> 'and'
>> 'anyone'
>> 'arose'
>> 'arrange'
>> 'assorted'
>> 'away'
>> 'back'
>> 'been'
>> 'being'
>> 'bind'
>> 'blush'
>> 'blushingly'
>> 'but'
>> 'can'
>> 'care'
>> 'casteth'
>> 'cc:none'
>> 'china'
>> 'commanded'
>> 'comparative'
>> 'congenial'
>> 'consults'
>> 'content-type:text/plain'
>> 'contracts'
>> 'covington'
>> 'cutlet'
>> 'deepest'
>> 'diem'
>> 'dismissed'
>> 'duty'
>> 'earnest'
>> 'egress'
>> 'everyone'
>> 'expedition'
>> 'eyes'
>> 'face'
>> 'faint'
>> 'filled'
>> 'find'
>> 'firmly'
>> 'for'
>> 'friendship'
>> 'from:addr:g.wcvbss'
>> 'from:addr:netnitco.net'
>> 'from:name:raymond goins'
>> 'going'
>> 'guided'
>> 'have'
>> 'header:Date:1'
>> 'header:From:1'
>> 'header:In-Reply-To:1'
>> 'header:MIME-Version:1'
>> 'header:Message-Id:1'
>> 'header:Received:2'
>> 'header:Return-Path:1'
>> 'header:Subject:1'
>> 'header:To:1'
>> 'heavily'
>> 'her'
>> 'him'
>> 'his'
>> 'hotel'
>> 'hovering'
>> 'ill'
>> 'indeed'
>> 'intent'
>> 'keep'
>> 'knows'
>> 'last'
>> 'lensalizarin'
>> 'let'
>> 'letter'
>> 'lie'
>> 'little'
>> 'lives'
>> 'look'
>> "m'scorset"
>> 'made'
>> 'manner'
>> 'message-id:@icsp.net'
>> 'might'
>> 'mother'
>> 'mouldy'
>> 'much'
>> 'myself'
>> 'never'
>> 'not'
>> 'observed'
>> 'occupied'
>> 'offered'
>> 'order'
>> 'other'
>> 'our'
>> 'out'
>> 'overdosed'
>> 'ownthat'
>> 'past'
>> 'place'
>> 'play'
>> 'potatoes'
>> 'prominents'
>> 'purse'
>> 'put'
>> 'raising'
>> 'recalled'
>> 'received'
>> 'refolding'
>> 'remained'
>> 'reply-to:none'
>> 'resign'
>> 'road'
>> 'roadsthat'
>> 'rosanne'
>> 'rotund'
>> 'said'
>> 'saw'
>> 'second'
>> 'seldom'
>> 'sender:addr:athenet.net'
>> 'sender:addr:h.nnq'
>> 'sender:no real name:2**0'
>> 'septennial'
>> 'shall'
>> 'skip:7 70'
>> 'skip:7 90'
>> 'skip:b 10'
>> 'skip:e 10'
>> 'skip:i 10'
>> 'skip:n 10'
>> 'skip:r 10'
>> 'soon'
>> 'sort'
>> 'sound'
>> 'stephan'
>> 'subject: '
>> 'subject: - '
>> 'subject:-'
>> 'subject:: '
>> 'subject:< '
>> 'subject:Alert'
>> 'subject:CBXC'
>> 'subject:Fwd'
>> 'subject:Neil'
>> 'subject:Stiles'
>> 'subject:Stock'
>> 'suspicions'
>> 'take'
>> 'taters'
>> 'that'
>> 'the'
>> 'them'
>> 'then'
>> 'think'
>> 'thought'
>> 'thoughts'
>> 'to:2**0'
>> 'to:addr:freemj'
>> 'to:addr:hotpop.com'
>> 'to:name:freemj'
>> 'too'
>> 'topics'
>> 'totrue'
>> 'town'
>> 'triumph'
>> 'true'
>> 'two'
>> 'unless'
>> 'upon'
>> 'veal'
>> 'virus:src="cid:'
>> 'voice'
>> 'walk'
>> 'was'
>> 'way'
>> 'went'
>> 'were'
>> 'whatever'
>> 'who'
>> 'with'
>> 'within'
>> 'would'
>> 'x-mailer:none'
>> 'yes'
>> 'you'
>> 'youll'
>>
>> -----Original Message-----
>> From: spambayes-bounces at python.org
>> [mailto:spambayes-bounces at python.org] On Behalf Of Tony Meyer
>> Sent: Sunday, October 23, 2005 9:43 PM
>> To: <FreeMJ at HotPop.com>
>> Cc: spambayes at python.org
>> Subject: Re: [Spambayes] Inspecting images (was: SpamBayes to
>> HandleEmbeddedImages)
>>
>>> Something really needs to be done about this embedded image Spam.
>>> Honestly,
>>> SpamBayes appears to be ineffective against all these images,
>>
>> Can you post an example of a message that is incorrectly classified,
>> *with the spambayes clues* for the message? The Outlook plug-in
>> provides this via the "Show Clues for this Message" item in the
>> SpamBayes menu.
>>
>> [...]
>>> I'm sure OCR isn't the only way, but the words are there in plain
>>> view. It seems like the obvious way to resolve this.
>>
>> Obvious isn't always best. One of the tenets here is "stupid beats
>> smart" - I think doing some sort of OCR on images would fall into the
>> "smart" category, and generating simple tokens from the images would
>> fall into the "stupid" category and be more successful. Just my
>> opinion, of course, but that's what I'd test if I had time (perhaps
>> over the (southern hemisphere) summer...or maybe I can convince one
>> of my employers that this would be worth doing in paid time).
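
As a rough illustration of the "stupid" approach described above, here is a minimal Python sketch that generates simple tokens from image attachments without looking at the pixels. The token names (`image:type:`, `image:size:`, `image:count:`) and the helpers `image_tokens` and `sample_message` are hypothetical; this is not SpamBayes code, and whether such tokens actually help would need testing.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def image_tokens(msg):
    """Yield crude tokens describing the image parts of an email message."""
    count = 0
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        count += 1
        yield "image:type:" + part.get_content_type()
        payload = part.get_payload(decode=True) or b""
        if payload:
            # Bucket the size by powers of two so similar images share a token.
            yield "image:size:2**%d" % (len(payload).bit_length() - 1)
    yield "image:count:%d" % count


def sample_message():
    """Build a small multipart/related message with one fake GIF attached."""
    msg = MIMEMultipart("related")
    msg.attach(MIMEText("some hammy filler text"))
    msg.attach(MIMEImage(b"GIF89a" + b"\x00" * 1000, _subtype="gif"))
    return msg


for token in image_tokens(sample_message()):
    print(token)
```

The tokens would then be trained and scored like any other clue, so the classifier itself decides whether, say, a single mid-sized GIF in a multipart/related message is spammy for a given user.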
>>
>>> SpamBayes has been such a great program for me and my colleagues,
>>> family and friends. I can only hope that the project sees fit to
>>> resolve this soon.
>>
>> It's not really a case of "seeing fit" - the issue is that the
>> developers are very short on time at the moment (contributions have
>> always been, and always will be, welcome) and, in addition, this is a
>> complex problem.
>>
>> =Tony.Meyer
>>
>> --
>> Please always include the list (spambayes at python.org) in your
>> replies (reply-all), and please don't send me personal mail about
>> SpamBayes.
>> http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
>>
>>
>> _______________________________________________
>> SpamBayes at python.org
>> http://mail.python.org/mailman/listinfo/spambayes
>> Check the FAQ before asking: http://spambayes.sf.net/faq.html
>>
>>
>
>