From davebeaver at cinci.rr.com Mon Feb 1 02:47:53 2010
From: davebeaver at cinci.rr.com (Dave Beaver)
Date: Sun, 31 Jan 2010 20:47:53 -0500
Subject: [Spambayes] HELP IN TRAINING
Message-ID:

I AM USING SPAMBAYES WITH OUTLOOK EXPRESS.
WHEN I OPEN SPAMBAYES TO FOLLOW THE "REVIEW MESSAGES" LINK FOR TRAINING PURPOSES I GET AN INTERNET PAGE THAT SAYS:

Internet Explorer cannot display the webpage

AND WHEN I TRY OTHER THINGS I GET THE SAME MESSAGE.
HOW COME?

From 47rc at bellsouth.net Tue Feb 2 11:52:32 2010
From: 47rc at bellsouth.net (Ray)
Date: Tue, 2 Feb 2010 05:52:32 -0500
Subject: [Spambayes] windows 7
Message-ID: <7128229258E04EAF806454FCBD5CD0AE@Raydesk>

I am running Outlook 2003 and Windows 7 Pro. Will the 1.0.4 version work in Windows 7? I've been using SpamBayes for years, upgraded to a new box with Windows 7, and would love to run SpamBayes again.

Thanks in advance,
Ray Cook

PLEASE NOTE: No trees were destroyed in the sending of this contaminant free message. We do concede, a significant number of electrons may have been inconvenienced or displaced.

From amedee at amedee.be Wed Feb 3 00:03:32 2010
From: amedee at amedee.be (Amedee Van Gasse)
Date: Wed, 03 Feb 2010 00:03:32 +0100
Subject: [Spambayes] How to feed a spam corpus to a MTA?
Message-ID: <4B68AF44.1020804@amedee.be>

I have a question that's not directly related to Spambayes but to antispam in general. But I know that there are a lot of people on this list who could point me in the right direction.

I'm writing a paper comparing several antispam techniques. I would like to make some measurements too. To do that, I'm planning on sending the same set of messages (several thousand, maybe lots more) through the filters and seeing how they perform.

I have a practical question.
It won't be a problem to feed spam to a filter that works on the mail itself, but how do I test with properties of the SMTP session, like the original MTA? The original "metadata" isn't saved in an email in a file. I could feed the spam corpus to a Postfix instance, which will behave as a new MTA, but then it will be the same MTA for all messages.

Or is this just not possible? Will I have to settle for live testing for tests like DNSBL, greylisting,... and test the spam corpus only on things like spamassassin, spambayes,...?

--
Amedee

From dale at BriannasSaladDressing.com Wed Feb 3 14:55:30 2010
From: dale at BriannasSaladDressing.com (Dale Schroeder)
Date: Wed, 03 Feb 2010 07:55:30 -0600
Subject: [Spambayes] HELP IN TRAINING
In-Reply-To:
References:
Message-ID: <4B698052.5050701@BriannasSaladDressing.com>

Dave,

I've found that to usually mean that the leading "http://" has not been entered into the address bar. For some reason, IE won't work without it.
localhost:8880/ does not work, but
http://localhost:8880/ does.

I prefer Firefox, and it does not have this restriction. I don't know what Chrome, Safari, or Opera require.

HTH,
Dale

On 01/31/2010 7:47 PM, Dave Beaver wrote:
> I AM USING SPAMBAYES WITH OUTLOOK EXPRESS.
> WHEN I OPEN SPAMBAYES TO FOLLOW THE "REVIEW MESSAGES" LINK FOR
> TRAINING PURPOSES I GET AN INTERNET PAGE THAT SAYS:
>
> Internet Explorer cannot display the webpage
>
> AND WHEN I TRY OTHER THINGS I GET THE SAME MESSAGE.
> HOW COME?
>
> _______________________________________________
> SpamBayes at python.org
> http://mail.python.org/mailman/listinfo/spambayes
> Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes
> Check the FAQ before asking: http://spambayes.sf.net/faq.html

From Ocean at cobaltnight.com Thu Feb 4 14:58:10 2010
From: Ocean at cobaltnight.com (Ocean)
Date: Thu, 4 Feb 2010 08:58:10 -0500
Subject: [Spambayes] Problems with classifying as spam
Message-ID: <000301caa5a2$1d082b00$57188100$@com>

In addition to the startup problems, Spambayes is having problems marking messages as spam. As an example, I received this email:

------------------------------
Subject: ***Discount_Viagra_VXPL_Percocet*_Adderall****
Body: ***Discount_Viagra_VXPL_Percocet*_Adderall****! http://kashertqdum17.com/
------------------------------

That's it. The only text in the body of the message is that URL link. There are two issues I see showing up:

1. The subject and link text aren't being parsed properly. Nowhere in the spam clues are the words "viagra", "percocet", or "adderall" showing up. The spam token involving the subject is "'subject:****'". So, not only is SpamBayes not treating the underscores as word separators, but it's not even getting to the words, because it looks like it's getting choked up on the asterisks.

2. I've got a *lot* of tokens showing up in the Spam Clues that are nowhere in the email itself. I'm guessing that Spambayes is actually going to that link and processing what's on the page, but if so, that's a big problem. First of all, it gives the spammers more flexibility in trying to bypass spambayes. And second, if it's following links, then it's confirming to the spammers that my email address is valid. That's a huge no-no. Spambayes should not be following links at all, but should only look in the message itself.
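On issue 1, one possible reading of the 'subject:****' clue can be sketched in code. This is a minimal sketch, not the SpamBayes tokenizer itself: it assumes a tokenizer that pulls word-like runs out of the subject with a `\w`-style pattern, emits runs of punctuation as separate tokens, and replaces over-long words with a length placeholder. `WORD_RE`, `PUNCT_RE`, `MAX_WORD_LEN`, and the function names are hypothetical. What is certain is that Python's `\w` matches the underscore, so a pattern like this never splits `Discount_Viagra_VXPL_Percocet` into separate words.

```python
import re

# Hypothetical patterns, for illustration only.
WORD_RE = re.compile(r"[\w$.%]+")   # \w includes "_", so underscores join words
PUNCT_RE = re.compile(r"\W+")       # runs of non-word characters, e.g. "****"
MAX_WORD_LEN = 12                   # assumed cutoff for "too long to keep"


def tokenize_word(word):
    """Keep mid-sized words; replace long ones with a length placeholder."""
    if 3 <= len(word) <= MAX_WORD_LEN:
        yield word
    elif len(word) > MAX_WORD_LEN:
        # First character plus the length rounded down to a multiple of 10.
        yield f"skip:{word[0]} {len(word) // 10 * 10}"


def subject_tokens(subject):
    """Emit word tokens, then punctuation-run tokens, each prefixed 'subject:'."""
    for word in WORD_RE.findall(subject):
        for token in tokenize_word(word):
            yield "subject:" + token
    for run in PUNCT_RE.findall(subject):
        yield "subject:" + run


def body_tokens(text):
    """Split the body on whitespace only, then apply the same word rule."""
    for word in text.split():
        yield from tokenize_word(word)


if __name__ == "__main__":
    subject = "***Discount_Viagra_VXPL_Percocet*_Adderall****"
    print(list(subject_tokens(subject)))
    print(list(body_tokens(subject + "!")))
```

Under these assumptions the subject produces 'subject:****' plus a placeholder for the 29-character underscore-joined run, and the 47-character body text collapses into a single 'skip:* 40' token, so none of "viagra", "percocet", or "adderall" would appear as a token of its own.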
From jsp at PKC.com Thu Feb 4 15:29:16 2010
From: jsp at PKC.com (Jesse Pelton)
Date: Thu, 4 Feb 2010 09:29:16 -0500
Subject: [Spambayes] Problems with classifying as spam
References: <000301caa5a2$1d082b00$57188100$@com>
Message-ID: <16E2027582CDB74180896CDB4B8CC1F90411CDA2@PKCVT01.pkc.com>

SpamBayes doesn't follow links (see http://spambayes.sourceforge.net/faq.html#will-show-spam-clues-notify-a-spammer-that-i-opened-their-message for a tangentially related discussion), but it does process message headers. Lots of good information in there that you might think came from a Web site.

Unless you're willing to dive into the code and the math, I'd caution against trying to second-guess SpamBayes. You're going to want it to behave rationally, and it doesn't (at least at the level you're looking at); it behaves statistically. That's why the FAQ (http://spambayes.svn.sourceforge.net/viewvc/spambayes/trunk/spambayes/Outlook2000/docs/troubleshooting.html#Messages_have_incorrect_or_unexpected) suggests sending all the Spam clues to the list when trying to understand why a given message isn't classified as expected.

-----Original Message-----
From: spambayes-bounces+jsp=pkc.com at python.org on behalf of Ocean
Sent: Thu 2/4/2010 8:58 AM
To: spambayes at python.org
Subject: [Spambayes] Problems with classifying as spam

In addition to the startup problems, Spambayes is having problems marking messages as spam. As an example, I received this email:

------------------------------
Subject: ***Discount_Viagra_VXPL_Percocet*_Adderall****
Body: ***Discount_Viagra_VXPL_Percocet*_Adderall****! http://kashertqdum17.com/
------------------------------

That's it. The only text in the body of the message is that URL link. There are two issues I see showing up:

1. The subject and link text aren't being parsed properly. Nowhere in the spam clues are the words "viagra", "percocet", or "adderall" showing up. The spam token involving the subject is "'subject:****'". So, not only is SpamBayes not treating the underscores as word separators, but it's not even getting to the words, because it looks like it's getting choked up on the asterisks.

2. I've got a *lot* of tokens showing up in the Spam Clues that are nowhere in the email itself. I'm guessing that Spambayes is actually going to that link and processing what's on the page, but if so, that's a big problem. First of all, it gives the spammers more flexibility in trying to bypass spambayes. And second, if it's following links, then it's confirming to the spammers that my email address is valid. That's a huge no-no. Spambayes should not be following links at all, but should only look in the message itself.

From Ocean at cobaltnight.com Thu Feb 4 15:40:36 2010
From: Ocean at cobaltnight.com (Ocean)
Date: Thu, 4 Feb 2010 09:40:36 -0500
Subject: [Spambayes] Problems with classifying as spam
In-Reply-To: <16E2027582CDB74180896CDB4B8CC1F90411CDA2@PKCVT01.pkc.com>
References: <000301caa5a2$1d082b00$57188100$@com> <16E2027582CDB74180896CDB4B8CC1F90411CDA2@PKCVT01.pkc.com>
Message-ID: <000801caa5a8$03de27a0$0b9a76e0$@com>

>
> From: Jesse Pelton [mailto:jsp at PKC.com]
> Sent: Thursday, February 04, 2010 9:29 AM
> To: Ocean; spambayes at python.org
> Subject: RE: [Spambayes] Problems with classifying as spam
>
> SpamBayes doesn't follow links (see
> http://spambayes.sourceforge.net/faq.html#will-show-spam-clues-notify-a-spammer-that-i-opened-their-message
> for a tangentially related discussion), but it does process message headers.
> Lots of good information in there that you might think came from a Web site.
>
> Unless you're willing to dive into the code and the math, I'd caution
> against trying to second-guess SpamBayes. You're going to want it to
> behave rationally, and it doesn't (at least at the level you're looking
> at); it behaves statistically. That's why the FAQ
> (http://spambayes.svn.sourceforge.net/viewvc/spambayes/trunk/spambayes/Outlook2000/docs/troubleshooting.html#Messages_have_incorrect_or_unexpected)
> suggests sending all the Spam clues to the list when trying to understand
> why a given message isn't classified as expected.
>

Okay, then here's the nitty gritty:

------------------------------
Subject: ***Discount_Viagra_VXPL_Percocet*_Adderall****
Body: ***Discount_Viagra_VXPL_Percocet*_Adderall****! http://kashertqdum17.com/
------------------------------

As I said before, that's all that's in the message, nothing else. And here are the spam clues:

--------------------------------------------
Combined Score: 0% (3.88578e-16)
Internal ham score (*H*): 1
Internal spam score (*S*): 7.77156e-16

# ham trained on: 11546
# spam trained on: 2608

The last time this message was classified or trained:
This message had not been filtered.
This message had not been trained.

150 Significant Tokens

token                               spamprob       #ham #spam
'subject:****'                      0.000567537     396     0
'submitting'                        0.00233281       96     0
'default'                           0.00251537       89     0
'lab'                               0.00286807       78     0
'(from'                             0.00360288       62     0
'skip:* 40'                         0.00391645       57     0
'9:00'                              0.0042898        52     0
'to:addr:ocean'                     0.00511949      904     1
'listings'                          0.00542823       41     0
'report:'                           0.00542823       41     0
'engineers'                         0.00819672       27     0
'lately,'                           0.00920245       24     0
'daniel'                            0.00964836     3204     7
'flaw'                              0.0104895        21     0
'geek'                              0.0104895        21     0
'liquid'                            0.0104895        21     0
'detect'                            0.0115681        19     0
'tech.'                             0.0115681        19     0
'viruses'                           0.0115681        19     0
'binary'                            0.0121951        18     0
'textbook'                          0.012894         17     0
'finalizing'                        0.0136778        16     0
'greg'                              0.0180723        12     0
'yahoo'                             0.0180723        12     0
'providers.'                        0.0196507        11     0
'adults'                            0.0238095         9     0
"amazon's"                          0.0238095         9     0
'2012'                              0.0266272         8     0
'music.'                            0.0266272         8     0
'textbooks'                         0.0266272         8     0
'11:33'                             0.0302013         7     0
'1:01'                              0.0302013         7     0
'9:53'                              0.0302013         7     0
'buys'                              0.0302013         7     0
'suggests'                          0.0302013         7     0
'html'                              0.0343728       255     2
'2:05'                              0.0348837         6     0
'compression'                       0.0348837         6     0
'formats'                           0.0348837         6     0
'teenagers'                         0.0348837         6     0
'adapt'                             0.0412844         5     0
"apple's"                           0.0412844         5     0
'h.264'                             0.0412844         5     0
'kingdom,'                          0.0412844         5     0
'times)'                            0.0412844         5     0
'page.'                             0.0444174       481     5
'hours,'                            0.0480985        92     1
'to:addr:cobaltnight.com'           0.0496998      1274    15
'11:37'                             0.0505618         4     0
'explorer'                          0.0507201        87     1
'indicated'                         0.0512533       168     2
'amazon'                            0.0524349        84     1
'tech'                              0.0543494       389     5
'sign'                              0.0553702       608     8
'payment'                           0.0573002      1169    16
'1:12'                              0.0652174         3     0
'globally.'                         0.0652174         3     0
'u.s.,'                             0.0652174         3     0
'interface'                         0.0657778        66     1
'2009'                              0.0678561       672    11
'center'                            0.0698918       651    11
'electronic'                        0.0758594       542    10
"aren't"                            0.0871721        95     2
'microscopic'                       0.0918367         2     0
'start-up'                          0.0918367         2     0
'surveillance'                      0.0918367         2     0
'teens,'                            0.0918367         2     0
'turf'                              0.0918367         2     0
'sales'                             0.0930142       736    17
'still'                             0.0972529      1728    42
'relevant'                          0.0974632        84     2
'charge'                            0.0981746       368     9
'test'                              0.0985371       407    10
'from:no real name:2**0'            0.104008       2519    66
'information'                       0.110498       3031    85
"that's"                            0.112004        809    23
'requests'                          0.11383         174     5
'100'                               0.11434         276     8
'post'                              0.11434         276     8
'support'                           0.114415       1201    35
"doesn't"                           0.116645        471    14
'data'                              0.11763         699    21
'pst'                               0.119211         67     2
'be?'                               0.120093         34     1
'touch'                             0.12132         226     7
'service'                           0.121935       2010    63
'deal.'                             0.126626         32     1
'job'                               0.130755        178     6
'actual'                            0.13257         349    12
'web'                               0.133212        952    33
'tom'                               0.136878        113     4
'needs'                             0.137534        390    14
'team'                              0.140032        681    25
'rather'                            0.141804        296    11
'leaving'                           0.150599        151     6
'licenses'                          0.151311         26     1
'working'                           0.152874        639    26
'wednesday'                         0.154298        171     7
'"protected'                        0.155172          1     0
'787'                               0.155172          1     0
'armstrong'                         0.155172          1     0
'blogging'                          0.155172          1     0
'declines'                          0.155172          1     0
'ina'                               0.155172          1     0
'mode"'                             0.155172          1     0
'nsa'                               0.155172          1     0
'pew'                               0.155172          1     0
'reprieve'                          0.155172          1     0
'rumor'                             0.155172          1     0
'another'                           0.155232        748    31
'relatively'                        0.158807         48     2
'technology'                        0.161453        277    12
'going'                             0.163466       1519    67
'shows'                             0.165458        269    12
'agreement'                         0.165465        202     9
'ago'                               0.166308        112     5
'using'                             0.170597       1357    63
'provides'                          0.178445        164     8
'adding'                            0.179077        143     7
'running'                           0.179577        365    18
'digital'                           0.180445        202    10
'helped'                            0.181042         61     3
'onto'                              0.182937         80     4
'survey'                            0.186745         78     4
'ever.'                             0.187926         20     1
'touch.'                            0.187926         20     1
'software'                          0.191964        504    27
'skip:n 10'                         0.19449        1321    72
'decides'                           0.195819         19     1
'group'                             0.197292        271    15
'sure,'                             0.199182         72     4
'improving'                         0.200943         36     2
'has'                               0.202357       3997   229
'research'                          0.20242         245    14
'figures'                           0.796501         12    11
'crowd'                             0.828161          5     6
'war'                               0.829187         15    17
'police'                            0.834856         17    20
'briefed'                           0.844828          0     1
'chambers,'                         0.844828          0     1
'investigates'                      0.844828          0     1
'kindle'                            0.844828          0     1
'nash'                              0.844828          0     1
'upfront'                           0.844828          0     1
'boeing'                            0.84654           1     2
'ward'                              0.84654           1     2
'from:addr:message.myspace.com'     0.908163          0     2
'message-id:@message.myspace.com'   0.908163          0     2
'chemists'                          0.934783          0     3
'reportedly'                        0.949438          0     4

Message Stream

Received: from vlan195-30.azeronline.com ([88.151.195.30]) by mail.cobaltnight.com (CW Mail Server) with SMTP id MPR19711 for ; Thu, 04 Feb 2010 08:39:11 -0500
Received: from localhost (127.0.0.1) by vlan195-30.azeronline.com (88.151.195.30) with Microsoft SMTP Server id 8.0.685.24; Thu, 4 Feb 2010 17:38:57 +0400
From: < noreply at message.myspace.com>
To: ocean at cobaltnight.com
Subject: ***Discount_Viagra_VXPL_Percocet*_Adderall****
Date: Thu, 4 Feb 2010 17:38:57 +0400
MIME-Version: 1.0
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 8bit
Message-ID:

***Discount_Viagra_VXPL_Percocet*_Adderall****!
***Discount_Viagra_VXPL_Percocet*_Adderall** **!
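
The spamprob column and the Combined Score in the spam clues listing can be reproduced with a little arithmetic. The following is a minimal sketch under stated assumptions, not code taken from SpamBayes: it assumes a Robinson-style smoothed estimate with a strength of 0.45 and a prior of 0.5, and it assumes the combined score is (S - H + 1) / 2. The constants and both function names are assumptions; the thread does not state them, and they were chosen because they reproduce the listed values.

```python
def token_spamprob(ham_count, spam_count, n_ham, n_spam,
                   strength=0.45, prior=0.5):
    """Smoothed per-token spam probability (assumed formula).

    ham_count / spam_count: messages of each kind containing the token.
    n_ham / n_spam: total messages trained as ham / spam.
    Requires ham_count + spam_count > 0.
    """
    ham_ratio = ham_count / n_ham
    spam_ratio = spam_count / n_spam
    raw = spam_ratio / (ham_ratio + spam_ratio)
    n = ham_count + spam_count
    # Pull the raw estimate toward the prior; rarely seen tokens move the most.
    return (strength * prior + n * raw) / (strength + n)


def combined_score(spam_score, ham_score):
    """Fold the internal S and H scores into a single 0..1 value (assumed)."""
    return (spam_score - ham_score + 1.0) / 2.0


if __name__ == "__main__":
    N_HAM, N_SPAM = 11546, 2608  # "# ham trained on" / "# spam trained on"
    # Compare with the listed rows for 'subject:****', 'briefed', 'to:addr:ocean'.
    for ham, spam in [(396, 0), (0, 1), (904, 1)]:
        print(ham, spam, token_spamprob(ham, spam, N_HAM, N_SPAM))
    # Compare with "Combined Score: 0% (3.88578e-16)".
    print(combined_score(7.77156e-16, 1.0))
```

With 396 ham and 0 spam occurrences the first function gives about 0.000567537, matching the 'subject:****' row, and a token seen once in spam and never in ham gives about 0.844828, matching 'briefed'. With H = 1 and S = 7.77156e-16 the second function gives about 3.88578e-16, the Combined Score shown. In other words, the 0% score follows from the many strongly hammy tokens in the list rather than from the handful of spammy ones.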