From hott.ebrandz at hotmail.com Fri May 5 07:23:09 2006
From: hott.ebrandz at hotmail.com (Rey Mysterio)
Date: Fri, 05 May 2006 10:53:09 +0530
Subject: [spambayes-dev] Your link has been uploaded
Message-ID:

Dear Webmaster,

My name is Anil and I have just gone through your site, and visited many pages. I have noticed that you have good content on the site. It would be better if we link to each other as reciprocal link place an important role in a search engine ranking algorithm.

I have already placed a link to your site on the following webpage:-
http://www.puritec.com/resources/water-cooler.cfm

Your link details are here:-

Water Cooler - Plumbed in pure chilled filtered water coolers and fountains.

Kindly link back to our site with the following details:

Title: Whole House Water Filter
Desc: Enter our online store to buy the highest rated water filter and water purifier for your family. Your satisfaction is guaranteed.
URL: http://www.puritec.com/

Our Link Code:
Whole House Water Filter
Enter our online store to buy the highest rated water filter and water purifier for your family. Your satisfaction is guaranteed.

If you wish to have your details changed, we would be interested in modification as per you requirements.

Please note that the link to your site will be active for 10 business days, if thereafter we do not detect a link to our site from your webpage, it will be assumed that you are not interested in reciprocal link and to be fair to our other link partners, we shall remove your link. Once you add our link, your link details will be permanent on our site.

Thanks.

_________________________________________________________________
All that you wanted to know about Ms Beautiful Lips
http://server1.msn.co.in/Profile/katrina.asp

From skip at pobox.com Sat May 6 21:33:44 2006
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 May 2006 14:33:44 -0500
Subject: [spambayes-dev] New urls
Message-ID: <17500.64024.934136.263528@montanaro.dyndns.org>

I'm still fiddling around with these spams that have a bunch of one-letter words hiding drugs for sale:

    V k I p A m G i R u A v
    V j A v L s I t U w M g
    X g A f N a A f X q
    C x I e A a L g I c S l

followed by a url:

    http://www.prouceteir.com

followed by some presumably benign text:

    physiolog resis comminute Phoeb ideologis
    not called for; local anesthetics were sufficient for the cleansing and suturing, followed by generous injections of antibiotics. The foreign objects had passed through their bodies, explained the chief doctor. I presume you mean bullets when you speak so reverently of foreign objects, said Krupkin in high dudgeon. He means bullets, confirmed Alex hoarsely in Russian.
    The retired

I don't think there's much to grab onto in the benign text section, however the url tends to vary a lot and the domain name generally seems very new. For instance, according to whois, the above domain was created on April 28th. I received the spam it contained on April 30th. The others of this ilk I've looked at were also new domains. That suggests to me a couple possibilities:

    * look up the age of the domains via whois (preferably caching those
      lookups for a reasonable period - 90 days, one year?)

    * note whether or not you've seen the domain before

    * lookup (and cache) other information about the domain name -
      registrar, registrant, etc.

The creation date currently seems the hardest to fake, though it's expensive to calculate and I suppose eventually the spammers will start creating their own registrars (if they haven't already) and back-date the information they provide.

I suppose you could start tokenizing these one-letter runs as well and see if they contain embedded words:

    C x I e A a L g I c S l ==> CIALIS

Thoughts? Anybody else seeing lots of this stuff sneak through as unsure?
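A minimal sketch of the one-letter-run idea above, written in present-day Python 3 and not taken from the SpamBayes code base. It assumes the payload letters are the capitals and the lowercase letters are filler, as in the example; the function name `embedded_words` and the `min_letters` threshold are hypothetical.

```python
def embedded_words(text, min_letters=4):
    """Return words spelled by the capitals inside runs of one-letter tokens.

    Assumption: within a run such as "C x I e A a", the capitals carry the
    hidden word and the lowercase letters are padding.
    """
    words = []
    for line in text.splitlines():
        run = []
        # The empty-string sentinel flushes a run that reaches end of line.
        for tok in line.split() + [""]:
            if len(tok) == 1 and tok.isalpha():
                run.append(tok)
                continue
            caps = "".join(c for c in run if c.isupper())
            if len(caps) >= min_letters:
                words.append(caps)
            run = []
    return words


if __name__ == "__main__":
    print(embedded_words("C x I e A a L g I c S l"))
```

Each recovered word could then be handed to the tokenizer as an extra clue. Whether that helps would need testing against real ham and spam, since the filler can be hidden in other ways than lower case.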
Skip

From matt at mondoinfo.com Sat May 6 21:50:22 2006
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Sat, 6 May 2006 14:50:22 -0500 (CDT)
Subject: [spambayes-dev] New urls
In-Reply-To: <17500.64024.934136.263528@montanaro.dyndns.org>
References: <17500.64024.934136.263528@montanaro.dyndns.org>
Message-ID: <1146944278.82.1551@mint-julep.mondoinfo.com>

> I'm still fiddling around with these spams that have a bunch of
> one-letter words hiding drugs for sale:
>
> V k I p A m G i R u A v
> V j A v L s I t U w M g
> X g A f N a A f X q
> C x I e A a L g I c S l
>
> followed by a url:
>
> http://www.prouceteir.com
>
> followed by some presumably benign text:

It took some training for me before my SpamBayes started to recognize those reliably, but it seems that my old hack to tokenize URL's IPs helps:

Spambayes spam score 0.998 for message "Re: your VtAGRiA"
Spambayes clues:
0.000 *H*, 0.997 *S*
[. . .]
0.845 url-ip:222.52.1.11/32
0.845 url-ip:222.52.1/24
0.845 url-ip:222.52/16
[. . .]
0.976 url-ip:222/8

Excellent arguments about why doing that is a bad idea have been made, but it seems to work for me.

Regards,
Matt

From skip at pobox.com Sun May 7 00:31:28 2006
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 6 May 2006 17:31:28 -0500
Subject: [spambayes-dev] New urls
In-Reply-To: <1146944278.82.1551@mint-julep.mondoinfo.com>
References: <17500.64024.934136.263528@montanaro.dyndns.org> <1146944278.82.1551@mint-julep.mondoinfo.com>
Message-ID: <17501.9152.156295.432564@montanaro.dyndns.org>

    Matt> It took some training for me before my SpamBayes started to
    Matt> recognize those reliably, but it seems that my old hack to
    Matt> tokenize URL's IPs helps:
    ...

This doesn't seem to be in the code base. 'Zat so?
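The url-ip clues in Matt's output look like the address that the URL's host resolves to, emitted once per byte-boundary prefix. A minimal Python 3 sketch of that token shape follows. It is not Matt's patch: the function names are hypothetical, and it resolves names through the standard library rather than through a local DNS cache.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def ip_prefix_tokens(ip):
    """Return url-ip tokens for an IPv4 address at /32, /24, /16 and /8."""
    try:
        octets = str(ipaddress.IPv4Address(ip)).split(".")
    except ValueError:
        return []  # not a dotted-quad IPv4 address
    return [
        "url-ip:%s/%d" % (".".join(octets[:n]), 8 * n)
        for n in range(4, 0, -1)
    ]


def url_ip_tokens(url, resolve=socket.gethostbyname):
    """Resolve the URL's host and return its prefix tokens.

    `resolve` is injectable so a caching resolver (or a test stub) can be
    swapped in; lookup failures simply produce no tokens.
    """
    host = urlparse(url).hostname
    if not host:
        return []
    try:
        return ip_prefix_tokens(resolve(host))
    except OSError:
        return []


if __name__ == "__main__":
    for token in ip_prefix_tokens("222.52.1.11"):
        print(token)
```

Emitting the wider prefixes lets the classifier learn about a whole address block even when the exact host has never been seen before, which is presumably why the /8 clue scores so strongly in the output above.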
Skip

From matt at mondoinfo.com Sun May 7 01:42:03 2006
From: matt at mondoinfo.com (Matthew Dixon Cowles)
Date: Sat, 6 May 2006 18:42:03 -0500 (CDT)
Subject: [spambayes-dev] New urls
In-Reply-To: <17501.9152.156295.432564@montanaro.dyndns.org>
References: <17500.64024.934136.263528@montanaro.dyndns.org> <1146944278.82.1551@mint-julep.mondoinfo.com> <17501.9152.156295.432564@montanaro.dyndns.org>
Message-ID: <1146958334.67.1551@mint-julep.mondoinfo.com>

Matt> It took some training for me before my SpamBayes started to
Matt> recognize those reliably, but it seems that my old hack to
Matt> tokenize URL's IPs helps:

> This doesn't seem to be in the code base. 'Zat so?

Yup! The patch is at:

http://www.mondoinfo.com/tokenizerpatch.txt

and the local cache I use it with is at:

http://www.mondoinfo.com/dnscache.py

There was some discussion of it here some time ago. It didn't seem to help on historical corpora, perhaps because spammers don't maintain their DNS for long. But on current spam it helps for me.

I haven't experimented with breaking the IP up at anything other than byte boundaries. I also haven't looked at the related issue of whether four tokens for an exact match is optimal.

Regards,
Matt

From tameyer at ihug.co.nz Sun May 7 08:25:36 2006
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Sun, 7 May 2006 18:25:36 +1200
Subject: [spambayes-dev] New urls
In-Reply-To: <17500.64024.934136.263528@montanaro.dyndns.org>
References: <17500.64024.934136.263528@montanaro.dyndns.org>
Message-ID: <31882DC3-1525-4B0D-857C-B45B39AEAD90@ihug.co.nz>

[Skip]
> I'm still fiddling around with these spams that have a bunch of
> one-letter words hiding drugs for sale:
>
> V k I p A m G i R u A v
> V j A v L s I t U w M g
> X g A f N a A f X q
> C x I e A a L g I c S l

I will try your sf patch with newer mail soon, honest! :)

[...]
> I don't think there's much to grab onto in the benign text section,
> however the url tends to vary a lot and the domain name generally seems
> very new. For instance, according to whois, the above domain was created
> on April 28th. I received the spam it contained on April 30th. The others
> of this ilk I've looked at were also new domains. That suggests to me a
> couple possibilities:
>
> * look up the age of the domains via whois (preferably caching those
>   lookups for a reasonable period - 90 days, one year?)
>
> * note whether or not you've seen the domain before
>
> * lookup (and cache) other information about the domain name -
>   registrar, registrant, etc.
>
> The creation date currently seems the hardest to fake, though it's
> expensive to calculate and I suppose eventually the spammers will start
> creating their own registrars (if they haven't already) and back-date the
> information they provide.

One of the things on my to-do list is to store information like this in the ham & spam I archive so that these sorts of things can be tested with the 'traditional' tools. I have a script that does a bunch of DNS-based information gathering (SURBL lookups, DomainKey, SenderID, DNS blacklists - not the things you list above, but that wouldn't be that hard to add), and just need to figure out how to get fetchmail working properly (on OS X) so that the mail is retrieved and piped through it.

If you create a patch for any of the above, I'd be happy to use it day-to-day and let you know what appears in the token database.

> I suppose you could start tokenizing these one-letter runs as well and
> see if they contain embedded words:
>
> C x I e A a L g I c S l ==> CIALIS

This seems a little too specific for me - there are lots of other ways to hide the rubbish letters apart from putting them in lower case.

> Thoughts? Anybody else seeing lots of this stuff sneak through as
> unsure?
I see a few, although I have more problems with image spam (no successful patches there yet).

=Tony.Meyer

From nduncan at cqmail.net Wed May 17 18:49:51 2006
From: nduncan at cqmail.net (Nigel Duncan)
Date: Wed, 17 May 2006 18:49:51 +0200
Subject: [spambayes-dev] Probable error in Windows FAQ
Message-ID: <7.0.1.0.0.20060517181947.01bbed48@cqmail.net>

I am prepared to suggest alterations to the Windows FAQ (at least the parts that affect Eudora), but first I want to have it agreed that I have found an important error.

Essentially spam can only be RECEIVED mail from the POP server (or IMAP server if you happen to have one). Therefore an SMTP server can have nothing to do with the question. I removed all references to the SMTP server and port in my Eudora.ini file and everything seems to be working fine - but I have added a return receipt to this message to be on the safe side.

Apart from this error (and another lesser one) I would try and develop a more user-friendly version as I gain experience (without getting into Outlook or Outlook Express, which I have but don't use) as I am used to writing and translating technical documents with emphasis on clarity for the user.

Another point is the preferred file format - as public domain are you allergic to MS Word 2003? It should be easily convertible to html, but Word would be much easier for me to format.

I am also sending this message to the developer's list as it is not 100% clear which is the right one for this case.

----------------------------------------------------------------
Nigel Duncan
Jerez 4, P4, 4B
E-28016 MADRID, Spain
Tel: (0034) 91-350-4793
email: nduncan at cqmail.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20060517/67428f2d/attachment.htm

From richie at entrian.com Wed May 17 23:27:24 2006
From: richie at entrian.com (Richie Hindle)
Date: Wed, 17 May 2006 22:27:24 +0100
Subject: [spambayes-dev] Probable error in Windows FAQ
In-Reply-To: <7.0.1.0.0.20060517181947.01bbed48@cqmail.net>
References: <7.0.1.0.0.20060517181947.01bbed48@cqmail.net>
Message-ID:

Hi Nigel,

> I am prepared to suggest alterations to the Windows FAQ (at least the
> parts that affect Eudora),

Thanks! All suggestions for improvements are gratefully received. In the case of the Eudora content, I personally don't use Eudora so I couldn't verify any new content - SpamBayes developers, do any of us use Eudora?

> but first I want to have it agreed that I have found an important error.

Ah, I'm sorry, but I don't think we can do that:

> Essentially spam can only be RECEIVED mail from the POP server (or
> IMAP server if you happen to have one). Therefore an SMTP server can
> have nothing to do with the question.

SpamBayes includes an SMTP proxy that can be used to train SpamBayes. By forwarding email to either a 'ham' address or a 'spam' address, you can train SpamBayes without using the web interface. It's explained in the FAQ, in point 4 of this question:

http://spambayes.sourceforge.net/faq.html#is-there-a-high-level-summary-that-shows-how-spambayes-works

> Another point is the preferred file format - as public domain are you
> allergic to MS Word 2003? It should be easily convertible to html,
> but Word would be much easier for me to format.

I think most of us could read it using OpenOffice without triggering any nasty reactions. 8-)

> I am also sending this message to the developer's list as it is not
> 100% clear which is the right one for this case.
I think spambayes-dev at python.org was appropriate - though you might want to send any suggestions to spambayes at python.org so that they can be reviewed by non-developer Eudora/SpamBayes users.

--
Richie Hindle
richie at entrian.com

From tameyer at ihug.co.nz Wed May 17 23:39:54 2006
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Thu, 18 May 2006 09:39:54 +1200
Subject: [spambayes-dev] [Spambayes] Probable error in Windows FAQ
In-Reply-To:
References: <7.0.1.0.0.20060517181947.01bbed48@cqmail.net>
Message-ID: <9CD4D8CA-8AC7-4F99-BF04-6501B0FFD7D3@ihug.co.nz>

[Nigel]
>> I am prepared to suggest alterations to the Windows FAQ (at least the
>> parts that affect Eudora),

[Richie]
> Thanks! All suggestions for improvements are gratefully received.

Indeed!

> In the case of the Eudora content, I personally don't use Eudora so I
> couldn't verify any new content - SpamBayes developers, do any of us use
> Eudora?

I have Eudora (Mac and Windows), and very occasionally test things with it, but that's about it.

>> Another point is the preferred file format - as public domain are you
>> allergic to MS Word 2003? It should be easily convertible to html,
>> but Word would be much easier for me to format.
>
> I think most of us could read it using OpenOffice without triggering any
> nasty reactions. 8-)

True, although plain-text would probably be the simplest format. If any markup is necessary, whoever adds it to the ReST file can add it. If I got something in Word format, the first thing I would do is copy the text out without markup, anyway.

=Tony.Meyer

--
Please always include the list (spambayes at python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.
From amedee at amedee.be Wed May 17 23:43:58 2006
From: amedee at amedee.be (amedee at amedee.be)
Date: Wed, 17 May 2006 23:43:58 +0200 (CEST)
Subject: [spambayes-dev] [Spambayes] Probable error in Windows FAQ
In-Reply-To: <7.0.1.0.0.20060517181947.01bbed48@cqmail.net>
References: <7.0.1.0.0.20060517181947.01bbed48@cqmail.net>
Message-ID: <52376.213.118.146.89.1147902238.squirrel@amedee.be>

> Another point is the preferred file format - as public domain are you
> allergic to MS Word 2003? It should be easily convertible to html,
> but Word would be much easier for me to format.

Spambayes is NOT public domain but Python Software Foundation license. The PSF license is certified as an Open Source license. You might have heard about other Open Source software like Linux, which has a different (General Public License) but compatible license.

Hint, hint... Why does it always have to be MS Word 2003? That is way too expensive! Spambayes is free of charge, the programming language python is also free of charge, so why not keep the documentation file format also free of charge? If you want something with advanced formatting capabilities, why not use the Open Document Format that is used by many modern advanced software? Like OpenWriter, a part of OpenOffice, which has exactly the same capabilities as MS Word 2003, can read/write MS Word files, and is totally free of charge? And also has a compatible license?

--
Amedee Van Gasse
amedee at amedee.be

From nduncan at cqmail.net Wed May 17 20:41:16 2006
From: nduncan at cqmail.net (Nigel Duncan)
Date: Wed, 17 May 2006 20:41:16 +0200
Subject: [spambayes-dev] Probable error in Windows FAQ
Message-ID: <7.0.1.0.0.20060517203634.01a17c40@cqmail.net>

I am prepared to suggest alterations to the Windows FAQ (at least the parts that affect Eudora), but first I want to have it agreed that I have found an important error.

Essentially spam can only be RECEIVED mail from the POP server (or IMAP server if you happen to have one).
Therefore an SMTP server can have nothing to do with the question. I removed all references to the SMTP server and port in my Eudora.ini file and everything seems to be working fine - but I have added a return receipt to this message to be on the safe side.

Apart from this error (and another lesser one) I would try and develop a more user-friendly version as I gain experience (without getting into Outlook or Outlook Express, which I have but don't use) as I am used to writing and translating technical documents with emphasis on clarity for the user.

Another point is the preferred file format - as public domain are you allergic to MS Word 2003? It should be easily convertible to html, but Word would be much easier for me to format.

I am also sending this message to the developer's list as it is not 100% clear which is the right one for this case.

PS I spoke too soon. Some messages got through the SpamBayes proxy server. Now neither that nor my original one is recognised. So I think I need some HELP!!!!!, which I'll have to look for on the mailing list.

----------------------------------------------------------------
Nigel Duncan
Jerez 4, P4, 4B
E-28016 MADRID, Spain
Tel: (0034) 91-350-4793
email: nduncan at cqmail.net
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20060517/adf21c49/attachment.html

From tattoo93 at sbcglobal.net Sun May 21 22:46:43 2006
From: tattoo93 at sbcglobal.net (Sandra Brown)
Date: Sun, 21 May 2006 13:46:43 -0700
Subject: [spambayes-dev] Question
Message-ID: <003501c67d17$ac089040$0301a8c0@Sandra>

When I attempt to download SpamBayes to my computer, it asks me to identify a "mirror". What does that mean? I live in Northern Calif., which mirror should I choose, if any?

Thanks

Life might not be the party we hoped for, but while we are here we may as well dance!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20060521/65095e6f/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 862 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20060521/65095e6f/attachment.gif

From richie at entrian.com Sun May 21 23:22:03 2006
From: richie at entrian.com (Richie Hindle)
Date: Sun, 21 May 2006 22:22:03 +0100
Subject: [spambayes-dev] Question
In-Reply-To: <003501c67d17$ac089040$0301a8c0@Sandra>
References: <003501c67d17$ac089040$0301a8c0@Sandra>
Message-ID:

Hi Sandra,

> When I attempt to download SpamBayes to my computer, it asks me to identify
> a "mirror". What does that mean? I live in Northern Calif., which mirror
> should I choose, if any?

It doesn't really matter. In theory, one that's geographically closer to you might give you a faster download. Just pick any of the North American ones.

--
Richie Hindle
richie at entrian.com

From skip at pobox.com Mon May 22 00:02:15 2006
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 21 May 2006 17:02:15 -0500
Subject: [spambayes-dev] Question
In-Reply-To: <003501c67d17$ac089040$0301a8c0@Sandra>
References: <003501c67d17$ac089040$0301a8c0@Sandra>
Message-ID: <17520.58215.403669.317252@montanaro.dyndns.org>

    Sandra> When I attempt to download SpamBayes to my computer, it asks me
    Sandra> to identify a "mirror". What does that mean? I live in
    Sandra> Northern Calif., which mirror should I choose, if any?

SourceForge hosts a huge number of projects, so many that they can't possibly provide the service they need from just a single computer. Accordingly, they work with other organizations to provide mirrors of their content scattered all over the Internet. When you download files from SourceForge they ask you to choose which mirror you want to actually download from.
(With a bit of extra effort they could probably make the mirror selection process transparent to the user.)

Here in the USA it generally doesn't matter which North American mirror you choose, especially given that the SpamBayes distribution is so small. I've found the mirror at the University of Minnesota to be fast and reliable (I'm in Chicago).

Skip