From tameyer at ihug.co.nz Tue Mar 1 01:13:44 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Mar 1 01:13:53 2005
Subject: [spambayes-dev] My previous bug report

> When is v1.1 being released so I can finally recommend it as
> a corporate solution?

I had hoped that an alpha would be out a month ago (obviously it wasn't). Unfortunately there was a complication with 1.0.2, which used up time, and I've been too busy with paid work since then to do just about anything with SpamBayes. That won't change until around the 20th of March. So unless one of the other developers decides to put 1.1a1 out, it won't be until around the end of March.

I expect that about another month will pass before 1.1a2, then hopefully 1.1b1 a couple of weeks later, quickly followed by 1.1 final.

In the meantime, you can of course use the code from CVS, or patch 1.0.3 yourself, and use that.

=Tony.Meyer

From tameyer at ihug.co.nz Tue Mar 1 02:48:08 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Mar 1 02:48:14 2005
Subject: [spambayes-dev] sb_imapfilter with multiple accounts

> It is now possible (thanks to Tony Meyer) to have a single
> instance of sb_imapfilter deal with multiple IMAP accounts.
> That's great.
>
> However, it seems to me that there is something lacking in
> what you can specify in the config file. The way things are
> implemented (and I don't see how it can be done differently
> currently) is that for each of the accounts, all mail folders
> in the imap:filter_folders configuration variable are
> filtered. But it may well be that the different accounts use
> different sets of folders.

Ah - didn't think of that. It's a good point - and it negates most of the use of being able to filter multiple accounts, unless you have a very simple (or matching) setup (e.g. filter "Inbox", move to "Unsure" and "Spam").

> In my case it's even worse. I use two sb_imapfilter setups
> with different filter_folders, but also with different cutoff values.
>
> Has anybody given these kinds of problems any thought?

Maybe it would be better if you could give imapfilter a list of configuration files. It could then cycle through them (e.g. load config, do filter/train, unload config, move to next file). That way you'd only need one imapfilter instance running, but it could handle any set of different options. It'd load them over the top of the config files found in the normal process (BAYESCUSTOMIZE, bayescustomize.ini, .spambayesrc etc), so options global to all servers could still be easily set.

Actually, I guess you wouldn't have to explicitly give a list of configuration files. Imapfilter could look for a (e.g.) server_name.ini automatically, and load that if found.

Does this sound useful? Anyone have a better idea?

=Tony.Meyer

From fctr at nac.net Thu Mar 3 22:19:46 2005
From: fctr at nac.net (From Concept To Reality, L.L.C.)
Date: Thu Mar 3 22:46:32 2005
Subject: [spambayes-dev] SPAM collection
Message-ID: <20050303214630.6D0CC1E4009@bag.python.org>

Greetings:

I've been building up a rather large (in my opinion) SPAM collection and have removed all the personal information out of them and invalidated all the e-mail addresses (changed 'em to @domain.com) in the headers.

Just how many e-mails are needed to make a good size collection for testing purposes and further development? 1000? 5000? 25000?

And once I hit that number, where should I deposit them? Is there a web site, or should I just host it myself?

Sincerely,
Andrew Burns
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+
| From Concept To Reality, LLC. |
| (fctr@nac.net)                |
| http://users.nac.net/fctr     |
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+
| Voice: (908) 879-3274         |
| FAX: (908) 879-3275           |
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+

From sjoerd at acm.org Wed Mar 9 16:03:19 2005
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Wed Mar 9 16:04:31 2005
Subject: [spambayes-dev] sb_imapfilter with multiple accounts
Message-ID: <422F1037.3090001@acm.org>

Tony Meyer wrote:
>>It is now possible (thanks to Tony Meyer) to have a single
>>instance of sb_imapfilter deal with multiple IMAP accounts.
>>That's great.
>>
>>However, it seems to me that there is something lacking in
>>what you can specify in the config file. The way things are
>>implemented (and I don't see how it can be done differently
>>currently) is that for each of the accounts, all mail folders
>>in the imap:filter_folders configuration variable are
>>filtered. But it may well be that the different accounts use
>>different sets of folders.
>
>
> Ah - didn't think of that. It's a good point - and it negates most of the
> use of being able to filter multiple accounts, unless you have a very simple
> (or matching) setup (e.g. filter "Inbox", move to "Unsure" and "Spam").
>
>
>>In my case it's even worse. I use two sb_imapfilter setups
>>with different filter_folders, but also with different cutoff values.
>>
>>Has anybody given these kinds of problems any thought?
>
>
> Maybe it would be better if you could give imapfilter a list of
> configuration files. It could then cycle through them (e.g. load config, do
> filter/train, unload config, move to next file). That way you'd only need
> one imapfilter instance running, but it could handle any set of different
> options. It'd load them over the top of the config files found in the
> normal process (BAYESCUSTOMIZE, bayescustomize.ini, .spambayesrc etc), so
> options global to all servers could still be easily set.
>
> Actually, I guess you wouldn't have to explicitly give a list of
> configuration files. Imapfilter could look for a (e.g.) server_name.ini
> automatically, and load that if found.
>
> Does this sound useful? Anyone have a better idea?

This sounds like an excellent solution.
--
Sjoerd Mullender
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 374 bytes
Desc: OpenPGP digital signature
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20050309/6ccade64/signature.pgp

From tim.peters at gmail.com Tue Mar 15 03:30:39 2005
From: tim.peters at gmail.com (Tim Peters)
Date: Tue Mar 15 03:30:46 2005
Subject: [spambayes-dev] SPAM collection
In-Reply-To: <20050303214630.6D0CC1E4009@bag.python.org>
References: <20050303214630.6D0CC1E4009@bag.python.org>
Message-ID: <1f7befae05031418307bfac10c@mail.gmail.com>

[From Concept To Reality, L.L.C.]
> I've been building up a rather large (in my opinion) SPAM
> collection and have removed all the personal information out of them
> and invalidated all the e-mail addresses (changed 'em to
> @domain.com) in the headers.
>
> Just how many e-mails are needed to make a good size collection for
> testing purposes and further development? 1000? 5000? 25000?
>
> And once I hit that number, where should I deposit them? Is there a
> web site, or should I just host it myself?

Collections of all sizes are of use to someone -- it depends on their resources and goals. The SB project doesn't collect spam, because our goal is individualized training, and each person's ham/spam mix differs (and changes over time). Initial development was done on canned databases, though.

Here's a good list of spam collections:

http://www.paulgraham.com/spamarchives.html

If you intend to make a long-term commitment to this, you could ask Paul Graham to include a link to you there. Else you can find an archive there to which you can contribute.

From lionel at mamane.lu Wed Mar 16 00:03:31 2005
From: lionel at mamane.lu (Lionel Elie Mamane)
Date: Wed Mar 16 00:03:33 2005
Subject: [spambayes-dev] Re: Evading bayesian spam filtering?
In-Reply-To: <20050315224012.GA12226@tofu.mamane.lu>
References: <20050315224012.GA12226@tofu.mamane.lu>
Message-ID: <20050315230331.GB12226@tofu.mamane.lu>

On Tue, Mar 15, 2005 at 11:40:12PM +0100, Lionel Elie Mamane wrote:
> I just got this "interesting" spam, you might be interested
> in. Clever way to evade bayesian filters that classify "I don't
> know" (which looks to me like is the guaranteed score here...)

I meant "if you get the details right", obviously. This particular mail screwed the details (still sent HTML, still a spammy sentence in the first link in the HTML (!), ...), but I wouldn't be surprised a mail with "the details right" would evade most bayesian filtering. Am I missing something?

> as "ham".

--
Lionel

From lionel at mamane.lu Tue Mar 15 23:40:12 2005
From: lionel at mamane.lu (Lionel Elie Mamane)
Date: Wed Mar 16 00:09:49 2005
Subject: [spambayes-dev] Evading bayesian spam filtering?
Message-ID: <20050315224012.GA12226@tofu.mamane.lu>

Dear all,

I just got this "interesting" spam, you might be interested in. Clever way to evade bayesian filters that classify "I don't know" (which looks to me like is the guaranteed score here...) as "ham".

Hope it doesn't get too popular. :-(

Bye,
--
Lionel
-------------- next part --------------
An embedded message was scrubbed...
From: "parker koerner"
Subject: Satisfaction zone jacques
Date: Wed, 16 Mar 2005 05:14:57 -0500
Size: 7042
Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20050315/6d0f2813/attachment.mht

From tameyer at ihug.co.nz Wed Mar 16 00:54:33 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Mar 16 00:54:47 2005
Subject: [spambayes-dev] Re: Evading bayesian spam filtering?

> > I just got this "interesting" spam, you might be interested
> > in. Clever way to evade bayesian filters that classify "I don't
> > know" (which looks to me like is the guaranteed score here...)
>
> I meant "if you get the details right", obviously. This particular
> mail screwed the details (still sent HTML, still a spammy sentence in
> the first link in the HTML (!), ...), but I wouldn't be surprised a
> mail with "the details right" would evade most bayesian filtering. Am
> I missing something?

I wrote a whole response to this before I figured out that these are ascii art pictures and not images (they look like images to me). The message will have to be HTML for this to work, though, or the ascii art is too huge to fit anything in (or the ways that it can be drawn are limited). (Looking at the plain-text version of the message, I can't make out the words at all).

I get a fair number of these sorts of messages now - although I had thought they were images, so I'm not sure what percentage actually are images and which are ascii art. Typically the only spammy content in the body of the message is an image and the text is word salad, or news clippings or something like that.

What did the message score for you? I notice these in my unsures (high unsures, usually) - there may be ones in the spam folder, too, but I don't pay enough attention to those messages to know for sure.

Your message scored 0.997712 for me, which isn't too bad considering it was to you and not me. (All but five of the spam clues were hapaxes, so this is almost certainly because I received something very similar that was unsure and then trained on).

There are filters that make an effort to look at the message in 'eye space' (i.e. as the user sees it), which SpamBayes doesn't really do. If this sort of thing works, then more of that might be necessary, although I think there are other ways of countering this.
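The scoring vocabulary above (hapaxes, unsures, a score of 0.997712) can be illustrated with a short sketch. It is not SpamBayes's own classifier code: it assumes Gary Robinson's adjusted token probability with an unknown-word strength of 0.45 and an unknown-word probability of 0.5, chi-squared combining of the clue probabilities, and ham/spam cutoffs of 0.20 and 0.90. Those constants and every function name here are assumptions to check against the real SpamBayes classifier and options, and the sketch skips the step that keeps only the strongest clues.

```python
import math

# Assumed settings for this sketch; check the real SpamBayes defaults before relying on them.
UNKNOWN_WORD_STRENGTH = 0.45  # s: weight given to the prior
UNKNOWN_WORD_PROB = 0.5       # x: prior spam probability of a token with no evidence
HAM_CUTOFF = 0.20             # scores below this are called ham
SPAM_CUTOFF = 0.90            # scores at or above this are called spam


def token_spamprob(spamcount, hamcount, nspam, nham):
    """Robinson-adjusted spam probability of one token (hypothetical helper)."""
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    if spamratio + hamratio == 0.0:
        return UNKNOWN_WORD_PROB
    p = spamratio / (spamratio + hamratio)
    n = spamcount + hamcount
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    # Little evidence (small n) pulls the estimate towards x; a hapax has n == 1.
    return (s * x + n * p) / (s + n)


def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi2_score(probs):
    """Combine clue probabilities (each strictly between 0 and 1) into one score."""
    n = len(probs)
    if n == 0:
        return 0.5
    spam_evidence = -2.0 * sum(math.log(1.0 - p) for p in probs)
    ham_evidence = -2.0 * sum(math.log(p) for p in probs)
    spamminess = 1.0 - chi2q(spam_evidence, 2 * n)
    hamminess = 1.0 - chi2q(ham_evidence, 2 * n)
    return (spamminess - hamminess + 1.0) / 2.0


def classify(score):
    """Map a score to ham / unsure / spam using the assumed cutoffs."""
    if score < HAM_CUTOFF:
        return "ham"
    if score >= SPAM_CUTOFF:
        return "spam"
    return "unsure"


hapax = token_spamprob(spamcount=1, hamcount=0, nspam=100, nham=100)
print(round(hapax, 4))  # 0.8448
print(classify(chi2_score([hapax] * 20)))  # spam
```

Under these assumptions a token seen once in spam and never in ham gets about 0.845 rather than 1.0, because the prior pulls low-evidence tokens towards 0.5. A message whose clues are mostly such hapaxes can still combine to a score near 1, while a mix of strong ham and strong spam clues lands in the unsure band between the two cutoffs.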

=Tony.Meyer

From lionel at mamane.lu Wed Mar 16 08:33:32 2005
From: lionel at mamane.lu ('Lionel Elie Mamane')
Date: Wed Mar 16 08:33:35 2005
Subject: [spambayes-dev] Re: Evading bayesian spam filtering?
Message-ID: <20050316073332.GB4282@tofu.mamane.lu>

(I'm dropping Xavier Leroy from the CC list.)

On Wed, Mar 16, 2005 at 12:54:33PM +1300, Tony Meyer wrote:
>>> I just got this "interesting" spam, you might be interested
>>> in. Clever way to evade bayesian filters that classify "I don't
>>> know" (which looks to me like is the guaranteed score here...)

>> I meant "if you get the details right", obviously. This particular
>> mail screwed the details (still sent HTML, still a spammy sentence
>> in the first link in the HTML (!), ...), but I wouldn't be
>> surprised a mail with "the details right" would evade most bayesian
>> filtering. Am I missing something?

> I wrote a whole response to this before I figured out that these are
> ascii art pictures and not images (they look like images to me).
> The message will have to be HTML for this to work, though, or the ascii
> art is too huge to fit anything in (or the ways that it can be drawn
> are limited). (Looking at the plain-text version of the message, I
> can't make out the words at all).

No? I don't run any HTML engine on my mails - ever (unless I have vaguely checked the HTML code and it comes from a clueless family member, and even then, it goes through "lynx -dump"), and the ascii art was clear and readable to me. Maybe a question of habit (I suppose you did use a fixed pitch font to look at the plain-text version? If not, this would probably explain it), or my bad eyesight "averaged out" the "pixels". ;-)

> I get a fair number of these sorts of messages now.

> What did the message score for you?

I don't have a well-trained spambayes, so I cannot give precise figures.

> Your message scored 0.997712 for me, which isn't too bad considering
> it was to you and not me.

What does it score if you remove the spammy parts? I mean:

 - the following line:

   <a href="http://vietnamese.com.medattuneto.com/?Bggiw/x">more convenience: LOow price meds</a>

 - all HTML tags, and the text/html MIME declaration

 - maybe "satisfaction" from the title?

> (All but five of the spam clues were hapaxes, so this is almost
> certainly because I received something very similar that was unsure
> and then trained on).

I see. Were some of them the "random" words making up the ASCII art? Then you may have gotten the very same spam before :)

> There are filters that make an effort to look at the message in 'eye
> space' (i.e. as the user sees it),

Really going to eye space for that kind of thing needs OCR... That's a wholly new level of complexity thrown in.

> If this sort of thing works, then more of that might be necessary,
> although I think there are other ways of countering this.

What ways are you thinking about?

--
Lionel

From chadihamdan at gmail.com Wed Mar 16 09:43:25 2005
From: chadihamdan at gmail.com (chadi hamdan)
Date: Wed Mar 16 09:43:28 2005
Subject: [spambayes-dev] outlook 2000 dosn't respond
Message-ID: <68eb624f0503160043840da81@mail.gmail.com>

Hi,

I have been using Spambayes for a while, and it worked great until lately, when it just stopped. I did uninstall and re-install but nothing. The icons are there, but when I click nothing happens; even if I click on Remove as Spam, nothing happens either.

Chadi

From tameyer at ihug.co.nz Thu Mar 17 07:16:52 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Thu Mar 17 07:17:04 2005
Subject: [spambayes-dev] Re: Evading bayesian spam filtering?

>> The message will have to be HTML for this to work, though, or the ascii
>> art is too huge to fit anything in (or the ways that it can be drawn
>> are limited). (Looking at the plain-text version of the message, I
>> can't make out the words at all).
>
> No? I don't run any HTML engine on my mails - ever (unless I have
> vaguely checked the HTML code and it comes from a clueless family
> member, and even then, it goes through "lynx -dump"), and the ascii
> art was clear and readable to me.
> Maybe a question of habit (I suppose
> you did use a fixed pitch font to look at the plain-text version?

Yes - I read email in Outlook, in a fixed-width font (Courier New, size 10). Looking at the plain-text version of the message the writing was massive (with the default font size, rather than the HTML-specified size) and so hard to read. It also had a blank line between each line, which made it even worse. Have you looked at it in HTML? It really is much clearer (unless your regular font size is that small, I suppose).

> What does it score if you remove the spammy parts? I mean:
>
> - the following line:
>
>   <a href="http://vietnamese.com.medattuneto.com/?Bggiw/x">more convenience: LOow price meds</a>
>
> - all HTML tags, and the text/html MIME declaration
>
> - maybe "satisfaction" from the title?

0.994954235889 - almost no difference at all.

> > (All but five of the spam clues were hapaxes, so this is almost
> > certainly because I received something very similar that was unsure
> > and then trained on).
>
> I see. Were some of them the "random" words making up the ASCII art?
> Then you may have gotten the very same spam before :)

Yes, many of them were. It could also have been spam that uses the same technique and happened to hit on the same sequences of characters. There's a huge variety of ways that the image could be composed, but if the same ASCII art generation engine is creating them all then there will probably be a few sequences that are commonly used. It's certainly extremely unlikely that any of them would appear in ham.

It would be most interesting to know what it scored for someone else that hasn't trained on a message like this before. I know that I haven't seen any false negatives from these (just unsures), but I can't recall what the scores were.

> Really going to eye space for that kind of thing needs OCR... That's a
> wholly new level of complexity thrown in.

There are mid steps, but yes, that would be one way of doing it. I haven't used OCR in about a decade, so I really have no idea what good OCR is like these days. I would have thought that it would be reasonably fast & accurate by now (it wasn't that bad then). Regular OCR wouldn't help here, probably, since if it was any good it would churn out the same ASCII art. You'd need to pre-process the image first (blurring it, for example, or maybe some loose edge detection) probably.

>> If this sort of thing works, then more of that might be necessary,
>> although I think there are other ways of countering this.
>
> What ways are you thinking about?

Things like Jonathan A. Zdziarski's Bayesian Noise Reduction, for example.
IIRC the original intent was to counter things like word salad (which the ASCII art technique sometimes also employs), but I would presume that a message like this would be almost all noise (although I haven't done any testing on this). You could do other analysis like this, too, like looking at sentence structure (of which there is none here) - not a dead giveaway (there are plenty of people sending appallingly written email), but still a clue.

There's also plenty of content in the headers that this trick does nothing to hide, which can be analysed by simple statistical techniques like SpamBayes uses, or with black- or grey-listing, or with social-circle analysis, or whatever.

=Tony.Meyer

From tameyer at ihug.co.nz Thu Mar 17 07:26:32 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Thu Mar 17 07:26:40 2005
Subject: [spambayes-dev] sb_imapfilter with multiple accounts

[Tony Meyer]
>> Maybe it would be better if you could give imapfilter a list of
>> configuration files.

[Sjoerd Mullender]
> This sounds like an excellent solution.

I've checked in changes to sb_imapfilter to implement this. I've done some limited testing of it, but haven't had a chance to do any thorough testing (I only have one IMAP account at the moment, so I'll have to set up a second one to test it properly), and haven't checked in any updates to the unit tests to cover it.

I'd like to get 1.1a1 out next week (only two months later than planned...), which will include this, so I'll try and get better testing done before then.

=Tony.Meyer

--
Please always include the list (spambayes@python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

From tameyer at ihug.co.nz Thu Mar 17 07:31:15 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Thu Mar 17 07:31:19 2005
Subject: [spambayes-dev] Windows/Unix(?) Command Line Training

> HOWEVER, it's not easy to "automatically" train e-mail from
> PMMail 2000. So, I've written some command line code that
> connects to the SPAMBayes webserver and trains on a filename.
>
> The source code is located at:
>
> http://uses.nac.net/fctr/spambayes/trainasspam.cpp
[...]

I've only just managed to (try to) have a look at this, sorry. Unfortunately I can't get to uses.nac.net (DNS won't resolve). Is that the right address?

> The command line looks like this:
> C:\> TrainAsSpam FalseNegativeMessage.txt
[...]

1.1's sb_upload.py has a -t switch that 1.0.x's doesn't, which does this too. Since I can't see the code for your program I can't tell if they're doing the same thing, but I presume so. (With 1.0.x, you could use sb_filter/sb_mboxtrain to train, but you'd have to shut sb_server down first).

=Tony.Meyer

--
Please always include the list (spambayes@python.org) in your replies
(reply-all), and please don't send me personal mail about SpamBayes.
http://www.massey.ac.nz/~tameyer/writing/reply_all.html explains this.

From fctr at nac.net Thu Mar 17 07:36:18 2005
From: fctr at nac.net (From Concept To Reality, L.L.C.)
Date: Thu Mar 17 07:36:24 2005
Subject: [spambayes-dev] Windows/Unix(?) Command Line Training
Message-ID: <20050317063622.D27AE1E4004@bag.python.org>

On Thu, 17 Mar 2005 19:31:15 +1300, Tony Meyer wrote:

>> HOWEVER, it's not easy to "automatically" train e-mail from
>> PMMail 2000. So, I've written some command line code that
>> connects to the SPAMBayes webserver and trains on a filename.
>>
>> The source code is located at:
>>
>> http://uses.nac.net/fctr/spambayes/trainasspam.cpp

I am SUCH a dumbass...

http://users.nac.net/fctr/spambayes/trainasspam.cpp

Try that one! *LAUGH*

Sincerely,
Andrew Burns
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+
| From Concept To Reality, LLC. |
| (fctr@nac.net)                |
| http://users.nac.net/fctr     |
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+
| Voice: (908) 879-3274         |
| FAX: (908) 879-3275           |
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+

From sjoerd at acm.org Thu Mar 17 11:39:27 2005
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Thu Mar 17 11:39:38 2005
Subject: [spambayes-dev] sb_imapfilter with multiple accounts
Message-ID: <42395E5F.7010509@acm.org>

Skipped content of type multipart/mixed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 374 bytes
Desc: OpenPGP digital signature
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20050317/994f0b70/signature.pgp

From sjoerd at acm.org Thu Mar 17 11:49:25 2005
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Thu Mar 17 11:49:31 2005
Subject: [spambayes-dev] sb_imapfilter with multiple accounts
In-Reply-To: <42395E5F.7010509@acm.org>
References: <42395E5F.7010509@acm.org>
Message-ID: <423960B5.9010502@acm.org>

Skipped content of type multipart/mixed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 374 bytes
Desc: OpenPGP digital signature
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20050317/5a645a21/signature.pgp

From kenny.pitt at gmail.com Thu Mar 17 14:35:24 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Thu Mar 17 14:35:29 2005
Subject: [spambayes-dev] BSDDB version issues (was 1.0.2 and 1.1a1)
Message-ID: <4239879e.5cb327b8.39b1.5901@mx.gmail.com>

Hernan Martinez Foffani wrote:
> Hope you don't mind (I don't know if you already knew it or if it applies)
> but are you aware that the Subversion core dev team currently advise
> against the use of bsddb 4.1 versions? They recommend either 4.0 or
> 4.2.
> http://www.subversion.org/faq.html#bdb41-tabletype-bug
>
> In Python 2.3.4 (that's what I've got installed):
>
> >>> import bsddb
> >>> bsddb._db.version()
> (4, 1, 25)
> >>> bsddb._db.DB_VERSION_STRING
> 'Sleepycat Software: Berkeley DB 4.1.25: (December 19, 2002)'
> >>>

I'm afraid I've been a bit scarce lately. Things are rather hectic right now in the world of paid software development, and probably will be until at least mid-year.

I just got around to checking this in Python 2.4:

>>> import bsddb
>>> bsddb._db.version()
(4, 2, 52)
>>> bsddb._db.DB_VERSION_STRING
'Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)'
>>>

Now that the binary installer is being built with Python 2.4, it will be interesting to see if reports of the "run recovery" problem decrease.

--
Kenny Pitt

From bounce_29155917 at mail.sign-up.to Thu Mar 17 17:26:28 2005
From: bounce_29155917 at mail.sign-up.to (Adam)
Date: Thu Mar 17 17:26:29 2005
Subject: [spambayes-dev] 19th March
Message-ID: <200503171626.j2HGQS7m002313@h100240.tdmgroup.net>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20050317/22959fa5/attachment.htm

From tameyer at ihug.co.nz Fri Mar 18 03:06:11 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Mar 18 03:07:41 2005
Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayes Corpus.py, 1.21, 1.22 FileCorpus.py, 1.18, 1.19

> future imports have to come first...

I thought that too, but if I run this script (with 2.2), it works:

"""
__author__ = "Tony Meyer"

from __future__ import generators

def a():
    yield 1

for b in a():
    print b
"""

Why is that? Is it just an implementation quirk?

=Tony.Meyer

From t-meyer at ihug.co.nz Fri Mar 18 03:18:00 2005
From: t-meyer at ihug.co.nz (Tony Meyer)
Date: Fri Mar 18 03:18:05 2005
Subject: [spambayes-dev] BSDDB version issues (was 1.0.2 and 1.1a1)

> I'm afraid I've been a bit scarce lately.

Haven't we all!

> Things are rather
> hectic right now in the world of paid software development, and
> probably will be until at least mid-year.

Things have been rather hectic here, too, although that should be changing about now. That could work out ok for you anyway, since around that time 1.1 final should be out and more interesting work on 1.2 can begin :) (i.e. you get to skip the dull 1.1 bug fixing process...)

> I just got around to checking this in Python 2.4:
>
> >>> import bsddb
> >>> bsddb._db.version()
> (4, 2, 52)
> >>> bsddb._db.DB_VERSION_STRING
> 'Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)'
> >>>
>
> Now that the binary installer is being built with Python 2.4,
> it will be interesting to see if reports of the "run recovery"
> problem decrease.

If it doesn't decrease to 0 it'll be hard to tell, though. These reports are pretty rare these days (unless people are actually reading the FAQ and not reporting them, which seems unlikely).

=Tony.Meyer

From skip at pobox.com Fri Mar 18 03:38:56 2005
From: skip at pobox.com (Skip Montanaro)
Date: Fri Mar 18 03:38:46 2005
Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayes Corpus.py, 1.21, 1.22 FileCorpus.py, 1.18, 1.19
Message-ID: <16954.16192.160931.532564@montanaro.dyndns.org>

>> future imports have to come first...

Tony> I thought that too, but if I run this script (with 2.2), it works:
...

Sure, but not with CVS HEAD:

% python
Python 2.5a0 (#75, Mar 15 2005, 21:55:51)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1671)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import example
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SyntaxError: from __future__ imports must occur at the beginning of the file (example.py, line 3)

Think of it as __future__-proofing... ;-)

Skip

From tameyer at ihug.co.nz Fri Mar 18 04:08:43 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Mar 18 04:08:49 2005
Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/spambayes Corpus.py, 1.21, 1.22 FileCorpus.py, 1.18, 1.19

>>> future imports have to come first...
>> I thought that too, but if I run this script (with 2.2), it works:
>
> Sure, but not with CVS HEAD:

Ah - too cutting edge for me :)

> % python
> Python 2.5a0 (#75, Mar 15 2005, 21:55:51)
> [GCC 3.3 20030304 (Apple Computer, Inc. build 1671)] on darwin
> Type "help", "copyright", "credits" or "license" for more
> information.
> >>> import example
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> SyntaxError: from __future__ imports must occur at the
> beginning of the file (example.py, line 3)
>
> Think of it as __future__-proofing... ;-)

:) So I guess this was a case of the implementation not matching the documentation and it just took 3 minor versions to get fixed. I'm sure there's irony in there somewhere, but I can't quite spot it :)

=Tony.Meyer

From tameyer at ihug.co.nz Fri Mar 18 04:18:45 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Mar 18 04:18:50 2005
Subject: [spambayes-dev] sb_imapfilter with multiple accounts

> Come to think of it, username + server name is also not
> enough for me: I also use different settings for the same
> user name/server name combination, but for different folders.

I agree that username + server name would be better (though the filename is getting rather long now!). I wonder if your case is somewhat extreme... :)

>> Alternatively, there could just be a list of option files
>> in the main option file.
>
> This will probably have to be the way to go.

The other solution seemed tidier, but this should work and would be more flexible. It will be more work to implement, though, since loading/unloading the config files will have to be much higher up in the process.
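A minimal sketch of the "list of option files in the main option file" approach follows, assuming Python's standard `configparser` in place of SpamBayes's own options code. The `account_option_files` key, the `work.ini`/`home.ini` names and the `show_settings` stand-in for filtering/training are hypothetical. The section and option names are illustrative and only loosely mirror `imap:filter_folders` and the cutoff values discussed in the thread. Each pass builds a fresh parser from the global options plus one account file, so nothing leaks from one account into the next (load config, filter/train, unload, move on).

```python
import configparser

# Hypothetical main option file: account_option_files is invented for this
# sketch; section/option names only loosely mirror SpamBayes's options.
MAIN = """
[imap]
filter_folders = INBOX
account_option_files = work.ini home.ini

[Categorization]
spam_cutoff = 0.90
"""

# Hypothetical per-account files, kept in memory so the sketch needs no disk I/O.
FILES = {
    "work.ini": "[imap]\nfilter_folders = INBOX Lists\n",
    "home.ini": "[Categorization]\nspam_cutoff = 0.95\n",
}


def make_parser(*texts):
    """Build a parser from option texts; later texts override earlier ones."""
    parser = configparser.ConfigParser()
    for text in texts:
        parser.read_string(text)
    return parser


def run_all_accounts(main_text, files, filter_account):
    """One pass per listed file: load config, filter/train, drop it, move on."""
    main = make_parser(main_text)
    results = {}
    for name in main.get("imap", "account_option_files").split():
        # A fresh parser per account, so no override leaks into the next one.
        options = make_parser(main_text, files[name])
        results[name] = filter_account(options)
    return results


def show_settings(options):
    """Stand-in for the real filter/train step: report what it would use."""
    return (options.get("imap", "filter_folders"),
            options.getfloat("Categorization", "spam_cutoff"))


print(run_all_accounts(MAIN, FILES, show_settings))
# {'work.ini': ('INBOX Lists', 0.9), 'home.ini': ('INBOX', 0.95)}
```

Re-reading the main options for every account is the simplest way to "unload" the previous account's overrides; a long-running filter would presumably read each listed file from disk rather than from an in-memory mapping.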

>> For now I'll make a local patch to see whether it actually works...

Sounds good :) I'm happy to back out what I checked in.

=Tony.Meyer

From tameyer at ihug.co.nz Fri Mar 18 04:25:22 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Mar 18 04:25:34 2005
Subject: [spambayes-dev] Windows/Unix(?) Command Line Training

>>> HOWEVER, it's not easy to "automatically" train e-mail from
>>> PMMail 2000. So, I've written some command line code that
>>> connects to the SPAMBayes webserver and trains on a filename.
>>>
>>> The source code is located at:
[...]
> http://users.nac.net/fctr/spambayes/trainasspam.cpp

Ah, yes, now I remember why I like Python <0.5 wink>. This does appear to be doing just what sb_upload.py does, except that it's 599 lines rather than 163, and does less :) OTOH, I'm sure that the compiled C++ would result in a smaller executable than a frozen sb_upload.py, and would run faster.

Perhaps you'd like to put a link to this on the SpamBayes wiki? - the UserRecipes page is probably the most logical.

=Tony.Meyer

From bounce_29291988 at mail.sign-up.to Fri Mar 18 12:13:03 2005
From: bounce_29291988 at mail.sign-up.to (Adam)
Date: Fri Mar 18 12:13:05 2005
Subject: [spambayes-dev] 19th March
Message-ID: <200503181113.j2IBD3Oc030786@h100239.tdmgroup.net>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20050318/277eebd1/attachment.html

From bounce_29096274 at mail.sign-up.to Thu Mar 17 15:34:43 2005
From: bounce_29096274 at mail.sign-up.to (Sarah)
Date: Fri Mar 18 19:16:54 2005
Subject: [spambayes-dev] the plan for the weekend
Message-ID: <200503171434.j2HEYhkk003915@h100240.tdmgroup.net>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20050317/146c62a5/attachment.html

From bounce_29343432 at mail.sign-up.to Fri Mar 18 15:54:57 2005
From: bounce_29343432 at mail.sign-up.to (Sarah)
Date: Sat Mar 19 10:49:57 2005
Subject: [spambayes-dev] the plan for the weekend
Message-ID: <200503181454.j2IEsvvN015371@h100239.tdmgroup.net>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20050318/93e04c5f/attachment.htm

From hernan at orgmf.com.ar Mon Mar 21 12:44:54 2005
From: hernan at orgmf.com.ar (Hernan Martinez Foffani)
Date: Mon Mar 21 12:45:48 2005
Subject: [spambayes-dev] BSDDB version issues (was 1.0.2 and 1.1a1)
In-Reply-To: <4239879e.5cb327b8.39b1.5901@mx.gmail.com>

>> Hope you don't mind (I don't know if you already knew it or if it applies)
>> but are you aware that the Subversion core dev team currently advise
>> against the use of bsddb 4.1 versions? They recommend either 4.0 or
>> 4.2. http://www.subversion.org/faq.html#bdb41-tabletype-bug
>>
>> In Python 2.3.4 (that's what I've got installed):
>>
>> >>> import bsddb
>> >>> bsddb._db.version()
>> (4, 1, 25)
>> >>> bsddb._db.DB_VERSION_STRING
>> 'Sleepycat Software: Berkeley DB 4.1.25: (December 19, 2002)'
>> >>>
>
> I'm afraid I've been a bit scarce lately. Things are rather hectic
> right now in the world of paid software development, and probably
> will be until at least mid-year.
>
> I just got around to checking this in Python 2.4:
>
> >>> import bsddb
> >>> bsddb._db.version()
> (4, 2, 52)
> >>> bsddb._db.DB_VERSION_STRING
> 'Sleepycat Software: Berkeley DB 4.2.52: (December 3, 2003)'
> >>>
>
> Now that the binary installer is being built with Python 2.4, it will
> be interesting to see if reports of the "run recovery" problem
> decrease.

You may also find this mail (perhaps also the whole thread)

http://subversion.tigris.org/servlets/ReadMsg?list=users&msgNo=28110

interesting reading.

Regards,
-Hernan.
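The interactive checks in this thread boil down to asking whether the linked Berkeley DB belongs to the 4.1 series that the cited Subversion FAQ advises against. The helpers below are a hypothetical sketch of that check, not SpamBayes code. They parse a `DB_VERSION_STRING`-style banner (or accept a `version()`-style tuple) instead of importing `bsddb`, since that module only ships with Python 2. Treating 4.1.x as the problem series comes from the FAQ reference in the thread, not from any SpamBayes policy.

```python
import re

# Assumption taken from the Subversion FAQ entry cited in the thread:
# Berkeley DB 4.1.x is the series to avoid (4.0 and 4.2 were recommended).
ADVISED_AGAINST = (4, 1)


def parse_db_version(banner):
    """Return (major, minor, patch) from a DB_VERSION_STRING-style banner."""
    match = re.search(r"Berkeley DB (\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError("no Berkeley DB version in %r" % (banner,))
    return tuple(int(part) for part in match.groups())


def is_advised_against(version):
    """True if a (major, minor, patch) tuple is in the 4.1.x series."""
    return tuple(version[:2]) == ADVISED_AGAINST


banner = "Sleepycat Software: Berkeley DB 4.1.25: (December 19, 2002)"
version = parse_db_version(banner)
print(version, is_advised_against(version))  # (4, 1, 25) True
```

On a Python 2 installation that has `bsddb`, `bsddb._db.version()` already returns such a tuple, so it could be passed straight to `is_advised_against`.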
From ta-meyer at ihug.co.nz Tue Mar 22 23:22:22 2005 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 22 23:22:27 2005 Subject: [spambayes-dev] Re: Is building with Python 2.4 and not Python 2.3 a bugfix? Message-ID: [Tony Meyer, back in November] >> There are various reasons why building the binary with Python 2.4 would be >> better than with Python 2.3, but the important one is that Python 2.4 has >> email 3.0. [...] >> However, 1.0.2 is only for bugfixes, not for new features. >> >> So my question is: is building with the newer Python valid for a bugfix >> release? [Tim Peters] > There's a small chance that 2.4 will introduce a relevant bug, and > then the decision to use it would look imprudent. In that case you'll > be blamed. But if all it does is improve email handling, nobody will > praise you. So there's the moral dilemma: you can improve people's > lives, or cover your ass. Since it's not my ass, I vote you expose > it. Of course, using 2.4 resulted in (at least) three bugs (not having msvcr71.dll, a modulefinder problem with bsddb, and the most annoying one, asyncore using up all of people's disk space). As Pratchett would say, those million-to-one chances do seem to crop up nine times out of ten... Anyway, while the first two problems were fixed for 1.0.3, the last one is still a major problem, and I'm sick of dealing with it. I'm going to do a 1.0.4 before Easter (and then run away over the holiday period so that no-one can find me) that includes the fix for that *and* is built with Python 2.3. People can have their X-SpamBayes-Exceptions back! So if anyone has any other bug fixes they'd like to see in 1.0.4, please commit them ASAP or let me know so I can hold off the build. I'd like to get 1.1a1 (*with* Python 2.4) done tomorrow too, but might not have a chance, in which case I'll get to it next week. There'll definitely be a 1.1a2, so there's not a huge hurry to get things into this release. Any objections?
=Tony.Meyer From fctr at nac.net Wed Mar 23 01:49:25 2005 From: fctr at nac.net (From Concept To Reality, L.L.C.) Date: Wed Mar 23 01:49:30 2005 Subject: [spambayes-dev] Re: Is building with Python 2.4 and not Python 2.3 a bugfix? In-Reply-To: Message-ID: <20050323004929.4C6E11E4005@bag.python.org> On Wed, 23 Mar 2005 10:22:22 +1200, Tony Meyer wrote: >Any objections? Just fix it! I don't know about the rest of you, but I'm ready to go back to IBM punch cards. *LAUGH* By the way, I've noticed SPAM that used to be classified as SPAM is now being DEFER'ed. I find this behaviour odd, since 85% of my e-mail is SPAM, and always has been. Am I alone, here? Sincerely, Andrew Burns +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+ | From Concept To Reality, LLC. | | (fctr@nac.net) | | http://users.nac.net/fctr | +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+ | Voice: (908) 879-3274 | | FAX: (908) 879-3275 | +=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+ From ta-meyer at ihug.co.nz Wed Mar 23 02:14:53 2005 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Wed Mar 23 02:15:06 2005 Subject: [spambayes-dev] Re: Is building with Python 2.4 and not Python 2.3 a bugfix? In-Reply-To: Message-ID: > By the way, I've noticed SPAM that used to be classified as SPAM is > now being DEFER'ed. I find this behaviour odd, since 85% of my > e-mail is SPAM, and always has been. By "DEFER'ed", do you mean that it's classified as unsure? This could be either new types of spam, or as a result of new training (e.g. a training mistake, a growing imbalance). The way to figure out why a message scores what it does is to look at the clues for that message (there's a link on the review page). If it's not obvious why it scored what it did, then you can send a copy on to spambayes@python.org and someone will comment on it. It's possible that the problem can be fixed by adjusting your thresholds (the defaults are fairly conservative). =Tony.Meyer From fctr at nac.net Wed Mar 23 02:50:13 2005 From: fctr at nac.net (From Concept To Reality, L.L.C.) 
Date: Wed Mar 23 02:50:19 2005 Subject: [spambayes-dev] Re: Is building with Python 2.4 and not Python 2.3 a bugfix? In-Reply-To: Message-ID: <20050323015018.2A0521E4005@bag.python.org> On Wed, 23 Mar 2005 13:14:53 +1200, Tony Meyer wrote: >> By the way, I've noticed SPAM that used to be classified as SPAM is >> now being DEFER'ed. I find this behaviour odd, since 85% of my >> e-mail is SPAM, and always has been. >By "DEFER'ed", do you mean that it's classified as unsure? Yes, unsure. >This could be >either new types of spam, or as a result of new training (e.g. a training >mistake, a growing imbalance). My current balance is 66% SPAM, 33% HAM. (Spam: 3082 Ham: 1096) Could this cause the problem? >The way to figure out why a message scores what it does is to look at the >clues for that message (there's a link on the review page). If it's not >obvious why it scored what it did, then you can send a copy on to >spambayes@python.org and someone will comment on it. >It's possible that the problem can be fixed by adjusting your thresholds >(the defaults are fairly conservative). I'll take a gander at the clues and see if it's something stupid that I did. Sincerely, Andrew Burns From ta-meyer at ihug.co.nz Wed Mar 23 03:42:46 2005 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Wed Mar 23 03:42:59 2005 Subject: [spambayes-dev] TREC-Spam Message-ID: I'm planning on participating in TREC 2005 (the spam track) using SpamBayes: Basically the idea is that a whole lot of filters are run over a few corpora (a couple of public and a couple of private) and the results are compared. (Not to say, "hey, my filter is best", but to see what works well, where improvements can be made, and all that).
The testing system is similar to our (Alex's) incremental testing setup; the steps are:

    initialize
    classify emailfile resultfile
    train [ham|spam] emailfile resultfile
    finalize

So there is (or can be) training after each classification. I'll create scripts (a modified sb_filter, probably) that do each of the steps. I don't think that train-on-everything is a good idea here, so I will include some sort of training regime (like the incremental testing setup), too (maybe train-to-exhaustion?).

I'm interested in doing this:

 o As research that I can work on after I submit my PhD and before I defend it.
 o To see how spambayes compares with various types of filter/corpus.
 o As a sideline to other research I'd like to do with spambayes (see #1).

To get to the point of the email:

 o Does anyone object to me using spambayes in this way? Everyone will be acknowledged in the write-ups and all that, obviously, and I'm participating as an individual (with tentative ties to my work, and obviously using the work of, but not speaking for, the spambayes group).
 o Is anyone else interested in this? I can certainly report back as things progress, but if anyone is really interested and can spare the time, I'd happily work on it with someone else.

=Tony.Meyer From Jeremy.Labram at Charteris.com Thu Mar 24 14:10:04 2005 From: Jeremy.Labram at Charteris.com (Jeremy Labram) Date: Thu Mar 24 14:13:52 2005 Subject: [spambayes-dev] In box - playing a sound on arrival of new mail. Message-ID: <559663697966B44F8DC0A81A585C3CA8577365@neptune1.TheMandelbrotSet.com> I would like the new mail alerts only to operate when they are 'ham' and not 'ham or spam'. Is there any way Outlook 2003 / Spambayes can be configured for this? I can find nothing obvious in the help and web-site faqs.
Thanks in advance Jeremy Labram, Principal, Charteris plc, tel 07989 330721 (M), 020 7600 9199 (Off) From tameyer at ihug.co.nz Tue Mar 29 03:29:39 2005 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 29 03:29:42 2005 Subject: [spambayes-dev] RE: [Spambayes] Jaded Easter Special - Extended daytime opening In-Reply-To: Message-ID: [snip yet-another-party-spam] These party spams are starting to bug me, since they're reasonably frequent. I know we don't generally filter spambayes@python.org or spambayes-dev@python.org (and don't really want to reopen that discussion), but would anyone mind if I added something like *@mail.sign-up.to (I'll take a look at them and see if I can find a tighter match) to the automatically reject/discard address list for the two lists? (I'll consider no response over the next week as no objection). =Tony.Meyer From tim.peters at gmail.com Tue Mar 29 03:34:24 2005 From: tim.peters at gmail.com (Tim Peters) Date: Tue Mar 29 03:34:28 2005 Subject: [spambayes-dev] RE: [Spambayes] Jaded Easter Special - Extended daytime opening In-Reply-To: References: Message-ID: <1f7befae050328173471997763@mail.gmail.com> [Tony Meyer] > [snip yet-another-party-spam] > > These party spams are starting to bug me, since they're reasonably frequent. > I know we don't generally filter spambayes@python.org or > spambayes-dev@python.org (and don't really want to reopen that discussion), > but would anyone mind if I added something like *@mail.sign-up.to (I'll take > a look at them and see if I can find a tighter match) to the automatically > reject/discard address list for the two lists?
> > (I'll consider no response over the next week as no objection). Go right ahead! I won't miss 'em either. From tameyer at ihug.co.nz Tue Mar 29 04:23:04 2005 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 29 04:23:09 2005 Subject: [spambayes-dev] RE: [Spambayes] Jaded Easter Special - Extended daytime opening In-Reply-To: Message-ID: [Tony Meyer] >> [snip yet-another-party-spam] >> >> [...] would anyone mind if I added something like >> *@mail.sign-up.to [...] to the automatically reject/discard >> address list for the two lists? [Tim Peters] > Go right ahead! I won't miss 'em either. I considered this a pronouncement and did it, using "^bounce_\d+@mail.sign-up.to" as the filter (for discard), which should work if the addresses stay nice and consistent. Looking at http://sign-up.to, "^.+sign-up.to" would probably be ok, but maybe there are some legit mail addresses there. So hopefully these won't be seen again (if they are, I'll take another look at the filter). =Tony.Meyer From popiel at wolfskeep.com Tue Mar 29 04:25:38 2005 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Tue Mar 29 04:25:41 2005 Subject: [spambayes-dev] RE: [Spambayes] Jaded Easter Special - Extended daytime opening In-Reply-To: Message from "Tony Meyer" of "Tue, 29 Mar 2005 14:23:04 +1200." References: Message-ID: <20050329022538.E19D02DF18@cashew.wolfskeep.com> In message: "Tony Meyer" writes: > >So hopefully these won't be seen again (if they are, I'll take another >look at the filter). I never saw them in the first place, so I guess my filter is working. ;-) - Alex
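The two discard patterns Tony mentions can be compared with Python's `re` module. This is a sketch under an assumption not stated in the thread: that the list software treats discard entries beginning with `^` as Python regular expressions and applies them to the sender address with `re.search`, case-insensitively. The `STRICT` pattern and every address except the bounce address from the archive are hypothetical, included only to show what the unescaped dots and the missing end anchor let through.

```python
import re

# The filter as installed: each unescaped "." matches any character.
INSTALLED = r"^bounce_\d+@mail.sign-up.to"
# The broader alternative mentioned in the message.
BROAD = r"^.+sign-up.to"
# Hypothetical stricter variant: literal dots plus an end-of-string anchor.
STRICT = r"^bounce_\d+@mail\.sign-up\.to$"


def is_discarded(sender, pattern):
    """Return True if the sender address matches the discard pattern."""
    return re.search(pattern, sender, re.IGNORECASE) is not None


if __name__ == "__main__":
    senders = [
        "bounce_29291988@mail.sign-up.to",  # address seen in the archive
        "someone@sign-up.to",               # hypothetical, possibly legitimate
        "someone@sign-up-tools.example",    # hypothetical, unrelated domain
        "bounce_1@mail.sign-up.tourist",    # hypothetical, longer domain
    ]
    for sender in senders:
        print(sender, [is_discarded(sender, p) for p in (INSTALLED, BROAD, STRICT)])
```

All three patterns catch the archived bounce address. The broad one also catches `someone@sign-up-tools.example`, because `.` matches the `-` before `tools`, and the installed one still matches `bounce_1@mail.sign-up.tourist`. Only the escaped, anchored form rejects both. Given how consistent the bounce addresses are, the installed filter is probably fine in practice. Escaping dots and anchoring the end are simply the usual ways to keep such filters from matching more than intended.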