From richard at jowsey.com Tue Apr 1 06:47:23 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Mon Mar 31 15:48:01 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To: <5.1.0.14.0.20030331073006.01ebac50@mail.telecommunity.com>
References: <3E88AFD0.4984.1072C98C@localhost>
Message-ID: <3E8935FB.16804.1EF3D9D@localhost>

> Won't this just convince spammers that:
> 1) Their spam is "working", because "people are clicking on the link",

Won't matter, coz I'm not buying anything, just slurping words. Let 'em
count away, haha...

> 2) If there's a unique ID in the URL, it will confirm that your
> address is live and that you're a sucker for whatever it is they
> mailed you. :)

More fools them. They've already got my address, and they'll keep on
mindlessly sending out junk no matter what...

> Of course, I also suppose it's possible that if enough people install
> a spam filter that works this way, the resulting "spambayes effect"
> might crash a few of their servers. :)

Ha! I doubt it. They'll only stop doing this single URL trick when it's
not translating into sales. The key point is that these little uglies
appear to be specifically designed to get past our kind of filtering,
by dint of insufficient clues in the email.

From richard at jowsey.com Tue Apr 1 07:20:07 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Mon Mar 31 16:20:25 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To:
References: <200303311251.h2VCp4419496@localhost.localdomain>
Message-ID: <3E893DA7.31420.20D35DB@localhost>

> We have to be careful with this. It would be relatively simple to
> stymie, by simply adding two urls, the spam one, and an unrelated
> innocent site. Or three urls, or whatever...

Spammers are simple folk. They won't be putting no innocent url's in
these spams...

> We definitely should NOT crawl the site, just in case it really is an
> innocent url. The load can crush a site, particularly if it's hosted.

Nah.
You need to throw thousands of requests at a half-decent web server
before it gives up the ghost. And if they're sending out 10 million
mail pieces, they should expect their http server to take some load.
These are definitely NOT innocent emails. They come from bogus senders,
have minimal headers (deliberately), and contain *nothing* but a url.
Which points, via redirect naturally, to an incest porn or
get-a-huge-penis site, etc.

> Spambayes is superb at recognizing spam based solely upon the payload
> received. If these mails are slipping through, then we need to
> examine the clues and see why.

I couldn't agree more! Here's one which got a resounding "unsure"
(p=0.5130) from my classifier first time through. After slurping that
url, it shot up to p=0.9893, exactly where it belongs!

--------------------------------------------------------------------
Return-Path:
Received: from kerchunk.com ([61.149.21.5])
    by www1.kc.aoindustries.com (8.11.6/8.11.6) with SMTP id h2V3DST27976
    for ;
Date: 29 Mar 2003 04:44:15 -0400
From: Ella Bunton
To: richard@jowsey.com
Subject: inside Daughter
Message-ID: <20030330002313.YhnvhVSzPGVA@kerchunk.com>
Content-type: text/plain; charset="us-ascii"

http://leajoulom.lewdmother.com
--------------------------------------------------------------------

Cheers,
Richard

From richard at jowsey.com Tue Apr 1 08:52:35 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Mon Mar 31 17:52:53 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To: <16008.46908.795498.412561@montanaro.dyndns.org>
References: <3E893DA7.31420.20D35DB@localhost>
Message-ID: <3E895353.6359.261DD00@localhost>

> >> We definitely should NOT crawl the site, just in case it really is an
> >> innocent url. The load can crush a site, particularly if it's
> >> hosted.
>
> Richard> Nah. You need to throw thousands of requests at a half-decent
> Richard> web server before it gives up the ghost.
> Richard> And if they're sending out 10 million mail pieces, they
> Richard> should expect their http server to take some load. These
> Richard> are definitely NOT innocent emails. They come from bogus
> Richard> senders, have minimal headers (deliberately), and contain
> Richard> *nothing* but a url. Which points, via redirect naturally,
> Richard> to an incest porn or get-a-huge-penis site, etc.
>
> You can't make that judgement beforehand. If the site you are poking
> is a valid site and the email received was not spam, none of what you
> said holds. If I remember correctly, you said this was only to be
> performed in circumstances where certain criteria were met, none of
> which included a conclusion the mail was spam.

Skip, I agree absolutely! We certainly can't assume that an email
containing only a singleton url is spam. NB: when friends or colleagues
send me a single url, as they do, such messages already get a "good"
classification. No problem there. But the same kind of message from a
spammer ends up as "unsure", primarily because there's simply not
enough clues to be definite about its classification.

Actually, what prompted this whole question was a "complaint" from one
of my proxy beta testers about one of these spams. He reckoned it was
"bloody obvious" that the message was junk. The classifier disagreed.
I went looking for a simple solution!

My *only* criteria for poking the url (rather ironic choice of verb,
considering the sites in question ;-) are:

1. an "unsure" classification
2. number of clues < 150 (or whatever max_discriminators one has)
3. a URL in the message body

If the url happens to point at an "innocent" site, this extra bit of
information-gathering will simply tip the message over to the "good"
bucket. A Good Thing, no harm done. And, in this (unusual) case,
there's definitely no extra load happening at the web server, precisely
because there weren't millions of this email getting blasted out...

HTH!
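The three criteria listed above can be sketched as a small gate function. This is only an illustration, not code from the proxy described in this thread: the names `Scored`, `should_fetch_url` and `URL_RE` are hypothetical, and the cutoff values are assumed defaults.

```python
import re
from dataclasses import dataclass

# Hypothetical URL matcher; a real tokenizer would be more careful.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


@dataclass
class Scored:
    """Hypothetical container for a classifier result."""
    prob: float      # spam probability in [0, 1]
    num_clues: int   # how many clues the classifier actually used


def should_fetch_url(body, scored, ham_cutoff=0.20, spam_cutoff=0.90,
                     max_discriminators=150):
    """Return True only if all three criteria from the message hold.

    The cutoffs and max_discriminators are assumptions for this sketch.
    """
    is_unsure = ham_cutoff < scored.prob < spam_cutoff      # criterion 1
    few_clues = scored.num_clues < max_discriminators       # criterion 2
    has_url = URL_RE.search(body) is not None                # criterion 3
    return is_unsure and few_clues and has_url
```

With the sample message quoted earlier in the thread, a score of 0.5130 passes the gate, while the post-fetch score of 0.9893 would not trigger a second fetch.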
Cheers,
Richard

From richard at jowsey.com Tue Apr 1 09:24:28 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Mon Mar 31 18:25:13 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To: <20030331221529.8A6592DDF2@cashew.wolfskeep.com>
References: Message from "Richard Jowsey" of "Tue, 01 Apr 2003 07:20:07 +1000." <3E893DA7.31420.20D35DB@localhost>
Message-ID: <3E895ACC.19208.27F0EE0@localhost>

> Spammers might be simple folk, but serious crackers (not the script
> kiddies) certainly are not. If there comes to be a widely deployed
> tool with this sort of fetch-what-I-tell-you-to behaviour, then it
> will get exploited by people wanting to do a denial of service
> attack or similar.

There's literally dozens of DOS "attack tools" out there already.
They're unfortunately very easy to build. A determined site-slammer is
going to use quite different technology than my crawler, in any case,
e.g. http://grc.com/dos/grcdos.htm

What I've built is a simple url-slurper, which resides on a proxy
server (not deployed on desktops), and is only invoked under very
particular circumstances. The results are immediately incorporated into
the server's database, so that anyone else receiving that spam benefits
from the extra information. Under this kind of deployment scenario, a
spam site only needs to be crawled once. Then we've got him nailed! :)

Cheers,
Richard

From richard at jowsey.com Tue Apr 1 09:43:08 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Mon Mar 31 18:43:46 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To:
References: <16008.46908.795498.412561@montanaro.dyndns.org>
Message-ID: <3E895F2C.28583.2902582@localhost>

> That's right. We really should try to solve this problem with
> tokenization.

You're quite right. My initial response to these spams was ripping up
the url, then checking every possible fragment (>= 3 chars) against the
database.
This proved reasonably effective when there were enough additional
spam-words in the message to shift the classifier out of "unsure".
However, these new spams appear designed to provide us the absolute
minimum number of clues. Additional tokenization logic probably won't
help much, but I'd be delighted if we could figure out a better way!

Cheers,
Richard

From mhammond at skippinet.com.au Tue Apr 1 10:24:36 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Mar 31 19:25:40 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To: <3E88CD74.4050405@parducci.net>
Message-ID:

[Tim S]
> That's right. We really should try to solve this problem with
> tokenization.

I'm not sure how many tricks we can pull with tokenization - in the
sample mail, there simply aren't enough tokens in the message. I see
lots of these, and another trick they use is to use simple
mis-spellings of words that would otherwise be clues - eg "fatherr and
daugter". Like Richard, I assume these are designed to provide minimal
clues.

The problem seems to simply be the "unsure" nature of these messages.
As Richard says, a trivial URL message in ham will *generally* have
enough good clues to push it over the edge. It sounds like we are
asking for a tweaking of the math and/or configuration options to push
unsure messages towards "spam" - ie, a "in the absence of any clues,
assume spam" rather than the current "assume unsure".

The only problem I see with this is that, by definition, unsure
messages do not have enough clues. A distinction seems to be that in
one case we have lots of unsure clues, where in this case we have very
few unsure clues. I'm not sure we want a token for the length of the
message - the number of clues is the issue.

[Alex]
> Spammers might be simple folk, but serious crackers (not the script
> kiddies) certainly are not. If there comes to be a widely deployed

As Richard says, this may be a stretch.
Such a DOS attack would require sending a crafted spam to each of these
addresses known to run such a filter (or a blind spam hoping to hit
them). This spam would cause a single hit on the web server. Re-sending
the same spam would not re-fetch the URL, as now we have spam clues,
and can score the message without the URL fetch (this is assuming we
auto train after the first fetch). We would obviously only fetch html
text from the server.

Could you not do the same thing today, by sending out a HTML email
referencing some images from the server you want to attack? Given the
number of mail clients out there that will fetch these images (using
their mailers default settings), I would expect this to remain a far
more effective attack than the one you propose.

[Tim S again]
> EXCELLENT point, Alex. Case closed.

I'm not sure who you are speaking for here. But yeah, fetching the URL
does seem the wrong long-term approach. I'm very impressed with the
creativity of the idea though - I see lots of these spams and did
wonder WTF we could do about it.

Mark.

From T.A.Meyer at massey.ac.nz Tue Apr 1 14:06:30 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Mar 31 21:11:26 2003
Subject: [Spambayes] Latest spammer trick stymied - QUESTION
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13010B4430@its-xchg4.massey.ac.nz>

[Neale]
> I think there ought to be a FAQ on the web page somewhere, to the tune of:
>
> Q: Hey! Why don't you implement cool tokenizer trick X? I think it
> would really foil those spammers!
>
> A: Have you run your tokenizer trick against a set of messages to see if
> it actually works? Many times what seems like a good idea turns out
> not to help much, and sometimes even hurts. If you have a good idea,
> you've run it against a batch of messages and can prove that it
> helps, paste the code for your technique and the proof to the mailing
> list.
> Otherwise, you will likely get a message from Tim Peters about
> why you need to test your idea :)

+1

I would also suggest that if they are not coders, and don't wish to be,
then they could try posting a feature request via the sf system.

=Tony Meyer

From spambayes_discussion at cklowe.com Tue Apr 1 10:30:40 2003
From: spambayes_discussion at cklowe.com (Chris Lowe)
Date: Tue Apr 1 04:30:48 2003
Subject: [Spambayes] Latest spammer trick stymied
References: <200303311251.h2VCp4419496@localhost.localdomain><3E893DA7.31420.20D35DB@localhost> <20030331221529.8A6592DDF2@cashew.wolfskeep.com>
Message-ID: <004c01c2f831$5c8e49b0$35526451@blueeyes>

T. Alexander Popiel wrote:
> If there comes to be a widely deployed tool with this sort of
> fetch-what-I-tell-you-to behaviour, then it will get exploited
> by people wanting to do a denial of service attack or similar.

I'm not convinced by this. In order to mount a good DOS attack, the
attacker must effectively multiply his bandwidth as much as possible,
so his paltry broadband link can compete with a well connected server
farm. The standard techniques are to use small ping packets that
require large, 64K responses, and to use zombies that make continuous
requests.

A URL sent in an email via SMTP represents a sizeable amount of data,
and unlike ping packets involves establishing a TCP link. Only a
fraction of the recipients will follow the URL. Bandwidth-consuming
images are not going to be downloaded by the crawler, just text. So I
don't believe an attacker will consider spamming URLs to millions of
recipients an effective way to use his bandwidth to eat up the target's
bandwidth.

Cheers,
Chris.

From Paul.Moore at atosorigin.com Tue Apr 1 11:07:10 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Tue Apr 1 05:08:33 2003
Subject: [Spambayes] Latest spammer trick stymied
Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A24@UKDCX001.uk.int.atosorigin.com>

From: Tim Stone - Four Stones Expressions
> That's right.
> We really should try to solve this problem
> with tokenization.

Silly question, but is there actually a problem? The system isn't
expected to be 100% perfect. Is this happening often enough to justify
the effort?

I get a reasonable number of virus mails from big@boss.com, they
generally come in as "unsure". After I train on 5 or 6 of them, they
start coming in as spam. No problem. Won't this work here as well?

If the issue is with the person who was surprised that Spambayes didn't
identify an "obvious" spam, maybe it's just an education issue.

I'm against following URLs. Please don't add it to the main code
(except, if you must, as an option which defaults to off).

Paul.

From mwh at python.net Tue Apr 1 11:24:44 2003
From: mwh at python.net (Michael Hudson)
Date: Tue Apr 1 05:33:31 2003
Subject: [Spambayes] Re: Latest spammer trick stymied
References: <16008.46908.795498.412561@montanaro.dyndns.org> <3E895F2C.28583.2902582@localhost>
Message-ID: <2mhe9izf9v.fsf@starship.python.net>

"Richard Jowsey" writes:

> However, these new spams appear designed to provide us the absolute
> minimum number of clues.

So, as Tim keeps saying, perhaps the thing to do is turn this on its
head: make constructions that appear to be intended to minimize
clue-generation generate tokens.

Cheers,
M.

--
> Or can I sweep that can of worms under the rug?
Please shove them under the garage.
  -- Greg Ward and Guido van Rossum mix their metaphors on python-dev

From Paul.Moore at atosorigin.com Tue Apr 1 11:29:37 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Tue Apr 1 05:35:29 2003
Subject: [Spambayes] Latest spammer trick stymied - QUESTION
Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A25@UKDCX001.uk.int.atosorigin.com>

From: Neale Pickett [mailto:neale@woozle.org]
> So many, in fact, that I think there ought to be a FAQ on the web
> page somewhere, to the tune of:

+1

Can I also suggest:

Q: I just got a spam, but the system said it was "unsure".
   Why couldn't it tell that it was spam - it's obvious?

A: It may be obvious to you, but the classifier only works on the
   information it has been given. Maybe this is "new" (you've never
   seen this particular flavour of spam before), or maybe there aren't
   enough clues in the message which the system is aware of as strong
   spam clues.

Q: OK, I trained on that message. But I just got *another* one, and the
   stupid system still thinks it's unsure. Why did it ignore me???

A: It didn't, but you may need to train on a few more of this type of
   message to get it classified as "spam". The classification algorithm
   weights its results based on the number of times it has seen a
   particular clue, so that clues unique to this type of message may
   need a few more instances to become "convincing".

Paul.

From richard at jowsey.com Tue Apr 1 21:02:54 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Tue Apr 1 06:03:41 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB88619A24@UKDCX001.uk.int.atosorigin.com>
Message-ID: <3E89FE7E.26741.4FE7EAD@localhost>

> From: Tim Stone - Four Stones Expressions
> > That's right. We really should try to solve this problem
> > with tokenization.
>
> Silly question, but is there actually a problem? The system isn't
> expected to be 100% perfect. Is this happening often enough to justify
> the effort?

That's a very good question, actually. IMHO, it's happening often
enough when your inbox is normally 99.9% spam-free, but suddenly, a few
of these low-mass particles start sneaking through...

> I get a reasonable number of virus mails from big@boss.com, they
> generally come in as "unsure". After I train on 5 or 6 of them,
> they start coming in as spam. No problem. Won't this work here
> as well?

Apparently not. My proxy catches viruses too, real well! This is a bit
different, in that these subatomics are sent from randomly generated
sub-domains, with randomized senders, etc.
Thus, minimal and rapidly-changing clue sets. There's just no good way
to train on them quickly enough. It's damn annoying, is what...

> If the issue is with the person who was surprised that Spambayes
> didn't identify an "obvious" spam, maybe it's just an education
> issue.

Nope, the tester in question is a very educated consumer. I can see
where you're going, but the general public expects a so-called
"filtering proxy service" to work 100% of the time. And they're
perplexed when it misses something they think is obvious.

But let's not worry about a URL slurper getting into the core SpamBayes
code. It probably shouldn't. But certain individuals might want to
experiment with the notion, and that's the kind of real-world testing
that can only improve an already extraordinarily intelligent mail
filter. Which is a Good Thing, I reckon... :-)

Cheers,
Richard

From lists at olivermaunder.co.uk Tue Apr 1 12:26:42 2003
From: lists at olivermaunder.co.uk (Oliver Maunder)
Date: Tue Apr 1 06:27:32 2003
Subject: [Spambayes] Getting a mbox file from Outlook Express
In-Reply-To:
References: <16E1010E4581B049ABC51D4975CEDB880113D99B@UKDCX001.uk.int.atosorigin.com>
Message-ID: <3E897772.5070304@olivermaunder.co.uk>

David Leftley wrote:

>Possibly the simplest way to approach this is to install a copy of
>Eudora, and tell it to import the messages from OE. I believe Eudora
>uses standard mbox files for its storage.
>
>David.
>
>
Or Mozilla Mail - it imports from OE, and uses mbox for storage.

Olly

From Paul.Moore at atosorigin.com Tue Apr 1 13:48:17 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Tue Apr 1 07:49:41 2003
Subject: [Spambayes] Some pop3proxy suggestions from a new user
Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A26@UKDCX001.uk.int.atosorigin.com>

I've introduced a colleague to Spambayes, and he's pretty impressed.
He's using pop3proxy with Outlook Express, and barring a few minor
issues while setting it up it's working fine for him.
He came up with a couple of comments, though:

1. The fact that the column headers (discard, defer, spam, ham) on the
   review page are clickable wasn't obvious to him. Checking, I see
   that this is mentioned, but maybe the wording could be clarified a
   bit. Maybe something like "To train on all messages in a section at
   one go, you can click on the Discard / Defer / Ham / Spam title at
   the top of the column, which will select that action for all
   messages in the section"?

2. It would be nice to be able to "Classify a message" from the review
   page, as well as by uploading a message file. Maybe clicking on the
   message subject, or an explicit "Classify" link on the message line,
   could be added.

3. Having experimented with training, he wanted to clear out the
   training info and start again. I said that this was just a matter of
   deleting the hammie.db file and the cache directories, and then
   restarting the proxy. (a) is this right, and (b) is it worth a way
   of doing this in the UI? (I suspect the answer to (b) is "no", as
   it's rare to need this, and dangerous if you do it by accident...)

I'll have a go at coding these at some point, but I'm low on time at
the moment, so I thought I'd post the suggestions to the list so that
at least they are recorded for posterity :-)

Paul.

From francois.granger at free.fr Tue Apr 1 15:14:26 2003
From: francois.granger at free.fr (Francois Granger)
Date: Tue Apr 1 08:14:32 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To:
References:
Message-ID:

At 19:59 -0500 31/03/2003, in message RE: RE: [Spambayes] Latest
spammer trick stymied, Tim Peters wrote:
>[Tim Stone]
>
>> I'm not sure if this would have pushed it toward spamminess or not, but it
>> bears researching.
>
>Look in your database for the spamprob on 'proto:http'. My bet is that it's
>near neutral;

Number of spam messages: 470.
Number of ham messages: 224.
Probability that a message containing this word is spam: 0.643541700966.
So, for my kind of mails, it is on the spam side of things ;-)

--
Hofstadter's Law : It always takes longer than you expect, even when
you take into account Hofstadter's Law.

From bill at parducci.net Tue Apr 1 06:58:18 2003
From: bill at parducci.net (bill parducci)
Date: Tue Apr 1 10:01:54 2003
Subject: [Spambayes] db query
References:
Message-ID: <3E89A90A.3060600@parducci.net>

> Look in your database for the spamprob on 'proto:http'. My bet is
> that it's near neutral

i have been trying to figure out how to query the db for sometime now
to get this very info. can someone point me to the
FAQ/note/msg/comment/util that allows one to do so?

thanks

b

From skip at pobox.com Tue Apr 1 09:10:03 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Apr 1 10:11:09 2003
Subject: [Spambayes] db query
In-Reply-To: <3E89A90A.3060600@parducci.net>
References: <3E89A90A.3060600@parducci.net>
Message-ID: <16009.43979.172138.854043@montanaro.dyndns.org>

    >> Look in your database for the spamprob on 'proto:http'. My bet is
    >> that it's near neutral

    bill> i have been trying to figure out how to query the db for sometime
    bill> now to get this very info. can someone point me to the
    bill> FAQ/note/msg/comment/util that allows one to do so?

I doubt there's anything in the faq about it. Here's how to go about
printing the raw spam/ham counts for a given token:

    >>> import shelve
    >>> db = shelve.open("hammie.db")
    >>> print db.get("proto:http")
    (12106, 20272)
    >>> db.close()

The only reason I used db.get("proto:http") instead of db["proto:http"]
is that there's no guarantee that any particular token is present in
your database. (This was a poor example to demonstrate that.) I can't
remember if the first element is the number of times it appears in ham
or spam.
A little guesswork suggests the first element is nspam:

    >>> print db.get("viagra")
    (122, 4)

Skip

From Paul.Moore at atosorigin.com Tue Apr 1 16:32:17 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Tue Apr 1 10:33:40 2003
Subject: [Spambayes] db query
Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D9D7@UKDCX001.uk.int.atosorigin.com>

From: Skip Montanaro [mailto:skip@pobox.com]
> I doubt there's anything in the faq about it. Here's how to go about
> printing the raw spam/ham counts for a given token:

This only works for shelve-based databases. For bsddb3-based ones (eg,
the Outlook client) you can do

    >>> import bsddb3  # Done on a copy - is it safe on the live db?
    >>> db = bsddb3.hashopen("a.db")
    >>> import cPickle
    >>> print cPickle.loads(db["proto:http"])
    (5576, 4205)

It would be nice if there were a generic wrapper which worked for any
database.

Paul.

From tim at fourstonesExpressions.com Tue Apr 1 10:07:03 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Apr 1 11:07:07 2003
Subject: [Spambayes] db query
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D9D7@UKDCX001.uk.int.atosorigin.com>
Message-ID:

4/1/2003 9:32:17 AM, "Moore, Paul" wrote:

>It would be nice if there were a generic wrapper which worked for any
>database.

The pop3proxy has a word query interface that works quite nicely with
any db.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
  those who understand binary,
  and those who don't.

From spambayes at rodland.no Tue Apr 1 21:09:40 2003
From: spambayes at rodland.no (Fredrik Rodland)
Date: Tue Apr 1 14:09:45 2003
Subject: [Spambayes] Getting a mbox file from Outlook Express
In-Reply-To: <3E897772.5070304@olivermaunder.co.uk>
Message-ID:

> David Leftley wrote:
>
> >Possibly the simplest way to approach this is to install a copy of
> >Eudora, and tell it to import the messages from OE.
> >I believe Eudora
> >uses standard mbox files for its storage.
> >
> >David.
> >

I think Eudora strips the attachments from the emails and stores these
in another place, doesn't it?

F

--
Fredrik Rodland
Technical Architect, Stocknet, Oslo, Norway
Stocknet: http://www.stocknet.com phone: +47 23 28 40 17
Private: http://rodland.no phone: +47 99 21 98 17

From tim.one at comcast.net Tue Apr 1 22:30:00 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Apr 1 22:30:34 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To:
Message-ID:

[Tim Stone]
> ...
> So what's your take on the slurping thing, Tim?

It could be valuable, although it seems more at home in a central
(shared) server kind of scheme, where the expenses (on all sides) of
fetching content can be incurred once for the benefit of many (I'm
picturing a shared dict/database mapping a URL to a token sequence --
there's no "ham or spam?" judgment there, just a one-time fetching and
pre-digesting of the referenced info). One twist I didn't see mentioned
is that spam web sites often get shut down quickly, so failure to
resolve a URL would be a useful (& sometimes expensive (in time) to
obtain!) clue too.

The spambayes system has always scratched its head over (a) very short
msgs, and (b) long, chatty, "just folks" spam. Fetching URL content
could improve classification of both. The OP's scheme of invoking it
only when the score would otherwise be unsure was a neat idea.
Integrating blacklist lookups as part of header analysis would be
similar (IMO) in many ways.

OTOH, I don't expect 1-URL spam to survive -- there's no motivation to
click the link.
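The "shared dict mapping a URL to a token sequence" idea above might look roughly like the sketch below. Everything here is an assumption made for illustration: the class name `UrlTokenCache`, the `url:` token prefix, the `url:fetch-failed` marker and the injected `fetch` callable are hypothetical, not part of SpamBayes.

```python
import re
from typing import Callable, Dict, List

# Hypothetical word splitter: runs of 3+ word-like characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")


class UrlTokenCache:
    """Fetch each URL at most once and remember its tokens.

    No ham/spam judgment is stored, only the pre-digested tokens.
    `fetch` is any callable that returns page text for a URL and raises
    OSError on failure (urllib.error.URLError is an OSError subclass).
    """

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._tokens: Dict[str, List[str]] = {}

    def tokens_for(self, url: str) -> List[str]:
        if url not in self._tokens:
            try:
                text = self._fetch(url)
            except OSError:
                # A dead or unresolvable site is itself a clue.
                self._tokens[url] = ["url:fetch-failed"]
            else:
                self._tokens[url] = ["url:" + word.lower()
                                     for word in TOKEN_RE.findall(text)]
        return self._tokens[url]
```

Passing the fetcher in as a parameter keeps the cache testable without network access; a real deployment would presumably wrap something like `urllib.request.urlopen` with a timeout and a text-only content check, and persist the mapping rather than hold it in memory.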
From tdickenson at devmail.geminidataloggers.co.uk Wed Apr 2 11:31:14 2003
From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson)
Date: Wed Apr 2 05:31:19 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To:
References:
Message-ID: <200304021131.14274.tdickenson@devmail.geminidataloggers.co.uk>

On Wednesday 02 April 2003 4:30 am, Tim Peters wrote:

> OTOH, I don't expect 1-URL spam to survive -- there's no motivation to
> click the link.

Curiosity? There are few clues to the human that it is spam too.

From lists at olivermaunder.co.uk Wed Apr 2 14:51:29 2003
From: lists at olivermaunder.co.uk (Oliver Maunder)
Date: Wed Apr 2 08:52:39 2003
Subject: [Spambayes] setup
In-Reply-To: <41OKWVGBUQNYTA5POJI54PC8A9MLIF.3e88637a@myst>
References: <41OKWVGBUQNYTA5POJI54PC8A9MLIF.3e88637a@myst>
Message-ID: <3E8AEAE1.4050609@olivermaunder.co.uk>

Tim Stone - Four Stones Expressions wrote:

>There currently is no imap proxy in spambayes. It is a documented feature
>request, but nobody has picked it up as of yet. I think the problem
>(certainly from my point of view) is that imap servers to test against are not
>nearly as common as pop3 servers.
>
>
I was going to have a look at this because I use IMAP, but haven't got
round to it, and quite probably never will.

With IMAP, you don't really need a proxy. You can use a standalone
program to filter the mail into the correct folders, and the messages
will appear in the right place in any IMAP client. Because the folders
are on the server, you can even do automatic training independent of
any client.

If anyone fancies doing this, IMAPSpamBeGone
(http://www.rogerbinns.com/isbg) might be a good jumping off point.
It's an IMAP front-end for SpamAssassin, written in Python, so should
be relatively easy to adapt for SpamBayes.

Olly

>c'est moi - TimS
>http://www.fourstonesExpressions.com
>http://wecanstopspam.org
>
>There are 10 kinds of people in the world:
> those who understand binary,
> and those who don't.
>
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>

From lists at morpheus.demon.co.uk Wed Apr 2 21:37:03 2003
From: lists at morpheus.demon.co.uk (Paul Moore)
Date: Wed Apr 2 16:19:26 2003
Subject: [Spambayes] I thought pop3proxy only kept 7 days' worth of messages...?
Message-ID:

I've recently (over a week ago) reinstalled Spambayes. I'm using the
pop3proxy as a service on Windows. After my initial training, I stopped
bothering training on new messages (I get a lot of spam and very little
ham, so training just tends to unbalance my database).

Just today, I went into the "review messages" page. And I could go back
to Sunday 23rd March - well over 7 days. I thought the proxy discarded
messages from over 7 days ago?

Am I wrong, or is there a bug here somewhere?

Paul.
--
This signature intentionally left blank

From tim at fourstonesExpressions.com Wed Apr 2 18:01:40 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Apr 2 19:01:46 2003
Subject: [Spambayes] I thought pop3proxy only kept 7 days' worth of messages...?
In-Reply-To:
Message-ID:

4/2/2003 2:37:03 PM, Paul Moore wrote:
>
>Am I wrong, or is there a bug here somewhere?

I think you have to restart the proxy to get it to do its discard old
stuff thing.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
  those who understand binary,
  and those who don't.

From T.A.Meyer at massey.ac.nz Thu Apr 3 12:04:00 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Apr 2 19:04:40 2003
Subject: [Spambayes] I thought pop3proxy only kept 7 days' worth ofmessages...?
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230836@its-xchg4.massey.ac.nz>

> >Am I wrong, or is there a bug here somewhere?
> I think you have to restart the proxy to get it to do its
> discard old stuff thing.

I think you might be right.
This does strike me as a bug, though - we shouldn't require users to
restart to do the purge (we need to find something else to hook the
purge onto).

=Tony Meyer

From tim.one at comcast.net Wed Apr 2 21:15:55 2003
From: tim.one at comcast.net (Tim Peters)
Date: Wed Apr 2 21:18:18 2003
Subject: [Spambayes] Latest spammer trick stymied
In-Reply-To: <200304021131.14274.tdickenson@devmail.geminidataloggers.co.uk>
Message-ID:

[Tim]
> OTOH, I don't expect 1-URL spam to survive -- there's no motivation to
> click the link.

[Toby Dickenson]
> Curiosity? There are few clues to the human that it is spam too.

I talked about the novelty factor before, and didn't want to repeat it.
For 1-URL spam to *survive* means we see lots of them over time. As a
1-shot trick it may be effective, but it gets old very fast, and I
can't imagine anyone clicking on a link like

    http://abcdefgh.lewdmother.com

a second time. BTW, don't click on that -- it's a live URL, and
lewdmother.com doesn't care what you (or they) put before the first
dot. It's an obnoxious porn site that pops up a sequence of browser
windows in full-screen mode. Clicking on just one of those would be
enough to stop just about anyone from clicking on a 1-URL msg from a
stranger again.

From Paul.Moore at atosorigin.com Thu Apr 3 09:05:27 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Thu Apr 3 03:06:55 2003
Subject: [Spambayes] I thought pop3proxy only kept 7 days' worthofmessages...?
Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D9DC@UKDCX001.uk.int.atosorigin.com>

From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz]
> > >Am I wrong, or is there a bug here somewhere?
> >
> > I think you have to restart the proxy to get it to do its
> > discard old stuff thing.
>
> I think you might be right. This does strike me as a bug,
> though - we shouldn't require users to restart to do the purge
> (we need to find something else to hook the purge onto).

I shut down my PC each night, and restart the next day.
That's not doing the purge... Paul. From tim at fourstonesExpressions.com Thu Apr 3 07:10:50 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 3 08:12:17 2003 Subject: [Spambayes] I thought pop3proxy only kept 7 days' worthofmessages...? In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D9DC@UKDCX001.uk.int.atosorigin.com> Message-ID: 4/3/2003 2:05:27 AM, "Moore, Paul" wrote: >From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] >> > >Am I wrong, or is there a bug here somewhere? >> > >> > I think you have to restart the proxy to get it to do its >> > discard old stuff thing. >> >> I think you might be right. This does strike me as a bug, >> though - we shouldn't require users to restart to do the purge >> (we need to find something else to hook the purge onto). > >I shutdown my PC each night, and restart the next day. That's not >doing the purge... Ok. That's a bona-fide bug. I'll have a look, unless Tony beats me to it (yeah right ) c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From lists at morpheus.demon.co.uk Wed Apr 2 23:04:47 2003 From: lists at morpheus.demon.co.uk (Paul Moore) Date: Thu Apr 3 13:57:31 2003 Subject: [Spambayes] Latest spammer trick stymied References: <200304021131.14274.tdickenson@devmail.geminidataloggers.co.uk> Message-ID: Toby Dickenson writes: > On Wednesday 02 April 2003 4:30 am, Tim Peters wrote: > >> OTOH, I don't expect 1-URL spam to survive -- there's no motivation to >> click the link. > > Curiosity? There are few clues to the human that it is spam too. I'd like to think that people wouldn't click on unidentified URLs appearing in unsolicited emails. But if that were true, things like the ILOVEYOU "virus" wouldn't exist. So you're probably right... Paul.
-- This signature intentionally left blank From webmaster at badreligion.ru Fri Apr 4 00:39:32 2003 From: webmaster at badreligion.ru (xd.rav3n) Date: Thu Apr 3 14:04:33 2003 Subject: [Spambayes] (no subject) Message-ID: <188360999860.20030404003932@badreligion.ru> Hi there, I have 1 question about your product... Will you develop spambayes-plugin for The Bat! [ http://www.ritlabs.com ]? Best regards, xd.rav3n ICQ: 55000220 WWW: www.badreligion.ru From noreply at sourceforge.net Thu Apr 3 11:38:46 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 3 14:23:51 2003 Subject: [Spambayes] [ spambayes-Bugs-709051 ] Error loading configuration should not be fatal Message-ID: Bugs item #709051, was opened at 2003-03-24 23:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Error loading configuration should not be fatal Initial Comment: There was a report of this error using the second binary release:
SpamAddin - Connecting to Outlook
pythoncom error: Failed to call the universal dispatcher
Traceback (most recent call last):
  File "E:\src\pythonex\com\win32com\universal.py", line 170, in dispatch
  File "E:\src\pythonex\com\win32com\server\policy.py", line 322, in _InvokeEx_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 601, in _invokeex_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 541, in _invokeex_
  File "E:\src\spambayes\Outlook2000\addin.py", line 655, in OnConnection
  File "E:\src\spambayes\Outlook2000\manager.py", line 475, in GetManager
  File "E:\src\spambayes\Outlook2000\manager.py", line 152, in __init__
  File "E:\src\spambayes\Outlook2000\manager.py", line 355, in LoadConfig
exceptions.EOFError:
While there is another problem that caused this error, we should not die completely loading the config pickle should it get screwed up. However, as this means spambayes will be unconfigured, we do need a scheme to let the user know this (as we do in the few other places where we disable spambayes due to config errors) ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-03 21:38 Message: Logged In: YES user_id=227443 I have another case, but without apparent cause:
Traceback (most recent call last):
  File "/home/mailman21/Mailman/Queue/Runner.py", line 105, in _oneloop
    self._onefile(msg, msgdata)
  File "/home/mailman21/Mailman/Queue/Runner.py", line 155, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/home/mailman21/Mailman/Queue/OutgoingRunner.py", line 69, in _dispose
    mlist.Load()
  File "/home/mailman21/Mailman/MailList.py", line 626, in Load
    self._spamdb = hammie.open(path, 0)
  File "/home/mailman21/pythonlib/spambayes/hammie.py", line 262, in open
    b = storage.PickledClassifier(filename)
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 80, in __init__
    self.load()
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 98, in load
    tempbayes = pickle.load(fp)
EOFError
It happens quite often but not always; I believe it is a concurrency issue (e.g. lack of locking). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-03-24 23:56 Message: Logged In: YES user_id=14198 The reporter just let me know that the problem was caused by about 20 power failures over a short period. So I don't think we can cure the cause here, just the symptoms.
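To illustrate the "cure the symptoms" idea in the report above, here is a minimal sketch of a config loader that survives a truncated pickle. It is written for current Python rather than the spambayes code of the time, and the names load_config and defaults are made up for this example; nothing here is the Outlook addin's actual LoadConfig.

```python
import pickle


def load_config(path, defaults):
    """Return (config, warning). warning is None when nothing went wrong.

    Hypothetical helper: a truncated or corrupt pickle falls back to the
    defaults instead of raising, and the caller gets a message it can
    show to the user so the reset does not go unnoticed.
    """
    try:
        with open(path, "rb") as fp:
            return pickle.load(fp), None
    except FileNotFoundError:
        # First run: no config yet, nothing worth reporting.
        return dict(defaults), None
    except (EOFError, pickle.UnpicklingError) as exc:
        # An empty or cut-off file (e.g. after a power failure) ends up here.
        warning = "Configuration in %s was unreadable (%s); using defaults." % (
            path, type(exc).__name__)
        return dict(defaults), warning
```

The caller would display the warning wherever it already reports a disabled or unconfigured state, which is the "scheme to let the user know" that the report asks for.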
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 From noreply at sourceforge.net Thu Apr 3 14:01:40 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 3 16:46:42 2003 Subject: [Spambayes] [ spambayes-Bugs-709051 ] Error loading configuration should not be fatal Message-ID: Bugs item #709051, was opened at 2003-03-24 23:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Error loading configuration should not be fatal Initial Comment: There was a report of this error using the second binary release:
SpamAddin - Connecting to Outlook
pythoncom error: Failed to call the universal dispatcher
Traceback (most recent call last):
  File "E:\src\pythonex\com\win32com\universal.py", line 170, in dispatch
  File "E:\src\pythonex\com\win32com\server\policy.py", line 322, in _InvokeEx_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 601, in _invokeex_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 541, in _invokeex_
  File "E:\src\spambayes\Outlook2000\addin.py", line 655, in OnConnection
  File "E:\src\spambayes\Outlook2000\manager.py", line 475, in GetManager
  File "E:\src\spambayes\Outlook2000\manager.py", line 152, in __init__
  File "E:\src\spambayes\Outlook2000\manager.py", line 355, in LoadConfig
exceptions.EOFError:
While there is another problem that caused this error, we should not die completely loading the config pickle should it get screwed up.
However, as this means spambayes will be unconfigured, we do need a scheme to let the user know this (as we do in the few other places where we disable spambayes due to config errors) ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 00:01 Message: Logged In: YES user_id=227443 Maybe this patch would be a little (but insufficient) improvement? I'd upload it as a separate file, but there's no "upload" button....
--- spambayes/storage.py.orig 2003-04-03 23:35:47.000000000 +0200
+++ spambayes/storage.py 2003-04-03 23:43:16.000000000 +0200
@@ -59,6 +59,7 @@
 import cPickle as pickle
 import errno
 import shelve
+import os
 from spambayes import dbmstorage
 # Make shelve use binary pickles by default.
@@ -121,9 +122,10 @@
         if options.verbose:
             print 'Persisting',self.db_name,'as a pickle'
-        fp = open(self.db_name, 'wb')
+        fp = open(self.db_name+'.tmp', 'wb')
         pickle.dump(self, fp, PICKLE_TYPE)
         fp.close()
+        os.rename(self.db_name+'.tmp', self.db_name)
 class DBDictClassifier(classifier.Classifier):
---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-03 21:38 Message: Logged In: YES user_id=227443 I have another case, but without apparent cause:
Traceback (most recent call last):
  File "/home/mailman21/Mailman/Queue/Runner.py", line 105, in _oneloop
    self._onefile(msg, msgdata)
  File "/home/mailman21/Mailman/Queue/Runner.py", line 155, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/home/mailman21/Mailman/Queue/OutgoingRunner.py", line 69, in _dispose
    mlist.Load()
  File "/home/mailman21/Mailman/MailList.py", line 626, in Load
    self._spamdb = hammie.open(path, 0)
  File "/home/mailman21/pythonlib/spambayes/hammie.py", line 262, in open
    b = storage.PickledClassifier(filename)
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 80, in __init__
    self.load()
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 98, in load
    tempbayes = pickle.load(fp)
EOFError
It happens quite often but not always; I believe it is a concurrency issue (e.g. lack of locking). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-03-24 23:56 Message: Logged In: YES user_id=14198 The reporter just let me know that the problem was caused by about 20 power failures over a short period. So I don't think we can cure the cause here, just the symptoms. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 From T.A.Meyer at massey.ac.nz Fri Apr 4 10:40:55 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 3 17:41:57 2003 Subject: [Spambayes] (no subject) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230B41@its-xchg4.massey.ac.nz> > I have 1 question about your product... Will you develop > spambayes-plugin for The Bat! [ http://www.ritlabs.com ]? You can already use The Bat! with spambayes (I have done so). You'll want to use the pop3proxy (and possibly smtpproxy, which works fine with The Bat!). Download the software and read the INTEGRATION.TXT file. In particular, the section on the pop3proxy and the web interface. Please let us know if you have any problems. =Tony Meyer From tim at fourstonesExpressions.com Thu Apr 3 16:59:01 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 3 17:59:09 2003 Subject: [Spambayes] I thought pop3proxy only kept 7 days' worthofmessages...? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230B58@its-xchg4.massey.ac.nz> Message-ID: 4/3/2003 4:57:33 PM, "Meyer, Tony" wrote: >> Ok. That's a bona-fide bug. >> I'll have a look, unless Tony beats me to it (yeah right ) > >It seems to me that the only time that the cache is purged is when >CreateWorkers() is called - at startup, and when options are changed. +1 from me.
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From T.A.Meyer at massey.ac.nz Fri Apr 4 11:02:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 3 18:03:26 2003 Subject: [Spambayes] I thought pop3proxy only kept 7 days' worthofmessages...? Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230B60@its-xchg4.massey.ac.nz> > > > Am I wrong, or is there a bug here somewhere? > > I shutdown my PC each night, and restart the next day. > > That's not doing the purge... By any chance, are the messages that aren't being purged in the unknown cache (i.e. messages that have not been trained)? Pop3proxy currently doesn't purge these - only the spam and ham caches. I understand why it would be good to keep these around to train on, but not in the case where a user is satisfied with performance and not doing any training. Anyway, unless anyone speaks up soon (or TimS finds an actual bug somewhere (yeah right )) then I'll change pop3proxy to purge all three. 
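A rough sketch of what "purge all three" could look like, for anyone following along. It assumes (this is an assumption, not a description of the pop3proxy code) that each cache is a directory holding one file per message, and that file modification time is a good enough stand-in for message age. The function name and parameters are invented for the example.

```python
import os
import time


def expire_old_messages(cache_dirs, max_age_days=7, now=None):
    """Delete files older than max_age_days from every directory given.

    cache_dirs would be the spam, ham and unknown cache directories
    (hypothetical layout). Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    for directory in cache_dirs:
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            # Age is judged by mtime; anything last written before the cutoff goes.
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

Because the function takes the directories as an argument, it can be called from whatever hook turns out to be convenient rather than only at startup.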
=Tony Meyer From noreply at sourceforge.net Thu Apr 3 15:19:46 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 3 18:04:45 2003 Subject: [Spambayes] [ spambayes-Bugs-709051 ] Error loading configuration should not be fatal Message-ID: Bugs item #709051, was opened at 2003-03-25 09:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Error loading configuration should not be fatal Initial Comment: There was a report of this error using the second binary release:
SpamAddin - Connecting to Outlook
pythoncom error: Failed to call the universal dispatcher
Traceback (most recent call last):
  File "E:\src\pythonex\com\win32com\universal.py", line 170, in dispatch
  File "E:\src\pythonex\com\win32com\server\policy.py", line 322, in _InvokeEx_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 601, in _invokeex_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 541, in _invokeex_
  File "E:\src\spambayes\Outlook2000\addin.py", line 655, in OnConnection
  File "E:\src\spambayes\Outlook2000\manager.py", line 475, in GetManager
  File "E:\src\spambayes\Outlook2000\manager.py", line 152, in __init__
  File "E:\src\spambayes\Outlook2000\manager.py", line 355, in LoadConfig
exceptions.EOFError:
While there is another problem that caused this error, we should not die completely loading the config pickle should it get screwed up. However, as this means spambayes will be unconfigured, we do need a scheme to let the user know this (as we do in the few other places where we disable spambayes due to config errors) ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-04-04 09:19 Message: Logged In: YES user_id=14198 Check the traceback is the same as yours - this error is loading the configuration pickle, not the word database. Thus, locking shouldn't be the issue, as I can't see how two threads or processes could write this file at once (Outlook appears to have its own lock for startup; I've never seen spambayes running twice in different processes.) So, your patch won't help this exception. However, if you are getting a slightly different EOFError, your patch may apply. ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 08:01 Message: Logged In: YES user_id=227443 Maybe this patch would be a little (but insufficient) improvement? I'd upload it as a separate file, but there's no "upload" button....
--- spambayes/storage.py.orig 2003-04-03 23:35:47.000000000 +0200
+++ spambayes/storage.py 2003-04-03 23:43:16.000000000 +0200
@@ -59,6 +59,7 @@
 import cPickle as pickle
 import errno
 import shelve
+import os
 from spambayes import dbmstorage
 # Make shelve use binary pickles by default.
@@ -121,9 +122,10 @@
         if options.verbose:
             print 'Persisting',self.db_name,'as a pickle'
-        fp = open(self.db_name, 'wb')
+        fp = open(self.db_name+'.tmp', 'wb')
         pickle.dump(self, fp, PICKLE_TYPE)
         fp.close()
+        os.rename(self.db_name+'.tmp', self.db_name)
 class DBDictClassifier(classifier.Classifier):
---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 05:38 Message: Logged In: YES user_id=227443 I have another case, but without apparent cause:
Traceback (most recent call last):
  File "/home/mailman21/Mailman/Queue/Runner.py", line 105, in _oneloop
    self._onefile(msg, msgdata)
  File "/home/mailman21/Mailman/Queue/Runner.py", line 155, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/home/mailman21/Mailman/Queue/OutgoingRunner.py", line 69, in _dispose
    mlist.Load()
  File "/home/mailman21/Mailman/MailList.py", line 626, in Load
    self._spamdb = hammie.open(path, 0)
  File "/home/mailman21/pythonlib/spambayes/hammie.py", line 262, in open
    b = storage.PickledClassifier(filename)
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 80, in __init__
    self.load()
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 98, in load
    tempbayes = pickle.load(fp)
EOFError
It happens quite often but not always; I believe it is a concurrency issue (e.g. lack of locking). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-03-25 09:56 Message: Logged In: YES user_id=14198 The reporter just let me know that the problem was caused by about 20 power failures over a short period. So I don't think we can cure the cause here, just the symptoms.
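The write-to-a-temporary-file-then-rename idea in the patch quoted above can be sketched in current Python roughly as follows. This is not the spambayes storage code; save_atomically is a name invented for the example, and it uses os.replace where the patch uses os.rename, because os.rename refuses to overwrite an existing file on Windows while os.replace overwrites on both Windows and POSIX.

```python
import os
import pickle
import tempfile


def save_atomically(obj, path):
    """Pickle obj to path so readers never see a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory, so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fp:
            pickle.dump(obj, fp, pickle.HIGHEST_PROTOCOL)
            fp.flush()
            os.fsync(fp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the old file untouched and drop the partial temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

As the patch author says, this is only a partial improvement: it keeps a reader from loading a truncated pickle, but two writers can still overwrite each other's changes, which would need locking.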
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 From tim at fourstonesExpressions.com Thu Apr 3 17:05:37 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 3 18:05:47 2003 Subject: [Spambayes] I thought pop3proxy only kept 7 days' worthofmessages...? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230B60@its-xchg4.massey.ac.nz> Message-ID: <65JG53ZOM215A6KKE86QNJGA7IFIE.3e8cbe41@myst> 4/3/2003 5:02:36 PM, "Meyer, Tony" wrote: >> > > Am I wrong, or is there a bug here somewhere? >> > I shutdown my PC each night, and restart the next day. >> > That's not doing the purge... > >By any chance, are the messages that aren't being purged in the unknown >cache (i.e. messages that have not been trained)? Pop3proxy currently >doesn't purge these - only the spam and ham caches. > >I understand why it would be good to keep these around to train on, but >not in the case where a user is satisfied with performance and not doing >any training. > >Anyway, unless anyone speaks up soon (or TimS finds an actual bug >somewhere (yeah right )) TimS won't be finding any bugs until sometime tomorrow when he gets back home... > then I'll change pop3proxy to purge all three. Another +1 > >=Tony Meyer > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From T.A.Meyer at massey.ac.nz Fri Apr 4 11:32:38 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 3 18:33:16 2003 Subject: [Spambayes] pop3proxy message expiry Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230B90@its-xchg4.massey.ac.nz> WARNING: I've just checked in a change that will affect all pop3proxy users. Previously, only those messages that you had trained as spam/ham were expired from the cache that pop3proxy keeps.
Untrained messages were kept until they were trained. This is no longer the case. *ALL* messages are expired after whatever time you have set (7 days by default). You also no longer need to restart the proxy (or change the options) in order to have messages expired. Messages are expired (if old enough) each time a mail client connects to the proxy. Please let me know if there are any problems with this and I'll deal with them ASAP. =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 4 13:12:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 3 20:12:53 2003 Subject: [Spambayes] Some pop3proxy suggestions from a new user Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230BF7@its-xchg4.massey.ac.nz> > He's using pop3proxy with Outlook Express, and > barring a few minor issues while setting it up it's working > fine for him. Were the issues anything that might apply to others? To get this moving towards beta we need to have docs for any little things that come up. > 1. The fact that the column headers (discard, defer, spam, > ham) on the review page are clickable wasn't obvious to > him. Checking, I see that this is mentioned, but maybe the > wording could be clarified a bit. Maybe something like "To > train on all messages in a section at one go, you can click > on the Discard / Defer / Ham / Spam title at the top of the > column, which will select that action for all messages in the > section"? The wording is currently "Click one of the Discard / Defer / Ham / Spam headers to check all of the buttons in that section in one go". To me this is fairly clear, and somewhat more concise than your version. What exactly do you think needs clarifying? > 2. It would be nice to be able to "Classify a message" from the > review page, as well as by uploading a message file. Maybe > clicking on the message subject, or an explicit "Classify" > link on the message line, could be added.
The subject is already a link to the message text itself, but it would be simple enough to add a classify link (or 'show clues', since the classification is already given). I've done this; CVS-up and see if it matches what you/he was after. > 3. Having experimented with training, he wanted to clear out > the training info and start again. I said that this was > just a matter of deleting the hammie.db file and the cache > directories, and then restarting the proxy. (a) is this right, > and (b) is it worth a way of doing this in the UI? (I suspect > the answer to (b) is "no", as it's rare to need this, and > dangerous if you do it by accident...) The Outlook plugin has an option to do a full-retrain, which is more-or-less the same as this, except that the pop3proxy doesn't keep messages to retrain on, of course. It would be simple enough to add this, except for your (b) above. Is this worth doing? I think I would weigh in on the 'no' side, but add something to the docs (the FAQ perhaps) about how to do it manually. =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 4 13:19:59 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 3 20:20:39 2003 Subject: [Spambayes] Latest spammer trick stymied - QUESTION Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230BFE@its-xchg4.massey.ac.nz> > > Many times what seems like a good idea turns > > out not to help much, and sometimes even hurts. > > this very thread started with such an approach [build and > show] and was predominantly dismissed. this may not have an > effect on the implementer's use of the modification, but I > would hate to think that this would be the only 'allowable' > method by which ideas can be posted. I'm sure that any ideas can be posted. And if someone isn't able to code something to do some testing, then they really should post a feature request. If everyone else thinks that the feature is worthless, then at least it can be closed, and there is a record of it being requested.
Although the consensus has weighed in against the URL-following technique (in general, at least), I'm sure it hasn't weighed in against the discussion itself. > ...and sometimes someone else has tried it and it didn't > help. why would you want to force people to reinvent the wheel > before discussing an idea? Two things here. One is that even if something has been tried before, that doesn't mean that it isn't worth trying again. It's not like all the changes are independent - especially if something was tried before all the combination/math stuff got settled down. The other is that the docs/list archive have a good summary of things that have been tried - it's certainly worth searching through them to see if something was done, and what the results were. (Interestingly, regarding following up on urls, IIRC TimS suggested something like this a long while back). =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 4 13:25:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 3 20:25:41 2003 Subject: [Spambayes] IMAP Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230C05@its-xchg4.massey.ac.nz> >> There currently is no imap proxy in spambayes. It is a documented >> feature request, but nobody has picked it up as of yet. I think >> the problem (certainly from my point of view) is that imap servers >> to test against are not nearly as common as pop3 servers. > I was going to have a look at this because I use IMAP, but > haven't got round to it, and quite probably never will. Just how high is the demand for an IMAP solution for spambayes? (Speak up!). I'm happy (as is TimS I think) to do the coding necessary since it's probably pretty similar to the pop3proxy stuff anyway, but (as we've both said) we don't have IMAP accounts to test against (anyone know of free ones from anywhere?). If there are enough people who want it, and who will do the testing, then I'll still do the coding, although it'll be a much slower process than otherwise. 
=Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 4 14:09:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 3 21:12:23 2003 Subject: [Spambayes] IMAP Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230C33@its-xchg4.massey.ac.nz> > I've been wondering if you really want to proxy it... or just > have a separate IMAP client that runs from cron and does the > scoring/message moving. This is what Oliver Maunder suggested as well, and it seems to make sense to me (and avoid various problems that proxies have). I'll take this path. > > (anyone know of free ones from anywhere?). > http://www.myrealbox.com/ offers excellent service and is free. Great. I've got an account there now (all you spammers reading the list, please hit me with your best at anadelonbrin@myrealbox.com so I have something to work with). I'll start working on an imap filter app and commit it once it's useable. =Tony Meyer From richard at jowsey.com Fri Apr 4 12:48:26 2003 From: richard at jowsey.com (Richard Jowsey) Date: Thu Apr 3 21:48:44 2003 Subject: [Spambayes] Latest spammer trick stymied - QUESTION In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230BFE@its-xchg4.massey.ac.nz> Message-ID: <3E8D7F1A.30907.12ACDBA8@localhost> [Tony] > I'm sure that any ideas can be posted. And if someone isn't able to > code something to do some testing, then they really should post a > feature request. The url slurper *has* been coded, but in Java, or I'd have definitely donated source to the Pythonic SpamBayes project for further evaluation. I'd emphasize that this concept has been "real-world tested" by a number of my beta users, who're apparently quite delighted that these annoying micro-spams are now being accurately classified. As am I. "Unsures" have completely disappeared in cases where the message sender is unknown, the email has very few useful clues, and it contains only a single URL. Those are the only test results that matter to me... 
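For readers who want to experiment on the Python side, the single-URL case described above could be sketched like this. It is not the Java implementation mentioned in the message, and every name in it (tokenize_with_url_text, fetch, min_tokens, the "url-text:" prefix) is an assumption made for illustration. The page download is passed in as a function so that timeouts, size limits and any "don't crawl" policy stay under the caller's control.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"']+")
WORD_RE = re.compile(r"[A-Za-z0-9$'-]{3,}")


def tokenize_with_url_text(body, fetch, min_tokens=10, max_chars=20000):
    """Tokenize body; if it is little more than one URL, add words from that page.

    fetch(url) must return the page text as a str. It is only called when
    the message has fewer than min_tokens words and exactly one URL.
    """
    tokens = [w.lower() for w in WORD_RE.findall(URL_RE.sub(" ", body))]
    urls = URL_RE.findall(body)
    if len(tokens) >= min_tokens or len(urls) != 1:
        return tokens  # enough clues already, or not the single-URL pattern
    try:
        page = fetch(urls[0])
    except Exception:
        return tokens  # any fetch failure just means no extra clues
    # Crude tag stripping; a real tokenizer would be more careful.
    page_text = re.sub(r"<[^>]*>", " ", page[:max_chars])
    # Prefix the page words so they stay distinct from words in the mail itself.
    tokens.extend("url-text:" + w.lower() for w in WORD_RE.findall(page_text))
    return tokens
```

A fetch built on urllib.request.urlopen with a short timeout would be one way to supply the download; retrieving only the single page, and only for messages that would otherwise be low on clues, keeps the load on any innocent site small.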
[Tony] > Although the consensus has weighed in against the URL-following > technique (in general, at least), I'm sure it hasn't weighed in > against the discussion itself. The discussion has been quite healthy, IMO, but I can't quite agree with your "consensus" conclusion. Several people did express (constructive) concerns about the security/safety of such a capability, and other contributors have, I think, allayed or circumscribed those fears. A couple of people also questioned the usefulness of URL-retrieval, given there aren't too many of these spasms, and they'll probably disappear soon enough. A fair comment, I thought. As for consensus, I'm biased, but I don't really think there is one. Bear in mind, we're talking about a real "outer limits" tweak here, specifically for those quite rare cases where the classifier just doesn't have enough data to make up its mind. Understandably, many people aren't at all concerned about squeezing yet another 0.01% of accuracy out of the beastie; it's already good enough. In fact, most people could probably care less! I only mentioned the idea for those of us who demand absolute 100% perfection... Apologies if I've inadvertently gaffed etiquette by not posting code and/or a Feature Request. I'm a bad dog, and I promise to behave better in future! ;-) Later, R From Paul.Moore at atosorigin.com Fri Apr 4 09:06:39 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Apr 4 03:08:05 2003 Subject: [Spambayes] I thought pop3proxy only kept 7 days' worthofmessages...? Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D9E4@UKDCX001.uk.int.atosorigin.com> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > By any chance, are the messages that aren't being purged in > the unknown cache (i.e. messages that have not been trained)? > Pop3proxy currently doesn't purge these - only the spam and ham > caches. That's exactly right. 
I've given up regular training, as I'm satisfied with the accuracy I'm getting, and I don't want to train on yet more spam, without adding new ham (which will just unbalance the database). > Anyway, unless anyone speaks up soon (or TimS finds an actual > bug somewhere (yeah right )) then I'll change pop3proxy > to purge all three. That would suit me. If people *really* want to defer training for over a week, there could be an option (defaulting to "purge everything"), but it seems like YAGNI to me. Thanks for investigating this. Paul From Paul.Moore at atosorigin.com Fri Apr 4 09:14:02 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Apr 4 03:15:24 2003 Subject: [Spambayes] Some pop3proxy suggestions from a new user Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A2A@UKDCX001.uk.int.atosorigin.com> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > > He's using pop3proxy with Outlook Express, and barring a few > > minor issues while setting it up it's working fine for him. > > Were the issues anything that might apply to others? To get > this moving towards beta we need to have docs for any little > things that come up. No, they were me messing up the packaging of the CVS version, and a botched attempt (again by me) to fix a minor problem, which Mark Hammond has since fixed properly. > > 1. The fact that the column headers (discard, defer, spam, > > ham) on the review page are clickable wasn't obvious to him. > > Checking, I see that this is mentioned, but maybe the wording > > could be clarified a bit. Maybe something like "To train > > on all messages in a section at one go, you can click on > > the Discard / Defer / Ham / Spam title at the top of the > > column, which will select that action for all messages in the > > section"? > > The wording is currently "Click one of the Discard / Defer / > Ham / Spam headers to check all of the buttons in that section > in one go". To me this is fairly clear, and somewhat more > concise than your version.
What exactly do you think needs > clarifying? Probably nothing. Leave it unless someone else raises the same issue. When I pointed out the text and the links, my colleague didn't have a problem. It's just that he needed me to point it out - he'd missed it for himself (in spite of finding clicking on each radio button in turn a real pain). No, leave it. I can't think of a sensible way to improve it... > > 2. It would be nice to be able to "Classify a message" > > from the review page, as well as by uploading a message > > file. Maybe clicking on the message subject, or an explicit > > "Classify" link on the message line, could be added. > > The subject is already a link to the message text itself, but > it would be simple enough to add a classify link (or 'show > clues', since the classification is already given). > > I've done this; CVS-up and see if it matches what you/he was > after. Thanks. I'll check it once I get out from behind this firewall :-) > > 3. Having experimented with training, he wanted to clear > > out the training info and start again. I said that this was > > just a matter of deleting the hammie.db file and the cache > > directories, and then restarting the proxy. (a) is this > > right, and (b) is it worth a way of doing this in the UI? (I > > suspect the answer to (b) is "no", as it's rare to need this, > > and dangerous if you do it by accident...) > > The outlook plugin has an option to do a full-retrain, which > is more-or-less the same as this, except that the pop3proxy > doesn't keep messages to retrain on, of course. > > It would be simple enough to add this, except for your (b) > above. Is this worth doing? I think I would weigh in on the > 'no' side, but add something to the docs (the FAQ perhaps) > about how to do it manually. That sounds like the best option. Thanks for looking at this. Paul.
From Paul.Moore at atosorigin.com Fri Apr 4 09:20:26 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Apr 4 03:21:49 2003 Subject: [Spambayes] IMAP Message-ID: <16E1010E4581B049ABC51D4975CEDB88619A2B@UKDCX001.uk.int.atosorigin.com> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > >> There currently is no imap proxy in spambayes. It is a > >> documented feature request, but nobody has picked it up as > >> of yet. I think the problem (certainly from my point of > >> view) is that imap servers to test against are not nearly as > >> common as pop3 servers. > > > > I was going to have a look at this because I use IMAP, but > > haven't got round to it, and quite probably never will. > > Just how high is the demand for an IMAP solution for spambayes? > (Speak up!). I keep toying with the idea of switching to IMAP for my mail, just to be a bit more independent of any particular client. But I've never actually done it, so don't count that as a vote in favour... > I'm happy (as is TimS I think) to do the coding necessary since > it's probably pretty similar to the pop3proxy stuff anyway, > but (as we've both said) we don't have IMAP accounts to test > against (anyone know of free ones from anywhere?). Not accounts, but if you are on Windows, there are 2 free local mail servers which I know of which do IMAP. These could be used for testing. Hamster, at http://www.tglsoft.de/misc/hamster_en.htm Mercury, at http://www.pmail.com/ > If there are enough people who want it, and who will do the > testing, then I'll still do the coding, although it'll be a > much slower process than otherwise. I could do some testing (although, as I say, I don't use IMAP at present), but I don't have much free time, so it'd be limited. Paul. 
From lists at olivermaunder.co.uk Fri Apr 4 09:48:20 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Fri Apr 4 03:49:15 2003 Subject: [Spambayes] IMAP In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230C05@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1301230C05@its-xchg4.massey.ac.nz> Message-ID: <3E8D46D4.6090702@olivermaunder.co.uk> Meyer, Tony wrote: >Just how high is the demand for an IMAP solution for spambayes? (Speak >up!). I'm happy (as is TimS I think) to do the coding necessary since >it's probably pretty similar to the pop3proxy stuff anyway, but (as >we've both said) we don't have IMAP accounts to test against (anyone >know of free ones from anywhere?). > > I want an IMAP solution! As for servers, on my random sample of 2 ISPs, I found they both supported IMAP, although they didn't promote it. I just tried setting up an IMAP connection to their POP3 servers, and it worked! Olly >If there are enough people who want it, and who will do the testing, >then I'll still do the coding, although it'll be a much slower process >than otherwise. 
> >=Tony Meyer > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > From noreply at sourceforge.net Fri Apr 4 02:28:30 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 4 05:14:00 2003 Subject: [Spambayes] [ spambayes-Bugs-709051 ] Error loading configuration should not be fatal Message-ID: Bugs item #709051, was opened at 2003-03-24 23:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Error loading configuration should not be fatal Initial Comment: There was a report of this error using the second binary release: SpamAddin - Connecting to Outlook pythoncom error: Failed to call the universal dispatcher Traceback (most recent call last): File "E:\src\pythonex\com\win32com\universal.py", line 170, in dispatch File "E:\src\pythonex\com\win32com\server\policy.py", line 322, in _InvokeEx_ File "E:\src\pythonex\com\win32com\server\policy.py", line 601, in _invokeex_ File "E:\src\pythonex\com\win32com\server\policy.py", line 541, in _invokeex_ File "E:\src\spambayes\Outlook2000\addin.py", line 655, in OnConnection File "E:\src\spambayes\Outlook2000\manager.py", line 475, in GetManager File "E:\src\spambayes\Outlook2000\manager.py", line 152, in __init__ File "E:\src\spambayes\Outlook2000\manager.py", line 355, in LoadConfig exceptions.EOFError: While there is another problem that caused this error, we should not die completely loading the config pickle should it get screwed up. 
However, as this means spambayes will be unconfigured, we do need a scheme to let the user know this (as we do in the few other places where we disable spambayes due to config errors) ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 12:28 Message: Logged In: YES user_id=227443 I disagree, the "configuration pickle" and the "word database" are the very same file. Moreover, without this path you seriously risk to completely loose your word database, in case execution stops beween open() (which truncates the file to zero length) and pickle.dump(). Execution could stop for whatever reason, from CTRL+C to system crash, so it's of vital importance that the file update is atomic, which should be guaranteed by rename(). I think this is a bug for sure, even if you don't plan to add support to concurrency. Of course this is not enough when you add concurrency, because you could loose some training information if 2 separate instances try to update the word database at the same time (they will both read the old file, then they will both create the temp file, then the second rename() will overwrite the result of the first one). To solve this, you should add some locking mechanism (in addition to atomic rename()), which could be out of your scope, I understand, but I think this would be a very useful enhancement on spambayes usability. If you need some code example, you can look at Mailman's handling of the MailList object persistency. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-04 01:19 Message: Logged In: YES user_id=14198 Check the traceback is the same as yours - this error is loading the configuration pickle, not the word database. 
Thus, locking shouldn't be the issue, as I can't see how two threads or processes could write this file at once (Outlook appears to have its own lock for startup; I've never seen spambayes running twice in different processes.) So, your patch wont help this exception. However, if you are getting a slightly different EOFError, you patch may apply. ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 00:01 Message: Logged In: YES user_id=227443 Maybe this patch would be a little (but insufficient) improvement? I'd upload it as a separate file, but there's no "upload" button.... --- spambayes/storage.py.orig 2003-04-03 23:35:47.000000000 +0200 +++ spambayes/storage.py 2003-04-03 23:43:16.000000000 +0200 @@ -59,6 +59,7 @@ import cPickle as pickle import errno import shelve +import os from spambayes import dbmstorage # Make shelve use binary pickles by default. @@ -121,9 +122,10 @@ if options.verbose: print 'Persisting',self.db_name,'as a pickle' - fp = open(self.db_name, 'wb') + fp = open(self.db_name+'.tmp', 'wb') pickle.dump(self, fp, PICKLE_TYPE) fp.close() + os.rename(self.db_name+'.tmp', self.db_name) class DBDictClassifier(classifier.Classifier): ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-03 21:38 Message: Logged In: YES user_id=227443 I have another case, but without apparent cause: Traceback (most recent call last): File "/home/mailman21/Mailman/Queue/Runner.py", line 105, in _oneloop self._onefile(msg, msgdata) File "/home/mailman21/Mailman/Queue/Runner.py", line 155, in _onefile keepqueued = self._dispose(mlist, msg, msgdata) File "/home/mailman21/Mailman/Queue/OutgoingRunner.py", line 69, in _dispose mlist.Load() File "/home/mailman21/Mailman/MailList.py", line 626, in Load self._spamdb = hammie.open(path, 0) File "/home/mailman21/pythonlib/spambayes/hammie.py", line 262, in open b = 
storage.PickledClassifier(filename) File "/home/mailman21/pythonlib/spambayes/storage.py", line 80, in __init__ self.load() File "/home/mailman21/pythonlib/spambayes/storage.py", line 98, in load tempbayes = pickle.load(fp) EOFError it happens quite often but not always, I believe it is a concurrency issue (e.g. lack of locking). ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-03-24 23:56 Message: Logged In: YES user_id=14198 The reporter just let me know that the problem was caused by about 20 power failures over short period. So I don't think we can cure the cause here, just the symptoms. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 From noreply at sourceforge.net Fri Apr 4 04:01:41 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 4 06:47:04 2003 Subject: [Spambayes] [ spambayes-Bugs-715248 ] Pickle classifier should save to a temp file first Message-ID: Bugs item #715248, was opened at 2003-04-04 22:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Nobody/Anonymous (nobody) Summary: Pickle classifier should save to a temp file first Initial Comment: A number of "EOF Error"s could be avoided if the pickle classifier saved to a temp file first, then renamed to the real file. Otherwise, failure during save can lose the database. 
This came up in https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 From noreply at sourceforge.net Fri Apr 4 04:05:16 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 4 06:49:45 2003 Subject: [Spambayes] [ spambayes-Bugs-709051 ] Config file loading and saving is fragile Message-ID: Bugs item #709051, was opened at 2003-03-25 09:19 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) >Summary: Config file loading and saving is fragile Initial Comment: There was a report of this error using the second binary release: SpamAddin - Connecting to Outlook pythoncom error: Failed to call the universal dispatcher Traceback (most recent call last): File "E:\src\pythonex\com\win32com\universal.py", line 170, in dispatch File "E:\src\pythonex\com\win32com\server\policy.py", line 322, in _InvokeEx_ File "E:\src\pythonex\com\win32com\server\policy.py", line 601, in _invokeex_ File "E:\src\pythonex\com\win32com\server\policy.py", line 541, in _invokeex_ File "E:\src\spambayes\Outlook2000\addin.py", line 655, in OnConnection File "E:\src\spambayes\Outlook2000\manager.py", line 475, in GetManager File "E:\src\spambayes\Outlook2000\manager.py", line 152, in __init__ File "E:\src\spambayes\Outlook2000\manager.py", line 355, in LoadConfig exceptions.EOFError: While there is another problem that caused this error, we should not die completely loading the config pickle should it get screwed up. 
However, as this means spambayes will be unconfigured, we do need a scheme to let the user know this (as we do in the few other places where we disable spambayes due to config errors) ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-04-04 22:05 Message: Logged In: YES user_id=14198 I'm afraid you are wrong about the config file being the same as the word database. You are however correct about the saving. As we have the 2 pickles, I will track the Outlook config pickle in this bug, and opened: https://sourceforge.net/tracker/index.php?func=detail&aid=715248&group_id=61702&atid=498103 to track the word database bug. ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 20:28 Message: Logged In: YES user_id=227443 I disagree, the "configuration pickle" and the "word database" are the very same file. Moreover, without this path you seriously risk to completely loose your word database, in case execution stops beween open() (which truncates the file to zero length) and pickle.dump(). Execution could stop for whatever reason, from CTRL+C to system crash, so it's of vital importance that the file update is atomic, which should be guaranteed by rename(). I think this is a bug for sure, even if you don't plan to add support to concurrency. Of course this is not enough when you add concurrency, because you could loose some training information if 2 separate instances try to update the word database at the same time (they will both read the old file, then they will both create the temp file, then the second rename() will overwrite the result of the first one). To solve this, you should add some locking mechanism (in addition to atomic rename()), which could be out of your scope, I understand, but I think this would be a very useful enhancement on spambayes usability. 
If you need a code example, you can look at Mailman's handling of
MailList object persistence.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-04-04 09:19

Message:
Logged In: YES
user_id=14198

Check that the traceback is the same as yours - this error is loading
the configuration pickle, not the word database. Thus, locking
shouldn't be the issue, as I can't see how two threads or processes
could write this file at once (Outlook appears to have its own lock
for startup; I've never seen spambayes running twice in different
processes.)

So, your patch won't help this exception. However, if you are getting
a slightly different EOFError, your patch may apply.

----------------------------------------------------------------------

Comment By: Simone Piunno (pioppo)
Date: 2003-04-04 08:01

Message:
Logged In: YES
user_id=227443

Maybe this patch would be a little (but insufficient) improvement?
I'd upload it as a separate file, but there's no "upload" button....

--- spambayes/storage.py.orig 2003-04-03 23:35:47.000000000 +0200
+++ spambayes/storage.py 2003-04-03 23:43:16.000000000 +0200
@@ -59,6 +59,7 @@
 import cPickle as pickle
 import errno
 import shelve
+import os
 from spambayes import dbmstorage

 # Make shelve use binary pickles by default.
@@ -121,9 +122,10 @@
         if options.verbose:
             print 'Persisting',self.db_name,'as a pickle'

-        fp = open(self.db_name, 'wb')
+        fp = open(self.db_name+'.tmp', 'wb')
         pickle.dump(self, fp, PICKLE_TYPE)
         fp.close()
+        os.rename(self.db_name+'.tmp', self.db_name)


 class DBDictClassifier(classifier.Classifier):

----------------------------------------------------------------------

Comment By: Simone Piunno (pioppo)
Date: 2003-04-04 05:38

Message:
Logged In: YES
user_id=227443

I have another case, but without apparent cause:

Traceback (most recent call last):
  File "/home/mailman21/Mailman/Queue/Runner.py", line 105, in _oneloop
    self._onefile(msg, msgdata)
  File "/home/mailman21/Mailman/Queue/Runner.py", line 155, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/home/mailman21/Mailman/Queue/OutgoingRunner.py", line 69, in _dispose
    mlist.Load()
  File "/home/mailman21/Mailman/MailList.py", line 626, in Load
    self._spamdb = hammie.open(path, 0)
  File "/home/mailman21/pythonlib/spambayes/hammie.py", line 262, in open
    b = storage.PickledClassifier(filename)
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 80, in __init__
    self.load()
  File "/home/mailman21/pythonlib/spambayes/storage.py", line 98, in load
    tempbayes = pickle.load(fp)
EOFError

It happens quite often but not always; I believe it is a concurrency
issue (e.g. lack of locking).

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-03-25 09:56

Message:
Logged In: YES
user_id=14198

The reporter just let me know that the problem was caused by about
20 power failures over a short period. So I don't think we can cure
the cause here, just the symptoms.
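
The save-to-a-temp-file-then-rename idea discussed in the comments above, together with the request that a damaged pickle should not be fatal, can be sketched roughly as follows. This is only an illustration written for current Python 3, not the code in storage.py or manager.py: the names atomic_pickle_dump and load_pickle_or_default are made up for the example. It uses os.replace rather than os.rename because os.rename refuses to overwrite an existing file on Windows. It does nothing about two processes updating the database at once; that would still need locking.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path):
    """Write obj to path so that a crash never leaves a truncated file.

    Hypothetical helper, not part of SpamBayes.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory (same filesystem) as the
    # target, otherwise the final rename could not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fp:
            pickle.dump(obj, fp, pickle.HIGHEST_PROTOCOL)
            fp.flush()
            os.fsync(fp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # replaces the old file in one step
    except BaseException:
        # Remove the partial temp file; the old database is untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_pickle_or_default(path, default):
    """Return the unpickled object, or default if the file is missing,
    empty, or truncated.

    Hypothetical helper. Badly corrupted data can raise other exception
    types too; only the common ones are handled here.
    """
    try:
        with open(path, "rb") as fp:
            return pickle.load(fp)
    except (FileNotFoundError, EOFError, pickle.UnpicklingError):
        return default
```

A caller that gets the default back would still need to tell the user that the configuration or database was reset, as the initial comment asks.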
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 From skip at pobox.com Fri Apr 4 08:27:00 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Apr 4 09:27:15 2003 Subject: [Spambayes] IMAP In-Reply-To: <3E8D46D4.6090702@olivermaunder.co.uk> References: <1ED4ECF91CDED24C8D012BCF2B034F1301230C05@its-xchg4.massey.ac.nz> <3E8D46D4.6090702@olivermaunder.co.uk> Message-ID: <16013.38452.431498.163149@montanaro.dyndns.org> Oliver> As for servers, on my random sample of 2 ISPs, I found they both Oliver> supported IMAP, although they didn't promote it. I just tried Oliver> setting up an IMAP connection to their POP3 servers, and it Oliver> worked! And why should they? From their perspective it only leads to greater consumption of their disk space. Northwestern is going through these IMAP machinations right now. Adding quotas, sending out warning messages (by email!), finally spooling over-quota IMAP storage off to tape. It's gonna be a mess. Skip From tim at fourstonesExpressions.com Fri Apr 4 08:32:17 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Apr 4 09:32:26 2003 Subject: [Spambayes] IMAP In-Reply-To: <3E8D46D4.6090702@olivermaunder.co.uk> Message-ID: 4/4/2003 2:48:20 AM, Oliver Maunder wrote: >Meyer, Tony wrote: > >>Just how high is the demand for an IMAP solution for spambayes? (Speak >>up!). I'm happy (as is TimS I think) to do the coding necessary since >>it's probably pretty similar to the pop3proxy stuff anyway, but (as >>we've both said) we don't have IMAP accounts to test against (anyone >>know of free ones from anywhere?). >> >> >I want an IMAP solution! When I joined this mailing list and started talking about a Lotus Notes integration. 
This project was the first open source project I'd ever participated in, and Tim Peters helped me tremendously early on by letting me know that in open source, you get to scratch your own itch . So... don't look for an IMAP proxy from me anytime soon... c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From lists at olivermaunder.co.uk Fri Apr 4 15:47:57 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Fri Apr 4 09:48:46 2003 Subject: [Spambayes] IMAP In-Reply-To: References: Message-ID: <3E8D9B1D.6020400@olivermaunder.co.uk> Tim Stone - Four Stones Expressions wrote: >> want an IMAP solution! >> >> > >When I joined this mailing list and started talking about a Lotus Notes >integration. This project was the first open source project I'd ever >participated in, and Tim Peters helped me tremendously early on by letting me >know that in open source, you get to scratch your own itch . So... >don't look for an IMAP proxy from me anytime soon... > > > Point taken, and entirely agreed with :-) As I said yesterday, I've been mulling over doing an IMAP app myself, but haven't got round to starting anything yet. If anyone else (Tony?) gets there quicker, then I will certainly do my share of testing and fixing. Olly From dmitry at karasik.eu.org Fri Apr 4 11:42:15 2003 From: dmitry at karasik.eu.org (Dmitry Karasik) Date: Fri Apr 4 09:49:11 2003 Subject: [Spambayes] hammie.py -d crash help Message-ID: <3E8D5377.2020401@karasik.eu.org> Hello, How would I tackle the following problem? $ /usr/local/spambayes/hammie.py -d Traceback (most recent call last): File "/usr/local/spambayes/hammie.py", line 5, in ? 
spambayes.hammiebulk.main() File "./spambayes/hammiebulk.py", line 180, in main h = hammie.open(pck, usedb, mode) File "./spambayes/hammie.py", line 260, in open b = storage.DBDictClassifier(filename, mode) File "./spambayes/storage.py", line 140, in __init__ self.load() File "./spambayes/storage.py", line 148, in load self.dbm = dbmstorage.open(self.db_name, self.mode) File "./spambayes/dbmstorage.py", line 54, in open return f(*args) File "./spambayes/dbmstorage.py", line 36, in open_best return f(*args) File "./spambayes/dbmstorage.py", line 17, in open_dbhash return bsddb.hashopen(*args) bsddb.error: (2, 'No such file or directory') zsh: 27989 exit 1 /usr/local/spambayes/hammie.py -d /Dmitry From martin.worger at cbs-osn.co.uk Fri Apr 4 11:30:49 2003 From: martin.worger at cbs-osn.co.uk (Martin Worger) Date: Fri Apr 4 09:49:42 2003 Subject: [Spambayes] SpamBayes Outlook Plug-In Message-ID: First of all, it's very good and seems to be doing it's job properly - congratulations. A minor point, but since accumulating enough spam to start filtering, messages I read are no longer marked as 'read' by Outlook. i.e. Since I turned filtering on. I'm running Outlook 2002 SP-2 (10.4608.4219). Sorry no version info on the SpamBayes filter, but I downloaded it last week. Regards, ____________________________________________________________ Martin Worger, BSc (Hons.) IT, MCSD Martin.Worger@cbs-osn.co.uk Systems Development Manager CBS Open Systems & Networks Technical : +44 (0) 116 264 3700 The opinions expressed herein are my own and are not necessarily representative of the policies or opinions of my employer. From jrfsousa at esoterica.pt Fri Apr 4 15:39:47 2003 From: jrfsousa at esoterica.pt (=?iso-8859-1?Q?Jos=E9_Rui_Faustino_de_Sousa?=) Date: Fri Apr 4 09:51:41 2003 Subject: [Spambayes] Possible bug. Message-ID: <000001c2fab8$0ad5b7a0$fd00a8c0@krolik2piii1008> Hi! 
Log file:

#---CUT HERE------------------------
SpamAddin - Connecting to Outlook
Created new configuration file 'C:\usr\jrfsousa\Application Data\SpamBayes\default_configuration.pck'
pythoncom error: Failed to call the universal dispatcher

Traceback (most recent call last):
  File "E:\src\pythonex\com\win32com\universal.py", line 170, in dispatch
  File "E:\src\pythonex\com\win32com\server\policy.py", line 322, in _InvokeEx_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 601, in _invokeex_
  File "E:\src\pythonex\com\win32com\server\policy.py", line 541, in _invokeex_
  File "E:\src\spambayes\Outlook2000\addin.py", line 655, in OnConnection
  File "E:\src\spambayes\Outlook2000\manager.py", line 475, in GetManager
  File "E:\src\spambayes\Outlook2000\manager.py", line 156, in __init__
  File "E:\src\spambayes\Outlook2000\manager.py", line 71, in import_core_spambayes_stuff
  File "E:\src\Installer\iu.py", line 274, in importHook
  File "E:\src\Installer\iu.py", line 353, in doimport
  File "E:\src\spambayes\spambayes\tokenizer.py", line 659, in ?
exceptions.AttributeError: 'OptionsClass' object has no attribute 'skip_max_word_size'
#---CUT HERE------------------------

Best regards
José Rui

========================================================================
We must respect the other fellow's religion, but only in the sense and
to the extent that we respect his theory that his wife is beautiful and
his children smart. - H. L. Mencken.
========================================================================
iam://Jose Rui Faustino de Sousa
http://homepage.esoterica.pt/~jrfsousa/
mailto://jrfsousa.removeFF289802this@esoterica.pt
phone://+351-239444940
address://rua Carlos A.
Pinto de Abreu 30C, 1 3040-245 Coimbra Portugal
========================================================================

From tim at fourstonesExpressions.com Fri Apr 4 09:00:01 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Apr 4 10:00:06 2003
Subject: [Spambayes] IMAP
In-Reply-To: <3E8D9B1D.6020400@olivermaunder.co.uk>
Message-ID:

4/4/2003 8:47:57 AM, Oliver Maunder wrote:

>Point taken, and entirely agreed with :-)

BTW... I'm still working on the Notes integration <.5 wink>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From tim at fourstonesExpressions.com Fri Apr 4 08:55:30 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Apr 4 10:00:39 2003
Subject: [Spambayes] hammie.py -d crash help
In-Reply-To: <3E8D5377.2020401@karasik.eu.org>
Message-ID:

You must name a file to be used as the spambayes wordinfo database, so
your command should look more like

hammie.py -d dmitry.wordinfo.database

Or you can leave the -d operand off, and it will default to 'hammie.db'.

You should run hammie.py with no operands to print some directions on
how to run it. You've left off a couple of operands that will be needed.

4/4/2003 3:42:15 AM, Dmitry Karasik wrote:

>Hello,
>
>How would I tackle the following problem?
>
>$ /usr/local/spambayes/hammie.py -d
>Traceback (most recent call last):
> File "/usr/local/spambayes/hammie.py", line 5, in ?
> spambayes.hammiebulk.main() > File "./spambayes/hammiebulk.py", line 180, in main > h = hammie.open(pck, usedb, mode) > File "./spambayes/hammie.py", line 260, in open > b = storage.DBDictClassifier(filename, mode) > File "./spambayes/storage.py", line 140, in __init__ > self.load() > File "./spambayes/storage.py", line 148, in load > self.dbm = dbmstorage.open(self.db_name, self.mode) > File "./spambayes/dbmstorage.py", line 54, in open > return f(*args) > File "./spambayes/dbmstorage.py", line 36, in open_best > return f(*args) > File "./spambayes/dbmstorage.py", line 17, in open_dbhash > return bsddb.hashopen(*args) >bsddb.error: (2, 'No such file or directory') >zsh: 27989 exit 1 /usr/local/spambayes/hammie.py -d > > >/Dmitry > > > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From romany at actimize.com Fri Apr 4 09:50:07 2003 From: romany at actimize.com (Roman Yakovenko) Date: Fri Apr 4 10:04:07 2003 Subject: [Spambayes] smart spam Message-ID: <91BFE89EFFA2904E9A4C3ACB4E5F2DF5027ACB@exchange.adrembi.com> Good morning, thanks for good program. Almost 99% of all spam I get is deleted. Thanks. The spam I forwarder to you is a smart one. The program can't detect that this is a spam. Also I already have a few letters like this in my database. May be it will help you to clisify this letter as spam. Roman. -----Original Message----- From: donajo1vydt@yahoo.com [mailto:donajo1vydt@yahoo.com] Sent: Saturday, April 05, 2003 8:10 AM To: Roman Yakovenko Subject: Mortgage Rates down to Lowest since 1966! 123 9868XSq-7 JAMES No mail! 
c3MNkrgWu8EAbFI3iMQu6
7631AxCz5-645XVpE8262sKXb6-317dljB5514fddo2-321pTJJ8065Wl53

From tim at fourstonesExpressions.com Fri Apr 4 09:07:59 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Apr 4 10:09:27 2003
Subject: [Spambayes] smart spam
In-Reply-To: <91BFE89EFFA2904E9A4C3ACB4E5F2DF5027ACB@exchange.adrembi.com>
Message-ID: <08A8QLZVTJNLRO3XTODC2VB6QLSQB0.3e8d9fcf@myst>

4/4/2003 12:50:07 AM, "Roman Yakovenko" wrote:

>Good morning, thanks for good program. Almost 99% of all spam I get is
>deleted. Thanks.
>The spam I forwarder to you is a smart one. The program can't detect
>that this is a spam.
>Also I already have a few letters like this in my database. May be it
>will help you to clisify
>this letter as spam.

Be patient. You just haven't seen enough of these for your classifier
to be trained to catch 'em yet, apparently. My classifier nails this
one with the following clues:

Spam probability: 0.966497710245

Clues:
*H*                      0.0612971104255
*S*                      0.994292530916
subject:down             0.0412844036697
skip:c 20                0.218422592603
from:no real name:2**0   0.345657329535
url:com                  0.669802785364
url:www                  0.679068545706
james                    0.820319551399
url:remove               0.876581080058
subject:Rates            0.908163265306
message-id:invalid       0.934782608696
subject:Lowest           0.934782608696
subject:!                0.962260406696
url:info                 0.990405117271

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.
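
As a worked check of the figures in that clue listing: the reported spam probability agrees with combining the *H* and *S* values as (S - H + 1) / 2, which, as far as I know, is the last step of the chi-squared combining scheme. The function name final_score is only illustrative, and how *H* and *S* themselves are derived from the individual clues is not shown here.

```python
def final_score(h, s):
    """Combine the ham (*H*) and spam (*S*) indicators into one score.

    Illustrative helper: 0.0 means certainly ham, 1.0 certainly spam,
    and values near 0.5 mean the evidence is balanced or weak.
    """
    return (s - h + 1.0) / 2.0


# Values copied from the clue listing above.
H = 0.0612971104255
S = 0.994292530916

score = final_score(H, S)
print("Spam probability: %.12f" % score)
```

Because both indicators feed into the score, a message with strong ham evidence and strong spam evidence at the same time lands near 0.5 ("unsure") rather than at either extreme.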
From maht at cuntbubble.com Fri Apr 4 16:08:21 2003 From: maht at cuntbubble.com (maht 0x0r) Date: Fri Apr 4 10:11:53 2003 Subject: [Spambayes] IMAP In-Reply-To: <16013.38452.431498.163149@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F1301230C05@its-xchg4.massey.ac.nz> <3E8D46D4.6090702@olivermaunder.co.uk> <16013.38452.431498.163149@montanaro.dyndns.org> Message-ID: <3E8D9FE5.1080904@cuntbubble.com> http://fastmail.fm a free mailer that has Web/Pop3/IMAP access to email From skip at pobox.com Fri Apr 4 09:17:10 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Apr 4 10:17:32 2003 Subject: [Spambayes] hammie.py -d crash help In-Reply-To: <3E8D5377.2020401@karasik.eu.org> References: <3E8D5377.2020401@karasik.eu.org> Message-ID: <16013.41462.185789.641917@montanaro.dyndns.org> Dmitry> How would I tackle the following problem? Dmitry> $ /usr/local/spambayes/hammie.py -d ... Have you tried /usr/local/spambayes/hammie.py -d -p ~/newhammie.db ? If so, does it succeed? Perhaps your default database file can't be created. You can see what your default is by executing /usr/local/spambayes/hammie.py --help and noting the text for the -p flag. Skip From Paul.Moore at atosorigin.com Fri Apr 4 15:52:55 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Fri Apr 4 11:30:23 2003 Subject: [Spambayes] SpamBayes Outlook Plug-In Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D9F1@UKDCX001.uk.int.atosorigin.com> From: Martin Worger [mailto:martin.worger@cbs-osn.co.uk] > A minor point, but since accumulating enough spam to start filtering, > messages I read are no longer marked as 'read' by Outlook. i.e. Since I > turned filtering on. This is a known issue with the Outlook plugin. If you read messages before the plugin gets to do its stuff, this happens (the messages are set back to "unread"). There's a small, but significant enough to be annoying, time window. As far as I know, no-one has been able to track down the problem to fix it yet. Paul. 
From dalmolin at e-cology.ca Fri Apr 4 19:04:08 2003 From: dalmolin at e-cology.ca (Joseph Dal Molin) Date: Fri Apr 4 14:04:09 2003 Subject: [Spambayes] Installation Problems: no runnable browser, OR connection refused Message-ID: <1049483045.30953.9.camel@montegrappa> I have just installed Spambayes on Mandrake 9.0 and cannot get the browser based configuration going. I have tried connecting direclty via http://localhost:8880/ and get a connection refused reply AND I have tried to launch the browser using both the pop3proxy.py script and doing a self test of Dibbler.py...the following is what I get: [root@montegrappa pop3proxy-spam-cache]# python /usr/bin/pop3 Loading database... Done. User interface url is http://localhost:8880/ Traceback (most recent call last): File "/usr/bin/pop3proxy.py", line 1578, in ? run() File "/usr/bin/pop3proxy.py", line 1572, in run main(state.servers, state.proxyPorts, state.uiPort, state File "/usr/bin/pop3proxy.py", line 1253, in main Dibbler.run(launchBrowser=launchUI) File "/usr/lib/python2.2/site-packages/spambayes/Dibbler.py webbrowser.open_new("http://localhost:%d/" % context._HTT File "/usr/lib/python2.2/webbrowser.py", line 46, in open_n get().open(url, 1) File "/usr/lib/python2.2/webbrowser.py", line 38, in get raise Error("could not locate runnable browser") webbrowser.Error: could not locate runnable browser [root@montegrappa spambayes]# python Dibbler.py Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.2/threading.py", line 408, in __bootstrap self.run() File "/usr/lib/python2.2/threading.py", line 396, in run apply(self.__target, self.__args, self.__kwargs) File "Dibbler.py", line 560, in runTestServer Dibbler.run(launchBrowser=True) File "/usr/lib/python2.2/site-packages/spambayes/Dibbler.py", line 527, in run webbrowser.open_new("http://localhost:%d/" % context._HTTPPort) File "/usr/lib/python2.2/webbrowser.py", line 46, in open_new get().open(url, 1) File 
"/usr/lib/python2.2/webbrowser.py", line 38, in get raise Error("could not locate runnable browser") Error: could not locate runnable browser Could someone please tell me what I am doing wrong. Thanks. Joseph -- +=======================================================+ | Joseph Dal Molin | E-mail: dalmolin@e-cology.ca | | President | Web: http://www.e-cology.ca | | e-cology Corporation | Phone: 1.416.232.1206 | +=======================================================+ From tim at fourstonesExpressions.com Fri Apr 4 13:29:05 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Apr 4 14:29:11 2003 Subject: [Spambayes] Installation Problems: no runnable browser, OR connection refused In-Reply-To: <1049483045.30953.9.camel@montegrappa> Message-ID: 4/4/2003 1:04:06 PM, Joseph Dal Molin wrote: >I have just installed Spambayes on Mandrake 9.0 and cannot get the >browser based configuration going. I have tried connecting direclty via >http://localhost:8880/ and get a connection refused reply AND I have >tried to launch the browser using both the pop3proxy.py script and doing >a self test of Dibbler.py...the following is what I get: I presume you're using the -b startup option. This will make the pop3proxy try to start up a default browser window, which python is unable to find on your installation for some reason. Simply leave the option off, start your browser yourself, and point it to localhost:8880 - TimS c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
From noreply at sourceforge.net Fri Apr 4 12:13:06 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 4 14:58:29 2003 Subject: [Spambayes] [ spambayes-Bugs-715248 ] Pickle classifier should save to a temp file first Message-ID: Bugs item #715248, was opened at 2003-04-04 14:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Nobody/Anonymous (nobody) Summary: Pickle classifier should save to a temp file first Initial Comment: A number of "EOF Error"s could be avoided if the pickle classifier saved to a temp file first, then renamed to the real file. Otherwise, failure during save can lose the database. This came up in https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 22:13 Message: Logged In: YES user_id=227443 --- spambayes/storage.py.orig 2003-04-03 23:35:47.000000000 +0200 +++ spambayes/storage.py 2003-04-04 21:51:25.000000000 +0200 @@ -59,6 +59,9 @@ import cPickle as pickle import errno import shelve +import sys +import os +import random from spambayes import dbmstorage # Make shelve use binary pickles by default. @@ -121,10 +124,31 @@ if options.verbose: print 'Persisting',self.db_name,'as a pickle' - fp = open(self.db_name, 'wb') - pickle.dump(self, fp, PICKLE_TYPE) - fp.close() - + # Be as defensive as possible, keep always a safe copy. 
+ rand = random.randrange(0, sys.maxint) + tmp = self.db_name + '.%d.%d.tmp' % (rand, os.getpid()) + last = self.db_name + '.bak' + fp = None + try: + fp = open(tmp, 'wb') + pickle.dump(self, fp, PICKLE_TYPE) + fp.close() + except IOError, e: + if options.verbose: + print 'Failed update: ' + e + if fp is not None: + os.unlink(tmp) + raise + try: + os.unlink(last) + except OSError, e: + if e.errno <> errno.ENOENT: raise + try: + os.link(self.db_name, last) + except: + if e.errno <> errno.ENOENT: raise + os.rename(tmp, self.db_name) + class DBDictClassifier(classifier.Classifier): '''Classifier object persisted in a caching database''' ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 From noreply at sourceforge.net Fri Apr 4 13:02:45 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 4 15:48:08 2003 Subject: [Spambayes] [ spambayes-Bugs-715248 ] Pickle classifier should save to a temp file first Message-ID: Bugs item #715248, was opened at 2003-04-04 06:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) >Assigned to: Tim Stone (timstone4) Summary: Pickle classifier should save to a temp file first Initial Comment: A number of "EOF Error"s could be avoided if the pickle classifier saved to a temp file first, then renamed to the real file. Otherwise, failure during save can lose the database. This came up in https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 ---------------------------------------------------------------------- >Comment By: Tim Stone (timstone4) Date: 2003-04-04 15:02 Message: Logged In: YES user_id=645698 I agree. This was how it did its thing to start with... . I'll put it back in. 
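
The idea in this tracker item (write the pickle to a temp file, then rename it over the real file) can be sketched as below. This is a present-day Python illustration, not Simone's patch and not the code in storage.py: `save_pickle_atomically` is a hypothetical name, `os.replace` stands in for the patch's unlink/link/rename sequence, and no `.bak` copy is kept.

```python
import os
import pickle
import tempfile


def save_pickle_atomically(obj, db_name, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle `obj` to `db_name` without ever truncating the existing file.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(db_name))
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem.
    fd, tmp_name = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as fp:
            pickle.dump(obj, fp, protocol)
            fp.flush()
            os.fsync(fp.fileno())
        # Swap the finished file into place in a single step.
        os.replace(tmp_name, db_name)
    except BaseException:
        # A failed save leaves the old database alone; drop the partial file.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

If the dump fails part-way, the previous database file is still intact, which is the property the bug report asks for.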
---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 14:13 Message: Logged In: YES user_id=227443 --- spambayes/storage.py.orig 2003-04-03 23:35:47.000000000 +0200 +++ spambayes/storage.py 2003-04-04 21:51:25.000000000 +0200 @@ -59,6 +59,9 @@ import cPickle as pickle import errno import shelve +import sys +import os +import random from spambayes import dbmstorage # Make shelve use binary pickles by default. @@ -121,10 +124,31 @@ if options.verbose: print 'Persisting',self.db_name,'as a pickle' - fp = open(self.db_name, 'wb') - pickle.dump(self, fp, PICKLE_TYPE) - fp.close() - + # Be as defensive as possible, keep always a safe copy. + rand = random.randrange(0, sys.maxint) + tmp = self.db_name + '.%d.%d.tmp' % (rand, os.getpid()) + last = self.db_name + '.bak' + fp = None + try: + fp = open(tmp, 'wb') + pickle.dump(self, fp, PICKLE_TYPE) + fp.close() + except IOError, e: + if options.verbose: + print 'Failed update: ' + e + if fp is not None: + os.unlink(tmp) + raise + try: + os.unlink(last) + except OSError, e: + if e.errno <> errno.ENOENT: raise + try: + os.link(self.db_name, last) + except: + if e.errno <> errno.ENOENT: raise + os.rename(tmp, self.db_name) + class DBDictClassifier(classifier.Classifier): '''Classifier object persisted in a caching database''' ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 From noreply at sourceforge.net Fri Apr 4 13:09:21 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 4 15:54:17 2003 Subject: [Spambayes] [ spambayes-Bugs-715248 ] Pickle classifier should save to a temp file first Message-ID: Bugs item #715248, was opened at 2003-04-04 14:01 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 Category: None Group: None Status: Open Resolution: 
None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Tim Stone (timstone4) Summary: Pickle classifier should save to a temp file first Initial Comment: A number of "EOF Error"s could be avoided if the pickle classifier saved to a temp file first, then renamed to the real file. Otherwise, failure during save can lose the database. This came up in https://sourceforge.net/tracker/?func=detail&atid=498103&aid=709051&group_id=61702 ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 23:09 Message: Logged In: YES user_id=227443 sorry, typo... the 3rd "try" is "except OSError, e", like the 2nd one. ---------------------------------------------------------------------- Comment By: Tim Stone (timstone4) Date: 2003-04-04 23:02 Message: Logged In: YES user_id=645698 I agree. This was how it did its thing to start with... . I'll put it back in. ---------------------------------------------------------------------- Comment By: Simone Piunno (pioppo) Date: 2003-04-04 22:13 Message: Logged In: YES user_id=227443 --- spambayes/storage.py.orig 2003-04-03 23:35:47.000000000 +0200 +++ spambayes/storage.py 2003-04-04 21:51:25.000000000 +0200 @@ -59,6 +59,9 @@ import cPickle as pickle import errno import shelve +import sys +import os +import random from spambayes import dbmstorage # Make shelve use binary pickles by default. @@ -121,10 +124,31 @@ if options.verbose: print 'Persisting',self.db_name,'as a pickle' - fp = open(self.db_name, 'wb') - pickle.dump(self, fp, PICKLE_TYPE) - fp.close() - + # Be as defensive as possible, keep always a safe copy. 
+ rand = random.randrange(0, sys.maxint) + tmp = self.db_name + '.%d.%d.tmp' % (rand, os.getpid()) + last = self.db_name + '.bak' + fp = None + try: + fp = open(tmp, 'wb') + pickle.dump(self, fp, PICKLE_TYPE) + fp.close() + except IOError, e: + if options.verbose: + print 'Failed update: ' + e + if fp is not None: + os.unlink(tmp) + raise + try: + os.unlink(last) + except OSError, e: + if e.errno <> errno.ENOENT: raise + try: + os.link(self.db_name, last) + except: + if e.errno <> errno.ENOENT: raise + os.rename(tmp, self.db_name) + class DBDictClassifier(classifier.Classifier): '''Classifier object persisted in a caching database''' ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=715248&group_id=61702 From richard at jowsey.com Sat Apr 5 07:28:16 2003 From: richard at jowsey.com (Richard Jowsey) Date: Fri Apr 4 16:27:52 2003 Subject: [Spambayes] smart spam In-Reply-To: <91BFE89EFFA2904E9A4C3ACB4E5F2DF5027ACB@exchange.adrembi.com> Message-ID: <3E8E8590.18356.16B00351@localhost> > The spam I forwarder to you is a smart one. The program can't > detect that this is a spam. Also I already have a few letters > like this in my database. Yet another example of a "low molecular mass" spam. This one gets 0.7053 as-is through my well-trained system, with URL slurping turned off. Still unsure. Probably needs to see another half dozen of these little nuisances before it knows what's what... But, as one might expect, the URL's html content earns a spam prob of exactly 1.0! RIP. Cheers, Richard From tim at fourstonesExpressions.com Fri Apr 4 15:35:47 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Apr 4 16:37:04 2003 Subject: [Spambayes] smart spam In-Reply-To: <3E8E8590.18356.16B00351@localhost> Message-ID: 4/4/2003 3:28:16 PM, "Richard Jowsey" wrote: >> The spam I forwarder to you is a smart one. 
The program can't
>> detect that this is a spam. Also I already have a few letters
>> like this in my database.
>
>Yet another example of a "low molecular mass" spam. This one gets
>0.7053 as-is through my well-trained system, with URL slurping turned
>off. Still unsure. Probably needs to see another half dozen of these
>little nuisances before it knows what's what...
>
>But, as one might expect, the URL's html content earns a spam prob of
>exactly 1.0! RIP.

Do you train on the site contents?

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From richard at jowsey.com Sat Apr 5 07:58:46 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Fri Apr 4 16:58:23 2003
Subject: [Spambayes] smart spam
In-Reply-To:
References: <3E8E8590.18356.16B00351@localhost>
Message-ID: <3E8E8CB6.20000.16CBF280@localhost>

> Do you train on the site contents?

You betcha! I consider the site's contents to be a legitimate
"message part" under the previously-defined circumstances (unsure,
few clues, yadda).

Cheers,
R

From richard at jowsey.com Sat Apr 5 08:21:12 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Fri Apr 4 17:20:57 2003
Subject: [Spambayes] Latest spammer trick stymied - QUESTION
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230CA3@its-xchg4.massey.ac.nz>
Message-ID: <3E8E91F8.18375.16E07BC4@localhost>

[Tony]
> You can still donate the source so that it can be converted into a
> nice language ;)

Prototype slurper is attached! Note that the recursive
redirect/refresh logic is a bit Q&D, and needs some more tidying up
(there are only 24 hours in a day, dammit).

[Richard]
> > As for consensus, I'm biased, but I don't really think there
> > is one.

[Tony]
> Hmm.
I read all the messages immediately after each other [snip] > There weren't a lot of people saying that they thought it was a good > thing to include it, but there were some saying that they were against > it. Then there were a lot posting queries and answers to that. > Overall, it did seem very anti the idea. Perhaps you're right. There does seem to be a dollop of scepticism about the value of something like this. Funny, coz my beta users *love* it! > Perhaps, as a test, someone could convert the code to Python and > it could be committed (as an option that defaults to False). If test > results really support it (and no-one can come up with a pure > tokenisation alternative), it could be left there (still defaulting to > False, unless all the concerns are addressed). Fair enough! Cheers, Richard -------------- next part -------------- /* * @(#) UrlSlurper.java */ package net.death2spam.http; import java.io.BufferedOutputStream; import java.io.IOException; import java.io.PrintStream; import java.net.MalformedURLException; import java.net.Socket; import java.net.URL; import java.security.Security; import javax.net.ssl.SSLSocketFactory; import net.death2spam.Statics; import net.death2spam.io.BigByteInputStream; import net.death2spam.log.HttpLogger; import com.sun.net.ssl.HttpsURLConnection; import com.sun.net.ssl.internal.ssl.Provider; /** * Gets HTML from a URL for word-frequency analysis. 
*/ public final class UrlSlurper { /* The URL to retrieve */ protected URL url = null; /* Whether to only obtain the HTTP headers */ protected boolean head = false; /* Hidden default constructor */ private UrlSlurper() {} /** Constructor taking a URL argument */ public UrlSlurper(URL url) { this.url = url; } /** Constructor taking URL and head-flag parameters */ public UrlSlurper(URL url, boolean head) { this.url = url; this.head = head; } /** @return the URL's contents as a String */ public String retrieve() { String html = ""; Socket socket = null; try { HttpLogger.trace("UrlSlurper.retrieve() " + url.toString()); String host = url.getHost(); HttpLogger.trace("UrlSlurper.retrieve() contacting " + host); int port = url.getPort(); port = (port < 1) ? 80 : port; if (port == 443) { Security.addProvider(new Provider()); System.setProperty("java.protocol.handler.pkgs", "com.sun.net.ssl.internal.www.protocol"); SSLSocketFactory sslFactory = HttpsURLConnection.getDefaultSSLSocketFactory(); socket = sslFactory.createSocket(host, port); } else socket = new Socket(host, port); // create the socket reader and writer HttpLogger.trace("UrlSlurper.retrieve() " + socket.toString()); BigByteInputStream in = new BigByteInputStream( socket.getInputStream(), Statics.BUFFER_SIZE); BufferedOutputStream out = new BufferedOutputStream( socket.getOutputStream(), Statics.BUFFER_SIZE); // construct the request headers String file = url.getFile(); if (file == null || file.length() == 0) file = "/"; String httpRequest = (head ? "HEAD " : "GET ") + file + " HTTP/1.0" + Statics.CRLF; httpRequest += "Host: " + host + (port == 80 ? 
"" : ":" + port) + Statics.CRLF; httpRequest += "User-Agent: Death2Spam/1.0 (compatible; en)" + Statics.CRLF; httpRequest += "Content-Type: text/html" + Statics.CRLF; httpRequest += "Accept: */*" + Statics.CRLF; httpRequest += "Accept-Language: en" + Statics.CRLF; httpRequest += "Accept-Charset: iso-8859-1, *, utf-8" + Statics.CRLF; httpRequest += "Connection: close" + Statics.CRLF; HttpLogger.trace("UrlSlurper.retrieve() request:" + Statics.CRLF + httpRequest); httpRequest += Statics.CRLF; // end of RFC-822 header section // send the request PrintStream ps = new PrintStream(out, true); ps.print(httpRequest); ps.flush(); // read the response html = in.toString(); HttpLogger.trace("UrlSlurper.retrieve() response:" + Statics.CRLF + html); } catch (Exception e) { if (Statics.debug) e.printStackTrace(System.err); HttpLogger.error(e.toString()); } finally { if (socket != null) { try { socket.close(); } catch (IOException ignore) {} } } return html; } /** * Recursively follows a redirect or refresh trail. * @param url The page being retrieved. * @param response Contents of the http response. * @param headReq whether it's a HEAD request. * @return The response from any re-directs detected. * */ public String handleRedirect(URL url, String response, boolean headReq) { String protocol = url.getProtocol(); String host = url.getHost(); int port = url.getPort(); port = (port > -1) ? 
port : 80; String file = url.getFile(); int posCRLF = response.indexOf(Statics.CRLF); if (posCRLF < 0) posCRLF = response.indexOf('\n'); String first = response.substring(0, posCRLF).trim(); int posCode = first.indexOf(' ') + 1; int code = 200; if (first.indexOf("HTTP") > -1) { try { code = Integer.parseInt(first.substring(posCode, posCode + 3)); } catch (NumberFormatException nfe) { code = 404; // assume a file-not-found error } } try { if (code > 499) { if (Statics.debug) System.err.println(first); } else if (code > 399) { // 404 not found, etc if (host.startsWith("www.")) return response; else { int posDot = host.indexOf('.') + 1; if (host.indexOf('.', posDot) < 0) return response; url = new URL(protocol, "www." + host.substring(posDot).trim(), port, file); UrlSlurper slurper = new UrlSlurper(url, headReq); response = slurper.retrieve(); return handleRedirect(url, response, headReq); } } else if (code == 301 || code == 302 || code == 303) { // moved, found or see-other String res = response.toLowerCase(); int start = res.indexOf("location:"); if (start < 0) { int posDot = host.indexOf('.') + 1; url = new URL(protocol, "www." + host.substring(posDot).trim(), port, file); } else if (res.substring(start + 9, start + 15).trim().startsWith("http")) { start = res.indexOf("http", start + 9); int end = res.indexOf('\r', start); url = new URL(response.substring(start, end).trim()); } else { start += 9; int end = res.indexOf('\r', start); file = response.substring(start, end).trim(); url = new URL(protocol, host, port, file); } UrlSlurper slurper = new UrlSlurper(url, headReq); response = slurper.retrieve(); return handleRedirect(url, response, headReq); } else if (response.toLowerCase().indexOf(" -1) { // meta refresh tag in header String res = response.toLowerCase(); int start = res.indexOf(" wrote: >> Do you train on the site contents? > >You betcha! What's that do to the size of your database? 
>
>I consider the site's contents to be a legitimate "message part"
>under the previously-defined circumstances (unsure, few clues,
>yadda).
>
>Cheers,
>R
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From richard at jowsey.com Sat Apr 5 11:48:13 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Fri Apr 4 20:48:48 2003
Subject: [Spambayes] smart spam
In-Reply-To:
References: <3E8E8CB6.20000.16CBF280@localhost>
Message-ID: <3E8EC27D.10208.179E7E19@localhost>

> >You betcha!
>
> What's that do to the size of your database?

There aren't enough of these mini-spams coming down the pipe to bloat
the database. Considering there are only, say, 500-1000 distinct words
on these "slurped" sites, and many of the words are already known to
the database, it's just the same as an email with a 5-10k text
attachment.

Currently, my spam + good + virus databases (including hapaxes) are
still under 15Mb total size. That's from an initial training corpus of
about 15k good, 50k spam, and around 15k messages through the beta
proxy in the past month...

Good messages:       21,676
Unique good words:   326,001
Total good words:    11,446,259
Datafile size (Kb):  2,898

Spam messages:       59,090
Unique spam words:   770,800
Total spam words:    36,973,011
Datafile size (Kb):  11,083

Cheers,
R

From tim_one at email.msn.com Sat Apr 5 02:18:40 2003
From: tim_one at email.msn.com (Tim Peters)
Date: Sat Apr 5 02:19:29 2003
Subject: [Spambayes] smart spam
In-Reply-To: <3E8E8590.18356.16B00351@localhost>
Message-ID:

[Richard Jowsey]
> Yet another example of a "low molecular mass" spam. This one gets
> 0.7053 as-is through my well-trained system, with URL slurping turned
> off. Still unsure.
Probably needs to see another half dozen of these
> little nuisances before it knows what's what...
>
> But, as one might expect, the URL's html content earns a spam prob of
> exactly 1.0! RIP.

A cute thing is that the

    http://www.lowratemortgages.info/lead2345

page also tries to hide most of its text: the "100's of Lenders Compete for
Your Loan ..." business is itself hiding in a JPEG. They left plenty of
incriminating evidence in the TITLE and the FORM, though. Now let's see
whether you scrape the URL and call *this* msg spam.

From richard at jowsey.com Sat Apr 5 18:29:28 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Sat Apr 5 03:22:55 2003
Subject: [Spambayes] smart spam
In-Reply-To:
References: <3E8E8590.18356.16B00351@localhost>
Message-ID: <3E8F2088.26626.190DD75C@localhost>

> [Richard Jowsey]
> > Yet another example of a "low molecular mass" spam. This one gets
> > 0.7053 as-is through my well-trained system, with URL slurping
> > turned off. Still unsure. Probably needs to see another half dozen
> > of these little nuisances before it knows what's what...
> >
> > But, as one might expect, the URL's html content earns a spam prob
> > of exactly 1.0! RIP.
>
> A cute thing is that the
>
>     http://www.lowratemortgages.info/lead2345
>
> page also tries to hide most of its text: the "100's of Lenders
> Compete for Your Loan ..." business is itself hiding in a JPEG. They
> left plenty of incriminating evidence in the TITLE and the FORM,
> though. Now let's see whether you scrape the URL and call *this* msg
> spam.

LOL! My classifier didn't need to go slurp, but it said "unsure"
anyway (pSpam=0.4603), coz you put LENDERS in there...
Top 10:
spambayes=0.0001
h:spambayes=0.0001
python.org=0.0001
spambayes@python.org=0.0001
s:spambayes=0.0002
h:python.org=0.0005
r:cj569191b=0.0036
as-is=0.0072
tim_one@email.msn.com=0.0076
lenders=0.9889

:-)

From skip at pobox.com Sat Apr 5 13:05:42 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Apr 5 14:05:47 2003
Subject: [Spambayes] FYI - Habeas suing two bulk emailers
Message-ID: <16015.10502.785202.750119@montanaro.dyndns.org>

This in from Slashdot:

    http://slashdot.org/article.pl?sid=03/04/05/1644249

    ... According to a news.com article, it will now be tested in court.
    Habeas is suing two internet marketers, saying that they've included
    Habeas' haiku in their mail, thereby lowering their SpamAssassin score
    by 6 points, but allegedly violating the trademark. ...

FYI.

Skip

From anthony at interlink.com.au Sun Apr 6 19:00:47 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Apr 6 04:02:39 2003
Subject: [Spambayes] IMAP
In-Reply-To: <16013.38452.431498.163149@montanaro.dyndns.org>
Message-ID: <200304060800.h3680mT18635@localhost.localdomain>

>>> Skip Montanaro wrote
> And why should they [offer IMAP]? From their perspective it only leads
> to greater consumption of their disk space. Northwestern is going
> through these IMAP machinations right now. Adding quotas, sending out
> warning messages (by email!), finally spooling over-quota IMAP storage
> off to tape. It's gonna be a mess.

Speaking as someone who's worked at both a large ISP and now at a telco
that provides webmail and voicemail access -- in an ISP environment,
IMAP is overkill. If nothing else, the IMAP daemons that are out there
all have much heavier footprints than POP daemons. Admittedly, the UWash
IMAP daemon is far far worse than the others, but they're all pretty bad.

From the perspective of offering web and voice access to a mailbox,
though, IMAP kicks POP's butt.
The sheer amount of horrific tedium you need to cope with when doing POP stuff in a sane way will melt your brain. Anthony From noreply at sourceforge.net Sun Apr 6 17:53:00 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Apr 6 19:36:54 2003 Subject: [Spambayes] [ spambayes-Feature Requests-716437 ] Version information in GUI somewhere Message-ID: Feature Requests item #716437, was opened at 2003-04-07 11:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702 Category: Outlook Group: None Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Version information in GUI somewhere Initial Comment: With the growing number of users, especially those using the binary, it would be good to have a version number printed somewhere in the GUI for people when they are reporting bugs. Greyed out text in the manager dialog, or even something in the about.html would work fine. 
I'll leave it to you to find somewhere appropriate :) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702 From noreply at sourceforge.net Sun Apr 6 17:54:58 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Apr 6 19:39:33 2003 Subject: [Spambayes] [ spambayes-Feature Requests-716437 ] Version information in GUI somewhere Message-ID: Feature Requests item #716437, was opened at 2003-04-07 09:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702 Category: Outlook Group: None Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Version information in GUI somewhere Initial Comment: With the growing number of users, especially those using the binary, it would be good to have a version number printed somewhere in the GUI for people when they are reporting bugs. Greyed out text in the manager dialog, or even something in the about.html would work fine. I'll leave it to you to find somewhere appropriate :) ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-04-07 09:54 Message: Logged In: YES user_id=14198 I'm not sure *what* version to report though. 
I will find the "where" if you tell me that "what" ;) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702 From noreply at sourceforge.net Sun Apr 6 18:01:19 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Apr 6 19:46:04 2003 Subject: [Spambayes] [ spambayes-Feature Requests-716437 ] Version information in GUI somewhere Message-ID: Feature Requests item #716437, was opened at 2003-04-07 11:52 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702 Category: Outlook Group: None Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Version information in GUI somewhere Initial Comment: With the growing number of users, especially those using the binary, it would be good to have a version number printed somewhere in the GUI for people when they are reporting bugs. Greyed out text in the manager dialog, or even something in the about.html would work fine. I'll leave it to you to find somewhere appropriate :) ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-07 12:01 Message: Logged In: YES user_id=552329 :) Well, for the binaries, your 001/002 system would work. For the full source releases, there's a __version__ attribute (1.0a2 at the moment, I think). I don't really know for CVS (maybe just 'cvs'?), but anyone using the cvs code should be able to describe when they retrieved it. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-07 11:54 Message: Logged In: YES user_id=14198 I'm not sure *what* version to report though. 
I will find the "where" if you tell me that "what" ;)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702

From T.A.Meyer at massey.ac.nz Mon Apr 7 12:56:38 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Apr 6 19:57:15 2003
Subject: [Spambayes] IMAP
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230F5C@its-xchg4.massey.ac.nz>

Quick question for the IMAP'ers:

How do you want to train? Three easy answers (but feel free to add your
own):

(a) By moving a message into a 'train as ham' or 'train as spam' folder.
Once trained, the spam messages could be moved into a 'spam' folder -
the ham ones could be moved into a 'ham' folder, but I'm not sure that I
could move them back into their _original_ folder.

(b) Via the web ui, like pop3proxy.

(c) By forwarding mail to a (locally caught) spam/ham address, like
smtpproxy (which would in fact work unaltered).

I have a basic IMAP filter app written, I just need to do a bit more
testing, work in a training scheme and tidy things up a bit. I'll then
add it to CVS and you can start testing and feature-requesting.

=Tony Meyer

From tim at fourstonesExpressions.com Sun Apr 6 20:04:15 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Sun Apr 6 20:06:19 2003
Subject: [Spambayes] IMAP
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230F5C@its-xchg4.massey.ac.nz>
Message-ID: <62IGW72IGIHI7232KI94VNJFE0865.3e90c07f@myst>

4/6/2003 6:56:38 PM, "Meyer, Tony" wrote:

>Quick question for the IMAP'ers:
>
>How do you want to train? Three easy answers (but feel free to add your
>own):
>
>(a) By moving a message into a 'train as ham' or 'train as spam' folder.
>Once trained, the spam messages could be moved into a 'spam' folder -
>the ham ones could be moved into a 'ham' folder, but I'm not sure that I
>could move them back into their _original_ folder.
For point of reference, this is how the Notes 'integration' works...

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From T.A.Meyer at massey.ac.nz Mon Apr 7 13:16:09 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Apr 6 20:16:43 2003
Subject: [Spambayes] FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230F8A@its-xchg4.massey.ac.nz>

I've created a new file - FAQ.txt (it's in the main directory, since all
the other docs are - what do people think about moving these into a doc
directory?).

There are only three Q/A's at the moment; the three that were recently
suggested on the list. I'll trawl through the archives again when I get
a chance and try to find some more obvious ones, but suggestions are
welcome (for those that can commit, just add 'em to the file), even if
they are Q only and need the A added.

Once there are a few more we can make the file look nicer and add a
version to the website (cross one off the beta checklist!).

=Tony Meyer

From davida at ActiveState.com Sun Apr 6 18:38:38 2003
From: davida at ActiveState.com (David Ascher)
Date: Sun Apr 6 20:36:32 2003
Subject: [Spambayes] IMAP
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230F5C@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1301230F5C@its-xchg4.massey.ac.nz>
Message-ID: <3E90C88E.5040808@activestate.com>

Meyer, Tony wrote:

>Quick question for the IMAP'ers:
>
>How do you want to train? Three easy answers (but feel free to add your
>own):
>
>(a) By moving a message into a 'train as ham' or 'train as spam' folder.
>Once trained, the spam messages could be moved into a 'spam' folder -
>the ham ones could be moved into a 'ham' folder, but I'm not sure that I
>could move them back into their _original_ folder.
>
>(b) Via the web ui, like pop3proxy.
> >(c) By forwarding mail to a (locally caught) spam/ham address, like >smtpproxy (which would in fact work unaltered). > > (c) isn't useful to me without doing a lot of .procmail work (I can't create new mail accounts willy nilly). (b) is a fallback, but too much an out-of-process (a) is probably best, as long as I can label several folders as being "train as ham" -- I don't want to destroy my existing folder hierarchy. (d) Ideally: by a "delete with prejudice" button in my email client (mozilla mail). --david From T.A.Meyer at massey.ac.nz Mon Apr 7 13:40:36 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 6 20:42:18 2003 Subject: [Spambayes] IMAP Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301230FBF@its-xchg4.massey.ac.nz> [Training] > (c) isn't useful to me without doing a lot of .procmail work (I can't > create new mail accounts willy nilly). Actually, the way the smtpproxy works is that it sits between your mail client and your SMTP server and catches any mail to two particular (spam/ham) addresses. These addresses don't need to exist (and in fact, mail would never reach them). It's not ideal, though. > (a) is probably best, as long as I can label several folders as being > "train as ham" -- I don't want to destroy my existing folder > hierarchy. If this ends up being the chosen option (I imagine that it will be), I'll make sure I add this in. > (d) Ideally: by a "delete with prejudice" button in my email client > (mozilla mail). :) To add buttons to a mail client, we have to write a plugin of sorts for it, in which case there really isn't any need for a filter script or proxy. This is a lot of work, though (just look at the Outlook plugin), and not something I'm planning on doing... 
=Tony Meyer From frank.horowitz at csiro.au Mon Apr 7 03:23:58 2003 From: frank.horowitz at csiro.au (Frank Horowitz) Date: Sun Apr 6 22:23:59 2003 Subject: [Spambayes] Re: IMAP In-Reply-To: References: Message-ID: <1049682229.20502.14.camel@bonzo.ned.dem.csiro.au> On Fri, 2003-04-04 at 22:49, spambayes-request@python.org wrote: > Date: Fri, 04 Apr 2003 15:47:57 +0100 > From: Oliver Maunder > Subject: Re: [Spambayes] IMAP > To: spambayes@python.org > Message-ID: <3E8D9B1D.6020400@olivermaunder.co.uk> > Content-Type: text/plain; charset=us-ascii; format=flowed > > Tim Stone - Four Stones Expressions wrote: > > >> want an IMAP solution! > >> > >> > > > >When I joined this mailing list and started talking about a Lotus Notes > >integration. This project was the first open source project I'd ever > >participated in, and Tim Peters helped me tremendously early on by letting me > >know that in open source, you get to scratch your own itch . So... > >don't look for an IMAP proxy from me anytime soon... > > > > > > > Point taken, and entirely agreed with :-) > > As I said yesterday, I've been mulling over doing an IMAP app myself, > but haven't got round to starting anything yet. If anyone else (Tony?) > gets there quicker, then I will certainly do my share of testing and fixing. > > Olly > Someone on this list suggested earlier having a look at imapspambegone, an IMAP front end for SpamAssassin. I can vouch that imapspambegone ( or isbg for short: http://groups.yahoo.com/group/imapspambegone ) looks like a decent place to start hacking an IMAP front end for spambayes. In private email with isbg's author (Roger Binns), he I) expressed a willingness to consider patches to isbg's code that would integrate spambayes, and II) suggested that if there were a command line version of spambayes (hammiefilter?) 
that produced return codes that could distinguish between spam, unsure, and ham, isbg already could deal with spambayes via isbg's --sasave and --satest command-line arguments. If I weren't in the throes of moving house I'd have a quick hack at this myself. Otherwise, any takers? Cheers, Frank Horowitz From tim at fourstonesExpressions.com Sun Apr 6 22:38:39 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sun Apr 6 22:38:51 2003 Subject: [Spambayes] Proxies everywhere In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301230FBF@its-xchg4.massey.ac.nz> Message-ID: <54KVTRQNIKYTTQVTB43NHGFED413W.3e90e4af@myst> Would seem to me to be useful to have starting the IMAP proxy simply be another parameter on the pop3proxy, just like the smtpproxy is. The only problem with this is that running "pop3proxy" to get an imap proxy doesn't make much sense. SO.... I propose renaming the pop3proxy to something like "spambayes", and make it more of a server daemon. In fact, it could conceivable sport a hammiesrv interface, and become much more general purpose. It could have -p, -s, -i operands for the various proxies, etc. etc. What says you all? It would be fairly simple to implement. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From T.A.Meyer at massey.ac.nz Mon Apr 7 17:06:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 00:10:40 2003 Subject: [Spambayes] Proxies everywhere Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13012310E6@its-xchg4.massey.ac.nz> > Would seem to me to be useful to have starting the IMAP proxy > simply be another parameter on the pop3proxy, > just like the smtpproxy is. The only problem with this is that > running "pop3proxy" to get an imap proxy doesn't make much sense. Thoughts: 1. The IMAP solution at the moment (ok, so I haven't even committed yet, but...) 
isn't a proxy, but more like the notes filter is I think (i.e. you execute the script whenever you want to filter, or possibly you leave it running and it periodically filters for you). The IMAP users on the list seemed to prefer a non-proxy solution, so that's what I've done so far (it's certainly easier). 2. I think that smtpproxy could run by default. It doesn't take anything much in the way of additional resources, and unless the user has configured it it doesn't actually do anything (i.e. without a server to proxy, it does nothing). I think that pop3proxy is the same actually, except that it does always provide the web ui. 3. I agree that running pop3proxy to get both proxies is less than ideal. > SO.... I propose renaming the pop3proxy to something like > "spambayes", and make it more of a server daemon. In fact, it could > conceivably sport a hammiesrv interface, and become much more > general purpose. It could have -p, -s, -I operands for the various > proxies, etc. etc. What says you all? It would be fairly simple > to implement. Rather than renaming, I would suggest creating a new script (called spambayes) that does this. It's probably also time that we broke the web ui out of pop3proxy.py (like the __doc__ says will one day happen). This could be something else that the 'spambayes' script takes care of. I like this idea in general - it would be nice to have a unified front end for everything (well, probably everything but the Outlook plugin). It would make some of the documentation easier, and it might make it easier to package a binary (the suggested (windows) pop3proxy binary, for example. Those 'power-user' type people could still run whichever parts they want separately. 
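The unified front end discussed above could be sketched roughly as follows. This is only an illustration, not code from the project: the -p, -s and -i switches come from Tim's proposal, while the function names (build_parser, selected_services) and the use of the modern argparse module are assumptions made for the sketch.

```python
import argparse


def build_parser():
    """Build the option parser for a hypothetical unified 'spambayes' launcher."""
    parser = argparse.ArgumentParser(prog="spambayes")
    # One switch per component, following the -p / -s / -i proposal in the thread.
    parser.add_argument("-p", dest="pop3", action="store_true",
                        help="start the POP3 proxy")
    parser.add_argument("-s", dest="smtp", action="store_true",
                        help="start the SMTP proxy")
    parser.add_argument("-i", dest="imap", action="store_true",
                        help="run the IMAP filter")
    return parser


def selected_services(argv):
    """Return the names of the components requested on the command line."""
    args = build_parser().parse_args(argv)
    return [name for name in ("pop3", "smtp", "imap") if getattr(args, name)]
```

A launcher built this way would start only the components named in the returned list, so people who want a single piece could still run it on its own.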
=Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 7 17:10:42 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 00:18:02 2003 Subject: [Spambayes] Re: IMAP Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13012310ED@its-xchg4.massey.ac.nz> > Someone on this list suggested earlier having a look at > imapspambegone, an IMAP front end for SpamAssassin. I can > vouch that imapspambegone ( or isbg for short: > http://groups.yahoo.com/group/imapspambegone ) > looks like a > decent place to start hacking an IMAP front end for spambayes. When this was suggested I took a look at this and did base v0.0 of the filter I've put together on isbg. There's just about nothing from it there now (although there is an acknowledgement). I'll commit the imap filter this afternoon, I think, as unfinished as it is, so that people can start commenting on it. > II) suggested that if > there were a command line version of spambayes > (hammiefilter?) that produced return codes that could > distinguish between spam, unsure, and ham, isbg already could > deal with spambayes via isbg's --sasave and --satest > command-line arguments. This certainly can be done. And for a 30 second solution, you could use this, plus the smtpproxy to train, and that would be that. I think that ideally, however, we'll need something ourselves so that we can work the training (and whatever else) in. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 7 17:15:42 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 00:25:00 2003 Subject: [Spambayes] IMAP Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13012310F9@its-xchg4.massey.ac.nz> [Training] > I'd prefer a small variation on (a): > > Training/untraining automatically when moving between > folders, (and on delivery) but with the possibility of having > more than one folder for ham. 
(Using the Trash folder for > spam seems appropriate) Ah, you people should all just use Outlook and the Outlook plugin ;) Seriously, I'm not sure how easy this will be. I'll have a go at watching folders and see how it goes; the trouble is that there isn't any 'new mail' event that I can hook, so if the user beats the filter to it, then the filter won't get a chance. Anyway, all votes are for (a) (or modified versions) so far, so I'll throw something along those lines together. No-one likes (b), so it can go, and (c) already exists, so people can use that as well/instead if they like. Look for an initial commit and corresponding message to the list in a couple of hours. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 7 20:28:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 03:28:51 2003 Subject: [Spambayes] IMAP Filter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301231172@its-xchg4.massey.ac.nz> Ok, I've committed a v1 IMAP filter. If you execute the script it will connect to the specified imap server, classify mail in the inbox and move it to specified unsure or spam folders. It will also train any mail in a train_as_ham folder as ham, and similarly a train_spam folder. This is not at all ready for general use. 
In your config file, specify the following options: * imap_server: server you want to connect to (imap.example.com) * imap_port: port you want to connect to (defaults to 143) * imap_username: your username for that imap box * imap_password: your password for that imap box * imap_inbox: the folder to filter (defaults to inbox) * imap_unsure_folder: folder where unsure mail should be put * imap_spam_folder: folder where spam should be put * imap_ham_train_folders: list of folders that contain mail to be trained as ham * imap_spam_train_folders: as above, but for spam * imap_expunge: True iff you want an expunge (purge) done on exit (defaults to False) The database is the one specified in pop3proxy_persistent_storage_file, and this will be a pickle/db depending on what pop3proxy_persistent_use_database is set to. The script will also add the spambayes headers to mail (so, in fact, you can just filter on the headers if you would rather). At the moment, the script has no memory. I'll add this tomorrow, along the lines of the pickles stored by the notes filter and outlook plugin (or maybe the big general picture solution will get finished and I'll just use that ;)). It doesn't yet, but it will use the same web ui as pop3proxy to let the user do all the configuration stuff (another reason to abstract this out!). Feel free to take a look at it and make any improvements you want to. I'll keep working on it tomorrow, and then next week (I'm busy the rest of this week). =Tony Meyer From anthony at interlink.com.au Mon Apr 7 19:39:17 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Apr 7 04:42:37 2003 Subject: [Spambayes] IMAP Filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301231172@its-xchg4.massey.ac.nz> Message-ID: <200304070839.h378dIg29407@localhost.localdomain> >>> "Meyer, Tony" wrote > Ok, I've committed a v1 IMAP filter. 
If you execute the script it will > connect to the specified imap server, classify mail in the inbox and > move it to specified unsure or spam folders. It will also train any > mail in a train_as_ham folder as ham, and similarly a train_spam folder. Would another option be to use IMAP flags? You could use IMAP's STORE +FLAGS Spam, STORE +FLAGS Ham, STORE +FLAGS Unsure. Or alternately simply use a flag on the messages in the inbox that are already scanned. The former would allow you to do filtering on more than the inbox... -- Anthony Baxter It's never too late to have a happy childhood. From anthony at interlink.com.au Mon Apr 7 19:43:35 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Apr 7 04:45:30 2003 Subject: [Spambayes] IMAP Filter Message-ID: <200304070843.h378hZr29444@localhost.localdomain> >>> Anthony Baxter wrote > Would another option be to use IMAP flags? You could use IMAP's STORE > +FLAGS Spam, STORE +FLAGS Ham, STORE +FLAGS Unsure. Or alternately > simply use a flag on the messages in the inbox that are already scanned. > > The former would allow you to do filtering on more than the inbox... Just looking at imapfilter.py as it is - appending the trained messages to the mailbox will mess up the original delivery date on most mailboxes, and may also make a seen but unrated message unseen again. The code also doesn't seem like it's checking for UIDVALIDITY responses. The comment: # we can't actually update the message with IMAP # XXX (someone tell me if this is wrong!) is correct. This is why you might want to consider using flags instead of re-writing the message. 
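As a rough illustration of the flag-based approach, the sketch below tags a single message with a keyword flag using the UID form of STORE. It assumes an object that behaves like Python's imaplib.IMAP4 with a mailbox already selected read-write; the helper name tag_message is hypothetical and nothing here is taken from imapfilter.py.

```python
def tag_message(conn, uid, keyword):
    """Add a keyword flag such as 'Spam' to one message, addressed by UID.

    conn is assumed to behave like an imaplib.IMAP4 object on which
    select() has already been called for the mailbox holding the message.
    """
    # UID STORE <uid> +FLAGS (<keyword>) adds the flag without touching others.
    typ, _data = conn.uid("STORE", str(uid), "+FLAGS", "(%s)" % keyword)
    # 'OK' only means the server accepted the command; whether the keyword
    # is kept beyond the current session is up to the server.
    return typ == "OK"
```

Because the message itself is never rewritten, its delivery date and seen status are left alone, which is the attraction of flags over re-appending.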
Anthony From noreply at sourceforge.net Mon Apr 7 05:28:03 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 7 07:12:58 2003 Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread Message-ID: Bugs item #716684, was opened at 2003-04-07 21:28 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Filtering marks message as unread Initial Comment: Reported too many times :) Exchange server users only. As spambayes starts, it starts processing missed messages. In the meantime, the user reads some messages, thereby marking them as read. As spambayes writes the spam field, these messages spring back to an unread status. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 From noreply at sourceforge.net Mon Apr 7 05:58:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 7 07:43:05 2003 Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread Message-ID: Bugs item #716684, was opened at 2003-04-07 11:28 Message generated for change (Comment added) made by worger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Filtering marks message as unread Initial Comment: Reported too many times :) Exchange server users only. As spambayes starts, it starts processing missed messages. In the meantime, the user reads some messages, thereby marking them as read. As spambayes writes the spam field, these messages spring back to an unread status.
---------------------------------------------------------------------- Comment By: Martin Worger (worger) Date: 2003-04-07 11:58 Message: Logged In: YES user_id=751487 A bit of investigation (I am using Outlook 2002 & Exchange 2000). A new message arriving in my Inbox does not get analysed by SpamBayes until: - You switch to another folder and back again. - You read the message (then analysed, but remains 'unread') - Another email arrives, when the first message is then rated (second one not rated though) - Another email in Inbox is saved after editing In other words, it seems some other action eventually triggers the analysis - not the arrival event itself. An email that is read before it has been analysed by SpamBayes will always be 'unread' afterwards. This is independent of type (plain, rich text or HTML) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 From gregscott at gbsage.com Mon Apr 7 10:00:59 2003 From: gregscott at gbsage.com (Greg Scott) Date: Mon Apr 7 09:33:24 2003 Subject: [Spambayes] Lost database Message-ID: <000201c2fd05$c1a9a6e0$1700a8c0@gbsage.net> I had performed the initial training for the utility, and had around 1100 spam examples and 400+ ham examples. Outlook hung, and I had to abnormally terminate it and reboot in order for the computer to function properly. When I opened Outlook again, SpamBayes indicated there were no spam and no ham instances in its database. Unfortunately, I had already deleted my spam examples, and now don't have them for re-training. Is there a way to import that information from a backup file or anything? I am running XP, Outlook XP, and the binary version of the Outlook plugin. My Python version is ActiveState 2.2.4. 
Thanks in advance, Greg From tim at fourstonesExpressions.com Mon Apr 7 09:47:19 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 7 09:54:23 2003 Subject: [Spambayes] Lost database In-Reply-To: <000201c2fd05$c1a9a6e0$1700a8c0@gbsage.net> Message-ID: Given that you know approximately how many of each you had, it's relatively simple to correct. Run dbExpImp.py with no operands to see how to export and import the classifier database. Then export it. Bring the export file up in an editor. The first line will have two numbers in it, both zero. Those numbers are the number of spam and the number of ham, respectively, that are in the database. Change them to 1100 and 400 (or whatever is appropriate), save the file, and then import it. Just for good measure, you might import to a different database than the original. This should correct the problem. 4/7/2003 8:00:59 AM, "Greg Scott" wrote: >I had performed the initial training for the utility, and had around >1100 spam examples and 400+ ham examples. Outlook hung, and I had to >abnormally terminate it and reboot in order for the computer to function >properly. When I opened Outlook again, SpamBayes indicated there were no >spam and no ham instances in its database. Unfortunately, I had already >deleted my spam examples, and now don't have them for re-training. > >Is there a way to import that information from a backup file or >anything? > >I am running XP, Outlook XP, and the binary version of the Outlook >plugin. My Python version is ActiveState 2.2.4. > >Thanks in advance, > >Greg >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
From skip at pobox.com Mon Apr 7 10:48:21 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Apr 7 10:48:44 2003 Subject: [Spambayes] Proxies everywhere In-Reply-To: <54KVTRQNIKYTTQVTB43NHGFED413W.3e90e4af@myst> References: <1ED4ECF91CDED24C8D012BCF2B034F1301230FBF@its-xchg4.massey.ac.nz> <54KVTRQNIKYTTQVTB43NHGFED413W.3e90e4af@myst> Message-ID: <16017.36789.530181.702781@montanaro.dyndns.org> Tim> Would seem to me to be useful to have starting the IMAP proxy Tim> simply be another parameter on the pop3proxy, just like the Tim> smtpproxy is. The only problem with this is that running Tim> "pop3proxy" to get an imap proxy doesn't make much sense. Oddly enough, the imap package on many Linux systems is what also provides the ipop3d daemon: % rpm -qi imap-2000c ... The imap package provides server daemons for both the IMAP (Internet Message Access Protocol) and POP (Post Office Protocol) mail access protocols. I learned this the hard way just a few days ago. :-( Tim> SO.... I propose renaming the pop3proxy to something like Tim> "spambayes", and make it more of a server daemon. In fact, it Tim> could conceivable sport a hammiesrv interface, and become much more Tim> general purpose. It could have -p, -s, -i operands for the various Tim> proxies, etc. etc. What says you all? It would be fairly simple Tim> to implement. Why not sbproxy or spamproxy? I like having "proxy" in the name as a clue to the tool's general function. Skip From skip at pobox.com Mon Apr 7 10:59:29 2003 From: skip at pobox.com (Skip Montanaro) Date: Mon Apr 7 10:59:44 2003 Subject: [Spambayes] IMAP Filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301231172@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1301231172@its-xchg4.massey.ac.nz> Message-ID: <16017.37457.432195.772475@montanaro.dyndns.org> Tony> * imap_expunge: True iff you want an expunge (purge) done on exit Tony> (defaults to False) What does "purge" or "expunge" mean in this case? 
Are messages in folders deleted as a result when this is set to True? If so, this is probably not a good idea. Someone will eventually set it to True then complain that spambayes deletes valid messages. Skip From anthony at interlink.com.au Tue Apr 8 01:59:50 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Apr 7 11:01:21 2003 Subject: [Spambayes] Proxies everywhere In-Reply-To: <16017.36789.530181.702781@montanaro.dyndns.org> Message-ID: <200304071459.h37ExpU32677@localhost.localdomain> If it's going down the path to become a framework with POP/IMAP/SMTP proxies involved, is it worth looking at an existing framework for the guts of the code? Twisted seems reasonably well tested and robust, but is a rather large dependency to require... are there others? Anthony -- Anthony Baxter It's never too late to have a happy childhood. From lists at olivermaunder.co.uk Mon Apr 7 17:44:58 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Mon Apr 7 11:45:41 2003 Subject: [Spambayes] IMAP Filter In-Reply-To: <16017.37457.432195.772475@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F1301231172@its-xchg4.massey.ac.nz> <16017.37457.432195.772475@montanaro.dyndns.org> Message-ID: <3E919CFA.6030106@olivermaunder.co.uk> Skip Montanaro wrote: >What does "purge" or "expunge" mean in this case? > As far as I can see, IMAP doesn't have a facility to "move" messages between folders. You have to recreate the message in a new folder, then mark the original message as deleted. The original doesn't actually get deleted until the folder is purged. Whether you still see the message in the folder depends on your mail client. >Someone will eventually set it to True then complain that >spambayes deletes valid messages. > > Spambayes wouldn't be deleting stuff in that sense - there will still be a copy in the "spam" folder. But if the original message isn't purged, then there will be 2 copies of the spam taking up space on the system. 
I learned all this the hard way a couple of weeks ago when my account was over quota, even after I thought I'd deleted several weeks worth of mail. It was still there - just hiding :-) Olly >Skip > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > From lists at morpheus.demon.co.uk Sun Apr 6 14:19:36 2003 From: lists at morpheus.demon.co.uk (Paul Moore) Date: Mon Apr 7 14:59:47 2003 Subject: [Spambayes] Pop3 proxy UI - doesn't display correct port Message-ID: The POP3 proxy's UI webpage doesn't display the port/server being proxied. Instead it displays "POP3 proxy running on 1110, proxying to example.com" in all cases. Paul. -- This signature intentionally left blank From T.A.Meyer at massey.ac.nz Tue Apr 8 10:38:02 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 17:38:39 2003 Subject: [Spambayes] IMAP Filter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301231225@its-xchg4.massey.ac.nz> > Would another option be to use IMAP flags? You could use IMAP's STORE > +FLAGS Spam, STORE +FLAGS Ham, STORE +FLAGS Unsure. Or alternately > simply use a flag on the messages in the inbox that are > already scanned. This would be much better, yes. But, annoyingly, creation of new flags is only supported by some IMAP servers - it doesn't have to be (supported) either. From RFC1730: "PERMANENTFLAGS Followed by a parenthesized list of flags, indicates which of the known flags that the client may change permanently. Any flags that are in the FLAGS untagged response, but not the PERMANENTFLAGS list, can not be set permanently. If the client attempts to STORE a flag that is not in the PERMANENTFLAGS list, the server will either reject it with a NO reply or store the state for the remainder of the current session only. 
The PERMANENTFLAGS list may also include the special flag \*, which indicates that it is possible to create new keywords by attempting to store those flags in the mailbox." > The former would allow you to do filtering on more than the inbox... I do plan to add filtering for other folders. Specifying a list (like the training folders), probably. I just hadn't got to it :) =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Apr 8 10:45:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 17:45:38 2003 Subject: [Spambayes] IMAP Filter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130123123E@its-xchg4.massey.ac.nz> > Just looking at imapfilter.py as it is - appending the > trained messages to the mailbox will mess up the original > delivery date on most mailboxes, Well, according to the RFC, if I append and specify a date, then it should use that date. It will be stuffed up at the moment, but I hope that if I extract the original date from the message (which shouldn't be difficult) and append with that date, things should be ok. A nice theory, anyway :) > and may also make a seen but > unrated message unseen again. Heh. Now we can be just like Outlook ;) Seriously, I suspect Outlook will have to take the same path here and note the unread status of the message before doing anything to it, and setting it to that status once it's done. (I'll add this to the IMAP filter soonish). > The code also doesn't seem like > it's checking for UIDVALIDITY responses. Not at the moment. All the messages are tracked by UID, but folders aren't at the moment. It would be much nicer to have UIDs for these (from the UIDVALIDITY response), except that there's still the problem of identifying them the first time. (Unless I create some sort of GUI like the pop3proxy web ui). > The comment: > # we can't actually update the message with IMAP > # XXX (someone tell me if this is wrong!) > > is correct. 
This is why you might want to consider using > flags instead of re-writing the message. See previous message re: flags. One advantage of re-writing is that we get to add the spambayes headers. It would be nicer to be able to move/update messages, but never mind. Thanks for the comments, much appreciated. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Apr 8 10:48:49 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 17:49:29 2003 Subject: [Spambayes] IMAP Filter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301231247@its-xchg4.massey.ac.nz> > Tony> * imap_expunge: True iff you want an expunge > Tony> (purge) done on exit (defaults to False) > > What does "purge" or "expunge" mean in this case? All messages marked with the \Deleted flag are permanently removed from the server. > Are messages in folders deleted as a result when this is set to > True? If so, this is probably not a good idea. Someone will > eventually set it to True then complain that spambayes > deletes valid messages. To clarify, as things are, spambayes doesn't mark any incoming messages as \Deleted, spam or not. In order to move/change a message, a new one is created (a copy, more or less), and the old one is marked as \Deleted. So an expunge gets rid of these duplicates. However, it also gets rid of any messages set as \Deleted by the user. For reference, Outlook Express does much the same thing, from what I've observed. If you move a message from one IMAP folder to another, a copy of the message is created in the destination folder, and the source folder holds the original message, but now with the \Deleted flag set.
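The copy, mark, expunge sequence described above might look something like this with an imaplib-style connection. It is a simplified sketch under stated assumptions: it uses a plain UID COPY rather than re-creating the message with added headers, the helper name move_message is hypothetical, and conn is assumed to behave like imaplib.IMAP4 with the source folder already selected.

```python
def move_message(conn, uid, dest_folder, expunge=False):
    """Emulate an IMAP 'move': copy the message, then flag the original.

    conn is assumed to behave like an imaplib.IMAP4 object with the source
    folder selected; dest_folder is passed to the server unchanged.
    """
    uid = str(uid)
    # Step 1: create the copy in the destination folder.
    typ, _data = conn.uid("COPY", uid, dest_folder)
    if typ != "OK":
        return False  # leave the original untouched if the copy failed
    # Step 2: mark the original as deleted; it stays in the folder for now.
    typ, _data = conn.uid("STORE", uid, "+FLAGS", r"(\Deleted)")
    if typ != "OK":
        return False
    # Step 3 (optional): expunge removes EVERY \Deleted message in the
    # selected folder, including ones the user flagged, not just this one.
    if expunge:
        conn.expunge()
    return True
```

Leaving expunge off by default mirrors the imap_expunge default of False: the duplicate lingers, but nothing is removed behind the user's back.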
Not that I'm holding OE up as an example of how to do things right, of course ;) =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Apr 8 10:53:37 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 17:54:20 2003 Subject: [Spambayes] Pop3 proxy UI - doesn't display correct port Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301231258@its-xchg4.massey.ac.nz> > The POP3 proxy's UI webpage doesn't display the port/server > being proxied. Instead it displays > > "POP3 proxy running on 1110, proxying to example.com" > > in all cases. Is it running correctly (apart from this)? Mine (latest cvs) says "POP3 proxy running on 110, proxying to pop.ihug.co.nz:110", as it should. If I ignore my config file it says 'No proxies running', as it should. What version are you using? =Tony Meyer From db3l at fitlinxx.com Mon Apr 7 23:25:49 2003 From: db3l at fitlinxx.com (David Bolen) Date: Mon Apr 7 18:25:49 2003 Subject: [Spambayes] Re: SpamBayes Outlook Plug-In References: <16E1010E4581B049ABC51D4975CEDB880113D9F1@UKDCX001.uk.int.atosorigin.com> Message-ID: "Moore, Paul" writes: > As far as I know, no-one has been able to track down the problem to > fix it yet. I have, however, implemented a local workaround that's working really well for me. This was based on my prior messages in: http://mail.python.org/pipermail/spambayes/2003-March/004086.html http://mail.python.org/pipermail/spambayes/2003-March/004088.html Since I can't quantify what if any penalty it imposes in the general case by syncing changes back to the server an additional time, and since the problem may be limited to Exchange servers, I haven't proposed it be made to the main source yet - although I certainly haven't noticed much of a penalty in my local testing. 
But if anyone else wants to try a local change, it's fairly trivial, adding an additional call to msg.Save() in filter.py:

*** filter.py   18 Mar 2003 03:09:03 -0000      1.20
--- filter.py   7 Apr 2003 22:18:58 -0000
***************
*** 27,39 ****
          try:
              # Save the score
              msg.SetField(mgr.config.field_score_name, prob)
              # and the ID of the folder we were in when scored.
              # (but only if we want to perform all actions)
              # Note we must do this, and the Save, before the
              # filter, else the save will fail.
              if all_actions:
                  msg.RememberMessageCurrentFolder()
!             msg.Save()

          if all_actions and attr_prefix is not None:
              folder_id = getattr(config, attr_prefix + "_folder_id")
--- 26,39 ----
          try:
              # Save the score
              msg.SetField(mgr.config.field_score_name, prob)
+             msg.Save()
              # and the ID of the folder we were in when scored.
              # (but only if we want to perform all actions)
              # Note we must do this, and the Save, before the
              # filter, else the save will fail.
              if all_actions:
                  msg.RememberMessageCurrentFolder()
!                 msg.Save()

          if all_actions and attr_prefix is not None:
              folder_id = getattr(config, attr_prefix + "_folder_id")

After making this change, what went from virtually _every_ message staying unread, became the extreme rare case, such that I'm no longer certain any remaining case may even be spambayes related.
-- -- David -- /-----------------------------------------------------------------------\ \ David Bolen \ E-mail: db3l@fitlinxx.com / | FitLinxx, Inc.
\ Phone: (203) 708-5192 | / 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \ \-----------------------------------------------------------------------/ From Alexander at Leidinger.net Mon Apr 7 18:54:13 2003 From: Alexander at Leidinger.net (Alexander Leidinger) Date: Mon Apr 7 21:15:22 2003 Subject: [Spambayes] IMAP Filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301231172@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1301231172@its-xchg4.massey.ac.nz> Message-ID: <20030407175413.24ecaa5e.Alexander@Leidinger.net> On Mon, 7 Apr 2003 19:28:13 +1200 "Meyer, Tony" wrote: > * imap_inbox: the folder to filter (defaults to inbox) Is there a way to specify more than one folder? Bye, Alexander. -- Speak softly and carry a cellular phone. http://www.Leidinger.net Alexander @ Leidinger.net GPG fingerprint = C518 BC70 E67F 143F BE91 3365 79E2 9C60 B006 3FE7 From T.A.Meyer at massey.ac.nz Tue Apr 8 14:18:11 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 21:18:54 2003 Subject: [Spambayes] IMAP Filter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130123139D@its-xchg4.massey.ac.nz> > > * imap_inbox: the folder to filter (defaults to inbox) > Is there a way to specify more than one folder? Not with the code submitted, but RSN, you can specify a comma delimited list of folders. (Insert pause for someone to tell me that a comma is a valid character in an imap messagebox name). =Tony Meyer From tim at fourstonesExpressions.com Mon Apr 7 21:53:56 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 7 21:54:08 2003 Subject: [Spambayes] IMAP Filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130123139D@its-xchg4.massey.ac.nz> Message-ID: 4/7/2003 8:18:11 PM, "Meyer, Tony" wrote: >> > * imap_inbox: the folder to filter (defaults to inbox) >> Is there a way to specify more than one folder? > >Not with the code submitted, but RSN, you can specify a comma delimited >list of folders. 
>(Insert pause for someone to tell me that a comma is a valid character >in an imap messagebox name). Be our luck that ALL characters are allowed... nulls included... LOL c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From noreply at sourceforge.net Mon Apr 7 20:21:54 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 7 22:06:13 2003 Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread Message-ID: Bugs item #716684, was opened at 2003-04-07 21:28 Message generated for change (Comment added) made by mhammond You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Filtering marks message as unread Initial Comment: Reported too many times :) Exchange server users only. As spambayes starts, it starts processing missed messages. In the meantime, the user reads some messages, thereby marking them as read. As spambayes writes the spam field, these messages spring back to an unread status. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-04-08 12:21 Message: Logged In: YES user_id=14198 >From David Bolen: "Moore, Paul" writes: > As far as I know, no-one has been able to track down the > problem to fix it yet. I have, however, implemented a local workaround that's working really well for me.
This was based on my prior messages in: http://mail.python.org/pipermail/spambayes/2003-March/004086.html http://mail.python.org/pipermail/spambayes/2003-March/004088.html Since I can't quantify what if any penalty it imposes in the general case by syncing changes back to the server an additional time, and since the problem may be limited to Exchange servers, I haven't proposed it be made to the main source yet - although I certainly haven't noticed much of a penalty in my local testing. But if anyone else wants to try a local change, it's fairly trivial, adding an additional call to msg.Save() in filter.py: *** filter.py 18 Mar 2003 03:09:03 -0000 1.20 --- filter.py 7 Apr 2003 22:18:58 -0000 *************** *** 27,39 **** try: # Save the score msg.SetField(mgr.config.field_score_name, prob) # and the ID of the folder we were in when scored. # (but only if we want to perform all actions) # Note we must do this, and the Save, before the # filter, else the save will fail. if all_actions: msg.RememberMessageCurrentFolder() ! msg.Save() if all_actions and attr_prefix is not None: folder_id = getattr(config, attr_prefix + "_folder_id") --- 26,39 ---- try: # Save the score msg.SetField(mgr.config.field_score_name, prob) + msg.Save() # and the ID of the folder we were in when scored. # (but only if we want to perform all actions) # Note we must do this, and the Save, before the # filter, else the save will fail. if all_actions: msg.RememberMessageCurrentFolder() ! msg.Save() if all_actions and attr_prefix is not None: folder_id = getattr(config, attr_prefix + "_folder_id") After making this change, what went from virtually _every_ message staying unread, became the extreme rare case, such that I'm no longer certain any remaining case may even be spambayes related. 
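
The workaround in the patch above comes down to call ordering: write the score field, save at once, and only then remember the folder and save again. A minimal Python sketch of that ordering follows. FakeMessage, score_and_save and the "Spam" field name are hypothetical stand-ins invented for illustration; this is not the plugin's msgstore code and it does not talk to MAPI.

```python
class FakeMessage:
    """Hypothetical stand-in for the plugin's message object; it only records calls."""

    def __init__(self):
        self.calls = []

    def SetField(self, name, value):
        self.calls.append(("SetField", name, value))

    def RememberMessageCurrentFolder(self):
        self.calls.append(("RememberMessageCurrentFolder",))

    def Save(self):
        self.calls.append(("Save",))


def score_and_save(msg, field_name, prob, all_actions):
    # Write the score and save straight away, mirroring the added msg.Save().
    msg.SetField(field_name, prob)
    msg.Save()
    # Only when all actions are wanted: remember the folder, then save again.
    if all_actions:
        msg.RememberMessageCurrentFolder()
        msg.Save()
    # Return just the method names, in the order they were called.
    return [call[0] for call in msg.calls]


print(score_and_save(FakeMessage(), "Spam", 0.98, True))
```

Whether the early save is what actually avoids the unread flip on Exchange is the open question in this thread; the sketch only makes the order of calls explicit.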
---------------------------------------------------------------------- Comment By: Martin Worger (worger) Date: 2003-04-07 21:58 Message: Logged In: YES user_id=751487 A bit of investigation (I am using Outlook 2002 & Exchange 2000). A new message arriving in my Inbox does not get analysed by SpamBayes until: - You switch to another folder and back again. - You read the message (then analysed, but remains 'unread') - Another email arrives, when the first message is then rated (second one not rated though) - Another email in Inbox is saved after editing In other words, it seems some other action eventually triggers the analysis - not the arrival event itself. An email that is read before it has been analysed by SpamBayes will always be 'unread' afterwards. This is independent of type (plain, rich text or HTML) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 From noreply at sourceforge.net Mon Apr 7 20:28:47 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 7 22:13:03 2003 Subject: [Spambayes] [ spambayes-Bugs-717253 ] Database should be saved after training Message-ID: Bugs item #717253, was opened at 2003-04-08 12:28 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717253&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Database should be saved after training Initial Comment: The database should be saved after training operations, otherwise an Outlook crash upsets things.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717253&group_id=61702 From mhammond at skippinet.com.au Tue Apr 8 13:10:26 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Apr 7 22:16:54 2003 Subject: [Spambayes] Re: SpamBayes Outlook Plug-In In-Reply-To: Message-ID: Would it be possible to try another change for me instead? Revert filter.py, and in msgstore.py's Save method, find the line: self.mapi_object.SaveChanges(mapi.KEEP_OPEN_READWRITE | USE_DEFERRED_ERRORS) And remove "USE_DEFERRED_ERRORS" from it. See if the behaviour changes. If not, then try finding the definition of USE_DEFERRED_ERRORS, and change it to 0, thereby preventing *any* code in this file passing that funky MAPI flag. I am kinda hoping this will cause me to get a MAPI_E_OBJECT_CHANGED error, which would allow a more elegant solution to the problem. I'm a little unsure why one of the Save calls is indented too. BTW, I've opened http://sourceforge.net/tracker/index.php?func=detail&aid=716684&group_id=61702&atid=498103 to track this. Thanks, Mark. > -----Original Message----- > From: spambayes-bounces@python.org > [mailto:spambayes-bounces@python.org]On Behalf Of David Bolen > Sent: Tuesday, 8 April 2003 8:26 AM > To: spambayes@python.org > Subject: [Spambayes] Re: SpamBayes Outlook Plug-In > > > "Moore, Paul" writes: > > > As far as I know, no-one has been able to track down the problem to > > fix it yet. > > I have, however, implemented a local workaround that's working really > well for me.
This was based on my prior messages in: > > http://mail.python.org/pipermail/spambayes/2003-March/004086.html > http://mail.python.org/pipermail/spambayes/2003-March/004088.html > > Since I can't quantify what if any penalty it imposes in the general > case by syncing changes back to the server an additional time, and > since the problem may be limited to Exchange servers, I haven't > proposed it be made to the main source yet - although I certainly > haven't noticed much of a penalty in my local testing. > > But if anyone else wants to try a local change, it's fairly trivial, > adding an additional call to msg.Save() in filter.py: > > *** filter.py 18 Mar 2003 03:09:03 -0000 1.20 > --- filter.py 7 Apr 2003 22:18:58 -0000 > *************** > *** 27,39 **** > try: > # Save the score > msg.SetField(mgr.config.field_score_name, prob) > # and the ID of the folder we were in when scored. > # (but only if we want to perform all actions) > # Note we must do this, and the Save, before the > # filter, else the save will fail. > if all_actions: > msg.RememberMessageCurrentFolder() > ! msg.Save() > > if all_actions and attr_prefix is not None: > folder_id = getattr(config, attr_prefix + "_folder_id") > --- 26,39 ---- > try: > # Save the score > msg.SetField(mgr.config.field_score_name, prob) > + msg.Save() > # and the ID of the folder we were in when scored. > # (but only if we want to perform all actions) > # Note we must do this, and the Save, before the > # filter, else the save will fail. > if all_actions: > msg.RememberMessageCurrentFolder() > ! msg.Save() > > if all_actions and attr_prefix is not None: > folder_id = getattr(config, attr_prefix + "_folder_id") > > > After making this change, what went from virtually _every_ message > staying unread, became the extreme rare case, such that I'm no longer > certain any remaining case may even be spambayes related. 
> > -- > -- David > -- > /-----------------------------------------------------------------------\ > \ David Bolen \ E-mail: db3l@fitlinxx.com / > | FitLinxx, Inc. \ Phone: (203) 708-5192 | > / 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \ > \-----------------------------------------------------------------------/ > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > From mhammond at skippinet.com.au Tue Apr 8 13:16:32 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Apr 7 22:27:55 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lost database) In-Reply-To: <000201c2fd05$c1a9a6e0$1700a8c0@gbsage.net> Message-ID: I've added a bug for the fact that the addin doesn't save the database after a train operation. http://sourceforge.net/tracker/index.php?func=detail&aid=717253&group_id=617 02&atid=498103 For the spambayes crowd: Doing this for a "pickle" database would be prohibitive. Now that there is a bsddb3 that works for Windows, how would people feel about me dropping all pickle support from the plugin? This will require you to install bsddb3 (or Python 2.3) and do a full re-train. As I have mentioned before, I would also be happy to accept patches that do an automatic migration ;) Any objections? Mark. > -----Original Message----- > From: spambayes-bounces@python.org > [mailto:spambayes-bounces@python.org]On Behalf Of Greg Scott > Sent: Monday, 7 April 2003 11:01 PM > To: spambayes@python.org > Subject: [Spambayes] Lost database > > > I had performed the initial training for the utility, and had around > 1100 spam examples and 400+ ham examples. Outlook hung, and I had to > abnormally terminate it and reboot in order for the computer to function > properly. When I opened Outlook again, SpamBayes indicated there were no > spam and no ham instances in its database. 
Unfortunately, I had already > deleted my spam examples, and now don't have them for re-training. > > Is there a way to import that information from a backup file or > anything? > > I am running XP, Outlook XP, and the binary version of the Outlook > plugin. My Python version is ActiveState 2.2.4. > > Thanks in advance, > > Greg > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes > From T.A.Meyer at massey.ac.nz Tue Apr 8 15:30:23 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 22:31:01 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1301231417@its-xchg4.massey.ac.nz> > Now that there is a bsddb3 that works for Windows, how would > people feel about me dropping all pickle support from the > plugin? Does this include the pickle that the config stuff is (currently) stored in? No objections here, BTW. =Tony Meyer From tim.one at comcast.net Mon Apr 7 23:30:17 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Apr 7 22:32:03 2003 Subject: [Spambayes] Outlook plugin missing msgs In-Reply-To: Message-ID: Just FYI. My machines have often been under very heavy load the last few weeks (doing builds of Python and Zopes, and running their test suites). This gave me an opportunity to observe an oddity I thought I saw before, but wasn't sure about: when a lot of messages are coming in real fast (all my accounts are POP3, at least cable-modem speed, and this is the IMO version of Outlook 2000 -- no Exchange involved), and the machine is struggling, seemingly random contiguous ranges of the fresh msgs don't get scored. It's like 10 get scored, the next 20 don't, the next 15 do, and so on. Outlook's Inbox display also updates in chunks when this happens. 
There are no clues in the trace log that anything peculiar has happened: there's no evidence there that we were ever told about the unscored msgs. Note that I don't care. The next time I start Outlook, the plugin finds the unscored msgs and does right things with them. It's just evidence that we have no idea how Outlook decides whether to tell us about a msg, and evidence that it won't tell us when (perhaps among other reasons) it's short on spare cycles. It's times like these I'm so glad Outlook is open source . From mhammond at skippinet.com.au Tue Apr 8 13:44:09 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Apr 7 22:44:44 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1301231417@its-xchg4.massey.ac.nz> Message-ID: > > Now that there is a bsddb3 that works for Windows, how would > > people feel about me dropping all pickle support from the > > plugin? > > Does this include the pickle that the config stuff is (currently) stored > in? Nope, but that should save itself more reliably (there is a bug on that too :) Mark. From tim.one at comcast.net Mon Apr 7 23:44:09 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Apr 7 22:45:03 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase) In-Reply-To: Message-ID: [Mark Hammond] > I've added a bug for the fact that the addin doesn't save the > database after a train operation. > > http://sourceforge.net/tracker/index.php?func=detail&aid=717253&gr > oup_id=61702&atid=498103 > > For the spambayes crowd: Doing this for a "pickle" database would be > prohibitive. Not for my pickle databases: they're under 2MB, and I rarely train on anything anymore. > Now that there is a bsddb3 that works for Windows, how would people feel > about me dropping all pickle support from the plugin? This will > require you to install bsddb3 (or Python 2.3) and do a full re-train. 
As I > have mentioned before, I would also be happy to accept patches that do an > automatic migration ;) > > Any objections? Me! I'd like to hold off on that until Python 2.3 final is released, as I don't want to encourage people to install an alpha Python (which 2.3 still is; only Python (not spambayes) alpha testers should be using anything later than Python 2.2.2). From T.A.Meyer at massey.ac.nz Tue Apr 8 15:47:43 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 7 22:48:16 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes]Lostdatabase) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130123142E@its-xchg4.massey.ac.nz> > > Any objections? > > Me! I'd like to hold off on that until Python 2.3 final is > released, as I don't want to encourage people to install an > alpha Python (which 2.3 still is; only Python (not spambayes) > alpha testers should be using anything later than Python 2.2.2). I've already said I don't care, but I will note that I'm using Python 2.2.2 with bsddb3. It's pretty simple to download it and install it separately (now that the buggy binary is gone). Depends if you want to require spambayes users (who are still alpha testers after all) to download the extra package if they aren't also Python alpha testers. =Tony Meyer From piersh at friskit.com Mon Apr 7 20:54:50 2003 From: piersh at friskit.com (Piers Haken) Date: Mon Apr 7 22:53:59 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase) Message-ID: <9891913C5BFE87429D71E37F08210CB92C7567@zeus.sfhq.friskit.com> > Now that there is a bsddb3 that works for Windows, how would > people feel about me dropping all pickle support from the > plugin? This will require you to install bsddb3 (or Python > 2.3) and do a full re-train. As I have mentioned before, I > would also be happy to accept patches that do an automatic > migration ;) > > Any objections? I'll bite. As long as there are decent instructions for us 2.2.2 users.... Piers. 
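
The save-after-train trade-off discussed above can be illustrated with a small Python sketch: a pickle store has to rewrite the whole mapping on every save, while a keyed store can write back only the records that training touched. This is an illustration under stated assumptions, not the plugin's storage code. The function names, the token counts and the file names are hypothetical, and the standard library's dbm.dumb stands in for bsddb3.

```python
import dbm.dumb
import os
import pickle
import tempfile


def save_pickle(path, counts):
    # A pickle store has no partial update: the whole mapping is rewritten.
    with open(path, "wb") as f:
        pickle.dump(counts, f, protocol=pickle.HIGHEST_PROTOCOL)
    return len(counts)  # number of records written


def save_keyed(path, counts, changed):
    # A keyed store can write back only the records that training touched.
    db = dbm.dumb.open(path, "c")
    try:
        for word in changed:
            db[word.encode("utf-8")] = pickle.dumps(counts[word])
    finally:
        db.close()
    return len(changed)  # number of records written


# Hypothetical token database: word -> (spam count, ham count).
counts = {"word%d" % i: (i, i + 1) for i in range(10000)}
workdir = tempfile.mkdtemp()

# Train on one message that touches one existing token and one new token.
counts["word1"] = (2, 2)
counts["cheap"] = (5, 0)
changed = ["word1", "cheap"]

print(save_pickle(os.path.join(workdir, "db.pck"), counts))
print(save_keyed(os.path.join(workdir, "db"), counts, changed))
```

The record counts say nothing about disk footprint, which is the other concern raised in the thread, and a real keyed store carries its own per-write overhead.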
From tim at fourstonesExpressions.com Mon Apr 7 23:23:59 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 7 23:28:36 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lost database) In-Reply-To: Message-ID: 4/7/2003 9:16:32 PM, "Mark Hammond" wrote: >Now that there is a bsddb3 that works for Windows, how would people feel >about me dropping all pickle support from the plugin? This will require you >to install bsddb3 (or Python 2.3) and do a full re-train. A full retrain can be avoided by exporting and importing using dbExpImp.py id est:
    dbExpImp -e -d apickledb -f dbexportfile
    dbExpImp -i -D absddbdb -f dbexportfile
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From tim at fourstonesExpressions.com Mon Apr 7 23:56:09 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 7 23:56:20 2003 Subject: [Spambayes] 'Spam' on the Sourceforge site Message-ID: If you look at the spambayes project page, there's a "Buy Viagra Online" ad in the sponsored content portion of the left side navbar. Now that's ironic c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From mhammond at skippinet.com.au Tue Apr 8 15:18:39 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Apr 8 00:19:12 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lost database) In-Reply-To: Message-ID: > A full retrain can be avoided by exporting and importing using dbExpImp.py It can't for Outlook. Mark. From mhammond at skippinet.com.au Tue Apr 8 15:21:41 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Apr 8 00:24:50 2003 Subject: Removing pickle support from Outlook?
(was RE: [Spambayes] Lostdatabase) In-Reply-To: Message-ID: > > For the spambayes crowd: Doing this for a "pickle" database would be > > prohibitive. > > Not for my pickle databases: they're under 2MB, and I rarely train on > anything anymore. But this would still mean that every unsure you bothered to hit the "recover" or "delete" button on would require writing the 2MB pickle. I believe we can afford this hit with a bsddb style database. > Me! I'd like to hold off on that until Python 2.3 final is released, as I > don't want to encourage people to install an alpha Python (which 2.3 still > is; only Python (not spambayes) alpha testers should be using > anything later > than Python 2.2.2). And requiring bsddb3 to be installed is too much of a burden? But yeah, either way, having the code not flush-on-train for pickles isn't that big of a deal. Mark. From T.A.Meyer at massey.ac.nz Tue Apr 8 19:15:48 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 8 02:16:23 2003 Subject: [Spambayes] IMAP Filter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13012314E4@its-xchg4.massey.ac.nz> >> (Insert pause for someone to tell me that a >> comma is a valid character in an imap message box name). > > It is... but you'd probably be crazy to use it. :-) > > RFC 2060 Section 5.1 > http://www.faqs.org/rfcs/rfc2060.html RFC1730 (the original IMAP) is the same. Anything can be used as a delimiter. I'll stick with commas, and if anyone has an IMAP server that uses them, they can post a bug and we'll figure out something else. =Tony Meyer From tim_one at email.msn.com Tue Apr 8 03:22:16 2003 From: tim_one at email.msn.com (Tim Peters) Date: Tue Apr 8 02:22:54 2003 Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase) In-Reply-To: Message-ID: [Mark Hammond] > ... > And requiring bsddb3 to be installed is too much of a burden? Installing it is more bother than not installing it. "Too much" will vary by user. 
I'm typing on a laptop now with a dialup connection, and almost out of disk space -- it seems like a lot of bother for me on this box right now, and the pickle-based addin works fine here. I expect bsddb3 will chew up more disk space too (relative to pickles -- have a feel for that? I gave up using Sam Rushing's bsddb 1.85 Windows port years ago due to disk bloat and bugs; I'm told the bugs are fixed in bsddb3, but don't know about disk consumption). > But yeah, either way, having the code not flush-on-train for pickles > isn't that big of a deal. If you feel more strongly about it than I do, go ahead. If the intent is to move to bsddb3 exclusively, then there's a lot to be said for biting that bullet before many more people grow pickle databases. From noreply at sourceforge.net Tue Apr 8 02:21:15 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 8 04:05:41 2003 Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread Message-ID: Bugs item #716684, was opened at 2003-04-07 12:28 Message generated for change (Comment added) made by pmoore You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Filtering marks message as unread Initial Comment: Reported too many times :) Exchange server users only. As spambayes starts, it starts processing missed messages. In the meantime, the user reads some messages, thereby marking them as read. As spambayes writes the spam field, these messages spring back to an unread status. ---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2003-04-08 09:21 Message: Logged In: YES user_id=113328 My version of filter.py looks different. I haven't updated from CVS in a while, maybe that's why.
But my version looks like it has msg.Save() called unconditionally. try: # Save the score msg.SetField(mgr.config.field_score_name, prob) # and the ID of the folder we were in when scored. msg.RememberMessageCurrentFolder() msg.Save() I've tried moving msg.Save() to before msg.Remember...(), but I'll have to wait to see results. Will report back. (Better might be to cvs update and apply your change, but I may not get a chance to do that for a couple of days...) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-08 03:21 Message: Logged In: YES user_id=14198 >From David Bolen: "Moore, Paul" writes: > As far as I know, no-one has been able to track down the > problem to fix it yet. I have, however, implemented a local workaround that's working really well for me. This was based on my prior messages in: http://mail.python.org/pipermail/spambayes/2003-March/004086.html http://mail.python.org/pipermail/spambayes/2003-March/004088.html Since I can't quantify what if any penalty it imposes in the general case by syncing changes back to the server an additional time, and since the problem may be limited to Exchange servers, I haven't proposed it be made to the main source yet - although I certainly haven't noticed much of a penalty in my local testing. But if anyone else wants to try a local change, it's fairly trivial, adding an additional call to msg.Save() in filter.py: *** filter.py 18 Mar 2003 03:09:03 -0000 1.20 --- filter.py 7 Apr 2003 22:18:58 -0000 *************** *** 27,39 **** try: # Save the score msg.SetField(mgr.config.field_score_name, prob) # and the ID of the folder we were in when scored. # (but only if we want to perform all actions) # Note we must do this, and the Save, before the # filter, else the save will fail. if all_actions: msg.RememberMessageCurrentFolder() ! 
msg.Save() if all_actions and attr_prefix is not None: folder_id = getattr(config, attr_prefix + "_folder_id") --- 26,39 ---- try: # Save the score msg.SetField(mgr.config.field_score_name, prob) + msg.Save() # and the ID of the folder we were in when scored. # (but only if we want to perform all actions) # Note we must do this, and the Save, before the # filter, else the save will fail. if all_actions: msg.RememberMessageCurrentFolder() ! msg.Save() if all_actions and attr_prefix is not None: folder_id = getattr(config, attr_prefix + "_folder_id") After making this change, what went from virtually _every_ message staying unread, became the extreme rare case, such that I'm no longer certain any remaining case may even be spambayes related. ---------------------------------------------------------------------- Comment By: Martin Worger (worger) Date: 2003-04-07 12:58 Message: Logged In: YES user_id=751487 A bit of investigation (I am using Outlook 2002 & Exchange 2000). A new message arriving in my Inbox does not get analysed by SpamBayes until: - You switch to another folder and back again. - You read the message (then analysed, but remains 'unread') - Another email arrives, when the first message is then rated (second one not rated though) - Another email in Inbox is saved after editing In other words, it seems some other action eventually triggers the analysis - not the arrival event itself. An email that is read before it has been analysed by SpamBayes will always be 'unread' afterwards. 
This is independent of type (plain, rich text or HTML) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 From Paul.Moore at atosorigin.com Tue Apr 8 10:26:04 2003 From: Paul.Moore at atosorigin.com (Moore, Paul) Date: Tue Apr 8 04:38:24 2003 Subject: [Spambayes] Pop3 proxy UI - doesn't display correct port Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D9FB@UKDCX001.uk.int.atosorigin.com> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > > The POP3 proxy's UI webpage doesn't display the port/server > > being proxied. Instead it displays > > > > "POP3 proxy running on 1110, proxying to example.com" > > > > in all cases. > > Is it running correctly (apart from this)? Mine (latest > cvs) says "POP3 proxy running on 110, proxying to > pop.ihug.co.nz:110", as it should. If I ignore my config file > it says 'No proxies running', as it should. It's running fine apart from this. > What version are you using? CVS from a couple of weeks ago. I don't update particularly regularly these days. I'll try updating from CVS, and I'll investigate a bit more closely. Thanks for the reality check. Paul. From noreply at sourceforge.net Tue Apr 8 03:08:39 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 8 04:53:06 2003 Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread Message-ID: Bugs item #716684, was opened at 2003-04-07 11:28 Message generated for change (Comment added) made by tobermory You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Filtering marks message as unread Initial Comment: Reported too many times :) Exchange server users only. 
As spambayes starts, it starts processing missed messages. In the meantime, the user reads some messages, thereby marking them as read. As spambayes writes the spam field, these messages spring back to an unread status. ---------------------------------------------------------------------- Comment By: David Leftley (tobermory) Date: 2003-04-08 09:08 Message: Logged In: YES user_id=626601 Just to try and clarify one point: earlier msgs in this thread suggest that spambayes doesn't attempt to classify a message until certain events (reading a msg, etc.) occur. In fact from watching the trace output, the message is classified as soon as it arrives (the trace shows "Message 'xxx' had a Spam classification of 'yyy'") but Outlook doesn't reflect this change until the next event occurs. ---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2003-04-08 08:21 Message: Logged In: YES user_id=113328 My version of filter.py looks different. I haven't updated from CVS in a while, maybe that's why. But my version looks like it has msg.Save() called unconditionally. try: # Save the score msg.SetField(mgr.config.field_score_name, prob) # and the ID of the folder we were in when scored. msg.RememberMessageCurrentFolder() msg.Save() I've tried moving msg.Save() to before msg.Remember...(), but I'll have to wait to see results. Will report back. (Better might be to cvs update and apply your change, but I may not get a chance to do that for a couple of days...) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-08 02:21 Message: Logged In: YES user_id=14198 >From David Bolen: "Moore, Paul" writes: > As far as I know, no-one has been able to track down the > problem to fix it yet. I have, however, implemented a local workaround that's working really well for me.
This was based on my prior messages in: http://mail.python.org/pipermail/spambayes/2003-March/004086.html http://mail.python.org/pipermail/spambayes/2003-March/004088.html Since I can't quantify what if any penalty it imposes in the general case by syncing changes back to the server an additional time, and since the problem may be limited to Exchange servers, I haven't proposed it be made to the main source yet - although I certainly haven't noticed much of a penalty in my local testing. But if anyone else wants to try a local change, it's fairly trivial, adding an additional call to msg.Save() in filter.py: *** filter.py 18 Mar 2003 03:09:03 -0000 1.20 --- filter.py 7 Apr 2003 22:18:58 -0000 *************** *** 27,39 **** try: # Save the score msg.SetField(mgr.config.field_score_name, prob) # and the ID of the folder we were in when scored. # (but only if we want to perform all actions) # Note we must do this, and the Save, before the # filter, else the save will fail. if all_actions: msg.RememberMessageCurrentFolder() ! msg.Save() if all_actions and attr_prefix is not None: folder_id = getattr(config, attr_prefix + "_folder_id") --- 26,39 ---- try: # Save the score msg.SetField(mgr.config.field_score_name, prob) + msg.Save() # and the ID of the folder we were in when scored. # (but only if we want to perform all actions) # Note we must do this, and the Save, before the # filter, else the save will fail. if all_actions: msg.RememberMessageCurrentFolder() ! msg.Save() if all_actions and attr_prefix is not None: folder_id = getattr(config, attr_prefix + "_folder_id") After making this change, what went from virtually _every_ message staying unread, became the extreme rare case, such that I'm no longer certain any remaining case may even be spambayes related. 
----------------------------------------------------------------------

Comment By: Martin Worger (worger)
Date: 2003-04-07 11:58

Message:
Logged In: YES
user_id=751487

A bit of investigation (I am using Outlook 2002 & Exchange 2000).

A new message arriving in my Inbox does not get analysed by
SpamBayes until:

- You switch to another folder and back again.
- You read the message (then analysed, but remains 'unread')
- Another email arrives, when the first message is then rated
  (second one not rated though)
- Another email in Inbox is saved after editing

In other words, it seems some other action eventually triggers
the analysis - not the arrival event itself.

An email that is read before it has been analysed by SpamBayes
will always be 'unread' afterwards.

This is independent of type (plain, rich text or HTML)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702

From spambayes at djl.freeuk.com Tue Apr 8 12:16:01 2003
From: spambayes at djl.freeuk.com (David Leftley)
Date: Tue Apr 8 06:16:36 2003
Subject: [Spambayes] Re: SpamBayes Outlook Plug-In
In-Reply-To:
References:
Message-ID:

On Tue, 08 Apr 2003 12:10:26 +1000, Mark Hammond wrote:

>Would it be possible to try another change for me instead?  Revert
>filter.py, and in msgstore.py's Save method, find the line:
>
>  self.mapi_object.SaveChanges(mapi.KEEP_OPEN_READWRITE |
>USE_DEFERRED_ERRORS)
>
>And remove "USE_DEFERRED_ERRORS" from it.  See if the behaviour changes.  If
>not, then try finding the definition of USE_DEFERRED_ERRORS, and change it
>to 0, thereby preventing *any* code in this file passing that funky MAPI
>flag.

I have made this change (to the Save method, not removing the
definition altogether) and it seems to fix the problem for me. I
haven't yet been quick enough to open a message before spambayes gets
its hands on it, but messages are now being updated to show their
score as soon as they come in, and stay "read" when I open them.

David.

From spambayes at djl.freeuk.com Tue Apr 8 12:32:35 2003
From: spambayes at djl.freeuk.com (David Leftley)
Date: Tue Apr 8 06:33:09 2003
Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lost database)
In-Reply-To:
References: <000201c2fd05$c1a9a6e0$1700a8c0@gbsage.net>
Message-ID:

On Tue, 8 Apr 2003 12:16:32 +1000, "Mark Hammond" wrote:

>Now that there is a bsddb3 that works for Windows, how would people feel
>about me dropping all pickle support from the plugin?  This will require you
>to install bsddb3 (or Python 2.3) and do a full re-train.  As I have
>mentioned before, I would also be happy to accept patches that do an
>automatic migration ;)
>

I'm in favour. I don't usually like the idea of adding an extra
dependency, but Outlook shutdown times are so much better with bsddb3
that I don't see why anyone would still want to use a pickle.

David
(just waiting for my work PC to retrain - you reminded me that I
forgot to install bsddb3 when I set up my new PC last week :-)

From steffen.siebert at logware.de Tue Apr 8 14:23:04 2003
From: steffen.siebert at logware.de (Steffen Siebert)
Date: Tue Apr 8 07:18:34 2003
Subject: Removing pickle support from Outlook? (was RE: [Spambayes]Lostdatabase)
Message-ID: <8BEFECF23C9C264094D4B53DFC3ACE982A1E05@ex09-00-z001.berlin.logware.de>

Hi,

> But yeah, either way, having the code not flush-on-train for
> pickles isn't
> that big of a deal.

For this case I'd love a separate "Save" button which saves the pickle
on demand.

Ciao,
Steffen

From tim at fourstonesExpressions.com Tue Apr 8 08:17:55 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Apr 8 08:19:03 2003
Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lost database)
In-Reply-To:
Message-ID:

4/7/2003 11:18:39 PM, "Mark Hammond" wrote:

>> A full retrain can be avoided by exporting and importing using dbExpImp.py

Oh ya... forgot. I should fix that. Can you send me an Outlook pickle
database so I can give it a go?

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From noreply at sourceforge.net Tue Apr 8 09:03:42 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Tue Apr 8 10:48:34 2003
Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread
Message-ID:

Bugs item #716684, was opened at 2003-04-07 11:28
Message generated for change (Comment added) made by db3l
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Filtering marks message as unread

Initial Comment:
Reported too many times :) Exchange server users only.

As spambayes starts, it starts processing missed messages. In the
meantime, the user reads some messages, thereby marking them as read.
As spambayes writes the spam field, these messages spring back to an
unread status.

----------------------------------------------------------------------

Comment By: David Bolen (db3l)
Date: 2003-04-08 15:03

Message:
Logged In: YES
user_id=53196

> Would it be possible to try another change for me instead?  Revert
> filter.py, and in msgstore.py's Save method, find the line:
>
>   self.mapi_object.SaveChanges(mapi.KEEP_OPEN_READWRITE |
> USE_DEFERRED_ERRORS)

I'm pretty sure I had tried that first, while experimenting,
before I ended up with the extra Save as a final "flush all"
attempt. There was no change, nor did the function return an error.
I think I also tried setting it to 0 (after reading the internal
comment) to no apparent effect, although I may be misremembering,
since the same comment may have caused me to hesitate to make
that change :-) But I'll double check this when I get a chance
today.

If I recall when testing, my review of the code seemed to
indicate that everything should work fine, and that the
spambayes code was managing the read flag reasonably well, and
in fact, was issuing a Save() back in filter.py. I ended up
inserting test Save()s at the lowermost level (which worked)
and then bubbling them up to see how high I could leave the
extra call so it was called as infrequently as possible.

In the end, the issue seemed related to the processing that
goes on when all_actions is enabled. If the Save() occurred
after the field status was updated, but before
RememberMessageCurrentFolder() was called, all was fine. But if
RememberMessageCurrentFolder() got called first, then the
following Save() - as already in filter.py - didn't seem to
"take."

> I'm a little unsure why one of the Save calls is indented too.

Ah, that was just to minimize change from the existing code
path. I wanted to move the Save() to happen before
RememberMessageCurrentFolder(), but then the second Save()
became completely superfluous in the main branch case - but I
didn't want to lose the Save() following
RememberMessageCurrentFolder() if it was called, just in case
that was critical to existing behavior, so I moved it to only
occur in that same block.

-- David

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702

From noreply at sourceforge.net Tue Apr 8 09:07:40 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Tue Apr 8 10:51:18 2003
Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread
Message-ID:

Bugs item #716684, was opened at 2003-04-07 11:28
Message generated for change (Comment added) made by db3l
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Mark Hammond (mhammond)
Assigned to: Mark Hammond (mhammond)
Summary: Filtering marks message as unread

Initial Comment:
Reported too many times :) Exchange server users only.

As spambayes starts, it starts processing missed messages. In the
meantime, the user reads some messages, thereby marking them as read.
As spambayes writes the spam field, these messages spring back to an
unread status.

----------------------------------------------------------------------

Comment By: David Bolen (db3l)
Date: 2003-04-08 15:07

Message:
Logged In: YES
user_id=53196

Just as an FYI, I agree with tobermory's comments in that
spambayes is in fact classifying the message immediately, but
it's the update to the server and/or client that appears to be
delayed until the next Outlook event. See the first of my two
spambayes list mail messages referenced in my note Mark
included in this bug.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702

From db3l at fitlinxx.com Tue Apr 8 18:44:03 2003
From: db3l at fitlinxx.com (David Bolen)
Date: Tue Apr 8 13:44:04 2003
Subject: [Spambayes] Re: SpamBayes Outlook Plug-In
References:
Message-ID:

David Leftley writes:

> I have made this change (to the Save method, not removing the
> definition altogether) and it seems to fix the problem for me. I
> haven't yet been quick enough to open a message before spambayes gets
> its hands on it, but messages are now being updated to show their
> score as soon as they come in, and stay "read" when I open them.

So far it's working ok for me too, but I'm reasonably certain (as I
noted on SourceForge) that I had tested this first too - I seem to
recall that it worked better than using deferred, but it still missed
cases (compared to the extra Save). Unfortunately they aren't
reproducing at the moment, so I'll see if I can recreate the failing
scenarios over a period of time.
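
Mark's two suggested experiments (dropping USE_DEFERRED_ERRORS from the
SaveChanges call, or redefining it as 0) come to the same thing at the
bit level, since OR-ing with 0 leaves the other flags untouched. A
minimal sketch follows; the numeric values are placeholders chosen for
illustration rather than the ones msgstore.py actually uses, and
save_flags() is a made-up helper, not part of the plugin.

```python
# Placeholder values for illustration only; the real constants come from
# the MAPI bindings and from msgstore.py.
KEEP_OPEN_READWRITE = 0x02
USE_DEFERRED_ERRORS = 0x08


def save_flags(use_deferred_errors):
    """Build the flag word that would be handed to SaveChanges."""
    deferred = USE_DEFERRED_ERRORS if use_deferred_errors else 0
    return KEEP_OPEN_READWRITE | deferred


# With the deferred flag: both bits are set.
print(hex(save_flags(True)))
# Without it (or with the constant redefined as 0): only the first bit.
print(hex(save_flags(False)))
```

Redefining the constant has the wider effect Mark mentions: every call
site in the file that ORs it in stops passing the flag, not just Save.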
-- David

From whisper at oz.net Tue Apr 8 13:25:22 2003
From: whisper at oz.net (David LeBlanc)
Date: Tue Apr 8 15:24:32 2003
Subject: Removing pickle support from Outlook? (was RE: [Spambayes]Lostdatabase)
In-Reply-To:
Message-ID:

> If you feel more strongly about it than I do, go ahead.  If the
> intent is to
> move to bsddb3 exclusively, then there's a lot to be said for biting that
> bullet before many more people grow pickle databases.

Why not move to the DBAPI and let people make their own choice about db
backend? I personally think a great deal of sqlite and hope it makes its
way into the core someday (and its licensing supports that!), especially
in comparison to Gadfly.

Dave LeBlanc
Seattle, WA USA

From T.A.Meyer at massey.ac.nz Wed Apr 9 11:20:06 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Apr 8 18:21:14 2003
Subject: [Spambayes] Pop3 proxy UI - doesn't display correct port
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13012315CC@its-xchg4.massey.ac.nz>

> > What version are you using?
>
> CVS from a couple of weeks ago. I don't update particularly
> regularly these days.

A wise decision ;)

> I'll try updating from CVS, and I'll investigate a bit more closely.

If it doesn't change, I'd be interested to know. I suspect that you
lucked out and got a pop3proxy that had a little bug that was later
fixed (quietly!).

=Tony Meyer

From mhammond at skippinet.com.au Wed Apr 9 10:20:31 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Apr 8 19:21:11 2003
Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase)
In-Reply-To:
Message-ID:

[Tim1]
> If you feel more strongly about it than I do, go ahead.

I don't, but:

> If the intent is to move to bsddb3 exclusively, then there's
> a lot to be said for biting that bullet before many more people
> grow pickle databases.

is exactly where I was coming from! (FYI, the binaries are all bsddb
based, so real people won't be growing pickles.)

So, what I have decided is that I will state publicly, and document
somewhere, that pickles will not be supported long term by Outlook. I
will keep the code so long as the cost is small. Next time an
incompatible database change happens, drop support. Such a change will
ideally involve an automatic "upgrade" from the existing db - but not
from existing pickles, so at this time I would declare pickles dead.
Hopefully this will be post Python 2.3.

Where I am coming from with the "incompatible database" is my idea for
the "message database" next to our "word database", as posted here a
couple of months back. I made a start on it, then decided I was being
too ambitious and changing too much, so I abandoned it, intending to go
back to my original "slightly hacky but less intrusive" plan. Tim2 may
recall that this database is what is preventing dbExport from working,
rather than the bayes word database. Since then paid work has got in
the way.

Damn-capitalists

Mark.

From tim at fourstonesExpressions.com Tue Apr 8 19:31:33 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Apr 8 19:31:46 2003
Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase)
In-Reply-To:
Message-ID: <4Y072WXVLGWUXWFC6ZICOM8275A0OL3Z.3e935bd5@myst>

4/8/2003 6:20:31 PM, "Mark Hammond" wrote:

>Where I am coming from with the "incompatible database" is my idea for the
>"message database" next to our "word database", as posted here a couple of
>months back.  I made a start on it, then decided I was being too ambitious
>and changing too much, so I abandoned it, intending to go back to my
>original "slightly hacky but less intrusive" plan.  Tim2 may recall that
>this database is what is preventing dbExport from working, rather than the
>bayes word database.  Since then paid work has got in the way.

I am currently hard at work on the "message database." It is being
shaken out with the imap filter, and then I'll incorporate it into the
notes filter and the pop3proxy. By then it should be fairly solid. You
might want to have a look at it before then and let me know what you
think. It's spambayes.message

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From T.A.Meyer at massey.ac.nz Wed Apr 9 15:22:34 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Apr 8 22:23:12 2003
Subject: [Spambayes] Outlook & 'Non-mail' items
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130139C289@its-xchg4.massey.ac.nz>

I just tried to send mail to an invalid address and got a bounce back
from my exchange server. The bounce message was classified as unsure
and filtered into my unsure folder (which is ok).

However, I tried to "Recover from spam", both to move it back to the
inbox and to train, and I get the message that no mail items are
selected.

If it filters, shouldn't it be able to be unfiltered? Or am I missing
something?

=Tony Meyer

From tim at fourstonesExpressions.com Tue Apr 8 23:06:33 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Apr 8 23:11:20 2003
Subject: Removing pickle support from Outlook? (was RE: [Spambayes] Lostdatabase)
In-Reply-To:
Message-ID:

4/8/2003 7:53:34 PM, "Mark Hammond" wrote:

>It looks quite good, and is the basis of something Outlook could use.

This is good news to me.

>
>Some first thoughts after a quick look:
>
>* It would be cool if we could store the database in the same file as the
>word database.  bsddb supports this, and it seems to make a whole lot of
>sense.  One file for all databases we come up with.  Apps could even add
>their own specific databases to this file.

Sounds very good. I don't know how to do this. Also, the msginfo
database name is hard coded at this point, clearly not desirable.

>
>* setIdFromPayload(), addSBHeaders() and delSBHeaders() look suspect for the
>base class.  If you intend splitting it later, you should consider doing it
>earlier - it will force you to face certain decisions.

Where would you suggest we put these? As functions somewhere?

>
>* The distinction between "set" and "change" will escape most people, and
>doesn't seem to serve much purpose except forcing people to call "change".
>Indeed, maybe "set" should check if self.id is already set, and if so,
>remove that ID from the database, or assert if that ID is still there, or
>some such.

Tony and I have struggled with this a bit. There are too many id
setters for my taste. This is a typical key change kind of problem. I'm
kinda not sure it really matters much.

>
>* should copy() do something with the id, such as reset it?

Presumably, the message that is being copied into already has a unique
id set, or will at some point. This is more like a clone operation, an
adaptation for the imap filter, which cannot simply modify a message.
It must create a new message, with a new id, and store it, then delete
the old one. A copy (clone, whatever) operation facilitates that
process.

>
>* modified() is probably a bad name for what you are asking.  It seems the
>method means "HaveID()".  Oh - I think you are using it as an "event"?  Not
>sure.

Yes, it's an event. It causes the object to be persisted. Ugly, but I
kinda lifted the code from the dbdictclassifier. I could use some help
here too.

>
>* All the "isCls" and "clsfy" methods, and training versions are suspect.
>If they are implementation specific, put an underscore, but to me they look
>like noise.  Why not just:
>
>  GetSpamClassification(self):
>    return None, True or False
>  RememberSpamClassification(self, isSpam):
>    void
>  GetSpamTrained(self):
>    return None, True or False
>  RememberSpamTrained(self, isSpam):
>    void
>
>You have about 80 lines expressing what I believe can be done in 10 (or so)

I'm not wedded to this portion of the interface by any means, but...

Here we might have a bit of a problem. The notesfilter really does have
to know if a message has ever been classified, not just whether or not
it has been classified as spam. This is because there is no way to know
if you've ever looked at a message or not. Each time you run the
filter, you look at every single stinking message in the entire notes
database. So... classification has to be more than a binary flag. I
gotta know if it's spam, ham, unsure, or never classified. I'll grant
you that perhaps that should go in a notesmessage subclass, but then
persistence gets to be a problem...

>
>If we can agree on most of this, I may even help ;)

Your help would be MOST welcome.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From mhammond at skippinet.com.au Wed Apr 9 14:15:20 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Apr 8 23:16:19 2003
Subject: [Spambayes] Outlook & 'Non-mail' items
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130139C289@its-xchg4.massey.ac.nz>
Message-ID:

There is a bug on that, I think. Certainly one that "drafts" are
filtered, but I think it is the same. There should be a single function
to determine "should I look at this object", and it should be used by
both the filter and train operations.

Bounces by outlook when not in "corporate" mode are just mail messages,
so I never see this.

Mark.
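
A rough sketch of the "single function" idea above, for what it is
worth. Everything here is hypothetical: Item, should_process() and the
two selection helpers are invented names, and the rule inside
should_process() (only plain mail items qualify, keyed on a message
class string) is an assumed policy for illustration, not what the
plugin does. The point is only that filter and train consult the same
predicate, so anything one of them touches the other can undo.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical stand-in for an Outlook item."""
    subject: str
    message_class: str


def should_process(item):
    """Single place that answers 'should I look at this object?'.

    Assumed rule for illustration: only plain mail items qualify.
    """
    return item.message_class.upper().startswith("IPM.NOTE")


def items_to_filter(items):
    # The filter path asks the shared predicate...
    return [item for item in items if should_process(item)]


def items_to_train(items):
    # ...and so does the train/recover path, so the two cannot disagree.
    return [item for item in items if should_process(item)]


inbox = [
    Item("hello", "IPM.Note"),
    Item("Undeliverable: test", "REPORT.IPM.Note.NDR"),
]
print([item.subject for item in items_to_filter(inbox)])
print([item.subject for item in items_to_train(inbox)])
```

Whether bounces should be in or out is a separate policy question; the
sketch only shows the two operations sharing one answer.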

> -----Original Message-----
> From: spambayes-bounces@python.org
> [mailto:spambayes-bounces@python.org]On Behalf Of Meyer, Tony
> Sent: Wednesday, 9 April 2003 12:23 PM
> To: spambayes@python.org
> Subject: [Spambayes] Outlook & 'Non-mail' items
>
>
> I just tried to send mail to an invalid address and got a bounce back
> from my exchange server.  The bounce message was classified as unsure
> and filtered into my unsure folder (which is ok).
>
> However, I tried to "Recover from spam", both to move it back to the
> inbox and to train, and I get the message that no mail items are
> selected.
>
> If it filters, shouldn't it be able to be unfiltered?  Or am I missing
> something?
>
> =Tony Meyer
>
> _______________________________________________
> Spambayes mailing list
> Spambayes@python.org
> http://mail.python.org/mailman/listinfo/spambayes
>

From mhammond at skippinet.com.au Wed Apr 9 14:30:31 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Apr 8 23:31:28 2003
Subject: Removing pickle support from Outlook? (was RE:[Spambayes] Lostdatabase)
In-Reply-To:
Message-ID:

> >* setIdFromPayload(), addSBHeaders() and delSBHeaders() look
> >  suspect for the base class.  If you intend splitting it later,
> >  you should consider doing it earlier - it will force you to
> >  face certain decisions.
>
> Where would you suggest we put these?  As functions somewhere?

As I said, "suspect for a base class". Thus, I suggest you put them in
a sub-class. The abstract interface needs not much more than an ID, and
methods to remember state.

> my taste.  This is a typical key change kind of problem.  I'm
> kinda not sure it really matters much.

I'm not sure what a "key change kind of problem" is, but yeah, it
doesn't matter much, but we may as well make it clean from the start.

> >* should copy() do something with the id, such as reset it?
>
> Presumably, the message that is being copied into already has a
> unique id set, or will at some point.  This is more like a clone
> operation, an adaptation for the imap filter, which cannot simply
> modify a message.  It must create a new message, with a new id,
> and store it, then delete the old one.  A copy (clone, whatever)
> operation facilitates that process.

Maybe this needs more thinking through. I assume a standard Python copy
is not suitable? Either way, the semantics should be clear - Outlook
has the ability to copy a message, and also to "clone" a Python object.
Copying a message changes its ID, as does moving a message. I really
don't know what Outlook should do for clone.

> >* All the "isCls" and "clsfy" methods, and training versions are
> >suspect.  If they are implementation specific, put an underscore,
> >but to me they look like noise.  Why not just:
> >
> >  GetSpamClassification(self):
> >    return None, True or False
> >  RememberSpamClassification(self, isSpam):
> >    void
> >  GetSpamTrained(self):
> >    return None, True or False
> >  RememberSpamTrained(self, isSpam):
> >    void
> >
> >You have about 80 lines expressing what I believe can be done in
> >10 (or so)
>
> I'm not wedded to this portion of the interface by any means, but...
>
> Here we might have a bit of a problem.  The notesfilter really
> does have to know if a message has ever been classified, not just
> whether or not it has been classified as spam.

Yes, this is the "None" return val. All cases had:

> >    return None, True or False

What did you think I meant? None = not known, result = result :)

I think you should step back a little, and think about what the
abstract interface needs to do. I see very few methods. I can see that
"id changing" is an issue that also could be dealt with by the base
class, so that may complicate it a little. But almost everything else
seems to be candidates for a "pop3proxy" sub-class.
There may even be scope for a "header-enabled" sub-class, able to be used by all applications which can implement everything from standard message headers, and thus able to be shared by pop3proxy and notes. Clear semantics for "copying" may be OK too, as long as exactly what it means is explained (but I don't see the need). A decent rule of thumb will be that whenever your abstract interface has the concept of using headers for *anything*, it won't make sense for Outlook, and thus would be better placed in a sub-class. Any assumptions about IDs, other than that they are strings and that they may change over the life of a single object, similarly. When I find some more cycles for spambayes I will have a play with whatever is in CVS at the time. Mark. From T.A.Meyer at massey.ac.nz Wed Apr 9 16:33:48 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 8 23:34:26 2003 Subject: Message class (was: RE: Removing pickle support from Outlook? (was RE:[Spambayes] Lostdatabase)) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130139C2FC@its-xchg4.massey.ac.nz> [Mark] > * The distinction between "set" and "change" will escape > most people, and doesn't seem to serve much purpose except > forcing people to call "change". Indeed, maybe "set" should > check if self.id is already set, and if so, remove that ID > from the database, or assert if that ID is still there, or > some such. I'm guilty of adding the change function. Tim had a check to make sure that you didn't set an id if it was already set, but I needed to change the ID for the IMAP filter. I didn't want to undo his stuff, so I added a new function. I'd be happy to dump the change function and just have set. I like the idea of removing the id if it is already set. Otherwise we can just dump change and I'll change the IMAP filter to create a new message (object) when it creates a new (imap) message. [Mark] > * should copy() do something with the id, such as reset it?
[Tim] > Presumably, the message that is being copied into already has > an unique id set, or will at some point. This is more like a > clone operation, an adaptation for the imap filter, which cannot > simply modify a message. It must create a new message, with a > new id, and store it, then delete the old one. A copy (clone, > whatever) operation facilitates that process. I haven't managed to look at it properly since Tim made these changes (busy day), but I think that the copy should go since it's expensive (the payload copy) and really only of use for IMAP. I'll modify the (derived) IMAPMessage class to create a new Message object with the correct key, and do the payload (etc) copy. [Mark] > >* All the "isCls" and "clsfy" methods, and training versions are > >suspect. I agree that this should be changed. [Mark] > > GetSpamClassification(self): > > return None, True or False > > RememberSpamClassification(self, isSpam): > > void > > GetSpamTrained(self): > > return None, True or False > > RememberSpamTrained(self, isSpam): > > void [Tim] > Here we might have a bit of a problem. The notesfilter > really does have to know if a message has ever been classified, What about: GetClassification(self): & GetTrained(self): return None, "spam", "ham", "unsure" (these could use the options. values) RememberClassification and RememberTrained could be the same. None, True and False is ok, but what about unsures? (Ok, you can't train as unsure, but you can be classified as such, and it's nice if they look kinda the same). =Tony Meyer From T.A.Meyer at massey.ac.nz Wed Apr 9 16:43:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 8 23:44:34 2003 Subject: Removing pickle support from Outlook? (was RE:[Spambayes]Lostdatabase) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130139C30C@its-xchg4.massey.ac.nz> > I'm not sure what a "key change kind of problem" is, but > yeah, it doesn't matter much, but we may as well make it > clean from the start. 
+1 > Yes, this is the "None" return val. All cases had: > > > return None, True or False > What did you think I meant? None = not known, result = result :) Well, None could have been unsure (see my previous message). > I can see that "id changing" is an issue that also could be > dealt with by the base class, so that may complicate it a > little. The more I consider it, the more I think that an ID shouldn't be able to change. If IMAP needs to assign a new id, it can create a new message. Hopefully others won't need to (Outlook ids are meant to be permanent, right?). > But almost everything else seems to be candidates > for a "pop3proxy" sub-class. There may even be scope for a > "header-enabled" sub-class, able to be used by all > applications which can implement everything from standard > message headers, and thus able to be shared by pop3proxy and > notes. This definitely should be a "header-enabled" sub-class. Pop3proxy, IMAP, notes and maybe hammie (don't know much about hammie) can use this. 'Real' integrations like the Outlook plugin can go their own way after taking the base class. > Any assumptions about IDs, other than that > they are strings and that they may change over the life a > single object similarly. Should an id be able to change over the lifetime of an object? > When I find some more cycles for spambayes I will have a play > with whatever is in CVS at the time. If I find time today I might do some of this since we're testing with IMAP at the moment. Putting it all together into one bsddb3 database is all yours, though ;) =Tony Meyer From mhammond at skippinet.com.au Wed Apr 9 16:19:42 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Apr 9 01:20:42 2003 Subject: Removing pickle support from Outlook? (wasRE:[Spambayes]Lostdatabase) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130139C30C@its-xchg4.massey.ac.nz> Message-ID: > > Yes, this is the "None" return val. 
All cases had: > > > > return None, True or False > > What did you think I meant? None = not known, result = result :) > > Well, None could have been unsure (see my previous message). Right, yes, I missed that. Do we actually need to know the classification, or just the score? But yeah, maybe one extra method. Outlook has no need for the "classified" portion of this database, only the "trained", but that is OK. Actually, I *might* use the classify database - it may help with read-only stores, such as hotmail. The "if I have already seen this message, don't filter it" smarts are likely to fail for hotmail. Not that I have a clue about hotmail. > The more I consider it, the more I think that an ID shouldn't be able to > change. If IMAP needs to assign a new id, it can create a new message. > Hopefully others won't need to (Outlook ids are meant to be permanent, > right?). They change when you move a message to a new folder. I can see how it is handy to say "this message has changed ID" and have the database updated accordingly - but I can't see why SetID can't do it. OTOH, Outlook uses the "conversation id" (iirc) rather than message ID for this database just to cope with that fact. > This definitely should be a "header-enabled" sub-class. Pop3proxy, > IMAP, notes and maybe hammie (don't know much about hammie) can use > this. 'Real' integrations like the Outlook plugin can go their own way > after taking the base class. I dont see that I need anything beyond the base class - just a handful of "remember this" and "get this" methods passing an ID I have already constructed. But yeah, that is right. > > Any assumptions about IDs, other than that > > they are strings and that they may change over the life a > > single object similarly. > > Should an id be able to change over the lifetime of an object? Obviously this is easy to work around by an application - just "unremember" the old message ID, and remember the new one. 
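A rough sketch of the small "remember this / get this" interface under discussion here, including the unremember-then-remember workaround for a changed ID, might look like the following. Every name in it (MessageInfoStore, remember_trained, change_id and so on) is a hypothetical illustration, not the code in CVS, and the three classification labels follow Tony's suggestion rather than any settled design.

```python
# Editorial sketch only: all names are hypothetical, not the SpamBayes API.

class MessageInfoStore:
    """Remembers per-message state, keyed by an opaque string ID."""

    LABELS = ("spam", "ham", "unsure")

    def __init__(self):
        self._trained = {}      # id -> True (spam) or False (ham)
        self._classified = {}   # id -> one of LABELS

    def remember_trained(self, msg_id, is_spam):
        self._trained[msg_id] = bool(is_spam)

    def get_trained(self, msg_id):
        # None means "never trained"; True/False is the remembered result.
        return self._trained.get(msg_id)

    def remember_classification(self, msg_id, label):
        if label not in self.LABELS:
            raise ValueError("unknown classification: %r" % (label,))
        self._classified[msg_id] = label

    def get_classification(self, msg_id):
        # None means "never classified", which is distinct from "unsure".
        return self._classified.get(msg_id)

    def unremember(self, msg_id):
        self._trained.pop(msg_id, None)
        self._classified.pop(msg_id, None)

    def change_id(self, old_id, new_id):
        # Forget the old ID and remember the same state under the new one.
        if old_id in self._trained:
            self._trained[new_id] = self._trained.pop(old_id)
        if old_id in self._classified:
            self._classified[new_id] = self._classified.pop(old_id)
```

Keeping None separate from "unsure" is what lets a caller such as the notes filter tell "never classified" apart from "classified, but inconclusive".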
If this makes life easier I am happy for it, but I don't think Outlook would use it - Outlook changes IDs on message moves, and the user can move whatever they like without us able to track it - so yeah, Outlook needs an immutable ID. > > When I find some more cycles for spambayes I will have a play > > with whatever is in CVS at the time. > > If I find time today I might do some of this since we're testing with > IMAP at the moment. Putting it all together into one bsddb3 database is > all yours, though ;) Deal ;) Mark. From noreply at sourceforge.net Tue Apr 8 23:37:25 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Apr 9 01:21:47 2003 Subject: [Spambayes] [ spambayes-Bugs-717998 ] Can't reset Spam folder if folder is lost Message-ID: Bugs item #717998, was opened at 2003-04-09 00:37 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Benjamin J. Judson (astrogen) Assigned to: Mark Hammond (mhammond) Summary: Can't reset Spam folder if folder is lost Initial Comment: If the Spam Manager is set up to move spam to a folder and that folder disappears, the Spam Manager may show that spam is to be delivered to . In this event trying to browse the folder list will not list any folders, and you will be unable to set the Spam folder to anything else. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 From jamie at audible.transient.net Wed Apr 9 21:28:53 2003 From: jamie at audible.transient.net (Jamie Heilman) Date: Wed Apr 9 23:37:40 2003 Subject: [Spambayes] CdbClassifer [sic] Message-ID: <20030410032853.GB31144@audible.transient.net> CdbClassifier is misspelt throughout the spambayes project tree as "CdbClassifer" attached patch fixes this. -- Jamie Heilman http://audible.transient.net/~jamie/ "I was in love once -- a Sinclair ZX-81. People said, "No, Holly, she's not for you." She was cheap, she was stupid and she wouldn't load -- well, not for me, anyway." -Holly
-------------- next part --------------
--- mailsort.py 16 Feb 2003 17:05:07 -0000 1.6
+++ mailsort.py 10 Apr 2003 03:22:22 -0000
@@ -30,11 +30,11 @@
 DB_FILE = os.path.expanduser(DB_FILE)
 
 def import_spambayes():
-    global mboxutils, CdbClassifer, tokenize
+    global mboxutils, CdbClassifier, tokenize
     if not os.environ.has_key('BAYESCUSTOMIZE'):
         os.environ['BAYESCUSTOMIZE'] = os.path.expanduser(CONFIG_FILE)
     from spambayes import mboxutils
-    from spambayes.cdb_classifier import CdbClassifer
+    from spambayes.cdb_classifier import CdbClassifier
     from spambayes.tokenizer import tokenize
 
 
@@ -87,7 +87,7 @@
     if not os.path.exists(rc_dir):
         print "Creating", RC_DIR, "directory..."
         os.mkdir(rc_dir)
-    bayes = CdbClassifer()
+    bayes = CdbClassifier()
     print 'Training with ham...'
     train(bayes, ham_name, False)
     print 'Training with spam...'
@@ -123,7 +123,7 @@
         del blocks
         msg = email.message_from_string(msgdata)
         del msgdata
-        bayes = CdbClassifer(open(DB_FILE, 'rb'))
+        bayes = CdbClassifier(open(DB_FILE, 'rb'))
         prob = bayes.spamprob(tokenize(msg))
     else:
         prob = 0.0
@@ -138,7 +138,7 @@
 
 def print_message_score(msg_name, msg_fp):
     msg = email.message_from_file(msg_fp)
-    bayes = CdbClassifer(open(DB_FILE, 'rb'))
+    bayes = CdbClassifier(open(DB_FILE, 'rb'))
     prob, evidence = bayes.spamprob(tokenize(msg), evidence=True)
     print msg_name, prob
     for word, prob in evidence:
--- spambayes/cdb_classifier.py 20 Jan 2003 03:14:32 -0000 1.1
+++ spambayes/cdb_classifier.py 10 Apr 2003 03:22:40 -0000
@@ -10,7 +10,7 @@
 from spambayes.tokenizer import tokenize
 from spambayes.classifier import Classifier
 
-class CdbClassifer(Classifier):
+class CdbClassifier(Classifier):
     def __init__(self, cdbfile=None):
         Classifier.__init__(self)
         if cdbfile is not None:
From tim at fourstonesExpressions.com Wed Apr 9 23:51:49 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Apr 9 23:52:02 2003 Subject: [Spambayes] CdbClassifer [sic] In-Reply-To: <20030410032853.GB31144@audible.transient.net> Message-ID: <2V96URC0NHNHQA04X2VSONHB6JEUTYV.3e94ea55@myst> 4/9/2003 10:28:53 PM, Jamie Heilman wrote: >CdbClassifier is misspelt throughout the spambayes project tree as >"CdbClassifer" attached patch fixes this. >-- >Jamie Heilman http://audible.transient.net/> ~jamie/ >"I was in love once -- a Sinclair ZX-81. People said, "No, Holly, >she's > not for you." She was cheap, she was stupid and she wouldn't load > -- well, not for me, anyway."-Holly Guess that shows you about how much cdb classifiers are used... They're very similar to zx-81... cheap, slow, and well, they *usually* load... :) However, if you really are interested in that working, then you should formally submit the patch on the spambayes project. It'll get lost on this list very quickly.
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From jamie at audible.transient.net Wed Apr 9 22:36:38 2003 From: jamie at audible.transient.net (Jamie Heilman) Date: Thu Apr 10 01:06:01 2003 Subject: [Spambayes] CdbClassifer [sic] In-Reply-To: <2V96URC0NHNHQA04X2VSONHB6JEUTYV.3e94ea55@myst> References: <20030410032853.GB31144@audible.transient.net> <2V96URC0NHNHQA04X2VSONHB6JEUTYV.3e94ea55@myst> Message-ID: <20030410043638.GC31144@audible.transient.net> Tim Stone - Four Stones Expressions wrote: > > However, if you really are interested in that working, then you > should formally submit the patch on the spambayes project. It'll > get lost on this list very quickly. Actually it works fine, it's just consistently misspelled, which makes it a tad odd to "link" against. I don't have a sourceforge acct, nor do I plan on getting one just to submit this patch, but it is really easy to generate:
    cd your/path/to/spambayes && find . -type f -print0 | xargs -0 perl -pi -e 's/CdbClassifer/CdbClassifier/;'
That's it. -- Jamie Heilman http://audible.transient.net/~jamie/ "You came all this way, without saying squat, and now you're trying to tell me a '56 Chevy can beat a '47 Buick in a dead quarter mile? I liked you better when you weren't saying squat kid." -Buddy From noreply at sourceforge.net Thu Apr 10 00:06:49 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 10 01:50:12 2003 Subject: [Spambayes] [ spambayes-Bugs-717998 ] Can't reset Spam folder if folder is lost Message-ID: Bugs item #717998, was opened at 2003-04-09 15:37 Message generated for change (Comment added) made by mhammond You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Benjamin J.
Judson (astrogen) Assigned to: Mark Hammond (mhammond) Summary: Can't reset Spam folder if folder is lost Initial Comment: If the Spam Manager is set up to move spam to a folder and that folder disappears, the Spam Manager may show that spam is to be delivered to . In this event trying to browse the folder list will not list any folders, and you will be unable to set the Spam folder to anything else. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-04-10 16:06 Message: Logged In: YES user_id=14198 Is there a traceback associated with this? I regularly "test" this, thanks to Outlook screwing all my folder IDs as I reconfigure Outlook, and I don't have the problem. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 From noreply at sourceforge.net Thu Apr 10 06:01:32 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 10 07:46:11 2003 Subject: [Spambayes] [ spambayes-Bugs-716684 ] Filtering marks message as unread Message-ID: Bugs item #716684, was opened at 2003-04-07 12:28 Message generated for change (Comment added) made by pmoore You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Mark Hammond (mhammond) Assigned to: Mark Hammond (mhammond) Summary: Filtering marks message as unread Initial Comment: Reported too many times :) Exchange server users only. As spambayes starts, it starts processing missed messages. In the meantime, the user reads some messages, thereby marking them as read. As spambayes writes the spam field, these messages spring back to an unread status.
---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2003-04-10 13:01 Message: Logged In: YES user_id=113328 I just had this happen to me again. I'm using the CVS version from a day or two ago. So the version in current CVS isn't (completely) fixed. Anything else I could try? I see part of a thread about USE_DEFERRED_ERRORS here, but I can't find the original. My code has USE_DEFERRED_ERRORS in it. If you can tell me what (if anything) else needs changing, I can try taking it out. ---------------------------------------------------------------------- Comment By: David Bolen (db3l) Date: 2003-04-08 16:07 Message: Logged In: YES user_id=53196 Just as an FYI, I agree with tobermory's comments in that spambayes is in fact classifying the message immediately, but it's the update to the server and/or client that appears to be delayed until the next Outlook event. See the first of my two spambayes list mail messages referenced in my note Mark included in this bug. ---------------------------------------------------------------------- Comment By: David Bolen (db3l) Date: 2003-04-08 16:03 Message: Logged In: YES user_id=53196 > Would it be possible to try another change for me instead? Revert > filter.py, and in msgstore.py's Save method, find the line: > > self.mapi_object.SaveChanges (mapi.KEEP_OPEN_READWRITE | > USE_DEFERRED_ERRORS) I'm pretty sure I had tried that first, while experimenting, before I ended up with the extra Save as a final "flush all" attempt. There was no change, nor did the function return an error. I think I also tried setting it to 0 (after reading the internal comment) to no apparent effect, although I may be misremembering since the same comment may have caused me to hesitate to make that change :-) But I'll double check this when I get a chance today.
If I recall when testing, my review of the code seemed to indicate that everything should work fine, and that the spambayes code was managing the read flag reasonably well, and in fact, was issuing a Save() back in filter.py. I ended up inserting test Save()s at the lowermost level (which worked) and then bubbling them up to see how high I could leave the extra call so it was called as infrequently as possible. In the end, the issue seemed related to the processing that goes on when all_actions is enabled. If the Save() occurred after the field status was updated, but before RememberMessageCurrentFolder() was called, all was fine. But if the RememberMessageCurrentFolder() got called first, then the following Save() - as already in filter.py - didn't seem to "take." > I'm a little unsure why one of the Save calls is indented too. Ah, that was just to minimize change from the existing code path. I wanted to move the Save() to happen before RememberMessageCurrentFolder(), but then the second Save () became completely superfluous in the main branch case - but I didn't want to lose the Save() following RememberMessageCurrentFolder() if it was called just in case that was critical to existing behavior, so I moved it to only occur in that same block. -- David ---------------------------------------------------------------------- Comment By: David Leftley (tobermory) Date: 2003-04-08 10:08 Message: Logged In: YES user_id=626601 Just to try and clarify one point: earlier msgs in this thread suggest that spambayes doesn't attempt to classify a message until certain events (reading a msg, etc.) occur. In fact from watching the trace output, the message is classified as soon as it arrives (the trace shows "Message 'xxx' had a Spam classification of 'yyy'") but Outlook doesn't reflect this change until the next event occurs. 
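The call order David Bolen describes above can be sketched roughly as follows. FakeMessage is a hypothetical stand-in for the add-in's message object, used only to record the sequence of calls; it assumes nothing about how the real Save() talks to MAPI or Exchange.

```python
# Editorial sketch: FakeMessage and record_score are hypothetical names.
# Only the ordering of the calls is being illustrated.

class FakeMessage:
    def __init__(self):
        self.calls = []

    def SetField(self, name, value):
        self.calls.append("SetField")

    def RememberMessageCurrentFolder(self):
        self.calls.append("RememberMessageCurrentFolder")

    def Save(self):
        self.calls.append("Save")


def record_score(msg, field_name, prob, all_actions):
    msg.SetField(field_name, prob)
    # Save straight after the field update, before the folder is remembered.
    msg.Save()
    if all_actions:
        msg.RememberMessageCurrentFolder()
        # Second Save kept only in this branch, so the plain path saves once.
        msg.Save()
```

With all_actions true the recorded order is SetField, Save, RememberMessageCurrentFolder, Save; with it false the message is saved exactly once.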
---------------------------------------------------------------------- Comment By: Paul Moore (pmoore) Date: 2003-04-08 09:21 Message: Logged In: YES user_id=113328 My version of filter.py looks different. I haven't updated from CVS in a while, maybe that's why. But my version looks like it has msg.Save() called unconditionally.
    try:
        # Save the score
        msg.SetField(mgr.config.field_score_name, prob)
        # and the ID of the folder we were in when scored.
        msg.RememberMessageCurrentFolder()
        msg.Save()
I've tried moving msg.Save() to before msg.Remember...(), but I'll have to wait to see results. Will report back. (Better might be to cvs update and apply your change, but I may not get a chance to do that for a couple of days...) ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-08 03:21 Message: Logged In: YES user_id=14198 From David Bolen: "Moore, Paul" writes: > As far as I know, no-one has been able to track down the > problem to fix it yet. I have, however, implemented a local workaround that's working really well for me. This was based on my prior messages in: http://mail.python.org/pipermail/spambayes/2003-March/004086.html http://mail.python.org/pipermail/spambayes/2003-March/004088.html Since I can't quantify what if any penalty it imposes in the general case by syncing changes back to the server an additional time, and since the problem may be limited to Exchange servers, I haven't proposed it be made to the main source yet - although I certainly haven't noticed much of a penalty in my local testing. But if anyone else wants to try a local change, it's fairly trivial, adding an additional call to msg.Save() in filter.py:
*** filter.py 18 Mar 2003 03:09:03 -0000 1.20
--- filter.py 7 Apr 2003 22:18:58 -0000
***************
*** 27,39 ****
      try:
          # Save the score
          msg.SetField(mgr.config.field_score_name, prob)
          # and the ID of the folder we were in when scored.
          # (but only if we want to perform all actions)
          # Note we must do this, and the Save, before the
          # filter, else the save will fail.
          if all_actions:
              msg.RememberMessageCurrentFolder()
!         msg.Save()
  
          if all_actions and attr_prefix is not None:
              folder_id = getattr(config, attr_prefix + "_folder_id")
--- 26,39 ----
      try:
          # Save the score
          msg.SetField(mgr.config.field_score_name, prob)
+         msg.Save()
          # and the ID of the folder we were in when scored.
          # (but only if we want to perform all actions)
          # Note we must do this, and the Save, before the
          # filter, else the save will fail.
          if all_actions:
              msg.RememberMessageCurrentFolder()
!             msg.Save()
  
          if all_actions and attr_prefix is not None:
              folder_id = getattr(config, attr_prefix + "_folder_id")
After making this change, what went from virtually _every_ message staying unread, became the extreme rare case, such that I'm no longer certain any remaining case may even be spambayes related. ---------------------------------------------------------------------- Comment By: Martin Worger (worger) Date: 2003-04-07 12:58 Message: Logged In: YES user_id=751487 A bit of investigation (I am using Outlook 2002 & Exchange 2000). A new message arriving in my Inbox does not get analysed by SpamBayes until:
- You switch to another folder and back again.
- You read the message (then analysed, but remains 'unread')
- Another email arrives, when the first message is then rated (second one not rated though)
- Another email in Inbox is saved after editing
In other words, it seems some other action eventually triggers the analysis - not the arrival event itself. An email that is read before it has been analysed by SpamBayes will always be 'unread' afterwards.
This is independent of type (plain, rich text or HTML) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=716684&group_id=61702 From skip at pobox.com Thu Apr 10 09:29:32 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Apr 10 09:29:42 2003 Subject: [Spambayes] CdbClassifer [sic] In-Reply-To: <20030410032853.GB31144@audible.transient.net> References: <20030410032853.GB31144@audible.transient.net> Message-ID: <16021.29116.34266.801597@montanaro.dyndns.org> Jamie> CdbClassifier is misspelt throughout the spambayes project tree Jamie> as "CdbClassifer" attached patch fixes this. Thanks for bringing the problem to our attention. I just checked in corrected versions of the two affected files. -- Skip Montanaro skip@pobox.com http://www.musi-cal.com/ From tim at fourstonesExpressions.com Thu Apr 10 09:37:20 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 10 09:40:46 2003 Subject: [Spambayes] CdbClassifer [sic] In-Reply-To: <16021.29116.34266.801597@montanaro.dyndns.org> Message-ID: 4/10/2003 8:29:32 AM, Skip Montanaro wrote: > > Jamie> CdbClassifier is misspelt throughout the spambayes project tree > Jamie> as "CdbClassifer" attached patch fixes this. > >Thanks for bringing the problem to our attention. I just checked in >corrected versions of the two affected files. The cdb thing kinda fell through the cracks. I don't think we should support this. It was early work, and isn't supported in any mainstream code. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
From nas at python.ca Thu Apr 10 07:58:51 2003 From: nas at python.ca (Neil Schemenauer) Date: Thu Apr 10 09:57:49 2003 Subject: [Spambayes] CdbClassifer [sic] In-Reply-To: References: <16021.29116.34266.801597@montanaro.dyndns.org> Message-ID: <20030410135850.GA21693@glacier.arctrix.com> Tim Stone - Four Stones Expressions wrote: > The cdb thing kinda fell through the cracks. I don't think we should > support this. It was early work, and isn't supported in any > mainstream code. Huh? It works great for me. I should add support for mailbox formats other than Maildir but otherwise I don't know of any problems. Simple is good, IMHO. Neil From skip at pobox.com Thu Apr 10 10:19:43 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Apr 10 10:19:53 2003 Subject: [Spambayes] CdbClassifer [sic] In-Reply-To: References: <16021.29116.34266.801597@montanaro.dyndns.org> Message-ID: <16021.32127.878951.484398@montanaro.dyndns.org> Tim> The cdb thing kinda fell through the cracks. I don't think we Tim> should support this. It was early work, and isn't supported in any Tim> mainstream code. That's fine, but I think we should still spell "Classifier" correctly. Feel free to rip that code out completely. Skip From tim at fourstonesExpressions.com Thu Apr 10 10:53:05 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 10 10:53:37 2003 Subject: [Spambayes] CdbClassifer [sic] In-Reply-To: <16021.32127.878951.484398@montanaro.dyndns.org> Message-ID: <74436Z5ZNMYSE0762YKJKEROA9SOWSZY.3e958551@myst> 4/10/2003 9:19:43 AM, Skip Montanaro wrote: >That's fine, but I think we should still spell "Classifier" correctly. I'll grant you that... LOL! > Feel free to rip that code out completely. > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
From noreply at sourceforge.net Fri Apr 11 04:20:59 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 11 06:04:52 2003 Subject: [Spambayes] [ spambayes-Bugs-719586 ] Cannot View Spam Cues for Undeliverable Reports Message-ID: Bugs item #719586, was opened at 2003-04-11 10:20 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=719586&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin Worger (worger) Assigned to: Mark Hammond (mhammond) Summary: Cannot View Spam Cues for Undeliverable Reports Initial Comment: The SpamBayes Outlook add-in marked an undeliverable notice as spam. I moved it back into my Inbox, but it still shows a 90% rating. When I try to view the spam cues for it says that no message is selected. The email is using a different message class i.e. it is a 'report' not a 'message' (when you look at the message properties). BTW: Sorry if I have trampled on any protocols by posting this - I'm not a developer, merely testing (and it is doing a pretty good job of filtering, this is my first FP since filtering was enabled - I'm impressed!) 
Martin Worger ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=719586&group_id=61702 From noreply at sourceforge.net Fri Apr 11 04:25:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 11 06:08:56 2003 Subject: [Spambayes] [ spambayes-Bugs-719586 ] Cannot View Spam Cues for Undeliverable Reports Message-ID: Bugs item #719586, was opened at 2003-04-11 10:20 Message generated for change (Comment added) made by worger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=719586&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin Worger (worger) Assigned to: Mark Hammond (mhammond) Summary: Cannot View Spam Cues for Undeliverable Reports Initial Comment: The SpamBayes Outlook add-in marked an undeliverable notice as spam. I moved it back into my Inbox, but it still shows a 90% rating. When I try to view the spam cues for it says that no message is selected. The email is using a different message class i.e. it is a 'report' not a 'message' (when you look at the message properties). BTW: Sorry if I have trampled on any protocols by posting this - I'm not a developer, merely testing (and it is doing a pretty good job of filtering, this is my first FP since filtering was enabled - I'm impressed!) Martin Worger ---------------------------------------------------------------------- >Comment By: Martin Worger (worger) Date: 2003-04-11 10:25 Message: Logged In: YES user_id=751487 Sorry - I didn't see bug 690418! 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=719586&group_id=61702 From noreply at sourceforge.net Fri Apr 11 04:27:38 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Fri Apr 11 06:11:30 2003 Subject: [Spambayes] [ spambayes-Bugs-719586 ] Cannot View Spam Cues for Undeliverable Reports Message-ID: Bugs item #719586, was opened at 2003-04-11 10:20 Message generated for change (Comment added) made by worger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=719586&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Martin Worger (worger) Assigned to: Mark Hammond (mhammond) Summary: Cannot View Spam Cues for Undeliverable Reports Initial Comment: The SpamBayes Outlook add-in marked an undeliverable notice as spam. I moved it back into my Inbox, but it still shows a 90% rating. When I try to view the spam cues for it says that no message is selected. The email is using a different message class i.e. it is a 'report' not a 'message' (when you look at the message properties). BTW: Sorry if I have trampled on any protocols by posting this - I'm not a developer, merely testing (and it is doing a pretty good job of filtering, this is my first FP since filtering was enabled - I'm impressed!) Martin Worger ---------------------------------------------------------------------- >Comment By: Martin Worger (worger) Date: 2003-04-11 10:27 Message: Logged In: YES user_id=751487 Sorry - I didn't see bug 690418! ---------------------------------------------------------------------- Comment By: Martin Worger (worger) Date: 2003-04-11 10:25 Message: Logged In: YES user_id=751487 Sorry - I didn't see bug 690418! 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=719586&group_id=61702 From list2003 at fure.net Fri Apr 11 09:19:32 2003 From: list2003 at fure.net (Jan Fure) Date: Fri Apr 11 11:16:45 2003 Subject: [Spambayes] Python or spambayes or user induced? Message-ID: <3E96DD04.5040104@fure.net> Hi; I get an error after the 'python pop3proxy.py' script has been running for a while. I am unsure whether this is related to my configuration (user), or python 2.3.1 (alpha), or spambayes itself. I am including the error message below: python pop3proxy.py Loading database... Done. Listener on port 110 is proxying pop3.wyith.net:110 User interface url is http://localhost:8880/ error: uncaptured python exception, closing channel <__main__.ServerLineReader connected at 0x404f2d0c> (socket.error:(111, 'Connection refused') [/usr/local/lib/python2.3/asynchat.py|handle_read|88] [/usr/local/lib/python2.3/asyncore.py|recv|353]) It would be more convenient if the proxy could be quietly running all the time. I would be happy for any pointers. My linux is redhat 7.3, with sufficient library updates to compile python 2.3.1. Jan Fure From list2003 at fure.net Fri Apr 11 09:24:39 2003 From: list2003 at fure.net (Jan Fure) Date: Fri Apr 11 11:21:50 2003 Subject: [Spambayes] Mixed case words in heading Message-ID: <3E96DE37.1050100@fure.net> Hi; I have noticed that many spam messages contain words with mixed case. I am wondering whether spambayes has any provision for increasing the spam rating when that occurs, even if that particular mixed case word has not been encountered before? Also, are there any provisions for creating a whitelist, or is that typically not necessary, as the filter algorithm is effective enough? 
Jan From bill at parducci.net Fri Apr 11 10:07:22 2003 From: bill at parducci.net (bill parducci) Date: Fri Apr 11 12:07:27 2003 Subject: [Spambayes] Mixed case words in heading References: <3E96DE37.1050100@fure.net> Message-ID: <3E96E83A.1060807@parducci.net> spambayes doesn't 'whitelist'. the idea is to create a profile through training. admittedly, it may take a while for some messages to be trained properly (i personally have problems with state department notices because they have so many words similar to mail scams) but it will occur in the vast majority of cases with time (as long as you continue to retrain! :o) there have been some discussions about training on various components of each message and using various techniques to combine those scores. i am not sure what the current status is, but i expect that, "[someone] build it. test it. show it." would be a good guess :o) as to the 'mixed case' issue, i believe that there have been a couple of different tests looking at case, etc., none of which returned statistical relevance. therefore, i *think* that the scoring is case insensitive currently (i would assume to optimize db size). b Jan Fure wrote: > Hi; > > I have noticed that many spam messages contain words with mixed case. I > am wondering whether spambayes has any provision for increasing the spam > rating when that occurs, even if that particular mixed case word has not > been encountered before? > > Also, are there any provisions for creating a whitelist, or is that > typically not necessary, as the filter algorithm is effective enough? > > Jan > > > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes From sgdavis at koyote.com Fri Apr 11 13:05:40 2003 From: sgdavis at koyote.com (Steve Davis) Date: Fri Apr 11 13:09:34 2003 Subject: [Spambayes] Note on SPAM Message-ID: Dear Sir, First, I sincerely apologize if this causes any annoyance.
I am fully aware how irritated many people get when a person poses a question or makes a suggestion without first checking out documentation or at least reading a FAQ. I admit to not having read things as of yet in order to determine the most proper place to send such e-mail. I have been using the internet for close to two decades and one thing I have found to be true about SPAM, without exception. Every e-mail whose header shows that it was received from a mailer whose alias resolved to an unknown IP address has been SPAM. Frankly, I've been on the lookout for even 1 legitimate e-mail that resolved to "unknown" -- still haven't come across one. I'm not very adept at programming so have never been able to create a SPAM removal program. If your developers are already aware of this or just deem it as useless info please pardon my intrusion. If not, might I impose on you to relate this to the developers? Have a good day! Steve Davis sgdavis@koyote.com From popiel at wolfskeep.com Fri Apr 11 12:17:51 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Fri Apr 11 14:17:55 2003 Subject: [Spambayes] Note on SPAM In-Reply-To: Message from "Steve Davis" of "Fri, 11 Apr 2003 12:05:40 CDT." References: Message-ID: <20030411181751.23CA32DDDA@cashew.wolfskeep.com> In message: "Steve Davis" writes: > >I have been using the internet for close to two decades and one thing I have >found to be true about SPAM, without exception. Every e-mail whose header >shows that it was received from a mailer whose alias resolved to an unknown >IP address has been SPAM. Frankly, I've been on the lookout for even 1 >legitimate e-mail that resolved to "unknown" -- still haven't come across >one. I'm not very adept at programming so have never been able to create a >SPAM removal program. Mail that I've personally sent out has occasionally had this sort of marking, because my ISP occasionally gets their reverse-maps messed up. 
I would be quite annoyed if this alone caused some of my outgoing mail to be dropped. Of course, I have trouble producing such samples on demand, precisely because it is due to the failure of an outside agency... >If your developers are already aware of this or just deem it as useless info >please pardon my intrusion. If not, might I impose on you to relate this to >the developers? Not useless, merely occasionally misleading. If you have tokenization of the Received headers turned on, then I believe that this clue will get automatically weighted and factored in. - Alex From rob at hooft.net Sat Apr 12 09:46:35 2003 From: rob at hooft.net (Rob Hooft) Date: Sat Apr 12 02:49:20 2003 Subject: [Spambayes] Note on SPAM In-Reply-To: References: Message-ID: <3E97B64B.30004@hooft.net> Steve Davis wrote: > I have been using the internet for close to two decades and one thing I have > found to be true about SPAM, without exception. Every e-mail whose header > shows that it was received from a mailer whose alias resolved to an unknown > IP address has been SPAM. The "postfix" mailer has an option to refuse such E-mail. If this is 100% true for you, you can use that option. From my experience, this is not true at all. Many, many companies, even big companies, have mistakes in their DNS setup. I agree that this would be a very good thing to enforce, but personally I wouldn't like to be responsible if one of these big companies could not reach the sales people at the place where I work! If it were globally enforced, I'd join. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From francois.granger at free.fr Sat Apr 12 10:47:59 2003 From: francois.granger at free.fr (Francois Granger) Date: Sat Apr 12 03:48:05 2003 Subject: [Spambayes] Python or spambayes or user induced?
In-Reply-To: <3E96DD04.5040104@fure.net> References: <3E96DD04.5040104@fure.net> Message-ID: At 08:19 -0700 11/04/2003, in message [Spambayes] Python or spambayes or user induced?, Jan Fure wrote: >Hi; > >I get an error after the 'python pop3proxy.py' script has been >running for a while. I am unsure whether this is related to my >configuration (user), or python 2.3.1 (alpha), or spambayes itself. >I am including the error message below: > >python pop3proxy.py >Loading database... Done. >Listener on port 110 is proxying pop3.wyith.net:110 >User interface url is http://localhost:8880/ >error: uncaptured python exception, closing channel ><__main__.ServerLineReader connected at 0x404f2d0c> >(socket.error:(111, 'Connection refused') >[/usr/local/lib/python2.3/asynchat.py|handle_read|88] >[/usr/local/lib/python2.3/asyncore.py|recv|353]) > >It would be more convenient if the proxy could be quietly running >all the time. I have a similar one around once a day: error: uncaptured python exception, closing channel (socket.error:(32, 'Broken pipe') [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asynchat.py|initiate_send|219] [/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/asyncore.py|send|334]) But Spambayes still works without restarting. -- Hofstadter's Law : It always takes longer than you expect, even when you take into account Hofstadter's Law. From anthony at interlink.com.au Sat Apr 12 20:13:22 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sat Apr 12 05:14:50 2003 Subject: [Spambayes] Note on SPAM In-Reply-To: <3E97B64B.30004@hooft.net> Message-ID: <200304120913.h3C9DMB31389@localhost.localdomain> >>> Rob Hooft wrote > The "postfix" mailer has an option to refuse such E-mail. If this is > 100% true for you, you can use that option. From my experience, this is > not true at all. Many, many companies, even big companies, have mistakes > in their DNS setup. 
I agree that this would be a very good thing to > enforce, but personnally I wouldn't like to be responsible if one of > these big companies could not reach the sales people at the place where > I work! If it would be globally enforced, I'd join. Note also that reverse DNS can be correctly configured, but the addresses fail to be resolved due to network glitches, timeouts, or the like. DNS uses UDP, and there's no guarantees of reliable service there. -- Anthony Baxter It's never too late to have a happy childhood. From lists at morpheus.demon.co.uk Sat Apr 12 15:20:50 2003 From: lists at morpheus.demon.co.uk (Paul Moore) Date: Sat Apr 12 09:21:13 2003 Subject: [Spambayes] Pop3 proxy UI - doesn't display correct port References: <1ED4ECF91CDED24C8D012BCF2B034F13012315CC@its-xchg4.massey.ac.nz> Message-ID: "Meyer, Tony" writes: >> > What version are you using? >> >> CVS from a couple of weeks ago. I don't update particularly >> regularly these days. > > A wise decision ;) > >> I'll try updating from CVS, and I'll investigate a bit more closely. > > If it doesn't change, I'd be interested to know. I suspect that you > lucked out and got a pop3proxy that had a little bug that was later > fixed (quietly!). Nope, still broken. On the proxy home page: POP3 proxy running on 1110, proxying to example.com. Active POP3 conversations: 0. POP3 conversations this session: 0. Emails classified this session: 0 spam, 0 ham, 0 unsure. Total emails trained: Spam: 82 Ham: 13 On the configuration page: Servers: Current Value: localhost Ports: Current Value: 8110 Very odd. Ah. It only does this when run via the Win32 service code in windows/pop3proxy_service.py. So running pop3proxy.py from the command line shows the right UI, but the service doesn't. Presumably something isn't qualifying a variable like it should. I'll keep investigating... Paul. 
--
This signature intentionally left blank

From lists at morpheus.demon.co.uk Sat Apr 12 17:20:06 2003
From: lists at morpheus.demon.co.uk (Paul Moore)
Date: Sat Apr 12 11:20:26 2003
Subject: [Spambayes] Pop3 proxy UI - doesn't display correct port
References: <1ED4ECF91CDED24C8D012BCF2B034F13012315CC@its-xchg4.massey.ac.nz>
Message-ID:

Paul Moore writes:

> Ah. It only does this when run via the Win32 service code in
> windows/pop3proxy_service.py. So running pop3proxy.py from the command
> line shows the right UI, but the service doesn't.
>
> Presumably something isn't qualifying a variable like it should. I'll
> keep investigating...

Got it. This patch fixes the problem:

--- pop3proxy_service.py.orig 2003-04-12 16:11:26.000000000 +0100
+++ pop3proxy_service.py 2003-04-12 16:11:34.000000000 +0100
@@ -101,6 +101,7 @@
     def ServerThread(self):
         state = pop3proxy.state
+        state.buildServerStrings()
         pop3proxy.main(state.servers, state.proxyPorts, state.uiPort, state.launchUI)

 if __name__=='__main__':

--
This signature intentionally left blank

From noreply at sourceforge.net Sat Apr 12 10:01:29 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat Apr 12 11:45:09 2003
Subject: [Spambayes] [ spambayes-Patches-711845 ] mboxtrain.py in mh mode: trivial fix
Message-ID:

Patches item #711845, was opened at 2003-03-29 11:45
Message generated for change (Comment added) made by jay_berkenbilt
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=711845&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jay Berkenbilt (jay_berkenbilt)
Assigned to: Nobody/Anonymous (nobody)
Summary: mboxtrain.py in mh mode: trivial fix

Initial Comment:
This patch relative to mboxtrain.py in the 2003-01-17 snapshot fixes two trivial problems in mhdir_train: files are overwritten needlessly, and the count of trained messages is not properly updated.
I just took the logic from the maildir_train function and duplicated it. ---------------------------------------------------------------------- >Comment By: Jay Berkenbilt (jay_berkenbilt) Date: 2003-04-12 12:01 Message: Logged In: YES user_id=523210 I'm attaching a new patch relative to the current CVS. Sorry for not doing that to begin with. The only difference actually between this patch and the previous one is the line numbers. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=711845&group_id=61702 From Greg at TheThomasHome.co.uk Sat Apr 12 23:24:22 2003 From: Greg at TheThomasHome.co.uk (Greg Thomas) Date: Sat Apr 12 17:27:54 2003 Subject: [Spambayes] pop3proxy problems Message-ID: Hi, SpamBayes was recommended to me, so I thought I'd give it a go on my Win2K PC. I downloaded and installed Python 2.2, downloaded and installed version 2.5 of the email package, and renamed away the old one. I then installed spambayes-1.0a2 and downloaded pop3proxy. I followed the instructions, and created a bayescustomize.ini with a pop3proxy_servers: entry, but that failed ("pop3proxy_servers & pop3proxy_ports are different lengths!") so I added a pop3proxy_ports: 110 line too. That seemed to get the pop3proxy going OK. But, when I tried to access the UI on http://localhost:8880/ I just get the following error in Mozilla: 500 Server error Traceback (most recent call last): File "c:\Python22\Lib\site-packages\spambayes\Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "pop3proxy.py", line 706, in onHome self.html.findMessage) File "c:\Python22\Lib\site-packages\spambayes\PyMeldLite.py", line 710, in __getattr__ raise AttributeError, "No element or attribute named %r" % name AttributeError: No element or attribute named 'findMessage' I've double checked just about everything, but can't see what I'm doing wrong. 
I've never used Python before, so I'd be grateful fro any pointers, TIA, Greg From tim at fourstonesExpressions.com Sat Apr 12 20:22:51 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sat Apr 12 20:23:00 2003 Subject: [Spambayes] pop3proxy problems In-Reply-To: Message-ID: Wow. Haven't seen that one before. Lemme have a gander at the code and see if I can see anything... - TimS 4/12/2003 4:24:22 PM, Greg Thomas wrote: >Hi, > >SpamBayes was recommended to me, so I thought I'd give it a go on my >Win2K PC. I downloaded and installed Python 2.2, downloaded and >installed version 2.5 of the email package, and renamed away the old >one. I then installed spambayes-1.0a2 and downloaded pop3proxy. > >I followed the instructions, and created a bayescustomize.ini with a >pop3proxy_servers: entry, but that failed ("pop3proxy_servers & >pop3proxy_ports are different lengths!") so I added a >pop3proxy_ports: 110 line too. That seemed to get the pop3proxy going >OK. > >But, when I tried to access the UI on http://localhost:8880/ I just >get the following error in Mozilla: > >500 Server error > >Traceback (most recent call last): > > File "c:\Python22\Lib\site-packages\spambayes\Dibbler.py", line 398, in found_terminator > getattr(plugin, name)(**params) > > File "pop3proxy.py", line 706, in onHome > self.html.findMessage) > > File "c:\Python22\Lib\site-packages\spambayes\PyMeldLite.py", line 710, in __getattr__ > raise AttributeError, "No element or attribute named %r" % name > >AttributeError: No element or attribute named 'findMessage' > >I've double checked just about everything, but can't see what I'm >doing wrong. 
I've never used Python before, so I'd be grateful fro any >pointers, > >TIA, > >Greg > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From tim at fourstonesExpressions.com Sat Apr 12 20:44:06 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sat Apr 12 20:48:45 2003 Subject: [Spambayes] pop3proxy problems In-Reply-To: Message-ID: 4/12/2003 4:24:22 PM, Greg Thomas wrote: >Hi, > >SpamBayes was recommended to me, Fantastic! >so I thought I'd give it a go on my >Win2K PC. I downloaded and installed Python 2.2, downloaded and >installed version 2.5 of the email package, and renamed away the old >one. I then installed spambayes-1.0a2 and downloaded pop3proxy. > >I followed the instructions, and created a bayescustomize.ini with a >pop3proxy_servers: entry, but that failed ("pop3proxy_servers & >pop3proxy_ports are different lengths!") so I added a >pop3proxy_ports: 110 line too. That seemed to get the pop3proxy going >OK. Ok, all this is good, and indicates that whatever is wrong is not terribly wrong. > >But, when I tried to access the UI on http://localhost:8880/ I just >get the following error in Mozilla: This *might* be caused by the pop3proxy running in the wrong directory. The current directory for pop3proxy should be spambayes-1.0a2 The bayescustomize.ini should be in that same directory. Then you simply run pop3proxy.py from the command line. If you're doing all this, and you still have the ui error, then check to be sure that you have a file named ui_html.py in spambayes-1.0a2\spambayes\resources. If that file exists, then I'm stumped. 
Lemme know :) c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From david at theresistance.net Sat Apr 12 22:39:37 2003 From: david at theresistance.net (David Shaw) Date: Sat Apr 12 21:39:43 2003 Subject: [Spambayes] Tough to classify Message-ID: I placed an order with Amazon today. I got a TiVo and a Java book. The order confirmation came back unsure, with 123 clues pointing both ways, and probability as follows: *H* 0.981571864474 *S* 0.56420331545 This message is obviously ham to a human, but here are some of the higher spam clues: find, 0.908163265306 day? 0.908163265306 $5,000. 0.908163265306 20, 0.908163265306 url:help 0.908163265306 telephone: 0.934782608696 order: 0.934782608696 buy 0.942237128563 saver 0.949438202247 seller 0.96511627907 online, 0.983271375465 ordering 0.987106017192 dollar 0.987106017192 grand 0.988431876607 shopping 0.992091388401 tax 0.994699646643 subject:with 0.99504950495 subject:Your 0.997366881217 What can be done in a case like this? I don't order from amazon that often (maybe 4 times a year), but amazon itself is a ham clue: url:amazon 0.155172413793 I feel like spambayes has enough clues to know this is ham, it's just a question of calculating the probability in such a way as to recognize it. I would be interesting in any thoughts on this. From tim.one at comcast.net Sun Apr 13 02:17:48 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Apr 13 01:18:23 2003 Subject: [Spambayes] Mixed case words in heading In-Reply-To: <3E96E83A.1060807@parducci.net> Message-ID: [bill parducci] > ... > as to the 'mixed case' issue, i believe that there have been a couple of > different tests looking at case, etc., none of which returned > statistical relevance. therefore, i *think* that the scoring is case > insensitive currently (i would assume to optimize db size). 
It's *mostly* case-insensitive, and indeed to minimize database size, and because tests both ways had overall indistinguishable error rates. Preserving or folding away case had different effects on different kinds of msgs, though (there are comments about this in tokenize.py -- each way is prone to different kinds of mistakes). Case is preserved for words in Subject lines, and for header field names ("To:" vs "TO:", etc), because tests said both those improved overall results. Note that all test results in the early days were on English ham, and mostly English spam. From anthony at interlink.com.au Sun Apr 13 16:22:13 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Apr 13 01:23:51 2003 Subject: [Spambayes] Mixed case words in heading In-Reply-To: Message-ID: <200304130522.h3D5MDo14232@localhost.localdomain> >>> Tim Peters wrote > It's *mostly* case-insensitive, and indeed to minimize database size, and > because tests both ways had overall indistinguishable error rates. With smaller training databases, case-sensitivity actually made for noticeably worse results. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From tim.one at comcast.net Sun Apr 13 02:39:17 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Apr 13 01:41:51 2003 Subject: [Spambayes] Mixed case words in heading In-Reply-To: <200304130522.h3D5MDo14232@localhost.localdomain> Message-ID: [Tim Peters] >> It's *mostly* case-insensitive, and indeed to minimize database >> size, and because tests both ways had overall indistinguishable error rates. [Anthony Baxter] > With smaller training databases, case-sensitivity actually made for > noticeably worse results. Good memory, Anthony! That's right. The thing that scares me is that this result made intuitive sense . 
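
A rough sketch of the case handling discussed in this thread, for anyone who wants to experiment. It is not the code in tokenize.py: the function names, the "synthetic:mixedcase" token and the definition of "mixed case" (more than one capital, but not all capitals) are assumptions made up for illustration, and the case-preserving treatment of header field names is left out. Body words are folded to lower case, Subject words keep their case behind a "subject:" prefix (like the subject:Your clue quoted elsewhere on the list), and oddly capitalized words add an extra token that training can score on its own rather than carrying a hard-coded penalty.

```python
def is_mixed_case(word):
    """Hypothetical test: more than one capital letter, but not all capitals."""
    letters = [c for c in word if c.isalpha()]
    uppers = sum(1 for c in letters if c.isupper())
    return 1 < uppers < len(letters)


def tokenize(subject, body):
    """Yield tokens for a message given as two plain strings (an assumption;
    the real tokenizer works on parsed email messages)."""
    # Subject words keep their case and get a prefix.
    for word in subject.split():
        yield "subject:" + word
    # Body words are folded to lower case to keep the database small.
    for word in body.split():
        yield word.lower()
        if is_mixed_case(word):
            # Made-up synthetic token; the classifier decides what it means.
            yield "synthetic:mixedcase"


tokens = list(tokenize("Your Order", "Buy NOW FrEe stuff"))
print(tokens)
```

Whether such a token helps is an empirical question; it would need the same kind of testing as any other tokenizer change.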
From anthony at interlink.com.au Sun Apr 13 17:00:43 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Sun Apr 13 02:02:20 2003 Subject: [Spambayes] Mixed case words in heading In-Reply-To: Message-ID: <200304130600.h3D60hJ14451@localhost.localdomain> >>> Tim Peters wrote > Good memory, Anthony! That's right. The thing that scares me is that this > result made intuitive sense . Had to happen eventually. Counter-intuitive results were all too common and we were getting used to them. something-about-inconsistent-hobgoblins-here Anthony From tim.one at comcast.net Sun Apr 13 03:02:50 2003 From: tim.one at comcast.net (Tim Peters) Date: Sun Apr 13 02:04:22 2003 Subject: [Spambayes] Tough to classify In-Reply-To: Message-ID: [David Shaw] > I placed an order with Amazon today. I got a TiVo and a Java book. > The order confirmation came back unsure, with 123 clues pointing both > ways, and probability as follows: > > *H* 0.981571864474 > *S* 0.56420331545 > > This message is obviously ham to a human, I have no doubt that it was obviously ham to you, but don't accept it would have been obvious ham to humans other than you. For example, """ <> Thanks for ordering from Gateway! Please see the attached file for the details of your order. Should you wish to add additional items, or have any questions, please reply to me or call me at the number listed inside. Refer a friend! You may qualify for a $50 credit when you refer a friend and they buy a Gateway PC. See this page for details http://www.gateway.com/programs/rewards/index.shtml """ is the text of an order confirmation I got from Gateway last month. I dare say it's spam to everyone else on this list . > but here are some of the higher spam clues: > > find, 0.908163265306 > day? 0.908163265306 > $5,000. 
0.908163265306 > 20, 0.908163265306 > url:help 0.908163265306 > telephone: 0.934782608696 > order: 0.934782608696 > buy 0.942237128563 > saver 0.949438202247 > seller 0.96511627907 > online, 0.983271375465 > ordering 0.987106017192 > dollar 0.987106017192 > grand 0.988431876607 > shopping 0.992091388401 > tax 0.994699646643 > subject:with 0.99504950495 > subject:Your 0.997366881217 > > > What can be done in a case like this? Training on it will be effective, over time. As it says on http://spambayes.sourceforge.net/background.html For example, commercial HTML email from a company you do business with is quite likely to score as Unsure the first time the system sees such a message from a particular company. Spam and commercial email both use the language and devices of advertising heavily, so it's hard to tell them apart. Training quickly teaches the system all sorts of things about the commerical email you want, though, ranging from which company sent it and how they addressed you, to the kinds of products and services it's offering. and, e.g., "$5,000." is either some advertising gimmick, or you paid waaaay too much for a Java book . > I don't order from amazon that often (maybe 4 times a year), but amazon itself > is a ham clue: > > url:amazon 0.155172413793 You must have many ham clues, else your *H* score wouldn't have been 0.98. > I feel like spambayes has enough clues to know this is ham, it's just a > question of calculating the probability in such a way as to recognize > it. I would be interesting in any thoughts on this. There are many ways to combine the individual word spamprobs so that the msg will come out as ham. The trick is to do so in a way that doesn't also classify more spam as ham. The combination method in spambayes is the end result of some intense work on the topic by several people, and beat about a dozen other combination methods in large tests. 
That doesn't mean it's the best possible combination method, but does suggest it won't be trivial to do better. The combination code (in classifier.py) is about the easiest part of the system to change, so feel encouraged to test alternatives. "I feel like" isn't really testable on its own. From Greg at TheThomasHome.co.uk Sun Apr 13 13:35:18 2003 From: Greg at TheThomasHome.co.uk (Greg Thomas) Date: Sun Apr 13 07:38:52 2003 Subject: [Spambayes] pop3proxy problems In-Reply-To: References: Message-ID: On Sat, 12 Apr 2003 19:44:06 -0500, you wrote: > This *might* be caused by the pop3proxy running in the wrong directory. The > current directory for pop3proxy should be spambayes-1.0a2 The > bayescustomize.ini should be in that same directory. Then you simply run > pop3proxy.py from the command line. If you're doing all this, and you still > have the ui error, then check to be sure that you have a file named ui_html.py > in spambayes-1.0a2\spambayes\resources. If that file exists, then I'm > stumped. Lemme know :) OK, the problem appears to be with the version of pop3proxy I downloaded from sourceforge. I'd not noticed that spambayes came with a version already until I tried to copy the sourceforge copy to the spambayes-1.0a2 folder. The one with Spambayes is working with no problems. Thanks for the help, Greg From tim at fourstonesExpressions.com Sun Apr 13 07:40:10 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sun Apr 13 07:40:20 2003 Subject: [Spambayes] pop3proxy problems In-Reply-To: Message-ID: 4/13/2003 6:35:18 AM, Greg Thomas wrote: >OK, the problem appears to be with the version of pop3proxy I >downloaded from sourceforge. I'd not noticed that spambayes came with a >version already until I tried to copy the sourceforge copy to the >spambayes-1.0a2 folder. The one with Spambayes is working with no >problems. Oh yeah... it's been heavily modified since 1.0a2.
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From michel at reimon.net Sun Apr 13 19:51:33 2003 From: michel at reimon.net (Michel Reimon) Date: Sun Apr 13 13:38:21 2003 Subject: [Spambayes] baysian news filter Message-ID: hello. barry warsaw encouraged me, to contact this list with my little problem. i'm a journalist, writing about politics and economics in both mainstream and independent media (www.indymedia.org). the www is oviously a nice thing for independent journalism, but it's also very vulnerable and some incidents in the last 2-3 years (including hackers, police and the chinese government) led to the insight: we need a better system. the software project i'm working on is a peer-to-peer-network for sharing news and articles. (in my former life i got a low-level degree in computer sciences.) without going into details: the goal is to create a free, decentralized, quick, reliable and personalized news source for everyone. great, huh? so how does spambayes fit into this idea? the p2p-nodes pass news through the system. the users/readers can't read all articles flooding through their computer, so the system has to decide which news might be interesting for it's user and should therefore be displayed or stored. i want to use a baysian approach for this and i want to program it in python - bingo! seems to me, that this problem is quite similar if not identical to spam-recognition... because my time is very much limited i'll break work into small pieces and write this filter first, using emailed articles as input. i haven't written programs for quite a while and no experience with python, so i really could need a guardian angel - someone who looks over my shoulder as i stumble through the source code. so if someone here wants to go on a crusade for free speech - any help will be very much appreciated. 
but be warned: i guess i'll start with some pretty dumb questions :) please contact me off list. thanx for reading this, michel From list2003 at fure.net Sun Apr 13 12:05:07 2003 From: list2003 at fure.net (Jan Fure) Date: Sun Apr 13 14:02:26 2003 Subject: [Spambayes] Mixed case words in heading In-Reply-To: <200304130522.h3D5MDo14232@localhost.localdomain> References: <200304130522.h3D5MDo14232@localhost.localdomain> Message-ID: <3E99A6D3.3040508@fure.net> Anthony Baxter wrote: >>>>Tim Peters wrote >> >>It's *mostly* case-insensitive, and indeed to minimize database size, and >>because tests both ways had overall indistinguishable error rates. > > > With smaller training databases, case-sensitivity actually made for > noticeably worse results. > > Anthony > This made intuitive sense to this spambayes newbie as well, since case sensitivity would increase the number of distinct words and so leave fewer observations for each of them. My original question was whether mixed case should be penalized. Here is some potential pseudocode:

    if ($word is unknown/doesn't occur in DataBase)
        if (1 < # of Uppercase Letters < # of Total letters in word) then
            $spam_rating = 0.9
        end
    end

This is outside the Bayesian approach, but would represent an educated guess only for unknown words. Jan Fure From charl at infosat.net Mon Apr 14 00:03:23 2003 From: charl at infosat.net (Charl Matthee) Date: Sun Apr 13 17:03:33 2003 Subject: [Spambayes] mboxtrain.py chokes on non-existent directories Message-ID: <20030413210323.GG7614@sa02.infosat.net> Hi, If you run mboxtrain.py with -g or -s set to a non-existent directory you end up with an error like: Training ham (/home/charl/Mail/mailman): Traceback (most recent call last): File "/home/charl/projects/spambayes/mboxtrain.py", line 292, in ?
main() File "/home/charl/projects/spambayes/mboxtrain.py", line 279, in main train(h, g, False, force) File "/home/charl/projects/spambayes/mboxtrain.py", line 215, in train elif trainnew and os.path.isdir(os.path.join(path, "new")): NameError: global name 'trainnew' is not defined Perhaps this should throw a more appropriate error? Ciao Charl __________________________________________________________________________ [ Charl Matthee ] [ +27-11-721-3800 ] [ Systems Manager ] [ +27-11-405-6508 ] __________________________________________________________________________ From skip at pobox.com Sun Apr 13 18:10:18 2003 From: skip at pobox.com (Skip Montanaro) Date: Sun Apr 13 18:10:31 2003 Subject: [Spambayes] Mixed case words in heading In-Reply-To: <3E99A6D3.3040508@fure.net> References: <200304130522.h3D5MDo14232@localhost.localdomain> <3E99A6D3.3040508@fure.net> Message-ID: <16025.57418.521556.704207@montanaro.dyndns.org> Jan> My original question was whether mixed case should be penalized: It's easy enough to tweak the spambayes tokenizer to generate a synthetic token for unusually capitalized words. Then, you don't assign a penalty to it, but let the classifier decide if it is a hammy or spammy (or neither) clue. A weird idea just crossed my mind. Has anyone ever tested the performance of the system using only synthetic tokens, no real content? Skip From skip at pobox.com Sun Apr 13 18:25:26 2003 From: skip at pobox.com (Skip Montanaro) Date: Sun Apr 13 18:25:40 2003 Subject: [Spambayes] mboxtrain.py chokes on non-existent directories In-Reply-To: <20030413210323.GG7614@sa02.infosat.net> References: <20030413210323.GG7614@sa02.infosat.net> Message-ID: <16025.58326.760755.955242@montanaro.dyndns.org> Charl> I you run mboxtrain.py with -g or -s set to a non-existent Charl> directory you end up with and error like: ... I just checked in a change to mboxtrain.py but can't really test it because I don't use that code. Please "cvs up" and give it a try. 
Skip From T.A.Meyer at massey.ac.nz Mon Apr 14 11:49:46 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 13 18:50:22 2003 Subject: [Spambayes] Pop3 proxy UI - doesn't display correct port Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130139CA76@its-xchg4.massey.ac.nz> > Ah. It only does this when run via the Win32 service code in > windows/pop3proxy_service.py. So running pop3proxy.py from > the command line shows the right UI, but the service doesn't. Wow, someone is actually using that! ;) > Got it. This patch fixes the problem: [...] Thanks for figuring out the problem & solution; I'll check it in. =Tony Meyer From noreply at sourceforge.net Sun Apr 13 19:14:00 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sun Apr 13 20:57:22 2003 Subject: [Spambayes] [ spambayes-Feature Requests-670573 ] IMAP proxy Message-ID: Feature Requests item #670573, was opened at 2003-01-19 19:51 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=670573&group_id=61702 Category: None Group: None >Status: Pending Priority: 5 Submitted By: Jean-Marc Valin (jmvalin) >Assigned to: Tony Meyer (anadelonbrin) Summary: IMAP proxy Initial Comment: I use IMAP for my mail, so I think an IMAP proxy for spambayes would be great. ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-14 13:14 Message: Logged In: YES user_id=552329 A simple IMAP filter is in cvs. This is very much in testing/development at the moment, but it'll be there soon. Anyone wanting to use spambayes with IMAP could please help test it. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=670573&group_id=61702 From david at theresistance.net Sun Apr 13 23:49:21 2003 From: david at theresistance.net (David Shaw) Date: Sun Apr 13 22:49:32 2003 Subject: [Spambayes] Tough to classify In-Reply-To: Message-ID: > I have no doubt that it was obviously ham to you, but don't accept it > would > have been obvious ham to humans other than you. For example, > You're right of course. The mail included this spammie bit: Need to give a gift? Not sure what to buy? Amazon.com gift certificates are available in any dollar amount from $5 to $5,000. We'll deliver it via e-mail or physical mail-- so it's the perfect last minute gift. Learn more at http://www.amazon.com/gift-certificates/ > You must have many ham clues, else your *H* score wouldn't have been > 0.98. It had lots of strong clues for both. > There are many ways to combine the individual word spamprobs so that > the msg > will come out as ham. The trick is to do so in a way that doesn't also > classify more spam as ham. The combination method in spambayes is the > end > result of some intense work on the topic by several people, and beat > about a > dozen other combination methods in large tests. That doesn't mean > it's the > best possible combination method, but does suggests it won't be > trivial to > do better. Oh I know. I read the math in the chi squared code on Gary's page and quickly got in over my head. I took some probability math classes in college, but it's been a few years. Maybe I just need to adjust my thresholds. This message scored: X-Spambayes-Spam-Probability: 0.288224866953 I have my ham threshold at .2 and my spam at .8. Almost always when a message is unsure it is really spam. This time it was ham. I think maybe I just need to set the thresholds to .3 and .7 and see how that goes for a while. 
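
The effect of moving the cutoffs can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `classify`, its parameter names, and the handling of scores that fall exactly on a cutoff are assumptions made for the example, not taken from the spambayes code. The three-way ham/unsure/spam split and the numbers come from the message above.

```python
def classify(score, ham_cutoff=0.2, spam_cutoff=0.8):
    """Map a spam probability to 'ham', 'unsure' or 'spam'.

    Hypothetical helper; boundary handling (strict comparisons) is an
    assumption for illustration only.
    """
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


score = 0.288224866953  # the X-Spambayes-Spam-Probability value quoted above

print(classify(score))            # cutoffs 0.2 / 0.8 -> unsure
print(classify(score, 0.3, 0.7))  # cutoffs 0.3 / 0.7 -> ham
```

Narrowing the unsure band rescues this particular message, but any spam scoring between 0.2 and 0.3 would then be delivered as ham as well, which is the trade-off being weighed.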
> The combination code (in classifier.py) is about the easiest part of
> the system to change, so feel encouraged to test alternatives. "I feel
> like" isn't really testable on its own .

I love python for this very reason :) If only I could figure out that dibbler stuff -- it seems very complicated (and slow, at least on OS X) for what it's doing. I'd love to replace it with something simpler and faster.

From list2003 at fure.net Sun Apr 13 23:05:26 2003
From: list2003 at fure.net (Jan Fure)
Date: Mon Apr 14 01:02:46 2003
Subject: [Spambayes] Pop mailbox filtering
In-Reply-To: <20030413210323.GG7614@sa02.infosat.net>
References: <20030413210323.GG7614@sa02.infosat.net>
Message-ID: <3E9A4196.3090707@fure.net>

Dear fellow spam fighters;

Some of the requirements for a good spam tool can be summarized as follows:

1. Effective filtering, so I don't have to see the spam.
2. Mobile, in the sense that the tool filters various pop3 mailboxes, such that the mail can be checked without being at home, without manually wading through the spam.
3. There should be safeguards against losing a message. (These are simple if spambayes runs on a server with UNIX mailboxes, but being connected through @home/comcast, I am not counting on being able to telnet in to my home PC.)

I think I have achieved this through spambayes, and a script which deletes messages from my pop3 mailbox if determined to be spam. Here are the mechanics of it:

1. Script connects to the pop mailbox through the pop3proxy.py based proxy.
2. Script parses the 'X-Spambayes-Classification:' header, and if 'ham' or 'unsure' does nothing; if 'spam', the message gets downloaded (in order to have the option of training the classifier), deleted, and my local sendmail will send a message to the address in the 'Reply-To:' field, or if non-existent, the 'From:' field, with the following text:

   I am sorry to inform you that your recent message was determined to be automated by my mail-filter, and in the event this was a mistake, please re-send the message, it will most likely get through this filter if you change the format to plain text.

The subject line is 'Your Recent Message to Jan Fure', which will let anybody sending an important E-mail realize it did not reach its destination, and why, whereas this is likely to be useless to a spammer, and in the event the 'From:' or 'Reply-To:' addresses are mail-bots programmed to decipher good addresses from subject lines or sender, neither will match anything in its databases, as my 'From:' field only gives a hostname for which there is no routing, and I am not re-using the subject line, which I will assume is unique in the case of a competent spammer.

Has anybody else done something like this?

My biggest qualm is the possible event that spam with a fake 'Reply-To:' field causes me to send unsolicited E-mail to innocent third parties. But I will live with this possibility, I still think my behavior is just, but possibly not gentle.

In the testing phase, I have been running the script on the proxy which receives the spambayes mailing list messages, and none of them have bounced yet.

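
The header check in step 2 can be sketched with the standard `email` package. This is a hedged illustration, not Jan's script: the function name `action_for` and the 'keep'/'quarantine' labels are hypothetical, and the allowance for extra text after a semicolon in the header value is an assumption. The sketch deliberately stops at the decision and sends no automatic reply.

```python
from email import message_from_string


def action_for(raw_message):
    """Return 'quarantine' for mail labelled spam, otherwise 'keep'.

    Hypothetical helper: reads the X-Spambayes-Classification header
    added by the proxy and ignores anything after a ';' (an assumption
    about how extra detail might be appended to the value).
    """
    msg = message_from_string(raw_message)
    value = msg.get("X-Spambayes-Classification") or ""
    label = value.split(";")[0].strip().lower()
    # 'ham', 'unsure' and unlabelled messages are all left alone.
    return "quarantine" if label == "spam" else "keep"


sample = "X-Spambayes-Classification: spam\nSubject: test\n\nbody\n"
print(action_for(sample))  # quarantine
```

Keeping quarantined messages locally rather than deleting them preserves the option of retraining on mistakes.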
Jan From tim at fourstonesExpressions.com Mon Apr 14 01:13:34 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 14 01:13:47 2003 Subject: [Spambayes] Pop mailbox filtering In-Reply-To: <3E9A4196.3090707@fure.net> Message-ID: 4/14/2003 12:05:26 AM, Jan Fure wrote: >My biggest qualm is the possible event that spam with a fake 'Reply-To:' >field cauces me to send unsolicited E-mail to innocent third parties. >But I will live with this possibility, I still think my behavior is >just, but possibly not gentle. This is a huge problem, made even larger by the fact that you're not the one who has to live with it. In your case, someone would only receive a single bounce from you. But if spambayes were being used by millions of people, and a single spam with a reply-to that is an innocent real address (this DOES happen) were sent to those millions, that individual would receive an incredible flood of mail. This is THE reason we do not do this kind of thing. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From T.A.Meyer at massey.ac.nz Mon Apr 14 19:16:58 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 14 02:17:35 2003 Subject: [Spambayes] Tough to classify Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130139CC69@its-xchg4.massey.ac.nz> > I love Python for this very reason :) If only I could figure > out that dibbler stuff -- it seems very complicated (and slow, > at least on OS X) for what it's doing. I'd love to replace it > with something simpler and faster. Dibbler's great! It does take a bit of getting used to, but just work through the code in pop3proxy or optionsconfig along with the dibbler.__doc__ and you'll see what it does. I don't think you'd easily be able to replace it with something simpler that kept the power that dibbler offers. 
Not sure about slow - doesn't seem that way to me, but I'm not using OSX, either. What exactly seems slow? Serving the pages? There might be something that the app using dibbler (pop3proxy or optionsconfig for example) can do to speed things up a bit. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 14 19:22:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 14 02:22:56 2003 Subject: [Spambayes] baysian news filter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130139CC6B@its-xchg4.massey.ac.nz> > so how does spambayes fit into this idea? > the p2p-nodes pass news through the system. the users/readers > can't read all articles flooding through their computer, so > the system has to decide which news might be interesting for > it's user and should therefore be displayed or stored. I want > to use a Bayesian approach for this and I want to program it > in python - bingo! seems to me, that this problem is quite > similar if not identical to spam-recognition... If you only want a binary interesting/non-interesting classification, then you can probably use spambayes almost untouched. You might need to add a few extra token generators, but depending on what the clues for 'interesting' are, you might not. In this case, you can basically just call tokenise() and classify() from whatever framework you set up. Obviously you wouldn't need the whole spambayes package. > please contact me off list. I copied this to the list since non-spam uses of spambayes has come up before and might interest others. You might want to look through the archives for a reasonably recent (March, I think) message about using spambayes to classify database records IIRC. 
=Tony Meyer From tim at fourstonesExpressions.com Mon Apr 14 09:02:24 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 14 09:02:39 2003 Subject: [Spambayes] baysian news filter In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130139CC6B@its-xchg4.massey.ac.nz> Message-ID: 4/14/2003 1:22:13 AM, "Meyer, Tony" wrote: >I copied this to the list since non-spam uses of spambayes has come up >before and might interest others. You might want to look through the >archives for a reasonably recent (March, I think) message about using >spambayes to classify database records IIRC. The thread you mention was started by Skip Montanaro on March 27, 2003, titled "Non-email use of the spambayes project." The body of the first posting in that thread is as follows: I've successfully applied the Spambayes code (http://spambayes.sf.net/) to a non-email application today and thought I'd pass the concept along to others. Many of you on c.l.py probably are aware of the Spambayes project which relies on user segregation of a set of email messages into spam and ham, then combines the resulting clues they contain to predict the hamminess or spamminess of email messages it hasn't seen before. It works extremely well for this, but the basic concept is applicable to other classification problems. I've operated the Mojam and Musi-Cal websites for several years. Over that time we've accumulated a sizable venue database. Unfortunately, many entries in the database have become stale and don't contribute anything to the system other than to slow down queries. Venue names get misspelled, venues go out of business, non-venue stuff slips into the database, or other errors occur. As a result, I had a venue database containing roughly 35,000 entries, only about half of which were referenced by concert items in the database. The database as it sat couldn't be licensed to potential customers because of all the errors it contained. 
I could simply delete all of those entries, but that would delete a lot of useful content from the database. Many of those currently unreferenced venue entries *are* correct and will eventually be associated with other concerts, or will be useful as corollary information for people using our websites or as an extra database we can license to content consumers. I wrote a trivial little application today which allowed me to rummage through the unreferenced records in the database. I could delete entries which I felt were incorrect, but it was a one-at-a-time process. With 15,000+ entries to scan, one-by-one wasn't going to cut it. Then I got the idea to use the Spambayes classifier to watch what I was doing and train on my actions. I was viewing the records in chunks of 20 items at a time, sorted alphabetically. I could choose to delete one or more items or move onto the next chunk of 20 entries. A deletion caused the classifier to be trained on the entry as "spam". Moving onto the next chunk caused the classifier to be trained on the remaining undeleted entries as "ham". Over a short period of time, it got reasonably good at identifying "spam". I then started sorting each chunk of 20 items by its spambayes score and could specify a threshold score below which to eliminate all entries in that chunk. The next improvement was to sort the entire mess of records by the spambayes classification. I was then seeing entire chunks of records whose scores fell below the threshold and was able to delete them 20 at a time. 
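
The review loop described above (sort everything by score, page through it 20 records at a time, drop whatever falls below a threshold) can be sketched independently of Spambayes. The names `chunks` and `review` are hypothetical, and the integer "score" stored on each toy record is a stand-in for a real classifier score; this is one possible reading of the workflow, not the application itself.

```python
def chunks(records, size=20):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def review(records, score, threshold):
    """Sort by score, then split each chunk into (delete, keep) lists."""
    ordered = sorted(records, key=score)
    for chunk in chunks(ordered):
        doomed = [r for r in chunk if score(r) < threshold]
        kept = [r for r in chunk if score(r) >= threshold]
        yield doomed, kept


# Toy data: 50 records whose "score" is just a stored integer.
venues = [{"venue": "v%d" % i, "score": i} for i in range(50)]
batches = list(review(venues, lambda r: r["score"], threshold=20))

# 3 chunks; the first is deleted wholesale, the second kept wholesale.
print(len(batches), len(batches[0][0]), len(batches[1][1]))  # 3 20 20
```

Sorting the whole set first is what makes entire chunks fall on one side of the threshold, so they can be handled 20 at a time.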
The entire Spambayes code is a single tokenizer generator function and a small Classifier class:

    import spambayes.storage

    class Classifier:
        def __init__(self):
            self.cls = spambayes.storage.DBDictClassifier("fven.db")

        def classify(self, d):
            return self.cls.spamprob(tokenize(d), True)

        def train(self, d, saved):
            self.cls.learn(tokenize(d), saved)

        def __del__(self):
            self.cls.store()

    def tokenize(d):
        # d is a dictionary as returned by a MySQL query - tokenize the
        # various fields, noting interesting facts
        yield "venue length:%d" % len(d["venue"])
        for word in d["venue"].split():
            # looks like a festival - not a venue at all
            if word.lower().endswith("fest"):
                yield "venue:"
            yield "venue:"+word
        # most correct venue names don't contain punctuation
        if (string.translate(d["venue"], null_xlate, string.punctuation) !=
            d["venue"]):
            yield "venue:"
        # no address information for this venue - less valuable
        if not d["addr1"]:
            yield "addr1:"
        elif d["addr1"][0] not in string.digits:
            # most valid addresses in the US/Canada begin with a street number
            yield "addr1:"
        for word in d["addr1"].split():
            yield "addr1:"+word
        for word in d["addr2"].split():
            yield "addr2:"+word
        yield "phone:"+d["phone"]
        yield "city:"+d["city"].strip()
        yield "region:"+(d["state"].strip() or d["country"].strip())
        yield "zip:"+d["zip"]
        # sometimes the city gets replicated in the address, making the
        # data "dirtier" and thus less valuable
        vwords = d["venue"].lower().split()
        for word in d["city"].lower().split():
            if word in vwords:
                yield "city:"
                break
        # the record's id reflects its age - older records, and thus
        # smaller ids, are more likely to be outdated
        try:
            yield "id:2**%.0f" % math.log(int(d["id"]) // 100)
        except OverflowError:
            yield "id:2**0"
        return

    ...

    classifier = Classifier()

The input to the tokenizer, instead of being an email message, is a dictionary representing the return value from an SQL query.

When an item is to be deleted, it gets classified like so:

    classifier.train(d, False)

When moving to the next chunk, the remaining records are classified like so:

    for item in chunk:
        classifier.train(item, True)

I haven't gotten too crazy with the tokenizer (compare it with the Spambayes tokenizer!). I will probably collect some other clues in the tokenizer, such as what other tables reference the venue record. For the time being, it's working okay. I just need it to do a reasonably good job segregating records so I can quickly scan a group and make a deletion decision. So far, it's doing a very good job. Not bad for 15-30 minutes of work...

Skip

http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary, and those who don't.

From charl at infosat.net Mon Apr 14 17:54:20 2003
From: charl at infosat.net (Charl Matthee)
Date: Mon Apr 14 10:54:29 2003
Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages
Message-ID: <20030414145420.GZ7614@sa02.infosat.net>

Hi,

If I try and run mboxtrain.py on a mbox that contains a bugtraq message it dies with the following:

Traceback (most recent call last):
  File "/home/charl/projects/spambayes/mboxtrain.py", line 292, in ?
    main()
  File "/home/charl/projects/spambayes/mboxtrain.py", line 279, in main
    train(h, g, False, force)
  File "/home/charl/projects/spambayes/mboxtrain.py", line 212, in train
    mbox_train(h, path, is_spam, force)
  File "/home/charl/projects/spambayes/mboxtrain.py", line 151, in mbox_train
    outf.write(msg.as_string(True))
  File "/usr/lib/python2.2/email/Message.py", line 107, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python2.2/email/Generator.py", line 100, in flatten
    self._write(msg)
  File "/usr/lib/python2.2/email/Generator.py", line 128, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.2/email/Generator.py", line 154, in _dispatch
    meth(msg)
  File "/usr/lib/python2.2/email/Generator.py", line 243, in _handle_multipart
    g.flatten(part, unixfrom=False)
  File "/usr/lib/python2.2/email/Generator.py", line 100, in flatten
    self._write(msg)
  File "/usr/lib/python2.2/email/Generator.py", line 128, in _write
    self._dispatch(msg)
  File "/usr/lib/python2.2/email/Generator.py", line 154, in _dispatch
    meth(msg)
  File "/usr/lib/python2.2/email/Generator.py", line 212, in _handle_text
    raise TypeError, 'string payload expected: %s' % type(payload)
TypeError: string payload expected:

I am happy to provide a copy of such an email message to the responsible party, out of band.

Ciao Charl __________________________________________________________________________ [ Charl Matthee ] [ +27-11-721-3800 ] [ Systems Manager ] [ +27-11-405-6508 ] __________________________________________________________________________ From tim at fourstonesExpressions.com Mon Apr 14 11:08:47 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 14 11:09:00 2003 Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages In-Reply-To: <20030414145420.GZ7614@sa02.infosat.net> Message-ID: <5MNKJDMGLKIJHSO62FBRQ4Z4XEB2W.3e9aceff@myst> 4/14/2003 9:54:20 AM, Charl Matthee wrote: >I am happy to provide a copy of such an email message to the responsible >party, out of band. Does this message have a mimetype of multipart/digest? If so, I think we already have an open bug on this one. If not, then this looks like a new one. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From charl at infosat.net Mon Apr 14 18:17:45 2003 From: charl at infosat.net (Charl Matthee) Date: Mon Apr 14 11:18:05 2003 Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages In-Reply-To: <5MNKJDMGLKIJHSO62FBRQ4Z4XEB2W.3e9aceff@myst> References: <20030414145420.GZ7614@sa02.infosat.net> <5MNKJDMGLKIJHSO62FBRQ4Z4XEB2W.3e9aceff@myst> Message-ID: <20030414151745.GD7614@sa02.infosat.net> On Mon Apr 14 2003 at 10:08:47AM -0500 'Tim Stone - Four Stones Expressions' wrote: > Does this message have a mimetype of multipart/digest? If so, I think we > already have an open bug on this one. If not, then this looks like a new one. It does not look like it. 
The message contains the following MIME parts:

Message Name                                  MIME Encoding
 1                                            [text/plain, 7bit, us-ascii, 1.6K]
 2  [argv] BitchX-353 Vulnerability           [message/rfc822, 7bit, 7.2K]
 3  [SecurityOffice] Netcharts XBRL Server    [message/rfc822, 7bit, 3.2K]
 4  php-Board (php)                           [message/rfc822, 7bit, 1.1K]
 5  Kietu ( PHP )                             [message/rfc822, 7bit, 1.9K]
 6  DotBr (PHP)                               [message/rfc822, 7bit, 1.6K]
 7  Presentation on Writing Secure Programs   [message/rfc822, 7bit, 0.9K]
 8  D-Forum (PHP)                             [message/rfc822, 7bit, 1.2K]
 9  GLSA: nethack                             [message/rfc822, 7bit, 1.3K]
10  Re: Riched20.DLL attribute label buffer   [message/rfc822, 7bit, 1.0K]
11  Re: /usr/bin/enq and /usr/bin/X11/aixter  [message/rfc822, 7bit, 3.1K]
12  [OpenPKG-SA-2003.009] OpenPKG Security A  [message/rfc822, 7bit, 4.2K]
13  [OpenPKG-SA-2003.010] OpenPKG Security A  [message/rfc822, 7bit, 4.7K]
14  [OpenPKG-SA-2003.011] OpenPKG Security A  [message/rfc822, 7bit, 4.0K]
15  SuSE Security Announcement: imp (SuSE-SA  [message/rfc822, 7bit, 16K]
16  SuSE Security Announcement: mod_php4 (Su  [message/rfc822, 7bit, 13K]
17  CSSA-2003-007.0 Advisory withdrawn. Re:   [message/rfc822, 7bit, 1.2K]
18  Re: CSSA-2003-007.0 Advisory withdrawn.   [message/rfc822, 7bit, 1.7K]

Ciao

Charl
__________________________________________________________________________
[ Charl Matthee ] [ +27-11-721-3800 ]
[ Systems Manager ] [ +27-11-405-6508 ]
__________________________________________________________________________

From tim at fourstonesExpressions.com Mon Apr 14 11:33:56 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Apr 14 11:34:11 2003
Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages
In-Reply-To: <20030414151745.GD7614@sa02.infosat.net>
Message-ID:

4/14/2003 10:17:45 AM, Charl Matthee wrote:

>On Mon Apr 14 2003 at 10:08:47AM -0500 'Tim Stone - Four Stones Expressions' wrote:
>
>> Does this message have a mimetype of multipart/digest? If so, I think we
>> already have an open bug on this one. If not, then this looks like a new one.
>
>It does not look like it. The message contains the following MIME parts:

Ok, can you share the message with me. Off list is fine if that's what you want. If you can attach it as a tar or zip, it will ensure that it isn't mangled by my mail reader... :)

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary, and those who don't.

From tim at fourstonesExpressions.com Mon Apr 14 13:54:14 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Apr 14 13:54:26 2003
Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages
In-Reply-To: <20030414151745.GD7614@sa02.infosat.net>
Message-ID:

4/14/2003 10:17:45 AM, Charl Matthee wrote:

>On Mon Apr 14 2003 at 10:08:47AM -0500 'Tim Stone - Four Stones Expressions' wrote:
>
>> Does this message have a mimetype of multipart/digest? If so, I think we
>> already have an open bug on this one. If not, then this looks like a new one.
>
>It does not look like it. The message contains the following MIME parts:
>

This is a multipart/digest message, and a known problem. Keep an eye out for the fix checkin. It'll get fixed one of these days.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary, and those who don't.

From popiel at wolfskeep.com Mon Apr 14 12:07:05 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Mon Apr 14 14:07:11 2003
Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages
In-Reply-To: Message from Tim Stone - Four Stones Expressions of "Mon, 14 Apr 2003 12:54:14 CDT."
References:
Message-ID: <20030414180705.A4B292DE9A@cashew.wolfskeep.com>

In message:
 writes:
>
>This is a multipart/digest message, and a known problem. Keep an eye out for
>the fix checkin. It'll get fixed one of these days.

Here's a question: what is the proper behaviour for these messages? Should the entire message get a ham/spam score, should the individual sub-messages get their own scores, or both? If both, how should the individual scores be combined into the overall score? Should the digest be broken into multiple messages: one containing ham, one containing spam, and one containing unsure? My initial impulse is to score each sub-message individually, and if any of them are ham, mark the entire thing as ham. If none are ham, but some are unsure, mark the overall message as unsure. Otherwise mark it as spam. As to the debug clue headers, I have no idea how to handle them... - Alex From tim at fourstonesExpressions.com Mon Apr 14 15:45:19 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 14 15:45:28 2003 Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages In-Reply-To: <20030414180705.A4B292DE9A@cashew.wolfskeep.com> Message-ID: <8297SPNJ2XD96IFJDL1V3WB7KJW.3e9b0fcf@myst> 4/14/2003 1:07:05 PM, "T. Alexander Popiel" wrote: >In message: > writes: >> >>This is a multipart/digest message, and a known problem. Keep an eye out for >>the fix checkin. It'll get fixed one of these days. > >Here's a question: what is the proper behaviour for these messages? > >Should the entire message get a ham/spam score, should the individual >sub-messages get their own scores, or both? If both, how should the >individual scores be combined into the overall score? Should the digest >be broken into multiple messages: one containing ham, one containing >spam, and one containing unsure? I've spent a bit of time thinking about this, and there really is no good answer that I can come up with. Splitting the digest into three (ham/spam/unsure) digests makes the most sense, but there isn't much facility in the current email package to manage this, I don't think. 
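
The combination rule Alex proposes above (any ham sub-message makes the digest ham, otherwise any unsure makes it unsure, otherwise spam) is small enough to write down directly. This is a minimal sketch: the function name `combine` is hypothetical, the per-sub-message labels are assumed to have been produced elsewhere, and returning 'unsure' for an empty digest is an assumption that goes beyond what the message specifies.

```python
def combine(labels):
    """Fold per-sub-message labels into one label for the whole digest."""
    if not labels:
        # Assumption: a digest with no scorable parts is left as unsure.
        return "unsure"
    if "ham" in labels:
        return "ham"
    if "unsure" in labels:
        return "unsure"
    return "spam"


print(combine(["spam", "ham", "spam"]))  # ham
print(combine(["spam", "unsure"]))       # unsure
print(combine(["spam", "spam"]))         # spam
```

The rule is biased toward delivery: one legitimate sub-message is enough to let the whole digest through. It says nothing about how to train on a mixed digest, which is the gap raised in the reply.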
> >My initial impulse is to score each sub-message individually, and if >any of them are ham, mark the entire thing as ham. If none are ham, >but some are unsure, mark the overall message as unsure. Otherwise >mark it as spam. As to the debug clue headers, I have no idea how >to handle them... This might handle things, but doesn't make training work. Again, spliting the digest makes the most sense to me. > >- Alex > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From noreply at sourceforge.net Mon Apr 14 15:26:09 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 14 17:08:45 2003 Subject: [Spambayes] [ spambayes-Bugs-717998 ] Can't reset Spam folder if folder is lost Message-ID: Bugs item #717998, was opened at 2003-04-09 00:37 Message generated for change (Comment added) made by astrogen You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Benjamin J. Judson (astrogen) Assigned to: Mark Hammond (mhammond) Summary: Can't reset Spam folder if folder is lost Initial Comment: If the Spam Manager is set up to move spam to a folder and that folder disappears, the Spam Manager may show that spam is to be delivered to . In this event trying to browse the folder list will not list any folders, and you will be unable to set the Spam folder to anything else. ---------------------------------------------------------------------- >Comment By: Benjamin J. Judson (astrogen) Date: 2003-04-14 16:26 Message: Logged In: YES user_id=752965 Where is the log file kept? I looked for it before submitting, and since your posted. 
I don't have the name or location for where I could find it. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-10 01:06 Message: Logged In: YES user_id=14198 If there a traceback associated with this? I regularly "test" this, thanks to Outlook screwing all my folder IDs as I reconfigure Outlook, and I don't have the problem. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 From noreply at sourceforge.net Mon Apr 14 15:32:42 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 14 17:16:02 2003 Subject: [Spambayes] [ spambayes-Bugs-717998 ] Can't reset Spam folder if folder is lost Message-ID: Bugs item #717998, was opened at 2003-04-09 00:37 Message generated for change (Comment added) made by astrogen You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Benjamin J. Judson (astrogen) Assigned to: Mark Hammond (mhammond) Summary: Can't reset Spam folder if folder is lost Initial Comment: If the Spam Manager is set up to move spam to a folder and that folder disappears, the Spam Manager may show that spam is to be delivered to . In this event trying to browse the folder list will not list any folders, and you will be unable to set the Spam folder to anything else. ---------------------------------------------------------------------- >Comment By: Benjamin J. Judson (astrogen) Date: 2003-04-14 16:32 Message: Logged In: YES user_id=752965 Just figured there would be a log file somewhere.. but anywho... 
I manually ran manager.py

Here's the traceback (copied and pasted from a dos prompt window)

Traceback (most recent call last):
  File "C:\spambayes-1.0a2\Outlook2000\dialogs\FolderSelector.py", line 309, in OnInitDialog
    self.expand_ids = self._DetermineFoldersToExpand()
  File "C:\spambayes-1.0a2\Outlook2000\dialogs\FolderSelector.py", line 226, in _DetermineFoldersToExpand
    folder = self.manager.message_store.GetFolder(folder_id)
  File "C:\spambayes-1.0a2\Outlook2000\msgstore.py", line 225, in GetFolder
    table = folder.GetContentsTable(0)
pywintypes.com_error: (-2147467259, 'Unspecified error', None, None)
win32ui: OnInitDialog() virtual handler (>) raised an exception
Bayes database is not dirty - not writing

----------------------------------------------------------------------

Comment By: Benjamin J. Judson (astrogen)
Date: 2003-04-14 16:26

Message:
Logged In: YES
user_id=752965

Where is the log file kept? I looked for it before submitting, and since your posted. I don't have the name or location for where I could find it.

----------------------------------------------------------------------

Comment By: Mark Hammond (mhammond)
Date: 2003-04-10 01:06

Message:
Logged In: YES
user_id=14198

If there a traceback associated with this? I regularly "test" this, thanks to Outlook screwing all my folder IDs as I reconfigure Outlook, and I don't have the problem.

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=717998&group_id=61702 From noreply at sourceforge.net Tue Apr 15 02:10:42 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 15 03:53:11 2003 Subject: [Spambayes] [ spambayes-Bugs-721664 ] mboxtrain.py doesn't find Maildir tmp/ directory properly Message-ID: Bugs item #721664, was opened at 2003-04-15 04:10 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=721664&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: David M. Cooke (dmcooke) Assigned to: Nobody/Anonymous (nobody) Summary: mboxtrain.py doesn't find Maildir tmp/ directory properly Initial Comment: The Maildir handler for mboxtrain.py tries to use a tmp/ under the directory passed to it. It should use a tmp/ directory at the same level (so given a ~/Maildir/cur, it should use ~/Maildir/tmp). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=721664&group_id=61702 From anthony at interlink.com.au Tue Apr 15 19:02:54 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Tue Apr 15 04:04:37 2003 Subject: [Spambayes] mboxtrain.py chokes on bugtraq email messages In-Reply-To: <20030414180705.A4B292DE9A@cashew.wolfskeep.com> Message-ID: <200304150802.h3F82tm19835@localhost.localdomain> >>> "T. Alexander Popiel" wrote > > Here's a question: what is the proper behaviour for these messages? > [multipart/digest] > > Should the entire message get a ham/spam score, should the individual > sub-messages get their own scores, or both? If both, how should the > individual scores be combined into the overall score? 
Should the digest > be broken into multiple messages: one containing ham, one containing > spam, and one containing unsure? The problem is working out a meaning for scoring parts of a message, and making them visible to the user. I'd be inclined towards simply marking the message as a whole (with multiple to: tokens, &c). If the user's got a sufficiently clueful mailer (like MH :) they can burst the digests before the scoring happens, in the event that they want the individual messages scored... Anthony -- Anthony Baxter It's never too late to have a happy childhood. From matos at attbi.com Tue Apr 15 19:40:46 2003 From: matos at attbi.com (David Matos) Date: Tue Apr 15 18:41:16 2003 Subject: [Spambayes] Outlook addin clipboard bug, etc. Message-ID: <000201c303a0$0e91c230$8d80b042@dexter> First off, thanks to you guys for doing such a fine job with this great anti-spam tool. I noticed a small but annoying bug while using the "002" version of the Outlook Spambayes addin. I'm running Outlook 2002 and WinXP Pro. I find that when I put text on the clipboard--an URL for example--then launch Outlook to compose a new message to someone, the clipboard text has been replaced instead by the smiley icon from the "recover from spam" button. If I then try again by copying the text a second time it works OK. Is there any way to fix this? I apologize in advance if this bug has already been reported. Also, is it possible to customize the buttons for the addin? For example, is there a way to change the text from "recover from spam" to "unspam"? Can the icons be changed too? Thanks in advance for reading this. And thanks again for a great piece of software. 
--Dave Matos From noreply at sourceforge.net Tue Apr 15 18:16:26 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 15 19:59:42 2003 Subject: [Spambayes] [ spambayes-Bugs-675812 ] Outlook registration/doc issues Message-ID: Bugs item #675812, was opened at 2003-01-28 12:40 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702 Category: Outlook Group: None >Status: Closed Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Outlook registration/doc issues Initial Comment: The plugin should be listed in Outlook's COM plug-ins list. In fact, the doc says that this is so! This is not the case (here at least). This would allow nice removal (and addition??) rather than running addin.py --unregister and so on. ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-16 12:16 Message: Logged In: YES user_id=552329 Ever since the first binary this has been fixed (although the docs *could* specify that the plugin will only be listed if you use a binary). I thought I had already closed this, but was still open, so I'll close it now. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702 From mhammond at skippinet.com.au Wed Apr 16 12:08:41 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Tue Apr 15 21:09:46 2003 Subject: [Spambayes] Outlook addin clipboard bug, etc. In-Reply-To: <000201c303a0$0e91c230$8d80b042@dexter> Message-ID: <003501c303b4$b9786e60$530f8490@eden> > I noticed a small but annoying bug while using the "002" > version of the > Outlook Spambayes addin. I'm running Outlook 2002 and WinXP > Pro. 
I find that > when I put text on the clipboard--an URL for example--then > launch Outlook to > compose a new message to someone, the clipboard text has been replaced > instead by the smiley icon from the "recover from spam" > button. If I then > try again by copying the text a second time it works OK. Is > there any way to > fix this? I apologize in advance if this bug has already been > reported. Believe it or not, this is the only way to stick a custom image on a toolbar or button. I guess what I could do is save and restore the clipboard data, but this can get messy quickly if the data is not simple text. > Also, is it possible to customize the buttons for the addin? > For example, is > there a way to change the text from "recover from spam" to > "unspam"? Can the > icons be changed too? Nope, but if we can agree on better names for the buttons, that would be great. Another option would be to stick that button on the dropdown, but that seems less useful. Mark. From charl at infosat.net Wed Apr 16 09:26:19 2003 From: charl at infosat.net (Charl Matthee) Date: Wed Apr 16 02:26:34 2003 Subject: [Spambayes] mboxtrain.py chokes on non-existent directories In-Reply-To: <16025.58326.760755.955242@montanaro.dyndns.org> References: <20030413210323.GG7614@sa02.infosat.net> <16025.58326.760755.955242@montanaro.dyndns.org> Message-ID: <20030416062619.GI14917@sa02.infosat.net> On Sun Apr 13 2003 at 05:25:26PM -0500 'Skip Montanaro' wrote: > I just checked in a change to mboxtrain.py but can't really test it because > I don't use that code. Please "cvs up" and give it a try. Here is what I get: charl@sa02:~spambayes$ mboxtrain.py -d $HOME/.hammiedb -g /nonexistent Training ham (/nonexistent): Traceback (most recent call last): File "spambayes/mboxtrain.py", line 294, in ? 
    main()
  File "spambayes/mboxtrain.py", line 281, in main
    train(h, g, False, force, trainnew)
  File "spambayes/mboxtrain.py", line 212, in train
    raise ValueError("Nonexistent path: %s" % path)
ValueError: Nonexistent path: /nonexistent

charl@sa02:~spambayes$ mboxtrain.py -d $HOME/.hammiedb -s /nonexistent
Training spam (/nonexistent):
Traceback (most recent call last):
  File "spambayes/mboxtrain.py", line 294, in ?
    main()
  File "spambayes/mboxtrain.py", line 286, in main
    train(h, s, True, force, trainnew)
  File "spambayes/mboxtrain.py", line 212, in train
    raise ValueError("Nonexistent path: %s" % path)
ValueError: Nonexistent path: /nonexistent

Looks good.

Ciao

Charl
__________________________________________________________________________
[ Charl Matthee ] [ +27-11-721-3800 ]
[ Systems Manager ] [ +27-11-405-6508 ]
__________________________________________________________________________

From lists at olivermaunder.co.uk  Wed Apr 16 12:08:59 2003
From: lists at olivermaunder.co.uk (Oliver Maunder)
Date: Wed Apr 16 06:09:19 2003
Subject: [Spambayes] IMAPFilter
Message-ID: <3E9D2BBB.9090500@olivermaunder.co.uk>

Hi all

I've been playing around with imapfilter.py, which is shaping up very nicely (thanks Tony and Tim!).

However, I'm having a couple of problems with the latest version, which isn't too surprising. To quote Tim's CVS comment: "IMAP seems to be a really flukey kind of interface, and until it's been used on lots of imap servers, by lots of people, I won't be convinced that it's really correct."

Well, for a start my IMAP server doesn't seem to like the preserved timestamps on the messages. The error message is

Timestamp: Wed, 02 Apr 2003 10:12:56 +0100
Traceback (most recent call last):
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 415, in ?
    imap_filter.Filter()
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 323, in Filter
    folder.Filter(self.classifier, self.spam_folder, self.unsure_folder)
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 279, in Filter
    msg.Save()
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 151, in Save
    time_stamp, self.as_string())
  File "C:\Program Files\Python22\lib\imaplib.py", line 296, in append
    return self._simple_command(name, mailbox, flags, date_time)
  File "C:\Program Files\Python22\lib\imaplib.py", line 925, in _simple_command
    return self._command_complete(name, apply(self._command, (name,) + args))
  File "C:\Program Files\Python22\lib\imaplib.py", line 762, in _command_complete
    raise self.error('%s command error: %s %s' % (name, typ, data))
imaplib.error: APPEND command error: BAD ['Invalid date-time in Append command']

Second problem is that my IMAP server doesn't like messages with bare newlines. It expects lines to end with \r\n, and refuses to APPEND messages with lines that end in just \n. The easiest thing to do would be to just replace single \n's with \r\n. I'm sure that with the power of python it won't need more than one line of code to do this, but I'm a total python newbie. Anyone care to enlighten me as to what this line might be?

Olly

From lists at olivermaunder.co.uk  Wed Apr 16 12:32:40 2003
From: lists at olivermaunder.co.uk (Oliver Maunder)
Date: Wed Apr 16 06:32:59 2003
Subject: [Spambayes] IMAPFilter
In-Reply-To: <3E9D2BBB.9090500@olivermaunder.co.uk>
References: <3E9D2BBB.9090500@olivermaunder.co.uk>
Message-ID: <3E9D3148.80402@olivermaunder.co.uk>

Oliver Maunder wrote:

> Second problem is that my IMAP server doesn't like messages with bare
> newlines. It expects lines to end with \r\n, and refuses to APPEND
> messages with lines that end in just \n. The easiest thing to do would
> be to just replace single \n's with \r\n.
I'm sure that with the power
> of python it won't need more than one line of code to do this, but I'm
> a total python newbie. Anyone care to enlighten me as to what this
> line might be?

Well, answering my own post, but I've now got (in IMAPMessage.Save)

    msg = self.as_string().replace("\n", "\r\n")
    msg = msg.replace("\r\r\n", "\r\n")

which seems to work well enough to get the server to accept the message.

Olly

From noreply at sourceforge.net  Wed Apr 16 13:02:08 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Apr 16 14:45:43 2003
Subject: [Spambayes] [ spambayes-Bugs-722672 ] MANIFEST.in -- sdist cannot reproduce itself
Message-ID:

Bugs item #722672, was opened at 2003-04-16 19:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=722672&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Terrel Shumway (terrelshumway)
Assigned to: Nobody/Anonymous (nobody)
Summary: MANIFEST.in -- sdist cannot reproduce itself

Initial Comment:
minor packaging nit.

to reproduce:
---
cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/spambayes login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/spambayes co spambayes
cd spambayes
python setup.py sdist
cd ..
tar xzf spambayes/dist/spambayes-1.0a2.tar.gz
tar tzf spambayes/dist/spambayes-1.0a2.tar.gz >cvs
cd spambayes-1.0a2/
python setup.py sdist
cd ..
mv spambayes-1.0a2 spambayes-sdist tar tzf spambayes-sdist/dist/spambayes-1.0a2.tar.gz >sdist diff cvs sdist --- result: some files are missing from the second sdist: notably MANIFEST.in --- expected behavior: the file lists should be identical --- Also if you diff -r spambayes spambayes-sdist you will notice that spambayes_addin.iss spambayes_addin.spec runtests.sh dotest.sh are missing from the first sdist ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=722672&group_id=61702 From T.A.Meyer at massey.ac.nz Thu Apr 17 13:10:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 16 20:15:09 2003 Subject: [Spambayes] IMAPFilter Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150C9C2@its-xchg4.massey.ac.nz> > I've been playing around with imapfilter.py, which is shaping up very > nicely (thanks Tony and Tim!). No worries. Good to know that someone else is testing it as well. > However, I'm having a couple of problems with the latest > version, which isn't too surprising. :) > Well, for a start my IMAP server doesn't seem to like the preserved > timestamps on the messages. This was our fault. The server that Tim tested against must have been generous in accepting an invalid date - we were passing one in RFC822 format when it should have been in the imap format. I've added in some magic formatting changes and if you cvs-up it should now work. =Tony Meyer From dave at boost-consulting.com Thu Apr 17 10:16:21 2003 From: dave at boost-consulting.com (David Abrahams) Date: Thu Apr 17 09:16:28 2003 Subject: [Spambayes] imapfilter mangling headers! Message-ID: So I just got started with spambayes and tried to train my system using imapfilter.py. The first problem was that I had to discover on my own that I needed to edit spambayes/Options.py in order to keep imapfilter.py from raising exceptions. 
Then, once I'd done that, I was able to get it to do this: %src/spambayes/imapfilter.py -t -c -v -D bayes.db Loading database bayes.db... Loading state from bayes.db database bayes.db is a new database Done. Training Training took 10.5770339966 seconds, 0 messages were trained Classifying ... Now, should I be concerned that "0 messages were trained"? Should I be concerned that -v didn't produce much verbose output? Since I didn't get any, I decided to poke around and see what was happening. I went to the "unsure" mailbox (known as UnsureBox) in Gnus, and found that none of the messages showed up with senders or subjects. Taking a look at the raw messages, I found the following: Needless to say, I interrupted the classification process! What should I do now? At least 1000 messages have been processed this way. Are they hopelessly mangled? -------------- next part -------------- An embedded message was scrubbed... From: unknown sender Subject: no subject Date: no date Size: 5326 Url: http://mail.python.org/pipermail/spambayes/attachments/20030417/86389200/attachment.eml -------------- next part -------------- -- Dave Abrahams Boost Consulting www.boost-consulting.com From tim at fourstonesExpressions.com Thu Apr 17 09:34:00 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 17 09:35:58 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: Message-ID: 4/17/2003 8:16:21 AM, David Abrahams wrote: > >So I just got started with spambayes and tried to train my system >using imapfilter.py. The first problem was that I had to discover on >my own that I needed to edit spambayes/Options.py in order to keep >imapfilter.py from raising exceptions. Keeping in mind that this is still classified as alpha level software... We are currently writing the documentation so you don't have to discover this stuff for yourself, and the configuration prog so you can do this through your browser. 
It will modify a file named bayescustomize.ini, which is where your Options.py modifications should go. > >Then, once I'd done that, I was able to get it to do this: > > %src/spambayes/imapfilter.py -t -c -v -D bayes.db > Loading database bayes.db... Loading state from bayes.db database > bayes.db is a new database > Done. > Training > Training took 10.5770339966 seconds, 0 messages were trained > Classifying > ... > >Now, should I be concerned that "0 messages were trained"? Not if you didn't have anything in your training folders. > >Should I be concerned that -v didn't produce much verbose output? No >Since I didn't get any, I decided to poke around and see what was >happening. I went to the "unsure" mailbox (known as UnsureBox) in >Gnus, and found that none of the messages showed up with senders or >subjects. Taking a look at the raw messages, I found the following: There's nothing here... > >Needless to say, I interrupted the classification process! What >should I do now? At least 1000 messages have been processed this >way. Are they hopelessly mangled? We have not seen header dropping in our testing, which is in the very early stages. Since you interrupted the classification, the original messages should still be in your inbox, though they may have their delete flag set. What imap server are you using? At any rate, you should probably refrain from using the filter on your production mailbox until we figure out what happened when it looked at your system. To quote a post from a few days ago: "IMAP seems to be a really flukey kind of interface, and until it's been used on lots of imap servers, by lots of people, I won't be convinced that it's really correct." c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
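On the APPEND date-time error reported earlier in this thread: the diagnosis was that an RFC822-style date was being passed where IMAP expects its own date-time format. A minimal sketch of that conversion, assuming modern Python 3 standard-library calls; this is not the actual imapfilter.py change, and the function name is made up for illustration:

```python
import imaplib
from email.utils import parsedate_to_datetime


def rfc822_to_imap_internaldate(date_header):
    """Convert an RFC822-style Date header value to an IMAP date-time string."""
    # parsedate_to_datetime returns a timezone-aware datetime when the
    # header carries a zone offset such as +0100.
    when = parsedate_to_datetime(date_header)
    # Time2Internaldate returns the double-quoted form that APPEND expects.
    return imaplib.Time2Internaldate(when)


# The timestamp below is the one from the traceback posted in this thread.
print(rfc822_to_imap_internaldate("Wed, 02 Apr 2003 10:12:56 +0100"))
```

The result can be passed as the date_time argument of imaplib's append(); whether a given server accepts it still depends on the server, as the rest of this thread suggests.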
From dave at boost-consulting.com Thu Apr 17 10:56:45 2003 From: dave at boost-consulting.com (David Abrahams) Date: Thu Apr 17 09:56:54 2003 Subject: [Spambayes] Re: imapfilter mangling headers! References: Message-ID: Tim Stone - Four Stones Expressions writes: > 4/17/2003 8:16:21 AM, David Abrahams wrote: > >> >>So I just got started with spambayes and tried to train my system >>using imapfilter.py. The first problem was that I had to discover on >>my own that I needed to edit spambayes/Options.py in order to keep >>imapfilter.py from raising exceptions. > > Keeping in mind that this is still classified as alpha level > software... We are currently writing the documentation so you don't > have to discover this stuff for yourself, and the configuration prog > so you can do this through your browser. It will modify a file > named bayescustomize.ini, which is where your Options.py > modifications should go. That's nifty! Unfortunately, I am not the system administrator on the system where this will run, so installing things that can integrate with a webserver may not be an option for me. >>Then, once I'd done that, I was able to get it to do this: >> >> %src/spambayes/imapfilter.py -t -c -v -D bayes.db >> Loading database bayes.db... Loading state from bayes.db database >> bayes.db is a new database >> Done. >> Training >> Training took 10.5770339966 seconds, 0 messages were trained >> Classifying >> ... >> >>Now, should I be concerned that "0 messages were trained"? > > Not if you didn't have anything in your training folders. Oh, but I did! Around 200 messages in each one! >>Should I be concerned that -v didn't produce much verbose output? > > No > >>Since I didn't get any, I decided to poke around and see what was >>happening. I went to the "unsure" mailbox (known as UnsureBox) in >>Gnus, and found that none of the messages showed up with senders or >>subjects. Taking a look at the raw messages, I found the following: > > There's nothing here... 
Sorry, maybe the enclosure was dropped. It begins: --- Return-Path: Received: from mx02.mrf.mail.rcn.net ([207.172.4.51] verified) by stlport.com (CommuniGate Pro SMTP 3.5.9) with ESMTP id 201556 for dave@boost-consulting.com; Fri, 21 Feb 2003 17:50:31 -0800Received: from node-c-0bfb.a2000.nl ([62.194.11.251] helo=mx.mail.rcn.net) by mx02.mrf.mail.rcn.net with smtp (Exim 3.35 #4) id 18mOoA-0001sg-00 for david.abrahams@rcn.com; Fri, 21 Feb 2003 20:50:30 -0500From: "FRANK OKERE" Date: Sat, 22 Feb 2003 02:50:10To: david.abrahams@rcn.comSubject: STRICLTY CONFIDENTIALMIME-Version: 1.0Content-Type: text/plain;charset="iso-8859-1"Content-Transfer-Encoding: 7bitMessage-Id: X-Spam-Warning: This message was accepted from a host or IP address which is listed on one or more email blocking lists. Please see http://www.mail.rcn.net/external/x-header/ for more informationX-Spam-Warning: [SPEWS] [1] a2000.nl, see http://spews.org/ask.cgi?S2000X-Spambayes-Classification: unsureGood Day, With warm heart I offer my friendship, and my greetings, and I hope this letter meets you in good time. It will be surprising to you to receive this proposal from me since you do not know me personally. However, I am sincerely seeking your confidence in this transaction, which I propose with my free mind and as a person of integrity. --- That's 3 lines; the middle one is blank. >>Needless to say, I interrupted the classification process! What >>should I do now? At least 1000 messages have been processed this >>way. Are they hopelessly mangled? > > We have not seen header dropping in our testing, which is in the very early > stages. They're not dropped; they have apparently had all the newlines removed! > Since you interrupted the classification, the original messages > should still be in your inbox, though they may have their delete > flag set. > > What imap server are you using? 
Communigate Pro > At any rate, you should probably refrain from using the filter on > your production mailbox until we figure out what happened when it > looked at your system. To quote a post from a few days ago: "IMAP > seems to be a really flukey kind of interface, and until it's been > used on lots of imap servers, by lots of people, I won't be > convinced that it's really correct." Thanks for your help, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com From tim at fourstonesExpressions.com Thu Apr 17 09:58:29 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 17 09:58:44 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: Message-ID: 4/17/2003 8:49:24 AM, David Abrahams wrote: >That's nifty! Unfortunately, I am not the system administrator on >the system where this will run, so installing things that can >integrate with a webserver may not be an option for me. The 'web' configuration program runs locally. The browser will just give you a user interface to the configuration program. You'll use a url like: http://localhost:8880 or something like that. >> Not if you didn't have anything in your training folders. > >Oh, but I did! Around 200 messages in each one! That's interesting... hmmmm. I'll definitely check that one out, then. > >They're not dropped; they have apparently had all the newlines >removed! Yes. I certainly can see that. We just got a report from another imap user yesterday about his imap server giving an error related to newlines, with a patch that replaced \n with \r\n in the message as it was being saved into the classification mailbox. He indicated that the patch solved the problem, so I am currently implementing that change. However, my imap server doesn't seem to mind the newline malformation that his did, so I can't really test the fix. I'll probably check it in later today or tonight. 
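A minimal sketch of the newline fix being discussed, written for modern Python 3 and not taken from the actual imapfilter.py patch. It rewrites every line ending to \r\n in a single pass, so lines that already end in \r\n are not doubled; the function name is an assumption for illustration only:

```python
import re

# Matches one line ending at a time: CRLF first, then a lone CR or a lone LF.
_LINE_ENDING = re.compile(r"\r\n|\r|\n")


def to_crlf(text):
    """Return text with every line ending normalized to CRLF."""
    return _LINE_ENDING.sub("\r\n", text)


sample = "Subject: test\nTo: someone\r\n\nbody\n"
print(repr(to_crlf(sample)))
```

The two chained replace() calls posted earlier in the thread give the same result for ordinary mail; the single regular expression just avoids the intermediate \r\r\n step.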
If you don't mind checking it (on a test account) and letting us know if it helps, we'd greatly appreciate it. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From dave at boost-consulting.com Thu Apr 17 11:03:40 2003 From: dave at boost-consulting.com (David Abrahams) Date: Thu Apr 17 10:04:50 2003 Subject: [Spambayes] Re: imapfilter mangling headers! References: Message-ID: Tim Stone - Four Stones Expressions writes: > Keeping in mind that this is still classified as alpha level software... We > are currently writing the documentation so you don't have to discover this > stuff for yourself, and the configuration prog so you can do this through your > browser. It will modify a file named bayescustomize.ini, which is where your > Options.py modifications should go. > At any rate, you should probably refrain from > using the filter on your production mailbox until we figure out what happened > when it looked at your system. To quote a post from a few days ago: "IMAP > seems to be a really flukey kind of interface, and until it's been used > on lots of imap servers, by lots of people, I won't be convinced that it's > really correct." See, what happened was that I got back from my travels and finally decided to apply some of the very helpful advice I'd gotten about how to set up server-side filtering. A post by you said: ``You should start by reading http://spambayes.sourceforge.net/applications.html. There is a link to a page called "guide to integrating hammie with your mailer" on that page that should give you some good starting points.'' Which I did. I followed the link, which actually led to http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/INTEGRATION.txt, which told me all about imapfilter.py... which I guess is a new thing that wasn't around a few weeks ago when you gave me these directions? 
Anyway, if I should start somewhere else, please let me know. I think that with some work I could create mbox files of the IMAP messages I have classified if I have to. BTW, it appears that one or the other of my mailers actually deleted the messages that imapfilter marked as deleted as it moved things into UnsureBox. FWIW-ly, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com
From dave at boost-consulting.com Thu Apr 17 11:09:02 2003 From: dave at boost-consulting.com (David Abrahams) Date: Thu Apr 17 10:11:39 2003 Subject: [Spambayes] Re: imapfilter mangling headers! References: Message-ID: Tim Stone - Four Stones Expressions writes: > 4/17/2003 8:49:24 AM, David Abrahams wrote: > >>That's nifty! Unfortunately, I am not the system administrator on >>the system where this will run, so installing things that can >>integrate with a webserver may not be an option for me. > > The 'web' configuration program runs locally. The browser will just give you > a user interface to the configuration program. You'll use a url like: > http://localhost:8880 or something like that. > I'm a web-idiot, so if you say this will work, I'll take your word for it. > Yes. I certainly can see that. We just got a report from another imap user > yesterday about his imap server giving an error related to newlines, with a > patch that replaced \n with \r\n in the message as it was being saved into the > classification mailbox. He indicated that the patch solved the problem, so I > am currently implementing that change.
However, my imap server doesn't seem > to mind the newline malformation that his did, so I can't really test the fix. > I'll probably check it in later today or tonight. If you don't mind checking > it (on a test account) and letting us know if it helps, we'd greatly > appreciate it. I'd love to, but I think that whatever it takes to set up a test account is probably beyond me, especially since I don't have admin privileges on my server machine. Maybe I could convince the admin to set up a mail-only account for you guys so that you can do testing (?) Would that help? -- Dave Abrahams Boost Consulting www.boost-consulting.com From lists at olivermaunder.co.uk Thu Apr 17 16:14:19 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Thu Apr 17 10:14:31 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: References: Message-ID: <3E9EB6BB.30007@olivermaunder.co.uk> Here are my results... C:\spambayes>imapfilter.py -d imapfilter.db -t -v Loading database imapfilter.db... Loading state from imapfilter.db pickle imapfilter.db is a new pickle Done. Training Training took 72.5730000734 seconds, 0 messages were trained Exception exceptions.AttributeError: "'NoneType' object has no attribute 'error' " in > ignored Again, 0 messages trained. I've got 122 messages in my ham-train folder, and 95 in the spam-train folder. imapfilter.py seems to be reading the folder names from bayescustomize.ini OK, because when I tell it to classify, it dumps everything into the Unsure folder. Which is reasonable if it thinks it hasn't been trained on anything! I get that error message at the end whenever I run imapfilter.py - even if I just do "imapfilter.py -h" - it prints out the help text, and tacks that error onto the end. Given the other problems, it could be significant. Exactly the same thing happens if I use the -D flag. 
Except the training is 8 seconds quicker ;-) Olly From tim at fourstonesExpressions.com Thu Apr 17 10:20:19 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 17 10:20:32 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: <3E9EB6BB.30007@olivermaunder.co.uk> Message-ID: 4/17/2003 9:14:19 AM, Oliver Maunder wrote: >Here are my results... > >C:\spambayes>imapfilter.py -d imapfilter.db -t -v >Loading database imapfilter.db... Loading state from imapfilter.db pickle >imapfilter.db is a new pickle >Done. >Training >Training took 72.5730000734 seconds, 0 messages were trained >Exception exceptions.AttributeError: "'NoneType' object has no attribute >'error' >" in 0x009B059 >0>> ignored > >Again, 0 messages trained. I've got 122 messages in my ham-train folder, >and 95 in the spam-train folder. > >imapfilter.py seems to be reading the folder names from >bayescustomize.ini OK, because when I tell it to classify, it dumps >everything into the Unsure folder. Which is reasonable if it thinks it >hasn't been trained on anything! Clearly we have a problem in training. > >I get that error message at the end whenever I run imapfilter.py - even >if I just do "imapfilter.py -h" - it prints out the help text, and tacks >that error onto the end. Given the other problems, it could be significant. This is a bug in dumbdbm. You should have the proper version of bsddb installed, and the system will then automatically select that version rather than dumbdbm. Installing that will require migrating your training database, which can be accomplished with dbExpImp.py. Also, if you do, you'll need to delete spambayes.msginfo.db, which may make the filter reclassify some messages, but shouldn't really hurt anything. The other alternative is to live with the exception, which doesn't really hurt anything. 
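For anyone unsure which storage backend a database file ended up with, here is a small sketch using the modern Python 3 dbm package (in the Python 2.2 of this thread the equivalent modules were whichdb and dumbdbm). The file name is a made-up example, not the real SpamBayes database path:

```python
import dbm
import dbm.dumb
import os
import tempfile


def backend_of(path):
    """Return the backend name dbm.whichdb detects for the database at path."""
    return dbm.whichdb(path)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical database name, used only for this demonstration.
    path = os.path.join(tmp, "training")
    db = dbm.dumb.open(path, "c")  # force the pure-Python fallback backend
    db[b"token"] = b"1"
    db.close()
    detected = backend_of(path)
    print(detected)
```

If the detected backend is the dumb one, that matches the fallback case described above.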
c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From tim at fourstonesExpressions.com Thu Apr 17 10:20:38 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 17 10:20:51 2003 Subject: [Spambayes] Re: imapfilter mangling headers! In-Reply-To: Message-ID: 4/17/2003 9:03:40 AM, David Abrahams wrote: >which I guess is a new thing >that wasn't around a few weeks ago when you gave me these directions? The imap filter is about a week old. > >Anyway, if I should start somewhere else, please let me know. I >think that with some work I could create mbox files of the IMAP >messages I have classified if I have to. I'm sure I thought you were integrating hammie. If you'd have mentioned imap, I'd have told you at the time that we didn't have an imap filter. It seems our timing was just star-crossed. It sounds like imap is what you're really after, and we're getting there. But we're finding that while imap syntax is well documented, the semantic is left to the server, and so different implementations do different things when you invoke the api. > >BTW, it appears that one or the other of my mailers actually deleted >the messages that imapfilter marked as deleted as it moved things >into UnsureBox. Argh. I sincerely apologize for that. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From lists at olivermaunder.co.uk Thu Apr 17 16:46:13 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Thu Apr 17 10:46:26 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: References: Message-ID: <3E9EBE35.40105@olivermaunder.co.uk> Tim Stone - Four Stones Expressions wrote: >Clearly we have a problem in training. > > Looks like it. 
Training was working for me a few days ago - so it must be something pretty recent that's changed there, rather than being more IMAP incompatibilities. >This is a bug in dumbdbm. You should have the proper version of bsddb >installed, and the system will then automatically select that version rather >than dumbdbm. Installing that will require migrating your training database, >which can be accomplished with dbExpImp.py. Also, if you do, you'll need to >delete spambayes.msginfo.db, which may make the filter reclassify some >messages, but shouldn't really hurt anything. The other alternative is to >live with the exception, which doesn't really hurt anything. > > Good to know. I can live with the exception for now - but I was worried it might have something to do with the training problems. As for migrating the training database - as it currently contains 0 messages, that shouldn't be an issue ;-) Olly From tim at fourstonesExpressions.com Thu Apr 17 11:18:09 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Thu Apr 17 11:18:22 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: <3E9EBE35.40105@olivermaunder.co.uk> Message-ID: 4/17/2003 9:46:13 AM, Oliver Maunder wrote: >Tim Stone - Four Stones Expressions wrote: > >>Clearly we have a problem in training. >> >> >Looks like it. Training was working for me a few days ago - so it must >be something pretty recent that's changed there, rather than being more >IMAP incompatibilities. I found it. It'll be fixed in the next checkin. >Good to know. I can live with the exception for now - but I was worried >it might have something to do with the training problems. As for >migrating the training database - as it currently contains 0 messages, >that shouldn't be an issue ;-) The only problem I've found that this exception causes (now that training is working), is that sometimes the system doesn't remember how it trained a message, and so it can get trained multiple times. 
The reason it doesn't remember is because of the error in closing the dbm
file, which is the point at which it is synced with the file. They don't
call it dumbdbm for nothing.... For testing, this probably isn't a problem,
but for production work this simply won't work.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary,
and those who don't.

From tim at fourstonesExpressions.com Thu Apr 17 13:07:45 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Apr 17 13:08:00 2003
Subject: [Spambayes] imapfilter mangling headers!
In-Reply-To: <3E9EBE35.40105@olivermaunder.co.uk>
Message-ID: <6ZKFOL74JIYXC7ZW6ZK3B99874JT.3e9edf61@myst>

4/17/2003 9:46:13 AM, Oliver Maunder wrote:

>Good to know. I can live with the exception for now - but I was worried
>it might have something to do with the training problems. As for
>migrating the training database - as it currently contains 0 messages,
>that shouldn't be an issue ;-)

I've found what's causing the dumbdbm exception. If you make the _commit
method in dumbdbm.py look like the following, the exception magically
disappears, and life is good. Let me point out, however, that dumbdbm is
not the way to go here... :)

    def _commit(self):
        # insert the following two lines
        if _os is None:
            return

        try: _os.unlink(self._bakfile)
        .....

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary,
and those who don't.

From Olivier.Biot at siemens.com Thu Apr 17 19:24:25 2003
From: Olivier.Biot at siemens.com (Biot Olivier)
Date: Thu Apr 17 13:26:46 2003
Subject: [Spambayes] Won't install
Message-ID: <6B546A602AD2D211BFF00008C7A428890683172D@hrtades2.atea.be>

Hi,

I tried to install SpamBayes, but without success.
I always get: pythoncom error: Failed to call the universal dispatcher (see attached trace) My configuration: Laptop running Win2000 SP3 plus latest OS patches Outlook 2000 plus all patches; connected to an Exchange server (therefore no POP3/SMTP). I installed both Python 2.2.2 and win32all-152, once on D: and then on C: but without result. If I use the binary installer, I get to see the COM plugin but it never gets checked (see attached log). If I compile the latest released version (a2) and manually register it, then I don't see a "real" entry (it looks like there is only a partial plugin registration); I can however run manager.py from the command-line and I get the configuration GUI. What is wrong? Regards, Olivier -------------- next part -------------- A non-text attachment was scrubbed... Name: spambayes1.log Type: application/octet-stream Size: 1135 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030417/986846bf/spambayes1.obj From T.A.Meyer at massey.ac.nz Fri Apr 18 11:52:47 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 17 18:53:26 2003 Subject: [Spambayes] imapfilter mangling headers! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB18@its-xchg4.massey.ac.nz> > >They're not dropped; they have apparently had all the > newlines removed! I *think* I should be able to write a script that will put these back into shape. I'll have a good at duplicating the problem and see what I can do. I'll look at this now. I'm sorry about the messages that have been mangled; perhaps our *warning: alpha* messages need to be clearer. I might add a logging option as well, a la pop3proxy. I didn't originally, because we don't actually delete any messages (unless you set imap_expunge to True) and so the originals are there to undelete. I didn't count on a mailer automatically purging (none of the mailers I've used have done that). 
> See, what happened was that I got back from my travels and > finally decided to apply some of the very helpful advice I'd > gotten about how to set up server-side filtering. A post by you said: > > ``You should start by reading > http://spambayes.sourceforge.net/applications.html. There is a > link to a page called "guide to integrating hammie with your > mailer" on that page that should give you some good starting > points.'' Strange. That link didn't exist until yesterday (I added it) - I also added the imap filter documentation yesterday. I suppose I should have mentioned that the whole thing is only a week or so old, and so even more alpha than spambayes as a whole is. > BTW, it appears that one or the other of my mailers > actually deleted the messages that imapfilter marked as > deleted as it moved things into UnsureBox. That is a *very* bad thing for a mailer to do. So that we can add a warning to the docs, could you tell us which mailer that is? >So I just got started with spambayes and tried to train my system using >imapfilter.py. The first problem was that I had to discover on my own >that I needed to edit spambayes/Options.py in order to keep >imapfilter.py from raising exceptions. You shouldn't have needed to do this. What exceptions were raised? Or do you just mean that you had to add your server information? My fault that the documentation is weak abou adding your own information. I've just about done a web ui for imap filter (like the one for pop3proxy) and didn't want to have to rewrite the documentation a day after I committed it. [Test account] > I'd love to, but I think that whatever it takes to set > up a test account is probably beyond me, You don't need to have a whole separate account (although this is easier). The filter won't (can't) touch any folders that aren't specified in the options. So if you create a new folder and put some test mail in there (even if you just send something to yourself), you can use that to test. 
=Tony Meyer

From T.A.Meyer at massey.ac.nz Fri Apr 18 12:47:29 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Apr 17 19:48:04 2003
Subject: [Spambayes] Email package and the CRLF pair
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB1A@its-xchg4.massey.ac.nz>

I'm hoping that there are still some Python experts hanging about here
and they've just been quiet for a bit.

As you may have noticed, we ran into a bit of trouble recently with the
imap filter because messages had only \n (CR) and not \r\n (CRLF).
Message bodies don't really matter here; what matters are the headers.

As far as I can see*, when the email package** returns a message via
str() or as_string(), it only puts a CR after each header.

But RFC2822 says that:
"Header fields are lines composed of a field name, followed by a colon
(":"), followed by a field body, and terminated by CRLF." [Section 2.2]

Is this therefore a bug in the email package that I should report?
Advice from someone that knows more than me (is Barry there?) about
these things would be appreciated.

=Tony Meyer

* Digging through the email Message module, then the Generator module,
then the Header module into the _encode_chunks function.
** Both in Python 2.2.2 and cvs.

From T.A.Meyer at massey.ac.nz Fri Apr 18 13:46:45 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Apr 17 20:47:23 2003
Subject: [Spambayes] imapfilter mangling headers!
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB1E@its-xchg4.massey.ac.nz>

> > >They're not dropped; they have apparently had all the
> > newlines removed!
> I *think* I should be able to write a script that will put these back
> into shape. I'll have a good at duplicating the problem and
> see what I can do. I'll look at this now.

Now I'm not so sure :(. Unfortunately (?) I can't duplicate this
behaviour on either of the imap servers I have available (NetMail and
Courier), so it's hard to tell what the messages would look like if I
did an IMAP fetch (i.e.
is the \n there somewhere and could be retrieved, or was it stripped at
the append). If it's not there then I don't think this can be done
easily if at all. The problem is finding the start of the header. If it
looks like this:
"X-Header1: blahX-Header2: xxx"
then there's nothing distinctive between the 'blah' and the 'X-' to
search for. (The only thing I can think of is to search for all the
proper headers (like Date and Subject), and then search for the pattern
r'X-([\w-]+):' and put newlines before that.) I suppose I could have a go
at that - let me know if you want me to.

First, though, would be to check what is actually on the server. You
could try running this and letting me know what comes back:

>>> import imaplib
>>> imap = imaplib.IMAP4(servername, serverport)
>>> imap.login(username, password)
>>> imap.select(unsure_folder_name, True)
>>> response = imap.fetch("1:1", "(RFC822)")
>>> imap.logout()
>>> print response                 # whole lot
>>> print response[1][0][1]        # just the message text
>>> print len(response[1][0][1])   # to check for invisible characters

=Tony Meyer

From rsalz at datapower.com Fri Apr 18 04:57:37 2003
From: rsalz at datapower.com (rsalz@datapower.com)
Date: Fri Apr 18 00:01:24 2003
Subject: [Spambayes] training via email forwarding
Message-ID: <20030418035729.17463.qmail@smtp.datapower.com>

I've done a bit of looking around for a Bayesian email filter, and I'm
going to try spambayes because it has the simplest procmail integration,
and (seemingly) the simplest overall setup. It's just missing one
thing...

I'd like to train the filter by forwarding email messages to it. I can
do that easily by using the subject line or the "+" convention, as in
"rsalz+train-spam" as a mailbox. It's easy to see how to set up procmail
rules to catch those.
The work to unwrap the outer MIME wrapper, and process the forwarded
message, should be easy for you folks to add, saving me the trouble of
having to wrestle with MIME.py :)

I think this would be a useful thing overall, making it easy for
spambayes to run in the common "shell account" mail environment many
techies have.

Thanks.
/r$

From T.A.Meyer at massey.ac.nz Fri Apr 18 17:50:00 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Apr 18 00:50:35 2003
Subject: [Spambayes] pop3proxy_port, pop3proxy_server_name and pop3proxy_server_port options
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB2D@its-xchg4.massey.ac.nz>

WARNING!

The options pop3proxy_port, pop3proxy_server_name and
pop3proxy_server_port have been deprecated for a long time. (You should
be using pop3proxy_servers and pop3proxy_ports). I have just removed
support for these options. If you were still using them, you need to
update your configuration files (your database will still be fine). If
you use the web interface to do your configuration, your config file
will be fine.

If you had:
    pop3proxy_port: 110
    pop3proxy_server_name: pop3.example.com
    pop3proxy_server_port: 110
You should now have:
    pop3proxy_ports: 110
    pop3proxy_servers: pop3.example.com:110

=Tony Meyer

From T.A.Meyer at massey.ac.nz Fri Apr 18 18:55:32 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Apr 18 01:56:46 2003
Subject: [Spambayes] training via email forwarding
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB2F@its-xchg4.massey.ac.nz>

> I've done a bit of looking around for a Bayesian email
> filter, and I'm going to try spambayes because it has the
> simplest procmail integration, and (seemingly) the simplest
> overall setup.

Don't forget because it's also the most well-tested, and just plain
simply the best ;)

> It's just missing one thing...

Well, *almost* missing.

> I'd like to train the filter by forwarding email messages to > it.
I can do that easily by using the subject line or the "+" > convention, has in "rsalz+train-spam" as a mailbox. If you were using pop3proxy, you could already do this (via the SMTP proxy). It would be simple to modify the smtpproxy to allow operation in a 'forward whole message' mode rather than what it currently does*. However, I would have thought that you could use the hammiefilter options to do this (you can use it to train with various command-line options). I could be wrong - I don't know anything about procmail. > I think this would be a useful thing overall, making it easy > for spambayes to run in the common "shell account" mail > environment many techies have. Let me know if you want to use smtpproxy in this fashion. It would be very quick to add. If you don't like the SMTP proxy option, and hammiefilter isn't of any use, then you should be able to get any code that you need from smtpproxy or hammiefilter. (Don't forget to submit a patch if you go this route). =Tony Meyer * Currently, it extracts an id from a spambayes header; it uses this id to identify the message in the pop3proxy cache, and uses that message to train. It has to go through this roundabout method to avoid mailers munging up the message. From T.A.Meyer at massey.ac.nz Fri Apr 18 19:18:04 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Apr 18 02:18:39 2003 Subject: [Spambayes] IMPORTANT: Options class changes Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB30@its-xchg4.massey.ac.nz> [Apologies for the length of this, but this is information that people need to know]. A wee while back I proposed some fairly major changes (improvements!) to the options module. I outlined the basic idea, and only got positive comments (plus some requests, all followed through), so I've gone ahead and made the changes. I am about to commit the Options.py file - a diff will be no good here; although the information in the module is more-or-less the same, the format has changed greatly. 
If you are interested in how options will work, then please read the comments (__doc__ and __issues__ in particular) and give feedback. Otherwise, the main change is that instead of accessing options via options.option_name, you should use options["section_name", "option_name"] or options.get("section_name", "option_name"). I have also renamed those options where the section name was prepended with a '_' to the option name (e.g. pop3proxy_ports and hammie_header_name are now "pop3proxy", "ports" and "Hammie", "header_name"). HOWEVER, for the moment, the module is 100% backwards compatible (with everything in cvs - your own code is your own!) so nothing will break [touch wood]. I have changed all* the files over, and I will gradually introduce them once it is apparent that no-one will kill me if I do and that there are no apparent problems with the new module. The only thing that you should notice is that you will probably get various warnings about invalid options in your configuration file(s). Your options will still work, but please go ahead and change them to the new correct values. I have tested what I can, but if anything does break then any developer can roll this back to the previous version if they can't fix it. I'll keep an eye on the list, but I sleep when many of you are testing things out. =Tony Meyer * All, except the pspam folder - this is confusing! Does anyone use this? If so, would they mind doing the update, or explaining what is happening to me? From rsalz at datapower.com Fri Apr 18 08:23:26 2003 From: rsalz at datapower.com (Rich Salz) Date: Fri Apr 18 09:54:44 2003 Subject: [Spambayes] training via email forwarding In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB2F@its-xchg4.massey.ac.nz> Message-ID: I'm not using the proxy; I use IMAP. I use spambayes to filter the mail as it comes in. If somehting slips through, I want to forward an email message to s-b and have it train appropriately. tnx. 
/r$ From Olivier.Biot at siemens.com Fri Apr 18 15:50:42 2003 From: Olivier.Biot at siemens.com (Biot Olivier) Date: Fri Apr 18 09:54:48 2003 Subject: [Spambayes] BUG: Input from LOCALE not used Message-ID: <6B546A602AD2D211BFF00008C7A4288906831735@hrtades2.atea.be> Hi all, I finally found the reason why the SpamBayes plugin was not loading: I have a Belgian locale with a comma as decimal separator. As a result, a float should be written as "0,10" instead of "0.10" and as a result the SpamBayes plugin crashed on reading the dictionary (I think). I now have to change my default LOCALE settings in Windows in order to get the plugin to work. Could it be possible to update this so SpamBayes will work for all LOCALE settings? Regards, Olivier BIOT PS: Backtrace of the unsuccessful plugin loading: # This window will display output from any programs that import win32traceutil # win32com servers registered with '--debug' are in this category. Outlook Spam Addin module loading SpamAddin - Connecting to Outlook Created new configuration file 'D:\spambayes-1.0a2\Outlook2000\default_configuration.pck' Traceback (most recent call last): File "C:\Python23\lib\site-packages\win32com\universal.py", line 170, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, meth.invkind, args, None, None) File "C:\Python23\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_ return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "C:\Python23\lib\site-packages\win32com\server\policy.py", line 601, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\Python23\lib\site-packages\win32com\server\policy.py", line 541, in _invokeex_ return apply(func, args) File "D:\spambayes-1.0a2\Outlook2000\addin.py", line 611, in OnConnection self.manager = manager.GetManager(application) File "D:\spambayes-1.0a2\Outlook2000\manager.py", line 335, in GetManager _mgr = BayesManager(outlook=outlook, 
verbose=verbose) File "D:\spambayes-1.0a2\Outlook2000\manager.py", line 79, in __init__ import_core_spambayes_stuff(self.ini_filename) File "D:\spambayes-1.0a2\Outlook2000\manager.py", line 46, in import_core_spambayes_stuff from spambayes import classifier File "C:\Python23\Lib\site-packages\spambayes\classifier.py", line 40, in ? from spambayes.Options import options File "C:\Python23\Lib\site-packages\spambayes\Options.py", line 557, in ? options.mergefilelike(d) File "C:\Python23\Lib\site-packages\spambayes\Options.py", line 517, in mergefilelike self._update() File "C:\Python23\Lib\site-packages\spambayes\Options.py", line 535, in _update value = getattr(c, fetcher)(section, option) File "C:\Python23\lib\ConfigParser.py", line 318, in getfloat return self._get(section, float, option) File "C:\Python23\lib\ConfigParser.py", line 312, in _get return conv(self.get(section, option)) exceptions.ValueError: invalid literal for float(): 0.20 From skip at pobox.com Fri Apr 18 10:24:58 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Apr 18 10:25:10 2003 Subject: [Spambayes] training via email forwarding In-Reply-To: <20030418035729.17463.qmail@smtp.datapower.com> References: <20030418035729.17463.qmail@smtp.datapower.com> Message-ID: <16032.2746.628454.778551@montanaro.dyndns.org> r$> I'd like to train the filter by forwarding email messages to it. I r$> can do that easily by using the subject line or the "+" convention, r$> has in "rsalz+train-spam" as a mailbox. It's easy to see how to set r$> up procmail rules to catch those. The work to unwrap the outer MIME r$> wrapper, and process the forwarded message, should be easy for you r$> folks to add, saving me the trouble of having to wrestle with r$> MIME.py :) How about just "b"ouncing the messages to the correct mailbox? 
It will add a few new headers (several Resent-This-N-That's and maybe a couple extra Received headers), but those should generally be ignored or could be stripped by an appropriate formail command in your procmailrc file. Skip From skip at pobox.com Fri Apr 18 10:28:17 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Apr 18 10:32:05 2003 Subject: [Spambayes] pop3proxy_port, pop3proxy_server_name and pop3proxy_server_port options In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB2D@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB2D@its-xchg4.massey.ac.nz> Message-ID: <16032.2945.216626.582701@montanaro.dyndns.org> Tony> You should now have: Tony> pop3proxy_ports: 110 Tony> pop3proxy_servers: pop3.example.com:110 What's the purpose of pop3proxy_ports? Can the ports to listen to be inferred from the pop3proxy_servers list? Skip From rsalz at datapower.com Fri Apr 18 11:51:38 2003 From: rsalz at datapower.com (Rich Salz) Date: Fri Apr 18 10:59:11 2003 Subject: [Spambayes] training via email forwarding In-Reply-To: <16032.2746.628454.778551@montanaro.dyndns.org> References: <20030418035729.17463.qmail@smtp.datapower.com> <16032.2746.628454.778551@montanaro.dyndns.org> Message-ID: <3EA010FA.4060802@datapower.com> > How about just "b"ouncing the messages to the correct mailbox? It will add > a few new headers (several Resent-This-N-That's and maybe a couple extra > Received headers), but those should generally be ignored or could be > stripped by an appropriate formail command in your procmailrc file. I thought bounce was an ELM thing, but I looked and see that it's in Pine (which makes sense). Do outlook, mozilla, et al, have it? Forwarding seems like a more general solution. Perhaps I'm wrong. 
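
The "unwrap the outer MIME wrapper" step discussed in this thread can be sketched with the standard email package. This is only an illustration under stated assumptions, not SpamBayes code: it assumes the mail client forwards the original as a message/rfc822 attachment (many clients forward inline instead, which this does not handle), it is written for a current Python rather than the 2.2 of this thread, and the names extract_forwarded and build_forward are hypothetical helpers invented for the example.

```python
import email
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_forwarded(raw_text):
    """Return the first message/rfc822 attachment as a Message, or None.

    Hypothetical helper: assumes the forward carries the original
    message as an attachment rather than quoting it inline.
    """
    outer = email.message_from_string(raw_text)
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            payload = part.get_payload()
            # A message/rfc822 part holds a one-element list: the inner message.
            if isinstance(payload, list) and payload:
                return payload[0]
    return None


def build_forward(original, to_addr):
    """Hypothetical helper: wrap `original` the way an attachment-style
    forward would, so the extraction above has something to work on."""
    wrapper = MIMEMultipart()
    wrapper["To"] = to_addr
    wrapper["Subject"] = "Fwd: " + original.get("Subject", "")
    wrapper.attach(MIMEText("This one slipped through."))
    wrapper.attach(MIMEMessage(original))
    return wrapper.as_string()
```

A procmail rule matching the "+train-spam" address could pipe the forward to a script that calls extract_forwarded and hands the inner message to whatever training command is in use; that wiring is left out here because it depends on the local setup.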
/r$

From francois.granger at free.fr Fri Apr 18 18:06:11 2003
From: francois.granger at free.fr (Francois Granger)
Date: Fri Apr 18 11:06:16 2003
Subject: [Spambayes] pop3proxy_port, pop3proxy_server_name and pop3proxy_server_port options
In-Reply-To: <16032.2945.216626.582701@montanaro.dyndns.org>
References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB2D@its-xchg4.massey.ac.nz> <16032.2945.216626.582701@montanaro.dyndns.org>
Message-ID:

At 09:28 -0500 18/04/2003, in message Re: [Spambayes] pop3proxy_port, pop3proxy_server_name a, Skip Montanaro wrote:
> Tony> You should now have:
> Tony> pop3proxy_ports: 110
> Tony> pop3proxy_servers: pop3.example.com:110
>
>What's the purpose of pop3proxy_ports? Can the ports to listen to be
>inferred from the pop3proxy_servers list?

From my memory, this allows one IP address (127.0.0.1) to serve more
than one pop server. Something like

pop3proxy_ports: 110, 1100
pop3proxy_servers: pop3.example.com:110, pop3.example2.com:110

--
Hofstadter's Law : It always takes longer than you expect, even when you
take into account Hofstadter's Law.

From lists at olivermaunder.co.uk Fri Apr 18 18:58:48 2003
From: lists at olivermaunder.co.uk (Oliver Maunder)
Date: Fri Apr 18 12:58:51 2003
Subject: [Spambayes] Email package and the CRLF pair
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB1A@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB1A@its-xchg4.massey.ac.nz>
Message-ID: <3EA02EC8.8020904@olivermaunder.co.uk>

Meyer, Tony wrote:

>As you may have noticed, we ran into a bit of trouble recently with the
>imap filter because messages had only \n (CR) and not \r\n (CRLF).
>Message bodies don't really matter here; what matters are the headers.
>
>
That's not necessarily a problem with the email package. I get those
errors when moving messages between folders in my mail reader. The
problem is caused by the original messages being malformed.
Of course, if the email package *is* mis-behaving, that's not going to help. Olly From skip at pobox.com Fri Apr 18 15:26:52 2003 From: skip at pobox.com (Skip Montanaro) Date: Fri Apr 18 15:27:04 2003 Subject: [Spambayes] training via email forwarding In-Reply-To: <3EA010FA.4060802@datapower.com> References: <20030418035729.17463.qmail@smtp.datapower.com> <16032.2746.628454.778551@montanaro.dyndns.org> <3EA010FA.4060802@datapower.com> Message-ID: <16032.20860.80025.980095@montanaro.dyndns.org> >> How about just "b"ouncing the messages to the correct mailbox? It >> will add a few new headers (several Resent-This-N-That's and maybe a >> couple extra Received headers), but those should generally be ignored >> or could be stripped by an appropriate formail command in your >> procmailrc file. Rich> I thought bounce was an ELM thing, but I looked and see that it's Rich> in Pine (which makes sense). Do outlook, mozilla, et al, have it? I only recently discovered that VM in X/Emacs has it. I believe Eudora (at least on Windows) has it as well. I always thought of it as a sort of hackish way to pass messages along, but it seems like it might work in this case. Rich> Forwarding seems like a more general solution. Perhaps I'm wrong. Oh yeah. I was just trying to get you going in the short term and save someone here from doing extra work. ;-) Skip From dave at boost-consulting.com Fri Apr 18 16:39:23 2003 From: dave at boost-consulting.com (David Abrahams) Date: Fri Apr 18 15:39:34 2003 Subject: [Spambayes] Re: imapfilter mangling headers! References: Message-ID: <84el3zwq6s.fsf@boost-consulting.com> Tim Stone - Four Stones Expressions writes: > I'll probably check it in later today or tonight. If you don't mind checking > it (on a test account) and letting us know if it helps, we'd greatly > appreciate it. Did that checkin ever happen? I'm still looking forward to trying it out. 
Regards,
--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From tim at fourstonesExpressions.com Fri Apr 18 16:00:03 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Apr 18 16:00:09 2003
Subject: [Spambayes] Re: imapfilter mangling headers!
In-Reply-To: <84el3zwq6s.fsf@boost-consulting.com>
Message-ID:

4/18/2003 2:39:23 PM, David Abrahams wrote:

>Did that checkin ever happen? I'm still looking forward to trying it
>out.

Yup... I checked in the fix, and Tony checked in a whole bunch of stuff,
including a UI for IMAP filter configuration. It'll probably be easier to
reup the entire tree than to try to figure out what changed. We're still
testing, so caveat emptor...

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary,
and those who don't.

From lists at morpheus.demon.co.uk Fri Apr 18 21:57:01 2003
From: lists at morpheus.demon.co.uk (Paul Moore)
Date: Fri Apr 18 16:10:41 2003
Subject: [Spambayes] Email package and the CRLF pair
References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB1A@its-xchg4.massey.ac.nz>
Message-ID:

"Meyer, Tony" writes:

> I'm hoping that there are still some Python experts hanging about here
> and they've just been quiet for a bit.

Not an expert, but I'll do my best...

> As you may have noticed, we ran into a bit of trouble recently with the
> imap filter because messages had only \n (CR) and not \r\n (CRLF).
> Message bodies don't really matter here; what matters are the headers.

\n is LF not CR. You nearly had me confused with that...

> As far as I can see*, when the email package** returns a message via
> str() or as_string(), it only puts a CR after each header.

You mean LF, but with that change, you are right. Here's a simple
example:

>python
Python 2.2.2 (#37, Oct 14 2002, 17:02:34) [MSC 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> from email.MIMEText import MIMEText
>>> msg = MIMEText("Test")
>>> msg["Subject"] = "A Test"
>>> msg.as_string()
'Content-Type: text/plain; charset="us-ascii"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\nSubject: A Test\n\nTest\n'
>>>

> But RFC2822 says that:
> "Header fields are lines composed of a field name, followed by a colon
> (":"), followed by a field body, and terminated by CRLF." [Section
> 2.2]

Interesting. Does it say anything about line terminators in the body?
It probably should, as email is a pure-text medium, so you should be
considering line termination for the whole message, not just the
headers. For example, is the following valid? (Ignoring issues of
required headers)

Subject: A Test[CR][LF]
From: Me [CR][LF]
[CR][LF]
[CR][LF]
Now we start to get nasty[LF]
Let's mix things up completely[CR][LF]
And a Mac variation, just for fun[CR]
So how does this look?[CR][LF]

My betting is that RFC822 doesn't disallow it, which likely means that
the RFC is, to some extent, broken...

> Is this therefore a bug in the email package that I should report?
> Advice from someone that knows more than me (is Barry there?) about
> these things would be appreciated.

My instinct is to say that "it depends how you look at it". While the
RFC mandates CRLF, "usual practice" seems to be that the
platform-specific newline character sequence is used internally, and
often when messages are stored in files as well. It's only when
transmitting data across the network that standardising on CRLF is
important.

I'd imagine that most network transport code converts \n to CRLF when
sending data, so that the "internal" format doesn't matter in practice.
Look at smtplib.py in the standard library:

def quotedata(data):
    """Quote data for email.

    Double leading '.', and change Unix newline '\\n', or Mac '\\r' into
    Internet CRLF end-of-line.
    """

Interestingly, I can't see anything equivalent in imaplib.py.
So maybe it's best argued as a bug in imaplib, rather than in the
email package. (If the IMAP protocol mandates CRLF, then imaplib
should ensure that rather than making client code - which generally
uses \n internally - care about it).

OK, I looked a bit further. The only imaplib method which deals with
messages in string format is append(), so I'd argue that that method
should convert CR, CRLF, and LF sequences into the canonical CRLF that
the IMAP protocol needs. If nothing else, it's a case of the old "be
lenient in what you accept and strict in what you deliver" mantra.

For a workaround in client code, if you want to force CRLF in the
message string, do

    import re

    def force_CRLF(data):
        """Make sure data uses CRLF for line termination.

        Nicked the regex from smtplib.quotedata.
        """
        return re.sub(r'(?:\r\n|\n|\r(?!\n))', "\r\n", data)

    # Now, convert to canonical line endings
    msg_str = force_CRLF(msg_str)

This is a bit more rigorous than a simple str.replace("\n", "\r\n").

Hope this helps,
Paul.
-- 
This signature intentionally left blank

From dave at boost-consulting.com Fri Apr 18 19:35:48 2003
From: dave at boost-consulting.com (David Abrahams)
Date: Fri Apr 18 18:36:01 2003
Subject: [Spambayes] Re: imapfilter mangling headers!
In-Reply-To: (Tim Stone's message of "Fri, 18 Apr 2003 15:00:03 -0500")
References:
Message-ID: <841xzzwi0r.fsf@boost-consulting.com>

Tim Stone - Four Stones Expressions writes:

> 4/18/2003 2:39:23 PM, David Abrahams wrote:
>
>>Did that checkin ever happen? I'm still looking forward to trying it
>>out.
>
> Yup... I checked in the fix, and Tony checked in a whole bunch of stuff,
> including a UI for IMAP filter configuration. It'll probably be easier to
> reup the entire tree than to try to figure out what changed. We're still
> testing, so caveat emptor...

Indeed. The UI script even has a syntax error (missing a colon).
Is there someplace I can look for the syntax of bayescustomize.ini now?
In the meantime, I guess I am going to hack Options.py again :( -- Dave Abrahams Boost Consulting www.boost-consulting.com From tim at fourstonesExpressions.com Fri Apr 18 18:39:37 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Apr 18 18:40:05 2003 Subject: [Spambayes] Re: imapfilter mangling headers! In-Reply-To: <841xzzwi0r.fsf@boost-consulting.com> Message-ID: 4/18/2003 5:35:48 PM, David Abrahams wrote: >Indeed. The UI script even has a syntax error (missing a colon). >Is there someplace I can look for the syntax of bayescustomize.ini >now? In the meantime, I guess I am going to hack Options.py again :( Argh... I'm having trouble getting it to run myself. There's clearly some further work that needs to be done... > >-- >Dave Abrahams >Boost Consulting >www.boost-consulting.com > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From T.A.Meyer at massey.ac.nz Sat Apr 19 11:53:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Apr 18 18:53:57 2003 Subject: [Spambayes] Re: imapfilter mangling headers! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB3E@its-xchg4.massey.ac.nz> > Indeed. The UI script even has a syntax error (missing a > colon). Well, there aren't any scripts that use the UI module, so that really shouldn't matter. And even the trial ones that I haven't checked in don't use that function...It's fixed now, anyway. > Is there someplace I can look for the syntax of > bayescustomize.ini now? In the meantime, I guess I am going > to hack Options.py again :( I still don't understand why you need to touch Options.py. In fact, given the new format, it is *highly* unadvisable. 
Add any configuration you need to in your bayescustomize.ini, or wait a day or two and use the web ui when access is provided to it. bayescustomize.ini uses the standard configuration format. The options you need are [imap] server, port, username, password (unless you use the -p option) and the folders. You probably need to set the pop3proxy options that specify where the database should be filed. Sorry this isn't the easiest of options, but this is all still very much in early development. It's still really only suitable for testing, not for production use. =Tony Meyer From barry at python.org Sat Apr 19 00:11:24 2003 From: barry at python.org (Barry Warsaw) Date: Fri Apr 18 19:11:26 2003 Subject: [mimelib-devel] Fwd: [Spambayes] Email package and the CRLFpair In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318CE16@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F1318CE16@its-xchg4.massey.ac.nz> Message-ID: <1050707793.3531.45.camel@barry> On Fri, 2003-04-18 at 19:00, Tony Meyer wrote: > Thanks for clearing this up Barry (it nicely matches what Paul Moore > said, too). I guess the answer is then that we *do* need to submit a > bug, but on imaplib, not the email package. I'll do this. Cool. > I didn't mean to cast any aspersions on the email package None taken! > - my thought > was that since it's the internet email format (i.e. RFC2822) that > specifies the CRLF, then the email package would handle this since all > the protocols would need to convert it, unless they're using something > other than RFC822. All a matter of perspective, though, and you're the > boss :) :) I can see an option to Generator.flatten() to produce text representations in CRLF format. Since we're trying to design a better interface to the Generator anyway, this is definitely a reasonable thing to discuss. I probably wouldn't want it to be the default though, since it's simply more convenient to deal with newline line termination, and practicality beats purity. 
:) Note that the Parser should be able to handle any of the common line endings. -Barry From ta-meyer at ihug.co.nz Sat Apr 19 12:00:49 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Fri Apr 18 19:12:06 2003 Subject: [mimelib-devel] Fwd: [Spambayes] Email package and the CRLFpair In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130152A6DD@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CE16@its-xchg4.massey.ac.nz> > The intent is for the Parser to handle any line endings, but > for the Generator to output "normal" Python line endings, > e.g. \n. When the message is transmitted over a protocol > such as SMTP that requires different line endings, it's up to > the protocol module to normalize them. This is in fact what > smtplib.py does. An imap module should do the same thing. Thanks for clearing this up Barry (it nicely matches what Paul Moore said, too). I guess the answer is then that we *do* need to submit a bug, but on imaplib, not the email package. I'll do this. I didn't mean to cast any aspersions on the email package - my thought was that since it's the internet email format (i.e. RFC2822) that specifies the CRLF, then the email package would handle this since all the protocols would need to convert it, unless they're using something other than RFC822. All a matter of perspective, though, and you're the boss :) Again, thanks for the clarification. =Tony Meyer From dave at boost-consulting.com Fri Apr 18 20:11:35 2003 From: dave at boost-consulting.com (David Abrahams) Date: Fri Apr 18 19:13:05 2003 Subject: [Spambayes] Re: imapfilter mangling headers! In-Reply-To: (Tim Stone's message of "Fri, 18 Apr 2003 17:39:37 -0500") References: Message-ID: <84vfxbv1so.fsf@boost-consulting.com> Tim Stone - Four Stones Expressions writes: > 4/18/2003 5:35:48 PM, David Abrahams wrote: > >>Indeed. The UI script even has a syntax error (missing a colon). >>Is there someplace I can look for the syntax of bayescustomize.ini >>now? 
In the meantime, I guess I am going to hack Options.py again :(
>
> Argh... I'm having trouble getting it to run myself. There's clearly some
> further work that needs to be done...

And here's another problem:

cd ~/src/spambayes/
./imapfilter.py -c -D ~/bayes.db dave
Traceback (most recent call last):
  File "./imapfilter.py", line 461, in ?
    run()
  File "./imapfilter.py", line 451, in run
  File "./imapfilter.py", line 362, in Filter
    folder.Filter(self.classifier, self.spam_folder, self.unsure_folder)
  File "./imapfilter.py", line 322, in Filter
    msg.Save()
  File "./imapfilter.py", line 198, in Save
    self.extractTime(), self.as_string())
  File "./imapfilter.py", line 181, in extractTime
    return imaplib.Time2Internaldate(\
TypeError: argument must be 9-item sequence, not None

Compilation exited abnormally with code 1 at Fri Apr 18 15:49:18

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From tim at fourstonesExpressions.com Fri Apr 18 19:14:33 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Apr 18 19:18:51 2003
Subject: [Spambayes] Re: imapfilter mangling headers!
In-Reply-To: <84vfxbv1so.fsf@boost-consulting.com>
Message-ID:

4/18/2003 6:11:35 PM, David Abrahams wrote:

And here's another problem:

A correct version of the imapfilter has just been checked in. It works
fine for me now.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary,
and those who don't.

From T.A.Meyer at massey.ac.nz Sat Apr 19 12:32:36 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Apr 18 19:33:12 2003
Subject: [Spambayes] pop3proxy_port, pop3proxy_server_name and pop3proxy_server_port options
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB49@its-xchg4.massey.ac.nz>

> > Tony> You should now have:
> > Tony> pop3proxy_ports: 110
> > Tony> pop3proxy_servers: pop3.example.com:110
> >
> >What's the purpose of pop3proxy_ports?
Can the ports to > listen to be > >inferred from the pop3proxy_servers list? > > From my memory, this allow to have one ip adresse (127.0.0.1) serving > for more than one pop server. Something like > > pop3proxy_ports: 110, 1100 > pop3proxy_servers: pop3.example.com:110, pop3.example2.com:110 That's exactly it. I must admit, I was very confused when I first used pop3proxy about what these did. pop3proxy_listen_ports or pop3proxy_proxy_ports would probably be a better name, but option names are not easily changed... =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Apr 19 12:51:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Apr 18 19:51:40 2003 Subject: [Spambayes] BUG: Input from LOCALE not used Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB4B@its-xchg4.massey.ac.nz> > I finally found the reason why the SpamBayes plugin was not > loading: I have a Belgian locale with a comma as decimal > separator. As a result, a float should be written as "0,10" > instead of "0.10" and as a result the SpamBayes plugin > crashed on reading the dictionary (I think). > > I now have to change my default LOCALE settings in Windows in > order to get the plugin to work. Could it be possible to > update this so SpamBayes will work for all LOCALE settings? This has definitely been reported before. I thought that a bug was submitted and that Mark checked in a patch, but I can't find the bug anywhere on sf, and I can't find a fix in the check-in archives either. (Am I imagining the whole thing??). Anyway, Mark will probably get to this when he has a chance. I *thought* that what happened was that spambayes set the locale to English within the plugin and that solved it. =Tony Meyer From vanhorn at whidbey.com Fri Apr 18 17:57:26 2003 From: vanhorn at whidbey.com (G. 
Armour Van Horn) Date: Fri Apr 18 19:57:30 2003 Subject: [Spambayes] Getting started References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB49@its-xchg4.massey.ac.nz> Message-ID: <3EA090E6.718147E2@whidbey.com> And I mean *really* getting started. I've been following this list for months, but haven't actually knuckled down to use it. Now it's time. I will be using the pop3proxy approach, but I don't think I even have Python on this system, and I've never used CVS at all on any system. I started by downloading TortoiseCVS, which seems to be properly installed. I guessed that Python would be the next step, so from that area I learned I should be using something like this: cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/python login Based on that, I start a TortoiseCVS Checkout using the following settings: Protocol: Password server (:pserver:) Server: cvs.sourceforge.net Port: or 2401 Repository directory: /cvsroot/python User name: anyonymous Module: CVSROOT When I proceed, I get this message: In C:\Source: cvs -q checkout -P CVSROOT CVSROOT=:pserver:anonymous@cvs.sourceforge.net:/cvsroot/python cvs checkout: Empty password used - try 'cvs login' with a real password U CVSROOT/checkoutlist U CVSROOT/commitinfo U CVSROOT/config U CVSROOT/cvsignore U CVSROOT/cvswrappers U CVSROOT/editinfo U CVSROOT/loginfo U CVSROOT/modules U CVSROOT/notify U CVSROOT/rcsinfo U CVSROOT/syncmail U CVSROOT/taginfo U CVSROOT/verifymsg Success, CVS operation completed However, the resulting folder only includes 16 files for a total of 72 KB of disk. Somehow I don't think that's what I'm supposed to end up with, and it certainly doesn't tell me what I should do next. What simple, bonehead thing am I missing here? Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! 
mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From tim at fourstonesExpressions.com Fri Apr 18 20:07:07 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Apr 18 20:07:17 2003 Subject: [Spambayes] Getting started In-Reply-To: <3EA090E6.718147E2@whidbey.com> Message-ID: <2YOJA9FAB7JJEP5341ZUPKYS041X.3ea0932b@myst> Well, it's about time! Ok, my first recommendation would be to download the alpha2 version, which is pretty much self consistent and functional. The current cvs versions are in considerable flux at the moment, and you'll likely end up as frustrated as David Abrahams if you try to use it... Alpha2 is available on the website as a zip or a tarball. But... if you really want to install cvs, there are some specific instructions on the sourceforge main site for how to install and configure a client. You have to do some stuff to get a certificate installed on sourceforge, etc., etc. It's a bit of a pain, but it really does work once you get set up. 4/18/2003 6:57:26 PM, "G. Armour Van Horn" wrote: >And I mean *really* getting started. I've been following this list for >months, but haven't actually knuckled down to use it. Now it's time. I will >be using the pop3proxy approach, but I don't think I even have Python on >this system, and I've never used CVS at all on any system. > >I started by downloading TortoiseCVS, which seems to be properly installed. 
>I guessed that Python would be the next step, so from that area I learned I >should be using something like this: > cvs > -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/python > login > >Based on that, I start a TortoiseCVS Checkout using the following settings: > >Protocol: Password server (:pserver:) >Server: cvs.sourceforge.net >Port: or 2401 >Repository directory: /cvsroot/python >User name: anyonymous >Module: CVSROOT > >When I proceed, I get this message: > > In C:\Source: cvs -q checkout -P CVSROOT > CVSROOT=:pserver:anonymous@cvs.sourceforge.net:/cvsroot/python > > cvs checkout: Empty password used - try 'cvs login' with a real > password > > U CVSROOT/checkoutlist > U CVSROOT/commitinfo > U CVSROOT/config > U CVSROOT/cvsignore > U CVSROOT/cvswrappers > U CVSROOT/editinfo > U CVSROOT/loginfo > U CVSROOT/modules > U CVSROOT/notify > U CVSROOT/rcsinfo > U CVSROOT/syncmail > U CVSROOT/taginfo > U CVSROOT/verifymsg > > Success, CVS operation completed > >However, the resulting folder only includes 16 files for a total of 72 KB >of disk. Somehow I don't think that's what I'm supposed to end up with, and >it certainly doesn't tell me what I should do next. What simple, bonehead >thing am I missing here? > >Van > >-- >---------------------------------------------------------- >Sign up now for Quotes of the Day, a handful of quotations >on a theme delivered every morning. >Enlightenment! Daily, for free! >mailto:twisted@whidbey.com?subject=Subscribe_QOTD > >For web hosting and maintenance, >visit Van's home page: http://www.domainvanhorn.com/van/ >---------------------------------------------------------- > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
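On the imapfilter traceback posted earlier in this thread (Time2Internaldate complaining that it was handed None): what follows is only a minimal sketch, in present-day Python 3, of how a header lookup in the email package gives None when the header is missing, and one way to guard the call. That the header involved is Date, the fall-back to the current time, and the helper name internal_date are all assumptions made for illustration; this is not the fix that went into imapfilter.

```python
import email
import imaplib
import time
from email.utils import parsedate


def internal_date(msg):
    """Return an IMAP INTERNALDATE string for msg.

    Hypothetical helper: falls back to the current local time when the
    Date header is missing or cannot be parsed (an assumption, not the
    behaviour of imapfilter itself).
    """
    date_header = msg["Date"]          # None if the header is absent
    parsed = parsedate(date_header) if date_header else None
    if parsed is None:
        parsed = time.localtime()
    return imaplib.Time2Internaldate(parsed)


# A message with no Date header at all.
no_date = email.message_from_string("Subject: test\n\nbody\n")
print(no_date["Date"])                 # missing header: lookup gives None
print(internal_date(no_date))

# A message with a well-formed Date header.
dated = email.message_from_string(
    "Date: Fri, 18 Apr 2003 15:49:18 -0400\nSubject: test\n\nbody\n")
print(internal_date(dated))
```

Whether "now" is a sensible substitute for a missing date is a design choice; skipping the message or reusing the server's own internal date would be equally defensible.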
From T.A.Meyer at massey.ac.nz Sat Apr 19 13:09:50 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Apr 18 20:10:31 2003
Subject: [Spambayes] Re: imapfilter mangling headers!
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB4E@its-xchg4.massey.ac.nz>

> And here's another problem:
[...]
> File "./imapfilter.py", line 181, in extractTime
> return imaplib.Time2Internaldate(\
> TypeError: argument must be 9-item sequence, not None
>
> Compilation exited abnormally with code 1 at Fri Apr 18 15:49:18

I've found this. The email message class returns None if a header
isn't found (i.e. message["header_that_doesn't exist"] returns None)
rather than raising an exception; who knows why, but that's what it
does. I've checked in a fix for this.

=Tony Meyer

From T.A.Meyer at massey.ac.nz Sat Apr 19 13:29:20 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Apr 18 20:29:57 2003
Subject: [Spambayes] Getting started
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB4F@its-xchg4.massey.ac.nz>

> >Based on that, I start a TortoiseCVS Checkout using the following
> >settings:

TortoiseCVS is great - much better than WinCVS that sf recommends :)

Your problem is that you've got "CVSROOT" as the module. As well as
that, you're checking out from the python directory instead of
spambayes. Change it to:

Protocol: Password server (:pserver:)
Server: cvs.sourceforge.net
Port:
Repository directory: /cvsroot/spambayes
User name: anonymous
Module: spambayes

> Ok, my first recommendation would be to download the alpha2
> version, which is pretty much self consistent and functional.

I would have to echo Tim's comments, except to add that pop3proxy is
somewhat more stable, even in cvs, than imapfilter (since it's much
older and more widely used), and that if you are interested in
*testing* rather than just *using* then cvs is better. What I do is
have two copies - one stable one that I use for day-to-day email, and
a latest-cvs one that I use for development/testing.
> I don't think I even have Python on this system You will need Python to use pop3proxy, whether you have the alpha release or the cvs. Type "python" in a command line to see if you have it. I gather than OS X comes with 2.2 installed; cygwin has it as an option in the setup application; Windows doesn't by default. Not sure about other platforms. Again, up to you which version of Python you go for - I recommend 2.2.2 (download it from http://python.org), but you can get 2.3a or the cvs if you like. (If using cvs, change the module & directory to python). =Tony Meyer From vanhorn at whidbey.com Fri Apr 18 18:43:09 2003 From: vanhorn at whidbey.com (G. Armour Van Horn) Date: Fri Apr 18 20:44:09 2003 Subject: [Spambayes] Getting started References: <2YOJA9FAB7JJEP5341ZUPKYS041X.3ea0932b@myst> Message-ID: <3EA09B9D.3FB79DA3@whidbey.com> Tim Stone - Four Stones Expressions wrote: > Well, it's about time! Stipulated > Ok, my first recommendation would be to download the alpha2 version, which is > pretty much self consistent and functional. The current cvs versions are in > considerable flux at the moment, and you'll likely end up as frustrated as > David Abrahams if you try to use it... Alpha2 is available on the website as > a zip or a tarball. It seems that things are so much in flux that I would be better off either waiting until at least a beta is done, or staying up to date with the current development. At least right now it seems like every problem is getting dealt with in a matter of hours, while it could be days or weeks before alpha3 comes along. If I'm going to be on the bleeding edge, I want a good view from the front, damn it! > But... if you really want to install cvs, there are some specific instructions > on the sourceforge main site for how to install and configure a client. You > have to do some stuff to get a certificate installed on sourceforge, etc., > etc. It's a bit of a pain, but it really does work once you get set up. 
I saw that, but it appeared to only apply if you wanted to commit, I have no expectation of committing any code back to the project. Of course, if the only way to do it is to setup that way, then I could just refrain from committing any changes I make (certainly easy enough). But I did get the impression that anyonymous read-only access was supposed to be available. Van -- ---------------------------------------------------------- Sign up now for Quotes of the Day, a handful of quotations on a theme delivered every morning. Enlightenment! Daily, for free! mailto:twisted@whidbey.com?subject=Subscribe_QOTD For web hosting and maintenance, visit Van's home page: http://www.domainvanhorn.com/van/ ---------------------------------------------------------- From T.A.Meyer at massey.ac.nz Sat Apr 19 13:43:49 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Apr 18 20:44:28 2003 Subject: [Spambayes] Email package and the CRLF pair Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB50@its-xchg4.massey.ac.nz> > Not an expert, but I'll do my best... Well, your comments matched those of *the* expert, so that's not bad! > \n is LF not CR. You nearly had me confused with that... Whoops. So it is. (you see, this is why you have both, for confused people like me ;) > Interesting. Does it say anything about line terminators in > the body? It probably should, as email is a pure-text medium, > so you should be considering line termination for the whole > message, not just the headers. I hadn't bothered to look that far, but yes it does. It (RFC2822) says: "CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body." > For example, is the following valid? (Ignoring issues of required > headers) > > Subject: A Test[CR][LF] > From: Me [CR][LF] > [CR][LF] > [CR][LF] Yes. > Now we start to get nasty[LF] No. > Let's mix things up completely[CR][LF] Yes. > And a Mac variation, just for fun[CR] No. > So how does this look?[CR][LF] Yes. 
This: This is a really odd one.[LF][CR] Is also not valid. > Interestingly, I can't see anything equivalent in imaplib.py. > So maybe it's best argued as a bug in imaplib, rather than in > the email package. (If the IMAP protocol mandates CRLF, then > imaplib should ensure that rather than making client code - > which generally uses \n internally - care about it). This is what Barry argued as well, so this is what it was submitted as. The way I see it, the IMAP protocol mandates RFC822, and the email package implements RFC822, so it's the email package's job. However, I guess the email package really does more than that, and all the arguments Barry put forth make sense. > For a workaround in client code, if you want to force CRLF in > the message string, do [...] > This is a bit more rigorous than a simple str.replace("\n", "\r\n"). I thought that it could be a bit better, but I'm the first to admit that I'm weak with regex. Thanks for that, I'll switch what we use to this. =Tony Meyer From tim_one at email.msn.com Fri Apr 18 22:18:08 2003 From: tim_one at email.msn.com (Tim Peters) Date: Fri Apr 18 21:18:46 2003 Subject: [Spambayes] Getting started In-Reply-To: <3EA090E6.718147E2@whidbey.com> Message-ID: [G. Armour Van Horn] > ... > I started by downloading TortoiseCVS, which seems to be properly > installed. I guessed that Python would be the next step, so from that > area I learned I should be using something like this: > cvs > -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/python > login Hmm. I hope you're not trying to download Python from CVS. Which OS are you using? I'm guessing some flavor of Windows, based on the version of CVS you're using. In that case, download the Windows installer for Python instead: http://www.python.org/2.2.2/ I'm the guy who builds that installer, and I know for a fact that not particularly bright 7-year-olds have installed Python from it without any help. But if you get stuck, don't be shy about asking . 
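Following up the CRLF thread above (Paul's force_CRLF workaround and Tony's RFC 2822 verdicts on the mixed-ending example), here is a minimal sketch in present-day Python 3 of why the regex is more rigorous than a plain str.replace, together with a check for the bare CR or LF that the RFC forbids. The names naive_crlf and has_bare_cr_or_lf are made up for this illustration; only the first regex comes from the thread.

```python
import re

# Regex quoted in the thread (attributed there to smtplib.quotedata):
# matches CRLF, a lone LF, or a lone CR.
_ANY_EOL = re.compile(r'(?:\r\n|\n|\r(?!\n))')

# Hypothetical pattern: a CR not followed by LF, or an LF not preceded by CR.
_BARE = re.compile(r'\r(?!\n)|(?<!\r)\n')


def force_CRLF(data):
    """Rewrite every line ending (CRLF, LF or CR) as CRLF."""
    return _ANY_EOL.sub("\r\n", data)


def naive_crlf(data):
    """The simple str.replace approach mentioned in the thread."""
    return data.replace("\n", "\r\n")


def has_bare_cr_or_lf(data):
    """True if data contains a CR or LF that is not part of a CRLF pair."""
    return _BARE.search(data) is not None


# Mixed endings in the spirit of Paul's example: CRLF, LF and CR together.
mixed = "Subject: A Test\r\n\r\nnasty\nmixed\r\nmac\rend\r\n"
print(repr(naive_crlf(mixed)))
print(repr(force_CRLF(mixed)))
print(has_bare_cr_or_lf(mixed), has_bare_cr_or_lf(force_CRLF(mixed)))
```

On the mixed sample, the str.replace version turns each existing CRLF into CR CR LF and leaves the lone CR untouched, whereas the regex version leaves nothing for the bare-CR/LF check to find.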
From ta-meyer at ihug.co.nz Sat Apr 19 13:21:44 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Fri Apr 18 21:25:37 2003 Subject: [mimelib-devel] Fwd: [Spambayes] Email package and the CRLFpair In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130152A6EA@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CE19@its-xchg4.massey.ac.nz> > I can see an option to Generator.flatten() to produce text > representations in CRLF format. Since we're trying to design > a better interface to the Generator anyway, this is > definitely a reasonable thing to discuss. I don't pretend to have any expertise here, but I think that option sounds like a good idea. Go ahead and add it to the discussion :) (Nice timing for our problem to come up while you're discussing it...) > I probably wouldn't want it to be the default though, since > it's simply more convenient to deal with newline line > termination, and practicality beats purity. :) Fair enough :) > Note that the Parser should be able to handle any of the > common line endings. Some imap servers seem to as well...but one was strict enough to consider messages with only \n as having no headers and only a body...(apologies if I offend anyone here, but the more I use it, the more I dislike imap...) I've submitted a bug, so hopefully this'll get done sooner or later (it should be easy enough to do). For our (spambayes) purposes we'll just do it ourselves since we have to be compatible with Python 2.2. =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Apr 19 14:27:25 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Apr 18 21:28:30 2003 Subject: [Spambayes] training via email forwarding Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB54@its-xchg4.massey.ac.nz> > I'm not using the proxy; I use IMAP. I use spambayes to > filter the mail as it comes in. If something slips through, > I want to forward an email message to s-b and have it train > appropriately. How fixed on forwarding are you? 
You can use the imap filter to train only and not classify (-t option). The way this works is that you move mail you want to train into certain folders (you specify which ones). A week or so ago I offered training options for imap, one of which was forwarding, and no-one went for that one (they all liked the folder method more). However, if you're using SMTP for outgoing mail, you can still use the smtpproxy if we set it up to work forwarding whole messages as well as via a message key. Let me know if you want me to put something together and check it in. =Tony Meyer From T.A.Meyer at massey.ac.nz Sat Apr 19 14:58:52 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Apr 18 21:59:58 2003 Subject: [Spambayes] training via email forwarding Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB57@its-xchg4.massey.ac.nz> > > You can use the imap filter to train > > only and not classify (-t option). The way this works is that you > > move mail you want to train into certain folders (you specify which > > ones). > > A cron job that runs mboxtrain periodically would do it. If (when?) you do get this going, it would be great if you could post it to the list so that we can add it to the integration docs. > > However, if you're using SMTP for outgoing mail, > I don't want to / can't run a server on the mail server. Fair enough with the don't want to (and it saves me a job!). You don't need to run a server, though. The SMTP proxy just proxies to whatever server your MUA connects to. So your MUA connects to localhost, but thinks it's connecting to smtp.example.com. =Tony Meyer From mhammond at skippinet.com.au Sat Apr 19 13:27:16 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Fri Apr 18 22:27:53 2003 Subject: [Spambayes] BUG: Input from LOCALE not used In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB4B@its-xchg4.massey.ac.nz> Message-ID: <000601c3061b$3290f430$530f8490@eden> > This has definitely been reported before. 
I thought that a bug was
> submitted and that Mark checked in a patch, but I can't find the bug
> anywhere on sf, and I can't find a fix in the check-in
> archives either.
> (Am I imagining the whole thing??).
>
> Anyway, Mark will probably get to this when he has a chance. I
> *thought* that what happened was that spambayes set the locale to
> English within the plugin and that solved it.

Oops - that was sitting modified in my tree. I just checked in a fix
to addin.py to fix this, although the same issue is almost certain to
bite pop3proxy users on that platform. The core should include the fix
I made, IMO:

> # Set our locale to be English, so our config parser works OK
> # (This should almost certainly be done elsewhere, but as no one
> # else seems to have an opinion on where this is, here is as good
> # as any!
> import locale
> locale.setlocale(locale.LC_NUMERIC, "en")

Mark.

From rsalz at datapower.com Fri Apr 18 22:52:03 2003
From: rsalz at datapower.com (Rich Salz)
Date: Fri Apr 18 22:37:02 2003
Subject: [Spambayes] training via email forwarding
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB54@its-xchg4.massey.ac.nz>
Message-ID:

> How fixed on forwarding are you?

I like forwarding because I can forward to myself and procmail will
arrange for the training to happen.

> You can use the imap filter to train
> only and not classify (-t option). The way this works is that you move
> mail you want to train into certain folders (you specify which ones).

A cron job that runs mboxtrain periodically would do it. It doesn't
look that tough to teach mboxtrain about forwarded email messages, so
that's probably what I'd do.
> A week or so ago I offered training options for imap, one of which was > forwarding, and no-one went for that one (they all liked the folder > method more). It takes all kinds. :) > However, if you're using SMTP for outgoing mail, you can still use the > smtpproxy if we set it up to work forwarding whole messages as well as > via a message key. Let me know if you want me to put something together > and check it in. I don't want to / can't run a server on the mail server. /r$ From rsalz at datapower.com Fri Apr 18 23:02:09 2003 From: rsalz at datapower.com (Rich Salz) Date: Fri Apr 18 22:37:07 2003 Subject: [Spambayes] training via email forwarding In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB57@its-xchg4.massey.ac.nz> Message-ID: I'll most definitely send my diffs along. I connect to the server from two hosts, work (linux) and home (windows), so running a proxy isn't convenient. Basically I have a mail server where I can run procmail, and not much more. At this point, I've learned/realized/been-taught that the only real difference between forwarding and mboxtrain is the latter requires (a) cron; and (b) periodic clean-out of the spam. I'll try to whip up a patch. /r$ From francois.granger at free.fr Sat Apr 19 11:26:45 2003 From: francois.granger at free.fr (Francois Granger) Date: Sat Apr 19 04:26:51 2003 Subject: [Spambayes] Getting started In-Reply-To: <3EA090E6.718147E2@whidbey.com> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB49@its-xchg4.massey.ac.nz> <3EA090E6.718147E2@whidbey.com> Message-ID: At 16:57 -0700 on 18/04/2003, in message [Spambayes] Getting started, G. Armour Van Horn wrote: >And I mean *really* getting started. I've been following this list for >months, but haven't actually knuckled down to use it. Now it's time. I will >be using the pop3proxy approach, but I don't think I even have Python on >this system, and I've never used CVS at all on any system. My advice would be to forget CVS. 
Go to Python.org http://python.org/2.2.2/, download a distribution of Python for your platform (Windows 2000) http://python.org/ftp/python/2.2.2/Python-2.2.2.exe, then download the build of Spambayes http://prdownloads.sourceforge.net/spambayes/spambayes-1.0a2.zip?download and voilà! -- Hofstadter's Law : It always takes longer than you expect, even when you take into account Hofstadter's Law. From francois.granger at free.fr Sat Apr 19 11:32:25 2003 From: francois.granger at free.fr (Francois Granger) Date: Sat Apr 19 04:32:31 2003 Subject: [Spambayes] Getting started In-Reply-To: <3EA09B9D.3FB79DA3@whidbey.com> References: <2YOJA9FAB7JJEP5341ZUPKYS041X.3ea0932b@myst> <3EA09B9D.3FB79DA3@whidbey.com> Message-ID: At 17:43 -0700 on 18/04/2003, in message Re: [Spambayes] Getting started, G. Armour Van Horn wrote: >changes I make (certainly easy enough). But I did get the impression that >anonymous read-only access was supposed to be available. It is available. See https://sourceforge.net/cvs/?group_id=61702 =================== Anonymous CVS Access This project's SourceForge.net CVS repository can be checked out through anonymous (pserver) CVS with the following instruction set. The module you wish to check out must be specified as the modulename. When prompted for a password for anonymous, simply press the Enter key. To determine the names of the modules created by this project, you may examine their CVS repository via the provided web-based CVS repository viewer. cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/spambayes login cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/spambayes co modulename =================== -- Hofstadter's Law : It always takes longer than you expect, even when you take into account Hofstadter's Law. From dave at boost-consulting.com Sat Apr 19 08:36:36 2003 From: dave at boost-consulting.com (David Abrahams) Date: Sat Apr 19 07:36:50 2003 Subject: [Spambayes] Re: imapfilter mangling headers!
References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB3E@its-xchg4.massey.ac.nz> Message-ID: <84he8uvhvf.fsf@boost-consulting.com> "Meyer, Tony" writes: >> Indeed. The UI script even has a syntax error (missing a >> colon). > > Well, there aren't any scripts that use the UI module, so that really > shouldn't matter. ?!? You can't even invoke the script unless it can be compiled! > And even the trial ones that I haven't checked in > don't use that function... How did you manage to test them? > It's fixed now, anyway. > >> Is there someplace I can look for the syntax of >> bayescustomize.ini now? In the meantime, I guess I am going >> to hack Options.py again :( > > I still don't understand why you need to touch Options.py. Because there's no obvious documentation which describes what goes in bayescustomize.ini. I wrote a bayescustomize.ini, but the entries seemed to be ignored. > In fact, given the new format, it is *highly* unadvisable. Add any > configuration you need to in your bayescustomize.ini, or wait a day > or two and use the web ui when access is provided to it. I have serious doubts that I'll be able to run a browser on the machine where I want to do this, so I don't think the web ui will be an option. > bayescustomize.ini uses the standard configuration format. ...which is documented where? > The options you need are [imap] server, port, username, password > (unless you use the -p option) and the folders. You probably need > to set the pop3proxy options that specify where the database should > be filed. > > Sorry this isn't the easiest of options, but this is all still very much > in early development. It's still really only suitable for testing, not > for production use. That part I can understand. -- Dave Abrahams Boost Consulting www.boost-consulting.com From dave at boost-consulting.com Sat Apr 19 08:50:24 2003 From: dave at boost-consulting.com (David Abrahams) Date: Sat Apr 19 07:50:37 2003 Subject: [Spambayes] Re: imapfilter mangling headers! 
References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB4E@its-xchg4.massey.ac.nz> Message-ID: <84d6jivh8f.fsf@boost-consulting.com> "Meyer, Tony" writes: >> And here's another problem: > [...] >> File "./imapfilter.py", line 181, in extractTime >> return imaplib.Time2Internaldate(\ >> TypeError: argument must be 9-item sequence, not None >> >> Compilation exited abnormally with code 1 at Fri Apr 18 15:49:18 > > I've found this. The email message class returns None if a header isn't > found (i.e. message["header_that_doesn't exist"] returns None) rather > than raising an exception; who knows why, but that's what it does. > > I've checked in a fix for this. Not yet, I think. After an update: %python imapfilter.py -c -D ~/bayes.db Traceback (most recent call last): File "imapfilter.py", line 509, in ? run() File "imapfilter.py", line 499, in run imap_filter.Filter() File "imapfilter.py", line 391, in Filter self.unsure_folder) File "imapfilter.py", line 350, in Filter msg.Save() File "imapfilter.py", line 201, in Save msg_time = self.extractTime() File "imapfilter.py", line 178, in extractTime return imaplib.Time2Internaldate(\ TypeError: argument must be 9-item sequence, not None -- Dave Abrahams Boost Consulting www.boost-consulting.com From lists at morpheus.demon.co.uk Sat Apr 19 17:59:53 2003 From: lists at morpheus.demon.co.uk (Paul Moore) Date: Sat Apr 19 11:59:46 2003 Subject: [Spambayes] Email package and the CRLF pair References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB50@its-xchg4.massey.ac.nz> Message-ID: "Meyer, Tony" writes: >> Not an expert, but I'll do my best... > > Well, your comments matched those of *the* expert, so that's not bad! Blind luck :-) >> Interesting. Does it say anything about line terminators in >> the body? It probably should, as email is a pure-text medium, >> so you should be considering line termination for the whole >> message, not just the headers. > > I hadn't bothered to look that far, but yes it does. 
It (RFC2822) says: > "CR and LF MUST only occur together as CRLF; they MUST NOT appear > independently in the body." OK, that's definitive. But reality differs. Look at any mbox file on a Unix system and you'll see LF terminators. Actually, if you look at RFC822, section 1.1 (Scope), you'll see: This standard specifies a syntax for text messages that are sent among computer users, within the framework of "electronic mail". and later: Note: This standard is NOT intended to dictate the internal for- mats used by sites, So this is pretty clear that RFC822 defines a network format, and not a local file, or other, format. And mandating a specific line termination convention is crucial for wire transfer formats. You *could* argue that RFC822 has nothing to say outside the context of network transfers, and so saying that the email package conforms to RFC822 is meaningless. But that's hair splitting (which is one of my hobbies, but I try not to inflict it on others :-)) The practical fact is that "colloquial" use of the term "RFC822" refers to the header and body structure, but not such things as the line termination, or the mandatory headers, etc. And the email package works with that "colloquial" version. As a general set of rules (which aren't stated anywhere) it's probably fair to say that: 1. Modules which manipulate internet-format data (like email) should work with line terminators of \n internally (just like Python strings do). 2. Modules which transmit files across TCP/IP should canonicalise any form of line ending to CRLF. 3. Modules which present data *received* from TCP/IP (like POP3) should convert data to \n line endings before returning it to the program. 4. Reading from the filesystem should be handled like (3), and should support files opened in text or binary modes (or universal newline mode in Python 2.3) 5. 
Writing to the filesystem should be done by assuming the data uses \n internally (the above rules make this true) and writing either in binary format (which leaves LFs in the files, ie Unix format) or in text format (which converts the \n characters to the platform native newline sequence). This is basically "be lenient in what you accept, and strict in what you send", plus "use \n internally as a line terminator". The only places I know of where these rules don't work currently are the imaplib bug you just raised, and a bug in the mailbox module I raised a long time ago (http://www.python.org/sf/586899) which basically notes that passing a file open in text mode to the mailbox constructor doesn't work. It might be nice to document these rules (or the right ones, rather than just my unsubstantiated opinions :-)) somewhere. But I don't know where, so I'm not volunteering :-) Paul. -- This signature intentionally left blank From python-spambayes at discworld.dyndns.org Sat Apr 19 14:10:04 2003 From: python-spambayes at discworld.dyndns.org (Charles Cazabon) Date: Sat Apr 19 15:06:07 2003 Subject: [Spambayes] Email package and the CRLF pair In-Reply-To: ; from lists@morpheus.demon.co.uk on Sat, Apr 19, 2003 at 04:59:53PM +0100 References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB50@its-xchg4.massey.ac.nz> Message-ID: <20030419131004.B14801@discworld.dyndns.org> Paul Moore wrote: > > > > I hadn't bothered to look that far, but yes it does. It (RFC2822) says: > > "CR and LF MUST only occur together as CRLF; they MUST NOT appear > > independently in the body." > > OK, that's definitive. But reality differs. Look at any mbox file on a > Unix system and you'll see LF terminators. rfc822 and 2822 only govern the message on-the-wire. Hosts /always/ use the native line ending of the system to store the message on-disk. rfcs in general only apply to protocols as they appear on the wire. 
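A minimal sketch of the line-ending rules listed in this thread, written for current Python 3 rather than the 2.2 under discussion. The helper names to_internal and to_wire are made up for illustration; they are not part of the email package, imaplib, or spambayes.

```python
import re

# Matches CRLF, a lone CR, or a lone LF. CRLF is listed first so that a
# CRLF pair is consumed as one line ending, not two.
_ANY_EOL = re.compile(r"\r\n|\r|\n")


def to_internal(text):
    """Rules 1, 3 and 4: whatever arrives, use "\\n" internally."""
    return _ANY_EOL.sub("\n", text)


def to_wire(text):
    """Rule 2: canonicalise every line ending to CRLF before sending."""
    return _ANY_EOL.sub("\r\n", text)


# A message as it might arrive from a POP3 or IMAP server (hypothetical data).
received = "Subject: test\r\n\r\nline one\r\nline two\r\n"
internal = to_internal(received)
print(repr(internal))
print(repr(to_wire(internal)))
```

Both helpers are idempotent, so applying to_wire to text that is already CRLF-terminated leaves it unchanged. That is the "lenient in what you accept, strict in what you send" behaviour in practice.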
Charles -- ----------------------------------------------------------------------- Charles Cazabon GPL'ed software available at: http://www.qcc.ca/~charlesc/software/ ----------------------------------------------------------------------- From T.A.Meyer at massey.ac.nz Sun Apr 20 15:16:46 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 20 22:52:38 2003 Subject: [Spambayes] Re: imapfilter mangling headers! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB61@its-xchg4.massey.ac.nz> > >> Indeed. The UI script even has a syntax error (missing a > >> colon). > > Well, there aren't any scripts that use the UI module, so > > that really shouldn't matter. > ?!? You can't even invoke the script unless it can be compiled! That's my point; let me spell it out more: at that stage the ImapUI.py module (which had the error) was not imported by *any* spambayes module. Since nothing used it, nothing invoked it, and nothing tried to compile it. (Things have changed since, and it does compile, and work, and get imported by imapfilter.py). > > And even the trial ones that I haven't checked in > > don't use that function... > How did you manage to test them? I did the testing before the function was added (and I tested that function without the line that had the syntax error, which, ironically, was only a test that the previous line worked). Since the function wasn't used, I didn't test afterwards since none of the code that was used had changed. If the module was used by anything else I would have done a cursory check that would have picked up the syntax error, but it really wasn't necessary. > I wrote a bayescustomize.ini, but the entries seemed to be ignored. Spambayes might not have been finding it. It should be found if it is in the current working directory (although I've found this a little unreliable sometimes), but it will definitely be found if the path is in the envar BAYESCUSTOMIZE. (I'll add this to the docs, too). 
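A rough sketch of the lookup described here, for anyone wanting to check which file would be picked up. This is not the code in Options.py: the function name find_customize_file is invented, the order (environment variable first, then the current working directory) is an assumption, and BAYESCUSTOMIZE is treated as a single path.

```python
import os


def find_customize_file(environ=None, cwd=None):
    """Return the first existing candidate config file, or None.

    Assumed order: the path named by BAYESCUSTOMIZE, then
    bayescustomize.ini in the current working directory.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd

    candidates = []
    env_path = environ.get("BAYESCUSTOMIZE")
    if env_path:
        candidates.append(env_path)
    candidates.append(os.path.join(cwd, "bayescustomize.ini"))

    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


print(find_customize_file())
```

Running it from the directory where pop3proxy or imapfilter is started shows whether a file is visible at all, before looking at whether its entries are honoured.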
It's easy to test if it's reading it - add some entry that doesn't exist in spambayes (like "option_does_not_exist: True") and try to run pop3proxy or imapfilter. You'll get an error message about an invalid option if it is finding it. If it does find it and ignore entries made, then this is definitely a bug and should be reported (with details about which entries are being ignored). > I have serious doubts that I'll be able to run a browser on > the machine where I want to do this, so I don't think the web > ui will be an option. Well, I guess then I'll add some documentation (maybe a FAQ) for people in this situation. The reason there isn't any yet is that no-one has said that they can't run a browser but want to use pop3proxy or imapfilter. It's probably also been taken for granted that people using spambayes without the web ui know what a config file looks like, given that they're using software that hasn't even made it to beta yet. > > bayescustomize.ini uses the standard configuration format. > ...which is documented where? Or just google for ini file format. Or search for a file suffixed with .ini or .rc. Or look at the default_bayes_customize.ini in the Outlook2000 folder. Or look in the cvs history of Options.py at the one that used to be built in there. (Given that this was the case until a couple of days ago, it's hardly surprising that newer documentation hasn't made it yet. Remember that this is a collaborative, open-source, voluntary project...things take time...). =Tony Meyer From T.A.Meyer at massey.ac.nz Sun Apr 20 15:16:57 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 20 22:52:44 2003 Subject: [Spambayes] Re: imapfilter mangling headers! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB62@its-xchg4.massey.ac.nz> [Time error] > > I've checked in a fix for this. > Not yet, I think. After an update: > > %python imapfilter.py -c -D ~/bayes.db > Traceback (most recent call last): > File "imapfilter.py", line 509, in ? 
> run() > File "imapfilter.py", line 499, in run > imap_filter.Filter() > File "imapfilter.py", line 391, in Filter > self.unsure_folder) > File "imapfilter.py", line 350, in Filter > msg.Save() > File "imapfilter.py", line 201, in Save > msg_time = self.extractTime() > File "imapfilter.py", line 178, in extractTime > return imaplib.Time2Internaldate(\ > TypeError: argument must be 9-item sequence, not None A couple of things that are odd here. The first is that extractTime should almost never be called now - the imap server should provide the time. Could you execute the following and tell me what you get? (replacing the server-specific stuff, obviously) >>> import imaplib >>> imap = imaplib.IMAP4("mail.example.com") >>> imap.login("username", "password") >>> imap.select() >>> imap.fetch("1:1", "(INTERNALDATE)") >>> imap.logout() The other odd thing is that the date header is definitely present (otherwise line 178 wouldn't execute) so it's either parsedate or mktime that's returning None. Could you change line 178 to this and let me know what prints out? Pd = parsedate(message_date) print pd md = mktime(pd) print md t2i = imaplib.Time2Internaldate(md) print t2i return t2i Thanks for the help in tracing these down. Given the way that each imap server software seems to put it's own interpretation on the RFC, we need as many testers as possible. =Tony Meyer From T.A.Meyer at massey.ac.nz Sun Apr 20 15:33:48 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 20 22:52:49 2003 Subject: [Spambayes] Email package and the CRLF pair Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB63@its-xchg4.massey.ac.nz> > As a general set of rules (which aren't stated anywhere) it's > probably fair to say that: [rules] Nice rules (and nice and clear, too). > It might be nice to document these rules (or the right ones, > rather than just my unsubstantiated opinions :-)) somewhere. 
> But I don't know where, so I'm not volunteering :-) Nor do I, so nor am I (well, they're documented in the spambayes mailing list archive now, but I'm not certain that's the right place ;). However, putting them in the right place does sound like a nice idea. I'm sure it's not *just* me that is easily confused by such things - and it would help anyone that develops a module that transmits (over TCP/IP) RFC(2)822 messages. I think I'll add this as a RFE (stealing your rules ;) to Python and see if anyone picks it up. > This is basically "be lenient in what you accept, and strict > in what you send", plus "use \n internally as a line terminator". Definitely +1 (well, perhaps if "\n" is replaced by a wordier "your platform/language's standard line terminator"). =Tony Meyer From T.A.Meyer at massey.ac.nz Sun Apr 20 15:41:31 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 20 22:52:54 2003 Subject: [Spambayes] BUG: Input from LOCALE not used Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB64@its-xchg4.massey.ac.nz> > Oops - that was sitting modified in my tree. So I'm not going crazy, just psychic ;) > I just checked > in a fix to addin.py to fix this, although the same issue is > almost certain to bite pop3propxy users on that platform. > The core should include the fix I made, IMO: +1 from me. Where do you think it belongs? (i.e. what do you mean by "core"?). classifier.py? (does it effect tokenising?) =Tony Meyer From T.A.Meyer at massey.ac.nz Sun Apr 20 22:04:59 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 20 22:53:35 2003 Subject: [Spambayes] Re: imapfilter mangling headers! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB65@its-xchg4.massey.ac.nz> > > Pd = parsedate(message_date) > ^^ "pd" That's stupid Outlook thinking it knows how to fix my spelling. > > print pd > > md = mktime(pd) > ^^^^^^ "time.mktime" That's stupid me, thinking I had imported that... 
;) [Traceback] > File "imapfilter.py", line 180, in extractTime > md = time.mktime(pd) > TypeError: argument must be 9-item sequence, not None Well, it seems that it's the call to parsedate that's returning None. I'll check in a fix, since it's easy enough, but it's strange. The date header must not be in the correct (RFC822) format. > >>> import imaplib > >>> imap = imaplib.IMAP4("mail.example.com") > >>> imap.login("username", "password") > ('OK', ['completed']) > >>> imap.select() > ('OK', ['5282']) > >>> imap.fetch("1:1", "(INTERNALDATE)") > ('OK', ['1 (INTERNALDATE "11-Mar-2003 18:37:56 +0000")']) > >>> imap.logout() > ('BYE', ['CommuniGate Pro IMAP closing connection']) What's odd is that it's finding the INTERNALDATE, so it should be using that, and not extractTime at all. I'll look into this, but it might be tomorrow before I get to it. > No problem; I just wish I could get you an account so you > could try this stuff yourself. :) I do test on the two imap servers I have - one is Courier and one is Netmail. Even with just these two there are notable differences. =Tony Meyer From dave at boost-consulting.com Sun Apr 20 05:56:00 2003 From: dave at boost-consulting.com (David Abrahams) Date: Sun Apr 20 22:53:52 2003 Subject: [Spambayes] Re: imapfilter mangling headers! In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB62@its-xchg4.massey.ac.nz> (Tony Meyer's message of "Sun, 20 Apr 2003 14:16:57 +1200") References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB62@its-xchg4.massey.ac.nz> Message-ID: <84znml7djz.fsf@boost-consulting.com> "Meyer, Tony" writes: > [Time error] >> > I've checked in a fix for this. >> Not yet, I think. After an update: >> >> %python imapfilter.py -c -D ~/bayes.db >> Traceback (most recent call last): >> File "imapfilter.py", line 509, in ? 
>> run() >> File "imapfilter.py", line 499, in run >> imap_filter.Filter() >> File "imapfilter.py", line 391, in Filter >> self.unsure_folder) >> File "imapfilter.py", line 350, in Filter >> msg.Save() >> File "imapfilter.py", line 201, in Save >> msg_time = self.extractTime() >> File "imapfilter.py", line 178, in extractTime >> return imaplib.Time2Internaldate(\ >> TypeError: argument must be 9-item sequence, not None > > A couple of things that are odd here. The first is that extractTime > should almost never be called now - the imap server should provide the > time. Could you execute the following and tell me what you get? > (replacing the server-specific stuff, obviously) > >>> import imaplib >>> imap = imaplib.IMAP4("mail.example.com") >>> imap.login("username", "password") ('OK', ['completed']) >>> imap.select() ('OK', ['5282']) >>> imap.fetch("1:1", "(INTERNALDATE)") ('OK', ['1 (INTERNALDATE "11-Mar-2003 18:37:56 +0000")']) >>> imap.logout() ('BYE', ['CommuniGate Pro IMAP closing connection']) > The other odd thing is that the date header is definitely present > (otherwise line 178 wouldn't execute) so it's either parsedate or mktime > that's returning None. Could you change line 178 to this and let me > know what prints out? > > Pd = parsedate(message_date) ^^ "pd" > print pd > md = mktime(pd) ^^^^^^ "time.mktime" > print md > t2i = imaplib.Time2Internaldate(md) > print t2i > return t2i After making the above corrections: %python imapfilter.py -c -D ~/bayes.db None Traceback (most recent call last): File "imapfilter.py", line 529, in ? run() File "imapfilter.py", line 519, in run imap_filter.Filter() File "imapfilter.py", line 398, in Filter self.unsure_folder) File "imapfilter.py", line 357, in Filter msg.Save() File "imapfilter.py", line 208, in Save msg_time = self.extractTime() File "imapfilter.py", line 180, in extractTime md = time.mktime(pd) TypeError: argument must be 9-item sequence, not None > Thanks for the help in tracing these down. 
Given the way that each > imap server software seems to put it's own interpretation on the > RFC, we need as many testers as possible. No problem; I just wish I could get you an account so you could try this stuff yourself. -- Dave Abrahams Boost Consulting www.boost-consulting.com From skip at pobox.com Sun Apr 20 08:57:27 2003 From: skip at pobox.com (Skip Montanaro) Date: Sun Apr 20 22:54:21 2003 Subject: [Spambayes] Re: [Spambayes-checkins] spambayes/spambayes message.py, 1.12, 1.13 In-Reply-To: References: Message-ID: <16034.39223.115333.379226@montanaro.dyndns.org> Tim> + #XXX Tim doesn't like this regex. He will study it to Tim> + #XXX see if he can learn to like it, or come up with Tim> + #XXX something he likes better. - Tim Tim> return re.sub(r'(?:\r\n|\n|\r(?!\n))', "\r\n", data) I don't think the negative lookahead assertion is necessary. By default, regular expressions are greedy (match the largest possible string), so return re.sub(r"\r\n|\n|\r", "\r\n", data) should work. Any time \r\n appears, it will be preferred over either \r or \n, so if the \r branch matches, that implies it wasn't followed by \n. If I've been laboring under a misconception for all these years, I'd be happy to be corrected. Skip From skip at pobox.com Sun Apr 20 08:42:10 2003 From: skip at pobox.com (Skip Montanaro) Date: Sun Apr 20 22:54:27 2003 Subject: [Spambayes] pop3proxy_port, pop3proxy_server_name andpop3proxy_server_port options In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB49@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB49@its-xchg4.massey.ac.nz> Message-ID: <16034.38306.897538.875810@montanaro.dyndns.org> >>>>> "Tony" == Tony Meyer writes: Skip> What's the purpose of pop3proxy_ports? Can the ports to listen to Skip> be inferred from the pop3proxy_servers list? >> From my memory, this allow to have one ip adresse (127.0.0.1) serving >> for more than one pop server. 
Something like >> >> pop3proxy_ports: 110, 1100 >> pop3proxy_servers: pop3.example.com:110, pop3.example2.com:110 Tony> That's exactly it. I must admit, I was very confused when I first Tony> used pop3proxy about what these did. pop3proxy_listen_ports or Tony> pop3proxy_proxy_ports would probably be a better name, but option Tony> names are not easily changed... Still, if they are better named something other than what they are today, it will only be harder to change them later, especially after another release. Also, since we now have a proper hierarchy in our option names shouldn't we zap the "pop3proxy_" prefix? [pop3proxy] listen_ports: 110, 1100 remote_servers: pop3.example.com:110, pop3.example2.com:110 Skip From tim at fourstonesExpressions.com Sun Apr 20 15:34:55 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Sun Apr 20 22:54:35 2003 Subject: [Spambayes] Re: [Spambayes-checkins] spambayes/spambayes message.py, 1.12, 1.13 In-Reply-To: <16034.39223.115333.379226@montanaro.dyndns.org> Message-ID: 4/20/2003 7:57:27 AM, Skip Montanaro wrote: > > Tim> + #XXX Tim doesn't like this regex. He will study it to > Tim> + #XXX see if he can learn to like it, or come up with > Tim> + #XXX something he likes better. - Tim > Tim> return re.sub(r'(?:\r\n|\n|\r(?!\n))', "\r\n", data) > >I don't think the negative lookahead assertion is necessary. By default, >regular expressions are greedy (match the largest possible string), so > > return re.sub(r"\r\n|\n|\r", "\r\n", data) > >should work. Any time \r\n appears, it will be preferred over either \r or >\n, so if the \r branch matches, that implies it wasn't followed by \n. If >I've been laboring under a misconception for all these years, I'd be happy >to be corrected. I took a hard look at this one too. My intuition told me that there was something wrong with it, but I couldn't come up with an alternative that only used two alternations (three is worse than two...) 
that worked in all cases. The negative lookahead was especially distressing ;) I worked for a while on one that would only require two alternations using negated character classes, but didn't have much luck because they required a capture of the negated character for the replacement, which is even more expensive than the negative lookahead. Yours should work better than what's there right now, and I'm thinking about this one as well: '\r\n?|\n' This is possibly a major performance consideration, so it really behooves us to take a good hard look at this regex (and the issue of why it's there in the first place). > >Skip > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From T.A.Meyer at massey.ac.nz Mon Apr 21 16:34:25 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 20 23:34:59 2003 Subject: [Spambayes] Re: [Spambayes-checkins] spambayes/spambayes message.py, 1.12, 1.13 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB92@its-xchg4.massey.ac.nz> [message.py] > return re.sub(r'(?:\r\n|\n|\r(?!\n))', "\r\n", data) [Skip] > I don't think the negative lookahead assertion is necessary. > By default, regular expressions are greedy (match the largest > possible string), so > > return re.sub(r"\r\n|\n|\r", "\r\n", data) > > should work. Any time \r\n appears, it will be preferred > over either \r or \n, so if the \r branch matches, that > implies it wasn't followed by \n. If I've been laboring > under a misconception for all these years, I'd be happy to be > corrected. I'm not sure if I commented this anywhere, but I lifted the regex straight from smtplib, since it's doing the same thing as we want to do. 
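The three candidate patterns in this thread can be compared side by side. The sketch below is for current Python 3 and uses made-up sample strings. One caveat on the reasoning quoted above: in Python's re module, alternation tries its branches left to right and takes the first that matches, not the longest, so the pattern without the lookahead works because \r\n is listed before \r and \n.

```python
import re

# The pattern from message.py, the version without the lookahead,
# and the two-branch version.
PATTERNS = {
    "lookahead": r"(?:\r\n|\n|\r(?!\n))",
    "three-way": r"\r\n|\n|\r",
    "two-way": r"\r\n?|\n",
}

# Illustrative inputs, including mixed and doubled line endings.
SAMPLES = ["a\r\nb", "a\nb", "a\rb", "a\r\r\nb", "a\n\rb", "\r\n\r\n", ""]


def normalise(pattern, data):
    """Replace every line ending matched by pattern with CRLF."""
    return re.sub(pattern, "\r\n", data)


for sample in SAMPLES:
    results = {name: normalise(p, sample) for name, p in PATTERNS.items()}
    # All three patterns should agree on every sample.
    assert len(set(results.values())) == 1, (sample, results)
    print(repr(sample), "->", repr(results["two-way"]))

# Branch order matters: with \r listed first, a CRLF pair is matched as
# two separate line endings and gets doubled.
print(repr(re.sub(r"\r|\n|\r\n", "\r\n", "a\r\nb")))
```

On these samples the three patterns give identical results, so the choice between them comes down to readability and speed, which would need measuring on real mail.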
Yours sounds good (to me, who is a regex-beginner), so if it is right, I guess I should submit a patch for smtplib as well :) [TimS] > Yours should work better than what's there right > now, and I'm thinking about this one as well: '\r\n?|\n' Again, this looks like it would work to me; surely it can't be this simple, though, can it? ;) [TimS] > (and the issue of why it's there in the first place). Well, this is because the imapfilter doesn't do this for us, as it should. I suppose we should move this into the IMAPMessage class, since messages over SMTP will be ok. Hopefully a later version of the imaplib will fix this, but while we need to be compatible with Python 2.2, we'll have to have it in there. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 21 16:51:12 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 20 23:51:46 2003 Subject: [Spambayes] Renaming options Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CB93@its-xchg4.massey.ac.nz> [Skip] > Still, if they are better named something other than what > they are today, it will only be harder to change them later, > especially after another release. 
I must admit that crossed my mind :) Well then, assuming that I take full responsibility for changing the code over, what about the following renaming (I'll hold off doing this for a while, so that people can respond): * New section "Headers", which takes these from section "Hammie": clue_mailheader_cutoff:0.5 header_ham_string:ham header_name:X-Spambayes-Classification header_score_digits:2 header_score_logarithm:False header_spam_string:spam header_unsure_string:unsure trained_header:X-Spambayes-Trained And these from section "pop3proxy" evidence_header_name:X-Spambayes-Evidence mailid_header_name:X-Spambayes-MailId prob_header_name:X-Spambayes-Spam-Probability thermostat_header_name:X-Spambayes-Level include_evidence:False include_prob:False include_thermostat:False include_mailid:False [was add_mailid_to:] * In "globals" section, change verbose from a boolean to an integer * Rename these from section "pop3proxy" ports: -> listen_ports: servers: -> remote_servers: * Rename these from section "smtpproxy" ports: -> listen_ports: servers: -> remote_servers: * Both section "hammiefilter" and section "pop3proxy" have these: persistent_storage_file: persistent_use_database: Can they be combined (into a new section, maybe)? The only difference between them at the moment is that hammiefilter's defaults to being in the home directory. This would mean a whole heap of changing config files, but on the other hand, some of that is necessary with the new options stuff anyway, and you're absolutely right that it's better to do this now than later. For the most part, the shifting I've suggested is as a result of the project growing beyond the original hammie. 
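For illustration, a small sketch of how a config using the proposed listen_ports and remote_servers names might be read. It uses the standard library configparser from current Python 3, not the spambayes options code, and the positional pairing of ports with servers is my reading of the earlier description; proxy_map is a hypothetical helper.

```python
import configparser

# Hypothetical config using the proposed option names.
SAMPLE = """\
[pop3proxy]
listen_ports: 110, 1100
remote_servers: pop3.example.com:110, pop3.example2.com:110
"""


def proxy_map(text):
    """Map each local listen port to a (remote host, remote port) pair."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["pop3proxy"]

    ports = [int(p) for p in section["listen_ports"].split(",")]
    servers = [s.strip() for s in section["remote_servers"].split(",")]
    if len(ports) != len(servers):
        raise ValueError("need one listen port per remote server")

    mapping = {}
    for port, server in zip(ports, servers):
        host, _, remote_port = server.partition(":")
        # Assume the standard POP3 port when none is given.
        mapping[port] = (host, int(remote_port or 110))
    return mapping


print(proxy_map(SAMPLE))
```

With names like these, the meaning of each list is visible from the config alone, which is the point of the rename.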
=Tony Meyer From mhammond at skippinet.com.au Mon Apr 21 21:25:31 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Apr 21 06:26:17 2003 Subject: [Spambayes] BUG: Input from LOCALE not used In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB64@its-xchg4.massey.ac.nz> Message-ID: <019901c307f0$56bb0b30$530f8490@eden> > > I just checked > > in a fix to addin.py to fix this, although the same issue is > > almost certain to bite pop3proxy users on that platform. > > The core should include the fix I made, IMO: > > +1 from me. Where do you think it belongs? (i.e. what do you mean by > "core"?). classifier.py? (does it affect tokenising?) I should have quoted "fix" as it is a hack. :) I can't find a traceback any more, but as far as I can recall it was very close to ConfigParser that we need the locale-specific code - wherever the "float(config_str)" happens, we fail when 'config_str' uses a period instead of the locale-specific character (usually ','). So it should be somewhere in our options framework. I wish I had that traceback :( Mark. From piersh at friskit.com Mon Apr 21 05:05:38 2003 From: piersh at friskit.com (Piers Haken) Date: Mon Apr 21 07:04:05 2003 Subject: [Spambayes] Outlook plugin broken in CVS (options code?) Message-ID: <9891913C5BFE87429D71E37F08210CB92C7587@zeus.sfhq.friskit.com> I'm not sure what's up, but my plugin's not starting since I updated from CVS. Here's what I'm getting. Piers.
SpamAddin - Connecting to Outlook Created new configuration file 'C:\Documents and Settings\piersh\Application Data\SpamBayes\default_configuration.pck' Traceback (most recent call last): File "C:\Python22\lib\site-packages\win32com\universal.py", line 170, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, meth.invkind, args, None, None) File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_ return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 601, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 541, in _invokeex_ return apply(func, args) File "C:\Python22\spam\spambayes\Outlook2000\addin.py", line 684, in OnConnection self.manager = manager.GetManager(application) File "C:\Python22\spam\spambayes\Outlook2000\manager.py", line 475, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "C:\Python22\spam\spambayes\Outlook2000\manager.py", line 156, in __init__ import_core_spambayes_stuff(self.ini_filename) File "C:\Python22\spam\spambayes\Outlook2000\manager.py", line 70, in import_core_spambayes_stuff from spambayes import classifier File "C:\Python22\spam\spambayes\spambayes\classifier.py", line 40, in ? from spambayes.Options import options File "C:\Python22\spam\spambayes\spambayes\Options.py", line 1411, in ? 
options.mergefiles(filenames) File "C:\Python22\spam\spambayes\spambayes\Options.py", line 1288, in mergefiles self._update() File "C:\Python22\spam\spambayes\spambayes\Options.py", line 1326, in _update self.set(section, option, value) File "C:\Python22\spam\spambayes\spambayes\Options.py", line 1276, in set self.convert(sect, opt, val) File "C:\Python22\spam\spambayes\spambayes\Options.py", line 1261, in convert return converter(value) File "C:\Python22\spam\spambayes\spambayes\Options.py", line 952, in 'address_headers': ('get', lambda s: Set(s.split())), exceptions.AttributeError: 'Set' object has no attribute 'split' From dave at boost-consulting.com Mon Apr 21 09:31:36 2003 From: dave at boost-consulting.com (David Abrahams) Date: Mon Apr 21 08:31:56 2003 Subject: [Spambayes] Re: imapfilter mangling headers! References: <1ED4ECF91CDED24C8D012BCF2B034F130150CB61@its-xchg4.massey.ac.nz> Message-ID: <84he8sm3pz.fsf@boost-consulting.com> "Meyer, Tony" writes: >> >> Indeed. The UI script even has a syntax error (missing a >> >> colon). >> > Well, there aren't any scripts that use the UI module, so >> > that really shouldn't matter. >> ?!? You can't even invoke the script unless it can be compiled! > > That's my point; let me spell it out more: at that stage the ImapUI.py > module (which had the error) was not imported by *any* spambayes module. > Since nothing used it, nothing invoked it, and nothing tried to compile > it. (Things have changed since, and it does compile, and work, and get > imported by imapfilter.py). > >> > And even the trial ones that I haven't checked in >> > don't use that function... >> How did you manage to test them? > > I did the testing before the function was added (and I tested that > function without the line that had the syntax error, which, ironically, > was only a test that the previous line worked). Since the function > wasn't used, I didn't test afterwards since none of the code that was > used had changed. 
If the module was used by anything else I would have > done a cursory check that would have picked up the syntax error, but it > really wasn't necessary. Just to be clear, I wouldn't have even mentioned it if you'd just said "that's prerelease code" or if Tim hadn't mentioned that it was there. However, you indicated that you had used test scripts with it, which crossed a few wires in my logic circuits. >> I wrote a bayescustomize.ini, but the entries seemed to be ignored. > > Spambayes might not have been finding it. It should be found if it is > in the current working directory (although I've found this a little > unreliable sometimes), but it will definitely be found if the path is in > the envar BAYESCUSTOMIZE. (I'll add this to the docs, too). I got it found eventually. I think I thought at one point that it could live in my home directory... I guess not. > It's easy to test if it's reading it - add some entry that doesn't > exist in spambayes (like "option_does_not_exist: True") and try to > run pop3proxy or imapfilter. You'll get an error message about an > invalid option if it is finding it. If it does find it and ignore > entries made, then this is definitely a bug and should be reported > (with details about which entries are being ignored). > >> I have serious doubts that I'll be able to run a browser on >> the machine where I want to do this, so I don't think the web >> ui will be an option. > > Well, I guess then I'll add some documentation (maybe a FAQ) for people > in this situation. The reason there isn't any yet is that no-one has > said that they can't run a browser but want to use pop3proxy or > imapfilter. It's probably also been taken for granted that people using > spambayes without the web ui know what a config file looks like, given > that they're using software that hasn't even made it to beta yet. > >> > bayescustomize.ini uses the standard configuration format. >> ...which is documented where? 
> > I eventually found that by crawling through the spambayes code. > Or just google for ini file format. Now, you see, I don't know that "standard configuration format" means the same as .ini file format. > Or search for a file suffixed with .ini or .rc. Or look at the > default_bayes_customize.ini in the Outlook2000 folder. I'da had to find those, wouldn't I? > Or look in the cvs history of Options.py at the one that used to be > built in there. (Given that this was the case until a couple of > days ago, it's hardly surprising that newer documentation hasn't > made it yet. I'm not surprised; I'm just asking because I need to know. Even if I know about .ini file format (which, aside from forgetting whether it's "name: value" or "name = value", I do), I still need to know what the magic names to use are, and what their permissible values are. I note that when Options.py changed, many of the options lost their prefixes (imap_whatever became just whatever), so my old patches to Options.py couldn't just be transplanted into my .ini file. > Remember that this is a collaborative, open-source, > voluntary project...things take time...). Of course I understand that, as I work primarily on that kind of project. Thanks for the explanation, Dave -- Dave Abrahams Boost Consulting www.boost-consulting.com From tim at fourstonesExpressions.com Mon Apr 21 11:02:45 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 21 11:05:14 2003 Subject: [Spambayes] Renaming options In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CB93@its-xchg4.massey.ac.nz> Message-ID: 4/20/2003 10:51:12 PM, "Meyer, Tony" wrote: >[Skip] >> Still, if they are better named something other than what >> they are today, it will only be harder to change them later, >> especially after another release. 
> >I must admit that crossed my mind :) Well then, assuming that I take >full responsibility for changing the code over, what about the following >renaming (I'll hold off doing this for a while, so that people can >respond): > >* New section "Headers", which takes these from section "Hammie": > clue_mailheader_cutoff:0.5 > header_ham_string:ham > header_name:X-Spambayes-Classification > header_score_digits:2 > header_score_logarithm:False > header_spam_string:spam > header_unsure_string:unsure > trained_header:X-Spambayes-Trained > And these from section "pop3proxy" > evidence_header_name:X-Spambayes-Evidence > mailid_header_name:X-Spambayes-MailId > prob_header_name:X-Spambayes-Spam-Probability > thermostat_header_name:X-Spambayes-Level > include_evidence:False > include_prob:False > include_thermostat:False > include_mailid:False [was add_mailid_to:] + 1 from me. One point to keep in mind here is that changes to the Headers section will be global in scope... This is similar to the ham/spam cutoff values... I really want to use two different sets of values, one for notes and one for pop3, but this isn't possible without using different config files (which is perfectly correct). This phenomenon would extend now to headers themselves... > >* In "globals" section, change verbose from a boolean to an integer +0 > >* Rename these from section "pop3proxy" > ports: -> listen_ports: > servers: -> remote_servers: +1 > >* Rename these from section "smtpproxy" > ports: -> listen_ports: > servers: -> remote_servers: +1 > >* Both section "hammiefilter" and section "pop3proxy" have these: > persistent_storage_file: > persistent_use_database: > Can they be combined (into a new section, maybe)? Please do!!!! > The only difference between them at the moment is that > hammiefilter's defaults to being in the home directory. 
> >This would mean a whole heap of changing config files, but on the other >hand, some of that is necessary with the new options stuff anyway, and >you're absolutely right that it's better to do this now than later. For >the most part, the shifting I've suggested is as a result of the project >growing beyond the original hammie. We can easily write a config file migration tool. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From noreply at sourceforge.net Mon Apr 21 15:50:12 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 21 17:50:19 2003 Subject: [Spambayes] [ spambayes-Bugs-725307 ] Outlook plugin won't load (anymore) Message-ID: Bugs item #725307, was opened at 2003-04-21 23:50 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725307&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Fredrik Rodland (fmmr) Assigned to: Mark Hammond (mhammond) Summary: Outlook plugin won't load (anymore) Initial Comment: I just updated to the latest cvs version. I've run python addin.py --unregister and python addin.py, but when I start outlook the following traceback is caught. The plugin is not loaded. 
SpamAddin - Connecting to Outlook Traceback (most recent call last): File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\universal.py", line 170, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, meth.invkind, args, None, None) File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_ return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\server\policy.py", line 601, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\server\policy.py", line 541, in _invokeex_ return apply(func, args) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\addin.py", line 662, in OnConnection self.manager = manager.GetManager(application) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\manager.py", line 475, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\manager.py", line 156, in __init__ import_core_spambayes_stuff(self.ini_filename) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\manager.py", line 70, in import_core_spambayes_stuff from spambayes import classifier File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\classifier.py", line 40, in ? from spambayes.Options import options File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1411, in ?
options.mergefiles(filenames) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1288, in mergefiles self._update() File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1326, in _update self.set(section, option, value) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1276, in set self.convert(sect, opt, val) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1261, in convert return converter(value) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 952, in 'address_headers': ('get', lambda s: Set(s.split())), exceptions.AttributeError: 'Set' object has no attribute 'split' ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725307&group_id=61702 From noreply at sourceforge.net Mon Apr 21 16:34:09 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 21 18:34:17 2003 Subject: [Spambayes] [ spambayes-Bugs-725307 ] Outlook plugin won't load (anymore) Message-ID: Bugs item #725307, was opened at 2003-04-22 09:50 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725307&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Fredrik Rodland (fmmr) >Assigned to: Tony Meyer (anadelonbrin) Summary: Outlook plugin won't load (anymore) Initial Comment: I just updated to the latest cvs version. I've run python addin.py --unregister and python addin.py, but when I start outlook the following traceback is caught. The plugin is not loaded.
SpamAddin - Connecting to Outlook Traceback (most recent call last): File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\universal.py", line 170, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, meth.invkind, args, None, None) File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_ return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\server\policy.py", line 601, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site-packages\win32com\server\policy.py", line 541, in _invokeex_ return apply(func, args) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\addin.py", line 662, in OnConnection self.manager = manager.GetManager(application) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\manager.py", line 475, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\manager.py", line 156, in __init__ import_core_spambayes_stuff(self.ini_filename) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\Outlook2000\manager.py", line 70, in import_core_spambayes_stuff from spambayes import classifier File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\classifier.py", line 40, in ? from spambayes.Options import options File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1411, in ?
options.mergefiles(filenames) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1288, in mergefiles self._update() File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1326, in _update self.set(section, option, value) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1276, in set self.convert(sect, opt, val) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 1261, in convert return converter(value) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\spambayes\Options.py", line 952, in 'address_headers': ('get', lambda s: Set(s.split())), exceptions.AttributeError: 'Set' object has no attribute 'split' ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-22 10:34 Message: Logged In: YES user_id=552329 This is my fault so I'm assigning it to myself. On investigation I need to work on the options that turn into sets. This would have come up any time someone tried to set the value of one of those options (Outlook found the problem because one of the headers options is in the default ini file). I've checked in a 'fix' which just bypasses the validity check for sets for the moment. Previously there wasn't any checking, so no functionality is lost. As soon as I get time I'll look into fixing this properly. Leaving this open until that's done. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725307&group_id=61702 From T.A.Meyer at massey.ac.nz Tue Apr 22 11:39:21 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 21 18:40:05 2003 Subject: [Spambayes] Outlook plugin broken in CVS (options code?) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CBB6@its-xchg4.massey.ac.nz> > I'm not sure what's up, but my plugin's not starting since I > updated from CVS. My fault.
I neglected to check the Set options thoroughly enough. I've checked in a temporary fix (works for me at least), and I'll work on a proper solution as soon as I get a chance. Apologies - I should have checked Outlook. =Tony Meyer From dmara at dimensionpoint.com Mon Apr 21 20:08:57 2003 From: dmara at dimensionpoint.com (Dan Mara) Date: Mon Apr 21 22:09:33 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins Message-ID: <318EF5B66E98D61180CF000102BE435A42C2@fh.dimensionpoint.com> Running pure Windows (98SE) and Outlook 2000. No Python installed. I can't get the plugin to initialize. I go to Tools/options/other/advanced options/Com Addins and check the SpamBayes plug-in box, but it won't stay checked, and I've uninstalled/reinstalled Outlook and the plugin but no go. I've got this working great, really great, on WinNT 4.0 and XP. Any clue on what ain't working? Dan From mhammond at skippinet.com.au Tue Apr 22 13:17:14 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon Apr 21 22:18:18 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins In-Reply-To: <318EF5B66E98D61180CF000102BE435A42C2@fh.dimensionpoint.com> Message-ID: <000801c30875$4ac55cb0$530f8490@eden> Open "about.html" in your browser, and see the trouble-shooting section. This will tell you how to locate the log file for the session, and the process for reporting a bug (i.e., open a bug at http://sourceforge.net/projects/spambayes/, and attach the log.) Thanks, Mark. > -----Original Message----- > From: spambayes-bounces+mhammond=bigpond.net.au@python.org > [mailto:spambayes-bounces+mhammond=bigpond.net.au@python.org]On Behalf > Of Dan Mara > Sent: Tuesday, 22 April 2003 12:09 PM > To: 'spambayes@python.org' > Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins > > > Running pure Windows (98SE) and Outlook 2000. > No Python installed. > I can't get the plugin to initialize.
> I go to Tools/options/other/advanced options/Com Addins > and check the SpamBayes plug-in box, but it won't > stayed checked, > and I've uninstalled/reinstalled Outlook and the plugin > but no go. > I've got this working great, really great, on Winnt 4.0 > and Xp. > Any clue on what ain't working? > > Dan > _______________________________________________ > Spambayes mailing list > Spambayes@python.org > http://mail.python.org/mailman/listinfo/spambayes From tim at fourstonesExpressions.com Mon Apr 21 22:18:43 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 21 22:18:50 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins In-Reply-To: <318EF5B66E98D61180CF000102BE435A42C2@fh.dimensionpoint.com> Message-ID: 4/21/2003 9:08:57 PM, Dan Mara wrote: >Running pure Windows (98SE) and Outlook 2000. >No Python installed. I believe Python is a prereq for spambayes >I can't get the plugin to initialize. >I go to Tools/options/other/advanced options/Com Addins >and check the SpamBayes plug-in box, but it won't >stayed checked, >and I've uninstalled/reinstalled Outlook and the plugin >but no go. >I've got this working great, really great, on Winnt 4.0 >and Xp. >Any clue on what ain't working? > >Dan >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From dmara at dimensionpoint.com Mon Apr 21 20:21:56 2003 From: dmara at dimensionpoint.com (Dan Mara) Date: Mon Apr 21 22:22:30 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins Message-ID: <318EF5B66E98D61180CF000102BE435A0A113E@fh.dimensionpoint.com> Python must not be required, as SpamBayes is running great on a Winnt 4.0 and XP boxes with no Python... 
Dan -----Original Message----- From: Tim Stone - Four Stones Expressions [mailto:tim@fourstonesExpressions.com] Sent: Monday, April 21, 2003 7:19 PM To: 'spambayes@python.org'; Dan Mara Subject: Re: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins 4/21/2003 9:08:57 PM, Dan Mara wrote: >Running pure Windows (98SE) and Outlook 2000. >No Python installed. I believe Python is a prereq for spambayes >I can't get the plugin to initialize. >I go to Tools/options/other/advanced options/Com Addins >and check the SpamBayes plug-in box, but it won't >stayed checked, >and I've uninstalled/reinstalled Outlook and the plugin >but no go. >I've got this working great, really great, on Winnt 4.0 >and Xp. >Any clue on what ain't working? > >Dan >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From tim at fourstonesExpressions.com Mon Apr 21 22:25:34 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 21 22:25:40 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins In-Reply-To: <318EF5B66E98D61180CF000102BE435A0A113E@fh.dimensionpoint.com> Message-ID: 4/21/2003 9:21:56 PM, Dan Mara wrote: >Python must not be required, as SpamBayes is running >great on a Winnt 4.0 and XP boxes with no Python... I had forgotten that Mark provides binary distribution of the Outlook plugin... oh well, sometimes I wonder about myself.... c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
From tim.one at comcast.net Mon Apr 21 23:55:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Apr 21 23:04:36 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins In-Reply-To: <318EF5B66E98D61180CF000102BE435A0A113E@fh.dimensionpoint.com> Message-ID: [Dan Mara] > Python must not be required, as SpamBayes is running > great on a Winnt 4.0 and XP boxes with no Python... Trust us: spambayes is coded entirely in Python, both the classification engine and the Outlook plugin. I personally guarantee no part of spambayes can even start if Python isn't installed. OTOH, maybe Microsoft is shipping new versions of Outlook with spambayes pre-installed . alas-that-would-take-marketing-vision-ly y'rs - tim From T.A.Meyer at massey.ac.nz Tue Apr 22 16:13:32 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 21 23:15:13 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CBF3@its-xchg4.massey.ac.nz> > [Dan Mara] > > Python must not be required, as SpamBayes is running > > great on a Winnt 4.0 and XP boxes with no Python... [Tim Peters] > Trust us: spambayes is coded entirely in Python, both the > classification engine and the Outlook plugin. I personally > guarantee no part of spambayes can even start if Python isn't > installed. OTOH, maybe Microsoft is shipping new versions of > Outlook with spambayes pre-installed . Just to clear this up :) Python doesn't need to be installed *separately* to run the Outlook plugin via Mark's binary. The binary packages together whatever Python libraries (etc) it needs to run so that it can be installed and used without the user having to install Python itself. So you don't have to install Python, but (some of) Python is on the box for it to run :) Of course, those not using the Outlook plugin binary do need to have a 'proper' Python install. 
=Tony Meyer From tim at fourstonesExpressions.com Mon Apr 21 23:12:38 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 21 23:15:32 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins In-Reply-To: Message-ID: 4/21/2003 9:55:58 PM, Tim Peters wrote: >[Dan Mara] >> Python must not be required, as SpamBayes is running >> great on a Winnt 4.0 and XP boxes with no Python... > >Trust us: spambayes is coded entirely in Python, both the classification >engine and the Outlook plugin. I personally guarantee no part of spambayes >can even start if Python isn't installed. Is this true even for the binary dist of the outlook plugin? c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From dmara at dimensionpoint.com Mon Apr 21 21:19:21 2003 From: dmara at dimensionpoint.com (Dan Mara) Date: Mon Apr 21 23:20:02 2003 Subject: [Spambayes] Virgin XP works.... Message-ID: <318EF5B66E98D61180CF000102BE435A0A113F@fh.dimensionpoint.com> Well..hmmmmm... A new install of XP Pro with a new install of Outlook 2002, and then Spambayes, nothing else installed yet...and I've already filtered out over 188 spams, using Spambayes.... I must be blessed with good python vibes? This is using Mark Hammond's plug-in.... -----Original Message----- From: Tim Peters [mailto:tim.one@comcast.net] Sent: Monday, April 21, 2003 7:56 PM To: Dan Mara Cc: spambayes@python.org Subject: RE: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins [Dan Mara] > Python must not be required, as SpamBayes is running > great on a Winnt 4.0 and XP boxes with no Python... Trust us: spambayes is coded entirely in Python, both the classification engine and the Outlook plugin. I personally guarantee no part of spambayes can even start if Python isn't installed. 
OTOH, maybe Microsoft is shipping new versions of Outlook with spambayes pre-installed . alas-that-would-take-marketing-vision-ly y'rs - tim From tim at fourstonesExpressions.com Mon Apr 21 23:18:25 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 21 23:20:24 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CBF3@its-xchg4.massey.ac.nz> Message-ID: 4/21/2003 10:13:32 PM, "Meyer, Tony" wrote: >Just to clear this up :) Python doesn't need to be installed >*separately* to run the Outlook plugin via Mark's binary. The binary >packages together whatever Python libraries (etc) it needs to run so >that it can be installed and used without the user having to install >Python itself. Thanks. I thought I was going crazy there for a minute. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From tim at fourstonesExpressions.com Mon Apr 21 23:25:54 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 21 23:26:06 2003 Subject: [Spambayes] Virgin XP works.... In-Reply-To: <318EF5B66E98D61180CF000102BE435A0A113F@fh.dimensionpoint.com> Message-ID: 4/21/2003 10:19:21 PM, Dan Mara wrote: >Well..hmmmmm... >A new install of XP Pro with a new install >of Outlook 2002, and then Spambayes, nothing else >installed yet...and I've already filtered out >over 188 spams, using Spambayes.... >I must be blessed with good python vibes? >This is using Mark Hammond's plug-in.... My guess is that this is platform related. I'm relatively sure that it hasn't been used (at least very much) on win98. You should open a bug on this one. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
From tim.one at comcast.net Tue Apr 22 00:47:37 2003 From: tim.one at comcast.net (Tim Peters) Date: Mon Apr 21 23:51:47 2003 Subject: [Spambayes] Outlook Plug-in won't stay checked in Com_add-ins In-Reply-To: Message-ID: [Dan Mara] >>> Python must not be required, as SpamBayes is running >>> great on a Winnt 4.0 and XP boxes with no Python... [Tim P] >> Trust us: spambayes is coded entirely in Python, both the >> classification engine and the Outlook plugin. I personally >> guarantee no part of spambayes can even start if Python isn't >> installed. [Tim Stone] > Is this true even for the binary dist of the outlook plugin? Well, yes, no, yes, no, because that installs *enough* of Python to run the classifier and the plugin. Or at least Mark said it does . From noreply at sourceforge.net Mon Apr 21 22:41:05 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 22 08:15:41 2003 Subject: [Spambayes] [ spambayes-Bugs-725449 ] Addin won't initialize Message-ID: Bugs item #725449, was opened at 2003-04-22 02:41 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: dan maer (dmara) Assigned to: Mark Hammond (mhammond) Summary: Addin won't initialize Initial Comment: Running pure Windows (98SE) and Outlook 2000. No Python installed. I can't get the plugin to initialize. I go to Tools/options/other/advanced options/Com Addins and check the SpamBayes plug-in box, but it won't stay checked, and I've uninstalled/reinstalled Outlook and the plugin but no go. I've got this working great, really great, on WinNT 4.0 and XP. Logfile being attached for upload...
Dan ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 From T.A.Meyer at massey.ac.nz Tue Apr 22 19:05:12 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 22 08:15:57 2003 Subject: [Spambayes] BUG: Input from LOCALE not used Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CC18@its-xchg4.massey.ac.nz> > I should have quoted "fix" as it is a hack. :) I can't find a > traceback any more, but as far as I can recall it was very > close to ConfigParser that we need the locale specific code - > wherever the "float(config_str)" happens, we fail when > 'config_str' uses a period instead of the locale specific > character (usually ',') If it's only while reading in the options, then I need to fix this while I'm doing the other options stuff. I'll open a bug and assign it to me so that I remember. I'll try and come up with something better than the general locale setting if I can. =Tony Meyer From noreply at sourceforge.net Tue Apr 22 00:07:53 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 22 08:16:04 2003 Subject: [Spambayes] [ spambayes-Bugs-725466 ] Include a proper locale fix in Options.py Message-ID: Bugs item #725466, was opened at 2003-04-22 18:07 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725466&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Tony Meyer (anadelonbrin) Summary: Include a proper locale fix in Options.py Initial Comment: When reading the options, the float() call fails when the locale is a language that uses a ',' for a separator instead of '.'. This is hack-fixed in Outlook, but needs to be fixed in general. 
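A minimal sketch of one narrower alternative to a process-wide locale change, under stated assumptions: it applies only to interpreters where float() follows LC_NUMERIC (as in the failure described here), and the helper name float_in_c_locale is hypothetical, not part of SpamBayes. It switches LC_NUMERIC to "C" just for the conversion and then restores whatever was set before.

```python
import locale


def float_in_c_locale(text):
    """Parse text as a float with LC_NUMERIC temporarily set to "C".

    Hypothetical helper for illustration; not part of SpamBayes.
    """
    # Calling setlocale() with only a category queries the current setting.
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        # In the "C" locale the decimal character is always '.'.
        locale.setlocale(locale.LC_NUMERIC, "C")
        return float(text)
    finally:
        # Put the caller's numeric locale back, even if float() raised.
        locale.setlocale(locale.LC_NUMERIC, saved)


if __name__ == "__main__":
    print(float_in_c_locale("0.15"))
```

Note that setlocale() affects the whole process, so a sketch like this is only reasonable where no other thread depends on LC_NUMERIC during the call.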
I imagine that there must be some sort of locale call that will convert between the current locale and English, and that this should be called as the option is set. Anyway, I'll get to this when I can. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725466&group_id=61702 From T.A.Meyer at massey.ac.nz Tue Apr 22 19:35:20 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 22 08:16:11 2003 Subject: [Spambayes] Re: imapfilter mangling headers! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CC1D@its-xchg4.massey.ac.nz> > Just to be clear, I wouldn't have even mentioned it if you'd > just said "that's prerelease code" or if Tim hadn't mentioned > that it was there. However, you indicated that you had used > test scripts with it, which crossed a few wires in my logic circuits. I guess this is an issue because although the spambayes project as a whole is pre-release code, some of it (the core, hammie, pop3proxy) is much closer to release than other parts. Not sure what we can do about this, apart from trying to document things more clearly; although anything that makes it into a packaged release (like alpha1 and alpha2 have been) should perhaps be closer to release than not. > I got it found eventually. I think I thought at one point > that it could live in my home directory... I guess not. Heh. One of the quirks (this one not my doing!) is that if it's in the home directory it looks under the name .spambayesrc - setting the environment variable is really the way to go :) > Now, you see, I don't know that "standard configuration > format" means the same as .ini file format. Well, my googling came up with the same format :) > Even if I know about .ini file format (which, aside from > forgetting whether it's > "name: value" or "name = value", I do), Heh.
It's either :) > I still need to know > what the magic names to use are, and what their permissible > values are. If you check out the (new) documentation, you'll see that I've added functions to easy access these. Well, I hope they're easy - if not, let me know :) > I note that when Options.py changed, many of the > options lost their prefixes (imap_whatever became just > whatever), so my old patches to Options.py couldn't just be > transplanted into my .ini file. Well, you can *at the moment*, because the stuff is completely backwards compatible right now. We're in the process of changing over the code to use the new style, and we'll write a conversion utility for config files as well, and once all that is done, then the old imap_whatever style won't work. =Tony Meyer From piersh at friskit.com Tue Apr 22 02:14:49 2003 From: piersh at friskit.com (Piers Haken) Date: Tue Apr 22 08:16:28 2003 Subject: [Spambayes] Outlook plugin broken in CVS (options code?) Message-ID: <9891913C5BFE87429D71E37F08210CB92C758A@zeus.sfhq.friskit.com> Great, thanks! I'm spam-free once again. I'd forgotten just how annoying spam was ;-) Piers. > -----Original Message----- > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] > Sent: Monday, April 21, 2003 3:39 PM > To: Piers Haken; spambayes@python.org > Subject: RE: [Spambayes] Outlook plugin broken in CVS (options code?) > > > > I'm not sure what's up, but my plugin's not starting since I > > updated from CVS. > > My fault. I neglected to check the Set options thoroughly > enough. I've checked in a fix (works for me at least), and > I'll work on a solution as soon as I get a chance. > > Apologies - I should have checked Outlook. > > =Tony Meyer > From lists at olivermaunder.co.uk Tue Apr 22 12:09:02 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Tue Apr 22 08:17:07 2003 Subject: [Spambayes] imapfilter mangling headers! 
In-Reply-To: References: Message-ID: <3EA514BE.9000401@olivermaunder.co.uk> Tim Stone - Four Stones Expressions wrote: >>>Clearly we have a problem in training. >>> >>> >>> >I found it. It'll be fixed in the next checkin. > > Hi all Back at work after Easter, got the latest CVS code into an empty folder and an inbox crammed with spam, but I'm still getting this: C:\Development\SpamBayes\spambayes>imapfilter.py -t -v Loading database hammie.db... Done. Training Training took 0.485000014305 seconds, 0 messages were trained There are around 80 messages in my ham-train and spam-train folders. All the configuration looks OK, and I checked it through the nice new web interface. Any ideas? Olly From lists at olivermaunder.co.uk Tue Apr 22 13:43:06 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Tue Apr 22 08:17:52 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: References: Message-ID: <3EA52ACA.80300@olivermaunder.co.uk> Tim Stone - Four Stones Expressions wrote: >>>Clearly we have a problem in training. >>> >>> >>> >I found it. It'll be fixed in the next checkin. > > Hi all Back at work after Easter, got the latest CVS code into an empty folder and an inbox crammed with spam, but I'm still getting this: C:\Development\SpamBayes\spambayes>imapfilter.py -t -v Loading database hammie.db... Done. Training Training took 0.485000014305 seconds, 0 messages were trained There are around 80 messages in my ham-train and spam-train folders. All the configuration looks OK, and I checked it through the nice new web interface. Any ideas? 
Olly From noreply at sourceforge.net Tue Apr 22 06:15:54 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 22 08:18:02 2003 Subject: [Spambayes] [ spambayes-Patches-725616 ] Options.py mergefiles crashes (+ fix) Message-ID: Patches item #725616, was opened at 2003-04-22 14:15 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=725616&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Sjoerd Mullender (sjoerd) Assigned to: Tony Meyer (anadelonbrin) Summary: Options.py mergefiles crashes (+ fix) Initial Comment: When calling Options.options.mergefiles I get a crash somewhere in re.compile, from the call in _split_values with the message: bad character range. The regular expression being compiled is r"[\w\.-\*]+". This is indeed an invalid character range ("." through "*", but ord(".") > ord("*")). I believe the problem is that the "-" is not escaped. The attached diff fixes the three occurrences where this happens. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=725616&group_id=61702 From tim at fourstonesExpressions.com Tue Apr 22 09:05:11 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Apr 22 09:10:51 2003 Subject: [Spambayes] imapfilter mangling headers! In-Reply-To: <3EA52ACA.80300@olivermaunder.co.uk> Message-ID: <42QK213ZB6DFBGBA7A8NJPJDYSSOVQ.3ea53e07@myst> 4/22/2003 6:43:06 AM, Oliver Maunder wrote: >Hi all > >Back at work after Easter, got the latest CVS code into an empty folder >and an inbox crammed with spam, but I'm still getting this: > >C:\Development\SpamBayes\spambayes>imapfilter.py -t -v >Loading database hammie.db... Done.
>Training >Training took 0.485000014305 seconds, 0 messages were trained > >There are around 80 messages in my ham-train and spam-train folders. All >the configuration looks OK, and I checked it through the nice new web >interface. Any ideas? Well, let's check a couple of things. I've had a bit of trouble with *thinking* I had the latest cvs version, but that actually not being the case. The correct current version of imapfilter.py is 1.29. Can you check that? Also, are you certain that you've set the spam_train_folders and ham_train_folders options to the correct folder name(s)? Believe me, I'm not trying to insult your intelligence, but just for my own comfort level, can you check to be sure those values are correct in your bayescustomize.ini file (not just on the config page). In the meantime, I'll check in a version of imap filter that gives a bit more verbose output while training. If all the above stuff is good, then use this version, capture the output, and attach it to your response, if you will. Thanks. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From tshumway at jdiworks.net Tue Apr 22 08:45:54 2003 From: tshumway at jdiworks.net (tshumway@jdiworks.net) Date: Tue Apr 22 10:46:31 2003 Subject: [Spambayes] Virgin XP works.... In-Reply-To: <318EF5B66E98D61180CF000102BE435A0A113F@fh.dimensionpoint.com> References: <318EF5B66E98D61180CF000102BE435A0A113F@fh.dimensionpoint.com> Message-ID: <1051022754.3ea555a2632f6@jdiworks.net> Quoting Dan Mara : > > [Dan Mara] > > Python must not be required, as SpamBayes is running > > great on a Winnt 4.0 and XP boxes with no Python... > > Trust us: spambayes is coded entirely in Python, both > the classification > engine and the Outlook plugin. I personally guarantee > no part of spambayes > can even start if Python isn't installed. 
OTOH, maybe > Microsoft is shipping > new versions of Outlook with spambayes pre-installed > . > > alas-that-would-take-marketing-vision-ly y'rs - tim It appears then that at least Compaq/HP has some marketing vision. The XP box my friend bought at Fry's had Python 2.2.1 installed. From lists at olivermaunder.co.uk Tue Apr 22 17:22:42 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Tue Apr 22 11:22:43 2003 Subject: [Spambayes] IMAPFilter training issues (WAS imapfilter mangling headers!) In-Reply-To: <42QK213ZB6DFBGBA7A8NJPJDYSSOVQ.3ea53e07@myst> References: <42QK213ZB6DFBGBA7A8NJPJDYSSOVQ.3ea53e07@myst> Message-ID: <3EA55E42.3040000@olivermaunder.co.uk> Tim Stone - Four Stones Expressions wrote: >4/22/2003 6:43:06 AM, Oliver Maunder wrote: > > >Well, let's check a couple of things. I've had a bit of trouble with >*thinking* I had the latest cvs version, but that actually not being the case. >The correct current version of imapfilter.py is 1.29. Can you check that? > > I've been caught that way too! But I deleted everything in my spambayes folder except bayescustomize.ini before checking out the latest code. I'm now using 1.3. >Also, are you certain that you've set the spam_train_folders and >ham_train_folders options to the correct folder name(s)? Believe me, I'm not >trying to insult your intelligence, but just for my own comfort level, can you >check to be sure those values are correct in your bayescustomize.ini file (not >just on the config page). > > Pretty certain - I created the .ini file manually. When I ran the config page, it picked up all the correct values out of the file. Also the config page managed to show all my folders on the IMAP server, so it's obviously able to connect. >In the meantime, I'll check in a version of imap filter that gives a bit more >verbose output while training. If all the above stuff is good, then use this >version, capture the output, and attach it to your response, if you will. 
> It's a pleasure, although not overly enlightening. I added a message count to the output: C:\Development\SpamBayes\spambayes>imapfilter.py -v -t Loading database hammie.db... Done. Training Training ham folder INBOX.spambayes.ham-train Total messages in INBOX.spambayes.ham-train: 122 0 trained. Training spam folder INBOX.spambayes.spam-train Total messages in INBOX.spambayes.spam-train: 95 0 trained. Training took 0.516000032425 seconds, 0 messages were trained There's a combined total of over 200 messages in those folders, so half a second seems a bit quick to do anything with them! UPDATE: Just done some more debugging, and the regex match seems to be failing in IMAPFolder.keys(). My server returns responses like 83 (FLAGS (\Seen) UID 131) whereas the regex is looking for r"[0-9]+ \(UID ([0-9]+) FLAGS \(([\\\w]*)\)\)" which seems to have things in a different order. I'm getting a growing feeling that my ISPs mailserver is very badly behaved indeed! Any regular expression experts out there? I might just be able to mangle it enough so it matches *my* server's responses, but that would break compatibility with everyone else. Olly From tim at fourstonesExpressions.com Tue Apr 22 12:42:50 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Apr 22 12:42:57 2003 Subject: [Spambayes] IMAPFilter training issues (WAS imapfilter mangling headers!) In-Reply-To: <3EA55E42.3040000@olivermaunder.co.uk> Message-ID: <74VRNJUPKE71GYTC7NMWS072VZVQOM.3ea5710a@myst> 4/22/2003 10:22:42 AM, Oliver Maunder wrote: >UPDATE: Just done some more debugging, and the regex match seems to be >failing in IMAPFolder.keys(). My server returns responses like >83 (FLAGS (\Seen) UID 131) > >whereas the regex is looking for >r"[0-9]+ \(UID ([0-9]+) FLAGS \(([\\\w]*)\)\)" > >which seems to have things in a different order. Ok. That explains it. 
That regex is looking for the message id, which is used to look in the index to see if that message has already been trained. Not finding it, it is ignoring the message altogether. This is another fluke in imap to add to a growing list of flukes... Just for fun, change the regex to: r"[0-9]+ \(FLAGS \(([\\\w]*)\) UID ([0-9]+)\)" and see what happens... > >I'm getting a growing feeling that my ISPs mailserver is very badly >behaved indeed! Ain't no such thing as a well behaved imap server. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From dave at boost-consulting.com Tue Apr 22 13:55:22 2003 From: dave at boost-consulting.com (David Abrahams) Date: Tue Apr 22 12:55:47 2003 Subject: [Spambayes] imapfilter progress Message-ID: <84k7dmjwud.fsf@boost-consulting.com> Training succeeded! However: python imapfilter.py -v -c -d /usr/home/dave/bayes.db Loading database /usr/home/dave/bayes.db... Done. Classifying Traceback (most recent call last): File "imapfilter.py", line 546, in ? 
run() File "imapfilter.py", line 536, in run imap_filter.Filter() File "imapfilter.py", line 415, in Filter self.unsure_folder) File "imapfilter.py", line 361, in Filter msg.Save() File "imapfilter.py", line 231, in Save old_id) File "/usr/local/lib/python2.2/imaplib.py", line 622, in uid typ, dat = apply(self._simple_command, (name, command) + args) File "/usr/local/lib/python2.2/imaplib.py", line 925, in _simple_command return self._command_complete(name, apply(self._command, (name,) + args)) File "/usr/local/lib/python2.2/imaplib.py", line 762, in _command_complete raise self.error('%s command error: %s %s' % (name, typ, data)) imaplib.error: UID command error: BAD ['syntax error'] -- Dave Abrahams Boost Consulting www.boost-consulting.com From noreply at sourceforge.net Tue Apr 22 16:08:23 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 22 18:08:28 2003 Subject: [Spambayes] [ spambayes-Patches-725616 ] Options.py mergefiles crashes (+ fix) Message-ID: Patches item #725616, was opened at 2003-04-23 00:15 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=725616&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Sjoerd Mullender (sjoerd) Assigned to: Tony Meyer (anadelonbrin) Summary: Options.py mergefiles crashes (+ fix) Initial Comment: When calling Options.options.mergefiles I get a crash somewhere in re.compile, from the call in _split_values with the message: bad character range. The regular expression being compiled is r"[\w\.-\*]+". This is indeed an invalid character range ("." through "*", but ord(".") > ord("*")). I believe the problem is that the "-" is not escaped. The attached diff fixes the three occurrences where this happens.
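A minimal sketch of the failure described in this report, written for a current Python rather than the 2.2 of the time. Inside a character class an unescaped "-" between two characters is read as a range, so the pattern asks for "." through "*", which runs backwards. The names BROKEN, FIXED and compiles are illustrative only; the actual change is in the attached diff, which is not reproduced here.

```python
import re

BROKEN = r"[\w\.-\*]+"   # "-" sits between "." and "*", so it is read as a range
FIXED = r"[\w\.\-\*]+"   # escaping "-" makes it a literal hyphen


def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles(BROKEN))
print(compiles(FIXED))
print(re.findall(FIXED, "X-Spam-Flag, *.example"))
```

Moving the hyphen to the end of the class, as in r"[\w.*-]+", would avoid the range as well; escaping it keeps the change closest to the original pattern.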
---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-23 10:08 Message: Logged In: YES user_id=552329 Thanks, I'm checking in the fix - you're exactly right. Bizarre that this worked for me - I've definitely tested using all of those regexes when checking the validity of "Hammie" "header_name". Go figure. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498105&aid=725616&group_id=61702 From T.A.Meyer at massey.ac.nz Wed Apr 23 11:15:41 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 22 18:16:57 2003 Subject: [Spambayes] IMAPFilter training issues (WAS imapfilter manglingheaders!) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CCBA@its-xchg4.massey.ac.nz> > UPDATE: Just done some more debugging, and the regex match > seems to be failing in IMAPFolder.keys(). My server returns responses > like 83 (FLAGS (\Seen) UID 131) > > whereas the regex is looking for > r"[0-9]+ \(UID ([0-9]+) FLAGS \(([\\\w]*)\)\)" > > which seems to have things in a different order. Ah, regexes - the bane of my life ;). This is (relatively) easily fixed, and I'll do this if Tim hasn't already. I do get the feeling that I need to get out a book on regex and study hard ;) > I'm getting a growing feeling that my ISPs mailserver is very badly > behaved indeed! Well, personally, I blame the RFC. It's too generic in places - like allowing any character in a folder name, instead of sensibly setting aside one for a delimiter and so on. Things like not specifying which order the results should be returned in. The imaplib could do a better job of this as well; I might drop in a feature request (if it did improve a great deal, we could always bundle a copy like with Sets and ConfigParser).
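A minimal sketch of parsing these FETCH responses without depending on the order of the items. It assumes the response has already been decoded to a string and holds only simple items such as FLAGS and UID; literals and nested lists (BODY, ENVELOPE and the like) are out of scope. parse_fetch and both patterns are hypothetical names for illustration, not the code checked into imapfilter.py.

```python
import re

# Leading message number, then the parenthesised list of data items.
_RESPONSE = re.compile(r"^\s*(\d+)\s+\((.*)\)\s*$")
# One data item: a name followed by a flat parenthesised list or a bare atom.
_ITEM = re.compile(r"([A-Za-z][\w.]*)\s+(\([^()]*\)|[^\s()]+)")


def parse_fetch(response):
    """Turn '83 (FLAGS (\\Seen) UID 131)' into a dict keyed by item name."""
    match = _RESPONSE.match(response)
    if match is None:
        raise ValueError("unrecognised FETCH response: %r" % (response,))
    data = {"NUMBER": match.group(1)}
    for name, value in _ITEM.findall(match.group(2)):
        if value.startswith("("):
            # Keep every entry of a list-valued item, not just the first.
            value = value[1:-1].split()
        data[name.upper()] = value
    return data


print(parse_fetch(r"83 (FLAGS (\Seen) UID 131)"))
print(parse_fetch(r"83 (UID 131 FLAGS (\Seen \Deleted))"))
```

Looking items up by name means the same caller works whichever order a server picks, and a list-valued item such as FLAGS comes back as a list so that a caller can check for \Deleted explicitly.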
=Tony Meyer From T.A.Meyer at massey.ac.nz Wed Apr 23 11:48:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 22 18:48:59 2003 Subject: [Spambayes] IMAPFilter training issues (WAS imapfiltermanglingheaders!) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CCDB@its-xchg4.massey.ac.nz> > > UPDATE: Just done some more debugging, and the regex match > > seems to be failing in IMAPFolder.keys(). My server returns > responses > > like 83 (FLAGS (\Seen) UID 131) > > > > whereas the regex is looking for > > r"[0-9]+ \(UID ([0-9]+) FLAGS \(([\\\w]*)\)\)" > > > > which seems to have things in a different order. I've checked in a fix for this; if people could check it that would be great. It should now handle any ordering (as long as the message number is there at the start, which it is meant to be). =Tony Meyer From noreply at sourceforge.net Tue Apr 22 17:19:08 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 22 19:19:14 2003 Subject: [Spambayes] [ spambayes-Bugs-725449 ] Addin won't initialize Message-ID: Bugs item #725449, was opened at 2003-04-22 14:41 Message generated for change (Comment added) made by mhammond You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: dan maer (dmara) Assigned to: Mark Hammond (mhammond) Summary: Addin won't initialize Initial Comment: Running pure Windows (98SE) and Outlook 2000. No Python installed. I can't get the plugin to initialize. I go to Tools/options/other/advanced options/Com Addins and check the SpamBayes plug-in box, but it won't stay checked, and I've uninstalled/reinstalled Outlook and the plugin but no go. I've got this working great, really great, on Winnt 4.0 and Xp. Logfile being attached for upload...
Dan ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-04-23 09:19 Message: Logged In: YES user_id=14198 This is an issue with the "Installer" tool I use. I will try and sus it out before the next binary release. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 From noreply at sourceforge.net Tue Apr 22 21:06:50 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 22 23:07:02 2003 Subject: [Spambayes] [ spambayes-Bugs-725449 ] Addin won't initialize Message-ID: Bugs item #725449, was opened at 2003-04-22 02:41 Message generated for change (Comment added) made by dmara You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: dan maer (dmara) Assigned to: Mark Hammond (mhammond) Summary: Addin won't initialize Initial Comment: Running pure Windows (98SE) and Outlook 2000. No Python installed. I can't get the plugin to initialize. I go to Tools/options/other/advanced options/Com Addins and check the SpamBayes plug-in box, but it won't stayed checked, and I've uninstalled/reinstalled Outlook and the plugin but no go. I've got this working great, really great, on Winnt 4.0 and Xp. Logfile being attached for upload... Dan ---------------------------------------------------------------------- >Comment By: dan maer (dmara) Date: 2003-04-23 01:06 Message: Logged In: YES user_id=759684 Ok Mark... Anyway to bypass the installer issue and get it working by manual means? Dan ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-22 21:19 Message: Logged In: YES user_id=14198 This is an issue with the "Installer" tool I use. 
I will try and sus it out before the next binary release. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 From lists at olivermaunder.co.uk Wed Apr 23 11:15:33 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Wed Apr 23 05:15:30 2003 Subject: [Spambayes] IMAPFilter training issues (WAS imapfiltermanglingheaders!) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CCDB@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CCDB@its-xchg4.massey.ac.nz> Message-ID: <3EA659B5.2070804@olivermaunder.co.uk> Meyer, Tony wrote: >I've checked in a fix for this; if people could check it that would be >great. It should now handle any ordering (as long as the message number >is there at the start, which it is meant to be). > > Getting closer ;-) The new _extract_fetch_data doesn't like messages with multiple FLAGs. In that case it fails to pick up the UID, which means the message doesn't get added to the list. But at least it's trying to add the message to the list, which is an improvement on yesterday. Here's the output, with a bit of extra debugging info. Loading database hammie.db... Done. Training Training ham folder INBOX.spambayes.ham-train Response: 1 (FLAGS (\Seen) UID 1) Key: FLAGS, Val: \Seen Key: UID, Val: 1 Response: 2 (FLAGS (\Seen) UID 2) Key: FLAGS, Val: \Seen Key: UID, Val: 2 Response: 3 (FLAGS (\Seen $MDNSent) UID 3) Key: FLAGS, Val: \Seen Traceback (most recent call last): File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 567, in ?
run() File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 553, in run imap_filter.Train() File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 405, in Train num_ham_trained = folder.Train(self.classifier, False) File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 350, in Train for msg in self: File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 273, in __iter__ for key in self.keys(): File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 303, in keys uids.append(data["UID"]) KeyError: UID Olly From lists at olivermaunder.co.uk Wed Apr 23 14:41:15 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Wed Apr 23 08:41:20 2003 Subject: [Spambayes] IMAPFilter training issues (WAS imapfiltermanglingheaders!) In-Reply-To: <3EA659B5.2070804@olivermaunder.co.uk> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CCDB@its-xchg4.massey.ac.nz> <3EA659B5.2070804@olivermaunder.co.uk> Message-ID: <3EA689EB.405@olivermaunder.co.uk> I wrote: > The new _extract_fetch_data doesn't like messages with multiple FLAGs. > In that case it fails to pick up the UID, which means the message > doesn't get added to the list. But at least it's trying to add the > message to the list, which is an improvement on yesterday. Looking at it again, it appears that it's the $ in "3 (FLAGS (\Seen $MDNSent) UID 3)" which is causing the problems. I've changed the regular expressions in _extract_fetch_data to allow $ symbols in flags, which means the UIDs get picked up now. Where there are multiple flags, the regex still only picks up the first one, which could be a problem if a message is marked "\Seen" "\Deleted" - the filter would pick up the deleted message. Meanwhile... Training took 135.652999997 seconds, 217 messages were trained I've been waiting days to see that.
Now lets see what happens to those 150 odd spams in my inbox ;-) Olly From noreply at sourceforge.net Wed Apr 23 07:43:44 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Apr 23 09:45:45 2003 Subject: [Spambayes] [ spambayes-Bugs-726255 ] Problem if bayescustomize.ini not there Message-ID: Bugs item #726255, was opened at 2003-04-23 13:43 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=726255&group_id=61702 Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: Remi Ricard (papadoc) Assigned to: Nobody/Anonymous (nobody) Summary: Problem if bayescustomize.ini not there Initial Comment: Hi, I'm using spambayes and this morning I downloaded the latest cvs version and I have some problems. If the file bayescustomize.ini does not exist in the directory then I get the second trace. I tryed to get a patch for this problem but I don't know how to only save the new configuration values and not do an update_file ;-( My python skill are very poor.... For the first problem I don't know what is going on. Remi Ricard papaDoc@videotron.ca Traceback (most recent call last): File "C:\Devtools\SPAMBA~1\SPAMBA~1.23\spambayes\Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "C:\Devtools\SPAMBA~1\SPAMBA~1.23\spambayes\UserInterface.py", line 511, in onChangeopts op = open(optionsPathname, "r") IOError: [Errno 2] No such file or directory: 'C:\Devtools\SPAMBA~1\SPAMBA~1.23\bayescustomize.ini' ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=726255&group_id=61702 From msevilla at gts.tsc.uvigo.es Wed Apr 23 12:37:16 2003 From: msevilla at gts.tsc.uvigo.es (Miguel Sevillano) Date: Wed Apr 23 10:42:23 2003 Subject: [Spambayes] How do you classify text? 
Message-ID: <3EA65ECC.3020504@gts.tsc.uvigo.es> Hello, I'm working in a project that must classify a paragraph as one among N subjects. I would like to know exactly how you take a paragraph and classify it; how do you train the filter?. I would like to apply bayesian rules to distinguish among N differents subjects which a paragraph is talking about. I hope that you help because it's an important project for me. Thank you very much. From tim at fourstonesExpressions.com Wed Apr 23 10:50:31 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Apr 23 10:51:42 2003 Subject: [Spambayes] IMAPFilter training issues (WAS imapfilter manglingheaders!) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CCBA@its-xchg4.massey.ac.nz> Message-ID: 4/22/2003 5:15:41 PM, "Meyer, Tony" wrote: >Ah, regex's - the bane of my life ;). This is (relatively easily fixed, >and I'll do this if Tim hasn't already. I do get the feeling that I >need to get out a book on regex and study hard ;) I recommend the O'Reilly book... it's a bit dated and doesn't include much python stuff, but it's a great introduction and deep dive to regex. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From mregan at coade.com Wed Apr 23 10:56:48 2003 From: mregan at coade.com (Mike Regan) Date: Wed Apr 23 11:04:56 2003 Subject: [Spambayes] Outlook 2002/SpamBayes Question Message-ID: <006a01c309a8$90dacd00$5fdc0a0a@MIKEP4> Hello, I am trying to use spam bayes with outlook 2002 sp2 on an xp box and I am having a problem. The thing is when I go to train spam bayes the list box for selecting the spam box and the good email box is empty. As a matter of fact all list boxes in spam bayes fail to populate. I have to other people in my office using it with outlook one using xp like I am and the other using 2000 and the program is working fine for them. 
I have tried replacing my comctl32.dll with the same version they are using but that didn't fix anything. Can you suggest another way to fix this problem? Thanks Mike Regan mregan@coade.com -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 3244 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030423/737717d4/winmail.bin From tim at fourstonesExpressions.com Wed Apr 23 11:15:37 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Apr 23 11:21:38 2003 Subject: [Spambayes] How do you classify text? In-Reply-To: <3EA65ECC.3020504@gts.tsc.uvigo.es> Message-ID: <32ZXRKGYW940582BA2D01YURKH52UP.3ea6ae19@myst> 4/23/2003 4:37:16 AM, Miguel Sevillano wrote: > Hello, > > I'm working in a project that must classify a paragraph as one among >N subjects. I would like to know exactly how you take a paragraph and >classify it; how do you train the filter?. > > I would like to apply bayesian rules to distinguish among N >differents subjects which a paragraph is talking about. Spambayes will classify into three buckets at most: positive classification, negative classification, and unsure. To apply this to n subjects, you'd need to apply the filter n-1 times. For classifications c(1)...c(n), you would first apply the filter for c(1), removing all positive c(1) classifications from your input set. Then filter for c(2), removing all positives, etc... to c(n). You may indeed end up with negative and unsure classifications after the final c(n) filtering... Each of these filters would require a bayesian classification database (PersistentClassifier in spambayes), and would have to be trained separately, by feeding known positives to each via the learn() method. Filtering is initiated by using the spamprob method on a particular classifier, sending it the text that has been tokenized by our tokenizer. 
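The cascade described above can be sketched roughly as follows. This is a toy under stated assumptions, not spambayes code: ToyClassifier stands in for one two-way filter and uses a crude smoothed log-odds score, tokenize is a whitespace splitter, the training texts are made up, and each filter is fed the other subjects' texts as negatives. The names learn and score only loosely mirror the learn() and spamprob methods mentioned above.

```python
import math
from collections import Counter


def tokenize(text):
    """Toy tokenizer: lowercase words split on whitespace."""
    return text.lower().split()


class ToyClassifier:
    """Stand-in for ONE two-way filter. Not the spambayes classifier."""

    def __init__(self):
        self.pos = Counter()  # training texts of this subject containing a token
        self.neg = Counter()  # training texts of other subjects containing it

    def learn(self, tokens, is_positive):
        (self.pos if is_positive else self.neg).update(set(tokens))

    def score(self, tokens):
        """Return a value in (0, 1); 0.5 means no evidence either way."""
        logit = sum(math.log((self.pos[t] + 1.0) / (self.neg[t] + 1.0))
                    for t in set(tokens))
        return 1.0 / (1.0 + math.exp(-logit))


def build_filters(training):
    """Train one filter per subject: own texts positive, all others negative."""
    filters = []
    for subject in training:
        clf = ToyClassifier()
        for other, texts in training.items():
            for text in texts:
                clf.learn(tokenize(text), other == subject)
        filters.append((subject, clf))
    return filters


def classify(text, filters, cutoff=0.8):
    """Apply the filters in turn; the first confident positive wins."""
    tokens = tokenize(text)
    for subject, clf in filters:
        if clf.score(tokens) >= cutoff:
            return subject
    return "unsure"


# Made-up training data, purely for illustration.
TRAINING = {
    "sport": ["the team won the match", "a late goal won the cup"],
    "cooking": ["simmer the sauce and stir", "bake the bread until golden"],
    "travel": ["book a flight and a hotel", "the train leaves at noon"],
}

filters = build_filters(TRAINING)
print(classify("team goal match", filters))
print(classify("simmer sauce stir", filters))
print(classify("hello world", filters))
```

Because the first confident filter wins, the order of the subjects matters when a paragraph fits more than one of them; scoring against every filter and taking the highest score would remove that dependence at the cost of always running all n filters.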
You can see a clear example of this training and filtering activity in the imapfilter. If you don't currently know python, you might want to get yourself a python primer and read it, as there is a bit of advanced python stuff in this code. By and large, the code is quite readable, though, so check it out and have a peek. Again, start at the imapfilter, and don't get hung up on the imap- ness... c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From tim.one at comcast.net Wed Apr 23 12:19:12 2003 From: tim.one at comcast.net (Tim Peters) Date: Wed Apr 23 11:26:21 2003 Subject: [Spambayes] How do you classify text? In-Reply-To: <3EA65ECC.3020504@gts.tsc.uvigo.es> Message-ID: [Miguel Sevillano] > I'm working in a project that must classify a paragraph as one among > N subjects. I would like to know exactly how you take a paragraph and > classify it; how do you train the filter?. > > I would like to apply bayesian rules to distinguish among N > differents subjects which a paragraph is talking about. > > I hope that you help because it's an important project for me. The spambayes project doesn't (despite its name ) do Bayesian classification, or N-way classification. A good paper on a good system that does both is Jason Rennie's "ifile: An Application of Machine Learning to E-Mail Filtering" The paper summarizes the classic Bayesian classification approach. Do learn how to use citeseer: it's a great way to find papers on tech subjects! The citeseer record for the paper above is: http://citeseer.nj.nec.com/11099.html From bplist at www.wormy.org Wed Apr 23 13:25:09 2003 From: bplist at www.wormy.org (BP List) Date: Wed Apr 23 11:57:48 2003 Subject: [Spambayes] Outlook non-mail items Message-ID: Hello all, I have been using Spambayes on a Windows/Outlook XP system for a month or two, it works great save one problem. 
Any non-mail items (meeting requests in particular) are always classified as "unsure". I am also unable to train spambayes on these items. It gives me an error saying "No mail items are selected". Is there a way I can prevent spambayes from processing these items or train it on non-mail items? Thanks! -- Bryan Greenawalt From lists at olivermaunder.co.uk Wed Apr 23 17:58:14 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Wed Apr 23 11:58:14 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) Message-ID: <3EA6B816.4040703@olivermaunder.co.uk> Hi again Well, training seems to be working, so now I've moved on to classifying, with the following results: Traceback (most recent call last): File "C:\Development\spambayes\imapfilter.py", line 568, in ? run() File "C:\Development\spambayes\imapfilter.py", line 558, in run imap_filter.Filter() File "C:\Development\spambayes\imapfilter.py", line 437, in Filter self.unsure_folder) File "C:\Development\spambayes\imapfilter.py", line 383, in Filter msg.Save() File "C:\Development\spambayes\imapfilter.py", line 230, in Save response = imap.uid("SEARCH", "(HEADER)", "X-Spambayes-IMAP-OldID", old_id) File "C:\Program Files\Python22\lib\imaplib.py", line 622, in uid typ, dat = apply(self._simple_command, (name, command) + args) File "C:\Program Files\Python22\lib\imaplib.py", line 925, in _simple_command return self._command_complete(name, apply(self._command, (name,) + args)) File "C:\Program Files\Python22\lib\imaplib.py", line 762, in _command_complete raise self.error('%s command error: %s %s' % (name, typ, data)) imaplib.error: UID command error: BAD ['Missing required argument to Search head er'] Looks like an error coming back from the server (surprise, surprise). I'm really going to have to read that RFC soon and find out what's going on! Also, problem 2 - probably also related. Before that error, imapfilter did manage to stick a couple of messages in my Spam folder. 
But, when I click on them my mailer throws up a message box saying "Invalid sequence in UID". I imagine this is also coming from the server, so it looks like it doesn't like the way imapfilter is rewriting the headers. More debugging on its way later :-) Olly From tim at fourstonesExpressions.com Wed Apr 23 12:07:55 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Apr 23 12:13:30 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) In-Reply-To: <3EA6B816.4040703@olivermaunder.co.uk> Message-ID: 4/23/2003 10:58:14 AM, Oliver Maunder wrote: >Also, problem 2 - probably also related. Before that error, imapfilter >did manage to stick a couple of messages in my Spam folder. But, when I >click on them my mailer throws up a message box saying "Invalid sequence >in UID". I imagine this is also coming from the server, so it looks like >it doesn't like the way imapfilter is rewriting the headers. Yup. Your imap server has an opposite order for uid and flags, so we have to remember that and write them out in the right order too. imaplib is probably where this happens... this is gonna get ugly. c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From lists at olivermaunder.co.uk Wed Apr 23 18:22:19 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Wed Apr 23 12:22:14 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) In-Reply-To: References: Message-ID: <3EA6BDBB.70608@olivermaunder.co.uk> Tim Stone - Four Stones Expressions wrote: >Yup. Your imap server has an opposite order for uid and flags, so >we have to remember that and write them out in the right order too. >imaplib is probably where this happens... this is gonna get ugly. > > You know, I can always change ISPs if it'll make things easier... 
It just frightens me that someone else will start testing this, and find that *their* server does things differently too! From tim at fourstonesExpressions.com Wed Apr 23 12:25:14 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Wed Apr 23 12:28:24 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) In-Reply-To: <3EA6BDBB.70608@olivermaunder.co.uk> Message-ID: 4/23/2003 11:22:19 AM, Oliver Maunder wrote: >Tim Stone - Four Stones Expressions wrote: > >>Yup. Your imap server has an opposite order for uid and flags, so >>we have to remember that and write them out in the right order too. >>imaplib is probably where this happens... this is gonna get ugly. >> >> >You know, I can always change ISPs if it'll make things easier... > >It just frightens me that someone else will start testing this, and find >that *their* server does things differently too! Yes, it is becoming quite apparent to me that we really must have a "supported imap server" list. I doubt that we can ever make all the servers happy. These problems 'threaten' to turn us into imap problem fixers rather than anti-spam software developers. I must say that learning the nuances of every imap server's semantic is not particularly interesting to me... c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From lists at olivermaunder.co.uk Wed Apr 23 21:48:18 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Wed Apr 23 15:48:26 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) In-Reply-To: References: Message-ID: <3EA6EE02.7030603@olivermaunder.co.uk> Tim Stone - Four Stones Expressions wrote: >Yes, it is becoming quite apparent to me that we really must have a "supported >imap server" list. I doubt that we can ever make all the servers happy. 
>These problems 'threaten' to turn us into imap problem fixers rather than >anti-spam software developers. I must say that learning the nuances of every >imap server's semantic is not particularly interesting to me... > > Hmmm... maybe it would be a good idea to go back and see how others have done it. I have used a variety of MUAs with this account with no problem. And IMAPSpamBeGone and IMAPAssassin (simple perl version of isbg) worked fine out of the box. But I agree with you about the complexities - one of the initial attractions of the imap program was that it could avoid some of the problems with pop3proxy. Olly From mhammond at skippinet.com.au Thu Apr 24 09:21:42 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Apr 23 18:22:18 2003 Subject: [Spambayes] Outlook 2002/SpamBayes Question In-Reply-To: <006a01c309a8$90dacd00$5fdc0a0a@MIKEP4> Message-ID: <004f01c309e6$b93f3040$530f8490@eden> Please see the "About.html" file that comes with SpamBayes. This includes instructions for how to see the "log", which will include any error messages. Please mail this log. Thanks, Mark. > -----Original Message----- > From: spambayes-bounces@python.org > [mailto:spambayes-bounces@python.org] > Sent: Thursday, 24 April 2003 12:57 AM > To: SpamBayes@python.org > Subject: [Spambayes] Outlook 2002/SpamBayes Question > > Hello, I am trying to use spam bayes with outlook 2002 sp2 on an xp box > and I am having a problem. The thing is when I go to train spam bayes the > list box for selecting the spam box and the good email box is empty. As a > matter of fact all list boxes in spam bayes fail to populate. I have to > other people in my office using it with outlook one using xp like I am and > the other using 2000 and the program is working fine for them. I have > tried replacing my comctl32.dll with the same version they are using but > that didn't fix anything. Can you suggest another way to fix this > problem? 
> Thanks > Mike Regan > mregan@coade.com << File: ATT00007.txt >> -------------- next part -------------- A non-text attachment was scrubbed... Name: winmail.dat Type: application/ms-tnef Size: 2256 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes/attachments/20030424/f5999320/winmail.bin From dave at boost-consulting.com Wed Apr 23 20:12:29 2003 From: dave at boost-consulting.com (David Abrahams) Date: Wed Apr 23 19:12:56 2003 Subject: [Spambayes] Re: YAIP (Yet Another IMAP Problem) References: <3EA6B816.4040703@olivermaunder.co.uk> Message-ID: <7k9k7qqq.fsf@boost-consulting.com> Oliver Maunder writes: > Hi again > > Well, training seems to be working, so now I've moved on to > classifying, with the following results: > > Traceback (most recent call last): > File "C:\Development\spambayes\imapfilter.py", line 568, in ? > run() > File "C:\Development\spambayes\imapfilter.py", line 558, in run > imap_filter.Filter() > File "C:\Development\spambayes\imapfilter.py", line 437, in Filter > self.unsure_folder) > File "C:\Development\spambayes\imapfilter.py", line 383, in Filter > msg.Save() > File "C:\Development\spambayes\imapfilter.py", line 230, in Save > response = imap.uid("SEARCH", "(HEADER)", > "X-Spambayes-IMAP-OldID", old_id) > File "C:\Program Files\Python22\lib\imaplib.py", line 622, in uid > typ, dat = apply(self._simple_command, (name, command) + args) > File "C:\Program Files\Python22\lib\imaplib.py", line 925, in > _simple_command > return self._command_complete(name, apply(self._command, (name,) + > args)) > File "C:\Program Files\Python22\lib\imaplib.py", line 762, in > _command_complete > raise self.error('%s command error: %s %s' % (name, typ, data)) > imaplib.error: UID command error: BAD ['Missing required argument to > Search head > er'] I'm getting the same error: %python imapfilter.py -c -v -d ~/bayes.db Loading database /usr/home/dave/bayes.db... Done. 
Classifying Traceback (most recent call last): File "imapfilter.py", line 565, in ? run() File "imapfilter.py", line 555, in run imap_filter.Filter() File "imapfilter.py", line 434, in Filter self.unsure_folder) File "imapfilter.py", line 380, in Filter msg.Save() File "imapfilter.py", line 231, in Save old_id) File "/usr/local/lib/python2.2/imaplib.py", line 622, in uid typ, dat = apply(self._simple_command, (name, command) + args) File "/usr/local/lib/python2.2/imaplib.py", line 925, in _simple_command return self._command_complete(name, apply(self._command, (name,) + args)) File "/usr/local/lib/python2.2/imaplib.py", line 762, in _command_complete raise self.error('%s command error: %s %s' % (name, typ, data)) imaplib.error: UID command error: BAD ['syntax error'] -- Dave Abrahams Boost Consulting www.boost-consulting.com From dave at boost-consulting.com Wed Apr 23 20:14:15 2003 From: dave at boost-consulting.com (David Abrahams) Date: Wed Apr 23 19:21:15 2003 Subject: [Spambayes] Re: YAIP (Yet Another IMAP Problem) References: <3EA6BDBB.70608@olivermaunder.co.uk> Message-ID: <3ck87qns.fsf@boost-consulting.com> Tim Stone - Four Stones Expressions writes: > Yes, it is becoming quite apparent to me that we really must have a "supported > imap server" list. I doubt that we can ever make all the servers happy. > These problems 'threaten' to turn us into imap problem fixers rather than > anti-spam software developers. I must say that learning the nuances of every > imap server's semantic is not particularly interesting to me... I wonder if some kind of IMAP autoconf is possible; if so we could find out the protocol by probing... 
-- Dave Abrahams Boost Consulting www.boost-consulting.com From noreply at sourceforge.net Wed Apr 23 17:24:20 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Apr 23 19:24:32 2003 Subject: [Spambayes] [ spambayes-Bugs-726255 ] Problem if bayescustomize.ini not there Message-ID: Bugs item #726255, was opened at 2003-04-24 01:43 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=726255&group_id=61702 Category: pop3proxy Group: None Status: Open Resolution: None Priority: 5 Submitted By: Remi Ricard (papadoc) >Assigned to: Tony Meyer (anadelonbrin) Summary: Problem if bayescustomize.ini not there Initial Comment: Hi, I'm using spambayes and this morning I downloaded the latest cvs version and I have some problems. If the file bayescustomize.ini does not exist in the directory then I get the second trace. I tried to get a patch for this problem but I don't know how to only save the new configuration values and not do an update_file ;-( My Python skills are very poor.... For the first problem I don't know what is going on. Remi Ricard papaDoc@videotron.ca Traceback (most recent call last): File "C:\Devtools\SPAMBA~1\SPAMBA~1.23\spambayes\Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "C:\Devtools\SPAMBA~1\SPAMBA~1.23\spambayes\UserInterface.py", line 511, in onChangeopts op = open(optionsPathname, "r") IOError: [Errno 2] No such file or directory: 'C:\Devtools\SPAMBA~1\SPAMBA~1.23\bayescustomize.ini' ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-24 11:24 Message: Logged In: YES user_id=552329 This is probably also my fault. I'm fixing options stuff at the moment, so I'll do this too.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=726255&group_id=61702 From T.A.Meyer at massey.ac.nz Thu Apr 24 12:26:45 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 19:27:26 2003 Subject: [Spambayes] Outlook non-mail items Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CEC9@its-xchg4.massey.ac.nz> > I have been using Spambayes on a Windows/Outlook XP system > for a month or two, it works great save one problem. Any > non-mail items (meeting requests in particular) are always > classified as "unsure". I am also unable to train spambayes > on these items. It gives me an error saying "No mail items > are selected". Is there a way I can prevent spambayes from > processing these items or train it on non-mail items? This is (more or less) a known bug; see SF: [ 690418 ] Non mail items filtered by Outlook I'm sure Mark will get to it at some point. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Apr 24 12:38:21 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 19:38:57 2003 Subject: [Spambayes] IMAPFilter trainingissues Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CED1@its-xchg4.massey.ac.nz> > Looking at it again, it appears that it's the $ in "3 (FLAGS (\Seen > $MDNSent) UID 3)" which is causing the problems. > I've changed the regular expressions in _extract_fetch_data > to allow $ symbols in flags, which means the UIDs get picked up now. I wasn't aware that the $ character was valid in flag names (I ought to have checked...it's about the point where I need to have the imap RFC printed out and stuck to my wall ;). Turns out that anything apart from (,),{, ,ASCII chars 0x00 - 0x1f, 0x7f,*,%,\," is valid. I'll update the regex to reflect this. 
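For anyone following along, here is a rough sketch of what a flag pattern built from that character list might look like. It is not the code in imapfilter's _extract_fetch_data; the names ATOM_CHAR, FLAG_RE and parse_fetch_item are made up for illustration, and it is written for a current Python rather than 2.2. It only assumes the exclusion list quoted above plus the usual leading backslash on system flags.

```python
import re

# Characters listed above as NOT valid in a flag name:
# ( ) { space, control chars 0x00-0x1f, 0x7f, * % \ "
ATOM_CHAR = r'[^\x00-\x20\x7f(){*%\\"]'

# A flag is one or more valid characters, optionally preceded by a
# backslash (system flags such as \Seen).
FLAG_RE = re.compile(r'\\?' + ATOM_CHAR + r'+')

# Hypothetical helpers: grab the parenthesised FLAGS list and the UID.
FLAGS_LIST_RE = re.compile(r'FLAGS \(([^)]*)\)')
UID_RE = re.compile(r'UID (\d+)')


def parse_fetch_item(item):
    """Return every flag and the UID found in one FETCH response item."""
    flags_match = FLAGS_LIST_RE.search(item)
    uid_match = UID_RE.search(item)
    flags = FLAG_RE.findall(flags_match.group(1)) if flags_match else []
    uid = int(uid_match.group(1)) if uid_match else None
    return {"flags": flags, "uid": uid}


print(parse_fetch_item(r'3 (FLAGS (\Seen $MDNSent) UID 3)'))
```

Using findall on the contents of the FLAGS list returns all the flags, not only the first one, so a keyword such as $MDNSent no longer hides the UID.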
> Where there are multiple flags, the regex still only picks up the first > one, which could be a problem if a message is marked "\Seen" "\Deleted" > - the filter would pick up the deleted message. This is a different problem - I'll fix this too. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Apr 24 12:53:03 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 19:53:39 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CEDF@its-xchg4.massey.ac.nz> > Well, training seems to be working, so now I've moved on to > classifying, with the following results: > File "C:\Program Files\Python22\lib\imaplib.py", line 762, in > _command_complete > raise self.error('%s command error: %s %s' % (name, typ, data)) > imaplib.error: UID command error: BAD ['Missing required argument to > Search header'] I think this is the same problem that David reported. I'm not sure exactly what is causing it. Once thing I can suggest is that when testing with IMAP, if you could run the imapfilter with the option "-i4" and report the (much longer!) printout, that would be great (you can trim out message details, including the content, which isn't of interest). The -i switch sets the imap debugging level - I've found 4 to be quite useful (it gives a printout of each IMAP command and response). > Also, problem 2 - probably also related. Before that error, > imapfilter did manage to stick a couple of messages in my Spam > folder. But, when I click on them my mailer throws up a message > box saying "Invalid sequence in UID". I imagine this is also > coming from the server, so it looks like it doesn't like the way > imapfilter is rewriting the headers. Now that's weird. The imapfilter doesn't assign the uid's, the server does that. Not sure how to proceed on this one, unless the imap debugging stuff gives a hint. 
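As a side note on the debugging level: the standard imaplib module has a module-level Debug variable that new IMAP4 objects copy into their debug attribute, and its documentation says values greater than three trace each command. The sketch below shows that mechanism only; whether imapfilter's -i switch is wired up exactly this way is an assumption, and set_imap_debug is a made-up name.

```python
import imaplib


def set_imap_debug(level):
    """Set the default debug level that new IMAP4 objects will pick up."""
    imaplib.Debug = level
    return imaplib.Debug


# Level 4 is the value suggested above for seeing commands and responses.
set_imap_debug(4)
```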
=Tony Meyer From T.A.Meyer at massey.ac.nz Thu Apr 24 12:55:06 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 19:55:42 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CEE1@its-xchg4.massey.ac.nz> > Yup. Your imap server has an opposite order for uid and flags, so > we have to remember that and write them out in the right order too. > imaplib is probably where this happens... this is gonna get ugly. The thing is that we don't write a UID - the server assigns it. The only times we transmit a UID are when we are fetching or storing, and it has to be the first parameter after the command. Very odd. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Apr 24 12:57:33 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 19:58:08 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CEE6@its-xchg4.massey.ac.nz> > You know, I can always change ISPs if it'll make things easier... > > It just frightens me that someone else will start testing > this, and find that *their* server does things differently too! That's definitely not the solution! As long as you're willing to keep on with the testing, this is what we need. We'll have tested with four different servers (NetMail, Courier, the one you are using, and the one David is using) once we're done, so hopefully that will cover most of the variations possible. As long as the server is following the RFC, then it's up to us to fix the problem. If we get a server that's bending the rules, then we'll see what we do :) =Tony Meyer From rubado at undr.net Wed Apr 23 17:59:07 2003 From: rubado at undr.net (Ian Rubado) Date: Wed Apr 23 19:58:51 2003 Subject: [Spambayes] SMTP proxy Message-ID: I download the source code from sourceforge; however, I do not see a SMTP proxy included. There is mention of it in the documentation. Is it not complete? 
One other thing, what kind of success has anyone had integrating Spambayes into a server side solution transparent to end users? One could use a content filter on the mail server to delete the mail or possible whatever else depending on the header that spambayes tacked into the message. Most of the spam I wish to target is 'porn' related and or advertisements for sex products, so I am not concerned with my users own definition of spam. Ian Rubado From T.A.Meyer at massey.ac.nz Thu Apr 24 13:09:37 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 20:10:28 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CEED@its-xchg4.massey.ac.nz> > Hmmm... maybe it would be a good idea to go back and see how > others have done it. I have used a variety of MUAs with this account > with no problem. And IMAPSpamBeGone and IMAPAssassin (simple perl > version of isbg) worked fine out of the box. The trouble is that we're doing more than isbg does. For one, isbg doesn't do any training. We're also doing things more 'correctly' - copying the message's original datestamp and flags, for example, plus finding the uid of the moved message, so that we keep it in our records (isbg would simply retrain it). Plus the whole UI business. The more we interact with the IMAP server, the higher the possibility that something will go wrong. One of these days, of course, you'll be able to say that the latest version of spambayes worked fine out of the box :) =Tony Meyer From lists at olivermaunder.co.uk Thu Apr 24 02:21:32 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Wed Apr 23 20:21:41 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CEE1@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CEE1@its-xchg4.massey.ac.nz> Message-ID: <3EA72E0C.8030706@olivermaunder.co.uk> Meyer, Tony wrote: >>Yup. 
Your imap server has an opposite order for uid and flags, so >>we have to remember that and write them out in the right order too. >>imaplib is probably where this happens... this is gonna get ugly. >> >> > >The thing is that we don't write a UID - the server assigns it. The >only times we transmit a UID are when we are fetching or storing, and it >has to be the first parameter after the command. > > Probably not worth worrying too much at this stage. So far, imapfilter only managed to classify and move two messages before choking on the search function, and I'm only getting this problem on one of those messages. So let's ignore it for now, and see if it occurs on any other classified messages in the future. Olly From T.A.Meyer at massey.ac.nz Thu Apr 24 13:24:53 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 20:25:32 2003 Subject: [Spambayes] imapfilter progress Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CEF9@its-xchg4.massey.ac.nz> > File "imapfilter.py", line 361, in Filter > msg.Save() > File "imapfilter.py", line 231, in Save > old_id) [...] > imaplib.error: UID command error: BAD ['syntax error'] Hmm. This *might* be because the command should use "HEADER" and not "(HEADER)", but I'm not sure - the current form works on both the servers I have available. I'll check this in (RSN) so you can try it out. If that doesn't fix it, could you run imapfilter with the option "-i4"? This sets the imap debug level to 4, which provides a list of all the imap commands and responses (raw). That should have more clues. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Apr 24 13:23:49 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 20:25:39 2003 Subject: [Spambayes] SMTP proxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CEF8@its-xchg4.massey.ac.nz> > I download the source code from sourceforge; however, I do > not see a SMTP proxy included. There is mention of it in the > documentation. Is it not complete?
In the spambayes directory you should have a file called smtpproxy.py. There it is. It is complete, although at the moment it's integrated with pop3proxy.py (it uses the pop3proxy caches). This would be simple to change (extend); as I've said before to the list, someone just has to give me a reason to do it. (Or do it themselves and submit a patch...) > One other thing, what kind > of success has anyone had integrating Spambayes into a server > side solution transparent to end users? One could use a > content filter on the mail server to delete the mail or > possible whatever else depending on the header that spambayes > tacked into the message. The FAQ has this to say: Q: This software is great! I want to implement it for all my users. Are there plans to develop a server-side spambayes solution? A: The problem with a server-side solution is that everyone has a different idea of what is spam - that's the whole strength of the bayesian-style filtering concept. If you are certain that *all* of your users would agree on what is spam and what is not, then this might work for you, but otherwise you really have to have individual databases for each user. Either way, you should be able to modify spambayes easily enough to fit into your setup. Please let the list know if you do have success in this area, and we'll update this answer. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Apr 24 14:28:23 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Apr 23 21:29:01 2003 Subject: [Spambayes] SMTP proxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CF2D@its-xchg4.massey.ac.nz> > Thanks for the answer; however, the source code I downloaded > from source forge for the current version does not include > the smtpproxy.py. spambayes-1.0a2.tar.gz Ah, I see. I forgot that smtpproxy wasn't done when alpha2 was released. If you get the cvs code it's there. > Perhaps you could send me a copy. Hmm. I'm not sure how well smtpproxy.py will work with the alpha2 Spambayes code. 
You can get it here: If you're planning to use it with pop3proxy, I would recommend holding off for a wee bit if possible. We're pretty close to another release (IMO), which would include smtpproxy working with the rest of the code. (We just have to iron out the IMAP filter, the new Options code, and one or two other things). If you want to use it separate from pop3proxy, then the cvs version linked above would be a good place to start. =Tony Meyer From noreply at sourceforge.net Wed Apr 23 23:10:55 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 24 01:11:03 2003 Subject: [Spambayes] [ spambayes-Bugs-725449 ] Addin won't initialize Message-ID: Bugs item #725449, was opened at 2003-04-22 04:41 Message generated for change (Comment added) made by usertgo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: dan maer (dmara) Assigned to: Mark Hammond (mhammond) Summary: Addin won't initialize Initial Comment: Running pure Windows (98SE) and Outlook 2000. No Python installed. I can't get the plugin to initialize. I go to Tools/options/other/advanced options/Com Addins and check the SpamBayes plug-in box, but it won't stayed checked, and I've uninstalled/reinstalled Outlook and the plugin but no go. I've got this working great, really great, on Winnt 4.0 and Xp. Logfile being attached for upload... Dan ---------------------------------------------------------------------- Comment By: J (usertgo) Date: 2003-04-24 05:10 Message: Logged In: YES user_id=763953 I also get the same errors as in the logfile on a Win98SE installation w/Outlook 2000 SP1, but it works on Win2000 w/same Outlook. If you have any workarounds (w/regedit?) please let us know... 
---------------------------------------------------------------------- Comment By: dan maer (dmara) Date: 2003-04-23 03:06 Message: Logged In: YES user_id=759684 Ok Mark... Anyway to bypass the installer issue and get it working by manual means? Dan ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-22 23:19 Message: Logged In: YES user_id=14198 This is an issue with the "Installer" tool I use. I will try and sus it out before the next binary release. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 From noreply at sourceforge.net Thu Apr 24 00:18:41 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 24 02:18:53 2003 Subject: [Spambayes] [ spambayes-Bugs-725307 ] Outlook plugin won't load (anymore) Message-ID: Bugs item #725307, was opened at 2003-04-22 09:50 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725307&group_id=61702 Category: Outlook Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Fredrik Rodland (fmmr) Assigned to: Tony Meyer (anadelonbrin) Summary: Outlook plugin won't load (anymore) Initial Comment: I just updated to the latest cvs version. I've run python addin.py --unregister and python addin.py, but when I start outlook the following traceback is caught. The plugin is not loaded. 
SpamAddin - Connecting to Outlook Traceback (most recent call last): File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\universal.py", line 170, in dispatch retVal = ob._InvokeEx_(meth.dispid, 0, meth.invkind, args, None, None) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 322, in _InvokeEx_ return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 601, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 541, in _invokeex_ return apply(func, args) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\addin.py", line 662, in OnConnection self.manager = manager.GetManager(application) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\manager.py", line 475, in GetManager _mgr = BayesManager(outlook=outlook, verbose=verbose) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\manager.py", line 156, in __init__ import_core_spambayes_stuff(self.ini_filename) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\manager.py", line 70, in import_core_spambayes_stuff from spambayes import classifier File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\classifier.py", line 40, in ? from spambayes.Options import options File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\Options.py", line 1411, in ? 
options.mergefiles(filenames) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\Options.py", line 1288, in mergefiles self._update() File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\Options.py", line 1326, in _update self.set(section, option, value) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\Options.py", line 1276, in set self.convert(sect, opt, val) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\Options.py", line 1261, in convert return converter(value) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\Options.py", line 952, in 'address_headers': ('get', lambda s: Set(s.split())), exceptions.AttributeError: 'Set' object has no attribute 'split' ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-24 18:18 Message: Logged In: YES user_id=552329 This should now be complete. I've verified that the latest checkin works with pop3proxy, imap and Outlook, so all should be ok. Reopen if it is not. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-22 10:34 Message: Logged In: YES user_id=552329 This is my fault so I'm assigning to me. On investigation I need to work on the options that turn into sets. This would have come up any time someone tried to set the value of one of those options (Outlook found the problem because one of the headers options is in the default ini file). I've checked in a 'fix' which just bypasses the validity check for sets for the moment. Previously there wasn't any checking, so no functionality is lost. As soon as I get time I'll look into fixing this properly. Leaving this open until that's done. 
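To illustrate the failure in the traceback: the converter calls s.split() on a value that is already a set. The sketch below is a guess at one way to make such a converter tolerant, not the fix that was checked in. The name to_header_set is hypothetical, and it uses the built-in set of current Python instead of the Set class seen in the traceback.

```python
def to_header_set(value):
    """Convert an option value to a set.

    Accepts either a whitespace-separated string (as read from an ini
    file) or a collection that has already been converted.
    """
    if isinstance(value, str):
        return set(value.split())
    return set(value)


print(to_header_set("from to cc") == to_header_set({"from", "to", "cc"}))
```

Accepting both forms means setting the option twice, or merging an ini file over the defaults, does not raise the AttributeError shown above.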
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725307&group_id=61702 From noreply at sourceforge.net Thu Apr 24 00:30:29 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 24 02:31:46 2003 Subject: [Spambayes] [ spambayes-Bugs-725466 ] Include a proper locale fix in Options.py Message-ID: Bugs item #725466, was opened at 2003-04-22 18:07 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725466&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Tony Meyer (anadelonbrin) Summary: Include a proper locale fix in Options.py Initial Comment: When reading the options, the float() call fails when the locale is a language that uses a ',' for a separator instead of '.'. This is hack-fixed in Outlook, but needs to be fixed in general. I imagine that there must be some sort of locale call that will convert between the current locale and English, and that this should be called as the option is set. Anyway, I'll get to this when I can. ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-24 18:30 Message: Logged In: YES user_id=552329 Options.py now uses locale.atoi and locale.atof to convert options. I *think* this will solve this problem, but I'm not 100%. If someone could do some testing, that would be great. I'm leaving open until I'm sure it's done. 
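A small sketch of the locale.atof idea, for anyone who wants to help test it. It is not the Options.py code; parse_float_option is a made-up name, and the example pins the "C" locale so that it runs the same everywhere.

```python
import locale


def parse_float_option(text):
    """Convert an option string using the current locale's number format."""
    return locale.atof(text)


# Pin the numeric locale so the example behaves the same on any machine.
locale.setlocale(locale.LC_NUMERIC, "C")
print(parse_float_option("0.90"))
```

One case that may be worth checking: in a locale where ',' is the decimal separator and '.' is used for digit grouping, a value written as "0.90" in the ini file could be read differently by locale.atof than by a plain float(), since atof strips the grouping character first.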
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725466&group_id=61702 From noreply at sourceforge.net Thu Apr 24 01:27:03 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 24 03:27:06 2003 Subject: [Spambayes] [ spambayes-Bugs-726255 ] Problem if bayescustomize.ini not there Message-ID: Bugs item #726255, was opened at 2003-04-24 01:43 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=726255&group_id=61702 Category: pop3proxy Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Remi Ricard (papadoc) Assigned to: Tony Meyer (anadelonbrin) Summary: Problem if bayescustomize.ini not there Initial Comment: Hi, I'm using spambayes and this morning I downloaded the latest cvs version and I have some problems. If the file bayescustomize.ini does not exist in the directory then I get the second trace. I tried to get a patch for this problem but I don't know how to only save the new configuration values and not do an update_file ;-( My Python skills are very poor.... For the first problem I don't know what is going on. Remi Ricard papaDoc@videotron.ca Traceback (most recent call last): File "C:\Devtools\SPAMBA~1\SPAMBA~1.23\spambayes\Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "C:\Devtools\SPAMBA~1\SPAMBA~1.23\spambayes\UserInterface.py", line 511, in onChangeopts op = open(optionsPathname, "r") IOError: [Errno 2] No such file or directory: 'C:\Devtools\SPAMBA~1\SPAMBA~1.23\bayescustomize.ini' ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-24 19:27 Message: Logged In: YES user_id=552329 This should now be fixed. Please re-open if it is not.
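For reference, the IOError in the trace comes from opening the options file for reading when it is not there yet. A minimal sketch of one way to guard against that follows; it is only an illustration of the idea, not the change that closed this bug, and read_option_lines is a hypothetical helper name.

```python
import os


def read_option_lines(path):
    """Return the lines of an options file, or [] if it does not exist yet."""
    if not os.path.exists(path):
        return []
    with open(path, "r") as handle:
        return handle.readlines()


# A missing file is treated as "no options set so far".
print(read_option_lines("no-such-directory/bayescustomize.ini"))
```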
---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-24 11:24 Message: Logged In: YES user_id=552329 This is probably also my fault. I'm fixing options stuff at the moment, so I'll do this too. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=726255&group_id=61702 From noreply at sourceforge.net Thu Apr 24 01:57:54 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Thu Apr 24 03:58:52 2003 Subject: [Spambayes] [ spambayes-Feature Requests-716437 ] Version information in GUI somewhere Message-ID: Feature Requests item #716437, was opened at 2003-04-07 11:52 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702 Category: Outlook Group: None Status: Open Priority: 1 Submitted By: Tony Meyer (anadelonbrin) Assigned to: Mark Hammond (mhammond) Summary: Version information in GUI somewhere Initial Comment: With the growing number of users, especially those using the binary, it would be good to have a version number printed somewhere in the GUI for people when they are reporting bugs. Greyed out text in the manager dialog, or even something in the about.html would work fine. I'll leave it to you to find somewhere appropriate :) ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-24 19:57 Message: Logged In: YES user_id=552329 If the version could be appended to the log file, that would be good, too. ---------------------------------------------------------------------- Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-07 12:01 Message: Logged In: YES user_id=552329 :) Well, for the binaries, your 001/002 system would work. 
For the full source releases, there's a __version__ attribute (1.0a2 at the moment, I think). I don't really know for CVS (maybe just 'cvs'?), but anyone using the cvs code should be able to describe when they retrieved it. ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-07 11:54 Message: Logged In: YES user_id=14198 I'm not sure *what* version to report though. I will find the "where" if you tell me that "what" ;) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498106&aid=716437&group_id=61702 From T.A.Meyer at massey.ac.nz Thu Apr 24 21:27:58 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 24 04:28:32 2003 Subject: [Spambayes] Re: YAIP (Yet Another IMAP Problem) Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CFBB@its-xchg4.massey.ac.nz> > I wonder if some kind of IMAP autoconf is possible; if so we > could find out the protocol by probing... The IMAP protocol does have a CAPABILITIES command, which does some of this. If our filter will work with anything that is RFC compliant, then we should avoid most of the problems. Note that I've changed the ways things work in the imap filter in a number of places in the latest check in. I finally gave in and read the RFC right through and came up with a number of improvements. In particular: * Getting hold of the uids in a folder is now done with SEARCH rather than a complicated FETCH. * Moving the flags to the copied message is done via the APPEND, rather than as a separate STORE. * All processing of FETCH responses is done via the _extract_fetch_data function found at the top of the module. This only handles RFC822, UID, FLAGS and INTERNALDATE at the moment, but those are the only ones we fetch. Unless I've made a typo, the regexs are also correct according to the RFC. 
* I doubt anyone is using it, but if expunge is set to true, then this is now done with a combination of CLOSE and EXPUNGE commands, which should be much faster. * I've stopped using EXAMINE and only use SELECT to select a folder. This should also speed things up. By the way, I don't know if anyone has noticed, but the filter doesn't currently remember whether a message was trained/classified between sessions. I know what the problem is here and I've dumped it with Tim , who will, no doubt, fix it when he gets a chance. =Tony Meyer From lists at olivermaunder.co.uk Thu Apr 24 10:39:40 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Thu Apr 24 04:39:34 2003 Subject: [Spambayes] IMAP - it works! Message-ID: <3EA7A2CC.7090201@olivermaunder.co.uk> Tony - I don't know exactly what you did last night, but it worked. I can see messages in my inbox again (I haven't been deleting spam for the last few days because I wanted to see how well the filter would work). Thanks a lot. But there was one small error - seems to be to do with the Options setup - Attempted to set [imap] username with invalid value ('my-username',) () Loading database hammie.db... Loading state from hammie.db database hammie.db is an existing database, with 95 spam and 122 ham Done. Traceback (most recent call last): File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 612, in ? run() File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 561, in run username = options["imap", "username"][0] IndexError: tuple index out of range Didn't complain about the server and password being tuples though. Right - I'm off to do more training. 
Olly From lists at olivermaunder.co.uk Thu Apr 24 10:44:42 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Thu Apr 24 04:44:35 2003 Subject: [Spambayes] YAIP (Yet Another IMAP Problem) In-Reply-To: <3EA72E0C.8030706@olivermaunder.co.uk> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CEE1@its-xchg4.massey.ac.nz> <3EA72E0C.8030706@olivermaunder.co.uk> Message-ID: <3EA7A3FA.4050102@olivermaunder.co.uk> Oliver Maunder wrote: > Probably not worth worrying too much at this stage. So far, imapfilter > only managed to classify and move two messages before choking on the > search function, and I'm only getting this problem on one of those > messages. > > So lets ignore it for now, and see if it occurs on any other > classified messages in the future. Well - lots of messages have been classified now. Just tested some of them (ugh - clicking on spam. I feel soiled) and none of them caused that error. Olly From lists at olivermaunder.co.uk Thu Apr 24 12:36:52 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Thu Apr 24 06:36:46 2003 Subject: [Spambayes] IMAP - it works! In-Reply-To: <3EA7A2CC.7090201@olivermaunder.co.uk> References: <3EA7A2CC.7090201@olivermaunder.co.uk> Message-ID: <3EA7BE44.4090305@olivermaunder.co.uk> I wrote: > But there was one small error - seems to be to do with the Options > setup - > > Attempted to set [imap] username with invalid value ('my-username',) > () This is due to the regexes in Options.py. They only allow letters and digits in usernames and passwords. My username contains hyphens, and my passwords usually have punctuation in them too (but not in this case. Changing the regexes to [\S]+ would seem reasonable. 
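To illustrate the suggestion above: a small sketch comparing a letters-and-digits pattern with \S+. The strict pattern here is only an assumed stand-in for whatever Options.py actually used, and the names are hypothetical:

```python
import re

# Assumed stand-in for the original "letters and digits only" rule.
STRICT = re.compile(r"[A-Za-z0-9]+")
# The proposed replacement: any run of non-whitespace characters.
RELAXED = re.compile(r"\S+")


def is_valid(pattern, value):
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None
```

With these definitions, is_valid(STRICT, "my-username") is False because of the hyphen, while is_valid(RELAXED, "my-username") is True. Values containing spaces are still rejected by the relaxed pattern.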
Olly From dave at boost-consulting.com Thu Apr 24 11:21:20 2003 From: dave at boost-consulting.com (David Abrahams) Date: Thu Apr 24 10:23:23 2003 Subject: [Spambayes] Re: imapfilter progress References: <1ED4ECF91CDED24C8D012BCF2B034F130150CEF9@its-xchg4.massey.ac.nz> Message-ID: <7k9k563j.fsf@boost-consulting.com> "Meyer, Tony" writes: >> File "imapfilter.py", line 361, in Filter >> msg.Save() >> File "imapfilter.py", line 231, in Save >> old_id) > [...] >> imaplib.error: UID command error: BAD ['syntax error'] > > Hmm. This *might* be because the command should use "HEADER" and not > "(HEADER"), but I'm not sure - works on both the servers I have > available. I'll check this in (RSN) so you can try it out. > > If that doesn't fix it Well, that appeared to fix the _ability_ to run classification using my existing database, but the results were pretty disappointing - everything seemed to go into my "unsure" folder... > could you run imapfilter with the option "-i4"? > This sets the imap debug level to 4, which provides a list of all the > imap commands and responses (raw). That should have more clues. ...so I set up some test "inbox, spam, unsure" folders and tried to run: python imapfilter.py -v -i 5 -t -c -D ~/bayes.db Just so we'd have some data to look at. I got a traceback while training on ham; the tail of the session is visible here: http://users.rcn.com/abrahams/imap/bugout.txt It may look like there's not much there, but some of those lines are *really long* (whole messages). 
-- Dave Abrahams Boost Consulting www.boost-consulting.com From lists at olivermaunder.co.uk Thu Apr 24 16:55:56 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Thu Apr 24 10:55:50 2003 Subject: [Spambayes] Re: imapfilter progress In-Reply-To: <7k9k563j.fsf@boost-consulting.com> References: <1ED4ECF91CDED24C8D012BCF2B034F130150CEF9@its-xchg4.massey.ac.nz> <7k9k563j.fsf@boost-consulting.com> Message-ID: <3EA7FAFC.3070204@olivermaunder.co.uk> David Abrahams wrote: >Well, that appeared to fix the _ability_ to run classification using my >existing database, but the results were pretty disappointing - >everything seemed to go into my "unsure" folder... > > If everything's going into your unsure folder, that seems to imply that the filter hasn't got any training data to use. The traceback you posted shows an error in the training process, so this could be the cause. What happens if you just do imapfilter.py -t -v ? It should print out some details of you training DB - number of hams and number of spams. If there's nothing or not much in there then that would cause the classification problems. Olly From david at dsmcl.net Thu Apr 24 12:38:33 2003 From: david at dsmcl.net (David McLaughlin) Date: Thu Apr 24 11:38:36 2003 Subject: [Spambayes] Training corrupts mbox files Message-ID: <20030424153833.GA63651@ainaz.pair.com> I've been using spambayes for a couple of months now, and its results are spectacular. On my email setup, it easily catches 300 spams out of a total 400 messages each day, with virtually no false positives or negatives. I love it! My only problem with it is that it seems to trash my mbox files when I train it. 
I use the following training command to train it on ham and spam mboxes: mboxtrain.py -d $HOME/.hammiedb -g $HOME/mail_processing/caught/bayes_good -s $HOME/mail_processing/caught/spam It correctly learns the messages, but the two mbox files have a bunch of erroneous "messages" at the end, and opening the mbox up in mutt gives a series of errors concerning invalid uid sequences. Has anyone else had problems training spambayes on mbox files? Is there anything else I should be doing to prevent spambayes from rewriting the mbox file? It if helps, I can post a sample of the before and after mbox files to a webpage for perusal. Thanks, -- David McLaughlin david@dsmcl.net From rubado at undr.net Thu Apr 24 12:47:31 2003 From: rubado at undr.net (Ian Rubado) Date: Thu Apr 24 11:50:28 2003 Subject: [Spambayes] SMTP proxy In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150CF2D@its-xchg4.massey.ac.nz> Message-ID: Tony, Thanks again for the info. I checked the CVS and see the files. I will hold off as you suggested, but will my hammie.db file which contains all the training results work with the new files once they are done? If so then I am in good shape. I'd hate to have to retrain the whole thing again. Thanks. Ian Rubado -----Original Message----- From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] Sent: Wednesday, April 23, 2003 9:28 PM To: rubado@undr.net Cc: spambayes@python.org Subject: RE: [Spambayes] SMTP proxy > Thanks for the answer; however, the source code I downloaded > from source forge for the current version does not include > the smtpproxy.py. spambayes-1.0a2.tar.gz Ah, I see. I forgot that smtpproxy wasn't done when alpha2 was released. If you get the cvs code it's there. > Perhaps you could send me a copy. Hmm. I'm not sure how well smtpproxy.py will work with the alpha2 Spambayes code. You can get it here: If you're planning to use it with pop3proxy, I would recommend holding off for a wee bit if possible. 
We're pretty close to another release (IMO), which would include smtpproxy working with the rest of the code. (We just have to iron out the IMAP filter, the new Options code, and one or two other things). If you want to use it separate from pop3proxy, then the cvs version linked above would be a good place to start. =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 25 11:57:26 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 24 18:58:12 2003 Subject: [Spambayes] SMTP proxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CFC3@its-xchg4.massey.ac.nz> > Thanks again for the info. I checked the CVS and see the > files. I will hold off as you suggested, but will my > hammie.db file which contains all the training results work > with the new files once they are done? Absolutely. I think I can safely say that should at any point in the future databases from an old version not simply work in the new version a converter will be provided. For the moment you should simply be able to point the new version at the old hammie.db and it'll 'work out of the box'. It's perhaps worth mentioning, however, (and for anyone listening, not just you), that it's worth making the occasional backup of the hammie.db file. There has been the odd case of the db being corrupted (using when a crash happens during spambayes' operation). We're working on a way to solve this problem, but if retraining isn't easy (if you don't keep you old mail, for example), then it's worth having a backup that you can fall back on. =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 25 12:03:30 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 24 19:04:03 2003 Subject: [Spambayes] IMAP - it works! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CFC4@its-xchg4.massey.ac.nz> > Tony - I don't know exactly what you did last night, but it worked. 
Basically, I re-worked the whole thing :) As I always say, if it doesn't work, start from scratch ;) (Really, I just went through it thoroughly with a copy of the RFC in my hand). > I can see messages in my inbox again (I haven't been deleting > spam for the last few days because I wanted to see how well the filter > would work). Thanks a lot. No worries. I'm glad it's finally (almost) going. Thanks for all the testing you (and David) have been doing - it really helps to get this closer to the stage at which it can be included in the next (pre)release - alpha3 or beta1, whichever it is. Please keep letting the list know if anything goes wrong, or if there are any improvements that you would like. Out of curiosity, how does it feel in terms of speed? I'm not really used to IMAP, so I don't know how fast things usually feel (and the servers I'm using are very geographically distant to me). =Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 25 12:04:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Apr 24 19:04:50 2003 Subject: [Spambayes] IMAP - it works! Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CFC5@its-xchg4.massey.ac.nz> > > But there was one small error - seems to be to do with the Options > > setup - > > > > Attempted to set [imap] username with invalid value ('my-username',) > > () > > This is due to the regexes in Options.py. They only allow letters and > digits in usernames and passwords. My username contains > hyphens, and my > passwords usually have punctuation in them too (but not in this case. > > Changing the regexes to [\S]+ would seem reasonable. I'll fix this shortly. Thanks (again) for finding both the problem and the solution. 
=Tony Meyer From T.A.Meyer at massey.ac.nz Fri Apr 25 17:09:05 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Fri Apr 25 00:09:52 2003 Subject: [Spambayes] Re: imapfilter progress Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150CFDB@its-xchg4.massey.ac.nz> > > could you run imapfilter with the option "-i4"? [...] > ...so I set up some test "inbox, spam, unsure" folders and > tried to run: > > python imapfilter.py -v -i 5 -t -c -D ~/bayes.db > > Just so we'd have some data to look at. I got a traceback > while training on ham; the tail of the session is visible here: > http://users.rcn.com/abrahams/imap/bugout.txt This is odd. The message that it crashes on doesn't have any crlf's (even just a cr or lf) separating the headers. The ones above it do, though. Does your mailer show this message correctly? If the mailer has an option to show the message (or even the headers) in it's original form, does it show it correctly? I'd understand it more if it was imaplib or our imapfilter that was stripping the cr/lf chars, but from the trace, it looks like the raw text from the server doesn't have them - and the length of the message (with no cr/lf) matches the literal length (4278). I'm baffled. It would be easy to catch this error, but I don't know what to do then - if the headers can't be parsed, then when we try and rewrite the message into the correct folder all the headers will be lost (again :( ). I could add in a test so that training continues and messages like these are just ignored, but I can't think of anything else to try (well, except for writing a script to try and insert line endings where there should be line endings). Anyone else got any ideas? 
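One of the options mentioned above, letting training continue and ignoring messages whose headers have no line endings, could be sketched roughly as follows. The helper names are hypothetical and this is not code from imapfilter.py:

```python
def has_line_endings(raw):
    """Return True if the raw message bytes contain any CR or LF."""
    return b"\r" in raw or b"\n" in raw


def trainable(messages):
    """Yield only the raw messages whose headers can plausibly be split.

    Messages with no CR or LF at all are skipped, since their headers
    run together and cannot be separated reliably.
    """
    for raw in messages:
        if has_line_endings(raw):
            yield raw
```

This only avoids the crash; it does nothing to recover the skipped messages.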
=Tony Meyer

From lists at olivermaunder.co.uk Thu Apr 24 13:01:36 2003
From: lists at olivermaunder.co.uk (Oliver Maunder)
Date: Fri Apr 25 07:01:27 2003
Subject: [Spambayes] Classifier Assertion error
Message-ID: <3EA7C410.8030509@olivermaunder.co.uk>

This is happening with a new DB (and new spambayes.messageinfo.db) trained with imapfilter:

Loading database hammie.db... Loading state from hammie.db database
hammie.db is an existing database, with 96 spam and 63 ham
Done.
Classifying
Traceback (most recent call last):
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 654, in ?
    run()
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 644, in run
    imap_filter.Filter()
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 519, in Filter
    self.unsure_folder)
  File "C:\Development\SpamBayes\spambayes\imapfilter.py", line 444, in Filter
    evidence=True)
  File "C:\Development\SpamBayes\spambayes\spambayes\classifier.py", line 217, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "C:\Development\SpamBayes\spambayes\spambayes\classifier.py", line 441, in _getclues
    prob = self.probability(record)
  File "C:\Development\SpamBayes\spambayes\spambayes\classifier.py", line 301, in probability
    assert hamcount <= nham
AssertionError

From woody23107 at yahoo.com Fri Apr 25 09:44:24 2003
From: woody23107 at yahoo.com (woody burns)
Date: Fri Apr 25 11:48:01 2003
Subject: [Spambayes] spambayes-1.0a2 crashes
Message-ID: <20030425154424.18923.qmail@web41704.mail.yahoo.com>

I installed spambayes-1.0a2 6 weeks ago on a Linux machine. It installed fine and was easily trained. It has been working great until yesterday. I received the attached email message, and it causes the filter to crash, giving a traceback of the last several steps it performed. The same message also crashes the trainer. Did someone find a way to discourage spambayes? I hope not; it is a great time saver.
Thanks, woody23107@yahoo.com __________________________________________________ Do you Yahoo!? The New Yahoo! Search - Faster. Easier. Bingo http://search.yahoo.com -------------- next part -------------- Received: from cvncvlcom.com ([211.104.129.150]) by xxxxxxxx ; Thu, 24 Apr 2003 14:02:13 -0500 Received: from xmxpita.mx.aol.com ([223.38.72.195]) by broome1.hotmail.com with Microsoft SMTPSVC id 235C4F74; Wed, 23 Apr 2003 18:57:30 -0000 Received: from esmtp.mail.lycos.com ([40.160.190.204]) by singularity.google.com with SMTP id 28EEF4CF; Wed, 23 Apr 2003 18:57:20 -0000 Received: from abyss.excite.com ([16.209.175.205]) by maelstrom.linksynergy.com with ESMTP id 1ADAF584; Wed, 23 Apr 2003 18:57:10 -0000 Date: Thu, 24 Apr 2003 03:57:10 +0900 From: ¾ÈÀü!! To: hotmail.com Subject: =?ks_c_5601-1988?B?KLG ksO0pIDG?==?EUC-KR?B?5fzg9b7IwPwg4+a788ewIL7Is7sgvLG5sLW1?= OK ^^ MIME-Version: 1.0 Content-Type: text/html; charset="iso-8859-1" Message-ID: <105121093401@xxxxxxxxx> Return-Path: X-Rcpt-To: X-DPOP: Version number supressed X-UIDL: 1051210959.230669 Status: U
[HTML body of the attached message; it referenced the images q.gif, s.gif and d.gif]

From tim at fourstonesExpressions.com Fri Apr 25 12:48:48 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Fri Apr 25 12:48:55 2003 Subject: [Spambayes] spambayes-1.0a2 crashe Message-ID: 4/25/2003 10:44:24 AM, woody burns Message-ID: "Meyer, Tony" writes: >> > could you run imapfilter with the option "-i4"? > [...] >> ...so I set up some test "inbox, spam, unsure" folders and >> tried to run: >> >> python imapfilter.py -v -i 5 -t -c -D ~/bayes.db >> >> Just so we'd have some data to look at. I got a traceback >> while training on ham; the tail of the session is visible here: >> http://users.rcn.com/abrahams/imap/bugout.txt > > This is odd. The message that it crashes on doesn't have any crlf's > (even just a cr or lf) separating the headers. The ones above it do, > though. > > Does your mailer show this message correctly? Believe it or not, yes. My mailer (Oort GNUs 0.18) also doesn't seem to be having any problems with IMAP protocols. Maybe youse guys should be reading elisp code to see how to handle things ;-) > If the mailer has an option to show the message (or even the > headers) in it's original form, does it show it correctly? The header part shows as one long string with no newlines, just as you said. I'm guessing (pure conjecture) that header parsing works this way: 1. search for a colon 2. search backwards through lower-case letters. 3. If you're looking at a capital letter, this is a "best beginning of a header" and the colon is the end if the previous character is '-', back up one character and go to step 2. The last "best beginning of a header" is the beginning of the header. > I'd understand it more if it was imaplib or our imapfilter that was > stripping the cr/lf chars, but from the trace, it looks like the raw > text from the server doesn't have them - and the length of the message > (with no cr/lf) matches the literal length (4278). > > I'm baffled. 
It would be easy to catch this error, but I don't know > what to do then - if the headers can't be parsed, then when we try and > rewrite the message into the correct folder all the headers will be lost > (again :( ). Somehow, GNUs is getting it right. > I could add in a test so that training continues and messages like these > are just ignored, but I can't think of anything else to try (well, > except for writing a script to try and insert line endings where there > should be line endings). Anyone else got any ideas? Check out the emacs source, I guess... -- Dave Abrahams Boost Consulting www.boost-consulting.com From noreply at sourceforge.net Fri Apr 25 23:03:04 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Sat Apr 26 01:03:12 2003 Subject: [Spambayes] [ spambayes-Bugs-725449 ] Addin won't initialize Message-ID: Bugs item #725449, was opened at 2003-04-22 04:41 Message generated for change (Comment added) made by usertgo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: dan maer (dmara) Assigned to: Mark Hammond (mhammond) Summary: Addin won't initialize Initial Comment: Running pure Windows (98SE) and Outlook 2000. No Python installed. I can't get the plugin to initialize. I go to Tools/options/other/advanced options/Com Addins and check the SpamBayes plug-in box, but it won't stayed checked, and I've uninstalled/reinstalled Outlook and the plugin but no go. I've got this working great, really great, on Winnt 4.0 and Xp. Logfile being attached for upload... Dan ---------------------------------------------------------------------- Comment By: J (usertgo) Date: 2003-04-26 05:03 Message: Logged In: YES user_id=763953 ok, since i liked it so much on win2000 i did the python install & manual install of the spambayes outlook addin & its working good now, so i guess it was the installer. 
thanks ---------------------------------------------------------------------- Comment By: J (usertgo) Date: 2003-04-24 05:10 Message: Logged In: YES user_id=763953 I also get the same errors as in the logfile on a Win98SE installation w/Outlook 2000 SP1, but it works on Win2000 w/same Outlook. If you have any workarounds (w/regedit?) please let us know... ---------------------------------------------------------------------- Comment By: dan maer (dmara) Date: 2003-04-23 03:06 Message: Logged In: YES user_id=759684 Ok Mark... Anyway to bypass the installer issue and get it working by manual means? Dan ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-22 23:19 Message: Logged In: YES user_id=14198 This is an issue with the "Installer" tool I use. I will try and sus it out before the next binary release. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 From tim_one at email.msn.com Sat Apr 26 11:09:28 2003 From: tim_one at email.msn.com (Tim Peters) Date: Sat Apr 26 10:10:22 2003 Subject: [Spambayes] spamprob combining In-Reply-To: Message-ID: [Jim Bublitz] > Since my last msg was incomprehensible, Not at all -- it's just that nobody knew what parts of it meant . < I'm just going to attach my code at the bottom and refer to it. > > Graham's original score calculation - > > product/(product + inverseProducct) > > does give the kind of score distribution you described. That's not a matter of speculation, we observed that endlessly over the first few weeks of this project's life, and the msg you're replying to showed that it's true even when fed random data. > If you substitute Gary Robinson's suggestion (see below - last few > lines), the score distribution does spread out to the center a little > bit. It spread out enormously for us -- it was night-and-day difference. 
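For readers following the two combining rules being compared here, a minimal sketch using only the formulas quoted in this thread: Graham's product/(product + inverseProduct) and the Robinson-style P, Q, S calculation from Jim's code. The function names are illustrative, and the sketch assumes every probability lies strictly between 0 and 1:

```python
def _products(probs):
    """Return (product of p, product of 1 - p) over the clue probabilities."""
    product = inverse = 1.0
    for p in probs:
        product *= p
        inverse *= 1.0 - p
    return product, inverse


def graham_score(probs):
    """Graham combining: product / (product + inverse product)."""
    product, inverse = _products(probs)
    return product / (product + inverse)


def robinson_score(probs):
    """Robinson-style combining as written in the quoted code."""
    n = float(len(probs))
    product, inverse = _products(probs)
    P = 1.0 - inverse ** (1.0 / n)
    Q = 1.0 - product ** (1.0 / n)
    return (1.0 + (P - Q) / (P + Q)) / 2.0
```

For the clues [0.9, 0.9, 0.9, 0.1] the Graham score comes out close to 1, while the Robinson score sits much nearer the middle, which is the spreading-out effect described above.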
> You can get Robinson's scoring calculation (as below) to produce a > normal distribution around the mean ham or spam score if you > either: > > a. Increase VECTOR_SIZE (max_discriminators??) - a value > of around 100 seems to do pretty well We use 150 by default. The distributions look "kinda normal", but tighter than normal toward the endpoints, and looser toward the middle. > b. Instead of selecting the most extreme N word probabilities > from the msg being tested, select the words randomly from the list > of words in the msg (not shown in code below). You immediately > (VECTOR_SIZE = 15) get a normal distribution around the means, but > accuracy sucks until you select 75 to 100 words/msg randomly. Hmm. I'm not *that* much in love with a normal distribution . > Neither (a) nor (b) works as well as the 15 most extreme words on > my test data. Also, Robinson's calculation doesn't produce ham at > 0.99 or spam at 0.01 - in fact the msgs that I had a hard time > classifying manually are (mostly) the ones that fall near the > cutoff. Right, we call that "the middle ground" here. It's believed to be useful, although everyone seems to ignore it . > Note also that the code below will produce an unpredictable score > if the msg contains only 30 .01 words and 30 .99 words. That can't happen for us: I believe you're still using Graham's contrived clamping of probabilities into [.01, .99]. We have no limits; if a word appears only in spam, we give it a raw spamprob of 1; if only in ham, a raw spamprob of 0; and then rely on Gary's probability adjustment to soften those extremes in accord with how *many* messages they've been seen in. One nice result is that we don't suffer "cancellation disease" anymore (a factor often implicated in bad errors when we were using Paul's scheme, where up to hundreds each of .01 and .99 words fought to produce a useless result). > It depends on how pairs.sort (...) handles ties. 
Under Python 2.3 (CVS) the sort is stable; under previous Pythons, it's stable "by (reliable) accident" so long as there are no more than 100 elements in the list; if more than that, it's unpredictable. > Making the limits asymmetrical (eg .989 and .01 instead of .99/.01) > doesn't seem to work very well. Dump the limits entirely -- we did, and we're happier . > The other thing that helps make the scores extreme in actual use is > that the distribution of word probabilities is extreme. For my > corpora using the code below I get 169378 unique tokens (from 24000 > msgs, 50% spam): > > Probability Number of Tokens % of Total > [0.00, 0.01) 46329 27.4% (never in spam) > [0.99, 1.00) 104367 61.7% (never in ham) > ----- > 89.1% This gets less problematic if you dump the artificial limits. A word that appears only in spam gets a higher spamprob the more spams it appears in then, and so pushes out other unique-to-one-kind words that haven't appeared as often. > From looking at failures (and assuming passes behave similarly) the > 10.9% (~17000 tokens) in between 0.01 and 0.99 still do a lot of the > work, which makes sense, since those are the most commonly used > words. We've gotten better results under all schemes by pretending that words with spamprobs in (0.4, 0.6) simply don't exist. > My experience has been that the tail tips of the score distribution > maintain about the same distance from the mean score no matter what > you do. If you improve the shape of the distribution (make it look > more normal), you move the tails about the same distance as the > distribution has spread out, and the ham and spam tails overlap > more and more, increasing the fp/fn rates. The little testing I did > on Spambayes (last week's CVS) seemed to show the same effect. That jibes with my experience too. 
"The problem" in my data, though, is that some ham simply have nothing hammish about them, unless viewed in the light of semantic knowledge this kind of scheme doesn't have; likewise the only thing spammish about some of the spam is that people here have decided to call them spam . > For the code below, if I train on 8000 msgs (50% spam) and then > test 200, retrain on those 200, and repeat for 16000 msgs, I get 4 > fns (3 are identical msgs from the same sender with different dates, > all are Klez msgs) and 1 fp (an ISP msg "Routine Service > Maintenance"), which are fn and fp rates of 0.05% and 0.01%. The > failures all scored in the range [0.495, 0.511] (cuttoff at 0.50) Whatever you're doing is certainly working well for you! > I ran the the SA Corpus What is this? A SpamAssassin corpus? Nobody has mentioned that here before. > today also and don't get any failures if I train on 8K of my msgs and > 50/100 of their msgs (worse results under other conditions), but the > sample sizes there are too small to do an adequete training sample and > have enough test data to have confidence in the results. I can post ? those results if anyone is interested. > > Graham's method was basically designed to produce extreme scores, > and the distribution of words in the data seems to reinforce that. > > If it's of any use to anybody (it's certainly beyond me), both the > distribution of msg scores and distribution of word probabilities > look like exponential or Weibull distributions. (They're "bathtub" > curves, if anyone is familiar with reliability statistics). > > This is all based on my data, which is not the same as your data. > YMMV. 
>
> Jim
>
>
> # classes posted to c.l.p by Erik Max Francis
> # algorithm from Paul Graham ("A Plan for Spam")
>
> # was TOKEN_RE = re.compile(r"[a-zA-Z0-9'$_-]+")
> # changed to catch Asian charsets
> TOKEN_RE = re.compile(r"[\w'$_-]+", re.U)
> FREQUENCY_THRESHHOLD = 1 # was 5
> GOOD_BIAS = 2.0
> BAD_BIAS = 1.0
> # changed to improve distribution 'width' because
> # of smaller token count in training data
> GOOD_PROB = 0.0001 # was 0.01
> BAD_PROB = 0.9999 # was 0.99
> VECTOR_SIZE = 15
> UNKNOWN_PROB = 0.5 # was 0.4 or 0.2
>
> # remove mixed alphanumerics or strictly numeric:
> # eg: HM6116, 555N, 1234 (also Windows98, 133t, h4X0r)
> pn1_re = re.compile (r"[a-zA-Z]+[0-9]+")
> pn2_re = re.compile (r"[0-9]+[a-zA-Z]+")
> num_re = re.compile (r"^[0-9]+")
>
>
> class Corpus(dict):
>     # instantiate one training Corpus for spam, one for ham,
>     # and then one Corpus for each test msg as msgs are tested
>     # (the msg Corpus instance is destroyed after
>     # testing the msg)
>
>     def __init__(self, data=None):
>         dict.__init__(self)
>         self.count = 0
>         if data is not None:
>             self.process(data)
>
>     # process is used to extract tokens from msg,
>     # either in building the training sample or
>     # when testing a msg (can process entire msg
>     # or one part of msg at a time)
>     # 'data' is a string
>
>     def process(self, data):
>         tokens = TOKEN_RE.findall(str (data))
>         if not len (tokens): return
>
>         # added the first 'if' in the loop to reduce
>         # total # of tokens by >75%
>         deletes = 0
>         for token in tokens:
>             if (len (token) > 20)\
>             or (pn1_re.search (token) != None)\
>             or (pn2_re.search (token) != None)\
>             or (num_re.search (token) != None):
>                 deletes += 1
>                 continue
>
>             if self.has_key(token):
>                 self[token] += 1
>             else:
>                 self[token] = 1
>
>         # count tokens, not msgs
>         self.count += len (tokens) - deletes
>
>
> class Database(dict):
>     def __init__(self, good, bad):
>         dict.__init__(self)
>         self.build(good, bad)
>
>     # 'build' constructs the dict of token: probability
>     # run once after training from the ham/spam Corpus
>     # instances; the ham/spam Corpus instances can be
>     # destroyed (after saving?) after 'build' is run
>
>     def build(self, good, bad):
>         ngood = good.count
>         nbad = bad.count
>         # print ngood, nbad, float(nbad)/float(ngood)
>
>         for token in good.keys() + bad.keys(): # doubles up, but
>                                                # works
>             if not self.has_key(token):
>                 g = GOOD_BIAS*good.get(token, 0)
>                 b = BAD_BIAS*bad.get(token, 0)
>
>                 if g + b >= FREQUENCY_THRESHHOLD:
>                     # the 'min's are leftovers from counting
>                     # msgs instead of tokens for ngood, nbad
>                     goodMetric = min(1.0, g/ngood)
>                     badMetric = min(1.0, b/nbad)
>                     total = goodMetric + badMetric
>                     prob = max(GOOD_PROB,\
>                                min(BAD_PROB,badMetric/total))
>
>                     self[token] = prob
>
>     def scan(self, corpus):
>         pairs = [(token, self.get(token, UNKNOWN_PROB)) \
>                  for token in corpus.keys()]
>
>         pairs.sort(lambda x, y: cmp(abs(y[1] - 0.5), abs(x[1]\
>                    - 0.5)))
>         significant = pairs[:VECTOR_SIZE]
>
>         inverseProduct = product = 1.0
>         for token, prob in significant:
>             product *= prob
>             inverseProduct *= 1.0 - prob
>
>         # Graham scoring - was:
>         # return pairs, significant, product/(product +\
>         #     inverseProduct)
>         # 'pairs' and 'significant' added to assist data logging, evaluation
>
>         # Robinson scoring - don't know why, but this works great
>
>         n = float (len (significant)) # n could be < VECTOR_SIZE
>
>         # div by zero possible if no headers (and msg has no body)
>         try:
>             P = 1 - inverseProduct ** (1/n)
>             Q = 1 - product ** (1/n)
>             S = (1 + (P - Q)/(P + Q))/2
>         except:
>             S = 0.99
>
>         return pairs, significant, S
>
>

From tim_one at email.msn.com Sat Apr 26 11:35:49 2003
From: tim_one at email.msn.com (Tim Peters)
Date: Sat Apr 26 10:36:37 2003
Subject: [Spambayes] spamprob combining
In-Reply-To:
Message-ID:

Oops! Sorry about that. This was an ancient partial reply that appears to have escaped from my Drafts folder; I'm not sure how, but I didn't intend to send it.
> [Jim Bublitz] > > Since my last msg was incomprehensible, > > Not at all -- it's just that nobody knew what parts of it meant . > ... From neale at woozle.org Sat Apr 26 13:15:39 2003 From: neale at woozle.org (Neale Pickett) Date: Sat Apr 26 15:15:46 2003 Subject: [Spambayes] Training corrupts mbox files In-Reply-To: <20030424153833.GA63651@ainaz.pair.com> (David McLaughlin's message of "Thu, 24 Apr 2003 11:38:33 -0400") References: <20030424153833.GA63651@ainaz.pair.com> Message-ID: David McLaughlin writes: > Has anyone else had problems training spambayes on mbox files? Is > there anything else I should be doing to prevent spambayes from > rewriting the mbox file? It if helps, I can post a sample of the > before and after mbox files to a webpage for perusal. I wrote that code, and I have to confess I only tested it on a couple of mboxes. Go ahead and post your samples and I'll see if I can fix it. Neale From jm at jmason.org Sat Apr 26 17:18:47 2003 From: jm at jmason.org (Justin Mason) Date: Sat Apr 26 19:19:13 2003 Subject: [Spambayes] spamprob combining In-Reply-To: Message from "Tim Peters" of "Sat, 26 Apr 2003 10:09:28 EDT." Message-ID: <20030426231852.8766616F0D@jmason.org> Tim Peters said: > > I ran the the SA Corpus > > What is this? A SpamAssassin corpus? Nobody has mentioned that here > before. Um, pretty sure I did ;) http://SpamAssassin.org/publiccorpus/ --j. From tim_one at email.msn.com Sat Apr 26 20:44:13 2003 From: tim_one at email.msn.com (Tim Peters) Date: Sat Apr 26 19:45:05 2003 Subject: [Spambayes] spamprob combining In-Reply-To: <20030426231852.8766616F0D@jmason.org> Message-ID: [Tim] >> What is this? A SpamAssassin corpus? Nobody has mentioned that here >> before. [Justin Mason] > Um, pretty sure I did ;) > > http://SpamAssassin.org/publiccorpus/ Yes, you did. As I said later, this was a very old message. 
I was sorry it escaped, precisely because it failed to anticipate things that happened after it was written, causing you needless aggravation . From jbublitz at nwinternet.com Sat Apr 26 09:35:46 2003 From: jbublitz at nwinternet.com (Jim Bublitz) Date: Sat Apr 26 19:55:03 2003 Subject: [Spambayes] spamprob combining In-Reply-To: Message-ID: On 26-Apr-03 Tim Peters wrote: > Oops! Sorry about that. This was an ancient partial reply that > appears to have escaped from my Drafts folder; I'm not sure how, > but I didn't intend to send it. No problem. Seeing [Spambayes] in the subj line just made me think my mail system had gone totally berserk (since I unsubscribed quite a while ago). My spam filter, after working flawlessly until mid-Feb, did break (word db got wiped out ???) and it isn't retraining itself very well/very quickly. About 10% spam is getting through - pretty awful. I was hoping it would converge to its original performance, but it looks like I'm going to have to fix it one of these days. It does make me wonder if the original performance was a fluke (or if something is still broken). Jim From T.A.Meyer at massey.ac.nz Sun Apr 27 15:03:30 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sat Apr 26 22:04:05 2003 Subject: [Spambayes] Classifier Assertion error Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D02E@its-xchg4.massey.ac.nz> > This is happening with a new DB (and new > spambayes.messageinfo.db) trained with imapfilter: > > Loading database hammie.db... Loading state from hammie.db > database hammie.db is an existing database, with 96 spam and > 63 ham Done. Classifying Traceback (most recent call last): [...] > File "C:\Development\SpamBayes\spambayes\spambayes\classifier.py", > line 301, in probability > assert hamcount <= nham > AssertionError This is the same as SF #706520 ("assert fails in classifier"). If I understand things correctly, this probably means that the ham and spam counts in your database has become corrupted. 
This bug is one that we know really needs to get fixed, and people are at
least thinking about it, but it hasn't yet been solved.

I gather that what you can do is:
  (a) Retrain, if you have the data available
  (b) Use dbimpexp.py to convert your database to a text file.  Change
      the ham and spam counts at the top to something like the numbers
      they should be (at the very least, they need to be greater than or
      equal to the highest number in their column).  Use dbimpexp.py to
      convert the text file back into a database.

=Tony Meyer

From sjoerd at acm.org Sun Apr 27 12:55:17 2003
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Sun Apr 27 05:55:24 2003
Subject: [Spambayes] gratuitous changes? + bugs
Message-ID: <20030427095517.B2A81750BF@indus.ins.cwi.nl>

I did a cvs update today, and I clearly shouldn't have.  My programs
based on spambayes ceased working for a variety of reasons.

1. OptionClass.mergefiles was renamed to OptionClass.merge_files.
   Gratuitous change?  Or is there some deeper reason behind this?
   The change was not mentioned in the cvs log, by the way.

2. Boolean values in my .ini file (or at least the verbose flag) can't
   be parsed anymore.  The reason is, the check is_boolen is incorrect
   for Python 2.3.  It checks that the allowed_values is a bool, but in
   actual fact it's a tuple of bools.  See the diff.

3. My setting of basic_header_skip and safe_headers didn't work
   anymore.  The header names now have to be separated with a single
   space.  Why?  I propose the fix included in the diff.
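
To illustrate item 3, here is a minimal sketch (not taken from Options.py;
the header names and the variable names are made up for the example).
Splitting on a literal single space keeps an empty string for every extra
space between names, whereas a bare split() treats any run of whitespace as
one separator:

```python
# Hypothetical option value with two spaces between the first two names.
raw_value = "received  date x-from"

# Splitting on a literal ' ' yields an empty string for each extra space.
single_space = tuple(raw_value.split(' '))

# A bare split() collapses any run of whitespace into one separator.
any_whitespace = tuple(raw_value.split())

print(single_space)    # ('received', '', 'date', 'x-from')
print(any_whitespace)  # ('received', 'date', 'x-from')
```

The stray empty string is the kind of value that would then fail to match
any real header name.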
-- 
Sjoerd Mullender

Index: Options.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/spambayes/Options.py,v
retrieving revision 1.43
diff -u -r1.43 Options.py
--- Options.py	27 Apr 2003 05:11:40 -0000	1.43
+++ Options.py	27 Apr 2003 09:46:22 -0000
@@ -1183,7 +1183,8 @@
         # considered valid input (and 0 and 1 don't look as nice)
         # So, just for the 2.2 people, we have this helper function
         try:
-            if type(self.allowed_values) == types.BooleanType:
+            if type(self.allowed_values) == types.TupleType and \
+               type(self.allowed_values[0]) == types.BooleanType:
                 return True
             return False
         except AttributeError:
@@ -1354,7 +1355,7 @@
                                   (opt, sect, filename)
                     else:
                         if self.multiple_values_allowed(sect, opt):
-                            value = tuple(value.split(' '))
+                            value = tuple(value.split())
                         self.set(sect, opt, self.convert(sect, opt, value))
 
     # not strictly necessary, but convenient shortcuts to self._options

From lists at olivermaunder.co.uk Sun Apr 27 21:13:06 2003
From: lists at olivermaunder.co.uk (Oliver Maunder)
Date: Sun Apr 27 15:13:10 2003
Subject: [Spambayes] Classifier Assertion error
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D02E@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130150D02E@its-xchg4.massey.ac.nz>
Message-ID: <3EAC2BC2.5070807@olivermaunder.co.uk>

Meyer, Tony wrote:

>I gather that what you can do is:
>  (a) Retrain, if you have the data available
>
This is happening on a brand new DB, training with imapfilter. It's
happened on two separate PCs (home and work), but with similar setup
(Win XP, Python 2.2, pybsddb) and the same training data. I'll give it a
go under Linux when I get round to rebooting.

>  (b) Use dbimpexp.py to convert your database to a text file.  Change
>the ham and spam counts at the top to something like the numbers they
>should be
>
Tried that. It works. Not sure yet what further training will do to those
counts though.
Olly From jm at jmason.org Sun Apr 27 16:02:26 2003 From: jm at jmason.org (Justin Mason) Date: Sun Apr 27 18:02:43 2003 Subject: [Spambayes] spamprob combining In-Reply-To: Message from "Tim Peters" of "Sat, 26 Apr 2003 19:44:13 EDT." Message-ID: <20030427220231.71C9416F0D@jmason.org> Tim Peters said: > [Tim] > >> What is this? A SpamAssassin corpus? Nobody has mentioned that here > >> before. > [Justin Mason] > > Um, pretty sure I did ;) > > http://SpamAssassin.org/publiccorpus/ > > Yes, you did. As I said later, this was a very old message. I was sorry it > escaped, precisely because it failed to anticipate things that happened > after it was written, causing you needless aggravation . Yeah, sorry for the noise -- the old "should have read the rest of the thread before replying" conundrum ;) Damn time-travelling mail, --j. From T.A.Meyer at massey.ac.nz Mon Apr 28 11:05:13 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 27 18:05:55 2003 Subject: [Spambayes] gratuitous changes? + bugs Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D155@its-xchg4.massey.ac.nz> > I did a cvs update today, and I clearly shouldn't have. My > programs based on spambayes ceased working for a variety of reasons. Note that I have endeavoured to post lots of notification about the Options [proposed] changes to the list, and the convert script *should* be able to convert any old config files. I've tried to include as much backward compatibility code as possible, but there are always going to be issues when changing part of the code that's as widely used as Options.py. Getting buggy code is part of the joy of cvs ; along with reporting the bug and having it get fixed, of course. > 1. OptionClass.mergefiles was renamed to OptionClass.merge_files. > Gratuitous change? Or is there some deeper reason behind > this. The change was not mentioned in the cvs log, by the way. What do you have that calls mergefiles? 
None of the scripts in cvs do (well, a search for "mergefiles" doesn't bring anything up). It was renamed because mergefiles is not well named - it should be MergeFiles, mergeFiles or merge_files. It was renamed now because there are lots of other changes happening with Options.py, and it makes sense to fix everything at once. I suppose it should have been mentioned in the cvs log, sorry, but since I couldn't find anything that called it, I didn't rate it as worthy of specific inclusion. > 2. Boolean values in my .ini file (or at least the verbose flag) can't > be parsed anymore. The reason is, the check is_boolen is incorrect > for Python 2.3. It checks that the allowed_values is a bool, but in > actual fact it's a tuple of bools. See the diff. "globals":"verbose" should not be a tuple of bools, it should be a bool - it makes no sense for it to be a tuple. Options.py has the default as False - a single bool. What does your configuration file have? This could definitely be a bug (although it's not that likely that it's just for 2.3) - but the bug isn't in the check, it's in reading the value. If you can let me know the config line that's causing the problem, I'll try and figure a fix ASAP. > 3. My setting of basic_header_skip and safe_headers didn't work > anymore. The header names now have to be separated with a single > space. Why? I propose the fix included in the diff. Sorry, my mistake; you're absolutely correct. I'll check in the fix for this in a moment. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 28 11:13:28 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 27 18:14:03 2003 Subject: [Spambayes] Classifier Assertion error Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D15E@its-xchg4.massey.ac.nz> > This is happening on a brand new DB, training with imapfilter. It's > happened on two separate PCs (home and work), but with similar setup > (Win XP, Python 2.2, pybsddb) and the same training data. 
> I'll give it a go under Linux when I get round to rebooting.

Hopefully someone else will chime in here - I haven't really paid much
attention to this bug (other than noting that it needs to be fixed
before beta1) since I've never come across it myself.

> >   (b) Use dbimpexp.py to convert your database to a text
> > file.  Change the ham and spam counts at the top to something
> > like the numbers they should be
> Tried that. It works. Not sure yet what further training will do to
> those counts though.

Obviously it's not something you want to do all the time, anyway.  If
this happens regularly for you, it would be great if you were able to
track down where the problem is occurring - I think that that is part of
the problem - we're not sure what is causing the problem.

A question for TimP (or any of the other stats people): if the ham/spam
count does get lost, would setting them to the highest number of
occurrences in the db screw things up?  i.e. if my most spammy word
appeared in 423 emails, and my most hammy word appeared in 233 emails,
could I then set hamcount to 233 and spamcount to 423?

=Tony Meyer

From dave at boost-consulting.com Sun Apr 27 20:32:01 2003
From: dave at boost-consulting.com (David Abrahams)
Date: Sun Apr 27 19:32:39 2003
Subject: [Spambayes] big imapfilter.py problem
Message-ID:

%python imapfilter.py -t -v
Loading database hammie.db... Loading state from hammie.db
database hammie.db is a new database
Done.
Training
Training ham folder HamBox
Traceback (most recent call last):
  File "imapfilter.py", line 661, in ?
    run()
  File "imapfilter.py", line 647, in run
    imap_filter.Train()
  File "imapfilter.py", line 488, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 431, in Train
    for msg in self:
  File "imapfilter.py", line 383, in __iter__
    yield self[key]
  File "imapfilter.py", line 423, in __getitem__
    msg.setId(key)
  File "spambayes/message.py", line 173, in setId
    msginfoDB._getState(self)
  File "spambayes/message.py", line 118, in _getState
    (msg.c, msg.t) = self.db[msg.getId()]
  File "/usr/home/dave/src/email-2.5/email/Message.py", line 283, in __getitem__
    return self.get(name)
  File "/usr/home/dave/src/email-2.5/email/Message.py", line 349, in get
    name = name.lower()
AttributeError: 'int' object has no attribute 'lower'

What's happening here is in:

  File "spambayes/message.py", line 118, in _getState
    (msg.c, msg.t) = self.db[msg.getId()]

   self.db[msg.getID()]

is a email.Message object which doesn't support indexing by ints, and
the construct

    a, b = expr

basically evaluates expr and then tries to treat it as a sequence and
iterate over its elements, grabbing the first two and sticking them in
a and b.

I have no idea how or why this works for anyone else; it seems like a
straight-up bug to me, but maybe something went wrong long before that
I'm unable to diagnose.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From T.A.Meyer at massey.ac.nz Mon Apr 28 12:48:38 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Apr 27 19:49:20 2003
Subject: [Spambayes] big imapfilter.py problem
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D220@its-xchg4.massey.ac.nz>

> What's happening here is in:
>
>   File "spambayes/message.py", line 118, in _getState
>     (msg.c, msg.t) = self.db[msg.getId()]
>
>    self.db[msg.getID()]
>
> is a email.Message object
[...]
TimS might have a better idea, but I *think* that this is a result of the message db being fixed ;) The db used to store message objects, and now stores a tuple of the classification and training information. So self.db[msg.getID()] should return a tuple, *not* an email object. Delete (or rename) your message database and this should work. (This won't effect your classification database, just the memory of which messages have been classified/trained, which was inaccurate anyway). Apologies for this - we should have realised that old (message) databases would become invalid and need to be removed and posted a message to that effect. =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 28 13:30:44 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Sun Apr 27 20:31:19 2003 Subject: [Spambayes] gratuitous changes? + bugs Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D263@its-xchg4.massey.ac.nz> [Tony] > What do you have that calls mergefiles? None of the scripts in cvs do > (well, a search for "mergefiles" doesn't bring anything up). Well, so much for relying on XP's search. HammieFilter does call mergefiles. I'm about to check in a fix for this. Apologies that I didn't find it earlier. (This is an ugly way to do things, though; HammieFilter shouldn't be calling mergefiles, that's what the Options modules is for...) =Tony Meyer From dave at boost-consulting.com Sun Apr 27 22:20:30 2003 From: dave at boost-consulting.com (David Abrahams) Date: Sun Apr 27 21:21:07 2003 Subject: [Spambayes] big imapfilter.py problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D220@its-xchg4.massey.ac.nz> (Tony Meyer's message of "Mon, 28 Apr 2003 11:48:38 +1200") References: <1ED4ECF91CDED24C8D012BCF2B034F130150D220@its-xchg4.massey.ac.nz> Message-ID: "Meyer, Tony" writes: >> What's happening here is in: >> >> File "spambayes/message.py", line 118, in _getState >> (msg.c, msg.t) = self.db[msg.getId()] >> >> self.db[msg.getID()] >> >> is a email.Message object > [...] 
> > TimS might have a better idea, but I *think* that this is a result of > the message db being fixed ;) The db used to store message objects, and > now stores a tuple of the classification and training information. So > self.db[msg.getID()] should return a tuple, *not* an email object. > > Delete (or rename) your message database and this should work. (This > won't effect your classification database, just the memory of which > messages have been classified/trained, which was inaccurate anyway). What's a message database, and where do I find it? OK, I presume spambayes.messageinfo.db is it. > Apologies for this - we should have realised that old (message) > databases would become invalid and need to be removed and posted a > message to that effect. No problem; just let me know if I've got the details about how to get out of it right. -- Dave Abrahams Boost Consulting www.boost-consulting.com From dave at boost-consulting.com Sun Apr 27 22:59:46 2003 From: dave at boost-consulting.com (David Abrahams) Date: Sun Apr 27 22:00:21 2003 Subject: [Spambayes] big imapfilter.py problem In-Reply-To: (David Abrahams's message of "Sun, 27 Apr 2003 21:20:30 -0400") References: <1ED4ECF91CDED24C8D012BCF2B034F130150D220@its-xchg4.massey.ac.nz> Message-ID: David Abrahams writes: > "Meyer, Tony" writes: > >>> What's happening here is in: >>> >>> File "spambayes/message.py", line 118, in _getState >>> (msg.c, msg.t) = self.db[msg.getId()] >>> >>> self.db[msg.getID()] >>> >>> is a email.Message object >> [...] >> >> TimS might have a better idea, but I *think* that this is a result of >> the message db being fixed ;) The db used to store message objects, and >> now stores a tuple of the classification and training information. So >> self.db[msg.getID()] should return a tuple, *not* an email object. >> >> Delete (or rename) your message database and this should work. 
(This
>> won't effect your classification database, just the memory of which
>> messages have been classified/trained, which was inaccurate anyway).
>
> What's a message database, and where do I find it?
> OK, I presume spambayes.messageinfo.db is it.
>
>> Apologies for this - we should have realised that old (message)
>> databases would become invalid and need to be removed and posted a
>> message to that effect.
>
> No problem; just let me know if I've got the details about how to get
> out of it right.

Okay, training worked, but:

%python imapfilter.py -c
Traceback (most recent call last):
  File "imapfilter.py", line 661, in ?
    run()
  File "imapfilter.py", line 651, in run
    imap_filter.Filter()
  File "imapfilter.py", line 526, in Filter
    self.unsure_folder)
  File "imapfilter.py", line 451, in Filter
    evidence=True)
  File "./spambayes/classifier.py", line 217, in chi2_spamprob
    clues = self._getclues(wordstream)
  File "./spambayes/classifier.py", line 441, in _getclues
    prob = self.probability(record)
  File "./spambayes/classifier.py", line 301, in probability
    assert hamcount <= nham
AssertionError
%

What's the problem now?

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From dave at boost-consulting.com Sun Apr 27 23:10:29 2003
From: dave at boost-consulting.com (David Abrahams)
Date: Sun Apr 27 22:11:05 2003
Subject: [Spambayes] big imapfilter.py problem
In-Reply-To: (David Abrahams's message of "Sun, 27 Apr 2003 21:59:46 -0400")
References: <1ED4ECF91CDED24C8D012BCF2B034F130150D220@its-xchg4.massey.ac.nz>
Message-ID:

David Abrahams writes:

> Okay, training worked, but:
>
> %python imapfilter.py -c
> Traceback (most recent call last):
>   File "imapfilter.py", line 661, in ?
>     run()
>   File "imapfilter.py", line 651, in run
>     imap_filter.Filter()
>   File "imapfilter.py", line 526, in Filter
>     self.unsure_folder)
>   File "imapfilter.py", line 451, in Filter
>     evidence=True)
>   File "./spambayes/classifier.py", line 217, in chi2_spamprob
>     clues = self._getclues(wordstream)
>   File "./spambayes/classifier.py", line 441, in _getclues
>     prob = self.probability(record)
>   File "./spambayes/classifier.py", line 301, in probability
>     assert hamcount <= nham
> AssertionError
> %
>
> What's the problem now?

Furthermore:

%python imapfilter.py -t -v -c
Loading database hammie.db... Loading state from hammie.db
database hammie.db is an existing database, with 454 spam and 1242 ham
Done.
Training
Training ham folder HamBox
418 trained.
Training spam folder SpamBox
418 trained.
Persisting hammie.db state in database
Training took 118.465536952 seconds, 836 messages were trained
Classifying
Traceback (most recent call last):
  File "imapfilter.py", line 661, in ?
    run()
  File "imapfilter.py", line 651, in run
    imap_filter.Filter()
  File "imapfilter.py", line 518, in Filter
    imap.SelectFolder(self.spam_folder)
  File "imapfilter.py", line 221, in SelectFolder
    if self.current_folder != folder:
  File "imapfilter.py", line 372, in __cmp__
    return cmp(self.name, obj.name)
AttributeError: 'str' object has no attribute 'name'%
%rm *.db
%python imapfilter.py -t -v -c
Loading database hammie.db... Loading state from hammie.db
database hammie.db is a new database
Done.
Training
Training ham folder HamBox
1660 trained.
Training spam folder SpamBox
454 trained.
Persisting hammie.db state in database
Training took 136.448132992 seconds, 2114 messages were trained
Classifying
Traceback (most recent call last):
  File "imapfilter.py", line 661, in ?
    run()
  File "imapfilter.py", line 651, in run
    imap_filter.Filter()
  File "imapfilter.py", line 518, in Filter
    imap.SelectFolder(self.spam_folder)
  File "imapfilter.py", line 221, in SelectFolder
    if self.current_folder != folder:
  File "imapfilter.py", line 372, in __cmp__
    return cmp(self.name, obj.name)
AttributeError: 'str' object has no attribute 'name'
%python imapfilter.py -t -v -c -D bayes.db
Loading database bayes.db... Loading state from bayes.db
database bayes.db is a new database
Done.
Training
Training ham folder HamBox
Traceback (most recent call last):
  File "imapfilter.py", line 661, in ?
    run()
  File "imapfilter.py", line 647, in run
    imap_filter.Train()
  File "imapfilter.py", line 488, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 433, in Train
    classifier.unlearn(msg.asTokens(), not isSpam)
  File "./spambayes/classifier.py", line 277, in unlearn
    self._remove_msg(wordstream, is_spam)
  File "./spambayes/classifier.py", line 408, in _remove_msg
    raise ValueError("spam count would go negative!")
ValueError: spam count would go negative!
%

different-every-time-ly y'rs.

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From T.A.Meyer at massey.ac.nz Mon Apr 28 16:53:47 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Apr 27 23:54:31 2003
Subject: [Spambayes] big imapfilter.py problem
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D3A9@its-xchg4.massey.ac.nz>

> What's a message database, and where do I find it?
> OK, I presume spambayes.messageinfo.db is it.

Sorry, yes.  I've also confirmed with Tim that this is the cause.

> Okay, training worked, but:
>   File "./spambayes/classifier.py", line 301, in probability
>     assert hamcount <= nham
> AssertionError
> %
> What's the problem now?

Hmm - interesting.  This is the same problem as Olly was having.  This
means that the ham/spam count in the spambayes database (hammie.db,
probably) is incorrect.  My message to him had the two solutions.
It's pushing the realms of coincidence a little to say that this has nothing to do with imapfilter. I'll try and see what's causing the count to go wrong. > File "imapfilter.py", line 372, in __cmp__ > return cmp(self.name, obj.name) > AttributeError: 'str' object has no attribute 'name'% I should be able to find this easily enough. Odd that it's not always coming up. > File "./spambayes/classifier.py", line 408, in _remove_msg > raise ValueError("spam count would go negative!") > ValueError: spam count would go negative! I suspect this is related to the first one. If I had to guess without looking at the code, I'd say that the untraining code is being activated when it shouldn't be. This would cause this one and the ham/spam count to be out. > different-every-time-ly y'rs. And just when I thought maybe things were going right... =Tony Meyer From T.A.Meyer at massey.ac.nz Mon Apr 28 18:52:26 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 28 01:53:01 2003 Subject: [Spambayes] big imapfilter.py problem Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D42A@its-xchg4.massey.ac.nz> > > File "./spambayes/classifier.py", line 301, in probability > > assert hamcount <= nham > > AssertionError > > What's the problem now? (Excuse the verbosity of the answer). I've established what the problem here is, I believe. The RFC says that a message's UID must be unique within the mailbox. I was not thinking clearly enough when I read that to realise that in the context of the RFC, a mailbox is a folder, not the whole collection of folders. This means that to get a stable, unique identifier for a message isn't possible - an unstable unique identifier could be obtained by combining the UID and the folder's UID validity value, but that isn't guaranteed to stay the same over different sessions. This explains why not everyone sees this. 
One of the servers I use has unique numbers for any message (judging from what I've seen) - I'm not sure about the other, but it might also - it's up to the server to decide how the UIDs are allocated. On the other hand, if you try to untrain the wrong message, you'll get lots of ham/spam count errors. So I'm going to give up using the UID (in any form) as an identifier for messages, and do what all the other spambayes apps (bar Outlook) do and add my own. I'll store this with the message whenever it's saved. This will mean things will be a little slower (have to search for messages with a header with a certain value, instead of for a message with a particular uid), but slow and working is better than fast and not. I'll have to rework a reasonable chunk of imapfilter to do this. It will also mean that the message info databases (spambayes.messageinfo.db, probably) will be invalid (the ids will be changing), although they should still work. Any training done with imapfilter is now suspect, so I wouldn't advise keeping hold of those db's (hammie.db etc). I'll try and do this ASAP, but real life is keeping me a bit busy over the next couple of days. In other notes, I've found the problem that was causing the __cmp__ error and fixed it (I committed another error at the same time, but I'll check in a fix for that soon). I've also found a place where things are much slower than they need to be, so performance gains are still easily possible. =Tony Meyer From anthony at interlink.com.au Mon Apr 28 16:55:41 2003 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon Apr 28 01:58:22 2003 Subject: [Spambayes] big imapfilter.py problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D42A@its-xchg4.massey.ac.nz> Message-ID: <200304280555.h3S5tg529734@localhost.localdomain> >>> "Meyer, Tony" wrote > So I'm going to give up using the UID (in any form) as an identifier for > messages, and do what all the other spambayes apps (bar Outlook) do and > add my own. 
I'll store this with the message whenever it's saved. This > will mean things will be a little slower (have to search for messages > with a header with a certain value, instead of for a message with a > particular uid), but slow and working is better than fast and not. Note that a number of IMAP servers out there support caching of headers. I know we locally configured cyrus to cache the headers that we care about at work. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From sjoerd at acm.org Mon Apr 28 10:46:21 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Mon Apr 28 03:46:31 2003 Subject: [Spambayes] gratuitous changes? + bugs In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D155@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130150D155@its-xchg4.massey.ac.nz> Message-ID: <20030428074621.93E7874230@indus.ins.cwi.nl> On Mon, Apr 28 2003 "Meyer, Tony" wrote: > > I did a cvs update today, and I clearly shouldn't have. My > > programs based on spambayes ceased working for a variety of reasons. > > Note that I have endeavoured to post lots of notification about the > Options [proposed] changes to the list, and the convert script *should* > be able to convert any old config files. I've tried to include as much > backward compatibility code as possible, but there are always going to > be issues when changing part of the code that's as widely used as > Options.py. Getting buggy code is part of the joy of cvs ; along > with reporting the bug and having it get fixed, of course. I guess it's my fault for not reading every message in the spambayes group. It's just too much! :-) > > 1. OptionClass.mergefiles was renamed to OptionClass.merge_files. > > Gratuitous change? Or is there some deeper reason behind > > this. The change was not mentioned in the cvs log, by the way. > > What do you have that calls mergefiles? None of the scripts in cvs do > (well, a search for "mergefiles" doesn't bring anything up). 
It was > renamed because mergefiles is not well named - it should be MergeFiles, > mergeFiles or merge_files. It was renamed now because there are lots of > other changes happening with Options.py, and it makes sense to fix > everything at once. I suppose it should have been mentioned in the cvs > log, sorry, but since I couldn't find anything that called it, I didn't > rate it as worthy of specific inclusion. It's one of my own scripts. None of the available scripts does what I need, so I wrote my own. And in order to keep my scripts independent of any spambayes testing that I was doing at the time, I used mergefiles to merge in my own settings. > > 2. Boolean values in my .ini file (or at least the verbose flag) can't > > be parsed anymore. The reason is, the check is_boolen is incorrect > > for Python 2.3. It checks that the allowed_values is a bool, but > in > > actual fact it's a tuple of bools. See the diff. > > "globals":"verbose" should not be a tuple of bools, it should be a bool > - it makes no sense for it to be a tuple. Options.py has the default as > False - a single bool. What does your configuration file have? This > could definitely be a bug (although it's not that likely that it's just > for 2.3) - but the bug isn't in the check, it's in reading the value. > If you can let me know the config line that's causing the problem, I'll > try and figure a fix ASAP. The problem occured with a setting [globals] verbose: False If you look at the diff you'll see that I changed the test touching self.allowed_values. That is *allowed* values. Which for verbose is (False, True), i.e. a tuple. Also look at the except part of the try-except where the test is for just this tuple. That's also why it fails on Python 2.3 but succeeds on 2.2. 2.2 doesn't have the bool type (types.BooleanType) so you get an AttributeError in the try part. 
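
The difference between the two checks can be sketched in present-day Python.
This is an illustration only: the function names old_check and new_check are
made up, plain bool and tuple stand in for the types.BooleanType and
types.TupleType used in the diff, and the length guard is an addition that
the diff does not have.

```python
def old_check(allowed_values):
    # Mirrors the original test: is allowed_values itself a bool?
    return type(allowed_values) == bool


def new_check(allowed_values):
    # Mirrors the proposed test: a tuple whose first element is a bool.
    # The length guard is an extra safety step, not part of the diff.
    return (type(allowed_values) == tuple
            and len(allowed_values) > 0
            and type(allowed_values[0]) == bool)


# The allowed values for "verbose", as described in the message above.
allowed = (False, True)

print(old_check(allowed))  # False
print(new_check(allowed))  # True
```

With the old test, a boolean option on 2.3 is never recognised as boolean,
because the thing being inspected is the tuple of permitted values and not
the option's value itself.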
-- Sjoerd Mullender From lists at olivermaunder.co.uk Mon Apr 28 10:31:20 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Mon Apr 28 04:31:03 2003 Subject: [Spambayes] big imapfilter.py problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D42A@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130150D42A@its-xchg4.massey.ac.nz> Message-ID: <3EACE6D8.9050303@olivermaunder.co.uk> Meyer, Tony wrote: >So I'm going to give up using the UID (in any form) as an identifier for >messages, and do what all the other spambayes apps (bar Outlook) do and >add my own. I'll store this with the message whenever it's saved. This >will mean things will be a little slower (have to search for messages >with a header with a certain value, instead of for a message with a >particular uid), but slow and working is better than fast and not. > > Performance is fine for me at the moment. e.g. this morning 60 messages were classified in 40s. I'm intending to have imapfilter running full time in the background once everything's working, so it will just need to do 2 or 3 messages every 10 minutes. So, performance isn't a problem. >Any training done with >imapfilter is now suspect, so I wouldn't advise keeping hold of those >db's (hammie.db etc). > Again - not a problem. The way things stand I'm having to delete/retrain/edit the DB every couple of days anyway. BTW, the IMAP server I'm using is Cyrus v1.6.24. Could be useful to know. 
Olly From noreply at sourceforge.net Mon Apr 28 06:38:59 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 28 08:39:03 2003 Subject: [Spambayes] [ spambayes-Bugs-728886 ] In the pop3 UI not able to pass more than 1 server Message-ID: Bugs item #728886, was opened at 2003-04-28 12:38 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=728886&group_id=61702 Category: None Group: None Status: Open Resolution: None Priority: 5 Submitted By: Remi Ricard (papadoc) Assigned to: Nobody/Anonymous (nobody) Summary: In the pop3 UI not able to pass more than 1 server Initial Comment: Hi, Using the Web UI to set the options for pop3proxy does not work. For Servers: I pass the string "mail.isp1.ca,pop.isp2.ca,pop.isp3.net" (without the " " ) In the file UserInterface.py before line 576 value = mail.isp1.ca,pop.isp2.ca,pop.isp3.net but after line 577 value = (mail.isp1.ca,pop.isp2.ca,pop.isp3.net,) Then you get this error message when you try to save # 'mail.gmc.isp1.ca,pop.isp2.ca,pop.isp3.net' is not a value valid for [POP3 Proxy Options] Servers # '6110,6111, 6112' is not a value valid for [POP3 Proxy Options] Ports Remi papaDoc@videotron.ca ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=728886&group_id=61702 From lists at olivermaunder.co.uk Mon Apr 28 15:58:45 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Mon Apr 28 09:58:36 2003 Subject: [Spambayes] imapfilter repeat problem Message-ID: <3EAD3395.6030904@olivermaunder.co.uk> Just been running imapfilter with the -l flag to get it to periodically classify my inbox. 
The second round of classifying caused an error: "imaplib.error: command LOGIN illegal in state LOGOUT" Clearly, logging out after the initial round of classifying has left the imap object in the LOGOUT state, and it doesn't want you to log in when it's in that state. Personally I think that's probably an error in imaplib (what else are you going to do with a logged out object other than log in again?) . In the meantime there's a simple fix - move the line where the IMAPSession is created inside the while loop, so a new IMAPSession is created on each pass. e.g. - line 633, imapfilter.py while True: imap = IMAPSession(server, port, imapDebug, doExpunge) imap.login(username, pwd) ... I still don't know much about python. Will the old IMAPSession object will be cleaned up properly if I do this? Olly From tim at fourstonesExpressions.com Mon Apr 28 10:03:47 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Mon Apr 28 10:20:21 2003 Subject: [Spambayes] imapfilter repeat proble In-Reply-To: <3EAD3395.6030904@olivermaunder.co.uk Message-ID: 4/28/2003 9:46:45 AM, Oliver Maunder Just been running imapfilter with the -l flag to get it to periodically >classify my inbox. The second round of classifying caused an error: > > "imaplib.error: command LOGIN illegal in state LOGOUT" > >Clearly, logging out after the initial round of classifying has left the >imap object in the LOGOUT state, and it doesn't want you to log in when >it's in that state. Personally I think that's probably an error in >imaplib (what else are you going to do with a logged out object other >than log in again?) . > >In the meantime there's a simple fix - move the line where the >IMAPSession is created inside the while loop, so a new IMAPSession is >created on each pass. >e.g. - line 633, imapfilter.py > while True: > imap = IMAPSession(server, port, imapDebug, doExpunge) > imap.login(username, pwd) > ... > >I still don't know much about python. 
Will the old IMAPSession object >will be cleaned up properly if I do this? It should be... I'll make that change. It makes sense anyway, because we shouldn't keep a socket tied up during a sleep time. > >Olly > > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. From david at dsmcl.net Mon Apr 28 15:31:58 2003 From: david at dsmcl.net (David McLaughlin) Date: Mon Apr 28 14:32:01 2003 Subject: [Spambayes] Training corrupts mbox files In-Reply-To: References: <20030424153833.GA63651@ainaz.pair.com> Message-ID: <20030428183158.GA72269@ainaz.pair.com> Thanks for taking a look at it! I have put a sample before and after mbox at the following location: ftp://ftp.dsmcl.net/spambayes_samplembox.tgz It looks like it may be duplicating some lines in the header, and adding an extra line break, which generates "extra" bogus mail messages. 
As an example, here is the original subset of headers: >>Begin >From ebyfi587wxi@seductive.com Mon Apr 28 15:36:22 2003 Return-Path: Delivered-To: dogwood-dogwoodproductions:com-rayn@dogwoodproductions.com >From ebyfi587wxi@seductive.com Mon Apr 28 15:36:22 2003 Return-Path: Delivered-To: dogwood-dogwoodproductions:com-rayn@dogwoodproductions.com X-Envelope-To: rayn@dogwoodproductions.com <>Begin >From ebyfi587wxi@seductive.com Mon Apr 28 15:36:22 2003 Return-Path: Delivered-To: dogwood-dogwoodproductions:com-rayn@dogwoodproductions.com X-Spambayes-Trained: spam >From ebyfi587wxi@seductive.com Mon Apr 28 15:36:22 2003 Return-Path: Delivered-To: dogwood-dogwoodproductions:com-rayn@dogwoodproductions.com X-Envelope-To: rayn@dogwoodproductions.com < From: Neale Pickett > Date: Sat, Apr 26, 2003 at 12:15:39PM -0700 > To: To david@dsmcl.net > Subject: Re: [Spambayes] Training corrupts mbox files > > David McLaughlin writes: > > > Has anyone else had problems training spambayes on mbox files? Is > > there anything else I should be doing to prevent spambayes from > > rewriting the mbox file? It if helps, I can post a sample of the > > before and after mbox files to a webpage for perusal. > > I wrote that code, and I have to confess I only tested it on a couple of > mboxes. Go ahead and post your samples and I'll see if I can fix it. > > Neale From dave at boost-consulting.com Mon Apr 28 16:25:07 2003 From: dave at boost-consulting.com (David Abrahams) Date: Mon Apr 28 15:25:45 2003 Subject: [Spambayes] big imapfilter.py problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D3A9@its-xchg4.massey.ac.nz> (Tony Meyer's message of "Mon, 28 Apr 2003 15:53:47 +1200") References: <1ED4ECF91CDED24C8D012BCF2B034F130150D3A9@its-xchg4.massey.ac.nz> Message-ID: "Meyer, Tony" writes: > I suspect this is related to the first one. If I had to guess without > looking at the code, I'd say that the untraining code is being activated > when it shouldn't be. 
This would cause this one and the ham/spam count > to be out. > >> different-every-time-ly y'rs. > > And just when I thought maybe things were going right... OK, now everything worked, sort of: %setenv PYTHONPATH ~/src/email-2.5 %python imapfilter.py -t -c -v Loading database hammie.db... Loading state from hammie.db database hammie.db is a new database Done. Training Training ham folder HamBox 0 trained. Training spam folder SpamBox 0 trained. Training took 51.954687953 seconds, 0 messages were trained Classifying Filtering took 1.34056997299 seconds. But no messages got classified spam or unsure, AFAICT. Even after I move some of the spam training messages into my inbox, they're not classified as spam. -- Dave Abrahams Boost Consulting www.boost-consulting.com From noreply at sourceforge.net Mon Apr 28 17:20:17 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Mon Apr 28 19:20:22 2003 Subject: [Spambayes] [ spambayes-Bugs-728886 ] In the pop3 UI not able to pass more than 1 server Message-ID: Bugs item #728886, was opened at 2003-04-29 00:38 Message generated for change (Comment added) made by anadelonbrin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=728886&group_id=61702 Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Submitted By: Remi Ricard (papadoc) >Assigned to: Tony Meyer (anadelonbrin) Summary: In the pop3 UI not able to pass more than 1 server Initial Comment: Hi, Using the Web UI to set the options for pop3proxy does not work. 
For Servers: I pass the string "mail.isp1.ca,pop.isp2.ca,pop.isp3.net" (without the " " ) In the file UserInterface.py before line 576 value = mail.isp1.ca,pop.isp2.ca,pop.isp3.net but after line 577 value = (mail.isp1.ca,pop.isp2.ca,pop.isp3.net,) Then you get this error message when you try to save # 'mail.gmc.isp1.ca,pop.isp2.ca,pop.isp3.net' is not a value valid for [POP3 Proxy Options] Servers # '6110,6111, 6112' is not a value valid for [POP3 Proxy Options] Ports Remi papaDoc@videotron.ca ---------------------------------------------------------------------- >Comment By: Tony Meyer (anadelonbrin) Date: 2003-04-29 11:20 Message: Logged In: YES user_id=552329 Fixed in latest cvs. Sorry, my fault again. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=728886&group_id=61702 From T.A.Meyer at massey.ac.nz Tue Apr 29 12:29:55 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 28 19:30:35 2003 Subject: [Spambayes] gratuitous changes? + bugs Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D588@its-xchg4.massey.ac.nz> > I guess it's my fault for not reading every message in the > spambayes group. It's just too much! :-) I do try and make the important ones like that stand out, but I can understand how they could be missed easily enough. > It's one of my own scripts. None of the available scripts > does what I need, so I wrote my own. And in order to keep my > scripts independent of any spambayes testing that I was doing > at the time, I used mergefiles to merge in my own settings. Well, I was wrong anyway since hammiefilter did call mergefiles as well (how the search missed that, I don't know). I'll be more rigorous about the cvs logs next time, promise. > If you look at the diff you'll see that I changed the test > touching self.allowed_values. That is *allowed* values. Ah, I get it now, sorry - I should have read the diff more closely. 
It's the code in is_boolean that fails. I thought you meant code in the is_valid test, which should work fine. I'll check a fix in for this ASAP. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Apr 29 12:40:08 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 28 19:40:42 2003 Subject: [Spambayes] big imapfilter.py problem Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D599@its-xchg4.massey.ac.nz> > OK, now everything worked, sort of: > > %setenv PYTHONPATH ~/src/email-2.5 > %python imapfilter.py -t -c -v > Loading database hammie.db... Loading state from > hammie.db database > hammie.db is a new database > Done. > Training > Training ham folder HamBox > 0 trained. > Training spam folder SpamBox > 0 trained. > Training took 51.954687953 seconds, 0 messages were trained > Classifying > Filtering took 1.34056997299 seconds. > > But no messages got classified spam or unsure, AFAICT. > Even after I move some of the spam training messages into my > inbox, they're not classified as spam. Everything *should* be classified as unsure, since it's a new db and nothing was trained. Looks like it's not reading the messages at all. I'll take another look at it - I was a bit rushed when I did the code to stop relying on the UIDs. =Tony Meyer From T.A.Meyer at massey.ac.nz Tue Apr 29 14:05:46 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Mon Apr 28 21:06:23 2003 Subject: [Spambayes] big imapfilter.py problem Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D622@its-xchg4.massey.ac.nz> > But no messages got classified spam or unsure, AFAICT. > Even after I move some of the spam training messages into my > inbox, they're not classified as spam. I think I have fixed this now. I've changed imapfilter so that instead of iterating through the entire RFC822 message when we go through a folder, we just retrieve the headers (which we can use to determine if the message has been trained/classified or not). 
If we then have to do something to the message, the substance is retrieved. This should speed things up (no more retrieving the substance many times), and I think I fixed the bug that stopped messages being filtered along the way. If you could check it (again!) that would be great. If it's still not working, could you run it with "-i4" and see whether it's doing any FETCH RFC822[.PEEK] commands? If not, then there's something up with the db checking, if so then it's something else. BTW, I am working on the headers-with-no-line-endings thing (I grabbed the el source that you suggested), but it's taking a while... =Tony Meyer From jwilliam at xmission.com Tue Apr 29 00:10:34 2003 From: jwilliam at xmission.com (Jerry Williams) Date: Tue Apr 29 01:10:39 2003 Subject: [Spambayes] Bug [ 725449 ] Addin won't initialize Message-ID: SpamBayes-Outlook-Setup-002.exe worked so well under Windows 2000 that I decided to remove Python 2.2 and the win32 stuff and just use the Outlook setup. But I run into the same problem as this bug. So just as an FYI I am running Windows ME. So I just reinstalled Python and the win32all-152.exe and spambayes-1.0a2.tar.gz and it remembered all of my settings. So now I have a question. Where is the config stored? Mark H. and everyone else, thanks for a great program! P.S. I am going to dump WinME just as soon as I can move everything to Win2000. From T.A.Meyer at massey.ac.nz Tue Apr 29 18:15:22 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 29 01:16:00 2003 Subject: [Spambayes] Bug [ 725449 ] Addin won't initialize Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130150D7CB@its-xchg4.massey.ac.nz> > So now I have a question. Where is the config stored? At the moment the configuration for the Outlook plugin is stored in two files. These will most probably be called default_configuration.pck and default_bayes_customize.ini. 
The pickle (.pck) stores things specific to the plugin (like ids for the folders you are filtering), and the ini file stores spambayes-wide settings, like tokenisation settings. Where these files are found depends on your system - they get put into the folder that the OS tells spambayes is most appropriate for application data. On Win2k/XP, this will be in the Documents and Settings/username/Application Data/ path; on earlier versions, this might be the spambayes directory itself. =Tony Meyer From tim.one at comcast.net Tue Apr 29 02:17:58 2003 From: tim.one at comcast.net (Tim Peters) Date: Tue Apr 29 01:18:36 2003 Subject: [Spambayes] Bug [ 725449 ] Addin won't initialize In-Reply-To: Message-ID: [Jerry Williams] > SpamBayes-Outlook-Setup-002.exe worked so well under Windows 2000 that I > decided to remove Python 2.2 and the win32 stuff and just use the > Outlook setup. But I run into the same problem as this bug. So just as > an FYI I am running Windows ME. So I just reinstalled Python and the > win32all-152.exe and spambayes-1.0a2.tar.gz and it remembered all of my > settings. So now I have a question. Where is the config stored? It can vary according to Windows flavor and login account. On the Win98SE box I'm typing at this moment, it's stored in files under C:\WINDOWS\Application Data\SpamBayes\ > Mark H. and everyone else, thanks for a great program! Glad you're enjoying it! 
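As a rough illustration of "it depends on the Windows flavor", the sketch below guesses a per-user data directory. It is only an assumption-laden example: guess_config_dir is a hypothetical helper, and it relies on the APPDATA environment variable rather than on whatever mechanism the Outlook plugin really uses to ask the OS for the application data folder.

```python
import os


def guess_config_dir(app_name="SpamBayes", fallback_dir="."):
    """Return a plausible per-user data directory for app_name (illustrative only)."""
    # APPDATA is normally set on NT-family Windows. Where it is missing,
    # fall back to a directory supplied by the caller, for example the
    # application's own directory.
    base = os.environ.get("APPDATA")
    if not base:
        return os.path.abspath(fallback_dir)
    return os.path.join(base, app_name)


print(guess_config_dir())
```

Checking the printed directory for default_bayes_customize.ini is a quick way to see whether the guess matches a given machine.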
From noreply at sourceforge.net Mon Apr 28 23:29:19 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 29 01:29:25 2003 Subject: [Spambayes] [ spambayes-Bugs-729345 ] Outlook 2000/Win98SE failed Message-ID: Bugs item #729345, was opened at 2003-04-29 15:29 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=729345&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Peter Oram (poram) Assigned to: Mark Hammond (mhammond) Summary: Outlook 2000/Win98SE failed Initial Comment: Have installed the 002 binary for the Outlook Add-in, however after installation the add-in fails to place the button on the toolbar. Have attached the log-file with its generated errors. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=729345&group_id=61702 From lists at olivermaunder.co.uk Tue Apr 29 10:41:59 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Tue Apr 29 04:41:47 2003 Subject: [Spambayes] message.py bugs Message-ID: <3EAE3AD7.3090907@olivermaunder.co.uk> Hi all (and Tim especially) I've got a problem with the latest version of Message.py (1.24) The problem is in line 193 - in class Message return self._force_CRLF(message.SBHeaderMessage.as_string(self)) The interpreter says it doesn't understand "message". Presumably this is because we're already inside message.py. Removing "message." makes things worse. SBHeaderMessage.as_string gets called, but SBHeaderMessage doesn't have its own as_string method, so the version in Message gets called, which in turn calls SBHeaderMessage.as_string, and infinite recursion ensues. I've got round this by doing: return self._force_CRLF(email.Message.Message.as_string(self)) Don't know if this does exactly what you intend, but it works for now.
Is there a "super" or "parent" object you can use to call a method in the parent class? It would save typing "email.Message.Message". I got a python book yesterday, so I'll stop asking these questions soon :-) Olly From lists at olivermaunder.co.uk Tue Apr 29 10:46:13 2003 From: lists at olivermaunder.co.uk (Oliver Maunder) Date: Tue Apr 29 04:45:51 2003 Subject: [Spambayes] big imapfilter.py problem In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D622@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130150D622@its-xchg4.massey.ac.nz> Message-ID: <3EAE3BD5.8070200@olivermaunder.co.uk> Meyer, Tony wrote: >If you could check it (again!) that would be great. If it's still not >working, could you run it with "-i4" and see whether it's doing any >FETCH RFC822[.PEEK] commands? If not, then there's something up with >the db checking, if so then it's something else. > > Got the new version, and it seems good so far. The training db contains the correct numbers, at least. Don't know if the filtering is working, because I haven't got any spam in my inbox . I left the previous version of imapfilter running on my work PC when I left last night, filtering every 15 minutes. 
There was no spam in my inbox when I checked from home, and imapfilter was still running this morning :-) Olly From noreply at sourceforge.net Tue Apr 29 05:54:35 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 29 07:54:41 2003 Subject: [Spambayes] [ spambayes-Bugs-725449 ] Binary plugin fails on Win9x Message-ID: Bugs item #725449, was opened at 2003-04-22 14:41 Message generated for change (Settings changed) made by mhammond You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: dan maer (dmara) Assigned to: Mark Hammond (mhammond) >Summary: Binary plugin fails on Win9x Initial Comment: Running pure Windows (98SE) and Outlook 2000. No Python installed. I can't get the plugin to initialize. I go to Tools/options/other/advanced options/Com Addins and check the SpamBayes plug-in box, but it won't stayed checked, and I've uninstalled/reinstalled Outlook and the plugin but no go. I've got this working great, really great, on Winnt 4.0 and Xp. Logfile being attached for upload... Dan ---------------------------------------------------------------------- Comment By: J (usertgo) Date: 2003-04-26 15:03 Message: Logged In: YES user_id=763953 ok, since i liked it so much on win2000 i did the python install & manual install of the spambayes outlook addin & its working good now, so i guess it was the installer. thanks ---------------------------------------------------------------------- Comment By: J (usertgo) Date: 2003-04-24 15:10 Message: Logged In: YES user_id=763953 I also get the same errors as in the logfile on a Win98SE installation w/Outlook 2000 SP1, but it works on Win2000 w/same Outlook. If you have any workarounds (w/regedit?) please let us know... 
---------------------------------------------------------------------- Comment By: dan maer (dmara) Date: 2003-04-23 13:06 Message: Logged In: YES user_id=759684 Ok Mark... Anyway to bypass the installer issue and get it working by manual means? Dan ---------------------------------------------------------------------- Comment By: Mark Hammond (mhammond) Date: 2003-04-23 09:19 Message: Logged In: YES user_id=14198 This is an issue with the "Installer" tool I use. I will try and sus it out before the next binary release. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=725449&group_id=61702 From noreply at sourceforge.net Tue Apr 29 05:55:35 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Tue Apr 29 07:55:42 2003 Subject: [Spambayes] [ spambayes-Bugs-729345 ] Outlook 2000/Win98SE failed Message-ID: Bugs item #729345, was opened at 2003-04-29 15:29 Message generated for change (Comment added) made by mhammond You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=729345&group_id=61702 Category: Outlook Group: None >Status: Closed >Resolution: Duplicate Priority: 5 Submitted By: Peter Oram (poram) Assigned to: Mark Hammond (mhammond) Summary: Outlook 2000/Win98SE failed Initial Comment: Have installed the 002 binary for the Outlook Add-in, however after installation the add-in fails to place the button on the toolbar. Have attached the log-file with it's generated errors. ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-04-29 21:55 Message: Logged In: YES user_id=14198 Thanks, but dupe of 725449, even though you would have been very hard-pressed to know. I changed the summary of that one to better reflect this. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=729345&group_id=61702 From tim at fourstonesExpressions.com Tue Apr 29 15:28:53 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Apr 29 15:28:59 2003 Subject: [Spambayes] message.py bugs Message-ID: <076074OJB9524ZZVZTSFE7671SQIC.3eaed275@myst> I checked in a fix for this last night, after it bit me too... :) 4/29/2003 3:41:59 AM, Oliver Maunder wrote: >Hi all (and Tim especially) > >I've got a problem with the latest version of Message.py (1.24) > >The problem is in line 193 - in class Message > > return self._force_CRLF(message.SBHeaderMessage.as_string(self)) > >The interpreter says it doesn't understand "message". Presumably this is >because we're already inside message.py. > >Removing "message." makes things worse. SBHeaderMessage.as_string gets >called, but SBHeaderMessage doesn't have it's own as_string method, so >the version in Message gets called, which in turn calls >SBHeaderMessage.as_string, and infinite recursion ensues. > >I've got round this by doing: > return self._force_CRLF(email.Message.Message.as_string(self)) > >Don't know if this does exacly what you intend, but it works for now. Is >there a "super" or "parent" object you can use to call a method in the >parent class? It would save typing "email.Message.Message". > >I got a python book yesterday, so I'll stop asking these questions soon :-) > >Olly > > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
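On the question quoted above about a "super" or "parent" object: a minimal sketch follows, using hypothetical class names rather than the real message.py classes. It shows an overriding as_string() that delegates to the base class through super(), so it never calls back into itself. Note the assumption that this is a modern Python; super() needs new-style classes, so with the classic-class email package of 2003 the explicit email.Message.Message.as_string(self) form was likely the practical choice.

```python
import email.message


class HeaderMessage(email.message.Message):
    """Hypothetical intermediate class that does not define its own as_string()."""


class CRLFMessage(HeaderMessage):
    """Hypothetical subclass that normalises line endings in as_string()."""

    def _force_crlf(self, text):
        # Rewrite every line ending as CRLF.
        return "\r\n".join(text.splitlines()) + "\r\n"

    def as_string(self, *args, **kwargs):
        # super() resolves to the next class in the method resolution order
        # that defines as_string (email.message.Message here), so this
        # override cannot end up calling itself.
        return self._force_crlf(super().as_string(*args, **kwargs))


msg = CRLFMessage()
msg["Subject"] = "test"
msg.set_payload("line one\nline two\n")
out = msg.as_string()
print(repr(out))
```

The recursion described in the thread comes from naming a class whose lookup leads back to the overriding method; delegating to the base class, whether by name or through super(), avoids that.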
From cleggj at attglobal.net Wed Apr 30 11:09:15 2003 From: cleggj at attglobal.net (John Clegg) Date: Tue Apr 29 18:12:15 2003 Subject: [Spambayes] One potential problem with this filter apporach Message-ID: <000c01c30e9b$fbf39430$c86860cb@megacity1> Hi I am really impressed with this implementation of a spam filter. Like everyone else I (and my company) have been plagued by spam. I was thinking about the way the spambayes works, and I think I have thought of a way spammers could get around it. A devious spammer could use images as instead of text. So the email would just contain an HTML table. It's something you guys should think about how your filter will operate on these types of emails. FYI: I am former CTO of Baazee.com from India and I have designed email delivery systems for the company. Keep up the good work. Regards John Clegg Tech Consultant From dhylands at broadcom.com Tue Apr 29 15:27:08 2003 From: dhylands at broadcom.com (Dave Hylands) Date: Tue Apr 29 18:12:35 2003 Subject: [Spambayes] Problem installing SpamBayes Outlook Addin Message-ID: <725301F4D1E9D411A69F0002A507428E014D1906@nt-rmna-exch.ca.broadcom.com> Hi, I'm running Windows 2000 (SP3) with Outlook XP installed. I have Python 2.2.2 installed and win32all-152 installed. When I try to execute the SpamBayes-Outlook-Setup-002.exe program, it gets partway through the installation and then dies with the following error: C:\Program Files\Spambayes Outlook Addin\spambayes_addin.dll Unable to register the DLL/OCX: DllRegisterServer failed; code 0x00000000. Click Retry to try again... Does anybody have any ideas how I might get around this problem? 
-- Dave Hylands Vancouver, BC, Canada http://www.DaveHylands.com/ From tim at fourstonesExpressions.com Tue Apr 29 19:10:23 2003 From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions) Date: Tue Apr 29 19:10:50 2003 Subject: [Spambayes] One potential problem with this filter apporach In-Reply-To: <000c01c30e9b$fbf39430$c86860cb@megacity1> Message-ID: 4/29/2003 5:09:15 PM, "John Clegg" wrote: >Hi > >I am really impressed with this implementation of a spam filter. Like >everyone else I (and my company) have been plagued by spam. Thanks from the whole team. > I was thinking >about the way the spambayes works, and I think I have thought of a way >spammers could get around it. A devious spammer could use images as instead >of text. So the email would just contain an HTML table. It's something you >guys should think about how your filter will operate on these types of >emails. Actually, our tests include this type of mail. Our results indicate that email with a single url for a graphic image becomes an extremely strong indicator of spam very quickly. The fact that you see very little of this kind of spam presently indicates that spammers realize this, and are busily trying to find other more clever ways around bayesian filter technologies. > >FYI: I am former CTO of Baazee.com from India and I have designed email >delivery systems for the company. > >Keep up the good work. > >Regards > >John Clegg >Tech Consultant > > > > > > > > > >_______________________________________________ >Spambayes mailing list >Spambayes@python.org >http://mail.python.org/mailman/listinfo/spambayes > > c'est moi - TimS http://www.fourstonesExpressions.com http://wecanstopspam.org There are 10 kinds of people in the world: those who understand binary, and those who don't. 
From T.A.Meyer at massey.ac.nz Wed Apr 30 11:59:09 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 29 19:12:47 2003 Subject: [Spambayes] One potential problem with this filter apporach Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13016AB5D8@its-xchg4.massey.ac.nz> > I was thinking about the way the spambayes > works, and I think I have thought of a way spammers could get > around it. A devious spammer could use images as instead of > text. So the email would just contain an HTML table. It's > something you guys should think about how your filter will > operate on these types of emails. I would be surprised if this worked. Spambayes doesn't really care what messages contain, just whether they (well, the tokens) are more similar to messages you have trained as spam, or those you've trained as ham. Unless you have a lot of ham that looks like this, then the few clues that would get generated would probably have high spam prob. There's also the headers, of course, which generate their own clues. I don't think that this method would get around a pseudo-bayesian system (like Spambayes) nearly as well as other methods - like quoting a message that you wrote at the end of their mail. =Tony Meyer From bbands at yahoo.com Tue Apr 29 17:26:35 2003 From: bbands at yahoo.com (John Bollinger) Date: Tue Apr 29 19:26:38 2003 Subject: [Spambayes] Superb! In-Reply-To: <725301F4D1E9D411A69F0002A507428E014D1906@nt-rmna-exch.ca.broadcom.com> Message-ID: <20030429232635.16995.qmail@web13907.mail.yahoo.com> win2k Outlook2000 SpamBayes-Outlook-Setup-002.exe :-)) --jab ===== John Bollinger, CFA, CMT www.BollingerBands.com If you advance far enough, you arrive at the beginning.
From seandarcy at hotmail.com Tue Apr 29 22:09:00 2003 From: seandarcy at hotmail.com (sean darcy) Date: Tue Apr 29 21:09:34 2003 Subject: [Spambayes] Configuration page of pop3proxy fails Message-ID: I'm just trying out spambayes. I dl'd from cvs. Installed. Then cd'd to /opt/spam/spambayes/data. Then: pop3proxy.py -b Loading database... Done. User interface url is http://localhost:8880/ Which brought up my browser ( mozilla). Went to the configuration page. Filled in the stuff. Clicked Save. Here's what I got: 500 Server error Traceback (most recent call last): File "/usr/lib/python2.2/site-packages/spambayes/Dibbler.py", line 398, in found_terminator getattr(plugin, name)(**params) File "/usr/lib/python2.2/site-packages/spambayes/UserInterface.py", line 511, in onChangeopts options.update_file(optionsPathname) File "/usr/lib/python2.2/site-packages/spambayes/Options.py", line 1303, in update_file shutil.copyfile(out.name, filename) File "/usr/lib/python2.2/shutil.py", line 28, in copyfile fsrc = open(src, 'rb') IOError: [Errno 2] No such file or directory: '(fdopen)' Any help appreciated. sean From T.A.Meyer at massey.ac.nz Wed Apr 30 14:22:25 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Tue Apr 29 21:22:59 2003 Subject: [Spambayes] Configuration page of pop3proxy fails Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13016AB693@its-xchg4.massey.ac.nz> [sean darcy] > "/usr/lib/python2.2/site-packages/spambayes/Options.py", line 1303, > in update_file > shutil.copyfile(out.name, filename) > > File "/usr/lib/python2.2/shutil.py", line 28, in copyfile > fsrc = open(src, 'rb') > > IOError: [Errno 2] No such file or directory: '(fdopen)' What platform is this with? (The /usr/lib suggests some sort of *nix flavour).
I'm pretty sure that this is a problem with the temporary file. What it's
doing is saving the options to a temporary file ('out'), and it then copies
this over to the actual config file. (This means that if something goes
wrong halfway, the original is still ok). It seems to me from the trace
that the temp file is giving '(fdopen)' as its filename. The correct
filename is retrieved on Windows; I can't test elsewhere.

I need the filename to pass to copyfile. Could one of the Python'ly wise
people tell me the correct way to do this so I can fix Options.py?

=Tony Meyer

From tim.one at comcast.net Tue Apr 29 22:28:17 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Apr 29 21:29:33 2003
Subject: [Spambayes] One potential problem with this filter apporach
In-Reply-To: <000c01c30e9b$fbf39430$c86860cb@megacity1>
Message-ID:

[John Clegg]
> I am really impressed with this implementation of a spam filter.

So are we. Thanks!

> Like everyone else I (and my company) have been plagued by spam. I was
> thinking about the way the spambayes works, and I think I have thought
> of a way spammers could get around it. A devious spammer could use
> images instead of text. So the email would just contain an HTML
> table. It's something you guys should think about how your filter will
> operate on these types of emails.

Asian spam has been doing this for a long time. I believe it's not because
they're trying to fool filters, but because they can't rely on email
clients rendering their character sets correctly. So they put a jpeg of the
spam out on the web, and just send URLs in the email (embedded in suitable
HTML to get the image(s) rendered).

I saw this show up in less exotic spam much later, where I believe they are
trying to fool filters. As has already been mentioned, spambayes doesn't do
any sort of semantic content analysis, yet this kind of spam usually gets
caught anyway.
Clues they don't manage to hide this way include "funny stuff" in the email
headers, and the URLs themselves. Indeed, just finding the character
strings ".jpg" or ".gif" in a URL turns out to be a strong spam clue, and
if the message doesn't have any hammish text then spambayes pays a lot of
attention to the handful of spam clues it finds.

> FYI: I am former CTO of Baazee.com from India and I have designed
> email delivery systems for the company.

Next time you'll know enough to code them in Python.

From seandarcy at hotmail.com Tue Apr 29 22:59:23 2003
From: seandarcy at hotmail.com (sean darcy)
Date: Tue Apr 29 21:59:57 2003
Subject: [Spambayes] Configuration page of pop3proxy fails
Message-ID:

Yes it's linux.

The data directory is /opt/spam/spambayes/data, which was empty. Now it has:

-rw-r--r-- 1 root root     0 Apr 29 19:41 bayescustomize.ini
-rw-r--r-- 1 root root 12288 Apr 29 19:34 hammie.db
drwxr-xr-x 2 root root  4096 Apr 29 19:34 pop3proxy-ham-cache
drwxr-xr-x 2 root root  4096 Apr 29 19:34 pop3proxy-spam-cache
drwxr-xr-x 2 root root  4096 Apr 29 19:34 pop3proxy-unknown-cache
-rw-r--r-- 1 root root 12288 Apr 29 19:34 spambayes.messageinfo.db

Note the .ini is empty. Is there a reference on how to set up the .ini file
manually?

thanks for the help

sean

From neale at woozle.org Tue Apr 29 19:59:56 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Apr 29 22:00:07 2003
Subject: [Spambayes] Training corrupts mbox files
In-Reply-To: <20030428183158.GA72269@ainaz.pair.com> (David McLaughlin's message of "Mon, 28 Apr 2003 14:31:58 -0400")
References: <20030424153833.GA63651@ainaz.pair.com> <20030428183158.GA72269@ainaz.pair.com>
Message-ID:

David McLaughlin writes:

> Thanks for taking a look at it!
>
> I have put a sample before and after mbox at the following location:
>
> ftp://ftp.dsmcl.net/spambayes_samplembox.tgz
>
> It looks like it may be duplicating some lines in the header, and
> adding an extra line break, which generates "extra" bogus mail
> messages.

Yeah, sure enough. You're using mutt? The mbox "standard" is that any
line beginning with "From " denotes a new message. So a diff of those two
mailboxes shows things like this:

 From removed@example.com Mon Apr 28 16:37:18 2003
 Return-Path:
 Delivered-To:
+X-Spambayes-Trained: spam
+
 From removed@example.com Mon Apr 28 16:37:18 2003
 Return-Path:
 Delivered-To:

I think spambayes is actually doing the right thing here--it's taking a
weird mbox and un-weirding it. I think Tim Stone might be working on a
generic message store thingy: Tim, would that eliminate the need to
rewrite mailboxes altogether?

But David, if I were you I'd start trying to hunt down what's creating
those duplicate headers. It might be some sort of wonky procmail recipe
that just writes out headers and then drops through, but that's just a
shot in the dark guess. Heh, maybe it's hammiefilter <0.7 wink>

Neale

From dave at boost-consulting.com Tue Apr 29 23:28:16 2003
From: dave at boost-consulting.com (David Abrahams)
Date: Tue Apr 29 22:28:59 2003
Subject: [Spambayes] Re: big imapfilter.py problem
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130150D622@its-xchg4.massey.ac.nz> (Tony Meyer's message of "Tue, 29 Apr 2003 13:05:46 +1200")
References: <1ED4ECF91CDED24C8D012BCF2B034F130150D622@its-xchg4.massey.ac.nz>
Message-ID:

"Meyer, Tony" writes:

>> But no messages got classified spam or unsure, AFAICT.
>> Even after I move some of the spam training messages into my
>> inbox, they're not classified as spam.
>
> I think I have fixed this now.
I've changed imapfilter so that instead > of iterating through the entire RFC822 message when we go through a > folder, we just retrieve the headers (which we can use to determine if > the message has been trained/classified or not). If we then have to do > something to the message, the substance is retrieved. > > This should speed things up (no more retrieving the substance many > times), and I think I fixed the bug that stopped messages being filtered > along the way. > > If you could check it (again!) that would be great. If it's still not > working, could you run it with "-i4" and see whether it's doing any > FETCH RFC822[.PEEK] commands? If not, then there's something up with > the db checking, if so then it's something else. It's something else. With -i5: ... \nC++-sig mailing list\r\nC++-sig@python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n')"] 26:47.80 < UID 303) 26:47.81 untagged_responses[FETCH] 1 += [" UID 303)"] 26:47.81 < BBHC5 OK completed 26:47.81 matched r'(?PBBHC\d+) (?P[A-Z]+) (?P.*)' => ('BBHC5', 'OK', 'completed') 26:47.81 untagged_responses[FETCH] => [('267 (RFC822 {3537}', 'Return-Path: \r\nReceived: fr om mx04.mrf.mail.rcn.net ([207.172.4.53] verified)\r\n\tby stlport.com (CommuniGate Pro SMTP 3.5.9)\r\n\twith ESMTP id 2 04725 for dave@boost-consulting.com;\r\n\tSat, 01 Mar 2003 12:33:06 -0800\r\nReceived: from mail.python.org ([12.155.117 .29])\r\n\tby mx04.mrf.mail.rcn.net with esmtp (Exim 3.35 #4)\r\n\tid 18pDfM-0001pg-00\r\n\tfor david.abrahams@rcn.com; Sat, 01 Mar 2003 15:33:04 -0500\r\nReceived: from localhost.localdomain ([127.0.0.1] helo=mail.python.org)\r\n\tby mail. 
python.org with esmtp (Exim 4.05)\r\n\tid 18pDfL-00048J-00; Sat, 01 Mar 2003 15:33:03 -0500\r\nReceived: from srv.global ite.com.br ([200.180.16.1])\r\n\tby mail.python.org with esmtp (Exim 4.05)\r\n\tid 18pDem-00047V-00\r\n\tfor c++-sig@pyt hon.org; Sat, 01 Mar 2003 15:32:28 -0500\r\nReceived: from globalite.com.br (ixstoj@bridge.int-02.globalite.com.br\r\n\t [200.180.16.8])\r\n\tby srv.globalite.com.br (8.12.7/8.12.6) with ESMTP id h21HUKU7071413\r\n\tfor ; Sat, 1 Mar 2003 17:30:21 GMT\r\nMessage-ID: <3E6118D9.8030103@globalite.com.br>\r\nFrom: Nicodemus \r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;\r\n\trv:1.2.1) Gecko/20021130\r\nX-Accept-Langua ge: en-us, en\r\nMIME-Version: 1.0\r\nTo: c++-sig@python.org\r\nSubject: Re: [C++-sig] Support for member operators?\r\n References: <20030301201353.83610.qmail@web20205.mail.yahoo.com>\r\nIn-Reply-To: <20030301201353.83610.qmail@web20205.ma il.yahoo.com>\r\nContent-Type: text/plain; charset=us-ascii; format=flowed\r\nContent-Transfer-Encoding: 7bit\r\nX-Spam- Status: No, hits=-4.9 required=5.0\r\n\ttests=BODY_PYTHON_ZOPE,EMAIL_ATTRIBUTION,FROM_BR_PT_AR,IN_REP_TO,RCVD_SPAMLAND_2 ,REFERENCES,SPAM_PHRASE_00_01,USER_AGENT,USER_AGENT_MOZILLA_UA,X_ACCEPT_LANG\r\nX-Spam-Level: \r\nSender: c++-sig-admin@ python.org\r\nErrors-To: c++-sig-admin@python.org\r\nX-BeenThere: c++-sig@python.org\r\nX-Mailman-Version: 2.0.13 (10127 0)\r\nPrecedence: bulk\r\nReply-To: c++-sig@python.org\r\nList-Help: \r\ nList-Post: \r\nList-Subscribe: ,\r\n\t\r\nList-Id: Development of Python/C++ integration \ r\nList-Unsubscribe: ,\r\n\t\r\nList-Archive: \r\nDate: Sat, 01 Mar 2003 17:32:25 -0300\r\nX- Spambayes-Classification: unsure\r\nThanks for the reply Ralf,\r\n\r\nRalf W. 
Grosse-Kunstleve wrote:\r\n\r\n>--- Nicode mus wrote:\r\n> \r\n>\r\n>> const C operator+(int o)\r\n>>...\r\n>>\r\n>>If I try to e xpose the operator+ like this:\r\n>>\r\n>> .def( self + other() )\r\n>>\r\n>>I get a compiler error: "no operat or "+" matches these operands"\r\n>> \r\n>>\r\n>\r\n>What happens if you change the member function signature to\r\n> \r\n>C operator+(int o) const\r\n> \r\n>\r\n\r\nIt works then. 8)\r\nBut what if a class defines a operator + that is n ot const, ie., it \r\nchanges an attribute of the class? Can Boost.Python export this, or a \r\nparticular signature is required to expose operators?\r\n\r\n>>Does Boost.Python support member operators?\r\n>> \r\n>>\r\n>\r\n>I am pretty sure it does, but your placement of "const" seems very unusual.\r\n> \r\n>\r\n\r\nIt means to return a "const C" object , not that the operator+ is const.\r\n\r\nNicodemus.\r\n\r\n\r\n\r\n_______________________________________________\r\nC ++-sig mailing list\r\nC++-sig@python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n'), ' UID 303)'] Traceback (most recent call last): File "imapfilter.py", line 697, in ? 
    run()
  File "imapfilter.py", line 683, in run
    imap_filter.Train()
  File "imapfilter.py", line 524, in Train
    num_ham_trained = folder.Train(self.classifier, False)
  File "imapfilter.py", line 464, in Train
    for msg in self:
  File "imapfilter.py", line 394, in __iter__
    yield self[key]
  File "imapfilter.py", line 441, in __getitem__
    msg.get_substance()
  File "imapfilter.py", line 291, in get_substance
    new_msg = email.Parser.Parser().parsestr(data["RFC822"])
  File "/usr/local/lib/python2.2/email/Parser.py", line 75, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/local/lib/python2.2/email/Parser.py", line 62, in parse
    self._parseheaders(root, fp)
  File "/usr/local/lib/python2.2/email/Parser.py", line 128, in _parseheaders
    raise Errors.HeaderParseError(
email.Errors.HeaderParseError: Not a header, not a continuation: ``Thanks for the reply Ralf,''

> BTW, I am working on the headers-with-no-line-endings thing (I grabbed
> the el source that you suggested), but it's taking a while...

Good luck!

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From djgoodwins at yahoo.com Tue Apr 29 21:01:19 2003
From: djgoodwins at yahoo.com (JGoodwin)
Date: Tue Apr 29 23:09:50 2003
Subject: [Spambayes] (no subject)
Message-ID: <20030430030119.43546.qmail@web80004.mail.yahoo.com>

could somebody out there please explain what spambayes is?? don't know how
i got it on my email. help!

From tim_one at email.msn.com Wed Apr 30 00:19:51 2003
From: tim_one at email.msn.com (Tim Peters)
Date: Tue Apr 29 23:20:48 2003
Subject: [Spambayes] (no subject)
In-Reply-To: <20030430030119.43546.qmail@web80004.mail.yahoo.com>
Message-ID:

[JGoodwin]
> could somebody out there please explain what spambayes is??

It's a spam detection system. See:

http://spambayes.sf.net/

> don't know how i got it on my email.

Sorry, neither do I. I don't know what "got it on my email" means, though.

> help!

Is it creating a problem for you? If so, what?
Your message is pretty mysterious after its first line.

From tim at fourstonesExpressions.com Tue Apr 29 23:22:21 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Apr 29 23:22:29 2003
Subject: [Spambayes] Training corrupts mbox files
In-Reply-To:
Message-ID:

4/29/2003 8:59:56 PM, Neale Pickett wrote:

>I think spambayes is actually doing the right thing here--it's taking a
>weird mbox and un-weirding it. I think Tim Stone might be working on a
>generic message store thingy: Tim, would that eliminate the need to
>rewrite mailboxes altogether?

I haven't started looking at the mbox problems yet, but in general it would
only eliminate mbox rewriting if you don't want *any* of the spambayes
headers added to the messages in the mbox. It can remember, given a message
id, how that message was classified, and how it is trained, but that's all
at the moment. That would seem to be inadequate for this problem...

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world:
those who understand binary, and those who don't.

From T.A.Meyer at massey.ac.nz Wed Apr 30 16:56:49 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Apr 29 23:57:27 2003
Subject: [Spambayes] Re: big imapfilter.py problem
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13016AB783@its-xchg4.massey.ac.nz>

[...other headers...]
> List-Archive:
> \r\nDate: Sat, 01
> Mar 2003 17:32:25 -0300\r\nX-
> Spambayes-Classification: unsure\r\nThanks for the reply
> Ralf,\r\n\r\n
[...rest of body...]
> File "/usr/local/lib/python2.2/email/Parser.py", line 128,
> in _parseheaders
>     raise Errors.HeaderParseError(
> email.Errors.HeaderParseError: Not a header, not a
> continuation: ``Thanks for the reply Ralf,''

These are the important bits. This has a message with the classification
header, but no blank line separating the headers and the body. So parsing
the headers dies. Any idea why this might happen, Tim?
I'll have a look, but I probably won't be able to get to this tomorrow. I
*suspect* that it's a problem in message.py rather than imapfilter.py, but
I wouldn't guarantee it.

=Tony Meyer

From T.A.Meyer at massey.ac.nz Wed Apr 30 17:02:35 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Apr 30 00:03:10 2003
Subject: [Spambayes] Configuration page of pop3proxy fails
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13016AB790@its-xchg4.massey.ac.nz>

> Is there a reference on how to set up the .ini file manually?

There's a question about this in FAQ.txt, although you'll have to do a wee
bit of Python to get a list of the options that need setting. Basically,
though, you'll want this in your ini file:

[pop3proxy]
servers:pop.example.com:110
ports:110
persistent_storage_file:path/to/database.db

(Obviously with the appropriate values replaced). This should be all that
you need to use pop3proxy. The FAQ has details for finding out what the
other options that you can set are.

I will make sure that I figure out the temp file problem, but this might
not be until tomorrow.

=Tony Meyer

From dave at boost-consulting.com Wed Apr 30 02:21:08 2003
From: dave at boost-consulting.com (David Abrahams)
Date: Wed Apr 30 01:21:49 2003
Subject: [Spambayes] Re: big imapfilter.py problem
In-Reply-To: (David Abrahams's message of "Tue, 29 Apr 2003 22:28:16 -0400")
References: <1ED4ECF91CDED24C8D012BCF2B034F130150D622@its-xchg4.massey.ac.nz>
Message-ID:

David Abrahams writes:

> "Meyer, Tony" writes:
>
>>> But no messages got classified spam or unsure, AFAICT.
>>> Even after I move some of the spam training messages into my
>>> inbox, they're not classified as spam.
>>
>> I think I have fixed this now. I've changed imapfilter so that instead
>> of iterating through the entire RFC822 message when we go through a
>> folder, we just retrieve the headers (which we can use to determine if
>> the message has been trained/classified or not).
>> If we then have to do
>> something to the message, the substance is retrieved.
>>
>> This should speed things up (no more retrieving the substance many
>> times), and I think I fixed the bug that stopped messages being filtered
>> along the way.
>>
>> If you could check it (again!) that would be great. If it's still not
>> working, could you run it with "-i4" and see whether it's doing any
>> FETCH RFC822[.PEEK] commands? If not, then there's something up with
>> the db checking, if so then it's something else.
>
> It's something else. With -i5:
>
> ...
> \nC++-sig mailing list\r\nC++-sig@python.org\r\nhttp://mail.python.org/mailman/listinfo/c++-sig\r\n')"]
> 26:47.80 < UID 303)
> 26:47.81 untagged_responses[FETCH] 1 += [" UID 303)"]

The problem appears to be that imapfilter.py added an
X-Spambayes-Classification: header to the message, but failed to add a
newline afterwards, which is required to separate it from the message
body.

...err, but I forgot to set PYTHONPATH to use email-2.5. Training works
when I do that.

--
Dave Abrahams
Boost Consulting
www.boost-consulting.com

From T.A.Meyer at massey.ac.nz Wed Apr 30 18:44:22 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Apr 30 01:44:58 2003
Subject: [Spambayes] Re: big imapfilter.py problem
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13016AB805@its-xchg4.massey.ac.nz>

> The problem appears to be that imapfilter.py added an
> X-Spambayes-Classification: header to the message, but failed
> to add a newline afterwards, which is required to separate it
> from the message body.

That's definitely what it was - except that message.py (which does the
adding for imapfilter.py) doesn't add it as a string, it adds an entry to
the headers dict in the email.Message.Message object. It looks like the
email package didn't add the separating newline when it flattened the
message.

> ...err, but I forgot to set PYTHONPATH to use email-2.5.
> Training works when I do that.
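
For readers following the header/body separator point, here is a small sketch using the email package that ships with current Python 3, which is an assumption on my part: it is not the email 2.4.3 or 2.5 code discussed in this thread and its behaviour differs. It sets a header on a Message object, flattens it, and checks for the blank line. It then parses the broken form from the trace; the modern parser records a defect for the missing separator instead of raising HeaderParseError.

```python
import email
from email import errors
from email.message import Message

# Set a header on the Message object and let the email package flatten
# it, roughly as message.py is described as doing above.
msg = Message()
msg["X-Spambayes-Classification"] = "unsure"
msg.set_payload("Thanks for the reply Ralf,\n")
flat = msg.as_string()

# A correctly flattened message has one blank line between the last
# header and the body.
head, sep, body = flat.partition("\n\n")

# The broken form from the trace: the body follows the header directly.
broken = "X-Spambayes-Classification: unsure\nThanks for the reply Ralf,\n"
parsed = email.message_from_string(broken)
missing_separator = any(
    isinstance(defect, errors.MissingHeaderBodySeparatorDefect)
    for defect in parsed.defects
)
```

Checking `parsed.defects` after parsing is one way a filter could notice this kind of damage before rewriting a message.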
What version of the email package does it use if you don't set
PYTHONPATH? If this sort of thing is going to happen, it might be worth
noting somewhere. (Currently, the docs say that you need "the latest"
version of the email package, or whatever comes with Python 2.2.2
(email.__version__ == '2.4.3') or later.)

=Tony Meyer

From noreply at sourceforge.net Wed Apr 30 05:55:59 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Apr 30 07:56:07 2003
Subject: [Spambayes] [ spambayes-Bugs-730151 ] Outlook fails to classify
Message-ID:

Bugs item #730151, was opened at 2003-04-30 13:55
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=730151&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Fredrik Rodland (fmmr)
Assigned to: Mark Hammond (mhammond)
Summary: Outlook fails to classify

Initial Comment:
After updating to the latest CVS version today, I get the following
tracebacks whenever a mail arrives:

pythoncom error: Python error invoking COM method.
Traceback (most recent call last): File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 275, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 280, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 601, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 541, in _invokeex_ return apply(func, args) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\addin.py", line 210, in OnItemAdd ProcessMessage(msgstore_message, self.manager) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\addin.py", line 170, in ProcessMessage disposition = filter.filter_message (msgstore_message, manager) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\filter.py", line 15, in filter_message prob = mgr.score(msg) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\manager.py", line 440, in score return self.bayes.spamprob(bayes_tokenize(email), evidence) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\classifier.py", line 217, in chi2_spamprob clues = self._getclues(wordstream) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\classifier.py", line 441, in _getclues prob = self.probability(record) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\classifier.py", line 301, in probability assert hamcount <= nham exceptions.AssertionError: ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=730151&group_id=61702 From Denny.Roberts at ProMedica.org Wed Apr 30 12:36:15 2003 From: Denny.Roberts at ProMedica.org 
(Roberts, Denny) Date: Wed Apr 30 11:56:33 2003 Subject: [Spambayes] Spambayes disappeared! Message-ID: <290D043E5BE9D311B3EA009027B69DC301C26FB0@PHSNTEXC06> We are testing Spambayes on several machines in our IS department. We have it loaded on W2K Prof. / Outlook 2000 which is running very well, whacking spam left and right (Love it!!!). But on the WXP Prof. / Outlook XP machine Spambayes ran fine for two weeks then disappeared from Outlook this morning. It still shows up in Add/Remove programs, the directory is still there but won't load with Outlook XP. We have deinstalled then reinstalled to no avail. We are using version 002 of the plugin from Mark Hammond. Any ideas on where to look for the problem? Denny Roberts Technical Services Administrator, IS Paramount Care, Inc. Office: (419) 887-2101 Fax: (419) 887-2019 From neale at woozle.org Wed Apr 30 11:45:18 2003 From: neale at woozle.org (Neale Pickett) Date: Wed Apr 30 13:45:29 2003 Subject: [Spambayes] Training corrupts mbox files In-Reply-To: (Tim Stone - Four Stones Expressions's message of "Tue, 29 Apr 2003 22:22:21 -0500") References: Message-ID: Tim Stone - Four Stones Expressions writes: > I haven't started looking at the mbox problems yet, but in general it > would only eliminate mbox rewriting if you don't want *any* of the > spambayes headers added to the messages in the mbox. In the case of training on an entire mailbox, that would probably be okay. The mbox format is kind of wonky, so if we can avoid touching it, tant mieux. Neale From david at dsmcl.net Wed Apr 30 15:19:57 2003 From: david at dsmcl.net (David McLaughlin) Date: Wed Apr 30 14:20:00 2003 Subject: [Spambayes] Training corrupts mbox files In-Reply-To: References: <20030424153833.GA63651@ainaz.pair.com> <20030428183158.GA72269@ainaz.pair.com> Message-ID: <20030430181957.GB22909@ainaz.pair.com> Thanks for the suggestion! I did some digging at various stages in my filtering process, to see exactly where those headers were added. 
It seems an older version of Mail::Audit was buggy -- upgrading to the latest version fixed the problem. Back to training, -- David McLaughlin david@dsmcl.net > But David, if I were you I'd start trying to hunt down what's creating > those duplicate headers. It might be some sort of wonky procmail recipe > that just writes out headers and then drops through, but that's just a > shot in the dark guess. Heh, maybe it's hammiefilter <0.7 wink> From Jocelyn.Montjaux at microcell.ca Wed Apr 30 17:06:04 2003 From: Jocelyn.Montjaux at microcell.ca (Montjaux, Jocelyn) Date: Wed Apr 30 16:10:02 2003 Subject: [Spambayes] Outlook 98 compatibility Message-ID: <938DB8735797D511A6CB0008C7A4D20002C2C64E@SMTLPEXC03.microcell.ca> Well first kudos to all of you guys. I am using installer 002 for Outlook XP and it works great. I would like to install SpamBayes Outlook add-in on Outlook 98 on NT4 SP6 but I get: spambayes_addin.dll Unable to register the DLL/OCX: DllRegisterServer failed; code 0x00000000 Looks like that "good" ol' Outlook 98 does not support plugins in this way? Is there a workaround? Regards, Jocelyn From noreply at sourceforge.net Wed Apr 30 15:55:05 2003 From: noreply at sourceforge.net (SourceForge.net) Date: Wed Apr 30 17:55:14 2003 Subject: [Spambayes] [ spambayes-Bugs-730151 ] Outlook fails to classify Message-ID: Bugs item #730151, was opened at 2003-04-30 21:55 Message generated for change (Comment added) made by mhammond You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=498103&aid=730151&group_id=61702 Category: Outlook Group: None Status: Open Resolution: None Priority: 5 Submitted By: Fredrik Rodland (fmmr) Assigned to: Mark Hammond (mhammond) Summary: Outlook fails to classify Initial Comment: After updating to the latest CVS version today, I get the following tracebacks whenever a mails arrives: pythoncom error: Python error invoking COM method. 
Traceback (most recent call last): File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 275, in _Invoke_ return self._invoke_(dispid, lcid, wFlags, args) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 280, in _invoke_ return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 601, in _invokeex_ return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider) File "C:\PROGRA~1\_DEV\Python22\lib\site- packages\win32com\server\policy.py", line 541, in _invokeex_ return apply(func, args) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\addin.py", line 210, in OnItemAdd ProcessMessage(msgstore_message, self.manager) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\addin.py", line 170, in ProcessMessage disposition = filter.filter_message (msgstore_message, manager) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\filter.py", line 15, in filter_message prob = mgr.score(msg) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\ Outlook2000\manager.py", line 440, in score return self.bayes.spamprob(bayes_tokenize(email), evidence) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\classifier.py", line 217, in chi2_spamprob clues = self._getclues(wordstream) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\classifier.py", line 441, in _getclues prob = self.probability(record) File "c:\Programfiler\_UTIL\spambayes_cvs\spambayes\s pambayes\classifier.py", line 301, in probability assert hamcount <= nham exceptions.AssertionError: ---------------------------------------------------------------------- >Comment By: Mark Hammond (mhammond) Date: 2003-05-01 07:55 Message: Logged In: YES user_id=14198 This has come up before - I am afraid you really must re-train 
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=730151&group_id=61702

From seandarcy at hotmail.com Wed Apr 30 19:22:20 2003
From: seandarcy at hotmail.com (sean darcy)
Date: Wed Apr 30 18:22:54 2003
Subject: [Spambayes] Error on proxy connection & smtp proxy doesn't see spam
Message-ID:

I'm trying out sb on a linux redhat 9.0 machine using evolution.

The web based configuration doesn't work, so I created this for the .ini
file:

[pop3proxy]
servers:mailhost.mydomain.com      <---- this is a real domain
ports:120
persistent_storage_file:/opt/spam/spambayes/data/hammie.db
add_mailid_to:header
include_prob:True

[smtpproxy]
servers:mail.optonline.net
ports:220
ham_address:ham@spam
spam_address:spam@spam

In evolution, the mail server is set to localhost:120. The smtp server is
set to localhost:220.

I start up pop3proxy:

pop3proxy.py
Loading database... Done.
SMTP Listener on port 220 is proxying mail.optonline.net:25
Listener on port 120 is proxying mailhost.mydomain.com:110
User interface url is http://localhost:8880/

When evol gets the mail, it says getting pop summary, then hangs waiting
for the first message.
at the terminal window where I ran pop3proxy I get:

error: uncaptured python exception, closing channel <__main__.ServerLineReader connected at 0x8353a4c> (exceptions.TypeError:len() of unsized object
 [/usr/lib/python2.2/asyncore.py|poll|99]
 [/usr/lib/python2.2/asyncore.py|handle_read_event|396]
 [/usr/lib/python2.2/asynchat.py|handle_read|130]
 [/usr/bin/pop3proxy.py|found_terminator|147]
 [/usr/bin/pop3proxy.py|onServerLine|215]
 [/usr/bin/pop3proxy.py|onResponse|289]
 [/usr/bin/pop3proxy.py|onTransaction|390]
 [/usr/bin/pop3proxy.py|onRetr|459]
 [/usr/lib/python2.2/site-packages/spambayes/message.py|as_string|193]
 [/usr/lib/python2.2/site-packages/email/Message.py|as_string|109]
 [/usr/lib/python2.2/site-packages/email/Generator.py|flatten|102]
 [/usr/lib/python2.2/site-packages/email/Generator.py|_write|137]
 [/usr/lib/python2.2/site-packages/email/Generator.py|_write_headers|183]
 [/usr/lib/python2.2/site-packages/email/Header.py|encode|412]
 [/usr/lib/python2.2/site-packages/email/Header.py|_split|297]
 [/usr/lib/python2.2/site-packages/email/Charset.py|encoded_header_len|341])

Is this a bug - or did I misconfigure spambayes?

Second:

I tried to "train" sb by sending spam to spam@spam. This seemed to go OK, but when I go to the web page the "Review Messages" shows no untrained messages.
How do these training messages show up?

thanks for any help

sean

_________________________________________________________________
Add photos to your messages with MSN 8. Get 2 months FREE*.
http://join.msn.com/?page=features/featuredemail

From skip at pobox.com Wed Apr 30 18:29:22 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Apr 30 18:29:36 2003
Subject: [Spambayes] Training corrupts mbox files
In-Reply-To: <20030430181957.GB22909@ainaz.pair.com>
References: <20030424153833.GA63651@ainaz.pair.com> <20030428183158.GA72269@ainaz.pair.com> <20030430181957.GB22909@ainaz.pair.com>
Message-ID: <16048.20034.691269.109781@montanaro.dyndns.org>

    David> It seems an older version of Mail::Audit was buggy -- upgrading
    David> to the latest version fixed the problem.

That's all well and good, but I was thinking perhaps mboxtrain should maintain a little database parallel to its mbox file whose entries are keyed by message-id. It could store its results there and never have to monkey with the mbox file. "-f"orce training would simply be a matter of deleting all keys in that database at the start of the run.

(My apologies if this was suggested previously. I hadn't really been paying much attention to this thread, then had occasion to try out mboxtrain for the first time last night. It got me thinking about the problem.)

Skip

From tim at fourstonesExpressions.com Wed Apr 30 23:16:45 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Apr 30 23:18:05 2003
Subject: [Spambayes] Error on proxy connection & smtp proxy doesn't see spam
In-Reply-To:
Message-ID:

4/30/2003 5:22:20 PM, "sean darcy" wrote:

>
>at the terminal window where I ran pop3proxy I get:

This looks like a bug. Would you mind making a tar of a mail that causes this error, so I can recreate and diagnose?

>
>
>Is this a bug - or did I misconfigure spambayes?
>
>Second:
>
>I tried to "train" sb by sending spam to spam@spam. This seemed to go OK,
>but when I go to the web page the "Review Messages" shows no untrained
>messages.
>How do these training messages show up?

They simply are used to train your database, then discarded.
The Review Messages page is used for messages that the pop3proxy filters.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary, and those who don't.

From tim at fourstonesExpressions.com Wed Apr 30 23:21:30 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Apr 30 23:34:54 2003
Subject: [Spambayes] Training corrupts mbox files
In-Reply-To: <16048.20034.691269.109781@montanaro.dyndns.org>
Message-ID:

4/30/2003 5:29:22 PM, Skip Montanaro wrote:

>
>    David> It seems an older version of Mail::Audit was buggy -- upgrading
>    David> to the latest version fixed the problem.
>
>That's all well and good, but I was thinking perhaps mboxtrain should
>maintain a little database parallel to its mbox file whose entries are keyed
>by message-id. It could store its results there and never have to monkey
>with the mbox file. "-f"orce training would simply be a matter of deleting
>all keys in that database at the start of the run.

Yeah, we're already walking down that path with the messageinfodb that's maintained in message.py. This will certainly need some more work for mbox purposes, but it would be perfect if mboxes never needed to be rewritten. That's the goal, afaic.

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org

There are 10 kinds of people in the world: those who understand binary, and those who don't.
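
[Editor's illustration, not part of the thread.] The idea discussed above, a side database keyed by message-id so the mbox file never has to be rewritten, could look roughly like the sketch below. It is not the mboxtrain or messageinfodb implementation. The function name `train_new_messages`, the `train` callback, and the choice of the standard `dbm` module are all assumptions made for the example.

```python
import dbm
import mailbox


def train_new_messages(mbox_path, db_path, train, force=False):
    """Call train(msg) for each message not yet recorded in a side database.

    Hypothetical sketch: the mbox file is only read, never modified.
    Returns the number of messages passed to train().
    """
    trained = 0
    # "c" opens the side database read-write, creating it if needed.
    with dbm.open(db_path, "c") as seen:
        if force:
            # Forced retraining: delete every recorded message-id first.
            for key in list(seen.keys()):
                del seen[key]
        box = mailbox.mbox(mbox_path, create=False)
        try:
            for msg in box:
                msg_id = msg.get("Message-ID")
                if msg_id is None:
                    # No stable key; a real tool would need a fallback.
                    continue
                key = msg_id.strip().encode("utf-8")
                if key in seen:
                    continue  # already trained on an earlier run
                train(msg)
                seen[key] = b"trained"
                trained += 1
        finally:
            box.close()
    return trained
```

Running it twice over the same mbox would train each message once; passing `force=True` clears the recorded ids first, which mirrors the "-f"orce behaviour described in the thread.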