From Chris.Harward at siemens.com Mon Mar 1 14:23:57 2004
From: Chris.Harward at siemens.com (Harward Chris)
Date: Mon Mar 1 14:26:13 2004
Subject: [spambayes-dev] A Microsoft Installer Package (.msi) for Spambayes
Message-ID:

Trent

Question about using the installer. Can I assume that once the program is
placed on the PC, then the administrator can "activate" it?

I am not very familiar with this program yet. Do you have any links to
information on it?

Thanks

________________________________________
SIEMENS
Siemens Medical Solutions USA, Inc.
Chris Harward
TSE Uptime Services
110 MacAlyson Ct.
Cary NC 25711
Tel: (919) 468-7466
Fax: (919) 468-7490
e-mail: chris.harward @siemens.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20040301/82c502a4/attachment.html

From papaDoc at videotron.ca Wed Mar 3 08:49:59 2004
From: papaDoc at videotron.ca (papaDoc)
Date: Wed Mar 3 08:49:19 2004
Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id
Message-ID: <4045E287.6080007@videotron.ca>

Hi,

I want to know the difference between the X-Spambayes-MailId and Message-id,
and why not all applications add it to the mail (e.g. sb_filter.py).

Could it be useful for sb_filter to add it to the mail?

Remi

--
 /"\
 \ /
  X   ASCII Ribbon Campaign
 / \  Against HTML Email

From skip at pobox.com Wed Mar 3 09:28:03 2004
From: skip at pobox.com (Skip Montanaro)
Date: Wed Mar 3 09:28:18 2004
Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id
In-Reply-To: <4045E287.6080007@videotron.ca>
References: <4045E287.6080007@videotron.ca>
Message-ID: <16453.60275.576151.42012@montanaro.dyndns.org>

    Remi> I want to know the difference between the X-Spambayes-MailId and
    Remi> Message-id and why not all application add this to the mail (Ex
    Remi> sb_filter.py)

I've never heard of it. What's its purpose? Which applications add it?
Remi> Could it be usefull that sb_filter add this to the mail ? I suppose, once we know what it's used for. Skip From tim at fourstonesExpressions.com Wed Mar 3 09:30:52 2004 From: tim at fourstonesExpressions.com (Tim Stone) Date: Wed Mar 3 09:30:58 2004 Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id In-Reply-To: <16453.60275.576151.42012@montanaro.dyndns.org> References: <4045E287.6080007@videotron.ca> <16453.60275.576151.42012@montanaro.dyndns.org> Message-ID: On Wed, 3 Mar 2004 08:28:03 -0600, Skip Montanaro wrote: > > Remi> I want to know the difference between the X-Spambayes-MailId > and > Remi> Message-id and why not all application add this to the mail (Ex > Remi> sb_filter.py) > > I've never heard of it. What's its purpose? Which applications add it? It is used by the pop3proxy and imap filter for identifying a stored email with a cached copy. -- Exprimez vous!; Expr?sese; Esprimi te stesso; Express yourself! Tim Stone See my photography at www.fourstonesExpressions.com From skip at pobox.com Wed Mar 3 10:28:20 2004 From: skip at pobox.com (Skip Montanaro) Date: Wed Mar 3 10:28:31 2004 Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id In-Reply-To: References: <4045E287.6080007@videotron.ca> <16453.60275.576151.42012@montanaro.dyndns.org> Message-ID: <16453.63892.671348.346688@montanaro.dyndns.org> Remi> I want to know the difference between the X-Spambayes-MailId and Remi> Message-id and why not all application add this to the mail (Ex Remi> sb_filter.py) Skip> What's its purpose? Which applications add it? Tim> It is used by the pop3proxy and imap filter for identifying a Tim> stored email with a cached copy. So unlikely to be of much use to sb_filter.py unless Remi can provide some motivation. 
Skip

From tim at fourstonesExpressions.com Wed Mar 3 10:35:47 2004
From: tim at fourstonesExpressions.com (Tim Stone)
Date: Wed Mar 3 10:35:58 2004
Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id
In-Reply-To: <16453.63892.671348.346688@montanaro.dyndns.org>
References: <4045E287.6080007@videotron.ca>
    <16453.60275.576151.42012@montanaro.dyndns.org>
    <16453.63892.671348.346688@montanaro.dyndns.org>
Message-ID:

On Wed, 3 Mar 2004 09:28:20 -0600, Skip Montanaro wrote:

>
>     Remi> I want to know the difference between the X-Spambayes-MailId and
>     Remi> Message-id and why not all application add this to the mail (Ex
>     Remi> sb_filter.py)
>
>     Skip> What's its purpose? Which applications add it?
>
>     Tim> It is used by the pop3proxy and imap filter for identifying a
>     Tim> stored email with a cached copy.
>
> So unlikely to be of much use to sb_filter.py unless Remi can provide some
> motivation.

Well, I'm not very familiar with sb_filter... The other thing that ID is used
for (which I didn't remember right off) is to key the database that keeps
track of whether and how that message has been trained, and how it was
classified (which is necessary because we cannot guarantee that all mailers
will maintain the X-Spambayes-Classification header). This particular
function might be of use in sb_filter, but only if it is using the storage.py
module to manage its classifiers.

>
> Skip
>
> _______________________________________________
> spambayes-dev mailing list
> spambayes-dev@python.org
> http://mail.python.org/mailman/listinfo/spambayes-dev
>

--
Exprimez vous!; Exprésese; Esprimi te stesso; Express yourself!
Tim Stone See my photography at www.fourstonesExpressions.com From skip at pobox.com Wed Mar 3 10:40:59 2004 From: skip at pobox.com (Skip Montanaro) Date: Wed Mar 3 10:41:12 2004 Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id In-Reply-To: References: <4045E287.6080007@videotron.ca> <16453.60275.576151.42012@montanaro.dyndns.org> <16453.63892.671348.346688@montanaro.dyndns.org> Message-ID: <16453.64651.737142.982362@montanaro.dyndns.org> >>>>> "Tim" == Tim Stone writes: Tim> It is used by the pop3proxy and imap filter for identifying a Tim> stored email with a cached copy. Skip> So unlikely to be of much use to sb_filter.py unless Remi can Skip> provide some motivation. Tim> Well, I'm not very familiary with sb_filter... the other thing that Tim> id is used for (that I didn't remember right off) is to key the Tim> database that keeps track of if and how that message has been Tim> trained, and how it was classified (which is necessary because we Tim> cannot guarantee that all mailers will maintain the Tim> X-Spambayes-Classification header). This particular function might Tim> be of use in sb_filter, but only if it is using the storage.py Tim> module to manage its classifiers. sb_filter.py is just that, a filter, so it doesn't save any state about messages it processes. It's unlikely that it will see the same message again later except in a retraining scenario, in which case you would probably want it to forget everything it knew about the message from previous training. 
Skip

From tim at fourstonesExpressions.com Wed Mar 3 10:45:32 2004
From: tim at fourstonesExpressions.com (Tim Stone)
Date: Wed Mar 3 10:45:38 2004
Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id
In-Reply-To: <16453.64651.737142.982362@montanaro.dyndns.org>
References: <4045E287.6080007@videotron.ca>
    <16453.60275.576151.42012@montanaro.dyndns.org>
    <16453.63892.671348.346688@montanaro.dyndns.org>
    <16453.64651.737142.982362@montanaro.dyndns.org>
Message-ID:

On Wed, 3 Mar 2004 09:40:59 -0600, Skip Montanaro wrote:

> sb_filter.py is just that, a filter, so it doesn't save any state about
> messages it processes. It's unlikely that it will see the same message
> again later except in a retraining scenario, in which case you would
> probably want it to forget everything it knew about the message from
> previous training.

Yup, then I'd say there's no possible reason for it to insert the mailid
header.

--
Exprimez vous!; Exprésese; Esprimi te stesso; Express yourself!
Tim Stone
See my photography at www.fourstonesExpressions.com

From papaDoc at videotron.ca Wed Mar 3 11:23:34 2004
From: papaDoc at videotron.ca (papaDoc)
Date: Wed Mar 3 11:24:22 2004
Subject: [spambayes-dev] Difference between X-Spambayes-MailId and Message-id
In-Reply-To:
References: <4045E287.6080007@videotron.ca>
    <16453.60275.576151.42012@montanaro.dyndns.org>
    <16453.63892.671348.346688@montanaro.dyndns.org>
Message-ID: <40460686.3070203@videotron.ca>

Hi,

>> So unlikely to be of much use to sb_filter.py unless Remi can provide
>> some
>> motivation.
>
This is my motivation.

I'm using sb_server and sb_filter. At work I usually get my mail from my
server (a POP server) at home, which is running sb_filter with procmail for
3 users. But sometimes the server is not responding (hey, it is an old 486,
so you can't ask too much; P.S. if you have spare equipment you can send it
my way), so I switch to sb_server at work until I get back home and reboot
or solve the problem.
I want to be able to move the database and the trained ham and spam between
computers, so they need the same information. For example: oops, I trained a
spam as ham 2 days ago with the sb_server web interface, so I send the mail
to my server saying it is a spam, not a ham.

Skip said:

> sb_filter.py is just that, a filter, so it doesn't save any state about
> messages it processes. It's unlikely that it will see the same message
> again later except in a retraining scenario, in which case you would
> probably want it to forget everything it knew about the message from
> previous training.

Tim> Yup, then I'd say there's no possible reason for it to insert the mailid header.

I want to do something similar to sb_server, but using sb_filter,
sb_mboxtrain, some part of smtpproxy.py, and procmail.

This is my setup:

My ISP allows me to have n different addresses (papaDoc@videotron.ca,
x@videotron.ca, y@videotron.ca) but it saves them in the same account, so
fetchmail grabs all the mail. Procmail then starts working: it uses
sb_filter to filter the mail. The spam is sent to a spam account on my
server. The ham and unsure are sent to the different users. (I receive a
copy of all the unsure so I can train on them; I don't expect my users to
send me the unsure.) All the mail is cached in a directory for z days.

I want to be able to send the unsure or misclassified mail back to the
following addresses: papaDoc:spam@videotron.ca and papaDoc:ham@videotron.ca.
Procmail sees them as training mail and calls something (this may be a part
of smtpproxy taking email from the command line instead of listening on a
port), which then grabs the mail from the cache and trains.

Is this a good setup? Do you have something simpler to suggest?
Remi -- /"\ \ / X ASCII Ribbon Campaign / \ Against HTML Email From kennypitt at hotmail.com Fri Mar 5 11:31:04 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Fri Mar 5 11:32:10 2004 Subject: [spambayes-dev] Train-to-Exhaustion in Outlook Message-ID: Is there a way to force the Outlook addin to train on an already-trained message without rebuilding the entire database? I'm trying to do a "train-to-exhaustion" like re-inforcement of the recent ZIP file virus messages, but when I run training on them in the Training tab it ignores them because they've all been trained before. If this doesn't exist currently, I'd like to add a "Force training of previously-trained messages" checkbox under the current training options. -- Kenny Pitt From skip at pobox.com Fri Mar 5 11:59:57 2004 From: skip at pobox.com (Skip Montanaro) Date: Fri Mar 5 12:00:12 2004 Subject: [spambayes-dev] Train-to-Exhaustion in Outlook In-Reply-To: References: Message-ID: <16456.45581.658497.821539@montanaro.dyndns.org> Kenny> Is there a way to force the Outlook addin to train on an Kenny> already-trained message without rebuilding the entire database? Not that I'm aware of. If there's no format difference between the training database the Outlook plugin uses and what the other apps use you should be able to run the contrib/tte.py script over a suitable pair of files then stuff the result in place. Kenny> If this doesn't exist currently, I'd like to add a "Force Kenny> training of previously-trained messages" checkbox under the Kenny> current training options. Since train-to-exhaustion isn't something you can do incrementally, why not add a "train to exhaustion" checkbox (with suitable explanation) in the dialog where you retrain from scratch? 
Skip

From kennypitt at hotmail.com Fri Mar 5 12:06:25 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Fri Mar 5 12:07:27 2004
Subject: [spambayes-dev] Train-to-Exhaustion in Outlook
In-Reply-To:
Message-ID:

Kenny Pitt wrote:
> Is there a way to force the Outlook addin to train on an
> already-trained message without rebuilding the entire database? I'm
> trying to do a "train-to-exhaustion" like re-inforcement of the
> recent ZIP file virus messages, but when I run training on them in
> the Training tab it ignores them because they've all been trained
> before.

Started looking over the code and it appears that not only is this not
currently possible, I can't add it (without changing the behavior of a lot
of other things) because it would cause the following test in the Load
method of class ClassifierData to fail:

"""
        if len(message_db) != bayes.nham + bayes.nspam:
            print "*** - message database has %d messages - bayes has %d - something is screwey" % \
                  (len(message_db), bayes.nham + bayes.nspam)
"""

If I were to force a message to be trained a second time, the total number
of trained messages would be higher than the number of unique messages
trained on. Is this a correct assessment?

--
Kenny Pitt

From kennypitt at hotmail.com Fri Mar 5 12:38:30 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Fri Mar 5 12:39:32 2004
Subject: [spambayes-dev] Train-to-Exhaustion in Outlook
In-Reply-To: <16456.45581.658497.821539@montanaro.dyndns.org>
Message-ID:

Skip Montanaro wrote:
>     Kenny> If this doesn't exist currently, I'd like to add a "Force
>     Kenny> training of previously-trained messages" checkbox under the
>     Kenny> current training options.
>
> Since train-to-exhaustion isn't something you can do incrementally,
> why not add a "train to exhaustion" checkbox (with suitable
> explanation) in the dialog where you retrain from scratch?
If "train to exhaustion" proves to be a useful technique (and it certainly seems to be), I'd be all for including it as a training-strategy option in the initial Configuration Wizard and when rebuilding the database from scratch. It seems to be especially good for initial training because you always have the full messages for the training set available to work with. However, see another message that I just sent to the list. There appear to be some consistency checks in the Outlook plugin to make sure that the number of messages in the training database is the same as the number of messages trained on according to the message info database. Unfortunately, this is completely incompatible with the TTE concept of being able to train the same message more than once if necessary. Anyway, I was initially looking for something a little simpler. I just have a couple of problem messages that I would like to re-inforce by training them more than once. It's more in the spirit of TTE than the full implementation. -- Kenny Pitt From skip at pobox.com Fri Mar 5 12:42:50 2004 From: skip at pobox.com (Skip Montanaro) Date: Fri Mar 5 12:43:06 2004 Subject: [spambayes-dev] Train-to-Exhaustion in Outlook In-Reply-To: References: Message-ID: <16456.48154.169809.162266@montanaro.dyndns.org> Kenny> If I were to force a message to be trained a second time, the Kenny> total number of trained messages would be higher than the number Kenny> of unique messages trained on. Is this a correct assessment? Yup. S From schwaba at alltel.net Sun Mar 7 08:44:04 2004 From: schwaba at alltel.net (Art Schwab) Date: Sun Mar 7 08:49:39 2004 Subject: [spambayes-dev] Question Message-ID: When I click on the Delete As Spam button on my toolbox it tells me that my Spam folder isn't configured. It is configured and seems to be working correctly. What do I need to do to quit getting this message. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20040307/6248800c/attachment.html From tameyer at ihug.co.nz Sun Mar 7 17:38:47 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Mar 7 17:39:27 2004 Subject: [spambayes-dev] Train-to-Exhaustion in Outlook In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13054AFE99@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677A64@its-xchg4.massey.ac.nz> > Is there a way to force the Outlook addin to train on an > already-trained message without rebuilding the entire > database? Use Outlook to make a copy of the message, and train on that. I'm pretty sure that that works. Otherwise Skip's suggestion of using the tte.py script would work, although it's laborious since you'd have to use export.py to get the messages out of Outlook first. > If this doesn't exist currently, I'd like to add a "Force > training of previously-trained messages" checkbox under the > current training options. I think if testing does show that tte is worthwhile, at least for initial training, then this should just be silently done (by a default-to-on option) when training from scratch (including in the config wizard) is done. I think exposing it as an option would be too confusing for Outlook users. =Tony Meyer From kennypitt at hotmail.com Mon Mar 8 09:43:22 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Mon Mar 8 09:44:26 2004 Subject: [spambayes-dev] Train-to-Exhaustion in Outlook In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677A64@its-xchg4.massey.ac.nz> Message-ID: Tony Meyer wrote: >> Is there a way to force the Outlook addin to train on an >> already-trained message without rebuilding the entire >> database? > > Use Outlook to make a copy of the message, and train on that. I'm > pretty sure that that works. That's what I thought, too, so that's exactly what I tried. It may depend on Outlook version, but in 2003 it didn't seem to work. 
I have a Possible Spam folder in my main PST file, and a separate PST file containing one collection of all messages I have received and another of all messages that I have trained on. I tried copying messages from both collections in the separate PST file back to my Possible Spam folder and training them, but SpamBayes always reported that the message had already been trained. I'll play with some other combinations such as making copies within the same PST file and see if I get any different results. -- Kenny Pitt From sethg at GoodmanAssociates.com Mon Mar 8 14:41:38 2004 From: sethg at GoodmanAssociates.com (Seth Goodman) Date: Mon Mar 8 14:41:53 2004 Subject: [spambayes-dev] saving attachments Message-ID: I have been accumulating a message corpus for testing that is now becoming alarmingly large. My cup doth runneth over. AFAIK, SpamBayes does nothing with attachments. Neither the existence of one nor its name, size or contents are considered. While most of the spam in my corpus is attachment-free, the ham has lots of them and many are quite large (engineering drawing packages for review). It would reduce the size of the corpus .pst file considerably if I could delete all attachments. I have an inexpensive commercial tool that can do this, however, I don't want to if anyone is considering using attachments in future versions. FWIW, I don't see attachments as having much potential for spam detection. The number of tokens could easily dwarf the original message and need not be related to it in any way. The last thing we want to do is to encourage spammers to tack on huge attachments, though word salad attacks have been totally ineffective on my machine and most others who mentioned it on this list. However, including the full text of actual natural language works might have better luck, and I wouldn't want to be responsible for encouraging that practice, i.e. 
really bad Karma, hate mail and death threats, so I would think that continuing to ignore attachments is a good strategy. So, have there been any rumblings about possibly using attachment information? Am I reasonably safe in deleting all the attachments in my message corpus for the foreseeable future? -- Seth Goodman From popiel at wolfskeep.com Mon Mar 8 15:00:45 2004 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Mon Mar 8 15:00:52 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: Message from "Seth Goodman" of "Mon, 08 Mar 2004 13:41:38 CST." References: Message-ID: <20040308200045.7B8D92DE4C@cashew.wolfskeep.com> In message: "Seth Goodman" writes: >I have been accumulating a message corpus for testing that is now >becoming alarmingly large. My cup doth runneth over. AFAIK, SpamBayes >does nothing with attachments. Neither the existence of one nor its >name, size or contents are considered. My recollection is that the attachment headers (content type, name, and possibly a few other values) are indeed getting mined, just like for any other MIME part. We certainly ignore the body for non-text content, but I think the metadata is used. Of course, I haven't looked at the code in a while, so that may not be the case now... - Alex From tim at fourstonesExpressions.com Mon Mar 8 15:02:56 2004 From: tim at fourstonesExpressions.com (Tim Stone) Date: Mon Mar 8 15:03:09 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: <20040308200045.7B8D92DE4C@cashew.wolfskeep.com> References: <20040308200045.7B8D92DE4C@cashew.wolfskeep.com> Message-ID: On Mon, 08 Mar 2004 12:00:45 -0800, T. Alexander Popiel wrote: > In message: > "Seth Goodman" writes: > >> I have been accumulating a message corpus for testing that is now >> becoming alarmingly large. My cup doth runneth over. AFAIK, SpamBayes >> does nothing with attachments. Neither the existence of one nor its >> name, size or contents are considered. 
>
> My recollection is that the attachment headers (content type, name,
> and possibly a few other values) are indeed getting mined, just like
> for any other MIME part. We certainly ignore the body for non-text
> content, but I think the metadata is used.
>
> Of course, I haven't looked at the code in a while, so that may not
> be the case now...

I believe that's the case. If you can remove the attachments without
altering the headers/multipart boundaries/etc., then the resulting corpus
should be functionally equivalent to the original.

--
Exprimez vous!; Exprésese; Esprimi te stesso; Express yourself!
Tim Stone
See my photography at www.fourstonesExpressions.com

From tim.one at comcast.net Mon Mar 8 15:10:37 2004
From: tim.one at comcast.net (Tim Peters)
Date: Mon Mar 8 15:10:42 2004
Subject: [spambayes-dev] saving attachments
In-Reply-To:
Message-ID:

[Seth Goodman]
> I have been accumulating a message corpus for testing that is now
> becoming alarmingly large. My cup doth runneth over. AFAIK,
> SpamBayes does nothing with attachments. Neither the existence of
> one nor its name, size or contents are considered.

That's unique to the Outlook addin, and is because Outlook destroys the
original MIME structure. In other ways of using spambayes, all and only
attachments of MIME type text/* are tokenized, and tokens are synthesized
for all MIME sections, recording (from a comment in tokenizer.py):

        # Generate tokens for:
        #    Content-Type
        #        and its type= param
        #    Content-Dispostion
        #        and its filename= param
        #    all the charsets
        #
        # This has huge benefit for the f-n rate, and virtually no effect on
        # the f-p rate, although it does reduce the variance of the f-p rate
        # across different training sets (really marginal msgs, like a brief
        # HTML msg saying just "unsubscribe me", are almost always tagged as
        # spam now; before they were right on the edge, and now the
        # multipart/alternative pushes them over it more consistently).
> While most of the spam in my corpus is attachment-free, the ham has
> lots of them and many are quite large (engineering drawing packages
> for review).

They wouldn't have MIME type text/*, so only the synthesized tokens above
would be generated for them.

> It would reduce the size of the corpus .pst file considerably if I
> could delete all attachments. I have an inexpensive commercial tool
> that can do this, however, I don't want to if anyone is considering
> using attachments in future versions.
>
> FWIW, I don't see attachments as having much potential for spam
> detection.

Tests before said that their MIME types, file names, and charsets did help.

> The number of tokens could easily dwarf the original message and need
> not be related to it in any way. The last thing we want to do is to
> encourage spammers to tack on huge attachments,

They won't -- bandwidth is a primary cost for bulk emailers, and big
messages limit the rate at which they can send spam out.

> though word salad attacks have been totally ineffective on my machine
> and most others who mentioned it on this list. However, including
> the full text of actual natural language works might have better
> luck, and I wouldn't want to be responsible for encouraging that
> practice, i.e. really bad Karma, hate mail and death threats, so I
> would think that continuing to ignore attachments is a good strategy.

The Outlook addin ignores them only because nobody has endured the pain
necessary to try to guess what the original MIME structure might have been.

> So, have there been any rumblings about possibly using attachment
> information? Am I reasonably safe in deleting all the attachments in
> my message corpus for the foreseeable future?

No way of using spambayes makes any use of the *content* of non-text/*
attachments, so you're certainly safe deleting those. Big attachments very
rarely have a text type, so that would save the bulk of the space.
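
[Editor's sketch] The token synthesis Tim describes can be approximated as
follows. This is a minimal illustration built on Python's standard email
package, not the code in tokenizer.py. The function name
synthesize_mime_tokens and the sample message are hypothetical, and the
"charset:" and "filename:" prefixes are assumptions; only the
"content-type:" and "content-disposition:" prefixes are attested in this
thread. It walks every MIME section and emits structural tokens without
reading any attachment body:

```python
from email import message_from_string

# Hypothetical sample message: a text body plus one binary attachment.
SAMPLE = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "hello\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    'Content-Disposition: attachment; filename="Report.PIF"\n'
    "\n"
    "AAAA\n"
    "--XX--\n"
)


def synthesize_mime_tokens(raw_message):
    """Yield structural tokens for each MIME section of a raw message.

    Token prefixes other than content-type: and content-disposition:
    are assumptions made for this sketch.
    """
    msg = message_from_string(raw_message)
    # walk() visits the container first, then each nested part in order.
    for part in msg.walk():
        yield "content-type:" + part.get_content_type()
        charset = part.get_content_charset()
        if charset:
            yield "charset:" + charset
        disposition = part.get_content_disposition()
        if disposition:
            yield "content-disposition:" + disposition
        filename = part.get_filename()
        if filename:
            # Lowercase so Report.PIF and report.pif share one token.
            yield "filename:" + filename.lower()


if __name__ == "__main__":
    for token in synthesize_mime_tokens(SAMPLE):
        print(token)
```

Because only headers of each section are inspected, deleting the content of
a large non-text attachment while leaving its MIME headers and boundaries
intact would leave these tokens unchanged, which is the point made in the
replies above.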
From rmalayter at bai.org Mon Mar 8 15:18:53 2004 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Mar 8 15:18:57 2004 Subject: [spambayes-dev] saving attachments Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01E1A3D5@cliff.bai.org> [Tim Stone] >I believe that's the case. If you can remove the attachments >without altering the headers/multipart boundaries/etc., then >the resulting corpus should be functionally equivalent to the >original. I just looked at the spam clues for several messages I have with attachments, and SpamBayes 1.0a9 (Outlook Plug-in with default install options) appears to ignore attachments completely. I get no Content-Type or attachment name tokens at all. It seems these would be good spam clues, or at least good for increasing the antivirus capabilities inherent in SpamBayes filtering. I've trained on several dozen Bagel-and NetSky infected messages, and they all still score below 80%. If SpamBayes generated a token for the attachment file name extension (.PIF, .EXE, whatever) it would certainly help push these worm-generated messages over the top, and aid in the quarantine of new worm variants, would it not? Also, I noticed that I get only one 'Content-Type:text/plain' token for each message, even though many of the messages are 'Content-Type:multipart/alternative' with both text and HTML body parts, as well as Word or other attachment MIME parts. Is that a bug? Regards, Ryan From skip at pobox.com Mon Mar 8 15:31:23 2004 From: skip at pobox.com (Skip Montanaro) Date: Mon Mar 8 15:31:46 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01E1A3D5@cliff.bai.org> References: <792DE28E91F6EA42B4663AE761C41C2A01E1A3D5@cliff.bai.org> Message-ID: <16460.55323.693391.892090@montanaro.dyndns.org> Ryan> I just looked at the spam clues for several messages I have with Ryan> attachments, and SpamBayes 1.0a9 (Outlook Plug-in with default Ryan> install options) appears to ignore attachments completely. 
As Tim Peters indicated that's because nobody's expended the effort
necessary to reassemble the MIME information after Outlook has ripped it
apart. I have lots of content-type information in my database:

    % spamcounts -d tmp/tte.db -r '(?i)content-type'
    db: tmp/tte.db
    token,nspam,nham,spam prob
    content-type:text/x-vcard,0,1,0.155172413793
    content-type:application/x-pkcs7-signature,0,1,0.155172413793
    content-type:application/octet-stream,8,1,0.862421442234
    content-type:text/plain,308,408,0.410282368348
    content-type:multipart/related,14,10,0.562145227031
    content-type:image/bmp,0,1,0.155172413793
    content-type:multipart/alternative,162,123,0.54816049659
    content-type:text/html,266,134,0.646358868367
    content-type/type:multipart/alternative,8,7,0.512537871748
    content-type:message/delivery-status,4,3,0.548176609577
    content-type:multipart/report,5,3,0.600000679352
    content-type:audio/x-midi,1,0,0.844827586207
    content-type:image/jpeg,4,4,0.480634749866
    content-type:application/vnd.ms-excel,0,1,0.155172413793
    content-type:message/rfc822,7,11,0.372799628897
    content-type:multipart/mixed,60,40,0.579842359246
    content-type:plain/text,2,0,0.908163265306
    content-type:application/ms-tnef,4,0,0.949438202247
    content-type:application/x-msdownload,1,0,0.844827586207
    content-type:image/gif,6,5,0.524107121834
    content-type:application/pgp-signature,0,1,0.155172413793
    content-type:application/msword,0,1,0.155172413793
    content-type:multipart/signed,0,2,0.0918367346939
    content-type:text/rfc822-headers,1,1,0.483302411874
    content-type:image/pjpeg,0,1,0.155172413793
    content-type:multipart/digest,0,5,0.0412844036697

Skip

From sethg at GoodmanAssociates.com Mon Mar 8 16:05:30 2004
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Mon Mar 8 16:05:37 2004
Subject: [spambayes-dev] saving attachments
In-Reply-To:
Message-ID:

> -----Original Message-----
> From: spambayes-dev-bounces@python.org
> [mailto:spambayes-dev-bounces@python.org]On Behalf Of Tim Peters
> Sent: Monday, March 08, 2004 2:11
PM > To: sethg@GoodmanAssociates.com > Cc: SpamBayes-dev Forum > Subject: RE: [spambayes-dev] saving attachments > > > [Seth Goodman] > > I have been accumulating a message corpus for testing that is now > > becoming alarmingly large. My cup doth runneth over. AFAIK, > > SpamBayes does nothing with attachments. Neither the existence of > > one nor its name, size or contents are considered. > > That's unique to the Outlook addin, and is due to that > Outlook destroys the > original MIME structure. In other ways of using spambayes, > all and only > attachments of MIME type text/* are tokenized, and tokens are > synthesized > for all MIME sections, recording (from a comment in tokenizer.py): > > # Generate tokens for: > # Content-Type > # and its type= param > # Content-Dispostion > # and its filename= param > # all the charsets > # > # This has huge benefit for the f-n rate, and virtually no effect on > # the f-p rate, although it does reduce the variance of the f-p rate > # across different training sets (really marginal msgs, like a brief > # HTML msg saying just "unsubscribe me", are almost always tagged as > # spam now; before they were right on the edge, and now the > # multipart/alternative pushes them over it more consistently). > > > > While most of the spam in my corpus is attachment-free, the ham has > > lots of them and many are quite large (engineering drawing packages > > for review). > > They wouldn't have MIME type text/*, so only the synthesized > tokens above > would be generated for them. Unless I am mistaken, most of these synthesized tokens are not generated by the Outlook plug-in. I did an experiment with a message that had an html attachment. I copied the message, deleted the attachment, marked it as unread and filtered it again (I wasn't sure if "show spam clues" retokenizes and reclassifies each time). It had the same number of total tokens and significant tokens as the copy with the attachment. 
The only token that I noticed that relates to message structure was: 'content-disposition:inline' Perhaps I missed the others. I've zipped up the message with and without the html attachment and the spam clues page for each one. The message headers still seem to include the multi-part structure after removing the attachment, but I'm not sure if it is still good enough for other uses of SpamBayes. Could someone peruse these and offer an opinion? > > > It would reduce the size of the corpus .pst file considerably if I > > could delete all attachments. I have an inexpensive commercial tool > > that can do this, however, I don't want to if anyone is considering > > using attachments in future versions. > > > > FWIW, I don't see attachments as having much potential for spam > > detection. > > Tests before said that their MIME types, file names, and > charsets did help. I stand corrected. In that case, it's a pity that the Outlook plug-in can't at least take advantage of those items, though if Outlook destroys them, that's impossible unless I switch over to the proxy. > > > The number of tokens could easily dwarf the original > > message and need > > not be related to it in any way. The last thing we want to do is to > > encourage spammers to tack on huge attachments, > > They won't -- bandwidth is a primary cost for bulk emailers, and big > messages limit the rate at which they can send spam out. This makes sense. Thanks for correcting my misconceptions. > > > though word salad attacks have been totally ineffective on > > my machine > > and most others who mentioned it on this list. However, including > > the full text of actual natural language works might have better > > luck, and I wouldn't want to be responsible for encouraging that > > practice, i.e. really bad Karma, hate mail and death threats, so I > > would think that continuing to ignore attachments is a good > > strategy. 
> > The Outlook addin ignores them only because nobody has > endured the pain > necessary to try to guess what the original MIME structure > might have been. That does sound painful, especially since the Outlook internals are not documented. I guess the only realistic way around this is a proxy. I actually wouldn't mind using a proxy if I could keep some semblance of the present Outlook integration, but that's a whole separate project. I don't know if tokenizing and classifying the messages before Outlook mangles them has other advantages. I can see a mountain of problems, though, such as storing an RFC-compliant copy of the message in addition to Outlook's .pst store and keeping track of both. It could also cause inconsistencies if you want to train on a message at some point in the future when you only have the .pst version available. Sounds like a bad idea the more I think about it. Thanks for the replies. -- Seth Goodman -------------- next part -------------- A non-text attachment was scrubbed... Name: messages.zip Type: application/x-zip-compressed Size: 31067 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20040308/8f94701a/messages-0001.bin From tim.one at comcast.net Mon Mar 8 16:48:09 2004 From: tim.one at comcast.net (Tim Peters) Date: Mon Mar 8 16:48:13 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: Message-ID: [Seth Goodman] > Unless I am mistaken, most of these synthesized tokens are not > generated by the Outlook plug-in. That's correct. Outlook destroys the original MIME structure, so it's simply not possible for the Outlook-addin way of running spambayes to synthesize tokens derived from the MIME armor Outlook destroyed. Again, this is unique to the Outlook-addin way of running spambayes. All other ways of running spambayes do produce these tokens -- and they're valuable. > ... 
> The only token that I noticed that relates to message structure was: > > 'content-disposition:inline' Outlook leaves the MIME decorations in the email headers alone, so we get to see that one. It destroys all MIME decorations that were distributed throughout the rest of the original msg. > ... > The message headers still seem to include the multi-part structure > after removing the attachment, but I'm not sure if it is still good > enough for other uses of SpamBayes. The MIME structure of the original email, which Outlooks sees but does not show to us, is distributed throughout the entire payload of the original message. Outlook destroys all of the info before it shows the msg to us, except for the bit living in the headers. > ... > I stand corrected. In that case, it's a pity that the Outlook plug-in > can't at least take advantage of those items, though if Outlook > destroys them, that's impossible unless I switch over to the proxy. Right on both counts. We could do a better job of trying to *synthesize* MIME structure in the Outlook addin, but it's painful and nobody yet has bothered. Note that Outlook itself synthesizes MIME structure again if, e.g., you forward a message with attachments (MIME is the only standard way to ship that to another machine, so Outlook is forced to play along then; the underlying problem here is that Outlook's message store was designed before there were common standards for dealing with modern email, and so it's an endlessly nasty battle at the Outlook API level). From rmalayter at bai.org Mon Mar 8 18:43:19 2004 From: rmalayter at bai.org (Ryan Malayter) Date: Mon Mar 8 18:43:24 2004 Subject: [spambayes-dev] saving attachments Message-ID: <792DE28E91F6EA42B4663AE761C41C2A01E1A3D8@cliff.bai.org> [Tim Peters] > >Right on both counts. We could do a better job of trying to >*synthesize* MIME structure in the Outlook addin, but it's >painful and nobody yet has bothered. 
This doesn't seem like it would be that hard, I will take a look at the relevant portions of the code. I've done some work with CDO and attachments for other projects, so I know a bit about how Outlook handles them. >Note that Outlook itself synthesizes MIME structure again if, e.g., >you forward a message with attachments (MIME is the > only standard >way to ship that to another machine, so Outlook is forced to play >along then; Perhaps we could use that process as a bootstrap of some sort? Forward the message to some strange internet recipient to get the MIME-formatted message from outlook? Probably not, though, since it still hides this from the plug-in object model, right? >the underlying problem here is that Outlook's message store >was designed before there were common standards for dealing >with modern email, and so it's an endlessly nasty battle at >the Outlook API level). Exchange Server 2000 and newer store all RFC-formatted messages (i.e. inbound mail from internet sources) in an unaltered form in the "stream store" rather than in the main Exchange database. You can actually access these raw RFC-formatted messages on the Exchange server side through the Exchange file system driver or CDO-Exchange. Also, I know that Outlook 2003 has enhancements with regards to message-format standards compliance as well, and Exchange 2003 will not convert messages from the standard RFC format if you read them with Exchange 2003. (All Outlook clients earlier than 2003 request all messages in the old MAPI format from the server, regardless of the originating format. Exchange 2000/2003 convert messages from RFC format on the fly when talking with older Outlook clients.) So perhaps this problem will solve itself in future Outlook versions? Should we just wait for Microsoft to do the work for us? Or maybe there is new stuff in the Outlook 2003 API to address this? 
From looking at the "What's New in Outlook 2003" articles on MSDN, it doesn't appear so, but I'm no expert in this area. I also found this MAPI-to-RFC822 conversion utility, which appears to be at least somewhat open source, and may prove a useful guide for reassembling Outlook messages from outlook into RFC format for SpamBayes: http://shorterlink.com/?RJ0NQV Regards, Ryan From mhammond at keypoint.com.au Mon Mar 8 21:15:50 2004 From: mhammond at keypoint.com.au (Mark Hammond) Date: Mon Mar 8 21:16:12 2004 Subject: [spambayes-dev] Train-to-Exhaustion in Outlook In-Reply-To: Message-ID: <06b301c4057c$718c9a20$0200a8c0@eden> > Tony Meyer wrote: > >> Is there a way to force the Outlook addin to train on an > >> already-trained message without rebuilding the entire > >> database? > > > > Use Outlook to make a copy of the message, and train on that. I'm > > pretty sure that that works. > > That's what I thought, too, so that's exactly what I tried. It may > depend on Outlook version, but in 2003 it didn't seem to work. Actually, we jump through hoops to *prevent* that trick working. When we move an object to another folder, we lose "Entry ID" - ie, just like a "Copy" operation. Thus, we use a different field to record the "id" of the item - one that persists across move and copy operations. > I'll play with some other combinations such as making copies > within the > same PST file and see if I get any different results. Hopefully you wont Mark From mhammond at keypoint.com.au Tue Mar 9 04:06:20 2004 From: mhammond at keypoint.com.au (Mark Hammond) Date: Tue Mar 9 04:06:48 2004 Subject: [spambayes-dev] RE: Translate configuration Page in spambay? In-Reply-To: <404D678D.540B7455@bulwar.pl> Message-ID: <084f01c405b5$ca78a460$0200a8c0@eden> Unfortunately, neither Barry nor I are the best people to answer this. I have forwarded your question to the spambayes-dev mailing list - hopefully someone will chime up there, as this sounds like a very worthwhile thing to support. 
Regards, Mark > -----Original Message----- > From: Dwukierunkowe Gadu Gadu na WAP-Administrator > [mailto:info2@bulwar.pl] > Sent: Tuesday, 9 March 2004 5:43 PM > To: mhammond@users.sourceforge.net; bwarsaw@users.sourceforge.net > Subject: Translate configuration Page in spambay? > > > I would like to translate the configuration page to POLISH > language for > our users. > > Could you show me the way how to do it in the easiest way? > > At the begining I thought that I would be able to do it in > binary files, > but I looked through the files and I think that it will be > necessary to > recompile the package. I looked through the sources but I didn't get > idea where the configuration page is keept and how to edit it in the > easiest way. > > I will be use only POP3 proxy. > > PS > I have wrote to (bwarsaw at users.sourceforge.net) developer, but I > didn't received any reply or maybe I missed the answer > because i used my > old e-mail that it is full of spam. So please reply to this adress. From barry at python.org Tue Mar 9 09:54:49 2004 From: barry at python.org (Barry Warsaw) Date: Tue Mar 9 09:54:54 2004 Subject: [spambayes-dev] RE: Translate configuration Page in spambay? In-Reply-To: <084f01c405b5$ca78a460$0200a8c0@eden> References: <084f01c405b5$ca78a460$0200a8c0@eden> Message-ID: <1078844088.11031.53.camel@anthem.wooz.org> On Tue, 2004-03-09 at 04:06, Mark Hammond wrote: > Unfortunately, neither Barry nor I are the best people to answer this. I > have forwarded your question to the spambayes-dev mailing list - hopefully > someone will chime up there, as this sounds like a very worthwhile thing to > support. I agree, it would be very cool to have. Python has everything you need to localize and internationalize applications. -Barry From kennypitt at hotmail.com Tue Mar 9 13:17:11 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Mar 9 13:18:15 2004 Subject: [spambayes-dev] RE: Translate configuration Page in spambay? 
In-Reply-To: <1078844088.11031.53.camel@anthem.wooz.org> Message-ID: Barry Warsaw wrote: > On Tue, 2004-03-09 at 04:06, Mark Hammond wrote: >> Unfortunately, neither Barry nor I are the best people to answer >> this. I have forwarded your question to the spambayes-dev mailing >> list - hopefully someone will chime up there, as this sounds like a >> very worthwhile thing to support. > > I agree, it would be very cool to have. Python has everything you > need to localize and internationalize applications. If you work from source, you can do basic translation by modifying ui.html. Just install resourcepackage so that ui_html.py will get rebuilt from any changes. Unfortunately there are still quite a few literal strings scattered throughout the .py files. The option descriptions in Options.py are a prime example, as well as the statistics and status messages. We just need someone with enough time on their hands to move all those literals either into ui.html or out to a message file using gettext or something similar. -- Kenny Pitt From kennypitt at hotmail.com Tue Mar 9 13:23:19 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Mar 9 13:24:22 2004 Subject: [spambayes-dev] Train-to-Exhaustion in Outlook In-Reply-To: <06b301c4057c$718c9a20$0200a8c0@eden> Message-ID: Mark Hammond wrote: >> I'll play with some other combinations such as making copies >> within the >> same PST file and see if I get any different results. > > Hopefully you wont And as usual, Mark is right. Didn't make a darned bit of difference. Would it make sense to try to provide a way to multi-train messages? We could probably hide it away in some "Advanced" menu or something, but I don't want to complicate things for the average user. I doubt Tim's sister has any interest in that level of detail. 
--
Kenny Pitt

From skip at pobox.com Tue Mar 9 17:20:01 2004
From: skip at pobox.com (Skip Montanaro)
Date: Tue Mar 9 17:20:19 2004
Subject: [spambayes-dev] unicode error in sb_dbexpimp.py
Message-ID: <16462.17169.913721.623966@montanaro.dyndns.org>

I decided to try speeding up my train-to-exhaustion runs (they are taking
longer per round and running more rounds now that I'm approaching 1,000
total training messages) by training to a pickle then using sb_dbexpimp.py
to dump first to CSV then to a database file. I got this error when
importing from tte.csv to tte.db:

    Importing database tte.db using file tte.csv
    Traceback (most recent call last):
      File "/Users/skip/local/bin/sb_dbexpimp.py", line 267, in ?
        runImport(dbFN, useDBM, newDBM, flatFN)
      File "/Users/skip/local/bin/sb_dbexpimp.py", line 199, in runImport
        word = uunquote(word)
      File "/Users/skip/local/bin/sb_dbexpimp.py", line 115, in uunquote
        return unicode(urllib.unquote(s), 'utf-8')
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 7: unexpected end of data

It's barfing on this input string

    %A5x%A5_%A5%AB%ABh%B8q%B0%CF

which urllib.unquote()s to

    \xa5x\xa5_\xa5\xab\xabh\xb8q\xb0\xcf

which is (apparently) invalid utf-8. At first glance the uquote() and
uunquote() function definitions seemed okay, but after further reflection I
wonder why urllib.(un)?quote() are being called.

Skip

From tameyer at ihug.co.nz Tue Mar 9 17:51:48 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Mar 9 17:52:44 2004
Subject: [spambayes-dev] RE: Translate configuration Page in spambay?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13056454C4@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B08@its-xchg4.massey.ac.nz>

> If you work from source, you can do basic translation by
> modifying ui.html. Just install resourcepackage so that
> ui_html.py will get rebuilt from any changes.
Modifying ui.html will certainly change the vast majority of the web interface, but almost none of the configuration page. Almost everything on that page is dynamically generated from the stuff in Options.py, and you'd have to do a lot of talking to convince me that it's worthwhile taking it out of there. OTOH, if you translate Options.py *and* ui.html, that's almost everything. > Unfortunately there are still quite a few literal strings > scattered throughout the .py files. The option descriptions > in Options.py are a prime example, as well as the statistics > and status messages. We just need someone with enough time > on their hands to move all those literals either into ui.html > or out to a message file using gettext or something similar. I'm +1 on moving literals outside Options.py and ui.html, where possible. Two things that spring to mind: 1. UserInterface.py changes True and False to "Yes" and "No"; the idea is that these are more understandable to the average user. These could certainly be consolidated into two variables that are referenced throughout the file. Either Options[Class].py or ui.html could then have the actual strings, and the variables loaded from there. 2. UserInterface.py currently has all the help pages. As the comments in the file say, this isn't the right place for them, it was simply convenient at the time. I'd be very happy for these to be moved out into some other place, which would also make translation easier. The pre-formatted bug reporting text should stay in English unless it's changed to send to somewhere other than spambayes@python.org. I'm happy to help with this, if necessary, so feel free to ask (but preferably on the list, rather than in private email, which languishes when I'm busy). 
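
A minimal sketch of the gettext route for point 1 above, assuming a hypothetical message domain named "spambayes" and a hypothetical "locale" directory holding compiled catalogs; neither exists in the SpamBayes source, and the helper name yes_no is made up for this example. It is written for a current Python.

```python
import gettext

# Hypothetical domain and catalog directory. With fallback=True, a missing
# catalog yields a NullTranslations object that returns strings unchanged.
translation = gettext.translation(
    "spambayes", localedir="locale", languages=["pl"], fallback=True
)
_ = translation.gettext


def yes_no(value):
    """Return the user-facing label for a boolean option value."""
    # The two literals live in one place, so a translator only needs
    # to supply them once in the message catalog.
    return _("Yes") if value else _("No")


label_on = yes_no(True)
label_off = yes_no(False)
```

Keeping the literals behind a single helper like this is what lets the strings move out to a message file later without touching every caller.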
=Tony Meyer From skip at pobox.com Tue Mar 9 17:55:38 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Mar 9 17:55:48 2004 Subject: [spambayes-dev] Re: unicode error in sb_dbexpimp.py Message-ID: <16462.19306.489664.552012@montanaro.dyndns.org> I wrote: At first glance the uquote() and uunquote() function definitions seemed okay, but after further reflection I wonder why urllib.(un)?quote() are being called. Shortly thereafter I realized why those calls are there. The code is using a ` as its field separator. The quote/unquote dance avoids problems with them. A better solution, IMO, would be to use the csv module. I have a patch ready to check in. Unfortunately that would restrict sb_dbexpimp.py to Python 2.3 or greater. Feedback? Skip From popiel at wolfskeep.com Tue Mar 9 19:18:37 2004 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Tue Mar 9 19:18:42 2004 Subject: [spambayes-dev] Re: unicode error in sb_dbexpimp.py In-Reply-To: Message from Skip Montanaro of "Tue, 09 Mar 2004 16:55:38 CST." <16462.19306.489664.552012@montanaro.dyndns.org> References: <16462.19306.489664.552012@montanaro.dyndns.org> Message-ID: <20040310001837.E0D452DDA3@cashew.wolfskeep.com> In message: <16462.19306.489664.552012@montanaro.dyndns.org> Skip Montanaro writes: > >A better solution, IMO, would be to use the csv module. I have a patch >ready to check in. Unfortunately that would restrict sb_dbexpimp.py to >Python 2.3 or greater. Feedback? I fear I'm nearly alone in my insistence on 2.2 compatibility; unfortunately, there is no more recent python packaged for Debian stable systems (and 2.1 is still default there!). If we start requiring more recent python, I will have a very annoying time working around the packaging system to use it... (and/or figuring out how to backport and package a more recent python from testing/unstable). 
- Alex From tameyer at ihug.co.nz Tue Mar 9 19:41:03 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 9 19:41:46 2004 Subject: [spambayes-dev] Re: unicode error in sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1305645566@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B09@its-xchg4.massey.ac.nz> > Shortly thereafter I realized why those calls are there. The > code is using a ` as its field separator. The quote/unquote > dance avoids problems with them. > > A better solution, IMO, would be to use the csv module. I > have a patch ready to check in. Unfortunately that would > restrict sb_dbexpimp.py to Python 2.3 or greater. Feedback? How 2.3-specific is csv.py? Could there be a compatCsv.py file added, like compatSets? Alternatively, what about using tab as a field separator? It seems unlikely that there'll ever be a token that contains a tab, so it wouldn't have to worry about that. Speaking of patches to sb_dbexpimp.py, I'd like to have it use the database in the configuration (if it exists) by default, since the other scripts (like spamcounts.py) do, and I end up typing "c:\documents and settings\tameyer.massey\application data\spambayes\default_bayes_database.db" all the time, which is quite a mouthful. Of course, the script takes more than one database name, which is a problem. Any ideas for a way to do this nicely? Or am I just too lazy? =Tony Meyer From tameyer at ihug.co.nz Tue Mar 9 19:50:19 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 9 19:51:02 2004 Subject: [spambayes-dev] Re: unicode error in sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13056455A5@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B0A@its-xchg4.massey.ac.nz> > A better solution, IMO, would be to use the csv module. I > have a patch ready to check in. Unfortunately that would > restrict sb_dbexpimp.py to Python 2.3 or greater. Feedback? 
I meant to also say that I, too, sometimes use SpamBayes with 2.2 and can't
move to 2.3 (it's on our cluster here), but only the testtools, so it
wouldn't actually affect me in practice to have this one script not work.

OTOH, when 1.0a7 (I think it was that one) was released with some scripts
that were missing the 2.2 compat code, there were a few people that mailed
the list about it, so there must be at least a few others. I'd have no idea
about how many actually use sb_dbexpimp.py, though.

=Tony Meyer

From skip at pobox.com Tue Mar 9 20:16:58 2004
From: skip at pobox.com (Skip Montanaro)
Date: Tue Mar 9 20:17:07 2004
Subject: [spambayes-dev] Re: unicode error in sb_dbexpimp.py
In-Reply-To: <20040310001837.E0D452DDA3@cashew.wolfskeep.com>
References: <16462.19306.489664.552012@montanaro.dyndns.org> <20040310001837.E0D452DDA3@cashew.wolfskeep.com>
Message-ID: <16462.27786.904248.565584@montanaro.dyndns.org>

    >> A better solution, IMO, would be to use the csv module. I have a
    >> patch ready to check in. Unfortunately that would restrict
    >> sb_dbexpimp.py to Python 2.3 or greater. Feedback?

    Alex> I fear I'm nearly alone in my insistence on 2.2 compatibility;

I don't think so. I'm generally in favor of it, though I run from Python CVS
(aka 2.4a0 at the moment). I'll see if I can wangle my way around the
problem so that for 2.2 users something else will be used.

Skip

From skip at pobox.com Tue Mar 9 20:23:08 2004
From: skip at pobox.com (Skip Montanaro)
Date: Tue Mar 9 20:23:14 2004
Subject: [spambayes-dev] Re: unicode error in sb_dbexpimp.py
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B09@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1305645566@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F13026F2B09@its-xchg4.massey.ac.nz>
Message-ID: <16462.28156.23909.45386@montanaro.dyndns.org>

    Tony> How 2.3-specific is csv.py?

Not terribly, I don't think, though I've never tried building it under 2.2.
(Are you thinking we could deliver it with Spambayes for people running 2.2? Tony> Could there be a compatCsv.py file added, like compatSets? Probably wouldn't be too hard to generate the subset we need. Tony> Alternatively, what about using tab as a field separator? It Tony> seems unlikely that there'll ever be a token that contains a tab, Tony> so it wouldn't have to worry about that. Like it or not, while the probability of a collision is substantially reduced, I don't think you can ignore it. Tony> Speaking of patches to sb_dbexpimp.py, I'd like to have it use the Tony> database in the configuration (if it exists) by default, since the Tony> other scripts (like spamcounts.py) do, and I end up typing Tony> "c:\documents and settings\tameyer.massey\application Tony> data\spambayes\default_bayes_database.db" all the time, which is Tony> quite a mouthful. Of course, the script takes more than one Tony> database name, which is a problem. Any ideas for a way to do this Tony> nicely? Or am I just too lazy? The only problem is that sb_dbexpimp.py works a little differently than most apps. I suppose if you omitted the -d and -p flags it could default to use the database in the configuration though. Assuming a full round-trip you'd still have to use -d or -p for either the import or export. It sounds like you use it a lot. This is my first brush with it, and even there its use is embedded in a shell script so typing long filenames isn't a big deal. What's your use case that you need it so frequently? 
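
To make the separator concern concrete, here is a small sketch of a csv round trip in which a token contains a backquote, a comma, a tab and a double quote. It uses a current Python and its standard csv module, not the 2.2/2.3 code discussed in this thread; the function names and the (token, ham count, spam count) row layout are assumptions for illustration, not the format sb_dbexpimp.py actually writes.

```python
import csv
import io


def export_rows(rows):
    """Serialize rows to CSV text; the writer quotes fields as needed."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def import_rows(text):
    """Parse CSV text back into a list of rows (all fields as strings)."""
    return [row for row in csv.reader(io.StringIO(text, newline=""))]


# A token containing every character that might be picked as a separator.
rows = [["odd`token,with\ttab and \"quote\"", "3", "1"]]
round_tripped = import_rows(export_rows(rows))
```

Because the csv writer quotes any field that contains the delimiter or the quote character, no separate quote/unquote step is needed to protect the separator.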
Skip From skip at pobox.com Tue Mar 9 23:40:40 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Mar 9 23:41:21 2004 Subject: [spambayes-dev] Re: unicode error in sb_dbexpimp.py In-Reply-To: <16462.28156.23909.45386@montanaro.dyndns.org> References: <1ED4ECF91CDED24C8D012BCF2B034F1305645566@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F13026F2B09@its-xchg4.massey.ac.nz> <16462.28156.23909.45386@montanaro.dyndns.org> Message-ID: <16462.40008.469992.694437@montanaro.dyndns.org> Tony> How 2.3-specific is csv.py? Skip> Not terribly, I don't think, though I've never tried building it Skip> under 2.2. (Are you thinking we could deliver it with Spambayes Skip> for people running 2.2? Alas, I forgot that the csv module comes in two pieces, Lib/csv.py and Modules/_csv.c. The C chunk pretty much means it would be impractical to deliver with Spambayes as we can't assume most folks will have a C compiler handy. FWIW, _csv.c taken from Python CVS seems to still compile under 2.2.3. (2.2 compatibility was a goal when we created the module.) Tony> Could there be a compatCsv.py file added, like compatSets? Skip> Probably wouldn't be too hard to generate the subset we need. I'll try to take a look at this option tomorrow. Skip From info2 at bulwar.pl Wed Mar 10 05:44:15 2004 From: info2 at bulwar.pl (Dwukierunkowe Gadu Gadu na WAP-Administrator) Date: Wed Mar 10 05:46:05 2004 Subject: [spambayes-dev] Translate spambay? Compile procedure? Message-ID: <404EF17F.317C465C@bulwar.pl> I have founded all files to translate in sources. but 1. what is the procedure to compile product? 2. what is the procedure to make installer? From anthony at interlink.com.au Wed Mar 10 19:21:16 2004 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed Mar 10 19:21:26 2004 Subject: [spambayes-dev] req for a radio interview on SB Message-ID: <404FB0FC.9040904@interlink.com.au> I've received a request to do a brief chat on some radio show about technology talking about SpamBayes. 
Does anyone have a burning desire to be on radio? I'm happy enough to do it, but if someone's got a really strong desire to do this, let me know. BTW, how many other SB folks are going to be at PyCon? I'm going to be there (obviously, if you check the list of talks ;) - hope to see a bunch of you there! Anthony -- Anthony Baxter It's never too late to have a happy childhood. From tameyer at ihug.co.nz Wed Mar 10 22:07:50 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Mar 10 22:10:01 2004 Subject: [spambayes-dev] Translate spambay? Compile procedure? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13056456BF@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B17@its-xchg4.massey.ac.nz> > 1. what is the procedure to compile product? There isn't one, really (unless you mean to make the binaries for the installer, see below). If you have Python installed, then you don't need to do anything else. The exception is ui.html - to turn this into ui_html.py you need to have resourcepackage installed , and then it'll automatically get created whenever ui.html is newer than ui_html.py. > 2. what is the procedure to make installer? You run the setup_all.py script in the windows/py2exe directory. This creates the binaries that the installer packages. You then run the spambayes.iss file through Inno Setup and the installer is created. If you're going to do that, then presumably there is more translation to be done there (IIRC Inno supports this quite nicely). Note that the installer includes both the Outlook plug-in and sb_server. If you're only translating one of them, then it would probably be better to create a different installer that just had that product (as well as a source archive, or set of patches). 
=Tony Meyer

From barry at python.org Thu Mar 11 10:00:38 2004
From: barry at python.org (Barry Warsaw)
Date: Thu Mar 11 10:00:52 2004
Subject: [spambayes-dev] req for a radio interview on SB
In-Reply-To: <404FB0FC.9040904@interlink.com.au>
References: <404FB0FC.9040904@interlink.com.au>
Message-ID: <1079017237.6234.77.camel@anthem.wooz.org>

On Wed, 2004-03-10 at 19:21, Anthony Baxter wrote:
> I've received a request to do a brief chat on some radio show about
> technology talking about SpamBayes. Does anyone have a burning desire
> to be on radio?

I've had one for 20 years. Oh wait, you don't mean as a fabulously wealthy
rock star.

> I'm happy enough to do it, but if someone's got a
> really strong desire to do this, let me know.

C'mon Uncle Timmy!

-Barry

From info2 at bulwar.pl Thu Mar 11 10:49:08 2004
From: info2 at bulwar.pl (Dwukierunkowe Gadu Gadu na WAP-Administrator)
Date: Thu Mar 11 10:51:04 2004
Subject: [spambayes-dev] Translate spambay? Compile procedure?
References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B17@its-xchg4.massey.ac.nz>
Message-ID: <40508A74.9AAAE23@bulwar.pl>

> > 2. what is the procedure to make installer?
>
> You run the setup_all.py script in the windows/py2exe directory. This

I tried but...

The following modules appear to be missing
['Entrian', 'MySQLdb', 'bsddb3', 'gdbm', 'psycopg',
'pywin.debugger.dbgcon', 'pywin.dialogs', 'pywin.dialogs.list',
'resource', 'twisted', 'twisted.copyright', 'twisted.internet',
'twisted.internet.app', 'twisted.internet.defer',
'twisted.internet.protocol', 'twisted.protocols.imap4']

I need some modules, and it is quite confusing to look for each module
with a Windows installer... maybe you have a pack of all the modules that
are needed to compile it (on one web site)?

From kennypitt at hotmail.com Thu Mar 11 11:13:04 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Thu Mar 11 11:14:13 2004
Subject: [spambayes-dev] Translate spambay? Compile procedure?
In-Reply-To: <40508A74.9AAAE23@bulwar.pl> Message-ID: Dwukierunkowe Gadu Gadu na WAP-Administrator wrote: >>> 2. what is the procedure to make installer? >> >> You run the setup_all.py script in the windows/py2exe directory. >> This > > I tried but... > > > The following modules appear to be missing > ['Entrian', 'MySQLdb', 'bsddb3', 'gdbm', 'psycopg', > 'pywin.debugger.dbgcon', 'py > win.dialogs', 'pywin.dialogs.list', 'resource', 'twisted', > 'twisted.copyright', > 'twisted.internet', 'twisted.internet.app', 'twisted.internet.defer', > 'twisted.i > nternet.protocol', 'twisted.protocols.imap4'] All normal. Those modules come from optional features, development-only code, and other applications that are not used in the Windows installer. Once you've run setup_all, however, you will need a copy of Inno Setup to build the actual installer executable. You can get it here: http://www.jrsoftware.org/isinfo.php Just load the "spambayes.iss" file from the windows subdirectory into Inno and compile the setup. -- Kenny Pitt From tim.one at comcast.net Sun Mar 14 22:19:39 2004 From: tim.one at comcast.net (Tim Peters) Date: Sun Mar 14 22:19:37 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01E1A3D5@cliff.bai.org> Message-ID: [Ryan Malayter] > I just looked at the spam clues for several messages I have with > attachments, and SpamBayes 1.0a9 (Outlook Plug-in with default install > options) appears to ignore attachments completely. I get no > Content-Type or attachment name tokens at all. It seems these would > be good spam clues, or at least good for increasing the antivirus > capabilities inherent in SpamBayes filtering. And so on. This is all because Outlook destroys the original MIME structure, and so the only MIME armor the Outlook addin can ever see is the bit left in the email headers. > I've trained on several dozen Bagel-and NetSky infected messages, and > they all still score below 80%. 
If SpamBayes generated a token for the > attachment file name extension (.PIF, .EXE, whatever) it would > certainly help push these worm-generated messages over the top, and > aid in the quarantine of new worm variants, would it not? Yes (earlier testing said so), and any other way of using spambayes does synthesize a slew of tokens recording MIME clues. > Also, I noticed that I get only one 'Content-Type:text/plain' token > for each message, even though many of the messages are > 'Content-Type:multipart/alternative' with both text and HTML body > parts, as well as Word or other attachment MIME parts. Is that a bug? That's what the Outlook addin does. If you see this under any other way of using spambayes, then it would be a bug. From tim.one at comcast.net Sun Mar 14 22:19:40 2004 From: tim.one at comcast.net (Tim Peters) Date: Sun Mar 14 22:19:41 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: <792DE28E91F6EA42B4663AE761C41C2A01E1A3D8@cliff.bai.org> Message-ID: [Tim Peters] >> Right on both counts. We could do a better job of trying to >> *synthesize* MIME structure in the Outlook addin, but it's >> painful and nobody yet has bothered. [Ryan Malayter] > This doesn't seem like it would be that hard, It never does before you try it <0.9 wink>. If you pursue this, note that the tokenizer ignores the *bodies* of non-text/* attachments regardless, so there's little point in synthesizing MIME to recreate them. Sythnesizing the MIME armor saying that there *is* an attachment of type such-and-such is what counts. From mhammond at keypoint.com.au Mon Mar 15 04:04:40 2004 From: mhammond at keypoint.com.au (Mark Hammond) Date: Mon Mar 15 04:05:09 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: Message-ID: <00c701c40a6c$8d1f21f0$0200a8c0@eden> Tim: > [Ryan Malayter] > > This doesn't seem like it would be that hard, > > It never does before you try it <0.9 wink>. 
> > If you pursue this, note that the tokenizer ignores the *bodies* of > non-text/* attachments regardless, so there's little point in > synthesizing > MIME to recreate them. Sythnesizing the MIME armor saying > that there *is* > an attachment of type such-and-such is what counts. Note that another, possibly more sane way of approaching this would be to manually synthesize tokens with the relelvant information. ie, if it really is as simple as "is there an attachment?" (or even tokens for the filenames/extensions), I expect it would be quite simple to implement a) without attempting to re-create the MIME just so the tokenizer can unpack it and b) without needing to extract the attachments themselves. I'm happy to offer guidance with this... Mark. From mhammond at keypoint.com.au Mon Mar 15 07:36:35 2004 From: mhammond at keypoint.com.au (Mark Hammond) Date: Mon Mar 15 07:36:55 2004 Subject: [spambayes-dev] Open relay, and new release? Message-ID: <014501c40a8a$281361e0$0200a8c0@eden> I'm afraid I'm not up on the main list, but a quick scan shows nothing new of interest. Is there a fix for the relay issue recently checked-in/proposed I missed? If so, everything else seems quite stable, so maybe we can cut a 0.95 this week (then unable to avoid a 1.0 any longer :) Even straight to a 1.0rc? would suit me. If no fix is available yet, I'll just look forward to yet another week holiday, starting Friday. This time a week in Tasmania on the Ducati :) Assuming-Tony-can-pull-his-usual-rabbit-from-the-hat ly, Mark. From tim at fourstonesExpressions.com Mon Mar 15 08:43:06 2004 From: tim at fourstonesExpressions.com (Tim Stone) Date: Mon Mar 15 08:43:36 2004 Subject: [spambayes-dev] Open relay, and new release? In-Reply-To: <014501c40a8a$281361e0$0200a8c0@eden> References: <014501c40a8a$281361e0$0200a8c0@eden> Message-ID: On Mon, 15 Mar 2004 23:36:35 +1100, Mark Hammond wrote: > I'm afraid I'm not up on the main list, but a quick scan shows nothing > new > of interest. 
> > Is there a fix for the relay issue recently checked-in/proposed I missed? I'm working a fix for this, but have been ill, so it's not finished as of yet... -- Tim Stone From tim.one at comcast.net Mon Mar 15 15:34:33 2004 From: tim.one at comcast.net (Tim Peters) Date: Mon Mar 15 15:34:30 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: <00c701c40a6c$8d1f21f0$0200a8c0@eden> Message-ID: [Mark Hammond] > Note that another, possibly more sane way of approaching this would > be to manually synthesize tokens with the relevant information. Unless we cut a special back door for Outlook, and propagate that throughout the code, the only thing the Outlook addin can deliver to the tokenizer is a vanilla email-package message, so the only info it can communicate must live in the email headers, the synthesized MIME armor, or the message body. We can put anything into the message body, but since it *looks* like the message body then, it's subject to the limitations of any token in the message body (split on whitespace, replaced with a "skip" token if it's "too long", and so on). > ie, if it really is as simple as "is there an attachment?" (or even > tokens for the filenames/extensions), It's everything reachable from this part of tokenizer.py's tokenize_headers:

    # Content-{Type, Disposition} and their params, and charsets.
    # This is done for all MIME sections.
    for x in msg.walk():
        for w in crack_content_xyz(x):
            yield w

That synthesizes tokens for all the MIME sections throughout the email, covering their

    content-type
    content-type/type
    charset
    content-disposition params,

and fancier tokenization of any target file names (the latter is where a token is normally generated for (among other things) "and this email had an attachment with a .pif extension"). > I expect it would be quite simple to implement a) without attempting > to re-create the MIME just so the tokenizer can unpack it and > b) without needing to extract the attachments themselves.
> > I'm happy to offer guidance with this... Even better, just do it . From tameyer at ihug.co.nz Mon Mar 15 17:47:35 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Mon Mar 15 17:49:02 2004 Subject: [spambayes-dev] Open relay, and new release? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA28D@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B28@its-xchg4.massey.ac.nz> [TimS] > I'm working a fix for this, but have been ill, so it's not > finished as of yet... This is my fault too, in that TimS asked me to try out some stuff he had managed to do, but I've been really busy at work and haven't managed to yet. [Mark] > If so, everything else seems quite stable, so maybe we can > cut a 0.95 this week (then unable to avoid a 1.0 any longer :) > Even straight to a 1.0rc? would suit me. I don't know how many people have tried the binary sb_server (sourceforge's stats, for what they're worth, are still down), but there certainly don't seem to have been any major problems with that. There have been (as usual) some imapfilter bugs, although I've tried to get those fixed as they've arrived, for the most part. The plug-in has been a bit more mixed - there have been a reasonable chunk of people that could install 0.81, but not 0.9. It would be nice to try and solve this, or put in some more debugging info that will help solve it in the future. I don't know enough about it to try and do this, though. I am +1 on putting out a release this week if we can manage it, particularly for the open-relay stuff and the imapfilter bugs. I'll try and find time today to get some stuff towards that done, and maybe I can put a source 1.0a95rc1 (or whatever it's being called) here for any developers that want to give it a spin. No promises, though :) =Tony Meyer From mhammond at keypoint.com.au Mon Mar 15 18:48:33 2004 From: mhammond at keypoint.com.au (Mark Hammond) Date: Mon Mar 15 18:49:02 2004 Subject: [spambayes-dev] Open relay, and new release? 
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B28@its-xchg4.massey.ac.nz> Message-ID: <02f801c40ae8$0779f4e0$0200a8c0@eden> > The plug-in has been a bit more mixed - there have been a > reasonable chunk > of people that could install 0.81, but not 0.9. Note that 0.81 had a lot of people complaining about "DllRegisterServer failed - error 0", and at least some of them are able to install 0.9. But yeah, that sucks :( > It would be > nice to try and > solve this, or put in some more debugging info that will help > solve it in > the future. I don't know enough about it to try and do this, though. I will check out the issue with the incorrect pythoncomxx.dll being used. I *thought* I already went through this, making sure the right one was used even when others existed in system32. Mark. From mhammond at keypoint.com.au Mon Mar 15 19:10:03 2004 From: mhammond at keypoint.com.au (Mark Hammond) Date: Mon Mar 15 19:10:25 2004 Subject: [spambayes-dev] saving attachments In-Reply-To: Message-ID: <030601c40aeb$083ca140$0200a8c0@eden> > [Mark Hammond] > > Note that another, possibly more sane way of approaching this would > > be to manually synthesize tokens with the relevant information. > > Unless we cut a special back door for Outlook, and propagate > that throughout > the code, the only thing the Outlook addin can deliver to the > tokenizer is a > vanilla email-package message, so the only info it can > communicate must live > in the email headers, the synthesized MIME armor, or the > message body. I was thinking of a custom tokenizer for the plugin, that simply delegated everything to the real tokenizer, safe in the knowledge the mime related tokens would never be generated. We then just extend the set. I can't see any interface changes to the main engine are needed (said the blind man ) > > ie, if it really is as simple as "is there an attachment?" (or even > > tokens for the filenames/extensions), Thanks for the info.
However, it also outlines my dilemma - it seems a shame to limit what we feed the tokenizer based on the limitations of mime. But I've always resisted my great ideas for making Outlook smarter, instead hoping for an improvement in the test tools so these ideas can actually be tested by more than one person. I'm good at finding catch-22s for myself. > Even better, just do it . Yes, too true, and I will. However, I will wait until post 1.0 - due to both stability concerns, and my personal overload! Mark. From skip at pobox.com Mon Mar 15 20:12:18 2004 From: skip at pobox.com (Skip Montanaro) Date: Mon Mar 15 20:12:24 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py Message-ID: <16470.21618.739677.255944@montanaro.dyndns.org> Folks, I have a change worked up for sb_dbexpimp.py which uses the csv module to write the interchange file. Accompanying that is a compatcsv module which talks just enough "csv" to be used by sb_dbexpimp.py for people stuck on Python 2.2. There are versions of both attached to https://sourceforge.net/tracker/?func=detail&aid=901920&group_id=61702&atid=498103 Since this is rather late in the game 1.0-wise I would like a little extra feedback before checking this stuff in. Also, speak now or forever hold your peace if you think there's a reason to maintain backward compatibility with the old interchange format. (My view is that it is just an interchange format and shouldn't be relied on for long-term storage.) Skip From tim at fourstonesExpressions.com Mon Mar 15 21:25:39 2004 From: tim at fourstonesExpressions.com (Tim Stone) Date: Mon Mar 15 21:26:20 2004 Subject: [spambayes-dev] Open relay, and new release? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B28@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B28@its-xchg4.massey.ac.nz> Message-ID: On Tue, 16 Mar 2004 11:47:35 +1300, Tony Meyer wrote: > [TimS] >> I'm working a fix for this, but have been ill, so it's not >> finished as of yet... 
> > This is my fault too, in that TimS asked me to try out some stuff he had > managed to do, but I've been really busy at work and haven't managed to > yet. No problem... see below... > > [Mark] >> If so, everything else seems quite stable, so maybe we can >> cut a 0.95 this week (then unable to avoid a 1.0 any longer :) >> Even straight to a 1.0rc? would suit me. > > I don't know how many people have tried the binary sb_server > (sourceforge's > stats, for what they're worth, are still down), but there certainly don't > seem to have been any major problems with that. There have been (as > usual) > some imapfilter bugs, although I've tried to get those fixed as they've > arrived, for the most part. I've installed this binary, and it's working perfectly for me. Unfortunately, I haven't had a chance to try my fix out in it, either... :( -- Tim Stone From ta-meyer at ihug.co.nz Tue Mar 16 00:28:05 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 16 00:29:32 2004 Subject: [spambayes-dev] Open Relay Problem Fixed Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677AD9@its-xchg4.massey.ac.nz> I've checked in code that should fix the potential open-relay problem with smtp proxy. There are three parts to the fix: 1. The documentation is improved so that it's more clear that you don't have to enter in smtp server details if you don't plan on using it for training. 2. The "listening ports" option is now (like pop3proxy already was) able to accept and properly process values in the form server:port (this was a 6 character change; I think it was an oversight that it wasn't done before). This means that if you set it to (eg) localhost:25 only localhost connections will be allowed. The default is still empty, because the number of entries in this option has to be the same as the number in the server, and we don't know what that will be. 3. 
There are two new allow_remote_connections options, in the pop3proxy and smtpproxy sections (these can be found on the Advanced Configuration page for sb_server). These work just like the web interface one, in that you can set it to allow connections from anywhere, only localhost, or a set of IPs. This defaults to localhost connections only. To sum up, by default connections to the POP3 proxy and the SMTP proxy are only allowed from localhost. These can be expanded if desired, but you have to know what you're doing to do it, at least to a certain extent, so it's unlikely that it would accidentally happen. I've tested this as much as I can, using connections to & from my machine, and from the one next to me to mine. I don't control the firewall I'm behind, so I can't test further than that. If someone else has the time, it would be great to have someone else test it and confirm that this is correct. We're hoping for a release this week, and there doesn't appear to be much in the way (after this), so hopefully this will be there for the public very soon. You need sb_server v1.21, Options.py v1.105, ProxyUI.py v1.45, and smtpproxy v1.61. =Tony Meyer From tameyer at ihug.co.nz Tue Mar 16 02:17:17 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 16 02:17:28 2004 Subject: [spambayes-dev] Open relay, and new release? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA4B0@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677AE2@its-xchg4.massey.ac.nz> > > The plug-in has been a bit more mixed - there have been a > > reasonable chunk of people that could install 0.81, but not 0.9. > > Note that 0.81 had alot of people complaining about > "DllRegisterServer failed - error 0", and at least some of > them are able to install 0.9. 
Oh, I do agree that it's an improvement, and (although it's hard to tell) it does seem that there are fewer people with install problems now, and at least some of the others can fall back on 0.81 (if they can find it ;). > I will check out the issue with the incorrect pythoncomxx.dll > being used. I *thought* I already went through this, making > sure the right one was used even when others existed in system32. It could be just that user, of course :) No-one's offered up a shell32.dll file as requested, which seems to cover the majority of the other reports, so not much we can do there, AFAIK. In other news: I've gone through the bug reports again and tried to sort out what I can for a new release. I've added in a fix for the open relay problem (as in the previous message), and fixed an outstanding imapfilter bug that I really needed to get to. I'll try and have a look at Skip's sb_dbexpimp.py patch today or tomorrow, and also update the changelog & what's new files. Anyone have other things that they'd like to see in a new release? =Tony Meyer --- Please always include the list (spambayes@python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes. This way, you get everyone's help, and avoid a lack of replies when I'm busy. From kennypitt at hotmail.com Tue Mar 16 09:58:29 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Tue Mar 16 09:59:35 2004 Subject: [spambayes-dev] Open relay, and new release? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B28@its-xchg4.massey.ac.nz> Message-ID: Tony Meyer wrote: > The plug-in has been a bit more mixed - there have been a reasonable > chunk of people that could install 0.81, but not 0.9. We found that one problem with the pythoncom.frozen state in outlook_addin_register that might have been responsible for some of this. Unfortunately, no way to know for sure if that was the only issue. 
-- Kenny Pitt From skip at pobox.com Tue Mar 16 16:39:00 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Mar 16 16:39:26 2004 Subject: [spambayes-dev] Re: [Spambayes] VERY confusing... In-Reply-To: <002501c40b97$a3d79b80$0300a8c0@nuthouse> References: <000801c40b81$f6827cd0$0300a8c0@nuthouse> <16471.26004.349056.71339@montanaro.dyndns.org> <002501c40b97$a3d79b80$0300a8c0@nuthouse> Message-ID: <16471.29684.615907.94033@montanaro.dyndns.org> (Please keep spambayes@python.org in the cc list so you get the benefit of more eyeballs and neurons. I'm also adding spambayes-dev to the cc list to generate input on whether we should be doing something to fool virus scanners.) Bethanie> I have version 4 and to be frank I don't have a clue as to Bethanie> what I may have done after installing it...all I know is that Bethanie> Norton halts the mail because of spam(and this only began 5 Bethanie> days ago) and yes when I turn off Norton email scanning my Bethanie> mail comes through fine. I want virus scanning though! Version 4. Hmmm... The latest version of the Spambayes distribution is 1.0a9 which includes version 0.4 of the POP3 Proxy. I'll assume that's what you mean. Interacting with virus scanners can be a pain. Norton may be snatching the mail out from under the POP3 Proxy's nose when it's not expecting it, though I thought Tony had some checks in the code to handle those situations. Bethanie> I can go on site and delete the spams from there and then go Bethanie> back to OE and download the rest of the mail but that's a Bethanie> PITA. Agreed, that's not the way to go. Bethanie> All I did was download spambaye and allow it to install and Bethanie> then I attempted to configure it. I use Comcast cable a Pop3 Bethanie> server btw. Can you let us know what your configuration settings are? In particular, I wonder what the "cache messages" setting on http://localhost:8880/config is set to. Unfortunately, if you don't let it cache messages you can't train on them. 
If you do disable that setting can you receive mail with Norton enabled? I'm not a Windows or Norton user (nor do I use the POP3 proxy very often). Is it possible to tell Norton to be a bit more selective about the folders it scans? Another possibility - more for Tony Meyer's eyes - is to consider obfuscating the cached messages so they don't look like virus candidates to Norton. Perhaps perform a simple rot-128 of the contents of the messages or xor them with a known string before writing them, or write them gzipped. Anything which you can fairly easily reverse when reading them but which will defang them enough to fool Norton. Skip From tameyer at ihug.co.nz Tue Mar 16 17:04:01 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 16 17:04:29 2004 Subject: [spambayes-dev] RE: [Spambayes] InBoxer/SpamAtBay beta available. In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA798@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677AE6@its-xchg4.massey.ac.nz> > 9) A menu item was added to the statistics menu to allow the > user to balance the database statistics (creating equal number > of good and spam messages), which can improve performance if > the ratio of good messages to spam messages is very large or > very small, at the cost of reducing some of the information > in the database. If the answer to this is commercially sensitive, feel free to ignore this :) What sort of thing is InBoxer doing to balance the database? It is actually removing messages/tokens from the database in the category that's too high, or finding additional messages for the category that's too low, doing some wizzy math, or something else? =Tony Meyer From skip at pobox.com Tue Mar 16 17:34:59 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Mar 16 17:35:38 2004 Subject: [spambayes-dev] RE: [Spambayes] InBoxer/SpamAtBay beta available. 
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677AE6@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13057AA798@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F1304677AE6@its-xchg4.massey.ac.nz> Message-ID: <16471.33043.566959.678005@montanaro.dyndns.org> Tony> What sort of thing is InBoxer doing to balance the database? It Tony> is actually removing messages/tokens from the database in the Tony> category that's too high, or finding additional messages for the Tony> category that's too low, doing some wizzy math, or something else? I'd be interested in this as well. In the train-to-exhaustion script I currently force it to train on pairs of ham and spam. That means the database is rigorously in-balance (except for the repeated training bit), but that many spams I've saved are so far unused. I generally add hams to the database which didn't score below 0.4 just to try and boost the number of hams. I'd like to know a good way to selectively discard spams for this endeavor. Skip From tameyer at ihug.co.nz Tue Mar 16 17:55:05 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 16 17:56:55 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA4EC@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B2B@its-xchg4.massey.ac.nz> [Skip] > Since this is rather late in the game 1.0-wise I would > like a little extra feedback before checking this stuff in. I was too late trying it out for this, but it (cvs version) also works for me. One query (I've used the csv module quite a bit since I moved to 2.3, but only reading, never writing, so haven't noticed this before): I see that it writes rows with '\r\n' termination, so in Excel I get blank lines between every row (with a file as long as the spambayes database, this means I miss a lot of data). Should we provide an option to the dbexpimp script to change the line terminator to '\n'? 
(Simple enough to do, if I read the csv doc correctly). Or maybe just have a "if sys.platform == "win32": lineterminator = '\n'" kinda thing? > Also, speak now or forever hold your peace if you think > there's a reason to maintain backward compatibility with > the old interchange format. (My view is that it is just an > interchange format and shouldn't be relied on for long-term storage.) I agree. +1 to dumping it (or rather, forcing people to use the 1.0a9 script). [Skip again, from a message a little while back] > It sounds like you use it a lot. This is my first brush with it, > and even there its use is embedded in a shell script so typing long > filenames isn't a big deal. What's your use case that you need > it so frequently? Thinking about it more, my usage pattern is/was both poorly thought out and atypical, so my modification is a dumb idea. I use it mostly when I could use the (newer) spamcounts.py script instead, except that I've never got around to figuring out how to use that script correctly (it never gives me the right answers, but I'm sure it's operator error). For example, I'll want to see how often an experimental token gets used, or something like that. A lot of the time I could just use a shell script (even on Windows ) to get around the long pathname, anyway. Forget I mentioned it ;) =Tony Meyer From skip at pobox.com Tue Mar 16 18:21:47 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Mar 16 18:21:58 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B2B@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13057AA4EC@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F13026F2B2B@its-xchg4.massey.ac.nz> Message-ID: <16471.35851.502620.713000@montanaro.dyndns.org> (Not noticing that Tony cc'd spambayes-dev I originally sent this reply just to him.) 
>> Since this is rather late in the game 1.0-wise I would like a little >> extra feedback before checking this stuff in. Tony> I was too late trying it out for this, but it (cvs version) also Tony> works for me. Tony> One query (I've used the csv module quite a bit since I moved to Tony> 2.3, but only reading, never writing, so haven't noticed this Tony> before): I see that it writes rows with '\r\n' termination, so in Tony> Excel I get blank lines between every row (with a file as long as Tony> the spambayes database, this means I miss a lot of data). The csv file should be opened in "wb" mode. I thought I caught that. Can you take a quick look? Also, you are talking about using the real csv module, not the compatcsv thing, right? Tony> Should we provide an option to the dbexpimp script to change the Tony> line terminator to '\n'? (Simple enough to do, if I read the csv Tony> doc correctly). Or maybe just have a "if sys.platform == "win32": Tony> lineterminator = '\n'" kinda thing? No, I don't think so. It seems we have a bug to squash. We control everything about reading and writing that file. We should be able to make it work without any hints from the user. Tony> For example, I'll want to see how often an experimental token gets Tony> used, or something like that. A lot of the time I could just use Tony> a shell script (even on Windows ) to get around the long Tony> pathname, anyway. Forget I mentioned it ;) Okay. Here's a simple use of spamcounts: % spamcounts -d ~/tmp/tte.db -r 'long cons word' db: /Users/skip/tmp/tte.db token,nspam,nham,spam prob long cons word,32,7,0.797764401748 subject:long cons word,9,0,0.97619047619 It says report on all tokens in tte.db which match the regular expression (using re.search) 'long cons word'. Without the -r it only matches the first token. (It also runs a lot faster.) 
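For anyone following the line-terminator point on a current Python 3 rather than the 2.2/2.3 used in this thread, here is a minimal sketch of the same fix. It is not the sb_dbexpimp.py code: `export_counts` and `read_raw` are hypothetical helper names, and the row values are simply borrowed from the spamcounts example above. In Python 3 the counterpart of opening the file in "wb" mode is passing `newline=""` to `open()`, so the file object leaves the csv writer's own "\r\n" terminator alone.

```python
import csv
import os
import tempfile


def export_counts(path, rows):
    """Write (token, nspam, nham) rows as CSV (hypothetical helper)."""
    # newline="" stops the text layer from translating line endings, so the
    # "\r\n" emitted by csv's default dialect is written exactly once.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["token", "nspam", "nham"])
        writer.writerows(rows)


def read_raw(path):
    """Return the file's bytes so the terminators can be inspected."""
    with open(path, "rb") as f:
        return f.read()


# Round trip through a temporary file.
fd, demo_path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
export_counts(demo_path, [("long cons word", 32, 7)])
raw = read_raw(demo_path)
os.remove(demo_path)
```

Without `newline=""`, a Windows text-mode file would turn each "\r\n" into "\r\r\n", which is the kind of doubled terminator that shows up as blank rows in a spreadsheet.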
Skip From tameyer at ihug.co.nz Tue Mar 16 19:15:25 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 16 19:15:57 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA83B@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B2C@its-xchg4.massey.ac.nz> > The csv file should be opened in "wb" mode. I thought I > caught that. Can you take a quick look? Ah, I see - no it was being opened in "w" mode, and changing that to "wb" gets rid of the blank line in Excel/Wordpad. I'll check this in. > Also, you are talking about using the real csv module, not > the compatcsv thing, right? I was, yes. I did a test run with 2.2, though (after the "wb" mod), and got the same output as 2.3, so I'm happy that the compatcsv module works. > Okay. Here's a simple use of spamcounts: [...] Cool, thanks. I'll force myself to figure it out properly (that example should be enough, really) next time I want to do this. =Tony Meyer From tameyer at ihug.co.nz Tue Mar 16 19:21:43 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 16 19:22:14 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA861@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677AEA@its-xchg4.massey.ac.nz> [Skip] > The csv file should be opened in "wb" mode. I thought I > caught that. Can you take a quick look? As an aside, the csv module documentation does say that this should be done: "If csvfile is a file object, it must be opened with the 'b' flag on platforms where that makes a difference", but the example in the documentation just uses "w". In your opinion, do you think this would be worth me creating a (very small!) documentation patch and submitting it?
=Tony Meyer From tameyer at ihug.co.nz Tue Mar 16 19:32:17 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 16 19:33:07 2004 Subject: [spambayes-dev] RE: [Spambayes] VERY confusing... In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA7E6@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B2D@its-xchg4.massey.ac.nz> [Bethanie] > I have version 4 and to be frank I don't have a clue as to > what I may have done after installing it...all I know is that > Norton halts the mail because of spam(and this only began 5 > days ago) and yes when I turn off Norton email scanning my > mail comes through fine. I want virus scanning though! I'm not really clear on the details here, but it sounds like this problem doesn't involve spambayes at all. Did you have this problem before using spambayes? If you don't use spambayes (change your mail program back to connecting to the mail server, rather than to "localhost"), do you still have the problem? What do you mean by "halts the mail"? Does Norton stop the mail being displayed? Stop it being filtered by spambayes? Stop the delivery of that one message? Stop all mail delivery? > Interacting with > virus scanners can be a pain. Norton may be snatching the > mail out from under the POP3 Proxy's nose when it's not > expecting it, though I thought Tony had some checks in the > code to handle those situations. We've gone through a few things here. TimS (IIRC) originally had the proxy ignore problems where messages disappear from under its nose, and (for some valid at the time reason) I (IIRC) modified that to continue on, printing out an error, but not displaying any missing messages in the review page. > Another possibility - more for Tony Meyer's eyes - is to > consider obfuscating the cached messages so they don't look > like virus candidates to Norton. Perhaps perform a simple > rot-128 of the contents of the messages or xor them with a > known string before writing them, or write them gzipped. 
> Anything which you can fairly easily reverse when reading > them but which will defang them enough to fool Norton. I've wondered about this too (people reporting the "file is missing" error is relatively common), although being able to read the message files as plain text, without using spambayes at all, is nice, IMO. I hadn't thought about gzip, though - you can set the cache to gzip all the messages already (something Richie added way way back, maybe?). I don't know if that would stop virus checkers identifying the cached messages or not, though you're probably right that it would. I don't have any handy to test on. The appropriate option is exposed on the "Advanced Configuration" page of the web interface. Bethanie could try turning that on, and seeing if that makes a difference. =Tony Meyer --- Please always include the list (spambayes@python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes. This way, you get everyone's help, and avoid a lack of replies when I'm busy. From skip at pobox.com Tue Mar 16 20:32:08 2004 From: skip at pobox.com (Skip Montanaro) Date: Tue Mar 16 20:32:15 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677AEA@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13057AA861@its-xchg4.massey.ac.nz> <1ED4ECF91CDED24C8D012BCF2B034F1304677AEA@its-xchg4.massey.ac.nz> Message-ID: <16471.43672.812383.631011@montanaro.dyndns.org> Tony> As an aside, the csv module documentation does say this this Tony> should be done: "If csvfile is a file object, it must be opened Tony> with the 'b' flag on platforms where that makes a difference", but Tony> the example in the documentation just uses "w". In your opinion, Tony> do you think this would be worth me creating a (very small!) Tony> documentation patch and submitting it? Nah, I'll take care of it in a moment. Thanks for catching that. 
Skip From kennypitt at hotmail.com Wed Mar 17 09:43:24 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Mar 17 09:44:35 2004 Subject: [spambayes-dev] Re: [Spambayes] VERY confusing... In-Reply-To: <16471.29684.615907.94033@montanaro.dyndns.org> Message-ID: Skip Montanaro wrote: > Another possibility - more for Tony Meyer's eyes - is to consider > obfuscating the cached messages so they don't look like virus > candidates to Norton. Perhaps perform a simple rot-128 of the > contents of the messages or xor them with a known string before > writing them, or write them gzipped. There's already a "Use gzip" option in the Storage Options section of the Advanced Configuration page. If you set that to Yes then SpamBayes will gzip compress any files that it writes to the cache. That would be worth a try if you think the cache is the cause of the Norton problems. -- Kenny Pitt From adam.walker at rbwconsulting.com Wed Mar 17 09:55:48 2004 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Wed Mar 17 09:55:35 2004 Subject: [spambayes-dev] Re: [Spambayes] VERY confusing... In-Reply-To: References: Message-ID: <405866F4.1050600@rbwconsulting.com> Most virus scanners can check compressed archives. That's why some viri have resorted to using password protected archives. Kenny Pitt wrote: > > There's already a "Use gzip" option in the Storage Options section of > the Advanced Configuration page. If you set that to Yes then SpamBayes > will gzip compress any files that it writes to the cache. That would be > worth a try if you think the cache is the cause of the Norton problems. 
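As a rough illustration of the gzip point (not the SpamBayes cache code; `pack_message`, `unpack_message` and `looks_like_gzip` are hypothetical names, and the sample message is made up), the sketch below compresses a message, restores it, and shows that gzip output always begins with the two magic bytes 0x1f 0x8b. Those bytes are how a scanner that sniffs file contents could still recognise and unpack the data, whatever the file is called.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def pack_message(raw_message: bytes) -> bytes:
    """Compress a raw message before it is written to a cache file."""
    return gzip.compress(raw_message)


def unpack_message(stored: bytes) -> bytes:
    """Reverse pack_message when the cached message is read back."""
    return gzip.decompress(stored)


def looks_like_gzip(stored: bytes) -> bool:
    """Content sniffing: check the leading bytes, not the file name."""
    return stored[:2] == GZIP_MAGIC


message = b"Subject: test\r\n\r\nBody that a scanner might inspect.\r\n"
stored = pack_message(message)
restored = unpack_message(stored)
```

So compression is trivially reversible for SpamBayes, but it only hides the content from a scanner that neither looks at the leading bytes nor decompresses what it finds.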
> From kennypitt at hotmail.com Wed Mar 17 10:20:36 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Mar 17 10:21:48 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B2B@its-xchg4.massey.ac.nz> Message-ID: Tony Meyer wrote: > [Skip] >> Since this is rather late in the game 1.0-wise I would >> like a little extra feedback before checking this stuff in. > > I was too late trying it out for this, but it (cvs version) also > works for me. > > One query (I've used the csv module quite a bit since I moved to 2.3, > but only reading, never writing, so haven't noticed this before): I > see that it writes rows with '\r\n' termination, so in Excel I get > blank lines between every row (with a file as long as the spambayes > database, this means I miss a lot of data). > > Should we provide an option to the dbexpimp script to change the line > terminator to '\n'? (Simple enough to do, if I read the csv doc > correctly). Or maybe just have a "if sys.platform == "win32": > lineterminator = '\n'" kinda thing? The '\r\n' terminator should be correct for Windows, and if I import my csv file into Excel 2003 I don't get any blank rows. Unix, on the other hand, would normally expect '\n' only. The usual method for handling this would be to always specify '\n' as the line terminator, but write files using text mode instead of binary mode so that the line terminator is translated according to the current platform. The csv module doesn't seem to do it this way, and Skip is opening the output file using 'wb' mode in sb_dbexpimp which is correct given the way csv is specifying the line endings. I did notice one oddity in my exported csv file. There were a few tokens, most of them subject: something but a few others, that were wrapped between two lines using an e-mail header style wrapping where the continuation line begins with a tab. 
Even though the rest of the file is written with '\r\n' line terminators, these wrapped lines have only a '\n' terminator. This anomaly fooled my Vim editor into thinking the file was a Unix file with '\n' line endings, so I wonder if the same thing could throw off Excel in some cases. -- Kenny Pitt From kennypitt at hotmail.com Wed Mar 17 10:27:18 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Mar 17 10:28:29 2004 Subject: [spambayes-dev] Re: [Spambayes] VERY confusing... In-Reply-To: <405866F4.1050600@rbwconsulting.com> Message-ID: Adam Walker wrote: > Most virus scanners can check compressed archives. That's why some > viri have resorted to using password protected archives. > > Kenny Pitt wrote: >> >> There's already a "Use gzip" option in the Storage Options section of >> the Advanced Configuration page. If you set that to Yes then >> SpamBayes will gzip compress any files that it writes to the cache. >> That would be worth a try if you think the cache is the cause of the >> Norton problems. It would be interesting to see if Norton or other anti-virus programs detect the compression in this case. If they are checking signature bytes in the file data itself to determine the filetype then it's possible, but we don't use a standard file extension that would normally be associated with gzip data. -- Kenny Pitt From kennypitt at hotmail.com Wed Mar 17 10:54:39 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Mar 17 10:55:46 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: Message-ID: Kenny Pitt wrote: > The csv module doesn't seem to do it this way, and Skip is opening > the output file using 'wb' mode in sb_dbexpimp which is correct given > the way csv is specifying the line endings. Never mind, I see the 'wb' mode was a more recent change. That's what I get for replying before catching up on my entire backlog. 
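For anyone following the line-ending discussion, here is a minimal sketch of the behaviour described above. It is written for Python 3, where a text stream opened with newline='' plays the role of the 'wb' mode used under Python 2.3, and it writes to an in-memory buffer. The export_rows helper and the sample rows are made up for illustration; they are not taken from sb_dbexpimp.py.

```python
import csv
import io


def export_rows(rows, lineterminator="\r\n"):
    """Write rows as CSV text and return it without any newline translation."""
    # newline="" stops the stream from translating line endings, so the
    # terminator chosen by the csv writer is exactly what ends up in the data.
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator=lineterminator)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical token rows; the second token contains a header-style
# continuation (bare newline followed by a tab).
rows = [
    ["subject: hello", 3, 1],
    ["subject: wrapped\n\ttoken", 0, 2],
]
data = export_rows(rows)

# Reading it back with the csv module restores the multi-line field.
restored = list(csv.reader(io.StringIO(data, newline="")))
```

Each row ends in '\r\n', but the newline embedded in the quoted field is written as a bare '\n', which would explain a file with mixed line endings. Passing lineterminator="\n" to the helper gives the Unix-style variant Tony asked about.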
<0.5 wink> -- Kenny Pitt From skip at pobox.com Wed Mar 17 12:58:05 2004 From: skip at pobox.com (Skip Montanaro) Date: Wed Mar 17 12:59:45 2004 Subject: [spambayes-dev] modified version of sb_dbexpimp.py In-Reply-To: References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B2B@its-xchg4.massey.ac.nz> Message-ID: <16472.37293.987850.333685@montanaro.dyndns.org> Kenny> I did notice one oddity in my exported csv file. There were a Kenny> few tokens, most of them subject: something but a few others, Kenny> that were wrapped between two lines using an e-mail header style Kenny> wrapping where the continuation line begins with a tab. Even Kenny> though the rest of the file is written with '\r\n' line Kenny> terminators, these wrapped lines have only a '\n' terminator. Kenny> This anomaly fooled my Vim editor into thinking the file was a Kenny> Unix file with '\n' line endings, so I wonder if the same thing Kenny> could throw off Excel in some cases. The csv module handles multi-line fields and the authors went to great pains to try and read/write Excel-compatible files. I believe in Excel that an embedded newline is just a newline, not CRLF. If the files are throwing off vim or Emacs that's not all that surprising. How they interpret files with mixes of CRLF and LF is anyone's guess. Skip From tameyer at ihug.co.nz Wed Mar 17 16:50:14 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Wed Mar 17 16:50:26 2004 Subject: [spambayes-dev] Open relay, and new release? In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13057AA4B0@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677AF7@its-xchg4.massey.ac.nz> > I will check out the issue with the incorrect pythoncomxx.dll > being used. I *thought* I already went through this, making > sure the right one was used even when others existed in system32. 
I'm not sure how closely you're monitoring the bug reports at the moment, so in case you're not: Someone that had this problem searched for all instances of the pythoncom dll, found two, and replaced the system one with the spambayes one, and the problem went away (although who knows what other trouble that will cause!). Another user, reading the comment, did the same thing, and his problem went away, too. Seems pretty certain that this is a problem, then. Apart from this, I don't have anything that I feel needs doing before a new release (other than deciding on a version number ). =Tony Meyer From hera at optonline.net Wed Mar 17 21:05:05 2004 From: hera at optonline.net (Erin Lazzaro) Date: Wed Mar 17 21:07:27 2004 Subject: [spambayes-dev] RE: [Spambayes] VERY confusing... Message-ID: <001601c40c8d$72f13930$6a01a8c0@Peacemaker> I have Norton also, and whenever it detects a virus on an email, it displays an informational dialog. Downloading pauses until you dismiss the dialog. If that's the problem, then I don't think SpamBayes can help. There should be a way to configure Norton so it doesn't pause (it's a major pain), but I haven't found it yet. -Erin Lazzaro -----Original Message----- From: "Tony Meyer" Sent: 03/16/2004 19:32:18 To: "skip@pobox.com" , "'Bethanie Gordon'" Cc: "spambayes@python.org" , "spambayes-dev@python.org" Subject: RE: [Spambayes] VERY confusing... [Bethanie] > I have version 4 and to be frank I don't have a clue as to > what I may have done after installing it...all I know is that > Norton halts the mail because of spam(and this only began 5 > days ago) and yes when I turn off Norton email scanning my > mail comes through fine. I want virus scanning though! I'm not really clear on the details here, but it sounds like this problem doesn't involve spambayes at all. Did you have this problem before using spambayes? 
If you don't use spambayes (change your mail program back to connecting to the mail server, rather than to "localhost"), do you still have the problem? What do you mean by "halts the mail"? Does Norton stop the mail being displayed? Stop it being filtered by spambayes? Stop the delivery of that one message? Stop all mail delivery? > Interacting with > virus scanners can be a pain. Norton may be snatching the > mail out from under the POP3 Proxy's nose when it's not > expecting it, though I thought Tony had some checks in the > code to handle those situations. We've gone through a few things here. TimS (IIRC) originally had the proxy ignore problems where messages disappear from under its nose, and (for some valid at the time reason) I (IIRC) modified that to continue on, printing out an error, but not displaying any missing messages in the review page. > Another possibility - more for Tony Meyer's eyes - is to > consider obfuscating the cached messages so they don't look > like virus candidates to Norton. Perhaps perform a simple > rot-128 of the contents of the messages or xor them with a > known string before writing them, or write them gzipped. > Anything which you can fairly easily reverse when reading > them but which will defang them enough to fool Norton. I've wondered about this too (people reporting the "file is missing" error is relatively common), although being able to read the message files as plain text, without using spambayes at all, is nice, IMO. I hadn't thought about gzip, though - you can set the cache to gzip all the messages already (something Richie added way way back, maybe?). I don't know if that would stop virus checkers identifying the cached messages or not, though you're probably right that it would. I don't have any handy to test on. The appropriate option is exposed on the "Advanced Configuration" page of the web interface. Bethanie could try turning that on, and seeing if that makes a difference. 
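The three reversible transformations mentioned in this thread (rot-128, xor with a known string, gzip) can be sketched as follows. This is only an illustration in Python 3 of what each one does to the bytes of a cached message; it is not the SpamBayes cache code, and the KEY value and sample message are invented for the example.

```python
import gzip

KEY = b"spambayes"  # hypothetical "known string" for the xor variant


def rot128(data):
    """Add 128 to every byte (mod 256); applying it twice restores the input."""
    return bytes((b + 128) % 256 for b in data)


def xor_with_key(data, key=KEY):
    """Xor every byte with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


message = b"Subject: test\r\n\r\nThis is the body of a cached message.\r\n"

rotated = rot128(message)
xored = xor_with_key(message)
zipped = gzip.compress(message)

# gzip output always starts with the two magic bytes 0x1f 0x8b, so a scanner
# that inspects file contents can recognise it whatever the file is called.
starts_with_gzip_magic = zipped[:2] == b"\x1f\x8b"
```

The rot-128 and xor variants leave nothing for a scanner to recognise, but the files are no longer readable as plain text. The gzip variant can be undone with standard tools, which is also why a scanner that understands compressed data may still look inside.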
=Tony Meyer --- Please always include the list (spambayes@python.org) in your replies (reply-all), and please don't send me personal mail about SpamBayes. This way, you get everyone's help, and avoid a lack of replies when I'm busy. _______________________________________________ Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes Check the FAQ before asking: http://spambayes.sf.net/faq.html From info2 at bulwar.pl Thu Mar 18 07:46:19 2004 From: info2 at bulwar.pl (Dwukierunkowe Gadu Gadu na WAP-Administrator) Date: Thu Mar 18 07:47:46 2004 Subject: [spambayes-dev] ui.html - How to force to make changes? Message-ID: <40599A1A.57BB632D@bulwar.pl> I modified ui.html after that i ran setup_all.py, but after compiling package I didn't see the changes. How to force changes in ui.html to make present? From kennypitt at hotmail.com Thu Mar 18 09:15:57 2004 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Mar 18 09:17:16 2004 Subject: [spambayes-dev] ui.html - How to force to make changes? In-Reply-To: <40599A1A.57BB632D@bulwar.pl> Message-ID: Dwukierunkowe Gadu Gadu na WAP-Administrator wrote: > I modified ui.html after that i ran setup_all.py, but after compiling > package I didn't see the changes. > > > How to force changes in ui.html to make present? First make sure that you have resourcepackage properly installed. Then you need to run the proxy once from the command line before running setup_all.py. The ui_html.py file will be regenerated when sb_server.py (or pop3proxy_tray.py) is executed, but it is not done automatically by setup_all.py. -- Kenny Pitt From mhammond at keypoint.com.au Fri Mar 19 00:06:54 2004 From: mhammond at keypoint.com.au (Mark Hammond) Date: Fri Mar 19 00:07:18 2004 Subject: [spambayes-dev] Open relay, and new release? 
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1304677AF7@its-xchg4.massey.ac.nz> Message-ID: <003b01c40d6f$ff069950$0200a8c0@eden> > I'm not sure how closely you're monitoring the bug reports at > the moment, so > in case you're not: I'm not :) > Someone that had this problem searched for all instances of > the pythoncom > dll, found two, and replaced the system one with the > spambayes one, and the > problem went away (although who knows what other trouble that > will cause!). > Another user, reading the comment, did the same thing, and > his problem went > away, too. I have discovered the problem. ActivePython is still registering the 'pythoncom' and 'pywintypes' module in the registry. This is no longer necessary, for any version of Python. Indeed, it is what is causing the problem - our binary still *does* look in the registry, as python23.dll itself does. The capability to register modules in the registry was a dumb one that I must take full credit for . Trent - would it be possible for you to modify the ActivePython installer to not write these module values, now pywin32 no longer needs them? The best thing I can see to do is for us to ship a slightly modified python23.dll. All that needs modification is a string resource in the DLL - ie, no code. This will change where Python looks for its configuration information in the registry. By changing this value, we can ensure that we can't possibly conflict with any existing installations. At this stage, the only way I can change the resource is via MSVC. With a little work, it will be possible to have py2exe write a custom value for this string into the DLL - it can already write resources, but strings need a little work. Thomas - if you are listening, do you have any thoughts on this? Unfortunately, we really can't simply ask people to upgrade ActivePython, as I believe even the latest writes this registry value. 
Further, an upgrade of an old ActivePython may leave it in place (recent pywin32 installers actually remove the values!) And finally, I'm off for a week holiday. I tried to get a build out (including a hand-modified string resource), but for some reason it is failing to work for me. I'm out of time to debug this, so I'm afraid it will have to wait. Thanks, Mark. From theller at python.net Fri Mar 19 09:47:03 2004 From: theller at python.net (Thomas Heller) Date: Fri Mar 19 09:47:27 2004 Subject: [spambayes-dev] Re: Open relay, and new release? References: <1ED4ECF91CDED24C8D012BCF2B034F1304677AF7@its-xchg4.massey.ac.nz> <003b01c40d6f$ff069950$0200a8c0@eden> Message-ID: "Mark Hammond" writes: >> I'm not sure how closely you're monitoring the bug reports at the >> moment, so in case you're not: > > I'm not :) > >> Someone that had this problem searched for all instances of the >> pythoncom dll, found two, and replaced the system one with the >> spambayes one, and the problem went away (although who knows what >> other trouble that will cause!). Another user, reading the comment, >> did the same thing, and his problem went away, too. > > I have discovered the problem. ActivePython is still registering the > 'pythoncom' and 'pywintypes' module in the registry. This is no > longer necessary, for any version of Python. Indeed, it is what is > causing the problem - our binary still *does* look in the registry, as > python23.dll itself does. An idea: the ignore environment flag (I do not remember the exact name now, but you know what I mean: the flag set by the -E command line switch) should also prevent the python dll to look into the registry, probably. > The capability to register modules in the registry was a dumb one that I > must take full credit for . Robin Dunn was also creative: he registers the wxPython version number in the registry where additional modules are expected - leading to bugs in Gordon's installer ;-). 
> The best thing I can see to do is for us to ship a slightly modified
> python23.dll. All that needs modification is a string resource in the DLL -
> ie, no code. This will change where Python looks for its configuration
> information in the registry. By changing this value, we can ensure that we
> can't possibly conflict with any existing installations.
>
> At this stage, the only way I can change the resource is via MSVC. With a
> little work, it will be possible to have py2exe write a custom value for
> this string into the DLL - it can already write resources, but strings need
> a little work. Thomas - if you are listening, do you have any thoughts on
> this?

It is already possible, and has been for a while, even in py2exe 0.4 or so. At that time, built services required some string table entries.

This little script patches the stringtable in python23.dll:

"""
from py2exe.resources.StringTables import StringTable, RT_STRING
from py2exe.py2exe_util import add_resource

s = StringTable()
s.add_string(1000, "2.3.py2exe")

# Delete the existing resources on the first call only.
delete = True
for id, data in s.binary():
    add_resource(ur"c:\tests\python23.dll", data, RT_STRING, id, delete)
    delete = False
"""

The only problem is that py2exe's add_resource function always uses MAKELANGID(LANG_NEUTRAL, SUBLANG_NEUTRAL) as LCID, and the standard python dll has the string table in 'English US'. So, the above script has to delete all existing resources first, and this of course also removes the existing versioninfo resource. OTOH, this is probably the right thing to do, because it's not the standard dll anymore.

The other problem, of course, is that the script doesn't throw an error when Windows fails to update the resource because the file is already open, not writeable, or whatever. But that's a windows problem, imo.

Thomas

From trentm at ActiveState.com Fri Mar 19 12:24:42 2004
From: trentm at ActiveState.com (Trent Mick)
Date: Fri Mar 19 12:26:33 2004
Subject: [spambayes-dev] Open relay, and new release?
In-Reply-To: <003b01c40d6f$ff069950$0200a8c0@eden>; from mhammond@keypoint.com.au on Fri, Mar 19, 2004 at 04:06:54PM +1100 References: <1ED4ECF91CDED24C8D012BCF2B034F1304677AF7@its-xchg4.massey.ac.nz> <003b01c40d6f$ff069950$0200a8c0@eden> Message-ID: <20040319092442.B1152@ActiveState.com> [Mark Hammond wrote] > I have discovered the problem. ActivePython is still registering the > 'pythoncom' and 'pywintypes' module in the registry. This is no longer > necessary, for any version of Python. Indeed, it is what is causing the > problem - our binary still *does* look in the registry, as python23.dll > itself does. > > The capability to register modules in the registry was a dumb one that I > must take full credit for . Trent - would it be possible for you to > modify the ActivePython installer to not write these module values, now > pywin32 no longer needs them? > Do you mean these ones? HKLM\Software\Python\PythonCore\{2.2|2.3}\Modules\... If so, then yes I can change ActivePython to not write these. Note that I won't have the bandwidth to get new ActivePython builds out very quickly though. > Unfortunately, we really can't simply ask people to upgrade > ActivePython, as I believe even the latest writes this registry value. It does. > Further, an upgrade of an old ActivePython may leave it in place (recent > pywin32 installers actually remove the values!) Actually "upgrading" ActivePython involves uninstalling the old and installing the new. The uninstall will remove those registry entries and the install of the new (presuming the above changes) will not have them. Let me know if those are the registry entries that you mean. 
Cheers,
Trent

--
Trent Mick
TrentM@ActiveState.com

From papaDoc at videotron.ca Sun Mar 21 10:58:41 2004
From: papaDoc at videotron.ca (papaDoc)
Date: Sun Mar 21 10:58:48 2004
Subject: [spambayes-dev] Sb_server crash on training (Last change to Corpus might be the problem)
Message-ID: <405DBBB1.8030905@videotron.ca>

Hi everyone,

I'm using sb_server SpamBayes POP3 Proxy Version 0.4 (February 2004) and engine SpamBayes Engine Version 0.3 (January 2004). More precisely: from cvs of 18 or 19/03/2004.

When I try to train on a message I get the following error.

This is the traceback from the web page:

Traceback (most recent call last):
  File "c:\Devtools\spambayes\spambayes\spambayes\Dibbler.py", line 465, in found_terminator
    getattr(plugin, name)(**params)
  File "c:\Devtools\spambayes\spambayes\spambayes\ProxyUI.py", line 392, in onReview
    fromCache=True)
  File "c:\Devtools\spambayes\spambayes\spambayes\Corpus.py", line 206, in takeMessage
    print "Corpus:takeMessage: options=%s" % options["Headers", header_opt]
TypeError: not enough arguments for format string

I tried adding some print statements, since in the code there is a warning
# For Python 2.2, which doesn't allow "string in string".
but I'm using Python 2.3.2 on Windows 2k #################################### First try print "Corpus:takeMessage: Ici 1" for header, header_opt in (("Subject", "notate_subject"), ("To", "notate_to")): print "Corpus:takeMessage: Ici 4" print "Corpus:takeMessage: options=%s" % options print "Corpus:takeMessage: header_opt=%s" % header_opt # print "Corpus:takeMessage: options[Headers]=%s" % options["Headers"] # print "Corpus:takeMessage: options=%s" % options["Headers", header_opt] print "Corpus:takeMessage: Ici 4.1" Corpus:takeMessage: Start Corpus:takeMessage: Ici 1 Corpus:takeMessage: Ici 2 Corpus:takeMessage: Ici 1 Corpus:takeMessage: Ici 4 Corpus:takeMessage: options= Corpus:takeMessage: header_opt=notate_subject Corpus:takeMessage: Ici 4.1 #################################### Second try print "Corpus:takeMessage: Ici 1" for header, header_opt in (("Subject", "notate_subject"), ("To", "notate_to")): print "Corpus:takeMessage: Ici 4" print "Corpus:takeMessage: options=%s" % options print "Corpus:takeMessage: header_opt=%s" % header_opt print "Corpus:takeMessage: options[Headers]=%s" % options["Headers"] # print "Corpus:takeMessage: options=%s" % options["Headers", header_opt] print "Corpus:takeMessage: Ici 4.1" Corpus:takeMessage: Ici 1 Corpus:takeMessage: Ici 2 Corpus:takeMessage: Ici 1 Corpus:takeMessage: Ici 4 Corpus:takeMessage: options= Corpus:takeMessage: header_opt=notate_subject (Nothing printed ???) I get back the web page with no error but the message was not train (i.e. still in the unsure section). 
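A note on the TypeError in the traceback above: "not enough arguments for format string" is what Python raises when the right-hand side of % is a tuple with fewer items than the format string has placeholders, for example an empty tuple. Whether options["Headers", header_opt] really returned an empty tuple here is an assumption, not something the traceback proves. The sketch below (Python 3 syntax, with a made-up show helper) only demonstrates the language behaviour and the usual workaround of wrapping the value in a one-element tuple.

```python
def show(value):
    """Format any value, including a tuple, with a single %s placeholder."""
    # Wrapping the value in a one-element tuple stops % from unpacking it.
    return "options=%s" % (value,)


# An empty tuple on the right-hand side is treated as "no arguments at all".
try:
    "options=%s" % ()
    empty_error = None
except TypeError as exc:
    empty_error = str(exc)

# A tuple with too many items fails as well, with a different message.
try:
    "options=%s" % ("a", "b")
    long_error = None
except TypeError as exc:
    long_error = str(exc)

safe_empty = show(())
safe_pair = show(("a", "b"))
```

With the wrapper, a debugging print keeps working whatever type the option value turns out to have.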
#################################### Third try print "Corpus:takeMessage: Ici 1" for header, header_opt in (("Subject", "notate_subject"), ("To", "notate_to")): print "Corpus:takeMessage: Ici 4" print "Corpus:takeMessage: options=%s" % options print "Corpus:takeMessage: header_opt=%s" % header_opt # print "Corpus:takeMessage: options[Headers]=%s" % options["Headers"] print "Corpus:takeMessage: options=%s" % options["Headers", header_opt] print "Corpus:takeMessage: Ici 4.1" Corpus:takeMessage: Ici 1 Corpus:takeMessage: Ici 2 Corpus:takeMessage: Ici 1 Corpus:takeMessage: Ici 4 Corpus:takeMessage: options= Corpus:takeMessage: header_opt=notate_subject This is the traceback from the web page Traceback (most recent call last): File "c:\Devtools\spambayes\spambayes\spambayes\Dibbler.py", line 465, in found_terminator getattr(plugin, name)(**params) File "c:\Devtools\spambayes\spambayes\spambayes\ProxyUI.py", line 392, in onReview fromCache=True) File "c:\Devtools\spambayes\spambayes\spambayes\Corpus.py", line 206, in takeMessage print "Corpus:takeMessage: options=%s" % options["Headers", header_opt] TypeError: not enough arguments for format string Since this is above my head I give you the problem but I'm ready to do the testing. Remi -- /"\ \ / X ASCII Ribbon Campaign / \ Against HTML Email From tameyer at ihug.co.nz Sun Mar 21 19:03:43 2004 From: tameyer at ihug.co.nz (Tony Meyer) Date: Sun Mar 21 19:04:00 2004 Subject: [spambayes-dev] Sb_server crash on training (Last change to Corpusmight be the problem) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1305904123@its-xchg4.massey.ac.nz> Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B43@its-xchg4.massey.ac.nz> > When I try to train on message I get the following error: [...] 
> File "c:\Devtools\spambayes\spambayes\spambayes\Corpus.py", > line 206, > in takeMessage > print "Corpus:takeMessage: options=%s" % > options["Headers", header_opt] > > TypeError: not enough arguments for format string This is the traceback *after* you put in print statements, though, right? As far as I can see, there aren't any print statements in takeMessage(). However, there were two bugs in the function; I've just checked in fixes for those. Could you see if this fixes your problem too? (Sorry about the bugs - I thought I had tested the code, but when checking now, I realise I was testing the wrong copy of the spambayes code). =Tony Meyer From papaDoc at videotron.ca Mon Mar 22 08:48:29 2004 From: papaDoc at videotron.ca (papaDoc) Date: Mon Mar 22 08:48:41 2004 Subject: [spambayes-dev] Sb_server crash on training (Last change to Corpusmight be the problem) In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B43@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B43@its-xchg4.massey.ac.nz> Message-ID: <405EEEAD.5000504@videotron.ca> Hi Tony, >>When I try to train on message I get the following error: >> >> >[...] > > >> File "c:\Devtools\spambayes\spambayes\spambayes\Corpus.py", >>line 206, >>in takeMessage >> print "Corpus:takeMessage: options=%s" % >>options["Headers", header_opt] >> >>TypeError: not enough arguments for format string >> >> > >This is the traceback *after* you put in print statements, though, right? >As far as I can see, there aren't any print statements in takeMessage(). > > Yes this is my print statement. For now this is the only way I know to debug python. >However, there were two bugs in the function; I've just checked in fixes for >those. Could you see if this fixes your problem too? > I will do it right now........ No problem anymore. Thanks >(Sorry about the bugs >- I thought I had tested the code, but when checking now, I realise I was >testing the wrong copy of the spambayes code). 
>
>
This reminds me of something, but I don't want to talk about it.

--
/"\
\ /
 X  ASCII Ribbon Campaign
/ \ Against HTML Email

From papaDoc at videotron.ca Mon Mar 22 13:41:35 2004
From: papaDoc at videotron.ca (papaDoc)
Date: Mon Mar 22 13:41:39 2004
Subject: [spambayes-dev] Help with ui.html and ui_html.py
Message-ID: <405F335F.1040000@videotron.ca>

Hi,

I'm trying to add a column to the review page of sb_server. The column is the current score (which can differ from the message score if I do some training).

The problem is that I can't see my change. I browsed the mailing list archive and found that I need to regenerate ui_html.py with resource_package.

So where is resource_package? I found resourcepackage on the web (http://resourcepackage.sourceforge.net/) but I'm not sure this is the right tool.

Remi

--
/"\
\ /
 X  ASCII Ribbon Campaign
/ \ Against HTML Email

From kennypitt at hotmail.com Mon Mar 22 14:25:06 2004
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Mon Mar 22 14:26:21 2004
Subject: [spambayes-dev] Help with ui.html and ui_html.py
In-Reply-To: <405F335F.1040000@videotron.ca>
Message-ID:

papaDoc wrote:
> I need to regenerate the ui_html.py with resource_package.
>
> So where is resource_package ? I found on the web resourcepackage
> (http://resourcepackage.sourceforge.net/) but I'm not sure this is the
> right tools.

Yes, that's the right one.

--
Kenny Pitt

From tameyer at ihug.co.nz Tue Mar 23 01:56:45 2004
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Mar 23 01:56:56 2004
Subject: [spambayes-dev] Open relay, and new release?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1305903DF3@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F13026F2B52@its-xchg4.massey.ac.nz>

> And finally, I'm off for a week holiday. I tried to get a build out
> (including a hand-modified string resource), but for some reason it is
> failing to work for me. I'm out of time to debug this, so
> I'm afraid it will have to wait.
It seems (from Thomas and Trent's replies) that this will all be nicely resolved once Mark is back, so we'll be able to put out a release hopefully early next week. Does anyone have anything they want in it that's not already in cvs? If not, then I'll put together release candidates (with the caveat that people testing the Outlook binary shouldn't have ActivePython installed) tomorrow and post a link here. That gives us all four or five days to give them a spin, if we've got the time. And of course, someone please make a call on whether this is 1.0a10 or 1.0b1 . I'm sure Mark would prefer 1.0b1, since plug-in users tend to call the releases things like 0.9, and 0.10 looks a lot like 0.1! :) =Tony Meyer From ta-meyer at ihug.co.nz Tue Mar 23 23:33:21 2004 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Tue Mar 23 23:33:43 2004 Subject: [spambayes-dev] 1.0b1 Release candidates Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1304677B67@its-xchg4.massey.ac.nz> Here are release candidate 1's for 1.0b1. If you've got some time to try out the one appropriate for you before Mark gets back next week, that would be great. Note that the ActivePython pythoncom.dll problem still exists with the binary - I'm leaving this for Mark. This is from CVS 24/3/4 16:28+1200. The changelog/what's new has all the details about what's different since 1.0a9 (a few small bugs, the open relay problem, various imapfilter problems). Enjoy :) =Tony Meyer From tdickenson at geminidataloggers.com Wed Mar 24 03:33:33 2004 From: tdickenson at geminidataloggers.com (Toby Dickenson) Date: Wed Mar 24 03:33:38 2004 Subject: [spambayes-dev] cant live without xmlrpcserver Message-ID: <200403240833.33291.tdickenson@geminidataloggers.com> I run spambayes out of procmail. 
Originally I used xmlrpcserver, but eventually gave it up to avoid the daemon issues: getting the daemon started, remembering to restart it after hacking spambayes code, and having my large daemon running even when I wasn't logged in.

Since giving it up I have noticed (but not measured) a reduction in classification throughput. I guess this is due to the overhead of starting a whole new process for each classification.

What I *think* I want is an sb_filter-like process that will use an existing sb_xmlrpcserver daemon if it can, start a new daemon if it can't, and have the daemon shut itself down after a period of inactivity. Has anyone else any desires (or code ;-) for something similar?

--
Toby Dickenson

From skip at pobox.com Wed Mar 24 09:28:25 2004
From: skip at pobox.com (Skip Montanaro)
Date: Wed Mar 24 09:28:53 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To: <200403240833.33291.tdickenson@geminidataloggers.com>
References: <200403240833.33291.tdickenson@geminidataloggers.com>
Message-ID: <16481.39689.756128.11409@montanaro.dyndns.org>

    Toby> Since giving it up I have noticed (but not measured) a reduction
    Toby> in classification throughput. I guess this is due to overheads of
    Toby> starting a whole new process for each classification.

What version of Python are you using? Startup time got worse between 2.2 and 2.3. At the moment the culprit which caused the problem remains at large.

    Toby> What I *think* I want is an sb_filter-like process that will use
    Toby> an existing sb_xmlrpcserver daemon if it can, start a new daemon
    Toby> if it can't, and the daemon to shut itself down after a period of
    Toby> inactivity. Has anyone else any desires (or code ;-) for something
    Toby> similar?

I think you'd still take a startup overhead hit if you didn't code your sb_filter proxy in C.
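To make the idea in this thread concrete, here is a rough sketch in Python 3 using only the standard library. It shows two of the three pieces Toby describes: a daemon that shuts itself down after a period of inactivity, and a client that uses the daemon if it is reachable and otherwise does the work itself. The class name, the local_score stand-in and the "score" method name used by the client are assumptions made for the example; this is not the sb_xmlrpcserver or sb_client code, and spawning the daemon on demand is not shown.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


class IdleTimeoutServer(SimpleXMLRPCServer):
    """XML-RPC server that stops once no request arrives for idle_seconds."""

    def __init__(self, addr, idle_seconds):
        SimpleXMLRPCServer.__init__(self, addr, logRequests=False)
        # handle_request() waits at most this long for a connection.
        self.timeout = idle_seconds
        self.idle = False

    def handle_timeout(self):
        # Called by handle_request() when the wait expires with no request.
        self.idle = True

    def serve_until_idle(self):
        while not self.idle:
            self.handle_request()
        self.server_close()


def local_score(message):
    # Hypothetical stand-in for loading the database and scoring in-process.
    return 0.5


def score(message, url, fallback=local_score):
    """Ask the daemon at url for a score; fall back if it cannot be reached."""
    try:
        return xmlrpc.client.ServerProxy(url).score(message)
    except OSError:
        # Covers 'Connection refused' when no daemon is listening.
        return fallback(message)
```

A daemon would be started with something like IdleTimeoutServer(("127.0.0.1", 8881), idle_seconds=600), followed by register_function for the scoring function and a call to serve_until_idle(); the port and timeout here are arbitrary. As Skip points out, a client written in Python still pays the interpreter startup cost on every message, so this only removes the cost of loading the database.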
Skip

From tdickenson at geminidataloggers.com Wed Mar 24 10:05:34 2004
From: tdickenson at geminidataloggers.com (Toby Dickenson)
Date: Wed Mar 24 10:05:40 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To: <16481.39689.756128.11409@montanaro.dyndns.org>
References: <200403240833.33291.tdickenson@geminidataloggers.com> <16481.39689.756128.11409@montanaro.dyndns.org>
Message-ID: <200403241505.34721.tdickenson@geminidataloggers.com>

On Wednesday 24 March 2004 14:28, Skip Montanaro wrote:

> What version of Python are you using? Startup time got worse between 2.2
> and 2.3. At the moment the culprit which caused the problem remains at
> large.

It's 2.2 on the machine where I notice this most.

> Toby> What I *think* I want is an sb_filter-like process that will use
> Toby> an existing sb_xmlrpcserver daemon if it can, start a new daemon
> Toby> if it can't, and the daemon to shut itself down after a period of
> Toby> inactivity. Has anyone else any desires (or code ;-) for something
> Toby> similar?
>
> I think you'd still take a startup overhead hit if you didn't code your
> sb_filter proxy in C.

True. I've just measured 778ms to classify a mail using sb_filter, against 234ms using sb_client. That's a modest saving.

If I stop the server and rerun sb_client it predictably terminates with a 'Connection refused'. Getting that far takes 219ms; I guess most of that could be saved using C and a lighter protocol than xmlrpc.

--
Toby Dickenson

From scott.l.miller at hp.com Wed Mar 24 16:36:06 2004
From: scott.l.miller at hp.com (Miller, Scott L (Omaha Networks))
Date: Wed Mar 24 16:36:16 2004
Subject: [spambayes-dev] FAQ addition request
Message-ID: <1F7C0C8F4BD7C54A8BC55012FEF3DF6D0302E4AD@omaexc11.americas.cpqcorp.net>

Hi all,

I've just begun using SpamBayes. Kudos to all who've worked on it, especially the Outlook plugin. It installed cleanly and works as advertised by the overview/FAQs.
I currently have two suggestions for improvement in the documentation area:

The first deals with the Outlook plugin FAQ. The FAQ makes a big deal about SpamBayes being able to work with both POP3 and IMAP. Now I used to be a Coder, still do some, but I am now a Network Designer/Engineer, so I understand what these do, at least at a high level. But I've never stopped to think about what (potentially convoluted) protocol M$ Outlook uses to communicate with an Exchange server. I still haven't cared enough to find out either. But it is/was not clear to me that SpamBayes would work with an Exchange server, even with the terse "Yes" answer to that question. Given the amount of time spent on POP3 and IMAP possibilities, the FAQ should really treat this question better, or have the Windows overview page answer this a bit more authoritatively.

Suggestion for the overview page under General Information:

"The Outlook addin is an application of the SpamBayes project. To our knowledge, the current version of the plug-in should work with Windows 98 and above and Outlook 2000 or above.
The plug-in works with POP3 and Exchange servers.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You may be able..."

Suggestion for the FAQ:

"3.3 Will SpamBayes work with Outlook 2000 connecting to an Exchange 2000 server?"

"The SpamBayes Outlook plug-in simply watches the Inbox, and optionally others, for new mail and attempts to apply its rule set to those messages. Thus it doesn't attempt to get between Outlook and Exchange, so there is no problem working with the above discussed delivery mechanism."

My second suggestion is that, either in the FAQ or a new "anomaly" section, there needs to be an explanation of why "Undeliverable" messages are considered to be "unfilterable".
It took me quite a while to finally find a semi-cryptic email about how Undeliverable messages were initially thought to be something that should not be filtered, as the user should always be notified if their sent email had problems. This belief may be changing due to the myriad virus strains that are forging the "from" fields, for which my compaq.com address has been widely used since the first such strains hit in mid-January. Thus, I'd really like to be able to filter those out, but then the problem becomes: how does one distinguish real "undeliverable" messages from "undeliverable" messages caused by forged virus messages? That is likely beyond the scope of the SpamBayes utilities...

Thanks for your consideration,
-Scott L. Miller

From pje at telecommunity.com Wed Mar 24 21:02:58 2004
From: pje at telecommunity.com (Phillip J. Eby)
Date: Wed Mar 24 20:57:07 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To:
Message-ID: <5.1.0.14.0.20040324205454.01f212e0@mail.telecommunity.com>

>From: Toby Dickenson
>Subject: Re: [spambayes-dev] cant live without xmlrpcserver
>
>On Wednesday 24 March 2004 14:28, Skip Montanaro wrote:
> > I think you'd still take a startup overhead hit if you didn't code your
> > sb_filter proxy in C.
>
>True. I've just measured 778ms to classify a mail using sb_filter, against
>234ms using sb_client. That's a modest saving.
>
>If I stop the server and rerun sb_client it predictably terminates with a
>'Connection refused'. Getting that far takes 219ms; I guess most of that
>could be saved using C and a lighter protocol than xmlrpc.

Have you considered ReadyExec?

http://readyexec.sourceforge.net/

It's a C client and a Python library for running more-or-less arbitrary Python programs as daemons, using Unix-domain sockets. It doesn't automatically start the daemon, but you might be able to add that. Meanwhile, the C client is extremely small and should be quite fast.
The "protocol" is also simple and fast: stdin/stdout/stderr are handed to the daemon directly via OS-level file descriptor transfers, and only the command line arguments and environment are pumped over the socket, as netstrings.

From jepler at unpythonic.net Wed Mar 24 22:41:04 2004
From: jepler at unpythonic.net (Jeff Epler)
Date: Wed Mar 24 22:41:23 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To: <200403241505.34721.tdickenson@geminidataloggers.com>
References: <200403240833.33291.tdickenson@geminidataloggers.com> <16481.39689.756128.11409@montanaro.dyndns.org> <200403241505.34721.tdickenson@geminidataloggers.com>
Message-ID: <20040325034104.GC16886@unpythonic.net>

On Wed, Mar 24, 2004 at 03:05:34PM +0000, Toby Dickenson wrote:
> If I stop the server and rerun sb_client it predictably terminates with a
> 'Connection refused'. Getting that far takes 219ms; I guess most of that
> could be saved using C and a lighter protocol than xmlrpc.

Does C++ and xmlrpc count as 50% of your request, or 25% of it?

http://mail.python.org/pipermail/spambayes/2003-December/010357.html

| Conclusions
| -----------
|
| On small and moderately sized messages, a compiled-language version of
| sb_client can give a clear speedup, (sb_client.py vs sb_cclient -n 4)
| but the startup time is still relatively large when messages are small
| (sb_cclientS vs sb_cclient101) and if messages are large then startup
| time is irrelevant (sb_client.pyL vs sb_cclient101L)

Jeff

From Amir_Katz at bmc.com Thu Mar 25 01:45:41 2004
From: Amir_Katz at bmc.com (Katz, Amir)
Date: Thu Mar 25 01:47:31 2004
Subject: [spambayes-dev] RE: 1.0b1 Release candidates
Message-ID:

Installed the Proxy version on WinXP with Outlook Express. So far it looks good. The minor bug that I reported to Tony ('Notate to' option becoming a radio button) seems to be fixed.
Amir

-----Original Message-----
From: Tony Meyer [mailto:ta-meyer@ihug.co.nz]
Sent: Wednesday, March 24, 2004 06:33
To: spambayes-dev@python.org
Cc: 'Katz, Amir'
Subject: 1.0b1 Release candidates

Here are release candidate 1's for 1.0b1. If you've got some time to try out the one appropriate for you before Mark gets back next week, that would be great.

Note that the ActivePython pythoncom.dll problem still exists with the binary - I'm leaving this for Mark.

This is from CVS 24/3/4 16:28+1200. The changelog/what's new has all the details about what's different since 1.0a9 (a few small bugs, the open relay problem, various imapfilter problems).

Enjoy :)

=Tony Meyer

From tdickenson at geminidataloggers.com Thu Mar 25 04:16:48 2004
From: tdickenson at geminidataloggers.com (Toby Dickenson)
Date: Thu Mar 25 04:16:58 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To: <5.1.0.14.0.20040324205454.01f212e0@mail.telecommunity.com>
References: <5.1.0.14.0.20040324205454.01f212e0@mail.telecommunity.com>
Message-ID: <200403250916.48518.tdickenson@geminidataloggers.com>

On Thursday 25 March 2004 02:02, Phillip J. Eby wrote:
> Have you considered ReadyExec?
>
> It's a C client and a Python library for running more-or-less arbitrary
> Python programs as daemons, using Unix-domain sockets. It doesn't
> automatically start the daemon, but you might be able to add
> that. Meanwhile, the C client is extremely small and should be quite
> fast. The "protocol" is also simple and fast: stdin/stdout/stderr are
> handed to the daemon directly via OS-level file descriptor transfers, and
> only the command line arguments and environment are pumped over the socket,
> as netstrings.

That's a good tip, thanks. It looks like a mature solution for the client process.

> From: Toby Dickenson
>
> I've just measured 778ms to classify a mail using sb_filter

I've now got a prototype solution, all in Python.
It takes 849ms to start the daemon and classify the first email, but then only 95ms to classify any subsequent email that arrives before the server times out. With ReadyExec for the client process I should be able to reduce that further.

Code available on request.

-- 
Toby Dickenson

From skip at pobox.com Thu Mar 25 10:03:28 2004
From: skip at pobox.com (Skip Montanaro)
Date: Thu Mar 25 10:03:41 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To: <200403250916.48518.tdickenson@geminidataloggers.com>
References: <5.1.0.14.0.20040324205454.01f212e0@mail.telecommunity.com> <200403250916.48518.tdickenson@geminidataloggers.com>
Message-ID: <16482.62656.50758.339819@montanaro.dyndns.org>

Toby> I've now got a prototype solution, all in Python. It takes 849ms to
Toby> start the daemon and classify the first email, but then only 95ms
Toby> to classify any subsequent email that arrives before the server
Toby> times out. With ReadyExec for the client process I should be able
Toby> to reduce that further.

Toby> Code available on request.

That's great. Can you just check it into Spambayes CVS in the contrib directory?

Skip

From tdickenson at devmail.geminidataloggers.co.uk Thu Mar 25 10:28:40 2004
From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson)
Date: Thu Mar 25 10:28:46 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To: <16482.62656.50758.339819@montanaro.dyndns.org>
References: <5.1.0.14.0.20040324205454.01f212e0@mail.telecommunity.com> <200403250916.48518.tdickenson@geminidataloggers.com> <16482.62656.50758.339819@montanaro.dyndns.org>
Message-ID: <200403251528.40632.tdickenson@devmail.geminidataloggers.co.uk>

On Thursday 25 March 2004 15:03, Skip Montanaro wrote:
> Toby> I've now got a prototype solution, all in Python. It takes 849ms to
> Toby> start the daemon and classify the first email, but then only 95ms
> Toby> to classify any subsequent email that arrives before the server
> Toby> times out.
With ReadyExec for the client process I should be able
> Toby> to reduce that further.
>
> Toby> Code available on request.
>
> That's great. Can you just check it into Spambayes CVS in the contrib
> directory?

I'm happy to. I'm not on SourceForge's developer list for spambayes yet... SourceForge id 'htrd' if anyone wants to add me.

-- 
Toby Dickenson

From skip at pobox.com Thu Mar 25 10:35:14 2004
From: skip at pobox.com (Skip Montanaro)
Date: Thu Mar 25 10:35:27 2004
Subject: [spambayes-dev] cant live without xmlrpcserver
In-Reply-To: <200403251528.40632.tdickenson@devmail.geminidataloggers.co.uk>
References: <5.1.0.14.0.20040324205454.01f212e0@mail.telecommunity.com> <200403250916.48518.tdickenson@geminidataloggers.com> <16482.62656.50758.339819@montanaro.dyndns.org> <200403251528.40632.tdickenson@devmail.geminidataloggers.co.uk>
Message-ID: <16482.64562.764951.413912@montanaro.dyndns.org>

Toby> Code available on request.

>> That's great. Can you just check it into Spambayes CVS in the
>> contrib directory?

Toby> I'm not on SourceForge's developer list for spambayes yet...
Toby> SourceForge id 'htrd' if anyone wants to add me.

Done. Have at it.

Skip

From mhammond at keypoint.com.au Thu Mar 25 20:51:54 2004
From: mhammond at keypoint.com.au (Mark Hammond)
Date: Thu Mar 25 20:52:17 2004
Subject: [spambayes-dev] Open relay, and new release?
In-Reply-To: <20040319092442.B1152@ActiveState.com>
Message-ID: <317101c412d4$ebf8bbd0$0200a8c0@eden>

> Do you mean these ones?
>
> HKLM\Software\Python\PythonCore\{2.2|2.3}\Modules\...
>
> If so, then yes I can change ActivePython to not write these.

I do, and that would be great!

> Note that
> I won't have the bandwidth to get new ActivePython builds out very
> quickly though.

No worries.

Thanks,

Mark.

From tp at diffenbach.org Sat Mar 27 01:30:49 2004
From: tp at diffenbach.org (TP Diffenbach)
Date: Sat Mar 27 01:29:24 2004
Subject: [spambayes-dev] Moving database from Outlook plugin to POP3 server?
Message-ID:

I've had it with Outlook: if it's not viruses, it's iframes; if it's not iframes, it's being constantly asked if I /really/ want to change format from HTML mail (I always do, it never learns); if it's not HTML mail, it's Outlook's belief that nearly every user interface decision needs to be accompanied by beeps (even pressing the down arrow redundantly wins me a beep), and on and on -- and on.

Please understand that this is /NOT/ a reaction to the SpamBayes Outlook plugin -- the only problem I've had with the plugin is that it does make Outlook think that mail is no longer in the Inbox, even when the mail's not been moved, when accessed from the notification pop-up. But that's minor, and the convenience of being able to train within the email client has kept me using Outlook far longer than I would otherwise have -- this isn't a SpamBayes plugin problem, it's accumulated frustration with Outlook.

That said, my databases contain 1500 hams and 3600 spams, and I'd like to preserve that in my move.

I'll be moving to Mozilla Thunderbird, primarily because it's free, Free, and cross-platform. I plan to export all mail currently in Outlook to Thunderbird.

Given that, how can I best preserve (or recapitulate) what's in my corpus?

Oh, I'm perfectly happy with compiling my own SpamBayes if need be, or even doing minor hacking at it, if required to move the db.

Thanks for your time & attention -- I realize I'm asking in what is primarily a developer's list.

--Tom

From manuelfo at ono.com Mon Mar 29 04:23:33 2004
From: manuelfo at ono.com (Manuel)
Date: Mon Mar 29 04:23:43 2004
Subject: [spambayes-dev] Spambayes on Windows XP
Message-ID: <4061844A000B8C01@mta01.ono.com> (added by postmaster@mta01.onolab.com)

First: excuse my bad English, I am from Spain.

I've installed Spambayes on my Windows XP machine, but I need the Outlook add-in for all users of the machine.
I've tried the solution that you hint at

http://cvs.sourceforge.net/viewcvs.py/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html?rev=HEAD&content-type=text/html#register-all-users

but it results in an error: "Dllname was loaded, but the DllRegisterServer or DllUnregisterServer entry point was not found", and I can't load the COM add-in in Outlook (Outlook 2003).

What can I do? I would like to use your program.

Thanks.