From sanjaydarisi at cox.net Wed Oct 1 00:35:32 2003 From: sanjaydarisi at cox.net (Sanjay Darisi) Date: Wed Oct 1 00:38:30 2003 Subject: [spambayes-dev] A question? Message-ID: <3F7A5994.5090905@cox.net> Hello all, I am developing a similar spam filter as spambayes. I am looking into the source code of spambayes outlook plugin. I have few questions regarding how it handles email messages. Could anyone please let me know how spambayes outlook plug-in handles email messages? Does it add any new fields to the email? Does it use the outlook database of emails, if so does it change email headers or something? Where is the email database stored? So, like when a new email message arrives in the inbox, what actions does the outlook plugin performs on the email? What fields does it change? and what fields does it add? Could anyone take a little bit of time in explaining me these basic principles of the outlook plugin? I am a novice to this project and would like to know more about the way outlook plugin works, particularly email database handling part. I guess i have asked the question at the right place. Thank you in advance, Sanjay. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20030930/b8dcef92/attachment.html From richie at entrian.com Wed Oct 1 03:33:44 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Oct 1 03:34:04 2003 Subject: [spambayes-dev] Re: [Spambayes] Error from pop3proxy In-Reply-To: References: <16249.58278.674451.186755@montanaro.dyndns.org> Message-ID: <0dvknv4rd15jerqkbmbbf4acumpi5khrqm@4ax.com> [Tim] > If it amounts to no more than making training mutually exclusive with > scoring, then some gross locks at a higher level would be a lot cheaper. The POP3 proxy effectively has this already, by virtue of its asyncore-based architecture, but I've still had DBRunRecoveryErrors from it. And I've never run any other process that touched the database. 
I don't think the problem we have is a threading problem. [Tim] > I'll speculate about one possible problem with Berkeley: if it isn't shut > down cleanly, DBRunRecoveryError may well be an *expected* exception when > you next start it, and running recovery at such times would then be a normal > part of using Berkeley. Until we know what's triggering DBRunRecoveryError, > I'm just as inclined to believe it can't be fixed without incorporating > recovery as I am to believe it's due to a thread race. You could well be right. There's a small part of my brain jumping up and down saying "No, you've seen DBRunRecoveryErrors start to occur during the middle of a run, not just after a restart!" but I'm not 100% convinced it's right. -- Richie Hindle richie@entrian.com From richie at entrian.com Wed Oct 1 03:33:46 2003 From: richie at entrian.com (Richie Hindle) Date: Wed Oct 1 03:34:05 2003 Subject: [spambayes-dev] About Bug 814322 In-Reply-To: <3F7990C8.4010001@simlog.com> References: <3F7990C8.4010001@simlog.com> Message-ID: <4jvknvgncfmf3v2epart305e70jaqoto8f@4ax.com> [Tony] > Basically the review page would die if one of the cached messages > was moved from the cache directory by someone other than spambayes > (for example a virus protection program). [Remi] > It is possible that this error can be caused by the fact that > spambayes is not able to separate the mime part of a message (to > generate file 123456 and 123456-1) No, that's not what those hyphenated filenames are about. The filenames are the timestamp - measured in seconds - of when the message was received, and hyphenated filenames are used when more than one message is received in a given second. No parsing of MIME happens at this level. -- Richie Hindle richie@entrian.com From mhammond at skippinet.com.au Wed Oct 1 05:21:54 2003 From: mhammond at skippinet.com.au (Mark Hammond) Date: Wed Oct 1 05:22:36 2003 Subject: [spambayes-dev] A question? 
In-Reply-To: <3F7A5994.5090905@cox.net> Message-ID: <04ce01c387fd$74eadad0$f502a8c0@eden> SpamBayes uses the MAPI API to work with messages. SpamBayes adds a couple of new fields to messages, but does not change any headers (themselves stored as a MAPI field) MAPI itself manages the "database of messages". Look at msdn.microsoft.com for information on MAPI. Mark. 
From kpitt at spidynamics.com Wed Oct 1 10:14:11 2003 From: kpitt at spidynamics.com (Kenny Pitt) Date: Wed Oct 1 10:14:15 2003 Subject: [spambayes-dev] Setting up pop3proxy_tray Message-ID: I've been working with the Outlook plugin for a while, but just decided to set up sb_server via pop3proxy_tray on my home system for use with Mozilla Mail. I'm having problems getting it to train properly. I'm sure I've probably just set something up incorrectly but I can't figure out what. Here's what I've done so far and the results I'm seeing (sorry for the long-winded explanation): I ran pop3proxy_tray which started correctly and displayed the tray icon. I selected Configure from the tray icon menu and the browser-based UI worked fine. I set up one listening port, double-checked all the database settings, and saved my config. I then exited pop3proxy_tray and restarted just to be sure the config had been updated, although I believe that problem has been fixed. After configuring my mail account in Mozilla, I was able to download mail through the proxy and SpamBayes classified all the messages as Unsure as expected. I was then able to view the list of messages in Review Messages. Here's where the problem starts. I select a few messages to train on and click Train. The UI reports "Training... Trained on n messages. Saving... Done." However, even after I do a Save & Shutdown and then restart the proxy, the home page still reports "Database has no training information." Now I'm having the additional problem that no matter how many messages I receive through the proxy, the Review Messages page always shows "no untrained messages" and I have to use Find Message to get to them. I'm running from latest SpamBayes CVS source with Python 2.3.1 and win32all 157. I'm using BerkeleyDB storage, and "which_database" properly reports that my statistics_database.db file is a dbhash. 
I set verbose to True and added some additional trace statements to storage.py. I found that changed_words is always empty when it goes to store the training info, and _wordinfoset never detects any singleton words. What am I missing here? -- Kenny Pitt From skip at pobox.com Wed Oct 1 10:14:08 2003 From: skip at pobox.com (Skip Montanaro) Date: Wed Oct 1 10:14:19 2003 Subject: [spambayes-dev] RE: [Spambayes] Toolbar In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130382A141@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130382A141@its-xchg4.massey.ac.nz> Message-ID: <16250.57648.481504.298390@montanaro.dyndns.org> >> I have had to uninstall and reinstall Spambayes and after the >> reinstall the Toolbar does not appear. I have tried everything that >> was listed in the troubleshooting.html and it has not been corrected. >> Any advice? Tony> What version of SpamBayes are you using? What do your log files Tony> have in them? Tony's response seems to always be the first thing asked to elucidate more input from users who submit questions. Maybe the various GUI tools should have a "Report Bug" button which prompts the user for their input and composes a message which includes the version number and log files (or at least gives the user checkboxes to include them, with the default being checked). When complete, either the message would be sent or the full text displayed asking the user to use that as the text of their message. Skip From adam.walker at rbwconsulting.com Wed Oct 1 10:34:41 2003 From: adam.walker at rbwconsulting.com (Adam Walker) Date: Wed Oct 1 10:35:04 2003 Subject: [spambayes-dev] Setting up pop3proxy_tray In-Reply-To: References: Message-ID: <3F7AE601.9030203@rbwconsulting.com> I just started using the tray today. You need to stop spambayes from the tray instead of the web interface. I/we will get this fixed shortly. Thanks, Adam Kenny Pitt wrote: >Here's where the problem starts. I select a few messages to train on >and click Train. 
The UI reports "Training... Trained on n messages. >Saving... Done." However, even after I do a Save & Shutdown and then >restart the proxy, the home page still reports "Database has no training >information." Now I'm having the additional problem that no matter how >many messages I receive through the proxy, the Review Messages page >always shows "no untrained messages" and I have to use Find Message to >get to them. > > From kennypitt at hotmail.com Wed Oct 1 10:56:33 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Wed Oct 1 10:56:41 2003 Subject: [spambayes-dev] Setting up pop3proxy_tray In-Reply-To: <3F7AE601.9030203@rbwconsulting.com> Message-ID: Adam Walker wrote: > I just started using the tray today. You need to stop spambayes from > the tray instead of the web interface. I/we will get this fixed > shortly. Thanks, > Adam > > Kenny Pitt wrote: > >> Here's where the problem starts. I select a few messages to train on >> and click Train. The UI reports "Training... Trained on n messages. >> Saving... Done." However, even after I do a Save & Shutdown and then >> restart the proxy, the home page still reports "Database has no >> training information." Now I'm having the additional problem that >> no matter how many messages I receive through the proxy, the Review >> Messages page always shows "no untrained messages" and I have to use >> Find Message to get to them. Interesting, because that's what I tried first. I only did Save & Shutdown from the Web interface after not having any luck shutting down from the tray icon. The only obvious problem that I had with shutting down from the web interface was that pop3proxy_tray did not detect that the proxy had been stopped. -- Kenny Pitt From rob at hooft.net Wed Oct 1 16:35:55 2003 From: rob at hooft.net (Rob Hooft) Date: Wed Oct 1 16:35:50 2003 Subject: [spambayes-dev] FAQ Message-ID: <3F7B3AAB.90509@hooft.net> The FAQ, question 1.4, refers to the mimelib.sf.net pages, but those are quite empty. 
The spambayes download instructions themselves refer (correctly) to the sig-pages. Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From T.A.Meyer at massey.ac.nz Wed Oct 1 19:49:22 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Oct 1 19:49:38 2003 Subject: [spambayes-dev] RE: [Spambayes] Toolbar Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130382A34F@its-xchg4.massey.ac.nz> > Tony's response seems to always be the first thing asked to > elucidate more input from users who submit questions. Maybe > the various GUI tools should have a "Report Bug" button which > prompts the user for their input and composes a message which > includes the version number and log files (or at least gives > the user checkboxes to include them, with the default being > checked). The Outlook plug-in has told people to do this via the trouble-shooting guide for a long time, although not that many people seem to follow the instructions. (Ironically, a lot of people say "I have tried all the steps in the troubleshooting guide", and then fail to include a log, which the troubleshooting guide says to do...). The latest version of the plug-in does at least make it easier to find the log file. I think I remember seeing code that would put the whole bug report message together in the Outlook sandbox directory, too, but it's not exposed yet. (I guess Mark is planning on adding it at some point). > When complete, either the message would be sent or > the full text displayed asking the user to use that as the > text of their message. It's probably a good idea to add this to the web interface, too. It shouldn't be all that difficult since we already (probably) have the user's smtp details and can just use smtplib. I'll knock something up for everyone to take a look at later on today. (Now that we will be able to send mail, we are one step closer to that full pop3 mail client, too ). 
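A minimal sketch of the report-by-mail idea discussed above, assuming the user's SMTP host is already known from the configuration. Only the use of smtplib comes from the message; the function names, subject line, default log file name and the layout of the body are hypothetical and chosen for illustration.

```python
import platform
import smtplib
import sys
from email.message import EmailMessage


def build_report(description, log_text, sender, recipient,
                 sb_version="(unknown)", log_name="spambayes.log"):
    """Compose a problem report with environment details and the log attached.

    All names here are illustrative; none of them are SpamBayes APIs.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "SpamBayes problem report"  # hypothetical subject line
    body = "\n".join([
        "SpamBayes version: %s" % sb_version,
        "Python version: %s" % sys.version.replace("\n", " "),
        "Operating system: %s" % platform.platform(),
        "",
        "The problem I am having is:",
        description,
    ])
    msg.set_content(body)
    # Attach the log text as a separate part with a file name.
    msg.add_attachment(log_text, filename=log_name)
    return msg


def send_report(msg, smtp_host, smtp_port=25):
    """Hand the composed report to the user's SMTP server."""
    with smtplib.SMTP(smtp_host, smtp_port) as server:
        server.send_message(msg)
```

Keeping the composing step separate from the sending step means the same text can be displayed for the user to paste into their own mail client when no SMTP details are available.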
=Tony Meyer From T.A.Meyer at massey.ac.nz Wed Oct 1 20:02:56 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Wed Oct 1 20:03:08 2003 Subject: [spambayes-dev] FAQ Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130382A36A@its-xchg4.massey.ac.nz> > The FAQ, question 1.4, refers to the mimelib.sf.net pages, > but those are quite empty. The spambayes download instructions > themselves refer (correctly) to the sig-pages. Thanks Rob, fixed. =Tony Meyer From ta-meyer at ihug.co.nz Thu Oct 2 01:17:28 2003 From: ta-meyer at ihug.co.nz (Tony Meyer) Date: Thu Oct 2 01:17:36 2003 Subject: [spambayes-dev] Example auto-message Message-ID: <200310020517.RAA19732@its-campus1.massey.ac.nz> A non-text attachment was scrubbed... Name: SpamBayesServer1.log Type: application/octet-stream Size: 781 bytes Desc: not available Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20031002/19fc2c5d/SpamBayesServer1.obj -------------- next part -------------- I am using SpamBayes POP3 Proxy Beta3, version 0.3 (September 2003) (source), with version 2.3.1 (#47, Sep 23 2003, 23:47:32) [MSC v.1200 32 bit (Intel)] of Python; my operating system is Windows 5.1.2600.2 (Service Pack 1). I have trained 201 ham and 231 spam. The problem I am having is [DESCRIBE YOUR PROBLEM HERE] --- This is what a message generated by the "Send Help Message" feature of sb_server looks like. Any suggestions for improvement, or other information that should be included? (You can ignore the attached log file, it's just for example purposes). =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Oct 2 01:56:24 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Oct 2 01:56:31 2003 Subject: [spambayes-dev] Setting up pop3proxy_tray Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130382A528@its-xchg4.massey.ac.nz> > Here's where the problem starts. I select a few messages to > train on and click Train. The UI reports "Training... > Trained on n messages. Saving... Done." 
However, even after > I do a Save & Shutdown and then restart the proxy, the home > page still reports "Database has no training information." Yoicks. I think this was a bug I introduced. I believe I have fixed it now (it wasn't specific to the tray app). If you could check and let me know, that would be great. > Now I'm having the additional problem that no matter how many > messages I receive through the proxy, the Review Messages > page always shows "no untrained messages" and I have to use > Find Message to get to them. This I'm not sure about. You've tried refreshing the page in the browser? If the find message query can retrieve it, then it's definitely loaded into the Corpus ok, so I'm not sure why it wouldn't be showing up. Three possibilities come to mind: * The code I added to handle the case where the file can't be opened - but this should also stop it working in the 'find message' case, I think. * The code Skip added to only show a certain number of messages per page - you don't have this somehow set to 0, do you? * The messages are somehow ending up in the ham/spam cache directory instead of the unknown one (the find message query would find it anyway). =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Oct 2 02:16:17 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Oct 2 02:16:24 2003 Subject: [spambayes-dev] About Bug 814322 Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130382A53A@its-xchg4.massey.ac.nz> [Tony] > Basically the review page would die if one of the cached messages was > moved from the cache directory by someone other than spambayes (for > example a virus protection program). [Remi] > It is possible that this error can be caused by the fact that > spambayes is not able to separate the mime part of a message (to > generate file 123456 and 123456-1) [Richie] > No, that's not what those hyphenated filenames are about. 
> The filenames are the timestamp - measured in seconds - of > when the message was received, and hyphenated filenames are > used when more than one message is received in a given > second. No parsing of MIME happens at this level. Which still leaves the question of why people see the "hdrtxt" error if it's not because of this (this was definitely one cause). If the hdrtxt attribute is asked for, then the load() is called, load calls SetSubstance(), which sets hdrtxt. So it's failing in the load, somewhere. Whatever the reason, note that cvs head doesn't have the symptom, because there is no hdrtxt attribute ever - the Corpus.Message class was subsumed into the message.Message class, and the hdrtxt stuff was no longer necessary. This doesn't mean that the message is fixed (although it may have been), just that it will show up differently. =Tony Meyer From T.A.Meyer at massey.ac.nz Thu Oct 2 02:48:01 2003 From: T.A.Meyer at massey.ac.nz (Meyer, Tony) Date: Thu Oct 2 02:48:08 2003 Subject: [spambayes-dev] Re: [Spambayes] Error from pop3proxy Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130382A53C@its-xchg4.massey.ac.nz> [Tim] > I'll speculate about one possible problem with Berkeley: if it isn't > shut down cleanly, DBRunRecoveryError may well be an *expected* > exception when you next start it, and running recovery at such times > would then be a normal part of using Berkeley. Until we know what's > triggering DBRunRecoveryError, I'm just as inclined to believe it > can't be fixed without incorporating recovery as I am to believe it's > due to a thread race. According to the sleepycat docs, it's standard practice to check if recovery needs to be run every time you start up, because nothing can go wrong if recovery doesn't need to be done and you check. They say that it's the db's log files that tell it if recovery needs to be done or not. Does the Python wrapper for bsddb create these log files, too? 
If so, would they be somewhere we can find them and look at them to see if they indicate what the problem might be? Also from the docs: """ DB_RUNRECOVERY There exists a class of errors that Berkeley DB considers fatal to an entire Berkeley DB environment. An example of this type of error is a corrupted database or a log write failure because the disk is out of free space. The only way to recover from these failures is to have all threads of control exit the Berkeley DB environment, run recovery of the environment, and re-enter Berkeley DB. (It is not strictly necessary that the processes exit, although that is the only way to recover system resources, such as file descriptors and memory, allocated by Berkeley DB.) When this type of error is encountered, the error value DB_RUNRECOVERY is returned. This error can be returned by any Berkeley DB interface. Once DB_RUNRECOVERY is returned by any interface, it will be returned from all subsequent Berkeley DB calls made by any threads of control participating in the environment. Optionally, applications may also specify a fatal-error callback function using the DB_ENV->set_paniccall function. This callback function will be called with two arguments: a reference to the DB_ENV structure associated with the environment and the errno value associated with the underlying error that caused the problem. Applications can handle such fatal errors in one of two ways: by checking for DB_RUNRECOVERY as part of their normal Berkeley DB error return checking, similarly to DB_LOCK_DEADLOCK or any other error, or by simply exiting the application when the callback function is called in applications that have no cleanup processing of their own. """ """ Errors can occur in the Berkeley DB library where the only solution is to shut down the application and run recovery (for example, if Berkeley DB is unable to allocate heap memory). """ I don't suppose you're low on memory or disk space, are you Richie? 
In any case, the sleepycat docs all seem to say that running recovery will fix the db without any worries, so perhaps that is simply what we should do? =Tony Meyer (Who wishes that ctrl-c would copy in the new-style html help with 2.3.1) From barry at python.org Thu Oct 2 07:41:38 2003 From: barry at python.org (Barry Warsaw) Date: Thu Oct 2 07:41:46 2003 Subject: [spambayes-dev] Re: [Spambayes] Error from pop3proxy In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130382A53C@its-xchg4.massey.ac.nz> References: <1ED4ECF91CDED24C8D012BCF2B034F130382A53C@its-xchg4.massey.ac.nz> Message-ID: <1065094897.21561.81.camel@anthem> On Thu, 2003-10-02 at 02:48, Meyer, Tony wrote: > I don't suppose you're low on memory or disk space, are you Richie? In > any case, the sleepycat docs all seem to say that running recovery will > fix the db without any worries, so perhaps that is simply what we should > do? FWIW, in the BerkeleyDB storage in ZODB, we do the following three things related to recovery:

- Open the environment with db.DB_RECOVER so that recovery is run automatically if necessary. If it's not necessary (i.e. the db was closed cleanly), this should take no time.
- Spawn a checkpointing thread. Recovery time is related to the amount of time since the last checkpoint. If you never checkpoint, the recovery (auto or explicit) can take a very long time. By default, the BerkeleyDB storage checkpoints every two minutes.
- On shutdown, do two forced checkpoints. That is:

    env.txn_checkpoint(0, 0, db.DB_FORCE)
    env.txn_checkpoint(0, 0, db.DB_FORCE)

This deep voodoo was recommended to me by Keith Bostic ages ago and its purpose is to avoid lengthy recoveries even when the database is shutdown cleanly. Apparently the DB_FORCE is required for the second call, but does no harm for the first. And yep, you need two of them. 
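A rough sketch of the checkpointing and shutdown steps listed above, written against plain callables so that it runs without Berkeley DB installed. The names CheckpointThread and close_cleanly are made up for illustration and are not ZODB or SpamBayes code. With the bsddb wrapper, the forced checkpoint would be the env.txn_checkpoint(0, 0, db.DB_FORCE) call quoted above; what the periodic checkpoint should pass is left to the caller here.

```python
import threading


class CheckpointThread(threading.Thread):
    """Call checkpoint() every `interval` seconds until stop() is called.

    Hypothetical helper: `checkpoint` is any zero-argument callable.
    """

    def __init__(self, checkpoint, interval=120.0):
        super().__init__(daemon=True)
        self._checkpoint = checkpoint
        self._interval = interval
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop_event.wait(self._interval):
            self._checkpoint()

    def stop(self):
        self._stop_event.set()
        self.join()


def close_cleanly(forced_checkpoint, close, worker=None):
    """Stop the checkpoint thread, checkpoint twice, then close."""
    if worker is not None:
        worker.stop()
    # Two forced checkpoints, as described in the message above.
    forced_checkpoint()
    forced_checkpoint()
    close()
```

The default interval of 120 seconds mirrors the two-minute figure mentioned for the BerkeleyDB storage. Opening the environment with the recovery flag is not shown, since it depends on how the environment is created.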
wishing-you-a-speedy-recovery-ly y'rs, -Barry From kennypitt at hotmail.com Thu Oct 2 09:45:10 2003 From: kennypitt at hotmail.com (Kenny Pitt) Date: Thu Oct 2 09:45:21 2003 Subject: [spambayes-dev] Setting up pop3proxy_tray In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130382A528@its-xchg4.massey.ac.nz> Message-ID: Meyer, Tony wrote: >> Here's where the problem starts. I select a few messages to >> train on and click Train. The UI reports "Training... >> Trained on n messages. Saving... Done." However, even after >> I do a Save & Shutdown and then restart the proxy, the home >> page still reports "Database has no training information." > > Yoicks. I think this was a bug I introduced. I believe I have fixed > it now (it wasn't specific to the tray app). If you could check and > let me know, that would be great. Seems to work fine now, thanks for the fix. > >> Now I'm having the additional problem that no matter how many >> messages I receive through the proxy, the Review Messages >> page always shows "no untrained messages" and I have to use >> Find Message to get to them. > > This I'm not sure about. You've tried refreshing the page in the > browser? If the find message query can retrieve it, then it's > definitely loaded into the Corpus ok, so I'm not sure why it wouldn't > be showing up. This problem appeared inexplicably and has now disappeared again just as inexplicably. I had tried refreshing the browser, and clicking the "Check again" link, both to no avail at the time I posted. Now I'm seeing all of my messages again including those that had disappeared from the review page yesterday. I'll let you know if the problem crops up again. -- Kenny Pitt From popiel at wolfskeep.com Thu Oct 2 10:45:15 2003 From: popiel at wolfskeep.com (T. Alexander Popiel) Date: Thu Oct 2 10:45:20 2003 Subject: [spambayes-dev] Example auto-message In-Reply-To: Message from "Tony Meyer" of "Thu, 02 Oct 2003 17:17:28 +1200." 
<200310020517.RAA19732@its-campus1.massey.ac.nz> References: <200310020517.RAA19732@its-campus1.massey.ac.nz> Message-ID: <20031002144515.B14A22DE7C@cashew.wolfskeep.com> In message: <200310020517.RAA19732@its-campus1.massey.ac.nz> "Tony Meyer" writes: >I am using SpamBayes POP3 Proxy Beta3, version 0.3 (September 2003) (source), with version 2.3.1 (#47, Sep 23 2003, 23:47:32) [MSC v.1200 32 bit (Intel)] of Python; my operating system is Windows 5.1.2600.2 (Service Pack 1). I have trained 201 ham and 231 spam. > >The problem I am having is [DESCRIBE YOUR PROBLEM HERE] > >--- > >This is what a message generated by the "Send Help Message" feature of sb_server looks like. Any suggestions for improvement, or other information that should be included? Some newlines might be nice. :-) - Alex From rob at hooft.net Thu Oct 2 11:15:06 2003 From: rob at hooft.net (Rob Hooft) Date: Thu Oct 2 11:15:13 2003 Subject: [spambayes-dev] Setting up a new account Message-ID: <3F7C40FA.1000608@hooft.net> I am setting up a new mail account for the first time in a long while, and decided to check the current docs for spambayes on how I am supposed to do that. Here are some comments. 1) First I thought about setting up the imapfilter. To configure SpamBayes, run "sb_imapfilter.py -b", which should open a web page to , click on the "Configuration" link at the top right, and fill in the relevant details. "which should open a web page": what does that mean? Does it start a server and a browser? For me it tried to run a browser, but it did hang in there for a long time and then I decided to give up. I was confused. I managed to run sb_server.py correctly, and connect to it using an external browser. Impressive. I set up a procmail filter instead. 2) sb_mboxtrain.py does not say that it can also handle Maildir, I had to find out from the source that it can under non-Windows O/S. 3) In the handling of Maildir by sb_mboxtrain, deleted messages are not ignored. 
4) sb_mboxtrain requires a -d/-D argument, it should probably use the values specified in the rc file. I must say that this has come a long way since the versions I have running for my other accounts. I may even decide to upgrade my other setups, making training sessions a lot easier.... Rob -- Rob W.W. Hooft || rob@hooft.net || http://www.hooft.net/people/rob/ From sjoerd at acm.org Thu Oct 2 11:24:15 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Thu Oct 2 11:24:22 2003 Subject: [spambayes-dev] Setting up a new account In-Reply-To: <3F7C40FA.1000608@hooft.net> References: <3F7C40FA.1000608@hooft.net> Message-ID: <3F7C431F.6080507@acm.org> Rob Hooft wrote: > I am setting up a new mail account for the first time in a long while, > and decided to check the current docs for spambayes on how I am supposed > to do that. Here are some comments. > > 1) First I thought about setting up the imapfilter. To configure > SpamBayes, run "sb_imapfilter.py -b", which should open a web > page to , click on the "Configuration" link > at the > top right, and fill in the relevant details. > "which should open a web page": what does that mean? Does it start a > server and > a browser? For me it tried to run a browser, but it did hang in there > for a long > time and then I decided to give up. I was confused. I managed to run > sb_server.py > correctly, and connect to it using an external browser. Impressive. I noticed this too and I investigated a little. I neglected to report my findings here. The problem is that it does indeed start a browser and waits for the command to finish before starting to serve webpages. This of course does not work. It should not wait. In fact, I'm not convinced that it should start a browser in the first place. 
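One way to keep a slow or blocking browser launch from holding up the web interface is to launch it from a separate thread, so the server loop starts regardless. This is only a sketch of that idea, not what sb_imapfilter.py does; the function name and the opener parameter are hypothetical, and webbrowser.open is the only real library call used.

```python
import threading
import webbrowser


def open_browser_in_background(url, opener=webbrowser.open):
    """Launch the browser from a daemon thread and return immediately.

    If the opener blocks (for instance because the browser process does
    not return), only the helper thread waits, not the caller.
    """
    worker = threading.Thread(target=opener, args=(url,), daemon=True)
    worker.start()
    return worker
```

The caller would bind its listening socket first, call this helper, and then enter its serving loop. The daemon flag means a stuck launch does not keep the process alive at exit.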
-- Sjoerd Mullender From skip at pobox.com Thu Oct 2 12:55:33 2003 From: skip at pobox.com (Skip Montanaro) Date: Thu Oct 2 12:55:45 2003 Subject: [spambayes-dev] Setting up a new account In-Reply-To: <3F7C431F.6080507@acm.org> References: <3F7C40FA.1000608@hooft.net> <3F7C431F.6080507@acm.org> Message-ID: <16252.22661.75395.349915@montanaro.dyndns.org> Sjoerd> The problem is that it does indeed start a browser and waits for Sjoerd> the command to finish before starting to serve webpages. This Sjoerd> of course does not work. It should not wait. In fact, I'm not Sjoerd> convinced that it should start a browser in the first place. All it does is call webbrowser.open(...), (or open_new) right? It's a bug in the webbrowser module or its setup if it doesn't return after launching the page. I just tried webbrowser.open("http://www.mojam.com/") with BROWSER=open in my environment. It returned immediately, then Safari popped up with the page. Do you have the BROWSER environment variable set? What does webbrowser.get() return for you? Skip From tim.one at comcast.net Thu Oct 2 13:36:35 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Oct 2 13:36:46 2003 Subject: [spambayes-dev] Bad spam & unsure stats in Outlook addin Message-ID: I noticed that the #spam and #unsure recorded by the Outlook addin were always even, while the #ham was sometimes even and sometimes odd. Paying closer attention, the #spam and #unsure reported are twice as big as they should be. This appears due to filter_message in Outlook2000/filter.py:

    def filter_message(msg, mgr, all_actions=True):
        ...
        prob = mgr.score(msg)
        prob_perc = prob * 100
        if prob_perc >= config.spam_threshold:
            ...
            mgr.stats.num_spam += 1                 HERE
        elif prob_perc >= config.unsure_threshold:
            ...
            mgr.stats.num_unsure += 1               HERE
        else:
            ...
        ...
        mgr.stats.RecordClassification(prob)        HERE

That is, the start of the function reaches into mgr.stats and increments its num_spam and num_unsure directly, while near the end of the function the call to mgr.stats.RecordClassification bumps them (indirectly) again. Offhand I'm not sure which is the better place to do this -- but it probably shouldn't be done both places. From tim.one at comcast.net Thu Oct 2 13:49:00 2003 From: tim.one at comcast.net (Tim Peters) Date: Thu Oct 2 13:49:06 2003 Subject: [spambayes-dev] Bad spam & unsure stats in Outlook addin In-Reply-To: Message-ID: [Tim] > ... > Paying closer attention, the #spam and #unsure reported are twice as > big as they should be. > ... > Offhand I'm not sure which is the better place to do this -- but it > probably shouldn't be done both places. Never mind: I checked in the obvious fix. From sjoerd at acm.org Thu Oct 2 15:37:18 2003 From: sjoerd at acm.org (Sjoerd Mullender) Date: Thu Oct 2 15:37:24 2003 Subject: [spambayes-dev] Setting up a new account In-Reply-To: <16252.22661.75395.349915@montanaro.dyndns.org> References: <3F7C40FA.1000608@hooft.net> <3F7C431F.6080507@acm.org> <16252.22661.75395.349915@montanaro.dyndns.org> Message-ID: <3F7C7E6E.8040609@acm.org> Skip Montanaro wrote: > Sjoerd> The problem is that it does indeed start a browser and waits for > Sjoerd> the command to finish before starting to serve webpages. This > Sjoerd> of course does not work. It should not wait. In fact, I'm not > Sjoerd> convinced that it should start a browser in the first place. > > All it does is call webbrowser.open(...), (or open_new) right? It's a bug > in the webbrowser module or its setup if it doesn't return after launching > the page. I just tried > > webbrowser.open("http://www.mojam.com/") > > with BROWSER=open in my environment. It returned immediately, then Safari > popped up with the page. Do you have the BROWSER environment variable set? > What does webbrowser.get() return for you? 
This was on RedHat with the Gnome desktop. It started galeon, and since
I didn't have a galeon running at the time, the one just invoked didn't
return. Normally if you have a galeon browser running, a new one will
just signal the old one and exit.

From skip at pobox.com Thu Oct 2 15:59:48 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Oct 2 16:00:01 2003
Subject: [spambayes-dev] Setting up a new account
In-Reply-To: <3F7C7E6E.8040609@acm.org>
References: <3F7C40FA.1000608@hooft.net> <3F7C431F.6080507@acm.org> <16252.22661.75395.349915@montanaro.dyndns.org> <3F7C7E6E.8040609@acm.org>
Message-ID: <16252.33716.948940.776302@montanaro.dyndns.org>

    Sjoerd> This was on RedHat with the Gnome desktop. It started galeon,
    Sjoerd> and since I didn't have a galeon running at the time, the one
    Sjoerd> just invoked didn't return. Normally if you have a galeon
    Sjoerd> browser running, a new one will just signal the old one and
    Sjoerd> exit.

So if you don't already have Galeon running and you execute

    import webbrowser
    webbrowser.open("http://www.python.org/")

at the Python prompt, you don't get the next prompt until after you've
exited Galeon? This is a webbrowser module bug (or a Galeon bug which the
webbrowser module fails to worm around). Can you file a bug report on the
Python project?

In the meantime, it looks like if you modify the definition of cmd in
Galeon._remote to

    cmd = "%s %s %s & >/dev/null 2>&1" % (self.name, raise_opt, action)

that webbrowser.open() should return for you. I'm not sure the '&' is
sufficient though. You may lose the Galeon instance if you then exit from
the Python interpreter.

In general, the code in webbrowser._remote() looks a bit hackish. I'm not
sure I like this:

    rc = os.system(cmd)
    if rc:
        import time
        os.system("%s >/dev/null 2>&1 &" % self.name)
        time.sleep(PROCESS_CREATION_DELAY)
        rc = os.system(cmd)

Oh well, it's what we're stuck with...

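For comparison, a minimal sketch of starting a browser command without
waiting for it to exit. It assumes a present-day Python 3 with the
subprocess module and is not what the webbrowser module does internally;
launch_detached is a hypothetical helper name, and start_new_session only
has an effect on POSIX systems.

```python
import subprocess
import sys


def launch_detached(argv):
    """Start argv as a child process and return at once, without waiting.

    Hypothetical helper for illustration; not part of webbrowser or SpamBayes.
    """
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,    # do not share our terminal input
        stdout=subprocess.DEVNULL,   # discard the child's output
        stderr=subprocess.DEVNULL,
        start_new_session=True,      # POSIX: run the child in its own session
    )


if __name__ == "__main__":
    # Stand-in for a browser: a child process that simply sleeps.
    child = launch_detached([sys.executable, "-c", "import time; time.sleep(2)"])
    print("still running:", child.poll() is None)
    child.wait()
```

The caller gets control back as soon as the child has been created, so a
server could go on to serve pages while the browser starts up.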
Thx,

Skip

From sjoerd at acm.org Thu Oct 2 17:02:02 2003
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Thu Oct 2 17:02:06 2003
Subject: [spambayes-dev] Setting up a new account
In-Reply-To: <16252.33716.948940.776302@montanaro.dyndns.org>
References: <3F7C40FA.1000608@hooft.net> <3F7C431F.6080507@acm.org> <16252.22661.75395.349915@montanaro.dyndns.org> <3F7C7E6E.8040609@acm.org> <16252.33716.948940.776302@montanaro.dyndns.org>
Message-ID: <3F7C924A.9050903@acm.org>

Skip Montanaro wrote:
>     Sjoerd> This was on RedHat with the Gnome desktop. It started galeon,
>     Sjoerd> and since I didn't have a galeon running at the time, the one
>     Sjoerd> just invoked didn't return. Normally if you have a galeon
>     Sjoerd> browser running, a new one will just signal the old one and
>     Sjoerd> exit.
>
> So if you don't already have Galeon running and you execute
>
>     import webbrowser
>     webbrowser.open("http://www.python.org/")
>
> at the Python prompt, you don't get the next prompt until after you've
> exited Galeon? This is a webbrowser module bug (or a Galeon bug which the
> webbrowser module fails to worm around). Can you file a bug report on the
> Python project?

Done. Bug #816810.

From richie at entrian.com Thu Oct 2 18:36:11 2003
From: richie at entrian.com (Richie Hindle)
Date: Thu Oct 2 18:36:14 2003
Subject: [spambayes-dev] Re: [Spambayes] Error from pop3proxy
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130382A53C@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130382A53C@its-xchg4.massey.ac.nz>
Message-ID:

[Tony]
> I don't suppose you're low on memory or disk space, are you Richie?

No, sorry, 450MB free RAM and 25GB free disk space. Should be enough. 8-)

> In any case, the sleepycat docs all seem to say that running recovery will
> fix the db without any worries, so perhaps that is simply what we should
> do?

I can't quite remember all the details of this, but doesn't running
recovery require a full DB environment rather than the single-file mode we
use?

--
Richie Hindle
richie@entrian.com

From T.A.Meyer at massey.ac.nz Thu Oct 2 22:15:29 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Oct 2 22:15:38 2003
Subject: [spambayes-dev] Example auto-message
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130382A79D@its-xchg4.massey.ac.nz>

> Some newlines might be nice. :-)

Ah yes, I forgot that smtplib wouldn't wrap the message for me :) I'll add
that in.

=Tony Meyer

From ta-meyer at ihug.co.nz Thu Oct 2 22:39:46 2003
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Thu Oct 2 22:39:54 2003
Subject: [spambayes-dev] RE: Returned mail: Service unavailable
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1303745639@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AFC4@its-xchg4.massey.ac.nz>

> ----- The following addresses had permanent fatal errors
> -----
>
> ----- Transcript of session follows -----
> ... while talking to mail.python.org.:
> >>> DATA
> <<< 550 message rejected -- looks like spam
> 554 ... Service unavailable

Any idea why this message was considered spam by mail.python.org?

"""
Return-Path:
Received: from its-mm1.massey.ac.nz (its-mm1 [130.123.128.45])
    by its-mail1.massey.ac.nz (8.9.3/8.9.3) with ESMTP id OAA08940;
    Fri, 3 Oct 2003 14:14:41 +1200 (NZST)
Received: from its-campus1.massey.ac.nz (Not Verified[130.123.32.254])
    by its-mm1.massey.ac.nz with NetIQ MailMarshal id ;
    Fri, 03 Oct 2003 14:14:41 +1200
Received: from it029048 (it029048.massey.ac.nz [130.123.238.51])
    by its-campus1.massey.ac.nz (8.9.3/8.9.3) with ESMTP id OAA26124;
    Fri, 3 Oct 2003 14:14:40 +1200 (NZST)
From: "Tony Meyer"
To:
Cc:
Subject: RE: [Spambayes-checkins] spambayes/spambayes UserInterface.py, 1.28,1.29
Date: Fri, 3 Oct 2003 14:14:40 +1200
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AFC2@its-xchg4.massey.ac.nz>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.4510
Importance: Normal
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130374555F@its-xchg4.massey.ac.nz>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165

> Damn, Tony! You're quick! Thanks for that enhancement.

I was encouraged by the thought of fewer "It doesn't work, I'm not going to
tell you anything else" messages.

=Tony Meyer
"""

From details.txt, which was attached to the failure:

"""
Reporting-MTA: dns; its-mail1.massey.ac.nz
Received-From-MTA: DNS; its-mm1
Arrival-Date: Fri, 3 Oct 2003 14:14:41 +1200 (NZST)

Final-Recipient: RFC822; spambayes-checkins@python.org
Action: failed
Status: 5.2.0
Remote-MTA: DNS; mail.python.org
Diagnostic-Code: SMTP; 550 message rejected -- looks like spam
Last-Attempt-Date: Fri, 3 Oct 2003 14:14:47 +1200 (NZST)
"""

=Tony Meyer

From tim.one at comcast.net Thu Oct 2 22:50:22 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Oct 2 22:50:31 2003
Subject: [spambayes-dev] Why did python.org give this a 550 ("looks like spam")?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212AFC4@its-xchg4.massey.ac.nz>
Message-ID:

[Tony Meyer]
> Any idea why this message was considered spam by mail.python.org?

I don't, so let's ask the postmaster (to whom this is copied, with your
message as an attachment).

Suspicion: some regular expressions were set up to reject worms, and any
algorithm using a regular expression in the hope of accomplishing something
non-trivial is bound to fail at times in real life. At one recent point,
python.org blacklisted itself on those grounds.

-------------- next part --------------
An embedded message was scrubbed...
From: "Tony Meyer"
Subject: [spambayes-dev] RE: Returned mail: Service unavailable
Date: Thu, 2 Oct 2003 22:39:46 -0400
Size: 3472
Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20031002/58746ab5/attachment.mht

From T.A.Meyer at massey.ac.nz Fri Oct 3 01:57:26 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Oct 3 01:57:38 2003
Subject: [spambayes-dev] Setting up a new account
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130382A87B@its-xchg4.massey.ac.nz>

> > run "sb_imapfilter.py -b", which should open a web
> > page to ,
> In fact, I'm not convinced that it
> should start a browser in the first place.

It's just the "-b" that's telling it to start a browser; if you run it
without, then you just get the interface and have to open the browser
yourself. However, imapfilter currently requires you to use -c, -t, or
-b (or more than one), unlike sb_server, which works without anything.
Given that there are apparently problems with forcing the browser to
open, I'll change this so that running sb_imapfilter.py by itself is
just like sb_server.py by itself, serving the interface.

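The flag handling described above could be sketched roughly as follows.
This is an illustration only, not the actual sb_imapfilter.py code:
parse_args and the option attribute names are hypothetical, and reading -c
as "classify" and -t as "train" is an assumption.

```python
import argparse


def parse_args(argv):
    """Parse hypothetical sb_imapfilter-style flags (illustration only)."""
    parser = argparse.ArgumentParser(prog="sb_imapfilter.py")
    parser.add_argument("-b", dest="browser", action="store_true",
                        help="open a web browser on the interface")
    parser.add_argument("-c", dest="classify", action="store_true",
                        help="classify messages (assumed meaning)")
    parser.add_argument("-t", dest="train", action="store_true",
                        help="train on messages (assumed meaning)")
    opts = parser.parse_args(argv)
    # Serve the web interface when -b is given, or when no flags are given
    # at all; a bare -c or -t run stays a one-off job with no server.
    opts.serve = opts.browser or not (opts.classify or opts.train)
    return opts


if __name__ == "__main__":
    print(parse_args([]))
    print(parse_args(["-t"]))
```

With this arrangement a bare invocation serves the interface, while a
one-off run with -c or -t never starts a server unless -b is added.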
=Tony Meyer

From sjoerd at acm.org Fri Oct 3 03:42:50 2003
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Fri Oct 3 03:43:04 2003
Subject: [spambayes-dev] Setting up a new account
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130382A87B@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130382A87B@its-xchg4.massey.ac.nz>
Message-ID: <3F7D287A.3060506@acm.org>

Meyer, Tony wrote:
>>>run "sb_imapfilter.py -b", which should open a web
>>>page to ,
>>
>>In fact, I'm not convinced that it
>>should start a browser in the first place.
>
>
> It's just the "-b" that's telling it to start a browser; if you run it
> without, then you just get the interface and have to open the browser
> yourself. However, imapfilter currently requires you to use -c, -t, or
> -b (or more than one), unlike sb_server, which works without anything.
> Given that there are apparently problems with forcing the browser to
> open, I'll change this so that running sb_imapfilter.py by itself is
> just like sb_server.py by itself, serving the interface.

I was talking about imapfilter. I haven't looked at sb_server.

It seems to me the ideal situation would be: if given the correct flag,
start a web server. If, in addition, given the -b flag, start a
browser. I don't think you want to lose the possibility of running a
one-off training session without starting a web server.

--
Sjoerd Mullender

From richie at entrian.com Fri Oct 3 05:12:01 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Oct 3 05:12:10 2003
Subject: [spambayes-dev] Tricky false positive: US states
Message-ID:

Here's an interesting false positive: I asked an American colleague a
question about US state codes, and he emailed me a copy of this page from
the US Post Office website:

  http://www.usps.com/ncsc/lookups/usps_abbreviations.html

Now that scored as pretty solid spam for me (0.99075) because all the
state names are slight spam clues - most of my spam comes from the USA.
Here's a snippet of the X-Spambayes-Evidence header:

  'lock': 0.73; 'louisiana': 0.73; 'marshall': 0.73; 'missouri': 0.73;
  'mount': 0.73; 'nebraska': 0.73; 'ohio': 0.73; 'parkway': 0.73;
  'pennsylvania': 0.73; 'plz': 0.73; 'rad': 0.73; 'square': 0.73;
  'tennessee': 0.73; 'texas': 0.73; 'trl': 0.73; 'valley': 0.73;

and so on. All those fifty slightly-spammy state names add up to a big
spam score. Most of them are hapaxes, but that's not very relevant - it's
just a result of not having a very big training set (~600 messages).

Not sure whether there's anything we can do about it (or even whether we
should consider doing anything about it) but I thought it was interesting.

[ Ah, no, hang on, I *do* have an idea, but it's mostly outside the remit
  of Spambayes. Mail that never went outside my organisation shouldn't
  be marked as spam. All the Received headers show the mail moving within
  my organisation. So I want some kind of plug-in system whereby I can
  use the Spambayes tokeniser, header analysis and so on to make my own
  decisions that override the classifier. Once my army of winged monkeys
  has finished their Python training course I'll get them onto it. ]

--
Richie Hindle
richie@entrian.com

From doug at sonosphere.com Fri Oct 3 09:25:52 2003
From: doug at sonosphere.com (Doug Wyatt)
Date: Fri Oct 3 09:25:55 2003
Subject: [spambayes-dev] fp: innocuous text in hidden html
Message-ID: <1C2ABA44-F5A5-11D7-856C-000393A34B5A@sonosphere.com>

I have to admit this is a clever spam technique -- I've taken a quick
look in the archives and read through tokenizer.py and seen nothing
about it. The trick is that the message has a number of elements
with type=hidden and value=something very hammy.

Has anyone tried making the tokenizer do special things with the values
of input fields?

Doug

MIME-Version: 1.0
Content-Type: text/html; charset="iso-8859-1"

... snip ...

... snip --here's the payload-- ...
Click here to stop further messages

From skip at pobox.com Fri Oct 3 09:46:58 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Oct 3 09:47:06 2003
Subject: [spambayes-dev] Re: [Spambayes] Error from pop3proxy
In-Reply-To:
References: <1ED4ECF91CDED24C8D012BCF2B034F130382A53C@its-xchg4.massey.ac.nz>
Message-ID: <16253.32210.888672.540317@montanaro.dyndns.org>

    Richie> I can't quite remember all the details of this, but doesn't
    Richie> running recovery require a full DB environment rather than the
    Richie> single-file mode we use?

I believe so. To fix some of the problems, Greg Smith checked in a change
to Python's bsddb package the other day which creates an in-memory
environment. This should help, but without an environment stored on-disk, I
don't think proper recovery will be possible. If we want to be able to do
db recovery in SpamBayes, I think we're going to have to start using the new
features of the bsddb package.

Skip

From skip at pobox.com Fri Oct 3 10:16:43 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Oct 3 10:16:49 2003
Subject: [spambayes-dev] fp: innocuous text in hidden html
In-Reply-To: <1C2ABA44-F5A5-11D7-856C-000393A34B5A@sonosphere.com>
References: <1C2ABA44-F5A5-11D7-856C-000393A34B5A@sonosphere.com>
Message-ID: <16253.33995.174355.744583@montanaro.dyndns.org>

    Doug> I have to admit this is a clever spam technique -- I've taken a
    Doug> quick look in the archives and read through tokenizer.py and seen
    Doug> nothing about it. The trick is that the message has a number of
    Doug> elements with type=hidden and value=something very hammy.

I believe the tokenizer strips out all HTML tags, at least it makes a good
effort to do so. It uses a fancy-schmancy regular expression Tim Peters
wrote to make it fast, but I believe it's also limited in what it believes
the maximum length of an HTML tag can be:

    # Cheap-ass gimmick to probabilistically find HTML/XML tags.
    # Note that