From mhammond at skippinet.com.au Tue Oct 9 07:40:41 2007
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue, 9 Oct 2007 15:40:41 +1000
Subject: [spambayes-dev] exceptions in windows binary
Message-ID: <04a001c80a36$eef22ee0$ccd68ca0$@com.au>

I just tried the spambayes-1.1a4-070629 with a proxy and I think there is a problem. If you do not select the 'upgrade databases' option, attempting to train will result in:

Training...
500 Server error

Traceback (most recent call last):
  File "spambayes\Dibbler.pyc", line 476, in found_terminator
  File "spambayes\ProxyUI.pyc", line 304, in onReview
  File "spambayes\Stats.pyc", line 134, in RecordTraining
  File "spambayes\message.pyc", line 147, in set_persistent_statistics
  File "shelve.pyc", line 130, in __setitem__
TypeError: object does not support item assignment

I only played with this as I first saw it on my Mum's PC - I had left the 'upgrade' option enabled, but saw an exception flash by in the DOS box that briefly appeared, and then started seeing the exception. On a whim, I re-executed the converter, and the problem went away.

Returning home, I tried to repro, but could not - I didn't see an exception (although it may have been too fast) and I didn't see the error. I uninstalled, removed the spambayes data directory, re-ran the executable, deselected the options, then was able to repro it.

The installer also asks you twice about upgrading - in the first dialog, then at the end. I only de-selected the end one.

I'm just noting it here as I'm too lazy to open a bug, but this seems like a bug we need to fix before 1.0. Anyone have any clues?

Mark

From skip at pobox.com Tue Oct 9 17:30:43 2007
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 9 Oct 2007 10:30:43 -0500
Subject: [spambayes-dev] exceptions in windows binary
In-Reply-To: <04a001c80a36$eef22ee0$ccd68ca0$@com.au>
References: <04a001c80a36$eef22ee0$ccd68ca0$@com.au>
Message-ID: <18187.40611.971014.108032@montanaro.dyndns.org>

Mark> I just tried the spambayes-1.1a4-070629 with a proxy and I think
Mark> there is a problem. If you do not select the 'upgrade databases'
Mark> option, attempting to train will result in:
...

Remind me again what "upgrade databases" is supposed to do?

Mark> I'm just noting it here as I'm too lazy to open a bug, but this
Mark> seems like a bug we need to fix before 1.0. Anyone have any
Mark> clues?

Not a one. Is this problem specific to the Outlook installer?

Skip

From mhammond at skippinet.com.au Tue Oct 9 22:52:33 2007
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed, 10 Oct 2007 06:52:33 +1000
Subject: [spambayes-dev] exceptions in windows binary
In-Reply-To: <18187.40611.971014.108032@montanaro.dyndns.org>
References: <04a001c80a36$eef22ee0$ccd68ca0$@com.au> <18187.40611.971014.108032@montanaro.dyndns.org>
Message-ID: <052e01c80ab6$6118a5b0$2349f110$@com.au>

> Mark> I just tried the spambayes-1.1a4-070629 with a proxy and I think
> Mark> there is a problem. If you do not select the 'upgrade databases'
> Mark> option, attempting to train will result in:
> ...
>
> Remind me again what "upgrade databases" is supposed to do?

No idea! No idea who added it either.

> Mark> I'm just noting it here as I'm too lazy to open a bug, but this
> Mark> seems like a bug we need to fix before 1.0. Anyone have any
> Mark> clues?
>
> Not a one. Is this problem specific to the Outlook installer?

It's specific to using the pop3 proxy from the binary installer, but has nothing to do with the Outlook addin.

Mark

From kenny.pitt at gmail.com Fri Oct 12 14:36:32 2007
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Fri, 12 Oct 2007 08:36:32 -0400
Subject: [spambayes-dev] [Spambayes] removal
In-Reply-To: <18173.22822.905736.389082@montanaro.dyndns.org>
References: <114793.92018.qm@web83705.mail.sp1.yahoo.com> <1187717281.9821.49.camel@saruman> <18157.12854.323837.957584@montanaro.dyndns.org> <2a052b990709181017h3b3acdcewec849e9f4485bdaf@mail.gmail.com> <18160.35235.278173.917643@montanaro.dyndns.org> <2a052b990709280705t1fe95be1r46ce4fdd2e0dd0ce@mail.gmail.com> <2a052b990709280724v624f6e51s8dac4c8bb8603a8b@mail.gmail.com> <18173.22822.905736.389082@montanaro.dyndns.org>
Message-ID: <2a052b990710120536l2b32a2e8oeec1dd86df91474d@mail.gmail.com>

On 9/28/07, skip at pobox.com wrote:
>
> Kenny> Hmm, this is a bit odd.... It's almost as if GMail is inferring
> Kenny> a hyperlink there, but I can't figure out why or where it came up
> Kenny> with the link because the spambayesInfo name doesn't even appear
> Kenny> in any of the mail header fields.
>
> I use gmail as well, though I get the mail delivered via POP and only rarely
> use the gmail web interface. I just went back and looked at this and a
> couple other threads via the gmail web interface. Not only did I not see or
> click URLs with missing "/" characters, I did not encounter any
> Info/Unsubscribe links which ended in "/Unsubscribe". They all took me
> (correctly, I believe) to the ".../listinfo/spambayes" page. Maybe it's a
> browser problem or an interaction between gmail and your web browser. I'm
> viewing through Thunderbird 2.0.0.something on Solaris 10 at the moment. I
> can check using Safari and T-Bird from my Mac when I get home.

Yeah, it's definitely something weird that the GMail web interface is doing with the sigs, and it seems to be browser-specific. I opened the same message thread in both Safari and Firefox 2 on my Mac.
For the following line in the signature, I got totally different results:

Info/Unsubscribe: http://mail.python.org/mailman/listinfo/spambayes

Safari:
"Info/Unsubscribe" not highlighted as a link.
"http://mail.python.org/mailman" highlighted as a link (note incomplete highlight) linking to "http://mail.python.org/mailman/listinfo/spambayes".

Firefox:
"Info/Unsubscribe" highlighted as a link linking to "http://mail.python.org/mailman/listinfo/spambayesInfo/Unsubscribe".
"http://mail.python.org/mailman/listinfo/spambayes" highlighted as a link to the same complete URL.

IIRC, the behavior I got at work on IE in Windows was the same as what I'm seeing in Firefox on the Mac.

In looking closer at the Firefox result, I think I see what is happening. The line above the "Info/Unsubscribe" line in the signature is "http://mail.python.org/mailman/listinfo/spambayes" on a line by itself. The "Info/Unsubscribe" appears to have been detected as a broken-line continuation of the URL on the line above and was converted into a single link to the "re-assembled" URL. The "spambayesInfo" comes from the "Info" part being appended. The "/Unsubscribe" then makes it *look* like a direct link to an unsubscribe page, but isn't the actual URL format used by mailman.

I can't be sure, but I think it may actually be GMail that is doing the combining. First, I saw it in both Firefox and IE, though not in Safari, so it is not entirely browser-specific. Second, I'm quite sure I actually saw the link in the page source on IE. On the Mac, all I could get from source was the GMail JavaScript, but my IE has a developer extension to show the actual HTML inserted by the JavaScript and that's where it showed up.

So apparently it's just a strange GMail web interface glitch. There's obviously nothing wrong with the sig coming from mailman, so nothing to fix. Sorry for the wild goose chase!
<0.5 wink>

-- 
Kenny Pitt

From dave at boost-consulting.com Sun Oct 21 16:37:08 2007
From: dave at boost-consulting.com (David Abrahams)
Date: Sun, 21 Oct 2007 10:37:08 -0400
Subject: [spambayes-dev] Unsafe Pickle Saving
Message-ID: <871wbosfe3.fsf@grogan.peloton>

SpamBayes has, in storage.py, some code for storing pickles safely, but that code is only used for the training database and not the numerous other pickles used in SpamBayes, e.g. for caches. The result was that, when I was doing server-side filtering, cache pickles would be read while being written, and SpamBayes wasn't prepared to deal with the corruption.

I submitted a patch at http://sourceforge.net/tracker/index.php?func=detail&aid=1816240&group_id=61702&atid=498103 that factors that code out and uses it wherever pickles are saved. Frankly it would be worth looking at all the places where SpamBayes saves files, because this is probably not a pickle-specific issue.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

From skip at pobox.com Mon Oct 22 00:33:54 2007
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 21 Oct 2007 17:33:54 -0500
Subject: [spambayes-dev] Unsafe Pickle Saving
In-Reply-To: <871wbosfe3.fsf@grogan.peloton>
References: <871wbosfe3.fsf@grogan.peloton>
Message-ID: <18203.54226.797351.19158@montanaro.dyndns.org>

David> The result was that, when I was doing server-side filtering,
David> cache pickles would be read while being written, and SpamBayes
David> wasn't prepared to deal with the corruption.

David,

Can you explain how the pickle would be read and written simultaneously? Are we talking multiple apps or the same app from multiple threads? (Haven't looked at your patch yet...)

Thx,

Skip

From dave at boost-consulting.com Mon Oct 22 07:22:54 2007
From: dave at boost-consulting.com (David Abrahams)
Date: Mon, 22 Oct 2007 01:22:54 -0400
Subject: [spambayes-dev] Unsafe Pickle Saving
In-Reply-To: <18203.54226.797351.19158@montanaro.dyndns.org> (skip@pobox.com's message of "Sun\, 21 Oct 2007 17\:33\:54 -0500")
References: <871wbosfe3.fsf@grogan.peloton> <18203.54226.797351.19158@montanaro.dyndns.org>
Message-ID: <87r6jnradt.fsf@grogan.peloton>

on Sun Oct 21 2007, skip-AT-pobox.com wrote:

> David> The result was that, when I was doing server-side filtering,
> David> cache pickles would be read while being written, and SpamBayes
> David> wasn't prepared to deal with the corruption.
>
> David,
>
> Can you explain how the pickle would be read and written simultaneously?
> Are we talking multiple apps or the same app from multiple threads?
> (Haven't looked at your patch yet...)

I assume it's the same app from multiple processes. I don't know how my server hands messages off to procmail -- it could well be running them in parallel. And then there's my cron job that does training once daily, which could happen at the same time as any message arrives.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

From skip at pobox.com Mon Oct 22 14:00:17 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Oct 2007 07:00:17 -0500
Subject: [spambayes-dev] Unsafe Pickle Saving
In-Reply-To: <87r6jnradt.fsf@grogan.peloton>
References: <871wbosfe3.fsf@grogan.peloton> <18203.54226.797351.19158@montanaro.dyndns.org> <87r6jnradt.fsf@grogan.peloton>
Message-ID: <18204.37073.862229.980827@montanaro.dyndns.org>

>> Can you explain how the pickle would be read and written
>> simultaneously?

Dave> I assume it's the same app from multiple processes.... And then
Dave> there's my cron job that does training once daily, which could
Dave> happen at the same time as any message arrives.

That would be it.
There is no locking of the various databases between processes. I suppose we could implement something, but the vagaries of network file systems and cross-platform demands have always made that task problematic. Maybe we should expand your safe_pickle code into safe_pickle_read and safe_pickle_write with a corresponding bit of file locking code they can invoke (which will probably be ugly as sin - but hidden away in a dark corner - once it's complete).

Skip

From dave at boost-consulting.com Mon Oct 22 17:43:25 2007
From: dave at boost-consulting.com (David Abrahams)
Date: Mon, 22 Oct 2007 11:43:25 -0400
Subject: [spambayes-dev] Unsafe Pickle Saving
In-Reply-To: <18204.37073.862229.980827@montanaro.dyndns.org> (skip@pobox.com's message of "Mon\, 22 Oct 2007 07\:00\:17 -0500")
References: <871wbosfe3.fsf@grogan.peloton> <18203.54226.797351.19158@montanaro.dyndns.org> <87r6jnradt.fsf@grogan.peloton> <18204.37073.862229.980827@montanaro.dyndns.org>
Message-ID: <87640zgnoi.fsf@grogan.peloton>

on Mon Oct 22 2007, skip-AT-pobox.com wrote:

> >> Can you explain how the pickle would be read and written
> >> simultaneously?
>
> Dave> I assume it's the same app from multiple processes.... And then
> Dave> there's my cron job that does training once daily, which could
> Dave> happen at the same time as any message arrives.
>
> That would be it.

Well, the cron job does tte with a fresh database, so it's only the various caches that can get into trouble.

> There is no locking of the various databases between
> processes. I suppose we could implement something, but the vagaries of
> network file systems and cross-platform demands have always made that task
> problematic. Maybe we should expand your safe_pickle code into
> safe_pickle_read and safe_pickle_write with a corresponding bit of file
> locking code they can invoke (which will probably be ugly as sin - but
> hidden away in a dark corner - once it's complete).

Sounds fine to me. I doubt it would be so terribly ugly either.
Somebody should have already written a file lock library that encapsulates it, so maybe we can use that. And if it doesn't exist, it should!

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com

From skip at pobox.com Mon Oct 22 17:56:22 2007
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 22 Oct 2007 10:56:22 -0500
Subject: [spambayes-dev] Unsafe Pickle Saving
In-Reply-To: <87640zgnoi.fsf@grogan.peloton>
References: <871wbosfe3.fsf@grogan.peloton> <18203.54226.797351.19158@montanaro.dyndns.org> <87r6jnradt.fsf@grogan.peloton> <18204.37073.862229.980827@montanaro.dyndns.org> <87640zgnoi.fsf@grogan.peloton>
Message-ID: <18204.51238.591613.909583@montanaro.dyndns.org>

Dave> Somebody should have already written a file lock library that
Dave> encapsulates it, so maybe we can use that. And if it doesn't
Dave> exist, it should!

It does exist, in several flavors, as my query to python-dev indicates. I've gotten three responses so far (twisted, mailman and something called zc.lockfile). They are all different implementations and it's not clear they all satisfy the same constraints. I'll look at them then do something (not sure what at this point).

Skip

From tdickenson at geminidataloggers.com Tue Oct 23 11:11:30 2007
From: tdickenson at geminidataloggers.com (Toby Dickenson)
Date: Tue, 23 Oct 2007 10:11:30 +0100
Subject: [spambayes-dev] Unsafe Pickle Saving
References: <871wbosfe3.fsf@grogan.peloton> <18203.54226.797351.19158@montanaro.dyndns.org> <87r6jnradt.fsf@grogan.peloton> <18204.37073.862229.980827@montanaro.dyndns.org> <87640zgnoi.fsf@grogan.peloton> <18204.51238.591613.909583@montanaro.dyndns.org>
Message-ID:

skip at pobox.com wrote:

>
> Dave> Somebody should have already written a file lock library that
> Dave> encapsulates it, so maybe we can use that. And if it doesn't
> Dave> exist, it should!
>
> It does exist, in several flavors, as my query to python-dev indicates.

Sounds like a good addition to python, but I'm not sure it's the solution in this case. The cron job probably should tte using a different database filename, then atomically swap the new database in place of the old. This ensures you don't leave a partial database in place if the machine crashes/is shutdown/process is killed during the tte run. It also allows procmail to run on the old database concurrently with the training run, which I guess a lock file would prevent.

Or am I confused here? I've just re-read David's original post, and you say the problem is caches rather than the database. Which caches are these?

-- 
Toby Dickenson (happy procmail user)

From neilldavy at hotmail.com Sun Oct 28 01:54:14 2007
From: neilldavy at hotmail.com (Neill Davy)
Date: Sun, 28 Oct 2007 00:54:14 +0100
Subject: [spambayes-dev] How do I setup spambayes for this situation....
Message-ID:

I currently have two email addresses which I access through Outlook Express. A tiscali one and a hotmail one. How do I set up Spambayes for this? The info I have currently found in the FAQs seems to be based around just a POP3 supplier - so I'm not sure whether to just follow that or not.

clueless of MK
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20071028/bd9af38d/attachment.htm

From dave at boost-consulting.com Mon Oct 29 19:17:25 2007
From: dave at boost-consulting.com (David Abrahams)
Date: Mon, 29 Oct 2007 14:17:25 -0400
Subject: [spambayes-dev] Unsafe Pickle Saving
References: <871wbosfe3.fsf@grogan.peloton> <18203.54226.797351.19158@montanaro.dyndns.org> <87r6jnradt.fsf@grogan.peloton> <18204.37073.862229.980827@montanaro.dyndns.org> <87640zgnoi.fsf@grogan.peloton> <18204.51238.591613.909583@montanaro.dyndns.org>
Message-ID: <87sl3tdbuy.fsf@grogan.peloton>

on Tue Oct 23 2007, Toby Dickenson wrote:

> skip at pobox.com wrote:
>
>>
>> Dave> Somebody should have already written a file lock library that
>> Dave> encapsulates it, so maybe we can use that. And if it doesn't
>> Dave> exist, it should!
>>
>> It does exist, in several flavors, as my query to python-dev indicates.
>
> Sounds like a good addition to python, but I'm not sure it's the solution in
> this case. The cron job probably should tte using a different database
> filename, then atomically swap the new database in place of the
> old.

It does do that. However, the caches for dns and images (which are pickles) are the same in both cases.

> This ensures you don't leave a partial database in place if the
> machine crashes/is shutdown/process is killed during the tte run. It
> also allows procmail to run on the old database concurrently with
> the training run, which I guess a lock file would prevent.
>
> Or am I confused here? I've just re-read David's original post, and you say
> the problem is caches rather than the database. Which caches are
> these?

Bingo; see above.

-- 
Dave Abrahams
Boost Consulting
http://www.boost-consulting.com
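
Editorial sketch: the thread above talks about saving cache pickles so that a concurrent reader never sees a half-written file, and about atomically swapping a new file in place of the old one. The following is a minimal illustration of that write-to-temp-then-rename idea, assuming modern Python 3 (os.replace did not exist in 2007). The names safe_pickle_write and safe_pickle_read echo Skip's suggestion, but the bodies are hypothetical and are not the code in storage.py or in the submitted patch. No inter-process locking is attempted here, so two concurrent writers simply race and the last rename wins.

```python
import os
import pickle
import tempfile


def safe_pickle_write(path, obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Hypothetical helper: write obj to path via a temp file and a rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the final rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            pickle.dump(obj, tmp, protocol)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the complete file into place; readers see either the old
        # pickle or the new one, never a partial write.
        os.replace(tmp_path, path)
    except BaseException:
        # Something failed before the swap: remove the temp file, leave
        # any existing pickle at path untouched.
        try:
            os.unlink(tmp_path)
        except OSError:
            pass
        raise


def safe_pickle_read(path, default=None):
    """Hypothetical helper: load a pickle, or return default if it is missing."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default
```

A cache would call safe_pickle_write(cache_path, cache_dict) wherever it currently dumps directly to the final filename. The rename is atomic on POSIX filesystems; on Windows, os.replace can fail if another process holds the destination open, so this sketch does not settle the cross-platform concerns raised in the thread.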