From nas at python.ca  Mon Jan  6 16:07:15 2003
From: nas at python.ca (Neil Schemenauer)
Date: Mon Jan  6 19:03:15 2003
Subject: [Spambayes] Two amusing spam clues
Message-ID: <20030107000715.GA19250@glacier.arctrix.com>

For an email I just received:

  'header:Reply-To:1' 0.815893146633
  'message-id:@murphy.debian.org' 0.997094899935

The first one surprised me.  It looks like most spam provides a reply-to
header that is the same as the from header.  I have no idea why they do
that.  The second one is spammy because I get a fair amount of spam
through my debian.org address.  I guess a lot of spam doesn't have a
message ID so the Debian mail server adds one.

The moral of the story is that statistical filters are good at picking
up on clues that humans might miss.  I don't know about other people on this
list but my spambayes filter is kicking spammer ass.  I very rarely see
a FN and even more rarely see a FP.  Props to everyone who helped out
with development.

  Neil

From richie at entrian.com  Tue Jan  7 10:49:48 2003
From: richie at entrian.com (richie@entrian.com)
Date: Tue Jan  7 05:50:23 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <E18VrIr-0004dS-0U@anchor-post-30.mail.demon.net>

[Neil]
> I don't know about other people on this
> list but my spambayes filter is kicking spammer ass.  I very rarely see
> a FN and even more rarely see a FP.  Props to everyone who helped out
> with development.

Me too.  I came back from the Christmas break to find 315 messages waiting
for me.  My spambayes system had only been trained on 900 messages, but
correctly classified 262 spams and 48 hams, and was unsure about just 5
messages (2 hams, 3 spams).  No FPs, no FNs.  Very impressive.

I'm now working on pulling the HTML code out from pop3proxy.py (where all
the HTML is mixed in with the Python) into a separate HTML file, which is
viewable and editable, and making the web UI pull the HTML components out
of there at run time.  That should let me unify the pop3proxy web UI with
Tim Stone's OptionsConfig.py, and should let John Draper integrate his
proposed web-based Spam Management System much more easily.

I also have a fix for the memory-usage problems posted by Rob B back in
December - I'll check that in with the HTML edits.

-- 
Richie Hindle
richie@entrian.com


From skip at pobox.com  Tue Jan  7 09:08:06 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan  7 10:54:30 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <20030107000715.GA19250@glacier.arctrix.com>
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <15898.60758.122222.194424@montanaro.dyndns.org>


    Neil>   'message-id:@murphy.debian.org' 0.997094899935

    Neil> I guess a lot of spam doesn't have a message ID so the Debian mail
    Neil> server adds one.

Yeah, seems rather odd, doesn't it?  It's not like they are hard to
generate or anything.  For me message-id:@pobox.com and
message-id:@manatee.mojam.com are killer clues for the same reason.

    Neil> I don't know about other people on this list but my spambayes filter is
    Neil> kicking spammer ass.  I very rarely see a FN and even more rarely
    Neil> see a FP.  Props to everyone who helped out with development.

Same here on all counts.  I have fairly conservative cutoffs (0.25 and
0.8), so I get a handful of unsures per day.  I can't remember the last time
I got a false positive.  I haven't trained in a while either.  The last time
I trained was before Christmas.
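
For readers following along, the two cutoffs split message scores into
three bands.  A minimal sketch, assuming a score between 0 and 1 and the
cutoffs quoted above (0.25 and 0.8); the function name classify and the
exact boundary handling are illustrative assumptions, not taken from the
spambayes source:

```python
def classify(score, ham_cutoff=0.25, spam_cutoff=0.8):
    """Map a spam probability to 'ham', 'unsure', or 'spam'.

    Boundary handling (below ham_cutoff is ham, at or above
    spam_cutoff is spam) is an assumption made for this sketch.
    """
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


for s in (0.01, 0.5, 0.95):
    print(s, classify(s))
```

Moving the two cutoffs closer together shrinks the unsure band, at the
cost of more risk of misclassifying borderline messages.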

I have mods to pop3proxy to allow startup of another program before making
connections (this allows you to do things like tunnel pop3 over ssh), but
I've been too chicken to try it out, fearing I'd lose email.  I'll get to it
one of these days and then check in the changes.

Skip

From lists at morpheus.demon.co.uk  Tue Jan  7 20:18:46 2003
From: lists at morpheus.demon.co.uk (Paul Moore)
Date: Tue Jan  7 15:46:38 2003
Subject: [Spambayes] Two amusing spam clues
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <n2m-g.ptr8yaqh.fsf@morpheus.demon.co.uk>

Neil Schemenauer <nas@python.ca> writes:

> I very rarely see a FN and even more rarely see a FP.  Props to
> everyone who helped out with development.

I agree entirely. These days, spam simply isn't anything like the
problem it used to be, and that's entirely down to spambayes.

Paul.
-- 
This signature intentionally left blank

From lists at morpheus.demon.co.uk  Tue Jan  7 20:28:47 2003
From: lists at morpheus.demon.co.uk (Paul Moore)
Date: Tue Jan  7 15:46:39 2003
Subject: [Spambayes] Outlook addin is slow shutting down
Message-ID: <n2m-g.n0mcya9s.fsf@morpheus.demon.co.uk>

I've noticed that these days, Outlook is very slow in shutting down,
sometimes taking 2 or 3 minutes after the UI has gone before the
process terminates. This is starting to be a problem for me, as my end
of day routine is to shut down Outlook, then the PC. I now have to
wait and check that Outlook has really gone before I start shutting
down the PC (I don't want my spambayes database corrupted because I
shut the PC down before the pickle was written out).

I assume that the delay is caused by the large pickle getting written
to disk. (Can I check this assumption in any way?) Is that probable,
and if so is anyone looking at addressing the issue? I'm not sure what
might be done - we can't easily switch to DBM files without hitting
the issue that the Windows distribution of Python 2.2 doesn't have a
(non-broken) DBM alternative, and I don't think we'd want the Outlook
client to gain a dependency on bsddb :-(
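
One rough way to test that assumption outside Outlook is to time a
pickle dump of a dictionary of comparable size.  A minimal sketch,
written for current Python rather than the Python 2.2 discussed here;
the helper name time_pickle_dump, the entry count, and the token format
are all hypothetical stand-ins for the real database:

```python
import os
import pickle
import tempfile
import time


def time_pickle_dump(obj):
    """Return (seconds taken, bytes written) for pickling obj to a temp file."""
    fd, path = tempfile.mkstemp(suffix=".pck")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
        elapsed = time.perf_counter() - start
        size = os.path.getsize(path)
    finally:
        os.remove(path)
    return elapsed, size


# Synthetic stand-in for a token database: token -> (spam count, ham count).
fake_db = {"token%d" % i: (i % 7, i % 11) for i in range(100000)}
elapsed, size = time_pickle_dump(fake_db)
print("wrote %d bytes in %.3f s" % (size, elapsed))
```

If a dump of similar size takes seconds rather than minutes, the delay
probably lies elsewhere; the numbers depend heavily on the machine and
the Python version.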

Any thoughts?

Paul.
-- 
This signature intentionally left blank

From rob at hooft.net  Tue Jan  7 09:18:44 2003
From: rob at hooft.net (Rob W.W. Hooft)
Date: Wed Jan  8 00:37:25 2003
Subject: [Spambayes] Two amusing spam clues
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <3E1A8D64.40107@hooft.net>

Neil Schemenauer wrote:
> For an email I just received:
> 
>   'header:Reply-To:1' 0.815893146633
>   'message-id:@murphy.debian.org' 0.997094899935
> 
> The first one surprised me.  It looks like most spam provides a reply-to
> header that is the same as the from header.  I have no idea why they do
> that.  The second one is spammy because I get a fair amount of spam
> through my debian.org address.  I guess a lot of spam doesn't have a
> message ID so the Debian mail server adds one.
> 
> The moral of the story is that statistical filters are good at picking
> up on clues that humans might miss.  I don't know about other people on this
> list but my spambayes filter is kicking spammer ass.  I very rarely see
> a FN and even more rarely see a FP.  Props to everyone who helped out
> with development.

I just retrained on the latest batch for me: 448 messages classified as 
ham, 3 of these were fn. 122 messages classified as spam, no fp. 18 
messages classified as unsure, all of these were spam. I could have 
reduced the number of unsures to 11 retrospectively by using the default 
spam cutoff of 0.90.

Lowest scoring spam: 0.11, highest scoring ham: 0.01

spambayes really is very good now. If someone could find time, we should 
make a release! It is awfully quiet on this list lately....

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/


From diana at uv-ray.com  Wed Jan  8 17:33:55 2003
From: diana at uv-ray.com (Diana Revencu)
Date: Wed Jan  8 12:49:26 2003
Subject: [Spambayes] 
Message-ID: <012101c2b72b$5ba40db0$0100a8c0@home>

Dear Sirs,

I was having a look over your anti-spam resources, very nice! 
We recently introduced a spam filter.  It is available at http://www.spambully.com.

Spam Bully uses a Bayesian filter and confirmation messages, can bounce known spam, and supports friend/spammer lists.

We would be very grateful if you would link to us. We can provide a link back to your site in our news section we are developing. 
We provide information on the latest developments in spam. http://www.spambully.com/news/

Thank you,
Diana

diana@spambully.com
From anthony at interlink.com.au  Thu Jan  9 13:13:03 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Jan  8 21:14:14 2003
Subject: [Spambayes] 
In-Reply-To: <012101c2b72b$5ba40db0$0100a8c0@home> 
Message-ID: <200301090213.h092D4F14953@localhost.localdomain>

An entry has been added to the 'related projects' page on the website.


From francois.granger at free.fr  Thu Jan  9 11:24:32 2003
From: francois.granger at free.fr (François Granger)
Date: Thu Jan  9 05:24:37 2003
Subject: [Spambayes] 
In-Reply-To: <200301090213.h092D4F14953@localhost.localdomain>
Message-ID: <BA430C70.60CDF%francois.granger@free.fr>

on 9/01/03 3:13, Anthony Baxter at anthony@interlink.com.au wrote:

> An entry has been added to the 'related projects' page on the website.

On page

http://spambayes.sourceforge.net/applications.html

Under title Hammie.py, you could replace

>Currently documentation focusses on Unix.

By

>Currently documentation focusses on Unix. Works on MacOS X as well

Under title pop3proxy.py, you could replace

>Should work on windows/unix/whatever... ?

By

>Should work on windows/unix/MacOS 9/whatever... ?


-- 
Le courrier est un moyen de communication. Les gens devraient
se poser des questions sur les implications politiques des choix (ou non
choix) de leurs outils et technologies. Pour des courriers propres :
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>


From Paul.Moore at atosorigin.com  Thu Jan  9 13:48:30 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Thu Jan  9 08:49:13 2003
Subject: [Spambayes] 
 Outlook client - addin.py revision 1.43 broke Outlook+MS Exchange
Message-ID: <16E1010E4581B049ABC51D4975CEDB886199A1@UKDCX001.uk.int.atosorigin.com>

I just upgraded Spambayes to latest CVS, and it broke my Outlook setup. I
use Outlook plus MS Exchange, and with the current CVS version (addin.py
revision 1.43), when I select "Open other user's calendar" I get an
immediate crash (GPF, trying to read from address 0) in Outlook. I reverted
just addin.py back to revision 1.42, and the crash no longer occurs.

I have no idea what in the changes might have caused this to happen. The
"open other user's folder" menu item does create a new Outlook window,
and that window (with the old addin.py) has a "Delete as spam" button,
which doesn't make sense for a calendar entry, much less for someone else's
calendar. So maybe that's relevant (I could try clicking the button, but as
I'm writing this mail in Outlook, a crash would lose it, so I won't for now
:-))

Sorry I can't give any more clues - if you want me to do any testing, just
ask.

Paul.

From tdickenson at devmail.geminidataloggers.co.uk  Thu Jan  9 21:07:59 2003
From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson)
Date: Thu Jan  9 17:37:07 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <15898.60758.122222.194424@montanaro.dyndns.org>
References: <20030107000715.GA19250@glacier.arctrix.com>
	<15898.60758.122222.194424@montanaro.dyndns.org>
Message-ID: <200301092107.59382.tdickenson@devmail.geminidataloggers.co.uk>

On Tuesday 07 January 2003 3:08 pm, Skip Montanaro wrote:
> Neil> I guess a lot of spam doesn't have a message ID so the Debian
> Neil> mail server adds one.
>
> Yeah, seems rather odd, doesn't it?  It's not like they are hard to
> generate or anything.  For me message-id:@pobox.com and
> message-id:@manatee.mojam.com are killer clues for the same reason.

I am seeing a high proportion of spams coming through our secondary MX, so 
message-id:@charon.geminidataloggers.com is a surprising spam clue for me too.

> Neil> I don't know about other people on this list but my spambayes filter
> Neil> is kicking spammer ass.  I very rarely see a FN and even more rarely
> Neil> see a FP.  Props to everyone who helped out with development.
>
> Same here on all counts.  I have fairly conservative cutoffs (0.25 and
> 0.8), so I get a handful of unsures per day.  I can't remember the last
> time I got a false positive.  

I have only had one Unsure in the last few weeks, and it had some interesting 
characteristics. The first half looked very spammy, but the second half was a 
list of three, four, and five digit numbers. Apparently numbers are a strong 
ham clue for me. Particularly 336, 603, and 320.


From rbyrnes at ozemail.com.au  Fri Jan 10 10:04:59 2003
From: rbyrnes at ozemail.com.au (Rob B)
Date: Thu Jan  9 18:06:06 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <E18VrIr-0004dS-0U@anchor-post-30.mail.demon.net>
References: <20030107000715.GA19250@glacier.arctrix.com>
Message-ID: <5.1.1.6.2.20030110100056.01cf80d0@127.0.0.1>

At 21:49 7/01/2003, richie@entrian.com sent this up the stick:
>I also have a fix for the memory-usage problems posted by Rob B back in
>December - I'll check that in with the HTML edits.

The 250-message limit "fix" posted to CVS (v1.8) on Jan 5 seemed to work.

cheers,
Rob

--
Let a fool hold his tongue and he will pass for a sage.

This is random quote 775 of a collection of 1273

Distance from the centre of the brewing universe:
[15200.8 km (8207.8 mi), 262.8 deg](Apparent) Rennerian

Public Key fingerprint = 6219 33BD A37B 368D 29F5  19FB 945D C4D7 1F66 D9C5


From piersh at friskit.com  Thu Jan  9 15:22:35 2003
From: piersh at friskit.com (Piers Haken)
Date: Thu Jan  9 18:08:09 2003
Subject: [Spambayes] Outlook addin is slow shutting down
Message-ID: <9891913C5BFE87429D71E37F08210CB9297535@zeus.sfhq.friskit.com>

You're right, it's the saving of the pickle. If you run pythonwin's
debug output window then you'll see the diagnostic messages telling you
what's going on.

I wonder, would it be possible to use MSDE (a single-user version of SQL
Server), which ships with Office, for the Outlook plugin?

Piers.

_______________________________________________
Spambayes mailing list
Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes
From mhammond at skippinet.com.au  Fri Jan 10 11:35:55 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan  9 19:36:43 2003
Subject: [Spambayes] 
 RE: Outlook client - addin.py revision 1.43 broke Outlook+MS Exchange
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB886199A1@UKDCX001.uk.int.atosorigin.com>
Message-ID: <000301c2b840$3d23a880$530f8490@eden>

> I just upgraded Spambayes to latest CVS, and it broke my 
> Outlook setup. I
> use Outlook plus MS Exchange, and with the current CVS 
> version (addin.py
> revision 1.43), when I select "Open other user's calendar" I get an
> immediate crash (GPF, trying to read from address 0) in 
> Outlook. I reverted
> just addin.py back to revision 1.42, and the crash no longer occurs.

I hope I just checked in a fix for this.  It seems to happen whenever you
select "Open in new window" for *any* Outlook item.

Mark.
From anthony at interlink.com.au  Fri Jan 10 20:09:15 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 04:10:34 2003
Subject: [Spambayes] re-org - making a package &c.
Message-ID: <200301100909.h0A99G403099@localhost.localdomain>


I'm just making a reorg-branch now in CVS - I'm going to move the library
code into a subdirectory 'spambayes', and then adjust things. There may be
some disruption :) but this should then allow us to actually package stuff
up and release things. 

If anyone wants to help, let me know...


From Paul.Moore at atosorigin.com  Fri Jan 10 09:19:35 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Fri Jan 10 04:20:10 2003
Subject: [Spambayes] 
 RE: Outlook client - addin.py revision 1.43 broke Outlook+MS Exchange
Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D82C@UKDCX001.uk.int.atosorigin.com>

From: Mark Hammond [mailto:mhammond@skippinet.com.au]
> I hope I just checked in a fix for this.  It seems to happen
> whenever you select "Open in new window" for *any* Outlook item.

Yes, that fixed it. Thanks for the extremely quick response!

Paul.

From anthony at interlink.com.au  Fri Jan 10 22:08:25 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 06:09:33 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <200301100909.h0A99G403099@localhost.localdomain> 
Message-ID: <200301101108.h0AB8Q304706@localhost.localdomain>


I should probably add that once this is done, and bedded down, I'd
like to propose that we make a real release - I'm thinking we do one
release that's hammie and pop3proxy, and another that's the Outlook
plugin.

It's not like we're still tracking a moving target here, and there's
no reason we can't make this a lot easier for people than "get the CVS" :)


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From msergeant at startechgroup.co.uk  Fri Jan 10 11:15:00 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Fri Jan 10 06:13:13 2003
Subject: [Spambayes] Two amusing spam clues
In-Reply-To: <200301092107.59382.tdickenson@devmail.geminidataloggers.co.uk>
References: <20030107000715.GA19250@glacier.arctrix.com>
	<15898.60758.122222.194424@montanaro.dyndns.org> 
	<200301092107.59382.tdickenson@devmail.geminidataloggers.co.uk>
Message-ID: <1042197300.30555.87.camel@felony.int.star.co.uk>

On Thu, 2003-01-09 at 21:07, Toby Dickenson wrote:
> On Tuesday 07 January 2003 3:08 pm, Skip Montanaro wrote:
> > Neil> I guess a lot of spam doesn't have a message ID so the Debian
> > Neil> mail server adds one.
> >
> > Yeah, seems rather odd, doesn't it?  It's not like they are hard to
> > generate or anything.  For me message-id:@pobox.com and
> > message-id:@manatee.mojam.com are killer clues for the same reason.
> 
> I am seeing a high proportion of spams coming through our secondary MX, so 
> message-id:@charon.geminidataloggers.com is a surprising spam clue for me too.

This is to bypass postini, whose default setting is to set three (I
think) MX records: two of their servers at high priority and finally
yours at low priority in case postini's servers are down.

So spammers are starting to just choose the lowest priority MX server.
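
The behaviour described above can be sketched as follows.  This is an
illustration only: the host names and preference values are made up,
and the helper names are hypothetical.  It relies on the standard MX
convention that a lower preference number means higher priority:

```python
# Hypothetical MX records as (preference, host); lower number = higher priority.
mx_records = [
    (10, "mx1.filter.example"),
    (20, "mx2.filter.example"),
    (90, "mail.customer.example"),
]


def delivery_order(records):
    """Order a well-behaved sender tries: lowest preference value first."""
    return [host for _, host in sorted(records)]


def lowest_priority_host(records):
    """Host targeted when a sender deliberately picks the last-resort MX."""
    return max(records)[1]


print(delivery_order(mx_records))
print(lowest_priority_host(mx_records))
```

A sender that goes straight to the last entry skips the filtering hosts
entirely, which matches the observation that spam arrives through the
secondary MX.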

Matt.


From anthony at interlink.com.au  Fri Jan 10 22:18:11 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 06:19:23 2003
Subject: [Spambayes] Two amusing spam clues 
In-Reply-To: <1042197300.30555.87.camel@felony.int.star.co.uk> 
Message-ID: <200301101118.h0ABIBl04824@localhost.localdomain>


>>> Matt Sergeant wrote
> This is to bypass postini, whose default setting is to set three (I
> think) MX records: two of their servers at high priority and finally
> yours at low priority in case postini's servers are down.
> 
> So spammers are starting to just choose the lowest priority MX server.

Excellent. I know many places that have their systems set up with 
multiple MXs, the lowest is usually the hideously overloaded mail server
that supplies their bandwidth. These machines can sometimes get hours and
hours behind, as they're generally not over-burdened with spare cycles.

So now they'll get even more spam shite to deal with. 

wonderful :/


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From Paul.Moore at atosorigin.com  Fri Jan 10 11:19:08 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Fri Jan 10 06:19:54 2003
Subject: [Spambayes] re-org - making a package &c. 
Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D82E@UKDCX001.uk.int.atosorigin.com>

From: Anthony Baxter [mailto:anthony@interlink.com.au]
> I should probably add that once this is done, and bedded
> down, I'd like to propose that we make a real release - I'm
> thinking we do one release that's hammie and pop3proxy, and
> another that's the Outlook plugin.

+1

Paul

From tim at fourstonesExpressions.com  Fri Jan 10 08:25:01 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 10 09:51:43 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <16E1010E4581B049ABC51D4975CEDB880113D82E@UKDCX001.uk.int.atosorigin.com>
Message-ID: <HDGJGURPLWVXUQKOIKED74GCDALFC9.3e1ed7bd@riven>

You might want to tag the current tree...

- TimS

1/10/2003 5:19:08 AM, "Moore, Paul" <Paul.Moore@atosorigin.com> wrote:

>From: Anthony Baxter [mailto:anthony@interlink.com.au]
>> I should probably add that once this is done, and bedded
>> down, I'd like to propose that we make a real release - I'm
>> thinking we do one release that's hammie and pop3proxy, and
>> another that's the Outlook plugin.
>
>+1
>
>Paul


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From papaDoc at videotron.ca  Fri Jan 10 09:29:32 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Fri Jan 10 09:51:43 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <200301101108.h0AB8Q304706@localhost.localdomain>
References: <200301101108.h0AB8Q304706@localhost.localdomain>
Message-ID: <3E1ED8CC.1000606@videotron.ca>

Hi Anthony,

>I should probably add that once this is done, and bedded down, I'd
>like to propose that we make a real release - I'm thinking we do one
>release that's hammie and pop3proxy, and another that's the Outlook
>plugin.
>
>It's not like we're still tracking a moving target here, and there's
>no reason we can't make this a lot easier for people than "get the CVS" :)
>  
>
I wrote some documentation for pop3proxy (sometime in November) and
submitted it to the list. Francois Granger added stuff for Mac OS X.
I'm still rewriting and reformatting it, but it should be available soon.

papaDoc


From whisper at oz.net  Fri Jan 10 10:56:33 2003
From: whisper at oz.net (David LeBlanc)
Date: Fri Jan 10 14:03:50 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <HDGJGURPLWVXUQKOIKED74GCDALFC9.3e1ed7bd@riven>
Message-ID: <GCEDKONBLEFPPADDJCOEAEGLHEAA.whisper@oz.net>

Why split it up?

David LeBlanc
Seattle, WA USA 



From anthony at interlink.com.au  Sat Jan 11 15:18:03 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 10 23:19:08 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <GCEDKONBLEFPPADDJCOEAEGLHEAA.whisper@oz.net> 
Message-ID: <200301110418.h0B4I4V23348@localhost.localdomain>


>>> "David LeBlanc" wrote
> Why split it up?

I'm not sure what 'it' means in this context - if you mean 'outlook plugin'
and 'pop3proxy/hammie', well, they're completely separate applications, and
it's unlikely that both would be of use to everyone. On the other hand, it's
possibly better to have 3 packages - the base "spambayes" one, the outlook
plugin, and the pop3proxy/hammie package. Not sure.

If you mean "why reorganise the files into directories" - well, as it is
now, we install a large pile of packages with _very_ generic names into
site-packages. This is ungood.


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From whisper at oz.net  Fri Jan 10 20:45:53 2003
From: whisper at oz.net (David LeBlanc)
Date: Fri Jan 10 23:46:06 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <200301110418.h0B4I4V23348@localhost.localdomain>
Message-ID: <GCEDKONBLEFPPADDJCOEKEJJHEAA.whisper@oz.net>

I thought that the pop3proxy was needed for the outlook application, thus
the "why split" question.

As for putting things in directories, I heartily agree/approve - people who
release things that put things _in_ site-packages ought to be shot ;)

David LeBlanc
Seattle, WA USA



From mhammond at skippinet.com.au  Sun Jan 12 23:34:21 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Jan 12 07:35:01 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <GCEDKONBLEFPPADDJCOEKEJJHEAA.whisper@oz.net>
Message-ID: <05d801c2ba36$ef200a90$530f8490@eden>

[David LeBlanc]
> I thought that the pop3proxy was needed for the outlook
> application, thus the "why split" question.

Nope - Outlook just needs the engine.

I don't see a need to make a release of the core engine - it is always
available from CVS, and anyone who wants just the engine and no application
is going to arrange for that without problem.

I fully support the directory splits, though.  I would also like to see a
"test" directory.

For various reasons, I don't think we are ready for a "binary" distribution
of this stuff yet - but making the first release "python source only" may
appeal in terms of limiting the set of initial users to a fairly literate
and sympathetic audience willing to offer valuable feedback.  Especially
valuable will be the initial training experiences given many people won't
have been collecting spam when they first install this.  Tim's experiments
implied that without at least a few spam to start with, you better be
sympathetic for a while!

Still-working-on-the-stand-alone-DLL-tho-ly,

Mark.


From anthony at interlink.com.au  Mon Jan 13 15:56:56 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Jan 12 23:58:17 2003
Subject: [Spambayes] 
 Re: [Spambayes-checkins] website background.ht,1.4,1.5 style.css,1.2,1.3 
In-Reply-To: <E18XwT4-0004iR-00@sc8-pr-cvs1.sourceforge.net> 
Message-ID: <200301130456.h0D4uuQ15598@localhost.localdomain>


>>> "Anthony Baxter" wrote
> Update of /cvsroot/spambayes/website
> In directory sc8-pr-cvs1:/tmp/cvs-serv17660
> 
> Modified Files:
> 	background.ht style.css 
> Log Message:
> updated background with some sample plots. If someone in the set of
> (Tim, Gary, Rob) could review this and point out the obvious stupids,
> that would be good. (Or anyone else who understands the math...)

Could someone who understands the math of this stuff please read 
over the 'background' page and point out the mistakes? 

Also, if someone has a pointer to something that explains chi-squared
in words that don't include phrases like "confluent hypergeometric function 
of the second kind", that would be good :)


Anthony

From anthony at interlink.com.au  Mon Jan 13 17:55:23 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 13 01:56:42 2003
Subject: [Spambayes] changing various Options settings. 
In-Reply-To: <200301101108.h0AB8Q304706@localhost.localdomain> 
Message-ID: <200301130655.h0D6tNc16780@localhost.localdomain>


Before we do a proper release of this puppy, there's a few options-related
changes I'd like to suggest:

First off, all the stuff that's currently under 'TestDriver' that gets
used by real people needs to be moved. I'm looking at

ham_cutoff:  0.20
spam_cutoff: 0.90

in particular. Unfortunately, changing this will break everyone who's 
currently got the system deployed. Rather than doing this, I suggest
we add a new section 'Categorization', and add cutoff_ham and cutoff_spam 
options. We can then change the code to use the new options rather than
the old - it means people with the old code will get their preferences
ignored until they upgrade, but the alternative is to make it break
for everyone.

That way, the only options most people will want to frob are either in
the 'Tokenizer' block, or the new 'Categorization' block. 
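
A minimal sketch of the proposal, assuming an INI-style options file and
using Python's standard configparser rather than the project's own
Options.py; the helper name load_cutoffs and the DEFAULTS table are
hypothetical, and the default values are the ones quoted above:

```python
import configparser

# Defaults taken from the values quoted in this message.
DEFAULTS = {"cutoff_ham": 0.20, "cutoff_spam": 0.90}


def load_cutoffs(ini_text):
    """Read cutoffs from the proposed [Categorization] section only."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    cutoffs = dict(DEFAULTS)
    if parser.has_section("Categorization"):
        for name in DEFAULTS:
            if parser.has_option("Categorization", name):
                cutoffs[name] = parser.getfloat("Categorization", name)
    return cutoffs


old_style = "[TestDriver]\nham_cutoff: 0.25\nspam_cutoff: 0.80\n"
new_style = "[Categorization]\ncutoff_ham: 0.25\ncutoff_spam: 0.80\n"

print(load_cutoffs(old_style))  # old names are ignored, so defaults apply
print(load_cutoffs(new_style))
```

The first call shows the trade-off described above: settings left under
the old section are silently ignored until the options file is updated.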

Thoughts?


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From tim at fourstonesExpressions.com  Mon Jan 13 07:49:15 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 13 08:52:22 2003
Subject: [Spambayes] changing various Options settings. 
Message-ID: <PKLIFUQ6MHVSWUNI1ZNHZUO93PO62.3e22c3db@myst>

This is really only a semantic problem, but I agree that it needs to 
be moved.  Unfortunately, duplicating it creates another problem.  I 
doubt that the old one will ever really go away if we do it that 
way.  Let's keep in mind that we're doing this for release, and just 
do it right the first time.

This is a special case of a general problem with Options.py.  
There's a ton of stuff that's only meaningful to the research phase 
of the project.  Also, and this is a big one, there are multiple 
places to specify database names.  To me this is a really big 
problem, particularly for OptionsConfig.py, which right now assumes 
that you're running the pop3proxy (an obviously invalid assumption).  
There should only be one place to specify a database, or if we think 
that people might be running more than one 'subsystem', we should 
have a place clearly for each one.  I propose we simply have a term 
'database-name' or something like that, which will be used 
regardless of whether you're running pop3proxy or hammie or whatever 
else...

- TimS



c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From rob at hooft.net  Mon Jan 13 15:04:33 2003
From: rob at hooft.net (Rob W.W. Hooft)
Date: Mon Jan 13 09:05:24 2003
Subject: [Spambayes]  Re: [Spambayes-checkins] website
	background.ht,1.4,1.5 style.css,1.2,1.3
References: <200301130456.h0D4uuQ15598@localhost.localdomain>
Message-ID: <3E22C771.9090809@hooft.net>

Anthony Baxter wrote:
>>>>"Anthony Baxter" wrote
>>>
>>Update of /cvsroot/spambayes/website
>>In directory sc8-pr-cvs1:/tmp/cvs-serv17660
>>
>>Modified Files:
>>	background.ht style.css 
>>Log Message:
>>updated background with some sample plots. If someone in the set of
>>(Tim, Gary, Rob) could review this and point out the obvious stupids,
>>that would be good. (Or anyone else who understands the math...)
> 
> 
> Could someone who understands the math of this stuff please read 
> over the 'background' page and point out the mistakes? 

I have one important correction to make: AFAIK I had nothing to do with 
the mathematical conception of the chi-squared combining method. I was 
working on chi-squared normalization of the other combining methods 
at the time the chi-squared method was proposed by Gary and implemented 
by Tim. My only contribution in that realm is the change from 
(S-H)/(S+H) to (S-H+1)/2 to get a better "unsure" classification for 
messages that do not look like ham nor like spam. I'll not be of much 
help for mathematical foundation of anything.
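
A small sketch of the two formulas mentioned above, assuming S and H are the spam and ham indicators produced by the combining step, each in [0, 1]. The function names are made up for illustration. When a message looks like neither ham nor spam, S and H are both small: the ratio form then divides by a tiny denominator and swings widely, while the revised form stays near 0.5, the middle of the "unsure" range.

```python
def ratio_score(S, H):
    """Older form: (S-H)/(S+H), ranging from -1 (ham) to +1 (spam)."""
    return (S - H) / (S + H)


def midpoint_score(S, H):
    """Revised form: (S-H+1)/2, ranging from 0 (ham) to 1 (spam)."""
    return (S - H + 1) / 2


# A message with almost no evidence either way.
weak = (0.02, 0.01)
# A message with strong spam evidence.
strong = (0.99, 0.01)
```

For the weak case, ratio_score gives one third of the way towards "spam" on its scale, whereas midpoint_score gives 0.505, barely off centre.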

How about the word "definately"? I would spell it definitely.

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/


From carel.fellinger at chello.nl  Mon Jan 13 12:13:44 2003
From: carel.fellinger at chello.nl (Carel Fellinger)
Date: Mon Jan 13 11:52:09 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <05d801c2ba36$ef200a90$530f8490@eden>
References: <GCEDKONBLEFPPADDJCOEKEJJHEAA.whisper@oz.net>
	<05d801c2ba36$ef200a90$530f8490@eden>
Message-ID: <20030113111344.GA12027@mail.felnet>


<\lurking-mode --now-that-the-work-is-almost-done>

On Sun, Jan 12, 2003 at 11:34:21PM +1100, Mark Hammond wrote:
...
> For various reasons, I don't think we are ready for a "binary" distribution
> of this stuff yet - but making the first release "python source only" may
> appeal in terms of limiting the set of initial users to a fairly literate
> and sympathetic audience willing to offer valuable feedback.  Especially

Willing to do more than just give feedback, at least I would:)

Suppose "spambayes --slash-training-styles" would run against several
databases, each of those databases keeping track of the probs for a
particular training style, adding extra headers indicating if and how
the different databases scored this particular email.  Me, I would be
willing then to be careful to train according to all training style
candidates simultaneously.

The advantage would be that we would be comparing all training methods
on the same data.
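
A rough sketch of the header-tagging half of this idea, assuming each training style has its own scorer that returns a spam probability. The header name X-Training-Style-..., the style names, and the stand-in scorers are all hypothetical; real scorers would each consult their own database.

```python
from email.message import EmailMessage


def tag_with_all_styles(msg, scorers):
    """Add one header per training style recording that style's score.

    `scorers` maps a style name to a callable that takes the message
    and returns a spam probability in [0, 1].
    """
    for style, score in sorted(scorers.items()):
        msg["X-Training-Style-" + style] = "%.2f" % score(msg)
    return msg


# Stand-in scorers with fixed answers, for illustration only.
scorers = {
    "everything": lambda m: 0.93,
    "mistakes-only": lambda m: 0.71,
}

msg = EmailMessage()
msg["Subject"] = "cheap pills"
msg.set_content("buy now")
tag_with_all_styles(msg, scorers)
```

Because every style scores the same message, the headers can later be compared side by side.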


<lurking-mode --just-in-case-someone-says-go-ahead-implement-it>


-- 
groetjes, carel

From neale at woozle.org  Mon Jan 13 10:36:07 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 13 13:37:35 2003
Subject: [Spambayes] changing various Options settings.
In-Reply-To: <PKLIFUQ6MHVSWUNI1ZNHZUO93PO62.3e22c3db@myst> (Tim Stone - Four
 Stones Expressions's message of "Mon, 13 Jan 2003 07:49:15 -0600")
References: <PKLIFUQ6MHVSWUNI1ZNHZUO93PO62.3e22c3db@myst>
Message-ID: <w53smvwkico.fsf@woozle.org>

Tim Stone - Four Stones Expressions <tim@fourstonesExpressions.com> writes:

> I propose we simply have a term 'database-name' or something like
> that, which will be used regardless of whether you're running pop3proxy
> or hammie or whatever else...

+1

This has been bugging me since I first wrote the dbm storage method.  If
we can all agree on a default database name for every platform, I'm all
for standardizing it.


From neale at woozle.org  Mon Jan 13 10:39:19 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 13 13:39:22 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <20030113111344.GA12027@mail.felnet> (Carel Fellinger's message
 of "Mon, 13 Jan 2003 12:13:44 +0100")
References: <GCEDKONBLEFPPADDJCOEKEJJHEAA.whisper@oz.net>
	<05d801c2ba36$ef200a90$530f8490@eden>
	<20030113111344.GA12027@mail.felnet>
Message-ID: <w53ptr0ki7c.fsf@woozle.org>

Carel Fellinger <carel.fellinger@chello.nl> writes:

> Suppose "spambayes --slash-training-styles" would run against several
> databases, each of those databases keeping track of the probs for a
> particular training style, adding extra headers indicating if and how
> the different databases scored this particular email.  Me, I would be
> willing then to be careful to train according to all training style
> candidates simultaneously.

That's an interesting idea.  It made me wonder if it wouldn't be helpful
to have some sort of application which trained on some data, then ran a
few scoring runs on other data with various options set, and reported
back what your "ideal" options are.  Does that make sense?  This could
be a part of the initial training step.
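
One way such a tool might look, as a sketch only: given scores for held-out messages whose true class is known, try several cutoff pairs and keep the cheapest. The cost weights (a false positive counted as ten times worse than a false negative, an unsure as 0.2) are arbitrary assumptions, as are the function names.

```python
def count_errors(scored, ham_cutoff, spam_cutoff):
    """Return (false_positives, false_negatives, unsure) for one setting.

    `scored` is a list of (score, is_spam) pairs from a held-out set.
    """
    fp = fn = unsure = 0
    for score, is_spam in scored:
        if score >= spam_cutoff:
            fp += not is_spam      # ham that was called spam
        elif score <= ham_cutoff:
            fn += is_spam          # spam that was called ham
        else:
            unsure += 1
    return fp, fn, unsure


def best_cutoffs(scored, candidates, fp_weight=10, unsure_weight=0.2):
    """Pick the (ham_cutoff, spam_cutoff) pair with the lowest cost."""
    def cost(pair):
        fp, fn, unsure = count_errors(scored, *pair)
        return fp_weight * fp + fn + unsure_weight * unsure
    return min(candidates, key=cost)


scored = [(0.02, False), (0.15, False), (0.55, False),
          (0.85, True), (0.97, True), (0.99, True)]
candidates = [(0.10, 0.90), (0.20, 0.90), (0.20, 0.80), (0.50, 0.50)]
```

On this toy data the pair (0.20, 0.80) wins: it leaves one message unsure and makes no outright mistakes.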

Neale

From neale at woozle.org  Mon Jan 13 10:58:06 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 13 13:58:10 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <200301110418.h0B4I4V23348@localhost.localdomain> (Anthony
 Baxter's message of "Sat, 11 Jan 2003 15:18:03 +1100")
References: <200301110418.h0B4I4V23348@localhost.localdomain>
Message-ID: <w53n0m4khc1.fsf@woozle.org>

Anthony Baxter <anthony@interlink.com.au> writes:

> On the other hand, it's possibly better to have 3 packages - the base
> "spambayes" one, the outlook plugin, and the pop3proxy/hammie
> package. Not sure.

I like the idea of splitting things into specific directories.  

Perhaps it's time to rename things according to what they do and move the
emphasis away from testing.  Here's what I propose for hammie & co:

hammie/         -> contrib/
hammie.py + hammiefilter.py
                -> filter.py
mboxtrain.py + hammiebulk.py
                -> bulktrain.py
hammiesrv.py    -> contrib/XMLRPCServer.py
hammiecli.py    -> contrib/XMLRPCClient.py
hammiebatch.py  -> deleted  (or is someone using this?)

Aside from renaming things based on what they do, this would reduce
hammie's littering of the top-level directory to:

  filter.py
  bulktrain.py

Plus the standard supporting modules (storage.py, dbmstorage.py,
tokenizer.py, etc.)

I wager it could make the options file a lot simpler, too.

Shall I barge ahead with this?

Neale

From carel.fellinger at chello.nl  Mon Jan 13 20:24:19 2003
From: carel.fellinger at chello.nl (Carel Fellinger)
Date: Mon Jan 13 14:36:19 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <w53ptr0ki7c.fsf@woozle.org>
References: <GCEDKONBLEFPPADDJCOEKEJJHEAA.whisper@oz.net>
	<05d801c2ba36$ef200a90$530f8490@eden> <20030113111344.GA12027@mail.felnet>
	<w53ptr0ki7c.fsf@woozle.org>
Message-ID: <20030113192419.GA17717@mail.felnet>

On Mon, Jan 13, 2003 at 10:39:19AM -0800, Neale Pickett wrote:
...
> That's an interesting idea.  It made me wonder if it wouldn't be helpful
> to have some sort of application which trained on some data, then ran a
> few scoring runs on other data with various options set, and reported
> back what your "ideal" options are.  Does that make sense?  This could
> be a part of the initial training step.

The problem with initialisation is that there is no data to start
with, so such a "default adaptor" can't come into play until you've
gathered some spam and ham.  But as part of this extra fine tuning
step I proposed, it sure seems interesting to try to derive good
settings for the options at several moments in time and see whether
they differ widely for all us early adopters.


-- 
groetjes, carel

From neale at woozle.org  Mon Jan 13 11:42:56 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 13 14:43:07 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <20030113192419.GA17717@mail.felnet> (Carel Fellinger's message
 of "Mon, 13 Jan 2003 20:24:19 +0100")
References: <GCEDKONBLEFPPADDJCOEKEJJHEAA.whisper@oz.net>
	<20030113111344.GA12027@mail.felnet> <w53ptr0ki7c.fsf@woozle.org>
	<20030113192419.GA17717@mail.felnet>
Message-ID: <w53hecckf9b.fsf@woozle.org>

Carel Fellinger <carel.fellinger@chello.nl> writes:

> The problem with initialisation is that there is no data to start
> with, so such a "default adaptor" can't come into play until you've
> gathered some spam and ham.  But as part of this extra fine tuning
> step I proposed, it sure seems interesting to try to derive good
> settings for the options at several moments in time and see whether
> they differ widely for all us early adopters.

With my scheme at least, you are expected to at some point have some ham
and some spam.  Maybe not initially, but after a week or two you are
supposed to be collecting examples of both.

In any case, what you propose would work as a tuning tool, to be run
whenever you want to tune your config.  I would look at the existing
test programs and try to figure out a way to combine them.  I believe
Tim wrote it so that one set of trained data can be used over and over
for multiple types of scoring.  That and the existing support modules
should make it little more than chaining scoring methods together.

You still interested in doing this, Carel?

Neale


From richie at entrian.com  Mon Jan 13 20:42:03 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 13 15:42:25 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <w53n0m4khc1.fsf@woozle.org>
References: <200301110418.h0B4I4V23348@localhost.localdomain>
	<w53n0m4khc1.fsf@woozle.org>
Message-ID: <as762v4rg71s090avbt7c2sn53ltl0gu0i@4ax.com>


[Anthony]
> If anyone wants to help, let me know...

If testing counts as helping... I've tested all the pieces I use, and
they're all fine on the reorg-branch.  This re-organisation is a very good
plan.

Two questions:

Should we also have a 'resources' directory, or similar?  I've nearly
finished splitting the HTML components out of pop3proxy.py and
OptionConfig.py and into an external (viewable, editable) HTML file.  At
the moment I have that living with the source code, and being found via
__file__.  Maybe things like the HTML (and image files and whatever else)
should have their own subdirectory.  It could be found by __file__ (or
sys.argv[0] for some future frozen version) by default, or become a
configuration option if there's ever a reason for that.
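
The lookup order described here could be sketched as follows. The directory name "resources" and the override argument (standing in for a possible configuration option) are placeholders, and the sys.frozen check relies on a convention followed by some freezing tools rather than on anything in the code base.

```python
import os
import sys


def resource_dir(override=None, name="resources"):
    """Return the directory holding HTML and image files.

    `override` stands in for a future configuration option.
    """
    if override:
        return override
    if getattr(sys, "frozen", False):
        # A frozen executable has no useful __file__, so fall back
        # to the location of the program itself.
        base = os.path.dirname(os.path.abspath(sys.argv[0]))
    else:
        base = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(base, name)
```

Keeping the lookup in one function means the HTML loader never needs to know which of the three cases applied.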

Second, is it sensible to check in major edits at the moment?  I guess
things like that should wait until the reorg-branch is merged back onto the
head?  What with files being moved, CVS isn't going to be much help with
the merge.  Of course, if it's a dead cert that the reorg-branch will be
merged back (and I can't see why we wouldn't do that) then edits could just
be committed to that.

[Neale]
> Here's what I propose for hammie & co:

That looks very sensible.  I'd also suggest we move pop3graph.py into
utilities - it's not important enough to live at the top level.

-- 
Richie Hindle
richie@entrian.com


From mhammond at skippinet.com.au  Tue Jan 14 12:20:05 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Jan 13 20:20:52 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <20030113111344.GA12027@mail.felnet>
Message-ID: <03d501c2bb6b$12cc9f50$530f8490@eden>

> Willing to do more than just give feedback, at least I would:)
>
> Suppose "spambayes --slash-training-styles" would run against several
> databases, each of those databases keeping track of the probs for a
> particular training style, adding extra headers indicating if and how
> the different databases scored this particular email.  Me, I would be
> willing then to be careful to train according to all training style
> candidates simultaneously.
>
> The advantage would be that we would be comparing all training methods
> on the same data.

My idea was closer to the existing test harness we have.  I was thinking of
somehow formalizing Tim's original hapax experiments.

From my limited playing with our test harness, it seems that we simply pick
random messages from our ham and spam folders, train over these messages,
then score these messages against the trained data.  This hasn't been as
important for a few months, as the algorithm hasn't changed in that period.

What if we changed this to perform a "time ordered" selection of messages?

For example, off the top of my head, I can see 2 training candidates (there
would be a number more, but let's start with just 2):

* Do not start filtering until we have, say, 20 spam and 20 ham.  Once we
reach this threshold, we go into a little "initial training mode".  This
mode trains on the ham and spam, then scores the entire inbox.  We continue
until the user indicates there are no spams left in their inbox.

* Start filtering immediately, but only incrementally train on either
incorrect or unsure classifications.

Our test harness would be designed to test multiple strategies over our
standard corpora.  Instead of random messages, time-ordered messages would be
iterated over.  Results similar to the existing ones are produced, so we can
compare results over vastly different mail stores.  IMO, it is far more
important to know the best training strategy across vastly different mail
stores than to know which strategy works best on any individual's store.
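
A toy sketch of a time-ordered harness along these lines. It is not the existing test framework: ToyClassifier is a stand-in that merely counts tokens, and only two policies are shown, the "train on incorrect or unsure" candidate from above and a train-on-everything baseline for comparison. The threshold-based "initial training mode" candidate is left out.

```python
from collections import Counter


class ToyClassifier:
    """Stand-in for the real classifier: counts tokens per class."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()
        self.trained = 0

    def train(self, tokens, is_spam):
        (self.spam if is_spam else self.ham).update(tokens)
        self.trained += 1

    def classify(self, tokens):
        s = sum(self.spam[t] for t in tokens)
        h = sum(self.ham[t] for t in tokens)
        if s > h:
            return "spam"
        if h > s:
            return "ham"
        return "unsure"


def run_strategy(messages, classifier, should_train):
    """Replay messages in time order; count wrong or unsure verdicts.

    `messages` is a list of (timestamp, tokens, is_spam) tuples.
    `should_train(correct)` decides whether a message is fed back to
    the classifier after it has been scored.
    """
    errors = 0
    for _, tokens, is_spam in sorted(messages, key=lambda m: m[0]):
        verdict = classifier.classify(tokens)
        correct = verdict == ("spam" if is_spam else "ham")
        if not correct:
            errors += 1
        if should_train(correct):
            classifier.train(tokens, is_spam)
    return errors


messages = [
    (3, ["viagra", "cheap"], True),
    (1, ["meeting", "monday"], False),
    (2, ["cheap", "pills"], True),
    (4, ["monday", "lunch"], False),
]
```

Running both policies over the same ordered messages yields directly comparable error counts, along with how much training each policy needed.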

I am pretty sure this is similar to your idea, but I thought it worth
pointing out that we possibly already have some test framework we can
leverage here.

Mark.


From tony at lownds.com  Mon Jan 13 17:35:44 2003
From: tony at lownds.com (Tony Lownds)
Date: Mon Jan 13 20:43:23 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
Message-ID: <a05200f17ba491567a9c6@[10.0.1.3]>

Hi All,

I just started using Spambayes with Eudora. It works fine - 
fantastically in fact - for one POP account, but some limitations of 
Eudora are making using two POP accounts very problematic.

As far as I can tell, Eudora can have multiple POP accounts with 
different POP servers, but the port cannot be changed using normal 
means. Even through extraordinary means (installing an "Esoteric 
Settings" plugin), the port number is only changeable at a global 
level, not per-POP account.

Since Spambayes listens on a different port for each proxied server, 
I am limited to one spam-free account right now.

Has anyone had luck using Eudora with multiple POP accounts going 
through pop3proxy?

(Using Eudora 5.2 on Mac OS X 10.2.3 w/python 2.2)

Thanks,
-Tony

From tim at fourstonesExpressions.com  Mon Jan 13 19:46:40 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 13 20:47:16 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <a05200f17ba491567a9c6@[10.0.1.3]>
Message-ID: <HGO07DB09NHA9VSQNNI7683C7QM06A5.3e236c00@myst>

Woah... that's a serious problem.  Richie and I will have to give that one 
some thought... We'll get back to ya on that!

- TimS

1/13/2003 7:35:44 PM, Tony Lownds <tony@lownds.com> wrote:

>Hi All,
>
>I just started using Spambayes with Eudora. It works fine - 
>fantastically in fact - for one POP account, but some limitations of 
>Eudora are making using two POP accounts very problematic.
>
>As far as I can tell, Eudora can have multiple POP accounts with 
>different POP servers, but the port cannot be changed using normal 
>means. Even through extraordinary means (installing an "Esoteric 
>Settings" plugin), the port number is only changeable at a global 
>level, not per-POP account.
>
>Since Spambayes listens on a different port for each proxied server, 
>I am limited to one spam-free account right now.
>
>Has anyone had luck using Eudora with multiple POP accounts going 
>through pop3proxy?
>
>(Using Eudora 5.2 on Mac OS X 10.2.3 w/python 2.2)
>
>Thanks,
>-Tony
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From tim.one at comcast.net  Mon Jan 13 20:49:29 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Jan 13 20:50:04 2003
Subject: [Spambayes] changing various Options settings.
In-Reply-To: <200301130655.h0D6tNc16780@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEHADIAB.tim.one@comcast.net>

[Anthony Baxter]
> Before we do a proper release of this puppy, there's a few options-
> related changes I'd like to suggest:
>
> First off, all the stuff that's currently under 'TestDriver' that gets
> used by real people needs to be moved.

+1.

> I'm looking at
>
> ham_cutoff:  0.20
> spam_cutoff: 0.90
>
> in particular. Unfortunately, changing this will break everyone who's
> currently got the system deployed. Rather than doing this, I suggest
> we add a new section 'Categorization', and add cutoff_ham and cutoff_spam
> options. We can then change the code to use the new options rather than
> the old - it means people with the old code will get their preferences
> ignored until they upgrade, but the alternative is to make it break
> for everyone.

-1.  Just move it and post an announcement here.  Nothing will break until
somebody synchs up with CVS, and if they're pulling pre-alpha code out of
CVS, they better be reading this list.  Recovery is easy.


From rbyrnes at ozemail.com.au  Tue Jan 14 12:56:18 2003
From: rbyrnes at ozemail.com.au (Rob B)
Date: Mon Jan 13 20:56:45 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <a05200f17ba491567a9c6@[10.0.1.3]>
Message-ID: <5.1.1.6.2.20030114125159.01d6b270@127.0.0.1>

At 12:35 14/01/2003, Tony Lownds sent this up the stick:
>As far as I can tell, Eudora can have multiple POP accounts with different 
>POP servers, but the port cannot be changed using normal means. Even 
>through extraordinary means (installing an "Esoteric Settings" plugin), 
>the port number is only changeable at a global level, not per-POP account.
>
>Since Spambayes listens on a different port for each proxied server, I am 
>limited to one spam-free account right now.
>
>Has anyone had luck using Eudora with multiple POP accounts going through 
>pop3proxy?

Sure have

>(Using Eudora 5.2 on Mac OS X 10.2.3 w/python 2.2)

Dunno about Eudora on a Mac ... but on peecee if you open up the 
Personalities pane, you should be able to edit each account 
individually.  If you go through the Tools menu, then this is a global change.

cheers,
Rob

(Eudora 5.1 on Win NT 4.0 <shudder> - python 2.2)


--
A little madness now and then is relished by the wisest men.

This is random quote 162 of a collection of 1273

Distance from the centre of the brewing universe:
[15200.8 km (8207.8 mi), 262.8 deg](Apparent) Rennerian

Public Key fingerprint = 6219 33BD A37B 368D 29F5  19FB 945D C4D7 1F66 D9C5


From anthony at interlink.com.au  Tue Jan 14 13:16:39 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 13 21:17:57 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <w53n0m4khc1.fsf@woozle.org> 
Message-ID: <200301140216.h0E2Gdt25884@localhost.localdomain>


>>> Neale Pickett wrote
> Perhaps it's time to rename things according to what they do and move the
> emphasis away from testing.  Here's what I propose for hammie & co:

Um, don't do this! It will conflict with what I've done on the 
reorg-branch, already. Check it out and examine what's been moved
there...

> Shall I barge ahead with this?

Nooooo 

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Tue Jan 14 13:22:02 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 13 21:23:11 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <a05200f17ba491567a9c6@[10.0.1.3]> 
Message-ID: <200301140222.h0E2M2P26012@localhost.localdomain>


>>> Tony Lownds wrote
> As far as I can tell, Eudora can have multiple POP accounts with 
> different POP servers, but the port cannot be changed using normal 
> means. Even through extraordinary means (installing an "Esoteric 
> Settings" plugin), the port number is only changeable at a global 
> level, not per-POP account.

What about using multiple virtual loopback interfaces? 127.0.0.1, 
127.0.0.2, &c, and making pop3proxy use getsockname() to look up
what address it is you've called?
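
A sketch of how the loopback idea might be wired up, assuming one listening socket bound to all local addresses on the POP3 port. On an accepted socket, the address the client dialled is the local end, which getsockname() reports; getpeername() reports the client's own end. The UPSTREAMS table and its host names are invented for illustration.

```python
import socket

# Hypothetical mapping from local loopback address to real POP3 server.
UPSTREAMS = {
    "127.0.0.1": ("pop.example.com", 110),
    "127.0.0.2": ("pop.example.org", 110),
}


def upstream_for(conn, table=UPSTREAMS):
    """Return the upstream server for an accepted client connection."""
    local_address = conn.getsockname()[0]  # address the client dialled
    return table[local_address]


def accept_one(listener):
    """Accept one client and report which upstream it is asking for."""
    conn, _client = listener.accept()
    return conn, upstream_for(conn)
```

The mail client would then be configured with 127.0.0.1 for one account and 127.0.0.2 for the other, both on the same port.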


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From skip at pobox.com  Mon Jan 13 20:27:55 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 13 21:28:07 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <HGO07DB09NHA9VSQNNI7683C7QM06A5.3e236c00@myst>
References: <a05200f17ba491567a9c6@[10.0.1.3]>
        <HGO07DB09NHA9VSQNNI7683C7QM06A5.3e236c00@myst>
Message-ID: <15907.30123.985476.218448@montanaro.dyndns.org>


    >> Has anyone had luck using Eudora with multiple POP accounts going
    >> through pop3proxy?

    Tim> Woah... that's a serious problem.  Richie and I will have to give
    Tim> that one some thought... We'll get back to ya on that!

Is there any reason that in principle pop3proxy can't multiplex the content
it receives from several different servers on a single output port?

Skip


From anthony at interlink.com.au  Tue Jan 14 13:55:55 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 13 21:57:06 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <15907.30123.985476.218448@montanaro.dyndns.org> 
Message-ID: <200301140255.h0E2ttG26338@localhost.localdomain>


> Is there any reason that in principle pop3proxy can't multiplex the content
> it receives from several different servers on a single output port?

The problem is knowing, when it gets a connection from the mail client,
which server the mail client wishes to talk to.

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Tue Jan 14 14:00:36 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 13 22:01:45 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <as762v4rg71s090avbt7c2sn53ltl0gu0i@4ax.com> 
Message-ID: <200301140300.h0E30aJ26394@localhost.localdomain>


>>> Richie Hindle wrote
> If testing counts as helping... I've tested all the pieces I use, and
> they're all fine on the reorg-branch.  This re-organisation is a very good
> plan.

Well, I think I'm about done with it, so I'll be merging back into
the trunk shortly. You _will_ need to do a cvs up -dP to get the
new directories.

> Should we also have a 'resources' directory, or similar?  I've nearly
> finished splitting the HTML components out of pop3proxy.py and
> OptionConfig.py and into an external (viewable, editable) HTML file.  At
> the moment I have that living with the source code, and being found via
> __file__.  Maybe things like the HTML (and image files and whatever else)
> should have their own subdirectory.  It could be found by __file__ (or
> sys.argv[0] for some future frozen version) by default, or become a
> configuration option if there's ever a reason for that.

There's a few problems with that - the first, as you pointed out, is
finding the damn files. The second is getting distutils to do the
right thing with them. What we ended up doing with roundup was to
bundle all of the resources up into a separate python module, and
get it with 'import'.

> Second, is it sensible to check in major edits at the moment?  I guess
> things like that should wait until the reorg-branch is merged back onto the
> head?  What with files being moved, CVS isn't going to be much help with
> the merge.  Of course, if it's a dead cert that the reorg-branch will be
> merged back (and I can't see why we wouldn't do that) then edits could just
> be committed to that.

Wait til I merge the branch. Will be later this afternoon.


> That looks very sensible.  I'd also suggest we move pop3graph.py into
> utilities - it's not important enough to live at the top level.

Ok - I'll do that before the merge.


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From tony at lownds.com  Mon Jan 13 18:45:47 2003
From: tony at lownds.com (Tony Lownds)
Date: Mon Jan 13 22:13:07 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <200301140222.h0E2M2P26012@localhost.localdomain>
References: <200301140222.h0E2M2P26012@localhost.localdomain>
Message-ID: <a05200f19ba4926cabcc5@[10.0.1.3]>

At 1:22 PM +1100 1/14/03, Anthony Baxter wrote:
>What about using multiple virtual loopback interfaces? 127.0.0.1,
>127.0.0.2, &c, and making pop3proxy use getsockname() to look up
>what address it is you've called?

I will try this. Adding another loopback interface has to be done 
from the command line:

sudo ifconfig lo0 inet 127.0.0.2 add

Because it's not done through the OS' configuration GUI, I'm not sure 
the settings will be saved after a restart. FYI, adding another 
interface for an ethernet port IS easily done through the GUI.

I'll see if my loopback address stays around after a restart.

-Tony

From tony at lownds.com  Mon Jan 13 19:17:40 2003
From: tony at lownds.com (Tony Lownds)
Date: Mon Jan 13 22:31:00 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <5.1.1.6.2.20030114125159.01d6b270@127.0.0.1>
References: <5.1.1.6.2.20030114125159.01d6b270@127.0.0.1>
Message-ID: <a05200f1dba49310c244d@[10.0.1.3]>

At 12:56 PM +1100 1/14/03, Rob B wrote:
>>Has anyone had luck using Eudora with multiple POP accounts going 
>>through pop3proxy?
>
>Sure have

It turns out that Rob's accounts were on the same server, so he was 
lucky enough to avoid this tar pit.

-Tony

From tim at fourstonesExpressions.com  Mon Jan 13 21:47:50 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 13 22:48:27 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
Message-ID: <GASO83VTPLDB5321PJUQNLMJA9JI4W1Y.3e238866@myst>

1/13/2003 8:55:55 PM, Anthony Baxter <anthony@interlink.com.au> wrote:

>
>> Is there any reason that in principle pop3proxy can't multiplex the content
>> it receives from several different servers on a single output port?
>
>The problem is knowing, when it gets a connection from the mail client,
>which server the mail client wishes to talk to.

Correct.  This is the problem.  Nothing in the pop3 conversation gives any 
indication as to what server is on the other end of the line...

- Tim S

>
>-- 
>Anthony Baxter     <anthony@interlink.com.au>   
>It's never too late to have a happy childhood.
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From neale at woozle.org  Mon Jan 13 20:19:22 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 13 23:19:33 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <200301140255.h0E2ttG26338@localhost.localdomain> (Anthony
 Baxter's message of "Tue, 14 Jan 2003 13:55:55 +1100")
References: <200301140255.h0E2ttG26338@localhost.localdomain>
Message-ID: <w5365ssjrcl.fsf@woozle.org>

Anthony Baxter <anthony@interlink.com.au> writes:

>> Is there any reason that in principle pop3proxy can't multiplex the content
>> it receives from several different servers on a single output port?
>
> The problem is knowing, when it gets a connection from the mail client,
> which server the mail client wishes to talk to.

Wait, why can't you just log in with a username of

  username@hostname

?

Or username:hostname or username@@hostname or whatever.  The point is,
you'd send the name of the POP server you're trying to contact as part
of the username.  We used to do this to vhost pop accounts back at the
big dot-bomb ISP where I worked.  AFAIK, it worked great.
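
A sketch of the parsing step this would need, assuming the proxy sees the argument of the POP3 USER command as a plain string. The function name split_login is made up. The last "@" is taken as the separator so that usernames which already contain an "@" still work, provided the server name is always appended.

```python
def split_login(user_arg, default_host=None):
    """Split a USER argument of the form username@hostname.

    Returns (username, hostname).  If there is no usable "@", the
    argument is returned unchanged together with `default_host`.
    """
    name, sep, host = user_arg.rpartition("@")
    if not sep or not name or not host:
        return user_arg, default_host
    return name, host
```

The proxy would then connect to the returned host and send only the username part upstream.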

Neale

From neale at woozle.org  Mon Jan 13 20:23:17 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 13 23:23:21 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <200301140216.h0E2Gdt25884@localhost.localdomain> (Anthony
 Baxter's message of "Tue, 14 Jan 2003 13:16:39 +1100")
References: <200301140216.h0E2Gdt25884@localhost.localdomain>
Message-ID: <w533cnwjr62.fsf@woozle.org>

Anthony Baxter <anthony@interlink.com.au> writes:

> Um, don't do this! It will conflict with what I've done on the 
> reorg-branch, already. Check it out and examine what's been moved
> there...

Oh my!  That's what I get for only half paying attention.

Well then, I retract my proposal entirely.  Good work in the reorg, I
like it.

Although I still think there should be a "contrib" or somesuch directory
to tuck away things like hammiesrv and hammiecli, which are mostly only
of academic use.  (Well, there's one guy on here using hammiesrv, but he
seemed nice enough not to mind being labelled "academic" :)

>> Shall I barge ahead with this?
>
> Nooooo 

Right.  Thanks for stopping me, Anthony.  I can be a little bullheaded
sometimes, so it's good for people to put down fenceposts for me to bonk
my head against now and again ;)

Neale

From tim at fourstonesExpressions.com  Mon Jan 13 22:24:11 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 13 23:24:46 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <w5365ssjrcl.fsf@woozle.org>
Message-ID: <1UB6USFEAPLRC0C0ROTN6ZIDGFMKXU.3e2390eb@myst>

1/13/2003 10:19:22 PM, Neale Pickett <neale@woozle.org> wrote:

>Anthony Baxter <anthony@interlink.com.au> writes:
>
>>> Is there any reason that in principle pop3proxy can't multiplex the
>>> content it receives from several different servers on a single output
>>> port?
>>
>> The problem is knowing, when it gets a connection from the mail client,
>> which server the mail client wishes to talk to.
>
>Wait, why can't you just log in with a username of
>
>  username@hostname
>
>?
>
>Or username:hostname or username@@hostname or whatever.  The point is,
>you'd send the name of the POP server you're trying to contact as part
>of the username.  We used to do this to vhost pop accounts back at the
>big dot-bomb ISP where I worked.  AFAIK, it worked great.

There ya go... it's a hack, but relatively elegant.  pop3proxy would have to 
be altered to recognize the pattern, but that shouldn't be too difficult.  I 
was thinking of a scheme where the proxy would recognize that multiple servers 
were being proxied on the same port, and do the LIST and RETR stuff on both, 
and send the stuff back on that single port, where a filter could be set up to 
route the incoming mail to the correct inbox based on the headers.  Your idea 
might be a bit easier than that...
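
A minimal sketch of the username-embedding idea, written in current Python 3 rather than the Python 2 that pop3proxy.py used at the time. The helper name split_user_and_host and its separator parameter are illustrative assumptions, not anything that exists in pop3proxy:

```python
def split_user_and_host(user_arg, separator="@"):
    """Split the argument of a POP3 USER command into (user, host).

    Hypothetical helper.  It splits on the LAST separator, so a real
    username that itself contains the separator still comes through whole.
    """
    user, sep, host = user_arg.rpartition(separator)
    if not sep or not user or not host:
        raise ValueError("expected user%shost, got %r" % (separator, user_arg))
    return user, host
```

The proxy would then open its connection to the returned host and forward only the bare user name in the USER command it sends upstream. Passing separator="@@" or separator=":" covers the other spellings mentioned in the quoted message.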

- TimS
>
>Neale
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From skip at pobox.com  Mon Jan 13 22:51:14 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 00:05:48 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <200301140255.h0E2ttG26338@localhost.localdomain>
References: <15907.30123.985476.218448@montanaro.dyndns.org>
        <200301140255.h0E2ttG26338@localhost.localdomain>
Message-ID: <15907.38722.282150.490057@montanaro.dyndns.org>


    >> Is there any reason that in principle pop3proxy can't multiplex the
    >> content it receives from several different servers on a single output
    >> port?

    Anthony> The problem is knowing, when it gets a connection from the mail
    Anthony> client, which server the mail client wishes to talk to.

All of them?

S


From tim at fourstonesExpressions.com  Mon Jan 13 23:09:48 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Jan 14 00:10:24 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <15907.38722.282150.490057@montanaro.dyndns.org>
Message-ID: <YVXR3Z3164KFUREC6363HGEAJIUUJ.3e239b9c@myst>

1/13/2003 10:51:14 PM, Skip Montanaro <skip@pobox.com> wrote:

>
>    >> Is there any reason that in principle pop3proxy can't multiplex the
>    >> content it receives from several different servers on a single output
>    >> port?
>
>    Anthony> The problem is knowing, when it gets a connection from the mail
>    Anthony> client, which server the mail client wishes to talk to.
>
>All of them?

Not at all... the mail client will query specific servers on specific 
schedules or upon request by the user.  I have three accounts configured on 
mine (not Eudora).  On one of them, I have the client automatically check for 
new mail every minute, on another it checks every five minutes, and one I only 
check occasionally and that by manual request only (I push the check button 
every now and then).  The proxy would normally not query all the accounts and 
send mail back from all of them, because the client is only expecting mail 
from one of them... - TimS

>
>S
>
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From anthony at interlink.com.au  Tue Jan 14 16:31:16 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 00:32:37 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <w5365ssjrcl.fsf@woozle.org> 
Message-ID: <200301140531.h0E5VGj17824@localhost.localdomain>


>>> Neale Pickett wrote
> Wait, why can't you just log in with a username of
> 
>   username@hostname

Last time I looked, Eudora ate everything to the right of an @ sign
as the server name. And the field was limited to something like 14
characters.

It works fine with real mailers <wink>, just not Eudora.


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From neale at woozle.org  Mon Jan 13 21:36:29 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 14 00:36:39 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <200301140531.h0E5VGj17824@localhost.localdomain> (Anthony
 Baxter's message of "Tue, 14 Jan 2003 16:31:16 +1100")
References: <200301140531.h0E5VGj17824@localhost.localdomain>
Message-ID: <w53wul8i97m.fsf@woozle.org>

Anthony Baxter <anthony@interlink.com.au> writes:

> Last time I looked, Eudora ate everything to the right of an @ sign
> as the server name. And the field was limited to something like 14
> characters.

Oh, man, that sucks!

Okay then, I guess the only thing for it is to have a map in python from
usernames to username/host combinations.

{'user1': ('pop3.bigmailhost.net', 'neale'),
 'user2': ('pop3.mediumhost.net', 'npickett')}

etc.

Would that be a workable fallback for the @-impaired?
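
A rough sketch of how such a map might be consulted, assuming the proxy sees the alias as the argument of the client's USER command. ACCOUNTS and resolve_login are hypothetical names used only for illustration:

```python
# Hypothetical alias table, shaped like the dict above:
# name the mail client logs in with -> (real POP3 host, real username).
ACCOUNTS = {
    "user1": ("pop3.bigmailhost.net", "neale"),
    "user2": ("pop3.mediumhost.net", "npickett"),
}


def resolve_login(alias, accounts=ACCOUNTS):
    """Return (host, real_username) for the alias the client sent."""
    try:
        return accounts[alias]
    except KeyError:
        raise LookupError("no POP3 account configured for %r" % alias)
```

The client only ever needs a short alias in its username field, which sidesteps both the @ handling and any length limit on that field.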

Neale

From anthony at interlink.com.au  Tue Jan 14 16:40:39 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 00:41:51 2003
Subject: [Spambayes] merge is done. cvs up -dP time.
Message-ID: <200301140540.h0E5eds17982@localhost.localdomain>


The reorg-branch has been merged into the trunk. If you're running from CVS,
you will need to do a cvs up -dP to get a working version.

There might be a cvs commit message, but it's probable that mailman
will bitch that it's too large and won't let it straight through.

I'm about to make the options change - this _will_ break your existing
customised options files.


--
Anthony Baxter     <anthony@interlink.com.au>
It's never too late to have a happy childhood.


From tony-bayes at lownds.com  Mon Jan 13 22:49:04 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Tue Jan 14 01:49:03 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
Message-ID: <a05200f02ba49612f02f9@[10.0.1.3]>

At 1:22 PM +1100 1/14/03, Anthony Baxter wrote:
>What about using multiple virtual loopback interfaces? 127.0.0.1,
>127.0.0.2, &c, and making pop3proxy use getpeername() to look up
>what address it is you've called?

This approach is working great, and no getpeername() call is needed. 
I have a startup script that sets everything up, I even have the 
actual POP traffic tunneled over ssh.

Here is my startup script:

------- SpamBayes.command ---------
#!/bin/sh
clear
ulimit -s 2048
cd ~/spambayes
sudo ifconfig lo0 inet 127.0.0.2 add
ssh -N -L 1110:127.0.0.1:110 tony@server1.com &
ssh -N -L 1111:127.0.0.1:110 tony@server2.com &
sudo python pop3proxy.py

And the relevant lines from bayescustomize.ini:

pop3proxy_ports = 127.0.0.1:110, 127.0.0.2:110
pop3proxy_servers = localhost:1110, localhost:1111

The diff to pop3proxy.py is attached.

-Tony
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bind_address.patch
Type: application/mac-binhex40
Size: 7188 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030113/b1ea0340/bind_address.bin
From anthony at interlink.com.au  Tue Jan 14 17:54:53 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 01:56:10 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <a05200f02ba49612f02f9@[10.0.1.3]> 
Message-ID: <200301140654.h0E6srh18800@localhost.localdomain>


>>> Tony Lownds wrote
> The diff to pop3proxy.py is attached.

Erm. The patch came through as a "application/mac-binhex40". Could
you re-send with a more... standard... format? 

Ta

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Tue Jan 14 18:41:03 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 02:42:15 2003
Subject: [Spambayes] what else is needed for a first (source) release?
Message-ID: <200301140741.h0E7f3R19057@localhost.localdomain>


Ok, can people nominate things that they think would be good before a
first release? I'd like to try and get one out before the spam
conference (it's as good a date as any :) 

I figure we make the first release a source only one - which isn't a
biggie for most people, since it's in python. Should the release just
have what's in the current setup.py (which doesn't include the Outlook2000
directory), with a separate release for the O2K plugin+core? Or should
it all be in one big bundle? 

I'm fiddling around with the documentation on the website - trying
to explain how it all works in terms that my partner can understand
(my usual approach to making sure non-technical explanations are 
clear enough).

Anthony

From vanhorn at whidbey.com  Mon Jan 13 23:45:06 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Tue Jan 14 02:45:10 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
References: <200301140531.h0E5VGj17824@localhost.localdomain>
Message-ID: <3E23C002.DE810CEF@whidbey.com>

Anthony Baxter wrote:

> >>> Neale Pickett wrote
> > Wait, why can't you just log in with a username of
> >
> >   username@hostname
>
> Last time I looked, Eudora ate everything to the right of an @ sign
> as the server name. And the field was limited to something like 14
> characters.

I haven't run Eudora through a proxy for a long time, but I don't think
that character limit you cite has been in place recently. Eudora had no
problem picking up mail for vanhorn@coldwellbankerwhidbey.com or
vanhorn@verbose.twistedhistory.com, I'm pretty sure those are well over
14 chars.

Van (vanhorn@more.domains.than.you.can.shake.a.stick.at.org)


--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD

For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------


From francois.granger at free.fr  Tue Jan 14 11:19:39 2003
From: francois.granger at free.fr (François Granger)
Date: Tue Jan 14 05:23:50 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <200301140654.h0E6srh18800@localhost.localdomain>
Message-ID: <BA49A2CB.611E6%francois.granger@free.fr>

on 14/01/03 7:54, Anthony Baxter at anthony@interlink.com.au wrote:

> 
>>>> Tony Lownds wrote
>> The diff to pop3proxy.py is attached.
> 
> Erm. The patch came through as a "application/mac-binhex40". Could
> you re-send with a more... standard... format?

Open it with a text editor. It is pure text with Unix EOLs.
-- 
Le courrier est un moyen de communication. Les gens devraient
se poser des questions sur les implications politiques des choix (ou non
choix) de leurs outils et technologies. Pour des courriers propres :
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>


From just at letterror.com  Tue Jan 14 11:34:12 2003
From: just at letterror.com (Just van Rossum)
Date: Tue Jan 14 05:34:19 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <BA49A2CB.611E6%francois.granger@free.fr>
Message-ID: <r01050400-1023-BFB0828227AB11D7BEE7003065D5E7E4@[10.0.0.23]>

François Granger wrote:

> > Erm. The patch came through as a "application/mac-binhex40". Could
> > you re-send with a more... standard... format?
> 
> > Open it with a text editor. It is pure text with Unix EOLs.

No it's not, it's encoded as binhex:

--============_-1169595545==_============
Content-Type: application/mac-binhex40; Name="bind_address.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment

(This file must be converted with BinHex 4.0)

:%Q*TEQ4IB@4NFQ9cFbj`BA4MD!!!!!!!!!!!!!!!!!!8E!!!!!$$`5SU+L"`Eh!
cF(*[H(NZF(N*6@pZ)%TKEL!a-b!a0$Se16Sb-L!b-$!c#LdY,5"`Eh!cF(*[H(P

etc.

I've attached a decoded version, reencoded as base64.

Just

From francois.granger at free.fr  Tue Jan 14 13:58:30 2003
From: francois.granger at free.fr (François Granger)
Date: Tue Jan 14 07:58:45 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <r01050400-1023-BFB0828227AB11D7BEE7003065D5E7E4@[10.0.0.23]>
Message-ID: <BA49C806.61214%francois.granger@free.fr>

on 14/01/03 11:34, Just van Rossum at just@letterror.com wrote:

> François Granger wrote:
> 
>>> Erm. The patch came through as a "application/mac-binhex40". Could
>>> you re-send with a more... standard... format?
>> 
>> Open it with a text editor. It is pure text with Unix EOLs.
> 
> No it's not, it's encoded as binhex:

Apologies, it was automatically decoded at my end.
-- 
Le courrier est un moyen de communication. Les gens devraient
se poser des questions sur les implications politiques des choix (ou non
choix) de leurs outils et technologies. Pour des courriers propres :
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>


From tony at lownds.com  Mon Jan 13 23:06:10 2003
From: tony at lownds.com (Tony Lownds)
Date: Tue Jan 14 08:54:27 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <200301140654.h0E6srh18800@localhost.localdomain>
References: <200301140654.h0E6srh18800@localhost.localdomain>
Message-ID: <a05200f08ba4966b84f21@[204.162.121.104]>

At 5:54 PM +1100 1/14/03, Anthony Baxter wrote:
>  >>> Tony Lownds wrote
>>  The diff to pop3proxy.py is attached.
>
>Erm. The patch came through as a "application/mac-binhex40". Could
>you re-send with a more... standard... format?
>
>Ta
>

You mean there's something more standard than mac-binhex40?

:)

*** pop3proxy.py        Mon Jan 13 14:59:22 2003
--- pop3proxy_peer.py   Mon Jan 13 21:56:27 2003
***************
*** 157,164 ****
       dispatchers created by a factory callable.
       """

!     def __init__(self, port, factory, factoryArgs=(),
!                  socketMap=asyncore.socket_map):
           asyncore.dispatcher.__init__(self, map=socketMap)
           self.socketMap = socketMap
           self.factory = factory
--- 157,164 ----
       dispatchers created by a factory callable.
       """

!     def __init__(self, port, factory, factoryArgs=(), listenAddress='',
!                  socketMap=asyncore.socket_map, ):
           asyncore.dispatcher.__init__(self, map=socketMap)
           self.socketMap = socketMap
           self.factory = factory
***************
*** 168,175 ****
           self.set_socket(s, socketMap)
           self.set_reuse_addr()
           if options.verbose:
!             print "%s listening on port %d." % (self.__class__.__name__, port)
!         self.bind(('', port))
           self.listen(5)

       def handle_accept(self):
--- 168,175 ----
           self.set_socket(s, socketMap)
           self.set_reuse_addr()
           if options.verbose:
!             print "%s listening on %s port %d." % (self.__class__.__name__, listenAddress, port)
!         self.bind((listenAddress, port))
           self.listen(5)

       def handle_accept(self):
***************
*** 390,397 ****

       def __init__(self, serverName, serverPort, proxyPort):
           proxyArgs = (serverName, serverPort)
!         Listener.__init__(self, proxyPort, BayesProxy, proxyArgs)
!         print 'Listener on port %d is proxying %s:%d' % (proxyPort, serverName, serverPort)


   class BayesProxy(POP3ProxyBase):
--- 390,403 ----

       def __init__(self, serverName, serverPort, proxyPort):
           proxyArgs = (serverName, serverPort)
!         bindAddress, bindPort = proxyPort
!         Listener.__init__(
!            self, bindPort, BayesProxy, proxyArgs,
!            listenAddress=bindAddress
!         )
!         print 'Listener on port %s is proxying %s:%d' % (
!                   _addressPortStr(proxyPort), serverName, serverPort
!               )


   class BayesProxy(POP3ProxyBase):
***************
*** 1251,1257 ****

           if options.pop3proxy_ports:
               splitPorts = options.pop3proxy_ports.split(',')
!             self.proxyPorts = map(int, map(string.strip, splitPorts))

           if len(self.servers) != len(self.proxyPorts):
               print "pop3proxy_servers & pop3proxy_ports are different lengths!"
--- 1257,1263 ----

           if options.pop3proxy_ports:
               splitPorts = options.pop3proxy_ports.split(',')
!             self.proxyPorts = map(_addressAndOrPort, map(string.strip, splitPorts))

           if len(self.servers) != len(self.proxyPorts):
               print "pop3proxy_servers & pop3proxy_ports are different lengths!"
***************
*** 1286,1292 ****
           versions of the details, for display in the Status panel."""
           serverStrings = ["%s:%s" % (s, p) for s, p in self.servers]
           self.serversString = ', '.join(serverStrings)
!         self.proxyPortsString = ', '.join(map(str, self.proxyPorts))

       def createWorkers(self):
           """Using the options that were initialised in __init__ and then
--- 1292,1298 ----
           versions of the details, for display in the Status panel."""
           serverStrings = ["%s:%s" % (s, p) for s, p in self.servers]
           self.serversString = ', '.join(serverStrings)
!         self.proxyPortsString = ', '.join(map(_addressPortStr, self.proxyPorts))

       def createWorkers(self):
           """Using the options that were initialised in __init__ and then
***************
*** 1333,1340 ****
--- 1339,1364 ----
               self.spamCorpus.addObserver(self.spamTrainer)
               self.hamCorpus.addObserver(self.hamTrainer)

+ # helper functions
+
+ def _addressAndOrPort(s):
+    if ':' in s:
+      addr, port = s.split(':')
+      return addr, int(port)
+    else:
+      return '', int(s)
+
+ def _addressPortStr((addr, port)):
+   if not addr:
+     return str(port)
+   else:
+     return '%s:%d' % (addr, port)
+
+ # globals
+
   state = State()

+ # main program

   def main(servers, proxyPorts, uiPort, launchUI):
       """Runs the proxy forever or until a 'KILL' command is received or
***************
*** 1573,1579 ****
       pop3Server.sendall("kill\r\n")
       pop3Server.recv(100)

-
   # ===================================================================
   # __main__ driver.
   # ===================================================================
--- 1597,1602 ----
***************
*** 1601,1607 ****
           elif opt == '-p':
               state.databaseFilename = arg
           elif opt == '-l':
!             state.proxyPorts = [int(arg)]
           elif opt == '-u':
               state.uiPort = int(arg)
           elif opt == '-z':
--- 1624,1630 ----
           elif opt == '-p':
               state.databaseFilename = arg
           elif opt == '-l':
!             state.proxyPorts = [_addressAndOrPort(arg)]
           elif opt == '-u':
               state.uiPort = int(arg)
           elif opt == '-z':
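
The _addressPortStr((addr, port)) signature in the patch relies on tuple parameter unpacking, which Python 2 accepted and Python 3 later removed (PEP 3113). A sketch of the two helpers in Python 3 follows. The names without the leading underscore are illustrative, and the behaviour is meant to mirror the patch above, not any released pop3proxy:

```python
def address_and_or_port(s):
    """Parse 'port' or 'address:port' into an (address, port) tuple.

    An empty address means "listen on all interfaces", as in the patch.
    """
    s = s.strip()
    if ":" in s:
        # Split once from the right so only the final field is the port.
        addr, port = s.rsplit(":", 1)
        return addr, int(port)
    return "", int(s)


def address_port_str(addr_port):
    """Inverse of address_and_or_port, for display in status output."""
    addr, port = addr_port  # explicit unpacking replaces the Python 2 signature
    return "%s:%d" % (addr, port) if addr else str(port)


# Parsing a pop3proxy_ports value like the one in the startup example:
ports = [address_and_or_port(p) for p in "127.0.0.1:110, 127.0.0.2:110".split(",")]
```

A bare port such as "110" still parses, so option files written before the patch would keep working under this scheme.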

From barry at python.org  Tue Jan 14 09:10:18 2003
From: barry at python.org (Barry A. Warsaw)
Date: Tue Jan 14 09:10:50 2003
Subject: [Spambayes] what else is needed for a first (source) release?
References: <200301140741.h0E7f3R19057@localhost.localdomain>
Message-ID: <15908.6730.504794.353666@gargle.gargle.HOWL>


>>>>> "AB" == Anthony Baxter <anthony@interlink.com.au> writes:

    AB> Ok, can people nominate things that they think would be good
    AB> before a first release? I'd like to try and get one out before
    AB> the spam conference (it's as good a date as any :)

Although it might not be ready until my train pulls into South
Station, I'm working on a Mailman handler module for integration with
Spambayes.  The actual hook is pretty easy (using the hammie.py
interface) -- it's all the niddling little stuff <wink> like u/i,
moderation, training, configuration, etc. that's a bit rough around
the edges.

Probably won't be ready for the 1.0 release, but it might make a good
patch for a follow on.  (I'm trying to decide what to actually do with
it -- check it into a branch of Mailman, release it as a patch, etc...).

Having a spambayes package I can unpack in Mailman's pythonlib dir is
perfect.

-Barry

From francois.granger at free.fr  Tue Jan 14 16:05:37 2003
From: francois.granger at free.fr (François Granger)
Date: Tue Jan 14 10:09:19 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <w53wul8i97m.fsf@woozle.org>
Message-ID: <BA49E5D1.61227%francois.granger@free.fr>

on 14/01/03 6:36, Neale Pickett at neale@woozle.org wrote:

> Anthony Baxter <anthony@interlink.com.au> writes:
> 
>> Last time I looked, Eudora ate everything to the right of an @ sign
>> as the server name. And the field was limited to something like 14
>> characters.
> 
> Oh, man, that sucks!
> 
> Okay then, I guess the only thing for it is to have a map in python from
> usernames to username/host combinations.
> 
> {'user1': ('pop3.bigmailhost.net', 'neale'),
> 'user2': ('pop3.mediumhost.net', 'npickett')}

I would have an issue with this since I try to have the same login name on
various servers...
 francois.granger@free.fr
 francois.granger@laposte.net

A scheme where I would use a login of francois.granger:free and
francois.granger:laposte would do it. Then pop3proxy just needs to split on
":" and match the remaining "free" to "pop.free.fr".
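
One way this could look, as a sketch only. It uses an explicit tag table instead of guessing the host from the tag, so servers that do not follow a pop.<name> pattern can be listed too. SERVERS and route_login are hypothetical names, and the laposte host shown is an assumed example:

```python
# Hypothetical tag table: short tag typed after the colon -> POP3 server.
SERVERS = {
    "free": "pop.free.fr",
    "laposte": "pop.laposte.net",  # assumed host name, for illustration only
}


def route_login(login, servers=SERVERS):
    """Split 'user:tag' and return (user, server) for the tagged account."""
    user, sep, tag = login.rpartition(":")
    if not sep or not user:
        raise ValueError("expected user:tag, got %r" % login)
    if tag not in servers:
        raise LookupError("unknown server tag %r" % tag)
    return user, servers[tag]
```

With this, the same user name can be reused across servers because the tag, not the user name, selects the destination.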


-- 
Le courrier est un moyen de communication. Les gens devraient
se poser des questions sur les implications politiques des choix (ou non
choix) de leurs outils et technologies. Pour des courriers propres :
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>


From tim at fourstonesExpressions.com  Tue Jan 14 09:12:26 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Jan 14 10:13:02 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
Message-ID: <F8B871E0TP76A9YW9797A7421X083Y.3e2428da@myst>

1/14/2003 9:05:37 AM, François Granger <francois.granger@free.fr> wrote:

>on 14/01/03 6:36, Neale Pickett at neale@woozle.org wrote:
>
>> Anthony Baxter <anthony@interlink.com.au> writes:
>> 
>>> Last time I looked, Eudora ate everything to the right of an @ sign
>>> as the server name. And the field was limited to something like 14
>>> characters.
>> 
>> Oh, man, that sucks!
>> 
>> Okay then, I guess the only thing for it is to have a map in python from
>> usernames to username/host combinations.
>> 
>> {'user1': ('pop3.bigmailhost.net', 'neale'),
>> 'user2': ('pop3.mediumhost.net', 'npickett')}
>
>I would have an issue with this since I try to have the same login name on
>various servers...
> francois.granger@free.fr
> francois.granger@laposte.net
>
>A scheme where I would use a login of francois.granger:free and
>francois.granger:laposte would do it. Then pop3proxy just needs to split on
>":" and match the remaining "free" to "pop.free.fr".

This would only work for mail servers that conform to the 'standard' naming 
convention.  I have one mail server that is 'incoming.verizon.net'.  We would 
need to do <username>:<complete server name>... this all gets ugly, and I'm 
wondering how successfully the 'average user' could set all this up...  - TimS
>
>
>-- 
>Le courrier est un moyen de communication. Les gens devraient
>se poser des questions sur les implications politiques des choix (ou non
>choix) de leurs outils et technologies. Pour des courriers propres :
><http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>
>
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From skip at pobox.com  Tue Jan 14 09:21:09 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 10:21:13 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <GASO83VTPLDB5321PJUQNLMJA9JI4W1Y.3e238866@myst>
References: <GASO83VTPLDB5321PJUQNLMJA9JI4W1Y.3e238866@myst>
Message-ID: <15908.10981.796169.470346@montanaro.dyndns.org>


    >>> Is there any reason that in principle pop3proxy can't multiplex the
    >>> content it receives from several different servers on a single
    >>> output port?
    >> 
    >> The problem is knowing, when it gets a connection from the mail
    >> client, which server the mail client wishes to talk to.

    Tim> Correct.  This is the problem.  Nothing in the pop3 conversation
    Tim> gives any indication as to what server is on the other end of the
    Tim> line...

I'm still unclear what the problem is.  I use fetchmail to grab mail from
two POP servers at the moment.  Everything funnels into procmail which runs
hammie then distributes each message to one of several different accounts.
I don't care what the source of the message is.  What's the big deal?  A
source of mail is a source of mail.

Skip

From skip at pobox.com  Tue Jan 14 09:25:43 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 10:25:45 2003
Subject: [Spambayes] Using Spambayes w/ Eudora 
In-Reply-To: <YVXR3Z3164KFUREC6363HGEAJIUUJ.3e239b9c@myst>
References: <15907.38722.282150.490057@montanaro.dyndns.org>
        <YVXR3Z3164KFUREC6363HGEAJIUUJ.3e239b9c@myst>
Message-ID: <15908.11255.27102.87951@montanaro.dyndns.org>


    Anthony> The problem is knowing, when it gets a connection from the mail
    Anthony> client, which server the mail client wishes to talk to.

    >> All of them?

    Tim> Not at all... the mail client will query specific servers on
    Tim> specific schedules or upon request by the user.

Oh, okay, I get it.  I actually have two fetchmail schedules: one that runs
every five minutes and one that runs every 30 minutes.  I forget that
Windows users have to do all that fiddling from within whatever mail client
they run.

(They actually use Eudora on Windows here at Northwestern as the "supported"
email software.  I haven't touched it, preferring instead to just bring my
trusty Powerbook to work with me and continue using my
fetchmail/hammie/XEmacs/VM combination.)

Skip

From jeremy at alum.mit.edu  Tue Jan 14 10:15:11 2003
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Tue Jan 14 10:28:10 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <w5365ssjrcl.fsf@woozle.org>
References: <200301140255.h0E2ttG26338@localhost.localdomain>
	<w5365ssjrcl.fsf@woozle.org>
Message-ID: <15908.10623.343421.523340@slothrop.zope.com>

>>>>> "NP" == Neale Pickett <neale@woozle.org> writes:

  NP> Wait, why can't you just log in with a username of

  NP>   username@hostname

  NP> ?

The last time I suggested this for pop3proxy, someone mentioned that
several clients issue commands before the login such that the proxy
wouldn't be able to guess what server it was for.

But it sounds like we now have a report of a client that is difficult
to configure without adding the username to the servername.  I think
it should be an option to do either.  It's certainly more attractive
for configuration: The user never needs to configure anything in the
proxy beyond the port; all the per-server configuration can be done
using the client's configuration system.

  NP> Or username:hostname or username@@hostname or whatever.  The
  NP> point is, you'd send the name of the POP server you're trying to
  NP> contact as part of the username.  We used to do this to vhost
  NP> pop accounts back at the big dot-bomb ISP where I worked.
  NP> AFAIK, it worked great.

There's code in pspam/pop.py that does this.  It's not difficult.

Jeremy


From francois.granger at free.fr  Tue Jan 14 16:43:46 2003
From: francois.granger at free.fr (François Granger)
Date: Tue Jan 14 10:43:56 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <F8B871E0TP76A9YW9797A7421X083Y.3e2428da@myst>
Message-ID: <BA49EEC2.61247%francois.granger@free.fr>

on 14/01/03 16:12, Tim Stone - Four Stones Expressions at
tim@fourstonesExpressions.com wrote:

> 1/14/2003 9:05:37 AM, François Granger <francois.granger@free.fr> wrote:
> 
>> on 14/01/03 6:36, Neale Pickett at neale@woozle.org wrote:
>> 
>>> Anthony Baxter <anthony@interlink.com.au> writes:
>>> 
>>>> Last time I looked, Eudora ate everything to the right of an @ sign
>>>> as the server name. And the field was limited to something like 14
>>>> characters.
>>> 
>>> Oh, man, that sucks!
>>> 
>>> Okay then, I guess the only thing for it is to have a map in python from
>>> usernames to username/host combinations.
>>> 
>>> {'user1': ('pop3.bigmailhost.net', 'neale'),
>>> 'user2': ('pop3.mediumhost.net', 'npickett')}
>> 
>> I would have an issue with this since I try to have the same login name on
>> various servers...
>> francois.granger@free.fr
>> francois.granger@laposte.net
>> 
>> A scheme where I would use a login of francois.granger:free and
>> francois.granger:laposte would do it. Then pop3proxy just needs to split on
>> ":" and match the remaining "free" to "pop.free.fr".
> 
> This would only work for mail servers that conform to the 'standard' naming
> convention.  I have one mail server that is 'incoming.verizon.net'.  We would
> need to do <username>:<complete server name>... this all gets ugly, and I'm
> wondering how successfully the 'average user' could set all this up...  - TimS

If we have options in the ini file like:

# format : mod:login@local:port remote_login@remote.server:remote_port
account1 : francois.granger:laposte@127.0.0.1:110 francois.granger@pop.lapost.net:111

For each account, we have all needed parameters but password.

-- 
Le courrier est un moyen de communication. Les gens devraient
se poser des questions sur les implications politiques des choix (ou non
choix) de leurs outils et technologies. Pour des courriers propres :
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>


From neale at woozle.org  Tue Jan 14 08:42:10 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 14 11:42:21 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <BA49EEC2.61247%francois.granger@free.fr>
 (François Granger's message of "Tue, 14 Jan 2003
 16:43:46 +0100")
References: <BA49EEC2.61247%francois.granger@free.fr>
Message-ID: <w53lm1nisyl.fsf@woozle.org>

François Granger <francois.granger@free.fr> writes:

> If we have options in the ini file like:
>
> # format : mod:login@local:port remote_login@remote.server:remote_port
> account1 : francois.granger:laposte@127.0.0.1:110 francois.granger@pop.lapost.net:111
>
> For each account, we have all needed parameters but password.

Right.  That's what I meant :)

Neale

From neale at woozle.org  Tue Jan 14 08:48:46 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 14 11:48:50 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain> (Anthony
 Baxter's message of "Tue, 14 Jan 2003 18:41:03 +1100")
References: <200301140741.h0E7f3R19057@localhost.localdomain>
Message-ID: <w53iswrisnl.fsf@woozle.org>

Anthony Baxter <anthony@interlink.com.au> writes:

> Ok, can people nominate things that they think would be good before a
> first release? I'd like to try and get one out before the spam
> conference (it's as good a date as any :)

Word.  Are you going to the spam conference too, Anthony?

I've been using hammiefilter and mboxtrain for over a month now with no
complaints, so I think that little corner of the code is ready for a
release.

We may want to merge HAMMIE.txt or some subset of it into a top-level
README.  Setting up procmail-based filtering, at this point, is a piece
of cake.

Neale

From richie at entrian.com  Tue Jan 14 16:50:08 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 11:50:33 2003
Subject: [Spambayes] re-org - making a package &c. 
In-Reply-To: <200301140300.h0E30aJ26394@localhost.localdomain>
References: <as762v4rg71s090avbt7c2sn53ltl0gu0i@4ax.com>
	<200301140300.h0E30aJ26394@localhost.localdomain>
Message-ID: <nie82vsplocogrlm3b03k9e7bqse3qac2h@4ax.com>


[Anthony]
> Well, I think I'm about done with it, so I'll be merging back into
> the trunk shortly.

Great!  Well done on putting in the effort on this.

[Richie]
> Should we also have a 'resources' directory, or similar?

[Anthony]
> There's a few problems with that - the first, as you pointed out, is
> finding the damn files. The second is getting distutils to do the
> right thing with them. What we ended up doing with roundup was to
> bundle all of the resources up into a separate python module, and
> get it with 'import'.

By coincidence (or is it?) Mike Fletcher has just announced his
ResourcePackage package, which does exactly this.  I'll look into using it
- it could be just what we need.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Tue Jan 14 17:01:07 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 12:01:28 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <15908.6730.504794.353666@gargle.gargle.HOWL>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<15908.6730.504794.353666@gargle.gargle.HOWL>
Message-ID: <ueg82vkpqaofbbte26d28ctbju8ucb70ru@4ax.com>

Hi Barry,

> Although it might not be ready until my train pulls into South
> Station, I'm working on a Mailman handler module for integration with
> Spambayes.  The actual hook is pretty easy (using the hammie.py
> interface) -- it's all the niddling little stuff <wink> like u/i,
> moderation, training, configuration, etc. that's a bit rough around
> the edges.

It would be nice if you could leverage the existing web interface - I'm
working towards making it less monolithic right now, which might help.

-- 
Richie Hindle
richie@entrian.com


From skip at pobox.com  Tue Jan 14 11:04:08 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 12:05:56 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
Message-ID: <15908.17160.397749.855624@montanaro.dyndns.org>


    Anthony> Ok, can people nominate things that they think would be good
    Anthony> before a first release?

I have three changes available.  The first two will get checked in when the
off-campus link wakes up:

    * A -U option in hammiebulk.py which allows you to "untrain" a message.

    * Display the number of unsures as well as hams and spams (again, part
      of hammiebulk.py).

    * The ability for pop3proxy to fork off an external program before other
      setup (for tunnelling POP3 through ssh and such).  This is still
      untested and probably Unix-only.  I think I have some time today to
      try and test it.

What's pop3graph.py?  I tried executing "pop3graph.py --help" and got this
gibberish:

    /Users/skip/local/bin/pop3graph.py: Analyse the pop3proxy's caches and produce a graph of how accurate
    classifier has been over time.  Only really meaningful if you started
    with an empty database.: command not found
    from: can't read /var/mail/__future__.
    /Users/skip/local/bin/pop3graph.py: import: command not found
    from: can't read /var/mail/spambayes.
    from: can't read /var/mail/spambayes.FileCorpus.
    from: can't read /var/mail/spambayes.Options.
    /Users/skip/local/bin/pop3graph.py: line 12: syntax error near unexpected token `main()'
    /Users/skip/local/bin/pop3graph.py: line 12: `def main():'

It would appear it gets installed in your executable bin directory but
doesn't have a #! line.  I'll add one, but I wonder if this is indicative of
deeper problems.  Should it be installed?
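
The output above is what a shell prints when it is handed Python source: with no #! line it tries to run the file as a shell script, so the docstring and each "import" or "from" line is treated as a command.  A minimal sketch of a check that could be run over a scripts directory follows.  The names has_shebang and write_script are hypothetical helpers for illustration, not part of the spambayes setup code.

```python
import os
import tempfile


def has_shebang(path):
    """Return True if the file at `path` starts with a '#!' interpreter line."""
    with open(path, "rb") as f:
        return f.read(2) == b"#!"


def write_script(directory, name, text):
    """Write `text` to directory/name and return the full path (demo helper)."""
    path = os.path.join(directory, name)
    with open(path, "w") as f:
        f.write(text)
    return path


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    good = write_script(tmp, "good.py", "#!/usr/bin/env python\nprint('ok')\n")
    bad = write_script(tmp, "bad.py", '"""Docstring first, no #! line."""\n')
    for script in (good, bad):
        # Report which demo scripts would be safe to install as executables.
        print(script, has_shebang(script))
```

A script that fails this check would either need a #! line added or be kept out of the installed bin directory.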

Skip

From barry at python.org  Tue Jan 14 12:04:19 2003
From: barry at python.org (Barry A. Warsaw)
Date: Tue Jan 14 12:06:10 2003
Subject: [Spambayes] what else is needed for a first (source) release?
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<w53iswrisnl.fsf@woozle.org>
Message-ID: <15908.17171.346379.639313@gargle.gargle.HOWL>


>>>>> "NP" == Neale Pickett <neale@woozle.org> writes:

    NP> Word.  Are you going to the spam conference too, Anthony?

Hey, who else is going?  I'll be there and even giving a talk <heh>.

-Barry

From richie at entrian.com  Tue Jan 14 17:05:41 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 12:07:22 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
Message-ID: <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com>


[Anthony]
> Ok, can people nominate things that they think would be good before a
> first release? I'd like to try and get one out before the spam
> conference (it's as good a date as any :) 

When is that date?  The Linux Journal articles on Spambayes are due out at
the beginning of February, so we should aim for whichever date is sooner.

I have two lists of things that I think need doing:

1. Things that my Linux Journal article implies will be ready by the time
   the article is published:

    o Integration with Mutt (and other clients) via a single-shot script
      (like a single-message version of Hammie, or new switches to Hammie
      itself) See http://www.linuxjournal.com/article.php?sid=6439  Does
      anyone have something like this already, or any requirements?  And I
      don't like to ask, but would anyone like to write this?  8-)  I was
      intending to do it myself (and still will if no-one else fancies the
      job) but I'm going to be rushed off my feet these next two weeks...

    o Web-based configuration, and new doco on setting up the POP3 proxy
      using it.  This is nearly there - I'm combining the pop3proxy.py web
      UI and Tim Stone's OptionConfig.py into one unified web interface.

    o Security for the web interface - done but not yet checked in.  All
      this can do at the moment is limit web connections to localhost, but
      at least it means you're not opening up your Spambayes system to all
      and sundry.


2. Things that have cropped up recently and need sorting out:

    o Silly memory usage by the POP3 proxy.  Done and awaiting checkin.

    o The Eudora problem.  This is nasty - I think we'll end up with a
      compromise here (as proposed by Jeremy) because I don't see a clean
      solution out there.  (See
      http://mail.python.org/pipermail/spambayes/2002-November/002054.html
      for an explanation of why combining the hostname and the username is
      not a perfect solution).

    o Integration of papaDoc's documentation into the website.
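
As an illustration of the "limit web connections to localhost" item above, here is a minimal sketch of one way such a check could look.  It is an assumption about the general technique, not the code awaiting checkin; is_local_peer is a hypothetical name, and the sketch relies on the ipaddress module from the modern Python standard library.

```python
import ipaddress


def is_local_peer(peer_host):
    """Return True if `peer_host` (an IP address string) is a loopback address."""
    try:
        return ipaddress.ip_address(peer_host).is_loopback
    except ValueError:
        # Not a literal IP address (for example a hostname): treat as non-local.
        return False


if __name__ == "__main__":
    for host in ("127.0.0.1", "::1", "192.168.1.20"):
        print(host, is_local_peer(host))
```

A server would call something like this with the peer address of each accepted connection and close the connection when it returns False.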

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Tue Jan 14 17:18:36 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 12:19:01 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <15908.17160.397749.855624@montanaro.dyndns.org>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<15908.17160.397749.855624@montanaro.dyndns.org>
Message-ID: <4bh82vcr59pjebkg5pe0d5atavldufpmc8@4ax.com>


[Skip]
> I have three changes available.  The first two will get checked in when the
> off-campus link wakes up:
> 
>     * A -U option in hammiebulk.py which allows you to "untrain" a message.

Ooo!  That sounds like a step towards "Integration with Mutt (and other
clients) via a single-shot script (like a single-message version of Hammie,
or new switches to Hammie itself)" as appears on my to-do list!

> What's pop3graph.py?  [...]  Should it be installed?

It's a silly toy that shouldn't be in the main scripts area (it creates an
ASCII graph of how accurate the POP3 proxy is over time).  With a #! line
it should work, but it's not important enough to live with the main scripts
- it should go in 'utilities'.  I'll move it when I next check in. 

-- 
Richie Hindle
richie@entrian.com


From neale at woozle.org  Tue Jan 14 09:34:49 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 14 12:35:06 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <4bh82vcr59pjebkg5pe0d5atavldufpmc8@4ax.com> (Richie Hindle's
 message of "Tue, 14 Jan 2003 17:18:36 +0000")
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<15908.17160.397749.855624@montanaro.dyndns.org>
	<4bh82vcr59pjebkg5pe0d5atavldufpmc8@4ax.com>
Message-ID: <w53fzrviqiu.fsf@woozle.org>

Richie Hindle <richie@entrian.com> writes:

> [Skip]
>> I have three changes available.  The first two will get checked in when the
>> off-campus link wakes up:
>> 
>>     * A -U option in hammiebulk.py which allows you to "untrain" a message.
>
> Ooo!  That sounds like a step towards "Integration with Mutt (and other
> clients) via a single-shot script (like a single-message version of Hammie,
> or new switches to Hammie itself)" as appears on my to-do list!

You should consider adding this to hammiefilter, along with the ability
to train while scoring.  My idea was to make hammiefilter the
single-message equivalent of hammiebulk.

Speaking of which, I think hammiebulk and mboxtrain can be merged if
hammiebulk gets a new "save state" flag.  IIRC I wrote mboxtrain with
the intention of it and hammiefilter replacing hammiebulk after a while.

Is anyone using the -u option to hammiebulk anymore?  Without -u (and
-f, which is duplicated by hammiefilter) hammiebulk would be the same
thing as mboxtrain.

Neale

From neale at woozle.org  Tue Jan 14 09:37:59 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 14 12:42:56 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> (Richie Hindle's
 message of "Tue, 14 Jan 2003 17:05:41 +0000")
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com>
Message-ID: <w53d6mziqdk.fsf@woozle.org>

Richie Hindle <richie@entrian.com> writes:

>     o Integration with Mutt (and other clients) via a single-shot script
>       (like a single-message version of Hammie, or new switches to Hammie
>       itself) See http://www.linuxjournal.com/article.php?sid=6439  Does
>       anyone have something like this already, or any requirements?  And I
>       don't like to ask, but would anyone like to write this?  8-)  I was
>       intending to do it myself (and still will if no-one else fancies the
>       job) but I'm going to be rushed off my feet these next two weeks...

Well, my wife is currently using mutt with hammiefilter in her procmailrc
and mboxtrain being run from a cron job.  Is this an acceptable "with
mutt" configuration, or were you thinking of something that mutt could
run itself, like the outlook plugin?

Either way, I'm volunteering to help with or do this one :)

Neale

From papaDoc at videotron.ca  Tue Jan 14 12:35:51 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Tue Jan 14 12:45:36 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <ejh82vo21ik9j58mj8a616gk185ds61vg2@4ax.com>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
 <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> <3E24458A.2040408@videotron.ca>
 <ejh82vo21ik9j58mj8a616gk185ds61vg2@4ax.com>
Message-ID: <3E244A77.4050701@videotron.ca>

Hi,

I was talking with Richie about the documentation for pop3proxy.

Since there will be many changes to pop3proxy, I will wait before
resubmitting my updated documentation.  That will let me integrate the
documentation for the new UI.

If the new UI is checked in by the end of the week, I will be able to
update the documentation by next Tuesday night.

>Hi Remi,
>
>>>   o Web-based configuration, and new doco on setting up the POP3 proxy
>>>     using it.  This is nearly there - I'm combining the pop3proxy.py web
>>>     UI and Tim Stone's OptionConfig.py into one unified web interface.
>>
>>Should I wait until this is done to resubmit my doc?
>
>If you're willing to update it to cover the new web interface, that would
>be fantastic!  If you are, you should mention it on the mailing list before
>someone integrates your previous version into the web site.  I'm hoping to
>check in the new web interface before the end of the week.


From skip at pobox.com  Tue Jan 14 13:08:21 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 14:08:26 2003
Subject: [Spambayes] loosen up address_headers option?
Message-ID: <15908.24613.426948.717998@montanaro.dyndns.org>


The tokenizer's address_headers option only examines "from".  The code has
this comment:

        # Dang -- I can't use Sender:.  If I do,
        #     'sender:email name:python-list-admin'
        # becomes the most powerful indicator in the whole database.
        #
        # From:         # this helps both rates
        # Reply-To:     # my error rates are too low now to tell about this
        #               # one (smalls wins & losses across runs, overall
        #               # not significant), so leaving it out
        # To:, Cc:      # These can help, if your ham and spam are sourced
        #               # from the same location. If not, they'll be horrible.

which dates from a time early in the spambayes development history.  (Can't
tell exactly when since the recent directory reorganization.  Could the loss
of cvs comments have been avoided?)  Much water has passed under the
tokenizing bridge since then.  I'm skeptical that the above token all by
itself would relegate any spam to the hambox.

In my personal experience, adding the to and cc headers to the list would
pick up some strong spam clues.  While any number of <foo>@mojam.com email
aliases eventually reach me, most are essentially unused: they were
harvested from obscure places on the Mojam websites and are rarely used by
real people with Mojam business to transact.

As spambayes moves out of the experimental stage, perhaps it's worth looking
at adding to and cc (and maybe reply-to and sender) to the default list of
analyzed headers.
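
To make the proposal concrete, here is a minimal sketch of generating tokens from a configurable list of address headers.  It is an illustration only: the token format and the address_tokens name are assumptions, not the actual spambayes tokenizer code.

```python
from email import message_from_string
from email.utils import getaddresses


def address_tokens(msg, headers=("from",)):
    """Yield illustrative tokens for the address headers named in `headers`."""
    for field in headers:
        values = msg.get_all(field, [])
        if not values:
            # A missing header can itself be a clue, so emit a token for it.
            yield "%s:none" % field
            continue
        for name, addr in getaddresses(values):
            if name:
                yield "%s:name:%s" % (field, name.lower())
            if addr:
                yield "%s:addr:%s" % (field, addr.lower())


if __name__ == "__main__":
    raw = "From: Some Sender <x@example.com>\nTo: foo@mojam.com\n\nbody\n"
    msg = message_from_string(raw)
    for token in address_tokens(msg, ("from", "to", "cc")):
        print(token)
```

With headers=("from",) only the From: line contributes tokens; widening the tuple is the equivalent of adding to, cc, and so on to the option.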

Skip


From jm at jmason.org  Tue Jan 14 19:29:25 2003
From: jm at jmason.org (Justin Mason)
Date: Tue Jan 14 14:29:05 2003
Subject: [Spambayes] loosen up address_headers option? 
In-Reply-To: Message from Skip Montanaro <skip@pobox.com> 
	<15908.24613.426948.717998@montanaro.dyndns.org> 
Message-ID: <20030114192930.2417116F17@jmason.org>


Skip Montanaro said:
> As spambayes moves out of the experimental stage, perhaps it's worth looking
> at adding to and cc (and maybe reply-to and sender) to the default list of
> analyzed headers.

FWIW, I certainly found they had useful clues in SpamAssassin testing,
Reply-To, To, and Cc at least.  Sender, however, was just noise.  A look
at the SpamAssassin-devel archives may dig up the test results in
question...

--j.

From richie at entrian.com  Tue Jan 14 19:29:04 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 14:29:26 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <w53d6mziqdk.fsf@woozle.org>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com> <w53d6mziqdk.fsf@woozle.org>
Message-ID: <gfo82v8j42jbv3gc5pv7lriasdmvjc902k@4ax.com>


[Richie]
> ...Integration with Mutt...

[Neale]
> Well, my wife is currently using mutt with hammiefilter in her procmailrc
> and mboxtrain being run from a cron job.  Is this an acceptable "with
> mutt" configuration, or were you thinking of something that mutt could
> run itself, like the outlook plugin?

Your setup is great, but there is also the plugin route.  Many people might
prefer it because it's more user-driven and doesn't depend on cron jobs and
so on.  This way also means you don't need to keep spam around in your
mailbox.

Nick Moffitt's article at http://www.linuxjournal.com/article.php?sid=6439
shows how to integrate Bogofilter with Mutt such that Save automatically
trains as ham, commands like Reply and so on do the same (on the grounds
that you never save or reply to spam), and there's a new Delete As Spam
command.

Nick's system does auto-training as well, so it's a little different from
what we'd expect with the current Spambayes, but the idea is the same.
It's something that Don Marti, the Linux Journal editor, was keen on (I get
the impression it's because he's a Mutt user and wants to use Spambayes as
a plugin 8-)

> Either way, I'm volunteering to help with or do this one :)

Wonderful - many thanks!  Hopefully Nick's article explains the idea in
full.

-- 
Richie Hindle
richie@entrian.com


From tim.one at comcast.net  Tue Jan 14 14:34:32 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Jan 14 14:35:40 2003
Subject: [Spambayes] loosen up address_headers option?
In-Reply-To: <15908.24613.426948.717998@montanaro.dyndns.org>
Message-ID: <BIEJKCLHCIOIHAGOKOLHAEBHEIAA.tim.one@comcast.net>

[Skip Montanaro]
> The tokenizer's address_headers option only examines "from".  The code has
> this comment:
>
> ...
>
> As spambayes moves out of the experimental stage, perhaps it's
> worth looking at adding to and cc (and maybe reply-to and sender) to the
> default list of analyzed headers.

They remain killer-strong clues for bad reasons when training on
mixed-source corpora, so caution is still in order.

In the Outlook client, life is so constrained (meaning mixed-source corpora
are darned hard to get at there) that the Outlook client's default has been:

    [Tokenizer]
    address_headers: from to cc sender reply-to

for a long time.  This works fine in practice, except when python.org has to
turn off Spamassassin and lots of spam leaks thru.  Then it piles up lots of
"this came from python.org, so it's probably not spam" tokens, which
increases the incidence of FN and (especially) spam rating Unsure.
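
A rough way to see why such a token is "killer-strong" is to look at a simple per-token ratio.  The sketch below is a deliberately simplified estimate, not the formula the Spambayes classifier uses, and the counts are made-up illustration values.

```python
def naive_spamprob(spam_hits, ham_hits, nspam, nham):
    """Simplified per-token spam estimate from training counts.

    spam_hits / ham_hits: number of trained spam / ham messages containing
    the token; nspam / nham: total trained spam / ham messages.
    """
    spam_ratio = spam_hits / float(nspam)
    ham_ratio = ham_hits / float(nham)
    return spam_ratio / (spam_ratio + ham_ratio)


if __name__ == "__main__":
    # Hypothetical counts: a "came via the list server" token seen in most
    # ham and almost no spam looks overwhelmingly hammy.
    print(naive_spamprob(spam_hits=5, ham_hits=950, nspam=1000, nham=1000))
```

Any spam that arrives carrying a token with an estimate this close to zero gets pulled toward ham, which matches the rise in FN and Unsure described above.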


From barry at python.org  Tue Jan 14 14:31:30 2003
From: barry at python.org (Barry A. Warsaw)
Date: Tue Jan 14 14:37:28 2003
Subject: [Spambayes] what else is needed for a first (source) release?
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<15908.6730.504794.353666@gargle.gargle.HOWL>
	<ueg82vkpqaofbbte26d28ctbju8ucb70ru@4ax.com>
Message-ID: <15908.26002.8870.504247@gargle.gargle.HOWL>


>>>>> "RH" == Richie Hindle <richie@entrian.com> writes:

    RH> It would be nice if you could leverage the existing web
    RH> interface - I'm working towards making it less monolithic
    RH> right now, which might help.

For configuring, yes, we'll use the Privacy -> Spam Filters page.  The
trick is the admindb page, which already sucks.  But I don't intend to
clean it up for the prototype.

-Barry

From richie at entrian.com  Tue Jan 14 19:46:25 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 14:46:49 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <15908.26002.8870.504247@gargle.gargle.HOWL>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<15908.6730.504794.353666@gargle.gargle.HOWL>
	<ueg82vkpqaofbbte26d28ctbju8ucb70ru@4ax.com>
	<15908.26002.8870.504247@gargle.gargle.HOWL>
Message-ID: <15q82vkjd9t07ddgc2ms9gr889v1dqvi6e@4ax.com>


[Richie]
> It would be nice if you could leverage the existing web
> interface - I'm working towards making it less monolithic
> right now, which might help.

[Barry]
> For configuring, yes, we'll use the Privacy -> Spam Filters page.  The
> trick is the admindb page, which already sucks.  But I don't intend to
> clean it up for the prototype.

Are we talking at cross purposes?  I meant leverage the existing
*Spambayes* web interface that's (currently) a part of pop3proxy.py.  But
it sounds like you're way ahead of me anyway.

-- 
Richie Hindle
richie@entrian.com


From barry at python.org  Tue Jan 14 14:49:59 2003
From: barry at python.org (Barry A. Warsaw)
Date: Tue Jan 14 14:50:30 2003
Subject: [Spambayes] what else is needed for a first (source) release?
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<15908.6730.504794.353666@gargle.gargle.HOWL>
	<ueg82vkpqaofbbte26d28ctbju8ucb70ru@4ax.com>
	<15908.26002.8870.504247@gargle.gargle.HOWL>
	<15q82vkjd9t07ddgc2ms9gr889v1dqvi6e@4ax.com>
Message-ID: <15908.27111.16922.418177@gargle.gargle.HOWL>


>>>>> "RH" == Richie Hindle <richie@entrian.com> writes:

    RH> [Richie]
    >> It would be nice if you could leverage the existing web
    >> interface - I'm working towards making it less monolithic right
    >> now, which might help.

    RH> [Barry]
    >> For configuring, yes, we'll use the Privacy -> Spam Filters
    >> page.  The trick is the admindb page, which already sucks.  But
    >> I don't intend to clean it up for the prototype.

    RH> Are we talking at cross purposes?  I meant leverage the
    RH> existing *Spambayes* web interface that's (currently) a part
    RH> of pop3proxy.py.  But it sounds like you're way ahead of me
    RH> anyway.

Oops, we're talking about different things.  I'm talking about the
hooks in Mailman to enable spambayes scoring, and what to do with
messages based on those scores.  Configuring spambayes itself is
another kettle of fish, one that I'm not planning on addressing for my
prototype.

-Barry

From whisper at oz.net  Tue Jan 14 12:38:30 2003
From: whisper at oz.net (David LeBlanc)
Date: Tue Jan 14 15:38:28 2003
Subject: [Spambayes] Lot of files removed from CVS?
Message-ID: <GCEDKONBLEFPPADDJCOEKEOBHFAA.whisper@oz.net>

I just updated my local copy of CVS (from about a week ago or so) and got
this (normal update messages removed):

cvs server: Corpus.py is no longer in the repository
cvs server: CostCounter.py is no longer in the repository
cvs server: FileCorpus.py is no longer in the repository
cvs server: HistToGNU.py is no longer in the repository
cvs server: Histogram.py is no longer in the repository

cvs server: Options.py is no longer in the repository
cvs server: TestDriver.py is no longer in the repository
cvs server: Tester.py is no longer in the repository
cvs server: cdb.py is no longer in the repository
cvs server: chi2.py is no longer in the repository
cvs server: classifier.py is no longer in the repository
cvs server: cmp.py is no longer in the repository
cvs server: dbmstorage.py is no longer in the repository
cvs server: fpfn.py is no longer in the repository

cvs server: hammiebulk.py is no longer in the repository
cvs server: heapq.py is no longer in the repository
cvs server: loosecksum.py is no longer in the repository

cvs server: mboxcount.py is no longer in the repository
cvs server: mboxtest.py is no longer in the repository
P mboxtrain.py
cvs server: mboxutils.py is no longer in the repository
cvs server: msgs.py is no longer in the repository
cvs server: optimize.py is no longer in the repository

cvs server: rates.py is no longer in the repository
cvs server: rebal.py is no longer in the repository
cvs server: sets.py is no longer in the repository

cvs server: simplexloop.py is no longer in the repository
cvs server: split.py is no longer in the repository
cvs server: splitn.py is no longer in the repository
cvs server: splitndirs.py is no longer in the repository
cvs server: storage.py is no longer in the repository
cvs server: table.py is no longer in the repository
cvs server: timcv.py is no longer in the repository
cvs server: timtest.py is no longer in the repository
cvs server: tokenizer.py is no longer in the repository

cvs server: weaktest.py is no longer in the repository

*****CVS exited normally with code 0*****

Is this correct?

David LeBlanc
Seattle, WA USA


From skip at pobox.com  Tue Jan 14 14:42:41 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 15:42:46 2003
Subject: [Spambayes] Lot of files removed from CVS?
In-Reply-To: <GCEDKONBLEFPPADDJCOEKEOBHFAA.whisper@oz.net>
References: <GCEDKONBLEFPPADDJCOEKEOBHFAA.whisper@oz.net>
Message-ID: <15908.30273.472786.864309@montanaro.dyndns.org>


    David> I just updated my local copy of CVS (from about a week ago or so)
    David> and got this (normal update messages removed):
    ...
    David> Is this correct?

Yup.  Try

    cvs -dP .

instead.  Anthony moved stuff all around a day or two ago.

Skip

From whisper at oz.net  Tue Jan 14 12:52:47 2003
From: whisper at oz.net (David LeBlanc)
Date: Tue Jan 14 15:52:30 2003
Subject: [Spambayes] Lot of files removed from CVS?
In-Reply-To: <15908.30273.472786.864309@montanaro.dyndns.org>
Message-ID: <GCEDKONBLEFPPADDJCOEEEODHFAA.whisper@oz.net>

I'm using wincvs. By selecting "create missing directories that exist in the
repository", I get the -d flag (and a bunch of "U" messages of some (all?)
of the files listed in my last post during the refresh) - what's the P mean?
I also tried using wincvs' command line option to run the command suggested
and got back a usage message.

Sorry, I realize this is OT a bit...

David LeBlanc
Seattle, WA USA

> -----Original Message-----
> From: Skip Montanaro [mailto:skip@pobox.com]
> Sent: Tuesday, January 14, 2003 12:43
> To: David LeBlanc
> Cc: spambayes@python.org
> Subject: Re: [Spambayes] Lot of files removed from CVS?
>
>
>
>     David> I just updated my local copy of CVS (from about a week
> ago or so)
>     David> and got this (normal update messages removed):
>     ...
>     David> Is this correct?
>
> Yup.  Try
>
>     cvs -dP .
>
> instead.  Anthony moved stuff all around a day or two ago.
>
> Skip


From skip at pobox.com  Tue Jan 14 15:21:46 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 16:21:51 2003
Subject: [Spambayes] Lot of files removed from CVS?
In-Reply-To: <GCEDKONBLEFPPADDJCOEEEODHFAA.whisper@oz.net>
References: <15908.30273.472786.864309@montanaro.dyndns.org>
        <GCEDKONBLEFPPADDJCOEEEODHFAA.whisper@oz.net>
Message-ID: <15908.32618.148694.815679@montanaro.dyndns.org>

    what's the P mean?

"P"rune empty directories.

Skip

From carel.fellinger at chello.nl  Tue Jan 14 22:21:35 2003
From: carel.fellinger at chello.nl (Carel Fellinger)
Date: Tue Jan 14 16:33:53 2003
Subject: [Spambayes] re-org - making a package &c.
In-Reply-To: <w53hecckf9b.fsf@woozle.org>
References: <GCEDKONBLEFPPADDJCOEKEJJHEAA.whisper@oz.net>
	<20030113111344.GA12027@mail.felnet> <w53ptr0ki7c.fsf@woozle.org>
	<20030113192419.GA17717@mail.felnet> <w53hecckf9b.fsf@woozle.org>
Message-ID: <20030114212135.GA1769@mail.felnet>

On Mon, Jan 13, 2003 at 11:42:56AM -0800, Neale Pickett wrote:
...
> In any case, what you propose would work as a tuning tool, to be run
> whenever you want to tune your config.  I would look at the existing
> test programs and try to figure out a way to combine them.  I believe

fine idea, but..

> You still interested in doing this, Carel?

To be honest but blunt: no, not at all.  Maybe in a few weeks I'll be in a
better position to spend some time on it, but my hopes are low :(  Heck,
I haven't even gotten around to installing spambayes and enjoying its
excellence!

living-a-lurking-life-isn't-always-by-choice-ly y'rs - carel

From skip at pobox.com  Tue Jan 14 15:55:48 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 16:55:51 2003
Subject: [Spambayes] pop3proxy - a couple issues
Message-ID: <15908.34660.141569.623885@montanaro.dyndns.org>


I am just trying out pop3proxy for the first time with ssh (having pop3proxy
start an ssh session that forwards the pop connection) and ran into a couple
problems.  I don't think they are related to my use of an ssh tunnel, and
I'm not about to test pop3proxy without ssh and have my password go over the
net in the clear (though I will mess about with manually starting ssh
external to pop3proxy shortly).

Messages get sucked over the pipe and properly classified, however I have
two problems:

    * No X-Hammie-Debug headers are added to the processed messages even
      though I have

        [Hammie]
        hammie_debug_header: True

      in my options file and have BAYESCUSTOMIZE set to

        BAYESCUSTOMIZE=$HOME/hammie.opt

    * When I click the "review" button in my web browser I get a max
      recursion depth exception from pop3proxy and a blank page in my
      browser.  Here are the start and end of the asyncore traceback:

    error: uncaptured python exception, closing channel
    <__main__.UserInterface connected at 0x534940>
    (exceptions.RuntimeError:maximum recursion depth exceeded [/Users/skip/local/lib/python2.3/asyncore.py|read|69]
    [/Users/skip/local/lib/python2.3/asyncore.py|handle_read_event|385]
    [/Users/skip/local/lib/python2.3/asynchat.py|handle_read|136]
    [/Users/skip/local/bin/pop3proxy.py|found_terminator|808]
    [/Users/skip/local/bin/pop3proxy.py|onRequest|834]
    [/Users/skip/local/bin/pop3proxy.py|onReview|1146]
    [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getitem__|208]
    [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
    [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
    ...
    [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
    [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]
    [/Users/skip/local/lib/python2.3/site-packages/spambayes/Corpus.py|__getattr__|282]

The proxy is started like so:

    pop3proxy.py -p ~/hammie.db -d -l 11111 \
        -e 'ssh -q -C -f mail.mojam.com -L 11110:localhost:110 bash -c \
        "while true ; do sleep 60 ; done"' \
        localhost 11110

(remove the backslashes before trying this at home).  All the -e flag does
is get the associated command started up before doing anything else:

    state.buildServerStrings()
    pid = 0
    if state.initCommand:
        pid = spawnInitCommand(state.initCommand)
    try:
        main(state.servers, state.proxyPorts, state.uiPort,
             state.launchUI)
    finally:
        if pid:
            killInitCommand(pid)

spawnInitCommand and killInitCommand are straightforward:

    # these may need some changing for non-Unixoid platforms
    def spawnInitCommand(cmd):
        """run cmd (a string) in the background"""
        cmd, args = cmd.split(" ", 1)
        args = args.split()
        return os.spawnvp(os.P_NOWAIT, cmd, args)


    def killInitCommand(pid):
        os.kill(pid, signal.SIGHUP)
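
    A possible variant of the two helpers, offered as a sketch rather than
    as the checked-in code: it uses shlex.split so quoted arguments stay
    together, and passes the program name as the first element of the
    argument list, which is what os.spawnvp expects.  The function names
    and the example host are illustrative assumptions.

```python
import os
import shlex
import signal


def split_init_command(cmd):
    """Split a command string into an argv list, honouring shell-style quotes."""
    argv = shlex.split(cmd)
    if not argv:
        raise ValueError("empty init command")
    return argv


def spawn_init_command(cmd):
    """Run cmd (a string) in the background and return its pid (Unix only)."""
    argv = split_init_command(cmd)
    # argv[0] is used both to locate the program on PATH and as the first
    # element of the argument list handed to the new process.
    return os.spawnvp(os.P_NOWAIT, argv[0], argv)


def kill_init_command(pid):
    """Ask the background command to exit by sending it SIGHUP."""
    os.kill(pid, signal.SIGHUP)


if __name__ == "__main__":
    # mail.example.com is a placeholder host, not a real server.
    print(split_init_command("ssh -q -C -N -L 11110:localhost:110 mail.example.com"))
```

    Note that a remote command passed through ssh is parsed again by the
    remote shell, so quoting that survived the naive split may behave
    differently once the local split is done properly.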

Is anyone else seeing these problems?  I'm running on Mac OS X with a fairly
recent CVS checkout of Python (Jan 7 2003, 16:09) and with spambayes updated
earlier today.

Thanks,

Skip

From francois.granger at free.fr  Tue Jan 14 23:13:53 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Tue Jan 14 17:13:59 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <w53iswrisnl.fsf@woozle.org>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
 <w53iswrisnl.fsf@woozle.org>
Message-ID: <a05200f03ba4a3b030d6a@[192.168.1.20]>

At 08:48 -0800 on 14/01/2003, in message Re: [Spambayes] what else is 
needed for a first (source, Neale Pickett wrote:
>
>We may want to merge HAMMIE.txt or some subset of it into a top-level
>README.  Setting up procmail-based filtering, at this point, is a piece
>of cake.

Even I was able to do it... ;-)

If you want to cut and paste some naive words for the README, there
are some notes on what I did at the bottom of the first box:

http://francois.granger.free.fr/radiohome/2002/12/29.html


-- 
Recently using MacOSX.......

From francois.granger at free.fr  Tue Jan 14 23:30:22 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Tue Jan 14 17:30:27 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
 <92f82v07amarkc7hotqfeksqh15ecdh6k3@4ax.com>
Message-ID: <a05200f04ba4a3d589962@[192.168.1.20]>

At 17:05 +0000 14/01/2003, in message Re: [Spambayes] what else is 
needed for a first (source, Richie Hindle wrote:

>     o The Eudora problem.  This is nasty - I think we'll end up with a
>       compromise here (as proposed by Jeremy) because I don't see a clean
>       solution out there.  (See
>       http://mail.python.org/pipermail/spambayes/2002-November/002054.html
>       for an explanation of why combining the hostname and the username is
>       not a perfect solution).

The solution given by Tony Lownds of two loopback addresses does not
work on MacOS 9 either.  I briefly tested it today at work.

I will be testing it at home on MacOS X soon.

I would say that a "complex login" like the one proposed earlier could
be the only solution for Eudora on MacOS 9.  (Not that I care now that I
switched to X ;-)

server1 = francois.granger:free@127.0.0.1, pop.free.fr:110
server2 = francois.granger:lap, pop.laposte.net

In Eudora, you enter a login of francois.granger:free and a server of
127.0.0.1; pop3proxy removes the :free and uses pop.free.fr as the mail
server.  It is not much more than what is already needed, and it doesn't
have the problem above.

Easy to say.  I don't know how many code changes would be needed.
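
A rough sketch of the login-splitting step, to show the idea only.  The SERVERS table, the split_login name, and the port number for the second server are assumptions for illustration; nothing like this exists in pop3proxy yet.

```python
# Hypothetical mapping from the tag after the colon to the real POP3 server.
SERVERS = {
    "free": ("pop.free.fr", 110),
    "lap": ("pop.laposte.net", 110),  # port assumed to be the POP3 default
}


def split_login(login, servers=SERVERS):
    """Split 'user:tag' into (user, host, port) using the servers table."""
    user, sep, tag = login.rpartition(":")
    if not sep or not user or tag not in servers:
        raise ValueError("login %r does not name a known server" % login)
    host, port = servers[tag]
    return user, host, port


if __name__ == "__main__":
    print(split_login("francois.granger:free"))
```

The proxy would then send only the user part on to the chosen server, so the mail client needs nothing beyond the slightly longer login.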

-- 
Recently using MacOSX.......

From mhammond at skippinet.com.au  Wed Jan 15 09:41:20 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Jan 14 17:42:07 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <200301140741.h0E7f3R19057@localhost.localdomain>
Message-ID: <06d401c2bc1e$0f782cd0$530f8490@eden>

> I figure we make the first release a source only one - which isn't a
> biggie for most people, since it's in python. Should the release just
> have what's in the current setup.py (which doesn't include
> the Outlook2000
> directory), with a separate release for the O2K plugin+core? Or should
> it all be in one big bundle?

If we are going source-code only, then I would say just one bundle is fine.
We just point out the win32all versions people need, then they run
"outlook2000\addin.py", and everything works.

We need some better docs, which would include our intention to move to a
binary/bz2 distribution, and some good documentation on how to get started
with training etc.

Mark.


From tony-bayes at lownds.com  Tue Jan 14 14:47:33 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Tue Jan 14 17:47:28 2003
Subject: [Spambayes] pop3proxy - a couple issues
Message-ID: <a05200f25ba4a43e4c347@[204.162.121.104]>

Skip wrote:
>     * When I click the "review" button in my web browser I get a max
>       recursion depth exception from pop3proxy and a blank page in my
>       browser.  Here are the start and end of the asyncore traceback:

I ran into this too; the stack size is too small. Run one of these 
commands first:

tcsh: limit stacksize 2048

sh: ulimit -s 2048

Mac OS X's default is 512, I picked 2048 at random.

>The proxy is started like so:
>
>     pop3proxy.py -p ~/hammie.db -d -l 11111 \
>         -e 'ssh -q -C -f mail.mojam.com -L 11110:localhost:110 bash -c \
>         "while true ; do sleep 60 ; done"' \
>         localhost 11110

ssh has an -N flag that will replace that while loop.

>(remove the backslashes before trying this at home).  All the -e flag does
>is get the associated command started up before doing anything else:

I have found, in my one day of using ssh tunnels + pop3proxy, that my 
ssh tunnels will go down (due to the computer going to sleep or my 
internet connection being flaky) more often than pop3proxy.py does 
(due to me closing it). So, perhaps the command is better spawned 
when the proxy can't connect to the server. Just a thought...

-Tony

From skip at pobox.com  Tue Jan 14 17:08:28 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 14 18:08:35 2003
Subject: [Spambayes] pop3proxy - a couple issues
In-Reply-To: <a05200f25ba4a43e4c347@[204.162.121.104]>
References: <a05200f25ba4a43e4c347@[204.162.121.104]>
Message-ID: <15908.39020.117137.398334@montanaro.dyndns.org>


    Tony> Skip wrote:
    >> * When I click the "review" button in my web browser I get a max
    >> recursion depth exception from pop3proxy and a blank page in my
    >> browser.  Here are the start and end of the asyncore traceback:

    Tony> I ran into this too; the stack size is too small. Run one of these 
    Tony> commands first:

    Tony> tcsh: limit stacksize 2048

    Tony> sh: ulimit -s 2048

    Tony> Mac OS X's default is 512, I picked 2048 at random.

That's not it, or at least a stacksize of 2048 won't be sufficient.  I
already have my stack size set to 8192 by default:

    % ulimit -a
    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) 6144
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 256
    pipe size          (512 bytes, -p) 1
    stack size            (kbytes, -s) 8192
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 100
    virtual memory        (kbytes, -v) 14336

Looking at the traceback it seems to me that a __getattr__ or __getitem__
method has a bug.
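
For illustration only, here is one common way a __getattr__ bug produces a
recursion-depth error regardless of the stack size. This is a sketch in
present-day Python, not code taken from pop3proxy; the class names are
made up. (In the Python of this thread's era the exception would be a
RuntimeError rather than RecursionError.)

```python
class Broken:
    def __getattr__(self, name):
        # Bug: self._data was never set, so looking it up calls
        # __getattr__ again, which looks it up again, and so on.
        return self._data[name]

class Fixed:
    def __init__(self):
        self._data = {"subject": "hello"}

    def __getattr__(self, name):
        # Only reached for attributes not found normally; _data is a
        # real instance attribute, so this lookup does not recurse.
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

def hits_recursion_limit(obj):
    """Return True if reading obj.subject exhausts the recursion limit."""
    try:
        obj.subject
    except RecursionError:
        return True
    return False

print(hits_recursion_limit(Broken()), hits_recursion_limit(Fixed()))
```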

    >> The proxy is started like so:
    >> 
    >> pop3proxy.py -p ~/hammie.db -d -l 11111 \
    >> -e 'ssh -q -C -f mail.mojam.com -L 11110:localhost:110 bash -c \
    >> "while true ; do sleep 60 ; done"' \
    >> localhost 11110

    Tony> ssh has an -N flag that will replace that while loop.

I tried it.  It doesn't work as I'd like.  When you use -N, ssh exits after
one proxy session.  That is, pop3proxy connects through the tunnel as a
result of a local connection request, then once that session is complete,
ssh exits.  The next time the local mail user agent (fetchmail in my case at
the moment) connects, pop3proxy gets a connection refused message because
ssh is gone.

    >> (remove the backslashes before trying this at home).  All the -e flag
    >> does is get the associated command started up before doing anything
    >> else:

    Tony> I have found, in my one day of using ssh tunnels + pop3proxy, that
    Tony> my ssh tunnels will go down (due to the computer going to sleep or
    Tony> my internet connection being flaky) more often than pop3proxy.py
    Tony> does (due to me closing it). So, perhaps the command is better
    Tony> spawned when the proxy can't connect to the server. Just a
    Tony> thought...

Maybe, but that's going to be a fair amount more work.

Skip

From anthony at interlink.com.au  Wed Jan 15 10:51:49 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 18:53:08 2003
Subject: [Spambayes] what else is needed for a first (source) release? 
In-Reply-To: <w53iswrisnl.fsf@woozle.org> 
Message-ID: <200301142351.h0ENpnk29524@localhost.localdomain>


>>> Neale Pickett wrote
> Word.  Are you going to the spam conference too, Anthony?

I wish... nah, air fares cost too much, plus business situation is
such that work can't/won't pay for it (I don't even get to go to pycon :(

> I've been using hammiefilter and mboxtrain for over a month now with no
> complaints, so I think that little corner of the code is ready for a
> release.

Hm. I've been doing my training via hammie. I think we might want to
remove one or two of the myriad ways to train the system before release.

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Wed Jan 15 11:22:25 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 19:23:36 2003
Subject: [Spambayes] what else is needed for a first (source) release? 
In-Reply-To: <15908.6730.504794.353666@gargle.gargle.HOWL> 
Message-ID: <200301150022.h0F0MPx29798@localhost.localdomain>


>>> Barry A. Warsaw wrote
> 
> >>>>> "AB" == Anthony Baxter <anthony@interlink.com.au> writes:
> 
>     AB> Ok, can people nominate things that they think would be good
>     AB> before a first release? I'd like to try and get one out before
>     AB> the spam conference (it's as good a date as any :)
> 
> Although it might not be ready until my train pulls into South
> Station, 

Ah, you wacky americans and your strange mannerisms. What does this
mean in english?

> I'm working on a Mailman handler module for integration with
> Spambayes.  The actual hook is pretty easy (using the hammie.py
> interface) -- it's all the niddling little stuff <wink> like u/i,
> moderation, training, configuration, etc. that's a bit rough around
> the edges.

This will be one database per list? 

> Having a spambayes package I can unpack in Mailman's pythonlib dir is
> perfect.

Can you make sure, then, that the API that is exposed is sufficient 
for your needs?

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From richie at entrian.com  Wed Jan 15 00:38:54 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 19:39:34 2003
Subject: [Spambayes] what else is needed for a first (source) release?
In-Reply-To: <06d401c2bc1e$0f782cd0$530f8490@eden>
References: <200301140741.h0E7f3R19057@localhost.localdomain>
	<06d401c2bc1e$0f782cd0$530f8490@eden>
Message-ID: <m6b92vgnlk39g52ci63e71uk14vdhg238r@4ax.com>


[Mark]
> We need some better docs, which would include our intention to move to a
> binary/bz2 distribution, and some good documentation on how to get started
> with training etc.

I can help there - as long as we give a copyright attribution, we can reuse
parts of my Linux Journal article in the documentation.  Publishing the
whole thing in advance of the magazine coming out would be kind of
impolite, but we can use pieces of it.

That said, I have no time to work on the documentation directly - anyone
who *is* working on it, please feel free to ask for a copy of my article.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Wed Jan 15 00:40:02 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 14 19:40:43 2003
Subject: [Spambayes] pop3proxy - a couple issues
In-Reply-To: <15908.39020.117137.398334@montanaro.dyndns.org>
References: <a05200f25ba4a43e4c347@[204.162.121.104]>
	<15908.39020.117137.398334@montanaro.dyndns.org>
Message-ID: <1db92vkbiopoejvvebf6ii16rfqourqp3d@4ax.com>


[Skip]
> it seems to me that a __getattr__ or __getitem__ method has a bug.

I'll look at this - thanks.  But I'll go to bed first.

-- 
Richie Hindle
richie@entrian.com


From tim at fourstonesExpressions.com  Tue Jan 14 18:30:28 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Jan 14 19:41:24 2003
Subject: [Spambayes] what else is needed for a first (source) release? 
In-Reply-To: <200301150022.h0F0MPx29798@localhost.localdomain>
Message-ID: <LIMA6L87JDZU54FBVUNJB0YWHEWMI.3e24aba4@myst>

Off topic...  <wink**2>  - TimS

1/14/2003 6:22:25 PM, Anthony Baxter <anthony@interlink.com.au> wrote:

>
>>>> Barry A. Warsaw wrote
>> 
>> >>>>> "AB" == Anthony Baxter <anthony@interlink.com.au> writes:
>> 
>>     AB> Ok, can people nominate things that they think would be good
>>     AB> before a first release? I'd like to try and get one out before
>>     AB> the spam conference (it's as good a date as any :)
>> 
>> Although it might not be ready until my train pulls into South
>> Station, 
>
>Ah, you wacky americans and your strange mannerisms. What does this
>mean in english?
>

There's no such thing as South Station... hehe

>> I'm working on a Mailman handler module for integration with
>> Spambayes.  The actual hook is pretty easy (using the hammie.py
>> interface) -- it's all the niddling little stuff <wink> like u/i,
>> moderation, training, configuration, etc. that's a bit rough around
>> the edges.

Niddling?  Invert above mannerism jab... <wink>

>
>This will be one database per list? 
>
>> Having a spambayes package I can unpack in Mailman's pythonlib dir is
>> perfect.
>
>Can you make sure, then, that the API that is exposed is sufficient 
>for your needs?
>
>-- 
>Anthony Baxter     <anthony@interlink.com.au>   
>It's never too late to have a happy childhood.
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From richard at jowsey.com  Wed Jan 15 12:20:43 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Tue Jan 14 20:21:16 2003
Subject: [Spambayes] FYI: Java implementation
Message-ID: <3E25521B.20937.3607FFA@localhost>

Hi all,

I've been building a Java implementation of Paul Graham's 
"Bayesian" classification logic over the past couple months, 
intended as a plug-in filter for the Apache JAMES mail server. 

However, after considerable testing, tweaking and tuning via a 
proxy setup (similar to POPFile), plus some recent lurking on 
the Spambayes list, I'm now modifying this project to 
incorporate the excellent notions contributed by Gary Robinson, 
et al, as implemented in your Python code.

Early results are *very* promising!!! This death2spam stuff is 
definitely heading in the right direction! I haven't quite 
finished the chi2 comparison logic, but even using just "gary-
combining", the kinds of messages ending up in my "uncertain" 
category make much more sense. Plus I'm now seeing far less 
weirdness caused by Graham's "2 * nGood + nSpam >= 5" trick, 
etc. Will keep the list posted as to further progress.

I'd sure love to attend the upcoming spam-fest at MIT, but we 
moved downunder (Seattle -> Sydney) last year, and it's one 
helluva long way to go just for a day...

Many thanks for all your fine coding, testing efforts, and 
thoughtful conversations! It's been very helpful, not to mention 
highly entertaining at times.  ;-)

Cheers,
Richard


From barry at python.org  Tue Jan 14 20:41:35 2003
From: barry at python.org (Barry A. Warsaw)
Date: Tue Jan 14 20:42:04 2003
Subject: [Spambayes] what else is needed for a first (source) release? 
References: <15908.6730.504794.353666@gargle.gargle.HOWL>
	<200301150022.h0F0MPx29798@localhost.localdomain>
Message-ID: <15908.48207.917525.910821@gargle.gargle.HOWL>


>>>>> "AB" == Anthony Baxter <anthony@interlink.com.au> writes:

    >> :) Although it might not be ready until my train pulls into
    >> South Station,

    AB> Ah, you wacky americans and your strange mannerisms. What does
    AB> this mean in english?

It means my floob boober babs boober bubs won't constrapulate until my
sneenkle quods the flamb.  Jeez, you Aussies.

(Translation: I'm on a 6.5-hour train trip, with not much else to do
than randomly peck at my laptop.)

    >> I'm working on a Mailman handler module for integration with
    >> Spambayes.  The actual hook is pretty easy (using the hammie.py
    >> interface) -- it's all the niddling little stuff <wink> like
    >> u/i, moderation, training, configuration, etc. that's a bit
    >> rough around the edges.

    AB> This will be one database per list?

Yup.

    >> Having a spambayes package I can unpack in Mailman's pythonlib
    >> dir is perfect.

    AB> Can you make sure, then, that the API that is exposed is
    AB> sufficient for your needs?

I'll make sure I cvs up before I leave.  Won't have much time to look
before then, but I think the hammie.py module is all I need (from the
Mailman side -- for now).  If that's in spambayes.hammie I'm all set.

-Barry

From barry at python.org  Tue Jan 14 20:44:03 2003
From: barry at python.org (Barry A. Warsaw)
Date: Tue Jan 14 20:44:32 2003
Subject: [Spambayes] pop3proxy - a couple issues
References: <a05200f25ba4a43e4c347@[204.162.121.104]>
Message-ID: <15908.48355.950264.416317@gargle.gargle.HOWL>


>>>>> "TL" == Tony Lownds <tony-bayes@lownds.com> writes:

    TL> I ran into this too; the stack size is too small. Run one of
    TL> these commands first:

    TL> tcsh: limit stacksize 2048

    TL> sh: ulimit -s 2048

    TL> Mac OS X's default is 512, I picked 2048 at random.

That crops up a lot with Python, i.e. test_re IIRC, and definitely in
Mailman.

-Barry

From T.A.Meyer at massey.ac.nz  Wed Jan 15 15:43:15 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Jan 14 21:56:07 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE0@its-xchg2.massey.ac.nz>

Hi,

I've had trouble with the Outlook plugin in that whenever it tries to build a folder list (i.e. in the various dialogs) an exception is raised and the list presented is empty.  I traced it to a bad folder (Outlook can't display it either).

Now, normally, one should fix the cause, not the effect, but in this case the folder is on an exchange server and is not mine (it's a public folder).  Getting the owner of the folder to fix things would be very difficult.

So I altered FolderSelector.py so that if a bad folder causes this sort of problem, it's simply not presented in the list (but all the other folders are).

Probably not all that important, but it does (in most ways) make it more user-friendly.  Anyway, here's the new function in case you want to alter the cvs to reflect it.  I've never used Python before, so this may not be the best way to do this (suggestions of better ways are welcome, obviously).

import pywintypes
def _BuildFolderTreeOutlook(session, parent):
    children = []
    # Outlook's Folders collection is 1-based, hence the i+1 below.
    for i in range(parent.Folders.Count):
        folder = parent.Folders[i+1]
        try:
            spec = FolderSpec((folder.StoreID, folder.EntryID),
                              folder.Name.encode("mbcs", "replace"))
            if folder.Folders:
                spec.children = _BuildFolderTreeOutlook(session, folder)
            children.append(spec)
        except pywintypes.com_error:
            # A bad folder raises a COM error; leave it out of the list.
            print "Skipping folder " + folder.Name
    return children

=Tony Meyer

From mhammond at skippinet.com.au  Wed Jan 15 14:22:39 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Jan 14 22:23:18 2003
Subject: [Spambayes] Outlook plugin & bad folders
In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE0@its-xchg2.massey.ac.nz>
Message-ID: <078601c2bc45$5c874750$530f8490@eden>

[Tony Meyer]
> Now, normally, one should fix the cause, not the effect, but
> in this case the folder is on an exchange server and is not
> mine (it's a public folder).  Getting the owner of the folder
> to fix things would be very difficult.

Of course, fixing the cause makes sense when possible, but if Outlook and
other tools all work OK, then it is a bug in spambayes that we don't.

> I've never
> used Python before, so this may not be the best way to do
> this (suggestions of better ways are welcome, obviously).

Excellent!  The more common pattern is to catch pythoncom.error, but
pywintypes.com_error is an alias for the same object, so your code is just
fine.

The only thing is that we are wrapping the recursive call to
_BuildFolderTree in the exception handler.  I would generally prefer to only
catch the operation in error.  Is it possible for you to include the full
traceback without this patch applied?  Then I will get it into CVS.
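
To illustrate the "only catch the operation in error" point, here is a
self-contained sketch in present-day Python. FakeFolder and FolderError are
stand-ins invented for the example (FolderError plays the role of
pywintypes.com_error); this is not the code that went into CVS.

```python
class FolderError(Exception):
    """Stand-in for pywintypes.com_error in this sketch."""

class FakeFolder:
    def __init__(self, name, children=(), broken=False):
        self.name = name
        self.children = list(children)
        self.broken = broken

    def entry_id(self):
        if self.broken:
            raise FolderError(self.name)
        return "id-" + self.name

def build_tree(parent):
    specs = []
    for folder in parent.children:
        try:
            # Guard only the call that is known to fail.
            ident = folder.entry_id()
        except FolderError:
            continue  # skip the bad folder, keep its siblings
        # The recursive call sits outside the try block, so any other
        # error raised further down the tree still propagates.
        specs.append((ident, build_tree(folder)))
    return specs

root = FakeFolder("root", [
    FakeFolder("Inbox", [FakeFolder("Lists")]),
    FakeFolder("Public", broken=True),
])
print(build_tree(root))
```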

Thanks for digging in to find this problem!

Mark.


From mhammond at skippinet.com.au  Wed Jan 15 14:29:05 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Jan 14 22:29:52 2003
Subject: [Spambayes] Updating outlook to the new directory structure
Message-ID: <078d01c2bc46$420ff5b0$530f8490@eden>

FYI, after the recent source reorg, the Outlook addin seems to work fine,
except for 2 things:

* You must remember to blow away your .pyc files, else things may go
screwy, and you won't notice the next point until later.

* You need to do a full retrain of the database (as the module name stored
in the pickle has changed)

Apart from that, it all looks good.  If we can just get rid of more .py
files from the root, life will be good <wink>
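
As a small illustration of why a moved module forces a retrain: a pickle
records the module path of each class it stores, so an old pickle can no
longer be loaded once that path changes. The sketch below uses
fractions.Fraction from the standard library rather than any spambayes
class, and "old_layout_fractions" is a made-up module name.

```python
import pickle
from fractions import Fraction

# Protocol 2 writes the class reference as the text "module\nname\n".
data = pickle.dumps(Fraction(1, 3), protocol=2)
print(b"fractions\nFraction\n" in data)

# Simulate a pickle written before a module was moved or renamed.
stale = data.replace(b"fractions\n", b"old_layout_fractions\n")

def try_load(blob):
    """Unpickle blob, or describe the import failure."""
    try:
        return pickle.loads(blob)
    except ImportError as exc:
        return "cannot load: %s" % exc

print(try_load(data))
print(try_load(stale))
```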

Mark.


From anthony at interlink.com.au  Wed Jan 15 14:39:36 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 22:40:56 2003
Subject: [Spambayes] Updating outlook to the new directory structure 
In-Reply-To: <078d01c2bc46$420ff5b0$530f8490@eden> 
Message-ID: <200301150339.h0F3dar01317@localhost.localdomain>


>>> "Mark Hammond" wrote
> * You must remember to blow away your .pyc files, else things may go
> screwy, and you won't notice the next point until later.
> * You need to do a full retrain of the database (as the module name stored
> in the pickle has changed)

Oo. Yuk. Good catch.

> Apart from that, it all looks good.  If we can just get rid of more .py
> files from the root, life will be good <wink>

We could port it all to perl, or ruby, or something? :)


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Wed Jan 15 14:44:45 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 14 22:46:10 2003
Subject: [Spambayes] 
 new attempt at non-technical explanation on index.html of website
Message-ID: <200301150344.h0F3ijF01416@localhost.localdomain>


I just checked in some text that attempts to explain, in a mostly
non-technical way, how spambayes works. It's the "handwaving" bit
on the index.html document on the website.

suggestions for improvement accepted.


From T.A.Meyer at massey.ac.nz  Wed Jan 15 16:36:59 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Jan 14 22:50:08 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE1@its-xchg2.massey.ac.nz>

[Mark Hammond]
> The only thing is that we are wrapping the recursive call to
> _BuildFolderTree in the exception handler.  I would generally 
> prefer to only
> catch the operation in error.  Is it possible for you to 
> include the full
> traceback without this patch applied?  Then I will get it into CVS.
Here you go:

Traceback (most recent call last):
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 322, in OnInitDialog
    tree = BuildFolderTreeOutlook(self.mapi)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 128, in BuildFolderTreeOutlook
    root.children = _BuildFolderTreeOutlook(session, session)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 122, in _BuildFolderTreeOutlook
    spec.children = _BuildFolderTreeOutlook(session, folder)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 122, in _BuildFolderTreeOutlook
    spec.children = _BuildFolderTreeOutlook(session, folder)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 122, in _BuildFolderTreeOutlook
    spec.children = _BuildFolderTreeOutlook(session, folder)
  File "D:\CVS Modules\spambayes\Outlook2000\dialogs\FolderSelector.py", line 119, in _BuildFolderTreeOutlook
    spec = FolderSpec((folder.StoreID, folder.EntryID),
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 369, in __getattr__
    return apply(self._ApplyTypes_, args)
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 363, in _ApplyTypes_
    return self._get_good_object_(apply(self._oleobj_.InvokeTypes, (dispid, 0, wFlags, retType, argTypes) + args), user, resultCLSID)
pywintypes.com_error: (-2147352567, 'Exception occurred.', (4096, 'Microsoft Outlook', 'The operation failed.', None, 0, -2147221233), None)
win32ui: OnInitDialog() virtual handler (<bound method FolderSelector.OnInitDialog of <dialogs.FolderSelector.FolderSelector instance at 0x03A7CE00>>) raised an exception

> Thanks for digging in to find this problem!
It wouldn't have felt right to mail a "my folder list is empty" message to the list and not do something myself :)

Along a similar(ish) line:
I actually have another line added to _BuildFolderTreeOutlook that skips me past all the public folders (just an 'if folder.name == "Public Folders"' kind of thing), because otherwise it takes several minutes to build the list.

How likely is it that people will want to train on a public folder?  Could there maybe be an option in the .ini or somewhere like "Present_Public_Folders: False", for those like me that don't and have very large public folders?
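
A rough sketch of how such an option could be read, in present-day Python
with the standard configparser module. The [Outlook] section name, the
option itself, and the visible() helper are all hypothetical; nothing like
this exists in the plugin as far as this message shows.

```python
import configparser

cfg = configparser.ConfigParser()
# Hypothetical .ini contents; only the option name comes from the text above.
cfg.read_string("[Outlook]\nPresent_Public_Folders: False\n")

# Default to showing public folders when the option is absent.
show_public = cfg.getboolean("Outlook", "Present_Public_Folders",
                             fallback=True)

def visible(folder_names, show_public):
    """Drop the public folder root unless the option allows it."""
    return [n for n in folder_names
            if show_public or n != "Public Folders"]

print(visible(["Inbox", "Public Folders", "Spam"], show_public))
```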

=Tony Meyer

From mhammond at skippinet.com.au  Wed Jan 15 14:52:26 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Jan 14 22:53:14 2003
Subject: [Spambayes] Outlook plugin & bad folders
In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE1@its-xchg2.massey.ac.nz>
Message-ID: <079b01c2bc49$853c47f0$530f8490@eden>

[Tony]
> [Mark Hammond]
> > The only thing is that we are wrapping the recursive call to
> > _BuildFolderTree in the exception handler.  I would generally 
> > prefer to only
> > catch the operation in error.  Is it possible for you to 
> > include the full
> > traceback without this patch applied?  Then I will get it into CVS.
> Here you go:

Thanks!  Please check the version I just checked in works OK for you.

Thanks,

Mark.


From mhammond at skippinet.com.au  Wed Jan 15 14:56:51 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Jan 14 22:57:13 2003
Subject: [Spambayes] Outlook plugin & bad folders
In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE1@its-xchg2.massey.ac.nz>
Message-ID: <079c01c2bc4a$237ea7a0$530f8490@eden>

Sorry, I missed this bit:

[Tony]
> Along a similar(ish) line:
> I actually have another line added to _BuildFolderTreeOutlook
> that skips me past all the public folders (just an 'if
> folder.name == "Public Folders"' kind of thing), because
> otherwise it takes several minutes to build the list.
>
> How likely is it that people will want to train on a public
> folder?  Could there maybe be an option in the .ini or
> somewhere like "Present_Public_Folders: False", for those
> like me that don't and have very large public folders?

Check out the comments in this source file that start with:

# Oh, lord help us.

There is a MAPI version of the folder builder in that source file that will
work *much* faster - but until I get my hands on an Exchange server, I can't
really test it.

If you look further in the source file for where BuildFolderTreeMAPI() is
commented out, and you can manage to test it, I would be interested to know
your experiences with the code - except that you may find the exact same
exception we just plugged will be raised in this MAPI code - and a similar
fix will also work.

Mark.


From tim.one at comcast.net  Tue Jan 14 23:06:30 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Jan 14 23:07:08 2003
Subject: [Spambayes] Updating outlook to the new directory structure
In-Reply-To: <078d01c2bc46$420ff5b0$530f8490@eden>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEKBDIAB.tim.one@comcast.net>

[Mark Hammond]
> FYI, after the recent source reorg, the Outlook addin seems to work fine,
> except for 2 things:
>
> * You must remember to blow away your .pyc files, else things may go
> screwy, and you won't notice the next point until later.
>
> * You need to do a full retrain of the database (as the module name
> stored in the pickle has changed)

Thanks for the advice!  It's good advice, and it worked for me.

> Apart from that, it all looks good.  If we can just get rid of more .py
> files from the root, life will be good <wink>

This will be easy to achieve after everyone upgrades to Outlook2000 <wink>.

From T.A.Meyer at massey.ac.nz  Wed Jan 15 17:06:33 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Jan 14 23:07:44 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE4@its-xchg2.massey.ac.nz>

[Mark]
> If you look further in the source file for where 
> BuildFolderTreeMAPI() is
> commented out, and you can manage to test it, I would be 
> interested to know
> your experiences with the code - except that you may find the 
> exact same
> exception we just plugged will be raised in this MAPI code - 
> and a similar
> fix will also work.

When I was first trying to find the cause of the empty folder list, I looked at this, but had trouble (probably mostly because I was still trying to figure out Python).

Is the switch as simple as changing "tree = BuildFolderTreeOutlook(self.mapi)" to "tree = BuildFolderTreeMAPI(self.mapi)"?

I'll play around with this.  Is there a thread or anything that lists the problems that you were experiencing?

=Tony Meyer

From T.A.Meyer at massey.ac.nz  Wed Jan 15 17:48:50 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Jan 14 23:49:31 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE6@its-xchg2.massey.ac.nz>

> If you look further in the source file for where 
> BuildFolderTreeMAPI() is
> commented out, and you can manage to test it, I would be 
> interested to know
> your experiences with the code - except that you may find the 
> exact same
> exception we just plugged will be raised in this MAPI code - 
> and a similar
> fix will also work.

This code worked perfectly (once I plugged in the same fix) for me, and took 32729ms instead of 88548.6ms.
(Without the public folders it's 1136.05ms for MAPI and 2744.1ms for Outlook).
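
For scale, the ratios implied by those four timings can be worked out with
a few lines of present-day Python. The numbers are the ones quoted above;
the dictionary layout is just for the example.

```python
timings_ms = {
    "all folders": {"outlook": 88548.6, "mapi": 32729.0},
    "without public folders": {"outlook": 2744.1, "mapi": 1136.05},
}

# Speedup = time taken by the Outlook builder / time taken by the MAPI one.
speedups = {label: t["outlook"] / t["mapi"]
            for label, t in timings_ms.items()}

for label, ratio in speedups.items():
    print("%s: MAPI is %.1fx faster" % (label, ratio))
```

So on this one machine the MAPI builder is roughly two and a half times
faster in both cases.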

What wasn't working with Exchange?

=Tony Meyer

From piersh at friskit.com  Tue Jan 14 21:17:41 2003
From: piersh at friskit.com (Piers Haken)
Date: Wed Jan 15 00:02:08 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <9891913C5BFE87429D71E37F08210CB929753A@zeus.sfhq.friskit.com>

The outlook version was added because the IDs that MAPI returns aren't
compatible with the outlook IDs and you can't open a message on an
exchange server with a MAPI ID.

The MAPI tree-building case works fine on exchange, it's the message
filtering code that breaks.

BTW: Mark, you still didn't commit the CompareIDs fix I sent you a while
back. The current version 'works' but '==' is not the recommended way to
do the comparison...

Piers.

> -----Original Message-----
> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] 
> Sent: Tuesday, January 14, 2003 8:49 PM
> To: 'Mark Hammond'; spambayes@python.org
> Subject: RE: [Spambayes] Outlook plugin & bad folders
> 
> 
> > If you look further in the source file for where
> > BuildFolderTreeMAPI() is
> > commented out, and you can manage to test it, I would be 
> > interested to know
> > your experiences with the code - except that you may find the 
> > exact same
> > exception we just plugged will be raised in this MAPI code - 
> > and a similar
> > fix will also work.
> 
> This code worked perfectly (once I plugged in the same fix) 
> for me, and took 32729ms instead of 88548.6ms. (Without the 
> public folders it's 1136.05ms for MAPI and 2744.1ms for Outlook).
> 
> What wasn't working with Exchange?
> 
> =Tony Meyer
> 
From T.A.Meyer at massey.ac.nz  Wed Jan 15 20:23:52 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Jan 15 02:24:31 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAE8@its-xchg2.massey.ac.nz>

> The outlook version was added because the IDs that
> MAPI returns aren't compatible with the outlook IDs
> and you can't open a message on an exchange server
> with a MAPI ID.
Is this a definitive "can't", or a 'no-one has figured out how to yet' "can't"?

> The MAPI tree-building case works fine on exchange,
> it's the message filtering code that breaks. 
Ah yes, this doesn't work :)

Couldn't the FolderSelection dialog only load in the folders it needs to display?  i.e. at first it loads in the root folders, and then whenever OnTreeItemExpanding is called it adds in the necessary children?  If this is practical/possible then if no-one else wants to do it, I could give it a go.  (Although I have all of two days of Python knowledge).

=Tony Meyer

From piersh at friskit.com  Wed Jan 15 02:18:36 2003
From: piersh at friskit.com (Piers Haken)
Date: Wed Jan 15 05:02:45 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <9891913C5BFE87429D71E37F08210CB929753B@zeus.sfhq.friskit.com>

> -----Original Message-----
> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] 
> Sent: Tuesday, January 14, 2003 11:24 PM
> To: spambayes@python.org
> Subject: RE: [Spambayes] Outlook plugin & bad folders
> 
> 
> > The outlook version was added because the IDs that
> > MAPI returns aren't compatible with the outlook IDs
> > and you can't open a message on an exchange server
> > with a MAPI ID.
> Is this a definitive "can't", or a 'no-one has figured out 
> how to yet' "can't"?

I think it's more like a "you should be able to, and the docs say so,
but it just doesn't work. Ugh..."

> > The MAPI tree-building case works fine on exchange,
> > it's the message filtering code that breaks.
> Ah yes, this doesn't work :)
> 
> Couldn't the FolderSelection dialog only load in the folders 
> it needs to display?  i.e. at first it loads in the root 
> folders, and then whenever OnTreeItemExpanding is called it 
> adds in the necessary children?  If this is 
> practical/possible then if no-one else wants to do it, I 
> could give it a go.  (Although I have all of two days of 
> Python knowledge).

Yes, this would definitely be a much better way of doing it, especially
for people who have very large folder structures (eg, corporate public
folders). You might want to keep the behavior where it expands enough to
show the 'currently selected' folders.

Piers.
From mhammond at skippinet.com.au  Wed Jan 15 22:52:09 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Jan 15 06:52:59 2003
Subject: [Spambayes] Outlook plugin & bad folders
In-Reply-To: <9891913C5BFE87429D71E37F08210CB929753A@zeus.sfhq.friskit.com>
Message-ID: <085201c2bc8c$88e7a140$530f8490@eden>

[Piers Haken]

> The outlook version was added because the IDs that MAPI 
> returns aren't compatible with the outlook IDs and you 
> can't open a message on an exchange server with a MAPI ID.
> The MAPI tree-building case works fine on exchange, it's the 
> message filtering code that breaks. 

Can you remember the exact problem?  Can you re-enable that code and see
what it was?  If necessary, we can add some extra diagnostic code to see
where the EntryIDs differ, and look if there is any way we can normalize it.
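
A minimal sketch of the kind of diagnostic this could be, assuming the two
EntryIDs are available as raw bytes.  The function names are hypothetical
and nothing here calls MAPI or Outlook; it only reports where two IDs first
diverge.

```python
def first_difference(id_a, id_b):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (a, b) in enumerate(zip(id_a, id_b)):
        if a != b:
            return offset
    if len(id_a) != len(id_b):
        # One ID is a strict prefix of the other.
        return min(len(id_a), len(id_b))
    return None


def describe_difference(id_a, id_b):
    """Build a one-line report suitable for a debug log."""
    offset = first_difference(id_a, id_b)
    if offset is None:
        return "identical"
    return "lengths %d/%d, first difference at byte %d: %s vs %s" % (
        len(id_a), len(id_b), offset,
        id_a[offset:offset + 4].hex(), id_b[offset:offset + 4].hex())


# Made-up IDs for illustration only.
mapi_id = bytes.fromhex("00000000aabb")
outlook_id = bytes.fromhex("00000000aacc")
print(describe_difference(mapi_id, outlook_id))
```

Logging this for a handful of messages might show whether the IDs differ
only in a fixed prefix or suffix, which is what normalizing would need.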

> BTW: Mark, you still didn't commit the CompareIDs fix I
> sent you a while back. The current version 'works' but
> '==' is not the recommended way to do the comparison...

Yes, I haven't committed it because, as you said, it currently works <wink>.
Fortunately and thankfully, you added a bug about it (even attaching a
patch) so there is no way I can forget.  As I am sure you can see from
Tony's stats though, getting the MAPI version working is a much better
option!  I'm sure we can make the MAPI version work - the MAPI extensions
were developed against an exchange server.

Back-when-Outlook-was-but-a-sparkle-in-Bill's-eye ly,

Mark.
From mhammond at skippinet.com.au  Wed Jan 15 23:08:50 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Jan 15 07:09:41 2003
Subject: [Spambayes] Outlook plugin & bad folders
In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAE8@its-xchg2.massey.ac.nz>
Message-ID: <086401c2bc8e$de1b3c60$530f8490@eden>

> Couldn't the FolderSelection dialog only load in the folders
> it needs to display?  i.e. at first it loads in the root
> folders, and then whenever OnTreeItemExpanding is called it
> adds in the necessary children?  If this is
> practical/possible then if no-one else wants to do it, I
> could give it a go.  (Although I have all of two days of
> Python knowledge).

:) go for it!  pywin\tools\hierlist.py has an example of
OnTreeItemExpanding.  Only complication will be that the current code
expands the tree to show the currently selected folders - this should be
doable though.
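
A rough sketch of the load-on-demand idea, independent of any GUI toolkit.
The FolderNode class, the fetch callable and the folder names are all
hypothetical; expand() stands in for the work an OnTreeItemExpanding
handler would do, and expand_to() covers the "show the currently selected
folders" complication by expanding only the ancestors of a selection.

```python
class FolderNode:
    """One row in the tree; children are fetched only on first expansion."""

    def __init__(self, name, fetch_children):
        self.name = name
        self._fetch_children = fetch_children  # callable: name -> child names
        self.children = None  # None means "not loaded yet"

    def expand(self):
        # Load the children once, the first time the node is expanded.
        if self.children is None:
            self.children = [FolderNode(n, self._fetch_children)
                             for n in self._fetch_children(self.name)]
        return self.children


def expand_to(root, path):
    """Expand only the ancestors needed to reveal the folder at `path`."""
    node = root
    for name in path:
        matches = [c for c in node.expand() if c.name == name]
        if not matches:
            return None
        node = matches[0]
    return node


# Hypothetical folder store; a real one would ask Outlook or MAPI.
FOLDERS = {
    "Root": ["Inbox", "Public Folders"],
    "Inbox": ["Spam", "Lists"],
    "Public Folders": ["Dept A", "Dept B"],
}
calls = []


def fetch(name):
    calls.append(name)  # record which folders were actually enumerated
    return FOLDERS.get(name, [])


root = FolderNode("Root", fetch)
selected = expand_to(root, ["Inbox", "Spam"])
print(selected.name, calls)
```

In this sketch the large "Public Folders" branch is never enumerated unless
the user opens it, which is where the time saving would come from.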

Still-wouldn't-mind-getting-that-MAPI-version-going ly,

Mark.


From skip at pobox.com  Wed Jan 15 08:59:20 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 15 09:59:29 2003
Subject: [Spambayes] 
 new attempt at non-technical explanation on index.html of website
In-Reply-To: <200301150344.h0F3ijF01416@localhost.localdomain>
References: <200301150344.h0F3ijF01416@localhost.localdomain>
Message-ID: <15909.30536.143312.323641@montanaro.dyndns.org>


    Anthony> I just checked in some text that attempts to explain, in a
    Anthony> mostly non-technical way, how spambayes works. It's the
    Anthony> "handwaving" bit on the index.html document on the website.

Looks good.

    Anthony> suggestions for improvement accepted.

I just checked out the website module and noticed that whatever editor you
use doesn't wrap lines (most <p>...</p> chunks are one big honkin' line).
That makes it a bit problematic to edit text in Emacs.  If I wrap
the lines will it hose your editing?

Skip

From skip at pobox.com  Wed Jan 15 11:28:48 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 15 12:28:57 2003
Subject: [Spambayes] spambayes fronting a mailing list?
Message-ID: <15909.39504.598866.52741@montanaro.dyndns.org>


I know Barry's working on spambayes integration with Mailman.  Pretend I
can't wait that long. ;-) Ignoring training issues (I can solve them without
much problem), should I be able to just stick "hammie.py -f ..." in front of
mailman in my aliases file and then just edit my "hold postings" regular
expression?  Am I missing something obvious?

Skip

From skip at pobox.com  Wed Jan 15 14:43:17 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 15 15:43:27 2003
Subject: [Spambayes] separating training stuff from pop3proxy - how hard?
Message-ID: <15909.51173.814202.900365@montanaro.dyndns.org>


I'm sure others have considered this already, but I began wondering today
how hard it would be to separate pop3proxy into two pieces, the proxy stuff
and the training/web stuff.  I think having a separate training interface
would be good because it could then be used by other spambayes tools.

For example, just today I modified some Mailman-managed mailing lists to
pump incoming messages through "hammie.py -f" before passing along to
Mailman:

    #!/bin/bash
    BAYESHOME=/home/skip
    export BAYESCUSTOMIZE=$BAYESHOME/hammie.opt

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
    | /usr/local/bin/stripmime.pl \
    | /home/mailman/mail/wrapper "$@"

(Please don't flog me for using stripmime.pl.  I'm sure there are better
MIME strippers out there, but it works fine for my needs. ;-)

For the time being I'm just using my own training database which is a
superset of what goes to that particular mailing list.

The "bright idea" I had today was that it would be great to simply modify
the above pipeline to

    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
    | tee /tmp/cedu-list-trainer \
    | /usr/local/bin/stripmime.pl \
    | /home/mailman/mail/wrapper "$@"

and have the training stuff from pop3proxy waiting on a Unix named pipe
named /tmp/cedu-list-trainer.  At my leisure I could then visit the web
interface and train any collected messages.

The "tee" command could be replaced by a simple little tee-like program
which disposed of the file in some other fashion, perhaps by using HTTP PUT
to toss it at the training server.
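
A minimal sketch of such a tee-like helper in Python, assuming the message
arrives on one binary stream and must be passed along unchanged on another.
The function name and the collector object are hypothetical; the collector
is anything with a write() method, such as a file, a named pipe opened for
writing, or a buffer that is later sent to a training server.

```python
import io


def tee_stream(source, passthrough, collector, chunk_size=8192):
    """Copy source to passthrough unchanged, giving each chunk to collector.

    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        passthrough.write(chunk)
        collector.write(chunk)
        total += len(chunk)
    return total


# Demonstration with in-memory streams instead of stdin and stdout.
message = b"From: someone@example.com\nSubject: test\n\nbody\n"
downstream = io.BytesIO()
for_training = io.BytesIO()
copied = tee_stream(io.BytesIO(message), downstream, for_training)
print(copied)
```

In a real pipeline the first two arguments would presumably be
sys.stdin.buffer and sys.stdout.buffer.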

Any thoughts on this?  Richie?

Thx,

Skip


From tim at fourstonesExpressions.com  Wed Jan 15 14:52:37 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Jan 15 15:53:18 2003
Subject: [Spambayes] separating training stuff from pop3proxy - how hard?
In-Reply-To: <15909.51173.814202.900365@montanaro.dyndns.org>
Message-ID: <OLLGEAF2VVPXSNMCA96HFWQQLCAGDN.3e25ca15@myst>

1/15/2003 2:43:17 PM, Skip Montanaro <skip@pobox.com> wrote:

>
>I'm sure others have considered this already, but I began wondering today
>how hard it would be to separate pop3proxy into two pieces, the proxy stuff
>and the training/web stuff.  I think having a separate training interface
>would be good because it could then be used by other spambayes tools.
>
>For example, just today I modified some Mailman-managed mailing lists to
>pump incoming messages through "hammie.py -f" before passing along to
>Mailman:
>
>    #!/bin/bash
>    BAYESHOME=/home/skip
>    export BAYESCUSTOMIZE=$BAYESHOME/hammie.opt
>
>    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
>    | /usr/local/bin/stripmime.pl \
>    | /home/mailman/mail/wrapper "$@"
>
>(Please don't flog me for using stripmime.pl.  I'm sure there are better
>MIME strippers out there, but it works fine for my needs. ;-)
>
>For the time being I'm just using my own training database which is a
>superset of what goes to that particular mailing list.
>
>The "bright idea" I had today was that it would be great to simply modify
>the above pipeline to
>
>    /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
>    | tee /tmp/cedu-list-trainer \
>    | /usr/local/bin/stripmime.pl \
>    | /home/mailman/mail/wrapper "$@"
>
>and have the training stuff from pop3proxy waiting on a Unix named pipe
>named /tmp/cedu-list-trainer.  At my leisure I could then visit the web
>interface and train any collected messages.
>
>The "tee" command could be replaced by a simple little tee-like program
>which disposed of the file in some other fashion, perhaps by using HTTP PUT
>to toss it at the training server.
>
>Any thoughts on this?  Richie?

The training stuff used by the pop3proxy is already 'stripped out' into 
Corpus.py and FileCorpus.py.  These modules probably don't do exactly what you 
need right now, but we've been considering rewriting them anyway, to handle 
more than just file system artifacts for messages.  You might take a look at 
those modules.  I have some ideas about rewriting them, Mark Hammond has 
levied some requirements as well...

>
>Thx,
>
>Skip
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From neale at woozle.org  Wed Jan 15 13:08:39 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 15 16:08:48 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <15909.39504.598866.52741@montanaro.dyndns.org> (Skip
 Montanaro's message of "Wed, 15 Jan 2003 11:28:48 -0600")
References: <15909.39504.598866.52741@montanaro.dyndns.org>
Message-ID: <w53adi25dew.fsf@woozle.org>

Skip Montanaro <skip@pobox.com> writes:

> I know Barry's working on spambayes integration with Mailman.  Pretend I
> can't wait that long. ;-) Ignoring training issues (I can solve them without
> much problem), should I be able to just stick "hammie.py -f ..." in front of
> mailman in my aliases file and then just edit my "hold postings" regular
> expression?  Am I missing something obvious?

That seems like it'd work, but please use hammiefilter.  Running hammie
-f is deprecated (meaning that as soon as I get a round tuit, hammie
will no longer be executable).

Neale

From neale at woozle.org  Wed Jan 15 13:12:25 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 15 16:12:29 2003
Subject: [Spambayes] separating training stuff from pop3proxy - how
 hard?
In-Reply-To: <15909.51173.814202.900365@montanaro.dyndns.org> (Skip
 Montanaro's message of "Wed, 15 Jan 2003 14:43:17 -0600")
References: <15909.51173.814202.900365@montanaro.dyndns.org>
Message-ID: <w537kd65d8m.fsf@woozle.org>

Skip Montanaro <skip@pobox.com> writes:

>     /usr/local/bin/hammie.py -f -d -p $BAYESHOME/hammie.db \
>     | /usr/local/bin/stripmime.pl \
>     | /home/mailman/mail/wrapper "$@"
>
> (Please don't flog me for using stripmime.pl.  I'm sure there are better
> MIME strippers out there, but it works fine for my needs. ;-)

If you ever need an optimization, it occurs to me that hammie.py will
have already pulled the message apart into MIME parts, so you should be
able to start with hammiefilter.py and write a dual spamcheck/MIME-strip
program.
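
As a sketch of the MIME-stripping half only, here is one way to reduce a
message to its first text/plain part with the standard email package.  The
function name is hypothetical, this is not code from hammie.py or
hammiefilter.py, and it uses the current Python 3 email API rather than
the one available in 2003.

```python
import email
from email.message import Message

MIME_HEADERS = {"content-type", "content-transfer-encoding", "mime-version"}


def strip_to_plain_text(raw):
    """Return the message with only its first text/plain part as the body."""
    msg = email.message_from_string(raw)
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            charset = part.get_content_charset() or "ascii"
            body = part.get_payload(decode=True).decode(charset, "replace")
            break
    out = Message()
    # Keep every header except the ones describing the old MIME structure.
    for name, value in msg.items():
        if name.lower() not in MIME_HEADERS:
            out[name] = value
    out.set_payload(body, "utf-8")  # adds fresh MIME headers for the body
    return out.as_string()


sample = (
    "From: a@example.com\n"
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--XX\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAAA\n"
    "--XX--\n"
)
stripped = email.message_from_string(strip_to_plain_text(sample))
print(stripped.get_content_type(), stripped.is_multipart())
```

A combined tool would run the spam check on the parsed message before
discarding the other parts, so the message is only parsed once.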

Someone else might want this too--for example, SpamAssassin munges MIME
for tagged spam, presumably to protect the "click first, ask questions
later" crowd :)

Neale

From skip at pobox.com  Wed Jan 15 15:23:53 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 15 16:24:08 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <w53adi25dew.fsf@woozle.org>
References: <15909.39504.598866.52741@montanaro.dyndns.org>
        <w53adi25dew.fsf@woozle.org>
Message-ID: <15909.53609.187510.588099@montanaro.dyndns.org>


    Neale> That seems like it'd work, but please use hammiefilter.  Running
    Neale> hammie -f is deprecated (meaning that as soon as I get a round
    Neale> tuit, hammie will no longer be executable).

Hmmm...:

    % type hammiefilter.py
    hammiefilter.py is /Users/skip/local/bin/hammiefilter.py
    % hammiefilter.py --help
    Traceback (most recent call last):
      File "/Users/skip/local/bin/hammiefilter.py", line 43, in ?
        from spambayes import hammie, Options, StringIO
    ImportError: cannot import name StringIO

Looks like a transcription error in the grand directory shuffling.  I just
checked in a fix.  I suspect nobody who uses hammiefilter.py has cvs up'd
recently.

Skip


From neale at woozle.org  Wed Jan 15 13:28:12 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 15 16:28:16 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <15909.53609.187510.588099@montanaro.dyndns.org> (Skip
 Montanaro's message of "Wed, 15 Jan 2003 15:23:53 -0600")
References: <15909.39504.598866.52741@montanaro.dyndns.org>
	<w53adi25dew.fsf@woozle.org>
	<15909.53609.187510.588099@montanaro.dyndns.org>
Message-ID: <w534r8a5cib.fsf@woozle.org>

Skip Montanaro <skip@pobox.com> writes:

> Looks like a transcription error in the grand directory shuffling.  I just
> checked in a fix.  I suspect nobody who uses hammiefilter.py has cvs up'd
> recently.

Yup.  But it turns out we don't even need to import StringIO, so I just
checked in its removal :)

Thanks!

Neale


From skip at pobox.com  Wed Jan 15 16:14:25 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 15 17:14:35 2003
Subject: [Spambayes] Something's still missing from hammiefilter
Message-ID: <15909.56641.568386.266344@montanaro.dyndns.org>


Neale encouraged me to use "hammiefilter.py" instead of "hammie.py -f", but
it doesn't support enough command line args.  I currently call hammie.py
from procmail like so:

    HAMMIE=$HOME/local/bin/hammie.py
    ...
    :0 fw:hamlock
    | $HAMMIE -f -d -p $HOME/hammie.db

The -d (use dbm) and -p (specify pickle or database file) flags are missing.
I'd really prefer these be available on the command line as well as via the
options file.  Is there a reason not to expose them on the command line?
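
To make the request concrete, here is a sketch of command-line flags that
override values taken from an options file.  It is not hammiefilter's
actual code: the function name and the defaults dictionary are
hypothetical, and it uses the modern argparse module purely for
illustration.

```python
import argparse


def parse_filter_args(argv, option_defaults):
    """Parse -d and -p, falling back to values from the options file."""
    parser = argparse.ArgumentParser(prog="filter-sketch")
    parser.add_argument("-d", dest="use_dbm", action="store_true",
                        default=option_defaults["use_dbm"],
                        help="use a dbm database instead of a pickle")
    parser.add_argument("-p", dest="db_path", metavar="FILE",
                        default=option_defaults["db_path"],
                        help="pickle or database file to use")
    return parser.parse_args(argv)


# Stand-in for whatever the options file would have supplied.
defaults = {"use_dbm": False, "db_path": "hammie.db"}
args = parse_filter_args(["-d", "-p", "/home/skip/hammie.db"], defaults)
print(args.use_dbm, args.db_path)
```

With no flags given, the values from the options file are used unchanged,
so existing setups would keep working.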

Skip

From skip at pobox.com  Wed Jan 15 19:51:04 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 15 20:51:10 2003
Subject: [Spambayes] pop3proxy.UserInterface.onSave - self.shutdown?
Message-ID: <15910.4104.787891.400893@montanaro.dyndns.org>


Pychecker complains about the call to self.shutdown(2) on line 1441 of
pop3proxy.py.  It should probably be self.socket.shutdown(2), but I'll let
someone else who knows the code better verify that.

Skip

From anthony at interlink.com.au  Thu Jan 16 12:55:17 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Jan 15 20:56:48 2003
Subject: [Spambayes] new attempt at non-technical explanation on
	index.html of website 
In-Reply-To: <15909.30536.143312.323641@montanaro.dyndns.org> 
Message-ID: <200301160155.h0G1tIb02948@localhost.localdomain>


>>> Skip Montanaro wrote
> I just checked out the website module and noticed that whatever editor you
> use doesn't wrap lines (most <p>...</p> chunks are one big honkin' line).
> That makes it a bit problematic to edit text in Emacs.  If I wrap
> the lines will it hose your editing?

Nope. I'm just lazy with vi - much of the verbiage is done in large slabs
of typing, and I hate autowrapping :)

Feel free to foldspindlemutilate.


From skip at pobox.com  Wed Jan 15 19:10:47 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 15 21:27:50 2003
Subject: [Spambayes] separating training stuff from pop3proxy - how hard?
In-Reply-To: <OLLGEAF2VVPXSNMCA96HFWQQLCAGDN.3e25ca15@myst>
References: <15909.51173.814202.900365@montanaro.dyndns.org>
        <OLLGEAF2VVPXSNMCA96HFWQQLCAGDN.3e25ca15@myst>
Message-ID: <15910.1687.290158.515305@montanaro.dyndns.org>


    >> I'm sure others have considered this already, but I began wondering
    >> today how hard it would be to separate pop3proxy into two pieces, the
    >> proxy stuff and the training/web stuff.  I think having a separate
    >> training interface would be good because it could then be used by
    >> other spambayes tools.

    Tim> The training stuff used by the pop3proxy is already 'stripped out'
    Tim> into Corpus.py and FileCorpus.py.  These modules probably don't do
    Tim> exactly what you need right now, but we've been considering
    Tim> rewriting them anyway, to handle more than just file system
    Tim> artifacts for messages.  

Thanks, I'll take a look.  I'm interested in separating the POP stuff from
the training/web stuff.  Maybe I could simply delete the POP stuff and see
what's left. ;-)

Skip


From T.A.Meyer at massey.ac.nz  Thu Jan 16 18:37:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Jan 16 00:38:31 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAEC@its-xchg2.massey.ac.nz>

[Tony]
> > Couldn't the FolderSelection dialog only load in the folders
> > it needs to display?  i.e. at first it loads in the root
> > folders, and then whenever OnTreeItemExpanding is called it
> > adds in the necessary children?  If this is
> > practical/possible then if no-one else wants to do it, I
> > could give it a go.  (Although I have all of two days of
> > Python knowledge).

[Mark]
> :) go for it!  pywin\tools\hierlist.py has an example of
> OnTreeItemExpanding.  Only complication will be that the current code
> expands the tree to show the currently selected folders - 
> this should be doable though.

It's done.  Well, it works on my system, anyway :)  That includes expanding the tree to show the selected items.  So what do I do with my code now?

> Still-wouldn't-mind-getting-that-MAPI-version-going ly,

I *think* that this should all still work (with a bit of tweaking) with the MAPI version, which would make things even faster :)  Still, this is fast enough for me - the dialog takes about 0.5->1s to appear, rather than the 30s (MAPI) or 60s (Outlook) that it did before.  Of course, anyone with a really large, really flat folder structure will still have to wait, but they should just be more organised :)

=Tony Meyer

From anthony at interlink.com.au  Thu Jan 16 16:38:25 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Jan 16 00:40:08 2003
Subject: [Spambayes] credit/blame
Message-ID: <200301160538.h0G5cQS14083@localhost.localdomain>


Here's what's currently on the index.html page for the 'credits/blame'
section. Have I missed anyone? It's possible, as I'm feeling amazingly
fuzzy-brained today. It's a chunk I wrote some time ago, so it's 
probably missing people...

Most of the heavy lifting on this project was done by Tim Peters, with
the cast of spambayes obsessive-compulsives providing ideas, heckling, and
testing. Gary Robinson and Rob Hooft contributed valuable help on the maths
behind it all. Mark Hammond amazed the world with the Outlook2000 plugin,
and Rich Hindle, Neale Pickett, and Tim Stone worked on the end-user applications.

If I have missed someone, or misrepresented their work, my apologies -
please drop me an email... or should we simply have an 'Acknowledgments'
file in the distribution?


From T.A.Meyer at massey.ac.nz  Thu Jan 16 18:43:21 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Jan 16 00:44:10 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAED@its-xchg2.massey.ac.nz>

> > > The outlook version was added because the IDs that 
> > > MAPI returns aren't compatible with the outlook IDs 
> > > and you can't open a message on an exchange server 
> > > with a MAPI ID. 
> > Is this a definitive "can't", or a 'no-one has figured out 
> > how to yet' "can't"? 
> I think it's more like a "you should be able to, and the docs say so, but it just doesn't work. Ugh..." 

:) Does this mean that it's not worth bothering to try and fix it?

In any case, I did the mod that changes the list to only build on demand.  This is faster, although not what I would call fast.  (But then, my system isn't that fast, and I'm shackled up to a large public folder through work).

Would implementing the _BuildFolderTreeOutlook() function in C/C++ make a significant difference?  (I guess what I'm asking is whether it's just Outlook itself that is causing the delay).

=Tony Meyer

From barry at python.org  Thu Jan 16 00:51:57 2003
From: barry at python.org (Barry A. Warsaw)
Date: Thu Jan 16 00:52:25 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <15909.39504.598866.52741@montanaro.dyndns.org>
Message-ID: <15910.18557.535408.669103@gargle.gargle.HOWL>


>>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:

    SM> I know Barry's working on spambayes integration with Mailman.
    SM> Pretend I can't wait that long. ;-) Ignoring training issues
    SM> (I can solve them without much problem), should I be able to
    SM> just stick "hammie.py -f ..." in front of mailman in my
    SM> aliases file and then just edit my "hold postings" regular
    SM> expression?  Am I missing something obvious?

This ought to work fairly well, I think, modulo the training issue.
My idea was to not train the list at all, before turning on
spambayes.  So the first batch of messages will all get held as
unsure, and you'd use the admindb page to accept and reject messages.
Accepted messages would train as ham and rejected messages would get
trained as spam.
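
The flow described above could be sketched roughly as follows.  Everything
here is hypothetical: ModerationQueue is not Mailman's admindb code, and
CountingTrainer is a stand-in for a real classifier that exposes some
train(message, is_spam) method.

```python
class CountingTrainer:
    """Stand-in classifier that only counts what it has been shown."""

    def __init__(self):
        self.nham = 0
        self.nspam = 0

    def train(self, message, is_spam):
        if is_spam:
            self.nspam += 1
        else:
            self.nham += 1


class ModerationQueue:
    """Held messages train the classifier as the moderator disposes of them."""

    def __init__(self, trainer):
        self.trainer = trainer
        self.held = []

    def hold(self, message):
        self.held.append(message)

    def accept(self, message):
        # An accepted posting is on-topic for the list, so train it as ham.
        self.held.remove(message)
        self.trainer.train(message, is_spam=False)

    def reject(self, message):
        # A rejected posting is trained as spam.
        self.held.remove(message)
        self.trainer.train(message, is_spam=True)


trainer = CountingTrainer()
queue = ModerationQueue(trainer)
for text in ["meeting notes", "buy pills now", "patch attached"]:
    queue.hold(text)
queue.accept("meeting notes")
queue.reject("buy pills now")
print(trainer.nham, trainer.nspam, queue.held)
```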

The u/i for these options is undecided -- maybe you have an additional
"train as..." radio button.  I don't think this matters much right
now.

So as your list warms up, you'll be training the system.  I wonder how
long it'll take before spambayes gets pretty good at detecting what's
appropriate and what's not for your list?

-Barry

From barry at python.org  Thu Jan 16 00:52:52 2003
From: barry at python.org (Barry A. Warsaw)
Date: Thu Jan 16 00:53:20 2003
Subject: [Spambayes] separating training stuff from pop3proxy - how hard?
References: <15909.51173.814202.900365@montanaro.dyndns.org>
Message-ID: <15910.18612.748855.750857@gargle.gargle.HOWL>


>>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:

    SM> (Please don't flog me for using stripmime.pl.  I'm sure there
    SM> are better MIME strippers out there, but it works fine for my
    SM> needs. ;-)

Of course, you know that Mailman 2.1 has this built in, right?

-Barry

From anthony at interlink.com.au  Thu Jan 16 17:11:59 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Jan 16 01:13:43 2003
Subject: [Spambayes] spambayes fronting a mailing list? 
In-Reply-To: <15910.18557.535408.669103@gargle.gargle.HOWL> 
Message-ID: <200301160612.h0G6C0x14523@localhost.localdomain>


>>> Barry A. Warsaw wrote
> So as your list warms up, you'll be training the system.  I wonder how
> long it'll take before spambayes gets pretty good at detecting what's
> appropriate and what's not for your list?

This seems like a plan - so long as the UI doesn't suck too hard :)

Previous experiments have shown that it learns _really_ quickly, 
if the subject matter's really focussed... something like 20 messages
gave a remarkably good result, from memory.

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From ducky at webfoot.com  Wed Jan 15 22:31:29 2003
From: ducky at webfoot.com (Kaitlin Duck Sherwood)
Date: Thu Jan 16 01:28:51 2003
Subject: [Spambayes] Two Stage Plan
In-Reply-To: <BA24FFE4.1ADB9%grobinson@transpose.com>
References: <BA24FFE4.1ADB9%grobinson@transpose.com>
Message-ID: <p05100304ba4bf466ee7e@[10.0.0.2]>

(Sorry I'm late to this particular discussion on using postage...)

I'd like to suggest
+ making the postage stamp computationally VERY expensive for the client, and
+ assuming that users look at postage as only one factor in judging spaminess.

For example, hypothetically:
+ For me, anybody on my whitelist gets their messages through without postage.
+ For Frieda, any message without postage gets through if it's got a 
SpamAssassin score of less than 3.
+ For Paul, any message that his Bayesian algorithm rates as <20% 
likely to be spam gets through without postage.
+ For Chantelle, any message without postage gets a 
reverse-Turing-test challenge.

If postage is only one factor, then it can be useful before 
"everybody" adopts it.  If postage is only one factor, then listbots 
can insist on one postage unit for messages that the listbot 
receives, but the listbot can then send out messages (to the 
teeming hordes on the list) without postage.


I want postage to be computationally *very* expensive.  Like five or 
ten minutes on a (currently) high-end desktop.  I want strangers to 
have to spend some time -- not just money -- to be sure that I'll 
read their messages.  Shoot, I don't even care if there is no money 
involved at all, "only" time.

I also want the reverse algorithm -- where I check to see if their 
token is valid -- to be very fast.


So are there any one-way algorithms that would involve my email 
address and some other piece of changing data, like seconds since Jan 
1, 1970?  Or perhaps make and use a Web service that generates and 
posts random time-stamped numbers?  (A web service with random, 
time-stamped numbers could also provide for essentially constant 
difficulty as processors get higher-powered, e.g. the random numbers 
keep getting bigger.)
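
One family of schemes with this shape is hashcash-style proof of work: the
sender searches for a nonce whose hash has a given number of leading zero
bits, and the receiver checks it with a single hash.  The sketch below is
only an illustration under that assumption.  The stamp layout and function
names are made up, it assumes the address contains no colon, and it does
not check how old the timestamp is.

```python
import hashlib


def _has_leading_zero_bits(digest, bits):
    # True when the top `bits` bits of the digest are all zero.
    return int.from_bytes(digest, "big") >> (len(digest) * 8 - bits) == 0


def mint_stamp(recipient, timestamp, bits):
    """Slow side: try nonces until the hash has `bits` leading zero bits."""
    nonce = 0
    while True:
        stamp = "%d:%d:%s:%d" % (bits, timestamp, recipient, nonce)
        if _has_leading_zero_bits(hashlib.sha256(stamp.encode()).digest(),
                                  bits):
            return stamp
        nonce += 1


def verify_stamp(stamp, recipient, min_bits):
    """Fast side: one hash, plus checks on the recipient and difficulty."""
    try:
        bits_text, _timestamp, stamp_recipient, _nonce = stamp.split(":")
        bits = int(bits_text)
    except ValueError:
        return False
    if stamp_recipient != recipient or not (min_bits <= bits <= 256):
        return False
    return _has_leading_zero_bits(hashlib.sha256(stamp.encode()).digest(),
                                  bits)


# A small difficulty so the demonstration finishes quickly.
stamp = mint_stamp("me@example.com", 1042700000, 12)
print(stamp, verify_stamp(stamp, "me@example.com", 12))
```

Each extra bit doubles the expected minting work while verification stays
a single hash, so the difficulty could be raised as processors get faster.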


BTW, anyone who is going to the spam conference, look for me in the 
colored (probably purple) beret!  

From mhammond at skippinet.com.au  Thu Jan 16 17:30:25 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 16 01:30:45 2003
Subject: [Spambayes] Outlook plugin & bad folders
In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAEC@its-xchg2.massey.ac.nz>
Message-ID: <0b0001c2bd28$c31267a0$530f8490@eden>

> It's done.  Well, it works on my system, anyway :)  Including 
> the expanding the tree to show the selected items.  So what 
> do I do with my code now?

Mail it to me :)

Mark.


From mhammond at skippinet.com.au  Thu Jan 16 17:33:51 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 16 01:34:47 2003
Subject: [Spambayes] Outlook plugin & bad folders
In-Reply-To: <98B01D2717B9D411B38F0008C78409310EE3DAED@its-xchg2.massey.ac.nz>
Message-ID: <0b0101c2bd29$3c81e610$530f8490@eden>

> Would implementing the _BuildFolderTreeOutlook() function in
> C/C++ make a significant difference?  (I guess what I'm
> asking is whether it's just Outlook itself that is causing the delay).

Yes, Outlook itself is the problem.  MAPI is the high-performance API, and
as you can see, Python does indeed get high-performance using it.

I have yet to see any specific details on what problem we have when
using the MAPI version.

Mark.


From piersh at friskit.com  Wed Jan 15 23:34:54 2003
From: piersh at friskit.com (Piers Haken)
Date: Thu Jan 16 02:18:46 2003
Subject: [Spambayes] Outlook plugin & bad folders
Message-ID: <9891913C5BFE87429D71E37F08210CB929753E@zeus.sfhq.friskit.com>

The problem is that the GetFolderFromID call in outlook's object model
(called from MAPIMsgStoreFolder.GetOutlookItem) does not accept MAPI
folder IDs when those folders are on an exchange server. It probably has
something to do with the fact that when you're using PSTs the object
model and the underlying MAPI store are the same thing, but when you're
using exchange the store is a separate component. In theory they should
be able to keep the Outlook IDs and the exchange MAPI IDs consistent, but
in practice...

Piers.

> -----Original Message-----
> From: Mark Hammond [mailto:mhammond@skippinet.com.au] 
> Sent: Wednesday, January 15, 2003 10:34 PM
> To: 'Meyer, Tony'; Piers Haken; spambayes@python.org
> Subject: RE: [Spambayes] Outlook plugin & bad folders
> 
> 
> > Would implementing the _BuildFolderTreeOutlook() function in C/C++ 
> > make a significant difference?  (I guess what I'm asking is whether 
> > it's just Outlook itself that is causing the delay).
> 
> Yes, Outlook itself is the problem.  MAPI is the 
> high-performance API, and as you can see, Python does indeed 
> get high-performance using it.
> 
> I'm still yet to know any specific details on what problem we 
> have when using the MAPI version.
> 
> Mark.
> 
> 
From vanhorn at whidbey.com  Thu Jan 16 00:28:24 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Thu Jan 16 03:28:26 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <15909.39504.598866.52741@montanaro.dyndns.org>
	<15910.18557.535408.669103@gargle.gargle.HOWL>
Message-ID: <3E266D28.8591603A@whidbey.com>

Barry,

If you're going into the administrative interface anyway ...
When I used to get a message from Mailman about a message being held for
my approval, the first page I hit told me why, whether it was too large,
non-member post, whatever. Now I have to jump to a second page to learn
that for each sender (and it's normally only one message per sender).
Since you are putting up the neat summary windows with all the options on
that first page, could we please have the reason for the hold in there?
Pretty please?

As to how long it would take to make a difference, based on what folks
have said here I suspect that any list with ten messages a day would be
over 99% accurate by the end of a week. Since half my Mailman moderation
is probably spam these days, I'm looking forward to it.

Van

"Barry A. Warsaw" wrote:

> >>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:
>
>     SM> I know Barry's working on spambayes integration with Mailman.
>     SM> Pretend I can't wait that long. ;-) Ignoring training issues
>     SM> (I can solve them without much problem), should I be able to
>     SM> just stick "hammie.py -f ..." in front of mailman in my
>     SM> aliases file and then just edit my "hold postings" regular
>     SM> expression?  Am I missing something obvious?
>
> This ought to work fairly well, I think, modulo the training issue.
> My idea was to not train the list at all, before turning on
> spambayes.  So the first batch of messages will all get held as
> unsure, and you'd use the admindb page to accept and reject messages.
> Accept messages would train as ham and rejected messages would get
> trained as spam.
>
> The u/i for these options is undecided -- maybe you have an additional
> "train as..." radio button.  I don't think this matters much right
> now.
>
> So as your list warms up, you'll be training the system.  I wonder how
> long it'll take before spambayes gets pretty good at detecting what's
> appropriate and what's not for your list?
>
> -Barry
>

--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD

For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------


From richie at entrian.com  Thu Jan 16 08:54:09 2003
From: richie at entrian.com (richie@entrian.com)
Date: Thu Jan 16 03:54:18 2003
Subject: [Spambayes] separating training stuff from pop3proxy - how hard?
In-Reply-To: <15910.1687.290158.515305@montanaro.dyndns.org>
Message-ID: <E18Z5mr-000ELz-0W@anchor-post-32.mail.demon.net>


[Skip]
> I'm interested in separating the POP stuff from
> the training/web stuff.  Maybe I could simply delete the POP stuff and see
> what's left. ;-)

I think you'd be surprised at how well that would work.  8-)

I've already done half of this job - I need to give it some more testing,
but it's pretty much there.  The POP3 proxy and the web UI are already
fairly independent - they don't communicate directly, but instead refer to
a common set of FileCorpuses.  The new version will enable me to pull them
apart completely, into three separate files - a core server component, the
POP proxy and the web interface.  (Though I'll commit under the current
all-in-pop3proxy.py arrangement first to make it easier to track changes
through CVS.)

Soon you'll be able to run a "Spambayes Server" that provides either the
POP3 proxy or the web interface or both, with no dependencies.  The work
I'll be committing this week is a step towards that.  You should be able
to add a listen-for-incoming-messages-by-HTTP-or-whatever component very
easily - it will plug into the core server and poke messages into the
FileCorpuses in the same way that the POP3 proxy does now.

-- 
Richie Hindle
richie@entrian.com


From rob at hooft.net  Thu Jan 16 13:15:52 2003
From: rob at hooft.net (Rob W. W. Hooft)
Date: Thu Jan 16 07:15:57 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain>
Message-ID: <3E26A278.3080302@hooft.net>

Anthony Baxter wrote:
>>>>Barry A. Warsaw wrote
>>>
>>So as your list warms up, you'll be training the system.  I wonder how
>>long it'll take before spambayes gets pretty good at detecting what's
>>appropriate and what's not for your list?
> 
> 
> This seems like a plan - so long as the UI doesn't suck too hard :)
> 
> Previous experiments have shown that it learns _really_ quickly, 
> if the subject matter's really focussed... something like 20 messages
> gave a remarkably good result, from memory.

Doesn't it take time before the first spam arrives on a brand new mailing
list? Spambayes' results are going to be real lousy if it is trained on 200
ham and 0 spam messages....

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/


From skip at pobox.com  Thu Jan 16 06:31:43 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 16 07:31:47 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <15910.18557.535408.669103@gargle.gargle.HOWL>
References: <15909.39504.598866.52741@montanaro.dyndns.org>
        <15910.18557.535408.669103@gargle.gargle.HOWL>
Message-ID: <15910.42543.629381.696105@montanaro.dyndns.org>


    BAW> This ought to work fairly well, I think, modulo the training issue.
    BAW> My idea was to not train the list at all, before turning on
    BAW> spambayes.  So the first batch of messages will all get held as
    BAW> unsure, and you'd use the admindb page to accept and reject
    BAW> messages.  Accept messages would train as ham and rejected messages
    BAW> would get trained as spam.

In my case I sidestepped training altogether because the list's content is a
subset of the stuff I'm interested in anyway.  Most of the "spam" messages
encountered by the list at this point are really of the virus/worm variety,
and since it's set up for members only posting, little, if any garbage
actually gets through to the list, even without using spambayes.

    BAW> The u/i for these options is undecided -- maybe you have an
    BAW> additional "train as..." radio button.  I don't think this matters
    BAW> much right now.

One reason I'm interested in separating pop3proxy into two functions (POP
retrieval/classifying and training/web UI) is that the training/web
component should be useful for other spambayes users.  Right now in my
current environment, training is clunky enough that I only train on unsures
and mistakes.  While that works okay because my starting corpus was so large
(around 20,000 messages), the indications from people who've experimented
with that sort of training are that the quality of classification does
degrade over time.

Last night I ripped out the POP stuff from pop3proxy, renamed the result
proxytrainer and added one extra method, onUpload.  Then I wrote a simple
proxytee.py script which passes stdin to stdout and uploads the message it
received to http://localhost:8880/upload as a file upload (in theory,
allowing upload of large mbox files).  The mbox upload doesn't seem to be
quite working yet and there's still that pesky infinite loop in onReview,
but I have hope it will eventually work pretty well.  At that point, anyone
should be able to use it as a training interface.  All they will need is a
tee-type hook they can insert into their mail transport somewhere.
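
A minimal sketch of the tee-and-upload idea described above, in modern
Python.  This is not the actual proxytee.py: the form field name "file", the
filename "message.txt" and the helper names are assumptions made for
illustration; only the upload URL comes from the message.

```python
import urllib.request
import uuid

UPLOAD_URL = "http://localhost:8880/upload"  # URL quoted in the message above


def encode_multipart(field, filename, data, boundary=None):
    """Wrap raw bytes as a one-file multipart/form-data request body."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", head + data + tail


def tee(src, dst):
    """Copy everything from src to dst unchanged and return the bytes."""
    data = src.read()
    dst.write(data)
    dst.flush()
    return data


def upload(data, url=UPLOAD_URL, field="file"):
    """POST the message as a file upload (field name is an assumption)."""
    ctype, body = encode_multipart(field, "message.txt", data)
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

In a mail pipeline one would call tee(sys.stdin.buffer, sys.stdout.buffer)
first and then upload() on the result, catching OSError around the upload so
that mail delivery never depends on the trainer being reachable.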

A bit further down the road, I will probably dump the asyncore stuff in
favor of something based on SimpleHTTPServer just to reduce the number of
lines of code.  Without the POP stuff going on there's no great need for the
channel multiplexing.  Even without threading, the amount of work the server
would have to do per click on the user interface is minimal.

    BAW> So as your list warms up, you'll be training the system.  I wonder
    BAW> how long it'll take before spambayes gets pretty good at detecting
    BAW> what's appropriate and what's not for your list?

Like I indicated, I gave it a head start. ;-)

Skip

From skip at pobox.com  Thu Jan 16 06:36:41 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 16 07:36:44 2003
Subject: [Spambayes] separating training stuff from pop3proxy - how hard?
In-Reply-To: <15910.18612.748855.750857@gargle.gargle.HOWL>
References: <15909.51173.814202.900365@montanaro.dyndns.org>
        <15910.18612.748855.750857@gargle.gargle.HOWL>
Message-ID: <15910.42841.803998.192192@montanaro.dyndns.org>

    SM> (Please don't flog me for using stripmime.pl.  I'm sure there are
    SM> better MIME strippers out there, but it works fine for my needs. ;-)

    BAW> Of course, you know that Mailman 2.1 has this built in, right?

No, actually, I didn't.  I haven't upgraded yet.

Thanks for such a gentle flog...

Skip

From skip at pobox.com  Thu Jan 16 06:51:47 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 16 07:51:51 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <3E26A278.3080302@hooft.net>
References: <200301160612.h0G6C0x14523@localhost.localdomain>
        <3E26A278.3080302@hooft.net>
Message-ID: <15910.43747.285523.378123@montanaro.dyndns.org>


    Rob> Doesn't it take time before the first spam arrives on a brand new
    Rob> mailinglist? Spambayes' results are going to be real lousy if it is
    Rob> trained on 200 ham and 0 spam messages....

A couple of things come to mind:

    1. Don't enable spambayes until you start having trouble

    2. With a proxytrainer/proxytee setup as I described in a previous
       message you can seed it with a handful of spam you have lying about.
       Just set your options to ignore stuff like sender and to while
       training on those messages.

    3. Send your mailing list address directly to the spammers.  They'll
       find it soon enough anyway. ;-)

It's-not-like-spam-is-hard-to-find-ly, y'rs,

Skip

From tim at fourstonesExpressions.com  Thu Jan 16 07:13:09 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Jan 16 08:13:55 2003
Subject: [Spambayes] spambayes fronting a mailing list? 
In-Reply-To: <200301160612.h0G6C0x14523@localhost.localdomain>
Message-ID: <XRLOIOMVA6GE1XRP3Y2VVRLKSQLY.3e26afe5@myst>

1/16/2003 12:11:59 AM, Anthony Baxter <anthony@interlink.com.au> wrote:

>
>>>> Barry A. Warsaw wrote
>> So as your list warms up, you'll be training the system.  I wonder how
>> long it'll take before spambayes gets pretty good at detecting what's
>> appropriate and what's not for your list?
>
>This seems like a plan - so long as the UI doesn't suck too hard :)
>
>Previous experiments have shown that it learns _really_ quickly, 
>if the subject matter's really focussed... something like 20 messages
>gave a remarkably good result, from memory.

This is exactly what I did, and it started producing results immediately.
After I had trained on only a few (maybe 5) spam, it began classifying
nearly all spam correctly.  It didn't classify ham correctly very often at
that point, but I was ok with that.  Unsures and ham are much the same to 
me...  - TimS

>
>-- 
>Anthony Baxter     <anthony@interlink.com.au>   
>It's never too late to have a happy childhood.
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From esj at harvee.billerica.ma.us  Thu Jan 16 08:02:59 2003
From: esj at harvee.billerica.ma.us (Eric S. Johansson)
Date: Thu Jan 16 08:14:01 2003
Subject: [Spambayes] Two Stage Plan
In-Reply-To: <p05100304ba4bf466ee7e@[10.0.0.2]>
References: <BA24FFE4.1ADB9%grobinson@transpose.com>
	<p05100304ba4bf466ee7e@[10.0.0.2]>
Message-ID: <3E26AD83.9070500@harvee.billerica.ma.us>

Kaitlin Duck Sherwood wrote:
> (Sorry I'm late to this particular discussion on using postage...)
> 
> I'd like to suggest
> + making the postage stamp computationally VERY expensive for the 
> client, and
> + assume that users look at postage as only one factor in judging 
> spaminess.

Actually, there is a project for this type of system called camram.  I
have a working proof of concept, including handling postage due notices
etc.  I'm expanding it to include a Bayesian-style filter as a
discriminator in case a message fails the stamp or white list tests.  If
you'd like a white paper on this, either wait a few days for it to be
published (theoretically) in the proceedings of the upcoming antispam
conference, or ask me nicely and I'll send you the document
(unfortunately in Microsoft Word format).

> If postage is only one factor, then it can be useful before "everybody" 
> adopts it.  If postage is only one factor, then listbots can insist on 
> one postage unit for messages that the listbot receives, but the listbot 
> can then send out out messages (to the teeming hordes on the list) 
> without postage

not exactly true.  If you use postage due notices with the ability to 
generate postage stamps via a Java applet, you can get some benefit 
without 100 percent adoption.

We operate on the principle that "strangers cost, friends fly free",
which means that I only expect stamps from people I don't know.  A
mailing list is someone I know and therefore I don't expect any stamps
from them.

A mailing list could ask for stamps from everyone, but it would make more
sense to use the postage due mechanism only for nonsubscribers.  Then
you can use the same technique I do in camram, which is that anyone you
can't deliver a postage due notice to is spam and therefore the message
can be safely discarded.  Yes, I know it's not strictly true, but if you
pay attention to why message delivery fails, it's effectively true.

White lists, on the other hand, are a whole other topic, and I believe
they should be based not on name but on public key.

> I want postage to be computationally *very* expensive.  Like five or ten 
> minutes on a (currently) high-end desktop.  I want strangers to have to 
> spend some time -- not just money -- to be sure that I'll read their 
> messages.  Shoot, I don't even care if there is no money involved at 
> all, "only" time.
> 
> I also want the reverse algorithm -- where I check to see if their token 
> is valid -- to be very fast.

Google for Adam Back, hashcash.  Also look for "proof of work" puzzles.
Unfortunately, proof of work puzzles suffer from Moore's Law
inflation.  I've been given a lead that says a proof of work puzzle that
exercises the memory bus will be less susceptible to Moore's Law
inflation, and I'm talking with a cryptographer about a memory-intensive
POW puzzle.
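
To make the "expensive to mint, cheap to check" property concrete, here is
a hashcash-style sketch.  It is a simplification and not the real hashcash
stamp format or camram's code: the "recipient:date:nonce" layout, the
function names and the use of SHA-1 with a leading-zero-bit target are
assumptions chosen for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(recipient, date, bits):
    """Brute-force a nonce; expected cost is about 2**bits hashes."""
    for nonce in count():
        stamp = f"{recipient}:{date}:{nonce}"
        digest = hashlib.sha1(stamp.encode("utf-8")).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp


def verify(stamp, recipient, bits):
    """Checking costs one hash, however long minting took."""
    resource = stamp.partition(":")[0]
    if resource != recipient:
        return False  # stamp was minted for somebody else
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    return leading_zero_bits(digest) >= bits
```

Each extra bit of difficulty doubles the sender's expected work while the
receiver's check stays at a single hash, which is also why faster CPUs
erode the cost over time.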

By the way, if you do the math, a three second computation would slow
down a high-powered spammer by a factor of 140 or, put another way, they
would need 140 machines generating stamps constantly in order to keep up
the same data rate through a T1.
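
The arithmetic behind a figure like that can be sketched as follows.  The
message size is not stated above; roughly 4 KB per message is an assumption
that happens to land near 140, and 1.544 Mbit/s is the nominal T1 rate.

```python
T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def machines_needed(stamp_seconds, message_bytes,
                    link_bps=T1_BITS_PER_SECOND):
    """Machines minting stamps non-stop needed to keep the link saturated."""
    messages_per_second = link_bps / 8 / message_bytes
    # Each message needs stamp_seconds of CPU time, so that many
    # machine-seconds are consumed per second of sending.
    return messages_per_second * stamp_seconds


# Assumed average message size of 4096 bytes, 3 seconds per stamp.
estimate = machines_needed(3, 4096)
```

Under those assumptions the link carries about 47 messages per second, so
three seconds of work per message calls for roughly 140 machines.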

Computation length is very tricky because you don't want it to be so 
long that you discourage low-end machine users while at the same time, 
not giving high-end machine users a significant advantage.  although, 
this problem can be reduced by pushing the stamp calculation and mail 
delivery into the background.

> So are there any one-way algorithms that would involve my email address 
> and some other piece of changing data, like seconds since Jan 1, 1970?  
> Or perhaps make and use a Web service that generates and posts random 
> time-stamped numbers?  (A web service with random, time-stamped numbers 
> could also provide for essentially constant difficulty as processors get 
> higher-powered, e.g. the random numbers keep getting bigger.)

centralized services fail from a reliability perspective.  They can also 
fail if a service can be corrupted or abused.


> BTW, anyone who is going to the spam conference, look for me in the 
> colored (probably purple) beret! 

as will I. (be there I mean. sans beret, may be bright yellow terry 
cloth hat at times)

---eric


From rob at hooft.net  Thu Jan 16 14:34:35 2003
From: rob at hooft.net (Rob W. W. Hooft)
Date: Thu Jan 16 08:35:58 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain>
	<3E26A278.3080302@hooft.net> <15910.43747.285523.378123@montanaro.dyndns.org>
Message-ID: <3E26B4EB.5020100@hooft.net>

Skip Montanaro wrote:
>     Rob> Doesn't it take time before the first spam arrives on a brand new
>     Rob> mailinglist? Spambayes' results are going to be real lousy if it is
>     Rob> trained on 200 ham and 0 spam messages....
> 
> A couple of things come to mind:
> 
>     1. Don't enable spambayes until you start having trouble

It is going to be too late....

>     2. With a proxytrainer/proxytee setup as I described in a previous
>        message you can seed it with a handful of spam you have laying about.
>        Just set your options to ignore stuff like sender and to while
>        training on those messages.

This sounds reasonable, but this can also be implemented as a "preloaded database" that comes with spambayes. This is something many people have already asked for.

>     3. Send your mailing list address directly to the spammers.  They'll
>        find it soon enough anyway. ;-)

www.spamsubmit.com? "Now submit your address to 100s of spam engines without all the hassle! Ever tried to get all spam messages first? Manually typed your address in at many spammer sites? Now, at spamsubmit.com we submit your address to 100s of spam engines without any efforts from you! Introductory price for this service is $99 for this month only!"

Rob
-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/


From barry at python.org  Thu Jan 16 09:35:53 2003
From: barry at python.org (Barry A. Warsaw)
Date: Thu Jan 16 09:37:47 2003
Subject: [Spambayes] spambayes fronting a mailing list? 
References: <15910.18557.535408.669103@gargle.gargle.HOWL>
	<200301160612.h0G6C0x14523@localhost.localdomain>
Message-ID: <15910.49993.523948.576657@gargle.gargle.HOWL>


>>>>> "AB" == Anthony Baxter <anthony@interlink.com.au> writes:

    >> Barry A. Warsaw wrote
    >> So as your list warms up, you'll be training the system.  I
    >> wonder how long it'll take before spambayes gets pretty good at
    >> detecting what's appropriate and what's not for your list?

    AB> This seems like a plan - so long as the UI doesn't suck too
    AB> hard :)

The u/i already sucks so I doubt it could suck any worse. :)

    AB> Previous experiments have shown that it learns _really_
    AB> quickly, if the subject matter's really focussed... something
    AB> like 20 messages gave a remarkably good result, from memory.

That's what I'm counting on!
-Barry

From barry at python.org  Thu Jan 16 09:43:49 2003
From: barry at python.org (Barry A. Warsaw)
Date: Thu Jan 16 09:44:23 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <15909.39504.598866.52741@montanaro.dyndns.org>
	<15910.18557.535408.669103@gargle.gargle.HOWL>
	<3E266D28.8591603A@whidbey.com>
Message-ID: <15910.50469.559467.710145@gargle.gargle.HOWL>


>>>>> "GAVH" == G Armour Van Horn <vanhorn@whidbey.com> writes:

    GAVH> If your going into the administrative interface anyway ...
    GAVH> When I used to get a message from Mailman about a message
    GAVH> being held for my approval, the first page I hit told me
    GAVH> why, whether it was too large, non-member post,
    GAVH> whatever. Now I have to jump to a second page to learn that
    GAVH> for each sender (and it's normally only one message per
    GAVH> sender).  Since you are putting up the neat summary windows
    GAVH> with all the options on that first page, could we please
    GAVH> have the reason for the hold in there?  Pretty please?

This is better discussed on mailman-developers, or better yet, file a
bug report. :)  But it seems like a reasonable suggestion!

    GAVH> As to how long it would take to make a difference, based on
    GAVH> what folks have said here I suspect that any list with ten
    GAVH> messages a day would be over 99% accurate by the end of a
    GAVH> week. Since half my Mailman moderation is probably spam
    GAVH> these days, I'm looking forward to it.

That doesn't sound too onerous as a training regimen for lists.
-Barry

From barry at python.org  Thu Jan 16 09:57:00 2003
From: barry at python.org (Barry A. Warsaw)
Date: Thu Jan 16 09:57:42 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain>
	<3E26A278.3080302@hooft.net>
Message-ID: <15910.51260.847140.60292@gargle.gargle.HOWL>


>>>>> "RWWH" == Rob W W Hooft <rob@hooft.net> writes:

    RWWH> Doesn't it take time before the first spam arrives on a
    RWWH> brand new mailinglist? Spambayes' results are going to be
    RWWH> real lousy if it is trained on 200 ham and 0 spam
    RWWH> messages....

Why?  Because those spams will be marked as "unsure"?  Under my
(current) approach, once the messages start getting marked as ham,
even if they're held for approval for other reasons, they wouldn't go
into the ham training when approved.  Presumably, spams that later
come in would be marked as spam and when rejected would go into the
spam training.

But that's all just conjecture.  I've no idea whether that will really
work in practice.  I've got a back up plan if not, but it's more
complicated and requires more work from the list admin, so I'd like to
experiment with the simpler approach first.
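
One way to read the rule described above, as a sketch only.  The function
name and string values are hypothetical, and the two "moderator overrides
the classifier" cases are not covered in the message, so their handling
here is an assumption.

```python
def training_action(classification, decision):
    """Map a moderation decision to a training step.

    classification: 'ham', 'unsure' or 'spam' (the classifier's verdict)
    decision:       'accept' or 'reject' (the list admin's choice)
    Returns 'ham', 'spam', or None when nothing should be trained.
    """
    if decision == "accept":
        if classification == "ham":
            # Already ham, held for some other reason: do not retrain.
            return None
        # 'unsure' is the case described; 'spam' is an assumed correction.
        return "ham"
    if decision == "reject":
        # 'unsure' and 'spam' are the cases described; 'ham' is assumed.
        return "spam"
    raise ValueError(f"unknown decision: {decision!r}")
```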

-Barry

From rob at hooft.net  Thu Jan 16 17:14:02 2003
From: rob at hooft.net (Rob W. W. Hooft)
Date: Thu Jan 16 11:14:07 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain>
	<3E26A278.3080302@hooft.net> <15910.51260.847140.60292@gargle.gargle.HOWL>
Message-ID: <3E26DA4A.40404@hooft.net>

Barry A. Warsaw wrote:
>>>>>>"RWWH" == Rob W W Hooft <rob@hooft.net> writes:
>>>>>
> 
>     RWWH> Doesn't it take time before the first spam arrives on a
>     RWWH> brand new mailinglist? Spambayes' results are going to be
>     RWWH> real lousy if it is trained on 200 ham and 0 spam
>     RWWH> messages....
> 
> Why?  Because those spams will be marked as "unsure"? 

Isn't everything going to be marked as unsure as long as there 
is no spam at all? That would not be very useful! AFAICS, nothing 
can be marked "ham" until there is spam in the database.

Rob

-- 
Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/


From tim.one at comcast.net  Thu Jan 16 11:35:39 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Jan 16 11:36:11 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <15910.18557.535408.669103@gargle.gargle.HOWL>
Message-ID: <BIEJKCLHCIOIHAGOKOLHOEKOEIAA.tim.one@comcast.net>

[Barry A. Warsaw]
> ...
> My idea was to not train the list at all, before turning on
> spambayes.  So the first batch of messages will all get held as
> unsure, and you'd use the admindb page to accept and reject messages.
> Accept messages would train as ham and rejected messages would get
> trained as spam.

Better to start by training on a few spam, and a few copies of the list
introduction msg (a decent intro msg necessarily contains many words and
lexicalisms characteristic of the list's topic).

If you have only ham in the database, the false negative rate will zoom
(every word in the database will be hammish).

If you have only spam in the database, the false positive rate will zoom
(every word in the database will be spammish).
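
A small sketch of why a one-sided database skews every word.  It uses a
Robinson-style adjusted word probability; the function name is hypothetical
and the constants s=0.45 and x=0.5 are assumed values for illustration, not
a statement of what the classifier actually ships with.

```python
def word_spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Estimate how spammy a word is from training counts.

    spamcount/hamcount: messages of each kind containing the word
    nspam/nham:         total messages of each kind trained on
    s, x:               strength and value of the prior for unseen words
    """
    n = spamcount + hamcount
    if n == 0:
        return x  # never seen: stay neutral
    spamratio = spamcount / max(nspam, 1)
    hamratio = hamcount / max(nham, 1)
    p = spamratio / (spamratio + hamratio)
    # Pull the raw estimate toward x when there is little evidence.
    return (s * x + n * p) / (s + n)


# Ham-only database: any word seen at all scores below 0.5.
ham_only = [word_spamprob(0, k, 0, 200) for k in (1, 5, 50)]
# Spam-only database: any word seen at all scores above 0.5.
spam_only = [word_spamprob(k, 0, 200, 0) for k in (1, 5, 50)]
```

With no spam trained, a spam message can only contribute hammish or neutral
clues, so its combined score cannot reach the spam range.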

> ...
> I wonder how long it'll take before spambayes gets pretty good at
> detecting what's appropriate and what's not for your list?

Depends more on list throughput than on time, i.e. it depends more on total
# of msgs trained on.  By the time you've got 1 of each kind, it should do
better than chance.  By the time you've got 20 of each kind, it should be a
major help.  By the time you've got 500 of each, it should be excellent.  By
the time you've got 15,000 of each, both error rates in c.l.py tests were
statistically indistinguishable from 0.

I keep hearing that spammers have gotten cleverer since then, but I haven't
seen evidence of it in my own email.  The spam that sneaks through seems
much more likely to be due to spammer incompetence (like spam where they
forget to put *anything* in the msg body).


From noreply at sourceforge.net  Thu Jan 16 08:34:18 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 16 11:38:25 2003
Subject: [Spambayes] 
 [ spambayes-Bugs-669149 ] NameError in ExpiryCorpus.removeExpiredMessages
Message-ID: <E18ZCyA-0002bG-00@sc8-sf-web2.sourceforge.net>

Bugs item #669149, was opened at 2003-01-16 10:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Tim Stone (timstone4)
Summary: NameError in ExpiryCorpus.removeExpiredMessages

Initial Comment:
In verbose mode, removeExpiredMessages prints out a line which
references the nonexistent variable, key.  I have no idea what it
should be, otherwise I'd fix it.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702

From noreply at sourceforge.net  Thu Jan 16 08:39:59 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 16 11:38:43 2003
Subject: [Spambayes] [ spambayes-Bugs-651365 ] getattr recursion in Corpus.py
Message-ID: <E18ZD3f-0006rV-00@sc8-sf-web1.sourceforge.net>

Bugs item #651365, was opened at 2002-12-10 04:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Wolfgang Strobl (strobl)
>Assigned to: Tim Stone (timstone4)
Summary: getattr recursion in Corpus.py

Initial Comment:
After feeding a bunch of new messages into pop3proxy, 
classifying them and when trying to save the result, I got 
a recursion loop (followed by recursion depth exceeded) 
in \cvshome\spambayes\Corpus.py|__getattr__|269]

After looking into setSubstance, I noticed that 
setSubstance (called by load) only sets the attributes 
payload and hdrtxt when the pattern matches. 

I temporarily added an else clause to the bmatch test, i.e.

        if bmatch:
            self.payload = bmatch.group(2)
            self.hdrtxt = sub[:bmatch.start(2)]
            print ".",                  # debug: pattern matched
        else:
            # temporary workaround: always set both attributes
            self.payload = "nix\r\n"
            self.hdrtxt = "nix\r\n"
            print "?", len(sub),        # debug: no match, show length

and indeed, when trying to save, I notice that after about 
800 good messages, ~ 100 have an empty message, 
see the output below. 

I don't really know what I'm doing here, but this fix at 
least allows me to continue.

-------------------------

C:\archiv\cvshome\spambayes>python -u pop3proxy.py -l 8110 mail.gmd.de
Loading database... Done.
Listener on port 8110 is proxying mail:110
User interface url is http://localhost:8880
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 . . . . . . . . .
. . . . . .

-----------------------
Initial traceback:

error: uncaptured python exception, closing channel
<__main__.UserInterface connected at 0x2213470>
(exceptions.RuntimeError:maximum recursion depth exceeded
[C:\Python22\lib\asyncore.py|poll|95]
[C:\Python22\lib\asyncore.py|handle_read_event|392]
[C:\Python22\lib\asynchat.py|handle_read|112]
[C:\archiv\cvshome\spambayes\pop3proxy.py|found_terminator|804]
[C:\archiv\cvshome\spambayes\pop3proxy.py|onRequest|830]
[C:\archiv\cvshome\spambayes\pop3proxy.py|onReview|1093]
[C:\archiv\cvs\spambayes\Corpus.py|takeMessage|188]
[C:\archiv\cvs\spambayes\FileCorpus.py|addMessage|140]
[C:\archiv\cvs\spambayes\FileCorpus.py|store|231]
[C:\archiv\cvs\spambayes\Corpus.py|getSubstance|318]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getat


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2003-01-16 10:39

Message:
Logged In: YES 
user_id=44345

Assigning to Tim Stone.  I think this is the same problem I reported on the
list the other day.  I think the offending code is in Corpus.__getitem__.  The
test of amsg - "if not amsg" should be "if amsg is None" I think.  I suspect
a fix further up the line as the OP indicated would probably do the trick.

If you don't do something to set self.hdrtxt I believe it is None and you infloop trying to resolve a non-existent __nonzero__ method.

Something like that. ;-)
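
The failure mode can be reproduced in isolation.  The sketch below is
written for modern Python and is not the code in Corpus.py: the class
names, the regular expression and the set_substance helper are hypothetical
stand-ins for a message whose __getattr__ lazily loads attributes that the
loader does not always set.

```python
import re


class LazyMessage:
    """Loads payload/hdrtxt on first access via __getattr__."""

    _split = re.compile(r"(\r?\n\r?\n)(.*)", re.DOTALL)

    def __init__(self, raw):
        self.raw = raw

    def set_substance(self, sub):
        match = self._split.search(sub)
        if match:
            self.payload = match.group(2)
            self.hdrtxt = sub[:match.start(2)]
        # No else branch: on a non-match neither attribute is set.

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in ("payload", "hdrtxt"):
            self.set_substance(self.raw)
            # If set_substance set nothing, this re-enters __getattr__.
            return getattr(self, name)
        raise AttributeError(name)


class SafeMessage(LazyMessage):
    """Same loader, but both attributes exist afterwards in every case."""

    def set_substance(self, sub):
        super().set_substance(sub)
        # Write through __dict__ so the check cannot trigger __getattr__.
        self.__dict__.setdefault("payload", "")
        self.__dict__.setdefault("hdrtxt", sub)
```

Accessing .payload on a LazyMessage built from text without a blank line
ends in a RecursionError, while SafeMessage returns an empty payload.
Guaranteeing the attributes in the loader is one possible fix; it matches
the spirit of the else clause in the original report.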

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702

From skip at pobox.com  Thu Jan 16 11:45:49 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 16 12:45:57 2003
Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in
Message-ID: <15910.61389.133887.569308@montanaro.dyndns.org>

I just checked in proxytrainer.py and proxytee.py.  The former is
essentially pop3proxy.py with the POP stuff removed.  I know this results in
a large amount of code duplication, but a) it was the fastest way for me to
get a GUI training interface without using POP, and b) maybe I can convince
the pop3proxy advocates to slim it down by ripping out the user interface
stuff. ;-) Proxytee.py is like the Unix tee program (copy stdin to stdout
and an external file), except the "external file" is to upload the message
or mailbox as a file to proxytrainer.py.

I'm still experimenting with things, but should have proxytee.py embedded
into my procmailrc file by the end of the day once I refresh my memory on
flags and such.

Come to think of it, hammiefilter.py could pretty easily be extended to do
the file upload.  The core functionality is implemented in two functions
Wade Leftwich posted to the Python Cookbook.  hmmm...

Skip

From tim at fourstonesExpressions.com  Thu Jan 16 11:52:15 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Jan 16 12:52:53 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHOEKOEIAA.tim.one@comcast.net>
Message-ID: <QOPMKD0C973WQICTSYT09NB953IEYV.3e26f14f@myst>

1/16/2003 10:35:39 AM, Tim Peters <tim.one@comcast.net> wrote:

>[Barry A. Warsaw]
>> ...
>> My idea was to not train the list at all, before turning on
>> spambayes.  So the first batch of messages will all get held as
>> unsure, and you'd use the admindb page to accept and reject messages.
>> Accept messages would train as ham and rejected messages would get
>> trained as spam.

I think I'm hearing something on this thread that doesn't make much sense to 
me.  If we always train as spam stuff that's been classified as spam, always 
train as ham stuff that's been classified as ham, then we're kinda reinforcing 
the obvious, and increasing the spaminess of words in that spam... isn't it 
more realistic (and ultimately actually better) to train on a random sample 
rather than always?  - TimS

>
>Better to start by training on a few spam, and a few copies of the list
>introduction msg (a decent intro msg necessarily contains many words and
>lexicalisms characteristic of the list's topic).
>
>If you have only ham in the database, the false negative rate will zoom
>(every word in the database will be hammish).
>
>If you have only spam in the database, the false positive rate will zoom
>(every word in the database will be spammish).
>
>> ...
>> I wonder how long it'll take before spambayes gets pretty good at
>> detecting what's appropriate and what's not for your list?
>
>Depends more on list throughput than on time, i.e. it depends more on total
># of msgs trained on.  By the time you've got 1 of each kind, it should do
>better than chance.  By the time you've got 20 of each kind, it should be a
>major help.  By the time you've got 500 of each, it should be excellent.  By
>the time you've got 15,000 of each, both error rates in c.l.py tests were
>statistically indistinguishable from 0.
>
>I keep hearing that spammers have gotten cleverer since then, but I haven't
>seen evidence of it in my own email.  The spam that sneaks through seems
>much more likely to be due to spammer incompetence (like spam where they
>forget to put *anything* in the msg body).
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From tim at fourstonesExpressions.com  Thu Jan 16 11:57:05 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Jan 16 12:57:42 2003
Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in
In-Reply-To: <15910.61389.133887.569308@montanaro.dyndns.org>
Message-ID: <JZYZYM325YKF3Y2Z86IHGBB76ZPJJF.3e26f271@myst>

1/16/2003 11:45:49 AM, Skip Montanaro <skip@pobox.com> wrote:

>I just checked in proxytrainer.py and proxytee.py.  The former is
>essentially pop3proxy.py with the POP stuff removed.  I know this results in
>a large amount of code duplication, but a) it was the fastest way for me to
>get a GUI training interface without using POP, and b) maybe I can convince
>the pop3proxy advocates to slim it down by ripping out the user interface
>stuff. ;-) Proxytee.py is like the Unix tee program (copy stdin to stdout
>and an external file), except the "external file" is to upload the message
>or mailbox as a file to proxytrainer.py.
>
>I'm still experimenting with things, but should have proxytee.py embedded
>into my procmailrc file by the end of the day once I refresh my memory on
>flags and such.
>
>Come to think of it, hammiefilter.py could pretty easily be extended to do
>the file upload.  The core functionality is implemented in two functions
>Wade Leftwich posted to the Python Cookbook.  hmmm...

I think we're really onto something here, that's bothered me for a while now.  
There is a core engine in all of this stuff that really should be packaged as 
such.  Classifier, tokenizer, the corpus stuff, and the training stuff, is 
basically it.  Corpus isn't up to the task yet, but with some rework it could 
be made usable to hammiefilter, pop3proxy, outlook, proxytee, or any other 
client type we can think up...

Richie and I have had a couple of offlist jabs at this, and I know Richie is 
in the process of ripping pop3proxy apart into smaller components...  - TimS
>
>Skip
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From tim.one at comcast.net  Thu Jan 16 13:55:48 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Jan 16 13:56:57 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <QOPMKD0C973WQICTSYT09NB953IEYV.3e26f14f@myst>
Message-ID: <BIEJKCLHCIOIHAGOKOLHEEMBEIAA.tim.one@comcast.net>

[Tim Stone - Four Stones Expressions]
> I think I'm hearing something on this thread that doesn't make
> much sense to me.  If we always train as spam stuff that's been
> classified as spam, always train as ham stuff that's been
> classified as ham, then we're kinda reinforcing the obvious, and
> increasing the spaminess of words in that spam... isn't it
> more realistic (and ultimately actually better) to train on a
> random sample rather than always?  - TimS

Testing results failed to find any way of training that didn't work well,
ranging from purely mistake-based training, to letting a classifier
self-train on its own decisions.  My real-life experience on my own email is
that pure mistake-based training is unsatisfactory in practice because it
keeps the Unsure rate higher longer than need be (also showed in formal
tests), and especially because the *kinds* of spam that remained Unsure were
maddeningly "obvious" spam (something I don't know how to test formally).

OTOH, in real life now I started with a few hundred random msgs, and since
then have done *almost* purely mistake-based training.  This may not be
optimal (and I believe it is not), but leaves so little manual
classification for me to do that I don't care.  When error rates get below
1%, the difference between, say, 0.5% and 0.2% is more than a factor of two,
but isn't actually noticeable unless you've got many thousands of msgs to
dig thru.  This *is* the case for the mailing list run via
comp.lang.python's news<->mail gateway, and more-careful training there may
more than repay the cost.  But most Mailman lists have much lower volume,
and "excellent" results with little training effort may be more attractive
to list admins than "superb" results requiring substantially more training
effort.

The important thing now is just that Barry get off his ass and start <wink>.


From skip at pobox.com  Thu Jan 16 13:04:21 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 16 14:04:33 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEEMBEIAA.tim.one@comcast.net>
References: <QOPMKD0C973WQICTSYT09NB953IEYV.3e26f14f@myst>
        <BIEJKCLHCIOIHAGOKOLHEEMBEIAA.tim.one@comcast.net>
Message-ID: <15911.565.674990.932342@montanaro.dyndns.org>

    Tim> But most Mailman lists have much lower volume, and "excellent"
    Tim> results with little training effort may be more attractive to list
    Tim> admins than "superb" results requiring substantially more training
    Tim> effort.

Which suggests that if Barry hasn't already considered it (and I'll bet he
has given that bass players are about three steps up on the evolutionary
scale from say, drummers or viola players :-), he should give Mailman admins
a variety of ways to train: everything, mistakes only, random, unsures only,
etc.
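
A minimal sketch of how such a choice of training regimes might look in code.
The strategy names, the should_train() function and the sample_rate parameter
are all hypothetical; nothing here is taken from Mailman or from the
spambayes code.

```python
import random


def should_train(strategy, predicted, actual, sample_rate=0.1, rng=random):
    """Decide whether a message should be added to the training data.

    predicted is the classifier's verdict ("ham", "spam" or "unsure");
    actual is the admin's verdict ("ham" or "spam").
    All strategy names below are illustrative assumptions.
    """
    if strategy == "everything":
        return True
    if strategy == "mistakes":
        # Only confident verdicts that turned out to be wrong.
        return predicted != "unsure" and predicted != actual
    if strategy == "unsures":
        return predicted == "unsure"
    if strategy == "random":
        # Train on a random sample, regardless of the verdict.
        return rng.random() < sample_rate
    raise ValueError("unknown training strategy: %r" % (strategy,))
```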

Skip

From tim at fourstonesExpressions.com  Thu Jan 16 17:44:02 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Jan 16 18:44:44 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <3E272711.9050001@hooft.net>
Message-ID: <04VQKE84QNC731SOVUS5ONMLPNLIMI.3e2743c2@myst>

This is a great discussion, one I think we should include on the main site.  
This is obviously an advantage that this algorithm has... you can hardly
go wrong!  In my installation, I have 156 spam and 223 ham, and it almost 
never makes a classification mistake.  Unsures are almost always ham, spam is 
DOA.  It hasn't improved (there is precious little *room* for improvement) 
since a few days after I started this database.  In fact, I'm a bit reluctant 
to reup, it's working so well  <wink>

Now my mail is a bit unique in that I get mostly machine driven event 
notification mails, which are VERY similar... There's probably 5 different 
email content patterns/sources that comprise 90% of my mail (e.g. "Order 
Received", "Mail List Opt-In" "Spambayes", etc.)  But even the unique stuff is 
nailed as ham almost all the time.

Perhaps we can document a few training patterns: mistake driven, 
classification driven, random sample driven, <more?>, and allow users to 
select which type of training pattern they want to do.  The user interface, 
then, might only present messages that are pertinent for that type of training 
regimen.  For example, the pop3proxy right now presents every message it 
receives in buckets by classification.  If I'm doing classification driven 
training, I wouldn't need to look at every spam that comes in... Oh I don't 
know, I'm rambling now...   - TimS

1/16/2003 3:41:37 PM, Rob Hooft <rob@hooft.net> wrote:

>Tim Stone - Four Stones Expressions wrote:
>> 
>> I think I'm hearing something on this thread that doesn't make much sense
>> to me.  If we always train as spam stuff that's been classified as spam,
>> always train as ham stuff that's been classified as ham, then we're kinda
>> reinforcing the obvious, and increasing the spaminess of words in that
>> spam... isn't it more realistic (and ultimately actually better) to train
>> on a random sample rather than always?  - TimS
>
Tim1 said:
>Testing results failed to find any way of training that didn't work well,
>ranging from purely mistake-based training, to letting a classifier
>self-train on its own decisions.  My real-life experience on my own email is
>that pure mistake-based training is unsatisfactory in practice because it
>keeps the Unsure rate higher longer than need be (also showed in formal
>tests), and especially because the *kinds* of spam that remained Unsure were
>maddeningly "obvious" spam (something I don't know how to test formally).
>
>OTOH, in real life now I started with a few hundred random msgs, and since
>then have done *almost* purely mistake-based training.  This may not be
>optimal (and I believe it is not), but leaves so little manual
>classification for me to do that I don't care.  When error rates get below
>1%, the difference between, say, 0.5% and 0.2% is more than a factor of two,
>but isn't actually noticeable unless you've got many thousands of msgs to
>dig thru.  This *is* the case for the mailing list run via
>comp.lang.python's news<->mail gateway, and more-careful training there may
>more than repay the cost.  But most Mailman lists have much lower volume,
>and "excellent" results with little training effort may be more attractive
>to list admins than "superb" results requiring substantially more training
>effort.
>
Rob said:
>Nope, the mathematics say this isn't true. Say by the word "Sex" you 
>recognize a new message as being spam. This message may be the first 
>that contains the word "oral", so training on this makes it a spammy 
>word. The word "sex" becomes more spammy. And the word "ink-cartridge" 
>that does not appear in this message becomes a little less spammy.
>
>In other words: training on a new spam doesn't only make the tokens in 
>it more spammy, but also makes the spammy tokens that do not occur in 
>there less spammy.
>
>Then there are words that occur both in ham and in spam messages. There 
>it is important to get the right "balance". If you train only on 
>"non-obvious" cases, this will almost certainly result in an imbalance.
>
>All of this determines, like Tim1 explained, only the difference between 
>excellent and superb separation of classes.
>
>Rob
>
>-- 
>Rob W.W. Hooft  ||  rob@hooft.net  ||  http://www.hooft.net/people/rob/
>
>
>

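Rob's point in the quoted text above, that training on one more spam also
makes the spammy tokens that are absent from it a little less spammy, can be
checked with a small sketch. It assumes a per-token estimate of the usual
ratio form with a Robinson-style adjustment toward 0.5 for rare words; the
function name spamprob() and the constants s=0.45 and x=0.5 are assumptions
made for illustration, and all the counts are made up.

```python
def spamprob(spamcount, hamcount, nspam, nham, s=0.45, x=0.5):
    """Estimate how spammy one token is.

    spamcount/hamcount: number of trained spam/ham messages containing it.
    nspam/nham: total number of trained spam/ham messages.
    s and x (strength and prior for unseen words) are assumed values.
    """
    n = spamcount + hamcount
    if n == 0:
        return x  # never seen: fall back to the prior
    spamratio = spamcount / nspam if nspam else 0.0
    hamratio = hamcount / nham if nham else 0.0
    prob = spamratio / (spamratio + hamratio)
    # Pull the estimate toward x when there is little evidence.
    return (s * x + n * prob) / (s + n)


# Made-up database: 10 spam and 10 ham trained so far.
ink_before = spamprob(5, 1, nspam=10, nham=10)   # "ink-cartridge"
sex_before = spamprob(8, 0, nspam=10, nham=10)   # "sex"
oral_before = spamprob(0, 0, nspam=10, nham=10)  # "oral", never seen

# Train on one new spam containing "sex" and "oral" but not "ink-cartridge".
ink_after = spamprob(5, 1, nspam=11, nham=10)
sex_after = spamprob(9, 0, nspam=11, nham=10)
oral_after = spamprob(1, 0, nspam=11, nham=10)
```

With these numbers ink_after comes out lower than ink_before even though the
new message never mentions the word, because the same five spam hits are now
spread over eleven spam instead of ten.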

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From mhammond at skippinet.com.au  Fri Jan 17 10:56:21 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 16 18:57:09 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <15910.42543.629381.696105@montanaro.dyndns.org>
Message-ID: <002001c2bdba$df01f790$530f8490@eden>

[Skip]
> and since it's set up for members only posting, little, if any garbage
> actually gets through to the list, even without using spambayes.

Unfortunately, none of this stuff gets through as the poor list
administrator has explicitly rejected it.

So particularly for closed lists, spambayes could be a huge bonus -
auto-reject any non-members posts with a particular score, and most of my
admin duties will vanish!

Mark.


From mhammond at skippinet.com.au  Fri Jan 17 11:03:44 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 16 19:04:44 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHEEMBEIAA.tim.one@comcast.net>
Message-ID: <002201c2bdbb$e7642830$530f8490@eden>

[Tim1]
> tests), and especially because the *kinds* of spam that
> remained Unsure were
> maddeningly "obvious" spam (something I don't know how to
> test formally).

This is touching my test-of-training-strategies comments recently.

If we have a decent framework in place, then "obvious" spam would be
anything that is spam given complete data.

ie, assume we have 3000 ham and 3000 spam.  My training strategy would be to
perform a complete train over the entire database, and collect "correct"
scores for each item.  We then can test out various training strategies,
watching not only the fp/fn/unsure rates, but also deviance from the
"correct" score.

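One way to put a number on the "deviance from the correct score" idea above
is a mean absolute difference between the scores from the fully trained
classifier and the scores produced under the strategy being tested. This is
only a sketch; the function name and the dict-of-scores representation are
assumptions.

```python
def mean_abs_deviation(reference, candidate):
    """Average absolute difference between two sets of per-message scores.

    reference: {msg_id: score} from a classifier trained on everything.
    candidate: {msg_id: score} from the training strategy under test.
    Both mappings are hypothetical structures used for illustration.
    """
    if not reference:
        raise ValueError("no reference scores given")
    total = 0.0
    for msg_id, ref_score in reference.items():
        total += abs(candidate[msg_id] - ref_score)
    return total / len(reference)
```
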
> OTOH, in real life now I started with a few hundred random
> msgs, and since
> then have done *almost* purely mistake-based training.  This
> may not be
> optimal (and I believe it is not), but leaves so little manual
> classification for me to do that I don't care.

Do you believe we can reasonably formalize some tests for these strategies?

> The important thing now is just that Barry get off his ass
> and start <wink>.

Yeah, 'cos when he is finished there are some nice training strategies I
would like him to work on <wink>

Mark.


From mhammond at skippinet.com.au  Fri Jan 17 11:05:14 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 16 19:07:27 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <002001c2bdba$df01f790$530f8490@eden>
Message-ID: <002301c2bdbc$1ddae020$530f8490@eden>

[I wrote]
> So particularly for closed lists, spambayes could be a huge bonus -
> auto-reject any non-members posts with a particular score,
> and most of my
> admin duties will vanish!

Obviously too early for me.  It will not help closed lists.  What it *will*
do is allow me to open up a few lists - ones that I only closed due to the
spam coming through.

Mark.


From tim.one at comcast.net  Thu Jan 16 22:06:16 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Jan 16 22:06:51 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <002201c2bdbb$e7642830$530f8490@eden>
Message-ID: <LNBBLJKPBEHFEDALKOLCCENODIAB.tim.one@comcast.net>

[Mark Hammond]
> ...
> If we have a decent framework in place, then "obvious" spam would be
> anything that is spam given complete data.

That's not how I meant it.  "Obvious" is a human judgment, and is (AFAICT)
subjective.  Purely mistake-based training, starting from an empty database,
left substantial "obvious spam" in the Unsure category even after 2 weeks,
which is well over 1200 spam at the rate I get spam.  So little spam got
trained on during that time (there weren't many mistakes after the first two
days) that spam-detection remained mostly hapax-driven, and the few
instances of trained farm-porn spam didn't do enough to nail gay-porn spam
too, etc.

"Obvious spam" means that you personally are surprised to see it rate
Unsure, at least surprised enough to click the "Spam Clues" button to try to
figure out why it wasn't nailed.

> ie, assume we have 3000 ham and 3000 spam.  My training strategy
> would be to perform a complete train over the entire database, and
> collect "correct" scores for each item.

I'm not sure what correct means here.  How do you decide?  You're surely not
going to look at those 6,000 msgs by hand and assign a two-digit number to
each, right?

> We then can test out various training strategies, watching not only
> the fp/fn/unsure rates, but also deviance from the "correct" score.
> ...
> Do you believe we can reasonably formalize some tests for these
> strategies?

If you can define what it is you're trying to measure, sure <wink>.  All
along in testing we used a three-term cost function (assigning different
"dollar" penalties to FP, FN and unsure), and the measure of goodness was
how small the total penalty got.  It's easy (albeit tedious) to set up
experiments to measure the effect of any definable training strategy on
that.  If you define a different penalty function, likewise.
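
A three-term cost function of the kind described above can be sketched as
follows. The cutoffs (0.20 and 0.90) and the "dollar" weights (10 for a false
positive, 1 for a false negative, 0.2 for an unsure) are assumptions picked
for illustration, as is the function name total_cost().

```python
def total_cost(results, ham_cutoff=0.20, spam_cutoff=0.90,
               fp_weight=10.0, fn_weight=1.0, unsure_weight=0.2):
    """Sum the penalty over (score, is_spam) pairs; smaller is better.

    All default cutoffs and weights are illustrative assumptions.
    """
    cost = 0.0
    for score, is_spam in results:
        if ham_cutoff < score < spam_cutoff:
            cost += unsure_weight      # someone has to look at it
        elif score >= spam_cutoff and not is_spam:
            cost += fp_weight          # ham called spam: the expensive error
        elif score <= ham_cutoff and is_spam:
            cost += fn_weight          # spam called ham
    return cost


sample = [(0.99, True),    # spam, correctly caught
          (0.95, False),   # false positive
          (0.05, True),    # false negative
          (0.50, True),    # unsure
          (0.01, False)]   # ham, correctly passed
```

Comparing two training strategies then comes down to running both over the
same test messages and comparing the two totals; a different penalty function
only changes the weights or the branches.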


From frank.horowitz at csiro.au  Fri Jan 17 11:16:51 2003
From: frank.horowitz at csiro.au (Frank Horowitz)
Date: Thu Jan 16 22:26:24 2003
Subject: [Spambayes] Sourceforge :pserver cvs access broken...
Message-ID: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au>

... and has been for a few days:

http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#cv

While this doesn't affect those with developer cvs access (via SSH), it
kind of makes it hard for us "lurkers" to get our spambayes fixes (err,
I mean "patches" of course ;-). 

Does anyone (by any small miracle) have a mirror of the cvs tree that
they'd be willing to put online while SF gets its act together?

	Cheers,
		Frank Horowitz


From frank.horowitz at csiro.au  Fri Jan 17 11:33:19 2003
From: frank.horowitz at csiro.au (Frank Horowitz)
Date: Thu Jan 16 22:42:00 2003
Subject: [Spambayes] Re: Sourceforge :pserver cvs access broken... (Good URL)
Message-ID: <1042774398.22402.11.camel@bonzo.ned.dem.csiro.au>

Sorry. Pilot error (or something; mutter).

That URL again:

http://sourceforge.net/docman/display_doc.php?docid=2352&group_id=1#cvs


	Frank


From tim.one at comcast.net  Thu Jan 16 22:46:27 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Jan 16 22:47:02 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <3E25521B.20937.3607FFA@localhost>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEOADIAB.tim.one@comcast.net>

[Richard Jowsey]
> I've been building a Java implementation of Paul Graham's
> "Bayesian" classification logic over the past couple months,
> intended as a plug-in filter for the Apache JAMES mail server.

Upgrade to Python and you would have finished a couple months ago <wink>.

> However, after considerable testing, tweaking and tuning via a
> proxy setup (similar to POPFile), plus some recent lurking on
> the Spambayes list, I'm now modifying this project to
> incorporate the excellent notions contributed by Gary Robinson,
> et al, as implemented in your Python code.
>
> Early results are *very* promising!!! This death2spam stuff is
> definitely heading in the right direction! I haven't quite
> finished the chi2 comparison logic, but even using just "gary-
> combining", the kinds of messages ending up in my "uncertain"
> category make much more sense.

chi-combining will give you more of the same.  The combining methods are
related, in such a way that they're monotonic with each other.  chi is more
extreme, and you'll find that it pushes most spam very close to 1.0, most
ham very close to 0.0, and highly ambiguous msgs very close to 0.5.  This
gives it some nice properties for automated decision making (the cutoff
points for gary-combining were too touchy, across test sets, and across
time).  But if you like a mode where you simply sort msgs by score, you can
stop with gary-combining and be happy.
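
The relationship between the two combining methods can be illustrated with a
short sketch. It assumes the commonly described forms of the two schemes
(geometric means for gary-combining, Fisher-style chi-squared tests for
chi-combining) and is not copied from the project's classifier; the function
names are made up, and every input probability is assumed to lie strictly
between 0 and 1.

```python
import math


def chi2q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v only."""
    if v % 2:
        raise ValueError("v must be even")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair above 1.0.
    return min(total, 1.0)


def gary_combine(probs):
    """Geometric-mean based combining; scores move away from 0.5 slowly."""
    n = len(probs)
    p = 1.0 - math.exp(sum(math.log(1.0 - x) for x in probs) / n)
    q = 1.0 - math.exp(sum(math.log(x) for x in probs) / n)
    return (1.0 + (p - q) / (p + q)) / 2.0


def chi_combine(probs):
    """Chi-squared based combining; scores are pushed toward 0, 0.5 or 1."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - x) for x in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(x) for x in probs), 2 * n)
    return (s - h + 1.0) / 2.0
```

For ten tokens that each look moderately spammy (0.8), both methods land
above 0.5 but chi_combine() lands much closer to 1.0; for an evenly split mix
of 0.99 and 0.01 tokens, both sit at about 0.5.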

> Plus I'm now seeing far less weirdness caused by Graham's
> "2 * nGood + nSpam >= 5" trick,  etc. Will keep the list posted as to
> further progress.

The biases indeed had strange effects!  It was quite a struggle to eliminate
all of them, in part because near the end of that struggle, some biases
acted to counteract others, so removing any one of them in isolation made
things worse.  Gary Robinson pushed us out of the pit by proposing to
eliminate all the remaining biases in one shot.  I'm glad we were wise
enough to listen to him <wink>.

> I'd sure love to attend the upcoming spam-fest at MIT, but we
> moved downunder (Seattle -> Sydney) last year, and it's one
> helluva long way to go just for a day...

Meet up with Mark Hammond instead.  He wrote the wondrous Outlook 2000
client for this project, and also sleeps upside down.  Just don't try to
talk to him about Java.  Our Anthony Baxter, who deserves more thanks at
least for his thankless work in maintaining the web site, is also on the
wrong side of the globe.

> Many thanks for all your fine coding, testing efforts, and
> thoughtful conversations! It's been very helpful, not to mention
> highly entertaining at times.  ;-)

Less spam means more time for fun.  Too bad I was kicked off the project
<wink>.


From anthony at interlink.com.au  Fri Jan 17 15:09:56 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Jan 16 23:11:24 2003
Subject: [Spambayes] Sourceforge :pserver cvs access broken... 
In-Reply-To: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au> 
Message-ID: <200301170409.h0H49uq25399@localhost.localdomain>


>>> Frank Horowitz wrote
> Does anyone (by any small miracle) have a mirror of the cvs tree that
> they'd be willing to put online while SF gets its act together?

I'm planning a first pre-release tarball later today.

Anthony
-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Fri Jan 17 15:11:36 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Jan 16 23:13:01 2003
Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in 
In-Reply-To: <15910.61389.133887.569308@montanaro.dyndns.org> 
Message-ID: <200301170411.h0H4Baw25446@localhost.localdomain>


>>> Skip Montanaro wrote
> I just checked in proxytrainer.py and proxytee.py. 

Cool. I won't put them in the first "release package" today - let's see
if they work first :)


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Fri Jan 17 15:14:31 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Jan 16 23:16:04 2003
Subject: [Spambayes] spambayes fronting a mailing list? 
In-Reply-To: <04VQKE84QNC731SOVUS5ONMLPNLIMI.3e2743c2@myst> 
Message-ID: <200301170414.h0H4EV625489@localhost.localdomain>


>>> Tim Stone - Four Stones Expressions wrote
> This is a great discussion, one I think we should include on the main site.  

I'm working on a bit for the "training" section of the background page.
It's not there yet.

And yes, I know that "background", "documentation", and "developer" need
to be sorted out. At the moment I'm just trying to get the words down in
readable English - then we can work out what goes where...

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From T.A.Meyer at massey.ac.nz  Fri Jan 17 17:16:58 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Jan 16 23:17:44 2003
Subject: [Spambayes] Outlook plugin & Bad Folders
Message-ID: <98B01D2717B9D411B38F0008C78409310EE3DAF7@its-xchg2.massey.ac.nz>

OK, I've solved the MAPI/Exchange problem, I think.  What was happening was that we were storing a short term id, not a long term one.  This is why the id worked when building the list, but not later on.  See this for more information:

<http://msdn.microsoft.com/library/en-us/mapi/html/_mapi1book_pr_entryid.asp>

It now works for me - builds the tree, and I can use the results (i.e. filter on a selected folder).  Unfortunately, my copy of FolderSelector.py is stuffed because I've played around with it so much, and I can't check out a new copy because of the sourceforge cvs problem.  This is my new _BuildFoldersMAPI function, which is all that needs to be changed (plus changing FilterDialog.py, TrainingDialog.py and FolderSelector.py to use MAPI and not Outlook tree builds).

def _BuildFoldersMAPI(msgstore, folder):
    # Get the hierarchy table for it.
    table = folder.GetHierarchyTable(0)
    children = []
    rows = mapi.HrQueryAllRows(table, (PR_ENTRYID,
                                       PR_STORE_ENTRYID,
                                       PR_DISPLAY_NAME_A), None, None, 0)
    for (eid_tag, eid), (storeeid_tag, store_eid), (name_tag, name) in rows:
        try:
            # The entry id from the hierarchy table is a short term one, so
            # open the folder and ask it for its long term entry id.
            child_folder = msgstore.OpenEntry(eid, None, mapi.MAPI_DEFERRED_ERRORS)
            prop_ids = PR_ENTRYID, PR_STORE_ENTRYID
            hr, data = child_folder.GetProps(prop_ids, 0)
            folder_eid = data[0][1]
        except pythoncom.error:
            # Something strange with this folder - just ignore it
            continue
        # Store the long term id, which stays valid after the tree is built.
        folder_id = mapi.HexFromBin(store_eid), mapi.HexFromBin(folder_eid)
        spec = FolderSpec(folder_id, name)
        spec.children = _BuildFoldersMAPI(msgstore, child_folder)
        children.append(spec)
    return children

The bad news, of course, is that this (I believe) means that MAPI works, but that means my nice build-on-demand code is broken.  I guess I'll have to re-implement it using MAPI...<sigh>

=Tony Meyer

From anthony at interlink.com.au  Fri Jan 17 17:29:58 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 17 01:31:40 2003
Subject: [Spambayes] Sourceforge :pserver cvs access broken... 
In-Reply-To: <1042778024.22400.16.camel@bonzo.ned.dem.csiro.au> 
Message-ID: <200301170629.h0H6TwK28960@localhost.localdomain>


There's now a nightly snapshot available from the front page 
of the website.

At the moment I'm building them from my laptop and pushing them
out - once the pserver's working again, I'll move it to a cron 
job at SF.

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From whisper at oz.net  Thu Jan 16 23:06:24 2003
From: whisper at oz.net (David LeBlanc)
Date: Fri Jan 17 02:05:46 2003
Subject: [Spambayes] SF CVS
Message-ID: <GCEDKONBLEFPPADDJCOEEEOIHGAA.whisper@oz.net>

What they say:
(2003-01-14 14:04:19 - Project CVS Services)   As of 2003-01-14,
pserver-based CVS repository access and ViewCVS (web-based) CVS repository
access have been taken offline as to stabilize CVS server performance for
developers. These services will be re-enabled as soon as the underlying
scalability issues have been analyzed and resolved (as soon as 2003-01-15,
if possible). Additional updates will be posted to the Site Status page as
they become available. Your patience is appreciated.


David LeBlanc
Seattle, WA USA 

From anthony at interlink.com.au  Fri Jan 17 18:36:43 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 17 02:38:26 2003
Subject: [Spambayes] 1.0a1 is done.
Message-ID: <200301170736.h0H7ah429690@localhost.localdomain>


I'm done with the packaging &c of the first pre-release. Can people
have a look at this and see what's missing/busted/stupid, and let me
know? Should we drop a note to something like comp.lang.python.announce?

See the download page on the website for more...

Anthony

From francois.granger at free.fr  Fri Jan 17 11:54:18 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Fri Jan 17 05:54:24 2003
Subject: [Spambayes] Using Spambayes w/ Eudora
In-Reply-To: <a05200f02ba49612f02f9@[10.0.1.3]>
References: <a05200f02ba49612f02f9@[10.0.1.3]>
Message-ID: <a05200f06ba4d8c2ea338@[192.168.1.20]>

At 22:49 -0800 13/01/2003, in message Re: [Spambayes] Using Spambayes 
w/ Eudora, Tony Lownds wrote:
>I have a startup script that sets everything up

I did copy and paste your modifications on my MacOS X station. I am 
currently using an "old" version of Spambayes... most recent files 
being dated December 29.

It works perfectly on two pop servers. It does not work on a third one. 
I can't figure out why. I exchanged the local proxy addresses and the 
same server was unreachable. I guess this is my problem ;-) this 
server being pop.laposte.net, it may be kind of "special". But I can 
reach it directly with Eudora or fetchmail (home) and Entourage 
(work).


-- 
Recently using MacOSX.......

From t.a.meyer at massey.ac.nz  Fri Jan 17 11:28:14 2003
From: t.a.meyer at massey.ac.nz (t.a.meyer@massey.ac.nz)
Date: Fri Jan 17 06:28:21 2003
Subject: [Spambayes] 1.0a1 is done.
Message-ID: <E18ZUfW-0007Uj-00@grunt2.ihug.co.nz>

> See the download page on the website for more...

This 404's for me.

=Tony Meyer


From anthony at interlink.com.au  Fri Jan 17 22:40:40 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 17 06:42:04 2003
Subject: [Spambayes] 1.0a1 is done. 
In-Reply-To: <E18ZUfW-0007Uj-00@grunt2.ihug.co.nz> 
Message-ID: <200301171140.h0HBefc32422@localhost.localdomain>


>>> t.a.meyer@massey.ac.nz wrote
> > See the download page on the website for more...
> 
> This 404's for me.

which does? I can't seem to see any 404s...?

From mwh at python.net  Fri Jan 17 11:48:03 2003
From: mwh at python.net (Michael Hudson)
Date: Fri Jan 17 06:48:08 2003
Subject: [Spambayes] Re: 1.0a1 is done.
References: <E18ZUfW-0007Uj-00@grunt2.ihug.co.nz>
	<200301171140.h0HBefc32422@localhost.localdomain>
Message-ID: <2mn0m0xaj0.fsf@starship.python.net>

Anthony Baxter <anthony@interlink.com.au> writes:

> >>> t.a.meyer@massey.ac.nz wrote
> > > See the download page on the website for more...
> > 
> > This 404's for me.
> 
> which does? I can't seem to see any 404s...?

http://spambayes.sourceforge.net/downloads.html

404s for me.

Ah, there's a link from the index page to the above; it has an extra
's' at the end...

Cheers,
M>


From anthony at interlink.com.au  Fri Jan 17 22:52:55 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 17 06:54:15 2003
Subject: [Spambayes] Re: 1.0a1 is done. 
In-Reply-To: <2mn0m0xaj0.fsf@starship.python.net> 
Message-ID: <200301171152.h0HBqtE32557@localhost.localdomain>


>>> Michael Hudson wrote
> http://spambayes.sourceforge.net/downloads.html
> 
> 404s for me.
> 
> Ah, there's a link from the index page to the above; it has an extra
> 's' at the end...

dammit. thanks. fixed.

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From Alexander at Leidinger.net  Fri Jan 17 13:47:41 2003
From: Alexander at Leidinger.net (Alexander Leidinger)
Date: Fri Jan 17 07:48:16 2003
Subject: [Spambayes] Stemming and stopword elemination
Message-ID: <20030117134741.1e88011c.Alexander@Leidinger.net>

Hi,

Has anyone already experimented with Information Retrieval techniques
like stopword elimination (stopwords: the, a, an, or, and, ...) and word
stemming?

See http://www.tartarus.org/~martin/PorterStemmer for a description of
the algorithm for English text and a Python implementation, or
http://snowball.tartarus.org/ for non-English stemmers.

I don't think this will change the failure rate significantly (maybe
better results with little training data, maybe worse; I don't expect
much change with large training data), but it should reduce the size of
the needed database.
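
To make the idea concrete, here is a rough sketch of a tokenizer with both
steps applied. The stopword list is just the one given above, and
crude_stem() is a deliberately naive suffix stripper used as a stand-in; it
is not the Porter algorithm from the links above, and none of these names
come from the spambayes tokenizer.

```python
STOPWORDS = frozenset(["the", "a", "an", "or", "and"])
SUFFIXES = ("ing", "ed", "s")  # illustrative only, far cruder than Porter


def crude_stem(word):
    """Strip one common English suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def tokenize(text):
    """Lowercase, split on whitespace, drop stopwords, stem what is left."""
    tokens = []
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'()")
        if word and word not in STOPWORDS:
            tokens.append(crude_stem(word))
    return tokens
```

With this sketch, "The filters and the filtering" yields two tokens that are
both "filter", so the database holds one entry where it would otherwise hold
four, which is the size reduction in question. Whether it helps or hurts the
error rates is exactly what would need testing.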

Bye,
Alexander.

-- 
               I believe the technical term is "Oops!"

http://www.Leidinger.net                       Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7

From papaDoc at videotron.ca  Fri Jan 17 09:15:09 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Fri Jan 17 09:15:10 2003
Subject: [Spambayes] Re: 1.0a1 is done
Message-ID: <3E280FED.5070305@videotron.ca>

Hi,

I tested it and this is what I found

1- I think there is a typo in INTEGRATION.txt
183c183
< The minimum you need to do to get started is create a bayescustomize.ini
---
 > The minimum you need too do to get started is create a bayescustomize.ini

2- When I try to run pop3graph.py. I get this error message
Traceback (most recent call last):
   File "D:\REMI_N~1\MAILFI~1\SPAMBA~1.0A1\UTILIT~1\POP3GR~1.PY", line 12, in ?
     from spambayes import  mboxutils
ImportError: No module named spambayes

3- This is not a problem with the release, but I will ask anyway:
I have been running pop3proxy for a while, so I have accumulated some ham and
spam that pop3proxy saved in the cache files.
How can I make the new pop3proxy aware of those files?
If I only copy the cache file or define the values
pop3proxy_spam_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-spam-cache
pop3proxy_ham_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-ham-cache
pop3proxy_unknown_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-unknown-cache
I still have in the web interface
  Total emails trained: Spam: 0  Ham: 0
instead of
  Total emails trained: Spam: 68  Ham: 93


From tim at fourstonesExpressions.com  Fri Jan 17 08:18:55 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 17 09:19:32 2003
Subject: [Spambayes] Re: 1.0a1 is done
Message-ID: <A7A070PJA8454541VPO73YOJQO.3e2810cf@myst>

1/17/2003 8:15:09 AM, papaDoc <papaDoc@videotron.ca> wrote:

>Hi,
>
>I tested it and this is what I found
>
>1- I think there is a typo in INTEGRATION.txt
>183c183
>< The minimum you need to do to get started is create a 
bayescustomize.ini
>---
> > The minimum you need too do to get started is create a 
bayescustomize.ini
>
>2- When I try to run pop3graph.py. I get this error message
>Traceback (most recent call last):
>   File "D:\REMI_N~1\MAILFI~1\SPAMBA~1.0A1\UTILIT~1\POP3GR~1.PY", 
line 
>12, in ?
>     from spambayes import  mboxutils
>ImportError: No module named spambayes
>
>3- This is not a problem with the release but I will ask
>I'm running pop3proxy since a while so I have accumulated some ham 
and 
>spam that pop3proxy saved in the cache file.
>How can I make the new pop3proxy aware of those file.
>If I only copy the cache file or define the values
>pop3proxy_spam_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-spam-
cache
>pop3proxy_ham_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-ham-
cache
>pop3proxy_unknown_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-
unknown-cache
>I still have in the web interface
>  Total emails trained: Spam: 0  Ham: 0
>instead of
>  Total emails trained: Spam: 68  Ham: 93

Right now the only way to handle this is to retrain your database 
using hammiefilter.  A bit of a pain, but it's your only option.  - 
TimS

>
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From bkc at murkworks.com  Fri Jan 17 09:30:06 2003
From: bkc at murkworks.com (Brad Clements)
Date: Fri Jan 17 09:21:32 2003
Subject: [Spambayes] spamconference webcasts
Message-ID: <3E27CAEC.14277.6FB1C7CD@localhost>

For those who don't know, the spam conference is online live now ..

http://www.spamconference.org  follow the link to webcasts


-- 
Brad Clements,                bkc@murkworks.com   (315)268-1000
http://www.murkworks.com                          (315)268-9812 Fax
http://www.wecanstopspam.org/                   AOL-IM: BKClements


From skip at pobox.com  Fri Jan 17 08:43:35 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 09:43:51 2003
Subject: [Spambayes] Sourceforge :pserver cvs access broken...
In-Reply-To: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au>
References: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au>
Message-ID: <15912.5783.216516.749029@montanaro.dyndns.org>


    Frank> Does anyone (by any small miracle) have a mirror of the cvs tree
    Frank> that they'd be willing to put online while SF gets it's act
    Frank> together?

Not a mirror, but I just put a gzipped tar file snapshot at

    http://www.musi-cal.com/~skip/python.spambayes.tar.gz

I'd be happy to update it periodically, though I have to do it manually,
since on that machine cvs prompts me for my SF password when I 'cvs up'.

Skip


From skip at pobox.com  Fri Jan 17 08:46:49 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 09:46:59 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEOADIAB.tim.one@comcast.net>
References: <3E25521B.20937.3607FFA@localhost>
        <LNBBLJKPBEHFEDALKOLCEEOADIAB.tim.one@comcast.net>
Message-ID: <15912.5977.869435.819287@montanaro.dyndns.org>


    Tim> Less spam means more time for fun.  Too bad I was kicked off the
    Tim> project <wink>.

That's what you get for having too much fun.  Barry got jealous.  (Those
bass players are a jealous lot you know.) ;-)

Skip


From skip at pobox.com  Fri Jan 17 09:00:39 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 10:00:49 2003
Subject: [Spambayes] spamconference webcasts
In-Reply-To: <3E27CAEC.14277.6FB1C7CD@localhost>
References: <3E27CAEC.14277.6FB1C7CD@localhost>
Message-ID: <15912.6807.897201.900909@montanaro.dyndns.org>


    Brad> For those who don't know, the spam conference is online live now ..
    Brad> http://www.spamconference.org  follow the link to webcasts

Thanks!  So much for getting any other work done today. ;-)

Skip

From tim at fourstonesExpressions.com  Fri Jan 17 09:02:34 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 17 10:03:16 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <15912.5977.869435.819287@montanaro.dyndns.org>
Message-ID: <1YUQA7UQYT421W2UZVGAGBDBIGVQVT51.3e281b0a@myst>

1/17/2003 8:46:49 AM, Skip Montanaro <skip@pobox.com> wrote:

>
>    Tim> Less spam means more time for fun.  Too bad I was kicked off the
>    Tim> project <wink>.
>
>That's what you get for having too much fun.  Barry got jealous.  (Those
>bass players are a jealous lot you know.) ;-)

Ya, and all for what...?  Using two fingers at a time to play one note at a 
time?  <wink>  - TimS

>
>Skip
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From anthony at interlink.com.au  Sat Jan 18 03:04:22 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Fri Jan 17 11:07:21 2003
Subject: [Spambayes] spamconference webcasts 
In-Reply-To: <15912.6807.897201.900909@montanaro.dyndns.org> 
Message-ID: <200301171604.h0HG4NK01834@localhost.localdomain>


>>> Skip Montanaro wrote
> 
>     Brad> For those who don't know, the spam conference is online live now ..
>     Brad> http://www.spamconference.org  follow the link to webcasts
> 
> Thanks!  So much for getting any other work done today. ;-)


... or sleep :-/


From skip at pobox.com  Fri Jan 17 11:35:34 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 12:37:02 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
Message-ID: <15912.16102.713265.622424@montanaro.dyndns.org>

In Corpus.Message, __getattr__ is defined as

    def __getattr__(self, attributeName):
        '''On-demand loading of the message text.'''

        if attributeName in ('hdrtxt', 'payload'):
            self.load()
        return getattr(self, attributeName)

This has to be an infloop, right?

Skip

From richie at entrian.com  Fri Jan 17 18:18:09 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 17 13:18:37 2003
Subject: [Spambayes] Re: 1.0a1 is done
In-Reply-To: <3E280FED.5070305@videotron.ca>
References: <3E280FED.5070305@videotron.ca>
Message-ID: <7jhg2v8c91f582f1mm71smk7vrkff7btt3@4ax.com>


> 2- When I try to run pop3graph.py. I get this error message

I'll add this to my ever-growing list of things to do...

> 3- This is not a problem with the release but I will ask
> I'm running pop3proxy since a while so I have accumulated some ham and 
> spam that pop3proxy saved in the cache file.
> How can I make the new pop3proxy aware of those file.

You can upload them into the web interface, via the "Train on a given
message" form (which I should probably rename, now that it supports
mbox files).

> If I only copy the cache file or define the values
> pop3proxy_spam_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-spam-cache
> pop3proxy_ham_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-ham-cache
> pop3proxy_unknown_cache: D:/Remi_NoBackup/MailFilter/pop3proxy-unknown-cache

Ooo, don't do that!  Those caches need to be directories - things will get
very confused if you make them files.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Fri Jan 17 18:50:54 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 17 13:51:22 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <15912.16102.713265.622424@montanaro.dyndns.org>
References: <15912.16102.713265.622424@montanaro.dyndns.org>
Message-ID: <d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>

Hi Skip,

> In Corpus.Message, __getattr__ is defined as
> 
>     def __getattr__(self, attributeName):
>         '''On-demand loading of the message text.'''
> 
>         if attributeName in ('hdrtxt', 'payload'):
>             self.load()
>         return getattr(self, attributeName)
> 
> This has to be an infloop, right?

It should probably be:

    return self.__dict__[attributeName]

so that it raises an exception when something goes wrong.  This is probably
related to
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702

The suggested fix in the bug report looks needlessly destructive to me -
I'd use something like (untested)

        if bmatch:
            self.payload = bmatch.group(2)
            self.hdrtxt = sub[:bmatch.start(2)]
        else:
            self.payload = sub
            self.hdrtxt = ""

Skip, since you're having trouble with this and I can't reproduce it, could
you try the above edit?  Tim S if you're listening, any better ideas?

-- 
Richie Hindle
richie@entrian.com


From tim.one at comcast.net  Fri Jan 17 14:12:48 2003
From: tim.one at comcast.net (Tim Peters)
Date: Fri Jan 17 14:14:09 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <1YUQA7UQYT421W2UZVGAGBDBIGVQVT51.3e281b0a@myst>
Message-ID: <BIEJKCLHCIOIHAGOKOLHMEAMEJAA.tim.one@comcast.net>

[TimP]
> Less spam means more time for fun.  Too bad I was kicked off the
> project <wink>.

[Skip Montanaro]
> That's what you get for having too much fun.  Barry got jealous.  (Those
> bass players are a jealous lot you know.) ;-)

[TimS]
> Ya, and all for what...?  Using two fingers at a time to play one
> note at a time?  <wink>  - TimS

That's on Barry's best day.  Usually he plays about one note per minute, due
to heavy drool landing on a string.  Sometimes he uses a finger to try to
push the stream into position, though, so he's more advanced than most bass
players.

we-only-hire-the-best-ly y'rs  - tim


From tim at fourstonesExpressions.com  Fri Jan 17 13:14:20 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 17 14:14:57 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <15912.16102.713265.622424@montanaro.dyndns.org>
Message-ID: <QBQSNSN63XKHVPRQTOLGUQNMMION.3e28560c@myst>

I'm not sure how this ever worked!  Unfortunately, I'm in the middle of 
changing workstations right now, and don't have cvs up and running yet, so I 
can't fix it...  

1/17/2003 11:35:34 AM, Skip Montanaro <skip@pobox.com> wrote:

>In Corpus.Message, __getattr__ is defined as
>
>    def __getattr__(self, attributeName):
>        '''On-demand loading of the message text.'''
>
>        if attributeName in ('hdrtxt', 'payload'):
>            self.load()
>        return getattr(self, attributeName)
>
>This has to be an infloop, right?
>
>Skip
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From just at letterror.com  Fri Jan 17 20:29:15 2003
From: just at letterror.com (Just van Rossum)
Date: Fri Jan 17 14:29:33 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
Message-ID: <r01050400-1023-F90669F32A5111D79FAB003065D5E7E4@[10.0.0.23]>

Richie Hindle wrote:

> > In Corpus.Message, __getattr__ is defined as
> > 
> >     def __getattr__(self, attributeName):
> >         '''On-demand loading of the message text.'''
> > 
> >         if attributeName in ('hdrtxt', 'payload'):
> >             self.load()
> >         return getattr(self, attributeName)
> > 
> > This has to be an infloop, right?
> 
> It should probably be:
> 
>     return self.__dict__[attributeName]
> 
> so that it raises an exception when something goes wrong. [ ... ]

Neither makes sense (unless I'm missing some magic context): __getattr__
is only called if the attr isn't found the normal way, which means it's
for sure not in self.__dict__.
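
The lookup rule described above can be checked with a minimal sketch.  It
is written in present-day Python 3 syntax rather than the Python 2.2 of
this thread, and the classes Broken and Present are hypothetical
stand-ins, not the real Corpus.Message:

```python
class Broken:
    """Hypothetical stand-in whose load() fails to set any attribute."""

    def load(self):
        pass  # simulates a load that leaves hdrtxt and payload unset

    def __getattr__(self, name):
        if name in ('hdrtxt', 'payload'):
            self.load()
        # Normal lookup fails again, so this re-enters __getattr__.
        return getattr(self, name)


class Present:
    """Attributes found the normal way never reach __getattr__."""

    def __init__(self):
        self.payload = 'body'

    def __getattr__(self, name):
        raise AssertionError('not reached for attributes that exist')


try:
    Broken().payload
    outcome = 'returned'
except RecursionError:
    outcome = 'recursion'

print(outcome)
print(Present().payload)
```

The first class recurses until the interpreter gives up, because nothing
ever puts the attribute where the normal lookup can find it; the second
never enters __getattr__ at all.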

Just

From just at letterror.com  Fri Jan 17 20:54:57 2003
From: just at letterror.com (Just van Rossum)
Date: Fri Jan 17 14:55:07 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <r01050400-1023-F90669F32A5111D79FAB003065D5E7E4@[10.0.0.23]>
Message-ID: <r01050400-1023-8F8286EF2A5511D79FAB003065D5E7E4@[10.0.0.23]>

Just van Rossum wrote:

[never mind what I wrote... self.load() obviously loads the right attrs]

That said, yeah, looking it up in self.__dict__ is better, but you must
then catch KeyError and raise AttributeError instead.

Just
 

From tim at fourstonesExpressions.com  Fri Jan 17 14:03:44 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 17 15:04:29 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <r01050400-1023-F90669F32A5111D79FAB003065D5E7E4@[10.0.0.23]>
Message-ID: <A7A962A6LKHD09RL341VSR4HGZV85.3e2861a0@myst>

1/17/2003 1:29:15 PM, Just van Rossum <just@letterror.com> wrote:

>Richie Hindle wrote:
>
>> > In Corpus.Message, __getattr__ is defined as
>> > 
>> >     def __getattr__(self, attributeName):
>> >         '''On-demand loading of the message text.'''
>> > 
>> >         if attributeName in ('hdrtxt', 'payload'):
>> >             self.load()
>> >         return getattr(self, attributeName)
>> > 
>> > This has to be an infloop, right?
>> 
>> It should probably be:
>> 
>>     return self.__dict__[attributeName]
>> 
>> so that it raises an exception when something goes wrong. [ ... ]
>
>Neither makes sense (unless I'm missing some magic context): __getattr__
>is only called if the attr isn't found the normal way, which means it's
>for sure not in self.__dict__.

It's not an infloop if self.load() sets the attributes hdrtxt and payload, AND 
attributeName is in ('hdrtxt', 'payload').  Obviously both of these conditions 
are not being met.  self.load() ends up calling self.setSubstance, which does 
nothing if the message substance cannot be split into header text and payload.  
This is an error.  setSubstance should look like:

    def setSubstance(self, sub):
        '''set this message substance'''

        bodyRE = re.compile(r"\r?\n(\r?\n)(.*)", re.DOTALL+re.MULTILINE)
        bmatch = bodyRE.search(sub)
        if bmatch:
            self.payload = bmatch.group(2)
            self.hdrtxt = sub[:bmatch.start(2)]
        else:
            self.payload = sub  #we don't have valid headers, only payload
            self.hdrtxt = ''

and __getattr__ should look like:

    def __getattr__(self, attributeName):
        '''On-demand loading of the message text.'''

        if attributeName in ('hdrtxt', 'payload'):
            self.load()
            # will recurse if load does not set hdrtxt or payload
            return getattr(self, attributeName)
        else:
            # we should never get here.  if we do, some attribute is missing
            # and we don't know what to do about it
            raise AttributeError, attributeName
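
The header/payload split proposed for setSubstance can be exercised on
its own.  This is only a sketch: split_substance is a hypothetical free
function (the original is a method that sets attributes), and it uses the
same regular expression as above in Python 3 syntax:

```python
import re

# Same pattern as in the proposed setSubstance: the first blank line
# separates the header text from the payload.
BODY_RE = re.compile(r"\r?\n(\r?\n)(.*)", re.DOTALL | re.MULTILINE)


def split_substance(sub):
    """Return (hdrtxt, payload); never leaves either value unset."""
    bmatch = BODY_RE.search(sub)
    if bmatch:
        return sub[:bmatch.start(2)], bmatch.group(2)
    # No blank line found: treat everything as payload.
    return '', sub


print(split_substance("Subject: hi\r\n\r\nbody text"))
print(split_substance("no headers here"))
print(split_substance(""))
```

The point of the else branch is that both values are always defined, even
for an empty message, so a later attribute access has something to find.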

Unfortunately, I don't have cvs access to fix this at the moment.
>
>Just
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From skip at pobox.com  Fri Jan 17 14:06:52 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 15:07:27 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
References: <15912.16102.713265.622424@montanaro.dyndns.org>
        <d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
Message-ID: <15912.25180.638042.95791@montanaro.dyndns.org>


    Richie> It should probably be:

    Richie>     return self.__dict__[attributeName]

    Richie> so that it raises an exception when something goes wrong.  This
    Richie> is probably related to
    Richie> https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702

Yes, now that I know what's going on, I understand why I was getting
infinite loops.  The __getattr__ method is really only meant to initialize
payload and hdrtxt.  Any other attributes should raise AttributeError.  I
corrected the code in Corpus.py and closed out the bug report.

Skip

From skip at pobox.com  Fri Jan 17 14:10:34 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 15:10:45 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <r01050400-1023-F90669F32A5111D79FAB003065D5E7E4@[10.0.0.23]>
References: <d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
        <r01050400-1023-F90669F32A5111D79FAB003065D5E7E4@[10.0.0.23]>
Message-ID: <15912.25402.44471.77970@montanaro.dyndns.org>


    Just> Neither makes sense (unless I'm missing some magic context):
    Just> __getattr__ is only called if the attr isn't found the normal way,
    Just> which means it's for sure not in self.__dict__.

Well, I think Richie meant it should be:

    def __getattr__(self, attributeName):
        '''On-demand loading of the message text.'''

        if attributeName in ('hdrtxt', 'payload'):
            self.load()
        try:
            return self.__dict__[attributeName]
        except KeyError:
            raise AttributeError, attributeName

That is, __getattr__ is called when hdrtxt or payload are accessed but not
yet initialized.  All other accesses (or if self.load() fails somehow)
should raise AttributeError.  See Corpus.py 1.3.
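
A runnable sketch of that behaviour, in present-day Python 3 syntax.
LazyMessage, its _raw field and its loads counter are hypothetical and
only illustrate the pattern; this is not the code in Corpus.py:

```python
class LazyMessage:
    """Hypothetical, simplified stand-in for an on-demand loaded message."""

    def __init__(self, raw):
        self._raw = raw
        self.loads = 0  # counts how often load() actually runs

    def load(self):
        self.loads += 1
        head, sep, body = self._raw.partition('\n\n')
        if sep:
            self.hdrtxt, self.payload = head + sep, body
        else:
            self.hdrtxt, self.payload = '', self._raw

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has failed.
        if name in ('hdrtxt', 'payload'):
            self.load()
        try:
            return self.__dict__[name]
        except KeyError:
            raise AttributeError(name) from None


msg = LazyMessage('Subject: hi\n\nbody')
print(msg.payload, msg.loads)   # first access triggers load()
print(msg.hdrtxt, msg.loads)    # already set, so no second load()
print(hasattr(msg, 'nosuch'))   # unknown names raise AttributeError
```

Looking the name up in self.__dict__ instead of calling getattr() means a
failed load ends in an AttributeError rather than in unbounded recursion.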

Skip


From noreply at sourceforge.net  Fri Jan 17 12:09:18 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jan 17 15:16:20 2003
Subject: [Spambayes] [ spambayes-Bugs-651365 ] getattr recursion in Corpus.py
Message-ID: <E18Zcnm-0005eb-00@sc8-sf-web1.sourceforge.net>

Bugs item #651365, was opened at 2002-12-10 04:42
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Wolfgang Strobl (strobl)
Assigned to: Tim Stone (timstone4)
Summary: getattr recursion in Corpus.py

Initial Comment:
After feeding a bunch of new messages into pop3proxy, 
classifying them and when trying to save the result, I got 
a recursion loop (followed by recursion depth exceeded) 
in \cvshome\spambayes\Corpus.py|__getattr__|269]

After looking into setSubstance, I noticed that 
setSubstance (called by load) only sets the attributes 
payload and hdrtxt when the pattern matches. 

I temporarily added an else clause to bmatch, i.e.

        if bmatch:
            self.payload = bmatch.group(2)
            self.hdrtxt = sub[:bmatch.start(2)]
            print ".",
        else:
            self.payload = "nix\r\n"
            self.hdrtxt="nix\r\n"
            print "?", len(sub),

and indeed, when trying to save, I notice that after about 
800 good messages, ~ 100 have an empty message, 
see the output below. 

I don't really know what I'm doing here, but this fix at 
least allows me to continue.

-------------------------

C:\archiv\cvshome\spambayes>python -u pop3proxy.py -l 8110 mail.gmd.de
Loading database... Done.
Listener on port 8110 is proxying mail:110
User interface url is http://localhost:8880
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 ? 0 ? 0 ? 0 ? 0 ?
0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 0 ? 
0 . . . . . . . . .
. . . . . .

-----------------------
Initial traceback:

error: uncaptured python exception, closing channel <__main__.UserInterface connected at 0x2213470>
(exceptions.RuntimeError:maximum recursion depth exceeded
[C:\Python22\lib\asyncore.py|poll|95]
[C:\Python22\lib\asyncore.py|handle_read_event|392]
[C:\Python22\lib\asynchat.py|handle_read|112]
[C:\archiv\cvshome\spambayes\pop3proxy.py|found_terminator|804]
[C:\archiv\cvshome\spambayes\pop3proxy.py|onRequest|830]
[C:\archiv\cvshome\spambayes\pop3proxy.py|onReview|1093]
[C:\archiv\cvs\spambayes\Corpus.py|takeMessage|188]
[C:\archiv\cvs\spambayes\FileCorpus.py|addMessage|140]
[C:\archiv\cvs\spambayes\FileCorpus.py|store|231]
[C:\archiv\cvs\spambayes\Corpus.py|getSubstance|318]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getattr__|269]
[C:\archiv\cvs\spambayes\Corpus.py|__getat


----------------------------------------------------------------------

>Comment By: Skip Montanaro (montanaro)
Date: 2003-01-17 14:09

Message:
Logged In: YES 
user_id=44345

Fixed by restricting __getattr__ (make it raise AttributeError at appropriate times) and handle the case here where the message text isn't formatted as expected.  See Corpus.py 1.3.

----------------------------------------------------------------------

Comment By: Skip Montanaro (montanaro)
Date: 2003-01-16 10:39

Message:
Logged In: YES 
user_id=44345

Assigning to Tim Stone.  I think this is the same problem I reported on the
list the other day.  I think the offending code is in Corpus.__getitem__.  The
test of amsg - "if not amsg" should be "if amsg is None" I think.  I suspect
a fix further up the line as the OP indicated would probably do the trick.

If you don't do something to set self.hdrtxt I believe it is None and you infloop trying to resolve a non-existent __nonzero__ method.

Something like that. ;-)

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=651365&group_id=61702

From richie at entrian.com  Fri Jan 17 20:21:36 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 17 15:22:08 2003
Subject: [Spambayes] New POP3 proxy and web interface
Message-ID: <5apg2voqi3c5s01ib4b5kstgjvvfhl85ni@4ax.com>

Hello all,

For those not subscribed to the checkins-list:

You can now run pop3proxy.py with no POP3 servers, and
just get the web interface.  I'll split it into different
source files at some point so that the naming is more
sensible.  This should let Skip use it instead of his
proxytrainer.py.

Tim Stone's web-based configurator is now a part of the
main web interface.

The fact that you can run the thing without any POP3
proxies set up, and that the config page is now a part of
it, means that you don't need to touch bayescustomize.ini,
even when starting from scratch.  Run pop3proxy.py, hit
the Configuration link, enter your POP3 details, and
you're away.

There's a new architecture for pop3proxy and the web
interface.  The HTML is now all in resources/ui.html, with
the pieces being pulled out and stitched together at
runtime.  All the socket/async code has been pulled out
into a library module, so there's only application code
left in pop3proxy.py (it's still a combination of web UI
and POP3 proxy, which I'll address RSN).

I've added a new directory 'resources' for the HTML and
GIFs.  These are packaged using Mike Fletcher's excellent
ResourcePackage tool, but you don't need to know about
that, or have ResourcePackage installed, unless you want
to change the resources.

I've added a new option html_ui_allow_remote_connections,
which can be set to False to provide some measure of
privacy (I'm loath to say 'security' for fear of bugs 8-)

I've also added some pretty icons to the web interface,
because I couldn't help myself.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Fri Jan 17 20:21:39 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 17 15:22:12 2003
Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in
In-Reply-To: <15910.61389.133887.569308@montanaro.dyndns.org>
References: <15910.61389.133887.569308@montanaro.dyndns.org>
Message-ID: <rlkg2vogpv40bbu7879emjbida93oefk1n@4ax.com>

Hi Skip,

> I just checked in proxytrainer.py and proxytee.py.  The former is
> essentially pop3proxy.py with the POP stuff removed.

I've just checked in a version of pop3proxy.py that can run with no POP3
servers configured, so it just provides the web interface.  This should let
you use it instead of your (hopefully interim!) proxytrainer.py - just move
your onUpload method into it.

You should really make your message-naming code use the same system as
everything else - the names are unix timestamps of when each message was
received, and are used to paginate the training pages into one day per page
(by day received rather than potentially-broken Date header).  If you want
me to do that then let me know, but I have an ever-growing to-do list...
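
The naming scheme described above makes the one-day-per-page grouping
easy to sketch.  The function pages_by_day is hypothetical, and the
sketch assumes message names are plain integer timestamps and that days
are cut at UTC midnight; the real code may differ on both points:

```python
from collections import defaultdict
from datetime import datetime, timezone


def pages_by_day(names):
    """Group timestamp-named messages into one page per day received."""
    pages = defaultdict(list)
    for name in sorted(names, key=int):
        # Assumption: the name is the receive time in seconds since the epoch.
        day = datetime.fromtimestamp(int(name), tz=timezone.utc).date()
        pages[day.isoformat()].append(name)
    return dict(pages)


pages = pages_by_day(['1042859811', '1042773411', '1042773500'])
for day, names in pages.items():
    print(day, names)
```

Because the page key comes from the receive time, a missing or forged
Date header in the message itself has no effect on where it is listed.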

> A bit further down the road, I will probably dump the asyncore stuff in
> favor of something based on SimpleHTTPServer just to reduce the number of
> lines of code.  Without the POP stuff going on there's no great need for the
> channel multiplexing.

If I can persuade you to use pop3proxy (or its successor, a generic
Spambayes server that can optionally host either or both of the web UI and
the POP3 proxy), you won't need to pull out the async stuff.  And all the
async-related code is now refactored into a separate module, so
pop3proxy.py is a good deal smaller than it was.  It'll be smaller still
when the core server, POP3 proxy, and web UI parts are all separated.  I'm
trying to unify the servers we have (eg. my latest edits make Tim Stone's
OptionConfig.py a part of pop3proxy.py - again, ignore the bad naming, I'm
going to fix that - I'm doing it in stages to make CVS remain useful).  I'd
rather other people didn't fork off new servers at the same time as I'm
trying to unify them!

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Fri Jan 17 20:21:42 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 17 15:22:15 2003
Subject: [Spambayes] pop3proxy.UserInterface.onSave - self.shutdown?
In-Reply-To: <15910.4104.787891.400893@montanaro.dyndns.org>
References: <15910.4104.787891.400893@montanaro.dyndns.org>
Message-ID: <s5mg2vcorh7kvvc01popn9e26dhh4ikeuu@4ax.com>


> Pychecker complains about the call to self.shutdown(2) on line 1441 of
> pop3proxy.py.  It should probably be self.socket.shutdown(2), but I'll let
> someone else who knows the code better verify that.

asyncore is using __getattr__ to proxy unknown method calls to the
underlying socket.  I've changed it anyway, to keep PyChecker happy.
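
The delegation trick can be shown without asyncore itself.  Channel and
FakeSocket below are hypothetical classes that only illustrate why both
spellings of the call work; they are not the asyncore implementation:

```python
class FakeSocket:
    """Hypothetical socket that records shutdown() calls."""

    def __init__(self):
        self.calls = []

    def shutdown(self, how):
        self.calls.append(how)


class Channel:
    """Forwards unknown attribute lookups to the wrapped socket."""

    def __init__(self, sock):
        self.socket = sock

    def __getattr__(self, name):
        # Reached only for names the channel itself does not define.
        return getattr(self.socket, name)


sock = FakeSocket()
channel = Channel(sock)
channel.shutdown(2)          # resolved through __getattr__
channel.socket.shutdown(2)   # explicit form, visible to static checkers
print(sock.calls)
```

A static checker cannot see methods that only exist through __getattr__,
which is presumably why the explicit form keeps it quiet.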

-- 
Richie Hindle
richie@entrian.com


From tim at fourstonesExpressions.com  Fri Jan 17 14:39:41 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 17 15:40:21 2003
Subject: [Spambayes] spamconference webcasts 
In-Reply-To: <15912.12331.478178.788727@montanaro.dyndns.org>
Message-ID: <41FCE97ZV75Y3V961YOKNHZ1X32EA.3e286a0d@myst>

This $0phhhhht dude is kinda fulla somethin... - TimS

1/17/2003 10:32:43 AM, Skip Montanaro <skip@pobox.com> wrote:

>
>    Anthony> I just wish they would use fonts that are readable through the
>    Anthony> webcast...
>
>In all fairness, I wonder if they knew it would be webcast?  (One would
>think Paul Graham ought to have known.)
>
>S
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From tim.one at comcast.net  Fri Jan 17 15:51:54 2003
From: tim.one at comcast.net (Tim Peters)
Date: Fri Jan 17 15:53:27 2003
Subject: [Spambayes] Stemming and stopword elemination
In-Reply-To: <20030117134741.1e88011c.Alexander@Leidinger.net>
Message-ID: <BIEJKCLHCIOIHAGOKOLHMEBMEJAA.tim.one@comcast.net>

[Alexander Leidinger]
> Has anyone already experimented with Information Retrieval techniques
> like stopword elimination (stopwords: the, a, an, or, and, ...) and word
> stemming?

Yes and no.

Stopword elimination doesn't make sense here.  A typical IR application
requires space proportional to the number of times a word appears, but this
app doesn't:  one word == one database entry, no matter how many times the
word appears.  Identifying stopwords would complicate and slow the code, and
introduce language dependence, for a trivial database savings.

Some Classic Bayesian classifiers remove stopwords for another reason
(related to one discussed below), but that reason doesn't make sense in this
code either:  when scoring, the classifier automatically ignores words with
a spamprob close to 0.5, so stopwords that truly *are* common across all
kinds of texts have no effect on scoring.
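
As a rough illustration of that last point (a sketch only, not the
classifier's actual code): if scoring drops every token whose spamprob lies
within some distance of 0.5, true stopwords fall out on their own.  The
function name, the 0.1 cutoff, and the sample probabilities are assumptions
made up for the example.

```python
def significant_tokens(spamprobs, min_distance=0.1):
    """Keep only tokens whose spamprob is at least min_distance from 0.5."""
    return {
        token: prob
        for token, prob in spamprobs.items()
        if abs(prob - 0.5) >= min_distance
    }


# Invented probabilities: two stopword-like tokens and two strong clues.
probs = {"the": 0.49, "and": 0.52, "viagra": 0.99, "python": 0.02}
kept = significant_tokens(probs)
```

Here "the" and "and" are ignored without any stopword list being involved.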

Stemming is a different issue.  We not only don't stem, we don't even strip
punctuation.  So, e.g., "free" and "free," and "free:" and "(free" and
"free--" and "free?" and "free!" and "free!!!" (etc) are all considered
distinct by our tokenizer.  That definitely grows the database size, but
tests run both early and late in the project showed that leaving punctuation
in works better than taking it out.
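
To make the database-size effect concrete, here is a small sketch comparing a
plain whitespace split with a variant that strips punctuation.  Both functions
are simplified illustrations (the lowercasing is an assumption too), not the
project's tokenizer.

```python
import string


def tokenize_keep_punct(text):
    """Split on whitespace only, so punctuation stays attached to words."""
    return text.lower().split()


def tokenize_strip_punct(text):
    """Split on whitespace, then strip leading and trailing punctuation."""
    return [word.strip(string.punctuation) for word in text.lower().split()]


sample = "Free, free! (free FREE!!!"
kept = set(tokenize_keep_punct(sample))
stripped = set(tokenize_strip_punct(sample))
```

The first variant yields four distinct tokens from this sample, the second
only one, which is the trade-off between database size and retained clues.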

In the literature on Classic Bayesian classifiers, better results are
reported when using stemming.  But they do something else very different
too:  a "mutual information" calculation (or moral equivalent) is done on
all the training data, to identify the N words with (in effect) the greatest
discriminatory power.  N is typically less than 1000, and all words not in
that set are completely ignored.  In that context, it's very easy to believe
that stemming is valuable, else minor word variations would compete with
entirely different words for the privilege of not being ignored.  OTOH, we
ignore nothing except for tokens with spamprobs close to 0.5.

> ...
> I don't think this will change the failure rate significantly (maybe
> better results with little training data, maybe worse; I don't expect
> much change with large training data), but it should reduce the size of
> the needed database.

I expect that stopword elimination would make no difference, unless the
stopword list contained words that are actually hammish or spammish in real
life (in which case stopword elimination would hurt); the database size
difference would be too small to notice.  I expect that stemming would hurt
period, although it would reduce database size.


From skip at pobox.com  Fri Jan 17 14:53:35 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 15:53:46 2003
Subject: [Spambayes] New POP3 proxy and web interface
In-Reply-To: <5apg2voqi3c5s01ib4b5kstgjvvfhl85ni@4ax.com>
References: <5apg2voqi3c5s01ib4b5kstgjvvfhl85ni@4ax.com>
Message-ID: <15912.27983.607928.417089@montanaro.dyndns.org>


    Richie> You can now run pop3proxy.py with no POP3 servers, and just get
    Richie> the web interface.  I'll split it into different source files at
    Richie> some point so that the naming is more sensible.  This should let
    Richie> Skip use it instead of his proxytrainer.py.

This is great!  I just checked in a number of changes to proxytrainer.py.
Looks like it's time to backport them to pop3proxy.py.

Skip

From skip at pobox.com  Fri Jan 17 14:58:30 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 15:58:39 2003
Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in
In-Reply-To: <rlkg2vogpv40bbu7879emjbida93oefk1n@4ax.com>
References: <15910.61389.133887.569308@montanaro.dyndns.org>
        <rlkg2vogpv40bbu7879emjbida93oefk1n@4ax.com>
Message-ID: <15912.28278.137619.916136@montanaro.dyndns.org>


    Richie> You should really make your message-naming code use the same
    Richie> system as everything else - the names are unix timestamps of

Wasn't aware I did anything differently than you.  Did you notice something?

    Richie> when each message was received, and are used to paginate the
    Richie> training pages into one day per page (by day received rather
    Richie> than potentially-broken Date header).  If you want me to do that
    Richie> then let me know, but I have an ever-growing to-do list...

I think chunk-by-chunk display is as important as (or more important than)
day-by-day display.  I get far too much mail to want to review it all at
once anyway.  If I can't take the time to train everything, I don't want to
be depressed about it. ;-)
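
The two pagination schemes under discussion could be sketched roughly like
this, assuming message keys are unix timestamps with an optional "-N" suffix.
The helper names and the use of UTC are assumptions for the example, not what
pop3proxy.py does.

```python
from datetime import datetime, timezone
from itertools import islice


def group_by_day(keys):
    """Map each day received (UTC) to the list of message keys for that day."""
    days = {}
    for key in keys:
        stamp = int(key.split("-")[0])
        day = datetime.fromtimestamp(stamp, tz=timezone.utc).date()
        days.setdefault(day, []).append(key)
    return days


def chunks(keys, size):
    """Yield successive lists of at most `size` keys."""
    iterator = iter(keys)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


keys = ["1042901588", "1042901588-2", "1042988000"]
by_day = group_by_day(keys)
by_chunk = list(chunks(keys, 2))
```

Day-based pages vary in size with mail volume, while chunks keep each review
page to a fixed upper bound.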

    >> A bit further down the road, I will probably dump the asyncore stuff
    >> in favor of something based on SimpleHTTPServer just to reduce the
    >> number of lines of code.  Without the POP stuff going on there's no
    >> great need for the channel multiplexing.

    Richie> If I can persuade you to use pop3proxy (or its successor, a
    Richie> generic Spambayes server that can optionally host either or both
    Richie> of the web UI and the POP3 proxy), you won't need to pull out
    Richie> the async stuff.

That's fine.  My only worry is that the async code will never be as well
exercised as SimpleHTTPServer.

Skip


From just at letterror.com  Fri Jan 17 22:01:30 2003
From: just at letterror.com (Just van Rossum)
Date: Fri Jan 17 16:02:00 2003
Subject: [Spambayes] Corpus.Message.__getattr__ can't be correct can it?
In-Reply-To: <15912.25402.44471.77970@montanaro.dyndns.org>
Message-ID: <r01050400-1023-E6DDFD202A5E11D79FAB003065D5E7E4@[10.0.0.23]>

Skip Montanaro wrote:

> Well, I think Richie meant it should be:
> 
>     def __getattr__(self, attributeName):
>         '''On-demand loading of the message text.'''
> 
>         if attributeName in ('hdrtxt', 'payload'):
>             self.load()
>         try:
>             return self.__dict__[attributeName]
>         except KeyError:
>             raise AttributeError(attributeName)
> 
> That is, __getattr__ is called when hdrtxt or payload are accessed
> but not yet initialized.  All other accesses (or if self.load() fails
> somehow) should raise AttributeError.  See Corpus.py 1.3.

Yeah, what I wrote was nonsense. But while we're nitpicking, the _real_
intent of the code is probably this:

    def __getattr__(self, attributeName):
        '''On-demand loading of the message text.'''

        if attributeName in ('hdrtxt', 'payload'):
            self.load()
            return self.__dict__[attributeName]
        raise AttributeError(attributeName)

This is assuming self.load() _always_ sets those two attrs.
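
A self-contained sketch of that pattern in current Python, with a made-up
load() that always sets both attributes (the class body and sample values are
hypothetical, not taken from Corpus.py):

```python
class Message:
    """Hypothetical message whose text is loaded on first access."""

    def __init__(self, key):
        self.key = key
        self.load_count = 0

    def load(self):
        # Stand-in for reading the message from disk; always sets both attrs.
        self.load_count += 1
        self.hdrtxt = "Subject: test"
        self.payload = "body"

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has failed.
        if name in ("hdrtxt", "payload"):
            self.load()
            return self.__dict__[name]
        raise AttributeError(name)


msg = Message("1042901588")
first = msg.payload    # triggers load()
second = msg.hdrtxt    # already in the instance dict, no second load
```

Once load() has stored the attributes on the instance, later accesses never
reach __getattr__ again.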

Back to lurk mode...

Just

From skip at pobox.com  Fri Jan 17 15:19:48 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 16:20:10 2003
Subject: [Spambayes] Stemming and stopword elemination
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHMEBMEJAA.tim.one@comcast.net>
References: <20030117134741.1e88011c.Alexander@Leidinger.net>
        <BIEJKCLHCIOIHAGOKOLHMEBMEJAA.tim.one@comcast.net>
Message-ID: <15912.29556.507000.951582@montanaro.dyndns.org>


    >> has someone already experimented with Information Retrieval
    >> techniques like stopword elemination (stopwords: the, a, an, or, and,
    >> ...) and word stemming?

    Tim> Yes and no.

    Tim> Stemming is a different issue.  We not only don't stem, we don't
    Tim> even strip punctuation.  

Well, mostly.  In the usual linguistic sense spambayes doesn't stem; however,
the tokenizer does collapse some things.  Long strings are compressed to
something like "skip b 40" where 'b' is the first letter and '40' is the
length of the string (or the number of characters elided).  In the email
prefix stuff I checked in and the suffix stuff I am still pondering, I
generate tokens like pfxlen:%d up to some small threshold value.  Above
that, I just generate "pfxlen:big" or "sfxlen:big".  Otherwise, I'd have a
number of tokens in my database with keys of "pfxlen:N" (where N is a
"biggish" number) and a value of (1,0) (spammy hapaxes - seen once in spam
and never in ham).
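
Roughly, the two kinds of collapsing look like the sketch below.  The
thresholds, the rounding of the length down to a multiple of ten, and the
function names are all assumptions made for illustration, not the tokenizer's
actual values.

```python
def skip_token(word, max_len=12):
    """Collapse an over-long word to its first letter and a rounded length."""
    if len(word) <= max_len:
        return word
    return "skip %s %d" % (word[0], len(word) // 10 * 10)


def length_token(prefix, n, threshold=5):
    """Emit an exact length up to the threshold, then one shared 'big' token."""
    if n <= threshold:
        return "%s:%d" % (prefix, n)
    return "%s:big" % prefix


long_word = skip_token("b" * 43)
small = length_token("pfxlen", 3)
large = length_token("pfxlen", 57)
```

Bucketing the rare large values into one token avoids filling the database
with keys that are each seen only once.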

Skip

From skip at pobox.com  Fri Jan 17 16:30:08 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 17 17:30:18 2003
Subject: [Spambayes] OptionConfig.py - split into two pieces?
Message-ID: <15912.33776.619638.320031@montanaro.dyndns.org>


In pop3proxy.py I see 

    from OptionConfig import OptionsConfigurator

but OptionConfig.py is at the top level (not in the spambayes package) and
isn't installed.  It looks like both a module and a script.  Perhaps it
should be split in two pieces, a script and an importable module.

Skip


From frank.horowitz at csiro.au  Sat Jan 18 11:06:06 2003
From: frank.horowitz at csiro.au (Frank Horowitz)
Date: Fri Jan 17 22:07:00 2003
Subject: [Spambayes] Sourceforge :pserver cvs access broken... (FIXED)
In-Reply-To: <15912.5783.216516.749029@montanaro.dyndns.org>
References: <1042773411.22390.7.camel@bonzo.ned.dem.csiro.au> 
	<15912.5783.216516.749029@montanaro.dyndns.org>
Message-ID: <1042859167.1792.1.camel@amdo>

On Fri, 2003-01-17 at 22:43, Skip Montanaro wrote:
> 
>     Frank> Does anyone (by any small miracle) have a mirror of the cvs tree
>     Frank> that they'd be willing to put online while SF gets its act
>     Frank> together?
> 
> Not a mirror, but I just put a gzipped tar file snapshot at
> 
>     http://www.musi-cal.com/~skip/python.spambayes.tar.gz
> 
> I'd be happy to update it periodically, though I have to do it manually,
> since on that machine cvs prompts me for my SF password when I 'cvs up'.
> 

Thanks to all of the kind souls out there who jumped into the SF breach!

SF is now serving cvs via both :pserver and the web again.

	Frank


From francois.granger at free.fr  Sat Jan 18 15:22:49 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Sat Jan 18 09:22:55 2003
Subject: [Spambayes] Fresh download
Message-ID: <a05200f05ba4f12c3f230@[192.168.1.20]>

I downloaded the nightly build this morning.
I copied my current bayescustomize.ini in the new directory.
The first try gave this message:

=================================================
[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py
config file has unknown option 'spam_cutoff' in section 'TestDriver'
config file has unknown option 'ham_cutoff' in section 'TestDriver'
Traceback (most recent call last):
   File "OptionConfig.py", line 32, in ?
     from spambayes.Options import options
   File "spambayes/Options.py", line 542, in ?
     options.mergefiles(['bayescustomize.ini'])
   File "spambayes/Options.py", line 496, in mergefiles
     self._update()
   File "spambayes/Options.py", line 523, in _update
     raise ValueError("errors while parsing .ini file")
ValueError: errors while parsing .ini file
=================================================

So I removed these two options.
The next try gave this:

=================================================
[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py
Serving HTTP on 0.0.0.0 port 8000 ...
localhost - - [18/Jan/2003 15:15:33] "GET / HTTP/1.1" 200 -
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 49809)
Traceback (most recent call last):
   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 221, in handle_request
     self.process_request(request, client_address)
   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 240, in process_request
     self.finish_request(request, client_address)
   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 253, in finish_request
     self.RequestHandlerClass(request, client_address, self)
   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", line 514, in __init__
     self.handle()
   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/BaseHTTPServer.py", line 266, in handle
     method()
   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SimpleHTTPServer.py", line 41, in do_GET
     f = self.send_head()
   File "SmarterHTTPServer.py", line 100, in send_head
     retstr = getattr(self, methname)(pdict)
   File "OptionConfig.py", line 84, in homepage
     parm_ini_map[httpparm][PIMapOpt]))
   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/ConfigParser.py", line 279, in get
     raise NoOptionError(option, section)
NoOptionError: No option `spam_cutoff' in section: TestDriver

-- 
Recently using MacOSX.......

From francois.granger at free.fr  Sat Jan 18 15:33:38 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Sat Jan 18 09:33:44 2003
Subject: [Spambayes] Success and failure
Message-ID: <a05200f06ba4f156d923f@[192.168.1.20]>

I tried pop3proxy on one account and it worked like a charm, but after
fetching the mail it failed with a segmentation fault (see the end of
this message).

I have some questions:

On MacOS X, it seems that pop3proxy _must_ run with sudo. Is there
any other way to launch it?

[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python pop3proxy.py
Loading database... Done.
Traceback (most recent call last):
   File "pop3proxy.py", line 1650, in ?
     run()
   File "pop3proxy.py", line 1644, in run
     main(state.servers, state.proxyPorts, state.uiPort, state.launchUI)
   File "pop3proxy.py", line 1349, in main
     BayesProxyListener(server, serverPort, proxyPort)
   File "pop3proxy.py", line 399, in __init__
     Listener.__init__(self, proxyPort, BayesProxy, proxyArgs)
   File "pop3proxy.py", line 178, in __init__
     self.bind(('', port))
   File "/usr/lib/python2.2/asyncore.py", line 306, in bind
     return self.socket.bind (addr)
socket.error: (13, 'Permission denied')
[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py
Password:
Loading database... Done.
Listener on port 110 is proxying pop.nerim.net:110
User interface url is http://localhost:8880
Segmentation fault

-- 
Recently using MacOSX.......

From francois.granger at free.fr  Sat Jan 18 16:00:08 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Sat Jan 18 10:00:13 2003
Subject: [Spambayes] Follow up
Message-ID: <a05200f07ba4f1a73bf9f@[192.168.1.20]>

There is a problem.
It may be specific to MacOS X, but I think that pop3proxy should
filter on filenames, since it may otherwise pick up files that are not
messages.

MacOS X creates a file named ".DS_Store" in each directory for its own
use. Since it is a hidden file, this causes no trouble for most
software. But pop3proxy loads it as if it were a normal message file.

========================================
[fbg:/Volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py
Loading database... Loading state from /Volumes/OS99/spambayesf/hammie.db database
/Volumes/OS99/spambayesf/hammie.db is an existing database, with 99 spam and 29 ham
Done.
placing .DS_Store in corpus cache      <- this is a serious problem ;-)
BayesProxyListener listening on port 110.
Listener on port 110 is proxying pop.nerim.net:110
UserInterfaceListener listening on port 8880.
User interface url is http://localhost:8880
========================================

After clicking on the "Review message" link in the main page of
pop3proxy, the terminal displays the following lines.

========================================
adding 1042901588 to corpus
storing 1042901588
adding message 1042901588 to corpus
placing 1042901588 in corpus cache
adding 1042901588-2 to corpus
storing 1042901588-2
adding message 1042901588-2 to corpus
placing 1042901588-2 in corpus cache
error: uncaptured python exception, closing channel 
<__main__.UserInterface connected at 0x276450> 
(exceptions.ValueError:invalid literal for long(): .DS_Store 
[/BinaryCache/python/python-3.root~193/usr/lib/python2.2/asyncore.py|poll|94] 
[/BinaryCache/python/python-3.root~193/usr/lib/python2.2/asyncore.py|handle_read_event|389] 
[/BinaryCache/python/python-3.root~193/usr/lib/python2.2/asynchat.py|handle_read|130] 
[pop3proxy.py|found_terminator|811] [pop3proxy.py|onRequest|837] 
[pop3proxy.py|onReview|1143] [pop3proxy.py|buildReviewKeys|1020] 
[pop3proxy.py|keyToTimestamp|976])
========================================
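
One possible shape for such a filter, as a sketch: accept only names that look
like the message keys shown in the log above (a run of digits, optionally
followed by "-N") and ignore everything else.  The function names and the
exact pattern are my assumptions, not code from pop3proxy.py.

```python
import re

# Assumed key format, based on the names in the log: "1042901588" or
# "1042901588-2".
_KEY_PATTERN = re.compile(r"[0-9]+(-[0-9]+)?")


def is_message_key(name):
    """Return True if a filename looks like a timestamp-based message key."""
    return _KEY_PATTERN.fullmatch(name) is not None


def key_to_timestamp(name):
    """Extract the unix timestamp from a valid message key."""
    return int(name.split("-")[0])


names = [".DS_Store", "1042901588", "1042901588-2"]
keys = [name for name in names if is_message_key(name)]
```

With the filter applied before loading, ".DS_Store" never reaches the
timestamp conversion that raised the ValueError.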


-- 
Recently using MacOSX.......

From skip at pobox.com  Sat Jan 18 09:02:54 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Jan 18 10:02:59 2003
Subject: [Spambayes] Fresh download
In-Reply-To: <a05200f05ba4f12c3f230@[192.168.1.20]>
References: <a05200f05ba4f12c3f230@[192.168.1.20]>
Message-ID: <15913.27806.863804.858968@montanaro.dyndns.org>

    François> config file has unknown option 'spam_cutoff' in section 'TestDriver'
    François> config file has unknown option 'ham_cutoff' in section 'TestDriver'

These two now go in the new [Categorization] section.

Skip

From francois.granger at free.fr  Sat Jan 18 16:21:27 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Sat Jan 18 10:21:34 2003
Subject: [Spambayes] Fresh download
In-Reply-To: <15913.27806.863804.858968@montanaro.dyndns.org>
References: <a05200f05ba4f12c3f230@[192.168.1.20]>
 <15913.27806.863804.858968@montanaro.dyndns.org>
Message-ID: <a05200f09ba4f205d2265@[192.168.1.20]>

At 09:02 -0600 18/01/2003, in message Re: [Spambayes] Fresh download, 
Skip Montanaro wrote:
>     François> config file has unknown option 'spam_cutoff' in
>section 'TestDriver'
>     François> config file has unknown option 'ham_cutoff' in section
>'TestDriver'
>
>These two now go in the new [Categorization] section.

Thanks, but if I remove them from my ini files, they should get a
default value. This is done in Options.py.

But it did not work for me, as stated in the second part of my
previous message:

>So I removed these two options.
>The next try gave this:
>
>=================================================
>[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py
>Serving HTTP on 0.0.0.0 port 8000 ...
>localhost - - [18/Jan/2003 15:15:33] "GET / HTTP/1.1" 200 -
>----------------------------------------
>Exception happened during processing of request from ('127.0.0.1', 49809)
>Traceback (most recent call last):
>   File 
>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>line 221, in handle_request
>     self.process_request(request, client_address)
>   File 
>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>line 240, in process_request
>     self.finish_request(request, client_address)
>   File 
>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>line 253, in finish_request
>     self.RequestHandlerClass(request, client_address, self)
>   File 
>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>line 514, in __init__
>     self.handle()
>   File 
>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/BaseHTTPServer.py", 
>line 266, in handle
>     method()
>   File 
>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SimpleHTTPServer.py", 
>line 41, in do_GET
>     f = self.send_head()
>   File "SmarterHTTPServer.py", line 100, in send_head
>     retstr = getattr(self, methname)(pdict)
>   File "OptionConfig.py", line 84, in homepage
>     parm_ini_map[httpparm][PIMapOpt]))
>   File 
>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/ConfigParser.py", 
>line 279, in get
>     raise NoOptionError(option, section)
>NoOptionError: No option `spam_cutoff' in section: TestDriver
>
>

-- 
Recently using MacOSX.......

From tim at fourstonesExpressions.com  Sat Jan 18 09:23:39 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Sat Jan 18 10:24:19 2003
Subject: [Spambayes] Fresh download
In-Reply-To: <a05200f09ba4f205d2265@[192.168.1.20]>
Message-ID: <E0TPZTNKBLG6164WT7587EA2XVR434.3e29717b@myst>

Looks like we could use a migration script?  We certainly ought to keep
this in mind for future releases...

On another note, I finally was able to get cvs working on this new machine, so
I'm back in business :)

- TimS

1/18/2003 9:21:27 AM, François Granger <francois.granger@free.fr> wrote:

>At 09:02 -0600 18/01/2003, in message Re: [Spambayes] Fresh download, 
>Skip Montanaro wrote:
>>     François> config file has unknown option 'spam_cutoff' in
>>section 'TestDriver'
>>     François> config file has unknown option 'ham_cutoff' in section
>>'TestDriver'
>>
>>These two now go in the new [Categorization] section.
>
>Thanks, but if I remove them from my ini files, they should get a
>default value. This is done in Options.py.
>
>But it did not work for me, as stated in the second part of my
>previous message:
>
>>So I removed these two options.
>>The next try gave this:
>>
>>=================================================
>>[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% python OptionConfig.py
>>Serving HTTP on 0.0.0.0 port 8000 ...
>>localhost - - [18/Jan/2003 15:15:33] "GET / HTTP/1.1" 200 -
>>----------------------------------------
>>Exception happened during processing of request from ('127.0.0.1', 49809)
>>Traceback (most recent call last):
>>   File 
>>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>>line 221, in handle_request
>>     self.process_request(request, client_address)
>>   File 
>>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>>line 240, in process_request
>>     self.finish_request(request, client_address)
>>   File 
>>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>>line 253, in finish_request
>>     self.RequestHandlerClass(request, client_address, self)
>>   File 
>>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SocketServer.py", 
>>line 514, in __init__
>>     self.handle()
>>   File 
>>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/BaseHTTPServer.py", 
>>line 266, in handle
>>     method()
>>   File 
>>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/SimpleHTTPServer.py",
>>line 41, in do_GET
>>     f = self.send_head()
>>   File "SmarterHTTPServer.py", line 100, in send_head
>>     retstr = getattr(self, methname)(pdict)
>>   File "OptionConfig.py", line 84, in homepage
>>     parm_ini_map[httpparm][PIMapOpt]))
>>   File 
>>"/BinaryCache/python/python-3.root~193/usr/lib/python2.2/ConfigParser.py", 
>>line 279, in get
>>     raise NoOptionError(option, section)
>>NoOptionError: No option `spam_cutoff' in section: TestDriver
>>
>>
>
>-- 
>Recently using MacOSX.......
>
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From skip at pobox.com  Sat Jan 18 09:59:52 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Jan 18 10:59:58 2003
Subject: [Spambayes] Fresh download
In-Reply-To: <a05200f09ba4f205d2265@[192.168.1.20]>
References: <a05200f05ba4f12c3f230@[192.168.1.20]>
        <15913.27806.863804.858968@montanaro.dyndns.org>
        <a05200f09ba4f205d2265@[192.168.1.20]>
Message-ID: <15913.31224.984560.906915@montanaro.dyndns.org>


    >> These two now go in the new [Categorization] section.

    François> Thanks, but if I remove them from my ini files, they should
    François> get a default value. This is done in Options.py.

    >> NoOptionError: No option `spam_cutoff' in section: TestDriver

They are somehow still winding up in the [TestDriver] section.  Make sure
you aren't importing an old version of Options.py and don't have another
.ini file which is getting loaded.
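
For anyone writing code that has to cope with both old and new .ini files, a
lookup that tolerates the move might look like the sketch below.  It uses the
current configparser module; the function name, section order, and default
values are assumptions for illustration, not what Options.py or
OptionConfig.py does.

```python
import configparser


def read_cutoff(ini_text, option, default):
    """Look an option up in [Categorization], then [TestDriver], else default."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    for section in ("Categorization", "TestDriver"):
        # has_option() returns False when the section itself is missing.
        if parser.has_option(section, option):
            return parser.getfloat(section, option)
    return default


old_style = "[TestDriver]\nspam_cutoff = 0.9\n"
new_style = "[Categorization]\nspam_cutoff = 0.8\n"

from_old = read_cutoff(old_style, "spam_cutoff", 0.5)
from_new = read_cutoff(new_style, "spam_cutoff", 0.5)
missing = read_cutoff("", "ham_cutoff", 0.2)
```

Checking has_option() first avoids the NoOptionError seen in the traceback
when an option is absent from the section being read.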

Skip


From mwh at python.net  Sat Jan 18 17:05:42 2003
From: mwh at python.net (Michael Hudson)
Date: Sat Jan 18 12:05:50 2003
Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can it?
References: <15912.16102.713265.622424@montanaro.dyndns.org>
	<d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
	<15912.25180.638042.95791@montanaro.dyndns.org>
Message-ID: <2mznpy8k2h.fsf@starship.python.net>

Skip Montanaro <skip@pobox.com> writes:

> Yes, now that I know what's going on, I understand why I was getting
> infinite loops.  The __getattr__ method is really only meant to initialize
> payload and hdrtxt.

In which case why not use a property?

Cheers,
M.

-- 
  ROOSTA:  Ever since you arrived on this planet last night you've
           been going round telling people that you're Zaphod
           Beeblebrox, but that they're not to tell anyone else.
                    -- The Hitch-Hikers Guide to the Galaxy, Episode 7


From skip at pobox.com  Sat Jan 18 11:15:01 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Jan 18 12:15:09 2003
Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can
	it?
In-Reply-To: <2mznpy8k2h.fsf@starship.python.net>
References: <15912.16102.713265.622424@montanaro.dyndns.org>
        <d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
        <15912.25180.638042.95791@montanaro.dyndns.org>
        <2mznpy8k2h.fsf@starship.python.net>
Message-ID: <15913.35733.325660.407255@montanaro.dyndns.org>


    >> The __getattr__ method is really only meant to initialize payload and
    >> hdrtxt.

    Michael> In which case why not use a property?

Why?  __getattr__ works fine, once it's properly written.

Skip

From mwh at python.net  Sat Jan 18 17:22:57 2003
From: mwh at python.net (Michael Hudson)
Date: Sat Jan 18 12:23:01 2003
Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can 	it?
References: <15912.16102.713265.622424@montanaro.dyndns.org>
	<d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
	<15912.25180.638042.95791@montanaro.dyndns.org>
	<2mznpy8k2h.fsf@starship.python.net>
	<15913.35733.325660.407255@montanaro.dyndns.org>
Message-ID: <2mwul28j9q.fsf@starship.python.net>

Skip Montanaro <skip@pobox.com> writes:

>     >> The __getattr__ method is really only meant to initialize payload and
>     >> hdrtxt.
> 
>     Michael> In which case why not use a property?
> 
> Why?  __getattr__ works fine, once it's properly written.

I was thinking of its performance-mangling properties.  Dunno if
that's an issue here, it was only an off-the-cuff remark.

Cheers,
M.

-- 
  ARTHUR:  Why should he want to know where his towel is?
    FORD:  Everybody should know where his towel is.
  ARTHUR:  I think your head's come undone.
                    -- The Hitch-Hikers Guide to the Galaxy, Episode 7


From skip at pobox.com  Sat Jan 18 11:56:49 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Jan 18 12:56:59 2003
Subject: [Spambayes] Re: Corpus.Message.__getattr__ can't be correct can
	it?
In-Reply-To: <2mwul28j9q.fsf@starship.python.net>
References: <15912.16102.713265.622424@montanaro.dyndns.org>
        <d4jg2votv6ogbh01dhr17jtei8cibfv0fc@4ax.com>
        <15912.25180.638042.95791@montanaro.dyndns.org>
        <2mznpy8k2h.fsf@starship.python.net>
        <15913.35733.325660.407255@montanaro.dyndns.org>
        <2mwul28j9q.fsf@starship.python.net>
Message-ID: <15913.38241.389238.638886@montanaro.dyndns.org>


    >> >> The __getattr__ method is really only meant to initialize payload
    >> >> and hdrtxt.
    >> 
    Michael> In which case why not use a property?
    >> 
    >> Why?  __getattr__ works fine, once it's properly written.

    Michael> I was thinking of its performance-mangling properties.  Dunno
    Michael> if that's an issue here, it was only an off-the-cuff remark.

I was thinking that the simplest solution which gives correct behavior would
be best.  Using properties would have required me to convert Corpus.Message
to a new-style class, and while that probably wouldn't have broken anything,
it wasn't a direct response to the bug.

Sure, __getattr__ can hurt performance, but in this case I think it's
reasonable.  It computes the necessary attribute values and updates them so
further accesses won't call __getattr__.
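
For comparison, the property-based alternative Michael suggested could be
sketched as follows in current Python, where every class is new-style.  The
class body and attribute names are hypothetical and only show the shape of the
approach.

```python
class Message:
    """Hypothetical message that loads its payload lazily via a property."""

    def __init__(self):
        self._payload = None
        self.loads = 0

    def _load(self):
        # Stand-in for reading the message text from disk.
        self.loads += 1
        self._payload = "body text"

    @property
    def payload(self):
        # Load on first access only; later accesses reuse the cached value.
        if self._payload is None:
            self._load()
        return self._payload


msg = Message()
first = msg.payload
second = msg.payload
```

Unlike the __getattr__ version, the property runs on every access, but it only
intercepts the one attribute it is defined for.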

Skip

From tony-bayes at lownds.com  Sat Jan 18 12:08:03 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Sat Jan 18 15:08:22 2003
Subject: [Spambayes] Success and failure
In-Reply-To: <a05200f06ba4f156d923f@[192.168.1.20]>
References: <a05200f06ba4f156d923f@[192.168.1.20]>
Message-ID: <a05200f6fba4f5cf5f34a@[204.162.121.104]>

>On MacOS X, it seems that pop3proxy _must_ run with sudo. Is there 
>any other possibility to launch it ?

Superuser privileges are always needed to bind to any port below 1024
on Unix. Are you binding to port 110?

You can work around this by a) binding to a port at or above 1024 and b)
configuring Eudora to connect to that port instead of port 110. 
You'll need the "Esoteric Settings" Eudora plugin installed to make 
that change.

>[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py
>Password:
>Loading database... Done.
>Listener on port 110 is proxying pop.nerim.net:110
>User interface url is http://localhost:8880
>Segmentation fault

Did it segfault after you asked it to train messages? Raising the
stack size allocated to new processes before starting pop3proxy will
fix this. Mac OS X has a rather small stack size by default.

Try running "limit stacksize 2048" before starting pop3proxy.py

BTW, I've updated the patch for binding to a specific address and 
posted it to Sourceforge: #670417

-Tony

From noreply at sourceforge.net  Sat Jan 18 08:35:13 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat Jan 18 17:00:16 2003
Subject: [Spambayes] 
 [ spambayes-Bugs-669149 ] NameError in ExpiryCorpus.removeExpiredMessages
Message-ID: <E18Zvw9-0002ue-00@sc8-sf-web2.sourceforge.net>

Bugs item #669149, was opened at 2003-01-16 10:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702

Category: None
Group: None
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Skip Montanaro (montanaro)
Assigned to: Tim Stone (timstone4)
Summary: NameError in ExpiryCorpus.removeExpiredMessages

Initial Comment:
In verbose mode, removeExpiredMessages prints out a line which
references the nonexistent variable, key.  I have no idea what it
should be, otherwise I'd fix it.

----------------------------------------------------------------------

>Comment By: Tim Stone (timstone4)
Date: 2003-01-18 10:35

Message:
Logged In: YES 
user_id=645698

Corrected print statement to reference msg.key()

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=669149&group_id=61702

From noreply at sourceforge.net  Sat Jan 18 12:06:23 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat Jan 18 17:00:25 2003
Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to
	bind to specific addresses
Message-ID: <E18ZzEV-0003T0-00@sc8-sf-web4.sourceforge.net>

Patches item #670417, was opened at 2003-01-18 20:06
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Lownds (tonylownds)
Assigned to: Nobody/Anonymous (nobody)
Summary: Allow the pop3 proxies to bind to specific addresses

Initial Comment:
This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting.

This is useful for two reasons:

1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security.

2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts.

The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string.
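
The parsing rule described above (accept either an int or an
address:int) can be sketched as follows.  This is only an illustration,
not the patch itself: the function names are hypothetical, and it
assumes the pop3proxy_ports value is a comma-separated list and that an
empty address means "bind to all interfaces", as it does for Python
sockets.

```python
def parse_listen_spec(spec, default_address=""):
    """Turn '8110' or '127.0.0.2:8110' into an (address, port) pair.

    Hypothetical helper: an empty address means all interfaces.
    """
    spec = spec.strip()
    if ":" in spec:
        address, port_text = spec.rsplit(":", 1)
    else:
        address, port_text = default_address, spec
    port = int(port_text)  # raises ValueError on non-numeric input
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return address, port


def format_listen_spec(pair):
    """Turn an (address, port) pair back into a printable string."""
    address, port = pair
    if address:
        return "%s:%d" % (address, port)
    return str(port)


def parse_ports_setting(value):
    """Parse a comma-separated setting (assumed format) into pairs."""
    return [parse_listen_spec(item)
            for item in value.split(",") if item.strip()]
```

For example, parse_ports_setting("127.0.0.2:110, 127.0.0.3:110") yields
two pairs that can be handed straight to socket.bind(), which is what
lets one loopback alias per POP account share port 110.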


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

From francois.granger at free.fr  Sun Jan 19 01:11:12 2003
From: francois.granger at free.fr (François Granger)
Date: Sat Jan 18 19:11:20 2003
Subject: [Spambayes] Success and failure
In-Reply-To: <a05200f6fba4f5cf5f34a@[204.162.121.104]>
References: <a05200f06ba4f156d923f@[192.168.1.20]>
 <a05200f6fba4f5cf5f34a@[204.162.121.104]>
Message-ID: <a05200f0dba4f9cbea5ba@[192.168.1.20]>

At 12:08 -0800 18/01/2003, in message Re: [Spambayes] Success and 
failure, Tony Lownds wrote:
>>On MacOS X, it seems that pop3proxy _must_ run with sudo. Is there 
>>any other possibility to launch it ?
>
>Superuser privileges are needed to bind to any port below 
>1024 on Unix. Are you binding to port 110?
>
>You can work around this by a) binding to port 1024 or above and b) 
>configuring Eudora to connect to that port instead of port 110. 
>You'll need the "Esoteric Settings" Eudora plugin installed to make 
>that change.

I see what you mean. I don't care since I am the only user of this 
station. It was more for prospective users.

>>[fbg:/volumes/OS99/spambayes-2003-01-17] fgranger% sudo python pop3proxy.py
>>Password:
>>Loading database... Done.
>>Listener on port 110 is proxying pop.nerim.net:110
>>User interface url is http://localhost:8880
>>Segmentation fault
>
>Did it segfault after you asked it to train messages?

yes.

>  Raising the stack size allocated to new process before starting 
>pop3proxy will fix this. Mac OS X has a rather small stack size by 
>default.
>
>Try running "limit stacksize 2048" before starting pop3proxy.py

My script is copied from yours:
#!/bin/sh
#clear
ulimit -s 2048
sudo ifconfig lo0 inet 127.0.0.2 add
sudo ifconfig lo0 inet 127.0.0.3 add
sudo ifconfig lo0 inet 127.0.0.4 add
cd /Volumes/OS99/spambayes/
sudo python pop3proxym.py

By the way, thanks to your help, I am able to connect to 4 pop 
servers from Eudora.

>BTW, I've updated the patch for binding to a specific address and 
>posted it to Sourceforge: #670417

Thanks a lot.

Please, people with commit rights, validate this patch so that I can 
more fully test the latest and greatest version.

-- 
Recently using MacOSX.......

From barry at python.org  Thu Jan 16 18:48:57 2003
From: barry at python.org (Barry A. Warsaw)
Date: Sat Jan 18 19:58:00 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain>
	<3E26A278.3080302@hooft.net>
	<15910.43747.285523.378123@montanaro.dyndns.org>
Message-ID: <15911.17641.101054.962896@gargle.gargle.HOWL>


    Rob> Doesn't it take time before the first spam arrives on a brand
    Rob> new mailinglist? Spambayes' results are going to be real
    Rob> lousy if it is trained on 200 ham and 0 spam messages....

It might be, but how will that lousiness manifest?  As false
negatives?  If so, the -spam reporting address for the list should
eventually warm up the spam side, right?

Depending on how much your legitimate list traffic looks like spam
already, it might warm up pretty quickly.

-Barry

From barry at python.org  Thu Jan 16 18:51:20 2003
From: barry at python.org (Barry A. Warsaw)
Date: Sat Jan 18 19:58:09 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain>
	<15910.43747.285523.378123@montanaro.dyndns.org>
	<3E26B4EB.5020100@hooft.net>
Message-ID: <15911.17784.585905.637568@gargle.gargle.HOWL>


>>>>> "RWWH" == Rob W W Hooft <rob@hooft.net> writes:

    RWWH> This sounds reasonable, but this can also be implemented as
    RWWH> a "preloaded database" that comes with spambayes. This is
    RWWH> something many people have already asked for.

I thought about this, and from Mailman's perspective it wouldn't be
hard to pre-train the list on some known spam when spambayes is
enabled.  If there is actual list traffic at that point, then perhaps
we can assume it's all ham and train on a balanced number of messages.

There may need to be hooks to reset, retrain or untrain the system.  I
think those are all tractable but not something I've addressed in my
prototype.

-Barry

From barry at python.org  Thu Jan 16 18:52:52 2003
From: barry at python.org (Barry A. Warsaw)
Date: Sat Jan 18 19:58:15 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <200301160612.h0G6C0x14523@localhost.localdomain>
	<3E26A278.3080302@hooft.net>
	<15910.51260.847140.60292@gargle.gargle.HOWL>
	<3E26DA4A.40404@hooft.net>
Message-ID: <15911.17876.845208.125779@gargle.gargle.HOWL>


>>>>> "RWWH" == Rob W W Hooft <rob@hooft.net> writes:

    RWWH> Isn't everything going to be marked as unsure as long as
    RWWH> there is no spam at all?

It didn't seem to.  But I only barely played with it.
-Barry

From barry at python.org  Thu Jan 16 18:57:02 2003
From: barry at python.org (Barry A. Warsaw)
Date: Sat Jan 18 19:58:20 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <15910.18557.535408.669103@gargle.gargle.HOWL>
	<BIEJKCLHCIOIHAGOKOLHOEKOEIAA.tim.one@comcast.net>
Message-ID: <15911.18126.450611.521139@gargle.gargle.HOWL>


>>>>> "TP" == Tim Peters <tim.one@comcast.net> writes:

    TP> Better to start by training on a few spam, and a few copies of
    TP> the list introduction msg (a decent intro msg necessarily
    TP> contains many words and lexicalisms characteristic of the
    TP> list's topic).

See my previous message about initial training.  We may want to have
some canned spam to train on when we enable spambayes.  Using the list
intro message is a neat idea for when you have no posts available for
the list.

If, OTOH, people take Skip's advice and only turn it on when it's
necessary, then maybe we can use messages we already have to train
it.  One source of known good messages are those the admin has
explicitly approved.  Maybe if we have 20 canned spam, we can save up
to the last 20 approved messages.  Then when the list admin enables
spambayes, we train on those.

-Barry
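
The "save up to the last 20 approved messages" idea can be sketched with
a bounded buffer.  Everything here is hypothetical (the class, the
function, and the train(message, is_spam) callable are stand-ins, not
Mailman or spambayes APIs); the point is only that the buffer keeps the
newest approved messages and that ham and canned spam are trained in
equal numbers.

```python
from collections import deque


class ApprovedBuffer:
    """Remember only the most recent admin-approved messages."""

    def __init__(self, size=20):
        self._recent = deque(maxlen=size)  # old entries fall off the front

    def remember(self, message):
        self._recent.append(message)

    def messages(self):
        return list(self._recent)


def bootstrap(train, canned_spam, approved):
    """Train on equal numbers of approved ham and canned spam.

    `train` is any callable taking (message, is_spam); it stands in
    for whatever classifier interface the list actually uses.
    Returns the (ham, spam) counts that were trained.
    """
    ham = approved.messages()
    count = min(len(ham), len(canned_spam))
    for message in ham[-count:] if count else []:
        train(message, False)
    for message in canned_spam[:count]:
        train(message, True)
    return count, count
```

With no approved traffic at all this trains nothing, which is the case
where the list introduction message would have to serve as the ham seed.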

From barry at python.org  Thu Jan 16 18:45:58 2003
From: barry at python.org (Barry A. Warsaw)
Date: Sat Jan 18 19:58:35 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <15909.39504.598866.52741@montanaro.dyndns.org>
	<15910.18557.535408.669103@gargle.gargle.HOWL>
	<15910.42543.629381.696105@montanaro.dyndns.org>
Message-ID: <15911.17462.734808.296463@gargle.gargle.HOWL>


[I added mailman-developers to this list because I think people will
be interested in my prototype integration of Mailman and spambayes, a
statistical learning classifier, which I've targeted for spam fighting
on Mailman lists.  -BAW].

>>>>> "SM" == Skip Montanaro <skip@pobox.com> writes:

    SM> In my case I sidestepped training altogether because the
    SM> list's content is a subset of the stuff I'm interested in
    SM> anyway.  Most of the "spam" messages encountered by the list
    SM> at this point are really of the virus/worm variety, and since
    SM> it's set up for members only posting, little, if any garbage
    SM> actually gets through to the list, even without using
    SM> spambayes.

I suspect python.org will be similar, since we have many other spam
defenses in place.  I've just been playing with my prototype, and
yeah, it sure learns fast even with no a-priori training.  I'm not
100% sure a train-on-the-fly approach will work, so it's worth some real
world banging.

In my simplified approach, you start out holding all unsure and spam.
Legit messages will hit one of those first, likely unsure if your list
wasn't advertised on Usenet before real people started posting
<wink>.  There's one extra button on the admindb page called
"Train?".  Click this if you want to train a held message based on
your action.  If you approve the message, it gets trained as ham, and
if you reject or discard it, it gets trained as spam.

Within about 10 messages (first a bunch of ham, then a random and
unscientific barrage <wink> of spam and ham) the classifier was doing
pretty good.  It was catching all the spam and letting through most of
the ham.  The ham recognition definitely went up as I approved more
messages.

False positives get caught on the admindb screen, so you approve and
train them in one action.  Although I never saw any false negatives, I
think the way to handle these will be to add a -spam address that
people can send messages to.  If the list admin sends it then it gets
spam trained.  If not, the list admin will have a chance to decide
whether to spam train it or not.

    SM> One reason I'm interested in separating pop3proxy into two
    SM> functions ( POP retrieval/classifying and training/web UI) is
    SM> that the training/web component should be useful for other
    SM> spambayes users.  Right now in my current environment,
    SM> training is clunky enough that I only train on unsures and
    SM> mistakes.  While that works okay because my starting corpus
    SM> was so large (around 20,000 messages) the indications from
    SM> people who've experimented with that sort of training is that
    SM> the quality of classification does degrade over time.

That's an important point.  While I'm not sure that with my approach
the quality of classification will improve over time <wink>, I think a
training regimen integrated with the admindb stuff will be the most
natural for a Mailman list admin.

BTW, the hammie.py interface was all I needed for my prototype.  One
reason for going with hammie is that each mailing list needs its own
database, and I can just create a Hammie, associate it with a list,
and tie it easily into Mailman's load/save mechanism.

-Barry
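
The hold-and-train rules described in this message reduce to two small
decisions, sketched below.  The function names and the string values
are hypothetical, chosen for illustration; they are not taken from the
prototype.

```python
def should_hold(classification):
    """Hold anything the classifier scored as unsure or spam."""
    return classification in ("unsure", "spam")


def training_label(admin_action, train_checked):
    """Map an admindb action to a training label, or None.

    Training happens only when the "Train?" box was ticked:
    approve trains as ham, reject or discard trains as spam.
    """
    if not train_checked:
        return None
    if admin_action == "approve":
        return "ham"
    if admin_action in ("reject", "discard"):
        return "spam"
    return None
```

A false positive is therefore fixed in one step: the admin approves the
held message with the box ticked and it is trained as ham.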

From tim.one at comcast.net  Sun Jan 19 00:27:10 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sun Jan 19 00:29:03 2003
Subject: [Spambayes] spambayes fronting a mailing list?
In-Reply-To: <15911.17641.101054.962896@gargle.gargle.HOWL>
Message-ID: <LNBBLJKPBEHFEDALKOLCMECKDJAB.tim.one@comcast.net>

[Rob]
> Doesn't it take time before the first spam arrives on a brand
> new mailinglist? Spambayes' results are going to be real
> lousy if it is trained on 200 ham and 0 spam messages....

[Barry]
> It might be, but how will that lousiness manifest?

It depends a lot on whether you enable the bool
experimental_ham_spam_imbalance_adjustment option.  If it's true, and you
have no spam, every msg will score exactly 0.5.

> As false negatives?

If experimental_ham_spam_imbalance_adjustment is false (still the default,
since I haven't touched the code since the option was introduced), yes.
Every word in the database will be associated with ham, so nothing is
evidence for spam.

> If so, the -spam reporting address for the list should
> eventually warm up the spam side, right?

Yes it will.  It's best to shoot for the same # of ham and spam, if for no
other reason than that then experimental_ham_spam_imbalance_adjustment has
no effect either way <0.6 wink>.

> Depending on how much your legitimate list traffic looks like spam
> already, it might warm up pretty quickly.

It won't look like spam.  Even if it "looks like spam" to human eyes, the
classifier will find many strong differences, some of which people will
never think of.  Hell, some differences people will even argue about, but
it's futile -- real-life data doesn't lie about real life <wink>.
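
The two outcomes described above (everything looks hammy, or everything
sits at 0.5) can be reproduced with a small worked example.  This is a
sketch of a Robinson-style word probability written from memory, not a
copy of classifier.py: the constants 0.5 and 0.45 and the exact form of
the imbalance adjustment are assumptions.

```python
UNKNOWN_WORD_PROB = 0.5       # assumed prior for an unseen word ("X")
UNKNOWN_WORD_STRENGTH = 0.45  # assumed weight given to that prior ("S")


def word_spamprob(hamcount, spamcount, nham, nspam,
                  imbalance_adjustment=False):
    """Estimate the spam probability of one word.

    hamcount/spamcount: messages of each kind containing the word.
    nham/nspam: total trained messages of each kind.
    """
    hamratio = hamcount / max(nham, 1)
    spamratio = spamcount / max(nspam, 1)
    if hamratio + spamratio == 0:
        raw = UNKNOWN_WORD_PROB
    else:
        raw = spamratio / (hamratio + spamratio)

    if imbalance_adjustment:
        # Discount evidence from the over-represented class.
        spam2ham = min(nspam / nham, 1.0) if nham else 1.0
        ham2spam = min(nham / nspam, 1.0) if nspam else 1.0
    else:
        spam2ham = ham2spam = 1.0

    n = hamcount * spam2ham + spamcount * ham2spam
    s, x = UNKNOWN_WORD_STRENGTH, UNKNOWN_WORD_PROB
    return (s * x + n * raw) / (s + n)


# 200 ham and 0 spam trained; the word appeared in 5 ham messages.
hammy = word_spamprob(5, 0, 200, 0)
neutral = word_spamprob(5, 0, 200, 0, imbalance_adjustment=True)
```

Under these assumptions `hammy` lands well below 0.5 (the word is
evidence for ham only), while `neutral` is 0.5 because the adjustment
scales the ham evidence by nspam/nham, which is zero.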


From vanhorn at whidbey.com  Sat Jan 18 22:37:54 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Sun Jan 19 01:37:57 2003
Subject: [Spambayes] spambayes fronting a mailing list?
References: <15910.18557.535408.669103@gargle.gargle.HOWL>
	<15911.18126.450611.521139@gargle.gargle.HOWL>
Message-ID: <3E2A47C2.B323EF93@whidbey.com>

Since I have about eight lists on my Mailman server that relate to real
estate, I'd certainly hate to see any mortgage or refinance spam show up
in the canned spam seed. I'm sure that there are hosts for medical
purposes who wouldn't want to have penis and breast enlargement spam in
the seed.

The list intro as ham makes sense, as might sending a message (or a
series) comprising the current list membership, those e-mail addresses are
certainly strong ham clues.

Van

"Barry A. Warsaw" wrote:

> >>>>> "TP" == Tim Peters <tim.one@comcast.net> writes:
>
>     TP> Better to start by training on a few spam, and a few copies of
>     TP> the list introduction msg (a decent intro msg necessarily
>     TP> contains many words and lexicalisms characteristic of the
>     TP> list's topic).
>
> See my previous message about initial training.  We may want to have
> some canned spam to train on when we enable spambayes.  Using the list
> intro message is a neat idea for when you have no posts available for
> the list.
>
> If, OTOH, people take Skip's advice and only turn it on when it's
> necessary, then maybe we can use messages we already have to train
> it.  One source of known good messages are those the admin has
> explicitly approved.  Maybe if we have 20 canned spam, we can save up
> to the last 20 approved messages.  Then when the list admin enables
> spambayes, we train on those.
>
> -Barry

--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD

For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------


From noreply at sourceforge.net  Sat Jan 18 22:51:27 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun Jan 19 10:32:06 2003
Subject: [Spambayes] [ spambayes-Feature Requests-670573 ] IMAP proxy
Message-ID: <E18a9Il-0003Xy-00@sc8-sf-web3.sourceforge.net>

Feature Requests item #670573, was opened at 2003-01-19 01:51
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=670573&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Jean-Marc Valin (jmvalin)
Assigned to: Nobody/Anonymous (nobody)
Summary: IMAP proxy

Initial Comment:
I use IMAP for my mail, so I think an IMAP proxy for
spambayes would be great.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=670573&group_id=61702

From francois.granger at free.fr  Sun Jan 19 19:23:42 2003
From: francois.granger at free.fr (François Granger)
Date: Sun Jan 19 13:23:49 2003
Subject: [Spambayes] Success and failure
In-Reply-To: <a05200f6fba4f5cf5f34a@[204.162.121.104]>
References: <a05200f06ba4f156d923f@[192.168.1.20]>
 <a05200f6fba4f5cf5f34a@[204.162.121.104]>
Message-ID: <a05200f19ba509d66ec48@[192.168.1.20]>

At 12:08 -0800 18/01/2003, in message Re: [Spambayes] Success and 
failure, Tony Lownds wrote:
>
>BTW, I've updated the patch for binding to a specific address and 
>posted it to Sourceforge: #670417

I had a look at SourceForge. It is listed there but I can't download the file.

Can you send it to me directly?

-- 
Recently using MacOSX.......

From richard at jowsey.com  Mon Jan 20 06:38:26 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Sun Jan 19 14:44:22 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEOADIAB.tim.one@comcast.net>
References: <3E25521B.20937.3607FFA@localhost>
Message-ID: <3E2B9962.26334.308D0BD@localhost>

> Upgrade to Python and you would have finished a couple months ago
> <wink>.

Yeah, that thought had occurred to me too... <grin>
 
> [chi-combining] This gives it some nice
> properties for automated decision making (the cutoff points for
> gary-combining were too touchy, across test sets, and across
> time).  But if you like a mode where you simply sort msgs by
> score, you can stop with gary-combining and be happy.

I have a very large training corpus, so I'm seeing well-
separated distributions of good versus spam probs, with a 
sprinkling of "unsures" scattered through the middle. An 
uncertain cutoff at 3 sigma from the means should work, but this 
notion needs some testing. That chi2 test is definitely on the 
drawing boards, even if only for comparison purposes...

Death To Spam!

Cheers,
Richard


From nas at python.ca  Sun Jan 19 16:13:44 2003
From: nas at python.ca (Neil Schemenauer)
Date: Sun Jan 19 19:07:43 2003
Subject: [Spambayes] pushing back the cost of spam
Message-ID: <20030120001344.GA6862@glacier.arctrix.com>

Here's an idea.  Do spam filtering at the transport level (i.e. "SMTP
time").  When a message is considered spam by the filter, return a
temporary error (i.e. 4xx).  Include the number of times the message
delivery has been retried and the time since the first attempt as part
of the evidence when filtering.  RFC 2821 specifies that messages should
be retried for at least 4 days.  If the message is still being retried
after, say, 2 days and is still flagged as spam by the filter then
accept it but save it in the spam folder.

I think if this system was widely implemented the spammer's job would
become considerably more difficult.  Spammers rely on hit and run
tactics and I don't think they could tolerate a one or two day delay.
Abused open relays would become heavily loaded due to all the queued
messages and the retries.  When the open relay was secured hopefully the
queue of spam would be cleared.  Also, I believe most email viruses do
not retry after a temporary error.

Finally, I'm guessing that a retry after one day would end up being a
strong ham clue.  Legitimate email that was initially considered spam
would have a better chance of not ending up in the spam folder.

Thoughts?

  Neil
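
The policy proposed in this message can be sketched as a pure decision
function.  The names, the 0.9 cutoff, and the clue-token format are
hypothetical; 451 is used as a representative 4xx temporary-failure
reply and 250 as the normal acceptance reply.

```python
from dataclasses import dataclass

TWO_DAYS = 2 * 24 * 3600  # seconds; the "say, 2 days" from the proposal


@dataclass
class DeliveryAttempt:
    spam_score: float           # classifier score in [0, 1]
    retries: int                # earlier attempts seen for this message
    seconds_since_first: float  # age of the first attempt


def smtp_response(attempt, spam_cutoff=0.9, give_up_after=TWO_DAYS):
    """Return (smtp_code, folder); folder is None when deferring."""
    if attempt.spam_score < spam_cutoff:
        return 250, "inbox"
    if attempt.seconds_since_first < give_up_after:
        return 451, None        # temporary error: sender should retry
    return 250, "spam"          # still spammy after the wait: file it


def retry_clues(attempt):
    """Extra tokens so retry behaviour can count as evidence."""
    days = int(attempt.seconds_since_first // 86400)
    return ["smtp-retries:%d" % attempt.retries,
            "smtp-delay-days:%d" % days]
```

Feeding retry_clues() into the tokenizer is what would let "retried
after one day" become a ham clue, so the score passed to
smtp_response() can drop on a later attempt.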

From nas at python.ca  Sun Jan 19 16:35:05 2003
From: nas at python.ca (Neil Schemenauer)
Date: Sun Jan 19 19:29:01 2003
Subject: [Spambayes] pushing back the cost of spam
In-Reply-To: <20030120001344.GA6862@glacier.arctrix.com>
References: <20030120001344.GA6862@glacier.arctrix.com>
Message-ID: <20030120003505.GB6862@glacier.arctrix.com>

I forgot to mention one advantage of this scheme.  It could be
implemented in modified form for an entire server of users (without
their help).  Only return a temporary error for something like 12 hours.
After that, allow the mail through.  That doesn't violate any standards
and all legitimate mail will get through.  I suspect a lot of spam would
be blocked (I'll try to run some tests).

Having a system that can be enabled server-wide is a big advantage.
Spambayes is great for technical people who don't want to see spam.
It's not really helping make spam unprofitable though, as Paul Graham
has mentioned in one of his articles.  We need to stop spam from
reaching those few idiots that actually act upon it.  I doubt those
people would install a spam filter themselves.  Either it has to be part
of the MUA or it needs to be installed by someone else.

  Neil

From john.abel at pa.press.net  Mon Jan 20 14:02:59 2003
From: john.abel at pa.press.net (John Abel)
Date: Mon Jan 20 09:05:09 2003
Subject: [Spambayes] Change Required To pspam/options.py
Message-ID: <3E2C0193.3040109@pa.press.net>

Hi,

I've been playing around with the pspam scripts, and found that, since the 
move-around, they are broken.

The line:

from Options import options, all_options, \
      boolean_cracker, float_cracker, int_cracker, string_cracker

needs changing to

from spambayes.Options import options, all_options, \
      boolean_cracker, float_cracker, int_cracker, string_cracker

I notice that this part of spambayes seems to be somewhat aimed at *nix 
distributions.  I would be willing to work on and test it on Win32.

Regards

John


From skip at pobox.com  Mon Jan 20 09:00:41 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 10:00:49 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access?
Message-ID: <15916.3865.297629.696625@montanaro.dyndns.org>


Depending on how training and classifying are accomplished, it's quite
possible that the two activities will be done in different processes.  For
example, I am currently experimenting with training using pop3proxy (well,
still my offshoot proxytrainer at the moment) while classification is being
done by hammiefilter run from procmail.  Seems to me we need to (be able
to) lock the shelve/pickle file used to store the training info.  The only
lock facility which seems cross-platform enough for this application is the
set of flags used by os.open().  To lock the database you'd have to
check/create a lock file related (namewise) to the actual database file.
Has anyone given this any thought?

Skip
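
A lock file built on the os.open() flags mentioned above could look
roughly like this.  It is a minimal sketch, not spambayes code: the
".lock" suffix, the timeout, and the function name are assumptions, and
it does nothing about stale locks left behind by a crashed process.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def lock_file(db_path, timeout=10.0, poll=0.1):
    """Hold an exclusive lock named after the database file."""
    lock_path = db_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so
            # only one process can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock %s" % db_path)
            time.sleep(poll)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))  # record the owner
        yield lock_path
    finally:
        os.unlink(lock_path)
```

Both the trainer and the classifier would wrap their database access in
"with lock_file(path):".  Exclusive creation has historically been
unreliable on some network filesystems, so this is best treated as a
local-disk scheme.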


From noreply at sourceforge.net  Mon Jan 20 03:35:25 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 20 10:01:59 2003
Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to
	bind to specific addresses
Message-ID: <E18aaD7-0002Pr-00@sc8-sf-web4.sourceforge.net>

Patches item #670417, was opened at 2003-01-18 20:06
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Lownds (tonylownds)
>Assigned to: Richie Hindle (richiehindle)
Summary: Allow the pop3 proxies to bind to specific addresses

Initial Comment:
This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting.

This is useful for two reasons:

1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security.

2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts.

The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string.


----------------------------------------------------------------------

>Comment By: Richie Hindle (richiehindle)
Date: 2003-01-20 11:35

Message:
Logged In: YES 
user_id=85414

Has SourceForge eaten the patch file?  It says
"No Files Currently Attached".


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

From francois.granger at free.fr  Mon Jan 20 14:13:05 2003
From: francois.granger at free.fr (François Granger)
Date: Mon Jan 20 10:02:19 2003
Subject: [Spambayes] [ spambayes-Patches-670417 ]
Message-ID: <BA51B471.61A00%francois.granger@free.fr>

I got the files from Tony by private mail yesterday night.


-- 
Email is a means of communication. People should ask themselves
about the political implications of the choices (or non-choices)
of their tools and technologies. For clean email:
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>

-------------- next part --------------
#!/usr/bin/env python

"""A POP3 proxy that works with classifier.py, and adds a simple
X-Spambayes-Classification header (ham/spam/unsure) to each incoming
email.  You point pop3proxy at your POP3 server, and configure your
email client to collect mail from the proxy then filter on the added
header.  Usage:

    pop3proxy.py [options] [<server> [<server port>]]
        <server> is the name of your real POP3 server
        <port>   is the port number of your real POP3 server, which
                 defaults to 110.

        options:
            -z      : Runs a self-test and exits.
            -t      : Runs a fake POP3 server on port 8110 (for testing).
            -h      : Displays this help message.

            -p FILE : use the named database file
            -d      : the database is a DBM file rather than a pickle
            -l port : proxy listens on this port number (default 110)
            -u port : User interface listens on this port number
                      (default 8880; Browse http://localhost:8880/)
            -b      : Launch a web browser showing the user interface.

        All command line arguments and switches take their default
        values from the [pop3proxy] and [html_ui] sections of
        bayescustomize.ini.

For safety, and to help debugging, the whole POP3 conversation is
written out to _pop3proxy.log for each run, if options.verbose is True.

To make rebuilding the database easier, uploaded messages are appended
to _pop3proxyham.mbox and _pop3proxyspam.mbox.
"""

# This module is part of the spambayes project, which is Copyright 2002
# The Python Software Foundation and is covered by the Python Software
# Foundation license.

__author__ = "Richie Hindle <richie@entrian.com>"
__credits__ = "Tim Peters, Neale Pickett, Tim Stone, all the Spambayes folk."

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0


todo = """

Web training interface:

 o Functional tests.
 o Review already-trained messages, and purge them.
 o Put in a link to view a message (plain text, html, multipart...?)
   Include a Reply link that launches the registered email client, eg.
   mailto:tim@fourstonesExpressions.com?subject=Re:%20pop3proxy&body=Hi%21%0D
 o Keyboard navigation (David Ascher).  But aren't Tab and left/right
   arrow enough?
 o [Francois Granger] Show the raw spamprob number close to the buttons
   (this would mean using the extra X-Hammie header by default).
 o Add Today and Refresh buttons on the Review page.


User interface improvements:

 o Once the pieces are on separate pages, make the paste box bigger.
 o Deployment: Windows executable?  atlaxwin and ctypes?  Or just
   webbrowser?
 o Can it cleanly dynamically update its status display while having a
   POP3 conversation?  Hammering reload sucks.
 o Save the stats (num classified, etc.) between sessions.
 o "Reload database" button.


New features:

 o "Send me an email every [...] to remind me to train on new
   messages."
 o "Send me a status email every [...] telling how many mails have been
   classified, etc."
 o Possibly integrate Tim Stone's SMTP code - make it use async, make
   the training code update (rather than replace!) the database.
 o Allow use of the UI without the POP3 proxy.
 o Remove any existing X-Spambayes-Classification header from incoming
   emails.
 o Whitelist.
 o Online manual.
 o Links to project homepage, mailing list, etc.
 o List of words with stats (it would have to be paged!) a la SpamSieve.


Code quality:

 o Make a separate Dibbler plugin for serving images, so there's no
   duplication between pop3proxy and OptionConfig.
 o Move the UI into its own module.
 o Cope with the email client timing out and closing the connection.
 o Lose the trailing dot from cached messages.


Info:

 o Slightly-wordy index page; intro paragraph for each page.
 o In both stats and training results, report nham and nspam - warn if
   they're very different (for some value of 'very').
 o "Links" section (on homepage?) to project homepage, mailing list,
   etc.


Gimmicks:

 o Classify a web page given a URL.
 o Graphs.  Of something.  Who cares what?
 o NNTP proxy.
 o Zoe...!

Notes, for the sake of somewhere better to put them:

Don't proxy spams at all?  This would mean writing a full POP3 client
and server - it would download all your mail on a timer and serve to you
all the non-spams.  It could be 'safe' in that it leaves the messages in
the real POP3 account until you collect them from it (or in the case of
spams, until you collect contemporaneous hams).  The web interface would
then present all the spams so that you could correct any FPs and mark
them for collection.  The thing is no longer a proxy (because the first
POP3 command in a conversation is STAT or LIST, which tells you how many
mails there are - it wouldn't know the answer, and finding out could
take weeks over a modem - I've already had problems with clients timing
out while the proxy was downloading stuff from the server).

Adam's idea: add checkboxes to a Google results list for "Relevant" /
"Irrelevant", then submit that to build a search including the
highest-scoring tokens and excluding the lowest-scoring ones.
"""

try:
    import cStringIO as StringIO
except ImportError:
    import StringIO

import os, sys, re, operator, errno, getopt, string, time, bisect
import socket, asyncore, asynchat, cgi, urlparse, webbrowser
import mailbox, email.Header
import spambayes
from spambayes import storage, tokenizer, mboxutils, PyMeldLite, Dibbler
from spambayes.FileCorpus import FileCorpus, ExpiryFileCorpus
from spambayes.FileCorpus import FileMessageFactory, GzipFileMessageFactory
from email.Iterators import typed_subpart_iterator
from OptionConfig import OptionsConfigurator
from spambayes.Options import options

# HEADER_EXAMPLE is the longest possible header - the length of this one
# is added to the size of each message.
HEADER_FORMAT = '%s: %%s\r\n' % options.hammie_header_name
HEADER_EXAMPLE = '%s: xxxxxxxxxxxxxxxxxxxx\r\n' % options.hammie_header_name

IMAGES = ('helmet', 'status', 'config',
          'message', 'train', 'classify', 'query')

class ServerLineReader(Dibbler.BrighterAsyncChat):
    """An async socket that reads lines from a remote server and
    simply calls a callback with the data.  The BayesProxy object
    can't connect to the real POP3 server and talk to it
    synchronously, because that would block the process."""

    def __init__(self, serverName, serverPort, lineCallback):
        Dibbler.BrighterAsyncChat.__init__(self)
        self.lineCallback = lineCallback
        self.request = ''
        self.set_terminator('\r\n')
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.connect((serverName, serverPort))
        except socket.error, e:
            error = "Can't connect to %s:%d: %s" % (serverName, serverPort, e)
            print >>sys.stderr, error
            self.lineCallback('-ERR %s\r\n' % error)
            self.lineCallback('')   # "The socket's been closed."
            self.close()

    def collect_incoming_data(self, data):
        self.request = self.request + data

    def found_terminator(self):
        self.lineCallback(self.request + '\r\n')
        self.request = ''

    def handle_close(self):
        self.lineCallback('')
        self.close()


class POP3ProxyBase(Dibbler.BrighterAsyncChat):
    """An async dispatcher that understands POP3 and proxies to a POP3
    server, calling `self.onTransaction(request, response)` for each
    transaction. Responses are not un-byte-stuffed before reaching
    self.onTransaction() (they probably should be for a totally generic
    POP3ProxyBase class, but BayesProxy doesn't need it and it would
    mean re-stuffing them afterwards).  self.onTransaction() should
    return the response to pass back to the email client - the response
    can be the verbatim response or a processed version of it.  The
    special command 'KILL' kills it (passing a 'QUIT' command to the
    server).
    """

    def __init__(self, clientSocket, serverName, serverPort):
        Dibbler.BrighterAsyncChat.__init__(self, clientSocket)
        self.request = ''
        self.response = ''
        self.set_terminator('\r\n')
        self.command = ''           # The POP3 command being processed...
        self.args = ''              # ...and its arguments
        self.isClosing = False      # Has the server closed the socket?
        self.seenAllHeaders = False # For the current RETR or TOP
        self.startTime = 0          # (ditto)
        self.serverSocket = ServerLineReader(serverName, serverPort,
                                             self.onServerLine)

    def onTransaction(self, command, args, response):
        """Override this.  Takes the raw request and the response, and
        returns the (possibly processed) response to pass back to the
        email client.
        """
        raise NotImplementedError

    def onServerLine(self, line):
        """A line of response has been received from the POP3 server."""
        isFirstLine = not self.response
        self.response = self.response + line

        # Is this the line that terminates a set of headers?
        self.seenAllHeaders = self.seenAllHeaders or line in ['\r\n', '\n']

        # Has the server closed its end of the socket?
        if not line:
            self.isClosing = True

        # If we're not processing a command, just echo the response.
        if not self.command:
            self.push(self.response)
            self.response = ''

        # Time out after 30 seconds for message-retrieval commands if
        # all the headers are down.  The rest of the message will proxy
        # straight through.
        if self.command in ['TOP', 'RETR'] and \
           self.seenAllHeaders and time.time() > self.startTime + 30:
            self.onResponse()
            self.response = ''
        # If that's a complete response, handle it.
        elif not self.isMultiline() or line == '.\r\n' or \
           (isFirstLine and line.startswith('-ERR')):
            self.onResponse()
            self.response = ''

    def isMultiline(self):
        """Returns True if the request should get a multiline
        response (assuming the response is positive).
        """
        if self.command in ['USER', 'PASS', 'APOP', 'QUIT',
                            'STAT', 'DELE', 'NOOP', 'RSET', 'KILL']:
            return False
        elif self.command in ['RETR', 'TOP']:
            return True
        elif self.command in ['LIST', 'UIDL']:
            return len(self.args) == 0
        else:
            # Assume that an unknown command will get a single-line
            # response.  This should work for errors and for POP-AUTH,
            # and is harmless even for multiline responses - the first
            # line will be passed to onTransaction and ignored, then the
            # rest will be proxied straight through.
            return False

    ## This is an attempt to solve the problem whereby the email client
    ## times out and closes the connection but the ServerLineReader is still
    ## connected, so you get errors from the POP3 server next time because
    ## there's already an active connection.  But after introducing this,
    ## I kept getting unexplained "Bad file descriptor" errors in recv.
    ##
    ## def handle_close(self):
    ##     """If the email client closes the connection unexpectedly, eg.
    ##     because of a timeout, close the server connection."""
    ##     self.serverSocket.shutdown(2)
    ##     self.serverSocket.close()
    ##     self.close()

    def collect_incoming_data(self, data):
        """Asynchat override."""
        self.request = self.request + data

    def found_terminator(self):
        """Asynchat override."""
        verb = self.request.strip().upper()
        if verb == 'KILL':
            self.socket.shutdown(2)
            self.close()
            raise SystemExit
        elif verb == 'CRASH':
            # For testing
            x = 0
            y = 1/x

        self.serverSocket.push(self.request + '\r\n')
        if self.request.strip() == '':
            # Someone just hit the Enter key.
            self.command = self.args = ''
        else:
            # A proper command.
            splitCommand = self.request.strip().split(None, 1)
            self.command = splitCommand[0].upper()
            self.args = splitCommand[1:]
            self.startTime = time.time()

        self.request = ''

    def onResponse(self):
        # Pass the request and the raw response to the subclass and
        # send back the cooked response.
        if self.response:
            cooked = self.onTransaction(self.command, self.args, self.response)
            self.push(cooked)

        # If onServerLine() decided that the server has closed its
        # socket, close this one when the response has been sent.
        if self.isClosing:
            self.close_when_done()

        # Reset.
        self.command = ''
        self.args = ''
        self.isClosing = False
        self.seenAllHeaders = False


class BayesProxyListener(Dibbler.Listener):
    """Listens for incoming email client connections and spins off
    BayesProxy objects to serve them.
    """

    def __init__(self, serverName, serverPort, proxyPort):
        proxyArgs = (serverName, serverPort)
        Dibbler.Listener.__init__(self, proxyPort, BayesProxy, proxyArgs)
        print 'Listener on port %s is proxying %s:%d' % \
               (_addressPortStr(proxyPort), serverName, serverPort)


class BayesProxy(POP3ProxyBase):
    """Proxies between an email client and a POP3 server, inserting
    judgement headers.  It acts on the following POP3 commands:

     o STAT:
        o Adds the size of all the judgement headers to the maildrop
          size.

     o LIST:
        o With no message number: adds the size of a judgement header
          to the message size for each message in the scan listing.
        o With a message number: adds the size of a judgement header
          to the message size.

     o RETR:
        o Adds the judgement header based on the raw headers and body
          of the message.

     o TOP:
        o Adds the judgement header based on the raw headers and as
          much of the body as the TOP command retrieves.  This can
          mean that the header might have a different value for
          different calls to TOP, or for calls to TOP vs. calls to
          RETR.  I'm assuming that the email client will either not
          make multiple calls, or will cope with the headers being
          different.
    """

    def __init__(self, clientSocket, serverName, serverPort):
        POP3ProxyBase.__init__(self, clientSocket, serverName, serverPort)
        self.handlers = {'STAT': self.onStat, 'LIST': self.onList,
                         'RETR': self.onRetr, 'TOP': self.onTop}
        state.totalSessions += 1
        state.activeSessions += 1
        self.isClosed = False

    def send(self, data):
        """Logs the data to the log file (if verbose), then sends it."""
        if options.verbose:
            state.logFile.write(data)
            state.logFile.flush()
        try:
            return POP3ProxyBase.send(self, data)
        except socket.error:
            # The email client has closed the connection - 40tude Dialog
            # does this immediately after issuing a QUIT command,
            # without waiting for the response.
            self.close()

    def recv(self, size):
        """Receives data, logging it to the log file (if verbose)."""
        data = POP3ProxyBase.recv(self, size)
        if options.verbose:
            state.logFile.write(data)
            state.logFile.flush()
        return data

    def close(self):
        # This can be called multiple times by async.
        if not self.isClosed:
            self.isClosed = True
            state.activeSessions -= 1
            POP3ProxyBase.close(self)

    def onTransaction(self, command, args, response):
        """Takes the raw request and response, and returns the
        (possibly processed) response to pass back to the email client.
        """
        handler = self.handlers.get(command, self.onUnknown)
        return handler(command, args, response)

    def onStat(self, command, args, response):
        """Adds the size of all the judgement headers to the maildrop
        size."""
        match = re.search(r'^\+OK\s+(\d+)\s+(\d+)(.*)\r\n', response)
        if match:
            count = int(match.group(1))
            size = int(match.group(2)) + len(HEADER_EXAMPLE) * count
            return '+OK %d %d%s\r\n' % (count, size, match.group(3))
        else:
            return response

    def onList(self, command, args, response):
        """Adds the size of a judgement header to the message
        size(s)."""
        if response.count('\r\n') > 1:
            # Multiline: all lines but the first contain a message size.
            lines = response.split('\r\n')
            outputLines = [lines[0]]
            for line in lines[1:]:
                match = re.search(r'^(\d+)\s+(\d+)', line)
                if match:
                    number = int(match.group(1))
                    size = int(match.group(2)) + len(HEADER_EXAMPLE)
                    line = "%d %d" % (number, size)
                outputLines.append(line)
            return '\r\n'.join(outputLines)
        else:
            # Single line.
            match = re.search(r'^\+OK\s+(\d+)(.*)\r\n', response)
            if match:
                size = int(match.group(1)) + len(HEADER_EXAMPLE)
                return "+OK %d%s\r\n" % (size, match.group(2))
            else:
                return response

    def onRetr(self, command, args, response):
        """Adds the judgement header based on the raw headers and body
        of the message."""
        # Use '\n\r?\n' to detect the end of the headers in case of
        # broken emails that don't use the proper line separators.
        if re.search(r'\n\r?\n', response):
            # Break off the first line, which will be '+OK'.
            ok, messageText = response.split('\n', 1)

            # Now find the spam disposition and add the header.
            prob = state.bayes.spamprob(tokenizer.tokenize(messageText))
            if prob < options.ham_cutoff:
                disposition = options.header_ham_string
                if command == 'RETR':
                    state.numHams += 1
            elif prob > options.spam_cutoff:
                disposition = options.header_spam_string
                if command == 'RETR':
                    state.numSpams += 1
            else:
                disposition = options.header_unsure_string
                if command == 'RETR':
                    state.numUnsure += 1

            header = '%s: %s\r\n' % (options.hammie_header_name, disposition)
            headers, body = re.split(r'\n\r?\n', messageText, 1)
            headers = headers + "\n" + header + "\r\n"
            messageText = headers + body

            # Cache the message; don't pollute the cache with test messages.
            if command == 'RETR' and not state.isTest:
                # The message name is the time it arrived, with a uniquifier
                # appended if two arrive within one clock tick of each other.
                messageName = "%10.10d" % long(time.time())
                if messageName == state.lastBaseMessageName:
                    state.lastBaseMessageName = messageName
                    messageName = "%s-%d" % (messageName, state.uniquifier)
                    state.uniquifier += 1
                else:
                    state.lastBaseMessageName = messageName
                    state.uniquifier = 2

                # Write the message into the Unknown cache.
                message = state.unknownCorpus.makeMessage(messageName)
                message.setSubstance(messageText)
                state.unknownCorpus.addMessage(message)

            # Return the +OK and the message with the header added.
            return ok + "\n" + messageText

        else:
            # Must be an error response.
            return response

    def onTop(self, command, args, response):
        """Adds the judgement header based on the raw headers and as
        much of the body as the TOP command retrieves."""
        # Easy (but see the caveat in BayesProxy.__doc__).
        return self.onRetr(command, args, response)

    def onUnknown(self, command, args, response):
        """Default handler; returns the server's response verbatim."""
        return response


class UserInterfaceServer(Dibbler.HTTPServer):
    """Implements the web server component via a Dibbler plugin."""

    def __init__(self, uiPort):
        Dibbler.HTTPServer.__init__(self, uiPort)
        print 'User interface url is http://localhost:%d/' % (uiPort)


def readUIResources():
    """Returns ui.html and a dictionary of GIFs.  Used here and by
    OptionConfig."""

    # Using `exec` is nasty, but I couldn't figure out a way of making
    # `getattr` or `__import__` work with ResourcePackage.
    from spambayes.resources import ui_html
    images = {}
    for imageName in IMAGES:
        exec "from spambayes.resources import %s_gif" % imageName
        exec "images[imageName] = %s_gif.data" % imageName
    return ui_html.data, images


class UserInterface(Dibbler.HTTPPlugin):
    """Serves the HTML user interface of the proxy."""

    def __init__(self):
        """Load up the necessary resources: ui.html and the GIF images."""
        Dibbler.HTTPPlugin.__init__(self)
        htmlSource, self._images = readUIResources()
        self.html = PyMeldLite.Meld(htmlSource, readonly=True)

    def onIncomingConnection(self, clientSocket):
        """Checks the security settings."""
        return options.html_ui_allow_remote_connections or \
               clientSocket.getpeername()[0] == clientSocket.getsockname()[0]

    def _writePreamble(self, name, showImage=True):
        """Writes the HTML for the beginning of a page - time-consuming
        methlets use this and `_writePostamble` to write the page in
        pieces, including progress messages."""

        # Take the whole palette and remove the content and the footer,
        # leaving the header and an empty body.
        html = self.html.clone()
        html.mainContent = " "
        del html.footer

        # Add in the name of the page and remove the link to Home if this
        # *is* Home.
        html.title = name
        if name == 'Home':
            del html.homelink
            html.pagename = "Home"
        else:
            html.pagename = "> " + name

        # Remove the helmet image if we're not showing it - this happens on
        # shutdown because the browser might ask for the image after we've
        # exited.
        if not showImage:
            del html.helmet

        # Strip the closing tags, so we push as far as the start of the main
        # content.  We'll push the closing tags at the end.
        self.writeOKHeaders('text/html')
        self.write(re.sub(r'</div>\s*</body>\s*</html>', '', str(html)))

    def _writePostamble(self):
        """Writes the end of time-consuming pages - see `_writePreamble`."""
        footer = self.html.footer.clone()
        footer.timestamp = time.asctime(time.localtime())
        # Write the timestamped clone, not the pristine template footer.
        self.write("</div>" + str(footer))
        self.write("</body></html>")

    def _trimHeader(self, field, limit, quote=False):
        """Trims a string, adding an ellipsis if necessary and HTML-quoting
        on request.  Also pumps it through email.Header.decode_header, which
        understands charset sections in email headers - I suspect this will
        only work for Latin character sets, but hey, it works for Francois
        Granger's name.  8-)"""

        sections = email.Header.decode_header(field)
        field = ' '.join([text for text, unused in sections])
        if len(field) > limit:
            field = field[:limit-3] + "..."
        if quote:
            field = cgi.escape(field)
        return field

    def onHome(self):
        """Serve up the homepage."""
        stateDict = state.__dict__.copy()
        stateDict.update(state.bayes.__dict__)
        statusTable = self.html.statusTable.clone()
        if not state.servers:
            statusTable.proxyDetails = "No POP3 proxies running."
        content = (self._buildBox('Status and Configuration',
                                  'status.gif', statusTable % stateDict)+
                   self._buildBox('Train on proxied messages',
                                  'train.gif', self.html.reviewText) +
                   self._buildTrainBox() +
                   self._buildClassifyBox() +
                   self._buildBox('Word query', 'query.gif',
                                  self.html.wordQuery))
        self._writePreamble("Home")
        self.write(content)
        self._writePostamble()

    def _doSave(self):
        """Saves the database."""
        self.write("<b>Saving... ")
        self.flush()
        state.bayes.store()
        self.write("Done</b>.\n")

    def onSave(self, how):
        """Command handler for "Save" and "Save & shutdown"."""
        isShutdown = how.lower().find('shutdown') >= 0
        self._writePreamble("Save", showImage=(not isShutdown))
        self._doSave()
        if isShutdown:
            self.write("<p>%s</p>" % self.html.shutdownMessage)
            self.write("</div></body></html>")
            self.flush()
            ## Is this still required?: self.shutdown(2)
            self.close()
            raise SystemExit
        self._writePostamble()

    def onTrain(self, file, text, which):
        """Train on an uploaded or pasted message."""
        self._writePreamble("Train")

        # Upload or paste?  Spam or ham?
        content = file or text
        isSpam = (which == 'Train as Spam')

        # Convert platform-specific line endings into unix-style.
        content = content.replace('\r\n', '\n').replace('\r', '\n')

        # Single message or mbox?
        if content.startswith('From '):
            # Get a list of raw messages from the mbox content.
            class SimpleMessage:
                def __init__(self, fp):
                    self.guts = fp.read()
            contentFile = StringIO.StringIO(content)
            mbox = mailbox.PortableUnixMailbox(contentFile, SimpleMessage)
            messages = map(lambda m: m.guts, mbox)
        else:
            # Just the one message.
            messages = [content]

        # Append the message(s) to a file, to make it easier to rebuild
        # the database later.   This is a temporary implementation -
        # it should keep a Corpus of trained messages.
        if isSpam:
            f = open("_pop3proxyspam.mbox", "a")
        else:
            f = open("_pop3proxyham.mbox", "a")

        # Train on the uploaded message(s).
        self.write("<b>Training...</b>\n")
        self.flush()
        for message in messages:
            tokens = tokenizer.tokenize(message)
            state.bayes.learn(tokens, isSpam)
            f.write("From pop3proxy@spambayes.org Sat Jan 31 00:00:00 2000\n")
            f.write(message)
            f.write("\n\n")

        # Save the database and return a link Home and another training form.
        f.close()
        self._doSave()
        self.write("<p>OK. Return <a href='home'>Home</a> or train again:</p>")
        self.write(self._buildTrainBox())
        self._writePostamble()

    def _keyToTimestamp(self, key):
        """Given a message key (as seen in a Corpus), returns the timestamp
        for that message.  This is the time that the message was received,
        not the Date header."""
        return long(key[:10])

    def _getTimeRange(self, timestamp):
        """Given a unix timestamp, returns a 3-tuple: the start timestamp
        of the given day, the end timestamp of the given day, and the
        formatted date of the given day."""
        # This probably works on Summertime-shift days; time will tell.  8-)
        this = time.localtime(timestamp)
        start = (this[0], this[1], this[2], 0, 0, 0, this[6], this[7], this[8])
        end = time.localtime(time.mktime(start) + 36*60*60)
        end = (end[0], end[1], end[2], 0, 0, 0, end[6], end[7], end[8])
        date = time.strftime("%A, %B %d, %Y", start)
        return time.mktime(start), time.mktime(end), date

    def _buildReviewKeys(self, timestamp):
        """Builds an ordered list of untrained message keys, ready for output
        in the Review list.  Returns a 5-tuple: the keys, the formatted date
        for the list (eg. "Friday, November 15, 2002"), the start of the prior
        page or zero if there isn't one, likewise the start of the given page,
        and likewise the start of the next page."""
        # Fetch all the message keys and sort them into timestamp order.
        allKeys = state.unknownCorpus.keys()
        allKeys.sort()

        # The default start timestamp is derived from the most recent message,
        # or the system time if there are no messages (not that it gets used).
        if not timestamp:
            if allKeys:
                timestamp = self._keyToTimestamp(allKeys[-1])
            else:
                timestamp = time.time()
        start, end, date = self._getTimeRange(timestamp)

        # Find the subset of the keys within this range.
        startKeyIndex = bisect.bisect(allKeys, "%d" % long(start))
        endKeyIndex = bisect.bisect(allKeys, "%d" % long(end))
        keys = allKeys[startKeyIndex:endKeyIndex]
        keys.reverse()

        # What timestamps to use for the prior and next days?  If there are
        # any messages before/after this day's range, use the timestamps of
        # those messages - this will skip empty days.
        prior = end = 0
        if startKeyIndex != 0:
            prior = self._keyToTimestamp(allKeys[startKeyIndex-1])
        if endKeyIndex != len(allKeys):
            end = self._keyToTimestamp(allKeys[endKeyIndex])

        # Return the keys and their date.
        return keys, date, prior, start, end

    def _makeMessageInfo(self, message):
        """Given an email.Message, return an object with subjectHeader,
        fromHeader and bodySummary attributes.  These objects are passed into
        appendMessages by onReview - passing email.Message objects directly
        uses too much memory."""
        subjectHeader = message["Subject"] or "(none)"
        fromHeader = message["From"] or "(none)"
        try:
            part = typed_subpart_iterator(message, 'text', 'plain').next()
            text = part.get_payload()
        except StopIteration:
            try:
                part = typed_subpart_iterator(message, 'text', 'html').next()
                text = part.get_payload()
                text, unused = tokenizer.crack_html_style(text)
                text, unused = tokenizer.crack_html_comment(text)
                text = tokenizer.html_re.sub(' ', text)
                text = '(this message only has an HTML body)\n' + text
            except StopIteration:
                text = '(this message has no text body)'
        text = text.replace('&nbsp;', ' ')      # Else they'll be quoted
        text = re.sub(r'(\s)\s+', r'\1', text)  # Eg. multiple blank lines
        text = text.strip()

        class _MessageInfo:
            pass
        messageInfo = _MessageInfo()
        messageInfo.subjectHeader = self._trimHeader(subjectHeader, 50, True)
        messageInfo.fromHeader = self._trimHeader(fromHeader, 40, True)
        messageInfo.bodySummary = self._trimHeader(text, 200)
        return messageInfo

    def _appendMessages(self, table, keyedMessageInfo, label):
        """Appends the rows of a table of messages to 'table'."""
        stripe = 0
        for key, messageInfo in keyedMessageInfo:
            row = self.html.reviewRow.clone()
            if label == 'Spam':
                row.spam.checked = 1
            elif label == 'Ham':
                row.ham.checked = 1
            else:
                row.defer.checked = 1
            row.subject = messageInfo.subjectHeader
            row.subject.title = messageInfo.bodySummary
            row.from_ = messageInfo.fromHeader
            setattr(row, 'class', ['stripe_on', 'stripe_off'][stripe]) # Grr!
            row = str(row).replace('TYPE', label).replace('KEY', key)
            table += row
            stripe = stripe ^ 1

    def onReview(self, **params):
        """Present a list of messages for (re)training."""
        # Train/discard submitted messages.
        self._writePreamble("Review")
        id = ''
        numTrained = 0
        numDeferred = 0
        for key, value in params.items():
            if key.startswith('classify:'):
                id = key.split(':')[2]
                if value == 'spam':
                    targetCorpus = state.spamCorpus
                elif value == 'ham':
                    targetCorpus = state.hamCorpus
                elif value == 'discard':
                    targetCorpus = None
                    try:
                        state.unknownCorpus.removeMessage(state.unknownCorpus[id])
                    except KeyError:
                        pass  # Must be a reload.
                else: # defer
                    targetCorpus = None
                    numDeferred += 1
                if targetCorpus:
                    try:
                        targetCorpus.takeMessage(id, state.unknownCorpus)
                        if numTrained == 0:
                            self.write("<p><b>Training... ")
                            self.flush()
                        numTrained += 1
                    except KeyError:
                        pass  # Must be a reload.

        # Report on any training, and save the database if there was any.
        if numTrained > 0:
            plural = ''
            if numTrained != 1:
                plural = 's'
            self.write("Trained on %d message%s. " % (numTrained, plural))
            self._doSave()
            self.write("<br>&nbsp;")

        # If any messages were deferred, show the same page again.
        if numDeferred > 0:
            start = self._keyToTimestamp(id)

        # Else after submitting a whole page, display the prior page or the
        # next one.  Derive the day of the submitted page from the ID of the
        # last processed message.
        elif id:
            start = self._keyToTimestamp(id)
            unused, unused, prior, unused, next = self._buildReviewKeys(start)
            if prior:
                start = prior
            else:
                start = next

        # Else if they've hit Previous or Next, display that page.
        elif params.get('go') == 'Next day':
            start = self._keyToTimestamp(params['next'])
        elif params.get('go') == 'Previous day':
            start = self._keyToTimestamp(params['prior'])

        # Else show the most recent day's page, as decided by _buildReviewKeys.
        else:
            start = 0

        # Build the lists of messages: spams, hams and unsure.
        keys, date, prior, this, next = self._buildReviewKeys(start)
        keyedMessageInfo = {options.header_spam_string: [],
                            options.header_ham_string: [],
                            options.header_unsure_string: []}
        for key in keys:
            # Parse the message, get the judgement header and build a message
            # info object for each message.
            cachedMessage = state.unknownCorpus[key]
            message = mboxutils.get_message(cachedMessage.getSubstance())
            judgement = message[options.hammie_header_name] or \
                                            options.header_unsure_string
            messageInfo = self._makeMessageInfo(message)
            keyedMessageInfo[judgement].append((key, messageInfo))

        # Present the list of messages in their groups in reverse order of
        # appearance.
        if keys:
            page = self.html.reviewtable.clone()
            if prior:
                page.prior.value = prior
                del page.priorButton.disabled
            if next:
                page.next.value = next
                del page.nextButton.disabled
            templateRow = page.reviewRow.clone()
            page.table = ""  # To make way for the real rows.
            for header, label in ((options.header_spam_string, 'Spam'),
                                  (options.header_ham_string, 'Ham'),
                                  (options.header_unsure_string, 'Unsure')):
                messages = keyedMessageInfo[header]
                if messages:
                    subHeader = str(self.html.reviewSubHeader)
                    subHeader = subHeader.replace('TYPE', label)
                    page.table += self.html.blankRow
                    page.table += subHeader
                    self._appendMessages(page.table, messages, label)

            page.table += self.html.trainRow
            title = "Untrained messages received on %s" % date
            box = self._buildBox(title, None, page)  # No icon, to save space.
        else:
            page = "<p>There are no untrained messages to display. "
            page += "Return <a href='home'>Home</a>.</p>"
            title = "No untrained messages"
            box = self._buildBox(title, 'status.gif', page)

        self.write(box)
        self._writePostamble()

    def onClassify(self, file, text, which):
        """Classify an uploaded or pasted message."""
        message = file or text
        message = message.replace('\r\n', '\n').replace('\r', '\n') # For Macs
        tokens = tokenizer.tokenize(message)
        probability, clues = state.bayes.spamprob(tokens, evidence=True)

        cluesTable = self.html.cluesTable.clone()
        cluesRow = cluesTable.cluesRow.clone()
        del cluesTable.cluesRow   # Delete dummy row to make way for real ones
        for word, wordProb in clues:
            cluesTable += cluesRow % (word, wordProb)

        results = self.html.classifyResults.clone()
        results.probability = probability
        results.cluesBox = self._buildBox("Clues:", 'status.gif', cluesTable)
        results.classifyAnother = self._buildClassifyBox()
        self._writePreamble("Classify")
        self.write(results)
        self._writePostamble()

    def onWordquery(self, word):
        """Show the spam and ham statistics for a single word."""
        word = word.lower()
        wordinfo = state.bayes._wordinfoget(word)
        if wordinfo:
            stats = self.html.wordStats.clone()
            stats.spamcount = wordinfo.spamcount
            stats.hamcount = wordinfo.hamcount
            stats.spamprob = state.bayes.probability(wordinfo)
        else:
            stats = "%r does not exist in the database." % word

        query = self.html.wordQuery.clone()
        query.word.value = word
        statsBox = self._buildBox("Statistics for %r" % word,
                                  'status.gif', stats)
        queryBox = self._buildBox("Word query", 'query.gif', query)
        self._writePreamble("Word query")
        self.write(statsBox + queryBox)
        self._writePostamble()

    def _writeImage(self, image):
        self.writeOKHeaders('image/gif')
        self.write(self._images[image])

    # If you are easily offended, look away now...
    for imageName in IMAGES:
        exec "def %s(self): self._writeImage('%s')" % \
             ("on%sGif" % imageName.capitalize(), imageName)

    def _buildBox(self, heading, icon, content):
        """Builds a yellow-headed HTML box."""
        box = self.html.headedBox.clone()
        box.heading = heading
        if icon:
            box.icon.src = icon
        else:
            del box.iconCell
        box.boxContent = content
        return box

    def _buildClassifyBox(self):
        """Returns a "Classify a message" box.  This is used on both the Home
        page and the classify results page.  The Classify form is based on the
        Upload form."""

        form = self.html.upload.clone()
        del form.or_mbox
        del form.submit_spam
        del form.submit_ham
        form.action = "classify"
        return self._buildBox("Classify a message", 'classify.gif', form)

    def _buildTrainBox(self):
        """Returns a "Train on a given message" box.  This is used on both
        the Home page and the training results page.  The Train form is
        based on the Upload form."""

        form = self.html.upload.clone()
        del form.submit_classify
        return self._buildBox("Train on a given message", 'message.gif', form)

    def reReadOptions(self):
        """Called by the config page when the user saves some new options, or
        restores the defaults."""
        # Reload the options.
        global state
        state.bayes.store()
        reload(spambayes.Options)
        global options
        from spambayes.Options import options

        # Recreate the state.
        state = State()
        state.buildServerStrings()
        state.createWorkers()

        # Close the existing listeners and create new ones.  This won't
        # affect any running proxies - once a listener has created a proxy,
        # that proxy is then independent of it.
        for proxy in proxyListeners:
            proxy.close()
        del proxyListeners[:]
        _createProxies(state.servers, state.proxyPorts)


# This keeps the global state of the module - the command-line options,
# statistics like how many mails have been classified, the handle of the
# log file, the Classifier and FileCorpus objects, and so on.
class State:
    def __init__(self):
        """Initialises the State object that holds the state of the app.
        The default settings are read from Options.py and bayescustomize.ini
        and are then overridden by the command-line processing code in the
        __main__ code below."""
        # Open the log file.
        if options.verbose:
            self.logFile = open('_pop3proxy.log', 'wb', 0)

        # Load up the old proxy settings from Options.py / bayescustomize.ini
        # and give warnings if they're present.   XXX Remove these soon.
        self.servers = []
        self.proxyPorts = []
        if options.pop3proxy_port != 110 or \
           options.pop3proxy_server_name != '' or \
           options.pop3proxy_server_port != 110:
            print "\n    pop3proxy_port, pop3proxy_server_name and"
            print "    pop3proxy_server_port are deprecated!  Please use"
            print "    pop3proxy_servers and pop3proxy_ports instead.\n"
            self.servers = [(options.pop3proxy_server_name,
                             options.pop3proxy_server_port)]
            self.proxyPorts = [options.pop3proxy_port]

        # Load the new proxy settings - these will override the old ones
        # if both are present.
        if options.pop3proxy_servers:
            for server in options.pop3proxy_servers.split(','):
                server = server.strip()
                if server.find(':') > -1:
                    server, port = server.split(':', 1)
                else:
                    port = '110'
                self.servers.append((server, int(port)))

        if options.pop3proxy_ports:
            splitPorts = options.pop3proxy_ports.split(',')
            self.proxyPorts = map(_addressAndPort, splitPorts)

        if len(self.servers) != len(self.proxyPorts):
            print "pop3proxy_servers & pop3proxy_ports are different lengths!"
            sys.exit()

        # Load up the other settings from Options.py / bayescustomize.ini
        self.useDB = options.pop3proxy_persistent_use_database
        self.uiPort = options.html_ui_port
        self.launchUI = options.html_ui_launch_browser
        self.gzipCache = options.pop3proxy_cache_use_gzip
        self.cacheExpiryDays = options.pop3proxy_cache_expiry_days
        self.runTestServer = False
        self.isTest = False

        # Set up the statistics.
        self.totalSessions = 0
        self.activeSessions = 0
        self.numSpams = 0
        self.numHams = 0
        self.numUnsure = 0

        # Unique names for cached messages - see BayesProxy.onRetr
        self.lastBaseMessageName = ''
        self.uniquifier = 2

    def buildServerStrings(self):
        """After the server details have been set up, this creates string
        versions of the details, for display in the Status panel."""
        serverStrings = ["%s:%s" % (s, p) for s, p in self.servers]
        self.serversString = ', '.join(serverStrings)
        self.proxyPortsString = ', '.join(map(_addressPortStr, self.proxyPorts))

    def createWorkers(self):
        """Using the options that were initialised in __init__ and then
        possibly overridden by the driver code, create the Bayes object,
        the Corpuses, the Trainers and so on."""
        print "Loading database...",
        if self.isTest:
            self.useDB = True
            options.pop3proxy_persistent_storage_file = \
                        '_pop3proxy_test.pickle'   # This is never saved.
        if self.useDB:
            self.bayes = storage.DBDictClassifier(
                                options.pop3proxy_persistent_storage_file)
        else:
            self.bayes = storage.PickledClassifier(
                                options.pop3proxy_persistent_storage_file)
        print "Done."

        # Don't set up the caches and training objects when running the self-test,
        # so as not to clutter the filesystem.
        if not self.isTest:
            def ensureDir(dirname):
                try:
                    os.mkdir(dirname)
                except OSError, e:
                    if e.errno != errno.EEXIST:
                        raise

            # Create/open the Corpuses.  Use small cache sizes to avoid hogging
            # lots of memory.
            map(ensureDir, [options.pop3proxy_spam_cache,
                            options.pop3proxy_ham_cache,
                            options.pop3proxy_unknown_cache])
            if self.gzipCache:
                factory = GzipFileMessageFactory()
            else:
                factory = FileMessageFactory()
            age = options.pop3proxy_cache_expiry_days*24*60*60
            self.spamCorpus = ExpiryFileCorpus(age, factory,
                                               options.pop3proxy_spam_cache,
                                               '[0-9]*',
                                               cacheSize=20)
            self.hamCorpus = ExpiryFileCorpus(age, factory,
                                              options.pop3proxy_ham_cache,
                                              '[0-9]*',
                                              cacheSize=20)
            self.unknownCorpus = FileCorpus(factory,
                                            options.pop3proxy_unknown_cache,
                                            cacheSize=20)

            # Expire old messages from the trained corpuses.
            self.spamCorpus.removeExpiredMessages()
            self.hamCorpus.removeExpiredMessages()

            # Create the Trainers.
            self.spamTrainer = storage.SpamTrainer(self.bayes)
            self.hamTrainer = storage.HamTrainer(self.bayes)
            self.spamCorpus.addObserver(self.spamTrainer)
            self.hamCorpus.addObserver(self.hamTrainer)

# option-parsing helper functions

def _addressAndPort(s):
    "Decode a string representing a port to bind to, with optional address"
    if ':' in s:
        addr, port = s.strip().split(':')
        return addr, int(port)
    else:
        return '', int(s)

def _addressPortStr((addr, port)):
    "Encode an (address, port) pair as a string, omitting an empty address"
    if not addr:
        return str(port)
    else:
        return '%s:%d' % (addr, port)


state = State()
proxyListeners = []
def _createProxies(servers, proxyPorts):
    """Create BayesProxyListeners for all the given servers."""
    for (server, serverPort), proxyPort in zip(servers, proxyPorts):
        listener = BayesProxyListener(server, serverPort, proxyPort)
        proxyListeners.append(listener)

def main(servers, proxyPorts, uiPort, launchUI):
    """Runs the proxy forever or until a 'KILL' command is received or
    someone hits Ctrl+Break."""
    _createProxies(servers, proxyPorts)
    httpServer = UserInterfaceServer(uiPort)
    proxyUI = UserInterface()
    httpServer.register(proxyUI, OptionsConfigurator(proxyUI))
    Dibbler.run(launchBrowser=launchUI)


# ===================================================================
# Test code.
# ===================================================================

# One example of spam and one of ham - both are used to train, and are
# then classified.  Not a good test of the classifier, but a perfectly
# good test of the POP3 proxy.  The bodies of these came from the
# spambayes project, and I added the headers myself because the
# originals had no headers.

spam1 = """From: friend@public.com
Subject: Make money fast

Hello tim_chandler , Want to save money ?
Now is a good time to consider refinancing. Rates are low so you can cut
your current payments and save money.

http://64.251.22.101/interest/index%38%30%300%2E%68t%6D

Take off list on site [s5]
"""

good1 = """From: chris@example.com
Subject: ZPT and DTML

Jean Jordaan wrote:
> 'Fraid so ;>  It contains a vintage dtml-calendar tag.
>   http://www.zope.org/Members/teyc/CalendarTag
>
> Hmm I think I see what you mean: one needn't manually pass on the
> namespace to a ZPT?

Yeah, Page Templates are a bit more clever, sadly, DTML methods aren't :-(

Chris
"""

class TestListener(Dibbler.Listener):
    """Listener for TestPOP3Server.  Works on port 8110, to co-exist
    with real POP3 servers."""

    def __init__(self, socketMap=asyncore.socket_map):
        Dibbler.Listener.__init__(self, 8110, TestPOP3Server,
                                  (socketMap,), socketMap=socketMap)


class TestPOP3Server(Dibbler.BrighterAsyncChat):
    """Minimal POP3 server, for testing purposes.  Doesn't support
    UIDL.  USER, PASS, APOP, DELE and RSET simply return "+OK"
    without doing anything.  Also understands the 'KILL' command, to
    kill it.  The mail content is the example messages above.
    """

    def __init__(self, clientSocket, socketMap):
        # Grumble: asynchat.__init__ doesn't take a 'map' argument,
        # hence the two-stage construction.
        Dibbler.BrighterAsyncChat.__init__(self)
        Dibbler.BrighterAsyncChat.set_socket(self, clientSocket, socketMap)
        self.maildrop = [spam1, good1]
        self.set_terminator('\r\n')
        self.okCommands = ['USER', 'PASS', 'APOP', 'NOOP',
                           'DELE', 'RSET', 'QUIT', 'KILL']
        self.handlers = {'STAT': self.onStat,
                         'LIST': self.onList,
                         'RETR': self.onRetr,
                         'TOP': self.onTop}
        self.push("+OK ready\r\n")
        self.request = ''

    def collect_incoming_data(self, data):
        """Asynchat override."""
        self.request = self.request + data

    def found_terminator(self):
        """Asynchat override."""
        if ' ' in self.request:
            command, args = self.request.split(None, 1)
        else:
            command, args = self.request, ''
        command = command.upper()
        if command in self.okCommands:
            self.push("+OK (we hope)\r\n")
            if command == 'QUIT':
                self.close_when_done()
            if command == 'KILL':
                self.socket.shutdown(2)
                self.close()
                raise SystemExit
        else:
            handler = self.handlers.get(command, self.onUnknown)
            self.push(handler(command, args))   # Or push_slowly for testing
        self.request = ''

    def push_slowly(self, response):
        """Useful for testing."""
        for c in response:
            self.push(c)
            time.sleep(0.02)

    def onStat(self, command, args):
        """POP3 STAT command."""
        maildropSize = reduce(operator.add, map(len, self.maildrop))
        maildropSize += len(self.maildrop) * len(HEADER_EXAMPLE)
        return "+OK %d %d\r\n" % (len(self.maildrop), maildropSize)

    def onList(self, command, args):
        """POP3 LIST command, with optional message number argument."""
        if args:
            try:
                number = int(args)
            except ValueError:
                number = -1
            if 0 < number <= len(self.maildrop):
                return "+OK %d\r\n" % len(self.maildrop[number-1])
            else:
                return "-ERR no such message\r\n"
        else:
            returnLines = ["+OK"]
            for messageIndex in range(len(self.maildrop)):
                size = len(self.maildrop[messageIndex])
                returnLines.append("%d %d" % (messageIndex + 1, size))
            returnLines.append(".")
            return '\r\n'.join(returnLines) + '\r\n'

    def _getMessage(self, number, maxLines):
        """Implements the POP3 RETR and TOP commands."""
        if 0 < number <= len(self.maildrop):
            message = self.maildrop[number-1]
            headers, body = message.split('\n\n', 1)
            bodyLines = body.split('\n')[:maxLines]
            message = headers + '\r\n\r\n' + '\n'.join(bodyLines)
            return "+OK\r\n%s\r\n.\r\n" % message
        else:
            return "-ERR no such message\r\n"

    def onRetr(self, command, args):
        """POP3 RETR command."""
        try:
            number = int(args)
        except ValueError:
            number = -1
        return self._getMessage(number, 12345)

    def onTop(self, command, args):
        """POP3 TOP command."""
        try:
            number, lines = map(int, args.split())
        except ValueError:
            number, lines = -1, -1
        return self._getMessage(number, lines)

    def onUnknown(self, command, args):
        """Unknown POP3 command."""
        return "-ERR Unknown command: %s\r\n" % repr(command)


def test():
    """Runs a self-test using TestPOP3Server, a minimal POP3 server
    that serves the example emails above.
    """
    # Run a proxy and a test server in separate threads with separate
    # asyncore environments.
    import threading
    state.isTest = True
    testServerReady = threading.Event()
    def runTestServer():
        testSocketMap = {}
        TestListener(socketMap=testSocketMap)
        testServerReady.set()
        asyncore.loop(map=testSocketMap)

    proxyReady = threading.Event()
    def runUIAndProxy():
        httpServer = UserInterfaceServer(8881)
        proxyUI = UserInterface()
        httpServer.register(proxyUI, OptionsConfigurator(proxyUI))
        BayesProxyListener('localhost', 8110, 8111)
        state.bayes.learn(tokenizer.tokenize(spam1), True)
        state.bayes.learn(tokenizer.tokenize(good1), False)
        proxyReady.set()
        Dibbler.run()

    threading.Thread(target=runTestServer).start()
    testServerReady.wait()
    threading.Thread(target=runUIAndProxy).start()
    proxyReady.wait()

    # Connect to the proxy.
    proxy = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    proxy.connect(('localhost', 8111))
    response = proxy.recv(100)
    assert response == "+OK ready\r\n"

    # Stat the mailbox to get the number of messages.
    proxy.send("stat\r\n")
    response = proxy.recv(100)
    count, totalSize = map(int, response.split()[1:3])
    assert count == 2

    # Loop through the messages ensuring that they have judgement
    # headers.
    for i in range(1, count+1):
        response = ""
        proxy.send("retr %d\r\n" % i)
        while response.find('\n.\r\n') == -1:
            response = response + proxy.recv(1000)
        assert response.find(options.hammie_header_name) >= 0

    # Smoke-test the HTML UI.
    httpServer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    httpServer.connect(('localhost', 8881))
    httpServer.sendall("get / HTTP/1.0\r\n\r\n")
    response = ''
    while 1:
        packet = httpServer.recv(1000)
        if not packet: break
        response += packet
    assert re.search(r"(?s)<html>.*Spambayes proxy.*</html>", response)

    # Kill the proxy and the test server.
    proxy.sendall("kill\r\n")
    proxy.recv(100)
    pop3Server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    pop3Server.connect(('localhost', 8110))
    pop3Server.sendall("kill\r\n")
    pop3Server.recv(100)


# ===================================================================
# __main__ driver.
# ===================================================================

def run():
    # Read the arguments.
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'htdbzp:l:u:')
    except getopt.error, msg:
        print >>sys.stderr, str(msg) + '\n\n' + __doc__
        sys.exit()

    runSelfTest = False
    for opt, arg in opts:
        if opt == '-h':
            print >>sys.stderr, __doc__
            sys.exit()
        elif opt == '-t':
            state.isTest = True
            state.runTestServer = True
        elif opt == '-b':
            state.launchUI = True
        elif opt == '-d':
            state.useDB = True
        elif opt == '-p':
            options.pop3proxy_persistent_storage_file = arg
        elif opt == '-l':
            state.proxyPorts = [_addressAndPort(arg)]
        elif opt == '-u':
            state.uiPort = int(arg)
        elif opt == '-z':
            state.isTest = True
            runSelfTest = True

    # Do whatever we've been asked to do...
    state.createWorkers()
    if runSelfTest:
        print "\nRunning self-test...\n"
        state.buildServerStrings()
        test()
        print "Self-test passed."   # ...else it would have asserted.

    elif state.runTestServer:
        print "Running a test POP3 server on port 8110..."
        TestListener()
        asyncore.loop()

    elif 0 <= len(args) <= 2:
        # Normal usage, with optional server name and port number.
        if len(args) == 1:
            state.servers = [(args[0], 110)]
        elif len(args) == 2:
            state.servers = [(args[0], int(args[1]))]

        state.buildServerStrings()
        main(state.servers, state.proxyPorts, state.uiPort, state.launchUI)

    else:
        print >>sys.stderr, __doc__

if __name__ == '__main__':
    run()
-------------- next part --------------

"""
*Introduction*

Dibbler is a Python web application framework.  It lets you create web-based
applications by writing independent plug-in modules that don't require any
networking code.  Dibbler takes care of the HTTP side of things, leaving you
to write the application code.


*Plugins and Methlets*

Dibbler uses a system of plugins to implement the application logic.  Each
page maps to a 'methlet', which is a method of a plugin object that serves
that page, and is named after the page it serves.  The address
`http://server/spam` calls the methlet `onSpam`.  `onHome` is a reserved
methlet name for the home page, `http://server/`.  For resources that need a
file extension (eg. images) you can use a URL such as `http://server/eggs.gif`
to map to the `onEggsGif` methlet.  All the registered plugins are searched
for the appropriate methlet, so you can combine multiple plugins to build
your application.
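
As a rough illustration of the naming rule above (a sketch only;
`methlet_name` is a hypothetical helper, not part of Dibbler's API), the
mapping from URL path to methlet name can be written as:

```python
def methlet_name(path):
    # Hypothetical helper name, used here for illustration only.
    # '/' is the home page, served by the reserved methlet onHome.
    if path == '/':
        path = '/Home'
    # Drop the leading slash, then split on dots so that a file
    # extension becomes a capitalised word of its own.
    pieces = path[1:].split('.')
    return 'on' + ''.join([piece.capitalize() for piece in pieces])

print(methlet_name('/spam'))
print(methlet_name('/eggs.gif'))
print(methlet_name('/'))
```

Because `str.capitalize` lowercases everything after the first letter, a
path like `/myPage` maps to `onMypage` under this rule, not `onMyPage`.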

A methlet needs to call `self.writeOKHeaders('text/html')` followed by
`self.write(content)`.  You can pass whatever content-type you like to
`writeOKHeaders`, so serving images, PDFs, etc. is no problem.  If a methlet
wants to return an HTTP error code, it should call (for example)
`self.writeError(403, "Forbidden")` instead of `writeOKHeaders`
and `write`.  If it wants to write its own headers (for instance to return
a redirect) it can simply call `write` with the full HTTP response.

If a methlet raises an exception, it is automatically turned into a "500
Server Error" page with a full traceback in it.
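
The general idea can be sketched as follows.  This is a simplified
illustration rather than Dibbler's actual code: `run_methlet` is a
hypothetical wrapper that returns a status code and a body instead of
writing to a socket, and it assumes the methlet takes no arguments.

```python
import traceback

def run_methlet(methlet):
    # Hypothetical wrapper: nothing is sent until the methlet has
    # finished, so a failure can still be reported as a 500.
    try:
        body = methlet()
    except Exception:
        # format_exc() returns the current traceback as a string.
        return 500, "500 Server Error: " + traceback.format_exc()
    return 200, body

def good():
    return "<html><body>Hello</body></html>"

def broken():
    raise ValueError("boom")

print(run_methlet(good)[0])
print(run_methlet(broken)[0])
```

Deferring the output is what makes this work: if the OK headers had
already gone out on the wire, it would be too late to change the status.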


*Parameters*

Methlets can take parameters, the values of which are taken from form
parameters submitted by the browser.  So if your form says
`<form action='subscribe'><input type="text" name="email"/> ...` then your
methlet should look like `def onSubscribe(self, email=None)`.  It's good
practice to give all the parameters default values, in case the user navigates
to that URL without submitting a form, or submits the form without filling in
any parameters.  If you have lots of parameters, or their names are determined
at runtime, you can define your methlet like this:
`def onComplex(self, **params)` to get a dictionary of parameters.
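
A minimal sketch of how form parameters could end up as methlet arguments,
assuming a query string is parsed and only the first value of each
parameter is kept.  `flatten_params` and the `Subscriptions` class are
hypothetical names used for illustration; they are not Dibbler APIs.

```python
try:
    from urllib.parse import parse_qs      # Python 3
except ImportError:
    from urlparse import parse_qs          # Python 2.6+

def flatten_params(query):
    # keep_blank_values=True keeps fields the user left empty, so the
    # methlet sees '' for them rather than falling back to its default.
    parsed = parse_qs(query, keep_blank_values=True)
    params = {}
    for name, values in parsed.items():
        params[name] = values[0]    # first value wins for repeated names
    return params

class Subscriptions(object):
    # Stand-in for a plugin; a real one would derive from HTTPPlugin.
    def onSubscribe(self, email=None):
        return "Subscribing %r" % (email,)

params = flatten_params('email=alice%40example.com')
print(Subscriptions().onSubscribe(**params))
```

Calling the methlet with `**params` is also why default values matter: a
request with no `email` parameter simply leaves `email` as `None`.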


*Example*

Here's a web application server that serves a calendar for a given year:

>>> import Dibbler, calendar
>>> class Calendar(Dibbler.HTTPPlugin):
...     _form = '''<html><body><h3>Calendar Server</h3>
...                <form action='/'>
...                Year: <input type='text' name='year' size='4'>
...                <input type='submit' value='Go'></form>
...                <pre>%s</pre></body></html>'''
...
...     def onHome(self, year=None):
...         if year:
...             result = calendar.calendar(int(year))
...         else:
...             result = ""
...         self.writeOKHeaders('text/html')
...         self.write(self._form % result)
...
>>> httpServer = Dibbler.HTTPServer(8888)
>>> httpServer.register(Calendar())
>>> Dibbler.run(launchBrowser=True)

Your browser will start, and you can ask for a calendar for the year of
your choice.  If you don't want to start the browser automatically, just call
`run()` with no arguments - the application is available at
http://localhost:8888/ .  You'll have to kill the server manually because it
provides no way to stop it; a real application would have some kind of
'shutdown' methlet that called `sys.exit()`.

By combining Dibbler with an HTML manipulation library like
PyMeld (shameless plug - see http://entrian.com/PyMeld for details) you can
keep the HTML and Python code separate.


*Building applications*

You can run several plugins together like this:

>>> httpServer = Dibbler.HTTPServer()
>>> httpServer.register(plugin1, plugin2, plugin3)
>>> Dibbler.run()

...so many plugin objects, each implementing a different set of pages,
can cooperate to implement a web application.  See also the `HTTPServer`
documentation for details of how to run multiple `Dibbler` environments
simultaneously in different threads.


*Controlling connections*

There are times when your code needs to be informed the moment an incoming
connection is received, before any HTTP conversation begins.  For instance,
you might want to only accept connections from `localhost` for security
reasons.  If this is the case, your plugin should implement the
`onIncomingConnection` method.  This will be passed the incoming socket
before any reads or writes have taken place, and should return True to allow
the connection through or False to reject it.  Here's an implementation of
the `localhost`-only idea:

>>> def onIncomingConnection(self, clientSocket):
...     return clientSocket.getpeername()[0] == clientSocket.getsockname()[0]
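
To see why that comparison works, here is a small sketch that exercises
the same test without opening a real connection.  `is_local_connection`
and `FakeSocket` are hypothetical names for illustration; `FakeSocket`
only mimics the two socket methods the check needs.

```python
def is_local_connection(clientSocket):
    # A connection made from this machine has the same IP address at
    # both ends: the peer's address equals our own local address.
    return clientSocket.getpeername()[0] == clientSocket.getsockname()[0]

class FakeSocket(object):
    # Stand-in for a connected socket, for demonstration only.
    def __init__(self, peer, local):
        self._peer, self._local = peer, local
    def getpeername(self):
        return self._peer
    def getsockname(self):
        return self._local

loopback = FakeSocket(('127.0.0.1', 51234), ('127.0.0.1', 8888))
remote = FakeSocket(('192.0.2.7', 51234), ('192.0.2.1', 8888))
print(is_local_connection(loopback))
print(is_local_connection(remote))
```

Note that the test compares addresses rather than looking for 127.0.0.1,
so a client on the same machine that connects via the machine's external
address is accepted as well.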


*Advanced usage: Dibbler Contexts*

If you want to run several independent Dibbler environments (in different
threads for example) then each should use its own `Context`.  Normally
you'd say something like:

>>> httpServer = Dibbler.HTTPServer()
>>> httpServer.register(MyPlugin())
>>> Dibbler.run()

but that's only safe to do from one thread.  Instead, you can say:

>>> myContext = Dibbler.Context()
>>> httpServer = Dibbler.HTTPServer(context=myContext)
>>> httpServer.register(MyPlugin())
>>> Dibbler.run(myContext)

in as many threads as you like.


*Dibbler and asyncore*

If this section means nothing to you, you can safely ignore it.

Dibbler is built on top of Python's asyncore library, which means that it
integrates into other asyncore-based applications, and you can write other
asyncore-based components and run them as part of the same application.

By default, Dibbler uses the default asyncore socket map.  This means that
`Dibbler.run()` also runs your asyncore-based components, provided they're
using the default socket map.  If you want to tell Dibbler to use a
different socket map, either to co-exist with other asyncore-based components
using that map or to insulate Dibbler from such components by using a
different map, you need to use a `Dibbler.Context`.  If you're using your own
socket map, give it to the context: `context = Dibbler.Context(myMap)`.  If
you want Dibbler to use its own map: `context = Dibbler.Context({})`.

You can either call `Dibbler.run(context)` to run the async loop, or call
`asyncore.loop()` directly - the only difference is that the former has a
few more options, like launching the web browser automatically.


*Self-test*

Running `Dibbler.py` directly as a script runs the example calendar server
plus a self-test.
"""

# Dibbler is released under the Python Software Foundation license; see
# http://www.python.org/

__author__ = "Richie Hindle <richie@entrian.com>"
__credits__ = "Tim Stone"

try:
    import cStringIO as StringIO
except ImportError:
    import StringIO

import os, sys, re, time, traceback
import socket, asyncore, asynchat, cgi, urlparse, webbrowser

try:
    True, False
except NameError:
    # Maintain compatibility with Python 2.2
    True, False = 1, 0


class BrighterAsyncChat(asynchat.async_chat):
    """An asynchat.async_chat that doesn't give spurious warnings on
    receiving an incoming connection, lets SystemExit cause an exit, can
    flush its output, and will correctly remove itself from a non-default
    socket map on `close()`."""

    def __init__(self, conn=None, map=None):
        """See `asynchat.async_chat`."""
        asynchat.async_chat.__init__(self, conn)
        self._map = map

    def handle_connect(self):
        """Suppresses the asyncore "unhandled connect event" warning."""
        pass

    def handle_error(self):
        """Let SystemExit cause an exit."""
        type, v, t = sys.exc_info()
        if type == socket.error and v[0] == 9:  # Why?  Who knows...
            pass
        elif type == SystemExit:
            raise
        else:
            asynchat.async_chat.handle_error(self)

    def flush(self):
        """Flush everything in the output buffer."""
        while self.producer_fifo or self.ac_out_buffer:
            self.initiate_send()

    def close(self):
        """Remove this object from the correct socket map."""
        self.del_channel(self._map)
        self.socket.close()


class Context:
    """See the main documentation for details of `Dibbler.Context`."""
    def __init__(self, asyncMap=asyncore.socket_map):
        self._HTTPPort = None  # Stores the port for `run(launchBrowser=True)`
        self._map = asyncMap

_defaultContext = Context()


class Listener(asyncore.dispatcher):
    """Generic listener class used by all the different types of server.
    Listens for incoming socket connections and calls a factory function
    to create handlers for them."""

    def __init__(self, port, factory, factoryArgs,
                 socketMap=_defaultContext._map):
        """Creates a listener object, which will listen for incoming
        connections when Dibbler.run is called:

         o port: The TCP/IP (address, port) to listen on. Usually '' -
           meaning bind to all IP addresses that the machine has - will be
           passed as the address. For backwards interface compatibility, if
           port is just an int, an address of '' will be assumed.

         o factory: The function to call to create a handler (can be a class
           name).

         o factoryArgs: The arguments to pass to the handler factory.  For
           proper context support, this should include a `context` argument
           (or a `socketMap` argument for pure asyncore listeners).  The
           incoming socket will be prepended to this list, and passed as the
           first argument.  See `HTTPServer` for an example.

         o socketMap: Optional.  The asyncore socket map to use.  If you're
           using a `Dibbler.Context`, pass context._map.

        See `HTTPServer` for an example `Listener` - it's a good deal smaller
        than this description!"""

        asyncore.dispatcher.__init__(self, map=socketMap)
        self.socketMap = socketMap
        self.factory = factory
        self.factoryArgs = factoryArgs
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        self.set_socket(s, self.socketMap)
        self.set_reuse_addr()
        if type(port) != type(()):
            port = ('', port)
        self.bind(port)
        self.listen(5)

    def handle_accept(self):
        """Asyncore override."""
        # If an incoming connection is instantly reset, eg. by following a
        # link in the web interface then instantly following another one or
        # hitting stop, handle_accept() will be triggered but accept() will
        # return None.
        result = self.accept()
        if result:
            clientSocket, clientAddress = result
            args = [clientSocket] + list(self.factoryArgs)
            self.factory(*args)


class HTTPServer(Listener):
    """A web server with which you can register `HTTPPlugin`s to serve up
    your content - see `HTTPPlugin` for detailed documentation and examples.

    `port` specifies the TCP/IP (address, port) on which to run, defaulting 
    to ('', 80).

    `context` optionally specifies a `Dibbler.Context` for the server.
    """

    def __init__(self, port=('', 80), context=_defaultContext):
        """Create an `HTTPServer` for the given port."""
        Listener.__init__(self, port, _HTTPHandler,
                          (self, context), context._map)
        self._plugins = []
        context._HTTPPort = port

    def register(self, *plugins):
        """Registers one or more `HTTPPlugin`-derived objects with the
        server."""
        for plugin in plugins:
            self._plugins.append(plugin)


class _HTTPHandler(BrighterAsyncChat):
    """This is a helper for the HTTP server class - one of these is created
    for each incoming request, and does the job of decoding the HTTP traffic
    and driving the plugins."""

    def __init__(self, clientSocket, server, context):
        # Grumble: asynchat.__init__ doesn't take a 'map' argument,
        # hence the two-stage construction.
        BrighterAsyncChat.__init__(self, map=context._map)
        BrighterAsyncChat.set_socket(self, clientSocket, context._map)
        self._server = server
        self._request = ''
        self.set_terminator('\r\n\r\n')

        # Because a methlet is likely to call `writeOKHeaders` before doing
        # anything else, an unexpected exception won't send back a 500, which
        # is poor.  So we buffer any sent headers until either a plain `write`
        # happens or the methlet returns.
        self._bufferedHeaders = []
        self._headersWritten = False

        # Tell the plugins about the connection, letting them veto it.
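        for plugin in self._server._plugins:
            if not plugin.onIncomingConnection(clientSocket):
                # One veto is enough: close and stop consulting plugins,
                # since the socket is no longer usable.
                self.close()
                break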

    def collect_incoming_data(self, data):
        """Asynchat override."""
        self._request = self._request + data

    def found_terminator(self):
        """Asynchat override."""
        # Parse the HTTP request.
        requestLine, headers = (self._request+'\r\n').split('\r\n', 1)
        try:
            method, url, version = requestLine.strip().split()
        except ValueError:
            self.pushError(400, "Malformed request: '%s'" % requestLine)
            self.close_when_done()
            return

        # Parse the URL, and deal with POST vs. GET requests.
        method = method.upper()
        unused, unused, path, unused, query, unused = urlparse.urlparse(url)
        cgiParams = cgi.parse_qs(query, keep_blank_values=True)
        if self.get_terminator() == '\r\n\r\n' and method == 'POST':
            # We need to read the body - set a numeric async_chat terminator
            # equal to the Content-Length.
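            match = re.search(r'(?i)content-length:\s*(\d+)', headers)
            # Treat a missing Content-Length header as an empty body rather
            # than failing on `match.group()`.
            contentLength = match and int(match.group(1)) or 0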
            if contentLength > 0:
                self.set_terminator(contentLength)
                self._request = self._request + '\r\n\r\n'
                return

        # Have we just read the body of a POSTed request?  Decode the body,
        # which will contain parameters and possibly uploaded files.
        if type(self.get_terminator()) is type(1):
            self.set_terminator('\r\n\r\n')
            body = self._request.split('\r\n\r\n', 1)[1]
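            match = re.search(r'(?i)content-type:\s*([^\r\n]+)', headers)
            # Fall back to the default form encoding if the client sent no
            # Content-Type header.
            contentTypeHeader = (match and match.group(1)
                                 or 'application/x-www-form-urlencoded')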
            contentType, pdict = cgi.parse_header(contentTypeHeader)
            if contentType == 'multipart/form-data':
                # multipart/form-data - probably a file upload.
                bodyFile = StringIO.StringIO(body)
                cgiParams.update(cgi.parse_multipart(bodyFile, pdict))
            else:
                # A normal x-www-form-urlencoded.
                cgiParams.update(cgi.parse_qs(body, keep_blank_values=True))

        # Convert the cgi params into a simple dictionary.
        params = {}
        for name, value in cgiParams.iteritems():
            params[name] = value[0]

        # Find and call the methlet.  '/eggs.gif' becomes 'onEggsGif'.
        if path == '/':
            path = '/Home'
        pieces = path[1:].split('.')
        name = 'on' + ''.join([piece.capitalize() for piece in pieces])
        for plugin in self._server._plugins:
            if hasattr(plugin, name):
                # The plugin's APIs (`write`, etc) reflect back to us via
                # `plugin._handler`.
                plugin._handler = self
                try:
                    # Call the methlet.
                    getattr(plugin, name)(**params)
                    if self._bufferedHeaders:
                        # The methlet returned without writing anything other
                        # than headers.  This isn't unreasonable - it might
                        # have written a 302 or something.  Flush the buffered
                        # headers
                        self.write(None)
                except:
                    # The methlet raised an exception - send the traceback to
                    # the browser, unless it's SystemExit in which case we let
                    # it go.
                    eType, eValue, eTrace = sys.exc_info()
                    if eType == SystemExit:
                        ##self.shutdown(2)
                        raise
                    message = """<h3>500 Server error</h3><pre>%s</pre>"""
                    details = traceback.format_exception(eType, eValue, eTrace)
                    details = '\n'.join(details)
                    self.writeError(500, message % cgi.escape(details))
                plugin._handler = None
                break
        else:
            self.onUnknown(path, params)

        # `close_when_done` and `Connection: close` ensure that we don't
        # support keep-alives or pipelining.  There are problems with some
        # browsers, for instance with extra characters being appended after
        # the body of a POSTed request.
        self.close_when_done()

    def onUnknown(self, path, params):
        """Handler for unknown URLs.  Returns a 404 page."""
        self.writeError(404, "Not found: '%s'" % path)

    def writeOKHeaders(self, contentType, extraHeaders={}):
        """Reflected from `HTTPPlugin`s."""
        # Buffer the headers until there's a `write`, in case an error occurs.
        timeNow = time.gmtime(time.time())
        httpNow = time.strftime('%a, %d %b %Y %H:%M:%S GMT', timeNow)
        headers = []
        headers.append("HTTP/1.1 200 OK")
        headers.append("Connection: close")
        headers.append("Content-Type: %s" % contentType)
        headers.append("Date: %s" % httpNow)
        for name, value in extraHeaders.items():
            headers.append("%s: %s" % (name, value))
        headers.append("")
        headers.append("")
        self._bufferedHeaders = headers

    def writeError(self, code, message):
        """Reflected from `HTTPPlugin`s."""
        # Writing an error overrides any buffered headers, but obviously
        # doesn't want to write any headers if some have already gone.
        headers = []
        if not self._headersWritten:
            headers.append("HTTP/1.0 %d Error" % code)
            headers.append("Connection: close")
            headers.append("Content-Type: text/html")
            headers.append("")
            headers.append("")
        self.push("%s<html><body>%s</body></html>" % \
                  ('\r\n'.join(headers), message))

    def write(self, content):
        """Reflected from `HTTPPlugin`s."""
        # The methlet is writing, so write any buffered headers first.
        headers = []
        if self._bufferedHeaders:
            headers = self._bufferedHeaders
            self._bufferedHeaders = None
            self._headersWritten = True

        # `write(None)` just flushes buffered headers.
        if content is None:
            content = ''
        self.push('\r\n'.join(headers) + str(content))


class HTTPPlugin:
    """Base class for HTTP server plugins.  See the main documentation for
    details."""

    def __init__(self):
        # self._handler is filled in by `_HTTPHandler.found_terminator()`.
        pass

    def onIncomingConnection(self, clientSocket):
        """Implement this and return False to veto incoming connections."""
        return True

    def writeOKHeaders(self, contentType, extraHeaders={}):
        """A methlet should call this with the Content-Type and optionally
        a dictionary of extra headers (eg. Expires) before calling
        `write()`."""
        return self._handler.writeOKHeaders(contentType, extraHeaders)

    def writeError(self, code, message):
        """A methlet should call this instead of `writeOKHeaders()` /
        `write()` to report an HTTP error (eg. 403 Forbidden)."""
        return self._handler.writeError(code, message)

    def write(self, content):
        """A methlet should call this after `writeOKHeaders` to write the
        page's content."""
        return self._handler.write(content)

    def flush(self):
        """A methlet can call this after calling `write`, to ensure that
        the content is written immediately to the browser.  This isn't
        necessary most of the time, but if you're writing "Please wait..."
        before performing a long operation, calling `flush()` is a good
        idea."""
        return self._handler.flush()

    def close(self, flush=True):
        """Closes the connection to the browser.  You should call `close()`
        before calling `sys.exit()` in any 'shutdown' methlets you write."""
        if flush:
            self.flush()
        return self._handler.close()


def run(launchBrowser=False, context=_defaultContext):
    """Runs a `Dibbler` application.  Servers listen for incoming connections
    and route requests through to plugins until a plugin calls `sys.exit()`
    or raises a `SystemExit` exception."""

    if launchBrowser:
        webbrowser.open_new("http://localhost:%d/" % context._HTTPPort)
    asyncore.loop(map=context._map)


def runTestServer(readyEvent=None):
    """Runs the calendar server example, with an added `/shutdown` URL."""
    import Dibbler, calendar
    class Calendar(Dibbler.HTTPPlugin):
        _form = '''<html><body><h3>Calendar Server</h3>
                   <form action='/'>
                   Year: <input type='text' name='year' size='4'>
                   <input type='submit' value='Go'></form>
                   <pre>%s</pre></body></html>'''

        def onHome(self, year=None):
            if year:
                result = calendar.calendar(int(year))
            else:
                result = ""
            self.writeOKHeaders('text/html')
            self.write(self._form % result)

        def onShutdown(self):
            self.writeOKHeaders('text/html')
            self.write("<html><body><p>OK.</p></body></html>")
            self.close()
            sys.exit()

    httpServer = Dibbler.HTTPServer(8888)
    httpServer.register(Calendar())
    if readyEvent:
        # Tell the self-test code that the test server is up and running.
        readyEvent.set()
    Dibbler.run(launchBrowser=True)

def test():
    """Run a self-test."""
    # Run the calendar server in a separate thread.
    import re, threading, urllib
    testServerReady = threading.Event()
    threading.Thread(target=runTestServer, args=(testServerReady,)).start()
    testServerReady.wait()

    # Connect to the server and ask for a calendar.
    page = urllib.urlopen("http://localhost:8888/?year=2003").read()
    if page.find('January') != -1:
        print "Self-test passed."
    else:
        print "Self-test failed!"

    # Wait for the user to press Enter while they play with the browser.
    raw_input("Press Enter to shut down the application server...")

    # Ask the server to shut down.
    page = urllib.urlopen("http://localhost:8888/shutdown").read()
    if page.find('OK') != -1:
        print "Shutdown OK."
    else:
        print "Shutdown failed!"

if __name__ == '__main__':
    test()
From noreply at sourceforge.net  Mon Jan 20 05:13:16 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 20 10:02:33 2003
Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to
	bind to specific addresses
Message-ID: <E18abjo-0002dn-00@sc8-sf-web3.sourceforge.net>

Patches item #670417, was opened at 2003-01-18 21:06
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Lownds (tonylownds)
Assigned to: Richie Hindle (richiehindle)
Summary: Allow the pop3 proxies to bind to specific addresses

Initial Comment:
This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting.

This is useful for two reasons:

1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security.

2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts.

The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string.
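
The parsing rule described above can be sketched as follows.  This is
only an illustration, not the code from the patch: the function names
`parse_bind_spec` and `format_bind_spec` are hypothetical, and the sketch
assumes that a bare port means "bind to all interfaces", represented
here by an empty address string.

```python
def parse_bind_spec(spec):
    """Turn 'port' or 'address:port' into an (address, port) pair."""
    spec = spec.strip()
    if ':' in spec:
        # Split on the last colon so the port is always the final field.
        address, port = spec.rsplit(':', 1)
    else:
        # Assumption: no address given means listen on every interface.
        address, port = '', spec
    return address, int(port)


def format_bind_spec(pair):
    """Turn an (address, port) pair back into a printable string."""
    address, port = pair
    if address:
        return '%s:%d' % (address, port)
    return str(port)
```

With this sketch, '127.0.0.1:8110' parses to ('127.0.0.1', 8110) and a
plain '110' parses to ('', 110); both print back as the original string.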


----------------------------------------------------------------------

Comment By: François Granger (fgranger)
Date: 2003-01-20 14:13

Message:
Logged In: YES 
user_id=86948

I asked Tony about this and he sent me the files. Can I upload them or forward them to you?

----------------------------------------------------------------------

Comment By: Richie Hindle (richiehindle)
Date: 2003-01-20 12:35

Message:
Logged In: YES 
user_id=85414

Has SourceForge eaten the patch file?  It says
"No Files Currently Attached".


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

From noreply at sourceforge.net  Mon Jan 20 06:28:09 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 20 10:02:49 2003
Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to
	bind to specific addresses
Message-ID: <E18acuH-0001Ds-00@sc8-sf-web4.sourceforge.net>

Patches item #670417, was opened at 2003-01-18 20:06
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Lownds (tonylownds)
Assigned to: Richie Hindle (richiehindle)
Summary: Allow the pop3 proxies to bind to specific addresses

Initial Comment:
This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting.

This is useful for two reasons:

1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security.

2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts.

The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string.


----------------------------------------------------------------------

>Comment By: Richie Hindle (richiehindle)
Date: 2003-01-20 14:28

Message:
Logged In: YES 
user_id=85414

If you can't upload them here, please email them to me.
Thanks.


----------------------------------------------------------------------

Comment By: François Granger (fgranger)
Date: 2003-01-20 13:13

Message:
Logged In: YES 
user_id=86948

I asked Tony about this and he sent me the files. Can I upload them or forward them to you?

----------------------------------------------------------------------

Comment By: Richie Hindle (richiehindle)
Date: 2003-01-20 11:35

Message:
Logged In: YES 
user_id=85414

Has SourceForge eaten the patch file?  It says
"No Files Currently Attached".


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

From skip at pobox.com  Mon Jan 20 09:06:28 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 10:06:39 2003
Subject: [Spambayes] Change Required To pspam/options.py
In-Reply-To: <3E2C0193.3040109@pa.press.net>
References: <3E2C0193.3040109@pa.press.net>
Message-ID: <15916.4212.309598.216695@montanaro.dyndns.org>


    John> from Options ...

    John> needs changing to

    John> from spambayes.Options ...

Fix checked in.

Skip


From sjoerd at acm.org  Mon Jan 20 16:23:49 2003
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Mon Jan 20 10:23:55 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access?
In-Reply-To: <15916.3865.297629.696625@montanaro.dyndns.org> 
References: <15916.3865.297629.696625@montanaro.dyndns.org> 
Message-ID: <20030120152349.0855F74247@indus.ins.cwi.nl>

On Mon, Jan 20 2003 Skip Montanaro wrote:

> 
> Depending on how training and classifying are accomplished, it's quite
> possible that the two activities will be done in different processes.  For
> example, I am currently experimenting with training using pop3proxy (well,
> still my offshoot proxytrainer at the moment) while classification is being
> done by hammiefilter run from procmail.  This implies a need to lock the
> shelve/pickle file used to store the training info.  Seems to me we need to
> (be able to) lock the shelve/pickle file.  The only lock facility which
> seems cross-platform enough for this application is the set of flags used by
> os.open().  To lock the database you'd have to check/create a lock file
> related (namewise) to the actual database file.  Has anyone given this any
> thought?

I use the following code in my programs.  Programs start with creating
an instance of this class, and end by calling the close method.

As far as I know, the safest way to do locking if you also have NFS
partitions is to try to link to the lock file, so that is the
technique I use.

import os, time
import spambayes.Options
import spambayes.hammie

class error(Exception):
    pass

class HammieFilter(object):
    def __init__(self):
        dbname = spambayes.Options.options.hammiefilter_persistent_storage_file
        dbname = os.path.expanduser(dbname)
        usedb = spambayes.Options.options.hammiefilter_persistent_use_database
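        # Create a private temp file, then try to hard-link it to the shared
        # lock name.  link() fails if the lock file already exists.
        tmplock = '%s.lock%d' % (dbname, os.getpid())
        self.lockfile = '%s.lock' % dbname
        open(tmplock, 'w').close()
        for i in range(5):
            if i > 0:
                time.sleep(5)
            try:
                os.link(tmplock, self.lockfile)
            except OSError:
                pass
            else:
                break
        else:
            # Still locked after five attempts: give up.
            os.unlink(tmplock)
            raise error('Database locked')
        os.unlink(tmplock)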
        self.hammie = spambayes.hammie.open(dbname, usedb, 'c')

    def train(self, msg, is_spam):
        self.hammie.train(msg, is_spam)

    def untrain(self, msg, is_spam):
        self.hammie.untrain(msg, is_spam)

    def score(self, msg, evidence = False):
        return self.hammie.score(msg, evidence)

    def close(self):
        self.hammie.store()
        os.unlink(self.lockfile)
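
The link-based locking used above can also be pulled out into a pair of
standalone helpers.  The following is a minimal sketch of that technique
only, not part of spambayes: the names `acquire_lock`, `release_lock` and
`LockTimeout` are hypothetical, and the retry count and delay are
arbitrary defaults.

```python
import os
import time


class LockTimeout(Exception):
    """Raised when the lock cannot be obtained within the retry budget."""


def acquire_lock(lockfile, retries=5, delay=5.0):
    """Create `lockfile` by hard-linking a private temp file to it."""
    tmplock = '%s.%d' % (lockfile, os.getpid())
    open(tmplock, 'w').close()
    try:
        for attempt in range(retries):
            if attempt > 0:
                time.sleep(delay)
            try:
                # link() fails if `lockfile` already exists.
                os.link(tmplock, lockfile)
            except OSError:
                continue  # Someone else holds the lock; try again.
            return
        raise LockTimeout('%s is locked' % lockfile)
    finally:
        # The temp file is only a stepping stone; always remove it.
        os.unlink(tmplock)


def release_lock(lockfile):
    """Give the lock up by removing the lock file."""
    os.unlink(lockfile)
```

A caller would wrap its database work in acquire_lock() / release_lock(),
ideally with try/finally so the lock file is removed even on error.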


-- Sjoerd Mullender <sjoerd@acm.org>

From seant at webreply.com  Mon Jan 20 11:08:05 2003
From: seant at webreply.com (Sean True)
Date: Mon Jan 20 11:11:30 2003
Subject: [Spambayes] Spamconference.org
Message-ID: <MJEHLHJKGINLONDMMKNEEEHFJAAA.seant@webreply.com>

I attended the spam conference on Friday. Barry mentioned spambayes as part
of the global Mailman picture. It was a good talk, in general.

http://www.spamconference.org has links to a copy of the webcasts.
Worth listening to. There are also abstracts. There are eventually going to
be complete papers.

-- Sean


From skip at pobox.com  Mon Jan 20 10:43:39 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 11:43:48 2003
Subject: [Spambayes] nothing gets updated
Message-ID: <15916.10043.564368.750299@montanaro.dyndns.org>


I noticed that when training via my proxytrainer the shelf file isn't
getting modified - at least its 'saved state' key doesn't change.  I also
noticed that it seems to be taking longer and longer to complete the
operation after I click the 'train' button.

I'd like to switch back to pop3proxy and start folding my user interface
changes into it, but I still have trouble running it.  I just tried running
it as

    python pop3proxy.py -d -p hammie.db

It started okay, except I got a warning about xmllib:

    /Users/skip/local/lib/python2.3/xmllib.py:10: DeprecationWarning: The
    xmllib module is obsolete.  Use xml.sax instead.
      DeprecationWarning)

and when I tried to visit http://localhost:8880/ I get a 500 Server error
message:

    Traceback (most recent call last):

      File "spambayes/Dibbler.py", line 389, in found_terminator
        getattr(plugin, name)(**params)

      File "pop3proxy.py", line 619, in onHome
        'status.gif', statusTable % stateDict)+

      File "spambayes/PyMeldLite.py", line 618, in __getattr__
        return self.__dict__[name]

    KeyError: '__coerce__'

Looking through the code in both pop3proxy and proxytrainer, I see calls to
self._doSave() or self.doSave() at the end of onReview(), but all they do is
call self.bayes.store().  Where is the actual decision about a message's
status translated into a change in the state of the database, either
in-memory or on-disk?

Skip

From richie at entrian.com  Mon Jan 20 20:26:22 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 15:27:05 2003
Subject: [Spambayes] Follow up
In-Reply-To: <a05200f07ba4f1a73bf9f@[192.168.1.20]>
References: <a05200f07ba4f1a73bf9f@[192.168.1.20]>
Message-ID: <golo2v0pjrd4c7b1a2s8untjkgc7stu6h1@4ax.com>


[François]
> Mac OS X creates a file named ".DS_Store" in each directory for its own
> use. Since it is a hidden file, there is no issue with most
> software. But pop3proxy loads it as if it were a normal message file.

Oops.  Now fixed - thanks.

Incidentally, unexpected exceptions raised by the web UI should now give
you the exception and a traceback in your browser.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Mon Jan 20 20:26:41 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 15:27:14 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access?
In-Reply-To: <15916.3865.297629.696625@montanaro.dyndns.org>
References: <15916.3865.297629.696625@montanaro.dyndns.org>
Message-ID: <11mo2vsri5vjvio62irbkq3ihcjndb0p9k@4ax.com>


[Skip]
> Depending on how training and classifying are accomplished, it's quite
> possible that the two activities will be done in different processes.

<slightly-wild-idea>
For what it's worth, this is one of the reasons I'm keen to keep all server
components within one process, and using asyncore - all concurrency issues
are taken care of automatically.  It's probably overkill for our
application, but if hammie could classify by talking to the web UI, just
like your proxytee.py script does, we could use the server as the
concurrency mechanism.  Pure hammie users wouldn't need a server (probably,
depending on how they train).  This is how most relational databases do it,
after all...
</slightly-wild-idea>

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Mon Jan 20 20:27:24 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 15:27:54 2003
Subject: [Spambayes] Re: proxytrainer.py
Message-ID: <gvfo2vce7r5imhii5esudafgn5r4iaetrh@4ax.com>


[Skip]
> this will probably quickly deteriorate into a matter of personal taste
> and display properties

You're right.  8-)

Your striping looks too dark to me on my monitor - I have #f4f4f4, you have
#dddddd; can we compromise on #e8e8e8?  That looks a fraction too dark to
me, and will probably look a fraction too light for you.

You've made each radio button line up directly under its heading; I
deliberately hadn't done that.  It looks nicer at first glance that way,
but I've found (through extensive usability testing 8-) that it's easier to
use when the radio buttons are physically closer together.  They still lay
out sensibly (in my environment at least - was the layout bad for you, or
did you just want each button directly under the heading?)

I like the fact that you can view the messages - that's been on my to-do
list for ages!  And pre-classifying messages in the Unsure list is another
good idea - nice one.

> Looks like it's time to backport them to pop3proxy.py.

Yes please... having both is confusing to users and a pain for developers.
If you'd like us to share the work, let me know - the way the HTML is built
has changed dramatically, for instance, so I could do those bits (though I
still prefer my way of laying out the radio buttons...)

-- 
Richie


From richie at entrian.com  Mon Jan 20 20:30:16 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 15:30:43 2003
Subject: [Spambayes] proxytrainer.py and proxytee.py are checked in
In-Reply-To: <15912.28278.137619.916136@montanaro.dyndns.org>
References: <15910.61389.133887.569308@montanaro.dyndns.org>
	<rlkg2vogpv40bbu7879emjbida93oefk1n@4ax.com>
	<15912.28278.137619.916136@montanaro.dyndns.org>
Message-ID: <i8io2v8g054f2tg84oj8s9lb6jaqq042a9@4ax.com>


[Richie]
> You should really make your message-naming code use the same
> system as everything else

[Skip]
> Wasn't aware I did anything differently than you.  Did you notice something?

It looks like you've introduced self.messageName, which you increment each
time you receive a message.  I base all message names on time.time(), with
a uniquifier appended if two arrive within one clock tick of each other -
see onRetr().
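
The naming scheme can be sketched like this.  It is an illustration
only: the real logic lives in onRetr(), which is not shown here, so the
class name `MessageNamer`, the whole-second clock tick and the
'-N' suffix format are all assumptions.

```python
import time


class MessageNamer:
    """Hand out names based on the clock, unique within one tick."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._last_stamp = None
        self._uniquifier = 0

    def next_name(self):
        # Assumption: one clock tick is one whole second.
        stamp = int(self._clock())
        if stamp == self._last_stamp:
            # Same tick as the previous message: bump the uniquifier.
            self._uniquifier += 1
        else:
            self._last_stamp = stamp
            self._uniquifier = 0
        if self._uniquifier:
            return '%d-%d' % (stamp, self._uniquifier)
        return '%d' % stamp
```

Because the names start with the arrival time, sorting them roughly
groups messages by when they arrived, which a per-message counter that
restarts with the process would not give you.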

[Skip]
> I think as important (or more important) than day-by-day display is
> chunk-by-chunk display.  I get far too much mail to want to review it all at
> once anyway.  If I can't take the time to train everything, I don't want to
> be depressed about it. ;-)

Fair enough.  I find that one day per chunk makes sense, and even if I get
a hundred messages in a day, once the system's been trained it's still very
quick to cast my eye down the list and correct any mistakes.

[Richie]
> If I can persuade you to use pop3proxy (or its successor, a
> generic Spambayes server that can optionally host either or both
> of the web UI and the POP3 proxy), you won't need to pull out
> the async stuff.

[Skip]
> That's fine.  My only worry is that the async code will never be as well
> exercised as SimpleHTTPServer.

Maybe, I don't know.  As far as I know there have never been any
async-related problems with pop3proxy, and I've used it successfully in my
day job.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Mon Jan 20 20:31:20 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 15:31:48 2003
Subject: [Spambayes] OptionConfig.py - split into two pieces?
In-Reply-To: <15912.33776.619638.320031@montanaro.dyndns.org>
References: <15912.33776.619638.320031@montanaro.dyndns.org>
Message-ID: <d6lo2vk6g1gdreme0j1sks9acovq56750j@4ax.com>


[Skip]
> Perhaps [OptionsConfig] should be split in two pieces, a script and
> an importable module.

Is there any need to keep the script, now that it's a part of pop3proxy.py
and you can run pop3proxy.py without any proxies configured?  Does anyone
have a reason for keeping OptionsConfig as a standalone script?

-- 
Richie Hindle
richie@entrian.com


From neale at woozle.org  Mon Jan 20 13:28:40 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 20 16:28:49 2003
Subject: [Spambayes] spampot -- spam honeypot server
Message-ID: <w53smvnzf1z.fsf@woozle.org>

So the spam conference was great, etc. etc.  The best thing was that I
met a bunch of interesting people. From talking with folks, it sounds
like my spampot program might be of interest to the general public.

Spampot is basically Jackpot, but written in Python.  Right now I'm sure
Jackpot does more than Spampot does.  But I'm not sure Jackpot saves any
of the messages it traps, and I have a feeling spampot will run on more
platforms.

For those unfamiliar with Jackpot, it comes up looking like an SMTP
server, and will relay messages it thinks are probe tests.  Everything
else just goes to the bit bucket.  With spampot, 5% of the incoming spam
is saved to disk so you can look at it later.  This is of critical
importance to anyone who's writing a spam filter, because this way you
get pure unadulterated spam as it would come in to your SMTP server.
Contrast this with something like SpamArchive, where you get all sorts
of messages of varying quality, forwarded and sullied by who knows
what.

The first night I ran spampot on an IP with no DNS entry associated with
it, I got a probe after four hours.  After I'd fixed the probe relaying
logic to relay that type of probe, it took all of ten hours for me to
collect over 400MB of spam.

If people are interested enough in this, I'll make a separate mail list
and web page for it.  But for now it's available at
<http://woozle.org/~neale/src/python/spampot.py>.

Happy hacking.

Neale

From skip at pobox.com  Mon Jan 20 15:41:06 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 16:41:15 2003
Subject: [Spambayes] spampot -- spam honeypot server
In-Reply-To: <w53smvnzf1z.fsf@woozle.org>
References: <w53smvnzf1z.fsf@woozle.org>
Message-ID: <15916.27890.874021.624060@montanaro.dyndns.org>

Neale,

Hopefully I won't sound too much like an idiot, but what's a "probe
message"?  How do you classify messages which come into spampot, just "probe
message" and "everything else"?

Skip


From richie at entrian.com  Mon Jan 20 21:48:20 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 16:49:03 2003
Subject: [Spambayes] [ spambayes-Patches-670417 ]
In-Reply-To: <BA51B471.61A00%francois.granger@free.fr>
References: <BA51B471.61A00%francois.granger@free.fr>
Message-ID: <niro2v41mrsulriqu5i1puu17fag4nhf24@4ax.com>


[François]
> bind_address.patch

Great - thanks.  I'll look at it as soon as I get the chance.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Mon Jan 20 22:18:31 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 17:19:07 2003
Subject: [Spambayes] nothing gets updated
In-Reply-To: <15916.10043.564368.750299@montanaro.dyndns.org>
References: <15916.10043.564368.750299@montanaro.dyndns.org>
Message-ID: <jmro2vc5264peorptjfv9hc8ivv8mcuve8@4ax.com>


[Skip]
> I'd like to switch back to pop3proxy and start folding my user interface
> changes into it, but I still have trouble running it
>     [...]
> DeprecationWarning
>     [...]
> KeyError: '__coerce__'

Looks like I need to test this with 2.3a1... I'm downloading it now.

> Looking through the code in both pop3proxy and proxytrainer, I see calls to
> self._doSave() or self.doSave() at the end of onReview(), but all they do is
> call self.bayes.store().  Where is the actual decision about a message's
> status translated into a change in the state of the database, either
> in-memory or on-disk?

In pop3proxy, the training (ie. the calling of Classifier.learn()) is done
when messages are moved from the Unknown corpus to one of the Ham or Spam
corpuses.  This code:

            # Create the Trainers.
            self.spamTrainer = storage.SpamTrainer(self.bayes)
            self.hamTrainer = storage.HamTrainer(self.bayes)
            self.spamCorpus.addObserver(self.spamTrainer)
            self.hamCorpus.addObserver(self.hamTrainer)

sets up trainers which automatically train the classifier when messages are
moved between corpuses using Corpus.takeMessage(), which is called by
onReview().  This code is missing from proxytrainer.py - overzealous code
trimming?  8-)
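
For anyone unfamiliar with the observer arrangement, here is a toy
sketch of the idea.  These are stand-in classes, not the real spambayes
ones: the `Toy*` names and the `onAddMessage` callback are assumptions
made for illustration, and only `addObserver`, `takeMessage` and
`learn` are names taken from the description above.

```python
class ToyClassifier:
    """Stand-in classifier that just records what it was taught."""

    def __init__(self):
        self.learned = []

    def learn(self, message, is_spam):
        self.learned.append((message, is_spam))


class ToyTrainer:
    """Observer that trains the classifier whenever a message is added."""

    def __init__(self, classifier, is_spam):
        self.classifier = classifier
        self.is_spam = is_spam

    def onAddMessage(self, message):
        self.classifier.learn(message, self.is_spam)


class ToyCorpus:
    """A bag of messages that notifies its observers about additions."""

    def __init__(self):
        self.messages = []
        self.observers = []

    def addObserver(self, observer):
        self.observers.append(observer)

    def addMessage(self, message):
        self.messages.append(message)
        for observer in self.observers:
            observer.onAddMessage(message)

    def removeMessage(self, message):
        self.messages.remove(message)

    def takeMessage(self, message, fromCorpus):
        # Moving a message is a remove followed by an add, so the
        # destination's observers fire and training happens implicitly.
        fromCorpus.removeMessage(message)
        self.addMessage(message)
```

The point of the sketch is that without the addObserver() calls,
takeMessage() still moves the message but nothing is ever learned.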

-- 
Richie Hindle
richie@entrian.com


From ducky at webfoot.com  Mon Jan 20 14:42:34 2003
From: ducky at webfoot.com (Kaitlin Duck Sherwood)
Date: Mon Jan 20 17:47:22 2003
Subject: [Spambayes] (anti-)spam conference trip report
In-Reply-To: <jmro2vc5264peorptjfv9hc8ivv8mcuve8@4ax.com>
References: <15916.10043.564368.750299@montanaro.dyndns.org>
 <jmro2vc5264peorptjfv9hc8ivv8mcuve8@4ax.com>
Message-ID: <p05100310ba522b82d8d4@[10.0.0.2]>

For those who couldn't make it to the (anti-)spam conference in 
Boston last week, I posted a (long) trip report at
	http://www.overcomeemailoverload.com./antispam/2003SpamConfNotes.html

No reply needed.

From neale at woozle.org  Mon Jan 20 15:16:00 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 20 18:16:04 2003
Subject: [Spambayes] spampot -- spam honeypot server
In-Reply-To: <15916.27890.874021.624060@montanaro.dyndns.org> (Skip
 Montanaro's message of "Mon, 20 Jan 2003 15:41:06 -0600")
References: <w53smvnzf1z.fsf@woozle.org>
	<15916.27890.874021.624060@montanaro.dyndns.org>
Message-ID: <w53iswjza33.fsf@woozle.org>

Skip Montanaro <skip@pobox.com> writes:

> Neale,
>
> Hopefully I won't sound too much like an idiot, but what's a "probe
> message"?  How do you classify messages which come into spampot, just
> "probe message" and "everything else"?

So when you kick up a mail server, you'll get a lot of messages like
this:

  SMTP-Hello: master-cv7889w2
  SMTP-Mail-From: <china9988@21cn.com>
  SMTP-Rcpt-To: <china9988@21cn.com>
  From: china9988@21cn.com
  Subject: 192.168.1.2
  To: china9988@21cn.com
  Date: Thu, 16 Jan 2003 21:48:41 +0900
  X-Priority: 3
  X-Library: Indy 8.0.25

  t_Smtp.LocalIP

This is one of the more baffling probes, since china9988@21cn.com gives
NDRs--maybe really old spam software.  But all of the probes I've seen
so far have the IP address of my honeypot server in the subject line.  It
makes sense--send out mail blindly, and anything you get back has the IP
address of an open relay in the subject line.
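
A minimal sketch of that check, assuming the honeypot's own address is
known ahead of time.  The name looks_like_probe is made up for
illustration; this is not spampot's actual code:

```python
import re
from email import message_from_string

# Matches dotted-quad IPv4 addresses such as 192.168.1.2.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def looks_like_probe(raw_message, honeypot_ip):
    """Return True if the Subject header carries the honeypot's own IP.

    looks_like_probe is a hypothetical helper, not part of spampot.
    """
    msg = message_from_string(raw_message)
    subject = msg.get("Subject", "")
    return honeypot_ip in IPV4_RE.findall(subject)
```

Anything this returns True for would be relayed as a probe; everything
else stays in the "everything else" pile.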

And yes, currently I only classify as "probe" and "everything else".  I
do this with Maildir flags, though there's really no reason why it
should have to be in Maildir format, aside from making it easy to view
with mutt.

Right now my probe detection logic needs work :)

Neale

From jdhunter at ace.bsd.uchicago.edu  Mon Jan 20 17:19:29 2003
From: jdhunter at ace.bsd.uchicago.edu (John Hunter)
Date: Mon Jan 20 18:19:12 2003
Subject: [Spambayes] spambayes with gnus/nnml
Message-ID: <m2vg0je7em.fsf@mother.paradise.lost>


I am currently using gnus to split my incoming mail using the nnml
backend.  It splits the mail in /var/spool/mail/jdhunter into my
personal, professional, mailing list dirs.  After I have read my mail,
I periodically sort it into archival directories, typically by sender,
with the exception of mailing lists, which are already split directly
into their final resting place.

As such, after gnus does the split, my inbox looks like (nnml files
are one file per mail named as integers, and denoted by [0-9]+ below)

    Mail/inbox1/[0-9]+
    Mail/inbox2/[0-9]+
    Mail/inbox3/[0-9]+
    Mail/mail-list/list1/[0-9]+
    Mail/mail-list/list2/[0-9]+
    Mail/mail-list/list3/[0-9]+

Typically, I'll move my inbox1-n mail into archive folders every few
days

    Mail/spam/[0-9]+
    Mail/personal/sender1/[0-9]+
    Mail/personal/sender2/[0-9]+
    Mail/prof/sender1/[0-9]+
    Mail/prof/sender2/[0-9]+
    Mail/biz/sender1/[0-9]+
    Mail/biz/sender2/[0-9]+
    Mail/biz/sendern/[0-9]+

I have a lot of mail-list and archival folders, and am adding new ones
all the time.  My inbox folders, however, are fairly static.

I am wondering how to best integrate spambayes, since my spam split
regexes are no longer keeping up.

What I would like is for spam bayes to prefilter my mail, siphoning
off spam to a spam folder, and putting the rest in
/var/spool/mail/jdhunter (or some other file that I can advise gnus to
check) where I can then use gnus to split it.  But I am not sure which
dirs I should advise hammiefilter.py to monitor so it can retrain
itself.  Are any of you combining spambayes with gnus splitting?  The
notes in HAMMIE.txt suggest that hammiefilter expects mbox files.

Thanks,
John Hunter


From richie at entrian.com  Mon Jan 20 23:24:33 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 20 18:25:02 2003
Subject: [Spambayes] nothing gets updated
In-Reply-To: <jmro2vc5264peorptjfv9hc8ivv8mcuve8@4ax.com>
References: <15916.10043.564368.750299@montanaro.dyndns.org>
	<jmro2vc5264peorptjfv9hc8ivv8mcuve8@4ax.com>
Message-ID: <881p2vkng1maek065f3vcadjbho9ga4p0i@4ax.com>


> [Skip]
> DeprecationWarning
>     [...]
> KeyError: '__coerce__'

The KeyError problem is fixed, and the DeprecationWarning is suppressed for
now.

-- 
Richie Hindle
richie@entrian.com


From piersh at friskit.com  Mon Jan 20 15:49:43 2003
From: piersh at friskit.com (Piers Haken)
Date: Mon Jan 20 18:35:05 2003
Subject: [Spambayes] Outlook plugin and hotmail
Message-ID: <9891913C5BFE87429D71E37F08210CB92C7495@zeus.sfhq.friskit.com>

I'm not sure why but it looks like filtering of Hotmail inboxes has
recently been broken. Here's a stack trace:

pythoncom error: Python error invoking COM method.
Traceback (most recent call last):
  File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "C:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "C:\Python22\spam\spambayes\Outlook2000\addin.py", line 184, in OnItemAdd
    msgstore_message = self.manager.message_store.GetMessage(item)
  File "C:\Python22\spam\spambayes\Outlook2000\msgstore.py", line 230, in GetMessage
    message_id = self.NormalizeID(message_id)
  File "C:\Python22\spam\spambayes\Outlook2000\msgstore.py", line 178, in NormalizeID
    assert type(item_id) in [type(''), type(u'')], "What kind of ID is '%r'?" % (item_id,)
exceptions.AssertionError: What kind of ID is '<win32com.gen_py.None.MailItem>'?

Piers.

From anthony at interlink.com.au  Tue Jan 21 10:33:46 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 20 18:36:55 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <15916.3865.297629.696625@montanaro.dyndns.org> 
Message-ID: <200301202333.h0KNXl520936@localhost.localdomain>


>>> Skip Montanaro wrote
> 
> Depending on how training and classifying are accomplished, it's quite
> possible that the two activities will be done in different processes.  For
> example, I am currently experimenting with training using pop3proxy (well,
> still my offshoot proxytrainer at the moment) while classification is being
> done by hammiefilter run from procmail.  This implies a need to lock the
> shelve/pickle file used to store the training info.  Seems to me we need to
> (be able to) lock the shelve/pickle file.  The only lock facility which
> seems cross-platform enough for this application is the set of flags used by
> os.open().  To lock the database you'd have to check/create a lock file
> related (namewise) to the actual database file.  Has anyone given this any
> thought?
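
For reference, the lock-file idea quoted above might look roughly like
this.  It is only a sketch: acquire_lock, release_lock and the ".lock"
suffix are invented names, and cleanup of stale locks left by a crashed
process is ignored.

```python
import os
import time


def acquire_lock(db_path, timeout=10.0, poll=0.1):
    """Create db_path + '.lock' exclusively and return the lock path.

    O_CREAT | O_EXCL makes os.open() fail if the file already exists,
    so only one process at a time can create the lock file.
    (Hypothetical helper; the '.lock' naming is an assumption.)
    """
    lock_path = db_path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Someone else holds the lock; wait and retry until timeout.
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock %s" % db_path)
            time.sleep(poll)
        else:
            # Record our pid so a human can see who holds the lock.
            os.write(fd, str(os.getpid()).encode("ascii"))
            os.close(fd)
            return lock_path


def release_lock(lock_path):
    """Drop the lock by removing the lock file."""
    os.remove(lock_path)
```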

I'd suggest, instead, that training write to a different filename, then,
when it's complete, rename the new file to the existing file. Real operating
systems will do the right thing - I don't know if Windows will just choke
and die, tho....


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Tue Jan 21 10:38:43 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 20 18:40:48 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <11mo2vsri5vjvio62irbkq3ihcjndb0p9k@4ax.com> 
Message-ID: <200301202338.h0KNchK21003@localhost.localdomain>


>>> Richie Hindle wrote
> For what it's worth, this is one of the reasons I'm keen to keep all server
> components within one process, and using asyncore - all concurrency issues
> are taken care of automatically.  It's probably overkill for our
> application, but if hammie could classify by talking to the web UI, just
> like your proxytee.py script does, we could use the server as the
> concurrency mechanism.  Pure hammie users wouldn't need a server (probably,
> depending on how they train).  This is how most relational databases do it,
> after all...

Hm. I'd prefer that this _not_ be a requirement, as it makes it harder
for my setup to do the right thing, as well as limiting the usefulness
in a number of potential applications I'm thinking of for work...

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From anthony at interlink.com.au  Tue Jan 21 10:40:40 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 20 18:42:20 2003
Subject: [Spambayes] spambayes with gnus/nnml 
In-Reply-To: <m2vg0je7em.fsf@mother.paradise.lost> 
Message-ID: <200301202340.h0KNefa21028@localhost.localdomain>


>>> John Hunter wrote
> What I would like is for spam bayes to prefilter my mail, siphoning
> off spam to a spam folder, and putting the rest in
> /var/spool/mail/jdhunter (or some other file that I can advise gnus to
> check) where I can then use gnus to split it.  But I am not sure which
> dirs I should advise hammiefilter.py to monitor so it can retrain
> itself.  Are any of you combining spambayes with gnus splitting?  The
> notes in HAMMIE.txt suggest that hammiefilter expects mbox files.

Can you use procmail? If so, check the INTEGRATION.txt file for a 
suitable recipe - modify it so the default action is to put the message
in your spool file.

Anthony
-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From neale at woozle.org  Mon Jan 20 15:58:10 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 20 18:58:16 2003
Subject: [Spambayes] spambayes with gnus/nnml
In-Reply-To: <m2vg0je7em.fsf@mother.paradise.lost> (John Hunter's message of
 "Mon, 20 Jan 2003 17:19:29 -0600")
References: <m2vg0je7em.fsf@mother.paradise.lost>
Message-ID: <w533cnnz84t.fsf@woozle.org>

John Hunter <jdhunter@ace.bsd.uchicago.edu> writes:

> What I would like is for spam bayes to prefilter my mail, siphoning
> off spam to a spam folder, and putting the rest in
> /var/spool/mail/jdhunter (or some other file that I can advise gnus to
> check) where I can then use gnus to split it.  But I am not sure which
> dirs I should advise hammiefilter.py to monitor so it can retrain
> itself.  Are any of you combining spambayes with gnus splitting?  The
> notes in HAMMIE.txt suggest that hammiefilter expects mbox files.

Well, I think mboxtrain will see a Gnus nnml directory as an MH
directory and work fine.  But I think there may be a better solution for
Gnus, and maybe in turn, mutt.

I'll check out Ted Zlatanov's spam.el to see how easy it'd be to hook
in.  IIRC hooking spambayes into this would be a piece of cake.

Neale

From neale at woozle.org  Mon Jan 20 16:00:13 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 20 19:00:17 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access?
In-Reply-To: <200301202333.h0KNXl520936@localhost.localdomain> (Anthony
 Baxter's message of "Tue, 21 Jan 2003 10:33:46 +1100")
References: <200301202333.h0KNXl520936@localhost.localdomain>
Message-ID: <w53y95fxtgy.fsf@woozle.org>

Hey guys, I'm not quite up on the discussion yet, but doesn't the
bsddb module already lock the database when you open it for writes?  I
really don't recall, because I remember writing a wrapper for dbm files
once that locked the database using flock(), but then I remember getting
rid of that because I thought dbm files were automatically locked by the
dbm implementation.

Neale

From neale at woozle.org  Mon Jan 20 16:18:21 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 20 19:18:29 2003
Subject: [Spambayes] Something's still missing from hammiefilter
In-Reply-To: <15909.56641.568386.266344@montanaro.dyndns.org> (Skip
 Montanaro's message of "Wed, 15 Jan 2003 16:14:25 -0600")
References: <15909.56641.568386.266344@montanaro.dyndns.org>
Message-ID: <w53vg0jxsmq.fsf@woozle.org>

Skip Montanaro <skip@pobox.com> writes:

> The -d (use dbm) and -p (specify pickle or database file) flags are missing.
> I'd really prefer these be available on the command line as well as via the
> options file.  Is there a reason not to expose them on the command line?

Nope.  They're exposed now.  Thanks for the suggestion :)

Neale


From skip at pobox.com  Mon Jan 20 18:26:03 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 19:26:16 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <200301202333.h0KNXl520936@localhost.localdomain>
References: <15916.3865.297629.696625@montanaro.dyndns.org>
        <200301202333.h0KNXl520936@localhost.localdomain>
Message-ID: <15916.37787.511871.538898@montanaro.dyndns.org>


    Anthony> I'd suggest, instead, that training write to a different
    Anthony> filename, then, when it's complete, rename the new file to the
    Anthony> existing file.

Depending how your training works you might wind up copying a 20+MB file for
each message.

Skip

From skip at pobox.com  Mon Jan 20 18:28:14 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 19:28:27 2003
Subject: [Spambayes] spambayes with gnus/nnml 
In-Reply-To: <200301202340.h0KNefa21028@localhost.localdomain>
References: <m2vg0je7em.fsf@mother.paradise.lost>
        <200301202340.h0KNefa21028@localhost.localdomain>
Message-ID: <15916.37918.391558.623588@montanaro.dyndns.org>


    >>>> John Hunter wrote
    >> What I would like is for spam bayes to prefilter my mail, siphoning
    >> off spam to a spam folder, and putting the rest in
    >> /var/spool/mail/jdhunter ...

    Anthony> Can you use procmail?

Agreed, this should not be spambayes' job.  SpamAssassin went through this a
few months ago.  SA was originally written so it could do what John
requested.  They finally concluded this was wrong and wound up ripping out
the stuff which did it.

Skip

From anthony at interlink.com.au  Tue Jan 21 11:28:26 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 20 19:30:36 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <15916.37787.511871.538898@montanaro.dyndns.org> 
Message-ID: <200301210028.h0L0SSP21497@localhost.localdomain>


>>> Skip Montanaro wrote
> Depending how your training works you might wind up copying a 20+MB file for
> each message.

Copying? When? You'd write out the new pickles, sure, but you have to
do that, anyway. 


From skip at pobox.com  Mon Jan 20 18:34:52 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 19:35:00 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <200301210028.h0L0SSP21497@localhost.localdomain>
References: <15916.37787.511871.538898@montanaro.dyndns.org>
        <200301210028.h0L0SSP21497@localhost.localdomain>
Message-ID: <15916.38316.889660.193790@montanaro.dyndns.org>


    >>>> Skip Montanaro wrote
    >> Depending how your training works you might wind up copying a 20+MB
    >> file for each message.

    Anthony> Copying? When? You'd write out the new pickles, sure, but you
    Anthony> have to do that, anyway.

How do you get the temp file from the real file without copying it?  If I
understand the way things work, you'd do something like

    * copy real to temp
    * train on new messages
    * update temp
    * move temp back to real (the atomic part we all want)

Skip

From skip at pobox.com  Mon Jan 20 18:32:13 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 19:35:57 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access?
In-Reply-To: <w53y95fxtgy.fsf@woozle.org>
References: <200301202333.h0KNXl520936@localhost.localdomain>
        <w53y95fxtgy.fsf@woozle.org>
Message-ID: <15916.38157.157203.892109@montanaro.dyndns.org>


    Neale> Hey guys, I'm not quite up on the discussion yet, but doesn't the
    Neale> bsddb module already lock the database when you open it for
    Neale> writes? 

If it does, that would be fine when anydbm selects that database, but
wouldn't help people who (silently) get other databases.  Also, I believe
the shelve module always opens databases for read/write access, probably
generating unnecessary lock activity.  It would be nice if hammiefilter (at
least) could open the file read-only.

Skip


From anthony at interlink.com.au  Tue Jan 21 11:52:05 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 20 19:54:17 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <15916.38316.889660.193790@montanaro.dyndns.org> 
Message-ID: <200301210052.h0L0q8R21738@localhost.localdomain>


>>> Skip Montanaro wrote
> How do you get the temp file from the real file without copying it?  If I
> understand the way things work, you'd do something like
> 
>     * copy real to temp
>     * train on new messages
>     * update temp
>     * move temp back to real (the atomic part we all want)

I thought it'd be more like:

      * open real in read-only mode, load into memory
      * train on new messages
      * write new data out to temp
      * rename temp to real (atomically)
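
Roughly, in code.  This is a sketch under assumptions, not what the
current classifier does: it assumes the whole database is one pickle,
uses modern Python's os.replace() for the rename step, and the function
names are invented.

```python
import os
import pickle
import tempfile


def load_db(path):
    """Open the real file read-only and load it all into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


def save_db_atomically(db, path):
    """Write db to a temp file, then rename it over the real file.

    The temp file lives in the same directory as path so the rename
    stays on one filesystem; readers see either the old file or the
    new one, never a half-written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(db, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Don't leave a stray temp file behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Note that this still rewrites the entire database on every save, which
is the cost being argued about in this thread.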


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From skip at pobox.com  Mon Jan 20 19:18:49 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 20 20:18:58 2003
Subject: [Spambayes] locking pickle/dbm against concurrent access? 
In-Reply-To: <200301210052.h0L0q8R21738@localhost.localdomain>
References: <15916.38316.889660.193790@montanaro.dyndns.org>
        <200301210052.h0L0q8R21738@localhost.localdomain>
Message-ID: <15916.40953.882684.507152@montanaro.dyndns.org>

>>>>> "Anthony" == Anthony Baxter <anthony@interlink.com.au> writes:

    >>>> Skip Montanaro wrote
    >> How do you get the temp file from the real file without copying it?  If I
    >> understand the way things work, you'd do something like
    >> 
    >> * copy real to temp
    >> * train on new messages
    >> * update temp
    >> * move temp back to real (the atomic part we all want)

    Anthony> I thought it'd be more like:

    Anthony>       * open real in read-only mode, load into memory
    Anthony>       * train on new messages
    Anthony>       * write new data out to temp
    Anthony>       * rename temp to real (atomically)

Perhaps, but that first step would be even more expensive than a simple
copy.  I thought all the current system did was score the current message
then update only those keys necessary.

In addition, I don't think shelve allows you to open a database in read-only
mode.  Oops, wait, the default is read/write.  Neither shelve.open()'s
docstring nor the section in the libref manual says anything about its flag
argument.  You have to RTSL to learn about it.  I'll see about fixing that.

The libref docs do say:

    The shelve module does not support concurrent read/write access to
    shelved objects.  (Multiple simultaneous read accesses are safe.)  When
    a program has a shelf open for writing, no other program should have it
    open for reading or writing.  Unix file locking can be used to solve
    this, but this differs across Unix versions and requires knowledge about
    the database implementation used.
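
For the record, the flag argument mentioned above takes the same values
as the underlying dbm open call, so something like the following should
do.  The two wrapper names are made up for illustration, and whether
read-only is actually enforced depends on which dbm backend gets picked:

```python
import shelve


def open_for_training(path):
    """Open a shelf read/write, creating it if needed (the default, 'c')."""
    return shelve.open(path, flag="c")


def open_for_scoring(path):
    """Open an existing shelf read-only, as a classify-only process could."""
    return shelve.open(path, flag="r")
```

This does not by itself make concurrent access safe; a writer in another
process still needs some form of locking.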

Skip

From tim at fourstonesExpressions.com  Mon Jan 20 20:17:32 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 20 21:18:42 2003
Subject: [Spambayes] spampot -- spam honeypot server
In-Reply-To: <15916.27890.874021.624060@montanaro.dyndns.org>
Message-ID: <TP72TQ64TRPNMFEHDZXC8DBQNUPLFW.3e2cadbc@myst>

Skip, a probe message is a test smtp stream that's sent to an ip address by 
someone who's looking to see if an smtp server is running there.  If it 
appears that there is, the spammer will try to relay a spam through it.  If 
that works, then watch out... the floodgates will open.  Neale's spampot idea 
is very kewl!  - TimS

1/20/2003 3:41:06 PM, Skip Montanaro <skip@pobox.com> wrote:

>Neale,
>
>Hopefully I won't sound too much like an idiot, but what's a "probe
>message"?  How do you classify messages which come into spampot, just "probe
>message" and "everything else"?
>
>Skip
>
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From tim at fourstonesExpressions.com  Mon Jan 20 21:28:52 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 20 22:29:53 2003
Subject: [Spambayes] spampot -- spam honeypot server
In-Reply-To: <w53iswjza33.fsf@woozle.org>
Message-ID: <WTE0URPL82TO22VTHEF0HDRQ3J4Y.3e2cbe74@myst>

Probe detection.... looks like a job for spambayes... - TimS  ;)

1/20/2003 5:16:00 PM, "Neale Pickett" <neale@woozle.org> wrote:

>Skip Montanaro <skip@pobox.com> writes:
>
>> Neale,
>>
>> Hopefully I won't sound too much like an idiot, but what's a "probe
>> message"?  How do you classify messages which come into spampot, just
>> "probe message" and "everything else"?
>
>So when you kick up a mail server, you'll get a lot of messages like
>this:
>
>  SMTP-Hello: master-cv7889w2
>  SMTP-Mail-From: <china9988@21cn.com>
>  SMTP-Rcpt-To: <china9988@21cn.com>
>  From: china9988@21cn.com
>  Subject: 192.168.1.2
>  To: china9988@21cn.com
>  Date: Thu, 16 Jan 2003 21:48:41 +0900
>  X-Priority: 3
>  X-Library: Indy 8.0.25
>
>  t_Smtp.LocalIP
>
>This is one of the more baffling probes, since china9988@21cn.com gives
>NDRs--maybe really old spam software.  But all of the probes I've seen
>so far have the IP address of my honeypot server in the subject line.  It
>makes sense--send out mail blindly, and anything you get back has the IP
>address of an open relay in the subject line.
>
>And yes, currently I only classify as "probe" and "everything else".  I
>do this with Maildir flags, though there's really no reason why it
>should have to be in Maildir format, aside from making it easy to view
>with mutt.
>
>Right now my probe detection logic needs work :)
>
>Neale
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From tim.one at comcast.net  Mon Jan 20 23:50:41 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Jan 20 23:51:14 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <3E2B9962.26334.308D0BD@localhost>
Message-ID: <LNBBLJKPBEHFEDALKOLCMEHEDJAB.tim.one@comcast.net>

[Richard Jowsey]
> I have a very large training corpus, so I'm seeing well-
> separated distributions of good versus spam probs, with a
> sprinkling of "unsures" scattered through the middle. An
> uncertain cutoff at 3 sigma from the means should work, but this
> notion needs some testing. That chi2 test is definitely on the
> drawing boards, even if only for comparison purposes...

Anthony Baxter has some plots of score distributions for Graham-combining,
Gary-combining and chi-combining here:

    http://spambayes.sourceforge.net/background.html

It's the sharpness and spread of the separation in chi- that's attractive.
Our experiments showed (most of mine were on a 34,000-msg database) that you
could usually pick cutoffs equally good under Gary-combining, but that it
took 3 decimal digits of precision to do so, best cutoffs kept shifting over
time (== amount of training data) and across test sets, and that it wasn't
possible to guess good values in advance.  In contrast, canned chi- cutoff
values with 1 decimal digit of precision worked well for just about
everyone.  The primary size-related (# of training msgs) effect I noticed is
that the chi- unsure range could be profitably shrunk the more msgs trained
on, but even if you didn't bother, your original cutoffs continued to work
well (although, as with Gary-combining, *optimal* cutoffs shifted too; chi-
degraded more gently if you didn't bother to change them).
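
For anyone who hasn't read the code, here is a rough sketch of
chi-combining with canned cutoffs.  The function names and the 0.2/0.9
cutoffs are illustrative assumptions rather than a copy of the project's
classifier, and the sketch skips the underflow handling that real code
needs for messages with many clues.  Every probability must be strictly
between 0 and 1.

```python
import math


def chi2Q(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v."""
    assert v % 2 == 0
    m = x2 / 2.0
    total = term = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair above 1.0.
    return min(total, 1.0)


def chi_combine(probs):
    """Combine per-word spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5
    # S is near 1 when the clues look spammy, H when they look hammy.
    S = 1.0 - chi2Q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    H = 1.0 - chi2Q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    # Strong evidence on both sides cancels out and lands near 0.5.
    return (S - H + 1.0) / 2.0


def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Three-way decision using one-decimal-digit cutoffs (assumed values)."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"
```

A message with equally strong spam and ham clues scores close to 0.5 and
falls in the unsure range, which is the "knows when it's lost" property.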


From anthony at interlink.com.au  Tue Jan 21 16:35:59 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 21 00:38:26 2003
Subject: [Spambayes] FYI: Java implementation 
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEHEDJAB.tim.one@comcast.net> 
Message-ID: <200301210535.h0L5Zxc23853@localhost.localdomain>


>>> Tim Peters wrote
> It's the sharpness and spread of the separation in chi- that's attractive.
> Our experiments showed (most of mine were on a 34,000-msg database) that you
> could usually pick cutoffs equally good under Gary-combining, but that it
> took 3 decimal digits of precision to do so, best cutoffs kept shifting over
> time (== amount of training data) and across test sets, and that it wasn't
> possible to guess good values in advance.  

It's also worth noting that the optimal cutoff values before chi-combining
varied between 0.5 something and 0.7 for some people. It was impossible to
pick a number that worked for everyone.

(yes, I do plan to re-do the plots off the same data set at some point,
and add some for the CLM combiners... - if someone wants to do it first 
and save me the effort, it would be faaaabulous)

Anthony


From neale at woozle.org  Tue Jan 21 00:09:49 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 21 03:09:57 2003
Subject: [Spambayes] Something's still missing from hammiefilter
In-Reply-To: <w53vg0jxsmq.fsf@woozle.org> (Neale Pickett's message of "Mon,
 20 Jan 2003 16:18:21 -0800")
References: <15909.56641.568386.266344@montanaro.dyndns.org>
	<w53vg0jxsmq.fsf@woozle.org>
Message-ID: <w534r82zzxu.fsf@woozle.org>

Neale Pickett <neale@woozle.org> writes:

> They're exposed now.

Don't use the new options to hammiefilter.py just yet--they won't work.
I've got something with working options, as well as a new -t option to
filter and train in one step.  I'll check it in tomorrow after it's
handled a night of email and I'm sure it does what it says it does :)

Neale

From mwh at python.net  Tue Jan 21 10:28:55 2003
From: mwh at python.net (Michael Hudson)
Date: Tue Jan 21 05:29:04 2003
Subject: [Spambayes] Re: FYI: Java implementation
References: <3E2B9962.26334.308D0BD@localhost>
	<LNBBLJKPBEHFEDALKOLCMEHEDJAB.tim.one@comcast.net>
Message-ID: <2mvg0issns.fsf@starship.python.net>

Tim Peters <tim.one@comcast.net> writes:

> [Richard Jowsey]
> > I have a very large training corpus, so I'm seeing well-
> > separated distributions of good versus spam probs, with a
> > sprinkling of "unsures" scattered through the middle. An
> > uncertain cutoff at 3 sigma from the means should work, but this
> > notion needs some testing. That chi2 test is definitely on the
> > drawing boards, even if only for comparison purposes...
> 
> Anthony Baxter has some plots of score distributions for Graham-combining,
> Gary-combining and chi-combining here:
> 
>     http://spambayes.sourceforge.net/background.html

I meant to say it when I first looked at that page, but seeing those
plots nearly made my eyeballs fall out.  Why does anyone still use
Graham-combining?

Cheers,
M.


From msergeant at startechgroup.co.uk  Tue Jan 21 10:46:31 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Tue Jan 21 05:46:32 2003
Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd)
In-Reply-To: <20030120222240.CE21D16F16@jmason.org>
Message-ID: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk>

On Monday, Jan 20, 2003, at 22:22 Europe/London, Justin Mason wrote:

> From:    "Neale Pickett" <neale@woozle.org>
>
> The first night I ran spampot on an IP with no DNS entry associated 
> with
> it, I got a probe after four hours.  After I'd fixed the probe relaying
> logic to relay that type of probe, it took all of ten hours for me to
> collect over 400MB of spam.

I had dinner with Neale where we discussed this. Interesting project 
(as is Jackpot, but err, Java. Ick). However the one downside of this 
was it was 400MB of the EXACT same email.

:-)

My guess is you'd need to put some sort of Razor-like signature 
checking in place (perhaps using Pyzor) to remove dupes.

Matt.


From jm at jmason.org  Tue Jan 21 11:20:53 2003
From: jm at jmason.org (Justin Mason)
Date: Tue Jan 21 06:20:54 2003
Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd) 
In-Reply-To: Message from Matt Sergeant <msergeant@startechgroup.co.uk> 
	<9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk> 
Message-ID: <20030121112058.D815116F16@jmason.org>


Matt Sergeant said:
> My guess is you'd need to put some sort of Razor-like signature 
> checking in place (perhaps using Pyzor) to remove dupes.

Actually, I have some rough-but-working-well-enough perl code in
SpamAssassin CVS, in the "masses/corpora" dir, which does this.
"fuzzy-hash-maildir" is the script in question.  Here's how it works:

  - for each mail:

    - strip all HTML tags

    - strip text in "quotes" -- vars in javascript, etc.

    - remove words with ? marks inside them, possible encoded mail addrs

    - remove words with @ marks inside them, possible encoded mail addrs

    - remove lines that contain just a single string of non-white chars,
      possible hash busters or encoded mail addrs

    - split into an array of lines (NOT bytes, since spammers are using
      variable-length hash-busting strings)

    - divide into 4 blocks and hash them: hash1, hash2, hash3, hash4

    - output into associative arrays as
	hash1.hash2 -> filename
	hash1.hash2.hash3 -> filename
	hash1.hash2.hash3.hash4 -> filename
      (should probably use e.g. hash2.hash3.hash4 as well.  Note that
      hashbusters and encoded addrs generally appear in the first and/or
      last blocks.)

  - finally check those arrays for collisions and output these as "likely
    dups".

It works sufficiently well. ;)
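
For the Python folks here, the steps above could be sketched like this.
It is a loose translation and not the perl script itself: the regexes
are guesses at the intent, MD5 is an arbitrary choice of hash, blank
lines are dropped as well, and fuzzy_keys/likely_dups are invented names.

```python
import hashlib
import re
from collections import defaultdict


def fuzzy_keys(body):
    """Return the three block-hash keys for one mail body."""
    text = re.sub(r"<[^>]*>", "", body)      # strip HTML tags
    text = re.sub(r'"[^"]*"', "", text)      # strip text in "quotes"
    text = re.sub(r"\S*[?@]\S*", "", text)   # drop words containing ? or @
    lines = [ln.strip() for ln in text.splitlines()]
    # Drop blank lines and lines that are one unbroken run of characters
    # (possible hash busters or encoded addresses).
    lines = [ln for ln in lines if ln and re.search(r"\s", ln)]
    # Divide the lines (not bytes) into 4 blocks and hash each block.
    n = len(lines)
    hashes = []
    for i in range(4):
        block = lines[i * n // 4:(i + 1) * n // 4]
        data = "\n".join(block).encode("utf-8")
        hashes.append(hashlib.md5(data).hexdigest())
    # hash1.hash2, hash1.hash2.hash3, hash1.hash2.hash3.hash4
    return [".".join(hashes[:k]) for k in (2, 3, 4)]


def likely_dups(mails):
    """mails maps filename -> body; return keys shared by 2+ files."""
    seen = defaultdict(set)
    for name, body in mails.items():
        for key in fuzzy_keys(body):
            seen[key].add(name)
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}
```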

--j.

From pje at telecommunity.com  Tue Jan 21 10:11:12 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Jan 21 10:11:48 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation)
In-Reply-To: <2mvg0issns.fsf@starship.python.net>
References: <3E2B9962.26334.308D0BD@localhost>
 <LNBBLJKPBEHFEDALKOLCMEHEDJAB.tim.one@comcast.net>
Message-ID: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com>

At 10:28 AM 1/21/03 +0000, Michael Hudson wrote:
>Tim Peters <tim.one@comcast.net> writes:
> > Anthony Baxter has some plots of score distributions for Graham-combining,
> > Gary-combining and chi-combining here:
> >
> >     http://spambayes.sourceforge.net/background.html
>
>I meant to say it when I first looked at that page, but seeing those
>plots nearly made my eyeballs fall out.  Why does anyone still use
>Graham-combining?

Because nobody's seen the plots, obviously.  :)

I think what would be needed to change that would be:

1. A Spambayes release

2. A "spam shootout" wherein half a dozen Bayesian mail filters (e.g. 
Popfile, Mozilla, other...?)  are tested against the same corpus, using the 
cross-validation testing mechanism.

3a. Spambayes comes out on top, with a fraction of the error rate of 
others: publish the results, get a Slashdot story, and slashdot the project 
site.  :)

3b. Spambayes doesn't come out on top: find out why, fix the problem, go to 
step 3a.  :)


From msergeant at startechgroup.co.uk  Tue Jan 21 16:05:48 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Tue Jan 21 11:07:09 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java
	implementation)
In-Reply-To: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com>
Message-ID: <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk>

On Tuesday, Jan 21, 2003, at 15:11 Europe/London, Phillip J. Eby wrote:

> 2. A "spam shootout" wherein half a dozen Bayesian mail filters (e.g. 
> Popfile, Mozilla, other...?)  are tested against the same corpus, 
> using the cross-validation testing mechanism.
>
> 3a. Spambayes comes out on top, with a fraction of the error rate of 
> others: publish the results, get a Slashdot story, and slashdot the 
> project site.  :)
>
> 3b. Spambayes doesn't come out on top: find out why, fix the problem, 
> go to step 3a.  :)

Mozilla and SpamAssassin both copy their bayesian code from spambayes 
(including tokenisation ideas and combiners). If the results are 
different at all that's probably a bug somewhere.

It's nice to be proud of software, but when it's open source you kinda 
leave it wide open for us to nab your ideas ;-)

Matt.


From tim.one at comcast.net  Tue Jan 21 11:12:00 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Jan 21 11:13:05 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <200301210535.h0L5Zxc23853@localhost.localdomain>
Message-ID: <BIEJKCLHCIOIHAGOKOLHCEJCEJAA.tim.one@comcast.net>

[Anthony Baxter]
> It's also worth noting that the optimal cutoff values before chi-combining
> varied between 0.5 something and 0.7 for some people. It was impossible to
> pick a number that worked for everyone.

Ah, memories <wink>.

> (yes, I do plan to re-do the plots off the same data set at some point,
> and add some for the CLM combiners... - if someone wants to do it first
> and save me the effort, it would be faaaabulous)

Assuming CLM refers to the three central-limit combining schemes, they never
got far enough to develop a rational notion of "score".  They were the first
schemes that "knew when they were confused", and that caught us by surprise:
the initial stabs at getting "a score" out of them were like
Graham-combining in that they were sometimes extremely certain of a wrong
answer.  It took a while to realize that, when this happened, an internal
(for example) spam score was 50 sdevs on the spam side of the ham mean,
simultaneous with the internal ham score being 40 sdevs on the ham side of
the spam mean.  The overall result was extreme certainty that the thing was
spam, although the internal scores were certain it was neither.  Once we
figured that out, testing proceeded by producing one of exactly three
scores:  "it's ham", "it's spam", "I'm lost".  That's as far as they got, at
which point chi-combining appeared, also knowing when it was lost, but far
less problematic for training, and producing a "smooth" score naturally.
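
A minimal sketch of the three-way outcome described above, not the
project's CLM code.  It assumes the two internal scores have already been
turned into distances (in sdevs) from the ham and spam population means;
the function name and the limit of 3.0 are hypothetical.

```python
def clm_verdict(sdevs_from_ham, sdevs_from_spam, limit=3.0):
    """Return 'ham', 'spam', or 'lost' from two distances in sdevs.

    sdevs_from_ham / sdevs_from_spam: how far the message's statistic lies
    from the ham / spam population mean (hypothetical inputs).
    limit: hypothetical plausibility threshold, not a project value.
    """
    looks_like_ham = sdevs_from_ham <= limit
    looks_like_spam = sdevs_from_spam <= limit
    if looks_like_ham and not looks_like_spam:
        return "ham"
    if looks_like_spam and not looks_like_ham:
        return "spam"
    # Plausible under both populations, or under neither.
    return "lost"


# The case from the text: far from the ham mean and far from the spam mean.
print(clm_verdict(50, 40))
```

With only these three outputs there is no smooth score to plot, which is
the point of the remark that follows.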

A CLM plot would consist of three vertical lines, and so be a bit confusing
<wink>.


From neale at woozle.org  Tue Jan 21 08:15:17 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 21 11:15:24 2003
Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd)
In-Reply-To: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk> (Matt
 Sergeant's message of "Tue, 21 Jan 2003 10:46:31 +0000")
References: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk>
Message-ID: <w531y36zdgq.fsf@woozle.org>

Matt Sergeant <msergeant@startechgroup.co.uk> writes:

> I had dinner with Neale where we discussed this. Interesting project
> (as is Jackpot, but err, Java. Ick). However the one downside of this
> was it was 400MB of the EXACT same email.

Psh, details.  :^)

But yes, it was 200MB of the exact same email (with templates filled
in), and then 200MB of another message.  I've since added in logic to
only store every 20th message sent over the same SMTP connection.  Now I
have a measly 2MB of spam.  But I imagine that will increase rapidly as
my probe detection gets better.  (I mean, I've only been running this
thing for a week ;)

I'm not sure if I need razor (or pyzor) just yet.  So far, spammers
will send all their mail over just a few connections, so it's effective
to only store specimens.  But I may need some signature-checking logic
soon.

Neale
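
A rough sketch of the "store every 20th message per connection" idea, not
the spampot code itself.  The class name is hypothetical, and it assumes
the first message on a connection is kept and then every 20th after it.

```python
class SpecimenSampler:
    """Decide which messages on one SMTP connection to keep (hypothetical)."""

    def __init__(self, every=20):
        self.every = every
        self.seen = 0  # messages seen so far on this connection

    def should_store(self):
        # Keep message 0, 20, 40, ... and drop the rest.
        keep = self.seen % self.every == 0
        self.seen += 1
        return keep


sampler = SpecimenSampler()
stored = sum(sampler.should_store() for _ in range(100))
print(stored)
```

One sampler would be created per connection, so a spammer who opens a new
connection starts a fresh count.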

From anthony at interlink.com.au  Wed Jan 22 03:18:36 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 21 11:20:18 2003
Subject: [Spambayes] FYI: Java implementation 
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHCEJCEJAA.tim.one@comcast.net> 
Message-ID: <200301211618.h0LGIai30812@localhost.localdomain>


>>> Tim Peters wrote
> A CLM plot would consist of three vertical lines, and so be a bit confusing
> <wink>.

Yes, but suggesting them _did_ get a nice simple summary about them out of
you, so it wasn't a complete loss :)

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From nas at python.ca  Tue Jan 21 08:31:48 2003
From: nas at python.ca (Neil Schemenauer)
Date: Tue Jan 21 11:25:44 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java
	implementation)
In-Reply-To: <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk>
References: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com>
	<34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk>
Message-ID: <20030121163148.GA15240@glacier.arctrix.com>

Matt Sergeant wrote:
> Mozilla and SpamAssassin both copy their bayesian code from spambayes 
> (including tokenisation ideas and combiners).

I, for one, am extremely pleased to hear that.  It would be a shame if
people kept using Paul Graham's original algorithm after all the work
that was put in improving Spambayes.  Despite what was said at the spam
conference, I think the algorithm is important.

> It's nice to be proud of software, but when it's open source you kinda 
> leave it wide open for us to nab your ideas ;-)

I think the concern was that people wouldn't nab the ideas (because they
didn't know about them).

  Neil

From pje at telecommunity.com  Tue Jan 21 11:28:10 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Jan 21 11:28:55 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java
  implementation)
In-Reply-To: <34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk>
References: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20030121112431.01ea3390@mail.telecommunity.com>

At 04:05 PM 1/21/03 +0000, Matt Sergeant wrote:
>Mozilla and SpamAssassin both copy their bayesian code from spambayes 
>(including tokenisation ideas and combiners)

Cool.  But then it sounds like the "heavy hitters" have already abandoned 
the Graham algorithm.


>It's nice to be proud of software, but when it's open source you kinda 
>leave it wide open for us to nab your ideas ;-)

That's the whole point of promoting Spambayes, really.  To get the "good 
stuff" into the hands of more people.  I'd rather see lots of programs 
using the more effective ideas, than have a bunch of non-programmers try 
less effective tools and swear off of "learning" spam filters because of it.


From mwh at python.net  Tue Jan 21 16:56:17 2003
From: mwh at python.net (Michael Hudson)
Date: Tue Jan 21 11:56:26 2003
Subject: [Spambayes] Re: Promoting Spambayes (was Re: FYI: Java
	implementation)
References: <5.1.0.14.0.20030121100703.01ea7dd0@mail.telecommunity.com>
	<34AD58D0-2D5A-11D7-AE99-0003939CB5D8@startechgroup.co.uk>
Message-ID: <2msmvmmoge.fsf@starship.python.net>

Matt Sergeant <msergeant@startechgroup.co.uk> writes:

> Mozilla and SpamAssassin both copy their bayesian code from spambayes 
> (including tokenisation ideas and combiners). If the results are 
> different at all that's probably a bug somewhere.

Really?  I knew SA did, but I hadn't heard anything about Mozilla.
I'm trying to find out one way or the other through bugzilla, but it
scares me :) I did find an open bug saying "you should use spambayes'
algorithm".

> It's nice to be proud of software, but when it's open source you kinda 
> leave it wide open for us to nab your ideas ;-)

Kinda the point, I'd say :-) Then you get to deal with the nasty
integration issues...

Cheers,
M.

-- 
  Gevalia is undrinkable low-octane see-through only slightly
  roasted bilge water. Compared to .us coffee it is quite
  drinkable.                                      -- Måns Nilsson, asr


From tim.one at comcast.net  Tue Jan 21 12:03:20 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Jan 21 12:04:29 2003
Subject: [Spambayes] Re: FYI: Java implementation
In-Reply-To: <2mvg0issns.fsf@starship.python.net>
Message-ID: <BIEJKCLHCIOIHAGOKOLHKEJHEJAA.tim.one@comcast.net>

[Michael Hudson, on the plots at
    http://spambayes.sourceforge.net/background.html
]

> I meant to say it when I first looked at that page, but seeing those
> plots nearly made my eyeballs fall out.  Why does anyone still use
> Graham-combining?

Perhaps because the "Plan for Spam" paper kept on describing it, and people
who tried it found that their first stab worked better than anything else
they had tried.  It took much testing on large and varied data before its
problems became clear.  Paul Graham has since discovered some of these on
his own, as he started getting his own false positives:

    http://www.paulgraham.com/better.html

Graham-combining has the advantage of being rigorously correct, to the
extent that its assumptions hold (word independence, and prior spam
probability of 0.5).  I can't really say what chi-combining produces in the
end, other than that "it's a score".  It's certainly not the probability
that a msg is spam.  Graham-combining does compute a spam probability, which
would be correct if only the world were nothing like it is <wink -- I simply
mean that the assumptions under which the calculation would be correct don't
hold in the real world>.

So it's explainable and works remarkably well out of the box.  Its problems
are more-or-less subtle, and people have little patience for subtleties.
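
To make the contrast concrete, here is a small sketch of both combiners
under the assumptions named above (independent clues, prior of 0.5 for
Graham-combining).  The chi-combining part follows the commonly described
Fisher-style form and is an illustration, not a copy of the project's
classifier; function names are hypothetical, and no care is taken over
underflow for long clue lists.

```python
import math


def graham_combine(probs):
    """Naive-Bayes combination assuming independence and a 0.5 prior."""
    spam = math.prod(probs)
    ham = math.prod(1.0 - p for p in probs)
    return spam / (spam + ham)


def chi2q(x2, dof):
    """Upper tail of the chi-squared distribution for even dof."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combine(probs):
    """Score in [0, 1]; values near 0.5 mean the evidence conflicts."""
    n = len(probs)
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0


# Six strong spam clues against four strong ham clues.
clues = [0.99] * 6 + [0.01] * 4
print(graham_combine(clues), chi_combine(clues))
```

On mixed evidence like this, the Graham-style result sits very close to 1
while the chi-style score stays near the middle, which is the "knows when
it is lost" behaviour discussed earlier in the thread.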


From pje at telecommunity.com  Tue Jan 21 12:17:59 2003
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue Jan 21 12:18:29 2003
Subject: [Spambayes] Re: FYI: Java implementation
In-Reply-To: <BIEJKCLHCIOIHAGOKOLHKEJHEJAA.tim.one@comcast.net>
References: <2mvg0issns.fsf@starship.python.net>
Message-ID: <5.1.0.14.0.20030121121650.03b04e10@mail.telecommunity.com>

At 12:03 PM 1/21/03 -0500, Tim Peters wrote:

>So it's explainable and works remarkably well out of the box.  Its problems
>are more-or-less subtle, and people have little patience for subtleties.

Then let's hear it for the 'bots, who not only have patience for 
subtleties, but obsess on them as well!  :)


From jm at jmason.org  Tue Jan 21 17:11:52 2003
From: jm at jmason.org (Justin Mason)
Date: Tue Jan 21 12:33:23 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java
	implementation) 
In-Reply-To: Message from Neil Schemenauer <nas@python.ca> 
	<20030121163148.GA15240@glacier.arctrix.com> 
Message-ID: <20030121171157.67BCD16F16@jmason.org>


Neil Schemenauer said:
> Matt Sergeant wrote:
> > Mozilla and SpamAssassin both copy their bayesian code from spambayes 
> > (including tokenisation ideas and combiners).
> 
> I, for one, am extremely pleased to hear that.  It would be a shame if
> people kept using Paul Graham's original algorithm after all the work
> that was put in improving Spambayes.  Despite what was said at the spam
> conference, I think the algorithm is important.

BTW it's worth noting we didn't just "nab" the ideas ;) Instead I
reimplemented based on descriptions, running a cross-validation test each
time, and threw in a few tokenization ideas of our own.  In most cases the
results indicated that SpamBayes' techniques are the most effective --
there were a few extras, like SpamAssassin tokenizing some headers that SB
doesn't (From etc.), and different S and X values, but for the most part
they're effectively the same.

The nice thing is that it means those techniques have been independently
verified by 2 parties -- in other words, a scientific process ;)

--j.

From noreply at sourceforge.net  Tue Jan 21 10:31:48 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Tue Jan 21 13:34:07 2003
Subject: [Spambayes] [ spambayes-Patches-670417 ] Allow the pop3 proxies to
	bind to specific addresses
Message-ID: <E18b3Bc-0004Dd-00@sc8-sf-web4.sourceforge.net>

Patches item #670417, was opened at 2003-01-18 20:06
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Tony Lownds (tonylownds)
Assigned to: Richie Hindle (richiehindle)
Summary: Allow the pop3 proxies to bind to specific addresses

Initial Comment:
This patch allows one to specify an IP address when specifying a port in the pop3proxy_ports setting.

This is useful for two reasons:

1. By binding to a loopback address, the pop3proxy cannot be contacted from outside machines. Providing this option improves security.

2. The mail client Eudora - which is quite popular - is unable to specify a different POP port for different POP accounts. This patch allows Eudora to be used with spambayes with multiple POP accounts.

The implementation is fairly straightforward: any place a port was passed for binding, a pair of (address, port) is passed. In the two places a port was read (from a configuration file and from command line options), either an int or an address:int is accepted. Any place a port was turned into a string for printing, the (address, port) pair is turned into a suitable string.
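
A minimal sketch of the parsing and printing steps described above; it is
not the submitted patch.  The function names are hypothetical, and it
assumes an empty address means "all interfaces", as it does for a Python
socket bind.

```python
def parse_bind(spec):
    """Turn '110' or '127.0.0.1:110' into an (address, port) pair."""
    text = str(spec).strip()
    if ":" in text:
        address, port = text.rsplit(":", 1)
    else:
        address, port = "", text  # '' binds to all interfaces
    return address, int(port)


def format_bind(pair):
    """Turn an (address, port) pair back into a printable string."""
    address, port = pair
    return f"{address}:{port}" if address else str(port)


print(parse_bind("127.0.0.1:8110"), parse_bind(110))
```

The resulting pair can be passed directly to socket.bind(), which is why
carrying (address, port) everywhere a bare port used to go is a small
change.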


----------------------------------------------------------------------

>Comment By: Richie Hindle (richiehindle)
Date: 2003-01-21 18:31

Message:
Logged In: YES 
user_id=85414

Many thanks for the patch, Tony - excellent job.  Checked in
as pop3proxy.py 1.38 and spambayes/Dibbler.py 1.2.


----------------------------------------------------------------------

Comment By: Richie Hindle (richiehindle)
Date: 2003-01-20 14:28

Message:
Logged In: YES 
user_id=85414

If you can't upload them here, please email them to me.
Thanks.


----------------------------------------------------------------------

Comment By: François Granger (fgranger)
Date: 2003-01-20 13:13

Message:
Logged In: YES 
user_id=86948

I asked Tony about this, and he sent me the files. Can I upload them or forward them to you?

----------------------------------------------------------------------

Comment By: Richie Hindle (richiehindle)
Date: 2003-01-20 11:35

Message:
Logged In: YES 
user_id=85414

Has SourceForge eaten the patch file?  It says
"No Files Currently Attached".


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=670417&group_id=61702

From francois.granger at free.fr  Tue Jan 21 19:39:44 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Tue Jan 21 13:40:25 2003
Subject: [Spambayes] Issue with pop3proxy
Message-ID: <a05200f09ba53403813b7@[192.168.1.20]>

I got this message in my mailbox. The strange thing is that the
X-Spambayes-Classification: spam header got added after the content,
which is not normal. It happens on some messages from more than one
sender, but not on all messages from the same sender on this mailing
list. A quick comparison between the headers of two messages from the
same sender, one with X-Spambayes-Classification: properly in the
header and the other with the field at the end of the message, shows
no obvious difference.


==================================================
Return-Path: <alois-owner@medicalistes.org>
Received: from cancale.medicalistes.org (62.212.100.79) by 
smtp.laposte.net (6.0.053)
         id 3DF927E500486B4A; Tue, 21 Jan 2003 09:33:35 +0100
Received: from cancale.medicalistes.org (localhost [127.0.0.1])
	by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) 
with ESMTP id h0L8XJOc031969;
	Tue, 21 Jan 2003 09:33:19 +0100
Received: (from sympa@localhost)
	by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) id 
h0L8XJQe031967;
	Tue, 21 Jan 2003 09:33:19 +0100
X-Authentication-Warning: cancale.medicalistes.org: sympa set sender 
to alois-owner@medicalistes.org using -f
Received: from mel-rto4.wanadoo.fr (smtp-out-4.wanadoo.fr [193.252.19.23])
	by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) 
with ESMTP id h0L8XGOc031964
	for <alois@medicalistes.org>; Tue, 21 Jan 2003 09:33:16 +0100
Received: from mel-rta6.wanadoo.fr (193.252.19.26) by 
mel-rto4.wanadoo.fr (6.7.015)
         id 3E0C33FD00EBE0FE for alois@medicalistes.org; Tue, 21 Jan 
2003 09:33:16 +0100
Received: from pc (193.250.146.197) by mel-rta6.wanadoo.fr (6.7.015)
         id 3E26CE21002ADE4B for alois@medicalistes.org; Tue, 21 Jan 
2003 09:33:15 +0100
Message-ID: <003e01c2c128$485b70c0$c592fac1@pc>
From: "Martine Lemaitre" <MARTINE.LEMAITRE@wanadoo.fr>
To: <alois@medicalistes.org>
References: <000901c2c117$26334ce0$03b5a8c0@duron800>
Subject: Re: [Alois] P?ter les plombs
Date: Tue, 21 Jan 2003 09:26:42 +0100
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0029_01C2C12F.35460480"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2314.1300
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
Reply-To: alois@medicalistes.org
X-Loop: alois@medicalistes.org
X-Sequence: 1253
Precedence: list
X-no-archive: yes
List-Id: <alois@medicalistes.org>

<x-html><!x-stuff-for-pete base="" src="" id="0" charset=""><!DOCTYPE 
HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META content="MSHTML 5.00.2314.1000" name=GENERATOR>
<STYLE></STYLE>
</HEAD>
<BODY>
<DIV><FONT face=Arial size=2>

[HTML deleted...]

<BR></DIV></EM></FONT></FONT></FONT></BODY></HTML>
X-Spambayes-Classification: spam

</x-html>
==================================================
-- 
Recently using MacOSX.......

From nas at python.ca  Tue Jan 21 10:49:34 2003
From: nas at python.ca (Neil Schemenauer)
Date: Tue Jan 21 13:43:28 2003
Subject: [Spambayes] pushing back the cost of spam
In-Reply-To: <20030120003505.GB6862@glacier.arctrix.com>
References: <20030120001344.GA6862@glacier.arctrix.com>
	<20030120003505.GB6862@glacier.arctrix.com>
Message-ID: <20030121184934.GA15762@glacier.arctrix.com>

Neil Schemenauer wrote:
> (I'll try to run some tests).

See http://python.ca/nas/log/200301/index.html#21_001 for some
preliminary results.  In summary, it works.

  Neil

From neale at woozle.org  Tue Jan 21 10:45:26 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 21 13:45:35 2003
Subject: [Spambayes] Issue with pop3proxy
In-Reply-To: <a05200f09ba53403813b7@[192.168.1.20]>
 =?iso-8859-1?q?(Fran=E7ois?= Granger's message of "Tue, 21 Jan 2003
 19:39:44 +0100")
References: <a05200f09ba53403813b7@[192.168.1.20]>
Message-ID: <w53of6axry1.fsf@woozle.org>

François Granger <francois.granger@free.fr> writes:

> I got this message in my mailbox. The strange thing is that the
> X-Spambayes-Classification: spam header got added after the content,
> which is not normal. It happens on some messages from more than one
> sender, but not on all messages from the same sender on this mailing
> list. A quick comparison between the headers of two messages from the
> same sender, one with X-Spambayes-Classification: properly in the
> header and the other with the field at the end of the message, shows
> no obvious difference.

Weird.  Perhaps this is what happens when the email module can't parse
the message.  If the headers really were word-wrapped like you sent
them, this is certainly the case.
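
For reference, a small sketch of adding the header through the standard
email package rather than by appending text, so that it lands in the
header block of a well-formed message.  This is only an illustration of
the expected result, not how pop3proxy does it; the sample message and
the function name are made up.

```python
import email

RAW = (
    "From: sender@example.com\n"
    "Subject: test\n"
    "\n"
    "Body text.\n"
)


def add_classification(raw, verdict):
    """Parse a message, add the classification header, and re-serialize."""
    msg = email.message_from_string(raw)
    msg["X-Spambayes-Classification"] = verdict
    return msg.as_string()


tagged = add_classification(RAW, "spam")
# Everything before the first blank line is the header block.
headers, _, body = tagged.partition("\n\n")
print("X-Spambayes-Classification: spam" in headers)
```

If the parser gives up on the headers early, whatever it could not parse
ends up treated as body, which would be consistent with the header
showing up in the wrong place.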

>
>
>
>
> ==================================================
> Return-Path: <alois-owner@medicalistes.org>
> Received: from cancale.medicalistes.org (62.212.100.79) by
> smtp.laposte.net (6.0.053)
>          id 3DF927E500486B4A; Tue, 21 Jan 2003 09:33:35 +0100
> Received: from cancale.medicalistes.org (localhost [127.0.0.1])
> 	by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) with
> ESMTP id h0L8XJOc031969;
> 	Tue, 21 Jan 2003 09:33:19 +0100
> Received: (from sympa@localhost)
> 	by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) id
> h0L8XJQe031967;
> 	Tue, 21 Jan 2003 09:33:19 +0100
> X-Authentication-Warning: cancale.medicalistes.org: sympa set sender to
> alois-owner@medicalistes.org using -f
> Received: from mel-rto4.wanadoo.fr (smtp-out-4.wanadoo.fr [193.252.19.23])
> 	by cancale.medicalistes.org (8.12.6/8.12.6/Debian-6Woody) with
> ESMTP id h0L8XGOc031964
> 	for <alois@medicalistes.org>; Tue, 21 Jan 2003 09:33:16 +0100
> Received: from mel-rta6.wanadoo.fr (193.252.19.26) by
> mel-rto4.wanadoo.fr (6.7.015)
>          id 3E0C33FD00EBE0FE for alois@medicalistes.org; Tue, 21 Jan
> 2003 09:33:16 +0100
> Received: from pc (193.250.146.197) by mel-rta6.wanadoo.fr (6.7.015)
>          id 3E26CE21002ADE4B for alois@medicalistes.org; Tue, 21 Jan
> 2003 09:33:15 +0100
> Message-ID: <003e01c2c128$485b70c0$c592fac1@pc>
> From: "Martine Lemaitre" <MARTINE.LEMAITRE@wanadoo.fr>
> To: <alois@medicalistes.org>
> References: <000901c2c117$26334ce0$03b5a8c0@duron800>
> Subject: Re: [Alois] P?ter les plombs
> Date: Tue, 21 Jan 2003 09:26:42 +0100
> MIME-Version: 1.0
> Content-Type: multipart/alternative;
> 	boundary="----=_NextPart_000_0029_01C2C12F.35460480"
> X-Priority: 3
> X-MSMail-Priority: Normal
> X-Mailer: Microsoft Outlook Express 5.00.2314.1300
> X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2314.1300
> Reply-To: alois@medicalistes.org
> X-Loop: alois@medicalistes.org
> X-Sequence: 1253
> Precedence: list
> X-no-archive: yes
> List-Id: <alois@medicalistes.org>
>
> <x-html><!x-stuff-for-pete base="" src="" id="0" charset=""><!DOCTYPE
> HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
> <HTML><HEAD>
> <META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
> <META content="MSHTML 5.00.2314.1000" name=GENERATOR>
> <STYLE></STYLE>
> </HEAD>
> <BODY>
> <DIV><FONT face=Arial size=2>
>
> [HTML deleted...]
>
> <BR></DIV></EM></FONT></FONT></FONT></BODY></HTML>
> X-Spambayes-Classification: spam
>
> </x-html>
> ==================================================
> -- 
> Recently using MacOSX.......
>
> _______________________________________________
> Spambayes mailing list
> Spambayes@python.org
> http://mail.python.org/mailman/listinfo/spambayes

From tim.one at comcast.net  Tue Jan 21 13:47:39 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Jan 21 13:49:55 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <200301211618.h0LGIai30812@localhost.localdomain>
Message-ID: <BIEJKCLHCIOIHAGOKOLHKEKBEJAA.tim.one@comcast.net>

[Tim]
> A CLM plot would consist of three vertical lines, and so be a
> bit confusing <wink>.

[Anthony Baxter]
> Yes, but suggesting them _did_ get a nice simple summary about them out of
> you, so it wasn't a complete loss :)

Indeed not.  Suck all you can out of my memory before I die.  A thousand
generations will pass before anyone can reconstruct it all from the code
comments <wink>.


From tim.one at comcast.net  Tue Jan 21 14:38:12 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Jan 21 14:46:03 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java
	implementation)
In-Reply-To: <20030121171157.67BCD16F16@jmason.org>
Message-ID: <BIEJKCLHCIOIHAGOKOLHCEKJEJAA.tim.one@comcast.net>

[Justin Mason]
> BTW it's worth noting we didn't just "nab" the ideas ;)

I would have <wink>.

> Instead I reimplemented based on descriptions, running a cross-validation
> test each time, and threw in a few tokenization ideas of our own.

One thing we found, on rare occasions, is that a change vetted as winner or
loser via a CV run on one set of test data turned out to be neutral on
somebody else's test data, or (very rarely) even gave an opposite result.
Some small amount of that is expected by chance, of course, but multiple
test sets (in addition to slicing & dicing a single test set) is an
important check too.

> In most cases the results indicated that SpamBayes' techniques are the
> most effective -- there were a few extras, like SpamAssassin tokenizing
> some headers that SB doesn't (From etc.),

There are generally options to change all that.  I became inactive as this
project was transitioning from mostly-research to mostly-deployment, and the
defaults still reflect the more severe "purity needs" of research.  For
example, virtually all the ham in my main test set had a common "From" line
(it was generated by a news->email gateway) but none of my spam had that
From line.  So "From" was ignored by default.  In the Outlook 2000 client I
use every day, though, From To Cc Sender and Reply-To are all tokenized.

> and different S and X values,

Note that Greg Louis has done a lot of good research on those, in connection
with bogofilter.

> but for the most part they're effectively the same.
>
> The nice thing is that it means those techniques have been independently
> verified by 2 parties -- in other words, a scientific process ;)

It's appreciated!  That's more important than the specific algorithms used.
Given a proper test framework, the data will eventually tell you what does
and doesn't work; without proper statistical testing it's all guessing.  A
problem is what to do when error rates get too low to measure reliably.  My
previous life in speech recognition didn't prepare me for that one <wink>.


From skip at pobox.com  Tue Jan 21 14:10:05 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 21 15:10:27 2003
Subject: [Spambayes] Re: [SAtalk] spampot -- spam honeypot server (fwd)
In-Reply-To: <w531y36zdgq.fsf@woozle.org>
References: <9A0FE27E-2D2D-11D7-AE99-0003939CB5D8@startechgroup.co.uk>
        <w531y36zdgq.fsf@woozle.org>
Message-ID: <15917.43293.153834.759378@montanaro.dyndns.org>

    Neale> But I may need some signature-checking logic soon.

I'm modifying utilities/loosecksum.py to incorporate many of the ideas
Justin posted today about his fuzzy-hash-maildir script in SpamAssassin.
Should have it checked in later today.  Instead of returning a single md5
checksum it will return four separated by dots, one for each of the four
blocks he mentioned.  If you want to consider pieces you can just split on
the dots.

Skip
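
A sketch of the "four checksums separated by dots" output format only.
The blocks Justin described are not spelled out in this message, so the
split into four equal-sized byte ranges below is purely a stand-in, and
the function name is hypothetical rather than the one in loosecksum.py.

```python
import hashlib


def loose_checksum(body, blocks=4):
    """Return one md5 hex digest per block, joined by dots."""
    data = body.encode("utf-8")
    # Ceiling division so every byte falls in some block; at least 1.
    size = -(-len(data) // blocks) or 1
    digests = []
    for i in range(blocks):
        chunk = data[i * size:(i + 1) * size]
        digests.append(hashlib.md5(chunk).hexdigest())
    return ".".join(digests)


a = loose_checksum("a" * 30 + "b" * 10)
b = loose_checksum("a" * 30 + "c" * 10)
# Compare piece by piece after splitting on the dots.
print([x == y for x, y in zip(a.split("."), b.split("."))])
```

Two messages that differ only in one block then share the other three
pieces, which is what makes comparing pieces useful.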

From neale at woozle.org  Tue Jan 21 12:10:26 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 21 15:10:33 2003
Subject: [Spambayes] degeneration
Message-ID: <w53lm1exo0d.fsf@woozle.org>

So one of the more interesting things I left the spam conference with
was Paul Graham's notion of "degeneration".  The idea is simple.  If you
tokenize "FREE!!!!", but that's not in your wordlist, try the following
until you get a match:

  FREE!!!!
  Free!!!!
  free!!!!
  FREE!!!
  Free!!!
  free!!!
  FREE!!
  Free!!
  free!!
  FREE!
  Free!
  free!
  FREE
  Free
  free

He claims this helps a lot.  I'm currently in the midst of getting
hammiefilter to integrate more cleanly with Gnus and Mutt, and merging
mboxtrain and hammiebulk.  But this should be relatively easy to
implement and test.  Any takers?

Neale
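
A quick sketch of the lookup order listed above, for anyone who wants to
try it.  The function names are hypothetical, and this only covers the
trailing-"!" and case steps shown in the list, not whatever else Graham's
version may do.

```python
def degenerations(token):
    """Yield the token, then progressively less specific variants."""
    core = token.rstrip("!")
    bangs = len(token) - len(core)
    cases = [core]
    if core.isupper() and len(core) > 1:
        cases.append(core.capitalize())
    if core.lower() not in cases:
        cases.append(core.lower())
    for n in range(bangs, -1, -1):
        for form in cases:
            candidate = form + "!" * n
            if candidate:
                yield candidate


def lookup(token, wordlist):
    """Return the first (variant, value) found in wordlist, else None."""
    for candidate in degenerations(token):
        if candidate in wordlist:
            return candidate, wordlist[candidate]
    return None


print(list(degenerations("FREE!!!!")))
```

With case folding applied before tokenizing, the inner loop collapses to
one form per punctuation level.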

From vanhorn at whidbey.com  Tue Jan 21 12:12:28 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Tue Jan 21 15:12:35 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation)
References: <20030121171157.67BCD16F16@jmason.org>
Message-ID: <3E2DA9AC.A1CE12F6@whidbey.com>

You used the past tense there, is this really in Spam Assassin now? I just
upgraded SA last week and didn't notice any references to a Spambayesian
filter and would dearly love to turn it on if it's in there somewhere.

Van

Justin Mason wrote:

> Neil Schemenauer said:
> > Matt Sergeant wrote:
> > > Mozilla and SpamAssassin both copy their bayesian code from spambayes
> > > (including tokenisation ideas and combiners).
> >
> > I, for one, am extremely pleased to hear that.  It would be a shame if
> > people kept using Paul Graham's original algorithm after all the work
> > that was put in improving Spambayes.  Despite what was said at the spam
> > conference, I think the algorithm is important.
>
> BTW it's worth noting we didn't just "nab" the ideas ;) Instead I
> reimplemented based on descriptions, running a cross-validation test each
> time, and threw in a few tokenization ideas of our own.  In most cases the
> results indicated that SpamBayes' techniques are the most effective --
> there were a few extras, like SpamAssassin tokenizing some headers that SB
> doesn't (From etc.), and different S and X values, but for the most part
> they're effectively the same.
>
> The nice thing is that it means those techniques have been independently
> verified by 2 parties -- in other words, a scientific process ;)
>
> --j.

--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD

For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------


From hupp at upl.cs.wisc.edu  Tue Jan 21 15:27:41 2003
From: hupp at upl.cs.wisc.edu (Adam Hupp)
Date: Tue Jan 21 16:36:39 2003
Subject: [Spambayes] degeneration
In-Reply-To: <w53lm1exo0d.fsf@woozle.org>
References: <w53lm1exo0d.fsf@woozle.org>
Message-ID: <20030121212741.GA3849@upl.cs.wisc.edu>

On Tue, Jan 21, 2003 at 12:10:26PM -0800, Neale Pickett wrote:
> 
> He claims this helps a lot.  I'm currently in the midst of getting
> hammiefilter to integrate more cleanly with Gnus and Mutt, and merging
> mboxtrain and hammiebulk.  But this should be relatively easy to
> implement and test.  Any takers?

I'm curious what you're doing for the mutt integration.  I was playing
with spambayes a few months ago and worked up an (IMO) fairly useful
mutt integration.  It was a combination of procmail rules, mutt macros
and changes to hammiefilter that allowed marking and retraining on
Unsures, retraining on mistakes, automatic training, etc.  All this
was put on hold by a desire to graduate but now I'm excited to start
working on it again.

-Adam

From drew at poured.net  Tue Jan 21 16:22:03 2003
From: drew at poured.net (Drew Raines)
Date: Tue Jan 21 17:22:32 2003
Subject: [Spambayes] Re: degeneration
References: <w53lm1exo0d.fsf@woozle.org>
Message-ID: <l6v7kcygn3o.fsf@poured.net>

Neale Pickett <neale@woozle.org> writes:

> I'm currently in the midst of getting hammiefilter to integrate
> more cleanly with Gnus and Mutt, and merging mboxtrain and
> hammiebulk.  But this should be relatively easy to implement and
> test.  Any takers?

Me.  Before you get too far, though, make sure you look at spam.el
in Oorts of late.

-Drew


From jm at jmason.org  Wed Jan 22 00:26:54 2003
From: jm at jmason.org (Justin Mason)
Date: Tue Jan 21 19:26:39 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java implementation)
In-Reply-To: Message from "G. Armour Van Horn" <vanhorn@whidbey.com> 
   of "Tue, 21 Jan 2003 12:12:28 PST." <3E2DA9AC.A1CE12F6@whidbey.com> 
Message-ID: <20030122002659.A96C316F18@jmason.org>


G. Armour Van Horn said:
> You used the past tense there, is this really in Spam Assassin now? I just
> upgraded SA last week and didn't notice any references to a Spambayesian
> filter and would dearly love to turn it on if it's in there somewhere.

No, it's still in CVS -- but mucho rescoring and GA running going on at
the mo' for a release RSN.

--j.

From neale at woozle.org  Tue Jan 21 16:51:57 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 21 19:52:07 2003
Subject: [Spambayes] Re: degeneration
In-Reply-To: <l6v7kcygn3o.fsf@poured.net> (Drew Raines's message of "Tue, 21
 Jan 2003 16:22:03 -0600")
References: <w53lm1exo0d.fsf@woozle.org> <l6v7kcygn3o.fsf@poured.net>
Message-ID: <w538yxexaz6.fsf@woozle.org>

Drew Raines <drew@poured.net> writes:

> Neale Pickett <neale@woozle.org> writes:
>
>> I'm currently in the midst of getting hammiefilter to integrate
>> more cleanly with Gnus and Mutt, and merging mboxtrain and
>> hammiebulk.  But this should be relatively easy to implement and
>> test.  Any takers?
>
> Me.  Before you get too far, though, make sure you look at spam.el
> in Oorts of late.

Yeah, I did check out spam.el, but it's not exactly what I want.
Namely, two keybindings for "refile as spam" and "refile as ham".  The
rest will be done by procmail.  Anything further would need spam.el.

Neale

From seant at webreply.com  Tue Jan 21 19:54:51 2003
From: seant at webreply.com (Sean True)
Date: Tue Jan 21 20:24:59 2003
Subject: [Spambayes] Declaring victory
Message-ID: <MJEHLHJKGINLONDMMKNEGEJOJAAA.seant@webreply.com>

Tim wrote:
>It's appreciated!  That's more important than the specific algorithms used.
>Given a proper test framework, the data will eventually tell you what does
>and doesn't work; without proper statistical testing it's all guessing.  A
>problem is what to do when error rates get too low to measure reliably.  My
>previous life in speech recognition didn't prepare me for that one <wink>.

Geez, Tim, even I got prepared for that one at Dragon: when the error rate
gets low enough, you declare victory and move on. Before they throw you in
jail for fraud!

Say. You already did that.

Just-winking-this-one-time-ly yours,

Sean

-------
Sean True
WebReply, Inc.


From tim_one at email.msn.com  Tue Jan 21 21:29:36 2003
From: tim_one at email.msn.com (Tim Peters)
Date: Tue Jan 21 21:30:19 2003
Subject: [Spambayes] Declaring victory
In-Reply-To: <MJEHLHJKGINLONDMMKNEGEJOJAAA.seant@webreply.com>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEINDJAB.tim_one@email.msn.com>

[Tim]
> A problem is what to do when error rates get too low to measure
> reliably.  My previous life in speech recognition didn't prepare me for
> that one <wink>.

[Sean True]
> Geez, Tim, even I got prepared for that one at Dragon: when the error
> rate gets low enough, you declare victory and move on. Before they throw
> you in jail for fraud!
>
> Say. You already did that.

I learned more from our betters, too:  I never took spambayes public, and
the untold millions I made off teasing potential investors are safely tucked
away in offshore accounts.  Now it's back to the quieter life of smuggling
drugs.

still-missing-the-action-though-ly y'rs  - tim


From tim_one at email.msn.com  Tue Jan 21 21:47:12 2003
From: tim_one at email.msn.com (Tim Peters)
Date: Tue Jan 21 21:47:54 2003
Subject: [Spambayes] degeneration
In-Reply-To: <w53lm1exo0d.fsf@woozle.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCOEIPDJAB.tim_one@email.msn.com>

[Neale Pickett]
> So one of the more interesting things I left the spam conference with
> was Paul Graham's notion of "degeneration".  The idea is simple.  If you
> tokenize "FREE!!!!", but that's not in your wordlist, try the following
> until you get a match:
>
>   FREE!!!!
>   Free!!!!
>   free!!!!
>   FREE!!!
>   Free!!!
>   free!!!
>   FREE!!
>   Free!!
>   free!!
>   FREE!
>   Free!
>   free!
>   FREE
>   Free
>   free

We fold case so it's easier for us (just 5 possibilities).

> He claims this helps a lot.

Ya, but he's still artificially boosting ham counts by a factor of 2 -- it's
small wonder then that some other gimmick is needed to counteract the bias.

> I'm currently in the midst of getting hammiefilter to integrate more
> cleanly with Gnus and Mutt, and merging mboxtrain and hammiebulk.  But
> this should be relatively easy to implement and test.  Any takers?

It wouldn't be hard to implement, and I agree it's interesting.  So far as
testing goes, I don't have any test data that *can* show an improvement
anymore, so I lost interest in tweaking the algorithms.  Do you have test
sets that could show improvements?  If so, you can eyeball the mistakes and
usually make a good guess as to whether a specific new gimmick would help
them.  What you can't usually guess is whether the gimmick would hurt the
correctly classified msgs more than it helps the mistakes.
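
For concreteness, here is a minimal sketch of the fallback order quoted
above.  The function names and the fold_case switch are made up for
illustration; this is not code from the spambayes tree or from Graham's
filter, just one way the lookup could go.

```python
def degenerate_candidates(token, fold_case=False):
    """Yield progressively simpler forms of token, most specific first."""
    core = token.rstrip("!")
    bangs = len(token) - len(core)
    seen = set()
    # Drop one trailing "!" at a time; at each length try the case variants.
    for n in range(bangs, -1, -1):
        suffix = "!" * n
        if fold_case:
            forms = [core.lower()]
        else:
            forms = [core, core.capitalize(), core.lower()]
        for form in forms:
            candidate = form + suffix
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


def lookup(token, wordlist, fold_case=False):
    """Return the first degenerate form found in wordlist, or None."""
    for candidate in degenerate_candidates(token, fold_case):
        if candidate in wordlist:
            return candidate
    return None
```

Without case folding, "FREE!!!!" produces the 15 forms in the quoted list;
with case folding only 5 remain.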


From neale at woozle.org  Tue Jan 21 21:20:34 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 22 00:20:41 2003
Subject: [Spambayes] degeneration
In-Reply-To: <20030121212741.GA3849@upl.cs.wisc.edu> (Adam Hupp's message of
 "Tue, 21 Jan 2003 15:27:41 -0600")
References: <w53lm1exo0d.fsf@woozle.org>
	<20030121212741.GA3849@upl.cs.wisc.edu>
Message-ID: <w53el75lpzx.fsf@woozle.org>

Adam Hupp <hupp@upl.cs.wisc.edu> writes:

> I'm curious what you're doing for the mutt integration.  I was playing
> with spambayes a few months ago and worked up an (IMO) fairly useful
> mutt integration.  It was a combination of procmail rules, mutt macros
> and changes to hammiefilter that allowed marking and retraining on
> Unsures, retraining on mistakes, automatic training, etc.

That, in a nutshell, is what I'm doing for mutt integration.  But I'm a
Gnus user; someone more familiar with mutt's innards would be a better
candidate to write up mutt integration instructions :)

I'm going to go ahead and check in my new hammiefilter.py with big
[EXPERIMENTAL] disclaimers by most of the options--I'm not sure they
actually do what they say they do yet.

But the basic idea is to run "hammiefilter.py -t" from procmail, so that
it trains on its decisions.  Then you can tweak it in your MUA by
hitting some magic key which will pipe it to "hammiefilter.py -s -f" for
spam incognito, and "hammiefilter.py -g -f" for false negatives.  The
"-t" step inserts a header telling how it trained itself.  The "-s" and
"-g" options, when they see that header, will untrain, then retrain.

So what we need is some sort of mutt magic (note: not "butt magic") to
pipe a message out to something, then remove it, all in one keystroke.
The pipe would look something like "hammiefilter.py -g -f | procmail".
Or is there a more mutty way to do it?
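
To make the untrain-then-retrain behaviour of "-s" and "-g" concrete, here
is a rough sketch of the decision only.  The header name and the function
are hypothetical stand-ins, not what hammiefilter.py actually uses, and no
real training happens here.

```python
from email import message_from_string

# Assumption: a made-up header name, not the one hammiefilter.py inserts.
TRAINED_HEADER = "X-Hypothetical-Trained"


def correction_steps(raw_message, correct_label):
    """Return the (action, label) steps needed so the message ends up
    trained as correct_label, which is 'ham' or 'spam'."""
    msg = message_from_string(raw_message)
    previous = msg.get(TRAINED_HEADER)
    if previous == correct_label:
        return []                       # already trained the right way
    steps = []
    if previous in ("ham", "spam"):
        steps.append(("untrain", previous))   # undo the earlier decision
    steps.append(("train", correct_label))
    return steps
```

A message that the "-t" step trained as ham and that is then corrected to
spam yields an untrain step followed by a train step; a message with no
such header is simply trained.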

> All this was put on hold by a desire to graduate but now I'm excited
> to start working on it again.

Yes, the desire to graduate does get strong at times, but eventually it
always subsides.  I hope that for you it subsided because you actually
graduated ;)

Neale

From richard at jowsey.com  Wed Jan 22 16:28:56 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Wed Jan 22 00:29:38 2003
Subject: [Spambayes] FYI: Java implementation
In-Reply-To: <LNBBLJKPBEHFEDALKOLCMEHEDJAB.tim.one@comcast.net>
References: <3E2B9962.26334.308D0BD@localhost>
Message-ID: <3E2EC6C8.20276.54112D8@localhost>

> > That chi2 test is definitely on
> > the drawing boards, even if only for comparison purposes...
> 
> Anthony Baxter has some plots of score distributions for
> Graham-combining, Gary-combining and chi-combining here:
>   http://spambayes.sourceforge.net/background.html

Damn nice graphics! And a good explanation for the advantages of 
the chi-squared "combining and scoring" treatment. OK, so I'm a 
believer! :-)

> It's the sharpness and spread of the separation in chi- that's
> attractive.

Indeed! I've now mostly finished my core word-tokenization and 
training logic, and am presently running sweeps across my 
good/spam corpus to complete populating the database. I'll be re-
working the comparator classes presently, to incorporate this 
chi-2 math. Will keep everyone posted as to progress...
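
For anyone else following along, this is my understanding of the
chi-squared combining step, written as a small Python sketch rather than
my Java.  The function names are my own choice for this example, and the
per-word probabilities are assumed to lie strictly between 0 and 1; treat
it as a sketch, not as the project's implementation.

```python
import math


def chi2Q(x2, v):
    """Probability that a chi-squared variable with v degrees of freedom
    (v must be even) is at least x2."""
    assert v % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def chi_combine(probs):
    """Combine per-word spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5                      # no evidence either way
    ln_not_spam = sum(math.log(1.0 - p) for p in probs)
    ln_spam = sum(math.log(p) for p in probs)
    S = 1.0 - chi2Q(-2.0 * ln_not_spam, 2 * n)   # evidence for spam
    H = 1.0 - chi2Q(-2.0 * ln_spam, 2 * n)       # evidence for ham
    return (S - H + 1.0) / 2.0
```

Strongly spammy clues push the score towards 1, strongly hammy clues
towards 0, and evenly conflicting clues land near 0.5, which is where the
"unsure" middle ground comes from.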

Cheers,
Richard

----------------------------------------------------------------
"Once the number three, being the third number, be reached, then 
lobbest thou thy Holy Hand Grenade of Antioch towards thou foe, 
who being naughty in my sight, shall snuff it!"


From neale at woozle.org  Tue Jan 21 21:53:55 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 22 00:53:58 2003
Subject: [Spambayes] degeneration
In-Reply-To: <LNBBLJKPBEHFEDALKOLCOEIPDJAB.tim_one@email.msn.com> ("Tim
 Peters"'s message of "Tue, 21 Jan 2003 21:47:12 -0500")
References: <LNBBLJKPBEHFEDALKOLCOEIPDJAB.tim_one@email.msn.com>
Message-ID: <w53bs29logc.fsf@woozle.org>

"Tim Peters" <tim_one@email.msn.com> writes:

> It wouldn't be hard to implement, and I agree it's interesting.  So
> far as testing goes, I don't have any test data that *can* show an
> improvement anymore, so I lost interest in tweaking the algorithms.
> Do you have test sets that could show improvements?

Well, no, actually.  I keep forgetting this thing is so good.  I have
been getting more false negatives lately than I'd like, but I'm sure
that's because I keep fouling up my wordlist while testing hammiefilter
options.

But I'll be sure and let everyone know if it turns out that I actually
do have a difficult set of test data.

Neale

From hupp at upl.cs.wisc.edu  Wed Jan 22 00:16:45 2003
From: hupp at upl.cs.wisc.edu (Adam Hupp)
Date: Wed Jan 22 01:16:49 2003
Subject: [Spambayes] degeneration
In-Reply-To: <w53el75lpzx.fsf@woozle.org>
References: <w53lm1exo0d.fsf@woozle.org>
	<20030121212741.GA3849@upl.cs.wisc.edu> <w53el75lpzx.fsf@woozle.org>
Message-ID: <20030122061645.GA6147@upl.cs.wisc.edu>

On Tue, Jan 21, 2003 at 09:20:34PM -0800, Neale Pickett wrote:
> 
> I'm going to go ahead and check in my new hammiefilter.py with big
> [EXPERIMENTAL] disclaimers by most of the options--I'm not sure they
> actually do what they say they do yet.
> 
> But the basic idea is to run "hammiefilter.py -t" from procmail, so that
> it trains on its decisions.  Then you can tweak it in your MUA by
> hitting some magic key which will pipe it to "hammiefilter.py -s -f" for
> spam incognito, and "hammiefilter.py -g -f" for false negatives.  The
> "-t" step inserts a header telling how it trained itself.  The "-s" and
> "-g" options, when they see that header, will untrain, then retrain.
> 
> So what we need is some sort of mutt magic (note: not "butt magic") to
> pipe a message out to something, then remove it, all in one keystroke.
> The pipe would look something like "hammiefilter.py -g -f | procmail".
> Or is there a more mutty way to do it?

It looks like your hammiefilter uses almost the same interface that
mine does, so the integration should be a snap.  Since I can't add and
modify arbitrary headers from within mutt (someone please correct me
if I'm wrong) I'm using one of the flags ("important" I believe) to
indicate an unsure in need of training.  

.procmailrc:

:0 fhb w:hammie
| /home/hupp/spambayes/hammiefilter.py --filter --train

:0:
* ^X-Hammie-Disposition: Yes
caughtspam

:0 fh w:
* ^X-Hammie-Disposition: Unsure
|formail -i "X-Status: F"


.muttrc:

folder-hook . "macro index F '|hammiefilter.py --reverse --train --good\n <save-message>=caughtspam\n'"
folder-hook . "macro pager F '|hammiefilter.py --reverse --train --good\n <save-message>=caughtspam\n'"
folder-hook caughtspam "macro index F '|hammiefilter.py --reverse --train --spam\r <save-message>!\r'"
folder-hook caughtspam "macro pager F '|hammiefilter.py --reverse --train --spam\r <save-message>!\r'"
macro pager H "|hammiefilter.py --train --good\r <clear-flag>!"
macro index H "|hammiefilter.py --train --good\r <clear-flag>!"
macro pager S "|hammiefilter.py --train --spam\r <clear-flag>!\r <save-message>=caughtspam\r"
macro index S "|hammiefilter.py --train --spam\r <clear-flag>!\r <save-message>=caughtspam\r"
color index red black "~h 'X-Hammie-Disposition: Unsure' ~F"


This puts messages scored as spam into the caughtspam folder.  If you
are in the caughtspam folder and type "F" (for false) it will untrain
as spam, retrain as ham, and move it to the mail spool.  If you are in
any other folder it does the opposite and moves it into caughtspam.
Unsure messages show up as red in the index; "H" or "S" trains and
removes the flag.  I'm not positive the Unsure flagging works
entirely correctly; it's been a while.


> Yes, the desire to graduate does get strong at times, but eventually it
> always subsides.  I hope that for you it subsided because you actually
> graduated ;)

Today, actually.  Now I have time for more important matters such as
ridding my mailbox of spam.

-Adam

From anthony at interlink.com.au  Wed Jan 22 19:08:37 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Jan 22 03:11:11 2003
Subject: [Spambayes] KMail integration?
Message-ID: <200301220808.h0M88cm13824@localhost.localdomain>


Has anyone had thoughts about KMail integration? (The KDE mailer).

I don't use it, but have a number of colleagues that do, and they'd
like something that's easy to use for spam killing...

Anthony


From tdickenson at devmail.geminidataloggers.co.uk  Wed Jan 22 08:41:43 2003
From: tdickenson at devmail.geminidataloggers.co.uk (Toby Dickenson)
Date: Wed Jan 22 03:41:46 2003
Subject: [Spambayes] KMail integration?
In-Reply-To: <200301220808.h0M88cm13824@localhost.localdomain>
References: <200301220808.h0M88cm13824@localhost.localdomain>
Message-ID: <200301220841.43243.tdickenson@devmail.geminidataloggers.co.uk>

On Wednesday 22 January 2003 8:08 am, Anthony Baxter wrote:
> Has anyone had thoughts about KMail integration? (The KDE mailer).
>
> I don't use it, but have a number of colleagues that do, and they'd
> like something that's easy to use for spam killing...

I use KMail with procmail. procmail adds the X-Hammie-Disposition header, and 
KMail filters using it.

I'm not sure if that qualifies as "integration".


From msergeant at startechgroup.co.uk  Wed Jan 22 10:04:09 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Wed Jan 22 05:04:09 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java
	implementation)
In-Reply-To: <5.1.0.14.0.20030121112431.01ea3390@mail.telecommunity.com>
Message-ID: <D94F4A0A-2DF0-11D7-AE99-0003939CB5D8@startechgroup.co.uk>

On Tuesday, Jan 21, 2003, at 16:28 Europe/London, Phillip J. Eby wrote:

>> It's nice to be proud of software, but when it's open source you 
>> kinda leave it wide open for us to nab your ideas ;-)
>
> That's the whole point of promoting Spambayes, really.  To get the 
> "good stuff" into the hands of more people.  I'd rather see lots of 
> programs using the more effective ideas, than have a bunch of 
> non-programmers try less effective tools and swear off of "learning" 
> spam filters because of it.

True, but I can also see the flip side of it - it's almost always 
better to have a number of different tools implementing different 
algorithms because then the spammers have to work a *lot* harder to get 
around them.

Matt.


From msergeant at startechgroup.co.uk  Wed Jan 22 10:08:53 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Wed Jan 22 05:08:53 2003
Subject: [Spambayes] Promoting Spambayes (was Re: FYI: Java
	implementation) 
In-Reply-To: <20030121171157.67BCD16F16@jmason.org>
Message-ID: <8304A044-2DF1-11D7-AE99-0003939CB5D8@startechgroup.co.uk>

On Tuesday, Jan 21, 2003, at 17:11 Europe/London, Justin Mason wrote:

> Neil Schemenauer said:
>> Matt Sergeant wrote:
>>> Mozilla and SpamAssassin both copy their bayesian code from spambayes
>>> (including tokenisation ideas and combiners).
>>
>> I, for one, am extremely pleased to hear that.  It would be a shame if
>> people kept using Paul Graham's original algorithm after all the work
>> that was put in improving Spambayes.  Despite what was said at the 
>> spam
>> conference, I think the algorithm is important.
>
> BTW it's worth noting we didn't just "nab" the ideas ;)

Well I did, and then I gave SA most of my code ;-)

Matt.


From skip at pobox.com  Wed Jan 22 08:00:54 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 22 09:01:10 2003
Subject: [Spambayes] KMail integration?
In-Reply-To: <200301220808.h0M88cm13824@localhost.localdomain>
References: <200301220808.h0M88cm13824@localhost.localdomain>
Message-ID: <15918.42006.932397.960128@montanaro.dyndns.org>


    Anthony> Has anyone had thoughts about KMail integration?

No, but if it understands POP I suspect your colleagues could just use
pop3proxy.

Skip

From m2 at plusseven.com  Wed Jan 22 15:37:19 2003
From: m2 at plusseven.com (Alex Polite)
Date: Wed Jan 22 09:37:40 2003
Subject: [Spambayes] dumbdbm faster than bsddb3
Message-ID: <20030122143719.GA2540@matijek>

I moved from spamcan to spambayes today and wasted a couple hours
profiling hammie.py

<snip>
profile.run("spambayes.hammiebulk.main()", '/tmp/stats')
<snip>

I ran this on approximately 2000 messages and aggregated the stats. 
The entire run was 496 CPU seconds.

When looking at the profiling information I realized that I was using
dumbdbm, which is supposed to be very slow. I installed bsddb3, rebuilt
my db and reran the profiling tests.

The entire run was now 520 CPU seconds, a 4.8% increase.

So it seems like "stupid beats smart" goes for speed optimizations too.

Can anyone corroborate this?
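
In case someone wants to repeat the measurement, here is a small
self-contained sketch of collecting and summarising profile data.  It uses
cProfile on a stand-in function rather than the real
spambayes.hammiebulk.main(), so the workload and the names are
placeholders only.

```python
import cProfile
import io
import pstats


def work():
    # Stand-in workload; the real run would call the hammie entry point.
    return sum(i * i for i in range(10000))


profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# Sort by cumulative time and keep only the top few entries.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

Comparing two such reports (one per dbm backend) entry by entry shows
where the time goes, which says more than the total alone.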

-- 

Alex Polite
http://plusseven.com/gpg

From drew at poured.net  Wed Jan 22 09:29:29 2003
From: drew at poured.net (Drew Raines)
Date: Wed Jan 22 10:29:22 2003
Subject: [Spambayes] Does spambayes train on its own headers?
Message-ID: <l6vy95dfbja.fsf@poured.net>

Since my corpus is relatively small (on the order of hundreds for
spam and many hundreds for ham), I get false negatives fairly
frequently.

This doesn't bother me; I just move them to my spam folder where my
hammie cron trains on them daily.  These old spams, though, have
X-Spambayes-Classification and X-Hammie-Debug headers which could
skew statistics in .hammiedb.  

Do I need to add those to safe_headers in $BAYESCUSTOMIZE, or does
hammiefilter know not to look at hammie_header_name and
hammie_debug_header_name when training?

-Drew


From noreply at sourceforge.net  Wed Jan 22 07:28:08 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jan 22 10:40:20 2003
Subject: [Spambayes] 
 [ spambayes-Bugs-672489 ] Problems with unallowed chars in XML content
Message-ID: <E18bMnQ-0007lj-00@sc8-sf-web3.sourceforge.net>

Bugs item #672489, was opened at 2003-01-22 16:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
Assigned to: Nobody/Anonymous (nobody)
Summary: Problems with unallowed chars in XML content

Initial Comment:
The attached patch fixes problems with subjects like
the following:

'Valentines Day Special \x96 2 bikinis for the pric...'

When you try to review such a message, you get an XML
parsing error (note the \x96).
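
A minimal sketch of one way to make such a subject safe for XML.  This is
not the attached patch: it is written for present-day Python, and it
assumes the \x96 byte is a Windows-1252 en dash.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape


def xml_safe(raw_bytes, encoding="cp1252"):
    """Decode raw bytes and return ASCII-only text that is safe inside
    an XML element (non-ASCII becomes numeric character references)."""
    text = raw_bytes.decode(encoding, errors="replace")
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


subject = b"Valentines Day Special \x96 2 bikinis for the pric..."
fragment = "<subject>%s</subject>" % xml_safe(subject)
doc = minidom.parseString(fragment)   # parses without an error
```

This sketch does not deal with low control characters, which would need
to be removed separately.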


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702

From noreply at sourceforge.net  Wed Jan 22 07:33:03 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jan 22 10:40:25 2003
Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py
Message-ID: <E18bMsB-0008Dw-00@sc8-sf-web4.sourceforge.net>

Bugs item #672495, was opened at 2003-01-22 16:33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
Assigned to: Nobody/Anonymous (nobody)
Summary: Files not installed by setup.py

Initial Comment:
Patch:

===================================================================
RCS file: /cvsroot/spambayes/spambayes/setup.py,v
retrieving revision 1.13
diff -u -r1.13 setup.py
--- setup.py    17 Jan 2003 06:45:36 -0000      1.13
+++ setup.py    22 Jan 2003 15:28:05 -0000
@@ -39,8 +39,12 @@
            'pop3proxy.py',
            'proxytrainer.py',
            'proxytee.py',
+           'OptionConfig.py',
           ],
-        packages = [ 'spambayes', ],
+        packages = [
+           'spambayes',
+           'spambayes.resources',
+        ],
         classifiers = [
             'Development Status :: 4 - Beta',
             'Environment :: Console',

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702

From neale at woozle.org  Wed Jan 22 08:20:53 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 22 11:21:01 2003
Subject: [Spambayes] Does spambayes train on its own headers?
In-Reply-To: <l6vy95dfbja.fsf@poured.net> (Drew Raines's message of "Wed, 22
 Jan 2003 09:29:29 -0600")
References: <l6vy95dfbja.fsf@poured.net>
Message-ID: <w531y35xije.fsf@woozle.org>

Drew Raines <drew@poured.net> writes:

> Do I need to add those to safe_headers in $BAYESCUSTOMIZE, or does
> hammiefilter know not to look at hammie_header_name and
> hammie_debug_header_name when training?

hammiefilter doesn't do anything special WRT its own headers.  But
unless you changed your bayescustomize.ini file, the tokenizer will skip
all the X-Spambayes-* headers, so you're okay.

Of course, I could be reading the default options incorrectly...

From skip at pobox.com  Wed Jan 22 10:25:56 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 22 11:26:05 2003
Subject: [Spambayes] Does spambayes train on its own headers?
In-Reply-To: <l6vy95dfbja.fsf@poured.net>
References: <l6vy95dfbja.fsf@poured.net>
Message-ID: <15918.50708.875665.899376@montanaro.dyndns.org>


    Drew> Do I need to add those to safe_headers in $BAYESCUSTOMIZE, or does
    Drew> hammiefilter know not to look at hammie_header_name and
    Drew> hammie_debug_header_name when training?

Drew,

I believe spambayes ignores its own headers.  Just the same, I strip them
using unheader.py.  Here's my training script:

    #!/bin/bash

    export BAYESCUSTOMIZE=$HOME/hammie.opt
    cd ~/tmp

    # touch the messages up a bit to avoid spurious "clues"
    unheader.py -p 'X-VM|X-Hammie|X-Spam' newham > newham.clean
    unheader.py -p 'X-VM|X-Hammie|X-Spam' newspam > newspam.clean

    # do the deed
    hammie.py -d -p ~/hammie.db -g newham.clean -s newspam.clean

    # save the files for later retraining
    echo "" >> newham.clean.save
    cat newham.clean >> newham.clean.save
    rm newham newham.clean

    echo "" >> newspam.clean.save
    cat newspam.clean >> newspam.clean.save
    rm newspam newspam.clean

I save ham and spam to ~/tmp/{newham,newspam}.
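
For those without unheader.py handy, the header stripping can be
approximated with the standard email package.  This is only a sketch of
the idea; the function name is invented, and it is an assumption that the
pattern is matched case-insensitively against the start of each header
name.

```python
import re
from email import message_from_string


def strip_headers(raw_message, pattern):
    """Return the message text with every header whose name matches
    pattern (anchored at the start of the name) removed."""
    regex = re.compile(pattern, re.IGNORECASE)
    msg = message_from_string(raw_message)
    for name in set(msg.keys()):
        if regex.match(name):
            del msg[name]          # removes all occurrences of that header
    return msg.as_string()
```

Called with the pattern 'X-VM|X-Hammie|X-Spam', this drops the
X-Hammie-* and X-Spambayes-* headers and leaves the rest of the message
alone.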

Skip


From neale at woozle.org  Wed Jan 22 08:37:56 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 22 11:38:01 2003
Subject: [Spambayes] packaging question
Message-ID: <w53smvlw36j.fsf@woozle.org>

I have an emacs lisp file, spambayes.el, which integrates spambayes into
Gnus.

I'd like to rename the hammie/ directory to contrib/, and put my
spambayes.el (as well as an example muttrc) in there.

Any objections?

Neale

From skip at pobox.com  Wed Jan 22 10:50:53 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 22 11:51:04 2003
Subject: [Spambayes] packaging question
In-Reply-To: <w53smvlw36j.fsf@woozle.org>
References: <w53smvlw36j.fsf@woozle.org>
Message-ID: <15918.52205.545824.761182@montanaro.dyndns.org>


    Neale> I'd like to rename the hammie/ directory to contrib/, and put my
    Neale> spambayes.el (as well as an example muttrc) in there.

    Neale> Any objections?

No, but I'd prefer it if you coaxed the SF folks into renaming it so CVS
history info is retained.

Skip


From francois.granger at free.fr  Wed Jan 22 19:02:46 2003
From: francois.granger at free.fr (François Granger)
Date: Wed Jan 22 13:03:03 2003
Subject: [Spambayes] Congratulations
Message-ID: <BA549B55.61EBE%francois.granger@free.fr>

Today I got the latest CVS version on my work Mac (Cube MacOS 9.1 192MB
RAM).

I copied my bayescustomize.ini file and my db file from my previous setup
(dated December 2) and ran the new version through pop3proxy.py.

Et voilà ! [1]

It worked like a charm.

By the way, I think that the released version should ship with a minimal
bayescustomize.ini file loaded for pop3proxy use with a fake server. People
will have an easier time to replace this with their real server name.
Something like:

========================================
[pop3proxy]
pop3proxy_persistent_storage_file = hammie.db
pop3proxy_servers = pop.yourisp.com, pop.otherisp.com
pop3proxy_ports = 110, 1110
# Replace the values pop.yourisp.com, pop.otherisp.com with your real servers.
# In your mail app, as the POP server, put
# 127.0.0.1 for the account pop.yourisp.com and
# 127.0.0.1:1110 for the account pop.otherisp.com
# and you are done. If you use Eudora, see the documentation.

[globals]
dbm_type = best
verbose = False

[html_ui]
html_ui_launch_browser = True
html_ui_port = 8880
html_ui_allow_remote_connections = True
========================================
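
To show how the servers and ports lists are meant to line up, here is a
small sketch that reads such a fragment with the standard configparser
module and pairs each server with its local port.  It is not how
pop3proxy itself reads its options; the function name is invented and the
host names are reserved example domains used as placeholders.

```python
import configparser

SAMPLE = """\
[pop3proxy]
pop3proxy_servers = pop.example.org, pop.example.net
pop3proxy_ports = 110, 1110
"""


def proxy_pairs(ini_text):
    """Return a list of (remote_server, local_port) pairs."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["pop3proxy"]
    servers = [s.strip() for s in section["pop3proxy_servers"].split(",")]
    ports = [int(p) for p in section["pop3proxy_ports"].split(",")]
    if len(servers) != len(ports):
        raise ValueError("need exactly one local port per server")
    return list(zip(servers, ports))


print(proxy_pairs(SAMPLE))
```

The first server is reached through the first local port, the second
through the second, and so on.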

Anyway, I don't think that Spambayes on OS 9 will ever be a hit because:
- It is really slow and slows down mail retrieval (thanks to cooperative
  multitasking, where some applications do not cooperate enough).
- It needs a lot of memory (I gave it 25 MB), and since memory allocation
  is not dynamic on MacOS 9, you are stuck with less memory or you have
  to launch it only when needed, which diminishes its usefulness.

[1] As Americans say... ;-)
-- 
Mail is a means of communication. People should ask themselves
about the political implications of their choices (or non-choices)
of tools and technologies. For clean mail:
<http://marc.herbert.free.fr/mail/> -- <http://minilien.com/?IXZneLoID0>


From python-spambayes at discworld.dyndns.org  Wed Jan 22 12:14:17 2003
From: python-spambayes at discworld.dyndns.org (Charles Cazabon)
Date: Wed Jan 22 13:11:39 2003
Subject: [Spambayes] Congratulations
In-Reply-To: <BA549B55.61EBE%francois.granger@free.fr>;
	from francois.granger@free.fr on Wed, Jan 22, 2003 at 07:02:46PM +0100
References: <BA549B55.61EBE%francois.granger@free.fr>
Message-ID: <20030122121417.C21326@discworld.dyndns.org>

François Granger <francois.granger@free.fr> wrote:
> 
> By the way, I think that the released version should ship with a minimal
> bayescustomize.ini file loaded for pop3proxy use with a fake server. People
> will have an easier time to replace this with their real server name.
> Something like:
> 
> ========================================
> [pop3proxy]
> pop3proxy_persistent_storage_file = hammie.db
> pop3proxy_servers = pop.yourisp.com, pop.otherisp.com

If this idea is chosen, please use one of the domains reserved for this, like
"example.org".  "yourisp.com" and "otherisp.com" are available for people to
register, and if they were my domains, I wouldn't appreciate the extra
traffic.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon                 <python-spambayes@discworld.dyndns.org>
GPL'ed software available at:     http://www.qcc.ca/~charlesc/software/
-----------------------------------------------------------------------

From noreply at sourceforge.net  Wed Jan 22 09:46:31 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jan 22 13:12:34 2003
Subject: [Spambayes] 
 [ spambayes-Bugs-672489 ] Problems with unallowed chars in XML content
Message-ID: <E18bOxL-0003aH-00@sc8-sf-web1.sourceforge.net>

Bugs item #672489, was opened at 2003-01-22 15:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
>Assigned to: Richie Hindle (richiehindle)
Summary: Problems with unallowed chars in XML content

Initial Comment:
The attached patch fixes problems with subjects like
the following:

'Valentines Day Special \x96 2 bikinis for the pric...'

When you try to review such a message, you get an XML
parsing error (note the \x96).


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702

From richie at entrian.com  Wed Jan 22 18:18:36 2003
From: richie at entrian.com (Richie Hindle)
Date: Wed Jan 22 13:19:05 2003
Subject: [Spambayes] Issue with pop3proxy
In-Reply-To: <w53of6axry1.fsf@woozle.org>
References: <a05200f09ba53403813b7@[192.168.1.20]>
	<w53of6axry1.fsf@woozle.org>
Message-ID: <qfmt2vo2p3v289bkjmtc7kuocs8aud75td@4ax.com>


[François]
> the X-Spambayes-Classification: spam got added after the content

[Neale]
> Perhaps this is what happens when the email module can't parse the message.

No, the POP3 proxy doesn't use 'email' when adding the
X-Spambayes-Classification header, exactly because there are messages that
it can't parse.  It splits the headers from the body like this:

    headers, body = re.split(r'\n\r?\n', messageText, 1)

so it's difficult to know how it can fail.  Perhaps the message uses only
'\r's to terminate the headers.
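
A quick sketch of how that split behaves with the three line-ending
styles.  The wrapper function is invented for the example, and it is
written for present-day Python rather than copied from pop3proxy.py.

```python
import re


def split_message(message_text):
    """Split into (headers, body), or return None when no blank line
    matching the pattern is found."""
    parts = re.split(r'\n\r?\n', message_text, maxsplit=1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


print(split_message("A: b\n\nbody"))        # LF line endings
print(split_message("A: b\r\n\r\nbody"))    # CRLF line endings
print(split_message("A: b\r\rbody"))        # CR-only line endings
```

The LF and CRLF cases split (the CRLF case leaves a trailing '\r' on the
headers), but with CR-only line endings the pattern never matches, so
unpacking the result into two names would fail.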

François, could you do me a favour?  Could you send me an exact copy of one
of these messages?  Send it as a binary attachment (eg. in a zip file), so
that nothing mucks about with the line endings.  You should find it in one
of your cache directories.  Thanks.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Wed Jan 22 18:18:45 2003
From: richie at entrian.com (Richie Hindle)
Date: Wed Jan 22 13:19:13 2003
Subject: [Spambayes] Congratulations
In-Reply-To: <BA549B55.61EBE%francois.granger@free.fr>
References: <BA549B55.61EBE%francois.granger@free.fr>
Message-ID: <4nnt2vcd05qgg9m6ldlea194pnc6doo05f@4ax.com>


[François]
> It worked like a charm.

Great!  Another satisfied customer.  8-)

> By the way, I think that the released version should ship with a minimal
> bayescustomize.ini file loaded for pop3proxy use with a fake server. People
> will have an easier time to replace this with their real server name.

This is a good idea, but the obvious defaults are different for different
platforms, which makes things difficult.  On Windows, the proxy should run
on port 110 because non-root processes can do that, and it saves having to
reconfigure your email client to use a non-default port.  On Unix, using
port 110 means running as root, and possibly conflicting with an existing
POP3 server (which you're much more likely to find on unix than on
Windows), so it should default to something like 1110 instead.  Awkward,
inconsistent, potentially confusing.  (I'm including MacOS X in with Unix
here - I assume that's correct?)
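
The per-platform default could be as small as the following sketch.  The
function name is made up, and treating everything other than Windows as
"Unix" is an assumption.

```python
import os


def default_proxy_port(os_name=os.name):
    """Pick a default listening port for the POP3 proxy."""
    # On Windows an ordinary user can bind port 110; elsewhere ports
    # below 1024 normally need root, so fall back to 1110.
    return 110 if os_name == "nt" else 1110


print(default_proxy_port())
```

Whichever value is chosen, the mail client has to be pointed at the same
port, which is where the potential for confusion comes in.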

Have you looked at the web configuration page?  That attempts to explain
how to configure the POP3 proxy, and should be easier than modifying
bayescustomize.ini (though it doesn't talk about privileged ports).  We
should encourage POP3 proxy users to set up via that rather than via
hand-editing bayescustomize.ini.

-- 
Richie Hindle
richie@entrian.com


From noreply at sourceforge.net  Wed Jan 22 10:34:52 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jan 22 13:40:41 2003
Subject: [Spambayes] 
 [ spambayes-Bugs-672489 ] Problems with unallowed chars in XML content
Message-ID: <E18bPi8-0008I3-00@sc8-sf-web3.sourceforge.net>

Bugs item #672489, was opened at 2003-01-22 15:28
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
Assigned to: Richie Hindle (richiehindle)
Summary: Problems with unallowed chars in XML content

Initial Comment:
The attached patch fixes problems with subjects like
the following:

'Valentines Day Special \x96 2 bikinis for the pric...'

When you try to review such a message, you get an XML
parsing error (note the \x96).


----------------------------------------------------------------------

>Comment By: Richie Hindle (richiehindle)
Date: 2003-01-22 18:34

Message:
Logged In: YES 
user_id=85414

Many thanks, Jürgen.  Checked in with PyMeldLite.py 1.4.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672489&group_id=61702

From noreply at sourceforge.net  Wed Jan 22 10:43:05 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jan 22 13:40:46 2003
Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py
Message-ID: <E18bPq5-0000HH-00@sc8-sf-web3.sourceforge.net>

Bugs item #672495, was opened at 2003-01-22 15:33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
Assigned to: Nobody/Anonymous (nobody)
Summary: Files not installed by setup.py

Initial Comment:
Patch:

===================================================================
RCS file: /cvsroot/spambayes/spambayes/setup.py,v
retrieving revision 1.13
diff -u -r1.13 setup.py
--- setup.py    17 Jan 2003 06:45:36 -0000      1.13
+++ setup.py    22 Jan 2003 15:28:05 -0000
@@ -39,8 +39,12 @@
            'pop3proxy.py',
            'proxytrainer.py',
            'proxytee.py',
+           'OptionConfig.py',
           ],
-        packages = [ 'spambayes', ],
+        packages = [
+           'spambayes',
+           'spambayes.resources',
+        ],
         classifiers = [
             'Development Status :: 4 - Beta',
             'Environment :: Console',

----------------------------------------------------------------------

>Comment By: Richie Hindle (richiehindle)
Date: 2003-01-22 18:43

Message:
Logged In: YES 
user_id=85414

You're dead right about spambayes.resources, but I'm not
convinced we should be installing OptionConfig.py now that
it's been folded into the main pop3proxy web interface.  I asked
on the list whether anyone thought we should leave it in with the
other scripts and got no replies.  I'm tempted to move it into the
spambayes package, from where pop3proxy.py can import it.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702

From richie at entrian.com  Wed Jan 22 19:11:11 2003
From: richie at entrian.com (Richie Hindle)
Date: Wed Jan 22 14:11:38 2003
Subject: [Spambayes] packaging question
In-Reply-To: <w53smvlw36j.fsf@woozle.org>
References: <w53smvlw36j.fsf@woozle.org>
Message-ID: <duqt2v01aita6sq5vjq1d03jpsp388av8l@4ax.com>


[Neale]
> I'd like to rename the hammie/ directory to contrib/, and put my
> spambayes.el (as well as an example muttrc) in there.

Good plan.  We should possibly ship it with the release as well - the
documentation will eventually refer to things like your muttrc, so we
should be shipping them.  We don't have to install any scripts, just copy
the contrib directory into the installation so that people have it to refer
to.

Well done on the muttrc - you've saved me a lot of work there.  And Don
Marti will be pleased.  8-)

-- 
Richie Hindle
richie@entrian.com


From noreply at sourceforge.net  Wed Jan 22 12:22:58 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jan 22 15:31:05 2003
Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py
Message-ID: <E18bROk-0003Hx-00@sc8-sf-web1.sourceforge.net>

Bugs item #672495, was opened at 2003-01-22 16:33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
Assigned to: Nobody/Anonymous (nobody)
Summary: Files not installed by setup.py

Initial Comment:
Patch:

===================================================================
RCS file: /cvsroot/spambayes/spambayes/setup.py,v
retrieving revision 1.13
diff -u -r1.13 setup.py
--- setup.py    17 Jan 2003 06:45:36 -0000      1.13
+++ setup.py    22 Jan 2003 15:28:05 -0000
@@ -39,8 +39,12 @@
            'pop3proxy.py',
            'proxytrainer.py',
            'proxytee.py',
+           'OptionConfig.py',
           ],
-        packages = [ 'spambayes', ],
+        packages = [
+           'spambayes',
+           'spambayes.resources',
+        ],
         classifiers = [
             'Development Status :: 4 - Beta',
             'Environment :: Console',

----------------------------------------------------------------------

>Comment By: Jürgen Hermann (jhermann)
Date: 2003-01-22 21:22

Message:
Logged In: YES 
user_id=39128

The current problem is the import in line 153 of pop3proxy:

from OptionConfig import OptionsConfigurator

Moving OptionConfig into the package is surely the best fix,
including adapting the above import.
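
A minimal sketch of how the adapted import could tolerate both layouts while the move is in progress. It assumes the module would end up as spambayes.OptionConfig, which this thread does not settle, and import_first is a hypothetical helper written for present-day Python rather than anything in the SpamBayes source:

```python
import importlib


def import_first(candidates):
    """Return the first module in `candidates` that can be imported."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append("%s: %s" % (name, exc))
    raise ImportError("no candidate could be imported: " + "; ".join(errors))


# Hypothetical usage in pop3proxy.py: prefer the packaged module and fall
# back to the top-level script while both layouts still exist.
# OptionConfig = import_first(["spambayes.OptionConfig", "OptionConfig"])
```

Once the move is complete, a plain "from spambayes.OptionConfig import OptionsConfigurator" would presumably be enough and the fallback could go.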

----------------------------------------------------------------------

Comment By: Richie Hindle (richiehindle)
Date: 2003-01-22 19:43

Message:
Logged In: YES 
user_id=85414

You're dead right about spambayes.resources, but I'm not
convinced we should be installing OptionConfig.py now that
it's been folded into the main pop3proxy web interface.  I asked
on the list whether anyone thought we should leave it in with the
other scripts and got no replies.  I'm tempted to move it into the
spambayes package, from where pop3proxy.py can import it.


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702

From BPettersen at NAREX.com  Wed Jan 22 18:43:54 2003
From: BPettersen at NAREX.com (Bjorn Pettersen)
Date: Wed Jan 22 20:57:03 2003
Subject: [Spambayes] I did something stupid...
Message-ID: <60FB8BB7F0EFC7409B75EEEC13E2019201BFE15E@admin56.narex.com>

...after setting up spambayes with Outlook XP (training, telling it to
watch the Inbox and move spam), I decided the icons were too far to the
right so I right-clicked on the toolbar, dragged them down to the next
line, and then Outlook froze. In particular the customize dialog was
unresponsive (although I could still move icons around on the toolbar).
I was forced to shut Outlook down through taskmanager...

Now whenever I start Outlook, I see the message below from
win32traceutil.py, and the taskbar icons are non-functional. I couldn't
find any clues on how to uninstall and try again (I did run addin.py
--unregister, and it said it was successful, but still the same message
after running addin.py again). 

Any help or pointers to documentation would be greatly appreciated.

-- bjorn

Outlook Spam Addin module loading
SpamAddin - Connecting to Outlook
Either bayes database or message database is missing - creating new
Bayes database initialized with 0 spam and 0 good messages
pythoncom error: Python error invoking COM method.
Traceback (most recent call last):
  File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 511, in OnSelectionChange
    self.SetupUI()
  File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 435, in SetupUI
    Tag = "SpamBayes.Manager")
  File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", line 470, in _AddControl
    item = parent.Controls.Add(Type=control_type, Temporary=True)
  File "C:\Python22\\lib\site-packages\win32com\client\__init__.py", line 369, in __getattr__
    return apply(self._ApplyTypes_, args)
  File "C:\Python22\\lib\site-packages\win32com\client\__init__.py", line 363, in _ApplyTypes_
    return self._get_good_object_(apply(self._oleobj_.InvokeTypes, (dispid, 0, wFlags, retType, argTypes) + args), user, resultCLSID)
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147467259), None)

From mhammond at skippinet.com.au  Thu Jan 23 13:28:41 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Jan 22 21:29:05 2003
Subject: [Spambayes] I did something stupid...
In-Reply-To: <60FB8BB7F0EFC7409B75EEEC13E2019201BFE15E@admin56.narex.com>
Message-ID: <01ac01c2c287$25c05390$530f8490@eden>

> ...after setting up spambayes with Outlook XP (training, telling it to
> watch the Inbox and move spam), I decided the icons were too 
> far to the
> right so I right-clicked on the toolbar, dragged them down to the next
> line, and then Outlook froze. In particular the customize dialog was
> unresponsive (although I could still move icons around on the 

I will try to repro this, but am busy for the next few days.

You may like to try the customize dialog, and hitting "Reset" on the
toolbars.

Otherwise, for the time being, wrap an exception handler around:

>   File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", 
> line 470, in
> _AddControl
>     item = parent.Controls.Add(Type=control_type, Temporary=True)

And just ignore it for now.
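
A rough sketch of what such a handler might look like. The function name add_control_safely and the stand-in classes are hypothetical and exist only so the example runs without Outlook; in the real add-in the call lives inside _AddControl and the exception would be pywintypes.com_error:

```python
def add_control_safely(parent, control_type, log=print):
    """Try to add a toolbar control; return None instead of raising."""
    try:
        return parent.Controls.Add(Type=control_type, Temporary=True)
    except Exception as exc:  # pywintypes.com_error in the real add-in
        log("Could not add control, ignoring: %r" % (exc,))
        return None


class _FailingControls:
    """Stand-in for a COM Controls collection that always fails (demo only)."""

    def Add(self, Type, Temporary):
        raise RuntimeError("Exception occurred.")


class _FakeParent:
    """Stand-in for a toolbar object exposing a Controls collection."""

    Controls = _FailingControls()


messages = []
result = add_control_safely(_FakeParent(), control_type=1, log=messages.append)
```

The caller then has to cope with a None result, for example by skipping any further setup of that button.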

Mark.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: winmail.dat
Type: application/ms-tnef
Size: 2683 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes/attachments/20030123/ed140805/winmail.bin
From anthony at interlink.com.au  Thu Jan 23 13:32:10 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Jan 22 21:33:52 2003
Subject: [Spambayes] Congratulations 
In-Reply-To: <4nnt2vcd05qgg9m6ldlea194pnc6doo05f@4ax.com> 
Message-ID: <200301230232.h0N2WAY01294@localhost.localdomain>


>>> Richie Hindle wrote
> 
> [François]
> > By the way, I think that the released version should ship with a minimal
> > bayescustomize.ini file loaded for pop3proxy use with a fake server. People
> > will have an easier time to replace this with their real server name.
> 
> This is a good idea, but the obvious defaults are different for different
> platforms, which makes things difficult. 

Which reminds me - I'd like to make it so bayescustomize.ini can be found
in a couple of places other than the current directory, or the env var.

For instance, on Unix, $HOME/.spambayes/bayescustomize.ini 
I'm not sure where on Windows or MacOS. Suggestions?

This removes that whole "strange behaviour if you're in the wrong place"
thing...
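
One way the lookup might be ordered, as a sketch only. The search order (environment variable, then current directory, then $HOME/.spambayes) and the variable name BAYESCUSTOMIZE are assumptions made for illustration, and find_config is a hypothetical helper rather than the code in Options.py:

```python
import os


def find_config(filename="bayescustomize.ini", env_var="BAYESCUSTOMIZE",
                environ=None, cwd=None, home=None):
    """Return the first existing config file path, or None if none exists."""
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    home = os.path.expanduser("~") if home is None else home

    candidates = []
    if environ.get(env_var):
        # An explicit setting wins over any default location.
        candidates.append(environ[env_var])
    candidates.append(os.path.join(cwd, filename))
    candidates.append(os.path.join(home, ".spambayes", filename))

    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

Passing environ, cwd and home as parameters keeps the function easy to exercise without touching the real environment.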

Anthony

-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From neale at woozle.org  Wed Jan 22 20:43:51 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 22 23:44:00 2003
Subject: [Spambayes] packaging question
In-Reply-To: <15918.52205.545824.761182@montanaro.dyndns.org> (Skip
 Montanaro's message of "Wed, 22 Jan 2003 10:50:53 -0600")
References: <w53smvlw36j.fsf@woozle.org>
	<15918.52205.545824.761182@montanaro.dyndns.org>
Message-ID: <w53of68mq60.fsf@woozle.org>

Skip Montanaro <skip@pobox.com> writes:

> No, but I'd prefer it if you coaxed the SF folks into renaming it so
> CVS history info is retained.

Since the only history on any of the files in that directory is the
initial checkin message, I'm just going to rename.  Skip, you have
permission to throttle me if this hacks you off ;)

Neale

From neale at woozle.org  Wed Jan 22 20:48:09 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 22 23:48:16 2003
Subject: [Spambayes] Congratulations
In-Reply-To: <200301230232.h0N2WAY01294@localhost.localdomain> (Anthony
 Baxter's message of "Thu, 23 Jan 2003 13:32:10 +1100")
References: <200301230232.h0N2WAY01294@localhost.localdomain>
Message-ID: <w53lm1cmpyu.fsf@woozle.org>

Anthony Baxter <anthony@interlink.com.au> writes:

> Which reminds me - I'd like to make it so bayescustomize.ini can be
> found in a couple of places other than the current directory, or the
> env var.

I've already checked something like this in to Options.py.  Great minds
think alike!


And so do ours!

;-)

Neale


From neale at woozle.org  Wed Jan 22 22:03:55 2003
From: neale at woozle.org (Neale Pickett)
Date: Thu Jan 23 01:03:58 2003
Subject: [Spambayes] can I change the options to hammie.py and hammiebulk.py?
Message-ID: <w53znps8ks4.fsf@woozle.org>

I know a lot of you out there in hammieland are still using hammie.py
instead of hammiebulk.py, and that's okay.  But I think it's time I
reined everyone in here, so I'm proposing some changes.

I want to:

1. Do away with hammie.py in the top directory.  The hammie.py module
   would still exist, you'd just have to call hammiebulk.py directly.
2. Move hammiebulk.py into the top directory.
3. Totally rearrange the options to hammiebulk.  Specifically, I want to
   make it more like hammiefilter.  To wit:

"""Usage: %(program)s [OPTION]...

[OPTION] is one of:
    -h
        show usage and exit
    -x
        show some usage examples and exit
    -d DBFILE
        use database in DBFILE
    -D PICKLEFILE
        use pickle (instead of database) in PICKLEFILE
    -n
        create a new database
*   -f
        filter (default if no processing options are given)
*   -t
        [EXPERIMENTAL] filter and train based on the result (you must
        make sure to untrain all mistakes later)
*   -g
        [EXPERIMENTAL] (re)train as a good (ham) message
*   -s
        [EXPERIMENTAL] (re)train as a bad (spam) message
*   -G
        [EXPERIMENTAL] untrain ham (only use if you've already trained
        this message)
*   -S
        [EXPERIMENTAL] untrain spam (only use if you've already trained
        this message)
"""

   I'd provide a (-F, --force-train) option to force training even if a
   trained header is found, and a (-N, --no-trained-header) option to
   prevent writing out trained headers.

4. Remove mboxtrain.py, as hammiebulk.py would replace it.
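
For illustration, the option letters in the usage text above could be parsed roughly as follows. This is a sketch using the standard getopt module; the action names and the parse_args helper are hypothetical, and the proposed -F and -N options are left out:

```python
import getopt

# Options that select what to do, mapped to hypothetical action names
# (the names are illustrative, not taken from hammiebulk.py).
ACTIONS = {
    "-h": "help",
    "-x": "examples",
    "-f": "filter",
    "-t": "filter_and_train",
    "-g": "train_ham",
    "-s": "train_spam",
    "-G": "untrain_ham",
    "-S": "untrain_spam",
}


def parse_args(argv):
    """Parse the proposed option letters into a small config dict."""
    opts, args = getopt.getopt(argv, "hxd:D:nftgsGS")
    config = {"dbname": None, "use_pickle": False, "new_db": False,
              "actions": []}
    for opt, value in opts:
        if opt == "-d":
            config["dbname"], config["use_pickle"] = value, False
        elif opt == "-D":
            config["dbname"], config["use_pickle"] = value, True
        elif opt == "-n":
            config["new_db"] = True
        else:
            config["actions"].append(ACTIONS[opt])
    if not config["actions"]:
        # Filtering is the default when no processing option is given.
        config["actions"].append("filter")
    return config, args
```

A call such as parse_args(["-d", "hammie.db", "-s"]) would select spam training against hammie.db, and an empty argument list falls back to filtering.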

Glutton for punishment,

Neale

From neale at woozle.org  Wed Jan 22 22:12:13 2003
From: neale at woozle.org (Neale Pickett)
Date: Thu Jan 23 01:12:16 2003
Subject: [Spambayes] degeneration
In-Reply-To: <20030122061645.GA6147@upl.cs.wisc.edu> (Adam Hupp's message of
 "Wed, 22 Jan 2003 00:16:45 -0600")
References: <w53lm1exo0d.fsf@woozle.org>
	<20030121212741.GA3849@upl.cs.wisc.edu> <w53el75lpzx.fsf@woozle.org>
	<20030122061645.GA6147@upl.cs.wisc.edu>
Message-ID: <w53wukw8kea.fsf@woozle.org>

Adam Hupp <hupp@upl.cs.wisc.edu> writes:

> It looks like your hammiefilter uses almost the same interface that
> mine does, so the integration should be a snap.

Killer!  So with the arguments on hammiefilter.py that I just checked
in, your rules would look like this:

> folder-hook . "macro index S '|hammiefilter.py -s\n <save-message>=caughtspam\n'"
> folder-hook . "macro pager S '|hammiefilter.py -s\n <save-message>=caughtspam\n'"
> folder-hook . "macro index H '|hammiefilter.py -g\r <save-message>!\r'"
> folder-hook . "macro pager H '|hammiefilter.py -g\r <save-message>!\r'"
> color index red black "~h 'X-Hammie-Disposition: spam' ~F"

And then you run all your mail through "hammiefilter.py -t" from
procmail.  Does that look good to you?

Perhaps there should also be a "delete as spam" button.  Or maybe that's
what S should do, since folks probably aren't going to want to keep spam
around.

Neale

From sjoerd at acm.org  Thu Jan 23 10:14:45 2003
From: sjoerd at acm.org (Sjoerd Mullender)
Date: Thu Jan 23 04:14:52 2003
Subject: [Spambayes] packaging question
In-Reply-To: <15918.52205.545824.761182@montanaro.dyndns.org> 
References: <w53smvlw36j.fsf@woozle.org> 
            <15918.52205.545824.761182@montanaro.dyndns.org> 
Message-ID: <20030123091445.DC33674D14@indus.ins.cwi.nl>

On Wed, Jan 22 2003 Skip Montanaro wrote:

> 
>     Neale> I'd like to rename the hammie/ directory to contrib/, and put my
>     Neale> spambayes.el (as well as an example muttrc) in there.
> 
>     Neale> Any objections?
> 
> No, but I'd prefer it if you coaxed the SF folks into renaming it so CVS
> history info is retained.

When you do that everybody who updates their repository will get error
messages from CVS and the old hammie directory will not have been
removed from the checked-out copy.

-- Sjoerd Mullender <sjoerd@acm.org>

From francois.granger at laposte.net  Thu Jan 23 10:32:45 2003
From: francois.granger at laposte.net (François Granger)
Date: Thu Jan 23 06:43:40 2003
Subject: [Spambayes] Congratulations 
In-Reply-To: <200301230232.h0N2WAY01294@localhost.localdomain>
Message-ID: <BA55754D.61F94%francois.granger@laposte.net>

on 23/01/03 3:32, Anthony Baxter at anthony@interlink.com.au wrote:

> For instance, on Unix, $HOME/.spambayes/bayescustomize.ini
> I'm not sure where on Windows or MacOS. Suggestions?

MacOS 9: there is no rule apart from the "preference" folder, but I don't
think that is a good idea. With no explicit path, the file will be loaded
from the script's os.getcwd() folder. With a path, the script will load it
from anywhere, but path notation on MacOS 9 is really strange for Unix users.

MacOS X: Same as Unix mainly, but can work like MacOS 9.


From noreply at sourceforge.net  Wed Jan 22 20:57:43 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 23 06:43:48 2003
Subject: [Spambayes] [ spambayes-Bugs-650496 ] hammie.py discards headers
Message-ID: <E18bZQt-00076A-00@sc8-sf-web3.sourceforge.net>

Bugs item #650496, was opened at 2002-12-08 10:39
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=650496&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Works For Me
Priority: 5
Submitted By: Simon Baatz (bnomis26)
Assigned to: Neale Pickett (npickett)
Summary: hammie.py discards headers

Initial Comment:
When feeding the (malformed) attached mail to hammie.py
in filter mode, the headers of the mail are not present
in the output.

Command line:

python hammie.py -f -d -p ~/mail/hammie.db < msg.lAoM

Output:

X-Spambayes-Classification: ham; 0.00


--Amazon.com_multipart_boundary____________
Content-Type: text/plain; charset=iso-8859-1
Vielen Dank für Ihre Bestellung bei Amazon.de.

--Amazon.com_multipart_boundary____________
Content-Type: text/html; charset=iso-8859-1


<html>
</html>

--Amazon.com_multipart_boundary____________--


----------------------------------------------------------------------

>Comment By: Neale Pickett (npickett)
Date: 2003-01-22 20:57

Message:
Logged In: YES 
user_id=619391

Seems to be okay now...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=650496&group_id=61702

From noreply at sourceforge.net  Wed Jan 22 21:01:48 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 23 06:43:54 2003
Subject: [Spambayes] 
 [ spambayes-Patches-639122 ] hammie: ignore emails older than n days
Message-ID: <E18bZUq-0007EW-00@sc8-sf-web3.sourceforge.net>

Patches item #639122, was opened at 2002-11-15 13:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702

Category: None
Group: None
Status: Open
>Resolution: Later
Priority: 5
Submitted By: Jason Hildebrand (jdhildeb)
Assigned to: Neale Pickett (npickett)
Summary: hammie: ignore emails older than n days

Initial Comment:
Since your documentation stresses the importance of
training using only relatively recent emails, I thought
a good way to do this would be to have hammie do it for me.

So I added a new configuration option:

[Hammie]
# when training, hammie will ignore messages older than
# this number of days.
# i.e. set to 365 to ignore messages older than one year.
# Set to 0 to disable any filtering by date.
ignore_old_messages: 0

The patch also modifies Hammie to output the number of
messages it read/ignored for each mail file it processes.

This option might also prove useful for doing
incremental training (i.e. set up cron to train once a
week, and set ignore_old_messages to 7).


----------------------------------------------------------------------

>Comment By: Neale Pickett (npickett)
Date: 2003-01-22 21:01

Message:
Logged In: YES 
user_id=619391

Jason, does the current mboxtrain.py script do enough of
this functionality for you, or would you still like to see
us work by the Received header?  I suspect it might be good
enough...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702

From mhammond at skippinet.com.au  Thu Jan 23 23:25:10 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 23 07:26:06 2003
Subject: [Spambayes] Outlook: new folder selector code
Message-ID: <000d01c2c2da$7a72fb10$530f8490@eden>

I have just checked in some significant changes to the "folder selector"
dialog - the cute dialog that presents the list of folders in various parts
of the UI.

Inspired by patches from Tony Meyer, everything should look the same, but
behind the scenes there are 2 major changes that will be of advantage to
Exchange Server users:

* We are back to an extended MAPI (ie, fast) version of the code.  We
believe we have identified and fixed the problem that prevented this code
from working with an Exchange Server before.

* The folder hierarchy is no longer walked fully before the dialog is
created.  The folder hierarchy is only walked as a node is expanded.  This
should make the dialog come up much faster if you have a large "public
folders" hierarchy - these folders are not walked until you actually expand
them (and even then, only walked one level down at that time)
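
The lazy walk described in the second point can be sketched in a few lines. This is not the Outlook add-in's code: LazyFolderNode, list_children and the sample hierarchy are hypothetical stand-ins for the MAPI folder calls.

```python
class LazyFolderNode:
    """Tree node whose children are only fetched when it is expanded."""

    def __init__(self, name, list_children):
        self.name = name
        self._list_children = list_children  # callable: name -> child names
        self._children = None                # None means "not walked yet"

    def expand(self):
        """Fetch one level of children on first use, then reuse them."""
        if self._children is None:
            self._children = [LazyFolderNode(child, self._list_children)
                              for child in self._list_children(self.name)]
        return self._children


# Hypothetical folder hierarchy standing in for the mail store.
HIERARCHY = {
    "Root": ["Inbox", "Public Folders"],
    "Public Folders": ["Sales", "Support"],
    "Sales": ["2002", "2003"],
}
walked = []  # records which folders were actually queried


def list_children(name):
    walked.append(name)
    return HIERARCHY.get(name, [])


root = LazyFolderNode("Root", list_children)
top = root.expand()  # only "Root" is queried; "Public Folders" is untouched
```

Here nothing under "Public Folders" is queried until that node is itself expanded, and even then only its direct children are fetched.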

So, in summary, I hope that non Exchange Server users see slight performance
gain displaying this dialog, and Exchange Server users see a significant
one.  Please let me know if you have any problems.

Mark.


From jchilders at smartbusinessware.com  Thu Jan 23 09:18:40 2003
From: jchilders at smartbusinessware.com (Jeff Childers)
Date: Thu Jan 23 09:39:39 2003
Subject: [Spambayes] Outlook 2000 Install not working - "No module named
	spambayes"
Message-ID: <9C067B997594D3118BF50050DA24EA1097D5E4@CCG2>

Hi all,

I've installed the SpamBayes files and run addin.py. Seems to register OK
(at least the trace comments suggest it is so). Then, when loading Outlook,
SB fails to load and the trace collector shows the following error:

>>> Outlook Spam Addin module loading
>>> SpamAddin - Connecting to Outlook
>>> Created new configuration file 'C:\Python22\Lib\site-packages\SpamBayes\default_configuration.pck'
>>> Traceback (most recent call last):
>>>   File "C:\Python22\\lib\site-packages\win32com\universal.py", line 150, in dispatch
>>>     retVal = ob._InvokeEx_(meth.dispid, 0, pythoncom.DISPATCH_METHOD, args, None, None)
>>>   File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 322, in _InvokeEx_
>>>     return self._invokeex_(dispid, lcid, wFlags, args, kwargs, serviceProvider)
>>>   File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
>>>     return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
>>>   File "C:\Python22\\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
>>>     return apply(func, args)
>>>   File "D:\Data\Apps\Python\SpamBayes\addin.py", line 594, in OnConnection
>>>     self.manager = manager.GetManager(application)
>>>   File "D:\Data\Apps\Python\SpamBayes\manager.py", line 335, in GetManager
>>>     _mgr = BayesManager(outlook=outlook, verbose=verbose)
>>>   File "D:\Data\Apps\Python\SpamBayes\manager.py", line 79, in __init__
>>>     import_core_spambayes_stuff(self.ini_filename)
>>>   File "D:\Data\Apps\Python\SpamBayes\manager.py", line 52, in import_core_spambayes_stuff
>>>     from spambayes import classifier
>>> exceptions.ImportError: No module named spambayes

I originally installed the SpamBayes folder on
[D:\Data\Apps\Python\SpamBayes]. After the load failed the first time, I
copied the SB folder to [C:\Python22\Lib\Site-Packages\SpamBayes]. I then
re-ran addin.py from the new folder on C, same result when starting Outlook.
Curiously, the error output above still refers to the old location on D. I
can find no configuration file that contains this information. Why is it
still looking at D, or more importantly, how can I reset the configuration?
Finally, what have I done wrong and how do I correct it to get SB working
under Outlook?

OS: WinXP
Outlook 2000
Win32Comall-150

Thanks for any help.

JC


From anthony at interlink.com.au  Fri Jan 24 01:46:45 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Jan 23 09:48:21 2003
Subject: [Spambayes] Outlook 2000 Install not working - "No module named
	spambayes" 
In-Reply-To: <9C067B997594D3118BF50050DA24EA1097D5E4@CCG2> 
Message-ID: <200301231446.h0NEkj810300@localhost.localdomain>


>>> Jeff Childers wrote
> Hi all,
> 
> I've installed the SpamBayes files and run addin.py. Seems to register OK
> (at least the trace comments suggest it is so). Then, when loading Outlook,
> SB fails to load and the trace collector shows the following error:

When you say you "installed the SpamBayes files", what do you mean? Did
you run "setup.py install"? It looks like (from your traceback) that
the spambayes module didn't get installed.

I could imagine that a failed run of addin.py would leave things in a...
not good... state.

Anthony
-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From jchilders_98 at yahoo.com  Thu Jan 23 08:44:19 2003
From: jchilders_98 at yahoo.com (J. Childers)
Date: Thu Jan 23 11:44:23 2003
Subject: [Spambayes] [Outlook2000 Install Problem] Ok, got it  :)
Message-ID: <20030123164419.83201.qmail@web13902.mail.yahoo.com>

Ahh, I had to do -two- installs: first SpamBayes *then* the addin.py.  For some reason I thought
the Outlook2000 piece included SB.

Now I have good messages in the trace and some new options in OL2K. Thanks Anthony!

Regards,

JC


From noreply at sourceforge.net  Thu Jan 23 10:42:57 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 23 13:53:43 2003
Subject: [Spambayes] 
 [ spambayes-Patches-639122 ] hammie: ignore emails older than n days
Message-ID: <E18bmJV-0001V5-00@sc8-sf-web2.sourceforge.net>

Patches item #639122, was opened at 2002-11-15 13:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702

Category: None
Group: None
Status: Open
Resolution: Later
Priority: 5
Submitted By: Jason Hildebrand (jdhildeb)
Assigned to: Neale Pickett (npickett)
Summary: hammie: ignore emails older than n days

Initial Comment:
Since your documentation stresses the importance of
training using only relatively recent emails, I thought
a good way to do this would be to have hammie do it for me.

So I added a new configuration option:

[Hammie]
# when training, hammie will ignore messages older than
# this number of days.
# i.e. set to 365 to ignore messages older than one year.
# Set to 0 to disable any filtering by date.
ignore_old_messages: 0

The patch also modifies Hammie to output the number of
messages it read/ignored for each mail file it processes.

This option might also prove useful for doing
incremental training (i.e. set up cron to train once a
week, and set ignore_old_messages to 7).


----------------------------------------------------------------------

>Comment By: T. Alexander Popiel (popiel)
Date: 2003-01-23 10:42

Message:
Logged In: YES 
user_id=632302

Parsing the topmost received header for the date is a very
valuable tool for maintaining limited database size.  It's a
key feature of my bulkgraph.py script (over and above
dealing with my non-standard everything vs. spam folders). 
Count this as another vote to include such filtering... even
though my peculiar folder setup precludes me from using
mboxtrain.
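
A sketch of the kind of date filtering discussed here, using the standard email package. It is neither the submitted patch nor bulkgraph.py; received_date and is_recent are hypothetical helpers, and keeping messages whose date cannot be parsed is an assumption of this example:

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_date(msg):
    """Return the date of the topmost Received header, or None."""
    received = msg.get_all("Received") or []
    if not received:
        return None
    # The timestamp follows the last ';' in a Received header.
    _, sep, date_part = received[0].rpartition(";")
    if not sep:
        return None
    try:
        when = parsedate_to_datetime(date_part.strip())
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return when


def is_recent(msg, max_age_days, now=None):
    """Mimic ignore_old_messages: 0 disables filtering by date."""
    if max_age_days == 0:
        return True
    when = received_date(msg)
    if when is None:
        return True  # assumption: keep messages that cannot be dated
    now = now or datetime.now(timezone.utc)
    return now - when <= timedelta(days=max_age_days)


SAMPLE = (
    "Received: from mail.example.org by mx.example.net;\n"
    "\tThu, 23 Jan 2003 10:42:57 -0800\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
sample_msg = message_from_string(SAMPLE)
```

With such a helper, a weekly cron job could train only on messages for which is_recent(msg, 7) is true.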

----------------------------------------------------------------------

Comment By: Neale Pickett (npickett)
Date: 2003-01-22 21:01

Message:
Logged In: YES 
user_id=619391

Jason, does the current mboxtrain.py script do enough of
this functionality for you, or would you still like to see
us work by the Received header?  I suspect it might be good
enough...

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702

From BPettersen at NAREX.com  Thu Jan 23 14:13:04 2003
From: BPettersen at NAREX.com (Bjorn Pettersen)
Date: Thu Jan 23 16:28:30 2003
Subject: [Spambayes] I did something stupid...
Message-ID: <60FB8BB7F0EFC7409B75EEEC13E2019201BFE19B@admin56.narex.com>

> From: Mark Hammond [mailto:mhammond@skippinet.com.au] 
> 
> > ...after setting up spambayes with Outlook XP (training, 
> > telling it to watch the Inbox and move spam), I decided 
> > the icons were too far to the right so I right-clicked 
> > on the toolbar, dragged them down to the next line, and 
> > then Outlook froze. In particular the customize dialog 
> > was unresponsive (although I could still move icons 
> > around on the 
>
> I will try to repro this, but am busy for the next few days.

Thanks! (I hope I didn't imply that this was urgent, I'm fully aware of
what I'm doing when I'm using pre-release software and I'm very grateful
for any time you want to spend on someone who's not contributing :-)
 
> You may like to try the customize dialog, and hitting "Reset" 
> on the toolbars.

Doing that, and re-running addin.py got me back up and running. When I
right clicked on the toolbar again (I just couldn't help myself <wink>),
the dialog box was again frozen, but this time I could close it with
Alt+F4. After I shut down and re-started Outlook, it looked to be
working.

> Otherwise, for the time being, wrap an exception handler around:
> 
> >   File "D:\Transfer\spambayes-1.0a1\Outlook2000\addin.py", 
> > line 470, in
> > _AddControl
> >     item = parent.Controls.Add(Type=control_type, Temporary=True)
> 
> And just ignore it for now.

I will try that if it fails again...

Looking a little closer at addin.py, it looks like I was doing something
you were trying to prevent (461-464):

   [...temporary Toolbars...]
   # Maybe we should consider making them permanent - this would then
   # allow the user to drag them around the toolbars and have them
   # stick. The downside is that should the user uninstall this addin
   # there is no clean way to remove the buttons.  Do we even care?

I would obviously not care <wink>.

Also, at line 395, the toolbar is named explicitly:

  toolbar = bars.Item("Standard")

whereas the second line of toolbar buttons is normally called
"Advanced", and the user could obviously have created custom toolbars
named anything they choose (not sure if this is relevant...)

-- bjorn


From noreply at sourceforge.net  Thu Jan 23 14:02:32 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 23 17:04:45 2003
Subject: [Spambayes] [ spambayes-Bugs-673388 ] pop3proxy storage
Message-ID: <E18bpQe-0001Cg-00@sc8-sf-web3.sourceforge.net>

Bugs item #673388, was opened at 2003-01-23 23:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673388&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: François Granger (fgranger)
Assigned to: Nobody/Anonymous (nobody)
Summary: pop3proxy storage

Initial Comment:
I had a look in the pop3proxy folders, and I found these strange files. They are missing the headers and maybe part of the message.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673388&group_id=61702

From noreply at sourceforge.net  Thu Jan 23 14:04:35 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 23 17:04:52 2003
Subject: [Spambayes] [ spambayes-Bugs-673390 ] pop3proxy storage 2nd file
Message-ID: <E18bpSd-0007OG-00@sc8-sf-web1.sourceforge.net>

Bugs item #673390, was opened at 2003-01-23 23:04
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: François Granger (fgranger)
Assigned to: Nobody/Anonymous (nobody)
Summary: pop3proxy storage 2nd file

Initial Comment:
Other file missing header

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673390&group_id=61702

From francois.granger at free.fr  Thu Jan 23 23:05:00 2003
From: francois.granger at free.fr (François Granger)
Date: Thu Jan 23 17:06:27 2003
Subject: [Spambayes] Strange issue storage with pop3proxy
Message-ID: <a05200f27ba5616e7104d@[192.168.1.20]>

Sorry, I submitted it through sourceforge, but the file upload did not work.

I got partial messages in files missing at least header and maybe 
part of the message.

I enclose two messages here.
-- 
Recently using MacOSX.......
-------------- next part --------------
Skipped content of type multipart/appledouble
-------------- next part --------------
Skipped content of type multipart/appledouble
From T.A.Meyer at massey.ac.nz  Fri Jan 24 15:23:17 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Jan 23 21:24:04 2003
Subject: [Spambayes] Outlook: new folder selector code
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D387@its-xchg4.massey.ac.nz>

[Mark]
> I have just checked in some significant changes to the 
> "folder selector" dialog
> Please let me know if you have any problems.

Sadly, I do :)  There was more than one, but mostly it was when running from a fresh install.  The defaults provided were entryid only, not (storeid, entryid).  There were also some 'None' entries, which caused problems.
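
The cleanup described above can be sketched roughly as follows. This is only an illustration: `normalize_folder_ids` and the ID strings are hypothetical names, not the add-in's real code, and it assumes a bare entry ID should simply be paired with some default store ID.

```python
def normalize_folder_ids(raw_ids, default_store_id):
    """Return a list of (store_id, entry_id) tuples built from raw_ids.

    Hypothetical helper for illustration only.
    """
    normalized = []
    for item in raw_ids:
        if item is None:
            # Skip stray None entries left over in the saved defaults.
            continue
        if isinstance(item, (tuple, list)):
            store_id, entry_id = item
        else:
            # A bare entry ID: qualify it with the default store.
            store_id, entry_id = default_store_id, item
        if store_id is None or entry_id is None:
            # A half-filled pair is as useless as a None entry.
            continue
        normalized.append((store_id, entry_id))
    return normalized


saved = ["ENTRY-1", None, ("STORE-B", "ENTRY-2")]
print(normalize_folder_ids(saved, "STORE-A"))
```

Whether silently dropping bad entries is better than raising an error is a design choice; dropping them keeps a fresh install usable, at the cost of hiding corrupt configuration.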

I'll email Mark with my fixed versions and leave it to him to check them in.

Nice and fast though :p

Cheers,
Tony

From T.A.Meyer at massey.ac.nz  Fri Jan 24 15:35:37 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Jan 23 21:36:16 2003
Subject: [Spambayes] I did something stupid...
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D389@its-xchg4.massey.ac.nz>

> Looking a little closer at addin.py, it looks like I was 
> doing something
> you were trying to prevent (461-464):
> 
>    [...temporary Toolbars...]
>    # Maybe we should consider making them permanent - this would then
>    # allow the user to drag them around the toolbars and have them
>    # stick. The downside is that should the user uninstall this addin
>    # there is no clean way to remove the buttons.  Do we even care?

Mark, could the unregister code not delete any permanent buttons?  Along the same lines, why does the addin not show up in the COM add-ins list in the Outlook prefs?  (Tools->Options->Other->Advanced->COM Add-ins) - is this because it's a Python script and not an .exe or .dll?  Is this just a packaging issue?

> Also, at 395, the name of the Toolbar is named explicitly:
> 
>   toolbar = bars.Item("Standard")
> 
> whereas the second line of toolbar buttons is normally called
> "Advanced", and the user could obviously have created custom toolbars
> named anything they choose (not sure if this is relevant...)

I *think*, from looking at addin.py, that this would still be needed even if they became permanent - they would default to the "Standard" toolbar (they have to start somewhere!).

Cheers,
Tony

From noreply at sourceforge.net  Thu Jan 23 17:59:23 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 23 21:39:47 2003
Subject: [Spambayes] 
 [ spambayes-Patches-673754 ] Outlook exception when starting not in
 the inbox
Message-ID: <E18bt7r-00036g-00@sc8-sf-web4.sourceforge.net>

Patches item #673754, was opened at 2003-01-24 14:59
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Outlook exception when starting not in the inbox

Initial Comment:
If Outlook is started not in the inbox (any mail folder?) - 
in Outlook Today, for example - an exception is caused 
when you first switch to a mail folder.  The exception 
doesn't seem to cause any errors, but better safe than 
sorry :)

Here's the trace:

pythoncom error: Python error invoking COM method.
Traceback (most recent call last):
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 549, in OnFolderSwitch
    self.but_recover_as.Visible = show_recover_as
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 368, in __getattr__
    raise AttributeError, "'%s' object has no attribute '%s'" % (repr(self), attr)
exceptions.AttributeError: '<win32com.client.COMEventClass>' object has no attribute 'but_recover_as'

----------------------------------------------------------------------


From noreply at sourceforge.net  Thu Jan 23 18:33:59 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 23 21:39:53 2003
Subject: [Spambayes] 
 [ spambayes-Patches-673754 ] Outlook exception when starting not in
 the inbox
Message-ID: <E18btfL-0002rW-00@sc8-sf-web3.sourceforge.net>

Patches item #673754, was opened at 2003-01-24 14:59
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Outlook exception when starting not in the inbox

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2003-01-24 15:33

Message:
Logged In: YES 
user_id=552329

Akk...Next time more testing before submitting a patch :)  
That didn't work well at all (same problem, though).  I think 
that this will, though.  There's a comment in addin.py about 
an Outlook bug with OnNewExplorer, but the code doesn't 
seem to do what the comments say.  The first 
OnNewExplorer call is skipped (via the do_activate bool), so 
it's safe to have setup in onactivate and not onselection.

Anyway, this works and fixes the problem on my system.  I'll 
leave it to others to check theirs.
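
For readers without the patch at hand, the failure mode in the trace (a folder-switch event arriving before the buttons exist) can be modelled in plain Python. This is a sketch under assumptions, not the submitted patch: `Button`, `ExplorerEvents`, `SetupUI` and `have_setup_ui` are stand-in names, and no win32com objects are involved.

```python
class Button:
    """Stand-in for an Outlook command bar button."""

    def __init__(self, caption):
        self.Caption = caption
        self.Visible = False


class ExplorerEvents:
    """Stand-in for the add-in's per-explorer event handler."""

    def __init__(self):
        self.have_setup_ui = False

    def SetupUI(self):
        # Create the buttons exactly once, whichever event fires first.
        self.but_recover_as = Button("Recover from Spam")
        self.have_setup_ui = True

    def OnActivate(self):
        if not self.have_setup_ui:
            self.SetupUI()

    def OnFolderSwitch(self, show_recover_as):
        # Guard: starting in "Outlook Today" can deliver this event
        # before the UI has been created.
        if not self.have_setup_ui:
            self.SetupUI()
        self.but_recover_as.Visible = show_recover_as


events = ExplorerEvents()
events.OnFolderSwitch(True)   # no AttributeError, even without OnActivate
print(events.but_recover_as.Visible)
```

The point is only that the attribute must be created before any handler touches it; where the real add-in performs that setup is for the patch to decide.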

----------------------------------------------------------------------


From jpstbelt at jla.vsnl.net.in  Fri Jan 24 13:54:20 2003
From: jpstbelt at jla.vsnl.net.in (Jai Pall)
Date: Fri Jan 24 03:35:56 2003
Subject: [Spambayes] re.post our mail to your list
Message-ID: <000c01c2c381$ff64f880$09d941db@jps>

Dear Sir,

Pl. post our mail to your list.


Thanks

Jai Pall
From francois.granger at free.fr  Fri Jan 24 09:39:21 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Fri Jan 24 03:39:29 2003
Subject: [Spambayes] Another example
Message-ID: <a05200f2dba56a76d2621@[192.168.1.20]>

Inside the attached archive, there are two versions of the same mail. 
All the mails with issues come from the same mailing list. The list 
software is Sympa: <http://www.sympa.org/>. All of these mails come 
through the same POP server: pop.laposte.net. I'll try to get them 
through another POP server to see if it makes any difference.

The current status of pop3proxy is as follows:

========================================
POP3 proxy running on 127.0.0.1:110, 127.0.0.2:110, 127.0.0.3:110, 
127.0.0.4:110, proxying to pop.nerim.net:110, pop.free.fr:110, 
altern.org:110, pop.laposte.net:110.
Active POP3 conversations: 0.
POP3 conversations this session: 457.
Emails classified this session: 31 spam, 1187 ham, 32 unsure.
Total emails trained: Spam: 120 Ham: 77
========================================

The terminal displayed:

========================================
[...]
adding message 1043357721 to corpus
placing 1043357721 in corpus cache
adding 1043361256 to corpus
storing 1043361256
adding message 1043361256 to corpus
placing 1043361256 in corpus cache
adding 1043361257 to corpus
storing 1043361257
adding message 1043361257 to corpus
placing 1043361257 in corpus cache
adding 1043361262 to corpus
storing 1043361262
[...]
========================================


The "Eudora.txt" one is a copy and paste from the Eudora mbox file.

The "1043361257" one is the file that pop3proxy stored in its folder 
before I reviewed it.

There is really something wrong here. I guess that the email module 
still has some issues with the Microsoft XML format?
And shouldn't file storage be done with the raw data?
-- 
Recently using MacOSX.......
-------------- next part --------------
Skipped content of type multipart/appledouble
From mhammond at skippinet.com.au  Sat Jan 25 00:59:12 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Jan 24 09:00:05 2003
Subject: [Spambayes] I did something stupid...
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D389@its-xchg4.massey.ac.nz>
Message-ID: <005901c2c3b0$c70529e0$530f8490@eden>

[Tony]
> Mark, could the unregister code not delete any permanent
> buttons?

It could, but it would need to start outlook to do so.  I guess that isn't
so bad when I type it here (as opposed to just musing over it!)

> Along the same lines, why does the addin not show
> up in the COM add-ins list in the Outlook prefs?
> (Tools->Options->Other->Advanced->COM Add-ins) - is this
> because it's a Python script and not an .exe or .dll?  Is
> this just a packaging issue?

Actually, NFI about this.  The docs even say you can (so we have a bug!
<wink>)

> > Also, at 395, the name of the Toolbar is named explicitly:
> >
> >   toolbar = bars.Item("Standard")
> >
> > whereas the second line of toolbar buttons is normally called
> > "Advanced", and the user could obviously have created
> > custom toolbars named anything they choose (not sure if this
> > is relevant...)

It may be, but the code tries to handle this.  Note that _AddControl says:

        item = self.CommandBars.FindControl(
                        Type = control_type,
                        Tag = item_attrs['Tag'])

So we actually search all command bars for the specified tag.  The tag is
assumed unique.  If this command returns None, then the toolbar passed (ie,
"Standard") is where the items are to be added.

At least this is the intent <wink>.  I've added comments to this effect.
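
The lookup order described above (search every command bar for the tag first, fall back to the named toolbar only when nothing is found) can be modelled without Outlook. The classes below are fakes written for this illustration; only the `FindControl(Type=..., Tag=...)` call shape is taken from the snippet quoted above, and `add_control` is a hypothetical stand-in for `_AddControl`.

```python
class FakeControl:
    def __init__(self, tag):
        self.Tag = tag


class FakeBar:
    def __init__(self, name):
        self.Name = name
        self.controls = []

    def Add(self, tag):
        control = FakeControl(tag)
        self.controls.append(control)
        return control


class FakeCommandBars:
    """Very small model of a CommandBars collection."""

    def __init__(self, names):
        self.bars = dict((name, FakeBar(name)) for name in names)

    def Item(self, name):
        return self.bars[name]

    def FindControl(self, Type=None, Tag=None):
        # Search *all* bars, not just one, and return None on a miss.
        for bar in self.bars.values():
            for control in bar.controls:
                if control.Tag == Tag:
                    return control
        return None


def add_control(command_bars, default_bar_name, tag):
    """Reuse a control with this tag if any bar has one, else create it."""
    item = command_bars.FindControl(Tag=tag)
    if item is None:
        item = command_bars.Item(default_bar_name).Add(tag)
    return item


bars = FakeCommandBars(["Standard", "Advanced"])
moved = bars.Item("Advanced").Add("SpamBayes.DeleteAsSpam")  # user moved it
found = add_control(bars, "Standard", "SpamBayes.DeleteAsSpam")
print(found is moved, len(bars.Item("Standard").controls))
```

This only works as long as the tag really is unique, which is the assumption stated above.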

> I *think*, from looking at addin.py that this would still be
> needed even if they became permanent - they would default to
> the "Standard" toolbar (they have to start somewhere!).

Yeah, we do need to do something.  I was kinda hoping that the worst that
would happen is a couple of dead buttons should the user be so brain-dead
they choose to uninstall our product <wink>.

Add 2 bugs and assign them to me - one for the doc/code for the plugin, and
the other that we leave dead buttons on uninstall.

Mark.


From noreply at sourceforge.net  Fri Jan 24 01:20:51 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jan 24 09:51:44 2003
Subject: [Spambayes] [ spambayes-Bugs-673892 ] Missing compat with 22 code
Message-ID: <E18c015-0005t8-00@sc8-sf-web2.sourceforge.net>

Bugs item #673892, was opened at 2003-01-24 10:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: François Granger (fgranger)
Assigned to: Nobody/Anonymous (nobody)
Summary: Missing compat with 22 code

Initial Comment:
Mac OS X 10.2.3, built-in Python 2.2

In the pop3proxy web interface, I click on the Config link.

Traceback (most recent call last):
  File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator
    getattr(plugin, name)(**params)
  File "/Volumes/OS99/spambayes/OptionConfig.py", line 219, in onConfig
    isFirstRow = True
NameError: global name 'True' is not defined

Adding lines 44-48 of pop3proxy.py at line 33 of OptionConfig.py solves the problem.
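
The pop3proxy.py lines referred to are not quoted here, so the following is only a guess at the kind of compatibility shim involved: Python 2.2.0 has no `True`/`False` names (they arrived in 2.2.1), and scripts of that era defined them when missing. The usual spelling then was a try/except NameError around an assignment to `True` and `False`, which current Python refuses to compile, so this sketch shows the same idea on an ordinary dict standing in for a module namespace. `ensure_bool_names` is a hypothetical name.

```python
def ensure_bool_names(namespace):
    """Define True/False in `namespace` only if they are missing."""
    namespace.setdefault("True", 1)
    namespace.setdefault("False", 0)
    return namespace


# A namespace that lacks the names, as on a plain Python 2.2.0.
old_globals = {}
ensure_bool_names(old_globals)
print(old_globals["True"], old_globals["False"])

# A namespace that already has them is left untouched.
new_globals = {"True": True, "False": False}
ensure_bool_names(new_globals)
print(new_globals["True"] is True)
```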

----------------------------------------------------------------------


From noreply at sourceforge.net  Fri Jan 24 09:13:07 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jan 24 12:16:23 2003
Subject: [Spambayes] [ spambayes-Bugs-673892 ] Missing compat with 22 code
Message-ID: <E18c7O7-0001O9-00@sc8-sf-web3.sourceforge.net>

Bugs item #673892, was opened at 2003-01-24 09:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: François Granger (fgranger)
>Assigned to: Richie Hindle (richiehindle)
Summary: Missing compat with 22 code

----------------------------------------------------------------------

>Comment By: Richie Hindle (richiehindle)
Date: 2003-01-24 17:13

Message:
Logged In: YES 
user_id=85414

I'll sort this out - thanks, François.


----------------------------------------------------------------------


From neale at woozle.org  Fri Jan 24 10:37:34 2003
From: neale at woozle.org (Neale Pickett)
Date: Fri Jan 24 13:38:32 2003
Subject: [Spambayes] re.post our mail to your list
In-Reply-To: <000c01c2c381$ff64f880$09d941db@jps> ("Jai Pall"'s message of
 "Fri, 24 Jan 2003 13:54:20 +0530")
References: <000c01c2c381$ff64f880$09d941db@jps>
Message-ID: <w53d6mmtmvl.fsf@woozle.org>

"Jai Pall" <jpstbelt@jla.vsnl.net.in> writes:

> Dear Sir,
>
> Pl. post our mail to your list.

Looks like it got posted :)

Neale

From db3l at fitlinxx.com  Fri Jan 24 14:51:52 2003
From: db3l at fitlinxx.com (David Bolen)
Date: Fri Jan 24 14:52:14 2003
Subject: [Spambayes] Re: Outlook: new folder selector code
References: <1ED4ECF91CDED24C8D012BCF2B034F1318D387@its-xchg4.massey.ac.nz>
Message-ID: <u65sewckn.fsf@fitlinxx.com>

"Meyer, Tony" <T.A.Meyer@massey.ac.nz> writes:

> Sadly, I do :) There was more than one, but mostly it was when running
> from a fresh install.  The defaults provided were entryid only, not
> (storeid, entryid).  There were also some 'None' entries, which
> caused problems.
> 
> I'll email Mark with my fixed versions and leave it to him to check them in.

I'm also having problems with an exchange server after updating to the
latest from CVS - using my existing configuration it failed when
trying to establish the filtering hooks.  But even from a fresh
install and empty database, any attempt to work with folders generates
an exception - for me it's in NormalizeID in msgstore.py:
"AssertionError: We expect fully qualified IDs"

Given Mark's comment about being tied up currently, if you were
amenable to sending me a copy of your patches, I could also give them
a shot on my system.

-- David


From jh at web.de  Fri Jan 24 21:39:18 2003
From: jh at web.de (Juergen Hermann)
Date: Fri Jan 24 15:38:58 2003
Subject: [Spambayes] Training unsure msgs only
Message-ID: <E18cAan-0005WL-00@smtp.web.de>

Hi!

Once you have trained on a certain number of msgs, it's enough to train 
only the unsure msgs. So what do you think of a "review unsure 
only" link in the pop3proxy (defaulting the two other categories to 
Discard)?

Another possibility would be to have 3 more buttons (train spam / ham / 
unsure only).


Ciao, Jürgen


From richie at entrian.com  Fri Jan 24 20:56:55 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 24 15:57:36 2003
Subject: [Spambayes] Training unsure msgs only
In-Reply-To: <E18cAan-0005WL-00@smtp.web.de>
References: <E18cAan-0005WL-00@smtp.web.de>
Message-ID: <j0a33vsv71mi3rf8afd5dc46l0phhf489g@4ax.com>

Hi Juergen,

> Once you have trained on a certain number of msgs, it's enough to train 
> only the unsure msgs. So what do you think of a "review unsure 
> only" link in the pop3proxy (defaulting the two other categories to 
> Discard)?
> 
> Another possibility would be to have 3 more buttons (train spam / ham / 
> unsure only).

You can click the 'Discard' headers above the ham and spam lists to set all
those messages to Discard.  It's two extra clicks, but IMHO that's better
than an extra piece of user interface - I don't want to make it too busy.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Fri Jan 24 21:14:33 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 24 16:15:06 2003
Subject: [Spambayes] Training unsure msgs only
In-Reply-To: <E18cAzi-0004gG-00@smtp.web.de>
References: <j0a33vsv71mi3rf8afd5dc46l0phhf489g@4ax.com>
	<E18cAzi-0004gG-00@smtp.web.de>
Message-ID: <o2b33v452eh2sb0bnjsni6qp88bkcm2rge@4ax.com>


> Maybe they're described in the manual I did not care to read. ;)

They're in the one I did not care to write (yet?)

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Fri Jan 24 21:21:20 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 24 16:21:56 2003
Subject: [Spambayes] Re: Another example
In-Reply-To: <a05200f2dba56a76d2621@[192.168.1.20]>
References: <a05200f2dba56a76d2621@[192.168.1.20]>
Message-ID: <a7b33v0daetosk1tpgd1mrp4e6tskge482@4ax.com>

Hi François,

> Inside the attached archive, there are two versions of the same mail. 

Bizarre.  It's as though it's coming in halfway through a message, and
deciding that the message body up to the first CRLFCRLF is the headers.
This:

> adding message 1043357721 to corpus

implies that you're running with "verbose: True", which must mean you have
a _pop3proxy.log - next time you receive such a broken email, could you
email me your _pop3proxy.log?  (It has your password in it, so you might want
to edit that out, but if you do, could you try to use a binary editor that
won't change any line-ending characters?)  Could you zip it up before
sending, again to prevent anything messing with the line endings?

> I guess that the email module 
> still has some issues with the Microsoft XML format?
> And shouldn't file storage be done with the raw data?

The proxy doesn't use the email module to add its headers, so that's not
the problem.  And the storage *is* done with the raw data.
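
To make the point concrete, here is one way a header can be added to a raw message without parsing it: split at the first CRLFCRLF and append the new line to whatever came before it. This is a sketch, not pop3proxy's code; `add_header` and the `X-Example-Class` header name are made up for the illustration. It also shows the symptom under discussion: if the bytes start halfway through a message, the text before the first blank line is treated as the headers.

```python
CRLF = b"\r\n"


def add_header(raw, name, value):
    """Insert 'name: value' at the end of the header block of raw bytes."""
    header_line = ("%s: %s" % (name, value)).encode("ascii") + CRLF
    head, separator, body = raw.partition(CRLF + CRLF)
    if not separator:
        # No blank line at all: leave the data alone in this sketch.
        return raw
    return head + CRLF + header_line + CRLF + body


whole = b"Subject: hi\r\n\r\nbody text\r\n"
print(add_header(whole, "X-Example-Class", "ham"))

# Data that starts mid-message: the first paragraph is mistaken for headers.
partial = b"end of a paragraph\r\n\r\nnext paragraph\r\n"
print(add_header(partial, "X-Example-Class", "ham"))
```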

> According to the text editor I used, they are 
> all CR+LF files, whatever the mail server or the source mailer app. 
> It looks strange that pop3proxy stores them that way on MacOS X?

It stores the messages exactly as they come over the wire from the POP3
server, and POP3 uses CRLF as the line ending regardless of the platform.
They're stored on the disk in binary mode, because you never know whether
there are unencoded binary characters in there.
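
A small demonstration of why binary mode matters here, assuming nothing about pop3proxy's actual storage code (`store_raw` and `load_raw` are hypothetical helpers): bytes written with mode "wb" come back identical, CRLF line endings and stray 8-bit characters included, on any platform.

```python
import os
import tempfile


def store_raw(directory, key, raw):
    """Write the message bytes exactly as received; return the file path."""
    path = os.path.join(directory, key)
    with open(path, "wb") as handle:   # binary: no newline translation
        handle.write(raw)
    return path


def load_raw(path):
    with open(path, "rb") as handle:
        return handle.read()


# CRLF line endings plus an unencoded 8-bit byte, as a POP3 server might send.
message = b"Subject: test\r\n\r\nbody with a raw byte: \xe7\r\n"
with tempfile.TemporaryDirectory() as folder:
    saved_path = store_raw(folder, "1043361257", message)
    print(load_raw(saved_path) == message)
```

Opening the same file in text mode could translate line endings or fail to decode the 8-bit byte, which is why a text editor reporting "CR+LF" on Mac OS X is expected rather than a bug.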

-- 
Richie Hindle
richie@entrian.com


From noreply at sourceforge.net  Fri Jan 24 12:10:28 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jan 24 16:23:51 2003
Subject: [Spambayes] [ spambayes-Bugs-673892 ] Missing compat with 22 code
Message-ID: <E18cA9k-0000h7-00@sc8-sf-web3.sourceforge.net>

Bugs item #673892, was opened at 2003-01-24 09:20
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=673892&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: François Granger (fgranger)
Assigned to: Richie Hindle (richiehindle)
Summary: Missing compat with 22 code


From noreply at sourceforge.net  Fri Jan 24 13:02:48 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jan 24 16:23:57 2003
Subject: [Spambayes] [ spambayes-Bugs-672495 ] Files not installed by setup.py
Message-ID: <E18cAyO-0003kI-00@sc8-sf-web1.sourceforge.net>

Bugs item #672495, was opened at 2003-01-22 15:33
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=672495&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Jürgen Hermann (jhermann)
>Assigned to: Richie Hindle (richiehindle)
Summary: Files not installed by setup.py

Initial Comment:
Patch:

===================================================================
RCS file: /cvsroot/spambayes/spambayes/setup.py,v
retrieving revision 1.13
diff -u -r1.13 setup.py
--- setup.py    17 Jan 2003 06:45:36 -0000      1.13
+++ setup.py    22 Jan 2003 15:28:05 -0000
@@ -39,8 +39,12 @@
            'pop3proxy.py',
            'proxytrainer.py',
            'proxytee.py',
+           'OptionConfig.py',
           ],
-        packages = [ 'spambayes', ],
+        packages = [
+           'spambayes',
+           'spambayes.resources',
+        ],
         classifiers = [
             'Development Status :: 4 - Beta',
             'Environment :: Console',

----------------------------------------------------------------------

>Comment By: Richie Hindle (richiehindle)
Date: 2003-01-24 21:02

Message:
Logged In: YES 
user_id=85414

spambayes.resources is now installed, and OptionConfig.py
now lives in the spambayes package.  Thanks, Jürgen.


----------------------------------------------------------------------

Comment By: Jürgen Hermann (jhermann)
Date: 2003-01-22 20:22

Message:
Logged In: YES 
user_id=39128

The current problem is the import in line 153 of pop3proxy:

from OptionConfig import OptionsConfigurator

Moving OptionConfig into the package is surely the best fix,
including adapting the above import.

----------------------------------------------------------------------

Comment By: Richie Hindle (richiehindle)
Date: 2003-01-22 18:43

Message:
Logged In: YES 
user_id=85414

You're dead right about spambayes.resources, but I'm not
convinced we should be installing OptionConfig.py now that
it's been folded into the main pop3proxy web interface.  I asked
on the list whether anyone thought we should leave it in with the
other scripts and got no replies.  I'm tempted to move it into the
spambayes package, from where pop3proxy.py can import it.


----------------------------------------------------------------------


From jh at web.de  Fri Jan 24 22:05:08 2003
From: jh at web.de (Juergen Hermann)
Date: Fri Jan 24 16:27:32 2003
Subject: [Spambayes] Training unsure msgs only
In-Reply-To: <j0a33vsv71mi3rf8afd5dc46l0phhf489g@4ax.com>
Message-ID: <E18cAzi-0004gG-00@smtp.web.de>

On Fri, 24 Jan 2003 20:56:55 +0000, Richie Hindle wrote:

>You can click the 'Discard' headers above the ham and spam lists to set all
>those messages to Discard.  It's two extra clicks, but IMHO that's better
>than an extra piece of user interface - I don't want to make it too busy.

Ugh, I did not notice them until now. Maybe they're described in the manual I 
did not care to read. ;)


Ciao, Jürgen


From noreply at sourceforge.net  Fri Jan 24 14:29:25 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri Jan 24 17:50:08 2003
Subject: [Spambayes] 
 [ spambayes-Patches-639122 ] hammie: ignore emails older than n days
Message-ID: <E18cCKD-0007fp-00@sc8-sf-web1.sourceforge.net>

Patches item #639122, was opened at 2002-11-15 15:47
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639122&group_id=61702

Category: None
Group: None
Status: Open
Resolution: Later
Priority: 5
Submitted By: Jason Hildebrand (jdhildeb)
Assigned to: Neale Pickett (npickett)
Summary: hammie: ignore emails older than n days

Initial Comment:
Since your documentation stresses the importance of
training using only relatively recent emails, I thought
a good way to do this would be to have hammie do it for me.

So I added a new configuration option:

[Hammie]
# when training, hammie will ignore messages older than this number of days.
# i.e. set to 365 to ignore messages older than one year.
# Set to 0 to disable any filtering by date.
ignore_old_messages: 0

The patch also modifies Hammie to output the number of
messages it read/ignored for each mail file it processes.

This option might also prove useful for doing
incremental training (i.e. set up cron to train once a
week, and set ignore_old_messages to 7).
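
The patch itself is not reproduced in this notification, so the following is only a sketch of how such an age filter could work, assuming the date is taken from the topmost Received header (the text after its last semicolon). `is_too_old` is a hypothetical function; `max_age_days` plays the role of `ignore_old_messages`, with 0 meaning "no filtering".

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def is_too_old(msg_text, max_age_days, now=None):
    """Return True if the message's topmost Received date is too old."""
    if max_age_days <= 0:
        return False                      # 0 disables date filtering
    msg = message_from_string(msg_text)
    received = msg.get("Received")        # first (topmost) occurrence
    if received is None or ";" not in received:
        return False                      # no usable date: keep the message
    try:
        stamp = parsedate_to_datetime(received.rsplit(";", 1)[1].strip())
    except (TypeError, ValueError):
        return False                      # unparseable date: keep the message
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now - stamp > timedelta(days=max_age_days)


sample = (
    "Received: from a.example by b.example; Tue, 21 Jan 2003 10:00:00 +0000\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
today = datetime(2003, 1, 24, tzinfo=timezone.utc)
print(is_too_old(sample, 7, now=today), is_too_old(sample, 2, now=today))
```

Keeping messages whose date cannot be determined is a deliberate choice in this sketch; a stricter filter could skip them instead.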


----------------------------------------------------------------------

>Comment By: Jason Hildebrand (jdhildeb)
Date: 2003-01-24 16:29

Message:
Logged In: YES 
user_id=173690

Unfortunately, I haven't had time to update to a more recent
spambayes; I'm still using a version from last November. 
Since this version is working well for me, I'm not terribly
interested in messing with it -- since I know things have
changed considerably in CVS since then.   So I'm in a poor
position to judge whether the functionality mboxtrain.py
offers is "good enough" -- I'll have to leave it up to
others to comment on.

----------------------------------------------------------------------

Comment By: T. Alexander Popiel (popiel)
Date: 2003-01-23 12:42

Message:
Logged In: YES 
user_id=632302

Parsing the topmost received header for the date is a very
valuable tool for maintaining limited database size.  It's a
key feature of my bulkgraph.py script (over and above
dealing with my non-standard everything vs. spam folders). 
Count this as another vote to include such filtering... even
though my peculiar folder setup precludes me from using
mboxtrain.

----------------------------------------------------------------------

Comment By: Neale Pickett (npickett)
Date: 2003-01-22 23:01

Message:
Logged In: YES 
user_id=619391

Jason, does the current mboxtrain.py script do enough of
this functionality for you, or would you still like to see
us work by the Received header?  I suspect it might be good
enough...

----------------------------------------------------------------------


From richie at entrian.com  Sat Jan 25 00:02:30 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 24 19:03:04 2003
Subject: [Spambayes] Alpha 2 Release?
Message-ID: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>


I'd like to suggest we make another alpha release next week.  Lots has
happened since alpha 1 (some of which is important to the Linux Journal
articles, which come out first thing in February):

 o Neale's work on plugging into Mutt and other mail clients.
 o Integration of Tim's web configuration page into pop3proxy.py, so you
   no longer need to know about bayescustomize.ini to use the POP3 proxy.
 o The ability to run multiple POP3 proxies on the same port.
 o The ability to limit connections to the web interface to localhost.
 o Various sundry improvements and bugfixes (including MacOS X support).

Neale, do you think your Mutt edits will be ready by the middle of next
week?  I haven't tried them but it sounds like they're pretty much there?

Here's what I think needs doing before another release:

 o I need to get to the bottom of François' bizarre pop3proxy problems.
 o We need to document Neale's Mutt work (or just provide an example
   muttrc)
 o People need to test the up-to-date version!

Skip, do you think the two of us can merge proxytrainer.py into
pop3proxy.py before we make this release - hopefully the problems you were
having with pop3proxy.py are cleared up now?  It would be nice not to ship
the duplicated code.  I'm happy to do most of this if you're short of time
- I've already incorporated a couple of your changes.

Any other suggestions for what might need doing?

Anthony, is there anything release-wise that needs to be done, or is it
just a matter of running "setup.py sdist --formats zip,gztar" on a fresh
checkout?

release-early-and-release-often-ly yrs,

-- 
Richie Hindle
richie@entrian.com


From jh at web.de  Sat Jan 25 02:00:06 2003
From: jh at web.de (Juergen Hermann)
Date: Fri Jan 24 19:59:48 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
Message-ID: <E18cEf6-0007Xx-00@smtp.web.de>

On Sat, 25 Jan 2003 00:02:30 +0000, Richie Hindle wrote:

>Any other suggestions for what might need doing?

On the point of documentation, do we have any (beyond the README)? ;)

Do we need more?

I could certainly help with publishing technology (esp. wikis and DocBook), 
less with content itself (too much time needed).


Ciao, Jürgen


From skip at pobox.com  Fri Jan 24 19:27:53 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 24 20:27:58 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
Message-ID: <15921.59417.459454.66175@montanaro.dyndns.org>


    Richie> Skip, do you think the two of us can merge proxytrainer.py into
    Richie> pop3proxy.py before we make this release - hopefully the
    Richie> problems you were having with pop3proxy.py are cleared up now?

Yes, I believe we should be able to merge them.  I'll give pop3proxy another
whirl tonight or tomorrow and let you know what I encounter.

Skip

From francois.granger at free.fr  Sat Jan 25 12:03:40 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Sat Jan 25 06:03:51 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
Message-ID: <a05200f34ba581ccdf835@[192.168.1.20]>

At 00:02 +0000 25/01/2003, in message [Spambayes] Alpha 2 Release?, 
Richie Hindle wrote:

>  o The ability to run multiple POP3 proxies on the same port.

This works great.

>  o Various sundry improvements and bugfixes (including MacOS X support).

Anything needed on this side for inclusion in documentation?

>  o I need to get to the bottom of François' bizarre pop3proxy problems.

Just sent my "_pop3proxy.log" in separate mail to you. But my current 
conclusion is that I am talking to a bad French POP server combined with 
Outlook "special features" ;-)
But, there are at least two versions of Outlook involved.

I would be happy to give you an account on this server for testing if 
you want. This is easily doable since I have a login there that I 
don't use much.

-- 
Recently using MacOSX.......

From francois.granger at free.fr  Sat Jan 25 12:06:44 2003
From: francois.granger at free.fr (=?iso-8859-1?Q?Fran=E7ois?= Granger)
Date: Sat Jan 25 06:06:49 2003
Subject: [Spambayes] Training unsure msgs only
In-Reply-To: <o2b33v452eh2sb0bnjsni6qp88bkcm2rge@4ax.com>
References: <j0a33vsv71mi3rf8afd5dc46l0phhf489g@4ax.com>
 <E18cAzi-0004gG-00@smtp.web.de>
 <o2b33v452eh2sb0bnjsni6qp88bkcm2rge@4ax.com>
Message-ID: <a05200f35ba581febb33a@[192.168.1.20]>

At 21:14 +0000 24/01/2003, in message Re: [Spambayes] Training unsure 
msgs only, Richie Hindle wrote:
>  > Maybe they're described in the manual I did not care to read. ;)
>
>They're in the one I did not care to write (yet?)

But clearly written at the top of the screen:

>  "Click one of the Discard / Defer / Ham / Spam headers
>  to check all of the buttons in that section in one go."

;-)
-- 
Recently using MacOSX.......

From richie at entrian.com  Sat Jan 25 12:12:25 2003
From: richie at entrian.com (Richie Hindle)
Date: Sat Jan 25 07:13:02 2003
Subject: [Spambayes] Training unsure msgs only
In-Reply-To: <a05200f35ba581febb33a@[192.168.1.20]>
References: <j0a33vsv71mi3rf8afd5dc46l0phhf489g@4ax.com>
	<E18cAzi-0004gG-00@smtp.web.de> <o2b33v452eh2sb0bnjsni6qp88bkcm2rge@4ax.com>
	<a05200f35ba581febb33a@[192.168.1.20]>
Message-ID: <plv43vcrot9a6s0ag4lv6fvbuvheh6n31q@4ax.com>


> But clearly written at the top of the screen:
> 
> >  "Click one of the Discard / Defer / Ham / Spam headers
> >  to check all of the buttons in that section in one go."

(Note to self: return Guido's time machine keys).

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Sat Jan 25 12:51:21 2003
From: richie at entrian.com (Richie Hindle)
Date: Sat Jan 25 07:51:57 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <E18cEf6-0007Xx-00@smtp.web.de>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<E18cEf6-0007Xx-00@smtp.web.de>
Message-ID: <et153v4k7kcb3jgqn14o3vhu3u4ect8aum@4ax.com>

Hi Juergen,

> On the point of documentation, do we have any (beyond the README)? ;)

Only what's on the website, which is thin on practical details.  I'm hoping
to write an installation and setup guide for the POP3 proxy and web
interface and add that to the website - I've already done this for my Linux
Journal article, so it will just be an updated version of that.  Unless
anyone else is already doing this?

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Sat Jan 25 12:57:50 2003
From: richie at entrian.com (Richie Hindle)
Date: Sat Jan 25 07:58:27 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f34ba581ccdf835@[192.168.1.20]>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<a05200f34ba581ccdf835@[192.168.1.20]>
Message-ID: <si153vcd2411c0fuouu3i80g4eda8kr6qc@4ax.com>

Hi François,

> >  o Various sundry improvements and bugfixes (including MacOS X support).
> 
> Anything needed on this side for inclusion in documentation ?

I don't think so, unless there's anything MacOS-X-specific that you've run
across that you think people need to know?

> Just sent my "_pop3proxy.log" in separate mail to you.

Aha!  Thanks for that - I think I've solved it.  One of the extra POP3
features introduced by RFC 2449 (http://www.faqs.org/rfcs/rfc2449.html) is
pipelining, whereby a client can send lots of requests at once without
waiting for the responses.  The POP3 proxy can't cope with that, but your
POP3 server at pop.laposte.net is using it.

I've 'fixed' the proxy so that when the client asks for the capabilities of
the server, the proxy filters out the 'pipelining' capability - that should
prevent the client from trying to use pipelining.  You shouldn't see any
significant difference in speed, except maybe when doing lots of quick
operations together (eg. deleting hundreds of emails in one go) over a
high-latency connection.  If you still have problems, it could be that your
client is explicitly set up to use pipelining regardless of what the server
says - in that case, look for a configuration option called something like
"Use overlapped POP3 commands" and disable it.

Hope that works...
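
The capability filtering described above could look roughly like the
sketch below.  It is only an illustration, not the code that was checked
in: the function name strip_pipelining is hypothetical, it is written for
a current Python rather than the 2.2 of the time, and it assumes the
proxy already holds the server's complete multi-line CAPA reply as text
with CRLF line endings.

```python
def strip_pipelining(capa_response):
    """Return a POP3 CAPA response with any PIPELINING line removed.

    capa_response is the full multi-line reply as text, with CRLF line
    endings and the terminating "." line (the layout RFC 2449 describes).
    """
    lines = capa_response.split("\r\n")
    # Keep the status line, every other capability, and the "." terminator;
    # drop only a line that consists of the PIPELINING tag.
    kept = [line for line in lines if line.strip().upper() != "PIPELINING"]
    return "\r\n".join(kept)


if __name__ == "__main__":
    reply = "+OK Capability list follows\r\nTOP\r\nPIPELINING\r\nUIDL\r\n.\r\n"
    print(repr(strip_pipelining(reply)))
```

A client that honours the filtered list never learns that the server
supports pipelining, so it should fall back to sending one command at a
time.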

-- 
Richie Hindle
richie@entrian.com


From skip at pobox.com  Sat Jan 25 10:18:11 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Jan 25 11:18:17 2003
Subject: [Spambayes] uniform command line treatment of database/pickle files?
Message-ID: <15922.47299.145550.835866@montanaro.dyndns.org>

Other than our convenience, I don't see any reason the different tools
should use different mechanisms to specify database or pickle files on the
command line.  Hammiefilter.py uses:

    -d DBFILE
        use database in DBFILE
    -D PICKLEFILE
        use pickle (instead of database) in PICKLEFILE

while pop3proxy.py uses:

    -p FILE : use the named database file
    -d      : the database is a DBM file rather than a pickle

I don't know if there are other ways the same information is spelled, but I
think it would be nice if a pass was made over the existing command line
arguments so that all command line tools use the same flags for the same
purpose.
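
One way to get there would be a small shared helper that every tool calls
before its own option handling.  The sketch below is an assumption about
how that might look, not existing Spambayes code: parse_storage_args is a
hypothetical name, it uses the argparse module from a current Python, and
it follows the hammiefilter.py spelling (-d DBFILE, -D PICKLEFILE).

```python
import argparse


def parse_storage_args(argv):
    """Parse the shared storage flags and return (path, use_dbm, remaining).

    -d DBFILE      use a DBM database stored in DBFILE
    -D PICKLEFILE  use a pickle stored in PICKLEFILE
    Arguments this helper does not know are returned untouched in
    `remaining` so each tool can parse its own options afterwards.
    """
    parser = argparse.ArgumentParser(add_help=False)
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-d", dest="dbfile", metavar="DBFILE")
    group.add_argument("-D", dest="picklefile", metavar="PICKLEFILE")
    options, remaining = parser.parse_known_args(argv)
    if options.picklefile is not None:
        return options.picklefile, False, remaining
    if options.dbfile is not None:
        return options.dbfile, True, remaining
    # Neither flag given: the caller falls back to its configured default.
    return None, None, remaining
```

With a helper like this, "hammiefilter.py -d hammie.db" and
"pop3proxy.py -d hammie.db" would mean the same thing, and giving both
-d and -D at once is rejected.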

(Proxytrainer.py is dead!  Long live pop3proxy.py!  Thanks Richie!)

Skip


From skip at pobox.com  Sat Jan 25 10:25:14 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Jan 25 11:25:18 2003
Subject: [Spambayes] Oh, one other thing...
Message-ID: <15922.47722.760937.263327@montanaro.dyndns.org>

I almost forgot...

I have this little blurb in my procmailrc file:

    :0 fw:hamlock
    | proxytee.py --prob=0.2

    :0 fw:hamlock
    | hammiefilter.py -d $HOME/hammie.db

This should probably be collapsed into just

    :0 fw:
    | hammiefilter.py -u

with hammiefilter.py both passing the message along to pop3proxy.py for
training, and getting the score from pop3proxy.py (the -u meant to imply
"don't score it yourself, 'u'pload the message to pop3proxy and use the
score it returns").  

Make sense?

Skip
 

From mhammond at skippinet.com.au  Sat Jan 25 17:51:52 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Jan 26 01:28:16 2003
Subject: [Spambayes] Re: Outlook: new folder selector code
In-Reply-To: <u65sewckn.fsf@fitlinxx.com>
Message-ID: <000c01c2c43e$3ea03b60$530f8490@eden>

> I'm also having problems with an exchange server after updating to the
> latest from CVS - using my existing configuration it failed when
> trying to establish the filtering hooks.  But even from a fresh
> install and empty database, any attempt to work with folders generates
> an exception - for me it's in NormalizeID in msgstore.py:
> "AssertionError: We expect fully qualified IDs"

I've checked in a fix for this.  No idea if it will fix your exchange server
error, but all the IDs should now be fully qualified after a fresh install.

Mark.


From anthony at interlink.com.au  Tue Jan 28 01:24:47 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Jan 27 09:31:45 2003
Subject: [Spambayes] Alpha 2 Release? 
In-Reply-To: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com> 
Message-ID: <200301271424.h0REOmm23697@localhost.localdomain>


>>> Richie Hindle wrote
> 
> I'd like to suggest we make another alpha release next week.  Lots has
> happened since alpha 1 (some of which is important to the Linux Journal
> articles, which come out first thing in February):

Sounds good to me. 
 
>  o People need to test the up-to-date version!

We need to provide an "upgrading guide", of sorts. This can just
be a release note.

We need to find a way to make the install script remove the older
'scripts' that have already been installed (and which may be busted!)
but which are no longer in the distro.
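
A rough sketch of that cleanup step follows.  Everything in it is an
assumption made for illustration: the function names are hypothetical,
the list of previously installed scripts would have to come from
somewhere (an old manifest, say), and it is written for a current Python.
It defaults to a dry run so nothing is deleted by accident.

```python
import os


def stale_scripts(previously_installed, current_distribution):
    """Return script names that were installed before but are no longer shipped."""
    return sorted(set(previously_installed) - set(current_distribution))


def remove_stale_scripts(script_dir, previously_installed,
                         current_distribution, dry_run=True):
    """Delete (or, in a dry run, just list) stale scripts found in script_dir."""
    affected = []
    for name in stale_scripts(previously_installed, current_distribution):
        path = os.path.join(script_dir, name)
        if os.path.isfile(path):
            if not dry_run:
                os.remove(path)
            affected.append(path)
    return affected
```

For example, stale_scripts(["hammie.py", "proxytrainer.py"],
["hammie.py"]) reports proxytrainer.py as a leftover.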

> Anthony, is there anything release-wise that needs to be done, or is it
> just a matter of running "setup.py sdist --formats zip,gztar" on a fresh
> checkout?

I tend to do the following:

make a tarball and a zipfile

unpack them on a totally different machine, install it, diff against 
what was already there beforehand, a bunch of other, similar, sanity
checking.

then it's SF release dance time...

Anthony
-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From tim at fourstonesExpressions.com  Mon Jan 27 08:34:15 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 27 09:34:54 2003
Subject: [Spambayes] Alpha 2 Release? 
In-Reply-To: <200301271424.h0REOmm23697@localhost.localdomain>
Message-ID: <PMZTMGAPO7TP7SRXVOMRNVB8FEEA.3e354367@myst>

We need to begin examining release migration issues, particularly when the 
database won't migrate between releases.  We should at least give instructions 
on how to retrain, but better than that would be automagic upgrade of the 
file.  I'll take a look at this...  - TimS

1/27/2003 8:24:47 AM, Anthony Baxter <anthony@interlink.com.au> wrote:

>
>>>> Richie Hindle wrote
>> 
>> I'd like to suggest we make another alpha release next week.  Lots has
>> happened since alpha 1 (some of which is important to the Linux Journal
>> articles, which come out first thing in February):
>
>Sounds good to me. 
> 
>>  o People need to test the up-to-date version!
>
>We need to provide an "upgrading guide", of sorts. This can just
>be a release note.
>
>We need to find a way to make the install script remove the older
>'scripts' that have already been installed (and which may be busted!)
>but which are no longer in the distro.
>
>> Anthony, is there anything release-wise that needs to be done, or is it
>> just a matter of running "setup.py sdist --formats zip,gztar" on a fresh
>> checkout?
>
>I tend to do the following:
>
>make a tarball and a zipfile
>
>unpack them on a totally different machine, install it, diff against 
>what was already there beforehand, a bunch of other, similar, sanity
>checking.
>
>then it's SF release dance time...
>
>Anthony
>-- 
>Anthony Baxter     <anthony@interlink.com.au>   
>It's never too late to have a happy childhood.


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From noreply at sourceforge.net  Mon Jan 27 06:18:09 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 27 09:43:24 2003
Subject: [Spambayes] 
 [ spambayes-Patches-673754 ] Outlook exception when starting not in
 the inbox
Message-ID: <E18dA5R-0006b1-00@sc8-sf-web3.sourceforge.net>

Patches item #673754, was opened at 2003-01-24 12:59
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=673754&group_id=61702

Category: Outlook
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
>Assigned to: Mark Hammond (mhammond)
Summary: Outlook exception when starting not in the inbox

Initial Comment:
If Outlook is started not in the inbox (any mail folder?) - 
in Outlook Today, for example - an exception is caused 
when you first switch to a mail folder.  The exception 
doesn't seem to cause any errors, but better safe than 
sorry :)

Here's the trace:

pythoncom error: Python error invoking COM method.
Traceback (most recent call last):
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 275, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 280, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 562, in _invokeex_
    return DesignatedWrapPolicy._invokeex_( self, dispid, lcid, wFlags, args, kwArgs, serviceProvider)
  File "D:\Python22\lib\site-packages\win32com\server\policy.py", line 510, in _invokeex_
    return apply(func, args)
  File "D:\CVS Modules\spambayes\Outlook2000\addin.py", line 549, in OnFolderSwitch
    self.but_recover_as.Visible = show_recover_as
  File "D:\Python22\lib\site-packages\win32com\client\__init__.py", line 368, in __getattr__
    raise AttributeError, "'%s' object has no attribute '%s'" % (repr(self), attr)
exceptions.AttributeError: '<win32com.client.COMEventClass>' object has no attribute 'but_recover_as'

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2003-01-28 01:18

Message:
Logged In: YES 
user_id=14198

Thanks!  Fixed:
Checking in addin.py;
/cvsroot/spambayes/spambayes/Outlook2000/addin.py,v  <-- 
addin.py
new revision: 1.46; previous revision: 1.45


----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2003-01-24 13:33

Message:
Logged In: YES 
user_id=552329

Akk...Next time more testing before submitting a patch :)  
That didn't work well at all (same problem, though).  I think 
that this will, though.  There's a comment in addin.py about 
an Outlook bug with OnNewExplorer, but the code doesn't 
seem to do what the comments say.  The first 
OnNewExplorer call is skipped (via the do_activate bool), so 
it's safe to have setup in onactivate and not onselection.

Anyway, this works and fixes the problem on my system.  I'll 
leave it to others to check theirs.

----------------------------------------------------------------------


From noreply at sourceforge.net  Mon Jan 27 06:18:49 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 27 09:43:42 2003
Subject: [Spambayes] 
 [ spambayes-Patches-639312 ] fix for outlook CompareEntryIDs bug
Message-ID: <E18dA65-0006ch-00@sc8-sf-web3.sourceforge.net>

Patches item #639312, was opened at 2002-11-16 23:35
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=639312&group_id=61702

Category: None
Group: None
>Status: Closed
>Resolution: Out of Date
Priority: 5
Submitted By: Piers Haken (piersh)
Assigned to: Mark Hammond (mhammond)
Summary: fix for outlook CompareEntryIDs bug

Initial Comment:
This patch reenables the CompareEntryIDs for 
comparing folder IDs. It passes both the MAPI Session 
and the Outlook Session into the dialog, one for retrieving 
the exchange-compatible IDs and the other for 
comparing them.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2003-01-28 01:18

Message:
Logged In: YES 
user_id=14198

The code has moved on - we are back to a MAPI and
CompareEntryIds implementation.

----------------------------------------------------------------------


From noreply at sourceforge.net  Mon Jan 27 06:20:04 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 27 09:44:00 2003
Subject: [Spambayes] [ spambayes-Patches-648271 ] Code to remove the New Mail
	icon
Message-ID: <E18dA7I-0006ha-00@sc8-sf-web3.sourceforge.net>

Patches item #648271, was opened at 2002-12-04 19:59
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498105&aid=648271&group_id=61702

Category: Outlook
Group: None
>Status: Closed
>Resolution: Rejected
Priority: 5
Submitted By: Peter Arnold (lardladpa)
Assigned to: Nobody/Anonymous (nobody)
Summary: Code to remove the New Mail icon

Initial Comment:
It would be great if having processed the newly arrived
e-mail and discovered that they were all spam the addin
could remove the New Message icon from the system tray.
I know there's no programmatic interface to do this but
I found some VB code at
http://www.slipstick.com/dev/code/clearenvicon.htm 

I've converted the 3 pages of VB to this small bit of
python

import win32gui
                                 
# Locate the outlook window owning the tray icon
hWnd = win32gui.FindWindow("rctrl_renwnd32", "")
if hWnd != 0:
    # Send a NIM_DELETE to remove the icon
    nid = (hWnd, 0)
    win32gui.Shell_NotifyIcon(2, nid)

    # Send a WUM_RESETNOTIFICATION to the owning window
    win32gui.SendMessage(hWnd, 1031, 0, 0)


It would be super if this patch could be integrated
into the outlook plugin although I'm not quite sure
where in the code it would go.

----------------------------------------------------------------------

>Comment By: Mark Hammond (mhammond)
Date: 2003-01-28 01:20

Message:
Logged In: YES 
user_id=14198

Closing this.  If a better proposal for the icon is put
forward, we can review it again.

----------------------------------------------------------------------


From db3l at fitlinxx.com  Mon Jan 27 11:34:51 2003
From: db3l at fitlinxx.com (David Bolen)
Date: Mon Jan 27 11:35:03 2003
Subject: [Spambayes] Re: Outlook: new folder selector code
References: <u65sewckn.fsf@fitlinxx.com> <000c01c2c43e$3ea03b60$530f8490@eden>
Message-ID: <u4r7ur1p0.fsf@fitlinxx.com>

"Mark Hammond" <mhammond@skippinet.com.au> writes:

> I've checked in a fix for this.  No idea if it will fix your exchange server
> error, but all the IDs should now be fully qualified after a fresh install.

Yes, the latest CVS seems to be working fine now against my exchange server.

-- David


From richie at entrian.com  Mon Jan 27 17:50:24 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 27 12:51:15 2003
Subject: [Spambayes] uniform command line treatment of database/pickle
	files?
In-Reply-To: <15922.47299.145550.835866@montanaro.dyndns.org>
References: <15922.47299.145550.835866@montanaro.dyndns.org>
Message-ID: <v1sa3v86j8it0ks0i4s59u9drhjov6j0mb@4ax.com>


[Skip]
> Other than our convenience, I don't see any reason the different tools
> should use different mechanisms to specify database or pickle files on the
> command line.  Hammiefilter.py uses:
> 
>     -d DBFILE
>         use database in DBFILE
>     -D PICKLEFILE
>         use pickle (instead of database) in PICKLEFILE
> 
> while pop3proxy.py uses:
> 
>     -p FILE : use the named database file
>     -d      : the database is a DBM file rather than a pickle
> 
> I don't know if there are other ways the same information is spelled, but I
> think it would be nice if a pass was made over the existing command line
> arguments so that all command line tools use the same flags for the same
> purpose.

I'm not attached to either version, so by all means change one of them.
I'd guess that more people are using command-line switches with hammie than
with pop3proxy, so it should probably be pop3proxy that changes (but I
don't have the time myself).  I don't know of any other tools that have
similar switches, but I haven't looked.

-- 
Richie Hindle
richie@entrian.com


From tim at fourstonesExpressions.com  Mon Jan 27 11:54:54 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 27 12:55:29 2003
Subject: [Spambayes] uniform command line treatment of database/pickle
	files?
In-Reply-To: <v1sa3v86j8it0ks0i4s59u9drhjov6j0mb@4ax.com>
Message-ID: <EA4W3V9543EFCOIUUTHCKI96LI2ZIG.3e35726e@myst>

1/27/2003 11:50:24 AM, Richie Hindle <richie@entrian.com> wrote:

>
>[Skip]
>> Other than our convenience, I don't see any reason the different tools
>> should use different mechanisms to specify database or pickle files on the
>> command line.  Hammiefilter.py uses:
>> 
>>     -d DBFILE
>>         use database in DBFILE
>>     -D PICKLEFILE
>>         use pickle (instead of database) in PICKLEFILE
>> 
>> while pop3proxy.py uses:
>> 
>>     -p FILE : use the named database file
>>     -d      : the database is a DBM file rather than a pickle
>> 
>> I don't know if there are other ways the same information is spelled, but I
>> think it would be nice if a pass was made over the existing command line
>> arguments so that all command line tools use the same flags for the same
>> purpose.
>
>I'm not attached to either version, so by all means change one of them.
>I'd guess that more people are using command-line switches with hammie than
>with pop3proxy, so it should probably be pop3proxy that changes (but I
>don't have the time myself).  I don't know of any other tools that have
>similar switches, but I haven't looked.

I'll change pop3proxy so that -d/-D work.  I don't think I can keep -p/-d 
alive in the process...  -TimS

>
>-- 
>Richie Hindle
>richie@entrian.com


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From richie at entrian.com  Mon Jan 27 17:59:20 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 27 13:00:10 2003
Subject: [Spambayes] Oh, one other thing...
In-Reply-To: <15922.47722.760937.263327@montanaro.dyndns.org>
References: <15922.47722.760937.263327@montanaro.dyndns.org>
Message-ID: <9fsa3v89or2nlkiakf8a61m5bj4lq1o757@4ax.com>


[Skip]
> This should probably be collapsed into just
> 
>     :0 fw:
>     | hammiefilter.py -u
> 
> with hammiefilter.py both passing the message along to pop3proxy.py for
> training, and getting the score from pop3proxy.py (the -u meant to imply
> "don't score it yourself, 'u'pload the message to pop3proxy and use the
> score it returns").  
> 
> Make sense?

Neale has already implemented a similar idea over XMLRPC, in hammiesrv and
hammiecli.  But making pop3proxy.py the server, rather than having another
process, would integrate well with the web interface and mean we had no
worries about database locking.  And folding proxytee into hammiefilter
sounds like a good plan too - they're essentially doing similar jobs ("Do
stuff with this message!").

There are advantages to using XMLRPC, eg. cross-language compatibility.
But Skip's upload system uses HTTP, which is pretty cross-language too.  I
don't know whether it would be easy to incorporate Neale's code into
pop3proxy - it would mean making Dibbler.py (the underlying HTTP layer)
understand XMLRPC.  Probably hard but I don't know.

I like the way Skip's going, letting hammie users use the web interface for
training.  Alongside Mark's Outlook plugin and Neale's Mutt (etc.) scripts,
it's looking like we're covering all the bases nicely - regardless of which
client you use, and whether you use hammie or pop3proxy to classify your
mail, you get a nice training interface either within your email client or
in your browser.

-- 
Richie Hindle
richie@entrian.com


From tim at fourstonesExpressions.com  Mon Jan 27 12:19:45 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 27 13:20:21 2003
Subject: [Spambayes] To our friends down under... (off topic)
Message-ID: <A0VQA7WQ53LH3XA6622VNH73VQGDED3.3e357841@myst>

Happy Australia Day.  :)

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From jon at bergenstreetsoftware.com  Mon Jan 27 14:22:18 2003
From: jon at bergenstreetsoftware.com (Jonathan Baumgartner)
Date: Mon Jan 27 14:22:28 2003
Subject: [Spambayes] seg faults?
Message-ID: <a0521021eba5b35da4ca4@[192.168.1.101]>

I just installed pop3proxy on my OS X machine. It works beautifully. 
It does seem to have one issue, though. Any time I use the web 
application to train the incoming messages, the program will die with 
a segmentation fault. It doesn't give me any more information than 
that. It looks like this:

[Groundskeeper-Willie:~/spambayes-1.0a1] jon% sudo python pop3proxy.py
Password:
Loading database... Done.
Listener on port 110 is proxying mail.bergenstreetsoftware.com:110
User interface url is http://localhost:8880
Segmentation fault

So the procedure for me to train a message goes something like this:

1. Go to http://localhost:8880. Click on "Review messages." Instead 
of going to the review message page, I get a page that looks like 
this:

body { font: 90% arial, swiss, helvetica; margin: 0 } table { font: 
90% arial, swiss, helvetica } form { margin: 0 } .banner { 
background: #c0e0ff; padding=5; padding-left: 15; border-top: 1px 
solid black; border-bottom: 1px solid black } .header { font-size: 
133% } .content { margin: 15 } .messagetable td { padding-left: 1ex; 
padding-right: 1ex } .sectiontable { border: 1px solid #808080; 
width: 95% } .sectionheading { background: fffae0; padding-left: 1ex; 
border-bottom: 1px solid #808080; font-weight: bold } .sectionbody { 
padding: 1em } .reviewheaders a { color: #000000 } .stripe_on td { 
background: #

2. Switch out to Terminal and restart pop3proxy, which has died.
3. Go back to the browser and reload the page.
4. Hit "Train." Get the same page of HTML tags as in (1).
5. Restart pop3proxy again.
6. Reload the page in the browser again. Get a message that the 
message has been trained.

Couple of questions:

* Do I need to run this as superuser as I've been doing? When I tried 
it without sudo, I got errors about permissions.

* Is anyone else trying this on OS X? I suspect I misconfigured something.

thanks!
jon

From skip at pobox.com  Mon Jan 27 13:28:11 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 27 14:28:21 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <a0521021eba5b35da4ca4@[192.168.1.101]>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
Message-ID: <15925.34891.379413.710608@montanaro.dyndns.org>

    Jon> Couple of questions:

    Jon> * Do I need to run this as superuser as I've been doing? When I
    Jon>   tried it without sudo, I got errors about permissions.

Only if you want it to listen on port 110, since ports below 1024 are
privileged.  Pick a high-numbered port instead, then teach your MUA to
connect to localhost on that port.
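
As a small illustration of that rule, here is a sketch that decides
whether a requested port would need root on a traditional Unix system and
substitutes an unprivileged one if so.  The function names and the
fallback port 8110 are arbitrary choices made for this example, not
pop3proxy options, and some newer systems relax the below-1024
restriction.

```python
PRIVILEGED_PORT_LIMIT = 1024  # traditional Unix cutoff for root-only ports


def needs_root(port):
    """Return True if binding this port traditionally requires superuser rights."""
    return 0 < port < PRIVILEGED_PORT_LIMIT


def choose_listen_port(preferred, fallback=8110):
    """Return `preferred` unless it is privileged, in which case use `fallback`.

    The fallback value is a hypothetical default picked for this sketch.
    """
    return fallback if needs_root(preferred) else preferred
```

Here choose_listen_port(110) gives 8110, so the proxy could run as an
ordinary user with the mail client pointed at localhost port 8110.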

    Jon> * Is anyone else trying this on OS X? I suspect I misconfigured
    Jon>   something.

Yes, I am experimenting with it, though I don't actually use the POP proxying
features.  I still use fetchmail over ssh to grab mail from the remote host
and procmail with a couple recipes which invoke hammiefilter and/or proxytee
to classify/direct the message.

Skip


From jh at web.de  Mon Jan 27 20:29:03 2003
From: jh at web.de (Juergen Hermann)
Date: Mon Jan 27 14:29:43 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <PMZTMGAPO7TP7SRXVOMRNVB8FEEA.3e354367@myst>
Message-ID: <E18dEwK-0002mu-00@smtp.web.de>

On Mon, 27 Jan 2003 08:34:15 -0600, Tim Stone - Four Stones Expressions wrote:

>We need to begin examining release migration issues, particularly when the 
>database won't migrate between releases.  We should at least give instructions 
>on how to retrain, but better than that would be automagic upgrade of the 
>file.  

I think the best thing would be an export/import tool, with the additional benefit of 
being usable for more than just upgrading.
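
To make the idea concrete, here is a minimal sketch of such a tool.  It
rests on assumptions that are not taken from the Spambayes code: the
training data is modelled as a plain dict mapping each token to a
(spam count, ham count) pair, the interchange format is CSV, and the
function names are hypothetical.  It is written for a current Python.

```python
import csv
import io


def export_counts(counts, stream):
    """Write one 'token,spamcount,hamcount' row per token to stream."""
    writer = csv.writer(stream)
    for token in sorted(counts):
        spam, ham = counts[token]
        writer.writerow([token, spam, ham])


def import_counts(stream):
    """Read rows written by export_counts back into a dict."""
    counts = {}
    for token, spam, ham in csv.reader(stream):
        counts[token] = (int(spam), int(ham))
    return counts


if __name__ == "__main__":
    original = {"free": (10, 1), "header:Reply-To:1": (5, 2)}
    buffer = io.StringIO(newline="")
    export_counts(original, buffer)
    buffer.seek(0)
    print(import_counts(buffer) == original)
```

Because the dump is plain text rather than a pickle or DBM file, it does
not depend on the storage format of any particular release, which is what
makes it usable for moving or merging databases as well as for upgrading.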


Ciao, Jürgen


From richie at entrian.com  Mon Jan 27 19:49:11 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 27 14:50:00 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <a0521021eba5b35da4ca4@[192.168.1.101]>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
Message-ID: <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>

Hi Jonathan,

> I just installed pop3proxy on my OS X machine. It works beautifully. 
> It does seem to have one issue, though. Any time I use the web 
> application to train the incoming messages, the program will die with 
> a segmentation fault.

I don't use OS X myself, but others have reported that increasing the stack
size fixes this:

[Tony Lownds]
> tcsh: ulimit stacksize 2048
> 
> sh: ulimit -s 2048
> 
> Mac OS X's default is 512, I picked 2048 at random.

Hope that helps,

-- 
Richie Hindle
richie@entrian.com


From jon at bergenstreetsoftware.com  Mon Jan 27 14:53:21 2003
From: jon at bergenstreetsoftware.com (Jonathan Baumgartner)
Date: Mon Jan 27 14:53:30 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
 <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
Message-ID: <a05210222ba5b3e6d4f14@[192.168.1.101]>

At 7:49 PM +0000 1/27/03, Richie Hindle wrote:
>I don't use OS X myself, but others have reported that increasing the stack
>size fixes this:
>
>[Tony Lownds]
>>  tcsh: ulimit stacksize 2048
>>
>>  sh: ulimit -s 2048
>>
>>  Mac OS X's default is 512, I picked 2048 at random.
>
>Hope that helps,

Thanks Richie. ulimit doesn't appear to exist on my machine, though. 
At least, which and man have never heard of it.

jon

From python-spambayes at discworld.dyndns.org  Mon Jan 27 13:59:19 2003
From: python-spambayes at discworld.dyndns.org (Charles Cazabon)
Date: Mon Jan 27 14:56:44 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <a05210222ba5b3e6d4f14@[192.168.1.101]>;
	from jon@bergenstreetsoftware.com on Mon, Jan 27, 2003 at 02:53:21PM -0500
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
	<c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
	<a05210222ba5b3e6d4f14@[192.168.1.101]>
Message-ID: <20030127135919.A15195@discworld.dyndns.org>

Jonathan Baumgartner <jon@bergenstreetsoftware.com> wrote:
> 
> Thanks Richie. ulimit doesn't appear to exist on my machine, though. 
> At least, which and man have never heard of it.

It's frequently a shell builtin.  Try the documentation for your shell.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon                 <python-spambayes@discworld.dyndns.org>
GPL'ed software available at:     http://www.qcc.ca/~charlesc/software/
-----------------------------------------------------------------------

From richie at entrian.com  Mon Jan 27 20:04:50 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Jan 27 15:05:51 2003
Subject: [Spambayes] uniform command line treatment of database/pickle
	files?
In-Reply-To: <EA4W3V9543EFCOIUUTHCKI96LI2ZIG.3e35726e@myst>
References: <v1sa3v86j8it0ks0i4s59u9drhjov6j0mb@4ax.com>
	<EA4W3V9543EFCOIUUTHCKI96LI2ZIG.3e35726e@myst>
Message-ID: <344b3v06bact2avtdnmo4a9e18cdajn0s3@4ax.com>


[Tim Stone]
> I'll change pop3proxy so that -d/-D work.  I don't think I can keep -p/-d 
> alive in the process...  -TimS

Great!  You shouldn't try to keep the old options as well - it would only
muddy the waters.

-- 
Richie Hindle
richie@entrian.com


From tony-bayes at lownds.com  Mon Jan 27 12:08:27 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Mon Jan 27 15:08:42 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <20030127135919.A15195@discworld.dyndns.org>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
 <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
 <a05210222ba5b3e6d4f14@[192.168.1.101]>
 <20030127135919.A15195@discworld.dyndns.org>
Message-ID: <a05200f16ba5b401d0e41@[10.0.1.2]>

At 1:59 PM -0600 1/27/03, Charles Cazabon wrote:
>Jonathan Baumgartner <jon@bergenstreetsoftware.com> wrote:
>>
>>  Thanks Richie. ulimit doesn't appear to exist on my machine, though.
>>  At least, which and man have never heard of it.
>
>It's frequently a shell builtin.  Try the documentation for your shell.

I got the command wrong :(

  tcsh: limit stacksize 2048

  sh: ulimit -s 2048

On tcsh it's limit, not ulimit.

Would it be desirable to have pop3proxy.py take care of this?

Jonathan,

If you get past the segfault issue and still have issues, let us know 
- I use pop3proxy on OS X quite successfully.

-Tony

From jon at bergenstreetsoftware.com  Mon Jan 27 15:10:54 2003
From: jon at bergenstreetsoftware.com (Jonathan Baumgartner)
Date: Mon Jan 27 15:11:01 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <a05200f16ba5b401d0e41@[10.0.1.2]>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
 <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
 <a05210222ba5b3e6d4f14@[192.168.1.101]>
 <20030127135919.A15195@discworld.dyndns.org>
 <a05200f16ba5b401d0e41@[10.0.1.2]>
Message-ID: <a05210225ba5b42a54c21@[192.168.1.101]>

At 12:08 PM -0800 1/27/03, Tony Lownds wrote:
>I got the command wrong :(
>
>  tcsh: limit stacksize 2048
>
>  sh: ulimit -s 2048
>
>On tcsh it's limit, not ulimit.
>
>Would it be desirable to have pop3proxy.py take care of this?
>
>Jonathan,
>
>If you get past the segfault issue and still have issues, let us 
>know - I use pop3proxy on OS X quite successfully.

Yippee! Thanks, Tony. That did it. Working perfectly now.

jon

From tim at fourstonesExpressions.com  Mon Jan 27 14:33:14 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 27 15:34:06 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <E18dEwK-0002mu-00@smtp.web.de>
Message-ID: <TQHG747673ZWLKSM6107UR2YFURFEZV.3e35978a@myst>

1/27/2003 1:29:03 PM, "Juergen Hermann" <jh@web.de> wrote:

>On Mon, 27 Jan 2003 08:34:15 -0600, Tim Stone - Four Stones Expressions wrote:
>
>>We need to begin examining release migration issues, particularly when the 
>>database won't migrate between releases.  We should at least give
>>instructions on how to retrain, but better than that would be automagic
>>upgrade of the file.

My thoughts exactly.  - TimS

>
>I think the best thing would be an ex-/import tool, with the additional
>benefit of being able to do that not just for upgrading.
>
>Ciao, Jürgen


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From neale at woozle.org  Mon Jan 27 12:39:29 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 27 15:39:37 2003
Subject: [Spambayes] uniform command line treatment of database/pickle
 files?
In-Reply-To: <v1sa3v86j8it0ks0i4s59u9drhjov6j0mb@4ax.com> (Richie Hindle's
 message of "Mon, 27 Jan 2003 17:50:24 +0000")
References: <15922.47299.145550.835866@montanaro.dyndns.org>
	<v1sa3v86j8it0ks0i4s59u9drhjov6j0mb@4ax.com>
Message-ID: <w53znpm1g5a.fsf@woozle.org>

Richie Hindle <richie@entrian.com> writes:

> I'm not attached to either version, so by all means change one of
> them.  I'd guess that more people are using command-line switches with
> hammie than with pop3proxy, so it should probably be pop3proxy that
> changes (but I don't have the time myself).  I don't know of any other
> tools that have similar switches, but I haven't looked.

Hey guys, I didn't mean to force any changes on anyone (okay well maybe
just a little)--it just seemed like -p was unnecessary.

I'd like to unify the location and type of the word database.  Does
anything still use the pickle?

Neale

From neale at woozle.org  Mon Jan 27 12:43:06 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 27 15:43:10 2003
Subject: [Spambayes] Oh, one other thing...
In-Reply-To: <9fsa3v89or2nlkiakf8a61m5bj4lq1o757@4ax.com> (Richie Hindle's
 message of "Mon, 27 Jan 2003 17:59:20 +0000")
References: <15922.47722.760937.263327@montanaro.dyndns.org>
	<9fsa3v89or2nlkiakf8a61m5bj4lq1o757@4ax.com>
Message-ID: <w53wukq1fz9.fsf@woozle.org>

Richie Hindle <richie@entrian.com> writes:

> Neale has already implemented a similar idea over XMLRPC, in hammiesrv
> and hammiecli.

Just so everyone knows, I have absolutely no attachment to hammiecli and
hammiesrv.  In fact, they should probably be moved to contrib.  They
were just a lesson in making a good interface to the Hammie class.

Neale

From tim at fourstonesExpressions.com  Mon Jan 27 14:54:32 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 27 15:55:08 2003
Subject: [Spambayes] uniform command line treatment of database/pickle
	files?
In-Reply-To: <w53znpm1g5a.fsf@woozle.org>
Message-ID: <GAD8521SRIHA93XRKIYUBAWQMGJEM.3e359c88@myst>

1/27/2003 2:39:29 PM, Neale Pickett <neale@woozle.org> wrote:

>Richie Hindle <richie@entrian.com> writes:
>
>> I'm not attached to either version, so by all means change one of
>> them.  I'd guess that more people are using command-line switches with
>> hammie than with pop3proxy, so it should probably be pop3proxy that
>> changes (but I don't have the time myself).  I don't know of any other
>> tools that have similar switches, but I haven't looked.
>
>Hey guys, I didn't mean to force any changes on anyone (okay well maybe
>just a little)--it just seemed like -p was unnecessary.

Agreed.  Done.  Question is: does hammiebulk need the -p option, or can we use 
-d: dbmfilename and -D: picklefilename?

>
>I'd like to unify the location and type of the word database.  Does
>anything still use the pickle?

Well, I still use pickle, because I trust it a bit more on windoze than the 
dbm that I get with 2.2.  - TimS

>
>Neale
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From tim at fourstonesExpressions.com  Mon Jan 27 14:56:02 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Mon Jan 27 15:56:38 2003
Subject: [Spambayes] Another spam algorithm...
Message-ID: <GPM65043WCB1Y32JELH972ZEBYTA9LH.3e359ce2@myst>

This appeared on kuro5hin recently...  It's a spam filtering 'algorithm' 
based on using gzip to measure compressibility of an email message... hmmm... 
http://www.kuro5hin.org/story/2003/1/25/224415/367
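
For anyone who wants to poke at the idea, here is a minimal sketch of measuring how compressible a message is.  It only shows the measurement itself; the article's actual method is not reproduced here, the function name is made up for illustration, and it uses zlib (the same deflate algorithm gzip uses) under current Python 3.

```python
import zlib


def compression_ratio(text):
    """Return compressed size divided by original size (UTF-8 bytes).

    Lower values mean the text is more repetitive.  An empty message
    is reported as 1.0 so that callers never divide by zero.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 1.0
    return len(zlib.compress(raw, 9)) / len(raw)
```

Very short messages come out above 1.0 because of the fixed zlib header, so any threshold built on this would have to account for message length.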

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From neale at woozle.org  Mon Jan 27 13:14:23 2003
From: neale at woozle.org (Neale Pickett)
Date: Mon Jan 27 16:14:26 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com> (Richie Hindle's
 message of "Sat, 25 Jan 2003 00:02:30 +0000")
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
Message-ID: <w53ptqi1ej4.fsf@woozle.org>

Richie Hindle <richie@entrian.com> writes:

> Neale, do you think your Mutt edits will be ready by the middle of
> next week?  I haven't tried them but it sounds like they're pretty
> much there?

Absolutely.  Sorry for the delay, I just checked them in.

Thanks a ton for putting a release together, Richie.

Neale

From skip at pobox.com  Mon Jan 27 15:23:43 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 27 16:23:53 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <E18dEwK-0002mu-00@smtp.web.de>
References: <PMZTMGAPO7TP7SRXVOMRNVB8FEEA.3e354367@myst>
        <E18dEwK-0002mu-00@smtp.web.de>
Message-ID: <15925.41823.746577.327335@montanaro.dyndns.org>


    Juergen> I think the best thing would be an ex-/import tool, with the
    Juergen> additional benefit of being able to do that not just for
    Juergen> upgrading.

Might I suggest a simple csv export?  You could then use one of the csv
modules to import.  I'm working with Dave Cole, Kevin Altis and Cliff Wells
separately to try and settle on a single csv module for incorporation in to
the distribution.

Skip


From skip at pobox.com  Mon Jan 27 15:49:43 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 27 16:49:53 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <15925.41823.746577.327335@montanaro.dyndns.org>
References: <PMZTMGAPO7TP7SRXVOMRNVB8FEEA.3e354367@myst>
        <E18dEwK-0002mu-00@smtp.web.de>
        <15925.41823.746577.327335@montanaro.dyndns.org>
Message-ID: <15925.43383.517511.394625@montanaro.dyndns.org>

    Skip> I'm working with Dave Cole, Kevin Altis and Cliff Wells separately
    Skip> to try and settle on a single csv module for incorporation in to
    Skip> the distribution.

"the distribution" is "the Python distribution" not "the Spambayes
distribution", just so there's no confusion.

Skip

From mhammond at skippinet.com.au  Tue Jan 28 09:00:59 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Jan 27 17:01:54 2003
Subject: [Spambayes] To our friends down under... (off topic)
In-Reply-To: <A0VQA7WQ53LH3XA6622VNH73VQGDED3.3e357841@myst>
Message-ID: <008701c2c64f$93ec5410$530f8490@eden>

Cheers (hic!)

If it weren't for the fires and heat, it would be a good one!  Melbourne hit
43.4 last week!

Australian's-all-let-us-rejoice ly,

Mark.

> Happy Australia Day.  :)
>
> c'est moi - TimS
> http://www.fourstonesExpressions.com
> http://wecanstopspam.org


From gary at inauspicious.org  Mon Jan 27 22:38:22 2003
From: gary at inauspicious.org (Gary Benson)
Date: Mon Jan 27 18:21:19 2003
Subject: [Spambayes] Details in headers
Message-ID: <20030127223812.GA17165@inauspicious.org>

Hi,

Just been playing around with Spambayes -- nice work guys!  I have one
question: how do you get the X-Spambayes-Classification header to
contain individual word scores as mentioned in INTEGRATION.txt?
Mine just say spam/ham/unsure and an overall score.

Cheers,
Gary

[ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]

From T.A.Meyer at massey.ac.nz  Tue Jan 28 12:23:09 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Jan 27 18:30:00 2003
Subject: [Spambayes] Re: Outlook: new folder selector code
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D392@its-xchg4.massey.ac.nz>

[David]
> Given Mark's comment about being tied up currently, if you were
> amenable to sending me a copy of your patches, I could also give them
> a shot on my system.

[Mark]
> I've checked in a fix for this.  No idea if it will fix your 
> exchange server
> error, but all the IDs should now be fully qualified after a 
> fresh install.

Sorry - long weekend here and so I didn't get to any email for a few days, otherwise I would have sent you the patch.  Mark's done it all now, anyway (and somewhat better, as usual, than my attempt).  :)

=Tony Meyer

From noreply at sourceforge.net  Mon Jan 27 15:37:14 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 27 18:40:35 2003
Subject: [Spambayes] [ spambayes-Bugs-675811 ] Dead buttons left on uninstall
Message-ID: <E18dIoU-00014E-00@sc8-sf-web3.sourceforge.net>

Bugs item #675811, was opened at 2003-01-28 12:37
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Dead buttons left on uninstall

Initial Comment:
The toolbar buttons are temporary, which causes 
problems if they are moved.  If they are permanent, then 
we are left with dead buttons if we uninstall the plugin 
(why would we do this? ;p ).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702

From noreply at sourceforge.net  Mon Jan 27 15:37:34 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 27 18:40:42 2003
Subject: [Spambayes] [ spambayes-Bugs-675811 ] Dead buttons left on uninstall
Message-ID: <E18dIoo-0005aJ-00@sc8-sf-web1.sourceforge.net>

Bugs item #675811, was opened at 2003-01-28 12:37
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
>Assigned to: Mark Hammond (mhammond)
Summary: Dead buttons left on uninstall

Initial Comment:
The toolbar buttons are temporary, which causes 
problems if they are moved.  If they are permanent, then 
we are left with dead buttons if we uninstall the plugin 
(why would we do this? ;p ).

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675811&group_id=61702

From noreply at sourceforge.net  Mon Jan 27 15:40:03 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 27 18:40:51 2003
Subject: [Spambayes] [ spambayes-Bugs-675812 ] Outlook registration/doc issues
Message-ID: <E18dIrD-0001Au-00@sc8-sf-web3.sourceforge.net>

Bugs item #675812, was opened at 2003-01-28 12:40
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Outlook registration/doc issues

Initial Comment:
The plugin should be listed in Outlook's COM plug-ins 
list.  In fact, the doc says that this is so!  This is not the 
case (here at least).  This would allow nice removal (and 
addition??) rather than running addin.py --unregister and 
so on.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702

From noreply at sourceforge.net  Mon Jan 27 15:41:50 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon Jan 27 18:40:57 2003
Subject: [Spambayes] [ spambayes-Bugs-675812 ] Outlook registration/doc issues
Message-ID: <E18dIsw-0002kf-00@sc8-sf-web4.sourceforge.net>

Bugs item #675812, was opened at 2003-01-28 12:40
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
>Assigned to: Mark Hammond (mhammond)
Summary: Outlook registration/doc issues

Initial Comment:
The plugin should be listed in Outlook's COM plug-ins 
list.  In fact, the doc says that this is so!  This is not the 
case (here at least).  This would allow nice removal (and 
addition??) rather than running addin.py --unregister and 
so on.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=675812&group_id=61702

From jm at jmason.org  Tue Jan 28 01:09:46 2003
From: jm at jmason.org (Justin Mason)
Date: Mon Jan 27 20:09:01 2003
Subject: [Spambayes] To our friends down under... (off topic) 
In-Reply-To: Message from "Mark Hammond" <mhammond@skippinet.com.au> 
	<008701c2c64f$93ec5410$530f8490@eden> 
Message-ID: <20030128010952.34F4216F16@jmason.org>


Mark Hammond said:
> Cheers (hic!)
> 
> If it weren't for the fires and heat, it would be a good one!  Melbourne hit
> 43.4 last week!

43.4!!  Didn't bloody do that when I was living there last year ;)
cheers,

--j.

From anthony at interlink.com.au  Tue Jan 28 16:59:49 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 28 01:07:37 2003
Subject: [Spambayes] Alpha 2 Release? 
In-Reply-To: <200301271424.h0REOmm23697@localhost.localdomain> 
Message-ID: <200301280559.h0S5xnn31945@localhost.localdomain>


Another thing to consider might be man pages for the major command line
tools - I'm happy to have a shot at these...


-- 
Anthony Baxter     <anthony@interlink.com.au>   
It's never too late to have a happy childhood.


From tony-bayes at lownds.com  Mon Jan 27 23:01:53 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Tue Jan 28 02:02:26 2003
Subject: [Spambayes] Doh!
Message-ID: <a05200f31ba5bdaf22307@[10.0.1.2]>

pop3proxy.py has some syntax errors :)

% cvs diff
Index: pop3proxy.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v
retrieving revision 1.45
diff -u -d -b -w -r1.45 pop3proxy.py
--- pop3proxy.py        27 Jan 2003 18:07:11 -0000      1.45
+++ pop3proxy.py        28 Jan 2003 06:47:08 -0000
@@ -1515,13 +1515,13 @@
              state.runTestServer = True
          elif opt == '-b':
              state.launchUI = True
-        elif opt == '-d':   // dbm file
+        elif opt == '-d':   # dbm file
              state.useDB = True
              options.pop3proxy_persistent_storage_file = arg
-        elif opt == '-D':   // pickle file
+        elif opt == '-D':   # pickle file
              state.useDB = False
              options.pop3proxy_persistent_storage_file = arg
-        elif opt == '-p':   // dead option
+        elif opt == '-p':   # dead option
              print >>sys.stderr, "-p option is no longer supported, use -D\n"
              print >>sys.stderr, __doc__
              sys.exit()

From tim at fourstonesExpressions.com  Tue Jan 28 06:32:21 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Jan 28 07:32:58 2003
Subject: [Spambayes] Doh!
In-Reply-To: <a05200f31ba5bdaf22307@[10.0.1.2]>
Message-ID: <RPA098SNDB2ZZVOL43XOKUSC7LIMJSQ.3e367855@myst>

One of the main problems with writing in a half dozen languages at the same 
time... argh - TimS

1/28/2003 1:01:53 AM, Tony Lownds <tony-bayes@lownds.com> wrote:

>pop3proxy.py has some syntax errors :)
>
>% cvs diff
>Index: pop3proxy.py
>===================================================================
>RCS file: /cvsroot/spambayes/spambayes/pop3proxy.py,v
>retrieving revision 1.45
>diff -u -d -b -w -r1.45 pop3proxy.py
>--- pop3proxy.py        27 Jan 2003 18:07:11 -0000      1.45
>+++ pop3proxy.py        28 Jan 2003 06:47:08 -0000
>@@ -1515,13 +1515,13 @@
>              state.runTestServer = True
>          elif opt == '-b':
>              state.launchUI = True
>-        elif opt == '-d':   // dbm file
>+        elif opt == '-d':   # dbm file
>              state.useDB = True
>              options.pop3proxy_persistent_storage_file = arg
>-        elif opt == '-D':   // pickle file
>+        elif opt == '-D':   # pickle file
>              state.useDB = False
>              options.pop3proxy_persistent_storage_file = arg
>-        elif opt == '-p':   // dead option
>+        elif opt == '-p':   # dead option
>              print >>sys.stderr, "-p option is no longer supported, use -D\n"
>              print >>sys.stderr, __doc__
>              sys.exit()
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From mwh at python.net  Tue Jan 28 12:36:44 2003
From: mwh at python.net (Michael Hudson)
Date: Tue Jan 28 07:36:56 2003
Subject: [Spambayes] Re: seg faults?
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
	<c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
Message-ID: <2m1y2xo3hf.fsf@starship.python.net>

Richie Hindle <richie@entrian.com> writes:

> [Tony Lownds]
>> tcsh: ulimit stacksize 2048
>> 
>> sh: ulimit -s 2048
>> 
>> Mac OS X's default is 512, I picked 2048 at random.

I think 2048 is the largest you can set it to, too.  Could be wrong,
and can't check just now...

Cheers,
M.


From skip at pobox.com  Tue Jan 28 07:11:45 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 28 08:11:51 2003
Subject: [Spambayes] Re: seg faults?
In-Reply-To: <2m1y2xo3hf.fsf@starship.python.net>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
        <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
        <2m1y2xo3hf.fsf@starship.python.net>
Message-ID: <15926.33169.344631.385423@montanaro.dyndns.org>


    >>> Mac OS X's default is 512, I picked 2048 at random.

    mh> I think 2048 is the largest you can set it to, too.  Could be wrong,
    mh> and can't check just now...

Nah, I set it on mine to 8192 with no problems...

    % uname -a
    Darwin montanaro.dyndns.org 6.3 Darwin Kernel Version 6.3: Sat Dec 14 03:11:25 PST 2002; root:xnu/xnu-344.23.obj~4/RELEASE_PPC  Power Macintosh powerpc
    % ulimit -a
    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) 6144
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 256
    pipe size          (512 bytes, -p) 1
    stack size            (kbytes, -s) 8192
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 100
    virtual memory        (kbytes, -v) 14336

Skip

From tim at fourstonesExpressions.com  Tue Jan 28 07:53:53 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Jan 28 08:54:30 2003
Subject: [Spambayes] Great article by David Berlind on the spam conference
Message-ID: <7AJIMHYX3YB7XW95RP4ZNLB83105X.3e368b71@myst>

http://techupdate.zdnet.com/techupdate/stories/main/0,14179,2909482,00.html

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From neale at woozle.org  Tue Jan 28 12:08:04 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 28 15:08:11 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <15925.41823.746577.327335@montanaro.dyndns.org> (Skip
 Montanaro's message of "Mon, 27 Jan 2003 15:23:43 -0600")
References: <PMZTMGAPO7TP7SRXVOMRNVB8FEEA.3e354367@myst>
	<E18dEwK-0002mu-00@smtp.web.de>
	<15925.41823.746577.327335@montanaro.dyndns.org>
Message-ID: <w53lm15dom3.fsf@woozle.org>

Skip Montanaro <skip@pobox.com> writes:

>     Juergen> I think the best thing would be an ex-/import tool, with the
>     Juergen> additional benefit of being able to do that not just for
>     Juergen> upgrading.
>
> Might I suggest a simple csv export?  You could then use one of the csv
> modules to import.  I'm working with Dave Cole, Kevin Altis and Cliff Wells
> separately to try and settle on a single csv module for incorporation in to
> the distribution.

Sounds like a winner to me.  Import/Export would be incredibly easy,
too.  Just iterate through the dict.
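
A rough sketch of what "iterate through the dict" could look like, using the csv module from current Python 3.  The layout assumed here (a plain dict mapping each word to a (spamcount, hamcount) pair) and both function names are assumptions for illustration, not the actual Spambayes database format.

```python
import csv


def export_counts(db, stream):
    """Write one CSV row per word: word, spamcount, hamcount."""
    writer = csv.writer(stream)
    writer.writerow(["word", "spamcount", "hamcount"])
    for word in sorted(db):
        spam, ham = db[word]
        writer.writerow([word, spam, ham])


def import_counts(stream):
    """Rebuild the dict from a stream written by export_counts()."""
    reader = csv.reader(stream)
    next(reader, None)  # skip the header row, if there is one
    return {word: (int(spam), int(ham)) for word, spam, ham in reader}
```

Because the csv module quotes fields as needed, tokens containing commas or double quotes should survive the round trip.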

From tim at fourstonesExpressions.com  Tue Jan 28 14:10:28 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Jan 28 15:11:06 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <w53lm15dom3.fsf@woozle.org>
Message-ID: <YT1UTO82MGZYZVFETRA562C8DBXGB8.3e36e3b4@myst>

1/28/2003 2:08:04 PM, Neale Pickett <neale@woozle.org> wrote:

>Skip Montanaro <skip@pobox.com> writes:
>
>>     Juergen> I think the best thing would be an ex-/import tool, with the
>>     Juergen> additional benefit of being able to do that not just for
>>     Juergen> upgrading.
>>
>> Might I suggest a simple csv export?  You could then use one of the csv
>> modules to import.  I'm working with Dave Cole, Kevin Altis and Cliff Wells
>> separately to try and settle on a single csv module for incorporation in to
>> the distribution.
>
>Sounds like a winner to me.  Import/Export would be incredibly easy,
>too.  Just iterate through the dict.

It has the added benefit of being able to change from dbm to pickle to ... 
implementation without retraining... - TimS
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From neale at woozle.org  Tue Jan 28 12:13:00 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 28 15:13:04 2003
Subject: [Spambayes] Details in headers
In-Reply-To: <20030127223812.GA17165@inauspicious.org> (Gary Benson's
 message of "Mon, 27 Jan 2003 22:38:22 +0000")
References: <20030127223812.GA17165@inauspicious.org>
Message-ID: <w53hebtdodv.fsf@woozle.org>

Gary Benson <gary@inauspicious.org> writes:

> Hi,

Hi, Gary :)

> how do you get the X-Spambayes-Classification header to contain
> individual word scores as mentioned in INTEGRATION.txt?

$ cat <<EOD >~/.spambayesrc
[Hammie]
hammie_debug_header: True
EOD
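
To check that a file like the one above parses the way you expect, here is a small sketch using the standard configparser module from current Python 3.  This is not how Spambayes itself loads its options; the function name and the SAMPLE constant are made up for illustration.

```python
import configparser

# Same contents as the ~/.spambayesrc written by the heredoc above.
SAMPLE = """\
[Hammie]
hammie_debug_header: True
"""


def debug_header_enabled(text):
    """Return the hammie_debug_header setting, or False if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getboolean("Hammie", "hammie_debug_header", fallback=False)
```

configparser accepts both "name: value" and "name = value", so the colon form shown above reads fine.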


For those in the windows crowd, please tell me a sensible place to look
for a .ini file, and then I'll tell you how to do this on your platform
:)

Neale

From tim at fourstonesExpressions.com  Tue Jan 28 14:17:14 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Tue Jan 28 15:17:56 2003
Subject: [Spambayes] Details in headers
In-Reply-To: <w53hebtdodv.fsf@woozle.org>
Message-ID: <PJVS2PJ51LHM71XLF851V6264UQLK.3e36e54a@myst>

1/28/2003 2:13:00 PM, Neale Pickett <neale@woozle.org> wrote:

>Gary Benson <gary@inauspicious.org> writes:
>
>> Hi,
>
>Hi, Gary :)
>
>> how do you get the X-Spambayes-Classification header to contain
>> individual word scores as mentioned in INTEGRATION.txt?
>
>$ cat <<EOD >~/.spambayesrc
>[Hammie]
>hammie_debug_header: True
>EOD
>
>
>For those in the windows crowd, please tell me a sensible place to look
>for a .ini file, and then I'll tell you how to do this on your platform
>:)

The right way to do it in any case is to use the Option Configurator.  I think 
it's now invoked from the pop3proxy?  Richie knows this stuff now.  I haven't 
gotten it all going since the reorg yet.  - TimS
>
>Neale


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From T.A.Meyer at massey.ac.nz  Wed Jan 29 09:22:26 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Jan 28 15:23:05 2003
Subject: [Spambayes] Details in headers
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD32@its-xchg4.massey.ac.nz>

> For those in the windows crowd, please tell me a sensible 
> place to look
> for a .ini file

This was asked before I think, so I guess it needs an answer :)

Based on a search through my system, the overwhelming vote from developers seems to be in the directory of the application itself (so spambayes/).  The leading second choice is in "[main drive]:\Documents and Settings\[username]\Application Data\[Application Name]\".

I suspect the second one, although less common, is more correct.  Perhaps that's just the way I'm leaning ;).

This is from a Win2k system; IIRC NT and XP have the same sort of structure, but Win9* would not.

=Tony Meyer

From piersh at friskit.com  Tue Jan 28 13:18:41 2003
From: piersh at friskit.com (Piers Haken)
Date: Tue Jan 28 16:01:54 2003
Subject: [Spambayes] Details in headers
Message-ID: <9891913C5BFE87429D71E37F08210CB9297554@zeus.sfhq.friskit.com>

The recommended way to get this path is to call:

SHGetSpecialFolderPath (,,CSIDL_APPDATA,FALSE);

Remember, the user may not have write access to the spambayes
installation directory.

Piers.

> -----Original Message-----
> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz] 
> Sent: Tuesday, January 28, 2003 12:22 PM
> To: spambayes@python.org
> Subject: RE: [Spambayes] Details in headers
> 
> 
> > For those in the windows crowd, please tell me a sensible
> > place to look
> > for a .ini file
> 
> This was asked before I think, so I guess it needs an answer :)
> 
> Based on a search through my system, the overwhelming vote 
> from developers seems to be in the directory of the 
> application itself (so spambayes/).  The leading second 
> choice is in "[main drive]:\Documents and 
> Settings\[username]\Application Data\[Application Name]\".
> 
> I suspect the second one, although less common, is more 
> correct.  Perhaps that's just the way I'm leaning ;).
> 
> This is from a Win2k system; IIRC NT and XP have the same 
> sort of structure, but Win9* would not.
> 
> =Tony Meyer
> 
From mhammond at skippinet.com.au  Wed Jan 29 08:20:30 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Jan 28 16:21:34 2003
Subject: [Spambayes] Details in headers
In-Reply-To: <9891913C5BFE87429D71E37F08210CB9297554@zeus.sfhq.friskit.com>
Message-ID: <012201c2c713$167bbfb0$530f8490@eden>

> The recommended way to get this path is to call:
> 
> SHGetSpecialFolderPath (,,CSIDL_APPDATA,FALSE);
> 
> Remember, the user may not have write access to the spambayes
> installation directory.

MS are starting to come up with good reasons for doing this too.  Apart from
the "Portable profiles" (or whatever they called it where your full profile
followed you wherever you went on the corporate LAN), Windows XP now has a
"wizard" that lets you migrate all of your documents and settings to another
computer.  A friend of mine tried this, and it worked well - except it did
require that apps stored their data in these correct places.

The Outlook addin is almost certainly going to use this API.  FYI, it is
used from Python thusly:

>>> from win32com.shell import shell, shellcon
<snip lots of FutureWarnings :( >

>>> shell.SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA)
u'E:\\Documents and Settings\\skip\\Application Data'
>>> 

Mark.
From T.A.Meyer at massey.ac.nz  Wed Jan 29 10:23:08 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Jan 28 16:23:46 2003
Subject: [Spambayes] Details in headers
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3A7@its-xchg4.massey.ac.nz>

[Piers]
> The recommended way to get this path is to call: 
> SHGetSpecialFolderPath (,,CSIDL_APPDATA,FALSE); 
> Remember, the user may not have write access to the spambayes installation directory. 

Hmm...yes I should have said that.  Although:

from win32com.shell import shell, shellcon
path = shell.SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA)

is somewhat less C and somewhat more Python.  (I think the third (create directory) parameter defaults to false.)
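
A sketch of how that call might be wrapped so the same code also runs where win32com (pywin32) is not installed.  The function name, the "spambayes" subdirectory, and the fallbacks to the APPDATA environment variable and then the home directory are all assumptions for illustration, not anything Spambayes currently does.

```python
import os


def config_dir(app_name="spambayes"):
    """Return a per-user directory in which to keep app_name's settings."""
    try:
        # Needs the pywin32 extensions; only importable on Windows.
        from win32com.shell import shell, shellcon
        base = shell.SHGetSpecialFolderPath(0, shellcon.CSIDL_APPDATA)
    except ImportError:
        # Assumed fallback: APPDATA if it is set, else the home directory.
        base = os.environ.get("APPDATA") or os.path.expanduser("~")
    return os.path.join(base, app_name)
```

The function only builds the path; creating the directory is left to the caller.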

=Tony Meyer

From richie at entrian.com  Tue Jan 28 21:34:58 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Jan 28 16:35:48 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <a05200f16ba5b401d0e41@[10.0.1.2]>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
	<c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
	<a05210222ba5b3e6d4f14@[192.168.1.101]>
	<20030127135919.A15195@discworld.dyndns.org>
	<a05200f16ba5b401d0e41@[10.0.1.2]>
Message-ID: <vntd3vsp1kgg2nhnj6qravvt1cjq7pfutr@4ax.com>


[Tony]
>   sh: ulimit -s 2048
> Would it be desirable to have pop3proxy.py take care of this?

Is that possible?  Can a process increase its own stack size?  Or would we
need a shellscript wrapper?  Any Mac OS X users fancy taking on the job?

Questions, questions... 8-)

-- 
Richie Hindle
richie@entrian.com


From tony-bayes at lownds.com  Tue Jan 28 13:58:42 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Tue Jan 28 16:58:54 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <vntd3vsp1kgg2nhnj6qravvt1cjq7pfutr@4ax.com>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
 <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
 <a05210222ba5b3e6d4f14@[192.168.1.101]>
 <20030127135919.A15195@discworld.dyndns.org>
 <a05200f16ba5b401d0e41@[10.0.1.2]>
 <vntd3vsp1kgg2nhnj6qravvt1cjq7pfutr@4ax.com>
Message-ID: <a05200f44ba5ca995f239@[10.0.1.2]>

At 9:34 PM +0000 1/28/03, Richie Hindle wrote:
>[Tony]
>>    sh: ulimit -s 2048
>>  Would it be desirable to have pop3proxy.py take care of this?
>
>Is that possible?  Can a process increase its own stack size?

Yep!

STACK_NEED = 4<<20
import resource
soft, hard = resource.getrlimit (resource.RLIMIT_STACK)
if soft < STACK_NEED:
   resource.setrlimit (resource.RLIMIT_STACK, (STACK_NEED, hard))
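
A slightly more defensive variant, as a sketch only: it assumes the request should be capped at the hard limit and that a missing `resource` module (e.g. on Windows) should be ignored.  The function names `new_soft_limit` and `ensure_stack` are made up for illustration:

```python
def new_soft_limit(soft, hard, need, infinity):
    """Return the soft limit to request, or None if nothing should change."""
    if soft == infinity or soft >= need:
        return None                      # already large enough
    if hard != infinity and hard < need:
        # Cannot exceed the hard limit; go as far as it allows.
        return hard if hard > soft else None
    return need


def ensure_stack(need=4 << 20):
    """Try to raise RLIMIT_STACK to `need` bytes; return True if satisfied."""
    try:
        import resource
    except ImportError:
        return False                     # no resource module on this platform
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    target = new_soft_limit(soft, hard, need, resource.RLIM_INFINITY)
    if target is None:
        return soft == resource.RLIM_INFINITY or soft >= need
    try:
        resource.setrlimit(resource.RLIMIT_STACK, (target, hard))
    except (ValueError, OSError):
        return False                     # the platform refused the request
    return target >= need
```

Keeping the arithmetic in `new_soft_limit` separate from the system calls makes the limit logic easy to check without touching the real process limits.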

>  Or would we
>need a shellscript wrapper?  Any Mac OS X users fancy taking on the job?

Sure - it's a matter of machinery really.

>Questions, questions... 8-)
>

Where would I put this? My suggestion is

spambayes/platform.py

That file would contain code like:

import sys

if sys.platform == 'win32':
   from platform_win import *
elif sys.platform == 'darwin':
   from platform_darwin import *
else:
   # set any defaults
   pass

Then, other parts of spambayes could get attributes from 
spambayes.platform, like, say, where to store database files by 
default. A little machinery for platform-specific stuff seems way 
better to me than sprinkling "if sys.platform..." checks all over the 
place.
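
To make the idea concrete, here is a sketch of what such a module could expose.  Everything in it is hypothetical: the `platform_defaults` function, the `data_dir` and `raise_stack` keys, and the directory names are assumptions chosen for illustration, not existing spambayes settings:

```python
import os
import sys


def _windows_defaults():
    # Assumed default: the per-user Application Data directory.
    base = os.environ.get("APPDATA", os.path.expanduser("~"))
    return {"data_dir": os.path.join(base, "SpamBayes"), "raise_stack": False}


def _darwin_defaults():
    # Mac OS X is the platform where the small default stack was reported.
    return {"data_dir": os.path.expanduser("~/.spambayes"), "raise_stack": True}


def _generic_defaults():
    return {"data_dir": os.path.expanduser("~/.spambayes"), "raise_stack": False}


def platform_defaults(platform_name=None):
    """Return the settings dict for `platform_name` (default: this platform)."""
    name = sys.platform if platform_name is None else platform_name
    if name.startswith("win"):
        return _windows_defaults()
    if name == "darwin":
        return _darwin_defaults()
    return _generic_defaults()
```

Callers would then ask for `platform_defaults()["data_dir"]` rather than testing `sys.platform` themselves.  One thing to watch: a module called platform.py shares its name with the standard library's `platform` module, so a different file name may avoid confusion.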

-Tony


From skip at pobox.com  Tue Jan 28 16:08:09 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Jan 28 17:08:27 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <vntd3vsp1kgg2nhnj6qravvt1cjq7pfutr@4ax.com>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
        <c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
        <a05210222ba5b3e6d4f14@[192.168.1.101]>
        <20030127135919.A15195@discworld.dyndns.org>
        <a05200f16ba5b401d0e41@[10.0.1.2]>
        <vntd3vsp1kgg2nhnj6qravvt1cjq7pfutr@4ax.com>
Message-ID: <15926.65353.991418.385713@montanaro.dyndns.org>


    Richie> [Tony]
    >> sh: ulimit -s 2048
    >> Would it be desirable to have pop3proxy.py take care of this?

    Richie> Is that possible?  Can a process increase its own stack size?
    Richie> Or would we need a shellscript wrapper?  Any Mac OS X users
    Richie> fancy taking on the job?

This topic came up on python-dev.  The conclusion there was that the
regression test script should take care of this for its own needs, but not
to do this in general.  Here's the relevant code from Lib/test/regrtest.py:

    # MacOSX (a.k.a. Darwin) has a default stack size that is too small
    # for deeply recursive regular expressions.  We see this as crashes in
    # the Python test suite when running test_re.py and test_sre.py.  The
    # fix is to set the stack limit to 2048.
    # This approach may also be useful for other Unixy platforms that
    # suffer from small default stack limits.
    if sys.platform == 'darwin':
        try:
            import resource
        except ImportError:
            pass
        else:
            soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
            newsoft = min(hard, max(soft, 1024*2048))
            resource.setrlimit(resource.RLIMIT_STACK, (newsoft, hard))

Skip

From noreply at sourceforge.net  Tue Jan 28 14:19:30 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Tue Jan 28 17:44:32 2003
Subject: [Spambayes] 
 [ spambayes-Feature Requests-676401 ] Outlook: Storage in default
 user directory
Message-ID: <E18de4o-0005lo-00@sc8-sf-web3.sourceforge.net>

Feature Requests item #676401, was opened at 2003-01-29 11:19
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=676401&group_id=61702

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Outlook: Storage in default user directory

Initial Comment:
Follows from comments in spambayes list from Piers 
Haken and Mark Hammond.

It would be nice if the plugin stored the pck and ini files 
in a more appropriate folder than the outlook root folder - 
as Piers commented, the user might not have write 
access there.

The folder SHGetSpecialFolderPath(0, 
shellcon.CSIDL_APPDATA) would probably be the best 
place.  The pck's are created by the plugin and so are 
easy; how the default .ini file gets there is another issue.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=676401&group_id=61702

From gary at inauspicious.org  Tue Jan 28 23:58:00 2003
From: gary at inauspicious.org (Gary Benson)
Date: Tue Jan 28 18:58:07 2003
Subject: [Spambayes] Details in headers
In-Reply-To: <w53hebtdodv.fsf@woozle.org>
References: <20030127223812.GA17165@inauspicious.org>
	<w53hebtdodv.fsf@woozle.org>
Message-ID: <20030128235800.GK19499@inauspicious.org>

Neale Pickett wrote:
> Gary Benson <gary@inauspicious.org> writes:
> 
> > Hi,
> 
> Hi, Gary :)

Well hello Neale :)

> > how do you get the X-Spambayes-Classification header to contain
> > individual word scores as mentioned in INTEGRATION.txt?
> 
> $ cat <<EOD >~/.spambayesrc
> [Hammie]
> hammie_debug_header: True
> EOD

Thank you.  I was using hammiefilter and was trying:

| [hammiefilter]
| hammie_debug_header: True

(but that doesn't work)

Cheers,
Gary

[ gary@inauspicious.org ][ GnuPG 85A8F78B ][ http://inauspicious.org/ ]

From neale at woozle.org  Tue Jan 28 16:58:13 2003
From: neale at woozle.org (Neale Pickett)
Date: Tue Jan 28 19:58:40 2003
Subject: [Spambayes] Details in headers
In-Reply-To: <20030128235800.GK19499@inauspicious.org> (Gary Benson's
 message of "Tue, 28 Jan 2003 23:58:00 +0000")
References: <20030127223812.GA17165@inauspicious.org>
	<w53hebtdodv.fsf@woozle.org> <20030128235800.GK19499@inauspicious.org>
Message-ID: <w53vg08db6i.fsf@woozle.org>

Gary Benson <gary@inauspicious.org> writes:

> Thank you.  I was using hammiefilter and was trying:
>
> | [hammiefilter]
> | hammie_debug_header: True

Yeah.  One thing we should put on the roadmap is getting rid of the
hammie_ prefix on config items under the [hammie] .ini section.  Right
now we're ignoring the section names and creating namespaces based on
property name.  I think that's confusing, to say nothing of the needless
extra typing.
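
As a rough illustration of section-based namespaces, here is a sketch using the standard `configparser` module from current Python rather than the spambayes options code.  The un-prefixed option name `debug_header` and the helper names are assumptions for the example:

```python
import configparser

SAMPLE = """\
[Hammie]
debug_header: True

[hammiefilter]
debug_header: False
"""


def load_options(text):
    """Parse ini-style text into a ConfigParser instance."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def get_flag(parser, section, option, default=False):
    """Look an option up by (section, option) instead of by a prefixed name."""
    if parser.has_option(section, option):
        return parser.getboolean(section, option)
    return default
```

With lookups keyed on the section, the same option name can mean different things under [Hammie] and [hammiefilter], and a setting placed in the wrong section simply falls back to the default instead of being silently honoured or ignored by prefix.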

Neale

From anthony at interlink.com.au  Wed Jan 29 14:18:55 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Jan 28 22:21:04 2003
Subject: [Spambayes] vague standardisation of whitespace in code?
Message-ID: <200301290318.h0T3IwD28422@localhost.localdomain>

I'm just about to do a mungo-commit to clean up the whitespace
issues over the whole codebase (using reindent.py) to try and
keep things neat :) 

Is it worth putting something in the CVS commit scripts to 
either fix whitespace, or else to whine if it's not in the
usual 4 space indents?


From T.A.Meyer at massey.ac.nz  Wed Jan 29 16:23:57 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Jan 28 22:24:33 2003
Subject: [Spambayes] Outlook Plugin: Resetting messages as unread
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3CF@its-xchg4.massey.ac.nz>

I've noticed recently that when a message is scored it gets reset back to 'unread' (which it normally would be, but when my machine is working hard I can read a message before it manages to get scored).

Should/can this be fixed?

=Tony Meyer

From tim_one at email.msn.com  Tue Jan 28 22:28:51 2003
From: tim_one at email.msn.com (Tim Peters)
Date: Tue Jan 28 22:29:06 2003
Subject: [Spambayes] vague standardisation of whitespace in code?
In-Reply-To: <200301290318.h0T3IwD28422@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCKEOADKAB.tim_one@email.msn.com>

[Anthony Baxter]
> I'm just about to do a mungo-commit to clean up the whitespace
> issues over the whole codebase (using reindent.py) to try and
> keep things neat :)

    reindent -v -r .

from the root should do it all.

> Is it worth putting something in the CVS commit scripts to
> either fix whitespace, or else to whine if it's not in the
> usual 4 space indents?

Nope -- you can't get people to care (enough), different editors leave
different kinds of slop behind, and running reindent every now & again is
painless.  The std checkin comment for this is "Whitespace normalization" --
every Python and Zope developer instinctively ignores such checkins, so it's
also a good comment to make on a controversial change you don't want anyone
to notice <wink>.
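
For anyone curious what a "whine" script might look at before running reindent, a small reporting sketch follows.  It is not part of reindent.py, the function name is invented, and it deliberately does not check for multiples of four spaces, since continuation lines legitimately differ:

```python
def whitespace_problems(source):
    """Yield (line_number, problem) pairs for a string of source code."""
    for number, line in enumerate(source.splitlines(), 1):
        body = line.lstrip(" \t")
        indent = line[:len(line) - len(body)]
        if "\t" in indent:
            yield number, "tab in indentation"
        if line != line.rstrip():
            yield number, "trailing whitespace"
```

It only reports; fixing is still best left to reindent itself.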


From Paul.Moore at atosorigin.com  Wed Jan 29 09:14:44 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Wed Jan 29 04:16:51 2003
Subject: [Spambayes] Details in headers
Message-ID: <16E1010E4581B049ABC51D4975CEDB886199BC@UKDCX001.uk.int.atosorigin.com>

From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz]
> Based on a search through my system, the overwhelming vote from
> developers seems to be in the directory of the application itself (so
> spambayes/). The leading second choise is in "[main drive]:\Documents
> and Settings\[username]\Application Data\[Application Name]\".

One big problem with the second option is that the "Application Data"
directory in the middle of that is hidden, and on Windows this makes it
*very* hard to get at. Explorer doesn't display it unless you change a
global setting, command line completion and the like ignores it. You
basically have to type it in exactly as written, with no help at all
from the system. And don't forget the quotes you need because of the
space!

I agree the second option is by far the most correct. But putting it
there makes it 99.9% inaccessible for all but the most determined of
users. So you'd better not be expecting the user to edit it manually.
And your install/uninstall process had better create and delete the
directory with no user intervention (for all users, not just the one
doing the (un)install!!!)

As usual, MS had a sensible idea, and then broke it totally in the name
of "user friendliness" :-(

Paul.

From piersh at friskit.com  Wed Jan 29 02:59:30 2003
From: piersh at friskit.com (Piers Haken)
Date: Wed Jan 29 05:42:36 2003
Subject: [Spambayes] Details in headers
Message-ID: <9891913C5BFE87429D71E37F08210CB9297557@zeus.sfhq.friskit.com>

I think that's a bit harsh. The directory is called "Application Data",
not "My Documents": it's designed to be used by well-behaved
applications only and it's generally a bad idea for users to go mucking
about with stuff in there (we're not talking Python developers here,
folks).

Also, it's generally considered bad form to delete the users' data when
the application is uninstalled. The idea is that the user can pick up
where they left off if they reinstall the program. Would you like the
Office uninstall program to go through your hard drive deleting all your
word documents? I think not...

Piers.

> -----Original Message-----
> From: Moore, Paul [mailto:Paul.Moore@atosorigin.com] 
> Sent: Wednesday, January 29, 2003 1:15 AM
> To: Meyer, Tony; spambayes@python.org
> Subject: RE: [Spambayes] Details in headers
> 
> 
> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz]
> > Based on a search through my system, the overwhelming vote from 
> > developers seems to be in the directory of the application 
> itself (so 
> > spambayes/). The leading second choise is in "[main 
> drive]:\Documents 
> > and Settings\[username]\Application Data\[Application Name]\".
> 
> One big problem with the second option is that the 
> "Application Data" directory in the middle of that is hidden, 
> and on Windows this makes it
> *very* hard to get at. Explorer doesn't display it unless you 
> change a global setting, command line completion and the like 
> ignores it. You basically have to type it in exactly as 
> written, with no help at all from the system. And don't 
> forget the quotes you need because of the space!
> 
> I agree the second option is by far the most correct. But 
> putting it there makes it 99.9% inaccessible for all but the 
> most determined of users. So you'd better not be expecting 
> the user to edit it manually. And your install/uninstall 
> process had better create and delete the directory with no 
> user intervention (for all users, not just the one doing the 
> (un)install!!!)
> 
> As usual, MS had a sensible idea, and then broke it totally 
> in the name of "user friendliness" :-(
> 
> Paul.
> 
> _______________________________________________
> Spambayes mailing list
> Spambayes@python.org http://mail.python.org/mailman/listinfo/spambayes
> 
From Paul.Moore at atosorigin.com  Wed Jan 29 12:35:10 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Wed Jan 29 07:37:06 2003
Subject: [Spambayes] Details in headers
Message-ID: <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com>

From: Piers Haken [mailto:piersh@friskit.com]
> I think that's a bit harsh. The directory is called
> "Application Data", not "My Documents": it's designed
> to be used by well-behaved applications only and it's
> generally a bad idea for users to go mucking about with
> stuff in there

Sorry - I thought we were talking about the location of
the INI file, which (at the moment, at least) is intended
to be user editable.

I've no problem with this location for purely application
maintained configuration data.

But I still think that there should at least be an option
for the application directory to get deleted on uninstall -
otherwise you get the same problem as with the registry of
configuration data for uninstalled applications just getting
left around and forgotten.

Paul.

From neale at woozle.org  Wed Jan 29 08:07:54 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 29 11:08:42 2003
Subject: [Spambayes] Details in headers
In-Reply-To: 
	<16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com>
	("Moore, Paul"'s message of "Wed, 29 Jan 2003 12:35:10 -0000")
References: <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com>
Message-ID: <w53k7go2b39.fsf@woozle.org>

"Moore, Paul" <Paul.Moore@atosorigin.com> writes:

> Sorry - I thought we were talking about the location of the INI file,
> which (at the moment, at least) is intended to be user editable.

Yes.  I was, at least.

> But I still think that there should at least be an option for the
> application directory to get deleted on uninstall - otherwise you get
> the same problem as with the registry of configuration data for
> uninstalled applications just getting left around and forgotten.

So I think I'm going to do what I did the last time I brought this up,
which was nothing.  I'm not in a good position to tell windows users
where their files should live, so I'm going to punt.  Hopefully someone
who is in a good position to make a decision about this will check
something in, and then we'll all just use that.

We could always put it in C:\spam.ini ;)

Neale

From neale at woozle.org  Wed Jan 29 08:21:04 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 29 11:21:09 2003
Subject: [Spambayes] Outlook plugin notes
Message-ID: <w53adhk2ahb.fsf@woozle.org>

Hokay.  I gave a talk on SpamBayes at work the week before the spam
conference, and now all these people are hopping up and down wanting to
run it.  One of our more tenacious tech writers installed the bugger and
hit me with a list of suggestions, which I said I'd pass on to you fine
folks.  So here you are.  Please excuse me if any of these are already
solved--she pulled down the released copy on our web page AIUI.

* In her words, "When you filter to an online folder, SpamBayes
  automatically disables filtering when you connect offline. What I
  would like is that when I reconnect, SpamBayes should automatically
  reenable filtering and run it against those folders. Now I have to do
  this manually."

* She says that the plugin is definitely not filtering public folders.

* Apparently Outlook comes with a "Junk Email" folder.  Instead of
  telling folks to create a "Spam Certain" folder, just have the plugin
  default to sending spam into the Junk folder where folks are used to
  filtering their spam already.

* She feels end-users need more education about what "spam-possible"
  means.

* The sliders in the configuration window should have tick marks.

* In the anti-spam dialog box:

  o  Enable filtering checkbox should be below filters, since you have
     to enable filtering before you can mess with the filters.

  o  The filters box needs a scrollbar, for those with a ton of folders
     to filter so you can see the text.

* Add a "spam column" in the anti-spam pulldown, so it's easy to add a
  new "spam %" column in the current folder view.

* She suggested deleting from public folders should go into a public
  spam folder.

And finally the one I find most intriguing:

* All outbound mail should be trained as ham

I really like this last one.  I don't know if anyone's ever thought of
training on outbound mail before.

Anyhow, that's all.  For the time being, feel free to use me as a
go-between.  I may demand she join this list if I have to relay too
much, though :)

Oh, and by the way, she really digs the Outlook plugin.  Like, her
diggatude is off the charts, that's how much she digs it.

Neale


From tim at fourstonesExpressions.com  Wed Jan 29 10:36:43 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Jan 29 11:37:25 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: <w53adhk2ahb.fsf@woozle.org>
Message-ID: <2UO09B0NIAMKTFAICLK4XH4YGFMJ.3e38031b@myst>

1/29/2003 10:21:04 AM, Neale Pickett <neale@woozle.org> wrote:

>Hokay.  I gave a talk on SpamBayes at work the week before the spam
>conference, and now all these people are hopping up and down wanting to
>run it.  One of our more tenacious tech writers installed the bugger and
>hit me with a list of suggestions, which I said I'd pass on to you fine
>folks.  So here you are.  Please excuse me if any of these are already
>solved--she pulled down the released copy on our web page AIUI.
>
>* In her words, "When you filter to an online folder, SpamBayes
>  automatically disables filtering when you connect offline. What I
>  would like is that when I reconnect, SpamBayes should automatically
>  reenable filtering and run it against those folders. Now I have to do
>  this manually."
>
>* She says that the plugin is definitely not filtering public folders.
>
>* Apparently Outlook comes with a "Junk Email" folder.  Instead of
>  telling folks to create a "Spam Certain" folder, just have the plugin
>  default to sending spam into the Junk folder where folks are used to
>  filtering their spam already.
>
>* She feels end-users need more education about what "spam-possible"
>  means.
>
>* The sliders in the configuration window should have tick marks.
>
>* In the anti-spam dialog box:
>
>  o  Enable filtering checkbox should be below filters, since you have
>     to enable filtering before you can mess with the filters.

Leave it to a tech writer....

>
>  o  The filters box needs a scrollbar, for those with a ton of folders
>     to filter so you can see the text.
>
>* Add a "spam column" in the anti-spam pulldown, so it's easy to add a
>  new "spam %" column in the current folder view.
>
>* She suggested deleting from public folders should go into a public
>  spam folder.
>
>And finally the one I find most intriguing:
>
>* All outbound mail should be trained as ham
>
>I really like this last one.  I don't know if anyone's ever thought of
>training on outbound mail before.

I made this suggestion a long time ago, and the powers-that-be decided it was 
decidedly useless.  Don't quite remember why... - TimS

>
>Anyhow, that's all.  For the time being, feel free to use me as a
>go-between.  I may demand she join this list if I have to relay too
>much, though :)
>
>Oh, and by the way, she really digs the Outlook plugin.  Like, her
>diggatude is off the charts, that's how much she digs it.
>
>Neale
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From skip at pobox.com  Wed Jan 29 11:04:46 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 29 12:10:16 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: <w53adhk2ahb.fsf@woozle.org>
References: <w53adhk2ahb.fsf@woozle.org>
Message-ID: <15928.2478.537965.516443@montanaro.dyndns.org>


    Neale> One of our more tenacious tech writers ...

You know when a company has good tech writers because their documentation is
head and shoulders above the competition's.  I like to think of them as
librarians without the Donna Reed ("It's a Wonderful Life") personality. ;-)

Good tech writers also make extremely good testers because they want the
documentation and the application to match exactly.

Skip

From richie at entrian.com  Tue Jan 28 17:50:47 2003
From: richie at entrian.com (Richie Hindle)
Date: Wed Jan 29 12:51:38 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <w53ptqi1ej4.fsf@woozle.org>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<w53ptqi1ej4.fsf@woozle.org>
Message-ID: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>


[Neale]
> Thanks a ton for putting a release together, Richie.

No problem.  I'm hoping to do this on Friday evening UK time, if that's OK
with everyone else?

I'll fix the Mac OS X stack size problem for pop3proxy before the release -
I may not have time to do it "properly" by introducing a new
platform-dependent module, but we can munge things around afterwards.  It's
more important to get a release out before the Linux Journal articles are
published, and it only seems to be the pop3proxy that has the problem.

François, I haven't forgotten about your pop3proxy problem - if I have time
for a deeper investigation I'll do one.

-- 
Richie

From tim at fourstonesExpressions.com  Wed Jan 29 12:21:06 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Wed Jan 29 13:21:56 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: <15928.2478.537965.516443@montanaro.dyndns.org>
Message-ID: <WRGAFDN72MLPK7Q4398YUIHHE87HD.3e381b92@myst>

1/29/2003 11:04:46 AM, Skip Montanaro <skip@pobox.com> wrote:

>
>    Neale> One of our more tenacious tech writers ...
>
>You know when a company has good tech writers because their documentation is
>head and shoulders above the competition's.  I like to think of them as
>librarians without the Donna Reed ("It's a Wonderful Life") personality. ;-)
>
>Good tech writers also make extremely good testers because they want the
>documentation and the application to match exactly.
Think we could get her to write our doc?  - TimS
>
>Skip
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From francois.granger at free.fr  Wed Jan 29 20:02:18 2003
From: francois.granger at free.fr (Francois Granger)
Date: Wed Jan 29 14:02:29 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
 <w53ptqi1ej4.fsf@woozle.org> <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
Message-ID: <a05200f2aba5dd54b1b98@[192.168.1.20]>

At 17:50 +0000 28/01/2003, in message Re: [Spambayes] Alpha 2 
Release?, Richie Hindle wrote:
>[Neale]
>>  Thanks a ton for putting a release together, Richie.

My thanks as well.

>François, I haven't forgotten about your pop3proxy problem - if I have time
>for a deeper investigation I'll do one.

No problem.
I have not seen the classification problem since the other day. It 
seems that it is solved.

I got a new fresh traceback tonight when I asked for review:

Traceback (most recent call last):

   File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in 
found_terminator
     getattr(plugin, name)(**params)

   File "/Volumes/OS99/spambayes/pop3proxy.py", line 932, in onReview
     self._appendMessages(page.table, messages, label)

   File "/Volumes/OS99/spambayes/pop3proxy.py", line 823, in _appendMessages
     table += row

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 787, in __iadd__
     nodes = self._nodeListFromSource(other)

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 640, 
in _nodeListFromSource
     tree = _generateTree("<x>"+value+"</x>")

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 574, 
in _generateTree
     g.feed(source)

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 499, in feed
     self._parser.Parse(data)

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 529, 
in StartElementHandler
     newAttributes[str(name)] = self._unmungeEntities(str(value))

UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in 
position 86: ordinal not in range(128)


-- 
Recently using MacOSX.......

From neale at woozle.org  Wed Jan 29 12:38:21 2003
From: neale at woozle.org (Neale Pickett)
Date: Wed Jan 29 15:38:31 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f2aba5dd54b1b98@[192.168.1.20]> (Francois Granger's
 message of "Wed, 29 Jan 2003 20:02:18 +0100")
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<w53ptqi1ej4.fsf@woozle.org>
	<88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]>
Message-ID: <w53r8avzo76.fsf@woozle.org>

Francois Granger <francois.granger@free.fr> writes:

> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
> position 86: ordinal not in range(128)

Yeah, my wife's been getting those too.  I'll look into her traceback.

Yikes!  I just swallowed the tine of a plastic fork!

Neale

From richie at entrian.com  Tue Jan 28 21:41:33 2003
From: richie at entrian.com (Richie Hindle)
Date: Wed Jan 29 16:46:34 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <w53r8avzo76.fsf@woozle.org>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<w53ptqi1ej4.fsf@woozle.org> <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
Message-ID: <c2ud3vk5hbtoukl1547q8eesvrfl3a7bbe@4ax.com>


[François]
> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
> position 86: ordinal not in range(128)

This is bizarre.  This is expat complaining that you can't have high-bit
characters in ASCII XML, which is quite right, but I replace all those
characters with charrefs on the way in:

>>> def replaceHighCharacters(match):
...     return "&#%d;" % ord(match.group(1))
...
>>> re.sub('([\x80-\xff])', replaceHighCharacters, u"a b \xe9 c d")
u'a b &#233; c d'

So what's going on...?
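
One possible explanation, offered as a guess rather than a diagnosis: the traceback ends in `str(value)` inside StartElementHandler, which runs after parsing.  expat resolves character references back into real characters before it calls the handlers, so escaping the input keeps the source ASCII but does not keep non-ASCII characters out of the attribute values.  The sketch below shows that behaviour in current Python; the function names are invented, and the regex is widened to every non-ASCII character rather than just \x80-\xff:

```python
import re
import xml.parsers.expat


def replace_high_characters(text):
    """Replace each non-ASCII character with a numeric character reference."""
    return re.sub(r"[^\x00-\x7f]", lambda m: "&#%d;" % ord(m.group(0)), text)


def first_element_attributes(source):
    """Parse `source` with expat and return the root element's attributes."""
    found = []

    def start_element(name, attributes):
        found.append(dict(attributes))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(source, True)
    return found[0]


# The escaped source is pure ASCII, but the handler sees the real character.
escaped = replace_high_characters('<x title="caf\xe9"/>')
attributes = first_element_attributes(escaped)
```

If that is what is happening, encoding the handler's value as ASCII would fail no matter how carefully the input was escaped, and the fix would belong in the handler rather than in the escaping step.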


> Yikes!  I just swallowed the tine of a plastic fork!

That'll teach you to try to get out of doing the washing up.  8-)

-- 
Richie Hindle
richie@entrian.com


From francois.granger at free.fr  Wed Jan 29 22:40:03 2003
From: francois.granger at free.fr (Francois Granger)
Date: Wed Jan 29 16:57:50 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <w53r8avzo76.fsf@woozle.org>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
Message-ID: <a05200f2bba5dfa5c91b6@[192.168.1.20]>

At 12:38 -0800 29/01/2003, in message Re: [Spambayes] Alpha 2 
Release?, Neale Pickett wrote:
>Francois Granger <francois.granger@free.fr> writes:
>
>>  UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
>>  position 86: ordinal not in range(128)
>
>Yeah, my wife's been getting those too.  I'll look into her traceback.

Your wife speaks some foreign language ? ;-)

>Yikes!  I just swallowed the tine of a plastic fork!

My apologies for this, it is not _that_ important ;-)

Thanks for all.

-- 
Recently using MacOSX.......

From vanhorn at whidbey.com  Wed Jan 29 14:11:55 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Wed Jan 29 17:12:00 2003
Subject: [Spambayes] Details in headers
References: <16E1010E4581B049ABC51D4975CEDB886199BD@UKDCX001.uk.int.atosorigin.com>
Message-ID: <3E3851AB.18018F9E@whidbey.com>

Speaking as one who provides tech support for a hundred or so Windows
users, I find it perverse to put any file where Windows may change it or
obscure it. Against the flow from Redmond, I want my users to put their
data in folders they specifically control (normally on a file server,
never in "My Documents"). I want applications to put everything possible
in their respective directories, not in the registry, not in the current
equivalent of /windows/system. (And I always want to see all file
extensions!)

I imagine that I'll end up putting some form of Spambayes on at least a
couple of dozen systems, so I'll get used to whatever is done, but I'd
strongly prefer that the file locations be easily understood and easily
learned so others don't have to spend so much time when a user messes an
installation up.

Van

"Moore, Paul" wrote:

> From: Piers Haken [mailto:piersh@friskit.com]
> > I think that's a bit harsh. The directory is called
> > "Application Data", not "My Documents": it's designed
> > to be used by well-behaved applications only and it's
> > generally a bad idea for users to go mucking about with
> > stuff in there
>
> Sorry - I thought we were talking about the location of
> the INI file, which (at the moment, at least) is intended
> to be user editable.
>
> I've no problem with this location for purely application
> maintained configuration data.
>
> But I still think that there should at least be an option
> for the application directory to get deleted on uninstall -
> otherwise you get the same problem as with the registry of
> configuration data for uninstalled applications just getting
> left around and forgotten.
>
> Paul.
>

--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD

For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------


From T.A.Meyer at massey.ac.nz  Thu Jan 30 11:32:01 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Jan 29 17:32:40 2003
Subject: [Spambayes] Outlook plugin notes
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318CD37@its-xchg4.massey.ac.nz>

> * In her words, "When you filter to an online folder, SpamBayes
>   automatically disables filtering when you connect offline. What I
>   would like is that when I reconnect, SpamBayes should automatically
>   reenable filtering and run it against those folders. Now I 
>   have to do this manually."
I don't use any offline features, so I can't comment on this.

> * Apparently Outlook comes with a "Junk Email" folder.
Hmm...does anyone else's Outlook have a "Junk Email" folder?  Mine (2000 SR1) certainly didn't come with one.

> * She feels end-users need more education about what "spam-possible" means.
That would be a documentation issue, right?  And wasn't she a writer? ....

> * The sliders in the configuration window should have tick marks.
I guess I would agree with that.  I don't know who would even use the sliders when there's a text box just there, but ...

> * In the anti-spam dialog box:
>   o  Enable filtering checkbox should be below filters, since you have
>      to enable filtering before you can mess with the filters.
Agreed.

>   o  The filters box needs a scrollbar, for those with a ton of folders
>      to filter so you can see the text.
Or some other way of showing them all.  A scrollbar would make it ugly, wouldn't it?

> * Add a "spam column" in the anti-spam pulldown, so it's easy to add a
>   new "spam %" column in the current folder view.
Is it possible to customise the current view via code?  I wondered about doing this myself, since I seem to be constantly adding the column to new folders, but couldn't find any information about doing so (I must admit I didn't look that hard).

> * She says that the plugin is definitely not filtering public folders.
This might be an issue that has been resolved (I'm not sure what version the release had).  I'm checking this out on my system, but I don't have a lot of (mail) public folders (they're mostly calendars).  I'll have to wait until one of them gets mail.

> * She suggested deleting from public folders should go into a public
>   spam folder.
Perhaps there could be an option to have mail from each folder you filter:
(a) go to the same uncertain/spam folders [as now]
(b) go to individual uncertain/spam folders [one set per filtered folder]
This would be quite a big interface change, though.  Do people think it's worth it?

> * All outbound mail should be trained as ham
> I really like this last one.  I don't know if anyone's ever thought of
> training on outbound mail before.
Tim's post on this is in the November 2002 archive - "Bayes Training".  The arguments against were:
* Because some spam is 'from' yourself, this degrades the helpfulness of the From header.
* It's easy to find enough ham; adding much more skews the ham-to-spam ratio.

If a user saves their outgoing mail (in "Sent Items"), for example, then it's easy to train on that folder.  I do this.

=Tony Meyer

From T.A.Meyer at massey.ac.nz  Thu Jan 30 11:36:44 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Jan 29 17:37:19 2003
Subject: [Spambayes] Outlook plugin notes
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3D7@its-xchg4.massey.ac.nz>

> > * Apparently Outlook comes with a "Junk Email" folder.
> Hmm...does anyone else's Outlook have a "Junk Email" folder?  
> Mine (2000 SR1) certainly didn't come with one.

I take this back.  If you use the Adult Mail/Junk Mail rules that Outlook offers and choose the 'move mail' option with the 'junk mail' folder, you are prompted to create a new "Junk Mail" folder.

However, not every user will have one of these.  I would suggest that the best option would be to change the documentation to say that, _if there is one_, the Junk Mail folder should be used.  I guess the plugin could check whether there is an existing 'junk mail' folder and default to it, but then it could check for a 'spam' folder and default to that, too.  It depends on what people are likely to have.

=Tony Meyer

From ducky at webfoot.com  Wed Jan 29 14:53:06 2003
From: ducky at webfoot.com (Kaitlin Duck Sherwood)
Date: Wed Jan 29 17:49:34 2003
Subject: [Spambayes] egregious patents on anti-spam techniques
Message-ID: <p0510030fba5e07ee9a74@[10.0.0.2]>

Gang --

I've recently become aware of two egregious patent applications 
related to spam fighting.  The first one looks like it might 
conceivably cover Bayesian filtering.  It would be good if someone 
more familiar with Bayesian/classifier/machine learning theory could 
check it out and perhaps challenge ("protest") the application.

The second is on using whitelists, blacklists, challenge-response, 
and digital signatures to combat spam.  I plan to protest that one 
myself.  I have killer prior art for whitelists, blacklists, and 
challenge-response (see p.82 of _Stopping Spam_ by Schwartz & 
Garfinkel, 1998).  I do not know of prior art for using digital 
signatures in the service of stopping spam.  If you know of prior art 
for that, you might want to issue a protest and/or send me the info.

(If you send me prior art on digital signatures/spam, please
+ read the patent claims first
+ put PRIOR ART in the subject line.)

I'm going to Japan for ten days, leaving Friday morning, and will not 
have email connectivity then.

To protest a patent, you need to file prior art (within 60 days!) 
with the patent office.  See:
http://www.uspto.gov/web/offices/pac/mpep/documents/1900.htm
and
http://www.uspto.gov/web/offices/pac/mpep/documents/0600_610.htm#sect610

Patent application on adaptive spam filtering:
<http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/PTO/search-bool.html&r=3&f=G&l=50&co1=AND&d=PG01&s1=email.TTL.&OS=TTL/email&RS=TTL/email> 


Patent application on whitelists, blacklists, challenge-response, and 
digital signatures used in spam-fighting:
<http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1='20030009698'.PGNR.&OS=DN/20030009698&RS=DN/20030009698>


From francois.granger at free.fr  Thu Jan 30 08:21:40 2003
From: francois.granger at free.fr (Francois Granger)
Date: Thu Jan 30 02:21:44 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: 
 <1ED4ECF91CDED24C8D012BCF2B034F1318D3D7@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1318D3D7@its-xchg4.massey.ac.nz>
Message-ID: <a05200f2fba5e82a5cae8@[192.168.1.20]>

At 11:36 +1300 30/01/2003, in message RE: [Spambayes] Outlook plugin 
notes, Meyer, Tony wrote:
>I guess the plugin could check to see if there was an existing 'junk 
>mail' folder and default to it

In this case, beware of localization issues. The folder name may be 
translated in localized versions of Outlook.


-- 
Recently using MacOSX.......

From francois.granger at free.fr  Thu Jan 30 09:42:33 2003
From: francois.granger at free.fr (Francois Granger)
Date: Thu Jan 30 03:42:38 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f2bba5dfa5c91b6@[192.168.1.20]>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
 <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
 <a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
 <a05200f2bba5dfa5c91b6@[192.168.1.20]>
Message-ID: <a05200f00ba5e92717e71@[192.168.1.20]>

At 22:40 +0100 29/01/2003, in message Re: [Spambayes] Alpha 2 
Release?, Francois Granger wrote:
>At 12:38 -0800 29/01/2003, in message Re: [Spambayes] Alpha 2 
>Release?, Neale Pickett wrote:
>>Francois Granger <francois.granger@free.fr> writes:
>>
>>>  UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
>>>  position 86: ordinal not in range(128)

Some more info:

This error has shown up since I installed 2.3a1. It does not happen with 
my normal setup with Python 2.2.
When I removed all mail containing accented characters and kept only 
English mail with no accented characters, the error went away.

I can pack up some mails for you if anybody wants.

Side remark:
On MacOS X, upgrading from 2.2 to 2.3 changes the default database format.
The first time I started pop3proxy with 2.3a1, it created a new 
database even though the old one was available. After playing with it 
a little, I changed the line:
dbm_type = best
to
dbm_type = dbhash
and it got my old database.
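
[Editorial sketch, not part of the original message.]  One way to find out
which format an existing database file uses, before picking a dbm_type
value, is to ask the standard library.  This assumes a modern Python 3,
where the 2.x modules `anydbm` and `whichdb` became `dbm` and
`dbm.whichdb`.  The helper name `detect_db_format` and the file name are
hypothetical, and the throwaway database is built with the pure-Python
backend only so that the example runs anywhere.

```python
import dbm
import dbm.dumb
import os
import tempfile


def detect_db_format(path):
    """Return the name of the dbm module that can read `path`.

    dbm.whichdb returns a module name such as 'dbm.dumb' or 'dbm.gnu',
    '' if the format cannot be guessed, or None if the file is missing
    or unreadable.
    """
    return dbm.whichdb(path)


# Build a small throwaway database with the pure-Python backend.
workdir = tempfile.mkdtemp()
db_path = os.path.join(workdir, "example_db")  # hypothetical file name
db = dbm.dumb.open(db_path, "c")
db[b"token"] = b"1"
db.close()

print(detect_db_format(db_path))                         # dbm.dumb
print(detect_db_format(os.path.join(workdir, "nope")))   # None
```

If the reported format is not the one the default setting would pick on a
new Python version, setting dbm_type explicitly, as described above, is
the likely fix.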

Can we add in the doc that the values for this option are:
"best", "db3hash", "dbhash", "gdbm", "dumbdbm"

-- 
Recently using MacOSX.......

From Paul.Moore at atosorigin.com  Thu Jan 30 09:43:50 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Thu Jan 30 04:45:12 2003
Subject: [Spambayes] Outlook plugin notes
Message-ID: <16E1010E4581B049ABC51D4975CEDB880113D89E@UKDCX001.uk.int.atosorigin.com>

From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz]
> > * Apparently Outlook comes with a "Junk Email" folder.
> Hmm...does anyone else's Outlook have a "Junk Email"
> folder?  Mine (2000 SR1) certainly didn't come with one.

If you use Outlook's built in junk filtering (which, IMHO, is
pretty useless...) it creates a "Junk Email" folder when you
set it up. But it's not there by default.

I think it's just a normal folder, though, so you could check
for a folder called "Junk Email" and use it if it exists,
otherwise work as at present. I'm not sure if that's worth the
effort, though. Maybe just change the docs to refer to a "Junk
Email" folder rather than a "Spam" folder.
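
[Editorial sketch, not part of the original message.]  The "use it if it
exists" check could look roughly like the following.  It works on a plain
list of folder names and does not touch any Outlook API; the function
name `pick_default_spam_folder` and the default candidate list are
assumptions for illustration.  Because folder names may be translated in
localized installs, the candidates are a parameter rather than
hard-coded.

```python
def pick_default_spam_folder(folder_names,
                             candidates=("Junk Email", "Junk Mail", "Spam")):
    """Return the first existing folder matching a candidate name, else None.

    Matching ignores case and surrounding whitespace.  Candidates are
    tried in order, so earlier entries take priority.
    """
    by_key = {name.strip().casefold(): name for name in folder_names}
    for candidate in candidates:
        match = by_key.get(candidate.strip().casefold())
        if match is not None:
            return match
    return None


print(pick_default_spam_folder(["Inbox", "Sent Items", "junk mail"]))  # junk mail
print(pick_default_spam_folder(["Inbox", "Sent Items"]))               # None
```

A result of None would mean "work as at present", that is, ask the user
to choose or create a folder.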

Paul.

From mhammond at skippinet.com.au  Thu Jan 30 22:04:38 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Jan 30 06:05:33 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: <w53adhk2ahb.fsf@woozle.org>
Message-ID: <005301c2c84f$61ff0630$530f8490@eden>

Just so you know I am not ignoring this thread, I tend to agree with many of
the points.  My intention is to reply in detail as I fix them!

Also-helping-a-friend-lay-a-concrete-slab-and-am-buggered <wink> ly,

Mark.


From mwh at python.net  Thu Jan 30 11:15:00 2003
From: mwh at python.net (Michael Hudson)
Date: Thu Jan 30 06:15:07 2003
Subject: [Spambayes] Re: Alpha 2 Release?
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<w53ptqi1ej4.fsf@woozle.org>
	<88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
	<c2ud3vk5hbtoukl1547q8eesvrfl3a7bbe@4ax.com>
Message-ID: <2m3cnanb2j.fsf@starship.python.net>

Richie Hindle <richie@entrian.com> writes:

> [François]
>> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
>> position 86: ordinal not in range(128)
>
> This is bizarre.  This is expat complaining that you can't have high-bit
> characters in ASCII XML, which is quite right, but I replace all those
> characters with charrefs on the way in:
>
>>>> def replaceHighCharacters(match):
> ...     return "&#%d;" % ord(match.group(1))
> ...
>>>> re.sub('([\x80-\xff])', replaceHighCharacters, u"a b \xe9 c d")
> u'a b &#233; c d'
>
> So what's going on...?

Umm, that regexp isn't going to match, e.g. u"\N{EURO SIGN}":

>>> ord(u"\N{EURO SIGN}")
8364

Could that be what's happening?
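
[Editorial sketch, not part of the original message.]  The point can be
shown in modern Python 3 (the thread itself used 2.x): a character class
of [\x80-\xff] only covers code points up to U+00FF, so the euro sign
slips through, while a class that matches everything outside ASCII
catches it.  The function names below are hypothetical.

```python
import re


def replace_latin1_only(text):
    """Mimic the quoted regexp: only U+0080..U+00FF become charrefs."""
    return re.sub("([\x80-\xff])",
                  lambda m: "&#%d;" % ord(m.group(1)), text)


def replace_non_ascii(text):
    """Replace every character above U+007F with a numeric charref."""
    return re.sub(r"[^\x00-\x7f]",
                  lambda m: "&#%d;" % ord(m.group(0)), text)


sample = "a b \xe9 c \N{EURO SIGN} d"
print(replace_latin1_only(sample))  # the euro sign is left untouched
print(replace_non_ascii(sample))    # a b &#233; c &#8364; d

# The standard codec error handler gives the same result as the regexp.
assert replace_non_ascii(sample) == \
    sample.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

Whether this is what actually triggered the reported traceback is not
established here; it only shows that the narrower pattern can let
non-ASCII text through.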

Cheers,
M.

-- 
  > Or can I sweep that can of worms under the rug?
  Please shove them under the garage.
   -- Greg Ward and Guido van Rossum mix their metaphors on python-dev


From richie at entrian.com  Wed Jan 29 16:50:13 2003
From: richie at entrian.com (Richie Hindle)
Date: Thu Jan 30 11:51:09 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f00ba5e92717e71@[192.168.1.20]>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
	<a05200f2bba5dfa5c91b6@[192.168.1.20]> <a05200f00ba5e92717e71@[192.168.1.20]>
Message-ID: <1c1g3v8bglqes83r9le7rhod1mdeotsu19@4ax.com>


[François]
> I can pack some mails for you if anybody want.

Yes please, that would be very useful.  I'd love to get this one fixed
before the release.

> On MacOS X, upgrading from 2.2 to 2.3 changes the default database format.

This is scary - any Mac OS X people know what's going on here?

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Wed Jan 29 16:50:15 2003
From: richie at entrian.com (Richie Hindle)
Date: Thu Jan 30 11:51:19 2003
Subject: [Spambayes] Re: Alpha 2 Release?
In-Reply-To: <2m3cnanb2j.fsf@starship.python.net>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<w53ptqi1ej4.fsf@woozle.org> <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
	<c2ud3vk5hbtoukl1547q8eesvrfl3a7bbe@4ax.com>
	<2m3cnanb2j.fsf@starship.python.net>
Message-ID: <ce1g3vkoep3gk6u5tu5ou6vod1ufhnrcli@4ax.com>


[Michael]
> Umm, that regexp isn't going to match, e.g. u"\N{EURO SIGN}":

I don't think that's the problem - I believe the input is plain ASCII with
high characters embedded.  I'll know more when François (or anyone?)
forwards a troublesome example email to me.

-- 
Richie Hindle
richie@entrian.com


From grobinson at transpose.com  Thu Jan 30 12:11:39 2003
From: grobinson at transpose.com (Gary Robinson)
Date: Thu Jan 30 12:11:43 2003
Subject: [Spambayes] 
 Re: egregious patents on anti-spam techniques (Kaitlin Duck
	Sherwood)
In-Reply-To: <E18e12k-0007dS-02@mail.python.org>
Message-ID: <BA5EC6FB.1E26A%grobinson@transpose.com>

> Patent application on adaptive spam filtering:
> <http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/PTO/search-bool.html&r=3&f=G&l=50&co1=AND&d=PG01&s1=email.TTL.&OS=TTL/email&RS=TTL/email>

I looked at this last night.

I am not a lawyer, so don't go to the bank on what I say. And I didn't spend
a huge amount of time on it.

But I do have some experience with patents, and I do understand the
spambayes approach and the gist of their approach. It is my impression that
the patent does not have a scope that encompasses Graham-derived filters,
because they do not calculate "first" and "second" "semantic anchors" as the
term is used in Claim 1.

They seem to be trying to make a straightforward adaptation of technology
that works well for classifying documents according to subject area, latent
semantic analysis, into the spam realm.

It would be very, very interesting to code and test their algorithm's
performance against that of spambayes.

One aspect of using latent semantic analysis is that it treats synonyms of
known spammy words much as it does the spammy words themselves. It's
sophisticated technology. But I'm not sure that its advantages matter much
for spam detection with the kind of data we have available. It would be very
interesting to know.

--Gary

-- 
[http://ThisURLEnablesEmailToGetThroughOverzealousSpamFilters.org]

Gary Robinson
CEO
Transpose, LLC
grobinson@transpose.com
207-942-3463
http://www.transpose.com
http://radio.weblogs.com/0101454




From neale at woozle.org  Thu Jan 30 09:40:06 2003
From: neale at woozle.org (Neale Pickett)
Date: Thu Jan 30 12:40:14 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f00ba5e92717e71@[192.168.1.20]> (Francois Granger's
 message of "Thu, 30 Jan 2003 09:42:33 +0100")
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
	<a05200f2bba5dfa5c91b6@[192.168.1.20]>
	<a05200f00ba5e92717e71@[192.168.1.20]>
Message-ID: <w53n0liwn7t.fsf@woozle.org>

Francois Granger <francois.granger@free.fr> writes:

> Side remark:
> On MacOS X, upgrading from 2.2 to 2.3 changes the default database
> format.

IIRC, 2.3 includes db3.  So "best" would change for you.

Sadly, AIUI, there's no magic way to tell a dbhash from a db3hash.  :/

> Can we add in the doc that the values for this option are:
> "best", "db3hash", "dbhash", "gdbm", "dumbdbm"

Maybe we need some document consolidation.  :(

Neale

From jm at jmason.org  Thu Jan 30 17:57:40 2003
From: jm at jmason.org (Justin Mason)
Date: Thu Jan 30 12:56:42 2003
Subject: [Spambayes] Re: egregious patents on anti-spam techniques
	(Kaitlin Duck Sherwood) 
In-Reply-To: Message from Gary Robinson <grobinson@transpose.com> 
	<BA5EC6FB.1E26A%grobinson@transpose.com> 
Message-ID: <20030130175745.B530B16F16@jmason.org>


Gary Robinson said:

> But I do have some experience with patents, and I do understand the
> spambayes approach and the gist of their approach. It is my impression that
> the patent does not have a scope that encompasses Graham-derived filters,
> because they do not calculate "first" and "second" "symantic anchors" as the
> term is used in Claim 1.
> 
> They seem to be trying to make a straightforward adaptation of technology
> that works well for classifying documents according to subject area, latent
> semantic analysis, into the spam realm.

That was my impression, too, which is good news (to a degree).  The other
one is much broader, and I've forwarded it onto the TMDA users list, since
they are *totally* prior art.

--j.

From francois.granger at free.fr  Thu Jan 30 18:57:49 2003
From: francois.granger at free.fr (Francois Granger)
Date: Thu Jan 30 12:57:56 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <w53n0liwn7t.fsf@woozle.org>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
 <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
 <a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
 <a05200f2bba5dfa5c91b6@[192.168.1.20]>
 <a05200f00ba5e92717e71@[192.168.1.20]> <w53n0liwn7t.fsf@woozle.org>
Message-ID: <a05200f0aba5f1754a675@[192.168.1.20]>

At 09:40 -0800 30/01/2003, in message Re: [Spambayes] Alpha 2 
Release?, Neale Pickett wrote:
>Francois Granger <francois.granger@free.fr> writes:
>
>>  Side remark:
>>  On MacOS X, upgrading from 2.2 to 2.3 changes the default database
>>  format.
>
>IIRC, 2.3 includes db3.  So "best" would change for you.

And then I can't access my existing one ;-)
And I got from Robin Dunn that the current 2.3 build I use from 
wxPython sourceforge area does not have dbhash yet. So, I am stuck 
with 2.2 for my "production version" of Spambayes ;-)


-- 
Recently using MacOSX.......

From neale at woozle.org  Thu Jan 30 10:03:29 2003
From: neale at woozle.org (Neale Pickett)
Date: Thu Jan 30 13:03:42 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f0aba5f1754a675@[192.168.1.20]> (Francois Granger's
 message of "Thu, 30 Jan 2003 18:57:49 +0100")
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]> <w53r8avzo76.fsf@woozle.org>
	<a05200f2bba5dfa5c91b6@[192.168.1.20]>
	<a05200f00ba5e92717e71@[192.168.1.20]> <w53n0liwn7t.fsf@woozle.org>
	<a05200f0aba5f1754a675@[192.168.1.20]>
Message-ID: <w53hebqwm4u.fsf@woozle.org>

Francois Granger <francois.granger@free.fr> writes:

> At 09:40 -0800 30/01/2003, in message Re: [Spambayes] Alpha 2 Release?,
> Neale Pickett wrote:

>>IIRC, 2.3 includes db3.  So "best" would change for you.
>
> And then I can't access my existing one ;-)

Yeah, that's what I was implying.  I was answering the question Richie
asked later; I guess I should have actually answered his question
instead of responding to yours ;)

> And I got from Robin Dunn that the current 2.3 build I use from
> wxPython sourceforge area does not have dbhash yet. So, I am stuck
> with 2.2 for my "production version" of Spambayes ;-)

Gah.  I guess this should be in the release notes.

So, has someone officially volunteered to be the documentation
coordinator?  It sounds like we need someone for the job...

Neale

From tim.one at comcast.net  Thu Jan 30 15:29:41 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Jan 30 15:30:15 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: <a05200f2fba5e82a5cae8@[192.168.1.20]>
Message-ID: <BIEJKCLHCIOIHAGOKOLHCEBJELAA.tim.one@comcast.net>

[Meyer, Tony]
> I guess the plugin could check to see if there was an existing 'junk
> mail' folder and default to it

[Francois Granger]
> In this case, beware of localization issues. It may be translated in
> localized versions.

And it gets worse <wink>:  any folder whatsoever can be the target of the
"junk email" wizard, including Deleted Items.  In addition, there's a
distinct "adult content" rule, which may also target any folder.

These work so poorly it's hard to believe anyone uses them for more than a
few days.  If someone is using them, I'd rather that Mark's plugin use a
different directory, so we don't get blamed for the builtin filters' poor
performance!


From skip at pobox.com  Thu Jan 30 14:41:32 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 30 15:41:47 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f0aba5f1754a675@[192.168.1.20]>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
        <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
        <a05200f2aba5dd54b1b98@[192.168.1.20]>
        <w53r8avzo76.fsf@woozle.org>
        <a05200f2bba5dfa5c91b6@[192.168.1.20]>
        <a05200f00ba5e92717e71@[192.168.1.20]>
        <w53n0liwn7t.fsf@woozle.org>
        <a05200f0aba5f1754a675@[192.168.1.20]>
Message-ID: <15929.36348.24505.247194@montanaro.dyndns.org>


    >>> Side remark:
    >>> On MacOS X, upgrading from 2.2 to 2.3 changes the default database
    >>> format.
    >> 
    >> IIRC, 2.3 includes db3.  So "best" would change for you.

    Francois> And then I can't access my existing one ;-)

I'm skeptical that the 2.2 -> 2.3 change is what nailed your database.  I've
been using the HEAD branch of Python CVS as my daily Python interpreter
since well before bsddb3 replaced the old bsddb module.  I had no problems
with database files at the transition point.

More likely, what happened sometime between when you started using 2.2 and
when you started using 2.3 is that the underlying Berkeley DB library got
updated.  Unfortunately, the file(1) command on Mac OS X doesn't grok that
file format, so you have to be a bit sneaky to figure out what happened.
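
One sneaky route is to look at the file's magic number directly.  The
sketch below is written for present-day Python, not 2.2/2.3, and rests on
an assumption worth verifying: that Berkeley DB hash files carry the magic
0x00061561 at offset 0 in the 1.85 format and at offset 12 in later
formats (this is what the standard library's whichdb logic checks, as far
as I recall).  The function names are made up for illustration.

```python
import struct

# Magic number Berkeley DB writes into hash-format files (assumed value).
BDB_HASH_MAGIC = 0x00061561


def _has_magic(chunk):
    """True if the 4-byte chunk holds the hash magic in either byte order."""
    little = struct.unpack("<L", chunk)[0]
    big = struct.unpack(">L", chunk)[0]
    return BDB_HASH_MAGIC in (little, big)


def guess_hash_generation(header):
    """Classify the first 16 bytes of a database file.

    Returns "db1.85-hash" when the magic sits at offset 0,
    "db2+-hash" when it sits at offset 12, otherwise "unknown".
    """
    if len(header) < 16:
        return "unknown"
    if _has_magic(header[0:4]):
        return "db1.85-hash"
    if _has_magic(header[12:16]):
        return "db2+-hash"
    return "unknown"


def guess_file(path):
    """Read just enough of the file at `path` to classify it."""
    with open(path, "rb") as f:
        return guess_hash_generation(f.read(16))
```

Calling guess_file("hammie.db") on the old and the new database would
show whether the two files really differ in on-disk generation.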

On my system, /usr/bin/python is python2.2.  Its bsddb.so file is in
/usr/lib/python2.2/lib-dynload.  "otool -L bsddb.so" tells me:

    bsddb.so:
            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 60.0.0)

This suggests that the Berkeley DB library is bundled in /usr/lib/libSystem.

However, the bsddb.so for Python 2.3 is

    bsddb.so:
            /sw/lib/libdb-3.3.dylib (compatibility version 3.3.0, current version 3.3.11)
            /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 60.2.0)

If you find the appropriate versions of bsddb.so for your two Python
versions, do they disagree in this fashion about Berkeley DB?

Sleepycat provided a fairly simple way out of the woods here.  They provide
db_dump and db_load commands with each version of their library.  These
tools transfer databases out of and back into the binary format
(respectively) using a platform-neutral plain text format.  You'd db_dump
with the version of db_dump compatible with your old library, then db_load
using the version compatible with your new library.  Unfortunately, it
appears Apple saw fit not to deliver them with Mac OS X.  (I got them with
the fink distribution, so only have libdb3.3 versions at the moment.)

All is still not lost, however.  Assuming you have both 2.2 and 2.3
available, you can try something like the following (untested!) code.

Run this with Python 2.2:

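    #!/usr/bin/python

    # Dump every key/value pair as one repr()'d tuple per line, so the
    # 2.3 script below can eval() each line back into (key, value).
    import bsddb
    db = bsddb.hashopen("hammie.db")
    f = open("hammie.txt", "w")
    for key in db.keys():
        f.write("%r\n" % ((key, db[key]),))
    db.close()
    f.close()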

Run this with Python 2.3:

    #!/usr/local/bin/python2.3

    import bsddb
    db = bsddb.hashopen("hammie.db.new", "c")
    for line in open("hammie.txt"):
        key, val = eval(line)
        db[key] = val
    db.close()

Now, replace hammie.db file with hammie.db.new:

    mv hammie.db hammie.db.save
    chmod 444 hammie.db.save
    mv hammie.db.new hammie.db

and see if Spambayes works for you using 2.3.

Skip

From francois.granger at free.fr  Thu Jan 30 23:13:58 2003
From: francois.granger at free.fr (Francois Granger)
Date: Thu Jan 30 17:14:04 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <15929.36348.24505.247194@montanaro.dyndns.org>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>       
 <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>       
 <a05200f2aba5dd54b1b98@[192.168.1.20]>       
 <w53r8avzo76.fsf@woozle.org>       
 <a05200f2bba5dfa5c91b6@[192.168.1.20]>       
 <a05200f00ba5e92717e71@[192.168.1.20]>       
 <w53n0liwn7t.fsf@woozle.org>       
 <a05200f0aba5f1754a675@[192.168.1.20]>
 <15929.36348.24505.247194@montanaro.dyndns.org>
Message-ID: <a05200f0fba5f52141ff0@[192.168.1.20]>

At 14:41 -0600 30/01/2003, in message Re: [Spambayes] Alpha 2 
Release?, Skip Montanaro wrote:
>     >>> Side remark:
>     >>> On MacOS X, upgrading from 2.2 to 2.3 changes the default database
>     >>> format.
>     >>
>     >> IIRC, 2.3 includes db3.  So "best" would change for you.
>
>     Francois> And then I can't access my existing one ;-)
>
>I'm skeptical that the 2.2 -> 2.3 change is what nailed your database.  I've
>been using the HEAD branch of Python CVS as my daily Python interpreter
>since well before bsddb3 replaced the old bsddb module.  I had no problems
>with database files at the transition point.


Thanks for the time and all the details.


I just installed today the Python 2.3a1 from the wxPython sf page:
http://sf.net/project/showfiles.php?group_id=10718

Previously, I had spambayes running with the stock Apple Python 2.2

I had a confirmation from Robin Dunn who did the package that there 
was something wrong in it.

>the fink distribution

I downloaded really few things from Fink. My only motivation to get 
it was to try X. Now that there is an Apple version, I don't use the 
fink one anymore.

>All is still not lost, however.  Assuming you have both 2.2 and 2.3
>available, you can try something like the following (untested!) code.

I'll give it a try tomorrow.

There is still another issue however with 2.3 and pop3proxy that I 
sent in another email:

At 22:40 +0100 29/01/2003, in message Re: [Spambayes] Alpha 2 
Release?, Francois Granger wrote:
>At 12:38 -0800 29/01/2003, in message Re: [Spambayes] Alpha 2 
>Release?, Neale Pickett wrote:
>>Francois Granger <francois.granger@free.fr> writes:
>>
>>>  UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
>>>  position 86: ordinal not in range(128)

Some more info:

This error showed up since install of 2.3a1. It does not happen with 
my normal setup with Python 2.2
I removed all mail coded with accented chars and kept only english 
mails with no accented chars, no error.
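
For reference, the same class of error can be reproduced in present-day
Python (this is not the 2.3a1 build, and not pop3proxy's actual code):
encoding text that contains an accented character with the ascii codec
raises UnicodeEncodeError, while passing an error handler or a wider
codec does not.  The helper names are hypothetical.

```python
def strict_ascii_fails(text):
    """Return True if a strict ascii encode of `text` raises."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


def encode_for_wire(text, encoding="ascii"):
    """Encode `text`, substituting '?' for characters the codec lacks."""
    return text.encode(encoding, "replace")
```

With these, strict_ascii_fails("caf\xe9") is true, whereas
encode_for_wire("caf\xe9") yields b"caf?" and
encode_for_wire("caf\xe9", "latin-1") keeps the accented byte.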


-- 
Recently using MacOSX.......

From T.A.Meyer at massey.ac.nz  Fri Jan 31 11:21:32 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Jan 30 17:22:18 2003
Subject: [Spambayes] Outlook plugin notes
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3F0@its-xchg4.massey.ac.nz>

[Neale]
> > * She says that the plugin is definitely not filtering 
> > public folders.
[Tony]
> This might be an issue that has been resolved (I'm not sure 
> what version the release had).  I'm checking this out on my 
> system, but I don't have a lot of (mail) public folders 
> (they're mostly calendars).  I'll have to wait until one of 
> them gets mail.

What happens on my system is that I get an error because I do not have access to create a user-property in the public folder (the trace is below).  So with no score recorded, no action can be taken.  I suspect this might be what is happening for her as well, but I can't check what happens in a public folder where access is permitted, because I don't have access to a folder like that - does anyone else?

I'm not sure what could be done about this.  Without access then the score can't be written into the message - in fact the message probably can't be moved either, since access wouldn't allow that.  I suspect this is as it should be - only those with appropriate access to the folder should be spam filtering it.

Neale, can you check with her and see what access she has to the public folder she was trying to filter?

=Tony Meyer

From skip at pobox.com  Thu Jan 30 16:23:22 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 30 17:23:35 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f0fba5f52141ff0@[192.168.1.20]>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
        <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
        <a05200f2aba5dd54b1b98@[192.168.1.20]>
        <w53r8avzo76.fsf@woozle.org>
        <a05200f2bba5dfa5c91b6@[192.168.1.20]>
        <a05200f00ba5e92717e71@[192.168.1.20]>
        <w53n0liwn7t.fsf@woozle.org>
        <a05200f0aba5f1754a675@[192.168.1.20]>
        <15929.36348.24505.247194@montanaro.dyndns.org>
        <a05200f0fba5f52141ff0@[192.168.1.20]>
Message-ID: <15929.42458.977826.228522@montanaro.dyndns.org>

    Francois> There is still another issue however with 2.3 and pop3proxy
    Francois> that I sent in another email:
    ...
    >>> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in
    >>> position 86: ordinal not in range(128)

Alas, Unicode I can't help you with... :-(

Skip

From neale at woozle.org  Thu Jan 30 14:33:07 2003
From: neale at woozle.org (Neale Pickett)
Date: Thu Jan 30 17:33:12 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1318D3F0@its-xchg4.massey.ac.nz>
	("Meyer, Tony"'s message of "Fri, 31 Jan 2003 11:21:32 +1300")
References: <1ED4ECF91CDED24C8D012BCF2B034F1318D3F0@its-xchg4.massey.ac.nz>
Message-ID: <w53y952uv30.fsf@woozle.org>

"Meyer, Tony" <T.A.Meyer@massey.ac.nz> writes:

> I'm not sure what could be done about this.  Without access then the
> score can't be written into the message - in fact the message probably
> can't be moved either, since access wouldn't allow that.  I suspect
> this is as it should be - only those with appropriate access to the
> folder should be spam filtering it.
>
> Neale, can you check with her and see what access she has to the
> public folder she was trying to filter?

I'll ask her, and post her answer to the list when I get it.  Thanks for
the interest :)

Neale

From rod at borderware.com  Thu Jan 30 18:29:13 2003
From: rod at borderware.com (Rod Gilchrist)
Date: Thu Jan 30 18:30:50 2003
Subject: [Spambayes]  Re: egregious patents on anti-spam techniques
	(Kaitlin Duck	Sherwood)
In-Reply-To: <BA5EC6FB.1E26A%grobinson@transpose.com>
References: <BA5EC6FB.1E26A%grobinson@transpose.com>
Message-ID: <3E39B549.5020702@borderware.com>

Gary Robinson wrote:

>>Patent application on adaptive spam filtering:
>><http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/net
>>ahtml/PTO/search-bool.html&r=3&f=G&l=50&co1=AND&d=PG01&s1=email.TTL.&OS=TTL/em
>>ail&RS=TTL/email>
>>    
>>
>
>I looked at this last night.
>
>I am not a lawyer, so don't go to the bank on what I say. And I didn't spend
>a huge amount of time on it.
>
>But I do have some experience with patents, and I do understand the
>spambayes approach and the gist of their approach. It is my impression that
>the patent does not have a scope that encompasses Graham-derived filters,
>because they do not calculate "first" and "second" "symantic anchors" as the
>term is used in Claim 1.
>  
>
Here's a quote from the background section of the application:

"Latent semantic analysis (LSA) is a method that automatically uncovers 
the salient semantic relationships between words and documents in a 
given corpus. Discrete words are mapped onto a continuous semantic 
vector space, in which clustering techniques may be applied."

Graham-derived filters do map words into a 'continuous semantic vector
space', namely the one-dimensional vector space of the range [0.0, 1.0]
of real numbers, and then 'clustering techniques' are applied. Normally
clusters are defined by hyperplanes in N-Space, but in one dimension they
would be threshold values. The two 'symantic anchors' are arguably cluster
centers located at 0.0 and 1.0 (also known as ham and spam in
Graham-derived filters).

In fact it is quite reasonable to describe a Graham-derived filter as
having a 'ham anchor' that can be described as a location in N-Space in
which each token string describes a dimension and the 'clue' value for
that string is the location of the anchor in that dimension. Connecting
the 'ham anchor' in N-Space with the 'spam anchor' in N'-Space with a
normalized vector of unit length and positioning a hyperplane at some
position along the vector and perpendicular to it (i.e. a threshold) is
dead normal practice in 'clustering techniques'.

I'd like to write this patent off too, but to me it looks like it likely 
would apply to Graham-derived filters.

I'm not an expert in patents either, but I have a few issued ones of my own.

The good news is the filing date is June 14, 2001.

I'd like to suggest that it would be good to file a protest as Kaitlin
suggested. There was certainly work done in this area before June 14,
2001.  Does anyone have pointers they can pass along?

- Rod


Kaitlin Duck Sherwood wrote:

 > To protest a patent, you need to file prior art (within 60 days!)
 > with the patent office.  See:
 > http://www.uspto.gov/web/offices/pac/mpep/documents/1900.htm
 > and
 > http://www.uspto.gov/web/offices/pac/mpep/documents/0600_610.htm#sect610

 > Patent application on adaptive spam filtering:
 ><http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=/netahtml/PTO/search-bool.html&r=3&f=G&l=50&co1=AND&d=PG01&s1=email.TTL.&OS=TTL/email&RS=TTL/email>

 > Patent application on whitelists, blacklists, challenge-response, and
 > digital signatures used in spam-fighting:
 ><http://appft1.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PG01&p
 >=1&u=/netahtml/PTO/srchnum.html&r=1&f=G&l=50&s1='20030009698'.PGNR.&OS=DN/20
 > 030009698&RS=DN/20030009698>

From richard at jowsey.com  Fri Jan 31 10:29:55 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Thu Jan 30 18:31:03 2003
Subject: [Spambayes] Chi-square scoring
In-Reply-To: <BA542692.1DAAE%grobinson@transpose.com>
References: <3E2EBE7C.638.52087A5@localhost>
Message-ID: <3E3A5023.5670.59D1F8C@localhost>

Hi again Gary,

I've implemented your prob-combining technik and a chi-squared 
function in Java, and have run some very revealing tests. The 
first observation I'd make is that *any* measure of "spamminess" 
is only as good as the good/junk word databases. So I've done a 
fair amount of experimentation on ways to fine-tune my training 
corpus, especially wrt the careful quarantining of messages 
which are incorrectly classified, or are decidedly "unsure" and 
will probably remain so forever... <grin>

Now, with a high-Q database, the probability distributions 
(pSpam) for the training corpus very closely approximate two 
binomial/normal distributions, with means around 0.25 and 0.75, 
and standard deviations of approx 1/12 (0.083), which is exactly 
what we'd expect from first principles, n'est-ce pas?

In theory then, the 95%-confidence boundaries of an "unsure" 
zone (centered around pSpam=0.5) can be defined as pSpam falling 
between the 2-sigma points of the training distributions:
   Unsure lower limit: 0.25 + (2 * 1/12) = 0.417
   Unsure upper limit: 0.75 - (2 * 1/12) = 0.583
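
The arithmetic above can be checked with a few lines of Python.  This is
only a sketch of the 2-sigma rule as described; the function names and the
choice to treat the boundary values themselves as "unsure" are assumptions
made for the example.

```python
def unsure_zone(ham_mean=0.25, spam_mean=0.75, sigma=1.0 / 12, k=2.0):
    """Return (lower, upper) bounds of the unsure zone.

    The zone runs from k sigmas above the ham mean to k sigmas
    below the spam mean.
    """
    lower = ham_mean + k * sigma
    upper = spam_mean - k * sigma
    return lower, upper


def classify(p_spam, lower, upper):
    """Three-way decision on a message's pSpam."""
    if p_spam < lower:
        return "ham"
    if p_spam > upper:
        return "spam"
    return "unsure"


lower, upper = unsure_zone()
print("unsure zone: %.3f to %.3f" % (lower, upper))
# prints "unsure zone: 0.417 to 0.583"
```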

In repeated testing, this simple approach provides reliable 
classification of randomly-selected streams of incoming email, 
viz. ~zero false positives and extremely accurate "uncertains". 
For comparison, I've also run the same streams through your chi-
squared test, with (as you suggested) the null hypothesis being 
some normal distribution around 0.5, i.e. "I'm absolutely 
uncertain about anything". The outcomes are remarkably similar 
to my 2-sigma approach, but now the unsure zone is "stretched" 
logarithmically between chi-2 scores of ~0.15 and ~0.85. And 
yes, the same bunch of messages drop into the spam/unsure/ham 
regions, whichever scoring method is used.  :-)

Conclusions?  After 1st-pass training, the good/junk word 
databases should definitely be re-tuned against the corpus. A 
low-Q database will simply "muddy" the classifier, irrespective 
of statistical technique. In such a poor signal/noise scenario, 
with lots of "unsures" in the corpus and/or in the sample 
stream, chi-2 scoring is a definite plus! However, this test is 
fairly expensive computationally, so in practice we might only 
need to perform chi-2 when a message's raw pSpam falls between, 
say, 0.25 and 0.75 (which approach gives exactly the same 
outcomes, but is considerably faster when proxying).
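
A sketch of that gating idea, in Python rather than the Java used for the
tests described here.  The chi-squared combining follows the scheme used
in the spambayes classifier as I understand it (S from the product of
1-p, H from the product of p, score = (S - H + 1) / 2); how the raw pSpam
is computed is left to the caller, and the function names and the 0.25 /
0.75 defaults are illustrative assumptions.

```python
import math


def chi2q(x2, dof):
    """P(chi-squared >= x2) for an even number of degrees of freedom."""
    assert dof > 0 and dof % 2 == 0
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    # Rounding can push the sum a hair above 1.0.
    return min(total, 1.0)


def chi2_combine(probs, eps=1e-10):
    """Combine per-word spam probabilities into one score in [0, 1]."""
    n = len(probs)
    if n == 0:
        return 0.5
    # Clamp so log() never sees 0.
    clamped = [min(max(p, eps), 1.0 - eps) for p in probs]
    ln_s = sum(math.log(1.0 - p) for p in clamped)
    ln_h = sum(math.log(p) for p in clamped)
    s = 1.0 - chi2q(-2.0 * ln_s, 2 * n)
    h = 1.0 - chi2q(-2.0 * ln_h, 2 * n)
    return (s - h + 1.0) / 2.0


def gated_score(raw_p_spam, probs, low=0.25, high=0.75):
    """Run the chi-squared test only when the raw score is in the band."""
    if raw_p_spam < low or raw_p_spam > high:
        return raw_p_spam
    return chi2_combine(probs)
```

Messages whose raw score is already decisive skip the loop in chi2q
entirely, which is where the saving when proxying would come from.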

I can post you testing logs depicting these various results if 
you're interested...

Cheers,
Richard


From skip at pobox.com  Thu Jan 30 18:21:03 2003
From: skip at pobox.com (Skip Montanaro)
Date: Thu Jan 30 19:21:17 2003
Subject: [Spambayes]  Re: egregious patents on anti-spam techniques
        (Kaitlin Duck   Sherwood)
In-Reply-To: <3E39B549.5020702@borderware.com>
References: <BA5EC6FB.1E26A%grobinson@transpose.com>
        <3E39B549.5020702@borderware.com>
Message-ID: <15929.49519.664913.836891@montanaro.dyndns.org>


    Rod> The good news is the filing date is June 14, 2001.

    Rod> I'd like to suggest that it would be good to file a protest as
    Rod> Kaitlin suggested. There was certainly work done in this area
    Rod> before June 14, 2001.  Does anyone have pointers they can pass
    Rod> along.

Check the list archives.  There has been academic research in this area,
though I don't know the reference off the top of my head.  It's come up
in one of these four places:

    * this list
    * at the recent spam workshop
    * on Gary Robinson's website
    * on Paul Graham's website

It would probably be a good idea to collect a bibliography on the spambayes
website, though I'm short on time at the moment.  This is one of those
things where a Wiki would be marvelous.  Anthony, can MoinMoin be run on SF?
If not, I'd be happy to create a new MoinMoin instance on manatee.mojam.com.

Skip


From noreply at sourceforge.net  Thu Jan 30 14:21:15 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 30 19:47:04 2003
Subject: [Spambayes] [ spambayes-Bugs-677804 ] Untouched fitler command error
Message-ID: <E18eN3b-0002Ex-00@sc8-sf-web2.sourceforge.net>

Bugs item #677804, was opened at 2003-01-31 11:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677804&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: Untouched fitler command error

Initial Comment:
When filtering is set to leave uncertain/spam messages 
untouched (rather than copy/move), I get an error:

Failed filtering message! <MAPIMsgStoreMsg, 'Curly 
Hair Help Is Here' (read) 
id=0000000038A1BB1005E5101AA1BB08002B2A56C20
000454D534D44422E444C4C00000000000000001B55FA
20AA6611CD9BC800AA002FC45A0C0000004954532D5
843484734002F6F3D4D617373657920556E69766572736
974792F6F753D4D41535345592F636E3D526563697069
656E74732F636E3D542E412E4D6579657200/00000000
2CFF45187C119D4295E615A8AD7B7676070098B01D27
17B9D411B38F0008C784093100000A46D79700001ED4
ECF91CDED24C8D012BCF2B034F130000001A6CD100
00>
Traceback (most recent call last):
  File "D:\CVS Modules\spambayes\Outlook2000
\filter.py", line 43, in filter_message
    raise RuntimeError, "Eeek - bad action '%r'" % 
(action,)
RuntimeError: Eeek - bad action ''untouched''

Line 34 of filter.py seems to expect the action to start 
with 'no' ('none', perhaps?).  Everything still works, but 
the traceback is a bit ugly.  Changing that one line to 
expect 'un' should fix the problem.
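
A hypothetical sketch of the kind of check involved, written for
present-day Python.  This is not the code from filter.py, which is not
reproduced in this report; the function name, the prefixes and the
returned labels are all assumptions.  Accepting both 'no...' and 'un...'
spellings would keep older configurations working.

```python
def resolve_action(action):
    """Map a configured action string to 'none', 'copy' or 'move'."""
    key = action.strip().lower()
    # Treat both "none" and "untouched" as "leave the message alone".
    if key.startswith("no") or key.startswith("un"):
        return "none"
    if key.startswith("co"):
        return "copy"
    if key.startswith("mo"):
        return "move"
    raise RuntimeError("bad action %r" % (action,))
```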

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677804&group_id=61702

From noreply at sourceforge.net  Thu Jan 30 15:21:30 2003
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu Jan 30 19:47:12 2003
Subject: [Spambayes] [ spambayes-Bugs-677842 ] COM error on access denied
Message-ID: <E18eNzu-0000nT-00@sc8-sf-web1.sourceforge.net>

Bugs item #677842, was opened at 2003-01-31 12:21
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677842&group_id=61702

Category: Outlook
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Tony Meyer (anadelonbrin)
Assigned to: Nobody/Anonymous (nobody)
Summary: COM error on access denied

Initial Comment:
Some folders (public ones in particular) may not allow 
the user access to create the spam field.  This also 
seems to cause an 'access denied' com error later on.  
An example traceback is below.

Warning: failed to create the Outlook user-property in 
folder 'MCN Newsletter'
 (-2147352567, 'Exception occurred.', (4096, 'Microsoft 
Outlook', "You don't have appropriate permission to 
perform this operation.", None, 0, -2147024891), None)
 This is probably because the code has recently been 
changed, but it will
 have no effect on the filtering or scoring.
AntiSpam: Watching for new messages in folder MCN 
Newsletter
AntiSpam: Watching for new messages in folder Inbox
AntiSpam: Watching for new messages in folder Spam
Error processing missed messages!
Traceback (most recent call last):
  File "D:\CVS Modules\spambayes\Outlook2000
\addin.py", line 610, in OnConnection
    self.ProcessMissedMessages()
  File "D:\CVS Modules\spambayes\Outlook2000
\addin.py", line 884, in ProcessMissedMessages
  File "D:\CVS Modules\spambayes\Outlook2000
\addin.py", line 129, in ProcessMessage
    if msgstore_message.GetField
(manager.config.field_score_name) is not None:
  File "D:\CVS Modules\spambayes\Outlook2000
\msgstore.py", line 651, in GetField
    prop = self.mapi_object.GetIDsFromNames(props, 0)
[0]
com_error: (-2147024891, 'Access is denied.', None, 
None)


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=498103&aid=677842&group_id=61702

From T.A.Meyer at massey.ac.nz  Fri Jan 31 13:49:31 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Jan 30 19:50:08 2003
Subject: [Spambayes] Outlook plugin notes
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1318D3FE@its-xchg4.massey.ac.nz>

[Mark]
> Just so you know I am not ignoring this thread, I tend to 
> agree with many of
> the points.  My intention is to reply in detail as I fix them!

Make sure you let us know if we can help.

=Tony Meyer

From tim at fourstonesExpressions.com  Thu Jan 30 18:49:39 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Jan 30 19:50:18 2003
Subject: [Spambayes] Skip's Installation.txt
Message-ID: <DCF0A987C0ONXS1XHEFEED4XNFCKG52.3e39c823@myst>

It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated 
into the website... is this on anybody's radar?  If someone has started doing 
this, then I don't want to reinvent it...

c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From jh at web.de  Fri Jan 31 02:11:30 2003
From: jh at web.de (Juergen Hermann)
Date: Thu Jan 30 20:12:14 2003
Subject: [Spambayes]  Re: egregious patents on anti-spam techniques
	(Kaitlin Duck   Sherwood)
In-Reply-To: <15929.49519.664913.836891@montanaro.dyndns.org>
Message-ID: <E18ePiM-0004Ym-00@smtp.web.de>

On Thu, 30 Jan 2003 18:21:03 -0600, Skip Montanaro wrote:

>It would probably be a good idea to collect a bibliography on the spambayes
>website, though I'm short on time at the moment.  This is one of those
>things where a Wiki would be marvelous.  Anthony, can MoinMoin be run on SF?
>If not, I'd be happy to create a new MoinMoin instance on manatee.mojam.com.

Not too safely, because sf's web setup requires you to have things world-
writeable. Anyway, either the python.org wiki or the #python wiki are a good 
place to use, unless you expect Spambayes-related pages in the hundreds.

Look at what Mike Rovner did for boost.python at PythonInfo:

http://www.python.org/cgi-bin/moinmoin/boost_2epython


Ciao, Jürgen


From richard at jowsey.com  Fri Jan 31 12:28:05 2003
From: richard at jowsey.com (Richard Jowsey)
Date: Thu Jan 30 20:28:35 2003
Subject: [Spambayes] Bayesian virus detection?
Message-ID: <3E3A6BD5.31975.6094EA5@localhost>

I've "accidently" captured a handful of VB-script-type virii 
lately, since my spam training corpus apparently (luckily!) 
contained a few of these nasties. Got me thinking... I'd like to 
try training my classifier with a corpus of viral material, plus 
add a "virus" classification category into the mix, see what 
happens. 

However, I haven't a clue as to how to go about deliberately 
collecting such crud (and apparently, neither does Google)...

Bloody weird question, but does anyone happen to have a 
collection of quarantined emails containing viral attachments, 
trojans, etc...? Which they could ZIP and send me for research 
purposes? 

Promise not to forward 'em to all my friends, ha ha...

TIA,
Richard

PS: script kiddies and assorted black-hat hackers are officially 
invited to mail their nasty little efforts to richard@jowsey.com


From tim at fourstonesExpressions.com  Thu Jan 30 19:36:07 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Thu Jan 30 20:36:44 2003
Subject: [Spambayes] Bayesian virus detection?
In-Reply-To: <3E3A6BD5.31975.6094EA5@localhost>
Message-ID: <43A5PLLWTNLUQ4ZHFF00NJA5Y2USO.3e39d307@myst>

I wonder if Neale's honeypot prog could be used for that...  - TimS

1/30/2003 7:28:05 PM, "Richard Jowsey" <richard@jowsey.com> wrote:

>I've "accidently" captured a handful of VB-script-type virii 
>lately, since my spam training corpus apparently (luckily!) 
>contained a few of these nasties. Got me thinking... I'd like to 
>try training my classifier with a corpus of viral material, plus 
>add a "virus" classification category into the mix, see what 
>happens. 
>
>However, I haven't a clue as to how to go about deliberately 
>collecting such crud (and apparently, neither does Google)...
>
>Bloody weird question, but does anyone happen to have a 
>collection of quarantined emails containing viral attachments, 
>trojans, etc...? Which they could ZIP and send me for research 
>purposes? 
>
>Promise not to forward 'em to all my friends, ha ha...
>
>TIA,
>Richard
>
>PS: script kiddies and assorted black-hat hackers are officially 
>invited to mail their nasty little efforts to richard@jowsey.com


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From grobinson at transpose.com  Thu Jan 30 22:30:15 2003
From: grobinson at transpose.com (Gary Robinson)
Date: Thu Jan 30 22:30:46 2003
Subject: [Spambayes] Re: egregious patents on anti-spam techniques
In-Reply-To: <E18eOAh-0005If-00@mail.python.org>
Message-ID: <BA5F57F7.1E2E9%grobinson@transpose.com>


> 
> Graham derived filters do map words into a 'continuous semantic vector
> space', namely the one dimensional vector
> space of the range of [0.0, 1.0] of real numbers, and then 'clustering
> techniques' are applied. Normally clusters are
> defined by hyperplanes in N-Space, but in one dimension they would be
> threshold values. The two 'semantic anchors' are arguably cluster
> centers located at 0.0 and 1.0 (also known as ham and spam in
> Graham-derived filters).
> 


I completely understand what you're saying, but having spent a lot of money
and time researching the doctrine of equivalents and the various court
rulings about it, I really, really think that's too much of a stretch. To be
equivalent, the method has to perform "substantially the same function in
substantially the same way"  as the one in the patent. (Graver Tank & Mfg.
Co. v. Linde Air Products, 339 U.S. 605 (1950))

The latter requirement is just not the case here, IMO.

Moreover even if it were the case, many things can happen during the
prosecution of a patent that make the doctrine of equivalents unavailable to
the patent holder. In order to get a grasp on that in this specific case, we
would have to look at the "file wrapper" which contains the history of the
interaction with the patent office examiner. The wind in the courts has been
blowing against the doctrine of equivalents for years.

And without the DoE, you have to infringe exactly, and that is not the case
here. 


> In fact it is quite reasonable to describe a Graham-derived filter as
> having a 'ham anchor' that can
> be described as a location in N-Space in which each token string
> describes a dimension and the
> 'clue' value for that string is the location of the anchor in that
> dimension.

Again, that is just way too much of a stretch to invoke the DoE IMO. There
is no point in n-space that represents spam or ham in Graham. There is no
location. We are reducing everything to one dimension before we ever try
to determine ham or spamminess, and being near 0 or 1 is not "substantially
the same" as measuring a distance to a point in n-space.
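A minimal sketch of the one-dimensional point, for illustration only: once
every message has been reduced to a single score in [0.0, 1.0], the decision
is just a pair of threshold comparisons. The function name and the cutoff
values 0.2 and 0.9 are assumptions made up for this example, not values taken
from this thread or from the patent.

```python
def classify(score, ham_cutoff=0.2, spam_cutoff=0.9):
    """Label a message from its single score in [0.0, 1.0].

    The cutoffs are illustrative assumptions, not project defaults.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0.0, 1.0]")
    if score < ham_cutoff:
        return "ham"      # close to 0.0
    if score >= spam_cutoff:
        return "spam"     # close to 1.0
    return "unsure"       # everything in between
```

Nothing here measures a distance to a point in n-space; the only geometry
left is where the score falls on a line segment.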

One thing I don't think we should do though is get into a nit-picking
argument about this. It isn't worth it. :)

The main thing is that I agree, prior art should be found if possible to
lessen the chance that anybody is hassled with this. And of course there is
always a random chance element when things actually get to trial. And
getting things right can involve an expensive appeal process. I've read
about very stupid decisions in the lower courts that were corrected on
appeal. But the open-source community wouldn't easily be able to pay for
such a long and expensive process.

So it's certainly better to be safe than sorry.

> I'd like to suggest that it would be good to file a protest as Kaitlin
> suggested. There was certainly
> work done in this area before June 14, 2001.  Does anyone have pointers
> they can pass along?

This might help and couldn't hurt. As I read the claims and specification,
the claims are specific to a particular methodology that is different from
Graham and so I doubt that prior art that is Graham-like will overturn the
claims. However, the prior art may go into the prosecution history in such a
way that it could greatly reduce any chance that anyone could try to use the
patent to attack Graham-based filters. So it would be worth doing if anyone
can find some prior art.

Gary


From tim.one at comcast.net  Thu Jan 30 23:19:18 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Jan 30 23:19:55 2003
Subject: [Spambayes]  Re: egregious patents on anti-spam techniques
 (Kaitlin Duck   Sherwood)
In-Reply-To: <15929.49519.664913.836891@montanaro.dyndns.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHGDLAB.tim.one@comcast.net>

[Skip Montanaro]
> It would probably be a good idea to collect a bibliography on the
> spambayes website, though I'm short on time at the moment.  This is one
> of those things where a Wiki would be marvelous.  Anthony, can MoinMoin
> be run on SF?

I'm not Anthony, but I doubt it.

> If not, I'd be happy to create a new MoinMoin instance on
> manatee.mojam.com.

Note that Gary Robinson set up a Spam Wiki last year:

    http://wecanstopspam.org/jsp/Wiki?StartingPoints

It's well done and well organized, but doesn't seem to have attracted a
community yet.  If I had something to say, I'd say it there <wink>.


From anthony at interlink.com.au  Fri Jan 31 15:33:30 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Thu Jan 30 23:35:44 2003
Subject: [Spambayes] Skip's Installation.txt 
In-Reply-To: <DCF0A987C0ONXS1XHEFEED4XNFCKG52.3e39c823@myst> 
Message-ID: <200301310433.h0V4XUR26793@localhost.localdomain>


>>> Tim Stone - Four Stones Expressions wrote
> It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated
> into the website... is this on anybody's radar?  If someone has started doing
> this, then I don't want to reinvent it...

(I assume you mean "INTEGRATION.txt")

I plan to do another reorganisation - this time, of the documentation - 
at the moment, this includes:

Website "background.html" page.
README.txt
INTEGRATION.txt
HAMMIE.txt

Anyone have other docs that they want to throw into the documentation
salad?

It's "in progress" at the moment - I've not checked any of it into the
website CVS repository, as it's got bits everywhere. I might make a 
branch or something, and figure out a way to put this on the website,
in a way that's not the "default" website.

>>> Neale Pickett wrote
> So, has someone officially volunteered to be the documentation
> coordinator?  It sounds like we need someone for the job...

I've been doing a lot of this, and I'm happy to continue doing so.

Anthony

From tim.one at comcast.net  Fri Jan 31 00:10:39 2003
From: tim.one at comcast.net (Tim Peters)
Date: Fri Jan 31 00:11:25 2003
Subject: [Spambayes] Outlook plugin notes
In-Reply-To: <w53adhk2ahb.fsf@woozle.org>
Message-ID: <LNBBLJKPBEHFEDALKOLCIEHLDLAB.tim.one@comcast.net>

> * All outbound mail should be trained as ham

In connection with the Outlook client specifically, this one is especially
dubious:  Outlook doesn't use "normal" internet formats internally, and the
headers on outgoing mail *as saved in* Sent Items are especially sparse.
For example, this is the complete collection of headers on the last msg I
sent to Python-Dev, as it exists in my Sent Items folder:

"""
X-Exchange-Message: true
Subject: RE: [Python-Dev] Re: native code compiler? (or, OCaml vs. Python)
To: python-dev@python.org
"""

That's it.  No sender, date, from, errors-to, etc etc.  The *lack* of such
headers generates tokens in the default Outlook classifier, and if you don't
train on outgoing msgs they become good spam clues.

OTOH, if sent msgs were special-cased so that just the body got tokenized,
it might be helpful for people who get very few msgs.  But I expect that
anyone using email enough to feel burdened by spam probably also gets more
ham than they have time to deal with.  A better trick may be to make a point
of training-as-ham on received msgs that are *replied* to.
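A rough sketch of why sparse headers matter, assuming a tokenizer that emits
one token per header name with its occurrence count (the "header:Name:count"
shape that shows up in clue listings) plus a token for each expected header
that is absent. This is not the spambayes tokenizer: the function name, the
"noheader:" token name, and the list of expected headers are all assumptions
made for illustration.

```python
from collections import Counter
from email import message_from_string

# Headers a normal internet message usually carries.
# Assumed list, for illustration only.
EXPECTED = ("From", "Date", "Message-ID", "Received")

def header_tokens(raw_message, expected=EXPECTED):
    """Return header-presence and header-absence tokens for one message."""
    msg = message_from_string(raw_message)
    counts = Counter(msg.keys())
    # One token per header name, with how many times it occurs.
    tokens = ["header:%s:%d" % (name, n) for name, n in sorted(counts.items())]
    # One token per expected header that is missing (hypothetical token name).
    present = {name.lower() for name in counts}
    tokens += ["noheader:%s" % name for name in expected
               if name.lower() not in present]
    return tokens

# The Sent Items headers quoted above, with a dummy body.
sent_item = (
    "X-Exchange-Message: true\n"
    "Subject: RE: [Python-Dev] Re: native code compiler? (or, OCaml vs. Python)\n"
    "To: python-dev@python.org\n"
    "\n"
    "body text\n"
)
tokens = header_tokens(sent_item)
```

With only three headers present, most of what such a tokenizer sees for a
sent message is absence, which is why the training choice for outgoing mail
matters so much in the Outlook case.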


From Paul.Moore at atosorigin.com  Fri Jan 31 09:16:14 2003
From: Paul.Moore at atosorigin.com (Moore, Paul)
Date: Fri Jan 31 04:16:48 2003
Subject: [Spambayes] Outlook plugin notes
Message-ID: <16E1010E4581B049ABC51D4975CEDB886199C6@UKDCX001.uk.int.atosorigin.com>

From: Tim Peters [mailto:tim.one@comcast.net]
> But I expect that anyone using email enough to feel burdened
> by spam probably also gets more ham than they have time to
> deal with.

I'm definitely a counterexample to that. At my home account,
spam outweighs ham by factors of hundreds. But the address is
published in a few places, enough that I don't want to abandon
it.

So spambayes routinely dumps 50-100 spam into my spam folder
every day, and once every week or two leaves a ham message in
my inbox. This is vital to me, as doing it manually pretty
much guaranteed I'd miss something. My favourite example of
that was the email from my sister, reminding me I owed her
some money - "Subject: Debt!!!". Manually, I'd never even
look at the body (and the sender address didn't register with
me at first, either...)

> A better trick may be to make a point of training-as-ham on
> received msgs that are *replied* to.

That would be interesting...

Paul.

From mhammond at skippinet.com.au  Fri Jan 31 22:12:31 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Jan 31 06:13:27 2003
Subject: [Spambayes] Bayesian virus detection?
In-Reply-To: <3E3A6BD5.31975.6094EA5@localhost>
Message-ID: <001301c2c919$a6decd60$530f8490@eden>

> I've "accidently" captured a handful of VB-script-type virii
> lately, since my spam training corpus apparently (luckily!)
> contained a few of these nasties. Got me thinking... I'd like to
> try training my classifier with a corpus of viral material, plus
> add a "virus" classification category into the mix, see what
> happens.
>
> However, I haven't a clue as to how to go about deliberately
> collecting such crud (and apparently, neither does Google)...

I have a Python script that collects lots of "klez" and other "iframe
vulnerability" variants.  IIRC, these have their payload in some illegal
HTML inside an iframe tag.  My outlook (ie, late/patched versions) discards
the illegal HTML, so the payload is lost.  Lots of other useful stuff is
also lost just due to the fact we are talking Outlook <wink>.  In general
though, these mails tend to have very standard or empty bodies, making our
spambayes Outlook filter tend to put them in the "unsure" category.  If I
train enough of them, the filter does eventually get it correct, but I found
a fairly trivial, stand-alone Python based filter works fine.

I collect around 150 of these a day though if you want them.

Mark.


From skip at pobox.com  Fri Jan 31 06:12:13 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 31 07:12:20 2003
Subject: [Spambayes]  Re: egregious patents on anti-spam techniques
 (Kaitlin Duck   Sherwood)
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHGDLAB.tim.one@comcast.net>
References: <15929.49519.664913.836891@montanaro.dyndns.org>
        <LNBBLJKPBEHFEDALKOLCIEHGDLAB.tim.one@comcast.net>
Message-ID: <15930.26653.795305.227639@montanaro.dyndns.org>


    Tim> Note that Gary Robinson set up a Spam Wiki last year:

    Tim>     http://wecanstopspam.org/jsp/Wiki?StartingPoints

    Tim> It's well done and well organized, but doesn't seem to have
    Tim> attracted a community yet.  If I had something to say, I'd say it
    Tim> there <wink>.

Good enough for me.

Skip

From skip at pobox.com  Fri Jan 31 06:18:42 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 31 07:18:49 2003
Subject: [Spambayes]  Re: egregious patents on anti-spam techniques
 (Kaitlin Duck   Sherwood)
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHGDLAB.tim.one@comcast.net>
References: <15929.49519.664913.836891@montanaro.dyndns.org>
        <LNBBLJKPBEHFEDALKOLCIEHGDLAB.tim.one@comcast.net>
Message-ID: <15930.27042.863961.390657@montanaro.dyndns.org>


    Tim> Note that Gary Robinson set up a Spam Wiki last year:

    Tim>     http://wecanstopspam.org/jsp/Wiki?StartingPoints

Added to the links on the related page.  I checked it into CVS but can't
"make install".  Once Anthony refreshes the website it will be visible.

Skip

From papaDoc at videotron.ca  Fri Jan 31 08:08:41 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Fri Jan 31 08:08:42 2003
Subject: [Spambayes] Skip's Installation.txt
In-Reply-To: <DCF0A987C0ONXS1XHEFEED4XNFCKG52.3e39c823@myst>
References: <DCF0A987C0ONXS1XHEFEED4XNFCKG52.3e39c823@myst>
Message-ID: <3E3A7559.7000007@videotron.ca>

Hi,

>It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated 
>into the website... is this on anybody's radar?  If someone has started doing 
>this, then I don't want to reinvent it...
>
I have an HTML file describing how to use pop3proxy with Mozilla, and some
notes from Francois Granger on how to use it with ??? on Mac OS X.

But I need to update this for the new pop3proxy (i.e. add a part for when a
mail client cannot specify different ports).

Remi Ricard


From tim at fourstonesExpressions.com  Fri Jan 31 07:12:08 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 31 08:12:47 2003
Subject: [Spambayes] Skip's Installation.txt 
In-Reply-To: <200301310433.h0V4XUR26793@localhost.localdomain>
Message-ID: <ECBB0RMWQVT94QPLK94C0FC2WQTSOJ.3e3a7628@myst>

1/30/2003 10:33:30 PM, Anthony Baxter <anthony@interlink.com.au> wrote:

>
>>>> Tim Stone - Four Stones Expressions wrote
>> It seems as if INSTALLATION.TXT has a gob of info that needs to be integrated
>> into the website... is this on anybody's radar?  If someone has started doing
>> this, then I don't want to reinvent it...
>
>(I assume you mean "INTEGRATION.txt")
Yeah.  winduhs has dulled my pattern recognition circuits <wink>
>
>I plan to do another reorganisation - this time, of the documentation - 
>at the moment, this includes:
>
>Website "background.html" page.
>README.txt
>INTEGRATION.txt
>HAMMIE.txt
>
>Anyone have other docs that they want to throw into the documentation
>salad?
>
>It's "in progress" at the moment - I've not checked any of it into the
>website CVS repository, as it's got bits everywhere. I might make a 
>branch or something, and figure out a way to put this on the website,
>in a way that's not the "default" website.
Let me know if there's anything I can do to help  - TimS
>
>>>> Neale Pickett wrote
>> So, has someone officially volunteered to be the documentation
>> coordinator?  It sounds like we need someone for the job...
>
>I've been doing a lot of this, and I'm happy to continue doing so.
>
>Anthony


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From skip at pobox.com  Fri Jan 31 07:23:37 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 31 08:23:48 2003
Subject: [Spambayes] Bayesian virus detection?
In-Reply-To: <001301c2c919$a6decd60$530f8490@eden>
References: <3E3A6BD5.31975.6094EA5@localhost>
        <001301c2c919$a6decd60$530f8490@eden>
Message-ID: <15930.30937.343093.173755@montanaro.dyndns.org>


    Mark> In general though, these mails tend to have very standard or empty
    Mark> bodies, making our spambayes Outlook filter tend to put them in
    Mark> the "unsure" category.  

Maybe *your* spambayes Outlook filter can't tell, but *my* spambayes hammie
filter does just fine with them, thank you very much. ;-)

At any rate, after having a chance to ponder a few such messages, spambayes
seems to just vacuum them right up.

Skip

From mourad at aquazul.com  Fri Jan 31 08:26:12 2003
From: mourad at aquazul.com (Mourad De Clerck)
Date: Fri Jan 31 10:13:11 2003
Subject: [Spambayes] use as generic classifier? (not just spam/ham)
Message-ID: <3E3A2514.4000003@aquazul.com>

(please cc: any replies - I'm not subscribed. thx.)

Hi,

This usage is probably rather obvious, but I haven't seen any discussion 
about it, so I'm asking...

I was wondering if spambayes could be used in a different way. Currently 
I have a couple of users on a mailserver (imap+maildir). I'd like to 
make it possible to have them create different subfolders where they 
move mail to (from a mailinglist, about a specific subject, from a 
specific person - whatever) through their standard imap client. Now, 
when a new mail arrives on the mailserver, I'd like that mail to be 
"scored" against all these folders and subfolders (maildirs), and for it 
to be delivered to the folder that corresponds to the system's "best 
guess". If the scores for all folders are within a certain (smallish) 
range (percentage), the system's best guess is not good enough and it 
should just be delivered to the standard inbox.

For the user, I believe this would be extremely convenient, instead of 
having to do rule-based filtering on the client side, or using something 
like procmail. Just by organising your mail like you normally would in 
any case, your new incoming mail would be matched automatically to the 
right folder.

Is this possible with spambayes?

-- Mourad


From neale at woozle.org  Fri Jan 31 08:44:26 2003
From: neale at woozle.org (Neale Pickett)
Date: Fri Jan 31 11:44:38 2003
Subject: [Spambayes] Bayesian virus detection?
In-Reply-To: <43A5PLLWTNLUQ4ZHFF00NJA5Y2USO.3e39d307@myst> (Tim Stone - Four
 Stones Expressions's message of "Thu, 30 Jan 2003 19:36:07 -0600")
References: <43A5PLLWTNLUQ4ZHFF00NJA5Y2USO.3e39d307@myst>
Message-ID: <w53vg05uv4l.fsf@woozle.org>

Tim Stone - Four Stones Expressions <tim@fourstonesExpressions.com> writes:

> [about collecting worms]
> I wonder if Neale's honeypot prog could be used for that...  - TimS

Not yet.  All the worms I know about scan your address book.  It would
take one that tries to send mail to random IPs for my honeypot to catch
it.

Speaking of which, just as Matt Sergeant predicted, the honeypot isn't
getting hit as much as I thought it would:

  http://woozle.org/~spam/stats.cgi

Neale

PS: Klez and its ilk are properly called worms, not viruses.  Not that
    anyone would fail to understand what you meant if you said virus, I
    guess.

PPS: The plural of "virus" is "viruses".  "viri" means "men".  "virii"
     doesn't mean anything at all.  There are other Latin -us words
     which are not pluralized by going to -i (notably, corpus).

From msergeant at startechgroup.co.uk  Fri Jan 31 17:07:34 2003
From: msergeant at startechgroup.co.uk (Matt Sergeant)
Date: Fri Jan 31 12:07:38 2003
Subject: [Spambayes] Bayesian virus detection?
In-Reply-To: <w53vg05uv4l.fsf@woozle.org>
Message-ID: <7DA029EF-353E-11D7-9648-0003939CB5D8@startechgroup.co.uk>

On Friday, Jan 31, 2003, at 16:44 Europe/London, Neale Pickett wrote:

> PS: Klez and its ilk are properly called worms, not viruses.  Not that
>     anyone would fail to understand what you meant if you said virus, I
>     guess.

Worms are viruses. As are Trojans and a whole bunch of other things. 
Virus is the generic term for a program that spreads itself by one 
means or another. The more specific term (e.g. worm) indicates the 
means with which it spreads.


From neale at woozle.org  Fri Jan 31 09:33:05 2003
From: neale at woozle.org (Neale Pickett)
Date: Fri Jan 31 12:33:13 2003
Subject: [Spambayes] OT: pedantism (was: Bayesian virus detection?)
In-Reply-To: <7DA029EF-353E-11D7-9648-0003939CB5D8@startechgroup.co.uk> (Matt
 Sergeant's message of "Fri, 31 Jan 2003 17:07:34 +0000")
References: <7DA029EF-353E-11D7-9648-0003939CB5D8@startechgroup.co.uk>
Message-ID: <w53of5xusvi.fsf@woozle.org>

Matt Sergeant <msergeant@startechgroup.co.uk> writes:

> On Friday, Jan 31, 2003, at 16:44 Europe/London, Neale Pickett wrote:
>
>> PS: Klez and its ilk are properly called worms, not viruses.  Not that
>>     anyone would fail to understand what you meant if you said virus, I
>>     guess.
>
> Worms are viruses.  As are Trojans and a whole bunch of other
> things. Virus is the generic term for a program that spreads itself by
> one means or another. The more specific term (e.g. worm) indicates the
> means with which it spreads.

Hmm, not according to my dictionary:

  http://catb.org/esr/jargon/html/entry/virus.html

Although it does appear that I was incorrect in characterising Klez as a
worm--it would be, according to esr, a virus.

Every report I've seen on Klez calls it a "worm".  So maybe the larger
point is that there's no longer a clear consensus on what is a worm and
what is a virus, and I should go back to merging mboxtrain and
hammiebulk.  ;)

Neale

From neale at woozle.org  Fri Jan 31 09:42:54 2003
From: neale at woozle.org (Neale Pickett)
Date: Fri Jan 31 12:42:58 2003
Subject: [Spambayes] use as generic classifier? (not just spam/ham)
In-Reply-To: <3E3A2514.4000003@aquazul.com> (Mourad De Clerck's message of
 "Fri, 31 Jan 2003 08:26:12 +0100")
References: <3E3A2514.4000003@aquazul.com>
Message-ID: <w53lm11usf5.fsf@woozle.org>

Mourad De Clerck <mourad@aquazul.com> writes:

> when a new mail arrives on the mailserver, I'd like that mail to be
> "scored" against all these folders and subfolders (maildirs), and for it
> to be delivered to the folder that corresponds to the system's "best
> guess". If the scores for all folders are within a certain (smallish)
> range (percentage), the system's best guess is not good enough and it
> should just be delivered to the standard inbox.

While I haven't used it myself, I think you just described ifile:
http://www.nongnu.org/ifile/

I don't think there's anything stopping you from hacking spambayes to do n-way
classification, although I don't completely understand all the math
yet, so I couldn't say for sure.  If you can get spambayes to do this
trick, I at least would like to try folding this in to our sources.  But
I think it would be difficult, since currently everything about
spambayes is geared to having two categories.

Neale


From acapnotic at users.sourceforge.net  Fri Jan 31 09:43:44 2003
From: acapnotic at users.sourceforge.net (Kevin Turner)
Date: Fri Jan 31 12:43:50 2003
Subject: [Spambayes] Ximian Evolution
Message-ID: <1044035023.21121.93.camel@troglodyte.funhouse>

[apologies if you see this twice; sent it from the wrong address the
first time and it stuck in the moderator queue.]

A search of the list doesn't turn anything up, so I guess this is a new
thread.  Has anyone given any thought to how to use spambayes with the
Evolution MUA without procmail or proxy?

I just came up with something that sounds plausible; I haven't coded it
yet.  Current versions of Evolution 1.2 can filter messages based on the
exit code of a process you pipe them to.  Or you may be able to chain
filters, first piping it through an action to add the spambayes header,
then filtering on that.[1]  Either way, filtering shouldn't be hard.
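A minimal sketch of the exit-code idea, assuming the message has already been
piped through something that adds an X-Spambayes-Classification header. The
function name and the exit-code convention (0 = ham or unknown, 1 = spam,
2 = unsure) are assumptions for illustration; how Evolution interprets
particular exit codes would need to be checked against its filter settings.

```python
from email import message_from_string

HEADER = "X-Spambayes-Classification"

def exit_code_for(raw_message):
    """Map the classification header to a process exit code.

    Assumed convention: 0 = ham or no header, 1 = spam, 2 = unsure.
    """
    msg = message_from_string(raw_message)
    # Tolerate a trailing "; score" after the label, if one is present.
    label = (msg.get(HEADER) or "").split(";")[0].strip().lower()
    return {"spam": 1, "unsure": 2}.get(label, 0)

# As a filter script this would be driven from stdin, e.g.:
#   import sys
#   sys.exit(exit_code_for(sys.stdin.read()))
```

Treating a missing header as ham keeps the failure mode safe: if the tagging
step did not run, mail stays in the inbox.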

For training, Evolution seems to lack a way to script it, but you could
copy messages to folders[2] and run hammie on them at your leisure or
via a cron job.  (Does hammie move messages out of this mailbox or flag
them as trained in some way so it doesn't double-train them?)
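One way to guard against double-training, independent of whatever hammie
itself does, is to remember the Message-IDs that have already been trained.
The sketch below is only that: the function name and the idea of keeping the
ids in a set (which a cron job would have to persist between runs) are
assumptions, not a description of hammie's behaviour.

```python
from email import message_from_string

def select_untrained(raw_messages, trained_ids):
    """Return the messages not trained before, recording their Message-IDs.

    trained_ids is a set that is updated in place; a real cron job would
    have to save it between runs (not shown).
    """
    fresh = []
    for raw in raw_messages:
        msg_id = message_from_string(raw).get("Message-ID")
        if msg_id is None or msg_id in trained_ids:
            continue  # skip unidentifiable or already-trained messages
        trained_ids.add(msg_id)
        fresh.append(raw)
    return fresh
```

Only the messages returned would then be handed to the trainer, so copying
the same mail into the training folder twice does no harm.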

It would be nice to avoid process-per-message overhead.  Maybe bonobo
holds some wonderful answer to that, I don't know.  For now, I'm
probably willing to not worry about it.

Has anyone set this up?  Can we add a section about it in the
integration docs?  That'd be super.

Thanks,

 - Kevin


[1] The filter-on-header method has the advantage of being easier to
trace and debug, I only hesitate because I haven't convinced myself I
know what order Evolution applies filters in and the conditions in which
it chains them.

[2] These could be in ~/evolution/local, or configured as a separate
account of protocol "mbox" or "Maildir"; whichever option rankles
evolution less, considering we may modify the contents of the mailbox
without its knowledge.  I'm guessing the "separate account" option is
probably better for that.

-- 
The moon is new, 1.2% illuminated, 28.5 days old.

From grobinson at transpose.com  Fri Jan 31 12:45:04 2003
From: grobinson at transpose.com (Gary Robinson)
Date: Fri Jan 31 12:45:03 2003
Subject: [Spambayes] Re: use as generic classifier? (not just spam/ham)
Message-ID: <BA602050.1E366%grobinson@transpose.com>

I've been thinking about that myself. I wonder if anyone else on this list
has been?

There are traditional classification techniques for classifying documents.
Each document becomes a vector in n-space, and space is partitioned
according to subject area. But that technology is typically complex and
expensive.

It would be interesting to take the whole spambayes spam/ham approach, and
run it separately for every folder. Just do the same exact thing, but
instead of the target being a junk mail folder, it would be a folder of some
subject matter.

So for each folder, instead of spam vs. nonspam, <that folder> vs. <not that
folder> would be calculated. Then the email would be allocated to the
folder with the greatest certainty.

That is, I wouldn't try to adapt the approach to work with more than a
binary choice at a time. I don't think that would work (at least not with
chi-square). But if you calculate the binary choice w/r/t each folder, the
one with the most certainty can be chosen.
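The selection step could be sketched roughly as follows, assuming each folder
already has its own binary classifier producing a "belongs here" score in
[0, 1]. Everything named here is hypothetical: the function, the 0.9 minimum
score, and the 0.1 margin are illustrative choices, and the fall-back to the
inbox when no folder clearly wins follows the original request quoted below.

```python
def pick_folder(scores, min_score=0.9, min_margin=0.1, default="INBOX"):
    """Choose the folder whose binary classifier is most certain.

    scores maps folder name -> score in [0, 1] from that folder's own
    <that folder> vs. <not that folder> classifier. Thresholds are
    illustrative assumptions.
    """
    if not scores:
        return default
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_folder, best = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Fall back to the inbox if the winner is weak or not clearly ahead.
    if best < min_score or best - runner_up < min_margin:
        return default
    return best_folder
```

For example, scores of 0.97, 0.30 and 0.10 would pick the first folder, while
0.95 against 0.93 would be too close to call and the mail would stay in the
inbox.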

It's a little bit ugly, but if it gets the job done, so what.

I think it would work. It might not work quite as well as in the spam/ham
case, but it would be interesting to actually try it and see how well it
did.

It seems like the mods to spambayes to make it happen wouldn't be
super-huge; if enough people felt that there might be some benefit, it seems
like it might be worth doing...

Just my 2 cents!


> 
> I was wondering if spambayes could be used in a different way. Currently
> I have a couple of users on a mailserver (imap+maildir). I'd like to
> make it possible to have them create different subfolders where they
> move mail to (from a mailinglist, about a specific subject, from a
> specific person - whatever) through their standard imap client. Now,
> when a new mail arrives on the mailserver, I'd like that mail to be
> "scored" against all these folders and subfolders (maildirs), and for it
> to be delivered to the folder that corresponds to the system's "best
> guess". If the scores for all folders are within a certain (smallish)
> range (percentage), the system's best guess is not good enough and it
> should just be delivered to the standard inbox.


--Gary

-- 
[http://ThisURLEnablesEmailToGetThroughOverzealousSpamFilters.org]

Gary Robinson
CEO
Transpose, LLC
grobinson@transpose.com
207-942-3463
http://www.transpose.com
http://radio.weblogs.com/0101454


From francois.granger at free.fr  Fri Jan 31 18:45:10 2003
From: francois.granger at free.fr (Francois Granger)
Date: Fri Jan 31 12:45:16 2003
Subject: [Spambayes] Well done
Message-ID: <a05200f23ba60659662fd@[192.168.1.20]>

Spambayes is really astonishing. It flagged this mail with very few clues.
The mail follows, then the clues.

========================================
Return-Path: <dananunoaawf@aol.com>
Delivered-To: pbarn@altern.org
Received: (qmail 2141 invoked by alias); 31 Jan 2003 15:51:04 -0000
Received: from unknown (HELO aol.com) (213.42.188.253)
   by altern.org with SMTP; 31 Jan 2003 15:51:04 -0000
Message-ID: <000810a4ed04$baa75782$67542232@qdsbgnj.aye>
From: <dananunoaawf@aol.com>
To: <osa@altern.org>
Cc: <osp@altern.org>,
	<oumpahpah@altern.org>,
	<ouplaboum@altern.org>,
	<pagano@altern.org>,
	<pagano2@altern.org>,
	<paimpol@altern.org>,
	<patate20@altern.org>,
	<patate22@altern.org>,
	<patoch@altern.org>,
	<patrickdombrowsky@altern.org>,
	<patrickv@altern.org>,
	<paul@altern.org>,
	<pbarn@altern.org>
Subject: *lock in your 4 pct rate 123 
2633BvYK4-226sJyQ3-17
Date: Fri, 31 Jan 2003 10:25:26 +0500
MIME-Version: 1.0
Content-Type: multipart/mixed;
	boundary="----=_NextPart_000_00C1_74D58D0A.C2060E16"
X-Priority: 3
X-Mailer: Microsoft Outlook, Build 10.0.2616
Importance: Normal
Status:
X-Spambayes-Classification: spam

<x-html><!x-stuff-for-pete base="" src="" id="0" charset="">%


Pftjyealjgultwb

<html><head><title>::M::</title></head><body><center><a 
href="http://211.157.100.107/mortgage/Lead236/index.htm"><img 
border="0" src="http://211.157.100.107/mortgage/p2X.bmp" width="427" 
height="252">
</a>
</center><p><a target="_blank" href="re-move.htm"><font face="Arial" 
size="3">no
more mail?</font></a></p>
</body></html>

X5P5lphu0ycTA7e7H3Na

2129PNyE3-553ZPfu8938Kml22</x-html>

========================================


Spam probability: 0.983622180161
*H*  0.00646291040099
*S*  0.973707270724
header:MIME-Version:1  0.298172028413
header:Return-Path:1  0.376864012565
header:Message-ID:1  0.385762811067
header:Importance:1  0.687752526214
content-type:multipart/mixed  0.714739667627
subject:123  0.844827586207
header:Received:2  0.852656659762
from:no real name:2**0  0.881348131394
subject:your  0.934782608696
x-mailer:microsoft outlook, build 10.0.2616  0.934782608696
from:addr:aol.com  0.95871559633
========================================
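The *H* and *S* lines relate to the final score in a simple way. As a small
worked check, assuming the chi-squared combining scheme in which *S* and *H*
are the spam and ham indicators, the reported probability is (S - H + 1) / 2.
The function name is made up for this example.

```python
def combine(S, H):
    """Final score from the spam (S) and ham (H) indicators.

    Assumes chi-squared combining: score = (S - H + 1) / 2.
    """
    return (S - H + 1.0) / 2.0

# Values copied from the clue listing above.
score = combine(0.973707270724, 0.00646291040099)
# score agrees with the reported 0.983622180161 to the printed precision
```

When S is near 1 and H is near 0 the score lands near 1; when both are high,
or both low, the evidence conflicts and the score drifts toward 0.5.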

-- 
Recently using MacOSX.......

From tim at fourstonesExpressions.com  Fri Jan 31 11:45:11 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 31 12:45:50 2003
Subject: [Spambayes] OT: pedantism (was: Bayesian virus detection?)
In-Reply-To: <w53of5xusvi.fsf@woozle.org>
Message-ID: <TSPNNJNYD883NHCWRTO6YT75LD.3e3ab627@myst>

1/31/2003 11:33:05 AM, Neale Pickett <neale@woozle.org> wrote:

>Matt Sergeant <msergeant@startechgroup.co.uk> writes:
>
>> On Friday, Jan 31, 2003, at 16:44 Europe/London, Neale Pickett wrote:
>>
>>> PS: Klez and its ilk are properly called worms, not viruses.  Not that
>>>     anyone would fail to understand what you meant if you said virus, I
>>>     guess.
>>
>> Worms are viruses.  As are Trojans and a whole bunch of other
>> things. Virus is the generic term for a program that spreads itself by
>> one means or another. The more specific term (e.g. worm) indicates the
>> means with which it spreads.
>
>Hmm, not according to my dictionary:
>
>  http://catb.org/esr/jargon/html/entry/virus.html
>
>Although it does appear that I was incorrect in characterising Klez as a
>worm--it would be, according to esr, a virus.
>
>Every report I've seen on Klez calls it a "worm".  So maybe the larger
>point is that there's no longer a clear consensus on what is a worm and
>what is a virus, and I should go back to merging mboxtrain and
>hammiebulk.  ;)

I think that while you're at it, we should refactor the Corpus stuff so that
messages, databases, training, and classifying are all handled in exactly
one place in the system.  Richie has this idea of a 'spambayes server' that
is the heart and soul of the system, with all the user-facing stuff acting
as front ends to it.  What say you?  - TimS
>
>Neale
>
>_______________________________________________
>Spambayes mailing list
>Spambayes@python.org
>http://mail.python.org/mailman/listinfo/spambayes
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From acapnotic at foobox.net  Fri Jan 31 09:39:28 2003
From: acapnotic at foobox.net (Kevin Turner)
Date: Fri Jan 31 13:00:22 2003
Subject: [Spambayes] Ximian Evolution
Message-ID: <1044034766.21120.88.camel@troglodyte.funhouse>

A search of the list doesn't turn anything up, so I guess this is a new
thread.  Has anyone given any thought to how to use spambayes with the
Evolution MUA without procmail or proxy?

I just came up with something that sounds plausible; I haven't coded it
yet.  Current versions of Evolution 1.2 can filter messages based on the
exit code of a process you pipe them to.  Or you may be able to chain
filters, first piping it through an action to add the spambayes header,
then filtering on that.[1]  Either way, filtering shouldn't be hard.
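
A minimal sketch of the exit-code variant, in modern Python.  Everything
named here is an assumption for illustration: score_message() is a
hypothetical stand-in for a real classifier, and the convention "exit 1
means spam" is made up; Evolution's actual expectations would need
checking.

```python
import sys

SPAM_CUTOFF = 0.90  # assumed threshold, chosen only for illustration


def score_message(raw_message):
    """Hypothetical stand-in for a real classifier.

    Returns a spam score in [0, 1] based on a toy word list.
    """
    spammy_words = ("viagra", "mortgage", "winner")
    text = raw_message.lower()
    hits = sum(1 for word in spammy_words if word in text)
    return hits / len(spammy_words)


def exit_code_for(score, cutoff=SPAM_CUTOFF):
    """Map a score to a process exit status: 1 for spam, 0 otherwise."""
    return 1 if score >= cutoff else 0


def main():
    """Read one message from stdin and report the verdict via exit status."""
    raw_message = sys.stdin.read()
    sys.exit(exit_code_for(score_message(raw_message)))
```

Run as a script, main() would be called under the usual
`if __name__ == "__main__":` guard, and the mail client would branch on
the exit status.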

For training, Evolution seems to lack a way to script it, but you could
copy messages to folders[2] and run hammie on them at your leisure or
via a cron job.  (Does hammie move messages out of this mailbox or flag
them as trained in some way so it doesn't double-train them?)

It would be nice to avoid process-per-message overhead.  Maybe bonobo
holds some wonderful answer to that, I don't know.  For now, I'm
probably willing to not worry about it.

Has anyone set this up?  Can we add a section about it in the
integration docs?  That'd be super.

Thanks,

 - Kevin


[1] The filter-on-header method has the advantage of being easier to
trace and debug, I only hesitate because I haven't convinced myself I
know what order Evolution applies filters in and the conditions in which
it chains them.

[2] These could be in ~/evolution/local, or configured as a separate
account of protocol "mbox" or "Maildir"; whichever option rankles
evolution less, considering we may modify the contents of the mailbox
without its knowledge.  I'm guessing the "separate account" option is
probably better for that.

-- 
The moon is new, 1.2% illuminated, 28.5 days old.

From francois.granger at free.fr  Fri Jan 31 19:04:03 2003
From: francois.granger at free.fr (Francois Granger)
Date: Fri Jan 31 13:04:21 2003
Subject: [Spambayes] This error is new to me
Message-ID: <a05200f24ba6067b3e1b9@[192.168.1.20]>

I cut and pasted a message from Eudora into pop3proxy, then clicked Classify.
I got this traceback. I redid it another time to be sure: same traceback.
I loaded it from the file in the unknown folder: same result.

The only "wrong" & I could find was in this tag:

<a 
href="http://members.techrepublic.com/cgi-bin9/flo/y/hLHL0GmgEu0GMo0BMqK0Ao&name=francois.granger@free.fr">

I can send the mail on request.

500 Server error

Traceback (most recent call last):

   File "/Volumes/OS99/spambayes/spambayes/Dibbler.py", line 398, in found_terminator
     getattr(plugin, name)(**params)

   File "/Volumes/OS99/spambayes/pop3proxy.py", line 967, in onClassify
     cluesTable += cluesRow % (word, wordProb)

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 875, in __mod__
     self._replaceNodeContent(element, sequence.pop())

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 654, in _replaceNodeContent
     node.children = self._nodeListFromSource(value)

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 640, in _nodeListFromSource
     tree = _generateTree("<x>"+value+"</x>")

   File "/Volumes/OS99/spambayes/spambayes/PyMeldLite.py", line 575, in _generateTree
     g.close()

   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/xmllib.py", line 172, in close
     self.goahead(1)

   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/xmllib.py", line 405, in goahead
     self.syntax_error("bogus `%s'" % data)

   File "/BinaryCache/python/python-3.root~193/usr/lib/python2.2/xmllib.py", line 794, in syntax_error
     raise Error('Syntax error at line %d: %s' % (self.lineno, message))

Error: Syntax error at line 1: bogus `&'
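
The traceback suggests a raw "&" reached an XML parser inside an
"<x>"+value+"</x>" wrapper.  A small sketch of that failure and one
possible remedy, escaping the value first, is below.  It uses modern
Python's xml.dom.minidom rather than xmllib, the URL is made up, and it
is only a guess at the kind of fix needed, not the actual change to
pop3proxy.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError
from xml.sax.saxutils import escape


def is_well_formed(fragment):
    """Return True if the fragment parses when wrapped in a dummy element."""
    try:
        parseString("<x>" + fragment + "</x>")
        return True
    except ExpatError:
        return False


# Made-up value containing a bare ampersand, like the query string in the tag.
raw = 'href="http://example.invalid/page?a=1&name=someone"'

print(is_well_formed(raw))          # the bare "&" is rejected
print(is_well_formed(escape(raw)))  # "&" has become "&amp;"
```

escape() also rewrites "<" and ">", so arbitrary message text can be
dropped into the wrapper without being read as markup.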

-- 
Recently using MacOSX.......

From neale at woozle.org  Fri Jan 31 10:18:49 2003
From: neale at woozle.org (Neale Pickett)
Date: Fri Jan 31 13:18:57 2003
Subject: [Spambayes] Ximian Evolution
In-Reply-To: <1044035023.21121.93.camel@troglodyte.funhouse> (Kevin Turner's
 message of "31 Jan 2003 09:43:44 -0800")
References: <1044035023.21121.93.camel@troglodyte.funhouse>
Message-ID: <w5365s5uqra.fsf@woozle.org>

Kevin Turner <acapnotic@users.sourceforge.net> writes:

> I just came up with something that sounds plausible; I haven't coded it
> yet.  Current versions of Evolution 1.2 can filter messages based on the
> exit code of a process you pipe them to.  Or you may be able to chain
> filters, first piping it through an action to add the spambayes header,
> then filtering on that.[1]  Either way, filtering shouldn't be hard.

Okay, that would be easy enough to add as an option to hammiefilter.
What exit codes mean what?  Even better would be if you could run
everything through hammiefilter -t, and use the stdout output as the
message.  Then you could filter on header.
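
Roughly, the filter-on-header idea amounts to the sketch below: parse
the message, stamp a verdict header on it, and write the result to
stdout.  The header name X-Spam-Verdict and the tag_message() helper are
hypothetical, chosen for illustration; they are not what hammiefilter
emits.

```python
from email import message_from_string

HEADER_NAME = "X-Spam-Verdict"  # hypothetical header name


def tag_message(raw_message, verdict):
    """Return the message text with a single verdict header added."""
    msg = message_from_string(raw_message)
    del msg[HEADER_NAME]          # drop any stale copy; no error if absent
    msg[HEADER_NAME] = verdict
    return msg.as_string()


sample = "From: a@example.invalid\nSubject: hi\n\nbody\n"
print(tag_message(sample, "spam"))
```

A mail client rule can then match on the header value instead of on an
exit status.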

> For training, Evolution seems to lack a way to script it, but you could
> copy messages to folders[2] and run hammie on them at your leisure or
> via a cron job.  (Does hammie move messages out of this mailbox or flag
> them as trained in some way so it doesn't double-train them?)

See HAMMIE.txt for an explanation of how to do this.  But I might need
to add a new Evolution folder type if it doesn't use MH, Maildir, or mbox
mail spools internally.


> It would be nice to avoid process-per-message overhead.  Maybe bonobo
> holds some wonderful answer to that, I don't know.  For now, I'm
> probably willing to not worry about it.

I've only used Evolution once or twice, but it seems to me that their
whole gig is to get you to write plugins for everything.  So a bonobo
(or whatever their object broker is called) component would be the Right
Thing.  If we could make it as snazzy as the Outlook plugin, that'd be
even Righter.

Neale

From richie at entrian.com  Thu Jan 30 18:35:40 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 31 13:36:35 2003
Subject: [Spambayes] seg faults?
In-Reply-To: <15926.65353.991418.385713@montanaro.dyndns.org>
References: <a0521021eba5b35da4ca4@[192.168.1.101]>
	<c13b3vs7la94s51qckbgupatjelavhe2uv@4ax.com>
	<a05210222ba5b3e6d4f14@[192.168.1.101]>
	<20030127135919.A15195@discworld.dyndns.org>
	<a05200f16ba5b401d0e41@[10.0.1.2]>
	<vntd3vsp1kgg2nhnj6qravvt1cjq7pfutr@4ax.com>
	<15926.65353.991418.385713@montanaro.dyndns.org>
Message-ID: <44ri3v8sis31b5gu9uj06j749lgse3kndg@4ax.com>


[Richie]
> Can a process increase its own stack size?

[Skip]
> Here's the relevant code from Lib/test/regrtest.py:

Wonderful, many thanks.  I've integrated this into pop3proxy.py.
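
For readers without the quoted code to hand, the general technique can
be sketched with the standard Unix-only resource module.  This is not
the regrtest.py code and not necessarily what went into pop3proxy.py;
the 2 MB target and both function names are assumptions.

```python
import resource


def new_soft_limit(soft, hard, wanted, infinity=resource.RLIM_INFINITY):
    """Pick a soft limit of at least `wanted` bytes without exceeding `hard`."""
    if soft == infinity:
        return soft                # already unlimited, nothing to raise
    target = max(soft, wanted)
    if hard != infinity:
        target = min(target, hard)  # an unprivileged process cannot pass the hard limit
    return target


def raise_stack_limit(wanted=2 * 1024 * 1024):
    """Raise this process's soft stack limit towards `wanted` bytes."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    target = new_soft_limit(soft, hard, wanted)
    resource.setrlimit(resource.RLIMIT_STACK, (target, hard))
    return target
```

Keeping the arithmetic in new_soft_limit() separate from the system call
makes the "unlimited" cases easy to check without touching real limits.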

We should probably go with Tony's idea of having a module for this kind of
platform-dependent stuff eventually.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Thu Jan 30 18:35:38 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 31 13:36:45 2003
Subject: [Spambayes] Alpha 2 Release?
In-Reply-To: <a05200f2aba5dd54b1b98@[192.168.1.20]>
References: <a4233vsjdua1o8ufi2d1f8mei91o1h2eda@4ax.com>
	<w53ptqi1ej4.fsf@woozle.org> <88gd3v4cj7v7qmdl28hofbj2ptih6hd13s@4ax.com>
	<a05200f2aba5dd54b1b98@[192.168.1.20]>
Message-ID: <0spi3v403pm760r3v19fg44tq5ei6heqlq@4ax.com>


[Francois]
> UnicodeEncodeError: 'ascii' codec can't encode character '\ue9' in 
> position 86: ordinal not in range(128)

This is fixed (I hope!).  You can go back to being "François" now.  8-)

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Thu Jan 30 18:35:43 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 31 13:36:56 2003
Subject: [Spambayes] This error is new to me
In-Reply-To: <a05200f24ba6067b3e1b9@[192.168.1.20]>
References: <a05200f24ba6067b3e1b9@[192.168.1.20]>
Message-ID: <furi3vk78dh89rpetnegjasbeaf80d338b@4ax.com>


[François]
> I got this traceback.
> [...]
> I can send the mail on request.

Yes please - could you zip up the file from the unknown folder and send me
the zip file?  Many thanks.

And if it's not too much hassle, could you update your software and see
whether my fix for your accented-characters problem with 2.3 actually
works?  I'm pretty sure I've fixed it, but since I'm about to build the
Alpha 2 release I'd love to have independent confirmation.  Not to worry if
you don't have the time.

-- 
Richie Hindle
richie@entrian.com


From richie at entrian.com  Thu Jan 30 19:13:31 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 31 14:14:25 2003
Subject: [Spambayes] This error is new to me
In-Reply-To: <furi3vk78dh89rpetnegjasbeaf80d338b@4ax.com>
References: <a05200f24ba6067b3e1b9@[192.168.1.20]>
	<furi3vk78dh89rpetnegjasbeaf80d338b@4ax.com>
Message-ID: <29ui3v48b8u3htnnbctd17s93g28egd5a4@4ax.com>


[François]
> I got this traceback.

Now fixed.

-- 
Richie Hindle
richie@entrian.com


From neale at woozle.org  Fri Jan 31 11:31:44 2003
From: neale at woozle.org (Neale Pickett)
Date: Fri Jan 31 14:31:51 2003
Subject: [Spambayes]  Re: egregious patents on anti-spam techniques
 (Kaitlin Duck   Sherwood)
In-Reply-To: <LNBBLJKPBEHFEDALKOLCIEHGDLAB.tim.one@comcast.net> (Tim
 Peters's message of "Thu, 30 Jan 2003 23:19:18 -0500")
References: <LNBBLJKPBEHFEDALKOLCIEHGDLAB.tim.one@comcast.net>
Message-ID: <w531y2tundr.fsf@woozle.org>

Tim Peters <tim.one@comcast.net> writes:

> Note that Gary Robinson set up a Spam Wiki last year:
>
>     http://wecanstopspam.org/jsp/Wiki?StartingPoints

Sweet, I forgot about that.  I'm updating it with what I've learned so
far from running spampot.

Neale

From richie at entrian.com  Thu Jan 30 20:33:22 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Jan 31 15:34:19 2003
Subject: [Spambayes] Alpha2 Pre-release
Message-ID: <ob1j3v0j41mbbqb5kjvgpjm4ofq134hmb7@4ax.com>


I've built an alpha2 source release of Spambayes.  Before we put it up on
the main web site, I'd feel a lot better if someone could smoke-test it for
me - I may have made some horrible mistake that I'm too close to see...

I've put it here:

  http://entrian.com/spambayes/spambayes-1.0a2-pre.zip
  http://entrian.com/spambayes/spambayes-1.0a2-pre.tar.gz  

For POP3 proxy users, this release should be GUI out of the box - install
it, run pop3proxy.py, point your browser at the URL, go to the Config page
and enter your POP3 server details, change your email client to point at
the proxy, and you're away - messages are classified and you can train
through the web.

For hammie users there's Neale's new muttrc and spambayes.el, and Skip's
proxytee lets hammie users train through the web interface.  Tim Stone's
import/export script should make upgrading easy, for now and in the future.
Assorted improvements to the tokeniser and classifier make spambayes even
more accurate.

What else has changed?  We should do a proper release announcement - I
don't keep up with the Outlook plug-in, so what's new there?  Who've I
offended by forgetting about their fantastic new feature?  8-)

One question for those who know about these things: I originally built the
release on Windows, but then realised that all the source files in both the
zip and tar.gz archives had Windows line-endings.  People installing and
editing on unix would see '^M's all over the place (possibly, depending on
their editor).  Is there a distutils option I've missed to prevent this?

Anyway, I rebuilt the archives on unix (thanks Neale!).
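
Failing a packaging option, the line endings could be normalised before
building.  A minimal sketch in modern Python follows; both helper names
are made up for illustration, and it should only be pointed at text
files, since a binary file can contain CR LF byte pairs by accident.

```python
from pathlib import Path


def to_unix_newlines(data):
    """Convert CRLF line endings in a bytes object to LF."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite a file in place with LF endings; return True if it changed."""
    target = Path(path)
    original = target.read_bytes()
    converted = to_unix_newlines(original)
    if converted != original:
        target.write_bytes(converted)
    return converted != original
```

Working on bytes rather than text avoids any guess about each file's
character encoding.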

-- 
Richie Hindle
richie@entrian.com


From tim at fourstonesExpressions.com  Fri Jan 31 14:39:57 2003
From: tim at fourstonesExpressions.com (Tim Stone - Four Stones Expressions)
Date: Fri Jan 31 15:40:37 2003
Subject: [Spambayes] Alpha2 Pre-release
In-Reply-To: <ob1j3v0j41mbbqb5kjvgpjm4ofq134hmb7@4ax.com>
Message-ID: <2W75GCLI31NON1WNKGFECA8051KHKF.3e3adf1d@myst>

1/30/2003 2:33:22 PM, Richie Hindle <richie@entrian.com> wrote:

>
>I've built an alpha2 source release of Spambayes.  Before we put it up on
>the main web site, I'd feel a lot better if someone could smoke-test it for
>me - I may have made some horrible mistake that I'm too close to see...
>
>I've put it here:
>
>  http://entrian.com/spambayes/spambayes-1.0a2-pre.zip
>  http://entrian.com/spambayes/spambayes-1.0a2-pre.tar.gz  
>
>For POP3 proxy users, this release should be GUI out of the box - install
>it, run pop3proxy.py, point your browser at the URL, go to the Config page
>and enter your POP3 server details, change your email client to point at
>the proxy, and you're away - messages are classified and you can train
>through the web.
>
>For hammie users there's Neale's new muttrc and spambayes.el, and Skip's
>proxytee lets hammie users train through the web interface.  Tim Stone's
>import/export script should make upgrading easy, for now and in the future.

The operative word there is 'should'.  Please back your database up before 
migrating it, until we know for sure there aren't bugs in the script.  - TimS

>Assorted improvements to the tokeniser and classifier make spambayes even
>more accurate.
>
>What else has changed?  We should do a proper release announcement - I
>don't keep up with the Outlook plug-in, so what's new there?  Who've I
>offended by forgetting about their fantastic new feature?  8-)
>
>One question for those who know about these things: I originally built the
>release on Windows, but then realised that all the source files in both the
>zip and tar.gz archives had Windows line-endings.  People installing and
>editing on unix would see '^M's all over the place (possibly, depending on
>their editor).  Is there a distutils option I've missed to prevent this?
>
>Anyway, I rebuilt the archives on unix (thanks Neale!).
>
>-- 
>Richie Hindle
>richie@entrian.com
>
>


c'est moi - TimS
http://www.fourstonesExpressions.com
http://wecanstopspam.org


From francois.granger at free.fr  Fri Jan 31 23:40:03 2003
From: francois.granger at free.fr (Francois Granger)
Date: Fri Jan 31 17:40:08 2003
Subject: [Spambayes] A question
Message-ID: <a05200f2eba60aaab1151@[192.168.1.20]>

I recently received a spam that was properly classified as spam. I copied
and pasted its content from Eudora into pop3proxy and clicked Classify. It
gave me a Spam probability: 0.887897331413. I checked my
bayescustomize.ini, which contains:

[Categorization]
ham_cutoff = 0.10
spam_cutoff = 0.90

So, are these parameters not used by pop3proxy?

-- 
Recently using MacOSX.......

From tony-bayes at lownds.com  Fri Jan 31 14:54:16 2003
From: tony-bayes at lownds.com (Tony Lownds)
Date: Fri Jan 31 17:54:16 2003
Subject: [Spambayes] A question
In-Reply-To: <a05200f2eba60aaab1151@[192.168.1.20]>
References: <a05200f2eba60aaab1151@[192.168.1.20]>
Message-ID: <a05200f2eba60adeb24ac@[204.162.121.125]>

At 11:40 PM +0100 1/31/03, Francois Granger wrote:
>I recently received a spam that was properly classified as spam. I copied
>and pasted its content from Eudora into pop3proxy and clicked Classify. It
>gave me a Spam probability: 0.887897331413. I checked my
>bayescustomize.ini, which contains:
>
>[Categorization]
>ham_cutoff = 0.10
>spam_cutoff = 0.90
>
>So, are these parameters not used by pop3proxy?

Hi François,

Eudora does not always keep the content 100% the same as what pop3proxy
sees. For instance, attachment data is removed when you copy/paste.
Also, you will see only a subset of the headers unless you click the rich
headers button.

This might explain the discrepancy.
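
For reference, a rough sketch of how two cutoffs like those can split a
score into three bands.  It reads the quoted settings with modern
Python's configparser; categorize() and the treatment of scores exactly
on a boundary are assumptions, not pop3proxy's code.

```python
from configparser import ConfigParser

INI_TEXT = """\
[Categorization]
ham_cutoff = 0.10
spam_cutoff = 0.90
"""


def load_cutoffs(text):
    """Return (ham_cutoff, spam_cutoff) from ini-style text."""
    parser = ConfigParser()
    parser.read_string(text)
    section = parser["Categorization"]
    return section.getfloat("ham_cutoff"), section.getfloat("spam_cutoff")


def categorize(score, ham_cutoff, spam_cutoff):
    """Assumed three-way split: below ham_cutoff is ham, at or above
    spam_cutoff is spam, anything in between is unsure."""
    if score < ham_cutoff:
        return "ham"
    if score >= spam_cutoff:
        return "spam"
    return "unsure"


ham_cutoff, spam_cutoff = load_cutoffs(INI_TEXT)
print(categorize(0.887897331413, ham_cutoff, spam_cutoff))
```

Under that reading the pasted copy, at 0.8879, lands just below the
spam cutoff, so a small change in the message content is enough to move
it across the line.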

-Tony