>
> How about we move this discussion to spambayes-dev, and see
> if we can get
> extra interest from anyone else on the project? I think the
> concept is
> sound, so see no reason we can't aim at something useful to a
> few of us :)
>
Yes, indeed. Done. Moved.
> Thanks,
>
> Mark.
>
>
-- Sean
From mhammond at skippinet.com.au Tue Aug 5 11:43:38 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Aug 4 20:43:45 2003
Subject: [spambayes-dev] RE: Mark and Sean stand around the water cooler
discussing plugins, and call each other names.
In-Reply-To: <000801c35ae4$65bd8b70$0201a8c0@swapwizard.com>
Message-ID: <034d01c35aea$9cf67330$f502a8c0@eden>
> > If the latter and via a UI, then I don't see the advantage,
> > as we don't need
> > a flexible system; we just need one as good as the UI. In
> > the first stages,
> > when there aren't that many "competing" filters, and where
> > the rules can't
> > get *too* complex and still be reflected in a UI, I don't see
> > the advantage.
> Agreed. The idea is to empower a developer to write simple code. If he
> wants a UI, we may be able to offer a simple framework, but I
> have no itch to
> write a generic UI system for every developer. Unlike some
> people who are
> more ambitious than I. ;-)
Note that the filtering ideas we have been discussing privately are
completely separate from the UI work I am doing. I don't see how a "generic
filter API" could be built on our budgets ;)
> Indeed I was. The idea is to have the ability to move source
> code filters
> back and forth between the supported platforms, including any other SB
> systems that need one, and any non-SB systems would be
> welcome to add the
> architecture. Of course, they would need Python. What a shame.
> I'm absolutely in favor of supporting it. I would love to
> write a whole
> series of little plugins that would do interesting things.
I am afraid I am still not clear on exactly what you are proposing.
Initially I thought you were talking about "just" a generic plugin
mechanism, but now it seems you are also talking about specific changes to
the codebase to provide additional, concrete filtering capabilities. Can
you outline exactly what changes you are proposing to add (ie, exactly what
changes the user would see)?
Thanks,
Mark.
From T.A.Meyer at massey.ac.nz Tue Aug 5 14:05:02 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 4 21:07:21 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9EF8A@its-xchg4.massey.ac.nz>
Ok, for those interested in testing this out, there are *two* changes to
make to the code that Sean posted. The first is to change the regex to
include '0', and the second is to yield w and not word. Sean made these
changes and said that his positive results disappeared, but mine
didn't:
[the third and fourth columns are the old, inaccurate results, included
for reference]
filename:     august_no_seans   accurate_seans  august_no_seans     august_seans
ham:spam:          7900:15260       7900:15260       7900:15260       7900:15260
fp total:                   2                2                2                2
fp %:                    0.03             0.03             0.03             0.03
fn total:                 176              175              176              172
fn %:                    1.15             1.15             1.15             1.13
unsure t:                 501              495              501              499
unsure %:                2.16             2.14             2.16             2.15
real cost:            $296.20          $294.00          $296.20          $291.80
best cost:            $489.60          $488.80          $489.60          $488.80
h mean:                  0.63             0.60             0.63             0.62
h sdev:                  4.84             4.75             4.84             4.81
s mean:                 94.52            94.49            94.52            94.57
s sdev:                 18.67            18.70            18.67            18.56
mean diff:              93.89            93.89            93.89            93.95
k:                       3.99             4.00             3.99             4.02
So my fn didn't go down nearly as much, but my unsures went down more.
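As a cross-check on the table above, the "real cost" row is consistent with a weighting of $10 per false positive, $1 per false negative and $0.20 per unsure. Those weights are inferred from the posted figures rather than stated in this message, and the function names below are made up for illustration. A minimal sketch:

```python
# Sketch: recompute the "real cost" and percentage rows from the raw counts.
# The three weights are assumptions inferred from the posted numbers.
FP_WEIGHT = 10.00      # dollars per false positive (assumed)
FN_WEIGHT = 1.00       # dollars per false negative (assumed)
UNSURE_WEIGHT = 0.20   # dollars per unsure (assumed)


def real_cost(fp, fn, unsure):
    """Return the weighted cost of one test run, in dollars."""
    return fp * FP_WEIGHT + fn * FN_WEIGHT + unsure * UNSURE_WEIGHT


def rates(fp, fn, unsure, nham, nspam):
    """Return (fp %, fn %, unsure %) relative to ham, spam and all mail."""
    return (100.0 * fp / nham,
            100.0 * fn / nspam,
            100.0 * unsure / (nham + nspam))


if __name__ == "__main__":
    # First column of the table: 2 fp, 176 fn, 501 unsure, 7900 ham, 15260 spam
    print("$%.2f" % real_cost(2, 176, 501))
    print("%.2f %.2f %.2f" % rates(2, 176, 501, 7900, 15260))
```

Read this way, one fewer false negative and six fewer unsures account for the whole $2.20 difference between the first two columns.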
=Tony Meyer
From mhammond at skippinet.com.au Tue Aug 5 23:13:31 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Aug 5 08:13:31 2003
Subject: [spambayes-dev] CVS branch for new Outlook dialog work.
In-Reply-To: <02f301c35a28$ccb71c90$f502a8c0@eden>
Message-ID: <057601c35b4a$fc4b32c0$f502a8c0@eden>
The 'outlook-dialog-branch' should be close to baked. Everything should be
working correctly (including the "Folder Selector" code :). A few of the
Tooltip bubbles could do with some work, but these are maintained in the
default options documentation strings.
This new code does not depend on "win32ui", which is the module that exposes
the Microsoft Foundation Classes (MFC). All our dialogs are now "raw"
windows API calls. This will reduce the size of the distribution
significantly, but more importantly fix the bugs with the "autocomplete" and
other functions not working in later versions of Outlook.
Some more work will need to be done WRT the binary - we now rely on a .rc
and .bmp file at runtime that we need to ensure end up in a reasonable
place.
Remember that as this is still on a branch, it is a perfect time to test -
you just revert back to the trunk should you have problems. However, once I
merge it in (which won't be for a little while), backing out will be harder.
So don't say I didn't warn you :)
Also, in case you have an itch in this direction, I would be happy to accept
patches to the .rc file that tweak the dialogs. Just open this .rc file in
MSVC, and off you go.
Thanks,
Mark.
From mhammond at skippinet.com.au Wed Aug 6 00:24:13 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Aug 5 09:24:25 2003
Subject: [spambayes-dev] "Donations" page for SpamBayes
Message-ID: <05b301c35b54$dd64b110$f502a8c0@eden>
Tim and I have been chatting with the PSF about channeling donations for the
SpamBayes project their way. This has come up before on the SpamBayes list,
with Tim prompting that the PSF would be a good place to send money to!
Here is my first draft at the page. Please offer any suggestions you feel
appropriate.
Thanks,
Mark.
Title: Donations for the SpamBayes project
Author-Email: SpamBayes@python.org
Author: SpamBayes
SpamBayes donations
SpamBayes is free software. There is absolutely no obligation to pay any
money to use or redistribute the software.
However, the developers of the software consider the Python Software
Foundation (PSF) a charity worthy of any donation you feel appropriate.
Such a donation would not only demonstrate your appreciation of this tool,
but also to help advance the development of other Open Source tools
tools in the future.
About the PSF
In summary, the PSF is a non-profit organization devoted to advancing the
Python programming language. The PSF is registered as a US charity, so
all deductions made by US residents will be fully tax deducatable
(but see the
PSF donations page
for specific details).
For more information on the PSF, please see the
PSF web site.
Why donate to the PSF?
SpamBayes is written in the
Python programming language. The
developers of SpamBayes believe that if it were not for Python, SpamBayes
would simply not exist - the productivity gains and ease of use made it
possible for a bunch of hackers to experiment freely and somehow end up with
this very nice tool.
In addition, the developers are all strong advocates of Open Source
Software. It gives us powerful, free tools we can use to develop software,
but more importantly, the tools come with the ultimate technical reference -
the source code. When our tools fail to work as we expect (as all software
does at some stage), we know we have the resources necessary to do our job.
Yeah yeah, but why donate to the PSF?
Many different people have donated their time to this project, which
makes
it unreasonable for any individual to collect money. As the PSF is a
registered charity and devoted to promoting Open Source Software, it
seems the logical choice.
What will the PSF do with my money? Will it be spent on SpamBayes?
Your SpamBayes donation goes into the general PSF fund; it is not
earmarked specifically for the SpamBayes project. In the future, the PSF
may make additional funds available for SpamBayes, for some other worthy
Open Source project, or for some other purpose within its charter. You
may like to read the
PSF Mission Statement
for more details.
OK, OK, where do I pay?
Please make sure you have read this document, so you know exactly
why you are giving money ('cos the software is so cool) and to whom
(the PSF).
To donate now using PayPal, simply click here
From neal at metaslash.com Tue Aug 5 10:50:44 2003
From: neal at metaslash.com (Neal Norwitz)
Date: Tue Aug 5 09:51:21 2003
Subject: [spambayes-dev] Re: [PSC] "Donations" page for SpamBayes
In-Reply-To: <05b301c35b54$dd64b110$f502a8c0@eden>
References: <05b301c35b54$dd64b110$f502a8c0@eden>
Message-ID: <20030805135044.GO1266@epoch.metaslash.com>
Looks good, with minor changes below. -- Neal
--
> About the PSF
> In summary, the PSF is a non-profit organization devoted to advancing the
> Python programming language. The PSF is registered as a US charity, so
> all deductions made by US residents will be fully tax deducatable
> (but see the
> PSF donations page
> for specific details).
Remove "In summary" and spell deducatable -> deductible.
> In addition, the developers are all strong advocates of Open Source
> Software. It gives us powerful, free tools we can use to develop software,
> but more importantly, the tools come with the ultimate technical reference -
> the source code. When our tools fail to work as we expect (as all software
> does at some stage), we know we have the resources necessary to do our job.
>
I don't understand the last sentence. Are you saying that since the
tools are open source, you have the resources to fix it? (ie, source
code) If so, maybe start the last sentence, "When our Open Source tools
fail" ... Hmmm, just thinking out loud.
Neal
From skip at pobox.com Tue Aug 5 10:04:14 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 5 10:04:37 2003
Subject: [spambayes-dev] "Donations" page for SpamBayes
In-Reply-To: <05b301c35b54$dd64b110$f502a8c0@eden>
References: <05b301c35b54$dd64b110$f502a8c0@eden>
Message-ID: <16175.47454.438037.304069@montanaro.dyndns.org>
Mark> Tim and I have been chatting with the PSF about channeling
Mark> donations for the SpamBayes project their way. This has come up
Mark> before on the SpamBayes list, with Tim prompting that the PSF
Mark> would be a good place to send money to!
One point is to note that Aahz created a /psf/donate-spambayes.ht file
yesterday. It's substantially different than yours, but it's probably worth
coordinating efforts.
Mark> Here is my first draft at the page. Please offer any suggestions
Mark> you feel appropriate.
Mark> SpamBayes donations
Mark> SpamBayes is free software. There is absolutely no obligation
Mark> to pay any money to use or redistribute the software.
Mark> However, the developers of the software consider the Python
Mark> Software Foundation (PSF) a charity worthy of any donation you
Mark> feel appropriate. Such a donation would not only demonstrate your
Mark> appreciation of this tool, but also to help advance the
Mark> development of other Open Source tools tools in the future.
I'd smush that into one paragraph with slight mods:
SpamBayes is free software. There is absolutely no obligation to pay
any money to use or redistribute the software. However, the developers
of the software consider the Python Software Foundation (PSF) a charity
worthy of your support. A donation to the PSF would not only
demonstrate your appreciation of this tool, but also to help advance the
development of other Python-based open source tools in the
future.
Mark> In summary, the PSF is a non-profit organization devoted to
Mark> advancing the Python programming language. The PSF is registered
Mark> as a US charity, so all deductions made by US residents will be
Mark> fully tax deducatable (but see the PSF donations page,
Mark> http://www.python.org/psf/donations.html, for specific details).
I'd can the "In summary, " prefix and change "is registered as a US charity"
to "is a registered US non-profit organization". It's spelled "deductible".
Mark> ... When our tools fail to work as we expect (as all software does
Mark> at some stage), we know we have the resources necessary to do our
Mark> job.
Maybe end with:
... we know we have the resources necessary to fix them.
Mark> Many different people have donated their time to this project,
Mark> which makes it unreasonable for any individual to collect money.
Mark> As the PSF is a registered charity and devoted to promoting Open
Mark> Source Software, it seems the logical choice.
again, "registered non-profit" seems more familiar to my US eyeballs.
Mark> To donate now using PayPal, simply click here
Mark>
Mark>
I think this may be one place you and Aahz might want to coordinate to make
sure all Spambayes-related donations wind up in the same category.
Skip
From kennypitt at hotmail.com Tue Aug 5 11:08:20 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Tue Aug 5 10:08:40 2003
Subject: [spambayes-dev] "Donations" page for SpamBayes
In-Reply-To: <05b301c35b54$dd64b110$f502a8c0@eden>
References: <05b301c35b54$dd64b110$f502a8c0@eden>
Message-ID: <3F2FBA54.7060000@hotmail.com>
Mark Hammond wrote:
> Tim and I have been chatting with the PSF about channeling donations for the
> SpamBayes project their way. This has come up before on the SpamBayes list,
> with Tim prompting that the PSF would be a good place to send money to!
>
> Here is my first draft at the page. Please offer any suggestions you feel
> appropriate.
>
I'm also cool with the verbiage. Here are a few more corrections in
addition to those submitted by Neal.
[snip]
> However, the developers of the software consider the Python Software
> Foundation (PSF) a charity worthy of any donation you feel appropriate.
> Such a donation would not only demonstrate your appreciation of this tool,
> but also to help advance the development of other Open Source tools
"help to advance" might sound better here than "to help advance". Also,
"tools" is doubled in "Open Source tools tools".
> tools in the future.
>
> About the PSF
> In summary, the PSF is a non-profit organization devoted to advancing the
> Python programming language. The PSF is registered as a US charity, so
> all deductions made by US residents will be fully tax deducatable
Did you mean "all donations" here instead of "all deductions"?
> (but see the
> PSF donations page
> for specific details).
[snip]
I think routing donations to the PSF is a great idea. I don't get to
code in Python as often as I would like because of the requirements of
my job (combined with a shortage of free time at home ;-) ), but I would
love to see it become more widespread.
--
Kenny Pitt
From aahz at pythoncraft.com Tue Aug 5 11:39:23 2003
From: aahz at pythoncraft.com (Aahz)
Date: Tue Aug 5 10:39:27 2003
Subject: [PSC] Re: [spambayes-dev] "Donations" page for SpamBayes
In-Reply-To: <16175.47454.438037.304069@montanaro.dyndns.org>
References: <05b301c35b54$dd64b110$f502a8c0@eden>
<16175.47454.438037.304069@montanaro.dyndns.org>
Message-ID: <20030805143923.GC8860@panix.com>
On Tue, Aug 05, 2003, Skip Montanaro wrote:
>
> I think this may be one place you and Aahz might want to coordinate to make
> sure all Spambayes-related donations wind up in the same category.
Don't worry, we're going over this in painful, agonizing detail. ;-)
--
Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/
This is Python. We don't care much about theory, except where it intersects
with useful practice. --Aahz
From popiel at wolfskeep.com Tue Aug 5 21:24:41 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Tue Aug 5 23:24:46 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
In-Reply-To: Message from "Sean True" of "Mon,
04 Aug 2003 15:56:29 EDT."
<00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
References: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
Message-ID: <20030806032441.70E1E2DEB4@cashew.wolfskeep.com>
In message: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
"Sean True" writes:
>
>Not exactly a patch, but it's a one minute cut and paste. I'm theorizing
>that the memory hit is not horrendous -- mostly generates sensible
>fragments
>www.microsoft.com -> www, microsoft, com
>Very_naughty_bits -> very, naughty, bits
With the two fixes mentioned earlier, here are my results on 48 days of
data...
filename:            fragment           normal
ham:spam:           1978:6166        1978:6166
fp total:                   1                1
fp %:                    0.05             0.05
fn total:                  28               25
fn %:                    0.45             0.41
unsure t:                 172              152
unsure %:                2.11             1.87
real cost:             $72.40           $65.40
best cost:             $44.20           $41.80
h mean:                  0.25             0.27
h sdev:                  3.71             3.80
s mean:                 98.51            98.66
s sdev:                  8.97             8.56
mean diff:              98.26            98.39
k:                       7.75             7.96
In other words, for me it's a significant loss.
- Alex
From kennypitt at hotmail.com Wed Aug 6 15:30:28 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Aug 6 14:30:35 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
In-Reply-To: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
References: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
Message-ID: <3F314944.5050809@hotmail.com>
Sean True wrote:
> This is the code that does it, in context, if not in patch form. I had
> mailed it to Tony, but not the whole list.
> Sorry about that.
>
> -- Sean
>
> Not exactly a patch, but it's a one minute cut and paste. I'm theorizing
> that the memory hit is not horrendous -- mostly generates sensible fragments
> www.microsoft.com -> www, microsoft, com
> Very_naughty_bits -> very, naughty, bits
>
[snip]
>
> ->     # Break up composite words looking for good stuff
> ->     for w in longword_re.findall(word):
> ->         if 3 <= len(w) <= maxword:
> ->             yield word
> ->
Seems like most people are seeing this change as a loss or at best no
gain. I wonder if it would make a difference in the accuracy if we
returned special compound word tokens instead of returning the
components as normal words? Something like:
yield 'compound:' + word
I'm just speculating here because I, unfortunately, don't have a
sufficient number of messages saved up to test this myself. Anyone want
to give this variation a try?
--
Kenny Pitt
From kennypitt at hotmail.com Wed Aug 6 15:40:58 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Aug 6 14:41:08 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
In-Reply-To: <3F314944.5050809@hotmail.com>
References: <00b201c35ac2$7ed88460$0201a8c0@swapwizard.com>
<3F314944.5050809@hotmail.com>
Message-ID: <3F314BBA.7040700@hotmail.com>
Kenny Pitt wrote:
[snip]
>
>> ->     # Break up composite words looking for good stuff
>> ->     for w in longword_re.findall(word):
>> ->         if 3 <= len(w) <= maxword:
>> ->             yield word
>> ->
>
>
> Seems like most people are seeing this change as a loss or at best no
> gain. I wonder if it would make a difference in the accuracy if we
> returned special compound word tokens instead of returning the
> components as normal words? Something like:
>
> yield 'compound:' + word
>
> I'm just speculating here because I, unfortunately, don't have a
> sufficient number of messages saved up to test this myself. Anyone want
> to give this variation a try?
>
Uh oh, just noticed a bug in the original that I didn't catch before
hitting Send. The original code above should be:
yield w
instead of:
yield word
The variation would then be:
yield 'compound:' + w
Did everyone who previously tested this change catch the error? Without
this fix you would be inserting the *entire* compound token into your
training data once for each component word found (e.g. Very_Naughty_Bits
would result in 'Very_Naughty_Bits' with a count of 3 instead of 'Very',
'Naughty', and 'Bits' each with a count of 1). This could definitely
have a negative impact on the results.
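To make the difference concrete, here is a small sketch of both versions side by side. The real longword_re and maxword live in tokenizer.py and are not shown in this thread, so the regex and the length limit below are stand-in assumptions, and the function names are hypothetical:

```python
import re
from collections import Counter

# Assumptions: stand-ins for the tokenizer's real longword_re and maxword.
# This regex simply treats runs of letters and digits as fragments.
longword_re = re.compile(r"[A-Za-z0-9]+")
maxword = 12


def fragments_buggy(word):
    """As originally posted: yields the whole composite once per fragment."""
    for w in longword_re.findall(word):
        if 3 <= len(w) <= maxword:
            yield word


def fragments_fixed(word):
    """With the fix: yields each fragment itself."""
    for w in longword_re.findall(word):
        if 3 <= len(w) <= maxword:
            yield w


if __name__ == "__main__":
    # Compare how often each token would be counted in training data.
    print(Counter(fragments_buggy("Very_Naughty_Bits")))
    print(Counter(fragments_fixed("Very_Naughty_Bits")))
```

With the buggy version the only token produced is the composite itself, counted three times; with the fix each of the three fragments is counted once.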
--
Kenny Pitt
From T.A.Meyer at massey.ac.nz Thu Aug 7 12:21:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 19:22:24 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F619@its-xchg4.massey.ac.nz>
> Uh oh, just noticed a bug in the original that I didn't catch before
> hitting Send. The original code above should be:
> yield w
> instead of:
> yield word
> The variation would then be:
> yield 'compound:' + w
>
> Did everyone who previously tested this change catch the error?
My original results, and Sean's, were pre fixing this. My later
results, and Alex's were post fixing. (And Sean indicated that his
retest after fixing was also a loss, although he was going to try
different bucket sizes).
Ironically, the incorrect method had better results for Sean, and
similar for me. Unless anyone is going to post some more results, I
suspect that this will be thrown in the "nice idea but doesn't produce
the needed results" bin.
(If someone had the time, it would be great to take all the comments
from the list, tokenizer.py and elsewhere and make a coherent summary of
all the things that have been tested and what the results were...)
Anyone up for some more testing?
=Tony Meyer
From T.A.Meyer at massey.ac.nz Thu Aug 7 12:39:14 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 19:40:00 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz>
> Seems like most people are seeing this change as a loss or at best no
> gain. I wonder if it would make a difference in the accuracy if we
> returned special compound word tokens instead of returning the
> components as normal words? Something like:
>
> yield 'compound:' + word
>
> Anyone want to give this variation a try?
(I changed "yield w" to "yield 'compound:' + w")
filename:     august_no_seans     august_seans           kennys
ham:spam:          7900:15260       7900:15260       7900:15260
fp total:                   2                2                2
fp %:                    0.03             0.03             0.03
fn total:                 176              172              174
fn %:                    1.15             1.13             1.14
unsure t:                 501              499              491
unsure %:                2.16             2.15             2.12
real cost:            $296.20          $291.80          $292.20
best cost:            $489.60          $488.80          $485.00
h mean:                  0.63             0.62             0.61
h sdev:                  4.84             4.81             4.80
s mean:                 94.52            94.57            94.56
s sdev:                 18.67            18.56            18.58
mean diff:              93.89            93.95            93.95
k:                       3.99             4.02             4.02
Interesting. FN's are better than not doing anything with the compound
words, but not as good as with just the word. Unsures, however, are
even better. I might try this on a different corpus and see how it goes
there.
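For anyone wanting to repeat this, the tagged variant amounts to something like the sketch below. As before, the regex and maxword are stand-ins for the tokenizer's real values, and compound_tokens is a hypothetical name, not a function in tokenizer.py:

```python
import re

# Assumptions: stand-ins for the tokenizer's real longword_re and maxword,
# neither of which is shown in this thread.
longword_re = re.compile(r"[A-Za-z0-9]+")
maxword = 12


def compound_tokens(word, prefix="compound:"):
    """Yield each usable fragment of a composite word as a tagged token."""
    for w in longword_re.findall(word):
        # Fragments shorter than 3 or longer than maxword are skipped.
        if 3 <= len(w) <= maxword:
            yield prefix + w


if __name__ == "__main__":
    print(list(compound_tokens("www.microsoft.com")))
```

The point of the prefix is that a fragment such as 'microsoft' found inside a composite gets its own statistics, separate from the plain word 'microsoft' seen elsewhere in a message.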
=Tony Meyer
From T.A.Meyer at massey.ac.nz Thu Aug 7 12:52:16 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 19:53:05 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F64B@its-xchg4.massey.ac.nz>
[Tony's results]
> fp total: 2 2 2
BTW, if anyone is wondering, the two false positives that are in all of
my results are:
o A message from mtnsms.com telling me about the (then) new smspop
service. This remains my only email from mtnsms.com, and it does look
very spammy, but was definitely ham (in fact, I now use smspop). It
scored 0.950.
o A message from a company doing a survey about how a transaction with
another company was. Again, looked a lot like spam, but wasn't, and
again the only message from this company that I've received (AFAIK). It
scored 0.966.
I was expecting the second, so could retrieve it and this wasn't a
problem. The first one was a real FP, though. Any suggestions on
options I could fiddle to correct this (without creating lots of other
incorrectly classified messages) would be of interest! :)
=Tony Meyer
From T.A.Meyer at massey.ac.nz Thu Aug 7 13:05:49 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 20:06:28 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F671@its-xchg4.massey.ac.nz>
And, FWIW, here are results on a different corpus:
filename:           no_sean2s           sean2s          kenny2s
ham:spam:           7580:7580        7580:7580        7580:7580
fp total:                  44               47               45
fp %:                    0.58             0.62             0.59
fn total:                  16               17               17
fn %:                    0.21             0.22             0.22
unsure t:                 356              348              344
unsure %:                2.35             2.30             2.27
real cost:            $527.20          $556.60          $535.80
best cost:            $592.40          $611.00          $584.40
h mean:                  3.40             3.38             3.36
h sdev:                 14.19            14.21            14.14
s mean:                 97.94            97.95            97.98
s sdev:                  9.43             9.49             9.40
mean diff:              94.54            94.57            94.62
k:                       4.00             3.99             4.02
Kenny's version again does better than Sean's original, although still 1
FN and 1 FP more than not having it at all, in exchange for 12 fewer
unsures. (I think I would rather have the unsures).
=Tony Meyer
From skip at pobox.com Wed Aug 6 20:06:19 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 20:06:32 2003
Subject: [spambayes-dev] FYI -- dumbdbm is nuked
Message-ID: <16177.38907.329762.442311@montanaro.dyndns.org>
I just checked in a simple change to spambayes/dbmstorage.py which erases
dumbdbm from the candidate dbm-style modules. The remaining three
candidates are gdbm, bsddb3 (aka PyBSDDB, aka bsddb in Python 2.3), and
bsddb (before Python 2.3). If you were using dumbdbm previously, you will
have to retrain.
(Should the plain old dbm module be considered? If available, its
restrictions on key and value length shouldn't be a problem.)
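The selection logic described here boils down to trying a preference list of modules and using the first one that imports. The sketch below is not the code in dbmstorage.py; it uses today's standard library module names as an assumption (the 2003 candidates were gdbm, bsddb3 and bsddb), and first_available is a hypothetical helper:

```python
import importlib


def first_available(candidates):
    """Return (name, module) for the first importable candidate, else None."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Not installed on this system; try the next candidate.
            continue
    return None


# Hypothetical preference list. The dumb fallback (dbm.dumb in current
# Python) is deliberately left out, mirroring the change described above.
DBM_CANDIDATES = ["dbm.gnu", "dbm.ndbm"]

if __name__ == "__main__":
    found = first_available(DBM_CANDIDATES)
    print(found[0] if found else "no dbm-style module available")
```

Adding the plain dbm module would then just be a matter of appending it to the candidate list.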
Skip
From mhammond at skippinet.com.au Thu Aug 7 11:21:07 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Aug 6 20:21:15 2003
Subject: [spambayes-dev] RE: "Donations" page for SpamBayes
In-Reply-To: <3F2FBA54.7060000@hotmail.com>
Message-ID: <041701c35c79$cd4fe7b0$f502a8c0@eden>
Thanks for all the comments on the "Donations" page. I think I got them
all.
The page is now online at http://www.spambayes.org/donations.html, and the
source .ht file is in the SpamBayes CVS tree. If any psc members have more
comments, just send them to me. Spambayes-devers should just make the
change themselves.
Note that this page is not linked in anywhere yet. Once everyone seems
happy, I will create a few links.
Thanks,
Mark.
From mhammond at skippinet.com.au Thu Aug 7 11:40:19 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Aug 6 20:40:26 2003
Subject: [spambayes-dev] FAQ and donations
Message-ID: <042601c35c7c$7b4bef60$f502a8c0@eden>
As part of linking the "Donations" page in, I should update the FAQ
slightly. I am thinking of moving section:
4.15 Do I have to pay for SpamBayes? Can I pay you money if I really want
to?
to a new section
1.2 What is the license? Does it cost anything? Can I pay anyway?".
And slightly tweaking the text (ie, moving some of the "FAQ" text to the
donations page), and adding a link to the new page.
It makes sense to me that (a) we include license details in the same entry,
and that (b) the entry be much closer to the top of the FAQ.
Any objections?
Mark.
From mhammond at skippinet.com.au Thu Aug 7 11:45:10 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Aug 6 20:45:09 2003
Subject: [spambayes-dev] FW: [PSC] RE: "Donations" page for SpamBayes
Message-ID: <042901c35c7d$2861eec0$f502a8c0@eden>
I think Kevin is correct - if underlines were turned off in my Mozilla, the
links would look very subtle. I guess this is in our .css? Any takers? :)
Mark.
-----Original Message-----
From: Kevin Altis [mailto:altis@semi-retired.com]
Sent: Thursday, 7 August 2003 10:35 AM
To: Mark Hammond
Subject: RE: [PSC] RE: "Donations" page for SpamBayes
Mark,
maybe it is just me and my IE 5.5x browser, but there isn't a difference in
the color of the regular text and the hyperlink text, it is all dark
gray/black. I don't have underlines turned on in my browser, so that's
probably the problem, but I would suggest having a color that stands out
more for links. Of course I can see the button at the bottom. Maybe it would
be good to add the button at the top too.
ka
> -----Original Message-----
> From: psc-bounces@python.org [mailto:psc-bounces@python.org]On Behalf Of
> Mark Hammond
> Sent: Wednesday, August 06, 2003 5:21 PM
> To: spambayes-dev@python.org; psc@python.org
> Subject: [PSC] RE: "Donations" page for SpamBayes
>
>
> Thanks for all the comments on the "Donations" page. I think I got them
> all.
>
> The page is now online at http://www.spambayes.org/donations.html, and the
> source .ht file is in the SpamBayes CVS tree. If any psc members
> have more
> comments, just send them to me. Spambayes-devers should just make the
> change themselves.
>
> Note that this page is not linked in anywhere yet. Once everyone seems
> happy, I will create a few links.
>
> Thanks,
>
> Mark.
From skip at pobox.com Wed Aug 6 20:48:09 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 20:49:18 2003
Subject: [spambayes-dev] RE: "Donations" page for SpamBayes
In-Reply-To: <041701c35c79$cd4fe7b0$f502a8c0@eden>
References: <3F2FBA54.7060000@hotmail.com>
<041701c35c79$cd4fe7b0$f502a8c0@eden>
Message-ID: <16177.41417.6823.709306@montanaro.dyndns.org>
Mark,
The donations page looks good to me.
Skip
From T.A.Meyer at massey.ac.nz Thu Aug 7 14:04:35 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 21:05:23 2003
Subject: [spambayes-dev] FAQ and donations
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F6BA@its-xchg4.massey.ac.nz>
> And slightly tweaking the text (ie, moving some of the "FAQ"
> text to the donations page), and adding a link to the new page.
>
> It makes sense to me that (a) we include license details in
> the same entry, and that (b) the entry be much closer to the
> top of the FAQ.
+1
=Tony Meyer
From T.A.Meyer at massey.ac.nz Thu Aug 7 14:12:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 6 21:13:25 2003
Subject: [spambayes-dev] FW: [PSC] RE: "Donations" page for SpamBayes
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F6C6@its-xchg4.massey.ac.nz>
> I think Kevin is correct - if underlines were turned off in
> my Mozilla, the links would look very subtle. I guess this
> is in our .css? Any takers? :)
I've checked in a new colour/color. It's similar to the light purple
that the sidebar ends at. It looks visible to me, but I'm not attached
if someone likes another colour/color more :)
=Tony Meyer
From skip at pobox.com Wed Aug 6 21:45:10 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 21:45:22 2003
Subject: [spambayes-dev] PGClassifier checked in
Message-ID: <16177.44838.909642.578739@montanaro.dyndns.org>
The storage module gained two new classes:
SQLClassifier - a base class for people wishing to store their hammie
info in SQL databases
PGClassifier - a concrete implementation using the psycopg module to
access a PostgreSQL database
This code has a number of problems, not the least of which is that none of
the other modules and scripts in the system know about it yet. For those of
you not subscribed to spambayes-checkins, here's the checkin message:
----------------------------------------------------------------------------
**** Danger, Will Robinson! Do not use the PGClassifier class yet! ****
This is an initial stab at SQLClassifier and PGClassifier classes. This
still needs a lot of work, to wit:
* I've tried to break functionality into the two classes in such a way
that adding other SQLClassifier subclasses should be reasonably easy,
but I don't know much about writing portable SQL. Python's DB API
helps, to be sure, but isn't perfect.
* Scoring messages is dreadfully slow. I don't know if I'm commit()ing
too frequently, creating too many cursors or if I have some other
problem. My past use of SQL has generally been of the "scads of
SELECTs per INSERT" sort of thing, so I've never paid a lot of
attention to commit().
* I've encountered a couple bad cases. With the word column defined as
bytea (PostgreSQL's binary string type), both of these calls fail if c
is a cursor object:
c.execute("select * from bayes where word=%s", ('report.\\n";',))
c.execute("select * from bayes where word=%s", ('reserved\x00',))
If the word column is defined as the more traditional varchar(128),
the first call succeeds but the second still fails.
----------------------------------------------------------------------------
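For readers who have not looked at the checkin, the base-class/backend split can be illustrated roughly as follows. This is not the checked-in code: sqlite3 stands in for psycopg so the sketch runs without a PostgreSQL server, the class names are hypothetical, and the nspam/nham columns are assumptions (only the bayes table and its word column appear in the message above). It commits once per batch of words, which is one thing to try for the speed problem; it says nothing about whether the two failing psycopg calls would behave differently.

```python
import sqlite3


class SketchSQLClassifier:
    """Backend-neutral part: all SQL goes through DB API cursors."""

    # Placeholder style differs between drivers ('?' for sqlite3,
    # '%s' for psycopg), so a subclass can override it.
    param = "?"

    def __init__(self, conn):
        self.conn = conn

    def create_table(self):
        raise NotImplementedError  # column types are backend-specific

    def get_counts(self, word):
        """Return (nspam, nham) for word, or (0, 0) if it is unknown."""
        c = self.conn.cursor()
        c.execute("select nspam, nham from bayes where word=%s" % self.param,
                  (word,))
        row = c.fetchone()
        return tuple(row) if row else (0, 0)

    def learn(self, words, is_spam):
        """Update the counts for many words, committing once at the end."""
        c = self.conn.cursor()
        for word in words:
            nspam, nham = self.get_counts(word)
            if is_spam:
                nspam += 1
            else:
                nham += 1
            c.execute("delete from bayes where word=%s" % self.param, (word,))
            c.execute("insert into bayes (word, nspam, nham) "
                      "values (%s, %s, %s)" % ((self.param,) * 3),
                      (word, nspam, nham))
        self.conn.commit()  # one commit per batch, not one per word


class SQLiteClassifier(SketchSQLClassifier):
    """Stand-in backend; a PostgreSQL one would set param = '%s'."""

    def create_table(self):
        self.conn.execute("create table if not exists bayes "
                          "(word blob primary key, nspam integer, nham integer)")
        self.conn.commit()
```

Passing the word as a bound parameter (here as bytes) rather than pasting it into the SQL string is what lets tokens containing quotes, backslashes or NUL bytes through in this sqlite3 stand-in.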
Skip
From skip at pobox.com Wed Aug 6 21:45:54 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 21:47:15 2003
Subject: [spambayes-dev] FAQ and donations
In-Reply-To: <042601c35c7c$7b4bef60$f502a8c0@eden>
References: <042601c35c7c$7b4bef60$f502a8c0@eden>
Message-ID: <16177.44882.988137.66691@montanaro.dyndns.org>
Mark> As part of linking the "Donations" page in, I should update the
Mark> FAQ slightly....
Mark> Any objections?
Go for it. ;-)
S
From skip at pobox.com Wed Aug 6 22:04:47 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 6 22:05:01 2003
Subject: [spambayes-dev] What do you make of this spam?
Message-ID: <16177.46015.503433.306766@montanaro.dyndns.org>
What is up with this message? I get a fair number of them (just killed
three or four) and SB nails them quite well, as you can see. I don't
understand the motive for sending it though. Is it just a "ping" message to
see if the email address is valid?
Skip
-------------- next part --------------
An embedded message was scrubbed...
From: "guy smith"
Subject: hey ghyc
Date: Thu, 07 Aug 03 08:24:20 GMT
Size: 1550
Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030806/7f90eab4/attachment.eml
From mhammond at skippinet.com.au Thu Aug 7 13:31:09 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Aug 6 22:32:11 2003
Subject: [spambayes-dev] What do you make of this spam?
In-Reply-To: <16177.46015.503433.306766@montanaro.dyndns.org>
Message-ID: <049301c35c8b$f67e2ea0$f502a8c0@eden>
> What is up with this message? I get a fair number of them
> (just killed
> three or four) and SB nails them quite well, as you can see. I don't
> understand the motive for sending it though. Is it just a
> "ping" message to
> see if the email address is valid?
Yeah - maybe just to collect the replies. Sometimes when a new Klez-like
worm hits, people get virus attachments from my email address (but not from
me :) I am amazed at how many people reply with "I had trouble opening it -
please resend it". These are people who would have absolutely no clue who I
am, and they don't enquire, nor ask what it was I was sending them.
Mark.
From tim.one at comcast.net Wed Aug 6 23:56:58 2003
From: tim.one at comcast.net (Tim Peters)
Date: Wed Aug 6 22:57:29 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F64B@its-xchg4.massey.ac.nz>
Message-ID:
[Tony Meyer]
> BTW, if anyone is wondering, the two false positives that are in
> all of my results are:
>
> o A message from mtnsms.com telling me about the (then) new
> smspop service. This remains my only email from mtnsms.com,
> and it does look very spammy, but was definitely ham (in fact,
> I now use smspop). It scored 0.950.
>
> o A message from a company doing a survey about how a transaction
> with another company was. Again, looked a lot like spam, but
> wasn't, and again the only message from this company that I've
> received (AFAIK). It scored 0.966.
>
> I was expecting the second, so could retrieve it and this wasn't a
> problem. The first one was a real FP, though. Any suggestions on
> options I could fiddle to correct this (without creating lots of
> other incorrect classified messages) would be of interest! :)
I'm afraid there aren't any. The system has no intelligence whatsoever, it
just counts tokens. It works so good 99.99% of the time it's easy to
believe it must be smarter than that <0.9 wink>; but it isn't. In the first
case, I take it you didn't also discuss sms in other ham. Those are the
nasty FP for me: dealing with an online business in an area (product or
service) I've never emailed about before. For example, I'm a smoker and buy
my cigarettes over the web, but the only clue about that you'll find in my
ham training set is seven(!) msgs from my death vendor -- it took that many
to convince spambayes I really wanted those (but didn't want spam trying to
sell me cigars).
It's often the case that training on just one will knock the next into
Unsure territory, but before the first you're stuck. Whitelists don't help
this either, since you can't whitelist an address for a msg (like your
first) you're not expecting. Another option is to hire a personal assistant
to sort your email for you -- except most people seem to want to hide that
they live for farm porn spam .
From T.A.Meyer at massey.ac.nz Thu Aug 7 18:22:24 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 7 01:23:03 2003
Subject: [spambayes-dev] PGClassifier checked in
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3063@its-xchg4.massey.ac.nz>
> The storage module gained two new classes:
And now a third:
mySQLClassifier - a concrete implementation using the MySQLdb module to
access a mySQL database
> This code has a number of problems, not the least of which is
> that none of the other modules and scripts in the system know
> about it yet.
I still like your earlier suggestion, and that's what I implemented to
test it.
For example, in my config file:
[Storage]
persistent_storage_file:mysql::host=localhost dbname=bayes
And in pop3proxy.py (if this does get used, some sort of central
function that all the apps can use would be good, IMO).
    if self.useDB:
        if '::' in filename:
            available_sqls = {"mysql" : storage.mySQLClassifier,
                              "pgsql" : storage.PGClassifier,
                             }
            sql_type, rest = filename.split('::', 1)
            if available_sqls.has_key(sql_type.lower()):
                self.bayes = available_sqls[sql_type.lower()](filename)
            else:
                # raise some sort of InvalidClassifierError
                pass
        else:
            self.bayes = storage.DBDictClassifier(filename)
    ...
[I needed to change your ":" to a "::" because Windows & MacOS<10 use
":" in filenames, whereas I think "::" is a no-no on both, and hopefully
*nix users don't want to put their hammie.db file in a path with a "::"]
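The "central function that all the apps can use" could look roughly like the sketch below. It is only an illustration: open_storage, its parameters and InvalidClassifierError are hypothetical names, and the backend classes are passed in as plain callables so the example does not depend on the storage module.

```python
class InvalidClassifierError(Exception):
    """Raised when a storage spec names an unknown SQL backend."""


def open_storage(spec, sql_backends, default_factory):
    """Return a classifier for spec, e.g. 'mysql::host=localhost dbname=bayes'."""
    if "::" in spec:
        sql_type, _rest = spec.split("::", 1)
        try:
            factory = sql_backends[sql_type.lower()]
        except KeyError:
            raise InvalidClassifierError("unknown SQL backend: %r" % sql_type)
        # Like the pop3proxy.py snippet, hand the whole spec to the backend.
        return factory(spec)
    # No '::' prefix: treat spec as a plain database filename.
    return default_factory(spec)
```

An application would call it with something like a {"mysql": ..., "pgsql": ...} mapping and the default dbm-style classifier, so the dispatch logic lives in one place.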
> * I've tried to break functionality into the two classes
> in such a way that adding other SQLClassifier subclasses should be
> reasonably easy,
I can certainly say that adding the mySQL subclass was very easy. It's
possible that even more code could go into the base class - some of the
mySQL functions are very similar to the PG ones. I'll let you figure
that out ;)
I'll leave the other things to those more experienced with SQL :)
=Tony Meyer
From mal at lemburg.com Thu Aug 7 10:41:49 2003
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu Aug 7 03:42:17 2003
Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes
In-Reply-To: <041701c35c79$cd4fe7b0$f502a8c0@eden>
References: <041701c35c79$cd4fe7b0$f502a8c0@eden>
Message-ID: <3F3202BD.4010204@lemburg.com>
Mark Hammond wrote:
> Thanks for all the comments on the "Donations" page. I think I got them
> all.
>
> The page is now online at http://www.spambayes.org/donations.html, and the
> source .ht file is in the SpamBayes CVS tree. If any psc members have more
> comments, just send them to me. Spambayes-devers should just make the
> change themself.
>
> Note that this page is not linked in anywhere yet. Once everyone seems
> happy, I will create a few links.
Before making it public, you should probably test the donate button.
Other than that, it looks OK.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Software directly from the Source (#1, Aug 07 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
From T.A.Meyer at massey.ac.nz Thu Aug 7 21:12:24 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 7 04:13:16 2003
Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3085@its-xchg4.massey.ac.nz>
> Before making it public, you should probably test the donate
> button. Other than that, it looks OK.
Heh. I nominate Mark to spend his money testing it .
=Tony Meyer
From mhammond at skippinet.com.au Thu Aug 7 23:47:15 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Aug 7 08:47:29 2003
Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3085@its-xchg4.massey.ac.nz>
Message-ID: <013601c35ce2$09937990$f502a8c0@eden>
[Tony]
> > Before making it public, you should probably test the donate
> > button. Other than that, it looks OK.
>
> Heh. I nominate Mark to spend his money testing it .
Hah! Bet you didn't think I would! :) I hope everyone appreciates that I
have now donated nearly *four* local dollars to the PSF on behalf of
SpamBayes. $US2 goes a long way.
And-you-damn-well-better-spend-it-wisely ,
Mark.
From mal at lemburg.com Thu Aug 7 15:57:05 2003
From: mal at lemburg.com (M.-A. Lemburg)
Date: Thu Aug 7 08:57:41 2003
Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes
In-Reply-To: <013501c35ce2$08649680$f502a8c0@eden>
References: <013501c35ce2$08649680$f502a8c0@eden>
Message-ID: <3F324CA1.1020500@lemburg.com>
Mark Hammond wrote:
> [Tony]
>
>>>Before making it public, you should probably test the donate
>>>button. Other than that, it looks OK.
>>
>>Heh. I nominate Mark to spend his money testing it .
>
>
> Hah! Bet you didn't think I would! :) I hope everyone appreciates that I
> have now donated nearly *four* local dollars to the PSF on behalf of
> SpamBayes. $US2 goes a long way.
>
> And-you-damn-well-better-spend-it-wisely ,
That's one free beer at the next PyCon event ;-)
Looks like everything worked just fine.
--
Marc-Andre Lemburg
eGenix.com
Professional Python Software directly from the Source (#1, Aug 07 2003)
>>> Python/Zope Products & Consulting ... http://www.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/
________________________________________________________________________
From altis at semi-retired.com Thu Aug 7 10:07:23 2003
From: altis at semi-retired.com (Kevin Altis)
Date: Thu Aug 7 12:00:38 2003
Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes
In-Reply-To: <013501c35ce2$08649680$f502a8c0@eden>
Message-ID:
Since the donations link is working, I'll go ahead and email Jon Udell and
let him know there is a way of giving back. The donate page should at least
get a mention on his blog and perhaps a mention on one of his InfoWorld
articles. If there has been any other high-visibility coverage of SpamBayes,
those writers should be contacted as well.
ka
From popiel at wolfskeep.com Thu Aug 7 10:32:20 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Thu Aug 7 12:32:24 2003
Subject: [spambayes-dev] Very small change for composite word tokenizing.
In-Reply-To: Message from "Meyer, Tony" of "Thu,
07 Aug 2003 11:39:14 +1200."
<1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz>
Message-ID: <20030807163220.126A92DEB4@cashew.wolfskeep.com>
In message: <1ED4ECF91CDED24C8D012BCF2B034F1302A9F632@its-xchg4.massey.ac.nz>
"Meyer, Tony" writes:
>
>(I changed "yield w" to "yield 'compound:' + w")
>
>filename:  august_no_seans      kennys  august_seans
>ham:spam:       7900:15260  7900:15260    7900:15260
>fp total:                2           2             2
>fp %:                 0.03        0.03          0.03
>fn total:              176         172           174
>fn %:                 1.15        1.13          1.14
>unsure t:              501         499           491
>unsure %:             2.16        2.15          2.12
>real cost:         $296.20     $291.80       $292.20
>best cost:         $489.60     $488.80       $485.00
>h mean:               0.63        0.62          0.61
>h sdev:               4.84        4.81          4.80
>s mean:              94.52       94.57         94.56
>s sdev:              18.67       18.56         18.58
>mean diff:           93.89       93.95         93.95
>k:                    3.99        4.02          4.02
>
>Interesting. FN's are better than not doing anything with the compound
>words, but not as good as with just the word. Unsures, however, are
>even better. I might try this on a different corpus and see how it goes
>there.
Here's my results:
filename:        normal   fragment   compound
ham:spam:     1978:6166  1978:6166  1978:6166
fp total:             1          1          1
fp %:              0.05       0.05       0.05
fn total:            25         28         25
fn %:              0.41       0.45       0.41
unsure t:           152        172        154
unsure %:          1.87       2.11       1.89
real cost:       $65.40     $72.40     $65.80
best cost:       $41.80     $44.20     $41.40
h mean:            0.27       0.25       0.26
h sdev:            3.80       3.71       3.76
s mean:           98.66      98.51      98.65
s sdev:            8.56       8.97       8.51
mean diff:        98.39      98.26      98.39
k:                 7.96       7.75       8.02
The 'compound:' modifier on the generated tokens makes the
fragmentation code neutral for me, again.
- Alex
From jm at jmason.org Thu Aug 7 12:20:43 2003
From: jm at jmason.org (Justin Mason)
Date: Thu Aug 7 14:21:15 2003
Subject: [spambayes-dev] testing tweaks
Message-ID: <20030807182048.A95B516F18@jmason.org>
Hey SBers,
Have you guys considered testing how a tweak affects DB size -- ie. including
that in the test results output? I find that's a pretty major factor in
a lot of cases in SpamAssassin.
cheers,
--j.
From popiel at wolfskeep.com Thu Aug 7 12:56:59 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Thu Aug 7 14:57:04 2003
Subject: [spambayes-dev] testing tweaks
In-Reply-To: Message from jm@jmason.org (Justin Mason) of "Thu,
07 Aug 2003 11:20:43 PDT." <20030807182048.A95B516F18@jmason.org>
References: <20030807182048.A95B516F18@jmason.org>
Message-ID: <20030807185700.02EC52DEB4@cashew.wolfskeep.com>
In message: <20030807182048.A95B516F18@jmason.org>
jm@jmason.org (Justin Mason) writes:
>
> Hey SBers,
>
>Have you guys considered testing how a tweak affects DB size -- ie. including
>that in the test results output? I find that's a pretty major factor in
>a lot of cases in SpamAssassin.
We've looked at DB size a couple times in the past, but some of the
complicating factors of this are that the actual DB size (as opposed
to token counts) is highly dependent on what sort of backend you use,
and people have very different thresholds for what is acceptable
space usage. As a result, it's very difficult to get any consensus
on what sort of DB size behaviour is acceptable.
Add into the mix that the largest effector of DB size is training
style... and no two of us use the same style, and little support
has been built for simulating the training styles of different
people for testing. (There's the bare beginning of a framework in
the incremental stuff I did, but there are insufficient training rules
built for simulating different styles.)
I personally am happy to give a couple gigabytes to training data
(aka my historical mail record... I never delete any mail anymore),
and up to about 50 megabytes to the live database (it's currently
bounded at about 20 megabytes by my training style). I'm sure that
Tim's sister would have different priorities.
So yes, we've considered it, but only barely, and not recently.
This is one area where theory has fallen to lassitude.
- Alex
From tim.one at comcast.net Thu Aug 7 19:48:11 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Aug 7 18:48:46 2003
Subject: [spambayes-dev] testing tweaks
In-Reply-To: <20030807182048.A95B516F18@jmason.org>
Message-ID:
[Justin Mason]
> Have you guys considered testing how a tweak affects DB size -- ie.
> including that in the test results output? I find that's a pretty
> major factor in a lot of cases in SpamAssassin.
I paid a lot of attention in the early days, since I was running training
sets with tens of thousands of messages, and used an entirely in-memory
Python dict to hold all the stats.
Most gimmicks didn't make a difference worth noting. There is one hack in
our tokenizer to reduce database size: tokens exceeding 12 characters are
replaced by a synthesized token just recording the first character, and
floor(len(token)/10)*10. Testing showed that recording "long tokens" in
full didn't make any difference to results, but bloated the database with
many fat hapaxes.
In effect, then, no matter what the other tokenization gimmicks, we don't
create tokens with more than 12 characters, and create a number of tokens
approximately equal to the number of non-whitespace runs in the message.
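As a rough illustration of the long-token hack described above, here is a minimal sketch. The 12-character threshold and the "first character plus floor(len/10)*10" rule come from the message; the function name shorten_token and the "skip:" output format are assumptions made for this example, not necessarily what the tokenizer emits.

```python
import math

MAX_WORD_SIZE = 12  # threshold quoted in the message above


def shorten_token(token):
    """Return token unchanged, or a compact stand-in if it is too long."""
    n = len(token)
    if n <= MAX_WORD_SIZE:
        return token
    # Keep only the first character and the length rounded down to a
    # multiple of 10; the "skip:" prefix and layout are assumptions.
    return "skip:%s %d" % (token[0], math.floor(n / 10) * 10)
```

Because every long token starting with the same character and falling in the same length bucket collapses to one synthesized token, long one-off strings no longer each add their own database entry.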
The option replace_nonascii_chars is also very effective at reducing
database size (it replaces each high-bit and control byte with a question
mark), and actually helps English-speaking users nail Asian spam. It would
also presumably murder Asian ham, but that's not a problem I have .
That option is off by default in the codebase, but on by default in the
Outlook addin.
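The effect of replace_nonascii_chars can be sketched with a byte translation table. This is an illustration only: the function name replace_nonascii is hypothetical, and keeping tab, carriage return and newline (rather than replacing them as control bytes) is an assumption.

```python
# Translation table: printable ASCII and common whitespace map to
# themselves, every other byte maps to "?".
_KEEP = set(range(32, 127)) | {ord(c) for c in "\t\r\n"}
_TABLE = bytes(b if b in _KEEP else ord("?") for b in range(256))


def replace_nonascii(data):
    """Replace high-bit and control bytes in a bytes object with b'?'."""
    return data.translate(_TABLE)
```

Text in a non-ASCII encoding turns into long runs of question marks, which is why a handful of "?"-heavy tokens can stand in for a large number of distinct high-bit tokens.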
Other gimmicks we don't use had huge effects on database size. Character
5-grams were murder on database size. They also performed worse, so
dropping them was no pain. Schemes also looking at token pairs (bigrams)
more than doubled the database size.
If I ever get time for it, I'd like to pursue a specific mixed
unigram-bigram scheme worked out with Gary Robinson. For example, given
"penis size", that can be viewed as a bigram, or as two unigrams, or as two
unigrams *and* a bigram. The last choice isn't so good because it
systematically creates highly correlated clues, which leads to mistakes that
don't make sense to a human eye (I'll claim that experienced spambayes users
are sympathetic to the mistakes it makes -- spambayes judgments are
"intuitive", in some real sense). But with enough effort, it's possible to
"tile" a message with non-overlapping unigrams and bigrams, so that each
token contributes to exactly one scored entity. The trick is to do this in
a way that maximizes the overall strength of the entities that get scored.
So, for example, and simplifying too much, if the bigram "penis size" has a
spamprob closer to 0.0 or 1.0 than either of the unigrams "penis" and
"size", view it as a bigram; but if "penis" has a spamprob closer to 0.0 or
1.0 than "penis size", view it as two unigrams instead.
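The deliberately oversimplified rule in the last paragraph can be written as a greedy left-to-right pass. This is a sketch of that simplified rule only, not the scheme worked out with Gary Robinson; the names strength and tile are hypothetical, and spamprob is assumed to be any callable that returns a probability for a unigram or a space-joined bigram (0.5 for unknown tokens).

```python
def strength(prob):
    """Distance of a spam probability from the neutral value 0.5."""
    return abs(prob - 0.5)


def tile(tokens, spamprob):
    """Split tokens into non-overlapping unigrams and bigrams."""
    entities = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens):
            bigram = tokens[i] + " " + tokens[i + 1]
            best_unigram = max(strength(spamprob(tokens[i])),
                               strength(spamprob(tokens[i + 1])))
            if strength(spamprob(bigram)) > best_unigram:
                entities.append(bigram)
                i += 2  # both words are consumed by the bigram
                continue
        entities.append(tokens[i])
        i += 1
    return entities
```

Every input token ends up in exactly one returned entity, which is the "tiling" property; a real implementation would presumably search for the tiling with the best overall strength rather than deciding greedily.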
I only had time to run a few tests on that, and it looked very promising,
learning faster than our current pure-unigram scheme, and doing at least as
well on all error measures. It was (of course) slower to score, and the
database more than doubled in size. For my own use, it would have been
worth it, since my personal databases are still relatively tiny (about 1,000
training msgs total), and the code runs too fast for me to notice it now. I
suspect, but don't know, that this mixed scheme would do significantly
better on short messages.
From altis at semi-retired.com Thu Aug 7 17:18:38 2003
From: altis at semi-retired.com (Kevin Altis)
Date: Thu Aug 7 19:11:53 2003
Subject: [spambayes-dev] Re: [PSC] RE: "Donations" page for SpamBayes
In-Reply-To:
Message-ID:
How's this for progress?!
http://weblog.infoworld.com/udell/2003/08/07.html#a771
ka
p.s. In case nobody notices that Nancy Tindle and I have the same address,
that SpamBayes donation to the PSF was from us, she wears the PayPal pants
in the family. I bet nobody thought to calculate family rankings into the
donations page. ;-)
http://www.egenix.com/files/python/psf-donations.html
From tim.one at comcast.net Thu Aug 7 20:20:44 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Aug 7 19:21:16 2003
Subject: [spambayes-dev] Cool Outlook mystery
Message-ID:
Our bug 782709 is pretty interesting! Tony just added a good clue to it.
I'll partly confirm it here, and add another bit of evidence.
After retraining and rescoring from scratch, there's a particular msg in my
Ham folder showing a spam score of 3% in my Spam column. "show spam clues"
rates it much higher:
Spam Score: 0.180576
word spamprob #ham #spam
'*H*' 0.722595 - -
'*S*' 0.083747 - -
Some of the token scores are amazing:
'to:no real name:2**0' 0.342745 7 7
'header:To:1' 0.398161 7 9
'to:2**0' 0.398161 7 9
'header:Date:1' 0.64742 1 4
'header:Message-Id:1' 0.764668 0 1
'subject:.' 0.764668 0 1
'subject: ' 0.846122 0 2
'header:From:1' 0.871695 1 16
Notice I said this was a ham message, and I trained on it as ham. Therefore
it shouldn't be possible that I see *any* token (let alone 3) in this
message with a ham-count of 0. I've certainly got, e.g., way more than
1+4=5 training messages with a Date header too, and way more than 16 with a
"To" header, etc.
In my professional opinion, something is royally hosed . My
observations so far match Tony's that it's confined to tokens in headers, so
it's probably not a database bug.
From jm at jmason.org Thu Aug 7 18:12:33 2003
From: jm at jmason.org (Justin Mason)
Date: Thu Aug 7 20:12:51 2003
Subject: [spambayes-dev] testing tweaks
In-Reply-To:
Message-ID: <20030808001238.F404C16F0E@jmason.org>
Tim Peters writes:
> If I ever get time for it, I'd like to pursue a specific mixed
> unigram-bigram scheme worked out with Gary Robinson. For example, given
> "penis size", that can be viewed as a bigram, or as two unigrams, or as two
> unigrams *and* a bigram. The last choice isn't so good because it
> systematically creates highly correlated clues, which leads to mistakes that
> don't make sense to a human eye (I'll claim that experienced spambayes users
> are sympathetic to the mistakes it makes -- spambayes judgments are
> "intuitive", in some real sense). But with enough effort, it's possible to
> "tile" a message with non-overlapping unigrams and bigrams, so that each
> token contributes to exactly one scored entity. The trick is to do this in
> a way that maximizes the overall strength of the entities that get scored.
> So, for example, and simplifying too much, if the bigram "penis size" has a
> spamprob closer to 0.0 or 1.0 than either of the unigrams "penis" and
> "size", view it as a bigram; but if "penis" has a spamprob closer to 0.0 or
> 1.0 than "penis size", view it as two unigrams instead.
>
> I only had time to run a few tests on that, and it looked very promising,
> learning faster than our current pure-unigram scheme, and doing at least as
> well on all error measures. It was (of course) slower to score, and the
> database more than doubled in size. For my own use, it would have been
> worth it, since my personal databases are still relatively tiny (about 1,000
> training msgs total), and the code runs too fast for me to notice it now. I
> suspect, but don't know, that this mixed scheme would do significantly
> better on short messages.
That's interesting -- it's like the idea of "decomposing" tokens and using
the strongest output of the result. e.g. for "Free!", decompose that to
"free!" "Free" "free" and use the strongest result of those 4 lookups.
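A minimal sketch of that decomposition idea follows. The function names variants and strongest_variant are hypothetical, the choice of case folding plus stripping surrounding punctuation is an assumption about what "decomposing" means here, and spamprob is assumed to be a callable returning 0.5 for unknown tokens.

```python
import string


def variants(token):
    """Return the token plus case-folded and punctuation-stripped forms."""
    stripped = token.strip(string.punctuation)
    result = []
    for candidate in (token, token.lower(), stripped, stripped.lower()):
        if candidate and candidate not in result:
            result.append(candidate)
    return result


def strongest_variant(token, spamprob):
    """Pick the variant whose spam probability is farthest from 0.5."""
    return max(variants(token), key=lambda v: abs(spamprob(v) - 0.5))
```

For "Free!" this yields the four lookups mentioned above: "Free!", "free!", "Free" and "free".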
Yeah, I'm interested because I'd be pretty sure that compound-word breakup
tweak would increase db size, but that doesn't seem to be mentioned...
--j.
From tim.one at comcast.net Fri Aug 8 00:31:06 2003
From: tim.one at comcast.net (Tim Peters)
Date: Thu Aug 7 23:31:43 2003
Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/Outlook2000
msgstore.py, 1.61, 1.62
In-Reply-To:
Message-ID:
[Mark Hammond]
> Modified Files:
> msgstore.py
> Log Message:
> Fix [ 782709 ] not match between actual score and what's shown in
> outlook We can't trust potentially large properties in the data used
> to create the msg object. Thanks Tim, Tony, everyone.
Excellent, Mark! I confirm that all the (subtle, unless you're looking for
them) symptoms went away for me. This had a major-league good effect on my
score distributions too: I've been mildly puzzled for a long time that the
scores-after-training in my ham and spam Outlook data had much higher
variance than in standalone non-Outlook tests. Now I suspect my modest
1,000-msg training database is much bigger than I really need <0.9 wink>.
BTW, the ham msg I posted about before, scoring 0.18 or 0.03 (depending on
where you looked), now scores a much more satisfying 0.000972835.
In effect, the Outlook addin has been acting much like a body-only
classifier? Wow. No wonder I had to keep training Laura Creighton's
two-liners as ham .
From mhammond at skippinet.com.au Sat Aug 9 16:27:04 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sat Aug 9 01:27:05 2003
Subject: [spambayes-dev] Merge outlook dialog branch soon
Message-ID: <000501c35e36$def8ad30$f502a8c0@eden>
After a few days to see if Outlook-007 has fatal flaws, I intend merging the
Outlook dialog branch onto the trunk. This will mean the next binary will
come with this new code.
Any objections or suggestions?
Mark.
From skip at pobox.com Sat Aug 9 13:32:09 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sat Aug 9 13:32:16 2003
Subject: [spambayes-dev] RE: [Spambayes] Loosing spam database?
In-Reply-To: <019501c35e12$d84c57d0$f502a8c0@eden>
References:
<019501c35e12$d84c57d0$f502a8c0@eden>
Message-ID: <16181.12313.339459.9415@montanaro.dyndns.org>
>>>>> "Mark" == Mark Hammond writes:
Mark> It looks like you have a very old version of the program.
Maybe we should set sys.excepthook to a function which dumps version info
when dumping a traceback. That obviously wouldn't help with people running
old(er) versions, though we'd be able to tell they were not running the
latest.
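A minimal sketch of such a hook is shown below. sys.excepthook and traceback.print_exception are standard library; APP_VERSION and make_excepthook are hypothetical names, and a real application would report its own version string instead of the placeholder.

```python
import sys
import traceback

APP_VERSION = "0.0-example"  # hypothetical; a real app would import its own


def make_excepthook(version, stream=None):
    """Build an excepthook that prints version info before the traceback."""
    def hook(exc_type, exc_value, tb):
        out = stream if stream is not None else sys.stderr
        out.write("Version: %s (Python %s)\n"
                  % (version, sys.version.split()[0]))
        traceback.print_exception(exc_type, exc_value, tb, file=out)
    return hook


# Install the hook so uncaught exceptions report the version first.
sys.excepthook = make_excepthook(APP_VERSION)
```

Any traceback a user pastes into a bug report would then start with the version line, so stale installs are easy to spot.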
Skip
From eloff at helpmygame.com Sat Aug 9 13:14:07 2003
From: eloff at helpmygame.com (Daniel Eloff)
Date: Sat Aug 9 15:14:24 2003
Subject: [spambayes-dev] Excellent program
Message-ID:
A really excellent approach to the problem of spam. I'm going through the
classification code and I came across two things I don't understand so far.
I'm trying to understand what these lines of code do in the classifier.py
and chi2.py files:
(yesterday was my first brush with the python language).
in chi2_spamprob:
    clues = self._getclues(wordstream)
    for prob, word, record in clues:
Here we first see prob, which is used extensively
throughout the function. It's obviously very key to understanding
the function, but I have no idea what value it is, or why
for that matter. word and record don't seem to be used...
and in chi2Q:
    assert v & 1 == 0
What does this statement do? (I'm familiar with assert statements, but what
does (v & 1) == 0 mean?)
Thanks to anybody who can help me understand this.
--
-Daniel Eloff
From tim.one at comcast.net Sat Aug 9 16:39:45 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Aug 9 15:40:18 2003
Subject: [spambayes-dev] Excellent program
In-Reply-To:
Message-ID:
[Daniel Eloff]
> ...
> I'm trying to understand what these lines of code do in the
> classifier.py and chi2.py files:
> (yesterday was my first brush with the python language).
Work your way thru the Python Tutorial, then: it will save a ton of
frustration. The Tutorial that comes with Python is most suitable for
experienced (in some other language) programmers.
> in chi2_spamprob:
>
> clues = self._getclues(wordstream)
> for prob, word, record in clues:
>
> here we first see prob which is used extensively
> throughout the function. It's obviously very key to understanding
> the function, but I have no idea what value it is, or why
> for that matter. word and record don't seem to be used...
You need to learn more about Python first. Then when I say that the loop
iterates over a sequence of 3-tuples, you won't have to stop and wonder
about each word too. Crawl first, run later .
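For readers in the same position, here is a small standalone illustration of iterating over a sequence of 3-tuples. The clue values are made up for the example; only the unpacking pattern mirrors the loop being asked about.

```python
# Hypothetical clue list: (spam probability, token, training record).
clues = [
    (0.99, "viagra", {"spam": 40, "ham": 0}),
    (0.02, "python", {"spam": 1, "ham": 55}),
]

probs = []
for prob, word, record in clues:
    # Each 3-tuple is unpacked into three names on every iteration;
    # a loop body is free to use only some of them.
    probs.append(prob)
```

So prob is simply the first element of each tuple, and word and record are bound on every pass even if the loop body never touches them.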
> and in chi2Q:
>
> assert v & 1 == 0
>
> What does this statement do? (I'm familiar with assert statements, but what
> does (v & 1) == 0 mean?)
Same thing as in C (except for operator precedence): it's asserting that
the last bit in v is 0, or, IOW, that v is an even integer. This means the
same thing:
assert v % 2 == 0
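The precedence remark can be made concrete: in Python, & binds more tightly than ==, so v & 1 == 0 already means (v & 1) == 0, whereas the same text in C parses as v & (1 == 0). The sketch below also shows one reason an even v matters: for an even number of degrees of freedom the chi-squared tail probability has a finite closed form. The names is_even and chi2q_even are written for this illustration and are not a copy of chi2.py.

```python
import math


def is_even(v):
    """True when the lowest bit of v is 0, i.e. v is even."""
    return v & 1 == 0  # parsed by Python as (v & 1) == 0


def chi2q_even(x2, v):
    """P(chi-squared with v degrees of freedom >= x2), for even v only."""
    assert v & 1 == 0, "v must be even"
    m = x2 / 2.0
    term = total = math.exp(-m)
    # Sum exp(-m) * m**i / i! for i = 0 .. v/2 - 1.
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)
```

With v = 2 the sum has a single term, so the result is just exp(-x2 / 2).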
From mhammond at skippinet.com.au Sun Aug 10 10:21:36 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sat Aug 9 19:21:29 2003
Subject: [spambayes-dev] ComputerWorld SpamBayes articles
Message-ID: <01f001c35ecc$fb5cad00$f502a8c0@eden>
Thanks Larry! I hadn't seen it. I hope you don't mind me forwarding
this...
Mark.
-----Original Message-----
From: Larry Fresinski
Sent: Saturday, 9 August 2003 9:30 PM
To: Mark Hammond
Subject: RE: [Spambayes-announce] ANNOUNCE: Version 007 of the Outlook
plugin available, and donations scheme up and running
Mark,
If you haven't seen it yet, I've told ComputerWorld about SPAMbayes work.
It's in their Aug. 4 edition and online here...
http://www.computerworld.com/softwaretopics/software/groupware/story/0,10801,83689,00.html
http://www.computerworld.com/softwaretopics/software/groupware/story/0,10801,83684,00.html
-Larry
From mhammond at skippinet.com.au Sun Aug 10 18:27:39 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 10 03:27:29 2003
Subject: [spambayes-dev] Merge outlook dialog branch soon
In-Reply-To: <000501c35e36$def8ad30$f502a8c0@eden>
Message-ID: <000001c35f10$e1dd5480$f502a8c0@eden>
> After a few days to see if Outlook-007 has fatal flaws, I
> intend merging the
> Outlook dialog branch onto the trunk. This will mean the
> next binary will
> come with this new code.
Done! The branch is dead.
Thanks,
Mark.
From T.A.Meyer at massey.ac.nz Mon Aug 11 14:23:59 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 10 21:24:33 2003
Subject: [spambayes-dev] ComputerWorld SpamBayes articles
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF36F3@its-xchg4.massey.ac.nz>
> If you haven't seen it yet, I've told ComputerWorld about
> SPAMbayes work. It's in their Aug. 4 edition and online here...
Is there any way we can stop people saying that SpamBayes doesn't work
with Outlook Express? Sure the Outlook plugin doesn't (which is
probably why it's called an Outlook plugin ;), but the other tools do.
Is the stuff on our website not clear enough, perhaps? Or are
journalists just too lazy to check what they publish?
=Tony Meyer
From mhammond at skippinet.com.au Mon Aug 11 12:49:19 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 10 21:49:22 2003
Subject: [spambayes-dev] RE: [Spambayes] does SpamBayes work with Outlook
Rules Wizard ?
In-Reply-To: <16182.61558.922112.812084@montanaro.dyndns.org>
Message-ID: <006101c35faa$c8604fe0$f502a8c0@eden>
[CC-ing spambayes-dev and dropping spambayes]
Skip:
> Actually, if you want something to run before Outlook, use
> pop3proxy or
> imapfilter. The interface is web-based, so it's obviously
> not going to be
> tightly integrated with Outlook, but it's functional.
>
>
>
> Another possibility might be to combine the best of the
> Outlook and the
> pop3proxy. When Outlook starts, you could fire up a proxy
> (like the core of
> pop3proxy) and reconfigure Outlook to get mail from the proxy
> and the proxy
> to get mail from the real POP server, restoring the POP3
> settings upon exit.
> The user interface would still be embedded in Outlook, but
> would control the
> proxy via XML-RPC. This would separate the UI from the proxy engine
> completely, allowing the proxy engine to be reused with plugins for
> different mailers.
>
>
I think that is actually a great long term idea. By splitting the "proxy"
part out well enough, we could still move the Spam and Unsure to Outlook
folders, and keep the same level of integration.
This wouldn't be easy, but would be a nice feature. Note that the new
Outlook dialogs are already using the "OptionsClass" objects, in much the
same way as the existing web interface does.
However, I'm not going to drive any effort like this for at least a few
months! I've got to get things as they stand now back to a manageable level.
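A minimal sketch of the split Skip describes, assuming a modern Python 3
standard library. ProxyEngine, its methods and the spam_cutoff setting are
hypothetical stand-ins, not existing SpamBayes classes; the point is only
that a mailer-specific UI can drive a separate engine over XML-RPC.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class ProxyEngine:
    """Hypothetical stand-in for the mailer-independent proxy core."""

    def __init__(self):
        self.spam_cutoff = 0.90
        self.classified = 0

    def get_status(self):
        # Plain dicts, floats and ints travel over XML-RPC unchanged.
        return {"spam_cutoff": self.spam_cutoff, "classified": self.classified}

    def set_spam_cutoff(self, value):
        if not 0.0 <= value <= 1.0:
            raise ValueError("cutoff must be between 0 and 1")
        self.spam_cutoff = float(value)
        return True


def start_control_server(engine, host="127.0.0.1", port=0):
    """Expose the engine's public methods over XML-RPC in a background thread.

    port=0 asks the operating system for any free port.
    """
    server = SimpleXMLRPCServer((host, port), logRequests=False, allow_none=True)
    server.register_instance(engine)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_control_server(ProxyEngine())
    host, port = server.server_address
    # The mailer plugin (the UI side) would hold a proxy object like this one.
    ui = ServerProxy("http://%s:%d" % (host, port))
    ui.set_spam_cutoff(0.8)
    print(ui.get_status())
    server.shutdown()
    server.server_close()
```

Nothing in the engine knows about Outlook here, which is what would let the
same engine sit behind plugins for other mailers.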
Mark.
From mhammond at skippinet.com.au Mon Aug 11 12:55:09 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 10 21:55:05 2003
Subject: [spambayes-dev] ComputerWorld SpamBayes articles
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF36F3@its-xchg4.massey.ac.nz>
Message-ID: <006201c35fab$9921c780$f502a8c0@eden>
> Is there any way we can stop people saying that SpamBayes doesn't work
> with Outlook Express?
Probably by removing the *bold* text on my download page that says "it does
not work with Outlook express". However, that is in the context of
the addin, and is qualified in the next sentence, and elsewhere on the page.
I'd be happy to move this to sourceforge, but keep giving up when I have to
define whatever it is that sourceforge asks me to define to release a file.
If someone set up everything and mailed me the 2 steps I would need to go
through to do it via sourceforge, I promise I would do them.
Mark.
From T.A.Meyer at massey.ac.nz Mon Aug 11 15:14:33 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 10 22:15:15 2003
Subject: [spambayes-dev] does SpamBayes work with OutlookRules Wizard ?
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF375D@its-xchg4.massey.ac.nz>
> I think that is actually a great long term idea. By
> splitting the "proxy" part out well enough, we could still
> move the Spam and Unsure to Outlook folders, and keep the
> same level of integration.
Would you then have Exchange and Hotmail proxies as well? Or still do
filtering for them in the same way?
Does this fit with the 'generic filter' Mark/Sean idea at all?
=Tony Meyer
From tim.one at comcast.net Mon Aug 11 00:20:40 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sun Aug 10 23:21:13 2003
Subject: [spambayes-dev] ComputerWorld SpamBayes articles
In-Reply-To: <006201c35fab$9921c780$f502a8c0@eden>
Message-ID:
[Mark Hammond]
> ...
> I'd be happy to move this to sourceforge, but keep giving up when I
> have to define whatever it is that sourceforge asks me to define to
> release a file. If someone set up everything and mailed me the 2
> steps I would need to go through to do it via sourceforge, I
> promise I would do them
I'm afraid it can't be reduced to two steps. Alas, it's one of those things
that's easy to do after you've done it, but takes forever to explain due to
the sheer number of buttons you have to click all over different screens;
the workflow for an SF file release is plain convoluted, but not truly
difficult.
If you'd like to release files from SF, I'd be happy to help -- I used to do
PLabs Python releases all the time on SF, and don't have to think about it.
Your part would be to upload the installer, via anonymous ftp, to the
incoming directory at upload.sf.net, then tell me the name of the uploaded
file. It takes about 10 steps after that to release the file, but they only
take about 2 minutes total (provided you don't have to think for 5 minutes
before each of the 10 steps).
From mhammond at skippinet.com.au Mon Aug 11 14:25:52 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 10 23:25:52 2003
Subject: [spambayes-dev] does SpamBayes work with OutlookRules Wizard ?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF375D@its-xchg4.massey.ac.nz>
Message-ID: <00d401c35fb8$456e8d00$f502a8c0@eden>
> > I think that is actually a great long term idea. By
> > splitting the "proxy" part out well enough, we could still
> > move the Spam and Unsure to Outlook folders, and keep the
> > same level of integration.
>
> Would you then have Exchange and Hotmail proxies as well? Or still do
> filtering for them in the same way?
Yeah, we could not proxy them effectively. But yeah, we could still use
both techniques. But then we are back where we started for a significant
number of users.
> Does this fit with the 'generic filter' Mark/Sean idea at all?
No impact at all really.
Mark.
From T.A.Meyer at massey.ac.nz Mon Aug 11 16:50:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 10 23:51:22 2003
Subject: [spambayes-dev] testing tweaks
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz>
[Tim explains his & Gary's uni/bigram idea]
> I only had time to run a few tests on that, and it looked very
> promising,
You're right about that! I had a little play around with this idea over
the weekend and it certainly improves the results.
I was lazy, so I did this the easiest way (well, what seemed the easiest
way), producing *token* bigrams, rather than *word* bigrams. This means
that "This is a test" produces "This", "test" and "This test", since
"is" and "a" don't generate tokens. It also means that our synthetic
tokens become part of bigrams (so a bigram could be skip information, or
headers and so on). Whether it's better or worse than word bigrams, I
don't know (that's what testing is for!). Also as a result of the
laziness, I left in the circular bigram that was created of the last
token and first token; since the first token is likely to be fairly
constant, I doubt this makes much difference.
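A minimal sketch of the token-bigram generation described above, written for
a modern Python 3. The tokenize function is a hypothetical stand-in that just
drops short words; the real SpamBayes tokenizer does far more (including the
synthetic header and skip tokens), and this is not the patch itself.

```python
def tokenize(text, min_len=3):
    """Hypothetical stand-in tokenizer: split on whitespace, drop short words."""
    return [word for word in text.split() if len(word) >= min_len]


def unigrams_and_token_bigrams(tokens, circular=False):
    """Yield each token, then each pair of adjacent tokens joined by a space.

    With circular=True, also yield the (last, first) pair, mirroring the
    wrap-around bigram mentioned in the message.
    """
    tokens = list(tokens)
    for token in tokens:
        yield token
    for left, right in zip(tokens, tokens[1:]):
        yield left + " " + right
    if circular and len(tokens) > 1:
        yield tokens[-1] + " " + tokens[0]


print(list(unigrams_and_token_bigrams(tokenize("This is a test"))))
# ['This', 'test', 'This test']
```

Because the pairs are built from tokens rather than from words, "This test"
appears even though the two words are not adjacent in the original text.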
Here are preliminary results [using "timtest.py -n5"].
The two columns with "fresh" in the filename are results with a
fresh-from-cvs spambayes. The "tim1s" column are results where I
mistakenly allowed duplicate tokens to be generated (if a token had a
stronger difference than both the bigram with the previous token and the
next token then it is used twice).
The "tim2" columns are with this mistake removed, and the "tim3" columns
are like tim2, but also with Kenny's variant of Sean's
split_compound_words idea enabled.
filename:    sa_freshs   sa_tim2s   sa_tim3s     freshs      tim1s      tim2s      tim3s
ham:spam:    7580:7580  7580:7580  7580:7580 7900:15260 7900:15260 7900:15260 7900:15260
fp total:           44         47         47          2          2          2          2
fp %:             0.58       0.62       0.62       0.03       0.03       0.03       0.03
fn total:           16         12         13        176         94        128        127
fn %:             0.21       0.16       0.17       1.15       0.62       0.84       0.83
unsure t:          356        315        320        501        497        482        500
unsure %:         2.35       2.08       2.11       2.16       2.15       2.08       2.16
real cost:     $527.20    $545.00    $547.00    $296.20    $213.40    $244.40    $247.00
best cost:     $592.40    $843.20    $825.20    $489.60    $379.20    $402.20    $416.40
h mean:           3.40       4.07       4.07       0.63       1.19       0.92       0.94
h sdev:          14.19      15.55      15.49       4.84       7.05       5.98       6.09
s mean:          97.94      98.76      98.74      94.52      96.23      96.02      95.99
s sdev:           9.43       7.80       7.88      18.67      14.79      15.54      15.64
mean diff:       94.54      94.69      94.67      93.89      95.04      95.10      95.05
k:                4.00       4.06       4.05       3.99       4.35       4.42       4.37
So a *big* win on the second set (which is from my actual mail; the
other corpus is based on the SpamAssassin public corpus) in terms of
fn's. In fact the mistake variant did best - almost halving the number
of fn's. Not sure about the first set - 3 more fp's, but 3 fewer fn's
and quite a drop in the number of unsures. I care more about the second
set, anyway.
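For readers checking the figures: the "real cost" row is consistent with
charging $10.00 per false positive, $1.00 per false negative and $0.20 per
unsure, and the percentage rows with dividing by the ham, spam and total
message counts. Those weights are inferred from the numbers above, not stated
in this message, so treat the sketch below as an assumption that happens to
reproduce the table.

```python
def real_cost(fp, fn, unsure, fp_weight=10.00, fn_weight=1.00, unsure_weight=0.20):
    """Dollar cost of a run; the default weights are inferred, not quoted."""
    return fp * fp_weight + fn * fn_weight + unsure * unsure_weight


def rates(fp, fn, unsure, n_ham, n_spam):
    """Return (fp %, fn %, unsure %) as percentages of ham, spam and all mail."""
    return (100.0 * fp / n_ham,
            100.0 * fn / n_spam,
            100.0 * unsure / (n_ham + n_spam))


# Column with 44 fp, 16 fn, 356 unsure on 7580 ham and 7580 spam.
print("%.2f" % real_cost(44, 16, 356))   # 527.20

# Column with 2 fp, 94 fn, 497 unsure on 7900 ham and 15260 spam.
print("%.2f" % real_cost(2, 94, 497))    # 213.40
print(["%.2f" % r for r in rates(2, 94, 497, 7900, 15260)])
# ['0.03', '0.62', '2.15']
```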
My (bsddb based) databases ballooned from about 1.5MB to about 10MB, but
what do I care?
Although the second set was all from my actual mail, the training set I
use is much smaller - about 400 ham and 4000 spam (a crazy imbalance,
but it works...). These results are from this smaller set, using
"timtest.py -n3", first without the adjustment, and then with.
filename:            reals     real_tims     real_adjs real_tim_adjs
ham:spam:         754:8884      754:8884      754:8884      754:8884
fp total:                0             0             0             0
fp %:                 0.00          0.00          0.00          0.00
fn total:              193            72           583           455
fn %:                 2.17          0.81          6.56          5.12
unsure t:              638           470           541           438
unsure %:             6.62          4.88          5.61          4.54
real cost:         $320.60       $166.00       $691.20       $542.60
best cost:         $316.00       $197.20       $435.40       $391.60
h mean:               2.88          5.94          1.09          1.39
h sdev:              11.20         16.56          5.79          6.87
s mean:              92.54         95.98         84.92         87.91
s sdev:              21.09         14.86         32.43         29.64
mean diff:           89.66         90.04         83.83         86.52
k:                    2.78          2.87          2.19          2.37
Again, a clear win for me (although the ham mean does jump up quite a
bit).
=Tony Meyer
From tim.one at comcast.net Mon Aug 11 01:23:01 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Aug 11 00:23:35 2003
Subject: [spambayes-dev] Outlook release on SourceForge
Message-ID:
[python-dev'ers, read the last paragraph and weep]
I created a "Outlook Addin" package on the spambayes file release page, and
added Mark's 0.7 installer to it:
http://sf.net/project/showfiles.php?group_id=61702
I also ripped off much of the text from Mark's Starship page (this is what
you get to if you click on the "Version 0.7" on the page above):
http://sf.net/project/shownotes.php?release_id=177282
Mark, if you object to any of this, don't be shy!
Meaningless statistics:
Project UNIX name: spambayes
Registered: 2002-09-03 22:33
Activity Percentile (last week): 99.4015%
"Activity ratings" on SF are computed by an arcane formula, but generally
speaking the higher the percentile the more "active" a project is. With the
percentile above, this is how many SF projects rank below spambayes:
>>> round(66620 * 0.994015)
66221.0
>>>
In particular, we rank higher than Python(!) now, which currently sits at
99.0583%. Python used to be among the 10 most active projects every week,
but lost a lot when it stopped releasing files from SF (downloads count a
lot toward the activity ranking). It's possible that downloads of the
Outlook addin could boost spambayes to that lofty level. It's also possible
that I'll get a job working on Python someday.
From popiel at wolfskeep.com Sun Aug 10 23:05:10 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Mon Aug 11 01:05:14 2003
Subject: [spambayes-dev] testing tweaks
In-Reply-To: Message from "Meyer, Tony" of "Mon,
11 Aug 2003 15:50:39 +1200."
<1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz>
Message-ID: <20030811050510.9AE5C2DDF1@cashew.wolfskeep.com>
In message: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz>
"Meyer, Tony" writes:
>[Tim explains his & Gary's uni/bigram idea]
>Here are preliminary results [using "timtest.py -n5"].
Hey, where's the patch? It's kind of hard to generate corroborating
evidence without a patch...
- Alex
From T.A.Meyer at massey.ac.nz Mon Aug 11 18:52:34 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 11 01:53:18 2003
Subject: [spambayes-dev] testing tweaks
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz>
> Hey, where's the patch? It's kind of hard to generate
> corroborating evidence without a patch...
Good point. Attached are "diff -u"s - is that right?
Anyone wise in the ways of Python is welcome to point out the
inefficiencies in the code; I'm happy to learn :)
=Tony Meyer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: classifier.diff
Type: application/octet-stream
Size: 3777 bytes
Desc: classifier.diff
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/b9a2f1d3/classifier.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: msgs.diff
Type: application/octet-stream
Size: 324 bytes
Desc: msgs.diff
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/b9a2f1d3/msgs.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Tester.diff
Type: application/octet-stream
Size: 1828 bytes
Desc: Tester.diff
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20030811/b9a2f1d3/Tester.obj
From mhammond at skippinet.com.au Mon Aug 11 17:29:57 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Aug 11 02:29:52 2003
Subject: [spambayes-dev] RE: Outlook release on SourceForge
In-Reply-To:
Message-ID: <01af01c35fd1$fc66b960$f502a8c0@eden>
> Mark, if you object to any of this, don't be shy!
Sounds good to me - thanks! In the meantime, I changed the link on my
starship page to the sourceforge download URL -
http://prdownloads.sourceforge.net/spambayes/SpamBayes-Outlook-Setup-007.exe?download
- is there any evidence this will or will not have the same effect
as sending them to the "file releases" page?
Let's-beat-those-damn-pythoneers ly,
Mark.
From anthony at interlink.com.au Mon Aug 11 18:00:01 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Mon Aug 11 03:00:16 2003
Subject: [spambayes-dev] RE: Outlook release on SourceForge
In-Reply-To: <01af01c35fd1$fc66b960$f502a8c0@eden>
Message-ID: <200308110700.h7B701kk022337@localhost.localdomain>
>>> "Mark Hammond" wrote
> Sounds good to me - thanks! In the meantime, I changed the link on my
> starship page to the sourceforge download URL -
> http://prdownloads.sourceforge.net/spambayes/SpamBayes-Outlook-Setup-007.exe?download
> - is there any evidence this will or will not have the same effect
> as sending them to the "file releases" page?
Geez - I had a brief look at the starship http logs, and according
to them, there's been 6400 downloads of the 006 installer in the first
9 days of August alone. If Mark can get all of those users to kick in $0 each,
then pretty soon he might fail to be very very wealthy.
Ah well, you can always sell off the excess goodwill generated to someone
like Microsoft.
Anthony.
From mark.winder4 at btinternet.com Mon Aug 11 11:30:11 2003
From: mark.winder4 at btinternet.com (Mark Winder)
Date: Mon Aug 11 05:28:29 2003
Subject: [spambayes-dev] Newbie is well pythoned! Help!
Message-ID: <000801c35feb$2a493560$0200a8c0@bigdaddy>
Hi,
I tried to use Spambayes but have fallen at the first post. From what I've
read I could easily use it if you gave me only a tincy wincy bit more info.
It may not in fact be your fault that I've failed.
I'd like to use it with Outlook Express under Win98 SE, so I've installed
Python 2.3 using the Windows installer. I accepted all the defaults, and I
also looked at the advanced settings that said make file associations with
.py etc.; it was checked.
I then downloaded and unpacked the Spambayes source code to C:\temp so that
it was in the directory c:\temp\spambyes-1.0-a4
Next, your instructions say:
Once you've downloaded and unpacked the source archive, do the regular
setup.py build; setup.py install dance,
Well, typing setup.py at the DOS prompt produced "bad command or filename".
Clicking on it seemed to do something, but it's not clear what.
After this I'm stuck. As you will have gathered, I don't know Python. I also
tried entering the Python GUI, but this didn't help.
Can you give me a clue?
regards,
Mark Winder.
From paulhar at netapp.com Mon Aug 11 15:04:56 2003
From: paulhar at netapp.com (Hargreaves, Paul)
Date: Mon Aug 11 08:05:33 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
Message-ID: <765B6B38B4D29D498077F8E644E23B7F01A1CE04@nlhoe2k02.europe.netapp.com>
"3.10 - It is recommended that you configure auto-complete to keep at
least a few days of Spam around,"
I'm sure this is supposed to be "auto-archive" as mentioned in the
paragraph above.
A question I have that may/may not be suitable for the FAQ:
Xxx. Can I use SpamBayes to filter into more than 2 categories (i.e.
mail sorting rather than spam detection).
Regards,
Paul Hargreaves
Systems Engineer
Network Appliance
Unit 1160 Elliot Court
Herald Avenue
Coventry Business Park
Coventry, CV5 6UB
From kennypitt at hotmail.com Mon Aug 11 11:59:26 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Mon Aug 11 10:59:38 2003
Subject: [spambayes-dev] FolderSelector problem in Define Filters dialog
Message-ID: <3F37AF4E.1060209@hotmail.com>
Recently I reinstalled Outlook using a different profile name. Rather
than copying my old plug-in .INI file to the new profile name, I decided
to just redo my settings from scratch in the SpamBayes Manager. When I
went into Define Filters I found that the Browse buttons under Certain
Spam and Possible Spam did *nothing*, while the Browse button for
"Filter the following folders" worked fine.
I tracked the problem down to two fixes, for which I have attached
diffs. The first change to opt_processors.py allowed the FolderSelector
dialog to display when Browse was clicked, but I couldn't click OK and
the status did not update at the bottom of the dialog. The second fix
in FolderSelector.py seems to have corrected the problem, and I now have
my spam folders selected and filtering enabled.
--
Kenny Pitt
-------------- next part --------------
Index: opt_processors.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/opt_processors.py,v
retrieving revision 1.2
diff -u -r1.2 opt_processors.py
--- opt_processors.py 10 Aug 2003 07:26:50 -0000 1.2
+++ opt_processors.py 11 Aug 2003 14:50:20 -0000
@@ -186,7 +186,7 @@
if is_multi:
ids = self.option.get()
else:
- ids = [self.optin.get()]
+ ids = [self.option.get()]
from dialogs import FolderSelector
if self.option_include_sub:
cb_state = self.option_include_sub.get()
-------------- next part --------------
Index: FolderSelector.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/FolderSelector.py,v
retrieving revision 1.21
diff -u -r1.21 FolderSelector.py
--- FolderSelector.py 10 Aug 2003 07:26:49 -0000 1.21
+++ FolderSelector.py 11 Aug 2003 14:47:11 -0000
@@ -340,8 +340,8 @@
# If single-select, the checked state is not used, just the
# selected state.
try:
- h = win32gui.SendMessage(self.list, commctrl.TVM_GETSELECTEDITEM,
- commctrl.TVGN_CARET, h)
+ h = win32gui.SendMessage(self.list, commctrl.TVM_GETNEXTITEM,
+ commctrl.TVGN_CARET, commctrl.TVI_ROOT)
except win32gui.error:
return
info = self._GetLVItem(h)
From tim.one at comcast.net Mon Aug 11 12:58:14 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Aug 11 11:58:47 2003
Subject: [spambayes-dev] Slashdotted
Message-ID:
Barry W pointed out that spambayes was "the winner" in this comparative
review linked to from slashdot.org today (under the "Comparison of Bayesian
POP3 Spam Filters" headline on Slashdot's front page):
http://home.dataparty.no/kristian/reviews/bayesian/
Yawn.
From skip at pobox.com Mon Aug 11 12:04:30 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 11 12:04:50 2003
Subject: [spambayes-dev] Re: [Spambayes] Slashdotted
In-Reply-To:
References:
Message-ID: <16183.48782.267489.993991@montanaro.dyndns.org>
Tim> Barry W pointed out that spambayes was "the winner" in this
Tim> comparative review linked to from slashdot.org today (under the
Tim> "Comparison of Bayesian POP3 Spam Filters" headline on Slashdot's
Tim> front page):
Tim> http://home.dataparty.no/kristian/reviews/bayesian/
Tim> Yawn.
I see that it mentions "even grandma can use it". I don't suppose your
sisters are grandmas yet, are they? Perhaps the author was referring to
them. ;-)
Skip
From skip at pobox.com Mon Aug 11 14:51:51 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 11 14:52:33 2003
Subject: [spambayes-dev] Re: [Spambayes] Bug
In-Reply-To: <3F37DFD1.8080809@newsguy.com>
References: <3F37DFD1.8080809@newsguy.com>
Message-ID: <16183.58823.134841.786455@montanaro.dyndns.org>
John> Got this from SpamBayes, just downloaded and installed the latest
John> from the site today and installed for the first time.
This has been seen more and more recently. I think we could "correct"
messages which have raw 8-bit text in their headers before feeding them to
the email package. Thus this particular message's Subject: header would get
converted from
Cvnf stability money-maker w!hen life seem5 to expensive, you ?eed to get ah?ad
to (I think):
=?ISO-8859-1?Q?Cvnf stability money-maker w!hen life seem5 to expensive, you =F1eed to get ah=EAad?=
You'd obviously have to make an educated guess about the actual encoding. I
have a function that does a pretty good job. The list of encodings to try
could be an option with a default oriented toward ISO-8859-* and its various
Windows variants.
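A rough sketch of that idea, assuming a modern Python 3 and its standard
email.header module. This is not the function mentioned above;
encode_raw_header and the CANDIDATE_CHARSETS default are hypothetical names,
and the "educated guess" here is simply the first candidate charset that
decodes the bytes without error.

```python
from email.header import Header, decode_header

# Hypothetical default list. ISO-8859-1 accepts every byte value, so it has
# to come last or nothing after it would ever be tried.
CANDIDATE_CHARSETS = ("us-ascii", "utf-8", "iso-8859-1")


def encode_raw_header(raw, charsets=CANDIDATE_CHARSETS):
    """Turn a raw 8-bit header value (bytes) into an RFC 2047 safe string."""
    for charset in charsets:
        try:
            text = raw.decode(charset)
        except UnicodeDecodeError:
            continue
        if charset == "us-ascii":
            return text  # already plain 7-bit text, nothing to encode
        return Header(text, charset).encode()
    # No candidate matched: fall back to a charset that cannot fail.
    return Header(raw.decode("iso-8859-1"), "iso-8859-1").encode()


raw_subject = b"you \xf1eed to get ah\xeaad"
encoded = encode_raw_header(raw_subject)
print(encoded)
# Decoding the encoded word should give back the original bytes.
print(decode_header(encoded))
```

The candidate list is passed in as a parameter so that it could be driven by
an option, as suggested above.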
Skip
From igor at gameplasma.com Mon Aug 11 15:47:27 2003
From: igor at gameplasma.com (Igor "JI" Murashkin)
Date: Mon Aug 11 16:01:05 2003
Subject: [spambayes-dev] SpamBayes - on a Linux mail server?
Message-ID: <000001c36041$799fc380$660010ac@swordsmaster>
Hello, I have a Linux box, on which I have a mail server installed. In
short, I fetch my mail from the server that's on Linux, but I have
enough users -- 20-30 who use my mail server daily.
My question is, would it be possible to install SpamBayes so that it
filters everything out server side, so that the spam never even reaches
the users, being instantly discarded? That would be lovely, as I
wouldn't want all my users to worry about installing their own spam
filters.
Thanks!
-Igor
From tim.one at comcast.net Mon Aug 11 17:30:42 2003
From: tim.one at comcast.net (Tim Peters)
Date: Mon Aug 11 16:31:10 2003
Subject: [spambayes-dev] RE: Outlook release on SourceForge
In-Reply-To: <01af01c35fd1$fc66b960$f502a8c0@eden>
Message-ID:
[Mark Hammond]
> ...
> In the meantime, I changed the link on my starship page to the
> sourceforge download URL -
> http://prdownloads.sourceforge.net/spambayes/SpamBayes-Outlook-Setup-007.exe?download
> - is there any evidence this will or will not have the same effect
> as sending them to the "file releases" page?
I expect it's the same, but don't know for sure.
> Let's-beat-those-damn-pythoneers ly,
We already beat those losers; I'm searching for a worthier adversary now.
From vanhorn at whidbey.com Mon Aug 11 14:34:57 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Mon Aug 11 16:35:00 2003
Subject: [spambayes-dev] SpamBayes - on a Linux mail server?
References: <000001c36041$799fc380$660010ac@swordsmaster>
Message-ID: <3F37FDF1.EC35B437@whidbey.com>
Philosophically, the answer is not to do this, as no two people are
likely to have exactly the same reaction to different messages. Now, how
much of a difference is this? Probably quite a bit. I have two instances
of pop3proxy running, one handles four or five accounts on my local
workstation, the other one handles four accounts on another workstation.
Mine works like a charm.
The second one, which handles two low-volume accounts of mine and my
wife's mail, has definite problems because I didn't know every list she
subscribed to. I know that anything that involves quilts or beads is
going to be ham for her, but she has some general interests as well. So
there were some messages that looked like spam to SpamBayes, and I
corroborated that judgment in the training, so even when I go back and
take those same messages and train as ham, it's going to be a while
before similar messages stop ending up in her unsure folder.
Now, if you can figure out a way to make sure that each user only trains
on their own mail, then it probably would work. But the default is for
SB to train by reinforcing its existing decisions, which introduces some
real problems in the multi-user scenario.
I am, however, considering offering the proxy or IMAP on my mail server,
which would allow the users to drop their volume of mail that they
actually pick up. They will be able to do their own training, but they
won't have to do their own local installs.
But sharing a training database looks like a loser to me.
Van
Igor "JI" Murashkin wrote:
> Hello, I have a Linux box, on which I have a mail server installed.
> In short, I fetch my mail from the server that's on Linux, but I have
> enough users -- 20-30 who use my mail server daily.
> My question is, would it be possible to install SpamBayes so that it
> filters everything out server side, so that the spam never even
> reaches the users, being instantly discarded? That would be lovely, as
> I wouldn't want all my users to worry about installing their own spam
> filters. Thanks!-Igor
--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD
For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------
From richie at entrian.com Tue Aug 12 00:21:43 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Aug 11 18:21:51 2003
Subject: [spambayes-dev] Newbie is well pythoned! Help!
In-Reply-To: <000801c35feb$2a493560$0200a8c0@bigdaddy>
References: <000801c35feb$2a493560$0200a8c0@bigdaddy>
Message-ID: <805gjvscr7frqthndr8k7gg0h2qu0au7de@4ax.com>
Hi Mark,
> do the regular setup.py build; setup.py install dance,
> Well typing setup.py at the dos prompt produced "bad command or filename"
You need to start up a Command Prompt and do something like this:
> cd \temp\spambayes-1.0a4
> c:\python23\python setup.py install
You may need to tweak the commands to fit your machine, but those are the
steps of the dance on Win98.
The reason you can't just double-click setup.py is that it expects a
command-line argument ('install'). Where a script doesn't expect an
argument, and this includes pop3proxy.py, you should be able to just
double-click it. However, if you do that, you might not see any errors
it outputs (not that pop3proxy ever causes any errors 8-) so even for
argumentless scripts, running them from a command prompt is a good idea.
The spambayes scripts, including pop3proxy.py, will be installed into
\python23\scripts (or its equivalent on your machine). Don't run the
ones in \temp\spambayes-1.0a4 - you can delete that once you've
installed the software.
By the way, the "spambayes-dev" list is really for discussion of the
development of the spambayes code - the "spambayes" list is for
discussion of installation and usage.
--
Richie Hindle
richie@entrian.com
From T.A.Meyer at massey.ac.nz Tue Aug 12 16:29:44 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 11 23:30:34 2003
Subject: [spambayes-dev] RE: [Spambayes] Training good messages has no effect
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF39B4@its-xchg4.massey.ac.nz>
> > Current version is 0.6, latest is 0.6.
>
> Current is actually 0.7, but we keep managing to screw up the
> website where this is stored. "Check Latest Version" should
> start reporting 0.7 soon.
What about my suggestion that we make installing Version.cfg separate
from making all? So a "make install" *doesn't* install Version.cfg, and
a "make version install" command is necessary?
I can see this accidentally happening all the time...
=Tony Meyer
From popiel at wolfskeep.com Mon Aug 11 22:19:26 2003
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Tue Aug 12 00:19:30 2003
Subject: [spambayes-dev] testing tweaks
In-Reply-To: Message from "Meyer, Tony" of "Mon,
11 Aug 2003 17:52:34 +1200."
<1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz>
Message-ID: <20030812041926.933632DE8C@cashew.wolfskeep.com>
In message: <1ED4ECF91CDED24C8D012BCF2B034F1302BF387F@its-xchg4.massey.ac.nz>
"Meyer, Tony" writes:
>> Hey, where's the patch? It's kind of hard to generate
>> corroborating evidence without a patch...
>
>Good point. Attached are "diff -u"s - is that right?
It looks like only the classifier change is needed; the others
look like null changes to me. Is this correct?
Also, for those of us still running 2.2, it's nice to stick in
the 'from __future__ import generators' at the top of the file,
while using yield.
I'm now having the following error thrown:
Traceback (most recent call last):
File "timcv.py", line 167, in ?
main()
File "timcv.py", line 164, in main
drive(nsets)
File "timcv.py", line 113, in drive
d.test(hamstream, spamstream)
File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/TestDriver.py", line 265, in test
t.predict(spam, True, new_spam)
File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/Tester.py", line 92, in predict
prob = guess(example)
File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 225, in chi2_spamprob
clues = self._getclues(wordstream)
File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 452, in _getclues
q = wordstream.next()
AttributeError: 'Msg' object has no attribute 'next'
Given that the error clearly didn't happen on the first message it tried
to classify, I suspect it's triggered by a peculiarity of one of my
messages... as a random guess, I'd say perhaps a MIME multipart/digest
or some other thing that has an embedded rfc822 section? In any case,
I'm looking at how I might rephrase the classifier to avoid this issue...
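One reading of the traceback, offered as an assumption rather than a
diagnosis: _getclues is being handed an object that can be iterated over but
is not itself an iterator, so it has no next method. The sketch below uses a
hypothetical Msg class and modern Python 3 spelling (next(obj) instead of
obj.next()) to show the distinction and one way to guard against it.

```python
class Msg:
    """Hypothetical message object: iterable, but not an iterator."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def __iter__(self):
        # A fresh iterator on every call; Msg itself has no next method.
        return iter(self.tokens)


def first_token(wordstream):
    """Accept either an iterable or an iterator and return its first item."""
    stream = iter(wordstream)  # a no-op if wordstream is already an iterator
    return next(stream)


msg = Msg(["free", "money"])
print(first_token(msg))                  # free
print(first_token(iter(["already"])))    # already

try:
    next(msg)  # the modern equivalent of the failing wordstream.next()
except TypeError as exc:
    print("not an iterator:", exc)
```

Calling iter() once at the top of the consuming function keeps it working
whether the caller passes a generator or a message object.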
- Alex
From mhammond at skippinet.com.au Tue Aug 12 15:29:46 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Aug 12 00:29:48 2003
Subject: [spambayes-dev] RE: [Spambayes] Training good messages has no effect
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF39B4@its-xchg4.massey.ac.nz>
Message-ID: <00cf01c3608a$5ce8f470$f502a8c0@eden>
> What about my suggestion that we make installing Version.cfg separate
> from making all? So a "make install" *doesn't* install
> Version.cfg, and
> a "make version install" command is necessary?
>
> I can see this accidentally happening all the time...
I agree 100%. Feel free to beat me to it :) I'd be happy with a single
target, say 'version' that also did the install. Indeed, I would be
surprised if you can convince 'make' to work so that "make version install"
updates version.cfg, but neither "make version" nor "make install" do.
Mark.
From tim.one at comcast.net Tue Aug 12 01:45:23 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Aug 12 00:45:59 2003
Subject: [spambayes-dev] testing tweaks
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF37D9@its-xchg4.massey.ac.nz>
Message-ID:
There was a thread partly about the mixed unigram/bigram scheme last
November, starting here:
http://mail.python.org/pipermail/spambayes/2002-November/001912.html
It wasted time starting with a unigram+bigram+trigram scheme, and wasted
more time trying to use hash codes to reduce the database burden (we've
regretted that every time we've tried it).
The spambayes results on my main test data were already so good then that
testing couldn't verify any claimed improvement (it could only demonstrate
that a suggested idea did worse). The "I only had time to run a few tests
on that, and it looked very promising" refers to later small tests I never
wrote up. They were closest to what a msg late in this thread called "bix"
(exact (non-hashing) bigrams).
Like Tony did, I was really using token bigrams (and trigrams, at the
start). There were many mysteries related to bigrams created from header
tokens, as pointed out in several of that thread's messages. Another
mystery covered there is that split-on-whitespace still beat "extract words"
for the fundamental tokenization gimmick. It's a mystery because the only
"reason" I ever found for s-o-w winning with unigrams was the weak context
info it offers (like "free!!" is more likely to be spammy than "free").
Moving to bigrams (or higher) really should give much stronger context info
than we get from keeping punctuation.
So many mysteries, so little time ...
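A minimal sketch of the token-bigram idea discussed above, assuming plain
split-on-whitespace tokenization of the body only. The function name and the
"bi:" prefix are made up for illustration; this is not the tokenizer or
classifier code from the project.

```python
def unigrams_and_bigrams(text):
    """Yield split-on-whitespace tokens, then exact (non-hashed) bigrams."""
    # Splitting on whitespace keeps punctuation, so "free!!" and "free"
    # remain distinct tokens.
    tokens = text.split()
    for token in tokens:
        yield token
    # Pair each token with its successor; the "bi:" prefix (hypothetical)
    # keeps bigrams from colliding with unigrams in the database.
    for first, second in zip(tokens, tokens[1:]):
        yield "bi:%s %s" % (first, second)


if __name__ == "__main__":
    for clue in unigrams_and_bigrams("free!! money now"):
        print(clue)
```

Storing the bigram text itself rather than a hash code grows the database
faster, which is the trade-off the thread refers to.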
From T.A.Meyer at massey.ac.nz Tue Aug 12 17:58:01 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 12 00:58:40 2003
Subject: [spambayes-dev] Contact page
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A1F@its-xchg4.massey.ac.nz>
A while back we had some suggestions about making the mailing list
details more prominent, and one of the suggestions was that we have a
separate contact page with this (and possibly other) information. (I'm
too lazy to give you links, but it's there in the archives somewhere).
I've checked in (and uploaded) a stab at a contact page (no other pages
have been changed). This would be linked from the side bar - instead of
"Email Us", have "Contact Us", and link it to the page, rather than
spambayes@python.org.
Comments?
=Tony Meyer
From skip at pobox.com Tue Aug 12 01:41:37 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 12 01:41:45 2003
Subject: [spambayes-dev] Contact page
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A1F@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A1F@its-xchg4.massey.ac.nz>
Message-ID: <16184.32273.139414.488082@montanaro.dyndns.org>
Tony> I've checked in (and uploaded) a stab at a contact page (no other
Tony> pages have been changed). This would be linked from the side bar
Tony> - instead of "Email Us", have "Contact Us", and link it to the
Tony> page, rather than spambayes@python.org.
+1.
Skip
From anthony at interlink.com.au Tue Aug 12 16:58:36 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Aug 12 01:59:12 2003
Subject: [spambayes-dev] Re: [Spambayes] SpamBayes Problem
In-Reply-To: <012f01c36093$7108d890$f502a8c0@eden>
Message-ID: <200308120558.h7C5wjlS016774@localhost.localdomain>
>>> "Mark Hammond" wrote
> You could try resetting all the toolbars - see
> http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/spambayes/spambayes/Outlook2000/docs/troubleshooting.html,
> and the bit about deleting
> "outcmd.dat".
I'm thinking a page http://spambayes.sourceforge.net/troubleshooting.html
that simply redirects to the above page could be a good thing.
--
Anthony Baxter
It's never too late to have a happy childhood.
From mhammond at skippinet.com.au Tue Aug 12 17:04:26 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Aug 12 02:04:27 2003
Subject: [spambayes-dev] RE: [Spambayes] SpamBayes Problem
In-Reply-To: <200308120558.h7C5wjlS016774@localhost.localdomain>
Message-ID: <014e01c36097$96b490d0$f502a8c0@eden>
> I'm thinking a page http://spambayes.sourceforge.net/troubleshooting.html
> that simply redirects to the above page could be a good thing.
Yeah - or "outlook/troubleshooting.html" etc - then we could run amok ;)
I-better-go-find-'html-for-dummies' ly,
Mark.
From T.A.Meyer at massey.ac.nz Tue Aug 12 19:33:16 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 12 02:33:59 2003
Subject: [spambayes-dev] testing tweaks
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A53@its-xchg4.massey.ac.nz>
> It looks like only the classifier change is needed; the
> others look like null changes to me. Is this correct?
Sorry, it probably is. I made the other changes when I was trying to
make the change to tokenizer (to get word, not token, bigrams), before I
reconsidered and moved to classifier.
> Also, for those of us still running 2.2, it's nice to stick
> in the 'from __future__ import generators' at the top of the
> file, while using yield.
Sorry, I'll try and be more considerate...
> I'm now having the following error thrown:
[...]
> File "/home/cashew/popiel/spambayes/testing/spambayes/spambayes/classifier.py", line 452, in _getclues
> q = wordstream.next()
> AttributeError: 'Msg' object has no attribute 'next'
I recall seeing something like this - it's the reason for the if
type(wordstream) stuff in learn and unlearn. Sometimes the wordstream
is a Msg object rather than a generator. However, this:
"""
if type(wordstream) == types.GeneratorType:
    wordstream = self._enhance_wordstream(wordstream)
"""
should probably be:
"""
if type(wordstream) == type(Msg):
    wordstream = self._enhance_wordstream(wordstream.as_tokens)
else:
    wordstream = self._enhance_wordstream(wordstream)
"""
(which then does require some of the other changes. I'm not sure if Msg
is in the namespace, either).
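One way to sidestep the type test altogether is sketched below. It assumes
the Msg object is iterable (defines __iter__), which this thread does not
confirm. Both Msg and enhance_wordstream here are hypothetical stand-ins, not
the real classes or methods. Separately, type(Msg) is the type of the class
itself rather than of its instances, so isinstance(wordstream, Msg) would be
the usual spelling of that check.

```python
class Msg:
    """Hypothetical stand-in for the test-harness message class."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        # Assumption: iterating over a message yields its tokens.
        return iter(self.text.split())


def enhance_wordstream(wordstream):
    """Hypothetical stand-in for _enhance_wordstream: add token bigrams."""
    previous = None
    for token in wordstream:
        yield token
        if previous is not None:
            yield "bi:%s %s" % (previous, token)
        previous = token


def prepare(wordstream):
    # iter() returns an iterator for generators, lists, and any object
    # with __iter__, so later code can always ask for the next token.
    return enhance_wordstream(iter(wordstream))


if __name__ == "__main__":
    print(list(prepare(Msg("a b c"))))
    print(list(prepare(tok for tok in ["a", "b"])))
```

If Msg turns out not to be iterable, the explicit isinstance branch is still
needed.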
Perhaps the difference is that I was using timtest and you were using
timcv? I can't recall when I saw the error, but it was definitely only
in learning/unlearning, not getting probability.
Off to look at Tim's original patch...
=Tony Meyer
From T.A.Meyer at massey.ac.nz Tue Aug 12 20:13:43 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 12 03:14:36 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A5D@its-xchg4.massey.ac.nz>
> "3.10 - It is recommended that you configure auto-complete to keep at
> least a few days of Spam around,"
>
> I'm sure this is supposed to be "auto-archive" as mentioned in the
> paragraph above.
Thanks. This is now fixed.
> A question I have that may/may not be suitable for the FAQ:
>
> Xxx. Can I use SpamBayes to filter into more than 2 categories (i.e.
> mail sorting rather than spam detection).
I'm not sure if this is a FAQ or not, but the short answer is "no - look
at POPfile instead". The longer answer is "maybe, read through the
archives for discussions about how you could implement this".
=Tony Meyer
From vanhorn at whidbey.com Tue Aug 12 12:03:58 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Tue Aug 12 14:04:02 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A5D@its-xchg4.massey.ac.nz>
Message-ID: <3F392C0E.70ABD77E@whidbey.com>
I think the long answer would be more along the lines of this:
"SpamBayes wasn't designed for that, while POPFile was. However, like any
two-state filter, a series of SpamBayes instances could be cascaded to
filter into any number of categories, although each would need to be
trained. The result might be very effective, but not efficient."
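As a rough illustration of the cascade described above (a sketch only; the
scoring functions, labels, and threshold are all hypothetical and are not
SpamBayes APIs), each stage is an independent two-state filter and the first
one that claims the message wins:

```python
def classify_nway(message, filters, threshold=0.9, default="unsorted"):
    """Run a message through two-state filters in order.

    `filters` is a list of (label, score_function) pairs, where each
    score_function returns a value in [0, 1]. Both are placeholders for
    separately trained classifier instances.
    """
    for label, score in filters:
        if score(message) >= threshold:
            return label
    return default


if __name__ == "__main__":
    # Toy keyword scorers standing in for trained classifiers.
    toy_filters = [
        ("spam", lambda m: 1.0 if "viagra" in m else 0.0),
        ("python", lambda m: 1.0 if "python" in m else 0.0),
    ]
    print(classify_nway("buy viagra today", toy_filters))
    print(classify_nway("python 2.3 released", toy_filters))
    print(classify_nway("lunch on friday?", toy_filters))
```

Every stage rescans the whole message and needs its own training data, which
is where the "effective, but not efficient" caveat comes from.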
Telling anyone to "read through the archives" is cruel, given the volume
here. I think I read every post that doesn't relate to Outlook, and even
some of those, daily. I'd hate to have to go back and find anything if I got
behind.
Van
"Meyer, Tony" wrote:
> > "3.10 - It is recommended that you configure auto-complete to keep at
> > least a few days of Spam around,"
> >
> > I'm sure this is supposed to be "auto-archive" as mentioned in the
> > paragraph above.
>
> Thanks. This is now fixed.
>
> > A question I have that may/may not be suitable for the FAQ:
> >
> > Xxx. Can I use SpamBayes to filter into more than 2 categories (i.e.
> > mail sorting rather than spam detection).
>
> I'm not sure if this is a FAQ or not, but the short answer is "no - look
> at POPfile instead". The longer answer is "maybe, read through the
> archives for discussions about how you could implement this".
>
> =Tony Meyer
>
> _______________________________________________
> spambayes-dev mailing list
> spambayes-dev@python.org
> http://mail.python.org/mailman/listinfo/spambayes-dev
--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD
For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------
From tim.one at comcast.net Tue Aug 12 15:35:12 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Aug 12 14:35:52 2003
Subject: [spambayes-dev] RE: [Spambayes] SpamBayes : contribution
In-Reply-To: <159ijvgekl9go0bgtu0hq26rk195hq6d2l@4ax.com>
Message-ID:
[Tim]
>> If we wait a few months, the PSF is currently paying a lawyer to
>> review a "joint ownership" contribution agreement, enabling the PSF
>> and a contributor to effectively share copyright.
[Richie Hindle]
> That's a good idea - could you keep us (or more likely spambayes-dev)
> up to date with what happens? Ta.
Oh sure. It will be publicized when it happens. An already
out-of-date draft is in the
Proposed Contributor Agreement
section at
http://www.python.org/psf/psf-contributor-agreement.html
From skip at pobox.com Tue Aug 12 16:17:32 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 12 16:17:42 2003
Subject: [spambayes-dev] Simply n-way classifier
Message-ID: <16185.19292.111612.453684@montanaro.dyndns.org>
I checked in a simple n-way classifier to the contrib directory just now.
Executing 'python nway.py -h' should give interested parties a reasonable
idea of how to use it and create databases for it.
Skip
From skip at pobox.com Tue Aug 12 22:54:39 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 12 22:54:50 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
In-Reply-To: <3F392C0E.70ABD77E@whidbey.com>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3A5D@its-xchg4.massey.ac.nz>
<3F392C0E.70ABD77E@whidbey.com>
Message-ID: <16185.43119.305872.694102@montanaro.dyndns.org>
Van> I think the long answer would be more along the lines of this:
Van> "SpamBayes wasn't designed for that, while POPFile was. However,
Van> like any two-state filter, a series of SpamBayes instances could be
Van> cascaded to filter into any number of categories, although each
Van> would need to be trained. The result might be very effective, but
Van> not efficient."
I added a question and answer about this topic to the FAQ today. I also
added a toy app to the contrib directory (nway.py) modelled after
hammiefilter.py.
Skip
From mhammond at skippinet.com.au Wed Aug 13 15:11:02 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Wed Aug 13 00:10:48 2003
Subject: [spambayes-dev] www.python.org/sf/sb/12345?
Message-ID: <062101c36150$e9224100$f502a8c0@eden>
I really like the www.python.org/sf/xxxxx cgi script - is there any hope of
getting one for us? If it is hard for our sf based site, can we beg, borrow
or steal space on python.org? Maybe that could be a benefit of being a "PSF
sponsoring project" (of which we are a founding member). Or maybe on
starship?
putting-off-the-bookwork ly,
Mark.
From T.A.Meyer at massey.ac.nz Wed Aug 13 17:13:01 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 13 00:13:38 2003
Subject: [spambayes-dev] www.python.org/sf/sb/12345?
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D04@its-xchg4.massey.ac.nz>
> I really like the www.python.org/sf/xxxxx cgi script - is
> there any hope of getting one for us?
Definitely +1. I've thought this many times, too.
=Tony Meyer
From skip at pobox.com Wed Aug 13 00:39:15 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 13 00:39:28 2003
Subject: [spambayes-dev] www.python.org/sf/sb/12345?
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D04@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D04@its-xchg4.massey.ac.nz>
Message-ID: <16185.49395.656633.177189@montanaro.dyndns.org>
>> I really like the www.python.org/sf/xxxxx cgi script - is there any
>> hope of getting one for us?
Tony> Definitely +1. I've thought this many times, too.
Okay, I have it running at
http://staging.musi-cal/com/cgi-bin/sf
Usage is the same as the one for the Python project. If you fail to
give an id it prompts for one.
Where can this go? Does SF support the ability to run CGI scripts?
Skip
From skip at pobox.com Wed Aug 13 00:42:13 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 13 00:42:27 2003
Subject: [spambayes-dev] www.python.org/sf/sb/12345?
In-Reply-To: <062101c36150$e9224100$f502a8c0@eden>
References: <062101c36150$e9224100$f502a8c0@eden>
Message-ID: <16185.49573.966870.718177@montanaro.dyndns.org>
Mark> I really like the www.python.org/sf/xxxxx cgi script - is there
Mark> any hope of getting one for us? If it is hard for our sf based
Mark> site, can we beg, borrow or steal space on python.org? Maybe that
Mark> could be a benefit of being a "PSF sponsoring project" (of which
Mark> we are a founding member). Or maybe on starship?
I can tweak the version I have on the Musi-Cal staging server to not collide
with the Python project's use, then see if one script can serve both
projects (default the groupid to 5470). Alternatively, the SB version can
run as "sb" instead of "sf". Either way is fine with me.
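A sketch of what "one script serving both projects" might look like: a
function that builds the redirect target from an item id and a group id,
defaulting to the 5470 mentioned above. The tracker URL layout and parameter
names are assumptions for illustration, not taken from the actual script, and
99999 is a made-up second group id.

```python
from urllib.parse import urlencode

DEFAULT_GROUP_ID = 5470  # the default group id mentioned in this message


def tracker_url(item_id, group_id=DEFAULT_GROUP_ID):
    """Build a redirect target for a tracker item.

    The base URL and the query parameter names are assumed, not confirmed.
    """
    query = urlencode({"func": "detail", "aid": item_id, "group_id": group_id})
    return "http://sourceforge.net/tracker/index.php?" + query


if __name__ == "__main__":
    print(tracker_url(12345))
    print(tracker_url(12345, group_id=99999))  # hypothetical second project
```

Keeping the group id as a parameter lets the same code run under either the
"sf" or the "sb" name.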
Skip
From T.A.Meyer at massey.ac.nz Wed Aug 13 19:53:32 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 13 02:54:08 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7A@its-xchg4.massey.ac.nz>
> I think the long answer would be more along the lines of this:
>
> "SpamBayes wasn't designed for that, while POPFile was.
> However, like any two-state filter, a series of SpamBayes
> instances could be cascaded to filter into any number of
> categories, although each would need to be trained. The
> result might be very effective, but not efficient."
That is a much better answer :)
> Telling anyone to "read through the archives" is cruel, given
> the volume here.
I should have said "search through the archives", not read. Googling
for 'site:mail.python.org spambayes "n-way"' brings up lots of posts,
which are probably the relevant ones, or at least a starting point.
(Although the change from "pipermail-21" to "pipermail" breaks almost
all the links, it's fixable by hand).
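The by-hand fix amounts to a one-line substitution. A small sketch follows;
the function name is made up, and the old-style URL in the example is assumed
to differ only in the "pipermail-21" path component:

```python
def fix_archive_link(url):
    """Rewrite an old "pipermail-21" archive link to the "pipermail" form."""
    # Replace only the first occurrence, which is the path component.
    return url.replace("/pipermail-21/", "/pipermail/", 1)


if __name__ == "__main__":
    old = "http://mail.python.org/pipermail-21/spambayes/2002-November/001912.html"
    print(fix_archive_link(old))
```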
=Tony Meyer
From T.A.Meyer at massey.ac.nz Wed Aug 13 19:57:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 13 02:58:34 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7B@its-xchg4.massey.ac.nz>
> I added a question and answer about this topic to the FAQ
> today. I also added a toy app to the contrib directory
> (nway.py) modelled after hammiefilter.py.
Any reason why the FAQ doesn't mention that script? BTW, did you try
the script out? If so, how did it go, at first glance?
=Tony Meyer
From T.A.Meyer at massey.ac.nz Wed Aug 13 21:59:45 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 13 05:00:25 2003
Subject: [spambayes-dev] New Outlook Dialogs Problem
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz>
If I haven't got enough training information to enable filtering, the
"enable filtering" box isn't greyed out anymore. If I try to check it,
it doesn't check, and I get this traceback:
Traceback (most recent call last):
File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
286, in OnCommand
self.ApplyHandlingOptionValueError(handler.OnCommand, wparam,
lparam)
File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
245, in ApplyHandlingOptionValueError
self.dialog_def.caption, mb_flags)
AttributeError: ProcessorDialog instance has no attribute 'dialog_def'
(I'd try and fix this myself, but the new dialog code still has me
overwhelmed ;)
=Tony Meyer
From barry at python.org Wed Aug 13 12:47:18 2003
From: barry at python.org (Barry Warsaw)
Date: Wed Aug 13 07:47:19 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7A@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7A@its-xchg4.massey.ac.nz>
Message-ID: <1060775205.14202.8.camel@anthem>
On Wed, 2003-08-13 at 02:53, Meyer, Tony wrote:
> I should have said "search through the archives", not read. Googling
> for 'site:mail.python.org spambayes "n-way"' brings up lots of posts,
> which are probably the relevant ones, or at least a starting point.
> (Although the change from "pipermail-21" to "pipermail" breaks almost
> all the links, it's fixable by hand).
If "by hand" you mean one of mine that participated in adding a redirect
in mail.python.org's Apache config... you were right! Or left. Or,
well let's give both my hands a hand. :)
-Barry
From edrubins at andisplace.com Wed Aug 13 10:01:43 2003
From: edrubins at andisplace.com (Ed Rubinsky)
Date: Wed Aug 13 09:02:01 2003
Subject: [spambayes-dev] FAQ entry for Eudora clients
Message-ID: <5.1.0.14.0.20030813085310.00b118b0@localhost>
The attached diff will add an entry on configuring Eudora mail clients for
use with pop3proxy.py, as well as a few spelling corrections.
Best, Ed
-------------- next part --------------
*** faq.txt Wed Aug 13 07:48:48 2003
--- orig_faq.txt Wed Aug 13 07:10:22 2003
***************
*** 71,77 ****
the `I'm not a programmer but still want to help`_ question for more
details.
! * Donate money to the Python Software Foundations. For more
information, including why you would want to donate to the PSF,
please see our `donations page`_.
--- 71,77 ----
the `I'm not a programmer but still want to help`_ question for more
details.
! * Dontate money to the Python Software Foundations. For more
information, including why you would want to donate to the PSF,
please see our `donations page`_.
***************
*** 92,98 ****
Spambayes or help other users.
2. The `Spambayes developers list`_ provides a forum for people
! maintaining and improving the package.
3. The `Spambayes announcements list`_ is a low-volume list where
announcements about new releases are posted.
--- 92,98 ----
Spambayes or help other users.
2. The `Spambayes developers list`_ provides a forum for people
! maintaining and improving the pacakge.
3. The `Spambayes announcements list`_ is a low-volume list where
announcements about new releases are posted.
***************
*** 344,415 ****
.. _IMAP: http://spambayes.sf.net/applications.html#imap
- How do I configure Eudora for use with Spambayes?
- -------------------------------------------------
-
- Note: The following instructions have been verified using Eudora 5.1
- under Windows. If anyone is using Eudora under Max OS please let us
- know if the configuration is the same as Windows.
-
- Eudora does not allow configuring the server port through the
- normaloptions dialogue. However a large number of options are exposed
- in an intitialization file (eudora.ini) read at startup. The contents
- of the initialization file are documented by clicking on Help->Topics
- and searching on EUDORA.INI (you may want to print this help page for
- future reference.) Depending on how you installed Eudora, eudora.ini
- is located either in the Eudora install directory or the user's
- setting directory
- (C:\Documents and Settings\userid\ApplicationData\Qualcomm\Eudora\eudora.ini on my system.)
-
- 1. Locate eudora.ini.
-
- 2. Make two copies - eudoraok.ini for backup and eudorame.ini to
- modify.
-
- 3. Configure pop3proxy for each of Eudora's personalities' POP3
- servers, specifying a separate port for each. For example 1110, 1120,
- 1130 and 1140 for four personalities. Do the same for smtpproxy - for
- example 1115, 1125, 1135 and 1145 corresponding to the four POP3
- servers.
-
- 4. Close Eudora.
-
- 5. Open eudorame.ini with a text editor - wordpad for example. DO NOT
- USE A WORD PROCESSOR TO EDIT THE INITIALIZATION FILE.
-
- 6. Ffind the section starting with [Settings]. This contains settings
- for the dominant personality.
-
- 7. Find the line beggining POPAccount. The last part of the account
- name starting with @ is the server. Change it to @localhost.
-
- 8. Find the lines beggining SMTPServer and POPServer. They will have
- the server names defined for your dominant personality.
-
- 9. Change both server names to localhost
-
- 10, Add the following two lines. Use whatever ports you assigned to
- pop3proxy and smtpproxy for the dominant personality.
- POPPort=1110
- SMTPPort=1115
-
- 11. Setting for other personalities are kept in sections begging with
- [Persona-personality_name]. For each personality make the same changes
- as you made for the dominant personality, substituting the proper port
- numbers.
-
- 12. Copy eudorame.ini to eudora.ini and re-start Eudora.
-
- 13. In the password dialog for each personality you should see
- localhost where you used to see the actual server name. This may take
- some getting used to at first. Since every personality will now have a
- server named localhost you will have to know what order Eudora prompts
- for the user id's and passwords.
-
- 14. If there are any problems, close Eudora, copy eudoraok.ini to
- eudora.ini and restart Eudora. This will restore Eudora's original
- configuration until the problem can be resolved.
-
Outlook Plugin
==============
--- 344,349 ----
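The hand edits in steps 6 through 10 of the Eudora entry could in principle
be scripted. The sketch below is only an illustration. It assumes eudora.ini
is a plain INI file that Python's configparser can read, which may not hold
for a real file (duplicate keys, for example, would be rejected). The
function name is made up, and the key names are the ones given in the steps
above. Work on a copy, as the entry advises.

```python
import configparser
import io


def point_at_proxy(ini_text, pop_port, smtp_port, section="Settings"):
    """Return ini_text with one personality redirected to local proxies."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # preserve the case of key names
    parser.read_string(ini_text)

    # Keep the user part of the account, swap the server for localhost.
    user = parser.get(section, "POPAccount").split("@", 1)[0]
    parser.set(section, "POPAccount", user + "@localhost")
    parser.set(section, "POPServer", "localhost")
    parser.set(section, "SMTPServer", "localhost")
    parser.set(section, "POPPort", str(pop_port))
    parser.set(section, "SMTPPort", str(smtp_port))

    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "[Settings]\n"
        "POPAccount=me@mail.example.com\n"
        "POPServer=mail.example.com\n"
        "SMTPServer=smtp.example.com\n"
    )
    print(point_at_proxy(sample, 1110, 1115))
```

For other personalities the same call would be repeated with the matching
[Persona-...] section name and that personality's port numbers.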
From skip at pobox.com Wed Aug 13 10:36:42 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 13 10:36:54 2003
Subject: [spambayes-dev] missized fonts
Message-ID: <16186.19706.77495.119643@montanaro.dyndns.org>
I'm adding Ed Rubinsky's Eudora configuration q&a to the FAQ. I used
``...`` around some text I intended to be displayed in a fixed-width font.
All the text is rendered in my browser (Safari, Mac OS X) in Courier,
however all such text is about half the height of the surrounding regular
text. Here's how one little snippet is defined in HTML:
eudoraok.ini
Something needs tweaking in style.css, but I know next to nothing about it.
I added
TT.literal {
    font-size: 12pt;
}
to style.css, because it appeared that the main text is supposed to be
rendered at that size. The fixed-width text grew, but is still smaller than
the surrounding text.
I checked in a modified style.css and the updated FAQ, but can someone with
the proper CSS-fu tweak style.css appropriately?
Thanks,
Skip
From skip at pobox.com Wed Aug 13 10:37:05 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 13 10:37:15 2003
Subject: [spambayes-dev] FAQ entry for Eudora clients
In-Reply-To: <5.1.0.14.0.20030813085310.00b118b0@localhost>
References: <5.1.0.14.0.20030813085310.00b118b0@localhost>
Message-ID: <16186.19729.129989.21182@montanaro.dyndns.org>
Ed> The attached diff will add an entry on configuring Eudora mail
Ed> clients for use with pop3proxy.py, as well as a few spelling
Ed> corrections
Got it, thanks.
Skip
From kennypitt at hotmail.com Wed Aug 13 11:37:57 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Aug 13 10:38:09 2003
Subject: [spambayes-dev] New Outlook Dialogs Problem
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz>
Message-ID: <3F3A4D45.2030304@hotmail.com>
Meyer, Tony wrote:
> If I haven't got enough training information to enable filtering, the
> "enable filtering" box isn't greyed out anymore. If I try to check it,
> it doesn't check, and I get this traceback:
>
> Traceback (most recent call last):
> File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
> 286, in OnCommand
> self.ApplyHandlingOptionValueError(handler.OnCommand, wparam,
> lparam)
> File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
> 245, in ApplyHandlingOptionValueError
> self.dialog_def.caption, mb_flags)
> AttributeError: ProcessorDialog instance has no attribute 'dialog_def'
>
> (I'd try and fix this myself, but the new dialog code still has me
> overwhelmed ;)
>
Here's a first stab at a fix. Maybe Mark can clean this up and plug any
additional holes that I didn't notice.
Index: dlgcore.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/dlgcore.py,v
retrieving revision 1.2
diff -u -r1.2 dlgcore.py
--- dlgcore.py 10 Aug 2003 07:26:49 -0000 1.2
+++ dlgcore.py 13 Aug 2003 14:34:54 -0000
@@ -86,6 +86,9 @@
def DoModal(self):
return self._DoCreate(win32gui.DialogBoxIndirect)
+
+ def GetCaption(self):
+ return win32gui.GetWindowText(self.hwnd)
def GetMessageMap(self):
ret = {
@@ -242,7 +245,7 @@
except ValueError, why:
mb_flags = win32con.MB_ICONEXCLAMATION | win32con.MB_OK
win32gui.MessageBox(self.hwnd, str(why),
- self.dialog_def.caption, mb_flags)
+ self.GetCaption(), mb_flags)
return False
def SaveAllControls(self):
Index: dialog_map.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/Outlook2000/dialogs/dialog_map.py,v
retrieving revision 1.2
diff -u -r1.2 dialog_map.py
--- dialog_map.py 10 Aug 2003 07:26:49 -0000 1.2
+++ dialog_map.py 13 Aug 2003 14:34:46 -0000
@@ -35,6 +35,14 @@
0, db_status)
class FilterEnableProcessor(BoolButtonProcessor):
+ def OnOptionChanged(self, option):
+ self.Init()
+
+ def Init(self):
+ BoolButtonProcessor.Init(self)
+ reason = self.window.manager.GetDisabledReason()
+ win32gui.EnableWindow(self.GetControl(), reason is None)
+
def UpdateValue_FromControl(self):
check = win32gui.SendMessage(self.GetControl(),
win32con.BM_GETCHECK)
if check:
--
Kenny Pitt
From skip at pobox.com Wed Aug 13 10:54:47 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 13 10:55:02 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7B@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D7B@its-xchg4.massey.ac.nz>
Message-ID: <16186.20791.737601.992031@montanaro.dyndns.org>
>>>>> "Tony" == Tony Meyer writes:
>> I added a question and answer about this topic to the FAQ today. I
>> also added a toy app to the contrib directory (nway.py) modelled
>> after hammiefilter.py.
Tony> Any reason why the FAQ doesn't mention that script? BTW, did you
Tony> try the script out? If so, how did it go, at first glance?
I wrote the FAQ first, then later on decided to give the script a go. I
just updated the faq.
The script seems to work fine given the minimal amount of testing I've done.
I used the multiple mboxtrain.py runs scheme outlined in the docstring to
create five databases besides my usual spam database. The inputs were
existing mailboxes specific to each of the five subjects. I then ran a
small handful of messages from my incoming directory against the nway
script. It seemed to classify the messages properly. One Spambayes-related
message that found its way into my normal mbox was correctly classified as
being Python-related. (It had been sent in private mail, so didn't contain
any of the header flags my procmail recipes look for.)
Skip
From adam.walker at rbwconsulting.com Wed Aug 13 12:07:57 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Wed Aug 13 11:08:12 2003
Subject: [spambayes-dev] New Outlook Dialogs Problem
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz>
Message-ID: <20030813150808.5CC19862BD@plunder.dreamhost.com>
D'oh! I'll take the blame for the exception. I removed dialog_def when I
added a utility to remove dependence on the rc when making a binary.
However, the control not graying out is someone else's ;)
--Adam
> -----Original Message-----
> From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-
> bounces@python.org] On Behalf Of Meyer, Tony
> Sent: Wednesday, August 13, 2003 5:00 AM
> To: spambayes-dev@python.org
> Subject: [spambayes-dev] New Outlook Dialogs Problem
>
> If I haven't got enough training information to enable filtering, the
> "enable filtering" box isn't greyed out anymore. If I try to check it,
> it doesn't check, and I get this traceback:
>
> Traceback (most recent call last):
> File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
> 286, in OnCommand
> self.ApplyHandlingOptionValueError(handler.OnCommand, wparam,
> lparam)
> File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
> 245, in ApplyHandlingOptionValueError
> self.dialog_def.caption, mb_flags)
> AttributeError: ProcessorDialog instance has no attribute 'dialog_def'
>
> (I'd try and fix this myself, but the new dialog code still has me
> overwhelmed ;)
>
> =Tony Meyer
>
From kennypitt at hotmail.com Wed Aug 13 12:39:46 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Wed Aug 13 11:40:28 2003
Subject: [spambayes-dev] missized fonts
In-Reply-To: <16186.19706.77495.119643@montanaro.dyndns.org>
References: <16186.19706.77495.119643@montanaro.dyndns.org>
Message-ID: <3F3A5BC2.7060605@hotmail.com>
Skip Montanaro wrote:
> I'm adding Ed Rubinsky's Eudora configuration q&a to the FAQ. I used
> ``...`` around some text I intended to be displayed in a fixed-width font.
> All the text is rendered in my browser (Safari, Mac OS X) in Courier,
> however all such text is about half the height of the surrounding regular
> text. Here's how one little snippet is defined in HTML:
>
> eudoraok.ini
>
> Something needs tweaking in style.css, but I know next to nothing about it.
> I added
>
> TT.literal {
> font-size: 12pt;
> }
>
> to style.css, because it appeared that the main text is supposed to be
> rendered at that size. The fixed-width text grew, but is still smaller than
> the surrounding text.
>
> I checked in a modified style.css and the updated FAQ, but can someone with
> the proper CSS-fu tweak style.css appropriately?
>
Courier and Courier New render smaller at the same point size than
Arial, Verdana, etc. Try something like:
TT.literal {
    font-size: 110%;
}
Tweak the percentage until it looks right to you, and then just *hope*
that it renders the same for everyone else. ;-)
--
Kenny Pitt
From T.A.Meyer at massey.ac.nz Thu Aug 14 12:38:41 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 13 19:39:16 2003
Subject: [spambayes-dev] Correction and new question to the FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3EF9@its-xchg4.massey.ac.nz>
> > (Although the change from "pipermail-21" to "pipermail"
> > breaks almost all the links, it's fixable by hand).
>
> If "by hand" you mean one of mine that participated in adding
> a redirect in mail.python.org's Apache config... you were right!
> Or left. Or, well let's give both my hands a hand. :)
Thanks for that - it makes googling through the archives much simpler
again.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Thu Aug 14 13:24:01 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 13 20:24:46 2003
Subject: [spambayes-dev] missized fonts
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3F46@its-xchg4.massey.ac.nz>
> The fixed-width text grew, but is still smaller
> than the surrounding text.
This is what it looks like here (Windows, IE6, Opera, and Mozilla):
Is it smaller for you (mac<->windows font sizes have always been
problematic)? I like that the pre text is slightly smaller than the
surrounding text - it helps it stand out. It's definitely fixed at 12,
anyway, since if I shrink the page (ctrl-scroll wheeling down, whatever
that does...presumably makes everything "smaller" in css terms), it
looks like:
=Tony Meyer
From skip at pobox.com Wed Aug 13 21:48:57 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 13 21:49:13 2003
Subject: [spambayes-dev] missized fonts
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3F46@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3F46@its-xchg4.massey.ac.nz>
Message-ID: <16186.60041.239700.150794@montanaro.dyndns.org>
>> The fixed-width text grew, but is still smaller than the surrounding
>> text.
Tony> This is what it looks like here (Windows, IE6, Opera, and Mozilla):
Tony>
Yeah, about like that. Someone else (sorry, I forgot who already) indicated
that Courier fonts tend to be smaller than their Verdana, Arial, etc.
counterparts for the same point size.
Tony> ... It's definitely fixed at 12, anyway, since if I shrink the
Tony> page ...
Well, that's probably helped by the fact that I defined it that way. I was
just copying the setting for the body:
BODY { background: white;
color: #484848;
margin-right: 15%;
font-family: geneva, verdana, arial, "ms sans serif", sans-serif;
font-size: 12pt;
}
...
TT.literal {
font-size: 12pt;
}
Should it have been "font-size: 100%"? Someone suggested that, right?
Skip
From kiko at async.com.br Thu Aug 14 00:35:15 2003
From: kiko at async.com.br (Christian Reis)
Date: Wed Aug 13 22:36:33 2003
Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question
(or possibly a bug report)
In-Reply-To:
References: <020901c35236$e5576f10$f502a8c0@eden>
Message-ID: <20030814023515.GO3095@async.com.br>
On Fri, Jul 25, 2003 at 07:25:48AM +0200, Martin v. Löwis wrote:
> "Mark Hammond" writes:
>
> > The "best" solution to this probably involves removing Python being
> > dependent on the locale - there is even an existing patch for that.
>
> While the feature is desirable, I don't like the patch at all. It
> copies the relevant code of Gnome glib, and I
> a) doubt it works on all systems we care about, and
I'm sorry you don't like the patch, but if there's something that can be
fixed, we will fix it :-)
Well, glib is known to be quite portable, and we would make sure that it
does run on the supported platforms before considering checking it in.
(I'm betting it does.)
> b) is too much code for us to maintain, and
It's not *that* much code, and we can rely on fixes that are produced to
glib being easily ported to us -- we get free maintenance of the code
if we choose to do so, actually.
> c) introduces yet another license (although the true authors
> of that code would be willing to relicense it)
Which means that c) is a non-issue?
> It would be better if system functions could be found for a
> locale-agnostic atof/strtod on all systems. For example, glibc
> has a strtod_l function, which expects a locale_t in addition
> to the char*.
Yes, but if all we were worried about was glibc, then point a) would be
a non-issue too. I imagine it's easier to make sure the code we *have*
runs on multiple platforms than trying to find and call code that *may*
exist on each given platform.
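
A minimal sketch of what "locale-agnostic" parsing means in practice, assuming a current CPython where float() no longer consults LC_NUMERIC. The function name and its parameters are hypothetical and are not taken from the patch under discussion:

```python
def parse_float_any_locale(text, decimal_point=".", thousands_sep=""):
    """Parse text written with the given separators, ignoring the process locale.

    Hypothetical helper for illustration only; it is not the g_ascii_* code.
    """
    if thousands_sep:
        # Drop grouping characters first so they cannot be mistaken
        # for the decimal point after the next step.
        text = text.replace(thousands_sep, "")
    if decimal_point != ".":
        text = text.replace(decimal_point, ".")
    # Assumption: float() always expects "." as the decimal point.
    return float(text)


print(parse_float_any_locale("3.25"))
print(parse_float_any_locale("1.234,5", decimal_point=",", thousands_sep="."))
```

The caller states the separators explicitly, so the result does not change when some other library calls setlocale() behind Python's back.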
> It would be good if something similar was discovered for VC. Using
> undocumented or straight Win32 API functions would be fine.
> Unfortunately, the "true" source of atof (i.e. from conv.obj) is not
> shipped with MSVC :-(
I don't understand this bit. You'd rather use an undocumented API
function than an open source, well-tested, properly licensed set of
functions?
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
From kiko at async.com.br Thu Aug 14 00:38:49 2003
From: kiko at async.com.br (Christian Reis)
Date: Wed Aug 13 22:39:01 2003
Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question
(or possibly a bug report)
In-Reply-To:
References:
Message-ID: <20030814023849.GP3095@async.com.br>
On Fri, Jul 25, 2003 at 03:13:46AM -0400, Tim Peters wrote:
> [martin@v.loewis.de]
> > While the feature is desirable, I don't like the patch at all. It
> > copies the relevant code of Gnome glib, and I
> > a) doubt it works on all systems we care about, and
> > b) is too much code for us to maintain, and
> > c) introduces yet another license (although the true authors
> > of that code would be willing to relicense it)
>
> OTOH, even assuming "C" locale, Python's float<->string story varies across
> platforms anyway, due to different C libraries treating things like
> infinities, NaNs, signed zeroes, and the number of digits displayed in an
> exponent differently. This also has bad consequences, although one-platform
> programmers usually don't notice them (Windows programmers do more than
> most, because MS's C library can't read back the strings it produces for
> NaNs and infinities -- which Python also produces and can't read back in
> then).
>
> So it's not that the patch is too much code to maintain, it's not enough
> code to do the whole job <0.9 wink>.
My question now is whether we would be able to cobble something even more
magical into the g_ascii_* functions that makes Python more robust to
these changes (over time)?
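
A rough way to check the round-trip property Tim describes is sketched below. It assumes a current CPython, where float() accepts the strings repr() produces for infinities and NaNs; on the 2003-era platforms discussed here the results could differ. Both helper names are made up for this example:

```python
import math


def same_float(a, b):
    """True if a and b are the same value, treating NaN as equal to NaN
    and distinguishing -0.0 from 0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def survives_round_trip(x):
    """Check that converting x to a string and back gives x again."""
    return same_float(x, float(repr(x)))


for value in (float("inf"), float("-inf"), float("nan"), -0.0, 0.1):
    print(repr(value), survives_round_trip(value))
```

A plain == comparison would not do here, because NaN never compares equal to itself and 0.0 == -0.0 hides a lost sign.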
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
From mkoehler at stblaw.com Wed Aug 13 23:56:25 2003
From: mkoehler at stblaw.com (Koehler, Michael W)
Date: Wed Aug 13 22:56:36 2003
Subject: [spambayes-dev] Addition to Outlook FAQ
Message-ID:
Thanks to all for a great product!
Things have been working beautifully, then the other day my Inbox was once again filled with spam. My first thought was that SpamBayes had stopped filtering. Everything seemed to be working correctly, the spam just would not move to the Spam folder. After much fiddling I realized the obvious... My Spam folder was full. I did not bother to count, but deleting a few hundred got SpamBayes going again. I now auto-archive my spam folder. The limit seems to be 16,384. See http://support.microsoft.com/default.aspx?scid=kb;[LN];Q196494
From T.A.Meyer at massey.ac.nz Thu Aug 14 15:59:25 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Wed Aug 13 23:00:06 2003
Subject: [spambayes-dev] Addition to Outlook FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D91F08@its-xchg4.massey.ac.nz>
> Everything seemed to be working correctly, the
> spam just would not move to the Spam folder.
> After much fiddling I realized the obvious...
> My Spam folder was full. I did not bother to
> count, but deleting a few hundred got SpamBayes
> going again. I now auto-archive my spam folder.
> The limit seems to be 16,384.
Interestingly, Mark just checked in a comment about this (maybe someone
else ran into this and reported it to him).
Mark - do we add this to the FAQ, or do you want to put in some sort of
dialog to pop up when this happens, explaining the problem?
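
If a warning were added, the check behind it could be as small as the sketch below. The 16,384 figure is only the limit reported in this thread, and the function name, margin, and message wording are all assumptions made for illustration; nothing here is existing SpamBayes code:

```python
# Figure reported in this thread; treat it as approximate.
FOLDER_ITEM_LIMIT = 16384


def folder_warning(item_count, limit=FOLDER_ITEM_LIMIT, margin=500):
    """Return a warning string if the folder is full or nearly full, else None.

    Hypothetical helper: the caller is assumed to supply the item count.
    """
    if item_count >= limit:
        return "Spam folder is full (%d items); new spam cannot be moved." % item_count
    if item_count >= limit - margin:
        return "Spam folder is nearly full (%d of %d items)." % (item_count, limit)
    return None


print(folder_warning(120))
print(folder_warning(16000))
print(folder_warning(16384))
```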
=Tony Meyer
From guido at python.org Wed Aug 13 21:34:16 2003
From: guido at python.org (Guido van Rossum)
Date: Wed Aug 13 23:35:22 2003
Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question
(or possibly a bug report)
In-Reply-To: Your message of "Wed, 13 Aug 2003 23:35:15 -0300."
<20030814023515.GO3095@async.com.br>
References: <020901c35236$e5576f10$f502a8c0@eden>
<20030814023515.GO3095@async.com.br>
Message-ID: <200308140334.h7E3YGa02755@12-236-84-31.client.attbi.com>
> I don't understand this bit. You'd rather use an undocumented API
> function than an open source, well-tested, properly licensed set of
> functions?
I don't know what the exact requirements of this license are, but I
assure you that redistributing code that is not under the PSF license
is a pain, even if it's an open source license. If we can get the
original authors to contribute the code to the PSF without the
requirement to include a license of any kind (beyond the PSF license)
in redistributions, either by the PSF or downstream, even if those
redistributions are commercial or contain proprietary code in addition
to open source code. This is what's possible with the PSF license,
and that needs to remain the case. In particular, the GPL is *not*
acceptable for this purpose.
--Guido van Rossum (home page: http://www.python.org/~guido/)
From kiko at async.com.br Thu Aug 14 01:48:03 2003
From: kiko at async.com.br (Christian Reis)
Date: Wed Aug 13 23:48:37 2003
Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question
(or possibly a bug report)
In-Reply-To: <200308140334.h7E3YGa02755@12-236-84-31.client.attbi.com>
References: <020901c35236$e5576f10$f502a8c0@eden>
<20030814023515.GO3095@async.com.br>
<200308140334.h7E3YGa02755@12-236-84-31.client.attbi.com>
Message-ID: <20030814034803.GB5693@async.com.br>
On Wed, Aug 13, 2003 at 08:34:16PM -0700, Guido van Rossum wrote:
> > I don't understand this bit. You'd rather use an undocumented API
> > function than an open source, well-tested, properly licensed set of
> > functions?
>
> I don't know what the exact requirements of this license are, but I
> assure you that redistributing code that is not under the PSF license
> is a pain, even if it's an open source license. If we can get the
> original authors to contribute the code to the PSF without the
> requirement to include a license of any kind (beyond the PSF license)
> in redistributions, either by the PSF or downstream, even if those
> redistributions are commercial or contain proprietary code in addition
> to open source code. This is what's possible with the PSF license,
You omit the predicate that follows this if clause, but I'm hoping you
meant something positive like `we will gladly accept it'.
I'm waiting on Alex's answer on relicensing the code, but he's said on
IRC that he'd be willing to do it, so barring any environmental
disasters, that should be solved sometime soon.
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
From adam.walker at rbwconsulting.com Thu Aug 14 18:58:28 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Thu Aug 14 17:58:50 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <20030814215845.D46BA13E248@sack.dreamhost.com>
I finally got around to installing MS Visual Studio on my machine the other
day and have been hacking around in the dialog stuff. I tried using WEdit
from win32-lcc and gave up.
Linked is a screenshot of the manager dialog with a SpamBayes logo I threw
together. I'll submit a patch after I iron out the details of the other
dialogs.
http://meta.xenogeist.com/images/manager.jpg
Feedback?
--Adam
From mhammond at skippinet.com.au Fri Aug 15 09:58:12 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Aug 14 18:58:00 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <20030814215845.D46BA13E248@sack.dreamhost.com>
Message-ID: <085d01c362b7$8a85af20$f502a8c0@eden>
That is great! From my eye, I would suggest:
* Drop some of the vspace at the top of the logo before the word.
* Drop the "Outlook Addin" part - then we have a generic logo every app can
use.
If you are comfortable with the code changes that will come with it, just
check it in rather than going via a patch (but obviously mail if you need
guidance)
Cool :)
Mark.
> -----Original Message-----
> From: spambayes-dev-bounces@python.org
> [mailto:spambayes-dev-bounces@python.org]On Behalf Of Adam Walker
> Sent: Friday, 15 August 2003 7:58 AM
> To: spambayes-dev@python.org
> Subject: [spambayes-dev] Dialog Hacking
>
>
> I finally got around to installing MS Visual Studio on my
> machine the other
> day and have been hacking around in the dialog stuff. I tried
> using WEdit
> from win32-lcc and gave up.
> Linked is a screenshot of the manager dialog with a SpamBayes
> logo I threw
> together. I'll submit a patch after I iron out the details
> of the other
> dialogs.
>
> http://meta.xenogeist.com/images/manager.jpg
>
> Feedback?
> --Adam
>
>
>
From adam.walker at rbwconsulting.com Thu Aug 14 21:34:14 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Thu Aug 14 20:34:40 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <085d01c362b7$8a85af20$f502a8c0@eden>
Message-ID: <20030815003436.8A28A8627F@plunder.dreamhost.com>
Here's take two ;)
http://meta.xenogeist.com/images/manager2.jpg
It's mainly the gui for the timer delays I'm worried about (and that's
giving me fits. Damn sliders.).
The logo is in Jasc Paint Shop Pro 8 format. Should I check that file in as
well as the bmp?
--Adam
[Mark Hammond]
> -----Original Message-----
>
> That is great! From my eye, I would suggest:
> * Drop some of the vspace at the top of the logo before the word.
> * Drop the "Outlook Addin" part - then we have a generic logo every app
> can use.
>
> If you are comfortable with the code changes that will come with it, just
> check it in rather than going via a patch (but obviously mail if you need
> guidance)
>
> Cool :)
>
> Mark.
From T.A.Meyer at massey.ac.nz Fri Aug 15 13:55:34 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 14 20:56:19 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D921AD@its-xchg4.massey.ac.nz>
> Here's take two ;) http://meta.xenogeist.com/images/manager2.jpg
This is much better. I couldn't really read the "Outlook plugin" bit
because the gradient was too wide, and too dark in parts.
There's still too much (IMO) space between the logo and the rest of the
dialog (the black bit). This could just be a couple of pixels (again,
IMO).
(I would like a full-stop at the end of the explanation of training too,
but that isn't a dialog design issue).
> It's mainly the gui for the timer delays I'm worried about
> (and that's giving me fits. Damn sliders.).
Is this what you are putting in the (returned!) advanced section? Or
are you putting it in the filter dialog somewhere?
I think that the new dialogs should expose the four 'read status'
options as well. I haven't seen anything much in the way of problems
reported about them, and they are often requested.
> The logo is in Jasc Paint Shop Pro 8 format. Should I check
> that file in as well as the bmp?
Yes. There's nothing worse than working with a bitmap image that was
originally a vector image. You could check in some other form of vector
image that isn't PSP only if you'd rather (something that the Gimp or
Photoshop could open), but it doesn't really matter.
=Tony Meyer
[ While you're playing with things, you should put in an easter egg of
sorts that opens up a browser window with python.org if you click on the
Python Powered image... ;) ]
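
For what it's worth, the browser half of such an easter egg is nearly a one-liner with the standard webbrowser module. The handler name and the way it would be wired to the dialog's image are hypothetical; only webbrowser.open is a real call:

```python
import webbrowser

PYTHON_URL = "http://www.python.org/"


def on_logo_clicked(opener=webbrowser.open):
    """Hypothetical click handler: open python.org in the default browser.

    The opener argument exists only so the handler can be exercised
    without actually launching a browser.
    """
    return opener(PYTHON_URL)
```

Hooking this up to a click on the bitmap is the dialog code's job and is not shown here.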
From mhammond at skippinet.com.au Fri Aug 15 12:10:32 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Aug 14 21:10:14 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <08ba01c362ca$0785a220$f502a8c0@eden>
Oops - I didn't see the spambayes-dev CC! I just sent this to Adam:
> It's mainly the gui for the timer delays I'm worried about (and that's
> giving me fits. Damn sliders.).
Are my "processors" making sense? Note I was considering dropping these
values down to integer seconds for the "real" version - so if that makes
stuff easier that is fine (we *will* move the option names for this stuff,
so we can change it. We may migrate values, but obviously can handle that
too). I really don't think we should expose "ms" in the UI - at the worst,
I think it should be fractions of a second (eg, slider has a 1, 1.5, 2)
sequence. What do you think?
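
The slider idea above can be sketched as follows, assuming the option is still stored in milliseconds and the slider moves in half-second steps starting at one second. The function names, the starting value, and the step size are illustrative only:

```python
def slider_to_seconds(position, start=1.0, step=0.5):
    """Map a slider position (0, 1, 2, ...) to seconds: 1, 1.5, 2, ..."""
    return start + position * step


def seconds_to_ms(seconds):
    """Convert the displayed value to the millisecond value being stored."""
    return int(round(seconds * 1000))


def ms_to_slider(ms, start=1.0, step=0.5):
    """Find the nearest slider position for a stored millisecond value."""
    position = int(round((ms / 1000.0 - start) / step))
    return max(position, 0)  # clamp values shorter than the first stop


for position in range(3):
    seconds = slider_to_seconds(position)
    print(position, seconds, seconds_to_ms(seconds))
```

Keeping the conversion in one place means the UI never has to show "ms", and existing option values can be migrated by snapping them to the nearest stop.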
Should we be considering using property pages?
> The logo is in Jasc Paint Shop Pro 8 format. Should I check
> that file in as well as the bmp?
Yeah, I guess so. Although maybe not with the psp logo in the Outlook tree.
Maybe something like:
spambayes/logos - new directory - pspro file here
spambayes/Outlook/dialogs/somewhere - bmp + readme here.
spambayes/somewhere/- jpg used by pop3proxy
Not sure about "logos" - maybe a name more general purpose - but we already
have the "website" directory, so I can't see, eg, documents ever living
here, so I am back to "logos" :)
Maybe take this one back to spambayes-dev, and just check the .bmp in
Outlook wherever it makes sense, following up with the "source file" later.
Remember to add this stuff with "-kb" so they are flagged as binary.
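
A small sketch of a helper that picks the flag automatically, for anyone scripting the adds. The extension list and the function name are assumptions; "cvs add -kb" itself is the standard way to mark a file as binary:

```python
import os

# Illustrative list only; extend it to match the files actually checked in.
BINARY_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".psd"}


def cvs_add_command(path):
    """Build the argument list for 'cvs add', using -kb for binary files."""
    extension = os.path.splitext(path)[1].lower()
    command = ["cvs", "add"]
    if extension in BINARY_EXTENSIONS:
        # -kb turns off keyword expansion and line-ending conversion.
        command.append("-kb")
    command.append(path)
    return command


print(cvs_add_command("logo.bmp"))
print(cvs_add_command("readme.txt"))
```

The list returned can be handed to subprocess.call() as is; the sketch stops short of running cvs so that it can be tried anywhere.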
Thanks,
Mark.
> --Adam
>
> [Mark Hammond]
> > -----Original Message-----
> >
> > That is great! From my eye, I would suggest:
> > * Drop some of the vspace at the top of the logo before the word.
> > * Drop the "Outlook Addin" part - then we have a generic logo every app
> > can use.
> >
> > If you are comfortable with the code changes that will come with it, just
> > check it in rather than going via a patch (but obviously mail if you need
> > guidance)
> >
> > Cool :)
> >
> > Mark.
>
>
From mhammond at skippinet.com.au Fri Aug 15 12:16:22 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Aug 14 21:16:11 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D921AD@its-xchg4.massey.ac.nz>
Message-ID: <08c101c362ca$d7a77e60$f502a8c0@eden>
> I think that the new dialogs should expose the four 'read status'
> options as well. I haven't seen anything much in the way of problems
> reported about them, and they are often requested.
The simple way to start here is in the "Filter" dialog, and a simple
checkbox in both the "Spam" and "Unsure" sections saying something like "and
mark the message as read". This would handle the common cases. I'm worried
about overwhelming the dialogs.
> [ While you're playing with things, you should put in an easter egg of
> sorts that opens up a browser window with python.org if you click on the
> Python Powered image... ;) ]
yeah yeah - let's make an easter egg - i've never done that :) Sounds like
more fun than sorting bug dupes :)
Mark.
From adam.walker at rbwconsulting.com Thu Aug 14 22:43:45 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Thu Aug 14 21:44:19 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D921AD@its-xchg4.massey.ac.nz>
Message-ID: <20030815014410.5CDB513E240@sack.dreamhost.com>
> There's still too much (IMO) space between the logo and the rest of the
> dialog (the black bit). This could just be a couple of pixels (again,
> IMO).
Done.
>
> (I would like a full-stop at the end of the explanation of training too,
> but that isn't a dialog design issue).
Full-stop. Period. Dot. Whatever, it's there now ;)
http://meta.xenogeist.com/images/manager3.jpg
>
> > It's mainly the gui for the timer delays I'm worried about
> > (and that's giving me fits. Damn sliders.).
>
> Is this what you are putting in the (returned!) advanced section? Or
> are you putting it in the filter dialog somewhere?
Yep. The timers and verbose logging options are going under "Advanced".
>
> I think that the new dialogs should expose the four 'read status'
> options as well. I haven't seen anything much in the way of problems
> reported about them, and they are often requested.
I put two of them in the filter dialog. I missed the other two (in the
general section) until you said that.
>
> > The logo is in Jasc Paint Shop Pro 8 format. Should I check
> > that file in as well as the bmp?
>
> Yes. There's nothing worse than working with a bitmap image that was
> originally a vector image. You could check in some other form of vector
> image that isn't PSP only if you'd rather (something that the Gimp or
> Photoshop could open), but it doesn't really matter.
I've switched to photoshop (psd) format as all three programs can read psd
files. Or at least photoshop version 3 files. BTW, I think you mean layered
not vector. Last time I tried, neither photoshop nor the gimp support vector
graphics, but paintshop does to some extent.
--Adam
From adam.walker at rbwconsulting.com Thu Aug 14 22:47:26 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Thu Aug 14 21:47:52 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <08ba01c362ca$0785a220$f502a8c0@eden>
Message-ID: <20030815014748.50F1113E213@sack.dreamhost.com>
> spambayes/logos - new directory - pspro file here
> spambayes/Outlook/dialogs/somewhere - bmp + readme here.
> spambayes/somewhere/- jpg used by pop3proxy
>
> Not sure about "logos" - maybe a name more general purpose - but we
> already
> have the "website" directory, so I can't see, eg, documents ever living
> here, so I am back to "logos" :)
Here's some options...
*graphics
*gfx
*images
*resources (we already have spambayes/spambayes/resources and
spambayes/outlook2000/dialogs/resources)
*use the pre-existing contrib directory
From T.A.Meyer at massey.ac.nz Fri Aug 15 15:31:17 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 14 22:32:05 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92241@its-xchg4.massey.ac.nz>
> > Not sure about "logos" - maybe a name more general purpose - but we
> > already have the "website" directory, so I can't see, eg, documents
> > ever living here, so I am back to "logos" :)
>
> Here's some options...
[...]
> *resources (we already have spambayes/spambayes/resources and
> spambayes/outlook2000/dialogs/resources)
> *use the pre-existing contrib directory
If this is something that the other apps (like the web UI) will use,
then I think putting them in spambayes/spambayes/resources makes sense -
that's where the existing (non-Outlook) ones are. Anything that's
*only* for Outlook can go in spambayes/Outlook2000/dialogs/resources.
Resources includes all the files that are neither Python scripts nor
text files (html and images, at the moment). It has both the 'raw'
format (eg jpg) and the format used by the interface (.py via
resourcepackage).
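
To illustrate the general idea of shipping a binary resource as a .py file, here is a generic sketch. It is not resourcepackage's actual output format or API, and both function names are invented for this example:

```python
def resource_module_source(data, name="data"):
    """Return Python source text that defines `name` as the given bytes.

    Generic illustration only; resourcepackage's real format may differ.
    """
    return "%s = %r\n" % (name, data)


def load_resource(source, name="data"):
    """Execute generated source and pull the embedded bytes back out."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


payload = b"BM\x00\x01\xff"  # stand-in for the first bytes of an image file
source = resource_module_source(payload)
print(source)
print(load_resource(source) == payload)
```

The appeal of this approach is that the resource travels with the code and can be imported like any other module, at the cost of keeping the 'raw' file and the generated .py in step.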
Since the plugin already uses modules from spambayes/spambayes, also
getting stuff from sp/sp/resources doesn't seem like a big deal.
The contrib directory is more for 'optional extra' type things, I think
(like Skip's n-way, and the procmail recipe), rather than required
resources like the logo.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Fri Aug 15 15:36:27 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 14 22:37:10 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92247@its-xchg4.massey.ac.nz>
> The simple way to start here is in the "Filter" dialog, and a
> simple checkbox in both the "Spam" and "Unsure" sections
> saying something like "and mark the message as read". This
> would handle the common cases. I'm worried about
> overwhelming the dialogs.
Fair enough. Those are the simpler options anyway, and probably the
more requested ones. Given that the dialogs don't have anything at all
(IIRC) about the delete/recover buttons, there isn't a logical place for
them.
Maybe if the advanced dialog does make a comeback (well, the button
makes a comeback, and the dialog arrives), they could slot into there?
There isn't much else to go there really (at the moment).
> yeah yeah - let's make an easter egg - i've never done that
> :) Sounds like more fun than sorting bug dupes :)
Of course, discussing a hidden feature in a public forum isn't the
cleverest thing ;) You'll have to come up with something else (possibly
as well :) and let us go-a-lookin'.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Fri Aug 15 15:43:45 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 14 22:44:26 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92255@its-xchg4.massey.ac.nz>
[...]
> http://meta.xenogeist.com/images/manager3.jpg
Yes, I like this a lot more now.
> > I think that the new dialogs should expose the four 'read status'
> > options as well.
> I put two of them in the filter dialog. I missed the other
> two (in the general section) until you said that.
See also the message from/to Mark.
> I've switched to photoshop (psd) format as all three programs
> can read psd files. Or at least photoshop version 3 files.
Sounds good.
> BTW, I think you mean layered not vector. Last time I tried,
> neither photoshop nor the gimp support vector graphics, but
> paintshop does to some extent.
Well, I don't know how you've drawn it, but I probably mean layered
vector graphics. Layered as in the text and background are separately
editable, which is important, but even more so that the text is stored
as a vector rather than a bitmap, so that it is easily resizable (which
is the edit that I find myself needing to do most on logo-type images).
Both PSP & Gimp can do this - it's a long, long, time since I've used
Photoshop, but it presumably can too since it's meant to be the king of
such things.
=Tony Meyer
From mhammond at skippinet.com.au Fri Aug 15 13:46:49 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Aug 14 22:46:45 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92241@its-xchg4.massey.ac.nz>
Message-ID: <092701c362d7$7a4e0510$f502a8c0@eden>
> If this is something that the other apps (like the web UI) will use,
> then I think putting them in spambayes/spambayes/resources
> makes sense -
How about "ui_resources" or some such - just "resources" isn't descriptive
enough, whereas "dialogs/resources" makes more sense due to the context.
However, I'm still not sure what would ever go in this directory except
images. "ui_resources" could still imply Python code that manages a
cross-application UI, for example.
So I still lean towards "images" or "logos" simply as it describes exactly
the only things I see being stored there.
> Since the plugin already uses modules from spambayes/spambayes, also
> getting stuff from sp/sp/resources doesn't seem like a big deal.
Nah.
And re the easter eggs:
> Of course, discussing a hidden feature in a public forum
> isn't the cleverest thing ;) You'll have to come up with
> something else (possibly as well :) and let us go-a-lookin'.
There is no way I am going to even *try* to be clever enough to hide it from
this crowd in an open source app
Mark.
From skip at pobox.com Fri Aug 15 01:08:57 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Aug 15 01:09:09 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <20030815003436.8A28A8627F@plunder.dreamhost.com>
References: <085d01c362b7$8a85af20$f502a8c0@eden>
<20030815003436.8A28A8627F@plunder.dreamhost.com>
Message-ID: <16188.27369.234971.796179@montanaro.dyndns.org>
Adam> Here's take two ;)
Not to be a wet blanket, but I'm not really keen on the colors, the
gradients or the very busy background, but maybe that's just me.
Adam> The logo is in Jasc Paint Shop Pro 8 format. Should I check that
Adam> file in as well as the bmp?
I would only check stuff in which can be directly used (jpeg, png, bmp are
okay). I have no idea what tools I might have at my disposal to manipulate
Paint Shop Pro format images.
Skip
From tys at tvg.ca Fri Aug 15 00:04:43 2003
From: tys at tvg.ca (Tys von Gaza)
Date: Fri Aug 15 01:16:18 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <16188.27369.234971.796179@montanaro.dyndns.org>
References: <085d01c362b7$8a85af20$f502a8c0@eden><20030815003436.8A28A8627F@plunder.dreamhost.com>
<16188.27369.234971.796179@montanaro.dyndns.org>
Message-ID: <3860.142.179.244.169.1060923883.squirrel@mail.tvg.ca>
Skip> I would only check stuff in which can be directly used (jpeg, png,
Skip> bmp are okay). I have no idea what tools I might have at my disposal
Skip> to manipulate Paint Shop Pro format images.
Can't do any easy editing to a jpeg, png, bmp though. It would be like
checking in a binary with no source files. The PSP or PSD are like the
source files where it is much easier to make graphical changes, imho it
would be a good idea to include them.
--
Tys von Gaza
tys@tvg.ca
From T.A.Meyer at massey.ac.nz Fri Aug 15 18:15:59 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 15 01:16:45 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92311@its-xchg4.massey.ac.nz>
> Not to be a wet blanket, but I'm not really keen on the
> colors, the gradients or the very busy background, but maybe
> that's just me.
I must admit that I'm not a fan of the busy background either, but then
I'm no expert on logo design...
The colours could perhaps match those on the website, assuming that
there's a reason that the website has those colours...I don't really
care about the gradient.
So this isn't just negative ;) I do like that it uses a simple font, and
the (presence of the) subtle-but-there Python-powered logo.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Fri Aug 15 18:22:12 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 15 01:22:51 2003
Subject: [spambayes-dev] Dialog Hacking
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92315@its-xchg4.massey.ac.nz>
> How about "ui_resources" or some such - just "resources"
> isn't descriptive enough, whereas "dialogs/resources" makes
> more sense due to the context.
It depends if this is for things that are just used by Outlook's
dialogs, or by other things (like the web UI) as well. If it's just for
Outlook, then something like sb/sb/Outlook2000/dialogs/images would be
best, but if it's more general, then sb/sb/resources/images perhaps?
> However, I'm still not sure what would ever go in this
> directory except images.
Well, sb/sb/resources has the web ui's html pages and images (plus the
corresponding resourcepackage .py files).
(Something else that might go in either directory are sound files.
Anyone up for a cute sound to be played when you click "delete as spam"?)
=Tony Meyer
From skip at pobox.com Fri Aug 15 01:23:22 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Aug 15 01:23:36 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <3860.142.179.244.169.1060923883.squirrel@mail.tvg.ca>
References: <085d01c362b7$8a85af20$f502a8c0@eden>
<20030815003436.8A28A8627F@plunder.dreamhost.com>
<16188.27369.234971.796179@montanaro.dyndns.org>
<3860.142.179.244.169.1060923883.squirrel@mail.tvg.ca>
Message-ID: <16188.28234.936527.569022@montanaro.dyndns.org>
>>>>> "Tys" == Tys von Gaza writes:
Skip> I would only check stuff in which can be directly used (jpeg, png,
Skip> bmp are okay). I have no idea what tools I might have at my
Skip> disposal to manipulate Paint Shop Pro format images.
Tys> Can't do any easy editing to a jpeg, png, bmp though. It would be
Tys> like checking in a binary with no source files. The PSP or PSD are
Tys> like the source files where it is much easier to make graphical
Tys> changes, imho it would be a good idea to include them.
What are we checking in, images for websites or layers for people to edit?
It's been awhile since I used Gimp, but it has a native format as well and
has the added benefit of being open source and very widely available. (I
have it on my Mac even.)
Skip
From mhammond at skippinet.com.au Fri Aug 15 17:14:23 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Aug 15 02:14:10 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <16188.27369.234971.796179@montanaro.dyndns.org>
Message-ID: <002101c362f4$79696b40$f502a8c0@eden>
> I would only check stuff in which can be directly used (jpeg,
> png, bmp are
> okay).
I can see the guy's problem though - these really can't be used effectively
as a "source" format. Unfortunately, I don't know enough about the tools or
formats to have a reasonable opinion beyond that though.
So personally, I am happy with whatever it was Tony and Adam agreed on.
Mark.
From mhammond at skippinet.com.au Fri Aug 15 17:21:18 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Aug 15 02:21:04 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <16188.28234.936527.569022@montanaro.dyndns.org>
Message-ID: <002801c362f5$70afe0f0$f502a8c0@eden>
> What are we checking in, images for websites or layers for
> people to edit?
> It's been awhile since I used Gimp, but it has a native
> format as well and
> has the added benefit of being open source and very widely
> available. (I
> have it on my Mac even.)
The "source code" analogy was good. We want to check in the source to the
images (the layers etc) so that future tweaks are reasonable. However, as
the tools are either not freely or not commonly available to convert from
the source to the binary (eg, jpeg for the website), we are also forced to
check in the binaries. This situation is not good, but the only other
reasonable alternative is to check in binaries only - which seems worse to
me.
So the suggestion was to check the "source" into a special/reasonable
directory, and check the "binary" version wherever it makes most sense for
the consumer of the binary - eg, Outlook\dialogs\logo.bmp for Outlook,
website/logo.jpg for the website, etc.
At least, that is what I am talking about
Mark.
From adam.walker at rbwconsulting.com Fri Aug 15 11:56:02 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Fri Aug 15 10:56:24 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <16188.28234.936527.569022@montanaro.dyndns.org>
Message-ID: <20030815145613.23A8013E213@sack.dreamhost.com>
> What are we checking in, images for websites or layers for people to edit?
> It's been awhile since I used Gimp, but it has a native format as well and
> has the added benefit of being open source and very widely available. (I
> have it on my Mac even.)
The problem with using the Gimp's format is the Gimp is the only program
that reads it. The last time I used the Gimp on windows (~3 months ago) it
didn't seem ready for day-in-day-out use as a graphic artist. Photoshop, The
Gimp, and Paint Shop Pro can all read and write psd files -- so it's a happy
medium.
--Adam
From skip at pobox.com Fri Aug 15 11:38:26 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Aug 15 11:38:43 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <20030815145613.23A8013E213@sack.dreamhost.com>
References: <16188.28234.936527.569022@montanaro.dyndns.org>
<20030815145613.23A8013E213@sack.dreamhost.com>
Message-ID: <16188.65138.238776.506389@montanaro.dyndns.org>
Adam> Photoshop, The Gimp, and Paint Shop Pro can all read and write psd
Adam> files -- so it's a happy medium.
That's fine. I was unaware of the problem.
Skip
From adam.walker at rbwconsulting.com Fri Aug 15 12:52:03 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Fri Aug 15 11:52:13 2003
Subject: [spambayes-dev] Dialog Hacking
In-Reply-To: <16188.65138.238776.506389@montanaro.dyndns.org>
Message-ID: <20030815155210.71D2613E23D@sack.dreamhost.com>
Take four http://meta.xenogeist.com/images/manager4.jpg
I tried it with a flat blue background and opted for the gradient from the
website.
--Adam
From skip at pobox.com Fri Aug 15 11:54:27 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Aug 15 11:54:43 2003
Subject: [spambayes-dev] Re: [Spambayes] mboxtrain ham headers overwritten
In-Reply-To: <4793.142.179.244.169.1060929746.squirrel@mail.tvg.ca>
References: <2598.129.128.138.135.1060907963.squirrel@mail.tvg.ca>
<4793.142.179.244.169.1060929746.squirrel@mail.tvg.ca>
Message-ID: <16189.563.364259.921642@montanaro.dyndns.org>
Tys> Ok, found my error, and of course it was stupid but I didn't see it
Tys> documented anywhere, so here it is.
Tys> I had the following set in my ~/.spambayesrc
Tys> [globals]
Tys> verbose=True
Tys> This caused the following lines to be added to the start of each
Tys> e-mail message that got filtered through procmail:
Tys> """
Tys> Loading state from /home/gaza/maildata/bayes.db database
Tys> /home/gaza/maildata/bayes.db is an existing database, with 69 spam and 21 ham
Tys> """
I think those messages should go to stderr. storage.py is littered with
prints to stdout when verbose is set.
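
A minimal sketch of the idea, not the actual storage.py code: diagnostics go
to stderr so that a filter's stdout carries only the message. The names
report() and filter_message() are hypothetical.

```python
import sys


def report(message, verbose=True, stream=None):
    """Write a diagnostic line to stderr (or `stream`), never to stdout."""
    if not verbose:
        return
    # Look up sys.stderr at call time so later redirection is honoured.
    out = sys.stderr if stream is None else stream
    print(message, file=out)


def filter_message(text, verbose=True, stream=None):
    """Hypothetical filter step: logs a status line, returns the text as-is."""
    report("Loading state from bayes.db database", verbose, stream)
    return text
```

With this split, a procmail pipe that captures stdout would see only the
message text, whatever the verbose setting.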
Skip
From neale at woozle.org Fri Aug 15 12:52:52 2003
From: neale at woozle.org (Neale Pickett)
Date: Fri Aug 15 14:52:58 2003
Subject: [spambayes-dev] ["WatchGuard LiveSecurity"] Keep Spam at Bay with
SpamBayes
Message-ID:
I thought you folks would like to see the message that went out last
week to WatchGuard's LiveSecurity subscription service. It goes out to
an audience of 40,000, I'm told. Maybe something in here for
quotes.html?
(Sorry it's all HTML--I'm passing the body along exactly as I received
it.)
-------------- next part --------------
An embedded message was scrubbed...
From: "WatchGuard LiveSecurity"
Subject: LiveSecurity | Keep Spam at Bay with SpamBayes
Date: 8 Aug 2003 11:41:34 -0700
Size: 24202
Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030815/77a03eaf/attachment.eml
From skip at pobox.com Fri Aug 15 15:14:57 2003
From: skip at pobox.com (Skip Montanaro)
Date: Fri Aug 15 15:15:11 2003
Subject: [spambayes-dev] Regarding the WatchGuard article about SpamBayes
Message-ID: <16189.12593.462122.385869@montanaro.dyndns.org>
Neale Pickett sent a copy of your article about SpamBayes to the SpamBayes
developers mailing list. I skimmed it and thought I would send you a little
feedback:
1. You mention that it might not scale well for large organizations. I
assume you stated this because the Outlook plugin must be installed on
the users' computers. I agree installation might be troublesome, however
on the plus side, since the plugin runs on the client machines, it's not
going to burn up your mail servers' cpus. From that perspective it
probably scales better than a server-based solution. It also has the
added functional benefit that unlike most centralized solutions,
SpamBayes allows each user to define what is and is not spam to them.
2. You linked directly to the 006 version of the installer. Note that the
007 version has already been released. You'd be much better off linking
to the Windows page on the SpamBayes website:
http://spambayes.sourceforge.net/windows.html
3. I'm sure most of your readers use Windows & Outlook, but it might be
worth noting that there are a number of other SpamBayes applications
which allow you to integrate the technology into other platforms (Unix,
Mac, etc) or with other mail readers. In particular, there are POP3 and
IMAP proxies (both of which are controlled via a web interface) and a
simple filter which takes a message on stdin, scores it, and writes it to
stdout with the score carried in a new header.
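
To illustrate the stdin/stdout filter in point 3, here is a rough sketch
using only Python's standard email package. It is not the SpamBayes filter
itself: toy_score() is a made-up stand-in for a real classifier, and the
header name X-Example-Spam-Score is an assumption chosen for the example.

```python
import sys
from email import message_from_string


def toy_score(msg):
    """Hypothetical stand-in for a real classifier; returns a value in [0, 1]."""
    subject = (msg["Subject"] or "").lower()
    return 0.99 if "free money" in subject else 0.01


def add_score_header(raw_text, header="X-Example-Spam-Score"):
    """Parse a message, attach its score in a header, return the new text."""
    msg = message_from_string(raw_text)
    del msg[header]  # drop any stale copy; does nothing if the header is absent
    msg[header] = "%.2f" % toy_score(msg)
    return msg.as_string()


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one message on stdin, write the scored message to stdout."""
    stdout.write(add_score_header(stdin.read()))
```

A mail delivery agent could pipe each message through main() and then sort
on the added header.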
--
Skip Montanaro
Got gigs? http://www.musi-cal.com/
Got spam? http://spambayes.sf.net/
From anthony at interlink.com.au Sat Aug 16 18:04:41 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sat Aug 16 03:04:52 2003
Subject: [spambayes-dev] SF rankings...
Message-ID: <200308160704.h7G74g4f016971@localhost.localdomain>
Well, in the week since the outlook addin was moved to SF,
the spambayes project's gone from 97th to 14th in the rankings.
(The 11th of August saw 2,068 downloads!)
(Is it worth changing the project title from
"Bayesian anti-spam classifier" to something more
descriptive?)
2068-times-$0-is-still-$0-sorry-mark,
Anthony
From T.A.Meyer at massey.ac.nz Sat Aug 16 20:12:48 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sat Aug 16 03:13:27 2003
Subject: [spambayes-dev] SF rankings...
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92365@its-xchg4.massey.ac.nz>
> Well, in the week since the outlook addin was moved to SF,
> the spambayes project's gone from 97th to 14th in the
> rankings. (The 11th of August saw 2,068 downloads!)
That damn 'Python' project is still at #4 for the 'all time' rankings,
though ;) We have a way to go yet...
BTW I noticed that it's only 19 days until SpamBayes' first birthday
(taking birth as the project opening on sf - any prior life within
Python is really just time in the womb ;)
Maybe we could 'celebrate' by putting out 1.0b1 then?
=Tony Meyer
From tim.one at comcast.net Sat Aug 16 13:25:41 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Aug 16 12:26:16 2003
Subject: [spambayes-dev] World domination
Message-ID:
Following up on my Machiavellian plan to release the spambayes Outlook addin
from SourceForge, the spambayes project ranked 99.9551% at SF last week, and
is now on the front page as the 7th "most active" project (of 67,000) at SF
last week. There have been at least 4,300 downloads of the OL addin from
SF.
Congratulations! I wish I could claim more credit for myself, but I've had
little to do with it since last year. The credit belongs to the currently
active developers, who've wrestled tirelessly with never-ending nightmares
from Outlook to IMAP. Thanks to all -- great work!
Nevertheless, beatings will continue until it's #1. To help brand
recognition, I've changed the SF "public name" of the project from "Bayesian
anti-spam classifier" to "SpamBayes anti-spam".
taking-thrills-where-i-can-find-'em-ly y'rs - tim
From tim.one at comcast.net Sat Aug 16 13:44:52 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Aug 16 12:45:34 2003
Subject: [spambayes-dev] SF rankings...
In-Reply-To: <200308160704.h7G74g4f016971@localhost.localdomain>
Message-ID:
[Anthony Baxter]
> Well, in the week since the outlook addin was moved to SF,
> the spambayes project's gone from 97th to 14th in the rankings.
> (The 11th of August saw 2,068 downloads!)
It's at #7 now.
> (Is it worth changing the project title from
> "Bayesian anti-spam classifier" to something more
> descriptive?)
Harmonic convergence! I did that before seeing your email -- see my later
email.
> 2068-times-$0-is-still-$0-sorry-mark,
Don't feel too badly for Mark -- I have it on good authority that his is now
the most widely recognized face in Australia. It's not from the photograph
that ran in the Aussie press, it's actually from the "Delete As Spam" frowny
face icon in the OL addin.
From richie at entrian.com Sun Aug 17 22:29:47 2003
From: richie at entrian.com (Richie Hindle)
Date: Sun Aug 17 16:29:58 2003
Subject: [spambayes-dev] Calling Outlook Express users...
Message-ID:
...on this list? I must be joking.
Romain Guy has posted a patch and a new module that lets the web interface
train on Outlook Express mailboxes, by uploading a .dbx file in the same
way as you can upload an mbox file:
http://sourceforge.net/tracker/?func=detail&atid=498105&aid=789916&group_id=61702
I'm happy to code-review and commit the patch, but not being an Outlook
Express user I can't test it.
Are there any Spambayes developers who are also Outlook Express users?
Feel free to email me privately if you don't want to admit it here. 8-)
--
Richie Hindle
richie@entrian.com
From adam.walker at rbwconsulting.com Sun Aug 17 18:31:48 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Sun Aug 17 17:31:55 2003
Subject: [spambayes-dev] Outlook Manager dialog
Message-ID: <20030817213151.B296413E26B@sack.dreamhost.com>
Ok, The new dialog stuff is checked to cvs. This includes the logo (bmp only
currently), new options, and switched most of the dialogs to property pages
on the manager dialog. Thanks to the rc file and Mark's control processor
framework moving controls to another property page is mostly trivial.
--Adam
From mhammond at skippinet.com.au Mon Aug 18 09:44:39 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 17 18:44:19 2003
Subject: [spambayes-dev] Outlook Manager dialog
In-Reply-To: <20030817213151.B296413E26B@sack.dreamhost.com>
Message-ID: <0c5b01c36511$24cfe680$f502a8c0@eden>
> Ok, The new dialog stuff is checked to cvs. This includes the
> logo (bmp only
> currently), new options, and switched most of the dialogs to
> property pages
> on the manager dialog. Thanks to the rc file and Mark's
> control processor
> framework moving controls to another property page is mostly trivial.
It looks pretty good. I've a few comments though. I think the property
pages may have gone too far, and should reflect the workflow a little
closer.
Off the top of my head:
* I think "training" should be the second property page - training is the
first thing the user will need to do.
* "Spam" and "Possible Spam" should maybe go back on a single "Filter"
page
This leaves us with:
General, Training, Filter, Filter Now, Advanced
There are a few other tweaks, such as the combos should be list-boxes etc,
but it is looking good. Thanks!
I'd really like to hear other people's opinions on this too.
Mark.
From adam.walker at rbwconsulting.com Sun Aug 17 20:06:06 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Sun Aug 17 19:06:17 2003
Subject: [spambayes-dev] Outlook Manager dialog
In-Reply-To: <0c5b01c36511$24cfe680$f502a8c0@eden>
Message-ID: <20030817230614.B20B213E235@sack.dreamhost.com>
> * I think "training" should be the second property page - training is the
> first thing the user will need to do.
Sure.
> * "Spam" and "Possible Spam" should maybe go back on a single "Filter"
> page
This is really a question of how big each page should be. I have no qualms
with recombining the two ... it's just a lot longer than the other pages.
From anthony at interlink.com.au Mon Aug 18 11:07:01 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Aug 17 20:07:57 2003
Subject: [spambayes-dev] SF rankings...
In-Reply-To:
Message-ID: <200308180007.h7I0712U024089@localhost.localdomain>
>>> "Tim Peters" wrote
> Don't feel too badly for Mark -- I have it on good authority that his is now
> the most widely recognized face in Australia. It's not from the photograph
> that ran in the Aussie press, it's actually from the "Delete As Spam" frowny
> face icon in the OL addin .
I have to say, the resemblance _is_ uncanny.
On a more topical note, SB developers might note that Microsoft last week
announced that they were ending Outlook Express development. It will still
be available, but will gradually rot away in favour of the full Outlook.
Amusingly, one of the reasons I saw reported was that Outlook's rather
large footprint is now less of a concern because computers are faster and
have more memory.
Well-if-MS-won't-support-it-why-should-we,
Anthony
--
Anthony Baxter
It's never too late to have a happy childhood.
From T.A.Meyer at massey.ac.nz Mon Aug 18 13:26:56 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 17 20:27:43 2003
Subject: [spambayes-dev] Outlook Manager dialog
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D9258A@its-xchg4.massey.ac.nz>
> * "Spam" and "Possible Spam" should maybe go back on a
> single "Filter" page
+1
> This leaves us with:
> General, Training, Filter, Filter Now, Advanced
To me it seems odd that the general page has a 'training' section, and
then there is a separate 'training' tab. There seem to be three
sections - training, filtering and advanced. Squishing everything into
three pages probably means pages that are too big, but maybe not.
The layout I think makes most sense:
Page 1: Training. Includes the current training box from 'general', and
all of the content of the training tab. Results in a larger page, but
only slightly (also see page 3).
Page 2: Filtering. Includes the current filtering box from 'general',
and all of the content of the spam and unsure tabs. Results in a larger
page, but not too much, and makes more sense (also see page 3). The
"mark spam as read" option seems to take up too much room, but I don't
have any idea how to improve it... :)
Page 3: Status (this could maybe be page 1). The version string could
be moved here, plus the database ham/spam count, plus the watch folder
information. This is the page that I would want to see most often once
everything is running. This would save space on other pages, and also
allow the watch folder information space to be a bit bigger (it runs out
of room very quickly). This is a big shuffle, though.
Page 4: Advanced. As it is, although I wonder if "save spam score"
belongs in advanced. The wording of the delete-as-spam-marks-as-read
option isn't clear either. If you select "None", it says "When a
message is deleted as spam, change its read state to None", which isn't
what happens.
Where is "Filter now", you ask? In a separate dialog, accessed via
either a button on the filtering tab, or as a separate toolbar menuitem.
The rest is status/settings, this is an action [1]; it makes sense to
differentiate it.
> I'd really like to hear other people's opinions on this too.
Personally I don't really like tabs, and thought the old one was better
(it made more logical sense). I realise that this is probably a
minority opinion, and that the users are familiar with a tabbed
interface, though.
=Tony Meyer
[1] Ok, training is an action, but it's a 'settings' kind of action ;)
From T.A.Meyer at massey.ac.nz Mon Aug 18 13:28:38 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 17 20:29:26 2003
Subject: [spambayes-dev] Calling Outlook Express users...
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D9258D@its-xchg4.massey.ac.nz>
> Subject: [spambayes-dev] Calling Outlook Express users...
> ...on this list? I must be joking.
:)
> Romain Guy has posted a patch and a new module that lets the
> web interface train on Outlook Express mailboxes, by
> uploading a .dbx file in the same way as you can upload an mbox file:
> I'm happy to code-review and commit the patch, but not being an
> Outlook Express user I can't test it.
I'll do some testing on this as soon as I get a chance, although if you
want to go over the code first, that would be great :)
(Note that I'm not an OE user for my mail, but I use it to test all
sorts of things, plus my fiance uses it at home - with pop3proxy at the
moment).
=Tony Meyer
From T.A.Meyer at massey.ac.nz Mon Aug 18 13:38:08 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 17 20:40:59 2003
Subject: [spambayes-dev] SF rankings...
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D9259F@its-xchg4.massey.ac.nz>
> On a more topical note, SB developers might note that
> Microsoft last week announced that they were ending Outlook
> Express development. It will still be available, but will
> gradually rot away in favour of the full Outlook.
[...]
> Well-if-MS-won't-support-it-why-should-we,
This is, of course, a backwards step. We have some ability to work with
OE - we have no ability at all to work with whatever version of hotmail
gets built in to replace OE. I doubt MS is planning on putting in
convenient plug-in hooks, either.
Still, I can't see people abandoning OE (and their pop3/imap addresses)
in droves any time soon, so who knows what will happen...
=Tony Meyer
From T.A.Meyer at massey.ac.nz Mon Aug 18 13:35:27 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 17 20:41:08 2003
Subject: [spambayes-dev] World domination
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz>
> front page as the 7th "most active" project (of 67,000) at SF
> last week. There have been at least 4,300 downloads of the
> OL addin from SF.
#6 now...maybe one fewer beating today? ;) Interestingly, the number of
downloads fell quite a lot over the last couple of days, but the ranking
went up. Maybe it was a slow day for everyone...
> Congratulations! I wish I could claim more credit for
> myself, but I've had little to do with it since last year.
Well, to be fair, as nice as the Outlook plug-in is, I doubt many people
would be recommending it if it didn't actually do the business of
correctly classifying mail. You can have as much of that credit as you
can get before Gary et al. realise that there is credit to be taken and
come for their share... :)
> Nevertheless, beatings will continue until it's #1.
I don't know if I've ever seen the sf page without "Compiere ERP + CRM
Business Solution", "Gaim", and "phpMyAdmin" in the top five. Pushing
them out will be quite a feat, but then there's bound to be a surge with
the next release of the plug-in, so maybe that would do it.
=Tony Meyer
From anthony at interlink.com.au Mon Aug 18 11:51:01 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Aug 17 20:51:17 2003
Subject: [spambayes-dev] SF rankings...
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D9259F@its-xchg4.massey.ac.nz>
Message-ID: <200308180051.h7I0p1XN025105@localhost.localdomain>
>>> "Meyer, Tony" wrote
> This is, of course, a backwards step. We have some ability to work with
> OE - we have no ability at all to work with whatever version of hotmail
> gets built in to replace OE. I doubt MS is planning on putting in
> convenient plug-in hooks, either.
>
> Still, I can't see people abandoning OE (and their pop3/imap addresses)
> in droves any time soon, so who knows what will happen...
One of the quotes I saw was from an MS product manager complaining that
IMAP wasn't a rich enough protocol for an email client. This suggests that
they're planning to do more proprietary crap between Outlook and Exchange.
This is not good :-(
--
Anthony Baxter
It's never too late to have a happy childhood.
From anthony at interlink.com.au Mon Aug 18 11:55:01 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Aug 17 20:55:14 2003
Subject: [spambayes-dev] SF rankings...
In-Reply-To: <200308180051.h7I0p1XN025105@localhost.localdomain>
Message-ID: <200308180055.h7I0t1Q2025192@localhost.localdomain>
>>> Anthony Baxter wrote
> One of the quotes I saw was from an MS product manager complaining that
> IMAP wasn't a rich enough protocol for an email client. This suggests that
> they're planning to do more proprietary crap between Outlook and Exchange.
> This is not good :-(
Hm. It seems like they might have backed down.
http://news.zdnet.co.uk/software/applications/0,39020384,39115720,00.htm
(original piece:
http://new.zdnet.co.uk/zdnetuk/news/software/applications/0,39020384,39115680,00.htm
)
--
Anthony Baxter
It's never too late to have a happy childhood.
From adam.walker at rbwconsulting.com Sun Aug 17 22:18:09 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Sun Aug 17 21:18:24 2003
Subject: [spambayes-dev] Outlook Manager dialog
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D9258A@its-xchg4.massey.ac.nz>
Message-ID: <20030818011819.0A14D862F7@plunder.dreamhost.com>
> To me it seems odd that the general page has a 'training' section, and
> then there is a separate 'training' tab.
I agree it's odd. Of course it was odd before when we had a training section
and a training dialog.
I'm of the opinion that we may need two GUIs. A simplified and a power user
(or maybe a wizard-for-first-setup and standard?) interface. I think most
users are confused by the multiple places to select folders.
* spam for training.
* ham for training.
* folders to filter.
* spam for the filter.
* unsure for the filter.
* folders under filter now.
Having used it for a few months, I understand why all those options are
there. But starting out, it's a bit overwhelming -- the new user simply
wants to point the plug-in at a pile of ham, a pile of spam, and a folder for
unsures and click a "finish" button. At which point the plug-in would train,
set the other folder options from the choices made before, set defaults for
the read state options, and enable itself.
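
Roughly, the wizard would expand three answers into the six folder settings
listed above. A sketch of that mapping follows; every option name is
hypothetical, and it assumes the "pile of ham" is also the folder to watch.

```python
def config_from_wizard(ham_folder, spam_folder, unsure_folder):
    """Expand three wizard answers into a full set of folder options.

    All option names are hypothetical, chosen only for this sketch.
    """
    return {
        "train_ham_folders": [ham_folder],
        "train_spam_folders": [spam_folder],
        # Assumption: the ham pile (e.g. the Inbox) is also what gets filtered.
        "filter_watch_folders": [ham_folder],
        "filter_spam_folder": spam_folder,
        "filter_unsure_folder": unsure_folder,
        "filter_now_folders": [ham_folder],
        # The last step of the wizard turns filtering on.
        "filter_enabled": True,
    }
```

Training and the read-state defaults would be separate steps; they are left
out here to keep the folder mapping visible.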
>Where is "Filter now", you ask? In a separate dialog, accessed via either
>a button on the filtering tab, or as a separate toolbar menuitem.
As long as the button/menuitem says "Filter Now..." (or something with "..."
at the end) and not "Filter Now" if it will bring up a dialog.
>Personally I don't really like tabs, and thought the old one was better (it
>made more logical sense). I realise that this is probably a minority
>opinion, and that the users are familiar with a tabbed interface, though.
The old layout suffered many of the same problems the current one does (I
didn't change the layout much other than breaking up some pages) and violated
some GUI design conventions. It may have made sense code-wise but not
usage-wise. At least that's my $.02. Exchange rates may vary.
--Adam
From mhammond at skippinet.com.au Mon Aug 18 12:23:37 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 17 21:23:28 2003
Subject: [spambayes-dev] World domination
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz>
Message-ID: <0d0401c36527$5a4447a0$f502a8c0@eden>
> surge with
> the next release of the plug-in, so maybe that would do it.
Yes - I reckon that if we release a new version once per week, with each one
containing a "critical bug that you can't see, but we promise is there"
(just like the last release) we could get real mileage. Add a
background thread so that the plugin checks for a new version each time it
starts and *insists* you download it, and I believe we could get there!
hehe - this is almost sounding serious. We could take the approach of
"windows update", and download in the background, saying we are ready to
upgrade once we get it.
*sigh* - ok, back to the real world. I'm avoiding looking at the spambayes
list for a few days, but am about to fiddle with the dialogs :)
Mark.
From T.A.Meyer at massey.ac.nz Mon Aug 18 14:37:46 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 17 21:38:33 2003
Subject: [spambayes-dev] Outlook Manager dialog
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92600@its-xchg4.massey.ac.nz>
> I agree it's odd. Of course it was odd before when we had a
> training section and a training dialog.
True, although not as odd, because you moved from a general training
section into a whole new dialog.
> I'm of the opinion that we may need two GUIs. A simplified
> and a power user (or maybe a wizard-for-first-setup and
> standard?) interface.
I hate having advanced and simple GUIs, but realise that I may again be
in the minority. IMO, a good interface is easy for beginners to use,
and easy for 'experts' to use faster. One of the problems with
beginner/advanced options is that most people start out as beginner,
progress out of it, but never reach expert, ending up somewhere in the
middle.
That said, a 'wizard' type thing just for an initial setup would
probably be a good thing. InBoxer has one of these.
> The old layout suffered many of the same problems the current
> one does
True, but while we're changing it, we should fix them all! (When I say
"we", I mean you and Mark, of course).
> (I didn't change the layout much other than breaking up
> some pages)
The particular change I was referring to was the change from multiple
dialogs to a single, tabbed, dialog. I know this is how Microsoft (and
others) believe a dialog should be, but I don't agree
(sometimes they are right, sometimes they aren't).
=Tony Meyer
From ta-meyer at ihug.co.nz Mon Aug 18 14:40:48 2003
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Sun Aug 17 21:41:26 2003
Subject: [spambayes-dev] RE: [Spambayes-checkins] website related.ht, 1.10,
1.11
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302CE977C@its-xchg4.massey.ac.nz>
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212ACB9@its-xchg4.massey.ac.nz>
[From the check-ins list]
> Bring the 'related projects' up to date wrt inboxer/spambayes.
Does this mean that SpamAtBay no longer exists? (I was always a bit
confused about how InBoxer and SpamAtBay related, anyway). It was the
better name, IMO, although perhaps not as marketable.
> ! Some developers like the SpamBayes project enough to
> invest in building other projects on top of it. Please
> contact us if you would like to be listed here. A listing
> here does not mean that the SpamBayes team endorses the
> project. Commercial projects offer the same success in
> filtering mail, but in exchange for your money, strive to be
> more user-friendly, offer more in the way of support, or
> additional features that enhance the functions of the core
> SpamBayes code base.
This sounds much better than what was previously there, BTW.
=Tony Meyer
From mhammond at skippinet.com.au Mon Aug 18 12:46:00 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 17 21:45:46 2003
Subject: [spambayes-dev] Outlook Manager dialog
In-Reply-To: <20030818011819.0A14D862F7@plunder.dreamhost.com>
Message-ID: <0d2201c3652a$7b9f86f0$f502a8c0@eden>
> I'm of the opinion that we may need two GUIs. A simplified
> and a power user
> (or maybe a wizard-for-first-setup and standard?) interface.
Agreed. I admit to liking the wizard idea. Maybe borrowing ideas from MS
again:
* SpamBayes manager dialog is tabbed, with first tab being, pretty much,
"About SpamBayes". It has the version info etc, but a nice large button
"Configuration Wizard" or some such. We still try and keep the tabs down
though.
* When SpamBayes starts, if it is not correctly configured, it displays the
wizard.
The idea is that the user will be up-and-running before they ever see the
main SpamBayes dialog. When they *do* bring up the dialog, they will be
able to see everything reflected correctly, and/or will have a clear way of
"re-configuring" using the same process they did at the start.
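
The startup check could be as small as the sketch below. It assumes the
configuration is a plain dict and uses made-up option names; the real
plug-in stores its settings differently.

```python
# Hypothetical option names: the minimum needed before filtering can run.
REQUIRED_OPTIONS = (
    "filter_watch_folders",
    "filter_spam_folder",
    "filter_unsure_folder",
)


def needs_wizard(config):
    """Return True when any required option is missing or empty."""
    return any(not config.get(name) for name in REQUIRED_OPTIONS)
```

On startup the plug-in would show the wizard when needs_wizard() is true,
and the "Configuration Wizard" button would run the same wizard on demand.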
I like tabs, and I hate them. The simple reality is that we are ending up
with too many options to have a single dialog, with modal dialogs popping up
all over the place. Modal dialogs off modal dialogs are considered a
hanging offence by people who take this stuff seriously. However
tabs also quickly get overwhelming, and at some point we have to ask
ourselves if the option is better left out of the GUI completely -
especially ones that are "working around" bugs we don't know how to fix
A good example of this is the "Save Spam Score" option (which I mentioned to
Adam in private mail) - I see no legitimate reason a user would ever want to
disable this, unless it had some negative impact on the system - such as
modifying the message in some unexpected way, or just caused the whole thing
to fail. In this case we have a bug. As we don't know how to fix all such
obscure bugs yet, the option is a perfectly reasonable thing to have - but probably not
a reasonable thing to try and cram in the UI. Once we get 10,000 users, if
only a fraction of a percent start mailing the list with "why would I want
to turn this option off", we are in trouble. (Note that I am talking
more in general than this specific option - if this fits OK and makes sense
then great, but not all options will)
I think I am simply saying I believe some options are best left to the
people capable of finding them
> I think most
> users are confused by the multiple places to select folders.
...
> Having used it for a few months, I understand why all those
> options are
I agree 100%, especially for initial configuration. I was having this
discussion with a real-world mate the other day who just started using it.
I told him that unfortunately, the UI still reflects the underlying code
structure rather than the best user experience. You are saying the exact
same thing :)
> The old layout suffered many of the same problems the current
> one does (I
Actually, the biggest problem by far with the old one was the inflexibility
of the dialogs. This meant that once I considered them "good enough for
now", that is how they stayed. The Outlook GUI has not changed in any
significant way since my first checkin of them ages ago (until now of course
:)
Thankfully this has been fixed, and we are now able to not only have this
discussion, but do something about it :)
Mark.
From seant at iname.com Sun Aug 17 23:07:00 2003
From: seant at iname.com (Sean True)
Date: Sun Aug 17 22:09:20 2003
Subject: [spambayes-dev] RE: [Spambayes-checkins] website related.ht, 1.10,
1.11
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212ACB9@its-xchg4.massey.ac.nz>
Message-ID: <005c01c3652d$69f42ed0$0201a8c0@swapwizard.com>
>
> [From the check-ins list]
> > Bring the 'related projects' up to date wrt inboxer/spambayes.
>
> Does this mean that SpamAtBay no longer exists? (I was always a bit
> confused about how InBoxer and SpamAtBay related, anyway). It was the
> better name, IMO, although perhaps not as marketable.
>
> > ! Some developers like the SpamBayes project enough to
> > invest in building other projects on top of it. Please
> > contact us if you would like to be listed here. A listing
> > here does not mean that the SpamBayes team endorses the
> > project. Commercial projects offer the same success in
> > filtering mail, but in exchange for your money, strive to be
> > more user-friendly, offer more in the way of support, or
> > additional features that enhance the functions of the core
> > SpamBayes code base.
>
> This sounds much better than what was previously there, BTW.
>
> =Tony Meyer
It's complicated. Basically, it's a triumph of marketers over developers
(should sound familiar), but since I'm squarely on both sides of the fence,
I don't mind a bit.
I'll be announcing something definitive shortly, but in the mean time I am
trying to reduce confusion on public venues. InBoxer has a review coming up
in PCMag in a couple of weeks, and we expect all _you know what_ to break
loose:
no matter what label is on it, I'm firmly on the hook for support.
If you want to get a preview of the marketing slant for InBoxer, take a look
at
http://www.inboxer.com/0sb.html
Compared to the nice spare ht2html site we all know (and love), it's pretty
cluttered.
-- Sean
From skip at pobox.com Sun Aug 17 22:09:44 2003
From: skip at pobox.com (Skip Montanaro)
Date: Sun Aug 17 22:09:50 2003
Subject: [Python-Dev] RE: [spambayes-dev] World domination
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302D92599@its-xchg4.massey.ac.nz>
Message-ID: <16192.13672.160054.733779@montanaro.dyndns.org>
Tony> I don't know if I've ever seen the sf page without "Compiere ERP +
Tony> CRM Business Solution", "Gaim", and "phpMyAdmin" in the top five.
I can understand Gaim and - to a certain degree - phpMyAdmin, but where's
the geek appeal in an ERP/CRM tool?
Skip
From romain.guy at jext.org Mon Aug 18 05:18:04 2003
From: romain.guy at jext.org (Romain GUY)
Date: Sun Aug 17 22:22:32 2003
Subject: [spambayes-dev] Windows installer for non Outlook users
Message-ID: <20038184184.810601@Thinthalion>
Hello everyone,
I've just taken a few minutes to explain to a friend of mine how to make spambayes run on his Windows XP/Outlook Express platform. One thing is certain: while installing the spambayes Windows service is not hard at all when you have clear, precise and convenient instructions, it becomes quite difficult for "normal users" (that is to say Tim's sister ;-) with no programmer friend around.
As I am an (almost) full-time Windows user and am used to Inno Setup, I am willing to set up a very simple Windows installer. This installer would include a Python interpreter (with only the libraries required to make spambayes run), the win32all extension and some picture-based documentation to teach users how to set up their mail client. This installer would also install the spambayes service so that they won't have to bother running it manually at every startup. Maybe we could even add start menu/desktop icons which would launch the web interface (we can also consider adding a bookmark in IE and/or Netscape). Maybe the installer could also try to set up Outlook Express directly (finding the user's accounts in the registry, setting them in spambayes). The Notate To: option could also be activated by default.
So, if you do agree, I'm ready to take care of this.
--
Romain GUY
romain.guy@jext.org
http://www.jext.org
http://progx.jext.org
From mhammond at skippinet.com.au Mon Aug 18 13:33:19 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 17 22:33:00 2003
Subject: [spambayes-dev] Windows installer for non Outlook users
In-Reply-To: <20038184184.810601@Thinthalion>
Message-ID: <0d7301c36531$16a74650$f502a8c0@eden>
> So, if you do agree, I'm ready to take care of this.
I agree, but see no reason we can't use py2exe etc for this. It may be hard
to do optimally, but even a "simple" py2exe distribution may be better than
this, and would have a lot less chance of screwing up an existing Python
install.
Thomas Heller (py2exe guy) is currently working on some nice features that I
proposed simply so I could make my installer even better. My intention, and
we are not that far off, is to have a single installer that comes with
pop3proxy as a service, as a "standard" exe for win9x, and as the outlook
plugin. Inno would then detect which is most appropriate. The size of the
distribution would not grow much at all, as all common code would live in a
.zip file, and shared between the various small, "stub" exes and dlls.
But yeah, if the above makes you balk, I have no objection.
Mark.
From T.A.Meyer at massey.ac.nz Mon Aug 18 16:19:39 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 17 23:20:27 2003
Subject: [Python-Dev] RE: [spambayes-dev] World domination
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D926B1@its-xchg4.massey.ac.nz>
Tony> I don't know if I've ever seen the sf page without "Compiere ERP +
Tony> CRM Business Solution", "Gaim", and "phpMyAdmin" in the top five.
Skip> I can understand Gaim and - to a certain degree - phpMyAdmin,
Skip> but where's the geek appeal in an ERP/CRM tool?
Maybe lots of geeks are running small to medium sized enterprises? Or
think they will be in the future, and are planning for that day?
I agree; it is out of place.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Mon Aug 18 16:26:38 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 17 23:27:22 2003
Subject: [spambayes-dev] RE: [Spambayes] pop3proxy_service and smtpproxy
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D926B7@its-xchg4.massey.ac.nz>
> When I explicitly start pop3proxy.py, it starts smtpproxy as
> expected. If I use pop3proxy_service.py, the smtpproxy is
> not started. Any ideas why this is happening?
The service starts pop3proxy's main function, but smtpproxy is started
in pop3proxy's run function (which calls main).
What's the best way to fix this?
o Move the smtpproxy starting code into main().
o Add smtpproxy starting code into pop3proxy_service.py.
o Have a separate service for smtpproxy (this is problematic because
they have to share the database).
o Something else.
I don't really know much about Windows services, so I'm throwing this to
the -dev list.
=Tony Meyer
From adam.walker at rbwconsulting.com Mon Aug 18 00:31:22 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Sun Aug 17 23:31:47 2003
Subject: [spambayes-dev] Outlook Manager dialog
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D9262F@its-xchg4.massey.ac.nz>
Message-ID: <20030818033139.EA64A13E21F@sack.dreamhost.com>
For those kinds of things, I'd made it a right in a blank area of the
advanced tab. The log level numbers should be replaced with words
("minimal", "debug", "verbose") if they are not hidden.
> -----Original Message-----
> From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz]
> Sent: Sunday, August 17, 2003 10:06 PM
> To: Mark Hammond; Adam Walker
> Subject: RE: [spambayes-dev] Outlook Manager dialog
>
> > Maybe a better option would be some cleverly worded,
> > semi-hidden option for "diagnostics". It could also help
> > locate the log file - even creating a mail with the log
> > attached. The "clever wording" could reflect that this
> > should only be touched when specifically asked by a developer
> > in the process of tracking down a problem.
>
> I like this idea a lot. I don't know how or where you'd hide or word it,
> but it's a good idea.
>
> Cheers,
> Tony
From adam.walker at rbwconsulting.com Mon Aug 18 00:35:14 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Sun Aug 17 23:35:26 2003
Subject: [spambayes-dev] Outlook Manager dialog
In-Reply-To: <0d2701c3652b$3cf27e70$f502a8c0@eden>
Message-ID: <20030818033523.2391113E261@sack.dreamhost.com>
>
> I think I better hack a wizard framework together eh?
Shouldn't be too hard. The wizards tend to operate like tabs.
--Adam
From adam.walker at rbwconsulting.com Mon Aug 18 00:43:19 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Sun Aug 17 23:43:32 2003
Subject: [spambayes-dev] Outlook Manager dialog
In-Reply-To: <20030818033139.EA64A13E21F@sack.dreamhost.com>
Message-ID: <20030818034327.F0FC413E230@sack.dreamhost.com>
D'oh! That should say right-click.
I'm going to sleep now ;)
> -----Original Message-----
> From: spambayes-dev-bounces@python.org [mailto:spambayes-dev-
> bounces@python.org] On Behalf Of Adam Walker
> Sent: Sunday, August 17, 2003 11:31 PM
> To: 'Meyer, Tony'; 'Mark Hammond'
> Cc: spambayes-dev@python.org
> Subject: RE: [spambayes-dev] Outlook Manager dialog
>
> For those kinds of things, I'd made it a right in a blank area of the
> advanced tab. The log level numbers should be replaced with words
> ("minimal", "debug", "verbose") if they are not hidden.
>
> > -----Original Message-----
> > From: Meyer, Tony [mailto:T.A.Meyer@massey.ac.nz]
> > Sent: Sunday, August 17, 2003 10:06 PM
> > To: Mark Hammond; Adam Walker
> > Subject: RE: [spambayes-dev] Outlook Manager dialog
> >
> > > Maybe a better option would be some cleverly worded,
> > > semi-hidden option for "diagnostics". It could also help
> > > locate the log file - even creating a mail with the log
> > > attached. The "clever wording" could reflect that this
> > > should only be touched when specifically asked by a developer
> > > in the process of tracking down a problem.
> >
> > I like this idea a lot. I don't know how or where you'd hide or word it,
> > but it's a good idea.
> >
> > Cheers,
> > Tony
From mhammond at skippinet.com.au Mon Aug 18 14:56:43 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sun Aug 17 23:56:58 2003
Subject: [spambayes-dev] RE: [Spambayes] pop3proxy_service and smtpproxy
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D926B7@its-xchg4.massey.ac.nz>
Message-ID: <0df601c3653c$be822830$f502a8c0@eden>
> The service starts pop3proxy's main function, but smtpproxy is started
> in pop3proxy's run function (which calls main).
>
> What's the best way to fix this?
>
> o Move the smtpproxy starting code into main().
> o Add smtpproxy starting code into pop3proxy_service.py.
> o Have a separate service for smtpproxy (this is problematic because
> they have to share the database).
> o Something else.
The best way is to have a single entry-point that pop3proxy_service.py can
call to start everything it needs. I don't care how it is spelt, or what it
looks or smells like :)
> I don't really know much about Windows services, so I'm
> throwing this to
> the -dev list.
Note pop3proxy_service.py has about 100 lines of code, so should be easy to
get your head around. About the only real issue is the need for an
asynchronous "stop" command that can be issued (which may need to be
implemented via a local socket - whatever). This is missing now, and is
slightly dangerous as currently pop3proxy is simply aborted at service
shutdown time.
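
A minimal sketch of the two ideas above (one entry point that starts everything, plus a stop request that can be issued from another thread). This is not the actual pop3proxy or pop3proxy_service code: ProxyRunner, fake_proxy and the worker mapping are hypothetical names, and it uses a threading.Event rather than the local socket mentioned above.

```python
import threading


class ProxyRunner:
    """Hypothetical wrapper: a single entry point that starts every proxy."""

    def __init__(self, workers):
        # workers maps a name to a callable taking a stop event. These are
        # stand-ins for whatever really starts pop3proxy and smtpproxy.
        self._workers = workers
        self._stop_event = threading.Event()
        self._threads = []

    def start(self):
        # Start every worker in its own thread, sharing one stop event.
        for name, target in self._workers.items():
            thread = threading.Thread(
                target=target, args=(self._stop_event,), name=name)
            thread.start()
            self._threads.append(thread)

    def stop(self, timeout=5.0):
        # Asynchronous stop request: safe to call from another thread,
        # for example from a service's stop handler.
        self._stop_event.set()
        for thread in self._threads:
            thread.join(timeout)
        # True only if every worker actually finished.
        return not any(thread.is_alive() for thread in self._threads)


def fake_proxy(stop_event):
    # Stand-in for a proxy main loop: poll until asked to stop.
    while not stop_event.wait(0.01):
        pass
```

A service wrapper would then call start() once when the service starts and stop() at shutdown, instead of aborting the process.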
I don't really know much about pop3proxy
Mark.
From vanhorn at whidbey.com Sun Aug 17 22:07:17 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Mon Aug 18 00:07:26 2003
Subject: [spambayes-dev] SF rankings...
References: <200308180051.h7I0p1XN025105@localhost.localdomain>
Message-ID: <3F4050F5.5495E1D@whidbey.com>
Anthony Baxter wrote:
> >>> "Meyer, Tony" wrote
> > This is, of course, a backwards step. We have some ability to work with
> > OE - we have no ability at all to work with whatever version of hotmail
> > gets built in to replace OE. I doubt MS is planning on putting in
> > convenient plug-in hooks, either.
> >
> > Still, I can't see people abandoning OE (and their pop3/imap addresses)
> > in droves any time soon, so who knows what will happen...
>
> One of the quotes I saw was from an MS product manager complaining that
> IMAP wasn't a rich enough protocol for an email client. This suggests that
> they're planning to do more proprietary crap between Outlook and Exchange.
> This is not good :-(
The more things change, the more they stay the same: Friends don't let friends
use Microsoft mail products.
Van
--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD
For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------
From kennypitt at hotmail.com Mon Aug 18 10:52:47 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Mon Aug 18 09:53:32 2003
Subject: [spambayes-dev] New Outlook Dialogs Problem
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302BF3D8A@its-xchg4.massey.ac.nz>
Message-ID: <3F40DA2F.1080104@hotmail.com>
Meyer, Tony wrote:
> If I haven't got enough training information to enable filtering, the
> "enable filtering" box isn't greyed out anymore. If I try to check it,
> it doesn't check, and I get this traceback:
>
> Traceback (most recent call last):
> File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
> 286, in OnCommand
> self.ApplyHandlingOptionValueError(handler.OnCommand, wparam,
> lparam)
> File "D:\cvs\spambayes\spambayes\Outlook2000\dialogs\dlgcore.py", line
> 245, in ApplyHandlingOptionValueError
> self.dialog_def.caption, mb_flags)
> AttributeError: ProcessorDialog instance has no attribute 'dialog_def'
>
A fix was checked in for the exception caused by trying to enable
filtering without enough training data, but I haven't heard any further
public discussion of the second part about disabling the checkbox. I
noticed that it is still not disabled in the latest dialog updates that
Adam just checked in.
Was it decided whether or not we want to do this? If anyone is
interested, I will gladly update the patch that I submitted for this so
that it works with Adam's new dialogs.
--
Kenny Pitt
From david at rebirthing.co.nz Mon Aug 18 15:53:26 2003
From: david at rebirthing.co.nz (David McNab)
Date: Mon Aug 18 10:53:27 2003
Subject: [spambayes-dev] FAQ Contribution
Message-ID: <1061218278.1144.10.camel@rebirth>
Q: I notice the web interface rejects browser access unless the browser
is running on the same host. How do I enable web access from other nodes
on the LAN?
A: Edit the bayescustomize.ini script. Just below the line '[html_ui]',
add the line 'allow_remote_connections:True'. But make sure you firewall
off outside access to port 8880, to stop unauthorised users from messing
with the web interface.
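
For illustration only, the resulting file fragment can be checked with Python's standard configparser module (Python 3 spelling). SpamBayes has its own option-loading code, so this is just a sketch of the file layout described above, not of how SpamBayes reads it.

```python
import configparser

# The section header and option line exactly as described in the answer.
INI_TEXT = """\
[html_ui]
allow_remote_connections:True
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# ':' is accepted as a key/value delimiter, just like '='.
allow_remote = parser.getboolean("html_ui", "allow_remote_connections")
print(allow_remote)
```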
--
Cheers
David
From mhammond at skippinet.com.au Tue Aug 19 09:43:36 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Aug 18 18:45:00 2003
Subject: [spambayes-dev] New Outlook Dialogs Problem
In-Reply-To: <3F40DA2F.1080104@hotmail.com>
Message-ID: <12d001c365da$29f8e840$f502a8c0@eden>
> A fix was checked in for the exception caused by trying to enable
> filtering without enough training data, but I haven't heard
> any further
> public discussion of the second part about disabling the checkbox. I
> noticed that it is still not disabled in the latest dialog
> updates that
> Adam just checked in.
>
> Was it decided whether or not we want to do this? If anyone is
> interested, I will gladly update the patch that I submitted
> for this so
> that it works with Adam's new dialogs.
Yes, please do. I don't think we know for sure exactly what we want, but
will know it when we see it.
Mark.
From romain.guy at jext.org Tue Aug 19 02:31:49 2003
From: romain.guy at jext.org (Romain GUY)
Date: Mon Aug 18 19:36:13 2003
Subject: [spambayes-dev] A weird trick for Outlook Express users
Message-ID: <200381913149.445701@Thinthalion>
I've attached a mail to this mail. It is a .eml file that Outlook Express users can drag and drop into their mailbox. The trick with this mail is that when you read it, a new browser window is opened (IE, Mozilla, Firebird, Opera... according to your default browser choice) directly in the SpamBayes Web Interface. Maybe Outlook Express might actually like it.
I'm not sure but maybe it would be possible to allow Outlook Express users to manage SpamBayes right from the mail client using tricky email files...
--
Romain GUY
romain.guy@jext.org
http://www.jext.org
http://progx.jext.org
-------------- next part --------------
An embedded message was scrubbed...
From: unknown sender
Subject: no subject
Date: no date
Size: 3710
Url: http://mail.python.org/pipermail/spambayes-dev/attachments/20030819/1858d50d/SpamBayes.eml
From T.A.Meyer at massey.ac.nz Tue Aug 19 20:38:36 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 19 03:39:27 2003
Subject: [spambayes-dev] SpamBayes Readme
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz>
> I suggest the following approach. Move the current README
> aside and create a new one. It can just have the section
> headings in there initially, then people can write sections
> as they go. The current approach of hoping for
> someone to come along and rewrite it all isn't working.
That's probably right. However, I couldn't be bothered creating
appropriate headings, so I've created a new version.
It's here at the moment:
Comments? Or should I just check this in? I'll create a
"readme-devel.txt" file that has all the testing (etc) info that the old
readme had.
BTW, because there are so many different ways to use & train spambayes,
it is quite difficult to write an introduction that suits everyone. I'm
not sure this does this yet, but it's a step along the way. It is very
similar to INTEGRATION.TXT (which probably could be retired if we use
this alternative readme).
Richard: is this more what you were after? Apart from having things
moved around, it is *very* similar to INTEGRATION.TXT, which apparently
wasn't good enough. Can you suggest improvements?
=Tony Meyer
From anthony at interlink.com.au Tue Aug 19 18:48:26 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Tue Aug 19 04:24:51 2003
Subject: [spambayes-dev] Re: SpamBayes Readme
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz>
Message-ID: <200308190748.h7J7mQqY017297@localhost.localdomain>
You need to have Python 2.2 or later (2.3 is recommended). You can
download Python from .
+ Many distributions of unix now ship with Python - try typing 'python'
+ at a shell prompt.
As far as the many different ways to install SB, perhaps we should pick
one or two as the "suggested way" to do it? Then have a bunch of files
for the other sorts of ways, and reference them from the main README.
Anthony
--
Anthony Baxter
It's never too late to have a happy childhood.
From richardjones at optushome.com.au Wed Aug 20 11:58:58 2003
From: richardjones at optushome.com.au (Richard Jones)
Date: Fri Aug 22 11:35:16 2003
Subject: [spambayes-dev] Re: SpamBayes Readme
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302D92C0C@its-xchg4.massey.ac.nz>
Message-ID: <200308201059.02790.richardjones@optushome.com.au>
On Tue, 19 Aug 2003 05:38 pm, Meyer, Tony wrote:
> > I suggest the following approach. Move the current README
> > aside and create a new one. It can just have the section
> > headings in there initially, then people can write sections
> > as they go. The current approach of hoping for
> > someone to come along and rewrite it all isn't working.
>
> That's probably right. However, I couldn't be bothered creating
> appropriate headings, so I've created a new version.
>
> It's here at the moment:
>
>
> Comments? Or should I just check this in? I'll create a
> "readme-devel.txt" file that has all the testing (etc) info that the old
> readme had.
That's a huge improvement for new users over the existing readme! Especially
the Really Impatient part, which gives the immediate impression that the
software is easy to use. Then the most common use-cases are laid out straight
away in a clear and concise manner. I'm a little bemused by your having
"everyone else" before "procmail filtering" though :)
No mention of setup.py ... is it supposed to be used at all? If it is, I
suggest that the scripts be renamed to make them a little more SB-specific
(eg. "sb-pop3proxy"), as on my system it installs them all to /usr/bin (as I
used /usr/bin/python2.3 to install them).
Running pop3proxy.py, the first problem is that the webbrowser module is
b0rken for me... I've posted a patch to Python bug 687747 which consists of:
--- webbrowser.py 2003-08-20 10:28:07.000000000 +1000
+++ /usr/lib/python2.3/webbrowser.py 2003-08-04 10:18:17.000000000 +1000
@@ -354,7 +354,7 @@
 if "BROWSER" in os.environ:
     # It's the user's responsibility to register handlers for any unknown
     # browser referenced by this value, before calling open().
-    _tryorder[0:0] = os.environ["BROWSER"].split(os.pathsep)
+    _tryorder = os.environ["BROWSER"].split(os.pathsep)
 for cmd in _tryorder:
     if not cmd.lower() in _browsers:
[might be useful for your later reference if the problem pops up]
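
The two assignment forms that differ in the patch behave quite differently, which the following sketch tries to show. The helper names and the browser lists are made up for the example; only the slice assignment and the plain assignment come from the patch.

```python
def prepend_browsers(tryorder, env_value, sep=":"):
    # Slice assignment at [0:0] inserts the new names at the front and
    # keeps the existing defaults after them.
    result = list(tryorder)
    result[0:0] = env_value.split(sep)
    return result


def replace_browsers(tryorder, env_value, sep=":"):
    # Plain assignment discards the defaults entirely; tryorder is
    # accepted only so both helpers share a signature.
    return env_value.split(sep)


defaults = ["mozilla", "netscape"]
print(prepend_browsers(defaults, "konqueror"))
print(replace_browsers(defaults, "konqueror"))
```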
After I fixed that, the web interface came up nicely. I then tried to set up a
proxy for my main email account. I stupidly put in "110" as the port number
(having not read the instructions fully, I thought I was entering the port
number for the target pop server, not the local port), and promptly got an
error and a whole slew of output to the console pop3proxy was running in
consisting of a constant stream of:
warning: unhandled read event
warning: unhandled write event
Ehem. So, I need to choose a port number above 1024 there :)
In response to Skip's question, I'm running KMail to two POP servers and one
IMAP server (I also use the OSX Mail.app to read the IMAP box). I now know
that setting up the spambayes for the POP server will be a breeze. The IMAP
box is already spam-filtered (poorly) by Mail.app. Getting the POP accounts
spam-filtered will be a big win for now.
Also on this note, I've previously used SpamAssassin through KMail's filtering
(invoking the command-line tool from a filter). I think that the procmail
setup could be adapted to do this, but training is an issue.
BTW, I don't even know what format my mail is stored in by KMail (I suggest
that a lot of users wouldn't know this :). I do know it's not mailbox format.
It looks like a directory per folder, with "cur", "new" and "tmp" for each,
and then a file per message. *shrug*
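
The layout described above (a directory per folder containing "cur", "new" and "tmp", with one file per message) resembles the Maildir format. Assuming that is what it is, Python's standard mailbox module can read it; this sketch uses the Python 3 spelling, and the helper names and demo messages are invented for the example.

```python
import mailbox
import os
import tempfile


def make_demo_maildir(parent_dir):
    # Create a throwaway Maildir with two messages so the reader below has
    # something to list. The path must not exist yet for create=True to
    # build the cur/new/tmp subdirectories.
    path = os.path.join(parent_dir, "inbox")
    box = mailbox.Maildir(path, create=True)
    box.add("Subject: hello\n\nfirst message\n")
    box.add("Subject: world\n\nsecond message\n")
    box.close()
    return path


def list_subjects(maildir_path):
    # Open an existing Maildir folder and collect the Subject headers.
    box = mailbox.Maildir(maildir_path, create=False)
    try:
        return sorted(msg["subject"] for msg in box)
    finally:
        box.close()
```

Calling list_subjects() on a real folder directory would be the starting point for feeding stored mail to a trainer.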
Also, I had a look in the FAQ, but I couldn't see any reference to starting
spambayes on boot-time. Has any work been done on some sort of rc script?
Richard
From T.A.Meyer at massey.ac.nz Wed Aug 20 17:02:59 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 22 16:19:50 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz>
> > BTW, there was a suggestion a while back (from Skip) that
> > we adopt the
> > format that ssh uses for port forwarding: "remote host:remote
> > port:local port". For example: "pop.example.com:110:110". Do you
> > think this would be any clearer?
>
> I think it would be.
It would be interesting to hear what the other -dev people have to say.
There was a suspicious quiet after Skip suggested it ;) It looks more
confusing to me, but then I've never used ssh port forwarding...
> Note that there's a large number of people out there that
> won't know that POP is normally on port 110 :)
True, which is why the remote port just defaults to 110, and the doc
suggests 110 for the proxy.
> Why not just arbitrarily assign a port number if the user
> doesn't supply one?
One reason is that unless/until we integrate more tightly with the mail
clients, the user needs to know the number, so that they can tell their
mail client which port to connect to. It would make it easier in terms
of support, too, if almost everyone was using the same port.
> > I haven't heard much about how well Mail.app filters. Is it really
> > that bad?
>
> The filtering is fine - the spam detection isn't so crash-hot
I worded that poorly. I meant the spam detecting.
> The interface *rocks* though ... for a given message, there's a
> button with which you can simply say "this is spam" or
> "you marked this as spam and got it wrong".
[...]
> I'm assuming the outlook interface is like that.
Almost exactly.
> Unless KMail includes spambayes by default
> (oooh!) it's unlikely to get that level of integration.
Any client that provides a decent interface for plug-ins should be able
to have this sort of integration, as long as someone's willing to put in
the time to do it. How difficult it is depends a lot on the plug-in
interface; Outlook is very complicated, so a lot of work. Eudora, for
example, also has a plug-in interface and (in theory) should be somewhat
simpler (especially since you can copy work that Mark has done). I
think someone was going to work on a (Windows) Eudora plug-in, but I
haven't heard anything further.
=Tony Meyer
From richardjones at optushome.com.au Wed Aug 20 12:54:07 2003
From: richardjones at optushome.com.au (Richard Jones)
Date: Fri Aug 22 16:30:18 2003
Subject: [spambayes-dev] Re: SpamBayes Readme
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E21@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E21@its-xchg4.massey.ac.nz>
Message-ID: <200308201154.07329.richardjones@optushome.com.au>
On Wed, 20 Aug 2003 11:28 am, Meyer, Tony wrote:
> BTW, there was a suggestion a while back (from Skip) that we adopt the
> format that ssh uses for port forwarding: "remote host:remote port:local
> port". For example: "pop.example.com:110:110". Do you think this would
> be any clearer?
I think it would be. Note that there's a large number of people out there that
won't know that POP is normally on port 110 :)
Why not just arbitrarily assign a port number if the user doesn't supply one?
The roundup demo script does this as a means of making things easier on the
end user via this code:
    # figure basic params for server
    hostname = socket.gethostname()
    # pick a fairly odd, random port
    port = 8917
    while 1:
        print 'Trying to set up web server on port %d ...'%port,
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.connect((hostname, port))
        except socket.error, e:
            if not hasattr(e, 'args') or e.args[0] != errno.ECONNREFUSED:
                raise
            print 'should be ok.'
            break
        else:
            s.close()
            print 'already in use.'
            port += 100
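
For comparison, a rough sketch of the same "start at an odd port and step until one is free" idea in present-day Python. It is not the roundup code: it probes by trying to bind the port rather than by connecting to it, and pick_port, port_in_use and the injectable in_use argument are names invented here.

```python
import errno
import socket


def port_in_use(port, host="127.0.0.1"):
    # Probe by binding: if the bind fails because the address is taken
    # (or the port is privileged), treat the port as unavailable.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((host, port))
    except OSError as exc:
        if exc.errno in (errno.EADDRINUSE, errno.EACCES):
            return True
        raise
    else:
        return False
    finally:
        s.close()


def pick_port(start=8917, step=100, in_use=port_in_use):
    # Walk upwards from the starting port until the probe reports a free one.
    port = start
    while port < 65536:
        if not in_use(port):
            return port
        port += step
    raise RuntimeError("no free port found")
```

Passing a different in_use callable makes the stepping logic easy to try out without touching the network.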
> I haven't heard much about how well Mail.app filters. Is it really that
> bad?
The filtering is fine - the spam detection isn't so crash-hot (from what
people have told me - I don't get much spam at that address). The interface
*rocks* though ... for a given message, there's a button with which you can
simply say "this is spam" or "you marked this as spam and got it wrong". No
idea what sort of scheme they're using under the covers. I'm assuming the
outlook interface is like that. Unless KMail includes spambayes by default
(oooh!) it's unlikely to get that level of integration.
Richard
From T.A.Meyer at massey.ac.nz Wed Aug 20 14:28:05 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 22 17:07:54 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E21@its-xchg4.massey.ac.nz>
> That's a huge improvement for new users over the existing
> readme!
It really is only a reordering of INTEGRATION.TXT for the most part.
(The IMAP stuff being an exception, which is my fault anyway).
> I'm a little bemused by your having
> "everyone else" before "procmail filtering" though :)
That's an area that needs improvement. The thing is that the "everyone
else" section applies to procmail (and vi/emacs later on) users. I'm
not sure where the information can really go, without duplicating it in
each section. I'll integrate Peter's procmail steps and see if that
helps.
> No mention of setup.py ... is it supposed to be used at all?
Oops. I don't use it, so forgot about it. Yes, there should be an
"installation" action before anything else, although it's not strictly
necessary.
> If it is, I suggest that the scripts be renamed to make them a
> little more SB-specific (eg. "sb-pop3proxy"), as on my system
> it installs them all to /usr/bin (as I used /usr/bin/python2.3
> to install them).
This was suggested and debated not all that long ago on the -dev list.
IIRC, there was enough agreement that this probably will
happen...possibly by the next release (better sooner than later, I
suppose). It's an annoying task, though, because you have to go through
*everything* and make sure that you've corrected all the names...
> Running pop3proxy.py, the first problem is that the
> webbrowser module is broken for me...
[...]
> [might be useful for your later reference if the problem pops up]
Thanks.
> After I fixed that, the web interface came up nicely. I then
> tried to set up a proxy for my main email account. I stupidly
> put in "110" as the port number (having not read the instructions
> fully, I thought I was entering the port number for the target
> pop server, not the local port), and promptly got an error and
> a whole slew of output to the console pop3proxy was running in
> consisting of a constant stream of:
>
> warning: unhandled read event
> warning: unhandled write event
>
> Ehem. So, I need to choose a port number above 1024 there :)
This isn't a restriction for everyone, of course. Windows users are
able to use 110 as the port without needing to be logged in as
administrator or anything. There should be a note about this, though.
It could probably handle the error more nicely, too.
BTW, there was a suggestion a while back (from Skip) that we adopt the
format that ssh uses for port forwarding: "remote host:remote port:local
port". For example: "pop.example.com:110:110". Do you think this would
be any clearer?
> The IMAP box is already spam-filtered (poorly) by Mail.app.
I haven't heard much about how well Mail.app filters. Is it really that
bad?
=Tony Meyer
From T.A.Meyer at massey.ac.nz Wed Aug 20 14:12:09 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 22 17:12:28 2003
Subject: [spambayes-dev] FAQ Contribution
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302D92E12@its-xchg4.massey.ac.nz>
> Q: I notice the web interface rejects browser access unless
> the browser is running on the same host. How do I enable web
> access from other nodes on the LAN?
>
> A: Edit the bayescustomize.ini script. Just below the line
> '[html_ui]', add the line 'allow_remote_connections:True'.
> But make sure you firewall off outside access to port 8880,
> to stop unauthorised users from messing with the web interface.
Thanks. I've added a FAQ about this, with additional information about
the new-to-cvs ability to specify individual IP [ranges] that are
allowed access.
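
For reference, the setting quoted in the FAQ answer can be read with the standard library as sketched below. This is only an illustration: SpamBayes loads its options through its own Options code, not through this snippet, and only the section and option names come from the text above.

```python
import configparser

# The two lines from the FAQ answer, as they would appear in the ini file.
INI_TEXT = """\
[html_ui]
allow_remote_connections:True
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# getboolean() understands True/False, yes/no, on/off and 1/0.
allow_remote = parser.getboolean("html_ui", "allow_remote_connections")
print(allow_remote)
```

The firewall advice in the answer still applies; the option only controls what the web interface itself will accept.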
=Tony Meyer
From anthony at interlink.com.au Sat Aug 23 18:26:42 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Sun Aug 24 00:42:50 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz>
Message-ID: <200308230726.h7N7Qgx1015054@localhost.localdomain>
>>> "Meyer, Tony" wrote
> > > BTW, there was a suggestion a while back (from Skip) that
> > > we adopt the
> > > format that ssh uses for port forwarding: "remote host:remote
> > > port:local port". For example: "pop.example.com:110:110". Do you
> > > think this would be any clearer?
> >
> > I think it would be.
>
> It would be interesting to hear what the other -dev people have to say.
> There was a suspicious quiet after Skip suggested it ;) It looks more
> confusing to me, but then I've never used ssh port forwarding...
>
So long as we actually follow the actual ssh approach, which is
localport:remotehost:remoteport
Anthony
--
Anthony Baxter
It's never too late to have a happy childhood.
From richie at entrian.com Sun Aug 24 11:00:34 2003
From: richie at entrian.com (Richie Hindle)
Date: Sun Aug 24 06:17:59 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
Message-ID: <6jvgkvokp45a0fck4mib4781ql8c97897d@4ax.com>
[Resend]
[Tony]
> there was a suggestion a while back (from Skip) that
> we adopt the format that ssh uses for port forwarding:
> For example: "pop.example.com:110:110".
> [...]
> It would be interesting to hear what the other -dev people have to say.
> There was a suspicious quiet after Skip suggested it ;) It looks more
> confusing to me, but then I've never used ssh port forwarding...
Personally I don't like it - I imagine someone unfamiliar with the idea of
ports would not have a clue what it meant. By separating the local and
remote ports into different fields, we separate the concepts rather than
muddling them together (there's an added complication relating items in
one list with items in the other, but that's not difficult and you only
typically have a small number of them). Maybe the fix is as simple as
changing the Configuration page to say "Spambayes ports" or "Local ports"
or "Listening ports" instead of just "Ports"?
--
Richie Hindle
richie@entrian.com
From richie at entrian.com Sat Aug 23 12:12:43 2003
From: richie at entrian.com (Richie Hindle)
Date: Sun Aug 24 06:59:03 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz>
Message-ID:
[Tony]
> there was a suggestion a while back (from Skip) that
> we adopt the format that ssh uses for port forwarding:
> For example: "pop.example.com:110:110".
> [...]
> It would be interesting to hear what the other -dev people have to say.
> There was a suspicious quiet after Skip suggested it ;) It looks more
> confusing to me, but then I've never used ssh port forwarding...
Personally I don't like it - I imagine someone unfamiliar with the idea of
ports would not have a clue what it meant. By separating the local and
remote ports into different fields, we separate the concepts rather than
muddling them together (there's an added complication relating items in
one list with items in the other, but that's not difficult and you only
typically have one or two of them). Maybe the fix is as simple as
changing the Configuration page to say "Spambayes ports" or "Local ports"
or "Listening ports" instead of just "Ports"?
--
Richie Hindle
richie@entrian.com
From vanhorn at whidbey.com Sun Aug 24 14:44:35 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Sun Aug 24 16:51:36 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
References: <6jvgkvokp45a0fck4mib4781ql8c97897d@4ax.com>
Message-ID: <3F4923B3.A6129656@whidbey.com>
Greetings:
I've been meaning to ask if someone couldn't generate a new source collection
for the download page. It looks like 1.0a4 was released on 7 July, and I know
that at least two major issues (unicode in headers and default database) have
been dealt with since then.
Either that or come over here this afternoon and help figure out why I've
never been able to get CVS working!
Van
--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD
For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------
From paul.huygen at huygen.nl Sun Aug 24 16:34:24 2003
From: paul.huygen at huygen.nl (Paul Huygen)
Date: Sun Aug 24 18:12:32 2003
Subject: [spambayes-dev] Proposal for Emacs script to save spam-nonspam
messages for training purposes
Message-ID: <16200.48864.657573.887131@Grootgrut.hit>
Hi,
While looking on the web for code to optimize emacs's VM for handling
spam, I found (in url http://mail.python.org/pipermail/spambayes-checkins/2003-May/001291.html) your script:
> (defun copy-to-spam ()
> (interactive)
> (vm-save-message (expand-file-name "~/tmp/newspam"))
> (vm-undelete-message 1))
>
> (defun copy-to-nonspam ()
> (interactive)
> (vm-save-message (expand-file-name "~/tmp/newham"))
> (vm-undelete-message 1))
>
> (define-key vm-mode-map "ls" 'copy-to-spam)
> (define-key vm-summary-mode-map "ls" 'copy-to-spam)
> (define-key vm-mode-map "lh" 'copy-to-nonspam)
> (define-key vm-summary-mode-map "lh" 'copy-to-nonspam)
Thank you for this script. However, it did not run properly in my
case. The function "copy-to-spam" saves the current message into
"~/tmp/newspam", then marks it for deletion, jumps to the next message and
undeletes that message. In my case, the following modification of e.g.
copy-to-spam seems to work:
(defun copy-to-spam ()
  (interactive)
  (let ((vm-move-after-deleting nil))
    (vm-save-message (expand-file-name "~/mail/mboxes/newspam")))
  (let ((vm-move-after-undeleting t))
    (vm-undelete-message 1)))
Best regards,
Paul Huygen
From T.A.Meyer at massey.ac.nz Mon Aug 25 13:43:13 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sun Aug 24 20:44:29 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDABA5@its-xchg4.massey.ac.nz>
> Personally I don't like it - I imagine someone unfamiliar
> with the idea of ports would not have a clue what it meant.
That's my opinion, too.
> Maybe the
> fix is as simple as changing the Configuration page to say
> "Spambayes ports" or "Local ports" or "Listening ports"
> instead of just "Ports"?
This sounded like a good idea, so I checked it in :) I guess we leave
the local:remote:remote format for now, since it's undecided whether
it's good or not, and either way it's more work ;)
=Tony Meyer
From richie at entrian.com Sun Aug 24 23:41:22 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Aug 25 01:18:10 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
Message-ID:
[Van]
> I've been meaning to ask if someone couldn't generate a new source collection
> for the download page. It looks like 1.0a4 was released on 7 July, and I know
> that at least two major issues (unicode in headers and default database) have
> been dealt with since then.
I've been wondering this myself (mostly because I'm fed up with replying
to people complaining about that unicode header bug!)
However, there's an issue with the web interface (messages not appearing
on the Review page) that I'd like to look at before the next release.
Hopefully I'll have time to look at that in the next couple of days - if
anyone was going to be mad-keen enough to build a release right now, could
they hold off until I've looked at that? Ta. (I'm happy to build the
release myself, unless anyone else particularly wants to do it.)
Is there anyone else with a pending edit that they'd like to see in 1.0a5?
(There's Romain's HTTP Auth patch, 791393, which should be my job to apply
but I don't know when I'll get the chance.)
--
Richie Hindle
richie@entrian.com
From T.A.Meyer at massey.ac.nz Mon Aug 25 19:36:24 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 25 02:37:11 2003
Subject: [spambayes-dev] 1.0a5 release
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD51@its-xchg4.massey.ac.nz>
> I've been wondering this myself (mostly because I'm fed up
> with replying to people complaining about that unicode header bug!)
Indeed. I imagine that there will be a new Outlook plug-in release
soon, too, and it's nice when they come out at similar times. (Although
Mark's up to 8, and we're up to 5 ;).
> However, there's an issue with the web interface (messages
> not appearing on the Review page) that I'd like to look at
> before the next release. Hopefully I'll have time to look at
> that in the next couple of days
I agree that this really must be fixed; I've been meaning to look at it
too (it was probably me or TimS that broke it) but haven't had a chance
yet either.
> (I'm happy to build the
> release myself, unless anyone else particularly wants to do it.)
+1 to you doing it :)
> Is there anyone else with a pending edit that they'd like to
> see in 1.0a5?
I checked in the new readmes and modified smtpproxy today, which I
wanted in.
I'd actually like to see 1.0b1, though. In terms of the API, we must be
pretty stable now, right? Except that if we are going to do the
renaming thing (as proposed by Greg), we should probably do that (really
the sooner the better if we are going to, even if we are still at
1.0a5).
I think we should also catch the 'no dbm available' traceback (I can't
remember the wording) and replace it with a nice error for (at least)
pop3proxy users, so that those using dumbdbm don't flood us with "1.0a5
doesn't work" messages.
> (There's Romain's HTTP Auth patch, 791393, which should be my
> job to apply but I don't know when I'll get the chance.)
If you don't, I can probably do this.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Mon Aug 25 19:43:08 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 25 02:43:49 2003
Subject: [spambayes-dev] 1.0a5 release
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD52@its-xchg4.massey.ac.nz>
[me]
> I'd actually like to see 1.0b1, though.
One thing I forgot: I still like the cuteness of releasing the next
version on the 4th of September, at which point SpamBayes will have been
at sourceforge for exactly a year. Although this means people have to
wait another 10 days, it could very well be another ten days before we
get everything ready to go... :)
(Mark could buy us a birthday cake with all the $0's he has collected
from sales of the Outlook plug-in ;)
=Tony Meyer
From T.A.Meyer at massey.ac.nz Mon Aug 25 19:55:45 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 25 02:56:25 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD57@its-xchg4.massey.ac.nz>
> Is there anyone else with a pending edit that they'd like to
> see in 1.0a5?
Ok, there was more I forgot:
o I'm going to put in the request for notate_to to optionally work on
ham & unsure messages.
o I'll also put in the request for an 'advanced' config page for the
web ui.
o I'd like to be able to get the overkill script [renamed and] working
with OE (it works with other MUAs). I might not get time for this, and
it's not that important.
o The fix that Mark & I made to pop3proxy_service so that smtpproxy
also starts needs an improvement (not surprisingly, with my bit, not
Mark's), in that the stop() function doesn't do what it should.
(unless anyone has any objections to any of these).
There's also the option of ripping out all the backwards compatibility
code from Options.py/OptionsClass.py. This might mean that the odd
config file no longer works, but it's got to be done at some time, and
would make the code a lot tidier.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Mon Aug 25 21:10:27 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 25 04:11:26 2003
Subject: [spambayes-dev] stopping pop3proxy
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD5F@its-xchg4.massey.ac.nz>
[Tony]
> > Done. (There are now only three pop3proxy functions of
> > interest: prepare(), start() and stop(). They do what you
> > would expect, and all take a pop3proxy.State object (like
> > pop3proxy.state) as the only parameter).
[Mark]
> stop() appears dangerous - remember it is called
> asynchronously, so another thread could be doing almost
> anything. What we really need is a trigger to tell the main
> loop to stop, and the save should be done as that loop terminates.
I understand what you're saying, but I don't know how to do this -
hopefully Richie does. The closest I can think of is that stop() needs
to call stop_when_done() on each of the BayesProxy objects, and then
wait until they have all stopped (I don't know how to check this). Then
save, then return.
Richie - any advice?
Also in terms of stopping pop3proxy, on my fiancée's computer pop3proxy
is running all the time so that she doesn't have to know how to start &
stop it when checking for mail (it's too old for pop3proxy_service).
This works fine, except that when she shuts down Windows, it comes up
with a "can't shut this program down" message (it does terminate it
after a delay, but it's annoying). Is there some way that we can handle
this? I presume Windows sends a sigterm or something to the
application? Would other OSs do exactly the same thing?
=Tony Meyer
From richie at entrian.com Mon Aug 25 11:18:51 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Aug 25 05:19:17 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD57@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD57@its-xchg4.massey.ac.nz>
Message-ID:
[Tony]
> There's also the option of ripping out all the backwards compatibility
> code from Options.py/OptionsClass.py.
> [...]
> I'd actually like to see 1.0b1, though. In terms of the API, we must be
> pretty stable now, right? Except that if we are going to do the
> renaming thing (as proposed by Greg), we should probably do that (really
> the sooner the better if we are going to, even if we are still at
> 1.0a5).
I'm +1 on both pulling out the backward-compatibility code and on renaming
everything, but I don't think we can do that in a beta release - even the
first one. Major changes like that should happen during the alpha cycle
IMHO.
It could even be worth releasing 1.0a5 *before* making those edits, with
an announcement that the old options and script names are deprecated, then
immediately releasing 1.0a6 with just those edits in place. That way,
no-one will be obliged to swallow the new names in order to get their
hands on a bugfix.
> I've been meaning to look at [the problem of messages not appearing on
> the review page] too
If I find a decent amount of time to devote to it I'll let you know, and
if you could do the same then we won't duplicate the work.
> I think we should also catch the 'no dbm available' traceback (I can't
> remember the wording) and replace it with a nice error for (at least)
> pop3proxy users, so that those using dumbdbm don't flood us with "1.0a5
> doesn't work" messages.
+1
> I still like the cuteness of releasing the next
> version on the 4th of September, at which point SpamBayes will have been
> at sourceforge for exactly a year.
I'd like to say +1, especially since it's my birthday too! (Spambayes is
exactly 31 years younger than me, and already considerably brighter 8-)
But if we're ready more than a couple of days before that, we should
probably go ahead.
--
Richie Hindle
richie@entrian.com
From vanhorn at whidbey.com Mon Aug 25 04:02:38 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Mon Aug 25 06:02:42 2003
Subject: [spambayes-dev] 1.0a5 release
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD52@its-xchg4.massey.ac.nz>
Message-ID: <3F49DEBE.ACE7BB49@whidbey.com>
Okay, how about 1.0a5 in the next couple of days, and then go for broke with
1.0b1 on the anniversary?
Van
"Meyer, Tony" wrote:
> [me]
> > I'd actually like to see 1.0b1, though.
>
> One thing I forgot: I still like the cuteness of releasing the next
> version on the 4th of September, at which point SpamBayes will have been
> at sourceforge for exactly a year. Although this means people have to
> wait another 10 days, it could very well be another ten days before we
> get everything ready to go... :)
>
> (Mark could buy us a birthday cake with all the $0's he has collected
> from sales of the Outlook plug-in ;)
>
> =Tony Meyer
>
> _______________________________________________
> spambayes-dev mailing list
> spambayes-dev@python.org
> http://mail.python.org/mailman/listinfo/spambayes-dev
From richie at entrian.com Mon Aug 25 14:15:13 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Aug 25 08:15:34 2003
Subject: [spambayes-dev] stopping pop3proxy
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD5F@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAD5F@its-xchg4.massey.ac.nz>
Message-ID:
[Mark]
> stop() appears dangerous - remember it is called
> asynchronously, so another thread could be doing almost
> anything. What we really need is a trigger to tell the main
> loop to stop, and the save should be done as that loop terminates.
[Tony]
> I understand what you're saying, but I don't know how to do this -
> hopefully Richie does. The closest I can think of is that stop() needs
> to call stop_when_done() on each of the BayesProxy objects, and then
> wait until they have all stopped (I don't know how to check this). Then
> save then return.
>
> Richie - any advice?
What needs to be done is to refuse all new connections (or rather accept
them but push() back an error message and call close_when_done() when
some flag, probably state.isShuttingDown or similar, is set),
then exit when all current connections complete (close_when_done is
per-connection, so it's not what we want - the connections will close
anyway). I think the best way to do this is to call sys.exit() when
BayesProxy.close() is called and state.activeSessions goes to zero with
state.isShuttingDown set. I'll try to have a look at this - if anybody
else wants to look at it, let me know so we don't duplicate the work.
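
The scheme described here can be sketched roughly as follows. This is a simplified model, not pop3proxy code: State and Session are hypothetical stand-ins for pop3proxy.State and BayesProxy, the attribute names isShuttingDown and activeSessions follow the message, and an "exited" flag stands in for saving the database and calling sys.exit().

```python
class State:
    """Hypothetical stand-in for pop3proxy.State."""

    def __init__(self):
        self.isShuttingDown = False
        self.activeSessions = 0
        self.exited = False  # stands in for "save, then sys.exit()"


class Session:
    """One client connection (simplified stand-in for BayesProxy)."""

    def __init__(self, state):
        self.state = state
        # New connections are refused once shutdown has been requested.
        # The real proxy would push() an error and close_when_done() here.
        self.accepted = not state.isShuttingDown
        if self.accepted:
            state.activeSessions += 1

    def close(self):
        if not self.accepted:
            return
        self.accepted = False  # guard against counting a close twice
        self.state.activeSessions -= 1
        if self.state.isShuttingDown and self.state.activeSessions == 0:
            self.state.exited = True


def stop(state):
    """Request shutdown; finish at once if nothing is in flight."""
    state.isShuttingDown = True
    if state.activeSessions == 0:
        state.exited = True


state = State()
first = Session(state)     # accepted, one active session
stop(state)                # shutdown requested while a session is open
late = Session(state)      # refused
first.close()              # last active session ends, so the proxy exits
print(late.accepted, state.exited)
```

Because stop() only sets a flag, it is safe to call from another thread in this model; the actual exit happens on whichever path closes the last session.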
> Also in terms of stopping pop3proxy, on my fiance's computer pop3proxy
> is running all the time so that she doesn't have to know how to start &
> stop it when checking for mail (it's too old for pop3proxy_service).
> This works fine, except that when she shuts down Windows, it comes up
> with a "can't shut this program down" message (it does terminate it
> after a delay, but it's annoying). Is there some way that we can handle
> this? I presume Windows sends a sigterm or something to the
> application? Would other OSs do exactly the same thing?
Windows sends a message to all top-level windows on shutdown, but the
pop3proxy has no windows. Oddly, my XP box shuts down fine, with no
warnings - I can't explain that... Mark?
--
Richie Hindle
richie@entrian.com
From mhammond at skippinet.com.au Mon Aug 25 23:31:41 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Aug 25 08:31:50 2003
Subject: [spambayes-dev] stopping pop3proxy
In-Reply-To:
Message-ID: <037d01c36b04$d61fdf00$f502a8c0@eden>
> > Also in terms of stopping pop3proxy, on my fiance's
> computer pop3proxy
> > is running all the time so that she doesn't have to know
> how to start &
> > stop it when checking for mail (it's too old for pop3proxy_service).
> > This works fine, except that when she shuts down Windows,
> it comes up
> > with a "can't shut this program down" message (it does terminate it
> > after a delay, but it's annoying). Is there some way that
> we can handle
> > this? I presume Windows sends a sigterm or something to the
> > application? Would other OSs do exactly the same thing?
>
> Windows sends a message to all top-level windows on shutdown, but the
> pop3proxy has no windows. Oddly, my XP box shuts down fine, with no
> warnings - I can't explain that... Mark?
I've seen similar things on Win9x, but not the NT platform. Windows does
try and shut down console apps, but I have no idea exactly what it does.
Or-even-roughly
Mark.
From papaDoc at videotron.ca Mon Aug 25 11:17:38 2003
From: papaDoc at videotron.ca (papaDoc)
Date: Mon Aug 25 10:35:53 2003
Subject: [spambayes-dev] Patch for pop3proxy
Message-ID: <3F4A1A82.9080106@videotron.ca>
Hi,
I have this problem with pop3proxy.py (cvs of 2003.08.25) and python 2.2.2
SpamBayes POP3 Proxy Beta1, version 0.1 (May 2003),
using SpamBayes POP3 Proxy Web Interface Alpha2, version 0.02
and engine SpamBayes Beta2, version 0.2 (July 2003).
Loading database... Filename for database =
d:/NoBackup/users/ricard/Spambayes/hammie.db
Traceback (most recent call last):
File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 819, in ?
run()
File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 804, in run
prepare(state=state)
File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 746, in prepare
state.createWorkers()
File "C:\Devtools\SPAMBA~1\SPAMBA~2.25\POP3PR~1.PY", line 614, in
createWorkers
if '::' in filename:
TypeError: 'in <string>' requires character as left operand
So this is a patch to solve this problem
***************
*** 611,617 ****
filename = os.path.expanduser(filename)
print "Filename for database = %s" % filename
if self.useDB:
! if re.search(r'::', filename):
sql_types = {"pgsql" : storage.PGClassifier,
"mysql" : storage.mySQLClassifier,
}
--- 611,617 ----
filename = os.path.expanduser(filename)
print "Filename for database = %s" % filename
if self.useDB:
! if '::' in filename:
sql_types = {"pgsql" : storage.PGClassifier,
"mysql" : storage.mySQLClassifier,
}
I don't know if this is the good way to do it but it solves my
problem.......
Remi
From skip at pobox.com Mon Aug 25 11:37:15 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 25 11:37:32 2003
Subject: [spambayes-dev] Patch for pop3proxy
In-Reply-To: <3F4A1A82.9080106@videotron.ca>
References: <3F4A1A82.9080106@videotron.ca>
Message-ID: <16202.11563.472963.116863@montanaro.dyndns.org>
papaDoc> if '::' in filename:
papaDoc> TypeError: 'in <string>' requires character as left operand
Fixed in CVS using string object's find() method. (Actual fix is in
spambayes/storage.py now due to some reshuffling over the weekend.)
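
A minimal sketch of that kind of check, written as a standalone helper. The function name and the sample strings are made up for illustration; only the find() approach comes from the fix described above.

```python
def has_double_colon(filename):
    # str.find() returns the index of the first match, or -1 if the
    # substring is absent, so it works as a substring test wherever a
    # multi-character left operand to "in" is not accepted.
    return filename.find("::") != -1


print(has_double_colon("pgsql::example"))   # hypothetical value containing "::"
print(has_double_colon("d:/NoBackup/users/ricard/Spambayes/hammie.db"))
```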
papaDoc> I don't know if this is the good way to do it but it solves my
papaDoc> problem.......
It worked, but using regular expressions may have been overkill.
Probably my favorite Internet quote of all time is this one from Jamie
Zawinski:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
Skip
From skip at pobox.com Mon Aug 25 13:00:12 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 25 13:00:25 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To:
References:
Message-ID: <16202.16540.742359.215660@montanaro.dyndns.org>
Richie> Is there anyone else with a pending edit that they'd like to see
Richie> in 1.0a5?
We didn't resolve the issue of the print statements in storage.py. I have a
simple change which will shoot them out to sys.stderr instead. I think that
should be considered a bug fix, not an enhancement.
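
In outline the change amounts to something like the sketch below, shown with the modern print() function. The helper name is hypothetical and this is not the actual storage.py code.

```python
import sys


def report(message):
    # Send diagnostics to stderr so that stdout carries only real output.
    print(message, file=sys.stderr)


report("Loading database...")
```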
Skip
From richie at entrian.com Mon Aug 25 20:02:10 2003
From: richie at entrian.com (Richie Hindle)
Date: Mon Aug 25 14:02:15 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To: <16202.16540.742359.215660@montanaro.dyndns.org>
References:
<16202.16540.742359.215660@montanaro.dyndns.org>
Message-ID:
[Richie]
> Is there anyone else with a pending edit that they'd like to see
> in 1.0a5?
[Skip]
> We didn't resolve the issue of the print statements in storage.py. I have a
> simple change which will shoot them out to sys.stderr instead. I think that
> should be considered a bug fix, not an enhancement.
+1 from me. Is there a reason *not* to do it that I'm not aware of?
--
Richie Hindle
richie@entrian.com
From skip at pobox.com Mon Aug 25 15:02:44 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 25 15:05:10 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To:
References:
<16202.16540.742359.215660@montanaro.dyndns.org>
Message-ID: <16202.23892.947280.850525@montanaro.dyndns.org>
>> We didn't resolve the issue of the print statements in storage.py. I
>> have a simple change which will shoot them out to sys.stderr instead.
>> I think that should be considered a bug fix, not an enhancement.
Richie> +1 from me. Is there a reason *not* to do it that I'm not aware
Richie> of?
I thought someone else had an alternative solution which involved dumping
the prints altogether. I'll check in my change in a moment.
Skip
From skip at pobox.com Mon Aug 25 15:23:42 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 25 15:24:00 2003
Subject: [spambayes-dev] Proposal for Emacs script to save spam-nonspam
messages for training purposes
In-Reply-To: <16200.48864.657573.887131@Grootgrut.hit>
References: <16200.48864.657573.887131@Grootgrut.hit>
Message-ID: <16202.25150.488685.552545@montanaro.dyndns.org>
Paul> Thank you for this script. However, it did not run properly in my
Paul> case. The function "copy-to-spam" saves the current message into
Paul> "~/tmp/newspam", then marks it for deletion, jumps to the next
Paul> message and undeletes that message. In my case, the following
Paul> modification of e.g. copy-to-spam seems to work:
Paul> (defun copy-to-spam ()
Paul> (interactive)
Paul> (let ((vm-move-after-deleting nil))
Paul> (vm-save-message (expand-file-name "~/mail/mboxes/newspam")))
Paul> (let ((vm-move-after-undeleting t)) (vm-undelete-message 1)))
Thanks for the feedback. FWIW, I don't use those precise Emacs Lisp
incantations anymore. I'm not too surprised they didn't work in all
situations. VM is a fairly complex beast.
Here's what I do now.
(defun train-as-spam ()
  (interactive)
  (let ((vm-delete-after-saving nil))
    (vm-save-message (expand-file-name "~/tmp/newspam"))
    (vm-add-message-labels "trained" 1))
  (vm-pipe-message-to-command "hammiefilter.py -s >/dev/null" nil))
(defun train-as-nonspam ()
  (interactive)
  (let ((vm-delete-after-saving nil))
    (vm-save-message (expand-file-name "~/tmp/newham"))
    (vm-add-message-labels "trained" 1))
  (vm-pipe-message-to-command "hammiefilter.py -g >/dev/null" nil))
(define-key vm-mode-map "ls" 'train-as-spam)
(define-key vm-summary-mode-map "ls" 'train-as-spam)
(define-key vm-mode-map "lh" 'train-as-nonspam)
(define-key vm-summary-mode-map "lh" 'train-as-nonspam)
It changes two things. One, it tries not to delete the message, so the
problem you encountered should be gone. Two, it trains on the message. I
don't think I've encountered any problems in this regard, though it's worth
noting that mail could be arriving at the same time as hammiefilter has the
database open for write.
I'll update the faq.
Skip
From skip at pobox.com Mon Aug 25 15:32:42 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 25 15:48:33 2003
Subject: [spambayes-dev] Website problems
Message-ID: <16202.25690.487337.38536@montanaro.dyndns.org>
I tried pushing a change to the faq just now, but got errors from rsync:
% make install
cd download ; make install
Push to shell1.sourceforge.net:/home/groups/s/sp/spambayes/htdocs//download ...
rsync --rsh=ssh -v -r -l -t --update --exclude-from=../scripts/rsync-excludes ./ shell1.sourceforge.net:/home/groups/s/sp/spambayes/htdocs//download
building file list ... done
rsync: recv_generator: mkdir "/home/groups/s/sp/spambayes/htdocs//download": No such file or directory (2)
stat /home/groups/s/sp/spambayes/htdocs//download : No such file or directory
rsync: recv_generator: mkdir "/home/groups/s/sp/spambayes/htdocs//download": No such file or directory (2)
stat /home/groups/s/sp/spambayes/htdocs//download : No such file or directory
wrote 32 bytes read 20 bytes 11.56 bytes/sec
total size is 0 speedup is 0.00
rsync error: some files could not be transferred (code 23) at main.c(620)
make[1]: *** [install] Error 23
make: *** [local_install] Error 2
I logged into shell1.sourceforge.net and see that /home/groups/s is empty.
The login message says:
On 2003-08-23, one project file server (of seven) suffered a multi-disk
failure; data from this file server has been restored from tape. At
this time, the filesystem for impacted projects is marked read-only
pending completion of our analysis and resolution of this issue. A
small number of users will not be able to login until this resolution
occurs. Watch the "Site Status" page on the SourceForge.net site for
updates. Projects served from this file server start with the letters
j, q, s, and y.
I guess this means that for the time being we can't make changes to the
website. Has anyone else encountered this?
Skip
From skip at pobox.com Mon Aug 25 15:41:56 2003
From: skip at pobox.com (Skip Montanaro)
Date: Mon Aug 25 15:51:33 2003
Subject: [spambayes-dev] RE: SpamBayes Readme
In-Reply-To: <200308230726.h7N7Qgx1015054@localhost.localdomain>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDA1FA@its-xchg4.massey.ac.nz>
<200308230726.h7N7Qgx1015054@localhost.localdomain>
Message-ID: <16202.26244.998766.755415@montanaro.dyndns.org>
Anthony> So long as we actually follow the actual ssh approach, which is
Anthony> localport:remotehost:remoteport
Yeah, which reads to me, "when the user connects to localport, forward it to
remotehost:remoteport". I apologize if I screwed up the syntax in my
original suggestion.
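
A minimal sketch of reading that form, assuming a plain hostname (an IPv6 literal would need more care). The function name and the local port 8110 are made up for illustration.

```python
def parse_forward(spec):
    """Split 'localport:remotehost:remoteport' into its three parts."""
    local_port, remote_host, remote_port = spec.split(":")
    return int(local_port), remote_host, int(remote_port)


# Listen locally on 8110 and forward to the POP3 server's port 110.
print(parse_forward("8110:pop.example.com:110"))
```

A malformed value (too few or too many fields, or a non-numeric port) raises ValueError, which a configuration page could catch and turn into a friendlier message.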
Skip
From T.A.Meyer at massey.ac.nz Tue Aug 26 12:20:13 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 25 19:21:14 2003
Subject: [spambayes-dev] Patch for pop3proxy
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAED9@its-xchg4.massey.ac.nz>
> papaDoc> if '::' in filename:
> papaDoc> TypeError: 'in <string>' requires character as
> left operand
>
> Fixed in CVS using string object's find() method. (Actual
> fix is in spambayes/storage.py now due to some reshuffling
> over the weekend.)
Sorry, this was me. Substring matching with "in" is something that was
added in Python 2.3, wasn't it? I had forgotten that.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Tue Aug 26 12:26:17 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 25 19:27:10 2003
Subject: [spambayes-dev] stopping pop3proxy
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAEE4@its-xchg4.massey.ac.nz>
> What needs to be done is
[...clever sounding complicated stuff...]
> I'll try to have a look
> at this - if anybody else wants to look at it, let me know so
> we don't duplicate the work.
All yours :)
[Richie]
> Windows sends a message to all top-level windows on shutdown,
> but the pop3proxy has no windows.
[Mark]
> I've seen similar things on Win9x, but not the NT platform.
> Windows does try and shut down console apps, but I have no idea
> exactly what it does.
Perhaps if I ran pop3proxy with pythonw instead of python, so that there
wasn't a console window? I'll give that a go tonight. I suppose that
whatever Windows does to try and shut down console apps must be
documented somewhere on msdn, so I could go look for it myself...
=Tony Meyer
From mhammond at skippinet.com.au Tue Aug 26 10:40:18 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon Aug 25 19:40:29 2003
Subject: [spambayes-dev] stopping pop3proxy
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAEE4@its-xchg4.massey.ac.nz>
Message-ID: <15e101c36b62$3e80a090$f502a8c0@eden>
[Tony]
> Perhaps if I ran pop3proxy with pythonw instead of python, so
> that there
> wasn't a console window? I'll give that a go tonight. I suppose that
> whatever Windows does to try and shut down console apps must be
> documented somewhere on msdn, so I could go look for it myself...
No, I think the issue will actually be "does the app have a message queue?".
Even if running under pythonw.exe, there is no way for Windows to cleanly
shut down a Python app other than a "console control event" (I think it is
called).
Running a message queue would also allow you to detect logon/logoff etc.
events under Win9x (NT platforms should just use the service).
Back to the original issue - I thought a common way to shut down a server
like this was simply to make a local connection to the server and issue a
shutdown command. Or was it just to make a temporary local connection just
to wake up each listener so it can shut down? Either way, I have no clue
about how pop3proxy is structured, so have no idea if this even makes sense
:)
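A rough sketch of the first idea above: the server stops when a local connection sends it a shutdown command. This says nothing about how pop3proxy is actually structured; the `SHUTDOWN` command, the function names and the one-line protocol are all assumptions made for the example, which targets a current Python 3.

```python
import socket
import threading


def serve_until_shutdown(listener):
    """Accept local connections until one sends the line SHUTDOWN."""
    while True:
        conn, _addr = listener.accept()
        with conn, conn.makefile("rb") as reader:
            command = reader.readline().strip()
            if command == b"SHUTDOWN":
                conn.sendall(b"OK\r\n")
                return
            conn.sendall(b"ERR unknown command\r\n")


def request_shutdown(port, host="127.0.0.1"):
    """Connect to the local server, ask it to stop, and return its reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(b"SHUTDOWN\r\n")
        with sock.makefile("rb") as reader:
            return reader.readline().strip()


# Demo: bind to an ephemeral loopback port, then stop the server remotely.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
server = threading.Thread(target=serve_until_shutdown, args=(listener,))
server.start()
reply = request_shutdown(listener.getsockname()[1])
server.join(timeout=5)
listener.close()
```

Binding the listener to 127.0.0.1 only means that machines elsewhere on the network cannot send the command.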
Mark.
From T.A.Meyer at massey.ac.nz Tue Aug 26 13:16:09 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Mon Aug 25 20:16:50 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAF4D@its-xchg4.massey.ac.nz>
>> We didn't resolve the issue of the print statements in storage.py.
>> I have a simple change which will shoot them out to sys.stderr
>> instead.
>> I think that should be considered a bug fix, not an enhancement.
[...]
> I thought someone else had an alternative solution which
> involved dumping the prints altogether. I'll check in my
> change in a moment.
I was the one suggesting removing them, but this would be an
enhancement, whereas your checkin was a bug fix. Instead of removing
them, I think that we might want to move towards an integer verbose
level at some point, which could mean that they are only printed at a
high enough level. The last time this was suggested it got some +0's,
but nothing else, so I presume people don't really care either way.
It's not urgent, though (IMO), and could wait until a later release.
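A minimal sketch of what an integer verbose level could look like. The names `make_reporter` and `report` are invented for the example and do not come from storage.py; the code assumes a current Python 3.

```python
import io
import sys


def make_reporter(verbose, stream=None):
    """Return report(level, message), which writes only if verbose >= level."""
    out = stream if stream is not None else sys.stderr

    def report(level, message):
        if verbose >= level:
            out.write(message + "\n")

    return report


# Demo with an in-memory stream standing in for sys.stderr.
captured = io.StringIO()
report = make_reporter(verbose=1, stream=captured)
report(1, "opening database")   # shown: level 1 <= verbose
report(2, "per-token detail")   # suppressed: level 2 > verbose
```

With this shape the existing prints would each pick a level, and a plain boolean verbose option maps onto levels 0 and 1.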
=Tony Meyer
From ta-meyer at ihug.co.nz Tue Aug 26 17:34:39 2003
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Tue Aug 26 00:35:18 2003
Subject: [spambayes-dev] pop3proxy/imapfilter advanced configuration page
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AD4D@its-xchg4.massey.ac.nz>
I've just checked in an update to the web interface for pop3proxy/imapfilter
that provides an "Advanced Configuration" page (there's a button at the
bottom of the regular config page).
This was requested (#791254), and it seemed to make sense to me that some
people might want to play around with advanced options without having
to understand how to edit the config file by hand (and without needing
to use Python to get a list of the options available).
I tried to choose options that seemed too advanced to go on the regular
config page (and I moved a couple from there), but none that are simply too
complicated for people to use unless they know what they are doing. If
anyone thinks I've included something that I shouldn't have, or have missed
an option that I should have included, please let me know.
Any comments welcome.
=Tony Meyer
From T.A.Meyer at massey.ac.nz Tue Aug 26 18:17:51 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 26 01:18:38 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F1302EDB0E6@its-xchg4.massey.ac.nz>
> I'm +1 on both pulling out the backward-compatibility code
> and on renaming everything, but I don't think we can do that
> in a beta release - even the first one. Major changes like
> that should happen during the alpha cycle IMHO.
Fair enough. It just seems that every time things are stable enough to
label the release beta, something comes up. I guess the version info
for each app that's in Version.py clarifies this somewhat, though.
> It could even be worth releasing 1.0a5 *before* making those
> edits, with an announcement that the old options and script
> names are deprecated, then immediately releasing 1.0a6 with
> just those edits in place.
The deprecation of the options was announced ages ago (while you were
off creating a family), and everyone was instructed to change to the new
ones (a script was even provided to do this). I think these must be
ready to go for 1.0a5. Changing the names, though, hasn't been
announced at all.
Overall, then, +1 to your idea.
> I'd like to say +1, especially since it's my birthday too!
> But if we're ready more than a
> couple of days before that, we should probably go ahead.
Given the last couple of days, it looks like we probably will. Perhaps
1.0a5 in a couple of days and then 1.0a6 (as above) on your birthday?
=Tony Meyer
From skip at pobox.com Tue Aug 26 09:32:14 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 09:32:59 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <200308252112.32705.mark.tabash@novacolor.ca>
References: <200308251630.18299.mark.tabash@novacolor.ca>
<16202.30590.705370.358704@montanaro.dyndns.org>
<200308252112.32705.mark.tabash@novacolor.ca>
Message-ID: <16203.24926.483126.523549@montanaro.dyndns.org>
(cc'ing spambayes-dev)
Mark> Thanks. I guess I will delete the database and start over again.
Mark> But what guarantees me that this is not going to happen again?
There are no guarantees. We don't at this moment know what the problem is.
(What follows is perhaps more for the developers than Mark...)
I contacted Sleepycat about distributing binaries of their command line
executables (it doesn't seem they'd have a problem with it). In addition to
information about that I got this information about your specific error:
DB_RUNRECOVERY is the error that is returned when the library detects a
fatal error or structure in the shared region or environment files that
are used to coordinate the interaction between multiple threads of
control. Once this occurs, the shared region is marked invalid and the
application must be shut down, recovery must be run and the application
can be brought back up. Recovery can be run as a standalone utility
(db_recover) or from the application, by specifying DB_RECOVER when
opening the environment.
If the Outlook plugin is executing multiple threads, two of which might
operate on the database simultaneously, I suspect it will have to lock
access to the db file. Mark Hammond, does the above comment jibe with the
structure of the plugin?
Skip
From mhammond at skippinet.com.au Wed Aug 27 00:52:24 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue Aug 26 09:52:31 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <16203.24926.483126.523549@montanaro.dyndns.org>
Message-ID: <022b01c36bd9$48bc96a0$f502a8c0@eden>
> If the Outlook plugin is executing multiple threads, two of
> which might
> operate on the database simultaneously, I suspect it will have to lock
> access to the db file. Mark Hammond, does the above comment
> jibe with the
> structure of the plugin?
The addin is "mainly" single-threaded - Outlook always calls us from the
same thread. The only time a second thread is used is by the "training" or
"filtering" dialogs. If "training" is running, then this thread will be
updating the database - however, in that case, the dialog is up, which is
modal, so there is no way the other thread could be doing a training
operation.
The passage you quoted doesn't rule out the possibility that this error
could occur even if only one thread is writing, but another is reading. If
that is a problem, then yes, we would hit it :)
Mark.
From barry at python.org Tue Aug 26 15:18:13 2003
From: barry at python.org (Barry Warsaw)
Date: Tue Aug 26 10:18:20 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <022b01c36bd9$48bc96a0$f502a8c0@eden>
References: <022b01c36bd9$48bc96a0$f502a8c0@eden>
Message-ID: <1061907454.23837.13.camel@yyz>
On Tue, 2003-08-26 at 09:52, Mark Hammond wrote:
> The addin is "mainly" single-threaded - Outlook always calls us from the
> same thread. The only time a second thread is used is by the "training" or
> "filtering" dialogs. If "training" is running, then this thread will be
> updating the database - however, in that case, the dialog is up, which is
> modal, so there is no way the other thread could be doing a training
> operation.
>
> The passage you quoted doesn't rule out the possibility that this error
> could occur even if only one thread is writing, but another is reading. If
> that is a problem, then yes, we would hit it :)
Note that for the Berkeley-based storages, I had to implement
application level locking for all reads and writes, but in that
"application" there are definitely multiple threads doing both.
Berkeley itself has a lock subsystem, but I couldn't trust that because
it is statically allocated and there are situations during some
transactions where an unbounded number of pages could get touched,
exhausting the lock table. So I ditched Berkeley's locks and used an
application level (threading) lock.
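A sketch of the application-level locking described above, with one threading lock around every read and write. `LockedStore` is a made-up wrapper for illustration, not the code from the Berkeley-based storages, and a plain dict stands in for the database. It assumes a current Python 3.

```python
import threading


class LockedStore:
    """Dict-like wrapper that serialises every access with one lock."""

    def __init__(self, db):
        self._db = db
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._db[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._db[key] = value

    def update_count(self, key, delta=1):
        """Read-modify-write under the lock so increments are not lost."""
        with self._lock:
            self._db[key] = self._db.get(key, 0) + delta


# Demo: four threads bump the same counter 1000 times each.
store = LockedStore({})


def worker():
    for _ in range(1000):
        store.update_count("token")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = store["token"]
```

A single coarse lock is simple and cannot exhaust a lock table, at the price of serialising readers as well as writers.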
FWIW,
-Barry
p.s. at one point there was talk of a Usenet group for BerkeleyDB
programmers. I sure wish that existed.
From skip at pobox.com Tue Aug 26 10:17:57 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 10:18:31 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAF4D@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAF4D@its-xchg4.massey.ac.nz>
Message-ID: <16203.27669.269986.138770@montanaro.dyndns.org>
Tony> Instead of removing them, I think that we might want to move
Tony> towards an integer verbose level at some point, which could mean
Tony> that they are only printed at a high enough level. The last time
Tony> this was suggested it got some +0's, but nothing else, so I
Tony> presume people don't really care either way.
I think if we are going to get that sophisticated, we might as well use the
logging module, though that would break 2.2 compatibility. In any case,
I've seen no crying need for anything beside normal and verbose.
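For comparison, a small sketch of the logging-module route with just the two levels mentioned above, normal and verbose. The logger name and the `configure_logging` helper are assumptions for the example; the logging module itself first shipped with Python 2.3, which is why it would break 2.2 compatibility.

```python
import io
import logging


def configure_logging(verbose, stream=None):
    """Return a logger that emits DEBUG when verbose, else WARNING and above."""
    logger = logging.getLogger("spambayes.example")  # hypothetical logger name
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if verbose else logging.WARNING)
    logger.propagate = False
    return logger


# Demo with an in-memory stream so the output can be inspected.
log_output = io.StringIO()
log = configure_logging(verbose=False, stream=log_output)
log.debug("opening database")       # dropped: below WARNING
log.warning("database is missing")  # written
```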
Tony> It's not urgent, though (IMO), and could wait until a later release.
Agreed.
Skip
From barry at python.org Tue Aug 26 15:28:11 2003
From: barry at python.org (Barry Warsaw)
Date: Tue Aug 26 10:28:12 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <1061907454.23837.13.camel@yyz>
References: <022b01c36bd9$48bc96a0$f502a8c0@eden>
<1061907454.23837.13.camel@yyz>
Message-ID: <1061908058.23837.19.camel@yyz>
On Tue, 2003-08-26 at 10:17, Barry Warsaw wrote:
> Note that for the Berkeley-based storages, I had to implement
> application level locking for all reads and writes, but in that
> "application" there are definitely multiple threads doing both.
> Berkeley itself has a lock subsystem, but I couldn't trust that because
> it is statically allocated and there are situations during some
> transactions where an unbounded number of pages could get touched,
> exhausting the lock table. So I ditched Berkeley's locks and used an
> application level (threading) lock.
Tim helpfully reminds me that you're using the dbapi to Berkeley, which
doesn't create an environment. I've never actually run Berkeley without
an environment (i.e. a directory that contains all the db files) and
IIRC we couldn't find much information on running Berkeley in that type
of configuration.
-Barry
From skip at pobox.com Tue Aug 26 10:56:08 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 10:56:21 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F1302EDB0E6@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDB0E6@its-xchg4.massey.ac.nz>
Message-ID: <16203.29960.949254.86283@montanaro.dyndns.org>
Tony> It just seems that every time things are stable enough to label
Tony> the release beta, something comes up.
Maybe we should go into feature freeze after 1.0a5 is released, so we can
focus on bug fixes.
Skip
From skip at pobox.com Tue Aug 26 11:06:37 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 11:06:57 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <022b01c36bd9$48bc96a0$f502a8c0@eden>
References: <16203.24926.483126.523549@montanaro.dyndns.org>
<022b01c36bd9$48bc96a0$f502a8c0@eden>
Message-ID: <16203.30589.46546.16580@montanaro.dyndns.org>
Mark> The passage you quoted doesn't rule out the possibility that this
Mark> error could occur even if only one thread is writing, but another
Mark> is reading. If that is a problem, then yes, we would hit it :)
I'll check with Sleepycat, but it seems to me that the most expedient course
would be to acquire a lock around database accesses.
Skip
From skip at pobox.com Tue Aug 26 11:09:20 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 11:09:36 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <1061907454.23837.13.camel@yyz>
References: <022b01c36bd9$48bc96a0$f502a8c0@eden>
<1061907454.23837.13.camel@yyz>
Message-ID: <16203.30752.500092.713526@montanaro.dyndns.org>
Barry> p.s. at one point there was talk of a Usenet group for BerkeleyDB
Barry> programmers. I sure wish that existed.
I didn't know Sleepycat had a time machine:
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&group=comp.databases.berkeley-db
Have they been talking to Guido?
Skip
From tim.one at comcast.net Tue Aug 26 12:45:04 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Aug 26 11:46:45 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <16203.30589.46546.16580@montanaro.dyndns.org>
Message-ID:
[Skip]
> I'll check with Sleepycat, but it seems to me that the most expedient
> course would be to acquire a lock around database accesses.
Brrrr. Running a Berkeley backend is already soooooo much slower than
running from a dict. I didn't really notice that until the SoBig worm turds
started swamping my inbox, but after a few days of that I switched back to
using a pickled dict. Adding a lock around each stinkin' access is a good
way to soak up excess cycles, anyway.
From skip at pobox.com Tue Aug 26 13:32:38 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 13:38:01 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To:
References: <16203.30589.46546.16580@montanaro.dyndns.org>
Message-ID: <16203.39350.978784.935238@montanaro.dyndns.org>
>> I'll check with Sleepycat, but it seems to me that the most expedient
>> course would be to acquire a lock around database accesses.
Tim> Brrrr. Running a Berkeley backend is already soooooo much slower
Tim> than running from a dict. I didn't really notice that until the
Tim> SoBig worm turds started swamping my inbox, but after a few days
Tim> of that I switched back to using a pickled dict. Adding a lock
Tim> around each stinkin' access is a good way to soak up excess cycles,
Tim> anyway.
I suspect that the Outlook plugin simply makes it easier to find problems
(more users, more worm mail, more concurrent threads, whatever). I think
the same (or a similar) problem would exist were two instances of
hammiefilter running at the same time, both trying to update the file. I'm
just fortunate enough to have never encountered that problem. Even using a
pickle, you really ought to use some sort of lock protocol when reading or
writing the pickle file if there's any chance of concurrent access by
another process or thread. That you only read it at the beginning and write
it at the end only limits the opportunity for collision.
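One possible shape for such a lock protocol, sketched with a lock file created with O_EXCL. This is an illustration only: the helper names are invented, the timeout values are arbitrary, and a crashed process would leave a stale lock file that something would have to clean up. It assumes a current Python 3.

```python
import contextlib
import os
import pickle
import tempfile
import time


@contextlib.contextmanager
def lock_file(path, timeout=5.0, poll=0.05):
    """Hold an exclusive advisory lock by creating path + '.lock'."""
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another holder already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError("could not lock %s" % path)
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def save_pickle(path, obj):
    """Write obj to path while holding the lock."""
    with lock_file(path):
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # swap in the complete file in one step


def load_pickle(path):
    """Read the pickle at path while holding the lock."""
    with lock_file(path):
        with open(path, "rb") as f:
            return pickle.load(f)


# Demo in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    db_path = os.path.join(scratch, "hammie.pck")
    save_pickle(db_path, {"spam": 3, "ham": 5})
    restored = load_pickle(db_path)
    lock_left_behind = os.path.exists(db_path + ".lock")
```

Writing to a temporary file and renaming it means a reader that ignores the lock still never sees a half-written pickle.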
I just (re)ran a little experiment. (I'm sure we've done this in the past.)
I took my current hammie.db (153685 keys, no hapaxes, the result of
processing 11,000+ hams and 8,000+ spams) and converted it to a pickle using
dbExpImp. Startup time is dramatically different:
% time python -c 'import pickle ; db = pickle.load(open("hammie.pck"))'
real 0m32.193s
user 0m22.850s
sys 0m0.430s
% time python -c 'import cPickle ; db = cPickle.load(open("hammie.pck"))'
real 0m5.650s
user 0m3.720s
sys 0m0.350s
% time python -c 'import shelve ; db = shelve.open("hammie.db")'
real 0m0.155s
user 0m0.050s
sys 0m0.050s
This is not to imply that my huge database is typical or that my usage of
hammiefilter is either. Using pickles for moderately sized training
databases would probably work, regardless of the application. With
long-running SB apps like the Outlook plugin or pop3proxy, pickles are
probably the way to go. (Maybe it's time to give up on hammiefilter
altogether.)
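The same comparison can be sketched as a script. The token table here is small and synthetic, so the absolute numbers will look nothing like the ones above; the point is only the shape of the measurement. It uses Python 3's pickle and shelve (there is no separate cPickle any more), and all names are made up for the example.

```python
import os
import pickle
import shelve
import tempfile
import time


def time_call(func):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def read_pickle(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Small synthetic token table; the real database in the post is far larger.
tokens = {"token%d" % i: (i, i + 1) for i in range(2000)}

with tempfile.TemporaryDirectory() as scratch:
    pickle_path = os.path.join(scratch, "hammie.pck")
    shelf_path = os.path.join(scratch, "hammie_shelf")
    with open(pickle_path, "wb") as f:
        pickle.dump(tokens, f)
    with shelve.open(shelf_path) as new_shelf:
        new_shelf.update(tokens)

    # Loading a pickle reads every key; opening a shelf reads none of them.
    loaded, pickle_seconds = time_call(lambda: read_pickle(pickle_path))
    shelf, shelf_seconds = time_call(lambda: shelve.open(shelf_path))
    one_value = shelf["token7"]  # a shelf fetches individual keys on demand
    shelf.close()
```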
Skip
From skip at pobox.com Tue Aug 26 15:21:17 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 15:21:29 2003
Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER
Message-ID: <16203.45869.990264.444596@montanaro.dyndns.org>
Dave Segleau at Sleepycat confirmed for me that making Windows binaries of
the Sleepycat db_* utilities available from the SpamBayes website would be
okay. Who can take the time to make them available? We would need to make
sure that db_recover will actually fix Mark Tabash's (and others?) database
file.
He also said that running db_recover isn't the correct solution. The
correct way to do this is to create an environment object with the
DB_RECOVER flag set. That's not compatible with the anydbm interface
however. I see a few possible solutions:
* Special case the situation where whichdb.whichdb() returns "dbhash"
  and make direct calls to the relevant bsddb package functions to
  create a db object which is resilient in a multi-threaded
  environment. This might be done either using Python's lock facilities
  or using Sleepycat's environment locks.
* Modify the behavior of bsddb.hashopen() (and cousins) so that it
  creates a DBEnv object with the DB_RECOVER flag and passes it to the
  DB() constructor:
      def hashopen(....):
          flags = bsddb._checkflag(flag)
          d = bsddb.db.DB(bsddb.db.DBEnv(bsddb.db.DB_RECOVER))
          ...
      bsddb.hashopen = hashopen
* Provide locks around all database file accesses.
Skip
From richie at entrian.com Tue Aug 26 21:38:42 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Aug 26 15:38:48 2003
Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER
In-Reply-To: <16203.45869.990264.444596@montanaro.dyndns.org>
References: <16203.45869.990264.444596@montanaro.dyndns.org>
Message-ID:
[Skip]
> Dave Segleau at Sleepycat confirmed for me that making Windows binaries of
> the Sleepycat db_* utilities available from the SpamBayes website would be
> okay. Who can take the time to make them available?
I already have, for db_recover: http://entrian.com/db_recover.zip
Let me know if you need any more and I'll put them up.
> We would need to make sure that db_recover will actually fix Mark
> Tabash's (and others?) database file.
That's the problem. I couldn't make head nor tail of how to use it to
recover a Spambayes database - it expects the database to be a directory
(full of files under bsddb's control) rather than a single file. I assume
this is what Sleepycat mean by an "environment".
For what it's worth, I'm not 100% convinced that what we have is a
threading problem. I keep getting a corrupt spambayes.messageinfo.db, and
I'm pretty sure that's only ever accessed by one thread. I even added
debug statements to print the thread ID, and I only ever saw access from
one thread.
--
Richie Hindle
richie@entrian.com
From richie at entrian.com Tue Aug 26 22:43:21 2003
From: richie at entrian.com (Richie Hindle)
Date: Tue Aug 26 16:43:33 2003
Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER
In-Reply-To: <16203.45869.990264.444596@montanaro.dyndns.org>
References: <16203.45869.990264.444596@montanaro.dyndns.org>
Message-ID: <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com>
[Skip]
> I see a few possible solutions:
>
> * [...]
* Use a different embedded database? PySQLite? It's just as easy
to install (on Windows at least) as pybsddb for Python 2.2, and
although I've never used it, I've heard good things about it.
Does anyone here have any experience with it?
--
Richie Hindle
richie@entrian.com
From skip at pobox.com Tue Aug 26 17:48:01 2003
From: skip at pobox.com (Skip Montanaro)
Date: Tue Aug 26 17:48:14 2003
Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER
In-Reply-To: <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com>
References: <16203.45869.990264.444596@montanaro.dyndns.org>
<5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com>
Message-ID: <16203.54673.552408.583277@montanaro.dyndns.org>
>> I see a few possible solutions:
>>
>> * [...]
Richie> * Use a different embedded database? PySQLite? It's just as
Richie> easy to install (on Windows at least) as pybsddb for Python
Richie> 2.2, and although I've never used it, I've heard good things
Richie> about it. Does anyone here have any experience with it?
I have none. I briefly played around with PostgreSQL and found it much
slower than the anydbm-based storage. That might just have been because I
am not a very sophisticated SQL programmer.
Isn't SQLite supposed to be an embedded SQL engine? If so, where's the
database and how is it shared across (for example) two instances of
hammiefilter?
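On the sharing question: SQLite keeps the whole database in one ordinary file, and each process opens that file directly, with SQLite's own file locking serialising writers. A small sketch using the sqlite3 module from a current Python 3 standard library (not PySQLite as it existed for Python 2.2); the table layout and function names are assumptions for the example, and two connections in one process stand in for two hammiefilter instances.

```python
import os
import sqlite3
import tempfile


def open_store(path):
    """Open (and if needed create) a token-count table in a SQLite file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tokens ("
        "word TEXT PRIMARY KEY, spam INTEGER NOT NULL, ham INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def add_spam(conn, word):
    """Increment the spam count for one word inside a transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO tokens (word, spam, ham) VALUES (?, 0, 0)",
            (word,),
        )
        conn.execute("UPDATE tokens SET spam = spam + 1 WHERE word = ?", (word,))


with tempfile.TemporaryDirectory() as scratch:
    db_path = os.path.join(scratch, "tokens.sqlite")
    writer = open_store(db_path)  # stands in for one process
    reader = open_store(db_path)  # stands in for a second process
    add_spam(writer, "cheap")
    add_spam(writer, "cheap")
    row = reader.execute(
        "SELECT spam, ham FROM tokens WHERE word = ?", ("cheap",)
    ).fetchone()
    writer.close()
    reader.close()
```

Whether this would be fast enough for scoring is a separate question that the sketch does not address.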
I think the cleanest way to do this would be to run a server which simply
fronts a pickle. All apps would talk to it for reading and updating the
info. You run into performance problems with network overhead and it makes
deploying all applications that much more complex.
Skip
From romain.guy at jext.org Wed Aug 27 02:31:06 2003
From: romain.guy at jext.org (Romain GUY)
Date: Tue Aug 26 19:35:37 2003
Subject: [spambayes-dev] Outlook express abandon
Message-ID: <20038271316.824848@Thinthalion>
I don't know if this news reached the mailing list but anyway. Microsoft finally announced they won't drop Outlook Express development and support:
http://news.zdnet.co.uk/software/applications/0,39020384,39115720,00.htm
--
Romain GUY
romain.guy@jext.org
http://www.jext.org
http://progx.jext.org
From T.A.Meyer at massey.ac.nz Wed Aug 27 12:36:06 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 26 19:36:57 2003
Subject: [spambayes-dev] RE: [Spambayes] FAQ
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308BC01@its-xchg4.massey.ac.nz>
> I changed the footer for both the digest and non-digest
> versions of the list to include an admonition that people
> check the FAQ before posting questions. This is a check that
> I didn't screw it up somehow.
Good idea. We should probably also update reply.txt since there aren't
the problems with the Outlook plug-in that there were when it was
created, and we could make the mention of the FAQ more prominent.
From memory, Skip, Tim or Barry have to do this, right?
=Tony Meyer
From tim.one at comcast.net Tue Aug 26 22:58:07 2003
From: tim.one at comcast.net (Tim Peters)
Date: Tue Aug 26 22:05:49 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To: <16203.39350.978784.935238@montanaro.dyndns.org>
Message-ID:
[Skip]
> I suspect that the Outlook plugin simply makes it easier to find
> problems (more users, more worm mail, more concurrent threads,
> whatever).
Is that relevant? I've never seen a database corruption complaint from
someone using the Outlook addin (did I miss one?), and I deliberately
switched my 3 classifiers to Berkeley in order to try to provoke one. No
luck. IIRC, Mark has never seen this either.
The first message in this thread:
http://mail.python.org/pipermail/spambayes-dev/2003-August/000873.html
was copied to spambayes-dev from some other source, and was missing
sufficient context to tell what it was talking about. Trying to track the
source down probably leads to here:
http://mail.python.org/pipermail/spambayes/2003-August/007311.html
If so, the OP was running on Windows, but was almost certainly not using the
Outlook addin:
Now I'm getting an error message in the email my
headers: X-Spambayes-Exception: bsddb._DBRunRecoveryError
((-30982, 'DB_RUNRECOVERY: Fatal error, run database recovery --
fatal region error detected; run recovery')) in __getitem__() at
C:\PTYTHON23\lib\bsddb\__init.py line 86: return self.db[key]
The Outlook addin never inserts email headers, so I don't believe that
fellow's problem had anything to do with the addin.
> I think the same (or a similar) problem would exist were two
> instances of hammiefilter running at the same time, both trying
> to update the file. I'm just fortunate enough to have never
> encountered that problem. Even using a pickle, you really ought to
> use some sort of lock protocol when reading or writing the pickle
> file if there's any chance of concurrent access by another process or
> thread. That you only read it at the beginning and write it at the
> end only limits the opportunity for collision.
Python dicts are safe for multiple-reader single-writer access without
explicit synchronization, and per-access locks are so bloody expensive that
I don't want to change anything in the absence of proof that there's a
problem that can't be wormed around more cheaply. To date, I don't believe
we've seen any report of corruption via the Outlook addin, which suggests
it's doing something right.
> I just (re)ran a little experiment. (I'm sure we've done this in the
> past.) I took my current hammie.db (153685 keys, no hapaxes, the
> result of processing 11,000+ hams and 8,000+ spams) and converted it
> to a pickle using dbExpImp. Startup time is dramatically different:
Of course.
> % time python -c 'import pickle ; db =
> pickle.load(open("hammie.pck"))'
>
> real 0m32.193s
> user 0m22.850s
> sys 0m0.430s
> % time python -c 'import cPickle ; db =
> cPickle.load(open("hammie.pck"))'
>
> real 0m5.650s
> user 0m3.720s
> sys 0m0.350s
> % time python -c 'import shelve ; db = shelve.open("hammie.db")'
>
> real 0m0.155s
> user 0m0.050s
> sys 0m0.050s
>
> This is not to imply that my huge database is typical or that my
> usage of hammiefilter is either. Using pickles for moderately sized
> training databases would probably work, regardless of the
> application. With long-running SB apps like the Outlook plugin or
> pop3proxy, pickles are probably the way to go. (Maybe it's time to
> give up on hammiefilter altogether.)
I don't know about hammiefilter (haven't used it). I'll remind that the
original spambayes design was done with the expectation that the "big dict"
would eventually be replaced by a BTree stored in ZODB. That's still a
nearly perfect database for spambayes, although only Jeremy pursued it (I
continue to feel guilty about it, though).
From T.A.Meyer at massey.ac.nz Wed Aug 27 15:23:49 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Tue Aug 26 22:29:33 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308BD02@its-xchg4.massey.ac.nz>
> Maybe we should go into feature freeze after 1.0a5 is
> released, so we can focus on bug fixes.
What about we combine this with Richie's suggestion:
* We release 1.0a5 any time now.
* We rename the scripts and move them into the scripts directory and
cut the backwards compatibility code from the options and immediately
release 1.0a6.
* We go into feature freeze for a while and then release 1.0b1.
(All this excludes the Outlook2000 directory, of course ;)
=Tony Meyer
From anthony at interlink.com.au Wed Aug 27 15:15:50 2003
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed Aug 27 00:16:12 2003
Subject: [spambayes-dev] Formatting bugs in the auto-responder message
In-Reply-To:
Message-ID: <200308270415.h7R4FoWC022078@localhost.localdomain>
A couple of underlines are horked in the auto-responder message:
>>> spambayes-bounces@python.org wrote
> READ THIS! (If you want help.)
>
>
> What is Spambayes? ------------------
>
>
> I found a bug. --------------
>
--
Anthony Baxter
It's never too late to have a happy childhood.
From ta-meyer at ihug.co.nz Wed Aug 27 20:19:32 2003
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Wed Aug 27 03:20:09 2003
Subject: [spambayes-dev] Sourceforge's up-to-24 hour delay
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212AD84@its-xchg4.massey.ac.nz>
Although the last newsletter I got from them said that the anonymous cvs
server would be back to full speed real soon now, it's still taking a while.
This has been fairly annoying on a few occasions recently, when trying to
get people to test out bug fixes. Is there some way that (until sourceforge
recovers) we could get the script that processes the checkins (the one that
sends the email) to also create an up-to-date tarball and put it online
somewhere? (spambayes.org/downloads/currentcvs.tgz or something).
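A sketch of the tarball step only, assuming an up-to-date checkout already exists on disk. How it would be hooked into the checkin script or published on the website is not covered, and `make_snapshot` and its arguments are invented for the example (current Python 3).

```python
import os
import tarfile
import tempfile


def make_snapshot(source_dir, tarball_path, top_name="spambayes"):
    """Write source_dir into a gzip-compressed tarball, skipping CVS dirs."""
    def skip_cvs(member):
        # Returning None from the filter leaves the member out of the archive.
        if "CVS" in member.name.split("/"):
            return None
        return member

    with tarfile.open(tarball_path, "w:gz") as archive:
        archive.add(source_dir, arcname=top_name, filter=skip_cvs)
    return tarball_path


# Demo with a tiny fake checkout in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    checkout = os.path.join(scratch, "checkout")
    os.makedirs(os.path.join(checkout, "CVS"))
    with open(os.path.join(checkout, "README.txt"), "w") as f:
        f.write("snapshot test\n")
    with open(os.path.join(checkout, "CVS", "Entries"), "w") as f:
        f.write("\n")
    tarball = make_snapshot(checkout, os.path.join(scratch, "currentcvs.tgz"))
    with tarfile.open(tarball) as archive:
        names = sorted(archive.getnames())
```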
If this is a reasonable idea, would someone be able to put it together? I
presume for this sort of thing you need to be an admin, rather than just a
developer.
=Tony Meyer
From skip at pobox.com Wed Aug 27 10:35:15 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 27 10:35:43 2003
Subject: [spambayes-dev] RE: [Spambayes] FAQ
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308BC01@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130308BC01@its-xchg4.massey.ac.nz>
Message-ID: <16204.49571.34410.320571@montanaro.dyndns.org>
Tony> We should probably also update reply.txt since there aren't the
Tony> problems with the Outlook plug-in that there were when it was
Tony> created, and we could make the mention of the FAQ more prominent.
Tony> From memory, Skip, Tim or Barry have to do this, right?
Yes. Anyone with CVS update privileges can check in changes to the
reply.txt file in the website top level dir then let one of us know about
it. I'm in the midst of a protracted house move (moving out of the old
house before the new one is ready - great fun), so I'm suffering from low
availability during off-hours at the moment, but if you drop me a note I'll
try and adjust the auto-response text in Mailman.
Skip
From skip at pobox.com Wed Aug 27 10:45:07 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 27 10:45:26 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
In-Reply-To:
References: <16203.39350.978784.935238@montanaro.dyndns.org>
Message-ID: <16204.50163.430449.976434@montanaro.dyndns.org>
Tim> [Skip]
>> I suspect that the Outlook plugin simply makes it easier to find
>> problems (more users, more worm mail, more concurrent threads,
>> whatever).
Tim> Is that relevant? I've never seen a database corruption complaint
Tim> from someone using the Outlook addin (did I miss one?), and I
Tim> deliberately switched my 3 classifiers to Berkeley in order to try
Tim> to provoke one. No luck. IIRC, Mark has never seen this either.
I guess I was mistaken. Sorry about that.
Tim> If so, the OP was running on Windows, but was almost certainly not
Tim> using the Outlook addin:
Tim> Now I'm getting an error message in the email my
Tim> headers: X-Spambayes-Exception: bsddb._DBRunRecoveryError
Tim> ((-30982, 'DB_RUNRECOVERY: Fatal error, run database recovery --
Tim> fatal region error detected; run recovery')) in __getitem__() at
Tim> C:\PTYTHON23\lib\bsddb\__init.py line 86: return self.db[key]
Tim> The Outlook addin never inserts email headers, so I don't believe
Tim> that fellow's problem had anything to do with the addin.
I have this bad habit of jumping to the conclusion that the user was running
the Outlook plugin if a traceback is posted which includes "C:\...". This
would have then been an error in pop3proxy I guess.
>> I think the same (or a similar) problem would exist were two
>> instances of hammiefilter running at the same time, both trying to
>> update the file. I'm just fortunate enough to have never encountered
>> that problem. Even using a pickle, you really ought to use some sort
>> of lock protocol when reading or writing the pickle file if there's
>> any chance of concurrent access by another process or thread. That
>> you only read it at the beginning and write it at the end only limits
>> the opportunity for collision.
Tim> Python dicts are safe for multiple-reader single-writer access
Tim> without explicit synchronization, and per-access locks are so
Tim> bloody expensive that I don't want to change anything in the
Tim> absence of proof that there's a problem that can't be wormed around
Tim> more cheaply. To date, I don't believe we've seen any report of
Tim> corruption via the Outlook addin, which suggests it's doing
Tim> something right .
Skip> ... Startup time is dramatically different:
Tim> Of course.
[ times elided ]
>> This is not to imply that my huge database is typical or that my
>> usage of hammiefilter is either.
Tim> I don't know about hammiefilter (haven't used it).
My only reason for referring to hammiefilter is that its runtime is
dominated by startup and shutdown costs, since all it does is train on or
score a single message. That makes the pickle/dict solution painfully slow.
Were it not for the presence of one-shot apps like hammiefilter, we could
probably just use a pickle for storage and be done with it.
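
A minimal sketch of the kind of lock protocol mentioned above for a pickle
file shared between processes. Everything here is an assumption made for
illustration: the names update_pickle and _acquire, the ".lock" and ".tmp"
suffixes, and the timeout values are hypothetical and are not taken from the
SpamBayes source.

```python
import os
import pickle
import time


def _acquire(lock_path, timeout=10.0, poll=0.05):
    """Create lock_path atomically, waiting until no other process holds it."""
    deadline = time.time() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if time.time() > deadline:
                raise TimeoutError("could not lock " + lock_path)
            time.sleep(poll)


def update_pickle(path, update):
    """Load the pickle at path, apply update(data) and write it back,
    holding a lock file for the whole read-modify-write cycle."""
    lock_path = path + ".lock"  # hypothetical naming convention
    _acquire(lock_path)
    try:
        try:
            with open(path, "rb") as f:
                data = pickle.load(f)
        except FileNotFoundError:
            data = {}
        update(data)
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            pickle.dump(data, f)
        # Swap the new file into place so readers never see a partial write.
        os.replace(tmp_path, path)
    finally:
        os.remove(lock_path)
    return data
```

A process that dies while holding the lock leaves a stale lock file behind,
so a real version would need some way to detect and clear one. The cost Skip
describes is unchanged: the whole pickle is still loaded and dumped on every
call.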
Skip
From skip at pobox.com Wed Aug 27 10:46:18 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 27 10:46:41 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308BD02@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130308BD02@its-xchg4.massey.ac.nz>
Message-ID: <16204.50234.942808.115814@montanaro.dyndns.org>
Tony> What about we combine this with Richie's suggestion:
...
Fine with me. I won't really be able to contribute anything for the next
week or two I don't think.
Skip
From skip at pobox.com Wed Aug 27 10:53:11 2003
From: skip at pobox.com (Skip Montanaro)
Date: Wed Aug 27 10:54:49 2003
Subject: [spambayes-dev] Formatting bugs in the auto-responder message
In-Reply-To: <200308270415.h7R4FoWC022078@localhost.localdomain>
References:
<200308270415.h7R4FoWC022078@localhost.localdomain>
Message-ID: <16204.50647.615545.604379@montanaro.dyndns.org>
Anthony> A couple of underlines are horked in the auto-responder message:
Alas, it appears to be a Mailman problem. This URL
http://spambayes.sf.net/reply.txt
is simply pasted into the auto-response text field of the Mailman config
stuff. It looks okay in the text area.
Skip
From vanhorn at whidbey.com Wed Aug 27 09:09:56 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Wed Aug 27 11:10:16 2003
Subject: [spambayes-dev] Re: [Spambayes] fatal error?
References: <16203.39350.978784.935238@montanaro.dyndns.org>
<16204.50163.430449.976434@montanaro.dyndns.org>
Message-ID: <3F4CC9C4.CDF7813@whidbey.com>
I didn't follow the start of this closely, but I understand the conclusion you
have been jumping to, even though I personally run pop3proxy on a couple of
Windows machines. That C:\... reference is sort of a giveaway. But what's that
right after, they're using pTython23 as the directory? I suspect a typo, one
that Windows should be able to find with a search for files containing the
string "ptython".
Van
From richie at entrian.com Wed Aug 27 23:49:15 2003
From: richie at entrian.com (Richie Hindle)
Date: Wed Aug 27 17:49:24 2003
Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER
In-Reply-To: <16203.54673.552408.583277@montanaro.dyndns.org>
References: <16203.45869.990264.444596@montanaro.dyndns.org>
<5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com>
<16203.54673.552408.583277@montanaro.dyndns.org>
Message-ID: <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com>
[Skip]
> Isn't SQLite supposed to be an embedded SQL engine? If so, where's the
> database and how is it shared across (for example) two instances of
> hammiefilter?
The database is just a file. It supports multithreaded operation, and as
far as I can tell at a quick glance that extends to multiprocess
operation. You just use a different connection in each thread.
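
A small sketch of the "different connection in each thread" pattern. It uses
the sqlite3 module from today's Python standard library, which is an
assumption (the bindings available when this was written were different), and
the table name, function names and token strings are made up for the example.

```python
import os
import sqlite3
import tempfile
import threading


def record_tokens(db_path, tokens):
    """Insert tokens using a connection owned by the calling thread."""
    # timeout makes a writer wait for the file lock instead of failing at once.
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        with conn:  # commits on success, rolls back on error
            for token in tokens:
                conn.execute("INSERT INTO seen(token) VALUES (?)", (token,))
    finally:
        conn.close()


def demo():
    """Write to one database file from four threads and count the rows."""
    db_path = os.path.join(tempfile.mkdtemp(), "tokens.db")
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE seen (token TEXT)")
    conn.commit()
    conn.close()

    threads = []
    for n in range(4):
        tokens = ["t%d-%d" % (n, i) for i in range(10)]
        threads.append(threading.Thread(target=record_tokens,
                                        args=(db_path, tokens)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    conn = sqlite3.connect(db_path)
    count = conn.execute("SELECT COUNT(*) FROM seen").fetchone()[0]
    conn.close()
    return count
```

Calling demo() returns the number of rows written by all four threads.
Separate processes would open the same file in the same way, each with its
own connection.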
Tim, or anyone who knows - is ZODB (without ZEO, which as I understand it
is essentially what Skip suggests below) shareable across
threads/processes?
> I think the cleanest way to do this would be to run a server which simply
> fronts a pickle. All apps would talk to it for reading and updating the
> info. You run into performance problems with network overhead and it makes
> deploying all applications that much more complex.
This is something I've talked about in the past (but talk is cheap) and
which Neale Pickett kind of implemented with hammiesrv. hammiesrv is a
message-classifying XMLRPC server, whereas you seem to be proposing more of
a database server, but the underlying idea is the same. I'd envisaged
pop3proxy becoming a component of a generic spambayes server, which serves
or proxies POP3, SMTP, HTML/HTTP, XMLRPC and any other protocols we need.
With the exception of XMLRPC, it's all that already (and it even has a
non-human HTTP client in the shape of proxytee.py).
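
To make the "database server" idea concrete, here is a rough sketch of an
XML-RPC server fronting a table of token counts, written with the modern
standard-library xmlrpc package. This is not hammiesrv's interface: the
CountStore class, its add/get methods and start_server are hypothetical, and
persistence to a pickle is left out.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer


class CountStore:
    """Hypothetical in-memory (spam, ham) counts per token."""

    def __init__(self):
        self._counts = {}

    def add(self, token, spam, ham):
        """Add to the counts for token; returns True for XML-RPC's benefit."""
        old_spam, old_ham = self._counts.get(token, (0, 0))
        self._counts[token] = (old_spam + spam, old_ham + ham)
        return True

    def get(self, token):
        """Return [spam, ham] for token, or [0, 0] if it is unknown."""
        return list(self._counts.get(token, (0, 0)))


def start_server(host="127.0.0.1", port=0):
    """Serve a CountStore in a background thread; port 0 picks a free port."""
    # SimpleXMLRPCServer handles one request at a time, so all clients see
    # updates applied in a single order without extra locking.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_instance(CountStore())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would connect with xmlrpc.client.ServerProxy, using the port from
server.server_address[1], and call add() and get() on the proxy. Every
lookup then costs a network round trip, which is the overhead Skip mentions.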
But as you say, it's another installation/maintenance headache.
Hmm, it seems talk is still cheap. 8-)
--
Richie Hindle
richie@entrian.com
From mhammond at skippinet.com.au Thu Aug 28 21:48:32 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Thu Aug 28 06:48:32 2003
Subject: [spambayes-dev] pop3proxy binaries
Message-ID: <0ef101c36d51$ecf1e440$f502a8c0@eden>
I pretty much have py2exe and SpamBayes working together. The new py2exe
code I am helping with allows us to create a binary distribution for Windows
with a .zip file containing *all* Python code, and an arbitrary number of
executables which share this .zip for their Python library. Thus, each new
.exe/.dll is <30k, meaning we can have as many as we like :) I have a .dll
for Outlook, and a "windows" and a "service" exe for pop3proxy - and a single
installation .exe that detects if Outlook is installed and does "the right
thing" would be almost trivial. I expect to check in my scripts etc soon.
However, this does have an impact on pop3proxy, in terms of "out of the box"
setup. Off the top of my head:
* We should ensure only 1 proxy is running on the machine (ie, prevent
starting either the service or the .exe twice)
* We should think about where the databases are stored (the "program files"
directory where we install is probably not appropriate - but a "per user"
database directory makes no sense for a .exe)
* Consider a "start_pop3proxy" program that "does the right thing" depending
on the platform and configuration. Eg, it could start the correct program
(service if not running but installed, standard exe otherwise) and fire the
browser to the config URL if it detects it is unconfigured etc.
* other stuff :)
FWIW, we could detect a "binary build" by checking if sys.frozen exists. If
it does, we can also assume the win32all extensions are available - eg, we
could check for a single instance by using a global, named Mutex, etc. I am
willing to help out significantly, but I am unable to "drive" anything, as I
don't own it, and don't want to.
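
A rough sketch of the two checks described above, assuming the win32all
(pywin32) modules win32api, win32event and winerror. The function names and
the tuple return value are hypothetical; when win32all cannot be imported the
sketch simply reports that no other instance was found.

```python
import sys


def is_binary_build():
    """True when running from a frozen (py2exe-style) executable."""
    return hasattr(sys, "frozen")


def claim_single_instance(mutex_name):
    """Create a named mutex and report whether it already existed.

    Returns (handle, already_running). The handle must be kept alive for
    as long as this process wants to hold the name. Without win32all the
    result is (None, False).
    """
    try:
        import win32api
        import win32event
        import winerror
    except ImportError:
        return None, False
    handle = win32event.CreateMutex(None, False, mutex_name)
    # CreateMutex succeeds even if the name exists; the last-error value
    # tells the two cases apart.
    already_running = win32api.GetLastError() == winerror.ERROR_ALREADY_EXISTS
    return handle, already_running
```

A startup script could call is_binary_build() first and only attempt the
mutex check when it returns True, since that is the case in which the win32all
extensions can be assumed to be present.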
Does this interest anyone enough to take it on with me?
At-your-service ly,
Mark.
From kennypitt at hotmail.com Thu Aug 28 10:48:12 2003
From: kennypitt at hotmail.com (Kenny Pitt)
Date: Thu Aug 28 09:49:16 2003
Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER
In-Reply-To: <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com>
References: <16203.45869.990264.444596@montanaro.dyndns.org> <5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com> <16203.54673.552408.583277@montanaro.dyndns.org>
<9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com>
Message-ID: <3F4E081C.6020701@hotmail.com>
Richie Hindle wrote:
> [Skip]
> ...
>>I think the cleanest way to do this would be to run a server which simply
>>fronts a pickle. All apps would talk to it for reading and updating the
>>info. You run into performance problems with network overhead and it makes
>>deploying all applications that much more complex.
>
>
> This is something I've talked about in the past (but talk is cheap) and
> which Neale Pickett kind of implemented with hammiesrv. hammiesrv is a
> message-classifying XMLRPC server, whereas you seems to proposing more of
> a database server, but the underlying idea is the same. I'd envisaged
> pop3proxy becoming a component of a generic spambayes server, which serves
> or proxies POP3, SMTP, HTML/HTTP, XMLRPC and any other protocols we need.
> With the exception of XMLRPC, it's all that already (and it even has a
> non-human HTTP client in the shape of proxytee.py).
Sounds good in theory and in some cases would probably work quite well,
but maybe not in all. As an example, pop3proxy is basically
machine-specific instead of user-specific, particularly if it is running
as a Windows service. If I'm not mistaken, it uses only one database
regardless of which user is logged in and making requests through it, so
a training data server would serve the same purpose. On the other hand,
one of the wonderful things about the Outlook plugin is that it stores
training data on a per-user basis. It seems like handling user-specific
data on a centralized training server would make things much more
complicated.
--
Kenny Pitt
From richie at entrian.com Thu Aug 28 23:12:36 2003
From: richie at entrian.com (Richie Hindle)
Date: Thu Aug 28 17:12:43 2003
Subject: [spambayes-dev] pop3proxy binaries
In-Reply-To: <0ef101c36d51$ecf1e440$f502a8c0@eden>
References: <0ef101c36d51$ecf1e440$f502a8c0@eden>
Message-ID:
[Mark]
> Thus, each new .exe/.dll is <30k, meaning we can have as many as we like :)
> I have a .dll for Outlook, and a "windows" and a "service" exe for pop3proxy
> - and single installation .exe that detects if Outlook is installed and does
> "the right thing" would be almost trivial.
Very cool!
> * We should ensure only 1 proxy is running on the machine (ie, prevent
> starting either the service or the .exe twice)
It's not quite as simple as that - there's no reason you can't run
multiple POP3 proxies on different ports, either as multiple listening
sockets under the same server, or as different processes with different
databases. Whether that's relevant for a binary release I don't know -
perhaps the binary release should be simplified. We should note that
Windows doesn't always give you a bind() failure when you try to bind() to
a port that's already bound - you're probably right about needing a mutex
or similar.
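
One possible way to test whether a port is already taken, sketched under an
assumption that is not from this thread: that asking for the Windows-only
SO_EXCLUSIVEADDRUSE socket option is enough to make a second bind() fail
there. The function name try_bind is hypothetical, and a named mutex, as
suggested above, remains the alternative.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Return a listening socket on (host, port), or None if it is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_EXCLUSIVEADDRUSE only exists on Windows; elsewhere a plain bind()
    # to a port that is already listening fails without it.
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    if exclusive is not None:
        sock.setsockopt(socket.SOL_SOCKET, exclusive, 1)
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        return None
    return sock
```

This only detects a clash on one port. It says nothing about a second proxy
listening on a different port with a different database, which is the case
described above as legitimate.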
> * We should think about where the databases are stored (the "program files"
> directory where we install is probably not appropriate - but a "per user"
> database directory makes no sense for a .exe
SHGetFolderPath(CSIDL_APPDATA)?
> * Consider a "start_pop3proxy" program that "does the right thing" depending
> on the platform and configuration. Eg, it could start the correct program
> (service if not running but installed, standard exe otherwise) and fire the
> browser to the config URL if it detects it is unconfigured etc.
The way I've envisaged this working (in an ideal world, because it would
probably be a significant effort) is that there's one executable which
gives you several options when you first run it:
o install and run as a service (on systems that support it)
o on 9x, install in RunServices
o add the web UI homepage to your IE bookmarks
o install a tray icon with a simple 'stop/start/launch UI' menu
o auto-configure OE to point to the proxy, and the proxy to point to OE's
configured POP3 server(s) (Tony has talked about this before).
I'd also love to see a traditional 'Windows executable' wrapper for the UI
- just a wrapper that hosts the web UI in an embedded IE, and would run in
a separate process, or better, a separate (and totally independent) thread
in the pop3proxy process. Or even just a .HTA - provided the server is
running, that would be enough. I worry that the fact that the UI only
ever appears in a browser will prove too weird for people used to
traditional Windows programs.
> I am willing to help out significantly, but I am unable to "drive" anything,
> as I don't own it, and don't want to. Does this interest anyone enough to
> take it on with me?
I would love to, but I'm just too pushed for time right now to be in
charge of this. I'm going to try to make the service-stopping code more
safe this weekend, but I don't have enough time to take this whole thing
on as the main developer. I'm happy to act as the resident pop3proxy
'expert', but (like you) I can't drive the project, much as I'd like to.
--
Richie Hindle
richie@entrian.com
From T.A.Meyer at massey.ac.nz Fri Aug 29 12:17:41 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 28 19:19:10 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C332@its-xchg4.massey.ac.nz>
Tony> What about we combine this with Richie's suggestion:
Tony> We release 1.0a5 any time now.
Tony> * We rename the scripts and move them into the scripts
directory and
Tony> cut the backwards compatibility code from the options and
immediately
Tony> release 1.0a6.
Tony> * We go into feature freeze for a while and then release
1.0b1.
Skip> Fine with me. I won't really be able to contribute anything
Skip> for the next week or two I don't think.
Richie> That sounds like an excellent plan.
Ok, looks like this is what we're going to do. Richie - the only thing
that I have left before 1.0a5 is that last smtpproxy bug (well, it's the
only one I'm bothered with). I'll also update the changelog, version.py
and what's new file today. Do you want to package up 1.0a5 about this
time tomorrow? We could aim to get 1.0a6 out four days later (it would
be good to have a *little* testing ;), say the 04/09. (If you're too
busy celebrating your getting-older-ness, I could do 1.0a6).
=Tony Meyer
From ta-meyer at ihug.co.nz Fri Aug 29 14:12:59 2003
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Thu Aug 28 21:13:43 2003
Subject: [spambayes-dev] Bug messages
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130212ADB3@its-xchg4.massey.ac.nz>
Now this isn't important, but I'm curious :)
Recently (I think), the messages to the bug list are missing a space.
Specifically, the space after the second word of the (bug) subject. For
example:
[spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew
messages unread
Summary: IMAP: keep new messages unread
Is this some weird sourceforge thing, or something we have setup wrong? (Or
somehow something I've got wrong? ;)
=Tony Meyer
From T.A.Meyer at massey.ac.nz Fri Aug 29 15:10:30 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Thu Aug 28 22:11:43 2003
Subject: [spambayes-dev] pop3proxy binaries
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C403@its-xchg4.massey.ac.nz>
> I pretty much have py2exe and SpamBayes working together.
~~~~~~~~~~~ :)
> I expect to check in my scripts etc soon.
Will these work for anyone else? (i.e. do you have a magic version of
py2exe that you and Thomas are working on?)
> * We should think about where the databases are stored (the
> "program files" directory where we install is probably not
> appropriate - but a "per user" database directory makes no
> sense for an .exe
Wouldn't the same setup as the Outlook plug-in make sense? Once a
location is decided I'm happy to write a script that sets the
appropriate options to the correct values.
> * Consider a "start_pop3proxy" program
[...]
It shouldn't be that hard to put this together, so I can do this.
> I am willing to help out significantly, but I am unable to
> "drive" anything, as I don't own it, and don't want to.
>
> Does this interest anyone enough to take it on with me?
I get the terrible feeling that this will end up on the quotes page
under you taking on the Outlook plugin , but since Richie's too
busy, my hand is up. I don't know pop3proxy as well as Richie, but as
long as he keeps going on being the expert , then I'm happy to
take the distribution part on. Although I don't use pop3proxy a huge
amount myself, it is the one that I've installed on family machines
(including one at home)...
Tell me what I need to do :)
=Tony Meyer
From jeremy at alum.mit.edu Thu Aug 28 23:48:24 2003
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Thu Aug 28 22:49:31 2003
Subject: [spambayes-dev] db_* binaries for Windows, DB_RECOVER
In-Reply-To: <9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com>
References: <16203.45869.990264.444596@montanaro.dyndns.org>
<5cgnkv8f9vdd16l2e6ovm18da8lot8r7ob@4ax.com>
<16203.54673.552408.583277@montanaro.dyndns.org>
<9q7qkv4dkj0vanhk8fbva9j61bnf4clsnn@4ax.com>
Message-ID: <1062125303.13897.361.camel@localhost.localdomain>
On Wed, 2003-08-27 at 17:49, Richie Hindle wrote:
> Tim, or anyone who knows - is ZODB (without ZEO, which as I understand it
> is essentially what Skip suggests below) shareable across
> threads/processes?
Yes, both. All storages can be shared by multiple threads in a single
process. To share a storage among multiple processes, you must use ZEO.
Skip's suggestion below is a bit vague, so I'm not sure why (or why not)
you would think that's what ZEO is.
If you want to share a database across processes and/or machines, you've
got to have some kind of IPC. ZEO uses sockets to share access to a
single storage.
It is indeed harder to run a ZEO server than it is to run a single
database. You've got to worry about whether the server process is
running in addition to the client process. You've got two applications
to configure. There are more performance issues to think about.
On the other hand, ZEO is widely used in the Zope community. A lot of
the issues have been worked out.
Jeremy
From richie at entrian.com Fri Aug 29 07:27:21 2003
From: richie at entrian.com (Richie Hindle)
Date: Fri Aug 29 01:27:28 2003
Subject: [spambayes-dev] 1.0a5 release [was: SpamBayes Readme]
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308C332@its-xchg4.massey.ac.nz>
References: <1ED4ECF91CDED24C8D012BCF2B034F130308C332@its-xchg4.massey.ac.nz>
Message-ID:
[Tony]
> * We release 1.0a5 any time now.
> * We rename the scripts and move them into the scripts directory and
> cut the backwards compatibility code from the options and immediately
> release 1.0a6.
> * We go into feature freeze for a while and then release 1.0b1.
[Tony again, after some discussion]
> Ok, looks like this is what we're going to do. Richie - the only thing
> that I have left before 1.0a5 is that last smtpproxy bug (well, it's the
> only one I'm bothered with). I'll also update the changelog, version.py
> and what's new file today. Do you want to package up 1.0a5 about this
> time tomorrow? We could aim to get 1.0a6 out four days later (it would
> be good to have a *little* testing ;), say the 04/09. (If you're too
> busy celebrating your getting-older-ness, I could do 1.0a6).
I need to improve the pop3proxy service shutdown code, which I may not be
able to do until Saturday, but yes, I'll package 1.0a5 as soon as it's
done.
Any more for any more?
--
Richie Hindle
richie@entrian.com
From T.A.Meyer at massey.ac.nz Fri Aug 29 16:47:46 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Fri Aug 29 01:43:14 2003
Subject: [spambayes-dev] pop3proxy binaries
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C496@its-xchg4.massey.ac.nz>
[Mark]
> a single installation .exe that detects if Outlook is
> installed and does "the right thing" would be almost trivial.
Of course it should also check if the user is using pop3 or imap, and do
the right thing then, too ;)
[Richie]
> It's not quite as simple as that - there's no reason you can't run
> multiple POP3 proxies on different ports, either as multiple listening
> sockets under the same server, or as different processes with
> different databases. Whether that's relevant for a binary release I
> don't know - perhaps the binary release should be simplified.
I think we could make this a restriction of the binary release. If
people want to do something more esoteric, then they can get Python &
the source.
> o on 9x, install in RunServices
What's RunServices? (I never really used 9x much).
> o install a tray icon with a simple 'stop/start/launch UI' menu
I have a basic one of these made already (it's pretty simple to make
based on the demo that comes with the win32 extensions). I can check it
in if you think it's worth having.
> o auto-configure OE to point to the proxy, and the proxy to
> point to OE's configured POP3 server(s) (Tony has talked about
> this before).
I have indeed, and always put it off because it's such a major effort.
Romain's OE module should make this easier, though (although it only
reads folder/message data at the moment, it's a start). OTOH, doing
this sort of auto-conf for Eudora/Mozilla mail will be a piece of cake.
> I'd also love to see a traditional 'Windows executable'
> wrapper for the UI
[...]
> Or even just a .HTA - provided the
> server is running, that would be enough.
To create a .hta, all we have to do is save a copy of the page with the
.hta extension (into a temp file), and then execute the temp file,
right? (plus fill any details in the hta tag that we care about). This
doesn't sound that difficult.
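
The steps just described could be sketched roughly as follows. The function
names are hypothetical, the URL is whatever address the web UI is served on,
and os.startfile only exists on Windows, so the launch step is skipped
elsewhere.

```python
import os
import tempfile
import urllib.request


def save_as_hta(html_text):
    """Write html_text to a temporary file with a .hta extension."""
    fd, path = tempfile.mkstemp(suffix=".hta")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html_text)
    return path


def launch_ui(url):
    """Fetch the page at url, save it as a .hta file and open it."""
    with urllib.request.urlopen(url) as response:
        html_text = response.read().decode("utf-8", "replace")
    path = save_as_hta(html_text)
    if hasattr(os, "startfile"):  # Windows only
        os.startfile(path)
    return path
```

Relative links and form actions in the saved copy would still need to point
back at the running server, which this sketch does not handle.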
> I worry that the fact that the UI only
> ever appears in a browser will prove too weird for people used to
> traditional Windows programs.
Me, too, although I'm not sure if a .HTA would be that much more
reassuring. I think if we really want to reassure them, then we could
build on the tray app, to the extreme of having a tabbed dialog pop up
(hmm, where can we find code for that? ) to do the config on.
Then the web ui (as nice as it is!) is just for those who don't like
tray apps, and non Windows users. (Who are clever enough to use the web
ui and not get freaked ).
=Tony Meyer
From mhammond at skippinet.com.au Fri Aug 29 23:18:14 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Aug 29 08:19:26 2003
Subject: [spambayes-dev] pop3proxy binaries
In-Reply-To:
Message-ID: <044201c36e27$9f7c28c0$f502a8c0@eden>
> > * We should ensure only 1 proxy is running on the machine
> (ie, prevent
> > starting either the service or the .exe twice)
>
> It's not quite as simple as that
Indeed.
> Whether that's relevant for a binary release I
> don't know -
> perhaps the binary release should be simplified.
I vote we go that option. However, my "simplified" isn't
> > * We should think about where the databases are stored (the
> > "program files" directory where we install is probably not
> > appropriate - but a "per user" database directory makes no
> > sense for a .exe
Doh - I meant "for a service".
>
> SHGetFolderPath(CSIDL_APPDATA)?
Unfortunately, this is "per user". For a service logged on as the "system"
user, this would be a problem (as it would mean a "default service" and a
standard .exe would have different directories).
How about this for a first cut:
* A service installed, but configured for the system user is considered
"unconfigured", and will refuse to start.
* A mutex named something like "SpamBayes\{username}" is always created -
service and .exe. GetCurrentUser() is used to create the mutex name, and
SHGetFolderPath(CSIDL_APPDATA) is used.
* The "bootstrap" executable is always the "tray icon" program. If the
mutex is already set, then it simply offers to launch the UI and whatever
else we feel necessary. If the mutex is not set, it runs the proxy in
process.
If this process detects the service running, it could present the exact same
UI as if the proxy was running in-process - except it would control the
service instead of running the proxy. However, it doesn't sound like being
a tray icon is compatible with "RunServices" - but then again CSIDL_APPDATA
doesn't either - so maybe we just stick to being a "normal" tray icon
process on 9x?
Then for later versions someone else figures out what in that doesn't work
:)
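
A sketch of how the per-user database directory from the first cut above
might be located, assuming pywin32's win32com.shell module. The "SpamBayes"
subdirectory name and the fallbacks used when pywin32 is missing are
assumptions made for illustration.

```python
import os

APP_NAME = "SpamBayes"  # hypothetical subdirectory name


def per_user_data_dir():
    """Return a per-user directory for the databases.

    Uses SHGetFolderPath(CSIDL_APPDATA) when pywin32 is available, and
    otherwise falls back to %APPDATA% or the home directory.
    """
    try:
        from win32com.shell import shell, shellcon
        base = shell.SHGetFolderPath(0, shellcon.CSIDL_APPDATA, 0, 0)
    except ImportError:
        base = os.environ.get("APPDATA") or os.path.expanduser("~")
    return os.path.join(base, APP_NAME)
```

Because the result depends on the account the process runs under, a service
logged on as the system user would get a different directory from the tray
program, which is why the first cut treats that configuration as
"unconfigured".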
> > * Consider a "start_pop3proxy" program that "does the right
> thing" depending
> > on the platform and configuration. Eg, it could start the
> correct program
> > (service if not running but installed, standard exe
> otherwise) and fire the
> > browser to the config URL if it detects it is unconfigured etc.
>
> The way I've envisaged this working (in an ideal world,
> because it would
> probably be a significant effort) is that there's one executable which
> gives you several options when you first run it:
I see no reason this needs to be one executable. Each new exe under this
py2exe scheme is <30k, so we should be able to develop a "driver" program
that detects the environment, and delegates to the correct "sub-exe".
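
The decision such a "driver" has to make can be sketched as a small pure
function. The name choose_launch_plan and the action strings are
hypothetical, and how the three inputs are detected is left out.

```python
def choose_launch_plan(service_installed, service_running, configured):
    """Return the list of actions a hypothetical driver program would take."""
    plan = []
    if service_running:
        # Something is already proxying; just talk to it.
        plan.append("use-running-service")
    elif service_installed:
        plan.append("start-service")
    else:
        plan.append("run-standard-exe")
    if not configured:
        # Send the user to the web UI to finish setting things up.
        plan.append("open-config-url")
    return plan
```

Keeping the decision separate from the code that starts a service or spawns
a sub-exe would let it be tested without a Windows machine.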
> I'd also love to see a traditional 'Windows executable'
> wrapper for the UI
> - just a wrapper that hosts the web UI in an embedded IE, and
> would run in
> a separate process,
Aww shucks, I could throw a Pythonwin based one of them together :) At the
cost of around 1MB in the installer I just managed to remove (MFC) .
The Outlook dialog/wizard infrastructure works almost exclusively with
"OptionClass" objects - so a stand-alone Wizard that configured your options
would not be impossible. Finding the time to do it is though
> or better, a separate (and totally
> independent) thread
> in the pop3proxy process.
That doesn't work for a service, but would work well for a tray-based icon
hmmm - I think I will be able to come up with something :)
And Tony said:
> I have a basic one of these made already (it's pretty simple to make
> based on the demo that comes with the win32 extensions). I can check it
> in if you think it's worth having.
Yes please. In the "windows" directory along with pop3proxy_service.py.
And I saw Tony ask about py2exe: It will be in the standard CVS version of
py2exe - but the sandbox directory. You can download and configure this
tree now (but remember to run setup.py from sandbox). None of the py2exe
samples are likely to work, but a simple "standard" py2exe script, as per
the docs, should (but there isn't one of them in the "samples" directory -
only advanced ones.) I expect to check new code into that sandbox directory
before I go to bed :)
Mark.
From mhammond at skippinet.com.au Fri Aug 29 23:23:09 2003
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri Aug 29 08:23:10 2003
Subject: [spambayes-dev] skippinet.com.au slightly constipated
Message-ID: <044701c36e28$4fa9c860$f502a8c0@eden>
My ISP has had to dedicate a server solely to handle the sobig traffic to
skippinet.com.au. But I only got a small one . If you need to reach
me and would like me to read it in the same week it is sent, you should use
mhammond shift-2 keypoint.com.au. I've redirected most mailing list and
sourceforge traffic directly to this address.
Mark.
From adam.walker at rbwconsulting.com Fri Aug 29 12:35:44 2003
From: adam.walker at rbwconsulting.com (Adam Walker)
Date: Fri Aug 29 11:35:51 2003
Subject: [spambayes-dev] pop3proxy binaries
In-Reply-To: <044201c36e27$9f7c28c0$f502a8c0@eden>
Message-ID: <20030829153545.77A448627A@plunder.dreamhost.com>
> > I'd also love to see a traditional 'Windows executable'
> > wrapper for the UI
> > - just a wrapper that hosts the web UI in an embedded IE, and
> > would run in
> > a separate process,
>
> Aww shucks, I could throw a Pythonwin based one of them together :) At
> the
> cost of around 1MB in the installer I just managed to remove (MFC) .
>
> The Outlook dialog/wizard infrastructure works almost exclusively with
> "OptionClass" objects - so a stand-alone Wizard that configured your
> options
> would not be impossible. Finding the time to do it is though
>
I'll volunteer to work on the tray program. I wrote a python script that
needed the minimize to tray function, so I've already worked out how to do
that part.
--Adam
From tim.one at comcast.net Fri Aug 29 12:44:24 2003
From: tim.one at comcast.net (Tim Peters)
Date: Fri Aug 29 11:45:00 2003
Subject: [spambayes-dev] Bug messages
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130212ADB3@its-xchg4.massey.ac.nz>
Message-ID:
[Tony Meyer]
> Now this isn't important, but I'm curious :)
>
> Recently (I think), the messages to the bug list are missing a space.
> Specifically, the space after the second word of the (bug) subject.
> For example:
>
> [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew
> messages unread Summary: IMAP: keep new messages unread
Heh. I'm not sure what this is an example of. Could you be very explicit
about what it is in that two lines of stuff you're talking about?
> Is this some weird sourceforge thing, or something we have set up
> wrong? (Or somehow something I've got wrong? ;)
No idea (neither about what's causing it, nor about what "it" is).
From mhammond at keypoint.com.au Sat Aug 30 11:01:11 2003
From: mhammond at keypoint.com.au (Mark Hammond)
Date: Fri Aug 29 20:01:40 2003
Subject: [spambayes-dev] Bug messages
In-Reply-To:
Message-ID: <059201c36e89$d31f3ac0$f502a8c0@eden>
> [Tony Meyer]
> > Now this isn't important, but I'm curious :)
> >
> > Recently (I think), the messages to the bug list are
> missing a space.
> > Specifically, the space after the second word of the (bug) subject.
> > For example:
> >
> > [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew
> > messages unread Summary: IMAP: keep new messages unread
>
> Heh. I'm not sure what this is an example of. Could you be
> very explicit
> about what it is in that two lines of stuff you're talking about?
The Subject line of the bug includes the string "IMAP: keepnew", but the bug
summary itself contains the string "IMAP: keep new". Something is removing
a single space character in the bug summary as it appears in the subject of
the mail, but not in the body or anywhere else.
I've pondered the same thing. Maybe it is a gravity issue, and that first
space can't handle the world down under?
Mark.
From T.A.Meyer at massey.ac.nz Sat Aug 30 21:47:02 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sat Aug 30 04:47:50 2003
Subject: [spambayes-dev] pop3proxy binaries
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C51E@its-xchg4.massey.ac.nz>
> How about this for a first cut:
[details cut]
+1 to all of that. Is writing this stuff part of the guidance you
offered, or is it my job? :)
> The Outlook dialog/wizard infrastructure works almost
> exclusively with "OptionClass" objects - so a stand-alone
> Wizard that configured your options would not be impossible.
> Finding the time to do it is though
I'm happy to do this (probably not for the first release, though), since
I'm familiar with the OptionClass objects (or ought to be ;), and would
like to be familiar with the Outlook dialog infrastructure.
[checking in the simple pop3proxy tray thing that Tony made]
> Yes please. In the "windows" directory along with
> pop3proxy_service.py.
Will do (that's where it was already :).
=Tony Meyer
From T.A.Meyer at massey.ac.nz Sat Aug 30 21:48:57 2003
From: T.A.Meyer at massey.ac.nz (Meyer, Tony)
Date: Sat Aug 30 04:49:36 2003
Subject: [spambayes-dev] pop3proxy binaries
Message-ID: <1ED4ECF91CDED24C8D012BCF2B034F130308C51F@its-xchg4.massey.ac.nz>
> I'll volunteer to work on the tray program. I wrote a python
> script that needed the minimize to tray function, so I've
> already worked out how to do that part.
I have no huge desire to be doing this particular part, so feel free to
rip apart the script that I'll check in shortly
(windows/pop3proxy_tray.py).
(I actually originally wrote it for the 'overkill' script (which I'm
still tinkering with), and I'll keep on working on that version).
=Tony Meyer
From martin at v.loewis.de Sun Aug 31 01:41:54 2003
From: martin at v.loewis.de (Martin v. =?iso-8859-15?q?L=F6wis?=)
Date: Sat Aug 30 20:42:13 2003
Subject: [spambayes-dev] RE: [Python-Dev] RE: [Spambayes] Question
(or possibly a bug report)
In-Reply-To: <20030814023515.GO3095@async.com.br>
References: <020901c35236$e5576f10$f502a8c0@eden>
<20030814023515.GO3095@async.com.br>
Message-ID:
Christian Reis writes:
> I don't understand this bit. You'd rather use an undocumented API
> function than an open source, well-tested, properly licensed set of
> functions?
Precisely. I don't want to maintain any more floating-point code.
Regards,
Martin
From tim.one at comcast.net Sat Aug 30 22:38:57 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sat Aug 30 21:39:33 2003
Subject: [spambayes-dev] Bug messages
In-Reply-To: <059201c36e89$d31f3ac0$f502a8c0@eden>
Message-ID:
[Tony Meyer]
>>> Now this isn't important, but I'm curious :)
>>>
>>> Recently (I think), the messages to the bug list are missing a
>>> space. Specifically, the space after the second word of the (bug)
>>> subject. For example:
>>>
>>> [spambayes-bugs] [ spambayes-Feature Requests-791246 ] IMAP: keepnew
>>> messages unread Summary: IMAP: keep new messages unread
[Tim]
>> Heh. I'm not sure what this is an example of. Could you be very
>> explicit about what it is in that two lines of stuff you're talking
>> about?
[Mark Hammond]
> The Subject line of the bug includes the string "IMAP: keepnew", but
> the bug summary itself contains the string "IMAP: keep new".
> Something is removing a single space character in the bug summary as
> it appears in the subject of the mail, but not in the body or
> anywhere else.
>
> I've pondered the same thing. Maybe it is a gravity issue, and that
> first space can't handle the world down under?
Thanks for the clarification! The problem is obvious now, but the cause is
not.
If there's an ongoing problem here, it has to be due to something SF is
doing. The only control we (project admins) have over the bug-report email
is:
1. whether or not to send it;
and,
2. if we do want to send it, the email address to which it gets sent.
So the only thing anyone did here was tell SF to email bug tracker stuff to
spambayes-bugs@python.org. There are no other hooks into that system (e.g.,
we don't run any scripts when bug email is generated, and couldn't even if
we wanted to). Sounds like a PHP bug to me.
From tim.one at comcast.net Sun Aug 31 20:23:24 2003
From: tim.one at comcast.net (Tim Peters)
Date: Sun Aug 31 19:25:29 2003
Subject: [spambayes-dev] Bug messages
In-Reply-To: <1ED4ECF91CDED24C8D012BCF2B034F130308C653@its-xchg4.massey.ac.nz>
Message-ID:
> Oh well, it's hardly important, anyway. Thanks for the clarification :)
If it's any consolation, it's not just spambayes -- I just noticed that the
same thing is happening to Python bug reports, like
[ python-Bugs-793822 ] gc.get_referrers() isinherently dangerous
isinherently looks so much like the name of a built-in function that I had
to stare at it for two hours to realize a space was missing.
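One way to avoid the two hours of staring is to compare the tracker summary
with the mailed subject mechanically. What follows is only a minimal sketch,
assuming the mangled text differs from the summary solely by deleted spaces.
find_dropped_spaces is a hypothetical helper written for illustration; it is
not part of SpamBayes, Python's standard library, or anything SF runs, and it
says nothing about what actually causes the space to vanish.

```python
def find_dropped_spaces(original, mangled):
    """Return the indexes in `original` of spaces missing from `mangled`.

    Hypothetical helper (an assumption, not from the thread). Returns None
    if `mangled` cannot be produced from `original` by deleting spaces only.
    """
    dropped = []
    j = 0  # current position in `mangled`
    for i, ch in enumerate(original):
        if j < len(mangled) and mangled[j] == ch:
            j += 1                 # characters agree, advance both
        elif ch == " ":
            dropped.append(i)      # a space in the original went missing
        else:
            return None            # some other kind of difference
    # Every character of `mangled` must have been consumed.
    return dropped if j == len(mangled) else None


summary = "IMAP: keep new messages unread"
subject = "IMAP: keepnew messages unread"
print(find_dropped_spaces(summary, subject))  # [10]
```

Run over a batch of bug mails, the reported indexes would show whether the
space always disappears at the same position, which is the sort of evidence
someone chasing this on the SF side would probably want.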
From vanhorn at whidbey.com Sun Aug 31 18:44:53 2003
From: vanhorn at whidbey.com (G. Armour Van Horn)
Date: Sun Aug 31 20:45:05 2003
Subject: [spambayes-dev] IMAP setup
References: <1ED4ECF91CDED24C8D012BCF2B034F1302EDAED9@its-xchg4.massey.ac.nz>
Message-ID: <3F529685.BBDF094D@whidbey.com>
I hadn't gotten time to actually look at the source yet, but I've been
planning on setting up a mail server with imapfilter. IMAP is a
server-based process, so I assumed that imapfilter would run on the
server, but reading the top of the source I'm not at all sure that's the
case. Have I missed the boat in a really, really big way here?
My plan was to run Sendmail, MailScanner, SpamAssassin, etc in their
normal fashion, but then let individual users have their own specific
SpamBayes IMAP filter to really clean things up.
Van
--
----------------------------------------------------------
Sign up now for Quotes of the Day, a handful of quotations
on a theme delivered every morning.
Enlightenment! Daily, for free!
mailto:twisted@whidbey.com?subject=Subscribe_QOTD
For web hosting and maintenance,
visit Van's home page: http://www.domainvanhorn.com/van/
----------------------------------------------------------