From dave at boost-consulting.com  Wed Nov  1 14:00:27 2006
From: dave at boost-consulting.com (David Abrahams)
Date: Wed, 01 Nov 2006 08:00:27 -0500
Subject: [spambayes-dev] ZODB backward compatibility issue?
Message-ID: <87wt6fb9uc.fsf@pereiro.luannocracy.com>

A non-text attachment was scrubbed...
Name: storage.py.patch
Type: text/x-patch
Size: 668 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20061101/5a1db49b/attachment.bin 

From dave at boost-consulting.com  Wed Nov  1 14:21:37 2006
From: dave at boost-consulting.com (David Abrahams)
Date: Wed, 01 Nov 2006 08:21:37 -0500
Subject: [spambayes-dev] HEAD leaving ZODB locks around?
Message-ID: <87mz7bb8v2.fsf@pereiro.luannocracy.com>


When running sb_imapfilter.py, I seem to be able to change my settings
once, but never again after that:

  500 Server error

  Traceback (most recent call last):

    File "/usr/local/lib/python2.4/site-packages/spambayes/Dibbler.py", line 476, in found_terminator
      getattr(plugin, name)(**params)

    File "/usr/local/lib/python2.4/site-packages/spambayes/ImapUI.py", line 334, in onChangeopts
      UserInterface.UserInterface.onChangeopts(self, **parms)

    File "/usr/local/lib/python2.4/site-packages/spambayes/UserInterface.py", line 884, in onChangeopts
      self.reReadOptions()

    File "/usr/local/lib/python2.4/site-packages/spambayes/ImapUI.py", line 184, in reReadOptions
      self.change_db()

    File "/usr/local/bin/sb_imapfilter.py", line 1198, in change_db
      classifier = storage.open_storage(*storage.database_type(opts))

    File "/usr/local/lib/python2.4/site-packages/spambayes/storage.py", line 952, in open_storage
      return klass(data_source_name)

    File "/usr/local/lib/python2.4/site-packages/spambayes/storage.py", line 695, in __init__
      self.load()

    File "/usr/local/lib/python2.4/site-packages/spambayes/storage.py", line 730, in load
      self.create_storage()

    File "/usr/local/lib/python2.4/site-packages/spambayes/storage.py", line 715, in create_storage
      read_only=self.mode=='r')

    File "/usr/local/lib/python2.4/site-packages/ZODB/FileStorage.py", line 232, in __init__
      self._lock_file = LockFile(file_name + '.lock')

    File "/usr/local/lib/python2.4/site-packages/ZODB/lock_file.py", line 62, in __init__
      lock_file(self._fp)

    File "/usr/local/lib/python2.4/site-packages/ZODB/lock_file.py", line 42, in lock_file
      fcntl.flock(file.fileno(), _flags)

  IOError: [Errno 35] Resource temporarily unavailable
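
Note for anyone trying to reproduce this outside SpamBayes: the last frame is a plain fcntl.flock() call, and Errno 35 is EAGAIN on BSD-derived systems, which is what a non-blocking flock reports when another open file already holds the lock. The sketch below is only an illustration, written for Python 3 rather than the 2.4 shown in the traceback. The helper try_lock and the file name hammie.fs.lock are made up, and it assumes the lock flags include LOCK_NB.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path and take a non-blocking exclusive flock on it.

    Returns the open file object on success, or None if another open
    file already holds the lock.
    """
    fp = open(path, "a+")
    try:
        fcntl.flock(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        fp.close()
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return None
        raise
    return fp


# Hypothetical lock file name, in a scratch directory.
lock_path = os.path.join(tempfile.mkdtemp(), "hammie.fs.lock")
first = try_lock(lock_path)    # first open takes the lock
second = try_lock(lock_path)   # second open while the first is still alive
first.close()                  # closing the file releases the lock
third = try_lock(lock_path)    # now the lock can be taken again
```

Here the second attempt fails even though it comes from the same process, because flock locks belong to the open file, not to the process. Whether a storage is actually left open when change_db runs is not established by the traceback alone.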

-- 
Dave Abrahams
Boost Consulting
www.boost-consulting.com


From amiles at csusm.edu  Wed Nov  1 20:05:21 2006
From: amiles at csusm.edu (Alan Miles)
Date: Wed, 1 Nov 2006 11:05:21 -0800
Subject: [spambayes-dev] Spambayes crashes Outlook
Message-ID: <86BF48E0F11BE44D99952F9FB349BC4B0F6EF832@priority.csusm.edu>

I checked your FAQ for this problem, and it doesn't address the problem
for my system.

SpamBayes Frequently Asked Questions

5.8   After installing SpamBayes, Outlook crashes and then asks for the
plug-in to be disabled. <http://spambayes.sourceforge.net/faq.html#id65>


Are you using an Athlon 64 with DEP? There are issues with DEP and
Outlook with a SpamBayes-based plug-in. Listing Outlook as a safe
application on an Athlon 64 should "solve" the problem.
-------------------------------------

I am not using an Athlon-based computer. It's an Intel Pentium 4 3.00GHz
running Win XP Pro. The program was working on an earlier version. My
system had a hardware problem and the motherboard was replaced. I
noticed that the plug-in no longer worked. I uninstalled the older
Spambayes and installed version 1.1a3.
It crashes exactly as stated in the FAQ. I'm not sure what else to try.
I'm using Outlook 2003 (vers. 11) with all patches applied.

Thanks,		---Alan

From amiles at csusm.edu  Fri Nov  3 02:07:43 2006
From: amiles at csusm.edu (Alan Miles)
Date: Thu, 2 Nov 2006 17:07:43 -0800
Subject: [spambayes-dev] Spambayes crashes Outlook
In-Reply-To: <86BF48E0F11BE44D99952F9FB349BC4B0E46B476@priority.csusm.edu>
Message-ID: <86BF48E0F11BE44D99952F9FB349BC4B0F6EFE47@priority.csusm.edu>

Apparently there was an issue with DEP on my machine. Another program
was also having issues but it called up an option to change the DEP
settings. This seemed to cure the Spambayes problems too.

Thanks,		---Alan


From pl at symbolic.it  Mon Nov  6 08:46:28 2006
From: pl at symbolic.it (Luigi Pugnetti)
Date: Mon, 06 Nov 2006 08:46:28 +0100
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <17742.14240.9321.436394@montanaro.dyndns.org>
References: <000b01c6f63c$78785320$0600000a@kasper>
	<17724.54543.797457.531620@montanaro.dyndns.org>
	<003901c6f6be$3b39a310$0600000a@kasper>
	<17724.65310.502709.225367@montanaro.dyndns.org>
	<001301c6fe8a$6a6cb950$0600000a@kasper>
	<1162478403.27899.22.camel@localhost.localdomain>
	<17742.14240.9321.436394@montanaro.dyndns.org>
Message-ID: <1162799189.27385.9.camel@localhost.localdomain>

Moved to spambayes-dev, as requested by some people.

Note that the first part of this message refers to a mail from Vibe
Grevsen.

On Sun, 2006-11-05 at 13:12 -0600, skip at pobox.com wrote:
> Okay, I'm finally actually editing the necessary files.
> 
>     2> is supported by WinNT, 2k and XP; I just never saw it used before.
>     2> is not supported in Win9x and ME.
> 
> I don't think we care about Win9x or WinME (though someone should feel free
> to demonstrate my ignorance here).
> 
>     >> However /dev/null is - of course - not found in Windows. Equivalent
>     >> is nul (case insensitive).  Better use os.path.devnull like shown
>     >> here. Parenthesis required for string formatting!
> 
> Correct.  Will be checked in shortly.
> 
>     Luigi> On windows you have to put quote around pnmfile to protect
>     Luigi> against space in path ...
> 
> Not a problem here, since the pnmfile is named using the tempfile.mkstemp
> function.  It won't contain any characters which require special treatment.

As long as Python returns a short path you have no problem; otherwise you
could get a path with a space (like c:\Documents and Settings\...).
Quoting could also be required on Linux (though, as already said, it's
unlikely you'd get a path that requires quoting there).
Putting quotes around pnmfile does no harm and may protect you from
unusual setups.

> 
>     >> Finally I was surprised to find that
>     >> 
>     >> ocrad -s4 -x out.txt >ocr.txt logo.pgm
>     >> 
>     >> did produce an ocr.txt but no out.txt for this image
>     >> http://www.unlockaarhus.dk/dev/logo.pgm.
>     >> 
>     >> Maybe it's only a problem with small images? Could you please test if
>     >> this is the case under Unix as well?
> 
>     Luigi> using -s (and other flags as well) disable -x.
> 
> Hmmm...  That sucks.  I see the lines in ocrad's code where that happens.  I
> mailed a note to bug-ocrad asking why this is so.  Hopefully it's just a
> simple bug that can be squashed.
> 
>     Luigi> orf file is never used. probably is there from the start before
>     Luigi> skip introduce the scale parameter
> 
> Actually, yes, it is used:
> 

Yes, this was a "language" problem.  What I meant was that you get no info
from it because it's always empty.
Sorry for the error.

>     for line in open(orf):
>         if line.startswith("lines"):
>             nlines = int(line.split()[1])
>             if nlines:
>                 ctokens.add("image-text-lines:%d" %
>                             int(log2(nlines)))
> 
> so no image-text-lines:NN tokens are generated.
> 
> Skip
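
The quoted loop can be wrapped in a small function to see which token it yields for a given line count. This is a sketch in Python 3 using math.log2 (the SpamBayes code of the time has its own log2 helper); the function name image_text_line_tokens and the sample input are invented for illustration.

```python
from math import log2


def image_text_line_tokens(orf_lines):
    """Return image-text-lines tokens for the lines of an ocrad export file."""
    tokens = set()
    for line in orf_lines:
        if line.startswith("lines"):
            nlines = int(line.split()[1])
            if nlines:
                # Bucket the line count by powers of two.
                tokens.add("image-text-lines:%d" % int(log2(nlines)))
    return tokens


empty_export = image_text_line_tokens([])            # export file is empty
nine_lines = image_text_line_tokens(["lines 9\n"])   # hypothetical content
```

With an empty export file the loop body never runs, so the set stays empty, which matches the observation that no image-text-lines tokens appear.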
-- 
Luigi Pugnetti

Symbolic S.p.A.
V.le Mentana, 29
I-43100 Parma
Italy

Tel: +39 0521 708811
Fax: +39 0521 776190


From skip at pobox.com  Mon Nov  6 15:22:55 2006
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Nov 2006 08:22:55 -0600
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <1162799189.27385.9.camel@localhost.localdomain>
References: <000b01c6f63c$78785320$0600000a@kasper>
	<17724.54543.797457.531620@montanaro.dyndns.org>
	<003901c6f6be$3b39a310$0600000a@kasper>
	<17724.65310.502709.225367@montanaro.dyndns.org>
	<001301c6fe8a$6a6cb950$0600000a@kasper>
	<1162478403.27899.22.camel@localhost.localdomain>
	<17742.14240.9321.436394@montanaro.dyndns.org>
	<1162799189.27385.9.camel@localhost.localdomain>
Message-ID: <17743.17727.753844.91535@montanaro.dyndns.org>


    Luigi> as long as python returns a short path you have no problem
    Luigi> otherwise you could get a path with a space (like c:\Document and
    Luigi> Settings\...).  It could be required also on Linux (but as
    Luigi> already said it's unlikely you get a path that requires quote
    Luigi> there).  Putting quotes around pnmfile does no harm and may
    Luigi> protect you from unusual setup

Fixed.

Thx,

Skip

From pl at symbolic.it  Mon Nov  6 15:32:24 2006
From: pl at symbolic.it (Luigi Pugnetti)
Date: Mon, 06 Nov 2006 15:32:24 +0100
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <17743.17727.753844.91535@montanaro.dyndns.org>
References: <000b01c6f63c$78785320$0600000a@kasper>
	<17724.54543.797457.531620@montanaro.dyndns.org>
	<003901c6f6be$3b39a310$0600000a@kasper>
	<17724.65310.502709.225367@montanaro.dyndns.org>
	<001301c6fe8a$6a6cb950$0600000a@kasper>
	<1162478403.27899.22.camel@localhost.localdomain>
	<17742.14240.9321.436394@montanaro.dyndns.org>
	<1162799189.27385.9.camel@localhost.localdomain>
	<17743.17727.753844.91535@montanaro.dyndns.org>
Message-ID: <1162823544.27385.47.camel@localhost.localdomain>

On Mon, 2006-11-06 at 08:22 -0600, skip at pobox.com wrote:
>     Luigi> as long as python returns a short path you have no problem
>     Luigi> otherwise you could get a path with a space (like c:\Document and
>     Luigi> Settings\...).  It could be required also on Linux (but as
>     Luigi> already said it's unlikely you get a path that requires quote
>     Luigi> there).  Putting quotes around pnmfile does no harm and may
>     Luigi> protect you from unusual setup
> 
> Fixed.

On Windows you cannot use ' to quote a path; you have to use ".

You must change
ocr = os.popen("%s -s %s -c %s -f '%s' 2>%s"

to 

ocr = os.popen('%s -s %s -c %s -f "%s" 2>%s'


sorry no patch
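
The change above can be sketched as a small helper that builds the command string with the path in double quotes. This is an illustration only: the function name build_ocr_command, the argument order, and the sample path are assumptions, since the surrounding code is not shown in this thread.

```python
import os


def build_ocr_command(ocrad, scale, charset, pnmfile):
    """Return a shell command string with the image path in double quotes."""
    # Double quotes work as quoting in both cmd.exe and POSIX shells;
    # cmd.exe does not treat single quotes as quoting characters.
    return '%s -s %s -c %s -f "%s" 2>%s' % (
        ocrad, scale, charset, pnmfile, os.devnull)


# Hypothetical path containing spaces.
cmd = build_ocr_command("ocrad", 2, "ascii",
                        r"C:\Documents and Settings\user\tmp1.pnm")
```

os.devnull is "nul" on Windows and "/dev/null" on POSIX systems, so the stderr redirect needs no platform test.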

-- 
Luigi Pugnetti

Symbolic S.p.A.
V.le Mentana, 29
I-43100 Parma
Italy

Tel: +39 0521 708811
Fax: +39 0521 776190


From skip at pobox.com  Mon Nov  6 15:51:03 2006
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Nov 2006 08:51:03 -0600
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <1162823544.27385.47.camel@localhost.localdomain>
References: <000b01c6f63c$78785320$0600000a@kasper>
	<17724.54543.797457.531620@montanaro.dyndns.org>
	<003901c6f6be$3b39a310$0600000a@kasper>
	<17724.65310.502709.225367@montanaro.dyndns.org>
	<001301c6fe8a$6a6cb950$0600000a@kasper>
	<1162478403.27899.22.camel@localhost.localdomain>
	<17742.14240.9321.436394@montanaro.dyndns.org>
	<1162799189.27385.9.camel@localhost.localdomain>
	<17743.17727.753844.91535@montanaro.dyndns.org>
	<1162823544.27385.47.camel@localhost.localdomain>
Message-ID: <17743.19415.61887.328377@montanaro.dyndns.org>


    Luigi> On windows you cannot use ' to quote a path you have to use ".

*sigh* "fixed" 'again'.

Thx,

S

From grevsen at gmail.com  Mon Nov  6 19:24:54 2006
From: grevsen at gmail.com (Vibe Grevsen)
Date: Mon, 6 Nov 2006 10:24:54 -0800 (PST)
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <17743.19415.61887.328377@montanaro.dyndns.org>
References: <1162799189.27385.9.camel@localhost.localdomain>
	<17743.17727.753844.91535@montanaro.dyndns.org>
	<1162823544.27385.47.camel@localhost.localdomain>
	<17743.19415.61887.328377@montanaro.dyndns.org>
Message-ID: <7204201.post@talk.nabble.com>


skip-2 wrote:
> 
> 
>     Luigi> On windows you cannot use ' to quote a path you have to use ".
> 
> *sigh* "fixed" 'again'.
> 
> Thx,
> 
> S
> 

Ok, nice work. Your idea with strip and log2 is good I think.

May I ask why you did not use popen3 for the call?  Is there some reason to
avoid it?

                ocr_cmd = ur'ocrad -s %s -c %s "%s"' % (scale, charset, pnmfile)

                # os.popen3() returns [stdin, stdout, stderr]
                ocr = os.popen3( ocr_cmd )[1]

You're right 9x is no big market, but if it is as easy as this I don't see a
reason not to support it.
Also unicode is really a good idea for localized windows systems.
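
A related option, not proposed in this thread, is the subprocess module (in the standard library since Python 2.4). It takes an argument list, so no shell quoting is needed, and it keeps stdout and stderr separate. The sketch below uses Python 3 syntax, and a child Python process stands in for ocrad so that it runs anywhere; run_and_capture is an invented name.

```python
import subprocess
import sys


def run_and_capture(argv):
    """Run argv without a shell; return its stdout as text, discarding stderr."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.DEVNULL, text=True)
    return proc.stdout


# Stand-in for the ocrad call: a child that writes to both streams.
child = [sys.executable, "-c",
         "import sys; print('recognised text'); sys.stderr.write('noise')"]
output = run_and_capture(child)
```

Because each list element is passed to the child as one argument, a path with spaces would arrive intact without any quoting.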
-- 
View this message in context: http://www.nabble.com/Re%3A--Spambayes--Unwanted-stock-solicitations-tf2581077.html#a7204201
Sent from the Python - spambayes-dev mailing list archive at Nabble.com.


From grevsen at gmail.com  Mon Nov  6 19:33:38 2006
From: grevsen at gmail.com (Vibe Grevsen)
Date: Mon, 6 Nov 2006 10:33:38 -0800 (PST)
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <17743.19415.61887.328377@montanaro.dyndns.org>
References: <1162799189.27385.9.camel@localhost.localdomain>
	<17743.17727.753844.91535@montanaro.dyndns.org>
	<1162823544.27385.47.camel@localhost.localdomain>
	<17743.19415.61887.328377@montanaro.dyndns.org>
Message-ID: <7204354.post@talk.nabble.com>


skip-2 wrote:
> 
> 
>     Luigi> On windows you cannot use ' to quote a path you have to use ".
> 
> *sigh* "fixed" 'again'.
> 
> Thx,
> 
> S
> 

One more...

Did you forget about the default in this line

179:         scale = options["Tokenizer", "ocrad_scale"] or 1

Seems redundant at best...
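
For reference, this is how the "or 1" idiom behaves. It is a generic Python sketch, not SpamBayes code; effective_scale is an invented name, and whether the option can ever be falsy depends on how it is declared, which is not shown here.

```python
def effective_scale(configured):
    """Mimic `options[...] or 1`: any falsy value falls back to 1."""
    return configured or 1


# 2 and 1 pass through unchanged; 0, None and "" all fall back to 1.
results = [effective_scale(value) for value in (2, 1, 0, None, "")]
```

So the fallback only matters if the stored value can be 0 or empty; if the option always carries a non-zero default, the "or 1" never fires.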


Happy coding :)

Vibe


From skip at pobox.com  Mon Nov  6 20:17:35 2006
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 6 Nov 2006 13:17:35 -0600
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <7204201.post@talk.nabble.com>
References: <1162799189.27385.9.camel@localhost.localdomain>
	<17743.17727.753844.91535@montanaro.dyndns.org>
	<1162823544.27385.47.camel@localhost.localdomain>
	<17743.19415.61887.328377@montanaro.dyndns.org>
	<7204201.post@talk.nabble.com>
Message-ID: <17743.35407.202601.161766@montanaro.dyndns.org>


    Vibe> May I ask why you did not use popen3 for the call?  Is there some
    Vibe> reason to avoid it?

Because os.popen works?

    Vibe> You're right 9x is no big market, but if it is as easy as this I
    Vibe> don't see a reason not to support it.  Also unicode is really a
    Vibe> good idea for localized windows systems.

I don't know if SpamBayes is supported on Win9X at all.  It doesn't have
anything to do with this particular edit.  Again, note that I am creating
all the files used by the os.popen call.  None of the characters should be
non-ASCII.  Are you thinking that some user is going to set the TMPDIR, TEMP
or TMP environment variables to some Unicode directory?  That's the only way
I think Unicode could leak in.

Skip

From mhammond at skippinet.com.au  Mon Nov  6 23:45:26 2006
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Tue, 7 Nov 2006 09:45:26 +1100
Subject: [spambayes-dev] [Spambayes] Unwanted stock solicitations
In-Reply-To: <17743.35407.202601.161766@montanaro.dyndns.org>
Message-ID: <15b401c701f5$417dd790$0c0a0a0a@enfoldsystems.local>

> None of the characters should be
> non-ASCII.  Are you thinking that some user is going to set
> the TMPDIR, TEMP or TMP environment variables to some
> Unicode directory?  That's the only way I think Unicode
> could leak in.

The TEMP env var will, by default, include the username in the full path
(eg, "\Documents and Settings\{username}\Local Settings\...") - hence, a
username with extended chars is relevant.

On the good side though, accessing these variables through os.environ
should be fine.  In most cases, the 'short name' will be stored in the
environment (meaning they will never be non-ascii), or it will include an
'mbcs' encoded value of the name, which can be passed directly to system
functions without decoding.

So in general, I think you will be alright.
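
To see what a particular machine produces, one can create a temp file the way the OCR code does and inspect the resulting path. This is a sketch in Python 3; describe_temp_path is an invented helper and the ".pnm" suffix is only for illustration.

```python
import os
import tempfile


def describe_temp_path():
    """Create a file with mkstemp and report properties of its path."""
    fd, path = tempfile.mkstemp(suffix=".pnm")
    os.close(fd)
    try:
        has_space = " " in path
        is_ascii = all(ord(ch) < 128 for ch in path)
        # The generated file name itself never contains spaces; any space
        # or non-ASCII character comes from the directory part.
        basename_safe = " " not in os.path.basename(path)
    finally:
        os.remove(path)
    return path, has_space, is_ascii, basename_safe


path, has_space, is_ascii, basename_safe = describe_temp_path()
```

The directory part comes from TMPDIR, TEMP or TMP (or a platform default), so has_space and is_ascii vary from machine to machine while basename_safe should always hold.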

Cheers,

Mark


From grevsen at gmail.com  Tue Nov  7 21:05:34 2006
From: grevsen at gmail.com (Vibe Grevsen)
Date: Tue, 7 Nov 2006 21:05:34 +0100
Subject: [spambayes-dev]  Error saving settings
Message-ID: <000501c702a8$16ba0700$0600000a@kasper>


Hi,

here is something to look into.

When saving settings from the web interface I get the following error.
It appears the settings are actually saved. I haven't done any debugging on this yet.

FYI: WinXP, SpamBayes running from recent sources.
Problem did not appear in 1.0x.


Happy coding :)

Vibe
---
500 Server error
Traceback (most recent call last):

  File "C:\Programmer\Python25\lib\site-packages\spambayes\Dibbler.py", line 476, in found_terminator
    getattr(plugin, name)(**params)

  File "C:\Programmer\Python25\lib\site-packages\spambayes\UserInterface.py", line 884, in onChangeopts
    self.reReadOptions()

  File "C:\Programmer\Python25\lib\site-packages\spambayes\ProxyUI.py", line 782, in reReadOptions
    state = self.state_recreator()

  File "C:\spambayes\spambayes\scripts\sb_server.py", line 1007, in _recreateState
    prepare()

  File "C:\spambayes\spambayes\scripts\sb_server.py", line 1022, in prepare
    state.prepare(can_stop)

  File "C:\spambayes\spambayes\scripts\sb_server.py", line 822, in prepare
    self.createWorkers()

  File "C:\spambayes\spambayes\scripts\sb_server.py", line 889, in createWorkers
    self.stats = Stats.Stats(options, self.mdb)

  File "C:\Programmer\Python25\lib\site-packages\spambayes\Stats.py", line 60, in __init__
    self.from_date = self.messageinfo_db.get_statistics_start_date()

AttributeError: 'NoneType' object has no attribute 'get_statistics_start_date'
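
The traceback shows Stats.Stats(options, self.mdb) being called while self.mdb is None. A minimal sketch of that failure mode follows, in Python 3 syntax with hypothetical stand-in classes (none of this is the real spambayes.Stats code). The guard is one possible way to tolerate a missing database, not a proposed patch.

```python
class FakeMessageInfoDB:
    """Hypothetical stand-in for the message info database."""

    def get_statistics_start_date(self):
        return "2006-11-01"  # made-up value for illustration


class UnguardedStats:
    """Mirrors the failing line: assumes the database is always present."""

    def __init__(self, messageinfo_db):
        self.messageinfo_db = messageinfo_db
        self.from_date = self.messageinfo_db.get_statistics_start_date()


class GuardedStats:
    """Same idea, but tolerates a database that has not been opened yet."""

    def __init__(self, messageinfo_db):
        self.messageinfo_db = messageinfo_db
        if messageinfo_db is None:
            self.from_date = None
        else:
            self.from_date = messageinfo_db.get_statistics_start_date()


try:
    UnguardedStats(None)
    failure = None
except AttributeError as exc:
    failure = str(exc)

guarded = GuardedStats(None)
```

The more interesting question is why the database is None when the state is recreated after saving settings, and the sketch does not answer that.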
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20061107/691fabbe/attachment.html 

From musicproducing at yahoo.it  Fri Nov 10 19:11:25 2006
From: musicproducing at yahoo.it (Manuel Marino)
Date: Fri, 10 Nov 2006 18:11:25 -0000
Subject: [spambayes-dev] =?iso-8859-1?q?New_fare_for_Indies_only!_Next_mon?=
	=?iso-8859-1?q?ths_only!?=
Message-ID: <20061110.UMYQGLQKJSREZERW@yahoo.it>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20061110/1dc7a459/attachment.htm 

From f.rougon at free.fr  Sun Nov 12 00:09:48 2006
From: f.rougon at free.fr (Florent Rougon)
Date: Sun, 12 Nov 2006 00:09:48 +0100
Subject: [spambayes-dev] An alternative to spambayes.el for those using Gnus
Message-ID: <87zmaxr35v.fsf@florent.maison>

A non-text attachment was scrubbed...
Name: flo-spambayes.el
Type: application/emacs-lisp
Size: 11332 bytes
Desc: Interface between Spambayes and Gnus
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20061112/e795cf11/attachment.bin 

From tzz at lifelogs.com  Tue Nov 14 16:14:32 2006
From: tzz at lifelogs.com (Ted Zlatanov)
Date: Tue, 14 Nov 2006 15:14:32 +0000
Subject: [spambayes-dev] An alternative to spambayes.el for those using
	Gnus
In-Reply-To: <87zmaxr35v.fsf@florent.maison> (Florent Rougon's message of
	"Sun\, 12 Nov 2006 00\:09\:48 +0100")
References: <87zmaxr35v.fsf@florent.maison>
Message-ID: <g69y7qenjqf.fsf@lifelogs.com>

On 11 Nov 2006, f.rougon at free.fr wrote:

> I've been running my own interface code between Gnus and Spambayes for a
> while, and improved it a bit today to the point that I think it should
> be ready for public consumption.
>
> It can do the same things as spambayes.el, but in a way that should be
> cleaner and slightly faster (using `call-process-region' instead of
> `shell-command-on-region', for instance). It also provides a few more
> things, most notably:
>
> - a command for (re-)running the classifier on an article (or
> process-marked articles). Useful when you've recently trained
> Spambayes and want to see how the newly-trained filter
> performs---and maybe even respool some articles with this new
> filter.
>
> - a command to examine what the Spambayes filter thinks of an article
> (read-only operation): whether it is classified as ham or spam, the
> overall spam score as well as the various spam clues with their
> respective scores (from the 'X-Spambayes-Evidence' header).
>
> This has been tested with GNU Emacs 21.4, Spambayes 1.0.3 and No Gnus
> v0.6 (also with Gnus v5.10.7).
>
> It works well for me, and I hope others will find it useful.

Hi,

would you consider merging your code with the Gnus spam.el system?

You need to write a backend, which includes:

- a spam/ham check function (1 function)
- spam/ham register/unregister functions (4 functions)

Plus update several variables.  It's not a lot of work.  Let me know
if you are interested.  Your spambayes.el can exist with spam.el
(exporting functions for its use) or you can merge the code right in.
It's up to you.

spam.el does much of the infrastructure you mention above, especially
deciding when to run the classifier and on which articles.

Thanks
Ted

From f.rougon at free.fr  Sun Nov 19 23:50:11 2006
From: f.rougon at free.fr (Florent Rougon)
Date: Sun, 19 Nov 2006 23:50:11 +0100
Subject: [spambayes-dev] An alternative to spambayes.el for those using
	Gnus
In-Reply-To: <g69y7qenjqf.fsf@lifelogs.com> (Ted Zlatanov's message of "Tue,
	14 Nov 2006 15:14:32 +0000")
References: <87zmaxr35v.fsf@florent.maison> <g69y7qenjqf.fsf@lifelogs.com>
Message-ID: <874psvrqzg.fsf@florent.maison>

Hi,

Ted Zlatanov <tzz at lifelogs.com> wrote:

> would you consider merging your code with the Gnus spam.el system?

Sorry for the late reply. I was a bit busy and wanted to reread the
"Spam Package Introduction" Info node to avoid making an uninformed
answer.

Having just read it, I'm not sure the scheme implemented in spam.el fits
well with the way I want to work with Spambayes. One of the reasons is
that I do *not* want to train the filter on every article. To have an
efficient Spambayes filter, experiments made by Spambayes users and
developers have shown that it is often a good idea to only train the
filter on its mistakes (after an initial training).

[ Personally, I don't even train the filter on every mistake, because
  there are articles that I believe are too well-crafted spam: I fear
  I'll pollute my Spambayes database if I train on these articles. These
  are articles that mostly contain words that are part of my usual
  ham. ]

Therefore, I wouldn't want the "spam and ham processors" to do anything
when I exit a group. I want to carefully select which articles get to
train the filter.

As a consequence, the paragraph in the "Spam Package Introduction" node
that reads:

,----
|    If the spam filter failed to mark a spam message, you can mark it
| yourself, so that the message is processed as spam when you exit the
| group:
| 
| `M-d'
| `M s x'
| `S x'
|      Mark current article as spam, showing it with the `$' mark
|      (`gnus-summary-mark-as-spam').
| 
| Similarly, you can unmark an article if it has been erroneously marked
| as spam.  *Note Setting Marks::.
`----

would be misleading to users, because marking articles as ham or spam
wouldn't make any difference in the absence of any action from the "spam
and ham processors".

There's another thing in spam.el that doesn't seem to work the way I
want:

,----
|    The second thing that the Spam package does when you exit a group is
| to move ham articles out of spam groups, and spam articles out of ham
| groups.  Ham in a spam group is moved to the group specified by the
| variable `gnus-ham-process-destinations', or the group parameter
| `ham-process-destination'.  Spam in a ham group is moved to the group
| specified by the variable `gnus-spam-process-destinations', or the
| group parameter `spam-process-destination'.
`----

This means that if, e.g., I had a ham that was classified as spam and I
mark it as ham before leaving the group, then the article will be moved
to the group specified by `gnus-ham-process-destinations'---regardless
of the specific article.

I prefer my way of doing that: if an article is misclassified, there are
two possibilities:
  - either I don't want to train the filter on the article (for
    instance, because several similar articles were misclassified in a
    row and I already trained the filter on one of them). In this case,
    I usually simply use 'B m' to move the article manually to the right
    group.

    There is another possibility that works well in the example I gave in
    the parentheses: since the filter was trained on a similar article,
    you can expect it to classify the article correctly next time;
    therefore, you can call '(flo-spambayes-gnus-classify t)' in order
    to:

       1. rerun the classifier on the article;
       2. respool it afterwards (this is because of the "t" argument).

    The respooled article will eventually end up in the right group
    according to `nnmail-split-methods'.

  - or I use 'B s' (resp. 'B h') to tell the filter "Dude, this was
    spam!" (resp. "Dude, this was ham!"), i.e., I train the filter on
    the article. These key sequences, which are mapped to lambda
    expressions evaluating '(flo-spambayes-gnus-refile-as-spam t)' and
    '(flo-spambayes-gnus-refile-as-ham t)' respectively, do two things:

       1. train the filter on the article;
       2. respool it afterwards (this is because of the "t" argument).

    As a consequence, the article will (most probably) end up in the
    right group, according to `nnmail-split-methods'.

    [ I say "most probably", because it might be that the filter was so
      badly trained in the past that it still couldn't classify the
      article correctly the second time. This never happened to me, but
      I think it's possible. ]

The key point here is that in either case, if the article was, e.g.,
something for the ding mailing-list wrongly classified as spam when the
incoming mail was split, it will end up directly in my "ding" group
after the corrective actions I described, not in whichever group
specified by `gnus-ham-process-destinations'.

Lastly, there's another thing I'm not sure about when reading the Info
node:

,----
| The Spam package divides Gnus groups into three categories: ham
| groups, spam groups, and unclassified groups.
`----

What exactly do unclassified groups contain? With Spambayes, when you
run an article through the classifier, it gets a spam score (between 0
and 1) and a category depending on the spam score. There are three
categories: ham, unsure and spam (from lowest score to highest score).
"unsure" means the article got a score that is not low enough to be
confident it's ham, and not high enough to be confident it's spam. But
it surely doesn't mean the article wasn't _classified_ (i.e., it did go
through the classifier---whose output was "unsure"). That's why I'm not
sure the "unclassified group" mentioned in the above sentence is
well-suited for articles marked as "unsure" by Spambayes.

To put it differently: you said a spam backend must provide a
function that tells whether a message is ham or spam. But this is not
suited to Spambayes, since there are 3 possible outcomes from the filter
by default, not 2 (unless you tweak it to make the "unsure" score range
vanish, but that would be silly in most cases).
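
The three-way outcome can be sketched as follows. The cutoff values 0.20 and 0.90 are what I understand the Spambayes defaults to be, and the exact treatment of scores that land on a boundary is an assumption; the function name classify is made up for this example.

```python
def classify(score, ham_cutoff=0.20, spam_cutoff=0.90):
    """Map a spam score in [0, 1] to 'ham', 'unsure' or 'spam'."""
    if score < ham_cutoff:
        return "ham"
    if score > spam_cutoff:
        return "spam"
    return "unsure"


labels = [classify(s) for s in (0.05, 0.50, 0.99)]
# Setting both cutoffs to the same value shrinks "unsure" to a single point.
two_way = [classify(s, 0.5, 0.5) for s in (0.05, 0.99)]
```

A backend interface that only accepts a yes/no answer has nowhere to put the middle label, which is the mismatch described above.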

Regards,

-- 
Florent

From tzz at lifelogs.com  Tue Nov 21 19:50:55 2006
From: tzz at lifelogs.com (Ted Zlatanov)
Date: Tue, 21 Nov 2006 18:50:55 +0000
Subject: [spambayes-dev] An alternative to spambayes.el for those using
	Gnus
In-Reply-To: <874psvrqzg.fsf@florent.maison> (Florent Rougon's message of
	"Sun\, 19 Nov 2006 23\:50\:11 +0100")
References: <87zmaxr35v.fsf@florent.maison> <g69y7qenjqf.fsf@lifelogs.com>
	<874psvrqzg.fsf@florent.maison>
Message-ID: <g69zmaktz00.fsf@lifelogs.com>

On 19 Nov 2006, f.rougon at free.fr wrote:

> Having just read it, I'm not sure the scheme implemented in spam.el fits
> well with the way I want to work with Spambayes. One of the reasons is
> that I do *not* want to train the filter on every article. To have an
> efficient Spambayes filter, experiments made by Spambayes users and
> developers have shown that it is often a good idea to only train the
> filter on its mistakes (after an initial training).

OK.  This *can* be the usage mode, but basically we leave it up to the
user, and it's a global choice.  Read on...

> [ Personally, I don't even train the filter on every mistake, because
> there are articles that I believe are too well-crafted spam: I fear
> I'll pollute my Spambayes database if I train on these articles. These
> are articles that mostly contain words that are part of my usual
> ham. ]
>
> Therefore, I wouldn't want the "spam and ham processors" to do anything
> when I exit a group. I want to carefully select which articles get to
> train the filter.

OK, then you don't want spam or ham groups, which are the only groups
where automatic action is taken.  Unclassified groups have the
behavior that only explicitly marked (by you) spam is processed by a
backend.

> As a consequence, the paragraph in the "Spam Package Introduction" node
> that reads:
>
> ,----
> | If the spam filter failed to mark a spam message, you can mark it
> | yourself, so that the message is processed as spam when you exit the
> | group:
> |
> | `M-d'
> | `M s x'
> | `S x'
> |      Mark current article as spam, showing it with the `$' mark
> |      (`gnus-summary-mark-as-spam').
> |
> | Similarly, you can unmark an article if it has been erroneously marked
> | as spam.  *Note Setting Marks::.
> `----
>
> would be misleading to users, because marking articles as ham or spam
> wouldn't make any difference in the absence of any action from the "spam
> and ham processors".

I'm not sure what you mean.  In any group, whatever articles are
marked as spam on exit are processed as spam by the group's spam
backends.  Spam groups have some extra behavior here.  If the group is
unclassified (neither ham nor spam group) then no automatic spam
marking will be done, but the processing is always done.

> There's another thing in spam.el that doesn't seem to work the way I
> want:
>
> ,----
> | The second thing that the Spam package does when you exit a group is
> | to move ham articles out of spam groups, and spam articles out of ham
> | groups.  Ham in a spam group is moved to the group specified by the
> | variable `gnus-ham-process-destinations', or the group parameter
> | `ham-process-destination'.  Spam in a ham group is moved to the group
> | specified by the variable `gnus-spam-process-destinations', or the
> | group parameter `spam-process-destination'.
> `----
>
> This means that if, e.g., I had a ham that was classified as spam and I
> mark it as ham before leaving the group, then the article will be moved
> to the group specified by `gnus-ham-process-destinations'---regardless
> of the specific article.
>
> I prefer my way of doing that: if an article is misclassified, there are
> two possibilities:
> - either I don't want to train the filter on the article (for
> instance, because several similar articles were misclassified in a
> row and I already trained the filter on one of them). In this case,
> I usually simply use 'B m' to move the article manually to the right
> group.

OK.  This doesn't interfere with the spam.el processing.

> There is another possibility that works well in the example I gave in
> the parentheses: since the filter was trained on a similar article,
> you can expect it to classify the article correctly next time;
> therefore, you can call '(flo-spambayes-gnus-classify t)' in order
> to:
>
> 1. rerun the classifier on the article;
> 2. respool it afterwards (this is because of the "t" argument).
>
> The respooled article will eventually end up in the right group
> according to `nnmail-split-methods'.

We have a 'respool spam or ham destination which will do the
respooling you describe.  You can use it in addition to any spam
backends for that group.

> - or I use 'B s' (resp. 'B h') to tell the filter "Dude, this was
> spam!" (resp. "Dude, this was ham!"), i.e., I train the filter on
> the article. These key sequences, which are mapped to lambda
> expressions evaluating '(flo-spambayes-gnus-refile-as-spam t)' and
> '(flo-spambayes-gnus-refile-as-ham t)' respectively, do two things:
>
> 1. train the filter on the article;
> 2. respool it afterwards (this is because of the "t" argument).
>
> As a consequence, the article will (most probably) end up in the
> right group, according to `nnmail-split-methods'.
>
> [ I say "most probably", because it might be that the filter was so
> badly trained in the past that it still couldn't classify the
> article correctly the second time. This never happened to me, but
> I think it's possible. ]
>
> The key point here is that in either case, if the article was, e.g.,
> something for the ding mailing-list wrongly classified as spam when the
> incoming mail was split, it will end up directly in my "ding" group
> after the corrective actions I described, not in whichever group is
> specified by `gnus-ham-process-destinations'.

I think you want immediate spam/ham processing and to see what
happened right away.  spam.el doesn't do that because it's very slow
for some filters, deferring the action to the time you exit the group
instead (batching all backend processing).  I think it could be done
for individual backends, or per group, though.

> Lastly, there's another thing I'm not sure about when reading the Info
> node:
>
> ,----
> | The Spam package divides Gnus groups into three categories: ham
> | groups, spam groups, and unclassified groups.
> `----
>
> What exactly do unclassified groups contain? With Spambayes, when you
> run an article through the classifier, it gets a spam score (between 0
> and 1) and a category depending on the spam score. There are three
> categories: ham, unsure and spam (from lowest score to highest score).
> "unsure" means the article got a score that is not low enough to be
> confident it's ham, and not high enough to be confident it's spam. But
> it surely doesn't mean the article wasn't _classified_ (i.e., it did go
> through the classifier---whose output was "unsure"). That's why I'm not
> sure the "unclassified group" mentioned in the above sentence is
> well-suited for articles marked as "unsure" by Spambayes.

Spam groups: all unread messages are marked as spam when you enter.
Unclassified groups: no extra marking is done.
Ham groups: no extra marking is done.

All other differences are for summary exit processing.  So the type of
group has to do with marking and processing, and most of the work is
aimed at making sure that spam ends up in spam groups and processed by
a spam backend, and ham outside spam groups and processed by ham
backends.

> To rephrase it differently: you said a spam backend must provide a
> function that tells whether a message is ham or spam. But this is not
> suited to Spambayes, since there are 3 possible outcomes from the filter
> by default, not 2 (unless you tweak it to make the "unsure" score range
> vanish, but that would be silly in most cases).

Actually you can also return nil, which means "unsure" :)  In the
context of nnmail-split-methods that means "go to the next method."
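
The three-way outcome and the "go to the next method" fallthrough can be
sketched in Python roughly as follows.  This is only an illustration: the
cutoff values, the function names and the group names are assumptions
made for the example, not the actual Spambayes or spam.el code.

```python
from typing import Callable, Optional, Sequence

# Illustrative cutoffs (assumed); use whatever your own filter is configured with.
HAM_CUTOFF = 0.20
SPAM_CUTOFF = 0.90


def classify(score: float) -> Optional[str]:
    """Map a spam score in [0, 1] to "ham", "spam", or None for unsure."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < HAM_CUTOFF:
        return "ham"
    if score > SPAM_CUTOFF:
        return "spam"
    return None  # unsure: let the caller decide what to do next


def spam_split(score: float) -> Optional[str]:
    """Hypothetical split method: name a group only when confident it is spam."""
    return "spam" if classify(score) == "spam" else None


def first_match(score: float,
                methods: Sequence[Callable[[float], Optional[str]]],
                default: str = "mail.misc") -> str:
    """Try each split method in order; None means "go to the next method"."""
    for method in methods:
        group = method(score)
        if group is not None:
            return group
    return default
```

With these assumed cutoffs, a score of 0.5 makes classify() return None, so
first_match() falls through to the next method (or to the default group).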

spam.el tries to be very flexible, and the rules are aimed at making
the user's life easier.  If you think the docs or the workflow are
confusing, I'll be glad to take any suggestions you have.

Ted

From greetings at webmail.2000Greetings.com  Wed Nov 22 16:42:36 2006
From: greetings at webmail.2000Greetings.com (2000Greetings.com)
Date: Wed, 22 Nov 2006 16:42:36 +0100 (CET)
Subject: [spambayes-dev] you have received a 2000Greetings Card...
Message-ID: <20061122154236.6EC15833422@p15139065.pureserver.info>

An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20061122/8af9df6a/attachment.htm 

From koester at zpost.plala.or.jp  Thu Nov 23 12:12:36 2006
From: koester at zpost.plala.or.jp (koester)
Date: Thu, 23 Nov 2006 20:12:36 +0900
Subject: [spambayes-dev] Setup Problem
Message-ID: <000601c70ef0$484b9000$0400a8c0@landc9cc55d4f3>

In Outlook Express I cannot get to localhost:8880/config.
The web browser says: page not found.
I did exactly what your manual says.

Regards, Stefan Koster-Hirose

PC 072-0222
Stefan Koester-Hirose
Biei-cho Aza Mita Dai2
Hokkaido - Japan


From engineer.xyliu at gmail.com  Tue Nov 28 07:46:37 2006
From: engineer.xyliu at gmail.com (engineer.xyliu)
Date: Tue, 28 Nov 2006 14:46:37 +0800
Subject: [spambayes-dev] hi,nice to meet you.
Message-ID: <b331aba90611272246rbfc4130q53061e5b23b9f557@mail.gmail.com>

Hi,
   I'm doing research on anti-spam on my own, and I need a lot of spam
samples. I have tried hard to attract spam, but so far I only get about
300 spam messages per day.
   So, could you kindly help me with spam collection? I use
engineer.xyliu at gmail.com to collect spam. Please send as many as possible to
that mailbox. Or, if you have a spam archive, please send it to that
mailbox too.
   Thanks in advance!
   Regards.
                       engineer.xyliu at gmail

From richie at entrian.com  Wed Nov 29 00:14:08 2006
From: richie at entrian.com (Richie Hindle)
Date: Tue, 28 Nov 2006 23:14:08 +0000
Subject: [spambayes-dev] hi,nice to meet you.
In-Reply-To: <b331aba90611272246rbfc4130q53061e5b23b9f557@mail.gmail.com>
References: <b331aba90611272246rbfc4130q53061e5b23b9f557@mail.gmail.com>
Message-ID: <oggpm2lgjsmmqcpvrrj9p1tnppoqsa9mtc@4ax.com>

Hi,

[engineer.xyliu]
> I need a lot of spam samples.

I have a collection of about 41,000 spams collected over three years.  I
can't guarantee it's 100% clean, but it's all been hand-sorted so it will be
very close.  Note that some of the messages have Spambayes headers in them.

You can download it here: http://entrian.com/spam.zip

-- 
Richie Hindle
richie at entrian.com

From skip at pobox.com  Wed Nov 29 02:33:46 2006
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 28 Nov 2006 19:33:46 -0600
Subject: [spambayes-dev] hi,nice to meet you.
In-Reply-To: <oggpm2lgjsmmqcpvrrj9p1tnppoqsa9mtc@4ax.com>
References: <b331aba90611272246rbfc4130q53061e5b23b9f557@mail.gmail.com>
	<oggpm2lgjsmmqcpvrrj9p1tnppoqsa9mtc@4ax.com>
Message-ID: <17772.58234.565664.80455@montanaro.dyndns.org>


    Richie> [engineer.xyliu]
    >> I need a lot of spam samples.

    Richie> I have a collection of about 41,000 spams ...
    Richie> You can download it here: http://entrian.com/spam.zip

I have 17000+ spams collected since this July that I am currently uploading
to my web server (lots of duplicates and near duplicates because several
email addresses funnel into my email address).  Like Richie's collection,
it's going to have some SpamBayes headers.  It's going to take a little
while to complete the upload because it's going upstream on a cable modem
link.  Once it's done it will be at

    http://orca.mojam.com/~skip/hispam.NNN.gz

where NNN ranges from 000 through 018 inclusive.
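
For anyone scripting the download, the file names can be expanded with a
short Python sketch like the one below.  It only builds the list of URLs
from the pattern above; the function name is made up for the example, and
it assumes NNN is always zero-padded to three digits as shown.

```python
# URL pattern from the message above; {:03d} zero-pads NNN to three digits.
URL_PATTERN = "http://orca.mojam.com/~skip/hispam.{:03d}.gz"


def archive_urls(first: int = 0, last: int = 18) -> list:
    """Return the archive URLs for NNN = first..last, inclusive."""
    return [URL_PATTERN.format(n) for n in range(first, last + 1)]


if __name__ == "__main__":
    for url in archive_urls():
        print(url)
```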

Skip