From ta-meyer at ihug.co.nz  Sun Jan  2 06:13:58 2005
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Sun Jan  2 06:14:37 2005
Subject: [spambayes-dev] SpamBayes i18n
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCBC1@its-xchg4.massey.ac.nz>

An update on the SpamBayes i18n progress:

 1.  I've checked in the changes to the main spambayes code to work with
gettext.  I haven't extensively checked this, so there may be bits that need
work.  I haven't made any change to the loading of the translation manager,
so it'll still be looking for an outlook_addin.mo file.

(As an aside: I think maybe one large messages.po file may be best - as time
progresses the overlap between the web interface and the Outlook plug-in
grows, and so most messages need to be translated for both, even though
that's a lot of work).

 2.  I've written up a "how to translate" section in README-DEVEL.txt.  I
think everything in there is correct, but I haven't actually done any
translation work, so there may be errors.  It would be great if people
interested in doing translations could work through the steps outlined and
let me know if there are problems/mistakes.

<http://cvs.sourceforge.net/viewcvs.py/*checkout*/spambayes/spambayes/README-DEVEL.txt?rev=1.17>

(The link will work once anonymous CVS catches up).
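
As a rough illustration of the loading step mentioned in item 1, here is a
minimal Python sketch using the standard gettext module.  The "languages"
directory and the sample string are assumptions made for the example, not
taken from the SpamBayes source; only the outlook_addin domain name comes
from the text above.

```python
import gettext


def load_translation(domain, locale_dir, languages):
    """Return a translation object for `domain`, or a pass-through fallback.

    With fallback=True, gettext.translation() returns a NullTranslations
    instance instead of raising when no matching .mo file is found.
    """
    return gettext.translation(
        domain, localedir=locale_dir, languages=languages, fallback=True
    )


# "languages" is a hypothetical directory name.  Passing "messages" as the
# domain instead would correspond to the single-catalogue idea.
translation = load_translation("outlook_addin", "languages", ["fr_FR"])
_ = translation.gettext
print(_("Example message"))
```

With no matching .mo file present, the fallback object returns each string
unchanged, so the English text is shown.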

Thanks again to everyone that is willing to help with this effort!

=Tony.Meyer

From theller at python.net  Mon Jan  3 09:57:45 2005
From: theller at python.net (Thomas Heller)
Date: Mon Jan  3 09:56:27 2005
Subject: [spambayes-dev] sb_imapfilter fix
Message-ID: <r7l2dh4m.fsf@python.net>

Just guesswork:

Index: sb_imapfilter.py
===================================================================
RCS file: /cvsroot/spambayes/spambayes/scripts/sb_imapfilter.py,v
retrieving revision 1.50
diff -c -r1.50 sb_imapfilter.py
*** sb_imapfilter.py	23 Dec 2004 18:14:32 -0000	1.50
--- sb_imapfilter.py	3 Jan 2005 08:55:26 -0000
***************
*** 1087,1093 ****
              imap = IMAPSession(server, port, imapDebug, doExpunge)
  
          # Load stats manager.
!         stats = Stats(options, message_db)
          
          httpServer = UserInterfaceServer(options["html_ui", "port"])
          httpServer.register(IMAPUserInterface(classifier, imap, pwd,
--- 1087,1093 ----
              imap = IMAPSession(server, port, imapDebug, doExpunge)
  
          # Load stats manager.
!         stats = Stats.Stats(options, message_db)
          
          httpServer = UserInterfaceServer(options["html_ui", "port"])
          httpServer.register(IMAPUserInterface(classifier, imap, pwd,
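
The one-line change matters if the script imports a module named Stats
rather than the class inside it.  That is an assumption, since the import
line is not part of the diff.  A minimal Python 3 sketch with a stand-in
module (fake_stats_module and the class body are hypothetical):

```python
import types

# Hypothetical stand-in for a module named Stats that defines a class Stats.
fake_stats_module = types.ModuleType("Stats")


class _Stats:
    def __init__(self, options, message_db):
        self.options = options
        self.message_db = message_db


fake_stats_module.Stats = _Stats


def call_module_directly():
    """Mimic the old line: call the module object itself."""
    try:
        fake_stats_module({}, None)
    except TypeError as exc:
        return str(exc)
    return None


def call_class_in_module():
    """Mimic the new line: call the class inside the module."""
    return fake_stats_module.Stats({}, None)
```

Under that assumption the old line fails with a TypeError because a module
object is not callable, while the new line builds the object.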

From nviry at kerberos.fr  Mon Jan  3 23:31:32 2005
From: nviry at kerberos.fr (nviry@kerberos.fr)
Date: Mon Jan  3 23:31:35 2005
Subject: [spambayes-dev] Translation
Message-ID: <3248.81.56.10.70.1104791492.squirrel@www.kerberos.fr>

Hi,

sorry to spam you with this message. It contains the .rc (outlook plugin)
translated into french.
I'm new to the CVS tools so I think I need a password to write directly
the file on the server, this is why I send you the file directly.
I used vi to translate the file.
HTML file will be next.

As I'm not an expert in .rc files, please double check the following lines :
> LANGUAGE LANG_FRENCH, SUBLANG_FRENCH_FR (2 times)

lines like this one were not translated, do they need translation ?
> CONTROL "Folder names...\nLine 2

Thanks, next post in a few days
Nicolas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dialogs.rc
Type: application/octet-stream
Size: 33539 bytes
Desc: not available
Url : http://mail.python.org/pipermail/spambayes-dev/attachments/20050103/5f0e2bc5/dialogs-0001.obj
From tameyer at ihug.co.nz  Tue Jan  4 01:16:29 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Jan  4 01:17:41 2005
Subject: [spambayes-dev] sb_imapfilter fix
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801C2BDF3@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCBCC@its-xchg4.massey.ac.nz>

> Just guesswork:
> 
> Index: sb_imapfilter.py 
[...]
> !         stats = Stats(options, message_db)
[...]
> !         stats = Stats.Stats(options, message_db)

Thanks; fixed - and I've added a test to test_sb_imapfilter.py to check that
the web interface works.

=Tony.Meyer

From market at cc.wwu.edu  Tue Jan  4 02:01:42 2005
From: market at cc.wwu.edu (TJ Olney)
Date: Tue Jan  4 02:01:45 2005
Subject: [spambayes-dev] sb_mboxtrain.py trashes some pine mailboxes
 interpreting them as only one message
Message-ID: <41D9EAF6.6080907@cc.wwu.edu>

Then leaving only the first message behind.
Is this a known problem?

Since it worked fine with the first few -g mailboxes I tried, I was 
pretty confident, but when I tried it on a couple of huge mailing list 
subscription files, it choked and left behind only the first message. 
This is unix and pine 4.61 uw-imapd.

It appears that this happens when an older form of email is in the 
folder that might not have the appropriate headers.
like this:
> ^?^?    (Fwd) Re: (Fwd) Re: Relationships Gone Sour: Divorce?
> 
> From @uga.cc.uga.edu:owner-crm-l@EMUVM1.CC.EMORY.EDU T

It then deletes all the rest of the mailbox.

My fault for using folders with such old messages in them, but I thought 
you should know.

I'm really looking forward to having this working!

TJ Olney

From hatukanezumi at users.sourceforge.net  Tue Jan  4 02:41:25 2005
From: hatukanezumi at users.sourceforge.net (Hatuka*nezumi)
Date: Tue Jan  4 02:41:34 2005
Subject: [spambayes-dev] Some problems about i18n
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801B0F4DE@its-xchg4.massey.ac.nz>
References: <ECBA357DDED63B4995F5C1F5CBE5B1E801B25D4E@its-xchg4.massey.ac.nz>
	<ECBA357DDED63B4995F5C1F5CBE5B1E801B0F4DE@its-xchg4.massey.ac.nz>
Message-ID: <20050104104125.7ee787f3.hatukanezumi@users.sourceforge.net>

On Wed, 22 Dec 2004 15:01:54 +1300
"Tony Meyer" <tameyer@ihug.co.nz> wrote:

> I had hoped that Hatuka Nezumi would have responded to the earlier message,
> but I haven't heard anything from him for a while (busy, perhaps).  He is
> leading the i18n process for SpamBayes (I'm helping and doing the checking
> in).

Sorry for no response.  I'm on New Year ('shogatsu' in Japanese)
vacation till 6 January.  I'll be back next week.

Problems for Japanese/CJK:
1. Recommended charset of Japanese e-mail message is ISO-2022-JP 
  (cf. RFC1468).  This charset isn't suitable for XML/XHTML parser
  and isn't compatible with Windows ANSI codepage (CP932 for 
  Japanese).
2. ISO-2022-* aren't suitable for the spambayes tokenizer either.
3. More than one charset may be used for messages of one language
  (e.g. ISO-8859-*, UTF-8 and UTF-7 for West-Latin.
  ISO-2022-JP, Shift_JIS, EUC-JP, UTF-8 and UTF-7 for Japanese).
4. In some East-asian languages (Japanese or Chinese), words are
  not space-separated, so they won't be effectively tokenized.

Patch #824651 tries to solve these problems.

For the current i18n work, at least problem 1. should be solved.

I am planning to provide sub-patches related to each problem
(except problem 4.), converting message headers/bodies to suitable 
charset for tokenizer (Unicode), web interface (e.g. UTF-8) and 
Outlook plug-in (mbcs).  This solution also will provide really 
i18n'ized message handling.
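
The conversion chain described here might look roughly like the following
Python 3 sketch.  It is not taken from patch #824651; the function names are
hypothetical, and cp932 stands in for "mbcs", which Python only provides on
Windows.

```python
def to_unicode(raw_bytes, charset):
    """Decode message bytes to str; undecodable bytes become U+FFFD."""
    return raw_bytes.decode(charset, errors="replace")


def for_web_interface(text):
    """Encode tokenizer-side text as UTF-8 for the web interface."""
    return text.encode("utf-8")


def for_outlook_plugin(text, ansi_codec="cp932"):
    """Encode text in the Windows ANSI code page (CP932 for Japanese)."""
    return text.encode(ansi_codec, errors="replace")


# Three katakana characters, written as escapes to keep this text ASCII.
original = "\u30b9\u30d1\u30e0"
wire_bytes = original.encode("iso2022_jp")   # 7-bit bytes with escape sequences
text = to_unicode(wire_bytes, "iso-2022-jp")
```

The point of going through Unicode is that each consumer (tokenizer, web
interface, Outlook plug-in) then picks its own output charset independently
of the charset the message arrived in.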

Note that this solution can require bind_textdomain_codeset 
function for overlapping gettext catalog of web interface and 
Outlook plug-in.  But I'm not familiar with this function...

  --- nezumi
From tameyer at ihug.co.nz  Tue Jan  4 02:59:16 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Jan  4 02:59:52 2005
Subject: [spambayes-dev] Translation
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801C2BDFA@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCBCD@its-xchg4.massey.ac.nz>

> sorry to spam you with this message.

This is definitely not spam!

> It contains the .rc (outlook plugin) translated into french.

Great; thanks!

> I'm new to the CVS tools so I think I need a password to 
> write directly the file on the server, this is why I send
> you the file directly.

Yes, only the developers can make changes to CVS.  Sending patches/files
here is fine, or alternatively, you could submit a patch via the sourceforge
system: <http://sf.net/projects/spambayes>

> I used vi to translate the file.
> HTML file will be next.

Great :)  It all looks fine, although I had to make a few size changes where
the French was larger than the English.  I've checked this in, so CVS users
ought to be able to set their desired language to fr_FR and have the dialogs
(mostly) in French!

> As I'm not an expert in .rc files, please double check the 
> following lines :
> > LANGUAGE LANG_FRENCH, SUBLANG_FRENCH_FR (2 times)

It didn't matter for SpamBayes, but VC++ chokes on those, for some reason.
I've left them as the English versions, since we don't use it.

> lines like this one were not translated, do they need translation ?
> > CONTROL "Folder names...\nLine 2

Nope - you're right in leaving those (well, it wouldn't matter if they
were).  They're just placeholders for dynamically generated text.

Thanks heaps for the contribution!

=Tony.Meyer

From kenny.pitt at gmail.com  Wed Jan  5 16:11:22 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Wed Jan  5 16:11:27 2005
Subject: [spambayes-dev] Training problem in latest CVS
Message-ID: <41dc039b.23bd3560.3e0a.0ee2@smtp.gmail.com>

I trained an Unsure message this morning and was surprised that the score
didn't seem to change after it was moved back to my Inbox.  I looked in the
log file and found the following interesting lines:

"""
Bayes database initialized with 366 spam and 218 good messages
...
Moving and spam training message 'Who's Winning? ' -  Training on message
'Who's Winning? ' in 'Personal Folders/Possible Spam -  already was trained
as spam
Saving bayes database with 366 spam and 218 good messages
...
Recovering to folder 'Inbox' and ham training message 'Delta cuts U.S. air
fares up to 50% ' -  Training on message 'Delta cuts U.S. air fares up to
50% ' in 'Personal Folders/Possible Spam -  already was trained as good
Saving bayes database with 366 spam and 218 good messages
"""

Both of these were just-received messages that were classified as Unsure,
but notice that SpamBayes thinks they had already been trained.  Looks like
a bug may have snuck in with how we detect the training status of a message.
I'll look into it when I get a chance, but I'm hoping Tony will know what to
do since he has done a lot of work with the message info database lately.

-- 
Kenny Pitt

From tameyer at ihug.co.nz  Fri Jan  7 04:35:07 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Jan  7 04:35:37 2005
Subject: [spambayes-dev] Training problem in latest CVS
In-Reply-To: <41dc039b.23bd3560.3e0a.0ee2@smtp.gmail.com>
Message-ID: <JIEMKPKMIMNMNCGKKKGOEEICCAAA.tameyer@ihug.co.nz>

> I trained an Unsure message this morning and was surprised that the score
> didn't seem to change after it was moved back to my Inbox.

I noticed this a couple of days ago, too, but didn't have the time to look
into it just then.

[...]
> I'll look into it when I get a chance, but I'm hoping Tony will know what to
> do since he has done a lot of work with the message info database lately.

Feel free to leave it if you want - I have a reasonable idea of what is
going wrong (or "how I broke things") and what needs to be fixed.  I'd do it
now, but the Outlook installation on my main machine is broken and unusable,
and I'm still trying to figure out how to fix it :(.  Once that's done (I
have hopes that it will be today, but if not, then Monday is the next time
I'll get a chance) I'll look into this.

=Tony.Meyer

From nviry at kerberos.fr  Sun Jan  9 18:57:08 2005
From: nviry at kerberos.fr (nviry@kerberos.fr)
Date: Sun Jan  9 18:57:12 2005
Subject: [spambayes-dev] Translation - HTML file
Message-ID: <3129.81.56.10.70.1105293428.squirrel@www.kerberos.fr>

Hi,

here is the html file attached.

I can translate the web site now. I've wgeted the web pages but they seem
to have been generated. Are the pages in the cvs directory ? If not, where
can I find them ?

Nicolas

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.python.org/pipermail/spambayes-dev/attachments/20050109/b77e439c/ui-0001.html
From tameyer at ihug.co.nz  Mon Jan 10 03:34:07 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Mon Jan 10 03:34:54 2005
Subject: [spambayes-dev] Translation - HTML file
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801C2C12B@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCBE5@its-xchg4.massey.ac.nz>

> here is the html file attached.

Great - thanks!  I've checked this in.

> I can translate the web site now. I've wgeted the web pages 
> but they seem to have been generated. Are the pages in the 
> cvs directory ? If not, where can I find them ?

Yes they are in CVS.  There is a "website" module that you can checkout
which has the source in it.  (They are .ht files, apart from the FAQ, which
is .txt).

There hasn't been a translation of the SpamBayes website before.  I guess
the translation can just be a mirror in a (eg) fr directory?  So something
like:

<http://spambayes.org/fr/index.html>
<http://spambayes.org/fr/contact.html>

And then have links to the translations on the main (English) page at
<http://spambayes.org/index.html>?

If that sounds like the right way to go, then I can make the necessary
adjustments to the scripts that generate the html and copy it to the live
site.  (Whatever happens, if you just provide the translations, then the
rest of us can figure out where to put them).

Thanks again for the contribution!

=Tony.Meyer

From lenneis at wu-wien.ac.at  Mon Jan 10 19:20:30 2005
From: lenneis at wu-wien.ac.at (Joerg Lenneis)
Date: Mon Jan 10 19:31:34 2005
Subject: [spambayes-dev] sb_mailsort.py status
Message-ID: <vdbmzvh16z5.fsf@lyric.example.com>

Dear all,

I have only last week started to use Spambayes and I am very impressed
so far. This is my first attempt at spam filtering. I finally gave up,
my mail address has been around and used for ages, so without
filtering I get an insane amount of spam. I feared a not insignificant
number of false positives, but so far things have worked very well,
with no message classified as a false positive.

I use sb_mailsort.py for training and filtering (use a CDB database
for probabilities, sort into one of two Maildirs depending on whether a
message is above the spam threshold or not) because it gives me a
failproof way of updating the probabilities and delivery of mails,
even over NFS. I am not concerned about the additional overhead that
the database is reconstructed from scratch on every training session.

I have noticed from CVS that sb_mailsort.py is somewhat dated now,
with the last update about 5 months ago. There are a couple of things
that might be useful, like being able to set the spam threshold via
the command line. The Maildir algorithm could also be adapted somewhat
to conform more closely to the specification. 

Are patches to do this welcome here, or alternatively, is the original
author still interested to continue work on sb_mailsort.py?

best regards,


-- 

Joerg Lenneis

email: lenneis@wu-wien.ac.at

From nas at arctrix.com  Mon Jan 10 21:00:58 2005
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon Jan 10 21:01:02 2005
Subject: [spambayes-dev] sb_mailsort.py status
In-Reply-To: <vdbmzvh16z5.fsf@lyric.example.com>
References: <vdbmzvh16z5.fsf@lyric.example.com>
Message-ID: <20050110200057.GB585@mems-exchange.org>

On Mon, Jan 10, 2005 at 07:20:30PM +0100, Joerg Lenneis wrote:
> I have noticed from CVS that sb_mailsort.py is somewhat dated now,
> with the last update about 5 months ago.

That's because it's such high quality code. ;-)

> There are a couple of things that might be useful, like being able
> to set the spam threshold via the command line. The Maildir
> algorithm could also be adapted somewhat to conform more closely
> to the specification. 
> 
> Are patches to do this welcome here, or alternatively, is the original
> author still interested to continue work on sb_mailsort.py?

Patches are definitely welcome.  I'm extremely busy at the moment
but I will be glad to review changes.

I actually don't use sb_mailsort.py anymore.  The problem was that I
was receiving so much crap that I no longer could review the spam
folder.  I now have a SMTP reverse proxy that uses Spambayes and a
CDB database.  Bouncing high scoring spam is necessary for me
because I can't review it all.

  Neil
From sethg at GoodmanAssociates.com  Mon Jan 10 21:38:54 2005
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Mon Jan 10 21:38:55 2005
Subject: [spambayes-dev] sb_mailsort.py status
In-Reply-To: <20050110200057.GB585@mems-exchange.org>
Message-ID: <MHEGIFHMACFNNIMMBACAIEIGJEAA.sethg@GoodmanAssociates.com>

> From: Neil Schemenauer
> Sent: Monday, January 10, 2005 2:01 PM

<...>

> I now have a SMTP reverse proxy that uses Spambayes and a
> CDB database.  Bouncing high scoring spam is necessary for me
> because I can't review it all.

That's great!  I hope when you say "bounce" you actually mean reject at the
end of data.  Rejecting unwanted messages is the way to go, because even in
the rare event of a false positive, the sender gets a DSN and there's no
backscatter.  If that's what you are doing, please ignore the following,
which is for those who still send DSN's for spam.

Bouncing spam after acceptance is a real problem, even though false
positives would still get a DSN.  The problem is that in the majority of
spam, both the MAIL FROM: and the From: addresses are forged.  Sending a
bounce just abuses innocent third parties, in addition to giving the spammer
a second chance to get their payload delivered.
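
To make the distinction concrete, here is a minimal Python sketch of the
decision taken at the end of DATA.  The function name, the 0.99 threshold
and the reply texts are assumptions for illustration only.

```python
def end_of_data_reply(spam_score, reject_threshold=0.99):
    """Choose the SMTP reply sent after the client's final "." line.

    A 5xx reply means the message was never accepted, so responsibility
    for any DSN stays with the connecting server and not with us.
    """
    if spam_score >= reject_threshold:
        return "550 5.7.1 Message rejected as spam"
    return "250 2.0.0 OK"
```

Because the refusal happens inside the SMTP session, no separate bounce
message is ever addressed to the (possibly forged) envelope sender.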

--

Seth Goodman

From tameyer at ihug.co.nz  Mon Jan 10 23:20:12 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Mon Jan 10 23:20:50 2005
Subject: [spambayes-dev] Training problem in latest CVS
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801C2C2D3@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCBE8@its-xchg4.massey.ac.nz>

[Kenny Pitt]
> I trained an Unsure message this morning and was surprised 
> that the score didn't seem to change after it was moved back
> to my Inbox.

Ok, this ought to be fixed now.  Apologies for the delay - fixing my Outlook
install (dead profile, corrupted mapi32.dll) took longer than expected.

=Tony.Meyer

From popiel at wolfskeep.com  Mon Jan 10 23:35:39 2005
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Mon Jan 10 23:35:42 2005
Subject: [spambayes-dev] sb_mailsort.py status 
In-Reply-To: Message from "Seth Goodman" <sethg@goodmanassociates.com> 
	of "Mon, 10 Jan 2005 14:38:54 CST."
	<MHEGIFHMACFNNIMMBACAIEIGJEAA.sethg@GoodmanAssociates.com> 
References: <MHEGIFHMACFNNIMMBACAIEIGJEAA.sethg@GoodmanAssociates.com> 
Message-ID: <20050110223539.49FD12DDB6@cashew.wolfskeep.com>

In message:  <MHEGIFHMACFNNIMMBACAIEIGJEAA.sethg@GoodmanAssociates.com>
             "Seth Goodman" <sethg@goodmanassociates.com> writes:
>
>Bouncing spam after acceptance is a real problem, even though false
>positives would still get a DSN.  The problem is that in the majority of
>spam, both the MAIL FROM: and the From: addresses are forged.  Sending a
>bounce just abuses innocent third parties, in addition to giving the spammer
>a second chance to get their payload delivered.

Unfortunately, bouncing spam after acceptance is increasingly unavoidable
for anyone who has a backup MX host as insurance against their primary
host being down.  Many spammers are targeting the secondary MX instead
of the primary MX... and a secondary MX sufficiently isolated from the
primary to actually be useful as a failover is likely to just accept,
queue, and relay.  When the primary then rejects the message from the
secondary, the secondary is stuck trying to deliver a DSN.

I'm currently working on a hack to postfix such that locally-generated
DSN messages cannot get deferred (the RFC says that you must generate
a DSN, but doesn't say that you have to try to deliver it more than once).
This will at least prevent my secondary MX from crumbling under the load
of bouncing spam sent to nonexistent addresses on my primary.  (These
spam DSNs frequently end up deferred because the purported source either
doesn't exist or issues a 400-series response to trying to deliver the
DSN... and the retries of these deferrals for 4 days is what pushes my
secondary over the edge.)

- Alex, peeved at having to hack his mail server because of the spammers

PS. No, I'm not willing to not have a secondary MX.  My primary does
    crash occasionally, though (thankfully) not as much as it used to
    before I replaced the motherboard.
From kenny.pitt at gmail.com  Mon Jan 10 23:43:14 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Mon Jan 10 23:43:17 2005
Subject: [spambayes-dev] Training problem in latest CVS
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCBE8@its-xchg4.massey.ac.nz>
Message-ID: <41e30503.12405188.0671.130f@smtp.gmail.com>

Tony Meyer wrote:
> [Kenny Pitt]
>> I trained an Unsure message this morning and was surprised
>> that the score didn't seem to change after it was moved back
>> to my Inbox.
> 
> Ok, this ought to be fixed now.  Apologies for the delay - fixing my
> Outlook install (dead profile, corrupted mapi32.dll) took longer than
> expected. 

Works for me.  Thanks, Tony.

If anyone else verifies this fix, just make sure you don't try it on a
message that you had problems with before the fix.  The damage is already
done.

-- 
Kenny Pitt

From nas at arctrix.com  Mon Jan 10 23:44:42 2005
From: nas at arctrix.com (Neil Schemenauer)
Date: Mon Jan 10 23:44:45 2005
Subject: [spambayes-dev] sb_mailsort.py status
In-Reply-To: <20050110223539.49FD12DDB6@cashew.wolfskeep.com>
References: <MHEGIFHMACFNNIMMBACAIEIGJEAA.sethg@GoodmanAssociates.com>
	<20050110223539.49FD12DDB6@cashew.wolfskeep.com>
Message-ID: <20050110224442.GB1491@mems-exchange.org>

On Mon, Jan 10, 2005 at 02:35:39PM -0800, T. Alexander Popiel wrote:
> When the primary then rejects the message from the secondary, the
> secondary is stuck trying to deliver a DSN.

You really should not generate DSNs, IMHO.  They will very likely be
sent to forged From addresses.  In that case, they are as bad as
spam.

> PS. No, I'm not willing to not have a secondary MX.  My primary does
>     crash occasionally, though (thankfully) not as much as it used to
>     before I replaced the motherboard.

I don't see how that makes a secondary MX necessary.  The sending
servers have outgoing queues and they will retry.

  Neil
From sethg at GoodmanAssociates.com  Tue Jan 11 01:43:21 2005
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Tue Jan 11 01:43:22 2005
Subject: [spambayes-dev] sb_mailsort.py status 
In-Reply-To: <20050110223539.49FD12DDB6@cashew.wolfskeep.com>
Message-ID: <MHEGIFHMACFNNIMMBACAKEIMJEAA.sethg@GoodmanAssociates.com>

> From: T. Alexander Popiel [mailto:popiel@wolfskeep.com]
> Sent: Monday, January 10, 2005 4:36 PM

<...>

> Unfortunately, bouncing spam after acceptance is increasingly unavoidable
> for anyone who has a backup MX host as insurance against their primary
> host being down.

Anyone who runs a secondary MX with less security than the primary, i.e. no
list of real mailboxes, no FCrDNS, etc., might as well drop their anti-spam
measures on the primary, because:


> Many spammers are targeting the secondary MX instead of the primary
> MX...

as would anyone who wants to deliver a message that you don't want to accept
and therefore seeks out the MX with the weakest security.


> and a secondary MX sufficiently isolated from the primary to actually
> be useful as a failover is likely to just accept, queue, and relay.

It is perfectly reasonable to establish a secondary in another facility that
still has the same list of real mailboxes and the same incoming policy as
the primary.  This might rule out some providers of backup MX services, but
not all.  If you're a large operation, you should have complete control over
your secondary.  If you're a small operation, you might want to rethink if
it is really necessary to have a secondary MX if it is not possible to clone
the setup of your primary.  Hardware and network connections are more
reliable than they used to be and senders will queue your mail upon
temporary failures.


> When the primary then rejects the message from the
> secondary, the secondary is stuck trying to deliver a DSN.

This is exactly the situation you never want to be in.  A spammer should get
the same rejection at your secondary that they get at your primary.


<...>

> (These spam DSNs frequently end up deferred because the purported
> source either doesn't exist or issues a 400-series response to
> trying to deliver the DSN... and the retries of these deferrals for
> 4 days is what pushes my secondary over the edge.)

Another possibility is that the systems to whom you are sending bogus DSN's
are teergrubing you (forcing you to keep a socket and process alive for a
long time) as punishment for abuse.  Check your logs to see if these 4xx
transactions are taking a very long time.  Operating any MX in store and
forward mode and sending out DSN's to return addresses on spam that you
haven't confirmed are good during the SMTP session can easily turn you into
a spam reflector.  Even if the envelope return addresses on spam are valid,
they are likely to be joe-job forgeries, so you still don't want to send
DSN's in response to spam.

--

Seth Goodman

From sethg at GoodmanAssociates.com  Tue Jan 11 02:41:17 2005
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Tue Jan 11 02:41:16 2005
Subject: [spambayes-dev] sb_mailsort.py status 
In-Reply-To: <20050110223539.49FD12DDB6@cashew.wolfskeep.com>
Message-ID: <MHEGIFHMACFNNIMMBACAKEINJEAA.sethg@GoodmanAssociates.com>

> From: T. Alexander Popiel [mailto:popiel@wolfskeep.com]
> Sent: Monday, January 10, 2005 4:36 PM

<...>

> PS. No, I'm not willing to not have a secondary MX.  My primary does
>     crash occasionally, though (thankfully) not as much as it used to
>     before I replaced the motherboard.

If you can't clone your primary setup onto your secondary and you can't live
without a secondary, here's another possibility.  Only accept mail at the
secondary when the primary is down.  This should greatly limit the damage,
since your primary will rarely be down.  If you refuse a connection at your
secondary MX and they don't retry at your primary, you can be pretty sure it
wasn't real mail.
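
A minimal Python sketch of that idea, assuming the secondary can probe the
primary's SMTP port directly.  The function names, the host name in the
greeting and the reply texts are hypothetical.

```python
import socket


def primary_is_up(host, port=25, timeout=5.0):
    """Return True if a TCP connection to the primary MX succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def secondary_greeting(primary_up):
    """Pick the secondary's greeting based on the primary's state."""
    if primary_up:
        # Temporary failure: a real sender will retry at the primary.
        return "421 4.3.2 Primary MX is available, please deliver there"
    return "220 secondary.example ESMTP ready"
```

Note that the probe only shows whether the secondary can reach the primary,
which is not necessarily the same as whether the sender can.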

--

Seth Goodman

From popiel at wolfskeep.com  Tue Jan 11 04:29:27 2005
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Tue Jan 11 04:29:31 2005
Subject: [spambayes-dev] sb_mailsort.py status 
In-Reply-To: Message from Neil Schemenauer <nas@arctrix.com> of "Mon,
	10 Jan 2005 17:44:42 EST." <20050110224442.GB1491@mems-exchange.org> 
References: <MHEGIFHMACFNNIMMBACAIEIGJEAA.sethg@GoodmanAssociates.com>
	<20050110223539.49FD12DDB6@cashew.wolfskeep.com>
	<20050110224442.GB1491@mems-exchange.org> 
Message-ID: <20050111032927.9079B2DDF3@cashew.wolfskeep.com>

In message:  <20050110224442.GB1491@mems-exchange.org>
             Neil Schemenauer <nas@arctrix.com> writes:
>On Mon, Jan 10, 2005 at 02:35:39PM -0800, T. Alexander Popiel wrote:
>> When the primary then rejects the message from the secondary, the
>> secondary is stuck trying to deliver a DSN.
>
>You really should not generate DSNs, IMHO.  They will very likely be
>sent to forged From addresses.  In that case, they are as bad as
>spam.

RFC 2821 requires DSNs if a site has accepted a message that is
subsequently discovered to be undeliverable:

# If an SMTP server has accepted the task of relaying the mail and
# later finds that the destination is incorrect or that the mail cannot
# be delivered for some other reason, then it MUST construct an
# "undeliverable mail" notification message and send it to the
# originator of the undeliverable mail (as indicated by the reverse-
# path).  Formats specified for non-delivery reports by other standards
# (see, for example, [24, 25]) SHOULD be used if possible.

Personally, I'm not willing to allow other people's anti-social
behavior to induce me to violate clearly specified standards.

>> PS. No, I'm not willing to not have a secondary MX.  My primary does
>>     crash occasionally, though (thankfully) not as much as it used to
>>     before I replaced the motherboard.
>
>I don't see how that makes a secondary MX necessary.  The sending
>servers have outgoing queues and they will retry.

Not all senders have outgoing queues (in particular, some mail clients
insist on trying to send mail direct to the destination, and have no
facility to queue until the destination is available).  Moreover, there
are times when my machine has been down for several days with hardware
failures, and while I can control the queue expiry on my secondary MX
(setting it to 30 days or so, when I know I'm going to be down a while),
I cannot control the expire times on any sender's queues.

- Alex
From popiel at wolfskeep.com  Tue Jan 11 04:48:34 2005
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Tue Jan 11 04:48:36 2005
Subject: [spambayes-dev] sb_mailsort.py status 
In-Reply-To: Message from "Seth Goodman" <sethg@GoodmanAssociates.com> 
	of "Mon, 10 Jan 2005 18:43:21 CST."
	<MHEGIFHMACFNNIMMBACAKEIMJEAA.sethg@GoodmanAssociates.com> 
References: <MHEGIFHMACFNNIMMBACAKEIMJEAA.sethg@GoodmanAssociates.com> 
Message-ID: <20050111034834.DF6CA2DDB6@cashew.wolfskeep.com>

In message:  <MHEGIFHMACFNNIMMBACAKEIMJEAA.sethg@GoodmanAssociates.com>
             "Seth Goodman" <sethg@GoodmanAssociates.com> writes:
>> From: T. Alexander Popiel [mailto:popiel@wolfskeep.com]
>> Sent: Monday, January 10, 2005 4:36 PM
>
><...>
>
>> Unfortunately, bouncing spam after acceptance is increasingly unavoidable
>> for anyone who has a backup MX host as insurance against their primary
>> host being down.

[...]

>It is perfectly reasonable to establish a secondary in another facility that
>still has the same list of real mailboxes and the same incoming policy as
>the primary.  This might rule out some providers of backup MX services, but
>not all.  If you're a large operation, you should have complete control over
>your secondary.  If you're a small operation, you might want to rethink if
>it is really necessary to have a secondary MX if it is not possible to clone
>the setup of your primary.  Hardware and network connections are more
>reliable than they used to be and senders will queue your mail upon
>temporary failures.

I'm an extremely small operation... this is my home box, maintained
in my spare time, and I trade secondary MX services with friends
on multiple continents.  Yes, I could export my list of valid addresses
to said friends, but it would still be a hack to the mail server to
obey that list for relay (unless I'm missing something in the postfix
docs, which is entirely possible).

>> When the primary then rejects the message from the
>> secondary, the secondary is stuck trying to deliver a DSN.
>
>This is exactly the situation you never want to be in.  A spammer should get
>the same rejection at your secondary that they get at your primary.

In a world of robust communication between primary and secondary,
yes... but that requires much more investment in the infrastructure
than I've had opportunity to make.

>> (These spam DSNs frequently end up deferred because the purported
>> source either doesn't exist or issues a 400-series response to
>> trying to deliver the DSN... and the retries of these deferrals for
>> 4 days is what pushes my secondary over the edge.)
>
>Another possibility is that the systems to whom you are sending bogus DSN's
>are teergrubing you (forcing you to keep a socket and process alive for a
>long time) as punishment for abuse.

1. The DSNs are not bogus; it's the messages that they're in response
   to that were bogus.

2. Since it's disk space that's the problem and not CPU time, teergrubing
   is not an issue.  (My home DSL link (or that of my secondary) isn't fat
   enough for it to be worth teergrubing me, anyway.)

3. It's unfortunate when obeying the RFCs is considered abuse.

- Alex
From popiel at wolfskeep.com  Tue Jan 11 04:51:33 2005
From: popiel at wolfskeep.com (T. Alexander Popiel)
Date: Tue Jan 11 04:51:35 2005
Subject: [spambayes-dev] sb_mailsort.py status 
In-Reply-To: Message from "Seth Goodman" <sethg@GoodmanAssociates.com> 
	of "Mon, 10 Jan 2005 19:41:17 CST."
	<MHEGIFHMACFNNIMMBACAKEINJEAA.sethg@GoodmanAssociates.com> 
References: <MHEGIFHMACFNNIMMBACAKEINJEAA.sethg@GoodmanAssociates.com> 
Message-ID: <20050111035133.2F5C32DDB6@cashew.wolfskeep.com>

In message:  <MHEGIFHMACFNNIMMBACAKEINJEAA.sethg@GoodmanAssociates.com>
             "Seth Goodman" <sethg@GoodmanAssociates.com> writes:
>> From: T. Alexander Popiel [mailto:popiel@wolfskeep.com]
>> Sent: Monday, January 10, 2005 4:36 PM
>
><...>
>
>> PS. No, I'm not willing to not have a secondary MX.  My primary does
>>     crash occasionally, though (thankfully) not as much as it used to
>>     before I replaced the motherboard.
>
>If you can't clone your primary setup onto your secondary and you can't live
>without a secondary, here's another possibility.  Only accept mail at the
>secondary when the primary is down.  This should greatly limit the damage,
>since your primary will rarely be down.  If you refuse a connection at your
>secondary MX and they don't retry at your primary, you can be pretty sure it
>wasn't real mail.

... Or that there was another routing foulup in Sprint's Seattle hub.
Having some parts of the net able to reach me but not others happens
about once a quarter for an hour or two.
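
For what it's worth, the "only accept at the secondary when the primary is
down" idea boils down to a reachability probe run from the secondary.  A
minimal Python sketch, assuming made-up helper names (primary_is_reachable,
secondary_should_accept) and a plain TCP connect to port 25 as the test; it
is not tied to any particular MTA's policy hooks.  It has exactly the
weakness described above: a routing failure between the two hosts looks the
same as the primary being down.

```python
import socket


def primary_is_reachable(host, port=25, timeout=5.0):
    """Return True if a TCP connection to the primary MX can be opened.

    Hypothetical helper: a successful connect is treated as "primary is up".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unroutable, or DNS failure: treat as down.
        return False


def secondary_should_accept(primary_host, port=25, probe=primary_is_reachable):
    """Accept mail at the secondary only when the primary looks down.

    `probe` is injectable so the decision can be exercised without a network.
    """
    return not probe(primary_host, port)
```

Usage would be something like secondary_should_accept("mail.example.org"),
with the hostname standing in for the real primary.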

- Alex
From tameyer at ihug.co.nz  Tue Jan 11 23:20:09 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Jan 11 23:48:50 2005
Subject: [spambayes-dev] RE: [Spambayes-checkins]
	spambayes/Outlook2000/dialogs/resourcesdialogs.h, 1.25,
	1.26 dialogs.rc, 1.50, 1.51
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801CD153F@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCBED@its-xchg4.massey.ac.nz>

> Make all resources use the same sub-language.  Due to editing 
> by developers from different countries, we had developed a mixture
> of English (United States) and English (Australia) resources.
> With apologies to Tony and Mark, I'm in the US so I standardized
> on English (United States).

No need to apologise to me - I don't think any NZer was ever in favour of
something Australian <wink>.

=Tony.Meyer

From luciagomes8475z at hotmail.com  Fri Jan 14 23:33:44 2005
From: luciagomes8475z at hotmail.com (Lucia Gomes)
Date: Fri Jan 14 23:33:46 2005
Subject: [spambayes-dev] listagem de e-mails
Message-ID: <20050114223343.C95191E4010@bag.python.org>

More Emails: online sales of email lists. We do direct mail and
advertising for your company or business to millions of email addresses.
We have email lists: Direct Mail, Direct-Mail, Email Registry, Email Lists,
Mailing List, Millions of Emails, Email Sending Programs, Email
Bombers, Email Extractors, Segmented Email Lists, Segmented
Emails, Bulk Emails, E-mails

http://www.estacion.de/maladireta

We have email lists: Direct Mail, Direct-Mail, Email Registry, Email
Lists, Mailing List, Millions of Emails, Email Sending Programs,
Email Bombers, Email Extractors, Segmented Email Lists, Segmented
Emails, Bulk Emails, E-mails

http://www.estacion.de/maladireta
From theller at python.net  Tue Jan 18 17:00:58 2005
From: theller at python.net (Thomas Heller)
Date: Tue Jan 18 16:59:42 2005
Subject: [spambayes-dev] Re: Cannot find saved message
References: <8y91sszb.fsf@python.net>
	<JIEMKPKMIMNMNCGKKKGOMEAMCAAA.tameyer@ihug.co.nz>
	<vfc2oos5.fsf@python.net>
Message-ID: <llaqrakl.fsf@python.net>

Thomas Heller <theller@python.net> writes:

> "Tony Meyer" <tameyer@ihug.co.nz> writes:
>
>>>From time to time, I'm getting this traceback, in the sb_imapfilter:
>> [...]
>>>  File "sb_imapfilter.py", line 559, in Save
>>>    raise BadIMAPResponseError("Cannot find saved message", "")
>>> BadIMAPResponseError: The command 'Cannot find saved message'
>>> failed to give an OK response.
>> [...]
>>> Does anyone have a solution to this, before I examine this further?
>>
>> Not a solution, but there is the material in here:
>>
>> [ 1023797 ] Imapfilter fails: 'Cannot find saved message'
>> <https://sourceforge.net/tracker/?func=detail&aid=1023797&group_id=61702&atid=498103>
>>
>> I haven't managed to figure this one out yet, sorry.  (If you have the
>> time to, that would be great!). I believe the problem comes from the
>> way imapfilter now waits for an EXISTS message from the IMAP server
>> before trying to find the new message (this is to try and overcome a
>> problem the old version had with servers that wouldn't immediately
>> find new messages).
>>
>> However, if you're getting as far as 559, then an EXISTS response has
>> been received, but the newly created message isn't found anyway.
>> (Maybe a different message arrived, but the one we created isn't
>> available?  That would be weird).
>>
>> Running with -i4 ought to give enough detail of the IMAP4 conversation
>> that you can see why it's failing.  If you don't have time to look at
>> it, if you could attach your -i4 output to the tracker (removing your
>> username/password details) and remind me to get to this quickly, I'll
>> try and do that.
>
> Maybe related, maybe not - running with -i4 seems (?) to cure the
> problem.  At least it has not yet happened again.

The bad news is - it didn't.  The problem remains.

But the good news is: running sb_imapfilter (from CVS) with Python2.4
instead of 2.3 really fixed the problem.

Thomas

From tameyer at ihug.co.nz  Tue Jan 18 23:20:23 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Tue Jan 18 23:21:46 2005
Subject: [spambayes-dev] Re: Cannot find saved message
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801D488A6@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E8FBCC3F@its-xchg4.massey.ac.nz>

> >>>From time to time, I'm getting this traceback, in the 
> sb_imapfilter:
> >> [...]
> >>>  File "sb_imapfilter.py", line 559, in Save
> >>>    raise BadIMAPResponseError("Cannot find saved message", "")
> >>> BadIMAPResponseError: The command 'Cannot find saved 
> message' failed 
> >>> to give an OK response.
> >> [...]
> >>> Does anyone have a solution to this, before I examine 
> this further?
[...]
> The bad news is - it didn't.  The problem remains.
> 
> But the good news is: running sb_imapfilter (from CVS) with 
> Python2.4 instead of 2.3 really fixed the problem.

Were you using sb_imapfilter not from CVS before?  (i.e. was the fix using
Python 2.4, or using Python 2.4 *and* CVS imapfilter?).

The main thing that I can think of that changes with 2.4 is using email 3.0,
which means that there's no need for the unparseable message handling.  If
the fix was just changing to 2.4, then maybe the problem is just occurring
with malformed messages?  That would certainly be a good place for me to
start looking, at least :)

=Tony.Meyer

From ta-meyer at ihug.co.nz  Wed Jan 19 00:20:13 2005
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Wed Jan 19 00:20:55 2005
Subject: [spambayes-dev] More stupid beats smart timcv.py results
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD3D@its-xchg4.massey.ac.nz>

Results for a couple of timcv.py tests that I've done recently are here:

<http://entrian.com/sbwiki/SpfTokenizing>
<http://entrian.com/sbwiki/DeAnagraming>

The former was in response to a request to tokenize the Received-SPF
headers.  I don't have a great deal of mail with those headers (and looking
at the specs, it's not clear whether they are still meant to be used).
Hardly anything changed, anyway, so it doesn't seem worth doing anything
with them at the moment.

The latter was prompted by a comment in JGC's latest newsletter (though I'm
sure I've seen this somewhere before, too).  To avoid deliberate
misspellings and the so-called 'cambridge effect', you replace each token
with (or generate in addition) a new token made up of the letters of the
original sorted into a constant order (e.g. alphabetical).  So "god" becomes
"dgo", but so does "dog".

I tried both replacing the original token and adding a new one, and tried
making the change in the headers, in the body, and both.  In the good cases
FPs weren't really affected, but FNs always increased, as did unsures.  That,
together with the side effect of making the database harder to read, makes
this a bad idea, it seems.
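
For anyone who wants to repeat the experiment, the transformation is roughly
the following.  This is a minimal sketch in Python, not the code that was
actually run; the names canonicalize and deanagram are made up for
illustration, and it works on a plain list of tokens rather than on the real
tokenizer output.

```python
def canonicalize(token):
    """Sort the characters of a token so that all anagrams collapse to one form."""
    return "".join(sorted(token))


def deanagram(tokens, replace=True):
    """Yield de-anagrammed tokens.

    replace=True  : each token is replaced by its canonical form.
    replace=False : the original token is kept and the canonical form is
                    added as an extra token (when it differs).
    """
    for token in tokens:
        canon = canonicalize(token)
        if replace:
            yield canon
        else:
            yield token
            if canon != token:
                yield canon
```

With replace=True both "god" and "dog" come out as "dgo", which is the
collision mentioned above.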

Anyway, just FYI :)

=Tony.Meyer

From skip at pobox.com  Wed Jan 19 02:23:13 2005
From: skip at pobox.com (Skip Montanaro)
Date: Wed Jan 19 03:00:14 2005
Subject: [spambayes-dev] More stupid beats smart timcv.py results
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD3D@its-xchg4.massey.ac.nz>
References: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD3D@its-xchg4.massey.ac.nz>
Message-ID: <16877.46721.450856.19583@montanaro.dyndns.org>


    Tony> The latter was prompted by a comment in JGC's latest newsletter
    Tony> (though I'm sure I've seen this somewhere before, too).  

Who's JGC?

Has anyone tried de-l33t-ing words that contain numbers?

    http://www.bbc.co.uk/dna/h2g2/A787917

Skip
From ta-meyer at ihug.co.nz  Wed Jan 19 03:06:23 2005
From: ta-meyer at ihug.co.nz (Tony Meyer)
Date: Wed Jan 19 03:07:14 2005
Subject: [spambayes-dev] More stupid beats smart timcv.py results
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801D48CF7@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD49@its-xchg4.massey.ac.nz>

> Who's JGC?

Sorry.  John Graham-Cumming of POPfile (<http://www.jgc.org/>).

> Has anyone tried de-l33t-ing words that contain numbers?
> 
>     http://www.bbc.co.uk/dna/h2g2/A787917

Not me.

=Tony.Meyer

From tim.peters at gmail.com  Wed Jan 19 03:13:57 2005
From: tim.peters at gmail.com (Tim Peters)
Date: Wed Jan 19 03:13:59 2005
Subject: [spambayes-dev] More stupid beats smart timcv.py results
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD3D@its-xchg4.massey.ac.nz>
References: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD3D@its-xchg4.massey.ac.nz>
Message-ID: <1f7befae05011818136de1b412@mail.gmail.com>

[Tony Meyer]
> Results for a couple of timcv.py tests that I've done recently are
> here:

It's sure nice to see someone is still testing ideas!  It would be
even nicer if we could find a good one <wink>.

> <http://entrian.com/sbwiki/SpfTokenizing>
> <http://entrian.com/sbwiki/DeAnagraming>
>
> The former was in response to a request to tokenize the
> Received-SPF headers.  I don't have a great deal of mail with
> those headers (and looking at the specs, it's not clear whether
> they are still meant to be used).  Hardly anything changed,
> anyway, so it doesn't seem worth doing anything with them
> at the moment.

Indeed, I had to stare hard to find any difference at all.

> The latter was prompted by a comment in JGC's latest
> newsletter (though I'm sure I've seen this somewhere before,
> too).  To avoid deliberate misspellings and the so-called
> 'cambridge effect' you replace each (or generate a new) token
> that is made up of the letters in the original token sorted into a
> constant order (e.g. alphabetical).  So "god" becomes "dgo",
> but so does "dog".
>
> I tried both replacing the original token and adding a new one,
> and tried making the change in the headers, in the body, and
> both.  In the good cases FPs weren't really affected, but FNs
> always increased, as did unsures, so that with the effect of
> making the database harder to read, makes this a bad
> idea it seems.

Yup.  I see very little Camridbge Unvierstiy obfuscation, so I
wouldn't expect this to help.  In effect, replacing tokens with a
canonicalized form is a limited kind of hashing (mapping multiple
tokens to one), and the only kind of deliberate token-confusion that
ever won in tests was the "skip:" gimmick for very long tokens.

In the cases where you added the canonicalized form (in addition to
retaining the original form), it may have a bad interaction with the
bigram option (which I believe you use), destroying the natural
bigrams.  It would be clearer to turn bigrams off in that case.  But I
wouldn't expect it to help anyway.
From tameyer at ihug.co.nz  Wed Jan 19 03:31:00 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Jan 19 03:31:24 2005
Subject: [spambayes-dev] More stupid beats smart timcv.py results
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801D48CFF@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD4A@its-xchg4.massey.ac.nz>

[Tim Peters]
> It's sure nice to see someone is still testing ideas!

I was mother-in-law-sitting for a few days, so leaving the machine running
tests was an easy option :)

> It would be even nicer if we could find a good one <wink>.

It would be easier if the with-defaults results weren't so good, of course
:) If I find time to really try something out I suppose I ought to start by
staring at the mistakes that running with defaults is generating. 

> Indeed, I had to stare hard to find any difference at all.

Me too :)  I wondered if maybe I didn't have any of the things, so grepped
through the mail, but there were some (mostly from the same domains, which
probably means that there were clues being harvested already).

[...]
> In the cases where you added the canonicalized form (in addition to
> retaining the original form), it may have a bad interaction with the
> bigram option (which I believe you use), destroying the natural
> bigrams.  It would be clearer to turn bigrams off in that case. 

I ran them all with bigrams off, although I do have it on with the
classifier I actually use.

> But I wouldn't expect it to help anyway.

You'd be right :)

=Tony.Meyer

From adrian at apsistemas.info  Wed Jan 19 22:52:31 2005
From: adrian at apsistemas.info (Adrian Perello Marin)
Date: Wed Jan 19 22:54:13 2005
Subject: [spambayes-dev] Dialogs.rc TRANSLATION
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD4A@its-xchg4.massey.ac.nz>
Message-ID: <006401c4fe71$32ca6120$4501a8c0@samsungx10>

Can you please tell me whether only the English (U.S.) resources part of the
original dialogs.rc must be translated, or whether the English (Australia)
resources must be translated too?

Thanks.

From kenny.pitt at gmail.com  Wed Jan 19 23:49:12 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Wed Jan 19 23:49:22 2005
Subject: [spambayes-dev] Dialogs.rc TRANSLATION
In-Reply-To: <006401c4fe71$32ca6120$4501a8c0@samsungx10>
Message-ID: <41eee3ef.5cfa5707.1694.00a7@smtp.gmail.com>

Adrian Perello Marin wrote:
> Please can you tell me if the original dialogs.rc must to be
> translated only the part of
> 
> English (U.S.) resources or the English (Australia) resources must to
> be translated too ??

The latest CVS version of dialogs.rc should contain only English (U.S.)
resources.  I checked in an update a week or two ago to change all the
Australia resources back to U.S.

-- 
Kenny Pitt

From tameyer at ihug.co.nz  Thu Jan 20 02:44:21 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Thu Jan 20 02:44:56 2005
Subject: [spambayes-dev] RE: [Spambayes] Trained two times as much spam as
	ham
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801D48C58@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD52@its-xchg4.massey.ac.nz>

[Skip Montanaro]
> It might be useful to codify some of these ideas into a tool 
> the user can run to reduce training dataset sizes without 
> necessarily committing to the train-to-exhaustion concept.

I think it would be a good idea if we had a spambayes.training module that
contained various training code like this: code to do tte, nonedge, and so
forth.  contrib/tte.py (and maybe other new contrib/ or utilities/ scripts)
could just be the getopt stuff and then a few lines of code calling the
appropriate functions in spambayes.training, and the other scripts could
make use of the same code (I'd also like Outlook, sb_server and
sb_imapfilter to have a slightly higher abstraction for training to allow
for flexibility in what's done).

=Tony.Meyer

From sethg at GoodmanAssociates.com  Thu Jan 20 03:44:43 2005
From: sethg at GoodmanAssociates.com (Seth Goodman)
Date: Thu Jan 20 03:44:42 2005
Subject: [spambayes-dev] Dialogs.rc TRANSLATION
In-Reply-To: <41eee3ef.5cfa5707.1694.00a7@smtp.gmail.com>
Message-ID: <MHEGIFHMACFNNIMMBACAMELJJFAA.sethg@GoodmanAssociates.com>

> From: Kenny Pitt
> Sent: Wednesday, January 19, 2005 4:49 PM

<...>

> The latest CVS version of dialogs.rc should contain only English (U.S.)
> resources.  I checked in an update a week or two ago to change all the
> Australia resources back to U.S.

Thanks for doing the translation.

--

Seth Goodman

From skip at pobox.com  Fri Jan 21 03:25:38 2005
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 21 04:00:16 2005
Subject: [spambayes-dev] minor csv module problem
Message-ID: <16880.26658.474204.913311@montanaro.dyndns.org>

In my message training I train into a pickle (faster at that point), then
use sb_dbexpimp to dump it to a csv file.  For use by sb_bnfilter I then
convert that to a Berkeley db file.  (The csv file also serves as a
convenient debug/interchange format.)  The Python csv module is used both to
write and read the csv file.  Unfortunately, it seems to have a bug.  It
generates this line:

    "subject:          \r",0,1\r

(\r subbing for the real CR), which it later refuses to read because it
thinks there is a newline inside the string.  This is a long-standing bug as
far as I can tell.  I can reproduce it with Python 2.3 and 2.4, though it is
fixed in the latest CVS, probably as a side-effect of the recent changes to
the csv module.

I imagine we'll get the csv problem fixed (hopefully by the 2.3.5 release),
but that doesn't help SpamBayes in the short term, so I think a workaround
is in order.  The problem is a generated token that ends with a \r
character.  One spam's subject is:

    '=?iso-2022-jp?B?k36LeILdgs2DRYNug0WDbiAgICAgICAgICAN?='

After decoding by email.Header.decode_header we have

    '\x93~\x8bx\x82\xdd\x82\xcd\x83E\x83n\x83E\x83n          \r'

The tokenizer generates this token as part of its output:

    'subject:          \r'

Perhaps we could replace '\r' with ' ' in the subject before tokenizing
without losing much/any accuracy.  I don't believe we can get whitespace in
body tokens.
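
The workaround amounts to something like the sketch below.  It is written
for current Python 3 and is only an illustration: scrub_subject and
roundtrip are made-up names, not what is in tokenizer.py, and the csv
round trip is there just to show that a scrubbed token survives being
written and read back.

```python
import csv
import io


def scrub_subject(subject):
    """Replace carriage returns with spaces before the subject is tokenized."""
    return subject.replace("\r", " ")


def roundtrip(row):
    """Write one row with the csv module and read it straight back."""
    buf = io.StringIO(newline="")   # newline="" leaves line endings untouched
    csv.writer(buf).writerow(row)
    buf.seek(0)
    return next(csv.reader(buf))


# A subject tail like the one above: a run of spaces followed by a CR.
token = "subject:" + scrub_subject("          \r")
row = roundtrip([token, 0, 1])
```

Note that the csv module hands back every field as a string, so the counts
come back as "0" and "1".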

Skip
From tameyer at ihug.co.nz  Fri Jan 21 04:08:43 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Jan 21 04:09:19 2005
Subject: [spambayes-dev] minor csv module problem
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801D492F3@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD60@its-xchg4.massey.ac.nz>

> Perhaps we could replace '\r' with ' ' in the subject before 
> tokenizing without losing much/any accuracy.  I don't believe
> we can get whitespace in body tokens.

+1.

(I presume that this is a nicer solution than having our own csv subclass
that has the problem fixed?)

=Tony.Meyer

From skip at pobox.com  Fri Jan 21 05:43:01 2005
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 21 06:00:12 2005
Subject: [spambayes-dev] minor csv module problem
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD60@its-xchg4.massey.ac.nz>
References: <ECBA357DDED63B4995F5C1F5CBE5B1E801D492F3@its-xchg4.massey.ac.nz>
	<ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD60@its-xchg4.massey.ac.nz>
Message-ID: <16880.34901.565759.516302@montanaro.dyndns.org>


    >> Perhaps we could replace '\r' with ' ' in the subject before
    >> tokenizing without losing much/any accuracy.  I don't believe we can
    >> get whitespace in body tokens.

    Tony> +1.

    Tony> (I presume that this is a nicer solution than having our own csv
    Tony> subclass that has the problem fixed?)

Well, given that the bug is in the underlying _csv extension module, I
suspect so. ;-)

Checked in as tokenizer.py 1.34.

Skip


From skip at pobox.com  Fri Jan 21 21:12:13 2005
From: skip at pobox.com (Skip Montanaro)
Date: Fri Jan 21 21:43:41 2005
Subject: [spambayes-dev] "approximately" the same size
Message-ID: <16881.25117.750274.132042@montanaro.dyndns.org>


When we tell people not to let their ham/spam imbalance get too bad, we are
referring to the number of messages trained.  There is another way to look
at this imbalance though: number of tokens generated from each stream.  For
me, ham messages are much larger on average than spam messages.
Consequently, for roughly the same number of tokens to come from each
stream, I need more spams than hams.  Is there some way to tell how this
might affect scoring?  Is it relevant to the scoring?

ATM, I have nearly three times as many spams as hams in my training set:

    % egrep '^From ' newham.old | wc -l
          93 
    % egrep '^From ' newspam.old | wc -l
         267 

but the hams contribute approximately the same number of unique tokens as
the spams:

    >>> from spambayes import mboxutils, tokenizer
    >>> hs = set()           
    >>> ss = set()
    >>> for msg in mboxutils.getmbox("newham.old"):
    ...    hs |= set(tokenizer.tokenize(msg))
    ... 
    >>> for msg in mboxutils.getmbox("newspam.old"):
    ...    ss |= set(tokenizer.tokenize(msg))
    ... 
    >>> len(hs)
    20360
    >>> len(ss)
    24734

Most tokens are unique to one set or the other:

    >>> len(ss & hs)
    5205
    >>> len(ss - hs)
    19529
    >>> len(hs - ss)
    15155

Skip
From kenny.pitt at gmail.com  Mon Jan 24 15:58:58 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Mon Jan 24 15:59:02 2005
Subject: [spambayes-dev] "approximately" the same size
In-Reply-To: <16881.25117.750274.132042@montanaro.dyndns.org>
Message-ID: <41f50d33.17f5d111.78b9.0051@smtp.gmail.com>

Skip Montanaro wrote:
> When we tell people not to let their ham/spam imbalance get too bad,
> we are referring to the number of messages trained.  There is another
> way to look at this imbalance though: number of tokens generated from
> each stream.  For me, ham messages are much larger on average than
> spam messages. Consequently, for roughly the same number of tokens to
> come from each stream, I need more spams than hams.  Is there some
> way to tell how this might affect scoring?  Is it relevant to the
> scoring? 

Mathematically, the total number of tokens should have no effect on the
probabilities.  We only count a token once per message, and we divide the
number of messages that have contained the token by the total number of
messages.  The total number of tokens never figures into the calculation at
all.
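
To make the counting concrete, here is a minimal Python sketch.  It is not
the actual classifier code: the names train and message_fraction are made
up for illustration, and the real classifier applies further adjustments to
these ratios before combining them into a score.

```python
from collections import Counter


def train(messages):
    """Count, for each token, how many messages contained it.

    `messages` is an iterable of token lists.  A token repeated inside one
    message is still counted only once for that message.
    """
    counts = Counter()
    nmessages = 0
    for tokens in messages:
        nmessages += 1
        counts.update(set(tokens))
    return counts, nmessages


def message_fraction(token, counts, nmessages):
    """Fraction of trained messages that contained `token`."""
    if nmessages == 0:
        return 0.0
    return counts[token] / nmessages
```

The denominator is the number of messages, so a long message full of other
tokens changes the result for a given token no more than a short one does.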

It would be interesting to know, though, if this type of imbalance might
skew the selection of the significant tokens that figure into the
calculation of the final score.  If there are significantly more ham tokens
in the training, is it more likely that the 150 significant tokens chosen
will also have a higher percentage of ham tokens?

-- 
Kenny Pitt

From skip at pobox.com  Mon Jan 24 16:29:02 2005
From: skip at pobox.com (Skip Montanaro)
Date: Mon Jan 24 17:00:55 2005
Subject: [spambayes-dev] "approximately" the same size
In-Reply-To: <41f50d33.17f5d111.78b9.0051@smtp.gmail.com>
References: <16881.25117.750274.132042@montanaro.dyndns.org>
	<41f50d33.17f5d111.78b9.0051@smtp.gmail.com>
Message-ID: <16885.5182.803670.71266@montanaro.dyndns.org>


    Kenny> Mathematically, the total number of tokens should have no effect
    Kenny> on the probabilities.  We only count a token once per message,
    Kenny> and we divide the number of messages that have contained the
    Kenny> token by the total number of messages.  The total number of
    Kenny> tokens never figures into the calculation at all.

Still, it seems to me the number of unique tokens seen (and the overlap
between those seen in ham and those in spam) must have some effect on the
effectiveness of the algorithm.  The more disjoint the sets of tokens
appearing in hams and spams are, the easier it should be to distinguish ham
from spam.  If there are 1000 tokens that appear in ham and 100 tokens that
appear in spam, is it more likely that the intersection of the two
approximates the set of spam tokens?

    Kenny> It would be interesting to know, though, if this type of
    Kenny> imbalance might skew the selection of the significant tokens that
    Kenny> figure into the calculation of the final score.  If there are
    Kenny> significantly more ham tokens in the training, is it more likely
    Kenny> that the 150 significant tokens chosen will also have a higher
    Kenny> percentage of ham tokens?

That's sort of what I was thinking (though my thought was not as
well-formed).

So, getting back to the original problem.  Assume I have tried hard to
maintain a nearly 1:1 ham:spam ratio.  Given that most hams are much larger
than most spams, there will be many more tokens found in hams than tokens
found in spams.  Most tokens seen in spams will have been seen in some hams,
thus lessening their effectiveness.

A corollary thought: Given H and S, the sets of ham and spam tokens,
respectively, what would be the effect of simply deleting their intersection
from the database?

Skip

From t-meyer at ihug.co.nz  Tue Jan 25 23:05:08 2005
From: t-meyer at ihug.co.nz (Tony Meyer)
Date: Tue Jan 25 23:05:31 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD75@its-xchg4.massey.ac.nz>

If no-one objects, I'd like to put 1.0.2 out tomorrow (all going well).
There are a couple of web interface bugs that are particularly annoying
(changing config problem and display when the config path contains one of
'<>&'), plus various minor fixes.  It should also work with Python 2.4, and
(this is the main change for Outlook users) the binary will be built with
2.4.

At this point, I'm not particularly interested in continuing work for a
1.0.3 release (although if 1.1 takes a long time, then maybe that will
change), so this would be the last in the 1.0.x line.

If you have any changes you want in 1.0.2, please let me know so I can hold
off the build.  Otherwise I'll put it together later today and put a release
up on sourceforge and announce it here so that maybe one or two people can
give it a go (there are so few changes that it ought to be reasonably safe)
before a proper announcement tomorrow.

After that, I'd like to try and get a 1.1a1 out the door so that people can
try it out (there are heaps of changes - checkins that date from May last
year!).  I had hoped to get this out by the end of the month, but that's
rapidly approaching...my rough plan is this:

  1.1a1:  End of January (31st maybe, since that's a holiday here)
  1.1a2:  End of February
  1.1b1:  Mid March (assuming both alphas go well)
  1.1rc1: Start of April
  1.1:    Early April

Are there any things not done yet that people would like to see in 1.1?  If
I recall the process correctly, when 1.1b1 goes out we consider the trunk
frozen (apart from bugfixes) and once 1.1 is out we cut a branch for it and
unfreeze the trunk.

Things I'm planning on doing before 1.1a1, if I can find the time:

 * Checking the ZODBClassifier and (particularly) ZEOClassifier storage
classes; I'm not 100% sure that they are working exactly as they should.

Things I'm planning on doing before 1.1a2, if I can find the time:

 * Finishing up getting at least basic unit test scripts done for all of the
spambayes package.

 * For the binary, updating the installer script to include sb_imapfilter
and sb_pop3dnd and a few minor changes that have been suggested by people.

 * It would be great to have at least one translation completely done.  We
currently have most of French and some of Spanish.  All I can do is check
the stuff in, of course.

Any others?  Or any suggestions for changes to the proposed schedule?

=Tony.Meyer

From tameyer at ihug.co.nz  Wed Jan 26 03:28:40 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Jan 26 03:28:45 2005
Subject: [spambayes-dev] More stupid beats smart timcv.py results
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DA8DDC@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD84@its-xchg4.massey.ac.nz>

[Tony Meyer, last week]
> The latter was prompted by a comment in JGC's latest 
> newsletter (though I'm sure I've seen this somewhere before, 
> too).  To avoid deliberate misspellings and the so-called 
> 'cambridge effect' you replace each (or generate a new) token 
> that is made up of the letters in the original token sorted 
> into a constant order (e.g. alphabetical).  So "god" becomes 
> "dgo", but so does "dog".

At the MIT Spam Conference John mentioned (offhand, regarding something
else) that POPFile does this just for words that are longer than 6
characters.  Since I already had the stuff at hand, I gave this a go, in
case the poor results were just from those short words.

Compared to all-defaults, fp and fn were unchanged and unsure rose 0.03%.
So the verdict is unchanged.

(I can post cmp.py or table.py results if anyone is interested, but there's
nothing really interesting here).

=Tony.Meyer

From tameyer at ihug.co.nz  Wed Jan 26 05:22:54 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Wed Jan 26 05:23:00 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DA97F1@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD88@its-xchg4.massey.ac.nz>

[Tony Meyer]
> I'll put it together 
> later today and put a release up on sourceforge and announce 
> it here so that maybe one or two people can give it a go 
> (there are so few changes that it ought to be reasonably 
> safe) before a proper announcement tomorrow.

I've done this.  It seems unlikely that anyone is really after anything
extra to go into 1.0.2, but if there is, speak up and I'll redo the builds
tomorrow (NZ time).  Otherwise, please feel free to download the source or
binary and give it a spin.

I've got some more testing of it myself to do tomorrow, and barring any
problems cropping up, I'll put out an announcement probably tomorrow
afternoon.

<https://sourceforge.net/project/showfiles.php?group_id=61702&package_id=58141&release_id=299864>

=Tony.Meyer

From kenny.pitt at gmail.com  Wed Jan 26 16:05:12 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Wed Jan 26 16:05:29 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD75@its-xchg4.massey.ac.nz>
Message-ID: <41f7b1aa.5556a87b.7801.0383@smtp.gmail.com>

Tony Meyer wrote:
> After that, I'd like to try and get a 1.1a1 out the door so that
> people can try it out (there are heaps of changes - checkins that
> date from May last year!).  I had hoped to get this out by the end of
> the month, but that's rapidly approaching...my rough plan is this:
> 
>   1.1a1:  End of January (31st maybe, since that's a holiday here)
>   1.1a2:  End of February
>   1.1b1:  Mid March (assuming both alphas go well)
>   1.1rc1: Start of April
>   1.1:    Early April
> 
> Are there any things not done yet that people would like to see in
> 1.1?

Just a couple of minor ones.  Unfortunately, I'm on a tight deadline at work
until at least the end of January.

> Things I'm planning on doing before 1.1a1, if I can find the time:

I'd like to get a tab added to the Manager for configuring the notification
sounds.  It may be just edit boxes for the filenames at first (no browse),
but I think we need something if general users are going to be able to take
advantage of it.

> Things I'm planning on doing before 1.1a2, if I can find the time:

Given that we haven't been able to solve the bsddb corruption/run-recovery
problem, I wanted to try something with automatic backups to at least
provide a recovery mechanism.  My idea was to detect if the database is
still good when shutting down, and if so make a backup copy of the .db file.
When starting up, if the training data is corrupt then try to restore this
"known good" backup and then attempt the open again.

>  * It would be great to have at least one translation completely
> done.  We currently have most of French and some of Spanish.  All I
> can do is check the stuff in, of course.

Speaking of translations, have you been able to build a binary since the
translation support was added?  Whenever I try to run setup_all.py, I get
the following error:

"""
running py2exe
running build_py
error: package directory 'spambayes\resources' does not exist
"""

-- 
Kenny Pitt

From t-meyer at ihug.co.nz  Thu Jan 27 03:55:15 2005
From: t-meyer at ihug.co.nz (Tony Meyer)
Date: Thu Jan 27 03:56:20 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801E202F1@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD97@its-xchg4.massey.ac.nz>

As most will probably have seen, I've done the 1.0.2 release.  I
smoke-tested the source (I've run it lots of times recently, and there are
few changes) and did a bit of testing with the binary (turns out using
Python 2.4 wasn't as simple as expected).

[Tony Meyer]
>> Are there any things not done yet that people would like to see in
>> 1.1?

[Kenny Pitt]
> Just a couple of minor ones.  Unfortunately, I'm on a tight
> deadline at work until at least the end of January.

Even with my rough plan there's a month between a1 and a2, so plenty of time
to add more :)  I think it'd be good to get something out though, since the
'deeper' changes can be tested already and people can give us feedback about
various changes.

> I'd like to get a tab added to the Manager for configuring
> the notification sounds.  It may be just edit boxes for the 
> filenames at first (no browse), but I think we need something 
> if general users are going to be able to take advantage of it.

+1.  Browse shouldn't be too hard - you should be able to just call 
CreateFileDialog, right?  I agree that this would be nice before a1, so 
that people get a feel for the interface.  I can hold off until it's 
done (I'm fairly busy with stuff at the moment, too).

> Given that we haven't been able to solve the bsddb
> corruption/run-recovery problem,

I've completely given up and am happy in my ZODB.FileStorage world <0.1
wink>.  I'd like to include the necessary ZODB stuff in an a1 build so that
people can give ZODB a go if they would like, assuming that it doesn't bloat
the installer too much.  However, there still isn't a ZODB release for
Python 2.4, so I'd have to build it myself, which I'm not too sure about.
Maybe one is coming soon.  (Tim would know, I expect).

> I wanted to try something 
> with automatic backups to at least provide a recovery 
> mechanism.  My idea was to detect if the database is still 
> good when shutting down,

How?  Back when I was playing with things (and I think Richie found this
too, but it's a long time back) I found that you could corrupt the database
and the RUN_RECOVERY error wouldn't be triggered until several accesses
later.

> and if so make a backup copy of the 
> .db file. When starting up, if the training data is corrupt 
> then try to restore this "known good" backup and then attempt 
> the open again.

Making a backup could be worth a go, though.  Maybe this ought to be an
(Outlook experimental) option?  It will almost double the amount of disk
space that the plug-in requires.

> Speaking of translations, have you been able to build a
> binary since the translation support was added?  Whenever I 
> try to run setup_all.py, I get the following error:
> 
> """
> running py2exe
> running build_py
> error: package directory 'spambayes\resources' does not exist
> """

That's unrelated to the translation stuff.  It's a distutils change, IIRC -
I checked in a fix to the 1.0.x branch, but never got around to checking it
into HEAD.  I'll do that now.

=Tony.Meyer

From kenny.pitt at gmail.com  Thu Jan 27 17:40:39 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Thu Jan 27 17:40:53 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD97@its-xchg4.massey.ac.nz>
Message-ID: <41f9198c.367236f1.373e.0927@smtp.gmail.com>

Tony Meyer wrote:
>> I'd like to get a tab added to the Manager for configuring
>> the notification sounds.  It may be just edit boxes for the
>> filenames at first (no browse), but I think we need something
>> if general users are going to be able to take advantage of it.
> 
> +1.  Browse shouldn't be too hard - you should be able to just call
> CreateFileDialog, right?

Unfortunately, that's kind of the sticking point.  CreateFileDialog comes
from win32ui, not win32gui, and win32ui requires the MFC dlls.  After all
the work that was going on around the time I joined the project to eliminate
the need for win32ui, I hate to add it all back just for one options page.

>> Given that we haven't been able to solve the bsddb
>> corruption/run-recovery problem,
> 
> I've completely given up and am happy in my ZODB.FileStorage world
> <0.1 wink>.  I'd like to include the necessary ZODB stuff in an a1
> build so that people can give ZODB a go if they would like, assuming
> that it doesn't bloat the installer too much.

I played around a little with a SQLite classifier also, which is very
lightweight to install.  It was pretty slow when working with the 2.8
version because of the way we do our commits, but I'd like to give it
another go with the 3.0 version of SQLite to see if the situation has
improved.  Maybe I can get this in for 1.1a2 so that it could get some
testing.  Right now, I don't know if it would really be any more stable than
bsddb or not.
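
A minimal sketch of the commit pattern in question, assuming the slowdown
came from committing after every individual write (the message does not
confirm this).  The table name, columns and the store_counts helper are
hypothetical and are not the SpamBayes storage schema; the sketch uses the
sqlite3 module from the Python standard library and groups all writes into
one transaction so there is a single commit.

```python
import sqlite3


def store_counts(conn, counts):
    """Write all (word, nspam, nham) rows inside a single transaction."""
    # Using the connection as a context manager commits once on success
    # and rolls back if any statement raises.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO bayes (word, nspam, nham) "
            "VALUES (?, ?, ?)",
            counts)


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bayes (word TEXT PRIMARY KEY, nspam INTEGER, nham INTEGER)")
store_counts(conn, [("viagra", 10, 0), ("python", 0, 12)])
```

Whether this helps in practice depends on how often the classifier needs
its changes to be durable; batching trades commit frequency for speed.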

>> I wanted to try something
>> with automatic backups to at least provide a recovery
>> mechanism.  My idea was to detect if the database is still
>> good when shutting down,
> 
> How?  Back when I was playing with things (and I think Richie found
> this too, but it's a long time back) I found that you could corrupt
> the database and the RUN_RECOVERY error wouldn't be triggered until
> several accesses later.

That could definitely be a problem.  It's still just an idea at this point,
so figuring out how to make it work is still a ways off. <wink>  My idea was
to check only when shutting down, so unless the corruption was caused by the
shutdown itself we would probably be OK.  It should, at the very least,
significantly reduce the window for total loss of training data.

>> and if so make a backup copy of the
>> .db file. When starting up, if the training data is corrupt
>> then try to restore this "known good" backup and then attempt
>> the open again.
> 
> Making a backup could be worth a go, though.  Maybe this ought to be
> an (Outlook experimental) option?  It will almost double the amount
> of disk space that the plug-in requires.

Yeah, I definitely want to make it optional.  It not only increases the disk
space, but also increases the time it takes to shut down.
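
The backup-and-restore idea discussed above could be sketched roughly as
follows.  This is only an illustration under stated assumptions: the
function names, the ".bak" suffix and the is_healthy callable are all
hypothetical, and how to reliably decide that a bsddb file is "good" is
exactly the open question raised earlier in the thread, so that check is
left to the caller.

```python
import os
import shutil


def backup_on_shutdown(db_path, is_healthy, backup_suffix=".bak"):
    """Copy db_path to a backup file, but only if it passes the check.

    is_healthy is a caller-supplied function taking a path and returning
    True or False.  Returns True if a backup was written.
    """
    if not os.path.exists(db_path) or not is_healthy(db_path):
        return False
    backup_path = db_path + backup_suffix
    tmp_path = backup_path + ".tmp"
    # Copy to a temporary name first so that a failed copy never
    # overwrites the previous known-good backup.
    shutil.copy2(db_path, tmp_path)
    os.replace(tmp_path, backup_path)
    return True


def restore_on_startup(db_path, is_healthy, backup_suffix=".bak"):
    """Return True if db_path is usable, restoring the backup if needed."""
    if os.path.exists(db_path) and is_healthy(db_path):
        return True
    backup_path = db_path + backup_suffix
    if not os.path.exists(backup_path):
        return False
    shutil.copy2(backup_path, db_path)
    # Attempt the check again on the restored copy.
    return is_healthy(db_path)
```

The copy in backup_on_shutdown is where the extra disk space and the
longer shutdown time mentioned above would come from.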

-- 
Kenny Pitt

From tameyer at ihug.co.nz  Fri Jan 28 03:37:44 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Fri Jan 28 03:37:48 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801EC72EA@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFDB2@its-xchg4.massey.ac.nz>

[Tony Meyer]
>> Browse shouldn't be too hard - you should be able to just call
>> CreateFileDialog, right?

[Kenny Pitt]
> Unfortunately, that's kind of the sticking point.  
> CreateFileDialog comes from win32ui, not win32gui, and 
> win32ui requires the MFC dlls.

Ah - I didn't realise that CreateFileDialog was in win32ui and not win32gui.
Does this mean all common dialogs need win32ui, or can we just call them
(CFileDialog) explicitly ourselves?  (With something like
win32api.LoadLibrary).

> I played around a little with a SQLite classifier also
[...]
> Right now, I don't know if it would really be any 
> more stable than bsddb or not.

That's the trouble - I never get any bsddb corruption problems anymore, so I
have no idea if I would trigger problems with other storage methods either.
It would be good if we could have a range available at least in the 1.1
alphas anyway.  Maybe in 1.2 the storage method could even be exposed via
the GUI (in Advanced options).

=Tony.Meyer

From TBrigham at venocoinc.com  Fri Jan 28 00:11:31 2005
From: TBrigham at venocoinc.com (TBrigham@venocoinc.com)
Date: Fri Jan 28 03:44:01 2005
Subject: [spambayes-dev] Multiple Windows User Profiles
Message-ID: <6AC011B1CB7FD411BC78001083FC582602183627@venocosrv.venocoinc.com>

Skipped content of type multipart/alternative

From t-meyer at ihug.co.nz  Fri Jan 28 04:10:46 2005
From: t-meyer at ihug.co.nz (Tony Meyer)
Date: Fri Jan 28 04:10:51 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFD97@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFDB8@its-xchg4.massey.ac.nz>

[Kenny Pitt]
>> I'd like to get a tab added to the Manager for configuring the 
>> notification sounds.
[...]

[Tony Meyer]
> I can hold off until it's done (I'm fairly busy with stuff at the
> moment, too).

BTW, since I used up the time I would have spent doing the remaining things
that I personally want done for 1.1a1 putting together 1.0.3, it's fairly
likely that I won't get to 1.1a1 until the end of next week.

=Tony.Meyer

From kenny.pitt at gmail.com  Fri Jan 28 17:39:03 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Fri Jan 28 17:39:10 2005
Subject: [spambayes-dev] RE: [Spambayes-checkins] spambayes/windows
	spambayes.iss, 1.18, 1.19
In-Reply-To: <E1CuKwW-0005jf-EG@sc8-pr-cvs1.sourceforge.net>
Message-ID: <41fa6aaa.7f1eb6f4.388b.0d46@smtp.gmail.com>

Tony Meyer wrote:
> Modified Files:
> 	spambayes.iss
> 
> +   UsagePage := CreateInputOptionPage(UserPage.ID,
> +     'Personal Information', 'How will you use My Program?',
> +     'Please specify how you would like to use My Program, then click Next.',
> +     True, False);
> +   UsagePage.Add('Light mode (no ads, limited functionality)');
> +   UsagePage.Add('Sponsored mode (with ads, full functionality)');
> +   UsagePage.Add('Paid mode (no ads, full functionality)');

Dare I even ask what this stuff is all about? <wink>

BTW, py2exe doesn't put MSVCR71.dll in the dist/bin by default so the
InnoSetup script won't compile initially.  Is copying MSVCR71.DLL to
py2exe/dist/bin just a manual step that needs to be done before running
Inno?  If so, we may want to include that in README-DEVEL.txt.

-- 
Kenny Pitt

From kenny.pitt at gmail.com  Fri Jan 28 17:39:03 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Fri Jan 28 17:39:11 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFDB2@its-xchg4.massey.ac.nz>
Message-ID: <41fa6aac.2be75911.388b.0d47@smtp.gmail.com>

Tony Meyer wrote:
> [Tony Meyer]
>>> Browse shouldn't be too hard - you should be able to just call
>>> CreateFileDialog, right?
> 
> [Kenny Pitt]
>> Unfortunately, that's kind of the sticking point.
>> CreateFileDialog comes from win32ui, not win32gui, and
>> win32ui requires the MFC dlls.
> 
> Ah - I didn't realise that CreateFileDialog was in win32ui and not
> win32gui. Does this mean all common dialogs need win32ui, or can we
> just call them (CFileDialog) explicitly ourselves?  (With something
> like win32api.LoadLibrary).

There's a win32gui.GetOpenFileName() method that provides a lower-level
interface to the open dialog.  It requires you to manually construct the
Win32 API OPENFILENAME structure and pass it in as a string, and I haven't
had time to figure out how to do that yet.  If Mark is still listening to
this list, maybe he will chime in with a pointer to some more info on this.
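
For illustration, here is a rough sketch of what the OPENFILENAME
structure looks like when it is built by hand.  Note that this uses the
ctypes module and calls comdlg32 directly, which is a different technique
from passing a packed string to win32gui.GetOpenFileName() as described
above.  The field order follows the Win32 OPENFILENAMEW declaration; the
browse_for_file helper and its parameters are hypothetical, and no file
filter is set, so the dialog would list all files.

```python
import ctypes
import sys


class OPENFILENAMEW(ctypes.Structure):
    # Field order and types follow the Win32 OPENFILENAMEW declaration.
    _fields_ = [
        ("lStructSize", ctypes.c_uint32),
        ("hwndOwner", ctypes.c_void_p),
        ("hInstance", ctypes.c_void_p),
        ("lpstrFilter", ctypes.c_wchar_p),
        ("lpstrCustomFilter", ctypes.c_wchar_p),
        ("nMaxCustFilter", ctypes.c_uint32),
        ("nFilterIndex", ctypes.c_uint32),
        ("lpstrFile", ctypes.c_wchar_p),
        ("nMaxFile", ctypes.c_uint32),
        ("lpstrFileTitle", ctypes.c_wchar_p),
        ("nMaxFileTitle", ctypes.c_uint32),
        ("lpstrInitialDir", ctypes.c_wchar_p),
        ("lpstrTitle", ctypes.c_wchar_p),
        ("Flags", ctypes.c_uint32),
        ("nFileOffset", ctypes.c_uint16),
        ("nFileExtension", ctypes.c_uint16),
        ("lpstrDefExt", ctypes.c_wchar_p),
        ("lCustData", ctypes.c_void_p),
        ("lpfnHook", ctypes.c_void_p),
        ("lpTemplateName", ctypes.c_wchar_p),
        ("pvReserved", ctypes.c_void_p),
        ("dwReserved", ctypes.c_uint32),
        ("FlagsEx", ctypes.c_uint32),
    ]


OFN_PATHMUSTEXIST = 0x00000800
OFN_FILEMUSTEXIST = 0x00001000
OFN_EXPLORER = 0x00080000


def browse_for_file(title, owner=None, max_chars=1024):
    """Show the standard Open dialog; return the chosen path or None."""
    if sys.platform != "win32":
        return None  # comdlg32 only exists on Windows
    # The dialog writes the selected path into this caller-owned buffer.
    path_buffer = ctypes.create_unicode_buffer(max_chars)
    ofn = OPENFILENAMEW()
    ofn.lStructSize = ctypes.sizeof(OPENFILENAMEW)
    ofn.hwndOwner = owner
    ofn.lpstrFile = ctypes.cast(path_buffer, ctypes.c_wchar_p)
    ofn.nMaxFile = max_chars
    ofn.lpstrTitle = title
    ofn.Flags = OFN_EXPLORER | OFN_PATHMUSTEXIST | OFN_FILEMUSTEXIST
    if ctypes.windll.comdlg32.GetOpenFileNameW(ctypes.byref(ofn)):
        return path_buffer.value
    return None  # the user cancelled, or the call failed
```

Something like browse_for_file("Select a notification sound") would then
return the selected path, or None if the dialog was cancelled.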

-- 
Kenny Pitt

From tvarnedoe at earthlink.net  Fri Jan 28 18:55:43 2005
From: tvarnedoe at earthlink.net (T Varnedoe)
Date: Fri Jan 28 18:55:47 2005
Subject: [spambayes-dev] Spambayes single installation for multiple mail
	clients
Message-ID: <20050128175546.3A18F1E4004@bag.python.org>

Question: Is there a way to install a single instance of Spambayes and have
it work with 2 mail clients installed on the same computer? I.e. Outlook
Express v2003 (Corporate email) client and Outlook 2003 (Personal email)
clients. Thanks in advance for your time and efforts on my behalf.
 
Best Regards
Tom V
From kenny.pitt at gmail.com  Fri Jan 28 19:09:56 2005
From: kenny.pitt at gmail.com (Kenny Pitt)
Date: Fri Jan 28 19:10:00 2005
Subject: [spambayes-dev] Spambayes single installation for multiple
	mail clients
In-Reply-To: <20050128175546.3A18F1E4004@bag.python.org>
Message-ID: <41fa7ff5.3b11a490.7801.192e@smtp.gmail.com>

Outlook Express and Outlook process mail in entirely different ways.
Although the same core SpamBayes classification engine is used, different
SpamBayes applications are needed to get access to the incoming mail and
provide training and configuration.
 
It is possible to configure both of the SpamBayes applications to use the
same training database, but this is not necessarily a good idea. The Outlook
Addin and the POP3 or IMAP applications for Outlook Express have slightly
different views of the incoming mail which will have an effect on the spam
clues that are generated by each. This could potentially have a negative
impact on your accuracy. You would also need to make sure that you never run
both SpamBayes applications at the same time because it can cause your
training data to become corrupted if two applications try to update the same
database file at the same time.
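
One way an application could guard against two processes updating the
same database file is a simple lock file.  The sketch below is only an
illustration of that idea: the function names and the ".lock" suffix are
hypothetical, SpamBayes is not known to do this, and a lock left behind by
a crashed process would need to be removed by hand.

```python
import os


class DatabaseInUse(Exception):
    """Raised when another process already holds the lock file."""


def acquire_lock(db_path):
    """Create db_path + '.lock' exclusively; return the lock file path."""
    lock_path = db_path + ".lock"
    try:
        # O_EXCL makes the create fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise DatabaseInUse(lock_path)
    # Record the owning process id to help with manual cleanup.
    os.write(fd, str(os.getpid()).encode("ascii"))
    os.close(fd)
    return lock_path


def release_lock(lock_path):
    """Remove the lock file so another process can open the database."""
    os.remove(lock_path)
```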
 
-- 
Kenny Pitt
 

From tameyer at ihug.co.nz  Sun Jan 30 23:15:52 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Sun Jan 30 23:15:56 2005
Subject: [spambayes-dev] RE: [Spambayes-checkins]
	spambayes/windows spambayes.iss, 1.18, 1.19
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801EC7647@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFDC2@its-xchg4.massey.ac.nz>

[Tony Meyer]
>> Modified Files:
>> 	spambayes.iss
>> 
>> +   UsagePage := CreateInputOptionPage(UserPage.ID,
>> +     'Personal Information', 'How will you use My Program?',
>> +     'Please specify how you would like to use My Program, then click Next.',
>> +     True, False);
>> +   UsagePage.Add('Light mode (no ads, limited functionality)');
>> +   UsagePage.Add('Sponsored mode (with ads, full functionality)');
>> +   UsagePage.Add('Paid mode (no ads, full functionality)');

[Kenny Pitt]
> Dare I even ask what this stuff is all about? <wink>

Oops.  As you've guessed, I had decided to try and update spambayes.iss for
Inno 5.x, and accidentally checked that stuff in.  I'll revert it soonish.

> BTW, py2exe doesn't put MSVCR71.dll in the dist/bin by 
> default so the InnoSetup script won't compile initially.  Is 
> copying MSVCR71.DLL to py2exe/dist/bin just a manual step 
> that needs to be done before running Inno?  If so, we may 
> want to include that in README-DEVEL.txt.

I'm still trying to get my head around what to do with msvcr71.dll.  Thomas
Heller on the py2exe-users list said that he thinks that (in his IANAL
opinion) you need a license to redistribute msvcr71.dll.  If that's the
case, then we can't include it with SpamBayes (I'll do a 1.0.4 with Python
2.3, and 1.1a1 can be built with 2.3 as well), AFAICT.

If it is legit to include it, then we need to figure out where to source it.
Either it's not distributed with Python or the Python install puts it in
windows\system32 (I haven't had a chance to check).  (Thomas's (again,
IANAL) opinion was that it was legit for Python to redistribute the dll.
There was some discussion of this a while back on python-dev, I believe).
If Python does install it, then we can just source it from wherever it gets
put (the setup_all.py script can do this).  If Python doesn't install it,
but we are going to, then a manual copy is probably the only option, and
we'll just have to update README-DEVEL.txt.
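
If the dll does turn out to be installed by Python, the lookup could be
sketched along these lines.  The find_dll helper and its search order
(the directory of the running interpreter, then system32 under
%SystemRoot%, then any extra directories) are assumptions for
illustration; this is not code from setup_all.py, and it says nothing
about whether redistributing the file is permitted.

```python
import os
import sys


def find_dll(name, extra_dirs=()):
    """Return the first existing path for the named dll, or None."""
    # Look next to the running interpreter first.
    candidates = [os.path.dirname(sys.executable)]
    # SystemRoot is normally only set on Windows.
    system_root = os.environ.get("SystemRoot")
    if system_root:
        candidates.append(os.path.join(system_root, "system32"))
    candidates.extend(extra_dirs)
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

A build script could then copy the result of find_dll("msvcr71.dll") into
the py2exe output directory, or stop with a clear message if it is None.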

Using Python 2.4 has turned out to be a right PITA, really, and I wish I had
just stuck with 2.3.  It is tempting to give up using 2.4, and just include
email 3.0 instead (IIRC that's a reasonably simple option), since that's the
primary reason for using 2.4.

=Tony.Meyer

From tameyer at ihug.co.nz  Sun Jan 30 23:34:17 2005
From: tameyer at ihug.co.nz (Tony Meyer)
Date: Sun Jan 30 23:34:22 2005
Subject: [spambayes-dev] 1.0.2 and 1.1a1
In-Reply-To: <ECBA357DDED63B4995F5C1F5CBE5B1E801EC7641@its-xchg4.massey.ac.nz>
Message-ID: <ECBA357DDED63B4995F5C1F5CBE5B1E801DAFDC6@its-xchg4.massey.ac.nz>

> There's a win32gui.GetOpenFileName() method that provides a 
> lower-level interface to the open dialog.  It requires you to 
> manually construct the Win32 API OPENFILENAME structure and 
> pass it in as a string, and I haven't had time to figure out 
> how to do that yet.

There's an example here:

<http://mail.python.org/pipermail/pythonce/2002-October/000204.html>

It's for Windows CE, I think, but is probably more-or-less the same :)

=Tony.Meyer